Statistical Downscaling with Artificial Neural Network
by
Menglin Wang
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
May 2023
Copyright 2023 Menglin Wang
Dedication
I dedicate this thesis to my parents and my wife,
for their never-ending support and love.
Acknowledgements
I would like to express my gratitude and respect to my advisor Dr. Meredith Franklin for her
guidance and support throughout my time at the University of Southern California. She has been a
tremendous mentor and friend. I appreciate all the time and effort she has invested in me. She
introduced me to the field of spatial science research and provided me with great opportunities and
support to learn and grow. Personally, her kind encouragement and patient suggestions helped
me learn how to deal with academic stress. Getting a doctoral degree is usually full of pressure
and uncertainties, but her expertise, encouragement and patience lightened my Ph.D. life and the
confidence she has shown in me enabled me to keep going and never give up.
I am very thankful to my dissertation committee members: Dr. William Gauderman, Dr. Chun
Li, Dr. Juan Pablo Lewinger and Dr. Jose-Luis Ambite for their invaluable suggestions and con-
structive feedback. Their wisdom and expertise helped me refine my research plan as well as this
dissertation. In addition, Dr. Chun Li and Dr. Juan Pablo Lewinger’s courses brought me into the
world of machine learning/deep learning and their enthusiasm inspired me to delve into this field.
I would like to thank my friends in the Biostatistics program. Throughout our friendship, they
have always been there for me. Their kind support and encouragement helped me through some of
the most difficult times in my life. Working with these intelligent and hard-working people was also
delightful. All the great time we spent together made my life at USC wonderful.
I would like to thank my parents. They raised me up. They taught me to be a good man. They
encouraged me to pursue my dream. They did numerous things for me. Thank you, mom and dad.
A special thanks to my beloved wife, Yi Zhang, and our dog, Shiny. I feel lucky to have you by
my side. Your trust and love enable me to face any challenges and difficulties.
Table of Contents
Dedication ii
Acknowledgements iii
List of Tables vii
List of Figures viii
Abstract xi
Chapter 1: Introduction 1
1.1 Background of Downscaling Problem . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Traditional Downscaling and Its History . . . . . . . . . . . . . . . . . . . 2
1.1.2 A More General Definition of Downscaling . . . . . . . . . . . . . . . . . 3
1.1.3 The Target Problem: Supervised Downscaling . . . . . . . . . . . . . . . . 4
1.2 Motivation of Supervised Statistical Downscaling . . . . . . . . . . . . . . . . . . 5
1.2.1 Demand for High-resolution Climate Data . . . . . . . . . . . . . . . . . . 5
1.2.2 Difficulty of High-resolution Data Generation . . . . . . . . . . . . . . . . 6
1.2.3 Supervised Statistical Downscaling: Computation Advantage and More
Reliability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Challenges of Supervised Statistical Downscaling . . . . . . . . . . . . . . . . . . 7
1.3.1 Model Complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Shortage in High-resolution Data . . . . . . . . . . . . . . . . . . . . . . 7
1.3.3 Utilization of Low-Resolution Data . . . . . . . . . . . . . . . . . . . . . 8
1.3.4 Data Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.3.5 Evaluation of Downscaling Performance . . . . . . . . . . . . . . . . . . 8
1.4 A Review of Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.1 Brief History of Artificial Neural Network . . . . . . . . . . . . . . . . . . 9
1.4.2 Artificial Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.3 Back-propagation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Current Statistical Downscaling Approaches Review . . . . . . . . . . . . . . . . 11
1.5.1 Constrained Statistical Downscaling . . . . . . . . . . . . . . . . . . . . . 12
1.5.2 Point-wise Statistical Downscaling . . . . . . . . . . . . . . . . . . . . . . 13
1.5.3 Image-wise Statistical Downscaling . . . . . . . . . . . . . . . . . . . . . 13
Chapter 2: Downscaling with Artificial Neural Network Enhanced with Transfer Learn-
ing 14
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1.1 MERRA-2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1.2 G5NR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.1.3 GMTED2010 Elevation . . . . . . . . . . . . . . . . . . . . . . 21
2.2.2 Downscaling Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.2.2.1 ASDM/ASDMTE Network Structure . . . . . . . . . . . . . . . 23
2.2.2.2 Transferred Model . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.2.2.3 Training Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2.2.4 Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.5 Appendix: Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
Chapter 3: Downscaling with Variational Neural Network 36
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2 Materials and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.2.1 Variational Neural Network . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.2.2 Variational Downscaling Method . . . . . . . . . . . . . . . . . . . . . . 46
3.2.2.1 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
3.2.2.2 Downscaling Model . . . . . . . . . . . . . . . . . . . . . . . . 48
3.2.2.3 Training Strategy . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.2.3 Downscaling and Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . 51
3.2.3.1 In-Data Downscaling Evaluation . . . . . . . . . . . . . . . . . 51
3.2.3.2 Out-Data Downscaling Evaluation . . . . . . . . . . . . . . . . 53
3.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.3.1 In-Data Downscaling Results . . . . . . . . . . . . . . . . . . . . . . . . 55
3.3.2 Out-Data Downscaling Results . . . . . . . . . . . . . . . . . . . . . . . . 57
3.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.5 Appendix: Supplemental Figures . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
Chapter 4: Dust Air Pollution Effect on Mortality in Kuwait 77
Abstract . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
4.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
4.2 Material and Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.1 Study Domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
4.2.2 Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.2.1 Mortality Data . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.2.2.2 Visibility and Temperature Data . . . . . . . . . . . . . . . . . . 80
4.2.2.3 Modern-Era Retrospective Analysis for Research and Applica-
tions, version 2 (MERRA-2) Dust Data . . . . . . . . . . . . . . 80
4.2.2.4 Downscaled Dust Data . . . . . . . . . . . . . . . . . . . . . . 81
4.2.3 Statistical Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
4.3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.1 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
4.3.2 Effect Estimations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
4.4 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
Chapter 5: Conclusions and Further Directions 87
5.1 Strength . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
5.2 Limitations and Further Directions . . . . . . . . . . . . . . . . . . . . . . . . . . 89
References 91
List of Tables
2.1 Image-wise R² and RMSE of Downscaling Results. R² is presented as Max (Mean).
RMSE is presented as Mean (SD). . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.1 Test RMSE of VDM and ASDMTE in all seasons . . . . . . . . . . . . . . . . . . 56
4.1 Summary statistics (mean(SD)) of daily mortality, temperature, dust AOT across
Kuwait in 2007-2016, comparing dust-storm days against non-dust-storm days
with two-sample T test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Summary statistics (mean(SD)) of daily mortality across Kuwait in 2007-2016 by
nationality and gender, comparing dust-storm days against non-dust-storm days
with two-sample T test. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.3 Estimated rate ratio (95% CI) of lagging dust AOT on mortality across Kuwait
during the study period 2007-2016 stratified by nationality and gender. The lag
dust AOT includes two-day moving averages from lag 0 (same day as mortality) to
lag 5 (5 days prior to mortality) and the subscript denotes the lag days involved. For
example, Dust_{0,1} is the moving average of dust AOT at lag 0 and lag 1 . . . . . . 84
List of Figures
1.1 Target downscaling problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Schematic of a single neuron . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Schematic of a simple neural network . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1 Map of the study domain. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2 Overall Neural Network structure of ASDM/ASDMTE. The notation LSTM:8s
represents a LSTM layer with 8 nodes and return sequence. The notation of Build-
ing Block:8 represents a building block with 8 nodes. The light yellow block
represents using dropout layer with dropout rate of 0.5. The transfer Block is only
used in ASDMTE thus it is connected with dash lines. . . . . . . . . . . . . . . . 23
2.3 Neural Network structure of the transferred model. The notation LSTM: 8s repre-
sents a LSTM layer with 8 nodes and return sequence. The notation of Building
Block:8 represents a building block with 8 nodes. The light yellow block repre-
sents the dropout layer with dropout rate 0.5. . . . . . . . . . . . . . . . . . . . . 25
2.4 Temporal simplification and splitting for forward and backward prediction. . . . . 26
2.5 Sample images from G5NR (top-left) and MERRA-2 (top-right) on 29 July 2006;
temporal trend of image-wise RMSE (bottom-left) and R² (bottom-right) with
different lags (days). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.6 Downscaled (by method) and G5NR data over the study region on 29 July 2006:
a) ASDMTE, b) ASDM, c) SRDRN, d) dissever GAM, e) dissever LM, f) G5NR. . 29
2.7 Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dis-
sever LM in Season 1. Please refer to Figure 2.4 and Section 2.2.2.3 for the defi-
nition of Season 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.8 Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dis-
sever LM in Season 2. Please refer to Figure 2.4 and Section 2.2.2.3 for the defi-
nition of Season 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.9 Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dis-
sever LM in Season 3. Please refer to Figure 2.4 and Section 2.2.2.3 for the defi-
nition of Season 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.10 Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dis-
sever LM in Season 4. Please refer to Figure 2.4 and Section 2.2.2.3 for the defi-
nition of Season 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.11 Downscaled (by method) and G5NR data over the study region on 23 October
2006: a) ASDMTE, b) ASDM, c) SRDRN, d) dissever GAM, e) dissever LM, f)
G5NR. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.1 Variational neural network (VNN) schematic . . . . . . . . . . . . . . . . . . . . 45
3.2 Map of study domain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
3.3 Variational downscaling method network structure . . . . . . . . . . . . . . . . . . 49
3.4 Data temporal splitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
3.5 Dust Extinction AOT (550 nm) Data at 5/16/2005 from G5NR and MERRA-2 (before
and after pre-processing) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
3.6 Overall image-wise RMSE of all seasons . . . . . . . . . . . . . . . . . . . . . . 55
3.7 Season 1 Semivariogram of G5NR and Downscaled Data at 8/2/2006 . . . . . . . 58
3.8 Season 1 Dust AOT Map from G5NR, VDM and ASDMTE at 8/2/2006 . . . . . . 59
3.9 VDM Downscaled Dust AOT Map at 5/15/2008 . . . . . . . . . . . . . . . . . . . 60
3.10 Semivariogram Comparison Between G5NR and Downscaled Data . . . . . . . . . 61
3.11 Temporal Auto-Correlation Plots of Kuwait . . . . . . . . . . . . . . . . . . . . . 62
3.12 Season 2 Semivariogram of G5NR and Downscaled Data at 10/27/2006 . . . . . . 66
3.13 Season 2 Dust AOT Map from G5NR, VDM and ASDMTE at 10/27/2006 . . . . . 67
3.14 Season 3 Semivariogram of G5NR and Downscaled Data at 1/25/2007 . . . . . . 68
3.15 Season 3 Dust AOT Map from G5NR, VDM and ASDMTE at 1/25/2007 . . . . . 69
3.16 Season 4 Semivariogram of G5NR and Downscaled Data at 4/28/2007 . . . . . . 70
3.17 Season 4 Dust AOT Map from G5NR, VDM and ASDMTE at 4/28/2007 . . . . . 71
3.18 Temporal Auto-Correlation Plot of Afghanistan . . . . . . . . . . . . . . . . . . . 72
3.19 Temporal Auto-Correlation Plots of United Arab Emirates . . . . . . . . . . . . . 73
3.20 Temporal Auto-Correlation Plots of Iraq . . . . . . . . . . . . . . . . . . . . . . . 74
3.21 Temporal Auto-Correlation Plots of Qatar . . . . . . . . . . . . . . . . . . . . . . 75
3.22 Temporal Auto-Correlation Plots of Saudi Arabia . . . . . . . . . . . . . . . . . . 76
4.1 Study domain: Kuwait . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Abstract
Dust-caused air pollution is becoming a dominant health concern in Southwest Asia. How-
ever, very limited air quality data are available over this region to support environmental
health research. General Circulation Models (GCMs) can provide estimates for unobserved areas, but
only at low spatial resolution. With the development of assimilation products and computational power,
GCMs have also produced air pollution data at higher resolution, but only over limited temporal ranges
due to computational cost. At the same time, air pollution data with both high spatial resolution and
long temporal coverage are essential for conducting air quality studies and assessing the health effects
associated with exposure to air pollution. Statistical downscaling is commonly used as a computationally
efficient approach to generate high-resolution air quality data from GCM output, but tradi-
tional statistical downscaling methods were usually developed to resolve resolution mismatches
and cannot take advantage of the recently available high-resolution data. In this dissertation, we aim to
develop novel Artificial Neural Network (ANN)-based methods for the supervised downscaling
problem, that is, downscaling low-resolution GCM output with high-resolution GCM products as
supervision to improve downscaling performance and reliability. With these methods, we aim to
generate trustworthy high-resolution air quality data for the Southwest Asian region
and promote downstream environmental health research.
This dissertation is structured as follows. Chapter 1 introduces the downscaling problem and
current approaches. Chapter 2 describes a deep learning approach for the supervised downscaling
problem with transfer learning enhancement, the Artificial Neural Network Sequential Downscaling
Method (ASDM) with Transfer Learning Enhancement (ASDMTE). Chapter 3 proposes a novel
generative deep model, the Variational Neural Network (VNN), and a VNN-based statistical
downscaling approach called the Variational Downscaling Method (VDM). Chapter 4 presents an epidemi-
ological application investigating the health effect of air pollution on mortality in Kuwait using GCM-
produced dust data and its VDM-downscaled counterpart. Finally, Chapter 5 summarizes the strengths and
limitations of my current work and discusses possible further directions.
Chapter 1
Introduction
Air pollution caused by dust is becoming a significant health concern in Southwest Asia, but there
is a lack of air quality data in the region to support environmental research that can promote
residents' health. The General Circulation Model (GCM) can provide estimates for areas that are
not observed, but only at low spatial resolution. With the development of assimilation products
and computational power, the GCM can now produce air pollution data at higher resolution, but over a
short temporal range limited by computational cost. However, air quality studies require data with
both high spatial resolution and long temporal range. To generate such data, statistical downscal-
ing is commonly used, but traditional methods cannot take advantage of high-resolution data that
has become available. This dissertation aims to develop novel Artificial Neural Network (ANN)-
based methods to address this problem by downscaling low-resolution GCM output along with
high-resolution GCM data in a supervised format. By doing so, we hope to generate reliable,
high-resolution air quality data for the Southwest Asian region, which will promote environmental
health research.
This chapter is structured as follows. Section 1.1 presents the background of the general down-
scaling problem, including its history, the different definitions it has acquired as the field developed,
and our target supervised statistical downscaling problem. Section 1.2 explains why we need supervised statistical down-
scaling and its advantages over traditional statistical downscaling approaches. Section 1.3 lists the
challenges of supervised statistical downscaling that we need to address. Section 1.4 provides a
brief review of artificial neural networks (ANN), and Section 1.5 presents some current ANN-based
downscaling methods.
1.1 Background of Downscaling Problem
1.1.1 Traditional Downscaling and Its History
Downscaling is a relatively young science: it is rooted in the demand to generate local-scale
information from general circulation models (GCMs), which are themselves the result of recent de-
velopments in the climate science community [1, 2]. GCMs are climate models that use complex
mathematical formulations to simulate the general circulation of the Earth's atmosphere.
Although GCMs can provide essential information for climate research, part of their use-
fulness is restricted by their coarse spatial and temporal resolution [3]. To make inferences about
local climate using large-scale output from GCMs, downscaling techniques were developed.
Due to the various demands of different research communities, downscaling may refer to different
problems and definitions.
According to Benestad, Chen, et al. [1], downscaling can be defined as the process of making the
link between the state of some variable representing a large space and the state of some variable
representing a much smaller space. In Wilby and Wigley's work [4], downscaling is a
means of interpolating regional-scale atmospheric predictor variables to station-scale meteorolog-
ical series. The variable representing the large scale can be the circulation pattern over a broad area,
while the small-scale variable can be a series of temperature measurements obtained from a station.
For instance, Khan, Coulibaly, et al. [5] compared the uncertainty assessment
of the downscaling results generated by three statistical downscaling methods: the Statistical
DownScaling Model (SDSM), the Long Ashton Research Station Weather Generator (LARS-WG) model
and an Artificial Neural Network (ANN) model. The target associations of temperature and precipi-
tation are established between gridded data over the Chute-du-diable study domain and the
corresponding variables measured at two meteorological stations within the study domain. Gen-
erally speaking, the traditional downscaling problem aims to establish the association between a
gridded large-scale variable and a smaller-scale variable at a specific location (area-to-point).
1.1.2 A More General Definition of Downscaling
As downscaling methods have developed, they have also been applied in other fields, such as remote
sensing, to solve a more general problem. In Atkinson's review of downscaling methods
in remote sensing [6], the downscaling problem is defined in a more general setting, namely as an
increase in spatial resolution [7]. A variety of methods has been developed to solve this general
downscaling problem. For example, Wang, Tian, et al. [8] recently developed a deep-learning-
based downscaling method called the Super Resolution Deep Residual Network (SRDRN) to generate
high-resolution precipitation and temperature based on low-resolution data. Compared to the area-
to-point downscaling problem above, this more general downscaling problem tries to establish
the link between low- and high-resolution data (area-to-area).
The traditional downscaling problem in Section 1.1.1 establishes the association between a
large-scale variable and a local-scale variable at a specific location covered by the large-scale
variable (area-to-point), while the more general definition of the downscaling problem in Section 1.1.2
aims to find a link between low-resolution and high-resolution observations (area-to-area), that is,
the between-scale relationship with which we can generate high-resolution data.
The high-resolution downscaled data are usually consumed by downstream applications to resolve
the resolution mismatching problem.
However, in most cases, due to the lack of high-resolution target data, there are no "true" high-
resolution data that can be used to validate the downscaled output. In other words, area-to-area
downscaling techniques are usually used as a data preprocessing procedure to solve the resolution
mismatch, that is, to make the low-resolution data meet a finer resolution requirement. For
instance, Li, Franklin, et al. [9] downscaled the MERRA-2 GMI Replay Sim-
ulation (M2GMI) AOD data from 50 km resolution to 1 km resolution using an autoencoder-based
deep residual network to meet the resolution requirement of the subsequent analysis. Malone, McBrat-
ney, et al. also developed a general method for downscaling earth resource information, called
dissever [10], which iteratively fits a regression model, such as a generalized additive model (GAM), with
adjustment steps that optimize the downscaling to ensure that the target variable value given for
each coarse grid cell equals the average of all target variable values at the fine scale in that coarse
grid cell.
Area-to-area downscaling methods can resolve the data mismatching problem by gener-
ating high-resolution data that match the downstream research requirements. However, these
approaches solve the problem in an unsupervised fashion without using HR data for validation,
so evaluating the quality of their output is largely intractable.
1.1.3 The Target Problem: Supervised Downscaling
In the context of this dissertation, the downscaling problem we aim to solve is the su-
pervised statistical downscaling problem, that is, generating high-resolution data based on relevant
low-resolution data (area-to-area) while using true high-resolution data as the ground truth. Since the
setting of our target problem involves "true" HR data, which puts it in a supervised learning form,
for convenience the target problem will be referred to as supervised downscaling. The simplified problem
setting is shown in Figure 1.1. We have low-resolution (LR) data that cover a relatively long period
of time at low spatial resolution, and high-resolution (HR) data that cover a short period. The LR
and HR data cover the same area spatially and overlap partially in time. My
dissertation intends to develop robust ANN-based methods that downscale the LR data to higher
resolution and use the available HR data as the target output against which the downscaled
data are validated.
Figure 1.1: Target downscaling problem.
1.2 Motivation of Supervised Statistical Downscaling
1.2.1 Demand for High-resolution Climate Data
Data with high spatial resolution are more informative for downstream research, especially re-
search focusing on the local scale. For instance, high-resolution aerosol data can provide essential sup-
port for air quality studies [11] and downstream health-related research. Over the past few years,
satellite-based aerosol optical depth (AOD) has been used for this purpose, primarily to estimate
PM2.5 at fine spatial resolutions [12–14]. Satellite AOD-derived PM2.5 estimates have been used to
examine health outcomes including respiratory [15–17] and cardiovascular [18] diseases. Recent
advances in data assimilation products, including the NASA Modern-Era Retrospective Analysis
for Research and Applications, version 2 (MERRA-2), provide complete surfaces of AOD and re-
lated aerosol products, but their spatial resolutions are generally quite coarse. Downscaling from
low to high resolution provides another possible approach to using these relatively low resolution
data.
1.2.2 Difficulty of High-resolution Data Generation
Generating high-resolution data is usually computationally expensive. For instance, the Goddard Earth
Observing System (GEOS-5) Nature Run (G5NR) is a two-year (5/16/2005 - 5/15/2007) non-
hydrostatic 7 km global mesoscale simulation produced by the NASA Goddard Earth Observing
System (GEOS-5) atmospheric general circulation model [19]. The total output of G5NR is nearly
4 petabytes, and it took over 75 days of dedicated computation on the Discover cluster at the NASA
Center for Climate Simulation (NCCS). This high computational cost restricted the temporal range
of G5NR, and only two years of data were simulated.
Another method that can generate finer-resolution data from GCM output is dynamical down-
scaling. As GCMs usually produce climate data at coarse resolution, dynamical downscaling
methods use high-resolution regional simulations to dynamically extrapolate the effects of large-
scale climate processes to finer scales [20]. However, dynamical downscaling suffers from the
same high computational cost problem as high-resolution GCMs, since it takes the coarse gridded
GCM output as a boundary condition and computationally simulates the local climate.
These simulation-based methods make high-resolution data expensive, which discourages the
research that requires such data. A computationally efficient approach to generating HR data is
therefore in demand.
1.2.3 Supervised Statistical Downscaling: Computation Advantage and More
Reliability
Statistical downscaling is an approach that requires far fewer computational resources, since it is un-
necessary to computationally simulate the whole atmosphere. The basic approach of statistical
downscaling is that the local (smaller) scale variable is conditioned by a coarse (larger) scale vari-
able and local features, like topography and land-sea distribution [21]. In this perspective, the
local-scale variable can be predicted with an empirical association that relates the coarse-scale
variables (predictors) and local-scale variables (predictands).
Supervised statistical downscaling inherits from statistical downscaling the use of a statistical
model rather than simulation to summarize and generate data, which means it shares all the
computational advantages of statistical downscaling. It can also provide a numerical evaluation of
the output quality, which makes the output more reliable. In addition, its ability to use recently
available high-resolution data gives it more information for downscaling, which usually
leads to better performance. However, several challenges must be solved in order to leverage these
advantages.
1.3 Challenges of Supervised Statistical Downscaling
1.3.1 Model Complexity
Supervised statistical downscaling intends to find the association between low- and high-resolution
data in order to produce downscaled data. In essence, we are trying to use a statistical model to
capture the characteristics of complex physical processes in the atmosphere and replace the intensive
computation of simulation. Such a model is a simplification of all local climate processes, just as
the complex simulation is. Thus, we usually need a model that is flexible enough to capture the
necessary characteristics of this between-scale association. The artificial neural network (ANN), as a
powerful non-linear approximator, is a great fit for this complex modeling job.
1.3.2 Shortage in High-resolution Data
The high cost of HR data limits its temporal availability for downscaling (Figure
1.1 and Section 1.2.2). In contrast, LR data are easier to generate and typically cover a longer
period of time. However, the overlapping period for which we have both LR and HR data is
constrained by the length of the HR record, which means that during training the length of the HR data is
the bottleneck. Making good use of limited high-resolution data is essential for supervised downscaling.
1.3.3 Utilization of Low-Resolution Data
The low cost and long time span of LR data make them a rich and informative resource. But in
the supervised downscaling problem, only the part that overlaps with the HR data can serve as input
for modeling the association between the low and high scales, and the LR data outside the overlapping
range are difficult to utilize fully. Making use of the LR data outside the overlapping range is a
difficult challenge in supervised downscaling, but doing so can help improve downscaling performance.
1.3.4 Data Matching
In the supervised downscaling problem, the HR and LR data are usually not directly observed,
since the cost and difficulty of obtaining gridded observations are prohibitive. Instead, these data
are simulated by different assimilation algorithms, such as GCMs, which means they do not necessarily
match point to point. Specifically, the mean of the HR grid values is not exactly equal to the
coincident coarse grid value in the LR data. This mismatch across data sources at different scales makes
some traditional downscaling techniques that assume spatial value consistency, such as dissever [10],
impractical for the supervised downscaling problem. It may also cause model non-identifiability,
since the same coarse gridded value may correspond to different sub-grid values,
and this problem becomes even harder to solve with limited data.
1.3.5 Evaluation of Downscaling Performance
Evaluation of downscaling performance is another difficulty. The shortage of HR data makes not
only the modeling harder, but also the validation of model performance. Within
the overlapping period, the more data we use for validation, the less data we have for training,
which makes the choice of train/validation split important. Combined with the data matching challenges
in Section 1.3.4, error-based evaluations such as Mean Squared Error (MSE) cannot fully reflect
downscaling performance. In addition, evaluating how the downscaling model performs outside
the range of the HR data is also hard, since we do not have any data to serve as the ground truth.
1.4 A Review of Artificial Neural Network
The Artificial Neural Network (ANN) is a powerful tool for non-linear function approximation, and it
has achieved great success in recent years along with increasing computational capability. It also
provides an influential approach to the development of Artificial Intelligence (AI) [22]. Nowadays
numerous novel methods and tools are based on ANNs. In this section, I do not
intend to review ANNs comprehensively; rather, I would like to provide a brief
introduction to ANNs so that the reader has enough background to continue reading my
dissertation.
1.4.1 Brief History of Artificial Neural Network
In the last decade, we have seen a great number of interesting ANN-based methods and applica-
tions, such as GoogLeNet [23], ResNet [24] and U-Net [25], but the history of ANNs is surprisingly
long. ANN research has gone through three stages [26]. The first research peak (1940s-1960s)
started with McCulloch and Pitts' work "A logical calculus of the ideas im-
manent in nervous activity" in 1943 [27]. ANNs, or deep learning [28], were at that time called
cybernetics. The first neural network model (the perceptron), which had a single neuron, was developed
by Rosenblatt in 1958 [29]. The second wave spanned the 1980s to the 1990s. Rumelhart et al.'s work
(1986) [30] on back-propagation enabled the training of simple neural networks with several hidden
layers. There is no fundamental difference between the models of that time and those of today; the
dominant limitations were computational power and data availability. The last stage runs from 2000
to the present. With the significant decrease in computation and storage costs, as well as the massive
amount of available data, we are able to train deep ANN models with many hidden layers and millions of
parameters. These large models have shown incredible performance in many fields [31–33].
1.4.2 Artificial Neural Network
All artificial neural networks are made up of neurons, also known as nodes, shown in Figure 1.2.
Each neuron starts with a linear combination of all its inputs, X_1, X_2, ..., X_n, and their corresponding
weights w_1, w_2, ..., w_n, such that Z = ∑_{i=1}^{n} w_i X_i, and is followed by an activation function f, such
that the output is Y = f(Z). The intercept term β_0 is omitted here for simplicity. Logistic regression,
for instance, can be viewed as a single neuron using the logistic function as its activation function.
Figure 1.2: Schematic of a single neuron
Artificial Neural Networks (ANNs) are composed of layers, including an input layer, one or
more hidden layers and an output layer. Each hidden layer has several neurons. Figure 1.3 shows
a simple ANN with one hidden layer. The green circles in Figure 1.3 are neurons like the one in Figure
1.2.
We can easily increase the complexity of a neural network by using more neurons in each hidden
layer and/or stacking more hidden layers. At the same time, the activation function of each neuron
is usually non-linear, which allows the network to approximate an arbitrary continuous function when
the network is sufficiently complex [34]. With massive amounts of data and computational resources,
this property enables neural networks to achieve surprising performance.
Figure 1.3: Schematic of a simple neural network
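To make the notation above concrete, the following is a minimal NumPy sketch of a single neuron and a one-hidden-layer forward pass. The input values, layer sizes and the choice of the logistic activation are illustrative assumptions, not part of the models developed later in this dissertation.

```python
import numpy as np

def sigmoid(z):
    # Logistic activation; a single neuron with this f is logistic regression.
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b, f=sigmoid):
    # Z = sum_i w_i X_i + b, followed by the activation Y = f(Z).
    return f(np.dot(w, x) + b)

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 1.5])        # inputs X_1, X_2, X_3
W1 = rng.normal(size=(4, 3)) * 0.1   # hidden layer: 4 neurons, one weight row each
b1 = np.zeros(4)
w2 = rng.normal(size=4) * 0.1        # output-neuron weights
b2 = 0.0

hidden = sigmoid(W1 @ x + b1)        # hidden-layer activations
y_hat = neuron(hidden, w2, b2)       # network output Y
print(y_hat)
```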
1.4.3 Back-propagation
Another important question we need to cover in order to understand ANNs is: how do ANNs learn?
A learning problem is roughly equivalent to an optimization problem. For instance, we
can obtain the parameters of a linear regression model by selecting the parameters that minimize the
mean squared error (MSE). Commonly used optimization algorithms are gradient descent
and its variations, and back-propagation is no exception: it is also based on gradient descent.
At first glance, calculating gradients for all hidden layers seems daunting, but the problem has a
surprisingly simple solution, the chain rule [35]. Back-propagation [30] works by computing the
loss-function gradient with respect to each weight one layer at a time, iterating backward from
the output layer to the input layer.
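As a worked illustration of the idea (a minimal sketch on synthetic data, with an arbitrary learning rate and network size), the following NumPy code performs back-propagation by hand for a one-hidden-layer network trained with gradient descent on an MSE loss:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                    # 100 synthetic examples, 3 inputs
y = (X @ np.array([1.0, -2.0, 0.5]) > 0) * 1.0   # toy binary target

W1, b1 = rng.normal(size=(4, 3)) * 0.1, np.zeros(4)
w2, b2 = rng.normal(size=4) * 0.1, 0.0
lr = 0.5                                         # learning rate

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for epoch in range(500):
    # Forward pass through one hidden layer.
    h = sigmoid(X @ W1.T + b1)                   # hidden activations, shape (100, 4)
    y_hat = sigmoid(h @ w2 + b2)                 # outputs, shape (100,)
    loss = np.mean((y_hat - y) ** 2)             # mean squared error

    # Backward pass: apply the chain rule one layer at a time, output to input.
    d_yhat = 2.0 * (y_hat - y) / len(y)          # dLoss/dy_hat
    d_z2 = d_yhat * y_hat * (1.0 - y_hat)        # through the output sigmoid
    d_w2, d_b2 = h.T @ d_z2, d_z2.sum()
    d_h = np.outer(d_z2, w2)                     # propagate to hidden activations
    d_z1 = d_h * h * (1.0 - h)                   # through the hidden sigmoid
    d_W1, d_b1 = d_z1.T @ X, d_z1.sum(axis=0)

    # Gradient descent update of every weight and bias.
    W1 -= lr * d_W1; b1 -= lr * d_b1
    w2 -= lr * d_w2; b2 -= lr * d_b2

print(f"final MSE: {loss:.4f}")
```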
1.5 Current Statistical Downscaling Approaches Review
In this era of data, with the development of satellite remote sensing techniques, the information obtained
from remote sensing studies is growing dramatically. However, these data products are typically
not suitable for local or regional studies due to their coarse spatial resolution. Consequently, the
demand for finer-resolution data that can provide more detail and regional information is even
stronger than before. In recent years, various statistical downscaling approaches have been devel-
oped to increase data resolution.
The purpose of this section is to provide a brief review of current statistical downscaling ap-
proaches for generating high-resolution data. These methods fall into three groups: constrained
statistical downscaling, point-wise statistical downscaling and image-wise statisti-
cal downscaling. Constrained downscaling methods take the coarse-scale observation as a constraint
on the fine-scale variables and then downscale subject to this condition. The point-wise ap-
proach aims to establish a point-to-point relationship between fine-scale observations and
coarse-scale observations using regression methods. The image-wise approach was developed recently
and is based on deep learning; these methods treat the low-resolution and high-resolution data as
images and try to find a transfer function between images of different resolutions.
1.5.1 Constrained Statistical Downscaling
In constrained statistical downscaling, the fine-scale downscaled data obtained from a model, for
instance a GAM, are optimized such that each coarse-scale value is well approximated by the
average of the fine-scale values it spatially covers. This approach makes a strong as-
sumption about the low- and high-resolution data, namely that they are spatially consistent.
Constrained statistical downscaling can solve resolution mismatching while maintaining con-
sistency between scales, and several such approaches were developed recently. Ines, Mohanty,
et al. used a genetic algorithm to minimize the difference between low-resolution soil moisture
and the average of the corresponding high-resolution simulated soil moisture values in order to extract
sub-grid information on soil and vegetation [36]. Malone, McBratney, et al. proposed a
general downscaling framework, called dissever [10]. This framework was designed to fa-
cilitate a generalised method for downscaling coarse-resolution information using available finely
gridded covariate data. dissever iteratively fits a regression model that links the target fine-resolution
variable with the coarse-scale variable and fine-scale covariates while constraining spatial value consis-
tency between the low- and high-resolution target data. A similar training strategy was used by Li,
Franklin, et al. to downscale Modern-Era Retrospective analysis for Research and Applications,
Version 2 (MERRA-2) GMI Replay Simulation (M2GMI) AOD data from 50 km to 1 km [9].
1.5.2 Point-wise Statistical Downscaling
In point-wise statistical downscaling, a statistical model is used to describe the relationship be-
tween each individual fine-scale grid point and the coarse-scale value overlapping it. Through this regres-
sion, each coarse-scale data point can be disaggregated to the finer scale. For instance, Loew and
Mauser modeled the association between each fine-scale soil moisture value and its corresponding
coarse-scale observation using linear regression [37]. Since a linear regression model is fitted in-
dependently for each fine-scale grid cell, the models depend on location-specific conditions,
such as slope, height and land cover. Van den Berg et al. proposed a framework which allows for the
derivation of the modeled cumulative distribution function of the fine-scale rainfall depth, given a
coarse-scale rainfall depth [38].
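The following is a minimal sketch of the point-wise idea on synthetic data: one independent linear regression per fine-scale grid cell against the coarse-scale value covering it. The grid sizes, the 5x5 coverage pattern and the gamma-distributed toy values are illustrative assumptions, not any specific published method.

```python
import numpy as np

# Toy setup: 40 days of a 4x4 coarse grid, each coarse cell covering a 5x5 block
# of a 20x20 fine grid. Values are synthetic and only for illustration.
rng = np.random.default_rng(1)
coarse = rng.gamma(2.0, 0.2, size=(40, 4, 4))
fine = np.repeat(np.repeat(coarse, 5, axis=1), 5, axis=2) + rng.normal(0, 0.05, (40, 20, 20))

# Fit slope and intercept independently for every fine-scale grid cell.
coefs = np.zeros((20, 20, 2))
for i in range(20):
    for j in range(20):
        x = coarse[:, i // 5, j // 5]          # covering coarse-scale series
        y = fine[:, i, j]                      # fine-scale series at (i, j)
        coefs[i, j] = np.polyfit(x, y, deg=1)  # slope, intercept

# Disaggregate a new coarse-scale field with the per-cell regressions.
new_coarse = rng.gamma(2.0, 0.2, size=(4, 4))
downscaled = np.empty((20, 20))
for i in range(20):
    for j in range(20):
        a, b = coefs[i, j]
        downscaled[i, j] = a * new_coarse[i // 5, j // 5] + b
```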
1.5.3 Image-wise Statistical Downscaling
Image-wise statistical downscaling aims to model the image-wise association between low-
resolution data and high-resolution data. This problem is similar to image transformation in the com-
puter vision field, and thus many techniques that have achieved great success in computer vision have
been borrowed and applied to downscaling problems. For instance, the convolutional neu-
ral network (CNN) is a popular method for downscaling due to its ability to learn spatial features
from large gridded data [39]. Recently, Wang, Tian, et al. [8] developed a CNN-based method, the
Super Resolution Deep Residual Network (SRDRN), to learn the between-scale relationship of
precipitation and temperature, and it showed satisfactory downscaling performance.
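To illustrate the image-wise idea (this is not the SRDRN architecture itself, only a minimal Keras sketch of a convolutional network mapping a coarse-resolution field to a finer grid), the layer sizes, input shape and the 8x upscaling factor below are assumptions chosen for illustration:

```python
import tensorflow as tf
from tensorflow.keras import layers

def tiny_sr_cnn(lr_shape=(25, 25, 1), upscale=8):
    # Learn an image-to-image mapping from a low-resolution field to a finer grid.
    lr_field = tf.keras.Input(shape=lr_shape)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(lr_field)
    x = layers.UpSampling2D(size=upscale, interpolation="bilinear")(x)
    x = layers.Conv2D(32, 3, padding="same", activation="relu")(x)
    hr_field = layers.Conv2D(1, 3, padding="same")(x)   # predicted high-resolution field
    return tf.keras.Model(lr_field, hr_field)

model = tiny_sr_cnn()
model.compile(optimizer="adam", loss="mse")
# model.fit(lr_images, hr_images, ...)   # hypothetical paired coarse/fine training fields
```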
Chapter 2
Downscaling with Artificial Neural Network Enhanced with
Transfer Learning
Abstract
Spatially and temporally resolved aerosol data are essential for conducting air quality studies
and assessing the health effects associated with exposure to air pollution. As these data are of-
ten expensive to acquire and time-consuming to estimate, computationally efficient methods are
desirable. When coarse-scale data or imagery are available, fine-scale data can be generated
through downscaling methods. We developed an Artificial Neural Network Sequential Downscal-
ing Method (ASDM) with Transfer Learning Enhancement (ASDMTE) to translate time-series
data from coarse- to fine-scale while maintaining between-scale empirical associations as well as
inherent within-scale correlations. Using assimilated aerosol optical depth (AOD) from the GEOS-
5 Nature Run (G5NR) (2 years, daily, 7 km resolution) and Modern-Era Retrospective analysis
for Research and Applications, Version 2 (MERRA-2) (20 years, daily, 50 km resolution), cou-
pled with elevation (1 km resolution), we demonstrate the downscaling capability of ASDM and
ASDMTE and compare their performances against a deep learning downscaling method, Super
Resolution Deep Residual Network (SRDRN), and a traditional statistical downscaling framework
called dissever.
ASDM/ASDMTE utilizes empirical between-scale associations, and accounts for within-scale
temporal associations in the fine-scale data. In addition, within-scale temporal associations in the
coarse-scale data are integrated into the ASDMTE model through the use of transfer learning to
enhance downscaling performance. These features enable ASDM/ASDMTE to be trained on short
periods of data yet achieve a good downscaling performance on a longer time-series. Among
all the test sets, ASDM and ASDMTE had mean maximum image-wise R² of 0.735 and 0.758,
respectively, while SRDRN, dissever GAM and dissever LM had mean maximum image-wise R²
of 0.313, 0.106 and 0.095, respectively.
2.1 Introduction
Fine-scale aerosol data provide essential support for air quality studies [11] and downstream health-
related applications. Over the past several years, satellite-based aerosol optical depth (AOD) has
been used for this purpose, primarily to estimate PM
2.5
surfaces at fine spatial scales [12–14].
Satellite AOD-derived PM
2.5
estimates have been used to examine health outcomes including res-
piratory [15–17] and cardiovascular [18] diseases. Generating fine-scale PM
2.5
from satellite AOD
has several limitations including missing data due to cloud cover and bright surfaces [40], and it
requires complex statistical or machine learning techniques that incorporate multiple external data
sources [41].
Our study region encompasses several countries across Southwest Asia (Afghanistan, Iraq,
Kuwait, Saudi Arabia, United Arab Emirates, and Qatar (Figure 2.1)), which is known for its ex-
treme dry and hot hyper-arid climate. This unique environment, in addition to increased economic
development and urbanization, makes both naturally and anthropogenically occurring air pollution
a concern [42]. This region is also the basis of a larger research initiative assessing the impact of
air quality on the health of military personnel who were deployed in the region during the post-9/11
wars [43, 44]. As there is very little ground-level air quality monitoring in the region, having
fine-scale aerosol data is an asset to support air pollution related research.
Recent advances in data assimilation products provide a source of AOD data, with the NASA
Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) drawing
intensive research interest since it provides complete surfaces of AOD and related aerosol products
globally from 1980 onward. Given its long time range of available data, Sun et al. (2019) [45]
analyzed the spatial distribution and temporal variation of MERRA-2 AOD over China from 1980
to 2017. Ukhov et al. (2020) [46] used MERRA-2 AOD to assess natural and anthropogenic air
pollution over the Middle East. However, the spatial resolution of MERRA-2 data is quite coarse
(∼ 50 km) which limits its application for local-scale research. In contrast, the Goddard Earth
Observing System Model, Version 5 (GEOS-5) Nature Run (G5NR) [47] can provide AOD data
at finer resolution (∼7 km). G5NR is a global non-hydrostatic mesoscale simulation performed by
the GEOS-5 Atmospheric General Circulation Model (GEOS-5 AGCM) and driven by prescribed
sea-surface temperature, sea ice, surface emissions and uptake of aerosols and trace gases [47].
However, G5NR is only available for two years (2005–2007) due to its high computational cost,
which restricted its research potential.
Statistical downscaling from coarse- to fine-scale is a computationally efficient solution to
generate fine-scale aerosol data, which can take advantage of both the long temporal range of
MERRA-2 AOD and the fine spatial resolution of G5NR AOD. Statistical downscaling was de-
veloped primarily to generate finer spatial scale climate information from General Circulation
Models (GCMs) [48], and these techniques have also been applied to remote sensing data [6, 10,
13, 49, 50]. The basic approach to statistical downscaling is that the fine (smaller) scale variable
is conditioned by a coarse (larger) scale variable and local features, like topography and land-sea
distribution [21]. In this perspective, the fine-scale variable can be predicted with an empirical as-
sociation that relates the coarse-scale variables (predictors) and fine-scale variables (predictands).
For instance, dissever is a general framework for downscaling earth resource information [10].
It uses an iterative algorithm to fit regression models between coarse- and fine-scale variables in
order to optimize downscaling by ensuring the value of each coarse grid is equal to the mean of
fine-scale values that are spatially covered by the corresponding coarse grid.
Deep learning [51] has surpassed traditional statistical approaches with considerable perfor-
mance improvements, and has thus been used in a variety of remote sensing data applications [52].
The convolutional neural network (CNN) is a popular method for downscaling due to its ability to
learn spatial features from large gridded data [39]. Recently, a CNN-based model called the Super
Resolution Deep Residual Network (SRDRN), which utilized convolutional layers and residual
networks, was developed to downscale daily precipitation and temperature [8]. Autoencoder-like
models with residual connections and parameter sharing have also been used to downscale by in-
corporating an iterative training strategy to force spatial value consistency [9]. Networks with
transfer learning have been used in a spatial context to generalize the empirical associations within
one region to apply downscaling in a different region, showing notable improvement compared to
classical statistical downscaling methods [8, 53].
We propose an artificial neural network (ANN) [54] sequential downscaling method (ASDM)
with transfer learning enhancement (ASDMTE). ASDM/ASDMTE utilizes empirical between-
scale associations, and accounts for inherent within-scale temporal associations among fine-scale
data. In addition, within-scale temporal associations in the coarse-scale data being downscaled are
integrated into the ASDMTE model through the use of transfer learning to enhance downscaling
performance.
Under the ASDM framework, the fine-scale variable can be modeled as a non-linear function
of coarse-scale variable, with a sequence of temporally lagging fine-scale variables at the same
location adjusting for geographic information (e.g., elevation), time (day of the year) and location
(latitude, longitude). To enhance the performance of ASDM, transfer learning can be incorporated
where another similar sequential ANN model is trained on the long time series of coarse-scale data
to learn its inherent temporal associations; this model is then transferred into ASDM to enhance
its downscaling performance.
We developed ASDM/ASDMTE models to downscale AOD data obtained from the Modern-
Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2), a satellite-
based reanalysis product produced by NASA’s Global Modeling and Assimilation Office (GMAO).
MERRA-2 data are available for a long period (1980–present) at relatively coarse scale (∼ 50 km).
The target for downscaling was fine-scale ( ∼ 7 km) AOD from the Goddard Earth Observing Sys-
tem Model, Version 5 (GEOS-5) Nature Run (G5NR), another satellite-based product [47]. At this
resolution, G5NR is an informative data source for understanding local-scale air quality and as an
exposure metric for health effects studies, but it is limited in temporal range (2005–2007), which
restricts its broad use for long-term studies. As the fine-scale G5NR data has limited temporal
range (2 years of daily data), it was difficult to build stable empirical associations needed for
traditional statistical downscaling that link large-scale variables with local-scale variables. Fur-
thermore, little external or covariate information were available at fine scales that could help with
traditional downscaling. These limitations made it impractical to establish between-scale empirical
associations without other prior knowledge, particularly since the single coarse-scale variable did
not have enough spatial variability to predict the fine-scale variable. Lastly, even though G5NR
and MERRA-2 provide the same variables over the same region and period of time, they are in-
dependent datasets that do not match on a point-to-point basis due to algorithmic differences [47].
Specifically, the mean of the G5NR 7 km grid values is not exactly equal to its coincident MERRA-
2 50 km coarse grid value.
We applied our ASDM and ASDMTE downscaling approaches to G5NR and MERRA-2 data
for several countries in Southwest Asia (Figure 2.1). ASDM/ASDMTE performances were com-
pared with a deep learning downscaling method, Super Resolution Deep Residual Network (SR-
DRN) and a traditional statistical downscaling methods in the dissever framework including gen-
eralized additive models (GAM), and linear regression model (LM) over the same study domain
and period.
Figure 2.1: Map of the study domain.
2.2 Materials and Methods
2.2.1 Data
2.2.1.1 MERRA-2
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) is
a multi-decadal atmospheric reanalysis product produced by NASA’s Global Modeling and As-
similation Office (GMAO) [55]. Using the Goddard Earth Observing System, version 5 (GEOS-5)
[55], of which the key components are an atmospheric model [56, 57] and Gridpoint Statistical In-
terpolation (GSI) analysis scheme [58, 59], MERRA-2 assimilates AOD from various ground- and
space-based remote sensing platforms [60] and uses an aerosol module to simulate 15 externally
mixed aerosol mass mixing ratio tracers [61].
We used Total Aerosol Extinction AOD 550 nm (AOD) [62]. While the MERRA-2 data are
available from 1980 forward, our study period was 16 May 2000–15 May 2018. MERRA-2 AOD
data have 0.625° longitudinal resolution, 0.5° latitudinal resolution (∼50 km) and daily tempo-
ral resolution.
2.2.1.2 G5NR
GEOS-5 Nature Run (G5NR) is a two-year (16 May 2005–15 May 2007) non-hydrostatic 7 km
global mesoscale simulation also produced by the GEOS-5 atmospheric general circulation model [19].
Its development was motivated by the observing system simulation experiment (OSSE) commu-
nity's need for a high-resolution sequel to the existing European Centre for Medium-Range
Weather Forecasts (ECMWF) Nature Run. Like MERRA-2, G5NR includes 15 aerosol tracers [47]. It simu-
lates its own weather system around the Earth, constrained only by surface boundary con-
ditions for sea-surface temperatures, sea ice, daily volcanic and biomass burning emissions,
and high-resolution inventories of anthropogenic sources [19]. In this study we focused on all
two years of the available G5NR Total Aerosol Extinction AOD 550 nm, which had 0.0625° grid
resolution (∼7 km) and daily temporal resolution.
2.2.1.3 GMTED2010 Elevation
The Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) is a global elevation
model developed by the U.S. Geological Survey and the National Geospatial-Intelligence Agency [63].
The data are available at three separate resolutions (horizontal post spacing) of 30 arc-seconds (∼ 1
km), 15 arc-seconds (∼500 m), and 7.5 arc-seconds (∼250 m) [64]. We used the 30 arc-second
resolution data and spatially averaged them to match the ∼7 km G5NR grid.
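As an illustration of this kind of spatial aggregation, the following is a minimal sketch of block-averaging a fine grid onto a coarser one. The 7-to-1 aggregation factor, array size and random values are placeholders; the actual regridding onto the G5NR grid depends on the exact grid definitions.

```python
import numpy as np

def block_average(grid, factor):
    # Average each `factor` x `factor` block of a fine grid into one coarser cell.
    h, w = grid.shape
    h, w = h - h % factor, w - w % factor                # trim edges without a full block
    blocks = grid[:h, :w].reshape(h // factor, factor, w // factor, factor)
    return blocks.mean(axis=(1, 3))

fine_elevation = np.random.rand(750, 750) * 3000.0      # placeholder ~1 km elevation tile
coarse_elevation = block_average(fine_elevation, 7)     # aggregated toward coarser cells
```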
2.2.2 Downscaling Model
We propose an Artificial Neural Network Sequential Downscaling Method (ASDM) with Transfer
Learning Enhancement (ASDMTE) to generate fine-scale (FS) data from coarse-scale (CS) data.
The method can be formulated as follows:
Let y_{i,j,t} denote the FS AOD referenced at i, j, t, where i ∈ {1, 2, ..., h}, j ∈ {1, 2, ..., w},
t ∈ {1, 2, ..., d}; h and w index latitude and longitude over the study domain and d is the time index.
Similarly, we define the CS AOD referenced at x_{i′,j′,t′}, where i′ ∈ {1, 2, ..., h′}, j′ ∈ {1, 2, ..., w′},
t′ ∈ {1, 2, ..., d′}; h′, w′ and d′ are latitude, longitude and time indices, respectively. Although the
CS data have a longer overall period of temporal coverage, the FS and CS data have the same time
step (day).

The estimated downscaling model f̂ can then be denoted as:

    y_{i,j,t} = f̂(y_{(i,j,t−1),n}, x_{i′,j′,t}, Ele_{i,j}, Lat_i, Lon_j, Day_t),
    y_{(i,j,t−1),n} = (y_{i,j,t−1}, ..., y_{i,j,t−n})                                   (2.1)

where Ele_{i,j}, Lat_i, Lon_j, Day_t are elevation, latitude, longitude and day of the year at i, j, t, respec-
tively; x_{i′,j′,t} represents the CS AOD that spatially covers y_{i,j,t} (at the same time t); y_{(i,j,t−1),n} is a list
of n temporal lagging variables at location i, j.
Through f̂, we not only learned empirical associations between the CS and FS variables, x_{i′,j′,t}
and y_{i,j,t}, but also short-term temporal associations within the FS data by including n = 25 time
lags of the fine-scale variables, y_{(i,j,t−1),n}. In the model we also adjusted for location (latitude,
Lat_i, and longitude, Lon_j), long-term time (day of the year, Day_t), and geographic information
(elevation, Ele_{i,j}), making f̂ a function of space and time. This also enabled the use of data at
different locations and times to train our model, which provided more information for training and
partially alleviated the issue of having limited overlapping (in time) data. The larger the spatial
area and temporal range, the more data we had for training; however, at the same time, the model
f̂ became more complex. This increasing complexity in the target model is equivalent to adding
difficulty to the learning process, thus we made the decision to trade off between data availability
and model complexity.
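To make the inputs of f̂ in Equation (2.1) concrete, the following is a minimal sketch of how one training sample could be assembled. The array layouts, function and variable names, and index conventions are assumptions made here for illustration, not the preprocessing code actually used in this work.

```python
import numpy as np

def make_sample(fs_aod, cs_aod, elev, lat, lon, day_of_year, i, j, i2, j2, t, n_lags=25):
    """Assemble (Input I, Input II, target) for fine-scale cell (i, j) on day t.

    fs_aod   : fine-scale AOD, shape (days, H, W)
    cs_aod   : coarse-scale AOD, shape (days, H', W'); cell (i2, j2) covers (i, j)
    elev     : fine-scale elevation, shape (H, W)
    lat, lon : fine-scale coordinate vectors, lengths H and W
    Requires t >= n_lags so that all lags exist.
    """
    input_I = np.array([cs_aod[t, i2, j2],       # covering coarse-scale AOD x_{i',j',t}
                        elev[i, j],              # elevation Ele_{i,j}
                        lat[i], lon[j],          # location Lat_i, Lon_j
                        day_of_year[t]])         # day of the year Day_t
    input_II = fs_aod[t - n_lags:t, i, j][::-1]  # lags y_{i,j,t-1}, ..., y_{i,j,t-n}
    target = fs_aod[t, i, j]                     # fine-scale AOD y_{i,j,t}
    return input_I, input_II, target
```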
To enhance the performance of \hat{f}, we incorporated transfer learning [65] into ASDM. Machine
learning methods traditionally solve isolated tasks from scratch, which makes them data-hungry.
Transfer learning attempts to solve this problem by developing methods that transfer knowledge
learned from other sources and use it to improve the learning performance in a related target task [66].
The formal definition of transfer learning can be expressed as [65]:
Definition 2.2.1 (Transfer Learning) Given a source domain \mathcal{D}_S and learning task \mathcal{T}_S, and a target
domain \mathcal{D}_T and learning task \mathcal{T}_T, transfer learning aims to help improve the learning of the target
predictive function h_T(\cdot) in \mathcal{D}_T using the knowledge in \mathcal{D}_S and \mathcal{T}_S, where \mathcal{D}_S \neq \mathcal{D}_T or \mathcal{T}_S \neq \mathcal{T}_T.
Transfer learning allows us to learn certain patterns within one dataset that can be applied to
another. Since coarse-scale data are usually cheaper to obtain and more available, we can use
inherent knowledge learned within them to improve the predictive performance of \hat{f}. Thus, to
make use of the spatiotemporal associations within the CS data, a transfer model was trained on
CS data to learn the inherent mapping function \hat{g} and, consequently, the model \hat{g} was transferred
into the ASDM/ASDMTE. The transfer integration of the ASDMTE network structure is shown in
Figure 2.2. The learned inherent function \hat{g} can be denoted as:
x_{i',j',t'} = \hat{g}\left( \mathbf{x}_{(i',j',t'-1),n} \right)
\mathbf{x}_{(i',j',t'-1),n} = x_{i',j',t'-1}, \cdots, x_{i',j',t'-n}.        (2.2)
2.2.2.1 ASDM/ASDMTE Network Structure
Given its ability to fit non-linear functions, we used an artificial neural network to model \hat{f}; the
overall network structure of ASDM/ASDMTE is shown in Figure 2.2.
Figure 2.2: Overall Neural Network structure of ASDM/ASDMTE. The notation LSTM:8s represents
an LSTM layer with 8 nodes and return sequence. The notation Building Block:8 represents
a building block with 8 nodes. The light yellow blocks use a dropout layer with a dropout
rate of 0.5. The Transfer Block is only used in ASDMTE and is therefore connected with dashed lines.
For model fitting, longitude, latitude, day of the year and elevation were normalized to the range
[0,1]. The CS and FS AOD variables, X and Y, have a natural range of [0,6], which is approximately the
same scale as [0,1], and thus they were kept on their original scale. 'Input I' used all available
features except the lag variables, i.e. X_{i',j',t}, Ele_{i,j}, Lat_i, Lon_j, Day_t, and was processed by 'Process
Block I'. 'Input II' was composed of the 25 FS lags \mathbf{y}_{(i,j,t-1),25} and went through 'Temporal Block
I' and 'Temporal Block II' in ASDM. If using transfer learning enhancement (ASDMTE), 'Input II'
was also processed by the 'Transfer Block'. All outputs from 'Process Block I', 'Temporal Block
I', 'Temporal Block II' and/or 'Transfer Block' were combined and then processed by 'Process
Block II'.
Long Short-Term Memory (LSTM) [67] was used to model the within-scale temporal associations.
The building block of ASDM/ASDMTE was composed of a fully connected (FC) layer,
a batch normalization layer, and an optional dropout layer. Leaky ReLU [68] was used as the non-linear
activation function of the FC layer to prevent dead neurons and can be expressed as:

LeakyReLU(x) =
    x,        if x > 0
    \alpha x,  otherwise,

where we chose \alpha = 0.1. The batch normalization layer was used to stabilize the learning process
and reduce the training time [69, 70]. Dropout layers with rate 0.5 were used as regularization
to prevent overfitting [71, 72], but the dropout layer was applied only in selected building blocks,
marked in yellow in Figure 2.2. The loss function of this model was the Mean Squared Error (MSE),
which can be expressed as:
MSE = \frac{1}{n} \sum_{i=1}^{n} \left( Y_{i,j,t} - \hat{Y}_{i,j,t} \right)^2 .        (2.3)
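For illustration, the building block described above can be sketched in Keras as follows. The layer widths, the single toy input branch and the optimizer choice are illustrative assumptions rather than the exact configuration of every block in Figure 2.2.

import tensorflow as tf
from tensorflow.keras import layers

def building_block(x, units, dropout=False, alpha=0.1, rate=0.5):
    # fully connected layer + batch normalization + Leaky ReLU, with optional dropout
    x = layers.Dense(units)(x)
    x = layers.BatchNormalization()(x)
    x = layers.LeakyReLU(alpha=alpha)(x)
    if dropout:
        x = layers.Dropout(rate)(x)
    return x

# toy example: two stacked blocks trained with the MSE loss of Equation (2.3)
inputs = tf.keras.Input(shape=(5,))        # e.g. X_{i',j',t}, Ele, Lat, Lon, Day
h = building_block(inputs, 8)
h = building_block(h, 8, dropout=True)     # a "yellow" block with dropout
outputs = layers.Dense(1)(h)
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")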
2.2.2.2 Transferred Model
The transferred model was trained on CS data (MERRA-2), resulting in the learned function \hat{g}
(Equation 2.2). Its network structure is shown in Figure 2.3.
Figure 2.3: Neural Network structure of the transferred model. The notation LSTM:8s represents
an LSTM layer with 8 nodes and return sequence. The notation Building Block:8 represents a
building block with 8 nodes. The light yellow block represents the dropout layer with dropout rate
0.5.
The transferred model captured the within-scale association in the CS data and carried this spatiotemporal
knowledge into the ASDM to enhance its performance. The neural network used to
learn \hat{g} was composed of the same building block and a similar structure as ASDM/ASDMTE. We
used mean squared error (MSE) as the loss function, and to prevent overfitting, dropout layers and
early stopping were applied. We randomly chose 10% of the available days as the validation
set for early stopping. The 'Transferred Model' was integrated directly as part of the ASDMTE network
by setting it to untrainable (i.e., it was not updated during training).
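In Keras-style code, freezing the transferred model amounts to marking it untrainable before it is wired into the ASDMTE graph; the file name and input shape below are hypothetical placeholders, not the actual artifacts of our pipeline.

import tensorflow as tf

# load the transfer model g_hat previously fitted on coarse-scale MERRA-2 lags (hypothetical path)
transfer_model = tf.keras.models.load_model("transfer_model_ghat.h5")
transfer_model.trainable = False           # frozen: weights are not updated while training ASDMTE

# 'Input II': the 25 fine-scale temporal lags, passed through the frozen Transfer Block
input_lags = tf.keras.Input(shape=(25, 1))
transfer_features = transfer_model(input_lags, training=False)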
2.2.2.3 Training Strategy
There is always a trade-off between model complexity and data size. The larger the spatial and temporal
coverage of the data used for training, the more complex the target function f becomes. As this
makes it more difficult to learn, we simplified the learning task by spatially and temporally splitting
the data while maintaining a reasonable data size, and fitting separate models on each of the
subsets. Spatially, the data were grouped into four regions: 1. Afghanistan; 2. United Arab Emirates
and Qatar; 3. Saudi Arabia; and 4. Iraq and Kuwait. Temporally, the data were divided
approximately equally into four seasons of 91, 91, 91 and 92 days, respectively. In order
to produce temporally continuous downscaled predictions, a 45-day overlap was added to each
season as shown in Figure 2.4.
Figure 2.4: Temporal simplification and splitting for forward and backward prediction.
The model in Equation (2.1) illustrates prediction in the forward temporal direction; that
is, predicting the future from historical observations. We also trained a backward prediction model
with a slight variation of the same model format, using future observations to predict historical
data (Figure 2.4). Training this way allowed downscaling in both directions, forward and backward
in time, which was needed for our application where we aimed to downscale before and after the
2-year training period. Consequently, 32 models (4 regions × 4 seasons × 2 directions) were fitted
on all combinations of region, season and direction. Within each subset, the data were
composed of the same season from two years (2005 and 2006), as shown in Figure 2.4. The two
years of data were evenly divided into 10 parts and the last 10% of the data were used as the test set.
The validation set was the fourth 10% of the data. The remaining 80% was used as the training set.
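This temporal split within one region-season subset can be sketched as follows (a small helper with a hypothetical name; indices are day positions within the subset).

import numpy as np

def split_days(n_days):
    # split the day indices into 10 equal folds:
    # fold 3 (the fourth 10%) -> validation, fold 9 (the last 10%) -> test, the rest -> training
    folds = np.array_split(np.arange(n_days), 10)
    val_idx = folds[3]
    test_idx = folds[9]
    train_idx = np.concatenate([f for k, f in enumerate(folds) if k not in (3, 9)])
    return train_idx, val_idx, test_idx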
2.2.2.4 Evaluation
The downscaling results in the same direction and time were combined spatially as whole images
for evaluation purposes. The main evaluation metrics were image-wise R^2 [10] and Root Mean
Square Error (RMSE), which are defined as follows:

R^2_t = 1 - \frac{ \sum_{i=1}^{h} \sum_{j=1}^{w} \left( y_{i,j,t} - \hat{y}_{i,j,t} \right)^2 }{ \sum_{i=1}^{h} \sum_{j=1}^{w} \left( y_{i,j,t} - \bar{y}_t \right)^2 }

RMSE_t = \sqrt{ \frac{1}{hw} \sum_{i=1}^{h} \sum_{j=1}^{w} \left( y_{i,j,t} - \hat{y}_{i,j,t} \right)^2 },        (2.4)
where \hat{y}_{i,j,t} is the downscaled AOD value at (i, j, t) and y_{i,j,t} is the corresponding true value. The downscaled
results of ASDM, ASDM with transfer enhancement (ASDMTE), SRDRN, and the dissever framework
with GAM and LM as regressors were compared on the same test sets with the above metrics.
The structure of SRDRN can be found in Wang et al. (2021) [8].
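Both image-wise metrics can be computed directly from the predicted and true images for a given day; a minimal numpy sketch (the function names are ours) is shown below.

import numpy as np

def image_r2(y_true, y_pred):
    # image-wise R^2 of Equation (2.4); y_true and y_pred are (h, w) arrays for one day t
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def image_rmse(y_true, y_pred):
    # image-wise RMSE of Equation (2.4)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))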
2.3 Results
Same-day images of the 7 km G5NR and 50 km MERRA-2 data are shown in Figure 2.5.
We note similarities in their spatial trends, with higher values in arid regions of southeast Saudi
Arabia and the United Arab Emirates (UAE), but greater definition in the fine-scale G5NR image
that is particularly clear over Afghanistan. The bottom-left and bottom-right plots of Figure 2.5
show the mean image-wise R^2 and RMSE (respectively) of the G5NR and MERRA-2 AOD data at
different lags. Both G5NR and MERRA-2 show similar temporal associations: the further
apart in time two images are, the less they are associated, indicated by lower image-wise R^2 and higher
RMSE. These similar inherent temporal associations of G5NR and MERRA-2 provided a good
foundation for ASDM to assume that local-scale AOD can be predicted not only by between-scale
associations, but also by inherent within-scale associations. In addition, due to differences in the generative
algorithms of the G5NR and MERRA-2 AOD data, G5NR AOD has a universally
higher mean value and standard deviation (0.316 (0.258)) compared to MERRA-2 AOD (0.294
(0.197)), which is the reason G5NR had higher lagged RMSE and R^2 (Figure 2.5).
Figure 2.5: Sample images from G5NR (top-left) and MERRA-2 (top-right) on 29 July 2006;
temporal trend of image-wise RMSE (bottom-left) and R^2 (bottom-right) with different lags (days).
Model performance results comparing ASDM and ASDMTE against SRDRN, GAM and LM
are shown in Appendix Figures 2.7–2.10. Both ASDM and ASDMTE outperformed the other methods,
as indicated by higher image-wise R^2 and lower RMSE across all seasons and directions.
Among all of the test sets, ASDMTE had an average maximum image-wise R^2 = 0.758 and an average
mean image-wise R^2 = 0.443. The ASDM performed similarly, with an average maximum image-wise
R^2 = 0.735 and an average mean image-wise R^2 = 0.431. The SRDRN, GAM and LM methods had
average maximum image-wise R^2 = 0.313, 0.106 and 0.095, respectively. Notably, the downscaled
AOD maps generated by ASDMTE and ASDM on 29 July 2006 (Figure 2.6a,b) preserved very similar
spatial characteristics to the true G5NR data in Figure 2.6f, while the SRDRN and dissever-based
downscaling results (see Figure 2.6c,d,e) exhibit clearly different patterns.
Figure 2.6: Downscaled (by method) and G5NR data over the study region on 29 July 2006: a)
ASDMTE, b) ASDM, c) SRDRN, d) dissever GAM, e) dissever LM, f) G5NR.
2.4 Discussion
In this study we developed an Artificial Neural Network Sequential Downscaling Method (ASDM)
with Transfer Learning Enhancement (ASDMTE) that enabled coarse-scale AOD data (∼50 km)
to be downscaled to a finer scale (∼7 km) where training occurred only on a limited sample of
temporally overlapping images. The ASDM/ASDMTE approach took point-wise inputs of lagged
fine-scale AOD data, coarse-scale AOD data, latitude, longitude, time and elevation to predict the
fine-scale AOD generated from G5NR. We found that this neural network approach was able to
learn complex relationships and produce reliable predictions. Based on the comparison of image-wise
R^2 and RMSE shown in Appendix Figures 2.7–2.10 and Table 2.1, ASDM/ASDMTE showed
superior downscaling performance that outperformed the CNN-based neural network, SRDRN,
and the statistical downscaling approaches in dissever (GAM, LM).
Table 2.1: Image-wise R^2 and RMSE of Downscaling Results. R^2 is presented as Max (Mean); RMSE is presented as Mean (SD). The "Mean" column is the overall value; the remaining columns give the forward and backward directions, Seasons 1-4 in order.

Method        Metric  Mean           Forward S1     Forward S2     Forward S3     Forward S4     Backward S1    Backward S2    Backward S3    Backward S4
ASDMTE        R^2     0.758 (0.443)  0.857 (0.593)  0.831 (0.381)  0.728 (0.396)  0.628 (0.174)  0.595 (0.360)  0.851 (0.496)  0.802 (0.653)  0.770 (0.488)
ASDMTE        RMSE    0.067 (0.021)  0.051 (0.010)  0.061 (0.013)  0.074 (0.017)  0.058 (0.014)  0.088 (0.018)  0.069 (0.014)  0.043 (0.005)  0.094 (0.075)
ASDM          R^2     0.735 (0.431)  0.890 (0.656)  0.810 (0.371)  0.642 (0.394)  0.588 (0.185)  0.576 (0.313)  0.851 (0.415)  0.790 (0.616)  0.732 (0.494)
ASDM          RMSE    0.068 (0.020)  0.045 (0.008)  0.062 (0.013)  0.077 (0.012)  0.057 (0.012)  0.089 (0.016)  0.064 (0.010)  0.045 (0.007)  0.106 (0.078)
SRDRN         R^2     0.313 (0.088)  0.425 (0.177)  0.198 (0.067)  0.268 (0.063)  0.422 (0.123)  0.239 (0.075)  0.211 (0.040)  0.412 (0.094)  0.332 (0.067)
SRDRN         RMSE    0.088 (0.083)  0.177 (0.131)  0.067 (0.060)  0.063 (0.060)  0.123 (0.108)  0.075 (0.067)  0.040 (0.046)  0.094 (0.098)  0.067 (0.098)
dissever GAM  R^2     0.106 (0.046)  0.199 (0.155)  0.139 (0.055)  0.056 (0.015)  0.040 (0.013)  0.079 (0.018)  0.070 (0.009)  0.143 (0.068)  0.124 (0.038)
dissever GAM  RMSE    0.213 (0.039)  0.359 (0.058)  0.130 (0.012)  0.161 (0.030)  0.280 (0.055)  0.172 (0.052)  0.131 (0.014)  0.293 (0.045)  0.181 (0.044)
dissever LM   R^2     0.095 (0.040)  0.173 (0.133)  0.108 (0.047)  0.067 (0.015)  0.031 (0.013)  0.087 (0.017)  0.062 (0.008)  0.121 (0.048)  0.113 (0.037)
dissever LM   RMSE    0.214 (0.039)  0.362 (0.059)  0.130 (0.012)  0.161 (0.031)  0.279 (0.055)  0.170 (0.051)  0.131 (0.013)  0.295 (0.045)  0.181 (0.044)
Statistical downscaling has a long history, rooted in the demand to generate local-scale
climate information from GCMs at low computational cost. Traditional statistical approaches
focus on establishing empirical associations between coarse-scale and fine-scale variables [1, 21].
For instance, Loew et al. (2008) modeled the associations between soil moisture at 40 km resolution
and its corresponding fine-scale (1 km) observations using linear regression [37]. Leveraging
temporal replicates, they fit separate linear regression models independently to each fine-scale
grid cell, ignoring spatial and temporal associations in either the fine- or coarse-scale data. More recently,
deep learning approaches have been used that address spatial features, such as Wang et al. (2021)
[8], who developed a CNN-based method, the Super Resolution Deep Residual Network (SRDRN),
to downscale precipitation and temperature from coarse resolutions (25, 50 and 100 km) to a fine
resolution (4 km) by learning the between-scale image-to-image mapping function. However, they
ignored the temporal associations between images.
Current downscaling methods focus only on modeling between-scale relationships and ignore
any inherent temporal associations in the data. As observed in Figure 2.5, there are inherent within-scale
temporal associations in the fine- and coarse-scale data, where, at the same location, temporally
near observations tend to be correlated with each other. These associations provided essential support
for downscaling and resulted in better fine-scale predictions. Essentially, the target fine-scale variable
can be estimated from the coarse-scale variable as well as its own temporal lags, adjusting
for geographic features, location and time.
By defining the downscaling problem as above, the ASDM/ASDMTE approach was able to
take advantage of both the within-scale temporal associations in the fine-scale data and the between-scale
spatial associations, which gave the neural network more information to learn from than the
between-scale spatial relationships alone. This richness in predictive
information is especially important in a situation where data are limited, since it enables the
model to be trained on a short period of overlapping data without requiring point-to-point matching
of the fine- and coarse-scale images.
This setting also enabled the use of transfer learning (through ASDMTE) by leveraging the
within-scale temporal associations in the coarse-scale data, which had a much longer time series.
Typically in downscaling, only the temporally overlapping coarse- and fine-scale data can be used
for modeling. However, in our case we wanted to downscale a longer time series, and we were
able to use transfer learning to learn from all (2000–2018) coarse-scale MERRA-2 data by training
\hat{g} and transferring it to enhance the downscaling model.
ASDM/ASDMTE suffers from the same assumption of stationarity as other downscaling methods;
that is, it assumes the statistical association between coarse- and fine-scale data does not change
outside of the model training time [73, 74]. In addition, we may need to further assume stationarity
of the within-scale temporal associations (i.e., temporal lags) used in the model.
Another concern of ASDM/ASDMTE is its test robustness. To stabilize \hat{f} at test time, we
trained different ASDM/ASDMTE models for each season of a year and separately for different
regions/countries, as shown in Section 2.2.2.3. The shorter period of time and smaller target domain
simplified the learning task of each model and, at the same time, simplified the domain over
which the model needed to generalize, so we obtained more robust results when testing.
In addition, ASDM/ASDMTE was designed to solve a supervised downscaling problem, that
is, to downscale coarse-scale data and validate against fine-scale data. It requires the presence
of some fine-scale data, whose temporal range ASDM/ASDMTE can then extend in a computationally
efficient way by utilizing the within-scale temporal associations. In the absence of fine-scale
data, ASDM/ASDMTE cannot be applied.
A further research direction would be to stabilize the sequential downscaling performance when
only a short temporal range of fine-scale data is available but predictions are needed over a long
time series. As shown in Appendix Figures 2.7–2.10, ASDM/ASDMTE can achieve good downscaling
performance, and can even recover from previous poor downscaled results,
but the performance still shows a temporally decreasing trend. Our future research will focus on
improved learning of stable temporal associations to improve sequential downscaling performance
for long time series prediction.
2.5 Appendix: Supplemental Figures
Figure 2.7: Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dissever
LM in Season 1. Please refer to Figure 2.4 and Section 2.2.2.3 for the definition of Season 1.
Figure 2.8: Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dissever
LM in Season 2. Please refer to Figure 2.4 and Section 2.2.2.3 for the definition of Season 2.
Figure 2.9: Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dissever
LM in Season 3. Please refer to Figure 2.4 and Section 2.2.2.3 for the definition of Season 3.
Figure 2.10: Downscaling performance of ASDM, ASDMTE, SRDRN, dissever GAM and dis-
sever LM in Season 4. Please refer to Figure 2.4 and Section 2.2.2.3 for the definition of Season 4.
Figure 2.11: Downscaled (by method) and G5NR data over the study region on 23 October 2006:
a) ASDMTE, b) ASDM, c) SRDRN, d) dissever GAM, e) dissever LM, f) G5NR.
Chapter 3
Downscaling with Variational Neural Network
Abstract
Dust-related air pollution is an important health concern that affects the growing populations of
Southwest Asia, a highly arid region of the world. However, understanding the impact of dust
exposures is difficult as there is a limited amount of air quality monitoring over this area. Out-
put from General Circulation Models (GCM) offers important information regarding many atmo-
spheric properties, including dust, but the coarse resolution of these global products limits the
ability to understand local processes that affect human health.
Statistical downscaling is a computationally efficient approach to generate high-resolution data
based on low-resolution GCM output. We first proposed a deep latent model, called the Variational
Neural Network (VNN), to solve a regression problem by capturing its latent distribution. We then
utilized the VNN to build a Variational Downscaling Method (VDM) to downscale low-resolution (∼50 km)
Dust Aerosol Optical Thickness (AOT) data from the Modern-Era Retrospective analysis
for Research and Applications, Version 2 (MERRA-2), with high-resolution (∼7 km) Dust AOT
data from the Goddard Earth Observing System Model, Version 5 (GEOS-5) Nature Run (G5NR)
as ground truth for validation. We demonstrate the downscaling capability of VDM in both in-
and out-data downscaling and compare the in-data downscaling performance against our previous
Artificial Neural Network Sequential Downscaling Method with Transfer Learning Enhancement
(ASDMTE).
Similar to ASDMTE, VDM utilizes empirical between-scale associations and accounts for
within-scale temporal associations in the high-resolution data. In addition, VDM improves on ASDMTE
in three aspects. First, we developed and applied a generative model, the Variational
Neural Network (VNN), to capture the underlying random process, which enables VDM to synthesize
high-resolution data over a longer time range without losing variability. Moreover, we utilized
local neighborhood information, in addition to between- and within-scale associations, to better
capture the underlying distribution and to improve the predictive performance. Lastly, VDM
uses a more strictly constrained output activation function and normalization to ensure numerical
stability during long-term downscaling. Across the test sets, VDM produces a more similar
spatial auto-correlation structure and a better average image-wise RMSE of 0.0138, compared with
0.1456 for ASDMTE. VDM is able to produce stable long-term downscaled data (365 days
after G5NR ends), which show reasonable spatiotemporal structure.
3.1 Introduction
Air pollution in Southwest Asia is a global concern. Global dust emissions are in the range of
1000–2000 Tg/year [75], of which an estimated 500 Tg/year (∼30% of the global dust budget)
originates from Southwest Asia [46]. The main sources of air pollution in this region are
natural windblown mineral dust [76] as well as increasing amounts of anthropogenic particulate
matter (PM) produced by rapid economic development, industrialization and urbanization [77].
Given the immense amount of dust emissions, Southwest Asia is one of the most polluted
regions of the world.
Dust-related air pollution is therefore becoming a pressing health concern that affects the
population of Southwest Asia. Studies assessing the impact of air quality on health are needed to
provide a better understanding of the association between airborne exposure to dust and health
risks. Unfortunately, there is a limited amount of available air quality data for Southwest Asia.
Historically, air quality and climate data have not been collected, likely due to resource limitations.
Furthermore, the extreme hot and dry climate in Southwest Asia makes the measurement of
air quality very difficult, since the equipment typically cannot handle the particle load experienced
during dust storms.
With the development of computational power and data assimilation techniques, Atmospheric
General Circulation Models (AGCMs) can simulate the complex physical processes and fluid dynamics
of the atmosphere [78], providing information on air pollutants such as aerosols. Currently,
the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), produced
with the Goddard Earth Observing System-5 (GEOS-5) [57], is an AGCM product that is intensively
utilized in air pollution research because it provides globally gridded aerosol data from 1980 onward. In
recent years its use in Southwest Asia is notable: Shaheen et al. (2021) analyzed winter Aerosol
Optical Depth (AOD) trends over the Eastern Mediterranean and Middle East (EMME) based on
MERRA-2 and the Moderate-Resolution Imaging Spectroradiometer (MODIS) [79], and Ukhov et
al. (2020) applied MERRA-2 AOD to evaluate natural and anthropogenic air pollution over the
Middle East [46].
The coarse spatial resolution of MERRA-2 data (∼50 km) limits its usefulness in local-scale
research. The Goddard Earth Observing System Model, Version 5 (GEOS-5) Nature Run (G5NR)
[47] is a global mesoscale simulation, produced with the same GEOS-5 model as MERRA-2, at a much
finer (∼7 km) resolution. Given the computational cost of the G5NR simulation, its temporal range is
restricted to just two years (2005–2007). This clearly limits its potential to be applied in downstream
environmental research.
Statistical downscaling is a computationally efficient approach to generate high-resolution data
based on low-resolution data. It was primarily developed to generate finer-spatial-scale climate
information from General Circulation Model (GCM) output with coarse spatial resolution [48,
80]. The fundamental view of statistical downscaling is that the fine (smaller) scale variable is
conditioned by a coarse (larger) scale variable and local features, like topography and land-sea distribution
[21], which means we can build empirical associations between fine-scale (predictand)
and coarse-scale (predictor) variables and apply them to downscale low-resolution data. Compared to
dynamical downscaling techniques, it does not require Limited-Area Models (LAMs) or Regional
Climate Models (RCMs), and thus it is more widely utilized in climate research [5]. Statistical
downscaling techniques have also been applied to remote sensing data [6, 10, 13, 49, 50]. In this
application, statistical downscaling enables us to take advantage of both the long temporal range
of the MERRA-2 data and the high spatial resolution of the G5NR product at reasonable computational
cost.
Deep learning [51, 81] has achieved considerable performance improvements over the past decades
in numerous fields, such as image recognition [82, 83], speech recognition [84] and text generation
[85]. It has also recently been used in a variety of remote sensing data applications [52]. Vu
et al. (2016) downscaled GCMs during the rainy season at meteorological site locations
in Bangkok, Thailand [86]. Chaudhuri and Robertson (2020) developed a Generative
Adversarial Network (GAN)-based model, called CliGAN, to downscale annual maximum precipitation
produced by the Coupled Model Inter-comparison Project 6 (CMIP6) [87]. Neural network
models with a structure similar to an auto-encoder, along with skip connections and parameter sharing,
have also been used to downscale by incorporating an iterative training strategy to force spatial
value consistency [9]. Misra et al. (2018) utilized Long Short-Term Memory (LSTM) to capture
spatio-temporal dependencies in local rainfall and downscale precipitation in Mahanadi, India
and Campbell River, Canada [88].
In our previous work, the Artificial Neural Network Sequential Downscaling Method (ASDM)
with Transfer Learning Enhancement (ASDMTE) [89], we utilized a deep neural network along with
a transfer learning technique to capture both between- and within-scale associations and achieved
satisfying spatial downscaling performance (Figure 2.6 and Table 2.1). However, the Artificial
Neural Network (ANN) in ASDM/ASDMTE is not able to model the underlying random processes
that produce the data. Even though an ANN can perform as a strong function approximator, it is
inherently a deterministic model, which means it does not provide randomness in the downscaled
predictions. In short-term downscaling, ASDM/ASDMTE can perform reasonably well; however,
when sequentially downscaling a long time series of data, the variance will gradually shrink, which
restricts the long-term downscaling ability of ASDM/ASDMTE.
To improve ASDM/ASDMTE, we aim to capture the conditional underlying latent distribution
with a deep generative model, rather than only approximating the complex non-linear associations
in the data. The Variational Auto-Encoder (VAE) [90, 91] is a deep latent model that has garnered
intensive research interest as it has achieved significant performance in many fields, such as computer
vision [92, 93], Natural Language Processing (NLP) [94, 95] and biology [96]. With the
help of Variational Inference (VI) [97], it converts a latent distribution approximation problem into an
optimization problem that can take advantage of back-propagation.
We first extend the VAE to a more general deep latent model, called the Variational Neural Network
(VNN), enabling it to solve a regression problem, and then utilize the VNN to build the Variational
Downscaling Method (VDM) to downscale low-resolution (∼50 km) dust Aerosol Optical Thickness
(AOT) data from the Modern-Era Retrospective analysis for Research and Applications, Version
2 (MERRA-2), a satellite-based reanalysis generated by NASA's Global Modeling and Assimilation
Office (GMAO). The target downscaled resolution is ∼7 km, with Dust AOT from the Goddard
Earth Observing System Model, Version 5 (GEOS-5) Nature Run (G5NR) as ground truth for
validation. G5NR is an essential data source that can provide local air quality information for
health-related studies, but its short time range (2005–2007) restricts its potential for downstream
research. In contrast, MERRA-2 has a long temporal range (1980–present), but its coarse resolution
prevents it from supporting local-scale research. VDM can downscale dust AOT data from
MERRA-2 to a higher resolution, using dust AOT data from G5NR for validation. The downscaled
data can benefit many downstream studies with its long temporal range and high resolution.
We applied VDM to G5NR and MERRA-2 dust AOT data for several countries in Southwest
Asia (Figure 3.2). The in-data test performance of VDM is compared with our previous work,
ASDMTE, over the same period and area. In addition, we sequentially downscaled 1 year (365
days) of MERRA-2 dust AOT data after 5/15/2007, when G5NR ends, and evaluated and compared
the spatial and temporal auto-correlation structure of the downscaled data (2007–2008) with G5NR
data from 2006 to 2007.
3.2 Materials and Methods
This section is divided into three parts. Section 3.2.1 introduces the Variational Neural Network (VNN),
which extends the Variational Auto-Encoder to a regression setting. Section 3.2.2 presents the Variational
Downscaling Method (VDM), a novel generative downscaling method we developed based
on the VNN to solve a supervised downscaling problem, including data source information, study domain,
model structure and training strategy. Section 3.2.3 provides details of in-data and out-data
downscaling with VDM and their evaluations.
3.2.1 Variational Neural Network
We propose a novel deep generative model, the Variational Neural Network (VNN). The VNN is a deep
latent model similar to the Variational Auto-Encoder (VAE) [91], and it aims to solve a regression problem
by modeling the underlying distribution that generates the data. To illustrate the VNN, let us start by
defining the necessary notation.
We denote by x the vector of all observed independent variables; lower-case bold
letters, e.g. x, represent random vectors. We use y to denote the set of all observed
dependent variables corresponding to x.
We assume the dependent variable y is sampled from an unknown process that relies on the
independent variable x. We can denote the distribution of this underlying process as \tilde{p}(y|x), such
that:

y \sim \tilde{p}(y|x)        (3.1)

In most cases, this true distribution \tilde{p}(y|x) is unknown and we need to learn a model p_\theta(y|x)
with parameter set \theta to approximate the underlying distribution, such that:

p_\theta(y|x) \approx \tilde{p}(y|x)        (3.2)
We use a neural network to parameterize p_\theta(y|x), which enables our model to be sufficiently flexible
to approximate \tilde{p}(y|x). Learning models with neural networks that have a multiple-layer structure
is called deep learning [51]. It also allows the model to be computationally efficient by taking advantage
of stochastic gradient-based optimization techniques, which enable us to train large models
on large datasets.
We can further assume there exist unobserved latent variables in the underlying process, and we
use z to denote the latent variable vector. This gives us a latent variable model p(y,z|x), which
can be factorized as:

p(y,z|x) = p(z) \, p(y|z,x)        (3.3)
A maximum likelihood learning approach is applied in the VNN to select a better model; however,
the marginal probability of the observed data is intractable. We can express the marginal distribution
as:

p(y|x) = \int p(y,z|x) \, dz        (3.4)

which does not have a closed-form solution or estimator. This intractability means the marginal distribution
cannot be evaluated and differentiated directly, so we cannot use an optimization-based learning approach
to effectively train our model.
To transform this intractable posterior into a solvable problem, we included a Sub-Recognition
Model (SRM) q_\phi(z|x) with parameter set \phi. By optimizing this Sub-Recognition Model, we would
like to force it to approximate the conditional latent distribution, such that:

q_\phi(z|x) \approx p(z|x,y)        (3.5)
Having the Sub-Recognition Model allows us to start deriving the marginal posterior. The log-likelihood
of our latent model can be expressed as:

log p_\theta(y|x) = E_{q_\phi(z|x)} \left[ log \, p_\theta(y|x) \right]
                = E_{q_\phi(z|x)} \left[ log \frac{ p_\theta(y,z|x) }{ p_\theta(z|x,y) } \right]
                = E_{q_\phi(z|x)} \left[ log \frac{ p_\theta(y,z|x) \, q_\phi(z|x) }{ q_\phi(z|x) \, p_\theta(z|x,y) } \right]        (3.6)
After reorganizing, the log-likelihood is:

log p_\theta(y|x) = E_{q_\phi(z|x)} \left[ log \frac{ q_\phi(z|x) }{ p_\theta(z|x,y) } \right] + E_{q_\phi(z|x)} \left[ log \frac{ p_\theta(y,z|x) }{ q_\phi(z|x) } \right]        (3.7)
The first term on the right-hand side of Equation 3.7 is the Kullback-Leibler (KL) divergence between
the Sub-Recognition Model q_\phi(z|x) and the true conditional latent distribution p_\theta(z|x,y), which we denote
as KL( q_\phi(z|x) || p_\theta(z|x,y) ). This is a non-negative term that measures the difference
between the two distributions, thus we have:

KL( q_\phi(z|x) || p_\theta(z|x,y) ) \geq 0        (3.8)
This non-negativity of KL( q_\phi(z|x) || p_\theta(z|x,y) ) immediately leads to:

E_{q_\phi(z|x)} \left[ log \frac{ p_\theta(y,z|x) }{ q_\phi(z|x) } \right] = log p_\theta(y|x) - KL( q_\phi(z|x) || p_\theta(z|x,y) )
                                              \leq log p_\theta(y|x)        (3.9)
which makes E_{q_\phi(z|x)} [ log \, p_\theta(y,z|x) / q_\phi(z|x) ] a lower bound of the log-likelihood. This term is therefore called
the Evidence Lower Bound (ELBO), which we denote as:

ELBO_{\theta,\phi}(y|x) = E_{q_\phi(z|x)} \left[ log \frac{ p_\theta(y,z|x) }{ q_\phi(z|x) } \right]        (3.10)
For better illustration, we can now rewrite Equation 3.7 as:

log p_\theta(y|x) = KL( q_\phi(z|x) || p_\theta(z|x,y) ) + ELBO_{\theta,\phi}(y|x)        (3.11)
In an ordinary maximum-likelihood approach, we usually maximize the log-likelihood log p_\theta(y|x)
directly to select the model that is most likely given the observed data. However, we found this to be
a difficult optimization problem in our study, since to maximize the log-likelihood we would have to
maximize both the ELBO term and the non-negative KL divergence term simultaneously.

ELBO_{\theta,\phi}(y|x) = log p_\theta(y|x) - KL( q_\phi(z|x) || p_\theta(z|x,y) )        (3.12)
By rearranging Equation 3.11 into Equation 3.12, we can view the optimization from another point of
view: the non-negative KL( q_\phi(z|x) || p_\theta(z|x,y) ) is a gap between the log-likelihood log p_\theta(y|x)
and ELBO_{\theta,\phi}(y|x). Maximizing ELBO_{\theta,\phi}(y|x) is therefore equivalent to doing one of the following:
1. maximizing the log-likelihood log p_\theta(y|x), which improves our model overall and helps
us select a model that is more likely to produce the observed data;
2. minimizing the gap KL( q_\phi(z|x) || p_\theta(z|x,y) ), which helps the Sub-Recognition Model
better approximate the true latent distribution;
3. both maximizing log p_\theta(y|x) and minimizing KL( q_\phi(z|x) || p_\theta(z|x,y) ).
Therefore, -ELBO_{\theta,\phi}(y|x) can be used as the loss function for model optimization. By further
assuming the latent space follows an independent Normal distribution, we can derive the closed-form loss.
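For completeness, under the common modelling choices q_\phi(z|x) = \mathcal{N}(\mu_\phi(x), \mathrm{diag}(\sigma_\phi^2(x))), p(z) = \mathcal{N}(0, I) and a Gaussian likelihood p_\theta(y|x,z) with fixed variance, the negative ELBO takes the familiar VAE-style form below; this is a standard result, and the exact constants and weighting used in our implementation may differ.

-ELBO_{\theta,\phi}(y|x) = E_{q_\phi(z|x)} \left[ -\log p_\theta(y|x,z) \right] + KL\left( q_\phi(z|x) \,\|\, p(z) \right),

with

KL\left( \mathcal{N}(\mu, \mathrm{diag}(\sigma^2)) \,\|\, \mathcal{N}(0, I) \right) = -\frac{1}{2} \sum_{k} \left( 1 + \log \sigma_k^2 - \mu_k^2 - \sigma_k^2 \right),

and, for a Gaussian likelihood with fixed variance, the expected negative log-likelihood reduces (up to an additive constant) to a mean-squared-error reconstruction term evaluated at samples z drawn from q_\phi(z|x).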
The Variational Neural Network structure schematic is shown in Figure 3.1. We construct the VNN
as a network that includes the Sub-Recognition Model q_\phi(z|x) and the Sub-Transfer Model (STM)
p_\psi(y|x,z), connected by skip connections (green lines in Figure 3.1) over the latent space.
Figure 3.1: Variational Neural Network (VNN) schematic
As in Equation 3.5, we used the SRM q_\phi(z|x) to approximate the true conditional latent distribution.
A natural question is why we use q_\phi(z|x) instead of q_\phi(z|x,y). With the VNN we are
addressing a regression problem, aiming to learn a model that approximates Equation 3.1, which
means we want to model the association from x to y. When estimating the parameters of the latent
variable z, we can only observe the independent variable x; if the dependent variable y could be used,
we would not need to solve the regression problem in the first place. In addition, we assume x is associated
with y in the regression setting. This is equivalent to the statement that x carries partial information
about y, which enables q_\phi(z|x) to approximate p_\theta(z|x,y).
Prediction with the VNN differs slightly from a regular neural network because of the latent
space. When we have a new test input x', the learned SRM q_\phi(z|x) uses it to estimate the parameters
of the latent variables. To distinguish them from the true latent variables z, we use z' to denote the estimated
latent variables. We then draw a sample from P(z') to serve as input to the STM p_\psi(y|x,z) to predict
y.
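The schematic in Figure 3.1 can be translated into a compact Keras sketch. The latent dimension, layer widths, activation choices and all names below are illustrative assumptions rather than the exact configuration used in our experiments.

import tensorflow as tf
from tensorflow.keras import layers

x_dim, latent_dim, y_dim = 10, 4, 1

# Sub-Recognition Model q_phi(z|x): maps x to the mean and log-variance of the latent z
x_in = tf.keras.Input(shape=(x_dim,))
h = layers.Dense(16, activation="relu")(x_in)
z_mean = layers.Dense(latent_dim)(h)
z_logvar = layers.Dense(latent_dim)(h)

# reparameterized sample z = mu + sigma * eps, with eps ~ N(0, I)
def sample_z(args):
    mu, logvar = args
    eps = tf.random.normal(tf.shape(mu))
    return mu + tf.exp(0.5 * logvar) * eps

z = layers.Lambda(sample_z)([z_mean, z_logvar])

# Sub-Transfer Model p_psi(y|x,z): the skip connection carries x alongside the latent sample
h2 = layers.Dense(16, activation="relu")(layers.Concatenate()([x_in, z]))
y_out = layers.Dense(y_dim)(h2)

vnn = tf.keras.Model(x_in, y_out)

# negative-ELBO-style training objective: MSE reconstruction plus the Gaussian KL term
kl = -0.5 * tf.reduce_mean(tf.reduce_sum(
    1.0 + z_logvar - tf.square(z_mean) - tf.exp(z_logvar), axis=-1))
vnn.add_loss(kl)
vnn.compile(optimizer="adam", loss="mse")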
3.2.2 Variational Downscaling Method
The Variational Downscaling Method (VDM) is a novel generative downscaling approach we developed
based on the Variational Neural Network (VNN). It can make use of both between-scale and within-scale
associations in the data, similar to our previous work, the Artificial Neural Network Sequential
Downscaling Method (ASDM) with Transfer Learning Enhancement (ASDMTE) [89]. It can also
utilize local neighbourhood information to improve downscaling performance. In addition,
VDM can model the underlying random process by including latent variables as the source of
randomness, which enables VDM to produce more realistic long-term downscaled data. Code for
VDM is available at https://github.com/menglinw/S2S_Downscaling.
3.2.2.1 Data
We selected the data within the circumscribed rectangles (orange rectangles in Figure 3.2) over the
United Arab Emirates, Qatar, Saudi Arabia, Iraq, Kuwait and Afghanistan, as shown in Figure 3.2,
as our study domain. Afghanistan is spatially separate from the other countries; therefore it is treated
separately as Area A, and the region over the other countries is Area B.
Figure 3.2: Map of study domain
Dust Extinction AOT (550 nm) from the high-resolution data, the GEOS-5 Nature Run (G5NR), and the
low-resolution data, the Modern-Era Retrospective Analysis for Research and Applications, version
2 (MERRA-2), is log-transformed and normalized to ensure the data range falls within (0,1). The elevation data,
Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010), and latitude, longitude and day
of the year are normalized during pre-processing.
1. Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2)
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2),
is produced by NASA's Global Modeling and Assimilation Office (GMAO) [55]. It utilizes
information from different remote sensing platforms to assimilate AOT data [60] using the Goddard
Earth Observing System, version 5 (GEOS-5) [55], whose key components are an atmospheric
model [56, 57] and the Gridpoint Statistical Interpolation (GSI) analysis scheme [58, 59]. Its aerosol
module simulates 15 externally mixed aerosol mass mixing ratio tracers [61] for MERRA-2.
We used Dust Aerosol Extinction AOT 550 nm (AOT) [62] from 16 May 2005–15 May 2008
as our low-resolution data. The MERRA-2 Dust AOT data have 0.625° longitudinal resolution, 0.5°
latitudinal resolution (∼50 km) and daily temporal resolution.
2. GEOS-5 Nature Run (G5NR) The GEOS-5 Nature Run (G5NR) is also produced by the GEOS-5
atmospheric general circulation model [19], as a two-year (16 May 2005–15 May 2007) non-hydrostatic
7 km global mesoscale simulation. The primary reason to develop G5NR was the demand from the
observing system simulation experiment (OSSE) community for a high-resolution counterpart to the
existing Nature Run of the European Centre for Medium-Range Weather Forecasts (ECMWF). G5NR
includes 15 aerosol tracers [47]. G5NR's simulated Earth weather system is constrained only by surface
boundary conditions for sea-surface temperatures and sea ice, daily volcanic and biomass burning
emissions, and high-resolution inventories of anthropogenic sources [19].
In this study, we used G5NR Dust Aerosol Extinction AOT 550 nm from 16 May 2005 to 15
May 2007 as our high-resolution data. It has 0.0625° grid resolution (∼7 km) and daily temporal
resolution.
3. Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010) The Global Multi-
resolution Terrain Elevation Data 2010 (GMTED2010) is a global elevation model developed by
the U.S. Geological Survey and the National Geospatial-Intelligence Agency [63]. It provides
elevation data in three separate resolutions (horizontal post spacing) of 30 arc-seconds (∼ 1 km), 15
arc-seconds (∼500 m), and 7.5 arc-seconds (∼250 m) [64]. We used the 30 arc-second resolution
data and spatially averaged it to match the ∼7 km G5NR grid.
3.2.2.2 Downscaling Model
The Variational Downscaling Method network structure used in our study is shown in Figure 3.3.
In short, VDM is an auto-regressive model that uses historical high-resolution data, along with low-resolution
data and other information, to make further high-resolution predictions.
Figure 3.3: Variational Downscaling Method network structure
In addition, VDM aims to model the spatiotemporal structure within a small area (5 × 5 pixels in G5NR). VDM has
four inputs: 40 lagged mini-images from the high-resolution data (G5NR); their corresponding
part of the low-resolution data (MERRA-2); the elevation data (GMTED2010) covering the same area
as G5NR; and other spatiotemporal information, including latitude, longitude and day of the year.
At each time point (day), the high-resolution dust AOT image is split into many mini-images of
5 × 5 pixels. VDM assumes these mini-images are spatially independent of each other. The
high-resolution input of VDM is the 40 temporally lagged mini-images at the same location. The
low-resolution input is similar, using 40 temporal lags of low-resolution mini-images
over the same region as the high-resolution input. The elevation data have no time dimension, thus
the elevation input is a single 5 × 5 mini-image over the same area.
A ConvLSTM network is used to capture spatiotemporal features in the high- and low-resolution
lagged data, and the spatial features in the elevation data (GMTED2010) are extracted using a convolutional
layer with a 3 × 3 kernel. Latitude, longitude and day of the year are processed by a dense layer. All
input features from the above layers are combined and input to a VNN to predict the high-resolution
mini-image at the next time step.
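As a simplified illustration of the input construction, the numpy sketch below stacks the lagged 5 × 5 high-resolution mini-images for one location; boundary handling, the coarse-scale alignment and the remaining covariates are omitted, and all names are hypothetical.

import numpy as np

def extract_patch_lags(fs_cube, i, j, t, n_lags=40, size=5):
    # fs_cube: (d, h, w) stack of daily fine-scale dust AOT images
    # returns the n_lags most recent size x size mini-images centred at cell (i, j),
    # ordered from the oldest to the most recent lag
    half = size // 2
    return fs_cube[t - n_lags:t, i - half:i + half + 1, j - half:j + half + 1]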
The output activation function of VDM stems from:

f(x) = \frac{x}{1 + |x|}        (3.13)

and is re-scaled to have range (0, 1). f(x) is a member of the Sigmoid function family. Compared
to another well-known member of the Sigmoid family, the widely used logistic function g(x) = \frac{1}{1 + e^{-x}},
f(x) has a gentler slope over its domain, which helps VDM produce more stable numeric
predictions while limiting the output to the range (0,1).
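One way of writing the re-scaled activation, shifting f(x) = x/(1+|x|), whose range is (-1, 1), into (0, 1), is sketched below; the function name is ours.

import tensorflow as tf

def softsign01(x):
    # softsign-style output activation x / (1 + |x|), re-scaled from (-1, 1) to (0, 1)
    return 0.5 * (x / (1.0 + tf.abs(x))) + 0.5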
3.2.2.3 Training Strategy
As shown in Figure 3.2, our data mainly cover two large areas: Area A is the circumscribed rectangle
of Afghanistan and Area B is the circumscribed rectangle of Iraq, Saudi Arabia, Kuwait,
Qatar and the United Arab Emirates. These two regions are spatially separate from each other, thus
we trained separate models on each area independently to simplify the learning tasks. In addition,
to further reduce the difficulty of learning the time-related structure in the data, we divided the data
temporally into 4 seasons based on day of the year, as shown in Figure 3.4. These seasons do not
necessarily correspond to natural seasons. Seasons 1–3 each cover 91 days of a year and
Season 4 covers the last 92 days of a year. In addition, we included a 45-day overlap before
each season to obtain smoother transitions between models when downscaling. We trained
separate models on all temporal seasons with the last 10% of days reserved for testing. The other
90% of the data is processed and used for training. Within the training set, 20% is randomly selected
for model validation.
Figure 3.4: Data temporal splitting
3.2.3 Downscaling and Evaluation
With VDM, we would like to model the underlying data generation process and synthesize more
high-resolution data. Even though we formulate VDM to solve a supervised learning problem, it
is difficult to measure our model performance with simple metrics, which makes the evaluation process
complicated. Thus, we present our downscaling and evaluation methods in this section.
In our study, the downscaling and evaluation work is divided into two parts. Part 1 (Section
3.2.3.1) presents our downscaling method on the reserved test sets and the evaluation of the similarity
between the downscaled test sets and their corresponding ground truth. The results are compared to our
previous work, the Artificial Neural Network Sequential Downscaling Method (ASDM) with Transfer
Learning Enhancement (ASDMTE) [89]. Part 2 (Section 3.2.3.2) introduces the sequential downscaling
method outside of our observed data range and the measurement of the spatial and temporal
similarity between the downscaled data and the G5NR data. Since ASDMTE is not able to produce stable
long-term downscaled data, this part does not include out-data ASDMTE performance.
3.2.3.1 In-Data Downscaling Evaluation
For each season during training, the last 10% of the data is reserved for testing (Section 3.2.2.3).
The downscaling and evaluation on these test sets is called In-Data Downscaling Evaluation, which
aims to assess the goodness of fit of our VDM model. For each day included in a test set, we
used true lagged data as input to our VDM model to predict the high-resolution output. The predicted
data are reconstructed into images with the same shape as the G5NR images covering the study
domain. We use two measurements to evaluate the similarity between the downscaled data and its
corresponding ground truth: image-wise Root Mean Square Error (RMSE) and the image-wise semivariogram.
Image-wise RMSE provides a quantitative measurement of deviation from the truth,
but it cannot fully summarize how similar two images are. As a supplement, we use the image-wise
semivariogram to compare the spatial structures.
The image-wise RMSE is the RMSE between a downscaled image and its ground truth and can be
expressed as:

RMSE = \sqrt{ \frac{ \sum_{s \in D} \left( Y_s - \hat{Y}_s \right)^2 }{ n } }        (3.14)

where Y_s is the true target variable at location s, \hat{Y}_s is the corresponding downscaled target variable, and
n is the total number of observed locations s in the target domain D, such that s \in D.
The semivariogram is a widely used quantification of spatial auto-correlation in a random process,
which measures the similarity of values as a function of their distance. By assuming stationarity of
the underlying spatial process, the observation locations within the domain become unimportant and
distance becomes the key determinant of similarity. With the semivariogram, we can compare the spatial
characteristics of the downscaled data with its ground truth. The empirical semivariogram can be
defined as:

\hat{\gamma}(h_k) = \frac{1}{2 |N(h_k)|} \sum_{N(h_k)} \left( Y_{s_i} - Y_{s_j} \right)^2        (3.15)

where the distance h is divided into k intervals such that N(h_k) is the set of pairs in interval h_k.
The empirical semivariograms of all intervals are plotted against distance to demonstrate the spatial
characteristics of the observed data.
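A brute-force empirical estimate of Equation (3.15) on an image grid can be sketched as follows; our actual computation may bin or subsample pixel pairs differently, and the function name is ours.

import numpy as np
from scipy.spatial.distance import pdist

def empirical_semivariogram(values, coords, n_bins=15):
    # values: (n,) pixel values Y_s; coords: (n, 2) pixel locations s
    d = pdist(coords)                                  # pairwise distances between locations
    sq = pdist(values[:, None], metric="sqeuclidean")  # (Y_si - Y_sj)^2 for every pair
    bins = np.linspace(0.0, d.max(), n_bins + 1)
    which = np.digitize(d, bins)
    # average half squared difference within each distance interval h_k
    return np.array([0.5 * sq[which == k].mean() for k in range(1, n_bins + 1)])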
3.2.3.2 Out-Data Downscaling Evaluation
The ultimate goal of VDM is to synthesize more high-resolution data, thus we downscaled one
more year (365 days) beyond 5/15/2007, where the G5NR data end, and evaluated the spatial and temporal
similarities between the downscaled data and the G5NR data. This procedure is called Out-Data Downscaling
Evaluation. For sequential downscaling, VDM starts with true high-resolution input,
along with low-resolution data and elevation data, to predict the mini-images at the next time step. These
predicted mini-images are reconstructed back into a large image that covers the study domain, and
this large image is then used as the most recent lag to serve as input for further predictions. VDM
uses the predictions from the last step as input for the next step to downscale sequentially.
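In pseudocode, this sequential (auto-regressive) loop can be sketched as below, where predict_day is a hypothetical wrapper around the trained VDM that predicts and stitches all mini-images of the next day from the current lag stack.

import numpy as np

def sequential_downscale(predict_day, lags, n_steps=365):
    # lags: (n_lags, H, W) stack of the most recent true fine-scale images, oldest first
    outputs = []
    for _ in range(n_steps):
        next_image = predict_day(lags)                               # (H, W) prediction for the next day
        outputs.append(next_image)
        lags = np.concatenate([lags[1:], next_image[None]], axis=0)  # the prediction becomes the newest lag
    return np.stack(outputs)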
Out-data downscaling evaluation is challenging, since we do not have the ground truth for comparison.
Error-based methods, like image-wise RMSE, are thus not applicable in this case. However,
we can still compare the spatial and temporal features against the available high-resolution data. Spatially,
we assess the spatial characteristics of the downscaled data against its true temporal counterpart in the
previous year (2006–2007) on the same day of year using the semivariogram. In addition, the temporal
auto-correlation structure can be expressed with the Autocorrelation Function (ACF) and Partial
Autocorrelation Function (PACF). The ACF and PACF are commonly used to find parameters for Auto-Regressive
Integrated Moving Average (ARIMA) models. Simply speaking, the ACF is a bar plot of the
correlations between a time series and lags of itself, while the PACF is a bar plot of the partial
correlation coefficients between the series and lags of itself. In our study, we calculated a time
series for each country in the study domain as the aggregate mean over the country area at each time
step. The ACF and PACF were plotted for the aggregated time series from both the downscaled data and the G5NR
data for comparison.
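The country-level series and their ACF/PACF plots can be produced, for example, with statsmodels; in the sketch below, daily_images is assumed to be a list of (H, W) downscaled (or G5NR) images and country_mask a boolean mask for the country of interest.

import numpy as np
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# aggregate mean dust AOT over the country mask for each day
series = np.array([img[country_mask].mean() for img in daily_images])

plot_acf(series, lags=60)    # autocorrelation function
plot_pacf(series, lags=60)   # partial autocorrelation function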
3.3 Results
Figure 3.5 shows the dust AOT data on 5/16/2005 from G5NR and MERRA-2, before and after data
processing. We note that the spatial trends in G5NR (Figure 3.5a) and MERRA-2 (Figure 3.5c)
are not similar: G5NR has higher dust AOT values in the area near the United Arab Emirates, while
MERRA-2 shows greater values near Kuwait. In addition, the original scale of the dust AOT data ranges
approximately over (0,3); after log-transformation and normalization, we limit its range to (0,1),
which keeps a spatial trend similar to the corresponding original scale and, at the same time,
helps numerical stability in our model training.
Figure 3.5: Dust Extinction AOT (550 nm) data on 5/16/2005 from G5NR and MERRA-2, before
and after pre-processing: (a) G5NR Dust Extinction AOT, (b) normalized G5NR Dust Extinction AOT,
(c) MERRA-2 Dust Extinction AOT, (d) normalized MERRA-2 Dust Extinction AOT.
3.3.1 In-Data Downscaling Results
The in-data average test image-wise RMSE of the VDM downscaled data lies in the range (0.007, 0.03),
while the ASDMTE downscaled RMSE lies in (0.05, 0.22) (Figure 3.6, Table 3.1), which indicates that the
overall goodness of fit of VDM is better than that of our previous method, ASDMTE. Specifically,
in Season 1, the overall VDM RMSE, including both Area A and Area B, is 0.0085 (0.0025),
reported as Mean (Standard Deviation). The RMSE of Area A and Area B is 0.0118 (0.0055)
and 0.0070 (0.0018), respectively. In Season 2, the overall RMSE is 0.0101 (0.0042), the Area A
RMSE is 0.0102 (0.0040) and the Area B RMSE is 0.0097 (0.0051). In Season 3, the overall RMSE
is 0.0201 (0.0051), the Area A RMSE is 0.0171 (0.0065) and the Area B RMSE is 0.0204 (0.0065). In
Season 4, the overall RMSE is 0.0166 (0.0040), the Area A RMSE is 0.0319 (0.0088) and the Area B
RMSE is 0.0080 (0.0019).
Figure 3.6: Overall image-wise RMSE of all seasons
We used the data of 8/2/2006 from the Season 1 test set as a sample to evaluate the spatial
characteristics of the downscaled data. The semivariogram plots and dust AOT maps of the other seasons can be found
in Section 3.5: Supplemental Figures (Figures 3.12 to 3.17).
Table 3.1: Test RMSE of VDM and ASDMTE in all seasons

Method    Season    Area      RMSE Mean   RMSE SD
VDM       Season 1  All       0.0085      0.0025
VDM       Season 1  Area A    0.0118      0.0055
VDM       Season 1  Area B    0.0070      0.0018
ASDMTE    Season 1  All       0.1724      0.0475
VDM       Season 2  All       0.0101      0.0042
VDM       Season 2  Area A    0.0102      0.0040
VDM       Season 2  Area B    0.0097      0.0051
ASDMTE    Season 2  All       0.0575      0.0200
VDM       Season 3  All       0.0201      0.0051
VDM       Season 3  Area A    0.0171      0.0065
VDM       Season 3  Area B    0.0204      0.0065
ASDMTE    Season 3  All       0.1423      0.0500
VDM       Season 4  All       0.0166      0.0040
VDM       Season 4  Area A    0.0319      0.0088
VDM       Season 4  Area B    0.0080      0.0019
ASDMTE    Season 4  All       0.2101      0.0526
In Figure 3.8, the VDM downscaled data of 8/2/2006 show a similar spatial trend and dust AOT range
(0.9300, 0.9790) to the G5NR ground truth (0.9300, 0.9840). Both have lower dust AOT
in the northwest part of Saudi Arabia and the northern part of Afghanistan, and higher values in the southeast
part of Saudi Arabia, the United Arab Emirates and southern Afghanistan. The ASDMTE downscaled dust AOT
manifests similar spatial characteristics, but its range (0.0500, 2.1370) differs from the true data.
To eliminate the effect of different scales on our comparison of spatial characteristics, the
semivariogram is normalized to the range (0,1). In Season 1, the semivariograms of the VDM downscaled
data in both Area A and Area B (Figure 3.7c and Figure 3.7d) show a spatial trend similar to their
corresponding true data in G5NR (Figure 3.7a and Figure 3.7b). In contrast, the ASDMTE downscaled
data show a different semivariogram function in Area A (Figure 3.7e), with a smaller sill ≈ 0.9 and
range ≈ 0.4. This semivariogram indicates that the ASDMTE downscaled data in Area A have
weaker spatial association, which leads to a sharper slope and a smaller range.
3.3.2 Out-Data Downscaling Results
To evaluate the long-term downscaling performance of VDM, we downscaled 1 year (365 days)
of high-resolution dust AOT data after 5/15/2007, where G5NR ends. In this section, we take the
downscaled data at the very end (5/15/2008), that is, 365 days after the truth, as a sample to assess
its spatial characteristics.
The sample dust AOT data are plotted as a map in Figure 3.9. The dust AOT range in Area A
is (0.9117, 0.9133) and the range in Area B is (0.9650, 0.9730). These ranges are much smaller than
those of G5NR. To show the variance within each area, the two areas are plotted separately with different
color scales, and both of them show clear internal spatial trends.
Figure 3.7: Season 1 semivariograms of G5NR and downscaled data at 8/2/2006: (a) G5NR Area A,
(b) G5NR Area B, (c) VDM downscaled Area A, (d) VDM downscaled Area B, (e) ASDMTE Area A,
(f) ASDMTE Area B.
Figure 3.8: Season 1 dust AOT maps from G5NR (ground truth), VDM and ASDMTE at 8/2/2006.
Figure 3.9: VDM Downscaled Dust AOT Map at 5/15/2008
In Figure 3.10, the normalized semivariogram functions of the downscaled data at 5/15/2008 have
shapes similar to those of the true G5NR data at 5/15/2007. However, the semivariogram estimates
(blue dots) in Figure 3.10c increase more slowly than their G5NR counterpart (Figure 3.10a), which indicates
that the spatial dependence in the downscaled data is stronger.
Kuwait is a country of particular concern for air pollution, and it is also the study domain of
our downstream application in Chapter 4; therefore we use Kuwait as an example to analyze
the downscaled temporal structure. The ACF and PACF plots of the other countries can be found in
Section 3.5 (Figures 3.18 to 3.22).
If we consider indirect correlation, the order of dependence in the G5NR ACF plot (Figure
3.11a) is 40, while in the VDM ACF plot the order is 30 (Figure 3.11c). This result implies that our
downscaled data have a slightly shorter temporal dependency structure.
Figure 3.10: Semivariogram comparison between G5NR and downscaled data: (a) G5NR Area A at 5/15/2007,
(b) G5NR Area B at 5/15/2007, (c) downscaled Area A at 5/15/2008, (d) downscaled Area B at 5/15/2008.
Figure 3.11: Temporal auto-correlation plots for Kuwait: (a) G5NR ACF, (b) G5NR PACF, (c) downscaled ACF, (d) downscaled PACF.
In addition, the ACF of G5NR decreases sharply over the first two lags and then shows a slow downward trend,
whereas our downscaled data show a linearly decreasing trend in the ACF plot. The PACF plots provide
consistent evidence. The order of the G5NR PACF (Figure 3.11b) is 5, while the order of the VDM PACF
(Figure 3.11d) is 2. The PACF plot of G5NR (Figure 3.11b) starts with a strongly correlated lag, followed by
4 weak lags, while in the downscaled PACF (Figure 3.11d) the first lag correlation is stronger, but only 1 weak
lag is captured after it.
3.4 Discussion
In this study, we developed the Variational Neural Network (VNN), a generative model that extends
the Variational Auto-Encoder to solve regression problems, and utilized the VNN to build the Variational
Downscaling Method (VDM), which enabled low-resolution dust AOT data (∼50 km) to be downscaled
to a higher resolution (∼7 km). The VDM approach took as inputs lagged fine-scale mini-image
AOD data, coarse-scale mini-image AOD data, latitude, longitude, time and elevation to predict
the fine-scale AOD at the next time step over the same area. We found that the VDM approach is
able to learn a complex underlying conditional random process and produce reliable predictions.
In comparison with our previous work, the Artificial Neural Network Sequential Downscaling Method
with Transfer Learning Enhancement (ASDMTE), VDM achieves a better in-data test image-wise
RMSE and more similar semivariogram plots, as shown in Section 3.3.1. In addition, VDM, as
an improvement of ASDMTE, can produce stable long-term downscaled output. We downscaled
1 year (365 days) of high-resolution dust AOT data after G5NR ends, and the semivariogram structure
and the ACF and PACF plots in Section 3.3.2 suggest that the downscaled data have a spatiotemporal
structure similar to that of the true G5NR data.
The Variational Downscaling Method (VDM) improves on ASDMTE in three aspects. First, we
developed and applied a generative model, the Variational Neural Network (VNN), to capture the underlying
random process, which enables VDM to synthesize high-resolution data over a longer time range
without losing variability. Moreover, we utilized local neighborhood information, in addition to
between- and within-scale associations, to better capture the underlying distribution as well as to
improve the predictive performance. Lastly, VDM utilized a more strictly constrained output activation
function and normalization to ensure numerical stability during long-term downscaling.
VDM is a generative downscaling method based on the Variational Neural Network: rather than modeling fixed associations for downscaling, VDM approximates the underlying random process that produces the data. By including a set of normally distributed latent variables, we can summarize the randomness within the training data that traditional downscaling methods usually ignore. Downscaling with VDM is therefore equivalent to drawing a sample from the learned distribution. This is especially important for long-term downscaling using within-scale associations, since it efficiently prevents the downscaled data from losing randomness and collapsing to a constant.
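To make "drawing a sample from the learned distribution" concrete, the PyTorch sketch below shows the reparameterized sampling step typically used in variational regression models; the layer sizes, names and architecture are hypothetical and are not the exact VNN used in this work.

```python
import torch
import torch.nn as nn

class VNNSketch(nn.Module):
    """Minimal variational regression sketch: an encoder maps the inputs to the
    mean and log-variance of a Gaussian latent variable, a sample z is drawn
    with the reparameterization trick, and a decoder maps (x, z) to the
    fine-scale prediction."""
    def __init__(self, in_dim, latent_dim, out_dim, hidden=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_var = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(in_dim + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))

    def forward(self, x):
        h = self.encoder(x)
        mu, log_var = self.mu(h), self.log_var(h)
        # Every forward pass draws a fresh latent sample, so repeated predictions
        # for the same input are draws from the learned conditional distribution.
        z = mu + torch.exp(0.5 * log_var) * torch.randn_like(mu)
        return self.decoder(torch.cat([x, z], dim=-1)), mu, log_var
```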
A generative model is generally more difficult to learn because the learning target is harder. If we fit an ordinary neural network, we only need to estimate one set of parameters to optimize a single model, but in a VNN we need to learn a set of parameters that defines a family of models best fitting the training data. To overcome this learning barrier, we inherited from ASDMTE the learning strategy that includes both between- and within-scale associations. Moreover, VDM uses a 5 × 5 mini-image at each time step, which incorporates local neighbourhood information to improve learning. It may appear that VDM uses more information than ASDMTE, which could explain its better performance; however, we used exactly the same data, including spatial domain and temporal range, to train both models, so both were exposed to the same amount of information.
Numerical stability is essential for producing reasonable output with sequential downscaling that uses within-scale associations. It is even more important for VDM, since the latent variables included in the network may produce extreme values. In VDM, the output activation is from the sigmoid family and is rescaled to ensure the output lies within (0, 1). In ASDMTE, by contrast, the output activation is ReLU, which only bounds the output from below at 0, so the model may still produce values larger than 1. Moreover, batch normalization is applied in VDM after every layer to stabilize the training process.
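The following PyTorch fragment illustrates the two ingredients just described, a sigmoid-family output rescaled to stay strictly inside (0, 1) and a hidden layer followed by batch normalization; the epsilon margin and layer widths are illustrative assumptions, not the exact VDM settings.

```python
import torch
import torch.nn as nn

class BoundedOutput(nn.Module):
    """Sigmoid-family output activation rescaled so predictions stay strictly
    inside (0, 1), which keeps a sequential (within-scale) downscaler from
    drifting outside the valid AOT range. The eps margin is an assumption."""
    def __init__(self, eps=1e-3):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        return self.eps + (1.0 - 2.0 * self.eps) * torch.sigmoid(x)

# A hidden block with batch normalization after the linear layer, mirroring the
# "batch normalization after every layer" strategy described in the text.
hidden_block = nn.Sequential(
    nn.Linear(64, 64),
    nn.BatchNorm1d(64),
    nn.LeakyReLU())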
At the same time, VDM has some features that still need improvement. The first is the difference between VDM models. As shown in Section 3.2.2.3, we trained an independent model for each area and season, which simplifies the learning task of each model and also reduces the computational requirements for training. However, as shown in Figure 3.9, the differences between models may lead to downscaled output with sharp changes, a new concern that deserves special attention. During sequential downscaling, every prediction step is equivalent to drawing a sample from the latent distribution, which makes sequential downscaling a random process; currently, we cannot constrain the two sequential downscaling runs over Area A and Area B to be consistent with each other. In addition to the between-model differences, Figure 3.9 also shows that the range of the downscaled data is narrower than that of the ground truth, which implies that the VDM output variance within each model is smaller than the true variance. The long-term ACF and PACF plots (Figure 3.11) show that the downscaled data has a temporal dependency pattern different from the true G5NR data: VDM captures the strong temporal correlations successfully but models only part of the weak temporal correlations. Lastly, VDM cannot effectively model extreme climate conditions that produce very large or very small dust AOT values. We strictly constrained the VDM output range to (0, 1) to ensure numerical stability during long-term downscaling, but this limits VDM's ability to produce the extreme outputs that are usually seen as outliers.
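The loop below sketches why sequential downscaling behaves as a random process: each step feeds the previous fine-scale prediction back in together with the next coarse-scale input, and the model draws a fresh latent sample internally, so two runs over the same period generally differ. It assumes a model with the interface of the VNN sketch shown earlier; the argument names are illustrative only.

```python
import torch

def sequential_downscale(model, x0_fine, coarse_seq, n_steps):
    """Sketch of sequential (within-scale) downscaling. `model` is assumed to
    return (prediction, mu, log_var) as in the earlier VNNSketch; x0_fine is the
    last observed fine-scale input and coarse_seq holds the coarse inputs for
    the steps to be downscaled."""
    fine = x0_fine
    outputs = []
    model.eval()
    with torch.no_grad():
        for t in range(n_steps):
            x = torch.cat([fine, coarse_seq[t]], dim=-1)
            fine, _, _ = model(x)          # a new latent draw at every step
            outputs.append(fine)
    return torch.stack(outputs)
```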
Our further research will focus on solving the above problems. Training one large, complex model might be a possible way to eliminate differences between models, but this requires a more computationally efficient training process and additional computational resources. Currently, the low variance of the long-term downscaling output can be addressed by posterior bias-correction procedures; in addition, we will investigate the VNN model to discover new ways to increase predictive variance. More sophisticated network structures such as the Transformer [98], or training strategies such as generative pre-training [85], could be employed to improve the temporal structure modeling. Lastly, modeling extreme conditions while maintaining numerical stability appears to be a trade-off, since modeling extreme values requires loosening the output restrictions; our next step will be to find a way to optimize this balance.
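As one simple example of the kind of posterior bias-correction procedure mentioned above, the sketch below rescales the downscaled values so that their mean and standard deviation match a reference sample; the dissertation does not prescribe a specific correction, so this mean-variance rescaling is only one possible, assumed choice.

```python
import numpy as np

def mean_variance_rescale(downscaled, reference):
    """Rescale downscaled values so their mean and standard deviation match a
    reference sample (e.g., overlapping G5NR days); a basic variance-inflation
    style correction for the narrow VDM output range noted above."""
    mu_d, sd_d = downscaled.mean(), downscaled.std()
    mu_r, sd_r = reference.mean(), reference.std()
    return mu_r + (downscaled - mu_d) * (sd_r / sd_d)
```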
3.5 Appendix: Supplemental Figures
Figure 3.12: Season 2 Semivariogram of G5NR and Downscaled Data at 10/27/2006. Panels: (a) G5NR Area A; (b) G5NR Area B; (c) downscaled Area A; (d) downscaled Area B; (e) ASDMTE Area A; (f) ASDMTE Area B.

Figure 3.13: Season 2 Dust AOT Map from G5NR, VDM and ASDMTE at 10/27/2006. Panels: G5NR dust AOT map (ground truth); VDM downscaled dust AOT map; ASDMTE downscaled dust AOT map.

Figure 3.14: Season 3 Semivariogram of G5NR and Downscaled Data at 1/25/2007. Panels: (a) G5NR Area A; (b) G5NR Area B; (c) downscaled Area A; (d) downscaled Area B; (e) ASDMTE Area A; (f) ASDMTE Area B.

Figure 3.15: Season 3 Dust AOT Map from G5NR, VDM and ASDMTE at 1/25/2007. Panels: G5NR dust AOT map (ground truth); VDM downscaled dust AOT map; ASDMTE downscaled dust AOT map.

Figure 3.16: Season 4 Semivariogram of G5NR and Downscaled Data at 4/28/2007. Panels: (a) G5NR Area A; (b) G5NR Area B; (c) downscaled Area A; (d) downscaled Area B; (e) ASDMTE Area A; (f) ASDMTE Area B.

Figure 3.17: Season 4 Dust AOT Map from G5NR, VDM and ASDMTE at 4/28/2007. Panels: G5NR dust AOT map (ground truth); VDM downscaled dust AOT map; ASDMTE downscaled dust AOT map.

Figure 3.18: Temporal Auto-Correlation Plots of Afghanistan. Panels: (a) G5NR ACF; (b) G5NR PACF; (c) downscaled ACF; (d) downscaled PACF.

Figure 3.19: Temporal Auto-Correlation Plots of the United Arab Emirates. Panels: (a) G5NR ACF; (b) G5NR PACF; (c) downscaled ACF; (d) downscaled PACF.

Figure 3.20: Temporal Auto-Correlation Plots of Iraq. Panels: (a) G5NR ACF; (b) G5NR PACF; (c) downscaled ACF; (d) downscaled PACF.

Figure 3.21: Temporal Auto-Correlation Plots of Qatar. Panels: (a) G5NR ACF; (b) G5NR PACF; (c) downscaled ACF; (d) downscaled PACF.

Figure 3.22: Temporal Auto-Correlation Plots of Saudi Arabia. Panels: (a) G5NR ACF; (b) G5NR PACF; (c) downscaled ACF; (d) downscaled PACF.
Chapter 4
Dust Air Pollution Effect on Mortality in Kuwait
Abstract
Kuwait is among the most polluted countries in Southwest Asia, and its air pollution is a major health concern. In this study, we assess the association between exposure to dust-caused air pollution and mortality in Kuwait during the 9-year period from 2007 to 2016, using dust AOT data from MERRA-2 and its downscaled counterpart produced with the Variational Downscaling Method (VDM). The associations are examined with a Poisson Generalized Additive Model (GAM) adjusting for the potential confounders temperature and time. We also stratify our model by population characteristics (gender and nationality). The effect estimates from MERRA-2 and from its downscaled data are compared with each other. In addition, we check the consistency of the dust data from both sources against visibility and weather data collected at Kuwait International Airport [99, 100].
We found that exposure to dust has a lagged effect on mortality. The effect estimates from both dust AOT data sources suggest that women in Kuwait are at risk when exposed to higher concentrations of dust. The dust data from the two sources are consistent with each other and with the ground-based visibility data.
4.1 Background
Air pollution in Southwest Asia is a major health concern. The primary cause of air pollution in this area is natural mineral dust [76]. With economic development, industrialization and urbanization, anthropogenic particulate matter (PM) has become an additional source of air pollution [77]. According to the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2), Southwest Asia and its nearby area contribute about 30% of the global dust mass budget [46]. This immense dust emission turns Southwest Asia into the most polluted area worldwide and creates a dominant health concern for its growing resident population. Countries in this area, along with other Eastern Asian countries, have the highest rates of air-pollution-related mortality in the world [101].
In Southwest Asia, Kuwait is among the most polluted countries [99]. Alolayan et al. (2013) measured annual average PM10 and PM2.5 in Kuwait from 2004 to 2005 at 130 µg/m³ and 53 µg/m³, respectively, and found that they exceed the annual World Health Organization (WHO) guidelines for outdoor air quality (PM10: 20 µg/m³, PM2.5: 10 µg/m³) by factors of six and five [102]. In addition, Ettouney et al. found that the concentration of nitrogen oxides in Kuwait also increased over the period 2001 to 2004, which may be associated with growing power consumption and traffic [103]. Recent research indicates that exposure to particulate matter is associated with mortality [104] and low birth weight [105].
In this study, we aim to assess the association between exposure to dust-caused air pollution and mortality in Kuwait. However, historical air pollution records in Kuwait are limited [100], and air quality measurement is difficult due to the extreme climate conditions [106]. Therefore, we use dust Aerosol Optical Thickness (AOT) data from MERRA-2 and the high-resolution dust AOT data downscaled from it with our previously developed Variational Downscaling Method (VDM). The association between exposure to dust air pollution and mortality in Kuwait during the 9-year period from 2007 to 2016 is examined using a Poisson Generalized Additive Model (GAM) adjusting for potential confounders. We stratify our model by population characteristics (gender and nationality). The effect estimates from MERRA-2 and from the VDM downscaled data are compared with each other. In addition, we check the consistency of the dust data from both sources against visibility and weather data collected at Kuwait International Airport [99, 100] to evaluate data reliability.
4.2 Material and Methods
4.2.1 Study Domain
Kuwait (Figure 4.1) is a small country in Southwest Asia bordering Saudi Arabia and Iraq (area = 17818 km²). It is known for its extremely dry and hot hyper-arid climate. In July, the average temperature is 37.4 °C and the maximum is 45 °C; the highest temperature even reached 54 °C in 2016 [107]. The average total precipitation is about 100 mm/year [108]. These environmental conditions make dust storms a common climate event in Kuwait [109] as well as a major source of pollution [110]. The population of Kuwait is about 4.5 million, with 2.8 million males and 1.7 million females. Notably, about 70% of the Kuwait population are foreigners (non-Kuwaiti residents) [111].
Figure 4.1: Study domain: Kuwait
4.2.2 Data
4.2.2.1 Mortality Data
Cause-specific daily mortality records from 16 May 2007 to 31 December 2016 in Kuwait were acquired from the National Center of Health Information (NCHI) at the Ministry of Health, Kuwait. We used total non-accidental (ICD-10 codes A00-R99) mortality data, including information on gender and nationality (Kuwaitis, non-Kuwaitis).
4.2.2.2 Visibility and Temperature Data
Hourly visibility and temperature data are collected at Kuwait International Airport by the U.S. Air Force [112]. Observations from other U.S. Air Force stations about 45 to 70 km from the airport were used to fill missing visibility and temperature data (correlation > 0.85). Dust storm days are defined as days with poor visibility, a high aerosol load and the presence of dust. A high aerosol load is indicated by a large Aerosol Optical Thickness value (AOT > 0.4) from Moderate Resolution Imaging Spectroradiometer (MODIS) Terra/Aqua Multi-Angle Implementation of Atmospheric Correction (MAIAC) 1-km land data. Dust presence is indicated by the corrected reflectance images of the MODIS Terra/Aqua satellites from the NASA Worldview application and/or aerosol images from the BSC-DREAM8b model provided by the Barcelona Supercomputing Center.
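Purely as an illustration of how these criteria combine into a daily dust-storm flag, the sketch below uses pandas with the AOT > 0.4 threshold stated above; the file name, column names and the visibility cutoff are hypothetical placeholders, not values taken from this study.

```python
import pandas as pd

# Hypothetical daily table with columns: date, visibility_km, maiac_aot, and a
# 0/1 dust_present indicator confirmed from MODIS imagery / BSC-DREAM8b output.
days = pd.read_csv("kuwait_daily_conditions.csv", parse_dates=["date"])

# AOT > 0.4 is the high-aerosol-load criterion given in the text; the visibility
# cutoff below is an illustrative placeholder, not the study's actual value.
days["dust_storm"] = (
    (days["visibility_km"] < 5.0)
    & (days["maiac_aot"] > 0.4)
    & (days["dust_present"] == 1)
).astype(int)
```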
4.2.2.3 Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) Dust Data
The Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) is produced by NASA's Global Modeling and Assimilation Office (GMAO) [55]. It uses information from different remote sensing platforms to assimilate AOT data [60] with the Goddard Earth Observing System, version 5 (GEOS-5) [55], whose key components are an atmospheric model [56, 57] and the Gridpoint Statistical Interpolation (GSI) analysis scheme [58, 59]. Its aerosol module simulates 15 externally mixed aerosol mass mixing ratio tracers [61] for MERRA-2.
We used the Dust Aerosol Extinction AOT at 550 nm (AOT) [62] from 16 May 2007 to 31 December 2016. The MERRA-2 dust AOT data have 0.625° longitudinal resolution, 0.5° latitudinal resolution (∼50 km) and daily temporal resolution.
4.2.2.4 Downscaled Dust Data
We downscaled the Dust Aerosol Extinction AOT at 550 nm (AOT) from MERRA-2 with the Variational Downscaling Method (VDM), the method we previously developed, along with high-resolution AOT data from the GEOS-5 Nature Run (G5NR) and elevation data from the Global Multi-resolution Terrain Elevation Data 2010 (GMTED2010). VDM is a novel generative downscaling method based on the Variational Neural Network (VNN). It can make use of both between-scale and within-scale associations, as well as local neighborhood information in the data. In addition, VDM models the underlying latent variables as the source of randomness, which enables it to produce more realistic long-term downscaled data. The downscaled dust data cover the same temporal range as the MERRA-2 dust data (16 May 2007 to 31 December 2016).
4.2.3 Statistical Method
We assess the association between exposure to dust and daily mortality with Poisson regression. To adjust for the confounders time and temperature, which may have non-linear effects, a Generalized Additive Model (GAM) is used to fit the model. Our model can be expressed as:

log(E(Y)) = β_0 + s(K) + s(T) + s(T_lag) + X_lag    (4.1)

where E(Y) is the expected daily death count, s(K) is the spline function of year (K), s(T) and s(T_lag) are the spline functions of temperature and its one-day lag, respectively, and X_lag is the two-day lagged moving average of dust AOT. Specifically, we examined the two-day moving averages of dust AOT from lag 0 (the same day as mortality) to lag 5 (5 days prior to mortality) to evaluate the lagged effect of dust pollution. The associations are also evaluated after stratifying by gender and nationality (Kuwaiti and non-Kuwaiti).
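All analyses in this chapter were carried out in R (see below); purely for illustration, the following Python sketch expresses the model in Equation (4.1) with the pyGAM package and shows how the two-day moving-average exposures Dust_{0,1} through Dust_{4,5} can be constructed. The file name and column names are hypothetical, and the smoothing settings are pyGAM defaults rather than the ones used in the study.

```python
import pandas as pd
from pygam import PoissonGAM, s, l

# Hypothetical daily data frame with death counts, year, temperature and dust AOT.
df = pd.read_csv("kuwait_daily.csv", parse_dates=["date"])
df["temp_lag1"] = df["temp"].shift(1)

# Two-day moving averages of dust AOT: Dust_{0,1}, Dust_{1,2}, ..., Dust_{4,5}.
for k in range(5):
    df[f"dust_{k}{k + 1}"] = (df["dust_aot"].shift(k) + df["dust_aot"].shift(k + 1)) / 2
df = df.dropna()

# Poisson GAM with smooth terms for year and temperature (cf. Equation 4.1),
# fitted here for the Dust_{0,1} exposure and repeated for each lag window.
X = df[["year", "temp", "temp_lag1", "dust_01"]].to_numpy()
y = df["deaths"].to_numpy()
gam = PoissonGAM(s(0) + s(1) + s(2) + l(3)).fit(X, y)
gam.summary()
```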
The MERRA-2 and VDM downscaled dust data are checked against the dust storm events derived from ground-based visibility data to evaluate the consistency between the different data sources. The dust AOT on days with and without dust storms is compared using a two-sample t test.
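A minimal sketch of this comparison is shown below (the study itself used R); the table and column names are the same hypothetical ones as above, and Welch's unequal-variance variant is shown since the dissertation does not state which variance assumption was used.

```python
import pandas as pd
from scipy import stats

# Hypothetical daily table with a 0/1 dust_storm flag and the MERRA-2 dust AOT.
days = pd.read_csv("kuwait_daily_conditions.csv")
storm = days.loc[days["dust_storm"] == 1, "merra2_dust_aot"]
no_storm = days.loc[days["dust_storm"] == 0, "merra2_dust_aot"]

t_stat, p_value = stats.ttest_ind(storm, no_storm, equal_var=False)
print(t_stat, p_value)
```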
We used R Statistical Software, version 3.6.3, for all analyses in this study.
4.3 Results
4.3.1 Summary Statistics
There were 46380 non-accidental deaths across Kuwait in the period from 2007 to 2016. Table 4.1 summarizes the daily mortality, temperature and dust AOT (MERRA-2 and VDM). The study period includes 3513 days, of which 1041 days are identified as dust storm days and 2473 days as non-dust-storm days.
The average daily mortality is 13.20 (4.74) cases, and there is no significant difference in daily mortality between dust storm days and non-dust-storm days (p = 0.204). The average temperature during the study period is 27.46 (9.88) °C. The mean temperature on dust storm days is 30.56 (8.83) °C, significantly higher than the average temperature on non-dust-storm days, 26.14 (8.83) °C (p < 0.001). Dust AOT from both MERRA-2 and its VDM downscaling shows a significant difference between days with and without dust storms (p < 0.001 for both).
Table 4.2 shows daily non-accidental mortality stratified by nationality and gender. In our study period, there are 24311 Kuwaiti mortality cases and 22069 non-Kuwaiti cases; 27449 cases are male and 18931 are female. Across all strata, mortality does not show a significant difference between days with and without dust storms.
Table 4.1: Summary statistics (mean (SD)) of daily mortality, temperature and dust AOT across Kuwait in 2007-2016, comparing dust-storm days against non-dust-storm days with a two-sample t test.

Variable | All Days (n=3513) | Dust Storm Days (n=1041) | Non-Dust Storm Days (n=2473) | P value
Total mortality (deaths/day) | 13.20 (4.74) | 13.04 (4.87) | 13.27 (4.87) | 0.204
Temperature (°C) | 27.46 (9.88) | 30.56 (8.83) | 26.14 (8.83) | < 0.001
Dust AOT (MERRA-2) | 2.09 (0.12) | 2.16 (0.11) | 2.06 (0.11) | < 0.001
Dust AOT (VDM) | 2.11 (0.22) | 2.22 (0.21) | 2.07 (0.21) | < 0.001

Table 4.2: Summary statistics (mean (SD)) of daily mortality across Kuwait in 2007-2016 by nationality and gender, comparing dust-storm days against non-dust-storm days with a two-sample t test.

Mortality (deaths/day) | All Days (n=3513) | Dust Storm Days (n=1041) | Non-Dust Storm Days (n=2473) | P value
Kuwaiti | 6.92 (3.11) | 6.87 (3.21) | 6.94 (3.21) | 0.562
Non-Kuwaiti | 6.28 (2.92) | 6.17 (3.01) | 6.33 (3.01) | 0.149
Male | 7.81 (3.35) | 7.70 (3.44) | 7.86 (3.44) | 0.195
Female | 5.39 (2.69) | 5.35 (2.81) | 5.41 (2.81) | 0.531
4.3.2 Effect Estimations
Table 4.3: Estimated rate ratios (95% CI) of lagged dust AOT on mortality across Kuwait during the study period 2007-2016, stratified by nationality and gender. The lagged dust AOT is a two-day moving average from lag 0 (same day as mortality) to lag 5 (5 days prior to mortality); the subscript denotes the lag days involved, e.g. Dust_{0,1} is the moving average of dust AOT at lag 0 and lag 1. Asterisks (*) mark statistically significant estimates.

Mortality | Data | Dust_{0,1} | Dust_{1,2} | Dust_{2,3} | Dust_{3,4} | Dust_{4,5}
Overall | VDM | 1.001 (0.931, 1.076) | 1.001 (0.930, 1.077) | 1.003 (0.931, 1.082) | 1.016 (0.941, 1.097) | 1.030 (0.952, 1.114)
Overall | MERRA-2 | 1.188 (1.078, 1.310)* | 1.140 (1.033, 1.258)* | 1.055 (0.956, 1.164) | 1.003 (0.909, 1.106) | 0.979 (0.890, 1.076)
Kuwaiti | VDM | 1.023 (0.927, 1.130) | 1.023 (0.925, 1.131) | 1.024 (0.923, 1.135) | 1.036 (0.932, 1.151) | 1.052 (0.945, 1.171)
Kuwaiti | MERRA-2 | 1.219 (1.066, 1.394)* | 1.225 (1.070, 1.403)* | 1.160 (1.013, 1.330)* | 1.061 (0.927, 1.214) | 1.008 (0.885, 1.149)
Non-Kuwaiti | VDM | 0.965 (0.870, 1.070) | 0.964 (0.868, 1.072) | 0.968 (0.869, 1.078) | 0.980 (0.878, 1.094) | 0.990 (0.885, 1.109)
Non-Kuwaiti | MERRA-2 | 1.162 (1.009, 1.338)* | 1.061 (0.920, 1.224) | 0.953 (0.826, 1.100) | 0.932 (0.809, 1.074) | 0.935 (0.815, 1.073)
Male | VDM | 0.950 (0.866, 1.043) | 0.944 (0.859, 1.039) | 0.946 (0.858, 1.042) | 0.952 (0.862, 1.051) | 0.961 (0.869, 1.064)
Male | MERRA-2 | 1.106 (0.975, 1.256) | 1.090 (0.959, 1.239) | 1.042 (0.916, 1.184) | 0.979 (0.862, 1.111) | 0.930 (0.822, 1.053)
Female | VDM | 1.075 (0.961, 1.203) | 1.084 (0.967, 1.216) | 1.089 (0.969, 1.224) | 1.111 (0.986, 1.252) | 1.131 (1.001, 1.278)*
Female | MERRA-2 | 1.309 (1.124, 1.524)* | 1.210 (1.037, 1.412)* | 1.073 (0.919, 1.251) | 1.030 (0.885, 1.200) | 1.044 (0.900, 1.210)
Table 4.3 presents the rate ratios (RR) relating lagged dust AOT to daily non-accidental mortality in Kuwait in the period from 2007 to 2016, stratified by nationality and gender. Both the MERRA-2 and the VDM dust data indicate that female mortality is significantly associated with dust. The MERRA-2 dust data show a shorter lag, with significant effects of exposure on the same day and one day prior to mortality (RR = 1.309; 95% CI: 1.124, 1.524) and at one and two days prior to mortality (RR = 1.210; 95% CI: 1.037, 1.412), whereas the VDM dust data show a significant effect of exposure at four and five days prior to mortality (RR = 1.131; 95% CI: 1.001, 1.278). In addition, with the MERRA-2 dust AOT data we found similar statistically significant short-lag associations: among all residents, for exposure on the same day and one day prior to mortality (RR = 1.188; 95% CI: 1.078, 1.310) and at one and two days prior (RR = 1.140; 95% CI: 1.033, 1.258); among the Kuwaiti resident group, for exposure on the same day and one day prior to mortality (RR = 1.219; 95% CI: 1.066, 1.394), at one and two days prior (RR = 1.225; 95% CI: 1.070, 1.403) and at two and three days prior (RR = 1.160; 95% CI: 1.013, 1.330); and among the non-Kuwaiti resident group, for exposure on the same day and one day prior to mortality (RR = 1.162; 95% CI: 1.009, 1.338).
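For reference, a rate ratio and its confidence interval follow directly from a Poisson regression coefficient; the helper below sketches this relationship. The exposure increment behind Table 4.3 is not restated in this section, so the delta argument defaults to an assumed one-unit increase.

```python
import numpy as np

def rate_ratio(beta_hat, se, delta=1.0):
    """Rate ratio and 95% CI implied by a Poisson regression coefficient for a
    delta-unit increase in exposure: RR = exp(beta * delta)."""
    rr = np.exp(beta_hat * delta)
    lo = np.exp((beta_hat - 1.96 * se) * delta)
    hi = np.exp((beta_hat + 1.96 * se) * delta)
    return rr, (lo, hi)
```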
4.4 Discussion
In this study, we assessed the association between exposure to dust and non-accidental mortality across Kuwait in the period 2007 to 2016, stratified by nationality (Kuwaiti and non-Kuwaiti) and gender. Dust exposure was examined up to 5 days prior to mortality. We used dust AOT data from two different sources, the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) and its downscaling with the Variational Downscaling Method (VDM). The effect estimates from MERRA-2 and the VDM downscaled data were compared with each other. In addition, we checked the consistency of the dust data from both sources against visibility and weather data collected at Kuwait International Airport [99, 100] to evaluate data reliability.
We found that exposure to dust has a significant lagged effect on mortality. Dust storm days generally have significantly higher temperature and dust AOT, but we did not observe any significant difference in mortality (deaths/day). However, mortality is associated with dust AOT several days prior to the mortality event, which may suggest that the effect of dust exposure on mortality takes some time (1-5 days) to appear. In addition, the effect estimates from both dust AOT data sources suggest that women in Kuwait are at risk when exposed to higher concentrations of dust. The analysis using MERRA-2 dust data also indicated that the association between dust exposure and mortality is significant among all residents, among Kuwaiti citizens and among non-Kuwaiti citizens.
We compared the dust AOT on days with dust storms against days without, and both dust datasets, MERRA-2 and the VDM downscaled data, showed significant differences (p < 0.001 for both). The existence of a dust storm event is determined from ground-based visibility data, MODIS Terra/Aqua satellite images and aerosol images from the BSC-DREAM8b model. Together with the significant effects of dust on mortality in the female group, the two dust datasets are consistent with each other as well as with the other data involved.
Our study has some limitations. The location of death was not available, so we used the average dust AOT over Kuwait for all cases, which is equivalent to assuming that all decedents were exposed to the same level of dust. Considering that Kuwait is not large (area = 17818 km²) and most of the area shares a similar climate, this should not introduce substantial bias. In addition, we did not take indoor air quality into consideration; the indoor environment is an important part of daily life, and including related data could further improve our study.
Chapter 5
Conclusions and Further Directions
In Chapter 2, we proposed the Artificial Neural Network Sequential Downscaling Method (ASDM) with Transfer Learning Enhancement (ASDMTE). ASDM/ASDMTE utilizes empirical between-scale associations as well as inherent within-scale temporal associations among the fine-scale data. In addition, within-scale temporal associations in the low-resolution data are integrated into the ASDMTE model by transfer learning to enhance downscaling performance. This enables ASDMTE to exploit a long time range of low-resolution data, which provides more information for downscaling. We applied our ASDM and ASDMTE downscaling approaches to Goddard Earth Observing System Model, Version 5 (GEOS-5) Nature Run (G5NR) and Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2) data for several countries in Southwest Asia. ASDM/ASDMTE performance was compared with a deep learning downscaling method, the Super Resolution Deep Residual Network (SRDRN), and with traditional statistical downscaling methods in the dissever framework, including generalized additive models (GAM) and linear regression models (LM), over the same study domain and period. ASDM/ASDMTE showed superior performance: among all the test sets, ASDM and ASDMTE had mean image-wise RMSEs of 0.068 and 0.067, respectively, while SRDRN, dissever GAM and dissever LM had mean image-wise RMSEs of 0.088, 0.213 and 0.214, respectively.
In Chapter 3, we first extended the VAE to a more general deep latent model, the Variational Neural Network (VNN), enabling it to solve regression problems, and then used the VNN to build the Variational Downscaling Method (VDM) to downscale low-resolution (∼50 km) Dust Aerosol Optical Thickness (AOT) data from MERRA-2. The target downscaled resolution is ∼7 km, with dust AOT data from G5NR as ground truth for validation. We applied VDM to G5NR and MERRA-2 dust AOT data for several countries in Southwest Asia (Figure 3.2). The test performance of VDM was compared with our previous work ASDMTE over the same period and area. Among all the test sets, VDM has a spatial auto-correlation structure more similar to G5NR as well as a better image-wise RMSE of 0.0138, compared with an image-wise RMSE of 0.1456 for ASDMTE. In addition, we downscaled 1 year (365 days) of dust AOT data after 5/15/2007, the end of the available G5NR time range, and evaluated and compared the spatial and temporal auto-correlation structure of the downscaled data with the true G5NR data.
In Chapter 4, we assessed the association between exposure to dust air pollution and mortality in Kuwait during the 9-year period from 2007 to 2016 using a Poisson Generalized Additive Model (GAM) adjusting for the potential confounders temperature and time. Dust AOT data from MERRA-2 and its VDM downscaling were checked against Kuwait non-accidental mortality data acquired from the National Center of Health Information (NCHI) at the Ministry of Health, Kuwait. We also stratified our model by population characteristics (gender and nationality) to assess the effects in different population groups. The effect estimates from MERRA-2 and its VDM downscaled data were compared with each other. In addition, we checked the consistency of the dust data from both sources against visibility and weather data collected at Kuwait International Airport. The effect estimates from both dust AOT data sources suggest that women in Kuwait are at risk when exposed to higher concentrations of dust. The dust data from both sources are consistent with each other and with the ground-based visibility data.
5.1 Strengths
The Variational Downscaling Method (VDM), as an improvement of ASDMTE, has strengths in three aspects. First, we developed and applied a generative model, the Variational Neural Network (VNN), to capture the underlying random process, which enables VDM to synthesize high-resolution data over a longer time range without losing variability. Second, VDM uses a more strictly constrained output activation function and normalization to ensure numerical stability. These two features enable VDM to downscale sequentially over a long time range and provide more high-resolution data for research, as validated in Chapter 4. Third, we utilized local neighborhood information carrying spatial features, in addition to between- and within-scale associations, to better capture the underlying distribution and to improve predictive performance.
We used the low-resolution MERRA-2 dust AOT data and its VDM downscaled counterpart in Chapter 4 and found them consistent with each other and with the ground-based visibility data. The estimated effects on mortality in Kuwait from both sources lead to a similar conclusion: women are at elevated risk of death when exposed to higher concentrations of dust. These epidemiological results indicate that VDM is capable of producing reliable downscaled data.
5.2 Limitations and Further Directions
At the same time, VDM still has some limitations that we need to note. The variance of the VDM downscaled data is smaller than that of its true counterpart. Currently, we can apply posterior bias-correction procedures to adjust for it, but ultimately an end-to-end downscaling method might be more helpful, so we may need to improve the VDM model to produce larger downscaled variance.
Second, to maintain numerical stability, we had to sacrifice the ability of VDM to model extreme climate conditions. Numerical stability and the ability to model extremes form a trade-off, and in further work I would like to develop methods that can adjust the balance between stability and extreme-value modeling based on the purpose of the study.
Lastly, VDM cannot fully capture the weak temporal structure of the data, which leads to downscaled data with a different temporal dependency. To further improve VDM downscaling performance, one possible direction is applying well-designed network structures or training strategies to better capture the temporal structure in the data. Natural Language Processing (NLP) methods such as the Transformer [98], [113], [114] and generative pre-training [85] may provide inspiration for further work.
In addition, large neural network models have achieved remarkable performance in the NLP field, for example ChatGPT and Bidirectional Encoder Representations from Transformers (BERT) [115]. These models usually have millions of parameters and are trained with large amounts of data, and they show considerable generalizability to new language instances. Given the large amount of observable climate and air quality data available, training a large model with these data may provide good estimates, or at least a good model initialization, for downscaling unobservable areas.
References
1. Benestad, R. E., Chen, D. & Hanssen-Bauer, I. Empirical-statistical downscaling (World
Scientific Publishing Company, 2008).
2. Randall, D. A. General circulation model development: past, present, and future (Elsevier,
2000).
3. Wigley, T., Jones, P., Briffa, K. & Smith, G. Obtaining sub-grid-scale information from
coarse-resolution general circulation model output. Journal of Geophysical Research: At-
mospheres 95, 1943–1953 (1990).
4. Wilby, R. L. & Wigley, T. M. Downscaling general circulation model output: a review of
methods and limitations. Progress in physical geography 21, 530–548 (1997).
5. Khan, M. S., Coulibaly, P. & Dibike, Y . Uncertainty analysis of statistical downscaling meth-
ods. Journal of Hydrology 319, 357–382 (2006).
6. Atkinson, P. M. Downscaling in remote sensing. International Journal of Applied Earth
Observation and Geoinformation 22, 106–114 (2013).
7. Atkinson, P. M. & Tate, N. J. Spatial scale problems and geostatistical solutions: a review.
The Professional Geographer 52, 607–623 (2000).
8. Wang, F., Tian, D., Lowe, L., Kalin, L. & Lehrter, J. Deep Learning for Daily Precipitation
and Temperature Downscaling. Water Resources Research 57, e2020WR029308 (2021).
9. Li, L., Franklin, M., Girguis, M., Lurmann, F., Wu, J., Pavlovic, N., et al. Spatiotempo-
ral imputation of MAIAC AOD using deep learning with downscaling. Remote sensing of
environment 237, 111584 (2020).
10. Malone, B. P., McBratney, A. B., Minasny, B. & Wheeler, I. A general method for down-
scaling earth resource information. Computers & Geosciences 41, 119–125 (2012).
11. Chudnovsky, A., Lyapustin, A., Wang, Y ., Tang, C., Schwartz, J. & Koutrakis, P. High res-
olution aerosol data from MODIS satellite for urban air quality studies. Open Geosciences
6, 17–26 (2014).
12. Kloog, I., Chudnovsky, A. A., Just, A. C., Nordio, F., Koutrakis, P., Coull, B. A., et al. A
new hybrid spatio-temporal model for estimating daily multi-year PM2. 5 concentrations
across northeastern USA using high resolution aerosol optical depth data. Atmospheric En-
vironment 95, 581–590 (2014).
13. Li, L., Girguis, M., Lurmann, F., Pavlovic, N., McClure, C., Franklin, M., et al. Ensemble-
based deep learning for estimating PM2. 5 over California with multisource big data includ-
ing wildfire smoke. Environment International 145, 106143 (2020).
14. Zheng, C., Zhao, C., Zhu, Y ., Wang, Y ., Shi, X., Wu, X., et al. Analysis of influential fac-
tors for the relationship between PM2.5 and AOD in Beijing. Atmospheric Chemistry and
Physics 17, 13473–13489 (2017).
15. Xing, Y .-F., Xu, Y .-H., Shi, M.-H. & Lian, Y .-X. The impact of PM2.5 on the human respi-
ratory system. Journal of thoracic disease 8, E69 (2016).
16. Choi, J., Oh, J. Y ., Lee, Y . S., Min, K. H., Hur, G. Y ., Lee, S. Y ., et al. Harmful impact of air
pollution on severe acute exacerbation of chronic obstructive pulmonary disease: particulate
matter is hazardous. International journal of chronic obstructive pulmonary disease 13,
1053 (2018).
17. Chau, K., Franklin, M. & Gauderman, W. J. Satellite-Derived PM
2.5
Composition and
Its Differential Effect on Children’s Lung Function. Remote Sensing 12. doi:10.3390/
rs12061028 (2020).
18. Maji, S., Ghosh, S. & Ahmed, S. Association of air quality with respiratory and cardiovas-
cular morbidity rate in Delhi, India. International journal of environmental health research
28, 471–490 (2018).
19. Gelaro, R., Putman, W. M., Pawson, S., Draper, C., Molod, A., Norris, P. M., et al. Evalua-
tion of the 7-km GEOS-5 nature run (2015).
20. Boé, J., Terray, L., Habets, F. & Martin, E. Statistical and dynamical downscaling of the
Seine basin climate for hydro-meteorological studies. International Journal of Climatology:
A Journal of the Royal Meteorological Society 27, 1643–1655 (2007).
21. Wilby, R. L., Charles, S. P., Zorita, E., Timbal, B., Whetton, P. & Mearns, L. O. Guidelines
for use of climate scenarios developed from statistical downscaling methods. Supporting
material of the Intergovernmental Panel on Climate Change, available from the DDC of
IPCC TGCIA 27 (2004).
22. Winston, P. H. Artificial intelligence (Addison-Wesley Longman Publishing Co., Inc., 1984).
23. Szegedy, C., Liu, W., Jia, Y ., Sermanet, P., Reed, S., Anguelov, D., et al. Going deeper
with convolutions in Proceedings of the IEEE conference on computer vision and pattern
recognition (2015), 1–9.
24. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition in Pro-
ceedings of the IEEE conference on computer vision and pattern recognition (2016), 770–
778.
25. Ronneberger, O., Fischer, P. & Brox, T. U-net: Convolutional networks for biomedical
image segmentation in Medical Image Computing and Computer-Assisted Intervention–
MICCAI 2015: 18th International Conference, Munich, Germany, October 5-9, 2015, Pro-
ceedings, Part III 18 (2015), 234–241.
26. Jain, A. K., Mao, J. & Mohiuddin, K. M. Artificial neural networks: A tutorial. Computer
29, 31–44 (1996).
27. McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity.
The bulletin of mathematical biophysics 5, 115–133 (1943).
28. Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning. http://www.deeplearningbook.org (MIT Press, 2016).
29. Rosenblatt, F. The perceptron: a probabilistic model for information storage and organiza-
tion in the brain. Psychological review 65, 386 (1958).
30. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating
errors. nature 323, 533–536 (1986).
31. Hinton, G. E., Osindero, S. & Teh, Y .-W. A fast learning algorithm for deep belief nets.
Neural computation 18, 1527–1554 (2006).
32. Hinton, G. E. & Salakhutdinov, R. R. Reducing the dimensionality of data with neural net-
works. science 313, 504–507 (2006).
33. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al.
Generative adversarial networks. Communications of the ACM 63, 139–144 (2020).
34. Nielsen, M. A visual proof that neural nets can compute any function. Neural Networks and
Deep Learning (2016).
35. Rodríguez, O. H. & Lopez Fernandez, J. M. A semiotic reflection on the didactics of the
chain rule. The Mathematics Enthusiast 7, 321–332 (2010).
36. Ines, A. V ., Mohanty, B. P. & Shin, Y . An unmixing algorithm for remotely sensed soil
moisture. Water Resources Research 49, 408–425 (2013).
37. Loew, A. & Mauser, W. On the disaggregation of passive microwave soil moisture data
using a priori knowledge of temporally persistent soil moisture fields. IEEE Transactions
on Geoscience and Remote Sensing 46, 819–834 (2008).
38. Van den Berg, M., Vandenberghe, S., Baets, B. D. & Verhoest, N. Copula-based downscal-
ing of spatial rainfall: a proof of concept. Hydrology and Earth System Sciences 15, 1445–
1457 (2011).
39. Baño-Medina, J., Manzanas, R. & Gutiérrez, J. M. Configuration and intercomparison of
deep learning neural models for statistical downscaling. Geoscientific Model Development
13, 2109–2124 (2020).
40. Franklin, M., Chau, K., Kalashnikova, O. V ., Garay, M. J., Enebish, T. & Sorek-Hamer,
M. Using multi-angle imaging spectroradiometer aerosol mixture properties for air quality
assessment in Mongolia. Remote Sensing 10, 1–14. doi:10.3390/RS10081317 (2018).
41. Franklin, M., Kalashnikova, O. V . & Garay, M. J. Size-resolved particulate matter concen-
trations derived from 4.4km-resolution size-fractionated Multi-angle Imaging SpectroRa-
diometer (MISR) aerosol optical depth over Southern California. Remote Sensing of Envi-
ronment 196, 312–323. doi:10.1016/j.rse.2017.05.002 (2017).
42. Farzanegan, M. R. & Markwardt, G. Development and pollution in the Middle East and
North Africa: democracy matters. Journal of Policy Modeling 40, 350–374 (2018).
43. Chau, K., Franklin, M., Lee, H. & Garay, M. Temporal and Spatial Autocorrelation as Deter-
minants of Regional AOD-PM 2 . 5 Model Performance in the Middle East. Remote Sensing
13, 1–18 (2021).
44. Li, J., Garshick, E., Hart, J. E., Li, L., Shi, L., Al-Hemoud, A., et al. Estimation of ambient
PM2.5 in Iraq and Kuwait from 2001 to 2018 using machine learning and remote sensing.
Environment International 151. doi:10.1016/j.envint.2021.106445 (2021).
45. Sun, E., Xu, X., Che, H., Tang, Z., Gui, K., An, L., et al. Variation in MERRA-2 aerosol
optical depth and absorption aerosol optical depth over China from 1980 to 2017. Journal
of Atmospheric and Solar-Terrestrial Physics 186, 8–19 (2019).
46. Ukhov, A., Mostamandi, S., da Silva, A., Flemming, J., Alshehri, Y ., Shevchenko, I., et
al. Assessment of natural and anthropogenic aerosol air pollution in the Middle East us-
ing MERRA-2, CAMS data assimilation products, and high-resolution WRF-Chem model
simulations. Atmospheric Chemistry and Physics 20, 9281–9310 (2020).
47. Da Silva, A. M., Putman, W. & Nattala, J. File Specification for the 7-km GEOS-5 Na-
ture Run, Ganymed Release Non-Hydrostatic 7-km Global Mesoscale Simulation tech. rep.
(2014).
48. Wilby, R. L., Wigley, T., Conway, D., Jones, P., Hewitson, B., Main, J., et al. Statistical
downscaling of general circulation model output: A comparison of methods. Water re-
sources research 34, 2995–3008 (1998).
49. Xu, Y ., Wang, L., Ma, Z., Li, B., Bartels, R., Liu, C., et al. Spatially explicit model for
statistical downscaling of satellite passive microwave soil moisture. IEEE Transactions on
Geoscience and Remote Sensing 58, 1182–1191 (2019).
50. Chang, H. H., Hu, X. & Liu, Y . Calibrating MODIS aerosol optical depth for predicting
daily PM 2.5 concentrations via statistical downscaling. Journal of exposure science & en-
vironmental epidemiology 24, 398–404 (2014).
51. Goodfellow, I., Bengio, Y . & Courville, A. Deep learning (MIT press, 2016).
52. Yuan, Q., Shen, H., Li, T., Li, Z., Li, S., Jiang, Y ., et al. Deep learning in environmental
remote sensing: Achievements and challenges. Remote Sensing of Environment 241, 111716
(2020).
53. Hidalgo, H. G., Dettinger, M. D. & Cayan, D. R. Downscaling with constructed analogues:
Daily precipitation and temperature fields over the United States. California Energy Com-
mission PIER Final Project Report CEC-500-2007-123 (2008).
54. Agatonovic-Kustrin, S. & Beresford, R. Basic concepts of artificial neural network (ANN)
modeling and its application in pharmaceutical research. Journal of pharmaceutical and
biomedical analysis 22, 717–727 (2000).
55. Gelaro, R., McCarty, W., Suárez, M. J., Todling, R., Molod, A., Takacs, L., et al. The
modern-era retrospective analysis for research and applications, version 2 (MERRA-2).
Journal of climate 30, 5419–5454 (2017).
56. Rienecker, M. M., Suarez, M., Todling, R., Bacmeister, J., Takacs, L., Liu, H., et al. The
GEOS-5 Data Assimilation System: Documentation of Versions 5.0. 1, 5.1. 0, and 5.2. 0
(2008).
57. Molod, A., Takacs, L., Suarez, M. & Bacmeister, J. Development of the GEOS-5 atmo-
spheric general circulation model: Evolution from MERRA to MERRA2. Geoscientific
Model Development 8, 1339–1356 (2015).
58. Wu, W.-S., Purser, R. J. & Parrish, D. F. Three-dimensional variational analysis with spa-
tially inhomogeneous covariances. Monthly Weather Review 130, 2905–2916 (2002).
59. Kleist, D. T., Parrish, D. F., Derber, J. C., Treadon, R., Wu, W.-S. & Lord, S. Introduction
of the GSI into the NCEP global data assimilation system. Weather and Forecasting 24,
1691–1705 (2009).
60. Koster, R. D., McCarty, W., Coy, L., Gelaro, R., Huang, A., Merkova, D., et al. MERRA-2
input observations: Summary and assessment (2016).
61. Randles, C. A., da Silva, A. M., Buchard, V ., Colarco, P. R., Darmenov, A., Govindaraju,
R., et al. The MERRA-2 aerosol reanalysis, 1980 onward. Part I: System description and
data assimilation evaluation. Journal of Climate 30, 6823–6850. doi:10.1175/JCLI-D-
16-0609.1 (2017).
62. Bosilovich, M., Lucchesi, R. & Suarez, M. MERRA-2: File specification (2015).
63. Danielson, J. J. & Gesch, D. B. Global multi-resolution terrain elevation data 2010 (GMTED2010)
(US Department of the Interior, US Geological Survey, 2011).
64. Carabajal, C. C., Harding, D. J., Boy, J.-P., Danielson, J. J., Gesch, D. B. & Suchdeo, V . P.
Evaluation of the global multi-resolution terrain elevation data 2010 (GMTED2010) using
ICESat geodetic control in International Symposium on Lidar and Radar Mapping 2011:
Technologies and Applications 8286 (2011), 82861Y.
65. Pan, S. J. & Yang, Q. A survey on transfer learning. IEEE Transactions on knowledge and
data engineering 22, 1345–1359 (2009).
66. Torrey, L. & Shavlik, J. in Handbook of research on machine learning applications and
trends: algorithms, methods, and techniques 242–264 (IGI global, 2010).
67. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Computation 9, 1735–
1780 (1997).
68. Maas, A. L., Hannun, A. Y ., Ng, A. Y ., et al. Rectifier nonlinearities improve neural network
acoustic models in Proc. icml 30 (2013), 3.
69. Santurkar, S., Tsipras, D., Ilyas, A. & Madry, A. How does batch normalization help opti-
mization? in Proceedings of the 32nd international conference on neural information pro-
cessing systems (2018), 2488–2498.
70. Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reduc-
ing internal covariate shift in International conference on machine learning (2015), 448–
456.
71. Wager, S., Wang, S. & Liang, P. S. Dropout training as adaptive regularization. Advances in
neural information processing systems 26, 351–359 (2013).
72. Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: a
simple way to prevent neural networks from overfitting. The journal of machine learning
research 15, 1929–1958 (2014).
73. Wang, Y ., Sivandran, G. & Bielicki, J. M. The stationarity of two statistical downscaling
methods for precipitation under different choices of cross-validation periods. International
Journal of Climatology 38, e330–e348 (2018).
74. Lanzante, J. R., Dixon, K. W., Nath, M. J., Whitlock, C. E. & Adams-Smith, D. Some
pitfalls in statistical downscaling of future climate. Bulletin of the American Meteorological
Society 99, 791–803 (2018).
75. Zender, C. S., Miller, R. & Tegen, I. Quantifying mineral dust mass budgets: Terminology,
constraints, and current estimates. Eos, Transactions American Geophysical Union 85, 509–
512 (2004).
76. Klingmüller, K., Pozzer, A., Metzger, S., Stenchikov, G. L. & Lelieveld, J. Aerosol opti-
cal depth trend over the Middle East. Atmospheric Chemistry and Physics 16, 5063–5073
(2016).
77. Al-Taani, A. A., Nazzal, Y . & Howari, F. M. Assessment of heavy metals in roadside dust
along the Abu Dhabi–Al Ain National Highway, UAE. Environmental Earth Sciences 78,
1–13 (2019).
78. Jablonowski, C. & Williamson, D. L. The pros and cons of diffusion, filters and fixers in
atmospheric general circulation models. Numerical techniques for global atmospheric mod-
els, 381–493 (2011).
79. Shaheen, A., Wu, R., Lelieveld, J., Yousefi, R. & Aldabash, M. Winter AOD trend changes
over the Eastern Mediterranean and Middle East region. International Journal of Climatol-
ogy 41, 5516–5535 (2021).
80. Maraun, D. & Widmann, M. Statistical downscaling and bias correction for climate re-
search (Cambridge University Press, 2018).
81. LeCun, Y ., Bengio, Y . & Hinton, G. Deep learning. nature 521, 436–444 (2015).
82. Krizhevsky, A., Sutskever, I. & Hinton, G. E. Imagenet classification with deep convolu-
tional neural networks. Communications of the ACM 60, 84–90 (2017).
83. Tompson, J. J., Jain, A., LeCun, Y . & Bregler, C. Joint training of a convolutional network
and a graphical model for human pose estimation. Advances in neural information process-
ing systems 27 (2014).
84. Mikolov, T., Deoras, A., Povey, D., Burget, L. & Černocký, J. Strategies for training large scale neural network language models in 2011 IEEE Workshop on Automatic Speech Recognition & Understanding (2011), 196–201.
85. Radford, A., Narasimhan, K., Salimans, T., Sutskever, I., et al. Improving language under-
standing by generative pre-training (2018).
86. Vu, M. T., Aribarg, T., Supratid, S., Raghavan, S. V . & Liong, S.-Y . Statistical downscal-
ing rainfall using artificial neural network: significantly wetter Bangkok? Theoretical and
applied climatology 126, 453–467 (2016).
87. Chaudhuri, C. & Robertson, C. CliGAN: A structurally sensitive convolutional neural net-
work model for statistical downscaling of precipitation from multi-model ensembles. Water
12, 3353 (2020).
88. Misra, S., Sarkar, S. & Mitra, P. Statistical downscaling of precipitation using long short-
term memory recurrent neural networks. Theoretical and applied climatology 134, 1179–
1196 (2018).
89. Wang, M., Franklin, M. & Li, L. Generating Fine-Scale Aerosol Data through Downscaling
with an Artificial Neural Network Enhanced with Transfer Learning. Atmosphere 13, 255
(2022).
90. Kingma, D. P., Welling, M., et al. An introduction to variational autoencoders. Foundations
and Trends® in Machine Learning 12, 307–392 (2019).
91. Kingma, D. P. & Welling, M. Auto-encoding variational bayes. arXiv preprint arXiv:1312.6114
(2013).
92. Razavi, A., Van den Oord, A. & Vinyals, O. Generating diverse high-fidelity images with
vq-vae-2. Advances in neural information processing systems 32 (2019).
93. Peng, J., Liu, D., Xu, S. & Li, H. Generating diverse structure for image inpainting with
hierarchical VQ-VAE in Proceedings of the IEEE/CVF Conference on Computer Vision and
Pattern Recognition (2021), 10775–10784.
94. Bahuleyan, H., Mou, L., Vechtomova, O. & Poupart, P. Variational attention for sequence-
to-sequence models. arXiv preprint arXiv:1712.08207 (2017).
95. Sun, G., Zhang, Y ., Weiss, R. J., Cao, Y ., Zen, H., Rosenberg, A., et al. Generating diverse
and natural text-to-speech samples using a quantized fine-grained VAE and autoregressive
prosody prior in ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech
and Signal Processing (ICASSP) (2020), 6699–6703.
96. Eguchi, R. R., Anand, N., Choe, C. A. & Huang, P.-S. IG-VAE: generative modeling of
immunoglobulin proteins by direct 3D coordinate generation. Biorxiv 2020, 8 (2020).
97. Blei, D. M., Kucukelbir, A. & McAuliffe, J. D. Variational inference: A review for statisti-
cians. Journal of the American statistical Association 112, 859–877 (2017).
98. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., et al. Attention
is all you need. Advances in neural information processing systems 30 (2017).
99. Barkley, M. P., González Abad, G., Kurosu, T. P., Spurr, R., Torbatian, S. & Lerot, C. OMI
air-quality monitoring over the Middle East. Atmospheric Chemistry and Physics 17, 4687–
4709 (2017).
100. Achilleos, S., Al-Ozairi, E., Alahmad, B., Garshick, E., Neophytou, A. M., Bouhamra, W.,
et al. Acute effects of air pollution on mortality: a 17-year analysis in Kuwait. Environment
international 126, 476–483 (2019).
101. Cohen, A. J., Brauer, M., Burnett, R., Anderson, H. R., Frostad, J., Estep, K., et al. Estimates
and 25-year trends of the global burden of disease attributable to ambient air pollution: an
analysis of data from the Global Burden of Diseases Study 2015. The Lancet 389, 1907–
1918 (2017).
102. Alolayan, M. A., Brown, K. W., Evans, J. S., Bouhamra, W. S. & Koutrakis, P. Source
apportionment of fine particles in Kuwait City. Science of the Total Environment 448, 14–
25 (2013).
103. Ettouney, R. S., Zaki, J. G., El-Rifai, M. A. & Ettouney, H. M. An assessment of the air
pollution data from two monitoring stations in Kuwait. Toxicological and Environ Chemistry
92, 655–668 (2010).
104. Brunekreef, B., Beelen, R., Hoek, G., Schouten, L., Bausch-Goldbohm, S., Fischer, P., et al.
Effects of long-term exposure to traffic-related air pollution on respiratory and cardiovas-
cular mortality in the Netherlands: the NLCS-AIR study. Research report (Health Effects
Institute), 5–71 (2009).
105. Ibrahimou, B., Salihu, H., Gasana, J. & Owusu, H. Risk of low birth weight and very low
birth weight from exposure to particulate matter (PM2. 5) speciation metals during preg-
nancy. Gynecol. Obs 4, 2161–0932 (2014).
106. Council, N. R. et al. Review of the department of defense enhanced particulate matter
surveillance program report (2010).
107. Merlone, A., Al-Dashti, H., Faisal, N., Cerveny, R. S., AlSarmi, S., Bessemoulin, P., et al.
Temperature extreme records: World Meteorological Organization metrological and meteo-
rological evaluation of the 54.0 C observations in Mitribah, Kuwait and Turbat, Pakistan in
2016/2017. International Journal of Climatology 39, 5154–5169 (2019).
108. Khalaf, F. Desertification and aeolian processes in the Kuwait Desert. Journal of Arid Envi-
ronments 16, 125–145 (1989).
109. Li, J., Garshick, E., Al-Hemoud, A., Huang, S. & Koutrakis, P. Impacts of meteorology and
vegetation on surface dust concentrations in Middle Eastern countries. Science of the total
environment 712, 136597 (2020).
110. Al-Dousari, A. M., Al-Awadhi, J., et al. Dust fallout in northern Kuwait, major sources and
characteristics. Kuwait Journal of Science 39, 171–187 (2012).
111. PACI. The public authority for civil information 2019.
112. Masri, S., Garshick, E., Hart, J., Bouhamra, W. & Koutrakis, P. Use of visual range mea-
surements to predict fine particulate matter exposures in Southwest Asia and Afghanistan.
Journal of the Air & Waste Management Association 67, 75–85 (2017).
113. Sønderby, S. K., Sønderby, C. K., Maaløe, L. & Winther, O. Recurrent spatial transformer
networks. arXiv preprint arXiv:1509.05329 (2015).
114. Giuliari, F., Hasan, I., Cristani, M. & Galasso, F. Transformer networks for trajectory fore-
casting in 2020 25th international conference on pattern recognition (ICPR) (2021), 10335–
10342.
115. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional
transformers for language understanding. arXiv preprint arXiv:1810.04805 (2018).