Using GIS to Explore the Tradeoffs in Hydrographic Survey Planning: An Investigation of Sampling,
Interpolation, and Local Geomorphology
by
Sarah Rosenthal
A Thesis Presented to the
Faculty of the USC Dornsife College of Letters, Arts and Sciences
University of Southern California
In Partial Fulfillment of the
Requirements for the Degree
Master of Science
(Geographic Information Science and Technology)
May 2020
Copyright 2020 Sarah Rosenthal
To Christopher Kelley.
Acknowledgments
I would like to thank all the faculty at the Spatial Sciences Institute with whom I have had the privilege to work during my time in the GIST program. I would also like to thank my dad for all his patience and support throughout my life. This graduate program gave me a new respect for him and his career; one day I hope to know as much about computer science as he does.
Table of Contents
Acknowledgments
List of Tables
List of Figures
List of Abbreviations
Abstract
Chapter 1 Introduction
1.1 History of ocean mapping
1.2 Bathymetry compilation efforts
1.4 Significance and goals of research
1.5 Thesis organization
Chapter 2 Background
2.1 Creating a surface
2.2 Related Works
2.2.1 Inverse distance weighting
2.2.2 Empirical Bayesian kriging
2.2.3 Machine learning and random forests
2.3 Summary
Chapter 3 Methods
3.1 Research objectives
3.2 Study area
3.3 Spatial data preparation
3.4 Procedure
3.4.1 Interpolating surfaces
3.4.2 Comparing models
Chapter 4 Results
4.1 Interpolation accuracy
4.2 Subsample accuracy
Chapter 5 Conclusions
5.2 Limitations
5.3 Future work
References

List of Tables
Table 1: SWATHplus-M sonar specifications
Table 2: Number of points retained in each of the generated datasets
Table 3: Residual values between random subsamples compared to ground truth

List of Figures
Figure 1: GEBCO 2014 model, underlying high-quality data, and multibeam sonar tracks
Figure 2: Bathymetry collected in Monterey Bay, California
Figure 3: Study area classified by benthic region
Figure 4: Empirical semivariograms for observed data values
Figure 5: RMSE between ground truth and surfaces interpolated using sample points collected half as frequently as the original measurements
Figure 6: Medium profile shelf interpolations using 50% of measurements
Figure 7: High profile shelf interpolations using 50% of measurements
List of Abbreviations
ASV/AUV Autonomous surface vehicle/autonomous underwater vehicle
CART Classification and regression trees
DEM Digital elevation model
EBK Empirical Bayesian kriging
ENC Electronic navigation chart
GEBCO General Bathymetric Chart of the Oceans
GIS Geographic information science/system
GPS Global positioning system
IDW Inverse distance weighted
IoT Internet of Things
LIDAR Light detection and ranging
LLN Law of large numbers
MBARI Monterey Bay Aquarium Research Institute
MBES Multibeam echosounder
MLA Machine learning algorithm
RF Random forest
RTK Real time kinematic
SDB Satellite derived bathymetry
SDG Sustainable Development Goal
SONAR Sound navigation and ranging
Abstract
The lack of seafloor information is often a result of the challenging logistics and expense involved in acquiring data in this unique environment. Yet, despite the sparsely sampled environment, many significant efforts exist to create global bathymetry models. However, there is a public misunderstanding of the true sampling density in the ocean that can be largely attributed to contemporary interpolation and enhanced cartography: the seafloor is more sparsely sampled than most people realize. Thus, it is important to understand the influence of the underlying source data and the interpolation technique used when creating an accurate digital bathymetry model. The accuracy of a surface can depend on sampling density, interpolation method, and local geomorphology. If a bathymetric surface can be created accurately from sparse measurements, mission planning can be directed to sample the seafloor at only the resolution required. The results of this thesis research encourage future exploration of a computationally efficient way to assess the best interpolation method in different regions under different conditions.
Chapter 1 Introduction
This introductory chapter discusses the history of seafloor mapping, why bathymetry
measurements remain so sparse, and why there are limited accurate measurements of shallow
bathymetry. To conclude this section, the research goals and significance of this thesis are
outlined.
1.1 History of ocean mapping
The term bathymetry refers to the measurement of various depths and shapes formed by
underwater rock or sediment features on the seabed (NOAA 2019). Bathymetric data is an
essential component within the field of hydrography that can be used to characterize baseline information. Bathymetry is critical to many multidisciplinary oceanographic applications, including biological, geophysical, atmospheric, and even meteorological studies.
Endeavors to obtain bathymetry have presented a challenging task throughout history.
The first historical accounts of recording measurements of underwater depth date back to ancient Egypt around 1800 B.C. (Theberge 1989). Early measuring techniques involved a weighted lead rope deployed from the side of a ship and lowered into the water until it reached the bottom. While considered a practical preliminary tool at the time, this method measured only one specific point on the seafloor at a time and did not include precise position measurements.
Bathymetric surveys have since evolved from simple, extremely time-consuming techniques to innovative methods involving sound waves. Largely inspired by submarine warfare in World War I, sensors that used sound waves to listen for and detect objects underwater were developed around 1920 (University of Rhode Island 2019). This efficient two-way sound travel prompted the significant development of single beam echosounders (SBES) (Mayer 2006). This improved technique sends an acoustic signal from a ship down to the seafloor and back, permitting depth to be calculated from the two-way travel time of the signal. This method was expanded upon with the multibeam echosounder (MBES), which provides extended swath coverage and increased efficiency in terms of ship resources.
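As a simple illustration of that calculation (not an equation from the original text), depth follows from the two-way travel time and an assumed nominal sound speed in seawater of roughly 1,500 m/s:

$$ d = \frac{c\,t}{2}, \qquad \text{e.g. } d = \frac{1500\ \mathrm{m/s} \times 0.2\ \mathrm{s}}{2} = 150\ \mathrm{m} $$

In practice the sound speed varies with temperature, salinity, and pressure, which is why hydrographic surveys apply measured sound-speed profiles rather than a single constant value.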
While sound navigation and ranging (SONAR) is considered the most effective high-resolution acquisition technique, limitations exist. These systems are typically mounted on the hull of ships, which restricts their use to mapping deeper waters. Ships operating hydrographic missions require slower transit speeds in order to accurately acquire seafloor coverage and avoid gaps. The fuel required to operate large ships constrains their use to regions whose hydrographic offices have budgets designated to support these costly missions. The tradeoff between resolution, propagation, and coverage has been recognized as the limiting factor in collecting MBES data (Mayer 2016).
Shallow regions often contain physical and morphological features that make these areas the most challenging to survey by ship. Modern remote sensing technologies such as satellite-derived bathymetry (SDB) and light detection and ranging (LIDAR) can also provide information on the seafloor. SDB is derived from multispectral imagery and uses remote platforms to collect data spanning multiple spectral bands. Bathymetric LIDAR uses a green wavelength to penetrate the water column through airborne acquisition technology. Both acquisition sources, however, come at a high cost in terms of technical production and operation. These optical approaches to measuring bathymetry are also limited by water clarity and depth. In coastal regions, collection of adequate shallow bathymetric measurements with optical techniques works only as well as the water clarity permits.
Currently, there is a shift in the hydrographic community towards applications of
autonomous survey technology and processing techniques. Two common forms are autonomous
surface vehicles (ASV) and autonomous underwater vehicles (AUV). This transition towards autonomy reduces the time and human effort that make ship-based surveys so costly. Autonomous systems offer the advantage of operating in hazardous sites and areas where ships cannot navigate, such as shallower waters or underwater caves. Nevertheless, high quality data of
the shape and depth of the seafloor remain a foundational requirement for operation planning
before a vehicle can be deployed.
1.2 Bathymetry compilation efforts
Technological advances in the past few decades have seen a substantial increase in the
ability to compute and digitally visualize the globe. The international General Bathymetric Chart
of the Oceans (GEBCO) has spent the past 100 years collecting and sharing global bathymetry
data. This dataset compiles high resolution MBES data fused with a background of coarse
resolution satellite altimetry. Figure 1 reveals the GEBCO 2014 model, an enhanced view of the
quality of the surface, and the underlying data used to create the model. A recent evaluation by
Mayer (2018) showed that 82% of the GEBCO data product contains no data values. This means
that only 18% of gridded cells across the globe contain actual data values. The results of this gap
analysis also revealed that of those gridded cells that contain data, only 9% contained data
collected by modern sonar technology capable of producing reliable measurements.
Figure 1: GEBCO 2014 model (left), the sparsity of underlying high-quality data (upper right),
and the actual multibeam sonar tracks (lower right) traveled to collect the swath data. (Mayer
2018).
The coastal regions represent an unusually challenging area for hydrographers. Due to the
lack of data in this highly transitional zone, the coastal zone is often referred to as the “white
ribbon” (Leon et al. 2013). Much of the known data in coastal regions comes from electronic
navigation chart (ENC) soundings. Hydrographic standards allow contours or sounding points to be extracted from ENCs for use in the production of gridded models. There is often a strong bias in the spatial distribution of these soundings in favor of societal needs (Zoraster and Bayer 1992, Haigang et al. 2005): soundings are primarily concentrated around shoals, shipping lanes, and ports.
Many coastal regions also impose restrictions on data access, making published ENCs the only viable option for estimating the seafloor. Coastal regions are also notorious for continuous change due to many natural and anthropogenic influences. These frequent changes demand updated survey coverage in order to maintain accurate data. At a local scale, incomplete or missing coastal elevation data can prevent communities from understanding their own region and lead to misrepresented needs for management and protection (Hogrefe, Wright, and Hochberg 2008). Legacy depths displayed on ENCs are often less than ideal and do not accurately reflect the current depth.
1.4 Significance and goals of research
The ocean covers 71% of the Earth’s surface and is a critical component to sustaining
life, controlling climate, facilitating commerce and managing marine resources. A complete
digital representation of the seafloor is necessary for an understanding of ocean science.
Additionally, the physical, chemical, and biological characteristics of many marine systems are
influenced by benthic depth and features. Yet despite its importance, most of the marine
environment remains unmapped and unexplored.
A challenge in creating an accurate bathymetry model is filling the gaps where data
acquisition is consistently difficult, expensive, or not accessible. Enabled by advances in
computer science and geospatial technology, interpolation allows for continuous surfaces to be
generated from remotely sensed data without the need to measure each individual location. Yet
the elevation models obtained from interpolation analysis are often blindly accepted as the
absolute truth. It is important to consider the underlying spatial configuration and choices made
in the process of creating a surface.
This thesis works towards understanding the influence of source data density and interpolation methods on bathymetry accuracy. Specifically, this thesis research addresses the question of which interpolation method provides superior results when measurements are collected at half the spatial frequency of the original sampling density. The second objective considers whether fewer measurements can be taken while generating relatively similar results. In order to accomplish these objectives, two geomorphic regions within Monterey Bay will be explored using three different interpolants and four different densities of input data.
The significance of this research will be an increased understanding about the trade-offs
made when constructing a bathymetric model of a region with regard to sampling density,
interpolation methods, and local geomorphology. The main takeaway from the results derived in this study is an improved understanding of how interpolants perform, with varying levels of sampling sparsity, in different types of coastal geomorphology. The implications of these
results can assist with decision making in planning future coastal surveys, as well as help
understand the results and accuracy of existing surveys.
1.5 Thesis organization
The organization of the remainder of this thesis begins with a review of the published
literature on the process of creating surfaces, and several interpolation techniques. The third
chapter discusses the methodology used in this study, including an overview of the study area
and its properties, GIS data preparation, and interpolation analysis. The fourth chapter provides
the results of the analysis. The final chapter concludes with a detailed discussion of results, study
limitations, and future work.
Chapter 2 Background
This chapter begins with a review of the literature on the process of generating surfaces from point measurements, and then considers different techniques for doing so along with their benefits and shortcomings.
2.1 Creating a surface
Models of elevation have been a subject of interest since the sixteenth century.
Techniques that create these models have seen substantial changes over the last few decades as
technology advances (Eakins and Grothe, 2014). In an ideal world, consumers of spatial data
would be able to completely rely on digitized surfaces that are composed of tightly grouped
measurements. Yet the ubiquitous nature of sparse remotely sensed data demands the existence
of many different methods, techniques and models that create a surface from different types of
data. Ultimately sources of error in a digital surface can be the result of input data or decisions
made by an analyst. It is important to understand these sources of model uncertainty in order to assess the accuracy of elevation models created using these methods.
Digitizing elevation is a well-trodden area of research. This subject is unique because it is
purely enabled by Geographic Information System (GIS) and computer technology, rather than
direct measurements (Deng, Wilson, Gallant 2016). Many studies have evaluated the role of a
GIS as a means of storage and management for elevation models. A study by Jordan (2007)
suggested that the functionality provided by a GIS adds additional components of data
management, analysis, and generation of various outputs to what would otherwise be limited to
data collection and storage. Another major claim by the author was that problems inherent in
remotely sensed image analysis can be overcome by using a GIS to tease out the bare earth
surface in order to assess the field properly.
Reviews like Hogrefe, Wright, and Hochberg (2008) suggest the deficiency of reliable
near coastal bathymetry is due to the turbidity, shallow features, and surf conditions that inhibit
optimal sampling efforts. Plant et al. (2002) evaluated the magnitude and scale of errors that are
related to sampling and use of nearshore bathymetry data. The results of the study supported the
idea that environmental conditions and the type of sensor directly influence the arrangement and
size of gridded pixels. Additionally, the authors stated that an analysis of the interpolation error
can allow for future design of optimal sampling strategies. While many modern hybrid
techniques have emerged to address this problem, it is well accepted that management of dynamic coastal regions demands accurate and repetitive DEM techniques (Bernstein 2002, Mitasova et al. 2003, 2004, Bernstein et al. 2011).
While analyzing sparse data can be problematic, geographic solutions exist that enable
continuous scalar fields to be created from sets of discrete measurements. O’Sullivan and Unwin
(2010) outline a general workflow for creating a continuous surface from remotely sensed data.
The authors suggest that this two-part process involves sampling the physical surface and
choosing a form of interpolation. Sampling produces an output from electronic sensing
equipment which is provided to an analyst as a series of numeric values that represent a mapped
variable across a surface. Using these known points, values for unmeasured locations can be
predicted using algorithms that summarize the spatial relationship between known points
(Michell and Minami 1999). The underlying theory was originally demonstrated in a study by
Tobler (1970). The author animated cartographic simulations of urban growth to show correlated
patterns between neighbors. A major claim made by the author is that distance is the most
important variable that determines the interaction between phenomena or objects in space. This
concept is now widely known as spatial autocorrelation and is an assumed precondition for
interpolation analysis.
2.2 Related Works
Interpolation analysis works to find the function that passes through known points while
providing an accurate representation of all unmeasured values (Burrough 1986, McCullagh 1988,
Robinson 1994). In the context of spatial data, interpolation is used to build continuous datasets
from a limited amount of discretely measured points. Throughout the literature, a broad range of
interpolation models, algorithms, and techniques are discussed. Additionally, over the past few
decades, further developments in technology and computer science have broadened opportunities
for control over different aspects within the interpolation process (Achiellos 2008).
The choice of an interpolation method is ultimately subjective, and the method should be chosen to best fit its application (McCullagh 1988, Achiellos 2008). The review by Schut (1976)
demonstrated that the accuracy of a DEM is highly dependent on the complexity of terrain
characteristics, sampling rates, and the interpolant used. Additionally, the study by Achiellos
(2008) is a good example of the different results that can be produced when using different
methods, techniques, and models. The study also claimed that the selection of parameters plays a
significant role in the outcome of a DEM. The use of an interpolator or parameters that are not
well suited for an application can lead to incorrect decision making. By contrast, given ideal
conditions of spatial distribution and point density, even the most basic of interpolators can
provide exceptional results (Schut 1976).
Accordingly, an in-depth knowledge of various methods and applications can assist in the
selection of an appropriate interpolant. A basic assumption when creating a DEM is that the raw data exhibit spatial dependence and that the technique chosen will meet the needs of the desired product (Robinson 1994). While no comprehensive study has concluded that any method
is more suitable than another, the wide range of published literature instead focuses on using
various methods along with different data.
The following section describes the three methods used in this thesis research based on
previous literature, along with contemporary applications and uses.
2.2.1 Inverse distance weighting
One of the most basic interpolation techniques available is the inverse distance weighted
(IDW) interpolation method (Burrough 1986, Schut 1976, Achiellos 2008). Since this
deterministic method is included in most systems that create and manage DEMs, its use in spatial
research is very common. This interpolator considers a local neighborhood and predicts values
that generate a surface that passes through every data point. IDW assumes that each measured
location has local influence on the surrounding points that lessens as a function of distance. A
review by de Mesnard (2012) supported this claim by demonstrating the use of IDW to model pollution. The author considered measured pollution data as “reference points” and
used this to create a model. One flaw the study revealed was that different types of pollution data
warrant different considerations instead of considering all types of data with the same arbitrary
exponent. The author suggested future studies consider a more advanced method of interpolation,
such as kriging, for this use case.
Another study by Lu and Wong (2008) used IDW interpolation analysis to evaluate the
sensitivity of the parameters used for prediction. The authors reveal that the output surface
produced by IDW can also vary depending on the user’s level of a priori knowledge of the
subject and necessary parameters. One important parameter in the IDW method is the search radius, which may be variable or fixed. A variable search radius controls the number of points considered at once while allowing the distance to vary depending on the spatial configuration of the data set. A fixed search radius holds a constant neighborhood size and uses a minimum number of
points to determine what is considered for interpolation. A study by Chen and Liu (2011) used
IDW to consider 46 rainfall stations along with rainfall data. The authors found that if the radius
distance examined too many or too few stations, problems could arise within the analysis. The
issues noted included increasing computational runtime (when too many stations are considered)
and an inaccurate representation of the surface (if too few or no stations were considered).
The power function is considered the most important parameter used to compute predictions with IDW. A study by Fotheringham and O'Kelly (1989) exemplifies this importance. The authors reveal that the decrease in the spatial relationship between two points is not simply proportional to distance alone. The IDW method corrects for this by using a power function, or distance decay parameter, to modify the weight of the spatial interaction. Several studies have identified this power function as the most important factor in the IDW method (Burrough and McDonnell 1998, Priyakant et al. 2003). Another finding was that as the distance between a measured point and the prediction location increases, the weight of that point decreases; a higher power value therefore gives distant points less influence.
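For reference, the general form of the IDW predictor (standard notation, not reproduced from this thesis) is:

$$ \hat{z}(s_0) = \sum_{i=1}^{n} \lambda_i\, z(s_i), \qquad \lambda_i = \frac{d_{i0}^{-p}}{\sum_{j=1}^{n} d_{j0}^{-p}} $$

where $z(s_i)$ are the measured depths, $d_{i0}$ is the distance from sample location $s_i$ to the prediction location $s_0$, and $p$ is the power (distance decay) parameter; larger values of $p$ concentrate the weight on the nearest points.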
Despite its popularity, the exact and deterministic IDW method has limitations. One
restriction in this method is that it does not consider the spatial variability of the phenomena.
Instead, IDW acts as an exact interpolator whose output surface honors the measured points exactly. One example of this is described in the study by Erdogan (2009). The author used IDW in a comparative analysis of interpolation methods while considering accuracy and uncertainty. The results of the study showed that the IDW surface produced higher uncertainty than those of the other methods. The uncertainty can be attributed to the tendency of the IDW method to create flattened peaks and valleys. This is often the case when interpolating a sparse density of point measurements and can produce misleading representations of terrain. IDW is the most basic interpolant considered in this study and as a result serves as a baseline against which to compare the more sophisticated methods.
2.2.2 Empirical Bayesian kriging
While deterministic methods apply mathematical functions in order to describe a field,
probabilistic techniques consider the points within a field to be statistical in nature (Krivoruchko
2012, Wilson 2018). Borgman et al. (1994) recognized that the simplified assumptions and exact
predictions introduced by deterministic methods do not necessarily recognize environmental
variability nor address the spatial behavior between sample points. For this reason, geographers often favor describing fields with methods rooted in statistics, because they lead to more realistic views of scalar data.
Within the class of probabilistic interpolators is a method called kriging. Kriging is also
referred to as the optimal spatial predictor or best linear unbiased predictor (Cressie 1990). This
method originally evolved to meet the demand for a quantitative way of characterizing spatial
autocorrelation and building continuous datasets (Oliver and Webster 1990). Since the 1960s,
many applications of kriging have been published within meteorology, agriculture, mining,
epidemiology, hydrology and many other environmental sciences (e.g. Oliver and Webster 1990,
Moore and Carpenter 1999, Skøien, Merz & Blöschl 2005, Krivoruchko 2012).
Kriging was explained in a study by Lev Gandin (1959) on optimum interpolation. The
author suggested that optimal prediction and prediction uncertainty depends on covariance. The
covariance can be quantified by estimating a semivariogram as a function of the distance and
direction between pairs of coordinates (Krivoruchko 2012). Matheron (1963) expanded upon this concept through his Regionalized Variables Theory. A regionalized variable is a continuously varying numerical function whose spatial variation cannot be accurately described by a simple mathematical function across space (Dias et al. 2018).
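For reference, the empirical semivariogram that quantifies this spatial covariance is commonly estimated as (standard form, stated here as background rather than taken from the thesis):

$$ \hat{\gamma}(h) = \frac{1}{2\,|N(h)|} \sum_{(i,j) \in N(h)} \left[ z(x_i) - z(x_j) \right]^2 $$

where $N(h)$ is the set of point pairs separated by approximately the lag distance (and, for anisotropic models, direction) $h$, and $z(x_i)$ is the measured value at location $x_i$.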
Kriging is desirable because it offers the advantage of generating statistical estimations
with minimum error in addition to a quantified measurement of that estimation (Cressie 1990).
Other reviews have suggested that kriging predictors filter measurement error, yielding greatly reduced prediction uncertainty compared to other models of measurement error (Krivoruchko 2012). Zimmerman et al. (1999) showed that kriging generated more accurate results than did IDW regardless of the landform type or sampling scheme. The authors' results
demonstrate the flexibility of kriging along with its ability to analyze correlation and covariance
across varying spatial structures (Arun 2013).
Among the many different flavors of kriging is an approach called Empirical Bayesian
Kriging (EBK). This method requires minimal interaction from the analyst as it automatically
calculates parameters through many subsets of local models and simulations (Krivoruchko and
Butler 2013). EBK differs from other kriging methods because it accounts for the error introduced by estimating the semivariogram, which other kriging methods ignore. This is accomplished through a sophisticated kriging approach
that calculates multiple semivariogram models throughout the study region, as opposed to a
single semivariogram apparent in other methods (Krivoruchko 2012). The algorithm considers a
subset of the data and through iterative simulations it averages many semivariograms across
space. By using many local models, the algorithm can adapt to small scale changes in the data
leading to accurate predictions. Although EBK has attractive qualities, they come at the cost of processing speed, and the method allows only a limited amount of customization (Esri 2018). This form of minimally interactive modeling also has the advantage of opening many doors for automating interpolation analysis across large-scale data. One study suggested that Bayesian kriging not only obtained more precise results than other kriging methods, but that the process also leads to reduced costs without sacrificing quality of information (Cui, Stein, and Myers 1995).
Across the literature regarding coastal studies, the generation of bathymetric models from
interpolation analysis is a well-studied technique (e.g. Righton and Mills 2006, Ryan et al. 2007).
EBK was applied by the U.S. Geological Survey (USGS) to successfully interpolate topobathymetric DEMs (Danielson et al. 2016). The authors found that this method performed well when mosaicking together sparse bathymetry and dense topography. Another success of EBK in this application was that the method automated the processing of large volumes of data.
Since coastal models demand frequent updates because of the dynamic nature of the surf,
automating interpolation with EBK provided a consistent and reliable technique to use repeatedly
with these models.
2.2.3 Machine learning and random forests
An explosive amount of data is now available due to the rise of computers, the Internet of
Things (IoT), and advanced acquisition technology. Yet in this world where nearly everything
can be measured and monitored, a review by the New York Times (2009) expressed that “data is
merely the raw material of knowledge”. The use of machines to solve advanced problems offers
an advantage because it allows for the discovery of relations that could not otherwise be detected
by humans alone. Today, machine learning is pervasive among many different domains of real-world applications and its use is growing exponentially. The term machine learning encompasses
many types of work such as data mining, pattern recognition, multivariate statistics, and
predictive analytics.
A model refers to a generalized representation used for predictions that can be
extrapolated to instances where data is not available (Esri 2019). In machine learning algorithms
(MLA), relationships are learned by training a model to detect correlations between dependent
and explanatory variables. These refer, respectively, to the phenomenon to be predicted and the variables that cause or explain it. Where no measurements are present, the model draws on the explanatory variables to predict a value. Guisan and Zimmerman (2000) refer to this model
formulation as a data driven prediction for a specific outcome based on an observed spatial
pattern. A major claim by the authors stated that by combining machines with optimal statistics,
the data speaks for itself with exceptional predictive abilities.
MLAs iteratively explore datasets, recursively working through a series of questions and decisions that connect variables and infer patterns used for prediction. The use of MLA to classify remotely sensed
data of Australian forests serves as an exceptional example of a large sample size that can be
analyzed to form predictions (Brown de Colstoun et al. 2003). The results of the study showed
that the land cover map produced an overall accuracy of 82% when tested against a validation
dataset and 99.5% accuracy for forest classes. These successful results show that the approach used in the case study can be scaled up to applications across the entire national park system.
One strategy employed by MLAs is classification and regression trees (CART). This technique was described by Breiman (2001), who demonstrated that an efficient way to train predictive algorithms is through a series of decision trees in which
explanatory variables are split into different branches. Since this type of supervised learning algorithm can predict both categorical (classification) and continuous (regression) data, it has become one of the most widely used MLA methods. As the study stated, the objective of the model
is to associate a desired output with a specific value of a specific variable at each stage. The
study also demonstrated that features are recursively considered within a hypothesis class while
searching for a function that fits the data. Depending on the training data and hypothesis space,
MLAs are commonly known to learn the training data too well. Dietterich (1995) explained this central
problem inherent in learning algorithms by describing the tendency of a model to fit an objective
function too closely to the training dataset. This is known as overfitting a model and can create
issues when noise is precisely mimicked from a dataset.
Breiman (2001) used random forests (RF) to directly tackle the issue of overfitting. As a
more robust way to learn generalized patterns, a forest is built through an ensemble of random
trees. When individual trees work together as a forest, the model has better predictive performance than any of its constituent learners alone (Esri 2019). Probability theory and the law of large numbers (LLN) state that the aggregated result will be a more reliable approximation of the truth than a single, independent realization (Judd 1985, Durrett 1995). Other studies
also support this claim by investigating the impact of randomness on tree classifiers and model
accuracy (Amit and Geman 1997).
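A small illustration of that ensemble effect on synthetic data, using scikit-learn (an illustrative sketch only; none of the cited studies used this code): a single fully grown regression tree tends to overfit noisy training data, while averaging many bootstrapped trees in a random forest generally lowers the error on held-out data.

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic noisy 1-D "terrain" profile
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(1000, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=1000)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A single deep tree fits the training noise; the forest averages 200 such trees
tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
forest = RandomForestRegressor(n_estimators=200, random_state=0).fit(X_train, y_train)

for name, model in (("single tree", tree), ("random forest", forest)):
    mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: test MSE = {mse:.3f}")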
While successful in many domains, RF is not specifically designed for spatial applications. Many published studies have described the process of creating spatially specific implementations of the method. Sinha
et al. (2019) explored the application of RF to model population density when predicting parcel
aggregations that are downscaled from input training features. The results of the study strongly
suggested that including spatial autocorrelation of the training data played an important role in
minimizing the residual variance of predicted values. The authors concluded that more research
and specialized variants of the RF algorithm may lead to better predictions to be used with
spatially correlated and heterogeneous data.
Georganos et al. (2019) extended the RF algorithm and demonstrated its use with remote
sensing data along with population modeling. This method was adapted to deal with the
challenges of highly spatially heterogeneous data. The authors showed that their specialized
implementation of a geographical RF algorithm offered encouraging support for its use as a
spatially predictive and exploratory tool. This claim was validated by the study’s results that
showed a lower residual value of geographic RF when compared to non-spatial implementations
of the same method.
Studies have demonstrated the construction of water depth models using machine
learning. Manessa et al. (2016) suggested that the depth variable and surface reflectance have a
complex influence on data collected in shallow coral reef waters. The authors advocate
incorporating Worldview-2 satellite images and single beam echosounder measurements in order
to create a robust non-linear regression of the RF algorithm. Additionally, the authors
incorporated six variable bands and their logarithms in the regression equation as robust
explanatory variables. Sagawa et al. (2019) also used satellite derived bathymetry and RF to
predict bathymetric values under conditions of very sparse data. The authors were able to predict
a depth estimation model with minimal error by incorporating satellite image analysis based on
cloud computing.
2.3 Summary
Virtually all DEMs we interact with on a daily basis are created using some form of
interpolation. The interpolation technique plays an important role in achieving a high accuracy
elevation model from discretely collected points. However, there are many different approaches
and techniques to consider when creating a surface. The published literature rarely supports the
use of one method over another, but instead provides comparative case studies to support the
superior use of an interpolant for a given application.
For the purpose of this project, the exact deterministic method of IDW was used as a
benchmark to compare the results of the other methods. As supported by the literature above, the
IDW method is distance based and provides an interpolant that includes every point in the
output. Regardless of the underlying spatial process or location, IDW creates one model based on
known points. To predict unmeasured values, this method connects measured values so that the
minimum and maximum values occur at sampled points. This often creates surfaces with sharp
features and can misrepresent areas with steep unmeasured peaks and valleys. In the context of
this study, this method serves as a baseline to verify the hypothesized superior performance of the more sophisticated geostatistical methods.
EBK is an advanced semi-automated method that integrates the data and creates
predictions using multiple semivariogram models. This is accomplished by dividing the data into
many subsets and selecting the best combination of parameters within that subset. EBK has been
successfully demonstrated to capture multiple underlying spatial processes that drive
geomorphology. Furthermore, the use of local models allows this method to accurately predict
values in nonstationary data.
Lastly, given the presence of secondary variables, RF can provide a method to improve
interpolation predictions. RF and other MLA are frequently used to generate spatial predictions.
However, these methods often ignore the geography of measurements in the process (Hengl
2018). While this method is technically a non-spatial model, the use of spatial covariates is expected to increase a model's effectiveness with spatial data. Previous research has used
satellite images and regression models to predict bathymetry values in areas that are unreachable
by boat or plane. Given more time and resources, this research would take a similar approach to
explore this method. However, this thesis uses only the sampled data points and therefore
excludes the use of secondary datasets to drive the regression equation.
Chapter 3 Methods
This chapter describes the study area and discusses the sample data used to explore
multiple interpolation algorithms.
3.1 Research objectives
The purpose of this thesis is twofold. The first goal of this research addresses which
interpolation method will provide superior results when creating a continuous bathymetric DEM
using discrete measurements collected at half the frequency of the original survey. In order to
accomplish this objective, two geomorphic regions within Monterey Bay will be explored using
three different interpolants. As supported by the literature reviewed in the previous chapter, EBK
and forest-based regression were hypothesized to produce predictions superior to those obtained
by IDW. In order to validate this claim, IDW has been included to serve as a deterministic
baseline for comparison. The second objective of this research examines if fewer measurements
can be taken while generating relatively similar results. This can be addressed by creating a
series of uniformly thinned measurements and interpolating each using different methods. It was
hypothesized that different geomorphic regions will have varying thresholds of sampling density
required to create a reasonably accurate surface. The methods for addressing these two research
questions are outlined and discussed in this chapter.
3.2 Study area
For the purpose of this study, the well sampled region of Monterey Bay will be
considered the bathymetric ground truth to which the interpolated models will be compared.
Located off the central coast of California, the region’s unique benthic environment makes it an
attractive area to study. The Monterey Canyon is one of the largest underwater canyons in the
world and can be characterized by a rocky nearshore environment and dense kelp forests. The
21
region is also home to many species of marine life including sea otters, bottlenose dolphins,
elephant seals, humpback whales, sharks, and turtles.
Research in Monterey Bay is prioritized by many different scientific and conservation
initiatives with the goal of protecting this special area. The Monterey Bay Aquarium Research
Institute has prioritized the need to map the seafloor in and around the bay. These DEMs provide
researchers with a versatile source of information that can be leveraged for many research and
operational purposes.
Figure 2 shows the swath bathymetry provided by the USGS, which formed the primary spatial data component used in this study. Bathymetry is traditionally considered fuzzy data, yet the high spatial resolution of this dataset validated its application in this thesis research. The survey tracks covered the area inside the bay and collected elevation data on the medium and high profile shelf regions. The metadata provided with this bathymetry assessed the data's fitness for use. The information revealed cooperative weather conditions during hydrographic surveys, which permitted marine survey equipment to operate in ideal sea states and collect quality data. The weather during data collection is important to note since high surface winds and bubbles under the transducer are the primary causes of poor-quality bathymetry acquisition.
Figure 2: Bathymetry collected in Monterey Bay, California.
Bathymetry was acquired using a 234.5 kHz SEA (AP) Ltd. SWATHplus-M phase-differencing side-scan sonar mounted to a hull brace aboard the R/V Parke Snavely (Table 1). A common reference frame for the sonar heads, GPS antennae, and a CodaOctopus F180 inertial measurement unit was established throughout the survey with a Geodimeter 640 Total Station. Post-processing of erroneous soundings was completed on a networked workstation aboard the R/V Parke Snavely. Information on the error inherently introduced by the sensor and survey equipment was not identified in the metadata. Consequently, the derived results are specific to this dataset and subject to additional inherent sources of error.
Table 1: SWATHplus-M sonar specifications used in bathymetric data collection in Monterey Bay (USGS 2009).

Specification                         Value
Sonar frequency                       234 kHz
Maximum swath width                   300 m (typically 7-12 times water depth)
Resolution across track               5 cm
Transmit pulse length                 34 µs to 500 µs
Ping rate, 150 m swath width          10 pings per second
Ping rate, 300 m swath width          5 pings per second
Vertical accuracy at 57 m range       0.1 m
Vertical accuracy at 114 m range      0.2 m
Vertical accuracy at 171 m range      0.3 m
3.3 Spatial data preparation
To prepare the data for this thesis research, bathymetry surveyed in Monterey Bay was
acquired and downloaded from the USGS repository. The downloaded raster was transformed to
a point feature class in ArcMap 10.7.1. The preparation for this analysis necessitated use of two
versions of Esri’s ArcGIS software because the legacy Maritime Bathymetry extension is limited
to use only in ArcMap. The bathymetric soundings were then thinned using a shallow-biased
selection with a nominal thinning radius of 100 m. An advantage of using the Reduce Point
Density tool enabled through the Maritime Bathymetry extension is that the output feature class
is thinned for the purpose of increasing processing speed, while retaining the integrity of the
original collected data. In this case, the total number of point measurements was reduced from
81,975,729 soundings to 47,293 soundings in the feature class. This step simplified the analysis
by reducing the time spent running geoprocessing operations while ensuring that the program did
not crash.
The geodatabase was then connected to ArcGIS Pro 2.4 to continue preparing the
necessary datasets. The next step was to create multiple datasets composed of uniformly
distributed random selections of points each representing different levels of sampling coverage
and density. To accomplish this, an additional attribute was added to the original point feature
class containing all the measured point values. This new field was populated for each point
feature with a generated random number using a Python script. The feature class was then
iteratively selected to include 50, 25, 10, and 5 percent of the original points. Each selection was
individually exported to create a new point feature class. Introducing randomness in the process
of selecting points to be included in each layer allowed for an unbiased analysis.
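A minimal sketch of this selection step, assuming arcpy as exposed in ArcGIS Pro and illustrative dataset and field names (the exact script used in the thesis is not reproduced here):

import random
import arcpy

points_fc = "monterey_soundings"   # hypothetical thinned sounding feature class
rand_field = "RAND_VAL"

# Add a field and populate it with a uniform random number for every sounding
arcpy.management.AddField(points_fc, rand_field, "DOUBLE")
with arcpy.da.UpdateCursor(points_fc, [rand_field]) as cursor:
    for row in cursor:
        row[0] = random.random()
        cursor.updateRow(row)

# Export uniformly thinned subsets containing 50, 25, 10, and 5 percent of the points
for pct in (50, 25, 10, 5):
    where = f"{rand_field} <= {pct / 100.0}"
    layer = arcpy.management.MakeFeatureLayer(points_fc, f"sel_{pct}", where)
    arcpy.management.CopyFeatures(layer, f"soundings_{pct}pct")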
Different regions of the ocean can be characterized by different seabed geomorphology
and benthic habitat. Including a constraint on the bathymetry data allowed for depths within
different zones to be properly assessed according to specific semantics. For the purpose of this
study, a geomorphologic classification scheme was adopted from Harris et al. (2014). The
authors' detailed analysis generated the first digital global seafloor geomorphic features and zones map. Applying this study to the methods in this thesis research provides the benefit of an easy mechanism for differentiating statistically validated types of benthic terrain. The study classified each region using quantitative differences analyzed in 30-arc-second Shuttle Radar Topography Mission (SRTM) data.
A folder of polygons was downloaded from the Blue Habitats portal and visualized in ArcGIS Pro. Each polygon represented a different seafloor geomorphic zone or classified region. Visual interpretation showed that two polygons representing two geomorphic classifications of shelf profile (medium profile and high profile) intersected the study area. According to Harris et al. (2014), the distinction between high and medium profile classifications is recognized by analyzing the vertical relief of the continental shelf over a five-cell radius. The authors' study suggests that a medium profile shelf is classified by 10-50 m of vertical relief while high profile shelves exhibit a vertical relief greater than 50 m.
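A sketch of how such a relief-based classification could be reproduced with arcpy (an assumption for illustration; Harris et al. (2014) and this thesis used the published polygons rather than this code):

import arcpy
from arcpy.sa import Con, FocalStatistics, NbrCircle, Raster

arcpy.CheckOutExtension("Spatial")

bathy = Raster("srtm30_bathy")  # hypothetical 30 arc-second depth grid

# Vertical relief (maximum minus minimum depth) within a five-cell radius
relief = FocalStatistics(bathy, NbrCircle(5, "CELL"), "RANGE")

# 1 = low profile (<10 m), 2 = medium profile (10-50 m), 3 = high profile (>50 m)
shelf_class = Con(relief < 10, 1, Con(relief <= 50, 2, 3))
shelf_class.save("shelf_profile_class")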
Figure 3. Study area classified by benthic region.
Each polygon acted as a bounding unit for its classified region. Within each polygon, point features were selected using an intersect operation. This process was repeated with each of the downsampled feature classes. The number of features in each of the resulting eight datasets is reported in Table 2.
Table 2: Number of points retained in each of the generated datasets.

Selection of original points (%)   Total point features retained   Medium profile shelf features   High profile shelf features
100                                47,239                          Not classified                  Not classified
50                                 23,551                          6,746                           16,819
25                                 4,736                           3,358                           8,554
10                                 2,324                           1,370                           3,367
5                                  2,234                           671                             1,653
3.4 Procedure
This section discusses how the prepared set of reference points were interpolated to
create a continuous surface and how the different surfaces obtained were assessed for accuracy.
3.4.1 Interpolating surfaces
The first method applied to the data was IDW. As discussed earlier, this method is an exact, deterministic interpolator that weights the measured points in each processing cell's neighborhood by distance and averages their values. This method assumes that the influence of each measured point on its neighbors decreases as the distance between points increases. ArcGIS Pro's Spatial Analyst provided access to the geoprocessing tool necessary to complete this operation. The tool was used with a default power value of 2 and a variable search radius. This step was applied to each of the point datasets (the four sampling densities in each of the two shelf regions) to produce a total of eight different surfaces.
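A sketch of that geoprocessing step, assuming arcpy's Spatial Analyst module and illustrative dataset and field names:

import arcpy
from arcpy.sa import Idw, RadiusVariable

arcpy.CheckOutExtension("Spatial")

# Interpolate each thinned sounding dataset with a power of 2 and a variable search radius
for region in ("medium_shelf", "high_shelf"):
    for pct in (50, 25, 10, 5):
        in_points = f"soundings_{region}_{pct}pct"  # hypothetical feature class names
        surface = Idw(in_points, "DEPTH", power=2,
                      search_radius=RadiusVariable(12))  # 12 nearest points
        surface.save(f"idw_{region}_{pct}pct")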
Next, the geostatistical technique of EBK was applied to each of the point datasets. Kriging predicts values as weighted averages of neighboring measurements, with weights chosen to minimize the prediction error. The weight assigned to each point reflects the covariance structure and relative location described by the semivariogram. Unlike other kriging methods, this process models many semivariograms, one for each subset of the data, and plots them together in an empirical semivariogram. Figure 4 provides the empirical semivariograms for all eight datasets.
Figure 4. Empirical semivariograms for observed data values.
While EBK is known among the methods of kriging to automate the most difficult
aspects of building a valid model, user knowledge and input is still required. For each dataset,
EBK was executed through the Geostatistical Analyst tool in ArcGIS Pro. The nearly normal
distribution of values did not warrant a data transformation. The default power semivariogram
was used to calculate the output for each of the eight replicates. Additionally, a parameter was
set to include the default number of 100 simulations per subset. Given a larger, dispersed dataset,
increasing the value of allotted simulations can aid in the success of determining a valid kriging
model.
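A comparable sketch for this step, assuming the arcpy Geostatistical Analyst module and the same illustrative dataset names (parameter values mirror the defaults described above):

import arcpy

arcpy.CheckOutExtension("GeoStats")

for region in ("medium_shelf", "high_shelf"):
    for pct in (50, 25, 10, 5):
        in_points = f"soundings_{region}_{pct}pct"
        arcpy.ga.EmpiricalBayesianKriging(
            in_features=in_points,
            z_field="DEPTH",
            out_raster=f"ebk_{region}_{pct}pct",
            transformation_type="NONE",          # near-normal data, no transformation
            number_semivariograms=100,           # 100 simulations per subset
            semivariogram_model_type="POWER")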
Lastly, forest-based regression was used to generate an additional set of eight surfaces. Conceptually, a model needs to be trained in order to learn to predict values. In this study, distance to shore, slope, aspect, curvature, and a hot spot Getis-Ord Gi* statistic were explored as predictor variables to train the model. All of the explanatory rasters were generated as derivatives of the downsampled data replicates. It was determined through exploratory spatial data analysis that distance to shore and Getis-Ord Gi* values explained most of the predicted variation. The Forest-based Classification and Regression tool in ArcGIS Pro supports this assessment through the variable importance table produced by the Train Only setting. These explanatory variables were then used to construct the model and predict values to a raster.
The Forest Based Classification and Regression tool in ArcGIS Pro creates a series of
decision trees that are intelligently fused together into a forest. An ensemble, or forest, can
produce a substantially more robust model than if the model is constructed from individual trees
alone. From each input dataset, 30 percent of the original data was considered a test dataset and
withheld from building the model. The remainder of the dataset comprised the measurements
used to randomly construct the regression model. In accordance with the methods above, the
result of this interpolation method produced a total of eight surfaces.
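A sketch of the equivalent train/test workflow outside ArcGIS, using scikit-learn with placeholder arrays for the explanatory variables (the thesis itself used the ArcGIS Pro tool, so names and data here are purely illustrative):

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# X: one row per sounding with the explanatory variables (e.g. distance to shore, Gi* value)
# y: measured depth at each sounding; both are synthetic placeholders here
rng = np.random.default_rng(0)
X = rng.random((5000, 2))
y = -60 + 40 * X[:, 0] + 5 * rng.standard_normal(5000)

# Withhold 30 percent of the soundings as a test set, as described above
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

rmse = mean_squared_error(y_test, forest.predict(X_test)) ** 0.5
print(f"Withheld-data RMSE: {rmse:.2f} m")
print(dict(zip(["dist_to_shore", "gi_star"], forest.feature_importances_.round(2))))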
3.4.2 Comparing models
Three interpolants were used in this study and applied to eight different sets of
bathymetry data to obtain a total of 24 surfaces. Quality assessment of each dataset is a critical
part of DEM production. In this research the accuracy of a surface is considered the absence of
measured differences between two DEMs. The 24 DEMs were symbolized in an identical
manner. This was accomplished by applying a classified spectral color ramp with quantile intervals over the range 0 to -180 m. This classification method allowed each class to contain values spread across the entire distribution of the data range.
with identical colors and scale allowed for visual assessment by highlighting differences in
depth.
With the goal of quantitatively determining the accuracy of each interpolated surface compared to the bathymetric truth, the root mean square error (RMSE) was used as a metric. The RMSE is reported as the difference of residual values between rasters. To generate this metric, the raster calculator in ArcGIS was used to compute the squared difference between the surfaces. The mean of this calculated surface was determined using zonal statistics, and the square root of that mean was then recorded. Calculations were repeated for each of the 24 rasters to obtain an RMSE between the “known” surface and each interpolated surface.
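A minimal numpy sketch of that calculation, assuming the two DEMs have already been exported to aligned arrays (array values below are synthetic and purely illustrative):

import numpy as np

def dem_rmse(truth, predicted):
    """RMSE between two aligned DEM arrays, ignoring NoData cells stored as NaN."""
    diff = predicted - truth
    return float(np.sqrt(np.nanmean(diff ** 2)))

# Example with synthetic 3 x 3 depth grids (meters)
truth = np.array([[-52.0, -54.1, -55.0],
                  [-53.2, -54.8, -56.3],
                  [-54.0, -55.5, -57.1]])
predicted = truth + np.array([[0.1, -0.2, 0.0],
                              [0.3,  0.1, -0.1],
                              [0.0, -0.3, 0.2]])
print(round(dem_rmse(truth, predicted), 3))  # -> 0.18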
Chapter 4 Results
This section presents the findings of the study and graphically shows the comparison
between different interpolation algorithms in different coastal geomorphologic regions with
varying levels of sample sparsity.
4.1 Interpolation accuracy
Results of this research can be interpreted in several ways. To address the first objective
of this thesis research, the surfaces generated from the three interpolation methods were
analyzed to identify the most accurate surface when the seafloor measurements are collected at
half the original sampling density. RMSE provided a quantitative means of comparing each
generated surface with the “known” surface. This error metric was calculated using the equation
described by Li and Heap (2008), and the resulting values are first provided as a graph in
Figure 5. The formula establishes how well a model agrees with the actual data: a higher RMSE
indicates that the predictions produced larger residuals, so, by this measure, a lower RMSE
value represents a better interpolated model.
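For reference, the metric takes the standard form

\[ \mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{z}_i - z_i\right)^{2}} \]

where z_i is the “known” depth in cell i, \hat{z}_i is the corresponding interpolated depth, and n is the number of cells compared.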
For both the medium and high profile shelves, the RF method provided substantially superior
results, with RMSEs of 0.0846 m and 0.1422 m respectively. In the medium profile shelf, EBK was
the second most accurate interpolator with an RMSE of 0.7669 m, while the high profile shelf EBK
trial exhibited a more substantial jump in error with an RMSE of 2.9038 m. As hypothesized, in
both geomorphic regions IDW generated the surface model with the greatest departure from the
true surface.
Figure 5. RMSE between ground truth and interpolated surfaces using half the sampling density of
the original measurements.
The morphological differences between shelf classifications are demonstrated through
these results. Overall, the interpolated surfaces classified as medium profile shelves revealed a
lower RMSE for all interpolation methods; all reported surfaces within this zone contain
residual values of less than one meter. In contrast, the RMSEs of the high-profile shelf region
ranged between 0.1422 m and 3.4773 m. These values reflect the ability of each interpolation
method to capture the variation inherent in the surface it represents. They also quantitatively
illustrate that relatively constant sloping morphology can be digitally captured better than
rugged terrain across all methods.
Visual analysis provides an alternate method of assessing results. Mapped output is one
of the benefits of using a GIS to generate surface models. In the case of this thesis, the generated
interpolated surfaces show variation by depth across the bathymetry surface near the shores of
Monterey Bay, California. While it does provide an attractive report, visual interpretation of
outcomes can only capture a subjective idea of general trends obtained with different methods.
As the precise differences between surfaces are challenging to visualize and results obtained in
this manner can vary, it is important to consider the qualitative results below along with the
quantitative results discussed above.
Figure 6 shows the medium profile shelf surfaces generated from half the sampling
density of the original measured bathymetry values. Overall, the variation in depth displayed is
relatively similar horizontally across the surface, and in each of the three surfaces the width of
each classified depth range is about the same. However, the boundaries between the classified
regions are more jagged in the IDW surface, while the RF surface displays the smoothest
transitions between classes. The differences are apparent when looking at the complex benthic
terrain in the left corner of each surface. This northwest region of Monterey Bay reveals a
portion of the shelf that only comes into focus with the RF interpolation. In the IDW interpolated
surface, the same corner appears to lack the light orange depth range (-52 to -56 m), as the other
depth classes shift farther south across the terrain. In the EBK surface, the clarity of this region
appears to lie between that of the IDW and RF interpolations.
Figure 6: Side by side comparison of medium profile shelf interpolations using 50% of the
sampled measurements.
The high profile shelf region is farther from shore and exhibits a greater variation of
depth than the medium shelf surfaces. Figure 7 shows the side by side comparison of high profile
shelves when interpolating bathymetry composed of measurements with half the sampling
density. The dark purple region indicating the greatest depths around the mouth of the submarine
canyon provides a good example of the differences captured by the different interpolants. The RF
interpolation captured two curved areas in the lower left corner that are not shown in the other
two surfaces. The medium purple (-120 to -128 m) region in both the IDW and the EBK surfaces
appears to have smoothed over the variation within this curved region, and this entire classified
area appears to extend lower through the benthic terrain than it does in the RF surface.
Figure 7: Side by side comparison of high profile shelf interpolations using 50% of the sampled
measurements.
4.2 Subsample accuracy
To address the second objective of this thesis research, artificially created subsamples
were interpolated to investigate the tradeoff between sampling effort and the resulting accuracy
of an interpolated surface. The RMSEs for all interpolation replicates produced in this study are
summarized in Table 3. In all but one trial (medium shelf 5% EBK), using the smallest fraction of
the sampled data produced the least accurate results. One possible cause for this unexpected
result is bias introduced by the single random subsampling of points included in the analysis.
Across all the replicates, using more data generally tended to provide more accurate results. The
changes in accuracy were greater between the 25 and 10 percent subsamples than between the 50
and 25 percent subsamples. This nonlinear relationship between sampling density and the accuracy
of an interpolated surface can be seen across all methods.
Table 3: Residual values (m) between random subsamples (predicted) and ground truth (observed)
shallow bathymetry in Monterey Bay, California.

Method     50%        25%        10%        5%
Medium profile shelf geomorphology
IDW        0.8005     0.8079     0.7844     0.9302
EBK        0.7669     0.6791     0.6311     0.7302
RF         0.0846     0.1224     0.3734     0.5412
High profile shelf geomorphology
IDW        3.4773     4.2080     5.0856     5.9769
EBK        2.9038     3.2485     3.6754     4.4830
RF         0.1422     0.3546     1.0752     3.1566
The variation in RMSE between the trials with the greatest number of points and those with
the fewest points can also provide insight into the capability of different interpolation methods to
make accurate predictions at varying levels of sampling density. In both geomorphic classes
and across all levels of sampling density, EBK provided the least variation in RMSE values,
while RF produced the greatest variation in model accuracy. Across both classes, IDW
interpolation showed intermediate variation in RMSE. These results are consistent with whether
each interpolation method treats the dataset as a global or a local model.
Chapter 5 Conclusions
This study created a total of 24 bathymetry surfaces in order to address two research
objectives. The first part of this project was to assess which interpolation method provided the
most accurate results when using half of the “known” measurements. This was determined by the
lowest RMSE value. In both classes, using RF produced the most accurate surface when
sampling half as frequently as the source data. EBK was demonstrated to be a close alternative
for accuracy, while IDW ranked third. The overall RMSE for a medium shelf surface was
substantially lower than the error in the high-profile shelf. This can be attributed to the steadily
sloping terrain of nearshore environments, which makes it easier to fit a surface to a set of points. In
other words, a surface that varies more dramatically can be more difficult to model. As
demonstrated by the results, the high-profile shelf contains complex benthic features which may
complicate the interpolation process.
The second objective of this research sought to determine whether fewer samples can be taken
while still providing similar results. The rate of ocean mapping has historically been very slow,
and the goal of providing data in every grid cell of a global bathymetry model seems far from
reachable given current sampling practices. However, as demonstrated by this thesis, regions with
smoother transitions, such as the medium profile shelf, can require less input data to obtain
accurate results. In contrast, complex terrain such as high profile shelves demands more input data
for quality results. This represents potential for creating models that fill data gaps in smooth,
nearshore coastal regions, where data acquisition is consistently difficult, expensive, or inaccessible.
Across all replicates, using more data generally tended to provide more accurate results.
However, as expected, local interpolants such as EBK were able to create more accurate results
with less data. While this was demonstrated clearly in the EBK medium profile shelf trials, the
EBK trials in the high-profile shelf generated relatively similar results at all sampling
levels. This indicates that the relationship between the accuracy of the resulting interpolated
surface and the sampling density is not linear, and not uniform across interpolation algorithms.
The real-world implications of this observation can provide important insight into designing
sampling schemes and choosing interpolants for varying sampling densities when creating DEMs in
the future.
The most important aspects of this thesis center around the tradeoffs between sampling
density, interpolation methods, and local geomorphology. The results obtained demonstrate that
all three criteria play a significant and interconnected role in the accuracy of a digital surface
product. Assessing the accuracy of varying sampling densities demonstrated that while generally
denser datasets resulted in more accurate products, this claim did not always hold true. When a
global interpolator (IDW or RF) was applied to the points, the entire dataset was considered as
a single model, and both methods showed an increase in error as sampling density decreased.
However, application of a local interpolator (EBK) was able to capture the fine scale details of a
model by splitting the region into small subsets and modeling many semivariograms. In
situations with less data, this method proved very successful. The promising results seen in the
EBK and RF interpolated surfaces warrant future exploration of EBK regression prediction
as a way to combine a local geostatistical interpolator with regression analysis. Given
appropriate explanatory variables, such a combination could achieve more accurate results than
either method individually and provide a powerful interpolator for sparsely sampled regions.
5.2 Limitations
While this thesis research successfully addressed its main objectives, there are
limitations in the claims made. The identification and future correction of spatial biases are
essential for quality decision making. One such limitation is that the results discussed are subject
to random bias, because the point measurements included in each replicate were selected through a
single random trial. By adding multiple random trials at each level of sampling density, more
statistically robust results could be obtained by averaging the RMSE within each category.
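A minimal sketch of that correction is given below; it assumes a hypothetical interpolate_surface routine standing in for whichever interpolation method is being evaluated, and computes the same raster RMSE as sketched in Section 3.4.2.

```python
# Hypothetical sketch: average RMSE over repeated random subsamples at one density.
import numpy as np

def mean_rmse(points, truth, fraction, interpolate_surface, n_trials=30, seed=0):
    """Average surface RMSE over n_trials independent random subsamples."""
    rng = np.random.default_rng(seed)
    rmse_values = []
    for _ in range(n_trials):
        keep = rng.random(len(points)) < fraction     # fresh random subsample per trial
        surface = interpolate_surface(points[keep])   # hypothetical interpolation call
        rmse_values.append(np.sqrt(np.nanmean((surface - truth) ** 2)))
    return float(np.mean(rmse_values))
```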
The results generated from applying the RF algorithm show promising potential for
interpolation, as it produced more accurate results than IDW. However, the validity
of the regression equation cannot be overlooked. The process of deriving the covariates proved
difficult, and no published literature exists on using only a single dataset to correct for a paucity
of measurements. In many cases, additional SDB or other higher resolution products showed
great potential for increasing confidence in spatial accuracy. However, determining spatial or
nonspatial covariates that depend solely on the source data will require time and resources
beyond the scope of this thesis.
Additionally, a limitation of the RF algorithm is that it only performs well on data similar to
what it has been trained on, which means it is likely to produce poor results when extrapolating to
other datasets. One possible way to overcome this challenge is to increase the sample size of the study
region. This study used one set of geomorphologic classifications, which captures differences
in seabed relief, geology, and formative processes, to provide a means of categorizing
different complexities of benthic terrain. By including other study areas with the same benthic
classification, the statistical validity can be increased, along with the possibility of identifying a
stronger covariate for the regression equation.
Another option for deriving statistically distinct categories is to explore the use of
indicator kriging (IK) methods. IK is commonly used in geologic and subsurface studies,
applications that are analogous to the interpolation of shallow bathymetry, where different
categories of geologic composition can overlap and intermix. While there are many
different classification schemas for benthic environments, a coarse proxy would help to statistically
infer parameters by controlling the spatial continuity within different environmental categories or
thresholds.
5.3 Future work
With the ocean covering the majority of our planet, there is a great need to increase our
knowledge of the accuracy of products representing the sparsely sampled seafloor. The accuracy
of a product can impact its usefulness in future studies. It is important for applications of DEMs
to identify both fine and large-scale details within benthic geomorphology. Misrepresentation of
these features is likely to have a ripple effect on our overall understanding of ocean science as
well as of other environments across the entire blue planet.
While the obtained results successfully addressed the two research questions set forth by
this thesis, there is encouraging potential for scaling the approach and developing future
work. Modifications of the methodology demonstrated in this research can broaden the
application of these results in the real world. Optimizing sampling density provides a means to
understand the accuracy and results of existing surveys. It also provides a foundation for
planning future coastal surveys to be optimally executed, given that extensive time and resources
are required to survey the seafloor. However, the ocean is not sampled as a random
configuration of points. To properly represent different patterns of coastal data paucity,
coastlines should be further studied to assess generalized sampling patterns. It is predicted that
there will be three levels of sampling sparsity: low resolution based on satellite gravimeters, low
resolution with well sampled transect lines throughout, and clustered datasets surrounding fixed
observation stations. Different spatial configurations composed of different sampling densities
will likely add an additional variable to be considered.
While this study assessed only four tiers of sampling density, a more precise
threshold could be identified by exploring a wider range of point densities within particular
geomorphic regions. To accomplish this, future studies can take advantage of many
different types of data collected by many different sensors. By using a computationally efficient
way to assess bathymetry data across the globe, such as machine learning, a model could be
optimized within each geomorphic category to indicate the minimum required sampling
density. Given a statistically relevant sample size, trained models that produce optimized
interpolants for various geomorphologies and sampling schemes or densities could support the
most accurate representation of the seafloor.
References
Achilleos, G. 2008. Errors within the inverse distance weighted (IDW) interpolation procedure.
Geocarto International, 23(6), 429–449. https://doi.org/10.1080/10106040801966704
Arun, P. V. 2013. A comparative analysis of different DEM interpolation methods. Geodesy and
Cartography, 39(4), 171–177. https://doi.org/10.3846/20296991.2013.859821
Dietterich, T. G., & Bakiri, G. 1995. Solving Multiclass Learning Problems via Error-
Correcting Output Codes. Journal of Artificial Intelligence Research, 2, 263–286. Retrieved
from http://www.jair.org/media/105/live-105-1426-jair.pdf
Bernstein, D. J. 2002. Short-Term Evolution of Cape Morphology: Cape Lookout and Cape
Fear, North Carolina (North Carolina State University). Retrieved from
http://www.lib.ncsu.edu/resolver/1840.16/1131
Borgman, L. E., Miller, C. D., Signorini, S. R., & Faucette, R. C. 1994. Stochastic interpolation
as a means to estimate oceanic fields. Atmosphere-Ocean, 32(2), 395–419.
https://doi.org/https://doi.org/10.1080/07055900.1994.9649504
Breiman, L. 2001. Random forests. Machine Learning, 45(1), 5–32.
https://doi.org/10.1023/A:1010933404324
Brown De Colstoun, E. C., Story, M. H., Thompson, C., Commisso, K., Smith, T. G., & Irons, J.
R. 2003. National Park vegetation mapping using multitemporal Landsat 7 data and a
decision tree classifier. Remote Sensing of Environment, 85(3), 316–327.
https://doi.org/10.1016/S0034-4257(03)00010-5
Burrough, P. A. 1986. Principles of geographical information systems for land resources
assessment. Principles of Geographical Information Systems for Land Resources
Assessment. https://doi.org/10.1097/00010694-198710000-00012
Chen, F. W., & Liu, C. W. 2012. Estimation of the spatial rainfall distribution using inverse
distance weighting (IDW) in the middle of Taiwan. Paddy and Water Environment, 10(3),
209–222. https://doi.org/10.1007/s10333-012-0319-1
Cressie, N. 1990. Origins of kriging. Mathematical Geology, 22, 239–252.
Cui, H., Stein, A., & Myers, D. 1995. Extension of spatial information, bayesian kriging and
updating of prior variogram parameters. Environmetrics, 6(4), 373–384.
https://doi.org/10.1002/env.3170060406
de Mesnard, L. 2013. Pollution models and inverse distance weighting: Some critical remarks.
Computers and Geosciences, 52, 459–469. https://doi.org/10.1016/j.cageo.2012.11.002
Deng, Y., Wilson, J. P., & Bauer, B. O. 2007. DEM resolution dependencies of terrain attributes
across a landscape. International Journal of Geographical Information Science, 21(2), 187–
213. https://doi.org/10.1080/13658810600894364
Dias, V. R. D. M., Sallo, F., Sanches, L., & Dallacort, R. 2017. Geostatistical Modeling of the
Ten-Day Rainfall in Mato Grosso State. Geografia, 42(3), 99–112.
Eakins, B. W., & Grothe, P. R. 2014. Challenges in Building Coastal Digital Elevation Models.
Journal of Coastal Research, 297, 942–953. https://doi.org/10.2112/jcoastres-d-13-00192.1
Erdogan, S. 2008. A comparison of interpolation methods for producing digital elevation
models at the field scale. Earth Surface Processes and Landforms, 34, 366–376.
Esri. 2018. “What is Empirical Bayesian Kriging? — ArcGIS Pro | ArcGIS Desktop.”
Pro.Arcgis.Com. https://pro.arcgis.com/en/pro-app/help/analysis/geostatisticalanalyst/what-
is-empirical-bayesian-Kriging-.htm.
Esri. 2019. "Comparing Models — Help | ArcGIS Desktop.” Desktop.arcgis.com.
http://desktop.arcgis.com/en/arcmap/latest/extensions/geostatistical-
analyst/comparingmodels.htm.
Esri. 2019. “Forest-based Classification and Regression – ArcGIS Pro | ArcGIS Desktop.”
Pro.arcgis.com. https://pro.arcgis.com/en/pro-app/tool-reference/spatial-
statistics/forestbasedclassificationregression.htm
Georganos, S., Grippa, T., Niang Gadiaga, A., Linard, C., Lennert, M., Vanhuysse, S.,
Kalogirou, S. 2019. Geographical random forests: a spatial extension of the random forest
algorithm to address spatial heterogeneity in remote sensing and population modelling.
Geocarto International. https://doi.org/10.1080/10106049.2019.1595177
Guisan, A., & Zimmerman, N. 2000. Predictive habitat distribution models in ecology.
Ecological Modeling, 135, 147–186.
Harris, P. T., Macmillan-Lawler, M., Rupp, J., & Baker, E. K. 2014. Geomorphology of the
oceans. Marine Geology, 352, 4–24. https://doi.org/10.1016/j.margeo.2014.01.011
Hengl, T., Nussbaum, M., Wright, M., Geuvelink, G., & Graler, B. 2018. Random forests as a
generic framework for predictive modeling of spatial and spatio-temporal variables. PeerJ,
6(e5518).
Hogrefe, K. R., Wright, D. J., & Hochberg, E. J. 2008. Derivation and Integration of Shallow-
Water Bathymetry: Implications for Coastal Terrain Modeling and Subsequent Analyses.
Marine Geodesy, 31, 299–317. https://doi.org/10.1080/01490410802466710
Jordan, G. 2007. Digital terrain analysis in a GIS environment. Concepts and development.
Lecture Notes in Geoinformation and Cartography, 1–43. https://doi.org/10.1007/978-3-
540-36731-4_1
Krivoruchko, K. 2012. Empirical Bayesian Kriging. Esri Press, Fall 2012, 6–10. Retrieved from
https://www.esri.com/NEWS/ARCUSER/1012/files/ebk.pdf
Lu, G. Y., & Wong, D. W. 2008. An adaptive inverse-distance weighting spatial interpolation
technique. Computers and Geosciences, 34(9), 1044–1055.
https://doi.org/10.1016/j.cageo.2007.07.010
Manessa, M. D. M., Kanno, A., Sekine, M., Haidar, M., Yamamoto, K., Imai, T., & Higuchi, T.
2016. Satellite-Derived Bathymetry Using Random Forest Algorithm and Worldview-2
Imagery. Geoplanning: Journal of Geomatics and Planning, 3(2), 117.
https://doi.org/10.14710/geoplanning.3.2.117-126
Matheron, G. 1963. Principles of geostatistics. Economic Geology, 58, 1246–1266.
Mayer, L., Jakobsson, M., Allen, G., Dorschel, B., Falconer, R., Ferrini, V., Weatherall, P. 2018.
The Nippon Foundation-GEBCO seabed 2030 project: The quest to see the world’s oceans
completely mapped by 2030. Geosciences (Switzerland), 8(2).
https://doi.org/10.3390/geosciences8020063
McCullagh, M. J. 1988. Terrain and Surface Modelling Systems: Theory and Practice. The
Photogrammetric Record, 12(72), 747–779. https://doi.org/10.1111/j.1477-
9730.1988.tb00627.x
Mitchell, A., & Minami, M. (1999). The ESRI guide to GIS analysis: Vol. 1, Geographic
patterns & relationships, Environmental Systems Research Institute. In E. S. R. Institute
(Ed.), Inc., Redlands, Calif. Redlands: Esri Press.
Moore, D. A., & Carpenter, T. E. 1999. Spatial analytical methods and geographic information
systems: Use in health research and epidemiology. Epidemiologic Reviews, 21(2), 143–161.
https://doi.org/10.1093/oxfordjournals.epirev.a017993
Oliver, M. A., & Webster, R. 1990. Kriging: A method of interpolation for geographical
information systems. International Journal of Geographical Information Systems, 4(3),
313–332. https://doi.org/10.1080/02693799008941549
O’Sullivan, D., & Unwin, D. J. 2009. Geographic Information Analysis (2nd ed.). Hoboken:
John Wiley & Sons, Inc.
Plant, N. G., Holland, K. T., & Puleo, J. A. 2002. Analysis of the scale of errors in nearshore
bathymetric data. Marine Geology, 191(1–2), 71–86. https://doi.org/10.1016/S0025-
3227(02)00497-8
Righton, D., & Mills, C. 2006. Application of GIS to investigate the use of space in coral reef
fish: A comparison of territorial behaviour in two Red Sea butterflyfishes. International
Journal of Geographical Information Science, 20(2), 215–232.
https://doi.org/10.1080/13658810500399159
Robinson, G. J. 1994. The Accuracy of Digital Elevation Models Derived from Digitised
Contour Data. The Photogrammetric Record, 14(83), 805–814.
https://doi.org/10.1111/j.1477-9730.1994.tb00793.x
Sagawa, T., Yamashita, Y., Okumura, T., & Yamanokuchi, T. 2019. Satellite derived bathymetry
using machine learning and multi-temporal satellite images. Remote Sensing, 11(10).
https://doi.org/10.3390/rs11101155
Schut, G. H. 1976. Review of Interpolation Methods for Digital Terrain Models. The Canadian
Surveyor, 30(5), 389–412. https://doi.org/10.1139/tcs-1976-0037
Sinha, P., Gaughan, A. E., Stevens, F. R., Nieves, J. J., Sorichetta, A., & Tatem, A. J. 2019.
Assessing the spatial sensitivity of a random forest model: Application in gridded
population modeling. Computers, Environment and Urban Systems, 75, 132–145.
https://doi.org/10.1016/j.compenvurbsys.2019.01.006
Skøien, J. O., & Blöschl, G. (2005). Spatiotemporal topological kriging of runoff time series.
Water Resources Research, 43(9). https://doi.org/10.1029/2006WR005760
Sui, H., Hua, L., Zhao, H., & Zhang, Y. 2005. A fast algorithm of cartographic sounding
selection. Geo-Spatial Information Science, 8(4), 262–268.
https://doi.org/10.1007/BF02838660
Tobler, W. 1970. A Computer Movie Simulating Urban Growth in the Detroit Region. Economic
Geography, 46, 234–240. https://doi.org/10.2307/143141
Wilson, J. 2018. Environmental Applications of Digital Terrain Modeling. John Wiley & Sons.
Zimmerman, D., Pavlik, C., Ruggles, A., & Armstrong, M. P. 1999. An experimental
comparison of ordinary and universal kriging and inverse distance weighting. Mathematical
Geology, 31(4), 375–390. https://doi.org/10.1023/A:1007586507433
Zoraster, S., & Bayer, S. 2015. Automated Cartographic Sounding Selection. The International
Hydrographic Review, 69(1).
Abstract
The lack of seafloor information is often a result of the challenging logistics and expenses involved with acquiring data in this unique environment. Yet, despite the sparsely sampled environment, many significant efforts exist to create global bathymetry models. However, there exists a public misunderstanding of the true sampling density in the ocean that can be largely attributed to contemporary interpolation and enhanced cartography. The seafloor is more sparsely sampled than most people realize. Thus, it is important to understand the influence of the underlying source data and the interpolation technique used when creating an accurate digital bathymetry model. The accuracy of a surface can depend on sampling density, interpolation method, and local geomorphology. However, if a bathymetry surface can be accurately created using sparse measurements, mission planning can be directed to sample the seafloor at a certain resolution. The results of this thesis research encourage future exploration of a computationally efficient method to assess the best interpolation method for different regions under different conditions.