NEIGHBORHOOD GREENSPACE’S IMPACT ON RESIDENTIAL PROPERTY
VALUES: UNDERSTANDING THE ROLE OF SPATIAL EFFECTS
by
Qi Christina Li
____________________________________________________________________
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(GEOGRAPHY)
May 2010
Copyright 2010 Qi Christina Li
Dedication
I dedicate this dissertation to my husband Jason Byrne, whose unrelenting support,
encouragement and patience have made the completion of this work possible.
Acknowledgements
I am sincerely grateful to my doctoral advisor John Wilson for enabling me to
achieve my vision, for helping me to grow through this experience, and for his
never-ending patience, great mentorship, and constant support. John has provided
critical research guidance and insights for this dissertation, not to mention
painstakingly detailed editing feedback. I would also like to thank my dissertation committee members:
Andrew Curtis for his valuable feedback; Christian Redfearn for helping me obtain
the data for analysis and sharing his expertise. I especially want to thank Professor
Jennifer Wolch, for her encouragement, nurturing, and facilitation.
I also want to thank my fellow scholars in the GIS Research Laboratory –
Chona Sister, Christine Lam, Jingfen Sheng, Thao Nguyen, Sujin Lee, Daniel
Goldberg – for sharing their knowledge and providing valuable suggestions. I want
to give special thanks to my good friend Yan Xu, who was always so helpful
whenever I asked her for technical advice. I would also like to thank my friend Wu
Tao for helping me to transform the data for analysis.
My thanks also go to the administrative staff of the Geography Department
and GIS Laboratory, including Billie Shotlow, Katherine Kelsey, Gennifer Merriday,
Leilani Banks, and Robert Alvarez. I also want to thank the GIS Lab Manager at
Griffith University, Mariola Hoffman, for providing me with technical support, when
I was finishing the last part of the dissertation in Australia.
I acknowledge the financial support I received from the Geography Department as a
research assistant and from the College of Letters, Arts, and Sciences through a
Final Year Dissertation Fellowship; without this support the dissertation would not
have been possible.
Last but not least, I thank my mother for looking after my baby daughter
Skye in order to give me precious child-free time to finish writing the last chapter in
China, and Skye, who will one day understand why her mummy was always tired.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
Chapter 2: The Economic Benefits of Green Infrastructure in Urban Areas: A Review of Methodology
2.1 Chapter 2 Introduction
2.2 The Hedonic Pricing Method (HPM): A Brief Review
2.3 The Inclusion of Open Space Variables in HPMs
2.3.1 Distance-based metrics
2.3.2 Other metrics
2.3.3 Composite variables
2.3.4 Role of Geographic Information Systems (GIS)
2.4 Potential Problems with Inclusion of Open Space Variables in HPMs
2.4.1 Recent Work on Spatial Effects
2.5 Alternative Approaches to Control Spatial Effects
2.6 Conclusions and Future Directions
Chapter 2 Endnotes
Chapter 3: A Spatial Autocorrelation Approach for Examining the Effects of Greenspace on Residential Property Values: An Inner City Case Study
3.1 Chapter 3 Introduction
3.2 Housing Prices and Open Space Amenities: Background
3.2.1 Studies of Open Space Characteristics
3.2.2 Valuation Methods of Open Space Studies
3.3 Data and Methodology
3.3.1 Study Area and Housing Data
3.3.2 Greenspace Measures
3.3.3 Model Specification
3.4 Results
3.4.1 Standard Hedonic Pricing Model
3.4.2 Tests for Spatial Dependence
3.4.3 Spatial Lag Model
3.5 Discussion and Conclusions
Chapter 3 Endnotes
Chapter 4: The Impact of Neighborhood Greenspace on Residential Property Values across a Spatially Heterogeneous Metropolitan Area
4.1 Chapter 4 Introduction
4.2 Spatial Modeling in Open Space Hedonic Models
4.3 Study Area and Data Sample
4.3.1 Derivation of Greenspace Variables
4.4 Models
4.4.1 Model 1: Standard Hedonic Model
4.4.2 Models 2 and 3: Spatial Expansion Models
4.4.3 Models 4 and 5: Spatial Regression Models Targeting Spatial Dependence
4.4.4 Model 6: Geographically Weighted Regression (GWR)
4.4.5 Models 7 and 8: Spatial Filtering Models
4.5 Results
4.5.1 Overall Model Performance Comparison Using Approach A
4.5.2 Overall Model Performance Comparison within Approach B
4.5.3 Auxiliary regressions of Approach B
4.6 Discussion and Conclusions
4.6.1 Significant neighborhood greenspace impact
4.6.2 Spatial modeling of significant spatial structures in Approach A
4.6.3 The Auxiliary Regression Results of Approach B
4.6.4 Final Thoughts
Chapter 5: Performance of Four Spatial Models across Two Different Geographic Scales
5.1 Chapter 5 Introduction
5.2 Previous Comparative Studies
5.2.1 A Brief Review of Some Common Spatial Modeling Techniques
5.2.2 Past Empirical Studies
5.3 Comparing Spatial Model Performance at Two Different Geographic Scales
5.3.1 Data and Method
5.3.2 Results
5.4 Discussion and Conclusions
Chapter 6: Concluding Remarks
6.1 Overview
6.2 Continuing Research
Bibliography
List of Tables
Table 2.1 List of open space variables used in hedonic models
Table 3.1 Median summary measures for house sale transactions
Table 3.3 Specification tests results
Table 4.1 Descriptive statistics
Table 4.2 Model performance comparison using Approach A
Table 4.3 Benchmark model estimation results under two options
Table 4.4 Spatial expansion methods estimation results
Table 4.5 Estimated coefficients of Model A1 and Model A4 (with two spatial neighbor specifications)
Table 4.6 Estimation results for four versions of Model A5 that include different combinations of spatial expansion and spatial neighbor specifications
Table 4.7 GWR parameter estimates
Table 4.8 Estimated coefficients from spatial filtering models in the GAM framework (Model A7)
Table 4.9 Model performance comparison within Approach B
Table 4.10 Estimated coefficients of the benchmark model (Model B1)
Table 4.11 Estimated coefficients of spatial error models (Models B3 and B5)
Table 4.12 Estimated coefficients of GWR (Model B6)
Table 4.13 Estimated coefficients of Model B7 fitted with natural cubic spline*
Table 4.14 Estimated (OLS) coefficients of the auxiliary regression
Table 4.15 Estimated coefficients for the spatial error model combined with quadratic expansion, applied to the auxiliary regression variables
Table 5.1 Model performance comparison
Table 5.2 Specification test of spatial autocorrelation in base model
Table 5.3 Coefficient estimates in scenario 1 (S1) and scenario 2 (S2)
Table 5.4 GWR parameter five-number summaries
List of Figures
Figure 2.1 Spatial autocorrelation effects: (a) similar value clustering (positive autocorrelation); (b) random pattern; and (c) dissimilar value clustering (negative autocorrelation) (reproduced from Griffith 1987, p.37)
Figure 3.1 Vermont study area with locations of house sale transactions in 1999-2000
Figure 3.2 Schematic diagram illustrating the greenspace (polygons in various shades of green color) surrounding house locations (the red dot) in the Vermont Corridor (buffers at 25, 50, 75, 100, 150, 250, 300, 400 and 500 feet)
Figure 3.3 Thiessen polygons constructed for each house location
Figure 3.4 Partial effects of the 200-300 feet greenspace ring
Figure 4.1 Location map of the study area
Figure 4.2 Selected properties
Figure 4.3 Derivation process of the neighborhood greenspace variable
Figure 4.4 Google Earth image with traced greenspace polygons overlaid on top of sample parcel shown in center of map
Figure 4.5 Sale prices of selected properties and an extrapolated price surface
Figure 4.6 Box plots of the selected variables, showing median, upper and lower quartiles, minimum and maximum data values of each variable
Figure 4.7 Estimated coefficients for key variables from Model A3, with quadratic expansion
Figure 4.8 Semi-variogram of the residuals of Model A3
Figure 4.9 Estimated coefficients of key variables from Model A5, specified with a quadratic expansion and neighbors within 4 km
Figure 4.10 Estimated coefficients for key variables in Model A6 (GWR)
Figure 4.11 Estimated coefficients of Ln(Green) from Model B6, GWR
Figure 5.1 Estimated Ln(Green) coefficients from spatial expansion model, scenario 2
Figure 5.2 GWR estimation of LnGreen coefficients, scenario 1
Figure 5.3 GWR estimation of LnGreen coefficients, scenario 2
Figure 5.4 The difference of LnGreen coefficient estimation (GWR) between two scenarios
Abstract
The value of various types of open space has been widely assessed by environmental
economists as ‘capitalized value’ in property prices. But urban greenspace, and
neighborhood greenspace in particular, has attracted limited attention. A hedonic
model is often employed to assess such values, but spatial effects are often ignored
in open space studies. The spatial econometrics and spatial statistics literatures have
made considerable advances in incorporating spatial effects into hedonic models.
This dissertation examines the application of some of these advances to hedonic
models with neighborhood greenspace characteristics, using GIS as a platform to
support these spatial analyses. This approach helps to obtain more accurate estimates
of the value of neighborhood greenspace, and furthers our understanding of how these
spatial models perform in obtaining such estimates.
Chapter 1: Introduction
In a world of rapid urbanization, the demand for urban greenspace has outstripped
supply. Conflict between property development and urban greenspace provision has
led to a lack of access to open space and greenery, especially in many inner-city
neighborhoods. The Los Angeles metropolitan area is no exception. This problem is
so severe in some neighborhoods that future real estate development is jeopardized.
It has even triggered political action. For example, Los Angeles’ 13th Council
District contains neighborhoods that have virtually no open space. In the late 1990s,
pressure to rectify this situation became so intense that Jackie Goldberg (the council
member then representing the district) proclaimed a “Parks First” policy: “until a
workable plan for the acquisition of open space is initiated, no development – even
in commercial districts – would be granted a green light”. Since then, Los Angeles
has undergone a “park renaissance” (Byrne et al., 2007) and, while demand for
parkland has risen, it has been increasingly matched by government action. Yet these
advances are insufficient, because consistently high land prices have blunted efforts
to purchase land for new parks. As an alternative to park acquisition,
neighborhood-scale greening provides a flexible option. Efforts such as greening
alleys, parking lots, vacant lots, and
even creating community gardens can be part of the solution. However, good
policies for such neighborhood greening need to be informed by accurate estimation
of the benefits of these greenspaces, since this information may provide an additional
impetus to those seeking to promote such initiatives.
The economic benefit of urban open space has often been assessed in hedonic
pricing models (HPMs) as one of the attributes influencing residential property
values. In this dissertation I aim to offer an improved estimation of the value of
neighborhood greenspace in HPMs, with an emphasis on controlling the spatial
effects that may introduce bias and thereby confound the interpretation of the HPM
results. In the process, I also seek to further our understanding of the performance of
different spatial approaches in obtaining such estimations. Four sets of research
questions underpin my dissertation:
(1) How has ‘space’ been represented in existing hedonic studies assessing
open space value as part of property values?
(2) What is the impact of neighborhood greenspace in dense inner-city areas
like those found in Los Angeles? How do spatial effects influence the
estimation of property values in these settings?
(3) What is the impact of neighborhood greenspace across large and
heterogeneous metropolitan areas such as Los Angeles County? How do
spatial effects influence the estimation of greenspace values at this
geographic scale?
(4) How does the performance of four spatial models vary across different
geographic scales?
Each of the following four chapters addresses one of the aforementioned sets of
questions. In the next chapter, I review many hedonic studies of open space values,
focusing on how these studies have incorporated the spatial
characteristics of open space, and whether they have addressed spatial effects. I also
review the available analysis techniques designed to model these spatial effects,
because these techniques may allow future studies to remove the estimation biases
and inefficiencies of the standard hedonic model where existing studies have not
done so.
In the third chapter, I use a dataset from the Vermont corridor just west of the
historic downtown core of the City of Los Angeles to ascertain the value of
neighborhood greenspace that has been capitalized into residential property values. I
apply a commonly used spatial regression model from the field of spatial
econometrics on top of the standard hedonic model, in order to discover the impact
of potential spatial effects.
Chapter 4 furthers the work in Chapter 3 by extending the study area to much
of Los Angeles County. A new dataset with samples drawn from across the county is
employed to obtain hedonic model estimates. I also apply several different spatial
regression models to ascertain which model is most suitable for describing the
underlying spatial processes.
In Chapter 5, I modify the datasets used in the previous two chapters so that I
can examine model sensitivity in more detail. I then use the two datasets to analyze
the performance of four commonly used spatial regression models at two geographic
scales (i.e. spatial extents). This comparison provides some insights into the
strengths and weaknesses of these models in capturing
different spatial effects when constructing variables to use in HPMs estimating
property values.
Although these chapters constitute separate research foci, collectively they
show how neighborhood greenspace values can be captured and represented in
HPMs estimating property values. The empirical cases show the significant yet
varied impact of neighborhood greenspace at different geographic scales using a
variety of different spatial models. The estimated values are helpful for policy
decision makers looking to create more greenspace and/or to understand the impacts
of different land use choices and decisions. Chapter 6 provides some concluding
remarks.
Chapter 2: The Economic Benefits of Green
Infrastructure in Urban Areas: A Review of Methodology
This Chapter was submitted as:
Li, C. Q., Wilson, J. P. The economic benefits of green infrastructure
in urban areas: A review of methodology. Environmental and Resource
Economics. (November, 2009)
2.1 Chapter 2 Introduction
Open space is valued in urban areas for many reasons. It enhances aesthetics,
improves community health, and provides a range of nature’s services. These
benefits include absorbing storm-water runoff, reducing air pollutants, providing
wildlife habitat, moderating wind, and even reducing urban heat island effects and
energy demands (Akbari et al., 2001; Dwyer et al., 1992; Luley, 1998; McPherson,
1992; Nowak et al., 2000; Scott et al., 1999). Consequently, open space often
increases the value of surrounding properties. Many recent studies have used
property valuation methods to quantify the value of open space. A typical
choice is the hedonic pricing model (HPM), which defines property prices as a
function of various characteristics of properties. But few researchers have recognized
an important methodological issue with their hedonic models: how to incorporate
locational/spatial characteristics of open space into the model. Overlooking this issue
could lead to inaccurate estimation of open space values. This issue is explored in
this chapter.
We begin by reviewing HPM principles, and then examine how the
characteristics of open space have been represented in hedonic models. Next, we
review the potential problems with hedonic models, focusing only on the spatial
effects embedded in them. We then examine existing open space studies that have
addressed spatial effects, mostly from a spatial econometrics approach. Following
this, we review alternative techniques for handling spatial effects that come from
spatial statistics and have rarely been explored in open space studies. We conclude by
suggesting several directions for future research, touching on the implications of
estimating open space values for planning practice.
Our review is not intended to be comprehensive. We do not seek to cover all
types of open space impacts in hedonic models – others have already done that (e.g.
McConnell and Walls, 2005). Instead, we aim to include representative cases from
the literature that have adopted a spatial approach. Most of these studies treat open
space valuation as their research goal, but some focus on land value or property
value estimation while incorporating open space attributes in the process. We
include both kinds of studies because both either report or incorporate the inferred
value of open space attributes.
2.2 The Hedonic Pricing Method (HPM): A Brief Review
HPM originated from Lancaster’s (1966) argument that goods are not the direct
objects of utility; instead, it is the properties or characteristics of the goods from
which utility is derived. A differentiated good, such as a car or a house, should thus
be valued as a bundle of characteristics. Rosen (1974) then developed this argument
into a general theoretical framework that allows HPM to evaluate differentiated
goods whose characteristics vary distinctively even within a single market.
Equation (2.1) shows the typical form of the HPM, where house prices are a
function of various house characteristics. P is the house transaction price, S refers to
the house structural characteristics (e.g. square footage of living area), N represents
the neighborhood characteristics (e.g. median income), E represents the
environmental characteristics (e.g. open space proximity), and ε is an independently
and identically distributed (iid) error term.
$$P = f(S, N, E) + \varepsilon \tag{2.1}$$
S, N, and E are three typical groups of independent variables. The model estimates
the coefficients of S, N, and E, which represent their implicit marginal prices. These
marginal implicit prices are known as the inferred values of the characteristics. The
values of environmental characteristics, such as open space, have often been inferred
in this way¹ (Freeman, 1979; Taylor, 2003). In this process,
the real estate/housing market has been used as a surrogate market for the valuation
of environmental attributes. Many environmental problems relate to land use, and
thus have implications for these markets.
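To make the link between Eq. (2.1) and implicit prices concrete, here is a minimal sketch that fits a linear HPM by ordinary least squares to synthetic data. Nothing in it comes from a study reviewed in this chapter: the attribute ranges, the "true" implicit prices used to generate the data, and the function name fit_hedonic_ols are all hypothetical. It only illustrates that, in a linear specification, each estimated slope recovers the marginal implicit price of its characteristic.

```python
import numpy as np


def fit_hedonic_ols(S, N, E, P):
    """Fit the linear HPM P = b0 + bS*S + bN*N + bE*E + e by OLS.

    Returns [b0, bS, bN, bE]. In this linear form each slope is the
    implicit marginal price of its attribute (dollars per unit).
    """
    X = np.column_stack([np.ones_like(S), S, N, E])
    beta, *_ = np.linalg.lstsq(X, P, rcond=None)
    return beta


# Hypothetical market: prices are generated from known implicit prices
# (150 $/sq ft, 2 $ per $ of median income, -20 $/ft from open space).
rng = np.random.default_rng(42)
n = 5000
S = rng.uniform(800, 3000, n)         # structural: living area (sq ft)
N = rng.uniform(30_000, 120_000, n)   # neighborhood: median income ($)
E = rng.uniform(0, 3000, n)           # environmental: distance to open space (ft)
P = 50_000 + 150 * S + 2 * N - 20 * E + rng.normal(0, 10_000, n)

beta = fit_hedonic_ols(S, N, E, P)
for name, b in zip(["intercept", "S", "N", "E"], beta):
    print(f"{name:>9}: {b:,.2f}")
```

With real transaction data the synthetic arrays would be replaced by observed attributes, and the functional form (linear, semi-log, or log-log) becomes a modeling choice in its own right. Plain OLS of this kind also relies on the iid error term of Eq. (2.1), which is precisely the assumption that spatial effects can violate.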
Researchers have used HPM to study open space value for more than three
decades. An early example is Correll et al.’s (1978) assessment of greenbelt value in
Boulder, Colorado. In their model, they included several housing structure variables
(age of house, number of rooms, square footage of living area, and lot size), a
neighborhood variable (distance to urban center), and a variable measuring the
distance to the greenbelt. They chose a linear functional form and estimated the
model by ordinary least squares. The results showed that properties adjacent to the
greenbelt would be valued 32% higher than those 3,200 feet away. A recent example
of an HPM application is Kaufman and Cloutier’s (2006) article on the impact of
small brownfields and greenspaces on residential property values in a neighborhood
of Kenosha, Wisconsin. They used similar house structural variables, but added
dummy variables for architectural style and house condition, and used distances to
the two brownfield sites and the local park as the key explanatory variables. They
chose a semi-log functional form for the hedonic model and entered the distance
variables in two alternative forms: the inverse of distance and the log of distance.
The two versions of the model showed that eliminating the brownfields would
increase property values in the neighborhood by between $1.19 and $4.31 million,
while converting the brownfields to greenspace would increase values by between
$2.40 and $7.01 million.
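Because Kaufman and Cloutier (2006) entered distance in two different forms within a semi-log model, it may help to see how each form translates a coefficient into price changes. The sketch below assumes a model of the form ln(P) = ... + β·g(d), with g either 1/d or ln(d), and holds all other attributes constant; the coefficient values are hypothetical and are not the estimates reported in that study.

```python
import math


def inverse_distance(d):
    return 1.0 / d


def log_distance(d):
    return math.log(d)


def pct_price_change(beta, g, d_from, d_to):
    """Percent change in price implied by ln(P) = ... + beta * g(d)
    when a property's distance changes from d_from to d_to."""
    return 100.0 * (math.exp(beta * (g(d_to) - g(d_from))) - 1.0)


# Hypothetical coefficients (not estimates from Kaufman and Cloutier, 2006).
BETA_INVERSE = 10.0   # on 1/d: positive sign => price falls as distance grows
BETA_LOG = -0.05      # on ln(d): negative sign => price falls as distance grows

for label, beta, g in [("inverse", BETA_INVERSE, inverse_distance),
                       ("log", BETA_LOG, log_distance)]:
    near = pct_price_change(beta, g, 100, 200)
    far = pct_price_change(beta, g, 1000, 1100)
    print(f"{label:>7}: 100->200 ft {near:+.2f}%, 1000->1100 ft {far:+.2f}%")
```

Both forms imply that an extra 100 feet matters much more close to the open space than far from it, but the inverse form decays far more sharply than the log form, which is one reason the choice between them can change the estimated total benefit.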
These two examples show that hedonic models have been applied in much the same
way across studies, apart from some variation in functional form. This is confirmed
by McConnell and Walls (2005), who provided similar examples in their
comprehensive review of open space studies. In the next section, we examine how
open space has been incorporated in the hedonic model, and whether this has
changed over the years.
2.3 The Inclusion of Open Space Variables in HPMs
2.3.1 Distance-based metrics
Distance to open space has long been a typical way to measure the impact of open
space in a hedonic model. The representation of distance can be divided into three
categories: (1) whether or not properties are adjacent to open space; (2) whether or
not properties are within a certain distance of open space; and (3) continuous
measures of the distance to open space (Table 2.1).
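The three categories can be made concrete with a small sketch. It assumes, purely for illustration, that each open space is an axis-aligned rectangle and each property is a point in projected coordinates measured in feet; the class name OpenSpace, the 50-foot adjacency tolerance, and the 1,500-foot zone radius are hypothetical choices. Real studies would typically compute these measures on parcel polygons in a GIS.

```python
import math
from dataclasses import dataclass


@dataclass
class OpenSpace:
    """Hypothetical open space approximated by an axis-aligned rectangle (feet)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def distance_to(self, x: float, y: float) -> float:
        # Euclidean distance from point (x, y) to the rectangle; 0 if inside.
        dx = max(self.xmin - x, 0.0, x - self.xmax)
        dy = max(self.ymin - y, 0.0, y - self.ymax)
        return math.hypot(dx, dy)


def distance_metrics(x, y, open_spaces, adjacency_tol=50.0, zone_radius=1500.0):
    """Return the three distance-based open space metrics for one property."""
    d = min(space.distance_to(x, y) for space in open_spaces)
    return {
        "adjacent": int(d <= adjacency_tol),   # (1) adjacency dummy
        "within_zone": int(d <= zone_radius),  # (2) zone-of-distance dummy
        "distance": d,                         # (3) continuous distance
    }


parks = [OpenSpace(0, 0, 500, 500), OpenSpace(4000, 0, 4600, 800)]
for house in [(520, 250), (1500, 250), (2300, 2300)]:
    print(house, distance_metrics(*house, parks))
```

The same nearest-distance value feeds all three metrics, which shows how the adjacency and zone measures are coarsened versions of the continuous one.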
Although the simplest and coarsest of measures, adjacency may still reveal
the significant impact of open space on property prices. Vrooman (1978), for
example, found that land parcels adjacent to the Adirondack Forest Preserve were
valued about $20 per acre higher than similar non-adjacent parcels. Thibodeau and
Ostro (1981) studied wetlands in the Charles River Basin in Massachusetts, and reported
that the properties abutting wetlands were worth $400 more than non-abutting
properties. Do and Grudnitski (1995) concluded that sale prices of houses abutting a
golf course were 7.6% higher than would otherwise be expected. Focusing on vacant
land rather than houses, Knapp (1985) found that non-urban land was valued
significantly more inside the urban growth boundary (greenbelt) in Portland, Oregon
than outside of it. Recently, Thorsnes (2002) showed that building lots in Grand
Rapids, Michigan sell at a premium of about $5,800 to $8,400 when bordering a
forest preserve.
Table 2.1 List of open space variables used in hedonic models
Individual variables
  Distance-based metrics
    (1) Adjacency (e.g. Vrooman, 1978; Thibodeau and Ostro, 1981)
    (2) Zones of distance (e.g. Bolitzer and Netusil, 2000; Geoghegan, 2002)
    (3) Euclidean distance (e.g. Correll et al., 1978; Brown and Connelly, 1983)
  Other metrics
    (1) Visibility (e.g. Luttik, 2000; Benson et al., 1998)
    (2) Size (e.g. Des Rosiers et al., 2002; Maddison, 2000)
Composite variables
  Type × Distance / Visibility / Size (e.g. Bin and Polasky, 2002; Benson et al., 1998; Lutzenhiser and Netusil, 2001)
  Distance × Visibility / Size (e.g. Benson et al., 1998; Anderson and West, 2006; Ready and Abdalla, 2003)
  Open space variable(s) × Structural or Neighborhood Attribute variables (e.g. Anderson and West, 2006; Des Rosiers et al., 2002)
  Indices of open space variables: more complicated mathematical functions combining two or more open space variables into an index variable (e.g. Geoghegan et al., 1997; Powe et al., 1997)
Sometimes, adjacency to open space can be divided into several types. For
example, Weicher and Zerbst (1973) studied five parks in Columbus, Ohio, and
found that properties facing open space and properties backing onto a park sold for
$1,130 more than properties one block away. However, properties facing a heavily
used recreational area in a park sold for $1,150 less than properties one block away
from the park.
Zones based on distance from open space are used more commonly than adjacency as
open space metrics. The least sophisticated approach is to define
a zone within a certain radius from open space to represent the outer boundary of the
open space effect. Property prices within the zone are then compared with those
outside of the zone. For example, Bolitzer and Netusil (2000), focusing on Portland,
Oregon, estimated that homes located within 1,500 feet of a public park sold for
$2,262 more than homes located more than 1,500 feet from any open space; and the
effect on house prices within 1,500 feet of a golf course was estimated to be $3,400.
Alternatively, one can define a zone of open space within a radius from each
property, within which other characteristics of the open space can be studied.
Geoghegan (2002), for example, investigated land within 1,600 m of each parcel (i.e. roughly the land within a 20-minute walk) and found that permanent open
space increased nearby residential land values over three times as much as an
equivalent amount of developable open space.
Multiple zones at different distances from a common center can provide
additional information about open space effects. The abovementioned Bolitzer and
Netusil (2000) study also divided their single 1,500-foot zone into six smaller contiguous zones: 0-100 feet, 100-400 feet, 400-700 feet, 700-1,000 feet, 1,000-1,300 feet, and 1,300-1,500 feet. Each property under investigation was then assigned to one of the zones through a dummy variable. The hedonic analysis results showed a distance-decay pattern among the significant parameters: the greater the distance to open space, the smaller the increase in property value. Similarly, an
earlier study by Hammer et al. (1974), who focused on land values surrounding the
1,294-acre Pennypack Park in Philadelphia, Pennsylvania, found that the park
accounted for 33% of the land value at 40 feet, 9% at 1,000 feet, and 4% at 2,500
feet. Within their study area, a net increase in real estate value of $3.3 million was
directly attributable to the park.
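The multiple-zone approach can be sketched as a function that turns a property's distance to open space into one dummy per distance band. The band limits below follow the six Bolitzer and Netusil (2000) zones; properties beyond 1,500 feet get all zeros and serve as the omitted reference category. The function name, the sample distances, and the convention that each band is closed at its lower limit and open at its upper limit are illustrative assumptions. In a real study, the distances would come from a GIS and the dummies would enter the hedonic regression alongside structural and neighborhood attributes.

```python
# Zone break points in feet (the six bands described in the text).
ZONE_BREAKS = [0, 100, 400, 700, 1000, 1300, 1500]
ZONE_LABELS = ["0-100", "100-400", "400-700", "700-1000", "1000-1300", "1300-1500"]


def zone_dummies(distance_ft):
    """Return one 0/1 dummy per distance band.

    Bands are [lower, upper). A property at or beyond 1,500 ft gets all
    zeros, i.e. it falls in the reference group (an assumed convention).
    """
    if distance_ft < 0:
        raise ValueError("distance must be non-negative")
    dummies = [0] * len(ZONE_LABELS)
    for k, (lower, upper) in enumerate(zip(ZONE_BREAKS[:-1], ZONE_BREAKS[1:])):
        if lower <= distance_ft < upper:
            dummies[k] = 1
            break
    return dummies


if __name__ == "__main__":
    # Hypothetical distances (feet) from four properties to the nearest park.
    for d in [50, 650, 1450, 2000]:
        print(d, dict(zip(ZONE_LABELS, zone_dummies(d))))
```

Because the reference group is left out, each band's coefficient in the hedonic model reads as a price difference relative to homes more than 1,500 feet away. That is the comparison behind the distance-decay pattern described above.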
Besides defining zones, many studies have also used distance as a continuous variable. For example, the frequently cited Correll et al. (1978) study measured walking distance to the greenbelt from each property, and found a $4.20 decrease in property prices for each additional foot away from the greenbelt up to 3,200 feet. This implied that properties within 3,200 feet of the greenbelt were worth 32% more on average than those beyond this distance. Brown and Connelly (1983) applied a similar approach to state parks, and found that two New York State parks added $50 to $72 in value for every 100 feet that properties were closer to the parks. However, a common critique of these approaches is that a single linear distance term imposes the same price effect for every unit of distance, and so cannot capture an impact that diminishes with distance. A possible solution is to use multiple distance variables, one for each zone. Nelson (1986) took a step in this direction in evaluating the impact of an urban containment greenbelt in Salem, Oregon. He found that land value increased with proximity to the greenbelt within 5,000 feet of it, but decreased from 5,000 to 17,000 feet outside the greenbelt.
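The contrast between one continuous distance term and zone-specific distance terms can be sketched as a piecewise-linear (linear spline) specification. With a single term, a coefficient such as Correll et al.'s -$4.20 per foot implies the same price change for every foot, or about $13,440 over 3,200 feet. Splitting distance at a knot gives each segment its own slope. The 5,000-foot knot echoes Nelson's threshold. The function names and the segment slopes in the usage example are hypothetical and are not estimates from any of these studies.

```python
def linear_price_change(distance_ft, slope_per_ft):
    """Price change implied by a single linear distance term."""
    return slope_per_ft * distance_ft


def spline_terms(distance_ft, knot_ft=5000.0):
    """Split distance into two segment variables that sum to the original distance.

    near: the part of the distance up to the knot; far: the part beyond it.
    Each enters the hedonic model with its own coefficient.
    """
    near = min(distance_ft, knot_ft)
    far = max(distance_ft - knot_ft, 0.0)
    return near, far


def spline_price_change(distance_ft, slope_near, slope_far, knot_ft=5000.0):
    """Price change implied by the two-segment (piecewise-linear) specification."""
    near, far = spline_terms(distance_ft, knot_ft)
    return slope_near * near + slope_far * far


if __name__ == "__main__":
    # Correll et al. (1978): -$4.20 per foot over 3,200 feet, about -13440.
    print(linear_price_change(3200, -4.20))
    # Hypothetical slopes: value falls with distance up to the knot, rises beyond it.
    print(spline_price_change(8000, slope_near=-2.0, slope_far=0.5))
```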
2.3.2 Other metrics
The view/visibility and size of open space have also been frequently used as open
space attributes in hedonic models. They often enter a hedonic model along with the
distance variable in a linear (i.e. additive) fashion to estimate the value of nearby
open space. The visibility of open space can be represented as a simple presence/absence dummy variable in a hedonic model (Luttik, 2000) or, alternatively, the
visibility can be first coded by type (e.g. ocean, lake, mountain, etc.) and / or quality,
before being converted to dummy variables (Benson et al., 1998; Paterson and Boyle,
2002). The size of open space is often measured as a percentage of the overall study
area (Des Rosiers et al., 2002; Garrod and Willis, 1992a, 1992b; Geoghegan et al.,
1997; Irwin, 2002; Irwin and Bockstael, 2001). Sometimes, researchers use the area
of open space within a certain distance to properties directly (Maddison, 2000;
Mahan et al., 2000; Morancho, 2003).
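A "share of open space within a given distance" measure can be computed on a land-cover raster. The sketch counts the cells whose centers fall inside a circular buffer around a property and reports the fraction coded as open space. It assumes a small hypothetical grid (1 = open space, 0 = other), a cell size in meters, and a property located at a cell center. A GIS buffer-and-overlay operation on polygon data would give comparable results with exact geometry.

```python
import math


def open_space_share(grid, row, col, radius_m, cell_size_m):
    """Fraction of raster cells within radius_m of cell (row, col) coded as open space.

    grid: list of lists with 1 for open space and 0 otherwise (hypothetical coding).
    Distances are measured between cell centers. Cells outside the grid are
    ignored, so buffers near the grid edge cover fewer cells.
    """
    total = 0
    open_cells = 0
    reach = int(math.ceil(radius_m / cell_size_m))  # search window half-width in cells
    for r in range(max(0, row - reach), min(len(grid), row + reach + 1)):
        for c in range(max(0, col - reach), min(len(grid[0]), col + reach + 1)):
            dist = math.hypot(r - row, c - col) * cell_size_m
            if dist <= radius_m:
                total += 1
                open_cells += grid[r][c]
    return open_cells / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical 5 x 5 land-cover grid with 100 m cells; the property sits at (2, 2).
    grid = [
        [1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0],
        [0, 0, 0, 0, 1],
    ]
    for radius in (100, 150, 300):
        print(radius, open_space_share(grid, 2, 2, radius, 100))
```

Computing the share for two radii and using both as regressors mirrors studies that distinguish nearby from more distant open space.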
Researchers have also used proxy measures of open space area. For example,
Peiser and Schwann (1993) used ‘lot depth’ to represent greenspace area when they
compared the value trade-off between private back yard greenspace and public
greenbelt between private properties. Tree cover and sometimes tree counts have
been treated as a proxy of size, to estimate the value of mature trees surrounding
properties (Morales et al., 1976; Standiford et al., 2001). Sometimes, researchers also
take into account different spatial scales (i.e. geographic extents) while measuring
open space areas. For example, Des Rosiers et al. (2002) defined the percentage of
tree cover at the property and neighborhood levels, as two open space variables, in
their analysis of neighborhood landscaping and house values.
In general, both visibility and size of open space have shown significant
positive effects. For instance, Luttik (2000) found that property values in the
Netherlands were increased by pleasant views of various open space types, such as
water (8% to 10%), open space (6% to 12%), and attractive landscape (5% to 12%).
In a detailed study of wetlands, Mahan et al. (2000) reported that increasing the size
of the nearest wetland to a residential property by one acre increased the property
value by $24.
2.3.3 Composite variables
Two or more of the aforementioned open space variables are often added to hedonic
models individually (e.g. distance to waterfront is included along with the size of
greenspace setback in Brown and Pollakowski (1977); distance to sea is included
along with percentage of various farmlands and permanent grassland in Le Goffe
(2000)). Often, however, they are combined with each other or with other open
space variables via interactive (i.e. multiplicative) terms. Such combinations allow
further division and thus more detailed analysis of open space effects.
The most common composite variables interact a dummy variable representing the open space type with one of the aforementioned variables: distance to open space (in the form of either zones or continuous measures), size, or visibility of open space. This allows the impact of each attribute to vary by open space type. Following this approach,
Lutzenhiser and Netusil (2001) found that both natural area parks and parks
dedicated to specialized features / facilities had positive and significant effects on
property values, but the latter brought larger property value increments as park size
increases.
However, such explicit multiplicative/interactive composite variables have
not been implemented as often as they could have been in hedonic models. Instead of
using dummy variables, many analysts have simply constructed separate variables
for open space attributes by different open space types. For example, considering the potentially different effects of coastal and inland wetlands, Bin and Polasky (2002) constructed distance variables for these two types of wetland separately. They found that reducing the distance to a coastal wetland by 1,000 feet raised property prices by $1,010 at an initial distance of one mile, but the same reduction in distance to an inland wetland decreased the property price by $567. Similarly, Anderson and West (2006), Espey and Owusu-Edusei (2001), King et al. (1991), Shultz and King (2001), Tyrväinen
(1997), and Tyrväinen and Miettinen (2000) also measured distance to open space by
different open space types separately. Some studies also constructed visibility
variables for different types of open space. For instance, Benson et al. (1998)
reported that the positive impact of open space view ranged from 8.2% for a poor
partial ocean view to 126.7% for a lakefront view (possibly because the lake also
provides recreational opportunities).
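Suppose each property is linked to its nearest open space and that space's type. In that setting, a type-specific distance variable is simply a type dummy multiplied by distance. This is why separately constructed variables and explicit interaction terms can carry the same information. The sketch below builds the interaction columns for a few hypothetical properties. The data layout, the type labels, and the helper name are assumptions for illustration. Studies that measure the distance to the nearest feature of every type for every property use a different layout.

```python
def type_interactions(properties, types):
    """Build type-dummy x distance interaction columns.

    properties: list of (nearest_type, distance) pairs (hypothetical layout).
    types: ordered list of open space types, one output column per type.
    A property's distance appears only in the column of its nearest open
    space's type; the other columns are zero.
    """
    rows = []
    for nearest_type, distance in properties:
        row = []
        for t in types:
            dummy = 1 if nearest_type == t else 0
            row.append(dummy * distance)
        rows.append(row)
    return rows


if __name__ == "__main__":
    data = [("coastal", 1200.0), ("inland", 800.0), ("coastal", 300.0)]
    for row in type_interactions(data, ["coastal", "inland"]):
        print(row)
```

Each column then receives its own coefficient, so the marginal effect of distance can differ in sign and size by open space type, as in the wetland example above.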
Another common group of composite variables combine distance (to open
space) variables with open space visibility or size. The abovementioned Benson et al.
(1998) study furthered their analysis by multiplying view variables with dummy
variables representing distance zones. Their results showed that an unobstructed
ocean view adds 68.3% to value if the property is located within 0.1 mile from the
ocean, but only 44.7% to value if the property is located further than a mile from the
ocean. Anderson and West (2006) used multiplicative terms to examine distances to
various open space types and their sizes in the Minneapolis–St. Paul metropolitan
area. They found that proximity to a neighborhood park, a golf course, or a lake decreased surrounding property values as the size of the open space increased, but this relationship was reversed for a special park (e.g. a national, state, or regional park) or a cemetery. Such multiplicative terms are not always used directly. Ready
and Abdalla (2003) also combined distance and size variables when they estimated
the impact of surrounding land use on residential properties in Berks County,
Pennsylvania. They measured the size of open space within 400 m and between 400
and 1,600 m from properties separately. Their results showed that increasing open
space size had greater positive effects on property values within the 400 m zone.
These positive effects not only decreased in the 400-1,600 m zone, but they were
also only significant for public land or land covered by a conservation easement
within that distance. Geoghegan et al. (2003) took a similar approach: using separate
variables to represent the percentages of open space within a 100 m buffer around
each parcel (proxy for the view of open space) and a 1,600 m buffer (proxy for the
distance of a 20-minute walk) in three Maryland counties. The estimation results
showed very different impacts of open space coverage within these two buffers.
Sometimes, composite variables multiply open space variables with house
structural attributes or neighborhood attributes. For example, Anderson and West
(2006) multiplied distances to open space with lot size, population density, distance
to the CBD (central business district), income levels, crime rates, and neighborhood
age composition, respectively. These composite variables turned out to be necessary,
because the authors found that proximity to open space was valued higher in
neighborhoods that were dense, had high income, high crime, were located near the
central business district, or contained many children. The aforementioned Des
Rosiers et al. (2002) study also generated interactive variables combining landscape
feature sizes (i.e. percentage of ground cover and tree cover) with house types (e.g.
bungalow, cottage, semi-detached house, and row house) or demographic features
(e.g. age of household residents). Their results showed that increasing tree cover
raised most property values, except for households dominated by residents aged
45-64 years. Also, only bungalows and cottages benefited from larger amounts of
immediate ground cover.
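The logic of these interaction terms can be written out for a generic linear specification with one open space variable $D$ (e.g. distance to open space) and one neighborhood attribute $Z$ (e.g. income). This functional form is an illustrative assumption, not the exact specification of either study:

$$P_i = \beta_0 + \beta_1 D_i + \beta_2 Z_i + \beta_3 (D_i \times Z_i) + \dots + \varepsilon_i$$

$$\frac{\partial P_i}{\partial D_i} = \beta_1 + \beta_3 Z_i$$

The marginal effect of distance therefore varies with $Z$. If $\beta_1 < 0$ (prices fall with distance from open space) and $\beta_3 < 0$, the price gradient becomes steeper as $Z$ rises, so proximity is worth more in neighborhoods with higher $Z$.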
Occasionally, researchers combine more than two variables into a single
‘index’ variable. For instance, Geoghegan et al. (1997) developed indices for land
use diversity and fragmentation by combining land use types, size, and distance.
Powe et al. (1997) incorporated four woodland accessibility (distance) measurements
into a forest access index in order to reduce potential multicollinearity between the
individual measurements. Din et al. (2001) constructed a geo-index that described a
weighted average of eight environmental qualities, including distance to nature and
quality of view. Acharya and Bennett (2001) evaluated New Haven’s watersheds by
using several index variables, including diversity, richness, and development indices
that were all compiled from multiple open space variables.
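As an example of an index variable, a land-use diversity index is often computed with the Shannon formula $H = -\sum_k p_k \ln p_k$, where $p_k$ is the share of land-use type $k$ in the area around a property. Whether the studies cited above used exactly this formula is not stated here. The sketch only illustrates how several land-use shares collapse into a single index, and the input shares are hypothetical.

```python
import math


def shannon_diversity(shares):
    """Shannon diversity of land-use shares (proportions that should sum to 1).

    Zero shares contribute nothing, following the convention 0 * ln(0) = 0.
    """
    total = sum(shares)
    if not math.isclose(total, 1.0, rel_tol=1e-9, abs_tol=1e-9):
        raise ValueError("shares must sum to 1")
    return sum(-p * math.log(p) for p in shares if p > 0)


if __name__ == "__main__":
    # Hypothetical shares: residential, commercial, forest, pasture.
    print(shannon_diversity([0.25, 0.25, 0.25, 0.25]))  # ln(4), about 1.386: the maximum for 4 types
    print(shannon_diversity([1.0, 0.0, 0.0, 0.0]))      # 0.0: a single land use
```

Collapsing several correlated shares into one index is also the rationale Powe et al. (1997) gave for their access index: fewer, less collinear regressors.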
2.3.4 Role of Geographic Information Systems (GIS)
All the above-mentioned open space variables (distance, view and size) can be measured efficiently in a Geographic Information System (GIS). A GIS can quickly assemble large amounts of spatial data, link spatial features to attribute data, and visualize spatial analysis results. Environmental economists have used GIS to measure open space size and/or distance to open space (e.g. Bolitzer and Netusil, 2000; Geoghegan et al., 1997; Kestens et al., 2004; Mahan et al., 2000; Powe et al., 1997), to link census data with open space variables (e.g. Shultz and King, 2001), or to geocode house addresses and specify geographic coordinates for properties (e.g. Acharya and Bennett, 2001).
Some researchers have used GIS analysis to construct more elaborate spatial
variables. Spatial query and buffering are relatively simple analyses and can be
quickly used to identify open space within certain distances (e.g. Lindsey et al., 2004;
Mooney and Eisgruber, 2001). More sophisticated analyses include network analysis,
image classification, and viewshed analysis. Kestens et al. (2004) used the network
analysis function to calculate ‘car time distance’ (CTD), a more accurate measure of
accessibility than the distance ‘as the crow flies’. CTDs were computed based on a
road network, taking into account various obstacles and constraints on the road.
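The difference between network-based and straight-line distance can be illustrated with Dijkstra's shortest-path algorithm on a tiny road graph whose edge weights are travel times. The graph, coordinates, and travel times below are hypothetical. Kestens et al. (2004) computed car-time distances with GIS network functions rather than with this code. The point is only that the network measure can differ sharply from the "as the crow flies" distance when a barrier forces a detour.

```python
import heapq
import math


def shortest_times(graph, source):
    """Dijkstra's algorithm: minimum travel time from source to every reachable node.

    graph: dict mapping node -> list of (neighbor, travel_time_minutes).
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        time, node = heapq.heappop(queue)
        if time > best.get(node, math.inf):
            continue  # stale queue entry; a shorter route was already found
        for neighbor, cost in graph.get(node, []):
            candidate = time + cost
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best


if __name__ == "__main__":
    # Hypothetical network: a river separates "home" from "park",
    # and the only crossing is a bridge. Coordinates are in km.
    coords = {"home": (0.0, 0.0), "bridge": (3.0, 0.0), "park": (0.0, 0.5)}
    roads = {
        "home": [("bridge", 4.0)],
        "bridge": [("home", 4.0), ("park", 4.5)],
        "park": [("bridge", 4.5)],
    }
    times = shortest_times(roads, "home")
    crow_km = math.dist(coords["home"], coords["park"])
    print(f"straight-line distance: {crow_km:.1f} km; car time: {times['park']:.1f} min")
```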
Kestens et al. (2004) also employed the image classification functions of GIS to
identify various land uses from satellite images. Viewshed analysis has been
adopted by researchers to study the impact of open space views (e.g. Bastian et al.,
2002; Din et al., 2001; Lake et al., 1998; Paterson and Boyle, 2002). These analyses
use topographic data (e.g. digital elevation models) to identify the visible areas from
a source point. Thus researchers can discover a continuous view extent and overlay
that with the composition of various land uses in their study area.
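The core test inside a viewshed analysis is a line-of-sight check. A target is visible from an observer if no closer terrain rises above the sight line to it. The sketch below applies that test along a single hypothetical elevation profile sampled at equal spacing. A full viewshed repeats the check along rays in every direction over a digital elevation model. GIS implementations also add refinements such as earth curvature and interpolation, which are omitted here. The observer height and profile values are assumptions.

```python
def visible_along_profile(elevations, observer_height=1.7):
    """Return a list of booleans: is each profile point visible from point 0?

    elevations: terrain heights (m) at equally spaced points along one ray.
    The observer stands at index 0 with eyes observer_height metres above ground.
    A point is visible if the slope from the eye to it is at least the steepest
    slope to any closer point (the running horizon).
    """
    eye = elevations[0] + observer_height
    visible = [True]  # the observer's own location
    max_slope = float("-inf")
    for i in range(1, len(elevations)):
        slope = (elevations[i] - eye) / i  # rise over horizontal distance (in steps)
        visible.append(slope >= max_slope)
        max_slope = max(max_slope, slope)
    return visible


if __name__ == "__main__":
    # Hypothetical profile: a ridge at index 2 hides the low ground behind it,
    # but a taller hill further away is still visible.
    profile = [100.0, 101.0, 110.0, 103.0, 104.0, 130.0]
    print(visible_along_profile(profile))
```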
The most recent GIS developments have pushed GIS beyond these data
management and analysis functions. Researchers have advocated the integration of
GIS and spatial statistics to support more comprehensive spatial statistical analysis
(e.g. Anselin, 1992; Ding and Fotheringham, 1991; Goodchild et al., 1992;
Openshaw et al., 1991; Pebesma and Wesseling, 1998). This integration can
significantly help when addressing the problems that we are about to discuss in the
next section.
2.4 Potential Problems with Inclusion of Open Space Variables in
HPMs
Next, we will discuss the problems caused by spatial effects embedded in hedonic
models when assessing environmental amenities, such as open space. There are some
other, better known potential problems with hedonic models, such as functional form
misspecification, multicollinearity among variables, and omitted variable bias, etc.,
but these issues have already been reviewed in the literature (e.g. Arguea and Hsiao,
1993; Cropper et al., 1988; Halvorsen and Pollakowski, 1981; Palmquist, 2005;
Pendleton and Shonkwiler, 2001; Taylor, 2003).
As reviewed above, the spatial attributes of open space are often expressed as
distance to open space, open space size or visibility (Table 2.1). However, using
these variables assumes a stable and fixed spatial process for the hedonic model,
which is often not true. Open space attributes have a strong attachment to location,
which exposes them to spatial effects, namely spatial heterogeneity and spatial
dependency.
Spatial heterogeneity generally refers to the lack of spatial uniformity. It
either points to the heterogeneity of spatial units within the study area, or the
structural instability of the targeted variable’s behaviour over space due to the varied
conditions in local areas. In a regression model, heterogeneous spatial units often
cause misspecifications or measurement errors that lead to spatial heteroscedasticity
in the error term (i.e. non-constant error variance), which is also known as ‘spatially
induced heteroscedasticity’. Ignoring it will lead to inefficient estimation of
coefficients and invalid t- or F-test results. Structural instability, on the other hand, is a more substantive form of spatial heterogeneity. It implies varying functional forms and parameters across the study area (Anselin, 1988b; Florax and de Graaff, 2004). For example, within the Los Angeles, California, metropolitan area, the coefficient of an additional bedroom would be much larger in Beverly Hills than in a Compton neighborhood. Ignoring such structural instability will cause biased
coefficient estimation. Open space attributes are typically associated with multiple
locations, and as a consequence are subject to both forms of spatial heterogeneity. In
addition, housing prices and attributes often vary between neighborhoods as well.
Spatial dependence, on the other hand, refers to local-scale dependence among observations that is determined by their relative locations (e.g. the distances between them). In other
words, an observation’s value at one location is affected by nearby observations’
values of the same attribute (Anselin, 1988b; Griffith, 1987) (Figure 2.1). A
regression of these observations will return spatially autocorrelated error terms when
ordinary least squares (OLS) regression is used. The estimated coefficients will thus
be inefficient although still unbiased, and the t- and F-test results will be invalid
(Anselin, 1988b; Anselin and Griffith, 1988; Cliff and Ord, 1981; Pace and Gilley,
1997). But if the autocorrelation in the error term is caused by omitted variables,
which indicates substantive spatial processes, then the estimation results will be
biased as well (Anselin, 1988b).
Figure 2.1 Spatial autocorrelation effects: (a) similar value clustering (positive
autocorrelation); (b) random pattern; and (c) dissimilar value clustering
(negative autocorrelation) (reproduced from Griffith 1987, p.37)
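Spatial autocorrelation in regression residuals is commonly detected with Moran's I:

$$I = \frac{n}{S_0}\,\frac{\sum_i \sum_j w_{ij} e_i e_j}{\sum_i e_i^2}$$

Here $e$ are mean-centred residuals, $w_{ij}$ are spatial weights, and $S_0$ is the sum of all weights. The sketch below computes it for hypothetical residuals on a row of properties, with adjacent lots treated as neighbors. Values well above the expectation $-1/(n-1)$ suggest clustering of similar values, as in Figure 2.1(a). Values below it suggest dissimilar neighbors, as in Figure 2.1(c). Significance testing, for example by permutation, is omitted.

```python
def morans_i(values, weights):
    """Moran's I for a list of values and a square weight matrix (list of lists).

    weights[i][j] > 0 marks j as a neighbor of i; the diagonal should be zero.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)
    numerator = sum(weights[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    denominator = sum(d * d for d in dev)
    return (n / s0) * (numerator / denominator)


def line_contiguity(n):
    """Binary weights for n properties along a street: neighbors are adjacent lots."""
    return [[1.0 if abs(i - j) == 1 else 0.0 for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    w = line_contiguity(6)
    clustered = [2.0, 1.5, 1.0, -1.0, -1.5, -2.0]   # similar neighbors
    alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # dissimilar neighbors
    print(morans_i(clustered, w))    # positive
    print(morans_i(alternating, w))  # negative
```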
Nearby open space features do not necessarily interact with each other
spatially, but their values are estimated through property values, which often affect
each other within a certain neighbourhood. When such spatial interaction is
substantial enough to be captured by some variable(s), the autocorrelation is
regarded as structural spatial dependence. These variables are often spatially
correlated with other explanatory variables (e.g. open space variables were spatially
correlated with omitted variables in Irwin and Bockstael (2001)), thus omitting these
variables in a hedonic model can lead to an autocorrelated error term as well as
biased estimation.
Often, spatial autocorrelation is also caused by measurement errors. Such errors come from the discrepancy between the spatial scale at which economic data are collected and the scale of the phenomenon under study (Anselin, 1998, 2001b, 2002). Hedonic models estimate open space values through the behavior of individual property values in response to changes in various characteristics. Yet some characteristics in the model are not measured at the individual property level, but are instead aggregated to coarser spatial units, such as census data describing neighborhood characteristics, or the variables defining local real estate markets. This aggregation tends to induce positive autocorrelation among individual property values. Such autocorrelation is regarded as nuisance spatial dependence, which only leads to inefficient estimates.
In summary, it is necessary to address spatial heterogeneity and spatial
dependency in hedonic models. Environmental economists need to take these spatial
effects into account, in order to achieve accurate estimation via hedonic modeling.
However, the existing literature on open space valuation only touches on this issue
sporadically. The next subsection will review the studies that have tackled these
issues thus far.
2.4.1 Recent Work on Spatial Effects
Environmental economists have used the spatial econometric approach to estimate
the value of open space. Standard econometric techniques can account for spatial
heterogeneity by introducing random coefficients and thereby producing varying
parameters. But so far, researchers have mostly chosen another option: extending the
standard econometric expansion method to the spatial context (Casetti, 1972, 1997).
The spatial expansion method can be formally described as follows:
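$$y_i = \alpha_i + \beta_i x_{i1} + \dots + \tau_i x_{im} + \varepsilon_i \qquad (2.2)$$

where $\alpha_i = \alpha_0 + \alpha_1 u_i + \alpha_2 v_i$, $\beta_i = \beta_0 + \beta_1 u_i + \beta_2 v_i$, and $\tau_i = \tau_0 + \tau_1 u_i + \tau_2 v_i$, and $u_i$ and $v_i$ are the x, y coordinates of location i. These linear expansion equations are not always appropriate, and more complicated functions of the coordinates are possible. The expanded parameters $\alpha_i$, $\beta_i$, and $\tau_i$ are thus location specific.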
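In practice, the expansion equations are substituted into Eq. (2.2). Each original explanatory variable $x$ then generates three regressors ($x$, $xu$ and $xv$), and the intercept generates $1$, $u$ and $v$. The expanded model is estimated by ordinary least squares. A location-specific coefficient is then recovered as, for example, $\beta_i = \beta_0 + \beta_1 u_i + \beta_2 v_i$. The sketch below performs the expansion and the recovery step, assuming the simplest linear expansion. The input values and the coefficient estimates in the usage example are hypothetical.

```python
def expand_row(x_values, u, v):
    """Expand one observation's regressors with linear coordinate terms.

    x_values: the m explanatory variables for this observation.
    u, v: its x, y coordinates.
    Returns [1, u, v, x1, x1*u, x1*v, ..., xm, xm*u, xm*v].
    """
    row = [1.0, u, v]
    for x in x_values:
        row.extend([x, x * u, x * v])
    return row


def local_coefficient(c0, c1, c2, u, v):
    """Location-specific coefficient c0 + c1*u + c2*v from the expansion estimates."""
    return c0 + c1 * u + c2 * v


if __name__ == "__main__":
    print(expand_row([3.0], u=2.0, v=5.0))
    # Hypothetical estimates for an open space coefficient, evaluated at three locations.
    for u, v in [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]:
        print((u, v), local_coefficient(0.8, -0.05, 0.02, u, v))
```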
A few studies have employed the spatial expansion method, although not
using x, y coordinates directly. Instead, they used distance to certain places to reflect
the locations of individual observations. For example, Geoghegan et al. (1997) added
interactive terms that multiply open space variables (e.g. open space percentage and
landscape indices) with distances from Washington D. C., in both linear and
quadratic forms, to their standard hedonic model. They found that both land use
diversity and fragmentation positively affected property prices in the immediate
proximity of Washington D.C. and at the extreme outer edge of the sampled area, but
the in-between areas were affected negatively. In addition, the spatial expansion
model considerably improved the significance of the parameters over the standard
hedonic model. Similarly, Patton and McErlean (2003) employed a spatial regime
model that jointly estimates separate coefficients for each sub-market in the
agricultural land market of Northern Ireland to control for spatial heterogeneity.
Both of the aforementioned studies focused on incorporating spatial
heterogeneity as structural instability, by adopting specifications that allow for
varying parameters. Des Rosiers et al. (2002) and Kestens et al. (2004) detected the
presence of spatial heteroscedasticity, another aspect of spatial heterogeneity,
through the Glejser (1969) and/or Goldfeld and Quandt (1965) tests. However,
neither study treated spatial heteroscedasticity with any modeling technique, such as
the generalized least squares specification, because it was outside the scope of their
research.
To control for spatial dependence/autocorrelation, environmental economists
have adopted spatial regressions, which take the following general form:
$$Y = \rho W_1 Y + X\beta + U \qquad (2.3)$$

$$U = \lambda W_2 U + \varepsilon \qquad (2.4)$$

where $W_1$ and $W_2$ are spatial weighting matrices that define the spatial interactions between each pair of observations, and U is the spatially autocorrelated residual vector. Thus, $\rho W_1 Y$ is a spatially lagged dependent variable and $\lambda W_2 U$ is a spatially autoregressive error term, while ε is independently and identically distributed with a mean of zero.
When λ = 0, the model is called a spatial lag model, in which the response
variable is also a function of the values of itself in surrounding areas. This model
focuses on the structural spatial dependence of Y that is caused by substantive spatial
processes (Can, 1990, 1992; Can and Megbolugbe, 1997). The spatial lag term $\rho W_1 Y$ acts as a proxy for omitted variables representing the spatial processes.
Paterson and Boyle (2002) implemented the spatial lag model to assess the impact of
open space (size and visibility) in two Connecticut towns. But the resulting
parameter estimates were only slightly different from those obtained with the OLS
model.
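When the weight matrix is row-standardized, the spatially lagged dependent variable $W_1 Y$ is simply, for each property, a weighted average of its neighbors' prices. The sketch below builds row-standardized k-nearest-neighbor weights from hypothetical coordinates and computes that lag. The choice of k, the tie-breaking rule, and the data are assumptions. The lag should not be added to an ordinary least squares regression as an ordinary regressor, because it is correlated with the error term. Spatial lag models are usually estimated by maximum likelihood or instrumental variables.

```python
import math


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights (each row sums to 1).

    Ties in distance are broken by the lower observation index.
    """
    n = len(coords)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        others = sorted((math.dist(coords[i], coords[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            w[i][j] = 1.0 / k
    return w


def spatial_lag(w, y):
    """Compute W y: each entry is the weighted average of the neighbors' values."""
    return [sum(w_ij * y_j for w_ij, y_j in zip(row, y)) for row in w]


if __name__ == "__main__":
    coords = [(0, 0), (1, 0), (2, 0), (10, 0)]
    prices = [300.0, 320.0, 340.0, 500.0]  # hypothetical, in $1,000s
    w = knn_weights(coords, k=1)
    print(spatial_lag(w, prices))
```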
When ρ = 0, the model becomes a spatial error model, where correlated residuals are modeled as functions of nearby observations (Pace and Barry, 1997; Pace et al., 1998; Pace and Gilley, 1997). In this case, spatial dependence is treated as a statistical nuisance due to misspecifications or measurement errors (e.g. spillovers across man-made boundaries). Geoghegan et al. (2003) used this model in assessing the
value of open space capitalized into the housing markets of three counties in
Maryland. Their spatial model, however, gave results similar to the OLS estimates, with only slight variations in parameter values and no statistically significant differences. In general, their results showed that the percentage of permanent open
space surrounding properties enhances property values, whereas developable open
space has insignificant or negative impacts. Bell and Bockstael (2000) also used a spatial error model to control for autocorrelation. Their results showed negative impacts of public open space and positive impacts of private open space. However, it is unclear how much improvement the spatial error model brought, since the authors did not provide any comparisons.
The choice between the two specifications depends on the spatial structure of
the targeted data, and the inclusion of both spatial terms may be feasible in some
instances (Kelejian and Prucha, 1998; Wilhelmsson, 2002). An ad hoc rule that may
be used to choose between the two models is to perform a specification test (e.g. Lagrange Multiplier, Likelihood Ratio, or Wald test) on both models, and then choose the one whose test statistic is larger and statistically significant (Anselin and Rey, 1991; Florax et al., 2003). Alternatively, Anselin et al. (1996) and Bera and Yoon
(1993) derived misspecification tests for the spatial lag and spatial error models that
are robust to local misspecifications.
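The Lagrange Multiplier test for spatial error dependence is one example of such a specification test. It needs only the OLS residuals $e$ and a weight matrix $W$:

$$LM_{err} = \left[\frac{e'We}{e'e/n}\right]^2 \Big/ T, \qquad T = \mathrm{tr}(W'W + WW)$$

The statistic is compared with a $\chi^2(1)$ critical value, 3.84 at the 5% level. The sketch below implements that formula for a small hypothetical weight matrix and residual vector. The lag version of the test and the robust variants mentioned above need additional regression quantities and are not shown.

```python
def lm_error(residuals, w):
    """LM test statistic for spatial error dependence, computed from OLS residuals.

    residuals: OLS residuals e (list of floats).
    w: n x n spatial weight matrix (list of lists), typically row-standardized.
    """
    n = len(residuals)
    e = residuals
    we = [sum(w[i][j] * e[j] for j in range(n)) for i in range(n)]  # W e
    ewe = sum(e[i] * we[i] for i in range(n))                        # e'We
    sigma2 = sum(x * x for x in e) / n                               # e'e / n
    # T = trace(W'W + WW) = sum_ij w_ij^2 + sum_ij w_ij * w_ji
    t = sum(w[i][j] * w[i][j] + w[i][j] * w[j][i] for i in range(n) for j in range(n))
    return (ewe / sigma2) ** 2 / t


if __name__ == "__main__":
    # Hypothetical row-standardized weights for four lots along a street.
    w = [
        [0.0, 1.0, 0.0, 0.0],
        [0.5, 0.0, 0.5, 0.0],
        [0.0, 0.5, 0.0, 0.5],
        [0.0, 0.0, 1.0, 0.0],
    ]
    stat = lm_error([1.0, 1.0, -1.0, -1.0], w)
    print(stat, "significant at 5%" if stat > 3.84 else "not significant at 5%")
```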
Researchers may choose either the spatial lag or spatial error model, but Bell and Bockstael (2000) and Patton and McErlean (2003) were the only studies reviewed here that used specification tests to determine their model forms. The other studies did not mention the rationale for their choice, which echoes Anselin's (2002) comment that there is a limited understanding of the model's underlying spatial processes when researchers apply spatial econometric techniques.
In addition, Anselin (1988b) pointed out that spatial autocorrelation should be
tested in combination with spatial heteroscedasticity, because they often coexist, and
their coexistence could invalidate the individual tests. But we did not find any open
space studies that performed such tests. Des Rosiers et al. (2002) and Kestens et al.
(2004) were the only studies that tested heteroscedasticity at all, albeit independent
from any spatial autocorrelation test. Such an absence is probably due to the lack of
easy-to-use tests that can detect both spatial effects. Kelejian and Robinson (1998)
and de Graaff et al. (2001) proposed new tests, the KR-SPHET and spatial BDS tests,
respectively, as a step toward a general specification test. But few empirical
studies have applied them.
Patton and McErlean (2003) did not test spatial dependence in combination
with spatial heterogeneity, but directly modeled them together. They showed how
these two spatial effects can influence each other. Their model included variables
representing spatial regimes to address spatial heterogeneity and either a spatially
lagged dependent variable or a spatially autoregressive error term to address spatial
dependence. They chose the spatial lag model in the end because specification tests
showed that the significance of spatial heterogeneity was inconclusive in the regular
spatial regime model, but definitely significant once the spatial dependence
specifications were added.
We can see that spatial regressions have a lot of subtleties to be worked out,
but they only have limited recognition in open space studies. This limitation makes it
hard to propose general rules and/or identify the impacts of applying spatial
regressions to open space studies. Therefore we need more research on these effects and protocols in open space studies.
Besides the spatial econometric regressions, some other existing treatments
of spatial autocorrelation would also benefit from more exploration. For instance, the
interactive terms (i.e. multiplying two or more variables together to make composite variables) used to build spatial expansion models to control for structural spatial heterogeneity have also been shown to reduce spatial autocorrelation. For example, Kestens et al. (2004) successfully reduced the Moran's I value (although it remained significant) by adding interactive variables multiplying environmental and distance variables (i.e. multiplying the percentage of lawn within a 300 m radius of a property by the car-time distance from the property to main activity centers), when
they analyzed the impact of spatial variation of land use and vegetation patterns on
house prices in Quebec City, Canada. Des Rosiers et al. (2002) also used interaction
variables in their model to discover the impact of landscaping on house prices in
Quebec City. However, they defined interactive variables multiplying landscaping
features and housing types or neighborhood demographic features. In addition, they
applied both ‘absolute interaction’ and ‘relative interaction’ terms. ‘Relative
interaction’ variables were generated by transforming the values of continuous
variables (e.g. percent of green ground cover) to values showing their differences
from the local means. The estimation results showed that the relative interaction
approach can successfully eliminate significant spatial autocorrelation. Although the
authors did not explain why, we think that the ‘relative’ values reduced the local
clustering of values, which in turn reduced the clustering in the error terms. So in a
sense, the relative variables acted as proxies for some omitted variables regarding
landscaping features (i.e. similar styles, density, tree species) that lead to
autocorrelation in the error term. But this needs to be confirmed by future studies of
this type.
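The "relative" transformation can be sketched as subtracting from each property's value the mean of the same variable among its neighbors. The text does not specify how Des Rosiers et al. (2002) defined the local mean. The sketch therefore assumes a fixed-radius neighborhood that excludes the property itself. The radius, coordinates, and ground-cover percentages are hypothetical. A "relative interaction" term would then multiply the result by, for example, a house-type dummy.

```python
import math


def relative_values(coords, values, radius):
    """Difference between each value and the mean of its neighbors within radius.

    A property with no neighbors inside the radius gets 0.0 (an arbitrary
    convention for this sketch).
    """
    out = []
    for i, (ci, vi) in enumerate(zip(coords, values)):
        neighbors = [values[j] for j, cj in enumerate(coords)
                     if j != i and math.dist(ci, cj) <= radius]
        if neighbors:
            out.append(vi - sum(neighbors) / len(neighbors))
        else:
            out.append(0.0)
    return out


if __name__ == "__main__":
    coords = [(0, 0), (50, 0), (100, 0), (1000, 0)]  # metres
    green_cover_pct = [40.0, 30.0, 20.0, 60.0]
    print(relative_values(coords, green_cover_pct, radius=120))
```

Because the transformed variable removes the local average, neighboring properties no longer share a common level of the variable. This is consistent with the conjecture above that the relative terms reduce local clustering in the errors.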
In addition, we found open space studies using instrumental variables to
control for spatial autocorrelation as well as endogeneity. For example, Irwin and
Bockstael (2001) and Irwin (2002) both incorporated open space attributes (e.g. size,
ownership and development potential) in their hedonic models. In the process, they
identified spatial autocorrelation caused by omitted variables that are spatially
correlated with other explanatory variables. They also discovered a common
econometric identification problem in their model: an endogenous explanatory
variable, which is determined by one or more other variables within the model.
Specifically, the likelihood of a parcel’s development is a function of its residential
value, which is a function of whether a neighboring parcel is developed. Both spatial
autocorrelation and endogenous explanatory variables will negatively affect model
estimation. To control them, both studies introduced instrumental variables –
variables that are correlated with one or more of the explanatory variables but not
correlated with the error term – into the hedonic model, and successfully removed
the estimation bias. Irwin (2002) also improved the efficiency of the model by using
a randomly drawn subset of the data that eliminated nearest neighbors. She adopted
this strategy instead of using the conventional spatial autoregressive error model,
because she could not generate a spatial weighting matrix when the error term was
correlated with the endogenous spatial variable. Her results showed that open space
was most valued for its permanency (i.e. an absence of development) rather than for
providing open space amenities. However, we need more studies with instrumental
variables to gain a better understanding of these types of spatial processes.
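The instrumental-variable idea can be illustrated in its simplest just-identified form. With one endogenous regressor $x$ and one instrument $z$, the IV slope estimate is $\hat\beta_{IV} = \mathrm{cov}(z, y)/\mathrm{cov}(z, x)$, which coincides with two-stage least squares in this case. The simulation below is hypothetical and is not Irwin's model. An unobserved factor raises both $x$ and $y$, so the OLS slope is biased upward (to about 3 here). The instrument shifts $x$ but is unrelated to the unobserved factor, so the IV estimate stays close to the true value of 2.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (n - 1)


def ols_slope(x, y):
    """OLS slope of y on x (model with an intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV (equivalently 2SLS) slope, model with an intercept."""
    return cov(z, y) / cov(z, x)


def simulate(n, seed=42, true_beta=2.0):
    """Hypothetical data in which an unobserved factor drives both x and y."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        omitted = rng.gauss(0, 1)  # unobserved factor (e.g. an omitted neighborhood attribute)
        zi = rng.gauss(0, 1)       # instrument: independent of the omitted factor
        xi = zi + omitted + rng.gauss(0, 1)
        yi = true_beta * xi + 3.0 * omitted + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate(20000)
    print("OLS slope:", round(ols_slope(x, y), 3))   # biased upward, about 3
    print("IV slope: ", round(iv_slope(z, x, y), 3))  # close to the true value 2.0
```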
2.5 Alternative Approaches to Control Spatial Effects
The abovementioned methods all treat spatial dependence and spatial heterogeneity
with a model-driven orientation (Anselin, 1988b). Spatial statistics offer a different
approach that addresses both spatial effects in a data-driven orientation, focusing on
the nature of both space and spatial data. Some spatial statistics techniques offer
great potential. They include geostatistics, local statistics, geographically weighted
regression, and spatial filtering methods. An increasing number of real estate
researchers have adopted some spatial statistics techniques to address spatial effects.
But their models rarely incorporate open space attributes, meaning open space
features have rarely been valued with such techniques.
Geostatistics is the most common spatial statistics technique that has been
used in real estate studies. It controls spatial autocorrelation by directly modeling the
variance-covariance matrix of the error term (e.g. Basu and Thibodeau, 1998; Dubin,
1988, 1992, 1998; Gillen et al., 2001). This approach usually begins by plotting a
semivariogram, which visualizes the structure of spatial autocovariance in the error
term by showing how the differences between pairs of residuals grow with the distance
separating them (see Bailey and Gatrell 1995 for a detailed explanation). The
semivariance γ thus represents spatial dependency under a stationarity assumption
(i.e. that there is no spatial heterogeneity). Next, researchers choose a functional
form (e.g. negative exponential, spherical, Gaussian) to fit the semivariogram, based
on the structure of the spatial data and their experience. The parameters of the
chosen functional form are then estimated simultaneously with the regression
coefficients through an iterative form of the generalized least squares approach.
Militino et al. (2004) compared this geostatistics approach with spatial regression
models, and found that the former yields a larger log-likelihood than the latter,
although at the cost of higher residual standard errors.
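The sketch below illustrates the exploratory step in Python, assuming numpy. It computes the classical empirical semivariogram of regression residuals, γ(h) = Σ(e_i − e_j)² / (2N(h)), where the sum runs over the N(h) pairs whose separation falls in distance bin h. It then fits an exponential model to the binned values by least squares over a grid of candidate ranges. The function names, the binning scheme and the grid search are assumptions. Weighted least squares fits are also common. This sketch does not perform the joint iterative GLS estimation described above.

```python
import numpy as np

def empirical_semivariogram(coords, resid, bin_edges):
    """Classical estimator: gamma(h) = sum of (e_i - e_j)**2 / (2 * N(h)) per distance bin."""
    coords = np.asarray(coords, float)
    resid = np.asarray(resid, float)
    i, j = np.triu_indices(len(resid), k=1)            # every unordered pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sqdiff = (resid[i] - resid[j]) ** 2
    lags, gammas = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (dist >= lo) & (dist < hi)
        if in_bin.any():                                # skip empty bins
            lags.append(dist[in_bin].mean())
            gammas.append(sqdiff[in_bin].sum() / (2 * in_bin.sum()))
    return np.array(lags), np.array(gammas)

def exponential_model(h, nugget, sill, range_):
    """Exponential semivariogram rising from the nugget toward the sill."""
    return nugget + (sill - nugget) * (1.0 - np.exp(-np.asarray(h, float) / range_))

def fit_exponential(lags, gammas, candidate_ranges):
    """Least-squares fit. For a fixed range the model is linear in the nugget and
    partial sill, so those are solved exactly for each candidate range."""
    best = None
    for r in candidate_ranges:
        basis = np.column_stack([np.ones_like(lags), 1.0 - np.exp(-lags / r)])
        params, *_ = np.linalg.lstsq(basis, gammas, rcond=None)
        sse = float(np.sum((basis @ params - gammas) ** 2))
        if best is None or sse < best[0]:
            best = (sse, params[0], params[0] + params[1], r)
    _, nugget, sill, best_range = best
    return nugget, sill, best_range

# Hypothetical residuals at three points on a line, binned at unit-ish distances.
lags, gam = empirical_semivariogram([(0, 0), (1, 0), (2, 0)], [0.0, 1.0, 3.0], [0, 1.5, 2.5])
```

In practice the residuals would come from a first-stage OLS hedonic regression. The bin width would also be chosen so that each bin holds enough pairs.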
Recently, there has also been a growing interest in modeling spatial effects
from a local perspective, because this technique can reveal more information on the
underlying spatial process than global spatial regression. Only a couple of studies
have included open space attributes. For example, Orford (2002) proposed a
multilevel modeling approach, in which parks and open space were included as
important factors to be evaluated along with other locational externalities. He
argued that spatial autocorrelation arises from ignoring spatial processes that
operate at different spatial scales and regressing all factors at the global level. He
therefore chose variables at three spatial scales to capture the spatial structure of
the housing data: (1) the property (e.g. accessibility to major activity centers,
roads, parks), (2) the street (e.g. class and quality of streets, activity in streets),
and (3) the neighborhood (e.g. percentage of open space, social and racial
composition). Orford (2002) expected that, as a result, no significant autocorrelation
would be left in the residuals, although no formal tests were performed to confirm it.
Anderson and West (2006) took a different local analysis approach, by
incorporating a local fixed effect variable (at the census block group level) into their
model to control for unobserved neighborhood characteristics, geographic locations,
and omitted spatial variables. The local fixed effect variable represents unobserved
individual local variations of value over space. These authors did not discuss
spatial heterogeneity or spatial dependence in the article, nor did they test for their
existence. However, we suspect that their model nonetheless reduced spatial
autocorrelation because the fixed effect variable probably served as a proxy for the
omitted variables that lead to spatial autocorrelation. Goodman and Thibodeau (1998;
2003), on the other hand, adopted the local perspective through a classic multilevel
(i.e. hierarchical) modeling approach: segmenting the housing market into
submarkets in which the marginal prices of housing characteristics are homogeneous.
The segmentation draws on the spatial influence of housing structural attributes and
other spatial attributes at various levels and, as a consequence, reduces spatial
autocorrelation.
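A local fixed effect such as Anderson and West's (2006) block group variable can be sketched with the within estimator. Demeaning the price and the regressors inside each block group gives the same slopes as adding one dummy variable per group. The minimal Python sketch below assumes numpy and uses simulated, hypothetical data in which an unobserved location premium is correlated with the regressor. It is not the authors' model.

```python
import numpy as np

def within_estimator(y, X, groups):
    """Slopes of y on X with one fixed effect (intercept) per group.

    Demeaning y and X within each group is numerically equivalent to adding
    one dummy variable per group (Frisch-Waugh-Lovell theorem).
    """
    y = np.asarray(y, float)
    X = np.asarray(X, float).reshape(len(y), -1)
    groups = np.asarray(groups)
    y_dm, X_dm = y.copy(), X.copy()
    for g in np.unique(groups):
        m = groups == g
        y_dm[m] -= y[m].mean()
        X_dm[m] -= X[m].mean(axis=0)
    beta, *_ = np.linalg.lstsq(X_dm, y_dm, rcond=None)
    return beta

# Hypothetical example: four block groups whose unobserved location premium
# is correlated with the regressor, which biases pooled OLS.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 25)
premium = np.array([0.0, 0.3, 0.6, 0.9])[groups]
x = 2.0 * premium + rng.normal(size=groups.size)
y = 0.5 * x + premium + 0.05 * rng.normal(size=groups.size)

pooled, *_ = np.linalg.lstsq(np.column_stack([np.ones_like(x), x]), y, rcond=None)
print("pooled OLS slope:", pooled[1])                     # absorbs part of the premium
print("fixed-effect slope:", within_estimator(y, x, groups)[0])
```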
The strength of multilevel modeling lies in its explicit breakdown of spatial
processes, which makes the model easy to interpret. However, a major problem with
multilevel modeling is that it assumes a priori knowledge of the nature of the error
term variance in order to delineate discrete boundaries, whereas spatial processes are
actually continuous. In addition, a multilevel approach decreases the number of
observations available for estimation at each stage, which could increase estimation
errors. Geographically Weighted Regression (GWR) offers an alternative that avoids
these problems (Brunsdon et al., 1996; Fotheringham et al., 2002; Fotheringham and
Charlton, 1998). It assigns different weights to each data point based on distance,
thus generating coefficients that vary across space. The resulting GWR coefficients
can then be mapped in a GIS, which facilitates the discovery of spatial patterns. This
feature is not available from spatial regression or geostatistical models. Pace and
LeSage (2004) highlighted two major technical challenges with GWR: (1) defining a
single optimum bandwidth for the varying local weights; and (2) avoiding the impact
of outliers. Counter-strategies are available, such as a locally adjusted bandwidth
that follows data density or scarcity (as discussed by Bailey and Gatrell, 1995), or
the Bayesian approach proposed by LeSage (2000), although few empirical studies
have applied them. Only a few real estate applications have adopted GWR so far (see
Yu 2004 for one example), and we found only one application on greenspace. Kestens
et al.’s (2006) study applied both GWR and the spatial expansion method to assess
various household characteristics, including vegetation cover.
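The core GWR computation can be sketched as a separate weighted least squares fit at every observation, with weights that decay with distance. This minimal Python version assumes numpy, a fixed Gaussian kernel w_ij = exp(−0.5(d_ij/b)²) and a user-supplied bandwidth b. The data and the helper name are hypothetical. Bandwidth selection (e.g. by cross-validation) and adaptive bandwidths are omitted.

```python
import numpy as np

def gwr_coefficients(coords, X, y, bandwidth):
    """Local coefficients at each observation using a fixed Gaussian kernel.

    X should include a constant column. Returns an (n, k) array whose row i
    holds the coefficients estimated at location i.
    """
    coords = np.asarray(coords, float)
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)   # distances to point i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)          # Gaussian kernel weights
        XtW = X.T * w                                     # X'W without forming diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)     # local weighted least squares
    return betas

# Hypothetical example: the slope rises from west (u = 0) to east (u = 10).
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 10, size=(n, 2))
x = rng.normal(size=n)
true_slope = 1.0 + coords[:, 0] / 10.0
y = 1.0 + true_slope * x
X = np.column_stack([np.ones(n), x])
betas = gwr_coefficients(coords, X, y, bandwidth=2.0)
```

Mapping `betas[:, 1]` against `coords` would reveal the west-east gradient. An adaptive bandwidth, in the spirit of the locally adjusted bandwidths mentioned above, would replace the fixed b with, for example, the distance to each point's kth nearest neighbor.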
Spatial filtering methods can also be loosely grouped with local modeling. They
remove spatial autocorrelation from the dependent variable before it is regressed
against other variables in a model. The removal is achieved through a spatial filter
that captures the local variation of the spatially dependent variable (Getis and
Griffith, 2002; Griffith, 2003; Tiefelsdorf and Griffith, 2007). This spatial filter is
included in a hedonic model as an additional explanatory variable, but its sole purpose
is to capture the spatial variation so that the other explanatory variables are free of
spatial effects. Depending on the data at hand, the spatial filter can range from a
locally fitted surface of x,y coordinates to a filter based on local autocorrelation
measurements (Getis, 1995), or a more sophisticated filter based on autocorrelation
measurement and eigenvector decomposition techniques (Griffith, 1996, 2000). The
spatial filtering method has a distinct advantage in that it allows OLS estimation,
which is much easier to fit and interpret than the maximum likelihood estimation used
for spatial autoregressive models. So far, spatial filtering has been applied to various
economic analyses, but we have not found any real estate hedonic models applying
this technique.
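The eigenvector-based filter can be sketched loosely after the Moran eigenvector idea in Griffith's work. First, compute the eigenvectors of MCM, where C is a binary connectivity matrix and M = I − 11ᵀ/n centers the data. Next, keep those eigenvectors with large positive eigenvalues, since they describe positively autocorrelated map patterns. Finally, add them to an ordinary OLS regression. The 0.25 eigenvalue ratio is a commonly used rule of thumb and is treated here as an assumption. Applications usually also select a subset of candidates stepwise, a step this Python sketch skips. The sketch assumes numpy, and the grid, data and helper names are hypothetical.

```python
import numpy as np

def grid_rook_weights(nrows, ncols):
    """Binary rook-contiguity matrix for the cells of an nrows x ncols grid (row-major)."""
    n = nrows * ncols
    C = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if r + 1 < nrows:
                C[i, i + ncols] = C[i + ncols, i] = 1.0
            if c + 1 < ncols:
                C[i, i + 1] = C[i + 1, i] = 1.0
    return C

def candidate_eigenvectors(C, min_ratio=0.25):
    """Eigenvectors of M C M (M = I - 11'/n) with eigenvalue >= min_ratio * largest."""
    C = np.asarray(C, float)
    C = 0.5 * (C + C.T)                          # make sure the matrix is symmetric
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(M @ C @ M)
    order = np.argsort(vals)[::-1]               # largest eigenvalue first
    vals, vecs = vals[order], vecs[:, order]
    keep = vals >= min_ratio * vals[0]
    return vecs[:, keep], vals[keep]

def spatial_filter_ols(y, X, E):
    """OLS of y on [X, E]; returns (coefficients on X, coefficients on the filter E)."""
    coef, *_ = np.linalg.lstsq(np.column_stack([X, E]), y, rcond=None)
    k = X.shape[1]
    return coef[:k], coef[k:]

# Hypothetical example on a 6 x 6 grid: the outcome contains a smooth spatial pattern.
rng = np.random.default_rng(0)
C = grid_rook_weights(6, 6)
E, eigvals = candidate_eigenvectors(C)
x = rng.normal(size=36)
X = np.column_stack([np.ones(36), x])
y = 3.0 + 2.0 * x + 4.0 * E[:, 0]
b_x, b_filter = spatial_filter_ols(y, X, E)
```

Because the retained eigenvectors are orthogonal to the constant, the filter `E @ b_filter` absorbs the spatial pattern without shifting the intercept. The regression itself remains plain OLS.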
As mentioned at the end of section 3, GIS has been mostly used for data
management and as a simple spatial analysis tool to measure and construct spatial
variables or sometimes a spatial weighting matrix. This, in turn, has promoted the
growth of spatial econometric applications in environmental economics studies. But
recent advances integrating GIS and spatial statistics have received little attention in
spatial econometric and spatial statistical applications to environmental economics.
This is unfortunate because the integration offers unique advantages for discovering
and modeling spatial processes. For example, recent couplings between GIS and
spatial statistics packages (e.g. SpaceStat by Anselin 1992; GeoDa by Anselin and
Syabri 2003; SAGE by Wise et al. 2001) have made it possible to view the graphic
patterns of spatial autocorrelation together with quantitative test results (e.g.
Moran’s I, Getis-Ord General G). In addition, local measurements of spatial
autocorrelation (e.g. local Moran’s I and Getis-Ord G_i*) can be mapped to identify
local clusters or outliers, which may help to construct hypotheses on the causes of
autocorrelation. Integrating GIS and spatial statistics also helps to estimate the
spatial structure of the autocorrelated residuals by plotting and fitting a
semivariogram. The identified spatial structure is then used to construct the
weighting matrix for spatial regressions or to estimate the covariance matrix of the
error term directly. Sometimes the integration between a GIS and a statistical
package is looser (e.g. S-PLUS with ArcView, SPSS with MapInfo). Such an
integration essentially allows regression results to be displayed interactively in a
GIS to explore their spatial pattern. These visualizations can nevertheless provide
insights into the underlying spatial processes of the observed pattern. Overall, GIS
can contribute most to environmental economics when it generates additional
information from existing spatial patterns: the ‘creative’ use of space (Bateman et
al., 2002).
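The global and local statistics mentioned above can be sketched directly. The Python block below, assuming numpy, computes three quantities:

- Moran's I = (n/S₀)(zᵀWz)/(zᵀz), where z holds deviations from the mean and S₀ is the sum of all weights.
- The local Moran's I_i = (z_i/m₂)Σ_j w_ij z_j, with m₂ = zᵀz/n.
- A permutation pseudo p-value.

The row-standardized "street" weights and the example values are hypothetical, and Getis-Ord statistics are not shown.

```python
import numpy as np

def row_standardize(W):
    """Divide each row of a weights matrix by its row sum."""
    W = np.asarray(W, float)
    return W / W.sum(axis=1, keepdims=True)

def morans_i(x, W):
    """Global Moran's I = (n / S0) * (z' W z) / (z' z), z = deviations from the mean."""
    z = np.asarray(x, float) - np.mean(x)
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)

def local_morans_i(x, W):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = z'z / n."""
    z = np.asarray(x, float) - np.mean(x)
    m2 = (z @ z) / len(z)
    return z / m2 * (W @ z)

def moran_permutation_pvalue(x, W, n_perm=999, seed=0):
    """One-sided pseudo p-value for positive autocorrelation under random relabelling."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, W)
    sims = np.array([morans_i(rng.permutation(x), W) for _ in range(n_perm)])
    return (1 + np.sum(sims >= observed)) / (n_perm + 1)

# Hypothetical example: 20 residuals along a street; each neighbours the next one.
n = 20
W = row_standardize(np.eye(n, k=1) + np.eye(n, k=-1))
trend = np.arange(n, dtype=float)          # smooth west-east trend: positive I
alternating = (-1.0) ** np.arange(n)       # high/low alternation: negative I
print(morans_i(trend, W), morans_i(alternating, W))
print(moran_permutation_pvalue(trend, W))
```

With row-standardized weights the local values sum to n times the global I, so a map of `local_morans_i` decomposes the global statistic location by location.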
In summary, all the above techniques treat spatial effects according to the
spatial structure of the data, instead of imposing an a priori model. Applying these
techniques to open space studies in a hedonic model may yield unique advantages.
But few studies have attempted this, which suggests that future research along these
lines would be fruitful.
2.6 Conclusions and Future Directions
In this chapter, we have reviewed studies that estimate the effect of open space on
property values in hedonic pricing models, taking a methodological focus to
investigate how hedonic models incorporate open space attributes. Our review shows
that most studies have simply measured the impact of open space in terms of its
distance from properties, its visibility from properties, and its size. The specific form of
these measurements varies from case to case, generating a wide range of open space
variables. But these open space variables are unable to capture spatial effects caused
by spatial interactions between open space features and property attributes. Ignoring
these spatial effects, namely spatial heterogeneity and spatial
dependence/autocorrelation, can lead to inefficient or even biased estimation of the
hedonic model.
Our review found that several open space studies have started to employ
spatial modeling techniques, such as the spatial expansion method, spatial
regressions, and local spatial statistics to address spatial effects. But the few
studies available are not enough to clarify or resolve the many subtleties involved in
these spatial modeling techniques. In addition, quite a few alternative techniques,
such as geostatistics, multilevel modeling, GWR, and the spatial filtering method, are
now available to provide additional insights into the spatial processes that generate
these spatial effects, yet few open space studies have used them. There is a clear gap
between the available spatial analysis techniques and the way space has been
represented in existing open space studies. We need more empirical studies that apply
these spatial analysis techniques. Such studies would ensure accurate estimation
results and clarify the applicability of each technique. They would also identify the
characteristics that make one approach the most appropriate for a specific study. The recent
development of GIS has created a favorable environment for such a research agenda,
because it not only helps to construct spatial variables, but also provides a powerful
platform for spatial modeling, analysis, and visualization.
Planning authorities often face the dilemma of how to reconcile the demand for open
space with the lack of funding for land acquisition. The additional value of open
space embedded in property values could alleviate this dilemma. Higher property
values generate additional property tax revenue, which might help fund open space
acquisition and maintenance (see Crompton 2001, 2004; Nicholls 2004 for further
articulation of this argument). Accurate estimation results are essential to gather
support for meaningful policy decisions, hence the need for the methodological
improvements discussed in this chapter.
The above-mentioned dilemma can also be alleviated by shifting the acquisition
target from large tracts of open space to smaller areas, or by adopting
neighborhood-scale greening. This latter strategy can enhance greenspace and property
values without demanding major land resource commitments, and may also help to
revitalize depressed inner city residential real estate markets. Unfortunately, only a
few studies have investigated the impact of open space at the neighborhood scale
(Anderson and Cordell, 1985, 1988; Diamond et al., 1987; Dombrow et al., 2000;
Kestens et al., 2004, 2006; Morales et al., 1983). There is therefore ample opportunity
for future research to fill this knowledge gap and, at the same time, to determine what
adjustments to hedonic and spatial modeling techniques are needed.
Chapter 2 Endnotes
1 The value estimated via hedonic models is restricted to the instrumental/utilitarian
value of the environmental goods. We acknowledge the existence of other values: the
option of personal future use, ecological functions, non-use values such as benefits
to future generations and existence values, and even intrinsic values of the resource
in its own right, but they are beyond the scope of this chapter.
Chapter 3: A Spatial Autocorrelation Approach for
Examining the Effects of Greenspace on Residential
Property Values: An Inner City Case Study
This chapter was published as:
Conway, D., Li, C. Q., Wolch, J. R., Kahle, C., Jerrett, M. (2008). A spatial
autocorrelation approach for examining the effects of greenspace on
residential property values. Journal of Real Estate Finance and Economics,
DOI 10.1007/s11146-008-9159-6.
3.1 Chapter 3 Introduction
Lack of access to open space, greenery, and recreation facilities is a problem in many
inner city neighborhoods of the southern California region and other densely
populated urban areas. This issue is linked to several key social problems, including
community health (e.g. childhood obesity, Gordon-Larsen et al., 2000),
environmental health (e.g. higher temperature and greater air pollution, Longcore et
al., 2004), and environmental justice (Wolch et al., 2002). In some communities lack
of open space has also reduced potential for future real estate development (Harnik,
2000). As a result, affected communities have advocated the improvement of open
space. For example, Proposition 12 of California, a $2.1 billion state bond measure
of which half was dedicated to urban areas, sought to improve access to green space.
In inner city neighborhoods, high land prices often prevent the acquisition of
significant open space for park acreage. Demolishing affordable housing and
neighborhood businesses is an unattractive and sometimes costly alternative.¹ As a
feasible solution, smaller open spaces, as well as community greening efforts, could
positively impact property values without demanding major land resource
commitments. Such neighborhood-scale enhancements could also help to revitalize
depressed inner city residential real estate markets.
Past empirical studies have treated proximity to amenities, such as parks, as an
important factor in the pricing of older inner city housing, and have sought to inform
neighborhood redesign decisions. In contrast, limited empirical
research has addressed the amenity effects of neighborhood greenspace on real estate
property values, despite its potential positive impacts. For example, proximity to
barren urban alleys usually compromises home prices, but a greened alley with
native trees and permeable surfaces could increase home values. Using data from an
inner city housing market near downtown Los Angeles, the analysis in this chapter
seeks to fill this research gap by using a geographic information system (GIS) to
create coverage variables for neighborhood greenspace that serve as explanatory
variables in a hedonic pricing model.
In addition, we extend the hedonic estimation of neighborhood greenspace
effects by controlling spatial autocorrelation through spatial regression techniques.
Ignoring spatial autocorrelation may cause ordinary least squares (OLS) to produce
inefficient coefficient estimates (Anselin, 1988b; Cliff and Ord, 1973). Our spatial
regression helps to correct these estimation problems for the price effects of
neighborhood greenspace – both in the form of traditional parks and the overall level
of greenspace or ‘green cover’ near residential dwellings.
3.2 Housing Prices and Open Space Amenities: Background
Studies of urban residential housing prices and the nature of demand for specific
housing and related neighborhood characteristics have typically employed variants of
Rosen’s (1974) classic hedonic price analysis. Marginal implicit prices (or
willingness to pay) of housing characteristics are estimated using either a one-step
procedure (following Ridker and Henning, 1967) or a two-step structural equation
approach (following Rosen, 1974; for a comparison of these two approaches, see
Blomquist and Worley, 1981). The estimated marginal implicit prices can measure
the impacts of neighborhood characteristics, such as demographic mix (Dubin and
Sung, 1990), school quality (Li and Brown, 1980; Pogodzinski and Sass, 1991), land
use policies (Wolch and Gabriel, 1981), and controversial neighborhood facilities
(Gabriel and Wolch, 1984). In particular, a group of studies has used marginal
implicit prices to measure the value of open space or greenspace as an environmental
amenity (see Crompton, 2004).²
3.2.1 Studies of Open Space Characteristics
Some studies evaluated the characteristics of open space in general, including the
diversity of open space (Geoghegan et al., 1997), different landscape features
(Bockstael and Bell, 1998), the permanency of open space (Geoghegan, 2002), the
view of open space (Jim and Chen, 2006), and the distance to open space (Kaufman
and Cloutier, 2006). Sometimes, these open space characteristics can be combined
(e.g. distance times type, in Orford, 2002; Ready and Abdalla, 2003; Shultz and
King, 2001; size times type, in Lutzenhiser and Netusil, 2001) when evaluating their
impacts on house prices.
Some studies have focused on certain kinds of open space. Wetlands (Bin and
Polasky, 2003; Mahan et al., 2000), urban forests (Tyrväinen, 1997; Tyrväinen and
Miettinen, 2000), urban parks (Morancho, 2003), community gardens (Voicu and
Been, 2008), and urban greenway trails (Crompton, 2001; Lindsey et al., 2004;
Nicholls, 2002, 2004) have attracted attention in hedonic studies. Some open space
evaluations can be incorporated into a more complex and ecologically oriented
hedonic study. For example, Acharya and Bennett (2001) compiled features of open
space and other landscapes to evaluate New Haven’s River watershed.
In spite of the growth in empirical studies of open space, the extent of green
cover immediately adjacent to residential properties in a neighborhood and its impact
on housing prices have received limited attention. The few studies that considered
such impacts suggest greenspace may exert important effects on house prices. For
instance, comparing mean sale prices of comparable homes, Morales et al. (1983)
found that home buyers valued mature tree cover on a residential lot at $9,500.
Following the comparable sales approach but with a bigger sample, Anderson and
Cordell (1988) also demonstrated a positive impact of trees in Athens, Georgia. An
earlier study by Anderson and Cordell (1985) employed hedonic models and found a
3 to 5% increase in property values due to the presence of trees. Standiford et al.
(2001) considered the value of oak woodland stands. They found that 5-acre
rangeland lots with at least 40 oaks per acre were worth 27% more than open land,
and that 2-acre lots with 40 trees per acre were worth 22% more than bare land.
Similarly, Diamond et al. (1987) reported that home values in San Diego County
decreased by $324 (or $3 per square foot) for every 1,000 feet of distance away from
a 6,000-acre oak woodland. In another recent hedonic study, Dombrow et al. (2000)
showed that houses with mature trees in Baton Rouge were approximately 2% higher
in value than those without trees. The presence of mature trees contributed about
$1,800 to an average home selling price of $93,272.
Luttik (2000) estimated not only the value of trees, but also of water and
immediate open space in eight towns or regions in the Netherlands through a hedonic
model. He reported positive effects of gardens facing water, a pleasant view of
water/open space, or attractive landscape types. Des Rosiers et al. (2002) divided
landscaping attributes of homes and their immediate environment into ground cover,
flower arrangements, rock plants, hedges, landscape curbs, density of visible
vegetation as well as roof, patio and balcony arrangements. They found the effects of
these neighborhood greenspace objects varied with landscape types and property
types. For example, hedges positively contributed to property values by 3.6 to 3.9%,
patio landscaping contributed 12.4%, and landscaped curbs contributed 4.4%. A
larger percentage of immediate ground cover increased property values for
bungalows and cottages, but the visibility of above-average vegetation density
decreased home prices. Kestens et al. (2004) integrated neighborhood greenspace
with surrounding land uses in a hedonic model. They found that the percentage of
residential land use with low tree density negatively impacted property value by
1.9%, but residential land use with mature trees within both 100 and 500 m radii
exhibited positive impacts. In contrast, agricultural land with dispersed trees
decreased property value by about 2.3% for each 10% increase in tree coverage within
a 100 m radius. They also used the Normalized Difference Vegetation Index (NDVI)
to measure green density and land use diversity in the immediate vicinity of
properties, and showed that both had positive effects on property value.
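NDVI is computed from the near-infrared (NIR) and red reflectance bands as NDVI = (NIR − Red)/(NIR + Red). It ranges from −1 to 1, with dense vegetation toward the upper end. The minimal Python sketch below, assuming numpy, computes NDVI per pixel. As one hypothetical way to summarize green density around a property, it also averages NDVI over the raster cells within a radius. Kestens et al.'s (2004) actual procedure may differ, and the helper names are illustrative only.

```python
import numpy as np

def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); NaN where both bands are zero."""
    nir = np.asarray(nir, float)
    red = np.asarray(red, float)
    denom = nir + red
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(denom == 0, np.nan, (nir - red) / denom)

def mean_ndvi_within(nir, red, row, col, radius_px):
    """Hypothetical summary: mean NDVI over cells within radius_px cells of (row, col)."""
    values = ndvi(nir, red)
    rr, cc = np.indices(values.shape)
    inside = (rr - row) ** 2 + (cc - col) ** 2 <= radius_px ** 2
    return np.nanmean(values[inside])

# Hypothetical 5 x 5 scene with uniform reflectance.
nir = np.full((5, 5), 0.6)
red = np.full((5, 5), 0.2)
print(mean_ndvi_within(nir, red, 2, 2, 1))
```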
Based on these results, we develop measures of neighborhood greenspace
coverage and recreational amenities accessibility, and estimate their contribution to
housing values in central Los Angeles for transactions in the late 1990s.
3.2.2 Valuation Methods of Open Space Studies
As mentioned above, the hedonic pricing model has often been used to estimate the
marginal implicit price of open space. However, because environmental characteristics
(e.g. open space) are location-specific, they often lead to spatial effects in hedonic
models. These effects cannot be captured by simply adding distance variables into a
model, because they represent spatial structures, interactions, or misspecifications
(Miron, 1984; Odland, 1988). Spatial effects typically include spatial heterogeneity
and spatial dependence. Spatial heterogeneity refers to the unstable spatial structure
of the phenomenon under study. It implies functional forms and parameters that vary
across the study area, or simply spatial heteroscedasticity (non-constant error
variance) due to omitted variables or some other misspecification. Spatial
heterogeneity in the form of structural instability can lead to biased coefficient and
variance estimates, whereas spatial heteroscedasticity alone reduces the estimation
efficiency of OLS. Spatial dependence/autocorrelation, on the other hand, refers to
local-scale dependence among observations as a function of the distance between
them. In other words, an observation at one location is affected by nearby
observations of the same attribute (Cliff and Ord, 1973). If ignored, spatial
dependence can lead to unbiased but inefficient OLS coefficient estimates, as well as
biased variance estimates (Anselin, 1988b; Bailey and Gatrell, 1995). However, when
spatial autocorrelation results from, or is a reflection of, omitted variable(s), ignoring
it will bias the coefficient estimates.
Reflecting the considerable growth of spatial econometrics since the mid-1990s,
spatial regressions have increasingly appeared in the real estate
literature (Basu and Thibodeau, 1998; Can, 1990, 1992; Gelfand et al., 2004; Pace et
al., 1998; Pace and Gilley, 1997) and the environmental economics literature (Beron
et al., 2004; Gawande and Jenkins-Smith, 2001; Kim et al., 2003; Leggett and
Bockstael, 2000; Paterson and Boyle, 2002; Theebe, 2004). But only a few of these
studies considered open space characteristics. Among them, some measured spatial
heterogeneity and applied spatial expansion models (e.g. the inclusion of interactive
variables by Geoghegan et al., 1997; the local fixed effect method from Anderson
and West, 2006). Some tested for autocorrelation and found none that was significant
(e.g. Acharya and Bennett, 2001; Bastian et al., 2002; Tyrväinen and Miettinen, 2000).
Some discovered significant autocorrelation and applied spatial modeling techniques
other than spatial linear regressions (e.g. the inclusion of instrumental variables by
Irwin and Bockstael, 2001; relative interaction variables by Des Rosiers et al., 2002;
multi-level open space variables by Orford, 2002; or the error term covariance
structure estimation of Troy and Grove, 2008). Several others applied spatial linear
regression: Bell and Bockstael (2000), Paterson and Boyle (2002), and Geoghegan et
al. (2003) adopted the spatial autoregressive error model, whereas Patton and
McErlean (2003) employed the spatial lag model. Both models will be explained in
detail later.
Our study adds to the empirical application of spatial linear regression. We
focus on identifying spatial dependence/autocorrelation in the hedonic model for
open space valuation, and modeling it with spatial linear regression if it exists.
3.3 Data and Methodology
3.3.1 Study Area and Housing Data
Our study area was the Vermont corridor, an older region of Los Angeles just north
of the University of Southern California in the central city (Figure 3.1). The size of
the area is about 3,900 acres. The population density in this area is quite high (about
44 persons per acre), whereas the amount of green space varies from large amounts
in the western part to virtually none in the eastern area.
We used housing data on approximately 260 sales of single-family residences
from January 1999 to June 2000. The housing data were purchased from American
Real Estate Solutions in Anaheim, California and include information from the Los
Angeles County Assessor’s Office, such as the recording date and sale price of the
house, as well as characteristics of the house, such as lot size, building area, number
of rooms, year built, quality and condition. We also extracted socio-economic data
from the 2000 U.S. Census to generate variables representing neighborhood quality
(e.g. income).
The address of each house was geocoded using the ArcView 3.2 (ESRI,
Redlands, California) geocoding algorithm to determine the unique latitude and
longitude coordinates corresponding to the location of the house. The Hollywood
Freeway (i.e. U.S. Highway 101) runs diagonally through the study area.
Figure 3.1 Vermont study area with locations of house sale transactions in
1999-2000
3.3.2 Greenspace Measures
We developed measures of neighborhood greenspace area for each house through the
use of ArcView 3.2 and ArcInfo 7.2.1 (ESRI, Redlands, California) and a
georeferenced 2001 aerial photo (Curtis Aerial Photography, California). First, the
latitude and longitude coordinates generated from geocoding enabled the creation of
a point layer of houses that contains all the sales data as attributes. Reference layers
included U.S. Census Bureau TIGER files of streets and highways, clipped to the
geographic extent of the study area. All the layers were then projected to the
Universal Transverse Mercator projection, in meters, using the projection function of
ArcInfo to match the projection of the aerial photo. The distance/display unit of the
map was set to feet to accommodate the units used for the real estate data (e.g.
square footage of a house).
The following variables were then generated: (1) distance to closest freeway
access ramp; (2) distance to closest subway portal; (3) distance to closest
park/recreation center; and (4) green acreage for varying buffer distances
surrounding each home sale location (25, 50, 75, 100, 150, 200, 250, 300, 400, 500 feet).
In ArcView, a freeway access point layer was generated by examining the
connectivity of the streets and highways layers and visually confirming ramp access
point locations with a 1999 Thomas Brothers Guide of Los Angeles. With streets and
highways as a backdrop, each freeway access point in the study area was digitized.
To find the closest access ramp for each house, a spatial join was run between the
house point layer and the freeway access point layer. This spatial join between the
two point layers created a new field, Distance (the distance to the closest freeway
access point feature), in the house layer’s attribute table, together with all other
attributes of the closest freeway access point (e.g. name, direction of access). A
similar method was used to digitize access points of subway stations and
parks/recreation centers, and then to calculate the closest distance from each house.
Field visits were used to confirm their locations.
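Outside a GIS, the nearest-feature spatial join can be approximated with a brute-force search over projected coordinates. The Python sketch below assumes numpy, planar x, y coordinates in feet, and straight-line (Euclidean) distance. The coordinates and helper name are hypothetical. For large layers, a k-d tree such as `scipy.spatial.cKDTree` would avoid building the full distance matrix.

```python
import numpy as np

FEET_PER_MILE = 5280.0

def nearest_feature(house_xy, feature_xy):
    """Index of, and straight-line distance to, the closest feature for each house.

    Both inputs are (n, 2) arrays of projected x, y coordinates in the same unit.
    """
    house_xy = np.asarray(house_xy, float)
    feature_xy = np.asarray(feature_xy, float)
    # Pairwise distances via broadcasting: shape (n_houses, n_features).
    dist = np.linalg.norm(house_xy[:, None, :] - feature_xy[None, :, :], axis=2)
    idx = dist.argmin(axis=1)
    return idx, dist[np.arange(len(house_xy)), idx]

# Hypothetical house and freeway-ramp coordinates, in feet.
houses = [(0.0, 0.0), (6000.0, 0.0)]
ramps = [(300.0, 400.0), (6000.0, 5280.0), (20000.0, 20000.0)]
idx, dist_ft = nearest_feature(houses, ramps)
dist_miles = dist_ft / FEET_PER_MILE       # the hedonic model uses miles
```

The returned index plays the role of the joined attributes (name, direction of access), since it points back to the matching feature record.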
The aerial photograph showed various dimensions of green cover, such as
tree canopy, parkways, lawns, landscaped areas, sports fields, and even cemeteries.
Using the aerial photo as a reference, all the green cover was manually digitized in
ArcView. To test the effects of green cover acreage on housing prices, the strategy
was to capture a variety of green cover values from several buffered radii centered on
house points. To achieve this goal, buffer ring layers of 25, 50, 75, 100, 150, 200,
250, 300, 400, and 500 feet around each house were first generated (Figure 3.2).
Then, each buffered layer was overlaid onto the green cover layer to determine the
amount of cover acreage at each buffer distance for each house. Results were
converted from square feet to acres. These variables, along with the distance
variables described above, were then incorporated into the hedonic regression
models.
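The buffer-overlay step can be approximated on a raster. Each cell is flagged as green or not, and the area of the green cells whose centers fall inside each ring is then summed. This raster technique is a stand-in for the vector overlay used in the study. The Python sketch assumes numpy, a square-cell grid whose row index increases northward, and hypothetical inputs. Areas are converted from square feet to acres at 43,560 square feet per acre.

```python
import numpy as np

SQFT_PER_ACRE = 43560.0

def green_acres_in_rings(green, cell_ft, house_xy, ring_edges_ft):
    """Approximate green acreage in concentric rings around one house.

    green: 2-D boolean raster, True where a cell is green cover (assumed input).
    cell_ft: side length of a square cell in feet; cell (row, col) is assumed
             to have its centre at ((col + 0.5) * cell_ft, (row + 0.5) * cell_ft).
    ring_edges_ft: increasing radii; consecutive pairs define each ring.
    """
    green = np.asarray(green, bool)
    rows, cols = np.indices(green.shape)
    d = np.hypot((cols + 0.5) * cell_ft - house_xy[0], (rows + 0.5) * cell_ft - house_xy[1])
    cell_acres = cell_ft ** 2 / SQFT_PER_ACRE
    acres = []
    for inner, outer in zip(ring_edges_ft[:-1], ring_edges_ft[1:]):
        in_ring = (d >= inner) & (d < outer)
        acres.append(np.count_nonzero(green & in_ring) * cell_acres)
    return np.array(acres)

# Hypothetical 1,200 x 1,200 ft block at 2 ft resolution, fully green, house at its centre.
green = np.ones((600, 600), dtype=bool)
rings = green_acres_in_rings(green, 2.0, (600.0, 600.0), [200, 300, 400, 500])
exact = np.pi * np.diff(np.square([200, 300, 400, 500])) / SQFT_PER_ACRE
```

Ring edges of `[0, 25]` give the 25-foot buffer disk. Edges such as `[200, 300, 400, 500]` give concentric rings like those used for the greenspace variables in the hedonic model.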
3.3.3 Model Specification
The hedonic pricing approach used in this chapter emphasizes the role of location in
real estate analysis. We used two models. In the first, locational effects are expressed
through traditional proximity variables (i.e. distance to the nearest park, distance to
the nearest freeway ramp) in a standard OLS hedonic pricing model. The second
model aims to remove spatial effects from the standard OLS model by using a spatial
regression model.
Figure 3.2 Schematic diagram illustrating the greenspace (polygons in various
shades of green color) surrounding house locations (the red dot) in the Vermont
Corridor (buffers at 25, 50, 75, 100, 150, 250, 300, 400 and 500 feet)
Method 1: Standard Hedonic Pricing Model
The hedonic model for house prices relates the sales price of the house to
characteristics of the house and property, neighborhood amenities, accessibility, and
the time of the sale. The empirical specification used in this chapter is:
$$
\ln(Y_{i,t}) = \beta_0 + \beta_1 \ln(\mathit{Area}) + \beta_2 \ln(\mathit{LotSize}) + \beta_3\,\mathit{Age} + \beta_4\,\mathit{Age}^2 + \beta_5\,\mathit{MedInc}
+ \beta_6 \ln(\mathit{RampDist}) + \beta_7 \ln(\mathit{RecDist}) + \sum_{t=1}^{T} \delta_t Q_t + \sum_{k=1}^{K} \theta_k G_k + \varepsilon_{i,t} \qquad (3.1)
$$
where Y_{i,t} = sale price of the ith house sold in quarter t; Area = living area of the
house in square feet; LotSize = lot size of the property in square feet; Age = age of
the dwelling in years; MedInc = median household income for the census block
group in which the house is located; RampDist = distance in miles between the house
site and the nearest freeway ramp; RecDist = distance in miles between the house site
and the nearest park or recreation area; Q_t = dummy variable indicating the sale
quarter (1999:2–2000:2, with 1999:1 as a base, so T = 5); G_k = amount of green
space in the kth concentric ring encircling the house (i.e. 200 to 300 feet, 300 to 400
feet, and 400 to 500 feet away from the house, so K = 3); ε_{i,t} = random error term;
and β, δ_t and θ_k are coefficients to be estimated.
The selection of explanatory variables for the model follows the standard
hedonic pricing approach (e.g. Basu and Thibodeau 1998). We incorporated house
characteristics (living area, lot size, and age), neighborhood characteristics (median
household income), accessibility (distances to the nearest freeway ramp and to the
nearest park), and the extent of greenspace. Additional variables, such as the number
of bedrooms and distance to the closest metro station, were not significant and were
removed from the specification. The three concentric greenspace rings were selected
on empirical grounds: rings within 200 feet essentially capture lot size only, whereas
rings beyond 500 feet are outside the house's immediate environment. The quadratic
specification of age helps to model the fact that many older homes in the area are in
need of repair, which tends to depress house values. Quarterly time indicators were
employed to help control for market-wide price changes during this period.
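As a minimal sketch of how a specification like Eq. (3.1) can be estimated, the code below fits a reduced version (no quarter dummies, income or distance terms, and a single greenspace ring) by ordinary least squares with NumPy. The data and the "true" coefficients are simulated and hypothetical; they are not the study's transactions.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 259  # sample size matches the study; all values below are simulated

area = rng.uniform(800, 3_000, n)        # living area, sq ft
lot = rng.uniform(4_000, 10_000, n)      # lot size, sq ft
age = rng.uniform(60, 100, n)            # years
green = rng.uniform(10_000, 60_000, n)   # green cover in the 200-300 ft ring, sq ft

# Hypothetical coefficients used only to generate synthetic prices.
true_coef = np.array([2.0, 0.57, 0.12, 0.027, -0.0002, 0.07])

# Design matrix for a reduced Eq. (3.1): log-log terms plus a quadratic in age.
X = np.column_stack([np.ones(n), np.log(area), np.log(lot), age, age**2, np.log(green)])
log_price = X @ true_coef + rng.normal(0.0, 0.05, n)

coef_hat, *_ = np.linalg.lstsq(X, log_price, rcond=None)
resid = log_price - X @ coef_hat
r2 = 1 - resid @ resid / np.sum((log_price - log_price.mean()) ** 2)

# A coefficient on a logged regressor is an elasticity: a 1% rise in ring
# greenspace raises the expected price by roughly coef_hat[5] percent.
```

In practice the quarter dummies and remaining regressors would simply be appended as further columns of `X`.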
Method 2: Spatial Model
We used a Lagrange Multiplier (LM) test to determine whether significant spatial
autocorrelation exists in the OLS residuals of the standard hedonic pricing model.
If significant autocorrelation was found, we planned to use further LM tests to identify
the appropriate specification of the spatial model. LM tests have been widely used in
the spatial econometrics literature for this purpose (Anselin, 1988a; Anselin, 1988b;
Anselin et al., 1996; Anselin and Rey, 1991; Bera and Yoon, 1993; Florax and de
Graaff, 2004), because they require estimating only the model under the null
hypothesis, not the alternative (spatial) model. LM tests treat the standard model as
the restricted model (null hypothesis) and the spatial model as the unrestricted model
(alternative hypothesis), so the difference between the two models is treated as an
instance of omitted variables. LM tests are commonly used to choose among the most
common spatial models: the spatial lag model, the spatial error model, or a
combination of the two.
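The LM statistics can be computed from OLS residuals alone, which is the practical advantage described above. The sketch below assumes the standard formulas for the classic and robust LM error and LM lag tests (as given in Anselin et al., 1996) and a row-standardized weights matrix; the function name and the simulated ring of houses are hypothetical. Each statistic is compared with a chi-square distribution with one degree of freedom (critical value 3.84 at the 5% level).

```python
import numpy as np


def lm_spatial_tests(y, X, W):
    """Classic and robust LM diagnostics computed from OLS residuals.

    y: (n,) dependent variable; X: (n, k) regressors incl. intercept;
    W: (n, n) row-standardized spatial weights.
    """
    n = y.shape[0]
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    d_err = e @ W @ e / sigma2   # score term for the error parameter (lambda)
    d_lag = e @ W @ y / sigma2   # score term for the lag parameter (rho)
    WXb = W @ (X @ beta)
    # M @ WXb with M = I - X (X'X)^-1 X': residual of WXb regressed on X.
    M_WXb = WXb - X @ np.linalg.lstsq(X, WXb, rcond=None)[0]
    D = WXb @ M_WXb / sigma2 + T
    return {
        "LM_error": d_err**2 / T,
        "LM_lag": d_lag**2 / D,
        "robust_LM_error": (d_err - T / D * d_lag) ** 2 / (T - T**2 / D),
        "robust_LM_lag": (d_lag - d_err) ** 2 / (D - T),
    }


# Usage on simulated data: houses on a ring, each with two neighbours,
# and errors following a spatial autoregressive process (lambda = 0.8).
rng = np.random.default_rng(1)
n = 400
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.solve(np.eye(n) - 0.8 * W, rng.normal(size=n))
y = X @ np.array([1.0, 0.5]) + u
stats = lm_spatial_tests(y, X, W)
```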
Spatial lag models, also known as mixed regressive–spatial autoregressive
models (Anselin, 1988b), interpret spatial dependence as a substantive or structural
spatial process arising as a consequence of omitted variables. The matrix form of the
model is:

$$Y = \rho WY + X\beta + \varepsilon \qquad (3.2)$$

where WY is a spatially lagged dependent variable serving as a proxy for omitted
variables in the regression, ρ is the spatial lag parameter, and W is the spatial
weighting matrix specifying the interconnections between locations. Can (1990)
pointed out that using WY is close to actual practice in the real estate industry: a
realtor often refers to the prices of nearby houses to assess a given property's value.
Such a specification is therefore commonly used in spatial econometrics. Anselin
(1988b) showed that erroneously omitting the spatial lag WY leads to biased and
inconsistent coefficient estimates.
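To make the realtor analogy concrete, the sketch below (a hypothetical four-house example) shows that with a row-standardized W, each element of WY is simply the average log price of a house's neighbors. Rearranging Eq. (3.2) gives the reduced form Y = (I - ρW)^(-1)(Xβ + ε); with a row-standardized W and |ρ| < 1, each row of (I - ρW)^(-1) sums to 1/(1 - ρ), so within the model a uniform shift in Xβ is amplified by that factor. The value ρ = 0.110 is borrowed from Table 3.2 purely for illustration.

```python
import numpy as np

# Hypothetical four-house example; 1 marks first-order (shared-border) neighbours.
C = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
W = C / C.sum(axis=1, keepdims=True)   # row-standardize: each row sums to 1

log_price = np.log(np.array([200_000, 240_000, 260_000, 300_000], dtype=float))
lagged = W @ log_price                 # WY: mean log price of each house's neighbours

# Reduced form of Eq. (3.2): Y = (I - rho W)^(-1) (X beta + eps).
rho = 0.110
A_inv = np.linalg.inv(np.eye(4) - rho * W)
row_sums = A_inv.sum(axis=1)           # each equals 1 / (1 - rho) for row-standardized W
```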
The spatial error model assumes that spatial autocorrelation is caused by
misspecification (e.g. misspecified functional forms, measurement errors, improper
units), so that it is a statistical nuisance. To treat it, the original OLS error term ε is
modeled as an autoregressive process, ε = λWε + u, where ε denotes the residual
vector, W is a spatial weighting matrix and λ is the autoregressive coefficient. The
transformed residuals u are independently distributed with a mean of zero. The
resulting model in matrix form is:

$$Y = X\beta + \lambda W\varepsilon + u \qquad (3.3)$$

This model is also known as a linear regression model with a spatial autoregressive
disturbance (Anselin, 1988b) in spatial econometrics, or as a simultaneous
autoregressive (SAR) model in spatial statistics (see Bailey and Gatrell, 1995). The
values of β and λ can be estimated simultaneously by the maximum likelihood
method.
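One common way to obtain maximum likelihood estimates for Eq. (3.3) is to concentrate β and σ² out of the likelihood and search over λ alone. The sketch below does this with a simple grid search; the grid, function names and simulated data are assumptions for illustration, and this is not necessarily how R or GeoDa implement the estimation.

```python
import numpy as np


def sem_profile_loglik(lam, y, X, W):
    """Concentrated log-likelihood of the spatial error model at a given lambda
    (additive constants dropped). Returns (loglik, beta)."""
    n = y.shape[0]
    A = np.eye(n) - lam * W
    y_s, X_s = A @ y, A @ X                  # spatially filtered data
    beta, *_ = np.linalg.lstsq(X_s, y_s, rcond=None)
    u = y_s - X_s @ beta
    sigma2 = u @ u / n
    _, logdet = np.linalg.slogdet(A)         # log|I - lambda W|
    return -0.5 * n * np.log(sigma2) + logdet, beta


def fit_sem(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Maximize the concentrated likelihood over a grid of lambda values."""
    lls = [sem_profile_loglik(lam, y, X, W)[0] for lam in grid]
    lam_hat = grid[int(np.argmax(lls))]
    return lam_hat, sem_profile_loglik(lam_hat, y, X, W)[1]


# Usage on simulated data (hypothetical ring of houses, true lambda = 0.6).
rng = np.random.default_rng(2)
n = 300
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))
y = X @ np.array([1.0, 0.5]) + eps
lam_hat, beta_hat = fit_sem(y, X, W)
```

A grid search keeps the sketch transparent; a numerical optimizer over λ would be the usual refinement.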
To identify the appropriate spatial model, we followed the ad hoc decision
rule of Anselin and Rey (1991): when the LM statistics for both the spatial lag and
spatial error models are significantly different from zero, the model with the larger
test statistic is taken as the correct specification. The rule is needed because the LM
tests for the spatial lag and spatial error models each have power against the other
alternative (Anselin, 2001a).
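The decision rule can be written as a small function. This is a sketch of the rule as stated above only; the 5% threshold and the fallback branches are assumptions, and the usage line plugs in the statistics reported in Table 3.3.

```python
def choose_spatial_model(lm_lag, p_lag, lm_err, p_err, alpha=0.05):
    """Among significant LM statistics, prefer the model whose statistic is larger."""
    lag_sig, err_sig = p_lag < alpha, p_err < alpha
    if lag_sig and err_sig:
        return "spatial lag" if lm_lag >= lm_err else "spatial error"
    if lag_sig:
        return "spatial lag"
    if err_sig:
        return "spatial error"
    return "OLS (no spatial model needed)"


# Usage with the LMlag and LMerror statistics reported in Table 3.3.
choice = choose_spatial_model(lm_lag=5.799, p_lag=0.016, lm_err=5.230, p_err=0.022)
```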
We operationalized the spatial weighting matrix W by constructing Thiessen
polygons (i.e. a Voronoi tessellation) for each house location (e.g. Bailey and Gatrell
1995, p. 156) (Figure 3.3). First-order neighbors, whose Thiessen polygons share a
boundary with the target house's polygon, are assigned a weight of 1 in the weighting
matrix, and all other houses are assigned a weight of 0. We row-standardized the
weighting matrix for both the LM tests and the subsequent regression models. We also
conducted a sensitivity analysis using the adjusted first-order neighbor approach,
which incorporates average centroid-to-centroid distances to define neighbors; it
produced results similar to those of the first-order neighbor approach. We therefore
rely on the first-order neighbor approach to present our results.
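A first-order (shared-edge) weights matrix from Thiessen polygons can be sketched with SciPy's `Voronoi`, whose `ridge_points` attribute lists the pairs of input points whose cells share a ridge. This is an illustrative sketch with hypothetical coordinates; it uses the unbounded Voronoi cells, so neighbor relations for houses on the edge of a study area may differ from those of polygons clipped to a study boundary.

```python
import numpy as np
from scipy.spatial import Voronoi


def thiessen_first_order_weights(xy):
    """Binary first-order contiguity of Thiessen (Voronoi) cells and its
    row-standardized version. Houses are neighbours when their cells share an edge."""
    vor = Voronoi(xy)
    n = len(xy)
    C = np.zeros((n, n))
    for i, j in vor.ridge_points:   # each ridge separates the cells of points i and j
        C[i, j] = C[j, i] = 1.0
    W = C / C.sum(axis=1, keepdims=True)
    return C, W


# Usage with hypothetical house coordinates (feet).
rng = np.random.default_rng(3)
xy = rng.uniform(0, 2_000, size=(50, 2))
C, W = thiessen_first_order_weights(xy)
```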
Figure 3.3 Thiessen polygons constructed for each house location
3.4 Results
3.4.1 Standard Hedonic Pricing Model
The coefficient estimates from both methods are highly sensitive to outliers in the
underlying data. A close inspection of the original 324 sales transactions identified
65 bona fide outliers, which were removed. Some of the outlier transactions involved
foreclosures, which are not typical sales; some sale
prices were incorrectly recorded; and some houses were located too close to the
freeway ramp to be typical. Table 3.1 gives summary statistics for the remaining 259
transactions used in the study. From the table, the median sale price is $237,500 and
the median greenspace coverage within a 500 foot radius is 3.4 acres, which is about
the size of a small neighborhood park.
Table 3.1 Median summary measures for house sale transactions
Variable    Median Value    Std. Deviation
1999-2000 Sale Price    $237,500    $178,506
Living Area    1,596 sq. ft.    734 sq. ft.
Lot Size    6,750 sq. ft.    2,021 sq. ft.
Age of House    79 years    10 years
2000 Median Household Income    $23,690    $20,802
Distance to Freeway Ramp    0.66 mile    0.36 mile
Distance to Park/Rec.    0.40 mile    0.20 mile
Greenspace in 200-300 foot Ring    0.8 acre    0.4 acre
Greenspace in 300-400 foot Ring    1.13 acres    0.4 acre
Greenspace in 400-500 foot Ring    1.46 acres    0.5 acre
Overall, the adjusted R² value of 0.83 indicates that the explanatory variables
explain a large proportion of the variation in sale prices. The estimated coefficients
are consistent with expectations (column two of Table 3.2). For this area of Los
Angeles, the model estimates that every 1% increase in living area increases the
expected sale price by about 0.6% and that every 1% increase in lot size increases the
expected sale price by 0.12%. Similarly, houses in neighborhoods with higher median
household incomes, and houses located farther from the freeway or closer to a
recreational park, have higher expected sale prices. The estimated age coefficients
suggest that the neighborhood's exceptionally old housing stock tends to depress
housing values.
After accounting for all the factors above, all three greenspace coefficients
remain positive. Furthermore, the coefficients follow a distance-decay pattern: the
same amount of greenspace has a larger effect on the sale price when it is located
closer to the house. The estimated coefficients of 0.076, 0.068 and 0.004 show
declining price effects for greenspace located 200-300, 300-400 and 400-500 feet
from the house, respectively. The first coefficient (0.076) suggests that increasing
greenspace in that ring by 1% can increase the property value by 0.076%, an effect
smaller in magnitude than that of proximity to parks and recreational facilities (0.128
in absolute value). The coefficients for the other two rings (300-400 feet and 400-500
feet) follow the distance-decay pattern but are not significant at the conventional 5%
level; greenspace at these distances is probably too far from the houses to have a
noticeable impact.
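Because these variables enter Eq. (3.1) in logarithms, each coefficient is an elasticity. The rule of thumb "a 1% change in x gives a β% change in price" can be checked against the exact implied change, 100[(1.01)^β - 1]. The helper below is a hypothetical illustration of that arithmetic.

```python
def pct_price_change(coef, pct_change_x):
    """Percent change in expected price implied by a coefficient on ln(x)
    in a ln(price) model, for a given percent change in x."""
    return 100.0 * ((1.0 + pct_change_x / 100.0) ** coef - 1.0)


# 1% more greenspace in the 200-300 ft ring, standard hedonic estimate (0.076).
exact = pct_price_change(0.076, 1.0)   # very close to the 0.076% rule of thumb
```

For small percentage changes the two agree closely; the gap widens for larger changes in x.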
Table 3.2 Estimated coefficients (and p-values) from models with greenspace effects and time indicators
Variables    Model 1: Standard hedonic model    Model 2: Spatial lag model
Intercept    0.839 (0.213)    -0.118 (0.874)
Ln(Structure)    0.571 (0.000)    0.556 (0.000)
Ln(LotSize)    0.115 (0.031)    0.124 (0.015)
Ln(RPdist)    0.226 (0.000)    0.206 (0.000)
Ln(PKdist)    -0.128 (0.000)    -0.120 (0.000)
Ln(Income)    0.290 (0.000)    0.260 (0.000)
Age    0.027 (0.038)    0.029 (0.021)
Age²    -0.00021 (0.027)    -0.0002 (0.014)
Quarter 2 1999    0.158 (0.001)    0.144 (0.002)
Quarter 3 1999    0.101 (0.029)    0.091 (0.040)
Quarter 4 1999    0.177 (0.001)    0.162 (0.001)
Quarter 1 2000    0.175 (0.000)    0.170 (0.000)
Quarter 2 2000    0.217 (0.000)    0.204 (0.000)
Log Green 200 to 300    0.076 (0.039)    0.070 (0.048)
Log Green 300 to 400    0.068 (0.130)    0.070 (0.102)
Log Green 400 to 500    0.004 (0.908)    0.006 (0.867)
Adjusted R²    0.830    0.833
rho    n/a    0.110 (0.017)
3.4.2 Tests for Spatial Dependence
We used R to run the LM tests; the results are shown in Table 3.3. The LM test on
the residuals of the standard model (i.e. $LM_{error}$) showed significant
autocorrelation in the error term (p-value = 0.022). We also performed the same
$LM_{error}$ test on the residuals of a standard hedonic model without the
greenspace variables, and the result showed stronger spatial autocorrelation in the
error term (p-value = 0.003). This suggests that the greenspace variables capture
some of the spatial autocorrelation, but a spatial regression model is still needed to
remove the remaining autocorrelation and avoid overestimating the coefficients.
Table 3.3 Specification test results
Specification test    Statistic    p-value
LMerror for OLS without greenspace variables    8.826    0.003
LMerror for OLS with greenspace variables    5.230    0.022
Robust LMerror    1.703    0.192
LMlag    5.799    0.016
Robust LMlag    2.272    0.132
Likelihood ratio test for spatial lag model    5.623    0.018
LM test for the residual of spatial lag model    1.161    0.281
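Each LM and likelihood ratio statistic in Table 3.3 is referred to a chi-square distribution with one degree of freedom, whose upper-tail p-value equals erfc(√(x/2)). The sketch below applies this standard identity using only Python's `math` module; the dictionary labels are shorthand for the table rows.

```python
import math


def chi2_df1_pvalue(stat):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(stat / 2.0))


table_3_3 = {
    "LMerror (OLS without greenspace)": 8.826,
    "LMerror (OLS with greenspace)": 5.230,
    "Robust LMerror": 1.703,
    "LMlag": 5.799,
    "Robust LMlag": 2.272,
    "LR test, spatial lag model": 5.623,
    "LM test, spatial lag residuals": 1.161,
}
for name, stat in table_3_3.items():
    print(f"{name:34s} {stat:6.3f}  p = {chi2_df1_pvalue(stat):.3f}")
```

Rounded to three decimals, the computed p-values agree with those reported in the table.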
However, $LM_{error}$ alone is not enough to confirm the specification of the spatial
model, so we also performed LM tests on the spatial lag term ρWY (i.e. $LM_{lag}$)
with a null hypothesis of ρ = 0, in order to choose between the spatial error and
spatial lag models. We also performed robust LM tests for λWε and ρWY,
respectively, to double-check the results. The robust $LM_{error}$ (or robust
$LM_{lag}$) test assesses whether λ = 0 (or ρ = 0) while allowing for ρ ≠ 0 (or λ ≠ 0).
The results in Table 3.3 show that both the autoregressive error term and the spatial
lag term are significant at the conventional 5% level. However, the p-value of
$LM_{lag}$ (0.016) is slightly lower than that of $LM_{error}$ (0.022), and the p-value
of robust $LM_{lag}$ (0.132), although not significant, is still lower than that of robust
$LM_{error}$ (0.192). Therefore, we chose the spatial lag model as the appropriate
model, although the small difference between the p-values suggests that the two
spatial models have similar power in explaining the spatial processes in our study.
3.4.3 Spatial Lag Model
We used the maximum likelihood method to estimate the spatial lag model. The
estimation results include a likelihood ratio test for spatial dependence, which
confirmed the LM test results: significant spatial autocorrelation is present, and it is
effectively captured by the spatial lag term (p-value = 0.018) (see Table 3.3). The
estimated coefficients are shown in column three of Table 3.2. As in the standard
hedonic model, the greenspace coefficients are significant at the conventional 5%
level only for the 200 to 300 feet ring. The coefficient for this ring is 0.07, which
means that every 1% increase of greenspace in that ring increases property values by
0.07%. This coefficient is slightly lower than the one from the standard hedonic
model (0.076), suggesting that the spatial model has reduced the inflated estimate
caused by spatial autocorrelation. The p-value of this coefficient (0.048) is slightly
higher than that in the standard model (0.039). One possible reason is that the spatial
autocorrelation stems from an omitted variable, which in a standard OLS model
would bias the estimates upward and understate their variance; once the spatial lag
term absorbs this omitted-variable effect, a higher p-value would be expected. We ran
an LM test on the residuals of the spatial lag model and found no significant
remaining autocorrelation (p-value = 0.281) (Table 3.3).
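The spatial lag model can likewise be estimated by maximizing a concentrated likelihood over ρ. Since ρ = 0 gives back the OLS model, the same computation yields a likelihood ratio statistic of the kind reported in Table 3.3. The sketch below uses a grid search and simulated data; the names, grid and data are hypothetical, and this is not a description of the R or GeoDa routines.

```python
import numpy as np


def sar_profile_loglik(rho, y, X, W):
    """Concentrated log-likelihood of Y = rho W Y + X beta + eps at a given rho
    (additive constants dropped). Returns (loglik, beta)."""
    n = y.shape[0]
    A = np.eye(n) - rho * W
    Ay = A @ y
    beta, *_ = np.linalg.lstsq(X, Ay, rcond=None)
    e = Ay - X @ beta
    sigma2 = e @ e / n
    _, logdet = np.linalg.slogdet(A)         # log|I - rho W|
    return -0.5 * n * np.log(sigma2) + logdet, beta


def fit_sar(y, X, W, grid=np.linspace(-0.95, 0.95, 191)):
    """Grid-search ML for rho, plus an LR statistic against the OLS model (rho = 0)."""
    lls = np.array([sar_profile_loglik(r, y, X, W)[0] for r in grid])
    k = int(np.argmax(lls))
    rho_hat = grid[k]
    beta_hat = sar_profile_loglik(rho_hat, y, X, W)[1]
    lr = 2.0 * (lls[k] - sar_profile_loglik(0.0, y, X, W)[0])
    return rho_hat, beta_hat, lr


# Usage on simulated data (hypothetical ring of houses, true rho = 0.5).
rng = np.random.default_rng(4)
n = 300
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 0.5]) + rng.normal(size=n))
rho_hat, beta_hat, lr = fit_sar(y, X, W)
```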
We also plotted the partial effects of the 200-300 feet ring greenspace in both
the standard hedonic and spatial lag models, within the range of the observed data
(Figure 3.4). The plots show that the greatest gains in the greenspace effect occur at
the lower amounts (i.e. under 3,000 sq. ft.). This means that the greatest payoff comes
from small increases in nearby neighborhood greenspace, since diminishing returns
set in fairly quickly.
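The diminishing returns follow from the logarithmic form of the greenspace term: its contribution to ln(price) is γ ln(G), so the marginal effect is γ/G. The sketch below uses γ = 0.070 from the spatial lag model; the 1,000 sq. ft. baseline is an arbitrary, hypothetical reference point, since the vertical offset of the plotted curves is not specified.

```python
import numpy as np

GAMMA = 0.070  # spatial lag estimate for ln(greenspace) in the 200-300 ft ring


def partial_log_price(green_sqft, baseline_sqft=1_000.0):
    """Contribution of ring greenspace to ln(price), relative to a baseline amount."""
    return GAMMA * np.log(np.asarray(green_sqft, dtype=float) / baseline_sqft)


def marginal_per_1000_sqft(green_sqft):
    """Approximate change in ln(price) from 1,000 extra sq ft: d ln(P)/dG = GAMMA / G."""
    return GAMMA * 1_000.0 / np.asarray(green_sqft, dtype=float)


# Doubling greenspace adds GAMMA * ln(2) whether it starts at 2,000 or at 40,000 sq ft,
# so each extra square foot is worth far more where greenspace is scarce.
gain_small = partial_log_price(4_000) - partial_log_price(2_000)
gain_large = partial_log_price(80_000) - partial_log_price(40_000)
```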
We estimated the spatial lag model in both R and the GeoDa software (Spatial
Analysis Laboratory, University of Illinois, Urbana-Champaign); both programs
returned the same estimation results.
3.5 Discussion and Conclusions
This chapter presents a spatial econometric approach to evaluate the influence of
neighborhood greenspace in residential hedonic models. An aerial photograph of the
study area was used to digitize all the green cover (trees, shrubs, grasses) into
greenspace polygons using GIS software. After digitization, we used GIS techniques
to construct spatial variables for the greenspace, which were then successively added
to a hedonic model. Because LM tests showed significant spatial autocorrelation, the
standard hedonic model was then adjusted by estimating a spatial lag model, with the
specification chosen through LM test procedures.
Figure 3.4 Partial effects of the 200-300 feet greenspace ring (panel a: standard hedonic model; panel b: spatial lag model; each panel plots log price against greenspace area in sq. ft.)
The results from the spatial lag model confirmed the significant positive
impact of immediate neighborhood greenspace (the 200 to 300 feet ring) on house
values, even after removing the effects of spatial autocorrelation. Specifically, a 1%
increase in the amount of greenspace within the 200 to 300 feet ring would lead to an
approximate increase of 0.07% in the expected sales price of the house, or an
additional $171 in the median price. This resonates with Des Rosiers et al.'s (2002)
finding that a 1% increase in ground cover causes a 0.1-0.2% increase in house prices.
To illustrate our results, we follow the framework of Crompton’s “proximate
principle” (Crompton, 2004). Crompton calculated the additional property tax
revenue generated from increased property values due to proximity to greenspace,
and compared this tax revenue to the cost of developing the greenspace, in order to
see if the greening process is financially self-sustaining. Following this framework,
Geoghegan et al. (2003) found that increasing preserved agricultural land by 1%
(144 acres) in Calvert County, Maryland can generate sufficient tax revenue from
increased housing values (from properties within a one-mile radius of the preserved
parcel) to purchase an additional 88 acres in the first year. In our case, suppose there
is a 15% increase in greenspace within the 200-300 foot rings around each house
throughout the study area. This would increase the median home value by 1.05% or
$2,565. This additional greenspace amounts to 0.12 acre or 5,117 square feet, the
size of many pocket parks. This is an apt comparison, since such a vacant lot in
central Los Angeles County can be priced at $350,000 even in low-income
communities, while improvements would add another $220,000 to the cost of
creating such a park.³ The question is: can an incremental neighborhood greening
program generate sufficient additional property tax revenue to be self-supporting?
Assuming approximately 300 houses in the study area were to increase their
greenspace by 15%, an additional $783,750 in housing stock value (based on the
median house price) would be generated. However, additional tax revenues will not
be forthcoming for every property, because California's Proposition 13 only allows
property tax to be reassessed when a property goes through a market transaction.
Therefore, our calculation of property tax generation is restricted to the 20% of
properties in the study area that are sold each year (i.e. about 60 properties). Based
on the current tax rate in Los Angeles County (1.2%), the additional property tax
revenue from these properties totals about $2,665 per year. Since property tax is
collected every year, the accumulated additional property tax for properties sold in
any given year would be $2,665 multiplied by the number of years remaining in the
period. If we assume a 10-year scenario, in constant 2007 dollars, then properties
sold in Year 1 would generate a total tax of $26,650 ($2,665 x 10) over the ten-year
period. Similarly, properties sold in Year 2 would bring $23,985 ($2,665 x 9),
properties sold in Year 3 would bring $21,320 ($2,665 x 8), and so on. In the end, the
total additional tax revenue over 10 years on the reassessed homes would be about
$146,575.
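The ten-year cohort arithmetic above can be reproduced in a few lines. The sketch takes the $2,665 annual figure from the text as given and only checks the cumulative sum.

```python
ANNUAL_NEW_TAX = 2_665   # dollars per year from one year's cohort of ~60 reassessed sales
HORIZON_YEARS = 10

# A cohort reassessed in year y pays the extra tax for the remaining (HORIZON_YEARS - y + 1) years.
cohort_totals = [ANNUAL_NEW_TAX * (HORIZON_YEARS - year + 1) for year in range(1, HORIZON_YEARS + 1)]
total_10yr = sum(cohort_totals)
print(cohort_totals[:3], total_10yr)   # [26650, 23985, 21320] 146575
```

The total is simply $2,665 times 10 + 9 + ... + 1 = 55 cohort-years.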
This is far less than the cost of a pocket park, but could these revenues
underwrite the cost of neighborhood greening? The answer is probably no, but it
depends on the ability of jurisdictions to incentivize greening programs, for example
by providing matching grants to homeowners interested in landscape redesign. This
is a method of leveraging public investments that park projects are unable to utilize.
Moreover, public education about the property value enhancements associated with
greening can encourage homeowners to add greenspace purely out of self-interest;
again, this is an incentive that leads to individual choices to green even when
municipalities do not invest. In addition, as cities have become increasingly focused
on water quality, climate change and mitigating exposure to air pollution, greening
programs have attracted widespread attention and public resources as a method of
improving ambient environmental conditions and public health.
What are the opportunity costs of such greening incentive programs? No
behavioral studies are available to allow us to understand how property owners,
especially homeowners, might respond to local incentives to green their parcels, or
how large such incentives might need to be to trigger greening projects on private
property. Moreover, in many jurisdictions, nonprofit organizations dedicated to
urban forestry, parks, or similar goals exist but their capacity to assist property
owners, through (for example) donating plants or providing technical assistance, will
vary by locality. It is therefore very difficult to estimate the total costs of
neighborhood greening, and hard to know what else could be achieved with the
same infusion of funds. However, given the growing pressures to address
environmental concerns, particularly in older neighborhoods often burdened by heat
and/or pollution stress, the relatively modest spending required to leverage targeted
private greening efforts might be justified. Other approaches to greening, such as
building codes or street planning guidelines, are already being adopted in many
jurisdictions to control runoff and pollution, while greening vacant lots may require
minimal resources.
Is it feasible, then, to increase greenspace within the 200-300 ft. ring by
15%? In our study area, simply widening a sidewalk parkway by 5 feet and greening
an average-sized driveway can add 1,000 square feet of greenspace within the
200-300 foot ring. This constitutes about 3% of the average greenspace area within
that ring for individual houses. Other, more aggressive greening strategies are
possible. Many older communities have alleys, which are typically 18 ft wide.
Greening half of a 50 ft alley segment behind a residential property (to maintain
ingress/egress rights) with a bioswale would generate a 500 square foot (1.5%)
increase in greenspace. Other strategies include roof gardens,
hardscape reductions, or creating linear parkways in the middle of streets that can
accommodate them. Hardscape reductions are a key option, since our fieldwork and
inspection of aerial photos in the study area revealed sidewalk parkways that had
been paved over, front yards that had been paved, and many small snippets of
remnant land that the city had paved. A reasonable greening scheme, based on our
analysis of aerial photographs and satellite imagery, could generate a 7% increase of
greenspace. Finally, adopting a policy mandating that vacant lots be greened
temporarily as they await sale, planning studies or development entitlements, would
generate a significant if fluctuating greenspace resource. There are a significant
number of vacant lots awaiting reuse in older communities such as our study area.
Also, some greening can occur as a result of revised building codes, that specify how
much greenspace must be included in site plans, or street/alley design codes that
mandate parkways, bioswales, and where roads are wide enough, planted medians.
Adding all these greening opportunities together means that the total percentage
improvement can reach as high as 15%.
In summary, neighborhood greening programs, funded by property owners,
municipal jurisdictions, and/or nonprofit organizations can provide a flexible
strategy to support home prices in older inner city communities where the potential
for parkland acquisition is limited. At the same time, greening can make such
neighborhoods healthier, cooler and more attractive for walking and cycling, and can
help infiltrate and clean stormwater runoff, thereby enhancing the quality of life in
these older urban communities. While not substituting for adequate park space,
greening strategies can provide important benefits to individual homeowners as well
as community well-being.
More research is needed to refine our results. For example, the greenspace
effects estimated in our study tend to be conservative, because the area of greenspace
alone is only a crude measure of its impact. Larger impacts would be expected if
more attributes of greenspace were specified and included in the hedonic model.
Des Rosiers et al. (2002), for instance, categorized home landscaping into ground
cover, flower arrangements, rock plants, hedges, landscape curbs, density of visible
vegetation, and roof, patio and balcony arrangements. They found that the presence
of hedges alone can add 3.6 to 3.9% to property values, while patio landscaping
added 12.4% and landscaped curbs contributed 4.4%. In addition, attributes of
greenspace are usually measured by presence (e.g. presence/absence of mature trees,
hedges, landscaped curbs) (Des Rosiers et al., 2002; Dombrow et al., 2000), but such
measures could be improved through more refined measurements of aesthetic
quality. Armed with more detailed data on the nature of the greening undertaken,
future studies could develop more realistic estimates.
Future studies could also take up the question of ecosystem values of
greenspace, because cost recovery based only on tax revenue generation is not the
only cost calculus to be applied here. Greenspace can absorb stormwater runoff, take
up air pollutants, sequester carbon, provide wildlife habitat, moderate wind, and
reduce urban heat island effects and hence energy demands (Akbari et al., 2001;
Dwyer et al., 1992; Luley, 1998; McPherson, 1992; Nowak et al., 2000; Scott et al.,
1999). These benefits, known as nature's services, can be converted into
monetary values based on the equivalent costs incurred to achieve the same results
(e.g. infrastructure cost for stormwater drainage, energy cost of heat reduction,
carbon credits when carbon trading is available in the future, etc.). For example,
Longcore et al. (2003; 2004) showed that increasing green cover can save from
$475,280 to $1,839,640 per year in municipal infrastructure costs. Therefore, future
housing value studies could develop methods for understanding how the value of
such ecosystem services provided by greenspace may be capitalized into housing
prices. At a broader urban policy level, such analyses would allow the value of
nature’s services (either in the form of higher property values, lowered urban service
costs, or enhanced environmental quality and health per se) to inform decisions about
open space development or neighborhood greening programs.
Chapter 3 Endnotes
1. Delinquent tax properties could be an opportunity to fund open space
improvements, but municipalities such as the City of Los Angeles have historically
been reluctant to do so (Harnik 2000).
2. Although studies on park impacts date back as early as 1939 (Herrick 1939, as
cited in Crompton 2004), the analysis then was limited by data availability and
computing capability.
3. The vacant lot value is based on the transaction data of this neighborhood in 2000.
The improvement cost is based on 2006 data for a 0.32 acre pocket park construction
project in Los Angeles County.
Chapter 4: The Impact of Neighborhood Greenspace on
Residential Property Values across a Spatially
Heterogeneous Metropolitan Area
This chapter has been submitted as
Li, C. Q., Wilson, J. P. The impact of neighborhood greenspace on residential
property values across a spatially heterogeneous metropolitan area. Journal
of Real Estate Finance and Economics (December 2009)
4.1 Chapter 4 Introduction
The value of open space has been acknowledged increasingly in the field of
environmental economics, but the impact of neighborhood greenspace on property
values has not attracted much attention. As reviewed in the previous chapter, just a
few studies have estimated the value of immediate greenspace, such as mature trees
(Anderson and Cordell, 1985; Dombrow et al., 2000; Morales et al., 1983),
landscaping attributes (Des Rosiers et al., 2002; Luttik, 2000), and tree density
(Kestens et al., 2004). We define neighborhood greenspace as greenspace
immediately surrounding and contained within properties. It therefore typically
includes a property's front and back yards, green strips along the sidewalk in front of
the property, and street trees in front of the property. In the context of this chapter,
we define a neighborhood as a local geographical area with similar socioeconomic
characteristics, accessibility to environmental amenities, and a roughly homogeneous
housing market.
Neighborhood greenspace can have a positive impact not only because of its
obvious aesthetic value, but also because of its ability to meet people's pressing
demand for open space without land acquisition. In addition, it can provide 'nature's
services' by sequestering carbon, taking up air pollutants, absorbing stormwater
runoff, and reducing urban heat island effects, such that urban service costs are
reduced (Akbari et al., 2001; Daily, 1997; Dwyer et al., 1992; McPherson, 1992).
The case study in Chapter 3 targeted a small, dense urban area within the City of Los
Angeles to evaluate this positive impact through a hedonic pricing model, and found
that a 1% increase in neighborhood greenspace was associated with a 0.07% increase
in property values. This chapter uses the same conceptual framework as the empirical
example in Chapter 3, but extends the work in the following two respects. First, we
expand the study area to cover most of Los Angeles County in order to observe how
the local effect of neighborhood greenspace varies across a large metropolitan area.
Second, we use several spatial modeling techniques to adjust the standard hedonic
model, to ensure the best representation and modeling of spatial effects and the most
accurate possible estimation of neighborhood greenspace impact.
Spatial effects usually exist in the housing market in the form of both spatial
heterogeneity and spatial dependence. Spatial heterogeneity refers to the fact that the
marginal prices of housing attributes may not be constant across the whole study area.
Spatial dependence indicates that objects nearer to each other are usually more
similar and have greater influence on each other than objects farther apart. Ignoring
either spatial effect can lead to inefficient and/or biased model estimates (Anselin,
1988b; Cliff and Ord, 1973; Dubin, 1998). Currently, many spatial regression
studies focus on spatial dependence, often because their study areas are too small to
exhibit significant spatial heterogeneity. The large study area in this project spans
multiple real estate submarkets, which display large variations in property values and
people's behavior. It is therefore appropriate to explore multiple spatial modeling
techniques to control for both of these spatial effects.
The remainder of this chapter is organized as follows. We begin with a
review of the pertinent literature followed by an overview of the Los Angeles County
housing market. Next, the data and methodology employed in the study are described.
We then compare the results of the models and discuss the spatial patterns observed
in the data. In the final section we draw conclusions and suggest avenues for future
research.
4.2 Spatial Modeling in Open Space Hedonic Models
As mentioned at the start of the chapter, there are few empirical studies of the impact
of neighborhood greenspace on property values. There are even fewer such studies if
we consider the treatment of spatial effects. We found only three instances: Des
Rosiers et al. (2002) investigated the impact of landscaping, and Kestens et al. (2004;
2006) analyzed the impact of spatial variation in land use and vegetation patterns on
house prices in Quebec City, Canada. Both research teams identified spatial
dependence and incorporated it in their hedonic models.
Des Rosiers et al. (2002) used interaction variables in their model:
multiplying landscaping variables with the housing type and/or neighborhood
demographic variables. For example, they first transformed the value of green
ground cover percentage to its deviation from the local mean, and then multiplied it
with the housing type variable. Such interaction variables proved successful in
eliminating significant spatial autocorrelation. This may be because the transformed
landscape variables acted as proxies for omitted landscape variables (e.g. similar
styles, density, tree species), which reduced the local clustering of values, or because
the variation in house types or demographic characteristics captured the spatial
heterogeneity of the data, which in turn reduced the autocorrelation in the error term.
Kestens et al. (2004) also used interaction variables to reduce the Moran's I
value (although it remained significant). They multiplied greenspace variables (i.e.
the percentage of lawn area within a 300 m radius of a property) by a distance variable
(i.e. car-time distance from the property to main activity centers), such that the
resulting parameters indicate the changing impact of greenspace at different
distances from activity centers. Kestens et al. (2006) compared this approach with
geographically weighted regression (GWR).
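Moran's I, the statistic Kestens et al. (2004) used to gauge residual autocorrelation, is I = (n/S₀) z'Wz / z'z, where z holds deviations from the mean and S₀ is the sum of all weights. The sketch below implements this definition and checks it on two hypothetical patterns on a ring of locations.

```python
import numpy as np


def morans_i(x, W):
    """Global Moran's I of values x under spatial weights W."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    n, s0 = z.size, W.sum()
    return (n / s0) * (z @ W @ z) / (z @ z)


# Hypothetical ring of 100 locations, each with two neighbours (row-standardized).
n = 100
idx = np.arange(n)
W = np.zeros((n, n))
W[idx, (idx + 1) % n] = 0.5
W[idx, (idx - 1) % n] = 0.5

alternating = np.where(idx % 2 == 0, 1.0, -1.0)   # neighbours always differ
smooth = np.sin(2 * np.pi * idx / n)              # neighbours nearly equal
i_alt, i_smooth = morans_i(alternating, W), morans_i(smooth, W)
```

The alternating pattern gives perfect negative autocorrelation (I = -1), while the smooth wave gives a value close to +1.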
Using interactive variables is in effect a form of spatial expansion model, an extension of Casetti's (1972; 1997) standard econometric expansion method to the spatial context. It is designed to capture spatial heterogeneity in the data. Its general form is:

$$y_i = \alpha_i + \beta_i x_{i1} + \cdots + \tau_i x_{im} + \varepsilon_i \tag{4.1}$$

where $\alpha_i = \alpha_0 + \alpha_1 u_i + \alpha_2 v_i$, $\beta_i = \beta_0 + \beta_1 u_i + \beta_2 v_i$, $\tau_i = \tau_0 + \tau_1 u_i + \tau_2 v_i$, and $u_i$ and $v_i$ are the x, y coordinates of location i. The expanded parameters $\alpha_i$, $\beta_i$, and $\tau_i$ are now location-specific, which allows the parameters to vary across the study area. Researchers often do not use the x, y coordinates directly, but use interactive variables instead.
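To make the expansion in Equation (4.1) concrete, the following sketch builds a linear coordinate expansion as an ordinary design matrix, fits it by OLS, and then recovers the location-specific coefficients $\alpha_i$ and $\beta_i$. It is a minimal illustration on hypothetical synthetic data using NumPy; the function names (`linear_expansion_design`, `local_coefficients`) are illustrative and are not the code used in this study.

```python
import numpy as np

def linear_expansion_design(X, u, v):
    """Design matrix for a linear Casetti expansion.

    Every column of [1, X] is repeated unchanged, multiplied by u, and
    multiplied by v, so each coefficient becomes b_0 + b_1 * u_i + b_2 * v_i.
    """
    n = X.shape[0]
    base = np.column_stack([np.ones(n), X])       # intercept + original variables
    return np.hstack([base, base * u[:, None], base * v[:, None]])

def local_coefficients(theta, u, v, k):
    """Location-specific coefficients from the expanded OLS estimates.

    theta is ordered as [base block, base*u block, base*v block], each block
    holding k + 1 parameters (intercept + k variables).
    """
    p = k + 1
    b0, bu, bv = theta[:p], theta[p:2 * p], theta[2 * p:3 * p]
    return b0[None, :] + u[:, None] * bu[None, :] + v[:, None] * bv[None, :]

# Hypothetical data: the effect of x drifts with the u coordinate.
rng = np.random.default_rng(0)
n = 200
u, v = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
x = rng.normal(size=(n, 1))
true_slope = 0.5 + 0.3 * u
y = 1.0 + true_slope * x[:, 0] + rng.normal(scale=0.01, size=n)

Z = linear_expansion_design(x, u, v)
theta, *_ = np.linalg.lstsq(Z, y, rcond=None)
betas = local_coefficients(theta, u, v, k=1)  # column 0: alpha_i, column 1: beta_i
```

Interacting with centered coordinates or with a substantive proxy (such as a distance variable) follows the same pattern; only the columns used to expand the model change.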
A few other open space studies, although not focused on neighborhood
greenspace, have also adopted the expansion method. For example, Geoghegan et al.
(1997) multiplied open space percentage and landscape indices by distance from Washington, D.C., in both linear and quadratic forms, and then added these interaction terms to their standard hedonic model. They found that both land use diversity and fragmentation exhibited markedly different effects on property prices at different distances from Washington, D.C. In addition, the spatial expansion model considerably improved the significance of the parameters relative to the standard hedonic model.
Patton and McErlean (2003) studied the agricultural land market of Northern
Ireland. They did not include interactive variables, but expanded the standard hedonic model by including the same variables for each spatial regime/sub-market and then jointly estimating separate coefficients for each sub-market. They referred to their model as a spatial regime model. In addition, they implemented spatial lag and spatial error regression models from the field of spatial econometrics to address spatial dependence. The spatial lag model assumes that spatial dependence is caused by the inherent spatial structure and interaction, and thus adds a spatially lagged dependent variable to the standard hedonic model as a proxy for omitted variables. The spatial error model treats spatial dependence as a statistical nuisance caused by measurement errors, and consequently uses a spatially autoregressive error term to remove significant spatial autocorrelation from the model residuals (Dubin, 1998).
Researchers often use specification tests (e.g. the Lagrange Multiplier test) to choose
between these two models (Anselin and Rey, 1991). Patton and McErlean (2003)
used this approach and chose the spatial lag model as a result. Their analysis showed
improved estimation results when both the spatial regime and spatial lag models
were employed.
Several other open space studies have also implemented either the spatial lag model (e.g. Paterson and Boyle, 2002) or the spatial error model (e.g. Bell and Bockstael, 2000; Geoghegan et al., 2003) to control for spatial dependence. However, in these studies the model estimates were either statistically similar to those of the standard model or were not explicitly compared with it. The spatial lag or error model may still be effective in removing significant spatial autocorrelation from the residuals, but other spatial modeling techniques may perform better.
We also found open space studies using instrumental variables to control for spatial autocorrelation intertwined with endogeneity (Irwin, 2002; Irwin and Bockstael, 2001). Instrumental variables are variables correlated with one or more of the explanatory variables but not correlated with the error term. Therefore, when spatial autocorrelation is caused by omitted variables that are spatially correlated with other explanatory variables, instrumental variables can act as proxies for the omitted variable. At the same time, if the model contains endogenous explanatory variables (variables determined by one or more other variables within the model), we can express the endogenous variables as functions of appropriate instrumental variables, thus removing the endogeneity from the model. Again, however, we found no studies of neighborhood greenspace that used the instrumental variable method to manage spatial dependence.
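The logic of replacing an endogenous variable by a function of instruments can be illustrated with two-stage least squares (2SLS), the most common instrumental variable estimator. The sketch below is a minimal NumPy version on hypothetical simulated data, in which an unobserved variable `q` drives both the endogenous regressor and the outcome; it is not the specification used by Irwin (2002) or Irwin and Bockstael (2001), and it returns point estimates only.

```python
import numpy as np

def two_stage_least_squares(y, X_exog, x_endog, Z):
    """Point estimates from two-stage least squares (2SLS).

    y: (n,) outcome; X_exog: (n, k) exogenous regressors; x_endog: (n,)
    endogenous regressor; Z: (n, m) instruments excluded from the outcome equation.
    Returns coefficients ordered as [intercept, X_exog..., x_endog].
    """
    n = len(y)
    ones = np.ones((n, 1))
    # Stage 1: project the endogenous regressor on all exogenous variables and instruments.
    first = np.hstack([ones, X_exog, Z])
    gamma, *_ = np.linalg.lstsq(first, x_endog, rcond=None)
    x_hat = first @ gamma
    # Stage 2: OLS with the endogenous regressor replaced by its fitted values.
    second = np.hstack([ones, X_exog, x_hat[:, None]])
    beta, *_ = np.linalg.lstsq(second, y, rcond=None)
    return beta

# Hypothetical simulation: q is an omitted variable that drives both x_endog and y.
rng = np.random.default_rng(1)
n = 5000
q = rng.normal(size=n)                       # omitted (unobserved) variable
z = rng.normal(size=(n, 1))                  # instrument: moves x_endog, not y directly
X_exog = rng.normal(size=(n, 1))
x_endog = z[:, 0] + q + rng.normal(scale=0.5, size=n)
y = 1.0 + 0.5 * X_exog[:, 0] + 2.0 * x_endog + 3.0 * q + rng.normal(size=n)

beta_iv = two_stage_least_squares(y, X_exog, x_endog, z)
ols_design = np.column_stack([np.ones(n), X_exog, x_endog])
beta_ols, *_ = np.linalg.lstsq(ols_design, y, rcond=None)
```

In this setup OLS overstates the coefficient on `x_endog` (true value 2.0) because it absorbs the effect of `q`, while 2SLS recovers it. Standard errors from the naive second-stage OLS are not valid and would need the usual 2SLS correction.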
The existing open space studies therefore offer some, albeit limited, experience in applying spatial modeling techniques that neighborhood greenspace studies can draw on. In addition, there have been many new developments in the field of spatial statistics (e.g. geostatistics, local statistics, geographically weighted regression, spatial filtering methods) in recent years that have been applied to multivariate regression models. There is thus a wide choice of spatial analysis techniques for hedonic models targeting neighborhood greenspace valuation. This chapter aims to add to our understanding of when, and if so how, these techniques should be applied. Their details are explained in the following sections.
4.3 Study Area and Data Sample
Los Angeles County is home to approximately 9.5 million people and contains 88
incorporated cities and many unincorporated areas. Not surprisingly, the Los
Angeles County housing market is characterized by substantial variation in home values, even within the same ZIP code. Our study area spans the urbanized central
and southern parts of the county, excluding sparsely populated areas of the Santa
Clarita and Antelope valleys and the Santa Monica and San Gabriel Mountains (refer
to Figure 4.1).
Figure 4.1 Location map of the study area
We obtained property transaction data (year 2005) from DataQuick, a
company specializing in assembling and providing publicly available data, which in
this case came from the Los Angeles County Recorder’s office. We extracted
detailed information on property structural and financial attributes from the original
data source. We also obtained demographic data (year 2000) from the U.S. Census Bureau at the block group scale. Both data sets include the identification number for individual properties (i.e. AIN or Parcel Number), so we used this attribute as a common field to join all the data with an existing parcel centroid layer in ESRI's ArcMap. This parcel layer was obtained from the Los Angeles County Assessor's Office and later edited by the staff of the USC GIS Research Laboratory. Its projection is California Teale Albers, NAD 1983. We next used ArcMap with this parcel centroid layer and several other layers to derive location-specific variables (e.g. distance to nearest freeway ramp, area of the nearest park). Distances were calculated using street network analysis in place of the straight-line distances used in many earlier studies of this type.
Within the study area, the variability of property values is accompanied by variability in the human, economic and physical environment, which requires a large dataset covering the whole study area to ensure analytical reliability. Our dataset consists of 1,907 freehold single family house sales that occurred between July and September 2005. We limited the data to this short time period to minimize the effect of temporal variations in house values. The final sample of 1,907 transactions resulted from several cleaning steps. On top of the time period and housing type restrictions, we deleted records with missing values, which reduced the sample to 26,628 sales. Then, because manually digitizing the neighborhood greenspace of each property is time-consuming, we used a stratified random sample design to select sales that were representative of submarket characteristics.
We achieved this selection by stratifying the sample data according to NDVI (Normalized Difference Vegetation Index) value, population density, and median household income. NDVI estimates the richness of vegetation coverage via remote sensing. Its value ranges from -1 to 1, with higher positive values representing denser vegetation coverage. We used the NDVI value as an indicator of the physical environment, and population density and median household income as indicators of the socio-economic environment. All three indicators were classified into high, medium, and low classes, which produced 27 (3 × 3 × 3) unique categories when combined. We then divided the parcel centroids into 27 shapefiles that matched these categories. Within each shapefile, we examined the spatial distribution of the
existing data, referred to administrative boundaries, and used our prior knowledge
about the local real estate markets to manually select points that were representative
of the local markets. We strived to achieve an outcome that provided a broad and
comprehensive coverage of the whole study area and sufficient local data to
substantiate local market characteristics. The final selection is shown in Figure 4.2.
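The combination of three three-class indicators into 27 strata can be sketched as follows. The text does not say how the high, medium, and low class breaks were set, so this sketch assumes sample terciles as the cut points; the function names are hypothetical, and the manual selection of representative points within each stratum is not reproduced.

```python
import numpy as np

def tercile_class(values):
    """Classify values as 0 (low), 1 (medium) or 2 (high) using sample terciles.

    Tercile cut points are an assumption made for illustration only.
    """
    cuts = np.quantile(values, [1 / 3, 2 / 3])
    return np.digitize(values, cuts)

def stratum_code(ndvi, pop_density, median_income):
    """Combine three low/medium/high classes into one of 3 ** 3 = 27 codes (0-26)."""
    a = tercile_class(ndvi)
    b = tercile_class(pop_density)
    c = tercile_class(median_income)
    return 9 * a + 3 * b + c

# Hypothetical parcel-level indicators.
rng = np.random.default_rng(7)
codes = stratum_code(rng.random(3000), rng.random(3000), rng.random(3000))
```

Each code could then be used to split the parcel centroids into one file per stratum, mirroring the 27 shapefiles described above.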
Figure 4.2 Selected properties
4.3.1 Derivation of Greenspace Variables
We included three greenspace variables. The variable PKdist refers to the road
distance to the nearest park from each selected property, which is calculated by
overlaying the street network with the centroids of all parks within Los Angeles
County. The Network Analyst extension of ArcGIS was used to measure the distance
to the nearest park. The variable PKarea refers to the area of the nearest park, which
is identified by the Spatial Join tool in ArcMap. A park layer was used to derive both
variables PKdist and PKarea. The layer contains various categories of parks within
Los Angeles County, ranging from neighborhood pocket parks to regional parks. The
layer was developed by the staff of the USC GIS Research Laboratory in May 2002,
using original data from the Center for Spatial Analysis and Remote Sensing (CSARS),
California State University, Los Angeles. Additional parks were added to the original
dataset and all parks were cross-referenced in 2002 using published maps and
various websites. The measurement of neighborhood greenspace area (i.e. variable
Green) is more complicated.
We first manually digitized neighborhood greenspace into polygons for each
selected property and then calculated and summed the areas of these polygons for
each property. To facilitate the digitizing process, we employed an interactive, web-
based polygon tracing tool developed by Daniel Goldberg in the USC GIS Research
Laboratory (see https://webgis.usc.edu/Services/PolygonTracing for additional
details). A database containing x, y coordinates for all the parcel centroids was uploaded first; the tool then automatically moves to the location of a parcel centroid when that parcel's record is selected in the database (Figure 4.3). The locations are referenced against Google Maps and Google Earth, as seen in the example in Figure 4.4.
Next, we used Google Earth images like that shown in Figure 4.4 to identify
all possible greenspace objects (lawns, shrubs, trees, etc.) within each parcel, and
then traced these objects to generate polygons. We factored in the condition of the
green objects by roughly reducing their sizes if they appeared to be dry or barren (i.e.
not green). Besides greenspace objects within each parcel, we usually also included the lawn and street trees on the sidewalk, the green easement between properties and major roads, and green buffers between two neighboring properties, because their immediate proximity to properties suggests they are likely to exert a similar impact to the greenspace within properties.

Figure 4.3 Derivation process of the neighborhood greenspace variable

Figure 4.4 Google Earth image with traced greenspace polygons overlaid on top of sample parcel shown in center of map

The x,y coordinates of the vertices of all the traced polygons were stored in the original database. We next downloaded the
database and imported it into ArcGIS, where we used the x,y coordinates of the
vertices to rebuild all these polygons as shown in Figure 4.4. Lastly, we calculated
the areas of individual polygons, then summed the ones belonging to the same
property. The summed values are then used as the Green variable input.
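The last step (computing each polygon's area and summing the areas per property) can be sketched with the shoelace formula. This is a minimal illustration assuming vertex coordinates have already been projected to a planar system (e.g. California Teale Albers, converted to feet); raw latitude/longitude vertices would first need projecting. The function names and input structure are hypothetical, not the ArcGIS workflow actually used.

```python
from collections import defaultdict

def polygon_area(vertices):
    """Planar polygon area via the shoelace formula.

    vertices: list of (x, y) in projected units; the ring may be open or closed.
    """
    if vertices[0] == vertices[-1]:
        vertices = vertices[:-1]                 # drop the repeated closing vertex
    twice_area = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

def green_by_property(polygons):
    """Sum traced greenspace polygon areas per property.

    polygons: iterable of (property_id, vertices) pairs.
    """
    totals = defaultdict(float)
    for property_id, vertices in polygons:
        totals[property_id] += polygon_area(vertices)
    return dict(totals)

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
triangle = [(0, 0), (4, 0), (0, 3), (0, 0)]
green = green_by_property([("A", square), ("A", triangle), ("B", square)])
```

Here property "A" receives 100 + 6 = 106 square units and property "B" receives 100.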
4.4 Models
We adopted two approaches to obtain the coefficient of the neighborhood greenspace variable (i.e. Green). Approach A includes all the neighborhood variables (e.g. income, demography) in the regressions, and was designed to estimate the coefficient of Green while simultaneously controlling for structural and neighborhood characteristics. Approach B, on the other hand, follows a two-step procedure. First, we estimated the coefficient of Green in regressions without neighborhood variables. The omitted variables are likely to introduce bias and/or inefficiency into the estimation, but we aimed to use spatial regression techniques to correct these estimation problems. Next, we selected the best performing model and ran an auxiliary regression using the estimated coefficient of Green as the dependent variable and all the neighborhood variables as independent variables, in order to discover which neighborhood characteristics are potentially associated with the impact of neighborhood greenspace. The second step is only possible when the estimated coefficient of Green varies across the study area. Approach B is designed to investigate the potential association between neighborhood characteristics and neighborhood greenspace impact. We chose this approach by referring to the work of Baum-Snow and Kahn (2000), Bowe and Ihlanfeldt (2001), and Redfearn (2009).
Within each approach, we compared regression results of the standard
hedonic model with four kinds of spatial models. The following sections describe the
various models. Since the two approaches relied on the same models, we will not
repeat the model descriptions for them except to note any differences.
4.4.1 Model 1: Standard Hedonic Model
The standard hedonic model, which relies on ordinary least squares (OLS) to estimate the coefficients, is used as the benchmark model in this chapter. The dependent variable, Ln(SalePrice), is the natural logarithm of the transaction prices of the selected properties. The selection of independent variables is partially informed by previous studies (e.g. Conway et al., 2008; Geoghegan et al., 2003), combined with results from exploratory analysis (such as correlation analysis, multicollinearity tests and stepwise regression) and our research focus (i.e. neighborhood greenspace impact).
In Approach A, we selected 15 variables after testing many variable combinations. They can be classified into three categories. The first category contains four structural attributes of properties: Bathroom (number of bathrooms), LotSize (property lot size in square feet), Structure (living area in square feet), and Age (in years). The second category includes six neighborhood characteristics at the census block group scale: Household (total number of households), Income (median household income), Nonwhite (the percentage of the population that is Hispanic, Black, or Asian), HH.Child (percentage of households with children), HH.Child6 (percentage of households with children under six years), and PopDen (population density). The variable Nonwhite was created by adding up the populations of three ethnic groups (Hispanic, Black, and Asian), which are the main nonwhite ethno-racial groups in Los Angeles, and dividing by the total census block group population. We created this aggregate variable because the individual groups turned out to be insignificant in the regression, and they did not improve model fit as much as the aggregate variable Nonwhite did.
The third category contains locational/accessibility characteristics: RPdist (distance to nearest freeway ramp), PKdist (distance to nearest park), PKarea (area of nearest park in square feet), Green (area of neighborhood greenspace in square feet), and PK05area (area of parks within half a mile of each property). Note that PKdist and PKarea do not appear in the same model as PK05area, but in two separate versions of the base model (which we refer to as "Option 1" and "Option 2" throughout the remainder of this chapter). We wanted to see which one worked better for the current data.
Approach B omits neighborhood variables by design, but their omission did not make much difference in identifying the best variables for the benchmark regression. After many trials, the following variables were included: Bathroom, Bedroom, LotSize, Structure, Age, Age2 (the square of property age), Green, PKdist, RPdist, and PKarea.
4.4.2 Models 2 and 3: Spatial Expansion Models
As briefly reviewed in Section 4.2, a spatial expansion model can effectively capture
spatial heterogeneity, and produce varying parameters that reflect local relationships.
Often, researchers use interactive variables (e.g. a distance-related variable
multiplied with all the other variables in the model) to expand the standard hedonic
model. Sometimes, when it is unclear which variable can serve as a proxy for local
variation, researchers directly use the x, y coordinates to expand the original model,
as shown in Equation (4.1).
We tried both approaches in this chapter. Model 2 adds interactive variables combining PKdist (distance to the nearest park) with all the other variables (Option 1), or PK05area (area of parks within half a mile of each property) with all the other variables (Option 2), because we assumed that either PKdist or PK05area represents the variation of local greenspace availability, which is a proxy for the variation of the local physical and socioeconomic environment. We kept both options in order to test which one fits the data better, using the results from Model A1 as the benchmark.
Model 3 directly uses the x, y coordinates of each parcel’s centroid for model
expansion. We tested three versions: linear, quadratic, and a third-degree polynomial
expansion. For each version, we first transformed the raw coordinates to deviations
from the mean x and y values of all samples within the study area, then added the
transformed x, y coordinates and their combinations as variables representing
absolute location. These absolute-location variables were interacted with all the variables from Model 1, thus allowing the marginal price of these variables to vary over space. For example, in the case of the third-degree polynomial expansion, there are nine absolute-location variables (x, y, x², xy, y², x³, x²y, xy², y³) and 135 interactive variables. We checked the estimation results to select the best expansion model.
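The bookkeeping behind these counts can be sketched as follows: a polynomial of degree d in the centered coordinates has 2 + 3 + ... + (d + 1) terms (2, 5, and 9 for the linear, quadratic, and third-degree versions), and each term is interacted with every model variable. The sketch below is hypothetical and uses NumPy; the figure of 135 = 9 × 15 interactions corresponds to expanding 15 columns, which is an assumption about how the original expansion was set up rather than something the text states.

```python
import numpy as np

def location_terms(x, y, degree):
    """Polynomial terms xc^a * yc^b with 1 <= a + b <= degree, using centered coordinates."""
    xc, yc = x - x.mean(), y - y.mean()          # deviations from the mean x and y
    cols, names = [], []
    for total in range(1, degree + 1):
        for a in range(total, -1, -1):
            b = total - a
            cols.append(xc ** a * yc ** b)
            names.append(f"x^{a}y^{b}")
    return np.column_stack(cols), names

def expanded_design(X, x, y, degree):
    """Original variables, absolute-location terms, and every variable-by-location interaction."""
    L, _ = location_terms(x, y, degree)
    interactions = np.hstack([X * L[:, [j]] for j in range(L.shape[1])])
    return np.hstack([X, L, interactions])

# Hypothetical coordinates and 15 model variables.
rng = np.random.default_rng(2)
x, y = rng.uniform(size=200), rng.uniform(size=200)
L3, names3 = location_terms(x, y, 3)
D3 = expanded_design(rng.normal(size=(200, 15)), x, y, 3)
```

With 200 observations and 159 columns the cubic design is already close to saturation, which illustrates why higher-degree expansions quickly become hard to estimate reliably.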
4.4.3 Models 4 and 5: Spatial Regression Models Targeting Spatial
Dependence
As briefly reviewed in Section 4.2, the field of spatial econometrics offers two
commonly used spatial regression models: the spatial lag and the spatial error models.
They are both designed to control for spatial dependence, but are based on different
assumptions about the underlying spatial processes. They share a general form expressed as:

$$Y = \rho W_1 Y + X\beta + U \tag{4.2}$$

$$U = \lambda W_2 U + \varepsilon \tag{4.3}$$

where $W_1$ and $W_2$ are spatial weighting matrices that define the spatial interactions between each pair of observations, and U is the spatially autocorrelated residual vector. Thus $\rho W_1 Y$ is the spatially lagged dependent variable, which models the dependent variable as a function of its own values in surrounding areas; similarly, $\lambda W_2 U$ is the spatially autoregressive error term, which models the residuals as functions of the residuals at nearby observations. As a result, ε is independently and identically distributed with a mean of zero.

The general form thus becomes a spatial lag model when λ = 0 or a spatial error model when ρ = 0 (Pace and Barry, 1997; Pace and Gilley, 1997; Pace et al., 1998). We used the Lagrange Multiplier (LM) test to choose between the two models. The ad hoc rule is to choose the model with the higher test statistic, provided that it is statistically significant (Anselin and Rey, 1991; Florax et al., 2003).
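The two classic (non-robust) LM statistics can be computed directly from the OLS residuals. The sketch below uses the standard textbook formulas, $LM_{err} = (e'We/\hat\sigma^2)^2/T$ and $LM_{lag} = (e'Wy/\hat\sigma^2)^2/[(WX\hat\beta)'M(WX\hat\beta)/\hat\sigma^2 + T]$ with $T = \mathrm{tr}(W'W + W^2)$, and a row-standardized distance-band weights matrix of the kind labeled NB4000m later in this chapter. It is a minimal NumPy illustration on hypothetical simulated data, not the software used for this study; the robust variants reported in Table 4.3 are not computed here.

```python
import numpy as np

def distance_band_weights(coords, d):
    """Row-standardized weights: j is a neighbor of i if j != i and distance <= d (e.g. 4000 m)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    W = (dist <= d).astype(float)
    np.fill_diagonal(W, 0.0)                                 # no self-neighbors
    row_sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

def lm_tests(y, X, W):
    """Classic LM-error and LM-lag statistics (each ~ chi-square(1) under H0).

    X must contain an intercept column.
    """
    n = len(y)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ beta
    sigma2 = e @ e / n
    T = np.trace(W.T @ W + W @ W)
    lm_error = (e @ W @ e / sigma2) ** 2 / T
    WXb = W @ (X @ beta)
    M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)      # residual-maker matrix
    lm_lag = (e @ W @ y / sigma2) ** 2 / ((WXb @ M @ WXb) / sigma2 + T)
    return lm_error, lm_lag

# Hypothetical data generated from a spatial lag process with rho = 0.7.
rng = np.random.default_rng(3)
n = 300
coords = rng.uniform(0, 10000, size=(n, 2))                # metres
W = distance_band_weights(coords, 1500.0)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.7 * W, X @ np.array([1.0, 0.5]) + rng.normal(size=n))
lm_err, lm_lag = lm_tests(y, X, W)
```

Both statistics are compared against the chi-square(1) critical value (3.84 at the 5% level); under the rule described above, the model with the larger significant statistic would be chosen.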
Model 4 applied the chosen spatial regression model to the variables from
Model 1. Model 5 applied the same spatial regression model, but on top of the best
performing spatial expansion model from the Model 2 and 3 analysis. The rationale
is that if the Model 1 residuals display significant spatial autocorrelation, the cause
could be a spatial dependence effect alone or a combination of spatial heterogeneity
and spatial dependence effects. Comparing the results of Model 4 and Model 5 can
give us more insight into the underlying spatial process(es).
4.4.4 Model 6: Geographically Weighted Regression (GWR)
GWR is in many ways an extension of Casetti's (1992; 1997) spatial expansion model. It can be written as:

$$y_i = a_0(u_i, v_i) + \sum_{k} a_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{4.4}$$

where $(u_i, v_i)$ represents the geographic coordinates of point i. GWR therefore allows the explanatory variable coefficients $a_k$ and the constant term $a_0$ to vary across space (Brunsdon et al., 1996; Fotheringham, 1997; Fotheringham et al., 2002; Fotheringham and Charlton, 1998). It looks very similar to the spatial expansion model thus far, but the big difference is that instead of incorporating x, y coordinates into the model directly as variables, GWR employs a local spatial weighting matrix $W(u_i, v_i)$ that assigns weights to observations according to their proximity to point i. Then, based on weighted least squares regression, the GWR estimator of $a(u_i, v_i)$ is written as:

$$\hat{a}(u_i, v_i) = \left(X^{T} W(u_i, v_i)\, X\right)^{-1} X^{T} W(u_i, v_i)\, y \tag{4.5}$$
The most challenging task in GWR is to define the weighting matrix W through spatial weighting functions. It is preferable to model weights continuously across space as a function of the distances between points. Such a function is also called a spatial kernel, and it displays a distance-decay curve: weights decrease as distance increases. For example, a Gaussian kernel can be written as $w_{ij} = \exp(-d_{ij}^2/h^2)$, where $d_{ij}$ is the distance between points i and j and h is the bandwidth, which controls the size as well as the spikiness of the kernel. Early applications usually employed fixed kernels, assuming a single optimum bandwidth for all observations. However, a fixed kernel can produce large local estimation variance in areas where data are sparse, and mask subtle local variations in areas where data are dense. Adaptive kernel functions were therefore proposed, which expand or contract (i.e. adapt) the bandwidth to the data density by seeking a certain number of nearest neighbors for each observation. Typical techniques for estimating the bandwidth include the cross-validation approach and the Akaike Information Criterion (AIC) (Fotheringham et al., 2002). We employed an adaptive kernel function for Model 6.
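Equation (4.5) with an adaptive kernel can be sketched in a few lines. The text does not state the kernel shape or the bandwidth rule used for Model 6, so this sketch assumes an adaptive Gaussian kernel whose bandwidth at point i is the distance to its k-th nearest neighbor, with k chosen by leave-one-out cross-validation. It is a hypothetical NumPy illustration on synthetic data, not the GWR software used in this study, and it assumes no two observations share identical coordinates.

```python
import numpy as np

def _pairwise_distances(coords):
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def gwr_fit(y, X, coords, k, leave_one_out=False):
    """Local GWR coefficients (Eq. 4.5) with an adaptive Gaussian kernel.

    The bandwidth for point i is the distance to its k-th nearest neighbor,
    so the kernel widens where data are sparse and narrows where they are dense.
    X must include an intercept column; returns an (n, p) array of local estimates.
    """
    n, p = X.shape
    dist = _pairwise_distances(coords)
    betas = np.empty((n, p))
    for i in range(n):
        h = np.sort(dist[i])[k]              # position 0 is point i itself
        w = np.exp(-(dist[i] / h) ** 2)      # w_ij = exp(-d_ij^2 / h^2)
        if leave_one_out:
            w[i] = 0.0                       # drop point i when scoring bandwidths
        XtW = X.T * w                        # X^T W(u_i, v_i) without an n x n matrix
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)
    return betas

def cv_score(y, X, coords, k):
    """Leave-one-out cross-validation score used to compare candidate k values."""
    betas = gwr_fit(y, X, coords, k, leave_one_out=True)
    y_hat = (X * betas).sum(axis=1)
    return float(((y - y_hat) ** 2).sum())

# Hypothetical data: the slope on x drifts from west to east.
rng = np.random.default_rng(4)
n = 300
coords = rng.uniform(0, 1, size=(n, 2))
x = rng.normal(size=n)
true_slope = 0.5 + 0.5 * coords[:, 0]
y = 1.0 + true_slope * x + rng.normal(scale=0.05, size=n)
X = np.column_stack([np.ones(n), x])
best_k = min([20, 40, 80, 160], key=lambda k: cv_score(y, X, coords, k))
betas = gwr_fit(y, X, coords, best_k)
```

An AIC-based bandwidth search would follow the same loop, replacing `cv_score` with a corrected AIC computed from the GWR hat matrix.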
4.4.5 Models 7 and 8: Spatial Filtering Models
The rationale behind spatial filtering models is to filter out spatial variation in the original data before the estimation procedure is applied. We employed two kinds of spatial filtering techniques in Models 7 and 8, respectively.
Model 7 used x, y coordinates directly to construct a locally smoothed surface of spatial variation, added this surface variable to the standard hedonic model, and then estimated the surface variable together with the other variables. The surface variable could be specified non-parametrically (e.g. locally weighted regression, LOESS) (Cleveland and Devlin, 1988), or parametrically (e.g. natural cubic spline). The standard hedonic model thus turns into a Generalized Additive Model (GAM), an extension of the Generalized Linear Model (GLM) (Hastie and Tibshirani, 1986).
GLM is a family of likelihood-based regression models that relaxes the strict assumptions on the functional forms and error distributions of standard regressions. A typical example might be developed as follows: (1) a likelihood is assumed for a response variable Y; (2) a link function g relates Y's mean μ to a linear predictor $\eta_i = \beta_0 + \sum_{j=1}^{p-1} \beta_j X_{ij}$ based on the predictor variables $X_{i1}, \ldots, X_{i,p-1}$; and (3) the parameters of the linear predictor are estimated by maximum likelihood. GAM replaces the linear predictor with a sum of unspecified smooth functions, $\beta_0 + \sum_{j=1}^{p-1} s_j(X_{ij})$. In practice, linear terms and smooth functions can appear together in a model, so we specified our models accordingly:

$$Y = X\beta + \mathrm{lo}(x, y) + \varepsilon \tag{4.6}$$

or

$$Y = X\beta + \mathrm{ns}(x) + \mathrm{ns}(y) + \varepsilon \tag{4.7}$$

The first model option (Equation 4.6) used LOESS to smooth the surface and the second option (Equation 4.7) used a natural cubic spline to smooth the surface.
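The natural-spline option in Equation (4.7) can be approximated by an ordinary regression spline: build a natural cubic spline basis for each coordinate and estimate it jointly with the hedonic variables by OLS. The sketch below uses the truncated-power form of the natural cubic spline basis given in Hastie, Tibshirani and Friedman's Elements of Statistical Learning, with knots placed at coordinate quantiles (an assumption; the text does not report the knots or degrees of freedom). It is a hypothetical NumPy illustration on synthetic data, does not implement the LOESS option in Equation (4.6), and is not the GAM software used in this study.

```python
import numpy as np

def natural_spline_basis(t, knots):
    """Natural cubic spline basis in truncated-power form, without the constant.

    Returns len(knots) - 1 columns: t itself plus len(knots) - 2 nonlinear terms.
    """
    xi = np.sort(np.asarray(knots, dtype=float))
    K = len(xi)

    def d(k):
        return (np.clip(t - xi[k], 0, None) ** 3
                - np.clip(t - xi[K - 1], 0, None) ** 3) / (xi[K - 1] - xi[k])

    columns = [t] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(columns)

def spatial_spline_fit(Y, X, x, y, n_knots=5):
    """OLS fit of Y = X beta + ns(x) + ns(y) + error; returns the coefficients on X only."""
    probs = np.linspace(0, 1, n_knots)
    Nx = natural_spline_basis(x, np.quantile(x, probs))
    Ny = natural_spline_basis(y, np.quantile(y, probs))
    D = np.column_stack([np.ones(len(Y)), X, Nx, Ny])
    coef, *_ = np.linalg.lstsq(D, Y, rcond=None)
    return coef[1:1 + X.shape[1]]

# Hypothetical data: a regressor correlated with location plus a smooth spatial surface.
rng = np.random.default_rng(5)
n = 1000
x, y = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
X = (2 * x + rng.normal(scale=0.5, size=n))[:, None]
Y = 0.5 * X[:, 0] + np.sin(6 * x) + np.cos(4 * y) + rng.normal(scale=0.1, size=n)
beta_spline = spatial_spline_fit(Y, X, x, y)[0]
beta_plain = np.linalg.lstsq(np.column_stack([np.ones(n), X]), Y, rcond=None)[0][1]
```

Because the regressor in this simulation is correlated with location, omitting the surface biases its coefficient (true value 0.5), while including the spline surface largely removes that bias.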
Model 8 adopted the Getis filtering approach to remove the spatial dependence embedded in each spatially autocorrelated variable, thereby partitioning the original variable into two parts: a filtered non-spatial variable and a residual spatial variable (Getis, 1990, 1995). The non-spatial variable can then enter a standard hedonic model and generate unbiased and efficient parameter estimates through OLS. For example, for variable X, the filtered observation $x_i^*$ is given as:

$$x_i^* = \frac{x_i \left( \frac{W_i}{n-1} \right)}{G_i(d)} \tag{4.8}$$

where $x_i$ is the original observation, $W_i$ is the sum of all links (neighbors) for observation i, n is the total number of observations, and $G_i(d)$ is the spatial statistic of Getis and Ord (1992). The latter statistic can be written as:

$$G_i(d) = \frac{\sum_{j=1}^{n} w_{ij}(d)\, x_j}{\sum_{j=1}^{n} x_j}, \quad i \neq j \tag{4.9}$$

where $w_{ij}(d)$ is a one/zero spatial weighting matrix with ones for all links (neighbors) defined as being within distance d of a given i, and zeros for all other elements. Therefore, the numerator is the sum of all $x_j$ within d of i, excluding j = i, and the denominator is the sum of all $x_j$, excluding j = i. Thus, $G_i(d)$ measures the concentration, or lack of concentration, of the sum of values associated with variable X in the study area, and its expected value is $W_i/(n-1)$. In essence, the Getis filter in Equation (4.8) represents the degree of similarity between $x_i$ and its expectation, whereas the degree of dissimilarity is attributed to spatial autocorrelation.
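Equations (4.8) and (4.9) translate directly into array operations. The sketch below is a minimal NumPy illustration with binary distance-band weights; it assumes a strictly positive variable (as the filter requires) and leaves observations with no neighbors within d unfiltered, which is an assumption since the text does not say how such cases were handled. The function name `getis_filter` is hypothetical.

```python
import numpy as np

def getis_filter(x, coords, d):
    """Getis (1995) filter for a positive-valued variable (Eqs. 4.8-4.9).

    Returns (x_star, spatial): the filtered non-spatial part and the spatial
    part x - x_star. Binary weights: j is linked to i if j != i and
    distance <= d. Observations with no links are left unfiltered (assumption).
    """
    x = np.asarray(x, dtype=float)
    n = len(x)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    w = (dist <= d).astype(float)
    np.fill_diagonal(w, 0.0)                       # exclude j = i
    W_i = w.sum(axis=1)                            # number of links of each i
    G_i = (w @ x) / (x.sum() - x)                  # Eq. (4.9), sums exclude j = i
    expected = W_i / (n - 1)                       # expected value of G_i(d)
    has_links = W_i > 0
    x_star = x.copy()
    x_star[has_links] = x[has_links] * expected[has_links] / G_i[has_links]
    return x_star, x - x_star

# Hypothetical transect: a cluster of high values followed by a cluster of low values.
coords = np.column_stack([np.arange(20.0), np.zeros(20)])
x = np.array([10.0] * 10 + [1.0] * 10)
x_star, spatial = getis_filter(x, coords, 1.5)
```

Values inside the high-value cluster are pulled down and values inside the low-value cluster are pushed up, because each observation is rescaled by how far its local concentration $G_i(d)$ departs from its expectation.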
We first tried transforming only the dependent variable (SalePrice), to see whether this transformation was sufficient to remove all spatial autocorrelation from the regression residuals. Where it was not, we also transformed all the independent variables that could exhibit spatial dependence, and ran the regression again to compare the results.
4.5 Results
We first explored the prices of the selected properties, and found substantial variation across the study area (Figure 4.5). Prices vary roughly 100-fold, from lows of $50,000 and $75,000 in places such as Boyle Heights and East Los Angeles to highs of $5,250,000 and $4,950,000 in places such as Pacific Palisades and Palos Verdes. Moderately high prices are found in South Bay cities slightly further inland from the coast, and in areas to the northwest (e.g. Hollywood) and northeast (e.g. Pasadena) of Downtown L.A. The middle part of the map shows a north-south band of low values that extends northward from the San Pedro and Long Beach area, through parts of the Gateway Cities and South Bay cities, to Downtown L.A.

Figure 4.5 Sale prices of selected properties and an extrapolated price surface
Next, to better understand all the variables, we summarized their minimum, maximum, mean, median, and standard deviation (Table 4.1), and also generated box plots (Figure 4.6). Both exploratory tools suggested significant skew in the dependent variable and most of the independent variables, thus necessitating variable transformation. A natural log transformation corrected the skew of most variables, but a Box-Cox transformation was needed for the variable Nonwhite, because it is heavily skewed due to the segregated distribution of ethno-racial groups in Los Angeles (Allen and Turner, 2002; Joassart-Marcelli et al., 2005). We did not transform the variables Bathroom, Age, HH.Child and HH.Child6, as their distributions seemed normal and their semi-log relationships with the dependent variable seemed appropriate, given that their effects on sale price appeared to be non-linear.

Table 4.1 Descriptive statistics
Variables                Min      Max               Mean          Median     Std. Deviation
SalePrice ($)            50,000   5,250,000         775,074       637,000    504,767
Bathroom                 1        8                 2.05          2          1
LotSize (sq.ft.)         1,050    1,692,741         9,497         6,580      39,791
Structure (sq.ft.)       418      7,498             1,735         1,542      860
Age (years)              0        109               50.9          51         20
Household                63       2,360             672           573        410
Income ($)               13,125   200,001           63,130        56,250     30,474
Nonwhite (%)             6        99                51            46         30
HH.Child (%)             7        92                54            52         15
HH.Child6 (%)            0        40                12.1          11.7       5.9
PopDen (persons/acre)    2        70                15            12         11
Green (sq.ft.)           36       91,829            3,322         2,161      4,848
PKdist (ft.)             137      7,965             2,543         2,307      1,389
RPdist (ft.)             86       16,137            3,622         2,908      2,769
PKarea (sq.ft.)          530      3,767,818         99,815        29,355     349,814
PK05area (sq.ft.)        0        18,453,126,977    197,681,313   410,288    1,879,969,553
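The Box-Cox transformation applied to Nonwhite chooses a power λ so that the transformed variable is as close to normal as possible; λ = 0 corresponds to the natural log. The sketch below selects λ by maximizing the standard Box-Cox profile log-likelihood over a grid. It is a minimal NumPy illustration on hypothetical data; the grid range and function names are assumptions, and the text does not report which λ was used for Nonwhite.

```python
import numpy as np

def boxcox_transform(x, lam):
    """Box-Cox transform of a strictly positive array (log when lambda == 0)."""
    return np.log(x) if abs(lam) < 1e-12 else (x ** lam - 1.0) / lam

def boxcox_profile_mle(x, grid=None):
    """Pick lambda by maximizing the Box-Cox profile log-likelihood over a grid."""
    x = np.asarray(x, dtype=float)
    if grid is None:
        grid = np.linspace(-2.0, 2.0, 401)
    n = len(x)
    sum_log_x = np.log(x).sum()

    def loglik(lam):
        z = boxcox_transform(x, lam)
        # -(n/2) log(variance of transformed data) + Jacobian term (lambda - 1) * sum(log x)
        return -0.5 * n * np.log(z.var()) + (lam - 1.0) * sum_log_x

    best = max(grid, key=loglik)
    return boxcox_transform(x, best), float(best)

# Hypothetical right-skewed (log-normal) data: the selected lambda should be close to 0.
rng = np.random.default_rng(6)
skewed = np.exp(rng.normal(size=2000))
transformed, lam_hat = boxcox_profile_mle(skewed)
```

For a bounded percentage variable such as Nonwhite, values of exactly zero would need a small offset before transformation, since Box-Cox requires strictly positive inputs.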
Figure 4.6 Box plots of the selected variables, showing median, upper and lower
quartiles, minimum and maximum data values of each variable
We report the results for Approaches A and B separately in the following subsections, adding the letter "A" or "B" before the model numbers to differentiate them.
4.5.1 Overall Model Performance Comparison Using Approach A
We compared model performance based on goodness-of-fit, as well as the level of spatial autocorrelation left in the regression residuals. We used adjusted R² and/or the AIC value as measures of model fit, and Moran's I and/or the LM test as measures of spatial autocorrelation. Not all models generated all four measures, due to differences in the estimation mechanism of each model, but the available measures are sufficient to show relative merit. Also, not all tested versions of each model are included in the table, because we only chose the better performing one within each category. Hence, the text in parentheses in Table 4.2 describes which version we selected. For example, for Model A3, we chose the version with a quadratic spatial expansion of x,y coordinates; for Model A4, we only included the spatial lag model defining spatial neighbors as properties within 4 km (NB4000m); for Model A5, we only included the spatial lag model using the same spatial neighbors as Model A4 combined with the quadratic expansion of x,y coordinates; and for Model A7, we chose the spatial filtering model fitted with a natural cubic spline (ns).
Table 4.2 shows that all spatial models improved on the fit of the benchmark model. Spatial lag models fit better than spatial expansion models or spatial filtering models, while significantly reducing spatial autocorrelation in the regression residuals. Spatial expansion models cannot effectively capture spatial dependence, whereas spatial filtering models tend to overcorrect spatial autocorrelation into the negative realm. GWR produced the highest model fit while capturing spatial variations most effectively and leaving its regression residuals free of any significant spatial autocorrelation. It is the best performing model. The following sections report the results of each model in detail.
Table 4.2 Model performance comparison using Approach A
Model                                                        Adjusted R²   AIC     Moran's I (p-value)   LM test (p-value)
Model A1 Benchmark                                           0.648         947.7   0.341 (0.000)         362.967 (0.000)
Model A2 Spatial expansion via interactive variables         0.657         920.9   0.361 (0.000)         –
Model A3 Spatial expansion via x,y coordinates (Quadratic)   0.718         652.7   0.207 (0.000)         –
Model A4 Spatial lag model (NB4000m)                         –             418.8   –                     0.060 (0.806)
Model A5 Spatial lag model (Quadratic, NB4000m)              –             372.8   –                     1.681 (0.195)
Model A6 GWR                                                 0.774         363.0   0.007 (0.713)         –
Model A7 Spatial filtering (ns fitting)                      –             793.8   -0.001 (0.640)        –
4.5.1.1 Model A1: Standard Hedonic Model
The results of Model A1, the standard hedonic model, serve as the benchmark for this approach. Table 4.3 shows that about 65% of the variation in sale prices (normalized through a logarithmic transformation) is explained by 14 variables (in the case of Option 1) or 13 variables (in the case of Option 2). This is not a particularly high level of fit, but it is acceptable because the goal of our research is to discover the impact of neighborhood greenspace and the spatial structure embedded in the data, rather than to maximize the hedonic model's prediction accuracy. The benchmark values from the standard hedonic model serve to measure the improvement, if any, of the spatial models that follow. Our calculation of Moran's I indicates significant spatial autocorrelation in the error term, which confirms the need to apply these spatial models.
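Global Moran's I on regression residuals, $I = (n/S_0)\, z'Wz / z'z$ with $z$ the mean-centered residuals and $S_0$ the sum of all weights, can be computed as below. This is a minimal NumPy sketch with binary distance-band weights of the kind labeled NB4000m above (coordinates assumed to be in metres); the one-sided permutation p-value is a swapped-in inference method, since the text does not state whether analytical or permutation p-values were used.

```python
import numpy as np

def binary_distance_weights(coords, d):
    """Binary weights: w_ij = 1 if j != i and distance <= d (e.g. d = 4000 m), else 0."""
    diff = coords[:, None, :] - coords[None, :, :]
    W = (np.sqrt((diff ** 2).sum(axis=-1)) <= d).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def morans_i(values, W):
    """Global Moran's I of `values` (e.g. OLS residuals) under spatial weights W."""
    z = values - values.mean()
    return len(values) / W.sum() * (z @ W @ z) / (z @ z)

def moran_permutation_test(values, W, n_perm=999, seed=0):
    """Observed Moran's I and a one-sided permutation p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    sims = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    p_value = (1 + np.sum(sims >= observed)) / (n_perm + 1)
    return observed, p_value

# Hypothetical transect of 30 points spaced 1000 m apart; neighbors are adjacent points.
coords = np.column_stack([np.arange(30) * 1000.0, np.zeros(30)])
W = binary_distance_weights(coords, 1500.0)
trend_i, trend_p = moran_permutation_test(np.arange(30.0), W)     # smooth trend: I > 0
alternating_i = morans_i((np.arange(30) % 2).astype(float), W)   # checkerboard: I = -1
```

Applied to the residuals of Model A1, a large positive I with a small p-value would indicate the kind of residual clustering reported in Table 4.3.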
Table 4.3 Benchmark model estimation results under two options
Variables Option 1 Option 2
Value Std. Error p-value Value Std. Error p-value
Intercept 4.148 0.375 0.000 3.989 0.369 0.000
Bathroom 0.061 0.014 0.000 0.061 0.014 0.000
Ln(LotSize) 0.052 0.021 0.013 0.052 0.021 0.014
Ln(Structure) 0.500 0.033 0.000 0.499 0.033 0.000
Age 0.004 0.001 0.000 0.004 0.001 0.000
Ln(Household) 0.048 0.014 0.001 0.049 0.014 0.000
Ln(Income) 0.440 0.024 0.000 0.440 0.024 0.000
BC(Nonwhite) -0.041 0.014 0.003 -0.041 0.014 0.003
HH.Child -0.010 0.001 0.000 -0.010 0.001 0.000
HH.Child6 0.008 0.001 0.000 0.008 0.001 0.000
Ln(Popden) 0.125 0.013 0.000 0.127 0.013 0.000
Ln(Green) 0.037 0.012 0.002 0.036 0.012 0.002
Ln(RPdist) -0.022 0.008 0.009 0.022 0.008 0.007
Ln(PKdist) -0.020 0.012 0.096
Ln(PKarea) 0.0002 0.005 0.963
Ln(PK05area) 0.001 0.001 0.208
Adjusted R² 0.648 0.648
AIC 947.7 946.9
Moran’s I 0.341 0.342
(p-value) (0.000) (0.000)
LM-lag 362.967 362.988
(p-value) (0.000) (0.000)
Robust LM-lag 48.301 47.586
(p-value) (0.000) (0.000)
LM-error 332.559 333.747
(p-value) (0.000) (0.000)
Robust LM-error 17.892 18.345
(p-value) (0.000) (0.000)
Options 1 and 2 differ only in how the model represents park impact. Option 1 uses
the distance to the nearest park and the area of that park, whereas Option 2 uses the
area of all parks within half a mile of each property. The underlying rationale is that
people may prefer visiting the nearest park regardless of its distance, or they may
only be interested in parks within half a mile, which is the equivalent of a 20-minute
walk. Both outcomes are plausible, and we had no prior knowledge of which is
preferred by Los Angeles County residents. The estimation results show that the two
options give very similar levels of fit and similar coefficients for their common
variables. However, the only park variable included in Option 2, Ln(PK05area), is
insignificant, which suggests that this representation cannot effectively capture park
impacts. We therefore chose Option 1 as the more robust model.
The Option 1 estimation results show that the neighborhood greenspace
variable Ln(Green) is significant and positively affects property prices: a 1%
increase in greenspace area increases property values by about 0.04%. Similarly,
proximity to neighborhood parks increases property values by about 0.02% for every
1% reduction in the distance to the nearest park. The four house structural variables
mostly returned coefficients that matched our expectations; for example, increasing
structural area by 1% increases property value by about 0.5%. The exception is Age,
whose positive coefficient (0.004) suggests that the older the property, the higher its
value. Some neighborhood variables also returned coefficient signs that did not match
our intuition, such as Ln(Household) and Ln(PopDen), whose positive coefficients
suggest that property values are higher in areas with more households and higher
population density.
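Because the dependent variable is the logarithm of sales price, coefficients on logged regressors such as Ln(Green) are elasticities, and coefficients on unlogged regressors such as Age are semi-elasticities. The short sketch below reproduces the percentages quoted above from the Option 1 coefficients in Table 4.3. Its function names are illustrative and not from the dissertation. It uses the exact transformations rather than the usual shortcut of reading a coefficient directly as a percentage, and the two agree closely for coefficients of this size.

```python
import math

def pct_change_loglog(beta: float, pct_x: float = 1.0) -> float:
    """Exact % change in price for a pct_x % change in X when ln(P) includes beta * ln(X)."""
    return ((1 + pct_x / 100) ** beta - 1) * 100

def pct_change_semilog(beta: float, delta_x: float = 1.0) -> float:
    """Exact % change in price for a delta_x unit change in X when ln(P) includes beta * X."""
    return (math.exp(beta * delta_x) - 1) * 100

# Option 1 coefficients from Table 4.3
print(round(pct_change_loglog(0.037), 4))   # Ln(Green): 0.0368
print(round(pct_change_loglog(0.500), 4))   # Ln(Structure): 0.4988
print(round(pct_change_semilog(0.004), 4))  # Age, one extra year: 0.4008
```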
We also calculated Moran’s I for the regression residuals, which confirmed the
presence of significant spatial autocorrelation. We now try to identify the underlying
spatial processes, starting with spatial heterogeneity.
4.5.1.2 Models A2 and A3: Spatial expansion models
Models A2 and A3 adopt two different spatial expansion approaches to model spatial
heterogeneity (see details in section 4.3). Table 4.4 reports the estimated coefficients
for all the base variables plus the significant coefficients for expanded variables (we
did not include all the expanded variables in this table due to their large number).
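As a rough illustration of how such expanded variables are built, the sketch below interacts every base variable, including the intercept, with coordinate terms. The linear expansion uses x and y; the quadratic expansion adds x², y² and xy. It runs on synthetic data, with one hypothetical variable standing in for Ln(Green), so the numbers say nothing about Los Angeles. The sketch only shows that ordinary least squares on the expanded design recovers a coefficient that drifts across space.

```python
import numpy as np

def expand(X: np.ndarray, x: np.ndarray, y: np.ndarray, quadratic: bool = True) -> np.ndarray:
    """Interact every column of X (including an intercept column) with coordinate terms."""
    terms = [np.ones_like(x), x, y]
    if quadratic:
        terms += [x**2, y**2, x * y]
    T = np.column_stack(terms)                      # n x m coordinate terms
    # block j holds X[:, j] multiplied by each coordinate term
    return np.hstack([X[:, [j]] * T for j in range(X.shape[1])])

rng = np.random.default_rng(1)
n = 500
x, y = rng.uniform(-1, 1, n), rng.uniform(-1, 1, n)
green = rng.normal(size=n)                          # hypothetical stand-in for Ln(Green)
X = np.column_stack([np.ones(n), green])
# true local coefficient on `green` drifts with x: beta(x, y) = 0.1 - 0.05 * x
price = 5.0 + (0.1 - 0.05 * x) * green + rng.normal(scale=0.01, size=n)

Z = expand(X, x, y, quadratic=False)
coef, *_ = np.linalg.lstsq(Z, price, rcond=None)
# column order: [1, x, y] for the intercept, then [green, green*x, green*y]
print(np.round(coef, 3))
```

With projected coordinates measured in metres or feet, the coordinate terms are very large numbers. This is presumably one reason expansion coefficients such as those in Table 4.4 are so small.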
Table 4.4 Spatial expansion methods estimation results
Variables Model A1 Model A2 Model A3 (Linear) Model A3 (Quadratic)
Intercept 4.148 8.162 1.320 3.395
Intercept.X 2.002e-04
Intercept.X² 2.730e-09
Bathroom 0.061 -0.441 0.057 0.056
Ln(LotSize) 0.052 0.054 0.053 0.076
Ln(Structure) 0.500 0.496 0.564 0.573
Ln(Structure.X) -4.911e-06 -4.266e-06
Age 0.004 -0.011 0.003 0.003
Age.Y 1.079e-07
Age.XY -1.356e-11
Ln(Household) 0.048 0.053 0.103 -0.119
Ln(Household).X -3.236e-06 4.829e-06
Ln(Household).Y 2.525e-05
Ln(Household).Y² -6.063e-10
Ln(Household).XY -4.151e-10
Ln(Income) 0.440 0.445 0.514 0.014*
Ln(Income).X -6.372e-06 4.368e-05
Ln(Income).Y 1.284e-05
Ln(Income).X² -8.421e-10
Ln(Income).XY -9.676e-10
BC(Nonwhite) -0.041 -0.446 0.026 0.423
BC(Nonwhite).X -5.056e-06 -4.254e-05
BC(Nonwhite).Y -2.907e-05
BC(Nonwhite).X² 6.863e-10
BC(Nonwhite).Y² 5.283e-10
BC(Nonwhite).XY 1.151e-09
HH.Child -0.010 -0.010 -0.012 -0.004
HH.Child.X -1.195e-07 -8.024e-07
HH.Child.Y 4.088e-07
HH.Child.X² 1.429e-11
HH.Child.XY 2.600e-11
HH.Child6 0.008 0.040 0.003 0.010
HH.Child6.X 3.062e-07
HH.Child6.Y -9.413e-07
HH.Child6.Y² 2.542e-11
Ln(PopDen) 0.125 -0.288 0.236 0.092
Ln(PopDen).X -5.05e-06 1.101e-05
Ln(PopDen).Y -3.231e-06
Ln(PopDen).X² -4.160e-10
Ln(PopDen).Y² -1.114e-10
Ln(Green) 0.037 0.036 0.070 0.164
Ln(Green).X -2.659e-06 -8.422e-06
Ln(Green).Y -9.624e-06
Ln(Green).Y² 2.109e-10
Ln(RPdist) -0.022 -0.023 -0.028 0.070
Ln(RPdist).X -3.110e-06
Ln(RPdist).Y -1.039e-05
Ln(RPdist).Y² 2.401e-10
Ln(RPdist).XY 2.167e-10
Ln(PKdist) -0.020 -0.547 0.077 0.125
Ln(PKdist).X -3.636e-06 -1.491e-05
Ln(PKdist).Y -4.013e-06
Ln(PKdist).X² 1.915e-10
Ln(PKdist).Y² -2.292e-10
Ln(PKdist).XY 4.956e-10
Ln(PKarea) 0.0002* -0.0003* 0.005* 0.047
Ln(PKarea).X -3.749e-06
Ln(PKarea).Y -1.203e-06
Ln(PKarea).X² 6.244e-11
Ln(PKdist)_Bath 0.065
Ln(PKdist)_Age 0.002
Ln(PKdist)_BC(Nonwhite) 0.053
Ln(PKdist)_HH.Child6 -0.004
Ln(PKdist)_Ln(PopDen) 0.054
Adjusted R² 0.648 0.651 0.678 0.718
Moran’s I 0.341 0.332 0.306 0.207
(p-value) (< 2.2e-16) (< 2.2e-16) (< 2.2e-16) (5.731e-13)
Note: All the reported coefficients are statistically significant at 5% level, except the ones marked
with *.
Model A2 expands the base model by adding interactive variables between
PKdist and all the other variables, because we assume that distance to the nearest
park is a proxy for the varied physical and socioeconomic environment. Columns 2
and 3 of Table 4.4 contain the estimation results for Models A1 and A2, respectively.
Model A2 has a slightly higher adjusted R² value and a slightly lower Moran’s I
coefficient, and it reverses some of the counterintuitive coefficient signs reported in
Model A1. For example, the Ln(PopDen) and Age coefficients are now negative,
meaning that less dense neighborhoods and newer houses have higher property
values, which are both common outcomes. In addition, the coefficient (0.002) of the
interactive variable Ln(PKdist)_Age adjusts the effect of property age: a 1% increase
in the distance to the nearest park makes the marginal effect of property age on sales
price more positive by about 0.002%. In other words, the effect of property age
increases in areas further away from the nearest neighborhood park. This is possible
because house structural attributes can exert stronger impacts when other desirable
attributes (e.g. park accessibility) are not available.
Model A2 identified a total of five such significant interactive variables
between Ln(PKdist) and structural or neighborhood variables. These interactive terms
captured the spatial heterogeneity of these variables, allowing their parameters to
vary across the study area with respect to distance to parks. Their interpretations are
similar to the explanation above, with the exception of HH.Child6, whose effect is
amplified rather than reduced with proximity to parks. This is plausible, because
households with young children tend to appreciate neighborhood parks more than
other households.
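The interpretation of the Ln(PKdist)_Age term can be checked numerically. With the interaction in place, the marginal effect of one additional year of age on ln(price) is β_Age + β_int · ln(PKdist). A 1% increase in park distance therefore shifts that effect by β_int · ln(1.01), whatever the starting distance. The sketch below uses the Model A2 values from Table 4.4. The distance values in the loop are arbitrary, since this section does not restate the distance units.

```python
import math

BETA_AGE = -0.011   # Model A2, Age (Table 4.4)
BETA_INT = 0.002    # Model A2, Ln(PKdist)_Age (Table 4.4)

def age_effect(pkdist: float) -> float:
    """Marginal effect of one more year of age on ln(price) at a given park distance."""
    return BETA_AGE + BETA_INT * math.log(pkdist)

# Shift in the age effect for a 1% increase in distance to the nearest park
shift = BETA_INT * math.log(1.01)
print(f"{shift:.2e} log points per year (about {shift * 100:.4f}%)")

# The shift does not depend on the starting distance (arbitrary example values)
for d in (100.0, 500.0, 2000.0):
    print(d, age_effect(1.01 * d) - age_effect(d))
```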
Model A3 uses x, y coordinates directly to expand the variable parameters.
Columns 4 and 5 of Table 4.4 report the estimation results for the linear and
quadratic expansion models, with only significant coefficients listed. There are 10
and 13 variables in the linear and quadratic models, respectively, that display
significant spatial variation in their coefficients. The local coefficient of each is the
sum of the global coefficient and the coefficients of its coordinate expansion terms,
each multiplied by the corresponding coordinate term. For example, in the linear
case, the Ln(Structure) parameter appears to decrease significantly moving from
west to east across the region (the value becomes smaller as the x coordinate values
increase), and the HH.Child parameter appears to increase significantly moving from
south to north across the region (the value becomes more positive as the y coordinate
values increase). Likewise, the Ln(Green) parameter decreases moving east, which
suggests that the same amount of neighborhood greenspace is valued more in coastal
areas. The PKdist parameter decreases moving east and north, indicating that the
northern and inland parts of the region value proximity to parks more.
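For the linear expansion, this summation is simple arithmetic. The local coefficient at a point is the global coefficient plus the x-expansion coefficient times that point’s x coordinate, plus any y term. The sketch below uses the linear Model A3 values for Ln(Green) and Ln(Structure) from Table 4.4. Because the coordinate system and its origin are not restated here, it ignores y terms and reports only the change in each coefficient over a hypothetical eastward shift of 10,000 coordinate units.

```python
# Linear-expansion coefficients for Model A3, read from Table 4.4
GLOBAL = {"Ln(Green)": 0.070, "Ln(Structure)": 0.564}
X_TERM = {"Ln(Green)": -2.659e-06, "Ln(Structure)": -4.911e-06}

def local_coef(var: str, x: float) -> float:
    """Local coefficient at x-coordinate x (y terms ignored: east-west component only)."""
    return GLOBAL[var] + X_TERM[var] * x

def eastward_change(var: str, dx: float) -> float:
    """Change in the local coefficient when moving dx coordinate units east."""
    return local_coef(var, dx) - local_coef(var, 0.0)

for var in GLOBAL:
    print(var, round(eastward_change(var, 10_000.0), 5))
```

Both changes are negative, which matches the west-to-east decline described above. Their magnitude depends entirely on the (unstated) coordinate units.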
In the quadratic expansion model (i.e. column 5 in Table 4.4), the second-degree
expansion terms complicate interpretation, so we mapped the locally varying
coefficients in Figure 4.7, using Ln(Green), PKdist, PKarea and the Intercept as key
examples (the PKarea variable was not significant in any previous model, but the
quadratic expansion improved its performance). The coefficients of all four
variables are plotted directly and also interpolated in ArcMap for better visualization
of the spatial patterns. The coefficients of Ln(Green), PKdist, and PKarea initially
increase moving inland from the coast and then decrease moving further east. The
pattern from south to north is less obvious, but the impact of Ln(Green) and PKarea
grows in strength moving from south to north and then decreases slightly after
passing the central area.
The two spatial expansion models improved on the estimation results of the
base model by capturing some of the spatial heterogeneity in the data. Model A3
performed slightly better than Model A2, based on its higher adjusted R² values and
lower Moran’s I coefficients. Model A3 also provided more detailed information on
the pattern of spatial variation; however, its Moran’s I coefficients were still
significant, meaning that spatial dependence coexists with spatial heterogeneity.
Figure 4.7 Estimated coefficients for key variables from Model A3, with quadratic expansion
4.5.1.3 Models A4 and A5: Spatial regressions targeting spatial dependence
Models A4 and A5 address spatial dependence by applying either the spatial lag or
the spatial error model to Model A1 (the benchmark model) and Model A3 (the
spatial expansion model using x, y coordinates), respectively. As reviewed in section
4.2, the spatial lag and spatial error models are commonly applied in spatial
econometrics, and researchers need specification tests to choose the more
appropriate one for their analysis. We ran LM tests on the residuals of Models A1
and A3, and the results point to the spatial lag model as the better choice, because
its test statistics are higher, even in the robust form (i.e. the lag test that is robust to
the presence of an autoregressive error term) (see Table 4.3). Both Models A4 and
A5 therefore apply the spatial lag model, to the variables of Models A1 and A3,
respectively. We compared the estimation results of the two models in order to see
how strong the spatial dependence effect is with and without addressing spatial
heterogeneity in the model.
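The LM diagnostics used for this choice can be computed from OLS residuals, a design matrix and a spatial weights matrix. The sketch below follows the standard textbook formulas for LM-lag, LM-error and their robust forms. It runs on synthetic data generated from a spatial lag process with a hypothetical row-standardized three-nearest-neighbor matrix. It is an illustration, not the software that produced Table 4.3.

```python
import numpy as np

def lm_diagnostics(y: np.ndarray, X: np.ndarray, W: np.ndarray) -> dict:
    """LM-lag, LM-error and robust forms from OLS residuals (each ~ chi-square(1) under H0)."""
    n = y.size
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    s2 = e @ e / n                                   # ML estimate of the error variance
    T = np.trace(W.T @ W + W @ W)
    wxb = W @ (X @ b)                                # spatially lagged fitted values
    m_wxb = wxb - X @ np.linalg.solve(X.T @ X, X.T @ wxb)   # M (W X b), M = I - X(X'X)^-1 X'
    D = (wxb @ m_wxb) / s2 + T
    d_err = e @ (W @ e) / s2                         # scaled score for the error parameter
    d_lag = e @ (W @ y) / s2                         # scaled score for the lag parameter
    return {
        "LM_lag": d_lag**2 / D,
        "LM_error": d_err**2 / T,
        "Robust_LM_lag": (d_lag - d_err) ** 2 / (D - T),
        "Robust_LM_error": (d_err - T / D * d_lag) ** 2 / (T - T**2 / D),
    }

# Synthetic data generated from a spatial lag process with rho = 0.6
rng = np.random.default_rng(2)
n, k = 400, 3
coords = rng.uniform(0, 20, size=(n, 2))
dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
np.fill_diagonal(dist, np.inf)
W = np.zeros((n, n))
W[np.repeat(np.arange(n), k), np.argsort(dist, axis=1)[:, :k].ravel()] = 1.0 / k
X = np.column_stack([np.ones(n), rng.normal(size=n)])
eps = rng.normal(scale=0.5, size=n)
y = np.linalg.solve(np.eye(n) - 0.6 * W, X @ np.array([1.0, 0.5]) + eps)

stats = lm_diagnostics(y, X, W)
for name, value in stats.items():
    print(f"{name}: {value:.1f}")
```

Under the null hypothesis each statistic is approximately χ²(1), so values above 3.84 are significant at the 5% level. When both plain tests reject, the robust forms are the ones compared.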
An important step in estimating spatial dependence is the definition of spatial
neighbors and the spatial weighting matrix. Based on our exploration of the spatial
structure of the study area, we defined spatial neighbors either as “observations
within a 4 km radius of a given property” (hereafter referred to as NB4000m) or as
“the nearest three observations to any given property” (hereafter referred to as
Nearest3NB). We then constructed the spatial weighting matrix according to the
distances of these neighbors from a given property. The 4 km cutoff distance is based
on the semi-variogram of the residuals of Model A3, which indicates that 3 to 4 km is
the range, i.e. the distance that encompasses most of the local variation in residual
values (Figure 4.8). Given the geographic context of Los Angeles County, a 4 km
radius is reasonable because it usually covers a local city area. But significant local
spatial variation can still exist within a city, so we also chose the nearest three
neighbors as a more conservative definition of spatial neighbors.
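The two neighbor definitions can be turned into weighting matrices along the lines sketched below. The text says only that weights depend on the distance to each neighbor, so the inverse-distance form, the row standardization and the metre-based example coordinates are all assumptions. The function name is also hypothetical.

```python
from typing import Optional

import numpy as np

def inverse_distance_weights(coords: np.ndarray, band: Optional[float] = None,
                             k: Optional[int] = None) -> np.ndarray:
    """Row-standardised inverse-distance weights over a distance band OR the k nearest points.

    Assumes no two observations share identical coordinates (zero distance would divide by zero).
    """
    if (band is None) == (k is None):
        raise ValueError("specify exactly one of band or k")
    n = coords.shape[0]
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                       # never count a point as its own neighbour
    if band is not None:
        mask = d <= band                              # e.g. NB4000m: all points within 4,000 m
    else:
        mask = np.zeros((n, n), dtype=bool)           # e.g. Nearest3NB: the 3 closest points
        mask[np.repeat(np.arange(n), k), np.argsort(d, axis=1)[:, :k].ravel()] = True
    w = np.where(mask, 1.0 / d, 0.0)                  # closer neighbours get larger weights
    row_sums = w.sum(axis=1, keepdims=True)
    # observations with no neighbour inside the band keep an all-zero row
    return np.divide(w, row_sums, out=np.zeros_like(w), where=row_sums > 0)

coords = np.array([[0.0, 0.0], [1000.0, 0.0], [0.0, 3000.0], [9000.0, 9000.0]])  # metres
w_band = inverse_distance_weights(coords, band=4000.0)
w_knn = inverse_distance_weights(coords, k=3)
print(np.round(w_band, 3))
```

The isolated fourth point illustrates a practical difference between the two definitions. A distance band can leave observations without neighbors, whereas a k-nearest-neighbor rule always assigns exactly k.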
Figure 4.8 Semi-variogram of the residuals of Model A3
The Model A4 estimation results show an obvious trend of deflated
coefficients compared to Model A1, and using NB4000m as the spatial neighbor
definition generates more deflated coefficients than using Nearest3NB (Table 4.5).
This means that the spatial lag model has effectively removed the “inflation” of
coefficients caused by spatial dependence in the benchmark model, and that
NB4000m captures more spatial dependence than Nearest3NB. This is confirmed by
the LM test results for residual autocorrelation: the two spatial lag models reduced
the test statistics to insignificant levels. In addition, the AIC value, our measure of
model fit, is substantially lower in the spatial lag models, which indicates a better fit
as well. We use AIC here instead of adjusted R² because spatial lag models are
estimated by maximum likelihood, which does not yield R² values by default. Again,
NB4000m performed better than Nearest3NB, giving an even lower AIC value.
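To make the AIC comparison concrete: AIC = −2 ln L + 2k, where ln L is the maximized log-likelihood and k is the number of estimated parameters. For a spatial lag model the likelihood contains the Jacobian term ln|I − ρW|, which is why such models are fitted by maximum likelihood rather than OLS. The sketch below concentrates β and σ² out of the likelihood and searches ρ over a grid. The grid search, the ring-shaped weights matrix and the parameter counts are illustrative assumptions, not the estimation routine behind Tables 4.5 and 4.6.

```python
import math

import numpy as np

def gaussian_loglik(residuals: np.ndarray) -> float:
    """Maximised Gaussian log-likelihood with sigma^2 estimated as RSS / n."""
    n = residuals.size
    s2 = residuals @ residuals / n
    return -0.5 * n * (math.log(2 * math.pi) + math.log(s2) + 1)

def lag_loglik(y: np.ndarray, X: np.ndarray, W: np.ndarray, rho: float) -> float:
    """Concentrated log-likelihood of the spatial lag model y = rho*W*y + X*b + e at a given rho."""
    A = np.eye(y.size) - rho * W
    ay = A @ y
    b, *_ = np.linalg.lstsq(X, ay, rcond=None)       # b that maximises the likelihood given rho
    _, logdet = np.linalg.slogdet(A)                 # Jacobian term ln|I - rho W|
    return gaussian_loglik(ay - X @ b) + logdet

def aic(loglik: float, n_params: int) -> float:
    return -2.0 * loglik + 2 * n_params

rng = np.random.default_rng(4)
n = 100
W = np.zeros((n, n))
for i in range(n):                                   # two neighbours each on a ring
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.linalg.solve(np.eye(n) - 0.5 * W, X @ np.array([1.0, 2.0]) + rng.normal(size=n))

grid = np.linspace(-0.9, 0.9, 181)
best_rho = max(grid, key=lambda r: lag_loglik(y, X, W, r))
ols_ll = lag_loglik(y, X, W, 0.0)                    # rho = 0 reduces to OLS
# parameters: 2 betas + sigma^2 + rho for the lag model; 2 betas + sigma^2 for OLS
print(best_rho, aic(lag_loglik(y, X, W, best_rho), 4), aic(ols_ll, 3))
```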
The Model A5 estimation results are more complicated, since we included
both linear and quadratic expansion models as well as the two spatial neighbor
specifications (Table 4.6). We wanted to see which combination generates the best
estimation results. The AIC values indicate a very clear trend: NB4000m performs
better than Nearest3NB, and the quadratic expansion performs better than the linear
expansion. However, the LM test for residual autocorrelation suggests that the
combination of linear expansion and NB4000m reintroduced spatial dependence into
the model, and that combining quadratic expansion with Nearest3NB reduced spatial
dependence most effectively.
Table 4.5 Estimated coefficients of Model A1 and Model A4 (with two spatial
neighbor specifications)
Variables Model A1 Model A4 (Nearest3NB) Model A4 (NB4000m)
Intercept 4.148 2.377 0.389
(p-value) (0.000) (0.000) (0.264)
Bathroom 0.061 0.051 0.048
(p-value) (0.000) (0.000) (0.000)
Ln(LotSize) 0.052 0.047 0.058
(p-value) (0.013) (0.010) (0.001)
Ln(Structure) 0.500 0.420 0.414
(p-value) (0.000) (0.000) (0.000)
Age 0.004 0.003 0.003
(p-value) (0.000) (0.000) (0.000)
Ln(Household) 0.048 0.027 0.002
(p-value) (0.000) (0.027) (0.870)
Ln(Income) 0.440 0.243 0.161
(p-value) (0.000) (0.000) (0.000)
BC(Nonwhite) -0.041 -0.037 -0.036
(p-value) (0.003) (0.003) (0.002)
HH.Child -0.010 -0.006 -0.003
(p-value) (0.000) (0.000) (0.000)
HH.Child6 0.008 0.005 0.003
(p-value) (0.000) (0.000) (0.003)
Ln(PopDen) 0.125 0.080 0.057
(p-value) (0.000) (0.000) (0.000)
Ln(Green) 0.037 0.022 0.017
(p-value) (0.002) (0.040) (0.094)
Ln(RPdist) -0.022 -0.012 -0.005
(p-value) (0.009) (0.097) (0.486)
Ln(PKdist) -0.020 -0.017 -0.021
(p-value) (0.101) (0.121) (0.047)
Ln(PKarea) 0.0002 -0.001 0.001
(p-value) (0.963) (0.865) (0.842)
Rho N/A 0.352 0.572
(p-value) (0.000) (0.000)
AIC 947.7 631.2 418.8
LM test residual autocorrelation 362.967 0.024 0.060
(p-value) (0.000) (0.878) (0.806)
The coefficient estimates are hard to compare since most of them vary locally. But
our key research variables still show the usual pattern: the global terms of Ln(Green),
PKdist, and PKarea have greater impacts in the quadratic expansion form than in the
linear expansion form, especially PKarea, which is statistically
Table 4.6 Estimation results for four versions of Model A5 that include different
combinations of spatial expansion and spatial neighbor specifications
Variables Model A5 (Linear, Nearest3NB) Model A5 (Linear, NB4000m) Model A5 (Quadratic, Nearest3NB) Model A5 (Quadratic, NB4000m)
Intercept 0.533** -1.524 2.213 0.143**
Intercept.X 1.308e-04 7.947e-05
Intercept.X² 1.986e-09 1.164e-09*
Bathroom 0.04 0.0480 0.046 0.050
Ln(LotSize) 0.058 0.091 0.072 0.090
Ln(Structure) 0.490 0.509 0.518 0.534
Ln(Structure).X -3.878e-06 -3.054e-06* -3.863e-06**
Age 0.002 0.002 0.003 0.002
Age.Y 9.110e-08** 1.115e-07
Age.XY -1.194e-11 -1.101e-11
Ln(Household) 0.060 0.003** -0.063** -0.076**
Ln(Household).X -1.931e-06*
Ln(Household).Y 1.545e-05 1.111e-05*
Ln(Household).Y² -3.857e-10 -2.641e-10*
Ln(Household).XY -2.751e-10 **
Ln(Income) 0.278 0.231 -0.050** -0.053**
Ln(Income).X 3.389e-05 3.239e-05
Ln(Income).Y 9.892e-06 9.512e-06
Ln(Income).X² -6.297e-10 -5.702e-10
Ln(Income).XY -7.284e-10 -7.137e-10
BC(Nonwhite) 0.011** -0.010** 0.302 0.225
BC(Nonwhite).X 3.523e-06 -2.017e-06* 3.310e-05 -2.288e-05
BC(Nonwhite).Y -1.926e-05 -1.508e-05
BC(Nonwhite).X² 5.426e-10 3.613e-10
BC(Nonwhite).Y² 3.260e-10**
BC(Nonwhite).XY 8.661e-10 6.233e-10
HH.Child -0.008 -0.005 -0.003* -0.002**
HH.Child.X -5.640e-07 -4.799e-07
HH.Child.Y -7.157e-08 1.454e-07**
HH.Child.X² 9.790e-12** 9.952e-12**
HH.Child.XY 1.996e-11 1.773e-11
HH.Child6 0.002** -0.002** 0.008 * 0.004**
HH.Child6.X -7.157e-08* 3.955e-07
HH.Child6.Y -6.862e-07*
HH.Child6.Y² 1.890e-11**
Ln(PopDen) 0.154 0.118 0.076 0.073
Ln(PopDen).X 3.822e-06 2.748e-06 6.366e-06 4.881e-06*
Ln(PopDen).Y -1.590e-06*
Ln(PopDen).X² -2.688e-10 -2.105e-10
Ln(PopDen).Y² -6.956e-11* -7.930e-11
Ln(Green) 0.053 0.038 0.123 0.111
Ln(Green).X -2.493e-06 -1.640e-06 6.676e-06 -6.157e-06*
Ln(Green).Y -7.049e-06* -7.006e-06*
Ln(Green).Y² 1.565e-10* 1.514e-10*
Ln(RPdist) -0.011** 0.012** 0.055* 0.069*
Ln(RPdist).X 3.069e-07** -2.476e-06 -0.000
Ln(RPdist).Y -6.804e-06 -0.000
Ln(RPdist).XY 1.465e-10*
Ln(PKdist) 0.043* 0.037** 0.072 0.093
Ln(PKdist).X -2.457e-06 -2.904e-06 -9.135e-06 -2.855e-06
Ln(PKdist).Y -2.444e-06
Ln(PKdist).X² 1.811e-10
Ln(PKdist).Y² -1.549e-10 -1.749e-10
Ln(PKdist).XY 3.535e-10 4.008e-10
Ln(PKarea) 0.002** 0.003** 0.037 0.037
Ln(PKarea).X -3.438e-06 -4.242e-06
Ln(PKarea).Y -9.289e-07*
Ln(PKarea).X² 6.434e-11 8.220e-11
Rho 0.338 0.531 0.264 0.423
AIC 529.5 443.5 420.6 372.8
LM test residual autocorrelation (p-value) 1.656 (0.198) 48.104 (0.000) 1.018 (0.313) 1.681 (0.195)
* insignificant at 5% level ** insignificant at 10% level
insignificant in the linear expansion models. To show an example of the spatial
pattern of these key variables, we mapped their coefficients in the spatial lag model
using quadratic expansion variables and the NB4000m spatial neighbor definition.
Figure 4.9 shows that Ln(Green) and PKdist display a similar pattern: coefficients
first increase as we move from the coast inland, then decrease after passing the
central area and moving further east. The pattern for PKarea is very similar except
on the east/west edges of the county where coefficients are higher again.
The results of Models A4 and A5 confirmed the effectiveness of the spatial
lag model in removing spatial dependence; Model A4 performed better according to
the LM tests of remaining spatial autocorrelation in the residual, but Model A5
performed better according to the AIC model fitness criterion.
Figure 4.9 Estimated coefficients of key variables from Model A5, specified with
a quadratic expansion and neighbors within 4 km
4.5.1.4 Model A6: Geographically Weighted Regression
We used GWR 3.0 (Fotheringham et al., 2002) to implement this model. We chose
the cross-validation option in the software to calibrate adaptive kernels, so that the
number of spatial neighbors for each observation point varies with local data density.
Local parameter estimates were then obtained using the estimator in Equation (4.5).
Table 4.7 summarizes the resulting parameters in five key ranges. The adjusted R²
improves to 0.774 and the AIC value falls to 363, both better than the results
achieved by any previous model. The variables listed in Table 4.7 are all significant
at the 0.01 level or better, which indicates that the marginal prices of these
characteristics are not constant but vary across Los Angeles County.
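The standard GWR local estimator, which Equation (4.5) is assumed to follow, has the weighted-least-squares form β(i) = (XᵀW_iX)⁻¹XᵀW_iy. Here W_i is a diagonal matrix of kernel weights centred on observation i. The sketch below implements it with an adaptive bi-square kernel whose bandwidth is the distance to the q-th nearest observation, and it picks q by leave-one-out cross-validation. The bi-square kernel, the candidate values of q and the synthetic data are assumptions for illustration. GWR 3.0 has its own kernel and calibration options, which are not reproduced here.

```python
import numpy as np

def bisquare_weights(dist_row: np.ndarray, q: int) -> np.ndarray:
    """Adaptive bi-square kernel: bandwidth = distance to the q-th nearest other point."""
    h = np.sort(dist_row)[q]                 # index 0 is the point itself (distance 0)
    w = (1.0 - (dist_row / h) ** 2) ** 2
    return np.where(dist_row < h, w, 0.0)    # points at or beyond the bandwidth get no weight

def gwr_fit(X, y, coords, q, leave_one_out=False):
    """Local coefficients beta(i) = (X' W_i X)^-1 X' W_i y for every observation i."""
    n, p = X.shape
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    betas = np.empty((n, p))
    for i in range(n):
        w = bisquare_weights(dist[i], q)
        if leave_one_out:
            w[i] = 0.0                       # cross-validation: drop observation i itself
        xtw = X.T * w                        # X' W_i without forming the n x n diagonal
        betas[i] = np.linalg.solve(xtw @ X, xtw @ y)
    return betas

def cv_score(X, y, coords, q):
    """Sum of squared leave-one-out prediction errors for bandwidth q."""
    betas = gwr_fit(X, y, coords, q, leave_one_out=True)
    return float(((y - (X * betas).sum(axis=1)) ** 2).sum())

rng = np.random.default_rng(5)
n = 200
coords = rng.uniform(0, 1, size=(n, 2))
z = rng.normal(size=n)
true_slope = 0.5 + 0.5 * coords[:, 0]       # coefficient drifts from west to east
y = 1.0 + true_slope * z + rng.normal(scale=0.1, size=n)
X = np.column_stack([np.ones(n), z])

candidates = [20, 40, 60, 80]
best_q = min(candidates, key=lambda q: cv_score(X, y, coords, q))
betas = gwr_fit(X, y, coords, best_q)
print(best_q, np.corrcoef(betas[:, 1], true_slope)[0, 1])
```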
The minimum and maximum parameter values are extreme and/or
counterintuitive in some cases. For example, the estimates for Ln(LotSize) range
from -0.172 to 0.349, meaning that, all else being equal, a 1% increase in lot size can
reduce property values by 0.17% at one location (i.e. regression point) and increase
them by 0.35% at another. The interquartile ranges of the GWR estimates therefore
seem more plausible, and mapping the estimates to show the pattern of spatial
variation seems a better way to interpret the results. Again, we selected our key
variables, Ln(Green), Ln(PKdist), and Ln(PKarea), for this task. Figure 4.10 shows
the parameter estimates with interpolated values as the background. Unlike in the
Model A5 map, the spatial distributions of these three variables are not very similar
to each other.
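The contrast between the full range and the interquartile range can be made explicit when summarizing local estimates. The illustrative helpers below produce the five columns of Table 4.7 for one variable. They also return the share of positive local estimates, one simple way to quantify a statement such as “mostly positive”. The input values are made up.

```python
import numpy as np

def five_number_summary(local_estimates: np.ndarray) -> np.ndarray:
    """Min, lower quartile, median, upper quartile and max, as laid out in Table 4.7."""
    return np.percentile(local_estimates, [0, 25, 50, 75, 100])

def share_positive(local_estimates: np.ndarray) -> float:
    """Fraction of regression points with a positive local estimate."""
    return float((local_estimates > 0).mean())

example = np.array([-0.2, 0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 1.4])  # made-up local estimates
summary = five_number_summary(example)
print(summary, summary[3] - summary[1], share_positive(example))   # last two: IQR, share > 0
```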
As expected, the estimates for neighborhood greenspace area are mostly
positive and significant throughout Los Angeles County and exhibit distinct spatial
trends. The highest marginal price estimates are found within the central urban areas
of the City of Los Angeles, which exhibit relatively high population density and low
greenspace coverage. The estimates also tend to be high in a corridor extending
southwest from the central urban area (west Los Angeles) to Marina del Rey, and at
the southern end (i.e. San Pedro and Long Beach), where housing density is higher.
The northeast (north San Gabriel Valley) and the northwest (San Fernando Valley)
also show high values (Figure 4.10).
Table 4.7 GWR parameter estimates
Variables Min Lower Quartile Median Upper Quartile Max
Intercept 0.426 3.686 5.516 7.173 13.147
Ln(Bathroom) -0.034 0.011 0.034 0.055 0.110
Ln(LotSize) -0.172 0.054 0.091 0.154 0.349
Ln(Structure) 0.191 0.378 0.464 0.532 0.746
Age -0.004 -0.001 0.001 0.004 0.008
Ln(Household) -0.071 0.016 0.050 0.094 0.221
Ln(Income) -0.116 0.146 0.236 0.353 0.692
BC(Nonwhite) -0.364 -0.066 0.0001 0.075 0.254
HH.Child -0.012 -0.006 -0.004 -0.002 0.003
HH.Child6 -0.021 -0.001 0.005 0.007 0.016
Ln(PopDen) -0.229 -0.024 0.036 0.098 0.198
Ln(Green) -0.128 -0.005 0.024 0.049 0.195
Ln(PKdist) -0.260 -0.040 -0.0002 0.024 0.205
Ln(RPdist) -0.090 -0.015 0.0001 0.021 0.169
Ln(PKarea) -0.082 -0.009 0.005 0.017 0.164
Adjusted R² 0.774
AIC 363.0
RSS 102.9
The spatial pattern of Ln(PKdist) (distance to the nearest park) in Figure 4.10
shows far fewer areas with high values. The central urban area still has the highest
values, with high-value patches extending to the north San Gabriel Valley area. In
other words, properties in these areas gain more value than properties in other areas
as their distance to the nearest park increases.
Figure 4.10 Estimated coefficients for key variables in Model A6 (GWR)
Ln(PKarea) (the area of the nearest park) displays a slightly different spatial
pattern (Figure 4.10). The central urban areas and their neighboring areas (i.e. the
central San Gabriel Valley and west Los Angeles) all show up as enclaves of high
values, as do the beach communities of Malibu, El Segundo, and Manhattan Beach,
and the inland cities in the South Gate area (Downey, Lynwood, etc.). Such patterns
mean that property values in these areas appreciate more than those elsewhere when
the size of the nearest park increases.
4.5.1.5 Models A7 and A8: Spatial Filtering Models
Model A7 utilized Equations (4.6) and (4.7) to estimate a locally smoothed surface
of x, y coordinates together with the other explanatory variables in a hedonic model.
Model A8, on the other hand, utilized Equation (4.8) to filter the dependent variable
and/or the independent variables through the Getis filter, and then used the filtered
variables for the regression performed as part of the hedonic model. Estimation
results show that Model A7 drastically outperforms Model A8; therefore, we focus on
reporting the Model A7 estimates here.
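For reference, the Getis filter is commonly written as x*_i = x_i · [W_i/(n − 1)] / G_i(d). In this form G_i(d) = Σ_{j≠i} w_ij(d) x_j / Σ_{j≠i} x_j is the local G statistic with binary distance-band weights, and W_i = Σ_{j≠i} w_ij(d). The filtered value x*_i is the non-spatial part of x_i, and x_i − x*_i is its spatial component. The sketch below implements that textbook form, assuming Equation (4.8) follows it. The distance band and the test data are hypothetical, and the variable must be strictly positive.

```python
import numpy as np

def getis_filter(x: np.ndarray, coords: np.ndarray, band: float):
    """Split a positive variable into a filtered (non-spatial) part and a spatial part."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        raise ValueError("the Getis filter assumes a strictly positive variable")
    n = x.size
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= band).astype(float)                # binary distance-band weights
    np.fill_diagonal(w, 0.0)                     # exclude j == i
    w_i = w.sum(axis=1)                          # number of neighbours of each point
    g_i = (w @ x) / (x.sum() - x)                # local G_i(d)
    ratio = np.ones(n)                           # isolated points are left unfiltered
    has_nb = w_i > 0
    ratio[has_nb] = (w_i[has_nb] / (n - 1)) / g_i[has_nb]   # E[G_i(d)] / G_i(d)
    filtered = x * ratio
    return filtered, x - filtered

rng = np.random.default_rng(6)
coords = rng.uniform(0, 10, size=(150, 2))
trend = 1.0 + coords[:, 0]                       # strongly clustered, positive variable
filtered, spatial = getis_filter(trend, coords, band=2.0)
const_filtered, const_spatial = getis_filter(np.full(150, 3.0), coords, band=2.0)
```

A spatially constant variable passes through unchanged, because each G_i(d) then equals its expectation. A clustered variable loses a large spatial component, which is one way the filtered variables can drift far from their original patterns.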
Model A7 produced acceptable results using both Equations (4.6) and (4.7) in
the GAM framework. Table 4.8 shows that their AIC values are both around 800,
which is lower than that of Model A1 (AIC = 947) but much higher than those of
several other models (e.g. Model A4 with AIC = 419 and Model A6 with AIC = 363).
Both equations require choosing a parameter that controls the smoothness of the
surface: the span in the case of locally weighted regression (LOESS), and the degrees
of freedom (df) in the case of natural cubic spline fitting (NS). After numerous tests,
we settled on values for both parameters that limited the neighborhood window to a
very local level, in order to reduce the spatial autocorrelation in the regression
residuals to insignificance. The span parameter in LOESS has to be no more than
0.05 (i.e. each local fit uses at most 5% of the sample data), whereas the degrees of
freedom in NS have to be at least 250 (i.e. a large number of knots and a much
rougher surface, considering that df values are usually set at 10, 20, etc.). This is
consistent with the significant spatial heterogeneity found in our dataset by the
previous models.
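The idea behind Model A7, a hedonic regression plus a smooth surface of the x, y coordinates, can be sketched with the classic backfitting algorithm. The algorithm alternately fits the linear part to the data minus the current surface, then smooths the partial residuals over the coordinates. The smoother below is a simple local-linear LOESS with tricube weights, in which `span` is the fraction of observations used in each local fit, mirroring the span parameter discussed above. This is an illustrative assumption, not the GAM routine used to produce Table 4.8. In the NS variant, a natural cubic spline surface would take the place of `loess_2d`.

```python
import math

import numpy as np

def loess_2d(coords: np.ndarray, z: np.ndarray, span: float) -> np.ndarray:
    """Local-linear LOESS over 2-D coordinates with tricube weights."""
    n = z.size
    k = max(10, math.ceil(span * n))                 # points used in each local fit
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    design = np.column_stack([np.ones(n), coords])   # local model: a + b*x + c*y
    fitted = np.empty(n)
    for i in range(n):
        nb = np.argsort(dist[i])[:k]
        h = dist[i, nb].max()                        # bandwidth = distance to k-th point
        sw = np.sqrt((1 - (dist[i, nb] / h) ** 3) ** 3)   # square roots of tricube weights
        coef, *_ = np.linalg.lstsq(design[nb] * sw[:, None], z[nb] * sw, rcond=None)
        fitted[i] = design[i] @ coef
    return fitted

def backfit(X: np.ndarray, y: np.ndarray, coords: np.ndarray, span: float, n_iter: int = 10):
    """Backfitting for y = X b + s(x, y) + e, where X contains an intercept column."""
    surface = np.zeros_like(y)
    for _ in range(n_iter):
        b, *_ = np.linalg.lstsq(X, y - surface, rcond=None)
        surface = loess_2d(coords, y - X @ b, span)
        surface -= surface.mean()                    # centre so the intercept stays identified
    b, *_ = np.linalg.lstsq(X, y - surface, rcond=None)
    return b, surface

rng = np.random.default_rng(7)
n = 300
coords = rng.uniform(0, 10, size=(n, 2))
z = rng.normal(size=n)
X = np.column_stack([np.ones(n), z])
y = 2.0 + 0.5 * z + np.sin(coords[:, 0]) + np.cos(coords[:, 1]) + rng.normal(scale=0.2, size=n)

b, surface = backfit(X, y, coords, span=0.1)
ols_b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(b, np.var(y - X @ b - surface), np.var(y - X @ ols_b))
```

Shrinking `span` makes the surface follow the data more closely. This is the same trade-off as the small span and large df needed above: residual autocorrelation disappears, at the risk of over-correcting.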
Table 4.8 Estimated coefficients from spatial filtering models in the GAM
framework (Model A7)
LOESS(span = 0.05) NS (df = 250)
Value Std. Error t-value Value Std. Error t-value
Intercept 4.347 0.369 11.776 4.566 0.375 10.578
Bathroom 0.061 0.014 4.471 0.039 0.014 2.729
Ln(LotSize) 0.047 0.021 2.296 0.088 0.022 3.925
Ln(Structure) 0.492 0.032 15.371 0.457 0.035 13.210
Age 0.004 0.0005 8.045 0.002 0.001 3.496
Ln(Household) 0.047 0.014 3.466 0.050 0.019 2.574
Ln(Income) 0.403 0.024 17.078 0.375 0.034 13.210
BC(Nonwhite) -0.015 0.014 -1.089* 0.013 0.022 0.585*
HH.Child -0.008 0.00 -13.020 -0.004 0.001 -4.268
HH.Child6 0.007 0.001 5.124 0.003 0.002 1.699*
Ln(Popden) 0.101 0.013 7.700 0.069 0.018 3.737
Ln(Green) 0.056 0.012 4.718 0.034 0.013 2.694
Ln(PKdist) -0.014 0.012 -1.116* -0.007 0.017 -0.421*
Ln(RPdist) -0.022 0.008 -2.690 -0.016 0.013 -1.222*
Ln(PKarea) 0.002 0.005 0.464* -0.005 0.007 -0.636*
AIC 804.7 793.8
Moran’s I in residuals (p-value) 0.035 (0.091) -0.011 (0.640)
* t-values are insignificant at 5% level
The coefficient estimates generated with the LOESS surface removed some of
the inflation present in the base model, but not as much as the other spatial models.
The natural cubic spline surface appears to filter out more coefficient inflation from
most variables. As a result, the regression with the NS surface produces a lower AIC
value (i.e. a better fit) and a lower Moran’s I coefficient with a higher p-value, but the
trade-off is that more variables become insignificant at the 5% level. Spatial filtering
in the GAM framework thus seems to be a very powerful, although relatively rough,
method for handling the spatial variation in the study data. It can easily overcorrect
the spatial pattern and generate negative Moran’s I statistics (e.g. the NS surface
with df = 250).
Model A8, on the other hand, consistently returned adjusted R² values below
0.5 across different spatial weighting matrices, filtered variables, and model forms
(e.g. the spatial expansion model). These R² values are substantially lower than those
of any previous model, including the base model, presumably because of some
fundamental incompatibility between the Getis filter and the spatial structure of the
current data: the filtered variables deviated significantly from their original spatial
patterns.
4.5.2 Overall Model Performance Comparison within Approach B
Using the same criteria as in Approach A, we compared the benchmark model B1
with four spatial regression models in Approach B. We did not repeat all the
aforementioned models, choosing instead the best-performing model from each
category. The numbering of models follows that of Approach A and is therefore not
completely sequential. Table 4.9 summarizes the performance measures and shows
that GWR (Model B6) is still the best-performing model. The remaining spatial
models rank, from best to worst, as follows: the spatial error model combined with
quadratic expansion of x,y coordinates (Model B5), the spatial expansion model with
quadratic expansion of x,y coordinates (Model B3), and the spatial filtering model
using natural cubic spline fitting (Model B7). Among them, Models B5 and B7
successfully removed significant spatial autocorrelation from the regression
residuals, while Model B3 did not. The following sections detail the coefficients
estimated for each of these models.
Table 4.9 Model performance comparison within Approach B
Models Adjusted R² AIC Value Moran’s I (p-value) LM test (p-value)
Model B1 Benchmark 0.507 1588.1 0.425 (0.000) LM-lag 615.7 (0.000); LM-error 5232.1 (0.000)
Model B3 Spatial expansion via x,y coordinates (Quadratic) 0.683 1023.1 0.286 (0.000) N/A
Model B5 Spatial error model (Quadratic, NB4000m) N/A 570.8 N/A 15.643 (0.146)
Model B6 GWR 0.751 539.1 0.023 (0.284) N/A
Model B7 Spatial filtering (ns fitting) N/A 1152.4 -0.007 (0.173) N/A
4.5.2.1 Benchmark model B1
Table 4.9 shows that the adjusted R² for the standard hedonic model (Model B1) is
0.507, meaning that only about half of the variation in sale prices can be explained
by the included independent variables. This adjusted R² is much lower than that of
benchmark model A1 (0.648), which indicates that the omitted neighborhood and
demographic variables substantially reduced the explanatory power. Such omissions
can also lead to significant spatial autocorrelation in the regression residuals, which
is confirmed by the Moran’s I statistic and the LM tests. In addition, the LM test
results point to a spatial error model, instead of a spatial lag model, as the more
appropriate specification for modeling spatial dependence, because the LM-error
value is substantially greater than the LM-lag value.
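The specification choices in this chapter follow a simple rule. If only one LM statistic is significant, that model is used; if both are, the robust forms are compared where available. The function below is a simplified, hypothetical encoding of that rule, fed with the statistics reported in Tables 4.3 and 4.9. Robust statistics for Model B1 are not reported in this section, so that call falls back to the plain LM values.

```python
CRIT_5PCT = 3.841   # chi-square(1) critical value at the 5% level

def choose_spatial_model(lm_lag, lm_error, robust_lag=None, robust_error=None,
                         crit=CRIT_5PCT):
    """Pick 'lag', 'error' or 'OLS' from LM diagnostics (a simplified decision rule)."""
    lag_sig, err_sig = lm_lag > crit, lm_error > crit
    if not (lag_sig or err_sig):
        return "OLS"
    if lag_sig != err_sig:                       # only one test rejects the null
        return "lag" if lag_sig else "error"
    if robust_lag is not None and robust_error is not None:
        return "lag" if robust_lag > robust_error else "error"
    return "lag" if lm_lag > lm_error else "error"

print(choose_spatial_model(362.967, 332.559, 48.301, 17.892))   # Model A1, Table 4.3: lag
print(choose_spatial_model(615.7, 5232.1))                      # Model B1, Table 4.9: error
```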
The OLS coefficient estimates are shown in Table 4.10. Most coefficients are
significant, except that of Ln(PKarea) (area of the nearest park), and most also have
the expected signs, with the exception of Bedroom and Ln(LotSize). The values of
most coefficients are greater than those in Model A1. For example, the coefficient of
Ln(Green) (neighborhood greenspace) suggests that a 1% increase in greenspace area
can increase property values by 1.18%. This effect is very likely greater than in
reality, so we suspect that the omitted neighborhood variables have introduced
spatial effects, which usually inflate coefficients.
Table 4.10 Estimated coefficients of the benchmark model (Model B1)
Variable Coefficient t-statistic p-value
Intercept 20.427 4.024 0.000
Bathroom 0.081 4.788 0.000
Bedroom -0.075 -5.788 0.000
Ln(LotSize) -0.073 -2.878 0.004
Ln(Structure) 0.818 20.595 0.000
Age 0.006 3.440 0.001
Age2 -3.017e-05 -1.905 0.057
Ln(Green) 1.177 3.432 0.001
Ln(Pkdist) -16.471 -2.510 0.012
Ln(RpDist) 16.708 2.071 0.038
Ln(Pkarea) 0.005 0.772 0.440
4.5.2.2 Models B3 and B5
Models B3 and B5 contain similar expansion variables, and their estimation results
are reported in the same table for easier reading (see Table 4.11 for details).
Model B3 expands the benchmark model by adding interactive variables
between the base variables and the quadratic form of the x,y coordinates. We report
only this model in the category of spatial expansion models because, as in Approach
A, it is the best-performing spatial expansion model. Similarly, Model B5 is reported
as the best-performing spatial error model. It includes all the quadratic expansion
variables and adopts the NB4000m spatial neighbor definition.
Due to the large number of quadratic expansion variables, we include only
the significant ones in Table 4.11. The results indicate that most coefficients, except
those for Age2 (the square of property age) in both Models B3 and B5, vary
significantly across the study area, in both the east-west and north-south directions.
The coefficients of Model B5 are generally lower than the corresponding ones in
Model B3, as a result of removing spatial dependence. The counter-intuitive signs of
some variables in Model B1 (e.g. Bedroom, Ln(LotSize)) are reversed in both Models
B3 and B5. Finally, the coefficients of the base variables in both models are
generally greater than those in the corresponding models of Approach A, which is not
surprising considering the omitted neighborhood variables.
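The quadratic expansion behind Models B3 and B5 can be illustrated with a short Python sketch. It assumes that the intercept and each base variable are interacted with x, y, x², y² and xy, which matches the kinds of terms listed in Table 4.11 (e.g. Bedroom_XY, Ln(Green).Y); the function names, the ".X2"-style column labels and the synthetic data are hypothetical. The sketch fits by plain OLS, so it corresponds to Model B3 rather than to the spatial error version (Model B5).

```python
import numpy as np


def quadratic_expansion(X, x, y, names):
    """Interact an intercept and each base variable with x, y, x^2, y^2 and x*y.

    X     : (n, k) array of base variables (no intercept column)
    x, y  : (n,) coordinates of each property
    names : k base-variable names
    Returns the expanded (n, 6 * (k + 1)) design matrix and its column names.
    """
    n = X.shape[0]
    terms = {"X": x, "Y": y, "X2": x ** 2, "Y2": y ** 2, "XY": x * y}
    base = np.column_stack([np.ones(n), X])
    base_names = ["Intercept"] + list(names)
    cols, col_names = [], []
    for j, bname in enumerate(base_names):
        cols.append(base[:, j])              # global part of the coefficient
        col_names.append(bname)
        for suffix, t in terms.items():      # location-specific parts
            cols.append(base[:, j] * t)
            col_names.append(f"{bname}.{suffix}")
    return np.column_stack(cols), col_names


def local_coefficient(coefs, names, var, x, y):
    """Evaluate the expanded coefficient of `var` at coordinates (x, y)."""
    c = dict(zip(names, coefs))
    return (c[var] + c[var + ".X"] * x + c[var + ".Y"] * y
            + c[var + ".X2"] * x ** 2 + c[var + ".Y2"] * y ** 2
            + c[var + ".XY"] * x * y)


# Usage with synthetic, noise-free data (all values hypothetical)
rng = np.random.default_rng(0)
n = 500
x, y = rng.uniform(0, 20, n), rng.uniform(0, 20, n)   # coordinates in km
green = rng.normal(size=n)                             # stand-in for Ln(Green)
beta_green = 0.8 + 0.01 * x - 0.005 * y                # coefficient drifting over space
log_price = 12.0 + beta_green * green

Z, names = quadratic_expansion(green[:, None], x, y, ["Ln(Green)"])
coefs, *_ = np.linalg.lstsq(Z, log_price, rcond=None)
local = local_coefficient(coefs, names, "Ln(Green)", x, y)
```

Only the significant expansion terms are reported in Table 4.11; a sketch like this builds all of them, and the resulting local coefficient can be mapped in the same way as GWR estimates.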
Table 4.11 Estimated coefficients of the spatial expansion model (Model B3) and the spatial error model with quadratic expansion (Model B5)
Variable Model B3 Model B5
Intercept 5.925 2.832
Intercept.X² 4.530e-07 3.478e-07
Bathroom 0.065 0.063
Bathroom_X -6.734e-06 -6.214e-06
Bathroom_X² 2.705e-09 1.974e-09
Bedroom 0.050 0.045
Bedroom_X -4.912e-06 -4.079e-06
Bedroom_XY 3.710e-09 2.989e-09
Ln(LotSize) 0.005 0.003
Ln(LotSize)_X -6.356e-08 -5.996e-08
Ln(LotSize)_Y -4.515e-08 -2.351e-08
Ln(LotSize)_XY 2.139e-10 2.027e-10
Ln(Structure) 0.735 0.716
Ln(Structure).X -5.695e-05
Age 0.005 0.004
Age.Y 3.648e-06 3.515e-06
Age.XY -1.978e-9 -1.684e-9
Age2 2.150e-05 2.014e-05
Ln(Green) 0.845 0.836
Ln(Green).X -7.263e-05 -6.157e-05
Ln(Green).Y -6.493e-05 -6.158e-05
Ln(Green).X² 7.913e-09
Ln(Green).Y² 4.576e-09 2.642e-09
Ln(RPdist) -0.075 -0.069
Ln(RPdist).X -2.352e-04 -0.007
Ln(RPdist).Y -1.486e-04 -0.004
Ln(RPdist).Y² 7.143e-09
Ln(RPdist).XY 5.337e-09
Ln(PKdist) -7.358 -5.093
Ln(PKdist).X -4.693e-04 -3.566e-05
Ln(PKdist).Y
Ln(PKdist).X² 7.371e-08 3.542e-08
Ln(PKdist).Y² -5.127e-08 -2.679e-08
Ln(PKdist).XY 7.856e-08 3.008e-08
Ln(PKarea) 0.005 0.004
Ln(PKarea).X -3.749e-05 -4.242e-05
Ln(PKarea).Y -2.267e-05
Ln(PKarea).X² 8.772e-09 5.720e-09
Lambda 0.613 (0.000)
4.5.2.3 Model B6: GWR
Table 4.12 summarizes the distribution (minimum, quartiles and maximum) of the GWR coefficient estimates. Figure
4.11 maps the spatial variation of the Ln(Green) coefficients and reveals several
pockets of high values around South Pasadena and in the San Fernando Valley.
Downtown areas of the City of Los Angeles returned low or even negative values,
and low values also occur to the east and south of the central city area.
Table 4.12 Estimated coefficients of GWR (Model B6)
Variable Minimum Lower quartile Median Upper quartile Maximum
Intercept -119.079 -9.143 22.763 74.065 208.821
Bathroom -0.047 0.011 0.036 0.061 0.139
Bedroom -0.124 -0.029 -0.005 0.033 0.072
Ln(LotSize) -0.344 -0.006 0.083 0.178 0.371
Ln(Structure) 0.183 0.413 0.528 0.657 1.141
Age -0.024 -0.010 -0.005 0.004 0.026
Age2 -0.0002 -0.00002 0.00006 0.0001 0.0003
Ln(Green) -7.212 -1.067 1.258 4.281 12.096
Ln(PKdist) -268.416 -87.117 -17.274 21.392 169.093
Ln(RPdist) -229.834 -27.635 16.609 114.315 369.791
Ln(PKarea) -0.0878 -0.009 0.010 0.034 0.147
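The local fitting behind Model B6 can be illustrated with a minimal Python sketch of basic GWR. It assumes a fixed Gaussian kernel with a user-supplied bandwidth; the kernel and bandwidth used in the dissertation are not stated in this section, and bandwidth selection (for example by cross-validation or AICc) is not shown. The helper `summarize` produces the kind of five-number summary reported in Table 4.12. Function names and data are hypothetical.

```python
import numpy as np


def gwr_coefficients(X, yv, coords, bandwidth):
    """Local weighted least squares fit at every observation (basic GWR).

    X         : (n, k) design matrix including an intercept column
    yv        : (n,) dependent variable, e.g. log sale price
    coords    : (n, 2) projected coordinates
    bandwidth : fixed Gaussian kernel bandwidth, in the units of coords
    Returns an (n, k) array with one row of local coefficients per observation.
    """
    n, k = X.shape
    betas = np.empty((n, k))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / bandwidth) ** 2)    # nearby sales get more weight
        Xw = X * w[:, None]
        # Solve the local normal equations (X'WX) b = X'Wy
        betas[i] = np.linalg.solve(X.T @ Xw, Xw.T @ yv)
    return betas


def summarize(betas, names):
    """Minimum, lower quartile, median, upper quartile and maximum per coefficient."""
    q = np.percentile(betas, [0, 25, 50, 75, 100], axis=0)
    return {name: q[:, j] for j, name in enumerate(names)}


rng = np.random.default_rng(1)
n = 200
coords = rng.uniform(0, 10_000, size=(n, 2))       # hypothetical metres
green = rng.normal(size=n)
X = np.column_stack([np.ones(n), green])
yv = 2.0 + 0.5 * green                              # stationary, noise-free relation
betas = gwr_coefficients(X, yv, coords, bandwidth=2_000.0)
table = summarize(betas, ["Intercept", "Ln(Green)"])
```

With a stationary relation, as in this synthetic example, every local fit recovers the same coefficients, which is a useful sanity check; on real sales data the local coefficients spread out as in Table 4.12.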
4.5.2.4 Model B7: Spatial filtering model
Given the estimation results of the spatial filtering models in Approach A, we chose
the best-performing one, the spatial filtering model in the GAM framework, as our
specification for Approach B. The natural cubic spline (df = 200) once again
outperformed locally weighted regression. The resulting coefficients are reported in
Table 4.13. All coefficients have the expected signs, and most of them are
statistically significant. Their values are generally slightly lower than those of the
benchmark model, because the filtering process has removed some of the inflation caused
by spatial effects.
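The spatial filtering idea can be sketched in Python under simplifying assumptions: the coordinate surface is approximated by additive natural cubic spline terms s(x) + s(y) with knots at quantiles, built from the truncated-power basis and fitted jointly with the explanatory variables by OLS. The dissertation's GAM used a natural cubic spline with df = 200, but how those degrees of freedom were allocated to the surface is not stated here, so this illustrates the technique rather than reproducing Model B7. In the synthetic example the greenspace variable is correlated with a smooth spatial trend, so the naive OLS slope is inflated while the filtered slope stays close to the true value of 0.8.

```python
import numpy as np


def ns_basis(v, df):
    """Natural cubic spline basis with df columns (intercept excluded).

    Truncated-power construction with K = df + 1 quantile knots t_1 < ... < t_K:
    columns are u and d_k(u) - d_{K-1}(u) for k = 1, ..., K - 2, where
    d_k(u) = ((u - t_k)_+^3 - (u - t_K)_+^3) / (t_K - t_k).
    """
    u = (v - v.min()) / (v.max() - v.min())    # rescale to [0, 1] for stability
    t = np.quantile(u, np.linspace(0, 1, df + 1))
    K = len(t)

    def d(k):
        return (np.clip(u - t[k], 0, None) ** 3
                - np.clip(u - t[-1], 0, None) ** 3) / (t[-1] - t[k])

    cols = [u] + [d(k) - d(K - 2) for k in range(K - 2)]
    return np.column_stack(cols)


def spatial_filter_fit(X, yv, x, y, df):
    """OLS of yv on X plus additive spline surfaces s(x) + s(y).

    The spline columns absorb the smooth spatial trend, so the returned
    coefficients on X are estimated net of that trend.
    """
    n = len(yv)
    design = np.column_stack([np.ones(n), X, ns_basis(x, df), ns_basis(y, df)])
    coefs, *_ = np.linalg.lstsq(design, yv, rcond=None)
    return coefs[1:1 + X.shape[1]]


rng = np.random.default_rng(2)
n = 2000
x, y = rng.uniform(0, 40_000, n), rng.uniform(0, 40_000, n)   # hypothetical metres
trend = np.sin(2 * np.pi * x / 40_000) + np.cos(2 * np.pi * y / 40_000)
green = 0.5 * trend + rng.normal(size=n)      # greenspace partly follows the trend
log_price = 1.0 + 0.8 * green + trend

b_green = spatial_filter_fit(green[:, None], log_price, x, y, df=10)
b_naive = np.linalg.lstsq(np.column_stack([np.ones(n), green]),
                          log_price, rcond=None)[0][1]
```

The gap between `b_naive` and `b_green` mirrors the inflation attributed to spatial effects in the text; with real data the choice of df trades off residual spatial structure against overfitting.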
Figure 4.11 Estimated coefficients of Ln(Green) from Model B6, GWR
Table 4.13 Estimated coefficients of Model B7 fitted with natural cubic spline*
Variables Value Std. Error t-value
Intercept 23.736 3.172 7.578
Bathroom 0.057 0.025 2.214
Bedroom 0.008 0.002 3.187
Ln(LotSize) 0.015 0.004 2.925
Ln(Structure) 0.795 0.072 11.451
Age 0.005 0.000 5.648
Age2 -2.792e-05 3.765e-06 -6.159
Ln(Green) 0.817 0.380 2.458
Ln(PKdist) -10.759 0.473 -2.713
Ln(RPdist) 7.539 3.465 1.984
Ln(PKarea) 0.010 0.017 0.596**
* df = 200, **coefficient not significant at conventional levels
4.5.3 Auxiliary regressions of Approach B
Having confirmed that GWR was the best-performing spatial model and that its
coefficients varied across the study area, we proceeded to the second
step of Approach B: auxiliary regressions of the estimated local
coefficients of Ln(Green) (the dependent variable) on
neighborhood/demographic variables (the independent variables). The goal of the
auxiliary regression is to discover potential associations between the variation in
neighborhood greenspace impact and the characteristics of neighborhoods.
We explored the following neighborhood characteristics: median income,
population percentage of main ethno-racial groups, household composition, and
population density. We then identified the following variables to be included in the
regression: median household income (Ln(Income)), percentage of White, Asian,
Black, and Latino population (BC(White), BC(Asian), BC(Black), BC(Latino),
respectively), percentage of households with children (HHchild), percentage of
households with children under six years (HHchild6), and population density
(Ln(Popden)). Most of these variables were transformed to approximate normal
distributions. A natural log transformation (Ln) was sufficient for income and
population density, but a Box-Cox transformation (BC) was needed for all the
ethno-racial groups, because voluntary segregation, a distinctive characteristic of Los
Angeles’ cultural landscape, produced heavily skewed distributions.
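The two transformations described above can be sketched with SciPy, whose `scipy.stats.boxcox` estimates the Box-Cox parameter λ by maximum likelihood and returns the transformed data together with λ. The tract data below are synthetic, and the +1 offset for zero shares is an assumption of this sketch, since the handling of zeros is not stated here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n = 1000
# Hypothetical tract shares (%) of one ethno-racial group: most tracts have
# a small share and a few have a very large one, giving a long right tail.
share = np.clip(rng.lognormal(mean=1.5, sigma=1.0, size=n), 0, 100)
# Hypothetical median household incomes in dollars
income = rng.lognormal(mean=11.0, sigma=0.5, size=n)

ln_income = np.log(income)                  # natural log is enough for income
# Box-Cox requires strictly positive data; the +1 offset guards against
# zero shares and is an assumption of this sketch.
bc_share, lam = stats.boxcox(share + 1.0)   # lambda chosen by maximum likelihood

print(f"Box-Cox lambda: {lam:.3f}")
print(f"skewness before: {stats.skew(share):.2f}, after: {stats.skew(bc_share):.2f}")
```

Because λ is estimated separately for each variable, each ethno-racial share would receive its own λ, which is one reason BC-transformed coefficients are harder to read as simple percentage effects than log-transformed ones.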
OLS estimation of the regression returned a very low goodness-of-fit: only
about 9.5% of the variation in neighborhood greenspace impact is explained by
these variables. Even so, as Table 4.14 shows, several variables (e.g. BC(White),
BC(Black), HHchild6) returned significant coefficients. We suspected the
presence of significant spatial structure in the neighborhood variables, which would
have compromised the OLS estimates. Moran’s I statistics for the regression
residuals confirmed the presence of spatial autocorrelation: the value is very close to
1 when the three nearest neighbors define the spatial weighting matrix. The spatial
autocorrelation is likely a result of the stratified sampling strategy, which was
designed to collect samples from different kinds of neighborhoods, a strategy that
inevitably leads to clusters of observations within the same
neighborhood.
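The Moran's I check on the auxiliary-regression residuals can be sketched in Python. This minimal version assumes that Nearest3NB means row-standardized weights over each observation's three nearest neighbors; it computes I = (n / S0)(e′We)/(e′e) for centered values e, but omits the permutation or normal-approximation p-value that packages such as R's spdep report. Data and function names are hypothetical.

```python
import numpy as np


def knn_weights(coords, k=3):
    """Row-standardized k-nearest-neighbor weights (Nearest3NB when k = 3)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # an observation is not its own neighbor
    W = np.zeros_like(d)
    nearest = np.argsort(d, axis=1)[:, :k]
    W[np.arange(len(coords))[:, None], nearest] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def morans_i(z, W):
    """Moran's I = (n / S0) * (e' W e) / (e' e), with e the centered values."""
    e = z - z.mean()
    n, s0 = len(z), W.sum()
    return (n / s0) * (e @ W @ e) / (e @ e)


rng = np.random.default_rng(4)
coords = rng.uniform(0, 20_000, size=(400, 2))            # hypothetical metres
W = knn_weights(coords, k=3)
clustered = np.sin(coords[:, 0] / 2_000) + np.cos(coords[:, 1] / 2_000)
noise = rng.normal(size=400)
print(f"Moran's I, clustered residuals: {morans_i(clustered, W):.3f}")
print(f"Moran's I, random residuals:    {morans_i(noise, W):.3f}")
```

Residuals that vary smoothly over space give a value near 1, as reported for the OLS auxiliary regression, whereas spatially independent residuals give a value near −1/(n − 1).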
So we proceeded to apply spatial models to the auxiliary regression. We
tested both the spatial error model and GWR, because they have consistently reduced
Table 4.14 Estimated (OLS) coefficients of the auxiliary regression
Variable Coefficient t-statistic p-value
Intercept 7.704 1.220 0.223
Ln(Income) -0.435 -1.425 0.154
BC(White) 0.137 5.742 0.000
BC(Asian) -0.063 -0.443 0.658
BC(Black) -2.600 -4.058 0.000
BC(Latino) -0.467 -0.143 0.886
HHchild -0.011 -1.277 0.202
HHchild6 0.101 6.952 0.000
Ln(Popden) -0.114 -0.884 0.377
Adjusted R² 0.095
AIC value 9995.6
Moran’s I (p-value) 0.992 (0.000) with Nearest3NB; 0.890 (0.000) with NB4000m
spatial autocorrelation in residuals better than other models. In addition, we chose
the spatial error model instead of the spatial lag model, because the dependent variable
(the Ln(Green) coefficient) is free from any spatial structure by definition, so a spatial lag
term would be inappropriate. The spatial error model turned out to fit better than
GWR: their AIC values are 1,494.9 and 3,082.4, respectively (a lower AIC indicates a
better fit), and the corresponding adjusted R² values are 0.989 and 0.957, both much
higher than the OLS result. Turning next to the coefficients, most of them are
significant at the 10% level (Table 4.15). For example, Ln(Income) and HHchild both positively
Table 4.15 Estimated coefficients for the spatial error model combined with
quadratic expansion, applied to the auxiliary regression variables
Variable Coefficients p-value
Intercept -0.721 0.625
Ln(Income) 0.103 0.050
BC(White) 0.027 0.001
BC(Asian) 0.084 0.097
BC(Black) -0.294 0.037
BC(Latino) 0.606 0.546
HHchild 0.004 0.001
HHchild6 0.001 0.619
Ln(Popden) 0.035 0.144
correlate with neighborhood greenspace impact. More specifically, the value of
neighborhood greenspace would increase by 0.1% for every 1% increase in median
household income. Similarly, the value of neighborhood greenspace would increase
by 0.004% for every 1% increase in the percentage of households with children. The
ethno-racial composition of neighborhoods also appears to influence the impact of
neighborhood greenspace on house prices: an increase of 1% in the White and
Asian populations would increase the value of neighborhood greenspace by 0.03%
and 0.08%, respectively, whereas an increase of 1% in the Black population reduces the
value of neighborhood greenspace by 0.29%.
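The spatial error specification used for the auxiliary regression can be sketched in NumPy as maximum likelihood estimation by grid search over λ. The sketch assumes row-standardized k-nearest-neighbor weights and synthetic data, and it omits the quadratic expansion terms of Table 4.15; the software used in the dissertation is not named in this section, and established implementations such as `errorsarlm` in R's spatialreg package estimate the same model more efficiently. The AIC counts the k coefficients, λ and σ².

```python
import numpy as np


def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor spatial weights."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    W = np.zeros_like(d)
    W[np.arange(len(coords))[:, None], np.argsort(d, axis=1)[:, :k]] = 1.0
    return W / W.sum(axis=1, keepdims=True)


def fit_spatial_error(X, y, W, grid=None):
    """Maximum likelihood for y = X b + u, u = lambda W u + e, by grid search.

    For each candidate lambda the data are filtered with A = I - lambda W,
    b and sigma^2 come from OLS on the filtered data, and the log-likelihood
        -n/2 * (log(2 pi sigma^2) + 1) + log|det A|
    is evaluated. Returns (lambda, b, log-likelihood, AIC).
    """
    if grid is None:
        grid = np.linspace(-0.95, 0.95, 191)
    n, k = X.shape
    best = None
    for lam in grid:
        A = np.eye(n) - lam * W
        ys, Xs = A @ y, A @ X
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        e = ys - Xs @ b
        sigma2 = e @ e / n
        _, logdet = np.linalg.slogdet(A)             # log|det A|, the Jacobian term
        ll = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1) + logdet
        if best is None or ll > best[2]:
            best = (lam, b, ll)
    lam, b, ll = best
    aic = 2 * (k + 2) - 2 * ll                       # k coefficients + lambda + sigma^2
    return lam, b, ll, aic


rng = np.random.default_rng(8)
n = 300
coords = rng.uniform(0, 10_000, size=(n, 2))          # hypothetical metres
W = knn_weights(coords, k=5)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.solve(np.eye(n) - 0.6 * W, rng.normal(size=n))   # true lambda = 0.6
y = X @ np.array([1.0, 2.0]) + u

lam_hat, b_hat, ll, aic_sem = fit_spatial_error(X, y, W)
_, _, ll_ols, _ = fit_spatial_error(X, y, W, grid=[0.0])        # lambda = 0 is OLS
aic_ols = 2 * (X.shape[1] + 1) - 2 * ll_ols
```

Comparing `aic_sem` with `aic_ols` is the same kind of comparison the text makes between the spatial error model, GWR and OLS.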
4.6 Discussion and Conclusions
This chapter adds to the understanding of how spatial models can enhance
standard hedonic models that value neighborhood greenspace. In particular, we
followed the framework of Chapter 3 and extended the study to a larger study area
with more complex spatial structures, which offers deeper insight into the
impacts of spatial effects on the valuation of neighborhood greenspace. We applied
four categories of spatial models using two approaches in order to obtain both
correct estimates of the neighborhood greenspace effect and an understanding of the underlying spatial
process(es). The spatial expansion model, spatial regressions targeting spatial
dependence, GWR, and the spatial filtering model were tried. The first approach (A)
included all applicable variables for all models, whereas the second approach (B)
omitted neighborhood variables in all models in the first model runs and then used
the estimated coefficient of neighborhood greenspace from these model runs as the
dependent variable and the omitted neighborhood variables as independent variables
in an auxiliary regression. In this way, Approach A focuses on obtaining the correct
estimation for valuing neighborhood greenspace impact, whereas Approach B aims
to also discover the underlying causes for the variation of such impact.
4.6.1 Significant neighborhood greenspace impact
Including the benchmark models, we reported a total of seven models in Approach A and
five models in Approach B. The variable for neighborhood greenspace (Ln(Green))
is significant in all models, including those that successfully removed the spatial
effects. The global estimate of the coefficient, free from spatial effects, ranges from
a low of 0.017 (Model A4, spatial lag model using NB4000m to define spatial
neighbors) to a high of 0.817 (Model B7, spatial filtering model using a natural cubic
spline as the fitting strategy). In other words, every 1% increase in neighborhood
greenspace area raises property values by 0.02% to 0.82%. For the
median-priced property, this adds $125 to $5,125. With local
estimation methods such as spatial expansion models and GWR, this effect can become
negative or rise to as much as $75,625 (12.10% in Model B6, GWR).
The coefficient estimates from Approach B are substantially
greater than those from Approach A. Omitting neighborhood variables seems to
inflate the coefficients, even after spatial effects are removed: the effects of
neighborhood characteristics appear to have been transferred to other variables in the
models. We are therefore inclined to rely on Approach A for the
coefficient values of Ln(Green), and to restrict the use of Step Two of
Approach B, the auxiliary regression, to discovering potential reasons for the
variation of Ln(Green) coefficients across the Los Angeles metropolitan region.
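The dollar figures in this section follow from treating the Ln(Green) coefficient as an elasticity. A short Python check, assuming a median sale price of $625,000 (not stated in this section, but the only value consistent with all three quoted figures), reproduces them and contrasts the linear approximation with the exact effect implied by a log-log model. The function names are hypothetical.

```python
import math

# Median sale price implied by the dollar figures in the text (an inference,
# not a number reported in this section): 0.02% of $625,000 is $125.
MEDIAN_PRICE = 625_000


def approx_gain(elasticity, price, pct=1.0):
    """Linear approximation: a pct % rise in greenspace raises price by elasticity * pct %."""
    return price * elasticity * pct / 100


def exact_gain(elasticity, price, pct=1.0):
    """Exact log-log effect: price * ((1 + pct / 100) ** elasticity - 1)."""
    return price * ((1 + pct / 100) ** elasticity - 1)


for label, b in [("global low", 0.02), ("global high", 0.82), ("GWR maximum", 12.10)]:
    print(f"{label:12s} approx ${approx_gain(b, MEDIAN_PRICE):>9,.0f}"
          f"   exact ${exact_gain(b, MEDIAN_PRICE):>9,.0f}")
```

For small elasticities the two calculations agree closely; for the GWR maximum of 12.10 the exact effect is roughly $80,000 rather than $75,625, so the linear rule somewhat understates large local effects.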
4.6.2 Spatial modeling of significant spatial structures in Approach A
Our analyses provide strong evidence that complex spatial structures exist in the
study area: spatial heterogeneity and spatial dependence both exist, and their effects
are intertwined. GWR is the best-performing spatial model in controlling these
spatial effects. We briefly review the results of the models in the following
paragraphs, with a focus on spatial structures. In addition to the results of the
Moran’s I and LM tests, we also refer to the values of coefficients, and sometimes
their spatial patterns, as indications of spatial structures.
In the benchmark model A1, some variables returned coefficients with
counter-intuitive signs: the positive effect of Age (0.004) could be due to people’s
appreciation of vintage architecture styles, but it is hard to believe that such styles
are prevalent in all of the real estate submarkets of Los Angeles County. The positive
coefficients generated for Ln(Household) and Ln(PopDen) suggest higher property
values exist in areas with larger numbers of households and higher population
density. Although this scenario is possible for dense coastal submarkets, we do not
think it is likely for the sprawling suburbs. So these counterintuitive signs could be a
result of embedded spatial structure in the data, which obscured the true effect of
these variables. In addition, coefficients from this model are generally inflated due to
spatial effects, so it is important to apply spatial analysis techniques to improve these
estimates.
The spatial expansion models returned significant coefficients for the interactive
and coordinate expansion variables, confirming the existence of significant spatial
heterogeneity. The coordinate expansion variables improved model fit over the
base model considerably, especially the quadratic expansion. By allowing
parameters to vary across the region, these models captured spatial heterogeneity and removed
some of the inefficiency of the stationary base model. At the same time, they lowered
the Moran’s I coefficients for the residuals, because modeling spatial heterogeneity
is equivalent to adding neighborhood variables, which can act as proxies for some
omitted variables that contribute to the clustering of residual values. However, the
spatial expansion regression residuals still exhibit significant spatial autocorrelation,
and the spatial patterns of the coefficients cannot be fully explained. In Figure 4.6, for
example, while the patterns of Ln(Green) and Ln(PKarea) might be explained by the
lack of open space in central Los Angeles, the pattern for Ln(PKdist) is
counter-intuitive: its higher values in central Los Angeles mean that proximity to
parks lowers property values in that area. In addition, the spatial pattern of the
Intercept seems to be exactly the opposite of those of the other three variables. This suggests
that, even after accounting for spatial variation in all 13 variables, sale prices still appear to be
lower in the central area than in the surrounding areas, which points to the
existence of some uncovered spatial interactions. To further enhance model
performance, it was therefore necessary to consider removing spatial dependence (Models A4
and A5) and exploring spatial heterogeneity in more detail (Model A6).
Models A4 and A5 adopted the spatial lag specification, because LM tests
suggested that the spatial dependence was caused by some structural factor such as
omitted variables. A spatially lagged dependent variable was used as a proxy for
these omitted variables. As expected, the spatial lag term improved the fit of
both the benchmark and the spatial expansion models. The combination of the
quadratic spatial expansion model and the spatial lag term performed best,
demonstrating the need to explicitly address both spatial heterogeneity and spatial
dependence. However, Model A5 (with quadratic expansion variables) still generates
questionable patterns of coefficient variation for some key variables (Figure 4.8).
While it is easy to understand that neighborhood greenspace and parks are valued
more in central urban areas, the current variations do not differentiate the behaviors
of local markets along the north-south line crossing the central urban area. For
example, a northern neighborhood (e.g. La Canada Flintridge) might be expected to
value parks differently from a South Central neighborhood (e.g. Compton). In
addition, the spatial pattern of PKdist indicates that proximity to the nearest
park is less appreciated in the central areas, which seems counter-intuitive. This
result may mean that there are omitted variables, such as crime rates, that are
positively associated with proximity to parks, or that there are still
undiscovered spatial structures.
In addition, comparing the LM tests for different versions of Model A5 (i.e.
different combinations of the expansion mode and the spatial neighbor definition)
reveals some subtleties when spatial heterogeneity and spatial dependence effects
coexist in a model. The quadratic expansion with the Nearest3NB definition reduced
spatial autocorrelation most effectively, whereas the linear expansion with the
NB4000m definition still left significant spatial autocorrelation in the residuals. It is
possible that NB4000m defines too large an area for spatial dependence effects in
some locations, which conflicts with the delineation of spatial heterogeneity in the
linear expansion model. Such conflict was minimized when Nearest3NB, a much
smaller neighborhood area, was combined with the quadratic expansion, because this
combination accommodated a much more finely tuned pattern of spatial
heterogeneity. It usually takes a great deal of trial and error to
find a suitable specification for intertwined spatial heterogeneity and spatial
dependence. However, this challenging task may be avoided altogether by using
local spatial statistical techniques (e.g. GWR), which do not rely on rigid, pre-defined
neighborhood boundaries and can thus discover more varied market segmentation
with less user input.
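The contrast between the two neighbor definitions can be made concrete with a short Python sketch. It assumes, as the names suggest, that Nearest3NB links each sale to its three nearest sales and NB4000m links all sales within 4,000 m; the exact definitions are given elsewhere in the dissertation. On a synthetic sample that mixes a compact cluster with scattered sales, the first rule always yields three neighbors, while the distance band yields dozens inside the cluster and possibly none at the periphery.

```python
import numpy as np


def neighbor_matrices(coords, k=3, band=4_000.0):
    """Binary neighbor matrices for a k-nearest rule and a fixed distance band."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                  # no self-neighbors
    knn = np.zeros(d.shape, dtype=bool)
    nearest = np.argsort(d, axis=1)[:, :k]
    knn[np.arange(len(coords))[:, None], nearest] = True
    band_nb = d <= band                          # every sale within `band` metres
    return knn, band_nb


rng = np.random.default_rng(5)
dense = rng.uniform(0, 2_000, size=(100, 2))     # a compact urban cluster
sparse = rng.uniform(0, 60_000, size=(50, 2))    # scattered suburban sales
coords = np.vstack([dense, sparse])
knn, band_nb = neighbor_matrices(coords)
knn_counts, band_counts = knn.sum(axis=1), band_nb.sum(axis=1)
print("Nearest3NB neighbors per sale:", np.unique(knn_counts))
print("NB4000m neighbors, dense cluster (min):", band_counts[:100].min())
print("NB4000m sales with no neighbors:", int((band_counts == 0).sum()))
```

This uneven neighborhood size under a distance band is one plausible reason why NB4000m and Nearest3NB interact differently with a given expansion model.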
Not surprisingly, GWR returned the best fit among all the
models. GWR also has the advantage of mapping the spatial distribution
of each coefficient, which provides valuable guidance to the fundamental causes of the
underlying patterns. For example, a 1% increase in neighborhood greenspace
(Ln(Green)) can reduce property values by up to 0.128% in the Malibu coastal
area, or increase property values by up to 0.195% in the central part of the City of
Los Angeles. However, not all of the resulting spatial patterns in our GWR estimates
can be explained, especially where the coefficient values appear
counter-intuitive. For example, the coefficients of PKdist are highest in the central
urban area, meaning that property values there increase most when moving farther away
from the nearest park. Given the high population density in that area, we
expected residents to be more appreciative of proximity to parks, so we suspect some
confounding factors (e.g. crime in parks) in this local area.
Another example is the Ln(Green) coefficients, which show high values in the
northeast (North San Gabriel Valley) and the northwest (San Fernando
Valley). But these are predominantly suburban areas where neighborhood
greenspace is not scarce. This suggests some omitted variables that are hard to
measure but whose impacts are nonetheless captured by the GWR local statistics: perhaps
the landscaping quality and perceived safety of these greenspace areas are higher than
in the surrounding areas. These observations suggest that our GWR estimates include
both true variations in the marginal prices of the attributes, due to localized supply
and demand dynamics, and the effects of potential omitted variables and statistical
misspecification. Overall, the GWR estimates reflect complex, localized spatial
patterns, which make it difficult to draw conclusions about the general spatial trend.
The varying spatial patterns for each parameter do not always point clearly
to their causes, but they still provide important clues for further exploration. This
is reasonable for a complex real estate market like Los Angeles, where local
submarkets often display different behaviors.
Lastly, we used spatial filtering models as another local fitting technique to
cross-check GWR’s estimation power. The spatial filtering
models are designed to remove all spatial variation from the original data before
they enter the regression. The model using a locally fitted x,y coordinate surface
(focusing on the spatial heterogeneity effect) performed much better than the one
using the Getis filter (focusing on spatial dependence effects). This could indicate
that spatial heterogeneity is much stronger than spatial dependence in our
data, or that the Getis filter is fundamentally incompatible with the structure
of the spatial dependence. Further investigation is needed to discover the reason, but
in terms of model performance, spatial filtering models appear to be
relatively rough in correcting spatial effects: they can reduce spatial autocorrelation
to insignificance, but not as much as GWR, and they cannot achieve GWR’s
goodness of fit. This is understandable because spatial filtering models still generate a
single global estimate for each coefficient, which cannot reflect the different spatial patterns
of multiple variables, so their performance cannot compete with local statistics in a
large study area with many local variations.
4.6.3 The Auxiliary Regression Results of Approach B
We also endeavored to discover the underlying reasons for the variations in the
impact of neighborhood greenspace on house prices. Since GWR performed best among
all the spatial models within Approach B, we used GWR’s estimates for Ln(Green)
as the dependent variable in an auxiliary regression. OLS produced an extremely low
goodness-of-fit and extremely high spatial autocorrelation among the
regression residuals. We therefore tested several spatial models and, interestingly,
found that in this case the global spatial error model performed better than GWR.
Considering that the dependent variable Ln(Green) is free of spatial effects, the
strong spatial effects must come from the independent variables, that is, the neighborhood
variables, which exhibit both spatial heterogeneity and spatial dependence due to
the stratified sample design used to select properties. The fact that a spatial error
model performs better than GWR indicates that spatial dependence is the dominant
spatial effect, and that spatial heterogeneity has been mostly captured by the
neighborhood variables themselves. Among the estimated coefficients, it is
interesting that the greenspace premium rises most with the Asian-American share of a
neighborhood (0.084), followed by the White share (0.027), and falls as the Black
share rises (-0.294). These results suggest the need for further exploration
of the impact of cultural preferences and other factors on how greenspace is
perceived and how the impact of neighborhood greenspace on house prices differs.
4.6.4 Final Thoughts
More empirical investigation is needed in future research to further explore the
mechanism of spatial models in the context of assessing neighborhood greenspace. It
is a challenging task to uncover the exact nature of the spatial interactions in a
complex real estate market like Los Angeles County, where a great number of local
submarkets exist but limited information on their individual behaviors is available.
Therefore, choosing the best spatial model for any empirical research has largely
been based on trial and error; it is as much an art as a science. More empirical
studies in the future would help build a body of literature from which some general
rules can be drawn. In addition, future research could explore locally
weighted regressions (LWR) using housing attributes and/or neighborhood
greenspace attributes as weights; GWR is a special case of LWR.
Chapter 5: Performance of Four Spatial Models across Two Different Geographic Scales
5.1 Chapter 5 Introduction
Over the last decade, an increasing number of hedonic analysis applications have
incorporated spatial effects through various spatial regression techniques, such as
spatial econometric regression models (i.e. spatial lag and error models), the spatial
expansion method, the spatial filtering method, moving window methods, and
geographically weighted regression (GWR). But such techniques have not been widely
adopted in hedonic applications seeking to estimate the impact of open space features.
Only a few empirical studies have employed spatial regressions (e.g. Anderson and
West, 2006; Des Rosiers et al., 2002; Geoghegan et al., 2003; Irwin, 2002; Kestens
et al., 2004, 2006; Patton and McErlean, 2003).
On a different but related note, given the substantial development of other
hedonic analyses incorporating spatial considerations (e.g. Beron et al., 2004;
Cameron, 2006; Day et al., 2007; Kim et al., 2003; Redfearn, 2009; Theebe, 2004)
and advances in the field of spatial econometrics and spatial statistics (Anselin et al.,
2004; Getis et al., 2004; Haining, 2003), comparative studies that investigate the
performance of different spatial models in a hedonic context are required to better
understand the mechanisms in play. Yet such studies are also rare.
Considering the above two limitations, there is a clear need for studies that
investigate variations in the performance of spatial models in assessing open space
value in a hedonic model. We conduct such an analysis in this chapter, building on
the results from the previous two chapters. Chapters 3 and 4 explored the
impact of neighborhood greenspace on residential property prices, emphasizing how
to incorporate spatial effects. These two studies differ most in their
geographic scales: Chapter 3 targeted a small area of dense urban space in the middle
of the City of Los Angeles, whereas Chapter 4 included most of Los Angeles County
as the study area. Thus the underlying spatial structure of the data can be very
different, which in turn calls for different spatial modeling techniques. In this chapter,
we use these two empirical studies to gain a deeper
understanding of the mechanisms and performance of these models at different scales,
in the hope of discovering some general rules for applying spatial modeling techniques
to open space studies.
In addition, current spatial modeling practice tends to keep the spatial
econometrics and spatial statistics approaches separate. Models from the two fields
are rarely compared, although both can tackle spatial effects. So we also aim to
bridge the understanding of the two fields through this study.
The remainder of this chapter is structured as follows: we first review
existing studies that applied spatial modeling techniques to their empirical data
in a hedonic model. Next, we briefly describe our data and
method, followed by a detailed description of our results. We finish the
chapter with some concluding remarks and suggestions for future work.
5.2 Previous Comparative Studies
Spatial effects, namely spatial heterogeneity and spatial dependence, can result from
the inherent spatial structure of housing market behaviors, and/or the spatial structure
of one or more explanatory variables (e.g. greenspace coverage). More specifically,
housing markets often contain neighborhoods that exhibit heterogeneous behaviors.
At the same time, a property’s value often internalizes characteristics of adjacent
properties, which results in clustering of similar-value properties. Sometimes, if the
spatial structure of an omitted variable (e.g. neighborhood greenspace) correlates
with some included variable(s), its spatial effects are absorbed by the residuals,
which in turn causes spatial autocorrelation. Accordingly, Can (1992) called the two
spatial effects neighborhood effects (corresponding to spatial heterogeneity) and
adjacency effects (corresponding to spatial dependence). Ignoring these spatial
effects can lead to inefficient and/or biased coefficient estimates.
In practice, the two spatial effects are often intertwined (Anselin,
1988b), and it is difficult to fully disentangle the effects of spatial heterogeneity and
spatial dependence (Bailey and Gatrell, 1995). However, many existing spatial
regression studies focus only on spatial dependence, which can lead to a misspecified
model. Applying multiple spatial modeling techniques is an effective way to capture
the active spatial effects, because these techniques offer different views of spatial
heterogeneity and spatial dependence. Comparing their estimation results can point
to the most promising direction for further modeling.
5.2.1 A Brief Review of Some Common Spatial Modeling Techniques
The development of two connected fields, spatial econometrics and spatial statistics,
is the main force pushing advances in spatial modeling techniques. Spatial
econometrics mainly deals with spatial data in regional economic
models. It is thus model-oriented: its spatial model specification is usually guided
by theory, especially socioeconomic theory. So it might be regarded as a subset of
spatial statistics, which is more broadly understood as a field considering the spatial
structure of all spatial data. Spatial statistics is more data-oriented, focusing on the
nature of space and spatial data (Anselin, 1988; Getis et al., 2004).
Whichever field they come from, existing spatial modeling techniques either
model the spatial covariance of the data directly (data-oriented) or start from the
spatial processes (model-oriented); the latter are typically signified by the use of a
spatial weighting matrix. A spatial weighting matrix defines the spatial proximity
between neighboring observations.
The most common approach from spatial econometrics is to add a spatially
weighted autoregressive term to the base model to capture spatial dependence.
Depending on the underlying spatial processes, researchers define the
autoregressive term by multiplying a spatial weighting matrix with either the dependent
variable (spatial lag term) or the error term (spatial error term). The
corresponding models are called spatial lag and spatial error models, respectively. A
combination of the two models is sometimes employed as well.
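In standard notation, with y the vector of (log) sale prices, X the explanatory variables, W an n × n spatial weighting matrix and ε an independent error, the two specifications and their combination can be written as follows; ρ and λ are the spatial lag and spatial error parameters, and the reduced forms show how each spreads a local shock across neighbors.

```latex
\begin{aligned}
\text{Spatial lag:}   \quad & y = \rho W y + X\beta + \varepsilon,
  \qquad y = (I - \rho W)^{-1}(X\beta + \varepsilon),
  && \varepsilon \sim N(0, \sigma^2 I_n) \\
\text{Spatial error:} \quad & y = X\beta + u, \qquad u = \lambda W u + \varepsilon,
  \qquad y = X\beta + (I - \lambda W)^{-1}\varepsilon,
  && \varepsilon \sim N(0, \sigma^2 I_n) \\
\text{Combined:}      \quad & y = \rho W_1 y + X\beta + u, \qquad u = \lambda W_2 u + \varepsilon
\end{aligned}
```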
Spatial statistics covers the same models but names them differently. The
aforementioned models are often referred to as simultaneous autoregressive models
(SARs) as a category. Spatial effects are treated as a statistical nuisance in the error
term, by defining an autoregressive error term and then transforming it into other forms
as needed (Bailey and Gatrell, 1995). An additional category is the
conditional autoregressive model (CAR), which specifies the value at any given
location conditional on the values at neighboring locations (de Smith et al., 2007).
In practice, CARs often generate results similar to those of SARs. A
third category of model is the moving average approach, which estimates the value
of an observation by taking the average of the values in neighboring observations.
The field of spatial statistics offers more spatial modeling techniques that
directly model the spatial structure of the data. The most commonly used are
geostatistics, the spatial expansion method, spatial filtering, and trend surface models.
Geostatistics, when applied in spatial regressions, is actually a special case of
directly modeling the spatial covariance of the error term. Generally, this approach
assumes a functional form for the covariance structure, then estimates its elements
simultaneously with explanatory variable coefficients (Dubin, 1988, 1992, 1998).
Geostatistics replaces the usual functional forms by fitting a semivariogram, which
models spatial variation in the error term as a function of distance (Basu and
Thibodeau, 1998; Gillen et al., 2001).
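The semivariogram that geostatistical hedonic models fit can first be estimated empirically, before a parametric form (for example spherical or exponential) is chosen. A minimal Python sketch of the classical binned estimator, γ(h) = Σ(z_i − z_j)² / (2N(h)) over the N(h) pairs whose separation falls in distance bin h, is shown here on synthetic data; fitting the parametric model and estimating it jointly with the regression coefficients are not shown. Function names and data are hypothetical.

```python
import numpy as np


def empirical_semivariogram(coords, z, bin_edges):
    """Classical estimator: gamma(h) = sum over pairs in bin (z_i - z_j)^2 / (2 N(h))."""
    i, j = np.triu_indices(len(z), k=1)             # every pair counted once
    h = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    centres, gamma = [], []
    for lo, hi in zip(bin_edges[:-1], bin_edges[1:]):
        in_bin = (h >= lo) & (h < hi)
        if in_bin.any():                            # skip empty distance bins
            centres.append(0.5 * (lo + hi))
            gamma.append(sq[in_bin].sum() / (2 * in_bin.sum()))
    return np.array(centres), np.array(gamma)


rng = np.random.default_rng(6)
coords = rng.uniform(0, 20_000, size=(500, 2))      # hypothetical metres
smooth = np.sin(coords[:, 0] / 3_000)               # spatially structured errors
noise = rng.normal(size=500)                        # spatially independent errors
edges = np.arange(0, 8_001, 1_000)
lags, g_smooth = empirical_semivariogram(coords, smooth, edges)
_, g_noise = empirical_semivariogram(coords, noise, edges)
```

Spatially structured errors produce a semivariance that starts low and rises with distance, while independent errors produce a flat semivariogram near their variance; the rising shape is what a fitted covariance function captures.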
The spatial expansion method is designed to capture spatial heterogeneity. It
extends the standard econometric expansion method to the spatial context (Casetti,
1972, 1997). It expands parameters with each observation’s x,y coordinates, so that
each parameter is the sum of a global and a location-specific estimate. The resulting
model thus allows parameters to vary across the study area. Sometimes, the spatial
expansion method takes the spatial process approach and employs interaction
variables that multiply explanatory variables by spatial variables, such as distance
to the CBD. The parameters thus vary with each observation’s distance to the CBD.
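The expansion can be written as β_k(x, y) = b_k0 + b_kx·x + b_ky·y. Each original column X_k is then joined by X_k·x and X_k·y, and the expanded model is fitted by OLS. The sketch assumes a linear expansion in the coordinates. Applications may use higher-order terms or distance to the CBD instead, and the helper names are hypothetical.

```python
import numpy as np


def expansion_design(X, x, y):
    """Linear coordinate expansion: columns [X, X*x, X*y]."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([X, X * x[:, None], X * y[:, None]])


def local_coefficients(b, p, x, y):
    """Location-specific coefficient of each of the p original variables."""
    b0, bx, by = b[:p], b[p:2 * p], b[2 * p:]
    return b0[None, :] + np.outer(x, bx) + np.outer(y, by)


rng = np.random.default_rng(0)
n = 200
x, y = rng.uniform(0, 10, n), rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
# Order: [b00, b10, b0x, b1x, b0y, b1y]
true_b = np.array([1.0, 0.5, 0.1, 0.02, -0.05, 0.03])
Z = expansion_design(X, x, y)
b_hat, *_ = np.linalg.lstsq(Z, Z @ true_b, rcond=None)   # noiseless example
coef = local_coefficients(b_hat, 2, x, y)
```

Because the expanded model is still linear in its parameters, OLS applies unchanged. Testing whether b_kx and b_ky differ from zero is what shows whether a variable's effect drifts across space.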
The spatial filtering method aims to remove any spatial structure of the data
prior to the regression analysis. Two typical approaches in the spatial statistics
literature are the Getis filter and Griffith eigenfunction decomposition (Getis and
Griffith, 2002). These approaches transform spatially autocorrelated variables into a
filtered non-spatial part and a residual spatial part. Another approach, trend surface
modeling, is not regarded as a form of spatial filtering in the spatial statistics
literature, but it essentially filters out the spatial structure of the data in the base
model by adding a fitted surface of x,y coordinates, which captures all spatial effects.
The surface can be fit globally, together with the other explanatory variables, as a
polynomial function, or fit locally via a nonparametric smoothing function (e.g., locally
weighted regression, LOESS) or a parametric smoothing function (e.g., a natural cubic
spline). We refer to the local fitting approach as spatial filtering in the framework of
generalized additive models (GAMs) (Hastie and Tibshirani, 1987).
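Griffith's eigenvector approach extracts candidate map patterns from M C M, where M = I − 11ᵀ/n centers the data and C is a symmetric connectivity matrix. Selected eigenvectors are then added as regressors. This minimal sketch assumes a binary symmetric C. It omits eigenvector selection, which is usually stepwise, and it does not implement the Getis filter.

```python
import numpy as np


def moran_eigenvectors(C):
    """Eigenvectors of M C M (M = I - 11'/n), sorted by decreasing eigenvalue.
    Large positive eigenvalues correspond to strongly positively autocorrelated patterns."""
    C = np.asarray(C, dtype=float)
    n = C.shape[0]
    M = np.eye(n) - np.ones((n, n)) / n        # centering projector
    vals, vecs = np.linalg.eigh(M @ C @ M)     # symmetric matrix -> real spectrum
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]


def morans_i(z, C):
    """Moran's I of z under connectivity matrix C."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()
    return len(z) / C.sum() * (z @ C @ z) / (z @ z)


def filtered_design(X, vecs, m):
    """Append the first m candidate eigenvectors to the design matrix X."""
    return np.column_stack([X, vecs[:, :m]])


# A chain of 6 locations in which i and i+1 are neighbors.
n = 6
C = np.zeros((n, n))
for a in range(n - 1):
    C[a, a + 1] = C[a + 1, a] = 1.0
vals, vecs = moran_eigenvectors(C)
```

Each eigenvector's Moran's I equals (n / ΣC)·λ. Ranking eigenvectors by eigenvalue therefore ranks them by spatial autocorrelation.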
Geographically weighted regression (GWR) provides a third approach,
falling between the model-oriented and data-oriented approaches. It controls spatial
effects by fitting the regression locally, allowing the explanatory variable
coefficients to vary at every observation. The local fitting is achieved by weighting
the neighboring observations through a local spatial weighting matrix
(Fotheringham et al., 2002). It is model-oriented because it employs a spatial
weighting matrix to represent the spatial relationships between observations, instead
of incorporating the coordinates into the model directly as variables. It is also
data-oriented because the spatial weighting matrix is determined by the spatial
distribution of the data.
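The local fitting amounts to one weighted least-squares regression per location: β̂_i = (XᵀW_iX)⁻¹XᵀW_iy. This minimal sketch assumes a fixed Gaussian kernel with a given bandwidth. It is not the GWR 3.0 implementation, and bandwidth selection (by cross-validation or AIC) is omitted.

```python
import numpy as np


def gwr_fixed_gaussian(coords, X, y, bandwidth):
    """Local coefficients at every observation from weighted least squares with
    Gaussian kernel weights w_ij = exp(-0.5 * (d_ij / bandwidth)^2)."""
    coords = np.asarray(coords, dtype=float)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    betas = np.empty(X.shape)
    for i in range(X.shape[0]):
        w = np.exp(-0.5 * (d[i] / bandwidth) ** 2)   # weights on observations
        XtW = X.T * w                                 # X' W_i without building diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)  # (X' W_i X)^-1 X' W_i y
    return betas


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(80, 2))
X = np.column_stack([np.ones(80), rng.normal(size=80)])
y = X @ np.array([2.0, -1.0])          # constant coefficients, no noise
local = gwr_fixed_gaussian(coords, X, y, bandwidth=2.0)
```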
5.2.2 Past Empirical Studies
We found very few studies that applied multiple spatial modeling methods in a
hedonic model context, and compared the corresponding estimation results. These
studies often adopt both the model- and data-oriented approaches and then compare
the two. For the model-oriented approach, most of the studies applied at least one of
the SAR models.
In one such study, Lambert et al. (2004) compared the SAR, nearest neighbor
approach (NN), polynomial trend regression (PTR) and geostatistical approaches,
when analyzing the impact of nitrogen fertilizer on corn yield. The regression tests
the impact of nitrogen treatment on crop yield, so it is not a hedonic model per se,
but it presents a very similar methodology.
The SAR specification is the autocorrelated error model. NN is essentially a
form of the moving average method: the experimental error of each observation
is modeled as the average of its neighbors’ errors, so these neighbors’ residuals
enter the regression as an extra variable to account for spatial effects. PTR adds
a quadratic trend in the x,y coordinates to the base model and then estimates the
trend terms together with the other explanatory variables by OLS.
The geostatistical approach estimates the parameters of the semivariogram first, then
uses the estimation results as priors to model the regression covariance matrix, and
finally estimates the regression model via the restricted maximum likelihood (REML)
approach.
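PTR reduces to ordinary OLS on an augmented design matrix. This sketch assumes a full quadratic trend (x, y, x², y², xy). The exact terms used by Lambert et al. (2004) are not stated here, and the function names are hypothetical.

```python
import numpy as np


def quadratic_trend(x, y):
    """Full quadratic trend surface terms in the coordinates."""
    return np.column_stack([x, y, x ** 2, y ** 2, x * y])


def fit_ptr(X, x, y, response):
    """OLS on [X, trend terms]; returns (coefficients on X, trend coefficients)."""
    Z = np.column_stack([X, quadratic_trend(x, y)])
    b, *_ = np.linalg.lstsq(Z, response, rcond=None)
    p = X.shape[1]
    return b[:p], b[p:]


rng = np.random.default_rng(3)
n = 100
x, y = rng.uniform(0, 1, n), rng.uniform(0, 1, n)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
trend_true = np.array([0.3, -0.2, 0.1, 0.05, -0.4])
resp = X @ np.array([1.0, 0.5]) + quadratic_trend(x, y) @ trend_true  # noiseless
beta_hat, trend_hat = fit_ptr(X, x, y, resp)
```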
Overall, all of these spatial models performed better than OLS, with better fit and
more significant coefficients. The SAR model fit best, followed by the
REML-geostatistical model, PTR, and NN, in that order. SAR also had the most
coefficient estimates that were statistically significant at the 5% level, although
parameter estimates were very similar for the REML and SAR models. The NN and
PTR models corrected for spatial structure in the residuals but were not as efficient
as the estimated generalized least squares (EGLS) approaches. The SAR and
REML-geostatistical models were also more efficient (significant at the 5% level) than
PTR (significant at the 10% level) and NN (not significant) in capturing the spatial
variation of the key variable (the impact of nitrogen on yield) across topography zones.
In addition, the SAR and REML-geostatistical models returned higher estimated values
for the key variable in certain topography zones, because they transform the data into
more localized values through re-weighting.
Comparing the SAR and REML-geostatistical models, SAR requires fewer estimation steps
and fewer observations. The REML-geostatistical approach becomes a good alternative
to SAR when enough data are available to estimate semivariograms and the discrete
model of the spatial variance structure is untenable.
Griffith (2005) took an approach from the field of spatial statistics and
compared conventional spatial regression models (SAR, CAR) with their spatial
filter counterparts when mapping West Nile Virus (WNV) in the U.S. SAR is
compared with Gaussian and binomial spatial filtering models, and the proper CAR
hierarchical generalized linear model (HGLM) is compared with the corresponding
spatial filter HGLM. The spatial filtering models are based on eigenvalue
decomposition. The base model is not the familiar hedonic linear specification,
because the response variable is binary (the presence or absence of WNV), which
requires a logistic regression approach instead.
Overall, no model provided an ideal specification. However, the model comparisons
showed that spatial filtering returned a higher pseudo-R² than SAR or CAR. Spatial
filtering also detected significant positive and negative spatial autocorrelation,
whereas SAR and CAR both found it insignificant. This is possibly because the global
autocorrelation parameters of SAR and CAR could not differentiate positive and
negative effects, so these effects canceled each other out. In addition, spatial
filtering has the advantage of using a generalized linear model specification, which
is flexible enough to accommodate different distributions of the disease data. The
estimation results also showed similar intercept values across the different models,
which the author suggests should hold as a general rule.
This particular article focused on capturing the spatial structure of the data,
instead of estimating correct variable coefficients. Griffith (2005) also paid special
attention to data distribution when choosing the regression model, concluding that
non-normal data are best described with non-normal probability models. His
approach in this article is therefore more data- than model-oriented, which worked well
for disease mapping but may be less helpful for hedonic model applications.
By comparison, Calderón (2009) offered a relatively straightforward
comparison between spatial econometric models (including spatial lag and error
models) and a spatial statistics model (i.e., kriging), in an application to a classic
spatial dataset: crime rates in Columbus, Ohio. Kriging returned a better model fit:
its errors were significantly lower than those obtained from the spatial econometric
models. However, kriging directly estimates crime rates for the whole area of Columbus
without estimating variable coefficients, so model performance cannot be compared in
terms of coefficient significance. From this perspective, the article does not
contribute to our understanding of spatial models in the hedonic context.
Militino et al. (2004) compared four models: SAR, CAR, a geostatistical
model, and a linear mixed-effect model (LME). They employed a typical hedonic
model as a base model to explain dwelling sale prices in Spain. Their results
demonstrated that SAR and CAR estimates are very similar, with CAR having a
slightly larger log-likelihood and slightly lower residual standard errors. The
geostatistical model, with either the exponential or the spherical covariance structure,
has a larger log-likelihood than SAR and CAR, although at the cost of higher residual
standard errors.
The LME is not a typical spatial statistics model that explicitly models the
spatial structure of data in the error term. But it is still often used by researchers to
account for spatial dependence through random effects. Random effects are
parameters that are only associated with observations showing some kind of
classification or grouping. Observations sharing a common random effect are
autocorrelated. Militino et al. (2004) used the intercept as the common random effect,
and tested three autocorrelation structures: spherical, exponential and autoregressive.
The first two returned better results (identical to the geostatistical model), but the
autoregressive one also interests researchers because it avoids specifying the spatial
weighting matrix.
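The spherical and exponential structures mentioned above are covariance functions of distance. Together with a nugget, they define the covariance matrix Σ of the spatially correlated random effect plus noise, from which β follows by generalized least squares. This is a minimal sketch under one common parameterization. In practice the sill, range and nugget are estimated by ML or REML, which this example does not do.

```python
import numpy as np


def exponential_cov(h, sill, range_param):
    """C(h) = sill * exp(-h / range_param)."""
    return sill * np.exp(-np.asarray(h, dtype=float) / range_param)


def spherical_cov(h, sill, range_param):
    """C(h) = sill * (1 - 1.5 r + 0.5 r^3), r = min(h / range_param, 1); zero beyond the range."""
    r = np.minimum(np.asarray(h, dtype=float) / range_param, 1.0)
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)


def gls(X, y, Sigma):
    """Generalized least squares: (X' S^-1 X)^-1 X' S^-1 y."""
    SiX = np.linalg.solve(Sigma, X)
    Siy = np.linalg.solve(Sigma, y)
    return np.linalg.solve(X.T @ SiX, X.T @ Siy)


rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(30, 2))
D = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Sigma = exponential_cov(D, sill=1.0, range_param=3.0) + 0.1 * np.eye(30)  # plus nugget
X = np.column_stack([np.ones(30), rng.normal(size=30)])
beta_hat = gls(X, X @ np.array([1.0, 0.5]), Sigma)   # noiseless illustration
```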
Several studies also incorporated a geographically weighted regression
(GWR) in their comparisons. For example, Farber and Yeates (2006) compared a
spatial lag model with a GWR and a moving window regression (MWR). They used
multiple R² (or pseudo-R² in the case of local regression), the sum of squared errors,
and Moran’s I of the residuals as comparison criteria. The results showed that GWR had
the highest R², followed by the MWR and spatial lag models. The same ranking applies
to the Moran’s I values, meaning that GWR accounts best for the spatial variation.
However, the researchers also pointed out that GWR is more vulnerable to extreme or
implausible coefficients resulting from local irregularities in variable distribution.
They aimed to mitigate this problem in future work by adjusting the number of nearest
neighbors for the local regressions.
Páez et al. (2008) also confirm the excellent performance of GWR, when
used with Toronto housing data in a case study to compare GWR with MWR,
ordinary kriging, and moving windows kriging (MWK). MWK is a local form of
kriging, which accounts for both spatial dependence and heterogeneity. GWR and
MWR focus more on spatial heterogeneity, whereas kriging focuses on spatial
dependence. The comparison focused on cross-validation trends and goodness-of-fit
criteria and sought to determine which spatial effect each method models most
efficiently.
GWR always returned the best predictive power, although only marginally
better than MWR. These two approaches outperformed the other models, which
suggests that in this case, capturing spatial heterogeneity enhanced model
performance more than capturing spatial dependence. Compared to MWR and MWK,
GWR also presented another advantage: its estimates appeared to be substantially
better across a wider range of window sizes. In the cross-validation trend analysis,
MWK seemed to perform better than MWR, possibly because it combines spatial
dependence and heterogeneity, but it returned the lowest out-of-sample prediction
accuracy. This is likely because error autocorrelation is not present at the optimal
window size, in which case modeling it has either a negative effect or no effect on
prediction accuracy (Basu and Thibodeau, 1998).
However, this study only compared the models’ goodness-of-fit and prediction
accuracy and did not explore the significance of variable coefficients or their spatial
patterns. Had it done so, it would have provided more insight as a reference for
our study.
Kestens et al. (2006) compared GWR with the spatial expansion model whilst
assessing the impact of household profiles on property prices in a hedonic model.
They targeted spatial heterogeneity as the spatial effect to control. The spatial
expansion model employs interaction terms that multiply household characteristics
with explanatory variables. This is the only model comparison study we found that
incorporated various greenspace variables: percentage of area with mature trees
within 100 or 500 m of properties, percentage of area with low tree density within
500 m of properties, whether or not a property has more than 29 trees, NDVI
standard deviation within 1 km radius, etc.
Spatial expansion models were fit globally, whereas GWR is a local
regression. But both methods conclusively showed coefficients of certain housing
and location attributes varying according to household profiles. So they both
captured spatial heterogeneity successfully. There are a couple of variables that were
not significantly expanded in the spatial expansion models but were considered to be
heterogeneous in GWR. In terms of explanatory power, GWR models returned a slightly
higher R² than the corresponding spatial expansion models. GWR models also appeared
to reduce slightly more local spatial autocorrelation (“hot spots”) than spatial
expansion models.
In general, spatial expansion terms make it possible to analyze and to fully
explain the cause of the parameter heterogeneity, whether or not its structure is
spatial, although possibly at the cost of estimate precision. But GWR provides additional
insight by measuring local regression statistics, which are often more precise.
Waller et al. (2007) compared GWR with a spatially varying coefficient (i.e.
a spatial random effect) model, when investigating the geographic variations of the
relationship between alcohol distribution and violence. They used a Poisson
regression for the base model. Because the two models specify spatial variation
very differently, the authors argued that comparisons of model performance should
avoid overgeneralization and narrow interpretation. They found qualitative
similarity in the model estimates: the estimated parameters cover very similar
ranges. But GWR returned a much smoother spatial variation surface, because it uses
a kernel with a large bandwidth, whereas the spatial random effect model applies
adjacency-based spatial similarity. In addition, GWR has the advantage of directly
providing maps of smoothed parameters, which is a faster way of describing spatially
varying levels of association between alcohol distribution and violence. The
spatial random effect model, in contrast, is more computationally intensive and thus
more costly. The disadvantage of GWR, in this case, is limited statistical inference
regarding the amount and extent of spatial pattern. This limitation allows a narrower
interpretation of the estimated associations than the random effects spatially varying
coefficient model.
5.3 Comparing Spatial Model Performance at Two Different Geographic Scales
5.3.1 Data and Method
Chapters 3 and 4 both employed multiple spatial regression techniques to remove
spatial effects in hedonic models, but at two very different geographic scales. The
case study reported in Chapter 3 dealt with spatial effects in a small, relatively
homogeneous study area in the center of the City of Los Angeles (hereafter termed
scenario 1), whereas the case study reported in Chapter 4 covered most of Los
Angeles County, which is full of heterogeneous elements and behaviors (hereafter
termed scenario 2). The goal of the present work was to determine whether, and how,
these spatial regression techniques perform differently at different geographic scales
in the context of hedonic pricing models. Based on these results, we also wanted to identify
the strengths and weaknesses of each spatial regression technique in estimating the
impact of neighborhood greenspace.
However, the case studies in scenarios 1 and 2 used observation points that were
mutually exclusive, even though geographic coverage of one was contained
completely within the other. In addition, the measurement of neighborhood
greenspace is different: scenario 1 measured greenspace in concentric rings from
properties, but scenario 2 only measured greenspace immediately adjacent to
properties. Such inconsistencies would prevent a meaningful comparison. Therefore,
we made the following changes to the data:
1. We combined the observations from both scenarios, and call the resulting
data “county data”. The observations from scenario 1 are hereafter referred to
as “local data”. We then ran all of the analyses on these two data sets and
compared the regression results.
2. We digitized all the greenspace adjacent to the properties in scenario 1, using
the polygon tracing tool developed by Daniel Goldberg from the USC GIS
Research Laboratory.
3. For the other variables in the regressions, we chose those that were common
to both scenarios.
We then applied a total of five sets of models: the classic hedonic model (i.e., the
benchmark), the spatial expansion model, the spatial econometric regression model,
spatial filtering, and geographically weighted regression. We ran the first four
models in R and the GWR in GWR 3.0, obtained from the National Centre for
Geocomputation at the National University of Ireland, Maynooth.
5.3.2 Results
5.3.2.1 Benchmark Model
The above data processing generated a benchmark model containing the following
variables: Ln(LotSize) (lot size), Ln(Structure) (living area square footage), Age (age
of the property), Age2 (square of the property age), Ln(RPdist) (distance to the
nearest freeway ramp), Ln(PKdist) (distance to the nearest park), Ln(Income)
(median income of the block group), and Ln(Green) (area of neighborhood
greenspace). This variable set may not be the one that best explains the variation of
the response variable (i.e., property value) in each scenario. However, this approach
does not compromise the purpose of comparing spatial modeling techniques: we focus
not on the exact values of the performance criteria, but on their improvements over
the base model in each scenario.
As shown in column two of Table 5.1, the OLS estimation found that scenario 1
has a much higher goodness-of-fit (adjusted R² = 0.812, AIC = -36.0) than scenario 2
(adjusted R² = 0.60, AIC = 2,002.8). Such a difference is not surprising
considering the drastically different spatial coverage of the two scenarios. Scenario 2
covers a much larger and more heterogeneous area and therefore contains greater
variability among observations. In addition, more factors are likely to contribute to
property values in scenario 2, and they are more likely to have been omitted.
Table 5.1 also shows that the key variable of interest, Ln(Green), is
statistically significant in both scenarios, but scenario 1 generated a larger impact
(β = 0.051, p-value = 0.002) than scenario 2 (β = 0.029, p-value = 0.021). This could be
because the neighborhood greenspace impact is “diluted” or offset by widely varying
neighborhood preferences.
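The size of these coefficients is easier to see as a price response. This sketch assumes the response variable is the natural log of property value. The section lists logged explanatory variables but does not state the form of the response explicitly. Under that log-log assumption, scaling Green by (1 + p) scales price by (1 + p)^β, and the function name is hypothetical.

```python
def pct_price_change(beta, pct_green_change):
    """Percent change in price for a pct_green_change percent change in Green,
    ASSUMING ln(price) = ... + beta * ln(Green) (log-log specification)."""
    return ((1.0 + pct_green_change / 100.0) ** beta - 1.0) * 100.0


s1 = pct_price_change(0.051, 10)   # scenario 1 base-model coefficient
s2 = pct_price_change(0.029, 10)   # scenario 2 base-model coefficient
```

Under this assumption, a 10% larger area of adjacent greenspace corresponds to roughly a 0.49% higher price in scenario 1 and roughly 0.28% in scenario 2.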
Table 5.1. Model performance comparison
Scenario 1
Model type     Base          Expansion     SER        Filtering (df = 10)    GWR
Adjusted R²    0.812         0.852         0.847      0.877                  0.873
AIC            -36.0         -77.5         -68.0      -91.6                  -89.0
Ln(Green)      0.051         0.050         0.040      0.044                  range
(p-value)      (0.002)       (0.006)       (0.004)    (0.003)                range
lambda                                     0.421
(p-value)                                  (0.000)
Moran’s I      0.313         0.128         -0.026     0.050                  0.012
(p-value)      (< 6.5e-12)   (7.9e-5)      (0.68)     (0.1271)               (0.86)
Scenario 2
Model type     Base          Expansion     SER        Filtering (df = 400)   GWR
Adjusted R²    0.60          0.622         0.675      0.697                  0.704
AIC            2002.8        1920.3        1431.7     1314.2                 1201.3
Ln(Green)      0.029         range         0.036      0.033                  range
(p-value)      (0.021)       range         (0.001)    (0.011)                range
lambda                                     0.538
(p-value)                                  (0.000)
Moran’s I      0.584         0.5514        -0.029     0.020                  0.009
(p-value)      (< 2.2e-15)   (< 2.3e-14)   (0.96)     (0.109)                (0.79)
Note: “range” means the corresponding value is not a single value, but a range of
values that vary across the study area.
Both the Moran’s I test and the LM tests confirmed the existence of
significant spatial dependence in the error term in both scenarios. The LM
specification tests also indicated that the spatial error model better represented the
spatial process in both scenarios (Table 5.2). The Moran’s I statistic is higher in
scenario 2 (0.584) than in scenario 1 (0.313). This may be either because the sampled
sites across Los Angeles County in scenario 2 are more clustered, or because spatial
heterogeneity also contributed to the positive spatial autocorrelation in the residuals.
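Moran's I for regression residuals is I = (n / S₀)·(zᵀWz)/(zᵀz), where z is the vector of centered residuals and S₀ is the sum of the weights. The sketch below computes it and adds a permutation (random relabeling) p-value. That test is a substitute for the analytical test that presumably produced the p-values in Table 5.1, and the helper names are hypothetical.

```python
import numpy as np


def morans_i(resid, W):
    """Moran's I = (n / S0) * (z' W z) / (z' z), z = centered residuals."""
    z = np.asarray(resid, dtype=float)
    z = z - z.mean()
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)


def moran_permutation_test(resid, W, n_perm=999, seed=0):
    """One-sided (positive autocorrelation) pseudo p-value from random relabeling."""
    rng = np.random.default_rng(seed)
    resid = np.asarray(resid, dtype=float)
    observed = morans_i(resid, W)
    perms = np.array([morans_i(rng.permutation(resid), W) for _ in range(n_perm)])
    p_value = (1 + np.sum(perms >= observed)) / (n_perm + 1)
    return observed, p_value


# A chain of 20 locations with binary contiguity.
n = 20
W = np.zeros((n, n))
for a in range(n - 1):
    W[a, a + 1] = W[a + 1, a] = 1.0
alternating = np.array([1.0, -1.0] * (n // 2))   # perfect negative autocorrelation
trend = np.arange(n, dtype=float)                # strong positive autocorrelation
```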
Table 5.2 Specification test of spatial autocorrelation in base model
Test Value p-value
Lagrange Multiplier (lag) 35.788 0.000
Robust LM (lag) 6.454 0.011
Lagrange Multiplier (error) 36.405 0.000
Robust LM (error) 7.071 0.008
Lagrange Multiplier (SARMA) 42.859 0.000
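The statistics in Table 5.2 can be read with the usual LM decision rule. If only one simple LM test rejects, use that model. If both reject, let the robust versions decide, and if both robust tests also reject, favor the larger robust statistic. This is a minimal sketch of that rule as commonly applied. The function names are hypothetical, and the joint SARMA test is not used in the decision.

```python
import math


def chi2_1_pvalue(stat):
    """Upper-tail p-value of a chi-square(1) statistic: P(X > s) = erfc(sqrt(s / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def choose_spatial_model(lm_lag, lm_err, rlm_lag, rlm_err, alpha=0.05):
    """Apply the common LM decision rule to the four test statistics."""
    def sig(s):
        return chi2_1_pvalue(s) < alpha

    if not sig(lm_lag) and not sig(lm_err):
        return "OLS"
    if sig(lm_lag) != sig(lm_err):                  # only one simple test rejects
        return "lag" if sig(lm_lag) else "error"
    if sig(rlm_lag) != sig(rlm_err):                # let the robust tests decide
        return "lag" if sig(rlm_lag) else "error"
    return "lag" if rlm_lag > rlm_err else "error"  # both reject: larger statistic wins


choice = choose_spatial_model(35.788, 36.405, 6.454, 7.071)   # Table 5.2 values
```

With the Table 5.2 values, both simple and both robust tests reject at 5%. The robust error statistic (7.071) exceeds the robust lag statistic (6.454), which is consistent with the text's choice of the spatial error model.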
5.3.2.2 Spatial Expansion Model Results
We applied the spatial expansion model by adding independent variables that
multiply the x,y coordinates with the base model variables. Table 5.3 lists the
estimated coefficients of both base variables and expanded variables that are
significant in either scenario. The third column of Table 5.1 lists the performance
statistics. Scenario 1 improved goodness-of-fit by 4.9% (adjusted R² = 0.852,
AIC = -77.5), but some variables from the base model dropped out completely. Some
variables, such as Ln(PKdist), are significant only when multiplied by the x or y
coordinates. This means that although these factors do not themselves have a
significant impact on property values, their variation along certain directions does.
The target Ln(Green) variable maintained a similar significant impact (β = 0.050) as
in the base model, but with a much lower p-value (0.0006). It seems that although no
significant spatial heterogeneity was discovered for neighborhood greenspace within
this local area, its estimation efficiency was enhanced by capturing the spatial
heterogeneity of other factors. This improvement is confirmed by the 59% reduction
in the Moran’s I statistic (0.128) from the base model, although the residuals still
show significant spatial dependence.
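The percentages quoted in this section appear to be relative changes from the base model. For example, (0.852 − 0.812)/0.812 ≈ 4.9% for adjusted R², and (0.313 − 0.128)/0.313 ≈ 59% for Moran's I. The small sketch below reproduces those figures from the Table 5.1 values. The helper names are hypothetical.

```python
def pct_gain(base, new):
    """Relative improvement of new over base, in percent."""
    return (new - base) / base * 100.0


def pct_reduction(base, new):
    """Relative reduction from base to new, in percent."""
    return (base - new) / base * 100.0


adj_r2 = {  # Table 5.1: base, expansion, SER, filtering, GWR
    "scenario 1": [0.812, 0.852, 0.847, 0.877, 0.873],
    "scenario 2": [0.60, 0.622, 0.675, 0.697, 0.704],
}
gains = {s: [round(pct_gain(v[0], x), 1) for x in v[1:]] for s, v in adj_r2.items()}
```

The resulting gains (4.9, 4.3, 8.0 and 7.5% in scenario 1; 3.7, 12.5, 16.2 and 17.3% in scenario 2) match the percentages reported for each model in the text.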
Table 5.3 Coefficient estimates in scenario 1 (S1) and scenario 2 (S2)
expansion S1 expansion S2
Ln(Lotsize) 0.150
(0.004)
X_Ln(LotSize) 2.223
(0.041)
Y_Ln(LotSize) 7.145 e-07
(0.075)
Ln(Structure) 0.501 -1.255
(0.000) (2.154 e-5)
Y_Ln(Structure) -3.819 e-06
(1.264 e-08)
X_Age2 -1.442 e-10
(0.011)
X_Ln(RPdist) -3.112
(1.562e-07)
X_Ln(PKdist) -7.427 e-7
(6.767 e-8)
Ln(Income) 1.125
(0.000)
X_Ln(Income) -9.883 e-6 -3.215
(9.736 e-12) (2.180e-6)
Y_Ln(Income) -4.240 e-6
(4.269 e-16)
Ln(Green) 0.050 0.762
(6.394 e-4) (0.004)
X_Ln(Green) 1.725 e-06
(0.023)
Y_Ln(Green) 2.217 e-06
(2.983 e-04)
Scenario 2 also returned improved goodness-of-fit, although slightly less than
scenario 1, at 3.7% (adjusted R² = 0.622, AIC = 1,920.3). It also lost some significant
variables from the base model, but the variables associated with neighborhood
greenspace are all significant (see Table 5.3). This means that neighborhood greenspace
not only has a significant impact, but that this impact also varies significantly within
the study area in all directions. This is consistent with our expectation for such a large
study area. Figure 5.1 shows the spatial variation of the parameter, which ranges from
-0.03 to 0.182. There is a very clear spatial pattern: the closer the observation points
are to the ocean, the lower their parameter values. In other words, neighborhood
greenspace is valued more in inland areas. This seems plausible, because proximity to
the ocean can substitute for the need for neighborhood greenspace. However, we doubt
that the spatial pattern should be so uniform, given how much neighborhoods at the
same distance from the ocean vary. The true coefficient values could still be masked
by uncaptured spatial effects. For example, spatial dependence is still highly
significant in the residuals: the Moran’s I statistic (0.551) decreased by only about 6%
from the base model, which is substantially less than the reduction in scenario 1.
Figure 5.1 Estimated Ln(Green) coefficients from spatial expansion model,
scenario 2
5.3.2.3 Spatial Error Model Results
According to the LM specification tests of the base model, a spatial error model
is the appropriate specification among the spatial econometric regressions. The model
requires a spatial weighting matrix. We chose the three nearest neighbors as the
weighting scheme, because the case study in Chapter 4 showed that both the three
nearest neighbors and neighbors within 4,000 m of each property can capture spatial
dependence effectively, but a 4,000 m distance band is too large for a local area like
scenario 1.
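The two candidate weighting schemes can be contrasted directly. k-nearest-neighbor weights give every property exactly k neighbors regardless of sampling density. A distance band gives a variable number of neighbors, and possibly none. This is a minimal sketch with row standardization assumed for both; the function names and toy coordinates are hypothetical.

```python
import numpy as np


def pairwise_distances(coords):
    coords = np.asarray(coords, dtype=float)
    return np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)


def knn_weights(coords, k=3):
    """Row-standardized k-nearest-neighbor weights: each row has k entries of 1/k."""
    d = pairwise_distances(coords)
    n = d.shape[0]
    np.fill_diagonal(d, np.inf)                        # exclude self
    nn = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0 / k
    return W


def distance_band_weights(coords, threshold):
    """Row-standardized weights for all neighbors within `threshold` (coordinate units).
    Observations with no neighbor in the band get an all-zero row."""
    d = pairwise_distances(coords)
    B = (d <= threshold).astype(float)
    np.fill_diagonal(B, 0.0)
    rowsum = B.sum(axis=1, keepdims=True)
    return np.divide(B, rowsum, out=np.zeros_like(B), where=rowsum > 0)


pts = np.array([[0, 0], [1000, 0], [2000, 0], [3000, 0], [9000, 0]], dtype=float)
W3 = knn_weights(pts, k=3)
Wd = distance_band_weights(pts, 1500.0)
```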
We ran the spatial regressions with R’s spdep package (developed and
maintained by Roger Bivand). Key estimation results are listed in column four
of Table 5.1. The coefficient of the spatial dependence term, lambda, is significant in
both scenarios, with a very low p-value in each case. The effect is somewhat stronger
in scenario 2 (lambda = 0.538) than in scenario 1 (lambda = 0.421), perhaps because
its sampling scheme is sparser than that of scenario 1, so the three nearest neighbors
span a wider area and capture more spatial variation. As to goodness-of-fit, the
scenario 1 adjusted R² improved by 4.3% (adjusted R² = 0.847, AIC = -68.0), which is
slightly lower than the spatial expansion model counterpart. It improved by 12.5% in
scenario 2 (adjusted R² = 0.675, AIC = 1,431.7), which is substantially greater than the
improvement generated with the spatial expansion model. This is a bit surprising,
because we expected a large metropolitan area to benefit more from a model focusing
on spatial heterogeneity.
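The spatial error model can be estimated by maximum likelihood. For each λ, spatially filter the data with (I − λW), run OLS, and evaluate the concentrated log-likelihood −(n/2)·ln σ̂²(λ) + ln|I − λW|. This sketch uses a plain grid search and synthetic data. It illustrates the principle only; it is not how spdep implements the estimator, which uses numerical optimization and handles the determinant more efficiently.

```python
import numpy as np


def knn_weights(coords, k=3):
    """Row-standardized k-nearest-neighbor weights."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    n = len(coords)
    W = np.zeros((n, n))
    W[np.repeat(np.arange(n), k), nn.ravel()] = 1.0 / k
    return W


def fit_spatial_error(X, y, W, grid=np.linspace(-0.9, 0.9, 181)):
    """ML for y = X b + u, u = lam W u + e, by grid search over lam on the
    concentrated log-likelihood -n/2 ln(sigma^2(lam)) + ln|I - lam W|."""
    n = len(y)
    I = np.eye(n)
    best = (-np.inf, None, None)
    for lam in grid:
        A = I - lam * W
        ys, Xs = A @ y, A @ X                         # spatially filtered data
        b, *_ = np.linalg.lstsq(Xs, ys, rcond=None)
        resid = ys - Xs @ b
        _, logdet = np.linalg.slogdet(A)
        ll = -0.5 * n * np.log(resid @ resid / n) + logdet
        if ll > best[0]:
            best = (ll, lam, b)
    return best[1], best[2]


rng = np.random.default_rng(42)
n = 300
coords = rng.uniform(0, 100, size=(n, 2))
W = knn_weights(coords, k=3)
X = np.column_stack([np.ones(n), rng.normal(size=n)])
u = np.linalg.solve(np.eye(n) - 0.5 * W, rng.normal(size=n))   # true lambda = 0.5
y = X @ np.array([1.0, 2.0]) + u
lam_hat, b_hat = fit_spatial_error(X, y, W)
```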
The effect of immediate greenspace is also significant in both scenarios.
In scenario 1, the spatial error model removed some of the inflation in the OLS
estimate, which resulted in a lower coefficient value and higher efficiency (i.e., a lower
p-value). Scenario 2 also achieved higher efficiency, but generated a higher coefficient
than the base model. This is interesting, because we usually expect coefficients to
deflate after the spatial dependence is removed from the OLS model.
The Moran’s I statistic in the residuals is no longer significant, with slightly
different values in the two scenarios. Considering the different starting values in the
base models, scenario 2 achieved the greater improvement.
5.3.2.4 Spatial Filtering Results
According to the analysis in Chapter 4, spatial filtering in the GAM framework
performs well for the county level data, so we decided to also use it for this
comparison analysis. We found that a locally fitted surface with either the natural
spline or LOESS can remove spatial dependence effectively, but with very different
specifications for surface smoothness in the two scenarios. Key estimation
results are reported in column five of Table 5.1.
Scenario 1 requires the degrees of freedom (df) to be around 10, or the span
to be around 0.3. Lower or higher values lead to significant spatial dependence in
the residuals. In other words, the moving window needs to be large enough to
generate a relatively smooth surface, but not so large that it overlooks significant
spatial variation. By comparison, scenario 2 requires an optimum window size that is
much smaller relative to its study area (df of at least 400, or a span of at most 0.02).
Such a difference is consistent with our expectation: a large study area needs a
moving window small enough to reflect heterogeneity.
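The span is the fraction of observations in each local window. This is a minimal sketch of a LOESS-style surface (local linear, tricube weights) over the x,y coordinates. A smaller span means a smaller window and a rougher, more local surface. It is not R's `loess`, which offers further options such as local quadratic fits. In the GAM framework this surface would also be estimated jointly with the hedonic covariates by backfitting, which is omitted here.

```python
import numpy as np


def loess_surface(coords, z, span=0.3):
    """Local linear fit of z on (x, y) at every observation, using the
    ceil(span * n) nearest points with tricube weights."""
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    q = max(4, int(np.ceil(span * n)))          # points in each local window
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    D = np.column_stack([np.ones(n), coords])   # local design: 1, x, y
    fitted = np.empty(n)
    for i in range(n):
        idx = np.argsort(d[i])[:q]              # window: q nearest points (incl. i)
        u = d[i, idx] / d[i, idx].max()
        w = (1.0 - u ** 3) ** 3                 # tricube; farthest point gets weight 0
        Dw = D[idx].T * w
        b, *_ = np.linalg.lstsq(Dw @ D[idx], Dw @ z[idx], rcond=None)
        fitted[i] = D[i] @ b
    return fitted


rng = np.random.default_rng(0)
pts = rng.uniform(0, 1, size=(60, 2))
plane = 1.0 + 2.0 * pts[:, 0] - 3.0 * pts[:, 1]
smooth = loess_surface(pts, plane, span=0.3)
```

A local linear smoother reproduces a plane exactly, which makes a convenient check. With real residuals, spans near 0.3 (scenario 1) versus 0.02 (scenario 2) would give very different degrees of smoothing.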
What is unexpected, however, is that the AIC values in both scenarios (-91.6 and
1,314.2, respectively) are lower than those for the spatial expansion and spatial error
models. The analysis reported in Chapter 4 showed the reverse: spatial filtering gave
the poorest fit, because it captures spatial variation so strongly that it can easily
overcorrect spatial autocorrelation. In this case, the spatial filtering method reduced
spatial dependence without overcorrection (Moran’s I = 0.05, p-value = 0.127 in
scenario 1; Moran’s I = 0.02, p-value = 0.109 in scenario 2), so we suspect that by
including fewer variables in the current models, we left more spatial variation in the
residuals, which better matched the intensity of the filtering process.
Consistent with the above discussion, the coefficient of neighborhood
greenspace impact (Ln(Green)) in scenario 1 is slightly less than in the base model and
the spatial expansion model (β = 0.044, p-value = 0.003), because the spatial filter
removed the inflation caused by spatial dependence. Its p-value is slightly higher,
possibly because the spatial dependence is caused by omitted variable(s), which
would usually deflate variance. This coefficient is slightly higher than the one from
the spatial error model, though, possibly because the spatial error model removed
spatial dependence more thoroughly and therefore deflated the coefficient of
neighborhood greenspace impact more.
In scenario 2, however, this coefficient is slightly higher than in the base model,
with less variance (β = 0.033, p-value = 0.011). This is probably the result of a
trade-off between dependence reduction and heterogeneity reduction: the former
deflates coefficients while the latter enhances them. In this case, a locally fitted
surface captured both effects, but the strong heterogeneity in the large study area was
reduced more substantially, so its impact on the coefficient may outweigh the impact
of spatial dependence.
5.3.2.5 GWR Results
We chose an adaptive kernel to fit GWR in both scenarios, so that the number of
spatial neighbors for each observation can vary according to data density. Table 5.4
reports the GWR parameter estimates in quartiles. Column six of Table 5.1 reports the
performance criteria: the adjusted R² improves to 0.873 (AIC = -89.0) for scenario 1,
and to 0.704 (AIC = 1,201.3) for scenario 2. We also mapped the parameters of
Ln(Green) by quartile, with an extrapolated surface as background for easier
visualization (Figures 5.2 and 5.3).
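An adaptive kernel sets each location's bandwidth to the distance to its N-th nearest neighbor. The kernel therefore widens where properties are sparse and narrows where they are dense. This is a minimal sketch assuming a bisquare kernel. GWR 3.0's exact kernel definition and its bandwidth selection (e.g., by AICc or cross-validation) are not reproduced, and the function name is hypothetical.

```python
import numpy as np


def adaptive_bisquare_weights(coords, i, n_neighbors):
    """Bisquare weights around location i; returns (weights, bandwidth h_i), where
    h_i is the distance from i to its n_neighbors-th nearest neighbor."""
    coords = np.asarray(coords, dtype=float)
    d = np.linalg.norm(coords - coords[i], axis=1)
    h_i = np.sort(d)[n_neighbors]               # index 0 is location i itself
    w = np.where(d < h_i, (1.0 - (d / h_i) ** 2) ** 2, 0.0)
    return w, h_i


# A dense cluster near the origin plus sparse points further out.
pts = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [10, 0], [20, 0], [30, 0], [40, 0]], float)
w_dense, h_dense = adaptive_bisquare_weights(pts, 0, 3)
w_sparse, h_sparse = adaptive_bisquare_weights(pts, 6, 3)
```

Both locations receive the same number of nonzero weights, but the bandwidth is about 1.4 units in the dense cluster and 20 units among the sparse points.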
Scenario 1 improved fit over the base model by 7.5%, which is more than the
expansion and spatial error models but slightly less than the spatial filtering model.
For the variable Ln(Green), Figure 5.2 shows a clear trend of parameters increasing
toward the west. The median value of 0.056 is located near the center of the study
area, to the east of U.S. Highway 101. Most parameters are significant, and higher
parameter values also have higher t-values. However, the test for spatial variability
of parameters indicates that this variation is significant only at the 9% level
(Table 5.4). This is not surprising considering the small coverage of the study area.
Table 5.4 GWR parameter five-number summaries
Variable       Min      Lower quartile   Median   Upper quartile   Max      Variation significance test (p-value)
Scenario 1
Intercept 0.074 1.320 3.587 4.723 6.414 0.010 **
Ln(Structure) 0.324 0.403 0.432 0.493 0.629 0.720 n/s
Ln(LotSize) -0.007 0.087 0.141 0.201 0.294 0.520 n/s
Ln(RPdist) 0.020 0.094 0.204 0.435 0.584 0.000 ***
Ln(PKdist) -0.264 -0.134 -0.108 -0.067 0.006 0.040 *
Ln(Income) 0.035 0.149 0.266 0.302 0.523 0.010 **
Age -0.021 0.001 0.005 0.021 0.052 0.720 n/s
Age2 -0.0004 -0.0002 -0.0001 -0.00004 0.0001 0.700 n/s
Ln(Green) 0.016 0.032 0.056 0.088 0.139 0.090 n/s
Scenario 2
Intercept -12.783 -3.369 1.408 3.070 8.400 0.000 ***
Ln(Structure) -0.006 -0.0003 0.002 0.005 0.491 0.000 ***
Ln(LotSize) -0.280 0.268 0.448 0.711 1.229 0.000 ***
Ln(RPdist) -0.762 0.048 0.226 0.548 1.315 0.000 ***
Ln(PKdist) -0.186 -0.058 0.011 0.050 0.355 0.000 ***
Ln(Income) -0.215 0.014 0.082 0.200 0.614 0.000 ***
Age -0.029 0.006 0.017 0.028 0.060 0.280 n/s
Age2 -0.0004 -0.0002 -0.0001 -0.00003 0.0003 0.170 n/s
Ln(Green) -0.588 -0.087 0.037 0.168 0.602 0.000 ***
*** = significant at 0.1% level ** = significant at 1% level
* = significant at 5% level n/s = not significant
Scenario 2 improved fit over the base model by 17.3%, which is the highest among
all the models, although only slightly higher than the spatial filtering model.
Figure 5.3 shows the spatial variation of Ln(Green) in scenario 2. There are pockets
of higher values along the margins of Los Angeles County. The overall spatial
variation is highly significant (p < 0.001), as indicated by the Monte Carlo
significance test (Table 5.4). Interestingly, the areas with the highest values are
either relatively dense urban areas (e.g., Hollywood, Miracle Mile), suburbs with
higher median property values (e.g., Bel Air, Beverly Glen, Malibu, Rancho Palos
Verdes), or suburbs located at higher elevations (e.g., Woodland Hills, La Tuna
Canyon, Walnut, inland Long Beach areas). This could be due to interactions between
topography, elevated levels of remnant vegetation, and property values.
Figure 5.2 GWR estimation of LnGreen coefficients, scenario 1
Figure 5.3 GWR estimation of LnGreen coefficients, scenario 2
We also compared the individual GWR parameters at observation points common to both scenarios to investigate the robustness of GWR estimation at different geographic scales. Figure 5.4 shows that no observation returned the same parameter in both datasets. Scenario 1 mostly returned greater values than scenario 2, and higher values also came with greater t-values. The differences exhibit a gradient: they are highest on the periphery of the study area and decrease toward the center. This pattern suggests the existence of edge effects. However, the differences at the center (of the point distribution, not in an absolute geographic sense), south of U.S. Highway 101, became negative. That is, these points returned higher parameter values when GWR was performed in a more heterogeneous context. Overall, these results suggest that GWR is reasonably robust at different geographic scales, but its results must be interpreted with care.
Figure 5.4 The difference of Ln(Green) coefficient estimation (GWR) between two scenarios
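The gradient described above (larger differences on the periphery, shrinking and turning negative toward the center) can be summarized with a single number. This rough Python sketch assumes the local Ln(Green) coefficients from both scenarios have already been matched at the common observation points; `difference_gradient` is a hypothetical helper, and correlating the difference with distance from the centroid of the point distribution is just one simple diagnostic, not the procedure used to produce Figure 5.4.

```python
import numpy as np

def difference_gradient(coords, beta_scenario1, beta_scenario2):
    """Scenario 1 minus scenario 2 local coefficients at common points, plus the
    Pearson correlation of that difference with distance from the centroid of
    the point distribution. A positive correlation means larger differences on
    the periphery, which is consistent with an edge effect."""
    diff = np.asarray(beta_scenario1) - np.asarray(beta_scenario2)
    centroid = coords.mean(axis=0)
    dist = np.linalg.norm(coords - centroid, axis=1)
    r = np.corrcoef(dist, diff)[0, 1]
    return diff, r

# Hypothetical toy input: five matched points on a line.
coords = np.array([[-2.0, 0.0], [-1.0, 0.0], [0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
b1 = np.array([0.30, 0.15, 0.00, 0.15, 0.30])
b2 = np.zeros(5)
diff, r = difference_gradient(coords, b1, b2)
print(diff, r)
```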
5.4 Discussion and Conclusions
In this chapter, we compared the performance of four commonly used spatial models in analyzing a local dataset (Chapter 3 data) and a regional dataset (i.e. the data from Chapters 3 and 4 combined), with the former nested in the latter. Because spatial effects behave differently at different geographic scales, the comparison yielded some interesting results that indicate which spatial model best captures which spatial effect in which scenario. We referred to the local and relatively homogeneous setting as “scenario 1” and the regional and more heterogeneous setting as “scenario 2”.
In scenario 1, the base OLS model’s adjusted R² improved by between 4.3% (spatial error model) and 8% (spatial filtering model) when spatial heterogeneity and/or spatial dependence was included in the hedonic model. All models produced the expected sign for the greenspace effect (Ln(Green)), although this was not always the case for the other housing or neighborhood variables. As expected, the Ln(Green) coefficients in the spatial models are all lower than that of the base model, since the inflation caused by spatial autocorrelation has been removed. It is counter-intuitive, however, that their corresponding p-values all increased, since we expected improved efficiency. The reason could be that the spatial dependence stems from omitted variable(s), which often deflate the estimated variance in the base model. Considering that SAR estimates can still be biased in the presence of omitted variables, local modeling techniques such as GWR and the spatial filtering model (with its locally fitted surface) should be more reliable. The spatial filtering model generated the lowest p-value, though not the lowest coefficient value, so we think this model best captures the effect of neighborhood greenspace in this scenario.
In scenario 2, the adjusted R² improved by between 3.6% (spatial expansion model) and 17.3% (GWR). The Ln(Green) coefficient increased in the spatial error and spatial filtering models, and the corresponding p-values were lower (the spatial expansion model and GWR produce coefficients that vary across the study area, which makes direct comparison harder). In other words, after spatial effects are removed, the estimated impact of neighborhood greenspace is larger and is estimated with less variance. This is very different from the results of scenario 1 and is very likely a result of the higher level of spatial heterogeneity present in scenario 2. Removing such heterogeneity with spatial filters can reduce noise and thus improve coefficient estimation. The spatial error model, on the other hand, probably removed part of the spatial autocorrelation caused by spatial heterogeneity, and therefore increased the coefficient as well.
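For reference, the spatial lag model (used in Chapter 3) and the spatial error model discussed above are commonly written in the simultaneous autoregressive forms below, with W a spatial weights matrix and ρ, λ autoregressive parameters. The notation is generic rather than this dissertation's own.

```latex
\begin{aligned}
&\text{Spatial lag:}   && y = \rho W y + X\beta + \varepsilon
  \;\Longrightarrow\; y = (I - \rho W)^{-1}(X\beta + \varepsilon) \\
&\text{Spatial error:} && y = X\beta + u,\quad u = \lambda W u + \varepsilon
  \;\Longrightarrow\; y = X\beta + (I - \lambda W)^{-1}\varepsilon
\end{aligned}
\qquad \varepsilon \sim N(0, \sigma^2 I)
```

The reduced forms show the difference: in the error model the dependence enters only through the disturbance, so β keeps its usual interpretation, whereas in the lag model the covariates act through the spatial multiplier (I − ρW)⁻¹.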
In terms of reducing spatial autocorrelation, all spatial models except the spatial expansion model performed well in both scenarios. GWR appears to be the most effective without overcorrecting (it yields the lowest positive, and insignificant, Moran’s I). The spatial error model also reduced the Moran’s I statistic substantially (with an even higher p-value than GWR in scenario 2), but tends to overcorrect spatial autocorrelation. The spatial filtering model can reduce spatial autocorrelation effectively as well, but only if an appropriate moving window size is used.
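Moran’s I on the regression residuals is the statistic used above to judge whether a model has removed spatial autocorrelation (values near zero) or overcorrected it (negative values). A minimal Python sketch follows, assuming row-standardized k-nearest-neighbor weights and a one-sided permutation test; the weights choice, function names and synthetic residuals are illustrative assumptions, not the specification used in this chapter.

```python
import numpy as np

def morans_i(z, W):
    """Global Moran's I of values z (e.g. regression residuals) under weights W."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()
    n = len(z)
    return (n / W.sum()) * (dev @ W @ dev) / (dev @ dev)

def knn_weights(coords, k):
    """Row-standardized k-nearest-neighbor weights (one common choice, assumed here)."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbor
    W = np.zeros_like(d)
    rows = np.arange(len(coords))[:, None]
    W[rows, np.argsort(d, axis=1)[:, :k]] = 1.0
    return W / W.sum(axis=1, keepdims=True)

def moran_permutation_test(z, W, n_perm=999, seed=0):
    """Observed I and a one-sided pseudo p-value for positive autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(z, W)
    sims = np.array([morans_i(rng.permutation(z), W) for _ in range(n_perm)])
    return observed, (1 + np.sum(sims >= observed)) / (n_perm + 1)

# Hypothetical residuals on a 10 x 10 grid that still contain an east-west trend.
gx, gy = np.meshgrid(np.arange(10.0), np.arange(10.0))
coords = np.column_stack([gx.ravel(), gy.ravel()])
resid = coords[:, 0] - coords[:, 0].mean()
W = knn_weights(coords, k=4)
print(moran_permutation_test(resid, W))
```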
Comparing adjusted R² and AIC values between the two scenarios, we found that scenario 2 gained much greater improvement in model fit than scenario 1 after applying the same spatial models (with the exception of the spatial expansion model). In this case, removing spatial heterogeneity turns out to be more rewarding than removing spatial dependence. The two local estimation methods, spatial filtering and GWR, produced the highest goodness-of-fit in both scenarios, probably because they address both spatial dependence and spatial heterogeneity via local fitting. Although it is also categorized as a local statistical method, the spatial expansion model did not achieve goodness-of-fit as high as these two methods; it actually generated the worst fit among all spatial models in scenario 2. A possible reason is that it is fitted globally, even though it allows parameters to vary across the study area. Interestingly, although the spatial expansion model captures insufficient spatial dependence in a relatively uniform area (scenario 1), the fraction it does capture can still substantially improve model fit. The spatial error model performed reasonably well and produced stable but mediocre results in both scenarios.
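The point that the spatial expansion model is fitted globally can be made concrete. In Casetti’s (1972) expansion method, a coefficient is rewritten as a function of location, here assumed linear in the coordinates (u, v), and the resulting interaction terms are estimated in a single OLS regression. The sketch below expands only the Ln(Green) coefficient; the function name, variable names and noise-free synthetic data are hypothetical.

```python
import numpy as np

def fit_linear_expansion(y, other_X, ln_green, coords):
    """Casetti-style linear expansion of the Ln(Green) coefficient.

    beta_green(u, v) = b0 + b1*u + b2*v, so the regression gains the terms
    ln_green, ln_green*u and ln_green*v. All coefficients come from ONE global
    OLS fit, even though the implied Ln(Green) effect differs by location.
    """
    u, v = coords[:, 0], coords[:, 1]
    X = np.column_stack([np.ones(len(y)), other_X, ln_green, ln_green * u, ln_green * v])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    b0, b1, b2 = coef[-3:]
    return coef, b0 + b1 * u + b2 * v  # global coefficients, local Ln(Green) effect

# Hypothetical noise-free data with a known coefficient surface.
rng = np.random.default_rng(0)
n = 200
coords = rng.uniform(0, 5, size=(n, 2))  # hypothetical projected coordinates
structure = rng.normal(size=n)
ln_green = rng.normal(size=n)
true_local = 0.05 + 0.01 * coords[:, 0] - 0.02 * coords[:, 1]
y = 1.0 + 0.4 * structure + true_local * ln_green
coef, local_beta = fit_linear_expansion(y, structure, ln_green, coords)
print(np.round(coef, 4))
```

A quadratic expansion, as used in Chapter 4, would add u², v² and uv interaction terms in the same way. Because every location shares one set of global estimates, the implied surface can only be as flexible as the chosen polynomial, unlike the separate local fits of GWR.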
Overall, the two local spatial modeling techniques showed clear advantages in capturing spatial effects and enhancing model performance. However, our results are limited to our data and can by no means be considered exhaustive. Econometricians and statisticians now use quite a few other spatial modeling approaches. For example, Day et al. (2007) controlled for spatial effects by delineating submarkets with clustering techniques when assessing the value of peace and quiet. Redfearn (2009) used a locally weighted regression to allow the implicit price of proximity to light rail stations to vary. Fotheringham et al. (2002) proposed a spatially autocorrelated GWR, which combines GWR and SAR. Much remains to be done in future research: more empirical hedonic analyses applying various spatial models are needed to enrich the literature, and we will extend our study with different empirical data and modeling techniques.
Chapter 6: Concluding Remarks
6.1 Overview
This dissertation has examined the application of spatial analysis techniques to hedonic models assessing urban open space, particularly neighborhood greenspace. I noted at the beginning of the dissertation that rapid urbanization is driving a chronic greenspace shortage in many of the world’s cities and that novel, more empirically grounded analytical support may be required to ameliorate this problem. Yet we currently lack a good understanding of how best to value urban greenspace: hedonic models appear to offer a solution, but we poorly understand their weaknesses in addressing spatial effects and the impact these shortcomings have on model estimation. This dissertation has sought to close this knowledge gap.
In the second chapter of the dissertation, I reviewed how existing studies have incorporated spatial characteristics when estimating open space values in hedonic models. I found that most studies have simply interpreted spatial characteristics in terms of the distance from open space to properties, the view of it from properties, and its size. Remarkably, spatial effects, namely spatial heterogeneity and spatial dependence, have been overlooked. While several open space studies have addressed this issue by applying spatial modeling techniques, the limited number and scope of such studies have been insufficient for us to properly understand the mechanics of spatial modeling techniques as applied to open space valuation. This is a clear knowledge gap that demands attention if we are to stop the ongoing inefficient or even biased estimation of greenspace values in such models due to spatial effects. In addition, considerable advances in the fields of spatial econometrics, spatial statistics, and GIS have offered us a range of new techniques. These advances provide suites of tools for filling the aforementioned knowledge gap, and this dissertation has taken an important step in this direction.
In Chapter 3, I evaluated the influence of neighborhood greenspace on residential property values in a dense inner-city neighborhood of the City of Los Angeles. I ran both a standard hedonic model and a spatial lag model controlling for spatial autocorrelation. The results from the spatial lag model confirmed the significant positive impact of immediate neighborhood greenspace, even after removing the inflation of the coefficient in the standard hedonic model. Specifically, a 1% increase in the amount of greenspace within the 200 to 300 foot ring around a property can increase its value by 0.07%. At the median price, this increment equals $171. With the maximum possible greenspace increase of 15% in this neighborhood, the increment rises to $2,565. Although this may seem small, the collective value of the increment, a premium worth about $783,750 (based on the median house price and a 20% transaction rate per year), can generate about $146,575 in additional tax revenue for a municipality over a 10 year period. This revenue could fund incentives (e.g. subsidies for tree planting) that effectively mobilize property owners to enhance their greenspace. This is a way to kick-start ‘a virtuous cycle’ of continuous greening, what Birkeland (2008) termed ‘positive development’.
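The dollar figures above follow from reading the Ln(Green) coefficient as an elasticity. Below is a small Python sketch of that arithmetic, assuming a double-log specification (log price regressed on log greenspace). The median price is not restated here, so the sketch backs it out from the $171 figure; the function names are hypothetical.

```python
def increment_linear(elasticity, pct_change, price):
    """First-order approximation: %change in price ~= elasticity * %change in greenspace."""
    return price * elasticity * pct_change / 100.0

def increment_double_log(elasticity, pct_change, price):
    """Exact change implied by ln(P) = ... + elasticity * ln(Green)."""
    return price * ((1.0 + pct_change / 100.0) ** elasticity - 1.0)

ELASTICITY = 0.07                        # 0.07% price change per 1% more greenspace
median_price = 171 / (ELASTICITY / 100)  # backed out from the $171 figure (approximate)

print(round(increment_linear(ELASTICITY, 1, median_price)))
print(round(increment_linear(ELASTICITY, 15, median_price)))
print(round(increment_double_log(ELASTICITY, 15, median_price)))
```

Under that assumption, scaling the 1% effect linearly reproduces the $2,565 figure, while the exact double-log change for a 15% increase is somewhat smaller (about $2,400). The linear figure is therefore best read as a first-order approximation.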
In Chapter 4, I extended this type of analysis to cover most of Los Angeles
County, where the structure of spatial effects is more complex. I tested seven spatial
models belonging to four categories: spatial expansion models, spatial regressions
from the econometric approach, spatial filtering models, and geographically
weighted regression.
The estimates from the base model and the various spatial models confirmed
the significant positive impact of neighborhood greenspace on property values: every
1% increase of neighborhood greenspace area brings increases in property values
ranging from 0.02% to 0.16% (global estimation, not considering the variation
among individual observations in local estimations). In the median case, this adds
$125 – $1,000 to property values. The local estimates produced with methods such
as GWR showed that this effect could vary from negative impacts (i.e. an increase in
greenspace reduces property values) to positive impacts as high as $1,242.
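The Chapter 4 dollar range follows from the same elasticity arithmetic. Since the median price is not restated here, backing it out from the lower bound (an inference from the text’s own figures) gives a quick consistency check on the upper bound:

```latex
\bar{P} \approx \frac{\$125}{0.02\%} = \frac{125}{0.0002} = \$625{,}000,
\qquad 0.16\% \times \$625{,}000 = 0.0016 \times 625{,}000 = \$1{,}000.
```

Both ends of the range are thus evaluated at the same implied median price of roughly $625,000.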
Both spatial heterogeneity and spatial dependence proved to be significant. All spatial models captured certain spatial effect(s) effectively and produced higher goodness-of-fit than the base model. The ranking of model fit, from high to low, is: GWR, spatial lag model (using neighbors within 4,000 m), quadratic spatial expansion, spatial lag model (using three nearest neighbors), linear spatial expansion, spatial filtering model (fitted with a natural spline), and spatial filtering model (fitted with LOESS). The same ranking does not always hold for the spatial autocorrelation statistics (i.e. Moran’s I or the LM test). For example, the spatial filtering model fitted with a natural spline removed spatial autocorrelation thoroughly, even overcorrecting slightly (a negative Moran’s I value), but its goodness-of-fit was near the low end. GWR returned the highest model fit with the lowest positive Moran’s I, owing to its ability to estimate the model coefficients separately for each location.
Spatial expansion models successfully captured spatial heterogeneity, but still
left significant spatial autocorrelation in the residuals, which suggests that the spatial
pattern is more complex than what a spatial expansion model can account for. Spatial
regressions targeting spatial dependence should be considered. Thus, combining a
spatial expansion model with a spatial lag regression turned out to perform better
than either individual model. However, the resulting Moran’s I is not consistently
insignificant. This suggests potential conflicts between the delineation of the two
spatial effects. In other words, they are deeply intertwined.
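One way to write such a combined specification, assuming each coefficient is expanded linearly in the coordinates (u_i, v_i) and a spatial lag term with weights w_ij is added; the notation is illustrative rather than the exact Chapter 4 model:

```latex
y_i = \rho \sum_{j} w_{ij}\, y_j
      + \sum_{k} \left( \beta_{k0} + \beta_{k1} u_i + \beta_{k2} v_i \right) x_{ik}
      + \varepsilon_i
```

Written this way, the expansion terms absorb large-scale drift in the coefficients while ρ absorbs dependence among neighboring prices, and both draw on the same spatial signal, which is one way to see why the two effects are hard to separate.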
For a complex real estate market like that of Los Angeles County, where a great number of local submarkets exist but limited information on their individual behavior is available, it is necessary to employ spatial analysis techniques (and local statistics in particular) to model both spatial heterogeneity and spatial dependence. Failure to do so may result in a loss of explanatory power.
In Chapter 5, I sought to corroborate the observations in Chapter 4 with a further performance comparison of the aforementioned spatial models. To do so, I reused the datasets from the previous two chapters after reconfiguring them to make the variables consistent. Data from Chapter 3 are at a local, relatively homogeneous scale or spatial extent (scenario 1), whereas data from Chapter 4 are at a regional, relatively heterogeneous scale or spatial extent (scenario 2). Comparing model performance across the two scales is likely to help clarify the strengths and weaknesses of these spatial modeling approaches in controlling different spatial effects. I found that GWR and the spatial filtering model perform very similarly in improving model goodness-of-fit (i.e. adjusted R²), with spatial filtering performing best in scenario 1 and GWR performing best in scenario 2. In this regard, the spatial expansion and spatial error models did not perform as well. The spatial expansion model worked better than the spatial error model (SER) in scenario 1, meaning a locally fitted coordinate surface is more effective in enhancing model fit than a globally fitted autoregressive error term in a relatively stationary area. The performance was reversed in the regional (i.e. non-stationary) environment (scenario 2).
As to reducing spatial autocorrelation in residuals, GWR always achieved the lowest Moran’s I statistics without overcorrection, although not always with the highest Moran’s I p-value. The spatial filtering model did not reduce Moran’s I as much, but it has the potential to improve model performance as long as the optimum window size is chosen. SER tends to overcorrect easily, whereas the spatial expansion model only slightly lowers the Moran’s I value, leaving significant spatial autocorrelation in the residuals.
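The window-size sensitivity noted above can be illustrated with a LOESS-style coordinate surface, one of the smoothers used for the spatial filtering model in Chapter 4. This Python sketch covers only the smoothing step, under stated assumptions: local linear fits, tricube weights, and a moving window holding the nearest `span` fraction of points. How the fitted surface enters the hedonic regression is not reproduced here, and `loess_surface` is a hypothetical helper rather than a library function.

```python
import numpy as np

def loess_surface(coords, z, span):
    """LOESS-style local-linear surface of z over (x, y) coordinates.

    For each point, the moving window holds the nearest ceil(span * n) points
    (at least 5), weighted by the tricube kernel of scaled distance.
    """
    n = len(z)
    q = max(5, int(np.ceil(span * n)))          # window size in points
    A = np.column_stack([np.ones(n), coords])   # local plane a + b*x + c*y
    fitted = np.empty(n)
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        idx = np.argsort(d)[:q]
        scaled = d[idx] / d[idx].max()
        w = (1.0 - scaled ** 3) ** 3            # tricube: 1 at the center, 0 at the window edge
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(A[idx] * sw[:, None], z[idx] * sw, rcond=None)
        fitted[i] = A[i] @ coef
    return fitted

# Hypothetical residual pattern with a smooth east-west wave plus noise.
rng = np.random.default_rng(3)
coords = rng.uniform(0, 10, size=(200, 2))
z = np.sin(coords[:, 0] / 2) + rng.normal(scale=0.3, size=200)
for span in (0.1, 0.3, 0.8):
    smooth = loess_surface(coords, z, span)
    print(span, np.round(np.var(z - smooth), 3))
```

Smaller spans give a more local surface that absorbs more spatial structure (and risks overcorrecting), while larger spans give a smoother surface that may leave autocorrelation in the residuals.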
Overall, local models like GWR and the spatial filtering model (with a locally fitted x,y coordinate surface) appeared to offer more reliable performance over the two geographic scales, which means they are more robust across a range of spatial effects. SER is also robust in both scenarios as a globally estimated model, notwithstanding the generally mediocre results achieved with this approach. The spatial expansion model is not as robust, but performed better in a local (i.e. relatively stationary) environment.
6.2 Continuing Research
There are some unresolved issues with the previous analyses. For example, even though GWR produced the highest model fit, it still cannot explain some spatial patterns and occasionally produces counter-intuitive coefficients. The GWR estimates may be affected by omitted variables and statistical misspecification. Future research should gather more information from other perspectives (e.g. school districts, crime rates) for the study area that could be overlaid with the GWR maps and used to help construct possible hypotheses. A similar approach might help explain the spatial pattern of the varying coefficients from the spatial expansion model. Some spatial patterns still lack a clear explanation, and additional work is needed to identify the underlying reason(s).
This dissertation focused on four commonly used spatial models. However, many other spatial modeling techniques are available in the spatial econometrics and spatial statistics literatures, such as submarket delineation, locally weighted regression via other econometric techniques, and multilevel modeling, and future research should extend the analysis to these models as well.
Besides the above methodological improvements, future research should also explore ways to incorporate the ecosystem value of neighborhood-scale greenspace (e.g. nature’s services) into studies that estimate the value of open space and neighborhood greening. Adding this value to the value capitalized in properties would take us one step closer to an accurate estimation of the value of neighborhood greenspace and provide additional support to decision makers and others who wish to expand urban open space.
Bibliography
Acharya, G., Bennett, L. L. (2001). Valuing open space and land-use patterns in
urban watersheds. Journal of Real Estate Finance and Economics, 22, 221-237.
Akbari, H., Pomerantz, M., Taha, H. (2001). Cool surfaces and shade trees to reduce
energy use and improve air quality in urban areas. Solar Energy, 70, 295-310.
Allen, J. P., Turner, E. (2002). Changing Faces, Changing Places: Mapping
Southern California. Los Angeles: Center for Geographical Studies, California State
University, Northridge.
Anderson, L. M., Cordell, H. K. (1985). Residential property values improved by
landscaping with trees. Southern Journal of Applied Forestry, 9, 162-166.
Anderson, L. M., Cordell, H. K. (1988). Influence of trees on residential property
values in Athens, Georgia (USA): A survey based on actual sales prices. Landscape
and Urban Planning, 12, 153-164.
Anderson, S. T., West, S. E. (2006). Open space, residential property values, and
spatial context. Regional Science and Urban Economics, 36, 773-789.
Anselin, L. (1988a). Lagrange multiplier test diagnostics for spatial dependence and
spatial heterogeneity. Geographical Analysis, 20, 1-17.
Anselin, L. (1988b). Spatial Econometrics: Methods and Models. Dordrecht,
Netherlands: Kluwer Academic.
Anselin, L. (1992). SpaceStat Software for Spatial Data Analysis. Santa Barbara,
California: National Center for Geographical Information and Analysis, University
of California.
Anselin, L. (1998). GIS research infrastructure for spatial analysis of real estate
markets. Journal of Housing Research, 9, 113-133.
Anselin, L. (2001a). Rao's score test in spatial econometrics. Journal of Statistical
Planning and Inference, 97, 113-139.
Anselin, L. (2001b). Spatial effects in econometric practice in environmental and
resource economics. American Journal of Agricultural Economics, 83, 705-710.
Anselin, L. (2002). Under the hood: Issues in the specification and interpretation of
spatial regression models. Agricultural Economics, 27, 247-267.
Anselin, L., Bera, A. K., Florax, R. J. G., Yoon, M. J. (1996). Simple diagnostic tests
for spatial dependence. Regional Science and Urban Economics, 26, 77-104.
Anselin, L., Florax, R. J. G., Rey, S. J. (Eds.) (2004). Advances in Spatial
Econometrics. Berlin: Springer-Verlag.
Anselin, L., Griffith, D. A. (1988). Do spatial effects really matter in regression
analysis? Papers of the Regional Science Association, 65, 11-34.
Anselin, L., Rey, S. J. (1991). Properties of tests for spatial dependence in linear
regression models. Geographical Analysis, 23, 110-131.
Anselin, L., Syabri, I. (2003). GeoDa: Software for Exploratory Spatial Data
Analysis. Urbana-Champaign, IL: Spatial Analysis Laboratory, Department of
Agricultural and Consumer Economics, University of Illinois.
Arguea, N. M., Hsiao, C. (1993). Econometric issues of estimating hedonic price
functions: With an application to the U. S. market for automobiles. Journal of
Econometrics, 56, 243-267.
Bailey, T. C., Gatrell, A. C. (1995). Interactive Spatial Data Analysis. Harlow,
Essex, England: Pearson Education.
Bastian, C. T., McLeod, D. M., Germino, M. J., Reiners, W. A., Blasko, B. J. (2002).
Environmental amenities and agricultural land values: A hedonic model using
geographic information systems data. Ecological Economics, 40, 337-349.
Basu, S., Thibodeau, T. G. (1998). Analysis of spatial autocorrelation in house
prices. Journal of Real Estate Finance and Economics, 17, 61-85.
Bateman, I. J., Jones, A. P., Lovett, A. A., Lake, I. R., Day, B. H. (2002). Applying
Geographic Information Systems to environmental and resource economics.
Environmental and Resource Economics, 22, 219-269.
Baum-Snow, N., Kahn, M. E. (2000). The effects of new public projects to expand
urban rail transit. Journal of Public Economics, 77, 241-263.
Bell, K. P., Bockstael, N. E. (2000). Applying the generalized-moments estimation
approach to spatial problems involving microlevel data. Review of Economics and
Statistics, 82, 72-82.
Benson, E. D., Hansen, J. L., Schwartz Jr., A. L., Smersh, G. T. (1998). Pricing
residential amenities: The value of a view. Journal of Real Estate Finance and
Economics, 16, 55-73.
Bera, A. K., Yoon, M. (1993). Specification testing with locally misspecified
alternatives. Econometric Theory, 9, 649-658.
Beron, K. J., Hanson, Y., Murdoch, J. C., Thayer, M. A. (2004). Hedonic price
function and spatial dependence: implications for the demand for urban air quality.
In L. Anselin, R. J. G. M. Florax, S. J. Rey (Eds.), Advances in Spatial
Econometrics: Methodology, Tools and Application (pp. 267-281). Berlin: Springer.
Bin, O., Polasky, S. (2002). Valuing Coastal Wetlands: A Hedonic Property Price
Approach. Greenville, NC: Department of Economics, East Carolina University.
Bin, O., Polasky, S. (2003). Valuing Inland and Coastal Wetlands in a Rural Setting
Using Parametric and Semi-parametric Hedonic Models. Greenville, NC:
Department of Economics, East Carolina University.
Birkeland, J. (2008). Positive Development: From Vicious Circles to Virtuous Cycles
through Built Environment Design. London: Earthscan.
Blomquist, G., Worley, L. (1981). Hedonic prices, demand for urban housing
amenities, and benefit estimates. Journal of Urban Economics, 9, 212-221.
Bockstael, N., Bell, K. (1998). Land-use patterns and water quality: The effect of
differential land management controls. In R. Just, S. Netanyahu (Eds.), Conflict and
Cooperation on Trans-Boundary Water Resources (pp. 169-192). Boston, MA:
Kluwer Academic Publishers.
Bolitzer, B., Netusil, N. R. (2000). The impact of open spaces on property values in
Portland, Oregon. Journal of Environmental Management, 59, 185-193.
Bowe, D. R., Ihlanfeldt, K. R. (2001). Identifying the impacts of rail transit stations
on residential property values. Journal of Urban Economics, 50, 1-25.
Brown, G. M., Pollakowski, H. O. (1977). Economic valuation of shoreline. The
Review of Economics and Statistics, 59, 272-278.
Brown, T. L., Connelly, N. A. (1983). State Parks and Residential Property Values
in New York. Ithaca, New York: Unpublished paper. Department of Natural
Resources, Cornell University.
Brunsdon, C. F., Fotheringham, A. S., Charlton, M. E. (1996). Geographically
weighted regression: A method for exploring spatial nonstationarity. Geographical
Analysis, 28, 281-298.
Byrne, J., Kendrick, M., Sroaf, D. (2007). The park made of oil: Towards a historical
political ecology of the Kenneth Hahn State Recreation Area. Local Environment,
12, 153-181.
Calderón, G. F.-A. (2009). Spatial regression analysis vs. kriging methods for spatial
estimation. International Advances in Economic Research, 15, 44-58.
Cameron, T. A. (2006). Directional heterogeneity in distance profiles in hedonic
property value models. Journal of Environmental Economics and Management, 51,
26-45.
Can, A. (1990). The measurement of neighborhood dynamics in urban house prices.
Economic Geography, 66, 254-272.
Can, A. (1992). Specification and estimation of hedonic housing price models.
Regional Science and Urban Economics, 22, 453-474.
Can, A., Megbolugbe, I. F. (1997). Spatial dependence and housing price index
construction. Journal of Real Estate Finance and Economics, 14, 203-222.
Casetti, E. (1972). Generating models by the expansion method: Applications to
geographical research. Geographical Analysis, 4, 81-91.
Casetti, E. (1992). The dual expansion method: An application for evaluating the
effect of population growth on development. In E. Casetti, J. P. Jones (Eds.),
Applications of the Expansion Methods (pp. 10-41). London: Routledge.
Casetti, E. (1997). The expansion method, mathematical modeling, and spatial
econometrics. International Regional Science Review, 20, 9-33.
Cleveland, W. S., Devlin, S. J. (1988). Locally weighted regression: An approach to
regression analysis by local fitting. Journal of the American Statistical Association,
83, 596-610.
Cliff, A., Ord, J. (1973). Spatial Autocorrelation. London: Pion.
Cliff, A., Ord, J. (1981). Spatial Processes, Models and Applications. London: Pion.
Conway, D., Li, C. Q., Wolch, J. R., Kahle, C., Jerrett, M. (2008). A spatial
autocorrelation approach for examining the effects of urban greenspace on
residential property values. Journal of Real Estate Finance and Economics,
DOI: 10.1007/s11146-008-9159-6.
Correll, M. R., Lillydahl, J. H., Singell, L. D. (1978). The effect of greenbelts on
residential property values: Some findings on the political economy of open space.
Land Economics, 54, 207-217.
Crompton, J. L. (2001). The impact of parks on property values: A review of the
empirical evidence. Journal of Leisure Research, 33, 1-31.
Crompton, J. L. (2004). The Proximate Principle: The Impact of Parks and Open
Space on Property Values and the Property Tax Base. Ashburn, Virginia: National
Recreation and Park Association.
Cropper, M. L., Deck, L. B., McConnell, K. E. (1988). On the choice of functional
form for hedonic price functions. Review of Economics and Statistics, 70, 668-675.
Daily, G. C. (1997). Nature's Services: Societal Dependence on Natural Ecosystems.
Washington, D.C.: Island Press.
Day, B., Bateman, I. J., Lake, I. (2007). Beyond implicit prices: Recovering
theoretically consistent and transferrable values for noise avoidance from a hedonic
property price model. Environmental and Resource Economics, 37, 211-232.
de Graaff, T., Florax, R. J. G. M., Nijkamp, P., Reggiani, A. (2001). A general
misspecification test for spatial regression models: Dependence, heterogeneity, and
nonlinearity. Journal of Regional Science, 41, 255-276.
de Smith, M. J., Goodchild, M. F., Longley, P. A. (2007). Geospatial Analysis: A
Comprehensive Guide to Principles, Techniques and Software Tools. Leicester, UK:
Troubador Publishing.
Des Rosiers, F., Thériault, M., Kestens, Y., Villeneuve, P. (2002). Landscaping and
house values: An empirical investigation. Journal of Real Estate Research, 23, 139-
161.
Diamond, N. K., Standiford, R. B., Passof, P. C., LeBlanc, J. (1987). Oak trees have
varied effect on land values. California Agriculture, 41, 4-6.
Din, A., Hoesli, M., Bender, A. (2001). Environmental variables and real estate
prices. Urban Studies, 38, 1989-2000.
Ding, Y., Fotheringham, A. S. (1991). The Integration of Spatial Analysis and GIS:
The Development of the STATCAS Module for ARC/INFO. Buffalo, NY: State
University of New York.
Do, A. Q., Grudnitski, G. (1995). Golf courses and residential house prices: An
empirical examination. Journal of Real Estate Finance and Economics, 10, 261-270.
Dombrow, J., Rodriguez, M., Sirmans, C. F. (2000). The market value of mature
trees in single-family housing markets. The Appraisal Journal, 68, 39-43.
Dubin, R. A. (1988). Estimation of regression coefficients in the presence of
spatially autocorrelated error terms. The Review of Economics and Statistics, 70,
466-474.
Dubin, R. A. (1992). Spatial autocorrelation and neighborhood quality. Regional
Science and Urban Economics, 22, 433-452.
Dubin, R. A. (1998). Spatial autocorrelation: A primer. Journal of Housing
Economics, 7, 304-327.
Dubin, R. A., Sung, C. H. (1990). Specification of hedonic regressions: Non-nested
tests on measures of neighborhood quality. Journal of Urban Economics, 27, 97-110.
Dwyer, J. F., McPherson, E. G., Schroeder, H. W., Rowntree, R. W. (1992).
Assessing benefits and costs of the urban forest. Journal of Arboriculture, 18, 227-
234.
Espey, M., Owusu-Edusei, K. (2001). Neighborhood Parks and Residential Property
Values in Greenville, South Carolina. Clemson, South Carolina: Department of
Agricultural and Applied Economics, Clemson University.
Farber, S., Yeates, M. (2006). A comparison of localized regression models in a
hedonic house price context. Canadian Journal of Regional Science, 29, 405-420.
Florax, R. J. G. M., de Graaff, T. (2004). The performance of diagnostic tests for
spatial dependence in linear regression models: A meta-analysis of simulation
studies. In L. Anselin, R. J. G. M. Florax, S. J. Rey (Eds.), Advances in Spatial
Econometrics: Methodology, Tools, and Applications (pp. 29-65). Berlin: Springer.
Florax, R. J. G. M., Folmer, H., Rey, S. J. (2003). Specification searches in spatial
econometrics: The relevance of Hendry's methodology. Regional Science and Urban
Economics, 33, 557-579.
Fotheringham, A. S. (1997). Trends in quantitative methods I: Stressing the local.
Progress in Human Geography, 21, 88-96.
Fotheringham, A. S., Brunsdon, C. F., Charlton, M. E. (2002). Geographically
Weighted Regression: The Analysis of Spatially Varying Relationships. Chichester,
West Sussex; Hoboken, NJ: Wiley.
Fotheringham, A. S., Charlton, M. E. (1998). Geographically weighted regression: A
natural evolution of the expansion method for spatial data analysis. Environment and
Planning A, 30, 1905-1927.
Freeman, A. M., III (1979). Hedonic prices, property values and measuring
environmental benefits: A survey of the issues. Scandinavian Journal of Economics,
81, 154-173.
Gabriel, S., Wolch, J. (1984). Spillover effects of human service facilities in a
racially segmented housing market. Journal of Urban Economics, 11, 1-12.
Garrod, G. D., Willis, K. G. (1992a). The environmental economic impact of
woodland: A two-stage hedonic price model of the amenity value of forestry in
Britain. Applied Economics, 24, 715-728.
Garrod, G. D., Willis, K. G. (1992b). Valuing goods' characteristics: An application
of the hedonic price method to environmental attributes. Journal of Environmental
Management, 34, 59-76.
Gawande, K., Jenkins-Smith, H. (2001). Nuclear waste transport and residential
property values: Estimating the effects of perceived risks. Journal of Environmental
Economics and Management, 42, 207-233.
Gelfand, A. E., Ecker, M. D., Knight, J. R., Sirmans, C. F. (2004). The dynamics of
location in home price. Journal of Real Estate Finance and Economics, 29, 149-166.
Geoghegan, J. (2002). The value of open space in residential land use. Land Use
Policy, 19, 91-98.
Geoghegan, J., Lynch, L., Bucholtz, S. (2003). Capitalization of open space into
housing values and the residential property tax revenue impacts of agricultural
easement programs. Agricultural and Resource Economics Review, 32, 33-45.
Geoghegan, J., Wainger, L. A., Bockstael, N. E. (1997). Spatial landscape indices in
a hedonic framework: An ecological economics analysis using GIS. Ecological
Economics, 23, 251-264.
Getis, A. (1990). Screening for spatial dependence in regression analysis. Papers of
the Regional Science Association, 69, 69-81.
Getis, A. (1995). Spatial filtering in a regression framework: Examples using data on
urban crime, regional equality, and government expenditures. In L. Anselin, R. J. G.
M. Florax (Eds.), New Directions in Spatial Econometrics (pp. 172-188). Berlin:
Springer.
Getis, A., Griffith, D. A. (2002). Comparative spatial filtering in regression analysis.
Geographical Analysis, 34, 130-140.
Getis, A., Mur, J., Zoller, H. G. (Eds.) (2004). Spatial Econometrics and Spatial
Statistics. New York, NY: Palgrave Macmillan.
Getis, A., Ord, J. K. (1992). The analysis of spatial association by the use of distance
statistics. Geographical Analysis, 24, 189-206.
Gillen, K., Thibodeau, T. G., Wachter, S. (2001). Anisotropic autocorrelation in
house prices. Journal of Real Estate Finance and Economics, 23, 5-30.
Glejser, H. (1969). A new test for heteroscedasticity. Journal of the American
Statistical Association, 64, 316-323.
Goldfeld, S. M., Quandt, R. E. (1965). Some tests for homoscedasticity. Journal of
the American Statistical Association, 60, 539-547.
Goodchild, M. F., Haining, R., Wise, S., Arbia, G., Anselin, L., Bossard, E.,
Brunsdon, C. F., Diggle, P., Flowerdew, R., Green, M., Griffith, D. A., Hepple, L.,
Krug, T., Martin, R., Openshaw, S. (1992). Integrating GIS and spatial data analysis:
Problems and possibilities. International Journal of Geographical Information
Systems, 6, 407-423.
Goodman, A. C., Thibodeau, T. G. (1998). Housing market segmentation. Journal of
Housing Economics, 7, 121-143.
Goodman, A. C., Thibodeau, T. G. (2003). Housing market segmentation and
hedonic prediction accuracy. Journal of Housing Economics, 12, 181-201.
Gordon-Larsen, P., McMurray, R. G., Popkin, B. M. (2000). Determinants of
adolescent physical activity and inactivity patterns. Pediatrics, 105, 83-90.
Griffith, D. A. (1987). Spatial Autocorrelation: A Primer. Washington, D.C.:
Association of American Geographers.
Griffith, D. A. (1996). Spatial autocorrelation and eigenfunctions of the geographic
weighting matrix accompanying geo-referenced data. The Canadian Geographer, 40,
351-367.
Griffith, D. A. (2000). A linear regression solution to the spatial autocorrelation
problem. Journal of Geographical Systems, 2, 141-156.
Griffith, D. A. (2003). Spatial Autocorrelation and Spatial Filtering: Gaining
Understanding through Theory and Scientific Visualization. New York: Springer.
Griffith, D. A. (2005). A comparison of six analytical disease mapping techniques as
applied to West Nile Virus in the coterminous United States. International Journal of
Health Geographics, 4, 18.
Haining, R. (2003). Spatial Data Analysis: Theory and Practice. Cambridge:
Cambridge University Press.
Halvorsen, R., Pollakowski, H. O. (1981). Choice of functional form for hedonic
price equations. Journal of Urban Economics, 10, 37-49.
Hammer, T. R., Coughlin, R. E., Horn, E. T., IV (1974). Research report: The effect
of a large park on real estate value. Journal of the American Institute of Planners, 40,
274-277.
Harnik, P. (2000). Inside City Parks. Washington, D.C.: Trust for Public Land and
the Urban Land Institute.
Hastie, T. J., Tibshirani, R. (1986). Generalized additive models (with discussion).
Statistical Science, 1, 297-318.
Hastie, T. J., Tibshirani, R. (1987). Generalized additive models: Some applications.
Journal of the American Statistical Association, 82, 371-386.
Herrick, C. (1939). The effects of parks upon land real estate values. The Planning
Journal, 5, 89-94.
Irwin, E. G. (2002). The effects of open space on residential property values. Land
Economics, 78, 465-480.
Irwin, E. G., Bockstael, N. E. (2001). The problem of identifying land use spillovers:
Measuring the effects of open space on residential property values. American
Journal of Agricultural Economics, 83, 698-704.
Jim, C. Y., Chen, W. Y. (2006). Impacts of urban environmental elements on
residential housing prices in Guangzhou (China). Landscape and Urban Planning,
78, 422-434.
Joassart-Marcelli, P., Wolch, J., Alonso, A., Sessoms, N. (2005). Spatial segregation
of the poor in southern California: A multidimensional analysis. Urban Geography,
26, 587-609.
Kaufman, D. A., Cloutier, N. R. (2006). The impact of small brownfields and
greenspaces on residential property values. Journal of Real Estate Finance and
Economics, 33, 19-30.
Kelejian, H. H., Prucha, I. R. (1998). A generalized spatial two-stage least squares
procedure for estimating a spatial autoregressive model with autoregressive
disturbances. Journal of Real Estate Finance and Economics, 17, 99-121.
Kelejian, H. H., Robinson, D. P. (1998). A suggested test for spatial autocorrelation
and/or heteroskedasticity and corresponding Monte Carlo results. Regional Science
and Urban Economics, 28, 389-417.
Kestens, Y., Thériault, M., Des Rosiers, F. (2004). The impact of surrounding land
use and vegetation on single-family house prices. Environment and Planning B:
Planning and Design, 31, 539-567.
Kestens, Y., Thériault, M., Des Rosiers, F. (2006). Heterogeneity in hedonic
modelling of house prices: Looking at buyers' household profiles. Journal of
Geographical Systems, 8, 61-96.
Kim, C. W., Philipps, T. T., Anselin, L. (2003). Measuring the benefits of air quality
improvement: A spatial hedonic approach. Journal of Environmental Economics and
Management, 45, 24-39.
King, D. A., White, J. L., Shaw, W. W. (1991). Influence of urban wildlife habitats
on the value of residential properties. In L. W. Adams, D. L. Leedy (Eds.), Wildlife
Conservation in Metropolitan Environments: A National Symposium on Urban
Wildlife (pp. 165-169). Columbia, MD: National Institute for Urban Wildlife.
Knaap, G. J. (1985). The price effects of urban growth boundaries in Metropolitan
Portland, Oregon. Land Economics, 61, 26-35.
Lake, I. R., Lovett, A. A., Bateman, I. J., Langford, I. H. (1998). Modeling
environmental influences on property prices in an urban environment. Computers,
Environment and Urban Systems, 22, 121-136.
Lambert, D. M., Lowenberg-DeBoer, J., Bongiovanni, R. (2004). A comparison of
four spatial regression models for yield monitor data: A case study from Argentina.
Precision Agriculture, 5, 576-600.
Lancaster, K. J. (1966). A new approach to consumer theory. Journal of Political
Economy, 74, 132-157.
Le Goffe, P. (2000). Hedonic pricing of agriculture and forestry externalities.
Environmental and Resource Economics, 15, 397-401.
Leggett, C. G., Bockstael, N. E. (2000). Evidence of the effects of water quality on
residential land prices. Journal of Environmental Economics and Management, 39,
121-144.
LeSage, J. P. (2000). Bayesian estimation of limited dependent variable spatial
autoregressive models. Geographical Analysis, 32, 19-35.
Li, M. M., Brown, H. J. (1980). Micro-neighborhood externalities and hedonic
housing prices. Land Economics, 56, 125-141.
Lindsey, G., Man, J., Payton, S., Dickson, K. (2004). Property values, recreation
values and urban greenways. Journal of Parks and Recreation Administration, 22,
69-90.
Longcore, T., Li, C. Q., Wilson, J. P. (2003). Nature's services in a dense urban
neighborhood. Los Angeles, CA: University of Southern California Center for
Sustainable Cities.
Longcore, T., Li, C. Q., Wilson, J. P. (2004). Application of CITYgreen urban
ecosystem analysis software to a densely built urban neighborhood. Urban
Geography, 25, 173-186.
Luley, C. J. (1998). The greening of urban air. Forum for Applied Research and
Public Policy, 13, 33-35.
Luttik, J. (2000). The value of trees, water and open space as reflected by house
prices in the Netherlands. Landscape and Urban Planning, 48, 161-167.
Lutzenhiser, M., Netusil, N. R. (2001). The effect of open space on a home’s sale
price. Contemporary Economic Policy, 19, 291-298.
Maddison, D. (2000). A hedonic analysis of agricultural land prices in England and
Wales. European Review of Agricultural Economics, 27, 519-532.
Mahan, B. L., Polasky, S., Adams, R. M. (2000). Valuing urban wetlands: A
property pricing approach. Land Economics, 76, 100-113.
McConnell, V., Walls, M. (2005). The Value of Open Space: Evidence from Studies
of Nonmarket Benefits. Washington, D.C.: Resources for the Future.
McPherson, E. G. (1992). Accounting for benefits and costs of urban greenspace.
Landscape and Urban Planning, 22, 41-51.
Militino, A. F., Ugarte, M. D., García-Reinaldos, L. (2004). Alternative models for
describing spatial dependence among dwelling selling prices. Journal of Real Estate
Finance and Economics, 29, 193-209.
Miron, J. (1984). Spatial autocorrelation in regression analysis: A beginner’s guide.
In G. L. Gaile, C. J. Willmott (Eds.), Spatial Statistics and Models (pp. 201-222).
Boston: D. Reidel Publishing.
Mooney, S., Eisgruber, L. M. (2001). The influence of riparian protection measures
on residential property values: The case of the Oregon Plan for salmon and
watersheds. Journal of Real Estate Finance and Economics, 22, 273-286.
Morales, D., Boyce, B. N., Favretti, R. J. (1976). The contribution of trees to
residential property value: Manchester, Connecticut. Valuation, 23, 26-43.
Morales, D., Micha, F. R., Weber, R. C. (1983). Two methods of evaluating trees on
residential site. Journal of Arboriculture, 9, 21-24.
Morancho, A. B. (2003). A hedonic valuation of urban green areas. Landscape and
Urban Planning, 66, 35-41.
Nelson, A. C. (1986). Using land markets to evaluate urban containment programs.
Journal of the American Planning Association, 52, 156-171.
Nicholls, S. (2002). Does Open Space Pay? Measuring the Impacts of Green Spaces
on Property Values and the Property Tax Base. Ph.D. Dissertation. College Station,
Texas: Texas A&M University.
Nowak, D. J., Civerolo, K. L., Rao, S. T., Sistla, G., Luley, C. J., Crane, D. E.
(2000). A modeling study of the impact of urban trees on ozone. Atmospheric
Environment, 34, 1601-1613.
Odland, J. (1988). Spatial Autocorrelation. Newbury Park, California: Sage
Publications.
Openshaw, S., Brunsdon, C. F., Charlton, M. E. (1991). A spatial analysis toolkit for
GIS. In Proceedings of the Second European Conference on Geographic Information
Systems, Brussels, Belgium.
Orford, S. (2002). Valuing locational externalities: A GIS and multilevel modeling
approach. Environment and Planning B: Planning and Design, 29, 105-127.
Pace, R. K., Barry, R. F. (1997). Quick computation of regressions with a spatial
autoregressive dependent variable. Geographical Analysis, 29, 232-247.
Pace, R. K., Barry, R. F., Clapp, J. M., Rodriguez, M. (1998). Spatiotemporal
autoregressive models of neighborhood effects. Journal of Real Estate Finance and
Economics, 17, 15-33.
Pace, R. K., Gilley, O. W. (1997). Using the spatial configuration of the data to
improve estimation. Journal of Real Estate Finance and Economics, 14, 333-340.
Pace, R. K., LeSage, J. P. (2004). Spatial autoregressive local estimation. In A. Getis,
J. Mur, H. G. Zoller (Eds.), Spatial Econometrics and Spatial Statistics (pp. 31-51).
New York: Palgrave Macmillan.
Páez, A., Long, F., Farber, S. (2008). Moving window approaches for hedonic price
estimation: An empirical comparison of modeling techniques. Urban Studies, 45,
1565-1581.
Palmquist, R. B. (2005). Property value models. In K.-G. Mäler, J. R. Vincent (Eds.),
Handbook of Environmental Economics, Volume 2: Valuing Environmental Changes
(pp. 764-820). North Holland: Elsevier.
Paterson, R. W., Boyle, K. J. (2002). Out of sight, out of mind? Using GIS to
incorporate visibility in hedonic property value models. Land Economics, 78, 417-
425.
Patton, M., McErlean, S. (2003). Spatial effects within the agricultural land market
in Northern Ireland. Journal of Agricultural Economics, 54, 35-54.
Pebesma, E., Wesseling, C. G. (1998). Gstat: A program for geostatistical modeling,
prediction, and simulation. Computers and Geosciences, 24, 17-31.
Peiser, R., Schwann, G. (1993). The private value of public open space within
subdivisions. Journal of Architectural and Planning Research, 10, 91-104.
Pendleton, L., Shonkwiler, J. S. (2001). Valuing bundled attributes: A latent variable
approach. Land Economics, 77, 118-129.
Pogodzinski, J. M., Sass, T. R. (1991). Measuring the effects of municipal zoning
regulations: A survey. Urban Studies, 28, 597-621.
Powe, N. A., Garrod, G. D., Brunsdon, C. F., Willis, K. G. (1997). Using a
Geographic Information System to estimate an hedonic model of the benefits of
woodland access. Forestry, 70, 139-150.
Ready, R., Abdalla, C. (2003). The Impact of Open Space and Potential Local
Disamenities on Residential Property Values in Berks County, Pennsylvania.
University Park, PA: Department of Agricultural Economics and Rural Sociology,
Pennsylvania State University.
Redfearn, C. L. (2009). How informative are average effects? Hedonic regression
and amenity capitalization in complex urban housing markets. Regional Science and
Urban Economics, 39, 297-306.
Ridker, R. G., Henning, J. A. (1967). The determinants of residential property values
with special reference to air pollution. The Review of Economics and Statistics, 49,
246-257.
Rosen, S. (1974). Hedonic prices and implicit markets: Product differentiation in pure
competition. Journal of Political Economy, 82, 34-55.
Scott, K. I., Simpson, J. R., McPherson, E. G. (1999). Effects of tree cover on
parking lot microclimate and vehicle emissions. Journal of Arboriculture, 25, 129-
142.
Shultz, S. D., King, D. A. (2001). The use of census data for hedonic price estimates
of open space amenities and land use. Journal of Real Estate Finance and
Economics, 22, 239-252.
Standiford, R. B., Vreeland, J., Tietje, B. (2001). California’s Hardwood
Rangelands: Production and Conservation Values. Berkeley, California: University
of California Integrated Hardwood Range Management Program, UC Berkeley.
Taylor, L. O. (2003). The hedonic method. In P. A. Champ, K. J. Boyle, T. C. Brown
(Eds.), A Primer on Nonmarket Valuation (pp. 331-393). Dordrecht: Kluwer
Academic Publishers.
Theebe, M. A. J. (2004). Planes, trains, and automobiles: the impact of traffic noise
on house prices. Journal of Real Estate Finance and Economics, 28, 209-234.
Thibodeau, F. R., Ostro, B. D. (1981). An economic analysis of wetland protection.
Journal of Environmental Management, 12, 19-30.
Thorsnes, P. (2002). The value of a suburban forest preserve: Estimates from sales of
vacant residential building lots. Land Economics, 78, 426-441.
Tiefelsdorf, M., Griffith, D. A. (2007). Semi-parametric filtering of spatial
autocorrelation: The eigenvector approach. Environment and Planning A, 39, 1193-
1221.
Troy, A., Grove, J. M. (2008). Property values, parks, and crime: A hedonic analysis
in Baltimore, MD. Landscape and Urban Planning, 87, 233-245.
Tyrväinen, L. (1997). The amenity value of the urban forest: An application of the
hedonic price method. Landscape and Urban Planning, 37, 211-222.
Tyrväinen, L., Miettinen, A. (2000). Property prices and urban forest amenities.
Journal of Environmental Economics and Management, 39, 205-223.
Voicu, I., Been, V. (2008). The effect of community gardens on neighboring
property values. Real Estate Economics, 36, 241-283.
Vrooman, D. H. (1978). An empirical analysis of determinants of land values in the
Adirondack Park. American Journal of Economics and Sociology, 37, 165-177.
Waller, L. A., Zhu, L., Gotway, C. A., Gorman, D. M., Gruenewald, P. J. (2007).
Quantifying geographic variations in associations between alcohol distribution and
violence: A comparison of geographically weighted regression and spatially varying
coefficient models. Stochastic Environmental Research and Risk Assessment, 21,
573-588.
Weicher, J. C., Zerbst, R. H. (1973). The externalities of neighborhood parks: An
empirical investigation. Land Economics, 49, 99-105.
Wilhelmsson, M. (2002). Spatial models in real estate economics. Housing, Theory
and Society, 19, 92-101.
Wise, S., Haining, R., Ma, J.-S. (2001). Providing spatial statistical data analysis
functionality for the GIS user: The SAGE project. International Journal of
Geographical Information Science, 15, 239-254.
Wolch, J., Gabriel, S. (1981). Local land development policies and urban housing
prices. Environment and Planning A, 13, 1253-1276.
Wolch, J., Wilson, J., Fehrenbach, J. (2002). Parks and Park Funding in Los
Angeles: An Equity Mapping Analysis. Los Angeles, CA: University of Southern
California Center for Sustainable Cities.
Yu, D. (2004). Modeling housing market dynamics in the city of Milwaukee: A
geographically weighted regression approach. In Proceedings of the University
Consortium for Geographic Information Science Assembly, Adelphi, Maryland.
Abstract
Environmental economists have widely assessed the value of various types of open space as ‘capitalized value’ in properties. Urban greenspace, and neighborhood greenspace in particular, has received limited attention. Hedonic models are commonly used to estimate such values, but open space studies often fail to account for spatial effects. The spatial econometrics and spatial statistics literatures have made considerable advances in incorporating spatial effects into hedonic models. This dissertation applies several of these advances to hedonic models that include neighborhood greenspace characteristics, using GIS as the platform for the spatial analyses. This approach supports more accurate estimation of the value of neighborhood greenspace and furthers understanding of how well these spatial models perform in producing such estimates.
Asset Metadata
Creator: Li, Qi Christina (author)
Core Title: Neighborhood greenspace's impact on residential property values: understanding the role of spatial effects
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Geography
Publication Date: 03/02/2010
Defense Date: 12/04/2009
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: GIS, hedonic pricing model, neighborhood greenspace, OAI-PMH Harvest, property values, spatial effects, urban open space
Place Name: California (states), Los Angeles (counties)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Wilson, John P. (committee chair), Curtis, Andrew J. (committee member), Redfearn, Christian L. (committee member)
Creator Email: lhq0837@yahoo.com, qli@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m2861
Unique identifier: UC1217451
Identifier: etd-Li-3423 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-295870 (legacy record id), usctheses-m2861 (legacy record id)
Legacy Identifier: etd-Li-3423.pdf
Dmrecord: 295870
Document Type: Dissertation
Rights: Li, Qi Christina
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu