Essays on Price Determinants in the Los Angeles Housing Market
by
Shichen Wang
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(Economics)
May 2020
Copyright 2020 Shichen Wang
Dedication
This dissertation is dedicated . . .
To my grandmother.
Acknowledgements
I would like to express my deepest appreciation to my advisor and committee chair, Professor
Geert Ridder, for the valuable guidance and feedback throughout the project. He has always been
approachable and supportive in the last few years. It would not have been possible to complete
this dissertation without his mentoring.
I would like to extend my sincere thanks to my committee members, Andrii Parkhomenko and
Fanny Camara. This dissertation benefited greatly from their insightful suggestions to find new
evidence and think through a different aspect of the problem.
I would like to thank Guofu Tan and Matthew Khan, who served on my qualifying committee.
I am also grateful to Jonathan Libgober, who has kindly spent time discussing the project and
provided feedback.
I would like to thank all of my colleagues at USC, especially Jisu Cao and Yu Cao, for their
encouragement and patience throughout the duration of this project. I would not have made it
this far without the support from my friends, Ji Li and Siqi Wu.
I very much appreciate the support from the economics department staff, Young Miller, Morgan
Ponder, and Alexander Karnazes. Thank you for organizing everything so well.
Lastly but most importantly, I would like to thank my fiancé, Nian Liu. Nian has always been
there when I need him. I cannot count the times when his unconditional love and belief saved me
from my uncertainty about life. Thank you for the constant encouragement during the challenges
of graduate school. I'm truly thankful to have you in my life. I am also very grateful to my
parents and my grandfather, who raised me to be the person I am with their respect and care.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: List Price as a Strategic Determinant of Sales Price
  1.1 Introduction
  1.2 Literature
  1.3 The Model
    1.3.1 Realization of Transaction Price
      1.3.1.1 1-Visitor
      1.3.1.2 n ≥ 2 visitors
      1.3.1.3 Summary of possible outcomes
    1.3.2 Buyer's Visiting Decision
      1.3.2.1 Influence of bargaining power
      1.3.2.2 Influence of listing price
    1.3.3 Seller's Problem
  1.4 Empirical Evidence
    1.4.1 Data
    1.4.2 Regression Strategy
    1.4.3 Results
      1.4.3.1 Potential buyers on absolute list price
      1.4.3.2 Potential buyers on listing low
      1.4.3.3 Sale price on number of potential buyers
  1.5 Conclusion
Chapter 2: Heterogeneity in Price Determinants Across Areas
  2.1 Introduction
  2.2 Literature
  2.3 The Conditional Inference Tree Algorithm
    2.3.1 Theoretical foundation
    2.3.2 The CTREE algorithm
      2.3.2.1 Traditional decision tree approach
      2.3.2.2 Conditional inference trees
  2.4 Data
  2.5 Results in the LA Housing Market
  2.6 Conclusion
Bibliography
Appendix A: Supplementary Material for Chapter 2
List of Tables
1.1 Above, at, and below listing sales by year in the Los Angeles County
1.2 Descriptive statistics
1.3 Regression results with absolute list price
1.4 Regression results with the list low strategy
1.5 Regression Results with number of potential buyers
1.6 Effects of potential buyers across years
2.1 Descriptive statistics
List of Figures
1.1 Percentage of listings with sales price higher than asking price.
1.2 Realization of transaction prices, based on valuation of potential buyers.
1.3 Demand index in the Los Angeles county housing market
2.1 Apps' tree structure of housing demand.
2.2 LA area's most expensive neighborhoods in 2018.
2.3 Regions in LA county from US Map Guide
2.4 Merged Regions in LA county
2.5 Sample Conditional Inference Tree for Central LA Area
2.6 Most important variables in LA county by area
Abstract
This dissertation explores the price determinants in the Los Angeles County housing market.
The first chapter considers whether sellers strategically set the asking price based on market
condition in order to bring competition to home buying. The motivation of this chapter is that
houses sell both above and below the listing price, yet the asking price has historically been
considered as a ceiling of the transaction price. This chapter proposes a search model where the
buyers make the decision to visit a property based on the observed listing price and establishes
a one-to-one relationship between the list price and the number of visitors. The seller is then
able to maximize the expected profit over the list price. The chapter also performs a series
of empirical estimates to test the model. The data contains detailed information about homes
sold in Los Angeles County between 2012 and 2018. We use the online viewing activities on
Redfin, a brokerage and information platform, as a proxy for the number of potential buyers and
measure the seller's listing strategy by comparison with prices of historically similar listings.
The regression results show that there is a negative correlation between the high list price and
the number of potential buyers, supporting our theoretical conclusion that the number of visitors
decreases in the list price. We further examine the effects of online interest on the sale price. The
Los Angeles County data show that more potential buyers are associated with higher sale prices,
but the extent to which the number of potential buyers affects sale prices could be different when
market conditions change. This result coincides with our conclusion in the theoretical part in that
the optimal number of visitors and list prices should be dependent on buyer conditions, that is,
the value distribution and bargaining power. In conclusion, strategic listing by sellers is an
important determining factor of the number of potential buyers and the sale price.
The second chapter studies the heterogeneity of how housing characteristics affect sales price
across different areas. The research question is motivated by the price clustering of properties in
Los Angeles County. One hypothesis that could explain the phenomenon is that the characteristics
of homes may have different levels of importance in determining the sales price. To examine
this hypothesis, we utilize the conditional inference tree (CTREE) approach, an established machine
learning technique that has been applied to the stock market and medical research but not
the housing market. The data collected from MLS in the LA County area contain detailed attributes
associated with each listing and enable us to apply the CTREE algorithm to build trees
for the analysis of price determination. We partition Los Angeles County into different regions
according to the geographical neighborhoods defined by an online map guide. Moreover, in order
to ensure an ample number of entries in each dataset to implement the CTREE algorithm,
we merge some of the regions defined in the map and obtain seven districts in Los Angeles
County, each associated with close to or over 10,000 listings that were sold between January 2012
and January 2018. Growing trees and analyzing the importance of variables in all of the seven
districts result in a prediction of per-square-foot housing prices and each variable's contribution
to the sales price. By comparing the importance of variables among all trees, it can be seen that
the housing characteristics do have diversified influences on the sales price across different regions.
This empirical result from the MLS data supports our hypothesis.
Chapter 1
List Price as a Strategic Determinant of Sales Price
1.1 Introduction
Housing is one of the most important markets in the U.S. and the rest of the world. According
to the National Association of Home Builders, housing generally accounts for 15-18% of national
GDP. While investment in a property could be one of the most financially important decisions
for a household to make, firms and government institutions also invest heavily in the housing
market. Therefore, studies of this market and its fluctuations have been very popular in the fields
of political science, urban planning, and economics. In particular, pricing and its underlying
mechanisms have drawn great interest among researchers in economics, where a large body
of academic and empirical investigation typically involves the pricing of sellers, searching and
negotiation of buyers, and efforts of real-estate agents.
The housing market differs from retail goods markets in that the transaction price can be
significantly different from the posted asking price. This is a result of how properties are sold
in the U.S., where in a common home selling process, both the buyer and the seller work with
their agents. The seller's agent helps the client post the property of interest to the market and
the buyer's agent provides the buyer a list of available properties in the appropriate budget range
and preferred locations. After the buyer visits the property and becomes interested, the buyer
would then have the buying agent write an offer to the seller, which may or may not be the
same as the price listed on the seller's post. In turn, the seller could then either accept the
offer or make a counteroffer, resulting in the possibility for the transaction price to be different
from the listing price. Historically, the negotiated transaction prices were typically lower than
the posted price. For instance, in Los Angeles during the 1990s, the market generally favored
the buyers as sellers were desperate to sell (Los Angeles Times, 1997). However, things have
changed in recent years with the housing boom, leading to a major hike in prices and giving sellers
more power in the bargaining game. Recent trends in the current decade have seen transaction
prices being close to or occasionally even exceeding the listing prices, with the fast-growing Los
Angeles housing market being a prime example. As shown in Table 1.1, the percentage of listings
sold above the asking price in recent years has stayed around 35 percent to 45 percent, which
is a surprisingly high level by previous standards. Similarly, we also observe high percentages
of homes sold at prices above the listing in other housing markets (Figure 1.1). Overall, across
the nation, more than 20% of the properties sold have their transaction prices higher than the
listing prices, and in the hottest markets such as Seattle and Boston, the percentage has been
well over 50% throughout the recent 3 years. This phenomenon is very different from what has
been widely discussed in traditional literature, which normally assumes the listing price to be
the ceiling of transaction prices. Therefore, the goal of this paper is to reformulate the buyer-seller
relationship in the housing market without the assumptions that have prevailed in previous
studies. A discussion of the strategic behaviors of home buyers and sellers with the possibility of
observing transaction prices higher than the listing prices will also be presented.

Table 1.1: Above, at, and below listing sales by year in the Los Angeles County

             2012    2013    2014    2015    2016    2017    2018
Above        22.9%   44.9%   33.1%   35.8%   37.0%   41.2%   36.7%
At listing   31.4%   10.8%   10.7%   11.3%   11.4%   11.6%   12.3%
Below        45.7%   44.3%   56.3%   52.9%   51.6%   47.2%   51.0%

Figure 1.1: Percentage of listings with sales price higher than asking price.
Source: Redfin Data Center.

In the U.S., rich data documenting the details of home sales are available. When the seller/agent posts a house
to the market, in addition to the traditional methods such as street signs and local newspapers,
the fast-growing Internet also helps spread the relevant information. The U.S. Multiple Listing
Services (MLS) and other online platforms such as Zillow and Realtor.com are becoming the most
popular tools for home buyers to search. Among these online agencies, Redfin, a large online
real estate search platform that also provides home-buying services, reports its research on the
demand of buyers. Redfin presents a "demand index", which is calculated based on the number
of customers requesting to visit the properties and the number of purchasing offers made in each
market, and it indicates that the demand in November 2017 was 74 percent higher than that in
January 2014. The fast-increasing demand could be a reason that properties are sold above
the listing prices and hence under these circumstances the buyer activities tend to go beyond
negotiation and bargaining. In this paper, I will present a model that enables both negotiation
and bidding behavior between the buyers and the seller, through which prices below, at, and
above listing are made possible.
The rest of the paper is organized as follows. In the next section, we will discuss the previous
work that has been done on home search and price negotiation. Afterwards, the theoretical model
of the buyer's and seller's optimizing strategies will be illustrated. Then in the fourth section, we
will present the data and the reduced form regression results followed by the last section which
concludes this paper.
1.2 Literature
As mentioned previously, traditional literature has largely focused on the negotiation case,
where the listing price is a ceiling of the transaction price. Moreover, a large portion of literature
assumes buyers to arrive one by one during property visits, and hence it is meaningful to consider
the time on market (i.e., the number of days the property stays on the market). For
instance, Yavas and Yang (1995), Knight (2002), Anglin, Rutherford, and Springer (2003) and
many others examined the trade-off between selling a house fast versus at a good price. Other
researchers have also studied how sellers revise their list price as time progresses. Carrillo (2012,
2013) used a dynamic model, combined with time-on-market data, to structurally estimate the
bargaining power of the sellers. More recently, Merlo et al. (2015) made use of data on offers
made by buyers and seller revisions of asking price to model the negotiation process.
Literature has also considered the scenario in which buyers' arrival decision depends on information
about the listing. The role of listing price is explored under this setting, and most literature
studies its relationship with the buyer's arrival rate. For example, in Chen and Rosenthal's paper
(1996), the list price influences the buyer's searching process. If the seller sets the listing price to
be fairly low, buyers are more likely to visit the property because the seller has committed to a
low ceiling price. Arnold (1999) also built a model where the list price affects the arrival rate of
buyers.
In recent years, efforts have been made to explore the mechanism behind transactions with
higher sales price than the list price. Albrecht, Gautier and Vroman (2012) offered a new model
where sellers are not committing to a ceiling when setting the list price. Rather, sellers use the
asking price to attract buyers and make only limited commitments to the asking price. Meanwhile,
if the seller receives multiple offers at the same time, the buyers can bid against each other to buy
the house. As a result, this model is consistent with the observation that houses are occasionally
sold above, below, or at the asking price. A key observation in the model presented by the authors
is that it is sometimes equivalent for the seller to set a high price for the buyer to negotiate down
or to set a low price to attract bidders. As a result, the asking price is indeterminate for buyers
with high or low reservation values, giving rise to difficulties in empirically estimating the model.
Han and Strange continued a series of theoretical and empirical research on the bidding war
for houses. In their empirical work in 2014, Han and Strange analyzed the National Association of
Realtors' survey data and confirmed that Albrecht, Gautier and Vroman's model was consistent
with the data. They then went on to explore the possible factors for the bidding war to occur,
which included restrictions on land use, irrational consumer behavior, more frequent usage of the
Internet, and employment of both buying and selling agents. Later in 2016, Han and Strange
developed a model where the asking price influences the buyers' search of the property and the
sales price, but it is neither a binding commitment nor a ceiling of the transaction price. They
assumed the buyer to be either a high or a low type, who will visit a property only if the expected
utility of visiting exceeds the cost. The model showed that lower asking prices attract more
visitors, but only up to a certain point. Consequently, sales price can be above, at, or below the
list price, which is consistent with the data and Albrecht, Gautier, and Vroman's model.
This paper builds on Han and Strange's simple yet effective theoretical model, where we seek
to relax the assumption that buyers are limited to two types. With continuous valuation of
buyers, the bidding war does not yield a deterministic transaction price as in Han and Strange's
scenario. The key difference is that since buyers participating in the bidding process do not
necessarily have the same value, how others value the property matters to the highest bidder.
Accordingly, the buyer's searching and the seller's listing strategy change with respect to the
expected transaction price.
1.3 The Model
This section specifies a model that describes the buyer's search with visiting costs as well
as the seller's listing decision based on the buyer's searching response. This model is consistent
with Han and Strange in the sense that the resulting sales price of a house is sometimes above,
sometimes below, and sometimes at the asking price.
In the simplest fashion, the seller and the buyer play a one-period game. The seller first sets a
listing price, $P_L$, and the potential buyers decide whether to visit upon observing the
listing price and the basic characteristics of the property. When the buyer visits the property, the
number of visitors is known to the seller and all other visitors. The buyer also learns how she and
others like the house after visiting it. Based on the information learned from this visit, the buyer
can make the decision to submit an offer to the seller. The seller is committed to sell the property
as long as there is an offer above a reservation value, which indicates that the house sells even if
the offers received are all below the list price.
We now define some basic variables. The seller has a reservation value of $V_S$. We assume that
$V_S$ is given by nature and is known to both the seller and the buyer when the list price is set. The
buyers' valuation of the property, $V_B$, consists of two parts: a "base" value, $b_0$, that depends only
on the characteristics of the property and is known before a visit, and a heterogeneous value, $b_u$,
that reflects the buyers' preference and is realized only after the visit is made. Every visitor
also learns about other people's valuation upon visiting the property. The buyer's preference value
follows a certain distribution that is known to both the buyer and the seller, and we denote it as
$G(\cdot)$. To sum up, $V_B = b_0 + b_u$, $b_u \sim G(\cdot)$. We are now ready to discuss the different
scenarios in which one or more visitors may show up to see the property.
1.3.1 Realization of Transaction Price
1.3.1.1 1-Visitor
If there is only one visitor to the property, she will engage in the bargaining process with the
seller if her realized value of the property is no lower than the seller's reservation value, that is,
$V_B \geq V_S$. The result of the negotiation depends on the bargaining power of the seller, which is
denoted as $\theta$. If the seller and the buyer engage in standard Nash bargaining, the equilibrium
price of the house should be $\theta V_B + (1-\theta) V_S$. Notice, however, that if the asking price $P_L$ is low
enough such that accepting the list price makes a better deal than bargaining, it could be a binding
ceiling of the sales price. This will be the case if $P_L \leq \theta V_B + (1-\theta) V_S$, in which case the buyer will
accept the asking price rather than bargain with the seller, because her bargaining power has been
fully exerted. As a result, the transaction price $P$ would equal the listing price $P_L$. Otherwise,
the transaction price would be the outcome of the bargain, that is, $P = \theta V_B + (1-\theta) V_S$. To sum
up,
$$
P =
\begin{cases}
P_L, & P_L \leq \theta V_B + (1-\theta) V_S \\
\theta V_B + (1-\theta) V_S, & \text{otherwise}
\end{cases}
$$
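As a minimal illustrative sketch of the single-visitor rule above, the Python function below returns the list price when it binds and the Nash bargaining price otherwise. The function name, the argument names, and the numerical values are hypothetical and chosen only for illustration; `theta` stands for the seller's bargaining power.

```python
def one_visitor_price(p_list, v_buyer, v_seller, theta):
    """Transaction price with a single visitor, or None if no trade occurs.

    theta is the seller's bargaining power in [0, 1] (illustrative name).
    """
    if v_buyer < v_seller:
        return None  # no gains from trade, so no bargaining takes place
    bargained = theta * v_buyer + (1 - theta) * v_seller
    # The list price acts as a ceiling only when it is below the bargained price.
    return min(p_list, bargained)


# Hypothetical numbers: the same house and buyer under two bargaining powers.
print(one_visitor_price(p_list=500, v_buyer=600, v_seller=400, theta=0.75))  # 500
print(one_visitor_price(p_list=500, v_buyer=600, v_seller=400, theta=0.25))  # 450.0
```

With a strong seller (`theta = 0.75`) the bargained price of 550 exceeds the list price, so the buyer simply accepts the list price; with a weak seller (`theta = 0.25`) the buyer bargains the price down to 450.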
1.3.1.2 n ≥ 2 visitors
Suppose now that more than one visitor shows up. As discussed before, the visitors have a heterogeneous
preference value that is drawn after the visit, and visitors may very likely have different
valuations of the property. Let the values of the visitors be ranked from high to low and denoted
as $V_{B1}, V_{B2}, \ldots, V_{Bn}$. These realized values may be higher than, lower than, or the same as the
list price. At the price determination stage, depending on visitors' values, there may be three
different cases.
Scenario 1: $V_B \leq P_L$ for all visitors
This is the traditional bargaining case discussed in the previous literature. Since all the visitors
have values lower than the list price, it serves as the ceiling of the sales price. Moreover, although
all visitors have the chance to make an offer and bargain with the seller, only the visitor with the
highest value gets to buy the house, with the price she has to pay left for determination. Similar
to the 1-visitor case, the equilibrium bargaining result between the highest-value visitor and the seller
is $P = \theta V_{B1} + (1-\theta) V_S$. However, if the bargained price is smaller than the second highest value,
this will not be the equilibrium price of the game. The visitor with the second highest value could
offer a price slightly higher than the bargained price and still get a positive utility from buying
the property. Therefore, the only outcome that no one will deviate from is that the visitor with
the highest value offers a price that is exactly the same as the second highest value and gets to
buy the property. The realized transaction price would then be equal to the second highest value.
To sum up what we have discussed in this paragraph,
$$
P = \max\{\theta V_{B1} + (1-\theta) V_S,\; V_{B2}\}
$$
Scenario 2: $V_{B1} > P_L$ and $V_{B2}, V_{B3}, \ldots, V_{Bn} \leq P_L$
In this case, only one visitor has a higher realized value than the asking price. This visitor has the
chance to bargain with the seller and purchase the property at price $P = \theta V_{B1} + (1-\theta) V_S$.
However, the story could be different since both the second highest value and the listing price will
affect how the sales price is determined. First, if the asking price is lower than the bargained price,
the buyer will be better off simply accepting the asking price. Notice that the second highest value
is known to be smaller than the asking price, therefore no one else would be willing to compete
with the top visitor. In the case that the asking price is not binding, similar to the previous case,
the visitor with the top value would be forced to offer the second highest value in order to buy
the property if the second highest value is higher than the bargained price. That is,
$$
P =
\begin{cases}
P_L, & P_L \leq \theta V_{B1} + (1-\theta) V_S \\
\max\{\theta V_{B1} + (1-\theta) V_S,\; V_{B2}\}, & \text{otherwise}
\end{cases}
$$
Thus far, we have not seen any scenario in which the sales price can be higher than the asking price.
When the values drawn are low compared to the list price, even if there is more than
one visitor and there may be competition between buyers, the list price is still likely a ceiling of
the sales price. Things could be different if multiple visitors have higher realized values than the
asking price, and starting from here this paper differs from the traditional literature.
Scenario 3: $V_{B2} > P_L$
This is the bidding war case. Two or more visitors have values above the listing price and they
would engage in a first-price sealed-bid auction. The asking price is no longer a commitment of
the seller since the sales price can be higher. The equilibrium outcome of this scenario is that the
visitor with the highest value gets the property and pays the second highest value. That is,
$$
P = V_{B2}
$$
This is essentially a winner's curse to the person buying this property. She had the opportunity
to accept the asking price or bargain with the seller if others' values are not high enough to threaten
her chance of buying the house. Although competition from other visitors also appears when
others have lower values, it is particularly severe for the visitor with the highest value when more
people value the property higher than the asking price, because it completely cuts the chance for
the buyer to pay any price lower than the posted price.
1.3.1.3 Summary of possible outcomes
As discussed in the previous sections, depending on the realization of the visitors' values,
there could be three different prices if the transaction occurs. When all the visitors have drawn
relatively low values compared to the list price, it is impossible for the transaction price to be
higher than the asking price. However, if there is competition among visitors, specifically if the
bargained price is low enough such that the visitor with the second highest value could gain from
offering the same bargained price as the person with the highest value, bargaining cannot be sustained
as an equilibrium and the buyer is forced to pay the second highest value. Without the threat of
the other visitors, the buyer with the highest valuation of the property will have the opportunity
to consider accepting the asking price, given that it makes her better off compared to bargaining.
Figure 1.2: Realization of transaction prices, based on valuation of potential buyers.
To sum up, if we know the asking price $P_L$ and the top two realizations of buyers' values, $V_{B1}$ and
$V_{B2}$, we can determine whether there is a bidding war and we can pin down the transaction price.
The three possible outcomes of the transaction process are: $P = P_L$, $\theta V_{B1} + (1-\theta) V_S$, or $V_{B2}$.
Figure 1.2 summarizes the different scenarios we have discussed so far and the corresponding
realized sales prices.
Note that this model setting has enabled prices below, at, and above listing price to be realized,
which adds to the traditional literature in the sense that we allow for auction when valuations are
high without omitting the possibility of a bargaining-down process. The result from this model
agrees with the theoretical work of Albrecht, Gautier, and Vroman (2012) as well as from Han
and Strange (2016), and is consistent with the data, which shows a large portion of above-listing
sales.
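The three scenarios can be collected into a single pricing rule. The Python sketch below is one hedged way to do so; the function name, the outcome labels, and the numerical values are hypothetical and not part of the model itself, and `theta` again denotes the seller's bargaining power.

```python
def transaction_price(p_list, values, v_seller, theta):
    """Return (price, outcome) implied by the scenarios above, or (None, "no sale")."""
    if not values:
        return None, "no sale"
    ranked = sorted(values, reverse=True)
    v1 = ranked[0]
    if v1 < v_seller:
        return None, "no sale"
    # With a single visitor there is no competing value.
    v2 = ranked[1] if len(ranked) > 1 else float("-inf")
    bargained = theta * v1 + (1 - theta) * v_seller
    if v2 > p_list:
        return v2, "bidding war"            # Scenario 3: price above listing
    if p_list <= bargained:
        return p_list, "list price accepted"  # list price is a binding ceiling
    if v2 > bargained:
        return v2, "second-highest value"   # competition rules out the bargain
    return bargained, "bargained"


# Hypothetical house: list price 500, seller reservation value 400, theta = 0.5.
print(transaction_price(500, [480, 420], 400, 0.5))  # (440.0, 'bargained')
print(transaction_price(500, [480, 460], 400, 0.5))  # (460, 'second-highest value')
print(transaction_price(500, [700, 450], 400, 0.5))  # (500, 'list price accepted')
print(transaction_price(500, [700, 650], 400, 0.5))  # (650, 'bidding war')
```

The four calls produce sales below, at, and above the list price, which is the pattern the model is designed to accommodate.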
1.3.2 Buyer's Visiting Decision
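Using Figure 1.2, each visitor's expected utility of visiting a specific property can be
calculated as:
$$
\begin{aligned}
E[U_B] ={}& \int_{V_S - b_0}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} (1-\theta)(b_0 + b_{u1} - V_S)\, G\big(\theta b_{u1} + (1-\theta)(V_S - b_0)\big)^{n-1} g(b_{u1})\, db_{u1} \\
&+ \int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} (b_0 + b_{u1} - P_L)\, G(P_L - b_0)^{n-1} g(b_{u1})\, db_{u1} \\
&+ \int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} \int_{P_L - b_0}^{b_{u1}} (b_{u1} - b_{u2})\, G(b_{u2})^{n-2} g(b_{u2})\, db_{u2}\, g(b_{u1})\, db_{u1} \\
&+ \int_{V_S - b_0}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} \int_{\theta b_{u1} + (1-\theta)(V_S - b_0)}^{b_{u1}} (b_{u1} - b_{u2})\, G(b_{u2})^{n-2} g(b_{u2})\, db_{u2}\, g(b_{u1})\, db_{u1}
\end{aligned}
$$
Here $g(\cdot)$ is the density associated with $G(\cdot)$, $\bar{b}_u$ is the upper bound of its support, and
$b_{u1}$ and $b_{u2}$ are the highest and second highest preference draws.
If visiting each property incurs a cost of $c$, a potential buyer will visit if and only if
$$
E[U_B] \geq c
$$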
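Because the expected utility has no simple closed form, a simulation can make the visiting condition concrete. The sketch below is a rough Monte Carlo illustration, not the estimation used in this chapter: it assumes, purely for illustration, that the preference value $b_u$ is uniform on [0, 200], and all function names and parameter values are hypothetical. The focal visitor earns her value minus the price when she holds the highest value and zero otherwise.

```python
import random


def price(p_list, values, v_seller, theta):
    """Transaction price implied by the three scenarios, or None if there is no sale."""
    ranked = sorted(values, reverse=True)
    v1 = ranked[0]
    v2 = ranked[1] if len(ranked) > 1 else float("-inf")
    if v1 < v_seller:
        return None
    if v2 > p_list:
        return v2  # bidding war
    bargained = theta * v1 + (1 - theta) * v_seller
    return min(p_list, max(bargained, v2))


def expected_buyer_utility(p_list, n, b0, v_seller, theta, draws=20000, seed=0):
    """Monte Carlo estimate of one visitor's expected utility with n visitors."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        # Assumption: b_u ~ Uniform(0, 200); the model leaves G unspecified.
        values = [b0 + rng.uniform(0.0, 200.0) for _ in range(n)]
        p = price(p_list, values, v_seller, theta)
        # The focal visitor (index 0) gains only if she holds the highest value.
        if p is not None and values[0] == max(values):
            total += values[0] - p
    return total / draws


def will_visit(cost, **kwargs):
    """Visit rule: visit if and only if expected utility covers the visiting cost."""
    return expected_buyer_utility(**kwargs) >= cost


if __name__ == "__main__":
    for p_list in (450, 500, 550):
        u = expected_buyer_utility(p_list, n=3, b0=400.0, v_seller=400.0, theta=0.5)
        print(p_list, round(u, 2))
```

Since the same seed is reused across calls, the draws are identical for different list prices, which makes comparisons across `p_list` or `theta` less noisy.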
Due to the complexity of the buyer's search decision, it is computationally hard to find a
cutoff for the list price below which the buyer will make a visit. We can, however, calculate
the comparative statics and see the influence of the listing price and the seller's bargaining
power on the buyer's expected utility (see below). If any factor negatively affects the buyer's
expected utility, increasing it makes the searching inequality less likely to
hold, which in turn means that the buyer will be less likely to visit the property. Notice that
although we may see negative derivatives with respect to the list price and the bargaining power, there exist
situations where the searching inequality will not hold regardless of how these variables change.
That is, these variables affect the buyer's visiting probability only up to a certain point, beyond
which a continued decrease of the list price will not attract more visitors.
1.3.2.1 Influence of bargaining power
By taking the first-order derivative with respect to the bargaining power $\theta$, it can be seen that:
$$
\frac{\partial E[U_B]}{\partial \theta} = -\int_{V_S - b_0}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} (b_0 + b_{u1} - V_S)\, G\big(\theta b_{u1} + (1-\theta)(V_S - b_0)\big)^{n-1} g(b_{u1})\, db_{u1} < 0
$$
Therefore, holding everything else constant, an increase in the seller's bargaining power will reduce
the buyer's expected visiting utility, thus lowering her probability of visiting.
1.3.2.2 Influence of listing price
By taking the first-order derivative with respect to the listing price $P_L$, it can be seen that:
$$
\frac{\partial E[U_B]}{\partial P_L} = -\int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} G(P_L - b_0)^{n-1} g(b_{u1})\, db_{u1} < 0
$$
Therefore, holding everything else constant, an increase in the listing price will reduce the buyer's
expected visiting utility, thus lowering her probability of visiting. Here, we can establish a one-to-one
relationship between the number of potential buyers and the list price: with higher list prices,
fewer potential buyers will show up to visit the property.
Since we have established a one-to-one relationship between the list price and the number
of visitors, we now have the information to proceed and study the seller's problem. The seller
of a property tries to maximize the expected profit by setting the optimal list price, given the
bargaining power of the buyer.
1.3.3 Seller's Problem
The seller of a property faces a profit-maximization problem in this process. The profit from selling
a property depends on the transaction price, which we have shown to be determined by
the number of visitors who show up at the property and the valuation distribution of the
visitors. In the previous section, we discussed all possible scenarios and the probability that
each transaction price occurs. Therefore, we can obtain the expected profit using the number of
visitors and the value distribution. Since there is a one-to-one relationship between the number
of visitors and the list price, the seller can partially determine the profit by setting an
optimal list price.
Mathematically, the seller's profit given bargaining power $\theta$ and number of visitors $n$ is:
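$$
\begin{aligned}
\pi_n ={} & n \int_{V_S - b_0}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} \theta\,(b_0 + b_{u1} - V_S)\, G\big(\theta b_{u1} + (1-\theta)(V_S - b_0)\big)^{n-1}\, g(b_{u1})\, db_{u1} \\
& + \int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} (P_L - V_S)\, G(P_L - b_0)^{n-1}\, g(b_{u1})\, db_{u1} \\
& + \int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} (n-1) \left[ \int_{P_L - b_0}^{b_{u1}} (b_0 + b_{u2} - V_S)\, G(b_{u2})^{n-2}\, g(b_{u2})\, db_{u2} \right] g(b_{u1})\, db_{u1} \\
& + \int_{V_S - b_0}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} (n-1) \left[ \int_{\theta b_{u1} + (1-\theta)(V_S - b_0)}^{b_{u1}} (b_0 + b_{u2} - V_S)\, G(b_{u2})^{n-2}\, g(b_{u2})\, db_{u2} \right] g(b_{u1})\, db_{u1}
\end{aligned}
$$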
Solving the seller's problem involves taking the first-order condition with respect to the list
price. Due to the complexity of the profit function, obtaining a closed-form solution to this
problem is particularly difficult. We can, however, solve for an indirect relationship between the
profit and the number of visitors. Taking the first-order condition with respect to the number of
visitors yields:
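$$
\begin{aligned}
\frac{\partial \pi_n}{\partial n} ={} & \int_{\underline{b}_u}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} \theta\,(b_0 + b_{u1} - V_S)\, G\big(\theta b_{u1} + (1-\theta)(V_S - b_0)\big)^{n-1} \Big(1 + (n-1) \ln G\big(\theta b_{u1} + (1-\theta)(V_S - b_0)\big)\Big)\, g(b_{u1})\, db_{u1} \\
& + \Big[1 - G\Big(\tfrac{1}{\theta}\big(P_L(n) - V_S\big) + V_S - b_0\Big)\Big] \big(P_L(n) - V_S\big)\, G\big(P_L(n) - V_S\big)\, G\big(P_L(n) - b_0\big)^{n-1} \\
& \qquad \times \left(1 + \frac{\partial P_L}{\partial n} + \frac{\partial P_L}{\partial n}(n-1)\, G(P_L - b_0) \left( \ln G\big(P_L(n) - b_0\big) - \frac{1}{G\big(P_L(n) - b_0\big)} \right)\right) \\
& + \int_{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)}^{\bar{b}_u} (n-1) \left[ \int_{P_L - b_0}^{b_{u1}} (b_0 + b_{u2} - V_S)\, G(b_{u2})^{n-2} \big(1 + (n-2) \ln G(b_{u2})\big)\, g(b_{u2})\, db_{u2} \right] g(b_{u1})\, db_{u1} \\
& + \int_{\underline{b}_u}^{\frac{1}{\theta}(P_L - V_S) + (V_S - b_0)} (n-1) \left[ \int_{\theta b_{u1} + (1-\theta)(V_S - b_0)}^{b_{u1}} (b_0 + b_{u2} - V_S)\, G(b_{u2})^{n-2} \big(1 + (n-2) \ln G(b_{u2})\big)\, g(b_{u2})\, db_{u2} \right] g(b_{u1})\, db_{u1}
\end{aligned}
$$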
Note that in the above condition, we write the list price as $P_L(n)$ since it
depends on the number of visitors. This equation shows that, in addition to the number of visitors,
the bargaining power of the buyer and the distribution of valuations also affect the expected
profit of the seller.
We have so far demonstrated that in an environment where both bargaining and auction are
possible, buyers are able to predict the expected utility of visiting a property listed at a certain
price given the bargaining power $\theta$ and the transportation cost $c$. The buyers' decisions of whether
or not to visit the property determine the total number of visitors and thus the profit of the
seller. We are also able to establish a one-to-one relationship between the number of visitors and
the list price: the number of visitors decreases in the list price. In the next section, we will examine
the results from our model using housing data from Los Angeles County.
1.4 Empirical Evidence
1.4.1 Data
The dataset used in this study is a combination of the Multiple Listing Service (MLS) records for Los
Angeles County and the viewing activities on Redfin.com. It contains about 118,000 homes listed
on the MLS and sold between January 2012 and January 2018, with detailed information about each
listing. First, the full price history is shown, which consists of the list price, the sale price, the
number of days the unit stayed on the market, and any changes in price. Furthermore, the dataset
records the physical characteristics of each property, including the number of bedrooms, the number
of bathrooms, the total square footage, property age, parking spaces, and so on. To supplement
this information, we also collected viewing activities from Redfin. Redfin.com was
chosen since it serves as a good representation of the entire home-buying population. Specifically,
Redfin is unique in that it provides broker services along with other browsing functions that many
online visitors utilize, and thus the behavior of Redfin customers is a good approximation of that of
all home buyers. We use the number of "favorites" received by each listing as a proxy for the interest
shown by potential buyers. Table 1.2 below shows summary statistics of the dataset.
Table 1.2: Descriptive statistics
Statistic            N         Mean           St. Dev.        Pctl(25)    Pctl(75)
Bedroom              118,004   2.951          1.066           2           4
Bathroom             118,004   2.403          1.166           2           3
Sold Price           118,004   $817,025.000   1,055,588.000   $395,000    $865,000
List Price           118,004   $838,998.400   1,160,568.000   $399,000    $875,000
Square Feet          118,004   1,766.760      1,060.943       1,163       2,015
Year Built           118,004   1960           34.267          1944        1980
Association Fee      36,463    446.108        2,796.814       280.000     450.000
Days on Market       118,004   85.660         74.033          45          102
Sold-to-List Ratio   118,004   0.992          0.084           0.956       1.021
Number of Parking    118,004   1.167          2.744           0           2
View Count           115,727   691.529        648.329         313         869
Favorite Count       112,450   27.629         32.483          8           35
In our dataset, the average listing price is $838,998, with a minimum of $30,000 and a maximum
of $100,000,000. The average sales price is $817,205 with a minimum of $50,000 and a maximum
of $100,000,000. On average, properties are sold slightly below the listing price, with a mean
sale-to-list ratio of around 99.2%. A typical home in Los Angeles County has around three bedrooms,
two and a half bathrooms, one parking space, and about 1,767 square feet in area. Only about
31% of the listings involve a homeowner association, with an average association fee of about $446
per month. On Redfin, the average number of views is 691 per property, and each listing
receives special attention (favorites) from around 28 people.
1.4.2 Regression Strategy
The goal of the empirical study is to examine the effects of listing strategy on the number
of potential buyers. While we can use the Redfin favorites as a proxy for potential buyers, the
question remains of how to define the seller's listing strategy. To this end, we use two
measures in our analysis: the absolute list price and the list price relative to historical comparable
properties ("comps"). Using the absolute list price as the independent variable in the regression
is the most straightforward idea. The regression explores the relationship between the number
of potential buyers and the covariates. Specifically, the number of potential buyers $N_{it}$ has three
additive components: a fixed effect $\alpha$, the list price $LP$, and the physical characteristics $X$.
Mathematically, we will estimate the following equation:
$$
N_{it} = \alpha_{1i} + \beta_{1it}\, LP_{it} + \gamma_{1it}\, X_{it} + u_{it} \tag{1.1}
$$
The second measure of listing strategy requires comparing the list price of each property in our
dataset to its comps. A previously sold property is defined as a comp to the current listing
only if it satisfies the following conditions: (1) it sold within the past 6 months; (2) it is
geographically close to the current property; (3) it has the same number of bedrooms and bathrooms;
(4) it is of similar size to the current listing. For each property in our dataset, we collect the
comps that fulfill all the conditions above and calculate the median transaction price of the comps.
We then compare this median price with the list price of the current listing. If the current list
price is lower than the median historical comp price, we define the seller's strategy as "listing low"
and vice versa. Using reduced form regression, we are able to examine the effects of listing
strategies on the buyers' interest. Letting $l_{it}$ denote the strategy of listing low, we estimate
the following regression:
$$
N_{it} = \alpha_{2i} + \beta_{2it}\, l_{it} + \gamma_{2it}\, X_{it} + u_{it} \tag{1.2}
$$
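The comp-matching rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the procedure used to build the dataset: the `Home` record, the same-zip-code test for geographic closeness, the 183-day window, and the 10% size tolerance are all assumptions introduced here, since the text does not state the exact thresholds.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Home:
    """Hypothetical record: a current listing or a past sale."""
    price: float      # list price for a listing, sale price for a past sale
    when: date        # list date for a listing, sale date for a past sale
    zip_code: str     # stand-in for "geographically close" (assumption)
    bedrooms: int
    bathrooms: float
    sqft: float


def is_comp(listing, sale, max_days=183, size_tol=0.10):
    """Return True if a past sale meets the four comp conditions."""
    days_before = (listing.when - sale.when).days
    return (
        0 < days_before <= max_days                  # (1) sold within about 6 months
        and sale.zip_code == listing.zip_code        # (2) nearby (same zip code, assumed)
        and sale.bedrooms == listing.bedrooms        # (3) same number of bedrooms ...
        and sale.bathrooms == listing.bathrooms      #     ... and bathrooms
        and abs(sale.sqft - listing.sqft) <= size_tol * listing.sqft  # (4) similar size
    )


def list_low(listing, past_sales):
    """Return 1 if listed below the median comp price, 0 if not, None without comps."""
    comp_prices = [s.price for s in past_sales if is_comp(listing, s)]
    if not comp_prices:
        return None
    return 1 if listing.price < median(comp_prices) else 0


# Made-up example data.
listing = Home(480_000, date(2017, 6, 1), "90026", 3, 2, 1_400)
past_sales = [
    Home(500_000, date(2017, 3, 15), "90026", 3, 2, 1_450),
    Home(520_000, date(2017, 1, 10), "90026", 3, 2, 1_380),
    Home(450_000, date(2016, 5, 1), "90026", 3, 2, 1_400),   # sold too long ago
    Home(300_000, date(2017, 4, 1), "90026", 2, 1, 900),     # different layout
]
indicator = list_low(listing, past_sales)
```

Only the first two past sales qualify as comps in this example, so the listing is compared with their median price. Returning `None` when no comp exists keeps such listings out of the indicator rather than silently coding them as 0.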
Furthermore, we run an additional set of regressions to find the relationship between the sale
prices and the number of potential buyers. Although the theoretical model does not establish
explicit causality between the two variables, we look to empirically explore how the number of
potential buyers affects the sales price in Los Angeles County during 2012-2018. Letting $P_{it}$ denote
the transaction price, the following equation summarizes the reduced form analysis:
$$
P_{it} = \alpha_{3i} + \beta_{3it}\, N_{it} + \gamma_{3it}\, X_{it} + u_{it} \tag{1.3}
$$
With the three sets of regressions specified above, we build up the empirical work corresponding
to the questions studied in the theoretical model. Note that for each set of regressions, we run
different specifications regarding fixed effects: no fixed effects, time fixed effects only, and both
time and location fixed effects. We use the combination of month and year as the time fixed effect,
and the zip code of each listing as the location fixed effect. The next section presents the regression results.
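To see what the fixed-effect specifications change, the sketch below compares a pooled slope with a slope estimated after demeaning within groups, which is one standard way to absorb a single set of fixed effects. The numbers are made up for illustration and the function names are hypothetical; the regressions in this chapter also include time effects and the property characteristics, which this one-regressor example omits.

```python
from collections import defaultdict


def pooled_slope(y, x):
    """OLS slope of y on x with a single common intercept."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    num = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    den = sum((xi - x_bar) ** 2 for xi in x)
    return num / den


def within_slope(y, x, groups):
    """OLS slope after subtracting group means (one-way fixed effects)."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    y_dm = [0.0] * len(y)
    x_dm = [0.0] * len(x)
    for idx in by_group.values():
        y_mean = sum(y[i] for i in idx) / len(idx)
        x_mean = sum(x[i] for i in idx) / len(idx)
        for i in idx:
            y_dm[i] = y[i] - y_mean
            x_dm[i] = x[i] - x_mean
    return pooled_slope(y_dm, x_dm)


# Illustrative data: two zip codes with different baseline interest.
# Within each zip code, favorites fall by 0.01 per $1,000 of list price.
zip_codes = ["A", "A", "A", "B", "B", "B"]
list_price = [400, 500, 600, 800, 900, 1000]   # in thousands of dollars
favorites = [36, 35, 34, 52, 51, 50]

pooled = pooled_slope(favorites, list_price)
within = within_slope(favorites, list_price, zip_codes)
```

In this toy example the pooled slope is positive because the more expensive zip code also attracts more interest, while the within-zip slope recovers the negative relationship. This is the kind of difference that location fixed effects are meant to control for.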
1.4.3 Results
1.4.3.1 Potential buyers on absolute list price
The results of the regression using equation (1.1) are presented in Table 1.3. The absolute list price
has a significant and negative impact on the number of potential buyers in all specifications with
different time and location fixed effects. With every $100,000 increase in the list price, the seller
expects to lose around 1 person who might be interested in the property. Given the small average
number of potential buyers (28) and the large variation in list prices over the entire dataset, this
expected loss of interest due to increases in list price does not seem to be trivial. The rest of
Table 1.3: Regression results with absolute list price
# potential buyers           (1)           (2)           (3)
list price (in thousands)    -0.00971***   -0.00971***   -0.00650***
                             (-14.61)      (-14.62)      (-9.92)
bedroom                      -4.362***     -4.363***     -2.265***
                             (-37.95)      (-37.95)      (-19.40)
bathroom                     0.476***      0.474***      0.292***
                             (5.92)        (5.90)        (3.73)
square footage               -1.803***     -1.802***     -0.962***
                             (-28.47)      (-28.46)      (-15.31)
age                          -0.104***     -0.104***     -0.0548***
                             (-33.77)      (-33.78)      (-17.77)
garage spaces                0.550***      0.551***      0.217***
                             (-14.79)      (-14.80)      (-5.96)
is mobile home               -8.632***     -8.588***     1.047
                             (-5.80)       (-5.77)       (0.71)
is single residence          8.898***      8.904***      10.15***
                             (27.48)       (27.50)       (31.59)
association fee              -0.0119***    -0.0119***    -0.00875***
                             (-12.28)      (-12.28)      (-9.24)
N                            110834        110834        110834
time fixed effects           No            Yes           Yes
location fixed effects       No            No            Yes
R^2                          0.102         0.103         0.160
t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
the regressors are mostly significant and show expected signs. Across all three specifications, the
coefficients are of similar signs and magnitudes. Single family houses and properties with more
bathrooms and garage spaces attract more interest online. High association fees, old properties,
and mobile homes reduce the number of interested buyers. It is worth noting that an increase in
the number of bedrooms actually reduces the interest of buyers, potentially because more rooms
in a home implies less space in each room. The negative impact of more bedrooms has been shown
in various other hedonic studies as well. However, the square footage of a property also negatively
influences the number of buyers, which is less studied in the literature. Several social studies have
observed similar effects, mentioning that "homes have gotten bigger, but Americans aren't any
more pleased with the extra space" (Pinsker 2019). It is possible that the psychological burden
associated with larger homes makes people less interested in them.
1.4.3.2 Potential buyers on listing low
Table 1.4 below shows the regression results of potential buyers on the seller's strategy of listing
low. Taking into account different fixed effects, we obtain similar results for the coefficients on
the listing low strategy, ranging from 3.5 to 4.6. Therefore, by listing at a lower price, the seller
can attract around 4 more potential buyers who are interested in the property. Considering the
overall average of 28 favorites in the full dataset, this listing strategy can help the seller increase the
number of potential buyers by nearly 15%, which is not a negligible amount. It is worth noting that
the coefficients on listing strategy are much larger in magnitude than the coefficients on the absolute
list price. In the current set of regressions, listing strategy can be almost as important to potential
buyers as changing the number of bedrooms, and has larger effects than most of the widely
discussed characteristics such as square footage and age. This shows that historical transaction
data plays an important role when buyers make their visiting and purchasing decisions, and that
sellers can efficiently affect the interest received by setting a proper list price. The rest of the
regressors are mostly significant and show expected signs. The magnitudes of the coefficients are also
very similar to the previous set of regressions with the absolute list price as the target independent
variable.
1.4.3.3 Sale price on number of potential buyers
The results from the last set of regressions are presented in Table 1.5. It is worth discussing
the effects of the number of potential buyers on the sales price. Using the theoretical model, we
concluded that the optimal list price depends on the value distribution and the bargaining power
of buyers. While we do not observe these variables in our data, we are still able to estimate the
empirical effects of potential buyers during the specific time period (2012-2018) in the Los Angeles
Table 1.4: Regression results with the list low strategy
# potential buyers           (1)           (2)           (3)
list low                     4.612***      4.613***      3.500***
                             (23.98)       (23.99)       (18.45)
bedroom                      -4.632***     -4.633***     -2.547***
                             (-40.21)      (-40.22)      (-21.66)
bathroom                     0.401***      0.400***      0.233**
                             (5.00)        (4.98)        (2.97)
square footage               -1.889***     -1.888***     -1.055***
                             (-29.85)      (-29.85)      (-16.76)
age                          -0.104***     -0.104***     -0.0553***
                             (-33.62)      (-33.62)      (-17.98)
garage spaces                0.536***      0.536***      0.216***
                             (-14.43)      (-14.44)      (-5.93)
is mobile home               -10.26***     -10.21***     -0.285
                             (-6.90)       (-6.87)       (-0.19)
is single residence          9.677***      9.684***      10.73***
                             (29.81)       (29.83)       (33.30)
association fee              -0.0126***    -0.0126***    -0.00932***
                             (-12.96)      (-12.96)      (-9.85)
N                            110829        110829        110829
time fixed effects           No            Yes           Yes
location fixed effects       No            No            Yes
R^2                          0.107         0.107         0.163
t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
County. From the regression results, we see that each additional potential buyer brings up the sale
Table 1.5: Regression results with number of potential buyers
Sale price (in thousands)    (1)           (2)           (3)
# potential buyers           2.163***      2.161***      0.775***
                             (44.82)       (44.78)       (17.89)
bedroom                      102.9***      102.9***      137.7***
                             (55.12)       (55.08)       (83.68)
bathroom                     149.1***      149.1***      114.7***
                             (119.95)      (120.00)      (105.30)
square footage               2.521*        2.567*        19.93***
                             (-2.42)       (-2.46)       (21.82)
garage spaces                28.12***      28.13***      12.55***
                             (-45.89)      (-45.92)      (-23.45)
age                          -0.862***     -0.861***     -1.548***
                             (16.88)       (16.87)       (34.52)
is mobile home               -337.5***     -336.9***     -239.6***
                             (-13.81)      (-13.78)      (-11.18)
is single residence          219.4***      219.7***      240.2***
                             (41.32)       (41.37)       (51.62)
association fee              0.385***      0.385***      0.301***
                             (24.05)       (24.06)       (21.70)
N                            110829        110829        110829
time fixed effects           No            Yes           Yes
location fixed effects       No            No            Yes
R^2                          0.344         0.345         0.517
t statistics in parentheses
* p < 0.05, ** p < 0.01, *** p < 0.001
price by $775 if we consider both the time and location fixed effects. Without either set of
fixed effects, an additional interested buyer could raise the price by as much as $2,163. To better
understand these numbers, we compare them with the average discrepancy among the
prices of historical comps. For each listing in our dataset, we find the price difference between
the highest comp and the lowest comp; the mean of all such differences is $13,248.
If attracting an additional visitor yields a $2,163 increase in sale price, listing the property lower
than those with comparable characteristics could attract 4.6 more interested buyers, and
hence lift the sale price by about $9,950, a difference that could bring this property from
the lowest price tier to almost the highest tier among all similar properties. Therefore, from our
dataset, it can be concluded that attracting more interested buyers by listing the property at a
lower price can help the seller obtain more profit from the same property.
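The back-of-the-envelope calculation above can be reproduced directly from the reported coefficients. The short sketch below uses the column (1) estimates quoted in the text and, as an added illustrative comparison that is not made in the text, repeats the same arithmetic with the column (3) estimates that include time and location fixed effects. Variable names are hypothetical.

```python
# Column (1): no fixed effects (the figures quoted in the text).
extra_buyers = 4.6        # coefficient on listing low, Table 1.4
price_per_buyer = 2_163   # dollars per additional potential buyer, Table 1.5
comp_spread = 13_248      # mean gap between highest and lowest comp, dollars

implied_gain = extra_buyers * price_per_buyer
share_of_spread = implied_gain / comp_spread

# Column (3): time and location fixed effects (illustrative comparison only).
implied_gain_fe = 3.5 * 775
share_of_spread_fe = implied_gain_fe / comp_spread

print(f"no fixed effects:   ${implied_gain:,.0f} ({share_of_spread:.0%} of the comp spread)")
print(f"with fixed effects: ${implied_gain_fe:,.0f} ({share_of_spread_fe:.0%} of the comp spread)")
```

Under the column (1) estimates the implied gain covers roughly three quarters of the average comp spread. With the fixed-effects estimates the same calculation gives a noticeably smaller share, so the size of the effect depends on which specification is taken as the benchmark.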
The rest of the regressors are mostly significant and show expected signs. Increasing the number
of bedrooms and bathrooms is associated with higher sale prices. Each additional bedroom
adds about $102,900 in home value, while each additional bathroom increases the price by around
$149,100. Similarly, more square footage and more garage spaces also positively affect the sale
price. On the contrary, older homes sell for lower prices on average, with around an $8,620 decrease
in sale price for every 10 years in age. The type of the property also affects the sale price
significantly. Single family homes generally sell for $219,400 more than condos and townhouses,
while mobile homes sell for $337,500 less, holding all other conditions equal.
Lastly, properties with higher association fees sell for more, potentially because higher fees
may indicate that more services or utilities are covered by the homeowner association, or that the
association is better managed. The analysis of factors affecting sale prices is a good supplement
to the current literature that studies home prices. In addition to obtaining similar numeric
results for the coefficients of common variables, we add value by including the effect of buyers'
interest.
Recall that from our theoretical model, we concluded that the optimal number of potential buyers
should depend on the buyers' value distribution and bargaining power. This leads to the hypothesis
that in different markets where these two factors change, the effect of potential buyers on sale
price may be different. We examine this hypothesis by running separate regressions in different
markets. There are a number of ways to define markets, the most common of which are by time
and location. Here, we treat each calendar year as a distinct market in our dataset. During
the period from 2012 to 2018, the Los Angeles County housing market fluctuates from time to
time according to the data shown in Figure 1.3, which plots the demand index calculated from
the number of Redfin customers that request tours and write offers. It can be seen that market
conditions do change over the time period of our data sample. Therefore, we can infer that
Figure 1.3: Demand index in the Los Angeles county housing market
Source: Redfin Data Center.
the variables mentioned in our model, the value distribution and the bargaining power of the buyer,
should be related to the demand on the market.
Table 1.6: Effects of potential buyers across years
                      2012    2013    2014    2015    2016    2017    2018
# potential buyers    1.305   0.612   0.160   0.720   0.120   0.876   0.191
We run the regression of sale price on the number of potential buyers and other characteristics
with time and location fixed effects in each market, which is defined as a calendar
year. The coefficients of our target independent variable, the number of potential buyers, are
shown in Table 1.6, which suggests a correlation between the coefficients on the number of potential
buyers and the demand index. In years when the demand index is low, for example 2016, the
effect of having more interested buyers is reduced. On the other hand, in years in which demand
is relatively high (2015 and 2017), more competition between visitors can effectively raise the sale
price, and thus there is more incentive for sellers to implement the low listing strategy. The
regression results in different market time periods coincide with our modeling conclusion that
the optimal listing strategy is chosen based on market conditions, which are related to the value
distribution and bargaining power.
1.5 Conclusion
This paper has considered whether sellers strategically set the asking price based on market
conditions in order to bring competition to home buying. The motivation of this paper is that in
recent years, houses have sold both above and below the listing price. However, it is not intuitive
to think that the asking price does not matter to buyers and sellers, since it has historically been
considered a ceiling on the transaction price. This paper proposes a search model where the
list price can be a binding ceiling when the competition between visitors is not fierce and when
the bargaining power of the seller is high, such that the bargained price would otherwise be even
higher than the asking price. On the other hand, when there is competition and the valuations of the
other visitors are high, the person with the highest valuation would be forced to raise her offer.
Once the price determination process is clear to the buyer and the seller, they each make an optimal
decision to maximize their utility or profit. A potential visitor will visit a certain property only if
the expected utility from visiting is weakly greater than the cost, and therefore we are able to
establish a one-to-one relationship between the list price and the number of visitors. Knowing the
best response of the buyer, the seller is able to foresee the expected profit for each list price
associated with the property. The optimal list price depends on the value distribution and the
bargaining power of the buyers.
The paper also performs a series of empirical estimates to test the model. The data contains
detailed information about homes sold in Los Angeles County between 2012 and 2018. Since
the number of visitors and bids received by each property cannot be observed, we use the online
viewing activities on a broker and information platform, Redfin, as a proxy for the number of
potential buyers. The seller's listing behavior is measured using two different methods: the
absolute list price and the comparison with historically similar listings. Both sets of regressions
show that there is a negative correlation between a high list price and the number of potential
buyers, supporting our theoretical conclusion that the number of visitors decreases in the list
price. We further examine the effects of online interest on the sale price. The Los Angeles County
data shows that more potential buyers are associated with higher sale prices, but the extent to
which the number of potential buyers affects sale prices could differ when market conditions
change. This result coincides with our conclusion in the theoretical part that the optimal
number of visitors and list prices should depend on buyer conditions, that is, the value
distribution and bargaining power.
To sum up the theory and the empirics, this essay builds a model that studies buyer and seller
behaviors in an environment where both bargaining and auctions are possible, and uses reduced
form regressions to empirically examine the theoretical results. The buyer's expected utility and
the seller's expected profit depend on the number of visitors, which is a result of the rational
decisions of buyers according to the list price. Therefore, strategic listing by sellers is an important
determining factor of the number of potential buyers and the sale price. The empirical analysis
using housing data from Los Angeles County supports the results of the model.
Chapter 2
Heterogeneity in Price Determinants Across Areas
2.1 Introduction
Housing is a major component of wealth for many families and an important sector of the
nation's economy, accounting for roughly 20% of total household wealth in the United States.
Property value predictions are essential to the life decisions of individuals as well as the
functioning of the housing market. Therefore, policy makers and economists are highly interested
in the fundamental determinants of house prices. Real estate stakeholders, such as home buyers and
agents, also pay close attention to price predictions because the outcomes help them make more
informed decisions. Various tools have been applied to housing market data for home
value prediction, and over the past several decades one of the most widely used tools for studying
the relationship between house prices and characteristics has been the hedonic regression model.
These models are applied extensively because they establish the correlation between
house prices and characteristics in a straightforward way. However, they are subject to growing
criticism due to potential inaccuracies stemming from several aspects such as assumptions, choice
of functional form, and identification of demand and supply. In recent years, with the growing
application of artificial intelligence techniques in various fields, the literature has expanded to
include more and more attempts at predicting house prices using machine learning, such as
Artificial Neural Networks (Selim 2009) and Support Vector Machines (Gu et al. 2011). This essay
proposes to apply a simple yet effective non-parametric approach, the conditional inference tree,
to the Los Angeles County housing market as a case study of house price prediction.
In studies using either traditional hedonic analyses or more recently developed machine
learning techniques, location is treated as an important determinant of the value of a
property. In hedonic price modeling, a widely employed strategy is to control for geographical
proxies such as postal codes or census tracts. The benefit of this method is its practicality, in that
it captures the average effects of services and resources that are available in the neighborhood, and
explains price discrepancies using an easily obtainable piece of
data. In applications of machine learning to the housing market, location is also considered an
important feature in the models. With the availability of more detailed information such as
longitude and latitude, machine learning approaches can incorporate more specific geographic
features such as the distance from each property to the closest grocery store, which may also have
an impact on its price. Although a large body of literature has taken into consideration the
effects of location on housing prices, the modeling is often carried out considering only the whole
picture. In other words, the regressions are usually performed on the entire dataset, resulting in
universal coefficients on each influencing factor across all neighborhoods. However, the
performance of such an estimation strategy is unsatisfactory at times, especially when the variation
in location over the whole dataset is large. In reality, houses should be considered
heterogeneous goods, in the sense that the impacts of physical characteristics on home prices vary
from house to house. For example, in urban centers such as downtown Los Angeles,
the number of parking spaces associated with each unit is expected to play a more important role
in determining the value of a property than in suburban areas, where ample space
for street parking is available. It is therefore reasonable to hypothesize that the extent to which
certain home characteristics influence pricing will vary across different neighborhoods. With the
motivation to examine this hypothesis, this essay applies the conditional inference tree
approach to the segmented Los Angeles County housing market data to investigate the
heterogeneity of the relationship between house characteristics and prices across different areas.
Analyzing the heterogeneity of pricing factors is meaningful not only to researchers and
policymakers, but also to the stakeholders in the home-selling process, namely, buyers, sellers, and
their real estate agents. In particular, the stakeholders can benefit in the following two ways: first, a
more accurate prediction according to home characteristics gives both parties better knowledge
about the value of the property, and therefore helps them make more informed decisions in the
purchasing process; second, the seller and the listing agent are able to see the characteristics
that add the most value to the sales price, and provide targeted advertisements. In fact, listing
agents already specifically advertise some of the characteristics in the description
of a property. Take the following description of a home sold in 2018 (from Redfin.com)
as an example:
Don't miss out!!! Come enjoy this Gorgeous, FULLY Remodeled home that is
priced to Sell!!! This home is 1,418 sq. ft., has 3 bedrooms, 2 bathrooms. The layout
is fantastic. Located in a prime location, just minutes away from restaurants & popular
shopping. Just a short 30 minute drive from Downtown Los Angeles. The spacious
open floor plan boasts Hardwood Engineered flooring & features a dream kitchen that
opens to the Dining area & Living Room, which is light and bright. Features galley
kitchen with all Quartz counters, new stainless steel appliances & oversize stainless
steel sink.
In the paragraph above, a few characteristics other than the basic features stand out: remodeled
home, hardwood flooring, and stainless steel appliances. The listing agent selectively highlights
these features but not others in this short description, potentially because the agent has some
information that they might be the most important pricing factors of the property. Thus, studying
the importance of characteristics in each area could help explain the facts we observe.
The rest of the essay is organized as follows. We first review the literature and related work
in Section 2. Section 3 then introduces the theoretical foundation for the conditional inference
tree approach and the algorithm methodology. Afterwards, Section 4 provides a description of
the data and Section 5 presents the application of the algorithm and the corresponding results.
Finally, Section 6 concludes the study and highlights potential future areas for exploration that
are related to the heterogeneity across areas.
2.2 Literature
This section discusses the literature in the following three fields: hedonic-based regression
studies, machine learning applications, and heterogeneity analyses. The hedonic-based studies
consider housing as a combination of inherent characteristics related to physical conditions, such
as the number of bedrooms and bathrooms, and geographical conditions, such as the location and
surrounding resources. By regressing the sales price of each property on the housing
characteristics, each coefficient can then represent the estimated contribution of that individual
characteristic to the transaction price. The literature on hedonic-based regression studies is quite
substantial. Lancaster (1966) builds the microeconomic foundations for utility analysis based on the
characteristics of commodities and applies it to numerous fields, for instance, the housing market
and financial assets. Many more studies apply the hedonic pricing model to various markets,
including Benson et al. (1998), Fletcher et al. (2000a, 2000b), Malpezzi (2003), Selim (2009),
Yayar and Demir (2014), and Montero et al. (2018), as some representative examples. These
studies are carried out with multiple regressions, different specifications, and data from numerous
sources. As mentioned previously, hedonic-based regressions are criticized for a series of shortfalls,
such as assumptions, choice of functional form, and identification of demand and supply, and
researchers have sought other methods of price estimation. Tree-based machine learning approaches
could provide an alternative to multiple regression analysis.
Decision tree learning is one of the predictive modeling methods that predicts a dependent
variable according to covariates called input variables. In addition to prediction, a decision tree
also provides a powerful tool for the description, classification, and visualization of data. Over
the past decades, attempts have been made to apply decision tree techniques to fields such as
data mining and portfolio analysis. Sorensen et al. (2000) apply a classification and regression
tree to a selection of technology stocks and evaluate their performance. A few attempts have also
been made to use the decision tree approach in studying questions related to the housing market.
Fan et al. (2006) apply decision tree learning to the Singapore resale public housing market to
analyze the importance of characteristics and show that the most crucial price determinants can
differ for properties with different numbers of bedrooms. Building on the standard decision tree
method, a more recent tree-based model, the conditional inference tree (CTREE), has been
developed and used in a number of fields. CTREE is a "non-parametric class of regression trees
embedding tree-structured regression models into a well defined theory of conditional inference
procedures" (Hothorn et al., 2006). To name a few CTREE applications, Konig et al. (2015)
study the effects of multiple chronic conditions on health care costs, and Ahrazem Dfuf et al.
(2019) perform a variable importance analysis to predict electricity prices and demand in the
energy industry. To our knowledge, no academic study has applied CTREE to examine
the importance of housing characteristics to home prices.
Machine learning approaches also work well in capturing the heterogeneity of variables across
different partitions of the dataset, which is crucial to housing price determinants. Numerous
sources of heterogeneity were considered in past research, and most of them were shown
to greatly affect housing prices and the length of time houses remain on the market.
Location is one of the most widely discussed factors in various studies, especially among more
recently published work. Xu (2008) examines the spatial and socio-economic heterogeneities in
the effects of house attributes on prices by applying expansion models, which incorporate
interactions between property characteristics and location coordinates, to Shenzhen housing
market data. Galati and Teppa (2017) study the extent to which house price dynamics
differ across market segments as well as possible drivers of this heterogeneity using Dutch
survey data about individual houses and mortgages. Ozhegov and Sidorovykh (2017) focus on the
heterogeneity of sellers in the housing market and evaluate the effects of sellers' pricing strategy
on pricing dynamics. Heyman and Sommervoll (2019) look at the difference in explanatory power
of absolute versus relative location, defined as the proximity to subway, parks and services, in a
hedonic-based model for apartment sales within the city of Oslo, Norway.
Overall, the previous studies considered here are related to the central question of this essay
to some extent: they either work on heterogeneity problems using hedonic-based models or apply
machine learning approaches to other fields. Nevertheless, to our knowledge this essay is the first
academic work to apply CTREE, a more recently developed tree-based algorithm, to study the
heterogeneity of factor importance across different areas.
2.3 The Conditional Inference Tree Algorithm
2.3.1 Theoretical foundation
The theoretical framework is based on the utility tree theory proposed by Strotz (1957) and
developed by Apps (1973). In Strotz (1957), the utility function assumes that consumers
make hierarchical decisions when allocating resources. To be more specific, consumers first
distribute their income among different commodity groups, and then choose the commodities within
each sub-group according to their characteristics. Correspondingly, the utility function can be
expressed recursively as a hierarchy of sub-utility functions generated within each commodity group,
such as education, housing, and living expenses.
Based on the idea of Strotz, Apps (1973) develops a model specifically for housing demand.
In Apps' model, the resource allocation in the home-buying process also follows a hierarchical
structure. At the base level, all features of a property form the single commodity group, housing,
which is equivalent in hierarchy to education, food, and so on. At the secondary level, housing is
then separated into three branches: space, location, and internal services. Then at the tertiary
level, each of the housing sub-branches is separated into its component characteristics. Figure 2.1
illustrates the tree structure from Apps' work.
Figure 2.1: Apps' tree structure of housing demand.
Following the work of Strotz and Apps, we express the total utility from home-buying as a
function of the levels of satisfaction gained from the sub-group commodities defined in Apps.
Specifically, the utility function of home-buying in our study can be written as:
$$U_H = U\left[\,U_1(x_{11}, x_{12}, \ldots, x_{1a}),\; U_2(x_{21}, x_{22}, \ldots, x_{2b}),\; \ldots\,\right] \tag{2.1}$$
where
$U_H$ represents the level of consumer utility from home purchasing,
$U_i$ represents the sub-group utility from the corresponding commodity group,
$x_{ij}$ represents the component characteristic in each of the commodity groups.
This specification of the home-buying utility indicates that the purchasing decision is made based
on the value contribution of each characteristic, and since the sub-utilities may
be ranked differently in priority, the attributes most likely have varied importance in
determining home prices.
2.3.2 The CTREE algorithm
2.3.2.1 Traditional decision tree approach
Before introducing the CTREE algorithm, we first look at a more traditional machine
learning technique, the decision tree method, which has been applied to the Singapore public
resale housing market in an earlier economics publication (Fan et al., 2006). Decision trees are
built through recursive partitioning, which is essentially a series of binary splits of the data
with the objective of maximizing the information gain, as measured for example by the Gini
index. The most commonly seen algorithms for building decision trees are CHAID, CART,
and C4.5/C5.0. Traditional decision trees are generally constructed through tree growing and
pruning.
In the growing process, the algorithm recursively searches for the covariate that splits the
dataset into two groups with the maximum discrepancy in the dependent variable between
the two sub-groups. At the beginning, the full training sample is considered the root node, and
subsequent branch nodes are constructed from the covariate selected at each step of the splitting
process. The process continues until no further split of the training data can produce statistically
significant differences between the two sub-groups, which is when we arrive at a terminal node.
The criterion used at each node to split the data varies across algorithms, with one of the
most well-known being the Gini index.
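As a rough illustration of the growing step, the sketch below searches a single numeric covariate for the binary split that minimizes the weighted Gini impurity of the two sub-groups. It is a minimal sketch for a categorical response and not the procedure of any particular package; the function names `gini` and `best_split` and the toy data are assumptions made for illustration only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 minus the sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(x, y):
    """Return (threshold, weighted_gini) for the best split of the form x <= threshold."""
    n = len(x)
    best = (None, gini(y))  # baseline: impurity of the unsplit node
    for t in sorted(set(x))[:-1]:  # splitting at the maximum would leave one side empty
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best[1]:
            best = (t, score)
    return best

# Hypothetical toy data: number of bedrooms and a high/low price label.
bedrooms = [1, 2, 2, 3, 4, 5]
price_class = ["low", "low", "low", "high", "high", "high"]
threshold, impurity = best_split(bedrooms, price_class)
```

In this toy example the split `bedrooms <= 2` separates the two price classes perfectly, so the weighted impurity drops to zero. A full tree would repeat the search over all covariates and recurse into each sub-group.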
One of the problems associated with traditional decision tree growing is overfitting: the
algorithm often produces very large trees with so many splits that they fit the training
data closely but are unable to generalize to other test samples. Pruning can help reduce the size of the
trees and hence improve the performance of the constructed trees. To find the optimally-sized tree,
the algorithm recursively removes sub-trees from the original tree to minimize the sum of
the misclassification rate and the complexity cost.
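The pruning objective can be written compactly. A common formulation, following the cost-complexity pruning of Breiman et al. (1984), is sketched below; the symbols $R(T)$, $|\tilde{T}|$, and $\lambda$ are notation assumed here for illustration and do not come from the text above.

```latex
\[
  R_{\lambda}(T) \;=\; R(T) \;+\; \lambda\,\lvert \tilde{T} \rvert , \qquad \lambda \ge 0
\]
```

Here $R(T)$ is the error rate of tree $T$ on the training data, $|\tilde{T}|$ is its number of terminal nodes, and $\lambda$ is the cost charged per terminal node. For a given $\lambda$, pruning selects the sub-tree of the fully grown tree that minimizes $R_{\lambda}(T)$, so larger values of $\lambda$ lead to smaller trees.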
Although pruning can reduce the overfitting problem of traditional decision trees to some
extent, the algorithm still suffers from the variable selection problem. Since the algorithm
maximizes a splitting criterion over all possible splits simultaneously, a variable selection bias may
be introduced. This is especially the case when the covariates have different numbers of missing
values. Many studies identify this as a major problem, as seen in Kass (1980), Segal (1988),
and Breiman et al. (1984). To avoid the variable selection bias, the conditional inference tree
(CTREE) method is developed based on the "distributional properties of the measures" (White
and Liu, 1994) as another recursive partitioning approach.
2.3.2.2 Conditional inference trees
The CTREE algorithm is a non-parametric prediction methodology which employs recursive
binary partitioning embedded in Strasser and Weber's (1999) framework of permutation tests.
This approach, first introduced by Hothorn et al. (2006), results in unbiased predictor selection
and does not require pruning. CTREE avoids the variable selection bias by separating the
selection of the splitting variable from the search for the split itself, so that the chance of
selecting a variable does not depend on its number of categorical levels or missing values, as it
does in the traditional tree algorithm. Moreover, CTREE
uses a significance test procedure in the variable selection step instead of maximizing an
information measure, ensuring that the variable with the strongest association with the target
variable is selected. In the next few paragraphs, we walk through the algorithm, introducing
the implementation steps and the selection, splitting, and stopping criteria.
Under the conditional inference setting, the distribution of the dependent variable $Y$ is modeled
conditionally on a function $g$ of a set of $m$ covariates $X_1, \ldots, X_m$
(i.e. $f(Y \mid g(X_1, X_2, \ldots, X_m))$). The learning
sample used to produce trees consists of $n$ independent and identically distributed observations,
with some possibly missing values in certain covariates:
$$\mathcal{L}_n = \{(Y_i, X_{1i}, X_{2i}, \ldots, X_{mi});\; i = 1, \ldots, n\}$$
The algorithm formulates the recursive binary partitioning using a case weights vector
$w = (w_1, w_2, \ldots, w_n)$. The weight equals 1 if the corresponding observation is an element of a certain
node and equals 0 otherwise. With the case weights vector defined, the algorithm is briefly
described in the following steps:
Step 1. At each node, test the global null hypothesis $H_0: f(Y \mid X_j) = f(Y)$ for all
$j = 1, \ldots, m$ at a pre-specified significance level of $\alpha = 0.05$. That is, test the
independence between any of the $m$ covariates and the response. If the null hypothesis cannot
be rejected, the algorithm stops. Otherwise, select the covariate $X_j$ with the strongest
association with the dependent variable $Y$, according to a selection criterion that is explained later.
Step 2. Choose a set $A$ and split the data at the current node into subsets $A$ and
$X_j \setminus A$ based on a splitting criterion (specified below). Determine the case weights
$w_{\text{left}}$ and $w_{\text{right}}$. For any binary partition of the sample into subsets $A$
and $X_j \setminus A$, the case weight vectors contain $w_{\text{left},i} = w_i I(X_{ji} \in A)$ and
$w_{\text{right},i} = w_i I(X_{ji} \notin A)$, with $I(\cdot)$ as the indicator function.
Step 3. Recursively repeat steps 1 and 2 with the modified case weights $w_{\text{left}}$ and
$w_{\text{right}}$, which are used in the selection and splitting criteria.
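The case weight update in Step 2 can be sketched in a few lines. This is a minimal illustration under the assumption of 0/1 weights; the function name `split_weights` and the flooring example are hypothetical and are not taken from the thesis or from any package.

```python
def split_weights(weights, x_j, subset_a):
    """Case weights of the two daughter nodes for the split 'X_j in A' versus 'X_j not in A'."""
    w_left = [w * (1 if x in subset_a else 0) for w, x in zip(weights, x_j)]
    w_right = [w * (0 if x in subset_a else 1) for w, x in zip(weights, x_j)]
    return w_left, w_right

# Hypothetical node: five listings, the last of which is not in the current node (weight 0).
w = [1, 1, 1, 1, 0]
flooring = ["wood", "carpet", "wood", "other", "wood"]
w_left, w_right = split_weights(w, flooring, {"wood"})
```

Each observation in the current node ends up in exactly one daughter node, and an observation outside the node keeps weight 0 on both sides, which is what allows Step 3 to recurse by passing only the weight vectors.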
To construct a tree that is interpretable and does not suffer from the variable selection bias,
the way in which the selection and splitting criteria are defined is crucial. These criteria are the
focus of the remainder of this section.
Variable selection criteria
To select the covariate with the strongest association with the dependent variable, the measure
of association between $Y$ and $X_j$ is first defined by the following equation:
$$T_j(\mathcal{L}_n, w) = \mathrm{vec}\left(\sum_{i=1}^{n} w_i\, g_j(X_{ji})\, h(Y_i, (Y_1, Y_2, \ldots, Y_n))^{T}\right) \tag{2.2}$$
where $\mathrm{vec}$ is the vec-operator, $(\cdot)^{T}$ is the transpose operation, $g_j$ is a
transformation of the covariate $X_j$, and $h$ is the influence function of the response. For each
of the $X_j$'s, the deviation of the above measure from its expectation is tested under the partial
hypothesis $H_0^j: D(Y \mid X_j) = D(Y)$. This
testing method is generally known as a permutation test. In Step 1 of the algorithm, the covariate
with the minimum p-value over all possible permutations is selected.
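The selection step can be illustrated with a small Monte Carlo permutation test. The sketch below is an assumption-laden simplification: $g_j$ and $h$ are both taken to be identity maps, all case weights equal 1, p-values are approximated by random relabeling rather than computed exactly, and a Bonferroni adjustment is used for the global test. The function names, the number of permutations, the seed, and the toy data are hypothetical and do not describe how any particular package computes its p-values.

```python
import random

def linear_statistic(x, y):
    """Sum of x_i * (y_i - mean(y)): the linear statistic with g and h taken as identity maps."""
    y_bar = sum(y) / len(y)
    return sum(xi * (yi - y_bar) for xi, yi in zip(x, y))

def permutation_p_value(x, y, n_perm=2000, seed=0):
    """Monte Carlo p-value for independence of x and y under random relabeling of y."""
    rng = random.Random(seed)
    observed = abs(linear_statistic(x, y))
    y_perm = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(linear_statistic(x, y_perm)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def select_covariate(covariates, y, alpha=0.05):
    """Return (name, p_values); name is None when the global null cannot be rejected."""
    m = len(covariates)
    # Bonferroni-adjusted p-values, one per covariate.
    p_values = {name: min(1.0, m * permutation_p_value(x, y))
                for name, x in covariates.items()}
    best = min(p_values, key=p_values.get)
    return (best if p_values[best] < alpha else None), p_values

# Hypothetical toy node: price rises with square footage; 'noise' is unrelated.
price = [310, 340, 360, 400, 430, 470, 520, 600, 660, 730]
covariates = {
    "sqft": [800, 950, 1100, 1300, 1500, 1700, 2000, 2400, 2800, 3200],
    "noise": [3, 1, 4, 1, 5, 9, 2, 6, 5, 3],
}
selected, p_values = select_covariate(covariates, price)
```

In this toy node the square footage covariate receives by far the smaller p-value and is selected, while a node in which no adjusted p-value falls below `alpha` would return `None` and become a terminal node.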
Splitting criteria
With the covariate selected in Step 1, the algorithm uses the permutation test
again to find the optimal binary split. For any possible subset $A$ at each node, the discrepancy
between the samples $\{Y_i \mid w_i > 0 \text{ and } X_{ji} \in A\}$ and
$\{Y_i \mid w_i > 0 \text{ and } X_{ji} \notin A\}$ can be expressed as:
$$T_j^{A}(\mathcal{L}_n, w) = \mathrm{vec}\left(\sum_{i=1}^{n} w_i\, I(X_{ji} \in A)\, h(Y_i, (Y_1, \ldots, Y_n))^{T}\right) \tag{2.3}$$
This measure is again evaluated under the null hypothesis, under which its conditional expectation
and covariance can be calculated. The optimal split $A^{*}$ in Step 2 maximizes the standardized
test statistic over all possible subsets $A$.
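A simplified version of the split search can be sketched as follows. The sketch assumes a univariate numeric response with $h$ taken as the identity, 0/1 case weights, and an ordered covariate so that candidate sets have the form $A = \{x \le t\}$; the expectation and variance used for standardization are those of a sum of values drawn without replacement from the node. All function names and the toy data are hypothetical.

```python
import math

def standardized_split_statistic(in_a, y, w):
    """|T - E[T]| / sd[T] for T = sum_i w_i * I(X_ji in A) * Y_i under the permutation null."""
    node_y = [yi for yi, wi in zip(y, w) if wi > 0]
    n = len(node_y)
    n_a = sum(1 for a, wi in zip(in_a, w) if wi > 0 and a)
    if n_a == 0 or n_a == n:
        return 0.0  # one side would be empty, so this is not a valid split
    y_bar = sum(node_y) / n
    var_y = sum((yi - y_bar) ** 2 for yi in node_y) / n
    t = sum(yi for yi, a, wi in zip(y, in_a, w) if wi > 0 and a)
    expectation = n_a * y_bar
    variance = n_a * (n - n_a) / (n - 1) * var_y
    return abs(t - expectation) / math.sqrt(variance) if variance > 0 else 0.0

def best_binary_split(x, y, w):
    """Search thresholds t of an ordered covariate with A = {x <= t}; return (t, statistic)."""
    candidates = sorted({xi for xi, wi in zip(x, w) if wi > 0})[:-1]
    best_t, best_c = None, 0.0
    for t in candidates:
        c = standardized_split_statistic([xi <= t for xi in x], y, w)
        if c > best_c:
            best_t, best_c = t, c
    return best_t, best_c

# Hypothetical node: price per square foot jumps between 2 and 3 bathrooms.
bathrooms = [1, 1, 2, 2, 3, 3, 4, 4]
price_sqft = [300, 310, 305, 315, 500, 510, 505, 515]
weights = [1] * 8
t_star, c_star = best_binary_split(bathrooms, price_sqft, weights)
```

In the toy node the largest standardized statistic is obtained at `bathrooms <= 2`, where the two sub-samples differ most in their average price. For an unordered categorical covariate the same statistic would be evaluated over subsets of levels rather than thresholds.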
This section provides a brief overview of the conditional inference tree algorithm introduced by
Hothorn et al. (2006). The main conclusion from this section is that the CTREE algorithm avoids
the variable selection bias of the traditional decision tree approach. Building on the significance
tests in each implemented step, the CTREE algorithm results in a tree-structured representation of
the learning data, and provides prediction and factor importance estimation. In the next section,
we introduce the data to which the CTREE algorithm is applied to study the heterogeneity
of factor importance in housing prices.
2.4 Data
The dataset used in this essay contains 128,225 homes listed by the Multiple Listing
Service (MLS) that were sold between January 2012 and January 2018 in Los Angeles County. The
location of each listing can be determined using the city and zip code combination. There are 118
zip code areas in Los Angeles County, and the properties listed on the market show clustering
patterns in terms of geographic locations. In some of the zip code areas, few or no homes are sold
within the time period of the dataset; on the other hand, in the 10 most clustered zip code areas,
the number of homes sold accounts for nearly 28% of the total. Price levels of
properties also vary greatly across different areas. In the most expensive neighborhoods,
for instance Beverly Hills, the median home value can be as high as $3.5 million, whereas in the
least expensive areas, such as Compton, the median value of a home is only $454,000. Figure 2.2
presents the LA area's most expensive neighborhoods in 2018 and the average home values in each
area. From the valuation map, we observe the clustering of not only the geographical locations
but also prices. In adjacent neighborhoods, the prices of properties tend to be similar and are not
likely to fluctuate significantly, yet the dispersion of prices is large across the whole county.
This phenomenon motivates our research question: could the heterogeneity in the importance
of home characteristics be a reason for such price disparity? In light of this, we formulate a list of
features related to home prices and estimate their contribution to sale prices. The most common
housing features can be summarized into three groups: characteristics of the unit itself, community
features, and the agents responsible for the listing. Among these, a home's characteristics include
the number of bedrooms, the number of bathrooms, square footage, age, structure type, the
number of parking spaces, cooling type, heating type, the number of appliances, view from the
Figure 2.2: LA area's most expensive neighborhoods in 2018.
Source: Property Shark
property, the number of common walls shared with neighbors, flooring type, laundry availability,
and fireplace availability. Community features include association fees, pool availability, common
deck availability, and other attributes such as gym and security systems. The listing agents are
also important to the sales price of listings because of the resources and experience associated
with them. Typically, agents in large real estate companies have broader connections
and can better advertise the property. For each of the characteristics mentioned above, we collect
and process piecewise data from the original listings in the MLS database. This list contains
both qualitative and quantitative features: the number of rooms and parking spaces can be
represented using a numerical value, whereas the structure type and laundry availability can only
be represented using verbal descriptions. For qualitative variables, we apply word processing to
the raw MLS data and classify all the contents into several preconstructed groups, each of which
represents a category to be analyzed in the algorithm. Table 2.1 shows the descriptive statistics
of the dataset in two parts: for numerical variables, usual statistics such as the mean and standard
deviation are presented; for qualitative variables, we present the distribution of all category levels.
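The classification of free-text fields into preconstructed groups can be sketched as a simple keyword rule. The sketch below is only an assumption about what such a rule might look like for the cooling variable: the four output levels match the category levels reported for cooling in the descriptive statistics, but the keywords, the function name `classify_cooling`, and the example strings are hypothetical and are not the rules actually used on the MLS data.

```python
def classify_cooling(raw_text):
    """Map a free-text cooling description to one of four levels (hypothetical keyword rules)."""
    if raw_text is None or not raw_text.strip():
        return "not specified"
    text = raw_text.lower()
    if "none" in text or "no cooling" in text:
        return "none"
    if "central" in text:
        return "central"
    return "other"

# Hypothetical raw entries resembling listing text.
examples = ["Central Air", "Wall/Window Unit(s)", "None", ""]
levels = [classify_cooling(e) for e in examples]
```

A rule of this kind yields one categorical level per listing, with empty fields kept as their own "not specified" level rather than being dropped.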
In the Los Angeles County housing dataset, the average price per square foot is about $365, and
a typical home has 3 bedrooms and 2.4 bathrooms. About 73% of the homes sold between January
2012 and January 2018 are single family residences, with the rest being condos and townhouses.
Since single family homes usually do not belong to a homeowner association, a large portion
of the listings have a $0 association fee. It is also reasonable that many of the community-related
features, such as shared common walls and community-based amenities, are not relevant to single
family homes. Other characteristics show diversified distributions. Around 70% of the homes
have central cooling and heating; 60% of the homes have in-unit laundry; the most common types
of flooring are wood (33.8%) and carpet (31.33%); nearly 50% of the properties do not have
a pool. It is worth mentioning that not all the information listed above is available in all
listings. Some of the characteristics are not specified, potentially because they are not relevant or
the listing agents simply neglected to fill in the information. The CTREE algorithm is especially
Table 2.1: Descriptive statistics

Table 2.1a: Numerical variables

Statistic            N         Mean        St. Dev.    Pctl(25)   Median     Pctl(75)
Price per sq. ft.    128,225   $365.275    $911.211    $297.689   $434.205   $514.967
Bedrooms             128,225   2.980       1.052       2          3          4
Bathrooms            128,225   2.412       1.146       2          2          3
Square footage       128,225   1,761.457   1,002.230   1,172      1,516      2,012
Age                  128,225   54.037      26.980      35         57         71
Parking spaces       128,225   3.636       1.096       4          4          4
Association fee      128,225   115.876     203.828     0          0          245

Table 2.1b: Qualitative variables (category levels with counts and percentages)

Type          condominium: 34410 (26.84%); single family: 93815 (73.16%)
Cooling       central: 93827 (73.17%); other: 11734 (9.15%); none: 15834 (12.35%); not specified: 6830 (5.33%)
Heating       central: 87688 (68.39%); other: 17171 (13.39%); none: 1141 (0.89%); not specified: 22225 (17.33%)
Appliance     available: 80887 (63.08%); none: 1436 (1.12%); not specified: 45902 (35.80%)
Flooring      wood: 43380 (33.83%); carpet: 40172 (31.33%); other: 12682 (9.89%); not specified: 31991 (24.95%)
Laundry       in-unit: 77268 (60.26%); shared: 17620 (13.74%); none: 3457 (2.70%); not specified: 29880 (23.30%)
Fireplace     available: 25931 (20.22%); none: 10827 (8.44%); not specified: 91467 (71.33%)
View          available: 55664 (43.41%); no view: 60244 (46.98%); not specified: 12317 (9.61%)
Common Wall   none: 94158 (73.43%); 1+ walls: 5826 (4.54%); not specified: 28241 (22.02%)
Community     has feature: 23141 (18.05%); no feature: 105084 (81.95%)
Deck          available: 50301 (39.23%); no deck: 452 (0.35%); not specified: 77472 (60.42%)
Pool          available: 40073 (31.25%); no pool: 63246 (49.32%); not specified: 24906 (19.42%)
Agent         large: 35556 (27.73%); small: 81462 (63.53%); not specified: 11207 (8.74%)
well suited to handling variables with many missing values without introducing bias, and therefore
the unspecified information should not be a major problem in this study.
Figure 2.3: Regions in LA county from US Map Guide
To analyze the heterogeneity of variable importance in housing prices, we need to split
Los Angeles County into smaller neighborhoods. For example, the government of Los
Angeles County officially divides the county into five supervisorial districts, with each supervisor
representing a district of approximately 2 million people. Although the population sizes are similar
among the five districts, such a partitioning does not represent the observed housing price
discrepancy very well. Each district contains both high- and low-priced neighborhoods, and some
similar neighborhoods lie right on the border between two districts. A better partition of the county is
found on USMapGuide.com, a source that shows the boundaries of regions according to zip codes.
Figure 2.3 illustrates a more detailed division of the county than the five supervisorial districts.
Figure 2.4: Merged Regions in LA county
The list of zip codes corresponding to each region in the map is also provided by the same
source. Using this information, we are able to classify all of the listings in our dataset into
different regions. As mentioned previously, we observe geographical clustering of the homes sold:
in the central LA area, there are 16,019 properties sold within the 6-year period, whereas in
more sparsely populated areas, such as the Santa Monica Mountains and Angeles Forest, there are
fewer than 100 homes sold in 6 years. In order to perform the tree algorithm, the size of each
subsample cannot be too small. Therefore, we merge some of the regions that are geographically
close to each other and form seven districts, each containing close to or over 10,000 listings. After
the merge, we are left with the following districts: central LA, south LA, San Fernando Valley,
the north side, the south side, the east side, and the west side. Figure 2.4 shows the seven districts
merged from the original division.
In each of the regions shown in the map, we run the CTREE algorithm to grow a tree for
price determination. Along the branches of the tree, the factors influencing the house prices are
shown, and a prediction of prices is presented at the terminal nodes of the tree. The next
section discusses the detailed implementation of the CTREE algorithm and the results.
2.5 Results in the LA Housing Market
The objective of this section is to investigate the importance of housing characteristics to sales
prices and identify the significant determinants based on the decision trees for housing prices built
in each of the seven districts. To create a decision tree for housing prices, we use the "partykit"
package in R, which is developed by the authors of the CTREE algorithm. Prior to building trees,
we define the training dataset by randomly selecting 80% of the full sample in each district. The
remaining 20% is used as a validation dataset to test the performance of the algorithm.
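The per-district split into training and validation data can be sketched as below. The analysis itself is carried out in R; Python is used here purely for illustration, and the function name `train_validation_split`, the seed, and the placeholder listings are assumptions rather than details from the study.

```python
import random

def train_validation_split(rows, train_share=0.8, seed=2020):
    """Randomly assign train_share of the rows to training and the rest to validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = int(round(train_share * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

# Hypothetical listings (represented by ids) grouped by district.
listings = {"Central LA": list(range(100)), "South LA": list(range(100, 150))}
splits = {district: train_validation_split(rows) for district, rows in listings.items()}
```

Splitting within each district, rather than once over the pooled sample, keeps the 80/20 proportion in every district so that each tree is grown and validated on data from the same area.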
Running the CTREE algorithm on each of the training datasets yields seven different trees.
Because of the large number of variables and possible splits for each variable, the trees are very
large in size. The complete conditional inference trees are shown in the Appendix, and we present
a sample tree for the Central LA area with the maximum depth limited to 3 (Figure 2.5).
In this tree, each internal node denotes a covariate of the per-square-foot house price selected by
the algorithm. According to the implementation rules of CTREE, the variables that are most
significant (i.e. give the largest test statistics in the permutation test) are selected at the top
of the tree. Each internal node also shows the level of significance for the selected variable.
Under each of the internal nodes, a split is performed and the resulting categories of the selected
characteristics are separated into two branches according to CTREE's splitting rule. Following the
selection and splitting rules, all properties in the training dataset can be classified into tree leaves.
Figure 2.5: Sample Conditional Inference Tree for Central LA Area
The terminal nodes show a prediction of the average home price and the number of properties in
each leaf.
For each of the seven districts, we construct a complete conditional inference tree with the
influencing factors and the predicted prices. With these trees, we are able to compare the price
discrepancy across neighborhoods and the contributing factors. Notice that since the conditional
inference trees are built by recursively performing significance tests and selecting the variables
that provide the best test statistics, a variable may appear in the tree many times, and the
component characteristics get split into more branches as the tree grows larger. Additionally, the
order of appearance in a tree does not necessarily represent the level of importance
of a certain variable. For this reason, we run an additional algorithm associated with CTREE
which determines the importance of variables in their overall contribution to price determination
using the seven training samples. The results show that the variables with the highest level
of importance differ across neighborhoods. Figure 2.6 shows the 3 most important
factors influencing prices in each district of Los Angeles County.
In the Central LA area, the top 3 factors are the number of bathrooms, square footage, and the
type of the property, whereas in the San Fernando Valley area, the 3 most important variables
are heating type, laundry availability, and flooring type. By growing a conditional inference
tree in each of the partitioned regions of Los Angeles County, we find evidence that housing
characteristics do have varied importance in determining the sales price across different areas.
Some of the influencing factors are related to the location of the properties. For instance, in our
dataset, the condos and town homes are mostly located in South LA and the East side, which
might be the reason that the community-based amenities play an important role in South LA
housing prices, and that the association fees are one of the top influencing factors in the East
side.
Figure 2.6: Most important variables in LA county by area
2.6 Conclusion
This essay studies the heterogeneity in how housing characteristics affect sales prices across
different areas. The research question is motivated by the price clustering of properties in
Los Angeles County. One hypothesis that could explain the phenomenon is that the characteristics of
homes may have different levels of importance in determining the sales price. To examine this
hypothesis, we use the conditional inference tree (CTREE) approach, an established machine
learning technique that has been applied in fields such as health care and energy markets but not,
to our knowledge, to the housing market. The data collected from MLS in the LA County area contain
detailed attributes associated with each listing and enable us to apply the CTREE algorithm to build
trees for the analysis of price determination. We partition Los Angeles County into different regions
according to the geographical neighborhoods defined by an online map guide. Moreover, in order
to ensure a sufficient number of observations in each subsample to implement the CTREE algorithm,
we merge some of the regions defined in the map and obtain seven districts in Los Angeles
County, each with close to or over 10,000 listings that were sold between January 2012
and January 2018. Growing trees and analyzing the importance of variables in all of the seven
districts result in a prediction of per-square-foot housing prices and each variable's contribution
to the sales price. By comparing the importance of variables among all trees, it can be seen that
the housing characteristics do have varied influences on the sales price across different regions.
This empirical result from the MLS data supports our hypothesis.
At this point, there are several related questions that remain to be explored. We have
demonstrated the heterogeneity of housing price determinants across areas, and one may take a step
further to study the rationale for such heterogeneity. What could be the underlying reason that
housing characteristics contribute differently to sales prices? Two of the potential answers are: (1)
home buyers' preferences for houses may differ across areas, so it is essentially the heterogeneity of
the buyers' preferences that causes the discrepancy; (2) there might be some location-specific
factors that are not observed in the housing data, which could lead to different valuations of housing
characteristics. Exploring further evidence on the rationale for the heterogeneity would be
a good extension of the study in this essay, and may include collecting richer data on
demographics and location-specific information.
Bibliography
[1] Ahrazem Dfuf, I., McWilliams, J., González, C., 2019. Multi-Output Conditional Inference
Trees Applied to the Electricity Market: Variable Importance Analysis. Energies 12(6), 1097.
[2] Albrecht, J., Gautier, P., Vroman, S., 2016. Directed search in the housing market. Review
of Economic Dynamics 19, 218-231.
[3] An, Y., Hu, Y., Shum, M., 2010. Estimating first-price auctions with an unknown number
of bidders: A misclassification approach. Journal of Econometrics 157, 328-341.
[4] Anglin, P., Rutherford, R., Springer, T., 2003. The trade-off between the selling price of
residential properties and time-on-the-market: the impact of price setting. Journal of Real
Estate Finance and Economics 26, 95-111.
[5] Apps, P., 1973. An approach to urban modelling and evaluation. A residential model: 1.
Theory. Environment and Planning A 5, 619-632.
[6] Arnold, M.A., 1999. Search, bargaining and optimal asking prices. Real Estate Economics
27, 453-481.
[7] Baker, D., 1995. County Home Sales Slide 20.4% in May. Los Angeles Times (pre-1997
Fulltext), Jun 13, 1995, pg. 1.
[8] Benson, E.D., Hansen, J.L., Schwartz, A.L., Smersh, G.T., 1998. Pricing residential
amenities: the value of a view. Journal of Real Estate Finance and Economics 16(1),
55-73.
[9] Carrillo, P.E., 2012. An empirical stationary equilibrium search model of the housing market.
International Economic Review 53, 203-234.
[10] Carrillo, P.E., 2013. To sell or not to sell: measuring the heat of the housing market. Real
Estate Economics 41, 310-346.
[11] Chen, Y., Rosenthal, R.W., 1996. Asking prices as commitment devices. International
Economic Review 37, 129-155.
[12] Chen, Y., Rosenthal, R.W., 1996. On the use of ceiling-price commitments by monopolists.
The RAND Journal of Economics 27, 207-220.
[13] Droes, M., Minne, A., 2016. Do the Determinants of House Prices Change over Time?
Evidence from 200 Years of Transactions Data. Conference paper.
[14] Fan, G., Ong, S., Koh, H., 2006. Determinants of House Price: A Decision Tree Approach.
Urban Studies 43, 2301-2316.
[15] Fletcher, M., Gallimore, P., Mangan, J., 2000a. Heteroskedasticity in hedonic house price
models. Journal of Property Research 17(2), 93-108.
[16] Fletcher, M., Gallimore, P., Mangan, J., 2000. The modeling of housing submarkets. Journal
of Property Investment and Finance 18(4), 473–487.
[17] Galati, G., Teppa, F., 2017. Heterogeneity in House Price Dynamics. SSRN Electronic
Journal.
[18] Gao, G., Bao, Z., Cao, J., Qin, A., Sellis, T., Wu, Z., 2019. Location-Centered House
Price Prediction: A Multi-Task Learning Approach. Working paper.
[19] Glower, M., Haurin, D., Hendershott, P., 1998. Selling time and selling price: the influence
of seller motivation. Real Estate Economics 26, 719–740.
[20] Han, L., Strange, W., 2014. Bidding wars for houses. Real Estate Economics 42, 1–32.
[21] Han, L., Strange, W., 2015. The microstructure of housing markets: search, bargaining, and
brokerage. In: Duranton, G., Henderson, J.V., Strange, W. (Eds.), Handbook of Regional
and Urban Economics, Volume 5B. Elsevier, Amsterdam, pp. 813–886.
[22] Han, L., Strange, W., 2016. What is the role of the asking price for a house? Journal of
Urban Economics 93, 115–130.
[23] Haurin, D., 1988. The duration of marketing time of residential housing. Real Estate
Economics 16, 396–410.
[24] Heyman, A., Sommervoll, D., 2019. House prices and relative location. Cities 95.
[25] Hothorn, T., Hornik, K., Zeileis, A., 2006. Unbiased Recursive Partitioning. Journal of
Computational and Graphical Statistics 15, 651–674.
[26] Knight, J.R., 2002. Listing price, time on market, and ultimate selling price: causes and
effects of listing price changes. Real Estate Economics 30, 213–237.
[27] König, H., Leicht, H., Bickel, H., Fuchs, A., Gensichen, J., Maier, W., Mergenthal,
K., Riedel-Heller, S., Schäfer, I., Schön, G., Weyerer, S., Wiese, B., van den Bussche,
H., Scherer, M., Eckardt, M., 2013. Effects of multiple chronic conditions on health care
costs: An analysis based on an advanced tree-based regression model. BMC Health Services
Research 13, 219.
[28] Lancaster, K.J., 1966. A new approach to consumer theory. Journal of Political Economy
74(2), 132–157.
[29] Malpezzi, S., 2003. Hedonic pricing models: a selective and applied review. In:
O'Sullivan, T., Gibb, K. (Eds.), Housing Economics and Public Policy. Blackwell Science,
Malden, MA, pp. 67–89.
[30] Merlo, A., Ortalo-Magne, F., Rust, J., 2015. The home selling problem: theory and
evidence. International Economic Review 56, 457–484.
[31] Montero, J.M., Minguez, R., Fernandez-Aviles, G., 2018. Housing price prediction:
parametric versus semi-parametric spatial hedonic models. Journal of Geographical Systems
20(1), 27–55.
[32] Niu, G., Soest, A., 2014. House Price Expectations. Institute for the Study of Labor (IZA)
discussion paper.
[33] Ozhegov, E., Sidorovykh, A., 2017. Heterogeneity of Sellers in Housing Market: Difference
in Pricing Strategies. Journal of Housing Economics 37, 42–51.
[34] Redfin.com. Listing for 3810 Randolph Ave from
https://www.redfin.com/CA/Los-Angeles/3810-Randolph-Ave-90032/home/7001071.
[35] Rosen, S., 1974. Hedonic prices and implicit markets: product differentiation in pure
competition. Journal of Political Economy 82(1), 34–55.
[36] Selim, H., 2009. Determinants of house prices in Turkey: Hedonic regression versus artificial
neural network. Expert Systems with Applications 36(2), 2843–2852.
[37] Song, U., 2004. Nonparametric estimation of an e-Bay auction model with an unknown
number of bidders. Working paper, University of British Columbia.
[38] Sorensen, E., Miller, K., Ooi, C., 2000. The Decision Tree Approach to Stock Selection.
Journal of Portfolio Management 27, 42–52.
[39] Strotz, R.H., 1957. The empirical implications of a utility tree. Econometrica 25,
269–280.
[40] Strotz, R.H., 1959. The utility tree: a correction and further appraisal. Econometrica 27,
482–488.
[41] Toussaint-Comeau, M., Lee, J., 2018. Determinants of Housing Values and Variations in
Home Prices Across Neighborhoods in Cook County. ProfitWise News and Views, No. 1,
2018.
[42] Xu, T., 2008. Heterogeneity in Housing Attribute Prices: An Interaction Approach between
Housing Attributes, Absolute Location and Household Characteristics. International Journal
of Housing Markets and Analysis.
[43] Yavas, A., Yang, S., 1995. The strategic role of listing price in marketing real estate: theory
and evidence. Real Estate Economics 23, 347–368.
[44] Yayar, R., Demir, D., 2014. Hedonic estimation of housing market prices in Turkey. Erciyes
Universitesi Iktisadi ve Idari Bilimler Fakultesi Dergisi 43, 67–82.
Appendix A
Supplementary Material for Chapter 2
In this appendix we present the full conditional inference tree results for all seven areas. Each
tree starts from a root node and grows into internal nodes as variables are selected and split. At
each terminal node, we report the fitted value, the number of observations (n), and the
estimation error (err).
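As a reading aid for the trees that follow, the err reported at a terminal node can be converted into a per-node root-mean-square error. The sketch below assumes that err is the sum of squared deviations of the response from the node mean; this is the convention of the print method in R's partykit package, whose output format the trees appear to match, but the text itself only calls err the estimation error. The function name node_rmse is illustrative, and the inputs are taken from node [6] of the Central LA tree.

```python
import math


def node_rmse(n: int, err: float) -> float:
    """Root-mean-square error within a terminal node.

    Assumes err is the sum of squared deviations from the node mean,
    so err / n is the within-node mean squared error.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(err / n)


# Node [6] of the Central LA tree: fitted value 571.152, n = 658, err = 12156155.0
rmse = node_rmse(658, 12156155.0)
print(round(rmse, 1))
```

Under that assumption the result is in the same units as the fitted value, which makes nodes with very different n easier to compare than the raw err.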
Decision Tree Results for Central LA
[1] root
|   [2] Type in Condo / Townhouse
|   |   [3] Beds <= 1
|   |   |   [4] laund in in-unit
|   |   |   |   [5] floor in carpet, not specified, other
|   |   |   |   |   [6] view in has view: 571.152 (n = 658, err = 12156155.0)
|   |   |   |   |   [7] view in no view, not specified: 484.912 (n = 142, err = 2895474.7)
|   |   |   |   [8] floor in wood
|   |   |   |   |   [9] heating in central: 733.744 (n = 736, err = 15604184.4)
|   |   |   |   |   [10] heating in other, not specified: 599.635 (n = 23, err = 1142163.5)
|   |   |   [11] laund in shared, none, not specified
|   |   |   |   [12] appl in has appliance
|   |   |   |   |   [13] Beds <= 0: 586.090 (n = 70, err = 1254314.1)
|   |   |   |   |   [14] Beds > 0: 521.407 (n = 498, err = 7946851.7)
|   |   |   |   [15] appl in none, not specified
|   |   |   |   |   [16] cooling in central, not specified: 489.049 (n = 69, err = 1018549.1)
|   |   |   |   |   [17] cooling in none, other: 369.420 (n = 29, err = 356283.0)
|   |   [18] Beds > 1
|   |   |   [19] heating in central
|   |   |   |   [20] Beds <= 2
|   |   |   |   |   [21] floor in carpet, not specified: 475.551 (n = 1319, err = 14276626.6)
|   |   |   |   |   [22] floor in other, wood: 511.050 (n = 1397, err = 22146518.0)
|   |   |   |   [23] Beds > 2
|   |   |   |   |   [24] park in 0, 2, 3, 10: 450.304 (n = 595, err = 9156976.4)
|   |   |   |   |   [25] park in 1, 4: 486.675 (n = 235, err = 4829140.5)
|   |   |   [26] heating in other, none, not specified
|   |   |   |   [27] laund in in-unit, shared: 868.756 (n = 38, err = 549183.6)
|   |   |   |   [28] laund in not specified: 460.158 (n = 26, err = 643721.1)
|   [29] Type in Single Family
|   |   [30] pool in has pool
|   |   |   [31] Baths <= 3
|   |   |   |   [32] Beds <= 3
|   |   |   |   |   [33] cooling in central: 861.086 (n = 419, err = 35095453.0)
|   |   |   |   |   [34] cooling in none, other, not specified: 607.649 (n = 33, err = 1324228.5)
|   |   |   |   [35] Beds > 3
|   |   |   |   |   [36] laund in in-unit, none, shared: 710.900 (n = 133, err = 8513607.5)
|   |   |   |   |   [37] laund in not specified: 463.001 (n = 14, err = 251110.6)
|   |   |   [38] Baths > 3
|   |   |   |   [39] Beds <= 3: 1067.559 (n = 164, err = 58712011.8)
|   |   |   |   [40] Beds > 3: 879.892 (n = 814, err = 255044077.4)
|   |   [41] pool in none, not specified
|   |   |   [42] cooling in central
|   |   |   |   [43] floor in carpet, not specified
|   |   |   |   |   [44] Beds <= 1: 903.937 (n = 21, err = 3360022.8)
|   |   |   |   |   [45] Beds > 1: 609.349 (n = 1726, err = 116014366.9)
|   |   |   |   [46] floor in other, wood
|   |   |   |   |   [47] Baths <= 5: 684.462 (n = 4318, err = 215129064.8)
|   |   |   |   |   [48] Baths > 5: 954.634 (n = 146, err = 104931262.8)
|   |   |   [49] cooling in none, not specified, other
|   |   |   |   [50] Beds <= 3
|   |   |   |   |   [51] Beds <= 2: 629.852 (n = 890, err = 48359070.3)
|   |   |   |   |   [52] Beds > 2: 548.290 (n = 876, err = 46076123.7)
|   |   |   |   [53] Beds > 3
|   |   |   |   |   [54] age <= 99: 488.774 (n = 414, err = 17769761.9)
|   |   |   |   |   [55] age > 99: 310.993 (n = 216, err = 4241557.8)
Number of inner nodes: 27
Number of terminal nodes: 28
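To make the reading of a printed tree concrete, the sketch below walks a hand-encoded fragment of the Central LA tree (nodes [4] to [10], the in-unit laundry branch for condos and townhouses with at most one bedroom) and returns the terminal-node value for a listing. The nested-dictionary encoding, the function name predict, and the spelled-out level names such as "not specified" are illustrative assumptions, not the author's code.

```python
# Hypothetical hand encoding of nodes [4]-[10] of the Central LA tree.
# An inner node is a dict naming the split variable and its branches;
# a terminal node is the fitted value printed after the colon.
FRAGMENT = {
    "var": "floor",
    "branches": [
        ({"carpet", "not specified", "other"}, {          # node [5]
            "var": "view",
            "branches": [
                ({"has view"}, 571.152),                  # node [6]
                ({"no view", "not specified"}, 484.912),  # node [7]
            ],
        }),
        ({"wood"}, {                                      # node [8]
            "var": "heating",
            "branches": [
                ({"central"}, 733.744),                   # node [9]
                ({"other", "not specified"}, 599.635),    # node [10]
            ],
        }),
    ],
}


def predict(node, listing):
    """Follow the branch whose level set contains the listing's value
    until a terminal node (a plain number) is reached."""
    while isinstance(node, dict):
        value = listing[node["var"]]
        for levels, child in node["branches"]:
            if value in levels:
                node = child
                break
        else:
            raise KeyError(f"no branch for {node['var']}={value!r}")
    return node


# A listing with wood floors and central heating ends in node [9].
example = {"floor": "wood", "heating": "central"}
print(predict(FRAGMENT, example))
```

Only the variables on the path actually taken are consulted, which is why the example listing does not need a view entry.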
Decision Tree Results for the East Side
[1] root
|   [2] floor in carpet, not specified, other
|   |   [3] Beds <= 3
|   |   |   [4] Type in Condo / Townhouse
|   |   |   |   [5] floor in carpet, other
|   |   |   |   |   [6] fireplace in none: 362.718 (n = 51, err = 187668.9)
|   |   |   |   |   [7] fireplace in not specified, yes: 329.678 (n = 477, err = 2186569.5)
|   |   |   |   [8] floor in not specified
|   |   |   |   |   [9] Beds <= 2: 310.782 (n = 178, err = 635570.5)
|   |   |   |   |   [10] Beds > 2: 224.828 (n = 54, err = 220021.2)
|   |   |   [11] Type in Single Family
|   |   |   |   [12] cooling in central, not specified
|   |   |   |   |   [13] Beds <= 2: 498.326 (n = 717, err = 19161003.2)
|   |   |   |   |   [14] Beds > 2: 415.644 (n = 1223, err = 18227862.8)
|   |   |   |   [15] cooling in none, other
|   |   |   |   |   [16] Beds <= 2: 404.584 (n = 1296, err = 29241523.8)
|   |   |   |   |   [17] Beds > 2: 338.709 (n = 890, err = 9393520.7)
|   |   [18] Beds > 3
|   |   |   [19] appl in has appliance
|   |   |   |   [20] cooling in central, not specified, other
|   |   |   |   |   [21] age <= 36: 319.989 (n = 163, err = 556073.1)
|   |   |   |   |   [22] age > 36: 370.732 (n = 178, err = 1631538.5)
|   |   |   |   [23] cooling in none
|   |   |   |   |   [24] Beds <= 4: 315.078 (n = 40, err = 360380.9)
|   |   |   |   |   [25] Beds > 4: 210.904 (n = 20, err = 60890.8)
|   |   |   [26] appl in none, not specified
|   |   |   |   [27] Beds <= 5
|   |   |   |   |   [28] fireplace in none, yes: 305.946 (n = 265, err = 2281593.0)
|   |   |   |   |   [29] fireplace in not specified: 261.997 (n = 201, err = 1300493.5)
|   |   |   |   [30] Beds > 5: 183.833 (n = 40, err = 153088.8)
|   [31] floor in wood
|   |   [32] Type in Condo / Townhouse
|   |   |   [33] heating in central, not specified
|   |   |   |   [34] Baths <= 2
|   |   |   |   |   [35] view in has view, not specified: 356.401 (n = 240, err = 977851.8)
|   |   |   |   |   [36] view in no view: 303.007 (n = 48, err = 304630.2)
|   |   |   |   [37] Baths > 2
|   |   |   |   |   [38] pool in has pool, not specified: 281.095 (n = 47, err = 250572.9)
|   |   |   |   |   [39] pool in none: 393.009 (n = 9, err = 31403.4)
|   |   |   [40] heating in other: 454.074 (n = 14, err = 45637.1)
|   |   [41] Type in Single Family
|   |   |   [42] Beds <= 2
|   |   |   |   [43] cooling in central
|   |   |   |   |   [44] pool in has pool, not specified: 564.822 (n = 365, err = 7231235.8)
|   |   |   |   |   [45] pool in none: 655.647 (n = 286, err = 7665994.3)
|   |   |   |   [46] cooling in none, not specified, other
|   |   |   |   |   [47] appl in has appliance: 543.302 (n = 335, err = 9796545.1)
|   |   |   |   |   [48] appl in none, not specified: 456.120 (n = 264, err = 5373861.1)
|   |   |   [49] Beds > 2
|   |   |   |   [50] cooling in central, not specified, other
|   |   |   |   |   [51] Beds <= 3: 478.543 (n = 1233, err = 17915734.4)
|   |   |   |   |   [52] Beds > 3: 388.013 (n = 322, err = 3765302.3)
|   |   |   |   [53] cooling in none
|   |   |   |   |   [54] Beds <= 3: 369.663 (n = 199, err = 2016849.3)
|   |   |   |   |   [55] Beds > 3: 287.588 (n = 53, err = 244446.6)
Number of inner nodes: 27
Number of terminal nodes: 28
Decision Tree Results for the North Side
[1] root
|   [2] age <= 54.03745
|   |   [3] fireplace in none, yes
|   |   |   [4] Baths <= 2
|   |   |   |   [5] Type in Condo / Townhouse: 285.104 (n = 195, err = 681786.1)
|   |   |   |   [6] Type in Single Family
|   |   |   |   |   [7] laund in in-unit, none: 354.965 (n = 352, err = 1703921.6)
|   |   |   |   |   [8] laund in not specified, shared: 323.375 (n = 192, err = 889955.6)
|   |   |   [9] Baths > 2
|   |   |   |   [10] floor in carpet, not specified, other
|   |   |   |   |   [11] heating in central, not specified: 276.505 (n = 1177, err = 3256272.6)
|   |   |   |   |   [12] heating in other: 214.720 (n = 12, err = 17920.6)
|   |   |   |   [13] floor in wood: 295.803 (n = 345, err = 978807.9)
|   |   [14] fireplace in not specified
|   |   |   [15] floor in carpet, not specified
|   |   |   |   [16] Type in Condo / Townhouse
|   |   |   |   |   [17] age <= 13: 283.877 (n = 302, err = 1664910.5)
|   |   |   |   |   [18] age > 13: 234.984 (n = 1007, err = 2743196.5)
|   |   |   |   [19] Type in Single Family
|   |   |   |   |   [20] age <= 38: 254.899 (n = 1210, err = 5342814.5)
|   |   |   |   |   [21] age > 38: 287.160 (n = 1008, err = 3140571.6)
|   |   |   [22] floor in other, wood
|   |   |   |   [23] Baths <= 2
|   |   |   |   |   [24] pool in has pool: 280.537 (n = 228, err = 1610513.4)
|   |   |   |   |   [25] pool in none, not specified: 316.442 (n = 295, err = 1236142.1)
|   |   |   |   [26] Baths > 2
|   |   |   |   |   [27] pool in has pool, not specified: 279.877 (n = 613, err = 2052424.1)
|   |   |   |   |   [28] pool in none: 262.446 (n = 264, err = 535778.3)
|   [29] age > 54.03745
|   |   [30] Beds <= 2
|   |   |   [31] Baths <= 1
|   |   |   |   [32] Beds <= 1: 499.433 (n = 96, err = 2256106.5)
|   |   |   |   [33] Beds > 1
|   |   |   |   |   [34] comm in has feature: 478.792 (n = 104, err = 890688.0)
|   |   |   |   |   [35] comm in no feature: 420.016 (n = 538, err = 5384550.8)
|   |   |   [36] Baths > 1
|   |   |   |   [37] floor in carpet, other, wood: 396.670 (n = 212, err = 1726521.5)
|   |   |   |   [38] floor in not specified: 327.508 (n = 58, err = 431061.3)
|   |   [39] Beds > 2
|   |   |   [40] Beds <= 3
|   |   |   |   [41] fireplace in none, yes
|   |   |   |   |   [42] floor in carpet, other: 353.591 (n = 291, err = 1156673.6)
|   |   |   |   |   [43] floor in not specified, wood: 377.931 (n = 458, err = 2244487.9)
|   |   |   |   [44] fireplace in not specified
|   |   |   |   |   [45] floor in carpet, not specified: 323.735 (n = 719, err = 4073127.1)
|   |   |   |   |   [46] floor in other, wood: 357.855 (n = 522, err = 3036588.7)
|   |   |   [47] Beds > 3
|   |   |   |   [48] floor in carpet, wood
|   |   |   |   |   [49] pool in has pool, none: 301.526 (n = 308, err = 1389799.7)
|   |   |   |   |   [50] pool in not specified: 341.035 (n = 147, err = 1862069.8)
|   |   |   |   [51] floor in not specified, other
|   |   |   |   |   [52] Beds <= 4: 283.298 (n = 200, err = 931309.1)
|   |   |   |   |   [53] Beds > 4: 235.818 (n = 52, err = 207180.2)
Number of inner nodes: 26
Number of terminal nodes: 27
Decision Tree Results for the South Side
[1] root
|   [2] cooling in central, none, other
|   |   [3] floor in carpet, not specified, other
|   |   |   [4] Type in Condo / Townhouse
|   |   |   |   [5] Beds <= 1
|   |   |   |   |   [6] appl in has appliance: 442.995 (n = 176, err = 3451640.8)
|   |   |   |   |   [7] appl in not specified: 359.053 (n = 94, err = 1555110.9)
|   |   |   |   [8] Beds > 1
|   |   |   |   |   [9] appl in has appliance, none: 346.513 (n = 1131, err = 13917604.7)
|   |   |   |   |   [10] appl in not specified: 292.262 (n = 606, err = 3904011.6)
|   |   |   [11] Type in Single Family
|   |   |   |   [12] appl in has appliance
|   |   |   |   |   [13] comm in has feature: 392.033 (n = 767, err = 11794616.7)
|   |   |   |   |   [14] comm in no feature: 465.906 (n = 653, err = 15259159.4)
|   |   |   |   [15] appl in none, not specified
|   |   |   |   |   [16] Beds <= 3: 384.857 (n = 1021, err = 17437263.9)
|   |   |   |   |   [17] Beds > 3: 301.512 (n = 274, err = 3599698.7)
|   |   [18] floor in wood
|   |   |   [19] comm in has feature
|   |   |   |   [20] Type in Condo / Townhouse
|   |   |   |   |   [21] Baths <= 2: 374.019 (n = 161, err = 2200354.4)
|   |   |   |   |   [22] Baths > 2: 306.840 (n = 82, err = 609749.8)
|   |   |   |   [23] Type in Single Family
|   |   |   |   |   [24] Beds <= 3: 457.627 (n = 528, err = 8482619.6)
|   |   |   |   |   [25] Beds > 3: 370.784 (n = 103, err = 1992755.6)
|   |   |   [26] comm in no feature
|   |   |   |   [27] cooling in central, other
|   |   |   |   |   [28] Type in Condo / Townhouse: 465.070 (n = 207, err = 5313851.1)
|   |   |   |   |   [29] Type in Single Family: 580.260 (n = 457, err = 12711801.9)
|   |   |   |   [30] cooling in none
|   |   |   |   |   [31] pool in has pool, none: 402.863 (n = 236, err = 4756871.0)
|   |   |   |   |   [32] pool in not specified: 537.078 (n = 55, err = 632051.8)
|   [33] cooling in not specified
|   |   [34] age <= 60
|   |   |   [35] age <= 35: 381.412 (n = 40, err = 986879.0)
|   |   |   [36] age > 35
|   |   |   |   [37] laund in in-unit, shared: 492.779 (n = 220, err = 4499658.1)
|   |   |   |   [38] laund in none, not specified: 334.931 (n = 13, err = 219262.0)
|   |   [39] age > 60
|   |   |   [40] Beds <= 3
|   |   |   |   [41] pool in has pool, none: 659.486 (n = 239, err = 5779961.9)
|   |   |   |   [42] pool in not specified
|   |   |   |   |   [43] Baths <= 1: 660.978 (n = 62, err = 2069502.1)
|   |   |   |   |   [44] Baths > 1: 562.613 (n = 120, err = 2513255.3)
|   |   |   [45] Beds > 3: 550.448 (n = 121, err = 2022842.9)
Number of inner nodes: 22
Number of terminal nodes: 23
Decision Tree Results for San Fernando Valley
[1] root
|   [2] laund in in-unit, not specified, shared
|   |   [3] age <= 63
|   |   |   [4] Type in Condo / Townhouse
|   |   |   |   [5] Beds <= 2
|   |   |   |   |   [6] appl in has appliance: 317.262 (n = 5513, err = 37387415.7)
|   |   |   |   |   [7] appl in none, not specified: 279.932 (n = 3299, err = 21712567.0)
|   |   |   |   [8] Beds > 2
|   |   |   |   |   [9] age <= 11: 322.605 (n = 958, err = 9596463.6)
|   |   |   |   |   [10] age > 11: 254.531 (n = 4023, err = 15022163.8)
|   |   |   [11] Type in Single Family
|   |   |   |   [12] floor in carpet, not specified
|   |   |   |   |   [13] view in has view: 338.188 (n = 6084, err = 69396769.4)
|   |   |   |   |   [14] view in no view, not specified: 311.693 (n = 12298, err = 92876151.8)
|   |   |   |   [15] floor in other, wood
|   |   |   |   |   [16] Baths <= 5: 359.654 (n = 8266, err = 100546235.5)
|   |   |   |   |   [17] Baths > 5: 509.571 (n = 552, err = 38650366.9)
|   |   [18] age > 63
|   |   |   [19] Beds <= 2
|   |   |   |   [20] age <= 67
|   |   |   |   |   [21] laund in in-unit: 459.636 (n = 536, err = 9195676.3)
|   |   |   |   |   [22] laund in not specified, shared: 408.636 (n = 661, err = 10646613.7)
|   |   |   |   [23] age > 67
|   |   |   |   |   [24] floor in carpet, not specified, other: 480.656 (n = 962, err = 26882353.9)
|   |   |   |   |   [25] floor in wood: 529.664 (n = 589, err = 12808035.6)
|   |   |   [26] Beds > 2
|   |   |   |   [27] heating in none, not specified, other
|   |   |   |   |   [28] Beds <= 3: 376.152 (n = 2216, err = 25664480.3)
|   |   |   |   |   [29] Beds > 3: 323.820 (n = 638, err = 8732559.1)
|   |   |   |   [30] heating in central
|   |   |   |   |   [31] age <= 73: 405.184 (n = 4462, err = 60943497.3)
|   |   |   |   |   [32] age > 73: 470.133 (n = 1344, err = 40513078.2)
|   [33] laund in none: 709.015 (n = 874, err = 98911060074.8)
Number of inner nodes: 16
Number of terminal nodes: 17
Decision Tree Results for South LA
[1] root
|   [2] appl in has appliance
|   |   [3] floor in carpet, other
|   |   |   [4] Beds <= 3
|   |   |   |   [5] pool in has pool, none
|   |   |   |   |   [6] cooling in central, not specified: 328.658 (n = 587, err = 5604697.6)
|   |   |   |   |   [7] cooling in none, other: 287.058 (n = 585, err = 4659271.1)
|   |   |   |   [8] pool in not specified
|   |   |   |   |   [9] view in has view, not specified: 302.193 (n = 162, err = 1328659.0)
|   |   |   |   |   [10] view in no view: 252.892 (n = 333, err = 1641703.8)
|   |   |   [11] Beds > 3
|   |   |   |   [12] age <= 42: 191.151 (n = 72, err = 246578.8)
|   |   |   |   [13] age > 42
|   |   |   |   |   [14] Beds <= 4: 266.024 (n = 269, err = 1770498.9)
|   |   |   |   |   [15] Beds > 4: 224.101 (n = 62, err = 355543.6)
|   |   [16] floor in not specified, wood
|   |   |   [17] cooling in central, not specified
|   |   |   |   [18] pool in has pool, not specified
|   |   |   |   |   [19] fireplace in none, yes: 372.436 (n = 102, err = 1209411.4)
|   |   |   |   |   [20] fireplace in not specified: 319.449 (n = 370, err = 3047137.2)
|   |   |   |   [21] pool in none
|   |   |   |   |   [22] floor in not specified: 345.096 (n = 141, err = 2034132.6)
|   |   |   |   |   [23] floor in wood: 400.751 (n = 612, err = 12308908.8)
|   |   |   [24] cooling in none, other
|   |   |   |   [25] Beds <= 3
|   |   |   |   |   [26] view in has view: 343.728 (n = 255, err = 2503972.9)
|   |   |   |   |   [27] view in no view, not specified: 308.709 (n = 607, err = 4666174.7)
|   |   |   |   [28] Beds > 3: 259.100 (n = 145, err = 1060365.6)
|   [29] appl in none, not specified
|   |   [30] Beds <= 3
|   |   |   [31] comm in has feature
|   |   |   |   [32] laund in in-unit, none, shared
|   |   |   |   |   [33] Beds <= 2: 337.206 (n = 511, err = 4511505.1)
|   |   |   |   |   [34] Beds > 2: 301.735 (n = 753, err = 4686288.1)
|   |   |   |   [35] laund in not specified
|   |   |   |   |   [36] age <= 36: 217.390 (n = 111, err = 343895.3)
|   |   |   |   |   [37] age > 36: 291.098 (n = 2284, err = 17572965.4)
|   |   |   [38] comm in no feature
|   |   |   |   [39] cooling in central, not specified
|   |   |   |   |   [40] Beds <= 1: 390.374 (n = 42, err = 5991193.1)
|   |   |   |   |   [41] Beds > 1: 284.004 (n = 1293, err = 12130078.5)
|   |   |   |   [42] cooling in none, other
|   |   |   |   |   [43] Beds <= 2: 238.645 (n = 1104, err = 12571715.8)
|   |   |   |   |   [44] Beds > 2: 216.344 (n = 1251, err = 4834819.7)
|   |   [45] Beds > 3
|   |   |   [46] comm in has feature
|   |   |   |   [47] age <= 34: 193.179 (n = 119, err = 290621.8)
|   |   |   |   [48] age > 34
|   |   |   |   |   [49] Beds <= 4: 251.234 (n = 590, err = 3179980.5)
|   |   |   |   |   [50] Beds > 4: 224.526 (n = 127, err = 726437.0)
|   |   |   [51] comm in no feature
|   |   |   |   [52] cooling in central, not specified
|   |   |   |   |   [53] floor in carpet, not specified, other: 215.150 (n = 209, err = 1078512.0)
|   |   |   |   |   [54] floor in wood: 277.578 (n = 63, err = 493825.8)
|   |   |   |   [55] cooling in none, other
|   |   |   |   |   [56] Beds <= 4: 198.235 (n = 361, err = 1123637.1)
|   |   |   |   |   [57] Beds > 4: 168.871 (n = 102, err = 419547.7)
Number of inner nodes: 28
Number of terminal nodes: 29
Decision Tree Results for the West Side
[1] root
|   [2] Type in Condo / Townhouse
|   |   [3] floor in carpet, not specified
|   |   |   [4] view in has view
|   |   |   |   [5] fireplace in none: 642.784 (n = 89, err = 2307186.9)
|   |   |   |   [6] fireplace in not specified, yes
|   |   |   |   |   [7] heating in none, other: 630.697 (n = 74, err = 3353160.4)
|   |   |   |   |   [8] heating in central, not specified: 542.605 (n = 3116, err = 59123414.7)
|   |   |   [9] view in no view, not specified
|   |   |   |   [10] Baths <= 1
|   |   |   |   |   [11] comm in has feature: 720.132 (n = 18, err = 247742.1)
|   |   |   |   |   [12] comm in no feature: 581.499 (n = 115, err = 2216200.7)
|   |   |   |   [13] Baths > 1
|   |   |   |   |   [14] appl in has appliance: 510.845 (n = 1037, err = 13561301.6)
|   |   |   |   |   [15] appl in none, not specified: 474.229 (n = 167, err = 1824623.9)
|   |   [16] floor in other, wood
|   |   |   [17] cooling in central, none, other
|   |   |   |   [18] appl in has appliance
|   |   |   |   |   [19] agent in large: 556.371 (n = 646, err = 10689195.0)
|   |   |   |   |   [20] agent in not specified, small: 586.845 (n = 2095, err = 53966756.9)
|   |   |   |   [21] appl in none, not specified: 513.905 (n = 195, err = 4510709.9)
|   |   |   [22] cooling in not specified
|   |   |   |   [23] pool in has pool, not specified: 541.646 (n = 46, err = 783026.1)
|   |   |   |   [24] pool in none: 771.598 (n = 67, err = 6782974.4)
|   [25] Type in Single Family
|   |   [26] Baths <= 6
|   |   |   [27] cooling in central, none, other
|   |   |   |   [28] floor in carpet
|   |   |   |   |   [29] age <= 48: 696.368 (n = 869, err = 87407990.9)
|   |   |   |   |   [30] age > 48: 796.049 (n = 1933, err = 229063224.7)
|   |   |   |   [31] floor in not specified, other, wood
|   |   |   |   |   [32] pool in has pool: 945.472 (n = 948, err = 323193568.2)
|   |   |   |   |   [33] pool in none, not specified: 823.665 (n = 4764, err = 536416693.2)
|   |   |   [34] cooling in not specified
|   |   |   |   [35] Beds <= 2
|   |   |   |   |   [36] Beds <= 1: 1602.793 (n = 19, err = 30219468.8)
|   |   |   |   |   [37] Beds > 1: 1103.083 (n = 459, err = 105813074.7)
|   |   |   |   [38] Beds > 2
|   |   |   |   |   [39] age <= 77: 863.733 (n = 844, err = 91925002.7)
|   |   |   |   |   [40] age > 77: 1046.282 (n = 214, err = 51044729.8)
|   |   [41] Baths > 6: 1224.430 (n = 515, err = 732777025.0)
Number of inner nodes: 20
Number of terminal nodes: 21
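The printed trees can also be summarized programmatically. The sketch below parses an excerpt of the West Side tree (nodes [16] to [24], with the level names spelled out), separates inner from terminal nodes, and computes the observation-weighted mean of the terminal values. The regular expressions and the function name parse are illustrative assumptions about the line layout shown in this appendix, not part of the original analysis.

```python
import re

# Excerpt of the West Side tree, nodes [16]-[24].
EXCERPT = """\
[16] floor in other, wood
[17] cooling in central, none, other
[18] appl in has appliance
[19] agent in large: 556.371 (n = 646, err = 10689195.0)
[20] agent in not specified, small: 586.845 (n = 2095, err = 53966756.9)
[21] appl in none, not specified: 513.905 (n = 195, err = 4510709.9)
[22] cooling in not specified
[23] pool in has pool, not specified: 541.646 (n = 46, err = 783026.1)
[24] pool in none: 771.598 (n = 67, err = 6782974.4)
"""

NODE = re.compile(r"\[(\d+)\]")
# A terminal line ends with "<rule>: <value> (n = <count>, err = <error>)".
LEAF = re.compile(
    r"\[(\d+)\]\s+(.+?):\s+([\d.]+)\s+\(n = (\d+), err = ([\d.]+)\)\s*$"
)


def parse(text):
    """Return (inner_node_ids, leaves); each leaf is a dict of parsed fields."""
    inner, leaves = [], []
    for line in text.splitlines():
        leaf = LEAF.search(line)
        if leaf:
            leaves.append({
                "id": int(leaf.group(1)),
                "rule": leaf.group(2),
                "value": float(leaf.group(3)),
                "n": int(leaf.group(4)),
                "err": float(leaf.group(5)),
            })
            continue
        node = NODE.search(line)
        if node:
            inner.append(int(node.group(1)))
    return inner, leaves


inner, leaves = parse(EXCERPT)
total_n = sum(leaf["n"] for leaf in leaves)
weighted_mean = sum(leaf["value"] * leaf["n"] for leaf in leaves) / total_n
print(len(inner), len(leaves), total_n, round(weighted_mean, 2))
```

Because every split is binary, a complete subtree has one more terminal node than inner nodes, which matches the inner and terminal counts reported under each tree. If each terminal value is a node mean, the weighted mean computed here is the mean response over the whole subtree.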
Abstract
This dissertation explores price determinants in the Los Angeles County housing market. The first chapter considers whether sellers strategically set the asking price based on market conditions in order to induce competition among home buyers. The motivation is that houses sell both above and below the listing price, yet the asking price has historically been regarded as a ceiling on the transaction price. The chapter proposes a search model in which buyers decide whether to visit a property based on the observed listing price, and it establishes a one-to-one relationship between the list price and the number of visitors. The seller is then able to maximize expected profit over the list price. The chapter also performs a series of empirical estimations to test the model. The data contain detailed information about homes sold in Los Angeles County between 2012 and 2018. We use online viewing activity on Redfin, a brokerage and information platform, as a proxy for the number of potential buyers, and we measure the seller’s listing strategy by comparing the list price with the prices of historically similar listings. The regression results show a negative correlation between a high list price and the number of potential buyers, supporting our theoretical conclusion that the number of visitors decreases in the list price. We further examine the effect of online interest on the sale price. The Los Angeles County data show that more potential buyers are associated with higher sale prices, but the extent to which the number of potential buyers affects sale prices can differ as market conditions change. This result agrees with the theoretical conclusion that the optimal number of visitors and the optimal list price depend on buyer conditions, that is, the value distribution and bargaining power. In conclusion, the seller’s strategic listing is an important determinant of the number of potential buyers and of the sale price.
The second chapter studies heterogeneity across areas in how housing characteristics affect the sale price. The research question is motivated by the price clustering of properties in Los Angeles County. One hypothesis that could explain the phenomenon is that the characteristics of homes have different levels of importance in determining the sale price. To examine this hypothesis, we use the conditional inference tree (CTREE) approach, an established machine learning technique that has been applied to the stock market and medical research but not to the housing market. The data, collected from the MLS for the Los Angeles County area, contain detailed attributes for each listing and enable us to apply the CTREE algorithm to build trees for the analysis of price determination. We partition Los Angeles County into regions according to the geographical neighborhoods defined by an online map guide. To ensure an ample number of entries in each dataset for the CTREE algorithm, we merge some of the regions defined in the map and obtain seven districts, each with close to or more than 10,000 listings sold between January 2012 and January 2018. Growing trees and analyzing the importance of variables in all seven districts yields a prediction of per-square-foot housing prices and each variable’s contribution to the sale price. Comparing the importance of variables across the trees shows that housing characteristics do have differing influences on the sale price across regions. This empirical result from the MLS data supports our hypothesis.
Asset Metadata
Creator
Wang, Shichen (author)
Core Title
Essays on price determinants in the Los Angeles housing market
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Economics
Publication Date
04/25/2020
Defense Date
03/09/2020
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
Housing,Los Angeles County,machine learning,OAI-PMH Harvest,strategic pricing
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ridder, Geert (committee chair), Camara, Fanny (committee member), Parkhomenko, Andrii (committee member)
Creator Email
shichenw@usc.edu,wshichen@yahoo.com
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-290850
Unique identifier
UC11663554
Identifier
etd-WangShiche-8339.pdf (filename),usctheses-c89-290850 (legacy record id)
Legacy Identifier
etd-WangShiche-8339.pdf
Dmrecord
290850
Document Type
Dissertation
Rights
Wang, Shichen
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA