Applications of Markov-Switching Models in Economics
BY
BO ZHOU
A thesis
submitted to the Department of Economics
and the School of Graduate Studies
of the University of Southern California
in partial fulfilment of the requirements
for the degree of
Doctor of Philosophy (Economics)
© Copyright by Bo Zhou, December 2014
All Rights Reserved
Doctor of Philosophy (2014) University of Southern California
(Economics) Los Angeles, CA, USA
TITLE: Applications of Markov-Switching Models in Economics
AUTHOR: Bo Zhou
SUPERVISOR: Dr. Cheng Hsiao
NUMBER OF PAGES: viii, 94
Acknowledgements
I would like to thank my advisor, Professor Cheng Hsiao, for his invaluable guidance and support while I was working on this dissertation. His academic spirit and his care for students impressed me deeply. Special thanks for introducing me to health economics, a fascinating field. Additional thanks go to Professor Geert Ridder, whose suggestions in the econometrics discussion group helped me develop this work at its very early stage, and to Professor Hashem Pesaran, whose insightful thoughts helped me improve this paper substantially. I would also like to thank Professor Christopher Jones, Professor Tae-Hwy Lee and Dr. Siniki Sakata for their helpful comments. Thanks also to my friend Shuyang Sheng, for discussing research questions and teaching me Taiji.
Thanks to Professor Daniel McFadden, for introducing me to the topic of consumer choice in health care markets and offering guidance all along. His never-ending enthusiasm for research encouraged me to push myself through the problem-solving process. I am also grateful to other coauthors who shared their experience and knowledge in health economics, including Jochim Winter, Florian Heiss and Amelie Wuppermann. Thanks to St. Clair Patrica for answering my questions about data and programming with kindness and patience.
Finally, I would like to thank my parents for their love and support all along.
Contents
Acknowledgements
List of Tables
List of Figures
1 Introduction
2 A Mixture Model for Stock Prices
2.1 Introduction
2.2 Model
2.2.1 Basic Model Setup
2.2.2 Asset Prices and Returns
2.2.3 Extensions of the Model
2.3 Estimation
2.3.1 Hidden Markov Process and its Stationary Distribution
2.3.2 Identifiability of Parameters
2.3.3 Maximum Likelihood Estimator of the Markov-Switching Mixture Model
2.3.4 Asymptotic Properties of Maximum Likelihood Estimator
2.4 Analysis of the S&P Index
2.4.1 Data
2.4.2 Results
2.5 Conclusion
3 Plan Switching and Inertia in Medicare Part D
3.1 Introduction
3.2 Plan Switching in Medicare Part D
3.2.1 Medicare Part D Plan Design
3.2.2 Database
3.2.3 Sample
3.2.4 Simulation of the Counterfactual Spending
3.2.5 Main Findings
3.2.6 Reduced Form Analysis of Switching
3.3 A Two-Stage Model with Unobserved Ability
3.3.1 Model
3.3.2 Estimation
3.3.3 Results
3.4 Conclusion
A Proof of Theorems in Chapter 2
A.1 Proof of Theorems in Section 2.2
A.1.1 Proof of Proposition 1
A.1.2 Proof of Corollary 1
A.1.3 Proof of Lemma 1
A.1.4 Proof of Proposition 3
A.1.5 Proof of Corollary 2
A.2 Theorems in Section 2.3
A.2.1 Notation
A.2.2 Central Limit Theorem for the Score Function
A.2.3 Law of Large Numbers for Observed Information Matrix
A.2.4 Proof of Theorem 1
B Supplements for Chapter 3
B.1 Simulation Details
B.1.1 Ordering Claims
B.1.2 Calculation of Actual and Simulated Out-of-Pocket Spending
B.2 Medicare Part D Spending by Switching Status
B.2.1 2007-08
B.2.2 2008-09
B.2.3 2009-10
B.3 Reduced Form Regression
Bibliography for Chapter 2
Bibliography for Chapter 3
List of Tables
3.1 Summary of PDP Non-EGWP Plans, 2007-2010
3.2 Summary of PDP Non-EGWP Plans, 2007-2010
3.3 Construction of working sample
3.4 Summary Statistics of Working Sample by Year, 2007-2010
3.5 Actual vs Simulated OOP, Sim Error = Simulated OOP - Actual OOP
3.6 Actual vs Simulated OOP, Sim Error = Simulated OOP - Actual OOP
3.7 Switching Rate by Demographics and Original Reason for Entitlement
3.8 Two stage model estimation results
3.9 Probability of paying attention and variance in plan choice stage for a representative beneficiary
3.10 Two stage model estimation results with attention triggers (1) compared with one-stage multinomial logit model for plan choice stage, same subsample as used in the two-stage model estimation (2) and full sample (3)
B.1 Classification of Chronic Conditions
B.2 Part A/B Health Care Use Variables
B.3 Linear Regression of Change in OOP on Chronic Conditions
B.4 Reduced Form Regression
List of Figures
2.1 Laplace Component Density
2.2 Stationary distribution of the Normal-Laplace mixture model, two-component Normal mixture model vs. Kernel estimation of return distribution
2.3 Probability of State 2, Normal-Laplace mixture model vs. Normal mixtures. NBER recession dates are plotted in the shaded areas.
3.1 Histogram of Sim Error (|Simulated OOP - Actual OOP| < 1000)
3.2 Simulation Error by Drug Benefit Type (1=Defined Std Benefit Plan; 2=Actuarially Equivalent Standard; 3=Basic Alternative; 4=Enhanced Alternative, Diameter of the Circle is proportional to Total Enrollment)
3.3 Ex-Ante and Ex-post Savings, Active Switchers, Stayers and Forced Decision Makers
Chapter 1
Introduction
This thesis consists of two applications of Markov-switching models in economics.
Chapter 2 is an application of a hidden Markov model to financial time-series data. Building on a Lucas tree asset pricing model, we relate the tail risk of asset prices to a component density of a Normal-Laplace mixture distribution and propose a new method to measure extreme-event behavior in financial markets. The hidden state of the model represents the underlying state of the macroeconomy, which follows a two-state Markov regime-switching process. Conditional on the state being “normal” or “extreme”, the log dividend is subject to Normal or Laplace (fat-tailed) shocks, respectively. The asset’s price is derived from discounted dividend values, where the stochastic discount factor is determined by the utility maximization of a representative agent who holds the asset. Finally, we discuss the identifiability of the model parameters, Maximum Likelihood estimation techniques and the asymptotic properties of the MLE, and illustrate the estimation results using S&P index returns. In this example, the stationary distribution of the hidden Markov process exists and can be estimated by the forward-backward algorithm.
Chapter 3 is an empirical application of a Markov-switching process in health economics. We focus on Medicare beneficiaries’ decisions to switch their prescription drug plans, assuming the switching decision is affected by a set of state variables including demographics, previous Part D experience, health shocks, changes on the plan supply side, and other events that may trigger switching. Our reduced form analysis indicates that switching is triggered by premium or deductible increases in the old plan and by entering “the coverage gap” in the previous year, and that past switching behavior predicts future switching after controlling for other factors. Next, we develop a two-stage structural model in which agents stay in their old plan by default and enter the plan choice stage only if they first pay attention. In this model, the actual switching cost is separated from the attention cost. To explain the observed switcher/stayer pattern in the data, we assume that agents differ in an unobserved “ability”: a higher-ability agent is more likely to pay attention and predicts his utility from plan choice more precisely.
Chapter 2
A Mixture Model for Stock Prices
2.1 Introduction
Financial economists and econometricians have long devoted effort to modeling tail events, both investigating their impact on financial asset prices theoretically and using real-world financial data to evaluate tail risk statistically. However, assessing tail risk is intrinsically difficult due to the infrequent nature of these extreme events. In the existing literature, selecting tail observations requires a “cutoff”, which is often chosen arbitrarily since it is difficult to find an optimal rule to determine it. In this paper, we propose a new approach to modeling extreme events that uses the Normal-Laplace mixture model to avoid the “cutoff” problem. Our work contributes to the asset pricing literature in two respects. From a theoretical perspective, we build a foundation to capture the power-law property of the tail distribution of asset prices under an “extreme” state within the Lucas-tree framework, by assuming Markov-switching behavior of the underlying macroeconomic state. Statistically, we propose an easy-to-implement method to estimate the tail distribution by modeling stock returns as a mixture of a Normal component and a fat-tailed component.
The asset pricing model in this paper follows the framework of Lucas (1978). We assume an endowment economy with a representative agent who holds assets producing non-storable dividends. In the basic model, there is a single asset in the economy; the representative agent has time-separable Constant Relative Risk Aversion (CRRA) utility; and shocks to dividend growth rates are a mixture of Log-Normal and Log-Laplace random variables. Under the Markov setting, stock prices have closed-form solutions given the assumptions on dividend growth rates, and returns are described by a mixture of Normal-Laplace distributions. We then generalize the model to include multiple assets with correlated tail risks and the Epstein-Zin utility function, in which the agent’s Intertemporal Elasticity of Substitution (IES) is separated from risk aversion.
We next discuss the estimation of the model (see Section 2.3). Using time-series return data, we are able to identify the parameters in the density functions of the Normal and Laplace components, the transition probabilities of the Markov switching process and the difference between the expected returns in the next period conditional on the state being “normal” or “extreme”. We apply the forward-backward algorithm to estimate the transition matrix and the smoothed probabilities. After introducing the algorithm, we examine the identifiability issues and asymptotic properties of the Maximum Likelihood Estimator (MLE).
In Section 2.4, we demonstrate the estimation results using S&P index returns. We first estimate the model parameters by MLE and then calculate the filtered probabilities of the Laplace component. Our results suggest that the probability of the “extreme” state is high during crisis periods. The estimated transition matrix indicates persistent states and implies an expected duration of 88.3 months for the “normal” state and 13.3 months for the “extreme” state. Next, we compare the Normal-Laplace mixture model with an alternative two-component Normal mixture model. The Normal-Laplace model indicates a larger probability of transition into the “extreme” state and a longer expected duration of that state.
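If the reported durations use the standard formula for a first-order Markov chain, where the time spent in state i is geometric with mean 1/(1 − Π_ii), they can be inverted to recover the diagonal of the estimated transition matrix. The sketch below is only that arithmetic check under this assumption; the helper name is hypothetical and is not the estimation code used in the thesis.

```python
def implied_stay_probability(expected_duration):
    """Return Pi_ii implied by an expected regime duration of 1 / (1 - Pi_ii) periods.

    Assumes a first-order Markov chain, so the time spent in a state is geometric.
    """
    if expected_duration < 1:
        raise ValueError("expected duration must be at least one period")
    return 1.0 - 1.0 / expected_duration


# Expected durations in months, as reported in the text.
pi_11 = implied_stay_probability(88.3)  # "normal" state
pi_22 = implied_stay_probability(13.3)  # "extreme" state
print(f"Pi_11 = {pi_11:.4f}, Pi_22 = {pi_22:.4f}")  # prints: Pi_11 = 0.9887, Pi_22 = 0.9248
```

Under this reading both states are highly persistent, and Π_22 far exceeds Π_12 = 1 − Π_11, which is the persistence condition stated in Section 2.2.1.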
Related Literature
This work is related to several branches of the asset-pricing and financial econometrics literature: dividend-based asset pricing theory, the rare-disaster explanation of the equity premium puzzle, the application of Markov-switching models in finance and the estimation of tail risk. Following Lucas’s original work, the asset pricing model in this paper is related to a series of dividend-based multi-asset pricing models, including Cochrane et al. (2008) and Martin (2013). We differ from these works by working in a discrete-time setup and assuming time dependence of the dividend process through the hidden Markov chain. Although a closed-form solution is generally not obtainable in the multi-asset setting, we are able to discuss our model’s implications in the limiting case where one asset dominates the other in market share (intuitively, the orchard consists of one large tree, which is the market, and another tiny tree).
Our work can also be viewed as an application of the Markov-switching specification to the “rare disaster” asset pricing theory. Since Barro (2006), there has been a resurgence of interest in the rare-disaster explanation of the equity premium puzzle.
Barro and Jin (2011) extended this work using a larger database of GDP and consumption in 36 countries and fitted the size distribution of consumption disasters to a power-law distribution. Gourio (2008) allowed the probability of disasters to vary over time, taking two discrete values. Wachter (2013) also assumed time-varying disaster risk and developed a model in a continuous-time framework to explain the equity premium. This work differs from previous research by assuming that the dynamics of the underlying state follow a “hidden” Markov chain, whose state is unobservable by the agent. The agent thus perceives the macroeconomy as a mixture of “normal” and “extreme” states.
This paper is also related to the literature on estimating tail risk. The power-law decay rate of tail distributions has been documented frequently in the literature. This line of research can be traced back to Fama (1963), who estimated the tail of financial returns in commodity markets by assuming a stable Paretian distribution. Recent works include Bollerslev and Todorov (2011), who assumed extreme-value distributions and developed a non-parametric framework to estimate “large” jumps from high-frequency 5-minute S&P 500 market data. Instead of using time-series data, Kelly (2011) estimated tail risk from cross-sectional stock returns and reported that this risk measure has strong predictive power for future market returns. Additional examples include modeling the distribution of jump sizes in jump-diffusion models for option pricing and fitting tail innovations of the GARCH model (McNeil and Frey (2003)). The above applications of power-law distributions focus on the estimation of the power-law exponent and often require a cutoff for tail observations. We adopt the same assumption of power-law behavior and assume the Laplace distribution in the extreme state. However, unlike previous works, this paper takes non-tail observations into account. We avoid the problem of using an arbitrary cutoff for the tail distribution by taking the tail risk probability as a component density of the mixture model and applying mixture models to estimate asset return distributions.
The statistical tools used in this paper belong to the classes of finite mixture models and Markov-switching models, which have been widely used in biostatistics, medicine and engineering due to their flexibility in fitting complex distributions. An introduction to estimation and inference for finite mixture models can be found in McLachlan and Peel (2001). In the statistics literature, regime-switching models in which the states are unobservable are also known as Hidden Markov Models (HMMs). An example of the application of mixture models in finance is Linden (2001), who estimated a mixture model of Normal-Laplace stock returns using maximum likelihood. Markov-switching models have been used in modeling regimes of the business cycle (Hamilton (1989)), forecasting recession probabilities (Chauvet and Potter (2002)), modeling changes in policies, characterizing stock returns (Schaller and Van Norden (1997)) and forecasting volatility (Calvet and Fisher (2004)). Recent advances in inference for Hidden Markov Models can be found in Cappe, Moulines and Ryden (2005), and their applications in finance in Mamon and Elliott (2007).
The literature on likelihood-based estimation methods for mixture models and Hidden Markov Models is extensive (for an introduction to the estimation techniques of finite-state regime-switching models, see Hamilton (1994)). Meanwhile, the theoretical study of the asymptotic properties of the Maximum Likelihood Estimator has received increasing attention. Under the stationarity assumption, the consistency of the Maximum Likelihood Estimator (MLE) for the HMM was established by Leroux (1990), and the asymptotic normality of the MLE was first documented in Bickel, Ritov and Ryden (BRR) (1998). Le Gland and Mevel (LGM) (2000) developed an alternative approach to proving the asymptotics of the likelihood function using the “exponential forgetting” property of geometrically ergodic Hidden Markov Models. The LGM approach does not require the stationarity of the hidden Markov chain; however, the regularity conditions in LGM are more restrictive than those in BRR. We follow BRR in proving the asymptotic normality of the Maximum Likelihood Estimator to avoid making the additional assumptions required in LGM.¹
2.2 Model
2.2.1 Basic Model Setup
In this section, we consider an endowment economy in which the asset is a tree that
produces non-storable fruit.
The Agent’s Utility Maximization Problem
There is a representative agent who is endowed with 1 share of the tree at time 0 and trades the stock in each period after the dividend is distributed. She solves the utility maximization problem by choosing consumption $C_t$ and her share of the stock in the next period, $\alpha_{t+1}$:
$$\max_{\{C_t,\,\alpha_{t+1}\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t \frac{C_t^{1-\gamma}}{1-\gamma}$$
subject to
$$C_t + P_t\,\alpha_{t+1} = (P_t + D_t)\,\alpha_t, \qquad \alpha_0 = 1.$$
Under the normalization condition $\alpha_0 = 1$, the non-storable property of the fruit implies that $C_t = D_t$ and $\alpha_t = 1$ in equilibrium.
¹ LGM requires that the density functions of the components converge to zero at the same rate at infinity, which is not satisfied in our model since the Normal density decays faster. We could overcome this by assuming truncated-Normal and truncated-Laplace distribution functions; however, we avoid making this artificial assumption and follow BRR to prove asymptotic normality.
The stochastic discount factor is the marginal rate of substitution between $t+1$ and $t$ contingent claims:
$$M_{t+1} = \frac{\beta u'(C_{t+1})}{u'(C_t)} = \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} = \beta\left(\frac{D_{t+1}}{D_t}\right)^{-\gamma}.$$
The Underlying State of The Economy
In each period $t$, the economy may be in a normal state (1) or a “boom” or “disaster” state (2), which is unobservable. Let $S_t$ denote the state, which follows a Markov chain with the transition matrix
$$\Pi = \begin{pmatrix} \Pi_{11} & \Pi_{12} \\ \Pi_{21} & \Pi_{22} \end{pmatrix},$$
where $\Pi_{11} + \Pi_{12} = \Pi_{21} + \Pi_{22} = 1$ and $\Pi_{ij}$ is the probability of a transition from state $i$ to state $j$ (conditional on the current state, were it observable). If $\Pi_{22} > \Pi_{12}$, the states are persistent, i.e., if the economy is in State 2 today, it is more likely to be in State 2 tomorrow than if it were in State 1 today.
We assume that the agent does not observe the state $S_t$. Her subjective probability that the economy is in State 2 in the next period is $q_t$, which depends on her information set $F_t$ at time $t$ and on the underlying state $S_t$ of the Markov-switching process. If we assume that the agent’s information set is $F_t = \{r_1, \dots, r_t\}$, then $q_t = \Pr(S_t = 2 \mid r_1, \dots, r_t)$ can be calculated using the forward-backward algorithm for Hidden Markov Models, which we discuss in detail in Section 2.3.
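The probability the agent needs can be computed with the forward (filtering) pass of the forward-backward algorithm. The sketch below is a generic version of that recursion for a finite-state chain; it returns both the filtered probability Pr(S_t = k | r_1, ..., r_t) and the one-step-ahead prediction Pr(S_{t+1} = k | r_1, ..., r_t). The function name, the two-state Normal/Laplace emission densities and all parameter values are hypothetical illustrations, not the thesis's estimates, and the backward (smoothing) pass is omitted. In the model of Section 2.3 the same recursion would be run on the four-state chain that tracks $(S_t, S_{t-1})$.

```python
import numpy as np
from scipy.stats import laplace, norm


def forward_filter(returns, trans, log_densities, init):
    """Forward pass of the forward-backward algorithm for a finite-state HMM.

    trans[i, j] = Pr(S_{t+1} = j | S_t = i); log_densities[k](r) = log f_k(r);
    init is the distribution of the first state before r_1 is observed.
    Returns filtered Pr(S_t | r_1..r_t), predicted Pr(S_{t+1} | r_1..r_t)
    and the log-likelihood of the sample.
    """
    T, K = len(returns), trans.shape[0]
    filtered = np.empty((T, K))
    predicted = np.empty((T, K))
    prior = np.asarray(init, dtype=float)
    loglik = 0.0
    for t, r in enumerate(returns):
        logf = np.array([ld(r) for ld in log_densities])
        shift = logf.max()                   # rescale to avoid underflow
        joint = prior * np.exp(logf - shift)
        total = joint.sum()
        loglik += np.log(total) + shift
        filtered[t] = joint / total
        predicted[t] = filtered[t] @ trans   # one-step-ahead state probabilities
        prior = predicted[t]
    return filtered, predicted, loglik


# Illustrative (not estimated) parameters: state 1 Normal, state 2 Laplace.
trans = np.array([[0.99, 0.01],
                  [0.08, 0.92]])
dens = [lambda r: norm.logpdf(r, loc=0.01, scale=0.03),
        lambda r: laplace.logpdf(r, loc=-0.01, scale=0.05)]
r = np.array([0.012, 0.005, -0.15, -0.09, 0.02])
filt, pred, ll = forward_filter(r, trans, dens, init=[0.9, 0.1])
q_filtered = filt[:, 1]    # Pr(S_t = 2 | r_1, ..., r_t)
q_predicted = pred[:, 1]   # Pr(S_{t+1} = 2 | r_1, ..., r_t)
```

With these values, the large negative return in the third period moves almost all filtered probability to the Laplace state, which is the mechanism by which fat-tailed observations raise $q_t$.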
Dividend Process
The dividend $\{D_t\}$ follows a hidden Markov process, i.e., $\{D_t\}$ is conditionally independent given $\{S_t\}$. To capture the power-law behavior of stock returns documented in the literature, we assume that log dividend growth is Normal conditional on $S_t = 1$ and fat-tailed conditional on $S_t = 2$:
$$\ln\frac{D_{t+1}}{D_t} = g + \begin{cases} \epsilon_{t+1} & \text{in State 1,} \\ v_{t+1} & \text{in State 2,} \end{cases}$$
where $\epsilon_{t+1} \sim N(0, \sigma^2)$ and $v_{t+1} \sim \text{Laplace}(0, b)$. For simplicity, we assume that the dividend growth rate $g$ is constant (i.e., $g$ does not depend on the state $S_t$), as we focus on the fat-tail property of asset prices under an economic “disaster”.
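To make the dividend process concrete, the sketch below simulates the hidden state and log dividend growth: a two-state Markov chain selects either a Normal shock $\epsilon$ or a Laplace shock $v$ with scale $b$ each period. It assumes $b$ is the Laplace scale parameter (density $\frac{1}{2b}e^{-|x|/b}$); the function name and parameter values are hypothetical and only illustrative.

```python
import numpy as np


def simulate_log_dividend_growth(T, g, sigma, b, trans, s0=0, seed=None):
    """Simulate the hidden state and ln(D_{t+1}/D_t) for T periods.

    States are coded 0 ("normal", State 1 in the text) and 1 ("extreme",
    State 2). trans[i, j] = Pr(S_{t+1} = j | S_t = i).
    """
    rng = np.random.default_rng(seed)
    states = np.empty(T, dtype=int)
    growth = np.empty(T)
    s = s0
    for t in range(T):
        s = rng.choice(2, p=trans[s])        # draw next period's state
        states[t] = s
        if s == 0:
            shock = rng.normal(0.0, sigma)   # Normal shock epsilon
        else:
            shock = rng.laplace(0.0, b)      # Laplace shock v with scale b
        growth[t] = g + shock
    return states, growth


trans = np.array([[0.99, 0.01],
                  [0.075, 0.925]])           # illustrative values only
states, growth = simulate_log_dividend_growth(
    1200, g=0.002, sigma=0.01, b=0.03, trans=trans, seed=0)
```

Because $g$ does not depend on the state, regimes differ only in the shape of the shock distribution, matching the simplifying assumption above.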
2.2.2 Asset Prices and Returns
Stock Prices
The stock price is determined by the first-order condition of the agent’s utility maximization problem and the transversality condition:
$$P_t = E_t\left[M_{t+1}(P_{t+1} + D_{t+1})\right],$$
$$\lim_{k\to\infty} E_t\left[\beta^k u'(C_{t+k})\,P_{t+k}\,\alpha_{t+k}\right] = 0.$$
Under the hidden Markov assumption for the dividend process, we obtain a closed-form solution for the stock price:
Proposition 1. The price-dividend ratio of the stock is
$$P_t/D_t = K_0 + K_1 q_t \tag{2.1}$$
where $K_0$ and $K_1$ are constants given by
$$K_0 = \frac{A[(\pi_{22}-\pi_{12})B - 1]}{\pi_{12}A(B-1) + (1-\pi_{22}B)(A-1)} \tag{2.2}$$
$$K_1 = \frac{A - B}{\pi_{12}A(B-1) + (1-\pi_{22}B)(A-1)} \tag{2.3}$$
and $A$ and $B$ are the discount factor $\beta$ times the conditional expectation of $(D_{t+1}/D_t)^{1-\gamma}$ in States 1 and 2, respectively:
$$A = \beta\exp\left[-(\gamma-1)g + \frac{(\gamma-1)^2\sigma^2}{2}\right] \tag{2.4}$$
$$B = \beta\exp[-(\gamma-1)g]\,\frac{1}{1-(\gamma-1)^2 b^2}. \tag{2.5}$$
We put the proof of this proposition and all subsequent proofs in Sections 2.2 and 2.3 of the Technical Appendix. Proposition 1 is derived from the intertemporal Euler equation and the transversality condition, and the solution is found using the guess-and-verify method. The uniqueness of the solution was established in Lucas (1978).
The price-dividend ratio is a linear function of $q_t$, the agent’s estimate of the probability of State 2 in the next period $t+1$ given her information set at time $t$. In one limiting case, $q_t = 0$ and $\Pi_{12} = 0$: the price-dividend ratio $P_t/D_t = A/(1-A)$ is constant, and we are back to the classic case where log dividend growth is i.i.d. Normal. When $\gamma = 1$, the agent has myopic log utility, and it is straightforward to verify that $P_t/D_t = \beta/(1-\beta)$ regardless of the transition probabilities.
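Proposition 1 can be checked numerically. The sketch below evaluates equations (2.2) to (2.5) and then plugs $K_0 + K_1 q$ back into the Euler equation, reading $q_{t+1}$ as $\pi_{12}$ after a State 1 period and $\pi_{22}$ after a State 2 period (the reading implied by the means in Corollary 1). The function names and parameter values are hypothetical. It also confirms the two limiting cases just discussed.

```python
import math


def pd_ratio_constants(beta, gamma, g, sigma, b, pi12, pi22):
    """Return A, B, K0, K1 from equations (2.2)-(2.5) of Proposition 1."""
    if abs(gamma - 1.0) * b >= 1.0:
        raise ValueError("the Laplace moment requires |gamma - 1| * b < 1")
    A = beta * math.exp(-(gamma - 1) * g + (gamma - 1) ** 2 * sigma ** 2 / 2)
    B = beta * math.exp(-(gamma - 1) * g) / (1 - (gamma - 1) ** 2 * b ** 2)
    denom = pi12 * A * (B - 1) + (1 - pi22 * B) * (A - 1)
    K0 = A * ((pi22 - pi12) * B - 1) / denom
    K1 = (A - B) / denom
    return A, B, K0, K1


def euler_residual(q, beta, gamma, g, sigma, b, pi12, pi22):
    """K0 + K1*q minus the Euler-equation right-hand side (should be ~0).

    Assumes q_{t+1} = pi12 after a State 1 period and pi22 after a State 2 period.
    """
    A, B, K0, K1 = pd_ratio_constants(beta, gamma, g, sigma, b, pi12, pi22)
    rhs = (1 - q) * A * (1 + K0 + K1 * pi12) + q * B * (1 + K0 + K1 * pi22)
    return K0 + K1 * q - rhs


params = dict(beta=0.99, gamma=3.0, g=0.002, sigma=0.01, b=0.03,
              pi12=0.011, pi22=0.925)  # illustrative values only
A, B, K0, K1 = pd_ratio_constants(**params)
```

With $\gamma = 1$ the sketch returns $A = B = \beta$, $K_1 = 0$ and $K_0 = \beta/(1-\beta)$, and with $\pi_{12} = 0$ it returns $K_0 = A/(1-A)$.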
Return distribution
After obtaining the price-dividend ratio, we derive the asset’s return distribution as a direct corollary of Proposition 1:
Corollary 1. The conditional gross return distribution is a mixture of Log-Normal and Log-Laplace distributions:
$$R_{t+1} \mid F_t \sim \text{LogNormal}(\mu^N_{t+1}, \sigma^2) \quad \text{with probability } 1-q_t,$$
$$R_{t+1} \mid F_t \sim \text{LogLaplace}(\mu^L_{t+1}, b) \quad \text{with probability } q_t,$$
where $R_{t+1} = (P_{t+1}+D_{t+1})/P_t$ denotes the gross return between $t$ and $t+1$,
$$\mu^N_{t+1} = \ln(D_t/P_t) + g + \ln(1 + K_0 + K_1\pi_{12}),$$
$$\mu^L_{t+1} = \ln(D_t/P_t) + g + \ln(1 + K_0 + K_1\pi_{22}),$$
and $\sigma^2$ and $b$ are the parameters governing the variances of the Log-Normal and Log-Laplace distributions.
Using returns data, we are able to estimate the parameters of the distribution functions ($\sigma$, $b$) and the growth rate $g$ of the log dividend. However, the conditional return depends on the agent’s preference parameters (the discount factor $\beta$ and the coefficient of risk aversion $\gamma$) only through the constants $K_0$ and $K_1$; therefore, $\beta$ and $\gamma$ are not identified. For a detailed discussion, see Section 2.3.
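Corollary 1 translates directly into a density for the log gross return: with weight $1-q_t$ it is Normal with mean $\mu^N_{t+1}$ and standard deviation $\sigma$, and with weight $q_t$ it is Laplace with location $\mu^L_{t+1}$ and scale $b$. The sketch below assumes $b$ is the Laplace scale parameter; the function name and all numerical inputs (including $K_0$ and $K_1$) are hypothetical illustrations. Note that $\beta$ and $\gamma$ never appear as arguments, only $K_0$ and $K_1$, which is the identification point made above.

```python
import numpy as np
from scipy.stats import laplace, norm


def conditional_log_return_pdf(x, log_dp, q, g, sigma, b, K0, K1, pi12, pi22):
    """Density of ln R_{t+1} given F_t implied by Corollary 1.

    log_dp = ln(D_t / P_t). With weight 1 - q the log return is Normal with
    mean mu_N and standard deviation sigma; with weight q it is Laplace with
    location mu_L and scale b.
    """
    mu_n = log_dp + g + np.log(1 + K0 + K1 * pi12)
    mu_l = log_dp + g + np.log(1 + K0 + K1 * pi22)
    return ((1 - q) * norm.pdf(x, loc=mu_n, scale=sigma)
            + q * laplace.pdf(x, loc=mu_l, scale=b))


# Illustrative values only, with P_t / D_t = K0 + K1 * q as in (2.1).
K0, K1, q = 74.0, 2.6, 0.05
grid = np.linspace(-1.0, 1.0, 200_001)
pdf = conditional_log_return_pdf(grid, log_dp=-np.log(K0 + K1 * q), q=q,
                                 g=0.002, sigma=0.01, b=0.03,
                                 K0=K0, K1=K1, pi12=0.011, pi22=0.925)
total_mass = pdf.sum() * (grid[1] - grid[0])   # Riemann sum, close to 1
```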
2.2.3 Extensions of the Model
Risk Free Asset
Consider the case where there is a risk-free bond that pays 1 unit of the fruit in any state in the next period. At $t = 0$, the agent is endowed with no bonds ($z_0 = 0$). The agent’s utility maximization problem is
$$\max_{\{C_t,\,\alpha_{t+1},\,z_{t+1}\}_{t=0}^{\infty}} E_0 \sum_{t=0}^{\infty} \beta^t \frac{C_t^{1-\gamma}}{1-\gamma}$$
subject to
$$C_t + P_t\,\alpha_{t+1} + \frac{z_{t+1}}{R^f_t} = (P_t + D_t)\,\alpha_t + z_t, \qquad \alpha_0 = 1, \qquad z_0 = 0.$$
In equilibrium, $z_t = 0$ and the gross risk-free rate is
$$R^f_t = \frac{1}{E_t(M_{t+1})} = \frac{1}{\beta\left[(1-q_t)\,e^{-\gamma g + \frac{\gamma^2\sigma^2}{2}} + q_t\,e^{-\gamma g}\,\frac{1}{1-\gamma^2 b^2}\right]}.$$
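The risk-free rate follows from taking the expectation of $M_{t+1} = \beta(D_{t+1}/D_t)^{-\gamma}$ over the two states, using the Normal and Laplace moment-generating functions (the latter requires $\gamma b < 1$). The sketch below is a minimal numerical version under that assumption; the function name and parameter values are hypothetical. For these illustrative values, a higher probability $q_t$ of the fat-tailed state lowers $R^f_t$, because the Laplace term exceeds the Normal term.

```python
import math


def risk_free_rate(q, beta, gamma, g, sigma, b):
    """Gross risk-free rate 1 / E_t[M_{t+1}] with M_{t+1} = beta * (D_{t+1}/D_t)**(-gamma)."""
    if gamma * b >= 1.0:
        raise ValueError("the Laplace moment requires gamma * b < 1")
    normal_term = math.exp(-gamma * g + gamma ** 2 * sigma ** 2 / 2)   # State 1
    laplace_term = math.exp(-gamma * g) / (1 - gamma ** 2 * b ** 2)    # State 2
    return 1.0 / (beta * ((1 - q) * normal_term + q * laplace_term))


params = dict(beta=0.99, gamma=3.0, g=0.002, sigma=0.01, b=0.03)  # illustrative
rf_calm = risk_free_rate(0.0, **params)
rf_stressed = risk_free_rate(0.5, **params)
```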
Multiple Stocks
Consider a Lucas “orchard” where there are multiple risky assets (“trees”). The dividend stream of stock $i$ follows the process
$$\ln\frac{D_{i,t+1}}{D_{i,t}} = g_i + \begin{cases} \epsilon_{i,t+1} & \text{in State 1,} \\ v_{i,t+1} & \text{in State 2.} \end{cases}$$
In general, a closed-form solution for stock prices is not obtainable, since the stochastic discount factor is a function of $(\sum_i D_{i,t+1})/(\sum_i D_{i,t})$ and discounted values of dividend streams are difficult to calculate in our discrete-time, hidden-Markov setting.
For the purpose of demonstration, we consider the special case where there are two stocks and the market share of one stock is negligible. Intuitively, one large “tree” (Asset 1) is the stock market, and we consider the pricing of a small stock (Asset 2) whose market share is negligible.
In the limit $D_{2,t}/(D_{1,t}+D_{2,t}) \to 0$, we have $C_t \cong D_{1,t}$, so Asset 2 does not affect the pricing of Asset 1. How does the price of Asset 2 behave? We assume that the two stocks have the same dividend growth rate $g$, and that $\epsilon_{1,t+1}$ and $\epsilon_{2,t+1}$ are independent in State 1. Consider the following two cases in State 2: (i) $v_{1,t+1}$ and $v_{2,t+1}$ are independent (dividends are correlated only through the underlying hidden Markov process); (ii) $v_{1,t+1}$ and $v_{2,t+1}$ are perfectly correlated, $v_{2,t+1} = \lambda v_{1,t+1}$ (asset prices are more highly correlated during volatile periods). Under power utility, the price of Asset 2 can be calculated using the discount-factor pricing formula as follows:
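Proposition 2. The price-dividend ratio of Asset 2 is
$$P_{2,t}/D_{2,t} = K_{2,0} + K_{2,1} q_t \tag{2.6}$$
where
$$K_{2,0} = \frac{A_2[(\pi_{22}-\pi_{12})B_2 - 1]}{\pi_{12}A_2(B_2-1) + (1-\pi_{22}B_2)(A_2-1)}, \qquad K_{2,1} = \frac{A_2 - B_2}{\pi_{12}A_2(B_2-1) + (1-\pi_{22}B_2)(A_2-1)},$$
and
$$A_2 = \beta\exp\left(-(\gamma-1)g + \frac{\gamma^2\sigma_1^2 + \sigma_2^2}{2}\right),$$
$$B_2 = \beta\exp(-(\gamma-1)g)\,\frac{1}{1-\gamma^2 b_1^2}\,\frac{1}{1-b_2^2} \quad \text{under (i)},$$
$$B_2 = \beta\exp(-(\gamma-1)g)\,\frac{1}{1-(\lambda-\gamma)^2 b_1^2} \quad \text{under (ii)}.$$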
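Proposition 2 has the same structure as Proposition 1, with $(A_2, B_2)$ replacing $(A, B)$, so the constants can be computed the same way for either correlation case. The sketch below is a minimal version; the function name, the keyword convention (pass `b2` for case (i), `lam` for case (ii)) and the parameter values are hypothetical. As a consistency check, setting $\lambda = 1$ in case (ii) makes Asset 2 share Asset 1's Laplace shock, and $B_2$ then reduces to the $B$ of equation (2.5) with $b = b_1$.

```python
import math


def asset2_constants(beta, gamma, g, sigma1, sigma2, b1, pi12, pi22,
                     b2=None, lam=None):
    """A2, B2, K_{2,0}, K_{2,1} of Proposition 2.

    Case (i), independent Laplace shocks: pass b2.
    Case (ii), v_2 = lam * v_1: pass lam.
    """
    if (b2 is None) == (lam is None):
        raise ValueError("pass exactly one of b2 (case i) or lam (case ii)")
    A2 = beta * math.exp(-(gamma - 1) * g + (gamma ** 2 * sigma1 ** 2 + sigma2 ** 2) / 2)
    scale = beta * math.exp(-(gamma - 1) * g)
    if b2 is not None:
        if gamma * b1 >= 1 or b2 >= 1:
            raise ValueError("Laplace moments require gamma * b1 < 1 and b2 < 1")
        B2 = scale / ((1 - gamma ** 2 * b1 ** 2) * (1 - b2 ** 2))
    else:
        if abs(lam - gamma) * b1 >= 1:
            raise ValueError("Laplace moment requires |lam - gamma| * b1 < 1")
        B2 = scale / (1 - (lam - gamma) ** 2 * b1 ** 2)
    denom = pi12 * A2 * (B2 - 1) + (1 - pi22 * B2) * (A2 - 1)
    K20 = A2 * ((pi22 - pi12) * B2 - 1) / denom
    K21 = (A2 - B2) / denom
    return A2, B2, K20, K21


common = dict(beta=0.99, gamma=3.0, g=0.002, sigma1=0.01, sigma2=0.02,
              b1=0.03, pi12=0.011, pi22=0.925)        # illustrative values only
case_i = asset2_constants(**common, b2=0.05)
case_ii = asset2_constants(**common, lam=2.0)
```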
Epstein-Zin Preferences
The standard time-separable constant relative risk aversion (CRRA) preference implies that the coefficient of risk aversion is equal to the reciprocal of the intertemporal elasticity of substitution (IES). Epstein and Zin (1989) introduced a new class of utility functions that decouples the IES from risk aversion. The Epstein-Zin utility function is defined recursively as
$$U_t = \left[(1-\beta)C_t^{1-\rho} + \beta\left(E_t\left[U_{t+1}^{1-\gamma}\right]\right)^{\frac{1-\rho}{1-\gamma}}\right]^{\frac{1}{1-\rho}},$$
where IES $= 1/\rho$.
When $\rho = \gamma$, we have the standard time-separable case of Section 2.2.1. When $\rho = 1$, the following limit is used:
$$U_t = C_t^{1-\beta}\left(E_t\left[U_{t+1}^{1-\gamma}\right]\right)^{\frac{\beta}{1-\gamma}} \tag{2.7}$$
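The stochastic discount factor with Epstein-Zin preferences is
$$M_{t+1} = \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\rho}\left(\frac{U_{t+1}}{R_t(U_{t+1})}\right)^{\rho-\gamma},$$
where $R_t(U_{t+1}) = \left(E_t\left[U_{t+1}^{1-\gamma}\right]\right)^{1/(1-\gamma)}$.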
Finding closed-form solutions for asset prices with Epstein-Zin utility is difficult, since the discount factor is a function of the recursive value function $U_t$. In the special case $\rho = 1$, the value function can be solved exactly as a linear function of $C_t$.
Lemma 1. Given the Epstein-Zin utility function with $\rho = 1$ and the consumption process $C_t = D_t$, the value function is
$$U_t = f(q_t)\,C_t \tag{2.8}$$
where $f(q_t)$ is described by the functional equation
$$f(q_t) = e^{\beta g}\left[(1-q_t)\,f(\pi_{12})^{1-\gamma}\,e^{\frac{(1-\gamma)^2\sigma^2}{2}} + q_t\,f(\pi_{22})^{1-\gamma}\,\frac{1}{1-(1-\gamma)^2 b^2}\right]^{\frac{\beta}{1-\gamma}}.$$
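Evaluating the functional equation of Lemma 1 at $q_t = \pi_{12}$ and $q_t = \pi_{22}$ gives two equations in the two unknowns $f(\pi_{12})$ and $f(\pi_{22})$; once these are known, the equation defines $f(q)$ for every $q$. In logs, the right-hand side is $\beta g$ plus $\beta$ times a weighted power mean of the unknowns, so, as far as we can see, plain fixed-point iteration contracts at rate $\beta$. The sketch below uses that iteration; the function name, tolerance and parameter values are hypothetical, and it excludes $\gamma = 1$, where the equation needs a separate limit.

```python
import math


def solve_lemma1(beta, gamma, g, sigma, b, pi12, pi22, tol=1e-12, max_iter=100_000):
    """Solve the functional equation of Lemma 1 for f(pi12) and f(pi22).

    Returns f(pi12), f(pi22) and a callable giving f(q) for any q in [0, 1].
    """
    if gamma == 1.0 or abs(1 - gamma) * b >= 1.0:
        raise ValueError("need gamma != 1 and |1 - gamma| * b < 1")
    a = math.exp((1 - gamma) ** 2 * sigma ** 2 / 2)      # Normal-state factor
    c = 1.0 / (1 - (1 - gamma) ** 2 * b ** 2)          # Laplace-state factor

    def rhs(q, f12, f22):
        inner = (1 - q) * f12 ** (1 - gamma) * a + q * f22 ** (1 - gamma) * c
        return math.exp(beta * g) * inner ** (beta / (1 - gamma))

    f12 = f22 = 1.0
    for _ in range(max_iter):
        new12, new22 = rhs(pi12, f12, f22), rhs(pi22, f12, f22)
        gap = max(abs(math.log(new12 / f12)), abs(math.log(new22 / f22)))
        f12, f22 = new12, new22
        if gap < tol:
            return f12, f22, lambda q: rhs(q, f12, f22)
    raise RuntimeError("fixed-point iteration did not converge")


f12, f22, f = solve_lemma1(0.99, 3.0, 0.002, 0.01, 0.03, 0.011, 0.925)  # illustrative
```

A quick sanity check: with no shocks ($\sigma = b = 0$) the equation collapses to $f = e^{\beta g} f^{\beta}$, so $\ln f = \beta g/(1-\beta)$.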
We thus have the following form of stock prices:
Proposition 3. The price of the stock with Epstein-Zin preferences ($\rho = 1$) is
$$P^{EZ}_t = D_t\,h(q_t)$$
where $h(q_t)$ satisfies
$$h(q_t) = \beta\,\frac{(1-q_t)[f(\pi_{12})]^{1-\gamma}[1+h(\pi_{12})]\,e^{\frac{(1-\gamma)^2\sigma^2}{2}} + q_t[f(\pi_{22})]^{1-\gamma}[1+h(\pi_{22})]\,\frac{1}{1-(1-\gamma)^2 b^2}}{(1-q_t)[f(\pi_{12})]^{1-\gamma}\,e^{\frac{(1-\gamma)^2\sigma^2}{2}} + q_t[f(\pi_{22})]^{1-\gamma}\,\frac{1}{1-(1-\gamma)^2 b^2}} \tag{2.9}$$
Corollary 2. The conditional gross return $R^{EZ}$ is described by a mixture of Log-Normal and Log-Laplace distributions:
$$R^{EZ}_{t+1} \mid F_t \sim \text{LogNormal}(\mu^{N,EZ}_{t+1}, \sigma^2) \quad \text{with probability } 1-q_t,$$
$$R^{EZ}_{t+1} \mid F_t \sim \text{LogLaplace}(\mu^{L,EZ}_{t+1}, b) \quad \text{with probability } q_t.$$
2.3 Estimation
In this section, we discuss the estimation methodology for the model parameters using time-series data on returns. We start with a brief description of the statistical tool (the Hidden Markov Model) in Section 2.3.1. The identifiability of the model is discussed in Section 2.3.2. Section 2.3.3 introduces the forward-backward algorithm, one of the most commonly used methods for fitting Hidden Markov Models. Section 2.3.4 discusses the asymptotic properties of the Maximum Likelihood Estimator (MLE).
2.3.1 Hidden Markov Process and its Stationary Distribution
The starting point of our statistical analysis is equation (2.1), and the parameter of interest is $\theta_i = (\sigma_i, b_i, g_i, K_0, K_1, \Pi)$. We omit the subscript $i$ henceforth for simplicity of notation. The aim of the estimation is to infer the model parameter $\theta$, including the distributional parameters and the transition probabilities of the underlying Markov process, from observed stock prices.
Taking the logarithm of equation (2.1) and differencing over time, we get
$$r_{t+1} = \Delta\ln(P_{t+1}+D_{t+1}) = \ln\frac{1+K_0+K_1 q_{t+1}}{1+K_0+K_1 q_t} + g + \begin{cases} \epsilon_{t+1} & \text{in state 1} \\ v_{t+1} & \text{in state 2} \end{cases} \tag{2.10}$$
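The distribution of $r_t$ depends on the state variables $S_t$ and $S_{t-1}$. We can define a new state variable
$$\tilde S_t = \begin{cases} 1 & \text{if } S_t = 1,\ S_{t-1} = 1 \\ 2 & \text{if } S_t = 1,\ S_{t-1} = 2 \\ 3 & \text{if } S_t = 2,\ S_{t-1} = 1 \\ 4 & \text{if } S_t = 2,\ S_{t-1} = 2 \end{cases}$$
and $\tilde S_t$ is governed by the following Markov switching process:
$$\tilde\Pi = \begin{pmatrix} \Pi_{11} & 0 & \Pi_{12} & 0 \\ \Pi_{11} & 0 & \Pi_{12} & 0 \\ 0 & \Pi_{21} & 0 & \Pi_{22} \\ 0 & \Pi_{21} & 0 & \Pi_{22} \end{pmatrix}.$$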
The process $\{\tilde S_t, r_t\}$, in which the state of the Markov process $\{\tilde S_t\}$ is not observed and inference is made using the observable $\{r_t\}$, is a Hidden Markov Model (HMM) in the statistical literature: $\{\tilde S_t\}$ is a Markov chain on a 4-state space, and $\{r_t\}$ is not a Markov chain but is conditionally independent given $\{\tilde S_t\}$, so the joint chain $\{\tilde S_t, r_t\}$ is Markov.
Throughout this section we assume:
Assumption 1. The Markov chain $\{\tilde S_t\}$ is irreducible and aperiodic.
Assumption 2. The process $\{(\tilde S_t, r_t)\}$ is stationary.
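Assumption 1 ensures that $\{\tilde S_t\}$ has a unique stationary distribution $\pi^*$, where
$$\pi^* = (\pi^*_1, \pi^*_2, \pi^*_3, \pi^*_4) = \left(\frac{\Pi_{11}\Pi_{21}}{\Pi_{12}+\Pi_{21}},\ \frac{\Pi_{12}\Pi_{21}}{\Pi_{12}+\Pi_{21}},\ \frac{\Pi_{12}\Pi_{21}}{\Pi_{12}+\Pi_{21}},\ \frac{\Pi_{12}\Pi_{22}}{\Pi_{12}+\Pi_{21}}\right)$$
is the stationary probability distribution of $\tilde S$.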
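As a numerical check of the closed form, the sketch below compares $\pi^*$ with the result of iterating $\pi \leftarrow \pi\tilde\Pi$ from a uniform start. It assumes $\Pi_{ij} = \Pr(S_{t+1} = j \mid S_t = i)$, the convention implied by (2.14), and uses the point estimates of Section 2.4.2 purely as example inputs; the function names are hypothetical.

```python
def expand_transition(P):
    """4x4 matrix of S~_t = (S_t, S_{t-1}); state order (1,1), (1,2), (2,1), (2,2)."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [[P[cur][nxt] if carried == cur else 0.0 for (nxt, carried) in pairs]
            for (cur, _prev) in pairs]


def stationary_closed_form(P):
    """pi* from the closed-form expression, with P[i][j] = Pr(S_{t+1} = j | S_t = i)."""
    (p11, p12), (p21, p22) = P
    d = p12 + p21
    return [p11 * p21 / d, p12 * p21 / d, p12 * p21 / d, p12 * p22 / d]


def stationary_by_iteration(P_tilde, n_iter=5000):
    """Iterate pi <- pi * P_tilde starting from the uniform distribution."""
    pi = [0.25] * 4
    for _ in range(n_iter):
        pi = [sum(pi[i] * P_tilde[i][j] for i in range(4)) for j in range(4)]
    return pi


P_hat = [[0.9887, 0.0113], [0.0755, 0.9245]]
closed = stationary_closed_form(P_hat)
iterated = stationary_by_iteration(expand_transition(P_hat))
print(closed)
print(iterated)
```

The iteration converges geometrically at rate $|\Pi_{11} + \Pi_{22} - 1|$, the second-largest eigenvalue modulus of $\tilde\Pi$, so a few thousand steps are ample for persistent chains like this one.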
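The joint chain has a stationary distribution,² given by
$$\Pr(\tilde S_t = s, r_t \in dr; \theta) \equiv f(s, dr; \theta) = \pi^*_s f_s(r;\theta)\,dr,$$
where $s = 1,\dots,4$, $f_1(r;\theta) = n(r\mid\mu_1,\sigma)$, $f_2(r;\theta) = n(r\mid\mu_2,\sigma)$, $f_3(r;\theta) = l(r\mid\mu_3,b)$ and $f_4(r;\theta) = l(r\mid\mu_4,b)$.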
Assumption 2 can be relaxed if the process $\{\tilde S_t, r_t\}$, the prediction filter and its derivative converge to a stationary distribution at a geometric rate (Le Gland and Mevel 1997). This requires that the ratio of the component densities satisfy certain integrability conditions, which are not satisfied in our setup because the Normal and Laplace density functions decay at different rates at infinity. Technically, we could assume an upper bound for $r_t$ to avoid this violation. However, to keep the proof simple we do not make additional assumptions; instead, we tested different probability distributions of the initial state $S_0$ as a robustness check. The numerical estimation results suggest that the estimated parameter values are reasonably robust to changes in the initial distribution.
² For a rigorous definition of the stationary distribution of a Hidden Markov process, see Cappé, Moulines and Rydén (2005), page 560.
2.3.2 Identifiability of Parameters
Identifiability issues in mixture models often arise from the label-switching problem when the components come from the same family of distributions. For example, if we interchange the labels of the two components of a mixture of two Normal densities, the resulting models are equivalent. In equation (2.10) there is no label-switching problem, because the underlying distributions come from different families.
Equations (2.12) and (2.13) suggest that the parameters $K_0$ and $K_1$ cannot be identified separately, since the likelihood function depends on them only through $\ln\frac{1+K_0+K_1\Pi_{12}}{1+K_0+K_1\Pi_{22}}$. Let
$$\delta = -\ln\frac{1+K_0+K_1\Pi_{12}}{1+K_0+K_1\Pi_{22}};$$
then $\mu_2 = g-\delta$, $\mu_3 = g+\delta$ and $\mu_1 = \mu_4 = g$. This implies that the conditional expected return at $t$ is the same whenever the economy is in the same state at $t-1$ and $t$, but differs if the state switches at $t$. The identified parameters are therefore $(\sigma, b, g, \delta, \Pi_{12}, \Pi_{22})$.
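To see concretely why only $\delta$ is identified, note that multiplying $1+K_0$ and $K_1$ by the same constant $c > 0$ leaves the ratio inside $\delta$ unchanged: with $K_0' = c(1+K_0)-1$ and $K_1' = cK_1$, we have $1+K_0'+K_1'\Pi_{ij} = c\,(1+K_0+K_1\Pi_{ij})$. The sketch below checks this numerically. The values of $K_0$, $K_1$, $c$ and $\Pi$ are purely illustrative, and the function names are hypothetical.

```python
import math


def delta_from_K(K0, K1, p12, p22):
    """delta = -ln[(1 + K0 + K1*Pi12) / (1 + K0 + K1*Pi22)]."""
    return -math.log((1 + K0 + K1 * p12) / (1 + K0 + K1 * p22))


def rescale_K(K0, K1, c):
    """Map (K0, K1) to (c*(1 + K0) - 1, c*K1); every 1 + K0 + K1*Pi is scaled by c."""
    return c * (1 + K0) - 1, c * K1


def state_means(g, delta):
    """mu_1..mu_4 of (2.13) written in terms of delta."""
    return [g, g - delta, g + delta, g]


p12, p22 = 0.02, 0.90            # illustrative transition probabilities
K0, K1 = 20.0, -5.0              # illustrative values
K0_alt, K1_alt = rescale_K(K0, K1, c=2.5)

d1 = delta_from_K(K0, K1, p12, p22)
d2 = delta_from_K(K0_alt, K1_alt, p12, p22)
print((K0, K1), d1)
print((K0_alt, K1_alt), d2)      # different (K0, K1), same delta
```

Since the return likelihood (2.12) depends on $(K_0, K_1)$ only through the state means, the two parameter pairs produce identical likelihoods; $K_0$ and $K_1$ are recovered later from price-dividend data instead.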
We start by discussing the identifiability of the Normal-Laplace mixture model in one dimension and then extend the argument to the multi-period Hidden Markov Model. The class of distribution functions $\mathcal{H}$ is restricted to mixtures of Normal and Laplace distributions:
$$\mathcal{H} = \{H_\pi(x) : H_\pi(x) = \pi\, n(x\mid\mu,\sigma) + (1-\pi)\, l(x\mid\mu',b)\} \qquad (2.11)$$
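We define identifiability of $\mathcal{H}$ as follows.
Definition. $\mathcal{H}$ is identifiable if $H_{\pi_1}(x) = H_{\pi_2}(x)$ for $m$-a.e. $x$ if and only if $\pi_1 = \pi_2$ and $\phi_1 = \phi_2$, where $H_{\pi_j}$ has component parameters $\phi_j$, $\phi = (\mu, \sigma, \mu', b) \in \Phi$, and $m$ is the Lebesgue measure.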
Following the procedures in Teicher (1963), we verify that the class of models defined in (2.11) satisfies the above condition. Teicher (1967) generalizes identifiability to product densities and shows that the class $H^n_\phi(x) = H_{\phi_1}(x_1)\cdots H_{\phi_n}(x_n)$ with parameter $\phi \in \Phi^n$ is identifiable. In the Hidden Markov Model, $\phi$ is a function of the state variable, $\phi(S_t)$, and identifiability in the product measure implies
$$\Pr_{\theta^*}\{(\phi^*(S_1),\dots,\phi^*(S_T)) \in A\} = \Pr_\theta\{(\phi(S_1),\dots,\phi(S_T)) \in A\},$$
where $\theta = (\sigma, b, g, \delta, \Pi_{12}, \Pi_{22})$. This leads to $\theta = \theta^*$ in the Hidden Markov Model.
2.3.3 Maximum Likelihood Estimator of the Markov-Switching Mixture Model
The Markov assumption makes it possible for the econometrician to write the likelihood function in closed form, since her estimate of $q_t$ is the transition probability conditional on the underlying state, i.e. $q_t = \Pi_{s2}$ if $S_t = s$ for $s = 1, 2$. The conditional likelihood of $r_t$ can be written as:
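$$\begin{aligned}
L(r_t \mid r_1,\dots,r_{t-1}) &= \sum_{s=1}^{2}\sum_{s'=1}^{2} L(r_t \mid S_t = s, S_{t-1} = s', r_1,\dots,r_{t-1})\,\Pr(S_t = s, S_{t-1} = s' \mid r_1,\dots,r_{t-1}) \\
&= \sum_{s=1}^{2}\sum_{s'=1}^{2} L(r_t \mid S_t = s, S_{t-1} = s')\,\Pr(S_t = s, S_{t-1} = s' \mid r_1,\dots,r_{t-1})
\end{aligned}$$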
The log likelihood of the model given $r_1,\dots,r_T$ can be written as a sum of the conditional log likelihoods:
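$$\begin{aligned}
\ln L(r_1,\dots,r_T;\theta) &= \sum_{t=1}^{T} \ln L(r_t \mid r_1,\dots,r_{t-1};\theta) \\
&= \sum_{t=1}^{T} \ln \sum_{s=1}^{4} L(r_t \mid \tilde S_t = s,\theta)\,\Pr(\tilde S_t = s \mid r_1,\dots,r_{t-1},\theta) \\
&= \sum_{t=1}^{T} \ln\Big[\Pr(\tilde S_t = 1 \mid r_1,\dots,r_{t-1})\, n(r_t\mid\mu_1,\sigma) + \Pr(\tilde S_t = 2 \mid r_1,\dots,r_{t-1})\, n(r_t\mid\mu_2,\sigma) \\
&\qquad\quad + \Pr(\tilde S_t = 3 \mid r_1,\dots,r_{t-1})\, l(r_t\mid\mu_3,b) + \Pr(\tilde S_t = 4 \mid r_1,\dots,r_{t-1})\, l(r_t\mid\mu_4,b)\Big], \qquad (2.12)
\end{aligned}$$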
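where $n(r\mid\mu,\sigma)$ and $l(r\mid\mu,b)$ are the density functions of the Normal and Laplace distributions, and
$$\begin{aligned}
\mu_1 &= \mu_4 = g, \\
\mu_2 &= g + \ln\frac{1+K_0+K_1\Pi_{12}}{1+K_0+K_1\Pi_{22}}, \\
\mu_3 &= g + \ln\frac{1+K_0+K_1\Pi_{22}}{1+K_0+K_1\Pi_{12}}.
\end{aligned} \qquad (2.13)$$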
We apply Hamilton's (1989) filter to compute the state probabilities $\Pr(\tilde S_t = s \mid r_1,\dots,r_{t-1})$ that enter the likelihood function. The algorithm proceeds as follows:
1. Start with a guess for the initial probability $\Pr(\tilde S_0 = s)$ for $s = 1,\dots,4$ and a guess for the initial parameter values. One would expect the asymptotic properties of the Maximum Likelihood estimator not to depend on the initial probability under certain regularity conditions, as we discuss in Section 2.3.4.
In practice we use the steady-state probability as the initial guess for the probability distribution of $S_0$. We fit a Normal distribution to the log returns to obtain the initial value $g_0$ of the log dividend growth rate and the initial variance $\sigma_0^2$ of the Normal density function, and we fit an exponential distribution to the absolute values of the log returns to obtain the initial value $b_0$ of the Laplace density function. We select $\delta_0 = 0$ as the initial value for $\delta$ and choose
$$\Pi_0 = \begin{pmatrix} 0.9 & 0.1 \\ 0.8 & 0.2 \end{pmatrix}$$
as the initial value of the transition matrix.
2. For $t = 1,\dots,T$, calculate $\Pr(\tilde S_t = s)$ iteratively using the following formulas:
$$\Pr(\tilde S_t = s \mid r_1,\dots,r_{t-1}) = \sum_{s'=1}^{4} \tilde\Pi_{s's}\,\Pr(\tilde S_{t-1} = s' \mid r_1,\dots,r_{t-1}) \qquad (2.14)$$
$$\Pr(\tilde S_t = s \mid r_1,\dots,r_t) = \frac{L(r_t \mid \tilde S_t = s)\,\Pr(\tilde S_t = s \mid r_1,\dots,r_{t-1})}{\sum_{s'=1}^{4} L(r_t \mid \tilde S_t = s')\,\Pr(\tilde S_t = s' \mid r_1,\dots,r_{t-1})} \qquad (2.15)$$
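The two steps above (starting values, then the prediction step (2.14) and Bayes update (2.15), accumulating the terms of (2.12)) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the author's Matlab code: the function names are hypothetical; `initial_values` reflects our reading of step 1 (Normal fit for $g_0$ and $\sigma_0$, exponential maximum likelihood fit to $|r_t|$ for $b_0$, $\delta_0 = 0$, the stated $\Pi_0$); the initial state distribution is the steady state $\pi^*$; and the state means use $\mu_1 = \mu_4 = g$, $\mu_2 = g-\delta$, $\mu_3 = g+\delta$. Maximizing the log likelihood over $\theta$ would be handed to a numerical optimizer applied to its negative (for example `scipy.optimize.minimize`), which is not shown, and density underflow for extreme returns is not handled.

```python
import math


def normal_pdf(r, mu, sigma):
    return math.exp(-0.5 * ((r - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def laplace_pdf(r, mu, b):
    return math.exp(-abs(r - mu) / b) / (2.0 * b)


def expand_transition(P):
    """4x4 matrix of S~_t = (S_t, S_{t-1}); state order (1,1), (1,2), (2,1), (2,2)."""
    pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
    return [[P[cur][nxt] if carried == cur else 0.0 for (nxt, carried) in pairs]
            for (cur, _prev) in pairs]


def initial_values(returns):
    """Hypothetical starting values following our reading of step 1."""
    T = len(returns)
    g0 = sum(returns) / T                                        # Normal fit: mean
    sigma0 = math.sqrt(sum((r - g0) ** 2 for r in returns) / T)  # Normal fit: MLE s.d.
    b0 = sum(abs(r) for r in returns) / T                        # exponential MLE on |r_t|
    return {"P": [[0.9, 0.1], [0.8, 0.2]], "sigma": sigma0, "b": b0,
            "g": g0, "delta": 0.0}


def hamilton_filter(returns, P, sigma, b, g, delta):
    """Return the log likelihood (2.12) and the filtered probabilities (2.15)."""
    (p11, p12), (p21, p22) = P
    d = p12 + p21
    # steady-state probabilities as the distribution of S~_0
    filt = [p11 * p21 / d, p12 * p21 / d, p12 * p21 / d, p12 * p22 / d]
    P_tilde = expand_transition(P)
    mu = [g, g - delta, g + delta, g]
    loglik, history = 0.0, []
    for r in returns:
        # (2.14): predicted probabilities Pr(S~_t = s | r_1..r_{t-1})
        pred = [sum(P_tilde[i][j] * filt[i] for i in range(4)) for j in range(4)]
        dens = [normal_pdf(r, mu[0], sigma), normal_pdf(r, mu[1], sigma),
                laplace_pdf(r, mu[2], b), laplace_pdf(r, mu[3], b)]
        joint = [p * f for p, f in zip(pred, dens)]
        lik_t = sum(joint)                 # L(r_t | r_1..r_{t-1}), one term of (2.12)
        loglik += math.log(lik_t)
        filt = [x / lik_t for x in joint]  # (2.15): Bayes update
        history.append(filt)
    return loglik, history


# Usage with made-up returns (not the S&P data)
rets = [0.01, -0.02, 0.015, -0.25, 0.03, 0.005, -0.01, 0.02]
start = initial_values(rets)
ll0, hist0 = hamilton_filter(rets, **start)
P_hat = [[0.9887, 0.0113], [0.0755, 0.9245]]
ll_hat, hist_hat = hamilton_filter(rets, P_hat, 0.0381, 0.0841, 0.0074, -0.0689)
# Filtered probability of State 2 (Laplace states 3 and 4) at each date
print([round(p[2] + p[3], 3) for p in hist_hat])
```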
The log likelihood in equation (2.12) is maximized to obtain the estimated parameter value $\hat\theta_T$. We then apply the forward-backward algorithm to infer the smoothed probabilities at each date based on full-sample information. The steps are as follows:
1. For $t = 1,\dots,T$, calculate the filtered probabilities $\widehat{\Pr}(\tilde S_t = s \mid r_1,\dots,r_t,\hat\theta)$ using equations (2.14) and (2.15).
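2. For $t = T-1, T-2, \dots, 1$, the smoothed probabilities are obtained by backward iteration, starting from the filtered probabilities at $T$, $\Pr(\tilde S_T = s \mid r_1,\dots,r_T)$:
$$\Pr(\tilde S_t = s \mid r_1,\dots,r_T) = \sum_{s'=1}^{4} \frac{\Pr(\tilde S_{t+1} = s' \mid r_1,\dots,r_T)\,\widehat{\Pr}(\tilde S_t = s \mid r_1,\dots,r_t,\hat\theta)\,\hat{\tilde\Pi}_{ss'}}{\Pr(\tilde S_{t+1} = s' \mid r_1,\dots,r_t)} \qquad (2.16)$$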
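The backward recursion (2.16) can be sketched as follows. It is written for a generic $K$-state chain, so it applies to the 4-state $\tilde S_t$ with the filtered probabilities from the forward pass and $\hat{\tilde\Pi}$; the function name is hypothetical. The usage example checks the recursion against brute-force enumeration of all state paths on a small, made-up 2-state problem, where the exact posterior marginals can be computed directly.

```python
def kim_smoother(filtered, P_tilde):
    """Backward recursion (2.16).

    filtered[t][s] = Pr(S~_t = s | r_1..r_t) from the forward pass;
    P_tilde[s][s2] = Pr(S~_{t+1} = s2 | S~_t = s).
    Returns smoothed[t][s] = Pr(S~_t = s | r_1..r_T).
    """
    T, K = len(filtered), len(filtered[0])
    smoothed = [None] * T
    smoothed[T - 1] = list(filtered[T - 1])  # at t = T, smoothed equals filtered
    for t in range(T - 2, -1, -1):
        # Pr(S~_{t+1} = s2 | r_1..r_t): the denominator of (2.16)
        pred = [sum(filtered[t][s] * P_tilde[s][s2] for s in range(K)) for s2 in range(K)]
        smoothed[t] = [
            filtered[t][s] * sum(P_tilde[s][s2] * smoothed[t + 1][s2] / pred[s2]
                                 for s2 in range(K) if pred[s2] > 0.0)
            for s in range(K)
        ]
    return smoothed


# Usage: made-up 2-state chain, initial distribution and likelihoods L(r_t | state)
P = [[0.9, 0.1], [0.3, 0.7]]
init = [0.75, 0.25]
lik = [[0.5, 2.0], [1.5, 0.2], [0.4, 1.1]]

filt, prev = [], init
for l in lik:  # forward pass, as in (2.14)-(2.15)
    pred = [sum(prev[i] * P[i][j] for i in range(2)) for j in range(2)]
    joint = [pred[j] * l[j] for j in range(2)]
    z = sum(joint)
    prev = [x / z for x in joint]
    filt.append(prev)
smoothed = kim_smoother(filt, P)

# Brute force: weight every path (s0, s1, s2, s3) and accumulate marginals of s1..s3
post, total = [[0.0, 0.0] for _ in lik], 0.0
for s0 in range(2):
    for code in range(2 ** len(lik)):
        path = [(code >> t) & 1 for t in range(len(lik))]
        w, prev_s = init[s0], s0
        for t, s in enumerate(path):
            w *= P[prev_s][s] * lik[t][s]
            prev_s = s
        total += w
        for t, s in enumerate(path):
            post[t][s] += w
exact = [[x / total for x in row] for row in post]
print(smoothed)
print(exact)
```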
The smoothed probabilities $\Pr(\tilde S_t = s \mid r_1,\dots,r_T)$ can be interpreted as the econometrician's a posteriori estimate of the economy's state given the information available at time $T$.
2.3.4 Asymptotic Properties of Maximum Likelihood Estimator
Regularity Conditions
Throughout the proof we assume the following regularity conditions hold:
Assumption 3. (i) The parameter set $\Theta$ is compact, and the true parameter $\theta_0$ lies in the interior of $\Theta$. (ii) $\sigma$ and $b$ are bounded away from zero.
The compactness assumption (i) states that there are known bounds on the parameters. It is also assumed that $\theta_0$ is away from the boundary, so that the likelihood function can be Taylor expanded in a neighborhood of $\theta_0$. This condition also ensures that the transition matrix $\tilde\Pi$ is primitive (with index of primitivity 2), since interior points have $0 < \Pi_{ij} < 1$, so that the process $\{(\tilde S_t, r_t)\}$ is ergodic. Condition (ii) rules out singularities of the density function and ensures that the mapping $\theta \mapsto f_s(r;\theta)$ is analytic.³
Asymptotic Normality
The proof of asymptotic normality takes the standard approach of establishing (i) almost sure convergence of the log-likelihood to a limit function, followed by (ii) a central limit theorem (CLT) for the score function and (iii) a law of large numbers (LLN) for the observed information matrix.
The identifiability condition, condition (i) and the compactness of the parameter set $\Theta$ lead to strong consistency of the ML estimator, which is proved in Leroux (1990) by applying Kingman's subadditive ergodic theorem to the generalized Kullback-Leibler divergence. Convergence of the score function and the information matrix to a limit in $L^2$-norm is obtained by writing the likelihood as sums using the missing-data identities in Louis (1982). These sums are then approximated by sums of martingale difference sequences in order to apply the CLT and the LLN. Finally, the asymptotic distribution of $\sqrt{T}(\hat\theta_T - \theta_0)$ is derived using Slutsky's theorem.
³ In computation, NaN and Inf values appear due to rounding errors. We remove the likelihood values with NaN and Inf in the Matlab code.
Theorem 1. Under Assumptions 1, 2 and 3, the MLE is asymptotically normal:
$$\sqrt{T}\,(\hat\theta_T - \theta_0) \xrightarrow{d} N(0, I_0^{-1}) \quad \text{under } P_0,$$
where $I_0$ is the (per-observation) Fisher information matrix under the true parameter value $\theta_0$.
Proof. See Appendix.
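In applications, $I_0$ in Theorem 1 is typically replaced by the observed information: the negative Hessian of the log likelihood (2.12) at $\hat\theta_T$, which estimates $T I_0$, so approximate standard errors are the square roots of the diagonal of its inverse. The sketch below does this with central finite differences. It is a generic illustration, not the author's procedure; `loglik` stands for any function returning a log likelihood (for example the Hamilton-filter sum in (2.12)), and all names are hypothetical. The usage example applies it to an i.i.d. Normal sample, where the exact answers are $\hat\sigma/\sqrt{n}$ and $\hat\sigma/\sqrt{2n}$.

```python
import math


def numerical_hessian(f, x, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at the point x."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            vals = []
            for di, dj in ((h, h), (h, -h), (-h, h), (-h, -h)):
                y = list(x)
                y[i] += di
                y[j] += dj
                vals.append(f(y))
            H[i][j] = (vals[0] - vals[1] - vals[2] + vals[3]) / (4.0 * h * h)
    return H


def invert(A):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(A)
    M = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ml_standard_errors(loglik, theta_hat, h=1e-4):
    """Standard errors from the inverse of the observed information -H(theta_hat)."""
    H = numerical_hessian(loglik, theta_hat, h)
    cov = invert([[-v for v in row] for row in H])
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]


# Usage: i.i.d. Normal sample with parameters (mu, s)
data = [1.2, 0.7, 2.1, 1.5, 0.3, 1.9, 1.1, 0.8, 1.4, 1.0]
n = len(data)
mu_hat = sum(data) / n
s_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / n)


def normal_loglik(params):
    mu, s = params
    return sum(-0.5 * math.log(2 * math.pi) - math.log(s) - 0.5 * ((x - mu) / s) ** 2
               for x in data)


se = ml_standard_errors(normal_loglik, [mu_hat, s_hat])
print(se, [s_hat / math.sqrt(n), s_hat / math.sqrt(2 * n)])
```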
2.4 Analysis of the S&P Index
In this section, we apply the methodology described above to monthly S&P index data.
2.4.1 Data
Monthly S&P index data from Dec 1925 to Dec 2011 (1,033 observations) were obtained from the Center for Research in Security Prices (CRSP) database. A summary of the data is given in the following table.
                 S&P index level    Returns on S&P index
No. of obs.      1,033              1,032
Min              4.43               -0.299
Max              1549.38            0.422
Mean             n/a                0.00601
Variance         n/a                0.00306
Skewness         n/a                0.3034
Kurtosis         n/a                12.16
2.4.2 Results
Summary
The filtered probabilities $\widehat{\Pr}(S_t = 2)$ are plotted in Figure 2.1. The probability of State 2 is high during the Great Depression, especially in 1929-1933 and 1937-1940. The figure also captures events such as the "flash crash" of May 1962, the bear market of 1973-1974, the October 1987 market crash, the Russian financial crisis of 1998, the market downturns of September 2001 and October 2002, and the October 2008 crash during the recent financial crisis.
The estimated transition probability matrix is
$$\hat\Pi = \begin{pmatrix} 0.9887 & 0.0113 \\ 0.0755 & 0.9245 \end{pmatrix}.$$
[Plot: "Estimated Probability of the Laplace Component: Monthly data, Dec1925-Dec2010"; y-axis $\Pr(S_t = 2 \mid r_1,\dots,r_t)$ from 0 to 1; x-axis ticks Sep29, Apr40, Aug46, May62, Aug74, Oct87, Jul98, Aug02, Oct08]
Figure 2.1: Laplace Component Density
As expected, if the economy is in State 1 (or 2) today, it has a higher probability of remaining in State 1 (or 2) tomorrow. The expected duration of a "normal" state is $1/\hat\Pi_{12} = 88.3$ months and the expected duration of an "extreme" state is $1/\hat\Pi_{21} = 13.3$ months.
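The duration formulas follow from the geometric distribution of spell lengths implied by a Markov chain. Writing $D_i$ (notation introduced here) for the number of consecutive months spent in state $i$ once it is entered,
$$\Pr(D_i = k) = \Pi_{ii}^{\,k-1}(1-\Pi_{ii}),\quad k = 1,2,\dots, \qquad E[D_i] = \sum_{k=1}^{\infty} k\,\Pi_{ii}^{\,k-1}(1-\Pi_{ii}) = \frac{1}{1-\Pi_{ii}},$$
so that $E[D_1] = 1/(1-\Pi_{11}) = 1/\Pi_{12}$ and $E[D_2] = 1/(1-\Pi_{22}) = 1/\Pi_{21}$, the expressions used above.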
The other estimated parameter values are $\hat\sigma = 0.0381$, $\hat b = 0.0841$, $\hat g = 0.0074$ and $\hat\delta = -0.0689$. Since $\Pi_{12} < \Pi_{22}$, $\hat\delta < 0$ implies that $K_1 < 0$, i.e., the price-dividend ratio is low when $q_t$ is high.
Comparison with the Normal Mixture Model
One key assumption in our model is that the dividend distribution is non-Normal in the "extreme" state. Instead of testing this distributional assumption, we estimate the model parameters assuming that the distribution of dividends in State 2 is also Normal, with a higher variance, and compare the results with those of the Normal-Laplace mixture model. The estimated transition probabilities are:
$$\hat\Pi_{\text{normmix}} = \begin{pmatrix} 0.9820 & 0.0180 \\ 0.3550 & 0.6450 \end{pmatrix}.$$
The estimated standard deviations of the two components are $\hat\sigma_1 = 0.0413$ and $\hat\sigma_2 = 0.1858$. The estimate $\hat g = 0.0075$ is close to the Normal-Laplace mixture model result; however, $\hat\delta = 0.07$ leads to a counterintuitive interpretation: the price-dividend ratio is higher when the probability of the "extreme" state is high.
Figure 2.2 plots the stationary density function of returns to the S&P 500 implied by the Normal-Laplace mixture model and by the two-component Normal mixture model, together with a kernel density estimate. The Normal-Laplace mixture fits the distribution better than the mixture of Normal distributions. To capture the asymmetry of the returns data, a possible extension of our model is to use asymmetric distributions for the "boom" and "disaster" states respectively.
[Plot: "Stationary Distribution of Return vs. Kernel Estimation"; x-axis "log return" from -0.4 to 0.4; y-axis "density function" from 0 to 10; legend: Normal-Laplace Mixture, Normal Mixtures, Kernel Density]
Figure 2.2: Stationary distribution of the Normal-Laplace mixture model and the two-component Normal mixture model vs. kernel estimate of the return distribution
Although mixtures of Normal distributions are sufficient to generate the fat-tail property, in our model the assumption of a Normal-Laplace mixture distribution has non-trivial implications. Compared with the Normal-Laplace mixture model, the Normal mixture model predicts a lower probability and a shorter duration of State 2: the expected length of the "normal" state is 55.6 months and the expected length of the "extreme" state is 2.8 months. The NBER records of average contraction length are 18.2 months (1919-1945) and 11.1 months (1945-2009), which are closer to the Normal-Laplace mixture model results. For a comparison of the filtered probabilities of State 2 under both models, see Figure 2.3.
[Plot, two panels with y-axis p from 0 to 1 and x-axis ticks Oct29, May40, Sep46, Jun62, Sep74, Nov87, Aug98, Sep02, Nov08. Top panel: "Probability of the Pareto Component: Monthly data, Dec1925-Dec2010". Bottom panel: "Probability of Component 2 in the Normal-Mixture model: Monthly data, Dec1925-Dec2010"]
Figure 2.3: Probability of State 2, Normal-Laplace mixture model vs. Normal mixtures. NBER recession dates are plotted in the shaded areas.
We evaluate the probability forecasts of recessions by calculating the Brier score:
$$BS = \frac{1}{T}\sum_{t=1}^{T}\left(REC_{NBER,t} - \hat q_t\right)^2,$$
where $REC_{NBER,t}$ is a monthly dummy variable indicating NBER recession periods. The calculated Brier score is 0.155 for the Normal-Laplace mixture model and 0.184 for the Normal mixture model.
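The Brier score is the mean squared difference between the 0/1 recession indicator and the forecast probability, so lower is better; a constant forecast of 0.5 always scores 0.25. A minimal sketch (the function name and the example series are made up, not the NBER data):

```python
def brier_score(outcomes, forecasts):
    """Mean of (outcome - forecast)^2; outcomes are 0/1, forecasts are probabilities."""
    if not outcomes or len(outcomes) != len(forecasts):
        raise ValueError("outcomes and forecasts must be non-empty and of equal length")
    return sum((y - p) ** 2 for y, p in zip(outcomes, forecasts)) / len(outcomes)


# Hypothetical monthly recession dummy and filtered probabilities q_hat
rec = [0, 0, 1, 1, 0]
q_hat = [0.1, 0.2, 0.7, 0.9, 0.3]
print(round(brier_score(rec, q_hat), 4))  # 0.048
```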
Estimating Preference Parameters Using Price-Dividend Ratios
After estimating the transition matrix and the parameters of the Normal and Laplace distribution functions, we use price-dividend ratio data to estimate the agent's preference parameters: the discount rate $\beta$ and the risk aversion coefficient $\gamma$.
Equation (2.1) states that the $P_t/D_t$ ratio is a linear function of $q_t$. We run a weighted least squares regression of the historical S&P price-dividend ratio data (downloaded from R. Shiller's website⁴) on the estimated $\hat q_t$ to find $\hat K_0$ and $\hat K_1$:
              Value    Robust standard error
$\hat K_0$    28.0     0.36
$\hat K_1$    -10.1    1.22
Next we substitute $\hat K_0$, $\hat K_1$, and the estimates $\hat\Pi$, $\hat g$, $\hat\sigma$, $\hat b$ from the returns data into equations (2.2), (2.3), (2.4) and (2.5) to solve for the constants $A$ and $B$ and the parameters $\beta$ and $\gamma$ in Proposition 1. The estimated values are:
$\hat A$    $\hat B$    $\hat\beta$    $\hat\gamma$
0.969       0.910       0.984          3.809
⁴ www.econ.yale.edu/~shiller/data.htm
Using the simplified two-step analysis above, we find that the estimated risk aversion parameter lies in a reasonable range under the power utility setting.
Volatility Forecasting
The Markov-switching structure of the model implies volatility clustering, since the underlying states are persistent. In this section, we compare the 1-period volatility forecasts of the model with those of the prevalent GARCH(1,1) model.
Corollary 1 implies that the 1-period predictions of the mean and variance of returns are:
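$$\mu_{t+1} \equiv E(r_{t+1}\mid\mathcal{F}_t) = (1-\hat q_t)\,\mu^N_{t+1} + \hat q_t\,\mu^L_{t+1}$$
$$\begin{aligned}
\sigma^2_{t+1} &\equiv E\big((r_{t+1}-\mu_{t+1})^2 \mid \mathcal{F}_t\big) \\
&= (1-\hat q_t)\big[(\mu^N_{t+1}-\mu_{t+1})^2 + \sigma^2\big] + \hat q_t\big[(\mu^L_{t+1}-\mu_{t+1})^2 + 2b^2\big] \\
&= (1-\hat q_t)\big(\hat q_t^{\,2}\delta^2 + \sigma^2\big) + \hat q_t\big[(1-\hat q_t)^2\delta^2 + 2b^2\big]
\end{aligned}$$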
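The last equality uses the fact that, whatever the current state, the Normal and Laplace conditional means differ by $\delta$ (from (2.13), $\mu_3 - \mu_1 = \mu_4 - \mu_2 = \delta$), so $\mu^N_{t+1} - \mu_{t+1} = -\hat q_t\delta$ and $\mu^L_{t+1} - \mu_{t+1} = (1-\hat q_t)\delta$. Collecting terms gives the equivalent form $\sigma^2_{t+1} = \hat q_t(1-\hat q_t)\delta^2 + (1-\hat q_t)\sigma^2 + 2\hat q_t b^2$: a between-state term plus the weighted within-state variances. The Python check below confirms both identities; the function names are hypothetical and the Section 2.4.2 estimates serve only as illustrative inputs.

```python
def variance_from_means(q, mu_n, mu_l, sigma, b):
    """Middle line: law of total variance with explicit conditional means."""
    mu = (1 - q) * mu_n + q * mu_l
    return (1 - q) * ((mu_n - mu) ** 2 + sigma ** 2) + q * ((mu_l - mu) ** 2 + 2 * b ** 2)


def variance_in_delta(q, sigma, b, delta):
    """Last line of the variance formula."""
    return (1 - q) * (q ** 2 * delta ** 2 + sigma ** 2) + q * ((1 - q) ** 2 * delta ** 2 + 2 * b ** 2)


def variance_simplified(q, sigma, b, delta):
    """Equivalent form: q(1-q) delta^2 + (1-q) sigma^2 + 2 q b^2."""
    return q * (1 - q) * delta ** 2 + (1 - q) * sigma ** 2 + 2 * q * b ** 2


sigma, b, g, delta = 0.0381, 0.0841, 0.0074, -0.0689  # illustrative inputs
for q in (0.0, 0.1, 0.5, 0.9):
    # Current state 1: next-period means mu_1 = g (Normal) and mu_3 = g + delta (Laplace)
    v = variance_from_means(q, g, g + delta, sigma, b)
    print(q, v, variance_in_delta(q, sigma, b, delta), variance_simplified(q, sigma, b, delta))
```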
We use the Dec. 1925 to Dec. 2011 S&P index data to estimate the parameters of the Normal-Laplace mixture model and the GARCH(1,1) model, and use the Dec. 2011 to Dec. 2012 data for forecasting. We use the returns data from time 1 to $t$ to calculate the 1-period out-of-sample prediction $\hat\sigma_{t+1}$. The mean squared error is $2.02\times10^{-6}$ for the mixture model and $2.88\times10^{-6}$ for the GARCH(1,1) model.
2.5 Conclusion
In this paper we propose a mixture model to capture the fat-tail property of stock prices and estimate the model using the S&P index. Our work can be extended in several directions.
One possible line of future research is to investigate the pricing of individual stocks by looking at both return and dividend data. The estimation technique in this paper can be applied directly to individual stocks to estimate the transition matrix and model parameters separately; alternatively, transition probabilities estimated from the market can be used to price individual stocks.
Our model can also be extended by incorporating asymmetry and by combining it with the GARCH model to capture volatility clustering. Another direction is to explore the model's implications for option pricing, especially for far-out-of-the-money options, and to compare it with alternative methods such as jump-diffusion models.
The unresolved inference issues in this paper include testing the Normal-Laplace mixture against various alternatives such as Normal mixtures, and testing the number of components in the model. Another question is how the ML estimator behaves asymptotically when we relax Assumption 3 and allow the true parameter values to be close to the boundary.
Chapter 3
Plan Switching and Inertia in Medicare Part D
3.1 Introduction
Medicare Part D, introduced in 2006 to provide prescription drug coverage to beneficiaries, had over 35 million enrollees in the United States in 2013.¹ The introduction of Part D established a heavily government-subsidized but privately managed insurance market, in which firms set plan prices competitively and beneficiaries select their plan from a wide range of choices during the open enrollment period (October-December) each year. The average PDP enrollee had 40 plans to choose from in 2006, while the number decreased to 31 in 2013.
Studies of consumer choice have pivotal implications for Medicare Part D plan design and policies. Previous studies of consumer choice in Medicare Part D documented deviations from rational utility maximization: on average, beneficiaries lost $300 each year relative to the plan finder choice, and consumers put excessive weight on certain plan characteristics such as premiums and deductibles (Heiss et al., 2013). Inertia in consumer choice is prevalent in the Medicare Part D insurance market, as discussed in several publications (Heiss et al., 2013; Ericson, 2014; etc.). Handel (2013) and Polyakova (2013) focused on the impact of switching costs on adverse selection, the latter in Medicare Part D.
¹ Source: Kaiser Family Foundation Issue Brief, December 11, 2013.
Psychological studies have shown that when facing a large number of choices, consumers are less satisfied and may not be able to make optimal decisions. In the experiments conducted by Iyengar and Lepper (2000), respondents were more likely to purchase chocolates when facing 6 choices rather than 24 or 30 options. Moreover, medical insurance contracts are complex products that may differ in various dimensions: deductible and initial coverage limit amounts, copay rates for each drug tier and for different types of pharmacies, gap coverage for branded drugs and generics, etc. Seniors are often poorly informed about their insurance contracts due to the complexity of plan design. Although there are tools such as the Medicare Plan Finder to help beneficiaries choose their plan, only 10% of the respondents in the Seniors' Opinions About Medicare Prescription Drug Coverage survey have used it as a source of information.²
In this paper we focus on plan switching in order to investigate switching behavior in Medicare Part D comprehensively. We propose a two-stage decision process, using an attention variable to link the two stages. Our research is designed as follows:
² Seniors' Opinions About Medicare Prescription Drug Coverage, 8th Year Update, KRC Research.
First, we use highly detailed Medicare claims and plan design data from 2007-2010 to simulate the counterfactual consumer inclusive spending (CIC) in beneficiaries' chosen plans and in all available alternative plans, and we evaluate the quality of plan choices based on the simulation. Our simulation results suggest that the non-LIS PDP enrollees aged 65 or older in our working sample had average "foregone savings" above $300 in 2007, and this did not improve in 2008-2010. We then compare the "ex-ante" and "ex-post" savings of active switchers, forced switchers (whose old plan was no longer available) and non-switchers. The foregone savings of active switchers are $100-$200 lower than those of non-switchers; however, active switchers still spent over $200 more than they would have under their plan-finder choice.
Why do people switch plans? In the 2010 Health and Retirement Study (HRS) survey, the top reasons given for voluntarily changing Part D plans are lower premiums, cheaper drugs and lower costs. We collect a comprehensive set of variables from the CCW database and estimate a reduced-form model that predicts switching using demographic characteristics and trigger variables. Our analysis suggests a switcher/stayer pattern: past switching behavior is a strong predictor of future switching, and switchers achieved better cost savings than stayers in the following year. We therefore focus on a behavioral model in which agents stay in their old plan by default and make a plan choice only if they first pay attention, and agents vary by an unobserved "ability" Q: beneficiaries with higher Q tend to pay attention to their Part D plan choice each year and are able to predict their prospective spending with more precision. Those who pay attention may choose to stay in their old plan or select a new plan, in which case they incur a fixed switching cost. Under this specification we separate the actual switching cost (i.e. paperwork, switching pharmacy, etc.) from attention, which is related to psychological factors and to surprises in medical conditions and prescription drug spending in the previous calendar year.
The remainder of this chapter is organized as follows. Section 3.2 introduces plan design in Medicare Part D, discusses our simulation techniques, summarizes the determinants and consequences of switching, and analyzes a reduced-form model that predicts switching. Section 3.3 proposes a structural model in which beneficiaries differ in their unobserved "ability" to pay attention and make decisions. Section 3.4 points out the policy implications and discusses the limitations of this study and future work.
3.2 Plan Switching in Medicare Part D
3.2.1 Medicare Part D Plan Design
The Medicare Part D insurance market is heavily regulated to provide enrollees with the benefits of competition and to protect them from inappropriate marketing. Each year, the Centers for Medicare and Medicaid Services (CMS) sends out a call letter to plan sponsors, who submit the bids and formulary designs of their plans subject to CMS regulations. CMS then determines contract eligibility, and after the plan benefit details are finalized, plan sponsors are required to send the Plan Annual Notice of Change (ANOC) in September, describing "any change in coverage, costs, or service area that will be effective in January" (medicare.gov).
The two main categories are stand-alone Prescription Drug Plans (PDP), for beneficiaries enrolled in original fee-for-service Medicare, and Medicare Advantage Part D plans (MAPD), for those in managed care health plans provided by Health Maintenance Organizations (HMO) or Preferred Provider Organizations (PPO).
CMS defines four benefit types of Part D plans: the defined standard (DS) benefit plan, the actuarially equivalent (AE) standard plan, basic alternative and enhanced alternative. The defined standard benefit plan has a well-defined structure. For example, in 2013 the DS plan has an annual deductible of $325, after which the beneficiary pays 25% of drug costs up to the Initial Coverage Limit (ICL) of $2,970. After the ICL is reached, there is a coverage gap in which the beneficiary pays the full cost until the catastrophic phase starts, when a total out-of-pocket (OOP) amount of $4,750 is reached. In the catastrophic stage the beneficiary pays 5% or a fixed copay amount, whichever is greater. The other three types of plans must be at least as generous as the DS plan. The actuarially equivalent standard plan differs from the DS plan only in its cost-sharing structure: these plans may use fixed copays, or a combination of copay amounts and coinsurance rates, instead of the 25% coinsurance. The basic alternative (BA) plan may have a reduced deductible or an increased Initial Coverage Limit. These three types of plans provide basic drug coverage. The enhanced alternative (EA) plan is the only plan type that may provide gap coverage.
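The four phases of the 2013 DS design described above translate into a piecewise-linear map from annual total drug cost to OOP spending. The sketch below follows the description in the text, with simplifications that are our assumptions: the ICL is measured in total drug cost, the beneficiary pays 100% in the gap (no gap discounts are modeled), and the catastrophic phase uses a flat 5% instead of "5% or a fixed copay, whichever is greater" per prescription. With these parameters the initial coverage phase ends at an OOP of $325 + 0.25 x ($2,970 - $325) = $986.25, and the catastrophic phase starts at a total drug cost of $2,970 + ($4,750 - $986.25) = $6,733.75. The function name is hypothetical.

```python
def ds_oop_2013(total_cost, deductible=325.0, coins=0.25, icl=2970.0,
                oop_threshold=4750.0, cat_rate=0.05):
    """Annual OOP under a simplified 2013 defined standard (DS) design.

    Assumptions: ICL measured in total drug cost, 100% cost sharing in the gap,
    flat catastrophic coinsurance (no per-prescription minimum copay).
    """
    if total_cost < 0:
        raise ValueError("total_cost must be non-negative")
    # Deductible phase: the beneficiary pays everything up to the deductible.
    if total_cost <= deductible:
        return total_cost
    # Initial coverage phase: coinsurance between the deductible and the ICL.
    oop_at_icl = deductible + coins * (icl - deductible)
    if total_cost <= icl:
        return deductible + coins * (total_cost - deductible)
    # Coverage gap: 100% of costs until OOP reaches the catastrophic threshold.
    gap_end_cost = icl + (oop_threshold - oop_at_icl)
    if total_cost <= gap_end_cost:
        return oop_at_icl + (total_cost - icl)
    # Catastrophic phase: flat coinsurance on the remainder.
    return oop_threshold + cat_rate * (total_cost - gap_end_cost)


for cost in (100.0, 1325.0, 2970.0, 6733.75, 8733.75):
    print(cost, ds_oop_2013(cost))
```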
Table 3.1 displays the number of non-EGWP PDP plans by drug benefit type. Overall, the number of plans decreased from 1,908 in 2007 to 1,621 in 2010.
Plan Type and Enrollment, 2007-2010
2007 2008 2009 2010
Total No. Plans 1,908 1,877 1,740 1,621
DS Plans 221 222 174 177
AE Plans 257 243 333 374
BA Plans 523 463 312 252
EA Plans 907 949 921 818
Table 3.1: Summary of PDP Non-EGWP Plans, 2007-2010
Each year, Part D plans may be consolidated, split or terminated subject to CMS regulations. CMS encourages plan sponsors to consolidate or withdraw plans with fewer than 1,000 enrollees, and will force non-renewal if a plan does not meet certain qualifications. Plan sponsors may also voluntarily terminate or consolidate plans. Firms introducing new plans have to demonstrate that the new plan is different from their existing plans. Table 3.2 displays the number of consolidated, renewed and terminated PDP plans, as well as new PDP plans, each year.
Plan Type and Enrollment, 2007-2010
2007-2008 2008-2009 2009-2010
New Plans 214 58 109
Consolidation 290 214 398
Termination 89 87 26
Renewal 1,529 1,576 1,316
Table 3.2: Summary of PDP Non-EGWP Plans, 2007-2010
The plan provider may also change the plan premium and the details of the benefit design each year, such as the deductible amount, the drugs on the formulary, and copay rates. For example, Ericson (2014) documented that plan providers tend to introduce low-premium plans to acquire market share and raise the premiums of old plans as a strategic response to consumer inertia.
3.2.2 Database
The data we use are provided by the Chronic Condition Data Warehouse (CCW). The CCW Part D database contains the enrollment data, the prescription drug event data and the Part D characteristics files, which can be linked using a unique identifier. The Part D enrollment data are a subset of the beneficiary summary file (BSF) and contain the enrollment status and the encrypted contract/plan ID of the beneficiary by month. The prescription drug event (PDE) file includes the drug information (name, National Drug Code (NDC), dosage, supply), the pharmacy information, the date of service, the total drug cost and beneficiary payment amount, and the benefit phase, indicating whether the claim lies in the deductible, pre-ICL, ICL or catastrophic stage. The PDE file can be linked to the plan characteristics file, which contains plan details such as the benefit type and the copay amount (coinsurance rate) by drug tier. Using an encrypted pharmacy ID, we can also link the PDE file with the pharmacy characteristics file to find the pharmacy information (whether it is in-store or mail order) for each claim. The formulary characteristics file, available from 2010 onward, contains drug information including tier, brand name and strength.
To analyze the impact of health shocks on Part D plan switching, we also use the Chronic Condition Summary File, which has yearly flags for 27 chronic conditions and their diagnosis dates. We also use the Part A/B claims data to calculate the Hierarchical Condition Category (HCC) risk score. Other variables that may trigger a change in the beneficiary's plan choice, such as physician visits and hospital visits, can be extracted from the Part A/B cost and use files.
3.2.3 Sample
We use a 20% random sample of all Medicare enrollees from 2007-2010. Since Part D was introduced in 2006 and only partial-year claims records are available for most beneficiaries in that year, we exclude the 2006 data. Medicare enrollees in Part A and/or Part B are eligible for Part D coverage. Enrollment is voluntary; however, beneficiaries face a late enrollment penalty if, after the initial enrollment period, there is a period of 63 days or more during which they have neither Part D nor other creditable coverage (coverage at least as good as standard Part D coverage).
The following restrictions are applied to our sample in order to have clean and complete records for each beneficiary: we select US residents aged 65 and older who are enrolled in stand-alone PDP plans. We exclude Low-Income Subsidy recipients, since they are allowed to switch plans outside the open enrollment period. We also exclude people who entered or exited a PDP plan in the middle of the year, whether because they turned 65 or died, or because they switched from or into HMO plans. Table 3.3 shows the sample selection process and the number of observations at each step.
Construction of Working Sample, 2007-2010
                                   2007         2008         2009         2010
Total                              9,299,848    9,530,609    9,781,213    10,016,372
Age 65+ US residents               7,235,063    7,403,722    8,090,770    8,264,255
Enrolled in Part D                 4,003,149    4,215,955    4,641,784    4,788,333
Enrolled in stand-alone PDP        2,712,376    2,749,743    2,884,233    2,918,100
Non-dual-eligible, non-LIS         1,639,547    1,701,558    1,809,163    1,836,057
Non-EGWP                           1,511,372    1,552,163    1,639,226    1,638,196
Enrolled continuously full year    1,347,121    1,401,441    1,427,263    1,431,472
Prior year data available          1,207,166    1,249,059    1,275,796    1,280,283
Not deceased in reference year     1,202,220    1,243,978    1,270,646    1,274,610
Table 3.3: Construction of working sample
Our sample size is reduced by about 40% when we exclude those who do not enroll in Part D, and shrinks by another 30% when HMO enrollees are dropped. We do not consider HMO plans or non-enrollment as alternative options in our analysis, nor switching between HMO plans, for several reasons. By excluding people who have other creditable drug coverage (employer plans, federal employee health benefits) and HMO enrollees, we focus on a subset of beneficiaries for whom the most complete set of prescription drug claims, health conditions, and cost and use records is available. Switching between HMO plans may be driven by different factors, since HMOs limit the doctors and service providers available under beneficiaries' Part A/B plans. Also, most switchers changed plans within FFS plans or within HMO plans instead of moving across the two main categories: of those who switched plans in Part D, over 85% switched between different FFS plans or between different HMO plans. Another 11% moved into HMO plans from FFS plans, and 4% left HMO plans to join FFS plans.
Overall, we have 1,634,734 beneficiaries in 2007-2010, of whom 738,775 are present in all 4 years. Table 3.4 displays the distribution of our sample by age group, gender, race and final stage of Part D spending, by reference year.
Table 3.4: Summary Statistics of Working Sample by Year, 2007-2010 (% of beneficiaries)
                                         2007      2008      2009      2010      Total
Age at beginning of reference year
  65-74                                  48.7%     49.0%     50.2%     50.9%     49.7%
  75-84                                  37.6%     37.0%     35.8%     35.2%     36.4%
  85 and more                            13.6%     14.0%     13.9%     14.0%     13.9%
  Total                                  100.0%    100.0%    100.0%    100.0%    100.0%
Sex
  Male                                   36.3%     36.7%     37.4%     38.1%     37.2%
  Female                                 63.7%     63.3%     62.6%     61.9%     62.8%
  Total                                  100.0%    100.0%    100.0%    100.0%    100.0%
Race
  White                                  93.6%     93.5%     93.4%     93.4%     93.5%
  Black                                  3.2%      3.0%      3.0%      2.9%      3.0%
  Hispanic                               1.9%      1.9%      2.0%      2.0%      1.9%
  Other                                  1.4%      1.5%      1.6%      1.7%      1.6%
  Total                                  100.0%    100.0%    100.0%    100.0%    100.0%
Part D spending: end phase
  Deductible                             3.8%      3.6%      3.8%      4.2%      3.8%
  Pre-ICL                                66.0%     67.4%     69.8%     71.2%     68.7%
  Gap                                    26.3%     25.1%     22.8%     21.2%     23.8%
  Catastrophic                           4.0%      3.9%      3.6%      3.3%      3.7%
  Total                                  100.0%    100.0%    100.0%    100.0%    100.0%
3.2.4 Simulation of the Counterfactual Spending
In order to compare consumer inclusive spending (CIC) across plans, we simulate the counterfactual CIC in all plans in the beneficiary's choice set and use the simulated spending to evaluate the potential savings from switching. We follow the simulation procedures in Heiss et al. (JHE 2013): we construct the annual Formulary and Benefit Design (FBD) for each plan, including the drugs (identified by National Drug Code) on the plan's formulary and the copay rate of each drug in each benefit phase; then, in each service region (34 CMS service regions covering the 50 states), we construct the plan choice set and run each beneficiary's claims in the reference year through every plan in her choice set to calculate her total out-of-pocket spending (OOP). A detailed description of the simulation rules is given in Appendix B.1.
Plan formularies are available from 2010 onward. Starting in 2010, CCW provides an RxID with each claim, which can be linked with the formulary file to obtain the brand name, strength and tier of the drug. For 2010 we use each plan's actual formulary in the simulation. Before 2010, we construct an empirical formulary from the claims records of enrollees in all plans sharing that formulary. We then use the NDC-RxID mapping in the 2010 actual formulary to improve the construction of earlier years' formularies.
Both ex-ante and ex-post CICs are computed. The ex-ante CIC is calculated by running year t drug claims through the year t+1 formulary and benefit design, which is how Plan Finder calculates spending for beneficiaries.
We follow Heiss et al. (2013) in assuming that beneficiaries follow the same order of drug utilization in every plan. For claims on the same date, we sort first by benefit phase in the chosen plan and then by drug cost. We also make two assumptions about behavior across plans:
- Agents use the same quantity and dosage of the drug (or of a drug in the same class) in all plans.
- Agents keep the same type of pharmacy (in-store, mail order or long-term care) in all alternative plans, but choose the minimum-cost pharmacy within each type.
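The phase-by-phase logic of this simulation can be illustrated with a minimal Python sketch. This is not the simulation code used for this chapter. It rests on several assumptions:
- a stylized benefit with one cost share per phase (no tier-specific copays);
- a catastrophic threshold defined on cumulative OOP;
- ascending drug cost as the final same-day tie-breaker (the text does not state the direction).

All names (`BenefitDesign`, `Claim`, `order_claims`, `simulate_oop`) are hypothetical, and the parameter values in the usage note are placeholders, not CMS figures.

```python
import math
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List

EPS = 1e-9  # tolerance so float rounding at a phase boundary cannot stall the loop


@dataclass(frozen=True)
class BenefitDesign:
    """Hypothetical one-share-per-phase benefit (illustrative, not CMS parameters)."""
    deductible: float              # total drug cost paid 100% by the enrollee
    initial_coverage_limit: float  # total drug cost at which the gap starts (ICL)
    catastrophic_oop: float        # cumulative OOP at which catastrophic coverage starts
    pre_icl_share: float           # enrollee share between deductible and ICL
    gap_share: float               # enrollee share in the gap (1.0 = no gap coverage)
    catastrophic_share: float      # enrollee share in catastrophic coverage


@dataclass(frozen=True)
class Claim:
    fill_date: date
    chosen_plan_phase: int  # benefit phase of the claim in the plan actually chosen
    drug_cost: float


def order_claims(claims: Iterable[Claim]) -> List[Claim]:
    # Date first; same-day claims by phase in the chosen plan, then by drug cost
    # (ascending cost is an assumption).
    return sorted(claims, key=lambda c: (c.fill_date, c.chosen_plan_phase, c.drug_cost))


def _cost_to_catastrophic(share: float, oop: float, d: BenefitDesign) -> float:
    # Drug cost that would lift cumulative OOP to the catastrophic threshold.
    return math.inf if share == 0 else (d.catastrophic_oop - oop) / share


def simulate_oop(claims: Iterable[Claim], d: BenefitDesign) -> float:
    """Run one beneficiary-year of claims through a plan and return total OOP."""
    total_cost, oop = 0.0, 0.0
    for claim in order_claims(claims):
        remaining = claim.drug_cost
        while remaining > EPS:  # a single claim may straddle several phases
            if oop >= d.catastrophic_oop - EPS:
                share, room = d.catastrophic_share, remaining
            elif total_cost < d.deductible - EPS:
                share = 1.0
                room = min(d.deductible - total_cost, _cost_to_catastrophic(share, oop, d))
            elif total_cost < d.initial_coverage_limit - EPS:
                share = d.pre_icl_share
                room = min(d.initial_coverage_limit - total_cost,
                           _cost_to_catastrophic(share, oop, d))
            else:  # coverage gap
                share = d.gap_share
                room = _cost_to_catastrophic(share, oop, d)
            piece = min(remaining, room)
            total_cost += piece
            oop += share * piece
            remaining -= piece
    return oop
```

Consider a plan with a $100 deductible, a $1,000 initial coverage limit, 25% pre-ICL coinsurance, no gap coverage and a $1,500 OOP threshold. A single $2,000 claim then costs the enrollee $100 + $225 + $1,000 = $1,325. Running the same claims through every plan in the choice set gives the counterfactual OOP for each alternative.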
We check the validity of the simulation by comparing simulated ex-post OOP with actual OOP in the chosen plan. Figure 3.1 displays the distribution of the simulation error in the chosen plan in 2007-2010, restricted to observations whose absolute error is less than 1000.
Figure 3.1: Histogram of Sim Error (|Simulated OOP - Actual OOP| < 1000). Horizontal axis: simulated OOP spending minus actual OOP spending (-1000 to 1000). Vertical axis: density.
Table 3.5 below displays the actual and simulated OOP, together with selected quantiles of the simulation error, for the simulation without drug substitution. The mean difference between simulated and actual OOP ranges from -$2.65 to $8.93, and the median is 0. The 99th-percentile simulation errors range from $162 to $357. However, there are a few large outliers where the simulation underpredicts OOP by more than $10,000. These outliers are probably caused by uncommonly used drugs. For example, the largest outlier in our data is caused by Gammagard Liquid, a drug used to treat primary immunodeficiency diseases.
Simulated vs Actual OOP by Year, 2007-2010
                            2007         2008         2009         2010
No. Obs.               1,114,087    1,161,206    1,194,387    1,205,758
OOP                        910.4        882.6        918.5        913.6
Predicted OOP              911.0        880.3        927.4        917.7
Sim Error (Mean)            0.68        -2.65         8.93         4.11
Sim Error (Std)             80.4         94.6          121          113
Sim Error (Median)             0            0            0            0
Sim Error (Min)          -14,219      -57,473      -45,478      -35,438
Sim Error (Max)            4,121        9,746        6,941        7,343
Sim Error (1%)              -163         -190         -208         -230
Sim Error (5%)             -45.1          -45        -48.4          -70
Sim Error (95%)             60.1         26.2          141          128
Sim Error (99%)            197.3        162.4          357          296
Table 3.5: Actual vs Simulated OOP, Sim Error = Simulated OOP - Actual OOP
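The rows of Table 3.5 are standard summary statistics of Sim Error = simulated OOP minus actual OOP. A minimal sketch with Python's `statistics` module follows. The function name is hypothetical. Percentiles here use linear interpolation (the "inclusive" method), which may differ slightly from the quantile definition in the package used to build the table.

```python
import statistics
from typing import Dict, Sequence


def summarize_sim_error(simulated: Sequence[float], actual: Sequence[float]) -> Dict[str, float]:
    """Summarize Sim Error = simulated OOP - actual OOP, mirroring the rows of Table 3.5."""
    if len(simulated) != len(actual) or len(simulated) < 2:
        raise ValueError("need two equal-length sequences with at least 2 observations")
    errors = [s - a for s, a in zip(simulated, actual)]
    # 99 cut points; pct[k - 1] is the k-th percentile
    pct = statistics.quantiles(errors, n=100, method="inclusive")
    return {
        "n": float(len(errors)),
        "mean": statistics.fmean(errors),
        "std": statistics.stdev(errors),  # sample standard deviation
        "median": statistics.median(errors),
        "min": min(errors),
        "max": max(errors),
        "p1": pct[0],
        "p5": pct[4],
        "p95": pct[94],
        "p99": pct[98],
    }
```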
The simulation error may come from several sources.
First, tier information is missing for some claims. These off-tier claims may involve uncovered drugs or may result from the appeal process. If an off-tier claim lies in the pre-ICL phase, we allocate its copay using our empirical formulary. If it lies in the deductible or gap phase, its simulated spending is determined by whether the drug is generic or branded and by the deductible/gap coverage type, so it may differ from the actual amount. Gap coverage types also changed over the sample period:
- In 2007, gap coverage was classified simply as "Generics", "Generics and Preferred Brands", "Generics and Brands" or "All Formulary Drugs".
- From 2008, almost all plans eliminated gap coverage for branded drugs and began to use more detailed classifications such as "Some Generics", "All Preferred Generics" and "All Generics".
- In 2009 and 2010, even finer gap coverage types were introduced, such as "few generics" and "many generics".
Second, we cannot identify the exact type of pharmacy for each claim. The three major types are in-store, mail order and long-term care pharmacies. Within each type, plan providers apply separate copay rates for in-network and out-of-network pharmacies, and for preferred and non-preferred pharmacies. We assume that beneficiaries choose the cheapest among in-store, mail order and long-term care pharmacies. However, our simulation does not capture the in-network and preferred status of pharmacies, because these characteristics are not provided in the CCW database.
Simulation Error by Drug Benefit Type, 2007-2010
                             DS           AE           BA           EA
No. Obs.                345,345      527,241    1,913,493    1,889,359
Sim Error (Mean)          -0.85        -7.79         10.5        -1.04
Sim Error (Std)             107          142         95.9         97.9
Sim Error (Median)       -0.015        -7.79            0            0
Sim Error (Min)         -57,473      -35,437      -28,202      -45,478
Sim Error (Max)           3,636        4,288        3,953        9,745
Sim Error (1%)            -39.0         -253         -182         -217
Sim Error (5%)            -0.13        -57.3        -50.3          -60
Sim Error (95%)           0.012         17.0          145         47.4
Sim Error (99%)            0.08        148.2          318          258
DS = Defined Standard Benefit; AE = Actuarially Equivalent Standard; BA = Basic Alternative; EA = Enhanced Alternative.
Table 3.6: Actual vs Simulated OOP, Sim Error = Simulated OOP - Actual OOP
Table 3.6 compares the simulation error across the drug benefit types of plans. Defined Standard Benefit plans have the simplest benefit design. For them, the simulation reproduces actual spending almost exactly for about 95% of enrollees (absolute simulation error below 0.2).
Figure 3.2 plots, for each plan, the mean and standard deviation of the simulation error, with a separate panel for each drug benefit type. The radius of each circle is proportional to the number of enrollees in the plan. In our 2007 working sample there are 1,531 PDP plans, and enrollment is highly concentrated in large plans:
- 26 plans have only 1 enrollee, and 376 plans have fewer than 30 enrollees.
- The largest plan has 41,330 enrollees, and 48 plans have more than 5,000 enrollees.
Figure 3.2: Simulation Error by Drug Benefit Type (1=Defined Std Benefit Plan; 2=Actuarially Equivalent Standard; 3=Basic Alternative; 4=Enhanced Alternative, Diameter of the Circle is proportional to Total Enrollment). There is one panel per drug benefit type. Each circle is a plan, plotted with its mean simulation error on the horizontal axis and the standard deviation of its simulation error on the vertical axis.
Because we use simulated spending in both the chosen and the alternative plans, the irregularities of off-tier drugs cancel out when we compare spending across plans. However, our approach cannot capture certain unobserved plan characteristics. For example, some plans may be more generous in reimbursing off-formulary drugs. We also ignore the possibility that consumers switch to a different pharmacy type when they change plans, for example from in-store to mail order if the counterfactual plan has fewer in-network pharmacies. We make this simplification because the choice of pharmacy is complex to analyze and may depend on factors such as cost, distance and service quality.
3.2.5 Main Findings
On average, only 10.4% of beneficiaries in our working sample of non-LIS FFS enrollees switch plans each year. The switching rate varies slightly across age and ethnic groups:
- The switching rate of females is 0.6 percentage points higher than that of males.
- The white group's switching rate is 4.5 percentage points higher than that of blacks and 3.6 percentage points higher than that of Hispanics.
- Among the three age groups, the youngest has the highest switching rate.
These findings agree well qualitatively with the Kaiser Family Foundation Issue Brief (October 2013).³
³ The KFF Issue Brief also analyzes non-LIS MA-PD enrollees. It uses a 5% random sample of Medicare enrollees and includes PDP enrollees in December of year t-1 and January of year t from 2006-2010.
In the subsample of 738,775 beneficiaries in the 4-year panel:
- 77.62% did not change plans in any of the 3 open enrollment periods;
- 15.77% switched once, 5.17% switched twice, and 1.44% switched 3 times.
Switching is also persistent in this subsample. Among those who changed plans in 2008, 30% switched again in 2009, while only 7.75% of the 2008 non-switchers changed plans in the next year. Moreover, those who switched in both 2008 and 2009 had a switching rate of 43.8% in 2010.
Table 3.7: Switching Rate by Demographics and Original Reason for Entitlement
                                         2007-2008   2008-2009   2009-2010    Total
Overall                                      11.2%       10.4%        9.5%    10.4%
Age
  65-74                                      10.9%       11.1%       10.3%    10.8%
  75-84                                      11.3%       10.3%        9.3%    10.2%
  85 and more                                11.8%        9.3%        8.3%     9.6%
Sex
  Male                                       10.3%       10.3%        9.5%    10.0%
  Female                                     11.6%       10.5%        9.6%    10.6%
Race/Ethnicity
  White                                      11.3%       10.6%        9.7%    10.5%
  Black                                       7.5%        5.7%        5.0%     6.0%
  Hispanic                                    8.4%        7.0%        5.4%     6.9%
  Other                                       8.2%        8.7%        8.5%     8.5%
Original Reason for Entitlement
  Old age and survivors insurance            11.1%       10.4%        9.5%    10.3%
  Disability                                 12.6%       10.9%        9.8%    11.0%
  ESRD or Disability and ESRD                18.2%       12.9%       10.1%    13.2%
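The persistence figures above are empirical transition probabilities of a two-state switcher/stayer chain. The sketch below assumes a hypothetical data layout: each beneficiary's history is a tuple of 0/1 switch indicators for consecutive open enrollment periods. In that layout:
- `conditional_switch_rate(panel, (1,))` is the rate of switching again in 2009 among 2008 switchers;
- `(0,)` gives the same rate for 2008 non-switchers;
- `(1, 1)` gives the 2010 rate for those who switched in both 2008 and 2009.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

# 0/1 switch indicators for consecutive open enrollment periods,
# e.g. (2008, 2009, 2010) -> (1, 0, 0).
History = Tuple[int, ...]


def switch_count_shares(panel: Sequence[History]) -> Dict[int, float]:
    """Share of beneficiaries who switched 0, 1, 2, ... times over the panel."""
    counts = Counter(sum(h) for h in panel)
    return {k: counts[k] / len(panel) for k in sorted(counts)}


def conditional_switch_rate(panel: Sequence[History], prefix: History) -> float:
    """Switching rate in period len(prefix) among beneficiaries whose earlier
    switch indicators equal `prefix` (an empirical transition probability)."""
    t = len(prefix)
    group = [h for h in panel if tuple(h[:t]) == tuple(prefix)]
    if not group:
        return float("nan")
    return sum(h[t] for h in group) / len(group)
```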
Whether Medicare Part D enrollees make better choices over time is debated. Ketcham et al. (2012) examined plan choice in 2006 and 2007 and documented that overspending fell in the second year. In contrast, Abaluck and Gruber use 2006-2009 data and find that foregone savings increased over time. We compare the plan choice quality of switchers and non-switchers to address the following questions: does excess spending trigger switching, and if so, do switchers save money after changing plans?
For this comparison, our working sample is divided into four groups based on beneficiaries' enrollment and plan supply status:
- active switchers, who chose to change their plan even though their old plan was still available;
- forced switchers, whose old plan was terminated, so they had to select a new plan;
- new enrollees, who moved into PDP plans during the open enrollment period because they aged into Medicare or moved from HMO or other creditable coverage;
- non-switchers, who stayed in their old plans.
We also restrict active switchers and stayers to those with complete claims records for 2 years, i.e., those enrolled for the full 24-month reference period.
Figure 3.3 compares three savings measures across the four groups:
(i) ex-post foregone savings in year t-1: ex-post CIC in the plan chosen for t-1 minus ex-post CIC in the cheapest plan in t-1;
(ii) ex-ante savings from switching: ex-ante CIC in the old plan in t minus ex-ante CIC in the cheapest plan in t;
(iii) ex-post foregone savings in year t: ex-post CIC in the plan chosen for t minus ex-post CIC in the cheapest plan in t.
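Each of the three measures is the difference between a plan's CIC and the minimum CIC in the choice set. The sketch below uses hypothetical inputs: dictionaries mapping plan IDs to CIC. It treats the old plan as the plan chosen in t-1 and ignores plan crosswalks (consolidations). When the old plan is no longer offered, as for forced switchers, it returns NaN for the ex-ante measure.

```python
import math
from typing import Dict, Mapping


def savings_measures(
    expost_prev: Mapping[str, float],   # ex-post CIC of each plan in year t-1
    chosen_prev: str,                   # plan chosen in t-1 (treated as the "old plan")
    exante_curr: Mapping[str, float],   # ex-ante CIC of each plan offered in t
    expost_curr: Mapping[str, float],   # ex-post CIC of each plan offered in t
    chosen_curr: str,                   # plan chosen in t
) -> Dict[str, float]:
    """The three savings measures, in dollars."""
    # Undefined when the old plan was terminated (forced switchers).
    exante = (exante_curr[chosen_prev] - min(exante_curr.values())
              if chosen_prev in exante_curr else math.nan)
    return {
        "expost_forgone_prev": expost_prev[chosen_prev] - min(expost_prev.values()),
        "exante_savings_from_switch": exante,
        "expost_forgone_curr": expost_curr[chosen_curr] - min(expost_curr.values()),
    }
```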
The ex-post savings measures (green and blue bars in Figure 3.3) indicate that active switchers save $100 in 2008 and nearly $200 in 2009-10 relative to stayers. New enrollees are healthier on average, since a large proportion of this group are age-65 enrollees. Even so, active switchers have over $20 less in "foregone savings" than new enrollees in 2008 and over $60 less in 2009. This pattern indicates that active switchers shop around to optimize their decisions and successfully reduce their forgone savings relative to the previous year. In contrast, stayers' forgone savings increased in year t+1 in all three reference periods. The new enrollees' forgone savings lie between those of active switchers and stayers in all reference periods.
The ex-ante savings from switching (red bars in Figure 3.3) measure the ex-ante benefit of choosing the plan-finder choice instead of staying in the old plan. Active switchers have much higher ex-ante savings in t than ex-post forgone savings in t-1. This indicates that staying in the old plan had become relatively expensive for them, either because the cost of the old plan increased or because a new, cheaper plan was introduced.
Figure 3.3: Ex-Ante and Ex-post Savings, Active Switchers, Stayers and Forced Decision Makers. There are three panels (2007-08, 2008-09, 2009-10). Each panel shows ex-post forgone savings in year t-1, ex-ante savings from switch in year t, and ex-post forgone savings in year t.
The forced switchers are a small group (less than 0.1% of our working sample) whose plan was terminated by CMS or the service provider. This group has the highest ex-post foregone savings in t-1. In the next year, their choice quality was better than that of stayers but not as good as that of active switchers (except in 2008). We drop forced switchers from further analysis for three reasons: the observations are limited (around 1,000 each year), they are regionally concentrated, and we do not know the details of their enrollment process after the termination of their old plan.
The detailed CIC and savings measures are reported in Section B.2. There, the non-switchers are further divided into three groups based on their plan crosswalk status:
- renewal;
- consolidated-in (another PDP plan was consolidated into their old plan);
- consolidated-out (their old plan was consolidated into another PDP plan).
The "change in old plan" measures the expected increase in spending in the old plan at the end of year t-1, assuming that the drug cabinet does not change in the next year. Active switchers have a $50-$90 higher expected increase in the old plan than the sample average and dominate all other groups. This suggests that changes in plan characteristics, such as a premium increase, prompted them to switch.
3.2.6 Reduced Form Analysis of Switching
The summary statistics suggest that switching rates differ across demographic groups and that those who experienced changes in their old plan are more likely to switch. For a more detailed analysis, we estimate the following reduced-form Logit model:
$$S_{it}^{*} = \alpha + \delta T_{it} + \epsilon_{it}, \qquad S_{it} = \mathbf{1}\{S_{it}^{*} > 0\},$$
where $\epsilon_{it}$ follows the standard logistic distribution. $S_{it}$ is a binary variable, and $S_{it} = 1$ indicates that the beneficiary actively changed plans in the reference period. $T_{it}$ is a set of explanatory variables related to switching, including:
1. Demographic characteristics, including age, sex and race.
2. Socioeconomic status at the zip level: median income and the percentages of people with an advanced degree, a bachelor's degree and a high school diploma, with no high school diploma as the reference group.
3. A set of savings measures, including (i) ex-ante expected savings from switching, (ii) ex-post foregone savings in year t-1, and (iii) ex-post deviation from the plan finder prediction using the year t-2 drug cabinet.
4. Surprises in year t-1's Part D experience, such as entering the gap or catastrophic stage.
5. Changes in the old plan, including (i) premium change, (ii) change in deductible, (iii) a dummy indicator for consolidations, and (iv) the ex-ante expected change in CIC in the year t-1 chosen plan.
6. Health shocks such as the incidence of chronic conditions, using a set of indicators of condition history as controls. To simplify our analysis, the 27 chronic conditions are grouped into 4 categories based on Cutler and Ghosh (2013): (i) acute recoverable conditions, (ii) cancers, (iii) chronic disabling conditions and (iv) non-fatal controllable conditions (see Table B.1 in the Appendix for details). We further divide these conditions into costly and non-costly subcategories based on their ability to predict changes in Part D OOP (Table ).
7. Health care use in t-1 including physician visits, emergency center visits and
other Part A/B events. (Table B.2)
8. First enrollment year in PDP plans. We use dummy indicators for 2007-2010,
using 2006 as the reference year.
9. Year dummies for 2009 and 2010, using 2008 as reference year.
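One generic way to estimate a pooled Logit like the one above is Newton-Raphson maximum likelihood. The pure-Python sketch below is illustrative only; the chapter does not say which software produced Table B.4. Here `x` holds a constant plus the covariates $T_{it}$ and `y` holds $S_{it}$. The helper `simulate_logit_data` generates hypothetical data for checking the estimator, not the thesis data.

```python
import math
import random
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logit(x: Sequence[Sequence[float]], y: Sequence[int],
              max_iter: int = 50, tol: float = 1e-10) -> List[float]:
    """Newton-Raphson MLE of Pr(S = 1 | x) = sigmoid(x'b); x must include a constant column."""
    k = len(x[0])
    b = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                      # score: sum of (y - p) x
        info = [[0.0] * k for _ in range(k)]  # information: sum of p(1 - p) x x'
        for xi, yi in zip(x, y):
            p = _sigmoid(sum(bj * xj for bj, xj in zip(b, xi)))
            w = p * (1.0 - p)
            for r in range(k):
                grad[r] += (yi - p) * xi[r]
                for c in range(k):
                    info[r][c] += w * xi[r] * xi[c]
        step = _solve(info, grad)
        b = [bj + sj for bj, sj in zip(b, step)]
        if max(abs(s) for s in step) < tol:
            break
    return b


def simulate_logit_data(n: int, alpha: float, delta: float,
                        seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Hypothetical data with one covariate, for checking the estimator."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(n):
        t = rng.gauss(0.0, 1.0)
        x.append([1.0, t])
        y.append(1 if rng.random() < _sigmoid(alpha + delta * t) else 0)
    return x, y
```

With 10,000 simulated observations generated from coefficients (-1.0, 0.8), `fit_logit` should return estimates close to those values.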
Table B.4 reports the Logit regression results for all active decision makers. We run a pooled regression on the three years of data, with 2,607,276 observations. Our main findings on the determinants of switching are summarized as follows.
Demographics and Socioeconomic Status
Beneficiaries are less likely to switch as they age and if they are male. White beneficiaries are more likely to switch than black, Asian and Hispanic beneficiaries. The zip-level education variables have significantly positive coefficients: switching rates are higher in areas with higher percentages of people holding a high school diploma, a bachelor's degree or an advanced degree.
Experience in Part D
Some variables consistently predict switching in all three reference periods. Entering the coverage gap leads to a higher probability of changing plans. Beneficiaries are also more likely to change plans if their old plan's premium and deductible increased.
D2007-D2009 in Table B.4 are dummies for the first year in PDP plans, with those who first joined Part D in 2006 as the reference group. The switching rate is positively related to years in Part D, indicating possible learning effects over time.
Savings Measures
All three savings measures are positively related to switching. The "change in old plan" and "ex-ante savings from switch" are ex-ante savings measures. Their positive coefficients indicate that switchers look at next year's changes in their old plan, shop around and optimize their plan choice. The "perfect foresight forgone savings" is an ex-post savings measure at the end of the baseline year. Its positive coefficient indicates that overspending triggered switching in the open enrollment period.
Change in Old Plan’s Deductible and Premium
The "change in old plan" measure captures the overall change in formulary and benefit design. In addition, we extract the changes in the annual premium and deductible of the old plan. Both measures have a significant effect, suggesting a deviation from rational utility maximization: beneficiaries overweight specific plan characteristics such as premiums and deductibles rather than considering the total cost.
Health Conditions/Incidences
We use incidence indicators for categories of conditions, with "ever" indicators as controls. Most incidence categories are negatively related to switching, and those who had new costly conditions switch less than those who had non-costly conditions. Possible explanations are:
- poor health affected their cognitive ability and thus the …;
- if they moved to skilled nursing facilities because of deteriorated health, their plan choice may be restricted;
- treatment of conditions such as cancers involves Part B drugs instead of Part D drugs.
Part A/B Health Care Usage
Beneficiaries are more likely to switch plans if they had at least one Part B drug event or at least one hospice visit. They are less likely to switch if they had at least one of the following: an emergency center visit, a home health or skilled nursing facility visit, an imaging or test event, or an acute or other inpatient stay.
Time Trend
Relative to the reference year (2008), we find a negative effect for 2009 and a positive effect for 2010. These effects may be caused by policy changes in Medicare Part D.
3.3 A Two-Stage Model with Unobserved Ability
In this section we build a structural model to address two issues. First, the inertia in Medicare Part D may arise from beneficiaries' inattention, since they stay in their old plan by default if they take no action during the open enrollment period. Second, a persistent switcher/stayer pattern is observed in 2007-2010.⁴ Moreover, the switching decision is correlated with the quality of plan choice: active switchers have better plan choice quality than forced switchers and non-switchers. We therefore consider a heterogeneous agent model in which each beneficiary has an unobserved "ability" that affects both the probability of paying attention and the quality of plan choice.
3.3.1 Model
The model is set up in a two-stage framework. Beneficiaries enter the plan choice stage only if they first pay attention. The probability of paying attention is determined by the "trigger variables" ($T_{it}$) and the hidden type of the beneficiary ($Q_i$):
$$\Pr(A_{it}=1) = \frac{\exp(T_{it}\zeta + X_{it}\xi + u_a Q_i)}{1+\exp(T_{it}\zeta + X_{it}\xi + u_a Q_i)}.$$
With $u_a > 0$, beneficiaries with higher $Q_i$ are more likely to pay attention to their plan choice.
Agents who pay attention enter the plan choice stage. Assume agent i obtains "generalized utility" from choosing plan j in year t:
$$U_{ijt} = Z_{ijt}\beta + C_{it} D_{ij,t-1} + \exp(u_c Q_i)\, e_{ijt},$$
where:
- $Z_{ijt}$ is a vector of plan characteristics and prospective costs predicted using agent i's t-1 drug cabinet;
- $X_{it}$ (in the attention equation) is a vector of demographic characteristics (age, sex, race, etc.) and health conditions;
- $C_{it}$ is the non-negative switching cost, such as the cost of paperwork or of changing pharmacy, which is separate from the attention cost;
- $D_{ij,t-1}$ is a dummy for beneficiary i choosing the same plan as in t-1;
- $e_{ijt}$ is a standard type-1 extreme value random variable.
The switching cost $C_{it}$ depends on observables and the unobserved $Q_i$:
$$C_{it} = \exp(Y_{it}\delta - u_e Q_i),$$
where $Y_{it}$ are observables related to switching costs.
⁴ In the cross-sectional Logit model with lagged regressors, all lagged switching indicators are significant in the 2009 and 2010 regressions.
3.3.2 Estimation
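We assume that the switching cost $C_{it}$ is a constant $C$ and that $Q_i$ is standard Normal. Inattentive beneficiaries remain in their old plan, so $\Pr(A_{it}=0)$ enters only the probability of choosing the old plan. Given $Q_i$, the probability that beneficiary i chooses plan j in year t ($I_{ijt}=1$) is:
$$\Pr(I_{ijt}=1 \mid Q_i) = \begin{cases} \Pr(A_{it}=0) + \Pr(A_{it}=1)\, \dfrac{\exp\big((Z_{ijt}\beta + C)/\exp(u_c Q_i)\big)}{\sum_k \exp\big((Z_{ikt}\beta + C D_{ik,t-1})/\exp(u_c Q_i)\big)} & \text{if } D_{ij,t-1}=1,\\[2ex] \Pr(A_{it}=1)\, \dfrac{\exp\big(Z_{ijt}\beta/\exp(u_c Q_i)\big)}{\sum_k \exp\big((Z_{ikt}\beta + C D_{ik,t-1})/\exp(u_c Q_i)\big)} & \text{if } D_{ij,t-1}=0,\end{cases}$$
where $\Pr(A_{it}=1)$ is the attention probability above, evaluated at $Q_i$.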
So the log likelihood function is:
$$\ln L = \sum_{i} \ln \int \prod_{t=2008}^{2010} \Pr(I_{ij_{it}t}=1 \mid Q_i)\, \phi(Q_i)\, dQ_i,$$
where $j_{it}$ denotes the plan chosen by beneficiary i in year t and $\phi$ is the standard Normal density.
With the distribution of $Q_i$ fully specified, the model is identified by the correlation between the switching decision and plan choice quality. The one exception is that we must restrict the signs of $u_a$ and $u_c$, because the distribution of $Q_i$ is symmetric. The restrictions $u_a > 0$ and $u_c < 0$ imply that the attention probability and the quality of plan choice are increasing in $Q_i$. In the maximum likelihood estimation, the integral over the unobserved $Q_i$ in the likelihood function is evaluated numerically using Gauss-Hermite quadrature.
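The likelihood contribution of a single beneficiary can be sketched in Python. The sketch makes the following assumptions:
- a hypothetical data layout, where each period supplies the attention index $T_{it}\zeta + X_{it}\xi$, the vector of $Z_{ikt}\beta$ over plans, the index of last year's plan and the index of the chosen plan;
- a constant switching cost C, as in the estimation;
- a 5-point Gauss-Hermite rule (the number of quadrature points used in this chapter is not stated).

Function names are illustrative. Because $Q_i$ is standard Normal, $\int g(Q)\phi(Q)\,dQ = \pi^{-1/2}\int e^{-x^2} g(\sqrt{2}x)\,dx$, which is the change of variables applied below.

```python
import math
from typing import List, Sequence, Tuple


def gauss_hermite_5() -> Tuple[List[float], List[float]]:
    """5-point Gauss-Hermite rule for integrals of the form  ∫ exp(-x^2) g(x) dx.
    Nodes are 0 and ±sqrt((5 ± sqrt(10)) / 2), the roots of H_5(x); weights are
    w = 2^(n-1) n! sqrt(pi) / (n^2 H_4(x)^2) with H_4(x) = 16x^4 - 48x^2 + 12."""
    s = math.sqrt(10.0)
    a, b = math.sqrt((5.0 - s) / 2.0), math.sqrt((5.0 + s) / 2.0)
    nodes = [-b, -a, 0.0, a, b]
    weights = []
    for x in nodes:
        h4 = 16 * x**4 - 48 * x**2 + 12
        weights.append(2**4 * math.factorial(5) * math.sqrt(math.pi) / (25 * h4**2))
    return nodes, weights


def attention_prob(index: float, u_a: float, q: float) -> float:
    """Pr(A_it = 1 | Q_i = q), where index = T_it zeta + X_it xi."""
    z = index + u_a * q
    return 1.0 / (1.0 + math.exp(-z)) if z >= 0 else math.exp(z) / (1.0 + math.exp(z))


def choice_prob(v: Sequence[float], old: int, chosen: int, switch_cost: float,
                p_att: float, u_c: float, q: float) -> float:
    """Pr(I_ijt = 1 | Q_i = q); v[k] = Z_ikt beta and `old` indexes last year's plan."""
    scale = math.exp(u_c * q)  # scale of the extreme value error
    u = [(vk + (switch_cost if k == old else 0.0)) / scale for k, vk in enumerate(v)]
    m = max(u)  # subtract the max before exponentiating for numerical stability
    logit = math.exp(u[chosen] - m) / sum(math.exp(x - m) for x in u)
    stay_by_default = (1.0 - p_att) if chosen == old else 0.0  # inattentive stay put
    return stay_by_default + p_att * logit


def individual_loglik(periods: Sequence[Tuple[float, Sequence[float], int, int]],
                      u_a: float, u_c: float, switch_cost: float) -> float:
    """ln of the integral over Q ~ N(0, 1) of prod_t Pr(chosen plan in t | Q).
    Each period is (attention index, [Z_ikt beta], old plan index, chosen plan index)."""
    nodes, weights = gauss_hermite_5()
    total = 0.0
    for x, w in zip(nodes, weights):
        q = math.sqrt(2.0) * x  # change of variables for a standard normal Q
        prod = 1.0
        for index, v, old, chosen in periods:
            prod *= choice_prob(v, old, chosen, switch_cost,
                                attention_prob(index, u_a, q), u_c, q)
        total += w * prod
    return math.log(total / math.sqrt(math.pi))
```

Summing `individual_loglik` over beneficiaries gives $\ln L$, which can then be passed to a numerical optimizer. Note that the product over years sits inside the integral, so a beneficiary's draw of $Q_i$ is shared across all of her periods.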
3.3.3 Results
Because of constraints on computing power, we construct a random sample of 1,000 beneficiaries from the unbalanced panel of active decision makers in 2007-2010. In this sample, the numbers of beneficiaries in 2007-08, 2008-09 and 2009-10 are 749, 792 and 802, respectively.
The estimation results of the two-stage model are displayed in Table 3.8. In the plan choice stage, a minimal set of covariates is included: the plan-finder prediction of Consumer Inclusive Cost and the premium. In the attention stage, the two specifications in Table 3.8 differ as follows:
- Specification (1) uses as attention triggers dummies for the end stage in Part D in the baseline year and the predicted changes in OOP and premium.
- Specification (2) adds demographics as controls and includes an interaction between enrolling in a plan with gap coverage and ending in the gap in the baseline year.
The "ability effect" is significant in both stages. In the attention stage, ending in the gap in the baseline year is an attention trigger, especially for beneficiaries enrolled in a plan with no gap coverage that year.
For illustration, consider a white female aged 70 who did not enroll in a plan with gap coverage, and assume that her changes in CIC and premium are both $50. Suppose first that she did not hit the "donut hole". Model (2) then predicts an attention probability of 0.023 at the average ability level ($Q_i = 0$), and 0.67 if $Q_i$ lies two standard deviations above the mean (Table 3.9). If she ended in the gap, the attention probabilities at $Q_i = 0$ and $Q_i = 2$ are 0.07 and 0.87, respectively.
In the plan choice stage, the beneficiary with $Q_i = 2$ perceives a variance only 5% as large as that of the average beneficiary with $Q_i = 0$. On average, beneficiaries put more weight on the plan premium than on OOP ($\hat\beta_{premium}/\hat\beta_{OOP} = 2.75$). We also find that a "switching cost" exists in the plan choice stage, conditional on paying attention.
Table 3.8: Two stage model estimation results
MODEL                              (1)                        (2)
Stage: attention
  age                                                         -0.168 (0.308)
  agesquared                                                   0.001 (0.002)
  male                                                        -0.145 (0.252)
  raceblack                                                    0.560 (0.650)
  raceothernonwhite                                           -2.07 (1.448)
  ended in gap                     0.772 (0.244)***            1.20 (0.292)***
  ended in catastrophic            0.513 (0.503)               0.559 (0.514)
  old plan has gap coverage                                    0.875 (0.408)**
  gapent*cov                                                  -1.74 (0.597)***
  change in CIC                    0.002 (0.000494)***         0.00197 (0.000519)***
  change in Premium                0.00554 (0.000963)***       0.00568 (0.000995)***
  constant                        -4.26 (0.302)***             3.69 (12.2)
Stage: plan choice
  predicted OOP                   -0.00100 (0.00028)***       -0.00101 (0.000274)***
  annualprem                      -0.00284 (0.000664)***      -0.00278 (0.000647)***
  switching cost                   0.347 (0.153)**             0.344 (0.149)**
Ability parameters
  attention stage                  2.15 (0.254)***             2.23 (0.273)***
  plan choice stage               -0.738 (0.140)***           -0.748 (0.136)***
Standard errors in parentheses
*** p < 0.01, ** p < 0.05, * p < 0.1

          Case 1          Case 2          Cases 1 & 2
Q_i       Pr(A_i = 1)     Pr(A_i = 1)     Variance in Choice Stage
-2        0.00027         0.0009          20
-1        0.0025          0.0083          4.5
 0        0.023           0.07            1
 1        0.18            0.42            0.22
 2        0.67            0.87            0.05
Table 3.9: Probability of paying attention and variance in plan choice stage for a representative beneficiary (Case 1: did not end in the coverage gap; Case 2: ended in the gap)
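Table 3.9 can be regenerated from a few numbers. The attention stage is a logit in $T_{it}\zeta + X_{it}\xi + u_a Q_i$, so moving $Q_i$ by one unit shifts the log-odds by $\hat u_a$. In the choice stage, the error $\exp(u_c Q_i)e_{ijt}$ has variance proportional to $\exp(2u_c Q_i)$. The sketch below uses only the two baseline probabilities reported in the text (0.023 and 0.07) and the column (2) ability estimates (2.23 and -0.748). It is a consistency check we constructed, not the authors' computation. Small differences reflect rounding of the baseline probabilities; for example, the sketch gives about 0.41 rather than 0.42 at $Q_i = 1$ in Case 2.

```python
import math

U_A, U_C = 2.23, -0.748  # ability loadings, Table 3.8 column (2)


def shift_prob(p0: float, loading: float, q: float) -> float:
    """Logit probability after shifting the log-odds at Q_i = 0 (probability p0) by loading * q."""
    z = math.log(p0 / (1.0 - p0)) + loading * q
    return 1.0 / (1.0 + math.exp(-z))


def table_3_9_row(q: float, p_case1: float = 0.023, p_case2: float = 0.07):
    """(Pr attention in Case 1, Pr attention in Case 2, choice-stage variance relative to Q_i = 0)."""
    # Var(exp(u_c q) e) / Var(e) = exp(2 u_c q)
    return shift_prob(p_case1, U_A, q), shift_prob(p_case2, U_A, q), math.exp(2.0 * U_C * q)


for q in (-2, -1, 0, 1, 2):
    c1, c2, rel_var = table_3_9_row(q)
    print(f"Q={q:+d}  case1={c1:.2g}  case2={c2:.2g}  variance={rel_var:.2g}")
```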
Table 3.10 compares two sets of results: the two-stage model with the complete set of attention triggers (Table 3.8, column (2)), and a standard one-stage Multinomial Logit estimation. In the one-stage case, we estimate a conditional logit model in which agent i has utility $U_{ijt}$ from choosing plan j in year t:
$$U_{ijt} = Z_{ijt}\beta + c D_{ij,t-1} + e_{ijt},$$
where c is the switching cost for the beneficiary to move away from the old plan. $Z_{ijt}$ are the same prospective cost variables used in the two-stage model: the premium and the plan finder prediction of beneficiary i's out-of-pocket spending in plan j.
MODEL                  (1)                        (2)                        (3)
No. Obs.               119,737                    119,737                    153,973,084
Stage: plan choice
  predicted OOP        -0.00101 (0.000274)***     -0.00100 (0.000246)***     -0.00106 (6.33e-6)***
  annualprem           -0.00278 (0.000647)***     -0.00220 (0.000224)***     -0.00233 (6.31e-6)***
  switching cost        0.344 (0.149)**            6.22 (0.076)***            6.20 (0.002)***
Standard errors in parentheses
*** p < 0.01, ** p < 0.05, * p < 0.1
Table 3.10: Two stage model estimation results with attention triggers (1) compared with one-stage multinomial logit model for plan choice stage, same subsample as used in the two-stage model estimation (2) and full sample (3)
Table 3.10 displays the estimated coefficients for the full sample and for the representative sample of 1,000 beneficiaries (the same sample used for the two-stage estimation). The estimated effects of the premium and predicted OOP are similar in the one-stage and two-stage models. However, the estimated "inertia effect" decreases from 6.20 in the one-stage model to 0.34 in the two-stage model. Inattention is therefore a major source of the estimated switching cost in the one-stage logit model. This is consistent with survey results in which 78% of those who did switch described switching plans as "not at all difficult" or "not difficult".
3.4 Conclusion
In this paper, we use detailed claims data to investigate switching in Medicare Part D. Motivated by the prevailing "inertia" and substantial "foregone savings" in Medicare Part D, we analyze how switching is affected by demographics, changes in the old plan, changes in health conditions and previous experience in Part D. We develop a structural model that separates switching cost from attention. The model introduces a heterogeneous, unobserved "ability" to capture the correlation between the switching decision and the quality of plan choice. Under this two-stage framework, we find a significant ability effect in both the attention stage and the plan choice stage.
Our findings on the "ability effect" suggest that policy makers should expand the channels that help beneficiaries understand their insurance plans and improve the quality of their plan choice. These findings also matter when evaluating regulatory changes. The existing literature on the welfare implications of switching costs in medical insurance markets has mainly focused on how inertia interacts with adverse selection. Our analysis suggests that consumer heterogeneity must also be taken into account when considering inertia effects. For example, in counterfactual simulations of policy changes such as removing the "donut hole" in Part D, beneficiaries may differ in their attention and response to the change.
Our analysis has several limitations, which point to extensions:
- First, we focus on the choices of seniors among Fee-For-Service plans. We observed that most switching occurs between different FFS plans or between different HMO plans. However, it has been documented that beneficiaries tend to switch from HMO plans to FFS plans as their health deteriorates.
- Second, we assume inelastic drug usage in the simulation, although agents may select cheaper drugs in the same RxNorm or IMS drug classification.
- Third, our model does not explicitly allow beneficiaries to learn from past experience and improve their plan choice over time.
- Last, we treat health plan premiums as exogenous and neglect the pricing strategies of insurance companies. For welfare analysis, the pricing strategy of plan providers has to be taken into account.
Our structural model can be extended to forced deciders and can allow more flexible specifications. For example, the switching cost, the sensitivity to "surprises" in the attention stage, and the weights on premium and OOP may vary with "ability". Another future direction is to use the structural model to evaluate the welfare loss caused by inattention and switching costs under the "latent ability" assumption. This would allow us to explore how policy interventions would affect the quality of beneficiaries' plan choices, for example interventions that force attention (such as removing the default of staying in the old plan) or that eliminate transaction costs.
Appendix A
Proof of Theorems in Chapter 2
A.1 Proof of Theorems in Section 2.2
A.1.1 Proof of Proposition 1
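Proof. The first order condition and transversality condition imply that
$$\frac{P_t}{D_t} = E_t\left[\beta\, \frac{P_{t+1}+D_{t+1}}{D_t} \left(\frac{D_{t+1}}{D_t}\right)^{-\gamma}\right] = E_t\left[\sum_{j=1}^{\infty} \beta^j \left(\frac{D_{t+j}}{D_t}\right)^{1-\gamma}\right]. \qquad (A.1)$$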
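Note that (A.1) is a linear function of $q_t$:
$$\frac{P_t}{D_t} = (1-q_t)\, E_t\left[\sum_{j=1}^{\infty}\beta^j\left(\frac{D_{t+j}}{D_t}\right)^{1-\gamma} \,\middle|\, S_{t+1}=1\right] + q_t\, E_t\left[\sum_{j=1}^{\infty}\beta^j\left(\frac{D_{t+j}}{D_t}\right)^{1-\gamma} \,\middle|\, S_{t+1}=2\right].$$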
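Let $P_t/D_t = K_0 + K_1 q_t$ and substitute into (A.1) to find the coefficients $K_0, K_1$:
$$\begin{aligned} D_t (K_0 + K_1 q_t) &= E_t\left[M_{t+1} D_{t+1}(1 + K_0 + K_1 q_{t+1})\right] \\ &= D_t\Big[(1-q_t)(1+K_0+K_1\pi_{12})\, E\big[\beta e^{-\gamma(g+\epsilon_{t+1})} e^{g+\epsilon_{t+1}}\big] + q_t(1+K_0+K_1\pi_{22})\, E\big[\beta e^{-\gamma(g+v_{t+1})} e^{g+v_{t+1}}\big]\Big] \\ &= D_t\Big[(1-q_t)(1+K_0+K_1\pi_{12})\, \beta \exp\Big(-(\gamma-1)g + \frac{(\gamma-1)^2\sigma^2}{2}\Big) + q_t(1+K_0+K_1\pi_{22})\, \frac{\beta\exp(-(\gamma-1)g)}{1-(\gamma-1)^2 b^2}\Big]. \end{aligned}$$
The last equality applies, with $s = 1-\gamma$, two moment generating functions:
- the Normal one, $E[e^{s\epsilon}] = e^{s^2\sigma^2/2}$;
- the Laplace one, $E[e^{sv}] = 1/(1-b^2 s^2)$, valid for $|s|b < 1$.
This must hold for all $q_t$, so $K_0$ and $K_1$ satisfy the equations in Proposition 1.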
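The two regime-specific expectations in the last step can be checked by simulation. The sketch below compares the closed forms with Monte Carlo averages. The parameter values (β = 0.98, γ = 5, g = 0.02, σ = 0.04, b = 0.05) are illustrative assumptions, not estimates from this thesis. The Laplace shock is drawn as the difference of two independent exponentials with mean b.

```python
import math
import random
from typing import Tuple


def normal_regime_term(beta: float, gamma: float, g: float, sigma: float) -> float:
    """E[beta e^{-gamma(g+eps)} e^{g+eps}] for eps ~ N(0, sigma^2)."""
    return beta * math.exp(-(gamma - 1) * g + (gamma - 1) ** 2 * sigma ** 2 / 2)


def laplace_regime_term(beta: float, gamma: float, g: float, b: float) -> float:
    """E[beta e^{-gamma(g+v)} e^{g+v}] for v ~ Laplace(0, b); requires |gamma - 1| b < 1."""
    if abs(gamma - 1) * b >= 1:
        raise ValueError("the expectation is infinite when |gamma - 1| * b >= 1")
    return beta * math.exp(-(gamma - 1) * g) / (1 - (gamma - 1) ** 2 * b ** 2)


def monte_carlo_terms(beta: float, gamma: float, g: float, sigma: float, b: float,
                      n: int = 100_000, seed: int = 1) -> Tuple[float, float]:
    """Brute-force simulation of both expectations."""
    rng = random.Random(seed)
    acc_normal = acc_laplace = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, sigma)
        # The difference of two iid exponentials with mean b is Laplace(0, b).
        v = rng.expovariate(1 / b) - rng.expovariate(1 / b)
        acc_normal += beta * math.exp((1 - gamma) * (g + eps))
        acc_laplace += beta * math.exp((1 - gamma) * (g + v))
    return acc_normal / n, acc_laplace / n
```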
A.1.2 Proof of Corollary 1
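Proof. The gross return $R_{t+1}$ is:
$$R_{t+1} = \frac{P_{t+1}+D_{t+1}}{P_t} = \frac{D_{t+1}(1+K_0+K_1 q_{t+1})}{P_t},$$
$$R_{t+1} \mid \mathcal{F}_t \sim \frac{D_t}{P_t}\times\begin{cases}(1+K_0+K_1\pi_{12})\exp(g+\epsilon_{t+1}) & \text{w.p. } 1-q_t,\\ (1+K_0+K_1\pi_{22})\exp(g+v_{t+1}) & \text{w.p. } q_t.\end{cases}$$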
A.1.3 Proof of Lemma 1
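Proof. From eq. (2.7),
$$\begin{aligned} U_t = f(q_t) C_t &= C_t^{1-\beta}\left[E_t\big(U_{t+1}^{1-\gamma}\big)\right]^{\frac{\beta}{1-\gamma}} = C_t^{1-\beta}\left[E_t\big(f(q_{t+1})^{1-\gamma} C_{t+1}^{1-\gamma}\big)\right]^{\frac{\beta}{1-\gamma}} \\ &= C_t e^{\beta g_C}\left[(1-q_t) f(\pi_{12})^{1-\gamma} e^{\frac{(1-\gamma)^2\sigma_C^2}{2}} + q_t f(\pi_{22})^{1-\gamma}\frac{1}{1-(1-\gamma)^2 b_C^2}\right]^{\frac{\beta}{1-\gamma}}. \end{aligned}$$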
A.1.4 Proof of Proposition 3
Proof. The Euler condition $P_t = E_t[M_{t+1}(P_{t+1}+D_{t+1})]$ implies
$$D_t h(q_t) = E_t\big[M_{t+1} D_{t+1}[1+h(q_{t+1})]\big],$$
where
$$M_{t+1} = \beta\frac{C_t}{C_{t+1}}\left(\frac{f(q_{t+1}) C_{t+1}}{R_t(U_{t+1})}\right)^{1-\gamma}.$$
Substitute (2.8) into the above equation to obtain equation (2.9).
A.1.5 Proof of Corollary 2
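Proof. The gross return $R^{EZ}_{t+1}$ is:
$$R^{EZ}_{t+1} = \frac{P_{t+1}+D_{t+1}}{P^{EZ}_t} = \frac{D_{t+1}(1+h(q_{t+1}))}{P^{EZ}_t},$$
$$R^{EZ}_{t+1}\mid\mathcal{F}_t \sim \frac{D_t}{P^{EZ}_t}\times\begin{cases}(1+h(\pi_{12}))\exp(g+\epsilon_{t+1}) & \text{w.p. } 1-q_t,\\ (1+h(\pi_{22}))\exp(g+v_{t+1}) & \text{w.p. } q_t.\end{cases}$$
This is a mixture of Lognormal and Log-Laplace distributions with location parameters:
$$\mu^{EZ}_t = \ln(D_t/P^{EZ}_t) + g + \ln(1+h(\pi_{12})),\qquad k^{EZ}_t = \ln(D_t/P^{EZ}_t) + g + \ln(1+h(\pi_{22})).$$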
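The return distribution can be made concrete with a sketch of the density and a sampler for $\ln R^{EZ}_{t+1}$. This log return is Normal with location $\mu^{EZ}_t$ and standard deviation σ with probability $1-q_t$, and Laplace with location $k^{EZ}_t$ and scale b with probability $q_t$. This assumes, as the Lognormal/Log-Laplace description indicates, that $\epsilon_{t+1}$ is mean-zero Normal and $v_{t+1}$ is mean-zero Laplace. Function names and the numbers used to exercise the code are illustrative.

```python
import math
import random


def log_return_density(x: float, mu: float, k: float, q: float, sigma: float, b: float) -> float:
    """Density of ln R_{t+1}: Normal(mu, sigma^2) w.p. 1 - q, Laplace(k, b) w.p. q."""
    normal = math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))
    laplace = math.exp(-abs(x - k) / b) / (2 * b)
    return (1 - q) * normal + q * laplace


def sample_log_return(mu: float, k: float, q: float, sigma: float, b: float,
                      rng: random.Random) -> float:
    """One draw of ln R_{t+1}; exponentiate it to obtain the gross return."""
    if rng.random() < q:
        # Laplace(k, b) as a location shift of the difference of two exponentials.
        return k + rng.expovariate(1 / b) - rng.expovariate(1 / b)
    return rng.gauss(mu, sigma)
```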
A.2 Theorems in Section 2.3
In the following section, we sketch the proof of the theorem on the asymptotic normality of the Maximum Likelihood Estimator. For a detailed reference, see Bickel, Ritov and Rydén (1998).
The proof consists of two parts: a central limit theorem for the score function (Section A.2.2) and a law of large numbers for the observed information matrix (Section A.2.3).
A.2.1 Notation
In this appendix the following shorthand notation is used to avoid lengthy equations:

$$l_{1:t}(\theta) = \ln L(r_1, \ldots, r_t; \theta), \qquad l_{t|1:t-1}(\theta) = \ln L(r_t \mid r_1, \ldots, r_{t-1}; \theta).$$
We assume that the process $\{(\tilde S_t, r_t)\}$ is stationary; it can therefore be extended to a doubly infinite sequence $\{(\tilde S_t, r_t)\}_{t=-\infty}^{\infty}$. As is usual for hidden Markov models, it is convenient to view $\tilde S_t$ as missing data. In the following proof, we denote the complete-information log-likelihood as

$$l_{1:t}(\tilde S; \theta) = \ln L(r_1, \ldots, r_t, \tilde S_1, \ldots, \tilde S_t; \theta).$$
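When the regime path $\tilde S_1, \ldots, \tilde S_t$ is treated as observed, the complete-information log-likelihood splits into three parts: an initial-state term, a sum of log transition probabilities, and a sum of log emission densities. The sketch below computes it for a two-state chain whose emissions are Normal in the first regime and Laplace in the second. This parameterization is an illustrative assumption, and all names are hypothetical. `P[i][j]` is the probability of moving from regime i to regime j. `mu` and `k` are locations, `sigma` and `b` are scales, and `init` holds the initial regime probabilities. Regimes are coded 0 and 1 rather than 1 and 2.

```python
import math


def complete_log_likelihood(returns, states, P, init, mu, sigma, k, b):
    """l_{1:t}(S; theta) with regime 0 = Normal(mu, sigma^2), regime 1 = Laplace(k, b)."""

    def log_emission(r, s):
        if s == 0:
            return -0.5 * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * ((r - mu) / sigma) ** 2
        return -math.log(2.0 * b) - abs(r - k) / b

    ll = math.log(init[states[0]])                                   # ln Pr(S_1)
    for prev, cur in zip(states[:-1], states[1:]):
        ll += math.log(P[prev][cur])                                 # ln Pr(S_t | S_{t-1})
    ll += sum(log_emission(r, s) for r, s in zip(returns, states))   # ln f(r_t | S_t)
    return ll


P = [[0.95, 0.05], [0.4, 0.6]]  # hypothetical transition probabilities
print(complete_log_likelihood([0.01, -0.08, 0.02], [0, 1, 0], P, [0.9, 0.1], 0.005, 0.04, -0.01, 0.05))
```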
A.2.2 Central Limit Theorem for the Score Function
This section proves the following central limit theorem for the score function.
Lemma 2. Under Assumptions 1, 2 and 3,

$$\frac{1}{\sqrt{T}}\,\nabla_\theta l_{1:T}(\theta_0) \xrightarrow{d} N(0, I_0) \quad \text{as } T \to \infty.$$
The main idea of the proof is to write the log-likelihood $l_{1:T}(\theta_0)$ as the sum of the conditional log-likelihoods $l_{t|1:t-1}(\theta)$ over $t = 1, \ldots, T$. We then compare $\{\nabla_\theta l_{t|1:t-1}(\theta)\}$ with the stationary sequence $\{\nabla_\theta l_{t|-\infty:t-1}(\theta)\}$.
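The decomposition $l_{1:T}(\theta) = \sum_{t=1}^{T} l_{t|1:t-1}(\theta)$ is also how the likelihood is evaluated in practice. A forward (Hamilton-type) filter produces each conditional term from the predicted regime probabilities. The sketch below is a minimal version for a two-state Normal/Laplace hidden Markov model. It is not the thesis's estimation code, and all names are hypothetical. The parameterization is an illustrative assumption: a transition matrix `P`, an initial distribution that defaults to the chain's stationary distribution, and regimes coded 0 and 1.

```python
import math


def stationary_distribution(P):
    """Stationary law of a two-state chain with P[i][j] = Pr(S_{t+1} = j | S_t = i)."""
    p01, p10 = P[0][1], P[1][0]
    return [p10 / (p01 + p10), p01 / (p01 + p10)]


def log_likelihood(returns, P, mu, sigma, k, b, init=None):
    """Return (l_{1:T}, [l_{t|1:t-1} for t = 1..T]) via the forward filter."""
    pred = list(init) if init is not None else stationary_distribution(P)  # Pr(S_t | r_{1:t-1})
    terms = []
    for r in returns:
        dens = (math.exp(-0.5 * ((r - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi)),
                math.exp(-abs(r - k) / b) / (2.0 * b))
        joint = [pred[i] * dens[i] for i in range(2)]
        lik_t = joint[0] + joint[1]                    # L(r_t | r_1, ..., r_{t-1})
        terms.append(math.log(lik_t))
        filt = [x / lik_t for x in joint]              # Pr(S_t | r_{1:t})
        pred = [filt[0] * P[0][j] + filt[1] * P[1][j] for j in range(2)]
    return sum(terms), terms


P = [[0.95, 0.05], [0.4, 0.6]]  # hypothetical parameters
total, terms = log_likelihood([0.01, -0.08, 0.02, 0.005], P, 0.005, 0.04, -0.01, 0.05)
print(total, terms)
```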
Proposition 4.

$$\left\|\nabla_\theta l_{t|1:t-1}(\theta) - \nabla_\theta l_{t|-\infty:t-1}(\theta)\right\|_2 \le C\rho^{t},$$

where $C$ is a constant, $0 < \rho < 1$, and $\|\cdot\|_2$ denotes the $L_2$-norm under $P_0$.
Proof. See Bickel, Ritov and Ryden (1998).
We now prove Lemma 2.
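Proof. Since $\{r_t\}$ is stationary, $\{\nabla_\theta l_{t|-\infty:t-1}(\theta)\}$ is a stationary and ergodic sequence. Also, $E_0\left[\nabla_\theta l_{1|-\infty:0}(\theta_0) \mid r_{-\infty:0}\right] = 0$, so $\{\nabla_\theta l_{t|-\infty:t-1}(\theta_0)\}$ is a martingale difference sequence. By the central limit theorem for martingales,

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta l_{t|-\infty:t-1}(\theta_0) \xrightarrow{d} N(0, I_0).$$

By Proposition 4,

$$\left\|\frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta l_{t|1:t-1}(\theta_0) - \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta l_{t|-\infty:t-1}(\theta_0)\right\|_2 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\left\|\nabla_\theta l_{t|1:t-1}(\theta_0) - \nabla_\theta l_{t|-\infty:t-1}(\theta_0)\right\|_2 \le \frac{1}{\sqrt{T}}\sum_{t=1}^{T} C\rho^{t} \le \frac{C\rho}{(1-\rho)\sqrt{T}} \to 0.$$

Since convergence in $L_2$ implies convergence in probability, Slutsky's theorem yields

$$\frac{1}{\sqrt{T}}\nabla_\theta l_{1:T}(\theta_0) = \frac{1}{\sqrt{T}}\sum_{t=1}^{T}\nabla_\theta l_{t|1:t-1}(\theta_0) \xrightarrow{d} N(0, I_0).$$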
A.2.3 Law of Large Numbers for the Observed Information Matrix
Lemma 3. Under Assumptions 1, 2 and 3,

$$\frac{1}{T}\nabla^2_\theta l_{1:T}(\theta_T) \to -I_0 \quad \text{in probability under } P_0 \text{ as } T \to \infty$$

if $\theta_T \to \theta_0$ almost surely.
Proof. See Bickel, Ritov and Ryden (1998).
A.2.4 Proof of Theorem 1
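Proof. Since $\hat\theta_T \to \theta_0$ almost surely, $\hat\theta_T$ is an interior point of $\Theta$ for large enough $T$. Take the Taylor expansion of $\nabla_\theta l_{1:T}(\hat\theta_T)$ around $\theta_0$ and invert the equation:

$$0 = \nabla_\theta l_{1:T}(\hat\theta_T) = \nabla_\theta l_{1:T}(\theta_0) + \nabla^2_\theta l_{1:T}(\tilde\theta_T)\,(\hat\theta_T - \theta_0),$$

$$\sqrt{T}\,(\hat\theta_T - \theta_0) = \left[-\frac{1}{T}\nabla^2_\theta l_{1:T}(\tilde\theta_T)\right]^{-1}\frac{1}{\sqrt{T}}\nabla_\theta l_{1:T}(\theta_0),$$

where $\tilde\theta_T$ lies between $\hat\theta_T$ and $\theta_0$, so that $\tilde\theta_T \to \theta_0$ almost surely. By Lemmas 2 and 3 and Slutsky's theorem, Theorem 1 holds: $\sqrt{T}(\hat\theta_T - \theta_0)$ converges in distribution to $N(0, I_0^{-1})$.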
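In practice, Theorem 1 is applied by approximating the sampling variance of $\hat\theta_T$ with the inverse of the negative Hessian of $l_{1:T}$ at $\hat\theta_T$. By Lemma 3, $-\frac{1}{T}\nabla^2_\theta l_{1:T}(\hat\theta_T)$ estimates $I_0$. The sketch below forms that Hessian by central finite differences and returns standard errors. It is a generic numerical recipe, not necessarily the procedure used in the thesis; the function names and the step size `h` are illustrative choices.

```python
import math


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of a scalar function f at theta (list of floats)."""
    n = len(theta)

    def shifted(i, di, j, dj):
        x = list(theta)
        x[i] += di
        x[j] += dj
        return f(x)

    return [[(shifted(i, h, j, h) - shifted(i, h, j, -h) - shifted(i, -h, j, h) + shifted(i, -h, j, -h))
             / (4.0 * h * h) for j in range(n)] for i in range(n)]


def invert(M):
    """Gauss-Jordan inversion with partial pivoting."""
    n = len(M)
    A = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular")
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(n):
            if r != col:
                factor = A[r][col]
                A[r] = [a - factor * c for a, c in zip(A[r], A[col])]
    return [row[n:] for row in A]


def standard_errors(loglik, theta_hat, h=1e-4):
    """Square roots of the diagonal of [-Hessian of l_{1:T} at theta_hat]^{-1}."""
    H = numerical_hessian(loglik, theta_hat, h)
    cov = invert([[-v for v in row] for row in H])
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]


# Toy log-likelihood: i.i.d. N(theta, 1) data, where the exact standard error is 1/sqrt(T).
data = [0.1, -0.2, 0.3, 0.0]
toy = lambda th: -0.5 * sum((x - th[0]) ** 2 for x in data)
print(standard_errors(toy, [sum(data) / len(data)]))  # close to 0.5
```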
Appendix B
Supplements for Chapter 3
B.1 Simulation Details
The simulated out-of-pocket (OOP) spending is calculated by running the beneficiary’s claims records for each year through the FBD of each plan in her choice set.
B.1.1 Ordering Claims
The claims of each beneficiary are sorted in the following order:
1. Service date;
2. Beneficiary phase (DD/DP/DI/DC/PP/PI/PC/II/IC/CC), where “D” = deductible phase, “P” = pre-ICL phase, “I” = ICL phase and “C” = catastrophic phase. For example, “DD” denotes a claim that lies entirely in the deductible phase, and “DI” denotes a straddle claim that starts in the deductible phase and ends in the ICL phase;
3. Patient pay amount;
4. Total drug cost.
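The four sort keys can be expressed as a single lexicographic key. The sketch below is illustrative only. The `Claim` record and its field names are hypothetical. Because the text does not state a sort direction, it assumes ascending order on every key, with phases ranked in the order listed above.

```python
from dataclasses import dataclass
from datetime import date

# Phase codes in the order listed in Section B.1.1 (start phase, end phase).
PHASE_ORDER = ["DD", "DP", "DI", "DC", "PP", "PI", "PC", "II", "IC", "CC"]
PHASE_RANK = {code: rank for rank, code in enumerate(PHASE_ORDER)}


@dataclass
class Claim:
    service_date: date
    phase: str          # e.g. "DD" = entirely within the deductible phase
    patient_pay: float
    total_cost: float


def order_claims(claims):
    """Sort one beneficiary's claims by service date, phase, patient pay, then total cost."""
    return sorted(claims, key=lambda c: (c.service_date, PHASE_RANK[c.phase], c.patient_pay, c.total_cost))


claims = [
    Claim(date(2008, 3, 1), "PP", 25.0, 90.0),
    Claim(date(2008, 1, 15), "DP", 40.0, 120.0),
    Claim(date(2008, 1, 15), "DD", 30.0, 30.0),
]
print([c.phase for c in order_claims(claims)])  # ['DD', 'DP', 'PP']
```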
B.1.2 Calculation of Actual and Simulated Out-of-Pocket Spending
The actual OOP is calculated by summing the patient pay amounts of all claims in the reference year. For a claim that falls entirely within one benefit phase, the simulated OOP is calculated as:
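$$\text{simulated OOP} = \begin{cases} \text{copay}(\text{tier } j,\ \text{phase } i,\ \text{days supply } t) & \text{if a copay amount applies,} \\ \text{coinsurance}(\text{tier } j,\ \text{phase } i) \times \text{total cost} & \text{if a coinsurance rate applies.} \end{cases} \qquad \text{(B.2)}$$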
The cost of a straddle claim is divided among benefit phases at the point where the deductible amount, the ICL limit or the catastrophic threshold is reached. Within each benefit phase, the simulated OOP is calculated using equation (B.2).
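The sketch below illustrates equation (B.2) together with the straddle-claim rule. Everything in it is a hypothetical simplification for illustration. It treats each phase boundary as a cumulative total-drug-cost threshold; in the actual Part D benefit, the catastrophic threshold is defined in terms of out-of-pocket spending. It also represents a plan's cost-sharing rules as a dictionary keyed by (tier, phase), and it caps a copay at the cost of the piece it applies to.

```python
from bisect import bisect_right

PHASES = ["D", "P", "I", "C"]  # deductible, pre-ICL, ICL, catastrophic


def split_claim(start_cum_cost, cost, boundaries):
    """Split a claim's total cost into (phase, amount) pieces.

    boundaries[i] is the cumulative total cost at which phase i ends (ascending, length 3).
    """
    pieces, lo, remaining = [], start_cum_cost, cost
    while True:
        idx = bisect_right(boundaries, lo)       # phase in effect at cumulative cost lo
        upper = boundaries[idx] if idx < len(boundaries) else float("inf")
        if remaining <= upper - lo:
            pieces.append((PHASES[idx], remaining))
            return pieces
        pieces.append((PHASES[idx], upper - lo))  # straddle claim: fill the rest of this phase
        remaining -= upper - lo
        lo = upper


def simulated_oop(pieces, tier, days_supply, rules):
    """Apply (B.2) to each phase-specific piece of one claim.

    rules[(tier, phase)] is ("copay", {days_supply: amount}) or ("coinsurance", rate).
    """
    total = 0.0
    for phase, amount in pieces:
        kind, value = rules[(tier, phase)]
        if kind == "copay":
            total += min(value[days_supply], amount)  # assumption: copay capped at piece cost
        else:
            total += value * amount
    return total


# Hypothetical plan: illustrative thresholds and cost sharing, not taken from any real plan.
boundaries = [300.0, 2800.0, 6400.0]
rules = {("1", "D"): ("coinsurance", 1.0), ("1", "P"): ("copay", {30: 7.0, 90: 15.0}),
         ("1", "I"): ("coinsurance", 1.0), ("1", "C"): ("coinsurance", 0.05)}
pieces = split_claim(280.0, 50.0, boundaries)
print(pieces, simulated_oop(pieces, "1", 30, rules))
```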
B.2 Medicare Part D Spending by Switching Status
B.2.1 2007-08
Ex-post cost measures for year t-1 and ex-ante cost measures for year t, mean values (forced decision makers are excluded)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | CIC old (ex ante) | CIC min (ex ante) | CIC chosen (ex ante)
Active Switchers (11%) | 1,523.7 | 1,213.9 | 1,711.5 | 1,264.8 | 1,521.9
Stay in Old Plan: renewal (86%) | 1,251.5 | 959.5 | 1,343.5 | 982.0 | 1,343.5
Stay in Old Plan: consolidated in (1%) | 709.7 | 563.8 | 763.8 | 597.5 | 763.8
Stay in Old Plan: consolidated out (0%) | 1,302.4 | 1,014.5 | 1,365.3 | 1,077.2 | 1,365.3
Total (100%) | 1,272.6 | 981.2 | 1,374.5 | 1,007.3 | 1,353.4
Ex-ante savings measures for year t, mean values (forced decision makers are excluded)
Switching Status | ExPostOverpay | Change OldPlan | Save from Switch | Saving Exante
Active Switchers (11%) | 309.7 | 187.8 | 446.7 | 257.1
Stay in Old Plan: renewal (86%) | 292.0 | 92.0 | 361.6 | 361.6
Stay in Old Plan: consolidated in (1%) | 145.8 | 54.1 | 166.3 | 166.3
Stay in Old Plan: consolidated out (0%) | 287.9 | 62.9 | 288.1 | 288.1
Total (100%) | 291.4 | 101.9 | 367.2 | 346.0
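The columns of the two 2007-08 tables above appear to be linked by simple differences:
- ExPostOverpay = CIC chosen ex post minus CIC min ex post;
- Change OldPlan = CIC old ex ante minus CIC chosen ex post;
- Save from Switch = CIC old ex ante minus CIC min ex ante;
- Saving Exante = CIC chosen ex ante minus CIC min ex ante.

These relations are our reading of the reported means, not definitions stated in the text. The check below, which uses only figures copied from the tables, confirms that they hold for every row up to rounding of the published values.

```python
# 2007-08 means copied from the two tables above.
# Cost columns: CIC chosen ex post, CIC min ex post, CIC old ex ante, CIC min ex ante, CIC chosen ex ante
COSTS = {
    "Active Switchers": (1523.7, 1213.9, 1711.5, 1264.8, 1521.9),
    "Stay in Old Plan: renewal": (1251.5, 959.5, 1343.5, 982.0, 1343.5),
    "Stay in Old Plan: consolidated in": (709.7, 563.8, 763.8, 597.5, 763.8),
    "Stay in Old Plan: consolidated out": (1302.4, 1014.5, 1365.3, 1077.2, 1365.3),
    "Total": (1272.6, 981.2, 1374.5, 1007.3, 1353.4),
}
# Savings columns: ExPostOverpay, Change OldPlan, Save from Switch, Saving Exante
SAVINGS = {
    "Active Switchers": (309.7, 187.8, 446.7, 257.1),
    "Stay in Old Plan: renewal": (292.0, 92.0, 361.6, 361.6),
    "Stay in Old Plan: consolidated in": (145.8, 54.1, 166.3, 166.3),
    "Stay in Old Plan: consolidated out": (287.9, 62.9, 288.1, 288.1),
    "Total": (291.4, 101.9, 367.2, 346.0),
}


def implied_savings(costs):
    """Savings measures implied by the cost columns under the differences described above."""
    chosen_post, min_post, old_ante, min_ante, chosen_ante = costs
    return (chosen_post - min_post, old_ante - chosen_post, old_ante - min_ante, chosen_ante - min_ante)


for group, costs in COSTS.items():
    gaps = [abs(a - b) for a, b in zip(implied_savings(costs), SAVINGS[group])]
    print(f"{group}: largest gap {max(gaps):.2f}")  # gaps of about 0.1 reflect rounding
```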
Ex-post cost and savings measures for year t, mean values (including new enrollees and forced switchers)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | Saving (ex post)
Active Switchers (10%) | 1,416.0 | 1,182.6 | 233.4
Forced: new enrollees to Part D (3%) | 1,175.4 | 920.6 | 254.8
Forced: from HMO/PPO/EMP to standalone Part D (0%) | 1,270.7 | 1,018.0 | 252.7
Forced: old standalone Part D plan terminated (0%) | 1,598.7 | 1,149.9 | 448.8
Stay in Old Plan: renewal (83%) | 1,312.3 | 969.9 | 342.5
Stay in Old Plan: consolidated in (1%) | 786.7 | 617.3 | 169.4
Stay in Old Plan: consolidated out (0%) | 1,196.9 | 922.9 | 274.0
Total (100%) | 1,309.4 | 985.0 | 324.4
B.2.2 2008-09
Ex-post cost measures for year t-1 and ex-ante cost measures for year t, mean values (forced decision makers are excluded)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | CIC old (ex ante) | CIC min (ex ante) | CIC chosen (ex ante)
Active Switchers (10%) | 1,306.0 | 1,029.3 | 1,530.3 | 1,100.2 | 1,330.5
Stay in Old Plan: renewal (84%) | 1,268.5 | 944.7 | 1,432.6 | 1,017.1 | 1,432.6
Stay in Old Plan: consolidated in (3%) | 1,579.5 | 1,217.5 | 1,726.3 | 1,291.7 | 1,726.3
Stay in Old Plan: consolidated out (1%) | 1,533.6 | 1,077.3 | 1,565.6 | 1,163.5 | 1,565.6
Total (100%) | 1,286.9 | 965.2 | 1,455.2 | 1,037.7 | 1,434.4
Ex-ante savings measures for year t, mean values (forced decision makers are excluded)
Switching Status | ExPostOverpay | Change OldPlan | Save from Switch | Saving Exante
Active Switchers (10%) | 276.7 | 224.3 | 430.1 | 230.4
Stay in Old Plan: renewal (84%) | 323.8 | 164.1 | 415.4 | 415.4
Stay in Old Plan: consolidated in (3%) | 362.1 | 146.7 | 434.6 | 434.6
Stay in Old Plan: consolidated out (1%) | 456.3 | 32.0 | 402.1 | 402.1
Total (100%) | 321.7 | 168.3 | 417.5 | 396.7
Ex-post cost and savings measures for year t, mean values (including new enrollees and forced switchers)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | Saving (ex post)
Active Switchers (10%) | 1,310.0 | 1,101.2 | 208.8
Forced: new enrollees to Part D (3%) | 1,279.1 | 1,005.3 | 273.7
Forced: from HMO/PPO/EMP to standalone Part D (0%) | 1,326.8 | 1,018.9 | 307.8
Forced: old standalone Part D plan terminated (0%) | 1,950.1 | 1,617.6 | 332.6
Stay in Old Plan: renewal (81%) | 1,445.0 | 1,043.3 | 401.7
Stay in Old Plan: consolidated in (3%) | 1,766.2 | 1,353.0 | 413.2
Stay in Old Plan: consolidated out (1%) | 1,622.5 | 1,242.5 | 380.0
Total (100%) | 1,439.4 | 1,061.6 | 377.8
B.2.3 2009-10
Ex-post cost measures for year t-1 and ex-ante cost measures for year t, mean values (forced decision makers are excluded)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | CIC old (ex ante) | CIC min (ex ante) | CIC chosen (ex ante)
Active Switchers (9%) | 1,369.0 | 1,066.4 | 1,514.0 | 1,103.2 | 1,336.0
Stay in Old Plan: renewal (48%) | 1,404.1 | 1,028.4 | 1,509.0 | 1,066.2 | 1,509.0
Stay in Old Plan: consolidated in (36%) | 1,414.2 | 1,028.8 | 1,466.2 | 1,055.6 | 1,466.2
Stay in Old Plan: consolidated out (5%) | 1,335.6 | 941.0 | 1,443.4 | 958.4 | 1,443.4
Total (100%) | 1,400.6 | 1,027.3 | 1,490.1 | 1,059.8 | 1,473.1
Ex-ante savings measures for year t, mean values (forced decision makers are excluded)
Switching Status | ExPostOverpay | Change OldPlan | Save from Switch | Saving Exante
Active Switchers (9%) | 302.6 | 145.0 | 410.8 | 232.8
Stay in Old Plan: renewal (48%) | 375.7 | 104.8 | 442.7 | 442.7
Stay in Old Plan: consolidated in (36%) | 385.4 | 52.0 | 410.6 | 410.6
Stay in Old Plan: consolidated out (5%) | 394.6 | 107.8 | 485.1 | 485.1
Total (100%) | 373.4 | 89.4 | 430.3 | 413.3
Ex-post cost and savings measures for year t, mean values (including new enrollees and forced switchers)
Switching Status | CIC chosen (ex post) | CIC min (ex post) | Saving (ex post)
Active Switchers (9%) | 1,310.6 | 1,104.1 | 206.4
Forced: new enrollees to Part D (2%) | 1,316.2 | 1,048.5 | 267.7
Forced: from HMO/PPO/EMP to standalone Part D (0%) | 1,328.0 | 1,036.2 | 291.8
Forced: old standalone Part D plan terminated (0%) | 1,412.7 | 1,156.3 | 256.4
Stay in Old Plan: renewal (46%) | 1,512.0 | 1,095.1 | 416.8
Stay in Old Plan: consolidated in (35%) | 1,450.9 | 1,071.7 | 379.2
Stay in Old Plan: consolidated out (5%) | 1,432.3 | 990.9 | 441.4
Total (100%) | 1,461.6 | 1,080.7 | 380.9
B.3 Reduced Form Regression
Table B.1: Classification of Chronic Conditions
Category | Costly | Cheap
Acute | AMI (acute myocardial infarction), ischemic heart disease, stroke | hip fracture
Cancer | lung cancer, breast cancer | colorectal cancer, endometrial cancer, prostate cancer
Chronic Disabling | Alzheimer's and related disorders | COPD, chronic kidney disease, chronic heart failure
Non-fatal | diabetes, depression, asthma | rheumatoid arthritis / osteoarthritis, glaucoma, hypertension, acquired hypothyroidism, anemia, benign prostatic hyperplasia, hyperlipidemia, atrial fibrillation
Other | cataract
Table B.2: Part A/B Health Care Use Variables
Variable | Description
hhsniffind | Dummy for at least one home health or skilled nursing facility (SNF) stay
hopind | Dummy for at least one hospice stay
dialysind | Dummy for at least one dialysis event
oprocind | Dummy for at least one other procedure
ptbdrugind | Dummy for at least one Part B drug event
imgtestind | Dummy for at least one imaging or testing event
dmeind | Dummy for at least one Part B durable medical equipment event
othcind | Dummy for at least one other Part B carrier event
ervisitsind | Dummy for at least one ER visit
acuteoipind | Dummy for at least one outpatient or inpatient hospital stay
ascanesind | Dummy for at least one ambulatory surgery center event or anesthesia services event
emphysevt | Number of doctor visits
emphysevtchange | Change in the number of doctor visits in t-1 compared with t-2
Table B.3: Linear Regression of Change in OOP on Chronic Conditions
2007-08 2008-09 2009-10
VARIABLES d oop d oop d oop
age 1.193*** 0.769*** -0.0885
(0.108) (0.108) (0.108)
male 19.51*** 12.84*** 8.187***
(1.913) (1.918) (1.900)
race black -2.074 -14.32*** -18.18***
(3.909) (3.896) (3.893)
race asian 3.207 14.91* 0.403
(9.139) (8.913) (8.676)
race hisp 8.065 -12.77 -16.09
(10.69) (10.81) (11.00)
race other 10.57 -2.019 -4.117
(6.550) (6.273) (6.009)
ami everflag -29.14*** 0.204 5.578*
(3.242) (3.168) (3.114)
ischmcht everflag -11.73*** 5.024*** 3.316**
(1.511) (1.509) (1.497)
strketia everflag 4.317** -5.844*** 3.840**
(1.964) (1.943) (1.919)
hipfrac everflag 12.73*** 11.17*** 8.368**
(3.683) (3.623) (3.578)
ami incflag 176.7*** 177.8*** 211.8***
(7.923) (8.149) (8.057)
ischmcht incflag 61.42*** 66.54*** 66.95***
(3.682) (3.777) (3.874)
strketia incflag 90.41*** 90.46*** 96.53***
(4.798) (4.860) (4.928)
hipfrac incflag -73.64*** -62.66*** -78.42***
(7.328) (7.315) (7.380)
cncrbrst everflag 14.98*** 5.262** -29.60***
(2.663) (2.628) (2.590)
cncrclrc everflag 5.904 6.799* 3.393
(3.635) (3.606) (3.592)
cncrendm everflag 22.58*** -15.70** 5.477
(6.819) (6.683) (6.565)
cncrlung everflag 43.24*** 37.42*** 25.86***
(6.100) (5.913) (5.780)
cncrprst everflag 0.298 -15.09*** -10.21***
(2.964) (2.914) (2.851)
cncrbrst incflag 115.5*** 125.5*** 97.61***
(10.05) (10.18) (10.34)
cncrclrc incflag 3.294 5.416 25.44**
(11.39) (11.95) (12.43)
cncrendm incflag 5.381 9.265 22.46
(21.02) (21.35) (21.59)
cncrlung incflag 147.3*** 183.6*** 216.4***
(11.40) (11.68) (11.58)
cncrprst incflag 94.90*** 72.44*** 40.87***
(9.730) (10.09) (10.06)
alzhdmta everflag 123.1*** 86.98*** 105.9***
(2.350) (2.313) (2.283)
copd everflag 6.160*** 11.87*** 11.76***
(1.697) (1.677) (1.657)
chrnkidn everflag -9.138*** -3.937** -6.430***
(1.995) (1.901) (1.815)
osteoprs everflag -37.60*** -28.21*** -15.02***
(1.663) (1.648) (1.630)
alzhdmta incflag 196.6*** 173.7*** 186.3***
(4.136) (4.150) (4.143)
copd incflag 66.24*** 83.23*** 83.01***
(4.124) (4.290) (4.339)
chrnkidn incflag 47.30*** 34.05*** 41.87***
(3.653) (3.551) (3.483)
osteoprs incflag 30.69*** 27.98*** 28.64***
(4.193) (4.281) (4.441)
ra oa everflag -2.070 -3.502** -5.511***
(1.385) (1.390) (1.385)
diabetes everflag 5.949*** 9.791*** 8.579***
(1.491) (1.468) (1.443)
depressn everflag 2.453 -0.144 -6.343***
(1.637) (1.607) (1.572)
glaucoma everflag 9.486*** 12.07*** 9.105***
(1.541) (1.522) (1.497)
hypert everflag -16.26*** 11.50*** -1.554
(1.907) (1.950) (1.954)
hypoth everflag -6.281*** -8.867*** -3.344**
(1.633) (1.610) (1.581)
anemia everflag 0.564 1.138 -6.692***
(1.449) (1.450) (1.442)
asthma everflag -3.142 19.42*** -2.598
(2.247) (2.191) (2.132)
hyperp everflag 9.173*** 12.32*** -35.11***
(2.197) (2.172) (2.128)
hyperl everflag -16.13*** 2.190 -1.275
(1.695) (1.760) (1.804)
ra oa incflag 17.39*** 23.97*** 19.99***
(3.288) (3.330) (3.354)
diabetes incflag 36.47*** 47.65*** 34.46***
(4.349) (4.425) (4.470)
depressn incflag 85.83*** 79.49*** 75.37***
(3.995) (4.033) (4.033)
glaucoma incflag 27.87*** 33.53*** 44.42***
(5.084) (5.127) (5.214)
hypert incflag 56.31*** 52.05*** 52.64***
(4.221) (4.394) (4.528)
hypoth incflag 30.15*** 42.24*** 31.73***
(4.785) (4.789) (4.721)
anemia incflag 49.48*** 42.37*** 49.84***
(3.050) (3.123) (3.165)
asthma incflag 82.16*** 97.65*** 100.4***
(6.298) (6.406) (6.381)
hyperp incflag 90.52*** 85.68*** 41.62***
(5.556) (5.680) (5.681)
hyperl incflag 44.26*** 49.05*** 43.90***
(4.044) (4.218) (4.380)
atrialfb everflag -18.11*** -6.127*** -5.280***
(1.957) (1.934) (1.905)
cataract everflag 0.982 3.364** 4.472***
(1.592) (1.612) (1.609)
chf everflag -21.17*** -0.723 -6.679***
(1.790) (1.776) (1.762)
atrialfb incflag 54.47*** 54.47*** 46.14***
(4.751) (4.709) (4.669)
cataract incflag 24.78*** 29.41*** 31.22***
(3.583) (3.608) (3.625)
chf incflag 60.96*** 66.57*** 66.68***
(3.830) (3.907) (3.928)
Constant -103.9*** -49.02*** 11.80
(8.156) (8.130) (8.028)
Observations 954,755 999,379 1,029,106
R-squared 0.015 0.012 0.012
Table B.4: Logit Regression (dependent variable: switching indicator, 1 if switched and 0 if no switch; standard errors in parentheses; *** p < 0.01, ** p < 0.05, * p < 0.1)
VARIABLES Coeff. Std. Err.
age -0.0363*** (0.00665)
agesquared 0.000156*** (4.18e-05)
male -0.0280*** (0.00502)
raceblack -0.497*** (0.0177)
raceasian -0.0655* (0.0353)
racehisp -0.229*** (0.0471)
raceother -0.0891*** (0.0239)
zpctadvdeg 0.731*** (0.0670)
zpctbachdeg 1.382*** (0.0525)
zpcthsgrad 1.946*** (0.0420)
zmedhhinctot -1.31e-06*** (1.62e-07)
changeold 0.00122*** (1.68e-05)
saveswiexante 0.000396*** (1.30e-05)
savingpf 0.000118*** (1.32e-05)
gapent 0.129*** (0.00566)
catent -0.126*** (0.0137)
indcons -0.229*** (0.00813)
annualprmchange 0.00695*** (2.84e-05)
premdiff -0.0518*** (0.000153)
dedamtchange 0.00699*** (6.63e-05)
hhsniffind -0.125*** (0.00972)
hopind 0.123*** (0.00594)
dialysind -0.0209 (0.0401)
oprocind 0.0835*** (0.00575)
ptbdrugind 0.0509*** (0.00534)
imgtestind -0.0395*** (0.0108)
dmeind 0.0277*** (0.00529)
othcind 0.0873*** (0.00594)
ervisitsind -0.0785*** (0.00623)
acuteoipind -0.0147* (0.00796)
ascanesind -0.00583 (0.00564)
emphysevt -0.00157*** (0.000237)
emphysevtchange 2.50e-05 (0.000210)
anyeveracute1any -0.0979*** (0.0146)
anyeveracute2any -0.0246*** (0.00499)
anyevercancer1any -0.000183 (0.00898)
anyevercancer2any 0.00128 (0.00828)
anyeverdisable1any -0.0644*** (0.00927)
anyeverdisable2any -0.0174*** (0.00498)
anyevernonfatal1any -0.0688*** (0.00495)
anyevernonfatal2any -0.125*** (0.0119)
anyeverotherany 0.0931*** (0.00567)
anyincacute1any -0.0358*** (0.0104)
anyincacute2any -0.0855*** (0.0304)
anyinccancer1any -0.157*** (0.0296)
anyinccancer2any -0.0653*** (0.0139)
anyincdisable1any -0.114*** (0.0174)
anyincdisable2any -0.0614*** (0.00859)
anyincnonfatal1any -0.0680*** (0.00981)
anyincnonfatal2any -0.0223*** (0.00582)
anyincotherany 0.0714*** (0.0113)
D2007 -0.00514 (0.00837)
D2008 -0.0819*** (0.0146)
D2009 -0.582*** (0.0249)
Y2009 -0.0727*** (0.00548)
Y2010 0.117*** (0.00665)
Constant -2.336*** (0.263)
Observations 2,607,276
Bibliography for Chapter 2
Barro, R. (2006). Rare disasters and asset markets in the twentieth century. The
Quarterly Journal of Economics, 121, 823.
Barro, R. and Jin, T. (2011). On the size distribution of macroeconomic disasters.
Econometrica, 79, 1567.
Bickel, P. J., Ritov, Y., and Ryden, T. (1998). Asymptotic normality of the maximum-likelihood estimator for general hidden markov models. Annals of Statistics, 26, 1614.
Bollerslev, T. and Todorov, V. (2011). Estimation of jump tails. Econometrica, 79,
1727.
Calvet, L. E. and Fisher, A. J. (2004). How to forecast long-run volatility: Regime switching and the estimation of multifractal processes. Journal of Financial Econometrics, 2, 49.
Cappe, O., Moulines, E., and Ryden, T. (2005). Inference in Hidden Markov Models.
Springer Science+Business Media, Inc.
Chauvet, M. and Potter, S. (2002). Predicting a recession: Evidence from the yield curve in the presence of structural breaks. Economics Letters, 77, 245.
Cochrane, J. H., Longstaff, F. A., and Santa-Clara, P. (2008). Two trees. Review of
Financial Studies, 21(1), 347–385.
Epstein, L. G. and Zin, S. E. (1989). Substitution, risk aversion, and the temporal behavior of consumption and asset returns: A theoretical framework. Econometrica, 57, 937.
Fama, E. (1963). Mandelbrot and the stable Paretian hypothesis. Journal of Business, 36, 420.
Gabaix, X. (forthcoming). Variable rare disasters: An exactly solved framework for
ten puzzles in macro-finance. The Quarterly Journal of Economics.
Gourio, F. (2008). Time-series predictability in the disaster model. Finance Research
Letters, 5, 191.
Hamilton, J. D. (1989). A new approach to the economic analysis of nonstationary
time series and the business cycle. Econometrica, 57, 357.
Hamilton, J. D. (1994). Time-Series Analysis. Princeton University Press.
Kelly, B. (2011). Tail risk and asset prices. working paper.
Le Gland, F. and Mevel, L. (1997). Asymptotic properties of the mle in hidden
markov models. In Proceedings of the 4th European Control Conference, Bruxelles.
Citeseer.
Le Gland, F. and Mevel, L. (2000). Exponential forgetting and geometric ergodicity
in hidden markov models. Mathematics of Control, Signals and Systems, 13, 63.
Leroux, B. G. (1992). Maximum-likelihood estimation for hidden markov models.
Stochastic Processes and their Applications, 40, 127.
Linden, M. (2001). A model for stock return distribution. International Journal of
Finance and Economics, 6, 159.
Louis, T. A. (1982). Finding the observed information matrix when using the EM algorithm. Journal of the Royal Statistical Society. Series B (Methodological), pages 226–233.
Lucas Jr, R. E. (1978). Asset prices in an exchange economy. Econometrica: Journal
of the Econometric Society, pages 1429–1445.
Mamon, R. S. and Elliott, R. J. (2007). Hidden markov models in finance, volume 4.
Springer New York.
Martin, I. (2013). The Lucas orchard. Econometrica, 81(1), 55–111.
McLachlan, G. and Peel, D. (2000). Finite Mixture Models. JOHN WILEY & SONS,
INC.
McNeil, A. J. and Frey, R. (2000). Estimation of tail-related risk measures for het-
eroscedastic financial time series: an extreme value approach. Journal of Empirical
Finance, 7, 271.
Mevel, L. and Finesso, L. (2004). Asymptotical statistics of misspecified hidden
markov models. Automatic Control, IEEE Transactions on, 49(7), 1123–1132.
NBER (2012). US business cycle expansions and contractions.
www.nber.org/cycles.html.
Rietz, T. A. (1988). The equity risk premium: A solution. Journal of Monetary
Economics, 22, 117.
Schaller, H. and Van Norden, S. (1997). Regime switching and stock market returns.
Applied Financial Economics, 7, 177.
Teicher, H. (1967). Identifiability of mixtures of product measures. Annals of Math-
ematical Statistics, 38, 1300.
Wachter, J. A. (2013). Can time-varying risk of rare disasters explain aggregate stock market volatility? The Journal of Finance, 68(3), 987–1035.
White, H. (1984). Asymptotic Theory for Econometricians. Academic Press, Inc.
Bibliography for Chapter 3
Abaluck, J. and Gruber, J. (2013). Evolving choice inconsistencies in choice of prescription drug insurance. Technical report, National Bureau of Economic Research.
Abaluck, J. T. and Gruber, J. (2009). Choice inconsistencies among the elderly: evi-
dence from plan choice in the medicare part d program. Technical report, National
Bureau of Economic Research.
Barcellos, S. H., Wuppermann, A. C., Carman, K. G., Bauhoff, S., McFadden, D. L., Kapteyn, A., Winter, J. K., and Goldman, D. (2014). Preparedness of americans for the affordable care act. Proceedings of the National Academy of Sciences, 111(15), 5497–5502.
Cutler, D. M., Ghosh, K., and Landrum, M. B. (2013). Evidence for significant
compression of morbidity in the elderly us population. Technical report, National
Bureau of Economic Research.
Ericson, K. M. M. (2014). Consumer inertia and firm pricing in the medicare part
d prescription drug insurance exchange. American Economic Journal: Economic
Policy, 6(1), 38–64.
Heiss, F., Leive, A., McFadden, D., and Winter, J. (2013). Plan selection in medicare part d: Evidence from administrative data. Journal of health economics, 32(6), 1325–1344.
Hoadley, J., Hargrave, E., Summer, L., Cubanski, J., and Neuman, T. (2013). To switch or not to switch: Are medicare beneficiaries switching drug plans to save money? Kaiser Family Foundation (October 2013): http://kaiserfamilyfoundation.files.wordpress.com/2013/10/8501-to-switch-or-not-to-switch1.pdf.
Iyengar, S. S. and Lepper, M. R. (2000). When choice is demotivating: Can one desire too much of a good thing? Journal of personality and social psychology, 79(6), 995.
Ketcham, J. D., Lucarelli, C., Miravete, E. J., and Roebuck, M. C. (2012). Sinking, swimming, or learning to swim in medicare part d. The American Economic Review, 102(6), 2639–2673.
Luco, F. (2013). Switching costs and competition in retirement investment.
Polyakova, M. (2013). Regulation of insurance with adverse selection and switching
costs: Evidence from medicare part d.
Abstract
This thesis consists of two applications of Markov-switching models in economics.
Chapter 2 applies a hidden Markov model to financial time-series data. Building on a Lucas tree asset pricing model, we relate the tail risk of asset prices to the component densities of a Normal-Laplace mixture distribution. On this basis we propose a new method to measure extreme-event behavior in financial markets. The hidden state of the model represents the underlying state of the macroeconomy, which follows a two-state Markov regime-switching process. Conditional on the state being "normal" or "extreme", the log dividend is subject to Normal or Laplace (fat-tailed) shocks, respectively. The asset price is derived from discounted dividend values, where the stochastic discount factor is determined by the utility maximization of a representative agent who holds the asset. Finally, we discuss the identifiability of the model parameters, maximum likelihood estimation techniques and the asymptotic properties of the MLE, and we illustrate the estimation results using S&P index returns. In this example, the stationary distribution of the hidden Markov process exists and can be estimated by the forward-backward algorithm.
Chapter 3 is an empirical application of a Markov-switching process in health economics. We focus on Medicare beneficiaries' decisions to switch their prescription drug plans. We assume that the switching decision is affected by a set of state variables, including demographics, previous Part D experience, health shocks, changes on the plan supply side and other events that may trigger switching. Our reduced-form analysis indicates that switching is triggered by a premium or deductible increase in the old plan and by entering "the coverage gap" in the previous year. It also indicates that past switching behavior predicts future switching after controlling for other factors.
Next we develop a two-stage structural model in which agents stay in their old plan by default and enter the plan choice stage only if they first pay attention. The model separates the actual switching cost from the attention cost. To explain the observed switcher/stayer pattern in the data, we assume that agents differ in unobserved "ability": higher-ability agents are more likely to pay attention and predict their utility from plan choice more precisely.
Asset Metadata
Creator: Zhou, Bo (author)
Core Title: Applications of Markov‐switching models in economics
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Economics
Publication Date: 09/18/2014
Defense Date: 09/18/2014
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Markov-switching, Medicare Part D, normal-Laplace mixture, OAI-PMH Harvest
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Hsiao, Cheng (committee chair), McFadden, Daniel (committee member), Ridder, Geert (committee member)
Creator Email: bozhou.zb@gmail.com, zhoub@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-479626
Unique identifier: UC11287798
Identifier: etd-ZhouBo-2960.pdf (filename), usctheses-c3-479626 (legacy record id)
Legacy Identifier: etd-ZhouBo-2960.pdf
Dmrecord: 479626
Document Type: Dissertation
Rights: Zhou, Bo
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA