ATTRITION IN LONGITUDINAL TWIN STUDIES:
A COMPARATIVE STUDY OF SEM ESTIMATION METHODS
by
Pan Wang
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
December 2012
Copyright 2012 Pan Wang
Dedication
To my parents, my sister and my brother.
Acknowledgements
I’m most grateful to my advisor, Professor Laura A. Baker, for always being
patient, supportive, and encouraging throughout my study at USC. She gave me much
freedom to explore my research interests and determine my future field of study. She
taught me not only the philosophy of science, but also the spirit of perseverance and the
art of life. It’s a great honor to be her student.
I’m also very thankful to my committee members. This research benefited greatly
from the invaluable advice of Professor John J. McArdle. I’m in awe of his genius and
tremendous knowledge of SEMs and longitudinal studies. Professor Duncan Thomas
provided many insightful comments in designing the simulation study. I’m indebted to
him for the long discussions that helped me sort out technical details of my work.
Professor Richard John, who kindly let me audit his Bayesian course, inspired many
future directions of this research. Professor Justin Wood and Professor Margaret
Gatz, who kindly served as the outside-area members, gave me many
thought-provoking comments.
I would also like to express my gratitude to my colleagues and friends---Dr.
Catherine Tuvblad, Dr. Serena Bezdjian, Dr. Yu Gao and Dr. Tian Tian---who inspired
my pursuit of science with their infinite support and care.
Tremendous thanks go to the Division of High Performance Computing and
Communications at USC for the computational support---especially Mr. Rui Wang and
Mr. Brian Mendenhall, who patiently gave me much help.
Last but not least, thanks to all the staff members in the Psychology Department
for their extraordinary and professional assistance during all these years---in particular,
Ms. Irene Takaragawa, Ms. Sandra Medearis and Ms. Maria Nehlsen.
Table of Contents
Dedication ........................................................................................................................... ii
Acknowledgements ............................................................................................................. ii
List of Tables ..................................................................................................................... vi
List of Figures ................................................................................................................... vii
Abstract ............................................................................................................................ viii
Chapter 1: An Empirical Study of Psychopathic Traits ...................................................... 1
1.1 Introduction ............................................................................................................... 1
1.2 Method ...................................................................................................................... 2
1.3 Results ....................................................................................................................... 8
Logistic Regressions .................................................................................................... 8
Biometric Latent Growth Model .................................................................................. 8
1.4 Summary and Discussion .......................................................................................... 9
Chapter 2: Introduction to the Bayesian Approach and Missing Data Techniques .......... 13
2.1 The Bayesian approach and MCMC algorithm....................................................... 15
2.2 Existing Missing Data Techniques .......................................................................... 18
Conventional Methods ............................................................................................... 19
Maximum Likelihood ................................................................................................. 23
The Bayesian Method and Data Augmentation ......................................................... 27
Chapter 3: Design and Methods ........................................................................................ 31
3.1 Simulation of the Complete Data ............................................................................ 31
3.2 Software .................................................................................................................. 32
3.3 Dependent Variables ............................................................................................... 34
Chapter 4: Simulation Study I - Compare Performance under MCAR Data .................... 35
4.1 Design...................................................................................................................... 35
4.2 Analysis ................................................................................................................... 38
4.3 Results ..................................................................................................................... 38
4.4 Summary and Discussion ........................................................................................ 41
Chapter 5: Simulation Study II - Compare Performance under MNAR Data .................. 45
5.1 Design...................................................................................................................... 45
5.2 Analysis ................................................................................................................... 47
5.3 Results ..................................................................................................................... 56
5.4 Summary and Discussion ........................................................................................ 60
Chapter 6: Simulation Study III - the Sensitivity of the Bayesian Model to
Misspecification ................................................................................................................ 62
6.1 Design...................................................................................................................... 62
6.2 Analysis ................................................................................................................... 62
6.3 Results ..................................................................................................................... 64
6.4 Summary and Discussion ........................................................................................ 74
Chapter 7: Revisiting the Empirical Study ....................................................................... 76
7.1 Result ....................................................................................................................... 76
7.2 Discussion ............................................................................................................... 78
Differences between ML and Bayesian models ......................................................... 78
Discussion regarding theory of psychopathy ............................................................ 79
Chapter 8: Conclusions and Integration ............................................................................ 84
References ......................................................................................................................... 88
Appendix A: Supplementary Tables and Figures ............................................................. 95
Appendix B: Codes and Scripts ...................................................................................... 104
List of Tables
Table 1.1: Descriptive statistics of the empirical CPS data in both raw values and log
transformed values .............................................................................................................. 5
Table 1.2: The profiles of CPS total score by different missing patterns ........................... 6
Table 3.1: Descriptive Statistics of complete datasets across 100 replications ................ 33
Table 4.1: Descriptive Statistics of MCAR datasets across 100 replications ................... 36
Table 4.2: Compare robustness of Bayesian and ML estimation to varying fractions of
MCAR data ....................................................................................................................... 39
Table 5.1: Descriptive Statistics of MNAR datasets across 100 replications ................... 51
Table 5.2: Compare robustness of Bayesian and ML estimation to varying fractions of
MNAR Data (Case I) ........................................................................................................ 52
Table 5.3: Compare robustness of Bayesian and ML estimation to varying fractions of
MNAR Data (Case II) ....................................................................................................... 54
Table 6.1: Results of fitting Bayesian Model-III to MCAR data ..................................... 72
Table 6.2: Results of fitting Bayesian Model-III to MNAR data ..................................... 73
Table 7.1: Results of fitting different models to the CPS data ......................................... 77
List of Figures
Figure 1.1: A one-group biometric latent growth model for longitudinal twin data in path
analysis form (McArdle, 1986) ........................................................................................... 3
Figure 1.2: A longitudinal trajectory plot of the log transformed CPS total score across
four waves of testing (only 25% randomly selected families used here, one child per
family) ................................................................................................................................. 7
Figure 1.3: A plot of CPS log transformed mean scores with standard error bars across
four waves from the calculated data ................................................................................... 7
Figure 1.4: Panel Histograms of CPS total (raw scores) by wave .................................... 10
Figure 3.1: A one-group biometric latent growth model in path analysis form with true
parameters ......................................................................................................................... 32
Figure 3.2: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for complete data ..................... 33
Figure 4.1: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MCAR data, (a) for MCAR
30%, (b) for MCAR 50% .................................................................................................. 37
Figure 5.1: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MNAR data (Case I), (a) for
MNAR 30%, (b) for MNAR 50%..................................................................................... 49
Figure 5.2: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MNAR data (Case II), (a) for
MNAR 30%, (b) for MNAR 50%..................................................................................... 50
Figure 6.1: Distribution of sample credible intervals for mean of b1 and b2 by fitting the
Bayesian logistic model-III to MCAR data ...................................................... 67
Figure 6.2: Distribution of sample credible intervals for mean of b2 by fitting the
Bayesian logistic model-III to MNAR data (Case II) ....................................................... 71
Abstract
The overarching aims of this research were to find a way to evaluate assumptions
about patterns of missing data and minimize adverse effects on statistical inferences and
conclusions if data are missing not at random. The Bayesian approach, implemented
through the MCMC algorithm, provided a possible bridge to these aims and was
investigated in multiple phases under a longitudinal twin design in this research. In simulation
studies I and II, the performance of the Bayesian approach was compared to direct
Maximum likelihood (ML) estimation when data were missing completely at random
(MCAR) or not at random (MNAR), respectively. Robustness of different approaches to
varying fractions of missing information was also examined. The Bayesian approach,
despite being less robust than ML in estimating unique environmental components and
residual error, was capable of recovering the true models from MNAR data when an
appropriate logistic regression was formulated for the missingness pattern. In
simulation study III, the sensitivity of the Bayesian approach itself to misspecifications of
the missingness pattern received further investigation. The results suggested that, as long
as the logistic regression equation in the Bayesian model was a saturated one taking into
account as many predictors as possible, the Bayesian approach could correctly detect
misspecifications with an acceptable type-I error. Finally, the Bayesian model was
applied to an empirical study of psychopathic traits to demonstrate the influence of
MNAR data and how its conclusions could differ from those of regular ML estimation.
The implications of these different conclusions regarding the theory of psychopathy itself
were also discussed.
Chapter 1: An Empirical Study of Psychopathic Traits
Missing data present a major challenge to almost any substantive research, since
they can bias parameter estimates and significance tests and may distort conclusions.
This problem is more prominent in longitudinal studies than in other types of studies,
because subjects are more likely to drop out due to moving, illness, death, and so on.
Drop-out in longitudinal studies is also referred to as ‘attrition’. In this chapter, an
empirical study of psychopathic traits measured by Child Psychopathy Scale (CPS;
Lynam, 1997) was analyzed to demonstrate how attrition could be more challenging for
longitudinal studies on clinically relevant outcomes (e.g., depression, psychopathy)
where subjects at higher extremes are usually less likely to come back, as compared to
studies on less clinically relevant outcomes.
1.1 Introduction
Psychopathy in adults has been characterized as disturbances in interpersonal and
affective functioning with impulsive behavioral and antisocial tendencies (R. D. Hare,
2002; R. D. Hare, 2003). Even though the prevalence of psychopathy among adults is
estimated to be less than 1%, these affected individuals are believed to account for up to
50% of all serious crimes and their recidivism rates are higher than for other offenders (R.
D. Hare, 2003; Neumann & Hare, 2008). Genetic variance has been shown to influence
individual differences in psychopathic personality in both adolescents and young adults
(Blonigen et al., 2006; Forsman et al., 2008). However, studies investigating the roles of
genetic and environmental factors in change of psychopathic personality across
development are still fairly limited. Therefore, the main goals of this study were to: (1)
investigate the developmental trajectory of psychopathic traits across the whole stage of
adolescence; and (2) identify the respective contributions of additive genetic, shared
environmental and unique environmental effects in this developmental trajectory by use
of a twin design.
To achieve these aims, the biometric latent growth model (see Figure 1.1) was
employed. This model is widely used to describe the dynamic genetic and
environmental changes of a repeated measure over time. It is a hierarchical model
with two levels. At its bottom level, there is a latent growth model, a statistical technique
used to represent the growth trajectory of repeated measures data over time (Meredith &
Tisak, 1990). Two latent factors are important in this trajectory: the intercept (referred as
G0) to describe a baseline level and the slope (referred as G1) to describe a change rate
across time. At the higher level, the variance-covariance matrix between G0 and G1 is
decomposed into three sources: the genetic part (additive A or dominant D), shared
environmental part (C), and the unique environmental part (E) within a twin design. A
more detailed description about this model can be found in McArdle & Hamagami (2003)
and McArdle (1986).
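In equation form, the two levels of the model can be sketched as follows (a simplified sketch using the notation of Figure 1.1, written for a single individual and omitting the twin-pair covariance structure that distinguishes MZ from DZ pairs):

```latex
% Level 1: latent growth curve for the repeated measure Y at occasion t
Y[t] = G_0 + A[t]\,G_1 + \epsilon[t], \qquad \epsilon[t] \sim N(0, \sigma_u^2)

% Level 2: biometric (ACE) decomposition of intercept and slope
\operatorname{Var}(G_0) = \sigma_{a0}^2 + \sigma_{c0}^2 + \sigma_{e0}^2, \qquad
\operatorname{Var}(G_1) = \sigma_{a1}^2 + \sigma_{c1}^2 + \sigma_{e1}^2, \qquad
\operatorname{Cov}(G_0, G_1) = \sigma_{a01} + \sigma_{c01} + \sigma_{e01}
```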
1.2 Method
Participants. Participants comprised a subgroup from the ongoing University of
Southern California Twin Study of Risk Factors for Antisocial Behavior (USC-RFAB,
See Baker et al., 2006 for a detailed description). This is a longitudinal study that began
in 2000; its fourth wave of data collection was nearing completion by the time this
dissertation research was proposed. Twins were mainly recruited through public
schools in the Greater Los Angeles area around 2000-2001 and the ethnic distribution
was comparable to that of the Southern California urban community. The current study
included 1,214 children (605 sets of twins and triplets, 46% MZ and 54% DZ) from this
study. About 49% of them were males and the other 51% were females. The mean ages
for waves 1, 2, 3 and 4 were 9.60, 11.79, 14.87 and 17.24.
Figure 1.1: A one-group biometric latent growth model for longitudinal twin data in path
analysis form (McArdle, 1986)
Measures. Subjects’ psychopathic traits were assessed by Child Psychopathy
Scale (CPS; Lynam, 1997). This measure, based on Hare’s PCL-R (1991), has been
shown to have strong psychometric properties. The full scale contains 58 age-appropriate
questions rated on a two-point scale (Yes = 1 or No = 0) and taps into 14 of 20 PCL-R
criteria: Glibness, Grandiosity, Boredom Susceptibility, Untruthfulness, Manipulation,
Lack of Guilt, Poverty of Affect, Callousness, Impulsiveness, Parasitic Lifestyle,
Behavioral Dyscontrol, Lack of Planning, Unreliability, and Failure to Accept
Responsibility. A higher-order factor analysis with oblique rotation on these 14 subscales
revealed two broad factors that are not structured according to Hare’s original
conceptualization (Bezdjian et al., 2011). The first factor tapped into the affective and
impulsive behavioral traits in Hare’s structure (thus referred to as Callous/Disinhibited),
while the second factor focused more on the interpersonal aspect of psychopathy
involving deceit and manipulation of others (thus referred to as Manipulative/Deceitful).
Both a caregiver-report version and a youth self-report version were available
for the CPS. In the current study, only the youth self-report data were included. The CPS
total score was subject to analysis. It was calculated as the sum of all 58 items and its
internal consistency was good (Cronbach’s α = .84). The six-month test–retest correlation
was high (r =0.87). There were initially 1,214 children participating at wave 1 (M = 13.18,
SD = 5.91). Among these 1214 subjects, 803 responded at wave 2 (response rate = 66.1%,
M = 12.82, SD = 7.00), 736 responded at wave 3 (response rate = 60.6%, M = 15.79, SD
= 7.35), and 519 responded at wave 4 (42.8%, M = 14.43, SD = 6.56).
Male subjects exhibited significantly higher levels of psychopathic traits than
female subjects at wave 1 (t(1208) = 5.28, p < .05), wave 2 (t(783) = 2.51, p < .05) and
wave 3 (t(730) = 1.98, p < .05), but did not differ significantly from female subjects at
wave 4 (t(509) = 1.87, p = .06).
Because the CPS was positively skewed, the data were first log
transformed. The mean values and standard deviations for the
transformed data were as follows: M = 11.14, SD = 1.87 for wave 1, M = 10.85, SD =
2.27 for wave 2, M = 11.79, SD = 2.11 for wave 3 and M = 11.45, SD = 2.03 for wave 4.
These descriptive statistics are also summarized in Table 1.1. Figure 1.2 presents a
longitudinal trajectory plot of the log transformed CPS total score across four waves of
testing. This plot was based on a subsample consisting of 25% of the families randomly
selected, showing only one child in each family.
Table 1.1: Descriptive statistics of the empirical CPS data in both raw values and log
transformed values

All Families        Raw Values                        Log Transformed Values
                    T1      T2      T3      T4        T1      T2      T3      T4
N                   1214    803     736     519
Response Rate       --      66.1%   60.6%   42.8%
Mean (SD)           13.18   12.82   15.79   14.43     11.14   10.85   11.79   11.45
                    (5.91)  (7.00)  (7.35)  (6.56)    (1.87)  (2.27)  (2.11)  (2.03)
Skewness            .94     1.10    .47     .53       -.46    -.44    -.55    -.59
Kurtosis            1.41    1.42    -.28    .01       .81     .69     -.16    .21

Lab-visit-only      Raw Values                        Log Transformed Values
Families            T1      T2      T3      T4        T1      T2      T3      T4
N                   152     150     143     145
Mean (SD)           12.68   11.38   15.17   14.18     10.98   10.50   11.56   11.30
                    (5.53)  (5.49)  (7.56)  (7.02)    (1.91)  (1.95)  (2.23)  (2.21)
Analysis. Due to the existence of large fractions of missing data in waves 3 and 4, a
series of logistic regressions was conducted first to examine whether the missingness
patterns in the later three waves were associated with levels of psychopathic traits at previous
waves. Given the dependent nature of twin data, PROC GLIMMIX (SAS, 2004) was
employed to implement these logistic regressions. Then a full ACE biometric latent
growth model (see Figure 1.1) that freely estimated the latent basis was fitted to the data
using Mplus (L. K. Muthén & Muthén, 2000a).
Table 1.2: The profiles of CPS total score by different missing patterns

          Raw Values                          Log Transformed Values
          T1      T2      T3      T4          T1      T2      T3      T4
N=315     12.28   12.08   14.90   14.03       10.83   10.61   11.52   11.30
          (5.78)  (6.60)  (7.25)  (6.74)      (1.93)  (2.28)  (2.15)  (2.12)
N=261     13.34   13.31   16.69   --          11.21   11.02   12.06   --
          (5.71)  (7.06)  (7.49)              (1.83)  (2.21)  (2.00)
N=137     14.70   13.46   --      --          11.62   11.01   --      --
          (6.53)  (7.66)                      (1.69)  (2.33)
N=226     13.36   --      --      --          11.24   --      --      --
          (5.53)                              (1.74)
N=72      13.33   13.50   --      14.54       11.20   11.17   --      11.64
          (5.78)  (6.49)          (5.51)      (1.83)  (2.04)          (1.57)
N=77      12.70   --      16.64   15.63       10.96   --      12.00   11.79
          (5.93)          (7.15)  (6.91)      (1.93)          (2.21)  (2.03)
N=47      13.53   --      --      15.38       11.02   --      --      11.78
          (7.11)                  (6.06)      (2.51)                  (1.94)
N=79      13.24   --      15.71   --          11.17   --      11.77   --
          (5.83)          (7.26)              (1.83)          (2.10)
Figure 1.2: A longitudinal trajectory plot of the log transformed CPS total score across
four waves of testing (only 25% randomly selected families used here, one child per
family)
Figure 1.3: A plot of CPS log transformed mean scores with standard error bars across
four waves from the calculated data
1.3 Results
Logistic Regressions
The logistic regression on wave 2 showed that the attrition probability was
not significantly associated with levels of psychopathic traits at wave 1 (b = 0.01, SE =
0.05, t (606) = 0.16, p = 0.87). The attrition probability at wave 3 was found to be
marginally associated with psychopathic traits at wave 1 (b = 0.09, SE = 0.05, t (606) =
1.85, p = 0.07) and not significantly associated with levels of psychopathic traits at wave 2
(b = 0.03, SE = 0.05, t (398) = 0.59, p = 0.55). In contrast, although no significant
associations were found between the wave 4 attrition and psychopathic traits at wave 2 (b
= 0.06, SE = 0.05, t (398) = 1.27, p = 0.20) or wave 3 (b = 0.09, SE = 0.05, t (355) = 1.59,
p = 0.11), the probability that a participant dropped out in wave 4 was significantly
associated with their psychopathic traits levels at the first wave (b = 0.10, SE = 0.05, t
(606) = 2.15, p < .05). The positive association implies that subjects with higher levels of
psychopathic traits at the initial wave were more likely to drop out in wave 4.
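To put the significant wave-4 coefficient in context, a logistic regression slope can be converted to an odds ratio by exponentiation (a standard transformation for coefficients on the logit scale; the sketch below simply applies it to the b = 0.10 estimate reported above):

```python
import math

b = 0.10                      # wave-4 attrition ~ wave-1 CPS total, from the text
odds_ratio = math.exp(b)      # multiplicative change in odds per 1-point CPS increase
print(round(odds_ratio, 2))   # 1.11
```

That is, each additional point on the wave-1 CPS total was associated with roughly 11% higher odds of dropping out at wave 4.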
Biometric Latent Growth Model
The full ACE biometric latent growth model was fitted to investigate the
underlying genetic and environmental contributions to individual differences in both level
(intercept) and change (slope) in psychopathic personality. Latent bases A[1] and A[4]
were constrained to be 0 and 3 while A[2] and A[3] were freely estimated, thus
yielding a non-linear model. The fixed effects of this model were: G0 = 11.11, G1 = 0.20 and
A[t] = (0, -1.18, 3.49, 3). Random effects from this model can be used to derive the
relative effects of genetic and environmental influences on both level and change in
psychopathic personality. The total variance of G0 can be decomposed into an additive
genetic component (σa0²), a shared environmental component (σc0²) and a non-shared
environmental component (σe0²): 0.94 + 0.53 + 0.22 = 1.69. Therefore, the relative genetic influence due to
additive genetic effects was 56% (=0.94/1.69). The majority of the variance in level of
psychopathic personality was explained by genetic effects. Shared environment explained
31% (= 0.53/1.69) of the variance while non-shared environment accounted for the
remaining 13% (= 0.22/1.69) of the variance. Similarly, the relative effects of genetic and
environmental influences can be calculated for the variation in the slope G1 (total
variance = 0.04 + 0.01 + 0 = 0.05), such that genetic and shared environmental effects
accounted for 80% and 20% of the slope variance, respectively. Non-shared
environmental influences on slope were almost negligible. The overall covariance
between the intercept G0 and slope G1 was negative: 0.02 - 0.08 + 0.01 = -0.05, which
means that the higher the level at the initial wave, the more slowly subjects’ levels of
psychopathic traits would increase during the following three waves. Finally, the residual variance
unique to each wave (σu²) was 2.35.
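As a quick arithmetic check, the relative variance shares reported in this section can be recomputed directly from the estimated components:

```python
# ACE variance components for the intercept (G0), taken from the text
var_a0, var_c0, var_e0 = 0.94, 0.53, 0.22
total_g0 = var_a0 + var_c0 + var_e0                  # = 1.69
shares_g0 = [round(v / total_g0, 2) for v in (var_a0, var_c0, var_e0)]
print(shares_g0)   # [0.56, 0.31, 0.13] -> 56% A, 31% C, 13% E

# ACE variance components for the slope (G1)
var_a1, var_c1, var_e1 = 0.04, 0.01, 0.0
total_g1 = var_a1 + var_c1 + var_e1                  # = 0.05
shares_g1 = [round(v / total_g1, 2) for v in (var_a1, var_c1, var_e1)]
print(shares_g1)   # [0.8, 0.2, 0.0] -> 80% A, 20% C
```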
1.4 Summary and Discussion
Based on the results from the above biometric latent growth model, a non-linear
trajectory with positive slope was identified. Generally speaking, subjects’ level of
psychopathic traits decreased at wave 2, then increased at wave 3 and finally decreased
again at wave 4 (i.e., a cubic change pattern, see Figure 1.3 for a plot of mean values with
standard errors). The nadir of the trajectory occurred at wave 2. There was a negative but
not strong association between level and change rate. Genetic factors explained the
majority of the variance in both level and change rate in psychopathic personality.
The cubic change pattern was an unusual one and several factors were suspected
of causing this pattern. The first factor might be the administration differences. In wave 2,
around one half of the subjects were tested in labs and the other half received mail
surveys instead. To rule out this confounder, the mean CPS total scores of the lab-visit-only
families across four waves were examined (see Table 1.1). In this subgroup, the
cubic change pattern was replicated.
Figure 1.4: Panel Histograms of CPS total (raw scores) by wave
A second factor that might threaten the validity of this change pattern was
missing data. To take a closer look at the trajectory of psychopathic traits, the profiles of
CPS scores were examined under different missing patterns (see Table 1.2). It could be
seen that subjects with complete data on all four waves (n=315) again exhibited a cubic
change pattern. In addition, they consistently demonstrated the lowest level of
psychopathic traits at each wave, when compared to subjects of the other missing patterns.
Subjects with the 4th wave missing tended to be the group with the highest level of CPS scores at
either wave 1 (n=137) or wave 3 (n=261). Consistent with these findings, a significantly
positive association was found between attrition in wave 4 and initial levels of
psychopathic traits in the analyses of logistic regressions. This relationship meant
subjects missing at wave 4 were more likely to be those with higher levels of
psychopathic traits at the initial wave.
It is thus reasonable to speculate that this trend might carry over to wave 4. These
missing subjects, though they could not be measured and tested, might also continue to have
higher levels of psychopathic traits than subjects who remained in the 4th wave. In other
words, the probability that subjects were missing was actually dependent on missing
values of the variable of interest themselves (i.e., the missingness pattern is non-random,
or non-ignorable). If this assumption were true, it could be expected that mean levels of
psychopathic traits in wave 4 would be spuriously lowered as a result of this missingness
pattern. Another by-product of this phenomenon was shrinkage of variance, which
adversely affected the estimates of slope variance in the biometric latent growth
model.
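The speculated mechanism, and the resulting downward bias in the observed mean along with the shrinkage of variance, can be illustrated with a small simulation (all numbers here are illustrative: the wave-4 mean and SD are borrowed from the transformed CPS descriptives, and the dropout coefficients are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
y4 = rng.normal(loc=11.45, scale=2.03, size=n)   # hypothetical 'true' wave-4 scores

# MNAR mechanism: dropout probability rises with the unobserved score itself
logit = -2.0 + 0.2 * (y4 - y4.mean())            # hypothetical coefficients
p_drop = 1.0 / (1.0 + np.exp(-logit))
observed = y4[rng.random(n) >= p_drop]           # scores that survive attrition

print(round(float(y4.mean()), 2))        # true mean, close to 11.45
print(round(float(observed.mean()), 2))  # observed mean is biased downward
print(observed.std() < y4.std())         # the observed variance also shrinks
```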
Another interesting aspect of the data worth noticing was the instability of
skewness and kurtosis across the four waves. Both measures were much larger in the first
two waves than in the last two. One reason that skewness and kurtosis decreased in
waves 3 and 4 was the smaller maximum values and lesser accumulation in the upper
tail. For example, the maximum value was 42
in wave 1, but 38 in wave 3 and only 35 in wave 4 (see Figure 1.4 for details). This may
imply that some subjects’ levels of psychopathic traits decreased as adolescents became
more physically and socially mature with age. Nonetheless, this could also be a sign that
subjects remaining in later waves tended to center on moderate levels or middle of the
distribution, while those at higher extremes (on the right tail of the distribution) were
reluctant to come back and thus could not be measured.
Under this situation, two questions arise and deserve deliberation: how can the
assumption that attrition probability depends on missing values themselves be evaluated
if these values are not available at all? If this assumption is true, how can one minimize
the adverse effects on statistical inferences and conclusions brought about by this
non-random missingness pattern? This research aims to provide some insights and solutions to
these two questions. The next chapter begins this research by reviewing some existing
statistical theories and methods related to these issues.
Chapter 2: Introduction to the Bayesian Approach and
Missing Data Techniques
In the field of structural equation modeling (SEM), the most common method to
estimate parameters has long been maximum likelihood (ML). Under ML estimation, the
likelihood of the observed data with respect to one or more parameters is formulated and
the values maximizing this likelihood function constitute the desired estimates.
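As a toy illustration of this principle (a hypothetical univariate normal sample, not the SEM likelihood used in this research), the mean and standard deviation can be recovered by numerically maximizing the log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.normal(loc=11.0, scale=2.0, size=5_000)   # hypothetical scores

def neg_log_lik(theta):
    mu, log_sd = theta                    # optimize log(SD) to keep SD positive
    sd = np.exp(log_sd)
    ll = -0.5 * np.log(2 * np.pi * sd**2) - (data - mu) ** 2 / (2 * sd**2)
    return -np.sum(ll)                    # minimize the negative log-likelihood

fit = minimize(neg_log_lik, x0=[10.0, 1.0])
mu_hat, sd_hat = fit.x[0], np.exp(fit.x[1])
print(round(mu_hat, 1), round(sd_hat, 1))   # close to 11.0 and 2.0
```

For the normal distribution this recovers the familiar closed-form estimates (the sample mean and SD); the numerical route is shown because SEM likelihoods have no such closed form.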
In recent years, however, the Bayesian approach, as another alternative estimation
method for SEM, has received a great deal of attention and has begun to show
promise. Implemented with the Markov Chain Monte Carlo (MCMC)
algorithm, this approach has demonstrated better performance than conventional
methods in models such as multilevel modeling (Browne & Draper, 2006), longitudinal
modeling (Wang & McArdle, 2008), and heterogeneous factor analysis (Ansari, Jedidi, &
Dube, 2002).
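A minimal sketch of the core MCMC idea is a random-walk Metropolis sampler, here drawing from the posterior of a normal mean with known variance and a flat prior (purely illustrative; not a model from this research):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=5.0, scale=1.0, size=200)
sigma = 1.0                                       # known residual SD

def log_post(mu):
    # flat prior => posterior is proportional to the likelihood
    return -np.sum((data - mu) ** 2) / (2 * sigma**2)

mu, chain = 0.0, []
for _ in range(20_000):
    prop = mu + rng.normal(scale=0.3)             # propose a nearby value
    if np.log(rng.random()) < log_post(prop) - log_post(mu):
        mu = prop                                 # accept; otherwise keep current
    chain.append(mu)

posterior = np.array(chain[5_000:])               # discard burn-in
print(round(float(posterior.mean()), 1))          # close to the sample mean, ~5.0
```

The retained draws approximate the posterior distribution, so summaries such as posterior means and credible intervals come directly from the chain rather than from any high-dimensional integration.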
Unlike regular SEMs, the twin design in behavior genetics enables further
decomposition of the variance components (random effects) of models into genetic effects,
shared environmental effects, non-shared environmental effects and residual error, and
is thus more capable of tracking the sources of individual differences. The field of behavior
genetics also witnessed an increasing interest in MCMC framework of the Bayesian
methodology during the past decade. The interest of behavior genetist in the Bayesian
approach largely originated from its wide application in arena of quantitative genetics,
especially the analysis of linkage and pedigree data (Thomas & Gauderman, 1996),
which usually involves integration over high-dimensional likelihood and may encounter
14
numerical difficulties. The Bayesian approach was suggested to be especially better and
more flexible than ML for hierarchically structured data with many latent variables in the
sense that the integration over high-dimensional likelihood function tends to be laborious
or even infeasible for ML, while the Bayesian approach with MCMC algorithm as its
cornerstone are free of these limitations (Eaves & Erkanli, 2003; Van den Berg, Beem, &
Boomsma, 2006). Inspired by this, a bunch of researchers have recently been devoted to
introducing it into the modeling of behavior genetics. For example, it has been
successfully applied in models with Gene-Environmental correlation and interaction
(Eaves & Erkanli, 2003), models incorporating twin model and item response theory
simultaneously (Eaves et al., 2005; Van den Berg, Glas, & Boomsma, 2007), and
nonlinear genetic model estimating genetic and cultural transmission under assortative
mating (Van den Berg, 2009).
Despite successful implementation in the twin models mentioned above, there is
still a large gap to fill regarding the use of the Bayesian approach in behavior genetics.
To date, few studies have examined the performance of the Bayesian approach when data
are missing. Therefore, the main purpose of the current research is to investigate the
influence of missing data under the design of longitudinal twin studies.
With its ability to perform data augmentation, and its computational flexibility in
handling high-dimensional probability distributions, the Bayesian approach to SEM is
expected to provide some insight into this problem in longitudinal twin studies. This
research will consist of a simulation study and an application to real data
(psychopathic traits).
2.1 The Bayesian approach and MCMC algorithm
To illustrate how the Bayesian approach works, it is important to understand the
major difference between the Frequentist approach and the Bayesian approach. From the
Frequentist perspective, the model parameters are thought of as unknown but fixed
values, while in the Bayesian approach, the model parameters are considered random
quantities with a probability distribution (Casella & Berger, 2002). The choice of
this distribution largely depends on the researcher’s prior knowledge (hence the term
‘prior distribution’). It can be either an informative one, expressing specific and definite
information about the parameters, or a noninformative one, with very vague, flat, and
diffuse information about the parameters (Gelman et al., 2004).
Let p(θ) denote the prior distribution of some model parameter θ and p(Y | θ) denote
the distribution of the observed data. Then, by Bayes’ formula, the conditional
distribution of the parameter θ given the observed data Y is calculated in the following
way:

p(θ | Y) = p(Y | θ) p(θ) / p(Y).        (1)

The denominator of equation (1) is actually the marginal distribution of Y and does
not depend on θ. Therefore, equation (1) can be simplified as follows:

p(θ | Y) ∝ p(Y | θ) p(θ),        (2)

where p(θ | Y) is termed the posterior distribution. This distribution is now used
to make inferences about θ, which is still considered a random quantity. The mean, for
example, can be used as a point estimate of θ. In most cases, however, calculating the
mean analytically through numerical integration is infeasible when the data are
high-dimensional or p(Y | θ) is complicated. Therefore, MCMC is resorted
to as a solution for approximating the posterior distribution.
In general, MCMC methods refer to a class of algorithms to approximate a
desired distribution by repeatedly drawing random samples from some probability
distribution based on constructing a Markov Chain. The desired distribution is achieved
when the Markov chain reaches its stationary state after a long enough period of sampling.
The period before this stationary state is reached is termed the “burn-in” period. The
sampled values during the burn-in period are usually discarded to reduce potential bias
introduced by the starting values. It is thus very important to assess the convergence of
the Markov chain. In some simple situations, trace plots of the sampled values
versus the iteration index are enough to assure convergence. In practice,
however, special diagnostic tests are needed for most situations.
The current research uses the Gelman-Rubin diagnostic (A. Gelman & Rubin,
1992). To conduct this test, at least two parallel chains are needed. From the adjusted
ratio of the variance between (or among) chains to the variance within each single chain,
a statistic called the potential scale reduction factor (PSRF) is obtained. When there are
multiple parameters to be estimated, a multivariate PSRF (MPSRF) is also available. A
PSRF value close to 1 means the parallel chains have converged, although a PSRF value
below 1.2 is also acceptable in most situations. Sometimes researchers are reluctant to
sample multiple chains and hope that a convergence test can be administered to a single
chain. The Geweke test (Geweke, 1991) was developed for this purpose. This test
assesses convergence by checking whether the sample means from the early and later
parts of the Markov chain differ significantly. The fractions for the early and later parts
are usually set to 0.1 and 0.5, respectively. Some other convergence diagnostic tests
available for MCMC algorithms include the Heidelberger-Welch test (Heidelberger &
Welch, 1981, 1983), the Raftery-Lewis test (Raftery & Lewis, 1992, 1995), the
effective sample size (Kass, Carlin, Gelman, & Neal, 1998), and the autocorrelation.
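As an illustration, the basic univariate PSRF computation can be sketched as follows. The two simulated chains and all numerical settings are hypothetical; a real analysis would use a package such as the R coda package mentioned later in this document.

```python
import numpy as np

def psrf(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: array of shape (m, n) -- m parallel chains, each of length n.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    W = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    B = n * chains.mean(axis=1).var(ddof=1)     # between-chain variance
    V_hat = (n - 1) / n * W + B / n             # pooled variance estimate
    return np.sqrt(V_hat / W)

rng = np.random.default_rng(1)
# Two chains sampling the same distribution: PSRF should be close to 1
mixed = rng.normal(0.0, 1.0, size=(2, 5000))
print(round(psrf(mixed), 2))
```

A pair of chains stuck in different regions (e.g., one centered at 0 and one at 5) would instead yield a PSRF far above the 1.2 cutoff.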
Many algorithms are available to construct Markov chains, including the
Metropolis-Hastings algorithm (Hastings, 1970; Metropolis, Rosenbluth, Rosenbluth,
Teller, & Teller, 1953) and Gibbs sampling (Geman & Geman, 1993). MCMC is
especially good at calculating integrals over high-dimensional probability distributions
(Gilks, Richardson, & Spiegelhalter, 1996), which makes it a potentially strong
alternative to ML in the framework of SEM (Cai, 2010; Edwards, 2010; B. Muthén &
Asparouhov, 2010).
Gibbs sampling and BUGS. Gibbs sampling (Geman & Geman, 1993) is
one of the most commonly used MCMC algorithms. The basic idea of this algorithm is to
approximate some high-dimensional joint distribution by sampling from the conditional
distribution of each variable given the other variables in the model. To illustrate how the
Gibbs algorithm works, for simplicity, suppose we only have two model parameters to
estimate: α and β. What we really want is the joint posterior distribution of α and β
given observed data Y, i.e., p(α, β | Y). Gibbs sampling takes the following
general steps:

1. Set (α^(1), β^(1)) as the initial values for (α, β);
2. For iterations i = 2, 3, …, n:
(1) sample α^(i) from p(α | β^(i−1), Y);
(2) sample β^(i) from p(β | α^(i), Y).

The process of sampling the conditional distribution of each parameter given the
other parameter and the observed data is a Markov chain. After a certain number of
iterations (say m, with m < n), Gibbs sampling reaches the stationary distribution of the
Markov chain, which is the desired joint posterior distribution of α and β. These first m
iterations are the ‘burn-in’ iterations and are discarded when estimating the parameters.
Inferences about the parameters α and β are usually based on averaging the sampled
values after the m-th iteration; this averaging is the Monte Carlo part. The iterations
will keep running until the criteria of the convergence diagnostic tests are met.
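To make the two-parameter scheme concrete, the following toy sketch runs a Gibbs sampler whose full conditionals are known in closed form: a bivariate normal target with correlation 0.8. The correlation, chain length, and burn-in values here are illustrative, not taken from the dissertation's models.

```python
import numpy as np

rng = np.random.default_rng(42)
rho = 0.8                        # correlation of the target joint density
n_iter, burn_in = 20000, 2000

alpha, beta = 0.0, 0.0           # step 1: initial values for (alpha, beta)
draws = np.empty((n_iter, 2))
for i in range(n_iter):          # step 2: alternate the full conditionals
    # sample alpha from p(alpha | beta, Y)
    alpha = rng.normal(rho * beta, np.sqrt(1 - rho ** 2))
    # sample beta from p(beta | alpha, Y)
    beta = rng.normal(rho * alpha, np.sqrt(1 - rho ** 2))
    draws[i] = (alpha, beta)

kept = draws[burn_in:]           # discard the burn-in iterations
print(np.round(kept.mean(axis=0), 2), round(np.corrcoef(kept.T)[0, 1], 2))
```

After burn-in, the retained draws recover the target's means and correlation, which is exactly the Monte Carlo averaging described above.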
Based on this algorithm, a software package termed BUGS (Bayesian inference
Using Gibbs Sampling) was developed in 1989 for Bayesian analysis of complex
statistical models. By now, BUGS is very stable and has seen a huge number of
applications and extensions (D. J. Lunn, Thomas, Best, & Spiegelhalter, 2000). Two main
versions are available, namely WinBUGS and OpenBUGS. The latter is an open-source
variant of the former and can be run directly within both R and SAS. Both WinBUGS
and OpenBUGS can be downloaded freely from the internet
(http://www.mrc-bsu.cam.ac.uk/bugs/, http://www.OpenBUGS.info/w/).
2.2 Existing Missing Data Techniques
Research on missing data has a long history and many techniques have been
developed for handling it. The next section is devoted to reviewing the existing
techniques for handling missing data. Before introducing these methods, however, it is
important to distinguish some basic assumptions about how the data are missing, because
it determines whether it is appropriate to apply one method or another in specific
situations (Rubin, 1976).
Let us define the complete data Y = (y_ij) and the missing-data indicator matrix
M = (m_ij). Let Y_obs denote the observed components of Y and Y_mis the missing
components, i.e., Y = (Y_obs, Y_mis). The missing-data mechanism can be formulated
by the conditional distribution of M given Y, say f(M | Y, φ), where φ denotes unknown
parameters. If the missingness does not depend on the data Y, missing or observed, that
is, if

f(M | Y, φ) = f(M | φ) for all Y, φ,

the data are called missing completely at random (MCAR). A less restrictive assumption
than MCAR is missing at random (MAR), when the missingness depends on Y_obs but
not on Y_mis, that is,

f(M | Y, φ) = f(M | Y_obs, φ) for all Y_mis, φ.

Finally, the mechanism is called not missing at random (NMAR) if the distribution of M
depends on the missing values Y_mis. Another way to describe these missingness
mechanisms is ignorability versus nonignorability. Missing data are said to be ignorable
if the data are MCAR or MAR; otherwise, missingness is nonignorable if the data are
NMAR. In practice, the assumption of MCAR or MAR can never be directly assessed,
since Y_mis is missing.
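The three mechanisms can be made concrete with a small simulation; the variables, cutoffs, and logistic dropout rules below are purely illustrative and are not part of the dissertation's design:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10000
y1 = rng.normal(10.0, 5.0, n)              # always observed
y2 = y1 + rng.normal(3.0, 3.0, n)          # subject to attrition, mean 13

# MCAR: missingness ignores the data entirely
m_mcar = rng.random(n) < 0.3
# MAR: missingness depends only on the observed y1
m_mar = rng.random(n) < 1.0 / (1.0 + np.exp(-(y1 - 10.0) / 2.0))
# NMAR: missingness depends on the possibly unobserved y2 itself
m_nmar = rng.random(n) < 1.0 / (1.0 + np.exp(-(y2 - 13.0) / 2.0))

for name, m in [("MCAR", m_mcar), ("MAR", m_mar), ("NMAR", m_nmar)]:
    # the mean of y2 among the cases that remain observed
    print(name, round(y2[~m].mean(), 2))
```

Under MCAR the observed-case mean of y2 stays near its true value of 13, while the MAR and NMAR rules, which delete high scorers, pull the observed mean downward, mirroring the attrition pattern described in Chapter 1.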
Conventional Methods
Listwise Deletion. Listwise deletion (LD), also known as complete-case analysis,
conducts statistical analyses including only those cases with complete data on all the
variables. This method is very easy to implement and is the default option in almost all
statistical packages (Allison, 2003).
Under the assumption of MCAR, it can achieve both unbiased parameter estimates
and unbiased standard errors, since the complete cases are just a random sample of all the
cases. However, the potential loss of information (loss of precision and bias) may be
large if data are not MCAR. Wothke (2000) has demonstrated that LD yields biased
parameter estimates even when data are MAR. For example, boys tend to show more
aggressive behavior than girls. If boys are more likely to be missing than girls, then the
mean of aggressive behavior is very likely to be biased downward under LD. In addition,
LD will delete a large fraction of the data if the number of variables analyzed is large.
This is particularly problematic in longitudinal studies with multiple waves.
In practice, a common strategy to adjust for the bias introduced by LD is to weight
the cases by their probability of being sampled, but the calculation of standard errors is
then more complex. Most statistical software packages ignore the sampling uncertainty in
the incomplete cases, and the asymptotic standard errors of the estimators can thus be
problematic.
Pairwise Deletion. As mentioned above, the loss of data can be very large if data
include many variables. An alternative to LD was thus developed, namely pairwise
deletion (PD), also known as available-case analysis. Under PD, when calculating the
mean and variance of each variable, all cases with data on that variable are used.
Similarly, when it comes to the covariance between two variables, all cases that are
complete on that pair of variables are used. One resulting issue is that the sample base
changes from variable to variable according to the missingness pattern. This creates a
problem when a researcher wishes to compare across variables, since the statistics come
from different sample bases. Another disadvantage associated with PD is that the
covariance matrix may not be positive definite. Many analyses based on the covariance
matrix, for example, multiple regression, thus become problematic. In longitudinal
studies, PD may give an erroneous impression about whether there is change and about
the direction of change.
Although one may assume that PD is more efficient than LD since it makes
more use of the data, this conclusion is not guaranteed. PD was found superior only
when variables correlate modestly under MCAR (Kim & Curry, 1977), but inferior to LD
when correlations are large (Azen & Van Guilder, 1981; Haitovsky, 1968).
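The mechanics of the two deletion schemes, and the shifting sample base under PD, can be sketched as follows; the three-variable dataset and the 35% cell-deletion rate are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
cov = [[1.0, 0.8, 0.8], [0.8, 1.0, 0.8], [0.8, 0.8, 1.0]]
X = rng.multivariate_normal([0.0, 0.0, 0.0], cov, n)
X[rng.random((n, 3)) < 0.35] = np.nan       # knock out ~35% of cells (MCAR)

# Listwise deletion: keep only the fully observed rows
complete = X[~np.isnan(X).any(axis=1)]
print("complete cases:", complete.shape[0], "of", n)

# Pairwise deletion: each covariance uses every case observed on that pair,
# so each entry of the matrix rests on a different sample base
cov_pd = np.empty((3, 3))
for i in range(3):
    for j in range(3):
        ok = ~np.isnan(X[:, i]) & ~np.isnan(X[:, j])
        cov_pd[i, j] = np.cov(X[ok, i], X[ok, j])[0, 1]

# Such a patchwork matrix can fail to be positive definite
print("smallest eigenvalue:", round(np.linalg.eigvalsh(cov_pd).min(), 3))
```

With three variables and 35% of cells missing, only about (0.65)^3 ≈ 27% of rows survive listwise deletion, illustrating why LD is especially costly in multi-wave designs.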
Single Imputation. Another commonly used approach for missing data is to
predict the missing values from the observed data through some method, and then fill
in the missing values with the predicted values. The filled-in (or imputed) data then
replace the missing values in the intended statistical analyses. Methods used to impute
missing cases are diverse. One simple way is to substitute the mean of the respondents
for all the missing cases. This method is not advisable and yields biased estimates of
many parameters (Little & Rubin, 2002). A better approach is conditional mean
imputation, which usually takes the associations among variables into account. A good
example of conditional mean imputation is regression imputation, where missing values
are replaced by predicted values from a regression of the missing variables on the
variables observed for that case. Although regression imputation can yield unbiased
estimates of regression coefficients under MCAR, it also underestimates the variability
of the filled-in data and overestimates correlations (Little & Rubin, 1987).
The development of single imputation was revolutionary and made the standard
complete-data methods of analysis convenient to apply. However, there is a key problem
common to all methods of single imputation: the imputed data are treated as real, and
the uncertainty in the imputation process is ignored. This causes spuriously small
standard errors and narrow confidence intervals. Two approaches were developed as
remedies to this problem: replication methods (e.g., bootstrap sampling) and multiple
imputation. The latter receives a detailed introduction in the following section.
Multiple Imputation. Multiple imputation (MI) refers to “the procedure of
replacing each missing value by a vector of D ≥ 2 imputed values” (Little & Rubin, 2002),
which generates D complete datasets for analysis. The D sets of imputations are usually
repeated draws from a predictive distribution under some particular models for missing
values.
This approach was first proposed by Rubin (1978) and developed further in Rubin
(1987). With advances in computer technology, many statistical software packages (e.g.,
SAS 8.2 or later) provide MI as an option for missing data. As mentioned above, the
single imputation method tends to underestimate the uncertainty in the imputation and
leads to biased SE estimates. MI overcomes this bias by repeating the imputation
multiple times and thus introducing variance between the different sets of imputed
values. Therefore, MI was thought to share the advantages of single imputation and
rectify its disadvantages (Little & Rubin, 2002). With D (≥ 2) imputed datasets, a
specific parameter, say θ, is calculated as the average of its estimates over the D datasets:

θ̄_D = (1/D) Σ_{d=1}^{D} θ̂_d.

Define W_d, d = 1, …, D, as the variance associated with θ̂_d in each imputed dataset.
Then the variability associated with this estimate is composed of two parts: the average
within-imputation variance,

W̄_D = (1/D) Σ_{d=1}^{D} W_d,

and the between-imputation component,

B_D = (1/(D − 1)) Σ_{d=1}^{D} (θ̂_d − θ̄_D)²,

which combine into the total variance T_D = W̄_D + (1 + 1/D) B_D.
In summary, the availability of multiple datasets makes the parameter estimates
more efficient and enables evaluation of the imputation uncertainty.
Despite the above advantages, there are several problems with MI. The first major
problem is that MI cannot give a determinate result, due to the random error introduced
at each imputation. Second, there are many different ways to implement MI (for example,
regression with random errors vs. regression with random draws of coefficients) for a
particular situation. The choice of imputation method is not unique and can be a
daunting task. The uncertainty in MI actually comes from more than one source: there is
uncertainty not only in choosing the correct imputation model but also in choosing the
model parameters even after the model is chosen (Zhang, 2003). Allison (2003) also
emphasized the importance of the choice of imputation model under nonlinearities and
interactions. There are also concerns about the application of MI under nonignorable
missing data, a concern common to all the other missing data techniques.
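The combining rules above translate directly into code. In the sketch below, the five point estimates and within-imputation variances are made-up numbers for illustration only:

```python
import numpy as np

def pool(estimates, variances):
    """Combine D complete-data analyses by the combining rules above.

    estimates: the parameter estimate from each of the D imputed datasets
    variances: the squared standard error of each estimate
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    D = len(estimates)
    theta_bar = estimates.mean()           # pooled point estimate
    W = variances.mean()                   # average within-imputation variance
    B = estimates.var(ddof=1)              # between-imputation component
    T = W + (1.0 + 1.0 / D) * B            # total variance
    return theta_bar, np.sqrt(T)

# D = 5 hypothetical analyses of the same parameter
est, se = pool([2.1, 1.9, 2.3, 2.0, 2.2], [0.04, 0.05, 0.04, 0.06, 0.05])
print(round(est, 2), round(se, 3))
```

Note that the pooled standard error exceeds the average within-imputation standard error, which is precisely the imputation uncertainty that single imputation ignores.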
Maximum Likelihood
ML estimation is the mainstream method used in SEM and also a highly popular
approach to analysis of missing data. In a sense, the SEM can be viewed as a special case
of missing data family since the latent variables in the SEM cannot be observed at all.
The ML method is available in most widely used SEM software packages (e.g., Amos,
Mx, Mplus) as well as other regular statistical software packages (e.g., SPSS missing
values analysis).
Under ML, the likelihood function of the data with respect to the parameters θ
is usually formulated under the assumption of a multivariate normal distribution.
Values that maximize the likelihood function are the desired estimates. If multivariate
normality is satisfied, ML estimates have optimal large-sample properties:
consistency, asymptotic efficiency and asymptotic normality. In addition, under MAR,
likelihood inferences about θ based only on the observed data Y_obs are no different
from inferences based on the full likelihood of the data and the missingness indicator
(Y_obs, M). However, when the data are incomplete, the likelihood function of the
observed data can be very complex and the ML estimate cannot be solved for directly.
Three methods are available to address this problem: factoring the likelihood, the EM
algorithm, and direct ML. Since the first method only applies to a restrictive missingness
pattern (monotone missingness), it will not be discussed further here. More attention will
be placed on the other two methods, which work for more general patterns of missing
data.
Direct ML. This method utilizes all the available information in the data and is
also referred to as ‘raw’ ML (requiring raw data as input) or ‘full information’ ML.
Suppose Y follows a K-variate normal distribution with mean μ and covariance matrix
Σ. When data are complete, the log-likelihood function is easy to write:

ℓ(μ, Σ | Y) = −(1/2) Σ_{i=1}^{n} [ K ln(2π) + ln|Σ| + (y_i − μ)′ Σ⁻¹ (y_i − μ) ],

where θ = (μ, Σ). When some data are missing, the likelihood function is
modified as follows:

ℓ(μ, Σ | Y_obs) = −(1/2) Σ_{i=1}^{n} [ K_i ln(2π) + ln|Σ_i| + (y_obs,i − μ_i)′ Σ_i⁻¹ (y_obs,i − μ_i) ],

where μ_i and Σ_i are the mean vector and covariance matrix of the observed
components of Y in case i, and K_i is the number of variables observed for that case.
When the likelihood is differentiable and unimodal, ML estimates can be calculated by
solving the score equation

∂ℓ(θ | Y_obs) / ∂θ = 0,

where we define ℓ(θ | Y_obs) = ln L(θ | Y_obs). In practice, the estimates are usually
found by iterative methods, for example, the Newton-Raphson algorithm. The standard
errors are acquired by inverting the information matrix I(θ | Y_obs) at each iteration.
Direct ML is a very popular method for handling missing data in SEM and has
demonstrated advantages in many respects. Direct ML was less biased than
conventional methods like LD, PD and mean imputation in confirmatory factor
analysis (CFA) under both MAR and MNAR mechanisms (Arbuckle, 1996; Enders &
Bandalos, 2001; B. Muthén, Kaplan, & Hollis, 1987; Wothke, 2000). It also exhibited
advantages in estimation efficiency (i.e., variability in parameter estimates), although
the gain in efficiency ranges from small to substantial (Arbuckle, 1996; Enders &
Bandalos, 2001; Wothke, 2000).
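The casewise observed-data log-likelihood underlying direct ML can be sketched as follows. This is a bare-bones illustration of the summation over cases, not the optimized routines used by packages such as Mplus or Mx, and the toy data are hypothetical:

```python
import numpy as np

def fiml_loglik(Y, mu, Sigma):
    """Observed-data log-likelihood for normal data with NaNs.

    Each case contributes a normal log-density over its own observed
    components, using the matching sub-vector of mu and sub-matrix of
    Sigma, exactly as in the casewise likelihood above.
    """
    total = 0.0
    for y in Y:
        obs = ~np.isnan(y)
        if not obs.any():
            continue                      # a fully missing case adds nothing
        dev = y[obs] - mu[obs]
        S = Sigma[np.ix_(obs, obs)]
        k = int(obs.sum())
        total += -0.5 * (k * np.log(2 * np.pi)
                         + np.log(np.linalg.det(S))
                         + dev @ np.linalg.solve(S, dev))
    return total

mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
Y = np.array([[0.2, -0.1],            # complete case
              [1.0, np.nan],          # only the first variable observed
              [np.nan, -0.5]])        # only the second variable observed
print(round(fiml_loglik(Y, mu, Sigma), 3))
```

Maximizing this function over mu and Sigma (e.g., by Newton-Raphson) yields the direct ML estimates.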
EM algorithm. As mentioned in the section on direct ML, the standard errors
are acquired by inverting the information matrix I(θ | Y_obs) at each iteration. However,
for data with complex missingness patterns, the entries of this information matrix can
be complicated functions of θ. The matrix is also large when θ is high-dimensional. For
example, for K = 30, the information matrix has K + K(K + 1)/2 = 495 rows and
columns, which means the number of elements is over 200,000. As a result, the inversion
of the information matrix at each iteration can be expensive and requires efficient
programming. In contrast, the Expectation-Maximization (EM) algorithm does not
require calculation or approximation of the information matrix and can be used as an
alternative computing strategy to direct ML for incomplete-data problems.
As its name suggests, the EM algorithm consists of two steps: an expectation step
(E-step) and a maximization step (M-step). The idea underlying this algorithm is
straightforward. In the E-step, missing data are replaced by the conditional expectation
of the “missing data” given the observed data and the current parameter values. The
M-step performs regular ML estimation of θ on the observed data and the filled-in data
from the E-step, as if the data were complete. The two steps are repeated until the
estimated parameter values from the M-step no longer change, which means the
estimates have converged. To implement this algorithm, starting values for the means
and the covariance matrix usually need to be chosen for the E-step to start the iterations.
The conditional expected means and expected covariances of the “missing data” are
obtained by sweeping unobserved variables on observed variables across the different
missingness patterns (Little & Rubin, 2002).
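A minimal EM iteration can be written out for a bivariate normal in which one variable is partly missing and the other is complete; the E-step then reduces to the regression of y2 on y1. The variable names and the 40% MCAR rate below are illustrative, and a general implementation would handle arbitrary missingness patterns via the sweep operator:

```python
import numpy as np

def em_bivariate(y1, y2, n_iter=200):
    """EM for a bivariate normal mean and covariance when y2 is partly
    missing and y1 is complete. The E-step uses the regression of y2 on
    y1; the M-step is complete-data ML on the expected statistics."""
    miss = np.isnan(y2)
    # starting values from the complete cases
    mu = np.array([y1.mean(), y2[~miss].mean()])
    S = np.cov(y1[~miss], y2[~miss])
    for _ in range(n_iter):
        # E-step: conditional expectations of the missing y2 and y2^2
        beta = S[0, 1] / S[0, 0]
        resid_var = S[1, 1] - S[0, 1] ** 2 / S[0, 0]
        ey2 = np.where(miss, mu[1] + beta * (y1 - mu[0]), y2)
        ey2sq = np.where(miss, ey2 ** 2 + resid_var, y2 ** 2)
        # M-step: ML estimates as if the expected data were complete
        mu = np.array([y1.mean(), ey2.mean()])
        s11 = np.mean(y1 ** 2) - mu[0] ** 2
        s12 = np.mean(y1 * ey2) - mu[0] * mu[1]
        s22 = np.mean(ey2sq) - mu[1] ** 2
        S = np.array([[s11, s12], [s12, s22]])
    return mu, S

rng = np.random.default_rng(3)
n = 4000
y1 = rng.normal(0.0, 1.0, n)
y2 = 0.5 * y1 + rng.normal(0.0, np.sqrt(0.75), n)   # true cov 0.5, var 1
y2[rng.random(n) < 0.4] = np.nan                    # 40% of y2 MCAR
mu_hat, S_hat = em_bivariate(y1, y2)
print(np.round(mu_hat, 2), np.round(S_hat, 2))
```

The resid_var term added to ey2sq is what distinguishes EM from naive regression imputation: without it, the variance of y2 would be underestimated, exactly as described in the single imputation section.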
There is some controversy about the efficiency of the EM algorithm. Little and
Rubin (2002) highly recommended the EM algorithm since it avoids inversion of the
information matrix at each iteration. Allison (2003) argued that the standard errors from
the EM algorithm are not consistent estimates of the true standard errors and preferred
direct ML over the EM algorithm. In practice, however, these two methods can be
applied in combination and facilitate the use of each other (for an example, see L. K.
Muthén & Muthén, 2000a). The default assumption when implementing ML methods is
that missing data are ignorable (MCAR or MAR). Application of ML to non-ignorable
missingness patterns is much more vulnerable (Allison, 2003).
The Bayesian Method and Data Augmentation
Data augmentation (DA) is a general name for methods constructing iterative
algorithms via introduction of unobserved data or latent variables (van Dyk & Meng,
2001), when sometimes a likelihood function is complicated and difficult to deal with
directly. From this perspective, EM algorithm is a member of the DA family (Dempster,
Laird, & Rubin, 1977). Tanner and Wong (1987) put up a general framework for
conducting DA from the Bayesian point of view. In the situation of missing data, this
iterative procedure consists of two steps: the imputation (or I) step and the posterior (or P)
step. Given a starting value for θ, say θ^(0), a typical iteration proceeds as follows:

I step: draw Y_mis^(t+1) from the distribution p(Y_mis | Y_obs, θ^(t));
P step: draw θ^(t+1) from the distribution p(θ | Y_obs, Y_mis^(t+1)).

The reason for doing this sampling is that the conditional distributions in these
two steps are often much easier to draw from than the posterior distribution p(θ | Y_obs).
It can be shown that this iterative procedure eventually generates a draw from the joint
distribution of Y_mis and θ given Y_obs as t tends to infinity. In practice,
these iterative processes are implemented by means of MCMC algorithms.
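As a toy example, the I and P steps can be written out for a normal mean with known variance and MCAR missing values. A flat prior on the mean is assumed, and all numerical settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(11)
n, sigma = 200, 1.0
y = rng.normal(2.0, sigma, n)
miss = rng.random(n) < 0.3                 # 30% MCAR missingness
y_obs = y[~miss]
n_mis = int(miss.sum())

mu = 0.0                                   # starting value theta^(0)
draws = []
for t in range(5000):
    # I step: draw the missing values given the current parameter
    y_mis = rng.normal(mu, sigma, n_mis)
    # P step: draw the parameter given the completed data (flat prior)
    full_mean = (y_obs.sum() + y_mis.sum()) / n
    mu = rng.normal(full_mean, sigma / np.sqrt(n))
    draws.append(mu)

post = np.array(draws[500:])               # discard the burn-in draws
print(round(post.mean(), 2), round(post.std(), 3))
```

The chain's stationary distribution for the mean is its posterior given only the observed data, so the retained draws center on the observed-data mean with spread close to sigma divided by the square root of the number of observed cases.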
Bayesian DA is also a good method for implementing multiple imputation.
When D independent draws of the missing values are retained, the final draws form an
empirical distribution and provide D sets of imputed values for the missing data. Rubin
(Little & Rubin, 2002; Rubin, 1987) thought highly of Bayesian DA and multiple
imputation. However, this method also suffers from some
drawbacks. Nielsen (2003) showed a failure of Bayesian multiple imputation, but Rubin
(2003) later pointed out that this failure may arise from an inefficient complete-data
analysis. Bayesian DA may also be sensitive to the fraction of missing information
(Rubin, 2003), in contrast to the relative robustness of direct ML (Wothke, 2000; Yuan,
Bentler, & Zhang, 2005).
Despite these disadvantages, there are still several motivations for considering
Bayesian methods in dealing with missing data. On the one hand, automatic imputation
by DA through the MCMC algorithm is an attractive feature. Although both the EM
algorithm for ML and the MCMC algorithm for the Bayesian approach belong to the
family of DA and perform imputation, the MCMC algorithm fills in the missing data
with random draws from their conditional distribution given the observed data and the
parameters, rather than replacing the missing data with the conditional mean, as in EM.
Therefore, estimates of the covariance matrix from the filled-in data can be computed
without adding corrections to the variances, since the calculation of the posterior mean
by averaging over many draws from the posterior distribution limits the loss of
efficiency (Little & Rubin, 2002).
On the other hand, employing the Bayesian approach may benefit statistical
inference when data are incomplete, in the sense that it can formalize the subjective
elements implicit in the assumptions about the missing data (Daniels & Hogan, 2007).
Almost any statistical inference about a parameter of interest in the presence of missing
data rests on some assumption with varying degrees of subjective judgment. For
example, parameter estimation mostly relies on the assumption that the data are missing
at random (MAR). This assumption is subjective in that it can never be directly assessed
from the existing data, and it thus threatens the validity of inferences based on it. It has
been argued that such subjective assumptions should be formulated in terms of prior
distributions for the parameters that index the conditional distribution of the missing
data given the observed data. It is thus possible to express the assumption about the
missingness mechanism in a probability mass function and to evaluate the uncertainty
about this assumption.
Therefore, the specific aims of this research are summarized as follows:
(1) Examine the robustness of the Bayesian method to the fraction of missing
information (0%, 30%, and 50%) compared to direct ML;
(2) Compare the performance of the Bayesian method and direct ML under
different missingness mechanisms (MCAR and MNAR); especially for the MNAR data,
the current study will test a model specific to the Bayesian method, in which the
assumption about the missingness mechanism is expressed in a probability mass
function;
(3) If the Bayesian approach works for the MNAR data, a further aim is to
explore the sensitivity of the Bayesian approach to misspecification: specifically, to
what extent can the Bayesian approach recover the true parameters if the assumption
about the missingness mechanism is not appropriate?
These goals thus constitute a 2 × 2 design. The first factor is the type of
missingness mechanism (MCAR or MNAR) and the second factor is the fraction of
missing information (30% or 50%). Data will be simulated across these missing-data
conditions and then fitted using both ML estimation and the Bayesian approach.
Estimated means and efficiency (i.e., standard deviations) will be used to compare the
results from these two methods.
Finally, application in real studies is the ultimate goal of the current research.
Therefore, in addition to the above simulation studies, both the Bayesian and ML
models will be applied to the empirical longitudinal twin study of the CPS, with four
waves and likely MNAR data at wave 4, to see whether the two methods lead to
different conclusions.
Chapter 3: Design and Methods
3.1 Simulation of the Complete Data
As an analogue to the empirical dataset, the complete data were simulated based
on a biometric latent growth model with T = 4 occasions (see Figure 3.1). The true
parameter values for A[t] were set to (1, 2, 3, 4), specifying a linear growth model. The
means of the intercept and slope (G0, G1) were set to 10 and 3, respectively. The true
value of the residual error variance at each time point (σ_u²) was set uniformly to 16.
For G0, the variance components σ_a0², σ_c0², and σ_e0² were set to 6, 4, and 2,
respectively. The covariance between G0 and G1 was negative, with σ_a10, σ_c10, and
σ_e10 set to −1.5, −1 and −0.5, while the true values of the slope variance components
σ_a1², σ_c1², and σ_e1² were all set to 2. Each simulated dataset was composed of 625
twin pairs, with 310 MZ pairs and the other 315 DZ pairs. The simulation was replicated
100 times. The missing data were then created by deleting cases from the complete data.
Under the premise that data were always complete at the initial time point, the deletion
of cases was limited to data at T2, T3 and T4.
On average, the sample means of these 100 complete datasets were 9.98 at wave 1,
13.00 at wave 2, 15.98 at wave 3 and 19.00 at wave 4, while the sample standard
deviations were 5.29, 5.29, 6.33 and 8.00 for the four waves, in order. For more detailed
descriptive statistics of these simulated datasets, please refer to Table 3.1. The
longitudinal trajectories of the variable Y in a randomly selected 25% sample, with only
one subject per family, are also plotted in Figure 3.2.
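The data-generating model can be sketched as follows. Because the reported wave means (roughly 10, 13, 16, 19) equal the intercept mean plus 0, 1, 2, and 3 times the slope mean, the slope loadings are coded here as (0, 1, 2, 3); that coding, and the use of numpy in place of SAS PROC IML, are assumptions of this illustration:

```python
import numpy as np

rng = np.random.default_rng(2012)

mu_g = np.array([10.0, 3.0])                 # means of (G0, G1)
A = np.array([[6.0, -1.5], [-1.5, 2.0]])     # additive-genetic block
C = np.array([[4.0, -1.0], [-1.0, 2.0]])     # shared-environment block
E = np.array([[2.0, -0.5], [-0.5, 2.0]])     # non-shared block
resid_var = 16.0                             # residual variance per wave
lam = np.array([0.0, 1.0, 2.0, 3.0])         # slope loadings (assumed coding)

def simulate_pairs(n_pairs, r_a):
    """Simulate n_pairs twin pairs; r_a = 1.0 for MZ, 0.5 for DZ."""
    within = A + C + E                       # within-person (G0, G1) covariance
    cross = r_a * A + C                      # cross-twin block (E uncorrelated)
    cov = np.block([[within, cross], [cross, within]])
    g = rng.multivariate_normal(np.tile(mu_g, 2), cov, n_pairs)
    out = np.empty((n_pairs, 2, 4))          # pair x twin x wave
    for j in (0, 1):
        g0, g1 = g[:, 2 * j], g[:, 2 * j + 1]
        noise = rng.normal(0.0, np.sqrt(resid_var), (n_pairs, 4))
        out[:, j, :] = g0[:, None] + np.outer(g1, lam) + noise
    return out

mz = simulate_pairs(310, 1.0)
dz = simulate_pairs(315, 0.5)
y = np.vstack([mz, dz]).reshape(-1, 4)       # scores for all 1250 twins
print(np.round(y.mean(axis=0), 2))
print(np.round(y.std(axis=0), 2))
```

With these true values, the implied wave variances are 28, 28, 40 and 64, matching the reported standard deviations of 5.29, 5.29, 6.33 and 8.00.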
Figure 3.1: A one-group biometric latent growth model in path analysis form with true
parameters
3.2 Software
The whole simulation process and descriptive statistics of the empirical data were
implemented using the SAS (Version 9) statistical package (SAS, 2004). PROC IML was
employed to achieve the desired variance-covariance components between intercept G0
and slope G1. The biometric latent growth model was fitted in Mplus (Muthén & Muthén,
2000b) to implement ML estimation, and in OpenBUGS (D. Lunn, Spiegelhalter,
Thomas, & Best, 2009; D. J. Lunn, Thomas, Best, & Spiegelhalter, 2000) to implement
the MCMC algorithm of the Bayesian approach. Finally, the R Coda package
(Plummer, Best, Cowles, & Vines, 2006) was used to conduct the Gelman-Rubin tests
(A. Gelman & Rubin, 1992) assessing the convergence of the MCMC algorithms.
Table 3.1: Descriptive Statistics of complete datasets across 100 replications

              T1            T2            T3            T4
Mean (SE)     9.98 (0.18)   13.00 (0.15)  15.98 (0.21)  19.00 (0.28)
SD (SE)       5.29 (0.10)   5.29 (0.11)   6.33 (0.13)   8.00 (0.15)

Note. These values are average values of the 100 replications. Numbers inside the
parentheses are the standard deviations of the corresponding statistics across the 100
replications.
Figure 3.2: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for complete data
3.3 Dependent Variables
To compare the performance of ML estimation and the Bayesian approach on
these parameters, two dependent variables were examined. The first was the mean value
of each parameter estimate taken across all the replications. The second was the
efficiency of the parameter estimate, operationalized as the standard deviation of the
empirical sampling distribution of each parameter estimate taken across all the
replications.
Chapter 4: Simulation Study I - Compare Performance under
MCAR Data
The aim of this study was to investigate the robustness of Bayesian and ML
estimation to varying fractions of missing information when data were MCAR.
4.1 Design
The MCAR data were created using uniform random variables. Given the dependent nature of twin data, this process was implemented on a family basis. Two uniform random variables were generated. The first was used to decide which families had missing data: for each family, if the value of this variable was larger than 0.7, that family was marked as a missing family; otherwise it was marked as a complete family. The second uniform variable was used to decide whether a missing family was missing data on both twins in the pair or only on a single twin. If its value was smaller than 0.9, the values of both twins in the family were deleted; otherwise, the value of only one twin was deleted. In the latter case, a Bernoulli variable with probability 0.5 was generated to decide which twin should be deleted. A missingness fraction of 30% for the twin data was thus obtained. A 50% missingness fraction was constructed in a similar way, with the criterion value marking a missing family set to 0.5 for the first uniform random variable.
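The family-wise deletion scheme above can be sketched directly. In the following Python illustration (array shapes and the score distribution are assumptions chosen to echo Table 3.1), the implied individual-level missingness rate is 0.3 × (0.9 + 0.1 × 0.5) = 28.5%, close to the roughly 28.4% rates observed in Table 4.1:

```python
import numpy as np

rng = np.random.default_rng(42)

def make_mcar(y, cutoff=0.7):
    """Family-wise MCAR deletion; y has shape (n_families, 2), one column per twin."""
    out = np.asarray(y, dtype=float).copy()
    for fam in range(out.shape[0]):
        if rng.uniform() > cutoff:          # uniform #1: mark this family as missing
            if rng.uniform() < 0.9:         # uniform #2: delete both twins ...
                out[fam, :] = np.nan
            else:                           # ... or a single twin, chosen by a fair coin
                out[fam, rng.integers(2)] = np.nan
    return out

y = rng.normal(13.0, 5.3, size=(5000, 2))
rate = float(np.isnan(make_mcar(y)).mean())  # expected: 0.3 * (0.9 + 0.1 * 0.5) = 0.285
```

Setting cutoff to 0.5 reproduces the 50% condition in the same way.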
For the 30% MCAR datasets, on average the sample means of four waves in
sequence were 9.98, 13, 15.98 and 18.99, while the corresponding standard deviations
were 5.29, 5.30, 6.32 and 8.00 respectively. Results from 50% MCAR datasets were
similar, with sample means 9.98, 12.98, 16.00 and 18.96, and standard deviations 5.29,
5.29, 6.34 and 7.96 in order (see Table 4.1 for more descriptive statistics, e.g. the sample
missingness rates). Figure 4.1 displays the trajectories of the variable Y in 25% randomly
selected samples with only one twin in each family when data were 30% MCAR and 50%
MCAR respectively.
Table 4.1: Descriptive Statistics of MCAR datasets across 100 replications

                          30% MCAR                                            50% MCAR
                 T1           T2           T3           T4            T1           T2           T3           T4
Missingness rate --           28.37 (1.68) 28.73 (1.75) 28.4 (1.95)   --           47.41 (2.05) 47.83 (1.72) 47.73 (2.04)
Mean (SE)        9.98 (0.18)  13.00 (0.19) 15.98 (0.26) 18.99 (0.35)  9.98 (0.18)  12.98 (0.24) 16.00 (0.31) 18.96 (0.39)
SD (SE)          5.29 (0.10)  5.30 (0.13)  6.32 (0.16)  8.00 (0.19)   5.29 (0.10)  5.29 (0.16)  6.34 (0.17)  7.96 (0.23)
Note. These values are average values of the 100 replications. Numbers inside the parentheses are the standard deviations of the corresponding statistics across the 100 replications.
Figure 4.1: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MCAR data, (a) for MCAR
30%, (b) for MCAR 50%
4.2 Analysis
Based on the above descriptive statistics and the trajectories, in this study, two
models (regular ML linear model and the regular Bayesian linear model) were fitted to
each dataset under three levels of missingness fractions: complete data, 30% missing, and
50% missing.
Fifteen parameters were of interest and were compared in this study. The first two were the intercept G0 and slope G1, the fixed effects of the model. Ten random effects were subject to the next round of comparisons: the additive genetic (A) components of the variance-covariance structure between G0 and G1 (σa0², σa1², σa01), the shared environmental (C) components (σc0², σc1², σc01), the non-shared environmental (E) components (σe0², σe1², σe01), and the residual error unique to each time point (σu²). The remaining three parameters were the overall values of the variance-covariance between G0 and G1 (σg0², σg1², σg01) after summing over the A, C and E components.
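As a quick check, the three total variance-covariance components are simply the sums of their A, C and E parts, using the true generating values shown in Table 4.2:

```python
# True generating values of the variance-covariance components (from Table 4.2)
a0, c0, e0 = 6.0, 4.0, 2.0        # intercept components
a1, c1, e1 = 2.0, 2.0, 2.0        # slope components
a01, c01, e01 = -1.5, -1.0, -0.5  # intercept-slope covariance components

sigma_g0_sq = a0 + c0 + e0        # total intercept variance: 12
sigma_g1_sq = a1 + c1 + e1        # total slope variance: 6
sigma_g01 = a01 + c01 + e01       # total covariance: -3
```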
4.3 Results
Results for each dataset from both ML and Bayesian estimation under complete
dataset, 30% MCAR and 50% MCAR were summarized in Table 4.2. Several interesting
findings were observed. First of all, for almost each parameter, the Bayesian approach
consistently possessed smaller standard deviation (i.e. larger efficiency) in estimation
than the ML estimation across each fraction of missing data. Second, the performance of
the Bayesian approach was comparable to that ML estimation for fixed effects, that is,
their mean values of intercept G0 and slope G1 were the same for each missingness
fraction.
Table 4.2: Compare robustness of Bayesian and ML estimation to varying fractions of MCAR data

                        Complete                      30% MCAR                      50% MCAR
Parameter        Bayesian      ML            Bayesian      ML            Bayesian      ML
Fixed effects
G0 = 10          9.98 (0.16)   9.98 (0.16)   9.98 (0.16)   9.98 (0.19)   9.98 (0.17)   9.98 (0.17)
G1 = 3           3.00 (0.10)   3.00 (0.10)   3.01 (0.11)   3.01 (0.11)   3.00 (0.12)   3.00 (0.12)
A[t] = [0,1,2,3] =[0, 1, 2, 3] =[0, 1, 2, 3] =[0, 1, 2, 3] =[0, 1, 2, 3] =[0, 1, 2, 3] =[0, 1, 2, 3]
Random effects
σu² = 16         16.09 (0.47)  15.92 (0.47)  16.17 (0.55)  15.92 (0.56)  16.32 (0.62)  15.93 (0.67)
Additive Genetic
σa0² = 6         6.10 (2.71)   5.86 (2.81)   6.35 (2.58)   5.65 (2.91)   6.14 (2.53)   5.81 (3.06)
σa1² = 2         2.09 (0.80)   2.13 (1.06)   2.19 (0.83)   2.08 (1.19)   2.21 (0.78)   2.18 (1.26)
σa01 = -1.5      -1.39 (1.34)  -1.40 (1.40)  -1.59 (1.28)  -1.27 (1.53)  -1.54 (1.25)  -1.39 (1.55)
Shared Environ.
σc0² = 4         4.02 (2.34)   4.17 (2.45)   3.82 (2.19)   4.31 (2.50)   3.95 (2.20)   4.18 (2.58)
σc1² = 2         1.96 (0.63)   1.93 (0.84)   1.86 (0.65)   1.94 (0.95)   1.85 (0.66)   1.87 (1.05)
σc01 = -1        -1.16 (1.09)  -1.13 (1.13)  -0.99 (1.01)  -1.21 (1.21)  -1.05 (1.05)  -1.14 (1.30)
Unique Environ.
σe0² = 2         1.57 (0.82)   2.13 (1.06)   1.42 (0.77)   2.02 (1.06)   1.36 (0.70)   1.97 (1.15)
σe1² = 2         1.87 (0.36)   1.92 (0.39)   1.82 (0.44)   1.96 (0.47)   1.73 (0.44)   1.89 (0.49)
σe01 = -0.5      -0.26 (0.50)  -0.42 (0.54)  -0.16 (0.51)  -0.46 (0.60)  -0.08 (0.43)  -0.38 (0.59)
Total variance
σg0² = 12        11.69 (1.02)  12.00 (1.05)  11.59 (1.09)  11.99 (1.12)  11.45 (0.99)  11.96 (1.10)
σg1² = 6         5.91 (0.38)   5.98 (0.38)   5.87 (0.46)   5.98 (0.46)   5.79 (0.49)   5.94 (0.49)
σg01 = -3        -2.82 (0.50)  -2.95 (0.50)  -2.75 (0.54)  -2.94 (0.55)  -2.67 (0.54)  -2.91 (0.56)
Note. The numbers outside the parentheses are the means of the estimates across the 100 replications and the ones inside the parentheses are the standard deviations of the estimates across the 100 replications.
However, the results for the random effects of the model were mixed. In general, the Bayesian approach tended to overestimate the A components while ML estimation tended to underestimate them. Estimate biases were examined in percentage terms, 100 × (mean estimate − true value) / true value. When data were 30% MCAR, the Bayesian approach was 6% positively biased from the true value of σa0² on average while ML estimation was 6% negatively biased. Both methods were positively biased from the true value of σa1², with the Bayesian approach (10%) biased more than ML estimation (4%). When estimating σa01, ML estimation was 15% negatively biased while the Bayesian approach was only 6% positively biased. In contrast to the trend found in the A components, the values of the C components were always underestimated in the Bayesian models but overestimated in the ML models. When data were 30% MCAR, ML estimation was 8% positively biased from the true value of σc0² on average while the Bayesian approach was only 4% negatively biased. Both methods were negatively biased from the true value of σc1², with the Bayesian approach (7%) biased more than ML estimation (2%). When estimating σc01, ML estimation was 21% positively biased while the Bayesian approach was only 1% negatively biased.
The bias pattern under the 50% MCAR condition was similar to that under 30% MCAR. When data were 50% MCAR, the Bayesian approach was 2% positively biased from the true value of σa0² on average while ML estimation was 3% negatively biased. Both methods were positively biased from the true value of σa1², with a 10% bias for the Bayesian approach and a 9% bias for ML estimation. When estimating σa01, ML estimation was 7% negatively biased while the Bayesian approach was only 3% positively biased. In contrast, the values of the C components were always underestimated in the Bayesian models but overestimated in the ML models. ML estimation was 5% positively biased from the true value of σc0² on average while the Bayesian approach was only 1% negatively biased. Both methods were negatively biased from the true value of σc1², with the Bayesian approach 7% biased and ML estimation 6% biased. When estimating σc01, ML estimation was 14% positively biased while the Bayesian approach was only 5% negatively biased.
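The percent-bias measure used throughout this section can be sketched in two lines of Python, here applied to the Table 4.2 values for σa0² under 30% MCAR:

```python
def percent_bias(mean_estimate, true_value):
    """100 x (mean estimate - true value) / true value."""
    return 100.0 * (mean_estimate - true_value) / true_value

# sigma_a0^2 under 30% MCAR (Table 4.2): Bayesian mean 6.35, ML mean 5.65, true value 6
bayes_bias = round(percent_bias(6.35, 6.0))   # about +6%
ml_bias = round(percent_bias(5.65, 6.0))      # about -6%
```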
Finally, the Bayesian approach generally demonstrated larger biases than ML estimation in both the E components and the residual error σu². The Bayesian approach tended to overestimate the residual error (1% positively biased under 30% MCAR and 2% positively biased under 50% MCAR), which in turn lowered estimates of all the E components relative to ML estimation. ML estimation succeeded in retaining the residual error at a stable value (0.5% negatively biased under both missingness rates). Given this tendency, it was not surprising to observe that the absolute values of the total variance-covariance between G0 and G1 (σg0², σg1² and σg01) decreased as the missingness fraction increased, due to inflation in the residual error and the resulting loss in the E components.
4.4 Summary and Discussion
This simulation study aimed to compare the robustness of Bayesian and direct ML estimation to different fractions of missing data (complete data, 30% missing, and 50% missing) under the MCAR scenario. The main findings were as follows: (1) both methods were equally robust in estimating fixed effects even when as much as 50% of the data were missing; (2) the Bayesian approach consistently demonstrated a smaller standard deviation (larger efficiency) than ML estimation for every parameter across each condition; (3) both methods showed comparable bias in estimating the A components, but in different directions: the Bayesian approach tended to overestimate them while direct ML estimation tended to underestimate them; (4) opposite to what was found for the A components, the Bayesian approach tended to underestimate while direct ML estimation tended to overestimate the C components; (5) the Bayesian approach showed larger positive bias on the residual error, and larger negative bias on the E components, than direct ML estimation across each missingness fraction. There seemed to be a trade-off effect between the residual error and the E components in the Bayesian models: the residual error grew as the missingness fraction increased, and in turn the magnitude of the E components declined, which directly resulted in smaller magnitudes of the estimated total variance-covariance between the intercept G0 and slope G1.
The trade-off effect exhibited by the Bayesian approach implies that it may have difficulty differentiating the error components of latent factors from the error components of observed variables, probably because these two parameters are more homogeneous with each other than with other categories of random effects such as the A and C components. Furthermore, sampling one parameter conditional on the other parameters in the model, as characterized by Gibbs sampling (Geman & Geman, 1993), may be inefficient in separating them.
A larger positive estimation bias of variance components in the Bayesian approach as compared to likelihood-based methods was also reported by Browne & Draper (2006) in a two-level intercept-only model, but they also found the former performed better than the latter in the coverage of 95% confidence intervals of the variance components. The estimation bias and coverage of confidence intervals were also found to vary as a function of many factors: the sample size of level-2 units (parallel to the number of families in the current study), the ratio of the intercept variance to the residual error, the intraclass correlation, the measure of center of the posterior distribution (mean, median, or mode), etc. The choice of prior distribution for variance components also matters. Currently, the most commonly used priors for a scalar variance component in Bayesian models are non-informative distributions such as the uniform and inverse-gamma families, while for multivariate variance components the inverse-Wishart is usually recommended. Gelman (2006) suggested that the half-t family is a better option than these non-informative distributions. A future line of study would be to investigate how the bias of the residual error and E components is affected by these factors, and under what prior distributions the bias of the residual error can achieve the level of robustness of ML estimation.
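Gelman's (2006) half-t suggestion is easy to simulate from: a half-t draw for a standard-deviation parameter is the absolute value of a scaled Student-t variate. A hypothetical Python illustration (the degrees of freedom and scale here are arbitrary choices, not values used in this dissertation):

```python
import numpy as np

rng = np.random.default_rng(1)

# Half-t(nu, scale) prior for a standard-deviation parameter:
# absolute value of a scaled Student-t variate (illustrative nu and scale)
nu, scale = 4, 5.0
sd_draws = np.abs(scale * rng.standard_t(nu, size=100_000))
```

Unlike the inverse-gamma family, the half-t places substantial mass near zero while still allowing heavy tails, which is why it behaves better for weakly identified variance components.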
Despite its poorer performance in estimating the residual error and E components, the Bayesian approach was as robust as ML estimation for the other random effects as well as the fixed effects. It also possessed the advantage of a smaller standard deviation (i.e., larger efficiency) for almost every parameter. This larger efficiency of Bayesian estimation in both the complete and incomplete data scenarios may derive from the multiple imputation and data augmentation embedded in the MCMC algorithm: each parameter was estimated by averaging over many draws from the posterior distribution, which effectively limits the loss of efficiency (Little & Rubin, 2002).
Although the finding that ML estimation succeeded in retaining the residual error at a stable level while the Bayesian approach failed in this simulation study partly supported the prediction of previous research (Rubin, 2003; Wothke, 2000; Yuan, Bentler, & Zhang, 2005) that direct ML estimation is relatively robust in estimation bias to varying fractions of missing data, the Bayesian approach could still be an attractive technique when encountering missing data, since in Bayesian models it is possible and feasible to formulate the assumption about the missingness patterns and evaluate it explicitly, which is especially helpful if the data are MNAR (Daniels & Hogan, 2007). The next chapter thus presents a second simulation study of MNAR data.
Chapter 5: Simulation Study II - Compare Performance under
MNAR Data
The aims of this study were to investigate: (1) how existence of MNAR data
could bias parameter estimates and thus threaten the validity of conclusions if ignored; (2)
whether the Bayesian approach could provide a way of addressing the biases caused by
MNAR data; and (3) the robustness of the Bayesian approach and ML estimation to
varying fractions of missing information.
5.1 Design
To mimic the pattern of the empirical data set, only data at T4 were constructed to
be MNAR while data at T2 and T3 were exactly the same as those MCAR ones in
simulation study I. Two cases of MNAR data were investigated in this study.
Case I. In consideration of the dependence in twin data, in this case the probability that a subject was missing was assumed to depend on the levels of both twins in that family. The following logit model was therefore used to decide which cases were missing: logit(Pr(missing)) = b0 + b1·Y1[4] + b2·Y2[4], where Y1[4] and Y2[4] are the T4 scores of the subject and the co-twin, respectively. All regression coefficients in this regression were fixed effects. A uniform random variable was generated and compared to the probability produced by the logit model; the corresponding Y values were deleted if the uniform random variable was smaller than that probability, and kept otherwise. The values of b0, b1 and b2 in this logit model were fixed at -31.5, 0.8 and 0.6 to obtain a 30% missingness rate, and at -32.5, 1 and 0.8 to obtain a 50% missingness rate, on average across the 100 replications.
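The Case I deletion step can be sketched as follows. This Python illustration reconstructs the logit rule described above (the exact form of the linear predictor is a reconstruction from the reported coefficients, and the twin scores are drawn independently here for simplicity):

```python
import numpy as np

rng = np.random.default_rng(7)

def delete_mnar_case1(y4, b0=-31.5, b1=0.8, b2=0.6):
    """Delete T4 scores MNAR: Pr(missing) depends on both twins' levels (Case I)."""
    y4 = np.asarray(y4, dtype=float)
    out = y4.copy()
    for fam in range(y4.shape[0]):
        for j in (0, 1):
            eta = b0 + b1 * y4[fam, j] + b2 * y4[fam, 1 - j]  # own score + co-twin's score
            if rng.uniform() < 1.0 / (1.0 + np.exp(-eta)):    # inverse-logit probability
                out[fam, j] = np.nan
    return out

full = rng.normal(19.0, 8.0, size=(5000, 2))   # T4 scores (mean 19, SD 8, per Table 3.1)
obs = delete_mnar_case1(full)
rate = float(np.isnan(obs).mean())
```

Because high scorers are the most likely to be deleted, the observed T4 mean drops below the full-data mean, consistent with the descriptive statistics reported below.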
For the 30% MNAR datasets, on average the sample means of four waves in
sequence were 9.98, 13, 15.98 and 15.46, while the corresponding standard deviations
were 5.29, 5.30, 6.32 and 6.24 respectively. The missingness rate at T4 on average across
100 replications was 30.67%. Results from 50% MNAR datasets were comparable, with
sample means 9.98, 12.98, 16.00 and 12.81, and standard deviations 5.29, 5.29, 6.34 and
5.64 in order (see Table 5.1 for more descriptive statistics, e.g. standard error). The
average missingness rate at T4 was 55.59%. Figure 5.1 displays the trajectories of the
variable Y in 25% randomly selected samples with only one twin in each family when
data were 30% MNAR and 50% MNAR respectively.
Case II. The second case resembled a multilevel regression model, which assumed that the probability that an individual in each family was missing was associated only with that individual's Y values via a logistic regression, but the intercept in the logit model might differ among families. The MNAR dataset was thus constructed as: logit(Pr(missing)) = b0i + b1·Y[4], where the family-specific intercept b0i was normally distributed with mean β0 and variance σ0² = 25. Similar to Case I, a uniform random variable was generated and compared to the probability resulting from this logit model. If the resulting probability was larger than the uniform random variable, the Y values of that case were deleted; otherwise they were kept. The value of β0 was set to -23 to obtain a 30% missingness rate, and to -18 to obtain a 50% missingness rate, on average across the 100 replications.
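The Case II mechanism differs from Case I only in the family-varying intercept. A hedged Python sketch (again, the exact model form is a reconstruction; σ0 = 5 matches the true value σ0² = 25 shown in Table 5.3):

```python
import numpy as np

rng = np.random.default_rng(11)

def delete_mnar_case2(y4, beta0=-23.0, sigma0=5.0, b1=1.0):
    """Delete T4 scores MNAR with family-varying intercept b0i ~ N(beta0, sigma0^2) (Case II)."""
    y4 = np.asarray(y4, dtype=float)
    out = y4.copy()
    for fam in range(y4.shape[0]):
        b0i = rng.normal(beta0, sigma0)        # sigma0^2 = 25 in the simulation design
        for j in (0, 1):
            eta = b0i + b1 * y4[fam, j]        # depends on the subject's own score only
            if rng.uniform() < 1.0 / (1.0 + np.exp(-eta)):
                out[fam, j] = np.nan
    return out

full = rng.normal(19.0, 8.0, size=(5000, 2))
rate = float(np.isnan(delete_mnar_case2(full)).mean())
```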
For the 30% MNAR datasets, on average the sample means of four waves in
sequence were 9.98, 13, 15.98 and 15.34, while the corresponding standard deviations
were 5.29, 5.30, 6.32 and 6.36 respectively. The missingness rate at T4 on average across
100 replications was 33.65%. Results from 50% MNAR datasets were comparable, with
sample means 9.98, 12.98, 16.00 and 13.25, and standard deviations 5.29, 5.29, 6.34 and
5.93 in order (see Table 5.1 for more descriptive statistics, e.g. standard error). The
average missingness rate at T4 was 54.18%. Figure 5.2 displays the trajectories of the
variable Y in 25% randomly selected samples with only one twin in each family when
data were 30% MNAR and 50% MNAR respectively.
5.2 Analysis
Models fitted in the MNAR condition were more complex than those in the MCAR condition. First, if the existence of MNAR was ignored (as aim (1) suggested), the data would usually be assumed to follow a non-linear growth pattern based on the descriptive statistics across the 100 MNAR datasets. A regular non-linear latent growth model was thus fitted with both ML and Bayesian estimation. Next, to achieve aim (2), the assumption about the missing-data mechanism itself could be added as another part of the model in the Bayesian approach; that is, a logistic regression describing the underlying missingness mechanism was formulated and its parameters were also subject to estimation. This yielded a third model: the Bayesian logistic model.
So for MNAR data in Case I, on the basis of the regular model, a logistic regression model was added, with the missingness indicator m set to 1 if Y[4] was missing and 0 otherwise: logit(Pr(m = 1)) = b0 + b1·Y1[4] + b2·Y2[4]. Uninformative priors were used for b0, b1 and b2 when estimating the parameters of this logit model.
For Case II, the logistic regression for the missingness mechanism was formulated as logit(Pr(m = 1)) = b0i + b1·Y[4], with the family-specific intercept b0i normally distributed with mean β0 and variance σ0². Uninformative priors were placed on β0, σ0² and b1.
To simplify and differentiate, these two models are referred to as the 'Bayesian logistic model-I' and the 'Bayesian logistic model-II' respectively. Therefore, for each dataset, three models were fitted and then compared: the ML regular non-linear model, the Bayesian regular non-linear model, and the Bayesian logistic model ('Bayesian logistic model-I' for Case I and 'Bayesian logistic model-II' for Case II).
In addition to the 15 parameters in the MCAR condition, several more parameters of interest arose from the increased complexity of the models fitted to the MNAR datasets. First, given that non-linear models were considered, two more fixed effects (the latent basis for T2, A[2], and the latent basis for T3, A[3]) were added for both the ML and Bayesian models. Second, because of the extra logit part in the Bayesian logistic models, three more parameters were added for the Bayesian logistic model-I: b0, b1 and b2, while for the Bayesian logistic model-II there were also three more parameters to compare: β0, σ0² and b1.
Figure 5.1: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MNAR data (Case I), (a) for
MNAR 30%, (b) for MNAR 50%
Figure 5.2: An example plot of longitudinal trajectories of the dependent variable across
four waves (25% random sample, one child per family) for MNAR data (Case II), (a) for
MNAR 30%, (b) for MNAR 50%
Table 5.1: Descriptive Statistics of MNAR datasets across 100 replications

                          30% MNAR Case I                                     50% MNAR Case I
                 T1           T2           T3           T4            T1           T2           T3           T4
Missingness rate --           28.37 (1.68) 28.73 (1.75) 30.67 (1.67)  --           47.41 (2.05) 47.83 (1.72) 55.59 (1.97)
Mean (SE)        9.98 (0.18)  13.00 (0.19) 15.98 (0.26) 15.46 (0.23)  9.98 (0.18)  12.98 (0.24) 16.00 (0.31) 12.81 (0.21)
SD (SE)          5.29 (0.10)  5.30 (0.13)  6.32 (0.16)  6.24 (0.16)   5.29 (0.10)  5.29 (0.16)  6.34 (0.17)  5.64 (0.19)

                          30% MNAR Case II                                    50% MNAR Case II
                 T1           T2           T3           T4            T1           T2           T3           T4
Missingness rate --           28.37 (1.68) 28.73 (1.75) 33.65 (1.40)  --           47.41 (2.05) 47.83 (1.72) 54.18 (1.83)
Mean (SE)        9.98 (0.18)  13.00 (0.19) 15.98 (0.26) 15.34 (0.26)  9.98 (0.18)  12.98 (0.24) 16.00 (0.31) 13.25 (0.26)
SD (SE)          5.29 (0.10)  5.30 (0.13)  6.32 (0.16)  6.36 (0.16)   5.29 (0.10)  5.29 (0.16)  6.34 (0.17)  5.93 (0.20)
Note. These values are average values of the 100 replications. Numbers inside the parentheses are the standard deviations of the corresponding statistics across the 100 replications.
Table 5.2: Compare robustness of Bayesian and ML estimation to varying fractions of MNAR Data (Case I)

                           30% MNAR (Case I)                              50% MNAR (Case I)
Parameter        Bayesian       ML            Bayesian      Bayesian       ML            Bayesian
                 Logistic-I     Non-Linear    Non-Linear a  Logistic-I b   Non-Linear    Non-Linear
Fixed effects
G0 = 10          9.98 (0.17)    9.99 (0.17)   9.99 (0.17)   9.98 (0.18)    9.98 (0.17)   9.98 (0.18)
G1 = 3           3.01 (0.11)    2.25 (0.10)   2.24 (0.10)   3.01 (0.13)    1.51 (0.10)   1.53 (0.10)
A[2] = 1         1.00 (0.05)    1.32 (0.07)   1.32 (0.07)   1.00 (0.06)    1.89 (0.15)   1.88 (0.14)
A[3] = 2         1.99 (0.05)    2.61 (0.07)   2.63 (0.07)   2.00 (0.08)    3.71 (0.26)   3.70 (0.24)
Random effects
σu² = 16         16.16 (0.58)   16.09 (0.62)  16.37 (0.58)  16.27 (0.66)   16.72 (0.77)  16.93 (0.75)
Additive Genetic
σa0² = 6         6.37 (2.65)    5.82 (3.05)   6.30 (2.95)   6.20 (2.61)    5.42 (3.01)   5.60 (2.60)
σa1² = 2         2.22 (0.80)    1.45 (0.70)   1.21 (0.33)   2.27 (0.83)    0.81 (0.47)   0.67 (0.18)
σa01 = -1.5      -1.65 (1.30)   -1.24 (1.27)  -1.35 (0.96)  -1.63 (1.25)   -0.89 (0.93)  -0.95 (0.57)
Shared Environ.
σc0² = 4         3.84 (2.32)    4.69 (2.63)   4.34 (2.58)   3.89 (2.19)    4.46 (2.56)   4.23 (2.27)
σc1² = 2         1.89 (0.62)    0.65 (0.52)   0.86 (0.26)   1.86 (0.62)    0.40 (0.35)   0.56 (0.29)
σc01 = -1        -1.00 (1.08)   -1.15 (1.02)  -1.07 (0.80)  -0.97 (1.01)   -0.97 (0.78)  -0.83 (0.60)
Unique Environ.
σe0² = 2         1.41 (0.73)    1.95 (1.10)   1.34 (0.72)   1.43 (0.73)    1.44 (1.12)   1.20 (0.67)
σe1² = 2         1.77 (0.41)    1.53 (0.39)   1.45 (0.35)   1.72 (0.43)    0.79 (0.33)   0.80 (0.33)
σe01 = -0.5      -0.13 (0.47)   -0.47 (0.53)  -0.21 (0.42)  -0.10 (0.45)   -0.09 (0.46)  -0.03 (0.37)
Total variance
σg0² = 12        11.63 (1.07)   12.46 (1.13)  11.98 (1.11)  11.45 (0.99)   11.31 (1.28)  11.04 (1.32)
σg1² = 6         5.89 (0.49)    3.62 (0.37)   3.51 (0.37)   5.79 (0.49)    2.00 (0.43)   2.04 (0.60)
σg01 = -3        -2.77 (0.52)   -2.86 (0.49)  -2.64 (0.48)  -2.67 (0.54)   -1.94 (0.55)  -1.76 (0.73)
Logit Part
b0 = -31.5/-32.5 -34.32 (5.87)  --            --            -36.82 (6.06)  --            --
b1 = 0.8/1       0.87 (0.15)    --            --            1.13 (0.18)    --            --
b2 = 0.6/0.8     0.65 (0.11)    --            --            0.91 (0.15)    --            --
a. Three of the 100 datasets failed to converge after 110,000 iterations with the first 10,000 as burn-in.
b. One of the 100 datasets failed to converge after 110,000 iterations with the first 10,000 as burn-in.
Table 5.3: Compare robustness of Bayesian and ML estimation to varying fractions of MNAR Data (Case II)

                           30% MNAR (Case II)                             50% MNAR (Case II)
Parameter        Bayesian       ML            Bayesian      Bayesian       ML            Bayesian
                 Logistic-II a  Non-linear    Non-linear    Logistic-II a  Non-linear    Non-linear
Fixed effects
G0 = 10          9.98 (0.17)    10.00 (0.18)  9.99 (0.17)   9.98 (0.18)    9.99 (0.18)   9.99 (0.18)
G1 = 3           3.01 (0.12)    2.18 (0.10)   2.18 (0.10)   3.02 (0.17)    1.61 (0.10)   1.62 (0.10)
A[2] = 1         1.00 (0.05)    1.35 (0.07)   1.35 (0.07)   0.99 (0.07)    1.75 (0.13)   1.75 (0.12)
A[3] = 2         2.00 (0.05)    2.67 (0.08)   2.68 (0.08)   1.99 (0.10)    3.42 (0.18)   3.44 (0.18)
Random effects
σu² = 16         16.17 (0.55)   15.96 (0.60)  16.23 (0.56)  16.27 (0.68)   16.18 (0.83)  16.51 (0.73)
Additive Genetic
σa0² = 6         6.45 (2.52)    5.97 (3.11)   6.67 (2.80)   6.27 (2.58)    5.77 (3.10)   6.12 (2.63)
σa1² = 2         2.22 (0.84)    1.32 (0.80)   1.41 (0.41)   2.31 (0.81)    0.99 (0.58)   0.96 (0.26)
σa01 = -1.5      -1.67 (1.27)   -1.34 (1.32)  -1.58 (1.01)  -1.64 (1.27)   -1.20 (1.06)  -1.19 (0.70)
Shared Environ.
σc0² = 4         3.76 (2.23)    4.52 (2.68)   3.98 (2.44)   3.84 (2.18)    4.31 (2.61)   3.96 (2.25)
σc1² = 2         1.88 (0.65)    1.27 (0.65)   1.19 (0.35)   1.87 (0.68)    0.84 (0.51)   0.84 (0.24)
σc01 = -1        -0.95 (1.03)   -1.15 (1.06)  -0.97 (0.83)  -0.96 (1.07)   -0.85 (0.91)  -0.79 (0.63)
Unique Environ.
σe0² = 2         1.40 (0.70)    2.05 (1.10)   1.43 (0.70)   1.41 (0.77)    1.86 (1.16)   1.36 (0.74)
σe1² = 2         1.80 (0.44)    1.03 (0.35)   0.92 (0.26)   1.80 (0.49)    0.66 (0.30)   0.62 (0.19)
σe01 = -0.5      -0.14 (0.50)   -0.37 (0.51)  -0.12 (0.33)  -0.11 (0.48)   -0.25 (0.47)  -0.11 (0.25)
Total variance
σg0² = 12        11.61 (1.07)   12.54 (1.13)  12.08 (1.07)  11.52 (1.05)   11.94 (1.18)  11.44 (1.08)
σg1² = 6         5.91 (0.51)    3.62 (0.34)   3.52 (0.33)   5.98 (0.74)    2.50 (0.42)   2.41 (0.36)
σg01 = -3        -2.76 (0.54)   -2.86 (0.48)  -2.67 (0.45)  -2.71 (0.53)   -2.29 (0.51)  -2.09 (0.44)
Logit Part
β0 = -23/-18     -35.43 (10.96) --            --            -28.93 (10.33) --            --
σ0² = 25         79.73 (50.79)  --            --            89.98 (58.51)  --            --
b1 = 1           1.53 (0.47)    --            --            1.60 (0.56)    --            --
a. One of the 100 datasets failed to converge after 110,000 iterations with the first 10,000 as burn-in.
5.3 Results
Case I. When there were MNAR data at T4, most estimates from both the Bayesian non-linear and the ML non-linear models were far from their true values, and the biases became larger with increased fractions of missing data. First, these models failed to detect the true linear model and misinterpreted the data as following a non-linear model. Under the 50% MNAR condition, the data were even misconstrued as a quadratic model that decreased from T3 to T4, and the mean value of the slope was only around one half of the true value of 3 (G1 = 1.51, A[2] = 1.89, A[3] = 3.71 for the ML non-linear model at 50% MNAR; G1 = 1.53, A[2] = 1.88, A[3] = 3.70 for the Bayesian non-linear model at 50% MNAR). Second, compared to the results in simulation study I of MCAR data, there was a dramatic decrease in the random components of the slope variance (σa1² = 1.45, σc1² = 0.65, σe1² = 1.53 and σg1² = 3.62 for the ML non-linear model at 30% MNAR; σa1² = 0.81, σc1² = 0.40, σe1² = 0.79 and σg1² = 2.00 for the ML non-linear model at 50% MNAR; σa1² = 1.21, σc1² = 0.86, σe1² = 1.45 and σg1² = 3.51 for the Bayesian non-linear model at 30% MNAR; σa1² = 0.67, σc1² = 0.56, σe1² = 0.80 and σg1² = 2.04 for the Bayesian non-linear model at 50% MNAR). Third, the random effects of the covariance between G0 and G1 also showed a large decline (σa01 = -1.24, σc01 = -1.15, σe01 = -0.47 and σg01 = -2.86 for the ML non-linear model at 30% MNAR; σa01 = -0.89, σc01 = -0.97, σe01 = -0.09 and σg01 = -1.94 for the ML non-linear model at 50% MNAR; σa01 = -1.35, σc01 = -1.07, σe01 = -0.21 and σg01 = -2.64 for the Bayesian non-linear model at 30% MNAR; σa01 = -0.95, σc01 = -0.83, σe01 = -0.03 and σg01 = -1.76 for the Bayesian non-linear model at 50% MNAR). Finally, the residual error was amplified, with a positive bias that worsened as the fraction of missing data increased (σu² = 16.09 for the ML non-linear model at 30% MNAR; σu² = 16.72 for the ML non-linear model at 50% MNAR; σu² = 16.37 for the Bayesian non-linear model at 30% MNAR; σu² = 16.93 for the Bayesian non-linear model at 50% MNAR).
Despite huge biases, the Bayesian non-linear and ML non-linear models demonstrated a similar pattern of results relative to each other as in the MCAR scenario. That is, they exhibited comparable performance in estimating the fixed effects and the A and C components of the biometric latent growth model. The former showed a smaller standard deviation for every parameter except the total variance components σg0², σg1², and σg01. Again, for the residual error and E components, there was larger bias in the Bayesian approach and a trade-off effect between these two types of random effects.
In contrast to the results from the Bayesian non-linear and ML non-linear models, the Bayesian logistic model-I performed very well in recovering the true model from the MNAR data. First, this model was very accurate in capturing the linear growth pattern of the true model (G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 1.99 at 30% MNAR; G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 2.00 at 50% MNAR). The estimates of the random effects from fitting this model to the MNAR data were also very similar to the values in simulation study I of MCAR data (σa0² = 6.37, σa1² = 2.22, σa01 = -1.65, σc0² = 3.84, σc1² = 1.89, σc01 = -1.00, σe0² = 1.41, σe1² = 1.77, σe01 = -0.13, σg0² = 11.63, σg1² = 5.89, σg01 = -2.77 for 30% MNAR; σa0² = 6.20, σa1² = 2.27, σa01 = -1.63, σc0² = 3.89, σc1² = 1.86, σc01 = -0.97, σe0² = 1.44, σe1² = 1.79, σe01 = -0.10, σg0² = 11.45, σg1² = 5.79, σg01 = -2.67 for 50% MNAR). The estimates of the residual error were less biased and more stable than those from the Bayesian non-linear and ML non-linear models (σu² = 16.16 for 30% MNAR; σu² = 16.27 for 50% MNAR). Finally, it also succeeded in recovering the true parameters of the logistic regression part that formulated the assumption about the missingness mechanism (b0 = -34.32, b1 = 0.87, b2 = 0.65 for 30% MNAR; b0 = -36.82, b1 = 1.13, b2 = 0.91 for 50% MNAR). More details of these results are given in Table 5.2.
Case II. Similar to Case I, both the Bayesian non-linear and the ML non-linear models produced huge biases in estimating most parameters, and these biases became larger with increased fractions of missing data. The data were still misinterpreted as following a non-linear growth pattern in both the 30% MNAR and 50% MNAR conditions. When data were missing at the 50% level, a quadratic model with a decrease from T3 to T4 recurred (G1 = 1.61, A[2] = 1.75, A[3] = 3.42 for the ML non-linear model at 50% MNAR; G1 = 1.62, A[2] = 1.75, A[3] = 3.44 for the Bayesian non-linear model at 50% MNAR). Compared to the results in simulation study I of MCAR data, a dramatic decrease was found in the random components of the slope variance (σa1² = 1.32, σc1² = 1.27, σe1² = 1.03 and σg1² = 3.62 for the ML non-linear model at 30% MNAR; σa1² = 0.99, σc1² = 0.84, σe1² = 0.66 and σg1² = 2.50 for the ML non-linear model at 50% MNAR; σa1² = 1.41, σc1² = 1.19, σe1² = 0.92 and σg1² = 3.52 for the Bayesian non-linear model at 30% MNAR; σa1² = 0.96, σc1² = 0.84, σe1² = 0.62 and σg1² = 2.41 for the Bayesian non-linear model at 50% MNAR). The random effects of the covariance between G0 and G1 also showed a large decline (σa01 = -1.34, σc01 = -1.15, σe01 = -0.37 and σg01 = -2.86 for the ML non-linear model at 30% MNAR; σa01 = -1.20, σc01 = -0.85, σe01 = -0.25 and σg01 = -2.29 for the ML non-linear model at 50% MNAR; σa01 = -1.58, σc01 = -0.97, σe01 = -0.12 and σg01 = -2.67 for the Bayesian non-linear model at 30% MNAR; σa01 = -1.19, σc01 = -0.79, σe01 = -0.11 and σg01 = -2.09 for the Bayesian non-linear model at 50% MNAR). Finally, the residual error was amplified, with a bias positively associated with the fraction of missing data (σu² = 15.96 for the ML non-linear model at 30% MNAR; σu² = 16.18 for the ML non-linear model at 50% MNAR; σu² = 16.23 for the Bayesian non-linear model at 30% MNAR; σu² = 16.51 for the Bayesian non-linear model at 50% MNAR).
Despite huge biases, the Bayesian non-linear model and ML non-linear model
demonstrated similar pattern of results as in the scenario of MCAR data when compared
to each other. That is, they exhibited comparable performances in estimating fixed effects,
the A component and C components of the biometric latent growth model. The former
showed smaller standard deviation (larger efficiency) on every parameter. Again, when it
came to the residual errors and E components, there was larger bias in the Bayesian
approach and a trade-off effect between these two types of random effects.
When estimating the parameters of the logistic regression part that formulates the assumption about the missingness mechanism, the Bayesian logistic model-II was largely biased from the true values (β_0 = -35.43, σ_0² = 79.73, b_1 = 1.53 for 30% MNAR; β_0 = -28.93, σ_0² = 89.98, b_1 = 1.60 for 50% MNAR). Despite this large bias, it still performed very well in recovering the true model from the MNAR data compared to the Bayesian non-linear and ML non-linear models. First, this model accurately detected the linear growth pattern in the MNAR data (G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 2.00 at 30% MNAR; G0 = 9.98, G1 = 3.02, A[2] = 0.99, A[3] = 1.99 at 50% MNAR). The estimates of the random effects obtained by fitting this model to the MNAR data were also very similar to the values from simulation study I with MCAR data (σ_a0² = 6.45, σ_a1² = 2.22, σ_a01 = -1.67, σ_c0² = 3.76, σ_c1² = 1.88, σ_c01 = -0.95, σ_e0² = 1.40, σ_e1² = 1.80, σ_e01 = -0.14, σ_g0² = 11.61, σ_g1² = 5.91, σ_g01 = -2.76 for 30% MNAR; σ_a0² = 6.27, σ_a1² = 2.31, σ_a01 = -1.64, σ_c0² = 3.84, σ_c1² = 1.87, σ_c01 = -0.96, σ_e0² = 1.41, σ_e1² = 1.80, σ_e01 = -0.11, σ_g0² = 11.52, σ_g1² = 5.98, σ_g01 = -2.71 for 50% MNAR). The estimates of the residual error (σ_u² = 16.17 for 30% MNAR; σ_u² = 16.27 for 50% MNAR) were almost identical to the values from the MCAR data and from MNAR Case I, and were more stable than those of the Bayesian non-linear and ML non-linear models. More details of these results can be found in Table 5.3.
5.4 Summary and Discussion
The influence of MNAR data on model fitting can be destructive when the existence of MNAR is ignored altogether. As the results of this simulation study show, treating the MNAR data as missing (completely) at random led the truly linear growth data to be misread as following a non-linear change pattern. All parameter estimates (fixed and random effects alike) related to the intercept G0 were relatively robust to the MNAR data, given that data at T1 were always complete. However, all estimates of the A, C and E components of the slope and of the intercept-slope covariance were attenuated by the MNAR data, and the residual error was correspondingly amplified. The larger the fraction of missing data, the more biased these results were.
Because the Bayesian approach makes it possible to formulate the assumption about the missing data mechanism in terms of a prior distribution, the true model could be recovered from the MNAR data by applying two different Bayesian logistic models to the two cases of MNAR data. As expected, estimates of the fixed effects G0, G1, A[2] and A[3] were all as accurate as the results from MCAR data. By implementing data augmentation, sampling from the joint distribution of the missing and observed data, these two models also succeeded in restoring the A, C and E components of the slope and of the intercept-slope covariance that the MNAR data had attenuated. Finally, the residual error was thus maintained at a more stable and less biased level, close to that obtained from the MCAR data.
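The data-augmentation idea can be caricatured as a single Gibbs-style imputation draw. This is a deliberately simplified sketch with hypothetical inputs: it draws missing scores from their model-implied distribution given the current parameter values, whereas the full sampler alternates such draws with parameter updates and additionally conditions on the missingness model:

```python
import numpy as np

rng = np.random.default_rng(2)

def augment_missing(y, miss, mu, resid_var, rng=rng):
    """One simplified data-augmentation step: fill missing entries of y
    with draws from N(mu, resid_var), leaving observed entries intact.
    y: scores with NaN at missing positions; miss: boolean missing mask;
    mu: current model-implied means; resid_var: current residual variance."""
    y = y.copy()
    y[miss] = rng.normal(mu[miss], np.sqrt(resid_var))
    return y
```

In the full MCMC run this step is repeated at every iteration, so the imputed values propagate the right amount of uncertainty into the growth-model parameters.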
To summarize, the results of simulation study II demonstrated that the Bayesian logistic models can alleviate the influence of MNAR data to a large extent. However, in real studies it is usually impossible to know for sure whether a specific dataset's missing values are missing at random, let alone what mechanism produces the MNAR data. Therefore, it is also important to know whether the Bayesian approach can detect a misspecified missingness pattern and how sensitive it is to a wrong choice of model.
Chapter 6: Simulation Study III - the Sensitivity of the
Bayesian Model to Misspecification
This study aimed to explore the sensitivity of the Bayesian approach to misspecification. Specifically, to what extent could the Bayesian approach recover the true parameters if the assumption about the missingness mechanism was inappropriate? This question is twofold: (1) could the true parameters be recovered if the assumption about MCAR data was inappropriate, i.e., if MCAR data were mistakenly taken to be MNAR and a Bayesian logistic model was fitted? (2) What if the assumption about MNAR data was inappropriate? The latter may involve treating MNAR data as MCAR, or specifying an inappropriate logistic regression model for the assumed missingness pattern.
6.1 Design
Since examining robustness to various fractions of missing information was not the focus of this study, only data with around 50% missing were considered here. That is, only the 50% MCAR data, the 50% MNAR Case I data and the 50% MNAR Case II data from the previous simulation studies entered this analysis.
6.2 Analysis
For the data in each condition, the non-linear model with a new logistic regression part was fitted. The new logistic regression took the form

logit(p_ij) = b_0i + b_1 Y_ij4 + b_2 Y_ik4,

where p_ij is the probability that twin j of family i was missing at T4, Y_ij4 is that twin's own score at T4, and Y_ik4 is the co-twin's score. The family-level intercept was given the hierarchical prior

b_0i ~ N(β_0, σ_0²),

with priors on β_0, σ_0², b_1 and b_2 specified as in the earlier Bayesian logistic models. Unlike the Bayesian logistic model-I, this model treats the intercept b_0 as a random effect varying by family instead of a fixed effect that is the same for every family. Compared to the Bayesian logistic model-II, this model assumes that the probability that a subject was missing at T4 also depends on his/her sibling. In short, this model is more 'saturated' than Bayesian model-I and Bayesian model-II; to differentiate it, it is referred to as the 'Bayesian logistic model-III' throughout this research.
Therefore, there were 21 parameters subject to comparisons here. The first 15
parameters were the same as those in simulation study I. Then two more fixed effects
(latent basis for T2 – A2 and latent basis for T3 – A3) due to non-linear model were
considered. The extra logit models in the Bayesian logistic model-III added four more
parameters:
0
, σ
0
2
, b
1
and b
2
.
If misspecification could be detected, the following results were expected from the Bayesian logistic model-III: (1) for the MCAR data, the values of b_1 and b_2 should be close to zero on average; no predictions were made for β_0 and σ_0² at this point; (2) for the MNAR data (Case I), the mean values of β_0, b_1 and b_2 should be close to -32.5, 1 and 0.8, respectively (the true parameters of the logit model used to simulate MNAR Case I), while the value of σ_0² stays close to zero on average; and (3) for the MNAR data (Case II), the mean values of β_0, σ_0² and b_1 should be close to -18, 25 and 1, respectively (the true parameters of the logit model used to simulate MNAR Case II), while the mean value of b_2 stays around zero.
Therefore, for the MCAR data and the MNAR data (Case II), an important question naturally followed: what is the type I error for the null hypothesis that the probability that an individual was missing at T4 did not depend on the Y values of that individual or of his/her sibling? For this purpose, the sampling distributions of the 95% credible intervals for the means of b_1 and b_2 from the Bayesian logistic model-III were also examined.
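This check is mechanical once the per-replicate interval endpoints are collected (the array names below are hypothetical): the empirical type I error is simply the fraction of intervals that exclude zero.

```python
import numpy as np

def typeI_error_rate(ci_lower, ci_upper, null_value=0.0):
    """Fraction of replicates whose 95% credible interval excludes the
    null value -- the empirical type I error for that parameter."""
    ci_lower = np.asarray(ci_lower)
    ci_upper = np.asarray(ci_upper)
    rejected = (ci_lower > null_value) | (ci_upper < null_value)
    return rejected.mean()

# e.g. 100 replicate CIs for b_1, of which 7 exclude 0
lo = np.full(100, -0.2); hi = np.full(100, 0.2)
lo[:7] = 0.01; hi[:7] = 0.25
rate = typeI_error_rate(lo, hi)
```

Under a correct null, this rate should hover around the nominal 0.05, which is the comparison made in the results below.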
6.3 Results
MCAR. Despite the misspecified assumption that the probability that an individual was missing at T4 was associated with the Y values of both that subject and his/her sibling, this model still appeared to succeed in detecting the true model from the data. The estimates of all fixed effects in the biometric latent growth model were very accurate and retained the linear pattern (G0 = 9.97, G1 = 3.01, A[2] = 1.00, A[3] = 2.00). Compared to the regular Bayesian linear model, one of the A components was amplified by a notable amount and was more positively biased (σ_a0² = 6.32 for model-III vs. σ_a0² = 6.14 for the regular linear model), while the remaining A components were less biased or equally biased (σ_a1² = 2.18, σ_a01 = -1.54 for model-III vs. σ_a1² = 2.21, σ_a01 = -1.54 for the regular linear model). The opposite trend was found for the C components: one declined by a notable amount and was thus more negatively biased (σ_c0² = 3.76 for model-III vs. σ_c0² = 3.95 for the regular linear model), while the remaining C components were amplified and became less biased (σ_c1² = 1.87, σ_c01 = -1.00 for model-III vs. σ_c1² = 1.85, σ_c01 = -1.05 for the regular linear model). However, this model performed better in estimating all E components (σ_e0² = 1.41, σ_e1² = 1.75, σ_e01 = -0.12 for model-III vs. σ_e0² = 1.36, σ_e1² = 1.73, σ_e01 = -0.08 for the regular linear model). Two of the total variances increased slightly and were less biased (σ_g0² = 11.48, σ_g1² = 5.80 for model-III vs. σ_g0² = 11.45, σ_g1² = 5.79 for the regular linear model). Finally, the residual error was smaller and less biased (σ_u² = 16.28 for model-III vs. σ_u² = 16.32 for the regular linear model).
However, among the 100 replications, 6 datasets failed to converge even after 110,000 iterations with the first 10,000 as burn-in. Second, although this model seemed to succeed in detecting the true model even under misspecification, contrary to the prediction that the values of b_1 and b_2 should be close to zero on average, the mean values of these two parameters were -2.00 (SD = 14.42) and 1.27 (SD = 8.90), respectively. The mean estimate of the fixed effect of the intercept (β_0) was only -6.71 (SD = 28.26), while the random effect of the intercept was extremely large (σ_0² = 4,891,997, SD = 35,586,508). These results are summarized in Table 6.1.
After examining the result for each dataset individually, it was found that most datasets had values of σ_0² in the range of 200 to 500, while 10 of the 94 converged datasets produced huge values of σ_0² and could be regarded as outliers. Given this divergence, the 94 datasets were divided into two groups: group 1, containing the 84 normal datasets with moderate values of σ_0², and group 2, containing the 10 outlier datasets with huge values of σ_0².
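The grouping step amounts to a threshold split on the estimated σ_0² values. The threshold of 1000 below is an assumption chosen to separate the reported 200-500 range from outliers that are orders of magnitude larger:

```python
import numpy as np

def split_by_variance(sigma0_sq, threshold=1000.0):
    """Split converged replicates into 'normal' and 'outlier' groups by
    the size of the random-intercept variance sigma_0^2."""
    sigma0_sq = np.asarray(sigma0_sq)
    normal = sigma0_sq <= threshold
    return sigma0_sq[normal], sigma0_sq[~normal]

# illustrative values: four moderate replicates and two extreme ones
vals = np.array([250.0, 445.0, 320.0, 4.6e7, 500.0, 9.1e6])
group1, group2 = split_by_variance(vals)
```

Summary statistics (means, SDs of the growth-model parameters) are then recomputed within each group, as in the two right-hand columns of Table 6.1.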
For group 1, with the 84 normal datasets, the mean values of b_1 and b_2 were now close to zero, at 0.02 (SD = 0.13) and 0.00 (SD = 0.10), respectively. The mean value of β_0 was -1.49 in this group, while the random effect of the intercept (σ_0²) was only 445.24. The major improvements in this group were observed for σ_a0² (decreased from 6.32 across all 94 converged datasets to 6.16 for the 84 normal ones) and σ_c0² (increased from 3.76 for all 94 datasets to 3.89 for the 84). There was also a slight decline in the residual error (σ_u² = 16.28 for the 94 datasets vs. σ_u² = 16.27 for the 84 normal datasets).
The performance of the second group, containing only the outlier datasets, was much worse than that of group 1. In the logit part, the mean value of β_0 was -50.55 in this group, while the random effect of the intercept (σ_0²) was a huge 45,981,035. As a result, the mean values of b_1 and b_2 were no longer zero but lay in opposite directions, at -18.91 (SD = 42.36) and 11.92 (SD = 26.02), respectively. The fixed effects were less accurate but within a tolerable range (G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 2.00 for group 1 vs. G0 = 9.98, G1 = 3.01, A[2] = 1.02, A[3] = 2.03 for group 2). The worst performance occurred for four components: σ_a0² (increased from 6.16 for group 1 to 7.62 for group 2), σ_a01 (decreased from -1.51 for group 1 to -1.75 for group 2), σ_c0² (decreased from 3.89 for group 1 to 2.63 for group 2), and σ_c01 (increased from -1.02 for group 1 to -0.84 for group 2). The residual error was also larger and more positively biased (σ_u² = 16.27 for group 1 vs. σ_u² = 16.42 for group 2).
Figure 6.1 displays the sampling distributions of the 95% credible intervals (CIs) for the means of b_1 and b_2 obtained by fitting the Bayesian logistic model-III to the MCAR data. As can be seen from this figure, for the mean of b_1 there were 7 out of 100 datasets whose 95% CIs did not cover the value 0, while for the mean of b_2 there were 6 out of 100 not covering 0. This type-I error was very close to the nominal level of 0.05.
Figure 6.1: Distribution of sample credible intervals for the means of b_1 (panel a) and b_2 (panel b), from fitting the Bayesian logistic model-III to MCAR data
MNAR Data Case I. When the Bayesian logistic model-III was fitted to the MNAR data (Case I), the mean value of the random effect of the intercept (σ_0²) was only 0.22 (SD = 0.43), close to 0 as expected. However, the Bayesian model-III was more biased than the Bayesian model-I in estimating the other parameters of the logit part: the strengths of the associations between the Y values and the attrition probability decreased by nearly half relative to the Bayesian logistic model-I (b_1 = 0.63, b_2 = 0.50 for model-III vs. b_1 = 1.13, b_2 = 0.91 for model-I), and similarly for the fixed effect of the intercept (β_0 = -20.27 for model-III vs. b_0 = -36.82 for model-I).
Despite this, the capability of the Bayesian logistic model-III to detect the true model from MNAR data Case I remained reliable. The linear growth pattern of the true model was successfully restored (G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 2.00). The positive bias of the A components was amplified but still within an acceptable range (σ_a0² = 6.25, σ_a1² = 2.29, σ_a01 = -1.64 for model-III vs. σ_a0² = 6.20, σ_a1² = 2.27, σ_a01 = -1.63 for model-I). The performance on the C components was almost comparable to model-I (σ_c0² = 3.85, σ_c1² = 1.95, σ_c01 = -0.97 for model-III vs. σ_c0² = 3.89, σ_c1² = 1.86, σ_c01 = -0.97 for model-I). Estimates of the E components matched model-I (σ_e0² = 1.43, σ_e01 = -0.10 for both model-III and model-I), except for a small decrease in the E component of the slope (σ_e1² = 1.70 for model-III vs. σ_e1² = 1.72 for model-I). Interestingly, the total variances from the Bayesian logistic model-III were all larger, and thus less biased, than those of the Bayesian logistic model-I (σ_g0² = 11.54, σ_g1² = 5.94, σ_g01 = -2.71 for model-III vs. σ_g0² = 11.45, σ_g1² = 5.79, σ_g01 = -2.67 for model-I). As a result, the estimate of the residual error was smaller in this model (σ_u² = 16.23 for model-III vs. σ_u² = 16.27 for model-I). These results are also summarized in Table 6.2.
MNAR Data Case II. When the Bayesian logistic model-III was fitted to the MNAR data (Case II), the association between the probability that an individual was missing at T4 and the Y value of that subject's sibling was almost 0, as expected (b_2 = -0.02, SD = 0.05). On average, the strength of the association between the probability that an individual was missing at T4 and his/her own Y value decreased, but was less biased than in the Bayesian logistic model-II (b_1 = 1.36 for model-III vs. b_1 = 1.60 for model-II). So was the random effect of the intercept (σ_0² = 61.51 for model-III vs. σ_0² = 89.98 for model-II). Compared to the Bayesian logistic model-II, the only parameter with a larger estimation bias in this model was the fixed effect of the intercept (β_0 = -31.04 for model-III vs. β_0 = -28.93 for model-II).
Figure 6.2 also displays the sampling distributions of the 95% credible intervals (CIs) for the mean of b_2 obtained by fitting the Bayesian logistic model-III to the MNAR data (Case II). As can be seen from this figure, among the 100 datasets there were 5 whose 95% CIs did not cover the value 0, which is exactly the nominal type-I error of 0.05.
The Bayesian logistic model-III also performed well in recovering the true model from MNAR data Case II. As can be seen from Table 6.2, the estimates of all fixed effects of the biometric latent growth model were very accurate and exhibited a linear pattern (G0 = 9.98, G1 = 3.01, A[2] = 1.00, A[3] = 2.00). Compared to the Bayesian logistic model-II, the absolute values of two A components were amplified by a notable amount and more positively biased in this model (σ_a0² = 6.50, σ_a01 = -1.70 for model-III vs. σ_a0² = 6.27, σ_a01 = -1.64 for model-II), while the remaining A component decreased and was less positively biased (σ_a1² = 2.22 for model-III vs. σ_a1² = 2.31 for model-II). The opposite trend was found for the C components: the absolute values of two C components were attenuated by a notable amount and more negatively biased (σ_c0² = 3.72, σ_c01 = -0.91 for model-III vs. σ_c0² = 3.84, σ_c01 = -0.96 for model-II), while the remaining C component increased and was less negatively biased (σ_c1² = 1.89 for model-III vs. σ_c1² = 1.87 for model-II). However, this model performed better in estimating all E components (σ_e0² = 1.43, σ_e1² = 1.85, σ_e01 = -0.14 for model-III vs. σ_e0² = 1.41, σ_e1² = 1.80, σ_e01 = -0.11 for model-II). Two of the total variances increased (σ_g0² = 11.66, σ_g01 = -2.76 for model-III vs. σ_g0² = 11.52, σ_g01 = -2.72 for model-II), while the other decreased slightly (σ_g1² = 5.96 for model-III vs. σ_g1² = 5.98 for model-II). Finally, the residual error was smaller and less biased (σ_u² = 16.15 for model-III vs. σ_u² = 16.27 for model-II).
Figure 6.2: Distribution of sample credible intervals for the mean of b_2, from fitting the Bayesian logistic model-III to MNAR data (Case II)
Table 6.1: Results of fitting the Bayesian model-III to MCAR data

Parameter          | 50% MCAR      | 50% MCAR        | 50% MCAR (converged)(a)
                   | Linear        | Logistic-III    | 84 datasets,     | 10 datasets,
                   |               |                 | moderate σ_0²    | huge σ_0²
Fixed effects
 G0 = 10           | 9.98 (0.17)   | 9.97 (0.17)     | 9.97 (0.18)      | 9.97 (0.17)
 G1 = 3            | 3.00 (0.12)   | 3.01 (0.14)     | 3.01 (0.15)      | 3.01 (0.14)
 A[2] = 1          | = 1           | 1.00 (0.05)     | 1.00 (0.05)      | 1.02 (0.06)
 A[3] = 2          | = 2           | 2.00 (0.07)     | 2.00 (0.07)      | 2.03 (0.08)
Random effects
 σ_u² = 16         | 16.32 (0.62)  | 16.28 (0.60)    | 16.27 (0.58)     | 16.42 (0.76)
Additive genetic
 σ_a0² = 6         | 6.14 (2.53)   | 6.32 (2.45)     | 6.16 (2.45)      | 7.62 (2.12)
 σ_a1² = 2         | 2.21 (0.78)   | 2.18 (0.77)     | 2.20 (0.80)      | 1.93 (0.48)
 σ_a01 = -1.5      | -1.54 (1.25)  | -1.54 (1.18)    | -1.51 (1.19)     | -1.75 (1.13)
Shared environ.
 σ_c0² = 4         | 3.95 (2.20)   | 3.76 (2.06)     | 3.89 (2.08)      | 2.63 (1.51)
 σ_c1² = 2         | 1.85 (0.66)   | 1.87 (0.67)     | 1.85 (0.66)      | 2.08 (0.70)
 σ_c01 = -1        | -1.05 (1.05)  | -1.00 (1.01)    | -1.02 (1.03)     | -0.84 (1.88)
Unique environ.
 σ_e0² = 2         | 1.36 (0.70)   | 1.41 (0.68)     | 1.41 (0.71)      | 1.41 (0.38)
 σ_e1² = 2         | 1.73 (0.44)   | 1.75 (0.44)     | 1.76 (0.45)      | 1.70 (0.41)
 σ_e01 = -0.5      | -0.08 (0.43)  | -0.12 (0.45)    | -0.11 (0.45)     | -0.21 (0.40)
Total variance
 σ_g0² = 12        | 11.45 (0.99)  | 11.48 (0.97)    | 11.46 (0.99)     | 11.66 (0.86)
 σ_g1² = 6         | 5.79 (0.49)   | 5.80 (0.49)     | 5.81 (0.50)      | 5.71 (0.43)
 σ_g01 = -3        | -2.67 (0.54)  | -2.66 (0.55)    | -2.64 (0.54)     | -2.80 (0.61)
Logit param.
 β_0 = NA          | --            | -6.71 (28.26)   | -1.49 (4.08)     | -50.55 (75.56)
 σ_0² = NA         | --            | 4.89E6 (3.56E7) | 445.24 (401.89)  | 4.60E7 (1.05E8)
 b_1 = 0           | --            | -2.00 (14.42)   | 0.02 (0.13)      | -18.91 (42.36)
 b_2 = 0           | --            | 1.27 (8.90)     | 0.00 (0.10)      | 11.92 (26.02)

a. 6 of the 100 datasets did not converge after 110,000 iterations with the first 10,000 as burn-in; the 94 converged datasets are broken down in the two right-hand columns.
Table 6.2: Results of fitting the Bayesian model-III to MNAR data

Parameter          | 50% MNAR (Case I)               | 50% MNAR (Case II)
                   | Logistic-I(a) | Logistic-III(a) | Logistic-II(a)  | Logistic-III(b)
Fixed effects
 G0 = 10           | 9.98 (0.18)   | 9.98 (0.18)     | 9.98 (0.18)     | 9.98 (0.17)
 G1 = 3            | 3.01 (0.13)   | 3.01 (0.14)     | 3.02 (0.17)     | 3.01 (0.12)
 A[2] = 1          | 1.00 (0.06)   | 1.00 (0.06)     | 0.99 (0.07)     | 1.00 (0.05)
 A[3] = 2          | 2.00 (0.08)   | 2.00 (0.09)     | 1.99 (0.10)     | 2.00 (0.05)
Random effects
 σ_u² = 16         | 16.27 (0.66)  | 16.23 (0.66)    | 16.27 (0.68)    | 16.15 (0.55)
Additive genetic
 σ_a0² = 6         | 6.20 (2.61)   | 6.25 (2.59)     | 6.27 (2.58)     | 6.50 (2.54)
 σ_a1² = 2         | 2.27 (0.83)   | 2.29 (0.83)     | 2.31 (0.81)     | 2.22 (0.83)
 σ_a01 = -1.5      | -1.63 (1.25)  | -1.64 (1.29)    | -1.64 (1.27)    | -1.70 (1.29)
Shared environ.
 σ_c0² = 4         | 3.89 (2.19)   | 3.85 (2.19)     | 3.84 (2.18)     | 3.72 (2.26)
 σ_c1² = 2         | 1.86 (0.62)   | 1.95 (0.66)     | 1.87 (0.68)     | 1.89 (0.67)
 σ_c01 = -1        | -0.97 (1.01)  | -0.97 (1.05)    | -0.96 (1.07)    | -0.91 (1.04)
Unique environ.
 σ_e0² = 2         | 1.43 (0.73)   | 1.43 (0.74)     | 1.41 (0.77)     | 1.43 (0.73)
 σ_e1² = 2         | 1.72 (0.43)   | 1.70 (0.43)     | 1.80 (0.49)     | 1.85 (0.68)
 σ_e01 = -0.5      | -0.10 (0.45)  | -0.10 (0.46)    | -0.11 (0.48)    | -0.14 (0.50)
Total variance
 σ_g0² = 12        | 11.45 (0.99)  | 11.54 (1.03)    | 11.52 (1.05)    | 11.66 (1.08)
 σ_g1² = 6         | 5.79 (0.49)   | 5.94 (0.54)     | 5.98 (0.74)     | 5.96 (0.82)
 σ_g01 = -3        | -2.67 (0.54)  | -2.71 (0.54)    | -2.71 (0.53)    | -2.76 (0.53)
Logit param. (true value: Case I / Case II)
 β_0 (-32.5 / -18) | -36.82 (6.06) | -20.27 (2.86)   | -28.93 (10.33)  | -31.04 (8.51)
 σ_0² (0 / 25)     | --            | 0.22 (0.43)     | 89.98 (58.51)   | 61.51 (36.24)
 b_1 (1 / 1)       | 1.13 (0.18)   | 0.63 (0.09)     | 1.60 (0.56)     | 1.36 (0.38)
 b_2 (0.8 / 0)     | 0.91 (0.15)   | 0.50 (0.07)     | --              | -0.02 (0.05)

a. 1 of the 100 datasets did not converge after 110,000 iterations with the first 10,000 as burn-in.
b. 3 of the 100 datasets did not converge after 110,000 iterations with the first 10,000 as burn-in.
6.4 Summary and Discussion
In general, the Bayesian approach was capable of detecting the true model even under misspecification. For each missing data condition in this simulation study (MCAR, MNAR Case I and MNAR Case II), the Bayesian logistic model-III was able to recover the true model from the data. It may not have been as accurate as the correctly specified models (for example, the Bayesian logistic model-I for MNAR Case I) in estimating random components such as A, C and E, but it demonstrated comparable performance on all fixed effects and smaller biases on the total variances and the residual error.
In addition, the estimates of the logit part were very sensitive. Even when the assumption about the missingness mechanism was inappropriate and the wrong logit part was therefore formulated, the approach performed very well at rejecting the inappropriate assumption through comparison of the logistic regression parameters with zero. Taking the MCAR data as an example, when fitted with the Bayesian logistic model-III, the values of b_1 (M = 0.02, SD = 0.13) and b_2 (M = 0.00, SD = 0.10) were both close to zero (in group 1). Examination of the sampling distributions of the 95% credible intervals for the means of b_1 and b_2 showed that the actual type-I errors of incorrectly rejecting the null hypothesis (that the probability an individual was missing at T4 did not depend on the Y values of that individual or his/her sibling) were very close to the nominal level of 0.05.
An interesting finding in the MCAR condition was that convergence was less likely to be achieved than in the other conditions. When the MCAR data were fitted with the Bayesian logistic model-III, 6 out of 100 datasets failed to converge after 110,000 iterations, and these convergence failures evidently could not be remedied by increasing the number of iterations (see Figure A.2 in Appendix A for an example). Moreover, even among the remaining 94 converged datasets, there were 10 outliers with huge variances of the intercept. This may suggest that these abnormal results were themselves a sign that the Bayesian logistic model-III was not appropriate for the dataset and that the assumption needed to be revised.
However, not every misspecification can be detected as well as the Bayesian logistic model-III detected it in this study. Fitting the Bayesian logistic model-I to MNAR data Case II, or the Bayesian logistic model-II to MNAR data Case I, could also distort the results and lead to invalid conclusions (see Tables A.4 and A.5 in Appendix A for details). The key factor that enabled misspecification to be detected here was that the Bayesian logistic model-III was a more 'saturated' model than the Bayesian logistic model-I and model-II. Put another way, the Bayesian logistic model-I and model-II are just reduced versions of the Bayesian logistic model-III, with either one fixed effect or the random effect set to zero.
Therefore, in real studies, when exploring the missingness pattern of empirical data, it is always safer to start from the most saturated logit model, with the observed variable itself as well as all potentially relevant covariates (e.g., ethnicity, gender, socioeconomic status) as predictors, and then decide, mostly through credible intervals or significance tests, whether it is appropriate to switch to a reduced model by removing a predictor or random effect. This procedure is very similar to backward elimination in stepwise regression (Hocking, 1976).
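This elimination procedure can be sketched as a loop over credible intervals. Here `fit` is a hypothetical stand-in for refitting the Bayesian logit model and returning each predictor's 95% CI, and the toy intervals below only loosely mirror those reported in chapter 7:

```python
def backward_eliminate(params, fit):
    """CI-based backward elimination for the logit part: refit, then drop
    the predictor whose 95% credible interval most comfortably covers 0,
    and repeat until every remaining predictor is credibly non-zero."""
    params = list(params)
    while params:
        cis = fit(params)  # {name: (lower, upper)} per predictor
        droppable = {p: ci for p, ci in cis.items() if ci[0] <= 0.0 <= ci[1]}
        if not droppable:
            break
        # drop the predictor whose interval midpoint is closest to zero
        drop = min(droppable, key=lambda p: abs(sum(droppable[p]) / 2))
        params.remove(drop)
    return params

# toy stand-in for a refit: fixed intervals regardless of the model
def toy_fit(params):
    table = {"y_self": (0.4, 1.9), "y_sib": (-0.3, 0.4), "gender": (-6.4, -0.6)}
    return {p: table[p] for p in params}

kept = backward_eliminate(["y_self", "y_sib", "gender"], toy_fit)
```

In practice each step requires a full MCMC refit, so the loop is expensive but mirrors backward elimination in stepwise regression.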
Chapter 7: Revisiting the Empirical Study
In this chapter, the empirical longitudinal twin data set from the CPS was revisited. The Bayesian logistic models were fitted to these data and compared with the results of the ML non-linear model from chapter 1.
7.1 Result
Two Bayesian logistic models (logistic model-III and logistic model-IV) were fitted to investigate the underlying genetic and environmental etiology of individual differences in both the level (intercept) and change (slope) of psychopathic personality. The Bayesian logistic model-III was the same as in simulation study III, predicting the attrition probability at wave 4 from both the subject's and his/her sibling's levels of psychopathic traits, while the Bayesian logistic model-IV added one more predictor (gender) to the logit model. The purpose of model-IV was to examine whether the attrition probability differed by gender.
Estimates from these two models suggested that females were less likely than males to drop out at wave 4 (b_3 = -3.15, CI: -6.36, -0.60). After taking gender differences in attrition into account, the probability that a subject was missing at wave 4 was significantly positively related only to his/her own level of psychopathic traits at that wave (b_1 = 1.33, CI: 0.17, 2.28), with β_0 = -17.48 and σ_0² = 675.33.
The fixed effects of this model were G0 = 11.08, G1 = 0.28 and A[t] = (0, -0.48, 2.46, 3). The random effects from this model can be used to derive the relative genetic and environmental influences on both the level of and change in psychopathic personality. The total variance of G0 was composed of the additive genetic component
Table 7.1: Results of fitting different models to the CPS data

Parameter          | ML Non-linear | Bayesian Logistic-III | Bayesian Logistic-IV
Fixed effects
 G0                | 11.11         | 11.08                 | 11.08
 G1                | 0.20          | 0.29                  | 0.28
 A[2]              | -1.18         | -0.46                 | -0.48
 A[3]              | 3.49          | 2.45                  | 2.46
Random effects
 σ_u²              | 2.35          | 2.23                  | 2.23
Additive genetic
 σ_a0²             | 0.94          | 0.96                  | 0.97
 σ_a1²             | 0.04          | 0.07                  | 0.07
 σ_a01             | 0.02          | -0.07                 | -0.07
Shared environ.
 σ_c0²             | 0.53          | 0.54                  | 0.53
 σ_c1²             | 0.01          | 0.06                  | 0.06
 σ_c01             | -0.08         | -0.05                 | -0.05
Unique environ.
 σ_e0²             | 0.22          | 0.28                  | 0.28
 σ_e1²             | 0.00          | 0.06                  | 0.06
 σ_e01             | 0.01          | -0.02                 | -0.01
Total variance
 σ_g0²             | 1.69          | 1.78                  | 1.78
 σ_g1²             | 0.05          | 0.19                  | 0.19
 σ_g01             | -0.05         | -0.14                 | -0.13
Logit param.
 β_0               | --            | -20.17                | -17.48
 σ_0²              | --            | 440.0                 | 675.33
 b_1               | --            | 1.18 (0.64, 1.93)     | 1.33 (0.17, 2.28)
 b_2               | --            | 0.88 (0.31, 1.61)     | 0.97 (-0.32, 1.94)
 b_3               | --            | --                    | -3.15 (-6.36, -0.60)

Note. b_1 is the regression coefficient measuring the association between the probability that an individual dropped out at wave 4 and his/her own level of psychopathic traits at wave 4; b_2 is the regression coefficient for his/her sibling's level of psychopathic traits; and b_3 is the regression coefficient for the individual's gender.
(σ_a0²), the shared environmental component (σ_c0²) and the non-shared environmental component (σ_e0²): 0.97 + 0.53 + 0.28 = 1.78. The relative genetic influence due to additive genetic effects was therefore 54% (= 0.97/1.78); the majority of the variance in the level of psychopathic personality was explained by genetic effects. Shared environment explained 30% (= 0.53/1.78) of the variance, while non-shared environment accounted for the remaining 16% (= 0.28/1.78). Similarly, the relative genetic and environmental influences can be calculated for the variation in the slope G1 (total variance = 0.07 + 0.06 + 0.06 = 0.19): genetic effects accounted for 36% of the slope variance, while shared and non-shared environmental effects each explained 32%. The overall covariance between the intercept G0 and the slope G1 was negative (-0.07 - 0.05 - 0.01 = -0.13), meaning that the higher the level at the initial wave, the more slowly a subject's level of psychopathic traits would increase over the following three waves. Finally, the residual variance unique to each wave (σ_u²) was 2.23.
7.2 Discussion
Differences between ML and Bayesian models
First of all, the results of the logit part of the Bayesian logistic-III model implied that subjects missing at T4 tended to be those with higher levels of psychopathic traits. After taking this into account, the Bayesian logistic-III model produced very different results from those of the ML non-linear model. The first major difference occurred in the trajectory of the non-linear change: instead of the cubic trajectory of the ML non-linear model, the development of psychopathic traits from ages 9-18 exhibited a quadratic trajectory with its nadir at T2 in the Bayesian logistic model. The variation in the slope and the covariance between intercept and slope were greatly boosted, the former to more than three times and the latter to more than twice the corresponding ML values. When decomposed into its sources, this increment in variation was mostly attributable to the shared and non-shared environmental effects on the slope, and then to the genetic influence on the covariance between intercept and slope. Since more variability was accounted for by the genetic and environmental effects of the model, the remaining residual error was smaller than that of the ML model. The influence of MNAR data was therefore large, and accounting for its existence can yield very different conclusions from those of regular models. The implications of these different conclusions for the theory of psychopathy are discussed in detail in the following section.
Discussion regarding theory of psychopathy
This study used a longitudinal sample to investigate the developmental pathways of psychopathic traits from childhood to late adolescence. Through the twin design, it also examined how genetic and environmental effects contributed to individual differences in this developmental path. The main findings suggested that: (1) the developmental trajectory of psychopathic traits from childhood to adolescence took a quadratic pattern, with the mean level declining from ages 9-10 to ages 11-13 but then accelerating through ages 14-18; (2) the majority of the variance in the initial level (the intercept) was attributable to genetic influences (54%), while shared environmental effects explained 30% and non-shared environmental effects the remaining 16%; (3) genetic, shared environmental and non-shared environmental effects contributed almost equally to the variation in the slope (36%, 32% and 32%, respectively).
80
The finding that psychopathic traits developed with a turning point around ages
11-13 was an interesting one. The age range 11-13 is usually the transition period from
childhood to early adolescence and children experience various significant changes on
both social and biological levels. On the one hand, most subjects just finish elementary
school and enter their 1
st
year in middle school in terms of education level. They begin to
spend more time with peers and less time with family and also form their identity by
exploring different clothes, hairstyles, friends, music, and hobbies (PAM, 2001). On the
other hand, subjects begin puberty characterized by a series of physiological changes in
height, weight, body composition, circulatory and respiratory systems, hormone
production (Marshal, 1978) and also brain structure (Choudhury et al., 2006).
In parallel with these changes, it is not surprising that the stability of
antisocial personality between ages 14 and 18 was greater than that between ages 10
and 14 (Farrington, 2005). In another study on the longitudinal invariance and
temporal stability of interpersonal callousness (IC, an important component of
psychopathy) from childhood to adolescence, behavioral manifestations of IC based on
parent reports were found to vary with age, with one similar factor structure at
ages 8-11 and a second, invariant one at ages 12-16 (Obradović, Pardini, Long, &
Loeber, 2007). Both studies, together with the present results, suggest that
something different happens around ages 11-13 in the developmental pathway of
psychopathic traits. The underlying mechanism at this turning point deserves further
investigation: for example, whether the relationships of psychopathic personality
traits with poor decision-making, cognition, and electrodermal and cardiovascular
indicators (Gao et al., 2009) differ at this age range from other age ranges, and
what factors may contribute to the decline found in this period.
To date, no study has investigated the contributions of genetic and environmental
factors to variance in changes of psychopathic traits from childhood to late
adolescence. Results of this study could therefore only be compared with a recent
study of antisocial behavior that used biometric growth models. In that study,
genetic influences were largely responsible for the initial level of antisocial
behavior, while non-shared environmental influences were primarily responsible for
change from late adolescence through mid-adulthood (Burt et al., 2007). In contrast
to Burt et al., this study found moderate effects of the shared environment on the
initial level of psychopathic traits; moreover, genetic, shared environmental, and
non-shared environmental factors exerted almost comparable influences on change in
psychopathic traits from childhood to adolescence.
The divergence between Burt et al.'s study and this one might arise from several
sources. First, psychopathy is a multidimensional construct characterized by a
cluster of distinct facets: callous-unemotional affective traits, arrogant-deceitful
interpersonal traits, and impulsive-irresponsible behavioral traits (Hare, 1991;
Lynam, 1997), as well as a supplemental behavioral-deviance (externalizing) facet
comprising conduct problems, juvenile delinquency, and criminal versatility (Frick &
Hare, 2001; Hare & Neumann, 2006). As a construct tapping more personality facets
than antisocial behavior alone, it also encompasses more factors driving its origins
and development. Another reason for the divergence could be the different
developmental stage examined in this study (childhood to late adolescence) compared
with Burt et al.'s (late adolescence to early adulthood). Subjects in this study
were at a life stage more closely connected with families and peers than those
moving from late adolescence to early adulthood, which left more room for the
influence of shared environmental factors.
A few limitations of the present study must be considered when interpreting these
findings. First, this study modeled change in psychopathic traits on the basis of
assessment wave, which merged two to three different ages. This strategy could
attenuate or even mask the real trend, because values at the turning point are
merged with those at other ages in the same wave. A future direction would be to use
age as the basis for modeling change in psychopathic traits, which would be more
sensitive and accurate in detecting exactly where the turning point lies. To achieve
this, Bayesian estimation could also be employed, which has proven a better approach
than conventional methods for estimating unknown change points of this kind (Wang &
McArdle, 2008).
Another limitation was inadequate use of the sample. Research on psychopathy has
until recently relied almost exclusively on male samples. Relatively little is known
about psychopathy in females, and findings on sex differences in psychopathy are
mixed given the few existing studies (Cale & Lilienfeld, 2002). Although the number
of female subjects in the current sample was approximately as large as that of
males, the analyses did not differentiate between the sexes (due to technical
considerations discussed in the following chapter), leaving potentially important
sex differences undetected. This lack of discrimination, together with modeling
change on the basis of merged ages, may inflate the estimate of shared environmental
effects in the current study.
In conclusion, our results demonstrated that the change in psychopathic traits from
childhood to late adolescence followed a quadratic pattern with a turning point
around ages 11-13. This period merits further investigation of the relationships of
psychopathic traits with other (physiological, neural, executive function, etc.)
correlates in comparison with other periods. Identifying the relevant correlates at
this turning point may help decipher the etiology of psychopathy and facilitate its
prevention. Although the variance of the initial level was largely attributable to
genetic factors, individual differences in changes of psychopathic traits were under
almost equal influences from genetic, shared environmental, and non-shared
environmental factors. This implies that the developmental pathway of psychopathic
traits from childhood to adolescence is subject to considerable plasticity and could
be intervened upon to a large extent if appropriate treatments were available.
Chapter 8: Conclusions and Integration
The overarching aims of this research were to find a way to evaluate assumptions
about the missingness pattern in the data and to minimize its adverse effects on
statistical inferences and conclusions when data are missing not at random. The
Bayesian approach, implemented through MCMC algorithms, provided possible bridges to
these aims and was examined in multiple phases under a longitudinal twin design. In
the first simulation study, the Bayesian approach and direct ML estimation were
compared under the condition of MCAR data, and their robustness to varying fractions
of missing data (0%, 30%, and 50%) was examined. The Bayesian approach performed as
well as direct ML estimation for the fixed effects and for a subset of the random
effects, namely the A and C components. The Bayesian approach, however, was not
effective in differentiating the residual error from the E components of the latent
factors (G0 and G1) and was not as robust as ML estimation for these two types of
random effects.
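For concreteness, MCAR missingness of the kind examined in simulation study I can be imposed by deleting a fixed fraction of values completely at random. The sketch below is plain Python, not the SAS/OpenBUGS code used in this research, and the sample sizes are illustrative; it makes the MCAR definition explicit: which values go missing is independent of all data, observed or not.

```python
import random

def make_mcar(values, fraction, rng):
    """Return a copy of `values` with `fraction` of entries set to None,
    chosen completely at random: missingness is independent of both
    observed and unobserved values (the MCAR definition)."""
    out = list(values)
    n_miss = int(round(fraction * len(out)))
    for i in rng.sample(range(len(out)), n_miss):
        out[i] = None
    return out

rng = random.Random(4321)
scores = [rng.gauss(10.0, 4.0) for _ in range(1240)]  # e.g., 310 families x 4 waves
mcar30 = make_mcar(scores, 0.30, rng)
mcar50 = make_mcar(scores, 0.50, rng)
```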
Despite this shortcoming, the Bayesian approach exhibited satisfactory capability
for recovering the true model from MNAR data in simulation study II. In that study,
the results of fitting the regular Bayesian non-linear and ML non-linear models
showed how far statistical inferences and conclusions could stray from the true
model if one mistakenly treated MNAR data as MCAR. By explicitly formulating an
appropriate assumption about the missingness pattern via a logistic regression, and
incorporating this regression as an extra part of the whole model to be estimated,
the Bayesian logistic approach succeeded in detecting the linear growth pattern
underlying data that appeared non-linear due to MNAR. The estimates of the
random-effects model were almost as accurate as those in the MCAR datasets.
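The MNAR mechanism described here is a selection model: the probability of dropout depends, through a logistic regression, on the value that would have been observed. A minimal Python sketch follows, with illustrative coefficients b0 and b1 that are not the values from the simulation design.

```python
import math
import random

def mnar_dropout(y, b0, b1, rng):
    """Selection-model sketch of MNAR: the probability that a score is
    missing depends, via a logistic regression, on the score itself.

        P(missing | y) = 1 / (1 + exp(-(b0 + b1 * y)))
    """
    p_miss = 1.0 / (1.0 + math.exp(-(b0 + b1 * y)))
    return None if rng.random() < p_miss else y

rng = random.Random(7)
complete = [rng.gauss(10.0, 4.0) for _ in range(1000)]
# b0 and b1 are illustrative, not the simulation-design values.
observed = [mnar_dropout(y, b0=-12.0, b1=1.0, rng=rng) for y in complete]
# High scores drop out more often, so the observed mean is biased
# downward relative to the complete-data mean.
```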
The third simulation study assessed whether the Bayesian approach was sensitive to
misspecification, given that in real studies it is usually impossible to know for
sure whether a specific dataset is missing at random, let alone what mechanism might
produce MNAR data. The results suggested that, as long as the logistic regression
equation in the Bayesian logistic model was a saturated model taking as many
predictors as possible into account, the Bayesian approach could correctly detect
misspecification with an acceptable Type I error rate.
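Although MAR and MNAR cannot be distinguished from the observed data alone, the MCAR assumption is partly checkable: if later missingness is related to earlier observed scores, the data are not MCAR. Below is a simplified, frequentist-flavored sketch of such a check; the helper and data are hypothetical and are not part of the Bayesian logistic model described above.

```python
def mcar_mean_check(wave1, dropped_later):
    """Crude observable check on the MCAR assumption: compare wave-1
    means for subjects who later drop out vs. those who stay.  A clear
    difference contradicts MCAR, but it cannot separate MAR from MNAR,
    since MNAR depends on values that were never observed."""
    stay = [y for y, d in zip(wave1, dropped_later) if not d]
    gone = [y for y, d in zip(wave1, dropped_later) if d]
    return sum(stay) / len(stay), sum(gone) / len(gone)

# Hypothetical illustration: dropouts scored higher at wave 1.
wave1 = [8.0, 9.0, 10.0, 15.0, 16.0, 17.0]
dropped_later = [False, False, False, True, True, True]
stay_mean, gone_mean = mcar_mean_check(wave1, dropped_later)
```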
This flexibility in defining models and the excellent ability to capture the true
models nonetheless come at a price. The MCMC algorithm, the core element of the
Bayesian approach, is highly demanding in computation time. On a desktop computer
with an Intel i5 2400 CPU at 3.10 GHz and 4 GB of RAM, it took around 5 minutes on
average to run 10,000 iterations in OpenBUGS and around 45 minutes to run 100,000
iterations, much longer than the roughly 10 seconds required by Mplus. Convergence
of multiple chains is sometimes difficult to obtain and becomes more so as model
complexity increases (see Appendix A for examples of trace plots and density plots
for two parallel chains).
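Convergence of parallel chains is commonly judged with diagnostics such as the potential scale reduction factor of Gelman and Rubin (1992), which compares between-chain and within-chain variance. A minimal sketch for equal-length chains, with illustrative toy chains rather than real MCMC output:

```python
def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) of Gelman & Rubin (1992)
    for m parallel chains of equal length n.  Values near 1 suggest the
    chains have mixed; values well above ~1.1 signal non-convergence."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand = sum(means) / m
    b = n / (m - 1) * sum((mu - grand) ** 2 for mu in means)      # between-chain variance
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m                  # within-chain variance
    var_hat = (n - 1) / n * w + b / n                             # pooled variance estimate
    return (var_hat / w) ** 0.5

r_ok = gelman_rubin([[0.0, 1.0, 2.0, 3.0], [1.0, 2.0, 3.0, 0.0]])       # overlapping chains
r_bad = gelman_rubin([[0.0, 1.0, 0.0, 1.0], [10.0, 11.0, 10.0, 11.0]])  # separated chains
```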
Also, model comparison in the Bayesian approach is not as straightforward as with ML
methods. In ML estimation, model fit is usually assessed by the chi-square statistic
or the BIC (Schwarz, 1978), and comparison between models is well established: one
checks the significance of the chi-square change (for nested models) or the
difference in BIC (for non-nested models). Various criteria are also available in
the Bayesian approach for assessing model fit, among which the most commonly used
are the deviance information criterion (DIC; Spiegelhalter et al., 2002), posterior
predictive loss (PPL; Gelfand & Ghosh, 1998), and the Bayes factor (Berger &
Pericchi, 1996). Each of these criteria has advantages and disadvantages, and no
agreement has yet been reached about which is best.
Take DIC for example: it can be obtained directly from OpenBUGS output. However,
there is no established standard for how large a decrease in DIC must be before the
alternative model is accepted. A rough drop of 5-10 is usually considered large, but
if the alternative models yield very different inferences, a value of less than 5 is
also accepted (Spiegelhalter et al., 2002). In addition, DIC is not invariant to the
parameterization of the model (Daniels & Hogan, 2007). These technical concerns pose
difficulties in comparing model fit, and this is why sex differences were not
considered in modeling changes in the CPS in this study. A better strategy for
comparing groups may be the model comparison approach (Kruschke, 2011), in which the
data are assumed to follow a mixture of the null model, hypothesizing no group
difference, and the alternative model, hypothesizing a group difference. The
posterior probabilities of the two models given the data are estimated and compared,
and the model with the larger posterior probability is accepted.
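As a sketch of how DIC is assembled from MCMC output (the deviance values below are hypothetical, not taken from the dissertation's analyses):

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """DIC = Dbar + pD (Spiegelhalter et al., 2002), where Dbar is the
    posterior mean of the deviance and pD = Dbar - D(theta_bar) is the
    effective number of parameters."""
    d_bar = sum(deviance_samples) / len(deviance_samples)
    p_d = d_bar - deviance_at_posterior_mean
    return d_bar + p_d

# Hypothetical deviance draws; a DIC drop of roughly 5-10 between
# competing models is conventionally taken as meaningful.
model_dic = dic([100.0, 104.0, 96.0, 100.0], 98.0)
```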
However, these drawbacks should not hinder further applications of Bayesian
estimation in SEM. With the development of computer technology and the improvement
of Bayesian theory and of the MCMC algorithms themselves, the Bayesian method will
be a promising alternative to ML estimation for fitting SEMs. Muthén and Asparouhov
(2010), for example, have devoted considerable effort in recent years to
incorporating the Bayesian approach with MCMC estimation as a new feature of the new
version of Mplus.
There remains much to explore in applying the Bayesian approach to missing data
problems, and future lines of this research could be diverse. Regarding the CPS
study per se, missingness in the empirical data might depend not only on the
unobserved CPS total scores (missing outcomes), but also on the developmental trends
(latent factors such as the intercept and slope) and on qualitatively different
outcome trajectory types (latent trajectory classes). Although no direct evidence
for heterogeneity in the developmental trajectories of psychopathic traits is yet
available, the developmental taxonomy of antisocial behaviors (Moffitt, 1993) is
well established and has been widely verified (Brame et al., 2001; Côté et al.,
2006; Nagin & Tremblay, 1999; Silberg et al., 2007). Since antisocial tendency is an
important facet of psychopathy, it is of particular interest to investigate the
heterogeneity of psychopathic trait development and how to handle non-ignorable
missingness that depends on this heterogeneity in future studies.
As mentioned in simulation study I, it remains an open question how estimation bias
varies as a function of factors such as the sample size of level-2 units (parallel
to the number of families in the current research), the ratio of the intercept
variance to the residual error, the intra-class correlation, and the measure of
center of the posterior distribution (mean, median, or mode). The choice of prior
distributions could be another important subject. Taking demographic covariates such
as gender, ethnicity, and SES, and their group differences, into consideration would
also make the conclusions more ecologically valid.
References
Allison, P. D. (2003). Missing data techniques for structural equation modeling. Journal
of Abnormal Psychology, 112(4), 545-557.
Ansari, A., Jedidi, K., & Dube, L. (2002). Heterogeneous factor analysis models: A
Bayesian approach. Psychometrika, 67(1), 49-77.
Arbuckle, J. L. (1996). Full information estimation in the presence of incomplete data.
Advanced structural equation modeling: Issues and techniques, 243-277.
Azen, S., & Van Guilder, M. (1981). Conclusions regarding algorithms for handling
incomplete data.
Baker, L. A., Barton, M., Lozano, D. I., Raine, A., & Fowler, J. H. (2006). The Southern
California Twin Register at the University of Southern California: II. Twin
research and human genetics: the official journal of the International Society for
Twin Studies, 9(6), 933-940.
Berger, J. O., & Pericchi, L. R. (1996). The intrinsic Bayes factor for model selection and
prediction. Journal of the American Statistical Association, 109-122.
Blonigen, D. M., Hicks, B. M., Krueger, R. F., Patrick, C. J., & Iacono, W. G. (2006).
Continuity and change in psychopathic traits as measured via normal-range
personality: A longitudinal-biometric study. Journal of Abnormal Psychology,
115(1), 85.
Brame, B., Nagin, D. S., & Tremblay, R. E. (2001). Developmental trajectories of
physical aggression from school entry to late adolescence. Journal of Child
Psychology and Psychiatry, 42(4), 503-512.
Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based
methods for fitting multilevel models. Bayesian Analysis, 1(3), 473-514.
Burt, S. A., McGue, M., Carter, L. A., & Iacono, W. G. (2007). The different origins of
stability and change in antisocial personality disorder symptoms. Psychological
medicine, 37(01), 27-38.
Cai, L. (2010). High-dimensional Exploratory Item Factor Analysis by A Metropolis-
Hastings Robbins-Monro Algorithm. Psychometrika, 75(1), 33-57.
Cale, E. M., & Lilienfeld, S. O. (2002). Sex differences in psychopathy and antisocial
personality disorder: A review and integration. Clinical psychology review, 22(8),
1179-1207.
Casella, G., & Berger, R. L. (2002). Statistical inference (2nd ed.). New Delhi: Wadsworth.
Choudhury, S., Blakemore, S. J., & Charman, T. (2006). Social cognitive development
during adolescence. Social Cognitive and Affective Neuroscience, 1(3), 165-174.
Côté, S., Vaillancourt, T., LeBlanc, J. C., Nagin, D. S., & Tremblay, R. E. (2006). The
development of physical aggression from toddlerhood to pre-adolescence: A
nation wide longitudinal study of Canadian children. Journal of Abnormal Child
Psychology, 34(1), 68-82.
Daniels, M. J., & Hogan, J. W. (2007). Missing data in longitudinal studies: strategies
for bayesian modeling and sensitivity analysis. Boca Raton, FL: Chapman &
Hall/CRC.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from
incomplete data via the EM algorithm. Journal of the Royal Statistical Society.
Series B (Methodological), 39(1), 1-38.
Eaves, L., & Erkanli, A. (2003). Markov Chain Monte Carlo approaches to analysis of
genetic and environmental components of human developmental change and GxE
interaction. Behavior Genetics, 33(3), 279-299.
Eaves, L., Erkanli, A., Silberg, J., Angold, A., Maes, H. H., & Foley, D. (2005).
Application of Bayesian inference using Gibbs sampling to item-response theory
modeling of multi-symptom genetic data. Behavior genetics, 35(6), 765-780.
Edwards, M. C. (2010). A Markov chain Monte Carlo approach to confirmatory item
factor analysis. Psychometrika, 75(3), 474-497.
Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information
maximum likelihood estimation for missing data in structural equation models.
Structural Equation Modeling: A Multidisciplinary Journal, 8(3), 430-457.
Farrington, D. P. (2005). The importance of child and adolescent psychopathy. Journal of
abnormal child psychology, 33(4), 489-497.
Forsman, M., Lichtenstein, P., Andershed, H., & Larsson, H. (2008). Genetic effects
explain the stability of psychopathic personality from mid-to late adolescence.
Journal of abnormal psychology, 117(3), 606.
Frick, P. J., & Hare, R. D. (2001). The antisocial process screening device. Toronto:
Multi-Health Systems.
Gao, Y., Glenn, A. L., Schug, R. A., Yang, Y., & Raine, A. (2009). The neurobiology of
psychopathy: a neurodevelopmental perspective. Canadian Journal of
Psychiatry/Revue Canadienne de Psychiatrie, 31(12), 813.
Gelfand, A. E., & Ghosh, S. K. (1998). Model choice: A minimum posterior predictive
loss approach. Biometrika, 85(1), 1-11.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models.
Bayesian analysis, 1(3), 515-533.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian Data Analysis
(2nd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Gelman, A., & Rubin, D. B. (1992). Inference from iterative simulation using multiple
sequences. Statistical Science, 7, 457-472.
Geman, S., & Geman, D. (1993). Stochastic relaxation, Gibbs distributions and the
Bayesian restoration of images. Journal of Applied Statistics, 20(5), 25-62.
Geweke, J. (1991). Evaluating the accuracy of sampling-based approaches to the
calculation of posterior moments. Federal Reserve Bank of Minneapolis,
Research Dept.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (Eds.). (1996). Markov Chain Monte
Carlo in practice. London: Chapman & Hall.
Haitovsky, Y. (1968). Missing data in regression analysis. Journal of the Royal Statistical
Society. Series B (Methodological), 30(1), 67-82.
Hare, R. D. (1991). Manual for the revised psychopathy checklist. Toronto: Multi-Health
Systems.
Hare, R. D. (2002). Psychopathy and risk for recidivism and violence. Criminal justice,
mental health, and the politics of risk. London: Cavendish Publishing, 27-47.
Hare, R. D. (2003). The Hare Psychopathy Checklist-Revised (PCL-R) (2nd ed.). Toronto,
Ontario, Canada: Multi-Health Systems.
Hare, R. D., & Neumann, C. S. (2006). The PCL-R assessment of psychopathy:
Development, structural properties, and new directions. In C. J. Patrick (Ed.),
Handbook of psychopathy (pp. 58-88). New York: Guilford Press.
Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their
applications. Biometrika, 57(1), 97.
Heidelberger, P., & Welch, P. D. (1981). A spectral method for confidence interval
generation and run length control in simulations. Communications of the ACM,
24(4), 233-245.
Heidelberger, P., & Welch, P. D. (1983). Simulation run length control in the presence of
an initial transient. Operations Research, 1109-1144.
Hocking, R. R. (1976). The analysis and selection of variables in linear regression.
Biometrics, 32(1), 1-49.
Isen, J., Raine, A., Baker, L., Dawson, M., Bezdjian, S., & Lozano, D. I. (2011). Sex-
specific association between psychopathic traits and electrodermal reactivity in
children. Journal of abnormal psychology, 119(1), 216.
Kass, R. E., Carlin, B. P., Gelman, A., & Neal, R. M. (1998). Markov chain Monte Carlo
in practice: a roundtable discussion. American Statistician, 93-100.
Kim, J. O., & Curry, J. (1977). The treatment of missing data in multivariate analysis.
Sociological Methods & Research, 6(2), 215.
Kruschke, J. K. (2011). Doing Bayesian Data Analysis. Oxford: Elsevier Inc.
Little, R. J. A., & Rubin, D. B. (1987). Statistical analysis with missing data.
Little, R. J. A., & Rubin, D. B. (2002). Statistical Analysis with missing data (2nd ed.).
Hoboken, New Jersey: John Wiley & Sons, Inc.
Lunn, D., Spiegelhalter, D., Thomas, A., & Best, N. (2009). The BUGS project:
Evolution, critique and future directions (with discussion). Statistics in Medicine,
28, 3049-3082.
Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS: A Bayesian
modelling framework: Concepts, structure, and extensibility. Statistics and
Computing, 10, 325-337.
Lynam, D. R. (1997). Pursuing the psychopath: Capturing the fledgling psychopath in a
nomological net. Journal of Abnormal Psychology, 106(3), 425-438.
Marshal, W. (1978). Puberty. In F. Falkner & J.Tanner (Eds.), Human growth (Vol. 2).
New York: Plenum.
McArdle, J. J. (1986). Latent variable growth within behavior genetic models. Behavior
Genetics, 16(1), 163-200.
McArdle, J. J., & Hamagami, F. (2003). Structural equation models for evaluating
dynamic concepts within longitudinal twin analyses. Behavior genetics, 33(2),
137-159.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55(1), 107-122.
Metropolis, N., Rosenbluth, A. W., Rosenbluth, M. N., Teller, A. H., & Teller, E. (1953).
Equation of state calculations by fast computing machines. The journal of
chemical physics, 21(6), 1087.
Moffitt, T. E. (1993). Adolescence-limited and life-course-persistent antisocial behavior:
A developmental taxonomy. Psychological Review, 100(4), 674-701.
Muthén, B., & Asparouhov, T. (2010). Bayesian SEM: A more flexible representation of
substantive theory. Submitted for publication. Los Angeles: Muthén & Muthén.
www.statmodel.com
Muthén, B., Kaplan, D., & Hollis, M. (1987). On structural equation modeling with data
that are not missing completely at random. Psychometrika, 52(3), 431-462.
Muthén, L. K., & Muthén, B. O. (2000a). Mplus: Statistical analysis with latent variables
technical appendices. Los Angeles, CA: Muthén & Muthén
Muthén, L. K., & Muthén, B. O. (2000b). Mplus: The comprehensive modeling program
for applied researchers: User's guide. Los Angeles, CA: Muthén & Muthén
Nagin, D., & Tremblay, R. E. (1999). Trajectories of boys' physical aggression,
opposition, and hyperactivity on the path to physically violent and nonviolent
juvenile delinquency. Child development, 70(5), 1181-1196.
Neumann, C. S., & Hare, R. D. (2008). Psychopathic traits in a large community sample:
Links to violence, alcohol use, and intelligence. Journal of Consulting and
Clinical Psychology, 76(5), 893.
Nielsen, S. F. (2003). Proper and improper multiple imputation. International Statistical
Review, 71(3), 593-607.
Obradović, J., Pardini, D. A., Long, J. D., & Loeber, R. (2007). Measuring interpersonal
callousness in boys from childhood to adolescence: An examination of
longitudinal invariance and temporal stability. Journal of Clinical Child and
Adolescent Psychology, 36(3), 276-292.
PAM. (2001). Teenage Growth & Development: 11-14 Years. from
http://www.pamf.org/
Plummer, M., Best, N., Cowles, K., & Vines, K. (2006). CODA: Convergence Diagnosis
and Output Analysis for MCMC. R News, 6(1), 7-11.
Raftery, A. E., & Lewis, S. M. (1992). One Long Run with Diagnostics: Implementation
Strategies for Markov Chain Monte Carlo. Statistical Science, 7(4), 493-497.
Raftery, A. E., & Lewis, S. M. (1995). The number of iterations, convergence diagnostics
and generic Metropolis algorithms. Practical Markov Chain Monte Carlo, 115-
130.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581.
Rubin, D. B. (1978). Multiple imputation in sample surveys. Proceedings of the Survey
Research Methods Section, American Statistical Association, 20-34.
Rubin, D. B. (1987). Multiple imputation for nonresponse in surveys (Vol. 519): Wiley
Online Library.
Rubin, D. B. (2003). Discussion on multiple imputation. International Statistical Review,
71(3), 619-625.
SAS. (2004). SAS/STAT@Software: Version 9. Cary, NC: SAS Institute, Inc.
Schwarz, G. (1978). Estimating the dimension of a model. The annals of statistics, 6(2),
461-464.
Silberg, J. L., Rutter, M., Tracy, K., Maes, H. H., & Eaves, L. (2007). Etiological
heterogeneity in the development of antisocial behavior: the Virginia Twin Study
of Adolescent Behavioral Development and the Young Adult Follow-Up.
Psychological medicine, 37(08), 1193-1202.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian
measures of model complexity and fit. Journal of the Royal Statistical Society:
Series B (Statistical Methodology), 64(4), 583-639.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data
augmentation. Journal of the American statistical Association, 82(398), 528-540.
Thomas, D. C., & Gauderman, W. J. (1996). Gibbs sampling methods in genetics. In W.
R. Gilks, S. Richardson & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo
in practice (pp. 419-440). London: Chapman & Hall.
Van den Berg, S. M. (2009). Imposing Nonlinear Constraints When Estimating Genetic
and Cultural Transmission Under Assortative Mating: A Simulation Study Using
Mx and BUGS. Behavior genetics, 39(1), 123-131.
Van den Berg, S. M., Beem, L., & Boomsma, D. I. (2006). Fitting genetic models using
Markov chain Monte Carlo algorithms with BUGS. Twin Research and Human
Genetics, 9(3), 334-342.
Van den Berg, S. M., Glas, C. A. W., & Boomsma, D. I. (2007). Variance decomposition
using an IRT measurement model. Behavior genetics, 37(4), 604-616.
van Dyk, D. A., & Meng, X. L. (2001). The art of data augmentation. Journal of
Computational and Graphical Statistics, 10(1), 1-50.
Wang, L., & McArdle, J. J. (2008). A simulation study comparison of Bayesian
estimation with conventional methods for estimating unknown change points.
Structural Equation Modeling: A Multidisciplinary Journal, 15(1), 52-74.
Wothke, W. (2000). Longitudinal and multigroup modeling with missing data. In T. D.
Little, K. U. Schnabel & J. Baumert (Eds.), Modeling longitudinal and multiple
group data: Practical issues, applied approaches and specific examples (pp. 219-
240). Mahwah, NJ: Erlbaum.
Yuan, K. H., Bentler, P. M., & Zhang, W. (2005). The effect of skewness and kurtosis on
mean and covariance structure analysis. Sociological methods & research, 34(2),
240.
Zhang, P. (2003). Multiple imputation: theory and method. International Statistical
Review, 71(3), 581-592.
Appendix A: Supplementary Tables and Figures
Table A.1: Expected sample covariance matrix for the simulated data based on the true
parameters
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10 7.5 5 2.5 28
Y12 7.5 9 10.5 12 9 28
Y13 5 10.5 16 21.5 6 15 40
Y14 2.5 12 21.5 31 3 18 33 64
DZ
Y11 7 5.25 3.5 1.75
Y12 5.25 6.5 7.75 9
Y13 3.5 7.75 12 16.25
Y14 1.75 9 16.25 23.5
Table A.2: Sample covariance matrix of simulated data average across 100 replications
(a) Complete Data
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.67 5.14 2.48 27.94
Y12 7.54 8.91 10.71 11.98 9.01 28.01
Y13 5.04 10.45 16.38 21.80 6.14 15.16 40.11
Y14 2.45 12.02 22.02 31.61 3.08 18.15 33.14 64.00
DZ
Y11 7.03 5.21 3.23 1.72
Y12 5.35 6.60 7.64 8.92
Y13 3.57 7.59 11.76 16.02
Y14 1.51 8.56 15.80 23.21
(b) MCAR 30%
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.72 5.11 2.50 27.94
Y12 7.65 9.03 10.93 12.00 9.04 28.09
Y13 5.18 10.68 16.20 21.08 6.18 15.18 39.98
Y14 2.47 11.89 21.72 31.33 3.20 18.25 32.99 64.11
DZ
Y11 7.03 5.18 3.30 1.80
Y12 5.51 6.53 7.40 9.19
Y13 3.60 7.39 11.60 16.19
Y14 1.57 8.50 15.82 23.35
(c) MCAR 50%
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.56 5.08 2.10 27.94
Y12 7.85 8.73 11.32 12.04 9.03 28.05
Y13 4.90 10.78 16.48 20.65 6.17 15.44 40.24
Y14 2.53 11.66 21.37 30.94 3.01 18.35 32.58 63.44
DZ
Y11 7.03 5.10 3.54 1.66
Y12 5.05 6.79 7.45 8.74
Y13 3.40 7.67 11.39 15.07
Y14 1.35 8.92 14.93 23.07
(d) MNAR 30% (Case I)
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.72 5.11 1.02 27.94
Y12 7.65 9.03 10.93 4.75 9.04 28.09
Y13 5.18 10.68 16.20 7.79 6.18 15.18 39.98
Y14 1.02 4.77 8.22 6.79 1.77 10.52 19.06 38.98
DZ
Y11 7.03 5.18 3.30 0.46
Y12 5.51 6.53 7.40 2.57
Y13 3.60 7.39 11.60 4.14
Y14 0.23 1.71 3.89 1.14
(e) MNAR 50% (Case I)
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.56 5.08 0.59 27.94
Y12 7.85 8.73 11.32 2.56 9.03 28.05
Y13 4.90 10.78 16.48 3.93 6.17 15.44 40.24
Y14 0.15 2.12 4.47 0.07 1.25 8.27 15.04 31.80
DZ
Y11 7.03 5.10 3.54 0.35
Y12 5.05 6.79 7.45 0.29
Y13 3.40 7.67 11.39 -0.01
Y14 -0.31 -0.30 -0.21 -5.23
(f) MNAR 30% (Case II)
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.72 5.11 1.42 27.94
Y12 7.65 9.03 10.93 7.99 9.04 28.09
Y13 5.18 10.68 16.20 13.92 6.18 15.18 39.98
Y14 1.26 7.80 13.99 18.52 1.97 11.50 20.80 40.49
DZ
Y11 7.03 5.18 3.30 1.21
Y12 5.51 6.53 7.40 5.76
Y13 3.60 7.39 11.60 10.59
Y14 0.85 5.26 9.83 14.15
(g) MNAR 50% (Case II)
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 10.09 7.56 5.08 1.00 27.94
Y12 7.85 8.73 11.32 6.77 9.03 28.05
Y13 4.90 10.78 16.48 11.64 6.17 15.44 40.24
Y14 0.73 6.77 12.92 16.54 1.65 9.84 18.40 35.20
DZ
Y11 7.03 5.10 3.54 1.23
Y12 5.05 6.79 7.45 4.74
Y13 3.40 7.67 11.39 8.72
Y14 0.67 4.39 8.61 13.17
Table A.3: Covariance matrix of log transformed CPS total score
Y21 Y22 Y23 Y24 Y11 Y12 Y13 Y14
MZ All
Y11 1.38 1.40 1.35 1.51 3.52
Y12 1.26 2.37 2.54 1.35 1.50 5.19
Y13 1.05 2.02 2.51 2.09 1.34 2.32 4.45
Y14 0.70 1.28 2.08 1.91 1.07 1.69 2.91 4.13
DZ
Y11 0.65 1.06 0.91 0.41
Y12 1.20 1.65 1.01 0.17
Y13 0.44 0.66 1.21 0.89
Y14 0.45 1.21 0.42 0.54
Table A.4: Results of 50% MNAR data (Case I) when fitting different Bayesian models

Parameter          Regular Non-linear   Bayesian Logistic I (a, b)   Bayesian Logistic II (c)
Fixed effects
  G0 = 10          9.98 (0.18)          9.98 (0.18)                  9.99 (0.19)
  G1 = 3           1.53 (0.10)          3.01 (0.13)                  2.15 (0.23)
  A[2] = 1         1.88 (0.14)          1.00 (0.06)                  1.38 (0.14)
  A[3] = 2         3.70 (0.24)          2.00 (0.08)                  2.71 (0.25)
Random effects
  σu² = 16         16.93 (0.75)         16.27 (0.66)                 15.99 (0.77)
Additive Genetic
  σa0² = 6         5.60 (2.60)          6.20 (2.61)                  6.06 (2.96)
  σa1² = 2         0.67 (0.18)          2.27 (0.83)                  1.46 (0.56)
  σa01 = -1.5      -0.95 (0.57)         -1.63 (1.25)                 -1.14 (1.00)
Shared Environ.
  σc0² = 4         4.23 (2.27)          3.89 (2.19)                  4.43 (2.53)
  σc1² = 2         0.56 (0.29)          1.86 (0.62)                  1.11 (0.48)
  σc01 = -1        -0.83 (0.60)         -0.97 (1.01)                 -1.00 (0.82)
Unique Environ.
  σe0² = 2         1.20 (0.67)          1.43 (0.73)                  1.67 (0.92)
  σe1² = 2         0.80 (0.33)          1.72 (0.43)                  1.26 (0.36)
  σe01 = -0.5      -0.03 (0.37)         -0.10 (0.45)                 -0.29 (0.44)
Total variance
  σg0² = 12        11.04 (1.32)         11.45 (0.99)                 12.15 (1.03)
  σg1² = 6         2.04 (0.60)          5.79 (0.49)                  3.83 (0.75)
  σg01 = -3        -1.76 (0.73)         -2.67 (0.54)                 -2.70 (0.54)
Logit parameters
  --               --                   b0 = -32.5: -36.82 (6.06)    b0 = NA: -44.44 (19.74)
  --               --                   b1 = 1: 1.13 (0.18)          σ0² = NA: 3220 (3937)
  --               --                   b2 = 0.8: 0.91 (0.15)        b1 = NA: 2.88 (1.32)

a. The appropriate model for the data is marked in bold.
b. 1 of the 100 datasets failed to converge after 110,000 iterations with the first
10,000 as burn-in.
c. 49 of the 100 datasets failed to converge after 110,000 iterations with the first
10,000 as burn-in.
Table A.5: Results of 50% MNAR data (Case II) when fitting different Bayesian models

Parameter          Regular Non-linear   Bayesian Logistic I (b)      Bayesian Logistic II (a, b)
Fixed effects
  G0 = 10          9.99 (0.18)          9.99 (0.18)                  9.98 (0.18)
  G1 = 3           1.62 (0.10)          3.11 (0.19)                  3.02 (0.17)
  A[2] = 1         1.75 (0.12)          0.96 (0.07)                  0.99 (0.07)
  A[3] = 2         3.44 (0.18)          1.93 (0.10)                  1.99 (0.10)
Random effects
  σu² = 16         16.51 (0.73)         16.47 (0.68)                 16.27 (0.68)
Additive Genetic
  σa0² = 6         6.12 (2.63)          6.04 (2.57)                  6.27 (2.58)
  σa1² = 2         0.96 (0.26)          2.45 (0.88)                  2.31 (0.81)
  σa01 = -1.5      -1.19 (0.70)         -1.58 (1.31)                 -1.64 (1.27)
Shared Environ.
  σc0² = 4         3.96 (2.25)          3.98 (2.21)                  3.84 (2.18)
  σc1² = 2         0.84 (0.24)          1.91 (0.68)                  1.87 (0.68)
  σc01 = -1        -0.79 (0.63)         -1.04 (1.12)                 -0.96 (1.07)
Unique Environ.
  σe0² = 2         1.36 (0.74)          1.32 (0.72)                  1.41 (0.77)
  σe1² = 2         0.62 (0.19)          1.86 (0.53)                  1.80 (0.49)
  σe01 = -0.5      -0.11 (0.25)         -0.05 (0.48)                 -0.11 (0.48)
Total variance
  σg0² = 12        11.44 (1.08)         11.35 (1.01)                 11.52 (1.05)
  σg1² = 6         2.41 (0.36)          6.22 (0.69)                  5.98 (0.74)
  σg01 = -3        -2.09 (0.44)         -2.62 (0.53)                 -2.71 (0.53)
Logit parameters
  --               --                   b0 = NA: -6.99 (1.17)        b0 = -18: -28.93 (10.33)
  --               --                   b1 = NA: 0.34 (0.05)         σ0² = 25: 89.98 (58.51)
  --               --                   b2 = NA: 0.03 (0.01)         b1 = 1: 1.60 (0.56)

a. The appropriate model for the data is marked in bold.
b. One of the 100 datasets did not converge after 110,000 iterations with the first
10,000 as burn-in.
102
Figure A.1: An example of converged trace plots and density plots

Figure A.2: An example of problematic trace plots and density plots that cannot be mended by adding iterations alone
Appendix B: Codes and Scripts
B.1: SAS codes for generating complete data and MCAR data
%MACRO lgmt(n);
proc iml;
/** specify the mean and covariance of the population **/
Mean = {0,0};
MeanE = {0,0,0,0};
CovA = {6 -1.5,
-1.5 2};
CovC = {4 -1,
-1 2};
CovE = {2 -0.5 0 0,
-0.5 2 0 0,
0 0 2 -0.5,
0 0 -0.5 2};
aNames = "a0":"a1"; cNames = "c0":"c1";
eNames = {"e0t1" "e1t1" "e0t2" "e1t2"};
NumSamples = 310;
call randseed(4321+&n); /** set seed for the RandNormal module **/
A = RandNormal(NumSamples, Mean, CovA);
C = RandNormal(NumSamples, Mean, CovC);
E = RandNormal(NumSamples, MeanE, CovE);
*print (X[1:5,])[label="First 5 Obs: MV Normal"];
create mzA from A [colname=aNames];
append from A;
create mzC from C [colname=cNames];
append from C;
create mzE from E [colname=eNames];
append from E;
quit;
/*
proc corr data=mze;
var e0t1 e1t1 e0t2 e1t2;
run;
*/
data mzACE;
merge mzA mzC mzE;
run;
data mzACE;
set mzACE;
famid=_N_;
RUN;
data mzG(keep=famid mzcode y01_score y02_score ys1_score ys2_score);
RETAIN famid mzcode y01_score y02_score ys1_score ys2_score;
set mzACE;
y01_score=a0+c0+e0t1;
y02_score=a0+c0+e0t2;
ys1_score=a1+c1+e1t1;
ys2_score=a1+c1+e1t2;
mzcode=1;
RUN;
/*proc corr data=mzG;
var y01_score y02_score ys1_score ys2_score;
RUN; */
proc iml;
/** specify the mean and covariance of the population **/
Mean1 = {0,0};
Mean2 = {0,0,0,0};
CovAC = {3 -0.75,
-0.75 1};
CovC = {4 -1,
-1 2};
CovAU = {3 -0.75 0 0,
-0.75 1 0 0,
0 0 3 -0.75,
0 0 -0.75 1};
CovE = {2 -0.5 0 0,
-0.5 2 0 0,
0 0 2 -0.5,
0 0 -0.5 2};
acNames = "ac0":"ac1"; cNames = "c0":"c1";
auNames = {"au0t1" "au1t1" "au0t2" "au1t2"};
eNames = {"e0t1" "e1t1" "e0t2" "e1t2"};
NumSamples = 315;
call randseed(5678+&n); /** set seed for the RandNormal module **/
AC = RandNormal(NumSamples, Mean1, CovAC);
C = RandNormal(NumSamples, Mean1, CovC);
AU = RandNormal(NumSamples, Mean2, CovAU);
E = RandNormal(NumSamples, Mean2, CovE);
*print (X[1:5,])[label="First 5 Obs: MV Normal"];
create dzAC from AC [colname=acNames];
append from AC;
create dzC from C [colname=cNames];
append from C;
create dzAU from AU [colname=auNames];
append from AU;
create dzE from E [colname=eNames];
append from E;
quit;
data DZACE;
merge dzAC dzC dzAU dzE;
run;
data dzACE;
set dzACE;
famid=_N_+310;
RUN;
data dzG(keep=famid mzcode y01_score y02_score ys1_score ys2_score);
RETAIN famid mzcode y01_score y02_score ys1_score ys2_score;
set dzACE;
y01_score=ac0+c0+au0t1+e0t1;
y02_score=ac0+c0+au0t2+e0t2;
ys1_score=ac1+c1+au1t1+e1t1;
ys2_score=ac1+c1+au1t2+e1t2;
mzcode=0;
RUN;
/*
proc corr data=dzG;
var y01_score y02_score ys1_score ys2_score;
RUN;
*/
data sim_twin;
set mzG dzG;
RUN;
TITLE2 'Building the internal relational matrix needed for the LGM';
DATA sim_rel;
SET sim_twin;
IF(mzcode = 1) THEN DO weightAC=1; weightAU1=0; weightAU2=0; END;
/* next set differs for DZ twins */
IF(mzcode = 0) THEN DO weightAC=sqrt(.5); weightAU1=sqrt(.5);
weightAU2=0; END;
y0 = y01_score; ys = ys1_score; Person=famid; Twin1=1;
OUTPUT;
IF(mzcode = 0) THEN DO weightAC=sqrt(.5); weightAU1=0;
weightAU2=sqrt(.5); END;
y0 = y02_score; ys = ys2_score; Person=famid+0.5; Twin1=0;
OUTPUT;
KEEP person famid mzcode Twin1 weightAC weightAU1 weightAU2 y0 ys;
RUN;
TITLE2 'Note -- all that we need are the Level and Slope scores to generate LGMs';
TITLE3 'Where Correct Model is dy = beta*y + alpha*x; y0 = 10 + dy + e';
/*Set beta=1, alpha=0, which makes difference score model be a regular
LGM model*/
/*True Model: y=10 +3*agec, g0=10, g1=3*/
DATA sim_dyn4_rel;
SET sim_rel; /* person famid mzcode Twin1 weightAC weightAU1
weightAU2 y0 ys */
* setting additional mathematical parameters;
mu_y0 = 10; mu_ys = 3 ;
sigma2_ey = 16; sigma_ey = SQRT(16);
alpha_y = 0;
Times = 4; seed = 20120501+&n;
* generating raw data -- note that pairs do not count *;
yzero = mu_y0 + y0;
ystar = mu_ys + ys;
yt = yzero; eyt = 0;
DO age = 1 TO Times;
agec = age-1;
if age=1 then do;
ytm1 = yzero;
eyt = sigma_ey * RANNOR(seed);
yt = ytm1 + eyt;
END;
else if age>1 then do;
ytm1 = yt - eyt; /* so the error does not propagate over time */
* now the true structural model ;
dyt = ystar + alpha_y * ytm1;
eyt = sigma_ey * RANNOR(seed);
yt = ytm1 + dyt + eyt;
delta_yt = yt - ytm1;
END;
* now the output per time;
KEEP person famid mzcode Twin1 weightAC weightAU1 weightAU2
y0 ys age agec yt eyt;
OUTPUT;
END;
RUN;
TITLE4 'Generate Missing Data';
DATA pattern(drop=i t);
Do t=1 to 4;
age=t;
DO i=1 to 625;
famid=i;
output;
end;
end;
run;
DATA pattern30(drop=SEED);
set pattern;
SEED=20120601+&n;
U1=RANUNI(SEED);
U2=RANUNI(SEED);
C=RAND('BERNOULLI',0.5);
IF (U1<=0.3 and age ne 1) THEN DO;
*First wave is always complete;
IF (U2<=0.9) THEN M30=1;
*10% of the time only one twin (either twin 1 or twin 2)
within each family is missing;
ELSE IF (U2>0.9) and (C=0) THEN M30=2;
ELSE If (U2>0.9) and (C=1) THEN M30=3;
END;
ELSE M30=0;
run;
DATA pattern50(drop=SEED);
set pattern;
SEED=20120701+&n;
U1=RANUNI(SEED);
U2=RANUNI(SEED);
C=RAND('BERNOULLI',0.5);
IF (U1<=0.5 and age ne 1) THEN DO;
IF (U2<=0.9) THEN M50=1;
ELSE IF (U2>0.9) and (C=0) THEN M50=2;
ELSE If (U2>0.9) and (C=1) THEN M50=3;
END;
ELSE M50=0;
run;
proc sort data=pattern30; by age famid; run;
proc sort data=pattern50; by age famid; run;
proc sort data=sim_dyn4_rel; by age famid; run;
data sim_30_m (drop= U1 U2 C);
merge sim_dyn4_rel pattern30;
by age famid ;
run;
data sim_50_m (drop= U1 U2 C);
merge sim_dyn4_rel pattern50;
by age famid ;
run;
DATA sim_30_m;
SET sim_30_m;
IF (M30=1) THEN yt=.;
ELSE IF (M30=2) and (twin1=1) THEN yt=.;
ELSE If (M30=3) and (twin1=0) THEN yt=.;
RUN;
DATA sim_50_m;
SET sim_50_m;
IF (M50=1) THEN yt=.;
ELSE IF (M50=2) and (twin1=1) THEN yt=.;
ELSE If (M50=3) and (twin1=0) THEN yt=.;
RUN;
DATA final.ylg&n;
SET sim_dyn4_rel;
RUN;
DATA final.ylg30mcar&n;
SET sim_30_m;
RUN;
DATA final.ylg50mcar&n;
SET sim_50_m;
RUN;
%MEND;
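The MCAR pattern logic in the macro above (the first wave always complete; at later waves a family is selected for attrition with probability 0.3 or 0.5; 90% of selected families lose both twins, the rest lose one twin chosen by a fair coin) is SAS-specific. As a language-neutral illustration, here is a hypothetical Python sketch of the same logic; the function name, code values, and seed handling are illustrative, not part of the original macro.

```python
import random

def mcar_pattern(rate, seed=0):
    """Assign an MCAR missingness code to each family at each wave.

    Codes mirror the SAS macro: 0 = complete, 1 = whole pair missing,
    2 = twin 1 missing, 3 = twin 2 missing.  Wave 1 is always complete;
    90% of selected families lose both twins, the remaining 10% lose
    one twin chosen by a fair coin.
    """
    rng = random.Random(seed)
    pattern = {}
    for age in range(1, 5):          # four waves
        for famid in range(1, 626):  # 310 MZ + 315 DZ families
            if age == 1:
                code = 0             # first wave is always complete
            elif rng.random() <= rate:
                if rng.random() <= 0.9:
                    code = 1         # both twins missing
                else:
                    code = 2 if rng.random() < 0.5 else 3
            else:
                code = 0
            pattern[(age, famid)] = code
    return pattern

pattern = mcar_pattern(0.3, seed=42)
```

Because selection ignores the scores entirely, the realized missing fraction at waves 2-4 fluctuates around the nominal rate, which is exactly what makes the mechanism MCAR.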
B.2: SAS Codes for generating MNAR data (Case I)
%MACRO lgmt(n);
DATA ylg;
SET final.ylg&n;
RUN;
DATA p1(keep=famid age p1);
SET ylg;
if twin1=1;
rename yt=p1;
RUN;
DATA p2(keep=famid age p2);
SET ylg;
if twin1=0;
rename yt=p2;
RUN;
proc sort data=p1; by age famid;run;
proc sort data=p2; by age famid;run;
proc sort data=ylg; by age famid;run;
DATA ylg;
merge ylg p1 p2;
by age famid;
RUN;
DATA ylg(drop=p1 p2);
SET ylg;
IF twin1=1 THEN ytsib=p2;
IF twin1=0 THEN ytsib=p1;
RUN;
DATA y3wsp30;
SET final.ylg30mcar&n;
RUN;
DATA y3wsp30;
SET y3wsp30;
IF age ne 4;
RUN;
DATA y3wsp50;
SET final.ylg50mcar&n;
IF age ne 4;
RUN;
DATA ylgw4;
SET ylg;
If age=4;
RUN;
TITLE4 'Generate Missing Data';
*Deal with wave 4 only;
DATA ylgw4;
SET ylgw4;
yt_logit30=exp(-31.5+0.8*yt+0.6*ytsib)/(exp(-31.5+0.8*yt+0.6*ytsib)+1);
yt_logit50=exp(-32.5+1*yt+0.8*ytsib)/(exp(-32.5+1*yt+0.8*ytsib)+1);
RUN;
DATA ylgw4(Drop=SEED);
SET ylgw4;
SEED=20120425+&n;
U30=RANUNI(SEED);
U50=RANUNI(SEED);
M30=0;
M50=0;
*Generate MNAR data for wave 4;
IF (U30<=yt_logit30) THEN M30=1;
IF (U50<=yt_logit50) THEN M50=1;
RUN;
DATA M1w4(keep=famid age M30t1 M50t1);
SET ylgw4;
if twin1=1;
rename M30=M30t1 M50=M50t1;
RUN;
DATA M2w4(keep=famid age M30t2 M50t2);
SET ylgw4;
if twin1=0;
rename M30=M30t2 M50=M50t2;
RUN;
proc sort data=M1w4; by famid;run;
proc sort data=M2w4; by famid;run;
proc sort data=ylgw4; by famid;run;
DATA yw4;
merge ylgw4 M1w4 M2w4;
by famid;
RUN;
DATA yw4 (drop=M30t1 M30t2 M50t1 M50t2);
SET yw4;
IF M30=1 THEN DO;
IF M30t1=1 and M30t2=0 THEN M30=2;
IF M30t1=0 and M30t2=1 THEN M30=3;
*Identify families with only one case missing;
END;
IF M50=1 THEN DO;
IF M50t1=1 and M50t2=0 THEN M50=2;
IF M50t1=0 and M50t2=1 THEN M50=3;
*Identify families with only one case missing;
END;
RUN;
DATA yw4p30 (drop=M50 U50);
SET yw4;
IF M30 ne 0 THEN yt=.;
RUN;
DATA yw4p50 (drop=M30 U30);
SET yw4;
IF M50 ne 0 THEN yt=.;
RUN;
*NOW merge 3ws and w4 data;
DATA y30;
SET y3wsp30 yw4p30;
RUN;
DATA y50;
SET y3wsp50 yw4p50;
RUN;
*Retain M=0 or 1 for logistic regression;
DATA y30;
SET y30;
M=0;
IF yt=. THEN M=1;
RUN;
DATA y50;
SET y50;
M=0;
IF yt=. THEN M=1;
RUN;
DATA final.ylg30mnar&n;
SET y30;
RUN;
DATA final.ylg50mnar&n;
SET y50;
RUN;
%MEND;
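The Case I selection mechanism above makes a wave-4 score missing with a probability that depends on the twin's own score and the co-twin's score through a logistic link. For reference, an illustrative Python sketch of that mechanism follows, with the 50% condition's true coefficients (b0 = -32.5, b1 = 1, b2 = 0.8) as defaults; the function names are hypothetical.

```python
import math
import random

def p_missing_case1(y_self, y_sib, b0=-32.5, b1=1.0, b2=0.8):
    """Case I MNAR: probability that a wave-4 score is missing,
    inverse-logit(b0 + b1*y_self + b2*y_sib), so twins with higher
    scores (or higher-scoring co-twins) are more likely to drop out."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * y_self + b2 * y_sib)))

def is_missing_case1(y_self, y_sib, rng=random):
    # Bernoulli draw against the logistic dropout probability,
    # mirroring IF (U50 <= yt_logit50) THEN M50 = 1 in the SAS step
    return rng.random() <= p_missing_case1(y_self, y_sib)
```

Because the scores enter positively, high scorers are selectively lost at wave 4, which is what biases the uncorrected ("Regular Non-linear") estimates in Table A.4.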
B.3: SAS codes for generating MNAR data (Case II)
%MACRO lgmt(n);
DATA ylg;
SET final.ylg&n;
RUN;
DATA y3wsp30;
SET final.ylg30mcar&n;
RUN;
DATA y3wsp30;
SET y3wsp30;
IF age ne 4;
RUN;
DATA y3wsp50;
SET final.ylg50mcar&n;
IF age ne 4;
RUN;
DATA ylgw4;
SET ylg;
If age=4;
RUN;
TITLE4 'Generate Missing Data';
*Deal with wave 4 only;
DATA pattern(drop=i);
DO i=1 to 625;
famid=i;
output;
end;
run;
DATA pattern(drop=SEED);
set pattern;
SEED=20120601+&n;
U1=-23+5*RANNOR(SEED);
U2=-18+5*RANNOR(SEED);
run;
proc sort data=ylgw4; by famid; run;
proc sort data=pattern; by famid; run;
data ylgw4;
merge ylgw4 pattern;
by famid;
run;
DATA ylgw4(drop=SEED);
SET ylgw4;
SEED=20120425+&n;
U30=RANUNI(SEED);
U50=RANUNI(SEED);
yt_logit30=exp(U1+yt)/(exp(U1+yt)+1);
yt_logit50=exp(U2+yt)/(exp(U2+yt)+1);
M30=0; M50=0;
*Generate MNAR data for wave 4;
IF (U30<=yt_logit30) THEN M30=1;
IF (U50<=yt_logit50) THEN M50=1;
RUN;
DATA M1w4(keep=famid age M30t1 M50t1);
SET ylgw4;
if twin1=1;
rename M30=M30t1 M50=M50t1;
RUN;
DATA M2w4(keep=famid age M30t2 M50t2);
SET ylgw4;
if twin1=0;
rename M30=M30t2 M50=M50t2;
RUN;
proc sort data=M1w4; by famid;run;
proc sort data=M2w4; by famid;run;
proc sort data=ylgw4; by famid;run;
DATA yw4;
merge ylgw4 M1w4 M2w4;
by famid;
RUN;
DATA yw4 (drop=M30t1 M30t2 M50t1 M50t2);
SET yw4;
IF M30=1 THEN DO;
IF M30t1=1 and M30t2=0 THEN M30=2;
IF M30t1=0 and M30t2=1 THEN M30=3;
*Identify families with only one case missing;
END;
IF M50=1 THEN DO;
IF M50t1=1 and M50t2=0 THEN M50=2;
IF M50t1=0 and M50t2=1 THEN M50=3;
*Identify families with only one case missing;
END;
RUN;
DATA yw4p30 (drop=M50 U50);
SET yw4;
IF M30 ne 0 THEN yt=.;
RUN;
DATA yw4p50 (drop=M30 U30);
SET yw4;
IF M50 ne 0 THEN yt=.;
RUN;
*NOW merge 3ws and w4 data;
DATA y30;
SET y3wsp30 yw4p30;
RUN;
DATA y50;
SET y3wsp50 yw4p50;
RUN;
*Retain M=0 or 1 for logistic regression;
DATA y30;
SET y30;
M=0;
IF yt=. THEN M=1;
RUN;
DATA y50;
SET y50;
M=0;
IF yt=. THEN M=1;
RUN;
DATA final.ylg30mnar&n;
SET y30;
RUN;
DATA final.ylg50mnar&n;
SET y50;
RUN;
%MEND;
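Case II differs from Case I in that the logistic intercept is itself random: each family draws its own intercept, N(-23, 5^2) for the 30% condition and N(-18, 5^2) for the 50% condition, and a twin's own wave-4 score then enters with unit slope. A hypothetical Python sketch of that mechanism (function name and data layout are illustrative):

```python
import math
import random

def mnar_case2(scores_by_family, mu_b0=-18.0, sd_b0=5.0, seed=0):
    """Case II MNAR: family i draws b0_i ~ N(mu_b0, sd_b0^2); twin j in
    family i is then missing at wave 4 with probability
    inverse-logit(b0_i + y_ij)."""
    rng = random.Random(seed)
    missing = {}
    for famid, scores in scores_by_family.items():
        b0 = rng.gauss(mu_b0, sd_b0)   # family-specific intercept
        for j, y in enumerate(scores):
            p = 1.0 / (1.0 + math.exp(-(b0 + y)))
            missing[(famid, j)] = rng.random() <= p
    return missing
```

The shared random intercept induces within-family dependence in dropout beyond what the scores explain, which is why the matching Bayesian logistic model II also carries a family-level random intercept.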
B.4: Mplus scripts for full ACE biometric linear latent growth model
TITLE: 2-group, ACE, linear model
DATA: FILE = sim.dat;
VARIABLE: NAMES = famid mzcode
y11 y12 y13 y14
y21 y22 y23 y24
age1 age2 age3 age4
M11 M12 M13 M14
M21 M22 M23 M24
;
MISSING = .;
USEVAR = y11 y12 y13 y14
y21 y22 y23 y24;
GROUP = mzcode(1=mz 0=dz);
ANALYSIS: TYPE = MEANSTRUCTURE;
ITERATIONS=10000;
MODEL:
go1 BY y11@1 y12@1 y13@1 y14@1;
gs1 BY y11 @0 (3)
y12 @1 (4)
y13 @2 (5)
y14 @3 (6);
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
y11 y12 y13 y14 (Ve);
[y11 @0 y12 @0 y13 @0 y14@0];
go2 BY y21@1 y22@1 y23@1 y24@1;
gs2 BY y21 @0 (3)
y22 @1 (4)
y23 @2 (5)
y24 @3 (6);
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
y21 y22 y23 y24 (Ve);
[y21 @0 y22 @0 y23 @0 y24@0];
go1 with gs1@0; go1 with go2@0; go1 with gs2@0;
gs1 with go2@0; gs1 with gs2@0;
go2 with gs2@0;
!Biometric elements for level
Ao1 by go1 * .7 (11); Ao2 by go2 *.7 (11);
Co1 by go1 * .3 (12); Co2 by go2 *.3 (12);
Eo1 by go1 * .6 (13); Eo2 by go2 *.6 (13);
[Ao1-Eo2@0]; Ao1-Eo2@1;
!Biometric elements for slope
Ao1 by gs1 * 0 (21); Ao2 by gs2 *0 (21);
Co1 by gs1 * 0 (22); Co2 by gs2 *0 (22);
Eo1 by gs1 * 0 (23); Eo2 by gs2 *0 (23);
As1 by gs1 * .6 (31); As2 by gs2 *.6 (31);
Cs1 by gs1 * .5 (32); Cs2 by gs2 *.5 (32);
Es1 by gs1 * .6 (33); Es2 by gs2 *.6 (33);
[As1-Es2@0]; As1-Es2@1;
!correlations for latent variables
Ao1-Ao2 with Co1-Es2@0; Ao1 with Ao2 @1;
Co1-Co2 with Eo1-Es2@0; Co1 with Co2 @1;
Eo1-Eo2 with As1-Es2@0; Eo1 with Eo2 @0;
As1-As2 with Cs1-Es2@0; As1 with As2@1;
Cs1-Cs2 with Es1-Es2@0; Cs1 with Cs2@1;
Es1 with Es2@0;
Model mz:
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
Ao1 with Ao2@1;
As1 with As2@1;
Model dz:
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
Ao1 with Ao2@0.5;
As1 with As2@0.5;
OUTPUT: SAMP STAND RES TECH1 TECH4;
SAVEDATA:
ESTIMATES ARE est.txt;
B.5: Mplus scripts for full ACE biometric non-linear latent growth model
TITLE: 2-group, ACE Non-linear model
DATA: FILE = sim.dat;
VARIABLE: NAMES = famid mzcode
y11 y12 y13 y14
y21 y22 y23 y24
age1 age2 age3 age4
M11 M12 M13 M14
M21 M22 M23 M24;
MISSING = .;
USEVAR = y11 y12 y13 y14
y21 y22 y23 y24;
GROUP = mzcode(1=mz 0=dz);
ANALYSIS: TYPE = MEANSTRUCTURE;
ITERATIONS=10000;
MODEL:
go1 BY y11@1 y12@1 y13@1 y14@1;
gs1 BY y11 @0 (3)
y12 *1 (4)
y13 *2 (5)
y14 @3 (6);
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
y11 y12 y13 y14 (Ve);
[y11 @0 y12 @0 y13 @0 y14@0];
go2 BY y21@1 y22@1 y23@1 y24@1;
gs2 BY y21 @0 (3)
y22 *1 (4)
y23 *2 (5)
y24 @3 (6);
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
y21 y22 y23 y24 (Ve);
[y21 @0 y22 @0 y23 @0 y24@0];
go1 with gs1@0; go1 with go2@0; go1 with gs2@0;
gs1 with go2@0; gs1 with gs2@0;
go2 with gs2@0;
!Biometric elements for level
Ao1 by go1 * .7 (11); Ao2 by go2 *.7 (11);
Co1 by go1 * .3 (12); Co2 by go2 *.3 (12);
Eo1 by go1 * .6 (13); Eo2 by go2 *.6 (13);
[Ao1-Eo2@0]; Ao1-Eo2@1;
!Biometric elements for slope
Ao1 by gs1 * 0 (21); Ao2 by gs2 *0 (21);
Co1 by gs1 * 0 (22); Co2 by gs2 *0 (22);
Eo1 by gs1 * 0 (23); Eo2 by gs2 *0 (23);
As1 by gs1 * .6 (31); As2 by gs2 *.6 (31);
Cs1 by gs1 * .5 (32); Cs2 by gs2 *.5 (32);
Es1 by gs1 * .6 (33); Es2 by gs2 *.6 (33);
[As1-Es2@0]; As1-Es2@1;
!correlations for latent variables
Ao1-Ao2 with Co1-Es2@0; Ao1 with Ao2 @1;
Co1-Co2 with Eo1-Es2@0; Co1 with Co2 @1;
Eo1-Eo2 with As1-Es2@0; Eo1 with Eo2 @0;
As1-As2 with Cs1-Es2@0; As1 with As2@1;
Cs1-Cs2 with Es1-Es2@0; Cs1 with Cs2@1;
Es1 with Es2@0;
Model mz:
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
Ao1 with Ao2@1;
As1 with As2@1;
Model dz:
[go1] (1); [gs1] (2);
go1 @0; gs1@0;
[go2] (1); [gs2] (2);
go2 @0; gs2@0;
Ao1 with Ao2@0.5;
As1 with As2@0.5;
OUTPUT: SAMP STAND RES TECH1 TECH4;
SAVEDATA:
ESTIMATES ARE estnl.txt;
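In the non-linear model above, the wave-2 and wave-3 basis loadings (A[2], A[3]) are freely estimated while the first and last loadings are fixed at 0 and 3 to anchor the scale, so the implied mean trajectory is mu_t = G0 + A[t]*G1. A small, purely illustrative Python check of the trajectory implied by the true generating values (G0 = 10, G1 = 3, loadings 0..3):

```python
def implied_means(g0, g1, loadings):
    """Mean trajectory of a latent basis growth model:
    mu_t = g0 + loadings[t] * g1.  The first and last loadings are
    fixed to identify the location and scale of the slope factor."""
    return [g0 + a * g1 for a in loadings]

# True generating values: linear case, loadings 0, 1, 2, 3
mus = implied_means(10.0, 3.0, [0.0, 1.0, 2.0, 3.0])
```

When the data are truly linear, freeing A[2] and A[3] costs little; when growth is non-linear, the free loadings absorb the shape while G1 keeps the interpretation of total change per unit of the basis.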
B.6: Bugs script for full ACE linear biometric latent growth model
model
{
for (i in 1:N)
{
for (k in 3:6)
{muy[i,k]<-gt1[i,1]+(k-3)*gt1[i,2]
y[i,k]~dnorm(muy[i,k],tauy)}
for (k in 7:10)
{muy[i,k]<-gt2[i,1]+(k-7)*gt2[i,2]
y[i,k]~dnorm(muy[i,k],tauy)}
}
for (i in 1:310)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
g_AC[i,1:2]~dmnorm(g_C[i,1:2],tauA[1:2,1:2])
gt1[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
}
for (i in 311:N)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
dzg_ACbt[i,1:2]~dmnorm(g_C[i,1:2],tauAhf[1:2,1:2])
dzg_ACt1[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
dzg_ACt2[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
gt1[i,1:2]~dmnorm(dzg_ACt1[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(dzg_ACt2[i,1:2],tauE[1:2,1:2])
}
for (k in 1:2)
{for (l in 1:2)
{tauAhf[k,l]<-2*tauA[k,l]}
}
sigma2C[1:2,1:2]<-inverse(tauC[1:2,1:2])
sigma2A[1:2,1:2]<-inverse(tauA[1:2,1:2])
sigma2E[1:2,1:2]<-inverse(tauE[1:2,1:2])
sigma2y<-1/tauy
tauC[1:2,1:2]~dwish(R[1:2,1:2],2)
tauA[1:2,1:2]~dwish(R[1:2,1:2],2)
tauE[1:2,1:2]~dwish(R[1:2,1:2],2)
tauy~dunif(0,10)
mug[1]~dnorm(0,1.0E-6)
mug[2]~dnorm(0,1.0E-6)
}
list(mug=c(10,3),tauy=0.1)
list(mug=c(30,1),tauy=0.5)
list(N=625,
y = structure(.Data = c( 1, 1,18.165356623,26.828948708, ……),
.Dim = c(625, 22)),
R = structure(.Data = c( 1, 0, 0, 1),
.Dim = c(2, 2)))
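The BUGS script builds the ACE twin covariance through nested normal draws: a family-level C deviate, an A deviate drawn around it (shared in full by MZ twins, while DZ twins share only a "between" half — hence tauAhf = 2*tauA, i.e. half the A variance), and twin-specific E deviates. The Monte-Carlo sketch below (hypothetical, univariate, level factor only, in Python for illustration) checks that this construction implies the classical twin correlations, (A + C)/(A + C + E) for MZ pairs and (A/2 + C)/(A + C + E) for DZ pairs, using the simulation's true level variances A = 6, C = 4, E = 2.

```python
import random

def simulate_pairs(n, va, vc, ve, mz, seed=0):
    """Draw n twin pairs under the nested-normal ACE construction used
    in the BUGS script.  MZ pairs share the full A deviate; DZ pairs
    share a N(0, va/2) 'between' part and each add an independent
    N(0, va/2) 'unique' part (mirroring tauAhf = 2*tauA)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        c = rng.gauss(0, vc ** 0.5)             # shared environment
        if mz:
            a1 = a2 = rng.gauss(0, va ** 0.5)   # identical genotypes
        else:
            ab = rng.gauss(0, (va / 2) ** 0.5)  # shared half of A
            a1 = ab + rng.gauss(0, (va / 2) ** 0.5)
            a2 = ab + rng.gauss(0, (va / 2) ** 0.5)
        pairs.append((a1 + c + rng.gauss(0, ve ** 0.5),
                      a2 + c + rng.gauss(0, ve ** 0.5)))
    return pairs

def corr(pairs):
    # Pearson correlation of the two twins' scores
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / (sxx * syy) ** 0.5

r_mz = corr(simulate_pairs(20000, 6, 4, 2, mz=True, seed=1))   # close to 10/12
r_dz = corr(simulate_pairs(20000, 6, 4, 2, mz=False, seed=2))  # close to 7/12
```

Splitting the DZ A variance into equal shared and unique halves is what delivers the genetic correlation of 0.5 without ever forming the full pair-level covariance matrix.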
B.7: Bugs script for full ACE non-linear biometric latent growth model
model
{
for (i in 1:N)
{
for (k in 3:6) {y[i,k]~dnorm(muy[i,k],tauy)}
for (k in 7:10) {y[i,k]~dnorm(muy[i,k],tauy)}
muy[i,3]<-gt1[i,1]
muy[i,4]<-gt1[i,1]+A2*gt1[i,2]
muy[i,5]<-gt1[i,1]+A3*gt1[i,2]
muy[i,6]<-gt1[i,1]+3*gt1[i,2]
muy[i,7]<-gt2[i,1]
muy[i,8]<-gt2[i,1]+A2*gt2[i,2]
muy[i,9]<-gt2[i,1]+A3*gt2[i,2]
muy[i,10]<-gt2[i,1]+3*gt2[i,2]
}
for (i in 1:310)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
g_AC[i,1:2]~dmnorm(g_C[i,1:2],tauA[1:2,1:2])
gt1[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
}
for (i in 311:N)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
dzg_ACbt[i,1:2]~dmnorm(g_C[i,1:2],tauAhf[1:2,1:2])
dzg_ACt1[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
dzg_ACt2[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
gt1[i,1:2]~dmnorm(dzg_ACt1[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(dzg_ACt2[i,1:2],tauE[1:2,1:2])
}
for (k in 1:2)
{
for (l in 1:2)
{
tauAhf[k,l]<-2*tauA[k,l]
}
}
sigma2C[1:2,1:2]<-inverse(tauC[1:2,1:2])
sigma2A[1:2,1:2]<-inverse(tauA[1:2,1:2])
sigma2E[1:2,1:2]<-inverse(tauE[1:2,1:2])
sigma2y<-1/tauy
tauC[1:2,1:2]~dwish(R[1:2,1:2],2)
tauA[1:2,1:2]~dwish(R[1:2,1:2],2)
tauE[1:2,1:2]~dwish(R[1:2,1:2],2)
tauy~dunif(0,10)
A2~dnorm(0,1.0E-6)
A3~dnorm(0,1.0E-6)
mug[1]~dnorm(0,1.0E-6)
mug[2]~dnorm(0,1.0E-6)
# put all the parameters' results into res
res[1]<-A2
res[2]<-A3
for(j in 3:4){ res[j]<- mug[j-2] }
res[5]<-sigma2A[1,1]
res[6]<-sigma2C[1,1]
res[7]<-sigma2E[1,1]
res[8]<-sigma2A[2,2]
res[9]<-sigma2C[2,2]
res[10]<-sigma2E[2,2]
res[11]<-sigma2A[1,2]
res[12]<-sigma2C[1,2]
res[13]<-sigma2E[1,2]
res[14]<-sigma2y
}
list(A2=1,A3=2,mug=c(11,2),tauy=0.1)
list(A2=1.5,A3=3,mug=c(30,1),tauy=0.5)
B.8: Bugs script for Bayesian logistic model-III
model {
# Define Missingness pattern
for (i in 1:N)
{
b0[i]~dnorm(mub,taub) # Remove this part to get Bayesian logistic model-II
y[i,18]~dbern(p.bound[i,18])
p.bound[i,18] <- max(0, min(1, p[i,18]))
# Remove b2 to get Bayesian logistic model-I
logit(p[i,18])<-b0[i]+b1*y[i,6]+b2*y[i,10]
y[i,22]~dbern(p.bound[i,22])
p.bound[i,22] <- max(0, min(1, p[i,22]))
# Remove b2 to get Bayesian logistic model-I
logit(p[i,22])<-b0[i]+b1*y[i,10]+b2*y[i,6]
}
# Define Regression model
for (i in 1:N)
{
for (k in 3:6) {y[i,k]~dnorm(muy[i,k],tauy)}
muy[i,3]<-gt1[i,1]
muy[i,4]<-gt1[i,1]+A2*gt1[i,2]
muy[i,5]<-gt1[i,1]+A3*gt1[i,2]
muy[i,6]<-gt1[i,1]+3*gt1[i,2]
for (k in 7:10) {y[i,k]~dnorm(muy[i,k],tauy)}
muy[i,7]<-gt2[i,1]
muy[i,8]<-gt2[i,1]+A2*gt2[i,2]
muy[i,9]<-gt2[i,1]+A3*gt2[i,2]
muy[i,10]<-gt2[i,1]+3*gt2[i,2]
}
# Define Model for MZ twins
for (i in 1:310)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
g_AC[i,1:2]~dmnorm(g_C[i,1:2],tauA[1:2,1:2])
gt1[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(g_AC[i,1:2],tauE[1:2,1:2])
}
# Define Model for DZ twins
for (i in 311:N)
{
g_C[i,1:2]~dmnorm(mug[1:2],tauC[1:2,1:2])
dzg_ACbt[i,1:2]~dmnorm(g_C[i,1:2],tauAhf[1:2,1:2])
dzg_ACt1[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
dzg_ACt2[i,1:2]~dmnorm(dzg_ACbt[i,1:2],tauAhf[1:2,1:2])
gt1[i,1:2]~dmnorm(dzg_ACt1[i,1:2],tauE[1:2,1:2])
gt2[i,1:2]~dmnorm(dzg_ACt2[i,1:2],tauE[1:2,1:2])
}
for (k in 1:2)
{
for (l in 1:2)
{tauAhf[k,l]<-2*tauA[k,l]}
}
# Define Priors
sigma2C[1:2,1:2]<-inverse(tauC[1:2,1:2])
sigma2A[1:2,1:2]<-inverse(tauA[1:2,1:2])
sigma2E[1:2,1:2]<-inverse(tauE[1:2,1:2])
sigma2y<-1/tauy
sigma2b<-1/taub
tauC[1:2,1:2]~dwish(R[1:2,1:2],2)
tauA[1:2,1:2]~dwish(R[1:2,1:2],2)
tauE[1:2,1:2]~dwish(R[1:2,1:2],2)
tauy~dunif(0,10)
taub~dgamma(0.001,0.001)
mug[1]~dnorm(0,1.0E-6)
mug[2]~dnorm(0,1.0E-6)
mub~dnorm(-20,1.0E-6)
b1~dnorm(0,1.0E-6)
b2~dnorm(0,1.0E-6)
A2~dnorm(0,1.0E-6)
A3~dnorm(0,1.0E-6)
# put all the parameters' results into res
res[1]<-A2
res[2]<-A3
res[3]<-mub
res[4]<-sigma2b
res[5]<-b1
res[6]<-b2
for(j in 7:8){ res[j]<- mug[j-6] }
res[9]<-sigma2A[1,1]
res[10]<-sigma2C[1,1]
res[11]<-sigma2E[1,1]
res[12]<-sigma2A[2,2]
res[13]<-sigma2C[2,2]
res[14]<-sigma2E[2,2]
res[15]<-sigma2A[1,2]
res[16]<-sigma2C[1,2]
res[17]<-sigma2E[1,2]
res[18]<-sigma2y
}
list(A2=1,A3=2,mub=-20,b1=1,b2=0.8,mug=c(10,3),tauy=0.1,taub=0.2)
list(A2=2,A3=3,mub=-10,b1=0.5,b2=0.6,mug=c(0,5),tauy=0.5,taub=0.8)
Abstract
The overarching aims of this research were to find a way to evaluate assumptions about patterns of missing data and to minimize the adverse effects on statistical inferences and conclusions when data are missing not at random. The Bayesian approach, implemented through MCMC algorithms, offered a possible bridge to both aims and was investigated in multiple phases under a longitudinal twin design. In simulation studies I and II, the performance of the Bayesian approach was compared with direct maximum likelihood (ML) estimation when data were missing completely at random (MCAR) or missing not at random (MNAR), respectively. The robustness of the different approaches to varying fractions of missing information was also examined. The Bayesian approach, despite being less robust than ML in estimating unique environmental components and residual error, effectively recovered the true models from MNAR data when an appropriate logistic regression was formulated for the missingness pattern. In simulation study III, the sensitivity of the Bayesian approach itself to misspecifications of the missingness pattern was investigated further. The results suggested that, as long as the logistic regression equation in the Bayesian model was a saturated one that took into account as many predictors as possible, the Bayesian approach could correctly detect misspecifications with an acceptable type-I error rate. Finally, the Bayesian model was applied to an empirical study of psychopathic traits to demonstrate the influence of MNAR data and how the conclusions it supports can differ from those of regular ML estimation. The implications of these differing conclusions for the theory of psychopathy itself were also discussed.
Asset Metadata
Creator: Wang, Pan (author)
Core Title: Attrition in longitudinal twin studies: a comparative study of SEM estimation methods
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 10/15/2012
Defense Date: 08/31/2012
Publisher: University of Southern California (original); University of Southern California. Libraries (digital)
Tags: longitudinal studies, OAI-PMH Harvest, SEM, structural equation modeling, Twins
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Baker, Laura A. (committee chair); Gatz, Margaret (committee member); John, Richard S. (committee member); McArdle, John J. (committee member); Thomas, Duncan C. (committee member)
Creator Email: panwang@usc.edu, phoebewang2@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-104726
Unique identifier: UC11288964
Identifier: usctheses-c3-104726 (legacy record id)
Legacy Identifier: etd-WangPan-1248.pdf
Dmrecord: 104726
Document Type: Dissertation
Rights: Wang, Pan
Type: texts
Source: University of Southern California (contributing entity); University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA