NONLINEAR MIXED EFFECTS MIXTURE MODELS 1
Estimation of Nonlinear Mixed Effects Mixture Models
with Individually Varying Measurement Occasions
Sarfaraz Serang
A Thesis Submitted in Partial Fulfillment of the
Requirements for the Degree of
Master of Arts
in Psychology
at
The University of Southern California
May 2015
Table of Contents
Abstract
Introduction
Purpose
Model formulation
Estimation
Simulations
Part 1 – General conditions
Part 1 – Results
Part 2A – Acknowledging time heterogeneity
Part 2A – Results
Part 2B – Time-varying residuals
Part 2B – Results
Part 3A – Data missing completely at random
Part 3A – Results
Part 3B – Attrition
Part 3B – Results
Part 3C – Non-random selection
Part 3C – Results
Illustrative Example
Discussion
Acknowledgements
References
Appendix A – Sample Mplus model statement for 2-class NMEMM
Appendix B – Sample OpenBUGS model statement for 2-class NMEMM
Tables
Figures
List of Tables
Table 1: Part 1 – General conditions – Population values
Table 2: Part 1 – General conditions – Convergence
Table 3: Part 1 – General conditions – Model selection
Table 4: Part 1 – General conditions – Percent bias
Table 5: Part 1 – General conditions – Standard errors
Table 6: Part 2A – Acknowledging time heterogeneity – Population values
Table 7: Part 2A – Acknowledging time heterogeneity – Percent bias
Table 8: Part 2B – Time-varying residuals – Percent bias
Table 9: Part 3A – MCAR – Percent bias
Table 10: Part 3B – Attrition – Percent bias
Table 11: Part 3C – Non-random selection – Percent bias
Table 12: Illustrative example – Parameter estimates for ECLS-K data
List of Figures
Figure 1: Growth trajectories for a subsample of 100 individuals from the ECLS-K study
Figure 2: Expected trajectories based on estimated model parameters
Abstract
Change over time often takes on a nonlinear form which can introduce complexities in
model estimation. Furthermore, these change patterns can sometimes be characterized by
heterogeneity due to underlying unobserved groups in the population. Nonlinear mixed effects
mixture models provide one way of addressing both of these issues simultaneously. The purpose
of this study is to extend this class of models to accommodate individually varying measurement
occasions. We develop methods to fit these models in both the structural equation modeling
framework as well as the Bayesian framework and evaluate the performance of these methods in
an attempt to provide researchers with some practical recommendations regarding their use.
Simulation results show that the main force driving the success of these methods is the
separation between latent classes. When these classes are well separated, even a sample of 200
individuals appears to be sufficient. Otherwise, a sample of 1000 or more may be required before
parameters can be accurately recovered. Ignoring heterogeneity in time of measurement also led
to substantial bias, particularly in the random effects parameters. Finally, we demonstrate the
application of these techniques to an empirical dataset involving the development of reading
ability in children.
Keywords: longitudinal, nonlinear, mixed effects model, growth mixture model,
structural equation modeling, Bayesian
Introduction
The study of how individuals change over time is an important element of research in the
social sciences, especially in the field of psychology. This change is often decomposed into
two parts: intraindividual change (change within a person) and interindividual differences
(differences between persons). Intraindividual change examines how a person develops over
time, whereas interindividual differences capture how patterns of change vary from
one person to another (Baltes & Nesselroade, 1979). Longitudinal models have become popular
due to their strength in teasing apart the effects of each of these two components (Curran &
Bauer, 2011). In particular, the structural equation modeling (SEM) framework has seen
extensive use in the modeling of longitudinal data because of its inherent flexibility in addition to
its ability to integrate latent variables into its models (McArdle, 2009).
The latent growth curve model (LGM; Meredith & Tisak, 1990; McArdle & Epstein,
1987; McArdle, 1988) has become one of the most commonly used models in studying
developmental change in psychology. The LGM proposes that the trajectory of an individual’s
scores over time is the result of some unobserved underlying process with some added random
error. While the shape of this trajectory (the functional form) is assumed to be the same for all
individuals, each person is permitted to have their own parameter values, which are assumed to
take on a prespecified distributional form. Allowing each person to have their own curve gives us
the opportunity to simultaneously study both interindividual differences and intraindividual
change by fitting only a single model.
Many models assume that samples, and thus by extension the populations of interest from
which they are drawn, are homogeneous. That is, they operate under the assumption that all
individuals within a population are fundamentally alike and therefore exhibit similar
characteristics. However, in the social sciences such a strong assumption is often unfounded and
heterogeneity is usually the norm. When the source of this heterogeneity is observed, a multiple
group model may better describe the data because of its flexibility in allowing data in each group
to behave in a way that can potentially differ from that of the other groups. For example, one
commonly used grouping variable is gender, given that males and females often differ in distinct
ways (Nolen-Hoeksema, 2001). For simplicity we only consider nominal grouping variables
here.
When the grouping variable is unobserved, however, multiple group models cannot be fit
due to the lack of information on group membership. In this case, the data may be better
described by a finite mixture model (McLachlan & Peel, 2000). Finite mixture models assume
that a population is composed of a finite number of underlying subpopulations, each with its own
properties. While the distributional form of each subpopulation is typically assumed to be the
same, the parameters are allowed to vary across subpopulations. One popular use for mixture
models is classification. Because each subpopulation can be conceptualized as a latent class,
each individual can be assigned to a class based on posterior probabilities of latent class
membership.
The growth mixture model (GMM) stems from the combination of the LGM and finite
mixture model (Muthén & Shedden, 1999; Muthén & Muthén, 2000). In GMMs, each latent
class derived from the mixture model component represents a subpopulation that follows a
distinct trajectory of change described by the LGM component. The mean trajectory of each
class typically follows the same functional form (though this need not be the case) but with
different parameters for each class. Furthermore, individuals within the same class are allowed to
vary around the mean trajectory and are thus permitted to differ from each other. The ability to
account for individual variability is a feature of the GMM that sets it apart from other models of
heterogeneous change such as the latent class growth model (Nagin, 1999).
While linear models are among the most widely used, change patterns in psychology
often take on nonlinear functional forms (Cudeck & Harring, 2007). We use this opportunity to
explicitly define our notion of linear and nonlinear models to clarify our intent when referring to
each. Our definition can be perceived from either a linear algebra or calculus perspective, which
for our purposes we consider to be functionally equivalent. From a linear algebra perspective, we
define a linear model as one in which the outcome is written as a linear combination of the model
parameters. Alternatively, from a calculus perspective, we consider a linear model to be one in which
the first partial derivative of the model with respect to any model parameter no longer contains
that parameter (Timmons & Preacher, 2015). This statement must hold for all parameters in the
model for the model to be considered linear.
For example, consider the multiple regression model $y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$.
From a linear algebra perspective, $y_i$ is written as a linear combination of the $\beta$ parameters. This
can easily be seen when the model is written in matrix notation, $\boldsymbol{Y} = \boldsymbol{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}$. From a calculus
perspective, the partial derivative taken with respect to each parameter is no longer a
function of that parameter. In this case, $\partial y_i / \partial \beta_0 = 1$, $\partial y_i / \partial \beta_1 = x_{i1}$, and $\partial y_i / \partial \beta_2 = x_{i2}$, none of which contain
their respective $\beta$ parameters. Note that this definition makes no reference to the geometric
trajectory over time, only the functional form. For example, consider a quadratic model of
growth, $y_{it} = \beta_0 + \beta_1 t + \beta_2 t^2 + \varepsilon_{it}$. While the trajectory of growth is nonlinear, our definition
would still consider this to be a linear model for the same reasons as the multiple regression
model above. On the other hand, an example of a nonlinear model would be an exponential
growth model of the form $y_{it} = \beta_0 e^{\beta_1 t} + \varepsilon_{it}$. This model cannot be written as a linear
combination of its parameters. Additionally, the first derivative with respect to $\beta_1$, $\partial y_{it} / \partial \beta_1 = \beta_0 t e^{\beta_1 t}$, contains $\beta_1$. According to our definition, polynomial models as well as piece-wise
models with polynomial components would be considered linear models. The exponential,
Gompertz, and logistic models would all be examples of nonlinear models.
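The calculus-based definition can be checked numerically. The following is a minimal sketch, not part of the original analysis: it estimates each partial derivative by finite differences and asks whether that derivative changes when the parameter itself is shifted. The function names, parameter values, and tolerance are illustrative assumptions, and a check at a single point is only a heuristic.

```python
import math

def quadratic(t, beta):
    """Quadratic growth: curved over time but linear in its parameters."""
    b0, b1, b2 = beta
    return b0 + b1 * t + b2 * t ** 2

def exponential(t, beta):
    """Exponential growth y = b0 * exp(b1 * t): nonlinear in b1."""
    b0, b1 = beta
    return b0 * math.exp(b1 * t)

def partial(model, t, beta, k, h=1e-6):
    """Central finite-difference estimate of d model / d beta[k]."""
    up = list(beta)
    up[k] += h
    down = list(beta)
    down[k] -= h
    return (model(t, up) - model(t, down)) / (2 * h)

def derivative_depends_on_parameter(model, t, beta, k, shift=0.5, tol=1e-4):
    """True if d model / d beta[k] changes when beta[k] itself is shifted."""
    shifted = list(beta)
    shifted[k] += shift
    return abs(partial(model, t, beta, k) - partial(model, t, shifted, k)) > tol

def looks_linear(model, t, beta):
    """The condition must hold for every parameter for the model to be linear."""
    return not any(derivative_depends_on_parameter(model, t, beta, k)
                   for k in range(len(beta)))

quadratic_is_linear = looks_linear(quadratic, 2.0, [1.0, 0.5, 0.2])
exponential_is_linear = looks_linear(exponential, 2.0, [1.0, 0.3])
```

Under these assumed values the quadratic model passes the check for all three parameters, while the exponential model fails it because the derivative with respect to its rate parameter still contains that parameter.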
Although methods for estimating linear GMMs have been well established (Muthén &
Shedden, 1999), the estimation of their nonlinear counterparts has not been so straightforward.
As a result, many different estimation methods have been proposed for this class of models.
Kelley (2008) suggested five multistage procedures that could be used for estimation. Grimm
and colleagues made use of linearization via Taylor series expansion (Grimm, Ram, &
Estabrook, 2010) as described by Browne and du Toit (1991). Others have employed maximum
likelihood estimation (MLE). Among them, Harring (2012) utilized the expectation-
maximization (EM) algorithm, while Codd and Cudeck (2014) used adaptive Gauss-Hermite
quadrature. On the other hand, Lu and Huang (2014) took a Bayesian perspective on modeling
the trajectory of viral load response. Serang and colleagues also evaluated a Bayesian approach,
comparing its performance to the linearization approach (Serang, Zhang, Helm, Steele, &
Grimm, 2015).
Time of measurement is another concern. LGMs are frequently fit under the assumption
that all individuals measured within a wave of data are measured simultaneously. As such, the
independent variable, time of measurement, is assumed to be the same for all individuals. One
example of this is in educational settings, where time of measurement is often defined by grade.
If students were assessed at grades 1, 2, 3, and 4, the values 1, 2, 3, and 4 would be used as the
independent variable in the model for all individuals. However, the assumption that all
individuals are measured at the same time may be too strong in some circumstances.
A child measured at the end of first grade may be expected to perform better than one measured
at the beginning of first grade, but this would not be accounted for in the aforementioned
scheme. This structure also precludes age as an independent variable, given that individuals are
typically of different ages when measured. That said, it may be more useful to explicitly
acknowledge time heterogeneity by accommodating individual measurement schedules.
While the use of individually varying time points for LGMs was previously only
available in the multi-level modeling (MLM) framework, it has been shown that it can be
incorporated in the SEM framework as well (Mehta & West, 2000). This can be done using
definition variables, observed variables placed in parameter matrices in order to fix them to
individual specific values (Mehta & Neale, 2005). In SEM, this involves placing the definition
variables in the factor loading matrix, and allowing this matrix to be individual specific. This
logic is not restricted to linear LGMs; individually varying time points have been incorporated
into nonlinear LGMs in the same way (Sterba, 2014).
Several studies have demonstrated that failure to account for individual measurement
occasions can lead to bias in parameter estimates. Mehta and West (2000) showed that although
estimates for fixed effects (means of the intercept and slope) were generally reasonable, random
effects exhibited some bias. While the estimate for the variance of the slope was acceptable, the
estimates for the variance of the intercept and the covariance between the intercept and slope
were biased. Aydin and colleagues also advise against ignoring time heterogeneity, though they
note some difference between their results and those of Mehta and West (Aydin, Leite, & Algina,
2014). In their more recent study, although these authors also found no appreciable bias in fixed
effects, they found different patterns of bias in the random effects (Aydin et al., 2014). They
found no bias in the estimates for variance of intercept or covariance between intercept and
slope, but did find bias in the variance of the slope. The authors suggested that the differences in
results could be due to differences in study design. Finally, Blozis and Cho (2008) recommend
the incorporation of individual measurement occasions as well. Because their study used real
datasets as opposed to simulated ones, claims regarding direction and magnitude of bias could
not be made. However, these authors noted that models accounting for individual measurement
schedules fit the data better than those that ignored it.
Purpose
The purpose of this article is to extend the work on the nonlinear GMM, hereafter
referred to as the nonlinear mixed effects mixture model (NMEMM), to incorporate individually
varying measurement occasions. Although we do not claim to be the first to fit such models,
the literature offers little guidance on the conditions under which estimation techniques used
to fit these models can be expected to provide accurate, stable, and reliable parameter estimates.
Our goal is to fill this need in order to provide researchers with recommendations on the
appropriate circumstances in which these techniques should be used.
We continue by first developing the NMEMM and discussing how it can be estimated.
We then describe a simulation study in which we test how well the models can be fit as well as
how well parameters can be recovered under general conditions. Next, we demonstrate the
importance of acknowledging time heterogeneity in these models. We continue by exploring
some specific conditions often encountered by researchers in practice. Finally, we apply our
methods to a real dataset involving the development of reading ability in order to illustrate their
use in practical applications.
Model formulation
To lay the foundation for the NMEMM, we begin with the unconditional LGM. In the
GMM literature, the LGM is typically written from the SEM perspective (Bauer, 2007; Sterba,
2013). Given that the SEM framework is an inherently linear framework, this presents no issues
for linear models but precludes the direct specification of nonlinear models within this
framework. Therefore, we choose to describe the LGM from a mixed effects perspective because
of its ability to more easily accommodate nonlinear models. It has been shown that identical
results can be obtained regardless of the framework within which the model is specified (Willett
& Sayer, 1994).
That said, the unconditional LGM is given by
$$\boldsymbol{y}_i = f(\boldsymbol{t}_i, \boldsymbol{\beta}, \boldsymbol{b}_i) + \boldsymbol{e}_i \tag{1}$$
where $\boldsymbol{y}_i = [y_{i1}\; y_{i2}\; \ldots\; y_{iW}]'$ is a $W \times 1$ vector of the observed scores of the $i$th individual ($i = 1, 2, \ldots, n$) measured during wave $w$ ($w = 1, 2, \ldots, W$). Here, $f$ is a linear or nonlinear vector-valued function of $\boldsymbol{t}_i = [t_{i1}\; t_{i2}\; \ldots\; t_{iW}]'$, a $W \times 1$ vector of the $W$ individual specific time points at
which the $i$th individual was measured, $\boldsymbol{\beta} = [\beta_1\; \beta_2\; \ldots\; \beta_P]'$, a $P \times 1$ vector of fixed effects (where
$P$ is the number of parameters in the growth function), and $\boldsymbol{b}_i = [b_{i1}\; b_{i2}\; \ldots\; b_{iP}]'$, a $P \times 1$ vector
of random effects for the $i$th individual. The random effects are assumed to follow a multivariate
normal distribution, i.e. $\boldsymbol{b}_i \sim N(\boldsymbol{0}, \boldsymbol{\Sigma})$ where $\boldsymbol{\Sigma}$ represents the $P \times P$ covariance matrix for the
random effects. Lastly, $\boldsymbol{e}_i = [e_{i1}\; e_{i2}\; \ldots\; e_{iW}]'$ represents the $W \times 1$ residual vector of individual $i$.
Each residual is assumed to be normally distributed, $e_{iw} \sim N(0, \sigma^2)$ where $\sigma^2$ denotes the
residual variance.
The NMEMM can be conceptualized as an LGM subjected to a finite mixture model.
That is, the NMEMM can be written as
$$\boldsymbol{y}_i = \sum_{c=1}^{C} \pi_{ic} \left( f_c(\boldsymbol{t}_i, \boldsymbol{\beta}_c, \boldsymbol{b}_{ic}) + \boldsymbol{e}_{ic} \right) \tag{2}$$
where once again, $\boldsymbol{y}_i$ is a vector of length $W$ containing the scores of the $i$th individual. The
variable $\pi_{ic}$ represents the unobserved probability that individual $i$ belongs to class $c$ ($c = 1, 2, \ldots, C$). Given that $\pi_{ic}$ is a proportion, we also impose the constraints $0 \le \pi_{ic} \le 1$ and
$\sum_{c=1}^{C} \pi_{ic} = 1$. In this model, $f_c$ is the potentially class-specific vector-valued function for class $c$,
though we note that in practice this function is typically chosen to be the same for all classes,
$f_1 = f_2 = \cdots = f_C = f$. Its first argument, $\boldsymbol{t}_i$, is once again a vector of length $W$ containing the
individual specific times at which individual $i$ was measured. $\boldsymbol{\beta}_c$ is the vector containing the
fixed effects for class $c$. $\boldsymbol{b}_{ic}$ is the $i$th individual’s vector of random effects for class $c$, assumed to
follow the multivariate normal distribution $\boldsymbol{b}_{ic} \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}_c)$ where $\boldsymbol{\Sigma}_c$ is the covariance matrix of
the random effects for class $c$. Lastly, $\boldsymbol{e}_{ic}$ is a vector of length $W$ containing the class specific
residuals of the $i$th individual, where each residual is assumed to be normally distributed,
$e_{icw} \sim N(0, \sigma_c^2)$ with $\sigma_c^2$ representing the residual variance for class $c$.
As mentioned earlier, NMEMMs can also be used to classify individuals into one of the
C latent classes. Because class membership is unobserved, classifications can never be made
definitively. However, an attempt at categorization can be made using posterior probabilities of
membership. Because 𝜋 𝑖𝑐
is estimated for all i and c, the probability that each individual belongs
to each class can be obtained. Individuals are then assigned to the class for which their posterior
probability is greatest. The mixture proportions for a given sample can therefore be derived
based on the proportion of individuals assigned to each class.
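The classification step can be sketched in a few lines. This is an illustration under stated assumptions rather than the routine used by any particular software: the class-conditional log-likelihoods are taken as given (in practice they would come from the fitted model), and the mixing proportions and numeric values are made up for the example.

```python
import math

def posterior_probabilities(mixing, log_liks):
    """Bayes rule: P(class c | data) is proportional to mixing[c] * exp(log_liks[c])."""
    log_joint = [math.log(p) + ll for p, ll in zip(mixing, log_liks)]
    top = max(log_joint)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in log_joint]
    total = sum(weights)
    return [w / total for w in weights]

def modal_class(posterior):
    """Index of the class with the greatest posterior probability."""
    return max(range(len(posterior)), key=lambda c: posterior[c])

def assigned_proportions(assignments, n_classes):
    """Share of individuals assigned to each class."""
    return [assignments.count(c) / len(assignments) for c in range(n_classes)]

# Hypothetical inputs: two equally weighted classes and three individuals.
mixing = [0.5, 0.5]
log_liks = [[-10.0, -12.0], [-15.0, -11.0], [-9.0, -9.5]]

posteriors = [posterior_probabilities(mixing, ll) for ll in log_liks]
assignments = [modal_class(p) for p in posteriors]
proportions = assigned_proportions(assignments, 2)
```

Note that the third individual is assigned to the first class even though the two posterior probabilities are close, which illustrates why such classifications can never be made definitively.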
Before proceeding, we introduce the notion of measurement windows (Sterba, 2014).
Although they are not technically necessary for the specification of the model as we have
presented it, they can sometimes be needed for estimation. Because all models fit in this paper
require the concept, we describe it here. Measurement windows, as we define them, are discrete
time intervals within which individuals are measured. Individuals need not be measured within a
given window, but they can only provide at most one measurement per window. The time at
which that measurement is given may vary freely within the window, but all measurements must
occur within the bounds of some window. Additionally, each window can be of different lengths;
they need not all be the same size. For our purposes, we require that windows be non-
overlapping.
In this way, measurement windows can be thought of as waves in a cohort study. In this
case, an individual can only have at most W measurements: one per window or wave. As an
example, consider a hypothetical study following individuals during their first three decades of
life, collecting a single measurement per decade. Here, each decade would be considered a
measurement window. One individual could be measured at ages 5, 15, and 25. Another could be
measured at ages 9, 11, and 29. A third could be measured at ages 4 and 12, but drop out of the
study before being measured a third time. These are all valid measurements under the window
system. The measurements at ages 4, 5, and 9 all take place within the first window, the
measurements at ages 11, 12, and 15 take place within the second window, and the
measurements for ages 25 and 29 take place within the third. However, if an individual were to
be measured at ages 6, 11, and 19, this would not be valid because the measurements at ages 11
and 19 occur within the same window.
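The window rules in this example can be expressed as a short validity check. The sketch below is illustrative only: the function names are hypothetical, and the convention that a window includes its lower bound but not its upper bound is an assumption made for the example.

```python
def assign_windows(ages, windows):
    """Return the window index of each measurement, or raise ValueError.

    windows is a list of non-overlapping (lower, upper) bounds. A measurement
    at time t falls in a window when lower <= t < upper (assumed convention).
    """
    used = []
    for age in ages:
        hits = [k for k, (lo, hi) in enumerate(windows) if lo <= age < hi]
        if not hits:
            raise ValueError(f"measurement at {age} falls outside every window")
        if hits[0] in used:
            raise ValueError(f"window {hits[0]} already holds a measurement")
        used.append(hits[0])
    return used

def is_valid_schedule(ages, windows):
    """True if every measurement sits in a window and no window is used twice."""
    try:
        assign_windows(ages, windows)
        return True
    except ValueError:
        return False

# One window per decade, as in the hypothetical study above.
decades = [(0, 10), (10, 20), (20, 30)]

first = is_valid_schedule([5, 15, 25], decades)
second = is_valid_schedule([9, 11, 29], decades)
dropout = is_valid_schedule([4, 12], decades)
invalid = is_valid_schedule([6, 11, 19], decades)
```

The first three schedules are accepted, including the one with a missing third measurement, while the last is rejected because two of its measurements fall in the second decade.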
Estimation
As previously noted, estimation of NMEMMs is not straightforward. For linear mixed
effects models, integrating over the random effects can provide closed form solutions (Pan &
Fang, 2002). However, for nonlinear mixed effects models, no closed form solution exists.
Incorporating mixtures only further complicates matters. For linear GMMs, although no
analytical solution exists, the EM algorithm can be used to estimate model parameters (Muthén
& Shedden, 1999). Yet due to its nonlinear form, the NMEMM cannot be estimated in the same
way. As a consequence, a variety of techniques have been proposed to fit this model (Kelley,
2008; Grimm et al., 2010; Harring, 2012; Codd & Cudeck, 2014; Lu & Huang, 2014; Serang et
al., 2015). Because the current study extends the methods described by Serang and colleagues,
we limit our discussion to the techniques used in that study.
Serang and collaborators (2015) compared two different techniques for estimating
NMEMMs. Although this study did not explicitly consider individually varying time points, we
use this opportunity to integrate them here. The first is a method described by Grimm, Ram, and
Estabrook (2010) and is implemented in the SEM framework. Because of the nonlinearity, the
model cannot be directly specified as a structural equation model. To circumvent this, it is
approximated as a nonlinear structured latent curve model using linearization via Taylor series
expansion (Browne & du Toit, 1991).
This can be carried out as follows. Consider a nonlinear growth function, $g_i(\boldsymbol{t}_i, \boldsymbol{\mu}_{\eta_i})$,
where $g_i$ is an individual specific function of time, $\boldsymbol{t}_i$, and the means of the growth parameters,
$\boldsymbol{\mu}_{\eta_i} = [\mu_{\eta_{i1}}\; \mu_{\eta_{i2}}\; \ldots\; \mu_{\eta_{iP}}]'$. Note that the functional form of each $g_i$ is the same for all
individuals; the only individual specific component is the time of measurement, which is allowed
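to vary across individuals. The approximation for the LGM can be written as
$$\boldsymbol{y}_i = \boldsymbol{\Lambda}_i \boldsymbol{\eta}_i + \boldsymbol{e}_i \tag{3}$$
where $\boldsymbol{y}_i$ is a $W \times 1$ vector of scores, $\boldsymbol{\Lambda}_i$ is a $W \times P$ matrix of factor loadings, $\boldsymbol{\eta}_i$ is a $P \times 1$ vector
of latent factor scores, and $\boldsymbol{e}_i$ is a $W \times 1$ vector of residuals. Here, the factor loadings consist of
partial derivatives of the growth function with respect to the fixed effects and the latent factors
correspond to the $P$ parameters in the growth function. This can be more clearly seen if we
rewrite Equation (3) to show the elements of each of its vectors and matrices, namely
$$\begin{bmatrix} y_{i1} \\ \vdots \\ y_{iW} \end{bmatrix} = \begin{bmatrix} \dfrac{\partial g_i}{\partial \mu_{\eta_{i1}}} & \cdots & \dfrac{\partial g_i}{\partial \mu_{\eta_{iP}}} \\ \vdots & \vdots & \vdots \\ \dfrac{\partial g_i}{\partial \mu_{\eta_{i1}}} & \cdots & \dfrac{\partial g_i}{\partial \mu_{\eta_{iP}}} \end{bmatrix} \begin{bmatrix} \eta_{i1} \\ \vdots \\ \eta_{iP} \end{bmatrix} + \begin{bmatrix} e_{i1} \\ \vdots \\ e_{iW} \end{bmatrix} \tag{4}$$
The factor scores can be decomposed into fixed effects, $\boldsymbol{\beta}$, and random effects, $\boldsymbol{b}_i$, such that
$$\boldsymbol{\eta}_i = \boldsymbol{\beta} + \boldsymbol{b}_i \tag{5}$$
The random effects and residuals each follow a multivariate normal distribution such that
$\boldsymbol{b}_i \sim N(\boldsymbol{0}, \boldsymbol{\Psi})$ and $\boldsymbol{e}_i \sim N(\boldsymbol{0}, \boldsymbol{\Theta})$ where $\boldsymbol{\Psi}$ is the $P \times P$ covariance matrix for the random effects
and $\boldsymbol{\Theta}$ is the $W \times W$ covariance matrix for the residuals. Inserting Equation (5) into Equation (3)
yields
$$\boldsymbol{y}_i = \boldsymbol{\Lambda}_i \boldsymbol{\beta} + \boldsymbol{\Lambda}_i \boldsymbol{b}_i + \boldsymbol{e}_i \tag{6}$$
with expected mean vector ($\boldsymbol{\mu}_i$) and covariance matrix ($\boldsymbol{\Sigma}_i$)
$$\begin{aligned} \boldsymbol{\mu}_i &= \boldsymbol{\Lambda}_i \boldsymbol{\beta} \\ \boldsymbol{\Sigma}_i &= \boldsymbol{\Lambda}_i \boldsymbol{\Psi} \boldsymbol{\Lambda}_i' + \boldsymbol{\Theta} \end{aligned} \tag{7}$$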
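for each individual. Note that there is not only a single mean and covariance matrix, but rather $n$
of each. Incorporating the finite mixture component, we have
$$\boldsymbol{y}_i = \sum_{c=1}^{C} \pi_{ic} \left( \boldsymbol{\Lambda}_{ic} \boldsymbol{\beta}_c + \boldsymbol{\Lambda}_{ic} \boldsymbol{b}_{ic} + \boldsymbol{e}_{ic} \right) \tag{8}$$
where $\pi_{ic}$ adheres to the same properties described before. From here, we can derive the mean
and covariance expectations for the NMEMM with individual measurement schedules
$$\begin{aligned} \boldsymbol{\mu}_i &= \sum_{c=1}^{C} \pi_{ic} \left( \boldsymbol{\Lambda}_{ic} \boldsymbol{\beta}_c \right) \\ \boldsymbol{\Sigma}_i &= \sum_{c=1}^{C} \pi_{ic} \left( \boldsymbol{\Lambda}_{ic} \boldsymbol{\Psi}_c \boldsymbol{\Lambda}_{ic}' + \boldsymbol{\Theta}_c \right) \end{aligned} \tag{9}$$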
This specification allows the NMEMM to be fit via the EM algorithm (Muthén & Shedden,
1999) using SEM software.
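To make the role of the individual specific loading matrix concrete, the following sketch builds $\boldsymbol{\Lambda}_i$ for a single class and forms the covariance expectation $\boldsymbol{\Lambda}_i \boldsymbol{\Psi} \boldsymbol{\Lambda}_i' + \boldsymbol{\Theta}$ from Equation (7). It assumes the three parameter exponential growth function used later in the simulations, $\beta_1 + \beta_2(1 - e^{-\beta_3 t})$; all numeric values are hypothetical, and this is an illustration rather than the way SEM software performs the computation.

```python
import math

def loading_matrix(times, beta):
    """W x 3 matrix of partial derivatives of the assumed growth function
    with respect to each fixed effect, evaluated at one individual's times."""
    b1, b2, b3 = beta
    return [[1.0,
             1.0 - math.exp(-b3 * t),
             b2 * t * math.exp(-b3 * t)] for t in times]

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def implied_covariance(lam, psi, theta):
    """Sigma_i = Lambda_i Psi Lambda_i' + Theta."""
    core = matmul(matmul(lam, psi), transpose(lam))
    return [[core[i][j] + theta[i][j] for j in range(len(core))]
            for i in range(len(core))]

# Hypothetical values: fixed effects, random effect covariance, residual covariance.
beta = [10.0, 20.0, 0.5]
psi = [[4.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.01]]
theta = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

# Two individuals measured at different times within the same three waves.
sigma_a = implied_covariance(loading_matrix([0.0, 1.0, 2.0], beta), psi, theta)
sigma_b = implied_covariance(loading_matrix([0.5, 1.5, 2.5], beta), psi, theta)
```

Because the loadings depend on each person's measurement times, the two individuals receive different implied covariance matrices even though they share the same parameters, which is the sense in which there are $n$ expectations rather than one.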
The second method utilizes the Bayesian framework, proceeding as follows. Begin by
sampling a class membership indicator $c_i$ from a categorical distribution for each individual,
such that
$$c_i \sim Cat(\pi_1, \pi_2, \ldots, \pi_C) \tag{10}$$
where $\pi_c$ represents the probability of belonging to class $c$. Then, the observed score for the $i$th
individual measured at wave $w$, conditional on $c_i$, can be modeled by a normal distribution
$$y_{iw} \sim N(\mu_{iw}, \sigma_c^2) \tag{11}$$
where $\mu_{iw}$ is defined by the growth function
$$\mu_{iw} = f_c(t_{iw}, \boldsymbol{\beta}_c, \boldsymbol{b}_{ic}) \tag{12}$$
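a function of time of measurement, $t_{iw}$, fixed effects $\boldsymbol{\beta}_c = [\beta_{c1}\; \beta_{c2}\; \ldots\; \beta_{cP}]'$, and random effects
$\boldsymbol{b}_{ic} = [b_{ic1}\; b_{ic2}\; \ldots\; b_{icP}]'$ which follow a multivariate normal distribution
$$\boldsymbol{b}_{ic} \sim N(\boldsymbol{0}, \boldsymbol{\Sigma}_c) \tag{13}$$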
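The data-generating steps in Equations (10) through (13) can be sketched as a small simulation. This is a hedged illustration, not the simulation code used in this study: it assumes the exponential growth function used later in the simulations, draws the random effects independently (that is, it takes $\boldsymbol{\Sigma}_c$ to be diagonal for simplicity), and uses made-up parameter values and one measurement per unit-length window.

```python
import math
import random

def growth(t, coef):
    """Assumed exponential growth function; any f_c could be substituted."""
    return coef[0] + coef[1] * (1.0 - math.exp(-coef[2] * t))

def simulate_individual(times, pi, beta, sd_b, sd_e, rng):
    """Draw one individual's class and scores.

    sd_b[c][p] is the standard deviation of random effect p in class c.
    Random effects are drawn independently, a simplification for this sketch.
    """
    c = rng.choices(range(len(pi)), weights=pi)[0]             # c_i ~ Cat(pi)
    b = [rng.gauss(0.0, s) for s in sd_b[c]]                   # b_ic ~ N(0, Sigma_c)
    coef = [fixed + rand for fixed, rand in zip(beta[c], b)]
    y = [rng.gauss(growth(t, coef), sd_e[c]) for t in times]   # y_iw ~ N(mu_iw, sigma_c^2)
    return c, y

# Hypothetical population values for two classes.
rng = random.Random(2015)
pi = [0.6, 0.4]
beta = [[10.0, 20.0, 0.5], [15.0, 30.0, 0.3]]
sd_b = [[2.0, 3.0, 0.05], [2.0, 3.0, 0.05]]
sd_e = [1.0, 1.5]

sample = []
for _ in range(200):
    # Individually varying times: one measurement somewhere inside each window.
    times = [w + rng.uniform(0.0, 1.0) for w in range(4)]
    sample.append(simulate_individual(times, pi, beta, sd_b, sd_e, rng))
```

Each simulated person has their own measurement times, class label, and random effects, mirroring the hierarchy of the model.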
We specify prior distributions
$$\begin{aligned} \boldsymbol{\pi} &\sim Dir(\alpha_1, \alpha_2, \ldots, \alpha_C) \\ \sigma_c^2 &\sim IG(\omega, \psi) \\ \beta_{cp} &\sim N(\tau_p, \Phi_p) \\ \boldsymbol{\Sigma}_c &\sim IW(\boldsymbol{\Omega}, \upsilon) \end{aligned} \tag{14}$$
Here, $\boldsymbol{\pi} = [\pi_1\; \pi_2\; \ldots\; \pi_C]'$ follows a Dirichlet (Dir) distribution, each $\sigma_c^2$ follows an inverse
gamma (IG) distribution, each $\beta_{cp}$ follows its own normal distribution mutually independent
from the others, and each $\boldsymbol{\Sigma}_c$ follows an inverse Wishart (IW) distribution. We note that the priors
for each $\beta_{cp}$ can potentially have their own parameter specific hyperparameters, though in
practice they are typically the same for each prior.
The selection of the hyperparameters is left to the researcher. In practice,
hyperparameters are typically chosen such that the resulting prior distribution is non-informative.
By non-informative, we mean that the prior distribution contains little information regarding the
posterior distribution. An example of a non-informative prior is the normal distribution defined
by 𝑁 (0,1000). Because the variance is so large, the posterior is informed almost entirely by the
data with little influence coming from the prior. However, when estimation difficulty is
encountered, an informative prior may be necessary (Serang et al., 2015).
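The effect of a non-informative prior can be seen in a toy conjugate case. The sketch below is not the NMEMM itself: it assumes a single normal mean with known residual variance, for which the posterior is available in closed form, and uses made-up data. It compares the $N(0, 1000)$ prior mentioned above with a deliberately tight prior.

```python
def normal_posterior(data, prior_mean, prior_var, noise_var):
    """Posterior mean and variance of a normal mean with known noise variance."""
    n = len(data)
    precision = 1.0 / prior_var + n / noise_var
    mean = (prior_mean / prior_var + sum(data) / noise_var) / precision
    return mean, 1.0 / precision

scores = [9.2, 10.5, 11.1, 9.8, 10.4]   # made-up data with sample mean 10.2
sample_mean = sum(scores) / len(scores)

diffuse = normal_posterior(scores, 0.0, 1000.0, 1.0)   # N(0, 1000) prior
tight = normal_posterior(scores, 0.0, 0.01, 1.0)       # strongly informative prior
```

With the diffuse prior the posterior mean sits almost exactly at the sample mean, whereas the tight prior pulls it most of the way toward zero, showing how much influence an informative prior can exert.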
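Conditional on $c_i$, let the set $\boldsymbol{\theta}$ be the set of parameters upon which priors are placed,
namely $\boldsymbol{\theta} = \{\sigma_c^2, \boldsymbol{\beta}_c, \boldsymbol{\Sigma}_c\}$. Furthermore, let $D$ be the set containing all observed data, such that
$D = \{\boldsymbol{y}_i, \boldsymbol{t}_i\}$. If we denote the probability density functions of the prior distributions of the
parameters in $\boldsymbol{\theta}$ with the function $h$, then assuming the parameters in $\boldsymbol{\theta}$ are independent of
each other, the prior density of $\boldsymbol{\theta}$ can be written as
$$h(\boldsymbol{\theta}) = h(\sigma_c^2)\, h(\boldsymbol{\Sigma}_c) \prod_{p=1}^{P} h(\beta_{cp}) \tag{15}$$
The joint posterior density of $\boldsymbol{\theta}$, conditional on $D$ and $\boldsymbol{c} = [c_1, c_2, \ldots, c_n]'$, the vector of class
membership indicators, can then be expressed as
$$f(\boldsymbol{\theta} \mid D, \boldsymbol{c}) \propto \left\{ \prod_{i=1}^{n} \int f(\boldsymbol{y}_i \mid \boldsymbol{b}_{ic}; c_i, \boldsymbol{\theta})\, f(\boldsymbol{b}_{ic} \mid \boldsymbol{\Sigma}_c)\, d\boldsymbol{b}_{ic} \right\} h(\boldsymbol{\theta}) \tag{16}$$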
Because the integral in Equation (16) is typically intractable in the sense that it has no analytical
solution, we usually generate samples from the posterior using Markov Chain Monte Carlo
(MCMC) procedures which can be implemented in Bayesian software programs (Lu & Huang,
2014).
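The integral over the random effects can be illustrated with a small sketch. This is plain Monte Carlo integration, not the MCMC sampler used by Bayesian software, and it uses a linear random-intercept model as a stand-in because that special case has a closed form against which the approximation can be checked. All parameter values and function names below are hypothetical.

```python
import math
import random

def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)

def mc_marginal_likelihood(y, beta, var_b, var_e, draws, rng):
    """Approximate the integral of f(y | b) f(b) db by averaging f(y | b) over b ~ N(0, var_b)."""
    sd_b = math.sqrt(var_b)
    total = 0.0
    for _ in range(draws):
        b = rng.gauss(0.0, sd_b)
        lik = 1.0
        for obs in y:
            lik *= normal_pdf(obs, beta + b, var_e)
        total += lik
    return total / draws

def exact_marginal_likelihood(y, beta, var_b, var_e):
    """Closed form for the random-intercept case: y is multivariate normal
    with covariance var_e * I + var_b * 11'."""
    m = len(y)
    resid = [obs - beta for obs in y]
    ss = sum(r * r for r in resid)
    s = sum(resid)
    quad = (ss - var_b * s * s / (var_e + m * var_b)) / var_e
    log_det = (m - 1) * math.log(var_e) + math.log(var_e + m * var_b)
    return math.exp(-0.5 * (m * math.log(2.0 * math.pi) + log_det + quad))

rng = random.Random(2015)
y = [0.5, 1.0, 1.5]  # hypothetical observations for one individual
approx = mc_marginal_likelihood(y, beta=1.0, var_b=1.0, var_e=1.0, draws=50000, rng=rng)
exact = exact_marginal_likelihood(y, beta=1.0, var_b=1.0, var_e=1.0)
print(approx, exact)
```

For a nonlinear model such as the NMEMM no such closed form is generally available, which is why sampling-based approaches are used instead.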
Simulations
To evaluate the effectiveness of each of these methods for estimating NMEMMs, we
conducted a number of Monte Carlo simulations. The simulations are divided into three parts.
Part 1 examines general conditions under which the model can be estimated. Part 2 contains two
sections regarding some more specific conditions which may be of interest to researchers; part
2A considers whether failing to account for time heterogeneity induces bias in estimation,
whereas part 2B focuses on the notion of allowing residuals to vary over time. Part 3 deals with
missing data. Part 3A looks at data missing completely at random (MCAR), part 3B concentrates
on attrition, and part 3C examines the situation in which missingness is related to class
membership. Each section will consist of a description of the methods used to implement the
simulations within that section, followed by the results of those simulations.
Before discussing each part specifically, we begin by providing an overview of the
components common to all simulations. All simulations used the three parameter exponential
growth curve model defined by
$$y_{icw} = (\beta_{c1} + b_{ic1}) + (\beta_{c2} + b_{ic2}) \cdot \left(1 - \exp\left(-(\beta_{c3} + b_{ic3}) \cdot t_{iw}\right)\right) + e_{icw} \tag{17}$$
The outcome score $y$ for individual $i$ in class $c$ measured during wave $w$ is written as a function of fixed effects $\boldsymbol{\beta}_c = [\beta_{c1}\ \beta_{c2}\ \beta_{c3}]'$, random effects $\mathbf{b}_{ic} = [b_{ic1}\ b_{ic2}\ b_{ic3}]'$, time of measurement $t_{iw}$, and residual $e_{icw}$. Here, $\beta_{c1}$ represents the intercept at time $t_{iw} = 0$, $\beta_{c2}$ is the change from the intercept to the upper asymptote, and $\beta_{c3}$ corresponds to the rate of approach to that asymptote. The random effects follow a multivariate normal distribution with class invariant covariance matrix such that $\mathbf{b}_{ic} \sim N(\mathbf{0}, \boldsymbol{\Sigma})$. The residuals each follow a normal distribution with class invariant variances, $e_{icw} \sim N(0, \sigma^2)$.
Data for all simulations were generated from the model in Equation (17) with 2 latent
classes (C = 2); 80% of individuals were in Class 1, while the remaining 20% were from Class 2.
Five time points were used (W = 5), given that previous work has shown that this number is
sufficient as long as individuals are measured along the same portion of the growth curve (Serang et
al., 2015). For each simulation condition, 200 datasets were generated. Models were fit to each
of these datasets, and the results were averaged across datasets to yield a single set of results for
each condition.
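The data-generating process described above can be sketched as follows. This is an illustration only, written in Python rather than the R code actually used for the simulations. The fixed effects, random-effect standard deviations, residual standard deviation, window centers, and timing variability are all hypothetical placeholders (the actual population values are in Table 1), and for simplicity the random effects are drawn independently, which corresponds to assuming a diagonal covariance matrix.

```python
import math
import random

def growth_mean(t, b1, b2, b3):
    """Exponential growth curve of Equation (17) without the residual term."""
    return b1 + b2 * (1.0 - math.exp(-b3 * t))

def simulate_dataset(n, betas, re_sds, resid_sd, centers, time_sd, rng):
    """Simulate n individuals from a 2-class model with 80% of individuals in Class 1."""
    n_class1 = round(0.8 * n)
    data = []
    for i in range(n):
        c = 0 if i < n_class1 else 1
        # Individual coefficients: class fixed effect plus an independent random effect.
        coefs = [beta + rng.gauss(0.0, sd) for beta, sd in zip(betas[c], re_sds)]
        # Individually varying measurement occasions around each window center.
        times = [rng.gauss(mu, time_sd) for mu in centers]
        ys = [growth_mean(t, *coefs) + rng.gauss(0.0, resid_sd) for t in times]
        data.append({"class": c + 1, "t": times, "y": ys})
    return data

rng = random.Random(17)
data = simulate_dataset(
    n=200,
    betas=[[1.0, 2.0, 0.5], [1.5, 2.5, 0.3]],  # hypothetical fixed effects per class
    re_sds=[0.2, 0.2, 0.1],                    # hypothetical random-effect SDs
    resid_sd=0.1,                              # hypothetical residual SD
    centers=[0.0, 2.0, 4.0, 6.0, 8.0],         # hypothetical centers for W = 5 windows
    time_sd=0.25,                              # hypothetical timing variability
    rng=rng,
)
print(len(data), data[0]["class"], [round(v, 2) for v in data[0]["y"]])
```

Repeating this 200 times per condition would yield the replicate datasets to which the models are fit.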
Data were generated using the R statistical software (R Core Team, 2014). All models
were then fit using both methods described earlier. For the method utilizing the SEM framework,
models were fit in Mplus version 7.0 (Muthén & Muthén, 1998-2012) facilitated through R using
the MplusAutomation package (Hallquist & Wiley, 2014). For the Bayesian method, models
were fit in OpenBUGS (Lunn, Spiegelhalter, Thomas, & Best, 2009) via the R package
R2OpenBUGS (Sturtz, Ligges, & Gelman, 2005). Sample code for fitting the NMEMM in
Mplus and OpenBUGS is provided in Appendices A and B respectively.
The simulation portion focused on three main outcomes: model convergence, model
selection, and recovery of parameter estimates. Model convergence was judged using different
methods depending on each program. In Mplus, convergence was said to have been reached if
the EM algorithm terminated normally and if no out-of-bounds parameter estimates were
obtained. In OpenBUGS, convergence was determined by the Geweke statistic (Geweke, 1992),
which tests for equality of the means of the first 10% and last 50% of the Markov chain after
burn-in iterations are discarded.
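The logic of the Geweke comparison can be sketched as a z-score for the difference between the two segment means. Note one deliberate simplification: the actual statistic estimates the variances from the spectral density at frequency zero to account for autocorrelation, whereas this sketch substitutes the naive variance of the mean, so it is illustrative only. The two chains are simulated, hypothetical examples.

```python
import math
import random
import statistics

def geweke_z(chain, first=0.1, last=0.5):
    """Simplified Geweke-style z-score comparing the means of the first and last segments.
    Uses naive standard errors that ignore autocorrelation (an assumption of this sketch)."""
    n = len(chain)
    a = chain[: int(first * n)]
    b = chain[n - int(last * n):]
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    return (statistics.fmean(a) - statistics.fmean(b)) / math.sqrt(se2)

rng = random.Random(0)
# Hypothetical post-burn-in chains: one stationary, one still drifting upward.
stationary = [rng.gauss(0.0, 1.0) for _ in range(5000)]
drifting = [5.0 * i / 5000 + rng.gauss(0.0, 1.0) for i in range(5000)]

z_stationary = geweke_z(stationary)
z_drifting = geweke_z(drifting)
print(z_stationary, z_drifting)
```

A z-score near zero is consistent with the two segments sharing a mean, while a large absolute value flags a chain that has not settled.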
Models were also compared using different statistics for each program, with the goal of
determining whether the data-generating model would be preferred over an alternative model.
Because mixture models are non-nested, the likelihood ratio test is inappropriate for model
comparison (Nagin, 1999). As such, we used information criteria in comparing models for
selection. For Mplus, we used the sample size adjusted Bayesian information criterion (aBIC;
Sclove, 1987). Although to our knowledge no study has evaluated the performance of
information criteria for NMEMMs, we prefer this criterion given its favorable performance over
similar information criteria, such as Akaike’s information criterion (AIC) and the usual Bayesian
information criterion with no adjustment (BIC), for both mixture models (Henson, Reise, & Kim,
2007) as well as GMMs (Tofighi & Enders, 2008; Usami, 2014). For OpenBUGS, we compared
models using the deviance information criterion (DIC; Spiegelhalter, Best, Carlin, & van der
Linde, 2002). We note that this is not the DIC reported by OpenBUGS itself (OpenBUGS does
not report the DIC for mixture models), but rather the DIC given by the R2OpenBUGS package.
Both criteria favor the model with the smaller value of the criterion. For each condition, we
report the percentage of simulated datasets for which the data-generating model was selected
over an alternative model.
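As a sketch of how the aBIC is used for selection, the code below applies the usual sample size adjustment, in which $\ln(n)$ in the BIC penalty is replaced by $\ln((n + 2)/24)$. The log-likelihoods are hypothetical. The parameter counts assume the class-invariant structure described above: 3 fixed effects, 6 covariance parameters, and 1 residual variance for the 1-class model, plus 3 more fixed effects and 1 mixing proportion for the 2-class model.

```python
import math

def bic(loglik, n_params, n):
    return -2.0 * loglik + n_params * math.log(n)

def abic(loglik, n_params, n):
    """Sample size adjusted BIC: ln(n) is replaced by ln((n + 2) / 24)."""
    return -2.0 * loglik + n_params * math.log((n + 2.0) / 24.0)

n = 500
# Hypothetical maximized log-likelihoods for the two competing models.
one_class = {"loglik": -1500.0, "n_params": 10}
two_class = {"loglik": -1450.0, "n_params": 14}

abic_1 = abic(one_class["loglik"], one_class["n_params"], n)
abic_2 = abic(two_class["loglik"], two_class["n_params"], n)
selected = "2-class" if abic_2 < abic_1 else "1-class"
print(abic_1, abic_2, selected)
```

Because the adjusted penalty is smaller than the BIC penalty whenever $n > 2/23$, the aBIC is more willing than the BIC to favor the model with more parameters.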
The ability to recover parameter estimates was evaluated in the same way for both
methods. Accuracy of estimates was determined using percent bias for all estimates in order to
compare them on the same scale. Percent bias is defined using the average of the parameter
estimates for a given condition, such that
$$\text{Percent bias} = 100 \cdot \frac{\text{average estimate} - \text{population value}}{\text{population value}} \tag{18}$$
We note that a negative percent bias simply reflects underestimation of the parameter. Precision
in parameter estimates was indicated by the standard error of the estimates. These were not
transformed, so they can only be compared with other estimates of the same parameter.
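Equation (18) translates directly into code. The sketch below also illustrates a property of the statistic worth keeping in mind: the same absolute error produces a much larger percent bias when the population value is small. The estimates and population values are hypothetical.

```python
import statistics

def percent_bias(estimates, population_value):
    """Percent bias of Equation (18): relative deviation of the average estimate."""
    return 100.0 * (statistics.fmean(estimates) - population_value) / population_value

# Hypothetical estimates of a parameter whose population value is 1.0.
bias_large_value = percent_bias([1.001, 1.002, 1.0015], 1.0)
# The same absolute errors for a parameter whose population value is 0.01.
bias_small_value = percent_bias([0.011, 0.012, 0.0115], 0.01)
# Underestimation yields a negative percent bias.
bias_negative = percent_bias([0.9, 1.0], 1.0)

print(bias_large_value, bias_small_value, bias_negative)
```

Here an average error of 0.0015 corresponds to 0.15% bias in the first case but 15% in the second.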
Part 1 – General conditions
As mentioned above, the goal of part 1 was to determine general conditions under which
estimation of the NMEMM is feasible. We do this under more or less ideal conditions in the
hope of establishing an upper bound on performance. Measurement windows for this part
were centered at times t = 0, 2, 4, 6, and 8. As such, the intervals defining the spans of these
windows were: (-1, 1), (1, 3), (3, 5), (5, 7), and (7, 9) respectively. Both programs were allowed
to run for 10,000 iterations, though for OpenBUGS the first 5,000 were considered burn-in
iterations and discarded. Two models were fit for each condition: a 2-class model (the data
generating model) and a 1-class model (a model ignoring the mixture component) to see whether
heterogeneity could indeed be detected. Three variables of interest were varied in order to
determine their influence, if any, on estimation: sample size, variation in measurement occasion,
and differences between classes. Sample size took on three values: n = 200, 500, and 1000.
Variation in measurement occasion examined the extent to which variability in time of
measurement affected estimation. Time of observation within a measurement window followed
either a normal distribution or a uniform distribution centered at the center of the window. The
degree of variability was also varied, ranging from large to medium to small. For the normal distribution, these degrees manifested themselves in differences in the variance parameter, such that time of measurement followed the distributions $t_{iw} \sim N(\mu_w, 0.5^2)$, $t_{iw} \sim N(\mu_w, 0.25^2)$, and $t_{iw} \sim N(\mu_w, 0.1^2)$, for the large, medium, and small conditions respectively, where $\mu_w$ is the center of window $w$. For the uniform conditions, distributions for the large, medium, and small conditions were $t_{iw} \sim U(\mu_w - 1, \mu_w + 1)$, $t_{iw} \sim U(\mu_w - 0.5, \mu_w + 0.5)$, and $t_{iw} \sim U(\mu_w - 0.25, \mu_w + 0.25)$ respectively. These values were selected such that the 95% probability
coverages for each pairing of normal and uniform distributions were similar, allowing the large
variability condition for the normal distribution to be comparable to that of the large variability
condition for the uniform, etc. We note that although time points were randomly generated, in all
analyses they were considered known and thus fixed as opposed to being treated as random
variables.
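The comparability of the normal and uniform conditions can be checked with a short calculation of the half-width of the central 95% interval around the window center for each distribution. The standard deviations and ranges are those stated above; the function names are ours and purely illustrative.

```python
from statistics import NormalDist

def normal_half_width(sd, coverage=0.95):
    """Half-width of the central interval of N(mu, sd^2) with the given coverage."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2.0)
    return z * sd

def uniform_half_width(half_range, coverage=0.95):
    """Half-width of the central interval of U(mu - half_range, mu + half_range)."""
    return coverage * half_range

# (normal SD, uniform half-range) for each variability condition.
conditions = {"large": (0.5, 1.0), "medium": (0.25, 0.5), "small": (0.1, 0.25)}
half_widths = {
    label: (normal_half_width(sd), uniform_half_width(half_range))
    for label, (sd, half_range) in conditions.items()
}
for label, (normal_hw, uniform_hw) in half_widths.items():
    print(label, round(normal_hw, 3), round(uniform_hw, 3))
```

For the large and medium conditions the two half-widths agree to within about 0.03, and for the small condition to within about 0.04.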
Differences between latent classes were made to be either very large or comparatively
smaller (but still rather large). Class 1 fixed effects were the same in each of these conditions,
whereas Class 2 fixed effects were varied based on the differences between classes. In the large differences condition, $\beta_{21}$ and $\beta_{22}$ were set to be 2.5 standard deviations (SDs) greater than $\beta_{11}$ and $\beta_{12}$ respectively, while $\beta_{23}$ was 2 SDs less than $\beta_{13}$. Similarly, in the small differences condition $\beta_{21}$ and $\beta_{22}$ were 1.7 SDs greater than $\beta_{11}$ and $\beta_{12}$ respectively, while $\beta_{23}$ was 1.3 SDs less than $\beta_{13}$. We acknowledge that the magnitude of the differences between classes is exceptionally large; however, we note that given the exploratory nature of mixture models, such differences are necessary in order to detect heterogeneity (Tofighi & Enders, 2008; Grimm, Ram, Shiyko, & Lo, 2013; Serang et al., 2015). Population values for all model parameters are provided in Table 1.
Part 1 – Results
Results for convergence rates are presented in Table 2. Mplus performed very well with
regard to convergence, with nearly all simulations in all conditions converging. In comparison,
convergence rates for OpenBUGS were much lower. Most conditions only had roughly between
45% and 65% of the models fit to their datasets converge. One key difference can be seen in the
conditions with small samples and small differences between classes, in which convergence only
ranged from 28% to 46%. In an effort to address this, we increased the number of iterations to
15,000 for conditions in which the program encountered difficulties in converging, but this did
not help. We note that these low rates may be in part due to our comparing more conservative regions of the
Markov chain (first 10% and last 50%) than previous work did (first 30% and last 40%; Serang et
al., 2015). It is also possible that the Geweke statistic may not be the optimal criterion for
convergence for NMEMMs.
With regard to model selection, the proportions for which the 2-class models were
correctly selected over their 1-class counterparts are given in Table 3. In Mplus it appears that,
consistent with past work, the aBIC is able to correctly select the true data generating model in
nearly all cases. OpenBUGS showed a more nuanced pattern. When differences between classes
were large, the DIC performed rather well. However, when differences between classes were
small and the sample size was either medium or small, the DIC performed much worse. For
example, for sample sizes of 500, the 2-class model was only selected between 53.5% and 69%
of the time. For small class differences and sample sizes of 200, the 2-class model was correctly
selected at near chance levels of between 45% and 56%. We interpret this to mean that if the
researcher has strong reason to believe that the differences between expected latent classes are
quite large, even samples of size 200 should be sufficient for OpenBUGS to detect these
differences. However, if there is a chance that the differences between classes are not as
pronounced, we recommend that researchers use sample sizes of 1000 or more.
Percent bias for parameter estimates is shown in Table 4, with standard errors for these
estimates given in Table 5. Because nearly all estimates were reasonable, the results we present
are averaged across all datasets, not only the ones considered to have converged. Beginning with
Mplus, we see that fixed effects estimates showed very little bias. The fixed effects for Class 1
all exhibited less than 1.5% bias, with only a few exceeding 1%. Bias in fixed effects for Class 2
was slightly higher, but still very reasonable given that they were all within 5%. Bias in the
random effects parameters was noticeably larger, especially when differences between classes were small. For example, when differences were large, bias in $\Sigma_{1,1}$ went up to roughly 4%, while when differences were small, this same bias was as high as about 9%. Similar results occurred for $\Sigma_{2,2}$. For large differences, bias reached about 4%, while for smaller differences bias went up to about 5.5%. The greatest amount of bias was observed for $\Sigma_{3,3}$. When differences were large, bias ranged from 5.5% to 8.5%. However, when differences were small, bias extended from about 11% to 17%. This may have to do with the calculation of the percent bias statistic. Because for this parameter the raw bias is divided by a population value of 0.01, the resulting percentages may be inflated. Bias in
residual variances was minimal, with average biases for all conditions being within 2%. Finally,
with regard to mixture proportions, Mplus seemed to be able to recover proportions of class
membership rather well. However, we note that for the small differences conditions, relatively
few models reached the 0.80 cutoff value for entropy associated with correct class assignment in
factor mixture models (Lubke & Muthén, 2007). While entropy hovered around 0.90 on average
for the large differences conditions, it ranged between 0.70 and 0.75 for the small differences
conditions.
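For readers unfamiliar with the entropy statistic, the sketch below computes the commonly used relative entropy from posterior class probabilities, $E = 1 - \frac{\sum_i \sum_c -p_{ic} \ln p_{ic}}{n \ln C}$. We assume this is the form reported by the software. The posterior probabilities shown are hypothetical.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy in [0, 1]; values near 1 indicate clear class assignment.
    `posteriors` holds one row of posterior class probabilities per individual."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # the limit of p * ln(p) as p -> 0 is 0
                total -= p * math.log(p)
    return 1.0 - total / (n * math.log(n_classes))

# Hypothetical posterior probabilities for three individuals and two classes.
clear = [[0.99, 0.01], [0.98, 0.02], [0.03, 0.97]]
ambiguous = [[0.70, 0.30], [0.60, 0.40], [0.45, 0.55]]

entropy_clear = relative_entropy(clear)
entropy_ambiguous = relative_entropy(ambiguous)
print(entropy_clear, entropy_ambiguous)
```

Individuals with posterior probabilities near 0.5 contribute the most uncertainty and pull the statistic toward 0.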
OpenBUGS showed similar patterns of results, though in some cases more bias. Fixed effects for Class 1 showed bias within 2.5%, though none outside of the small sample and small differences conditions exhibited bias greater than 1%. Fixed effects for Class 2 showed slightly more bias, though most were within 5%. The only exceptions were again for the small sample and small differences conditions, in which $\beta_{21}$ had rates of bias between 4.5% and 6.5% and $\beta_{22}$ had rates between 8% and 10%. For $\Sigma_{1,1}$ and $\Sigma_{2,2}$, bias was typically low, with the exception of a handful of estimates in the small sample conditions, some of which rose to 9% bias. The largest bias was found for parameter $\Sigma_{3,3}$, and it was much greater in OpenBUGS than in Mplus. Bias in
this parameter seemed to be related to sample size. For samples of size 1000, bias ranged from
roughly 17% to 24%. For medium samples of size 500, the bias increased to about 33% to 43%.
Finally, for samples of size 200 bias reached between 75% and 90%, far too large to be
considered acceptable. Estimates for residual variance were reasonable, with none exceeding 4%
bias. Mixture proportions were estimated accurately, but we note that for these parameters we
used informative priors as opposed to the non-informative priors used for all other parameters,
because previous work found informative priors to be necessary for estimation (Serang et al., 2015).
Overall, the magnitude of the differences between classes seemed to drive these results.
Surprisingly, sample size did not appear to impact bias very much at all. However, an interesting
pattern emerged when examining the effects of variation of measurement occasions. Although
the distributional form (normal or uniform) did not seem to have any effects, there seemed to be
a trend with regard to the variability in time of measurement. In general, conditions with a large
amount of variability in measurement occasions produced more biased estimates than those with
less variability. Though this was only a general trend and not a steadfast rule, the result is
consistent with what we would expect. In the extreme case in which all individuals were
measured at the same time points, we would expect bias to be minimal. As we inject more
variability into the time of measurement, we would expect that variability to manifest itself in the
parameter estimates, leading to bias. Fortunately, according to our simulations the magnitude of
that bias is small, often negligible, and in general an acceptable sacrifice when considering how
it allows researchers the flexibility to measure individuals at any time within a measurement
window.
In comparing the two approaches, the SEM approach implemented in Mplus
seems to have an edge over the Bayesian approach implemented in OpenBUGS. This appears to
hold with regard to model convergence as well as identification of heterogeneity. The SEM
approach also holds a bit of an advantage with regard to bias in parameter estimation, although
bias in the fixed effects (which are typically of more interest to researchers) seems comparable
across programs. However, given that one’s model does converge, we prefer estimates with the
smallest standard errors (SEs). It is in this domain that OpenBUGS seems to make up some
ground. As seen in Table 5, for fixed effects OpenBUGS seems to produce smaller SEs for the $\beta_{c1}$ and $\beta_{c2}$ parameters for most conditions regardless of class. $\beta_{13}$ is an exception: OpenBUGS yielded smaller SEs only when differences between classes were small; when differences between classes were large, Mplus gave smaller SEs. Mplus also produced smaller SEs for $\beta_{23}$. With regard to random effects parameters, OpenBUGS tended to have smaller SEs for $\Sigma_{1,1}$ and $\Sigma_{2,2}$. For $\Sigma_{3,3}$, OpenBUGS only outperformed Mplus with regard to SE when sample size was large. For residual variances, OpenBUGS had smaller SEs for large and medium samples, but for small sample conditions, Mplus had the lower SEs.
Part 2A – Acknowledging time heterogeneity
In an effort to examine whether ignoring time heterogeneity adversely affects estimation,
we fit models that both do and do not acknowledge it to data with individually varying
measurement occasions. In this as well as all subsequent simulations, we attempted to simulate
data that closely resembled a real dataset, namely the ECLS-K dataset described in greater detail
in our illustrative example. Given the smaller number of conditions in this section, we were more
flexible with the number of iterations for which we let each program run. For Mplus, we set the
number of iterations to 1000 because this seemed to be sufficient for convergence. However, for
OpenBUGS, we increased the number of iterations to 20,000 with 10,000 burn-in iterations, in
an effort to increase convergence rates.
Population parameters are given in Table 6. A sample size of 1000 was used with differences between classes set to 5 SDs for $\beta_{c1}$, and 3 SDs for $\beta_{c2}$ and $\beta_{c3}$. Although these differences are extremely large, these parameters created datasets that best resembled the empirical data on which they were based. Time of measurement was also chosen in this way, though we note that all time points for all individuals were centered at the mean of the first window in order to allow the $\beta_{c1}$ parameter to be interpreted as the intercept at the start of the study. Since
latent class membership is unknown for real data, we used the same mixture proportions of 80%
for Class 1 and 20% for Class 2 as we used in the previous simulations. Given that the focus of
this section is on time heterogeneity, we only fit 2-class models comparing models that allow for
the individually varying time points to models that treat all individuals as having been measured
at the same time.
Part 2A – Results
In this section, we only concern ourselves with model convergence and recovery of
parameter estimates. We did not feel the need to examine model selection via information
criteria because we do not view the models being compared as competing models. Because these
models treat time of measurement as known, researchers know the structure of the measurement
occasions in their data beforehand. As such, there is no ambiguity in which model is more
appropriate, eliminating the need for model selection via information criteria.
We refer to the condition in which all individuals were treated as having been measured
at the same time (the center of the measurement window) as the “same-time” condition, while
referring to the condition where time of measurement was allowed to vary across individuals as
the “varying-time” condition. For Mplus, convergence rates were high for both conditions. For
the same-time condition, all models converged, while for the varying-time condition, 99% of the models converged. Parameter estimates, presented in Table 7, were not as similar. For the same-time condition, there was a considerable amount of bias exhibited in the fixed effects, with the largest seen in $\beta_{21}$ with over 200% bias. However, this paled in comparison to the bias seen in the random effects, one of which ballooned to over 4000% (though it should be noted that the population value was extremely small, perhaps making the percent bias statistic overly sensitive). Mixture proportions were also off, with estimates of 90% for Class 1 and 10% for Class 2. The average entropy for the same-time condition was only about 0.75. The only parameter that seemed to be estimated well for this condition was the residual variance, whose bias was under 1%. Comparatively, the varying-time condition performed relatively well. All fixed effects were within 1% bias with the exception of $\beta_{21}$, which exhibited around a 2% bias. Random effects parameters also showed much less bias. Although $\Sigma_{3,3}$ still showed excessive bias, it was nowhere near as biased as its same-time counterpart. Residual variance was also accurately estimated here, with bias under 1%. More importantly, mixture proportions were accurately estimated to within 1% with average entropy for all models fit in this condition at around 0.94.
Results for OpenBUGS were comparable. Although 75% of the models in the same-time condition converged, only 27% reached convergence in the varying-time condition. For the same-time condition, fixed effects also showed bias, with the bias in $\beta_{21}$ exceeding 700%. Bias in the random effects parameters for this condition was also excessive, the largest again being that for $\Sigma_{3,3}$ with over 16000% bias. Mixture proportions were recovered to within 1%, though again this may be due to the use of informative priors. Bias for the residual variance was under 4%. As expected, the varying-time condition performed much better, with all fixed effects exhibiting
under 2% bias. Outside of $\Sigma_{3,3}$, the random effects parameters also seemed to be accurately estimated, with bias less than 7%. Bias for the residual variance was under 6%, and the mixture proportions were again well recovered.
In comparing same-time and varying-time conditions, results seem consistent with past
research (Mehta & West, 2000; Blozis & Cho, 2008; Aydin et al., 2014). Failure to account for
individual measurement occasions induces bias in parameter estimates for NMEMMs, perhaps
even more than for LGMs. Previous work has found that bias is concentrated in the random effects
(Mehta & West, 2000; Aydin et al., 2014), and our results overwhelmingly support this notion. It
seems that the variation in time of measurement was absorbed into the random effects more than
any other model parameters. Unlike these other studies, we also found some bias in fixed effects
parameters, though this may be due to the added complexity inherent in nonlinear models as well
as mixture models. We also note that models accounting for individual measurement schedules
resulted in uniformly superior SEs for all model parameters, providing further justification for
their use.
Part 2B – Time-varying residuals
Until this point, we have only considered residual structures that are invariant across
time. However, this assumption is often too strict when dealing with models involving nonlinear
change trajectories (Browne & du Toit, 1991; Grimm & Widaman, 2010; Grimm, Zhang,
Hamagami, & Mazzocco, 2013). Fortunately, time non-invariant residual structures can easily be incorporated into the mathematical formulation of the NMEMM simply by adding subscripts $i$ and $w$ to all instances of $\sigma^2$ and adjusting all distributional assumptions accordingly to reflect this change.
The goal of this section is to determine whether the methods we have thus far employed
can adequately address an alternative residual structure in NMEMMs. To do this, we generated
data with residual variances following a two-parameter monotonically decreasing exponential
decay function as described by Browne and du Toit (1991), namely
$$\sigma_{iw}^2 = \gamma_1 \cdot \exp(-\gamma_2 t_{iw}) \tag{19}$$
The values of $\gamma_1$ and $\gamma_2$ were set to 0.05 and 0.2 respectively, in order to best resemble the
empirical data. Note that under this formulation, residual variances only differ across windows,
not within. Residuals were not allowed to covary, resulting in a diagonal residual covariance
matrix. All other aspects of the simulations in this section are identical to part 2A, including the
model parameters, which were set to be the same as those given in Table 6. Two models were fit
and compared for each dataset. In the “invariant-residuals” condition, residual variances were
assumed to be the same across time, whereas for the “varying-residuals” condition, residual
variances were freely estimated for each window. We do not impose any functional structure in
the estimation of the residuals in the varying-residuals condition given that past work has found
that this works best for the empirical data upon which the simulated data are based (Grimm &
Widaman, 2010).
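The residual variance function of Equation (19) can be evaluated directly. The sketch below uses the stated values $\gamma_1 = 0.05$ and $\gamma_2 = 0.2$; the window times are hypothetical placeholders, since the actual times were chosen to resemble the empirical data.

```python
import math

GAMMA_1 = 0.05  # residual variance at time 0, as stated in the text
GAMMA_2 = 0.2   # rate of decay, as stated in the text

def residual_variance(t, gamma_1=GAMMA_1, gamma_2=GAMMA_2):
    """Exponentially decaying residual variance of Equation (19)."""
    return gamma_1 * math.exp(-gamma_2 * t)

window_times = [0.0, 1.0, 2.0, 4.0, 6.0]  # hypothetical window times
variances = [residual_variance(t) for t in window_times]
for t, v in zip(window_times, variances):
    print(t, round(v, 4))
```

Under these values the residual variance starts at 0.05 and shrinks monotonically, so a model that forces a single residual variance across time must compromise between the early and late windows.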
Part 2B – Results
All simulations in both Mplus conditions converged. Additionally, the aBIC correctly
selected the models with time varying residuals over their time invariant counterparts 100% of
the time. Results regarding parameter estimates for both programs are given in Table 8. In the
invariant-residuals condition, very little bias was observed in the fixed effects, all of which were
within 1%. Bias was observed in the random effects parameters, the largest of which was seen in $\Sigma_{3,3}$, with over 5000% bias. As expected, a fair amount of bias was observed in the residual variances, ranging from roughly 14% to 131%. Mixture proportions, however, seemed to be estimated accurately, to within 1%. Average entropy was high for this condition, with a value of approximately 0.94. For the varying-residuals condition, fixed effects were also recovered to within 1% bias, with the exception of $\beta_{21}$, which displayed bias around 7%. Bias in the random effects parameters was much lower, though the bias for $\Sigma_{3,3}$ was still high at 600%. A limited amount of bias was observed in the residual variances, all of which were within 7%. Mixture proportions were correctly estimated in this condition, with an average entropy of 0.94 once again.
The pattern of results was similar in OpenBUGS. Convergence rates were 52% in the invariant-residuals condition and 32% in the varying-residuals condition. The DIC correctly selected the varying-residuals models in 86.5% of the simulations. For the invariant-residuals condition, fixed effects did not exhibit much bias, all within 4%. Random effects parameters did, especially $\Sigma_{3,3}$ with bias over 12,000%. Similar to Mplus, bias in the residual variances ranged from 18% to 123%. Finally, mixture proportions were well estimated. In the varying-residuals condition, fixed effects exhibited minimal bias as well, all within 4% once again. Random effects parameters did show bias, with the largest again being $\Sigma_{3,3}$ with bias over 10,000%. Residual variances in this condition showed much less bias, within 22%. Mixture proportions were correctly estimated here just as in the other conditions.
Overall, it seems that fixed effects were well recovered by both programs, with minimal
bias. In the invariant-residuals conditions, it seems that the variability in the residuals was
absorbed by the random effects parameters, leading to increased bias. Despite this, it seems that
both programs had difficulty estimating $\Sigma_{3,3}$, even in the varying-residuals conditions. Although
time varying residual structures are more flexible, we conclude this section by cautioning
researchers considering their use. The power of these residual structures has the potential to be
misused, especially to compensate for a misspecified model. We encourage researchers who
wish to use them to provide strong theoretical justification for why their choice of residual
structure is appropriate. For more thorough discussions on this topic, we refer the reader to
Browne and du Toit (1991) and to Grimm and Widaman (2010).
Part 3A – Data missing completely at random
In part 3, we consider a problem all too common in longitudinal research: missing data.
In this section, we consider data missing completely at random. Though we expect no bias to
result from this type of missingness (Rubin, 1976), we examine it as a starting point to provide a
foundation for subsequent sections. Following McArdle and Hamagami (1991), we began by
generating complete data for all individuals, after which we masked a subset of the data to
simulate missingness. Here, we chose missingness patterns that most closely resemble
missingness patterns observed in the empirical dataset. We began by calculating all missingness
patterns in the full dataset, retaining those patterns that characterized 10 or more individuals per
1000, rounding up to the nearest person. This accounted for 945 individuals. We then added 5
more people to each pattern, along with an additional person to each of the 5 most common
patterns, to complete the set. All other simulation conditions were set to be the same as those in
part 2A, including the parameters listed in Table 6.
Part 3A – Results
Results are contained in Table 9. For Mplus, 99% of all models fit converged. Bias in the
fixed effects was minimal, within 5% for all parameters. Random effects parameters also showed
low levels of bias, outside of $\Sigma_{3,3}$ which reached 137%. No bias was observed in the residual variance estimates. Mixture proportions were accurately estimated, with an average entropy of approximately 0.87. In OpenBUGS, only 20.5% of the models converged. As expected, little bias was seen in the fixed or random effects, again with the exception of $\Sigma_{3,3}$. Bias in the residual variances reached 7%, but none was seen in the mixture proportions. In general, it seems that these results follow the same pattern as we observed in previous simulations, with a limited amount of bias in parameter estimates outside of $\Sigma_{3,3}$.
Part 3B – Attrition
In this section, we turn our attention to another type of missingness: attrition. Here we
examine the case in which all individuals participating in the study are present at the first time
point, but gradually drop out over time. Again, we assume that dropout is due to an MCAR
mechanism, with the intent of determining whether enough information remains in the reduced
sample to estimate the NMEMM. To do this, we assume that all individuals are present at the
first occasion of measurement and that attrition follows an exponential decay function similar to
the form given in the right hand side of Equation (19). Values of 1000 for $\gamma_1$ and 0.11 for $\gamma_2$
were chosen to most closely match the patterns of attrition observed in the empirical data had the
study been designed in this way. All other simulation conditions were identical to those in part
3A.
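One way to read this attrition mechanism is that the expected number of individuals remaining at time $t$ is $\gamma_1 \exp(-\gamma_2 t)$, so that all 1000 are present at $t = 0$. The sketch below builds a monotone dropout mask under that reading. Both the interpretation and the measurement times are our assumptions for illustration, and the function names are hypothetical.

```python
import math
import random

def expected_remaining(t, gamma_1=1000.0, gamma_2=0.11):
    """Assumed reading: number of individuals still in the study at time t."""
    return gamma_1 * math.exp(-gamma_2 * t)

def attrition_mask(n, times, rng):
    """Return n rows of booleans (True = observed). Dropout order is random and
    unrelated to the data (MCAR), and individuals who drop out never return."""
    order = list(range(n))
    rng.shuffle(order)
    mask = [[False] * len(times) for _ in range(n)]
    for w, t in enumerate(times):
        n_present = min(n, round(expected_remaining(t)))
        for i in order[:n_present]:
            mask[i][w] = True
    return mask

rng = random.Random(3)
times = [0.0, 1.0, 2.0, 4.0, 6.0]  # hypothetical measurement times
mask = attrition_mask(1000, times, rng)
counts = [sum(row[w] for row in mask) for w in range(len(times))]
print(counts)
```

Observations flagged False would then be set to missing in the complete simulated data before the models are fit.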
Part 3B – Results
We present the results in Table 10. Convergence rates reached 97.5% for Mplus, with
limited bias in the fixed effects, the highest of which was for β_{21} with approximately 5% bias.
Random effects parameters showed little bias as well, aside from Σ_{3,3} at 85%. Neither residual
variance nor mixture proportions exhibited any bias. Average entropy was high as well, at
approximately 0.92. OpenBUGS performed similarly to previous conditions. With regard to
convergence, 34.5% of the models converged. Fixed effects all exhibited under 3% bias.
Random effects parameters showed under 8% bias, with the exception of Σ_{3,3}. Bias for residual
variance was within 8%, and mixture proportions were accurately recovered. Overall, the
declining sample does not appear to induce bias beyond what we would ordinarily
expect, provided that the assumptions regarding the nature of the attrition are met.
Part 3C – Non-random selection
The final section considers data missing not at random. Specifically we examine the case
in which missingness is related to latent class membership. Consider a hypothetical scenario in
which two latent classes of students existed, where Class 1 consisted of the majority of students
and Class 2 was made up of students slightly higher in ability. It is possible that parents of
students in Class 2 are more involved in their children’s education and, being dissatisfied with
the school’s preparation of their children, decide to remove their students from their current
schools and place them in schools not involved in our study. In this case, dropout is conditional
on class membership, which is unobserved.
To see whether parameter estimates could be accurately recovered under these
conditions, we simulated data according to this scenario. For simplicity, data for those in Class 1
were complete. Those in Class 2 were split into 5 equal groups. Data for group 1 were complete.
Each of the remaining groups was observed at only 2 consecutive measurement occasions. For
example, group 2 was only measured during windows 1 and 2, group 3 was measured during
windows 2 and 3, etc. All other simulation conditions were identical to those in part 2A.
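The missingness design just described can be sketched in Python as follows. This is only an illustration: the function names, the random seed, and the cycling rule used to keep the five Class 2 groups equal in size are assumptions, and the 20% Class 2 proportion is the true proportion reported in the results for this condition.

```python
import random

N_WINDOWS = 5

def observed_windows(latent_class, group):
    """Return the 1-based measurement windows at which a person is observed.
    Class 1 and group 1 of Class 2 are complete; group g (2..5) of Class 2
    is observed only at windows g-1 and g."""
    if latent_class == 1 or group == 1:
        return list(range(1, N_WINDOWS + 1))
    return [group - 1, group]

def assign_design(n_people, p_class2=0.2, seed=1):
    """Assign class, group, and observed windows to each simulated person."""
    rng = random.Random(seed)
    design = []
    n_class2 = 0
    for _ in range(n_people):
        if rng.random() < p_class2:
            group = n_class2 % 5 + 1  # cycle through groups to keep sizes equal
            n_class2 += 1
            design.append((2, group, observed_windows(2, group)))
        else:
            design.append((1, None, observed_windows(1, None)))
    return design

design = assign_design(1000)
```

Because the windows returned depend on the latent class, which is never observed by the analyst, the resulting missingness is not at random.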
Part 3C – Results
Results for this section can be found in Table 11. Mplus had a convergence rate of 99%.
Surprisingly, bias was minimal for the fixed effects. Only β_{21} seemed to be affected, with a bias
of about 14%. Random effects parameters also did not seem to be biased much, outside of Σ_{3,3},
for which both programs experienced difficulty. The residual variance only exhibited about 1%
bias as well. However, the estimates of the mixture proportions were biased. Mplus estimated the
class membership proportions to be approximately 89% for Class 1 and 11% for Class 2, when
the true proportions were 80% and 20% respectively. These estimates were produced despite the
fact that average entropy was 0.89.
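For readers less familiar with the entropy statistic, the sketch below computes the relative entropy commonly reported for mixture models from a matrix of posterior class probabilities. The function name and the toy probabilities are illustrative assumptions and do not come from the simulations.

```python
import math

def relative_entropy(posteriors):
    """Relative entropy: 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K).
    Values near 1 indicate that individuals are classified with high certainty."""
    n = len(posteriors)
    k = len(posteriors[0])
    total = 0.0
    for row in posteriors:
        for p in row:
            if p > 0:  # the limit of p * ln(p) as p -> 0 is 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Toy posterior probabilities for three people and two classes (hypothetical)
toy = [[0.95, 0.05], [0.90, 0.10], [0.20, 0.80]]
toy_entropy = relative_entropy(toy)
```

The statistic summarizes how sharply individuals are assigned to classes, not whether the estimated class sizes are correct, which is why a high value can coexist with biased mixture proportions.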
Unexpectedly, OpenBUGS seemed to perform slightly better than Mplus in this
condition. Although the convergence rate was low (17.5%), fixed effects estimates were all
within 4% bias. Bias for random effects parameters was also low, with the expected exception of
Σ_{3,3}. Residual variance did show a slight bias at just under 6%. Though estimates of class
membership proportions were closer to the true values than those from Mplus, they were still biased at 84% for Class 1 and
16% for Class 2. We direct the reader to Lu, Zhang, and Lubke (2011) for a comprehensive
discussion regarding Bayesian estimation of GMMs when missing data is class dependent. As
one would expect, for both programs it appears that when missingness is related to class
membership, the resulting estimates for the probability of membership will exhibit bias.
Illustrative Example
In order to demonstrate the practical applications of the methods described in this study,
we illustrate their use with an empirical dataset involving reading ability. Previous work has
suggested that reading development follows a nonlinear trajectory (Grimm & Ram, 2009) and
that two or more subgroups may exist in the development of reading ability over time (Kaplan,
2002; Pianta, Belsky, Vandergrift, Houts, Morrison, & NICHD ECCRN, 2008; Grimm et al.,
2010; Serang et al., 2015). We discuss how NMEMMs can be used in the context of this
problem, though we note that our primary goal is the demonstration of these methods and that a
comprehensive examination of heterogeneity in reading ability is beyond the scope of this study.
The data used in this study come from the Early Childhood Longitudinal Study-
Kindergarten Cohort (ECLS-K). The study began in 1998 and data on reading ability were
collected from over 21,000 nationally representative children between kindergarten and eighth
grade. For our purposes, we selected a random sub-sample of 1000 from this sample in order to
more closely match our simulation conditions. For simplicity, we also assume that each
individual is self-weighting. We used data from five waves: fall of kindergarten, spring of first
grade, and spring of third, fifth, and eighth grades. Data were also collected during spring of
kindergarten and fall of first grade, but these were not used because this would have resulted in
overlapping measurement windows. Outcomes consisted of reading ability theta scores
constructed using item response theory (IRT). We model these scores using age as the
independent variable, which varies by individual, as opposed to grade level. Although age is not
given in the dataset directly, the exact date of measurement is provided, from which age can be
inferred.
A plot of 100 randomly selected individuals from our subsample (10% of the subsample)
is given in Figure 1. At a glance, it can be seen that growth does not follow a linear trajectory,
suggesting that NMEMMs may be more appropriate. As such, the model provided in Equation
(17) was fit to the data in the same way as was done in the simulations. Both a 1-class and a
2-class model were fit in order to determine whether heterogeneity was present. Since both the aBIC and
the DIC selected the 2-class model, we discuss only the methods and results for the 2-class
model here.
The model was run for 10,000 iterations in Mplus and was able to successfully converge.
Convergence for OpenBUGS, however, was not reached despite a wide range of starting values,
prior distributions, alternative samples of various sizes, and a large number of iterations. Upon
closer inspection of the Markov chains, parameter estimates for OpenBUGS had the tendency to
move toward a solution in which the second class consisted of individuals whose reading ability
declined over the length of the study. We considered this an unacceptable solution, given that in
the context of this problem it is unreasonable to believe that reading ability declines between
kindergarten and eighth grade. We are not the first to encounter difficulties when trying to fit
NMEMMs to the ECLS-K dataset using OpenBUGS, and as such we followed the procedure of
previous work in using WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000) to fit the models instead
(Serang et al., 2015). WinBUGS was run from R using the R2WinBUGS package
(Sturtz et al., 2005). Convergence for WinBUGS was reached after 15,000 iterations, with
10,000 iterations discarded as burn-in.
Results for both Mplus and WinBUGS are given in Table 12. The structure and
composition of the latent classes appeared to be the same for both programs. The majority class,
Class 1, consisted of most individuals (92% for Mplus, 90% for WinBUGS). The minority class,
Class 2, contained students who started behind the majority class. Upon entering kindergarten,
these students were characterized by lower intercepts, β_{21}. We note that the difference at the start
of the study was found to be larger in Mplus than in WinBUGS. However, these students also
showed more growth, as evident in their larger change to asymptote parameters, β_{22}. Finally, the
students in the minority class also approached this asymptote at a faster rate, indicated by their
rate parameters, β_{23}.
Graphs of the mean trajectories for both groups and programs are provided in Figure 2.
An interesting feature can be seen in the graph for Mplus, where the trajectory for Class 2
overtakes that of Class 1 between first and fourth grade, at which point they cross again. The
crossing of trajectories makes little sense, and could indicate that the solution obtained by Mplus
should be interpreted with caution, especially given an entropy value of 0.774. The picture
painted by WinBUGS seems clearer. Here, students in Class 2 lag behind those in Class 1 until
around third grade, at which point the two classes have nearly identical scores. The value in
fitting the model in both programs is evident given the differences in interpretation between
them. Mplus results suggest that the minority class catches up to the majority class by first grade,
whereas WinBUGS suggests that this does not happen until the third grade. These two
interpretations can have very different implications for researchers and policymakers alike,
highlighting the utility of examining the issue from multiple perspectives.
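To make the idea of crossing mean trajectories concrete, the sketch below evaluates the three-parameter exponential curve used in this study (intercept, change to asymptote, rate) for two classes and scans for the times at which the class means cross. The parameter values are hypothetical and chosen only to produce a pattern of catching up and then falling behind again; they are not the estimates in Table 12, and the time axis is an arbitrary scale.

```python
import math

def mean_trajectory(b1, b2, b3, t):
    """Exponential growth curve: intercept b1, change to asymptote b2, rate b3."""
    return b1 + b2 * (1.0 - math.exp(-b3 * t))

def crossing_times(class1, class2, t_max, step=0.01):
    """Scan a time grid and return the times at which the sign of
    (Class 2 mean - Class 1 mean) changes, i.e. where the curves cross."""
    crossings = []
    prev = mean_trajectory(*class2, 0.0) - mean_trajectory(*class1, 0.0)
    for k in range(1, int(round(t_max / step)) + 1):
        t = k * step
        diff = mean_trajectory(*class2, t) - mean_trajectory(*class1, t)
        if prev * diff < 0:
            crossings.append(t)
        if diff != 0:
            prev = diff
    return crossings

# Hypothetical fixed effects (b1, b2, b3): Class 2 starts lower, grows more, and grows faster
class1 = (-1.0, 3.0, 0.3)
class2 = (-2.0, 3.9, 0.5)
crossings = crossing_times(class1, class2, t_max=15.0)
```

With these illustrative values the curves cross twice, because Class 2 rises faster but settles at a slightly lower asymptote. Small differences in the estimated asymptotes therefore determine whether a second crossing appears at all.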
These results are interesting because they suggest a different latent class structure than
the work of others who have settled on 2-class solutions. Using GMMs, Pianta and colleagues
(2008) also found two latent classes, but the second class corresponded to a group of early
readers, whose performance was above the typical readers of the majority class. Using
NMEMMs but with a Gompertz model, Grimm and colleagues (2010) performed an analysis
which also resulted in a separation between early and typical readers. This is noteworthy in that
these authors also analyzed data from the ECLS-K. Our results differ in that our second class is
composed of later readers as opposed to early readers. This difference may stem from our use of
individually varying measurement occasions. If failing to acknowledge time heterogeneity can
alter the structure and composition of the latent class solution, this could have substantial
implications for research conclusions and resulting policy decisions. However, this concern
extends beyond the scope of the current study, and as such we leave it as an avenue for future
research.
Discussion
Overall, it seems that NMEMMs with individually varying measurement occasions can
be fit in both the SEM and Bayesian frameworks. In part 1, we examined the performance of
both of these methods under general conditions via Monte Carlo simulation. The main force
driving the result seemed to be class separation. When differences between classes were smaller,
far more difficulties in estimation were encountered, particularly in OpenBUGS. The effect of
sample size also seems to depend on the distance between classes. When class differences are
large, it seems that even a sample of 200 is sufficient. However if the extent of the differences
between latent classes is unknown, which is typically the case in practice, we recommend using
samples of 1000 or more. The variability in measurement occasion did not appear to be too
influential a factor. We observed that, as expected, models performed slightly better when
individuals were measured closer to each other in time. Yet this improvement was minimal
at best, and in practice any gains would likely not be worth the effort required to ensure
that measurement occasions were identical.
In part 2, we considered whether acknowledging time heterogeneity was worthwhile,
along with the possibility of time-varying residuals. We found that ignoring individual
measurement schedules led to some bias in estimates of fixed effects as well as overwhelming
amounts of bias in the random effects. Consistent with previous work (Mehta & West, 2000;
Blozis & Cho, 2008; Aydin et al., 2014), we encourage researchers to account for individually
varying measurement occasions when fitting NMEMMs if their data allow for it. With regard to
time-varying residuals, we demonstrated that alternative residual structures can easily be
incorporated into our methods for fitting NMEMMs, though we caution researchers to choose
residual structures carefully and with strong justification.
Part 3 dealt with missing data. We showed that both programs can deal with MCAR data
effectively, both under patterns of missingness observed in empirical data as well as missingness
purely due to attrition. However, when dropout is related to class membership (not missing at
random), bias in some parameters is exhibited, especially the proportions of individuals in each
latent class. Because only unconditional NMEMMs were considered in this study, conditions
under which data were missing at random (MAR) were not examined. Extensions of this model
that do incorporate covariates may find the MAR condition to be of interest, so we encourage
future research to explore this further.
In general, it seems that most parameters can be estimated accurately despite allowing for
variability in time of measurement. In practical applications, the parameters of interest are
typically the fixed effects and mixture proportions since these serve to characterize the structure
and properties of the latent classes. Of the parameters in NMEMMs, these are also usually the
most readily interpretable as well as the most useful in making decisions regarding the results of
the analysis. Fortunately, it appears that these parameters are estimated with little difficulty and
with low bias overall. Residual variances, though not typically the focus of most analyses, also
seem to be recovered rather well. The biggest difficulties encountered seemed to be localized in
the random effects parameters, which exhibited a great deal of bias in certain conditions. This is
especially true for the rate of change parameter. We believe that the bias in the random effects
parameters is due in part to variability in measurement occasion, as well as their relative size,
since smaller values lead to greater percent bias simply as a result of the method by which the
statistic is calculated.
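The point about parameter size can be seen in a short worked example. The sketch assumes that percent bias is computed as the relative bias, 100 × (mean estimate − true value) / true value; the numbers are hypothetical and chosen so that the absolute error is the same for both parameters.

```python
def percent_bias(estimates, true_value):
    """Relative bias in percent: 100 * (mean estimate - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return 100.0 * (mean_estimate - true_value) / true_value

# Same absolute error (0.005) on a large and a small hypothetical parameter
bias_large = percent_bias([50.005], 50.0)  # about 0.01 percent
bias_small = percent_bias([0.015], 0.01)   # about 50 percent
```

An identical absolute error thus appears thousands of times larger on the percent scale when the true value is close to zero, as is the case for small variance components.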
Comparing programs, estimation in Mplus seems more consistent. Nearly all
models fit in Mplus converged and the aBIC was able to correctly identify the data-generating
model in nearly all conditions. OpenBUGS had far more difficulty in reaching convergence, and
the DIC was much more conservative, opting to select the alternative model far more often.
However, as mentioned earlier, OpenBUGS seems to produce smaller SEs for parameter
estimates overall than Mplus. When estimation procedures do converge, smaller SEs are more
desirable. Yet to achieve these smaller SEs, OpenBUGS requires a combination of informative
starting values and/or priors to aid in the estimation. As such, we second the recommendation of
Serang and colleagues (2015), who advocate first using Mplus to find rough estimates of the
parameters, then refining these estimates by feeding them to OpenBUGS to obtain the smaller
SEs. This procedure not only results in more precise estimates, but the use of Mplus first
simultaneously produces the information needed for the starting values and priors required to
successfully implement the estimation procedure in OpenBUGS.
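One way the hand-off from Mplus to OpenBUGS might look is sketched below. The code turns a set of point estimates and SEs into normal prior statements in BUGS syntax, where dnorm takes a mean and a precision. The estimates, the inflation factor of 10 applied to each SE, and the function names are hypothetical; the sketch is not the exact procedure of Serang and colleagues (2015).

```python
def normal_prior_from_estimate(estimate, se, inflate=10.0):
    """Return (mean, precision) for a BUGS dnorm prior.
    The prior SD is the SE multiplied by an assumed inflation factor so that
    the prior stays weaker than the first-stage estimate."""
    sd = inflate * se
    return estimate, 1.0 / (sd * sd)

def bugs_prior_lines(estimates):
    """Format one BUGS prior statement per parameter."""
    lines = []
    for name, (est, se) in estimates.items():
        mean, precision = normal_prior_from_estimate(est, se)
        lines.append(f"{name} ~ dnorm({mean:.4g}, {precision:.4g})")
    return lines

# Hypothetical first-stage output: parameter name -> (estimate, SE)
mplus_estimates = {"beta[1,1]": (20.0, 0.5), "beta[1,2]": (120.0, 2.0)}
prior_lines = bugs_prior_lines(mplus_estimates)
```

The same point estimates could also be supplied as starting values for the chains. How strongly to inflate the SEs is a judgment call that trades prior influence against ease of convergence.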
When implemented appropriately, NMEMMs can serve as a valuable tool in the analysis
of interindividual differences and intraindividual change. However, as discussed by Bauer (2007),
GMMs, and by extension NMEMMs, also make several strong assumptions that when violated
can lead to incorrect results. We take this opportunity to highlight two of the main relevant
assumptions. The first assumption is that of within-class multivariate normality. NMEMMs can
be used to extract normally distributed classes from non-normally distributed data. Yet, as
pointed out by Bauer and Curran (2003), the ability to extract normally distributed latent classes
does not imply that heterogeneity truly exists in the data. These authors showed that mixture
models can extract two normally distributed latent classes from data generated according to a
homogeneous lognormal distribution. As such, we encourage users of NMEMMs to have some
theory justifying the existence of the latent classes they extract and to be aware of the potential to
see heterogeneity when there is none.
A second assumption is the correct specification of the model. Though this assumption is
implicitly made when fitting most statistical models, it is especially important when fitting
NMEMMs because of the number of components. Unlike GMMs, the NMEMM offers a wider
variety of choices for the functional form of the change. Selecting these functional forms is
critical in the model construction process, especially when change patterns in the latent classes
are expected to follow different functional forms. Once these have been decided, the correct
specification of the mean and covariance structures is essential in arriving at the desired result.
Failure to do so may unintentionally force variability into different pieces of the model, which
could lead to biased parameter estimates or even the extraction of too many or too few latent
classes. These concerns are even more salient for models involving individually varying
measurement occasions.
While this study has extended the capabilities of NMEMMs in several ways, there are
still many directions left open for future research. One possibility is the study of alternative
residual structures beyond those discussed here. Although we examined two possible residual
structures, there are several possible alternatives that are worth exploring (Grimm & Widaman,
2010; Harring & Blozis, 2014). Another option involves the random effects. In this study, we
constrained random effects parameters to be equal across latent classes as is often done in
practice. However, relaxing this constraint may improve estimation of NMEMMs in any number
of ways. Random effects parameters were not the only component on which we imposed equality
constraints; we also constrained the change across the classes to follow the same functional form.
In this study we only used a three parameter exponential model, but there are several alternative
nonlinear functional forms from which to choose. The literature on NMEMMs would benefit
from examining the properties of these other functional forms, as well as the feasibility of
allowing different classes to follow different functional forms. With regard to the number of
latent classes, we only fit models with one or two classes. Yet some practical applications may
expect the existence of five or more classes (Colder, Campbell, Ruel, Richardson, & Flay, 2002).
As such, examining the influences of including more latent classes in the estimation of
NMEMMs may be useful. Additionally, we only considered unconditional NMEMMs in this
study. In practice it is often of interest to find characteristics of individuals that aid researchers in
predicting class membership, or at the very least permit them to incorporate covariates.
Conditional NMEMMs thus represent a natural next step. Although we have not yet made it
explicit, all models fit in this study assumed that all observations were independent and
self-weighting. Since many large scale longitudinal studies (especially those that would be most
likely to use the techniques we have discussed) employ complex sampling schemes requiring
sample weights, future research may look to incorporate these into the estimation of NMEMMs.
Model selection criteria for selecting the number of latent classes also deserve attention,
especially when fitting the model from a Bayesian perspective. Though the DIC can be useful, it
has a number of flaws (Spiegelhalter, Best, Carlin, & van der Linde, 2014). There is some work
examining the effectiveness of alternative model selection criteria for robust GMMs (Lu &
Zhang, 2014), but this has yet to be extended to NMEMMs. Finally, in the illustrative example,
we observed hints that incorporating individual measurement schedules can lead to different
latent class structures with different properties than if measurement was considered to be
performed at the same time points for all individuals. If this is indeed the case, we believe it
important to investigate the magnitude of these differences. If the solutions are drastically
distinct, this would provide even more evidence suggesting the use of individually varying
measurement occasions within NMEMMs.
Acknowledgements
This work was supported in part by grant AG 007137-25 from the NIH/NIA. The author
would like to thank Dr. Kevin J. Grimm and Dr. John J. McArdle for their guidance and for
providing the opportunity to make this work possible. The author would also like to thank Dr.
Zhiyong Zhang, Dr. Rand Wilcox, and Dr. Richard John for their helpful comments and
suggestions.
References
Aydin, B., Leite, W. L., & Algina, J. (2014). The consequences of ignoring variability in
measurement occasions within data collection waves in latent growth models. Multivariate
Behavioral Research, 49(2), 149–160.
Baltes, P. B. & Nesselroade, J. R. (1979). History and rationale of longitudinal research. In J. R.
Nesselroade & P. B. Baltes (eds.), Longitudinal research in the study of behavior and
development (pp. 1–39). New York: Academic Press.
Bauer, D. J. (2007). Observations on the use of growth mixture models in psychological
research. Multivariate Behavioral Research, 42(4), 757–786.
Bauer, D. J., & Curran, P. J. (2003). Distributional assumptions of growth mixture models:
Implications for overextraction of latent trajectory classes. Psychological Methods, 8(3),
338–363.
Blozis, S. A., & Cho, Y. I. (2008). Coding and centering of time in latent curve models in the
presence of interindividual time heterogeneity. Structural Equation Modeling: A
Multidisciplinary Journal, 15(3), 413–433.
Browne, M. W., & du Toit, S. H. C. (1991). Models for learning data. In L. M. Collins & J. L.
Horn (Eds.), Best methods for the analysis of change: Recent advances, unanswered
questions, future directions (pp. 47–68). Washington, DC: American Psychological
Association.
Codd, C. L., & Cudeck, R. (2014). Nonlinear random-effects mixture models for repeated
measures. Psychometrika, 79(1), 60–83.
Colder, C. R., Campbell, R. T., Ruel, E., Richardson, J. L., & Flay, B. R. (2002). A finite
mixture model of growth trajectories of adolescent alcohol use: predictors and
consequences. Journal of Consulting and Clinical Psychology, 70(4), 976–985.
Cudeck, R., & Harring, J. R. (2007). Analysis of nonlinear patterns of change with random
coefficient models. Annual Review of Psychology, 58, 615–637.
Curran, P. J., & Bauer, D. J. (2011). The disaggregation of within-person and between-person
effects in longitudinal models of change. Annual Review of Psychology, 62(1), 583–619.
Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to calculating
posterior moments. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.),
Bayesian statistics 4: Proceedings of the fourth Valencia International Meeting (pp. 169–
194). Oxford, UK: Clarendon.
Grimm, K. J., & Ram, N. (2009). Non-linear growth models in Mplus and SAS. Structural
Equation Modeling: A Multidisciplinary Journal, 16(4), 676–701.
Grimm, K. J., Ram, N., & Estabrook, R. (2010). Nonlinear structured growth mixture models in
Mplus and OpenMx. Multivariate Behavioral Research, 45(6), 887–909.
Grimm, K. J., Ram, N., Shiyko, M. P., & Lo, L. L. (2013). A simulation study of the ability of
growth mixture models to uncover growth heterogeneity. In J. J. McArdle & G. Ritschard
(Eds.), Contemporary issues in exploratory data mining (pp. 172–189). New York, NY:
Routledge.
Grimm, K. J., & Widaman, K. F. (2010). Residual structures in latent growth curve
modeling. Structural Equation Modeling: A Multidisciplinary Journal, 17(3), 424–442.
Grimm, K., Zhang, Z., Hamagami, F., & Mazzocco, M. (2013). Modeling nonlinear change via
latent change and latent acceleration frameworks: Examining velocity and acceleration of
growth trajectories. Multivariate Behavioral Research, 48(1), 117–143.
Hallquist, M., & Wiley, J. (2014). MplusAutomation: Automating Mplus model estimation and
interpretation. R package version 0.6-3. http://CRAN.R-project.org/package=MplusAutomation
Harring, J. R. (2012). Finite mixtures of nonlinear mixed effects models. In J. R. Harring & G. R.
Hancock (Eds.), Advances in longitudinal methods in the social and behavioral sciences
(pp. 159–192). Charlotte, NC: Information Age.
Harring, J. R., & Blozis, S. A. (2014). Fitting correlated residual error structures in nonlinear
mixed-effects models using SAS PROC NLMIXED. Behavior Research Methods, 46(2),
372–384.
Henson, J. M., Reise, S. P., & Kim, K. H. (2007). Detecting mixtures from structural model
differences using latent variable mixture modeling: A comparison of relative model fit
statistics. Structural Equation Modeling: A Multidisciplinary Journal, 14(2), 202–226.
Kaplan, D. (2002). Methodological advances in the analysis of individual growth with relevance
to education policy. Peabody Journal of Education, 77(4), 189–215.
Kelley, K. (2008). Nonlinear change models in populations with unobserved
heterogeneity. Methodology: European Journal of Research Methods for the Behavioral
and Social Sciences, 4(3), 97–112.
Lu, X., & Huang, Y. (2014). Bayesian analysis of nonlinear mixed-effects mixture models for
longitudinal data with heterogeneity and skewness. Statistics in Medicine, 33(16), 2830–
2849.
Lu, Z., & Zhang, Z. (2014). Robust growth mixture models with non-ignorable missingness:
Models, estimation, selection, and application. Computational Statistics and Data Analysis,
71, 220–240.
Lu, Z., Zhang, Z., & Lubke, G. (2011). Bayesian inference for growth mixture models with latent-
class-dependent missing data. Multivariate Behavioral Research, 46, 567–597.
Lubke, G., & Muthén, B. O. (2007). Performance of factor mixture models as a function of
model size, covariate effects, and class-specific parameters. Structural Equation Modeling:
A Multidisciplinary Journal, 14(1), 26–47.
Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS – a Bayesian
modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10,
325–327.
Lunn, D., Spiegelhalter, D., Thomas, A., & Best, N. (2009). The BUGS project: Evolution,
critique, and future directions. Statistics in Medicine, 28, 3049–3067.
McArdle, J.J. (1988). Dynamic but structural equation modeling of repeated measures data. In
J.R. Nesselroade & R.B. Cattell (Eds.), Handbook of Multivariate Experimental Psychology
(Vol. 2, pp. 561–614). New York: Plenum Press.
McArdle, J. J. (2009). Latent variable modeling of differences and changes with longitudinal
data. Annual Review of Psychology, 60(1), 577–605.
McArdle, J. J., & Epstein, D. (1987). Latent growth curves within developmental structural
equation models. Child Development, 58(1), 110–133.
McArdle, J. J., & Hamagami, F. (1992). Modeling incomplete longitudinal and cross-sectional
data using latent growth structural models. Experimental Aging Research, 18(3), 145–166.
McLachlan, G., & Peel, D. (2000). Finite Mixture Models. New York, NY: Wiley.
Mehta, P. D., & Neale, M. C. (2005). People are variables too: multilevel structural equations
modeling. Psychological Methods, 10(3), 259–284.
Mehta, P. D., & West, S. G. (2000). Putting the individual back into individual growth
curves. Psychological Methods, 5(1), 23–43.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55(1), 107–122.
Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analyses:
growth mixture modeling with latent trajectory classes. Alcoholism, Clinical and
Experimental Research, 24(6), 882–891.
Muthén, B., & Shedden, K. (1999). Finite mixture modeling with mixture outcomes using the
EM algorithm. Biometrics, 55(2), 463–469.
Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user’s guide (7th ed.). Los Angeles, CA:
Muthén & Muthén.
Nagin, D. S. (1999). Analyzing developmental trajectories: A semiparametric, group-based
approach. Psychological Methods, 4(2), 139–157.
Nolen-Hoeksema, S. (2001). Gender differences in depression. Current Directions in
Psychological Science, 10(5), 173–176.
Pan, J.-X., & Fang, K.-T. (2002). Growth curve models and statistical diagnostics. New York,
NY: Springer.
Pianta, R. C., Belsky, J., Vandergrift, N., Houts, R., & Morrison, F. J. (2008). Classroom Effects
on Children’s Achievement Trajectories in Elementary School. American Educational
Research Journal, 45(2), 365–397.
R Core Team (2014). R: A language and environment for statistical computing. Vienna, Austria:
R Foundation for Statistical Computing.
Rubin, D. B. (1976). Inference and missing data. Biometrika, 63(3), 581–592.
Sclove, S. L. (1987). Application of model-selection criteria to some problems in multivariate
analysis. Psychometrika, 52(3), 333–343.
Serang, S., Zhang, Z., Helm, J., Steele, J. S., & Grimm, K. J. (2015). Evaluation of a Bayesian
approach to estimating nonlinear mixed-effects mixture models. Structural Equation
Modeling: A Multidisciplinary Journal, 22(2), 202–215.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & Van Der Linde, A. (2002). Bayesian measures
of model complexity and fit. Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 64(4), 583–639.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2014). The deviance
information criterion: 12 years on. Journal of the Royal Statistical Society: Series B
(Statistical Methodology), 76(3), 485–493.
Sterba, S. K. (2013). Understanding linkages among mixture models. Multivariate Behavioral
Research, 48(6), 775–815.
Sterba, S. K. (2014). Fitting nonlinear latent growth curve models with individually varying time
points. Structural Equation Modeling: A Multidisciplinary Journal, 21(4), 630–647.
Sturtz, S., Ligges, U., & Gelman, A. (2005). R2WinBUGS: A package for running WinBUGS
from R. Journal of Statistical Software, 12, 1–16.
Timmons, A. C., & Preacher, K. J. (2015). The importance of temporal design: How do
measurement intervals affect the accuracy and efficiency of parameter estimates in
longitudinal research? Multivariate Behavioral Research, 50(1), 41–55.
Tofighi, D., & Enders, C. K. (2008). Identifying the correct number of classes in growth mixture
models. In G. R. Hancock & K. M. Samuelsen (Eds.), Advances in latent variable mixture
models (pp. 317–341). Charlotte, NC: Information Age.
Usami, S. (2014). Performance of information criteria for model selection in a latent growth
curve mixture model. Journal of the Japanese Society of Computational Statistics, 27(1),
17–48.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and
predictors of individual change over time. Psychological Bulletin, 116(2), 363–381.
Appendix A
Sample Mplus model statement for 2-class NMEMM
TITLE: Nonlinear Mixed Effects Mixture Model;
DATA: FILE = data.dat;
VARIABLE: NAMES = y1-y5 t1-t5;
USEVAR = y1-y5;
CLASSES = c(2);
CONSTRAINT = t1-t5;
ANALYSIS: TYPE= MIXTURE;
ITERATIONS = 10000;
MODEL:
%OVERALL%
b1 BY y1-y5@1;
b2 BY y1* y2-y5 ;
b3 BY y1* y2-y5 ;
!Means
[y1-y5@0 b1*50 b2*100 b3@0];
!Variances & Covariances
y1-y5 (theta);
b1 (V_1);
b2 (V_2);
b3 (V_3);
b1 WITH b2 (c_12);
b1 WITH b3 (c_13);
b2 WITH b3 (c_23);
%c#1%
b2 BY y1* (L21a)
y2 (L22a)
y3 (L23a)
y4 (L24a)
y5 (L25a);
b3 BY y1* (L31a)
y2 (L32a)
y3 (L33a)
y4 (L34a)
y5 (L35a);
[b1*20] (mu_1a); [b2*120] (mu_2a);
%c#2%
b2 BY y1* (L21b)
y2 (L22b)
y3 (L23b)
y4 (L24b)
y5 (L25b);
b3 BY y1* (L31b)
y2 (L32b)
y3 (L33b)
y4 (L34b)
y5 (L35b);
[b1*20] (mu_1b); [b2*120] (mu_2b);
MODEL CONSTRAINT:
new(mu_3a*0.4 mu_3b*0.6);
!Constraints for Class 1;
L21a = 1 - exp (- mu_3a * t1);
L22a = 1 - exp (- mu_3a * t2);
L23a = 1 - exp (- mu_3a * t3);
L24a = 1 - exp (- mu_3a * t4);
L25a = 1 - exp (- mu_3a * t5);
L31a = (mu_2a) * t1 * exp (- mu_3a * t1);
L32a = (mu_2a) * t2 * exp (- mu_3a * t2);
L33a = (mu_2a) * t3 * exp (- mu_3a * t3);
L34a = (mu_2a) * t4 * exp (- mu_3a * t4);
L35a = (mu_2a) * t5 * exp (- mu_3a * t5);
!Constraints for Class 2;
L21b = 1 - exp (- mu_3b * t1);
L22b = 1 - exp (- mu_3b * t2);
L23b = 1 - exp (- mu_3b * t3);
L24b = 1 - exp (- mu_3b * t4);
L25b = 1 - exp (- mu_3b * t5);
L31b = (mu_2b) * t1 * exp (- mu_3b * t1);
L32b = (mu_2b) * t2 * exp (- mu_3b * t2);
L33b = (mu_2b) * t3 * exp (- mu_3b * t3);
L34b = (mu_2b) * t4 * exp (- mu_3b * t4);
L35b = (mu_2b) * t5 * exp (- mu_3b * t5);
OUTPUT: TECH1;
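The MODEL CONSTRAINT statements in Appendix A tie every loading to an individual's own measurement time. As a rough illustration only, the Python sketch below reproduces that arithmetic outside Mplus. It assumes the growth curve is b1 + b2(1 - exp(-b3 t)), the form written in Appendix B, and reads the b3 loadings as the partial derivative of that curve with respect to b3. The function names, the example times, and the use of the starting values 120 and 0.4 as inputs are hypothetical choices made for this sketch and are not part of the thesis.

```python
import math


def constrained_loadings(mu_2, mu_3, times):
    """Loadings implied by the MODEL CONSTRAINT lines for one individual.

    Hypothetical helper: mu_2 and mu_3 play the role of mu_2a/mu_3a
    (or mu_2b/mu_3b), and `times` plays the role of t1-t5.
    """
    b2_loadings = [1.0 - math.exp(-mu_3 * t) for t in times]
    b3_loadings = [mu_2 * t * math.exp(-mu_3 * t) for t in times]
    return b2_loadings, b3_loadings


def curve(b1, b2, b3, t):
    """Exponential growth curve in the form used in Appendix B."""
    return b1 + b2 * (1.0 - math.exp(-b3 * t))


# Made-up measurement times for a single individual (assumption).
times = [0.0, 1.1, 1.9, 3.2, 4.0]
l2, l3 = constrained_loadings(mu_2=120.0, mu_3=0.4, times=times)

# Central-difference check that the b3 loadings match d(curve)/d(b3).
eps = 1e-6
numeric = [
    (curve(20.0, 120.0, 0.4 + eps, t) - curve(20.0, 120.0, 0.4 - eps, t)) / (2 * eps)
    for t in times
]
max_gap = max(abs(a - b) for a, b in zip(l3, numeric))
```

Because each individual supplies their own t1-t5, the loading values differ from person to person, which is the reason the times are declared on the CONSTRAINT option rather than fixed in the MODEL statement.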
Appendix B
Sample OpenBUGS model statement for 2-class NMEMM
model {
  for (i in 1:N) {
    # Latent class indicators
    c[i] ~ dcat(pi[1:2])
    # Random effects (dmnorm takes a precision matrix)
    b[i,1:3] ~ dmnorm(beta[c[i],1:3], pre.bsigma[1:3,1:3])
    # Growth curve (dnorm takes a precision, not a variance)
    for (t in 1:5) {
      y[i,t] ~ dnorm(mu[i,t], pre.ysigma)
      mu[i,t] <- b[i,1] + b[i,2] * (1 - exp(-b[i,3] * time[i,t]))
    }
  }
  # Priors
  pi[1:2] ~ ddirch(alpha[1:2])
  alpha[1] <- N * 0.8
  alpha[2] <- N * 0.2
  beta[1,1] ~ dnorm(0, 0.001)
  beta[2,1] ~ dnorm(0, 0.001)
  beta[1,2] ~ dnorm(0, 0.001)
  beta[2,2] ~ dnorm(0, 0.001)
  beta[1,3] ~ dnorm(0, 0.001)
  beta[2,3] ~ dnorm(0, 0.001)
  pre.ysigma ~ dgamma(0.001, 0.001)
  pre.bsigma[1:3,1:3] ~ dwish(omega[1:3,1:3], 3)
  # Scale matrix for the Wishart prior (identity)
  omega[1,1] <- 1
  omega[2,2] <- 1
  omega[3,3] <- 1
  omega[1,2] <- 0
  omega[1,3] <- 0
  omega[2,3] <- 0
  omega[2,1] <- omega[1,2]
  omega[3,1] <- omega[1,3]
  omega[3,2] <- omega[2,3]
  # Transforming parameters (precisions back to variances)
  ysigma <- 1/pre.ysigma
  bsigma[1:3,1:3] <- inverse(pre.bsigma[1:3,1:3])
}
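To make the generative structure of the Appendix B model concrete, the following Python sketch draws data in the same order: a latent class, then class-specific random effects, then observations around the exponential curve. It is only an illustration under stated assumptions. The population values are taken from Table 1 (large-differences condition), the random effects are drawn independently because the off-diagonal covariances there are zero, and standard deviations are used where BUGS uses precisions. The base occasions 0 to 4, the way the N(0, 0.5^2) jitter is added to them, and the function name `simulate_nmemm` are hypothetical and do not come from the thesis.

```python
import math
import random


def simulate_nmemm(n, class_means, re_sds, resid_sd, pi_1, base_times, time_sd, seed=0):
    """Draw (class, times, y) triples from a 2-class exponential growth model."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # Latent class: 0 with probability pi_1, otherwise 1.
        c = 0 if rng.random() < pi_1 else 1
        # Independent normal random effects (valid here only because the
        # off-diagonal covariances are taken to be zero).
        b = [rng.gauss(m, s) for m, s in zip(class_means[c], re_sds)]
        # Individually varying occasions: base time plus normal jitter (assumption).
        times = [t0 + rng.gauss(0.0, time_sd) for t0 in base_times]
        y = [
            b[0] + b[1] * (1.0 - math.exp(-b[2] * t)) + rng.gauss(0.0, resid_sd)
            for t in times
        ]
        rows.append((c, times, y))
    return rows


data = simulate_nmemm(
    n=1000,
    class_means=[(20.0, 125.0, 0.70), (32.5, 137.5, 0.50)],  # Table 1, large differences
    re_sds=(5.0, 6.0, 0.1),      # square roots of 25, 36, .01
    resid_sd=3.0,                # square root of 9
    pi_1=0.80,
    base_times=[0, 1, 2, 3, 4],  # hypothetical occasions
    time_sd=0.5,
)
share_class_1 = sum(1 for c, _, _ in data if c == 0) / len(data)
```

A sketch like this is mainly useful for checking that a fitted model can recover known values before it is applied to real data; it makes no claim to match the simulation code used for the results reported in the tables.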
Tables
Table 1: Part 1 – General conditions – Population values

Parameter            Class 1 Means    Class 2 Means          Class 2 Means
                                      (Large Differences)    (Small Differences)
Fixed Effects
  β_1                20               32.5                   28.5
  β_2                125              137.5                  135.2
  β_3                0.70             0.50                   0.57
Random Effects
  Σ_{1,1}            25               25                     25
  Σ_{1,2}            0                0                      0
  Σ_{2,2}            36               36                     36
  Σ_{3,1}            0                0                      0
  Σ_{3,2}            0                0                      0
  Σ_{3,3}            .01              .01                    .01
Residual Variance
  σ^2                9                9                      9
Mixture Proportions
  π_c                0.80             0.20                   0.20
Table 2: Part 1 – General conditions – Proportion of datasets for which models converged

Mplus
Large Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          1.000    1.000    1.000
N(0, 0.25^2)         1.000    1.000    1.000
N(0, 0.1^2)          0.995    1.000    0.980
U(-1, 1)             1.000    1.000    1.000
U(-0.5, 0.5)         1.000    1.000    1.000
U(-0.25, 0.25)       1.000    1.000    1.000
Small Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          1.000    1.000    1.000
N(0, 0.25^2)         1.000    1.000    1.000
N(0, 0.1^2)          1.000    1.000    1.000
U(-1, 1)             1.000    1.000    1.000
U(-0.5, 0.5)         1.000    1.000    1.000
U(-0.25, 0.25)       1.000    1.000    1.000

OpenBUGS
Large Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          0.590    0.470    0.550
N(0, 0.25^2)         0.575    0.575    0.545
N(0, 0.1^2)          0.580    0.530    0.595
U(-1, 1)             0.545    0.475    0.465
U(-0.5, 0.5)         0.575    0.525    0.500
U(-0.25, 0.25)       0.615    0.580    0.620
Small Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          0.580    0.455    0.325
N(0, 0.25^2)         0.585    0.595    0.395
N(0, 0.1^2)          0.585    0.595    0.410
U(-1, 1)             0.565    0.560    0.280
U(-0.5, 0.5)         0.580    0.570    0.370
U(-0.25, 0.25)       0.635    0.645    0.460
Table 3: Part 1 – General conditions – Proportion of datasets for which the 2-class model was
correctly selected over the 1-class model

Mplus
Large Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          1.000    1.000    1.000
N(0, 0.25^2)         1.000    1.000    1.000
N(0, 0.1^2)          0.995    1.000    0.980
U(-1, 1)             1.000    1.000    1.000
U(-0.5, 0.5)         1.000    1.000    1.000
U(-0.25, 0.25)       1.000    1.000    1.000
Small Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          1.000    1.000    1.000
N(0, 0.25^2)         1.000    1.000    1.000
N(0, 0.1^2)          1.000    1.000    1.000
U(-1, 1)             1.000    1.000    1.000
U(-0.5, 0.5)         1.000    1.000    1.000
U(-0.25, 0.25)       1.000    1.000    1.000

OpenBUGS
Large Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          1.000    0.985    0.920
N(0, 0.25^2)         1.000    1.000    0.930
N(0, 0.1^2)          1.000    0.985    0.950
U(-1, 1)             1.000    0.975    0.880
U(-0.5, 0.5)         1.000    0.980    0.955
U(-0.25, 0.25)       1.000    0.995    0.945
Small Differences
Time Distribution   n=1000    n=500    n=200
N(0, 0.5^2)          0.865    0.550    0.525
N(0, 0.25^2)         0.895    0.615    0.560
N(0, 0.1^2)          0.910    0.620    0.555
U(-1, 1)             0.850    0.570    0.495
U(-0.5, 0.5)         0.905    0.535    0.450
U(-0.25, 0.25)       0.935    0.690    0.535
Table 4: Part 1 – General conditions – Percent bias for each parameter. Mixture proportions are
given in their original metric.

Mplus
Large differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)      -0.712   -0.120   -0.491   -0.771    0.946   -1.943
      N(0, .25^2)     -0.216   -0.195   -0.681   -0.544    0.881   -2.237
      N(0, .1^2)       0.030   -0.200   -0.942   -0.388    0.888   -2.098
      U(-1, 1)        -0.994   -0.075   -0.386   -1.095    0.930   -1.975
      U(-.5, .5)      -0.349   -0.173   -0.855   -0.490    0.908   -2.050
      U(-.25, .25)     0.021   -0.193   -0.944   -0.327    0.933   -2.458
500   N(0, .5^2)      -0.471   -0.125   -0.556   -0.701    0.920   -1.915
      N(0, .25^2)     -0.395   -0.188   -0.686   -0.581    0.892   -2.115
      N(0, .1^2)       0.210   -0.210   -1.001   -0.432    0.687   -2.631
      U(-1, 1)        -0.947   -0.079   -0.234   -0.811    0.941   -2.159
      U(-.5, .5)      -0.422   -0.191   -0.746   -0.565    0.897   -1.555
      U(-.25, .25)    -0.216   -0.185   -0.956   -0.544    0.840   -2.361
200   N(0, .5^2)      -0.644   -0.163   -0.321   -0.698    0.876   -2.249
      N(0, .25^2)     -0.105   -0.143   -0.797   -0.547    0.747   -2.080
      N(0, .1^2)      -0.197   -0.130   -0.848   -0.675    0.766   -1.739
      U(-1, 1)        -0.898   -0.132   -0.230   -1.511    0.916   -1.514
      U(-.5, .5)      -0.592   -0.186   -0.672   -0.310    0.856   -2.120
      U(-.25, .25)    -0.027   -0.192   -1.049   -0.277    0.994   -2.450
Mplus
Large differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1      π_2
1000  N(0, .5^2)       2.445    3.349   -6.250    1.627    0.797    0.203
      N(0, .25^2)      2.159    4.179   -5.900    0.582    0.797    0.203
      N(0, .1^2)       0.720    2.994   -7.437    1.008    0.797    0.203
      U(-1, 1)         3.788    2.987   -6.150    1.094    0.797    0.203
      U(-.5, .5)       1.786    3.537   -6.100    1.298    0.797    0.203
      U(-.25, .25)     0.471    3.383   -6.750    0.559    0.797    0.203
500   N(0, .5^2)       3.819    2.646   -5.500    0.970    0.797    0.203
      N(0, .25^2)      0.916    3.833   -7.150    0.936    0.797    0.203
      N(0, .1^2)       1.569    2.885   -6.050    0.377    0.798    0.202
      U(-1, 1)         2.809    3.699   -6.300    1.570    0.797    0.203
      U(-.5, .5)       1.434    3.078   -6.800    0.541    0.795    0.205
      U(-.25, .25)     0.185    2.311   -8.100    0.739    0.796    0.204
200   N(0, .5^2)       1.834    1.416   -6.950    0.786    0.796    0.204
      N(0, .25^2)     -0.452    1.664   -7.200    0.711    0.797    0.203
      N(0, .1^2)      -0.663    2.363   -7.704    1.281    0.797    0.203
      U(-1, 1)         3.564   -0.112   -6.700    1.283    0.796    0.204
      U(-.5, .5)       0.980    2.364   -7.500    1.218    0.796    0.204
      U(-.25, .25)    -0.603    1.199   -8.500    0.574    0.799    0.201
Mplus
Small differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)      -0.258   -0.144    0.026   -2.396   -0.799   -3.396
      N(0, .25^2)     -0.012   -0.152   -0.156   -1.924   -0.792   -3.584
      N(0, .1^2)      -0.006   -0.199   -0.272   -1.617   -0.843   -3.311
      U(-1, 1)        -0.775   -0.084    0.120   -2.565   -0.741   -2.966
      U(-.5, .5)      -0.066   -0.129   -0.077   -1.453   -0.908   -4.057
      U(-.25, .25)     0.151   -0.190   -0.344   -1.224   -0.731   -3.579
500   N(0, .5^2)      -0.570   -0.087    0.011   -2.541   -0.936   -3.388
      N(0, .25^2)     -0.208   -0.146   -0.210   -1.879   -0.937   -3.590
      N(0, .1^2)       0.358   -0.111   -0.442   -2.166   -1.065   -3.628
      U(-1, 1)        -0.743   -0.084    0.279   -2.810   -0.810   -2.945
      U(-.5, .5)      -0.213   -0.074   -0.456   -2.451   -0.873   -2.802
      U(-.25, .25)     0.174   -0.176   -0.476   -0.513   -0.666   -3.571
200   N(0, .5^2)      -0.115   -0.008   -0.283   -3.577   -0.749   -2.860
      N(0, .25^2)      0.098    0.170   -1.307   -2.989   -1.059   -2.854
      N(0, .1^2)       0.466   -0.106   -0.501   -2.960   -1.287   -2.930
      U(-1, 1)        -0.763    0.129   -0.810   -4.578   -1.289   -1.288
      U(-.5, .5)      -0.174   -0.026   -1.116   -3.181   -1.430   -2.061
      U(-.25, .25)     0.287   -0.033   -0.865   -2.691   -1.148   -3.049
Mplus
Small differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1      π_2
1000  N(0, .5^2)       6.220    3.690  -11.300    0.492    0.789    0.211
      N(0, .25^2)      4.840    3.464  -11.750    0.459    0.790    0.210
      N(0, .1^2)       2.167    2.257  -12.150    0.914    0.787    0.213
      U(-1, 1)         7.386    1.758  -11.150    0.477    0.787    0.213
      U(-.5, .5)       3.156    4.087  -11.750    0.259    0.791    0.209
      U(-.25, .25)     1.674    1.889  -12.050    0.431    0.792    0.208
500   N(0, .5^2)       6.141    4.734  -11.150   -0.009    0.787    0.213
      N(0, .25^2)      3.631    4.040  -12.900    0.560    0.786    0.214
      N(0, .1^2)       3.705    4.639  -12.000    0.300    0.791    0.209
      U(-1, 1)         7.523    2.633  -12.050    0.875    0.785    0.215
      U(-.5, .5)       5.407    4.376  -12.800    0.559    0.787    0.213
      U(-.25, .25)     2.527    2.448  -11.050    1.031    0.797    0.203
200   N(0, .5^2)       7.517    0.896  -16.700    1.377    0.778    0.222
      N(0, .25^2)      7.037    2.916  -15.100    1.125    0.787    0.213
      N(0, .1^2)       4.427    4.790  -15.550    1.353    0.786    0.214
      U(-1, 1)         8.795    5.334  -13.600    0.830    0.777    0.223
      U(-.5, .5)       5.149    5.228  -15.150    0.018    0.778    0.222
      U(-.25, .25)     3.420    1.869  -13.550    0.842    0.783    0.217
OpenBUGS
Large differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)      -0.039    0.004    0.013    0.135    1.837    0.648
      N(0, .25^2)     -0.033   -0.019    0.117    0.039    1.829    0.668
      N(0, .1^2)       0.029   -0.007   -0.026    0.021    1.834    0.945
      U(-1, 1)         0.000    0.010   -0.034   -0.072    1.818    0.500
      U(-.5, .5)      -0.046   -0.003   -0.106    0.109    1.829    0.719
      U(-.25, .25)    -0.014    0.000    0.022    0.102    1.878    0.495
500   N(0, .5^2)       0.310   -0.013   -0.057    0.261    1.813    1.116
      N(0, .25^2)     -0.127   -0.022    0.090    0.093    1.806    1.299
      N(0, .1^2)       0.215   -0.032    0.012    0.094    1.835    1.319
      U(-1, 1)         0.127   -0.015    0.103    0.279    1.815    0.853
      U(-.5, .5)      -0.033   -0.025    0.005    0.180    1.787    1.503
      U(-.25, .25)    -0.186   -0.002    0.012   -0.019    1.772    1.254
200   N(0, .5^2)       0.108   -0.074    0.124    0.305    1.646    2.522
      N(0, .25^2)      0.129   -0.001    0.076   -0.064    1.636    2.763
      N(0, .1^2)      -0.204    0.030    0.195    0.061    1.667    2.920
      U(-1, 1)         0.218   -0.089    0.026   -0.338    1.727    2.801
      U(-.5, .5)      -0.131   -0.041    0.051    0.435    1.684    2.286
      U(-.25, .25)    -0.035   -0.030    0.070    0.081    1.804    2.361
OpenBUGS
Large differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1      π_2
1000  N(0, .5^2)      -2.020   -1.055   17.965    0.379    0.800    0.200
      N(0, .25^2)     -0.412   -0.168   19.998   -0.792    0.800    0.200
      N(0, .1^2)      -0.366   -0.698   19.697   -0.615    0.800    0.200
      U(-1, 1)        -1.731   -2.087   16.950   -0.144    0.800    0.200
      U(-.5, .5)      -1.083   -0.586   18.737   -0.091    0.800    0.200
      U(-.25, .25)    -0.717    0.078   19.794   -1.064    0.800    0.200
500   N(0, .5^2)      -0.980   -2.606   34.515   -0.215    0.801    0.199
      N(0, .25^2)     -1.695    0.129   35.563   -0.608    0.800    0.200
      N(0, .1^2)      -0.038   -0.854   38.091   -1.370    0.801    0.199
      U(-1, 1)        -3.806   -2.483   32.920    0.465    0.800    0.200
      U(-.5, .5)      -1.715   -0.825   34.753   -0.786    0.800    0.200
      U(-.25, .25)    -0.897   -0.671   36.708   -1.097    0.800    0.200
200   N(0, .5^2)      -5.373   -4.662   78.100    0.864    0.800    0.200
      N(0, .25^2)     -3.661   -3.263   80.119   -0.098    0.800    0.200
      N(0, .1^2)      -3.032   -0.274   82.952    0.100    0.800    0.200
      U(-1, 1)        -5.646   -8.793   75.902    1.835    0.800    0.200
      U(-.5, .5)      -2.919   -2.595   80.175    0.920    0.800    0.200
      U(-.25, .25)    -1.745   -2.143   82.138   -0.518    0.801    0.199
OpenBUGS
Small differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)       0.221   -0.026   -0.344    0.230   -0.008    1.921
      N(0, .25^2)      0.015    0.007   -0.310    0.014   -0.004    2.220
      N(0, .1^2)      -0.008   -0.010   -0.286    0.039   -0.006    2.405
      U(-1, 1)        -0.105    0.013   -0.350    0.457   -0.048    2.030
      U(-.5, .5)       0.061    0.014   -0.271    0.400    0.025    2.029
      U(-.25, .25)     0.052   -0.016   -0.227    0.154    0.060    2.179
500   N(0, .5^2)       0.118    0.029   -0.568   -0.158   -0.243    3.803
      N(0, .25^2)     -0.077    0.021   -0.447    0.370   -0.199    3.360
      N(0, .1^2)       0.178    0.031   -0.326   -0.313   -0.160    4.062
      U(-1, 1)         0.084   -0.012   -0.459    0.330   -0.096    3.567
      U(-.5, .5)       0.066    0.053   -0.575   -0.340   -0.155    4.067
      U(-.25, .25)     0.070   -0.024   -0.340    0.254   -0.068    3.671
200   N(0, .5^2)       1.605    0.557   -1.480   -5.516   -2.126    8.887
      N(0, .25^2)      1.531    0.559   -1.412   -5.109   -1.925    8.608
      N(0, .1^2)       1.942    0.495   -1.143   -5.682   -2.070    9.477
      U(-1, 1)         2.468    0.501   -1.655   -6.399   -2.166    9.517
      U(-.5, .5)       2.188    0.475   -1.354   -6.116   -2.174    9.214
      U(-.25, .25)     1.816    0.324   -1.079   -4.504   -1.278    7.996
OpenBUGS
Small differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1      π_2
1000  N(0, .5^2)      -1.550   -0.919   21.527   -0.287    0.801    0.199
      N(0, .25^2)     -0.420   -0.394   22.916   -0.571    0.801    0.199
      N(0, .1^2)      -0.947   -1.328   23.611   -0.368    0.800    0.200
      U(-1, 1)        -2.206   -2.322   20.475   -0.290    0.800    0.200
      U(-.5, .5)      -2.373   -0.615   22.579   -0.644    0.801    0.199
      U(-.25, .25)    -1.142   -1.600   23.119   -0.816    0.801    0.199
500   N(0, .5^2)      -2.027   -0.504   39.600   -0.613    0.800    0.200
      N(0, .25^2)     -2.106    0.693   40.434   -0.439    0.800    0.200
      N(0, .1^2)      -0.799    0.916   42.393   -0.905    0.800    0.200
      U(-1, 1)        -3.218   -2.546   37.514    0.262    0.800    0.200
      U(-.5, .5)      -1.276    0.038   39.804   -0.223    0.801    0.199
      U(-.25, .25)     0.078   -0.556   41.392   -0.329    0.801    0.199
200   N(0, .5^2)      -6.293   -5.014   83.360    3.631    0.796    0.204
      N(0, .25^2)      0.462   -4.342   86.605    2.370    0.796    0.204
      N(0, .1^2)      -0.970    0.019   90.222    2.237    0.798    0.202
      U(-1, 1)        -4.521   -8.426   86.105    3.599    0.797    0.203
      U(-.5, .5)      -2.566   -0.454   87.112    1.648    0.795    0.205
      U(-.25, .25)     0.355   -0.573   89.895    0.825    0.800    0.200
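Table 4 reports percent bias, but the formula itself is not restated in this part of the document. The short Python sketch below assumes the conventional definition, 100 × (mean estimate − population value) / population value, and shows how a tabled percentage maps back to the original metric. Both function names are hypothetical, and the definition should be checked against the body of the thesis before relying on it.

```python
def percent_bias(mean_estimate, true_value):
    """Percent bias under the conventional definition (an assumption here)."""
    if true_value == 0:
        raise ValueError("percent bias is undefined when the population value is zero")
    return 100.0 * (mean_estimate - true_value) / true_value


def implied_mean_estimate(pct_bias, true_value):
    """Invert percent_bias to recover the mean estimate in the original metric."""
    return true_value * (1.0 + pct_bias / 100.0)


# Table 1 gives Sigma_{3,3} = .01; Table 4 reports 78.100 for OpenBUGS,
# large differences, n = 200, N(0, .5^2).
sigma33_hat = implied_mean_estimate(78.100, 0.01)
round_trip = percent_bias(sigma33_hat, 0.01)
```

Under that assumed definition, a percent bias of 78.100 on a population variance of .01 corresponds to a mean estimate of about 0.0178, which helps put the large percentages for Σ_{3,3} in perspective: the absolute discrepancy is small even though the relative one is not.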
Table 5: Part 1 – General conditions – Standard errors for each parameter estimate

Mplus
Large differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)       0.235    0.279    0.004    0.508    0.610    0.009
      N(0, .25^2)      0.224    0.268    0.004    0.489    0.601    0.009
      N(0, .1^2)       0.219    0.263    0.004    0.478    0.594    0.009
      U(-1, 1)         0.244    0.286    0.004    0.518    0.621    0.009
      U(-.5, .5)       0.226    0.270    0.004    0.493    0.602    0.009
      U(-.25, .25)     0.217    0.262    0.004    0.474    0.592    0.009
500   N(0, .5^2)       0.334    0.394    0.006    0.723    0.860    0.012
      N(0, .25^2)      0.316    0.379    0.006    0.690    0.849    0.012
      N(0, .1^2)       0.310    0.372    0.006    0.686    0.909    0.012
      U(-1, 1)         0.346    0.407    0.006    0.725    0.878    0.013
      U(-.5, .5)       0.320    0.382    0.006    0.700    0.849    0.012
      U(-.25, .25)     0.310    0.372    0.006    0.674    0.833    0.012
200   N(0, .5^2)       0.534    0.633    0.010    1.145    1.368    0.020
      N(0, .25^2)      0.508    0.622    0.010    1.169    1.412    0.021
      N(0, .1^2)       0.494    0.595    0.010    1.123    1.381    0.020
      U(-1, 1)         0.563    0.647    0.010    1.212    1.451    0.019
      U(-.5, .5)       0.512    0.609    0.010    1.113    1.353    0.019
      U(-.25, .25)     0.492    0.598    0.010    1.103    1.321    0.020
Mplus
Large differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2
1000  N(0, .5^2)       1.953    2.888    0.001    0.278
      N(0, .25^2)      1.821    2.746    0.001    0.282
      N(0, .1^2)       1.744    2.641    0.001    0.286
      U(-1, 1)         2.066    3.003    0.001    0.271
      U(-.5, .5)       1.844    2.790    0.001    0.283
      U(-.25, .25)     1.731    2.620    0.001    0.286
500   N(0, .5^2)       2.766    4.038    0.001    0.391
      N(0, .25^2)      2.541    3.847    0.001    0.399
      N(0, .1^2)       2.499    3.781    0.001    0.403
      U(-1, 1)         2.906    4.304    0.001    0.388
      U(-.5, .5)       2.615    3.862    0.001    0.391
      U(-.25, .25)     2.473    3.654    0.001    0.404
200   N(0, .5^2)       4.378    6.361    0.001    0.616
      N(0, .25^2)      4.293    6.175    0.001    0.629
      N(0, .1^2)       4.007    5.822    0.001    0.643
      U(-1, 1)         4.760    6.687    0.001    0.609
      U(-.5, .5)       4.209    6.043    0.001    0.627
      U(-.25, .25)     3.908    5.876    0.001    0.636
Mplus
Small differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)       0.291    0.339    0.006    0.774    0.847    0.012
      N(0, .25^2)      0.280    0.333    0.006    0.751    0.827    0.012
      N(0, .1^2)       0.272    0.329    0.006    0.723    0.835    0.012
      U(-1, 1)         0.301    0.350    0.006    0.791    0.826    0.012
      U(-.5, .5)       0.274    0.332    0.006    0.728    0.843    0.012
      U(-.25, .25)     0.267    0.319    0.006    0.696    0.821    0.012
500   N(0, .5^2)       0.443    0.525    0.009    1.205    1.326    0.018
      N(0, .25^2)      0.420    0.489    0.009    1.157    1.241    0.017
      N(0, .1^2)       0.408    0.496    0.009    1.086    1.266    0.017
      U(-1, 1)         0.464    0.510    0.009    1.192    1.295    0.019
      U(-.5, .5)       0.472    0.559    0.009    1.166    1.276    0.019
      U(-.25, .25)     0.389    0.472    0.008    1.024    1.197    0.018
200   N(0, .5^2)       0.709    0.811    0.013    1.715    1.997    0.027
      N(0, .25^2)      0.729    0.860    0.014    1.720    1.985    0.029
      N(0, .1^2)       0.843    1.049    0.017    2.215    2.049    0.028
      U(-1, 1)         0.773    0.918    0.014    1.721    1.887    0.025
      U(-.5, .5)       0.736    0.893    0.014    1.720    1.974    0.028
      U(-.25, .25)     0.723    0.833    0.015    1.676    2.050    0.026
Mplus
Small differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2
1000  N(0, .5^2)       2.393    3.276    0.001    0.275
      N(0, .25^2)      2.262    3.124    0.001    0.282
      N(0, .1^2)       2.181    3.117    0.001    0.287
      U(-1, 1)         2.552    3.372    0.001    0.271
      U(-.5, .5)       2.238    3.217    0.001    0.280
      U(-.25, .25)     2.132    3.029    0.001    0.285
500   N(0, .5^2)       3.589    4.894    0.001    0.386
      N(0, .25^2)      3.364    4.700    0.001    0.400
      N(0, .1^2)       3.147    4.585    0.001    0.403
      U(-1, 1)         3.779    4.888    0.001    0.386
      U(-.5, .5)       3.376    5.041    0.001    0.398
      U(-.25, .25)     3.089    4.378    0.001    0.407
200   N(0, .5^2)       5.332    7.344    0.001    0.617
      N(0, .25^2)      5.255    7.141    0.002    0.636
      N(0, .1^2)       5.971    8.760    0.002    0.642
      U(-1, 1)         5.530    7.587    0.001    0.605
      U(-.5, .5)       5.290    7.413    0.002    0.623
      U(-.25, .25)     5.086    7.264    0.002    0.637
OpenBUGS
Large differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)       0.233    0.279    0.004    0.487    0.601    0.009
      N(0, .25^2)      0.222    0.267    0.005    0.473    0.592    0.009
      N(0, .1^2)       0.217    0.263    0.005    0.465    0.587    0.009
      U(-1, 1)         0.241    0.285    0.004    0.498    0.610    0.009
      U(-.5, .5)       0.224    0.271    0.005    0.477    0.594    0.009
      U(-.25, .25)     0.216    0.262    0.005    0.463    0.584    0.009
500   N(0, .5^2)       0.332    0.391    0.007    0.698    0.855    0.014
      N(0, .25^2)      0.315    0.379    0.007    0.671    0.845    0.014
      N(0, .1^2)       0.308    0.371    0.007    0.666    0.833    0.014
      U(-1, 1)         0.342    0.404    0.007    0.704    0.869    0.013
      U(-.5, .5)       0.318    0.382    0.007    0.673    0.842    0.014
      U(-.25, .25)     0.307    0.371    0.007    0.663    0.830    0.014
200   N(0, .5^2)       0.525    0.622    0.012    1.116    1.379    0.024
      N(0, .25^2)      0.500    0.599    0.012    1.084    1.347    0.024
      N(0, .1^2)       0.489    0.593    0.012    1.069    1.359    0.025
      U(-1, 1)         0.542    0.633    0.012    1.135    1.381    0.024
      U(-.5, .5)       0.508    0.606    0.012    1.102    1.369    0.025
      U(-.25, .25)     0.487    0.588    0.012    1.078    1.331    0.025
OpenBUGS
Large differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1
1000  N(0, .5^2)       1.891    2.799    0.001    0.272    0.009
      N(0, .25^2)      1.780    2.627    0.001    0.275    0.009
      N(0, .1^2)       1.723    2.546    0.001    0.278    0.009
      U(-1, 1)         1.987    2.886    0.001    0.267    0.009
      U(-.5, .5)       1.803    2.663    0.001    0.275    0.009
      U(-.25, .25)     1.698    2.527    0.001    0.278    0.009
500   N(0, .5^2)       2.721    3.950    0.001    0.385    0.013
      N(0, .25^2)      2.531    3.758    0.001    0.390    0.013
      N(0, .1^2)       2.465    3.612    0.001    0.390    0.013
      U(-1, 1)         2.815    4.147    0.001    0.383    0.013
      U(-.5, .5)       2.575    3.802    0.001    0.389    0.013
      U(-.25, .25)     2.435    3.612    0.001    0.392    0.013
200   N(0, .5^2)       4.421    6.456    0.002    0.628    0.021
      N(0, .25^2)      4.124    6.077    0.002    0.631    0.021
      N(0, .1^2)       3.989    5.957    0.002    0.634    0.021
      U(-1, 1)         4.633    6.676    0.002    0.630    0.021
      U(-.5, .5)       4.245    6.251    0.002    0.638    0.021
      U(-.25, .25)     3.943    5.823    0.002    0.629    0.021
OpenBUGS
Small differences
n     Time distr.       β_11     β_12     β_13     β_21     β_22     β_23
1000  N(0, .5^2)       0.263    0.313    0.005    0.649    0.789    0.012
      N(0, .25^2)      0.251    0.306    0.005    0.627    0.767    0.012
      N(0, .1^2)       0.246    0.298    0.005    0.610    0.757    0.012
      U(-1, 1)         0.271    0.320    0.005    0.658    0.792    0.012
      U(-.5, .5)       0.251    0.304    0.005    0.620    0.761    0.012
      U(-.25, .25)     0.244    0.295    0.005    0.605    0.749    0.012
500   N(0, .5^2)       0.391    0.475    0.007    1.006    1.255    0.018
      N(0, .25^2)      0.368    0.448    0.007    0.947    1.174    0.018
      N(0, .1^2)       0.366    0.444    0.008    0.964    1.167    0.018
      U(-1, 1)         0.399    0.469    0.007    0.990    1.214    0.018
      U(-.5, .5)       0.389    0.464    0.007    1.037    1.259    0.019
      U(-.25, .25)     0.363    0.434    0.007    0.940    1.137    0.018
200   N(0, .5^2)       0.662    0.824    0.013    1.824    2.304    0.031
      N(0, .25^2)      0.680    0.880    0.013    1.972    2.671    0.032
      N(0, .1^2)       0.691    0.862    0.013    2.044    2.598    0.033
      U(-1, 1)         0.677    0.818    0.013    1.850    2.281    0.030
      U(-.5, .5)       0.723    0.935    0.014    2.113    2.852    0.034
      U(-.25, .25)     0.696    0.875    0.014    2.107    2.727    0.034
OpenBUGS
Small differences
n     Time distr.    Σ_{1,1}  Σ_{2,2}  Σ_{3,3}      σ^2      π_1
1000  N(0, .5^2)       2.264    3.270    0.001    0.270    0.011
      N(0, .25^2)      2.118    3.108    0.001    0.276    0.011
      N(0, .1^2)       2.048    2.983    0.001    0.280    0.011
      U(-1, 1)         2.363    3.382    0.001    0.267    0.011
      U(-.5, .5)       2.117    3.098    0.001    0.274    0.011
      U(-.25, .25)     2.024    2.953    0.001    0.278    0.011
500   N(0, .5^2)       3.334    4.846    0.001    0.383    0.016
      N(0, .25^2)      3.104    4.562    0.001    0.393    0.016
      N(0, .1^2)       3.011    4.486    0.001    0.392    0.016
      U(-1, 1)         3.486    4.954    0.001    0.381    0.016
      U(-.5, .5)       3.217    4.691    0.001    0.392    0.016
      U(-.25, .25)     2.981    4.333    0.001    0.398    0.016
200   N(0, .5^2)       5.085    7.429    0.002    0.630    0.025
      N(0, .25^2)      5.106    7.325    0.002    0.645    0.025
      N(0, .1^2)       4.879    7.190    0.002    0.647    0.025
      U(-1, 1)         5.182    7.648    0.002    0.622    0.024
      U(-.5, .5)       5.236    7.835    0.002    0.641    0.026
      U(-.25, .25)     5.086    7.359    0.002    0.641    0.026
Table 6: Part 2A – Acknowledging time heterogeneity – Population values

Parameter            Class 1 Means    Class 2 Means
Fixed Effects
  β_1                -1.2             -0.2
  β_2                2.5              1.6
  β_3                0.70             1
Random Effects
  Σ_{1,1}            0.036            0.036
  Σ_{1,2}            0                0
  Σ_{2,2}            0.09             0.09
  Σ_{3,1}            0                0
  Σ_{3,2}            0                0
  Σ_{3,3}            .0001            .0001
Residual Variance
  σ^2                0.04             0.04
Mixture Proportions
  π_c                0.80             0.20
Table 7: Part 2A – Acknowledging time heterogeneity – Percent bias for each parameter with
standard errors. Standard errors and mixture proportions given in their original metric.

Mplus          Same-time                       Varying-time
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                 26.695           0.157          -0.108           0.010
β_12                 14.852           0.158          -0.033           0.014
β_13                  6.741           0.026          -0.038           0.006
β_21                201.220           0.207           2.093           0.022
β_22                 32.065           0.183           0.780           0.030
β_23                 -3.296           0.108          -0.841           0.023
Σ_{1,1}            1380.819           0.047          -7.772           0.003
Σ_{2,2}             547.317           0.045           1.538           0.007
Σ_{3,3}            4555.000           0.003         167.677           0.001
σ^2                  -0.125           0.001          -0.177           0.001
π_1                   0.902                           0.801
π_2                   0.098                           0.199
OpenBUGS       Same-time                       Varying-time
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                -20.302           0.039          -0.397           0.010
β_12                -10.257           0.039           0.032           0.014
β_13                  7.292           0.011           1.320           0.007
β_21                727.084           0.132          -1.569           0.022
β_22                 96.271           0.130          -0.211           0.030
β_23                -23.208           0.029           1.036           0.026
Σ_{1,1}            1155.432           0.034           2.202           0.003
Σ_{2,2}             392.354           0.037           6.748           0.007
Σ_{3,3}           16494.319           0.002       11829.819           0.001
σ^2                  -3.349           0.001          -5.944           0.001
π_1                   0.808           0.012           0.801           0.009
π_2                   0.192                           0.199
Table 8: Part 2B – Time-varying residuals – Percent bias for each parameter with standard errors.
Standard errors and mixture proportions given in their original metric.

Mplus          Invariant-residuals             Varying-residuals
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                 -0.140           0.011          -0.186           0.010
β_12                 -0.072           0.014          -0.071           0.014
β_13                  0.117           0.005           0.015           0.005
β_21                 -0.960           0.023          -7.453           0.023
β_22                 -0.063           0.030          -0.478           0.030
β_23                  0.887           0.024          -0.870           0.022
Σ_{1,1}              59.722           0.004           1.389           0.005
Σ_{2,2}              24.833           0.007           3.722           0.008
Σ_{3,3}            5675.000           0.001         600.000           0.001
σ_1^2               -57.690           0.001          -6.630           0.006
σ_2^2               -42.887           0.001          -0.880           0.002
σ_3^2               -14.798           0.001           0.084           0.001
σ_4^2                27.106           0.001           1.240           0.001
σ_5^2               131.603           0.001          -2.181           0.001
π_1                   0.802                           0.802
π_2                   0.198                           0.198
OpenBUGS       Invariant-residuals             Varying-residuals
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                 -0.233           0.011          -0.357           0.011
β_12                  0.071           0.014          -0.036           0.014
β_13                  1.258           0.006           1.378           0.006
β_21                 -3.644           0.024          -3.783           0.023
β_22                 -0.369           0.030          -0.455           0.029
β_23                  1.144           0.022           0.733           0.023
Σ_{1,1}              67.972           0.004          20.599           0.005
Σ_{2,2}              28.269           0.007          11.587           0.008
Σ_{3,3}           12053.317           0.001       10212.607           0.001
σ_1^2               -59.390           0.001         -21.545           0.005
σ_2^2               -45.183           0.001         -16.720           0.002
σ_3^2               -18.222           0.001          -0.235           0.001
σ_4^2                21.999           0.001           6.888           0.001
σ_5^2               122.296           0.001         -20.227           0.001
π_1                   0.802           0.009           0.802           0.009
π_2                   0.198                           0.198
Table 9: Part 3A – MCAR – Percent bias for each parameter with standard errors. Standard
errors and mixture proportions given in their original metric.

               Mplus                           OpenBUGS
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                  0.030           0.011          -0.573           0.011
β_12                 -1.936           0.018           0.081           0.018
β_13                 -1.098           0.007           1.616           0.009
β_21                  4.409           0.025          -3.282           0.025
β_22                  0.368           0.036          -0.341           0.038
β_23                 -0.336           0.030           1.283           0.033
Σ_{1,1}              -5.345           0.004           5.411           0.004
Σ_{2,2}              -0.135           0.010           6.460           0.010
Σ_{3,3}             137.374           0.002       15280.048           0.002
σ^2                  -0.051           0.002          -7.247           0.001
π_1                   0.803                           0.803           0.009
π_2                   0.197                           0.197
Table 10: Part 3B – Attrition – Percent bias for each parameter with standard errors. Standard
errors and mixture proportions given in their original metric.

               Mplus                           OpenBUGS
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                 -0.109           0.010          -0.496           0.010
β_12                 -0.028           0.017           0.107           0.018
β_13                  0.012           0.007           1.439           0.009
β_21                 -5.159           0.023          -2.630           0.023
β_22                  1.009           0.034          -0.322           0.036
β_23                 -0.802           0.028           1.373           0.031
Σ_{1,1}              -7.251           0.004           4.256           0.004
Σ_{2,2}              -0.467           0.010           7.293           0.009
Σ_{3,3}              84.615           0.002       14703.642           0.002
σ^2                  -0.064           0.001          -7.507           0.001
π_1                   0.801                           0.802           0.009
π_2                   0.199                           0.198
Table 11: Part 3C – Non-random selection – Percent bias for each parameter with standard
errors. Standard errors and mixture proportions given in their original metric.

               Mplus                           OpenBUGS
Parameter      Percent bias  Standard error    Percent bias  Standard error
β_11                 -0.724           0.010          -0.570           0.010
β_12                  0.333           0.014           0.467           0.014
β_13                  0.372           0.006           1.714           0.007
β_21                -13.596           0.036          -0.881           0.037
β_22                  1.818           0.056           3.540           0.055
β_23                  1.104           0.045           0.329           0.047
Σ_{1,1}              -8.712           0.004           1.941           0.004
Σ_{2,2}              -0.887           0.008           5.285           0.008
Σ_{3,3}            -145.455           0.001       11877.773           0.001
σ^2                   1.124           0.001          -5.799           0.001
π_1                   0.893                           0.841           0.009
π_2                   0.107                           0.159
Table 12: Illustrative example – Parameter estimates for ECLS-K data

               Mplus                           WinBUGS
Parameter              Mean  Standard error            Mean  Standard error
β_11                 -1.052           0.028          -0.982           0.025
β_12                  2.494           0.027           2.416           0.027
β_13                  0.357           0.271           0.673           0.012
β_21                 -1.635           0.076          -2.373           0.127
β_22                  2.714           0.098           3.833           0.111
β_23                  0.672           0.009           0.770           0.045
Σ_{1,1}               0.437           0.027           0.361           0.024
Σ_{1,2}              -0.376           0.026          -0.277           0.022
Σ_{1,3}               0.040           0.008           0.050           0.007
Σ_{2,2}               0.413           0.030           0.298           0.026
Σ_{2,3}              -0.052           0.008          -0.057           0.008
Σ_{3,3}               0.019           0.004           0.033           0.003
σ^2                   0.040           0.002           0.039           0.002
π_1                   0.920                           0.899           0.009
π_2                   0.080                           0.101
Figures
Figure 1: Growth trajectories for a subsample of 100 individuals from the ECLS-K study
Figure 2: Expected trajectories based on estimated model parameters
Abstract
Change over time often takes on a nonlinear form which can introduce complexities in model estimation. Furthermore, these change patterns can sometimes be characterized by heterogeneity due to underlying unobserved groups in the population. Nonlinear mixed effects mixture models provide one way of addressing both of these issues simultaneously. The purpose of this study is to extend this class of models to accommodate individually varying measurement occasions. We develop methods to fit these models in both the structural equation modeling framework and the Bayesian framework and evaluate the performance of these methods in an attempt to provide researchers with some practical recommendations regarding their use. Simulation results show that the main force driving the success of these methods is the separation between latent classes. When these classes are well separated, even a sample of 200 individuals appears to be sufficient. Otherwise, a sample of 1000 or more may be required before parameters can be accurately recovered. Ignoring heterogeneity in time of measurement also led to substantial bias, particularly in the random effects parameters. Finally, we demonstrate the application of these techniques to an empirical dataset involving the development of reading ability in children.
Asset Metadata
Creator: Serang, Sarfaraz (author)
Core Title: Estimation of nonlinear mixed effects mixture models with individually varying measurement occasions
School: College of Letters, Arts and Sciences
Degree: Master of Arts
Degree Program: Psychology
Publication Date: 04/21/2015
Defense Date: 03/25/2015
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: Bayesian, growth mixture model, longitudinal, mixed effects model, nonlinear, OAI-PMH Harvest, structural equation modeling
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: McArdle, John J. (committee chair); John, Richard S. (committee member); Wilcox, Rand R. (committee member)
Creator Email: serang@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-555342
Unique identifier: UC11299817
Identifier: etd-SerangSarf-3353.pdf (filename), usctheses-c3-555342 (legacy record id)
Legacy Identifier: etd-SerangSarf-3353.pdf
Dmrecord: 555342
Document Type: Thesis
Rights: Serang, Sarfaraz
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA