On the Latent Change Score Model in Small Samples
Sarfaraz Serang
A dissertation presented to
The Faculty of the USC Graduate School
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Psychology
University of Southern California
Los Angeles, California
August 2018
Table of Contents
Abstract 3
Introduction 4
Purpose 22
Study 1: Estimation of LCSMs in Small Samples 23
Methods 23
Results 26
Discussion 27
Study 2: Model Fit and Model Specification Effects 29
Methods 29
Results 30
Discussion 32
Study 3: Model Comparison via the Likelihood Ratio Test 35
Methods 35
Results 36
Discussion 38
Study 4: An Alternative to Small Sample Corrections 41
The Monte Carlo Test 42
Methods 46
Results 47
Discussion 49
Study 5: Cognitive Changes in Adults 51
Methods 51
Results 52
Discussion 54
Conclusions 55
References 63
Tables 72
Figures 92
Abstract
The latent change score model (LCSM) is typically fit to large samples, as is common for structural equation models. However, the purpose of this dissertation is to better understand its performance in small samples (N ≤ 100). Through a series of Monte Carlo simulation studies, I show that the LCSM does perform well even in samples as small as N = 20. Conventional structural equation modeling software can fit LCSMs and produce parameter estimates with minimal bias. Despite this, neither conventional model fit statistics nor the small sample corrections typically used to adjust them work well, whether for evaluating model fit or for comparing models via the likelihood ratio test. As an alternative, I propose an extension of the Monte Carlo test to address this and demonstrate its effectiveness in both of these contexts. Finally, I apply the knowledge gained from the simulation studies to examine empirical data from the CogUSA study, a study of the effects of aging on cognition in adults.
Introduction
Structural equation modeling (SEM) is a flexible modeling framework that encompasses
a broad set of statistical models. One common misconception about SEM is that it should only be used in large samples. Although it is unclear where this notion originated, it has been propagated to students of SEM by introductory texts. For example, Loehlin recommends that "structural equation modeling should not be considered a small-sample technique" (Loehlin, 2004, p. 59). Kline concurs, claiming "it is generally true that SEM is a large-sample technique" (Kline, 2011, p. 11).
General rules of thumb have been proposed regarding minimum sample sizes required to
use SEM. Some have recommended that sample sizes need only be greater than N=100 (Gorsuch, 1983), whereas others suggest samples of size N=500 or more (Comrey & Lee, 1992). Others tie
their sample size recommendations to features of the model being estimated. For example,
Nunnally (1967) advises 10 observations per variable¹, while Bentler and Chou (1987) suggest 5
to 10 observations per estimated parameter. However, such rules of thumb have severe
limitations, as the minimum sample size required can be influenced by a variety of factors
including number of variables, model specification, parameter values, and amount of missing
data (MacCallum, Widaman, Zhang, & Hong, 1999; Wolf, Harrington, Clark, & Miller, 2013).
Thus, researchers with smaller samples (e.g. N ≤100) have turned to alternative modeling
frameworks. One popular choice is the mixed-effects modeling framework (also known as
multilevel modeling or hierarchical linear modeling), due in part to literature demonstrating this
framework’s capabilities in handling smaller samples (e.g. Maas & Hox, 2005; Bell, Morgan,
¹ Nunnally's recommendation was not specifically for SEM per se, but for regression. Given, however, that regression is a special case of SEM (as described below), the suggestion can easily be extended to the SEM context.
Schoenberger, Kromrey, & Ferron, 2014). Longitudinal models can be specified as mixed-effects
models, and as such enjoy these properties as well. In fact, these models can be fit to samples of
as few as N=12 individuals (Brammer, 2003; Muth et al., 2016). In contrast, the longitudinal
SEM literature has little research demonstrating its effectiveness in small samples (Bollen &
Curran, 2006; McNeish & Harring, 2017).
As mentioned above, SEM is a very general modeling framework, subsuming many
statistical techniques as special cases. Although the notion that SEM is a large sample technique
may hold true for certain latent variable models such as factor analysis models, it need not be the
case for simpler models. For example, even a simple linear regression model can be fit as a
structural equation model, yielding identical parameter estimates as traditional methods such as
regression with least squares. If a model of interest does not require a large sample when fit in an
alternative framework, and this model can be equivalently specified as a structural equation
model, it stands to reason that the SEM framework would not require a large sample either.
Indeed, the simple linear regression model is not the only model that can be shown to be
a special case of SEM; equivalencies across frameworks have been demonstrated for longitudinal
models as well. Meredith and Tisak (1990) demonstrated that repeated measures ANOVA was a
special case of their more general latent curve analysis. Soon after, Willett and Sayer (1994)
showed that longitudinal mixed-effects models could easily be specified as structural equation
models in LISREL notation, clarifying the links between the two frameworks. Curran (2003)
expanded on this work to demonstrate that a broad range of mixed-effects models could be
estimated as structural equation models. As such, it is clear that models for how individuals
change over time can be fit in the SEM framework, even for small samples.
One of the most popular longitudinal models fit in the SEM framework is the latent
growth curve model (LGM; Meredith & Tisak, 1990; McArdle & Epstein, 1987). The LGM
proposes that the trajectory of an individual’s scores over time is the result of some unobserved
underlying process with some added random error. While the shape of this trajectory is assumed
to be the same for all individuals, each person is permitted to have their own parameter values,
which are assumed to take on a prespecified distributional form.
For the case in which all individuals are measured at the same time points (though this can easily be extended to allow for individual measurement schedules without loss of generality), the LGM can be written in the SEM framework as
$$\mathbf{Y}_i = \boldsymbol{\Lambda}\boldsymbol{\eta}_i + \boldsymbol{\varepsilon}_i \quad (1)$$
where $\mathbf{Y}_i$ is a $t \times 1$ vector of observed scores, $\boldsymbol{\Lambda}$ is a $t \times f$ matrix of factor loadings, $\boldsymbol{\eta}_i$ is an $f \times 1$ vector of latent factor scores, and $\boldsymbol{\varepsilon}_i$ is a $t \times 1$ vector of residuals. The factor scores can be decomposed into fixed effects, $\boldsymbol{\alpha}$, and random effects, $\boldsymbol{\zeta}_i$, such that
$$\boldsymbol{\eta}_i = \boldsymbol{\alpha} + \boldsymbol{\zeta}_i \quad (2)$$
The random effects and residuals each follow a multivariate normal distribution such that $\boldsymbol{\zeta}_i \sim N(\mathbf{0}, \boldsymbol{\Psi})$ and $\boldsymbol{\varepsilon}_i \sim N(\mathbf{0}, \boldsymbol{\Theta})$, where $\boldsymbol{\Psi}$ is the $f \times f$ covariance matrix for the random effects and $\boldsymbol{\Theta}$ is the $t \times t$ covariance matrix for the residuals. Inserting Equation (2) into Equation (1) yields
$$\mathbf{Y}_i = \boldsymbol{\Lambda}\boldsymbol{\alpha} + \boldsymbol{\Lambda}\boldsymbol{\zeta}_i + \boldsymbol{\varepsilon}_i \quad (3)$$
with expected mean vector ($\boldsymbol{\mu}$) and covariance matrix ($\boldsymbol{\Sigma}$)
$$\boldsymbol{\mu} = \boldsymbol{\Lambda}\boldsymbol{\alpha}, \qquad \boldsymbol{\Sigma} = \boldsymbol{\Lambda}\boldsymbol{\Psi}\boldsymbol{\Lambda}' + \boldsymbol{\Theta} \quad (4)$$
A path diagram for an LGM with linear basis is provided in Figure 1.
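As a concrete illustration (not the dissertation's own code), a linear LGM of this form can be specified in lavaan roughly as follows; the data frame dat and the variable names y1-y4 are assumptions made for the example.

```r
library(lavaan)

# Minimal sketch of a linear LGM (Equations 1-4), assuming four equally
# spaced occasions stored in columns y1-y4 of a data frame `dat`.
lgm <- '
  # latent intercept and slope (the factor scores eta_i), loadings fixed
  i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
  s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
'
fit <- growth(lgm, data = dat)   # growth() frees the latent means (alpha)
summary(fit, fit.measures = TRUE)
```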
One major advantage to using the SEM framework to fit longitudinal models is its access
to global model fit criteria (Chou, Bentler, & Pentz, 1998; McNeish & Harring, 2017).
Unavailable in the mixed-effects framework, these model fit criteria allow for the evaluation of
how well a model fits the data. Consider the maximum likelihood fit function with mean
structure included
$$F_{ML} = \ln|\boldsymbol{\Sigma}| - \ln|\mathbf{S}| + \mathrm{tr}\left[(\mathbf{S} - \boldsymbol{\Sigma})\boldsymbol{\Sigma}^{-1}\right] + (\mathbf{M} - \boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{M} - \boldsymbol{\mu}) \quad (5)$$
where $\mathbf{M}$ is the mean vector of the observed variables, $\mathbf{S}$ is the covariance matrix of the observed variables, and $\boldsymbol{\mu}$ and $\boldsymbol{\Sigma}$ are the expected mean vector and covariance matrix from Equation (4). The log-likelihood function is maximized when $F_{ML}$ is minimized. The most basic global model fit criterion (and the one upon which the others discussed are based) is the model chi-square test statistic, defined as
$$T_{ML} = (N - 1)\,F_{ML} \quad (6)$$
where $N$ is the sample size. In large samples, $T_{ML}$ follows a central chi-square distribution with degrees of freedom equal to the degrees of freedom of the model, denoted $df_M$. As such, $T_{ML}$ is a measure of misfit, quantifying the extent to which the model fails to reproduce the observed data.
As sample size grows, so too does the power of $T_{ML}$ to detect even minor discrepancies between the model and the data. Thus, several approximate fit indices have been developed in order to better evaluate model fit. Among the most popular are the root mean square error of approximation (RMSEA; Steiger & Lind, 1980; Steiger, 2016), the Comparative Fit Index (CFI; Bentler, 1990), and the Tucker-Lewis Index (TLI; Tucker & Lewis, 1973). The RMSEA is defined as
$$\mathrm{RMSEA} = \sqrt{\max\left(\frac{T_{ML} - df_M}{df_M\,(N - 1)},\; 0\right)} \quad (7)$$
The CFI is defined as
$$\mathrm{CFI} = 1 - \frac{\max(T_{ML} - df_M,\; 0)}{\max(T_{ML} - df_M,\; T_{Base} - df_{Base})} \quad (8)$$
where $T_{Base}$ and $df_{Base}$ are the test statistic and degrees of freedom corresponding to the nested baseline model, typically the independence model. Finally, the TLI is defined as
$$\mathrm{TLI} = \frac{\dfrac{T_{Base}}{df_{Base}} - \dfrac{T_{ML}}{df_M}}{\dfrac{T_{Base}}{df_{Base}} - 1} \quad (9)$$
In their investigation of cutoff criteria for these fit indices, Hu and Bentler (1999) recommend
that values less than 0.06 for the RMSEA and values greater than 0.95 for both the CFI and the
TLI indicate relatively good model fit.
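For reference, the three indices in Equations (7) through (9) reduce to a few lines of R; the function below is a minimal sketch, and the example inputs (a model statistic of 12.3 on 8 df, a baseline statistic of 250 on 13 df, N = 50) are assumed values for illustration only.

```r
# Approximate fit indices (Equations 7-9) from the model test statistic and
# df, the baseline test statistic and df, and the sample size N.
fit_indices <- function(T_ml, df_m, T_base, df_base, N) {
  rmsea <- sqrt(max((T_ml - df_m) / (df_m * (N - 1)), 0))
  cfi   <- 1 - max(T_ml - df_m, 0) / max(T_ml - df_m, T_base - df_base)
  tli   <- (T_base / df_base - T_ml / df_m) / (T_base / df_base - 1)
  c(RMSEA = rmsea, CFI = cfi, TLI = tli)
}

fit_indices(T_ml = 12.3, df_m = 8, T_base = 250, df_base = 13, N = 50)
```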
Although the sampling distribution of $T_{ML}$ follows a chi-square distribution in large samples (e.g. N ≥ 200), it does not follow this distribution in small samples (N < 200; Curran, Bollen, Paxton, Kirby, & Chen, 2002; Hu & Bentler, 1999). This trickles down to the approximate fit indices, given that they are based on $T_{ML}$ (Herzog & Boomsma, 2009). For example, the sample RMSEA overestimates its population counterpart in samples of size less than N=200, leading to positive bias (Curran, Bollen, Chen, Paxton, & Kirby, 2003). Furthermore, these shortcomings are exhibited under ideal conditions in which models are correctly specified and data follow a multivariate normal distribution. Needless to say, performance does not improve when these assumptions are not met (Fouladi, 2000; Nevitt & Hancock, 2004).
To address these concerns, a correction can be applied which transforms $T_{ML}$ so that it follows a chi-square distribution even in small samples. Unfortunately, an exact mathematical transformation to achieve this does not exist (Fujikoshi, 2000; Yuan, Tian, & Yanagihara, 2015). As a result, all such corrections are heuristic in nature (Herzog, Boomsma, & Reinecke, 2007; McNeish & Harring, 2017). Several small sample corrections (SSCs) for $T_{ML}$ have been proposed. The most popular of these, at least in the methodological literature, are those by Bartlett (1950), Yuan (2005), and Swain (1975), due to their favorable performance. In small samples, the test statistics that make up the sampling distribution of $T_{ML}$ tend to be larger than those of the reference chi-square distribution they are intended to follow. In these situations, $T_{ML}$ must be scaled down in order for its sampling distribution to better approximate its corresponding chi-square distribution. Each of the three aforementioned SSCs aims to do this, albeit in different ways.
Bartlett (1950) suggested a SSC defined as
$$T_{BA} = \left(1 - \frac{2t + 4f + 5}{6(N - 1)}\right) T_{ML} \quad (10)$$
where $t$ is the number of observed variables and $f$ is the number of latent variables. Originally developed for exploratory factor analysis, $T_{BA}$ adjusts $T_{ML}$ by a function of the number of variables, both observed and latent, as well as the sample size. One attractive feature is that it can easily be calculated post hoc. Another is that its sampling distribution is asymptotically identical to that of $T_{ML}$. For small samples, however, $T_{BA}$ more closely approximates a chi-square distribution than $T_{ML}$ (Herzog & Boomsma, 2009; Fouladi, 2000; Nevitt & Hancock, 2004; Herzog et al., 2007).
One drawback of Bartlett's correction is that it was originally intended for exploratory factor analysis, and as such is not ideal for more general covariance structure models such as those employed in SEM. This is in part due to the extent of the penalty placed on latent variables, which can be too high in confirmatory settings. To address this, Yuan (2005) proposed a slight modification of $T_{BA}$, defined as
$$T_{YU} = \left(1 - \frac{2t + 2f + 7}{6(N - 1)}\right) T_{ML} \quad (11)$$
Yuan (2005) noted that the exploratory factor model and the confirmatory factor model are identical for $f = 1$, and as such his correction is identical to Bartlett's in this case. However, when $f > 1$, exploratory factor models estimate far more parameters than their confirmatory counterparts. Because the number of parameters estimated in the model does not enter into the correction directly, $T_{YU}$ attempts to reduce the penalty indirectly imposed by $T_{BA}$ through the coefficient of $f$. In $T_{BA}$, each additional latent variable decreases the correction factor by $\partial T / \partial f = -2 / [3(N-1)]$, whereas in $T_{YU}$, the correction is only reduced by $\partial T / \partial f = -1 / [3(N-1)]$ (Herzog & Boomsma, 2009). As such, $T_{YU}$ does not penalize the number of latent factors as heavily as $T_{BA}$, since adding a latent factor to a confirmatory factor model introduces fewer additional parameters than adding one to an exploratory factor model. Because of this, for the confirmatory models typically seen in SEM, generally $T_{BA} < T_{YU}$ for $f > 1$.
One can also incorporate the number of parameters estimated in the model more directly. For example, Swain (1975) derived four SSCs, the most promising of which is defined as
$$T_{SW} = \left(1 - \frac{t(2t^2 + 3t - 1) - q(2q^2 + 3q - 1)}{12\,df\,(N - 1)}\right) T_{ML} \quad (12)$$
where $df$ is the degrees of freedom and
$$q = \frac{\sqrt{1 + 4t(t + 1) - 8\,df} - 1}{2} \quad (13)$$
The number of parameters estimated in the model, $p$, enters the correction through the degrees of freedom since, for models with both mean and covariance structures, $df$ is calculated as
$$df = \frac{t(t + 1)}{2} + t - p \quad (14)$$
As such, $T_{SW}$ can easily be written as a function of $p$ instead of $df$.
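Because Equations (10) through (13) are simple rescalings of $T_{ML}$, they are straightforward to compute post hoc. The function below is a minimal sketch of the three analytic SSCs; the inputs in the example (a linear LGM with t = 6 occasions, f = 2, p = 6, so df = 21, fit to N = 30 with an uncorrected statistic of 30) are assumed values for illustration.

```r
# Bartlett, Yuan, and Swain corrections (Equations 10-13) applied to an
# uncorrected statistic T_ml, given t observed variables, f latent variables,
# the model df, and the sample size N.
ssc <- function(T_ml, t, f, df, N) {
  T_ba <- (1 - (2 * t + 4 * f + 5) / (6 * (N - 1))) * T_ml        # Bartlett
  T_yu <- (1 - (2 * t + 2 * f + 7) / (6 * (N - 1))) * T_ml        # Yuan
  q    <- (sqrt(1 + 4 * t * (t + 1) - 8 * df) - 1) / 2            # Swain's q
  T_sw <- (1 - (t * (2 * t^2 + 3 * t - 1) -
                q * (2 * q^2 + 3 * q - 1)) / (12 * df * (N - 1))) * T_ml
  c(Bartlett = T_ba, Yuan = T_yu, Swain = T_sw)
}

# Linear LGM with t = 6: f = 2, p = 6 parameters, so df = 21 + 6 - 6 = 21
ssc(T_ml = 30, t = 6, f = 2, df = 21, N = 30)
```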
Another popular correction for $T_{ML}$ is the Satorra-Bentler correction (Satorra & Bentler, 1994). Although this correction was originally developed to address nonnormality in the data, it has been suggested that it can be used to better approximate the chi-square distribution in small samples (Satorra & Bentler, 2001). Though I am not concerned with nonnormality here, I include the Satorra-Bentler correction solely for its potential benefits in small samples, following the practice of Herzog et al. (2007). Consequently, I will consider it a SSC for my purposes and refer to it as such. Following the notation of Herzog et al. (2007), the Satorra-Bentler correction takes the form
$$T_{SB} = \left(\frac{d'}{\mathrm{tr}(\mathbf{A})}\right) T_{ML} \quad (15)$$
where $d'$ is a sample-specific adjustment of the number of degrees of freedom, defined as
$$d' = \mathrm{int}\left\{\frac{[\mathrm{tr}(\mathbf{A})]^2}{\mathrm{tr}(\mathbf{A}^2)}\right\} \quad (16)$$
where the $\mathrm{int}\{\cdot\}$ function rounds its argument to the nearest integer. Here, the construction of the matrix $\mathbf{A}$ is rather complex, created as part of a scaling factor to account for potential kurtosis of the observed variables. I refer interested readers to Satorra and Bentler (1994, p. 406) and Muthén (2004, Equation 110) for details of its computation. $T_{SB}$ is also referred to in the literature as the adjusted Satorra-Bentler chi-square statistic, given that it adjusts for both the mean and variance of $T_{ML}$. For multivariate normal data, $T_{SB}$ follows a chi-square distribution asymptotically, with $d'$ degrees of freedom.
Two other methods studied in the literature to address the sampling distribution of $T_{ML}$ in small samples are Browne's asymptotic distribution free (ADF) chi-square (Browne, 1982, 1984) and the bootstrap (Bollen & Stine, 1992). The ADF statistic has the advantage that it does not require any specific distributional assumptions on the data, making it potentially attractive. However, simulation studies have shown that it requires large sample sizes before becoming a viable option. Although some have found reliable behavior of this statistic at samples of size N=500 (Curran, West, & Finch, 1996), others suggest samples of N=5,000 or more (Hu, Bentler, & Kano, 1992). Both are far larger than the small samples (N ≤ 100) considered here.
With regard to the bootstrap, Bollen and Stine (1992), following the work of Beran and Srivastava (1985), proposed a bootstrap method for generating a sampling distribution for $T_{ML}$. Bollen and Stine noted that naïve bootstrapping from the original sample data is inappropriate because the distribution of bootstrapped model test statistics follows a non-central chi-square distribution as opposed to a central chi-square distribution. As such, they suggested a transformation of the original sample data such that the resampling space satisfies the null hypothesis. They then proved that the expectation of $T_{ML}$ values drawn from the bootstrapped samples equals the model degrees of freedom, and showed empirically that the bootstrapped test statistics approximate a central chi-square distribution well. Unfortunately, in their thorough evaluation of this method, Nevitt and Hancock (2001) recommend that this procedure not be used with samples smaller than N=100, citing inflated standard errors of parameter estimates. Due to his finding that the bootstrap procedure yields type I error rates below the nominal level, Enders (2002) is even more cautious, suggesting a minimum sample size of N=200 when data are incomplete, a common occurrence in longitudinal data. As with the ADF statistic, both recommendations require samples too large for our purposes. As such, I do not consider either the ADF test statistic or the Bollen-Stine bootstrap from this point onward.
As alluded to earlier, the current work seeks to better understand the performances of $T_{ML}$, $T_{BA}$, $T_{YU}$, $T_{SW}$, and $T_{SB}$ in longitudinal models applied to small samples. These test statistics have been studied via simulation under a variety of conditions. I limit my brief review of this literature to conditions most applicable to the studies in this dissertation, namely those of small samples from multivariate normal data. The most comprehensive simulation study regarding the performance of these fit statistics is Fouladi's examination of their behavior under conditions of nonnormality (Fouladi, 2000). Under extremely mild distributional nonnormality (no skew, with kurtosis of -1 or 1), $T_{BA}$ and $T_{SW}$ were able to control type I error best among the SSCs studied. For more general nonnormal distributions, $T_{SB}$ showed the best overall control of type I error.
One limitation of Fouladi's (2000) study was that it did not posit the existence of latent variables in the simulated models. Nevitt and Hancock (2004) sought to address this, simulating data from a 5-factor latent variable path (LVP) model as well as a 7-factor confirmatory factor analysis (CFA) model. For normally distributed data, this study found that $T_{ML}$ was robust only for large samples of several hundred. $T_{BA}$ was better able to control type I error rates, showing consistent performance at samples between N=90 and N=180 for the LVP model, while requiring samples of size roughly between N=130 and N=350 to do so for the CFA model. $T_{SB}$ outperformed both $T_{ML}$ and $T_{BA}$, needing as few as N=35 to N=70 people to control the type I error rate for the LVP model, while requiring N=70 to N=130 for the CFA model.
Herzog and colleagues studied the performance of these test statistics in large CFA models containing between 4 and 16 latent factors with 3 indicators each (Herzog et al., 2007). Because of the size of these models, the authors focused on larger sample sizes, of which I will only consider the smallest, N=200. With regard to control over type I error rates, the performance of $T_{ML}$ was unacceptable regardless of how many latent factors were in the model. $T_{BA}$ provided good control for up to 6 factors, whereas $T_{SW}$ did so for up to 10. $T_{SB}$ controlled the type I error rate best, able to do so for models of up to 14 factors. This study also examined how well the sampling distribution of each of these test statistics approximated the reference chi-square distribution. Using the Kolmogorov-Smirnov test, the study found that $T_{SW}$ most closely matched its reference distribution, and as such recommended its use for normally distributed data.
Finally, Herzog and Boomsma (2009) studied the performance of approximate fit indices in small samples in addition to the test statistics upon which they are based. Using a 4-factor CFA model, this study examined samples ranging from N=50 to N=200. For a correctly specified model, the authors found that $T_{ML}$ was biased (with respect to its asymptotically predicted mean) across all sample sizes and that the RMSEA based on it performed poorly compared to the SSCs. They also found that $T_{BA}$, $T_{YU}$, and $T_{SW}$ exhibited acceptable levels of bias across all sample sizes, with the exception that $T_{SW}$ required a sample of N ≥ 75. Turning to the RMSEA for each of these SSCs, all performed similarly for samples greater than N=100. However, for samples below this size, $T_{BA}$ performed best, followed by $T_{YU}$ and finally $T_{SW}$. Based on these results, the authors made several recommendations. First, they concluded that $T_{ML}$ was too liberal for small sample sizes, so acceptable models were rejected too often. Second, they noted that although $T_{BA}$ and $T_{YU}$ more closely followed a non-central chi-square distribution under minor model misspecification, they had lower power. As such, they advocated for the use of $T_{SW}$ because it did not reject models too frequently when samples were small, but still possessed enough power to reject misspecified models. Using $T_{SW}$, Herzog and Boomsma (2009) also suggest that for incremental fit indices such as the CFI and the TLI, only the test statistic for the target model should be corrected, while that of the baseline model ($T_{Base}$) should remain uncorrected.
One common feature of most of these studies is that they examine the performance of test statistics in the context of CFA models. However, CFA models have different attributes than the longitudinal models of interest here. Longitudinal data are typically more expensive to collect, given that they require repeated observations on the same individuals over time. As such, longitudinal models usually involve fewer observed variables than CFA models. Incomplete data are also more of a concern for longitudinal studies, since subjects often drop out of the study as time passes. Both of these impact how SSCs correct $T_{ML}$: the former results in a reduction of $t$, while the latter implies that the $N$ in the formulae overestimates the amount of information available in the data.
To my knowledge, the only study examining SSCs in the context of longitudinal models is that of McNeish and Harring (2017). This study compares the performances of $T_{ML}$, $T_{BA}$, $T_{YU}$, and $T_{SW}$ in the context of the LGM. As mentioned above, the authors point out that most structural equation models are fit using full information maximum likelihood (FIML) estimation, which does not impute values for missing data. Instead, FIML allows each individual to contribute to the likelihood function based on only their observed scores. As such, a dataset containing N individuals with missing data does not contain N individuals' worth of information for the purposes of the SSCs. Since N overestimates the amount of information available, SSCs do not correct $T_{ML}$ as much as they should.
To address this, McNeish and Harring (2017) suggested a missing data correction (MDC) involving replacing N in $T_{BA}$, $T_{YU}$, and $T_{SW}$ with an effective sample size, $C^2 N$. Here, C is the proportion of observed elements in the data matrix, conceptualized by the authors as a measure of data quality. For example, consider a sample in which N=100 individuals were measured at 10 time points each. The resulting data matrix would contain 100 × 10 = 1,000 cells. If 90% of these cells were complete, then C = .9 and $C^2 N$ = 81. The substitution of $C^2 N$ for N into the equations for $T_{BA}$, $T_{YU}$, and $T_{SW}$ thus creates analogous corrections to be used in the presence of missing data. One useful property of this MDC is that, if the data are complete, $C^2 = 1$, so that the effective sample size will be N and the corrected value of the test statistic will be identical to its complete data counterpart. Though McNeish and Harring (2017) did not examine $T_{SB}$, I use this opportunity to note that Yuan and Bentler (2000) proposed an MDC for $T_{SB}$ that is widely implemented in popular software packages. I refer those interested in computational details to their article. Therefore, all four SSCs of interest also have counterparts that incorporate missing data (cf. Savalei, 2010).
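The MDC itself amounts to a one-line substitution. The sketch below illustrates it under the assumption that Y is an N-by-t data matrix (or data frame) with NA marking masked cells; the resulting effective sample size is passed in place of N when computing $T_{BA}$, $T_{YU}$, or $T_{SW}$.

```r
# C^2 * N missing data correction: C is the proportion of observed cells.
effective_n <- function(Y) {
  C <- mean(!is.na(Y))    # data quality: proportion of observed elements
  C^2 * nrow(Y)           # effective sample size
}

# With 90% complete cells and N = 100, effective_n(Y) is approximately 81.
```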
Returning to the study itself, McNeish and Harring (2017) studied the performance of the test statistics under different sample sizes (N=20, 30, 50, 100), numbers of time points (4, 8), and percentages of missing data (0%, 10%, 20%). Similar to the findings of previous studies, $T_{ML}$ performed extremely poorly. Even with complete data, at N=20 it exhibited a type I error rate near 35% with 8 time points. On the other hand, $T_{BA}$, $T_{YU}$, and $T_{SW}$ were all able to control type I error rates well when data were complete. When data were incomplete, the SSCs on their own were inadequate, requiring the MDC to be applied before being able to maintain control over type I error rates. The authors also found that the MDC was required in the calculation of approximate fit indices as well. Overall, they recommend $T_{YU}$ in small samples when data are complete, and $T_{BA}$ when data are incomplete.
Though McNeish and Harring (2017) discuss the application of SSCs in LGMs, to my knowledge, no attention has yet been given to other longitudinal structural equation models. For example, consider the latent change score model (LCSM; McArdle, 2001; McArdle & Hamagami, 2001; Hamagami & McArdle, 2001), another popular approach for modeling change in the SEM framework. The LCSM describes growth by explicitly modeling change scores from one time point to the next. The LCSM can be written as follows. According to classical test theory, the observed score for an individual $i$ at time $time$ can be written as a combination of a true score and an error, such that
$$Y_{i,time} = y_{i,time} + e_{i,time} \quad (17)$$
where the true scores are allowed to covary and the errors follow a multivariate normal distribution. Then, the change score between a given time point and the one immediately preceding it can be written as
$$\Delta y_{i,time} = y_{i,time} - y_{i,time-1} \quad (18)$$
It is upon these change scores that hypotheses are made. For example, in a no change model, Equation (18) reduces to
$$\Delta y_{i,time} = 0 \quad (19)$$
indicating that no change occurs over time. One can also hypothesize a proportional change model in which change is proportional to the previous true score. This model is written as
$$\Delta y_{i,time} = \beta \cdot y_{i,time-1} \quad (20)$$
where $\beta$ is an estimated parameter fixed to be the same across individuals. A constant change model can also be formulated in which individuals are proposed to change by the same amount from time point to time point. This model can be written as
$$\Delta y_{i,time} = s_i \quad (21)$$
where $s_i$ is an individual-specific constant change component with mean $\mu_s$ and variance $\sigma_s^2$. Finally, one can express a model incorporating both a constant change component and a proportional change component. This dual change model can be written as
$$\Delta y_{i,time} = s_i + \beta \cdot y_{i,time-1} \quad (22)$$
and represents a nonlinear growth trajectory. Combining this with Equation (17), the model for the original observed variables can be written as
$$Y_{i,time} = y_{i,1} + \sum_{j=2}^{time} s_i + \sum_{j=2}^{time} \beta \cdot y_{i,j-1} + e_{i,time} \quad (23)$$
Path diagrams for the (a) no change model, (b) proportional change model, (c) constant change
model, and (d) dual change model are provided in Figure 2.
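For concreteness, one common way of specifying the dual change model in lavaan is sketched below. This is a hedged illustration rather than the dissertation's own code; it assumes t = 4 occasions stored as y1-y4 in a data frame dat, and the parameter labels (beta, mu_s, and so on) are chosen only to mirror the notation above.

```r
library(lavaan)

dual_change <- '
  # latent true scores behind each observation (Equation 17)
  ly1 =~ 1*y1;  ly2 =~ 1*y2;  ly3 =~ 1*y3;  ly4 =~ 1*y4
  # autoregressions fixed at 1 so change scores are simple differences
  ly2 ~ 1*ly1;  ly3 ~ 1*ly2;  ly4 ~ 1*ly3
  # latent change scores (Equation 18)
  d2 =~ 1*ly2;  d3 =~ 1*ly3;  d4 =~ 1*ly4
  # proportional change (Equation 20), equal across occasions
  d2 ~ beta*ly1;  d3 ~ beta*ly2;  d4 ~ beta*ly3
  # constant change component s_i (Equation 21)
  s =~ 1*d2 + 1*d3 + 1*d4
  s ~ mu_s*1;  s ~~ sigma2_s*s
  # initial level: mean, variance, covariance with s
  ly1 ~ mu_i*1;  ly1 ~~ sigma2_i*ly1;  ly1 ~~ sigma_is*s
  # homoscedastic residuals; all remaining intercepts and variances fixed at 0
  y1 ~~ ve*y1;  y2 ~~ ve*y2;  y3 ~~ ve*y3;  y4 ~~ ve*y4
  y1 ~ 0*1;  y2 ~ 0*1;  y3 ~ 0*1;  y4 ~ 0*1
  ly2 ~ 0*1;  ly3 ~ 0*1;  ly4 ~ 0*1
  ly2 ~~ 0*ly2;  ly3 ~~ 0*ly3;  ly4 ~~ 0*ly4
  d2 ~ 0*1;  d3 ~ 0*1;  d4 ~ 0*1
  d2 ~~ 0*d2;  d3 ~~ 0*d3;  d4 ~~ 0*d4
'
fit <- lavaan(dual_change, data = dat, missing = "fiml")
summary(fit)
```

The no change, proportional change, and constant change models follow by fixing $\beta$ to 0, dropping the constant change factor s, or dropping the proportional paths, respectively.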
Because of the specification of the dual change model, it exhibits some useful properties.
First, it is clear that the proportional change and constant change models are both nested within it. Similarly, the no change model is nested within both the proportional change and
constant change models. This allows for model comparison between these models using the
likelihood ratio test. However, the proportional change and constant change models are not
nested within each other, and as such cannot be compared using the likelihood ratio test. Second,
the constant change model is actually equivalent to the LGM (Ghisletta & McArdle, 2012),
yielding the same model expectations. Therefore, the LGM as described by Equation (1) can also
be thought of as a special case of the LCSM.
Though the performance of other longitudinal models such as the LGM has been studied in small samples, to my knowledge the LCSM has received no such attention. The consensus in the literature is that the sampling distribution of $T_{ML}$ relies heavily on asymptotic behavior and is therefore resoundingly inadequate when used in small samples. However, it remains unclear which SSCs, if any, can remedy this. $T_{BA}$, $T_{YU}$, $T_{SW}$, and $T_{SB}$ have all been recommended under various circumstances, but whether these recommendations hold for the LCSM in small samples with missing data is questionable. Additionally, all have severe limitations. This stems in part from the fact that all were intended to be used for models with a significant measurement component. That is, all four SSCs were derived for models in which latent variables are conceptualized as unobserved constructs inferred from the relationships between observed variables. However, in longitudinal SEM, latent variables represent something entirely different: parameters associated with growth and dynamic processes.
This distinction manifests itself in the correction factors of the SSCs. For example, both $T_{BA}$ and $T_{YU}$ shrink $T_{ML}$ by an amount proportional to the number of latent variables, $f$. In CFA models (excluding models for items), each latent variable typically has, without loss of generality, around 4 indicators (give or take). Thus, as a function of the number of observed variables, $t$, the number of latent variables can be represented as $f = t/4$. For the LGM, however, $f$ is not a function of $t$, but rather of the functional form of the growth; for the linear growth model, for example, one would obtain $f = 2$ for all $t$. Finally, the LCSM contains the highest number of latent variables, given that the true scores are explicitly modeled in the matrix algebra. For the no change and proportional change models, $f = 2t$, and for the constant change and dual change models, $f = 2t + 1$. This is troubling for two reasons. First, $f$ varies drastically depending on the model, which in turn can proportionally distort the correction factor. Second, $T_{BA}$ and $T_{YU}$ are not invariant under model respecification. For example, if a model for linear growth were fit in both the LGM and LCSM frameworks (using the constant change model), the two would obtain identical values of $T_{ML}$, because the models are equivalent and yield the same model fit, but different values of $T_{BA}$ and $T_{YU}$, because each contains a different number of latent variables.
$T_{SW}$ avoids these problems by correcting based on the degrees of freedom, $df$, as opposed to simply the number of latent variables. Though this solves the second issue raised above, in doing so it creates an alternative problem, which concerns the expression for $q$ in Equation (13). Because this expression contains a radical, the quantity underneath the radical must be non-negative, imposing a constraint. It can easily be shown that this constraint amounts to $t \le p$: the number of variables in the model must be less than or equal to the number of parameters estimated. This is not an issue in CFA models, where $p$ tends to be much larger than $t$. However, this is not necessarily the case for longitudinal models. For example, a linear growth model fit in either the LGM framework or as a constant change model in the LCSM framework estimates 6 parameters: the 2 means of the intercept and slope, their 2 variances, their covariance, and a residual variance. This is troublesome in that the above constraint implies that $T_{SW}$ is not defined for this model if it is fit to data with 7 or more time points. Though this can be alleviated in a sense by allowing for a heteroscedastic (though still diagonal) residual structure, some argue that because the same variable is measured over time, a homoscedastic residual structure is more appropriate (see Grimm & Widaman, 2010, for a more thorough discussion). Furthermore, with regard to LCSMs, the constant change model estimates more parameters than most other models. Referring to the models whose diagrams are given in Figure 2, the no change model estimates 3 parameters, the proportional change model estimates 4, the constant change model estimates 6 (as pointed out above), and the dual change model estimates 7. This becomes problematic when trying to compare models, as the no change model could not be tested unless t ≤ 3, but it would be extremely difficult to fit a dual change model for t = 3 given the exponential shape of the growth trajectory. Lastly, $T_{SB}$ was not designed as a SSC at all; it was originally intended for nonnormal data. Given that the focus here is not on nonnormality, its strengths will not be fully utilized, potentially putting it at a disadvantage in comparison to the other SSCs.
The performance of approximate fit indices also requires further evaluation. Even if a
SSC is able to approximate its reference chi-square distribution well, its corresponding
approximate fit indices may not perform well, or vice versa. Herzog and Boomsma (2009) found
that for incremental fit indices, though the target model should be corrected, the baseline model
should not. Whether nuances such as this hold for our conditions of interest has yet to be
determined, given the differences between CFA models and longitudinal structural equation
models. For example, longitudinal models do not use the independence model as the baseline
model (Widaman & Thompson, 2003). Instead, a more appropriate baseline model is a model of
no growth, also called an intercept-only model in the LGM framework and the previously
discussed no change model in the LCSM framework.
The impact of incomplete data, and how it should be addressed in the context of SSCs, requires more attention as well. McNeish and Harring's (2017) MDC of $C^2 N$ is the first to incorporate a measure of data completeness into $T_{BA}$, $T_{YU}$, and $T_{SW}$. However, similar to the heuristic nature of the SSCs, the $C^2 N$ MDC is also based on heuristics as opposed to a mathematical derivation. Drawing inspiration from Rubin and Schenker (1986), McNeish and Harring (2017) selected this form for their MDC simply because it performed well with regard to control of the type I error rate, admitting that it is arbitrary and that other MDCs may perform better (D. McNeish, personal communication, September 6, 2016). I do not criticize their choice for lack of mathematical rigor; on the contrary, I applaud them for finding a correction that works. However, whether this MDC works when applied to the LCSM has yet to be determined.
Purpose
The purpose of this dissertation is to study the performance of the LCSM in small
samples. This is done using a series of five studies, each with a specific aim. Study 1 focuses on
understanding the extent to which the LCSM can be fit in small samples with missing data.
Study 2 examines differences in the performances of test statistics when models are fit using
different specifications (i.e. the LGM vs. the LCSM framework). Study 3 evaluates the
likelihood ratio test and its use in model selection for models fit in the LCSM framework. Study
4 proposes an extension of the Monte Carlo test as an alternative to SSCs and evaluates its
performance under the conditions of Studies 2 and 3. Study 5 applies the knowledge gained from
Studies 1 through 4 to an empirical dataset to better understand cognitive aging in adults. Finally,
the dissertation will conclude with a general discussion of the results, including
recommendations regarding how the methods studied should be used in practice as well as
directions for future research.
Study 1: Estimation of LCSMs in Small Samples
The goal of Study 1 is to determine if LCSMs can indeed be fit in small samples. While
mathematically possible in theory, it is not yet known whether it can be done in practice using
conventional SEM software. The estimation algorithms used in SEM software were designed for
large samples, with a high ratio of observed to latent variables. As described in the introduction,
this is not the case here, and smaller samples may not contain enough information for the
software to find the correct solution given the high number of latent variables relative to
observed variables. The presence of missing data further complicates the estimation. As such, the
purpose of Study 1 is to see whether the estimation algorithms for fitting LCSMs consistently
converge to a solution in this context, and whether this solution exhibits any bias in parameter
estimates.
Methods
The goals of Studies 1 through 4 were pursued via Monte Carlo simulation studies. All
studies were conducted in R (R Core Team, 2017) using the lavaan (Rosseel, 2012) and
OpenMx (Neale et al., 2016) packages. Simulation conditions for Studies 1 through 3 followed a
factorial design in which all combinations of each level of each factor were studied. Each cell
consisted of 1,000 replications, in which 1,000 simulated datasets were generated and to which
the corresponding models were fit.
Four factors were varied in an attempt to understand their influence: model, sample size,
number of time points, and proportion of missing data. All four LCSMs previously discussed
were included: the no change model, the proportional change model, the constant change model,
and the dual change model. Four different sample sizes were examined: N=20, 30, 50, and 100.
The number of time points was set to t=4, 6, and 8 corresponding to typical values
observed in longitudinal panel studies in which the LCSM is commonly utilized. It should be
noted that the span of the data in each of these conditions was similar; it was simply which time
points were used in the model fitting process that varied. That is, all datasets were generated to
have t=8 time points but only a subset was used, preventing the span of measurement from
inducing an unintentional confound. For conditions in which t=4, only measurement occasions 1,
3, 5, and 7 were used. Similarly, for t=6 only measurement occasions 1, 3, 5, 6, 7, and 8 were
used. For t=6, these measurement occasions were chosen because models with nonlinear
trajectories (the proportional and dual change models) have increasing curvature over time.
Concentrating measurement occasions on points at which the most change occurs has been
shown to improve accuracy and efficiency of parameter estimates (Timmons & Preacher, 2015).
Proportion of missing data was also varied, with levels of 0% (complete data), 10%, and
20%. Given that the focus of introducing incompleteness in the data was to study the impact of
the degradation of data quality on the estimation of the model, and that McNeish and Harring
(2017) found that the missingness mechanism had no impact on the results beyond just data
quality, missingness was based on an MCAR mechanism. Each data point had a certain
probability of being missing depending on the condition. Thus, the exact proportion of
missingness varied slightly in each simulated dataset, but on average was near its target level
(0%, 10%, or 20%). In each condition, complete data were generated and then masked (treated as
missing) depending on the proportion of missingness. However, in all datasets each individual in
the sample provided at least one time point’s worth of information, preserving the sample size
value N.
All combinations of each level of each factor resulted in 4 × 4 × 3 × 3 = 144 conditions. Data from each condition were generated independently of the other conditions using the simulateData function from the lavaan package. Population values for the parameters, corresponding to the notation given in Figure 2, were as follows: $\mu_i = \sigma_i^2 = 10$, $\mu_s = \sigma_s^2 = 1$, $\beta = 0.2$, $\sigma_{is} = 0.1$, and $\sigma_e^2 = 1$. Models were fit using either lavaan or OpenMx to optimize both speed and accuracy.
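To make the generation step concrete, the sketch below shows one plausible way a single cell could be produced (constant change model, t = 4, N = 20, 10% MCAR missingness). Because the constant change LCSM is equivalent to a linear LGM, the population model is written in growth form using the Study 1 population values; the syntax and masking code are illustrative assumptions, not the dissertation's own script.

```r
library(lavaan)

pop_model <- '
  i =~ 1*y1 + 1*y2 + 1*y3 + 1*y4
  s =~ 0*y1 + 1*y2 + 2*y3 + 3*y4
  i ~ 10*1;  i ~~ 10*i
  s ~ 1*1;   s ~~ 1*s
  i ~~ 0.1*s
  y1 ~~ 1*y1;  y2 ~~ 1*y2;  y3 ~~ 1*y3;  y4 ~~ 1*y4
  y1 ~ 0*1;   y2 ~ 0*1;    y3 ~ 0*1;    y4 ~ 0*1
'
set.seed(1)
dat <- simulateData(pop_model, sample.nobs = 20)

# MCAR masking: each cell has a 10% chance of being missing, but every person
# keeps at least one observed occasion so that N is preserved.
mask <- matrix(runif(nrow(dat) * ncol(dat)) < 0.10, nrow(dat), ncol(dat))
mask[rowSums(!mask) == 0, 1] <- FALSE
dat[mask] <- NA
```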
All four LCSMs were fit to the datasets in each of the 144 conditions. This was done to
examine if the software could fit misspecified models to the data. When comparing models,
some models in the set of candidate models must inevitably be misspecified. It is important to
determine if the software has the capacity to fit these misspecified models so that they can be
considered in the model selection process.
The outcomes of Study 1 were convergence and parameter bias. For convergence, I report the percentage of replications in each condition for which the model converged to a viable solution. Bias in parameter estimates was measured by percent bias, defined as
$$\text{percent bias} = 100 \times \frac{\text{average estimate} - \text{population value}}{\text{population value}} \quad (24)$$
Since SEM involves the estimation of many parameters, the percent bias statistic places them all on the same metric, facilitating comparison across parameters on potentially different scales. Following Hoogland and Boomsma (1998), estimates for which the absolute value of the percent bias was under 5% were considered acceptable.
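Equation (24) reduces to a single line per parameter; the function below is a minimal sketch, where estimates is assumed to be the vector of one parameter's estimates across the replications in a cell and pop is its population value.

```r
# Percent bias (Equation 24) of a parameter across replications.
percent_bias <- function(estimates, pop) {
  100 * (mean(estimates) - pop) / pop
}

percent_bias(rnorm(1000, mean = 0.95, sd = 0.2), pop = 1)   # roughly -5%
```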
Results
Convergence rates are presented in Table 1. Table 1 is partitioned into four sections
based on which model was used to generate the data, with (a) corresponding to the no change
model, (b) the proportional change model, (c) the constant change model, and (d) the dual
change model. Table 1 shows that all data generating models converged without issue. In addition, the misspecified models converged nearly all of the time. The only exception was when the dual change model was fit to the no change data, in which case approximately 30% of replications did not converge. This is not unexpected, however, as the dual change model is overparameterized, attempting to estimate four parameters that are degenerate in the data.
Percent bias of parameter estimates for the data generating models is given in Table 2, which is partitioned into four sections in the same way as Table 1. Percent bias for most parameters falls within the acceptable 5% range. Specifically, $\mu_i$ is within 1%, $\mu_s$ is within 3%, $\beta$ is within 1%, and $\sigma_e^2$ is within 4%. For $\sigma_i^2$ and $\sigma_s^2$, a handful of conditions with smaller sample sizes (N = 20, 30) exceeded the 5% threshold. Yet all were within 7.5%, so I do not find this result to be out of the ordinary. It should be noted, though, that all intercept and slope variances were underestimated (negative percent bias). The covariance parameter, $\sigma_{is}$, did have a higher percent bias, ranging from -32% to 57%. This is likely due to the way in which percent bias is calculated: since this parameter had a small population value (0.1), dividing by this value amplified the percent bias. However, no systematic pattern emerged with regard to the percent bias for this parameter, and, when averaged over all of the constant and dual change conditions, the average percent bias was under 5%.
Discussion
Based on these results, it is clear that conventional software (lavaan and OpenMx) has the capability to fit LCSMs in small samples with missing data. Convergence rates were excellent, with all data generating models converging. Misspecified models also converged nearly all of the time, with the lone case in which this did not occur (fitting a dual change model to no change data) being not unexpected. Parameters were estimated accurately, with very little bias, if any. The one exception here ($\sigma_{is}$) resulted primarily from how the percent bias statistic is calculated rather than from actual bias itself.
Regarding the model fitting process, I have two remarks. The first deals with Heywood cases encountered when fitting misspecified models. Heywood cases occur when estimation results in out-of-bounds parameter estimates, most often negative variances. I observed Heywood cases for $\sigma_i^2$ when fitting the no change model to proportional and dual change data, as well as when fitting the constant change model to dual change data. Heywood cases also appeared for $\sigma_s^2$ when fitting the constant change model to no change data and the dual change model to proportional change data. The methodological literature on Heywood cases shows that they can occur as a result of sampling fluctuations caused by small sample sizes, model misspecification, and missing data (Boomsma, 1985; Anderson & Gerbing, 1984; Chen, Bollen, Paxton, Curran, & Kirby, 2001), all of which were encountered here. Furthermore, these studies conclude that despite negative variances, it is still typically acceptable to use the resulting value of $T_{ML}$ (Gerbing & Anderson, 1987; Chen et al., 2001). Though I do not use $T_{ML}$ in Study 1, it is a critical component of Studies 2 and 3, and so I note that these values will be treated as valid in all subsequent studies.
The second remark involves optimization. When fitting LCSMs, I opted to use OpenMx because its fastest optimizer for these models, CSOLNP, was far faster and exhibited better convergence rates than its lavaan counterpart, nlminb. This is likely due to the small sample size and presence of missing data. However, CSOLNP had difficulty fitting the saturated model needed to calculate $T_{ML}$. As such, the log-likelihood for the saturated model was obtained by fitting a more easily estimated model in lavaan (e.g. the independence model) and then extracting the log-likelihood of the saturated model, which lavaan computes automatically. Since the saturated model only needed to be fit once per dataset and OpenMx does not automatically fit the saturated model, the strategy of using lavaan once per dataset to obtain the saturated log-likelihood and OpenMx to fit the LCSMs of interest allowed for the fastest and most efficient computation. Similar strategies of alternating between lavaan and OpenMx were used for the remaining studies, capitalizing on the strengths of each package.
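A hedged sketch of this mixed-package strategy is given below. It assumes fit_mx is an already-run OpenMx model for an LCSM and dat holds the raw data (y1-y4); the independence model is fit in lavaan only so that lavaan computes the saturated (unrestricted) log-likelihood as a by-product, and the resulting statistic is the likelihood ratio form of the chi-square rather than the (N - 1)F_ML form of Equation (6).

```r
library(lavaan)
library(OpenMx)

indep <- 'y1 ~~ y1; y2 ~~ y2; y3 ~~ y3; y4 ~~ y4'          # independence model
fit_indep <- sem(indep, data = dat, missing = "fiml")
logl_sat  <- fitMeasures(fit_indep, "unrestricted.logl")    # saturated logL

# Likelihood ratio chi-square for the LCSM fit in OpenMx:
# -2logL(model) minus -2logL(saturated)
T_ml <- fit_mx$output$Minus2LogLikelihood - (-2 * as.numeric(logl_sat))
```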
Study 2: Model Fit and Model Specification Effects
The goal of Study 2 is to examine the effects of model specification on model fit. Specifically, I focus on the LGM with linear basis and the constant change LCSM. As mentioned earlier, since these models are equivalent, $T_{ML}$ is identical in both. However, each framework uses a different number of latent variables, leading to differences in some SSCs such as $T_{BA}$ and $T_{YU}$, the SSCs found by McNeish and Harring (2017) to perform best when fitting these models in small samples. The effects of missing data and MDCs are also considered. Thus, Study 2 serves to evaluate model fit statistics under a broad range of conditions.
Methods
Study 2 reanalyzed the data generated under the constant change model from Study 1. The levels of the three factors that were varied (sample size, number of time points, and proportion of missing data) were identical to those of Study 1. As such, Study 2 had 4 × 3 × 3 = 36 conditions. Twelve different model fit statistics were calculated for each replication in each condition. $T_{ML}$, $T_{BA}$, $T_{YU}$, $T_{SW}$, and $T_{SB}$ were all calculated when possible ($T_{SW}$ was not defined for models in which t = 8). For $T_{BA}$ and $T_{YU}$, test statistics were calculated based on whether the model was fit using the LGM framework ($T^{LGM}$) or the LCSM framework ($T^{LCSM}$). In addition, for $T_{BA}$, $T_{YU}$, and $T_{SW}$, MDCs were also examined, denoted $T_{BA.M}$, $T_{YU.M}$, and $T_{SW.M}$. This led to 12 test statistics calculated in total: $T_{ML}$, $T_{BA}^{LGM}$, $T_{BA.M}^{LGM}$, $T_{BA}^{LCSM}$, $T_{BA.M}^{LCSM}$, $T_{YU}^{LGM}$, $T_{YU.M}^{LGM}$, $T_{YU}^{LCSM}$, $T_{YU.M}^{LCSM}$, $T_{SW}$, $T_{SW.M}$, and $T_{SB}$.
Each of these test statistics is intended to be compared to a chi-square reference distribution. As such, their performance was evaluated in two ways. First, following the procedure of Herzog et al. (2007), I used the Kolmogorov-Smirnov test to evaluate which test statistics best approximated their reference chi-square distributions. Second, I evaluated type I error rates to determine how well each test statistic was able to control them. I employed the criterion of Bradley (1978) in claiming that type I error rates between 2.5% and 7.5% were within range of the nominal 5% rate.
Finally, for each statistic, the median RMSEA, CFI, and TLI across all replications were calculated in each condition to determine the usefulness of these approximate model fit indices. For the CFI and TLI, the intercept-only or no change model served as the baseline model. Following the recommendation of Herzog and Boomsma (2009), SSCs were not applied to the baseline model. Following Hu and Bentler (1999), values less than or equal to 0.06 for the RMSEA and values greater than or equal to 0.95 for both the CFI and the TLI were considered acceptable.
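Both evaluation criteria are easy to express in R; the sketch below assumes T_stats is a vector holding one test statistic's values across the replications in a cell and df_m is the model degrees of freedom, and the final call uses simulated chi-square draws purely as a well-behaved example.

```r
# Kolmogorov-Smirnov fit to the reference chi-square plus empirical type I
# error rate at the nominal alpha level.
evaluate_statistic <- function(T_stats, df_m, alpha = 0.05) {
  ks    <- ks.test(T_stats, "pchisq", df = df_m)
  type1 <- mean(T_stats > qchisq(1 - alpha, df = df_m))
  c(ks_p = unname(ks$p.value), type1_error = type1)
}

evaluate_statistic(rchisq(1000, df = 5), df_m = 5)
```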
Results
The one-sample Kolmogorov-Smirnov test measures whether a sample matches a target reference distribution. As such, in each condition, the test statistics from each replication were calculated, forming a distribution which was compared to the appropriate chi-square reference distribution. Results from these tests are provided in Table 3, which contains the p values from these tests. A p value greater than .05 indicates that the test statistic reasonably approximates its target chi-square distribution, whereas a p value less than .05 suggests it does not.
Table 3 shows that $T_{YU.M}^{LGM}$ and $T_{BA}^{LGM}$ performed the best in this regard. While they did not perform well overall, $T_{SW}$ and $T_{SW.M}$ did well in conditions for which the others did not, such as conditions with complete data and N=20/t=4, N=30/t=4, and N=50/t=6. $T_{ML}$ performed very poorly, failing to approximate the reference distribution in nearly all conditions, while $T_{SB}$ was even worse, unable to do so in any condition. The LCSM corrections ($T^{LCSM}$) also performed extremely poorly, on par with $T_{ML}$ and $T_{SB}$. Surprisingly, $T_{BA}^{LGM}$ and $T_{YU}^{LGM}$ performed exceptionally well in the m=10% conditions despite performing poorly in the corresponding m=0% and m=20% conditions, though it is unclear why this occurred.
In order for the reader to better visualize these results, a plot of the distributions of each test statistic for the N=30/t=8/m=10% condition is given in Figure 3. $T_{BA}^{LGM}$ and $T_{YU.M}^{LGM}$ approximate the chi-square distribution very closely, with $T_{BA.M}^{LGM}$ and $T_{YU}^{LGM}$ serving as a close second. As expected, values of $T_{ML}$ are too large, hence the need for SSCs, but surprisingly values of $T_{SB}$ are even larger. $T^{LCSM}$, on the other hand, overcorrects, as these values are too small, having been shrunk too much. Applying MDCs to $T^{LCSM}$ only exacerbates this problem. Neither $T_{SW}$ nor $T_{SW.M}$ is included in this plot since both are undefined for t=8.
Results for type I error rates are given in Table 4. This table contains the percentage of replications in each condition that made a type I error. $T_{BA.M}^{LGM}$ and $T_{YU.M}^{LGM}$ had the best control over type I error rates. In fact, both were perfect for N ≥ 50. $T_{BA}^{LGM}$ and $T_{YU}^{LGM}$ also performed well, nearly matching their MDC counterparts. $T_{SW.M}$ and $T_{SW}$ did not perform well for conditions with N < 50, but did for conditions with N ≥ 50. $T_{ML}$ and $T_{SB}$ failed to control type I error rates in nearly all conditions, with $T_{SB}$ being the worst among all test statistics. The $T^{LCSM}$ statistics also performed very poorly, with $T_{YU}$ performing slightly better than $T_{BA}$.
Results for approximate fit indices are presented in Table 5, containing (a) median RMSEAs, (b) median CFIs, and (c) median TLIs. $T_{BA}^{LCSM}$, $T_{BA.M}^{LCSM}$, and $T_{YU.M}^{LCSM}$ performed best, with median RMSEAs of 0 in all conditions. $T_{BA.M}^{LGM}$ and $T_{YU.M}^{LGM}$ were not far behind, with RMSEAs above 0.06 only in m=20% conditions with N=20/t=6, N=20/t=8, and N=30/t=8. All approximate fit indices performed excellently for conditions with N ≥ 50, but some struggled for conditions with N < 50. In particular, as with the other outcomes, $T_{ML}$ performed especially poorly while $T_{SB}$ was worse. For the CFI and TLI, all $T^{LCSM}$ statistics performed perfectly in all conditions. $T_{BA.M}^{LGM}$ and $T_{YU.M}^{LGM}$ were a close second, only encountering difficulties in the N=20/t=8/m=20% condition. The others performed perfectly for samples of size N ≥ 30 (with the sole exception being $T_{ML}$ in the condition with N=30/t=8/m=20%), though some had trouble with some N=20 conditions, particularly those with more time points and more missing data.
Discussion
Comparing fit statistics, overall $T_{BA}$ and $T_{YU}$ performed best. This supports the findings of McNeish and Harring (2017), who came to a similar conclusion. Based on my results, $T_{YU}$ had a slight edge, though not by much. $T_{SB}$ clearly exhibited the worst performance. In most cases, its values were even larger than those of $T_{ML}$, increasing the distance between its distribution and that of the chi-square reference distribution. $T_{ML}$ also performed poorly, but corrections were able to improve on it a great deal. $T_{SW}$'s performance was in between that of $T_{BA}$/$T_{YU}$ and $T_{ML}$, but it was hampered by the fact that it was undefined for all t=8 conditions.
Comparing modeling frameworks, SSCs using latent variable counts from the LGM framework clearly outperformed those using counts from the LCSM framework. The LCSM framework has too many latent variables, and the SSCs which make use of these counts are not properly calibrated for this. This led the $T^{LCSM}$ statistics to overcorrect $T_{ML}$, shrinking it far too much. This manifested itself in the inability to approximate the chi-square reference distribution, as well as in low type I error rates. As a byproduct, the $T^{LCSM}$ statistics naturally performed best when it came to approximate fit indices. This is because these indices measure model misfit.
$T^{LCSM}$ statistics shrink the estimate of the misfit, underestimating it and leading one to conclude that the model fits the data well. Since the data generating model was fit to the data, this was the correct decision. However, if a misspecified model were fit to the data, it is likely that approximate fit indices using $T^{LCSM}$ corrections would also (though incorrectly) conclude that the model fits well.
This brings into question the usefulness of approximate model fit indices in this context. These indices were originally developed in part because $T_{ML}$ will eventually reject with large enough N, even if the model is correctly specified. They are used in large part to judge model fit on a metric independent of sample size (Marsh, Balla, & McDonald, 1988; Marsh, Hau, & Wen, 2004). However, in this particular setting, the sample size is known to be small and this asymptotic behavior should not apply. An alternative reason for their use, as a check on model complexity, also does not apply, since these models are relatively simple, especially compared to the factor analysis models for which the indices were originally intended. Hence, I suggest that it may not be necessary to even consider approximate fit indices for the models discussed in this dissertation, though I believe this proposition is certainly open for debate.
The effects of MDCs were mixed. Their goal was to aid the SSCs in approximating the
reference distribution by shrinking them even further. When values of the SSCs were larger than
the values of T_ML, as was often the case with T_BA and T_YU, this was helpful. However, when SSCs
overcorrected, as in the case of the T^LCSM statistics, MDCs shrunk the statistic even more,
pushing it further away from the reference distribution. As such, they were occasionally
beneficial when models were fit in the LGM framework but not when models were fit in the
LCSM framework.
Given all of this, if forced to choose I would recommend T_YU.M^LGM, T_BA^LGM, T_YU^LGM, and T_BA.M^LGM,
in that order. This ranking is primarily based on each test statistic's performance on the
Kolmogorov-Smirnov test, since all had similar type I error control.
Study 3: Model Comparison via the Likelihood Ratio Test
While evaluating how well a model fits the data is certainly an important concern, the
ability to compare competing models when performing model selection is just as significant. For
LCSMs, this is typically undertaken using the likelihood ratio test. Since the proportional and
constant change models are not nested within each other, testing follows one of two pathways.
For the proportional change pathway, in the first stage the no change model is compared to the
proportional change model. If the proportional change model is not found to improve on the no
change model, testing stops in this pathway and the no change model is selected. On the other
hand, if the likelihood ratio test does result in a statistically significant difference, the second
stage continues by comparing the proportional change model to the dual change model.
Similarly, if the fit of the dual change model is not different from that of the proportional change
model, testing stops and the proportional change model is selected. If the dual change model
does show improvement over the proportional change model, the dual change model is selected.
The second pathway is identical except the constant change model is tested instead of the
proportional change model. The purpose of Study 3 is to better understand the performance of
the likelihood ratio test when SSCs are employed.
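To make the two-stage decision logic concrete, the sketch below walks through the proportional change pathway in R. It assumes fit.none, fit.prop, and fit.dual are lavaan model objects for the no change, proportional change, and dual change models fit to the same data; these names and the wrapper function are illustrative rather than taken from the dissertation's own code.

library(lavaan)

# A minimal sketch of the proportional change pathway, assuming fitted
# lavaan objects fit.none, fit.prop, and fit.dual (placeholder names).
select_prop_pathway <- function(fit.none, fit.prop, fit.dual, alpha = .05) {
  # Stage 1: no change vs. proportional change
  stage1 <- lavTestLRT(fit.none, fit.prop)
  if (stage1[2, "Pr(>Chisq)"] >= alpha) {
    return("no change")                 # stop: the simpler model is retained
  }
  # Stage 2: proportional change vs. dual change
  stage2 <- lavTestLRT(fit.prop, fit.dual)
  if (stage2[2, "Pr(>Chisq)"] >= alpha) {
    return("proportional change")
  }
  "dual change"
}

The constant change pathway follows the same structure, with a constant change model fit substituted for fit.prop.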
Methods
Study 3 also reanalyzed data from Study 1. As such, Study 3 evaluated data from all 144
conditions of Study 1, with data generating model, sample size, number of time points, and
proportion of missing data all being varied across conditions. Since T_BA and T_YU were the most
successful SSCs from Study 2, only these were examined in Study 3, along with T_ML for
reference. However, since the focus of Study 3 is the LCSM, the LCSM variants of all SSCs
were used. This resulted in five test statistics in total: T_ML, T_BA, T_BA.M, T_YU, and T_YU.M.
Different models were fit in each condition based on the data generating model and its
place in the pathways discussed above. For the no change data, the no change, proportional
change, and constant change models were fit. The fit of the no change model was compared to
that of both the proportional change model (NP) and the constant change model (NC) via the
likelihood ratio test. For the proportional change data, the no change, proportional change, and
dual change models were fit. Here, comparisons were conducted between the no change and
proportional change models (NP) and the proportional change and dual change models (PD). A
similar procedure was used for the constant change data, in which the no change, constant
change, and dual change models were fit. In this case, the no change model was compared to the
constant change model (NC) and the constant change model was compared to the dual change
model (CD). Finally, for the dual change data, all four LCSMs were fit to the data. Comparisons
were made between the no change and proportional change (NP), the proportional change and
dual change (PD), the no change and constant change (NC), and the constant change and dual
change models (CD). Figure 4 provides a visual representation of these comparisons for clarity.
Results
All likelihood ratio tests were judged by their ability to select the appropriate model
given the context. Table 6 displays the proportion of replications in each condition for which the
likelihood ratio test favored the model with more estimated parameters (the more complex
model). These results are partitioned according to the data generating model: (a) contains results
for the no change data, (b) for the proportional change data, (c) for the constant change data, and
(d) for the dual change data.
The correct decision for each model comparison varied by context. For example, for no
change data, the correct model was the no change model. Therefore, choosing the proportional
change model in the NP comparison and choosing the constant change model in the NC
comparison constituted type I errors. Given that the likelihood ratio tests were conducted at the
.05 significance level, the expected type I error rate was 5%. As in Study 2, type I error rates
between 2.5% and 7.5% were considered to be within range of the nominal 5% rate (Bradley,
1978). However, for proportional change data, the correct model was the proportional change
model. In this case, for the NP comparison the correct choice was the proportional change
model. As such, the values listed in these portions of the table are not type I error rates, but
instead represent the power of the test. Ideally, these values would be as close to 1 as possible.
Yet for the PD comparisons, choosing the dual change model still represents a type I error, and
these values should be close to .05. A similar case can be made for the constant change data,
where results for NC comparisons represent power and CD comparisons represent type I error
rates. For the dual change data, all results represent power, as the dual change model was the
most complex model considered in the model comparison process.
Examining Table 6, results were fairly consistent across data generating models. With
regard to type I error rate control, surprisingly T_ML had the best performance across all
conditions. T_ML was able to keep type I error rates within an acceptable range in virtually all
conditions. The SSCs, on the other hand, encountered some difficulty. Among the SSCs, T_YU
exhibited the best type I error rate control, though it was nowhere near as effective as T_ML. The
remaining SSCs were somewhat worse, and none seemed to stand out from the others.
All SSCs performed perfectly for N=100 conditions, save for a pair of cases in making
NC comparisons in no change data. T_YU also performed perfectly for N=50 conditions, except for
a few cases with no change data. However, type I error rate control dropped off for conditions
with N<50, with none of the SSCs performing particularly well. In general, type I error rates
decreased as the number of time points and proportion of missing data increased. However, this
association had plenty of exceptions, and seemed to be stronger in conditions with smaller
sample sizes.
With respect to power, T_ML once again had the best performance, with power at or near 1
in virtually all conditions. Unlike the type I error results, the SSCs performed very well in terms
of power. For example, all test statistics had power at or near 1 for conditions with N≥30. T_BA.M
had power below 1 for some N=20 conditions, specifically the t=6/m=20%, t=8/m=10%, and
t=8/m=20% conditions. In addition, all SSCs had power below 1 for PD comparisons in the
t=4/m=20% condition with dual change data.
Discussion
At first, the fact that T_ML had the best performance in terms of both type I error control
and power seems unexpected. However, when examining the nature of SSCs more closely, these
results are not as surprising as one might imagine. Recall that the likelihood ratio test statistic is
calculated by taking the difference between the test statistics of the models being compared. If
both test statistics are inflated by the same amount, taking their difference would eliminate this
inflation and result in a likelihood ratio test statistic that follows the appropriate chi-square
distribution.
The differences in inflation can also be thought of using tools from classical test theory.
Consider two models, Model A and Model B, where Model A is nested within Model B. The test
statistics for each model, T_A and T_B, can be written as

T_A = t_A + e_A and T_B = t_B + e_B     (25)

where t_A and t_B are the true test statistics for Models A and B that follow their corresponding
chi-square reference distributions, and e_A and e_B are the inflations in the observed test statistics
due to the small sample size. If these inflations are equal, e_A = e_B = e, then the likelihood ratio
test statistic, T_LR, is simply

T_LR = T_A - T_B = t_A + e - (t_B + e) = t_A - t_B     (26)

Here, the inflations cancel each other out and the resulting likelihood ratio test statistic is
unbiased.
SSCs, on the other hand, are different due to the correction factor, which influences the
results of the likelihood ratio test in two ways. First, the correction factor is proportional to
model complexity. For example, when comparing the no change model to the proportional
change model, the proportional change model has more latent variables. Therefore, the test
statistic for the proportional change model is shrunken more than that of the no change model.
When the corrected test statistic for the proportional change model is then subtracted from that of
the no change model, the inflations are not equal (e_A ≠ e_B) and are therefore not canceled out as
in the case of T_ML. The resulting likelihood ratio test statistic does not follow the appropriate chi-
square reference distribution.

Second, the correction factor changes the behavior of the tail of the test statistic's
distribution. In shrinking a test statistic with a proportional correction factor, the resulting
distribution becomes more light-tailed than the original. Combined with the differences in the
correction factor for different models discussed above, this leads to a likelihood ratio test statistic
with a lighter-tailed distribution compared to its reference chi-square distribution. Since critical
values used in hypothesis tests are based on the tail of the distribution, these lighter tails lead to
the lower type I error rates observed for the SSCs.
Overall, T_ML performed best in terms of type I error rate control and power, and I
recommend its use when comparing models via the likelihood ratio test. In general, T_YU reached
an acceptable level of performance at N≥50, while T_BA, T_BA.M, and T_YU.M require a sample size of
N≥100 to be effective.
Study 4: An Alternative to Small Sample Corrections
Studies 2 and 3 have demonstrated that for LCSMs, neither T_ML nor any of the SSCs
discussed perform well across all contexts in which a model fit statistic is needed. They have
highlighted many weaknesses in these test statistics and the procedures used to construct them.
These flaws suggest five criteria I believe an ideal procedure for addressing test statistics in small
samples should meet:
1. The distribution of the test statistic should closely match that of the reference distribution,
and converge to it asymptotically. In other words, the procedure should perform well in
small samples and better with increasing sample sizes.
2. The procedure should allow for model comparison via the likelihood ratio test. The
resulting likelihood ratio test statistics should demonstrate adequate power and control
over type I error rates.
3. The procedure should be able to account for missing data, and perform well in its
presence.
4. The procedure should never penalize the researcher for collecting more variables. In the
longitudinal case, this translates to the idea that the procedure should not perform worse
if the researcher were to continue the study and obtain data at more time points. This
combats those SSCs that are only defined when the number of time points is small.
5. The procedure should be invariant to model specification. If two models are identical
with regard to model expectations, then the procedure should treat them as equivalent
models. This represents a shift away from using the number of latent variables as part of
a correction factor.
None of the test statistics studied thus far meet all five of these criteria. As such, it is
clear that a new approach is needed in the study of model fit for LCSMs in small samples. To
this end, I propose the Monte Carlo test as a solution to this problem. The rest of Study 4
proceeds as follows. I begin by describing the Monte Carlo test and propose an extension. Next, I
test the procedure under the conditions of Studies 2 and 3 to compare its performance to the test
statistics discussed thus far. I then give the results of these tests followed by a discussion of these
results and the procedure itself.
The Monte Carlo Test
Originally proposed by Barnard (Bartlett, 1963), the Monte Carlo test makes use of
Monte Carlo simulation to approximate the distribution of a test statistic. Its theoretical
properties were studied by Hope (1968), who demonstrated its effectiveness and highlighted its
usefulness in situations where the conditions necessary for uniformly most powerful tests were
not met. Due to high computational costs as well as the increasing popularity of Efron’s
bootstrap (Efron, 1979), the Monte Carlo test fell out of favor. However, with advances in
computing technology and the poor performance of the bootstrap for structural equation models
in small samples (Nevitt & Hancock, 2001), the Monte Carlo test has reemerged as a viable
alternative.
The logic of the Monte Carlo test is as follows. First, the model of interest is fit to the
data and its parameters are estimated. Next, a large number of samples is simulated from the
model using these estimates as population values. The model is then fit to each of these
simulated samples, and the statistic of interest is calculated in each sample. The distribution of
this statistic constitutes the reference distribution, the distribution under the null hypothesis. The
statistic from the original sample data is then compared to this reference distribution when
performing hypothesis tests.
Though similar to the bootstrap, the Monte Carlo test differs in important ways. Whereas
the bootstrap generates samples by resampling from the original sample, the Monte Carlo test
simulates them using parameters estimated from the original sample. As a result, all samples
generated in the Monte Carlo test are purely artificial, and the original sample influences the
Monte Carlo sample only through the estimated model parameters. However, since these data are
simulated under the null hypothesis, this allows for the construction of a reference distribution
that in theory should approximate the distribution of the statistic under the null hypothesis.
The Monte Carlo test was recently rediscovered [2] by Jalal, who brought it to SEM (Jalal,
2017; Jalal & Bentler, in press). This work uses the model fit index (e.g., T_ML) as the test statistic
of interest. Instead of transforming T_ML using an SSC so that it better approximates a chi-square
distribution, Jalal used the Monte Carlo test to approximate the distribution of T_ML itself, and
compared the observed value to this approximation of the reference distribution. He
demonstrated that the method had acceptable power and type I error rates in multivariate normal
data. In addition, he showed that when data are nonnormal, the procedure can just as easily be
applied by replacing T_ML with T_SB (which is better suited for nonnormal data), with similar
performance.

[2] I give Jalal credit for this rediscovery because he explicitly acknowledged the procedure as the Monte Carlo test. Others have proposed the procedure but refer to it as an extension of the bootstrap, such as the procedure of McLachlan (1987), which was later dubbed the bootstrap likelihood ratio test (Nylund, Asparouhov, & Muthén, 2007).

Though Jalal's work (Jalal, 2017; Jalal & Bentler, in press) attempted to address
structural equation models in general, his studies focused exclusively on CFA models. Emphasis
was placed on the method's ability to evaluate how well the model fit the data. However, when
dealing with longitudinal structural equation models the ability to compare models via the
likelihood ratio test is of great interest and importance, as highlighted by Study 3. I take this
opportunity to extend Jalal’s method to make it general enough to answer questions regarding
both model fit and model comparisons, as well as incorporate missing data.
I now present a more formal description of the Monte Carlo test. Consider two models,
Model A and Model B, where Model A is nested within Model B. Model A therefore has more
degrees of freedom than Model B. The Monte Carlo test is conducted as follows:

1. Fit Model A to the original data, X, with sample size N. Estimate the model parameters,
θ_A, and T_ML for this model, T_A. Similarly, fit Model B to X and calculate T_ML for
Model B, T_B. The likelihood ratio test statistic, T, is calculated as T = T_A - T_B.

2. Simulate M samples X*_1, X*_2, ..., X*_M of size N from Model A using θ_A as the
population values. Calculate the proportion of missing data in X, π, and mask data in
X*_1, X*_2, ..., X*_M such that each data point has probability π of being missing. Ensure
that each simulated sample has at least one observation per person.

3. Fit Model A and Model B to X*_1, X*_2, ..., X*_M. Calculate T_ML for each model in each
sample, resulting in test statistics T*_A1, T*_A2, ..., T*_AM and T*_B1, T*_B2, ..., T*_BM.

4. Compute the likelihood ratio test statistics T*_1, T*_2, ..., T*_M such that T*_1 = T*_A1 - T*_B1,
T*_2 = T*_A2 - T*_B2, ..., T*_M = T*_AM - T*_BM. Rank T*_1, T*_2, ..., T*_M from least to greatest,
such that T*_(1) ≤ T*_(2) ≤ ... ≤ T*_(M).

5. For a test with significance level α, the critical value is T*_(m), where m is the ceiling of
(1 - α)M. Thus, reject (choose Model B) if T > T*_(m) and fail to reject (choose Model A) if
T ≤ T*_(m).
This procedure is general enough to address both model fit and model comparison.
Consider for example a case in which the proportional change model is of interest. To see
whether it fits the data well, we would designate the proportional change model as Model A and
the saturated model as Model B. We would therefore simulate data from the proportional change
model and proceed accordingly. However, if we wished to test whether the proportional change
model fit the data better than the no change model, we would consider the no change model to be
Model A and the proportional change model would be Model B. In this case, data would be
simulated from the no change model.
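As one way of making the five steps concrete, the sketch below implements the procedure in R with lavaan. It assumes fitA and fitB are lavaan fits of the nested Model A and the more general Model B, estimated with missing = "ml" so that the masked Monte Carlo samples can be handled. The function name, the use of model-implied moments with MASS::mvrnorm to simulate under multivariate normality, and the MCAR masking scheme are illustrative choices rather than the dissertation's own code.

library(lavaan)

monte_carlo_lrt <- function(fitA, fitB, data, M = 1000, alpha = .05) {
  # Step 1: observed likelihood ratio test statistic
  T_obs <- fitMeasures(fitA, "chisq") - fitMeasures(fitB, "chisq")

  # Step 2: simulate M samples from Model A's implied moments, then mask MCAR
  momA  <- fitted(fitA)                  # model-implied mean vector and covariance matrix
  vars  <- lavNames(fitA, "ov")
  N     <- lavInspect(fitA, "ntotal")
  p_mis <- mean(is.na(data[, vars]))     # proportion of missing data in the original sample

  T_star <- replicate(M, {
    sim <- as.data.frame(MASS::mvrnorm(N, mu = momA$mean, Sigma = momA$cov))
    names(sim) <- vars
    mask <- matrix(runif(N * length(vars)) < p_mis, N, length(vars))
    mask[cbind(seq_len(N), sample(length(vars), N, replace = TRUE))] <- FALSE  # keep >= 1 score per person
    sim[mask] <- NA
    # Steps 3-4: refit both models and take the difference in T_ML
    fitMeasures(update(fitA, data = sim), "chisq") -
      fitMeasures(update(fitB, data = sim), "chisq")
  })

  # Step 5: critical value at the (1 - alpha) quantile of the simulated statistics
  crit <- sort(T_star)[ceiling((1 - alpha) * M)]
  list(T = unname(T_obs), critical = unname(crit),
       p = mean(T_star > T_obs), reject = unname(T_obs > crit))
}

To evaluate overall model fit rather than compare two models, the same logic applies with the model of interest in the role of Model A: simulate from its estimates, refit it to each Monte Carlo sample, and compare its observed T_ML to the simulated distribution of T_ML, dropping the subtraction.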
The Monte Carlo test presents a model testing perspective that is fundamentally different
from that of SSCs. In model testing, we have a test statistic, T_ML, calculated from the data. We
want to know whether this value would be unusual if it had been drawn from the null
distribution, the distribution under the null hypothesis. However, the shape of the null
distribution is unknown to us. As such, we assume it has a specific shape, in this case that of the
chi-square distribution. We then calculate a critical value based on the quantiles of the chi-square
distribution and use this to determine whether our T_ML is consistent with the null distribution.
Yet in small samples, the null distribution is not a chi-square distribution. As such, using
a chi-square distribution to evaluate our T_ML is inappropriate. SSCs address this by multiplying
T_ML by a correction factor. The hope is that the resulting test statistic follows a chi-square
distribution. If it does, we can compare our corrected test statistic to the critical value based on
the chi-square distribution. However, there is no theorem guaranteeing that this will happen. The
correction factors are entirely arbitrary. Sometimes they do transform T_ML in such a way that it
follows a chi-square distribution, but other times they do not. This represents the fundamental
problem with using arbitrary corrections: they may or may not work, and it is impossible to
know for sure whether or not they do for any particular sample.
The Monte Carlo test approaches the problem differently. It gets to the root of the
problem by avoiding the chi-square distribution entirely. Instead of assuming the null
distribution is a chi-square distribution, it estimates the shape of the null distribution directly
using Monte Carlo simulation. T_ML is then compared to the critical value from this empirically
derived null distribution. Since the null distribution is not assumed to follow any specified shape,
no correction is needed to transform T_ML so that it adheres to that shape. As long as the Monte
Carlo test can estimate the null distribution well, it should outperform SSCs in a broader context
given its flexibility and lack of arbitrariness.
Methods
To assess the performance of the Monte Carlo test, I tested it under the conditions of
Studies 2 and 3, using M=1,000 simulated samples per replication. Due to the long
computational time of the test, I only evaluated it in a subset of conditions. For Study 2, I used
all combinations of N = 20, 50, and 100, t = 4 and 8, and m = 0% and 20%. These represent the
extreme levels for each factor, omitting most levels in between. Since intermediary levels in
Study 2 resulted in intermediary performance, I believed that the conditions selected would be
sufficient to adequately characterize the Monte Carlo test’s performance. For Study 3, only the
N=20/t=4/m=20% condition was evaluated. Since this was the condition with the least
information and SSCs performed very poorly in it, I felt that it would serve as an acceptable
lower bound with regard to the Monte Carlo test’s performance. That is, if the Monte Carlo test
performs exceptionally here, it would do even better in other conditions, as the increase in
information (e.g. with a greater sample size) would only improve performance further.
Results
As with Study 2, the Kolmogorov-Smirnov test was used to determine whether the
distribution of test statistics generated with the Monte Carlo test matched the empirical
distribution of T_ML. Recall that each condition contains 1,000 replications. For each of these
1,000 replications, 1,000 samples were simulated and 1,000 test statistics were calculated. In
each replication, the distribution of these test statistics was compared to the empirical
distribution of T_ML from Study 2. Table 7 contains the proportion of replications in which the
distribution of the test statistics from the Monte Carlo test was judged by the two-sample
Kolmogorov-Smirnov test to match the empirical distribution of T_ML. As such, values closer to 1
were preferred. Fortunately, the Monte Carlo test almost always generated test statistics whose
distributions matched the empirical distributions of T_ML. The only two conditions where this was
not the case were the N=20/t=8/m=20% and N=50/t=8/m=0% conditions, where 81.5% of
replications were judged to match the empirical distribution. However, I do not find this to be
problematic, as this is still a very high percentage and, as discussed below, this incongruence in a
relatively small portion of replications did not hinder type I error rate control.
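As a small illustration of this check, the snippet below (with hypothetical object names) compares one replication's Monte Carlo statistics to the empirical T_ML values using R's built-in two-sample Kolmogorov-Smirnov test.

# t_ml: the 1,000 empirical T_ML values for a condition from Study 2
# t_mc: the 1,000 statistics generated by the Monte Carlo test in one replication
# The two distributions are judged to match if the KS test does not reject at .05
ks_match <- ks.test(t_ml, t_mc)$p.value > .05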
A portion of these results is presented in Figure 5 so that the reader may better visualize
them. Figure 5 shows the distributions of T_ML, T_YU^LGM, and T_YU.M^LGM for the constant change model
with N=20/t=4/m=20%. It also gives the distribution of test statistics from the Monte Carlo test
for a single replication, along with the appropriate chi-square reference distribution. Beginning
with T_ML, we see that its distribution is nowhere close to that of the chi-square distribution. T_YU^LGM
rescales T_ML in an attempt to rectify this, but it too fails. T_YU.M^LGM incorporates the MDC, which
brings it much closer to the chi-square distribution. However, there is still a slight discrepancy
between T_YU.M^LGM and the chi-square distribution, despite its position as the best SSC for this
purpose as found in Study 2. In this mismatch, we begin to see the traces of its flaws. It is
slightly more skewed than the chi-square distribution. The LCSM counterparts are even more
skewed, and other models such as the proportional and dual change models do not have a direct
LGM equivalent in the same way the constant change model does [3]. The Monte Carlo test
bypasses the distributional assumption and the required transformation by simply trying to
estimate the shape of the distribution of T_ML (which is unknown in empirical data) directly.
Clearly, the Monte Carlo test statistics are able to approximate this distribution rather well, as the
distributions for T_ML and the Monte Carlo test are nearly identical. The accuracy of this
approximation demonstrates the true value of the Monte Carlo test.
Table 7 also displays type I error rates for the Monte Carlo test under the conditions of
Study 2. Type I error rates here are defined as before: the proportion of replications for which the
method incorrectly rejected and concluded that the constant change model did not fit the data
well. Here, values closer to the significance level, .05, were more desirable. As can be seen in
Table 7, the Monte Carlo test performed excellently in all conditions. Type I error rates were
very close to the nominal level of .05, ranging from .038 to .063. These were well within the
acceptable range of between .025 and .075.
[3] The proportional and dual change models can be respecified via reparameterization, though the meaning of the parameters changes for these models.
Table 8 contains results for the likelihood ratio test performed via the Monte Carlo test.
As in Table 6, Table 8 contains both type I error rates and power for the likelihood ratio test.
Here too the Monte Carlo test performed nearly perfectly. Type I error rates were within the
acceptable bounds, demonstrating that the test has good control over type I error. Power was also
at or near 1 in all cases, showing the Monte Carlo test’s ability to correctly reject misspecified
models on its way to identifying the data generating model.
Discussion
Based on the results of Study 4, it is clear that the Monte Carlo test exhibits superb
performance when examining model fit and comparing models via the likelihood ratio test. With
respect to model fit, the distribution of the test statistics generated by the Monte Carlo test very
closely matched the empirical distributions of T_ML. The Monte Carlo test also had excellent type
I error control across all conditions, surpassing the performance of all test statistics evaluated in
Study 2, even the SSCs. With regard to model comparison, the Monte Carlo test outperformed
the SSCs as well, with type I error rates near .05 and power near 1 for all comparisons. Since this
was found under the condition with the least information available, I expect its performance to
generalize to other conditions in which more information is available.
The Monte Carlo test meets all five criteria discussed at the start of Study 4. It generates a
reference distribution that closely matches the empirical distribution of the test statistic of
interest. It allows for model comparison via the likelihood ratio test, with excellent power and
type I error control. It permits missing data and works well in its presence. It is defined as long
as data can be simulated from the model (i.e. in the absence of Heywood cases). Finally, it is
invariant under model respecification resulting in identical performance for equivalent LGMs
and LCSMs.
When fitting LCSMs, in Study 2 both T_ML and the SSCs performed relatively poorly. This
was especially the case in conditions with N≤30. In Study 3, T_ML performed well, but the SSCs
performed poorly. The Monte Carlo test performed well in all conditions studied for both Study
2 and Study 3, replacing the need to use T_ML or SSCs entirely. For all of these reasons, I fully
recommend the Monte Carlo test when evaluating or comparing the fit of LCSMs.
Study 5: Cognitive Changes in Adults
Thus far, I have studied the properties of the LCSM in simulated data alone. The purpose
of Study 5 is to examine how the LCSM functions when applied to real data. I used the
knowledge gained from Studies 1 through 4 to analyze an empirical dataset involving cognitive
changes in adults. Based on the results of Study 4, in practice I believe the use of SSCs is
unnecessary and that the Monte Carlo test alone is sufficient. However, to illustrate their
comparative performances I also implemented the analysis using T_ML, T_BA, T_YU, T_SW, and T_SB. I
reiterate that the primary purpose of this study is to demonstrate how the methods discussed in
this dissertation should be implemented in practice.
Methods
The data were collected as part of the Cognition and Aging in the USA (CogUSA) study,
a national longitudinal study of age-related changes in cognition in adults. Though the full
sample is much larger, only a small subset of N=24 individuals was examined here, with 20%
missing data. Each individual contributed up to four measurements, the first of which was taken
at around the age of 72 years and the last of which was taken before the age of 80 years. All data
were collected via telephone interviews. The outcome of interest was the Number Series score, a
test from the Woodcock Johnson Psychoeducational Test Battery (WJ-III; Woodcock, McGrew,
& Mather, 2001; Woodcock & Mather, 2001), adapted for administration over telephone. The
Number Series test is a measure of quantitative reasoning in which participants are given a series
of numbers from which they must derive the numerical pattern and provide the missing number
in the sequence. The trajectories of the scores are given in Figure 6. Based on this plot, it seems
that some individuals improve over time while others decline.
Results
The no change, proportional change, constant change, and dual change models were all
fit to the data. A heteroscedastic residual structure was used to aid model fitting. Despite this
more flexible residual structure, neither the constant change nor the dual change model
converged, leaving the no change and proportional change models as the only candidate models.
I discuss overall model fit first. For the no change model (df = 8), the resulting model fit statistics
were: T_ML = 21.71 (p = .005), T_BA = 9.98 (p = .267), T_YU = 13.63 (p = .092), T_SW = 19.12 (p = .014),
and T_SB = 25.48 (p = .001). For the proportional change model (df = 7), the obtained model fit
statistics were: T_ML = 15.46 (p = .031), T_BA = 7.11 (p = .418), T_YU = 9.71 (p = .206), T_SW = 13.52
(p = .060), and T_SB = 16.56 (p = .020). Based on these results, it seems that both the no change
and the proportional change models have support from some fit indices.
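For readers who wish to see how such a model can be specified, the sketch below gives one common lavaan specification of the proportional change LCSM for four occasions, in the spirit of Ghisletta and McArdle (2012). The variable names (ns1 through ns4 for the four Number Series scores) and the data object (cogusa) are placeholders, and the constraints shown (including the heteroscedastic residual variances) are a plausible reconstruction rather than the exact syntax used for these analyses.

library(lavaan)

prop_change <- '
  # latent true scores behind each observed score
  lns1 =~ 1*ns1
  lns2 =~ 1*ns2
  lns3 =~ 1*ns3
  lns4 =~ 1*ns4

  # each true score carries the previous one forward with a unit weight
  lns2 ~ 1*lns1
  lns3 ~ 1*lns2
  lns4 ~ 1*lns3

  # latent change scores defined with unit loadings on the true scores
  d2 =~ 1*lns2
  d3 =~ 1*lns3
  d4 =~ 1*lns4

  # proportional change: each change depends on the previous true score,
  # with a single beta across occasions
  d2 ~ beta*lns1
  d3 ~ beta*lns2
  d4 ~ beta*lns3

  # initial level: freely estimated mean and variance
  lns1 ~ 1
  lns1 ~~ lns1

  # heteroscedastic residual variances; observed intercepts fixed at zero
  ns1 ~~ ns1
  ns2 ~~ ns2
  ns3 ~~ ns3
  ns4 ~~ ns4
  ns1 ~ 0*1
  ns2 ~ 0*1
  ns3 ~ 0*1
  ns4 ~ 0*1

  # all other latent means and disturbances fixed at zero
  lns2 ~ 0*1
  lns3 ~ 0*1
  lns4 ~ 0*1
  d2 ~ 0*1
  d3 ~ 0*1
  d4 ~ 0*1
  lns2 ~~ 0*lns2
  lns3 ~~ 0*lns3
  lns4 ~~ 0*lns4
  d2 ~~ 0*d2
  d3 ~~ 0*d3
  d4 ~~ 0*d4
'

fit_prop <- sem(prop_change, data = cogusa, missing = "ml")

With seven free parameters and fourteen observed moments, this specification yields df = 7, matching the proportional change model reported above; dropping the beta paths gives the no change model with df = 8.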
To determine which model was more appropriate for the data, the no change and
proportional change models were compared via the likelihood ratio test with df = 1. When the
test was done using T_ML, the results suggested that the proportional change model should be chosen
(T_ML = 6.25, p = .012). Yet when using SSCs, this was not always the case. Three SSCs (T_YU =
3.92, p = .048; T_SW = 5.60, p = .018; T_SB = 22.29, p < .001) agreed with T_ML, but T_BA preferred
the no change model (T_BA = 2.87, p = .090).
The Monte Carlo test was performed in part as a deciding vote. I generated 1,000 samples
from the no change model with population values μ_i = 515.36, σ²_i = 190.23, σ²_e1 = 519.31, σ²_e2 =
248.74, σ²_e3 = 394.77, and σ²_e4 = 493.59, the estimated parameters for the no change model. After
masking them appropriately, I fit the no change model and the proportional change model to each
dataset. After calculating the likelihood ratio test statistic for each sample, I sorted them in order
of least to greatest, and found the critical value for the 95th percentile: 4.29. Since T_ML = 6.25 >
4.29, the Monte Carlo test rejects, thus favoring the proportional change model. The resulting p
value (the proportion of Monte Carlo test statistics greater than T_ML) was found to be p = .026.
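The critical value and p value reported here can be obtained from the simulated likelihood ratio statistics with a few lines of R; lrt_star and lrt_obs below are hypothetical names for the vector of 1,000 Monte Carlo statistics and the observed statistic of 6.25.

# critical value at the 95th percentile of the sorted Monte Carlo statistics
lrt_star <- sort(lrt_star)
crit <- lrt_star[ceiling(0.95 * length(lrt_star))]
# Monte Carlo p value and the resulting decision
p_mc <- mean(lrt_star > lrt_obs)
reject_no_change <- lrt_obs > crit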
Since both T_ML and the Monte Carlo test performed well in the simulations, I follow their
recommendation and conclude that the proportional change model is more appropriate for the
data than the no change model.

That said, I also used the Monte Carlo test to determine how well the model fit the data.
In terms of overall model fit, the Monte Carlo test resulted in a critical value of 18.03. Since T_ML
= 15.46 < 18.03, we conclude that the proportional change model fits the data relatively well,
with a p value of .656. This is in agreement with T_BA, T_YU, and T_SW, but contrary to T_ML and T_SB,
both of which suggest that the proportional change model is not suitable for the data. Based on
the simulation results, I favor the Monte Carlo test and settle on the proportional change model
as the final model.
I use this opportunity to interpret the parameter estimates of the proportional change
model. The initial level mean was μ_i = 510.19 (SE = 4.16), indicating that at age 72 the expected
Number Series score for the average person was 510.19. The variance of this intercept, σ²_i =
182.08 (SE = 94.93), indicates that there is a great deal of between-person difference in this
initial level. The proportional change component β = 0.004 (SE = 0.002) showed that there was a
slight positive relationship between the true score at one time point and the change between it
and the next. Finally, the residual variances σ²_e = [471.50, 254.01, 351.98, 409.25]' pointed to a
great deal of variability uncaptured by the model, particularly at the start of the study.
Discussion
Based on the results of the Monte Carlo test, the proportional change model was selected
over the no change model and judged to possess acceptable model fit. This study served to
illustrate that the phenomenon observed in the simulation studies also holds in empirical data.
That is, the use of conventional measures of model fit (T_ML), as well as SSCs, can lead to
different and potentially incorrect conclusions when applied to LCSMs. Of course, given the
empirical nature of the data, it is impossible to know the true population model (if one does
indeed exist). However, the simulation studies suggest that we should put our faith in the Monte
Carlo test as our best approximation to the truth relative to the other methods discussed.
One interesting feature in the data is that the results suggest, on average, an improvement
in performance on the Number Series task. This goes against the expected age-related decline in
cognitive functioning (see for example McArdle, Fisher, & Kadlec, 2007). One possible
explanation of this result is a potential retest effect (Ferrer, Salthouse, McArdle, Stewart, &
Schwartz, 2005). That is, the expected decline in cognitive functioning is counteracted by
improvement on the task due to the practice effects from performing it multiple times. Though
examining this goes beyond the scope of this study, future research should account for retest
effects in order to tease them apart from changes in cognitive ability.
Conclusions
The purpose of this dissertation was to understand the properties of the LCSM in small
samples, ranging from N=20 to N=100. This was done via a series of five studies, each
examining different aspects. Study 1 simply assessed whether it was even possible to fit LCSMs
in small samples using conventional SEM software, specifically the lavaan and OpenMx
packages in R. I found that it was indeed possible to fit the four most popular LCSMs (the no
change, proportional change, constant change, and dual change models) to samples of sizes
between N=20 and N=100, with a variety of time points (t = 4, 6, and 8) and with missing data (m
= 0%, 10%, and 20%). Convergence was always achieved when fitting the data generating
model, and the fitting of misspecified models also resulted in model convergence nearly all of
the time, with the sole exception being when the dual change model was fit to no change data.
Bias in parameter estimates was also minimal, demonstrating that conventional rules of thumb
regarding minimum sample size for fitting structural equation models do not apply to the LCSM;
N=20 with four time points is enough, even with 20% missing data.
Study 2 examined overall model fit and the effects of model specification. Model fit in
the constant change model was inspected under all conditions described above for Study 1. In
addition to 𝑇 𝑀𝐿
, the performances of the four SSCs (𝑇 𝐵𝐴
, 𝑇 𝑌𝑈
, 𝑇 𝑆𝑊
, and 𝑇 𝑆𝐵
) were also of
interest. For 𝑇 𝐵𝐴
and 𝑇 𝑌𝑈
, models were fit from both the LGM and LCSM frameworks, since
both corrections are a function of the number of latent variables, which differs across
frameworks. The effects of including MDCs to create an effective sample size in 𝑇 𝐵𝐴
, 𝑇 𝑌𝑈
, and
𝑇 𝑆𝑊
were also evaluated.
Results indicated that T_BA and T_YU performed best, while T_SW came in a distant third.
Neither T_ML nor T_SB followed its corresponding chi-square reference distribution, and both were
therefore unable to effectively control type I error rates. Models fit from the LGM perspective
exhibited better performance due to the large number of latent variables in LCSMs, which led to
overcorrection of T_ML. However, the use of the LGM perspective is limited only to the no change
and constant change models. The proportional change and dual change models have no exact
LGM equivalent with two latent variables, since these models conceptualize change as a dynamic
process. Though I did not test them explicitly, based on the results for the constant change model
I expect that neither T_BA nor T_YU would work well for either the proportional or dual change
models.
One potential improvement to the LCSM corrections is to change the specification such
that true scores are not included in the model. This would yield an equivalent model, while
simultaneously reducing the number of latent variables by nearly half, thus limiting the
overcorrection. However, this would not alleviate the problem entirely, as the resulting test
statistic would still always be outperformed by the LGM correction when available. In any case,
the fact that the way in which two equivalent models are specified could lead to different
conclusions regarding model fit is still a concern, one which removing true scores does not solve.
The effects of MDCs varied. Since they shrank T_ML further, they were useful when SSCs
needed a little extra help in doing so, as in the case of T_BA and T_YU. When T_ML had already been
shrunk too much, as in the case of the LCSM corrections, they made the problem even worse.
Approximate model fit indices such as the RMSEA, CFI, and TLI worked best when calculated
with the LCSM corrections, since they favor smaller values of the test statistic. Despite
this, their performance was acceptable for most conditions with the LGM corrections. However,
given that approximate fit indices were originally intended to evaluate model fit for complex
models in large samples, I believe that simply relying on the test statistic is sufficient in this
context, given the focus on small samples and the fact that LCSMs are relatively simple models with
respect to the number of parameters estimated.
Study 3 evaluated model comparison via the likelihood ratio test. This study focused on
the ability of the test statistic to select the data generating model when comparing the no change,
proportional change, constant change, and dual change models to each other. The performance of
T_ML was judged alongside that of T_BA and T_YU, the two most successful SSCs from Study 2. T_ML
performed the best, with near perfect control over type I error rates and power at or near 1 in
virtually all conditions. This is likely because the deficiencies observed in T_ML in the context of
model fit are eliminated when performing the likelihood ratio test, since this involves taking the
difference between the T_ML values of the two models. This is not the case when SSCs are used, as these
deficiencies undergo different corrections because each candidate model differs in terms of model
complexity. The proportionality of the correction factor results in a likelihood ratio test statistic
with a lighter tail than the reference distribution, leading to lower type I error rates. As such, the
use of T_ML is preferable to SSCs when comparing models via the likelihood ratio test.
Since no test statistic had exceptional performance across all conditions in both Studies 2
and 3, an alternative was clearly needed. To this end, I proposed an extension of the Monte Carlo
test. After fitting two competing models to the data, the procedure calls for the simulation of data
using the parameter estimates from the nested model as population values. Both models are fit to
each Monte Carlo sample, test statistics are calculated, and the distribution of these test statistics
is used as the reference distribution to which the original test statistic is compared. The Monte
Carlo test performed excellently with regard to both model fit and model comparison via the
likelihood ratio test. As such, I recommend its use when fitting LCSMs in small samples.
The Monte Carlo test is similar to the bootstrap, but it differs in an important way.
Whereas the bootstrap resamples with replacement from the original sample to generate its
bootstrap samples, the Monte Carlo test simulates Monte Carlo samples based on a model
estimated from the data. This slight difference has larger implications for the accuracy of results.
Both methods assume the original sample is representative of the population of interest.
However, if due purely to random chance the original sample drawn from the population does
not happen to characterize the population well (a likely possibility when dealing with samples of
size N=20 for example), there are different consequences for each method. For the bootstrap, the
sampling space from which the resampling mechanism can draw is limited, preventing any of the
bootstrap samples from being representative. Yet for the Monte Carlo test, as long as the
parameter estimates obtained when fitting the model to the original sample are close to the
population values, the simulated Monte Carlo samples will be representative of the population
even if the original sample is not. On the other hand, the reverse is also true. If the original
sample is representative and the model is drastically misspecified, the Monte Carlo test will
generate flawed Monte Carlo samples, whereas the bootstrap samples will be representative. In
other words, both methods assume that the original sample is representative of the population
and that the model of interest is correctly specified, but it is possible that the bootstrap relies
more heavily on the former assumption while the Monte Carlo test relies more heavily on the
latter. Since in psychological research it is nearly always impossible to know whether either
assumption holds in practice, it is difficult to make the case for one method over the other on this
basis. However, I believe it is still important to acknowledge this distinction.
The Monte Carlo test has many strengths. It allows for the empirical construction of a
reference distribution, eliminating the need to assume that a given test statistic follows some
pre-specified form. This alternative reference distribution is also quite flexible given that it does not
need to be expressed mathematically as a probability density function. This is useful in a wide
variety of contexts, such as the case of small samples studied here, in which the test statistics do
not follow a chi-square distribution. Though not examined here, this has also been shown to be
true in other conditions, such as when data are nonnormal (Jalal & Bentler, in press). As such,
the exceptional performance of the Monte Carlo test renders SSCs virtually obsolete as it
improves on their strengths and does not possess their weaknesses.
Despite this, the Monte Carlo test does have weaknesses of its own. One obvious
weakness involves the generation of the Monte Carlo samples. Since the data must be simulated
from a model, the parameter estimates used to simulate the Monte Carlo samples cannot contain
any Heywood cases. This is because it is difficult (if not impossible) for conventional software
programs to generate data with, for example, negative residual variances. Though not typically
problematic when simulating data from the model of interest (e.g. in simulating data from a
proportional change model to examine model fit), it can be troublesome when fitting a
comparison model. For example, when comparing a no change model to a proportional change
model, data must be generated from the no change model. If the no change model is highly
inappropriate for the data, Heywood cases can appear among its parameter estimates, rendering the
Monte Carlo test unusable. It should be noted that Heywood cases are allowed when fitting the
model to Monte Carlo samples; it is only when generating these samples that problems arise,
making it necessary that no Heywood cases occur when models are fit to the original sample.
Another obvious weakness is computation time. For each analysis, 1,000 Monte Carlo
samples must be simulated and the models of interest must be fit to all of them, making the
computation take over 1,000 times longer than fitting a single model to a single sample. While
this can potentially be problematic in larger samples that take longer to fit, the small samples
considered here can be fit relatively quickly, assuaging concerns related to this. Although
simulation studies examining it would require extensive computational resources, the typical user
fitting a model to an empirical dataset should not have to wait more than 15 minutes at most for
the Monte Carlo test to be implemented. For reference, when applied to the empirical data from
Study 5, the Monte Carlo test took under 9 minutes to complete on a Windows computer with an
AMD FX-8350 processor and 16 GB of RAM. For those wishing to use multiple cores, the
procedure lends itself well to parallelization, reducing run time even further.
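As a rough illustration of that point, the base parallel package can distribute the Monte Carlo samples across cores; fit_one_sample() below is a hypothetical wrapper for the per-sample work (simulate, mask, fit both models, return the likelihood ratio statistic), not a function defined in this dissertation.

library(parallel)

# A socket-based cluster works on Windows as well as Unix-like systems
cl <- makeCluster(detectCores() - 1)
clusterEvalQ(cl, library(lavaan))       # load lavaan on each worker
clusterExport(cl, "fit_one_sample")     # ship the per-sample function to the workers
T_star <- unlist(parLapply(cl, seq_len(1000), function(i) fit_one_sample()))
stopCluster(cl)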
The extension of the Monte Carlo test proposed in this dissertation did perform well, but
it is by no means perfect; several improvements can be made. First, I accounted for missing data
by simply masking simulated data to match the proportions of missing data in the original
sample. This was done because the study by McNeish and Harring (2017) suggested that only the
amount of missing data influenced model fit, not its nature. However, these authors were more
interested in how the amount of missing data affected SSCs. It is possible that the missingness
mechanism may need to be incorporated in the simulation of Monte Carlo samples. Though I
only considered MCAR data, future research should extend the Monte Carlo test to
accommodate MAR data as well, as this is far more common in psychological research.
Additionally, the current work considered only data that followed a multivariate normal
distribution. Though this was done to serve as a baseline, the model should be studied under
conditions of increased variety both in the data and in the models. Jalal and Bentler (in press)
showed that the Monte Carlo test can be performed using T_SB instead of T_ML for confirmatory
factor models, and I see no reason why that should not extend to LCSMs. This would allow for
the relaxation of the multivariate normality assumption of the data, making it possible to use the
procedure even if this assumption is not met.
Similarly, it would be beneficial to verify that the Monte Carlo test still performs well
under variations and extensions of the LCSMs used in this dissertation. For example, the models
could be examined with different parameter estimates, factor levels, residual structures, and so
on. While this work only examined the univariate LCSM, the bivariate LCSM (McArdle, 2001)
could also be considered. The bivariate LCSM extends its univariate counterpart to two
variables, shedding light on both how these variables change over time as well as the dynamics
between them. Additionally, the Monte Carlo test could be extended to accommodate multiple
group models, allowing for the assessment of group differences in change patterns in small
samples.
Study 5 applied the LCSM to an empirical dataset regarding cognitive aging in adults.
Although T_BA suggested that the no change model was more appropriate, T_ML, T_YU, T_SW, T_SB, and
the Monte Carlo test suggested that the proportional change model fit the data better. Since T_ML
and the Monte Carlo test performed best in the simulation studies with regard to model
comparison, I selected the proportional change model as the final model. The Monte Carlo test
also suggested that the proportional change model provided a good fit to the data. Parameter
estimates indicated a great deal of individual differences between participants at the start of the
study, and a slight improvement in scores over time. I believe that this slight positive increase is
due to practice or retest effects, which in this case overcame the expected slight cognitive
decline. Future research should attempt to untangle these retest effects from the change in
cognitive ability itself.
In sum, this dissertation demonstrates that simple LCSMs can be fit to samples as small
as N=20 with missing data. Conventional SEM software produces excellent convergence rates
and estimates across the board. Where model fit indices and SSCs fail, the Monte Carlo test
succeeds, both in overall model fit and in model comparison via the likelihood ratio test. The
Monte Carlo test represents another step toward an exciting future, one in which we are no
longer limited by the constraints of mathematical distributions, but only by the speed of our
computers. And our computers are only getting faster.
References
Anderson, J. C., & Gerbing, D. (1984). The effect of sampling error on convergence, improper
solutions, and goodness-of-fit indices for maximum likelihood confirmatory factor
analysis. Psychometrika, 49, 155–173.
Bartlett, M. S. (1950). Tests of significance in factor analysis. British Journal of Psychology
(Statistical Section), 3, 77–85.
Bartlett, M. S. (1963). The spectral analysis of point processes. Journal of the Royal Statistical
Society, B, 25, 264–296.
Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Kromrey, J. D., & Ferron, J. M. (2014). How
low can you go? Methodology, 10, 1–11.
Bentler, P. M. (1990). Comparative fit indexes in structural models. Psychological Bulletin, 107,
238–246.
Bentler, P. M., & Chou, C. H. (1987). Practical issues in structural modeling. Sociological
Methods & Research, 16, 78–117.
Beran, R., & Srivastava, M. S. (1985). Bootstrap and confidence regions for functions of a
covariance matrix. Annals of Statistics, 13, 95–115.
Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective.
Hoboken, NJ: Wiley.
Bollen, K. A., & Stine, R. A. (1992). Bootstrapping goodness-of-fit measures in structural
equation models. Sociological Methods & Research, 21, 205–229.
Boomsma, A. (1985). Nonconvergence, improper solutions, and starting values in LISREL
maximum likelihood estimation. Psychometrika, 50, 229–242.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology,
31, 144–152.
Brammer, R. J. (2003). Modelling covariance structure in ascending dose studies of isolated
tissues and organs. Pharmaceutical Statistics, 2, 103–112.
Browne, M. W. (1982). Covariance structures. In D. M. Hawkins (Ed.), Topics in applied
multivariate analysis (pp. 72–142). Cambridge, UK: Cambridge University Press.
Browne, M. W. (1984). Asymptotically distribution–free methods for the analysis of covariance
structures. British Journal of Mathematical and Statistical Psychology, 37, 62–83.
Chen, F., Bollen, K. A., Paxton, P., Curran, P. J., & Kirby, J. B. (2001). Improper solutions in
structural equation models: Causes, consequences, and strategies. Sociological Methods
and Research, 29, 468−508.
Chou, C. P., Bentler, P. M., & Pentz, M. A. (1998). Comparisons of two statistical approaches to
study growth curves: The multilevel model and the latent curve analysis. Structural
Equation Modeling, 5, 247–266.
Comrey, A. L., & Lee, H. B. (1992). A first course in factor analysis. Hillsdale, NJ: Erlbaum.
Curran, P. J. (2003). Have multilevel models been structural equation models all along?
Multivariate Behavioral Research, 38, 529–569.
Curran, P. J., Bollen, K. A., Chen, F., Paxton, P., & Kirby, J. B. (2003). Finite sampling
properties of the point estimates and confidence intervals of the RMSEA. Sociological
Methods & Research, 32, 208–252.
Curran, P. J., Bollen, K. A., Paxton, P., Kirby, J., & Chen, F. (2002). The noncentral chi-square
distribution in misspecified structural equation models: Finite sample results from a
Monte Carlo simulation. Multivariate Behavioral Research, 37, 1–36.
Curran, P. J., West, S. G., & Finch, J. F. (1996). The robustness of test statistics to nonnormality
and specification error in confirmatory factor analysis. Psychological Methods, 1, 16–29.
Efron, B. (1979). Bootstrap methods: Another look at the jackknife. Annals of Statistics, 7, 1–26.
Enders, C. K. (2002). Applying the Bollen–Stine bootstrap for goodness-of-fit measures to
structural equation models with missing data. Multivariate Behavioral Research, 37,
359–377.
Ferrer, E., Salthouse, T. A., McArdle, J. J., Stewart, W. F., & Schwartz, B. S. (2005).
Multivariate modeling of age and retest in longitudinal studies of cognitive abilities.
Psychology and Aging, 20, 412–422.
Fouladi, R. T. (2000). Performance of modified test statistics in covariance and correlation
structure analysis under conditions of multivariate nonnormality. Structural Equation
Modeling, 7, 356–410.
Fujikoshi, Y. (2000). Transformations with improved chi-squared approximations. Journal of
Multivariate Analysis, 72, 249–263.
Gerbing, D. W., & Anderson, J. C. (1987). Improper solutions in the analysis of covariance
structures: Their interpretability and a comparison of alternate respecifications.
Psychometrika, 52, 99–111.
Ghisletta, P., & McArdle, J. J. (2012). Latent curve models and latent change score models
estimated in R. Structural Equation Modeling, 19, 651–682.
Gorsuch, R. L. (1983). Factor analysis (2nd ed.). Hillsdale, NJ: Erlbaum.
Grimm, K. J., & Widaman, K. F. (2010). Residual structures in latent growth curve
modeling. Structural Equation Modeling, 17, 424–442.
Hamagami, F., & McArdle, J. J. (2001). Advanced studies of individual differences linear
dynamic models for longitudinal data analysis. In G. Marcoulides & R. Schumacker
(Eds.), New developments and techniques in structural equation modeling (pp. 203–246).
Mahwah, NJ: Lawrence Erlbaum Associates, Inc.
Herzog, W., & Boomsma, A. (2009). Small-sample robust estimators of noncentrality-based and
incremental model fit. Structural Equation Modeling, 16, 1–27.
Herzog, W., Boomsma, A., & Reinecke, S. (2007). The model-size effect on traditional and
modified tests of covariance structures. Structural Equation Modeling, 14, 361–390.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling:
An overview and a meta-analysis. Sociological Methods & Research, 26, 329–367.
Hope, A. C. A. (1968). A simplified Monte Carlo test procedure. Journal of the Royal Statistical
Society, B, 30, 582–598.
Hu, L. T., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis:
Conventional criteria versus new alternatives. Structural Equation Modeling, 6, 1–55.
Hu, L., Bentler, P. M. & Kano, Y. (1992). Can test statistics in covariance structure analysis be
trusted? Psychological Bulletin, 112, 351–362.
Jalal, S. (2017). Using Monte Carlo normal distributions to evaluate structural equation models
with nonnormal data (Doctoral dissertation). Retrieved from
https://escholarship.org/uc/item/6n79t3h0
Jalal, S., & Bentler, P. (in press). Using Monte Carlo normal distributions to evaluate structural
equation models with nonnormal data. Structural Equation Modeling.
Kline, R. B. (2011). Principles and practice of structural equation modeling. New York:
Guilford Press.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and structural
equation analysis. Mahwah, NJ: L. Erlbaum Associates.
Maas, C. J., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology,
1, 86–92.
MacCallum, R. C., Widaman, K. F., Zhang, S., & Hong, S. (1999). Sample size in factor
analysis. Psychological Methods, 4, 84–99.
Marsh, H. W., Balla, J. R., & McDonald, R. P. (1988). Goodness of fit in confirmatory factor
analysis: The effect of sample size. Psychological Bulletin, 103, 391–410.
Marsh, H. W., Hau, K.-T., & Wen, Z. (2004). In search of golden rules: Comment on hypothesis
testing approaches to setting cutoff values for fit indexes and dangers in overgeneralizing
Hu & Bentler’s (1999) findings. Structural Equation Modeling, 11, 320–341.
McArdle, J. J. (2001). A latent difference score approach to longitudinal dynamic structural
analyses. In R. Cudeck, S. duToit, & D. Sorbom (Eds.), Structural equation modeling:
Present and future (pp. 342–380). Lincolnwood, IL: Scientific Software International.
McArdle, J. J., & Epstein, D. (1987). Latent growth curves within developmental structural
equation models. Child Development, 58, 110–133.
McArdle, J. J., Fisher, G. G., & Kadlec, K. M. (2007). Latent variable analyses of age trends of
cognition in the Health and Retirement Study, 1992–2004. Psychology and Aging, 22, 525–
545.
McArdle, J. J., & Hamagami, F. (2001). Latent difference score structural models for linear
dynamic analyses with incomplete longitudinal data. In L. M. Collins & M. Sayer (Eds.),
New methods for the analysis of change (pp. 139–175). Washington, DC: American
Psychological Association.
McLachlan, G. (1987). On bootstrapping the likelihood ratio test statistic for the number of
components in a normal mixture. Journal of the Royal Statistical Society, C, 36, 318–324.
McNeish, D., & Harring, J. R. (2017). Correcting model fit criteria for small sample latent
growth models with incomplete data. Educational and Psychological Measurement, 77,
990–1018.
Meredith, W., & Tisak, J. (1990). Latent curve analysis. Psychometrika, 55, 107–122.
Muth, C., Bales, K. L., Hinde, K., Maninger, N., Mendoza, S. P., & Ferrer, E. (2015). Alternative
models for small samples in psychological research: Applying linear mixed effects
models and generalized estimating equations to repeated measures data. Educational and
Psychological Measurement, 76, 64–87.
Muthén, B. O. (2004). Mplus: Statistical analysis with latent variables: Technical appendices.
Los Angeles: Muthén & Muthén.
Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M.,
Estabrook, R., Bates, T. C., Maes, H. H., & Boker, S. M. (2016). OpenMx 2.0:
Extended structural equation and statistical modeling. Psychometrika, 81, 535–549.
Nevitt, J., & Hancock, G. R. (2001). Performance of bootstrapping approaches to model test
statistics and parameter standard error estimation in structural equation modeling.
Structural Equation Modeling, 8, 353–377.
Nevitt, J., & Hancock, G. R. (2004). Evaluating small sample approaches for model test statistics
in structural equation modeling. Multivariate Behavioral Research, 39, 439–478.
Nunnally, J. C. (1967). Psychometric theory. New York, NY: McGraw-Hill.
Nylund, K. L., Asparouhov, T., & Muthén, B. (2007). Deciding on the number of classes in
latent class analysis and growth mixture modeling: A Monte Carlo simulation study.
Structural Equation Modeling, 14, 535–569.
R Core Team. (2017). R: A language and environment for statistical computing. Vienna, Austria:
R Foundation for Statistical Computing.
Rosseel, Y. (2012). Lavaan: An R package for structural equation modeling. Journal of
Statistical Software, 48(2), 1–36.
Rubin, D. B., & Schenker, N. (1986). Multiple imputation for interval estimation from simple
random samples with ignorable nonresponse. Journal of the American Statistical
Association, 81, 366–374.
Satorra, A., & Bentler, P. M. (1994). Corrections to test statistics and standard errors in
covariance structure analysis. In A. von Eye & C. C. Clogg (Eds.), Latent variables
analysis: Applications for developmental research (pp. 399–419). Thousand Oaks, CA:
Sage.
Satorra, A., & Bentler, P. M. (2001). A scaled difference chi-square test statistic for moment
structure analysis. Psychometrika, 66, 507–514.
Savalei, V. (2010). Small sample statistics for incomplete nonnormal data: Extensions of
complete data formulae and a Monte Carlo comparison. Structural Equation Modeling,
17, 241–264.
Steiger, J. H. (2016). Notes on the Steiger–Lind (1980) handout. Structural Equation
Modeling, 23, 777–781.
Steiger, J. H., & Lind, J. C. (1980, May). Statistically-based tests for the number of common
factors. Paper presented at the annual meeting of the Psychometric Society, Iowa City,
IA.
Swain, A. J. (1975). Analysis of parametric structures for variance matrices (Unpublished
doctoral dissertation). Department of Statistics, University of Adelaide, Adelaide,
Australia.
Timmons, A. C., & Preacher, K. J. (2015). The importance of temporal design: How do
measurement intervals affect the accuracy and efficiency of parameter estimates in
longitudinal research? Multivariate Behavioral Research, 50, 41–55.
Tucker, L. R., & Lewis, C. (1973). A reliability coefficient for maximum likelihood factor
analysis. Psychometrika, 38, 1–10.
Widaman, K. F., & Thompson, J. S. (2003). On specifying the null model for incremental fit
indices in structural equation modeling. Psychological Methods, 8(1), 16–37.
Willett, J. B., & Sayer, A. G. (1994). Using covariance structure analysis to detect correlates and
predictors of individual change over time. Psychological Bulletin, 116, 363–381.
Wolf, E. J., Harrington, K. M., Clark, S. L., & Miller, M. W. (2013). Sample size requirements
for structural equation models: An evaluation of power, bias, and solution
propriety. Educational and Psychological Measurement, 73, 913–934.
Woodcock, R. W., & Mather, N. (2001). Woodcock-Johnson Tests of Achievement-Examiner's
Manual, Standard and Extended Manual. Riverside, IL: The Riverside Publishing
Company.
Woodcock, R. W., McGrew, K. S., & Mather, N. (2001). Woodcock-Johnson III Cognitive
Battery and Achievement Battery. Riverside, IL: The Riverside Publishing Company.
Yuan, K.-H. (2005). Fit indices versus test statistics. Multivariate Behavioral Research, 40,
115–148.
Yuan, K.-H., & Bentler, P. M. (2000). Three likelihood-based methods for mean and covariance
structure analysis with nonnormal missing data. Sociological Methodology 2000 (pp.
165–200). Washington, DC: American Sociological Association.
Yuan, K.-H., Tian, Y., & Yanagihara, H. (2015). Empirical correction to the likelihood ratio
statistic for structural equation modeling with many variables. Psychometrika, 80, 379–
405.
Tables
Table 1. Study 1: Convergence rates by sample size (N), number of time points (t), percent
missing data (m), and model fit to the data. (An illustrative lavaan sketch of the dual change
score model follows panel (d) of this table.)
(a) No change data
Model
N t m No Proportional Constant Dual
20 4 0 1.000 .984 1.000 .711
10 1.000 .995 1.000 .736
20 1.000 .999 .998 .717
6 0 1.000 1.000 1.000 .716
10 1.000 1.000 .999 .727
20 1.000 1.000 .997 .720
8 0 1.000 .998 1.000 .746
10 1.000 1.000 1.000 .744
20 1.000 1.000 1.000 .723
30 4 0 1.000 .981 1.000 .738
10 1.000 .999 1.000 .727
20 1.000 .999 1.000 .727
6 0 1.000 .999 1.000 .738
10 1.000 1.000 1.000 .710
20 1.000 1.000 1.000 .728
8 0 1.000 .999 1.000 .704
10 1.000 .999 1.000 .707
20 1.000 1.000 1.000 .751
50 4 0 1.000 .994 1.000 .731
10 1.000 .998 1.000 .700
20 1.000 1.000 1.000 .718
6 0 1.000 1.000 1.000 .700
10 1.000 1.000 1.000 .679
20 1.000 1.000 1.000 .745
8 0 1.000 .998 1.000 .683
10 1.000 1.000 1.000 .745
20 1.000 1.000 1.000 .679
100 4 0 1.000 .994 1.000 .662
10 1.000 1.000 1.000 .711
20 1.000 1.000 1.000 .703
6 0 1.000 1.000 1.000 .659
10 1.000 1.000 1.000 .685
20 1.000 1.000 1.000 .711
8 0 1.000 .998 1.000 .682
10 1.000 1.000 1.000 .676
20 1.000 1.000 1.000 .683
(b) Proportional change data
Model
N t m No Proportional Constant Dual
20 4 0 1.000 1.000 1.000 .999
10 1.000 1.000 1.000 .997
20 1.000 1.000 .999 .994
6 0 1.000 1.000 1.000 .994
10 1.000 1.000 1.000 .997
20 .986 1.000 .993 .993
8 0 1.000 1.000 1.000 .994
10 1.000 1.000 .997 .990
20 .957 1.000 .979 .993
30 4 0 1.000 1.000 1.000 .999
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 .999
20 1.000 1.000 .998 .999
8 0 1.000 1.000 1.000 .997
10 1.000 1.000 1.000 1.000
20 .983 1.000 1.000 .998
50 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 .999
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
100 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
(c) Constant change data
Model
N t m No Proportional Constant Dual
20 4 0 1.000 .999 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .998 1.000 1.000 1.000
30 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .999 1.000 1.000 1.000
50 4 0 1.000 .999 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
100 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
(d) Dual change data
Model
N t m No Proportional Constant Dual
20 4 0 1.000 1.000 1.000 1.000
10 .999 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .986 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .943 1.000 1.000 1.000
30 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .999 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .977 1.000 1.000 1.000
50 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
100 4 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000
20 .999 1.000 1.000 1.000
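To make concrete what is being fit in these conditions, the following is a minimal, illustrative
lavaan (Rosseel, 2012) sketch of the dual change score model for four occasions. The variable
names y1-y4, the placeholder data frame mydata, the use of FIML for the missing-data
conditions, and the starting constraints are assumptions for illustration; this is not the exact
simulation code used in Study 1.

library(lavaan)

# 'mydata' is a hypothetical data frame with repeated measures y1-y4
dual_change <- '
  # latent true scores behind the observed scores
  ly1 =~ 1*y1
  ly2 =~ 1*y2
  ly3 =~ 1*y3
  ly4 =~ 1*y4
  # unit-weighted autoregressions, so the change scores carry all of the change
  ly2 ~ 1*ly1
  ly3 ~ 1*ly2
  ly4 ~ 1*ly3
  # latent change scores
  d2 =~ 1*ly2
  d3 =~ 1*ly3
  d4 =~ 1*ly4
  # proportional change: each change depends on the preceding true score
  d2 ~ beta*ly1
  d3 ~ beta*ly2
  d4 ~ beta*ly3
  # constant change: an additive slope factor loading on every change score
  s =~ 1*d2 + 1*d3 + 1*d4
  # means, variances, and covariance of the intercept and slope factors
  ly1 ~ 1
  s ~ 1
  ly1 ~~ ly1
  s ~~ s
  ly1 ~~ s
  # time-invariant residual variance for the observed scores
  y1 ~~ ve*y1
  y2 ~~ ve*y2
  y3 ~~ ve*y3
  y4 ~~ ve*y4
  # remaining intercepts and disturbances fixed at zero
  y1 ~ 0*1; y2 ~ 0*1; y3 ~ 0*1; y4 ~ 0*1
  ly2 ~ 0*1; ly3 ~ 0*1; ly4 ~ 0*1
  d2 ~ 0*1; d3 ~ 0*1; d4 ~ 0*1
  ly2 ~~ 0*ly2; ly3 ~~ 0*ly3; ly4 ~~ 0*ly4
  d2 ~~ 0*d2; d3 ~~ 0*d3; d4 ~~ 0*d4
'

fit_dual <- sem(dual_change, data = mydata, missing = "fiml")
summary(fit_dual, fit.measures = TRUE)

The no change, proportional change, and constant change models in Table 1 can be obtained
from this sketch by dropping the slope factor, the proportional paths, or both.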
Table 2. Study 1: Percent bias for parameter estimates of data generating models. Values greater
than 5% in magnitude are given in bold.
(a) No change data
Parameters
N t m   μ_i   σ_i²   σ_e²
20 4 0 -0.3 -4.6 -0.4
10 0.2 -4.6 0.9
20 0.2 -3.3 0.0
6 0 -0.1 -6.1 -0.4
10 -0.1 -4.7 -0.3
20 -0.1 -4.9 0.1
8 0 0.2 -4.4 -0.1
10 -0.2 -6.0 0.4
20 -0.3 -5.9 -0.1
30 4 0 -0.1 -2.9 0.2
10 -0.3 -1.4 0.6
20 0.0 -4.5 0.1
6 0 0.2 -2.6 -0.3
10 -0.1 -4.1 0.1
20 0.3 -3.1 -0.2
8 0 0.0 -2.1 0.2
10 -0.2 -3.6 -0.3
20 0.1 -3.2 0.2
50 4 0 0.1 -2.5 -0.2
10 0.0 -2.1 0.6
20 0.2 -2.7 -0.6
6 0 0.0 -2.4 0.4
10 -0.1 -1.2 0.1
20 -0.1 -2.9 -0.2
8 0 0.0 -3.0 0.1
10 -0.1 -2.2 0.0
20 -0.2 -1.5 0.1
100 4 0 0.2 -1.5 0.0
10 0.2 -0.7 0.0
20 0.0 -0.9 -0.1
6 0 0.0 -1.3 -0.1
10 0.1 -0.7 0.0
20 0.1 -2.0 -0.3
8 0 0.0 -1.5 0.2
10 0.1 -1.4 -0.1
20 0.0 -1.4 -0.4
(b) Proportional change data
Parameters
N t m   μ_i   σ_i²   β   σ_e²
20 4 0 0.1 -6.1 -0.1 -2.7
10 0.2 -3.9 0.0 -3.1
20 -1.0 -4.3 0.0 -3.7
6 0 0.0 -4.0 -0.1 -2.0
10 0.0 -6.8 0.0 -1.8
20 0.1 -4.4 0.0 -1.8
8 0 -0.2 -4.7 0.0 -0.9
10 -0.2 -4.7 0.0 -1.1
20 0.4 -5.8 0.0 -1.5
30 4 0 0.0 -4.2 0.0 -0.6
10 0.2 -1.3 0.0 -1.2
20 0.1 -4.7 0.0 -1.2
6 0 0.0 -2.5 0.0 -1.0
10 0.0 -3.9 0.0 -1.2
20 0.4 -3.0 0.0 -1.2
8 0 0.2 -3.2 0.0 0.2
10 0.0 -3.6 0.0 -0.6
20 0.0 -2.8 0.0 -0.9
50 4 0 0.1 -1.2 0.0 -1.3
10 0.0 -1.6 0.0 -0.4
20 0.1 -2.1 0.0 -0.4
6 0 0.1 -2.4 0.0 -0.5
10 0.0 -2.0 0.0 -0.8
20 0.0 -1.3 0.1 -1.0
8 0 -0.1 -2.1 0.0 -0.2
10 0.0 -2.6 0.0 -0.2
20 -0.2 -1.9 0.0 -0.2
100 4 0 0.0 -1.2 0.0 -0.4
10 0.1 -1.6 0.0 -1.1
20 0.1 -0.5 0.0 -0.4
6 0 -0.1 -0.8 0.0 0.0
10 0.2 -1.1 0.0 -0.3
20 -0.1 -1.2 0.0 -0.2
8 0 0.1 -0.5 0.0 -0.2
10 -0.1 -0.5 0.0 -0.2
20 0.1 -1.3 0.0 -0.2
(c) Constant change data
Parameters
N t m   μ_i   σ_i²   μ_s   σ_s²   σ_is   σ_e²
20 4 0 -0.1 -5.4 -0.2 -5.9 -0.1 -0.3
10 0.1 -4.9 0.2 -7.1 -16.4 -0.8
20 -0.2 -5.4 0.8 -5.0 23.8 0.6
6 0 0.2 -5.1 -0.2 -4.5 0.2 -0.3
10 -0.1 -7.4 0.0 -3.7 10.7 -0.5
20 -0.1 -6.0 0.0 -7.5 56.6 -0.3
8 0 0.4 -5.7 0.1 -5.4 18.5 -0.3
10 0.3 -6.9 -0.6 -3.5 5.0 -0.1
20 0.1 -4.6 -0.1 -5.0 26.0 0.2
30 4 0 0.1 -3.9 -0.7 -3.1 -5.2 0.3
10 0.3 -1.9 -2.0 -3.4 -5.3 0.5
20 -0.2 -3.7 0.6 -4.5 18.4 1.1
6 0 0.0 -5.4 -0.2 -3.3 13.1 -0.6
10 0.2 -3.2 0.8 -3.0 5.6 0.3
20 -0.2 -4.4 0.2 -5.3 2.3 0.1
8 0 -0.1 -2.4 -0.3 -3.0 1.1 0.8
10 -0.1 -2.8 0.8 -3.6 7.3 0.4
20 -0.1 -3.9 0.3 -3.9 -12.4 0.5
50 4 0 0.0 -2.7 -0.1 -1.7 18.3 -0.3
10 -0.1 -0.9 0.2 -3.1 8.0 0.6
20 0.2 -2.2 -0.3 -1.7 -19.6 -0.2
6 0 0.1 -3.2 -0.2 -2.1 3.2 0.1
10 0.0 -1.8 -0.1 -2.7 16.9 0.1
20 0.0 -3.4 0.2 -2.7 -10.1 -0.4
8 0 0.0 -2.7 -0.7 -2.0 5.9 0.2
10 0.0 -1.9 -0.2 -2.4 17.7 -0.4
20 0.0 -1.0 0.0 -3.1 -14.1 -0.1
100 4 0 0.0 -0.8 -0.4 -1.2 15.9 0.2
10 0.0 -1.0 -0.3 -1.0 -13.2 0.2
20 0.1 -1.0 0.3 -1.6 -7.1 -0.4
6 0 -0.1 -1.5 0.0 -1.1 6.4 0.1
10 -0.1 -1.5 0.1 -0.8 -7.5 -0.1
20 0.0 -0.5 -0.2 -1.1 4.9 0.3
8 0 0.1 -0.8 -0.1 -0.9 8.8 -0.2
10 0.0 -0.9 0.0 -0.9 5.3 0.1
20 -0.1 -1.5 0.3 -0.9 -9.0 0.0
(d) Dual change data
Parameters
N t m   μ_i   σ_i²   β   μ_s   σ_s²   σ_is   σ_e²
20 4 0 0.0 -7.0 0.3 -0.9 -4.2 -21.3 -3.5
10 0.1 -5.3 -0.6 2.9 -2.2 28.7 -2.6
20 -0.8 -4.1 0.0 -1.8 -5.8 20.4 -3.5
6 0 -0.2 -6.1 0.0 0.1 -4.6 33.3 -1.7
10 -0.1 -5.0 -0.1 0.9 -6.2 -32.0 -1.9
20 0.2 -5.4 0.0 -0.4 -3.9 13.1 -1.6
8 0 -0.1 -4.5 0.1 -0.6 -5.9 8.2 -1.0
10 -0.1 -5.0 0.1 -1.2 -4.4 -7.4 -1.5
20 -0.1 -6.3 -0.1 2.2 -6.5 14.5 -1.2
30 4 0 -0.3 -6.0 -0.1 0.9 -2.8 7.9 -2.3
10 -0.1 -3.0 -0.3 2.0 -2.0 56.1 -1.5
20 0.1 -4.0 -0.2 1.0 -3.8 -17.3 -3.2
6 0 0.0 -3.1 0.1 -0.3 -3.0 15.9 -1.1
10 -0.2 -4.5 0.0 0.5 -3.2 -4.1 -0.9
20 0.4 -3.8 0.1 -0.1 -3.1 13.9 -0.3
8 0 0.2 -2.8 0.2 -0.7 -2.9 -25.6 -0.1
10 0.0 -3.0 0.1 -0.6 -4.2 -6.8 -0.8
20 -0.1 -3.6 0.0 0.2 -2.3 3.5 -1.1
50 4 0 0.1 -0.7 0.1 -0.7 -2.3 -2.9 -2.1
10 -0.1 -1.7 0.1 -0.1 -2.1 2.9 -1.3
20 -0.1 -2.2 -0.1 1.0 -1.8 8.5 -1.3
6 0 0.1 -3.4 0.1 0.0 -1.6 2.8 -0.7
10 -0.2 -2.1 -0.2 1.2 -0.7 -5.9 -0.9
20 0.1 -1.1 -0.1 0.6 -2.4 20.3 -1.5
8 0 0.0 -1.5 0.0 -0.2 -2.3 -6.9 -0.3
10 0.1 -2.5 0.0 -0.3 -1.4 -21.2 -0.2
20 -0.1 -2.6 0.1 -0.7 -1.7 11.9 0.1
100 4 0 0.0 -1.0 0.2 -1.1 -1.1 -14.4 -0.7
10 0.1 -1.1 0.1 -0.3 -0.7 -22.6 -0.7
20 0.0 -0.8 0.2 -0.4 -1.1 4.6 -0.4
6 0 0.0 -1.3 0.0 -0.5 -0.8 7.8 0.1
10 0.2 -1.6 0.0 0.1 -0.9 4.0 -0.2
20 -0.2 -1.1 0.0 0.0 -1.3 -0.7 -0.3
8 0 0.1 -0.8 0.1 -0.1 -1.1 5.9 -0.2
10 0.0 -1.0 0.0 -0.6 -0.2 1.0 -0.4
20 0.1 -0.5 0.0 0.3 -1.3 -12.7 -0.3
Table 3. Study 2: Kolmogorov-Smirnov test p values. Values less than .05 are given in bold.
Test Statistics
N t m   T_ML   T_BA(LGM)   T_BA.M(LGM)   T_BA(LCSM)   T_BA.M(LCSM)   T_YU(LGM)   T_YU.M(LGM)   T_YU(LCSM)   T_YU.M(LCSM)   T_SW   T_SW.M   T_SB
20 4 0 .000 .000 .000 .000 .000 .003 .003 .000 .000 .228 .228 .000
10 .000 .940 .002 .000 .000 .614 .090 .000 .000 .000 .000 .000
20 .000 .002 .000 .000 .000 .000 .010 .001 .000 .000 .000 .000
6 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
10 .000 .144 .000 .000 .000 .102 .000 .000 .000 .000 .000 .000
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
8 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
10 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
30 4 0 .001 .001 .001 .000 .000 .009 .009 .000 .000 .780 .780 .000
10 .000 .493 .010 .000 .000 .857 .085 .000 .000 .001 .018 .000
20 .000 .123 .025 .000 .000 .022 .189 .061 .000 .000 .000 .000
6 0 .000 .000 .000 .000 .000 .001 .001 .000 .000 .004 .004 .000
10 .000 .397 .000 .000 .000 .535 .002 .000 .000 .000 .000 .000
20 .000 .000 .019 .000 .000 .000 .060 .000 .000 .000 .000 .000
8 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
10 .000 .132 .000 .000 .000 .002 .081 .000 .000 .000
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
50 4 0 .003 .489 .489 .000 .000 .710 .710 .006 .006 .268 .268 .000
10 .000 .518 .262 .000 .000 .327 .414 .012 .000 .011 .030 .000
20 .000 .442 .026 .000 .000 .256 .113 .041 .000 .007 .092 .000
6 0 .000 .009 .009 .000 .000 .036 .036 .000 .000 .127 .127 .000
10 .000 .128 .001 .000 .000 .410 .009 .000 .000 .004 .044 .000
20 .000 .065 .018 .000 .000 .012 .278 .000 .000 .000 .000 .000
8 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
10 .000 .214 .010 .000 .000 .030 .151 .000 .000 .000
20 .000 .000 .007 .000 .000 .000 .270 .000 .000 .000
100 4 0 .196 .219 .219 .000 .000 .284 .284 .030 .030 .508 .508 .024
10 .091 .935 .758 .013 .000 .921 .901 .349 .073 .489 .663 .001
20 .047 .740 .619 .030 .000 .681 .791 .619 .015 .236 .478 .000
6 0 .018 .265 .265 .000 .000 .426 .426 .000 .000 .690 .690 .000
10 .012 .331 .055 .000 .000 .528 .127 .000 .000 .666 .953 .000
20 .000 .916 .023 .000 .000 .931 .076 .001 .000 .082 .618 .000
8 0 .000 .019 .019 .000 .000 .042 .042 .000 .000 .000
10 .000 .747 .046 .000 .000 .949 .152 .000 .000 .000
20 .000 .324 .027 .000 .000 .217 .123 .000 .000 .000
Table 4. Study 2: Type I error rates. Values outside the range of .025 to .075 are given in bold.
Test Statistics
N t m   T_ML   T_BA(LGM)   T_BA.M(LGM)   T_BA(LCSM)   T_BA.M(LCSM)   T_YU(LGM)   T_YU.M(LGM)   T_YU(LCSM)   T_YU.M(LCSM)   T_SW   T_SW.M   T_SB
20 4 0 .095 .046 .046 .003 .003 .053 .053 .022 .022 .072 .072 .143
10 .133 .056 .033 .007 .000 .062 .043 .026 .013 .089 .078 .196
20 .173 .092 .050 .021 .001 .096 .061 .058 .013 .137 .111 .273
6 0 .160 .034 .034 .000 .000 .038 .038 .001 .001 .085 .085 .246
10 .231 .050 .035 .001 .000 .056 .041 .007 .001 .117 .091 .344
20 .593 .431 .345 .067 .000 .443 .360 .330 .022 .504 .455 .509
8 0 .254 .020 .020 .000 .000 .026 .026 .000 .000 .406
10 .552 .238 .188 .000 .000 .250 .202 .094 .004 .614
20 .963 .887 .616 .000 .000 .900 .698 .309 .000 .759
30 4 0 .080 .037 .037 .006 .006 .040 .040 .018 .018 .062 .062 .101
10 .081 .047 .041 .010 .007 .050 .044 .031 .019 .065 .062 .120
20 .102 .048 .031 .009 .000 .053 .034 .033 .009 .079 .062 .168
6 0 .107 .038 .038 .001 .001 .041 .041 .008 .008 .066 .066 .151
10 .148 .049 .034 .001 .000 .056 .038 .014 .006 .090 .077 .218
20 .249 .123 .082 .035 .002 .132 .090 .072 .029 .181 .142 .322
8 0 .164 .032 .032 .000 .000 .041 .041 .000 .000 .243
10 .227 .049 .030 .000 .000 .057 .035 .005 .000 .342
20 .616 .485 .415 .069 .000 .497 .425 .359 .052 .462
50 4 0 .059 .040 .040 .022 .022 .041 .041 .031 .031 .052 .052 .076
10 .068 .047 .045 .019 .011 .050 .045 .036 .027 .061 .059 .096
20 .070 .055 .043 .029 .013 .056 .043 .042 .028 .060 .057 .104
6 0 .072 .033 .033 .005 .005 .036 .036 .011 .011 .048 .048 .103
10 .097 .049 .042 .011 .003 .050 .043 .026 .020 .066 .060 .129
20 .110 .057 .034 .014 .002 .058 .038 .030 .011 .076 .064 .159
8 0 .091 .030 .030 .002 .002 .032 .032 .009 .009 .123
10 .156 .053 .039 .002 .000 .059 .043 .014 .008 .209
20 .171 .076 .043 .017 .004 .083 .047 .032 .017 .230
100 4 0 .053 .048 .048 .043 .043 .049 .049 .045 .045 .051 .051 .056
10 .061 .047 .045 .032 .032 .048 .047 .040 .035 .054 .051 .071
20 .077 .067 .059 .049 .039 .069 .061 .058 .047 .072 .071 .094
6 0 .071 .051 .051 .031 .031 .053 .053 .040 .040 .064 .064 .081
10 .062 .042 .038 .016 .010 .042 .041 .030 .025 .051 .048 .077
20 .079 .061 .045 .031 .013 .061 .047 .041 .029 .069 .065 .102
8 0 .059 .037 .037 .012 .012 .037 .037 .025 .025 .081
10 .082 .050 .045 .012 .006 .051 .046 .030 .021 .099
20 .093 .054 .035 .015 .003 .057 .039 .029 .014 .113
Table 5. Study 2: Median approximate fit indices.
(a) RMSEA (values above .06 are given in bold)
Test Statistics
N t m   T_ML   T_BA(LGM)   T_BA.M(LGM)   T_BA(LCSM)   T_BA.M(LCSM)   T_YU(LGM)   T_YU.M(LGM)   T_YU(LCSM)   T_YU.M(LCSM)   T_SW   T_SW.M   T_SB
20 4 0 .045 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .088
10 .077 .000 .000 .000 .000 .000 .000 .000 .000 .028 .000 .114
20 .104 .000 .000 .000 .000 .018 .000 .000 .000 .073 .045 .139
6 0 .090 .000 .000 .000 .000 .000 .000 .000 .000 .037 .037 .116
10 .105 .000 .000 .000 .000 .000 .000 .000 .000 .062 .046 .136
20 .200 .141 .090 .000 .000 .146 .105 .059 .000 .172 .155 .174
8 0 .108 .000 .000 .000 .000 .000 .000 .000 .000 .134
10 .157 .070 .013 .000 .000 .079 .042 .000 .000 .166
20 .286 .219 .164 .000 .000 .224 .175 .121 .000 .224
30 4 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .039
10 .038 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .063
20 .057 .000 .000 .000 .000 .000 .000 .000 .000 .033 .000 .082
6 0 .054 .000 .000 .000 .000 .000 .000 .000 .000 .013 .013 .071
10 .062 .000 .000 .000 .000 .000 .000 .000 .000 .033 .019 .083
20 .086 .037 .000 .000 .000 .043 .000 .000 .000 .065 .050 .102
8 0 .064 .000 .000 .000 .000 .000 .000 .000 .000 .081
10 .081 .000 .000 .000 .000 .013 .000 .000 .000 .100
20 .151 .115 .087 .000 .000 .118 .093 .063 .000 .113
50 4 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .017
10 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .024
20 .007 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .037
6 0 .022 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .037
10 .029 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .045
20 .040 .000 .000 .000 .000 .000 .000 .000 .000 .025 .009 .055
8 0 .033 .000 .000 .000 .000 .000 .000 .000 .000 .042
10 .044 .000 .000 .000 .000 .000 .000 .000 .000 .054
20 .051 .019 .000 .000 .000 .023 .000 .000 .000 .062
100 4 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
10 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
20 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000
6 0 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .009
10 .007 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .011
20 .013 .000 .000 .000 .000 .000 .000 .000 .000 .000 .000 .021
8 0 .012 .000 .000 .000 .000 .000 .000 .000 .000 .018
10 .015 .000 .000 .000 .000 .000 .000 .000 .000 .021
20 .021 .000 .000 .000 .000 .000 .000 .000 .000 .027
(b) CFI (values below .95 are given in bold)
Test Statistics
N t m   T_ML   T_BA(LGM)   T_BA.M(LGM)   T_BA(LCSM)   T_BA.M(LCSM)   T_YU(LGM)   T_YU.M(LGM)   T_YU(LCSM)   T_YU.M(LCSM)   T_SW   T_SW.M   T_SB
20 4 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
10 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98
20 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00 0.96
6 0 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98
10 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 1.00 0.96
20 0.91 0.95 0.98 1.00 1.00 0.95 0.98 0.99 1.00 0.93 0.95 0.93
8 0 0.97 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.96
10 0.94 0.99 1.00 1.00 1.00 0.98 1.00 1.00 1.00 0.93
20 0.79 0.88 0.93 1.00 1.00 0.87 0.92 0.96 1.00 0.85
30 4 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
20 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
6 0 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
10 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
20 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99 0.99 0.97
8 0 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.98
10 0.98 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.97
20 0.93 0.96 0.98 1.00 1.00 0.96 0.97 0.99 1.00 0.96
50 4 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
6 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
8 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
20 0.99 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 0.99
100 4 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
6 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
8 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
(c) TLI (values below .95 are given in bold)
Test Statistics
N t m   T_ML   T_BA(LGM)   T_BA.M(LGM)   T_BA(LCSM)   T_BA.M(LCSM)   T_YU(LGM)   T_YU.M(LGM)   T_YU(LCSM)   T_YU.M(LCSM)   T_SW   T_SW.M   T_SB
20 4 0 1.00 1.02 1.02 1.04 1.04 1.01 1.01 1.03 1.03 1.01 1.01 0.99
10 0.99 1.01 1.02 1.04 1.06 1.01 1.01 1.02 1.03 1.00 1.00 0.97
20 0.97 1.00 1.02 1.04 1.09 1.00 1.02 1.02 1.05 0.99 1.00 0.95
6 0 0.98 1.01 1.01 1.06 1.06 1.01 1.01 1.03 1.03 1.00 1.00 0.97
10 0.97 1.01 1.01 1.07 1.09 1.00 1.01 1.03 1.05 0.99 1.00 0.96
20 0.90 0.95 0.98 1.04 1.13 0.94 0.97 0.99 1.05 0.92 0.94 0.92
8 0 0.97 1.01 1.01 1.10 1.10 1.01 1.01 1.05 1.05 0.96
10 0.93 0.99 1.00 1.10 1.14 0.98 1.00 1.04 1.07 0.92
20 0.77 0.87 0.93 1.06 1.24 0.86 0.92 0.96 1.08 0.84
30 4 0 1.00 1.01 1.01 1.02 1.02 1.01 1.01 1.01 1.01 1.01 1.01 1.00
10 1.00 1.01 1.01 1.02 1.03 1.01 1.01 1.01 1.02 1.00 1.00 0.99
20 0.99 1.00 1.01 1.02 1.04 1.00 1.01 1.01 1.02 1.00 1.00 0.98
6 0 0.99 1.01 1.01 1.03 1.03 1.00 1.00 1.01 1.01 1.00 1.00 0.99
10 0.99 1.00 1.01 1.03 1.04 1.00 1.01 1.02 1.02 1.00 1.00 0.98
20 0.98 1.00 1.01 1.03 1.06 0.99 1.00 1.01 1.03 0.99 0.99 0.97
8 0 0.99 1.01 1.01 1.04 1.04 1.01 1.01 1.02 1.02 0.98
10 0.98 1.00 1.01 1.04 1.06 1.00 1.00 1.02 1.03 0.97
20 0.92 0.96 0.98 1.02 1.08 0.95 0.97 0.99 1.02 0.96
50 4 0 1.00 1.00 1.00 1.01 1.01 1.00 1.00 1.01 1.01 1.00 1.00 1.00
10 1.00 1.01 1.01 1.01 1.01 1.00 1.01 1.01 1.01 1.00 1.00 1.00
20 1.00 1.00 1.01 1.01 1.01 1.00 1.01 1.01 1.01 1.00 1.00 1.00
6 0 1.00 1.00 1.00 1.01 1.01 1.00 1.00 1.01 1.01 1.00 1.00 1.00
10 1.00 1.00 1.00 1.01 1.01 1.00 1.00 1.01 1.01 1.00 1.00 1.00
20 1.00 1.00 1.00 1.01 1.02 1.00 1.00 1.01 1.01 1.00 1.00 0.99
8 0 1.00 1.00 1.00 1.02 1.02 1.00 1.00 1.01 1.01 1.00
10 0.99 1.00 1.00 1.02 1.02 1.00 1.00 1.01 1.01 0.99
20 0.99 1.00 1.00 1.02 1.03 1.00 1.00 1.01 1.02 0.99
100 4 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.00 1.00 1.00 1.00 1.00
6 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.00 1.00 1.00 1.00 1.00
8 0 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00 1.00
10 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.00 1.00 1.00
20 1.00 1.00 1.00 1.00 1.01 1.00 1.00 1.00 1.00 1.00
Table 6. Study 3: Proportion of replications for which the likelihood ratio test favored the model
with more estimated parameters. (An illustrative lavaan sketch of this likelihood ratio
comparison follows panel (d) of this table.)
(a) No change data. Values represent type I error rates. Those outside (.025, .075) are given in bold.
              NP                                              NC
N t m   T_ML   T_BA   T_BA.M   T_YU   T_YU.M        T_ML   T_BA   T_BA.M   T_YU   T_YU.M
20 4 0 .041 .010 .010 .016 .016 .054 .013 .013 .017 .017
10 .038 .007 .004 .016 .011 .055 .004 .002 .009 .004
20 .070 .013 .000 .027 .010 .071 .007 .000 .015 .005
6 0 .047 .006 .006 .013 .013 .069 .003 .003 .011 .011
10 .042 .002 .000 .013 .007 .064 .004 .000 .015 .008
20 .050 .001 .000 .012 .001 .057 .002 .000 .011 .001
8 0 .049 .000 .000 .007 .007 .068 .000 .000 .006 .006
10 .044 .000 .000 .005 .001 .062 .003 .001 .009 .004
20 .043 .000 .000 .006 .000 .077 .010 .053 .011 .001
30 4 0 .052 .024 .024 .032 .032 .064 .016 .016 .027 .027
10 .050 .026 .022 .032 .031 .060 .026 .020 .033 .030
20 .056 .027 .007 .039 .023 .070 .029 .013 .036 .025
6 0 .063 .014 .014 .031 .031 .063 .013 .013 .025 .025
10 .052 .021 .013 .034 .027 .066 .016 .007 .027 .019
20 .048 .005 .001 .019 .003 .054 .012 .002 .026 .011
8 0 .064 .010 .010 .021 .021 .071 .007 .007 .020 .020
10 .042 .004 .001 .017 .008 .058 .005 .000 .013 .010
20 .052 .008 .001 .020 .006 .055 .006 .001 .019 .004
50 4 0 .045 .027 .027 .035 .035 .048 .028 .028 .033 .033
10 .053 .032 .028 .037 .033 .055 .030 .024 .036 .032
20 .052 .029 .024 .035 .029 .054 .035 .023 .042 .033
6 0 .050 .028 .028 .032 .032 .046 .020 .020 .027 .027
10 .043 .023 .019 .027 .025 .055 .026 .022 .031 .027
20 .033 .014 .008 .020 .013 .056 .026 .009 .035 .021
8 0 .057 .025 .025 .036 .036 .064 .028 .028 .045 .045
10 .046 .018 .011 .027 .024 .038 .013 .008 .020 .017
20 .041 .013 .006 .024 .013 .061 .023 .011 .032 .021
100 4 0 .049 .038 .038 .041 .041 .044 .035 .035 .038 .038
10 .040 .033 .033 .036 .034 .042 .033 .032 .038 .036
20 .039 .032 .031 .034 .032 .044 .033 .026 .037 .031
6 0 .051 .040 .040 .043 .043 .036 .024 .024 .027 .027
10 .052 .041 .034 .044 .042 .063 .045 .042 .052 .048
20 .052 .036 .029 .043 .034 .039 .023 .018 .030 .022
8 0 .055 .033 .033 .043 .043 .051 .033 .033 .038 .038
10 .054 .043 .038 .045 .045 .067 .032 .027 .045 .038
20 .062 .043 .030 .050 .042 .062 .037 .025 .046 .036
(b) Proportional change data. NP values represent power, while PD values represent type I error
rates. Type I error rates outside (.025, .075) and power values below 1.000 are given in bold.
              NP                                              PD
N t m   T_ML   T_BA   T_BA.M   T_YU   T_YU.M        T_ML   T_BA   T_BA.M   T_YU   T_YU.M
20 4 0 1.000 1.000 1.000 1.000 1.000 .051 .006 .006 .023 .023
10 1.000 1.000 1.000 1.000 1.000 .060 .005 .001 .021 .011
20 1.000 1.000 0.998 1.000 1.000 .066 .013 .000 .023 .006
6 0 1.000 1.000 1.000 1.000 1.000 .061 .002 .002 .008 .008
10 1.000 1.000 1.000 1.000 1.000 .060 .003 .000 .005 .003
20 1.000 1.000 0.807 1.000 1.000 .070 .003 .000 .016 .003
8 0 1.000 1.000 1.000 1.000 1.000 .065 .000 .000 .005 .005
10 1.000 1.000 0.889 1.000 1.000 .059 .000 .000 .007 .001
20 1.000 1.000 0.052 1.000 0.996 .055 .003 .054 .009 .000
30 4 0 1.000 1.000 1.000 1.000 1.000 .058 .019 .019 .026 .026
10 1.000 1.000 1.000 1.000 1.000 .063 .023 .017 .035 .027
20 1.000 1.000 1.000 1.000 1.000 .067 .028 .009 .041 .024
6 0 1.000 1.000 1.000 1.000 1.000 .056 .007 .007 .014 .014
10 1.000 1.000 1.000 1.000 1.000 .075 .011 .005 .022 .014
20 1.000 1.000 1.000 1.000 1.000 .056 .016 .004 .029 .014
8 0 1.000 1.000 1.000 1.000 1.000 .050 .006 .006 .013 .013
10 1.000 1.000 1.000 1.000 1.000 .065 .008 .002 .020 .012
20 1.000 1.000 1.000 1.000 1.000 .054 .010 .000 .020 .005
50 4 0 1.000 1.000 1.000 1.000 1.000 .047 .028 .028 .034 .034
10 1.000 1.000 1.000 1.000 1.000 .070 .046 .034 .054 .049
20 1.000 1.000 1.000 1.000 1.000 .062 .031 .019 .044 .030
6 0 1.000 1.000 1.000 1.000 1.000 .051 .028 .028 .034 .034
10 1.000 1.000 1.000 1.000 1.000 .048 .025 .020 .032 .027
20 1.000 1.000 1.000 1.000 1.000 .054 .025 .011 .032 .022
8 0 1.000 1.000 1.000 1.000 1.000 .054 .019 .019 .029 .029
10 1.000 1.000 1.000 1.000 1.000 .062 .018 .012 .030 .022
20 1.000 1.000 1.000 1.000 1.000 .063 .023 .011 .039 .022
100 4 0 1.000 1.000 1.000 1.000 1.000 .048 .037 .037 .039 .039
10 1.000 1.000 1.000 1.000 1.000 .058 .047 .044 .050 .047
20 1.000 1.000 1.000 1.000 1.000 .044 .037 .026 .040 .035
6 0 1.000 1.000 1.000 1.000 1.000 .056 .039 .039 .044 .044
10 1.000 1.000 1.000 1.000 1.000 .045 .029 .025 .033 .030
20 1.000 1.000 1.000 1.000 1.000 .051 .039 .030 .043 .038
8 0 1.000 1.000 1.000 1.000 1.000 .051 .028 .028 .034 .034
10 1.000 1.000 1.000 1.000 1.000 .048 .031 .026 .034 .032
20 1.000 1.000 1.000 1.000 1.000 .059 .034 .026 .048 .033
(c) Constant change data. NC values represent power, while CD values represent type I error
rates. Type I error rates outside (.025, .075) and power values below 1.000 are given in bold.
              NC                                              CD
N t m   T_ML   T_BA   T_BA.M   T_YU   T_YU.M        T_ML   T_BA   T_BA.M   T_YU   T_YU.M
20 4 0 1.000 1.000 1.000 1.000 1.000 .050 .006 .006 .020 .020
10 1.000 1.000 1.000 1.000 1.000 .053 .017 .008 .028 .021
20 1.000 1.000 0.998 1.000 1.000 .047 .010 .001 .021 .009
6 0 1.000 1.000 1.000 1.000 1.000 .050 .003 .003 .009 .009
10 1.000 1.000 1.000 1.000 1.000 .056 .000 .000 .009 .004
20 1.000 1.000 0.706 1.000 1.000 .055 .001 .000 .010 .001
8 0 1.000 1.000 1.000 1.000 1.000 .040 .000 .000 .003 .003
10 1.000 1.000 0.817 1.000 1.000 .054 .000 .000 .004 .002
20 1.000 1.000 0.006 1.000 0.995 .049 .000 .000 .003 .000
30 4 0 1.000 1.000 1.000 1.000 1.000 .045 .017 .017 .027 .027
10 1.000 1.000 1.000 1.000 1.000 .042 .020 .015 .029 .022
20 1.000 1.000 1.000 1.000 1.000 .068 .027 .013 .044 .024
6 0 1.000 1.000 1.000 1.000 1.000 .046 .011 .011 .020 .020
10 1.000 1.000 1.000 1.000 1.000 .061 .018 .009 .028 .024
20 1.000 1.000 1.000 1.000 1.000 .060 .013 .001 .026 .012
8 0 1.000 1.000 1.000 1.000 1.000 .046 .002 .002 .016 .016
10 1.000 1.000 1.000 1.000 1.000 .056 .005 .001 .014 .011
20 1.000 1.000 1.000 1.000 1.000 .062 .002 .000 .013 .002
50 4 0 1.000 1.000 1.000 1.000 1.000 .051 .035 .035 .039 .039
10 1.000 1.000 1.000 1.000 1.000 .050 .031 .026 .034 .032
20 1.000 1.000 1.000 1.000 1.000 .050 .033 .021 .040 .031
6 0 1.000 1.000 1.000 1.000 1.000 .053 .028 .028 .039 .039
10 1.000 1.000 1.000 1.000 1.000 .047 .027 .023 .032 .029
20 1.000 1.000 1.000 1.000 1.000 .048 .023 .011 .034 .023
8 0 1.000 1.000 1.000 1.000 1.000 .055 .028 .028 .036 .036
10 1.000 1.000 1.000 1.000 1.000 .053 .017 .011 .028 .027
20 1.000 1.000 1.000 1.000 1.000 .045 .019 .004 .025 .020
100 4 0 1.000 1.000 1.000 1.000 1.000 .068 .050 .050 .057 .057
10 1.000 1.000 1.000 1.000 1.000 .052 .042 .040 .045 .044
20 1.000 1.000 1.000 1.000 1.000 .050 .045 .041 .047 .045
6 0 1.000 1.000 1.000 1.000 1.000 .043 .033 .033 .037 .037
10 1.000 1.000 1.000 1.000 1.000 .049 .039 .036 .044 .044
20 1.000 1.000 1.000 1.000 1.000 .050 .036 .029 .042 .037
8 0 1.000 1.000 1.000 1.000 1.000 .041 .028 .028 .032 .032
10 1.000 1.000 1.000 1.000 1.000 .044 .029 .029 .036 .032
20 1.000 1.000 1.000 1.000 1.000 .059 .042 .033 .053 .041
(d) Dual change data. All values represent power. Values below 1.000 are given in bold.
              NP                                              PD
N t m   T_ML   T_BA   T_BA.M   T_YU   T_YU.M        T_ML   T_BA   T_BA.M   T_YU   T_YU.M
20 4 0 1.000 1.000 1.000 1.000 1.000 1.000 .999 .999 .999 .999
10 1.000 1.000 1.000 1.000 1.000 1.000 .991 .982 .997 .994
20 1.000 1.000 .998 1.000 1.000 .992 .958 .740 .975 .926
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 .997 1.000 1.000
20 1.000 1.000 .807 1.000 1.000 1.000 1.000 .457 1.000 .996
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 .889 1.000 1.000 1.000 1.000 .668 1.000 1.000
20 1.000 1.000 .066 1.000 .996 1.000 1.000 .003 1.000 .987
30 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 .999 .999 .996 .999 .998
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
50 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
100 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
              NC                                              CD
N t m   T_ML   T_BA   T_BA.M   T_YU   T_YU.M        T_ML   T_BA   T_BA.M   T_YU   T_YU.M
20 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 .998 1.000 1.000 1.000 1.000 .979 1.000 .999
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 .999 1.000 1.000
20 1.000 1.000 .862 1.000 1.000 1.000 1.000 .479 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 .934 1.000 1.000 1.000 1.000 .566 1.000 1.000
20 1.000 1.000 .073 1.000 .999 1.000 1.000 .000 1.000 .985
30 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 .998 1.000 1.000
50 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
100 4 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
6 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
8 0 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
10 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
20 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
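The nested-model comparisons summarized in Table 6 can be reproduced for a single data set
with lavaan's chi-square difference test. The sketch below is illustrative: it assumes two nested
models have already been fit to the same data as hypothetical lavaan objects fit_nochange and
fit_dual, and it uses the conventional likelihood ratio test without any small sample correction.

# likelihood ratio (chi-square difference) test between nested LCSMs
lrt <- anova(fit_nochange, fit_dual)   # dispatches to lavTestLRT()
lrt
# The model with more estimated parameters is favored when the difference
# test rejects, e.g., lrt[2, "Pr(>Chisq)"] < .05.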
Table 7. Study 4: Monte Carlo test results under the conditions of Study 2. Column “KS test”
contains the proportion of replications whose Monte Carlo T_ML distributions matched the
distribution of T_ML across replications according to the Kolmogorov-Smirnov test (ideal value
is 1.00). Column “Type I error” contains type I error rates for the Monte Carlo test (ideal value
is .05). (An illustrative R sketch of the Monte Carlo test procedure follows this table.)
Results
N t m KS test Type I error
20 4 0 .924 .063
20 .970 .054
8 0 .993 .046
20 .815 .051
50 4 0 .996 .047
20 .990 .044
8 0 .815 .038
20 .995 .042
100 4 0 .985 .051
20 .998 .063
8 0 .975 .043
20 .988 .049
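The following is a minimal, illustrative R sketch of the Monte Carlo test for overall fit evaluated
in Table 7. It assumes a fitted lavaan object and complete multivariate normal data; the number
of Monte Carlo samples, the use of MASS::mvrnorm for data generation, and the p value formula
(in the spirit of Hope, 1968) are illustrative choices rather than the exact implementation used in
Study 4.

library(lavaan)
library(MASS)

monte_carlo_test <- function(model_syntax, fit, R = 500) {
  t_obs <- fitMeasures(fit, "chisq")    # observed T_ML
  n     <- lavInspect(fit, "nobs")      # sample size
  imp   <- fitted(fit)                  # model-implied means and covariance
  t_rep <- numeric(R)
  for (r in seq_len(R)) {
    # generate data from the fitted model, refit the same model, record T_ML
    sim <- as.data.frame(mvrnorm(n, mu = imp$mean, Sigma = imp$cov))
    t_rep[r] <- fitMeasures(sem(model_syntax, data = sim), "chisq")
  }
  # Monte Carlo p value: proportion of replicate statistics at least as extreme
  (1 + sum(t_rep >= t_obs)) / (R + 1)
}

# hypothetical usage, given the dual change sketch after Table 1:
# p <- monte_carlo_test(dual_change, fit_dual)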
Table 8. Study 4: Monte Carlo test results for the likelihood ratio test with N = 20, t = 4, and
m = 20%. Values represent either type I error rates or power, depending on the data-generating model.
                        Comparison
Data            NP       NC       PD       CD
No              .067     .042     –        –
Proportional    1.000    –        .046     –
Constant        –        1.000    –        .042
Dual            .982     1.000    .983     1.000
Figures
Figure 1. LGM with linear basis.
Figure 2. Latent change score models.
(a) No change model
(b) Proportional change model
(c) Constant change model
(d) Dual change model
Figure 3. Study 2: Distributions of each test statistic for constant change model with
N=30/t=8/m=10%.
Figure 4. Likelihood ratio test comparisons for LCSM.
Figure 5. Study 4: Distributions of test statistics for constant change model with
N=20/t=4/m=20%, including that of Monte Carlo test.
Figure 6. Number Series score by age.