EVALUATING STANDARD ERROR ESTIMATORS FOR MULTILEVEL MODELS ON
SMALL SAMPLES WITH HETEROSCEDASTICITY AND UNBALANCED CLUSTER
SIZES
by
Yichi Zhang
A Thesis Presented to the
FACULTY OF THE USC DANA AND DAVID DORNSIFE COLLEGE OF LETTERS, ARTS
AND SCIENCES
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the degree
MASTER OF ARTS
(PSYCHOLOGY)
August 2021
Copyright 2021 Yichi Zhang
TABLE OF CONTENTS
List of Figures
Abstract
Introduction
    Multilevel Models
        Overview
        Estimation Methods
    Problems With Real Data
        Heteroscedasticity
        Factors Influencing Standard Error Estimation
    Adjustments for Estimated Covariance Matrix
        Kenward-Roger Correction
        Adjusted Cluster-Robust Standard Errors
    Current Study
Methods
    Design Conditions
        Number of Clusters
        Average Cluster Size
        Intraclass Correlation (ICC)
        Imbalance of Cluster Size
        Heteroscedasticity
    Evaluation Criteria
Results
    Relative Bias for Standard Errors
    Type I Error Rate
Discussion
References
Appendices
    Appendix A: Bias
        Intercept (Gamma 00)
        Gamma 01 (Coefficient for Z)
        Gamma 10 (Coefficient for X)
    Appendix B: Type I Error Rates
        Gamma 01 (Coefficient for Z)
        Gamma 10 (Coefficient for X)
List of Figures
Figure 1: Relative Bias for Standard Errors of Intercept (γ00)
Figure 2: Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level One
Figure 3: Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level Two
Figure 4: Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level One
Figure 5: Relative Bias for Standard Errors of Within-Cluster Coefficient (γ10)
Figure 6: Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01)
Figure 7: Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01)
Figure 8: Type I Error Rate for Standard Errors of Within-Cluster Coefficient (γ10)
Abstract
Multilevel modeling (MLM) is commonly used in psychological research to model
clustered data. One of the assumptions of MLM is the homogeneity of variance. However, data in
experimental research are usually small samples with heteroscedastic variances and unbalanced
cluster sizes. The fixed-effect estimates given by the maximum likelihood method remain
unbiased, but the standard errors for the fixed effects are underestimated, resulting in inaccurate
inferences and inflated Type I error rates. Small-sample corrections have been used to correct the
bias in fixed effects standard errors and provide valid inferences, such as the Kenward-Roger
(KR) adjustment and the adjusted cluster-robust standard errors (CR-SEs) with the Satterthwaite
approximation for t-tests. The current study compares these two standard error estimators for
small samples using a Monte Carlo simulation study. Results show that KR's adjusted standard errors
have large biases and inflated Type I error rates for the between-cluster effect in the presence of level
two heteroscedasticity. In contrast, the adjusted CR-SEs maintain Type I error rates close to
the nominal level. Thus, it is recommended to use the adjusted CR-SEs to guard against
unmodeled heterogeneity when the interest is in making inferences about the between-cluster effect.
Keywords: multilevel modeling, variance estimation, heteroscedasticity, small samples,
unbalanced clusters
Introduction
Clustered data is common in social science research. For example, in educational
studies, students are nested within schools; in longitudinal studies, observations are nested
within individuals. In this type of hierarchical data structure, errors are correlated within
clusters but not across clusters. However, statistical models such as linear regression models
assume errors to be independent and identically distributed (i.i.d.). Ignoring the
within-cluster error correlation results in underestimated standard errors, misleadingly
narrow confidence intervals, larger t statistics, and smaller p-values (Cameron & Miller,
2015). Several statistical methods can be used to analyze clustered data; one is the
multilevel model approach (MLM), a widely used method in psychological research
(Raudenbush & Bryk, 2002; Snijders, 2011). Some alternative methods are cluster-robust
standard errors (CR-SEs) with ordinary least squares (OLS) estimates, generalized estimating
equations (GEE), and Taylor series linearization (McNeish & Stapleton, 2017).
Although MLM accounts for the within-cluster error correlation by specifying the
error structure at multiple levels, it assumes homogeneity of variance. For example, a
two-level random slopes model requires equal variances of errors across observations and
equal variances of random effects across clusters (Snijders, 2011). However, in reality this
assumption is almost always violated. Real data often have heteroscedastic variances paired
with unbalanced cluster sizes and small sample sizes. These features threaten the validity of
the variance estimation given by MLM, potentially leading to erroneous conclusions for
hypothesis testing. The current study examines small samples with unbalanced cluster sizes
to compare two commonly used standard error estimators when heteroscedasticity exists.
This manuscript introduces MLM and its assumptions, as well as related standard
error estimators and hypothesis testing. Next, the features of real data, including
heteroscedastic variance, unbalanced cluster sizes and small sample sizes, are discussed in
detail with an emphasis on how they influence the variance estimation. Then, adjustments
for standard error estimation and associated inferences are reviewed, specifically the
Kenward-Roger (KR) correction and the adjusted CR-SEs with the Satterthwaite
approximation. A Monte Carlo simulation is conducted to compare these two standard error
estimators in small samples with heteroscedasticity and unbalanced cluster sizes. Lastly, I
provide some recommendations and directions for future research.
Multilevel Models
Overview
With clustered data, MLM is a popular method in psychological research because of
its advantages in making inferences for cluster level parameters and modeling cross-level
interactions (McNeish & Stapleton, 2017). MLM accounts for the clustered nature of data by
differentiating the effects of explanatory variables into fixed effects and random effects.
Fixed effects describe the shared relationship between the outcome variable and the
explanatory variable across clusters and observations. Random effects represent the
cluster-specific deviation in intercepts and slopes from the shared regression line (Snijders,
2011). MLM partitions the variance of the outcome variable into between-cluster variance
and within-cluster variance, thereby obtaining more accurate estimates of standard errors
than single-level regressions (McNeish & Stapleton, 2017).
Similar to the assumptions of the linear regression analysis, the assumptions of MLM
include linear relationships, normality, and homoscedasticity (Maas & Hox, 2004; Snijders,
2011). In MLM specifically, homoscedasticity means variance of errors and random effects
should be equal across observations and clusters (Snijders, 2011). However, this assumption
can be relaxed by adding the relations between predictor variables and variance into the
model (Goldstein, 2010; Raudenbush & Bryk, 2002; Snijders, 2011).
Assume y_ij is the observed outcome for person i in cluster j, and x_ij is the
explanatory variable, or predictor, on the lower level for the same person. β_0j is the
cluster-specific intercept, β_1j is the coefficient for the lower-level predictor x_ij, and e_ij is the
error. In MLM, we assume all errors or residuals are mutually independent and have a mean
of 0. The level one regression can be written as

y_{ij} = \beta_{0j} + \beta_{1j} x_{ij} + e_{ij}

The cluster-level regression models for β_0j and β_1j are

\beta_{0j} = \gamma_{00} + \gamma_{01} z_j + u_{0j}
\beta_{1j} = \gamma_{10} + \gamma_{11} z_j + u_{1j}

γ_00 is the grand intercept, z_j is an explanatory variable on the cluster level for cluster j, γ_01
is the coefficient for z_j, u_0j is the cluster level deviation from the mean intercept when
z_j = 0, and u_1j is the cluster level deviation from the mean slope when z_j = 0. γ_10 is the
average slope across clusters. Note that x_ij does not have a group component. Also, z_j and
x_ij are two separate variables.
Combining the equations above, we obtain a single regression model:

y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij}   (1)

The fixed effect, γ_00 + γ_01 z_j + γ_10 x_ij + γ_11 z_j x_ij, represents the common relationship
between predictors and outcomes regardless of the clusters. The random effect,
u_1j x_ij + u_0j + e_ij, represents the deviation of each cluster or observation from the fixed effects.
The variance components are assumed to follow a bivariate normal distribution,

\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \tau_0^2 & \tau_{01} \\ \tau_{01} & \tau_1^2 \end{pmatrix} \right)

τ_0² and τ_1² are the variances of u_0j and u_1j, respectively, and τ_01 is the covariance between
the random intercept and the random slope. The level one error is assumed to follow a
normal distribution, e_ij ∼ N(0, σ²).
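As an illustration, a model like equation (1) can be fit with the R package lme4 (introduced later in the Methods section). This is a minimal sketch; the data frame and variable names (dat, y, x, z, cluster) are hypothetical.

library(lme4)
# Random intercepts and random slopes of x across clusters; fixed effects for x, z,
# and the cross-level interaction x:z, estimated with REML (the lmer default)
fit <- lmer(y ~ x * z + (x | cluster), data = dat, REML = TRUE)
summary(fit)  # fixed-effect estimates plus variance components tau_0^2, tau_1^2, sigma^2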
Applied researchers usually are most interested in the fixed effect parameters,
including their point estimates and hypothesis tests. The two major estimation methods,
maximum likelihood (ML) and restricted maximum likelihood (REML), generate the most
probable values of the model parameters given the observed data (Raudenbush & Bryk, 2002).
Both estimators are asymptotically unbiased under the assumption that the cluster level
deviations and the level one errors are normally distributed (Goldstein, 1989). The details
are discussed in the next section.
In addition to parameter estimates, hypothesis testing is also a main focus. In MLM,
the single-parameter test statistic γ̂ / SE(γ̂) under the null hypothesis H_0: γ = 0 follows a t
distribution (Snijders, 2011). The degrees of freedom (df) equal the total number of units
at the specified level minus the number of predictors at the same level minus one. When the
df is large, the t distribution is close to a normal distribution. Multiparameter tests for MLM
include the multivariate Wald test and the likelihood ratio test (Snijders, 2011). The
multivariate Wald test is only applicable to fixed-effects parameters. The distribution of
the test statistic under the null hypothesis is an F distribution. The df in the numerator is the
number of tested regression parameters, and the df in the denominator is calculated in the
same way as the df in the t test. This F distribution can be approximated by a chi-square
distribution with df equal to the number of tested parameters. The likelihood ratio test can
be used both for multiparameter tests and for tests of random-effects parameters by examining
whether the difference in deviance between the models is statistically significant. The test
statistic under the null hypothesis follows a chi-square distribution with degrees of freedom
depending on the type of tested parameters (e.g., random intercept, random slopes). Further
details of hypothesis testing in MLM can be found in Snijders (2011).
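As a brief sketch of these tests in R (using lme4 and lmerTest, which are introduced in the Methods section; the data frame and variable names are hypothetical), single-parameter t tests and a likelihood ratio test might look like the following.

library(lmerTest)  # makes summary() report t tests with Satterthwaite df
fit <- lmer(y ~ x * z + (x | cluster), data = dat)
summary(fit)$coefficients            # single-parameter t tests for each fixed effect
# Likelihood ratio test for the cross-level interaction (fit both models with ML)
fit0 <- lmer(y ~ x + z + (x | cluster), data = dat, REML = FALSE)
fit1 <- lmer(y ~ x * z + (x | cluster), data = dat, REML = FALSE)
anova(fit0, fit1)                    # chi-square test, df = number of tested parameters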
Estimation Methods
The default estimation method for MLM in most statistical software is the ML
estimation or the REML estimation. In this section, I first review the details of these two
estimators and discuss their limitations, then introduce the cluster-robust standard errors,
a variance estimator that can be used in conjunction with the REML estimates.
Maximum Likelihood Estimation and Restricted Maximum Likelihood
Estimation. ML gives parameter estimates that have the highest likelihood by
simultaneously estimating variance components and fixed effects components (Raudenbush &
Bryk, 2002). Since these two components are dependent, an iterative approach such as the
expectation maximization (EM) algorithm is needed to find the maximum likelihood
estimates. Specifically, the fixed effects are estimated assuming the random effects are
missing in the initial iteration, then used for estimating variance components. In the second
iteration, the variance estimates from the first iteration are used to update the fixed effect
estimates. This process continues until the estimates converge (McNeish, 2017).
However, this procedure ignores the variability of the fixed-effect estimates and does not account
for the loss of degrees of freedom in fixed-effects estimation. These problems are more
pronounced with small samples because sampling variability is large and the impact of the lost
degrees of freedom is substantial. Thus, although the parameter estimates are still unbiased, the
variance components are underestimated, resulting in underestimated standard errors and
biased inference (Browne & Draper, 2006).
Restricted maximum likelihood estimation (REML) improves the performance of the
ML method by separating the estimation of fixed effects and variance components (McNeish,
2017). Specifically, REML first obtains the variance components assuming fixed effects equal
to zero and then uses the generalized least square (GLS) method to estimate the fixed effects.
Consider a two-level MLM with a single level-one predictor; in matrix notation, equation (1) is

Y_j = X_j \gamma + X_j u_j + \epsilon_j

The variance of Y_j can be written as (Raudenbush & Bryk, 2002, p. 278)

\mathrm{Var}(Y_j) = \mathrm{Var}(X_j u_j + \epsilon_j) = V_j = X_j T X_j^{\top} + \sigma^2 I_{n_j}

where T is the covariance matrix of the random effects. REML uses generalized least squares (GLS) estimation, which assumes the variance components are known, so the regression coefficients can be written as

\hat{\gamma} = \left( \sum_{j=1}^{J} X_j^{\top} V_j^{-1} X_j \right)^{-1} \sum_{j=1}^{J} X_j^{\top} V_j^{-1} Y_j

The variance of the regression coefficients is

\mathrm{Var}_{GLS}(\hat{\gamma}) = \left( \sum_{j=1}^{J} X_j^{\top} V_j^{-1} X_j \right)^{-1}   (2)
Note that both ML and REML assume the residuals at all levels follow normal
distributions. When this assumption is violated, the parameter estimates are still
asymptotically unbiased, but the standard error estimates are inaccurate (Goldstein, 2010).
CR-SEs. Some adjustments are proposed to correct the standard error estimates
given by the ML or REML when distributional assumptions are violated. One adjustment is
to apply the robust standard errors, which are generated by first estimating the regression
model with ordinary least squares (OLS) and then adjusting the standard errors using the
heteroscedasticity-consistent (sandwich) estimator proposed by White (1980). The resulting standard errors are called
robust standard errors or heteroscedastic-robust standard errors. Zeger et al. (1988) applied
this estimator to clustered data, which generates the CR-SEs (also called heteroscedastic-
and cluster-robust standard errors). CR-SEs do not place assumptions on the correlation
structure of errors, instead they assume the observations can be grouped into mutually
independent clusters (Pustejovsky & Tipton, 2018). Although MLM already accounts for the
error correlation by specifying the random components, CR-SEs can still be applied to MLM
to account for model misspecification, such as unmodeled heterogeneity (Hox et al., 2010;
McNeish & Stapleton, 2017; Raudenbush & Bryk, 2002; Yuan & Bentler, 2002). Raudenbush
and Bryk (2002) extended the heteroscedastic-robust variance estimator to GLS estimates
(p. 278):
\mathrm{Var}_{Robust}(\hat{\gamma}) = \left( \sum_{j=1}^{J} X_j^{\top} V_j^{-1} X_j \right)^{-1} \sum_{j=1}^{J} X_j^{\top} V_j^{-1} (Y_j - X_j \hat{\gamma})(Y_j - X_j \hat{\gamma})^{\top} V_j^{-1} X_j \left( \sum_{j=1}^{J} X_j^{\top} V_j^{-1} X_j \right)^{-1}   (3)
The details of this heteroscedastic-robust estimator for MLM can be found in
Raudenbush and Bryk (2002). In short, if the assumptions of normality or homoscedasticity are
violated, GLS will produce unbiased regression coefficients but biased standard error
estimates. Specifically, Var_GLS given by equation (2) underestimates the variability in the
regression coefficients, whereas Var_Robust uses a correction matrix based on observed residuals
to generate more accurate standard error estimates. If the residuals are normal and
homoscedastic, both Var_GLS and Var_Robust are consistent variance estimators, but Var_GLS
is more efficient. In the presence of nonnormality and heteroscedasticity, Var_Robust
has been found to be more accurate (Maas & Hox, 2004).
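To make this concrete, a cluster-robust covariance matrix like equation (3) can be computed for a fitted multilevel model with the R package clubSandwich (used later in this thesis). A minimal sketch, assuming fit is an lmer object as sketched earlier and that clubSandwich detects the clustering variable from the fit:

library(clubSandwich)
# type = "CR0" gives the unadjusted sandwich estimator analogous to equation (3)
V_robust <- vcovCR(fit, type = "CR0")
sqrt(diag(V_robust))  # cluster-robust standard errors for the fixed effects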
Problems With Real Data
Both MLM and its estimation methods have distributional assumptions, such as
homoscedastic variance, and normality of residuals at all levels. In this study, I am
interested in the performance of variance estimators for MLM when the assumption of
homoscedasticity at all levels is no longer valid. Other influential factors common in
real data are also considered, specifically small sample sizes and unbalanced cluster sizes.
Heteroscedasticity
One essential assumption for MLM is the homogeneity of variance, meaning variance
of errors and random effects should be equal across observations and clusters. Unequal
variances of errors across observations or unequal variance of random effects across clusters is
called heteroscedasticity (Snijders, 2011). In MLM, the assumption of homoscedasticity can
be relaxed by directly modeling the relations between predictor variables and variance
(Goldstein, 2010; Raudenbush & Bryk, 2002; Snijders, 2011). When the effects of predictors
vary across clusters but are treated as fixed, the unmodeled random effects manifest
as heteroscedastic error variance (Snijders, 2011). Some causes of heteroscedasticity are
omitted predictor variables, omitted random effects, coding errors in multiple clusters, and
nonnormal data with heavy tails (Raudenbush & Bryk, 2002). The current study focuses on
the omitted random effects at two levels.
The influence of heteroscedasticity on parameter estimation and hypothesis tests is
mixed. If the variance varies randomly, then the influence on coefficients’ point estimates
and standard errors is negligible (Raudenbush & Bryk, 2002). However, if homoscedasticity
is assumed, but variance depends on the explanatory variable either at level one or level two,
then the influence of heteroscedasticity can be serious (Kasim & Raudenbush, 1998; Snijders,
2011). Specifically, Raudenbush and Bryk (2002) stated the violation of homoscedasticity
assumption at level one does not significantly impact the level two coefficients or standard
errors. However, if the heteroscedasticity is caused by unidentified random slopes at level
one, then the level two coefficients would be biased. If the homoscedasticity assumption at
level two is violated, then the level two fixed effect coefficients will remain unbiased but
inefficient (Raudenbush & Bryk, 2002).
A few studies have investigated the performance of REML with the presence of
heteroscedasticity at two levels. For example, Vallejo et al. (2015) concluded that the
fixed-effects estimates were unbiased, but the associated standard errors were biased. The
direction of bias depends on the pairing of heteroscedastic variance and group size. For the
random effects, the estimates were slightly overestimated and standard errors were
underestimated. Similarly, Schielzeth et al. (2020) found that heteroscedastic distributions
resulted in little bias for fixed effect estimates, but the coverage of the confidence intervals
was too low.
Factors Influencing Standard Error Estimation
The impact of heteroscedasticity depends on conditional factors, such as intraclass
correlation (ICC), sample size and unbalanced design. In this section, I review the literature
on small samples and unbalanced design, then discuss how they influence the estimates in
MLM with heteroscedastic variance.
Small Sample Size. For clustered data collected in psychological research, it is
common to have small samples. A widely used criterion for small samples is 30 clusters with
a size of 30 observations (Kreft & Leeuw, 1998). According to a review by McNeish (2016),
using the above criterion, 20% of multilevel models and 30% of cluster randomized trials have
small samples. Dedrick et al. (2009) reviewed 99 social science studies using
multilevel models and found 20% of them had sample sizes smaller than the suggested
cut-offs.
In multilevel models, the small-sample problem is more complex than in single-level linear
regression. Since the upper level sample size is always smaller than the lower level sample
size (Maas & Hox, 2005), the upper level sample size determines whether the data suffer from
small sample bias (Snijders & Bosker, 1993). When the number of clusters is small, there is
little information about between-group variations, so it is difficult to estimate the variance
parameters (Gelman & Hill, 2006). Another problem is that the commonly used estimation
methods for MLM, such as maximum likelihood (ML), are only asymptotically unbiased.
When the sample size is small, the accuracy of the estimates and the standard error cannot
be guaranteed, depending on the degree of cluster size imbalance (Maas & Hox, 2005;
Raudenbush & Bryk, 2002).
Ignoring the small sample bias results in biased estimates for parameters and
inaccurate decisions for associated inference. Maas and Hox (2005) found when the number
of clusters is smaller than 100, the standard errors for the second level variance components
can be underestimated. Other studies generally found the fixed effects standard errors were
also downwardly biased when the number of clusters is below 25 (McNeish, 2017).
Corrections for estimation methods have been developed to overcome the small sample bias,
such as the KR corrections for REML estimates (Kenward & Roger, 1997) and bias-reduced
linearization for CR-SEs (Bell & Mccaffrey, 2002).
Unbalanced Data. Unbalanced data can result from certain experimental designs
such as cross-over designs or unequal numbers of missing data (Spilke et al., 2005). In
unbalanced data, fixed-effect point estimates and the associated standard errors depend on
the point estimates of variance components. The ML estimation underestimates the
standard error of fixed-effect parameters and is liberal in hypothesis tests in small samples.
Raudenbush and Bryk (2002) suggest using REML and adjusted df when making
inferences in small samples with unbalanced cluster sizes.
Heteroscedasticity can pair with the unbalanced sample sizes because the imbalance
might result from preexisting group differences, which can generate differences in both group
means and group variances (Blanca et al., 2018). There is an abundant literature on the
analysis of variance (ANOVA) discussing this relationship (Blanca et al., 2018;
Vallejo et al., 2010). The pairing of group variances and group sizes can significantly
influence inference testing. Specifically, positive pairing is when the group with the largest
group size has the largest variance, whereas negative pairing is when the group with the
largest group size has the smallest variance. For example, Blanca et al. (2018) examined the
impact of the variance ratio on ANOVA and found that the Type I error rate associated with the
F-test depends on the degree of heteroscedasticity and the pairing between variance and
sample size.
Similarly, heteroscedasticity often coexists with unbalanced cluster sizes in MLM. The
influence is more problematic when the sample size is small (Raudenbush & Bryk, 2002).
Korendijk et al. (2008) showed that in the presence of the second-level heteroscedasticity and
unbalanced cluster sizes, the standard errors for the second level fixed and variance
components were biased when the level two sample size was small. Kasim and Raudenbush
(1998) compared the Bayesian method modeling heteroscedasticity with REML assuming
homoscedasticity using heteroscedastic data at level one. The results showed that when the
number of clusters is small, the standard errors are sensitive to heteroscedasticity and
sampling distributions of the variance components are skewed. Darandari (2004) investigated
the robustness of multilevel models to higher-level assumption violations with unbalanced
study designs, and found that the robustness of second level variance estimates depends
significantly on sample size.
Adjustments for Estimated Covariance Matrix
To improve the performance of estimators in the presence of the above factors, several
variance estimators have been proposed that further adjust the standard errors and the degrees of
freedom for the hypothesis tests, such as the KR correction and the adjusted CR-SEs.
Kenward-Roger Correction
When making inferences about fixed-effect parameters with small samples, the common
practice is to apply the KR approximation of df on REML estimates. The first step of KR is
to ensure the test statistic derived from REML is accurate. Although REML improves the
variance estimates in small samples compared to ML, the standard error estimates can still
be biased. Kackar and Harville (1984) used a Taylor Series expansion to correct this bias.
Moreover, REML ignores the variability of variance components when fitting the GLS model
to get fixed-effects estimates. Thus, KR incorporated Kackar and Harville’s approximation
and performed a second Taylor series expansion to account for the variability,
consequentially adjusting the t-test statistic (Kenward & Roger, 1997; McNeish, 2017). The
second step of KR is to correct the degrees of freedom of the t-test using a method based on
the Satterthwaite approximation (Kenward & Roger, 1997). By adjusting both the test
statistic and the degrees of freedom, KR corrects the underestimated p values for inference
tests and keeps the Type I error rate at the nominal level. Spilke et al. (2005) applied KR to
small samples with unbalanced cluster sizes and found that it effectively corrected the degrees
of freedom. Kowalchuk et al. (2004) also found that the extension of KR to F tests controlled the
Type I error rates well in small samples with unbalanced cluster sizes when the assumptions
of normality and homoscedasticity were violated.
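In R, the KR correction can be requested through the lmerTest package (used in this study), which relies on the pbkrtest package for the KR computations. A minimal sketch with hypothetical data and variable names:

library(lmerTest)
fit <- lmer(y ~ x * z + (x | cluster), data = dat)
# Kenward-Roger: adjusts both the covariance matrix of the fixed effects and the df
summary(fit, ddf = "Kenward-Roger")
anova(fit, ddf = "Kenward-Roger")  # KR-adjusted F tests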
Adjusted Cluster-robust Standard Errors
The CR-SEs described in equation (3) are based on the original HC0 estimator
derived by White (1980), which might underestimate the covariance matrix when the sample
size is small (MacKinnon, 2012). MacKinnon and White (1985) proposed several
modifications such as HC1, HC2, and HC3, and these modifications have been generalized to
clustered data labeled as CR1, CR2 and CR3 (Bell & Mccaffrey, 2002; Cameron & Miller,
2015; Pustejovsky & Tipton, 2018). Specifically, assuming there are J clusters in the data
set, CR1 multiplies the estimated variance matrix with
q
J
J−1
(Cameron & Miller, 2015).
However, this adjustment still leads to underestimated variance when the cluster sizes are
unbalanced. Bell and Mccaffrey (2002) proposed the CR2 or bias-reduced linearization
(BRL) correction, an extension of the HC2 variance estimator. The BRL correction assumes
the residuals follow a “working model” specified by researchers, and it defines adjustment
matrices for variance estimator to yield unbiased estimates (Pustejovsky & Tipton, 2018).
Mancl and DeRouen (2001) proposed CR3, which leads to the delete-one-cluster jackknife
estimate of the variance. However, CR3 can over-correct the standard error estimates,
resulting in over-rejection in small samples (Pustejovsky & Tipton, 2018), so the literature
tends to suggest using CR2 (Cameron & Miller, 2015).
CR-SEs are designed to be robust to clustering and heteroscedasticity, although their
performance depends on the sample size and the degree of imbalance in cluster sizes. The BRL correction can largely
improve the CR-SEs in small samples and is often paired with the Satterthwaite approximation
of the degrees of freedom for inference tests (Bell & Mccaffrey, 2002; Satterthwaite, 1946).
Imbens and Kolesár (2016) showed the CR-SEs with BRL correction performed well for
small clustered data sets with heteroscedastic variances. They argued the BRL correction
and the Satterthwaite approximation should always be used even for moderate sample sizes.
However, BRL is undefined in some models and is limited to single parameter
hypothesis tests. The magnitude of BRL adjustments also depends on the specific estimator
that is used. Thus, Pustejovsky and Tipton (2018) extended the BRL methods to commonly
used models and multiple-parameter hypothesis tests. Specifically, they used Hotelling's T²
distribution with estimated degrees of freedom, a generalization of the saddlepoint
approximations proposed by Bell and Mccaffrey (2002). This method has recently been
implemented in the R package clubSandwich (Pustejovsky, 2021) and is referred to as the
adjusted CR-SEs in this paper. Pustejovsky and Tipton (2018) found the adjusted CR-SEs
performed well in small data sets with unbalanced cluster sizes. The Type I error rates were
controlled close to the nominal level across all conditions. However, this study used
single-level models with CR-SEs, so it is unclear how the adjusted CR-SEs will perform in
conjunction with REML estimates from MLM with misspecified error structure and
unbalanced cluster sizes.
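For reference, the adjusted CR-SEs as implemented in clubSandwich can be obtained from a fitted lmer model (fit, as sketched earlier) roughly as follows; this is an illustrative sketch, and Wald_test() with test = "HTZ" provides the corresponding multiple-parameter test.

library(clubSandwich)
# CR2 (bias-reduced linearization) standard errors with Satterthwaite df
# for single-parameter t tests of the fixed effects
coef_test(fit, vcov = "CR2", test = "Satterthwaite")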
Note that the CR-SEs or the adjusted CR-SEs are only robust to clustering and
heteroscedasticity, meaning that a small departure from normality or presence of outliers can
seriously influence their performance. This issue is revisited in the Discussion.
Current Study
To sum up, in experimental research using multilevel analysis, the interest is usually
about fixed-effect coefficients and their statistical inferences. However, real data are often
small samples with heteroscedastic variances and unbalanced cluster sizes. The REML
fixed-effect estimates are unbiased, but the associated standard errors and inferences are
invalid. Two standard error estimators, KR and the adjusted CR-SEs, have been proposed to
improve the standard error estimates and control the Type I error rates. To my knowledge,
no study has yet evaluated these two standard error estimators under the above conditions.
Thus, the current study aims to compare the KR and the adjusted CR-SEs using a Monte
Carlo simulation. Specifically, the interest is in investigating their performances for
inferences on the fixed effects coefficients in small samples with unbalanced cluster sizes and
heteroscedasticity at both levels.
Methods
A two-level random slope model with one predictor at level one and one predictor at level two,

y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij} + \lambda(z_j) u_{1j} x_{ij} + \lambda(z_j) u_{0j} + \lambda(x_{ij}) e_{ij}

was used to generate the simulated data sets. Here, λ is a function for heteroscedasticity
that will be described later. The data were analyzed using the random slope model assuming
homoscedasticity,

y_{ij} = \gamma_{00} + \gamma_{01} z_j + \gamma_{10} x_{ij} + \gamma_{11} z_j x_{ij} + u_{1j} x_{ij} + u_{0j} + e_{ij}
Following Maas and Hox (2005), the grand intercept γ_00 was set to 1. Because this
study examines the Type I error rate, the within-cluster slope γ_10 and the between-cluster
slope γ_01 were set to 0. I set the regression coefficient for the cross-level interaction, γ_11,
to 0.3, corresponding to a medium effect size (Cohen, 1988). The error terms u_0j, u_1j, and
e_ij were generated from normal distributions with mean 0 and variances τ_0², τ_1², and σ²,
respectively. σ² was set to 1, and τ_0² was set based on the values of the ICC and σ²: if
ICC = 0.1, then τ_0² = 0.11, and if ICC = 0.3, then τ_0² = 0.43. Following Busing (1993) and
Maas and Hox (2005), τ_0² and τ_1² were set to be equal. I assumed the covariance between the
random intercept u_0j and the random slope u_1j was 0. The predictors x_ij and z_j were generated
from the standard normal distribution.
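As a concrete sketch, one data set under the baseline condition (balanced clusters, homoscedastic variances) could be generated in R roughly as follows; the function and variable names are hypothetical, not the thesis code.

# J clusters of size n; tau_0^2 = tau_1^2 = tau2, sigma^2 = sigma2
gen_data <- function(J = 30, n = 30, tau2 = 0.11, sigma2 = 1,
                     g00 = 1, g01 = 0, g10 = 0, g11 = 0.3) {
  cluster <- rep(seq_len(J), each = n)
  z  <- rnorm(J)[cluster]                    # cluster-level predictor
  x  <- rnorm(J * n)                         # observation-level predictor
  u0 <- rnorm(J, sd = sqrt(tau2))[cluster]   # random intercepts
  u1 <- rnorm(J, sd = sqrt(tau2))[cluster]   # random slopes
  e  <- rnorm(J * n, sd = sqrt(sigma2))
  y  <- g00 + g01 * z + g10 * x + g11 * z * x + u1 * x + u0 + e
  data.frame(y, x, z, cluster = factor(cluster))
}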
Six conditions were varied in the simulation: (a) number of clusters; (b) average
cluster size; (c) ICC; (d) balanced versus unbalanced clusters; (e) heteroscedasticity at level
one; and (f) heteroscedasticity at level two. There were 3× 2× 2× 2× 2× 2 = 96 conditions.
For each condition, 2,500 data sets were generated, and the simulation was structured using the R
package SimDesign (Chalmers & Adkins, 2020). To obtain the REML estimates for the fixed
effects, I used the R package lme4 (Bates et al., 2015) to fit the two-level random slope model to
simulated data sets. Two standard error estimators were then applied: KR using R package
lmerTest (Kuznetsova et al., 2017) and adjusted CR-SEs using R package clubSandwich
(Pustejovsky, 2021).
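A minimal skeleton of how such a simulation could be organized with SimDesign is sketched below; the generate, analyse, and summarise bodies are illustrative placeholders (tracking only the REML-based test of γ01 here), not the thesis code, and gen_data() is the hypothetical generator sketched earlier.

library(SimDesign)
Design <- createDesign(J = c(15, 30, 60), n = c(5, 30), icc = c(0.1, 0.3),
                       unbalanced = c(FALSE, TRUE),
                       het_l1 = c(FALSE, TRUE), het_l2 = c(FALSE, TRUE))
Generate <- function(condition, fixed_objects = NULL) {
  gen_data(J = condition$J, n = condition$n)
}
Analyse <- function(condition, dat, fixed_objects = NULL) {
  fit <- lmerTest::lmer(y ~ x * z + (x | cluster), data = dat)
  est <- summary(fit)$coefficients
  c(est_z = est["z", "Estimate"], se_z = est["z", "Std. Error"], p_z = est["z", "Pr(>|t|)"])
}
Summarise <- function(condition, results, fixed_objects = NULL) {
  c(rel_bias_se = mean(results[, "se_z"]) / sd(results[, "est_z"]) - 1,
    type1 = mean(results[, "p_z"] < .05))
}
res <- runSimulation(Design, replications = 2500, generate = Generate,
                     analyse = Analyse, summarise = Summarise)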
Design Conditions
Number of Clusters
The simulation included three conditions for the number of clusters: 15, 30, and 60. In
organizational and educational research, it is common to have 50 groups, and 30 is considered
the minimum number of groups for obtaining unbiased estimates (Kreft & Leeuw, 1998). Research has
suggested that ML-based estimation methods might perform poorly with even fewer clusters, so 15 was also
included to evaluate the two standard error estimators (Korendijk et al., 2008).
Average Cluster Size
Average cluster sizes of 5 and 30 were included. In family research and
longitudinal research, a cluster size of 5 is common. In educational research, a cluster size of
30 is prevalent (Maas & Hox, 2005).
Intraclass correlation (ICC)
The intraclass correlation (ICC) represents the average correlation between
observations in the same cluster (Snijders, 2011), which was set as 0.1 and 0.3 in the current
study. Past simulation studies showed that ICC could significantly influence the accuracy of
the estimates and standard errors (Hox & Maas, 2001). This correlation can be calculated by
specifying an empty model with no predictors,

y_{ij} = \gamma_{00} + u_{0j} + e_{ij}

and calculating

\mathrm{ICC} = \frac{\tau_0^2}{\tau_0^2 + \sigma^2}
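For reference, solving this equation for τ_0² with σ² = 1 gives the values used in the data generation above:

\tau_0^2 = \frac{\mathrm{ICC}}{1 - \mathrm{ICC}}\,\sigma^2, \qquad \frac{0.1}{0.9} \approx 0.11 \ (\mathrm{ICC} = 0.1), \qquad \frac{0.3}{0.7} \approx 0.43 \ (\mathrm{ICC} = 0.3)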
Hedges and Hedberg (2007) reported that ICCs for educational research conducted in
the United States ranged from 0.1 to 0.25. I also included 0.3 because the CR-SEs method might
perform worse than REML when ICC is larger than 0.3 (McNeish & Stapleton, 2017).
Imbalance of Cluster Size
The cluster sizes were simulated to be balanced or unbalanced. For the balanced
design, each cluster had the average cluster size. For the unbalanced design, the current
study followed Lai (2020), dividing the number of clusters into five strata, so each had J/5
clusters. The five strata had sizes of n/3, 2n/3, n, 4n/3, and 5n/3. The largest clusters were
five times the size of the smallest clusters.
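A small sketch of these stratified cluster sizes (hypothetical helper, not the thesis code):

# Five strata of J/5 clusters each, with sizes n/3, 2n/3, n, 4n/3, 5n/3 (rounded)
cluster_sizes <- function(J, n_bar, unbalanced = TRUE) {
  if (!unbalanced) return(rep(n_bar, J))
  rep(round(n_bar * c(1/3, 2/3, 1, 4/3, 5/3)), each = J / 5)
}
cluster_sizes(30, 30)  # 10, 20, 30, 40, 50, each repeated for 6 clusters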
Heteroscedasticity
The current simulation study examined heteroscedasticity at levels one and two.
Homoscedasticity was also examined to provide a baseline condition. In the homoscedastic
condition, λ(z_j) = 1 and λ(x_ij) = 1. In the heteroscedastic conditions, since unbalanced
cluster sizes might influence how estimators react to heteroscedastic data (Blanca et al.,
2018; Vallejo et al., 2015), I simulated the random effects as

0.1\sqrt{n}\,\lambda(z_j) u_{0j} + 0.1\sqrt{n}\,\lambda(z_j) u_{1j} x_{ij} + 0.1\sqrt{n}\,\lambda(x_{ij}) e_{ij}

Here λ(z_j) = |z_j| + 1 and λ(x_ij) = |x_ij| + 1, such that the variance of y_ij is smallest when
the predictors are close to their means. In the added term 0.1√n, √n was chosen so that the
variance is positively related to cluster size and the ratio of the variance for the largest
cluster to that for the smallest cluster was 5:1. This ratio represents a severe violation of the
homoscedasticity assumption (Vallejo et al., 2015). I further multiplied √n by 0.1 to keep
the difference in variance between the homoscedastic and heteroscedastic conditions at a
reasonable level; otherwise, the variance in the heteroscedastic conditions could be very
large for clusters with large cluster sizes.
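A sketch of how the λ functions and the 0.1√n scaling could be applied when generating the heteroscedastic random components (hypothetical names, not the thesis code):

lambda <- function(v) abs(v) + 1   # |z_j| + 1 or |x_ij| + 1
# Heteroscedastic random part for observations in a cluster of size n
het_random <- function(z_j, x_ij, u0, u1, e, n) {
  0.1 * sqrt(n) * lambda(z_j)  * u0 +
  0.1 * sqrt(n) * lambda(z_j)  * u1 * x_ij +
  0.1 * sqrt(n) * lambda(x_ij) * e
}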
Evaluation Criteria
Relative bias and Type I error rates were used to evaluate the two estimators.
Specifically, the relative bias for the fixed-effects standard errors was calculated as

\text{relative bias} = \frac{1}{R} \sum_{r=1}^{R} \frac{\hat{\theta}_r - \theta}{\theta}

(McNeish & Stapleton, 2016). Since the interest is in comparing standard error estimators,
θ̂_r is the estimated standard error for a fixed effect in replication r, and θ is the population
standard error, calculated as the standard deviation of the fixed-effects parameter estimates
across replications. Previous studies have suggested that relative bias within ±10% is considered
negligible (Muthén & Muthén, 2002).
In this study, I focus on the single parameter test, so the fixed-effect parameters were
tested using the t test with df adjusted by the Satterthwaite approximation at a significance
level of 0.05. The Type I error rate was calculated as the number of replications that
rejected the null hypothesis divided by the total number of replications. I used Bradley's
liberal criterion to evaluate the performance of standard error estimators on Type I error
rate. Specifically, if the Type I error rate falls between 0.025 and 0.075, it is considered
robust. Otherwise, the test is too liberal (above 0.075) or too conservative (below 0.025)
(Bradley, 1978).
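In code, these criteria could be computed from the stored replication results of one condition roughly as follows (hypothetical vectors: est = fixed-effect estimates, se = estimated standard errors, p = p values, one element per replication):

rel_bias_se <- mean((se - sd(est)) / sd(est))         # population SE = SD of estimates across replications
type1_error <- mean(p < .05)                          # proportion of replications rejecting H0
robust <- type1_error >= .025 & type1_error <= .075   # Bradley's liberal criterion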
Results
Relative Bias for Standard Errors
Results showed the influences of small sample sizes and unbalanced cluster sizes on
two standard error estimators, KR and the adjusted CR-SEs, were limited. This finding was
consistent for all fixed-effect parameters, both in estimating standard errors and the
associated Type I error rates. Thus, I describe the general pattern of the two estimators’
performances under the influence of these two factors, then transition to the impact of
heteroscedasticity for each parameter in detail. One example graph showing the influence of
small samples and unbalanced cluster sizes is provided. The other graphs were similar and are
provided in the supplemental materials.
The influence of small sample sizes was consistent across all conditions. Figure 1
shows an example of how these two estimators performed on small samples when estimating
the standard error of the intercept γ_00. Using ±0.10 as a cutoff, both estimators performed
well. The finding supported the literature that the number of clusters had a larger impact on
estimation than the average cluster size. An increase in the average cluster size n̄ had a
negligible influence on estimating the standard errors of the intercept. However, an increase
in the number of clusters, such as from 15 to 60 clusters, yielded a noticeable improvement in the estimation.
Similar to the impact of small sample size, the impacts of unbalanced cluster sizes on
KR and the adjusted CR-SEs were small across all conditions. This finding was not
influenced by sample size and heteroscedasticity. As shown in Figure 1, both estimators
performed slightly better when the cluster sizes were balanced compared to unbalanced.
The impact of heteroscedasticity varied depending on the specific parameter. For
the grand intercept γ_00, standard errors adjusted by both KR and the adjusted CR-SEs had
small biases across conditions using the cutoff of ±0.10. Overall, standard errors corrected
by KR had bias from -0.05 to 0.05 (M = -0.01, SD = 0.02). Standard errors corrected by the
adjusted CR-SEs had bias from -0.09 to 0.01 (M = -0.02, SD = 0.02). Figure 2 shows both
estimators were robust to level two heteroscedasticity, especially when the number of clusters
was large. Similarly, level one heteroscedasticity yielded only a small influence on these two
standard error estimators.
For the between-cluster coefficient (γ_01), KR's corrected standard errors of γ_01 had
biases ranging from -0.29 to 0.05 (M = -0.13, SD = 0.12), and the adjusted CR-SEs’
corrected standard errors had smaller biases from -0.15 to 0.00 (M = -0.07, SD = 0.04).
Figure 3 shows the performances of two estimators when the level two homoscedasticity
assumption was violated. KR performed well when the level two variance was homoscedastic.
However, when the assumption was violated, KR’s corrected standard errors had large biases
for γ_01. The adjusted CR-SEs consistently underestimated the standard errors of γ_01 across
conditions, with worse performance in heteroscedastic conditions. Figure 4 shows two
estimators’ performances when the level one homoscedasticity assumption was violated. In
contrast to level two heteroscedasticity, level one heteroscedasticity had a negligible impact
on both estimators. Note the large range of bias yielded by KR resulted from the large
difference in its performance for the level two heteroscedasticity.
For the within-cluster coefficient (γ_10), the impact of heteroscedasticity on these two
estimators was negligible across levels. Figure 5 shows estimators’ performances when the
level two or level one homoscedasticity assumption was violated. In this case, level two
heteroscedasticity and level one heteroscedasticity yielded highly similar influences. Both
standard error estimators were robust to heteroscedasticity at all levels when estimating
within-cluster effect.
Type I Error Rate
The adjusted CR-SEs controlled the Type I error rate close to the stated α level for
the between-cluster coefficient (γ_01), with magnitudes from 0.04 to 0.07 (M = 0.05, SD = 0.01).
KR’s corrected t-tests had a higher Type I error rate ranging from 0.02 to 0.15 (M = 0.09,
SD = 0.04). Figure 6 shows the Type I error rates for the between-cluster coefficient (γ_01)
when the level two homoscedasticity assumption was violated. Similar to the relative bias
pattern for standard errors, KR’s adjusted t-tests performed well in homoscedastic
conditions, but had highly inflated Type I error rates in heteroscedastic conditions. In
contrast, the adjusted CR-SEs’ corrected t-tests performed well across conditions. Figure 7
shows the Type I error rates when the level one homoscedasticity assumption was violated.
The influence of level one heteroscedasticity on KR and the adjusted CR-SEs was negligible,
suggesting both standard error estimators were robust to level one heteroscedasticity when
estimating the between-cluster coefficient.
Violations of homogeneity at both levels did not influence the Type I error rate of
the within-cluster coefficient (γ_10). Both estimators performed well in controlling the Type I
error rates of γ_10 according to Bradley's liberal criterion, with KR's corrected t-tests having rates from
0.03 to 0.06 (M = 0.05, SD = 0.01) and the adjusted CR-SEs’ corrected t-tests had slightly
higher rates from 0.04 to 0.07 (M = 0.05, SD = 0.01). Figure 8 shows estimators’
performances when the level two or level one homoscedasticity assumption was violated.
Similar to the relative bias, heteroscedasticity at two levels yielded highly similar influences
on KR and the adjusted CR-SEs.
Discussion
The results showed both standard error estimators performed well when estimating
the standard errors of the intercept and the within-cluster coefficient. The Type I error rates for
the within-cluster coefficient were acceptable according to Bradley's liberal criterion.
Heteroscedasticity at level two had a large impact on the estimation of the between-cluster
coefficient's standard errors. Specifically, KR performed well when homoscedasticity held,
but the corrected standard errors had large biases and inflated Type I error rates when the
assumption was violated. In contrast, the adjusted CR-SEs underestimated the standard
errors of the between-cluster coefficient in both homoscedastic and heteroscedastic
conditions, but controlled the Type I error rate slightly above the stated α level.
Heteroscedasticity at level one only yielded a small impact on between-cluster coefficient’s
standard errors and the associated Type I error rates. Unbalanced cluster sizes had small
impacts on the relative bias of standard errors and Type I error rates for both methods.
This study is consistent with previous studies and adds to the literature showing that
the adjusted CR-SEs can provide more accurate inferences for the between-cluster effect
than the KR adjustments when heteroscedasticity exists. The findings support that using
the t-test based on adjusted CR-SEs, an extension of the BRL correction, with the
Satterthwaite approximated degrees of freedom can control the Type I error rate close to the
nominal level (Imbens & Kolesár, 2016; Pustejovsky & Tipton, 2018). Moreover, the result is
consistent with previous studies that the KR can largely improve the performance of REML
estimates on small samples in estimating the variance of fixed-effects parameters and making
inferences (McNeish & Stapleton, 2016; Spilke et al., 2005).
Although the adjusted CR-SEs performed well in the current study, it is important to
note that this standard error estimator is essentially a heteroscedastic-robust estimator.
Robustness is a broad notion referring to statistical methods that guard against all kinds of violations of
distributional assumptions, such as nonnormality, heteroscedasticity and outliers (Wilcox,
2016). CR-SEs are only robust to one kind of violation, the homogeneity of variance, and
can perform poorly when other assumptions are violated. For example, a slight departure
from normality and a few outliers can seriously influence the adjusted CR-SEs’ performance
(MacKinnon, 2012).
Another line of research to improve small sample inferences is bootstrap methods
with asymptotic refinement (Cameron & Miller, 2015). For example, the wild bootstrap can
obtain more accurate p-values (Cameron & Miller, 2015), and the percentile bootstrap can
be used to get more accurate confidence intervals (MacKinnon, 2012). For clustered data,
Cameron et al. (2008) proposed a cluster generalization of the wild bootstrap method, which
has been shown to perform well with a small number of clusters. Lai (2020) compared five
bootstrap confidence intervals for multilevel effect size and found residual bootstrap with
basic confidence intervals performed the best for small samples. Although bootstrap
methods can estimate the covariance matrices, they are more computationally expensive than
CR-SEs with small sample adjustments in linear regression models. Instead, it is better to
use the heteroscedastic-robust test statistic in conjunction with the wild bootstrap for
making accurate inferences (MacKinnon, 2012).
This study has several limitations. First, because the heteroscedasticity at two levels
was simulated as depending on the unbalanced cluster sizes, the average variances of the
homoscedastic and heteroscedastic conditions are unequal when the average
cluster size is small. Future studies should consider alternative methods to simulate
heteroscedastic variance with unbalanced cluster sizes.
Second, I only compared the two standard error estimators in the relative bias of
fixed effects standard errors and associated Type I error rates. However, power is also an
important factor to consider since both KR and adjusted CR-SEs were developed to control
the Type I error rates, which might be associated with power loss. Past studies showed that
power decreases monotonically for small sample adjustments from HC1 to HC3 (MacKinnon,
2012). Luke (2017) examined the corrected power of KR, the power corrected for
anticonservativity proposed by Barr et al. (2013), and found it did not result in noticeable
power loss. Future research could include power as another criterion to compare these two
estimators.
Furthermore, only the single-parameter test is examined in this study, but in other
research both estimators have been extended to multiple parameter tests. For example,
Kowalchuk et al. (2004) applied the KR corrections to mixed-model F tests and found that
KR can effectively control the Type I error rates in small samples. Pustejovsky and Tipton
(2018) examined the adjusted CR-SEs in multiple parameter tests and found it controlled the
Type I error rate well across conditions. However, it is still unclear how the adjusted CR-SEs
will perform in the multiple-parameter hypothesis tests in the same context as the current
study.
In conclusion, researchers should consider using CR-SEs to guard against level two
heteroscedasticity in small samples with unbalanced cluster sizes. The common practice for
small samples, KR, can generate inflated Type I error rates for the between-cluster effects.
Future work could compare the power of these two estimators and examine their
performances in the multiple-parameter test. Note that CR-SEs are only robust to
heteroscedasticity, so it is recommended to consider using bootstrap methods together with
CR-SEs for more accurate statistical inferences guarding against other model
misspecifications, such as nonnormality and outliers.
References
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for
confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language,
68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models
using lme4. Journal of Statistical Software, 67(1), 1–48.
https://doi.org/10.18637/jss.v067.i01
Bell, R., & Mccaffrey, D. (2002). Bias reduction in standard errors for linear regression with
multi-stage samples. Survey Methodology, 28(2), 169–181.
Blanca, M. J., Alarcón, R., Arnau, J., Bono, R., & Bendayan, R. (2018). Effect of variance
ratio on ANOVA robustness: Might 1.5 be the limit? Behavior Research Methods,
50(3), 937–962. https://doi.org/10.3758/s13428-017-0918-2
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical
Psychology, 31(2), 144–152.
https://doi.org/10.1111/j.2044-8317.1978.tb00581.x
Browne, W., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods
for fitting multilevel models. Bayesian Analysis, 1(3).
https://doi.org/10.1214/06-BA117
Busing, F. (1993). Distribution characteristics of variance estimates in two-level models: A
Monte Carlo study (Doctoral dissertation). Leiden University.
https://doi.org/10.13140/RG.2.2.18116.94088
Cameron, A. C., Gelbach, J. B., & Miller, D. L. (2008). Bootstrap-based improvements for
inference with clustered errors. The Review of Economics and Statistics, 90(3),
414–427. https://doi.org/10.1162/rest.90.3.414
Cameron, A. C., & Miller, D. L. (2015). A practitioner’s guide to cluster-robust inference.
Journal of Human Resources, 50(2), 317–372. https://doi.org/10.3368/jhr.50.2.317
Chalmers, R. P., & Adkins, M. C. (2020). Writing effective and reliable Monte Carlo
simulations with the SimDesign package. The Quantitative Methods for Psychology,
16(4), 248–280. https://doi.org/10.20982/tqmp.16.4.p248
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence
Erlbaum Associates.
Darandari, E. Z. (2004). Robustness of hierarchical linear model parameter estimates under
violations of second-level residual homoskedasticity and independence assumptions
(Doctoral dissertation). Florida State University. Retrieved March 4, 2021, from
http://diginole.lib.fsu.edu/islandora/object/fsu%3A169031/
Dedrick, R. F., Ferron, J. M., Hess, M. R., Hogarty, K. Y., Kromrey, J. D., Lang, T. R.,
Niles, J. D., & Lee, R. S. (2009). Multilevel modeling: A review of methodological
issues and applications. Review of Educational Research, 79(1), 69–102.
https://doi.org/10.3102/0034654308325581
Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical
models (1st edition). Cambridge University Press.
Goldstein, H. (1989). Restricted unbiased iterative generalized least-squares estimation.
Biometrika, 76, 622–623. https://doi.org/10.1093/biomet/76.3.622
Goldstein, H. (2010). Multilevel statistical models (4th edition). Wiley.
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning
group-randomized trials in education. Educational Evaluation and Policy Analysis,
29(1), 60–87. https://doi.org/10.3102/0162373707299706
Hox, J. J., & Maas, C. J. M. (2001). The accuracy of multilevel structural equation modeling
with pseudobalanced groups and small samples. Structural Equation Modeling, 8(2),
157–174.
https://doi.org/10.1207/S15328007SEM0802_1
Hox, J. J., Maas, C. J. M., & Brinkhuis, M. J. S. (2010). The effect of estimation method
and sample size in multilevel structural equation modeling. Statistica Neerlandica,
64(2), 157–170. https://doi.org/10.1111/j.1467-9574.2009.00445.x
Imbens, G. W., & Kolesár, M. (2016). Robust standard errors in small samples: Some
practical advice. The Review of Economics and Statistics, 98(4), 701–712.
https://doi.org/10.1162/REST_a_00552
Kackar, R. N., & Harville, D. A. (1984). Approximations for standard errors of estimators of
fixed and random effects in mixed linear models. Journal of the American Statistical
Association, 79(388), 853–862. https://doi.org/10.2307/2288715
Kasim, R. M., & Raudenbush, S. W. (1998). Application of Gibbs sampling to nested
variance components models with heterogeneous within-group variance. Journal of
Educational and Behavioral Statistics, 23(2), 93–116.
https://doi.org/10.3102/10769986023002093
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from
restricted maximum likelihood. Biometrics, 53(3), 983–997.
https://doi.org/10.2307/2533558
Korendijk, E. J., Maas, C. J., Moerbeek, M., & Van der Heijden, P. G. (2008). The influence
of misspecification of the heteroscedasticity on multilevel regression parameter and
standard error estimates. Methodology, 4(2), 67–72.
https://doi.org/10.1027/1614-2241.4.2.67
Kowalchuk, R. K., Keselman, H. J., Algina, J., & Wolfinger, R. D. (2004). The analysis of
repeated measurements with mixed-model adjusted F tests. Educational and
Psychological Measurement, 64(2), 224–242.
https://doi.org/10.1177/0013164403260196
Kreft, I. G. G., & de Leeuw, J. (1998). Introducing multilevel modeling (1st edition). SAGE
Publications Ltd.
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in
linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.
https://doi.org/10.18637/jss.v082.i13
Lai, M. H. C. (2020). Bootstrap confidence intervals for multilevel standardized effect size.
Multivariate Behavioral Research, 1–21.
https://doi.org/10.1080/00273171.2020.1746902
Luke, S. G. (2017). Evaluating significance in linear mixed-effects models in R. Behavior
Research Methods, 49(4), 1494–1502. https://doi.org/10.3758/s13428-016-0809-y
Maas, C. J. M., & Hox, J. J. (2004). The influence of violations of assumptions on multilevel
parameter estimates and their standard errors. Computational Statistics & Data
Analysis, 46(3), 427–440. https://doi.org/10.1016/j.csda.2003.08.006
Maas, C. J. M., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling.
Methodology, 1(3), 86–92. https://doi.org/10.1027/1614-2241.1.3.86
MacKinnon, J. G. (2012). Thirty years of heteroskedasticity-robust inference. In X. Chen &
N. R. Swanson (Eds.), Recent advances and future directions in causality, prediction,
and specification analysis: Essays in honor of Halbert L. White Jr (pp. 437–461).
Springer. https://doi.org/10.1007/978-1-4614-1653-1_17
MacKinnon, J. G., & White, H. (1985). Some heteroskedasticity-consistent covariance matrix
estimators with improved finite sample properties. Journal of Econometrics, 29(3),
305–325. https://doi.org/10.1016/0304-4076(85)90158-7
Mancl, L. A., & DeRouen, T. A. (2001). A covariance estimator for GEE with improved
small-sample properties. Biometrics, 57(1), 126–134.
https://doi.org/10.1111/j.0006-341X.2001.00126.x
McNeish, D. (2016). On using Bayesian methods to address small sample problems.
Structural Equation Modeling: A Multidisciplinary Journal, 23(5), 750–773.
https://doi.org/10.1080/10705511.2016.1186549
McNeish, D. (2017). Small sample methods for multilevel modeling: A colloquial elucidation
of REML and the Kenward-Roger correction. Multivariate Behavioral Research, 52(5).
https://doi.org/10.1080/00273171.2017.1344538
McNeish, D., & Stapleton, L. M. (2016). The effect of small sample size on two-level model
estimates: A review and illustration. Educational Psychology Review, 28(2), 295–314.
https://doi.org/10.1007/s10648-014-9287-x
McNeish, D., & Stapleton, L. M. (2017). On the unnecessary ubiquity of hierarchical linear
modeling. Psychological Methods, 22(1), 114–140.
https://doi.org/10.1037/met0000078
Muthén, L. K., & Muthén, B. O. (2002). How to use a Monte Carlo study to decide on
sample size and determine power. Structural Equation Modeling: A Multidisciplinary
Journal, 9(4), 599–620. https://doi.org/10.1207/S15328007SEM0904_8
Pustejovsky, J. E. (2021). clubSandwich: Cluster-robust (sandwich) variance estimators with
small-sample corrections [R package version 0.5.3].
https://CRAN.R-project.org/package=clubSandwich
Pustejovsky, J. E., & Tipton, E. (2018). Small-sample methods for cluster-robust variance
estimation and hypothesis testing in fixed effects models. Journal of Business &
Economic Statistics, 36(4), 672–683. https://doi.org/10.1080/07350015.2016.1247004
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and data
analysis methods (2nd edition). SAGE Publications, Inc.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance
components. Biometrics Bulletin, 2(6), 110–114. https://doi.org/10.2307/3002019
Schielzeth, H., Dingemanse, N. J., Nakagawa, S., Westneat, D. F., Allegue, H., Teplitsky, C.,
Réale, D., Dochtermann, N. A., Garamszegi, L. Z., & Araya-Ajoy, Y. G. (2020).
Robustness of linear mixed-effects models to violations of distributional assumptions.
Methods in Ecology and Evolution, 11(9), 1141–1152.
https://doi.org/10.1111/2041-210X.13434
Snijders, T. A. B. (2011). Multilevel analysis: An introduction to basic and advanced
multilevel modeling (2nd edition). SAGE Publications Ltd.
Snijders, T. A. B., & Bosker, R. J. (1993). Standard errors and sample sizes for two-level
research. Journal of Educational Statistics, 18(3), 237–259.
https://doi.org/10.3102/10769986018003237
Spilke, J., Piepho, H.-P., & Hu, X. (2005). A simulation study on tests of hypotheses and
confidence intervals for fixed effects in mixed models for blocked experiments with
missing data. Journal of Agricultural, Biological, and Environmental Statistics, 10(3),
374–389. https://doi.org/10.1198/108571105X58199
Vallejo, G., Fernández, M. P., & Livacic-Rojas, P. E. (2010). Analysis of unbalanced factorial
designs with heteroscedastic data. Journal of Statistical Computation and Simulation,
80(1), 75–88. https://doi.org/10.1080/00949650802482386
Vallejo, G., Fernández, P., Cuesta, M., & Livacic-Rojas, P. E. (2015). Effects of modeling the
heterogeneity on inferences drawn from multilevel designs. Multivariate Behavioral
Research, 50(1), 75–90. https://doi.org/10.1080/00273171.2014.955604
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct
test for heteroskedasticity. Econometrica, 48(4), 817–838.
https://doi.org/10.2307/1912934
Wilcox, R. R. (2016). Introduction to robust estimation and hypothesis testing (4th
edition). Academic Press.
Yuan, K.-H., & Bentler, P. M. (2002). On normal theory based inference for multilevel
models with distributional violations. Psychometrika, 67(4), 539–561.
https://doi.org/10.1007/BF02295130
Zeger, S. L., Liang, K.-Y., & Albert, P. S. (1988). Models for longitudinal data: A
generalized estimating equation approach. Biometrics, 44(4), 1049–1060.
https://doi.org/10.2307/2531734
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure 1
Relative Bias for Standard Errors of Intercept (γ00). Note. n̄ is the average cluster size, SE means the standard error of the intercept. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-1 homoscedastic vs. heteroscedastic, level-2 homoscedastic vs. heteroscedastic.]
Figure 2
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level One. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.3 to 0.1) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-2 homoscedastic vs. heteroscedastic.]
Figure 3
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level Two. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.3 to 0.1) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-1 homoscedastic vs. heteroscedastic.]
Figure 4
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Heteroscedasticity at Level One. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-2 homoscedastic vs. heteroscedastic, level-1 homoscedastic vs. heteroscedastic.]
Figure 5
Relative Bias for Standard Errors of Within-Cluster Coefficient (γ10). Note. n̄ is the average cluster size, SE means the standard error of the within-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: Type I error rate (y-axis, 0.04 to 0.16) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-2 homoscedastic vs. heteroscedastic.]
Figure 6
Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01). Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.04 to 0.16) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-1 homoscedastic vs. heteroscedastic.]
Figure 7
Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01). Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.03 to 0.07) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, level-2 homoscedastic vs. heteroscedastic, level-1 homoscedastic vs. heteroscedastic.]
Figure 8
Type I Error Rate for Standard Errors of Within-Cluster Coefficient (γ10). Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
Appendix A
Bias
Intercept (Gamma 00)
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure A1
Relative Bias for Standard Errors of Intercept (γ00) General Pattern. Note. n̄ is the average cluster size, SE means the standard error of the intercept. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A2
Relative Bias for Standard Errors of Intercept (γ00) With Balanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the intercept. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A3
Relative Bias for Standard Errors of Intercept (γ00) With Unbalanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the intercept. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
Gamma 01 (Coefficient for Z)
[Figure: relative bias for SE (y-axis, −0.3 to 0.1) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure A4
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) General Pattern. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.3 to 0.1) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A5
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Balanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.3 to 0.1) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A6
Relative Bias for Standard Errors of Between-Cluster Coefficient (γ01) With Unbalanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the between-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
Gamma 10 (Coefficient for X)
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure A7
Relative Bias for Standard Errors of Within-Cluster Coefficient (γ10) General Pattern. Note. n̄ is the average cluster size, SE means the standard error of the within-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A8
Relative Bias for Standard Errors of Within-Cluster Coefficient (γ10) With Balanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the within-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
[Figure: relative bias for SE (y-axis, −0.10 to 0.10) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure A9
Relative Bias for Standard Errors of Within-Cluster Coefficient (γ10) With Unbalanced Clusters. Note. n̄ is the average cluster size, SE means the standard error of the within-cluster coefficient. The gray lines represent the lower and upper bounds of acceptable values of relative bias for standard errors.
Appendix B
Type I Error Rates
Gamma 01 (Coefficient for Z)
[Figure: Type I error rate (y-axis, 0.04 to 0.16) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure B1
Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01) General Pattern. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.04 to 0.16) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure B2
Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01) With Balanced Clusters. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.04 to 0.12) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure B3
Type I Error Rate for Standard Errors of Between-Cluster Coefficient (γ01) With Unbalanced Clusters. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
Gamma 10 (Coefficient for X)
[Figure: Type I error rate (y-axis, 0.03 to 0.07) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30, unbalanced vs. balanced cluster sizes.]
Figure B4
Type I Error Rate for Standard Errors of Within-Cluster Coefficient (γ10) General Pattern. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.03 to 0.07) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure B5
Type I Error Rate for Standard Errors of Within-Cluster Coefficient (γ10) With Balanced Clusters. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
[Figure: Type I error rate (y-axis, 0.03 to 0.07) by estimator (Kenward-Roger vs. adjusted CR-SEs), shown separately for 15, 30, and 50 clusters; panels: n̄ = 6 vs. n̄ = 30 crossed with level-1 and level-2 homoscedastic vs. heteroscedastic conditions.]
Figure B6
Type I Error Rate for Standard Errors of Within-Cluster Coefficient (γ10) With Unbalanced Clusters. Note. n̄ is the average cluster size. The gray lines represent the lower and upper acceptable levels of Type I error rates by Bradley's liberal criterion.
Abstract
Multilevel modeling (MLM) is commonly used in psychological research to model clustered data. One of the assumptions of MLM is homogeneity of variance. However, data in experimental research often come from small samples with heteroscedastic variances and unbalanced cluster sizes. The fixed-effect estimates given by the maximum likelihood method remain unbiased, but the standard errors for the fixed effects are underestimated, resulting in inaccurate inferences and inflated Type I error rates. Small-sample corrections, such as the Kenward-Roger (KR) adjustment and the adjusted cluster-robust standard errors (CR-SEs) with the Satterthwaite approximation for t-tests, have been used to correct the bias in fixed-effect standard errors and provide valid inferences. The current study compares these two standard error estimators for small samples using a Monte Carlo simulation study. Results show that the KR-adjusted standard errors have large biases and inflated Type I error rates for the between-cluster effect in the presence of level-two heteroscedasticity. In contrast, the adjusted CR-SEs keep the Type I error rates close to the nominal level. Thus, the adjusted CR-SEs are recommended to guard against unmodeled heterogeneity when the interest is in making inferences about the between-cluster effect.
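For readers who want to see the comparison in code, the following is a minimal, hypothetical sketch of a single simulation replication with level-two heteroscedasticity in R; the parameter values (15 clusters of size 6, a random-intercept standard deviation that depends on z, and the effect sizes) are illustrative assumptions and not the design conditions reported in this thesis.

library(lmerTest)     # loads lme4; small-sample t tests for fixed effects
library(clubSandwich) # adjusted cluster-robust (CR) standard errors

set.seed(1)
J  <- 15                                   # number of clusters (illustrative)
nj <- rep(6, J)                            # balanced cluster sizes (illustrative)
z  <- rnorm(J)                             # level-2 predictor
# Level-two heteroscedasticity: random-intercept SD depends on z
u  <- rnorm(J, sd = 0.3 * exp(0.5 * z))
dat <- data.frame(cluster = rep(seq_len(J), nj),
                  z = rep(z, nj),
                  x = rnorm(sum(nj)))
# The coefficient for z (gamma01) is zero, so any rejection is a Type I error
dat$y <- 0.3 * dat$x + u[dat$cluster] + rnorm(nrow(dat))

fit <- lmer(y ~ x + z + (1 | cluster), data = dat)
summary(fit, ddf = "Kenward-Roger")$coefficients     # KR-adjusted tests
coef_test(fit, vcov = "CR2", test = "Satterthwaite") # adjusted CR-SEs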