MULTIVARIATE BIOMETRIC MODELING AMONG MULTIPLE TRAITS
ACROSS DIFFERENT RATERS
IN CHILD EXTERNALIZING BEHAVIOR STUDIES
by
Mo Zheng
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(PSYCHOLOGY)
May 2009
Copyright 2009 Mo Zheng
Table of Contents
List of Tables
List of Figures
Abstract
Chapter One: Introduction
1.1 Externalizing Behavior Problems
1.2 The Latent Externalizing Factor
1.3 Genetic Etiology of Externalizing Behaviors
1.4 Gender Differences
1.5 Rater Effects
Chapter Two: Methodological Issues in Genetic Externalizing Studies
2.1 Twin Study Design
2.2 Assumptions of Twin Models
2.3 Multi-trait Twin Models
2.4 Multi-rater Twin Models
2.5 Multi-trait Multi-rater Models
Chapter Three: Statistical Models
3.1 Measurement Models
3.2 Biometric Models – Psychometric Raters
3.3 Model Invariance across Sex
3.4 Aim of Study
Chapter Four: Research Methods
4.1 Overview
4.2 Procedure
4.3 Sample Characteristics
4.4 Measures
4.5 Missing Responses
4.6 Model Fitting Program
Chapter Five: Empirical Results
5.1 Sex and Informant Differences
5.2 Data Transformation
5.3 Phenotypic Correlations
5.4 Univariate Biometric Analysis
5.5 Measurement Model Fitting
5.6 Multivariate Biometric Analysis
5.7 Variance Decomposition
Chapter Six: Monte Carlo Simulations
6.1 Purpose of Monte Carlo Simulation
6.2 Data Generation Models
6.3 Data Analyses Models
6.4 Model Selection
6.5 Precision of Parameter Estimates
Chapter Seven: Discussion
References
Appendix A: Model Fit Indexes
Appendix B: Histograms of Raw Scores and Rank Normalized Scores
Appendix C: Simulation Model Example Fitting PM1 on SIM4
Appendix D: Monte Carlo Simulation Mplus Script: Fitting PM4 on SIM4
List of Tables
Table 1. Means, standard deviations (SD), number of participants (N) and t-tests for
Table 2. Phenotypic correlations for ADHD, CAQ and CBCL scores by gender (transformed scores)
Table 3. Twin’s phenotypic correlations by sex and zygosity (transformed data)
Table 4. Univariate ACE model fits and biometric decompositions for each phenotype
Table 5. Comparing fit for sex-variant and sex-invariant measurement models
Table 6. Comparison of six psychometric rater twin models
Table 7. Proportion of genetic and environmental variance separated by shared and specific rater view factors
Table 8. Mean χ² values, p value and number of models successfully computed
Table 9. Mean AIC values and the percentage of times the simulation model was selected as the best fit
Table 10. Mean BIC values and the percentage of times the simulation model was selected as the best fit
Table 11. Mean RMSEA values and the percentage of times the simulation model was selected as the best fit
Table 12. Precision of model PM4 parameter estimates
List of Figures
Figure 1. Univariate ACE model for twin data
Figure 2. Cholesky decomposition of three phenotypes (one twin only)
Figure 3. Independent pathways model (one twin only)
Figure 4. Common pathway model (one twin only)
Figure 5. Biometric model for ratings of a pair of twins (t1 and t2)
Figure 6. Psychometric model for ratings of a pair of twins by their primary caregiver and teacher
Figure 7. Rater bias model for ratings of a pair of twins
Figure 8. Psychometric rater model from Baker, et al. (2007)
Figure 9. Higher order factor model structure
Figure 10. Hierarchical factor model structure
Figure 11. Hierarchical factors twin model from Burt, et al. (2005)
Figure 12. Bivariate psychometric rater model from Bartels, et al. (2003)
Figure 13. Six competing measurement models for the covariances among the six scores
Figure 14. Six competing psychometric twin models
Figure 15. Standardized parameter estimates of the best fit
Figure 16. Standardized parameter estimates of the best fitting model - PM4
Abstract
The aim of this dissertation study was to investigate the best models for estimating
genetic and environmental influences on child externalizing behavior in multivariate
multi-rater twin research. Empirical data were analyzed for a sample of 605 twin pairs
(age 9-10), drawn from a twin study of risk factors for antisocial behavior at the
University of Southern California (USC). The twins were rated by both caregivers and
teachers on several aspects of externalizing behavior using three widely used instruments:
symptoms for Attention Deficit Hyperactivity Disorder (ADHD) using the ADHD
module in the Diagnostic Interview Schedule for Children (DISC-IV), aggressive
behaviors using the Reactive and Proactive Aggression Questionnaire (RPQ), and Child
Behavior Checklist Externalizing (CBCL) behavior problems. Six competing multi-trait
multi-rater genetic models were fitted, and the best-fitting model was found to be the
“general factor and correlated raters model”, which includes one common ADHD factor
shared by all measurements and two oblique rater factors, represented mainly by
each rater’s view of aggression and delinquency. In terms of rater effects, mother
reports were found to be more reliable, while teachers had less ability to distinguish
different forms of externalizing behavior. This study also employed a Monte Carlo
simulation to evaluate the power and parameter estimates of the “one common and two
correlated rater factors model”. Reasonable power and sufficient precision of parameter
estimates were obtained at this sample size. All analyses for this dissertation were
conducted in the Mplus software.
Chapter One: Introduction
1.1 Externalizing Behavior Problems
Externalizing behaviors refer to a constellation of behavior problems that are
manifested in children's outward behavior and reflect the child negatively acting on the
external environment. They are characterized by noncompliance, aggression,
destructiveness, attention problems, impulsivity, hyperactivity, and delinquent types of
behavior (McMahon, 1994). These problems constitute a major risk factor for later juvenile
delinquency, substance use, adult crime, and violence; therefore they are of particular
concern to parents, teachers, and child psychopathology researchers (Farrington, 1989;
Moffitt, 1993).
Despite a general consensus regarding the disruptive nature of externalizing
behavior problems, there has been some confusion about the externalizing phenotypes in
the literature. In the current Diagnostic and Statistical Manual of Mental Disorders (4th
Edition, Text Revision [DSM–IV–TR]; American Psychiatric Association, 2000),
childhood externalizing problems are defined by three discrete disorders: Attention
Deficit Hyperactivity Disorder (ADHD), Oppositional Defiant Disorder (ODD), and
Conduct Disorder (CD). In other literature on the empirical organization of
psychopathology in children, however, the term “externalizing behavior problems”
represents aggression and delinquency (e.g. Achenbach & Edelbrock, 1984). These views are
not contradictory, however, because criteria for both CD and ODD have elements of
aggressive behavior, defiance, and violation of rules. Nevertheless, it has been argued
that ODD may not constitute a valid category on its own (Hinshaw, 1987), because the
early behavior problems of ODD are generally less serious than aggression and
delinquency and are viewed as the forerunner of more serious externalizing disorders
such as conduct disorder.
Distinctions are also made between “externalizing” and “antisocial” behavior
even though they are often used as synonyms for each other. For example, Shaw and
Winslow (1997) suggested using the term externalizing behavior rather than antisocial
behavior to discuss the less severe disruptive and destructive behavior of children.
Therefore externalizing behavior may be regarded as a less severe form of antisocial
behavior, especially in young children. In addition, the externalizing construct includes
hyperactivity, and there are some hyperactive children who are not antisocial, again
illustrating the difference between the terms “externalizing” and “antisocial” (Liu, 2006).
1.2 The Latent Externalizing Factor
Although specific externalizing problems (e.g. conduct disorder, aggression,
delinquency, ADHD) are often conceptualized as distinct syndromes, extensive research
shows that these problems co-occur at well beyond chance levels (see, e.g., Armstrong &
Costello, 2002, and Waldman & Slutske, 2000). The fact that externalizing disorders are
correlated with each other has led researchers to invoke the idea of a latent externalizing
spectrum (Faraone, Tsuang, & Tsuang, 1999; Maser & Patterson, 2002). In short, the idea
is that each specific externalizing disorder may act as a reliable indicator of a common latent
factor or a hypothetical core psychopathological process. Specific externalizing disorders
are linked through a coherent underlying domain of human variation and can be
conceived of as specific instantiations of that domain (Krueger, Markon, Patrick, & Iacono, 2005).
Multivariate analyses of patterns of co-occurrence among various externalizing
disorders have indeed revealed a factor - a coherent liability dimension - that links these
disorders and distinguishes them from other commonly occurring disorders such as
depression or anxiety, which are often classified as internalizing behavior problems (see the
review by Krueger et al., 2002). Similar internalizing and externalizing factors have also
been replicated in research on the empirical organization of psychopathology in children
(e.g. Achenbach & Edelbrock, 1984).
1.3 Genetic Etiology of Externalizing Behaviors
The etiology of the externalizing factor is not yet fully understood;
however, recent research in child and adolescent studies suggests that genes play an
important role. First, numerous well-conducted twin studies have found significant
genetic effects on the etiology of specific externalizing behavior problems (see Krueger
et al., 2002). Second, twin studies have also found common genetic factors linking various
forms of externalizing behavior, such as the genetic overlap between ADHD and CD
(Eaves, et al., 2000; Nadder, Silberg, Eaves, & Maes, & Eaves, 2002; Tuvblad, Zheng,
Raine & Baker, in press), and the overlap between antisocial behavior and substance use
(Grove et al., 1990; Slutske, et al., 1998).
Twin studies can provide a unique perspective for understanding the etiology of
childhood externalizing behavioral disorders because they make it possible to tease apart
the extent to which phenotypic association is due to shared genetic and/or environmental
factors. Twin studies can also make important contributions to our understanding of
comorbidity by indicating whether a significant portion of the covariance among externalizing
behaviors can be traced to common genes or to common environmental factors, a
finding that might have important implications for potential prevention and
intervention efforts. Understanding the extent to which the same and/or different genes
and environments contribute to these disorders would also influence the way in which we
classify and group these disorders; if different externalizing disorders are actually the
result of the same genes, it may suggest that they should be considered a joint construct
with varying symptomatic presentation (Dick, Viken, Kaprio, Pulkkinen, & Rose, 2005).
1.4 Gender Differences
It has been well established that externalizing behaviors tend to be more frequent in
males than in females. The male conceptus appears to be more susceptible to a variety of early
developmental hazards (Angold, 2008), and it is known that boys are more likely to have
a number of externalizing disorders such as ADHD, ODD and CD, with an average
male-to-female ratio of 2.5:1 in the prevalence rates (Kessler et al., 1994; Moffitt et al., 2001;
Newman et al., 1996). These findings suggest that a broad range of neurodevelopmental
mechanisms are more prone to early disruption in boys than in girls (Rutter et al., 2003).
Genetic findings on externalizing disorders seem to be closely comparable in males
and females; however, findings from comorbidity and developmental genetic research
have been inconsistent (Eaves et al., 2000; Hicks et al., 2007; Moffitt et al.,
2000). For example, Moffitt (1993) reported that the sex difference is most marked for
life-course-persistent antisocial behavior and that antisocial behavior in females is less often
associated with ADHD. It also remains uncertain whether ADHD in males and females
represents the same causal processes (Rhee et al., 1999).
There is still little evidence that boys are more susceptible to psychosocial risks for
externalizing disorders (Rutter, Caspi, & Moffitt, 2003). However, Hicks et al. (2007)
used the externalizing spectrum model to test gender differences in externalizing
liability. They hypothesized that gender differences exist in mean levels of the general
externalizing liability, and tested this hypothesis with confirmatory factor models
that incorporate the means of the individual disorders. The results showed that the
emergence of gender differences was due to risk factors operating at the general
vulnerability level rather than the individual externalizing syndrome level. Hicks et al.
thus proposed that studies of sex differences should shift from examining them
at the disorder level to examining them at the general factor level. However, their study
was conducted in a late adolescent and young adult sample, and the results have not been
confirmed in child samples. There is still a lack of knowledge about the
developmental process of sex differentiation.
1.5 Rater Effects
One important issue to consider when investigating the genetic and environmental
contribution to the externalizing factor is the source of information on which
measurement is based. It is well known that different raters report a child’s behavior
differently. As shown in Achenbach, McConaughy, & Howell (1987), correlations
between raters of the same child are typically .60 between mother and father ratings, .28
between parent and teacher ratings, and .22 between parent and child ratings. Thus
findings from externalizing behavior studies relying on a single informant’s report may not be replicable in
another study using a different informant. For example, Silberg et al. (1996) applied
genetic models to the maternal ratings of 1197 twin pairs aged 8-16. Bivariate modeling
revealed that the covariation between ADHD and ODD was almost entirely due to
genetic factors. In contrast, another study conducted by Burt, Krueger, McGue, &
Iacono (2001) on 1506 twin pairs aged 11-12, based on combined mother and child
reports, found only moderate genetic effects on the covariation among the three
externalizing disorders, whereas the largest contribution to the covariation appeared to come from a
single shared environmental factor common to all disorders.
Differences between raters may be due to the environment in which the
observations are made. Considering the difference between parent and teacher ratings, the
parents are more likely to compare twins with each other, and so may exaggerate their
differences and similarities. Teachers, on the other hand, can compare each twin with a
large number of children of similar age, so their ratings may be more objective (Martin,
Scourfield, & McGuffin, 2002). Twin confusion may also be a factor, in that a teacher
who has both twins in his/her class may rate the wrong twin, whereas parents would
seldom make mistakes in reporting on their own children (Simonoff et al., 1998). Moreover,
the twins might behave differently at home and at school, so that the behaviors
observed by parents and teachers are truly different. All of this indicates that evidence from
either rater alone might not be conclusive. Therefore, inclusion of multiple ratings may
improve the assessment of true child externalizing behaviors and the screening of children’s
behavior problems.
To date there have been literally hundreds of studies investigating the genetic etiology
of externalizing behaviors (cf. Krueger et al., 2002; Krueger et al., 2007). A large portion
of these investigations rely on a single rater to provide information, although they all acknowledge
the importance and the necessity of using various forms of self-, parent-, teacher-, or
peer-report to derive assessments. However, only a few studies in the literature have
focused on the methodological issues of analyzing multiple-rater data using genetically
informed designs. The present study thus represents an endeavor to contribute to the
methodological literature on multi-trait multi-rater genetic models in externalizing
research. It started in Chapter 2 by reviewing some commonly used methods in behavior
genetic research on externalizing behavior, specifically multi-rater analysis of data from
twins. An empirical analysis was then conducted to investigate the best externalizing
factor model for the USC twin project data. Issues related to the specific models used
were described in detail in Chapter 3. The methods for this empirical study,
including sample, instruments and procedure, were covered in Chapter 4, and results were presented in
Chapter 5. A unique feature of the current study was the use of a Monte Carlo simulation
design for evaluating model fitting results, which was the focus of Chapter 6. Finally,
interpretation of the results, as well as applications and limitations of the current study, were
discussed in Chapter 7.
Chapter Two: Methodological Issues in Genetic Externalizing Studies
2.1 Twin Study Design
To investigate genetic and environmental effects on a specific externalizing
behavior, such as aggression, researchers often begin with a simple univariate twin study.
This basic twin design assumes that individual differences in a phenotype are the result of
genetic influences, shared environmental influences and non-shared environmental
influences. The genetic influences are assumed to be additive, describing the independent
effects of individual genes summed over loci. Shared environmental influences are those
shared by twins or sibling pairs reared in the same families, while non-shared
environmental influences are those unique to either member of a twin pair and make
twins differ from each other. Symbolically, the observed value of a phenotype can be
expressed as a linear function of three latent variables, additive genetic effects (A),
common or shared environmental effects (C), and non-shared environmental effects (E).
This approach to uncovering the latent genetic and environmental causes of variation is
sometimes called biometrical genetical analysis (Mather and Jinks, 1982).
Within the ACE framework, twin researchers attempt to estimate how much of
the phenotypic variance is due to genetic effects, how much is due to shared environmental
effects, and how much is due to non-shared environmental effects. Assuming the independence
of the three components, the phenotypic variance can be expressed as the sum of the variances of
A, C and E as follows,
$$V_P = V_A + V_C + V_E. \qquad [1]$$
The amount of genetic variance expressed as a proportion of the total phenotypic
variance ($V_A/V_P$) is called the heritability ($h^2$). It can be estimated by
examining the covariance between MZ and DZ twins. Since MZ twins raised in the same
family share the exact same genes and share a common environment, the observed
covariance between MZ twins provides an estimate of $V_A + V_C$. The differences
between MZ twins result only from unique environmental influences. DZ twins have a
common shared environment, but share only half of their genes on average, so the
covariance between DZ twins is an estimate of $0.5\,V_A + V_C$. The two estimates can be
summarized in the following equations,
$$COV_{MZ} = V_A + V_C \qquad [2]$$
and
$$COV_{DZ} = 0.5\,V_A + V_C, \qquad [3]$$
where $COV_{MZ}$ and $COV_{DZ}$ are the covariances of a given trait in pairs of MZ and DZ
twins respectively.
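Equations [1] to [3] can be solved by hand for the three variance components, which gives a quick moment-based check before any formal model fitting. The following Python sketch does exactly that. The function name and the input values are hypothetical illustrations, not estimates from the USC sample.

```python
def ace_from_covariances(var_p, cov_mz, cov_dz):
    """Solve equations [1]-[3] for V_A, V_C and V_E (illustrative helper)."""
    v_a = 2.0 * (cov_mz - cov_dz)   # [2] minus [3] leaves 0.5 * V_A
    v_c = 2.0 * cov_dz - cov_mz     # twice [3] minus [2] leaves V_C
    v_e = var_p - cov_mz            # [1] minus [2] leaves V_E
    return {"V_A": v_a, "V_C": v_c, "V_E": v_e, "h2": v_a / var_p}


# Hypothetical standardized inputs: total variance 1.0,
# MZ covariance 0.70, DZ covariance 0.45.
est = ace_from_covariances(var_p=1.0, cov_mz=0.70, cov_dz=0.45)
print(est)
```

Nothing in this arithmetic prevents a negative variance estimate, for instance when the MZ covariance is more than twice the DZ covariance, which is one reason structural equation models with bounded or factorized parameters are usually fitted instead.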
The above equations relate the observed phenotype to latent genetic and
environmental variables and specify the covariation among them. This technique of using
a series of equations to estimate parameters that characterize the relationship between
several latent and observed variables is called structural equation modeling (SEM), and is
now widely used in quantitative genetic research (see McArdle & Goldsmith, 1990).
From these structural equations it is possible to derive the phenotypic covariance implied
by the model through the use of matrix algebra. Parameter estimates for the SEM are then
obtained by minimizing a maximum likelihood fitting function, which quantifies
the difference between the observed covariance matrix and the covariance matrix implied
by the model. These fitting functions provide a measure of how well the model fits the
data as well as the significance of each of the parameters.
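As an illustration of the fitting function just described, the sketch below computes the usual maximum likelihood discrepancy, F = ln|Σ| + tr(SΣ⁻¹) − ln|S| − p, for a single trait measured in both members of a pair (p = 2). It is a toy under stated assumptions: the helper names and the observed matrices are invented for illustration, the MZ and DZ groups are weighted equally rather than by sample size, and no optimizer is included. The analyses reported in this dissertation were run in Mplus, not with this code.

```python
import math


def implied_cov(v_a, v_c, v_e, r_a):
    """Model-implied 2x2 covariance of twin 1 and twin 2 scores.

    r_a is the genetic correlation: 1.0 for MZ pairs, 0.5 for DZ pairs.
    """
    var = v_a + v_c + v_e
    cov = r_a * v_a + v_c
    return [[var, cov], [cov, var]]


def ml_discrepancy(s, sigma):
    """ML discrepancy between observed s and implied sigma (2x2 only)."""
    det_s = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    det_m = sigma[0][0] * sigma[1][1] - sigma[0][1] * sigma[1][0]
    inv = [[sigma[1][1] / det_m, -sigma[0][1] / det_m],
           [-sigma[1][0] / det_m, sigma[0][0] / det_m]]
    # trace of the matrix product s * inv
    trace = sum(s[i][k] * inv[k][i] for i in range(2) for k in range(2))
    return math.log(det_m) + trace - math.log(det_s) - 2


def total_misfit(params, s_mz, s_dz):
    """Sum of the MZ and DZ discrepancies (equal group weights assumed)."""
    v_a, v_c, v_e = params
    return (ml_discrepancy(s_mz, implied_cov(v_a, v_c, v_e, 1.0))
            + ml_discrepancy(s_dz, implied_cov(v_a, v_c, v_e, 0.5)))


# Hypothetical observed covariance matrices for MZ and DZ pairs.
s_mz = [[1.0, 0.70], [0.70, 1.0]]
s_dz = [[1.0, 0.45], [0.45, 1.0]]

fit_true = total_misfit((0.5, 0.2, 0.3), s_mz, s_dz)   # reproduces both matrices
fit_wrong = total_misfit((0.2, 0.5, 0.3), s_mz, s_dz)  # same MZ fit, wrong DZ
print(fit_true, fit_wrong)
```

The second parameter set reproduces the MZ matrix exactly but implies a DZ covariance of 0.60 rather than 0.45, so all of its misfit comes from the DZ group. This is the sense in which the MZ/DZ contrast identifies the genetic and shared environmental components.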
Presentation of structural equations is often accompanied by path diagrams, which
are a useful heuristic tool for graphically displaying causal and correlational relations between
variables. In path diagrams, observed variables are often represented by square boxes,
whereas latent variables are represented by circles. Causal paths between two variables
are defined by single-headed arrows, and correlations are defined by double-headed
arrows. The strength of association between variables is measured by a path coefficient,
which is equivalent to a regression coefficient in a regression analysis, or a correlation
coefficient. Path diagrams are easier to understand than sets of equations because the
expected covariance matrix implied by a path model can be derived through the tracing
rules of path analysis (see McArdle & Goldsmith, 1990). Figure 1 is the path diagram
showing a univariate ACE model typically used in twin studies.
Figure 1. Univariate ACE model for twin data
[Path diagram: phenotypes P_T1 and P_T2 with latent factors A1, C1, E1 and A2, C2, E2 (variances V_A, V_C, V_E; paths fixed at 1.0); A1 and A2 correlate MZ = 1.0 / DZ = 0.5, and C1 and C2 correlate 1.0.]
2.2 Assumptions of Twin Models
Several important assumptions are implied by the logic of ACE models. First, the
three components A, C, and E are assumed to be independent of each other. This
assumption implies that there are no correlations or interactions between genetic and
environmental components. Second, the shared environmental influences experienced by MZ
and DZ twins are assumed to be equal. This equal environment assumption
may be violated if MZ twins are treated more alike by other people than DZ twins and if
this treatment influences the trait under study. Third, it is assumed that there is no
assortative mating between spouses for the studied trait. The presence of assortative
mating may result in genetic similarity between spouses, and may increase the
resemblance between DZ twins. Finally, non-additive genetic effects, such as genetic
dominance (interaction between alleles at the same locus) and epistasis
(interactions between alleles at different loci), are ignored in ACE models because they
cannot be estimated simultaneously with shared environmental effects from twin data alone. Non-additive genetic effects, however, could be
examined in more complex twin designs, which include extended family members such
as parents.
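The identification problem behind the last assumption can be made explicit with a short worked expectation. The expressions below are the standard biometrical expectations for twins reared together, not something derived in this chapter, and the dominance variance $V_D$ is notation introduced here for illustration only:

$$COV_{MZ} = V_A + V_D + V_C, \qquad COV_{DZ} = 0.5\,V_A + 0.25\,V_D + V_C, \qquad V_P = V_A + V_D + V_C + V_E.$$

A sample of MZ and DZ pairs supplies three informative statistics for a trait (the phenotypic variance and the two twin covariances), whereas these expressions contain four unknown components. One component therefore has to be fixed, which is why a model with additive genetic, dominance, and shared environmental effects together is not identified in the classical design, and why additional relatives are needed to separate them.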
2.3 Multi-trait Twin Models
Using SEM and path diagrams to analyze variance components influencing a
single phenotype can be extended to understand multiple phenotypes and their
interrelationships (Martin and Eaves, 1977). In multivariate genetic analysis, information
not only comes from the covariance between twins for each trait (as in univariate models),
but also from the covariance between different traits within a twin, as well as the
cross-twin cross-trait covariance. The same logic in univariate analysis applies, that is, a larger
cross-twin cross-trait correlation between MZ twins as compared with DZ twins suggests
that covariance between the variables is partially due to the genetic factors that may be
shared by the different traits. Multivariate genetic models estimate matrices of genetic
covariance ($\Sigma_A$), shared environmental covariance ($\Sigma_C$) and non-shared environmental
covariance ($\Sigma_E$). The expected phenotypic covariance ($\Sigma_P$) can thus be partitioned into
three components as follows:
$$\Sigma_P = \Sigma_A + \Sigma_C + \Sigma_E, \qquad [4]$$
which may be viewed as an extension of equation [1].
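To make the sources of information concrete, the sketch below builds the expected within-twin and cross-twin covariance blocks for two traits. The within-twin block follows equation [4]. The cross-twin block is assumed here to follow the multivariate analogue of equations [2] and [3], that is, the genetic covariance matrix weighted by 1.0 (MZ) or 0.5 (DZ) plus the shared environmental covariance matrix. Helper names and matrix values are hypothetical.

```python
def mat_add(*mats):
    """Element-wise sum of equally sized matrices given as nested lists."""
    rows, cols = len(mats[0]), len(mats[0][0])
    return [[sum(m[i][j] for m in mats) for j in range(cols)]
            for i in range(rows)]


def mat_scale(mat, k):
    """Multiply every element of a matrix by the scalar k."""
    return [[k * x for x in row] for row in mat]


def expected_blocks(sig_a, sig_c, sig_e, r_a):
    """Return (within-twin, cross-twin) expected covariance blocks."""
    within = mat_add(sig_a, sig_c, sig_e)            # equation [4]
    cross = mat_add(mat_scale(sig_a, r_a), sig_c)    # assumed analogue of [2]/[3]
    return within, cross


# Hypothetical 2 x 2 component matrices for two traits.
sig_a = [[0.5, 0.3], [0.3, 0.4]]
sig_c = [[0.2, 0.1], [0.1, 0.2]]
sig_e = [[0.3, 0.0], [0.0, 0.4]]

within, cross_mz = expected_blocks(sig_a, sig_c, sig_e, r_a=1.0)
_, cross_dz = expected_blocks(sig_a, sig_c, sig_e, r_a=0.5)
print(cross_mz[0][1], cross_dz[0][1])
```

In this example the off-diagonal element of the cross-twin block, the cross-twin cross-trait covariance, is larger for MZ than for DZ pairs because the two traits share genetic covariance.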
A popular procedure to resolve the genetic and environmental components of
covariance is to use a triangular decomposition, and this procedure is often referred to as
fitting a “Cholesky factor model” (Neale & Cardon, 1992). Figure 2 is a path diagram
that depicts an example of the trivariate Cholesky model with three phenotypes. In this
model, relationships among three phenotypes are parameterized in terms of three factors
(genetic or environmental), where all three phenotypes load on the first factor, the last two
phenotypes load on the second factor, and only the last phenotype loads on the third
factor. Estimates of factor loadings are used to evaluate the magnitude of genetic and
environmental influences on each phenotype, and the extent to which these influences
contribute to the covariation among phenotypes (Martin and Eaves, 1977). The use of
Cholesky factors to estimate $\Sigma_A$, $\Sigma_C$, and $\Sigma_E$ ensures that the estimated covariance
components are each positive semi-definite, and positive definite (i.e. of full rank) whenever all diagonal loadings are nonzero.
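Figure 2. Cholesky decomposition of three phenotypes (one twin only)
[Path diagram: genetic factors A1, A2, A3 load on phenotypes P1, P2, P3 through paths a11, a21, a31; a22, a32; and a33, with parallel factor sets C1-C3 and E1-E3.]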
Fitting a multivariate genetic model using Cholesky factorization is probably the
most commonly used approach in twin studies. Referring to it as a model, however, is
somewhat misleading, since it is primarily a technique for estimating a covariance
structure under the constraint that the estimated covariance matrix is positive definite
(see Carey, 2005). Its algorithm is to decompose genetic and environmental covariance
matrices as the product of a lower triangular matrix post-multiplied by its transpose. For
example, a triangular decomposition of a 3 × 3 genetic covariance matrix $\Sigma_A$ is
represented by:
$$\Sigma_A = \Lambda_A \Lambda_A', \qquad [5]$$
where $\Lambda_A$ is a lower triangular matrix and has the form:
$$\Lambda_A = \begin{pmatrix} a_{11} & 0 & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & a_{33} \end{pmatrix}. \qquad [6]$$
Although all parameters in a triangular decomposition can be estimated, caution should
be taken when interpreting results based on these parameters. In fact, the Cholesky
factorization model only describes a correlational pattern among all genetic or
environmental factors, but does not provide theoretical insights into why certain
phenotypes or behaviors tend to co-occur in a population.
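The triangular decomposition in equations [5] and [6] can be illustrated numerically. The sketch below multiplies a hypothetical lower triangular loading matrix by its transpose and checks that the resulting covariance matrix is symmetric with positive quadratic forms for a few test vectors. The loading values and helper names are made up for illustration.

```python
def lower_times_transpose(lam):
    """Return lam * lam' for a square matrix given as nested lists."""
    n = len(lam)
    return [[sum(lam[i][k] * lam[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quadratic_form(mat, x):
    """Return x' * mat * x."""
    n = len(x)
    return sum(x[i] * mat[i][j] * x[j] for i in range(n) for j in range(n))


# Hypothetical genetic loadings laid out as in equation [6].
lam_a = [[0.7, 0.0, 0.0],
         [0.4, 0.5, 0.0],
         [0.3, 0.2, 0.6]]

sigma_a = lower_times_transpose(lam_a)   # equation [5]

# A few arbitrary contrast vectors; each quadratic form should be positive
# because every diagonal loading in lam_a is nonzero.
checks = [quadratic_form(sigma_a, x)
          for x in ([1, -1, 0], [0, 1, -1], [1, -2, 1])]
print(sigma_a)
print(checks)
```

Because the loadings are unconstrained real numbers, an optimizer can search over them freely while the implied covariance matrix stays admissible, which is the practical appeal of the parameterization.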
Despite the fact that it lacks any theoretical basis, the Cholesky model is often
used as a comparison for more restrictive models such as the independent-pathway model
(Martin & Eaves, 1977) and the common-pathway model (McArdle, 1986), which offer a
strong theoretical interpretation of the causes of covariation between phenotypes.
Both of these alternative models assume covariation among phenotypes is caused by
common latent genetic and environmental factors, and variance specific to each
phenotype is due to the influence of specific genetic and environmental factors. However,
the two models differ in the way that common factors influence different phenotypes.
Figure 3. Independent pathways model (one twin only)
The independent-pathway model (Kendler et al., 1987; Martin & Eaves, 1977),
also called the biometric common factors model (McArdle, 1986), is shown in Figure 3. The
model assumes that each phenotypic matrix can be formed from common and unique
factors that represent genetic and non-genetic factors. The biometric factor structure at
the top of Figure 3 addresses the genetic and environmental components of variance
common to the different phenotypes. These components are orthogonal, thus their factor
loadings are called independent pathways.
The common-pathway model (Kendler et al., 1987), also called the psychometric
common factors model (McArdle & Goldsmith, 1990), is depicted in Figure 4. The model
assumes that each phenotype can be decomposed into a common psychometric factor F
and three unique biometric factors, and the common psychometric factor F has biometric
sources of influences. This pattern clearly shows a higher-order factor model, in which
the first-order common psychometric factor F is influenced by second-order biometric
sources A, C and E.
Figure 4. Common pathway model (one twin only)
Due to the identical structuring of the unique biometric factors, all differences
between the independent-pathway and the common-pathway model come from the
common factor structure at the top of the two figures. The independent-pathway
model has three orthogonal factors that combine to produce the manifest observations,
whereas the common-pathway model has only one common factor positioned to account
for the same manifest observations. However, the common-pathway model can be derived
from the independent-pathway model with the additional constraint of proportionality of
factor loadings, meaning that it is nested within the independent-pathway model and is a
constrained biometric factor model.
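The proportionality constraint can be seen in a small numerical sketch. Under the common-pathway model, the effect of the common A, C, and E sources on phenotype i is the product of that phenotype's loading on F and the path from A, C, or E into F, so the implied loadings of every phenotype stand in the same ratio. All names and values below are hypothetical and serve only to illustrate the nesting.

```python
def common_pathway_loadings(lambdas, a_f, c_f, e_f):
    """Loadings on common A, C, E implied by a common-pathway model.

    lambdas: loading of each phenotype on the psychometric factor F.
    a_f, c_f, e_f: paths from the common A, C and E sources into F.
    """
    return [(lam * a_f, lam * c_f, lam * e_f) for lam in lambdas]


def is_proportional(loadings, tol=1e-9):
    """True if every (a, c, e) triple is a multiple of the first one."""
    a0, c0, e0 = loadings[0]
    # Cross-multiplication avoids dividing by a zero loading.
    return all(abs(a * c0 - a0 * c) <= tol
               and abs(a * e0 - a0 * e) <= tol
               and abs(c * e0 - c0 * e) <= tol
               for a, c, e in loadings[1:])


# Loadings implied by a hypothetical common-pathway model.
cp = common_pathway_loadings([0.8, 0.6, 0.5], a_f=0.7, c_f=0.4, e_f=0.59)

# Unconstrained loadings, as an independent-pathway model would allow.
ip = [(0.56, 0.32, 0.47), (0.20, 0.40, 0.10), (0.35, 0.20, 0.30)]

print(is_proportional(cp), is_proportional(ip))
```

On the usual parameter count, with p phenotypes the common part of the independent-pathway model has 3p free loadings, whereas the common-pathway model has p loadings on F plus three paths into F, less one scaling constraint. That count assumes the variance of F is fixed for identification.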
The independent-pathway model depicted in Figure 3 and the common pathway
model in Figure 4 can both be viewed as special kinds of “one factor” models because
only one latent genetic factor accounts for the genetic covariation among phenotypes.
When more than one common genetic factor exists for the studied phenotypes, the
independent-pathway and the common-pathway models can be extended to incorporate
multiple common factors by using triangular decomposition on their biometric components
(McArdle and Goldsmith, 1990). This approach, however, requires a sufficient
number of phenotypes or indicators for each factor for the models to be identified.
All three multi-trait genetic models (Cholesky, independent-pathway, and
common-pathway models) have been heavily utilized in externalizing behavior research
(Burt, Krueger, McGue, & Iacono, 2001, 2005; Hicks, et al., 2007; Kendler, Prescott,
Myers, & Neale, 2003; Krueger et al., 2002; Nadder, et al., 2001; Dick et al., 2005;
Thapar, Harrington, & McGuffin, 2001). The common-pathway model seems to be
favored because it is the most parsimonious of the three and its psychometric
features and merits are in line with the traditional psychological approach. For example, in
the widely cited study by Krueger et al. (2002), a common-pathway model was found to
best explain covariation among some common externalizing problems, including
antisocial behavior, conduct disorder and various forms of substance use. These
externalizing behaviors shared genetic and environmental effects through a single
externalizing factor. However, this model is not always replicable, especially among
studies that examine the DSM externalizing syndromes. For example, the study by Burt,
Krueger, McGue and Iacono (2005) found that covariation among conduct disorder,
attention deficit hyperactivity disorder, and oppositional defiant disorder could not be
explained by a single latent factor and thus had to be fit with multiple genetic factors in a
Cholesky model.
2.4 Multi-rater Twin Models
In addition to the investigation of covariation among multiple traits or phenotypes,
multivariate twin models introduced in the last section can also be extended to describe
multiple rater data in genetic research. When the assessment of children is based on
ratings from multiple informants, the ratings obtained are a function of both the child’s
behavior and characteristics of the informant. Disentangling the child’s phenotype from
characteristics of the rater becomes an important methodological problem.
Neale and Cardon (1992) summarized three genetic models for multiple rater data,
namely, biometric models, psychometric models and rater bias models. The following
explains the assumptions of the three models using parent and teacher ratings on a single
phenotype.
Biometric or Cholesky decomposition models (see Figure 5) assume that the parent and
teacher ratings are assessments of different phenotypes of the child ($Y_P$ and $Y_T$,
respectively). The phenotypes may be correlated but for unspecified reasons. This view
may be true if parents and teachers reported on behaviors observed in distinct situations
(e.g. home and school), or if they do not share a common understanding of the behavioral
descriptions.
Figure 5. Biometric model for ratings of a pair of twins (t1 and t2)
by their primary caregiver and teacher
[Path diagram. Observed ratings: P_t1, T_t1, P_t2, T_t2. Latent components: A_P, C_P, E_P, A_T, C_T, E_T. Cross-twin correlations: MZ=1.0/DZ=0.5 and 1.0.]
Psychometric models (See Figure 6) have a more restrictive assumption that there is
a common phenotype (F) of the child which is assessed both by parents and by teachers,
and a component of each rater’s ratings which results from an assessment of an
independent aspect of the child. Parent ratings and teacher ratings are presumed to
correlate because the raters base their assessments on shared observations and
have a shared understanding of the behavioral descriptions used in the assessment.
Figure 6. Psychometric model for ratings of a pair of twins
by their primary caregiver and teacher
[Path diagram. Observed ratings: P_t1, T_t1, P_t2, T_t2. Common phenotype: F_t1, F_t2 with components A, C, E. Rater-specific components: A_P, C_P, E_P, A_T, C_T, E_T. Cross-twin correlations: MZ=1.0/DZ=0.5 and 1.0.]
In rater bias models (see Figure 7), the rating of a child’s phenotype is considered
to be a function both of the child’s phenotype and of the bias introduced by the rater. Bias
in this context is the tendency of an individual rater to overestimate or underestimate
scores consistently. Rater bias models allow the variance of different ratings to be
partitioned into components due to reliable trait variance (in F), due to parental or teacher
bias ($B_P$ and $B_T$), and due to unreliability or errors ($R_P$ and $R_T$). Rater bias models
represent special restricted cases of the more general biometric and psychometric models. It is
possible to compare the adequacy of the rater bias model with that of the alternative bivariate
psychometric and biometric models.
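Figure 7. Rater bias model for ratings of a pair of twins
by their primary caregiver and teacher
[Path diagram. Observed ratings: P_t1, T_t1, P_t2, T_t2. Common phenotype: F_t1, F_t2 with components A, C, E. Rater bias: B_P, B_T. Residuals: R_P, R_T. Cross-twin correlations: MZ=1.0/DZ=0.5 and 1.0.]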
2.5 Multi-trait Multi-rater Models
The preceding section has discussed how to analyze twin data obtained from
multiple raters for the univariate (or one phenotype) case. The more general issue,
however, is how to simultaneously analyze multiple phenotypes assessed by multiple
raters. This type of multi-rater multi-trait data has long been of great interest to
methodologists. Several models and methods have been proposed for studying rater bias,
such as the weighted-average model (Kenny, 1991), the realistic accuracy model (Funder,
1995), or generalizability theory (Cronbach, Gleser, Nanda, & Rajaratnam, 1972).
After reviewing previous theoretical and empirical work, Hoyt (2000) proposed a
model based on multivariate generalizability theory and showed how the total variance of
the scores assigned by a rater to a target on attribute Y, $\sigma^2(Y)$, can be partitioned into
four components as follows:
$\sigma^2(Y) = \sigma^2(t) + \sigma^2(r) + \sigma^2(d) + \sigma^2(\varepsilon)$, [7]
where $\sigma^2(t)$ is the target variance, $\sigma^2(r)$ is the rater variance, $\sigma^2(d)$ is the dyadic variance,
and $\sigma^2(\varepsilon)$ is the variance of the error term or residual variance. Target variance, $\sigma^2(t)$, is
the variance of the deviations of the target’s mean rating from the grand mean. It reflects
the score of the target on the trait of interest that is shared by all raters. Rater variance,
$\sigma^2(r)$, is the variance of the deviations of a rater’s mean rating from the grand mean of all
raters of that target. It reflects how a rater generally perceives targets on a trait differently
than the average rater, that is, the unique view of a rater, or rater bias. Dyadic variance,
$\sigma^2(d)$, is variance attributable to raters’ unique perceptions of a specific target. This
component can only be estimated when multiple ratings (e.g., measures, forms of the
rating scale, or occasions) are available for each rater-target pair.
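The partition in Equation 7 can be illustrated with a small numerical sketch. The function name and the component values below are hypothetical and chosen only for illustration; they are not estimates from Hoyt (2000) or from the present study.

```python
def variance_shares(target, rater, dyadic, error):
    """Return total rating variance and each component's share of it (Equation 7)."""
    components = {"target": target, "rater": rater, "dyadic": dyadic, "error": error}
    if any(value < 0 for value in components.values()):
        raise ValueError("variance components must be non-negative")
    total = sum(components.values())
    if total == 0:
        raise ValueError("total variance must be positive")
    shares = {name: value / total for name, value in components.items()}
    return total, shares


# Hypothetical variance components (illustration only).
total, shares = variance_shares(target=0.40, rater=0.25, dyadic=0.15, error=0.20)
for name, share in shares.items():
    print(f"{name}: {share:.2f} of total variance")
```

In this reading, the target share is the part of the rating variance that all raters agree on, while the rater and dyadic shares reflect rater-related variance.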
Hoyt’s model has been expanded to twin studies by Bartels et al. (2007), who
showed that the variance components $\sigma^2(t)$ and $\sigma^2(r)$ can be decomposed into genetic and
environmental parts in a psychometric rater model (as in Figure 6). For example, the total
variances of the mother and teacher ratings on twin 1 (subscripts p and t, respectively) would be written as
$\sigma^2(Y_p) = (a^2 + c^2 + e^2) + (a_p^2 + c_p^2 + e_p^2)$ [8]
and
$\sigma^2(Y_t) = (a^2 + c^2 + e^2) + (a_t^2 + c_t^2 + e_t^2)$ [9]
The first part within parentheses is identical in both equations and represents the rater
agreement variance, including components from additive genetic ($a^2$), shared
environmental ($c^2$), and non-shared environmental variance ($e^2$). The second part within
parentheses represents the rater disagreement variance or rater-specific components
($a_p^2$, $c_p^2$, and $e_p^2$ for the mother ratings; $a_t^2$, $c_t^2$, and $e_t^2$ for the teacher ratings),
which may be further decomposed into variance related to true behavior
of the child and bias and/or error. These sources of information, however, cannot be
distinguished in a non-genetically informative sample. Therefore, using multiple rater
data in a genetically informative design improves the interpretation of estimated variance
components in rater bias models, providing more insight into the possible sources of trait
variances and rater variances.
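As a minimal sketch of Equations 8 and 9, the following function splits one rater's total rating variance into the rater-agreement part and the rater-specific part. The function name, the argument names, and the path coefficients are hypothetical; unit loadings of the ratings on the common factor are assumed, as in the equations.

```python
def decompose_rating(a, c, e, a_s, c_s, e_s):
    """Decompose one rater's rating variance (Equations 8 and 9).

    a, c, e       : paths for the common (rater-agreement) part
    a_s, c_s, e_s : paths for the rater-specific (disagreement) part
    """
    common = a ** 2 + c ** 2 + e ** 2
    specific = a_s ** 2 + c_s ** 2 + e_s ** 2
    total = common + specific
    return {
        "total": total,
        "agreement_share": common / total,      # share of variance raters agree on
        "h2_common": a ** 2 / common,           # heritability of the shared view
        "h2_rating": (a ** 2 + a_s ** 2) / total,  # heritability of the single rating
    }


# Hypothetical path coefficients for a mother rating (illustration only).
mother = decompose_rating(a=0.6, c=0.3, e=0.2, a_s=0.4, c_s=0.5, e_s=0.5)
print(mother)
```

With these made-up values the heritability of the common part exceeds that of the single rating, which is one way the shared view can look more heritable than any one informant's report.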
In the aforementioned study by Bartels et al. (2007), variance components are
analyzed through the use of a psychometric factors model, which has become a common
practice in multi-trait multi-rater genetic research. In fact, it has been suggested that the
best way to model twin data obtained from multiple raters is to use a multivariate,
psychometric factor based approach that allows for both differences and correlations
across informants simultaneously (Bartels et al., 2003, 2004; Kraemer, et al., 2003; van
der Valk, van den Oord, Verhulst, & Boomsma, 2001, 2003). The advantages of using
such an approach include: 1) it allows genetic and environmental etiologies to be
modeled from different perspectives of the raters; 2) it allows significance of certain
types of rater bias to be modeled and tested; and 3) because the underlying common
factor represents a “shared view” of externalizing factor across raters, the heritability of
the common factor may be higher than the heritability obtained from single informants. If
that is the case, combining information from different types of reporters may give
researchers a stronger basis for further molecular genetic studies (Baker, et al., 2007).
Psychometric rater models may be cumbersome to analyze when there are a large
number of measures obtained from each informant. A simple strategy to deal with this
issue is to run a principal component analysis on all the response scales within each rater
and then use the factor-based composite scores in the multivariate model. Thereby, a
multi-rater multivariate problem is transformed into a univariate problem, which only
needs to deal with a single composite score for each informant. An implicit assumption of
this approach, however, is that there is a single, general factor underlying the various
measures of each rater. Whether the covariance among these measures can be
accounted for by a single factor must be confirmed ahead of time.
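The composite-score strategy can be sketched as follows for a single rater. This is an illustrative implementation under stated assumptions, not the procedure of any cited study: the scales are standardized, the first principal component of their correlation matrix is found by simple power iteration (which assumes the scales are mostly positively correlated), and the data are hypothetical. The returned variance share gives a rough check on the single-factor assumption.

```python
import math


def standardize(column):
    """Convert raw scores to z-scores using the sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / (n - 1))
    return [(x - mean) / sd for x in column]


def first_component(scales, iterations=500):
    """Return (weights, variance share, composite scores) for one rater.

    `scales` is a list of columns, one column of scores per response scale.
    """
    z = [standardize(col) for col in scales]
    k, n = len(z), len(z[0])
    # Correlation matrix of the standardized scales.
    corr = [[sum(z[i][r] * z[j][r] for r in range(n)) / (n - 1) for j in range(k)]
            for i in range(k)]
    # Power iteration for the leading eigenvector of the correlation matrix.
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        w = [x / norm for x in w]
    eigenvalue = sum(w[i] * corr[i][j] * w[j] for i in range(k) for j in range(k))
    share = eigenvalue / k  # proportion of total variance on the first component
    composite = [sum(w[i] * z[i][r] for i in range(k)) for r in range(n)]
    return w, share, composite


# Hypothetical scores on three scales from one rater for five children.
caregiver_scales = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [2.0, 2.5, 3.5, 5.0, 5.5],
    [0.5, 1.5, 1.0, 3.0, 3.5],
]
weights, share, composite = first_component(caregiver_scales)
```

A low variance share would suggest that a single composite does not summarize that rater's scales well.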
The principal component analysis strategy was adopted by Baker et al. (2007) in a
multi-informant study on childhood antisocial behaviors. The study collected 18 different
measures of antisocial behavior from three unique informants: caregivers, teachers and
children. Figure 8 shows the psychometric rater model used in the study with three
composite antisocial scores. Both rater-shared view and rater-specific view of antisocial
behavior can be investigated in this model.
Figure 8. Psychometric rater model from Baker, et al. (2007)
In addition to the use of composite measures, another unique feature of this study
is that additional information about teacher rating ($R_T$) was obtained, since some twin
pairs were rated by two teachers and other twins were rated by only one teacher. Usually
the rater bias and shared environmental effect are confounded with each other when one
rater assesses both twins. This confound is represented by dashed lines in Figure 8 for the
mother rater effect ($R_M$) and shared environmental effect ($C_M$). Because of this confound,
only one parameter can be estimated in the psychometric rater model, and this parameter
may contain shared environment influences, a rater effect, or a combination of both.
However, this is not the case for teacher ratings, since some teachers rate only one twin
per family, and others rate both twins. Twin pairs can thus be effectively divided into
those who share the same classroom and those who do not, allowing differentiation of
teacher-reported shared environmental influences from potential teacher bias.
It may be noted that the multi informant analysis in the study by Baker et al.
implies a two-step procedure: the first step is a factor analysis to derive the factor-based
composite scores, and the second step is to use the derived composites as if they are
observed scores for analysis in a psychometric rater model. These two steps, however,
may be combined into a single step and analyzed directly using a higher order factor
model as depicted in Figure 9. To simplify the illustration, Figure 9 only shows the
measurement part of the model (i.e. without biometric components). The model uses
first-order latent factors, corresponding to the evaluation by each informant, to account
for covariation among multiple measures. The three first-order latent factor scores can be
interpreted similarly as the composite scores used in Figure 8. However, the latent scores
are unbiased estimates of the true scores and are thus more accurate as indicators for the
higher order factor.
Figure 9. Higher order factor model structure
[Path diagram. Indicators v1 to v18 with residuals ε1 to ε18. First-order factors: Caregiver Report, Child Report, Teacher Report. Higher-order factor: Shared View.]
Despite the accuracy of its parameter estimates, a higher order factor model is rarely fit
directly to twin data (consider, for example, that adding the biometric components A,
C, and E would yield a third-order factor analysis). However, a higher order factor model
can be transformed into a special single-order orthogonal factor model, called a
hierarchical factor model or group factor model. Figure 10 shows an example of the
hierarchical model, which is equivalent to the higher order model in Figure 9 when
proportionality constraints are imposed on the factor loadings of the former (Schmid &
Leiman, 1957; Gignac, 2007; Yung, 1999).
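The transformation from a higher order model to a hierarchical model can be sketched numerically. The sketch below assumes standardized factors (unit variances), each indicator loading on exactly one first-order factor, and hypothetical loading values; the function name is illustrative. Under those assumptions, an indicator with first-order loading λ on a factor whose second-order loading is γ has a general-factor loading of λγ and a group-factor loading of λ·sqrt(1 − γ²), so the ratio of the two is constant within each group (the proportionality constraint).

```python
import math


def schmid_leiman(first_order, second_order):
    """Convert higher order loadings into hierarchical (general, group) loadings.

    first_order  : {factor name: [indicator loadings on that first-order factor]}
    second_order : {factor name: loading of that factor on the higher-order factor}
    """
    result = {}
    for factor, loadings in first_order.items():
        gamma = second_order[factor]
        residual = math.sqrt(1.0 - gamma ** 2)  # part of the factor not due to the general factor
        result[factor] = [(lam * gamma, lam * residual) for lam in loadings]
    return result


# Hypothetical standardized loadings (illustration only).
first_order = {
    "Caregiver Report": [0.8, 0.7],
    "Child Report": [0.6, 0.5],
    "Teacher Report": [0.7, 0.9],
}
second_order = {"Caregiver Report": 0.6, "Child Report": 0.5, "Teacher Report": 0.8}

hierarchical = schmid_leiman(first_order, second_order)
for factor, pairs in hierarchical.items():
    for general, group in pairs:
        print(f"{factor}: general={general:.3f}, group={group:.3f}")
```

For each indicator the squared general and group loadings sum to the squared first-order loading, so the two parameterizations imply the same indicator covariances.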
Figure 10. Hierarchical factor model structure
[Path diagram. Indicators v1 to v18 with residuals ε1 to ε18. Group factors: Caregiver Report, Child Report, Teacher Report. General factor: Shared View.]
Applications of the hierarchical factor model have been found in only a few genetic
externalizing behavior studies (e.g. Krueger, et al., 2007; Nadder, et al., 2001). One twin
study in particular was conducted by Burt, McGue, Krueger and Iacono (2005), who
applied a hierarchical factor model (which they called an “informants-effect model”) to
investigate rater and shared environment effects on child externalizing disorders. They
analyzed ADHD, oppositional defiant disorder, and conduct disorder data based on
mother reports and child self-reports, and found the hierarchical model to be the best fit. Figure
11 shows the diagram for one twin in a pair. As can be seen, every indicator loads not
only on a general externalizing factor EXT (which represents the informants’ shared view
of the child’s externalizing behavior), but also on a rater-specific factor.
Figure 11. Hierarchical factors twin model from Burt, et al. (2005)
Another multivariate version of psychometric rater models was presented by
Bartels et al. (2003), who used a design with multi-trait multi-method (MTMM) flavor to
study aggressive behavior (AGG) and rule-breaking behavior (RB) rated by both father
and mother. The model they used is shown in Figure 12. The details of MTMM design
will be covered in the next chapter. However, it should be noted here that Bartels et al.
modeled two distinct but correlated phenotypes AGG and RB, instead of one common
externalizing factor as investigated in the study by Baker et al. (2007) or Burt et al.
(2005). In addition, each rater’s specific view on different phenotypes is modeled using
Cholesky factorization, allowing the least constrained factor structure for the rater-specific ratings of
each phenotype. Using this bivariate psychometric model, Bartels et al. found that both AGG
and RB are highly heritable in 12-year-old twins. The heritability of AGG was .69 for boys
and .72 for girls, and the heritability of RB was .79 for boys and .56 for girls. They also
found that 80% of the covariance between AGG and RB was attributable to common genes.
Figure 12. Bivariate psychometric rater model from Bartels, et al. (2003)
The studies reviewed in this section demonstrated different ways to model multi-trait
multi-rater data in genetic externalizing research, with particular attention to psychometric rater
models. This type of model has the attractive feature that the
variance of each observed variable can be decomposed into true variance (shared by
raters) and rater-specific variance (non-shared). These models can be viewed as
extensions or expansions of the multivariate models discussed in the two previous sections
(2.3 and 2.4).
Considering the complexity of the multi-rater multi-trait design, the current literature
on multi-informant studies is very limited. Behavior genetic studies need reliable results
that can be replicated across informants to yield strong evidence for molecular genetic
studies. However, previous studies were limited in that most of the multi-trait multi-rater
models proposed in the literature have not been investigated in a systematic way. The
effectiveness of the best fitting model and the power of the study design were often ignored.
To overcome these methodological limitations in previous research and to address
specific multi-trait multi-rater issues in externalizing behavior research, the current study
was conducted.
Chapter Three: Statistical Models
3.1 Measurement Models
To investigate the factor structure of externalizing behaviors, a necessary first step
is to run a confirmatory factor analysis (CFA) using a sample of independent
observations (i.e., not twin pairs). The results from this step may serve as reference points
for comparison with the subsequent twin analyses of paired observations and for
generalization to the general population of non-twin children.
Six competing CFA models, which represent different hypotheses about the
relationship among externalizing behavior problems, were investigated in this study.
These models are listed in order of model complexity in Figure 13. M1 is the simplest
measurement model with one factor representing the true externalizing score and six
observed scores being indicators, and no specificity pertaining to either rater. M2 is an
oblique two-factor model, with two factors representing parent view and teacher view of
child’s general externalizing behaviors. M3 is an oblique three-factor model, with each
factor representing a shared rating by parent and teacher on a specific trait only. For
purpose of clarity in the present paper, M1 is called a general factor model, M2 is called a
correlated rater factor model, and M3 is called a correlated trait factor model. In spite of
their different model specifications, models M1-M3 share the same features that (1) each
indicator is specified to load on only one factor and (2) the unexplained variance of one
indicator does not covary with the unexplained variance of another.
Figure 13. Six competing measurement models for the covariances among the six scores
[Path diagrams, Model M1 to Model M6. Indicators: ADHD_P, CAQ_P, CBCL_P, ADHD_T, CAQ_T, CBCL_T with residuals ε1 to ε6. Latent factors appearing across the panels: general factor F, rater factors F_P and F_T, and trait factors F_AD, F_CA, and F_CB.]
Evaluating models M1-M3 represents a common sequence of CFA analyses with
an increasing number of factors. Although a useful part of the investigation, this sequence
provides an incomplete evaluation of the multi-trait multi-rater data in the current study.
Specifically, rater effects may obscure the relationship among the externalizing problems.
When each construct is assessed by different raters, neither a correlated-rater factor
model (M2) nor a correlated-trait factor model (M3) alone can determine how much of
the observed overlap is due to rater effects as opposed to “true” covariance of the traits.
Therefore, the more complicated models M4-M6, which allow simultaneous assessment of
rater effects and trait effects, are proposed in the current study.
M4-M6 can be viewed as combinations of M1-M3. Put simply,
M1 + M2 ⇒ M4
M1 + M3 ⇒ M5
M2 + M3 ⇒ M6
To elaborate, M4, called the general factor and correlated rater model, indicates that one
general factor and group rater factors are all required to account for the observed
covariance. M5, called the general factor and correlated trait model, is interpreted similarly,
in that the covariance among the observed variables is decomposed into a part that is
due to a general factor and a part that is due to group trait factors.
M6, called the correlated-trait correlated-rater model, is the most complicated model
among the six. However, it represents an important application of the CFA approach to
analyzing data from a multitrait-multimethod (MTMM) study, the logic of which was first
articulated by Campbell and Fiske (1959). In an MTMM study, two or more traits are
measured with two or more methods. Traits are hypothetical constructs such as cognitive
abilities or personality attributes, and methods refer to multiple test forms, occasions,
specific measurement methods or specific informants. The main goals of an MTMM
study are to (1) evaluate the convergent and discriminant validity of a set of tests that
vary in their measurement method and (2) derive separate estimates of the effects of traits
versus methods on the observed scores (Kline, 2005). MTMM data are often analyzed by
CFA to make inferences about potential underlying dimensions such as trait and method
factors.
In the context of the present study, the traits concern ADHD symptoms, aggression
and delinquency, while the methods refer to parent and teacher reports. Model M6
represents a typical form of CFA specification for MTMM problems, and this
parameterization is referred to as the correlated-rater correlated-trait model in the
present study. The major aspects of this model’s specification include: (1) each indicator
is specified to load on two factors – its trait factor and its rater factor; (2) correlations
among trait factors and among informant factors are freely estimated, but the correlations
between trait and rater factors are fixed to zero; and (3) indicator uniquenesses (i.e.,
variance in the indicators not explained by the trait and informant factors) are freely
estimated but cannot be correlated with the uniquenesses of other indicators. Accordingly,
in this specification, each indicator is considered to be a function of trait, rater, and
unique factors.
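The three specification rules above imply a particular covariance structure, which can be sketched as follows. This is an illustration under stated assumptions rather than the study's model-fitting code: the indicator order, the mapping of CAQ and CBCL scores to the trait factors labeled CA and CB, and all parameter values are hypothetical. Because trait and rater factors are uncorrelated, the implied covariance of two indicators is the sum of a trait part and a rater part, plus a uniqueness on the diagonal.

```python
# Indicator order (assumed): ADHD_P, CAQ_P, CBCL_P, ADHD_T, CAQ_T, CBCL_T,
# each described by its (trait factor, rater factor).
INDICATORS = [("AD", "P"), ("CA", "P"), ("CB", "P"),
              ("AD", "T"), ("CA", "T"), ("CB", "T")]


def implied_covariance(trait_loadings, rater_loadings, trait_cov, rater_cov, uniquenesses):
    """Model-implied covariance matrix of a correlated-trait correlated-rater model."""
    n = len(INDICATORS)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (trait_i, rater_i) in enumerate(INDICATORS):
        for j, (trait_j, rater_j) in enumerate(INDICATORS):
            # Rule (1): each indicator loads on its trait factor and its rater factor.
            # Rule (2): trait-rater covariances are zero, so the two parts simply add.
            value = trait_loadings[i] * trait_cov[trait_i][trait_j] * trait_loadings[j]
            value += rater_loadings[i] * rater_cov[rater_i][rater_j] * rater_loadings[j]
            # Rule (3): uniquenesses are uncorrelated, so they enter the diagonal only.
            if i == j:
                value += uniquenesses[i]
            sigma[i][j] = value
    return sigma


# Hypothetical parameter values (illustration only).
trait_cov = {
    "AD": {"AD": 1.0, "CA": 0.6, "CB": 0.5},
    "CA": {"AD": 0.6, "CA": 1.0, "CB": 0.7},
    "CB": {"AD": 0.5, "CA": 0.7, "CB": 1.0},
}
rater_cov = {"P": {"P": 1.0, "T": 0.2}, "T": {"P": 0.2, "T": 1.0}}
sigma = implied_covariance(
    trait_loadings=[1.0] * 6,
    rater_loadings=[0.5] * 6,
    trait_cov=trait_cov,
    rater_cov=rater_cov,
    uniquenesses=[0.3] * 6,
)
```

Comparing entries such as the same-trait cross-rater covariance with the same-rater cross-trait covariance shows how trait and rater effects each contribute to the observed overlap.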
However, there are some technical points to note with the above specification.
Typically, MTMM data analyzed by CFA require at least three traits and at least three
methods to make an identified model (Widaman, 1985). The current study has only two
raters’ reports on externalizing behaviors available, so that model M6 has more
parameters than observed statistics. One way to identify M6 is to have both factor
loadings for each trait fixed at 1.0, meaning parent and teacher ratings have equal weights
on each reported trait. As such, M6 represents a restricted correlated-rater correlated-trait
CFA model of MTMM data.
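The counting argument can be made concrete with a small sketch. The scaling conventions here are assumptions made for illustration and are not stated in the text: only the covariance structure is counted (no means), factor variances are fixed at 1 wherever loadings are free, and trait factor variances are freed when the trait loadings are fixed at 1.0. The function names are hypothetical. Note that having fewer parameters than statistics is a necessary condition for identification, not a sufficient one.

```python
def observed_statistics(n_indicators):
    """Number of distinct variances and covariances among the indicators."""
    return n_indicators * (n_indicators + 1) // 2


def ctcr_free_parameters(n_traits, n_raters, fix_trait_loadings=False):
    """Count free parameters of a correlated-trait correlated-rater CFA model."""
    n_indicators = n_traits * n_raters
    trait_correlations = n_traits * (n_traits - 1) // 2
    rater_correlations = n_raters * (n_raters - 1) // 2
    rater_loadings = n_indicators   # rater factor variances assumed fixed at 1
    uniquenesses = n_indicators
    if fix_trait_loadings:
        trait_part = n_traits       # trait loadings fixed at 1.0, trait variances free (assumed)
    else:
        trait_part = n_indicators   # trait loadings free, trait variances fixed at 1
    return (trait_part + trait_correlations + rater_loadings
            + rater_correlations + uniquenesses)


statistics = observed_statistics(6)                                  # six observed scores
unrestricted = ctcr_free_parameters(3, 2)                            # all loadings free
restricted = ctcr_free_parameters(3, 2, fix_trait_loadings=True)     # trait loadings fixed
print(statistics, unrestricted, restricted)
```

Under these assumptions the unrestricted model asks for one more parameter than the data provide, while the restricted version leaves a small number of degrees of freedom.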
In fact, both model M4 and model M5 are also subsets of the general correlated-
rater correlated-trait model for MTMM data. Model M4 is derived by fixing the
correlation between two rater factors to 1.0, and M5 is derived by fixing correlations
among all three trait factors to 1.0. However, Model M4 and M5 are not nested under
model M6 because of the aforementioned restriction of trait factor loadings doubly fixed
to 1.0.
Two special cases of M4 and M5 are the uncorrelated rater and uncorrelated trait
models. Their specifications are identical to those of the correlated models, except that the
covariance of the rater factors or trait factors is fixed to zero instead of being freely
estimated. Because the uncorrelated models are nested under the correlated models, their
comparison provides a statistical evaluation of whether the effects associated with the
different raters or different traits are correlated; for example, a lack of correlated rater
effects would be indicated by a non-significant chi-square difference test. It should be
noted that the measurement model used by Burt et al. (2005), discussed in Section 2.5, was
an example of an uncorrelated rater model.
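A chi-square difference test for two nested models can be sketched with the standard library alone. The fit statistics below are hypothetical and serve only to show the mechanics; the upper-tail probability uses the closed-form series for integer degrees of freedom.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2-1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= x / (2 * r)
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in chi = sqrt(x).
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = 0.0, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        term = chi if r == 1 else term * x / (2 * r - 1)
        total += term
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


def chi_square_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Compare a restricted (nested) model against the fuller model."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    if delta_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Hypothetical fit statistics: uncorrelated rater model (restricted) versus
# correlated rater model (full); the models differ by one factor covariance.
delta_chi2, delta_df, p_value = chi_square_difference(25.3, 9, 22.1, 8)
print(f"delta chi-square = {delta_chi2:.2f}, df = {delta_df}, p = {p_value:.3f}")
```

With these made-up numbers the difference is not significant at the .05 level, which would favor the more parsimonious uncorrelated model.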
Figure 14. Six competing psychometric twin models
[Path diagrams, models PM1 to PM6. Indicators: ADHD_P, CAQ_P, CBCL_P, ADHD_T, CAQ_T, CBCL_T with residuals ε1 to ε6. Latent factors across the panels: F, F_P, F_T, F_AD, F_CA, and F_CB, each with A, C, and E components.]
3.2 Biometric Models – Psychometric Raters
After the measurement models are evaluated in the non-paired sample, a natural follow-up
question is whether this structure is replicable across twins. The answer is not obvious
because the best fitting factor model derived from an independent sample of non-paired
observations may explain correlations within individuals, but may not be adequate to
explain correlations between twins (including both intra-class and cross-twin cross-trait
correlations). It is therefore interesting to investigate how this model (M4) performs
compared to alternative models in the twin sample of paired observations.
Six competing psychometric twin models, derived by combining the measurement
models with biometric components of the latent factors, were fit to the twin data.
The theory of psychometric twin models was developed by McArdle and
Goldsmith (1990), and it has been widely cited in quantitative genetic studies. Figure 14
shows path diagrams of the six psychometric twin models (PM1-PM6), displaying only
one member of a twin pair. In comparison with the measurement models in Figure 13, these
psychometric twin models simply add a Cholesky factorization or triangular decomposition
of A, C, and E components to each of the first order latent factors.
There are four advantages to specifying the model this way: (1) it allows correlations
among the first order factors, (2) it provides a test of the fit of an orthogonal or uncorrelated
model, (3) it restricts the correlation matrix to be positive definite, and (4) it restricts the
covariance structure in a biometrically informative way (McArdle & Goldsmith, 1990).
An important implication of the last property is that genetic and environmental sources of
variation in one factor can have direct influences on other factors, instead of being modeled
only as correlations among factors. Such direct influences can be used in deriving variance
components from different sources.
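The triangular (Cholesky) decomposition of the biometric components can be sketched for two first-order factors. The path values are hypothetical and the helper names are illustrative. Each component covariance matrix is formed as L L', where L is a lower-triangular matrix of paths, which guarantees that the resulting matrix is at least positive semi-definite and that implied correlations stay within [-1, 1]; fixing the off-diagonal path to zero gives the orthogonal (uncorrelated) model.

```python
import math


def cholesky_to_covariance(paths):
    """Return L L' for a lower-triangular matrix of path coefficients L."""
    n = len(paths)
    return [[sum(paths[i][k] * paths[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def correlation(cov, i, j):
    """Correlation implied by a covariance matrix."""
    return cov[i][j] / math.sqrt(cov[i][i] * cov[j][j])


# Hypothetical paths for two first-order factors (for example, F_P and F_T).
a_paths = [[0.7, 0.0], [0.3, 0.5]]
c_paths = [[0.4, 0.0], [0.2, 0.3]]
e_paths = [[0.5, 0.0], [0.1, 0.6]]

A, C, E = (cholesky_to_covariance(p) for p in (a_paths, c_paths, e_paths))
factor_cov = [[A[i][j] + C[i][j] + E[i][j] for j in range(2)] for i in range(2)]

genetic_correlation = correlation(A, 0, 1)
heritability = [A[i][i] / factor_cov[i][i] for i in range(2)]
# Share of the covariance between the two factors that is due to additive genes.
genetic_share_of_covariance = A[0][1] / factor_cov[0][1]
```

The same construction extends to more factors by adding rows and columns to each lower-triangular path matrix.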
3.3 Model Invariance across Sex
The six measurement models and biometric models introduced in the previous
two sections need to take into account potential sex differences. Because of the large sex
differences in prevalence and types of symptoms between boys and girls, it is
important to examine the equivalence of all measurement and structural parameters
of the factor model across the two sex groups. The measurement model pertains to the
measurement characteristics of the indicators or observed measures and thus consists of
the factor loadings, intercepts, and residual variances. Hence, the evaluation of across-group
equivalence of these parameters reflects tests of measurement invariance. Tests of
equal factor structures may also involve evaluation of the latent variable parameters, such
as factor variances, covariances, and latent means. Examinations of these latent variable
parameters are considered tests of population heterogeneity (Brown, 2006).
Different terminologies exist in the literature for the various tests of invariance
(see Horn & McArdle, 1992; Meredith, 1993). For example, the test of equal factor
structures (i.e. the number of factors and pattern of indicator-factor loadings is identical)
has been referred to as “configural invariance”. Equality of factor loadings has been
referred to as “metric invariance” or “weak factorial invariance”. The equality of
indicator intercepts has been alternatively termed “scalar invariance” or “strong factorial
invariance”. Finally, evaluation of the equality of indicator residuals has also been
referred to as a test of “strict factorial invariance”.
There are also some discrepancies in the order of model restrictions in testing for
invariance. Most commonly, stepwise procedures are employed in multiple-group CFA
research, whereby the analysis begins with the least restricted solution and subsequent
models are evaluated that entail increasingly restrictive constraints; that is, equal factor
loadings → equal intercepts → equal residual variances, and so on (Brown, 2006).
However, some methodologists (e.g., Horn & McArdle, 1992) have advocated a “step-
down” strategy whereby the starting model contains all the pertinent invariance
restrictions, and subsequent models are then evaluated that sequentially relax these
constraints. The former approach is preferred in the present study because the evaluation
of sex invariant model is based on results of the best fit unconstrained full model solution.
In addition, gender differences on the covariation among externalizing behavior problems
are still unknown. Reported results from previous studies have not been consistent on this
issue (Bartels et al., 2003; Krueger et al., 2005; Hicks et al., 2007). Thus it is more
prudent for model evaluation to work upward from the least restricted solution to
determine if further tests of measurement invariance and population heterogeneity are
warranted.
For the aforementioned reasons, model invariance evaluation in the present study
is conducted in the following sequence: (1) test the equality of factor loadings; (2) test
the equality of indicator intercepts; (3) test the equality of indicator residual variances; (4)
test the equality of factor variances; (5) test the equality of factor covariances; and, for
twin analysis only, (6) test the equality of biometric component factor loadings. Steps 1-3
are tests of measurement invariance; Steps 4-6 are tests of population heterogeneity. It
should be noted that equality of latent means is not tested in the invariance analysis because
of the data transformation issue, which is discussed in Section 5.2.
3.4 Aim of Study
In the context of this study, the externalizing factor was defined by three
commonly studied child phenotypes: inattention and hyperactivity, aggression and
delinquency. The data were collected from both parent and teacher reports in an ongoing
longitudinal twin sample. The primary aims of this study were:
a) To investigate the best externalizing factor structure using multi-trait multi-rater
method in a non-paired independent sample.
b) To examine the best biometric externalizing factor model in a genetically
informative twin sample, and estimate genetic and environmental influences on
externalizing factors.
c) To understand the extent to which rater-shared factors and rater-specific factors
contribute to the variance and covariation among specific externalizing behaviors.
d) To evaluate the effectiveness of the best fitting model and compare it with
competing models using a Monte Carlo simulation design.
Chapter Four: Research Methods
4.1 Overview
The sample for this study was drawn from participants in the University of
Southern California (USC) Twin Study of Risk Factors for Antisocial Behavior, which is
an ongoing longitudinal study of the “interplay of genetic, environmental, social, and
biological factors on the development of antisocial and aggressive behavior from childhood
to adolescence” (Baker, et al., 2007, p.221). The twins and their families who are part of
the USC Twin Study were recruited from the city of Los Angeles and the surrounding
communities. Participation was voluntary and families were ascertained primarily
through local schools, both public and private. All participating families were contacted by
phone to explain the study in detail and to schedule a testing session. Interested families
were then invited to visit USC laboratories to participate in an extensive assessment
process. Families who could not visit USC laboratories to participate could choose to
receive and return questionnaires by mail. In addition, teachers of twins were contacted
by mail to provide information about twins’ behavior at school. Detailed description of
the study design, recruitment procedure and a summary of the measures can be found in
Baker, Barton, Lozano, Raine and Fowler (2006). A total of four waves of
assessment had been conducted or scheduled at the time of this report.
4.2 Procedure
The present analyses are based on data from the first wave of assessment
conducted in 2000-2004. During the first wave of data collection, the twins and their
primary caregiver participated in a 6-8 hour laboratory assessment process at USC. The
assessment included behavioral interviews, neurocognitive testing, social risk factor
assessment, and psycho-physiological recording of the twins. Caregivers were also
interviewed about their twins’ behavior, as well as their own behavior and their
relationship to each twin. In addition, cheek swab samples were collected from the
families in order to extract DNA and test for zygosity. All interview or testing procedures
were conducted by rigorously trained examiners (see Baker et al., 2006). Children were
interviewed in English, while caregiver interviews were conducted in either English (81%)
or Spanish (19%), depending on the language preference of the participant.
The twins’ teachers were also contacted to complete surveys about each child’s
school behaviors and to return their survey packets to USC by mail. However, not all
teacher surveys were returned, and the individual return rate was about 60% (see Section 4.5
for details of missing responses). Despite missing teacher surveys, information was
available on whether or not twins were in the same classroom at school and were rated by
the same teacher or by two different ones.
4.3 Sample Characteristics
Participants in the present study consisted of 605 families of twins (n=596 pairs) or
triplets (n=9 sets) and their primary caregivers. All twins were 9-10 years old (mean
age=9.6, standard deviation=0.6) born between 1990 and 1995. The sample was
composed of both male and female monozygotic (MZ) twins and dizygotic (DZ) twins,
including both same-sex and opposite-sex DZ pairs. Among the 1,219 child participants,
there was approximately equal gender distribution with 48.7% boys (n=594) and 51.3%
girls (n=625). The 605 caregivers were primarily female (n=567), of whom 94.0% were
biological mothers (n=533).
The child’s ethnicity was determined by the ethnicity of their biological parents as
reported by the primary caregivers. The sample includes 26.6% Caucasian (n=161 pairs),
14.3% African American (n=86 pairs), 37.5% Hispanic (n=227 pairs), 4.5% Asian (n=27
pairs), 16.7% mixed (n=101 pairs), and 0.3% other ethnicities (n=2 pairs). The sample’s
ethnicity is quite representative of the ethnic diversity of the greater Los Angeles area.
Twins’ zygosity was determined using DNA microsatellite analysis (> 7 concordant
and zero discordant markers = MZ; one or more discordant markers = DZ) for 87% of the
same-sex twin pairs. For the remaining same-sex twin pairs, zygosity was established by
questionnaire items about the twins’ physical similarity and the frequency with which
people confuse them. The questionnaire was used only when DNA samples were
insufficient for one or both twins. When both questionnaire and DNA results were
available, there was 90% agreement between the two (Baker et al., 2007). The resulting
study sample was composed of 138 MZ male, 139 MZ female, 84 DZ male, 97 DZ female,
and 147 opposite-sex DZ pairs.
4.4 Measures
The present dissertation study used three different measures of child externalizing
behavior: Diagnostic Interview Schedule for Children version IV (DISC-IV), Child
Aggression Questionnaire (CAQ), and Child Behavior Checklist (CBCL). Assessment
was based on both caregivers’ and teachers’ responses. Caregiver assessment was
administered during the family’s laboratory visit. The mode of assessment, however,
differed among caregiver instruments, with DISC-IV and CAQ being administered
through semi-structured caregiver interviews and CBCL being administered in
paper-and-pencil format. In contrast, teacher instruments were all administered in survey form
and returned by mail. The following section describes in detail each of the three
instruments and use of relevant subscales.
The ADHD Module of DISC-IV. The DISC-IV is a highly structured interview
that has been adapted from the Diagnostic and Statistical Manual of Mental Disorders (4th ed.;
DSM-IV; American Psychiatric Association, 1994) to assess psychiatric disorders and
symptoms in children and adolescents age 6 to 17 years old. The DISC-IV is designed to
be administered by well-trained lay interviewers and can be done by computer assisted
testing. The interview questions are grouped into different diagnostic modules, such as
ADHD, conduct disorder, or general anxiety. The present study used only the
questions from the ADHD module, which assesses 18 symptoms of inattention and
hyperactivity. Response to each question is yes=1/no=0, depending on whether the child
displayed the symptom or not during the past year. The total symptom count, as scored
by the DISC-IV computer assisted program, was used for subsequent data analyses. Since
face to face interviews were not available for twins’ teachers, the ADHD module
questions were sent to teachers as part of the mail survey packet. However, the teacher
ADHD survey used a different response scale from the parent interview. Instead of yes/no, the
teacher ADHD survey used a 5-point Likert scale with 1=Never, 2=Rarely,
3=Sometimes, 4=Often, and 5=Almost always.
The CAQ. As the name implies, the CAQ is an instrument to assess children’s
aggressive behaviors, including overall aggression and its various forms. The majority of
the CAQ items were taken from Raine and Dodge’s Reactive and Proactive Aggression
questionnaire (RPQ, Raine et al., 2006), including a total of 23 items. Each item has a
three-point response format: 0 = never, 1 = sometimes, 2 = often. Example items
are “He/she threatens and bullies other kids” and “He/she damages or breaks things for fun”.
Parallel forms of the CAQ were completed by both parents and teachers. The sum of the
item responses was calculated for subsequent analysis, and a higher score means more
aggression symptoms.
The Externalizing Subscale of the Child Behavior Checklist (CBCL, Achenbach,
1991). The CBCL is one of the most widely used instruments in research and clinical
practice to test for emotional and behavior problems in children. It includes 112 items
and each item is reported on a 3-point scale: not true (0), sometimes or somewhat true (1),
very true or often true (2). Items are grouped into eight narrow-band syndrome subscales
(e.g., Depressed, Aggressive, Hyperactive) and two higher-order, broad-band factors
labeled internalizing and externalizing. The present study used only the externalizing
factor score, which sums 13 items from the Delinquent Behavior subscale and 20 items from
the Aggressive Behavior subscale. The aggressive items include such behaviors as
destroying one’s own and others’ belongings, fighting with other children, attacking
others, arguing a lot, and bragging and boasting. The delinquent items include such
behaviors as lying and stealing at home or elsewhere. All items were filled out in paper
and pencil by both parents and teachers based on their observation of children’s problem
behaviors exhibited in the past six months.
4.5 Missing Responses
Missing responses for caregiver assessment were rare in this study. There were
complete valid data for the caregiver response on CAQ, and there were valid data for
over 98% of the sample for DISC-IV/ADHD and CBCL. Missing data were somewhat
more common for teacher reports. The overall teacher response rate was approximately 60%. Of
the 605 twin pairs, 269 pairs (44.5%) had teacher reports for both twins, and
an additional 143 pairs (23.6%) had teacher reports for only one of the two twins. Of
the 269 pairs with valid teacher report data for both twins, 111 pairs (41.4%)
were in the same classroom at school and were thus rated by the same teacher informant
(see Baker et al., 2007, for more detail).
4.6 Model Fitting Program
The twin model analyses in the present study were conducted in Mplus (Muthén and
Muthén, 1998-2007). Mplus software is relatively new for twin data analysis compared to
the widely used Mx program created by Neale et al. (1991). The first example of the use
of the Mplus program with twin data was published in 2004 by Prescott, and there has been
growing attention to and discussion of applying the program in behavior genetic research
(e.g., Muthén, Asparouhov, & Rebollo, 2006; Rathouz et al., 2008). All model selections
were based on three of the most commonly used model fit indexes: the Akaike Information
Criterion (AIC), the Bayesian Information Criterion (BIC), and the Root Mean Square Error of
Approximation (RMSEA). All model fit indexes and their computation equations used in
Mplus can be found in Appendix A.
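As a rough illustration of how these three indexes are computed, the following Python sketch uses the textbook definitions of AIC and BIC and a multi-group form of RMSEA. The exact equations used by Mplus are those given in Appendix A; the function names, the multi-group RMSEA form, and the values N = 605 pairs and G = 5 groups used in the note below are assumptions made for this illustration.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: -2 log L + 2k (smaller is better)."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: -2 log L + k ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

def rmsea(chi_square, df, n_obs, n_groups=1):
    """RMSEA with a multi-group correction (assumed form):
    sqrt(max((chi2 - df) / (df * N), 0)) * sqrt(G)."""
    if df <= 0:
        raise ValueError("RMSEA is undefined for df <= 0")
    excess = max((chi_square - df) / (df * n_obs), 0.0)
    return math.sqrt(excess) * math.sqrt(n_groups)
```

Under these assumptions, rmsea(29.2, 17, 605, 5) evaluates to about .077 and rmsea(31.2, 21, 605, 5) to about .063, which agree with the RMSEA values reported for parent-rated ADHD in Table 4. Because BIC multiplies the number of parameters by ln(N) rather than by 2, it penalizes additional parameters more heavily than AIC whenever N exceeds about 7.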
Chapter Five: Empirical Results
5.1 Sex and Informant Differences
Table 1 summarizes descriptive statistics for raw scores of the six measures. They
are presented separately for boys and girls to investigate potential gender differences.
The mean ratings of boys were consistently higher than those of girls on all six measures, and
boys also exhibited a higher degree of variability than girls. These results suggested that
boys have more severe externalizing problems.
Table 1. Means, standard deviations (SD), number of participants (N) and t-tests for
gender differences (raw scores)

              Males                 Females               t-test
Measure    Mean    SD     N      Mean    SD     N       t     df      p
ADHD_P      5.2    4.3   591      3.6    3.7   613     7.3   1202   <.01
CAQ_P      32.1    5.1   594     31.0    4.9   625     4.0   1217   <.01
CBCL_P      7.9    6.7   589      6.4    6.2   624     4.2   1211   <.01
ADHD_T      5.9    3.8   359      4.1    3.4   369     6.7    726   <.01
CAQ_T      29.4    6.9   359     27.3    5.2   369     4.6    726   <.01
CBCL_T      7.0   10.0   350      4.3    7.5   362     4.1    710   <.01

Note. Since caregiver-rated (ADHD_P) and teacher-rated (ADHD_T) ADHD symptoms do not
share the same response scale, they are not directly comparable.
A comparison between informants showed that child aggression and delinquent
behavior problems reported by teachers tended to be less severe than those reported by caregivers.
However, teacher ratings showed larger variances than parent ratings. This
pattern of differences did not vary across child’s gender. In addition, since caregiver-rated
ADHD symptoms and teacher-rated ADHD symptoms did not share the same
response scales, ADHD results were not directly comparable between the two informants.
5.2 Data Transformation
A problem with using raw data in this study is that all six sum scores (three measures
for each of the two raters) are highly positively skewed because of the rare occurrence of
many behavior symptoms. To satisfy the normality assumption for the subsequent
statistical analysis, raw data were thus rank-transformed and normalized to approximate
normal distributions. The normalized rank transformation procedure was used because it
has recently been found to be the best transformation method in optimizing model
selection in genetic studies (van den Oord et al., 2000). This procedure involves replacing
raw scores with their rank values, which are then transformed to normal standardized
scores with mean 0 and standard deviation 1. The procedure was performed using PROC
RANK and PROC STANDARD of the SAS 9.0 statistical program. As in previous
studies using similar measures (see Burt, Krueger, McGue & Iacono, 2001; Krueger,
et al., 2002; Nadder, Rutter, Silberg, Maes & Eaves, 2002), male and female data were
treated separately to account for the potential heterogeneity between sexes. All
transformations were conducted by sex but without regard to twin-pair or zygosity. All
subsequent data analysis and model fitting results were based on transformed scores.
Frequency distributions of all six variables before and after the transformation can be
found in Appendix B: Histograms of raw and transformed data.
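The transformation described in this section can be sketched in Python as follows. This is only an illustrative stand-in for the SAS procedures named above: the use of Blom's formula for the rank percentile, the use of the population standard deviation in the final standardization, and the function names are all assumptions of the sketch.

```python
from statistics import NormalDist, mean, pstdev

def normalized_ranks(scores):
    """Replace scores by average ranks, then map the ranks to normal scores.

    Percentiles use Blom's formula (r - 3/8) / (n + 1/4); this particular
    formula is an assumption of the sketch.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    pos = 0
    while pos < n:
        end = pos
        # Extend the run while the next sorted score is tied with this one.
        while end + 1 < n and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2.0 + 1.0  # 1-based average rank for ties
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    z = [NormalDist().inv_cdf((r - 0.375) / (n + 0.25)) for r in ranks]
    # Standardize to mean 0 and SD 1 (population SD used here for simplicity).
    m, s = mean(z), pstdev(z)
    if s == 0:
        return [0.0] * n
    return [(v - m) / s for v in z]

def transform_by_sex(scores, sexes):
    """Apply the transformation separately within each sex group."""
    out = [None] * len(scores)
    for group in set(sexes):
        idx = [i for i, g in enumerate(sexes) if g == group]
        for i, v in zip(idx, normalized_ranks([scores[i] for i in idx])):
            out[i] = v
    return out
```

Because the ranking is done within each sex, a boy and a girl with the same relative standing in their own group receive the same transformed score, regardless of mean differences between the sexes in raw scores.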
5.3 Phenotypic Correlations
Table 2 summarizes the Pearson correlation coefficients calculated from
transformed scores. Correlations for males are below the diagonal and correlations for
females are above the diagonal. As can be seen, most correlation coefficients were
similar across boys and girls. However, there was somewhat higher parent-teacher
agreement for boys (.53, .28, .36) than for girls (.45, .15, .25), suggesting that boys’
behaviors may be more readily observable (or more consistent across situations) than
girls’. As to informant differences, correlations among the three phenotypes were higher in
teacher ratings for both boys and girls, suggesting that teachers were less able to distinguish
the various forms of externalizing behaviors.
Table 2. Phenotypic correlations for ADHD, CAQ and CBCL scores by
gender (transformed scores)

          ADHD_P   CAQ_P   CBCL_P   ADHD_T   CAQ_T   CBCL_T
ADHD_P     1.0      .38     .49      .45      .24     .37
CAQ_P      .34      1.0     .62      .18      .15     .14
CBCL_P     .53      .63     1.0      .26      .25     .24
ADHD_T     .53      .16     .31      1.0      .65     .70
CAQ_T      .34      .28     .31      .60      1.0     .67
CBCL_T     .47      .21     .36      .75      .71     1.0

Note. Males’ phenotypic correlations are below the diagonal of the correlation matrix.
Females’ phenotypic correlations are above the diagonal of the correlation matrix. All
correlations are significant at the .01 level.
5.4 Univariate Biometric Analysis
Univariate analyses were first done to assess the size of genetic and
environmental effects on individual measures of externalizing behaviors. Table 3 shows
twin correlations separated by zygosity for each phenotype, followed by univariate ACE
model results. As seen in Table 3, the MZ correlations were consistently greater than
the corresponding same-sex DZ correlations, suggesting genetic influences on all six
measures in both males and females. In particular, the DZ correlations for ADHD
symptoms were less than half the MZ correlations, indicating a large genetic effect and a
small or negligible shared environmental effect on ADHD. In contrast, the DZ
correlations for CAQ and CBCL were all greater than half the MZ correlations, indicating
strong environmental influences shared by twins.
Table 3. Twins’ phenotypic correlations by sex and zygosity (transformed data)

          MZ Male      MZ Female    DZ Male        DZ Female      DZ Opposite Sex
ADHD_P    .58 (136)    .61 (135)    .28 (83)       .17 (93) NS    .40 (146)
CAQ_P     .52 (138)    .58 (139)    .34 (84)       .47 (97)       .51 (147)
CBCL_P    .59 (136)    .57 (139)    .52 (83)       .48 (97)       .47 (146)
ADHD_T    .65 (67)     .64 (68)     .22 (46) NS    .49 (45)       .41 (62)
CAQ_T     .58 (67)     .66 (68)     .49 (46)       .48 (45)       .61 (62)
CBCL_T    .44 (66)     .70 (67)     .38 (42)       .56 (42)       .42 (60)

Note. All correlations are significant at the .05 level except those marked as non-significant
(NS). The number in parentheses is the number of valid twin pairs.
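The comparison of DZ correlations with half the MZ correlations used in the text above corresponds to Falconer's classical approximations: a² ≈ 2(rMZ - rDZ), c² ≈ 2rDZ - rMZ, and e² ≈ 1 - rMZ. The short Python sketch below applies them to two male-pair rows of Table 3. It is only a back-of-the-envelope check, not the maximum-likelihood ACE models fitted in this study, and the function name is chosen for illustration.

```python
def falconer_ace(r_mz, r_dz):
    """Rough ACE estimates from MZ and DZ twin correlations."""
    a2 = 2.0 * (r_mz - r_dz)   # additive genetic variance
    c2 = 2.0 * r_dz - r_mz     # shared environmental variance
    e2 = 1.0 - r_mz            # non-shared environment (plus error)
    return a2, c2, e2

# Male same-sex correlations taken from Table 3.
for label, r_mz, r_dz in [("ADHD_P", 0.58, 0.28), ("CBCL_P", 0.59, 0.52)]:
    a2, c2, e2 = falconer_ace(r_mz, r_dz)
    print(f"{label}: a2={a2:.2f} c2={c2:.2f} e2={e2:.2f}")
```

For parent-rated ADHD this gives a² of about .60 and a slightly negative c² (about -.02, usually read as zero), whereas for parent-rated CBCL it gives a² of about .14 and c² of about .45, in line with the pattern described above.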
One thing worth noting in Table 3 is that a few DZ opposite-sex correlations
were much larger than those of DZ same-sex twins, e.g., parent-rated ADHD and teacher-rated
CAQ. Opposite-sex twin correlations like these are not unusual in twin research, but
they may sometimes cause model fit problems if kept in the analysis. Thus, in some studies
they were left out of genetic analyses to avoid fitting and estimation problems (e.g.,
Burt et al., 2005; Hicks et al., 2007; Nadder et al., 2002). Nevertheless, a recent study on
ADHD and its comorbidity with other externalizing disorders (Tuvblad, Zheng, Raine, &
Baker, in press) based on the same twin sample and data showed that including the
opposite-sex twins did not drastically change the parameter estimates and model fit.
Moreover, including opposite-sex twins increases the power of the study by increasing
the sample size. Therefore, all subsequent genetic analyses in this study used five groups
comprising both same-sex and opposite-sex twin pairs.
Table 4 presents five-group univariate genetic model fitting results for all six
variables. All models were specified as the ACE model, in which the variance
attributable to genetic (A), shared environmental (C), and non-shared
environmental (E) influences was estimated (see Figure 1 in Chapter 2). Sex
differences in parameter estimates were tested by first allowing for sex differences in
parameter estimates (sex-variant model), and then constraining the parameter
estimates in both sexes to be equal (sex-invariant model). The model with lower
values on three commonly used fit indexes (AIC, BIC and RMSEA) was
selected as the better fitting model.
Table 4. Univariate ACE model fits and biometric decompositions for each phenotype

Phenotype  Description           χ²     df   p     AIC    BIC    RMSEA   A     C     E
ADHD_P     Sex variant model     29.2   17   .03   3278   3313   .077
           Sex invariant model   31.2   21   .07   3272   3290   .063    .61   .00   .39
CAQ_P      Sex variant model     26.5   17   .07   3299   3334   .067
           Sex invariant model   30.0   21   .09   3294   3312   .059    .29   .29   .43
CBCL_P     Sex variant model     23.1   17   .15   3258   3293   .054
           Sex invariant model   23.4   21   .32   3250   3268   .031    .28   .33   .40
ADHD_T     Sex variant model     19.9   17   .28   1980   2012   .044
           Sex invariant model   21.1   21   .45   1973   1990   .009    .63   .06   .31
CAQ_T      Sex variant model     20.6   17   .25   1964   1997   .049
           Sex invariant model   21.1   21   .45   1957   1973   .008    .27   .37   .36
CBCL_T     Sex variant model     22.4   17   .17   1940   1972   .061
           Sex invariant model   31.8   21   .06   1941   1957   .077    .29   .31   .40

Note. A = Additive genetic effects; C = Shared environmental effects; E = Non-shared environmental effects.
As seen in Table 4, no-sex-differences models fit better than sex-differences
models for all measures except teacher-rated CBCL (CBCL_T). For CBCL_T, deciding
which model to select depended on the criterion used. Specifically, AIC and RMSEA
suggested that there are sex differences in parameter estimates, while BIC values suggested
otherwise. Following the recommendation by Markon & Krueger (2004), the present
analysis used the BIC as the criterion and chose the sex-invariant model as the better fitting
model.
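The disagreement among criteria for teacher-rated CBCL can be made explicit with a few lines of Python. The sketch below simply picks, for each index, the model with the smaller value, using the CBCL_T rows of Table 4; the dictionary layout and function name are chosen for illustration.

```python
# Fit indexes for teacher-rated CBCL, copied from Table 4.
models = {
    "sex-variant":   {"AIC": 1940, "BIC": 1972, "RMSEA": 0.061},
    "sex-invariant": {"AIC": 1941, "BIC": 1957, "RMSEA": 0.077},
}

def preferred_model(models, criterion):
    """Return the name of the model with the smallest value on one index."""
    return min(models, key=lambda name: models[name][criterion])

for criterion in ("AIC", "BIC", "RMSEA"):
    print(criterion, "->", preferred_model(models, criterion))
```

AIC and RMSEA select the sex-variant model (AIC by a margin of only one point), while BIC, which penalizes the four extra parameters more heavily, selects the sex-invariant model.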
Estimates of the A, C and E variance components are presented in the right panel of
Table 4. The heritability of ADHD is above 60% for both parent and teacher ratings, and
there is little or no shared environmental influence; the heritabilities of aggression and
delinquency as assessed by CAQ and CBCL are each about 30%, and shared
environmental influences also account for about 30% of the variation in each. The univariate
estimates for these measures are consistent with previous studies and are in the range of
results published in the literature (DiLalla, 2002; Thaper et al., 1999). In addition, these
results agree with two recently published studies that used the Mx software to analyze a similar
set of measures in the same sample (Tuvblad, Raine, Zheng, & Baker, in press; Tuvblad,
Zheng, Raine, & Baker, 2009).
5.5 Measurement Model Fitting
As discussed in Chapter 3, a reasonable prior analysis before conducting
multivariate biometric analysis is to run a confirmatory factor analysis at the individual
level. The backbones of the six proposed psychometric twin models in Figure 14 are
essentially six measurement models with constraints on biometric sources. The biometric
model that is based on the best fitting measurement factor structure thus might be a more
valid hypothesis than alternative models.
Measurement models in this study were evaluated in a non-paired sample of
independent observations, which was obtained by randomly selecting only one twin from
each family in the current study. Measurement invariance across sex was tested for each
measurement model following the sequence listed in section 3.3. However, only results
from the full model (which has the least restricted form allowing all male and female
parameters to be free to vary) and the strict sex-invariant model (which has the most
restricted form constraining all parameters in both sexes to be equal) were presented. The
adequacy of model fit was evaluated by chi-square (χ²) test statistics and again by the fit
indexes AIC, BIC and RMSEA.
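The construction of the non-paired sample described above, one randomly chosen child per family, can be sketched in Python as follows. The record layout, the function name and the seed value are hypothetical and serve only to illustrate the idea.

```python
import random

def one_twin_per_family(family_ids, seed=2009):
    """Pick one record index at random within each family.

    family_ids: list giving the family of each child record.
    The seed value is arbitrary and only makes the draw repeatable.
    """
    rng = random.Random(seed)
    by_family = {}
    for index, fam in enumerate(family_ids):
        by_family.setdefault(fam, []).append(index)
    return sorted(rng.choice(members) for members in by_family.values())

# Hypothetical records: three families (the third is a triplet set).
families = ["F1", "F1", "F2", "F2", "F3", "F3", "F3"]
print(one_twin_per_family(families))
```

A different seed generally yields a different singleton subset, so any result based on one such draw is conditional on that particular selection.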
Results in Table 5 showed that sex-invariant M4 was the best fitting model
because it had the lowest AIC (7153), BIC (7264), and RMSEA (less than .001) values of
all the 12 models tested. The optimal fit of this model indicated that one common factor
and two correlated rater factors are needed to account for the six measures’ covariation.
In addition, across all six measurement models, the sex-invariant sub-model fit
consistently better than the full model, suggesting no sex differences on factor structure
and model parameter estimates.
Table 5. Comparing fit for sex-variant and sex-invariant measurement models

Model  Description                                   χ²      df   p      AIC    BIC    RMSEA
M1     One common factor                             327.4   18   <.01   7475   7634   .238
         Sex invariant                               345.9   36   <.01   7458   7537   .168
M2     Correlated rater factors                      78.2    16   <.01   7230   7398   .113
         Sex invariant                               93.9    35   <.01   7208   7291   .074
M3     Correlated trait factors                      286.4   12   <.01   7446   7631   .274
         Sex invariant                               312.9   33   <.01   7431   7523   .167
M4     G & correlated rater factors                  6.7     4    .15    7183   7403   .048
         Sex invariant                               27.5    29   .55    7153   7264   <.001
M5     G & correlated trait factors                  3.5     0    -      7187   7425   -
         Sex invariant                               29.9    27   .32    7160   7279   .019
M6     Correlated rater & correlated trait factors   4.5     4    .35    7180   7401   .020
         Sex invariant                               30.4    29   .39    7156   7266   .013

Note. The M5 full model has zero degrees of freedom.
As is evident in Table 5, none of the models M1-M3 fit well. This implies that neither
a single-factor model (M1), nor a rater-factors-only model (M2), nor a scale-factors-only model (M3)
could adequately account for the covariation. This suggested that either residuals are
correlated or indicators reflect the effects of multiple sources. In contrast, models
M4-M6 were all acceptable, as indicated by their non-significant χ² values. Although M4
was the best fitting model, its model fit indexes were very close to those of the other two models,
M5 and M6. In particular, M6 has the same degree of complexity as M4. If only
the six full models were compared (ignoring the sex-invariant models), model M6 would in fact
be selected as the best model, with the lowest fit index values (χ² = 4.5, AIC = 7180,
BIC = 7401, RMSEA=.02). Another thing to note about these results is that, since the non-paired
data used in the analysis were just one random subset of the whole twin sample,
results might change if another random singleton set were selected. These issues
may therefore leave the conclusion that M4 is the best fitting model in doubt.
A further model tested (not shown in Table 5) was the uncorrelated rater factor
model, which is a simplified version of the sex-invariant model M4 obtained by fixing the correlation
between the two rater factors to zero. However, the fit of this orthogonal model
was significantly worse than that of the correlated model (χ² = 46.5, df=30, AIC = 7170, BIC =
7276, RMSEA=.042). This suggested that there was still overlapping information between
caregiver and teacher reports that was unexplained, even after taking into account their
shared view on all six measures.
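One common way to judge whether the orthogonal model fits significantly worse is a chi-square difference test between the two nested models. The Python sketch below assumes that this test applies here (that is, that the reported χ² values come from ordinary maximum likelihood and can be subtracted directly); it uses the sex-invariant M4 values from Table 5 and the values reported above for the orthogonal model. The function name is chosen for illustration.

```python
import math

def chi_square_diff_p_df1(chi2_restricted, chi2_full):
    """p-value of a 1-df chi-square difference test for nested models."""
    delta = max(chi2_restricted - chi2_full, 0.0)
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return delta, math.erfc(math.sqrt(delta / 2.0))

# Orthogonal model (df = 30) versus correlated model M4 (df = 29).
delta, p = chi_square_diff_p_df1(46.5, 27.5)
print(f"delta chi-square = {delta:.1f}, p = {p:.1e}")
```

The difference of 19.0 on one degree of freedom is far beyond the .001 critical value of 10.83, which is consistent with the statement that fixing the rater factor correlation to zero significantly worsens the fit.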
The standardized parameter estimates of the best fitting model M4, as shown in
Figure 15, revealed why the rater factors are correlated. The common factor F had only
two significant indicators, parent-reported ADHD symptoms and teacher-reported ADHD
symptoms, and the other four indicators had non-significant loadings on F. This suggested
that F was simply an ADHD factor. To account for the correlation between parent and
teacher ratings, correlated rater factors must be considered in model fitting.
Figure 15. Standardized parameter estimates of the best fit
measurement model - M4
[Path diagram: common factor F and correlated rater factors F_P and F_T, with indicators ADHD_P, CAQ_P, CBCL_P, ADHD_T, CAQ_T and CBCL_T and residuals ε1-ε6.]
Note. NS = non-significant
5.6 Multivariate Biometric Analysis
Measurement models in the preceding analysis were estimated in a sample of
independent (i.e., non-paired) observations randomly selected from twins. These
measurement models can be extended to twin data by constraining both twins to
have the same factor structure. Manifest covariation among phenotypes is explained by
their common biometric sources of influence, that is, genetic (A), common environmental
(C) and unique environmental (E) influences. These analyses of paired observations
enable us to understand the rater effects in genetic models by answering questions such as
what proportion of genetic variation in externalizing behavior problems is unique to each
reporter. If a common factor model is found to hold in the analysis of paired observations,
one may also address questions about the heritability of the shared view held by
different raters about child problems. This can effectively result in a more refined
estimate of the heritability (h²) of true score variance, independent of measurement error
and rater effects.
All models in this biometric analysis were fit using five zygosity groups, again
both allowing for sex differences in parameter estimates and constraining the parameter
estimates in both sexes to be equal. Model fit results are summarized in Table 6. Based
on its lowest value across all model fit indexes, sex-invariant model PM4 was found to be
the best fitting model. This finding was consistent with the measurement model results in
Table 5, indicating the factor structure fit in the non-paired sample was indeed replicated
in the paired twin sample.
Table 6. Comparison of six psychometric rater twin models

Model  Description                                       χ²       df    p       AIC     BIC     RMSEA
PM1    One common factor full model                      1063.7   380   <.001   14271   14581   .121
         Sex-invariant model                             1100.6   418   <.001   14230   14371   .115
PM2    Two oblique rater factors full model              550.5    370   <.001   13778   14132   .120
         Sex-invariant model                             604.5    413   <.001   13744   13907   .061
PM3    Three oblique scale factors full model            1040.1   354   <.001   14300   14724   .126
         Sex-invariant model                             1084.2   405   <.001   14240   14439   .117
PM4    One common & two oblique rater factors            410.8    354   .02     13671   14095   .036
         Sex-invariant model                             473.4    405   <.01    13629   13828   .037
PM5    One common & three oblique scale factors          418.8    338   <.01    13708   14203   .044
         Sex-invariant model                             479.2    397   <.01    13651   13885   .041
PM6    Two oblique rater + three oblique scale factors   412.0    334   <.01    13709   14222   .044
         Sex-invariant model                             476.5    395   <.01    13652   13895   .041

Note. All analyses use five groups including MZM, MZF, DZM, DZF and DZO.
Standardized parameter estimates of model PM4 are shown in Figure 16.
Consider first the factor loadings on F, which represents the view shared by both parent
and teacher for all three measures of children’s externalizing behavior. F has the highest
loading for parent-rated ADHD (.74), followed by teacher-rated ADHD (.53). In contrast,
poor loadings on F were found for parent-rated CAQ and CBCL, and neither estimated
value was significant at the α = .05 level. These results suggested that the common factor F was
best represented by ADHD symptoms, and that F did not reflect much of aggression and
delinquency. Genetic effects accounted for 82% of the variance in F, and the non-shared
environment accounted for the remaining 18%.
Figure 16. Standardized parameter estimates of
the best fitting model - PM4
[Path diagram: common factor F with biometric components A, C and E; rater factors F_P and F_T with Cholesky components A1, C1, E1 and A2, C2, E2; indicators ADHD_P, CAQ_P, CBCL_P, ADHD_T, CAQ_T and CBCL_T with residuals ε1-ε6.]
As for the parent rater factor F_P, it had high loadings for CAQ (.75) and CBCL
(.84), but a relatively low loading on ADHD (.43). Similarly, the teacher factor F_T had
high loadings for both CAQ (.78) and CBCL (.77), though it also had a high loading for
ADHD (.65) compared to F_P. These factor loading results suggested that F_P and F_T were
related to child aggression and delinquency independent of the ADHD factor (F).
The correlation between the oblique factors F_P and F_T in Figure 16 was modeled using
triangular decomposition, or Cholesky factorization. Following equation [4] in Chapter 2,
the variances and covariance of F_P and F_T can be calculated by multiplying the triangular
matrix by its transpose. Results from the matrix multiplication showed that F_T was influenced
mostly by shared environment (.40), while genetic influences contributed only 25%. Also,
factors F_P and F_T correlated significantly in these data (r=.36, p<.05), and about 75% of this
correlation was due to their shared genetic influences.
5.7 Variance Decomposition
As discussed in Chapter 2, modeling two oblique latent factors using triangular or
Cholesky decomposition ensures that the variance-covariance matrices are positive definite.
A more important utility of triangular decomposition, however, is that it allows two
correlated factors to be re-parameterized as a shared factor and a unique factor. So instead
of modeling correlations among latent factors, a Cholesky model depicts direct influences
of latent factors. For example, factors F_P and F_T both have genetic influences (A1 and A2),
but A1, the genetic source of variation in F_P, also exerts an influence on F_T. Thereby,
F_T’s variance can be decomposed into a part due to the genetic source shared with
F_P (A1) and a part due to its unique genetic source (A2). The two components cannot be distinguished
if a mere correlation model is used (i.e., correlating A1 and A2).
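The re-parameterization described above can be illustrated numerically. The Python sketch below builds a 2 × 2 lower-triangular matrix of genetic paths, multiplies it by its transpose to recover the genetic variances and covariance of F_P and F_T, and splits the genetic variance of F_T into the part shared with F_P (through A1) and the part unique to F_T (through A2). The path values are hypothetical and chosen only for illustration; they are not the estimates in Figure 16.

```python
def times_transpose(lower):
    """Return L * L' for a square lower-triangular matrix given as nested lists."""
    n = len(lower)
    return [[sum(lower[i][k] * lower[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Hypothetical genetic paths: rows are F_P and F_T, columns are A1 and A2.
a11 = 0.6   # A1 -> F_P
a21 = 0.3   # A1 -> F_T (genetic influence shared with F_P)
a22 = 0.4   # A2 -> F_T (genetic influence unique to F_T)
paths = [[a11, 0.0],
         [a21, a22]]

genetic_cov = times_transpose(paths)
var_fp = genetic_cov[0][0]      # a11^2
cov_fp_ft = genetic_cov[0][1]   # a11 * a21
var_ft = genetic_cov[1][1]      # a21^2 + a22^2

shared_part = a21 ** 2          # F_T genetic variance shared with F_P
unique_part = a22 ** 2          # F_T genetic variance unique to F_T
print(var_fp, cov_fp_ft, var_ft, shared_part, unique_part)
```

Applying the same multiplication to the C and E path matrices gives the shared and non-shared environmental parts, and summing the three products gives the total covariance matrix of the two rater factors.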
Table 7 shows the proportion of the total variance that is decomposed into
rater-shared and rater-specific biometric components based on the best fitting model PM4.
Note in Figure 16 that there are two sets of rater-shared biometric components. The first
set of rater-shared components (A, C, and E) came through the common factor F, or the
ADHD factor as suspected in the previous section’s analysis. This set was classified under the
“Rater Shared ADHD” column. The second set came from the biometric components
(A1, C1, and E1) shared by factors F_P and F_T. Because of the high loadings of CAQ and
CBCL on factors F_P and F_T, this set of shared components was classified under the “Rater
Shared Agg/De” column. Finally, all other non-shared variances were classified under the
“Rater Specific” column. To make the presentation clear, all components were also summed (1)
within each rater category, over the biometric sources A, C, and E (under the column
“Rater Shared and Specific Variance Sub-total”), and (2) within each biometric source,
over the rater categories (under the column “Biometric Variance Sub-total”).
Table 7. Proportion of genetic and environmental variance separated by shared and specific rater view factors

          Rater Shared   Rater Shared                          Rater Shared and Specific   Biometric Variance
          ADHD           Agg/De            Rater Specific      Variance Sub-total          Sub-total
Variable  A     E        A     C     E     A     C     E       ADHD   Agg/De   Specific    A     C     E
ADHD_P    .44   .10      .07   .05   .07   .02   .00   .25     .54    .18      .27         .54   .05   .42
CAQ_P     .00   .00      .20   .14   .21   .06   .18   .21     .00    .55      .45         .26   .32   .42
CBCL_P    .03   .01      .26   .18   .26   .01   .12   .13     .03    .70      .26         .30   .30   .40
ADHD_T    .24   .05      .05   .16   .12   .18   .05   .15     .29    .33      .37         .47   .21   .32
CAQ_T     .04   .01      .07   .24   .18   .20   .08   .19     .05    .49      .46         .30   .32   .38
CBCL_T    .16   .03      .06   .23   .18   .10   .07   .16     .19    .47      .34         .32   .30   .37
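The two sub-total blocks of Table 7 are sums of the component columns. As a minimal sketch, assuming the sub-totals are plain sums and that small differences are due to rounding of the printed values, the ADHD_P row can be collapsed in Python as follows; the dictionary layout and names are chosen for illustration.

```python
# Components of the ADHD_P row of Table 7, keyed by (rater category, source).
adhd_p = {
    ("ADHD", "A"): 0.44, ("ADHD", "E"): 0.10,
    ("Agg/De", "A"): 0.07, ("Agg/De", "C"): 0.05, ("Agg/De", "E"): 0.07,
    ("Specific", "A"): 0.02, ("Specific", "C"): 0.00, ("Specific", "E"): 0.25,
}

def subtotal(components, position, label):
    """Sum all components whose key matches `label` at `position` (0 or 1)."""
    return sum(v for key, v in components.items() if key[position] == label)

by_rater = {c: subtotal(adhd_p, 0, c) for c in ("ADHD", "Agg/De", "Specific")}
by_source = {s: subtotal(adhd_p, 1, s) for s in ("A", "C", "E")}
print({k: round(v, 2) for k, v in by_rater.items()})
print({k: round(v, 2) for k, v in by_source.items()})
```

The sums reproduce the printed sub-totals to within .01 (for example, .19 versus the printed .18 for Agg/De, and .53 versus the printed .54 for A), which is what rounding of the individual components would produce.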
Variance decomposition results showed some differences between parent and
teacher ratings. First, 54% of parent-rated ADHD variance was accounted for by the
view shared with teacher ratings on the ADHD factor, while only 29% of teacher-rated
ADHD variance was shared with parent ratings on ADHD, suggesting that parents are better and
more reliable informants on ADHD symptoms. In addition, almost 1/3 of teacher-rated ADHD
variance came from the shared view on the Aggression/Delinquency factor, suggesting that
teachers were not able to distinguish different forms of externalizing behaviors. Second,
there was a large difference between parents and teachers on their CBCL reports: 70% of
parent rating variance was shared with teachers, while 47% of teacher rating variance was
due to their shared view with parents (Table 7). The discrepancy was mostly due to a portion of
teachers’ view (19%) that was explained by the common ADHD factor, again indicating that
teachers have less ability to distinguish ADHD from other externalizing behaviors. Finally,
55% of parent rating variance on CAQ was explained by the rating shared with teachers, slightly
more than for teacher ratings (49%).
The total A, C, E effects were estimated for each of the six measures, and the
results were compared to the univariate analysis in Section 5.4 (see Table 4). The pattern of
the A, C, and E results was fairly similar for CBCL and CAQ across analyses. However,
heritability estimates for ADHD were lower in the multivariate analysis than in the univariate
analysis (54% vs. 61% for parent report, and 47% vs. 63% for teacher report; see Table 7). This
discrepancy might be due to the fact that the multivariate analysis uses more information
than the univariate ACE analysis. In particular, because of the overlapping ratings by
teachers on the ADHD and CAQ/CBCL measures, estimates of shared environmental influences
on ADHD were considerably increased, which in turn lowered the genetic effect estimates.
In summary, multivariate biometric analysis of the six twin rater models in this
chapter found model PM4 to be the best fitting model for the three externalizing behavior
phenotypes. However, researchers may question the validity of these results and
ask how well the data are explained by the model, or how well the best
fitting model has been distinguished from other models. To answer these questions, a Monte
Carlo simulation study was designed to evaluate the performance of the best fitting model
and the precision of its parameter estimates.
Chapter Six: Monte Carlo Simulations
6.1 Purpose of Monte Carlo Simulation
Monte Carlo simulation is an empirical method for evaluating statistics using
repeated random sampling. It is often applied in methodological research to
study the behavior of statistical estimators and test statistics under various conditions
manipulated by the researcher, such as sample size, degree of model misspecification,
distribution and type (binary or continuous) of observations, and amount and pattern of
missing data. This approach has been incorporated in many SEM software packages;
among them, the author found Mplus to be the most flexible and the easiest to
implement. In fact, one of the reasons that Mplus was used for this study was
its powerful Monte Carlo facilities, which allow complicated models to be
specified easily.
The Monte Carlo simulation in this study serves as a follow-up to the analysis of
empirical data in Chapter 5. It has three purposes: 1) to evaluate the performance of the
best fitting model under different true model conditions, 2) to estimate the precision of
sample parameter estimates, and 3) to compare model fit indexes and evaluate their
effectiveness in model selection.
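The general logic of a Monte Carlo study can be sketched in a few lines of Python. The example is only an illustration under simple assumptions and is not one of the twin models used in this study: the population (a normal distribution with mean 0.5 and standard deviation 1), the sample size of 100, the seed, and the function name run_monte_carlo are all hypothetical choices. It repeatedly draws samples, estimates a mean and its standard error in each replication, and then summarizes the estimates across replications.

```python
import random
import statistics


def run_monte_carlo(true_mean, true_sd, n_obs, n_reps, seed):
    """Draw n_reps samples; return the per-replication estimates and standard errors."""
    rng = random.Random(seed)
    estimates, std_errors = [], []
    for _ in range(n_reps):
        sample = [rng.gauss(true_mean, true_sd) for _ in range(n_obs)]
        estimates.append(statistics.fmean(sample))
        # Estimated standard error of the mean within this replication.
        std_errors.append(statistics.stdev(sample) / n_obs ** 0.5)
    return estimates, std_errors


# Hypothetical population values, chosen only for illustration.
estimates, std_errors = run_monte_carlo(0.5, 1.0, n_obs=100, n_reps=500, seed=1)

average_estimate = statistics.fmean(estimates)  # compare with the population value
sd_of_estimates = statistics.stdev(estimates)   # empirical sampling variability
average_se = statistics.fmean(std_errors)       # what the analysis reports on average

print(f"average estimate : {average_estimate:.3f}")
print(f"SD of estimates  : {sd_of_estimates:.3f}")
print(f"average S.E.     : {average_se:.3f}")
```

Comparing the average estimate with the population value, and the average standard error with the standard deviation of the estimates, is the same kind of comparison that underlies the second purpose listed above.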
6.2 Data Generation Models
Simulation Models. The simulation data were generated under the six
psychometric rater twin models proposed in Chapter 3 (Figure 14). The true population
parameter values used to generate the simulation data came from the parameter estimates
obtained in the multivariate biometric analysis in Section 5.6. Mplus has a
convenient data-saving feature that allows all parameter estimates from a real data
analysis to be saved as population parameter values for use in data generation in a
subsequent simulation study. Appendix C lists the unstandardized
parameter estimates for two example models from the multivariate biometric analysis in Section
5.6. These real-data estimates were used as the true parameter values in generating data
for the six simulation models.
Replications. For each true population model, 500 replications were generated to
ensure sufficient reliability in the summary information calculated. The six
true models thus generated a total of 3,000 sample replications.
Sample Size. Since the simulation was based on data from the USC Twin Study,
the samples were simulated to resemble the real twin data as closely as possible. Each sample had N = 605
pairs in five zygosity groups: 138 MZ male, 139 MZ
female, 84 DZ male, 97 DZ female, and 147 opposite-sex DZ pairs.
Missing Data. The effect of missing data was also considered in this analysis. Since
missing responses from parent reports were rare, no parent responses were missing in the
generated parent data. However, missing data were allowed in the generated teacher data.
Based on the missing data pattern analysis of the USC Twin Study data, four
teacher missing-data patterns were specified, with the following probabilities:
1) Complete data: p = .48
2) Missing twin A’s rating: p = .12
3) Missing twin B’s rating: p = .12
4) Missing both twins’ ratings: p = .28
Appendix C gives an example of the Mplus script for analyzing the simulation data.
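The sample-size and missing-data design described above can be sketched as follows. This is not the Mplus script in Appendix C; it is a minimal Python illustration in which the group sizes and pattern probabilities are taken from the text, while the seed, the variable names, and the function draw_missing_patterns are hypothetical. It assigns one teacher missing-data pattern to each simulated twin pair and reports the realized proportions.

```python
import random
from collections import Counter

# Number of twin pairs per zygosity group, as reported for each simulated sample.
GROUP_SIZES = {
    "MZ male": 138,
    "MZ female": 139,
    "DZ male": 84,
    "DZ female": 97,
    "DZ opposite-sex": 147,
}

# Teacher missing-data patterns and their probabilities, as listed in the text.
PATTERNS = {
    "complete": 0.48,
    "missing twin A": 0.12,
    "missing twin B": 0.12,
    "missing both": 0.28,
}


def draw_missing_patterns(n_pairs, rng):
    """Assign one teacher missing-data pattern to each twin pair."""
    names = list(PATTERNS)
    weights = list(PATTERNS.values())
    return rng.choices(names, weights=weights, k=n_pairs)


rng = random.Random(2009)  # arbitrary seed, for reproducibility only
total_pairs = sum(GROUP_SIZES.values())
patterns = draw_missing_patterns(total_pairs, rng)
counts = Counter(patterns)

for name in PATTERNS:
    print(f"{name:15s} {counts[name] / total_pairs:.3f}")
```

In any single replication the realized proportions will differ somewhat from the nominal probabilities, because the pattern for each pair is drawn at random.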
6.3 Data Analyses Models
Each of the sample replications was analyzed using the six alternative psychometric
rater models, which are the same six analysis models
depicted in Section 3.2 and fit in Section 5.6. Table 8 summarizes the average χ², p-value,
and number of successful computations (i.e., the number of converged models). A given
replication may fail to converge because of inappropriate starting values, singularity
of the information matrix, or an inadmissible solution that was approached as a result of
negative variances (Nylund, Asparouhov, & Muthen, 2007). Model estimates
and summaries were not computed for such replications. This occurred mostly in
misspecified models. For example, when data were generated from the one common factor model
(SIM1) but analyzed with PM5, which has one common factor and three scale factors,
only 6 out of 500 replications could be computed; that is, over 98% failed
to converge. Occasionally a replication failed to converge even when the model was
specified correctly. For example, when SIM4 data were fit with PM4, about 10% of the
replications did not converge.
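The non-convergence percentages quoted above follow directly from the n columns of Table 8. A small Python sketch of that arithmetic is given below; the function name failure_rate is a hypothetical helper, and the two counts are the Table 8 entries for PM5 fitted to SIM1 data and PM4 fitted to SIM4 data.

```python
N_REPLICATIONS = 500  # replications generated per true model


def failure_rate(n_converged, n_total=N_REPLICATIONS):
    """Percentage of replications for which no solution was computed."""
    return 100.0 * (n_total - n_converged) / n_total


# Counts taken from the n columns of Table 8.
pm5_on_sim1 = failure_rate(6)    # misspecified: PM5 fitted to SIM1 data
pm4_on_sim4 = failure_rate(450)  # correctly specified: PM4 fitted to SIM4 data

print(f"PM5 on SIM1: {pm5_on_sim1:.1f}% failed")
print(f"PM4 on SIM4: {pm4_on_sim4:.1f}% failed")
```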
6.4 Model Selection
The purpose of the current simulation study is to evaluate the performance of the six
biometric models under different true model conditions and to determine the power of
finding the best fitting model. The approach used is simply to analyze the data from each true
model with all six models. The percentage of times that the true model was recovered
from among the six models was the power estimate, while the percentage of times that a false model
was recovered was the type I error.
Tables 9-11 summarize the model selection results using three model fit indexes:
AIC, BIC, and RMSEA. Each index selects as the best fitting model the one with the lowest index
value. In these tables, the numbers on the diagonal are highlighted in bold to represent the
percentage of times the true model was correctly selected. The off-diagonal numbers represent the
percentage of times a false model was selected.
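The selection rule behind Tables 9-11 can be sketched as follows. The fit-index values are hypothetical numbers invented for illustration, and the function names are not from the original analysis. How non-converged models were treated in the actual tallies is not stated in the text; this sketch simply assumes that a model that failed to converge in a replication is left out of that replication's comparison.

```python
from collections import Counter


def select_best(index_values):
    """Return the model with the lowest fit-index value in one replication."""
    return min(index_values, key=index_values.get)


def selection_percentages(replications):
    """Percentage of replications in which each model had the lowest index."""
    winners = Counter(select_best(rep) for rep in replications if rep)
    n_used = sum(winners.values())
    return {model: 100.0 * count / n_used for model, count in sorted(winners.items())}


# Hypothetical BIC values for four replications generated under PM5.
# PM5 is absent from the third replication to mimic a non-converged model.
toy_replications = [
    {"PM4": 13804.0, "PM5": 13845.0, "PM6": 13861.0},
    {"PM4": 13810.0, "PM5": 13802.0, "PM6": 13850.0},
    {"PM4": 13799.0, "PM6": 13855.0},
    {"PM4": 13801.0, "PM5": 13840.0, "PM6": 13859.0},
]

percentages = selection_percentages(toy_replications)
print(percentages)
```

With PM5 as the true model in this toy example, the percentage for PM5 plays the role of the power estimate, and the percentages for the other models are the rates at which a false model was selected.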
None of the three fit indexes performed perfectly. BIC performed best in
distinguishing the true PM4 model from the simpler alternative models (PM1-PM3),
which it discriminated perfectly. However, BIC had almost no sensitivity to models
that were more complex than PM4, with a type I error over 90% when the true model was
PM5 or PM6. This result is not difficult to understand from the BIC equation [17] in
Chapter 3: the larger the sample size, the more heavily BIC penalizes models with a
complicated structure or more parameters.
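The size of the BIC penalty can be illustrated with a short calculation. Equation [17] is not reproduced in this chapter, so the sketch below assumes the standard definitions AIC = -2 ln L + 2k and BIC = -2 ln L + k ln N, with k free parameters and N = 605 pairs; under these assumptions BIC - AIC = k(ln N - 2). The mean AIC and BIC values are the SIM1 columns of Tables 9 and 10; the function name implied_parameters is hypothetical, and because the tabled means are rounded, the implied parameter counts are approximate.

```python
import math

N_PAIRS = 605  # assumed sample size entering the BIC penalty

# Mean AIC and BIC under SIM1, copied from Tables 9 and 10.
MEAN_AIC_SIM1 = {"PM1": 14200, "PM2": 14207, "PM3": 14216,
                 "PM4": 14208, "PM5": 14248, "PM6": 14231}
MEAN_BIC_SIM1 = {"PM1": 14341, "PM2": 14370, "PM3": 14414,
                 "PM4": 14406, "PM5": 14481, "PM6": 14474}


def implied_parameters(aic, bic, n):
    """Solve BIC - AIC = k * (ln n - 2) for k (standard definitions assumed)."""
    return (bic - aic) / (math.log(n) - 2.0)


per_parameter_penalty = {"AIC": 2.0, "BIC": math.log(N_PAIRS)}
implied_k = {model: implied_parameters(MEAN_AIC_SIM1[model], MEAN_BIC_SIM1[model], N_PAIRS)
             for model in MEAN_AIC_SIM1}

print(per_parameter_penalty)
for model, k in implied_k.items():
    print(f"{model}: implied k is about {k:.1f}")
```

Under these assumptions each additional parameter costs roughly three times as much under BIC as under AIC at this sample size, which is in line with the tendency of BIC to favor PM4 over the more complex PM5 and PM6. The implied counts for PM3 and PM4 come out equal, in agreement with the statement that the two models have the same degrees of freedom.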
AIC performed slightly better than BIC in distinguishing PM4 from PM5-PM6,
but it performed poorly in recognizing a model of the same complexity (PM3 has the same
degrees of freedom as PM4). If data were generated from PM3, the chance that model
PM3 would be recovered was only 24%.
The RMSEA fit statistic, however, performed considerably better than AIC or BIC
in distinguishing models PM4-PM6, with a probability of about 60% or above of selecting the correct
model. However, compared to BIC, RMSEA was not very useful in differentiating the
simpler models (PM1-PM3).
The multivariate biometric analysis in Section 5.6 showed that the best fitting model,
PM4, has the lowest AIC, BIC, and RMSEA values among all compared models. Based on
the current simulation results, PM1-PM3 can be rejected with confidence. However,
rejecting model PM5 or PM6 may carry up to a 30% chance of making a type I error.
6.5 Precision of Parameter Estimates
The precision of the model PM4 parameter estimates based on the Monte Carlo approach
is presented in Table 12. The first column (Population) is the true population value for
each parameter; in this case these were the saved estimates from the real data analysis
fitting PM4. The second column (Estimates Average) is the average of the parameter
estimates across replications. The percentage parameter bias can thus be calculated by
subtracting the population parameter value from the average parameter estimate and dividing
this difference by the population value. The calculated bias values are listed in the
second to last column (Bias %).
Mplus also yields output containing the standard deviation of the parameter
estimates across replications (Std. Dev.) and the average of the standard errors across
replications (S. E. Average). The standard error bias was calculated in a similar fashion, by subtracting
the standard deviation of the parameter estimates (Std. Dev.) from the average of the
estimated standard errors across replications and dividing this difference by the standard
deviation (Std. Dev.). The standard error bias is shown in the last column (S. E. Bias %).
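The two bias measures can be written out as a short Python sketch. The function names are hypothetical. The parameter bias follows the description in the text. For the standard error bias, the sign convention is an assumption: the difference is taken as S. E. Average minus Std. Dev., divided by Std. Dev., so that a negative value means the standard errors understate the empirical variability, which is what the negative entries in Table 12 suggest. The inputs are the rounded Table 12 entries for F->Y3, so the results only approximate the printed percentages, which were presumably computed from unrounded values.

```python
def percent_bias(population, estimate_average):
    """Parameter bias as a percentage of the population value."""
    if population == 0:
        raise ValueError("percentage bias is undefined when the population value is 0")
    return 100.0 * (estimate_average - population) / population


def percent_se_bias(std_dev, se_average):
    """Standard error bias, treating the SD across replications as the reference.

    Assumed convention: (average S.E. - SD) / SD, so a negative value means the
    standard errors are smaller than the empirical variability of the estimates.
    """
    return 100.0 * (se_average - std_dev) / std_dev


# Rounded Table 12 entries for the loading F->Y3.
bias = percent_bias(population=0.25, estimate_average=0.26)
se_bias = percent_se_bias(std_dev=0.18, se_average=0.14)

print(f"parameter bias: {bias:.1f}%")
print(f"S.E. bias     : {se_bias:.1f}%")
```

A parameter whose population value is zero has no defined percentage bias, which is why the sketch raises an error in that case instead of returning a number.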
The criterion given by Muthén and Muthén (2002) for a “precise” parameter was
that parameter bias and standard error bias should both be lower than 10%. In this study,
the parameter estimates are satisfactory, considering that most bias estimates are lower than
10%, but the S.E. biases are greater than expected. For more information regarding
parameter precision, see Muthén and Muthén (1998-2007).
Table 8. Mean χ² values, p-values, and number of models successfully computed
        SIM1             SIM2             SIM3             SIM4             SIM5             SIM6
Model     χ²    p    n     χ²    p    n     χ²    p    n     χ²    p    n     χ²    p    n     χ²    p    n
PM1      450  .23  500   1061  .00  499    465  .14  500   1079  .00  500   1076  .00  500   1080  .00  500
PM2      447  .22  500    444  .23  500    463  .13  499    581  .00  498    575  .00  500    575  .00  498
PM3      443  .18  464   1060  .00  498    442  .20  447   1059  .00  470   1056  .00  475   1060  .00  486
PM4      439  .21  119    434  .25  267    438  .23  246    437  .23  450    444  .18  464    446  .17  447
PM5      433  .14    6    435  .18  376    418  .32    4    444  .14  275    432  .20  382    437  .17  253
PM6      440  .14  246    436  .16  319    440  .15  295    442  .14  336    437  .17  361    430  .20  385
Note. SIM1-SIM6 = Simulation model or true model.
PM1-PM6 = Fitting model or data analysis model, as in Figure 14.
n = number of models successfully computed (out of a total of 500).
Table 9. Mean AIC values and the percentage of times the simulation model was selected as the best fit
        Mean AIC                                    % Models Selected as the Best
Model    SIM1   SIM2   SIM3   SIM4   SIM5   SIM6     SIM1   SIM2   SIM3   SIM4   SIM5   SIM6
PM1     14200  14315  14202  14204  14211  14207     98.2    0.0   52.8    0.0    0.0    0.0
PM2     14207  13708  14210  13716  13721  13711      1.4   94.2    0.5    0.0    0.0    0.0
PM3     14216  14339  14207  14209  14218  14214      0.2    0.0   24.4    0.0    0.0    0.0
PM4     14208  13718  14204  13587  13606  13599      0.2    5.8   22.4  100.0   62.6   68.1
PM5     14248  13730  14175  13618  13612  13603      0.0    0.0    0.0    0.0   36.9    7.8
PM6     14231  13740  14227  13616  13618  13601      0.0    0.0    0.0    0.0    0.5   24.2
Note. Numbers in diagonal (bold fonts) indicate the analysis model is specified correctly as the simulation true model.
Table 10. Mean BIC values and the percentage of times the simulation model was selected as the best fit
        Mean BIC                                    % Models Selected as the Best
Model    SIM1   SIM2   SIM3   SIM4   SIM5   SIM6     SIM1   SIM2   SIM3   SIM4   SIM5   SIM6
PM1     14341  14456  14343  14345  14352  14348    100.0    0.0    0.0    0.0    0.0    0.0
PM2     14370  13871  14373  13879  13884  13874      0.0  100.0    0.0    0.0    0.3    1.0
PM3     14414  14537  14405  14407  14416  14412      0.0    0.0  100.0    0.0    0.0    0.0
PM4     14406  13916  14403  13785  13804  13797      0.0    0.0    0.0  100.0   92.7   90.4
PM5     14481  13963  14409  13851  13845  13836      0.0    0.0    0.0    0.0    7.1    5.2
PM6     14474  13982  14469  13858  13861  13843      0.0    0.0    0.0    0.0    0.0    3.4
Note. Numbers in diagonal (bold fonts) indicate the analysis model is specified correctly as the simulation true model.
Table 11. Mean RMSEA values and the percentage of times the simulation model was selected as the best fit
        Mean RMSEA                                  % Models Selected as the Best
Model    SIM1   SIM2   SIM3   SIM4   SIM5   SIM6     SIM1   SIM2   SIM3   SIM4   SIM5   SIM6
FIT1     .022   .112   .028   .114   .114   .114     67.6    0.0   10.7    0.0    0.0    0.0
FIT2     .023   .022   .029   .057   .057   .057     12.0   55.4    1.1    0.0    0.0    0.0
FIT3     .025   .115   .025   .115   .115   .116      8.4    0.0   47.9    0.0    0.0    0.0
FIT4     .024   .021   .023   .022   .026   .026      9.0   32.4   35.1   93.6   22.5   28.8
FIT5     .026   .025   .020   .029   .024   .026      0.2    6.4    5.2    1.8   67.5   11.7
FIT6     .019   .027   .028   .029   .027   .024      2.8    5.8   18.1    4.7   10.0   59.5
Note. Numbers in diagonal (bold fonts) indicate the analysis model is specified correctly as the simulation true model.
Table 12. Precision of model PM4 parameter estimates
Parameter   Population  Estimates Average  Std. Dev.  S.E. Average  M.S.E.  Bias (%)  S.E. Bias (%)
F->Y1            Fix@1
F->Y2             0.05               0.05       0.17          0.14    0.03     -13.5          -25.4
F->Y3             0.25               0.26       0.18          0.14    0.03       4.4          -25.4
F->Y4             0.73               0.78       0.33          0.25    0.11       7.2          -30.4
F->Y5             0.30               0.31       0.16          0.13    0.03       5.3          -20.4
F->Y6             0.59               0.62       0.28          0.21    0.08       6.6          -31.0
F_P->Y1           0.52               0.53       0.16          0.11    0.03       2.7          -47.4
F_P->Y2           0.90               0.93       0.18          0.16    0.03       3.8          -17.0
F_P->Y3          Fix@1
F_T->Y4           0.85               0.84       0.06          0.06    0.00      -1.2           -5.2
F_T->Y5           1.00               1.10       0.45          0.26    0.21      10.6          -76.4
F_T->Y6          Fix@1
A->F              0.67               0.65       0.23          0.22    0.05      -3.4          -2.54
C->F              0.00               0.16       0.21          0.37    0.07                     42.8
E->F              0.32               0.34       0.13          0.13    0.02       5.7           -2.6
A1->F_P           0.51               0.49       0.11          0.11    0.01      -3.4           -3.6
A1->F_T           0.33               0.30       0.15          0.15    0.02     -10.3            2.7
A2->F_T           0.25               0.20       0.19          0.34    0.04     -22.1           45.5
C1->F_P           0.42               0.40       0.11          0.12    0.01      -4.5            2.5
C1->F_T           0.02               0.03       0.17          0.18    0.03      56.8            4.5
C2->F_T           0.49               0.40       0.16          0.23    0.03     -18.5           29.6
E1->F_P           0.51               0.50       0.06          0.05    0.00      -1.6          -11.7
E1->F_T           0.10               0.10       0.05          0.05    0.00      -0.5           -5.0
E2->F_T           0.43               0.40       0.06          0.05    0.00      -5.7          -20.7
Y1 Corr.          0.02              -0.09       0.78          0.69    0.61    -510.5          -12.4
Y2 Corr.          0.23               0.22       0.08          0.07    0.01      -7.9          -10.9
Y3 Corr.          0.13               0.13       0.08          0.07    0.01      -3.1          -21.5
Y4 Corr.          0.16               0.15       0.04          0.04    0.00      -5.2          -11.7
Y5 Corr.          0.17               0.13       0.12          0.08    0.02     -18.5          -50.6
Y6 Corr.          0.07               0.07       0.03          0.03    0.00       3.8            2.9
Y1 Resid.         0.26               0.12       0.90          0.82    0.83     -53.9          -10.0
Y2 Resid.         0.45               0.42       0.12          0.10    0.01      -6.0          -15.0
Y3 Resid.         0.27               0.26       0.12          0.10    0.02      -1.2          -28.1
Y4 Resid.         0.30               0.29       0.04          0.04    0.00      -3.7          -17.6
Y5 Resid.         0.34               0.30       0.17          0.10    0.03     -12.5          -67.0
Y6 Resid.         0.23               0.23       0.04          0.03    0.00       2.3          -12.6
Note. Y1 = ADHD_P; Y2 = CAQ_P; Y3 = CBCL_P; Y4 = ADHD_T; Y5 = CAQ_T; Y6 = CBCL_T.
Chapter 7: Discussion
The main aim of this dissertation study was to investigate the best model for
estimating genetic and environmental influences on child externalizing behavior in
multivariate multi-rater twin research. Six competing biometric models were tested using
parent and teacher reports on three widely used instruments: ADHD symptoms, the CAQ,
and the CBCL externalizing subscale. The best fitting model was the one that included a
common ADHD factor shared among all measures and two oblique rater factors
represented by child aggression and/or delinquency. This factor model was found
consistently at the phenotypic level in a non-paired random sample as well as in a twin
sample across both boys and girls. The model was also validated by Monte Carlo
simulation studies.
Results of the present study indicated that parents and teachers tend to agree with
each other on ADHD symptoms (r = .49), whereas there is a fair amount of
disagreement on aggression as measured by the CAQ (r = .22) and externalizing symptoms as
measured by the CBCL (r = .31). These correlation estimates are in the range of results
reported in previous studies using multiple informants (Achenbach, McConaughy, &
Howell, 1987; Hinshaw, Han, Erhardt, & Huber, 1992; Rutter, Silberg, & Eaves, 2001). In
particular, the parent-teacher correlation on CBCL externalizing symptoms
is very close to the .32 reported by Achenbach, McConaughy, and Howell (1987).
The two correlated rater factors in the best fitting model suggested that there was
more than one common factor shared by the six externalizing behavior measures. These
results seem inconsistent with the current externalizing behavior literature, which favors a
one-factor structure (e.g., Krueger et al., 2002). However, there is plenty of evidence in the
literature showing that ADHD and other externalizing behaviors have different underlying
liabilities (Burt et al., 2001; Nadder et al., 2002; Dick et al., 2005). In most studies that
resulted in a one-factor model, the externalizing behavior factor was typically represented
by aggression, delinquency, conduct problems, or substance use as indicators.
None of these studies, to the author’s knowledge, has suggested using
ADHD as a reliable indicator for a general externalizing factor.
Given the debate on the distinction between the ADHD inattention and hyperactivity
subtypes (Milich, Balentine, & Lynam, 2001), one might suspect that a single-factor
solution for externalizing problems might be reached by using only
the ADHD Hyperactivity/Impulsivity subtype, rather than the full set of symptoms of
both inattention and hyperactivity/impulsivity as in the present study. However, attempts
to separate the ADHD subtypes and fit the same model using either ADHD inattention
symptoms or ADHD hyperactivity symptoms did not change the pattern of results
drastically, and the factor structure was found to be the same as that presented here.
Nonetheless, future researchers should exercise caution when including ADHD measures as
an externalizing behavior indicator.
The correlation between the two rater factors suggested the presence of a second
common factor accounting for the covariation among the six measurements, namely a factor
reflecting aggression and delinquency. Thus each phenotypic variance can be
decomposed into influences due to informant-shared ratings on ADHD, informant-shared
ratings on aggression/delinquency, and informant-unique ratings by using Cholesky
factorization on the biometric components. A unique feature of the current study is its ability
to disentangle phenotypic biometric effects from rater effects. Based on the variance
decomposition results, mother reports were found to be more reliable, while teachers’
reports were less informative in distinguishing different forms of externalizing behavior.
Monte Carlo simulation analysis showed that no single model fit index is perfect
in selecting the best fitting model. Which model fit criterion to use depends on the
complexity of the model. When comparing simple factor structures (e.g., each indicator
loading on only one factor), BIC is very effective in distinguishing among
models. This result is consistent with previous findings by Markon and Krueger (2004),
and BIC has accordingly been recommended by previous researchers for use in future studies of
model comparison. However, the present findings show that (1) BIC is not very sensitive
when comparing more complex models (e.g., models with one indicator loading on multiple factors),
and that (2) RMSEA might perform better in such situations. More simulation studies are
needed, however, to investigate the conditions under which RMSEA would perform best
(e.g., sample size, model complexity).
There are, of course, limitations in the current study that should be addressed in
future research. First, the current study did not adjust for parent-teacher interaction.
Since it is not known to what extent mother and teacher ratings were influenced by their daily
interaction and communication with each other, obtaining this information in future studies
may help to differentiate true rater bias from the effects of such interactions. Second,
the current study did not consider the classroom effect, i.e., whether the twins shared the same
teacher or had different teachers. As discussed in the study by Baker (in Section 2.5), some teachers rated
both twins and some teachers rated only one twin. Adding this information would allow
teacher bias effects to be disentangled from shared environmental effects, thus yielding
a more accurate estimate of the teacher rating effect. This model can be easily
implemented in a multiple-group twin study (e.g., MZ same teacher, MZ different teacher,
DZ same teacher, DZ different teacher).
References
Achenbach, T. M., & Edelbrock, C. S. (1978). The classification of child
psychopathology: A review and analysis of empirical efforts. Psychological
Bulletin, 85, 1275-1301.
Achenbach, T. M., & Edelbrock, C. S. (1984). Psychopathology of childhood. Annual
Review of Psychology, 35, 227-256.
Achenbach, T. M. (1991). Manual for the Child Behavior Checklist/4-18 and 1991
profile. Burlington, VT: University of Vermont, Department of Psychiatry.
Achenbach, T. M., Dumenci, L., & Rescorla, L. A. (2000). Ratings of relations between
DSM-IV diagnostic categories and items of the CBCL/1½-5 and C-TRF.
Burlington, VT: University of Vermont.
Achenbach, T. M., Dumenci, L., & Rescorla, L. A. (2001). Ratings of relations between
DSM-IV diagnostic categories and items of the CBCL/6-18, TRF, and YSR.
Burlington, VT: University of Vermont, Research Center for Children, Youth, &
Families. Available at http://www.ASEBA.org.
Achenbach, T. M., McConaughy, S. H., & Howell, C. T. (1987). Child/adolescent
behavioral and emotional problems: Implications of cross informant correlations
for situational specificity. Psychological Bulletin, 101, 213–232.
Akaike, H. (1987). Factor analysis and AIC. Psychometrika, 52, 317-322.
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental
disorders (4th ed.). Washington, DC: Author.
Angold, A. (2003). Adolescent depression, cortisol and DHEA. Psychological Medicine,
33, 573-581.
Angold, A., Costello, J. E. & Erkanli, A. (1999). Comorbidity. Journal of Child
Psychology and Psychiatry, 40, 57-87.
Armstrong, T. D., & Costello, E. J. (2002). Community studies on adolescent substance
use, abuse, or dependence and psychiatric comorbidity. Journal of Consulting and
Clinical Psychology, 70, 1224-1239.
Baker, L. A., Barton, M., Lozano, D.I., Raine, A., & Fowler, J. (2002). The Southern
California Twin Register at USC. Twin Research, 5, 456-459.
Baker, L. A., Barton, M., Lozano, D. I., & Raine, A. (2006). The Southern California
Twin Register at USC. Twin Research and Human Genetics, 9, 840-933.
Baker, L. A., Jacobson, K. C., Raine, A., Lozano, D. I., & Bezdjian, S. (2007). Genetic and
environmental bases of childhood antisocial behavior: A multi-informant twin
study. Journal of Abnormal Psychology, 116(2), 219-235.
Bartels, M., Boomsma, D. I., van Beijsterveldt, T. C. E. M., Hudziak, J. J., & van den
Oord, E. J. C. G. (2007). Twin and the study of rater (dis)agreement.
Psychological Methods, 12(4), 451-466.
Bartels, M., Boomsma, D. I., Rietveld, M. J. H., van Beijsterveldt, C. E. M., Hudziak, J.
J., & van den Oord, E. J. C. G. (2004). Disentangling genetic, environmental, and
rater effects on internalizing and externalizing problem behavior in 10-year-old
twins. Twin Research, 7, 162–175.
Bartels, M., Hudziak, J. J., Boomsma, D. I., Rietveld, M. J. H., van Beijsterveldt, C. E.
M., & van den Oord, E. J. C. G. (2003). A study of parent ratings of internalizing
and externalizing behavior in 12-year-old twins. Journal of the American
Academy of Child and Adolescent Psychiatry, 42, 1351–1359.
Biederman, J. & Faraone, S. V. (2006). The effects of attention-deficit/hyperactivity
disorder on employment and household income. Medscape General Medicine,
18(8), 3-12.
Biederman, J., Faraone, S. V., Milberger, S., Jetton, J. G., Chen, L., Mick, E., Greene, R.
W. & Russell, R. L. (1996). Is childhood oppositional defiant disorder a precursor
to adolescent conduct disorder? Findings from a four-year follow-up study of
children with ADHD. Journal of the American Academy of Child and Adolescent
Psychiatry, 35(9), 1193-204.
Biederman, J., Newcorn, J. & Sprich, S. (1991). Comorbidity of attention deficit
hyperactivity disorder with conduct, depressive, anxiety, and other disorders.
American Journal of Psychiatry, 148, 564-77.
Browne, M. W., & Cudeck, R. (1993). Alternative ways of assessing model fit. In K. A.
Bollen & J. S. Long (Eds.), Testing structural equation models (pp. 136-162).
Thousand Oaks, CA: Sage.
Brown, T. A. (2006). Confirmatory factor analysis for applied research. New York:
Guilford Press.
Burt, S. A., Krueger, R. F., McGue, M., & Iacono, W. G. (2001). Sources of covariation
among attention-deficit/hyperactivity disorder, oppositional defiant disorder, and
conduct disorder: The importance of shared environment. Journal of Abnormal
Psychology, 110(4), 516-525.
Burt, S. A., McGue, M., Krueger, R. F., & Iacono, W. G. (2005). Sources of covariation
among the child-externalizing disorders: informant effects and the shared
environment. Psychological Medicine, 35, 1133-1144.
Campbell, D. T., & Fiske, D. W. (1959). Convergent and discriminant validation by the
multitrait-multimethod matrix. Psychological Bulletin, 56, 81-105.
Carey, G. (2005). Cholesky problems. Behavior Genetics, 35, 653-665.
Cronbach, L. J., Gleser, G. C., Nanda, H., & Rajaratnam, N. (1972). The dependability of
behavioral measurements: Theory of generalizability for scores and profiles. New
York: Wiley.
Dick, D. M., Viken, R. J., Kaprio, J., Pulkkinen, L., & Rose, R. J. (2005). Understanding
the covariation among childhood externalizing symptoms: Genetic and
environmental influences on conduct disorder, attention deficit hyperactivity
disorder, and oppositional defiant disorder symptoms. Journal of Abnormal Child
Psychology, 33(2), 219-229.
Dillalla, L. A. (2002). Behavior genetics of aggression in children: Review and future
directions. Developmental Review, 22, 593-622.
Eaves, L., Rutter, M., Silberg, J. L., Shillady, L., Maes, H., & Pickles, A. (2000). Genetic
and environmental causes of covariation in interview assessment of disruptive
behavior in child and adolescent twins. Behavior Genetics, 30, 321-334.
Edelbrock, C., & Costello, A. J. (1988). Convergence between statistically derived
behavior problem syndromes and child psychiatric diagnoses. Journal of
Abnormal Child Psychology, 16, 219- 231.
Faraone, S. V., Tsuang, M. T., & Tsuang, D. W. (1999). Genetics of mental disorders: A
guide for students, clinicians, and researchers. New York, NY: Guilford Press.
Faraone, S. V., Biederman, J., Mennin, D., Russell, R. & Tsuang, M. T. (1998). Familial
subtypes of attention deficit hyperactivity disorder: a 4-year follow-up study of
children from antisocial-ADHD families. Journal of Child Psychology and
Psychiatry, 39(7), 1045-53.
Faraone, S. V., Biederman, J., & Monuteaux, M. C. (2000). Attention deficit disorder and
conduct disorder in girls: Evidence for a familial subtype. Biological Psychiatry,
48, 21-29.
Farrington, D. (1989). Early predictors of adolescent aggression and adult violence.
Violence and Victims, 4(2), 79-100.
Fergusson, D. M. & Horwood, L. J. (1998). Early conduct problems and later life
opportunities. Journal of Child Psychology and Psychiatry, 39(8), 1097-108.
Funder, D. C. (1995). On the accuracy of personality judgment: A realistic approach.
Psychological Review, 102, 652–670.
Gignac, G. E. (2007). Multi-factor modeling in individual differences research: Some
recommendations and suggestions. Personality and Individual Differences, 42,
37-48.
Goldman, L., Genel, M., Bezman, R. & Slanetz, P. (1998). Diagnosis and treatment of
attention-deficit/hyperactivity disorder in children and adolescents. The Journal of
the American Medical Association, 279(14), 1100-1107.
Grove, W. M., Eckert, E. D., Heston, L., Bouchard, T. J., Jr., Segal, N., & Lykken, D. T.
(1990). Heritability of substance abuse and antisocial behavior: A study of
monozygotic twins reared apart. Biological Psychiatry, 27, 1293-1304.
Hewitt, J. K., Silberg, J. L., Neale, M. C., Eaves, L. J. & Erickson, M. (1992). The
analysis of parental rating of children’s behavior using LISREL. Behavior
Genetics, 22, 293-317.
Hicks, B. M., et al. (2007). Gender Differences and developmental change in
externalizing disorders from late adolescence to early adulthood: A longitudinal
twin study. Journal of Abnormal Psychology, 116(3), 433-447.
Hinshaw, S.P. (1987). On the distinction between attentional deficits/hyperactivity and
conduct problems/aggression in child psychopathology. Psychological Bulletin,
101, 443-463.
Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement
invariance in aging research. Experimental Aging Research, 18, 117-144.
Hoyt, W. T. (2000). Rater bias in psychological research: When is it a problem and what
can we do about it? Psychological Methods, 5, 64–86.
Kasius, M.C., Ferdinand, R. F., van den Berg, H., & Verhulst, F. C. (1997). Associations
between different diagnostic approaches for child and adolescent
psychopathology. Journal of Child Psychology and Psychiatry, 38, 625-632.
Kendler, K. S., Heath, A. C., Martin, N. G., & Eaves, L. J. (1987). Symptoms of anxiety
and symptoms of depression: Same genes, different environments? Archives of
General Psychiatry, 44, 451-457.
Kendler, K. S., Prescott, C. A., Myers, J., & Neale, M. C. (2003). The structure of genetic
and environmental risk factors for common psychiatric and substance use
disorders in men and women. Archives of General Psychiatry, 60, 929-937.
Kenny, D.A. (1991). A general model of consensus and accuracy in interpersonal
perception. Psychological Review, 98, 155-163.
Kessler, R. C., McGonagle, K. A., Zhao, S., Nelson, C. B., Hughes, M., Eshleman, S., et
al. (1994). Lifetime and 12-month prevalence of DSM-III-R psychiatric disorders
in the United States: Results from the National Comorbidity Survey. Archives of
General Psychiatry, 51, 8-19.
Kim-Cohen, J., Caspi, A., Moffitt, T. E., Harrington, H., Milne, B. J. & Poulton, R.
(2003). Prior juvenile diagnoses in adults with mental disorder: developmental
follow-back of a prospective-longitudinal cohort. Archives of General Psychiatry,
60(7), 709-17.
Kline, R. B. (2005). Principles and practice of structural equation modeling (2nd ed.).
New York: Guilford Press.
Kraemer, H. C., Measelle, J. R., Ablow, J. C., Essex, M. J., Boyce, W. T., & Kupfer, D. J.
(2003). A new approach to integrating data from multiple informants in
psychiatric assessment and research. The American Journal of Psychiatry, 160,
1566–1577.
Krueger, R. F., Hicks, B. M., Patrick, C. J., Carlson, S. R., Iacono, W. G. & McGue, M.
(2002). Etiologic connections among substance dependence, antisocial behavior,
and personality: Modeling the externalizing spectrum. Journal of Abnormal
Psychology, 111(3), 411-424.
Krueger, R. F., Markon, K. E., Patrick, C. J., & Iacono, W. G. (2005). Externalizing
psychopathology in adulthood: A dimensional-spectrum conceptualization and its
implications for DSM-V. Journal of Abnormal Psychology, 114(4), 537-550.
Krueger, R. F., Markon, K. E., Patrick, C. J., Benning, S. D., & Kramer, M. D. (2007).
Linking antisocial behavior, substance use, and personality: An integrative
quantitative model of the adult externalizing spectrum. Journal of Abnormal
Psychology, 116(4), 645-666.
Kuperman, S., Schlosser, S. S., Kramer, J. R., Bucholz, K., Hesselbrock, V., Reich, T. &
Reich, W. (2001). Developmental sequence from disruptive behavior diagnosis to
adolescent alcohol dependence. American Journal of Psychiatry, 158(12), 2022-6.
Lahey, B. B., McBurnett, K., & Loeber, R. (2000). Are attention-deficit/hyperactivity
disorder and oppositional defiant disorder developmental precursors to conduct
disorder? In A. J. Sameroff & M. Lewis (Eds.), Handbook of developmental
psychopathology (2nd ed., pp. 431-446). Dordrecht, The Netherlands: Kluwer
Academic.
Lier, Pol A. C., van der Ende, Jan, Koot, H. M., & Verhulst, F. C. (2007). Which better
predicts conduct problems? The relationship of trajectories of conduct problems
with ODD and ADHD symptoms from childhood into adolescence. Journal of
Child Psychology and Psychiatry, 48(6), 601-608.
Liu, J. (2004). Childhood externalizing behavior: Theory and implications. Journal of
Child and Adolescent Psychiatric Nursing, 17(3), 93-103.
Loehlin, J. C. (2004). Latent variable models: An introduction to factor, path, and
structural equation analysis (4th edition). Mahwah, New Jersey: Lawrence
Erlbaum Associates.
Lowe, L. A. (1998). Using the Child Behavior Checklist in assessing conduct. Research
on Social Work Practice, 8, 286-301.
Lykken, D. T. (1978). The diagnosis of zygosity in twins. Behavior Genetics, 8, 437-473.
Markon, K. E., & Krueger, R. F. (2004). An empirical comparison of information-
theoretic selection criteria for multivariate behavior genetic models. Behavior
Genetics, 34, 593-610.
Martin, N. G., & Eaves, L. J. (1977). The genetical analysis of covariance structure.
Heredity, 38, 79-95.
Martin, N., Scourfield, J., & McGuffin, P. (2002). Observer effects and heritability of
childhood attention-deficit hyperactivity disorder symptoms. British Journal of
Psychiatry, 62, 1352-1359.
Maser, J. D., & Patterson, T. (2002). Spectrum and nosology: Implications for DSM-V.
Psychiatric Clinics of North America, 25, 855-885.
Mather, K., & Jinks, J. L. (1982). Biometrical genetics: The study of continuous
variation. London: Chapman and Hall.
Maughan, B., Rowe, R., Messer, J., Goodman, R. & Meltzer, H. (2004). Conduct
Disorder and Oppositional Defiant Disorder in a national sample: developmental
epidemiology. Journal of Child Psychology and Psychiatry, 45(3), 609-21.
McArdle, J. J. (1986). Latent growth within behavior genetic models. Behavior Genetics,
16, 163-200.
McArdle, J. J., & Goldsmith, H. H. (1990). Alternative common factor models for
multivariate biometric analyses. Behavior Genetics, 20, 569-608.
McMahon, R.J. (1994). Diagnosis, assessment, and treatment of externalizing problems
in children: The role of longitudinal data. Journal of Consulting and Clinical
Psychology, 62(5), 901-917.
Meredith, W. (1993). Measurement invariance, factor analysis and factorial invariance.
Psychometrika, 58, 525-543.
Miles, D. R., Van den Bree, M. & Pickens, R. W. (2002). Sex differences in shared
genetic and environmental influences between conduct disorder symptoms and
marijuana use in adolescents. American Journal of Medical Genetics
(Neuropsychiatric Genetics), 114, 159-168.
Moffitt, T. E. (1993). “Life-course-persistent” and “adolescence-limited” antisocial
behavior: A developmental taxonomy. Psychological Review, 100, 674-701.
Moffitt, T. E., Caspi, A., Rutter, M., & Silva, P. A. (2001). Sex differences in antisocial
behavior: Conduct disorder, delinquency, and violence in the Dunedin
Longitudinal Study. Cambridge, England: Cambridge University Press.
Muthén, B., Asparouhov, T. & Rebollo, I. (2006). Advances in behavioral genetics
modeling using Mplus: Applications of factor mixture modeling to twin data.
Twin Research and Human Genetics, 9, 313-324.
Muthén, B.O. (1998-2004). Mplus Technical Appendices. Los Angeles, CA: Muthén &
Muthén.
Muthén, L.K. & Muthén, B.O. (1998-2007). Mplus User’s Guide. Fifth Edition. Los
Angeles, CA: Muthén & Muthén.
Nadder, T. S., Silberg, J. L., Rutter, M., Maes, H. H.,
& Eaves, L. J. (2001). Comparison of multiple measures of ADHD
symptomatology: A multivariate genetic analysis. Journal of Child Psychology
and Psychiatry, 42(4), 475-486.
Nadder, T. S., Rutter, M., Silberg, J. L., Maes, H. H., & Eaves, L. J. (2002). Genetic
effects on the variation and covariation of attention deficit-hyperactivity disorder
(ADHD) and oppositional-defiant disorder/conduct disorder (ODD/CD) across
informant and occasion of measurement. Psychological Medicine, 32, 39-53.
Neale, M. C., Boker, S. M., Xie, G. & Maes, H. (2003). Mx: Statistical modeling.
Richmond, VA: Department of Psychiatry, Medical College of Virginia.
Neale, M. C., & Cardon, L. R. (1992). Methodology for genetic studies of twins and
families. Dordrecht, the Netherlands: Kluwer Academic Press.
Neale, M. C., & Stevenson, J. (1989). Rater bias in the EASI Temperament Scales: a twin
study. Journal of Personality and Social Psychology, 56, 446-455.
Newman, D. L., Moffitt, T. E., Caspi, A., Magdol, L., Silva, P. A., & Stanton, W. R.
(1996). Psychiatric disorder in a birth cohort of young adults: Prevalence,
comorbidity, clinical significance, and new case incidence from ages 11–21.
Journal of Consulting and Clinical Psychology, 64, 552–562.
Prescott, C.A. (2004). Using the Mplus computer program to estimate models for
continuous and categorical data from twins. Behavior Genetics, 34, 17-40.
Raine, A., Dodge, K., Loeber, R., Gatzke-Kopp, L., Lynam, D., Reynolds, C., et al. (2006). The
Reactive–Proactive Aggression Questionnaire: Differential correlates of reactive
and proactive aggression in adolescent boys. Aggressive Behavior, 32, 159–171.
Rathouz, P.J., Hulle, C.A., Rodgers, J.L., Waldman, I.D., & Lahey, B.B. (2008).
Specification, testing, and interpretation of gene-by-measured-environment
interaction models in the presence of gene-environment correlations. Behavior
Genetics, 38, 301-315.
Rhee, S. H., Waldman, I. D., Hay, D. A., & Levy, F. (1999). Sex differences in genetic
and environmental influences on DSM-III-R attention-deficit/hyperactivity
disorder. Journal of Abnormal Psychology, 108, 24–41.
Rutter, M., Caspi, A., & Moffitt, T. (2003). Using sex differences in psychopathology to
study causal mechanisms: Unifying issues and research strategies. Journal of
Child Psychology and Psychiatry, 44, 1092-1115.
Rutter, M., Giller, H. & Hagell, A. (1998). Antisocial behavior by young people.
Cambridge, UK: Cambridge University Press.
Schaffer, D., Fisher, P., Lucas, C. P., Dulcan, M. K., & Schwab-Stone, M. E. (2000).
NIMH Diagnostic Interview Schedule for Children Version IV (NIMH DISC-IV):
Description, differences from previous versions, and reliability of some common
diagnoses. Journal of the American Academy of Child & Adolescent Psychiatry,
39, 28-38.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions.
Psychometrika, 22, 83-90.
Shaw, D.S. & Winslow, E.B. (1997). Precursors and correlates of antisocial behavior
from infancy to preschool. In D.M. Stoff, J. Breiling, & J. Maser (Eds.),
Handbook of antisocial behavior (pp. 148-158). New York: Wiley.
Sherman, D. K., McGue, M. K. & Iacono, W. G. (1997). Twin concordance for attention
deficit hyperactivity disorder: a comparison of teachers’ and mothers’ reports.
American Journal of Psychiatry, 154, 532-535.
Silberg, J., Rutter, M., Meyer, J., Maes, H., Hewitt, J., Simonoff, E., Pickles, A., Loeber,
R., & Eaves, L. (1996). Genetic and environmental influences on the covariation
between hyperactivity and conduct disturbance in juvenile twins. Journal of Child
Psychology and Psychiatry, 37, 803-816.
Simonoff, E., Pickles, A., Hervas, A., Silberg, J. L., Rutter, M. & Eaves, L. (1998).
Genetic influences on childhood hyperactivity: contrast effects imply parental
rating bias, not sibling interaction. Psychological Medicine, 28(4), 825-37.
Slutske, W. S., Heath, A. C., Dinwiddie, S. H., Madden, P. A. F., Bucholz, K. K., Dunne,
M. P., et al. (1998). Common genetic risk factors for conduct disorder and
alcohol dependence. Journal of Abnormal Psychology, 107, 363-374.
Spencer, T. J., Biederman, J. & Mick, E. (2007). Attention-deficit/hyperactivity disorder:
diagnosis, lifespan, comorbidities, and neurobiology. Journal of Pediatric
Psychology, 32(6), 631-42.
Steiger, J. H., & Lind, J. M. (1980). Statistically based tests for the number of common
factors. Paper presented at the meeting of the Psychometric Society, Iowa City,
IA.
Tackett, J. L., Krueger, R. F., Sawyer, M. G., & Graetz, B. W. (2003). Subfactors of
DSM-IV conduct disorder: evidence and connections with syndromes from the
Child Behavior Checklist. Journal of Abnormal Child Psychology, 31(6), 647-
654.
Thapar, A., Harrington, R. & McGuffin, P. (2001). Examining the comorbidity of ADHD-
related behaviours and conduct problems using a twin study design. British
Journal of Psychiatry, 179, 224-229.
Thapar, A. et al. (1999). Genetic basis of attention deficit and hyperactivity. British
Journal of Psychiatry, 174, 105-111.
Tuvblad, C., Zheng, M., Raine, A. & Baker, L. A. (2009). A common genetic factor
explains the covariation among ADHD, ODD and CD symptoms in 9-10 year old
twins. Journal of Abnormal Child Psychology, 37(2), 153-167.
Tuvblad, C., Raine, A., Zheng, M., & Baker, L. A. (in press). The genetic and
environmental stability differs in reactive and proactive aggression.
Van den Oord, E. J. C. G, Simonoff, E., Eaves, L. J., Pickles, A., Silberg, J., & Maes, H.
(2000). An evaluation of different approaches for behavior genetic analyses with
psychiatric symptom scores. Behavior Genetics, 30, 1-18.
Van der Valk, J. C., van den Oord, E. J. C. G., Verhulst, F. C., & Boomsma, D. I. (2001).
Using parental ratings to study the etiology of 3-year-old twins’ problem
behaviors: Different views or rater bias? Journal of Child Psychology and
Psychiatry, 42, 921–931.
Van der Valk, J. C., van den Oord, E. J. C. G., Verhulst, F. C., & Boomsma, D. I. (2003).
Using common and unique parental views to study the etiology of 7-year-old
twins’ internalizing and externalizing problems. Behavior Genetics, 33, 409–420.
Waldman, I. D. & Slutske, W. S. (2000). Antisocial behavior and alcoholism: A
behavioral genetic perspective on comorbidity. Clinical Psychology Review,
20(2), 255-287.
White, H. R., Xie, M., Thompson, W., Loeber, R. & Stouthamer-Loeber, M. (2001).
Psychopathology as a predictor of adolescent drug use trajectories. Psychology of
Addictive Behaviors, 15(3), 210-8.
Widaman, K. F. (1985). Hierarchically nested covariance structure models for multitrait-
multimethod data. Applied Psychological Measurement, 9, 1-26.
Yung, Y. F., McLeod, L. D., & Thissen, D. (1999). On the relationship between the
higher-order factor model and the hierarchical factor model. Psychometrika, 64,
113–128.
Appendix A: Model Fit Indexes
Model selection requires comparing goodness-of-fit indexes across different models.
Many model fit indexes are described in the SEM literature, but the most basic fit
statistic is the χ² test. The χ² test entails a fitting function, a mathematical operation
that minimizes the difference between the expected variance-covariance matrix
(symbolized as Σ) and the sample variance-covariance matrix (symbolized as S). By far the
most widely used fitting function in applied SEM research is the maximum-likelihood (ML)
function. Assuming multivariate normality and independent, identically distributed random
observations, the ML fitting function is
$F_{ML} = \frac{1}{2}\left[\ln|\Sigma| + \mathrm{trace}(\Sigma^{-1}T) - \ln|S| - (p+q)\right]$, [1]
where p and q are the numbers of independent and dependent variables, Σ is the
model-expected variance-covariance matrix, S is the observed sample variance-covariance
matrix, and T is defined as
$T = S + (\nu - \mu)(\nu - \mu)'$, [2]
in which ν is the vector of observed means and μ is the vector of model-expected means.
A detailed explication of the above functions can be found in Muthén (2004).
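To make equations [1] and [2] concrete, the following is a minimal sketch for the two-variable case, written in plain Python rather than Mplus. The helper names (`det2`, `inv2`, `trace_of_product`, `f_ml`) and the sample moments are hypothetical and chosen only for illustration; they are not taken from the dissertation's analyses.

```python
import math

def det2(m):
    """Determinant of a 2x2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    d = det2(m)
    return [[m[1][1] / d, -m[0][1] / d],
            [-m[1][0] / d, m[0][0] / d]]

def trace_of_product(a, b):
    """trace(a @ b) for two 2x2 matrices."""
    return sum(a[i][k] * b[k][i] for i in range(2) for k in range(2))

def f_ml(sigma, mu, s, nu):
    """ML fitting function of equation [1] for two observed variables."""
    diff = [nu[0] - mu[0], nu[1] - mu[1]]
    # Equation [2]: T = S + (nu - mu)(nu - mu)'
    t = [[s[i][j] + diff[i] * diff[j] for j in range(2)] for i in range(2)]
    n_vars = 2  # plays the role of p + q in equation [1]
    return 0.5 * (math.log(det2(sigma))
                  + trace_of_product(inv2(sigma), t)
                  - math.log(det2(s))
                  - n_vars)

# Hypothetical sample moments (illustrative values only).
S = [[1.0, 0.5], [0.5, 1.0]]
nu = [0.0, 0.0]

# A model that reproduces the sample moments exactly.
perfect = f_ml(S, nu, S, nu)
# A model that wrongly fixes the covariance at zero.
misfit = f_ml([[1.0, 0.0], [0.0, 1.0]], nu, S, nu)
```

In this sketch the fitting function is zero (up to rounding) when Σ = S and μ = ν, and it becomes positive once the model-expected covariance departs from the sample covariance.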
$F_{ML}$ can be derived from the difference between the log likelihood value (lnL) of the
hypothesized model and the log likelihood value of the unrestricted model ($\ln L_0$),
which places no restriction on the variances and covariances. lnL and $\ln L_0$ are
calculated as
$\ln L = -c - \frac{N}{2}\ln|\Sigma| - \frac{N}{2}\,\mathrm{trace}(\Sigma^{-1}T)$ [3]
$\ln L_0 = -c - \frac{N}{2}\ln|S| - \frac{N}{2}(p+q)$, [4]
where $c = \frac{Np}{2}\ln(2\pi)$ and N is the sample size. Thus, $F_{ML}$ in equation [1] can be
derived as
$F_{ML} = -(\ln L - \ln L_0)/N$. [5]
Under typical ML estimation, $2NF_{ML}$ is distributed as a χ² statistic, which can be
expressed as
$\chi^2 = -2(\ln L - \ln L_0)$. [6]
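The step from equations [3] and [4] to equations [5] and [6] can be written out explicitly. Subtracting [4] from [3] cancels the constant c, and what remains is −N times the bracketed expression of equation [1]:

```latex
\begin{align*}
\ln L - \ln L_0
  &= -\frac{N}{2}\left[\ln|\Sigma| + \mathrm{trace}(\Sigma^{-1}T) - \ln|S| - (p+q)\right] \\
  &= -N\,F_{ML}, \\
\chi^2 &= 2N\,F_{ML} = -2\,(\ln L - \ln L_0).
\end{align*}
```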
A statistically significant χ² suggests that the estimated model does not sufficiently
reproduce the sample variances and covariances, i.e., the model does not fit the data well.
Although the χ² test has a long tradition in applied SEM research, it is rarely used
as the sole index of model fit. One criticism of χ² is that it is inflated by sample size,
and thus large-N solutions are usually rejected on the basis of χ² even when differences
between S and Σ are negligible. Nevertheless, χ² is used for other purposes, such as nested
model comparisons and the calculation of other fit indexes (e.g., RMSEA; see below). While
χ² is routinely reported in all SEM analyses, other fit indexes are usually relied on more
heavily in the evaluation of model fit.
Three popular model fit indexes used in SEM research are the Akaike Information
Criterion (AIC), the Bayesian Information Criterion (BIC), and the Root Mean Square Error
of Approximation (RMSEA). They all adjust for model parsimony and are often used to
select the best-fitting model among competing models, especially when the models are
non-nested. Among the three, the AIC is the best-known index. It is defined as
$AIC = -2\ln L + 2r$ [7]
where L is the likelihood value of the expected model as in equation [3] and r is the
number of free model parameters (Akaike, 1987). AIC is a “badness-of-fit” index
because the higher the value, the worse the model’s correspondence to the data.
Specifically, the model with the lowest value of AIC is chosen as the one most likely to
replicate. This is the model with relatively better fit (a lower fit function value, i.e.,
a lower −2lnL) and fewer parameters. In contrast, more complex models with comparable
overall fit may be less likely to replicate due to greater capitalization on chance
(Kline, 2005).
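As a minimal illustration of how the AIC trades fit against parsimony, the sketch below compares two competing models. The function name `aic`, the model labels, the log likelihoods, and the parameter counts are all hypothetical values chosen for illustration, not results from this study.

```python
def aic(log_likelihood, n_free_params):
    """AIC = -2 ln L + 2r, with r the number of free parameters."""
    return -2.0 * log_likelihood + 2.0 * n_free_params

# Hypothetical (lnL, r) pairs for two competing models.
candidates = {
    "model_a": (-1520.4, 12),   # slightly worse fit, fewer parameters
    "model_b": (-1518.9, 18),   # slightly better fit, more parameters
}

aic_values = {name: aic(lnl, r) for name, (lnl, r) in candidates.items()}

# The model with the lowest AIC is preferred.
best = min(aic_values, key=aic_values.get)
```

In this made-up comparison the improvement in lnL of the larger model does not offset the cost of its six extra parameters, so the smaller model has the lower AIC.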
Another well-known index is the Bayesian Information Criterion (Schwartz, 1978),
which is defined as
$BIC = -2\ln L + r\ln N$ [8]
Like the AIC, it can be used to select among competing non-nested models fitted to the
same data, and the model with the lowest value is preferred. However, the BIC takes
account of sample size, and thus penalizes complexity more heavily than the AIC whenever
ln N exceeds 2 (that is, for N of 8 or more).
Like these two fit indexes, the root mean square error of approximation (RMSEA;
Steiger & Lind, 1980) also takes model complexity into account, as reflected in the
degrees of freedom (df) in the following equation.
$RMSEA = \sqrt{(\chi^2/df - 1)/N}$ [9]
It has been suggested that an RMSEA value of less than .05 indicates that the model is a
reasonable approximation to the analyzed data (Browne & Cudeck, 1993). Some have found
that the RMSEA is among the fit indexes least affected by sample size; this feature sets
the RMSEA apart from many other fit indexes that are sample dependent. Thus it is often
called a population-based index (see Loehlin, 2004).
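The BIC and RMSEA definitions above can be sketched in a few lines of Python. The function names and all input values are hypothetical; N = 605 is used only because it matches the number of twin pairs in this study. Truncating the RMSEA at zero when χ² falls below df is a common convention that is assumed here and is not stated in the text.

```python
import math

def bic(log_likelihood, n_free_params, n_obs):
    """BIC = -2 ln L + r ln N."""
    return -2.0 * log_likelihood + n_free_params * math.log(n_obs)

def rmsea(chi_square, df, n_obs):
    """RMSEA = sqrt((chi2/df - 1) / N).

    Assumption: the value is truncated at zero when chi2 < df.
    """
    return math.sqrt(max(chi_square / df - 1.0, 0.0) / n_obs)

n_obs = 605  # illustrative sample size

# Hypothetical (lnL, r) pairs for two competing models.
bic_a = bic(-1520.4, 12, n_obs)
bic_b = bic(-1518.9, 18, n_obs)

# Hypothetical chi-square of 150 on 100 degrees of freedom.
rmsea_value = rmsea(150.0, 100, n_obs)
close_fit = rmsea_value < 0.05
```

Because ln(605) is well above 2, each extra parameter costs more under the BIC than under the AIC, so the smaller hypothetical model is favored even more clearly here.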
Appendix B: Histograms of Raw Scores and Rank Normalized Scores
[Figure: paired histograms of raw scores and transformed (rank-normalized) scores for Mother Rated ADHD, Mother Rated CAQ, Mother Rated CBCL, Teacher Rated ADHD, Teacher Rated CAQ, and Teacher Rated CBCL.]
Appendix C: Simulation Model Example
Fitting PM1 on SIM4
[Figure: path diagrams labeled "True Model" and "Analysis Model" for the six measures ADHD, CAQ, and CBCL, with subscripts P and T denoting the two raters and residuals ε1 to ε6. One diagram shows a general factor F together with two rater factors F_P and F_T, with biometric components A, C, E for F and A1, C1, E1, A2, C2, E2 for the rater factors; the other shows a single factor F with components A, C, and E.]
Appendix D: Monte Carlo Simulation Mplus Script:
Fitting PM4 on SIM4
TITLE: Simulation True Model 4 - One Common Factor + Two Rater Factors
Data Fit Model PM4;
MONTECARLO:
NAMES = y11 y12 y13 y14 y15 y16
y21 y22 y23 y24 y25 y26;
NREPS = 500;
NOBSERVATIONS = 138 139 84 97 147;
NGROUPS=5;
SEED = 20081219;
PATMISS = y14 (0) y15 (0) y16 (0) y24 (0) y25 (0) y26 (0)|
y14 (1) y15 (1) y16 (1) y24 (0) y25 (0) y26 (0)|
y14 (0) y15 (0) y16 (0) y24 (1) y25 (1) y26 (1)|
y14 (1) y15 (1) y16 (1) y24 (1) y25 (1) y26 (1);
PATPROB = .48|.12|.12|.28;
POPULATION = sim4.estimates.dat;
COVERAGE = sim4.estimates.dat;
! REPSAVE = ALL;
! SAVE = sim4.rep*.dat;
RESULTS = Sim4.4.dat;
ANALYSIS: MODEL = NOCOVARIANCES;
MODEL POPULATION:
! Phenotypic factor structure
f_t1 BY y11@1
y12*0.05 (lm2)
y13*0.25 (lm3)
y14*0.70 (lm4)
y15*0.30 (lm5)
y16*0.60 (lm6);
f_t2 BY y21@1
y22*0.05 (lm2)
y23*0.25 (lm3)
y24*0.70 (lm4)
y25*0.30 (lm5)
y26*0.50 (lm6);
f1_t1 BY y11*0.5 (lm11)
y12*0.9 (lm12)
y13@1;
f1_t2 BY y21*0.5 (lm11)
y22*0.5 (lm12)
y23@1;
f2_t1 BY y14-y15*0.9 (lm24-lm25)
y16@1;
f2_t2 BY y24-y25*0.9 (lm24-lm25)
y26@1;
[f_t1-f2_t2@0];
f_t1-f2_t2@0 ; !factor variance fixed@0;
! Biometric decomposition of the general factor;
A_t1 BY f_t1*0.8 (ma);
A_t2 BY f_t2*0.8 (ma);
C_t1 BY f_t1*0.1 (mc);
C_t2 BY f_t2*0.1 (mc);
E_t1 BY f_t1*0.5 (me);
E_t2 BY f_t2*0.5 (me);
! Cholesky decomposition of the two rater factors;
A1_t1 BY f1_t1*0.5 (ma11);
A1_t1 BY f2_t1*0.3 (ma12);
A2_t1 BY f2_t1*0.2 (ma22);
A1_t2 BY f1_t2*0.5 (ma11);
A1_t2 BY f2_t2*0.3 (ma12);
A2_t2 BY f2_t2*0.2 (ma22);
C1_t1 BY f1_t1*0.4 (mc11);
C1_t1 BY f2_t1*0.1 (mc12);
C2_t1 BY f2_t1*0.5 (mc22);
C1_t2 BY f1_t2*0.4 (mc11);
C1_t2 BY f2_t2*0.1 (mc12);
C2_t2 BY f2_t2*0.5 (mc22);
E1_t1 BY f1_t1*0.5 (me11);
E1_t1 BY f2_t1*0.1 (me12);
E2_t1 BY f2_t1*0.4 (me22);
E1_t2 BY f1_t2*0.5 (me11);
E1_t2 BY f2_t2*0.1 (me12);
E2_t2 BY f2_t2*0.4 (me22);
! Shared factor correlations between twins
[A_t1-E2_t2@0];
A_t1-E2_t2@1;
A_t1 WITH A_t2@1;
A1_t1 WITH A1_t2@1;
A2_t1 WITH A2_t2@1;
C_t1 WITH C_t2@1;
C1_t1 WITH C1_t2@1;
C2_t1 WITH C2_t2@1;
! Biometric decomposition of phenotypic residuals;
y11-y16 (mv1-mv6);
y21-y26 (mv1-mv6);
y11 WITH y21 (mzmr1);
y12 WITH y22 (mzmr2);
y13 WITH y23 (mzmr3);
y14 WITH y24 (mzmr4);
y15 WITH y25 (mzmr5);
y16 WITH y26 (mzmr6);
[y11-y16*0] (mmu1-mmu6);
[y21-y26*0] (mmu1-mmu6);
MODEL POPULATION-g2:
MODEL POPULATION-g3:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
MODEL POPULATION-g4:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
MODEL POPULATION-g5:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
MODEL:
! Phenotypic factor structure
f_t1 BY y11@1
y12*0.05 (lm2)
y13*0.25 (lm3)
y14*0.70 (lm4)
y15*0.30 (lm5)
y16*0.60 (lm6);
f_t2 BY y21@1
y22*0.05 (lm2)
y23*0.25 (lm3)
y24*0.70 (lm4)
y25*0.30 (lm5)
y26*0.50 (lm6);
f1_t1 BY y11*0.5 (lm11)
y12*0.9 (lm12)
y13@1;
f1_t2 BY y21*0.5 (lm11)
y22*0.5 (lm12)
y23@1;
f2_t1 BY y14-y15*0.9 (lm24-lm25)
y16@1;
f2_t2 BY y24-y25*0.9 (lm24-lm25)
y26@1;
[f_t1-f2_t2@0];
f_t1-f2_t2@0 ; !factor variance fixed@0;
! Biometric decomposition of the general factor;
A_t1 BY f_t1*0.8 (ma);
A_t2 BY f_t2*0.8 (ma);
C_t1 BY f_t1*0.1 (mc);
C_t2 BY f_t2*0.1 (mc);
E_t1 BY f_t1*0.5 (me);
E_t2 BY f_t2*0.5 (me);
! Cholesky decomposition of the two rater factors;
A1_t1 BY f1_t1*0.5 (ma11);
A1_t1 BY f2_t1*0.3 (ma12);
A2_t1 BY f2_t1*0.2 (ma22);
A1_t2 BY f1_t2*0.5 (ma11);
A1_t2 BY f2_t2*0.3 (ma12);
A2_t2 BY f2_t2*0.2 (ma22);
C1_t1 BY f1_t1*0.4 (mc11);
C1_t1 BY f2_t1*0.1 (mc12);
C2_t1 BY f2_t1*0.5 (mc22);
C1_t2 BY f1_t2*0.4 (mc11);
C1_t2 BY f2_t2*0.1 (mc12);
C2_t2 BY f2_t2*0.5 (mc22);
E1_t1 BY f1_t1*0.5 (me11);
E1_t1 BY f2_t1*0.1 (me12);
E2_t1 BY f2_t1*0.4 (me22);
E1_t2 BY f1_t2*0.5 (me11);
E1_t2 BY f2_t2*0.1 (me12);
E2_t2 BY f2_t2*0.4 (me22);
! Shared factor correlations between twins
[A_t1-E2_t2@0];
A_t1-E2_t2@1;
A_t1 WITH A_t2@1;
A1_t1 WITH A1_t2@1;
A2_t1 WITH A2_t2@1;
C_t1 WITH C_t2@1;
C1_t1 WITH C1_t2@1;
C2_t1 WITH C2_t2@1;
! Biometric decomposition of phenotypic residuals;
y11-y16 (mv1-mv6);
y21-y26 (mv1-mv6);
y11 WITH y21 (mzmr1);
y12 WITH y22 (mzmr2);
y13 WITH y23 (mzmr3);
y14 WITH y24 (mzmr4);
y15 WITH y25 (mzmr5);
y16 WITH y26 (mzmr6);
[y11-y16*0] (mmu1-mmu6);
[y21-y26*0] (mmu1-mmu6);
MODEL g2:
MODEL g3:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
MODEL g4:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
MODEL g5:
A_t1 WITH A_t2@0.5;
A1_t1 WITH A1_t2@0.5;
A2_t1 WITH A2_t2@0.5;
y11 WITH y21 (dzmr1);
y12 WITH y22 (dzmr2);
y13 WITH y23 (dzmr3);
y14 WITH y24 (dzmr4);
y15 WITH y25 (dzmr5);
y16 WITH y26 (dzmr6);
OUTPUT: TECH9;
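The biometric part of the script can be read as follows: every A, C, and E factor has unit variance, the cross-twin correlation of the A factors is fixed at 1 in the base model and overridden to 0.5 in groups 3 to 5, the C factors correlate 1 across twins, and the E factors are left uncorrelated. The Python sketch below, which is not part of the original script, works out what the population values for the general factor (0.8, 0.1, and 0.5 for the paths labeled ma, mc, and me) imply under those constraints. The function and variable names are hypothetical.

```python
def ace_twin_moments(a, c, e, r_a):
    """Variance and cross-twin covariance implied by ACE path coefficients.

    r_a is the cross-twin correlation of the additive genetic factor:
    1.0 where the script fixes A_t1 WITH A_t2@1, 0.5 where it fixes @0.5.
    """
    variance = a ** 2 + c ** 2 + e ** 2
    # C correlates 1.0 across twins; E contributes nothing to the covariance.
    covariance = r_a * a ** 2 + c ** 2
    return variance, covariance

# Population values for the general factor in the script (ma, mc, me).
a, c, e = 0.8, 0.1, 0.5

var_f, cov_mz = ace_twin_moments(a, c, e, r_a=1.0)
_, cov_dz = ace_twin_moments(a, c, e, r_a=0.5)

# Proportion of general-factor variance due to additive genetic effects.
heritability = a ** 2 / var_f
```

Under these population values the general factor has a variance of about 0.90, of which roughly 71% is attributable to the additive genetic component.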
Abstract
The aim of this dissertation study was to investigate the best models for estimating genetic and environmental influences on child externalizing behavior in multivariate multi-rater twin research. Empirical data were analyzed for a sample of 605 twin pairs (age 9-10), drawn from a twin study of risk factors for antisocial behavior at the University of Southern California (USC). The twins were rated by both caregivers and teachers on several aspects of externalizing behavior using three widely used instruments: symptoms of Attention Deficit Hyperactivity Disorder (ADHD) using the ADHD module in the Diagnostic Interview Schedule for Children (DISC-IV), aggressive behaviors using the Reactive and Proactive Aggression Questionnaire (RPQ), and Child Behavior Checklist Externalizing (CBCL) behavior problems. Six competing multi-trait multi-rater genetic models were fitted, and the best-fitting model was found to be the "general factor and correlated raters model", which includes one common ADHD factor shared by all measurements and two oblique rater factors, represented mainly by each rater's view of aggression and delinquency. In terms of rater effects, mother reports were found to be more reliable, while teachers were less able to distinguish different forms of externalizing behavior. This study also employed a Monte Carlo simulation to evaluate the power and parameter estimates of the "one common and two correlated rater factors model". Reasonable power and sufficient precision of parameter estimates were obtained at this sample size. All analyses for this dissertation were conducted in the Mplus software.
Asset Metadata
Creator: Zheng, Mo (author)
Core Title: Multivariate biometric modeling among multiple traits across different raters in child externalizing behavior studies
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 04/10/2009
Defense Date: 01/07/2009
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: behavioral genetics, child psychopathology, externalization, methodology, Models, OAI-PMH Harvest, Twins
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Baker, Laura A. (committee chair), McArdle, John J. (committee member), Prescott, Carol A. (committee member), Silverstein, Merril (committee member)
Creator Email: mozheng2004@yahoo.com, mzheng@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m2077
Unique identifier: UC1174492
Identifier: etd-Zheng-2657 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-207784 (legacy record id), usctheses-m2077 (legacy record id)
Legacy Identifier: etd-Zheng-2657.pdf
Dmrecord: 207784
Document Type: Dissertation
Rights: Zheng, Mo
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu