Regularized Structural Equation Modeling
Ross Jacobucci
A Thesis Submitted in Partial Fulfillment of the
Requirements for the Degree of
Master of Arts
in Psychology
at
The University of Southern California
December 2015
Contents
Abstract
Introduction
Background
    RAM Expectations
    Optimization Using RAM
    Regression
    Graphical Models
    Latent Variable Graphical Models
    Regularized EFA and PCA
Proposed Method
    Aims
Method
    Data
    Models
    Regularized Regression
    CFA with noise variables
    Computational Implementation
Results
    Regression
    CFA with Noise Added
Discussion
References
Abstract
A new method is proposed, termed Regularized Structural Equation Modeling (RegSEM).
RegSEM adds a penalization term to the traditionally used maximum likelihood fit
function for SEM, with the goal of creating simpler, easier-to-understand SEMs.
Although regularization has gained wide acceptance in regression models, very little has
transferred to latent variable models. The method was implemented as an R (R Core Team, 2014)
package, termed regsem, and in the R package OpenMx (Neale et al., 2015) for general
SEM, and its use was demonstrated with two different models. First, results from RegSEM
were compared to those from the glmnet package (Friedman, Hastie, Hofling, & Tibshirani,
2007), a popular package that implements both lasso and ridge regression. Additionally,
RegSEM was tested as a tool for subset selection with a confirmatory factor analysis (CFA)
model. Results were promising in both demonstrations: performance matched that of
established statistical programs for regularization, and a sparse CFA solution was
produced.
Keywords: regularization, structural equation modeling, lasso, ridge, penalization,
shrinkage, factor analysis
Introduction
The desire for model simplicity in factor analysis (FA) has led to the long-standing
goal of achieving simple-structure. By simple structure, I am broadly referring to a pattern
of factor loadings where each indicator is influenced by a minimal number of latent factors,
preferably one. Multiple researchers have set out different criteria for achieving a form of
simple structure, with the most notable being Thurstone (1935). To determine whether a
matrix of loadings was in fact simple structure, Thurstone (1935) originally set three
rules, which were later elaborated (Thurstone, 1947).
Each researcher faces a trade-off between achieving a parsimonious
structure and achieving a good-fitting model. One method for achieving a
simpler structure, which implicitly induces penalties for over-parameterized models and
thus potentially performs a sort of model reduction, is the use of cross-validation (CV;
Browne, 2000). CV for SEM can be conducted using a single sample (Browne & Cudeck,
1989), splitting a sample into two (Cudeck & Browne, 1983), or using a "leave-one-out"
method for more than two subsamples (De Gooijer & Koopman, 1988). When CV is
conducted with two or more samples, the model parameters are first estimated on a
calibration sample, and these fixed estimates are then tested on a test sample, deriving a more
realistic estimate of how the model would fit in the population.
Methods for estimating the fit of models on multiple samples can be compared to
single-sample fit indices that induce some form of penalty to prevent overly complex
models. Three of the most popular single-sample information criteria are the Akaike
Information Criterion (AIC; Akaike, 1973), the Schwarz Bayesian Information Criterion (BIC;
Schwarz, 1978), and the Consistent AIC (CAIC; Bozdogan, 1987). All three indices include
different forms of penalty that account for the number of free parameters in the model. In
a study where these criteria were compared across the number of samples used, factor loading
magnitude, and variable non-normality, CV-specific fit indices did not consistently
outperform information criteria (Whittaker & Stapleton, 2006).
In conjunction with the development of fit indices that impose penalties to create
simpler models, there has been a growing interest in developing new methods that are more
efficient in achieving simple-structure through the use of alternative cost functions. The
drawback of these alternative cost functions is that the simpler models come at the cost of
induced bias, resulting in poorer model fit and a reduction in explained variance. Despite
this bias in within-sample fit, a simpler structure can ultimately be achieved, which has led
to application in a wide range of methods. In the context of latent variable models, sparse estimation
techniques were first applied to principal components analysis through the use of
penalization to create sparse loadings (Zou, Hastie, & Tibshirani, 2006) along with
alternative methods of rotation (Trendafilov & Adachi, 2014). Similar methods have also
been applied to factor analysis through the use of convex and non-convex penalized
maximum likelihood (Hirose & Yamamoto, 2014a; Choi, Zou, & Oehlert, 2010; Ning &
Georgiou, 2011).
The current drawback of most new methods for sparse estimation is that they lack the
flexibility to allow researchers to constrain some parameters in the model. This is most
explicitly captured by confirmatory factor analysis, where select factor loadings are
constrained to zero. In an attempt to extend the use of regularization beyond this
limitation, a new method is proposed, called Regularized SEM (RegSEM). RegSEM adds
penalties to specific SEM parameters set by the researcher, allowing for greater flexibility
in model selection. Traditionally, model selection is a categorical choice; selection is
between a small number of models that were either hypothesized, or created as a result of
poor fit. By adding a sequence of small penalties to specific parameters, it turns model
selection into a continuous choice, where the resulting models can be seen as hybrids. For
instance, a practitioner could start with a completely unconstrained factor model, selecting
three latent factors and allowing every item to load on every factor. Penalties could then be
gradually increased on each factor loading parameter until every loading equals zero. An
external criterion, such as performance on a holdout dataset (CV) or an information
criterion that takes complexity into account (e.g., AIC, BIC), could then be used to choose
one of the models. By building penalties directly into the estimation, the researcher gains
greater flexibility in testing various models while building in safeguards against overfitting.
Model selection of this nature is very common in the data mining literature; in
psychometrics, however, it is rarely if ever used.
To allow for model estimation and selection of this nature, new forms of estimation are
needed, along with new forms of software. To explain this framework and how it was
implemented, it is first necessary to cover background material, detailing how
regularization has been implemented across various fields and ending with its use in the
newly proposed SEM method.
Background
RAM Expectations. To provide the background on general Structural Equation
Modeling (SEM), I will use reticular action model (RAM; McArdle & McDonald, 1984;
McArdle, 2005) notation as it provides a 1:1 correspondence between the graphical
representation and matrix specification (Boker, Neale, & Rausch, 2004), as well as only
requiring three matrices. RAM specification forms the foundation for the SEM program
OpenMx (Neale et al., 2015), which will be used extensively in later sections. This section
will only provide a brief treatment of the underlying matrix specification and algebra; for
further detail, see Browne and Arminger (1995) and McArdle (2005).
One of the main benefits of using RAM notation is that only one matrix is needed to
capture the direct effects. This will prove extremely useful later when regularization
is introduced. To accomplish this with any of the various other matrix formulations would
require penalizing more than one matrix, leading to a more complex formulation.
In RAM, the three matrices are defined as the filter (F), the asymmetric (A), and the
symmetric (S; note that in McArdle (2005), S was instead represented by Ω). To provide a
demonstration of the specification of each of the three matrices, a one-factor model is
depicted in Figure 1.
Figure 1. Example 1 Factor CFA Model
The F matrix is a p × (p + l) matrix, where p is the number of manifest variables and l
is the number of latent variables. This matrix contains ones in the positions corresponding
to each manifest variable and zeroes everywhere else. In this case, with four observed
variables (X1,X2,X3,X4) and one latent variable (F), the F matrix has four rows and five
columns.
$$F = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \end{bmatrix}$$
Unidirectional relationships, which can be thought of as the directed paths or
regressions between variables, are contained in the A matrix. In Figure 1, the only directed
paths are the four factor loadings, each emanating from the factor F to one of the
observed variables. This results in an A matrix of
$$\begin{bmatrix} 0 & 0 & 0 & 0 & \lambda_1 \\ 0 & 0 & 0 & 0 & \lambda_2 \\ 0 & 0 & 0 & 0 & \lambda_3 \\ 0 & 0 & 0 & 0 & \lambda_4 \\ 0 & 0 & 0 & 0 & 0 \end{bmatrix}$$
In A, notice that there are no relationships between the observed variables (columns 1
through 4); the only entries are in column five, for the factor F. If we wished to incorporate
the mean structure (necessary for longitudinal models), we could add an additional column
to A, thought of as a unit constant that separates the means from covariances for some
models. This is usually depicted as a "1" inside of a closed triangle when graphed and can
be interpreted as the mean intercept when a directed path exists from the unit constant to
a different variable (observed or latent). Since none of the models tested in the remainder
of this paper model the means, details regarding their inclusion in the matrices and the
resultant expectations are omitted (see McArdle, 2005; Grimm & McArdle, 2005).
The third matrix S is a square matrix that contains any two-headed arrows, which can
be either variances or covariances. In figure 1, there is a variance for each of the manifest
variables and one variance for F. This results in
$$\begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
In this case, all of the variances have been constrained to 1. If, for instance, we wanted
to add a covariance between X1 and X2, we could add a free parameter $\sigma_{X_1,X_2}$
to positions [2,1] and [1,2] in S. A more thorough explanation of the matrices in RAM and subsequent
expectations can be found in Grimm and McArdle (2005).
After having specified these three matrices, we can now calculate the expected
covariance matrix (McArdle & McDonald, 1984),
$$\Sigma = F (I - A)^{-1} S \left[(I - A)^{-1}\right]' F' \qquad (1)$$
where $^{-1}$ denotes the inverse of a matrix and $'$ is the transpose operator. Once the
expected covariance matrix is calculated, it can be inserted into a loss function such
as maximum likelihood (Lawley, 1940; Jöreskog, 1969):
$$F_{ML} = \log(|\Sigma|) + \operatorname{tr}\!\left(C\,\Sigma^{-1}\right) - \log(|C|) - p \qquad (2)$$
where C is the sample covariance matrix, p is the number of observed variables, and $|\cdot|$
is the determinant.
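To make equations (1) and (2) concrete, a small R sketch for the one-factor model in Figure 1 follows. The loading values and object names are illustrative assumptions; this is not code from the regsem package.

```r
# Illustrative RAM matrices for the one-factor model in Figure 1
lambda <- c(0.8, 0.7, 0.6, 0.5)        # hypothetical factor loadings

Amat <- matrix(0, 5, 5)                # asymmetric matrix: directed paths
Amat[1:4, 5] <- lambda                 # X1-X4 each load on the factor (column 5)
Smat <- diag(5)                        # symmetric matrix: all variances fixed at 1
Fmat <- cbind(diag(4), 0)              # filter matrix: keep the 4 manifest variables

I <- diag(5)
B <- solve(I - Amat)
Sigma <- Fmat %*% B %*% Smat %*% t(B) %*% t(Fmat)   # expected covariance, equation (1)

# ML fit function of equation (2), given a sample covariance matrix C
fit_ml <- function(Sigma, C) {
  p <- nrow(C)
  log(det(Sigma)) + sum(diag(C %*% solve(Sigma))) - log(det(C)) - p
}
fit_ml(Sigma, Sigma)   # equals 0 when model-implied and sample covariances agree
```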
Optimization Using RAM. Although the first and second order partial
derivatives can be specified for many different types of SEMs, a general form for
approximating them will be used here in order to fit a broad array of different models
without having to specify separate estimation equations.
Based on the work of Browne and du Toit (1992) and others, and following the notation
of Cudeck, Klebe, and Henly (1993), under multivariate normality the gradient and
Hessian are approximated by:
$$\frac{\partial F_{ML}}{\partial \gamma_i} = g_i = \frac{1}{2}\operatorname{tr}\!\left(A^{-1}(\Sigma - C)\,A^{-1}\dot{\Sigma}_i\right) \qquad (3)$$
$$\frac{\partial^2 F_{ML}}{\partial \gamma_i\, \partial \gamma_j} = h_{ij} = \frac{1}{2}\operatorname{tr}\!\left(A^{-1}\dot{\Sigma}_i\, A^{-1}\dot{\Sigma}_j\right) \qquad (4)$$
where $\gamma$ is the vector of free parameters, $\Sigma$ is the expected covariance matrix, $C$ is the
sample covariance matrix, and $A$ is $\Sigma(\gamma^{(t)})$ for ML estimation.
$\dot{\Sigma}_i$ contains the partial derivatives of the covariance matrix with respect to the i-th
parameter evaluated at iteration t, which is approximated by
$$\dot{\Sigma}_i = \frac{\partial \Sigma}{\partial \gamma_i}\!\left(\gamma^{(t)}\right) \approx \frac{\Sigma\!\left(\gamma^{(t)} + e_i h\right) - \Sigma\!\left(\gamma^{(t)}\right)}{h}, \quad i = 1,\ldots,q, \qquad (5)$$
where h is a small positive constant (e.g., $h = 10^{-5}$) and $e_i$ is a vector of length q with an
entry of 1 at the i-th position. Practically, this can be implemented in a q-iteration loop.
For each iteration i, a 1 is placed in the i-th position of the e vector, which, through multiplication
by h, adds the small positive constant to the i-th parameter in γ. This then approximates
the changes in the expected covariance matrix with respect to each free parameter.
Referring back to the equations for the gradient and Hessian, approximating $\dot{\Sigma}_i$
allows each value of $g_i$ and $h_{ij}$ to be computed by looping through each
successive parameter. The method for approximating $\dot{\Sigma}_i$ is referred to as the finite
difference method (e.g., Burden & Faires, 2001). As the method of this paper uses the
RAM matrix formulation, it is worth noting that there exist faster algorithms for exactly
computing the gradient and Hessian that capitalize on the sparse nature of the RAM
matrices (von Oertzen & Brick, 2014).
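The finite difference scheme can be sketched directly in R. The function below is a rough illustration of equations (3) and (5), not the regsem internals; `sigma_fn` is an assumed user-supplied function mapping the free-parameter vector to the model-implied covariance matrix.

```r
# Approximate the gradient of F_ML via finite differences (equations 3 and 5)
fd_gradient <- function(gamma, sigma_fn, C, h = 1e-5) {
  Sigma <- sigma_fn(gamma)
  Sinv  <- solve(Sigma)
  q     <- length(gamma)
  g     <- numeric(q)
  for (i in seq_len(q)) {
    e_i       <- numeric(q); e_i[i] <- 1                  # unit vector for parameter i
    Sigma_dot <- (sigma_fn(gamma + e_i * h) - Sigma) / h  # equation (5)
    # equation (3), with A taken as Sigma(gamma) for ML estimation
    g[i] <- 0.5 * sum(diag(Sinv %*% (Sigma - C) %*% Sinv %*% Sigma_dot))
  }
  g
}
```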
Regression. In the context of regression, three well-known problems are that adding
predictors can only increase in-sample predictive power (even if the variables constitute
random noise), multicollinearity among predictors, and problems with estimation when the
number of predictors is larger than the number of cases (p > n). One method to deal with
these problems is to introduce what can be termed shrinkage, regularization, or
penalization (I will use the terms interchangeably). The two most common procedures for
this are ridge (Hoerl & Kennard, 1970) and lasso (Tibshirani, 1996) regression, with
various alternative forms that can be seen as subsets of these two procedures. Both
methods minimize a penalized residual sum of squares, with ridge
$$\hat{\beta}^{\,ridge} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\right)^{2} + \lambda \sum_{j=1}^{p} \beta_j^{2} \right\} \qquad (6)$$
where λ ≥ 0 is the penalty that controls the amount of shrinkage. When λ is equal to zero,
this reduces to standard linear regression and has a closed-form solution. As λ is
increased, the β parameters will be shrunk towards zero. Note that in ridge regression the
β parameters are not necessarily shrunk all the way to zero; they can hover around zero
without being fixed there. In lasso regression, however, the use of the l1 norm in place of the
l2 norm of ridge also shrinks the β parameters, but the parameters are driven exactly to zero,
thus performing a form of subset selection. The difference between ridge and lasso is the
inclusion of the l1 norm
$$\hat{\beta}^{\,lasso} = \underset{\beta}{\operatorname{argmin}} \left\{ \sum_{i=1}^{N}\left(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\right)^{2} + \lambda \sum_{j=1}^{p} |\beta_j| \right\} \qquad (7)$$
Unlike ridge regression, lasso regression does not have a closed-form solution; it instead
becomes a quadratic programming problem or is optimized with a derivative-based method.
The gradient of lasso regression (Pourahmadi, 2013; Friedman, Hastie, & Tibshirani, 2008) is
$$\frac{\partial f(\beta)}{\partial \beta} = -y + \beta + \lambda \cdot \operatorname{sign}(\beta) \qquad (8)$$
making it possible to use a gradient-based optimization method similar to that of
general SEM.
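One simple gradient-style scheme that handles the non-smooth |β| term is proximal gradient descent (ISTA), which pairs the least-squares gradient with a soft-thresholding step. The sketch below is my own illustration of that idea on simulated data; it is not the algorithm used by glmnet or regsem.

```r
# Soft-thresholding operator: the proximal map of the l1 penalty
soft_threshold <- function(z, gamma) sign(z) * pmax(abs(z) - gamma, 0)

# Minimal ISTA loop for 0.5 * ||y - X b||^2 + lambda * ||b||_1
ista_lasso <- function(X, y, lambda, n_iter = 500) {
  beta <- rep(0, ncol(X))
  step <- 1 / max(eigen(crossprod(X), only.values = TRUE)$values)  # 1 / Lipschitz constant
  for (k in seq_len(n_iter)) {
    grad <- -crossprod(X, y - X %*% beta)            # gradient of the smooth part
    beta <- soft_threshold(beta - step * grad, step * lambda)
  }
  drop(beta)
}

set.seed(1)
X <- scale(matrix(rnorm(100 * 5), 100, 5))
y <- 2 * X[, 1] + rnorm(100)
ista_lasso(X, y, lambda = 10)   # only the first coefficient should remain clearly nonzero
```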
A number of generalizations of these two methods exist within and outside of the
context of regression. Generalizations outside of regression will be touched on in the
following sections. Two generalizations within regression that are worth noting are the
elastic net (Zou & Hastie, 2005) and the Bayesian forms of the lasso and ridge (Park &
Casella, 2008; Tibshirani, 1996). The elastic net penalty can be seen as a combination of
both the lasso and ridge
$$\lambda \sum_{j=1}^{p} \left( \alpha \beta_j^{2} + (1-\alpha)\,|\beta_j| \right) \qquad (9)$$
In this, α controls the amount of compromise between the penalties. When α equals one,
the equation reduces to the ridge penalty; when α equals zero, it reduces to the lasso penalty.
In the Bayesian context, lasso regression can be conducted by placing Laplace
(double exponential) priors on the β parameters. Decreasing the prior variance
is equivalent to increasing λ in equation 7. In the case of ridge regression, a normal prior
would be used to constrain estimates. The lasso estimate is the mode of the posterior,
whereas the ridge estimate is the mean (Park & Casella, 2008; Tibshirani, 1996).
Although these methods for regularization seem intuitively appealing, the question of
how best to choose λ remains. Two different methods seem to predominate: using a fit
index that takes complexity into account, such as the AIC or BIC, or using cross-validation.
The simplest form of CV, and the type that has been used widely in SEM applications
(Browne, 2000), is splitting the sample into two, a training and a test dataset. A large
number of lambda values are pre-specified (e.g., 40), and each of these models is run
on the training dataset. Instead of examining the fit of the model at this point, the
parameter estimates are treated as fixed, and model fit is then re-estimated on the
test dataset. This almost invariably results in a worse fit, but ultimately gives a more
accurate estimate of the generalization of the model (reducing variance). After all of these
models have been evaluated on the test dataset, it is common to pick the model with the
best fit on the test dataset, or to choose the most penalized (most sparse) model within
one standard error of the best fit (Hastie, Tibshirani, & Friedman, 2009).
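For regression, this training/test logic is conveniently packaged in cv.glmnet from the glmnet package used later in this paper. The sketch below, on arbitrary simulated data, shows the two common choices just described: the λ with the best cross-validated fit and the most penalized λ within one standard error of it.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(200)

cv_fit <- cv.glmnet(x, y, alpha = 1, nfolds = 10)   # alpha = 1 is the lasso in glmnet
cv_fit$lambda.min                # lambda with the best cross-validated fit
cv_fit$lambda.1se                # most penalized model within one standard error
coef(cv_fit, s = "lambda.1se")   # sparse coefficient vector at that lambda
```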
Graphical Models. Developed along a separate track from SEM, graphical
models, or Bayesian networks (in the directed-path case), have obvious similarities through
their graphical nature. Their development originated in computer science (Pearl, 1988),
and they have been more focused on explicating the causal relationships between variables
without having to resort to longitudinal designs (Pearl, 2000; Bollen & Pearl, 2013). The
similarities between the methodologies have spawned hybrid models (Duarte, Klimentidis,
Harris, Cardel, & Fernandez, 2011) that capitalize on each framework's strengths, as well
as models that build upon their equivalence with Gaussian variables (see Shimizu, Hoyer,
Hyvärinen, & Kerminen, 2006; Buhlmann, Peters, & Ernest, 2014).
Outside of brain scan research, graphical models have not seen wide adoption with
typical psychological variables such as those from questionnaires or cognitive scales.
However, very recently, some research has been conducted using graphical models for
personality research (Costantini et al., 2015), in the study of comorbidity in
psychopathology (Borsboom, Cramer, Schmittmann, Epskamp, & Waldorp, 2011), and
finding the structure in the Beck Depression scale (Bringmann, Lemmens, Huibers,
Borsboom, & Tuerlinckx, 2014).
The differences between graphical models and SEM arise in the presence of latent
variables. Graphical models capitalize on the conditional distributions between variables in
the model, reducing what could be a large network to the estimation of cliques: smaller
networks that are conditionally independent from the rest of the network. These
conditional distributions also hold in path analysis (SEM without latent variables), but
when latent variables are present in the model, this is no longer the case. Estimation with
latent variables becomes more complicated, which may be the reason for less focus on
using latent variables for dimension reduction in graphical models (latent variables can
also be termed hidden variables and used to represent unmeasured confounders, especially
in assigning causality; see Pearl, 2000), although see Glymour, Scheines, and Spirtes (1987)
and subsequent books and articles from the same group.
In graphical models, when the models are not a priori specified, it is common to use
some form of search technique to find the optimal model given the dataset. Although
greedy, stochastic search algorithms were common at the outset (e.g. Chickering, 2003),
recently regularization has seen a wide variety of applications with graphical models
(Meinshausen & Buhlmann, 2006; Friedman et al., 2008; Yuan & Lin, 2007). Although
these proposed methods are similar in many senses, the method that has potentially seen
the most application is that of the graphical lasso (Friedman et al., 2008; Friedman, Hastie,
& Tibshirani, 2010; Mazumder & Hastie, 2012). In contrast to SEM, where the sample
covariance matrix C is used for estimation, graphical models generally work with the precision
matrix Θ, the inverse of the sample covariance matrix ($C^{-1} = \Theta$). Under the
assumption that the observations and conditional distributions follow a Gaussian
distribution, the precision matrix contains the partial covariances between the variables.
Specifically, if the ij-th component of Θ is zero, then variables i and j are conditionally
independent, meaning the path (either undirected or directed) between those variables in the
model is also zero. This is especially important for model selection, as the model can be
estimated using only the precision matrix, which, without a penalty, encodes an
unconstrained graphical model (paths between all variables). A penalty on the Θ matrix
can therefore be introduced, which leads directly to a form of constraint on the paths within
the model. In the graphical lasso, this penalty takes the form of
$$f(\Theta) = \log(|\Theta|) - \operatorname{tr}(C\Theta) - \lambda \|\Theta\|_1, \qquad (10)$$
where λ is the shrinkage parameter, similar to the regression context, and $\|\Theta\|_1$ is the
$L_1$ norm (Friedman et al., 2007). It is worth noting that if Θ is instead represented as $\Sigma^{-1}$ and
the regularization parameter is removed, the equation becomes
$$f(\Sigma^{-1}) = \log(|\Sigma^{-1}|) - \operatorname{tr}(C\Sigma^{-1}). \qquad (11)$$
It is easy to see the similarity to the maximum likelihood fit function for SEM,
$$F_{ML} = \log(|\Sigma|) + \operatorname{tr}\!\left(C\,\Sigma^{-1}\right) - \log(|C|) - p, \qquad (12)$$
which Jöreskog (1969) showed is equivalent to
$$\ln L = -\tfrac{1}{2}\, m \ln|\Sigma_0| - \tfrac{1}{2}\, m \operatorname{tr}\!\left(\Sigma_0^{-1} C\right) \qquad (13)$$
from Mulaik (2009). This works because $|\Sigma^{-1}| = 1/|\Sigma|$ and the order within $\operatorname{tr}(C\Sigma^{-1})$ does not
matter due to the cyclic property of the trace (see Dahl, Roychowdhury, and Vandenberghe
(2005) for an explicit ML presentation).
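For reference, the graphical lasso of equation (10) is implemented in the glasso package by Friedman and colleagues; a brief sketch on simulated data follows, with the penalty value chosen arbitrarily.

```r
library(glasso)

set.seed(1)
X <- matrix(rnorm(100 * 6), 100, 6)
S <- cov(X)                   # sample covariance matrix C
gl <- glasso(S, rho = 0.1)    # rho plays the role of lambda in equation (10)
round(gl$wi, 2)               # estimated sparse precision matrix Theta
```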
Latent Variable Graphical Models. The application of regularization in
graphical models, in the presence of only measured variables, has sparked an increased
amount of research in the use of regularization in graphical models with latent variables
(Chandrasekaran, Parrilo, & Willsky, 2012 and discussion series). In this, the sample
dataset can be thought of as composed of observed variables, $X_{p \times 1}$, and latent (hidden)
variables, $Y_{r \times 1}$, where $p \gg r$. An additional assumption is that the variables X are
continuous, so that the joint distribution of X and Y follows a multivariate Gaussian. Given
this, the joint covariance of X and Y can be decomposed into four blocks,
$$\Sigma_{X,Y} = \begin{bmatrix} \Sigma_X & \Sigma_{XY} \\ \Sigma_{YX} & \Sigma_Y \end{bmatrix},$$
and similarly for the inverse covariance,
$$\Theta_{X,Y} = \begin{bmatrix} \Theta_X & \Theta_{XY} \\ \Theta_{YX} & \Theta_Y \end{bmatrix}.$$
The goal is to create a sparse marginal precision matrix of the observed variables, or
$\Sigma^{-1} = \Theta_X - \Theta_{XY}\Theta_Y^{-1}\Theta_{YX}$. Generally, $\Theta_X$ is not sparse without the
introduction of the low-rank matrix $\Theta_{XY}\Theta_Y^{-1}\Theta_{YX}$. These two matrices will be called S and L,
respectively,
$$\Sigma^{-1} = \underbrace{\Theta_X}_{S} - \underbrace{\Theta_{XY}\Theta_Y^{-1}\Theta_{YX}}_{L}, \qquad (14)$$
which completely account for the covariance between the observed variables.
The goal in the graphical modelling context is identical to that in SEM in that, by
introducing latent variables, there will be small to no residual covariance between the observed
variables conditional upon the latent variable(s). The problem of estimating S and L
becomes a convex optimization problem, and it was successfully solved by Chandrasekaran et
al. (2012), with further speedups provided by Ma, Xue, and Zou (2013). To further induce
sparseness in both the S and L (lower rank) matrices, separate shrinkage parameters
were introduced to penalize both S and L,
$$\begin{aligned} \text{minimize} \quad & \ell(S - L, \Sigma_X) - \log|S - L| + \alpha \|S\|_1 + \beta \operatorname{Tr}(L) \\ \text{subject to} \quad & S - L \succ 0, \; L \succeq 0, \end{aligned} \qquad (15)$$
in Ma et al. (2013).
Regularized EFA and PCA
First proposed in the context of principal components analysis (PCA; Jolliffe,
Trendafilov, & Uddin, 2003; Zou et al., 2006), sparse estimation procedures for dimension
reduction techniques have since flourished. A number of different methods have
been proposed for both PCA and exploratory factor analysis (EFA; Ning & Georgiou,
2011; Choi et al., 2010; Hirose & Yamamoto, 2014b; Jung & Takane, 2008). One of these,
Jung and Takane (2008), proposed penalizing the diagonal of the observed covariance or
correlation matrix in order to overcome the propensity of EFA to yield improper solutions
(negative unique variances). This is in contrast to other sparse EFA methods that explicitly
use different forms of regularization as a method of achieving a "sparse" structure (the term
"sparse" can be attributed to Zou et al., 2006), similar to the original goal of Thurstone in
achieving simple structure. A sparse structure denotes a factor loading matrix with a large
number of zeroes in it, leading to simpler interpretations of each latent factor. This is in
contrast to what is done in typical EFA and PCA, where an orthogonal factor (component)
structure is estimated with a pre-specified number of factors (components) to be extracted.
This solution then undergoes a rotation, either oblique or orthogonal, to achieve some form
of simple structure. Typically, this rotation does not lead to loadings of exactly zero.
Instead, loadings below some pre-specified threshold are either omitted from display or
are replaced by truncated values. Needless to say, this is a non-optimal solution to
obtaining a sparse structure, one that was summarized by Cadima and Jolliffe (1995):
"Thresholding requires a degree of arbitrariness and affects the optimality and
uncorrelatedness of the solutions with unforeseeable consequences."
An additional goal of regularized EFA methods, beyond achieving sparse structure, is
correctly identifying the number of factors underlying the dataset. By varying the amount
of penalization, structures can be compared on fit, either using a holdout dataset in some
form of CV or using one of the many information fit indices that take model complexity
into account in an attempt to produce more generalizable models. A newly proposed fit
index, the extended BIC (EBIC; Chen & Chen, 2008), has achieved wide adoption in
evaluating graphical models and was found to achieve the best model selection consistency
in an application to FA (Hirose & Yamamoto, 2014b).
Both Ning and Georgiou (2011) and Choi et al. (2010) can be seen as versions of the
methods proposed in Hirose and Yamamoto (2014b), which tests the lasso penalty as well
as two nonconvex penalties, the smoothly clipped absolute deviation (SCAD; Fan & Li, 2001)
and the minimax concave penalty (MC+; Zhang, 2010). The general form can be seen as
$$\ell_{\rho}(\lambda, \psi) = \ell(\lambda, \psi) - N \sum_{i=1}^{p} \sum_{j=1}^{m} \rho\, P(|\lambda_{ij}|) \qquad (16)$$
where $\ell(\lambda, \psi)$ is the maximum likelihood fit function with respect to the factor loading
matrix (λ) and uniquenesses (ψ), ρ > 0 is a regularization parameter, and $P(\cdot)$ is the
penalty function. Both nonconvex penalty methods require alternative optimization
algorithms; however, both have been shown to produce sparser solutions than the lasso in
some circumstances (Friedman, 2012; Mazumder, Friedman, & Hastie, 2011; Zou et al., 2006).
Neither nonconvex penalty will be detailed further, as they are not tested in this paper.
Proposed Method
Regularized Structural Equation Modeling. As an attempt to generalize forms
of regularization from the regression framework to that of structural equation models, two
new cost functions are introduced. The first, a generalization of ridge regression:
$$F_{ridge} = \log(|\Sigma|) + \operatorname{tr}\!\left(C\,\Sigma^{-1}\right) - \log(|C|) + \lambda \sum \left(A^2\right) \qquad (17)$$
and the second, a generalization of lasso regression:
$$F_{lasso} = \log(|\Sigma|) + \operatorname{tr}\!\left(C\,\Sigma^{-1}\right) - \log(|C|) + \lambda \sum |A| \qquad (18)$$
where C is the sample covariance matrix and the sums are taken over the penalized elements
of the A matrix. In the gradient, $\lambda \cdot \operatorname{sign}(A)$ is added to the elements
corresponding to the A matrix for the lasso function, and $2\lambda \cdot A$ for the ridge penalties.
It is worth noting that it is a simple step to extend both of these to a form similar to
elastic net regression (Zou & Hastie, 2005). Additionally, instead of specifying every free
parameter of the A matrix as subject to shrinkage, subsets (or individual elements) can be
chosen at the practitioner's discretion. For example, if one wished to do subset selection for
a CFA model, then every free element of the A matrix could be penalized. On the other
hand, if some of the items or scales were known to be strong indicators a priori, then these
could be used as a form of "anchor" and could be spared penalization.
Both new cost functions reduce to the maximum likelihood fit function when the lambda
parameter equals zero. The new method that encompasses both of the new cost functions is
is termed Regularized SEM (RegSEM). RegSEM was implemented as a package in R,
termed regsem (not published on CRAN).
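A schematic version of the lasso cost function in equation (18), written as a plain R function over the RAM matrices so that it can be handed to a general optimizer such as Rsolnp::solnp, is sketched below. This is a simplified illustration under my own naming (`build_ram`, `start_values`, `my_ram_builder`, and `sample_cov` are assumed placeholders), not the actual code inside the regsem package.

```r
library(Rsolnp)

# Build a penalized fit function as a closure over the data and model structure
make_regsem_lasso <- function(C, build_ram, lambda) {
  function(par) {
    ram   <- build_ram(par)                     # assumed helper: par -> list(A, S, F)
    Imat  <- diag(nrow(ram$A))
    B     <- solve(Imat - ram$A)
    Sigma <- ram$F %*% B %*% ram$S %*% t(B) %*% t(ram$F)   # RAM expected covariance
    ml    <- log(det(Sigma)) + sum(diag(C %*% solve(Sigma))) -
             log(det(C)) - nrow(C)              # maximum likelihood fit, equation (2)
    # lasso penalty; here applied to all of A, whereas RegSEM would penalize
    # only the researcher-selected elements
    ml + lambda * sum(abs(ram$A))
  }
}

# Hypothetical usage (all three arguments are placeholders):
# fit <- solnp(pars = start_values,
#              fun  = make_regsem_lasso(sample_cov, my_ram_builder, lambda = 0.1))
```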
In addition to regsem, an identical method that uses a different form of optimization
was programmed with the OpenMx (Neale et al., 2015) package in R. Instead of explicitly
expressing equations for the gradient and Hessian, which are only approximated in the R
package, the CSOLNP optimizer in OpenMx 2.0 uses a nonlinear augmented Lagrange
multiplier method based on the Rsolnp package (Ghalanos & Theussl, 2010) in R.
Although both the gradient and Hessian can be specified, they are not necessary, and
preliminary results indicate that there is no loss in precision compared to the implementation
in the R package. As a result of the effectiveness of OpenMx, the Rsolnp package was
also implemented in regsem as a method of comparison. The use of OpenMx is made
possible by the modularization implemented in OpenMx 2.0 (see Neale et al., 2015), where
it is now possible to specify user-defined fit functions. Therefore, through slight
modifications to how the matrices are specified, mainly the introduction of a 1x1 lambda
matrix containing the shrinkage parameter, the proposed cost function can be specified in
OpenMx just as it was created in regsem.
Aims
In order to properly evaluate the proposed method, a number of conditions were used.
The first is to compare RegSEM to regularization in regression, since regression can be seen
as a subset of SEM. The same sequence of penalties from glmnet was used for both ridge
and lasso regression in order to determine whether the two implementations result in the
same estimates.
In the second condition, a CFA model was used, starting with a good-fitting model
and adding both random noise variables and variables that cause a decrement in model
fit. It is hypothesized that RegSEM will shrink the "noise" loadings to zero first,
thus identifying the optimal model. By using large enough values of penalization, all factor
loadings are hypothesized to be shrunk to zero, albeit at different rates. No form of external
criterion such as a fit index was used to determine the "best" model, as at this time, it is
first necessary to evaluate whether the proposed method works as would be hypothesized.
Method
Data
Across the four different conditions, a number of different datasets were used. For the
comparison of RegSEM to regularized regression, the Boston dataset from the MASS
package (Venables & Ripley, 2002) was used. This is a commonly used dataset for
comparing regression models, containing thirteen predictors of one continuous outcome.
For the evaluation of the CFA model with noise variables, the Holzinger Swineford dataset
from the lavaan package (Rosseel, 2012) was used. There are nine cognitive variables that
are generally either thought to conform to a three factor model with no cross-loadings, or a
bi-factor model with one general factor and three specific factors.
Models
Regularized Regression. Both ridge and lasso regression were run using the
glmnet package (Friedman et al., 2010) in R. These methods were compared to the use of
both the l1 and l2 norm penalties in RegSEM. One hundred values of lambda were used for
both the regression and RegSEM. These values ranged from zero (no regularization) up to a
value at which, for both methods, either all regression coefficients were zero or estimation
errors occurred.
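A hedged sketch of this comparison setup is shown below: lasso and ridge fits on the Boston data with glmnet. Here glmnet's default λ sequence is used rather than the exact grid from the study.

```r
library(MASS)     # Boston housing data
library(glmnet)

x <- as.matrix(Boston[, setdiff(names(Boston), "medv")])   # 13 predictors
y <- Boston$medv                                           # continuous outcome

lasso_fit <- glmnet(x, y, alpha = 1)   # lasso (l1) penalties
ridge_fit <- glmnet(x, y, alpha = 0)   # ridge (l2) penalties

# Coefficient paths analogous to Figures 2 and 3
plot(lasso_fit, xvar = "lambda", label = TRUE)
plot(ridge_fit, xvar = "lambda", label = TRUE)
```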
CFA with noise variables. The three factor model with the Holzinger Swineford
dataset is displayed in Table 1. This model contains no cross-loadings and three items per
factor, and it achieves marginal fit (χ²(24) = 85, p < 0.001, RMSEA = 0.092, CFI =
0.931). More importantly, all standardized factor loadings had absolute values above 0.40.
Latent Factors
Items F1 F2 F3
X1 0.77 - -
X2 0.42 - -
X3 0.58 - -
X4 - 0.85 -
X5 - 0.85 -
X6 - 0.84 -
X7 - - 0.57
X8 - - 0.72
X9 - - 0.67
Table 1. Base 3 Factor CFA Standardized Loadings
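For reference, a minimal lavaan sketch of the base three-factor model summarized in Table 1 is shown below, using the Holzinger Swineford data shipped with lavaan (Rosseel, 2012); the factor names follow Table 1.

```r
library(lavaan)

hs_model <- '
  F1 =~ x1 + x2 + x3
  F2 =~ x4 + x5 + x6
  F3 =~ x7 + x8 + x9
'
fit <- cfa(hs_model, data = HolzingerSwineford1939)
fitMeasures(fit, c("chisq", "df", "pvalue", "rmsea", "cfi"))
standardizedSolution(fit)   # standardized loadings comparable to Table 1
```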
From this base model, three noise variables (n1-n3) were simulated as normally
distributed with a mean of zero and a standard deviation of one. One noise variable was
added to each factor. Additionally, three variables were chosen to be given loadings on
an additional factor. All added loadings had modification indices of less than 3.84 with
respect to the chosen factor. The resulting model estimates are displayed in Table 2.
The model fit improved according to the RMSEA with the addition of six factor
loadings (χ²(48) = 111, p < 0.001, RMSEA = 0.066, CFI = 0.928). With this model, the
hypothesis was that the first parameters to be shrunk are those in bold in Table 2. Note
that for the actual estimation, the first loading for each factor was constrained to one. The
use of an anchor item is hypothesized to prevent estimation problems, as the composition
of each factor could change drastically across levels of shrinkage. This is an area for future
study.
Latent Factors
Items F1 F2 F3
X1 0.77 - 0.02
X2 0.43 - -
X3 0.58 - -
X4 0.06 0.82 -
X5 - 0.86 -
X6 - 0.84 -
X7 - 0.02 0.58
X8 - - 0.72
X9 - - 0.67
N1 0.06 - -
N2 - 0.03 -
N3 - - 0.03
Table 2. Standardized Loadings from 3 Factor CFA with Noise Items
Computational Implementation
The proposed methods were implemented with a newly created R package called
regsem (regularized structural equation modeling) and in OpenMx. In OpenMx, the
addition of modularity in version 2.0 (Neale et al., 2015) made this possible by creating
new fit functions that included the penalties. Both OpenMx solvers were used: NPSOL and
the newly created CSOLNP, which is a C++ instantiation of the solnp function from the
Rsolnp package in R. A third solver, NLopt, was also available at this time but was not
tested. Neither solver used in OpenMx requires derivatives. Since the proposed method
aims to be as general as possible, and the derivatives are not generally known for many
forms of SEM, this is seen as ideal.
In regsem, multiple forms of optimization were tried. First, approximate first and
second order derivative functions were created using the procedure of Cudeck et al. (1993).
These derivatives, which work across different forms of SEM, were then tested using the
nlminb function from the stats package in R (R Core Team, 2014). This is a form of
nonlinear optimization that is also used by the lavaan package. Second, the solnp function
from the Rsolnp package (Ghalanos & Theussl, 2014) was tried. This is a derivative-free
nonlinear optimization method that uses an augmented Lagrange multiplier method (for more
detail, see Ye, 1987). Finally, two additional methods were tried: the method of von Oertzen
and Brick (2014) for specifying the first and second order derivatives based on the RAM
matrices, and the optimx package in R for optimization (Nash & Varadhan, 2011). However,
neither approach yielded adequate results initially; both require much more testing and will
not be detailed further.
It is recognized that with the models that are tested in this study, it would be possible
to implement exact derivatives based on the given cost functions and in relation to the
individual parameters. However, as the aim of this study is to create a method that is as
general as possible, this was not done at this time. If the use of derivative free methods
proves to be inadequate, this will be investigated in the future.
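To make the comparison of optimization routines concrete, the toy example below hands the same lasso-style objective to both stats::nlminb and Rsolnp::solnp. The objective is a stand-in penalized regression, not the RegSEM fit function itself, and the non-smooth |β| term is handled only approximately by these general-purpose optimizers.

```r
library(Rsolnp)

set.seed(1)
X <- matrix(rnorm(100 * 4), 100, 4)
y <- X %*% c(2, 0, 1, 0) + rnorm(100)

obj <- function(beta, lambda = 5) {
  0.5 * sum((y - X %*% beta)^2) + lambda * sum(abs(beta))   # lasso-style objective
}

nlminb_fit <- nlminb(start = rep(0, 4), objective = obj)    # numerical-derivative routine
solnp_fit  <- solnp(pars = rep(0, 4), fun = obj)            # augmented Lagrange multiplier
round(cbind(nlminb = nlminb_fit$par, solnp = solnp_fit$pars), 3)
```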
Results
Regression
In comparing the results from the glmnet package and RegSEM, only the solnp
optimization routine is detailed for every example, as it did not have any convergence
issues at any value of the regularization parameter. Both the use of nlminb and OpenMx
suffered from problems with unstable estimates. The nlminb optimization failed to
converge once the regularization parameter reached certain values. In OpenMx, once the
regularization parameter increased above very small values, the first-order optimality
criterion was no longer met. Only default accuracy settings were used for both CSOLNP
and NPSOL; this will be investigated further in the future.
First, the 13 predictors were run with the glmnet package, using lasso regression and
default values for lambda. The results are displayed in Figure 2.
Figure 2. Output from glmnet with lasso penalties. Note that the top axis refers to the
number of nonzero parameter estimates.
As is evident from Figure 2, the lasso penalty pushes the beta coefficients to zero at
different rates, and these values are set to, and stay at, zero, in contrast to the ridge
penalties in Figure 3.
With ridge, although all beta coefficients are eventually pushed to zero, this happens at the
exact same time, thus not performing a form of subset selection. In the previous two figures,
the values displayed along the top axis refer to the number of non-zero coefficients at the
specified iteration.
To test the functioning of the proposed method, along with the computational aspects
of the tested optimization routines, the same lambda values used with glmnet were tested
in the regsem package, using Rsolnp for optimization. The results for the lasso penalty are
displayed in Figure 4.
Figure 3. Output from glmnet with ridge penalties.
Figure 4. Regression output from regsem with same lambda values.
Figure 4 shows that with this method, although the beta coefficients are pushed and
set to zero, higher values of penalization are required to produce the same effects. Using a
higher increment of 0.05 for each iteration results in the plot shown in Figure 5.
Although a different coloring scheme is used to denote the specific parameters across
the plots from glmnet and regsem, the plots look almost identical. It is evident in Figure 4
that there are some problems with the stability of estimates, as at least two paths are set
to zero and then jump back up for at least a few iterations, and there are nonsmooth paths
that are especially clear in the last two coefficients that were set to zero.
Figure 5. Regression output from regsem with lambda values that result in all coefficients
pushed to 0.
The use of ridge penalties with regsem is displayed in Figure 6.
Figure 6. Regression output for ridge penalties with regsem.
In comparison to Figure 3, the plots look almost identical. As with the lasso comparison,
larger values of lambda had to be used in regsem to match the slope of decline for the
coefficients from glmnet. Although both forms of penalty in RegSEM required higher levels
of lambda, almost identical results were found in the constraining of estimates. The ratio
of the coefficients was almost the same across glmnet and regsem, which would result in
the same conclusions regarding subset selection.
CFA with Noise Added
Similar to the case with the regression penalties, the use of nlminb with the regsem
package, with and without specifying the approximate Hessian matrix, resulted in
nonconverging solutions. OpenMx, as in the regression case, displayed warnings
regarding the optimality criterion as soon as any penalty was added. Finally, using regsem
with Rsolnp, there were no problems with convergence, although there were jumps in parameter
estimates across iterations. Results using nine variables from the Holzinger Swineford data and
three added noise variables with lasso penalties are displayed in Figure 7.
Figure 7. CFA output from regsem with lasso penalties.
Although there are jumps in estimates, denoting some problems with estimation, the
method performed as hoped. The smaller factor loadings were almost immediately pushed
to zero, while larger loadings were decreased at differing rates. Almost identical results
were found in OpenMx, as displayed in Figure 8, despite warnings regarding optimization.
Figure 8. CFA output from OpenMx. Note: The coloring is different as a result of default
ordering of variables
Ridge penalties were also tested using the same set of lambda values as with lasso.
Results from this with regsem are displayed in Figure 9.
Figure 9. CFA output from regsem with ridge penalties.
To further investigate the results of estimates using the ridge penalties, twice the number
of lambda values were used, increasing at the same rate. These results are displayed in
Figure 10 from regsem.
Figure 10. CFA output using 200 iterations for ridge penalties with regsem.
The methods seem to perform as expected, pushing each loading towards zero without
actually reaching zero. Additionally, the smoothness of all of the paths is encouraging from
an estimation perspective, as there seem to be fewer problems in achieving stable estimates.
In fact, the ridge penalties were also estimated using OpenMx with no convergence issues
across the 200 lambda values. These results are detailed in Figure 11.
It is evident that, using the ridge penalties, the small estimates are not immediately
set to zero, and in this case the green and yellow negative loadings in Figure 11 actually
increase in absolute value relative to the non-regularized model. This is not particularly useful
from a subset selection standpoint; however, if generalizability were the goal, it could
demonstrate some utility.
Finally, to more concretely address the hypotheses regarding the subset selection using
the lasso penalties, values of the loadings across four different sets of penalty values are
displayed in Table 3.
Figure 11. CFA output using 200 iterations for ridge penalties in OpenMx.
As is evident in Table 3, at the regularization value of 0.3, the hypothesized subset
selection is perfectly achieved. However, there were two negative consequences of achieving
this. First, the resulting nonzero loadings were shrunken to a large degree, especially for
factors one and three. In comparing the resulting factor loading estimates across factors
two and three, the loading estimates for factor two exhibit a minimal decline, only
decreasing by about 0.20. In comparison, the loadings for factor three decline on the order
of approximately 0.80, going from what would be considered strong loadings with no
regularization, to rather weak indicators. Second, the model fit went from good to very
poor with respect to the RMSEA values. This is to be expected; however, it was a large
decrement in fit.
Discussion
Across the two different model comparisons, different estimates were achieved across
different forms of optimization, with no gold standard. This gold standard could only be
achieved through the use of simulation, which is the logical next step in evaluating the
proposed method. Barring questions regarding the accuracy of estimation, both examples
Regularization Values
Loadings 0 0.105 0.19 0.3
f1X1 1 1 1 1
f1X2 0.55 0.30 0.21 0.16
f1X3 0.72 0.45 0.35 0.31
f1X4 0.11 0.04 0.01 0
f1N1 0.08 0 0 0
f2X4 1 1 1 1
f2X5 1.16 1.03 0.97 0.9
f2X6 0.96 0.86 0.81 0.75
f2X7 -0.05 0 0 0
f2N2 -0.02 -0.03 -0.01 0
f3X7 1 1 1 1
f3X8 1.16 0.86 0.7 0.35
f3X9 1.06 0.71 0.53 0.2
f3X1 0.03 0 0 0
f3N3 0.03 0 0 0
RMSEA 0.06 0.14 0.18 0.20
Table 3. Output for different levels of lasso penalization in regsem with CFA noise variables
demonstrated that the implementation of RegSEM, in both the regsem and OpenMx
packages, worked to varying degrees in penalizing the estimates and constraining them to
zero after a requisite value of penalty.
Demonstrating that RegSEM works as a form of regularization in SEM is just a small
piece in demonstrating its utility. RegSEM was not paired with CV or any of the
information criteria to determine whether the addition of regularization leads to more
generalizable models. Additionally, testing models on the continuum of unconstrained to
very restricted was not conducted. The CFA example with the addition of noise variables
is a very limited attempt at this. Ideally, in the context of factor analysis, RegSEM would
be tested starting with an EFA, and penalized until only a small subset of indicators were
left, essentially imposing a form of simple structure. This could be combined with an
algorithm for eliminating the estimation of latent factors once a minimum in the sum of
squared loadings was achieved, allowing a continuum of factors to be tested.
There are a number of directions for future research. One of the persistent problems
in the results of the RegSEM implementation was the question of the accuracy and
stability of parameter estimates across values of penalization. Additionally, it is unknown
why the use of the same penalties in glmnet and regsem resulted in less penalization
for regsem. One hypothesis is that the slower penalization was due to
inadequate optimization. Using nlminb in regsem with the approximate Hessian, for both
the regression and CFA examples, resulted in penalized estimates; however, the
penalization was much slower than with any other routine. It could be that exact gradient
and/or Hessian matrices are required to achieve proper results.
It is worth examining the use of other optimization routines such as Coordinate
Descent (Friedman et al., 2010) or the Generalized Path Seeking algorithm (Friedman,
2012) that were created specifically for use with regularized regression. A variant of
Coordinate Descent is used in the fanc package (Hirose & Yamamoto, 2013). A final
proposition for improving the method involves implementing the method of von Oertzen
and Brick (2014) for calculating exact first and second order derivatives with the RAM
matrices. Because only the RAM matrices are necessary for calculating first and second
order derivatives, the method is generalizable to any type of SEM. Including this in regsem
could allow for exact derivatives to be used, thus speeding up estimation and potentially
increasing the accuracy.
Although not touched on until this point, it is worth mentioning how this method
compares to the use of stochastic search algorithms for finding the optimal structure in
SEM (e.g., Marcoulides, Ing, & Hoyle, 2012). In these procedures, each free parameter in
the model is given a value of 0 or 1, with 0 meaning constrained to zero and 1 meaning
freely estimated. The vector of 0's and 1's, with a length equal to the number of free
parameters, is then used with one of many search algorithms to find the optimal model. In
its implementation, the use of these stochastic search algorithms seems very different from
regularized SEM; in practice, however, they should obtain very similar results. Whether
they do in fact produce similar results, and in what circumstances one may be preferable, is
an area for future study. One possibility is to incorporate one of the
many genetic algorithm or tabu search R packages into regsem to provide a comparison,
allowing the user greater flexibility in choosing the final model.
In lasso regression, it has been found that the lasso has a propensity to over-shrink the
coefficients, with the estimates biased towards zero (Hastie et al., 2009). A way to
overcome this is to use lasso penalties to first identify the nonzero paths, and then run an
unconstrained model (linear regression with no penalties) using only these variables. This
is known as the relaxed lasso (Meinshausen, 2007). Testing different forms of obtaining the
final parameter estimates is an additional area for future study with regard to RegSEM, as
it is unclear whether conducting a two-stage approach would be beneficial and in what
circumstances.
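As a small illustration of the two-stage idea (my own sketch, on simulated data): select variables with the lasso, then refit without penalties on the selected subset.

```r
library(glmnet)

set.seed(1)
x <- matrix(rnorm(200 * 10), 200, 10)
colnames(x) <- paste0("v", 1:10)
y <- x[, 1] - 0.5 * x[, 2] + rnorm(200)

# Stage 1: lasso for selection
cv_fit <- cv.glmnet(x, y)
b      <- as.vector(coef(cv_fit, s = "lambda.1se"))[-1]   # drop the intercept
keep   <- colnames(x)[b != 0]

# Stage 2: unpenalized refit on the selected variables only
refit <- lm(y ~ ., data = data.frame(y = y, x[, keep, drop = FALSE]))
coef(refit)
```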
Finally, the last area for future research is that of examining which parameters
decrease to zero slowly, and which have a steeper slope. This could be examined by
calculating the level of parameter influence using one of two methods (S.-Y. Lee & Wang,
1996; T. Lee & MacCallum, 2014). It would be interesting to assess whether these values stay
nearly constant across varying levels of penalization, or whether there is fluctuation. This
has practical importance for determining which parameters in an SEM are worth
evaluating at specific values (high parameter influence), and which parameters should not
be evaluated (low levels of influence), as there is little change in the fit of the model while
varying the estimates of that parameter. By calculating parameter influence across levels of
shrinkage, it would also shed more light on how the penalties are operating, instead of just
blindly increasing or decreasing levels of penalty. A weighted version of adding penalties
may create more theoretically interesting models, as penalties could be relatively larger for
parameters with little influence, and smaller for high influence parameters.
This study represents an important initial step in evaluating the utility and accuracy
of RegSEM, while identifying areas for future study. The estimation procedure was
demonstrated to work in reducing parameter estimates as would be expected; however, it
took a larger amount of penalty than expected to constrain estimates to zero in the
regression context. By increasing the penalties in the CFA model, the hypothesized model
was recovered; however, this was not done in a way that would be valuable to researchers.
Although there are a large number of limitations to this study, testing a number of the
proposed ideas will be greatly facilitated through the already created R package regsem.
The base framework for evaluating RegSEM is set, with great potential for rapid
improvement in the near future.
References
Akaike, H. (1973). Information theory and an extension of the maximum likelihood
principle. In B. N. Petrov & F. Csaki (Eds.), Second international symposium on
information theory (pp. 267-281). Budapest: Akademiai Kiado.
Boker, S., Neale, M., & Rausch, J. (2004). Latent differential equation modeling with
multivariate multi-occasion indicators. In K. van Montfort et al. (Ed.), Recent
developments on structural equation models (pp. 151–174). Springer.
Bollen, K. A., & Pearl, J. (2013). Eight myths about causality and structural equation
models. In Handbooks of sociology and social research (pp. 301–328). Springer
Science and Business Media.
Borsboom, D., Cramer, A. O. J., Schmittmann, V. D., Epskamp, S., & Waldorp, L. J.
(2011). The small world of psychopathology. PLoS ONE, 6(11), e27407.
Bozdogan, H. (1987). Model selection and Akaike’s information criterion (AIC): The
general theory and its analytical extensions. Psychometrika, 52(3), 345-370.
Bringmann, L. F., Lemmens, L. H. J. M., Huibers, M. J. H., Borsboom, D., & Tuerlinckx,
F. (2014). Revealing the dynamic network structure of the beck depression
inventory-II. Psychological Medicine, 45(04), 747–757.
Browne, M. W. (2000). Cross-validation methods. Journal of Mathematical Psychology,
44(1), 108 - 132.
Browne, M. W., & Arminger, G. (1995). Specification and estimation of mean and
covariance structure models. In Handbook of statistical modeling for the social and
behavioral sciences (pp. 185–249). Springer.
Browne, M. W., & Cudeck, R. (1989). Single sample cross-validation indices for covariance
structures. Multivariate Behavioral Research, 24(4), 445–455.
Browne, M. W., & du Toit, S. H. C. (1992). Automated fitting of nonstandard models.
Multivariate Behavioral Research, 27(2), 269–300. doi: 10.1207/s15327906mbr270213
Buhlmann, P., Peters, J., & Ernest, J. (2014). CAM: Causal additive models,
high-dimensional order search and penalized regression. The Annals of Statistics,
42(6), 2526–2556.
Burden, R. L., & Faires, J. D. (2001). Numerical analysis. Brooks/Cole, USA.
Cadima, J., & Jolliffe, I. T. (1995). Loadings and correlations in the interpretation of
principal components. Journal of Applied Statistics, 22(2), 203–214.
Chandrasekaran, V., Parrilo, P. A., & Willsky, A. S. (2012). Latent variable graphical
model selection via convex optimization. The Annals of Statistics, 40(4), 1935–1967.
Chen, J., & Chen, Z. (2008). Extended Bayesian information criteria for model selection
with large model spaces. Biometrika, 95(3), 759–771.
Chickering, D. M. (2003). Optimal structure identification with greedy search. The
Journal of Machine Learning Research, 3, 507–554.
Choi, J., Zou, H., & Oehlert, G. (2010). A penalized maximum likelihood approach to
sparse factor analysis. Statistics and Its Interface, 3(4), 429–436.
Costantini, G., Epskamp, S., Borsboom, D., Perugini, M., Mottus, R., Waldorp, L. J., &
Cramer, A. O. (2015). State of the aRt personality research: A tutorial on network
analysis of personality data in R. Journal of Research in Personality, 54, 13–29.
Cudeck, R., & Browne, M. W. (1983). Cross-validation of covariance structures.
Multivariate Behavioral Research, 18(2), 147–167.
Cudeck, R., Klebe, K. J., & Henly, S. J. (1993). A simple gauss-newton procedure for
covariance structure analysis with high-level computer languages. Psychometrika,
58(2), 211–232.
Dahl, J., Roychowdhury, V., & Vandenberghe, L. (2005). Maximum likelihood estimation
of Gaussian graphical models: Numerical implementation and topology selection (Tech.
Rep.). Technical report, UCLA.
De Gooijer, J. G., & Koopman, S. J. (1988). Cross-validation criteria and the analysis of
covariance structures. In M. G. H. Jansen & van Schuur W. H. (Eds.), The many
faces of multivariate analysis: Vol. ii. proceedings of the smabs-88 conference
(p. 296-311). Groningen, The Netherlands: RION.
Duarte, C. W., Klimentidis, Y. C., Harris, J. J., Cardel, M., & Fernandez, J. R. (2011). A
hybrid Bayesian network/structural equation modeling (BN/SEM) approach for
detecting physiological networks for obesity-related genetic variants. In 2011 IEEE
international conference on bioinformatics and biomedicine workshops (BIBMW).
Institute of Electrical & Electronics Engineers (IEEE). doi:
10.1109/bibmw.2011.6112455
Fan, J., & Li, R. (2001). Variable selection via nonconcave penalized likelihood and its
oracle properties. Journal of the American statistical Association, 96(456),
1348–1360.
Friedman, J. (2012). Fast sparse regression and classification. International Journal of
Forecasting, 28(3), 722–738.
Friedman, J., Hastie, T., Hofling, H., & Tibshirani, R. (2007). Pathwise coordinate
optimization. The Annals of Applied Statistics, 1(2), 302–332.
Friedman, J., Hastie, T., & Tibshirani, R. (2008). Sparse inverse covariance estimation
with the graphical lasso. Biostatistics, 9(3), 432–441.
Friedman, J., Hastie, T., & Tibshirani, R. (2010). Applications of the lasso and grouped
lasso to the estimation of sparse graphical models (Tech. Rep.). Technical report,
Stanford University.
Ghalanos, A., & Theussl, S. (2010). Rsolnp: General non-linear optimization using augmented Lagrange multiplier method. (R package version 1.)
Ghalanos, A., & Theussl, S. (2014). Rsolnp: General Non-linear Optimization Using
Augmented Lagrange Multiplier Method [Computer software manual]. (R package
version 1.15.)
Glymour, C., Scheines, R., & Spirtes, P. (1987). Discovering causal structure: Artificial
intelligence, philosophy of science, and statistical modeling. Academic Press.
Grimm, K. J., & McArdle, J. J. (2005). A note on the computer generation of mean and
covariance expectations in latent growth curve analysis. In Multi-level issues in
strategy and methods (pp. 335–364). Emerald.
Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning.
Springer New York.
Hirose, K., & Yamamoto, M. (2013). fanc: Penalized Likelihood Factor Analysis via
Nonconvex Penalty [Computer software manual]. Retrieved from
http://CRAN.R-project.org/package=fanc (R package version 1.13)
Hirose, K., & Yamamoto, M. (2014a). Estimation of an oblique structure via penalized
likelihood factor analysis. Computational Statistics & Data Analysis, 79(0), 120–132.
Hirose, K., & Yamamoto, M. (2014b). Sparse estimation via nonconcave penalized
likelihood in factor analysis model. Statistics and Computing, 1–13.
Hoerl, A. E., & Kennard, R. W. (1970). Ridge regression: Biased estimation for
nonorthogonal problems. Technometrics, 12(1), 55–67.
Jolliffe, I. T., Trendafilov, N. T., & Uddin, M. (2003). A modified principal component
technique based on the lasso. Journal of Computational and Graphical Statistics,
12(3), 531–547.
Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor
analysis. Psychometrika, 34(2), 183–202. doi: 10.1007/bf02289343
Jung, S., & Takane, Y. (2008). Regularized common factor analysis. In K. Shigemasu
(Ed.), New trends in psychometrics (pp. 141–149). Universal Academy Press.
Lawley, D. N. (1940). VI.—The estimation of factor loadings by the method of maximum likelihood. Proceedings of the Royal Society of Edinburgh, 60(1), 64–82.
Lee, S.-Y., & Wang, S. J. (1996). Sensitivity analysis of structural equation models.
Psychometrika, 61(1), 93–108.
Lee, T., & MacCallum, R. C. (2014). Parameter influence in structural equation modeling.
Structural Equation Modeling: A Multidisciplinary Journal, 22(1), 102–114.
Ma, S., Xue, L., & Zou, H. (2013). Alternating direction methods for latent variable Gaussian graphical model selection. Neural Computation, 25(8), 2172–2198.
Marcoulides, G. A., Ing, M., & Hoyle, R. (2012). Automated structural equation modeling strategies. In R. H. Hoyle (Ed.), Handbook of structural equation modeling (pp. 690–704). New York: The Guilford Press.
Mazumder, R., Friedman, J. H., & Hastie, T. (2011). Sparsenet: Coordinate descent with
nonconvex penalties. Journal of the American Statistical Association, 106(495).
Mazumder, R., & Hastie, T. (2012). The graphical lasso: New insights and alternatives.
Electronic Journal of Statistics, 6, 2125.
McArdle, J. J. (2005). The development of the RAM rules for latent variable structural equation modeling. Contemporary psychometrics: A festschrift for Roderick P. McDonald, 225–273.
McArdle, J. J., & McDonald, R. P. (1984). Some algebraic properties of the reticular
action model for moment structures. British Journal of Mathematical and Statistical
Psychology, 37(2), 234–251.
Meinshausen, N. (2007). Relaxed lasso. Computational Statistics & Data Analysis, 52(1),
374–393.
Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 1436–1462.
Mulaik, S. A. (2009). Foundations of factor analysis. CRC Press.
Nash, J. C., & Varadhan, R. (2011). Unifying optimization algorithms to aid software
system users: optimx for R. Journal of Statistical Software, 43(9), 1–14.
Neale, M. C., Hunter, M. D., Pritikin, J. N., Zahery, M., Brick, T. R., Kirkpatrick, R. M.,
... Boker, S. M. (2015). OpenMx 2.0: Extended structural equation and statistical
modeling. Psychometrika. doi: 10.1007/s11336-014-9435-8
Ning, L., & Georgiou, T. T. (2011). Sparse factor analysis via likelihood and ℓ1-regularization. In 2011 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC) (pp. 5188–5192).
Park, T., & Casella, G. (2008). The Bayesian lasso. Journal of the American Statistical
Association, 103(482), 681–686.
Pearl, J. (1988). Probabilistic reasoning in intelligent systems: networks of plausible
inference. Morgan Kaufmann.
Pearl, J. (2000). Causality: Models, reasoning and inference (Vol. 29). Cambridge University Press.
Pourahmadi, M. (2013). High-dimensional covariance estimation: With high-dimensional
data. John Wiley & Sons.
R Core Team. (2014). R: A language and environment for statistical computing [Computer
software manual]. Vienna, Austria. Retrieved from http://www.R-project.org
Rosseel, Y. (2012). lavaan: An R package for structural equation modeling. Journal of Statistical Software, 48(2), 1–36.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6,
461–464.
Shimizu, S., Hoyer, P. O., Hyvärinen, A., & Kerminen, A. (2006). A linear non-Gaussian acyclic model for causal discovery. The Journal of Machine Learning Research, 7, 2003–2030.
Thurstone, L. L. (1935). The vectors of mind. Chicago, IL: University of Chicago Press.
Thurstone, L. L. (1947). Multiple factor analysis. Chicago, IL: University of Chicago
Press.
Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the
Royal Statistical Society. Series B (Methodological), 58(1), 267–288.
Trendafilov, N. T., & Adachi, K. (2014). Sparse versus simple structure loadings.
Psychometrika, 1–15.
Venables, W. N., & Ripley, B. D. (2002). Modern applied statistics with S (Fourth ed.). New York: Springer.
von Oertzen, T., & Brick, T. R. (2014). Efficient Hessian computation using sparse matrix derivatives in RAM notation. Behavior Research Methods, 46(2), 385–395.
Whittaker, T. A., & Stapleton, L. M. (2006). The performance of cross-validation indices
used to select among competing covariance structure models under multivariate
nonnormality conditions. Multivariate Behavioral Research, 41(3), 295–335.
Ye, Y. (1987). Interior algorithms for linear, quadratic, and linearly constrained non-linear
programming (Unpublished doctoral dissertation). Department of ESS, Stanford
University.
Yuan, M., & Lin, Y. (2007). Model selection and estimation in the Gaussian graphical
model. Biometrika, 94(1), 19–35.
Zhang, C.-H. (2010). Nearly unbiased variable selection under minimax concave penalty.
The Annals of Statistics, 894–942.
Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net.
Journal of the Royal Statistical Society: Series B (Statistical Methodology), 67(2),
301–320. doi: 10.1111/j.1467-9868.2005.00503.x
Zou, H., Hastie, T., & Tibshirani, R. (2006). Sparse principal component analysis. Journal
of Computational and Graphical Statistics, 15(2), 265–286.