Quantile Mediation Models:
Methods for Assessing Mediation across the Outcome Distribution
By Ernest Shen
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
Date: 18 December 2013
Dedication
This work is dedicated to my wife Lewei Duan, and all of my loving family and friends.
Acknowledgments
I would like to thank Drs. Kiros Berhane, Chih-Ping Chou, Mary Ann Pentz, David Conti, W. James
Gauderman, Rand Wilcox, and Donna Spruijt-Metz for their invaluable contributions both to my
dissertation work and my life. This work was also supported by the Eunice Kennedy Shriver National
Institute of Child Health and Human Development (R01 HD061968), the US Department of Health and
Human Services (R01 CA 123243), the National Institute on Drug Abuse (R01 DA027226), and the
National Cancer Institute (R01 CA 123243 (Pentz, PI)), Clinical Trials Number NTP00986011.
Table of Contents
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
1 Introduction
1.1 The Role of Mediation in Studying Childhood Obesity
1.2 The Healthy PLACES Study
1.2.1 Study Design
1.2.2 Assessment of Mediators
1.2.3 Outcomes Assessment
1.3 Statistical Literature Review
1.3.1 Classical Mediation Analysis
1.3.2 Bayesian Mediation Analysis
1.3.3 Methodological Challenges for Mediation Analysis
1.3.4 Quantile Regression
1.3.5 Structural Equation Models for Conditional Quantiles
1.4 Proposed Methodology
2 Quantile Mediation Analysis with Observed Variables
2.1 Model Assumptions
2.2 Two-Stage Estimation Methods
2.3 Inference for Quantile Mediation
2.4 Simulation Results
3 A Bayesian Approach to Quantile Mediation
3.1 Limitations of the Two-Stage Methods
3.2 Bayesian Methods for Conventional Mediation Models
3.3 The Bayesian Quantile Mediation Model
3.3.1 MCMC Methods
3.3.2 The Single Mediator Model
3.3.3 The Multiple Mediator Model
3.4 Quantile Mediation with a Latent Mediator Variable
3.5 Simulation Studies
3.5.1 Single Mediator Model v. Two-stage Approaches
3.5.2 Multiple Mediator Model
3.6 Comments Regarding the Simple Model
4 Multilevel Quantile Mediation Models
4.1 Overview of Multilevel Models for Mediation Analysis
4.2 Multilevel Model for Quantile Mediation
4.3 Simulation Study
5 Quantile Mediation Analysis of Data from the HPS
5.1 Overview of the Analysis Models
5.2 Analysis Results for the Single Mediator Model
5.3 Analysis Results for the Multiple Mediator Model
5.4 Analysis Results for the Multilevel Mediation Model
5.5 Analysis Results for the Quantile Mediation Model with Latent Mediators
6 Discussion
6.1 Key Contributions
6.2 Key Limitations
6.3 Future Work
References
List of Acronyms
List of Tables
Table 2.1 MSE of Simulation-Based Estimates of Indirect Effects across Selected Quantiles
Table 2.2 Type 1 Error (αβ(η)=0) for the Fitted Value and QCME methods across Selected Quantiles
Table 2.3 Statistical Power (αβ(η)≠0) for Fitted Value and QCME methods across Selected Quantiles
Table 3.1. Relative Bias (%) of Parameters using the Bayesian and ACME Models
Table 4.1. Relative Bias (%) of Indirect Effect (αβ(η) or QCME) using the Different Models
Table 4.2. Mean Squared Error of Indirect Effect (αβ(η) or QCME) using the Different Models
Table 5.1. Analysis results for BMI
Table 5.2. Analysis results for Waist Circumference
Table 5.3 Moments and Order Statistics of BMI and Waist Circumference
Table 5.7. Analysis of HPS Data using Different Models
Table 5.8. Correlations between the three Mediators in the HPS model
Table 5.9 Analysis of HPS Data using Different Multilevel Models
Table 5.10 Analysis of HPS Data across various quantiles for indirect effect of perceived walkability
List of Figures
Figure 1.1 Hypothesized mediation model for the Healthy Places Study, adapted from Pentz et al. (2010)
Figure 1.2. Hypothetical mediation model
Figure 2.1. Densities of product of coefficients, difference of coefficients, and product minus difference
Figure 3.1. Relative Bias of Bayesian and QCME Model, for different quantiles
Figure 4.1 Shape of the Asymmetric Laplace Distribution for different settings
Figure 5.1 Effects of Variables on BMI
Figure 5.2 Effects of Variables on Waist Circumference
Figure 5.3 Effects of Variables on MVPA
Figure 5.4 Effects of Variables on Walkability
Figure 5.5 Indirect Effect of Walkability between Intervention and Response
Figure 5.6 Indirect Effect of Walkability by Quantiles of Walkability and BMI
Figure 5.7 Quantile-Specific Effects of Wave of Measurement on BMI and Waist Circumference
Abstract
Recent applications of quantile regression methods to mediation analysis have focused
primarily on multi-step methods, such as dual-stage quantile regression, or causal mediation analysis.
However, the various limitations of these approaches suggest the need for more flexible methods of
dealing with complex mediation models involving multiple mediators and outcomes, or latent variables.
By combining methods for Bayesian mediation analysis with those of Bayesian quantile regression,
mediation can be characterized for any quantile of the response distribution for a large class of mediation
models that are not easily handled by the multi-step approaches. Bayesian estimation and inference
techniques for quantile mediation are proposed, and compared with existing approaches to mediation
analysis, through simulation studies and analyses of data from the Healthy Places Study. Existing
methods for Bayesian mediation analysis are extended in this dissertation in the following important ways:
1) modeling of mediational relationships via multiple correlated mediators is allowed for arbitrary
quantiles of the outcome distribution; 2) mediators can be modeled as latent variables for arbitrary
quantiles of the outcome distribution; and 3) multilevel data structures can be appropriately accounted for
when assessing mediational relationships for arbitrary quantiles of the outcome distribution. We find that
the Bayesian approaches are able to produce similarly consistent and efficient estimates of quantile
indirect effects, and more importantly are able to assess quantile mediation for a much wider class of
models and underlying data structures, compared to existing techniques.
1 Introduction
1.1 The Role of Mediation in Studying Childhood Obesity
In light of the recent obesity epidemic in the United States (Ogden and Carroll, 2010), poor
energy balance has been almost unanimously identified as a dominant source of the problem. That is, too
much energy intake and too little physical activity (often abbreviated as PA) among children have resulted
in the secular trend of increasing obesity over the last 30 years. This has led researchers from very
diverse fields of study to argue that particular facets of life are most responsible for either end of the
energy balance sheet. Some researchers assert that reductions in physical activity patterns as children age
result in increased risk for developing obesity, especially among adolescent girls (Kimm et al., 2005).
Moreover, this trend is more severe among African American adolescent girls (White and Jago, 2012).
Others in the nutritional science community contend that an insidious change in nutritional content of
food over the last few decades has resulted directly in increased risk for (and therefore prevalence of)
both adult and childhood obesity, independent of changes in physical activity. From the
interdisciplinarian's point of view, the contention is that it is some (mostly) unobservable combination of poor
diet and low PA that has resulted in the steady ascent of childhood obesity. While genetics clearly plays a
role in obesity development, it will not be explicitly considered in this dissertation.
Whatever the culprit turns out to be, it is certain that the development of good dietary and PA
habits is necessary to stem the tide of rising childhood obesity rates and promote healthy lifestyles
among the nation's population, thereby reducing the burden of obesity and its correlates (e.g. CVD and
diabetes) for generations to come. Moreover, Dunton et al. (2009) indicated that the built environment
plays a very important role in determining dietary and PA habits. This indicates the need for
interventions that seek to modify aspects of environments which will reverse negative, and promote
positive, dietary and PA habits. As such, the evaluation of such programs must properly account for this
indirect impact they have on body weight. The method of choice for such evaluations happens to be
statistical mediation analysis (MacKinnon, 2008). For example, a study of Belgian adults (Van Dyck et
al., 2010) found that objectively measured PA (as well as walking) mediated the relationship between
neighborhood walkability and BMI/waist circumference. In other words, the impact of
a highly walkable neighborhood on body weight in adults is a result of improved PA patterns, because
residents of such a neighborhood are more likely to use walking as a form of transportation than residents
of neighborhoods which are dominated by automobile forms of transport. Ewing et al. (2003) found that
urban sprawl was negatively related to BMI, and that this relationship was mediated by differential
patterns of walking between neighborhoods with more or less sprawl. Although the former study was
conducted in Belgium and the latter in the United States, they both reached the conclusion that facets of
the built environment have an effect on PA habits which has a resulting effect on body weight, thus
implying that physical activity mediates the relationship between the built environment and body weight.
However, these studies made the classical (and restrictive) assumption that the mediating effects of PA
are the same for all individuals, regardless of weight status. It stands to reason that heavier adults will
experience larger effects of improved PA habits than normal weight adults, whether the effects are a
result of built environments that are conducive to an active lifestyle or from an intervention program
intended to increase PA.
This issue features more prominently in studies involving children. First of all, obesity in children is
not determined by having a BMI of at least 30 (as with US adults). Rather, overweight and obesity for
children are determined via age- and gender-specific percentile curves, developed based on a standard US
population (Kuczmarski et al., 2002). Second, body weight in children is related to a complex array of
factors (puberty, gender, and socioeconomic status to name a few). As such, one is hard-pressed to
assume that all children across the distribution of, say, BMI are going to experience the same level of
weight loss from an intervention program (such as a PA promotion program) or a change in the built
environment (such as moving from a less to a more walkable neighborhood). Conventional methods for
estimating mediation do not allow for the examination of mediation effects for subjects at different parts
of the distribution of BMI. Rather, they operate under the assumption that a child with average BMI
will respond to an intervention in exactly the same way as an obese child. This dissertation holds that this
is simply not true in many circumstances.
Analogous to the traditional conditional mean models – those which suppose normality, linearity,
and constant variance – conditional quantile models allow one to characterize linear relationships at some
quantile of the response distribution, rather than the mean. For a binary treatment indicator, Doksum
(1974) dubbed this the “quantile treatment effect.” Koenker and Bassett (1978) generalized this notion,
armed with the theory of order statistics, to what they call regression quantiles. By minimizing a sum of
weighted absolute residuals via a linear program, they developed a great deal of useful theoretical results
which have proved fruitful for those wishing to study the full conditional distribution, rather than just the
conditional mean. For instance, Beyerlein et al. (2010) studied the relationship between common
neonatal risk factors (such as low birthweight or formula feeding) and BMI at school entry (age 5-6 in
Germany). Their figure 3 clearly demonstrates an asymmetric effect of formula feeding, where being
formula-fed caused overweight and obese children to become heavier and lightweight children to become
lighter. In other words, formula-fed children who are already at the unhealthy ends of the BMI
distribution tend to be more strongly affected than formula-fed children who are normal weight. One
might readily posit that there are mechanisms which mediate this relationship between, say, formula
feeding and BMI, and that this relationship is different for overweight children than those of normal
weight. It is here that conventional methods for dealing with mediation and conditional quantiles ought to
converge. Aside from the nascent study of quantile-specific mediation arising in the counterfactual model
of causality (Imai, 2010), this issue has not been sufficiently addressed to warrant widespread use by
those most interested in posing and testing mediation models.
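
As a minimal illustration of the weighted absolute residual criterion described above (and not a reproduction of any analysis in this dissertation), the following Python sketch evaluates the check loss for a constant-only model and confirms that its minimizer is a sample quantile. The function names and the BMI-like values are hypothetical, and the brute-force search over observed values stands in for the linear program of Koenker and Bassett (1978).

```python
import numpy as np

def check_loss(residuals, tau):
    """Sum of weighted absolute residuals: weight tau for residuals >= 0,
    weight (1 - tau) for residuals < 0."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sum(r * (tau - (r < 0))))

def constant_quantile_fit(y, tau):
    """Intercept-only regression quantile, found by searching the observed
    values (an illustrative shortcut, not a linear programming solver)."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - c, tau) for c in y]
    return float(y[int(np.argmin(losses))])

# Hypothetical BMI-like values with a heavy upper tail.
y = np.array([17.0, 18.5, 19.0, 21.0, 24.0, 29.0, 33.0])
median_fit = constant_quantile_fit(y, 0.5)   # central part of the distribution
upper_fit = constant_quantile_fit(y, 0.9)    # upper tail of the distribution
```

Replacing the constant with a linear predictor yields a regression quantile, so covariate effects can be estimated separately for, say, the median and the 90th percentile of the response.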
In studies which utilize mediation, such as the Healthy Places Study (Dunton et al., 2012), data on
individuals (many of whom share common environmental exposures) are often collected at multiple times.
This poses the problem of how to characterize mediation across multiple communities, across time, and
across all such possible combinations. Mixed-effects models (Laird and Ware, 1982) provide a flexible
framework for dealing with these issues, and this has been addressed to some degree (Krull and
MacKinnon, 2001; Kenny et al., 2003), though not specifically for conditional quantiles. Moreover,
direct measurements of mediators of interest are often not feasible for studies such as these. Health
interventions are often designed to change a relevant behavior, such as increasing physical activity (PA)
in order to reach a healthy body weight. Yet changes in the behaviors may be mediated by changes in
attitudes or self-efficacy regarding PA, which are not directly measurable and must be quantified using
some survey scale. Thus, the challenge in studies such as the Healthy Places Study (HPS) [see Section
1.2 for a complete description of this study] is to be able to quantify and test for mediation for a set of
latent variables which occur across both time and space simultaneously. These challenges arise before
one even confronts how to estimate mediation for quantiles (rather than the mean) of the response
distribution. These various challenges suggest the need to develop a modeling framework which
integrates these inherent aspects of data from such studies as HPS, with a paradigm that allows for
estimating mediation for quantiles of the response distribution across both time and space.
This dissertation aims at developing modeling techniques that will allow for estimation and
testing of mediation models for any quantile of the response distribution, and provide a flexible modeling
paradigm which can handle some of the methodological challenges described above. Details of this
modeling paradigm are preceded by a background on the study which motivated this work, and the
relevant statistical literature.
1.2 The Healthy PLACES Study
1.2.1 Study Design
In the Healthy Places Study (HPS), participating parent-child pairs were selected from among the eligible
families who had been randomly selected to move into a new community, hereafter referred to as the
preserve, which was developed using smart growth principles (http://www.smartgrowth.org/) such as
mixing land use or creating walkable neighborhoods (Pentz et al., 2010). HPS examines whether a
proposed mediating mechanism (see Figure 1.1) exists when facets of the built environment (such as
walkability) are substantially altered. This longitudinal study assesses the effectiveness of living in a
smart growth community in reducing the burden of obesity development, compared to two control groups
(a randomized control group and a demographically matched control group), using a three-group hybrid design with
aspects of both matching and a randomized trial. The 612 multi-ethnic families comprising the sample
each have one index parent and one index child between ages 10-14 years per family. The intervention
group consists of new resident families who are randomly drawn to live in a low-moderate income smart
growth community in California (n=145). The randomly selected control group is composed of 200
eligible families who had unsuccessfully applied to live in the smart growth community, matched on
demographic and income characteristics to the intervention group. A second demographically matched
control group consists of 267 families who live in communities adjacent to the smart growth community
(i.e. within a 30-minute drive).
The study's conceptual model is rooted in Integrative Transactional Theory (Pentz, 1999), which
posits that contextual influences on childhood obesity are clustered at the personal, social, and
environmental levels, and have reciprocal relationships to the development of obesogenic trajectories over
time. Specifically, the application of smart growth principles to the design of a residential community will
mediate the effect of contextual influences. Such mediating mechanisms occur at the personal, social, and
environmental levels, including effects that smart growth principles have on: personal attitudes and
behaviors of improving physical activity (by virtue of greater connectivity and walking trails provided as
part of smart growth); changes in social friendship patterns as new residents of smart growth
communities seek out friends to exercise with; and changes in the environment that occur directly as a
result of living in a smart growth community (e.g., greater use and availability of active recreational
facilities).
The first wave of data collection occurred several months after intervention group participants
had moved into the preserve, and included: anthropometric measures, accelerometry, communication
network, BMI, food intake, GIS-based built environment factors, GPS logger data, and archival data on
regional planning, traffic and density. Anthropometric measurements were collected twice, and the
averages of the measurements were used for data analysis. A variety of survey scales were also
administered, which characterize the local built environment, the social context of eating and physical
activity, and psychological factors relating to diet and physical activity. Information on age, sex,
ethnicity, and other demographic factors was collected through self-report surveys.
Figure 1.1 Hypothesized mediation model for the Healthy Places Study, adapted from Pentz et al. (2010)
1.2.2 Assessment of Mediators
The three mediators of interest in the Healthy Places Study are represented as P, S and E as in
Figure 1.1. The personal factor (P) measures health behaviors related to personal aspects of
physical activity, comprising the following aspects: 1) personal attitudes about PA, such as PA is
enjoyable (Courneya and McAuley, 1995); 2) personal meanings of PA, such as it helps me make friends
or deal with stress; 3) intentions for PA, such as I plan to watch less TV or engage in more PA; and 4)
self-efficacy for PA, such as I would exercise even if I am very busy (Motl et al., 2000). As Figure 1.1
indicates, it is expected that the effect of changes in the built environment (E) on physical activity will be
mediated through this personal factor, as well as the social factor which is described below.
The social (S) factor measures health behaviors related to social influences on physical activity
and diet. The first aspect of the social factor corresponds to social norms regarding physical activity. The
survey items used to assess this aspect quantify such things as the number of similar-aged peers (out of
every 100) who exercise or engage in different levels of physical activity per day, or how many of one's
closest friends engage in vigorous physical activity each day. The second aspect corresponds to family
norms, such as whether family members engage in physical activity together or if parents reward/praise
their children for engaging in physical activity. The third aspect deals directly with one's social network,
such as how often, and with how many important people in one's life, a study participant engages in
physical activity. This last aspect is quantified by adapting tools from Social Network Analysis (Motl,
1984). It is also expected that the S factor will mediate the effects of the built environment on PA, where
the built environment is quantified as follows.
Neighborhood walkability (or simply, walkability) was used as a measure of the built
environment (the E factor), determined via use of the Neighborhood Environment Walkability Scale (or NEWS,
www.drjamessallis.sdsu.edu). The NEWS consists of 8 subscales, which characterize different features
of an environment that determine its walkability: a) residential density; b) proximity to nonresidential
land use, such as shopping or dining areas; c) ease of access to nonresidential land uses; d) street
connectivity; e) infrastructure and safety for walking and cycling, such as sidewalks and bike lanes; f)
neighborhood aesthetics, e.g. trees or attractiveness of buildings; g) safety for walking and cycling; and h)
safety from crime. Items c-h are measured on a 4-point Likert scale (strongly disagree to strongly agree).
Residential density is assessed by asking about the frequency of various types of residences, such as
single-family homes or apartment complexes. The land use proximity variable is measured in terms of
how long a walk would be necessary to reach a variety of nearby facilities, such as banks, schools,
markets and restaurants. As a means of quantifying social interaction while walking, the HPS added an
additional variable asking participants to rate, on the same 4-point Likert scale, their agreement with the
statement: “I see and speak to other people when I am walking in my neighborhood.”
Cerin et al. (2005) provide their recommendations for how to score items in the NEWS to
quantify walkability, based on the results of a confirmatory factor analysis and assessment of criterion
validity for the complete survey. Adams et al. (2009) later validated the NEWS against GIS-based
measures, and found weak-to-moderate concordance between the two, with the strongest agreement
occurring among adults who are most physically active (and thus most likely to correctly perceive facets
of the built environment related to walkability). For the purposes of the analyses presented in this
dissertation, walkability was quantified in one of the following two manners. First, a mean score was
computed for the responses to the items corresponding to subscales e-h and the social interaction item,
with items reverse-coded as necessary. This was then treated as an index of walkability for analysis
purposes. The second approach developed a factor model for the same set of items, and treated
walkability as a latent variable. The former approach is used to illustrate the methods described in
sections 2 and 3. The latter approach will be utilized in the work that appears in section 3.4.
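
The first scoring approach can be sketched as follows. This is only an illustration under stated assumptions: the item names, the choice of which items are reverse-coded, and the example responses are all hypothetical, and the sketch assumes the 4-point scale is coded 1 (strongly disagree) to 4 (strongly agree) with no missing responses.

```python
from statistics import mean

def reverse_code(value, scale_min=1, scale_max=4):
    """Flip a Likert response so that higher always means more walkable."""
    return scale_min + scale_max - value

def walkability_index(responses, reverse_items):
    """Mean item score after reverse-coding the negatively worded items."""
    scored = [
        reverse_code(value) if item in reverse_items else value
        for item, value in responses.items()
    ]
    return mean(scored)

# Hypothetical responses for one participant (1 = strongly disagree, 4 = strongly agree).
responses = {
    "sidewalks_present": 4,
    "attractive_buildings": 3,
    "heavy_traffic": 2,       # negatively worded, so reverse-coded
    "high_crime": 1,          # negatively worded, so reverse-coded
    "social_interaction": 3,  # the item added by the HPS
}
index = walkability_index(responses, reverse_items={"heavy_traffic", "high_crime"})
```

Under these assumptions the index stays on the original 1 to 4 scale, which keeps it comparable across participants who answered the same set of items.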
Although this metric differs from those computed using Geographic Information Systems, the
latter often have the disadvantage of indicating only single aspects of neighborhood walkability, such as
measuring only street connectivity or land-use mix. The NEWS-based walkability index has the dual
advantage of capturing important features of the local environment which make an environment more or
less walkable, as well as summarizing the features in such a way that their collective impact may be
assessed. While this ends up being a perceived rather than objective measure of neighborhood walkability,
one may properly account for the differing perceptions between individuals by explicitly modeling the
measurement error from the different NEWS subscales, although the individual items have been shown to
correspond well to objective measures of related subscales (Adams et al., 2009). Accounting for such
measurement error allows analysts to combine information from multiple, imperfect measurements in
hopes of more accurately representing an exposure than could be achieved by a single measurement. In
situations where a factor of interest is not directly measurable, or for which no commonly accepted
objective measurement exists (the debate over objective measures of PA or environment is quite lively:
Hoehner et al., 2005), using a well-designed survey scale in conjunction with a single objective measure
may be the best alternative.
1.2.3 Outcomes Assessment
Three different outcomes were measured as part of the Healthy Places Study, two of which were
weight-related outcomes. Participants' weight and height were measured twice using an electronically
calibrated digital scale (Tanita WB-110A) and stadiometer (PE-AIM-101) to the nearest 0.1 kg and cm,
respectively. BMI was subsequently computed for each of the two measurements using the standard
formula (kg/m²), and the resulting average of the two was taken as a participant's BMI. Waist
circumference was also measured twice, in cm. Although two measures of body composition
were collected, it is suspected that waist circumference may be a more relevant indicator of obesity-
related health conditions. For instance, Jansen et al. (2004) found that obese, overweight and normal-
weight US adults (per the CDC BMI thresholds) who had the same waist circumference also had roughly
the same risk for obesity-related diseases such as hypertension and metabolic syndrome. Jacobs et al.
(2010) also found that - among individuals deemed either obese, overweight or normal weight using BMI
thresholds - increasing waist circumference was related to increased risk of mortality due to several
causes. That is, increasing waist circumference resulted in increased mortality risk, regardless of one's
obesity status as determined by BMI thresholds. As such, we expect to see stronger associations for waist
circumference than BMI.
Objective data on physical activity was collected using the Actigraph, Inc., GT2M model activity
monitor. Participants wore the device on the right iliac crest (right hip), attached to an adjustable belt. The
monitor was set to record measurements every 30 s, and recording intervals with ≥ 60 minutes of
consecutive zero activity counts were classified as non-wear per the usual NHANES criteria (Troiano et
al., 2008). Records with activity counts > 16,383 counts/30-seconds, or at speeds greater than 105 mph,
were classified as missing because such values are well out of the range of normal measurements. Non-
missing accelerometer count data was then classified as sedentary, light, moderate, or vigorous using age-
specific thresholds for predicted metabolic equivalents (or METs) using the Freedson equation (Freedson
et al., 2005). For the purposes of data analysis, we used the threshold of METs ≥ 4 to quantify the
number of minutes per day each subject spent in moderate-to-vigorous physical activity (abbreviated
MVPA).
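As a rough illustration of the screening rules just described, the following Python sketch flags non-wear intervals and implausible records in a series of 30-second activity counts. The function and constant names are hypothetical, the speed-based exclusion is omitted, and this is not the processing code used in the Healthy Places Study.

```python
import numpy as np

EPOCH_SECONDS = 30                           # recording interval stated in the text
NONWEAR_EPOCHS = 60 * 60 // EPOCH_SECONDS    # 60 minutes of zeros = 120 epochs
MAX_VALID_COUNT = 16383                      # larger counts per epoch are treated as missing


def flag_epochs(counts):
    """Return boolean arrays (nonwear, implausible) for a 1-D series of 30-s counts."""
    counts = np.asarray(counts)
    implausible = counts > MAX_VALID_COUNT
    nonwear = np.zeros(counts.shape, dtype=bool)
    run_start = None
    # Walk the series once; a non-zero value closes the current run of zeros.
    for i, c in enumerate(np.append(counts, 1)):  # sentinel closes a trailing run
        if c == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= NONWEAR_EPOCHS:
                nonwear[run_start:i] = True
            run_start = None
    return nonwear, implausible


# Hypothetical series: a 60-minute zero run, a 59.5-minute zero run, one spike.
example = [100] * 10 + [0] * 120 + [50] + [0] * 119 + [20000]
nonwear, implausible = flag_epochs(example)
```

Only the first run of zeros reaches the 60-minute threshold, so only those 120 epochs are flagged as non-wear; the shorter run is retained as observed zero activity.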
1.3 Statistical Literature Review
1.3.1 Classical Mediation Analysis
The notion of mediation dates back at least as far as Hume's (1748) celebrated analyses of
induction and the nature of human knowledge, embodied in his work: An Enquiry Concerning Human
Understanding. As an example of reasoning by induction, he posits that we believe the sun will rise and
set once every 24 hours, purely on the basis that in our past experiences it has always done so. However,
a visit to Alaska during the summer months can quickly dispel this belief. Hume generalizes this by
claiming that we draw conclusions about the world by using repeated experiences (regarding induction) of
spatially and chronologically ordered events - say A leads to B which leads to C - to infer (regarding
causation) that whenever A occurs B will also occur, thus causing C to also occur. In these most basic of
terms, one may infer that B is a mediator of the causal relationship between A and C, on the basis that the
same sequence of events (the unconfirmed truth) has occurred across repeated experiences (or samples).
In his collection of essays Objective Knowledge, Karl Popper (1972) responds to Hume's
arguments by formalizing what he calls “objective” knowledge. In summary, we conclude a theory (or
test statement) about a phenomenon is more or less true by subjecting it to test conditions, and use the
results of the test to determine whether the theory is worth pursuing further. Theories that have withstood
the most rigorous and numerous testing are then “preferred” to those which have been refuted, and are
then subjected to ever more testing, and so the search for truth moves forward through aggregate
examination. Ignoring for the moment the various caveats associated with the ideas of Hume and Popper,
they provide a philosophical basis for applying the science of statistics to samples (or experiences) from
the population (or truth) of all possible mediating mechanisms which lead from A to C, through B. In
essence, the practice of statistical mediation analysis is to propose theoretical models of how a
phenomenon works (or test statements), and then pose statistical tests to decide which models are (to use
Popper's terms) preferable and which are not.
The formulation of statistical mediation analysis dates back to Sewall Wright in 1920, when he
worked for the United States Department of Agriculture to examine the relative importance of genetic and
environmental determinants of the piebald pattern of guinea pigs. Wright (1921) would later describe a
statistical approach to modeling causal relationships between a set of common causes for a given outcome,
using nothing more than sample correlations, thus laying the foundation for what has come to be known
as path analysis. Researchers in the social sciences recognized the usefulness of mediation models as
early as 1928, in terms of what Woodworth called Stimulus-Organism-Response models in
experimental psychology. Many years later, Judd and Kenny (1981) laid the groundwork for what was
to become a great confluence of thinking on mediation analysis across the social sciences, for which an
excellent review can be found in MacKinnon's 2008 book on the subject. What they outline is essentially
the application of statistical mediation analysis to the evaluation of treatments, proposing the following
three criteria to demonstrate mediation.
First, the treatment must have some effect on the outcome of interest. They argue, however, that
mediation analysis may still be useful if no significant treatment effect is detected – examining the role of
mediators may help one to understand why the treatment was “ineffective”, and thus illuminate aspects of
the treatment that need to be tweaked in order to have the expected impact. Second, any variable in a
given position of a causal chain (see Figure 1.1) must affect the variable that follows in the chain, when
all antecedent variables (including treatment) are controlled. The third condition is a sufficiency
condition for mediation, which states that the treatment under consideration “exerts no effect upon the
outcome when the mediating variables are controlled (which thus demonstrates that) the hypothesized
chain accounts for all of the relationship between the treatment and outcome.”
Baron and Kenny (1986) parallel these arguments, but instead discussed mediation in the context
of the simplification represented in Figure 1.2 (rather than the more general setting Judd and Kenny
described for any number of mediators). Equations 1.1 and 1.2 describe the relationship depicted in
Figure 1.2, and this can be further simplified using the reduced-form expression in equation 1.3. They also
couched the discussion in terms of a mediator “accounting for variation” in the outcome rather than
“affecting” it. These two formulations could be alternatively stated as “associated with” and “is a cause
of”, respectively. The latter formulation has recently been studied in some detail by employing Imai's
“Rule of Sequential Ignorability” (2010), which amounts to a set of conditional independence
assumptions similar to those seen in the instrumental variable (IV) literature in economics (Johnston and
DiNardo, 1978). All debates about how to best operationalize mediation aside, it is evident that
examining mediation in some form is important for characterizing the effects of the built environment on
weight outcomes, as is argued above.
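(1.1) $Y = i_1 + \beta M + \gamma' X + u$
(1.2) $M = i_2 + \alpha X + v$
(1.3) $\mathbf{y} = B\mathbf{y} + \Gamma\mathbf{x} + \mathbf{e}$, where
$B = \begin{pmatrix} 0 & \beta \\ 0 & 0 \end{pmatrix}$, $\Gamma = \begin{pmatrix} i_1 & \gamma' \\ i_2 & \alpha \end{pmatrix}$, $\mathbf{y} = \begin{pmatrix} Y \\ M \end{pmatrix}$, $\mathbf{x} = \begin{pmatrix} 1 \\ X \end{pmatrix}$, $\mathbf{e} = \begin{pmatrix} u \\ v \end{pmatrix}$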
Figure 1.2. Hypothetical mediation model
Since mediation is concerned with teasing out specific effects from a potentially complicated
model, certain assumptions have traditionally been used to simplify its statistical machinery, as well as
clarify the grounds upon which conclusions about mediation can be drawn. That the model has
theoretical relevance is arguably the most important assumption. Following Popper, before one can
subject a theory (or test statement) to a test, there must be a very compelling reason to believe that the
theory might actually be true. Ignoring for the moment the centuries-old debate over the Protagorean
view that man is the measure of all things, it is assumed that theoretical relevance can be determined
collectively by the community of scholars in a particular discipline.
The second assumption is particularly germane to Hume's formulation of the induction problem:
one must be able to observe X occurring before Y, in order to conclude that X causes Y. This temporality
assumption has two interrelated aspects: 1) X must precede M, which must precede Y; and 2) X must be
measured before M, which must be measured before Y. The third assumption is that the effects which are
estimated are causal. Failure to satisfy this assumption does not, however, invalidate any significant
associations that are detected. Rather, it relegates such associations to play a descriptive rather than
explanatory role – the association between X and Y is (at least partially) transmitted through M.
Normality assumptions are typically made about the probability distributions of X, M, Y and the product
αβ. Mediation models are usually assumed to be recursive: B is triangular and COV(u,v)=0. This is
typically done because non-recursive models allow for the possibility of reverse causality, or Y affecting
either X or M. The last, and most easily relaxed (or ignored), assumptions are that the functional form of
the model is correct and that no excluded variables affect the relations specified in the model.
To quantify what is meant by mediation, Wright (1921) demonstrated that the product of
coefficients in a mediation path (αβ) must be mathematically equivalent to the difference (γ − γ′) between
the treatment effects with (γ′) and without (γ) including all the mediators along the path, what Duncan
(1966) dubbed the “calculus for indirect effects.” This term is better known as effect decomposition, and
is easily demonstrated for the classical model if we write α, β, γ and γ′ in terms of covariances as Wright
did. Fox (1980) demonstrated effect decomposition formally for any number of mediators, and further
argued that methods from simultaneous equations models could be utilized to efficiently estimate
parameters of a mediation model, even in the presence of measurement error. The formalization of effect
decomposition by Duncan and others (for a comprehensive review see Bollen, 1989) indicates that the
model in Figure 1.2 “implies” a covariance structure between endogenous and exogenous variables in a
structural equation model (SEM). Endogenous variables are those whose operations are completely
determined within the proposed model, while exogenous variables are determined external to the model
(such as from a measurement device or survey instrument). The implied covariance matrix is typically
represented by equation 1.4, where Φ and Ψ are the covariance matrices of x and e, respectively.
(1.4) $\Sigma_{yy} = (I - B)^{-1}\,(\Gamma \Phi \Gamma' + \Psi)\,\left[(I - B)^{-1}\right]'$
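To make the reduced form and the implied covariance matrix concrete, the following Python sketch evaluates equations 1.3 and 1.4 for the single-mediator model. All numerical values are hypothetical and chosen only for illustration; the variable names are likewise assumptions. Because x contains the constant 1, the corresponding diagonal element of Φ is set to zero.

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
i1, i2 = 1.0, 2.0                      # intercepts in equations 1.1 and 1.2
alpha, beta, gamma_prime = 0.5, 0.4, 0.3
var_x, var_u, var_v = 2.0, 1.0, 1.5

B = np.array([[0.0, beta],
              [0.0, 0.0]])             # acts on y = (Y, M)'
Gamma = np.array([[i1, gamma_prime],
                  [i2, alpha]])        # acts on x = (1, X)'
Phi = np.diag([0.0, var_x])            # the constant term has zero variance
Psi = np.diag([var_u, var_v])          # diagonal Psi: fully recursive model

inv = np.linalg.inv(np.eye(2) - B)
reduced_form = inv @ Gamma             # coefficients of y on x after solving for y
Sigma_yy = inv @ (Gamma @ Phi @ Gamma.T + Psi) @ inv.T   # equation 1.4

# Effect decomposition: the reduced-form effect of X on Y is gamma' + alpha * beta.
total_effect = reduced_form[0, 1]
indirect_effect = total_effect - gamma_prime
```

Under these assumed values the reduced-form coefficient of X on Y equals γ′ + αβ, so the difference between the total and direct effects recovers the product αβ.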
Fox further asserts that effect decomposition holds true whether a model is recursive or not.
Formally stated, a fully recursive model is one for which B is triangular and the errors in the different
parts of the model are uncorrelated (or Ψ is diagonal). Models with triangular B but non-diagonal Ψ are
sometimes called 'partially recursive,' and have been considered in some detail in economics (Chesher,
2003). For example, the model depicted in Figure 1.2 is recursive, but would be non-recursive if B is not
triangular or Ψ is not diagonal. Parameters of non-recursive models cannot be consistently estimated
using maximum likelihood or least squares techniques, as they produce biased estimates for the
endogenous parameters (Johnston, 1972). Fox (1979) instead recommends using two-stage least squares
or instrumental variable estimators, following the econometrics literature, to correct for this bias. It is
assumed throughout the remainder of the text that any mediation models of interest are fully recursive.
Fully recursive models are particularly easy to estimate in the classical sense, because the
parameters of interest can be expressed purely in terms of covariances (and so as a function of observed
data and not the parameter space), thus uniquely identifying the parameters of the model - a property
known more commonly as “identifiability”. It is worth noting that the recursive rule is only a sufficient
condition for model identifiability, as partially recursive models can be identified given appropriate rank
conditions. Other rules for identifiability are either necessary but not sufficient (t-rule and order
condition), sufficient but effectively trivial (B=0), or necessary and sufficient but non-trivial to establish
(rank condition). Bollen (1989) provides an excellent summary of these identifiability rules. Chesher
(2003) described rank conditions for partially recursive models, which he calls non-separable, and this
work was extended separately by Ma and Koenker (2006) and Lee (2007) as what they named the
“control function/variate” (or CV) approach. The partially recursive model that Lee describes uses the t-
rule, while Ma and Koenker make reference to Chesher's rank condition regarding identifiability of the
CV approach. The CV method is contrasted with two other two-stage approaches: the least squares
quantile regression (LSQR) and the dual-stage quantile regression (DSQR) estimators. Thus, a natural
starting point for the current work is to examine the properties of the LSQR, DSQR and CV estimators
described in the sections to follow.
Under suitable identifiability conditions, parameters of mediation models have been estimated
using a number of techniques. The earliest technique, outlined by Wright (1920), exploits formulae for
the partial correlations between all variables in a path diagram such as that in Figure 1.2. While this
approach is convenient and easy to understand, it becomes quite onerous once the number of variables or
equations in the path grows large, or latent variables are introduced to account for measurement error. To
deal with the issues of dimensionality and measurement error, Joreskog (1970), Keesling (1972) and
Wiley (1973) collectively proposed the JKW covariance structure analysis model (CSA hereafter) as a
robust solution to resolve the problems of estimating parameters from a complex path diagram with latent
variables. Bentler and Weeks (1980) proposed the eponymous Bentler-Weeks model, which estimates the
parameters in B using a variety of methods, as implemented in the associated software package EQS. For
models that are not recursive, Fox (1980) proposes using two-stage least squares or instrumental variables
estimators, and econometricians have proposed other approaches as well (Chesher, 2003), although these
methods do not clearly extend to cases with measurement error or multilevel data structures. Mediation is
then typically quantified using the product of coefficients along the proposed path (i.e. αβ from Figure
1.2), because it has optimal properties compared to using the difference γ − γ′ (MacKinnon et al., 2002).
Inference about mediation exists for either the product or difference operationalization, but
MacKinnon et al. (2002) demonstrated that inference techniques for the product are generally superior,
mainly because the sampling distribution of the product is much tighter than for the difference. Because
of the optimality of the product over the difference, the product approach has become the generally
accepted approach by the community of scientists that employ mediation. Sobel's multivariate delta
representation (1982; 1986) has become the canonical test for mediation (Equation 1.5), with many
offshoots for a variety of mediation problems more complex than Figure 1.2. However, MacKinnon et al.
(2004) indicate that this asymptotic representation may not be entirely correct. If one assumes that
estimators of α and β are asymptotically normal, then their product is not normally distributed, since the
normal distribution is not closed under multiplication. In fact, the distribution of the product of two
normally distributed random variables is not even symmetric, and so it is naïve to employ the
conventional normality assumptions as in the classical Sobel test. Instead, MacKinnon et al. (2004)
propose testing αβ=0 by first assuming that the estimators of α and β are asymptotically normal, and then
comparing αβ/(σ_α σ_β) to the appropriate percentile from the resulting asymmetric distribution of αβ. Though
they showed that testing under the asymmetric distribution achieves the nominal Type I error and has
greater power than the Sobel test (which has very low Type I error), this approach for testing mediation
has not been widely adopted. Although a wide array of other tests of mediation exist, the Sobel test has
been (and still remains) the method of choice in the literature.
(1.5) $\sqrt{N}\,(\hat{a}\hat{b} - ab) \xrightarrow{d} N\left(0,\; \sigma_b^2 a^2 + \sigma_a^2 b^2\right)$
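For readers who wish to see the arithmetic behind equation 1.5, the Python sketch below computes the first-order delta-method z statistic and a two-sided normal p-value for the product of coefficients. The function name and the numerical inputs are hypothetical, and the standard errors are assumed to already reflect the sample size.

```python
import math
from statistics import NormalDist


def sobel_test(a, se_a, b, se_b):
    """First-order delta-method z statistic and two-sided p-value for H0: ab = 0."""
    se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
    z = (a * b) / se_ab
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Hypothetical estimates and standard errors, for illustration only.
z, p = sobel_test(a=0.50, se_a=0.10, b=0.40, se_b=0.10)
```

The p-value here relies on the normal reference distribution, which is exactly the approximation that MacKinnon et al. (2004) call into question for the product of two normal estimators.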
1.3.2 Bayesian Mediation Analysis
Work on mediation analysis has recently fallen under the Bayesian purview. Lee (2007)
summarized the main limitations of CSA approaches to SEM, and in turn mediation. A variety of
complications (including non-linearity, missing data, correlated observations, etc.) pose serious problems for
the classical CSA techniques. For instance, nonlinearity in the latent variables implies non-normality of
endogenous and observed variables, and so the covariance is no longer sufficient to estimate the
parameters of interest. Moreover, correlated data (such as that arising from spatial or temporal clustering
of sample observations) present severe problems for CSA-based methods. To address problems such as
these, one needs a modeling approach that allows for estimation utilizing the observed data rather than
just the implied covariance structure of the proposed model, so that more flexible modeling approaches
can be developed. Both Bayesian and Structural Equation Models are readily formulated as multilevel
models (Gelman et al, 2003; Curran, 2003), and so a natural way forward is to develop Bayesian
approaches to estimation and inference for mediation.
Yuan and MacKinnon (2009) echo Lee's criticisms of CSA-based methods, emphasizing that the
canonical inference techniques for indirect effects are not correct, and that development of estimation and
inference techniques for multilevel mediation models has been plagued by a wide array of difficulties
(Krull and MacKinnon, 2001; Kenny et al., 2003). To deal with these difficulties, they extend the ideas
of Lee and formalize a Bayesian approach for classical mediation analysis. By exploiting the naturally
hierarchical structure of Bayesian models (Gelman et al, 2003), they write the mediation model in
equations 1.1-1.2 as a hierarchical model (equations 1.6-1.7). The parameters α₀, α, β₀, β and γ′ are then
sampled from non-informative prior distributions (vague normals) and the variance parameters are
sampled from vague inverse gamma distributions, and Gibbs sampling is then used to generate the joint
posterior distribution, from which estimates and 95% credible intervals of αβ can be obtained. This
model could easily be enriched to account for measurement error in any of Y, M or X, by employing the
strategies detailed by Lee (2007).
(1.6) $Y \sim N(\mu_Y, \sigma_Y^2)$, where $\mu_Y = \beta_0 + \beta M + \gamma' X$
(1.7) $M \sim N(\mu_M, \sigma_M^2)$, where $\mu_M = \alpha_0 + \alpha X$
Yuan and MacKinnon (2009) demonstrate that this model can easily extend to the case where
data are clustered across space or time. When variables of interest are not necessarily observable, one can
extend Bayesian SEM approaches (Lee, 2007) to deal with mediation models which contain latent
variables. One could then easily envision adding another wrinkle to the Bayesian approach, by allowing
for the estimation of quantiles of the response variable in the presence of latent variables occurring for
observations which are correlated in space and time. One may further argue that the types of mediation
models which may be estimated in a Bayesian fashion are virtually unlimited - see Lee (2007) for a
summary of reasons why – and so Bayesian approaches are heavily entertained throughout this
dissertation.
1.3.3 Methodological Challenges for Mediation Analysis
Even after enriching the model using a Bayesian paradigm, in the spirit of Yuan and MacKinnon
(2009), there are still several limitations with conventional statistical mediation techniques. A severe
limitation is that the response variable may not be normally distributed. MacKinnon (2008, pp. 297-323)
discusses some of the issues involved when the response variable is binary, and details possible solutions
for the binary case. This problem is perhaps even better dealt with using a fully Bayesian approach, since
there would be no need to derive asymptotic forms for the variance of the product when the outcome is
binary, as one could simply sample from the posterior distribution resulting from an appropriately
specified model.
However, normality is an unnecessarily restrictive assumption even when the response variable is
continuous. The response distribution may be irreconcilably skewed or heavy-tailed, in which case
normality cannot be achieved. Moreover, the conditional response distributions (say, by treatment status)
may each be symmetric but still have different shapes. In such a situation the mean difference between
treatment and control groups will not be the same as the difference at an extreme quantile of the response
distribution. This results in a violation of the constant variance (or homoscedasticity) assumption
typically employed in conventional linear regression. Both the problems of skewness and
heteroscedasticity pose serious challenges, and more robust methods are needed which can properly
account for either one in a reasonable fashion. Quantile regression (Koenker and Bassett, 1978) provides
a robust estimation framework which properly incorporates information about both the location and shape
of the conditional distribution of the response.
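The point about symmetric response distributions with different shapes can be illustrated with a small Python sketch. The two normal distributions below are hypothetical stand-ins for control and treatment groups; they share a distributional family but differ in spread, so the treatment effect at the mean differs from the effect at the tails.

```python
from statistics import NormalDist

control = NormalDist(mu=0.0, sigma=1.0)   # hypothetical control-group response
treated = NormalDist(mu=0.5, sigma=2.0)   # symmetric, but with larger spread

mean_difference = treated.mean - control.mean


def quantile_difference(tau):
    """Horizontal distance between the two distribution functions at quantile tau."""
    return treated.inv_cdf(tau) - control.inv_cdf(tau)


effect_at_median = quantile_difference(0.5)
effect_at_upper_tail = quantile_difference(0.9)
effect_at_lower_tail = quantile_difference(0.1)
```

In this assumed setting the median difference coincides with the mean difference, while the difference at the 0.9 quantile is more than three times as large and the difference at the 0.1 quantile is negative, none of which a comparison of means would reveal.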
Other challenges arise when the model is non-linear (in covariates or parameters), but these can
be remedied by employing the Bayesian methods described by Lee (2007) without much additional effort.
When the model is not fully recursive, there may be potential sources of reverse causality, such as when
Ψ is not diagonal. A more daunting challenge is that the mediator or the treatment may actually have no
effect for subjects whose response values lie near the center of the distribution. This would indicate that
the treatment, or the mediator, have no effect until some upper (or lower) response threshold is crossed.
For example, an exercise program intended to reduce body weight (through increasing PA) may have a
much greater effect among overweight and obese subjects than normal weight. In this instance,
conventional mediation analysis techniques may not detect any mediated effect at the mean of the
response distribution, regardless of how mediation is estimated or tested. Methods are thus needed which
can capture these different effects for any part of the response distribution, and quantile regression will
play an important role in their development.
Inferential approaches such as the Sobel test are also problematic in that they are not actually tests
of mediation, as defined by assumptions 1-6 above, but are instead tests of whether the product of
coefficients (αβ) is zero. Causal inference alternatives to the SEM-based approaches are numerous (e.g.
Imai et al., 2010), and they attempt to specify conditions under which causal mediation may be
determined statistically. Regarding this point, Hume (1748) argues that our understanding of how causes
lead to effects can only be determined by consistently observing the same “conjoining” (to use Hume's
words) of events in time and space under every possible condition. However, this process can never lead
us to conclude that a given cause will always lead to a specific effect, because we will never be able to
observe every possible set of conditions. Hume's example of the sun rising and setting once every 24
hours does an especially good job illustrating how every theory can be easily contradicted if one simply
looks for a counterexample. He goes on to argue that, as a result, belief plays an inextricable role in our
knowledge about the world. Causal inference techniques, such as the estimator of the quantile mediation
effect of Imai et al. (2010), attempt to circumvent this natural limitation by dealing explicitly with the
unobservable outcomes. Key differences between the approaches described in this dissertation, and Imai
et al.'s, will be further elucidated in the sections that follow.
In epidemiology, Hill (1965) outlined a set of conditions (or at least some minimal subset thereof)
which must be satisfied in order to determine causality, and these sound surprisingly similar to Hume's
conditions for determining that a consistently observed “conjoining” of cause and effect is representative
of nature (Morabia, 1991). Although statistical methods for assessing causality have been well-
articulated elsewhere (Rubin, 2004; Imai et al., 2010; Pearl, 2012), the balance of this dissertation aligns
with Hume's worldview and acknowledges that causality can never be fully established using statistical
methods. At best, we can approximate the true state of nature following the process described by Popper
(1972), by formulating theories which can undergo rigorous testing and then “preferring” those which
have withstood the most tests. However, if one follows Popper's discussion to its end, one arrives back at
one of the major limitations of epidemiological research: researchers are often only able to observe rather
than experiment. In the context of many prevention studies which involve explicit intervention, for which
mediation is often the analytic method of choice, one more closely approaches the conditions under which
Popper's “test statements” may be properly defined and tested. In general, though, one is simply left with
trying to establish as many of the conditions as possible laid out by Hill and Hume. While this
dissertation does not explicitly deal with the mathematical conditions that exist for causal inference
methods (Rubin, 2004; Pearl, 2012; Imai et al., 2010; VanderWeele, 2009), it acknowledges such
methods provide possible alternatives, and provides comparisons where necessary (e.g. Imai et al, 2010).
1.3.4 Quantile Regression
Quantile regression is rooted in the problem of estimating a quantile treatment effect (QTE),
extending Doksum's (1974) notion of a two-sample treatment effect between two groups having
distribution functions denoted by F(x) and G(x) = F(x+Δx). A QTE would then be defined as the
horizontal distance between two distribution functions at a given quantile of interest, where for the two-
sample case the Kolmogorov-Smirnov test provides an omnibus test of whether the response differs by
treatment status at any quantile of the response distribution. In particular, Huber (1982) asserts that the
median is a more “robust” estimate of the location parameter of a distribution when the classical
assumption of normality fails. Even in instances when exchangeability of a sample holds true, the mean
may not exist when the distribution has arbitrarily heavy tails (i.e. a Cauchy distributed random variable).
Anscombe (1960) discusses the problem of how and when to decide whether apparently outlying
observations from a random sample should be rejected in order to attain normality, and couches his
arguments in terms of an insurance policy:
A rejection rule (for outliers) is like a householder's fire insurance policy. Three questions to be considered
in choosing a policy are:
(1) What is the premium?
(2) How much protection does the policy give in the event of fire?
(3) How much danger really is there of a fire?
Item (3) corresponds to the study of whether spurious readings occur in fact - a study that is hardly possible
unless plenty of readings are available. The householder, satisfied that fires do occur, does not bother much
about (3), provided the premium seems moderate and the protection good. In what currency can we express
the premium charged and the protection afforded by a rejection rule? That depends on the purpose of the
observations; an answer can be given as soon as a suitable loss function is specified.
He goes on to argue that, for problems involving estimation of a location parameter, we should
quantify the currency of the premium as “the percentage increase in the variance of estimation errors due
to using the rejection rule, when in fact all the observations come from a homogeneous normal source.”
Although this premium is not always easily calculated for finite samples, its asymptotic value can be
easily quantified by using Asymptotic Relative Efficiency (ARE). A classic example of this compares the
Wilcoxon/Mann-Whitney rank tests to the t-test (Koenker, 2006, pp. 83-84), for testing whether the
location parameters of the distributions from two populations are equal. Comparing the ARE of the
Wilcoxon test and t-test under a variety of underlying response distributions, one easily sees that the
premium payable for using the Wilcoxon test is only about 5% when the underlying distributions are
normal. When the distributions are skewed (e.g. Log-Normal), the cost of using the t-test is about 735%
of what it would have been had we used the Wilcoxon test instead, and the cost of using the t-test is at least twice that of the
Wilcoxon if the underlying distributions are heavy-tailed (Cauchy, or t-distributed with 2 or 3 d.f.).
Clearly, even when the symmetry assumption is satisfied, the 5% premium one must pay for using the
Wilcoxon test under Normality is far outweighed by the additional costs incurred when the underlying distributions are
skewed or heavy-tailed.
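The roughly 5% premium under normality can be checked numerically. The Python sketch below uses the standard Pitman efficiency expression for the Wilcoxon test relative to the t-test, 12σ²(∫f²)², which is an assumption brought in from the general nonparametric literature rather than a formula stated in this text; the function names and integration grid are likewise hypothetical.

```python
import numpy as np


def wilcoxon_vs_t_are(pdf, variance, lo, hi, n_grid=200_001):
    """Pitman ARE of the Wilcoxon test relative to the t-test:
    12 * variance * (integral of f^2)^2, with the integral taken as a Riemann sum."""
    x = np.linspace(lo, hi, n_grid)
    dx = x[1] - x[0]
    integral_f2 = np.sum(pdf(x) ** 2) * dx
    return 12.0 * variance * integral_f2 ** 2


def normal_pdf(x):
    """Standard normal density."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2.0 * np.pi)


are_normal = wilcoxon_vs_t_are(normal_pdf, variance=1.0, lo=-10.0, hi=10.0)
premium = 1.0 - are_normal     # efficiency given up by using the Wilcoxon test
```

For the standard normal the expression reduces to 3/π, so the premium is a little under 5%, in line with the figure quoted above.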
This result ought to extend naturally to the regression context, where the classical approaches to
linear regression assume homoscedasticity, or constant variance in the outcome relative to a given
predictor. Koenker and Bassett (1978) formally introduced the concept of regression quantiles, by
generalizing the theory of order statistics (David, 1981) to the regression context. Briefly, a regression
quantile for the effect of a treatment at quantile τ is the horizontal difference in the distribution functions
(evaluated at τ) of the treatment and control groups. A regression quantile β(τ) is then estimated by the
following minimization program:
(1.8) $\hat{\beta}(\tau) = \arg\min_{\beta,\gamma' \in B \times C} \; n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - \beta X_i)$,
where $\rho_\tau(u) = \tau\, u\, I(u \ge 0) - (1 - \tau)\, u\, I(u < 0)$.
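The role of the check function in equation 1.8 is easiest to see in the intercept-only case, where the minimizer is an ordinary sample quantile. The Python sketch below uses the standard form ρ_τ(u) = u(τ − I(u < 0)) and a brute-force search over the observed values; the function names and simulated data are hypothetical, and a practical implementation would use linear programming rather than this search.

```python
import numpy as np


def check_loss(u, tau):
    """rho_tau(u) = u * (tau - I(u < 0)); non-negative for every u."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))


def sample_quantile_by_minimization(y, tau):
    """Intercept-only regression quantile: search the observed values for the minimizer."""
    y = np.asarray(y, dtype=float)
    losses = [check_loss(y - q, tau).mean() for q in y]
    return y[int(np.argmin(losses))]


rng = np.random.default_rng(1)
y = rng.normal(size=101)                       # simulated responses
q50 = sample_quantile_by_minimization(y, 0.5)  # coincides with the sample median
q90 = sample_quantile_by_minimization(y, 0.9)  # 91st order statistic when n = 101
```

Replacing the single intercept with a linear predictor turns the same objective into the regression quantile problem, with residuals weighted by τ when positive and by 1 − τ when negative.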
For comparison, suppose we have a random sample from the population distribution F(y|X). The
usual least squares estimate and its variance are given by β̂ = (X'X)⁻¹X'y and V(β̂) = (X'X)⁻¹σ_ε². The variance
of the τth regression quantile is given in equation 1.9. It is immediately clear that the ARE of the least
squares and quantile regression estimators is determined by the variance ratio τ(1−τ)[s(τ)]²/σ_ε², whose numerator
contains the square of Tukey's sparsity function (1965): s(τ) = d/dτ[F⁻¹(τ)] = [f(F⁻¹(τ))]⁻¹. Thus, a skewed distribution has more
mass (and thus less sparsity) near one of the extreme quantiles, while a heavy-tailed distribution has less
sparsity near the extremes than an exponentially light-tailed one (e.g. the Normal). More to the point, the
information contained in the sample may not be located at the center of the distribution, but instead near
one of the extreme quantiles (ibid.). Regarding this last point, a fair comparison of the least squares and
quantile regression estimates ought to broaden Anscombe's view of the matter, and account for bias as well.
Beyerlein et al. (2010) provide a fine example of this, by demonstrating that formula feeding during
infancy had no effect on mean BMI at age 6 (at which children enter school in Germany), but that
formula-fed children at the 97th percentile of BMI (which determines obesity in Germany) had BMI of
about 0.2 standard deviations greater at age 6 than those who were not. In other words, formula feeding
tended to make obese children even heavier, while it appeared to have no effect among normal weight
children: specifically, $E(\mathrm{BMI} \mid \text{Formula Fed}) \neq Q_{\mathrm{BMI}}(0.97 \mid \text{Formula Fed})$.
(1.9) $\sqrt{n}\,(\hat{\beta}(\tau) - \beta(\tau)) / \sqrt{\sigma_\beta^2} \sim N(0,1)$, where $\sigma_\beta^2 = \lim_{n \to \infty} (n^{-1}X'X)^{-1}(\tau - \tau^2)\,[s(\tau)]^2$.
Inference for regression quantiles can be conducted using asymptotic approximations, for which
the limiting distributions take the form in equation 1.9. This result is an extension of the asymptotic
distribution of univariate quantiles (David, 1981) to the regression setting. The function s(τ) can be
estimated using a variety of density estimation techniques (Silverman, 1986). For finite samples, this
amounts to comparing the ratio in equation 1.9 to a central t distribution, thus producing a Wald-type test
for quantile regression. Alternative approaches include adaptive kernel density, rank-based, and
resampling techniques for estimating standard errors. Chernozhukov (2005; 2011) indicates that the
normal approximation (equation 1.9) performs poorly at the tails of the response distribution (e.g. τ↗1 or
τ↘0), especially when the response distribution is skewed or heavy-tailed. To combat this problem, he
combines conventional quantile regression approaches (for which central quantiles are well-estimated)
with Extreme Value Theory (EVT for short), by employing a resampling scheme to estimate nuisance
parameters from the corresponding extreme value distributions. Our own experience indicates that
bootstrap-based confidence intervals correspond very well to those from Chernozhukov (2011).
Though the linear programming formulation of quantile regression is computationally efficient,
Chernozhukov (2005) demonstrated that it performs poorly at the extremes. Moreover, it does not easily
extend to the case where observations are correlated across either space (subjects living in the same
neighborhood) or time (subjects have repeated measurements). Geraci and Bottai (2007) proposed a
mixed effects modeling approach, but only for a limited number of random effects estimated in such a
way that does not easily generalize to more complicated models (by virtue of a computationally intensive
MCEM solution). Moreover, existing methods for simultaneous equations models for conditional
quantiles have been limited to the case where data are completely observed (Chesher, 2003; Ma and
Koenker, 2004; Lee, 2007), or are developed using instrumental variables (IV) techniques which may not
be germane to estimating mediation (for the special cases where IV models are suitable see MacKinnon,
2008). When some of the variables of interest are latent factors (due to measurement error or lack of
objective measurement), such approaches cannot be simply extended using conventional SEM techniques
such as CSA, because the interest is in characterizing mediation for quantiles of the response distribution.
Bayesian quantile regression has the potential to rectify many of these problems, as fairly robust
estimation and inference techniques exist for Bayesian models (Gelman et al. 2003). Yu and Moyeed
(2001) showed that the Asymmetric Laplace Distribution (ALD, equation 1.10) properly characterizes
regression quantiles. Their approach can be written as a generalized linear model by letting the link
function be the quantile function (equation 1.12), and the random errors (say, u) follow the ALD with
parameters $\mu = x^{T}\beta(\tau)$ and $\sigma = 1$. Estimation then follows via a single-component Metropolis-Hastings
algorithm (Yu and Moyeed, 2001). A drawback to this approach is that sampling from the ALD is not
straightforward, and resulting conditional distributions do not have standard forms (such as normal or
inverse gamma).
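(1.10) $f_\tau(y; \mu, \sigma) = \frac{\tau(1-\tau)}{\sigma} \exp\left\{-\rho_\tau\left(\frac{y-\mu}{\sigma}\right)\right\}$
(1.11) $\rho_\tau(u) = u\,\{\tau - I(u < 0)\}$
(1.12) $Q_Y(\tau \mid X) = x'\beta(\tau)$, such that $P\{y - x'\beta(\tau) \le Q_Y(\tau)\} = \tau$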
For models such as that embodied in equation 1.3, a Gibbs sampler would be more appropriate
because it samples jointly from the conditional distributions of each free parameter. In order to derive the
conditional distributions of the quantile regression parameters, a random variable U~ALD(μ, σ) can be
represented as a finite mixture of standard normal and exponential random variables as in equation 1.13,
because both U and its mixture equivalent have the same MGFs (Reed and Yu, 2009) and characteristic
functions (Kozumi and Kobayashi, 2009). With the conditional distributions in hand, one is fully
equipped to conduct Bayesian estimation and inference for conditional quantiles in a flexible manner.
(1.13) $u = \frac{1-2\tau}{\tau(1-\tau)} \cdot \omega + \sqrt{\frac{2\omega}{\tau(1-\tau)}} \cdot z$, where $\omega \sim EXP(1)$ and $z \sim N(0,1)$
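The mixture representation can be checked by simulation. The sketch below is illustrative only: it assumes the standard location-scale form u = θω + κ√ω·z with θ = (1−2τ)/(τ(1−τ)) and κ² = 2/(τ(1−τ)), takes μ = 0 and σ = 1, and uses the hypothetical helper name `sample_ald_mixture`. If the representation holds, the share of draws below zero should be close to τ, since the τth quantile of an ALD(0, 1) variable is 0.

```python
import numpy as np


def sample_ald_mixture(tau, size, rng):
    """Draw ALD(0, 1) variates as a normal-exponential mixture."""
    theta = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
    kappa = np.sqrt(2.0 / (tau * (1.0 - tau)))
    omega = rng.exponential(1.0, size)   # omega ~ EXP(1)
    z = rng.standard_normal(size)        # z ~ N(0, 1)
    return theta * omega + kappa * np.sqrt(omega) * z


rng = np.random.default_rng(0)
tau = 0.85
u = sample_ald_mixture(tau, 200_000, rng)

share_below_zero = np.mean(u < 0)                      # should be near tau
theoretical_mean = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))
```

Conditional on ω, u is normal, which is what makes Gibbs-type updates for the regression parameters available in standard form.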
1.3.5 Structural Equation Models for Conditional Quantiles
Equations 1.14-1.15 below describe the conditional quantile analogue of the structural equation
model from equations 1.1 and 1.2, hereafter called the Structural Quantile Treatment Effect (SQTE)
model.
(1.14) Quantile Outcome Model: $Q_Y(\tau \mid X, M) = \beta_0(\tau) + \beta(\tau)M + \gamma'(\tau)X$
(1.15) Mediator Model: $E(M \mid X, Z) = \alpha_0 + \alpha X + \delta Z$
Previous work on this problem emerged in the econometrics literature, with development of three related
estimation methods. The first strain was described originally by Amemiya (1982) and Powell (1983),
who proffered what they called the two-stage least absolute deviations approach (2SLAD). This basically
involved estimating the parameters of equation (1.15) via least squares or maximum likelihood, and then
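substituting the fitted values $\hat{M} = \hat{\alpha}_0 + \hat{\alpha}X + \hat{\delta}Z$ into the objective function used to estimate the outcome
equation. This yields the estimates $\hat{\beta}$ and $\hat{\gamma}'$, which are solutions to equation 1.16.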
(1.16) $\arg\min_{\beta,\gamma' \in B \times C} \left( n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - \beta_0 - \beta \hat{M}_i - \gamma' X_i) \right)$
However, the inference approach proposed by Powell requires the rather arbitrary choice of a value q ∊
[0,1], for which the choice q=0 represents the instrumental variables (IV) interpretation of 2SLAD and
q=1 is the quantile-analogue of two-stage least squares. Kim and Muller (2004) later showed that the
2SLAD approach results in inconsistent estimates for the parameter β(τ), which is corrected if one
estimates conditional quantiles for both M and Y at the same quantile τ.
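The fitted-value (2SLAD-type) procedure described above can be sketched as follows. This is a minimal illustration under assumed simulated data, not the estimator as implemented by Amemiya or Powell: the first stage fits the mediator model by least squares, and the second stage runs a quantile regression of Y on the fitted mediator and X. The helper `quantile_regression`, the parameter values, and the variable names are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Quantile regression coefficients via the standard LP formulation."""
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


# Assumed data-generating values, chosen only for illustration.
rng = np.random.default_rng(1)
n, tau = 400, 0.5
X = rng.normal(size=n)
Z = rng.normal(size=n)                      # extra first-stage predictor
M = 0.5 + 1.0 * X + 0.8 * Z + rng.normal(size=n)
Y = 1.0 + 0.7 * M + 0.3 * X + rng.normal(size=n)

# Stage 1: least squares fit of the mediator model (equation 1.15).
W = np.column_stack([np.ones(n), X, Z])
first_stage_coef, *_ = np.linalg.lstsq(W, M, rcond=None)
M_fitted = W @ first_stage_coef

# Stage 2: quantile regression of Y on the fitted mediator and X (equation 1.16).
D = np.column_stack([np.ones(n), M_fitted, X])
beta0_hat, beta_hat, gamma_hat = quantile_regression(D, Y, tau)
```

Note that the second stage is only identified because Z enters the first stage; without it the fitted mediator would be an exact linear function of X.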
The second approach uses an IV construction. One such approach is detailed in Chernozhukov
and Hansen (2005; 2006), with others to be found in the references therein. It entails finding an
“instrument” X so that the residuals, say $U = Y - \beta(\tau)'X$, satisfy the conditions $U \perp X$ and $\rho_{X,Y} \neq 0$.
Judd and Kenny (1981) indicate that X must have some relationship with Y (in the absence of a mediator
M) to assess mediation, and so IV methods are typically not employed for estimating or testing mediation.
While the estimation of instrumental variable models will not be considered here, interested readers can
see Johnston and Dinardo (1997) which provides an excellent overview of the use of IV models in
econometrics.
The third approach uses control functions, for which basic identifiability conditions were
proposed by Chesher (2003). Chesher's approach involves computing “structural derivatives” from a
two-stage estimation procedure which directly incorporates the error in estimating the structural
relationship in M. However, a major limitation to Chesher's approach is that it becomes very cumbersome
when the number of covariates gets large because the estimator is essentially a partial derivative of an
appropriately defined function, and this function becomes complex in direct proportion to the number of
equations in the model. Ma and Koenker (2006) and Lee (2007) propose alternative approaches not
burdened by the “curse of dimensionality” that Chesher's entails. They estimate the conditional quantile
(or mean) of M given X and Z, and then include both the observed M and the residuals from the first stage
estimation (say V) to estimate β and γ’ by equation 1.17 below. As they demonstrated that the control
function approach has less bias and smaller variance than the fitted value method, its properties will also
be examined with respect to mediation for different quantiles of the response distribution.
(1.17) $\arg\min_{\beta_0,\beta,\gamma',\pi \in \mathbf{R}^4} \left( n^{-1} \sum_{i=1}^{n} \rho_\tau(Y_i - \beta_0 - \beta M_i - \gamma' X_i - \pi V_i) \right)$.
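A rough sketch of the control function idea in equation 1.17 follows, under assumed simulated data in which the outcome error is correlated with the mediator error. It is not the estimator of Ma and Koenker or Lee as published: the first stage here is a conditional mean fit, and the helper `quantile_regression` and all numeric values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Quantile regression coefficients via the standard LP formulation."""
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


# Assumed data-generating values, chosen only for illustration.
rng = np.random.default_rng(2)
n, tau = 400, 0.85
X = rng.normal(size=n)
Z = rng.normal(size=n)
v = rng.normal(size=n)
u = 0.6 * v + rng.normal(size=n)            # outcome error correlated with v
M = 0.5 + 1.0 * X + 0.8 * Z + v
Y = 1.0 + 0.7 * M + 0.3 * X + u

# Stage 1: fit the mediator model and keep the residuals V.
W = np.column_stack([np.ones(n), X, Z])
first_stage_coef, *_ = np.linalg.lstsq(W, M, rcond=None)
V = M - W @ first_stage_coef

# Stage 2: quantile regression of Y on the observed M, X, and the residual V.
D = np.column_stack([np.ones(n), M, X, V])
beta0_hat, beta_hat, gamma_hat, pi_hat = quantile_regression(D, Y, tau)
```

In this setup the coefficient on V absorbs the dependence between the two error terms, so `pi_hat` should land near 0.6 while `beta_hat` stays near 0.7.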
These methods have many merits. First, they are easily implemented using existing software, and
are conceptually easy to understand. Second, they build upon a rich source of quantile regression
literature in econometrics, and have a variety of optimal properties for the situations in which they arise.
However, economists are usually concerned with forecasting and prediction (Johnston, 1997), and so
more sophisticated econometric techniques are often poorly suited to dealing with the association models
epidemiologists concern themselves with. The methods further do not attempt to estimate or test quantile-
specific mediation. They are also not suitable in the presence of measurement error in any of the
variables. SEM approaches have typically dealt with measurement error by treating the underlying
variable of interest as a latent variable, and estimating indirect effects via covariance structure analysis
(CSA) using the implied covariance matrix. Such a problem arises when the underlying variable of
interest is not directly measured, such as the Walkability index described in section 1.2.2. However, the
implied covariance matrix does not sufficiently identify quantile-specific parameters, and so one cannot
simply extend CSA-based approaches from the mediation literature. The matter is further complicated
when one is interested in estimating indirect effects for latent variables in the presence of correlated data
(be that across temporal or spatial units).
Burgette (2011) dealt with one of the aforementioned issues, by developing a Confirmatory
Factor Analysis (CFA) model for conditional quantiles. Building upon the work of Reed and Yu (2009),
he outlines a method for conducting CFA when the outcome of interest is the τth quantile of the
distribution of birthweight, and some of the predictors of interest are latent variables (such as perceived
stress or other psychosocial risk factors). The conditionally Normal distribution representation allows for
a flexible Bayesian approach to quantile-CFA, but does not allow one to characterize mediation. It also
does not allow for modeling the effects of factors across time (repeated measures) or space (e.g. children
from the same school, etc.).
1.4 Proposed Methodology
The underlying motivation for the methods presented in this dissertation concerns how to best
characterize obesity. Among children, obesity is characterized by the 95th percentile of age- and gender-specific
percentile curves constructed from a reference population (Kuczmarski et al., 2002). Moreover,
examining the mediating effects of physical activity on normal weight individuals does not appropriately
characterize such effects among obese individuals, whose BMI measurements will lie at the upper
quantiles of the BMI distribution. To our knowledge, no methods exist which allow for estimation of and
inference about indirect effects in mediation models at quantiles of the response distribution in a manner
which properly accounts for: 1) correlation between multiple mediators along the causal path; 2)
measurement error in mediators; or 3) multilevel data structures. To resolve these methodological
problems, this dissertation proposes the models described below.
The first model provides for the estimation of indirect effects at quantiles of the response
distribution, when all model variables are directly observable. This is achieved via a simple extension of
Baron and Kenny's Causal Steps approach to the quantile regression setting, which parallels the SQTE
model described above by estimating each part of the model in succession (eqs. 1.16-1.17). Though
these SQTE estimation procedures have already been proposed for estimating parameters of SEMs for
quantiles of the response distribution, the proposed methods differ in some very important respects.
Primarily, the SQTE paradigm is not directly concerned with mediation. Rather, the goal is to properly
account for endogeneity in design variables. Estimation of parameters in SQTE models therefore does
not provide any insight into quantile mediation, and the associated inferences can only be drawn about
individual parameters in the model, not whether mediating mechanisms underlie observed associations.
Furthermore, the SQTE paradigm applies only to the relatively simple SEMs represented by equations
1.1-1.2, and does not readily extend to the case of multiple mediators as in Figure 1.1. This model instead
follows the Causal Steps paradigm (Baron and Kenny, 1986) to estimate indirect effects, and devises tests
for whether proposed mediating mechanisms are present. A fully Bayesian approach to estimation and
inference is also proposed to allow for the estimation of multiple specific indirect effects, corresponding
to multiple mediators.
The second model extends the first by allowing for measurement error in any of the model
variables. This is particularly relevant for HPS, where mediators such as personal attitudes about PA are
not directly observable, and so must be operationalized as latent variables which are measured using
appropriate survey scales. The SQTE paradigm clearly fails to account for measurement error, as
does the causal mediation framework of Imai (2010). Burgette's (2011) Bayesian quantile CFA model
properly incorporates measurement error, but does not allow for the characterization of mediational
effects. This model builds upon Burgette's work by allowing for the estimation of a full quantile
structural equation model, by also employing the mixture representation (Reed and Yu, 2009) of the
Asymmetric Laplace Distribution (equations 1.10-1.13). This will then allow for the direct
characterization of the mediation relationships among obese children and adults (whose BMI values are at
the extreme quantiles of the BMI distribution) participating in the HPS.
The third model deals with quantile mediation in the presence of longitudinal or multilevel data
structures. Yuan and MacKinnon (2009) described a simple approach to accommodate multilevel data
structures in mediation analysis, by allowing the parameters to be group-specific, and then assigning the
group-specific parameters appropriately specified prior distributions. The same strategy may be applied
here, augmenting the model defined by equations 1.14 and 1.15 in the same manner as Yuan and
MacKinnon (2009). Several possible approaches are presented and compared, all of which exploit the
hierarchical structure of the resulting model, and thus employ Bayesian techniques for estimation and
inference for the quantile indirect effects. Both WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/) and
our own software were used to implement the models described below.
The balance of this dissertation is organized into the following sections. Section 2 describes the
basic model assumptions upon which all following work builds, as well as two-stage estimation and
inference procedures for quantile mediation with observed variables and only a single mediator. It further
elucidates the SQTE, causal inference, and Causal Steps approaches for testing mediating mechanisms at
quantiles of the response distribution, and wraps up with a discussion of results from a comparative
simulation study. Sections 3.1-3.3 describe the Bayesian quantile mediation model for one or more
mediators and a single outcome which are measured without error, and the associated estimation and
inference techniques based on MCMC methods. Sections 3.4 and 4 continue the discussion of the
proposed Bayesian techniques to handle the following situations: 1) measurement error in the endogenous
variables; and 2) estimating and testing mediation when the independence assumption fails to hold (e.g.
where there is more than one sample observation for individual subjects). Section 5 presents various
analyses of data from the HP Study, using each of the methods presented in sections 2-4. Section 6
concludes the dissertation with a discussion of the data analysis results, areas of future work that still
remain, and a few key points concerning how causal inference methods compare to those proposed in this
dissertation.
2 Quantile Mediation Analysis with Observed Variables
[Some material in this section has been submitted to Multivariate Behavioral Research, and may
appear in that journal as well.]
This section describes methods for estimation and inference for mediation analysis when the
outcome of interest is a quantile of the response distribution. It begins by introducing the basic
assumptions and notation which define the models that follow. Three different approaches to quantile
mediation analysis are then detailed. The first builds upon the two-stage estimation procedures of
Amemiya (1982), Powell (1983) and Kim and Muller (2004), who collectively proposed the least squares
quantile regression (LSQR) and dual-stage quantile regression (DSQR) estimators of the structural
parameters for structural equation models involving conditional quantiles. The second is an extension of
the control function estimators proposed by Chesher (2003), Ma and Koenker (2006) and Lee (2007).
The third extends the Causal Steps approach of Baron and Kenny (1986) to the quantile setting, and all
three approaches are subsequently compared with the Average Causal Mediation Effect (or ACME, for
short) of Imai et al. (2010).
2.1 Model Assumptions
A commonly-used assumption in mediation analysis is that both the theoretical and measurement
timing of the elements of a mediation model occur in an acyclic temporal sequence, which precludes the
possibility of having causal feedback loops, whereby a proposed mediator could affect the treatment
variable or the outcome could affect the mediating variable. Models which exclude causal feedback loops
are known as recursive, and are important for a number of reasons. First, recursive models are
identifiable, in that all model parameters can be estimated via the implied covariance matrix (equation
2.4). Second, the reduced form for a recursive model (e.g. equation 2.3) has triangular parameter matrices
B and Γ, which makes explicit the assumption of no causal feedback loops between variables which occur
(theoretically and as measured) in an ordered temporal sequence. The balance of this dissertation
assumes that all models are recursive, as well as the associated assumptions that variables which are
proposed to follow in a temporally ordered chain are also measured in the corresponding sequence of time
(i.e. X is measured before M whenever X precedes M in the proposed model). We further assume that $X_1$
has at least as many columns as X (say G), such that the two-stage estimation procedures are exactly
identified, akin to Kim and Muller (2004) and Lee (2007).
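(2.1) $Y = i_1 + \beta M + \gamma' X + u$
(2.2) $M = i_2 + \alpha X + v$
(2.3) $\mathbf{y} = B\mathbf{y} + \Gamma\mathbf{x} + \mathbf{e}$, where $B = \begin{pmatrix} 0 & \beta \\ 0 & 0 \end{pmatrix}$, $\Gamma = \begin{pmatrix} i_1 & \gamma' \\ i_2 & \alpha \end{pmatrix}$, $\mathbf{y} = \begin{pmatrix} Y \\ M \end{pmatrix}$, $\mathbf{x} = \begin{pmatrix} 1 \\ X \end{pmatrix}$, $\mathbf{e} = \begin{pmatrix} u \\ v \end{pmatrix}$
(2.4) $\Sigma_{yy} = (I - B)^{-1}(\Gamma \Phi_{xx} \Gamma' + \Psi_{ee})\,[(I - B)^{-1}]'$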
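As a small numerical illustration of the implied covariance matrix in equation 2.4, the sketch below evaluates it for a single predictor. It assumes X is centered so that the intercepts can be dropped, that u and v are uncorrelated, and it uses arbitrary illustrative parameter values; none of these numbers come from the dissertation.

```python
import numpy as np

# Assumed illustrative values: paths alpha, beta, gamma and unit variances.
alpha, beta, gamma = 0.5, 0.7, 0.3
phi_xx = np.array([[1.0]])          # Var(X)
psi_ee = np.diag([1.0, 1.0])        # Var(u), Var(v), with Cov(u, v) = 0

# y = (Y, M)'; B is strictly upper triangular because the model is recursive.
B = np.array([[0.0, beta],
              [0.0, 0.0]])
Gamma = np.array([[gamma],
                  [alpha]])

A = np.linalg.inv(np.eye(2) - B)
sigma_yy = A @ (Gamma @ phi_xx @ Gamma.T + psi_ee) @ A.T

# Direct calculation of the same moments from equations 2.1 and 2.2.
var_M = alpha**2 * 1.0 + 1.0
cov_YM = beta * var_M + gamma * alpha * 1.0
var_Y = beta**2 * var_M + gamma**2 + 2 * beta * gamma * alpha + 1.0
```

The matrix formula and the direct moment calculation agree, which is the sense in which the recursive model's parameters map onto the implied covariance matrix.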
However, not all of the classical assumptions of mediation analysis will be employed. In fact,
one of the chief advantages of estimating quantile mediation is that one can relax the assumption that the
mediators and outcomes be normally distributed, and presumably arrive at an estimator of mediation
which is not sensitive to outlying observations of either the mediator or outcome. Indeed it is the
relaxation of the normality assumption which distinguishes the work presented in this dissertation from
existing methods. The remaining assumptions that no important variables have been excluded, and the
mediators of interest are theoretically relevant, will still be employed. The following methods also
assume linearity in both the parameters and predictors.
In order to operationalize mediation, we will adopt the usual convention of characterizing the τth-quantile
mediated (or indirect) effect as the product αβ(τ). Here, β(τ) is such that $P\{Y - M\beta(\tau) < Q_Y(\tau)\} = \tau$,
and α is such that $E(M \mid X) = \mu_M + \alpha X$. We also employ the standard notation in which i=1,…, n indexes
n subjects who are sampled independently from a common population. While we allow the distribution
of u to take any form, we restrict the class of distributions for v to be continuous and symmetric with
mean 0. Following Powell (1983) and Kim and Muller (2004), we further assume that $E\{Q_U(\tau \mid X, M)\} = 0$.
This formally expresses the usual assumption that the τth quantile of u given X and M should be 0 (see the
check function in equation 1.11 or the definition in 1.12).
2.2 Two-Stage Estimation Methods
Amemiya (1982) first proposed an estimator for structural parameters of simultaneous equations
models in econometrics. If we reparametrize equations 2.1 and 2.2 by letting $\pi = \{i_1, \gamma'(\tau)\}$ and $Z = \{1, X\}^{T}$,
the usual Two-Stage Least Squares (TSLS) solution is given by equation 2.5 and the instrumental
variables estimator is given by equation 2.6, where $P_i$ is the $i$th row of $P = X_1(X_1'X_1)^{-1}X_1'$. He then indicates
that the Two-Stage Median Regression analogues of equations 2.5 and 2.6 should be equations 2.7 and
2.8, respectively. Combining equations 2.7 and 2.8 into a single model, we can then write the estimator
as in equation 2.9, where q=0 corresponds to equation 2.8 and q=1 corresponds to equation 2.7. It is
worth noting that P’M is actually the least squares solution for equation 2.2, and so 2.7 will henceforth be
referred to as the Least Squares Quantile Regression (LSQR) estimator, where equation 2.7 represents the
special case when the quantile of interest is the median. This dissertation will later demonstrate that the
more general form of 2.7 has suitable properties for other quantiles of the response distribution, especially
near the extremes.
(2.5) $\hat{\pi} = \arg\min_{\pi} \sum_i (Y_i - \pi Z_i - \beta P_i' M_i)^2$
(2.6) $\hat{\pi}_{IV} = \arg\min_{\pi} \sum_i (P_i' Y_i - \pi Z_i - \beta P_i' M_i)^2$
(2.7) $\hat{\pi}(0.5) = \arg\min_{\pi} \sum_i |Y_i - \pi Z_i - \beta P_i' M_i|$
(2.8) $\hat{\pi}_{IV}(0.5) = \arg\min_{\pi} \sum_i |P_i' Y_i - \pi Z_i - \beta P_i' M_i|$
(2.9) $\hat{\pi}(0.5) = \arg\min_{\pi} \sum_i |q Y_i + (1-q) P_i' Y_i - \pi Z_i - \beta P_i' M_i|$
Amemiya goes on to demonstrate consistency and asymptotic Normality for the model given by
equation 2.9 for any q>0, albeit in the case where u and v are both normally distributed. Although this
might suggest that TSLS ought to have an efficiency advantage over equation 2.9 whenever u and v
follow Normality, Amemiya further demonstrates that this advantage is proportional to how closely the
distribution of u matches Normality. In other words, LSQR is superior to TSLS for response distributions
which sufficiently violate Normality (i.e. non-trivial skewness or heavy tails). In the simulation studies
which follow, we demonstrate that this is indeed the case in terms of both bias and efficiency, especially
when the quantiles of interest are near the extremes of the response distribution (particularly the 85th and
95th percentiles).
Powell (1983) demonstrated the Consistency and Asymptotic Normality of LSQR more generally,
by allowing u to follow any continuous distribution and imposing the conditions that q>0 and the
distribution of v is continuous with mean 0 with equal mass on either side. He further shows that the
estimator is asymptotically equivalent for all q>0, and so it is safe to assume that our estimates of β(τ) will
(asymptotically) follow Normality under quite general regularity conditions. Although Amemiya (1982)
argues that a suitable choice for q is $\sigma_u/\sigma_v$, assuming that P has full column rank and $\mathrm{rank}(X_1) = G + \mathrm{rank}(X)$
ensures that the estimator for π does not depend on q>0 (Kim and Muller, 2004) and has the form given in
equation 2.10 (where $\rho_\tau(u)$ is defined as in equation 1.8).
(2.10) $\hat{\pi} = \arg\min_{\pi} \sum_i \rho_\tau(q Y_i + (1-q) P_i' Y_i - \pi P_i' Z_i)$
Following conventional approaches (MacKinnon, 2008), the LSQR estimator for the indirect
effect αβ(τ) can then be written as the product of two separate estimators, given by equations 2.11 and
2.12. Powell (1983) derived the asymptotic covariance matrix for π, but Chernozhukov (2011) correctly
notes that such asymptotic approximations for conditional quantiles are not necessarily valid near the tails
of the response distribution, which are the quantiles in which we are most interested (namely, the 85th
and 95th percentiles). Since we wish to avoid incorrectly applying normal laws to non-normal tail
behavior (ibid.), we shall employ resampling techniques (Koenker, 2006) in order to correctly estimate
the variance of the LSQR estimator of β(τ), which Koenker showed has better coverage than the Wald or
Rank tests for a variety of underlying models.
(2.11) $\hat{\alpha} = \arg\min_{\alpha} \sum_i (M_i - i_2 - \alpha X_{1i})^2$
(2.12) $\hat{\beta}(\tau) = \arg\min_{\beta} \sum_i \rho_\tau(Y_i - i_1 - \beta P_i' M_i - \gamma' X_i)$
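The product estimator built from equations 2.11 and 2.12, together with a resampling-based standard error, can be sketched as below. This is a simplified illustration under assumed simulated data, using a plain case-resampling bootstrap with a small number of replicates; it is not the specific resampling scheme of Koenker, and the helper names `quantile_regression` and `lsqr_indirect` are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Quantile regression coefficients via the standard LP formulation."""
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


def lsqr_indirect(X, Z, M, Y, tau):
    """Least squares first stage, quantile second stage, product alpha * beta(tau)."""
    n = len(Y)
    X1 = np.column_stack([np.ones(n), X, Z])
    coef, *_ = np.linalg.lstsq(X1, M, rcond=None)
    alpha_hat = coef[1]                      # effect of X on M
    M_fitted = X1 @ coef                     # projection of M onto X1
    D = np.column_stack([np.ones(n), M_fitted, X])
    beta_hat = quantile_regression(D, Y, tau)[1]
    return alpha_hat * beta_hat


# Assumed data-generating values; the true indirect effect is 1.0 * 0.7 = 0.7.
rng = np.random.default_rng(3)
n, tau, n_boot = 300, 0.85, 40
X = rng.normal(size=n)
Z = rng.normal(size=n)
M = 0.5 + 1.0 * X + 0.8 * Z + rng.normal(size=n)
Y = 1.0 + 0.7 * M + 0.3 * X + rng.normal(size=n)

estimate = lsqr_indirect(X, Z, M, Y, tau)

# Case-resampling bootstrap: redraw subjects with replacement and re-estimate.
boot = np.empty(n_boot)
for b in range(n_boot):
    idx = rng.integers(0, n, size=n)
    boot[b] = lsqr_indirect(X[idx], Z[idx], M[idx], Y[idx], tau)

boot_se = boot.std(ddof=1)
ci_low, ci_high = np.percentile(boot, [2.5, 97.5])
```

Forty replicates keep the sketch fast but are far too few for reliable interval endpoints; a practical analysis would use many more.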
Alternatives to the LSQR estimator are worthy of consideration, as Kim and Muller (2004)
demonstrate that the LSQR estimator may produce biased estimates of β(τ) and γ'(τ), because
$E\{Q_U(\tau \mid X, M)\} = 0$ cannot hold simultaneously with $E\{u \mid X, M\} = 0$ when either: 1) the distribution of u is not
symmetric, or 2) the quantile of interest is other than the median. To rectify this problem, they
recommend using Dual-Stage Quantile Regression (DSQR), which estimates the conditional quantiles for
both models for Y and M at the same quantile. This is made more explicit in equations 2.13 and 2.14.
Although they derive the asymptotic covariance matrix for the DSQR estimator, Chernozhukov's (2011)
criticism of applying the normal approximation for central quantiles to non-normal extreme quantiles still
applies. Specifically, Chernozhukov argues convincingly that the extreme regression quantiles do not
satisfy the usual Normality assumptions that are usually applied for regression parameters in the
conditional mean or median models. Rather, they follow extreme value distributions which depend
directly on the type of tail behavior that the underlying population response distribution exhibits. Thus,
we use bootstrap estimates of the variance for the parameters of interest to circumvent the usual
Normality assumption for the quantile regression parameters.
(2.13) $\hat{\alpha} = \arg\min_{\alpha} \sum_i \rho_\tau(M_i - i_2 - \alpha X_{1i})$
(2.14) $\arg\min_{\beta} \sum_i \rho_\tau(Y_i - i_1 - \beta\, Q_{M_i}(\tau \mid X_i) - \gamma' X_i)$
Yet a third two-stage approach has no clear analogue in the literature on mediation analysis, but is
worth considering because it has seen some application in economics and parallels the DSQR estimator.
Chesher's (2003) paper on identifiability for non-recursive (he calls them non-separable) SEMs
established the basic identifiability conditions for this so-called Control Function/Variate (CV) approach.
Ma and Koenker (2006) and Lee (2007) describe separate, and only slightly differing, versions of the CV
approach. Ma and Koenker explicitly compared their CV estimator to Chesher's structural derivative
estimator, and found that the CV had an efficiency advantage over Chesher's. Lee adds the additional
assumption that the model be fully recursive, and so is less general than the approaches described by
Chesher (2003) and Ma and Koenker (2006). Nevertheless, the assumptions in Lee's model most closely
match the assumptions we have made, and so our adaptation of the CV approach will parallel Lee
(equations 2.15 and 2.16). Chernozhukov's (2011) criticism once again holds true, and so we again
employ a resampling strategy to estimate the variance of the parameter estimates.
(2.15) $\hat{\alpha} = \arg\min_{\alpha} \sum_i \rho_\tau(M_i - i_2 - \alpha X_{1i})$
(2.16) $\arg\min_{\beta} \sum_i \rho_\tau(Y_i - i_1 - \beta M_i - \gamma' X_i - \delta\,(M_i - Q_{M_i}(\tau \mid X_i)))$
Aside from methods from economics, this section also compares two other methods more deeply
rooted in the social sciences, in which methods for mediation originated. To our knowledge, the only
existing approach for quantile mediation analysis was detailed by Imai et al. (2010), and their
nonparametric approach encompasses quantile mediation models, as implemented in the “mediation”
package in R (Imai, 2010). Their sequential ignorability assumption allows them to identify the Average
Causal Mediation Effect (ACME for short) as E{Y(X,M(X=1))-Y(X,M(X=0))}, the Average Direct Effect
as E{Y(X=1,M)-Y(X=0,M)}, and the Average Total Effect as E{Y(X=1,M|X=1)-Y(X=0,M|X=0)}. Under
the assumption of no moderation, they are able to show that the Average Total Effect equals the sum of
the ACME and the Average Direct Effect. This model was then extended to accommodate quantiles of
the outcome distribution, where the Quantile Causal Mediation Effect (or QCME) is defined as the
difference in the quantile functions (for an arbitrary probability τ) between two potential outcomes (Imai,
Keele and Tingley, 2010, p. 10).
Over bootstrapped samples, they fit models for the observed mediator and outcome variables, and
use these results to simulate potential values of the mediator and outcome variables, thus allowing them to
compute the effects described above using the resultant potential outcomes. Point estimates are obtained
across the bootstrapped samples from the summary statistics, as well as confidence intervals. Their
sequential ignorability assumption encompasses the assumptions that a recursive model entails, namely
that the errors from different equations are independent and the graph representing the model has no
reverse feedback, while adding the additional assumptions of no pre-treatment confounding and
ignorability of treatment assignment (effectively, randomization). The models presented below instead
only assume that the model be recursive, an assumption that the QCME makes implicitly through its
sequential ignorability assumption (Imai, Keele, and Yamamoto, 2010). However, as in the design of data
collection for the Healthy Places study, proper study design (i.e. proper temporal ordering of the
collection of data corresponding to model variables - X before M and M before Y) renders such
assumptions mostly unnecessary (aside from the no confounding conditions), since the parameters
corresponding to such models ought to have causal interpretations (again, in the absence of confounding).
A simpler estimation strategy can build directly upon the regression-based approach underlying
the Causal Steps method (Baron and Kenny, 1986). The original Causal Steps method separately
estimates regression equations for the mediator and outcome, and the same principle ought to apply to
quantiles of the outcome distribution. The resulting quantile-analogue of the Causal Steps method
estimates the parameters of equation 2.2 via linear regression, and conducts quantile regression to
estimate the parameters of equation 2.1. Alternatively, one could also estimate conditional quantiles for
both equations 2.1 and 2.2, at the same or different quantiles. The ramifications for choosing different
quantiles depend upon the shape of the distribution of M, but one could posit that quantifying the
quantile-indirect effect as $\{\int \alpha(\tau)\,d\tau\}\,\beta(\tau)$ ought to be equivalent to using αβ(τ) where α is obtained using
least squares. Indeed, it can be easily shown that the two quantities are equal whenever the distribution of
v in equation 2.2 is symmetric.
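The claimed equivalence can be probed numerically. The sketch below is an illustration under assumptions of our own choosing: the mediator error is symmetric but its spread grows with X, so that the quantile slope α(τ) genuinely varies with τ, and the integral is approximated by averaging α(τ) over an evenly spaced grid that is symmetric about 0.5. The helper `quantile_regression` and all numeric values are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog


def quantile_regression(X, y, tau):
    """Quantile regression coefficients via the standard LP formulation."""
    n, p = X.shape
    cost = np.concatenate([np.zeros(p), np.full(n, tau), np.full(n, 1.0 - tau)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(cost, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]


# Assumed mediator model: M = 0.5 + 1.0 X + (1 + 0.5 X) v, with v symmetric.
rng = np.random.default_rng(4)
n = 500
X = rng.uniform(0.0, 2.0, size=n)
v = rng.normal(size=n)
M = 0.5 + 1.0 * X + (1.0 + 0.5 * X) * v
D = np.column_stack([np.ones(n), X])

# Quantile slopes alpha(tau) on a grid symmetric about the median.
taus = np.linspace(0.05, 0.95, 19)
alpha_tau = np.array([quantile_regression(D, M, t)[1] for t in taus])
alpha_avg = alpha_tau.mean()                 # grid approximation to the integral

# Least squares slope for comparison.
alpha_ls = np.linalg.lstsq(D, M, rcond=None)[0][1]
```

Under this design the slopes above and below the median offset one another, so `alpha_avg` and `alpha_ls` should be close even though `alpha_tau` rises steadily across the grid.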
While Imai et al. (2010) demonstrate that the original Causal Steps method is subsumed in the
QCME approach (in the case of linear models for M and Y, and no X by M interaction), it is not clear
whether this holds true when the effects of interest correspond to quantiles of the outcome distribution.
Though both methods follow from their sequential ignorability assumption, we were interested to know
whether the quantile analogue of the Causal Steps method proposed above results in the same estimates of
the indirect effect as the QCME. This issue is addressed below in the simulation study section, which
served as an empirical assessment of whether the usual SEM approach for the product works when the
outcome model corresponds to a quantile of the outcome, rather than the mean. It should be noted at this
stage, that the causal interpretation of this estimator is not guaranteed via sequential ignorability (Imai,
Keele and Tingley, 2010, p. 10).
However, there is no proof one way or the other that the commonly used product of coefficients
does not extend to the quantile setting. As mentioned above, proper study design (i.e. a longitudinal one)
ought to ensure causal interpretation of any model parameters, though a detailed discussion of exactly
how that might happen is better left to other papers (Imai, Jo and Stuart, 2011; Maxwell et al., 2011).
Moreover, the identifiability of direct and indirect effects can be accomplished non-parametrically
without the use of sequential ignorability, by employing the recursive rule (Bollen, 1989) along with
assuming linearity in covariates and parameters for a continuous outcome, no confounding and no X-by-
M interaction (Galles and Pearl, 1998; Pearl 2012).
Extension of existing quantile regression approaches, as detailed above (Amemiya, 1982;
Chesher, 2003; S. Lee, 2007; Ma and Koenker, 2006; Powell, 1983), may constitute alternative
approaches to estimating quantile mediation parameters, which allow one to capture both location and
scale shifts that mediators can exert on the distributions of outcomes. However, the partially recursive
structure of Chesher's model (2003) may not be suitable here, because mediation analysis is typically
concerned with testing proposed causal mechanisms, and the possibility of causal feedback is generally
precluded by proper timing of variables (both in construction of a theoretical model and in measurements
which are taken). Moreover, the exact identifying assumptions Chesher employs (2003) involve the
identifiability of derivatives of structural functions, and are needed because the usual assumption of
uncorrelated errors is not made. As such, direct comparison of the identifiability conditions of the control
function approach with the Causal Steps or QCME methods is not useful, since they encompass very
different modeling problems.
2.3 Inference for Quantile Mediation
The three foregoing approaches involve only estimation of the parameters of the quantile SEM in
equations 1.14 and 1.15. In order to conduct inference for quantile mediation, we propose tests for the
null hypothesis H0: αβ(τ)=0. Once a decision is made as to which of the three methods will be employed
to estimate α, β(τ) and their variances, one may readily assume that estimates for each parameter are
normally distributed, and then develop corresponding Wald tests for the indirect effect. For illustrative
purposes we assume that the LSQR estimator is being used, and estimates of α and β(τ) and their
variances are given. The tests that appear in this section can be considered quantile analogues of existing
tests for mean indirect effects, of which the notable ones are: 1) Sobel Test (1982, 1986); 2) Goodman
Test (1960); 3) Joint Significance Test; and 4) Product Z-Test (MacKinnon, 1998).
The quantile version of the Sobel test (1982, 1986) employs the delta method (Casella and Berger,
2002) using a first-order Taylor series approximation, and exploits the recursive structure of the model to
conveniently obtain the approximate asymptotic distribution of αβ(τ). Here, the function of α and β(τ) is
αβ(τ), and the recursive structure of the model implies that the covariance matrix is given by 2.17, which
yields the approximate limiting distribution given by 2.18. One then compares the difference in equation
2.18 to the usual percentiles of the corresponding distribution (usually the 5th or 95th) to test whether
αβ(τ)=0. This test is also easily generalized to situations involving multiple mediators, or latent variables,
and so is a worthy candidate for a test of quantile mediation.
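(2.17) $\Sigma_{\alpha,\beta(\tau)} = \begin{pmatrix} \sigma_{\alpha}^{2} & 0 \\ 0 & \sigma_{\beta}^{2} \end{pmatrix}$
(2.18) $\sqrt{N}\,\{\hat{\alpha}\hat{\beta}(\tau) - \alpha\beta(\tau)\} \xrightarrow{d} N\big(0,\ \beta(\tau)^{2}\sigma_{\alpha}^{2} + \alpha^{2}\sigma_{\beta}^{2}\big)$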
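As an illustration of equations 2.17 and 2.18, the following is a minimal sketch of the quantile Sobel statistic. It assumes that point estimates of α and β(τ) and their standard errors are already available from whichever estimator was chosen; the function name and its arguments are hypothetical and are not part of the original analysis code.

```python
import math


def quantile_sobel_test(alpha_hat, beta_tau_hat, se_alpha, se_beta_tau):
    """Delta-method (Sobel-type) test of H0: alpha * beta(tau) = 0.

    Inputs are assumed to be estimates and standard errors supplied by the user.
    """
    indirect = alpha_hat * beta_tau_hat
    # Variance from equation 2.18; the zero off-diagonal in 2.17 removes the covariance term.
    variance = beta_tau_hat ** 2 * se_alpha ** 2 + alpha_hat ** 2 * se_beta_tau ** 2
    se = math.sqrt(variance)
    z = indirect / se
    # Two-sided standard normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return indirect, se, z, p_value
```

For example, `quantile_sobel_test(2.0, 4.0, 0.5, 1.0)` returns the estimated indirect effect, its approximate standard error, the z statistic and a two-sided p-value.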
The Joint Significance test is a modified version of the causal steps approaches of Baron and
Kenny (1986) and Judd and Kenny (1981). Baron and Kenny elaborated on the work of Judd and Kenny,
by outlining a set of conditions to test for mediation. First, X must be significantly associated with Y in
the absence of M. Second, X must be significantly associated with M, and M must be significantly
associated with Y. Finally, X must no longer be associated with Y after accounting for the mediating
effect of M. MacKinnon et al. (2002) demonstrated that this causal steps procedure has incredibly low
statistical power, despite its usefulness for assessing mediation on substantive grounds, and proposed the
Joint Significance test as a more powerful alternative to the causal steps method. A simple adaptation of
this consists of separately testing whether α=0 and β(τ)=0, and declaring the indirect effect significant if
both parameters are deemed different from 0.
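The adaptation just described can be sketched in a few lines. This is only an illustration under the assumption that each parameter estimate is approximately normal, so that a two-sided z test applies to each; the function names and the default level of 0.05 are choices made here.

```python
import math


def two_sided_p(estimate, standard_error):
    """Two-sided p-value of a z test for a single coefficient."""
    z = estimate / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


def joint_significance(alpha_hat, se_alpha, beta_tau_hat, se_beta_tau, level=0.05):
    """Declare the indirect effect significant only if both coefficients are."""
    p_alpha = two_sided_p(alpha_hat, se_alpha)
    p_beta = two_sided_p(beta_tau_hat, se_beta_tau)
    significant = (p_alpha < level) and (p_beta < level)
    return significant, p_alpha, p_beta
```

Note that the function returns only a decision and two p-values; it yields no standard error or interval for the indirect effect itself.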
A serious limitation of the Joint Significance test is that it provides no sense of the variability of
an indirect effect. Although the Sobel test allows for the creation of symmetric confidence intervals,
MacKinnon et al. (1998) showed that (under the null hypothesis αβ=0) the proportion of Sobel-based
confidence intervals that fall above and below 0 are not the same. This is because the Normal distribution
is not closed under multiplication. In other words, if α and β are each Normally distributed random
variables, their product is not Normally distributed - in fact, is not even symmetric! The Product-Z test
exploits the exact distribution of the product of two standard normal random variables, which is
asymmetric. In the quantile mediation setting, this test first computes Z_α and Z_β(τ) and then compares the
product Z_α Z_β(τ) to the appropriate quantiles (5th or 95th, depending on the sign of the product), hence the
name Product-Z.
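The comparison described above can be sketched as follows. Instead of tabulated critical values for the product of two standard normal variables, this sketch approximates the 5th and 95th quantiles by Monte Carlo simulation, which is a substitution made here for illustration; the function names, the number of draws and the seed are all hypothetical.

```python
import random


def product_z_critical_values(level=0.05, draws=200_000, seed=2013):
    """Monte Carlo approximation to the lower and upper `level` quantiles of Z1 * Z2."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(0.0, 1.0) * rng.gauss(0.0, 1.0) for _ in range(draws))
    lower = products[int(level * draws)]
    upper = products[int((1.0 - level) * draws) - 1]
    return lower, upper


def product_z_test(alpha_hat, se_alpha, beta_tau_hat, se_beta_tau, level=0.05):
    """Compare Z_alpha * Z_beta(tau) with the quantile matching the sign of the product."""
    z_product = (alpha_hat / se_alpha) * (beta_tau_hat / se_beta_tau)
    lower, upper = product_z_critical_values(level)
    if z_product >= 0:
        reject = z_product > upper
    else:
        reject = z_product < lower
    return z_product, reject
```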
2.4 Simulation Results
A simulation study was conducted to examine 1) how well the quantile indirect effects are
estimated, 2) whether the product and difference of coefficients are equivalent when estimating direct
and indirect effects for quantiles of the outcome distribution, and 3) what the coverage rates are for the
various methods, both across different sample sizes and outcome quantiles. In order to draw fair
comparisons for the latter three methods, which were compared using the simulation setting below, we
adapted the simulation and estimation frameworks of Ma and Koenker (2006).
Specifically, we simulated a linear location-scale shift model for 1000 cross-sectional datasets given by
equations 2.19 and 2.20. Sample sizes of 100, 500 and 1000 were used in order to reflect small, medium
and large samples, similar to those considered in Ma and Koenker (2006). Furthermore, we knew that the
variability of the estimators of indirect effects would be larger at extreme quantiles than at the median
(Koenker and Bassett, 1978), and also that the type I errors of the different tests of quantile mediation
would approach the nominal level of 0.05 as the sample size increased. Thus, this choice of sample sizes
allowed us to examine whether both were true empirically as well.
(2.19) $Y = 1 + \gamma' X + \beta M + \delta\,\varepsilon_{1}\,(M + X)$
(2.20) $M = 1 + \alpha X + \pi Z + \varepsilon_{2}$
However, our model differs slightly from theirs in that we impose the additional assumption that the
errors from 2.19 and 2.20 are uncorrelated with M and Y, thus making the model fully recursive. We also
computed different statistics to test the null hypothesis of no mediation (αβ(τ)=0) under the null condition
of no mediation, in order to examine whether the tests proposed have the nominal Type I error of 0.05. In
addition, four test conditions were possible: 1) heterogeneity in effect of M and X, 2-3) heterogeneity in
effect of M or X only, 4) no heterogeneity of effect across the distribution of the response Y. The case of
no heterogeneity is equivalent to the linear location-shift model, which should have the same statistical
properties as conventional mediation methods across the distribution of Y.
The true values for the parameters (albeit at the median) are: α=2, β=4, γ=12, γ'=4, π=3, λ=3, δ=5.
Model variables were generated as: X ~ F_X, Z ~ F_Z, ε_1 ~ N(0,1), ε_2 ~ N(0,0.5). Note as well that the
population quantile-dependent coefficients (assuming effect decomposition holds) are given by equation
2.21 below.
(2.21) $\beta(\tau) = \beta + \delta F_{\varepsilon_{1}}^{-1}(\tau)$, $\gamma'(\tau) = \gamma' + \delta F_{\varepsilon_{1}}^{-1}(\tau)$, and $\gamma(\tau) = \alpha\,\beta(\tau) + \gamma'(\tau)$
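Equation 2.21 can be evaluated directly from the stated true parameter values. The sketch below assumes that ε_1 is standard normal, as stated above, and uses the Python standard library for the normal quantile function; the function name and the returned dictionary keys are hypothetical.

```python
from statistics import NormalDist

# True simulation parameters quoted in the text (values at the median).
ALPHA, BETA, GAMMA_PRIME, DELTA = 2.0, 4.0, 4.0, 5.0


def population_effects(tau):
    """Population quantile-specific coefficients implied by equation 2.21."""
    q = NormalDist(0.0, 1.0).inv_cdf(tau)  # quantile of eps_1 ~ N(0, 1)
    beta_tau = BETA + DELTA * q
    gamma_prime_tau = GAMMA_PRIME + DELTA * q
    indirect = ALPHA * beta_tau            # alpha * beta(tau)
    total = indirect + gamma_prime_tau     # gamma(tau)
    return {
        "beta_tau": beta_tau,
        "gamma_prime_tau": gamma_prime_tau,
        "indirect": indirect,
        "total": total,
    }
```

At τ=0.5 the normal quantile is zero, so the sketch returns β(τ)=4, γ'(τ)=4, an indirect effect of 8 and a total effect of 12, which agrees with the stated value γ=12.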
For a concise presentation, we focus on the models generated under test condition 1, as the results
for the other test conditions were comparable. The results presented in table 2.1 indicate that the Causal
Steps and QCME methods produce the least variable estimates of the indirect effect (about its true value),
where the expected pattern of higher MSE at extreme quantiles was also observed. Though the data are
not shown here, each of the methods we compared results in relatively unbiased estimates of the quantile
indirect effects, with relative bias no greater than 10% for any method under all the simulation settings.
Figure 2.1 shows the densities for the product of coefficients αβ(τ) and the difference of coefficients γ(τ)-γ'(τ).
As expected, the product has much smaller variance than the difference. Moreover, the fitted value
method has a slight efficiency advantage over the control function, primarily because it estimates the
parameter α via least squares rather than the least absolute deviation method of the control function. The
Appendix derives the theoretical value of the total effect of X on Y for this simulated model (denoted γ(τ)),
and the resulting simulation-based means are never more than 5% off of the true values, for any quantile
that was considered, with the largest bias occurring at the 5th and 95th percentiles of Y. It would be worth
considering in future work how large this bias becomes for τ ↗ 1 or τ ↘ 0, i.e. when the quantile
of interest gets arbitrarily close to 1 or 0.
Estimator   Sample Size   τ=0.05   τ=0.15   τ=0.5   τ=0.85   τ=0.95
QCME        100           1.784    0.875    0.553   0.893    1.838
QCME        500           0.348    0.150    0.110   0.149    0.290
QCME        1000          0.188    0.086    0.051   0.080    0.160
BK          100           1.780    0.874    0.555   0.893    1.834
BK          500           0.347    0.150    0.109   0.149    0.290
BK          1000          0.188    0.085    0.051   0.080    0.160
LSQR        100           1.906    0.904    0.598   0.934    2.009
LSQR        500           0.369    0.157    0.112   0.167    0.319
LSQR        1000          0.199    0.092    0.055   0.087    0.176
CV          100           2.158    0.963    0.609   0.971    2.105
CV          500           0.368    0.163    0.114   0.163    0.314
CV          1000          0.201    0.092    0.054   0.085    0.177
Table 2.1 MSE of Simulation-Based Estimates of Indirect Effects across Selected Quantiles
Table 2.2 displays the empirical Type I error rates for the various test statistics, under the stated test
conditions and using the fitted value approach for estimation. The Asymmetric Confidence Limits and
Quantile Goodman tests are still very conservative near the mean, evidenced by the very low empirical
type I error rates (always < 0.01) for all of the combinations of heterogeneity and sample size, although
this is consistent with results for conventional mediation in previous simulation studies (MacKinnon et al.,
2002). The Joint Significance test appears to achieve the highest Type I error at the tails of the
distribution, counter to what MacKinnon et al. presented (2002). The performance of the control function
approach for the Quantile-Sobel, Quantile-Goodman and Joint Significance tests closely matches the results
from the fitted value (data not shown). Although the fitted value approach produces the nominal type I
error of 0.05 for the Asymmetric Confidence Limits in the conventional case of no heterogeneity of effect
for M, the rates are much lower when the control function estimates are used (ranging from 0 to 0.003 for
the various test cases, data not shown).
Empirical power results for the three tests are shown in Table 2.3. The Asymmetric Confidence
Limits, Quantile-Sobel, Quantile-Goodman and Joint Significance tests perform nearly the same at all
quantiles, with the greatest power attained at the 85th and 95th conditional quantiles. Although there
seems to be an asymmetric trend in power for a small sample size, this is likely because the greatest
variability in Y is at the upper quantiles of Y, and regression quantiles achieve what Huber (1972)
describes as “a small asymptotic variance over some neighborhood” of the distribution of Y. In our
simulation setup, this means that one ought to expect the precision in parameter estimates corresponding
to the upper quantiles of Y to be greater than at the lower tails, especially for a small sample size such as
100.
The control function estimator does not seem amenable to the Quantile-Sobel, Quantile-Goodman
or Joint Significance tests, as it produces incredibly low power for all test scenarios. This is due to the
efficiency advantage of least squares over LAD estimators when the error distribution is symmetric with
light tails. Given that using a LAD estimator for the parameters in 2.20 comes with an efficiency cost
when the underlying error distribution for M is normal (which it is here), we also examined the
performance of the control function approach when using the least squares approach to estimate α. This
resulted in substantial gains in power compared to the control function approach as described by Lee
(2007), and comparisons (not shown) indicate that this modified control function approach is also more
powerful than the fitted value.
In order to compare our results for the median to existing simulation studies, we generated 1000
realizations of equations 2.19 and 2.20 for the same three sample sizes as above, while varying the
“average” effect size as small, medium and large, corresponding to α=β=γ’=0.2, 0.5 and 0.8 respectively.
The other parameters were kept the same as in the simulation setting described above. For the 15th,
median, and 85th percentiles, we found the following. For any single effect size, the bias in the estimate
of αβ(τ) decreased with decreasing sample size, while fixing the sample size resulted in decreasing bias
with increasing effect size. The magnitude of the bias was roughly the same for all of the 5 methods
considered. We find that these results compare well to other simulation studies conducted to assess these
same issues for indirect effects corresponding to the “average”, rather than quantiles, of the outcome
distribution (MacKinnon et al., 2002; MacKinnon et al., 1995).
Figure 2.1. Densities of product of coefficients, difference of coefficients, and product minus difference
Estimator      Testing Method        Sample Size   τ=0.05   τ=0.15   τ=0.5   τ=0.85   τ=0.95
Fitted Value   Product Z             100           0.005    0.003    0.002   0.002    0.008
Fitted Value   Product Z             500           0.021    0.023    0.004   0.019    0.03
Fitted Value   Product Z             1000          0.052    0.032    0       0.028    0.035
Fitted Value   Goodman               100           0.016    0.015    0.024   0.02     0.013
Fitted Value   Goodman               500           0.02     0.016    0.018   0.015    0.02
Fitted Value   Goodman               1000          0.02     0.011    0.02    0.013    0.019
Fitted Value   Joint                 100           0.02     0.008    0.004   0.009    0.016
Fitted Value   Joint                 500           0.029    0.033    0.003   0.03     0.036
Fitted Value   Joint                 1000          0.034    0.033    0.003   0.032    0.039
QCME           QCME-Based 95% CI     100           0.003    0.001    0.002   0.002    0.001
QCME           QCME-Based 95% CI     500           0.004    0.004    0.002   0.001    0
QCME           QCME-Based 95% CI     1000          0        0        0.002   0        0.002
Table 2.2 Type I Error (αβ(τ)=0) for the Fitted Value and QCME methods across Selected Quantiles
Estimator      Testing Method        Sample Size   τ=0.05   τ=0.15   τ=0.5   τ=0.85   τ=0.95
Fitted Value   Product Z             100           0.35     0.663    0.966   0.96     0.866
Fitted Value   Product Z             500           0.899    1        1       1        1
Fitted Value   Product Z             1000          0.994    1        1       1        1
Fitted Value   Goodman               100           0.348    0.661    0.966   0.959    0.866
Fitted Value   Goodman               500           0.899    1        1       1        1
Fitted Value   Goodman               1000          0.994    1        1       1        1
Fitted Value   Joint                 100           0.35     0.663    0.966   0.96     0.866
Fitted Value   Joint                 500           0.899    1        1       1        1
Fitted Value   Joint                 1000          0.994    1        1       1        1
QCME           QCME-Based 95% CI     100           0.837    0.932    0.942   0.932    0.863
QCME           QCME-Based 95% CI     500           0.925    0.949    0.932   0.95     0.937
QCME           QCME-Based 95% CI     1000          0.918    0.93     0.947   0.948    0.942
Table 2.3 Statistical Power (αβ(τ)≠0) for Fitted Value and QCME methods across Selected Quantiles
3 A Bayesian Approach to Quantile Mediation
3.1 Limitations of the Two-Stage Methods
[Some material in this section has been submitted to Statistics in Medicine, and may appear
in that journal as well.]
Despite their simplicity and ease of implementation, the two-step approaches discussed in section 2
are not easily amenable to the common situations in which: 1) some of the variables in the model are
measured with error (e.g. are latent); 2) one needs to consider multiple mediators or outcomes
simultaneously; or 3) the independence assumption between observations is clearly violated (e.g.
clustering within subjects, communities, etc.). The first two challenges are easily dealt with for mean
mediation analysis, by employing conventional structural equation modeling (SEM) techniques (Bollen,
1989), though these methods become severely cumbersome when the third problem arises (Kenny et al.,
2003; Krull and MacKinnon, 2001). Moreover, if one wishes to examine quantile mediation in the
presence of measurement error, the covariance structure will not be adequate to identify the quantile-
specific parameters in the model, ruling out the simple extension of covariance-based methods such as
those implemented in EQS or MPlus.
Regarding the first problem, Burgette and Reiter (2011) proposed a Bayesian approach for quantile
regression when one or more of the independent variables are measured with error. Combining
approaches from Lee (2007) and Reed and Yu (2009), they obtain a straightforward approach for a
Bayesian analysis of CFA for conditional quantiles. However, before developing a Bayesian approach to
address the three problems listed above, one must first have a Bayesian approach to quantile mediation
analysis. In order to develop a Bayesian approach for mediation for conditional quantiles, we combine
traditions in a fashion similar to Burgette and Reiter (2011).
This section combines the Bayesian mediation model of Yuan and MacKinnon (2009) and the
Bayesian quantile regression model of Reed and Yu (2009), to arrive at a Bayesian Quantile Mediation
Model. To date, the work proposed in this section is the first of its kind, having no precedent other than
the two-stage methods and Imai's ACME estimator (2010), both described in section 2.
Section 3.2 discusses a few basic issues regarding Bayesian Methods for Structural Equation Models.
Section 3.3 details the Bayesian Quantile Mediation Model, and this model is extended to incorporate
latent mediator variables in section 3.4. Section 3.5 describes a simulation study to compare properties
of the Bayesian method with the two-stage methods from section 2. The final section, 3.6, discusses
some key issues regarding the models described in this section.
3.2 Bayesian Methods for Conventional Mediation Models
Least squares and likelihood-based solutions for SEMs treat the parameters of interest as fixed
quantities, which are then estimated by assuming some underlying distribution for the sample data. At
the back end, once suitable estimators are obtained, asymptotic distributions (usually following a Normal
law) are then derived in order to conduct statistical tests. In the Bayesian worldview, parameters (say, θ)
are assumed to follow some underlying distribution p(θ) at the outset, and Bayes' rule is then applied to
determine the conditional probability distribution of the parameters given the observed data. As the
denominator of equation 3.1 is simply a constant, one usually considers the rightmost expression to
describe the conditional distribution of θ, referred to as the posterior distribution of θ. The expression p(θ)
is known as the prior distribution of θ, and can be chosen based on little to no knowledge of its true form
without many problems.
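(3.1) $p(\theta \mid \text{data}) = \dfrac{p(\text{data} \mid \theta)\,p(\theta)}{p(\text{data})} = \dfrac{p(\text{data} \mid \theta)\,p(\theta)}{\int p(\text{data} \mid \theta)\,p(\theta)\,d\theta} \propto p(\text{data} \mid \theta)\,p(\theta)$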
Classical Bayesian methods typically select a prior distribution that is conjugate with the
posterior, that is, in the same family of distributions (typically exponential) as the posterior. Using a
conjugate prior often helps to simplify the integration involved in computing the expression in the
denominator of equation 3.1, and leads to a posterior distribution with a closed form, thus allowing one to
integrate over the posterior to obtain useful quantities such as quantiles or the mean of the parameter θ.
Unfortunately, only the simplest SEMs have a simple enough form that would allow for direct integration
of the posterior distribution that is tractable (Lee, 2007). One must often resort to Markov Chain Monte
Carlo (MCMC) in order to avoid numerical integration, which may be problematic if the dimension of the
parameter space is large.
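To make the role of conjugacy concrete, the following sketch gives the closed-form posterior for the mean of Normal data with known variance under a Normal prior. This textbook case is chosen here purely for illustration and is not one of the models fitted in this work; the function name and arguments are hypothetical.

```python
def normal_mean_posterior(y, sigma2, prior_mean, prior_var):
    """Closed-form posterior of a Normal mean with known data variance sigma2.

    Prior: theta ~ N(prior_mean, prior_var). Because the prior is conjugate,
    the posterior is again Normal and no numerical integration is needed.
    """
    n = len(y)
    posterior_precision = 1.0 / prior_var + n / sigma2
    posterior_var = 1.0 / posterior_precision
    posterior_mean = posterior_var * (prior_mean / prior_var + sum(y) / sigma2)
    return posterior_mean, posterior_var
```

With three observations 1, 2 and 3, unit data variance and a N(0, 1) prior, the posterior precision is 1 + 3 = 4, so the posterior is Normal with mean 1.5 and variance 0.25.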
MCMC methods typically involve drawing samples from some probability distribution π, such
that the set of samples {θ_i | i=1,…,n} can be used to approximate quantities of interest such as the
population mean or variance of Y using statistics from the samples like the sample mean. Bayesian
methods typically define θ to be a vector of k parameters and π as its posterior distribution. For posterior
distributions with relatively simple forms, direct sampling can be conducted rather easily. However,
many situations call for techniques which can draw samples from more complicated posterior
distributions in such a manner that the distribution of the samples approximates the true posterior.
MCMC methods achieve this by constructing a Markov Chain that has the posterior as its stationary
distribution (Robert and Casella, 2004), which is usually achieved after the chain has moved a sufficient
number of steps to reach stationarity, known as the “burn-in” phase. Convergence of the chain is
typically assessed by examining traceplots or the “estimated potential scale reduction factor” (EPSR, for
short), after running multiple chains from different starting values.
A popular MCMC technique is the Gibbs sampler (Gelfand and Smith, 1990), and Lee (2007)
recommends it as the method of choice because it allows one to easily deal with latent and observed
variables jointly in the same model. The basic idea of Gibbs sampling is to divide θ into k different
groups, and update them according to the probability rule embodied in equation 3.2. This manifests
computationally in the following manner: select an arbitrary set of starting values {θ_i^0 | i=1,…,k}, and then
draw θ_1^j from π(θ_1 | θ_2^{j-1}, …, θ_k^{j-1}), θ_2^j from π(θ_2 | θ_1^j, θ_3^{j-1}, …, θ_k^{j-1}), up to θ_k^j from
π(θ_k | θ_1^j, …, θ_{k-1}^j), so that each draw conditions on the most recent values of the other groups, for
j=1,...,t samples. After a sufficiently long burn-in phase, we may then obtain t k-dimensional
samples of θ, and then use the posterior distributions for each of the k parameters to obtain whatever
quantities we like. Bayesian inference for a single parameter will typically take the median of the
posterior samples as the parameter estimate, and construct a (1-ω)% credible interval using the ω-th and
(1-ω)-th order statistics of the posterior samples.
(3.2) $\pi(\theta_{i} \mid \theta_{1}, \ldots, \theta_{i-1}, \theta_{i+1}, \ldots, \theta_{k}) = \pi(\theta_{i} \mid \theta_{-i})$
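The updating scheme behind equation 3.2 can be sketched for a simple two-parameter case: the mean and variance of Normal data. The model, the vague priors (a Normal prior on the mean and an inverse gamma prior on the variance) and all names and default values below are assumptions made here for illustration, not the mediation model itself.

```python
import random


def gibbs_normal_mean_variance(y, n_iter=5000, burn_in=1000,
                               m0=0.0, s0_sq=1e6, a0=0.01, b0=0.01, seed=1):
    """Gibbs sampler for (mu, sigma2) of Normal data with semi-conjugate priors."""
    rng = random.Random(seed)
    n = len(y)
    y_sum = sum(y)
    mu, sigma2 = 0.0, 1.0  # arbitrary starting values
    draws = []
    for iteration in range(n_iter):
        # Draw mu from its full conditional, a Normal distribution.
        precision = 1.0 / s0_sq + n / sigma2
        mean = (m0 / s0_sq + y_sum / sigma2) / precision
        mu = rng.gauss(mean, (1.0 / precision) ** 0.5)
        # Draw sigma2 from its full conditional, an inverse gamma distribution,
        # using the value of mu that was just updated.
        shape = a0 + n / 2.0
        rate = b0 + 0.5 * sum((yi - mu) ** 2 for yi in y)
        sigma2 = 1.0 / rng.gammavariate(shape, 1.0 / rate)
        if iteration >= burn_in:
            draws.append((mu, sigma2))
    return draws
```

The posterior median and credible limits of each parameter are then read off the sorted retained draws, as described above.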
In the common case when some variables are observed and others are latent, Lee (2007) indicates
that the latter are easily dealt with via Gibbs sampling, by simply treating the latent variables as missing
data. Supposing that θ={θ_i | i=1,…,p} and Ω={Ω_j | j=1,…,q} are (p- and q-dimensional) vectors of the
observed and latent variables, it is easy to see how Gibbs sampling applies. For the t-th iteration, the
sampler is written as the system of equations 3.3. For t=1,2,…,T iterations after a sufficient burn-in phase,
there will be (p + q)T total sampling steps, mostly from Normal, Gamma and Wishart distributions.
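(3.3)
Sample $\theta_{1}^{j+1}$ from $\pi(\theta_{1} \mid \theta_{-1}^{j}, \Omega^{j}, \text{data})$
Sample $\theta_{2}^{j+1}$ from $\pi(\theta_{2} \mid \theta_{1}^{j+1}, \theta_{3}^{j}, \ldots, \Omega^{j}, \text{data})$
……
Sample $\Omega_{1}^{j+1}$ from $\pi(\Omega_{1} \mid \boldsymbol{\theta}^{j+1}, \Omega_{-1}^{j}, \text{data})$
Sample $\Omega_{2}^{j+1}$ from $\pi(\Omega_{2} \mid \boldsymbol{\theta}^{j+1}, \Omega_{1}^{j+1}, \Omega_{3}^{j}, \ldots, \text{data})$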
Despite the computational benefits of Gibbs sampling, it is not always feasible for generating
samples for more complicated models. A classic example of this is the bivariate normal model, where the
conditional distribution for the correlation parameter does not have a known form, and so Gibbs sampling
is not feasible. In such circumstances, the Metropolis-Hastings Algorithm (Chib and Greenberg, 1995) is
typically employed because it does not require one to sample from the conditional distribution of every
model parameter. Rather, the joint probability of the parameters is factored in the same manner as
equation 3.1. The estimation of a sample mean provides a very simple example, where one can simply
use a symmetric proposal distribution (such as a Normal or Uniform random walk) together with a flat prior,
which results in the Metropolis Algorithm. After a suitable starting value is chosen, one simply computes the
joint probability in equation 3.1 as the likelihood p(data|θ), as the flat prior and the symmetric proposal
probabilities will cancel out in the Hastings Ratio (ibid.), and conducts the usual Accept-Reject step.
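The sample-mean example can be sketched as a random walk Metropolis sampler. The sketch assumes Normal data with known unit variance, a flat prior on the mean and a Normal random walk proposal, so that the acceptance ratio reduces to a likelihood ratio; the function names, step size and seed are hypothetical.

```python
import math
import random


def log_likelihood(theta, y, sigma=1.0):
    """Normal log-likelihood of the mean theta, up to an additive constant."""
    return -0.5 * sum((yi - theta) ** 2 for yi in y) / sigma ** 2


def metropolis_mean(y, n_iter=6000, burn_in=1000, step=0.5, seed=7):
    """Random walk Metropolis sampler for a Normal mean under a flat prior."""
    rng = random.Random(seed)
    theta = 0.0  # starting value
    current = log_likelihood(theta, y)
    draws = []
    for iteration in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)  # symmetric proposal
        candidate = log_likelihood(proposal, y)
        # Accept with probability min(1, likelihood ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
        if iteration >= burn_in:
            draws.append(theta)
    return draws
```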
3.3 The Bayesian Quantile Mediation Model
3.3.1 MCMC Methods
Lee (2007) is correct in suggesting that Gibbs sampling is a natural way to estimate parameters
for SEMs, because SEMs are very readily written as hierarchical models. In fact, Curran (2003) has
clearly articulated general conditions under which SEMs and hierarchical models are equivalent. Yuan
and MacKinnon (2009) echo these comments, and suggest using Gibbs sampling to generate posterior
samples for the parameters in the system of equations 1.1 and 1.2, θ={α_0, α_1, β_0, β_1, γ'}. An especially
useful property of the posterior samples from Gibbs sampling is that “a function of posterior samples of
parameters is the posterior samples of the function of the parameters” (ibid.). Therefore, one can use the
T posterior samples of α_1 and β_1 to estimate the mean and variance of the indirect effect by equations 3.4
and 3.5. Moreover, one can obtain a (1-ω)% credible interval of the product by obtaining the appropriate
order statistics: α_1[ω]β_1[ω] and α_1[1-ω]β_1[1-ω].
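(3.4) $\widehat{\alpha_{1}\beta_{1}} = \dfrac{1}{T}\sum_{t=1}^{T} \alpha_{1}^{(t)}\beta_{1}^{(t)}$, or $\widehat{\alpha_{1}\beta_{1}} = \alpha_{1}^{[0.5]}\beta_{1}^{[0.5]}$
(3.5) $\hat{\sigma}^{2}_{\alpha_{1}\beta_{1}} = \dfrac{1}{T-1}\sum_{t=1}^{T}\big(\alpha_{1}^{(t)}\beta_{1}^{(t)} - \widehat{\alpha_{1}\beta_{1}}\big)^{2}$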
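Equations 3.4 and 3.5 translate directly into a few lines of code once posterior draws are available. The sketch below assumes two equal-length lists of draws for α_1 and β_1, and it takes the interval limits from the order statistics of the T products, which is one reading of the notation above; the function name and the default ω are hypothetical.

```python
import statistics


def summarize_indirect_effect(alpha_draws, beta_draws, omega=0.05):
    """Posterior summaries of alpha_1 * beta_1 from paired MCMC draws."""
    # A function of the posterior draws is a draw of the function (here, the product).
    products = sorted(a * b for a, b in zip(alpha_draws, beta_draws))
    n_draws = len(products)
    mean = sum(products) / n_draws                                          # equation 3.4
    variance = sum((p - mean) ** 2 for p in products) / (n_draws - 1)       # equation 3.5
    median = statistics.median(products)
    lower = products[int(omega * n_draws)]
    upper = products[min(n_draws - 1, int((1.0 - omega) * n_draws))]
    return {"mean": mean, "variance": variance, "median": median,
            "interval": (lower, upper)}
```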
Recall from section 1.3.2 that the mediation model in equations 1.1-1.2 can be rewritten as
equations 1.6-1.7. Following Curran's remarks about the equivalence of SEMs and hierarchical models
(Curran, 2003), we can easily write the conditional distributions of the parameters for equations 1.6 and
1.7 in the same manner as one would write a hierarchical model. Assuming independence and normality
for all the elements of θ={α_0, α_1, β_0, β_1, γ'}, and inverse gamma distributions for σ_u^2 and σ_v^2, one may write
the conditional distributions for each of the parameters, and initiate the Gibbs sampler by first sampling
σ_v^2 for equation 1.6, and then σ_u^2 for equation 1.7. With the hierarchical model encompassed in 1.6 and
1.7, standard software packages such as WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs) may be directly
applied to conduct Gibbs sampling and posterior inference. The presence of latent variables poses no
serious challenges, as the sampling process would simply be augmented using the procedure described in
equation 3.3.
3.3.2 The Single Mediator Model
Section 1.3.4 introduced the underlying theory for Bayesian Quantile Regression, which may be
implemented in one of two manners. Yu and Moyeed (2001) showed that the Asymmetric Laplace
Distribution (ALD) properly characterizes regression quantiles. Equations 3.6 and 3.7 encompass the
likelihood for the ALD, which is appropriate because it guarantees that the location parameter x'β(τ) will
be the τ-th quantile of the distribution of y. A random walk Metropolis, or any other accept/reject algorithm,
can then be implemented to generate posterior samples of the parameters in β(τ). Yu and Moyeed (2001)
demonstrate that one can sample from a highly non-informative prior, and still obtain a proper posterior
distribution for β(τ). A drawback to this approach is that sampling from the ALD is not straightforward,
and the resulting conditional distributions do not have standard forms (such as normal or inverse gamma).
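(3.6) $f_{\tau}(y;\, x'\beta(\tau), \sigma) = \dfrac{\tau(1-\tau)}{\sigma}\exp\left\{-\rho_{\tau}\left(\dfrac{y - x'\beta(\tau)}{\sigma}\right)\right\}$
(3.7) $\rho_{\tau}(u) = u\,\{\tau - I(u < 0)\}$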
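The claim that the location parameter of the ALD is the τ-th quantile can be checked numerically from equations 3.6 and 3.7. The sketch below integrates the density to the left of the location with a midpoint rule; the function names, the integration width and the number of steps are choices made here for illustration.

```python
import math


def check_loss(u, tau):
    """Check function of equation 3.7."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_density(y, location, sigma, tau):
    """Asymmetric Laplace density of equation 3.6."""
    return tau * (1.0 - tau) / sigma * math.exp(-check_loss((y - location) / sigma, tau))


def mass_below_location(location, sigma, tau, steps=40000):
    """Midpoint-rule approximation to P(Y <= location) under the ALD."""
    width = 40.0 * sigma / (1.0 - tau)  # far enough into the left tail to ignore the remainder
    h = width / steps
    total = sum(ald_density(location - (i + 0.5) * h, location, sigma, tau)
                for i in range(steps))
    return total * h
```

Under these assumptions the returned mass should be close to τ for any location and scale, which is what makes the ALD a working likelihood for the τ-th regression quantile.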
Moreover, the Gibbs sampler is more germane to the purpose of generating posterior samples for
hierarchically-specified models, such as the system in equations 1.6 and 1.7. However, one must modify
the model in equation 1.6 to ensure that the location parameter corresponds to the τ-th quantile of the
distribution of y. Writing the model using equation 3.6 should be straightforward enough, but one may be
hard-pressed to determine a simple closed-form conditional distribution for the parameters in β(τ), which
would make Gibbs sampling nearly impossible. In order to simplify the conditional distributions of the
quantile regression parameters, and thus permit Gibbs sampling, we may employ another representation
of the ALD. A random variable U~ALD(μ, σ) can be represented as a finite mixture of standard normal
and exponential random variables (eq. 1.13), because both U and its mixture equivalent have the same
moment generating functions (Reed and Yu, 2009) and characteristic functions (Kozumi and Kobayashi,
2009).
Fortunately, this allows us to rewrite equations 1.6 and 1.7 in such a way that the parameters in
the model for y correspond to the τ-th quantile of y (as in equations 3.8 and 3.9). Written in this fashion, we
can easily perform Gibbs sampling by first sampling σ_M^2 from a gamma distribution and then α_0 and α_1
from relatively non-informative distributions; usually a “flat” Normal prior is preferred because it is
self-conjugate. The next step begins by sampling w from a standard exponential distribution, and computing
the variance of y as in equation 3.8. α_1, β_1, and γ' are then updated easily by conditioning on σ_Y^2 and
other relevant quantities. Note that one could easily incorporate a free scale parameter from the ALD, by
adding a hyperparameter to the distribution of w – e.g. sampling w from an Exp(ρ), where the
hyperparameter ρ follows any distribution that allows it to be strictly positive. It is expected that
expanding the parameter space in this fashion will speed convergence for some problems. Gibbs
sampling for this model can be readily implemented in existing packages such as WinBUGS
(http://www.mrc-bsu.cam.ac.uk/bugs/), which allows for easy specification of hierarchical models of the
form in equations 3.8 and 3.9.
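(3.8) $Y \sim N(\mu_{Y}, \sigma_{Y}^{2})$, where $\mu_{Y} = \dfrac{1-2\tau}{\tau(1-\tau)}\,w + \beta_{0} + \beta M + \gamma' X$ and $\sigma_{Y}^{2} = \dfrac{2w}{\tau(1-\tau)}$
(3.9) $M \sim N(\mu_{M}, \sigma_{M}^{2})$, where $\mu_{M} = \alpha_{0} + \alpha X$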
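The mixture representation in equation 3.8 can be checked by simulation: drawing w from a standard exponential and then Y from the conditional Normal should place a fraction τ of the draws below the location β_0 + βM + γ'X. The sketch below collapses that linear predictor into a single number called `location`, which, like the function name and seed, is a simplification made here for illustration.

```python
import random


def simulate_ald_mixture(location, tau, n, seed=11):
    """Draw n values of Y from the Normal-exponential mixture of equation 3.8."""
    rng = random.Random(seed)
    shift = (1.0 - 2.0 * tau) / (tau * (1.0 - tau))  # coefficient of w in mu_Y
    scale_sq = 2.0 / (tau * (1.0 - tau))             # sigma_Y^2 = scale_sq * w
    draws = []
    for _ in range(n):
        w = rng.expovariate(1.0)                     # standard exponential
        draws.append(rng.gauss(location + shift * w, (scale_sq * w) ** 0.5))
    return draws


def fraction_below(draws, location):
    """Empirical proportion of draws at or below the location."""
    return sum(1 for y in draws if y <= location) / len(draws)
```

For instance, with τ=0.25 roughly a quarter of the simulated values should fall at or below the location, up to Monte Carlo error.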
3.3.3 The Multiple Mediator Model
Many proposed mediation models involve more than one mediator, and covariance structure
analysis is typically employed for such models when effects on the mean(s) of the response distribution(s)
is(are) of interest (Bollen, 1989). This dissertation thus develops analogous approaches for quantile
mediation models, for which the covariance structure does not sufficiently identify quantile-specific
parameters. This section will outline two separate approaches to the simple case where there is a single
outcome, but multiple mediators. The model encompassed in equations 3.10-3.11 will be used to
illustrate the proposed methods, where it is assumed that there are three mediators which may or may not
be correlated, and which share a common set of covariates X and Z_1. Though the additive structure of the
model may be overly restrictive in some situations, it is employed here to illustrate the utility of the
methods described above when more than one mediator is of interest.
(3.10) $Y = \beta_0 + \sum_{k=1}^{3} \beta_k M_k + \gamma' X + u$
(3.11) $M_k = \alpha_{0k} + \alpha_k X + \delta_k Z_1, \quad k = 1,2,3$
For models involving multiple mediators, one may envision both a marginal and joint modeling
approach. The former is dubbed the marginal model because it assumes that the mediators are all
uncorrelated, and estimates each of the 3 equations in 3.11 separately, as well as 3 separate versions of
equation 3.10. Parameter estimates for the marginal model can be easily obtained using existing methods,
such as linear (or possibly quantile) regression for each of the 3 mediators, and then combining fitted
values of each of the mediators into equation 3.10, on which quantile regression can be conducted. The
respective estimates of $\alpha_k$ are then simply multiplied with the corresponding estimates of $\beta_k$ to obtain the
indirect effect of the k-th mediator. Despite its simplicity, the marginal model does not account for the
common situations in which several mediators are indeed correlated, and ignoring that
correlation reduces power in much the same way that ignoring measurement error does.
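The product-of-coefficients step described above can be written compactly. In the expressions below, the label IE is a hypothetical shorthand introduced here for the indirect effect, hats denote estimates, and the dependence of the outcome coefficient on the quantile τ reflects the use of quantile regression for equation 3.10.

```latex
\[
\widehat{\mathrm{IE}}_k(\tau) = \hat{\alpha}_k \, \hat{\beta}_k(\tau), \quad k = 1,2,3,
\qquad
\widehat{\mathrm{IE}}_{\mathrm{total}}(\tau) = \sum_{k=1}^{3} \hat{\alpha}_k \, \hat{\beta}_k(\tau)
\]
```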
To resolve this issue, one may specify a joint model for the mediators as in equation 3.11. A
multivariate regression (Johnson and Wichern, 2007) can be employed to obtain fitted values and
parameter estimates for the three mediators, and these are then combined in the same manner as the
marginal model to obtain the indirect effects of each mediating variable. Note that this model has the
added benefit that one can explicitly account for correlation between the mediators, and can specify these
relationships in any manner one so chooses such as compound symmetry (or assuming they are equally
correlated with each other), or independence (which should be the same as the marginal model). As usual,
assumptions such as sphericity which accompany the various structures for the covariance matrix must be
assessed before drawing any conclusions about mediation (ibid.). One could minimize the number of
such assumptions by estimating each of the parameters in the covariance matrix, but likely at the expense
of power to detect indirect effects which are small.
Alternatively, one could rewrite equations 3.10-3.11 as 3.12-3.13, resulting in a model that can be
estimated rather easily using common MCMC techniques such as the Metropolis algorithm. Gibbs
sampling would not be suitable for this model because the conditional distributions of the correlation
parameters do not have known forms. Note that one may easily rewrite equations 3.13 as 3.14 to obtain
the marginal mediator formulation of the Bayesian model, and this simpler model has the advantage that
one may directly apply the Gibbs sampling approach described in Section 3.3.2 for each of the mediators
in equation 3.14 separately, then update the parameters in equation 3.12 as usual. The joint model can
utilize a Normal-inverse-Wishart prior distribution, which is conjugate to a multivariate t-distribution.
One could also standardize each of the mediators and fix the correlations between the mediators, which
would simplify the model and result in a simpler multivariate normal posterior distribution, as well as
enable Gibbs sampling to reduce the computational burden of the model.
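(3.12) $Y \sim N(\mu_Y, \sigma_Y^2)$, where $\mu_Y = \frac{1-2\tau}{\tau(1-\tau)}\, w + \beta_0 + \sum_{k=1}^{3} \beta_k M_k + \gamma' X$ and $\sigma_Y^2 = \frac{2w}{\tau(1-\tau)}$
(3.13) $\begin{pmatrix} M_1 \\ M_2 \\ M_3 \end{pmatrix} \sim MVN\left( \begin{pmatrix} \alpha_{01} + \alpha_{11} X + \delta_1 Z_1 \\ \alpha_{02} + \alpha_{12} X + \delta_2 Z_1 \\ \alpha_{03} + \alpha_{13} X + \delta_3 Z_1 \end{pmatrix}, \begin{pmatrix} \sigma_{M_1}^2 & \rho_{M_1 M_2}\sigma_{M_1 M_2} & \rho_{M_1 M_3}\sigma_{M_1 M_3} \\ \rho_{M_2 M_1}\sigma_{M_2 M_1} & \sigma_{M_2}^2 & \rho_{M_2 M_3}\sigma_{M_2 M_3} \\ \rho_{M_3 M_1}\sigma_{M_3 M_1} & \rho_{M_3 M_2}\sigma_{M_3 M_2} & \sigma_{M_3}^2 \end{pmatrix} \right)$
(3.14) $M_k \sim N(\mu_{M_k}, \sigma_{M_k}^2)$, where $\mu_{M_k} = \alpha_{0k} + \alpha_k X, \quad k = 1,2,3$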
Although the marginal model is appealing from a computational perspective, ignoring the
correlation between mediators may result in biased estimates of the indirect effects. Indeed, an important
limitation of Imai et al.'s R implementation of the QCME method (2011) is that the specific indirect
effects for each of the p mediators are estimated separately, and thus neither allows one to characterize
any possible correlation between mediators in either part of the model, nor to mutually include all p
mediators in the equation for Y. The resulting bias will be proportional to the degree of correlation
between the mediators of interest, and becomes larger as the degree of correlation increases. This is a
well-known fact from the theory of linear models (cf. Johnson and Wichern, 2007, p. 387). In the two-
mediator scenario, the bias introduced by estimating only one of the two indirect effects will be directly
proportional to the covariance between the two mediators. Thus, the Marginal model will be unbiased
only in the case where the two mediators have zero correlation.
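The dependence of the bias on the inter-mediator correlation can be illustrated with a small simulation. The sketch below is an assumption-laden illustration rather than the dissertation's own simulation: it works with the mean (least-squares) regression rather than a quantile regression, drops X, gives both mediators unit variance, and uses hypothetical function and argument names. Under those assumptions, regressing Y on the first mediator alone yields a slope near β₁ + ρβ₂ instead of β₁, so the error in the mediator-to-outcome coefficient (and hence in the product with α) grows with ρ and vanishes when ρ is zero.

```python
import math
import random


def slope_omitting_m2(beta1, beta2, rho, n=20000, seed=7):
    """Least-squares slope of Y on M1 alone when the correlated M2 is left out."""
    rng = random.Random(seed)
    m1_values, y_values = [], []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        m1 = z1
        m2 = rho * z1 + math.sqrt(1 - rho ** 2) * z2  # Corr(M1, M2) = rho
        y = beta1 * m1 + beta2 * m2 + rng.gauss(0, 1)
        m1_values.append(m1)
        y_values.append(y)
    mean_m = sum(m1_values) / n
    mean_y = sum(y_values) / n
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in zip(m1_values, y_values))
    sxx = sum((m - mean_m) ** 2 for m in m1_values)
    return sxy / sxx


if __name__ == "__main__":
    for rho in (0.0, 0.2, 0.5):
        # Compare the fitted slope with beta1 + rho * beta2.
        print(rho, round(slope_omitting_m2(0.5, 0.5, rho), 3), 0.5 + rho * 0.5)
```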
In order to deal with the inter-mediator correlation in equation 3.13, we allow all covariance
parameters to be estimated, and employ a set of independence assumptions to make the model identifiable,
and Gibbs sampling feasible. Namely, we assume: 1) $P(\alpha_1, \alpha_2, \dots, \alpha_p) = \prod_i P(\alpha_i)$,
2) $P(\alpha_1, \alpha_2, \dots, \alpha_p \mid \beta, \gamma', \sigma_u) = \prod_i P(\alpha_i)$, 3) $P(\beta \mid \gamma') = P(\beta)$, and 4) $P(u,v) = P(u)P(v)$. These are easily seen to be the
commonly used recursive assumptions to identify parameters of an SEM, what Pearl calls the Markovian
assumption to identify causal mediation effects (1998). It is worth noting that allowing the scale
parameter of the ALD (i.e. σ) to be freely estimated results in a potentially under-identified model,
because the recursive rule described in section 1 usually results in a “just-identified” model. Our own
experience indicates that freeing σ results in severe underestimation of the specific and total indirect
effects at all quantiles, and so the results presented in subsequent sections assumed that σ was fixed to
unity.
3.4 Quantile Mediation with a Latent Mediator Variable
Of course, both the marginal and joint models assume that every mediator is measured without
error. Common situations involve mediators which are not directly measurable, such as using the NEWS
to quantify Walkability in the Healthy Places Study (section 1.2.2). The key pitfall of ignoring the
measurement error is that each indicator (and as a result a linear combination of them) will be more
variable than a model which treats them as indicators of a latent variable, which ultimately results in
underestimation of structural parameters (Bollen, 1989). In order to explicitly deal with measurement
error, methods based on covariance structure analysis have typically been employed, such as in the EQS
(Bentler and Weeks, 1980), LISREL (Joreskog, 1970) or MPLUS (Muthen, 1994) software packages.
However, these approaches are not suitable for the estimation of quantile mediation, especially in
situations such as those described in section 4 below, involving multilevel data structures. Section 5.2
shows that the indirect effects of Walkability at the 85th and 95th percentiles of BMI and Waist
Circumference are substantially greater than the effects at their means. Under such circumstances, the
suitability of the covariance structure for estimating indirect effects at the upper or lower tails of the
response distribution is seriously compromised.
As this is a common occurrence for data arising in prevention research (e.g. the Healthy Places
Study), it is important that the quantile mediation model be generalized to situations where some of the
variables in the model are latent. Burgette and Reiter (2011) have already done some work in this regard,
combining Lee's (2007) Bayesian Confirmatory Factor Analysis (CFA) model with Reed and Yu's (2009)
Bayesian quantile regression model to obtain a Bayesian CFA model for conditional quantiles. The next
step in this process is to extend Burgette and Reiter's model to the situation where one or more mediators
are latent variables, and some quantile of the response distribution is of interest. We once again write the
model hierarchically as in equations 3.8-3.9, and further enrich the model for the mediators (equation 3.14)
as in equations 3.15-3.16. MCMC methods may then be applied to sample from the conditional
distributions of the factor loadings, and the corresponding covariance matrix, using WinBUGS
(http://www.mrc-bsu.cam.ac.uk/bugs/) as in Lee (2007).
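(3.15) $\begin{pmatrix} M_1 \\ M_2 \\ \vdots \\ M_p \end{pmatrix} \sim \begin{pmatrix} \boldsymbol{\Omega} + \psi_1 \\ \lambda_2 \boldsymbol{\Omega} + \psi_2 \\ \vdots \\ \lambda_p \boldsymbol{\Omega} + \psi_p \end{pmatrix}, \quad \boldsymbol{\Lambda} = \begin{pmatrix} 1 \\ \lambda_2 \\ \vdots \\ \lambda_p \end{pmatrix}, \quad \Psi = \begin{pmatrix} \sigma_{\psi_1}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{\psi_p}^2 \end{pmatrix}$
(3.16) $\boldsymbol{\omega}_i = \alpha_0 + \alpha_1 X_i + \phi_i, \quad \begin{pmatrix} \phi_1 \\ \vdots \\ \phi_n \end{pmatrix} \sim N\left( \begin{pmatrix} 0 \\ \vdots \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma_{\phi}^2 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & \sigma_{\phi}^2 \end{pmatrix} \right)$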
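The attenuation that motivates the latent-variable formulation in equation 3.15 can be seen in a small simulation. This is a sketch under simplifying assumptions that are not taken from the text: a single indicator with loading fixed at 1, a latent mediator with unit variance, a mean (least-squares) regression in place of a quantile regression, and hypothetical function and argument names. Under those assumptions, regressing the outcome on the error-prone indicator gives a slope close to β / (1 + error variance), which understates the structural coefficient β.

```python
import math
import random


def slope_on_indicator(beta, error_var, n=20000, seed=11):
    """Least-squares slope of Y on a noisy indicator of a latent mediator."""
    rng = random.Random(seed)
    indicators, outcomes = [], []
    for _ in range(n):
        omega = rng.gauss(0, 1)  # latent mediator, unit variance
        m = omega + rng.gauss(0, math.sqrt(error_var))  # indicator, loading 1
        y = beta * omega + rng.gauss(0, 1)  # outcome depends on the latent value
        indicators.append(m)
        outcomes.append(y)
    mean_m = sum(indicators) / n
    mean_y = sum(outcomes) / n
    sxy = sum((m - mean_m) * (y - mean_y) for m, y in zip(indicators, outcomes))
    sxx = sum((m - mean_m) ** 2 for m in indicators)
    return sxy / sxx


if __name__ == "__main__":
    for error_var in (0.0, 0.5, 1.0):
        # Compare the fitted slope with beta / (1 + error_var).
        print(error_var, round(slope_on_indicator(0.5, error_var), 3))
```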
Two simplifying assumptions were made with respect to Lee's model. First, the model described
in this section only considers latent mediators, as represented in equations 3.15-3.16. Second, we do not
specify a joint prior distribution for Ω and Y because our model for the latter corresponds to arbitrary
quantiles of its distribution, while Lee's set up only applies to the mean, thus making a similar
multivariate normal prior inappropriate. Following Lee (ibid.), we also assume that the indicators of the
latent variable M are independent in their prior distribution, resulting in independent priors for the factor
loadings. The additional identifying assumption of fixing some factor loadings was also employed, and is
common in the SEM literature (Bollen, 1989; Lee, 2007). Following from these assumptions, a Gibbs
sampling approach can be implemented as follows.
i. Initialize the parameter sets $\theta^{(1)} = \{\alpha_0, \alpha_1, \beta_0, \beta_1, \gamma'\}$, $\Lambda^{(1)}$, $\Psi^{(1)}$ and the vector of disturbances $\phi$,
using the following set of prior distributions
a. $\pi(\alpha_0, \alpha_1, \beta_0, \beta_1, \gamma') = \pi(\alpha_0)\pi(\alpha_1)\pi(\beta_0)\pi(\beta_1)\pi(\gamma')$, that is, using independent priors
b. $\psi_k^{(1)} \sim \mathrm{Gamma}(1,1)$ for k=1,...,p
c. $\pi(\lambda^{(1)} \mid \Psi^{(1)}) = \pi(\lambda_2 \mid \psi_2) \cdots \pi(\lambda_p \mid \psi_p)$, again using independent priors
ii. Generate the factor scores $\Omega^{(2)}$ from $p(\Omega \mid \theta^{(1)}, \Lambda^{(1)}, \Psi^{(1)}, \phi^{(1)}, Y, \{M_1, \dots, M_p\})$
iii. Generate the diagonal elements of $\Psi^{(2)}$ from $p(\Psi \mid \Lambda^{(1)}, \phi^{(1)}, \Omega^{(2)}, Y)$.
iv. Generate the elements of $\Lambda^{(2)}$ from $p(\Lambda \mid \theta^{(1)}, \Psi^{(2)}, \phi^{(1)}, \Omega^{(2)}, Y)$, which we do independently for
each factor loading – i.e. sample $\lambda_2^{(j+1)}$ from $p(\lambda_2 \mid \theta^{(j)}, \Psi^{(j+1)}, \phi^{(j)}, \Omega^{(j+1)}, Y)$, and so on.
v. Generate $\phi^{(2)}$ from $p(\phi \mid \theta^{(1)}, \Lambda^{(2)}, \Psi^{(2)}, \Omega^{(2)}, Y)$
vi. Finally, sample $\theta^{(2)}$ from $p(\theta \mid \phi^{(2)}, \Lambda^{(2)}, \Psi^{(2)}, \Omega^{(2)}, Y)$
vii. Repeat ii through vi as necessary, up to a designated number of samples.
Once these steps are encoded in the language of WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/), one
may conduct estimation and inference by drawing samples from the posterior distribution of θ (as in
section 3.3.2), and using the appropriate sample statistics as described in section 3.3.1.
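The sweep structure of steps ii through vii (each block is redrawn from its full conditional, always using the most recent values of the other blocks) can be sketched generically. The code below is not the WinBUGS model and does not implement the conditional distributions of the latent mediator model, which are not reproduced here. As a stand-in, it uses a toy target: a standard bivariate Normal with correlation ρ, whose two full conditionals are known in closed form. All names are hypothetical.

```python
import math
import random


def gibbs(updates, state, n_iter, rng):
    """Run n_iter sweeps; each sweep redraws every block, in order."""
    draws = []
    for _ in range(n_iter):
        for name, sampler in updates:
            # Each draw conditions on the newest values of all other blocks.
            state[name] = sampler(state, rng)
        draws.append(dict(state))
    return draws


def toy_chain(rho=0.6, n_iter=20000, burn_in=1000, seed=3):
    """Gibbs draws from a standard bivariate Normal with correlation rho."""
    sd = math.sqrt(1 - rho ** 2)
    updates = [
        ("x", lambda s, rng: rng.gauss(rho * s["y"], sd)),  # x | y
        ("y", lambda s, rng: rng.gauss(rho * s["x"], sd)),  # y | x
    ]
    draws = gibbs(updates, {"x": 0.0, "y": 0.0}, n_iter, random.Random(seed))
    return draws[burn_in:]  # discard the burn-in sweeps


def sample_corr(draws):
    """Sample correlation between the x and y draws."""
    xs = [d["x"] for d in draws]
    ys = [d["y"] for d in draws]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    print(round(sample_corr(toy_chain()), 3))  # should be near rho
```

In the latent mediator model, the list of updates would contain one entry per block (factor scores, Ψ, each factor loading, the disturbances, and θ), in the order given in steps ii through vi.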
While a more general approach would allow the factor loadings to be correlated in their prior,
specification of the prior factor structure should be predefined by prior knowledge (Bollen, 1989; Lee,
2007; Burgette and Reiter, 2011). Specifically, one ought to know before the analysis stage what
indicators correspond to which latent variables, thus allowing for a block-diagonal structure in the matrix
of factor loadings when more than one latent variable are involved, as was done by Burgette and Reiter
(2011, p. 3). The same reasoning applies when one considers the prior specification of multiple indicators
with respect to a single latent variable, for which the selection of indicators that capture substantively
different aspects of the factor ought to imply independence in their prior distribution. This line of
reasoning was applied in the specification for the application of this method to data from the HPS, in
section 5.5 below. The implementation described in section 5.5 below follows the strategy described by
Lee for Bayesian SEM (2007, pp. 82-98), more specifically following the WinBUGS implementation
used to fit a two-factor SEM (Lee, 2007, pp.98-103).
3.5 Simulation Studies
A simulation study was conducted to determine: 1) whether the Bayesian quantile mediation model
in equations 3.15 and 3.16 produces unbiased estimates of the quantile indirect effects (both for a single-
mediator and a two-mediator model); and 2) the coverage rates of the Bayesian model. Both points were
examined while varying the effect size (Cohen, 1988) of each of the model variables. In order to examine
the issues listed above, we generated data according to equations 3.17 and 3.18, which allow for the effect
of X to vary by quantiles of Y. The variable X was taken to be a treatment variable with 500 subjects split
into two treatment groups, of roughly equal size. It was assumed that either one or two mediators were of
interest, each having mean 4 and varying effects on the outcome variable (0.2, 0.5 or 0.8), while the size
of the effect of X on each mediator was also allowed to vary (0.2, 0.5 or 0.8). This allowed us to assess
the performance of the Bayesian model for estimating weak, moderate or large specific indirect and total
indirect effects, for both single and multiple mediator settings. Each mediator was assumed to have unit
variance, with (for the multiple-mediator model) two possible degrees of correlation corresponding to
weak and moderate correlation (0.2 and 0.5, respectively). For the single mediator model, there were 9
different simulation scenarios for which 500 realizations were generated. The two-mediator model had
36 different simulation scenarios, for which 500 realizations were generated per scenario for equations
3.15 and 3.16. Since the distributions of indirect effects are not always Normal, posterior medians were
used to obtain Bayesian point estimates for all model parameters. For either single or multiple mediator
settings, the true value of any β(τ) is equal to $\beta + F_u^{-1}(\tau)$. We decided to allow α to be the same for all
outcome quantiles, as the interest lies in how mediated effects change across the outcome distribution, and
not for quantiles of the mediator.
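(3.17) $Y_i = \beta_0 + \sum_{k=1}^{2} (\beta_k + u_i) M_{i,k} + \gamma' X_i + \varepsilon_i$
(3.18) $\begin{pmatrix} M_{i,1} \\ M_{i,2} \end{pmatrix} \sim MVN\left( \begin{pmatrix} \alpha_{01} + \alpha_1 X_i \\ \alpha_{02} + \alpha_2 X_i \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right)$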
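The quantile-specific coefficient implied by the data-generating model can be motivated with a short derivation. The sketch below rests on simplifying assumptions that are not part of the original text: a single mediator with $M_i > 0$, and the additive error $\varepsilon_i$ set aside, so that $Y_i$ is an increasing function of $u_i$ and quantiles of $u_i$ map directly to conditional quantiles of $Y_i$. The symbol $Q_Y$ is introduced here for the conditional quantile function.

```latex
\[
Y_i = \beta_0 + (\beta + u_i) M_i + \gamma' X_i
\;\Longrightarrow\;
Q_Y(\tau \mid M_i, X_i) = \beta_0 + \bigl(\beta + F_u^{-1}(\tau)\bigr) M_i + \gamma' X_i ,
\]
\[
\text{so that } \beta(\tau) = \beta + F_u^{-1}(\tau).
\]
```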
3.5.1 Single Mediator Model v. Two-stage Approaches
Figure 3.1 displays the results for the relative bias of the indirect effects at various quantiles per
the Bayesian model, as the size of the total indirect effect changes (for both single and multiple mediator
models), and as the correlation between the two mediators changes (multiple mediator model only).
Figure 3.1 suggests that a larger treatment-to-mediator (α) effect results in more biased estimates of the
indirect effect (αβ(τ)) when the mediator-to-outcome (β(τ)) effect is weak or moderate, and this appears to
be true for central as well as extreme quantiles. On the other hand, for any single α, the relative bias of
αβ(τ) drops off as β(τ) increases. In any event, the bias of αβ(τ) is as low as 0.3% and as high as 7.7%,
with the highest degrees of bias occurring when β(τ) is small. Table 3.1 displays estimates of the indirect
effects αβ(τ) for different quantiles of the outcome distribution. It is clear that the Bayesian and ACME
methods perform about the same for all quantiles considered, and in some cases the Bayesian model has
smaller bias than the ACME. This indicates that the two methods should be interchangeable when there
is only a single mediator in the model. However, the Bayesian model has the added advantage of being
capable of jointly estimating the indirect effects for multiple mediators, while simultaneously capturing
the correlation between those mediators. This is usually an important advantage of SEM-based methods
over methods such as the Causal Steps or ACME which estimate the parameters for each mediator
separately. Indeed, this is the key advantage of the Bayesian model over the others.
Method     τ     α     β = 0.2   β = 0.5   β = 0.8
Bayesian   0.5   0.2   0.90%     0.30%     1.66%
Bayesian   0.5   0.5   3.44%     3.57%     3.21%
Bayesian   0.5   0.8   6.03%     6.22%     3.61%
Bayesian   0.7   0.2   4.46%     0.42%     4.93%
Bayesian   0.7   0.5   5.29%     5.71%     3.81%
Bayesian   0.7   0.8   7.66%     5.92%     2.12%
Bayesian   0.9   0.2   5.35%     0.45%     4.13%
Bayesian   0.9   0.5   4.67%     6.21%     3.29%
Bayesian   0.9   0.8   6.60%     4.83%     3.63%
ACME       0.5   0.2   1.28%     0.76%     2.02%
ACME       0.5   0.5   3.80%     4.11%     3.79%
ACME       0.5   0.8   6.29%     6.69%     4.18%
ACME       0.7   0.2   7.45%     1.40%     6.04%
ACME       0.7   0.5   6.99%     6.91%     4.66%
ACME       0.7   0.8   8.64%     6.68%     2.71%
ACME       0.9   0.2   6.96%     1.58%     4.94%
ACME       0.9   0.5   5.78%     7.05%     4.00%
ACME       0.9   0.8   7.31%     5.43%     4.12%
Table 3.1. Relative Bias (%) of Parameters using the Bayesian and ACME Models
Figure 3.1. Relative Bias of Bayesian and QCME Model, for different quantiles
3.5.2 Multiple Mediator Model
A simulation study was conducted to determine whether: 1) the Bayesian quantile mediation
model in equations 3.12-3.13 produces unbiased estimates of all model parameters; and 2) whether the
model performs differently when we vary the effect size (Cohen, 1988) of each of the model variables. In
order to examine the issues listed above, equations 3.12 and 3.13 were taken to represent the underlying
model, but with only two mediators rather than three. The simulation scenario follows the strategy
described above. Since the distributions of indirect effects are not Normal (MacKinnon, 2008), posterior
medians were used to obtain Bayesian point estimates for all model parameters.
For the two-mediator model, it is immediately evident from Figure 3.2 that the bias of the total
indirect effect ($\alpha_1\beta_1(\tau) + \alpha_2\beta_2(\tau)$) tends to be greater when at least one of the specific indirect effects is
small, and lesser when both of the specific indirect effects are medium or large. At the median, for either
weak or moderate correlation, the relative bias drops off dramatically as either one or both of the indirect
effect sizes gets larger (when the correlation is moderate, the relative bias was as low as 3.8% for large
effect sizes and as high as 46% for the smallest effect size). For the upper quantiles of 0.7 and 0.9, the
relative bias of the total indirect effect gets as low as less than 1%, and never exceeds 18% even for the
cases with weak correlation and small effect sizes. Since we were also interested in the model
performance under situations with no indirect effect and zero inter-mediator correlation, we also
generated a small set of realizations to that effect. When the mediators exhibit no correlation, the Joint
Bayesian mean and median models incorrectly estimate the correlation to be non-zero about 5.4% and 4.6%
of the time, and with a relative bias of less than 1%. The same can be said for the other model parameters
when $\alpha_1 = \beta_1 = \gamma' = 0$. Coverage rates of the indirect effects $\alpha_1\beta_1(\tau)$ and $\alpha_2\beta_2(\tau)$ range from about 0.75 to 1
for all correlations when the effect size is moderate, with only a couple of exceptions. The rates for weak
and large effect sizes are about 5-10% lower and higher, respectively.
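The two performance measures reported in this section can be computed along the following lines. The definitions used here are assumptions, since the text does not spell them out: relative bias is taken as the absolute difference between the average point estimate across realizations and the true value, expressed as a percentage of the true value, and coverage is taken as the share of realizations whose interval contains the true value. Function names are hypothetical.

```python
from statistics import mean


def relative_bias_pct(estimates, true_value):
    """Absolute relative bias (%) of the average estimate across realizations."""
    return 100.0 * abs(mean(estimates) - true_value) / abs(true_value)


def coverage_rate(intervals, true_value):
    """Share of (lower, upper) intervals that contain the true value."""
    hits = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    return hits / len(intervals)


if __name__ == "__main__":
    # Illustrative numbers only, not results from the simulation study.
    point_estimates = [0.09, 0.11, 0.106]
    intervals = [(0.05, 0.13), (0.07, 0.15), (0.11, 0.19)]
    print(relative_bias_pct(point_estimates, 0.10))
    print(coverage_rate(intervals, 0.10))
```

In the setting of this section, each element of `estimates` would be the posterior median of an indirect effect from one simulated realization.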
Figure 3.2. Relative Bias of Joint Bayesian Model, for different inter-mediator correlations and quantiles
3.6 Comments Regarding the Simple Model
It is worth noting a few key points regarding the model in sections 3.3, 3.4 and 2.2. First, the
types of models with which data analysts will often be concerned involve much more complicated
relationships than those encompassed in 3.8 and 3.9. It is often expected that multiple mediators will
affect the same outcome, an issue that is handled via the methods in section 3.3.3. Second, any mediation
model should be able to accommodate variables which may be measured with error, which is covered in
section 3.4. A third issue concerns situations when assuming independence of the sample observations is
incorrect. This arises in studies which collect repeated measurements on individual study participants, as
well as those in which underlying exposures or disease risks among study participants are clustered at
spatial levels.
Examples of the latter dependence arise in air pollution studies, where children attending the
same school or living in the same neighborhoods have common exposure to ambient levels of different
pollutants. Other examples include the Healthy Places Study, where subjects are measured at multiple
measurement occasions, and so the resulting correlation for an individual's sample observations needs to
be accounted for. Ignoring these sources of correlation is problematic for a variety of reasons, most
importantly that the crucial assumption of independence between sample observations fails to be true,
potentially invalidating any conclusions that might be drawn or resulting in less efficient estimators that
assume independence.
While two-stage approaches have the advantage of being relatively straightforward to implement,
they are not readily amenable to the three challenges listed above. The simulation studies in section 3.5
clearly demonstrate that the simple Bayesian model compares well with the two-stage methods in the case
of a single mediator, and is easily extended to the case when there are multiple correlated mediators. The
data analysis results of section 5 also indicate that the Bayesian approach may be even more powerful,
especially once it can properly account for the correlation between multiple mediators. It is later
shown that even the single mediator Bayesian model is further superior to two-stage approaches as
described in section 2 or by Imai (2010), when it can properly accommodate measurement error and
dependence between sample observations.
4 Multilevel Quantile Mediation Models
4.1 Overview of Multilevel Models for Mediation Analysis
Analysis of data from the Healthy Places study (Shen et al., Submitted-a; Shen et al., Submitted-b)
demonstrated that the examination of mediation for different quantiles of the outcome distribution
resulted in different effects for different quantiles. Specifically, it was found that subjects with much
higher BMI than normal (e.g. at the 85th or 95th quantiles) appeared to experience greater indirect
(mediated) effects of improved neighborhood walkability compared to normal weight subjects.
Useful as these papers were for demonstrating the utility of assessing mediation for different
quantiles of the outcome distribution, the models proposed are not suitable for a wide variety of analysis
problems in prevention research. Notably, situations in which the independence assumption for sample
observations fails to hold are rather important in prevention research, such as in intervention trials like
the Healthy Places Study, or HPS (Dunton et al., 2012), and cohort studies like the Children's Health
Study, or CHS (Jerrett et al., 2010; Wolch et al., 2011). Such studies collect data at multiple
levels of aggregation, ranging from repeated measurements on individual study subjects, to multiple
participants within a common population subgroup. An example of the latter is the CHS, in which the
intra-subject correlation structure is further nested within individual communities, within which subjects
share a common exposure profile. Failing to account for the different levels of data aggregation for these
types of data will result in biased and inefficient estimates of the target parameters of a linear model.
Failing to account for such data structures even in the conventional linear regression case may result in
parameter estimates that are less efficient, and as a result could lead to lower power to detect true
mediating relationships.
Clearly, this will also be the case when extending mediation to quantiles of the outcome
distribution. While Mixed Effects Models (MEM, for short) have been used for many years (Laird and
Ware, 1982), only recently have they been extended to model quantiles of an outcome (Geraci and Bottai,
2007; Tian and Chen, 2006; Wang et al., 2009; Wang, 2012; Wei et al., 2006; Yuan and Yin, 2010; Yue
and Rue, 2011). While this is not a complete list of all papers that have appeared on the topic, they
represent the general classes of models that have been proposed: 1) autoregressive models (Wei et al.,
2006); 2) hierarchical models (Tian and Chen, 2006); and 3) mixed effects models (Geraci and Bottai,
2007; Wang et al., 2009; Wang, 2012; Yuan and Yin, 2010; Yue and Rue, 2011). To date, no papers have
proposed such models for assessing quantile mediation (Shen et al., Submitted-a; Shen et al., Submitted-b)
when such data structures are at hand. This section seeks to fill that gap, by proposing a method which
combines the mixed effects modeling paradigm with quantile mediation techniques.
Though methods for multilevel quantile mediation have not been proposed prior to this
dissertation, there is an extensive literature on multilevel mediation analysis (Bollen and Curran, 2004;
Chou et al., 1998; Cole and Maxwell, 2003; Curran, 2003; Kenny et al., 2003; Krull and MacKinnon,
1999; Krull and MacKinnon, 2001; Maxwell et al., 2011; Meredith and Tisak, 1990; Muthen and
Asparouhov, 2008; Muthen, 1994; Song et al., 2008; Song et al., 2011; Yuan and MacKinnon, 2009), and
an excellent review of existing methods already exists (Preacher et al., 2010). The MPlus model is one of
the more popular SEM implementations (Muthen and Asparouhov, 2008; Muthen, 1994), and it
decomposes the implied covariance structure (Bollen, 1989) into between and within group components,
and can be thought of as an SEM analogue of ANOVA models with random effects (Montgomery, 2013).
Preacher et al (2010) extend the MPlus model (Muthen and Asparouhov, 2008) to accommodate
mediation hypotheses. The hierarchical linear modeling approach (Raudenbush and Bryk, 2002) is also
frequently used, as illustrated in the work of Krull and MacKinnon (1999; 2001), Chou et al. (1998),
Curran (2003), Kenny et al (2003), and Yuan and MacKinnon (2009). Other approaches use
autoregressive models to explicitly model time dependence of outcomes and mediators measured at
multiple occasions, often in combination with SEM approaches for latent variables (Bollen and Curran,
2004; Cole and Maxwell, 2003; Maxwell et al., 2011; Meredith and Tisak, 1990; Song et al., 2008; Song
et al., 2011). Among these various methods, the Bayesian approaches (Song et al., 2008; Song et al.,
2011; Yuan and MacKinnon, 2009) are able to circumvent many of the limitations of those based on
Covariance Structure Analysis (Bollen and Curran, 2004; Chou et al., 1998; Muthen, 1994), or regression
methods like the so-called “2→1→1” (Krull and MacKinnon, 2001) or “HLM5” models (Kenny et al.,
2003). Namely, the Bayesian methods overcome the following challenges of their non-Bayesian
counterparts: 1) missing data (Lee, 2007a); 2) some parameters, like the product of coefficients and its
related covariance term, are difficult to estimate consistently (Kenny et al., 2003; Maxwell and Cole,
2007; Yuan and MacKinnon, 2009); 3) classical tests of mediation like the Sobel test (Sobel, 1982) are
theoretically incorrect (MacKinnon et al., 2004); 4) the level (particularly an upper level) at which
mediation occurs (Bauer et al., 2006; Kenny et al., 2003; Krull and MacKinnon, 1999; Krull and
MacKinnon, 2001) can make it difficult to obtain consistent estimates of mediating relationships as “it is
more convenient [and in fact, better] to estimate the parameters making up the indirect effect
simultaneously, as part of one model, in which multiple components of variance could be considered
simultaneously” (Preacher et al., 2010, p. 213); and most notably, 5) quantiles of the outcome or mediator
cannot be modeled explicitly.
While the first four challenges have been dealt with in some fashion in the papers cited above, no
paper to date has dealt with modeling any quantile of the outcome in an analogous fashion. A method to
deal with the median of an outcome variable is forthcoming (Yuan and MacKinnon, 2013), but their
method is intended to be “robust to various departures from the assumption of homoscedasticity and
normality, including heavy-tailed, skewed, contaminated, and heteroscedastic distributions.” That is, it is
not meant to allow for modeling mediating relationships for any quantile of an outcome, nor does it
encompass models which can account for multilevel or longitudinal data structures. Regarding the former,
one may posit that hypothesized relationships may differ for subjects at different parts of the outcome
distribution, and this has been shown to be the case in several mediation and non-mediation settings
involving obesity outcomes (Abrevaya, 2001; Beyerlein et al., 2010; Burgette et al., 2011; Chernozhukov
and Fernández-Val, 2011; Shen et al., Submitted-a; Shen et al., Submitted-b; Wei et al., 2006). The latter
point follows from the discussion above. As a consequence, this section proposes a method for assessing
mediational relationships for any quantile of an outcome, in such a manner that multilevel data structures
can be taken into account.
The balance of the section is organized as follows. Section 4.2 begins by introducing the model
setup along with relevant notation and assumptions, and providing a bit of background on quantile
regression and quantile mediation methods. It concludes by further detailing the approach proposed in
this paper. Section 4.3 summarizes a simulation study, and foreshadows the data analysis results detailed
in section 5.4.
4.2 Multilevel Model for Quantile Mediation
Combining the Bayesian quantile mediation model of Shen et al (Shen et al., Submitted-a) with
the multilevel mediation model of Yuan and MacKinnon (Yuan and MacKinnon, 2009), we may write a
generic multilevel mediation model as equations 4.1-4.3. In this setup, j indexes repeated observations 1
through $n_i$ within one of the groups i=1,…,g. The groups can be individual subjects, as in the HPS or
CHS, or could reflect other nested data structures such as families, schools, or hospital locations.
Equation 4.3 is the prior distribution of the model parameters, where the covariance matrix Σ could have
any structure one wishes to specify, or otherwise could be diagonal (which is, in theory, the same as
letting the parameters be mutually independent in their priors). Following Yuan and MacKinnon (2009),
we assume all off-diagonal elements of Σ are zero, except for Σ[α1, β1], as our experience similarly
indicates that such a prior results in more stable and efficient estimation of those parameters. We can
alternatively write equation 4.3 in the more familiar multilevel model format in equation 4.4, and the
models are clearly equivalent if one assumes the vector of population parameters θ is the mean of $\theta_i$, and
that Σ is the covariance matrix of the vector u.
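(4.1) $y_{ij} = \beta_{0i} + \beta_{1i} M_{ij} + \gamma_i' X_i + \varepsilon_{1ij}$
(4.2) $M_{ij} = \alpha_{0i} + \alpha_{1i} X_{ij} + \varepsilon_{2ij}$
(4.3) $[\alpha_{0i}, \alpha_{1i}, \beta_{0i}, \beta_{1i}, \gamma_i']^T \sim MVN([\alpha_0, \alpha_1, \beta_0, \beta_1, \gamma']^T, \Sigma)$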
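(4.4) $\boldsymbol{\theta}_i = \begin{pmatrix} \alpha_{0i} \\ \alpha_{1i} \\ \beta_{0i} \\ \beta_{1i} \\ \gamma_i' \end{pmatrix} = \boldsymbol{\theta} + \boldsymbol{u}_i = \begin{pmatrix} \alpha_0 \\ \alpha_1 \\ \beta_0 \\ \beta_1 \\ \gamma' \end{pmatrix} + \begin{pmatrix} u_{1i} \\ u_{2i} \\ u_{3i} \\ u_{4i} \\ u_{5i} \end{pmatrix}$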
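One reason to retain the covariance between the two group-specific slopes in Σ can be seen from the identity E[ab] = E[a]E[b] + Cov(a, b): when the slopes vary across groups as in equation 4.4, the average of the group-specific products differs from the product of the population slopes by exactly that covariance. The simulation below is a sketch with hypothetical names and illustrative values; it draws only the two slopes, with Normal random effects, and leaves the rest of the model aside.

```python
import math
import random


def mean_group_product(alpha1, beta1, sd_a, sd_b, corr, groups=50000, seed=5):
    """Average of alpha_1i * beta_1i over groups with correlated random slopes."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(groups):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        a_i = alpha1 + sd_a * z1
        # Build the second random effect so that Corr(a_i, b_i) = corr.
        b_i = beta1 + sd_b * (corr * z1 + math.sqrt(1 - corr ** 2) * z2)
        total += a_i * b_i
    return total / groups


if __name__ == "__main__":
    alpha1, beta1, sd_a, sd_b, corr = 0.5, 0.5, 0.3, 0.3, 0.5
    covariance = corr * sd_a * sd_b
    # The simulated average should be close to alpha1 * beta1 + covariance.
    print(round(mean_group_product(alpha1, beta1, sd_a, sd_b, corr), 4))
    print(alpha1 * beta1 + covariance)
```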
While we acknowledge that other methods based on causal inference principles define direct and
indirect effects differently (Imai et al., 2010; Pearl, 1998; 2012; Rubin, 2005; VanderWeele, 2009), this
paper follows the usual SEM paradigm by operationalizing the mediated effect of X through M as the
product of coefficients $\alpha_1\beta_1$. The key assumptions needed for identifiability of the product are as follows:
1) fix all but two off-diagonal elements in the prior covariance matrix Σ to zero; 2) $P(\varepsilon_1, \varepsilon_2) = P(\varepsilon_1)P(\varepsilon_2)$
and $Cov(\varepsilon_1, \varepsilon_2) = 0$; and 3) the usual assumption that $Cov(\alpha_{0i}, \varepsilon_{2ij}) = Cov(\alpha_{1i}, \varepsilon_{2ij}) = Cov(\beta_{0i}, \varepsilon_{1ij}) = Cov(\beta_{1i}, \varepsilon_{1ij}) = Cov(\gamma_i', \varepsilon_{1ij}) = 0$.
Assumptions 1 and 2 collectively state that the only correlation in which we are interested
is that between $\alpha_1$ and $\beta_1$. Assumptions 2 and 3 are commonly used assumptions to identify parameters of
an SEM, and assumption 2 is what Pearl (1998) calls the Markovian (or recursive: see Bollen, 1989)
assumption to identify causal mediation effects. We explicitly specify zero correlation between $\varepsilon_1$ and $\varepsilon_2$
along with the classical independence assumption, since two random variables can have zero correlation
but still be dependent, albeit non-linearly. One could also incorporate an additional scale parameter in
Equation 4.1 (Chesher, 2003; Kozumi and Kobayashi, 2011; Ma and Koenker, 2006; Yu and Moyeed,
2001), but doing so may result in an under-identified model, because the recursive rule described above
usually results in a “just-identified” model. In our own experience, freeing this scale parameter (ζ) resulted in severe
underestimation of the specific and total indirect effects at all quantiles (Shen et al., Submitted-b), and so
the results presented in subsequent sections assume that ζ is fixed to unity.
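To see why the correlation between $\alpha_1$ and $\beta_1$ receives special attention, note that when both slopes
vary across units, the average of the unit-level products $\alpha_{1i}\beta_{1i}$ equals the product of the average
slopes plus their covariance. The following Python sketch checks this identity numerically. The slope
values and the function name are hypothetical and chosen purely for illustration; they are not taken from
the analyses reported here.

```python
from statistics import fmean


def mean_indirect_effect(alpha1, beta1):
    """Return (mean of products, product of means, covariance) for unit-level slopes."""
    n = len(alpha1)
    a_bar = fmean(alpha1)
    b_bar = fmean(beta1)
    # Population (divide-by-n) covariance between the two sets of slopes.
    cov_ab = sum((a - a_bar) * (b - b_bar) for a, b in zip(alpha1, beta1)) / n
    mean_prod = fmean([a * b for a, b in zip(alpha1, beta1)])
    return mean_prod, a_bar * b_bar, cov_ab


# Hypothetical unit-level slopes for four units (illustrative values only).
alpha1_i = [0.8, 1.0, 1.2, 1.4]
beta1_i = [-0.5, -0.4, -0.2, -0.1]
mean_prod, prod_means, cov_ab = mean_indirect_effect(alpha1_i, beta1_i)
```

In this toy example the product of the average slopes differs from the average product by exactly the
covariance term, which is why a model that ignores that covariance can misstate the indirect effect.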
The assumptions here closely mirror those of Yuan and MacKinnon (2009) for conducting
Bayesian mediation analysis for the mean of an outcome. The multilevel setup also parallels theirs, with
the key differences being that in our setup, Σ can have an arbitrary structure and the model described in
section 4.2 accommodates quantiles of an outcome. For the purposes of this section, we further restrict
the model to allow for only: 1) model variables measured without error; and 2) a single mediator. Latent
variables, such as some of the theoretical constructs described in section 1.2, can also be accommodated,
but such models are not covered in this section. By combining Bayesian SEM techniques (Lee, 2007a)
with the Bayesian quantile mediation model described below, one should be able to easily obtain a
quantile analogue of latent growth curve models (Chou et al., 1998). Extensions to models with multiple
mediators are easily implemented as well, and all such models can be fit using standard Gibbs sampling
software like WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/).
The notion of assessing mediation for different quantiles of an outcome (or mediator, for that
matter) was first introduced by Imai et al. (2010), and as in sections 2 and 3 it is dubbed the Quantile
Causal Mediation Effect (QCME). Shen et al. further fleshed out the possibility (Submitted-b), by
comparing various alternatives based on approaches from econometrics (Chesher, 2003; Kim and Muller,
2004; Lee, 2007b; Ma and Koenker, 2006), and extending the familiar Causal Steps approaches (Baron
and Kenny, 1986). Shen et al. (Submitted-b) concluded that both the extension of the Causal Steps approach to
quantiles and the quantile analogues of conventional tests for mediation had suitable properties for
assessing Quantile Mediation. A Bayesian approach was later proposed (Shen et al., Submitted-a) which
allowed for specific indirect effects of multiple mediators to be estimated simultaneously, as opposed to
the Causal Steps or QCME approaches that estimate each specific indirect effect separately. The Shen et
al. papers found that: 1) the QCME and Bayesian models performed equally well at assessing quantile
mediation for a single mediator; 2) focusing exclusively on the mean of the outcome distribution provided
an incomplete picture of mediational relationships; and 3) failing to account for the inter-mediator
correlation resulted in misleading findings for each mediator's specific indirect effects (Shen et al.,
Submitted-a; Shen et al., Submitted-b).
The basic setup for the Bayesian Quantile Mediation Model involving a single mediator (Shen et
al., Submitted-a) parallels the specification in equations 4.1 and 4.2, with the following modifications.
Equation 4.2 corresponds to the exact same model as in Shen et al. (Submitted-a), assuming independence
in the sample observations and the usual assumptions accompanying linear regression. They then take
equation 4.1 a step further by specifying $\varepsilon_1$ as u from equation 1.13 (yielding the variance formula above),
and adding the additional term which is a function of the quantile of interest (η). Collectively, these
modifications allow one to assess mediation for different outcome quantiles, as summarized in section 3.
However, the independence assumption is unrealistic for many prevention studies (cf. section 4.1), and
models are needed which properly account for the multilevel structure inherent in studies such as the HPS
and CHS. The following section details a model which can accomplish this very task.
Following the description in the preceding section, we can rewrite equations 4.1 and 4.2 as
equations 4.5 and 4.6. Written this way, the parameters of equation 4.5 will correspond to the ηth quantile
of Y (Yu and Moyeed, 2001; Shen et al., Submitted-a). A simple application of Bayes' theorem results in
the following hierarchical structure of the model (equations 4.5-4.9), where f() and π() represent
probability density functions and prior distributions, respectively. In order to set up a Gibbs sampler to fit
this model, we first need to specify appropriate prior distributions for the model parameters. In the initial
stages of prototyping different prior specifications, we arrived at the strategy described below.
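(4.5) $y_{ij} \sim ALD(\mu_Y, 1, \tau)$, where $\mu_Y = \beta_{0i} + \beta_{1i} M_{ij} + \gamma'_i X_i$
(4.6) $M_{ij} \sim N(\mu_M, \sigma_M^2)$, where $\mu_M = \alpha_{0i} + \alpha_{1i} X_i$
(4.7) $[\alpha_{0i}, \alpha_{1i}]^T \sim f(\boldsymbol{\alpha} \mid \boldsymbol{\Sigma}(\boldsymbol{\alpha}))$
(4.8) $[\beta_{0i}, \beta_{1i}, \gamma'_i]^T \sim f(\boldsymbol{\beta} \mid \boldsymbol{\Sigma}(\boldsymbol{\beta}))$
(4.9) $f(\boldsymbol{\theta}, \boldsymbol{\Sigma} \mid \boldsymbol{y}, \boldsymbol{X}, \alpha_{0i}, \beta_{0i}) \propto f(\boldsymbol{y}, \boldsymbol{\beta}_i \mid \boldsymbol{\beta}, \boldsymbol{\Sigma}(\boldsymbol{\beta})) \, \pi(\boldsymbol{\beta}) \, f(\boldsymbol{M}, \boldsymbol{\alpha}_i \mid \boldsymbol{\alpha}, \boldsymbol{\Sigma}(\boldsymbol{\alpha})) \, \pi(\boldsymbol{\alpha}) \, \pi(\boldsymbol{\Sigma}(\boldsymbol{\alpha})) \, \pi(\boldsymbol{\Sigma}(\boldsymbol{\beta}))$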
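As a minimal sketch of the likelihood in equation 4.5, the Python functions below evaluate the check
function and the log-density of an asymmetric Laplace distribution in the parameterization commonly used
for Bayesian quantile regression (e.g. Yu and Moyeed, 2001), with location μ, scale σ and quantile level
τ (the quantity written η in the text). The function names and the numerical integration helper are
illustrative assumptions, not part of the original analysis code.

```python
import math


def check_loss(u, tau):
    """Quantile-regression check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def ald_logpdf(y, mu, sigma, tau):
    """Log-density of ALD(mu, sigma, tau): log(tau(1-tau)/sigma) - rho_tau((y-mu)/sigma)."""
    return math.log(tau * (1.0 - tau) / sigma) - check_loss((y - mu) / sigma, tau)


def ald_cdf_at_mu(mu, sigma, tau, steps=100_000):
    """Approximate P(Y <= mu) by midpoint integration of the density below mu."""
    # The lower tail decays at rate (1 - tau) / sigma, so this range captures
    # essentially all of the probability mass below mu.
    lower = mu - 50.0 * sigma / (1.0 - tau)
    h = (mu - lower) / steps
    total = 0.0
    for k in range(steps):
        total += math.exp(ald_logpdf(lower + (k + 0.5) * h, mu, sigma, tau))
    return total * h
```

With the scale fixed at one, as in equation 4.5, the probability mass below the location parameter is
equal to the quantile level, which is the sense in which $\mu_Y$ corresponds to the ηth quantile of Y.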
As all the information about the parameters is contained in the posterior distribution, one ought to
be able to use independent priors for model parameters, and combine the samples from the resulting
posterior distributions to obtain the non-diagonal elements of Σ. While selection of the hyperparameters
of the prior distributions could be done as uninformatively as possible (e.g. Normal distributions with
mean zero and a very large variance), such an approach only works well when the sample observations
are independent (Shen et al., Submitted-a). It was found very early in this work that the same setup
resulted in inefficient estimation (both in terms of rate of convergence and length of credible intervals) of
the parameters of interest. The choice of prior was a rather important one in our work, and we chose our
priors empirically by fitting quantile regression models for each of the g groups (e.g. waves of data
collection) separately, and taking the mean of each of the five parameter estimates from the g different
models as our prior means. This choice was especially important because the model is intended to deal
explicitly with within-group correlation structures inherent in multilevel data, and ignoring such
correlations may result in biased estimates of the indirect effects. More importantly, the ALD has a
sharply peaked shape, as in Figure 4.1. As a consequence, an uninformative prior may cause
problems when using MCMC methods such as the Metropolis-Hastings algorithm (Chib and Greenberg,
1995), since a Markov chain that starts very far above or below the truth will almost never move
away from its starting region.
Figure 4.1 Shape of the Asymmetric Laplace Distribution for different settings
Maxwell and Cole (2007) show that models which ignore correlation between sample
observations result in a bias that is directly proportional to the covariances between $\alpha_1$ and $\beta_1$, and
between $\alpha_0$ and $\beta_0$, both of which may be non-zero in a variety of settings. As a result, estimates of
indirect effects that ignore the correlation (i.e. treat the data as cross-sectional) may be under- or
overestimated in varying directions and to varying degrees (Maxwell and Cole, 2007, p. 36). Our choice of the
mean of the estimates across the groups (or waves of measurement, as in the data analysis example in
section 5.4) therefore ought to “average out” such potential biases in the prior. Furthermore, we feel this choice of
prior is justified based on the fact that the results from any multilevel analysis ought to reflect the results
from separate cross-sectional analyses for each of the groups individually. In the case of multiple waves
of data collection, the trend across time from a longitudinal analysis should be similar to an analysis
which examines the indirect effect at each wave separately. If that were not the case, then a linear model
is not appropriate in the first place, and more complexity needs to be added to the model than what is
presented in this dissertation (i.e. splines or polynomial terms to allow for non-linearity in time).
The conditional distributions of β and Σ do not have known forms, but one can set up a random
walk Metropolis-Hastings sampler using the posterior joint distribution in equation 4.9 above. Such a
sampler can be set up rather easily using WinBUGS (http://www.mrc-bsu.cam.ac.uk/bugs/), by
employing the hierarchical structure of the model in equations 4.5-4.8. As described above, the prior
means of α and β needed to be specified empirically, as early experiments indicated that the rates of
convergence and length of credible intervals were much improved when using informative prior
distributions. One could follow the strategy for prior specification described in section 3 above, i.e. use
non-informative priors, or could employ some prior knowledge about parameters for such models. For
example, one might have already fitted a multilevel mediation model for the mean of the outcome, and
could use the resulting estimate and its standard error to construct a suitable prior distribution.
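The random walk Metropolis-Hastings idea can be sketched in a few lines of Python for a deliberately
simplified problem: a single location parameter μ with an ALD(μ, 1, τ) working likelihood and a Normal
prior centred on an empirical estimate. This is not the WinBUGS model used for the results reported
here; the simulated data, step size, prior settings and function names are all illustrative assumptions.

```python
import math
import random
from statistics import median


def check_loss(u, tau):
    """Check function rho_tau(u) = u * (tau - 1[u < 0])."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def log_posterior(mu, y, tau, prior_mean, prior_sd):
    """Unnormalized log-posterior: ALD(mu, 1, tau) likelihood times a Normal prior."""
    log_lik = -sum(check_loss(yi - mu, tau) for yi in y)
    log_prior = -0.5 * ((mu - prior_mean) / prior_sd) ** 2
    return log_lik + log_prior


def rw_metropolis(y, tau, prior_mean, prior_sd, n_iter=3000, step=0.3, seed=1):
    """Random walk Metropolis sampler for mu; returns the list of draws."""
    rng = random.Random(seed)
    mu = prior_mean  # start the chain at the prior mean
    current = log_posterior(mu, y, tau, prior_mean, prior_sd)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step)  # symmetric Normal proposal
        candidate = log_posterior(proposal, y, tau, prior_mean, prior_sd)
        # Accept with probability min(1, posterior ratio); 1 - random() avoids log(0).
        if math.log(1.0 - rng.random()) < candidate - current:
            mu, current = proposal, candidate
        draws.append(mu)
    return draws


def demo():
    """Estimate the median (tau = 0.5) of simulated data with an empirically centred prior."""
    rng = random.Random(2013)
    y = [rng.gauss(10.0, 2.0) for _ in range(100)]
    prior_mean = median(y)  # empirical prior mean, in the spirit of the strategy above
    draws = rw_metropolis(y, tau=0.5, prior_mean=prior_mean, prior_sd=10.0)
    kept = draws[500:]  # discard burn-in draws
    return median(kept), median(y)
```

Calling `demo()` returns the posterior median of μ alongside the sample median, which should be close to
one another in this toy setting. Starting the chain at an empirical estimate mirrors the motivation for
informative prior means given above.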
4.3 Simulation Study
A simulation study was conducted to assess: 1) whether the multilevel quantile mediation model
produces good (in terms of both MSE and bias) estimates of the quantile indirect effects; 2) the coverage
rates of the model; and 3) the consequences of ignoring a multilevel structure in the sample observations.
We fit the QCME (Imai et al., 2010) and Bayesian quantile mediation models which assume
independence in the sample observations (Shen et al., Submitted-a) to examine point number 3. We
generated data according to equations 4.10-4.13, which allow the effect of M to vary by quantiles of Y.
The variable X was taken to be a treatment variable with 300, 500 and 1000 subjects split into two
treatment groups, of roughly equal size. Since the distributions of indirect effects are not always Normal
(MacKinnon et al., 2004), posterior medians were used to obtain Bayesian point estimates for all model
parameters.
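For concreteness, the two performance criteria can be computed from a set of replicate estimates as in
the Python sketch below. It assumes the conventional definitions (relative bias as the percentage
deviation of the average estimate from the true value, and MSE as the average squared deviation from the
true value); the replicate values are made up for illustration and the function names are hypothetical.

```python
from statistics import fmean


def relative_bias_pct(estimates, truth):
    """Relative bias in percent: 100 * (mean estimate - truth) / truth."""
    return 100.0 * (fmean(estimates) - truth) / truth


def mean_squared_error(estimates, truth):
    """Mean squared error of the estimates around the true value."""
    return fmean([(e - truth) ** 2 for e in estimates])


# Hypothetical indirect-effect estimates from four simulated data sets.
replicates = [-0.9, -1.1, -1.0, -1.2]
true_effect = -1.0
rb = relative_bias_pct(replicates, true_effect)
mse = mean_squared_error(replicates, true_effect)
```

Because relative bias divides by the true value, it is only meaningful when the true indirect effect at
the quantile in question is not zero.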
For either single or multiple mediator settings, the true value of any $\beta_1(\eta)$ is equal to $\beta_1 + F^{-1}_{\varepsilon_1}(\eta)$.
The indices i and j are defined as in section 4.2, with i corresponding to subject-level units and j
corresponding to repeated measurements within subject i. We also let the number of subjects be 300, 500
or 1000, to assess the effects of varying the sample size, while it was assumed that each subject had two
repeated measurements. We decided to allow $\alpha_1$ to be the same for all outcome quantiles, as we were
only interested in how mediated effects change across the outcome distribution, and not for quantiles of
the mediator. The effect of wave of measurement was assumed to be zero for all quantiles of Y, in order
to mimic the data structure in the HP study (see Figure 5.7). The true values of the fixed and random
effects and their associated variances, were chosen to reflect data from the HP study, with: 1) waist
circumference as Y, 2) perceived walkability as M, and 3) intervention group as X. As outlined in section
9 of MacKinnon (2008), this setup makes the intercepts for the outcome and mediator group-specific,
while the effects of M on Y are allowed to vary across different quantiles of the outcome distribution.
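(4.10) $Y_{ij} = \beta_{0i} + (\beta_{1i} + \varepsilon_{1ij}) M_{ij} + \gamma'_i X_i$
(4.11) $M_{ij} = \alpha_{0i} + \alpha_{1i} X_i + \varepsilon_{2ij}$
(4.12) $[\alpha_{0i}, \alpha_{1i}]^T \sim f\left( [4, 1]^T \mid \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} \right)$
(4.13) $[\beta_{0i}, \beta_{1i}, \gamma'_i]^T \sim f\left( [100, 0, 0]^T \mid \begin{bmatrix} 12 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \right)$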
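Under equation 4.10 the error term multiplies the mediator, so the slope of M at a given quantile is the
average slope shifted by the corresponding quantile of the error distribution. The Python sketch below
computes that true slope and the implied indirect effect. It assumes, purely for illustration, that
$\varepsilon_1$ is standard Normal and that the mediator takes positive values (so that the conditional quantile
of Y is linear in M); the function names and the numerical inputs are hypothetical.

```python
from statistics import NormalDist


def true_quantile_slope(beta1, tau, error_dist=NormalDist(0.0, 1.0)):
    """True slope of M at quantile tau: beta_1 + F^{-1}(tau) for the slope error."""
    return beta1 + error_dist.inv_cdf(tau)


def true_quantile_indirect(alpha1, beta1, tau, error_dist=NormalDist(0.0, 1.0)):
    """True indirect effect at quantile tau as the product alpha_1 * beta_1(tau)."""
    return alpha1 * true_quantile_slope(beta1, tau, error_dist)


# Hypothetical average slopes, evaluated at the quantiles used in the simulation.
quantiles = [0.1, 0.3, 0.5, 0.7, 0.9]
slopes = [true_quantile_slope(0.5, q) for q in quantiles]
```

With a symmetric, zero-centred error distribution the slope at the median equals the average slope, and
the slopes at the 10th and 90th percentiles are equally far from it in opposite directions.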
Underlying Model                  Method                  η     Sample Size
                                                                300       500       1000
Random Intercept                  Bayesian Multilevel     0.1   -1.760    -0.606    -0.557
                                                          0.3   -1.669    -0.129    -0.615
                                                          0.5   -0.078    -1.455    -0.038
                                                          0.7   -2.563    -1.704    -0.912
                                                          0.9   -1.940    -1.262    -0.711
                                  Bayesian Independence   0.1    2.906    -0.486    -2.661
                                                          0.3   -7.635   -10.059    -7.748
                                                          0.5   -4.318    -2.361     0.256
                                                          0.7   11.359    10.458     4.833
                                                          0.9    1.648    -0.631    -2.012
                                  QCME                    0.1   -7.874    -7.784    -9.282
                                                          0.3   -5.683    -6.583    -8.896
                                                          0.5   -2.795    -3.022     0.371
                                                          0.7   -6.965    -9.260    -7.846
                                                          0.9   -7.999    -8.099    -8.260
Random Intercept and Slope        Bayesian Multilevel     0.1   -6.75     -5.00     -4.90
                                                          0.3   -6.82     -3.84     -3.36
                                                          0.5   -1.30     -2.07     -0.73
                                                          0.7   -7.42     -5.22     -4.23
                                                          0.9   -6.85     -5.76     -4.91
                                  Bayesian Independence   0.1   -6.40     -5.04     -4.95
                                                          0.3   -6.37     -4.25     -4.02
                                                          0.5   -0.65     -1.70      1.04
                                                          0.7   -6.40     -5.46     -3.44
                                                          0.9   -5.98     -5.79     -4.33
                                  QCME                    0.1   -4.75     -3.37     -4.25
                                                          0.3   -3.64     -2.41     -2.93
                                                          0.5   -0.65     -1.43      0.32
                                                          0.7   -4.26     -3.50     -2.89
                                                          0.9   -4.17     -3.82     -3.97
Random Intercept model: $\alpha_{1i}=\alpha_1$, $\beta_{1i}=\beta_1$, and $\gamma'_i=\gamma'$.
Table 4.1. Relative Bias (%) of Indirect Effect (αβ(η) or QCME) using the Different Models
Table 4.1 clearly shows that the bias of the estimated indirect effects for the three models is larger
at the extreme quantiles of the outcome distribution, compared to central ones, and tends to decrease or
remain about the same with increasing sample size. The bias of the QCME and Bayesian independence
models, which assume independence, is larger for the random-intercept data-generating model, and in
general these models over- or underestimate the true indirect effect in much the same way that Maxwell and
Cole (2007) show is true for the mean multilevel mediation setting. Although the Bayesian independence model
exhibits rather large bias for the 30th and 70th percentiles at smaller sample sizes, the bias decreases
noticeably with increasing sample size, to a reasonable level for 1000 independent level-2 observations.
This bias is less pronounced when dealing with models that have a random intercept and random slope, in
which case all three models perform about the same. While time effects are assumed to be zero in this
setting, such an extension could easily be added to the model. However, the nature of time effects (e.g.
whether they interact with treatment effects, etc.) are often based on theoretical understandings of
intervention processes, and their inclusion must be decided on a case-by-case basis. The simulation
model also does not allow the treatment effects to vary across either outcome or mediator quantiles, and
while such an exploration would be interesting, it is beyond the scope of this dissertation to examine,
since the focus is on the indirect effects for quantiles of the outcome.
Table 4.2 clearly indicates that the variability of the estimated indirect effects is roughly the same
or larger for the models which assume independence, compared to those which included random effects.
When comparing the random intercept and random slope models, we find for either multilevel model that
the estimated indirect effects are more variable for random slope v. random intercept settings for smaller
sample sizes and at the upper and lower tails of the outcome distribution, while the MSE either decreases
or stays about the same with increasing sample size for the central quantiles. Nevertheless, both classes
of data-generating models result in the same expected pattern, where variability of the indirect effects
decreases with increasing sample size at all quantiles, and increases as the quantile of interest moves
towards the extremes at any given sample size. In any event, the random effects models explicitly
separate the between- and within-subject sources of variability, resulting in less variable estimates of the
indirect effects. This is clearly the case for the random slope model results shown in table 4.2.
While provided, the results from the QCME should be interpreted with caution, as it estimates a
different target parameter. Specifically, the QCME attempts to recreate the outcomes as they may have
occurred under the unobserved treatment conditions, and then estimates the indirect effects in terms of
individual differences between the observed and “counterfactual” treatment conditions (Imai et al., 2010).
The construction of the QCME is thus intended to allow one to infer the indirect effect (via some
mediator) of two treatment conditions (Y,M|X=1) – (Y,M|X=0), at different quantiles of the distribution of
the outcome Y. Nevertheless, we find that the QCME appears to perform about the same when all model
parameters are random effects. This seems reasonable given the definition of the QCME, which is
constructed to parallel a truly longitudinal study design when the data are treated as cross-sectional, and
so ought to be able to capture a multilevel data structure when it truly exists. It also explains why the
QCME does poorly in the random intercept-only condition, since that data-generating model makes
explicit the assumption that indirect effects ought not to exhibit between-subjects variability, while the
QCME is trying in vain to build such a structure into the model.
Underlying Model                  Method                  η     Sample Size
                                                                300       500       1000
Random Intercept                  Bayesian Multilevel     0.1   0.292     0.191     0.094
                                                          0.3   0.057     0.040     0.022
                                                          0.5   0.007     0.009     0.008
                                                          0.7   0.053     0.039     0.022
                                                          0.9   0.300     0.189     0.095
                                  Bayesian Independence   0.1   0.747     0.387     0.214
                                                          0.3   0.297     0.201     0.101
                                                          0.5   0.199     0.123     0.067
                                                          0.7   0.373     0.268     0.115
                                                          0.9   0.805     0.419     0.213
                                  QCME                    0.1   0.698     0.459     0.308
                                                          0.3   0.254     0.165     0.095
                                                          0.5   0.185     0.132     0.059
                                                          0.7   0.242     0.177     0.091
                                                          0.9   0.750     0.461     0.272
Random Intercept and Slope        Bayesian Multilevel     0.1   0.78      0.46      0.24
                                                          0.3   0.27      0.17      0.08
                                                          0.5   0.19      0.11      0.06
                                                          0.7   0.29      0.16      0.08
                                                          0.9   0.84      0.43      0.24
                                  Bayesian Independence   0.1   1.02      0.67      0.35
                                                          0.3   0.31      0.20      0.10
                                                          0.5   0.19      0.11      0.06
                                                          0.7   0.33      0.20      0.09
                                                          0.9   1.02      0.62      0.33
                                  QCME                    0.1   0.731     0.420     0.221
                                                          0.3   0.262     0.165     0.076
                                                          0.5   0.193     0.107     0.058
                                                          0.7   0.281     0.157     0.076
                                                          0.9   0.756     0.397     0.222
Random Intercept model: $\alpha_{1i}=\alpha_1$, $\beta_{1i}=\beta_1$, and $\gamma'_i=\gamma'$.
Table 4.2. Mean Squared Error of Indirect Effect (αβ(η) or QCME) using the Different Models
5 Quantile Mediation Analysis of Data from the HPS
5.1 Overview of the Analysis Models
[Some material in this chapter has been submitted to Multivariate Behavioral Research and
Statistics in Medicine, and may appear in those journals as well.]
Using the theoretical model in Figure 1.1 as a guide, the structural quantile model in equations 5.1
and 5.2 was considered, where Y is an element of {BMI, Waist Circumference, MVPA},
Z = {Age, Sex, Race}, $Z_1$ ⊂ {Age, Sex, Race, SES}, I = 1(Subject lives in preserve), and E is a summary
score of the NEWS variables. The goal was to compare the quantile mediation analysis results between
the two-stage models (section 2), the Bayesian model assuming independence (section 3), the multilevel
model (section 4), and the latent mediator model (section 3.4). For the Control Function method, the
same quantile was estimated for both E and Y. Tests for mediation were conducted using the
Quantile-Goodman Test, except for the Bayesian models which used 95% credible intervals to conduct inference
about the indirect effect, and the QCME. For the approaches that were not capable of handling latent
variables, neighborhood walkability (denoted E) was quantified by computing the mean of the relevant
items as described in section 1.2.2, and scaled to standard deviation units (SD=0.478, Range=1.489-3.967).
(5.1) $Q_Y(\tau \mid E, I) = \beta_0(\tau) + \beta(\tau) E + \gamma'(\tau) I + \pi Z$
(5.2) $E(E \mid I, Z) = \alpha_0 + \alpha I + \delta Z_1$
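As a rough illustration of the kind of product-of-coefficients test referred to above, the Python sketch
below computes a z-statistic using Goodman's (1960) unbiased variance estimate for the product of two
independent coefficient estimates. Whether the Quantile-Goodman Test takes exactly this form, with
standard errors taken from the quantile and mediator regressions, is an assumption of this sketch; the
function name and the input values are hypothetical.

```python
import math


def goodman_z(alpha, se_alpha, beta, se_beta):
    """z-statistic for the product alpha * beta using Goodman's unbiased variance.

    var(alpha * beta) is estimated by
    alpha^2 * se_beta^2 + beta^2 * se_alpha^2 - se_alpha^2 * se_beta^2.
    """
    var_ab = (alpha ** 2) * (se_beta ** 2) + (beta ** 2) * (se_alpha ** 2) \
        - (se_alpha ** 2) * (se_beta ** 2)
    if var_ab <= 0.0:
        # The unbiased estimate can be non-positive when both estimates are small
        # relative to their standard errors; no test statistic is defined then.
        raise ValueError("estimated variance of the product is not positive")
    return (alpha * beta) / math.sqrt(var_ab)


# Hypothetical estimates: effect of I on E, and effect of E on the outcome quantile.
z = goodman_z(alpha=0.5, se_alpha=0.1, beta=-2.0, se_beta=0.5)
```

The statistic would then be compared with standard Normal critical values, for example |z| > 1.96 for a
two-sided test at the 5% level.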
Variable selection for each of the equations 5.1 and 5.2, particularly for Z and $Z_1$, proceeded by
examining plots of the quantile process for each covariate. Figures 5.1-5.4 display the results of these
analyses, and the following points are worth noting. Two proxies for SES were assessed, attained
education level and household income. As the latter variable had a large proportion of missing data, and
education level appeared to give more stable effect estimates for all three outcomes as well as the
walkability index, education level was taken as the proxy for SES. Figures 5.1, 5.2 and 5.4 clearly
indicate that gender, age, Hispanic ethnicity and education are all independently related to BMI, waist
circumference and Walkability, at all quantiles of their distributions. Hispanic ethnicity and education do
not appear to be related to MVPA at any of its quantiles. Walkability is a feature of the environment and
should not be “caused” by age, gender or ethnicity, but rather be a result of socioeconomic conditions in the
local neighborhood. As such, only education was included as a covariate in $Z_1$. In order to identify the
two-stage models, one must further restrict Z and $Z_1$ to have at least one “uncommon” column. Since
Hispanic ethnicity is strongly related to educational attainment (~50% of non-Hispanics are college
graduates, while only 25% of Hispanics are), education was removed from the column space of Z, thus
making the model identifiable.
5.2 Analysis Results for the Single Mediator Model
The analysis results are reported in Tables 5.1 and 5.2. Figure 5.5 indicates a very clear picture
with respect to quantile mediation, in that the indirect effect is larger among those at the higher end of the
BMI distribution. In fact, the effect size increases as the subjects under consideration become heavier,
indicating a possible quantile trend across the distribution of BMI. Clearly this trend could not have been
detected had we focused only on mediation for the center of the BMI distribution. One easily sees a
similar pattern for waist circumference. It is worth noting that the fitted value and control function
estimates are quite different from the Causal Steps and QCME, while the latter two are nearly identical
save for the total effect. Moreover, the median indirect effects from the Causal Steps and QCME
methods are about equal to the mean indirect effects obtained from the EQS model (Bentler, 1980), for
both BMI and Waist Circumference. On the other hand, the median indirect effects obtained from the
fitted value and control function methods are grossly different from the EQS-based indirect effects. It is
quite clear that, for these data, the fitted value and control function estimates are probably not correct.
Nevertheless, the results suggest that overweight or obese individuals experience greater indirect effects
of living in the preserve (via changes in neighborhood walkability) than normal weight subjects for whom
the effects are essentially zero.
These results suggest two key points. First, the quantile mediation approaches compared in this
paper are actually more robust (Huber, 1972) to the underlying distributions of the outcome variables of
interest than conventional techniques. Our analysis of the Healthy Places data indicates that individuals in
the 85th or 95th percentiles of BMI or Waist Circumference experience total, direct, and indirect effects of
living in the preserve that are roughly 2-5 times greater than those of normal weight individuals. Figure 5.5
further indicates that the effects of the preserve tend to be about the same for individuals below the 70th
percentile of either BMI or Waist Circumference, while the effects steadily increase beyond that threshold.
Such effects would have been impossible to detect using conventional mediation analysis techniques.
Second, the QCME and Causal Steps methods seem best suited to evaluating quantile mediation
for these data. The large discrepancy between those two and the fitted value or control function methods
is likely due to some unmeasured confounding or the presence of measurement error in our
measure of neighborhood walkability. The fact that neither method employs anything remotely similar to
the sequential ignorability assumption likely explains this discrepancy. More importantly, one would
expect that estimates of the direct and indirect effects for the median of the outcome distribution ought to
compare well to those for the mean. However, this is clearly not the case for the fitted value or control
function methods, while the Causal Steps and QCME estimates at the median compare well to those at the
mean. In light of their comparable simulation performance relative to the fitted value and control
function methods, it is clear that the Causal Steps and QCME methods are better at characterizing
quantile mediation than the other two. This should come as little surprise, though, since both are based
upon a principled manner of thinking about statistical mediation, whereas the fitted value and control
function methods were developed to deal with endogeneity biases rather than mediation.
While the estimates of the indirect effects presented in Tables 5.1 and 5.2 are generally not
statistically significant, they nevertheless shed light on how the effects of living in the preserve can
greatly differ for different individuals. Conventional mediation analysis characterizes direct and indirect
effects on a hypothetical “average” person, meaning average levels of outcome, mediator, and other
covariates. Figure 3 clearly indicates otherwise, where individuals in the upper quantiles of BMI and
Waist Circumference appear to experience vastly different indirect effects of the preserve. The pattern in
the right panel of Figure 5.6 further suggests that the indirect effect may vary not only by one's outcome
value, but also by the level at which one lies along the mediator distribution. It is evident that overweight
and obese individuals whose perception of their built environment is extremely high or low appear to
experience little to no effect of the preserve, while such individuals who have about average perception
experience much larger effects than normal weight individuals (regardless of their perceptions of their
neighborhoods). Thus, it is clearly important to assess how the indirect effects change across both the
outcome and mediator distributions.
This point is further emphasized by even a cursory examination of Table 5.3, which clearly shows
that the conditional distributions of BMI and Waist Circumference have rather different shapes between
the two intervention groups. This is indeed the key advantage of assessing mediation effects for different
quantiles. Although the BMI distribution may not be normal for every particular set of data, focusing on
the mean of the distribution of such outcomes may lead to an incomplete picture of mediational
relationships, since individuals with higher BMI may indeed respond differently to treatments than those
with average BMI. In some situations, the Normality of the data may not even be relevant, since two
samples could be centered on the same mean/median value, but have very different shapes at the
upper/lower tails (as is the case for BMI evident in Table 5.3). It is also interesting to note that the
QCME estimates of the total effects of the preserve, on both BMI and Waist Circumference, appear to be
a bit inflated for the median and too small for the 85th and 95th percentiles of those outcomes, while the
Fitted Value and Causal Steps methods produce estimates of the total effects which are much closer to
those observed in the raw data (viz. comparison with Table 5.3).
Furthermore, regardless of the statistical significance of the estimates of the indirect effects, the
patterns across different quantiles can still illuminate differences between how individuals with different
outcome levels experience direct and indirect effects of a treatment. This assessment of the change of
direct and indirect effects across outcome quantiles has ramifications that reach beyond simple
examination of statistical significance. Most notably, it can inform researchers as to targeting
interventions to specific subgroups of the population, particularly when those who are most sensitive to
intervention or lifestyle/environmental changes are also those whose outcome levels are considered to be
pathological (such as the examples in Tables 5.1 and 5.2).
Figure 5.1 Effects of Variables on BMI
Figure 5.2 Effects of Variables on Waist Circumference
Figure 5.3 Effects of Variables on MVPA
Figure 5.4 Effects of Variables on Walkability
Method                          η           αβ(η)     γ’(η)     γ(η)      αβ(η)≠0¹
Baron and Kenny                 0.50        -0.23     -0.33      0.26     No
                                0.85        -0.99     -1.02     -0.89     No
                                0.95        -0.92     -1.79     -0.72     No
Causal Mediation                0.50        -0.24     -0.33     -0.57     No
                                0.85        -1.03     -1.02     -2.05     Yes
                                0.95        -0.92     -1.79     -2.71     No
Control Function                0.50        -2.61      1.93      0.26     No
                                0.85        -2.38      1.50     -0.89     No
                                0.95        -4.16      3.79     -0.72     No
Fitted Value                    0.50        -1.25      0.74      0.26     No
                                0.85        -3.27      2.63     -0.89     Yes
                                0.95        -4.62      0.98     -0.72     No
Conventional Mediation Model    E(Y|M,I)    -0.28     -0.60     -0.88     No
Table 5.1. Analysis results for BMI
Outcome model adjusted for gender, age and Hispanic ethnicity. Mediator model adjusted for education.
Method                          η           αβ(η)     γ’(η)     γ(η)      αβ(η)≠0¹
Baron and Kenny                 0.50        -0.75     -2.09     -0.5      No
                                0.85        -1.51     -1.26     -2.8      No
                                0.95        -1.33     -2.65     -5.6      No
Causal Mediation                0.50        -0.80     -2.09     -2.89     Yes
                                0.85        -1.67     -1.26     -2.94     No
                                0.95        -1.32     -2.65     -3.96     No
Control Function                0.50        -8.34      0.00     -0.5      No
                                0.85        -5.02      3.39     -2.8      No
                                0.95        -9.91      4.55     -5.6      No
Fitted Value                    0.50        -3.31      4.66     -0.5      No
                                0.85        -7.07      2.16     -2.8      Yes
                                0.95        -9.75      5.00     -5.6      No
Conventional Mediation Model    E(Y|M,I)    -0.77     -2.15     -2.87     Yes
Table 5.2. Analysis results for Waist Circumference
Outcome model adjusted for gender, age and Hispanic ethnicity. Mediator model adjusted for education.
Quantity of Interest    BMI                         Waist Circumference
                        Preserve=0    Preserve=1    Preserve=0    Preserve=1
Mean                    29.27         28.33         98.03         95.91
Variance                37.27         32.64         193.38        165.39
Skewness                1.04          0.57          0.79          0.63
Kurtosis                4.16          3.14          3.75          3.55
50th %ile               28.15         28.17         96.52         95.70
85th %ile               35.59         34.25         110.69        107.09
95th %ile               41.33         40.27         124.29        118.19
Table 5.3 Moments and Order Statistics of BMI and Waist Circumference
Figure 5.5 Indirect Effect of Walkability between Intervention and Response
Figure 5.6 Indirect Effect of Walkability by Quantiles of Walkability and BMI
5.3 Analysis Results for the Multiple Mediator Model
The analysis results in section 5.2 model the indirect effect of neighborhood walkability as if it
were independent of the other two factors described in section 1.2.2. However, a cursory glance at Figure
1.1 clearly suggests that the effects of neighborhood walkability on physical activity may actually be
transmitted through one of the other two factors. Moreover, the correlation between the three mediators is
evident when one considers their raw correlations at the first wave of observation (Table 5.8). As such, it
was of interest to determine whether the estimates of the indirect effects of neighborhood walkability on
the three outcomes of interest differed by whether the correlation between the three mediators was
ignored. To make this comparison, equations 5.1 and 5.2 were modified to account for all three mediators
in Figure 1.1, resulting in equations 5.3 and 5.4.
(5.3) Quantile Outcome Model: $Q_{BMI}(\tau \mid \boldsymbol{M}, I, Z_1) = \beta_0 + \sum_{k=1}^{p} \beta_k(\tau) M_k + \gamma'(\tau) I + \varphi(\tau) Z$
(5.4) Mediator Model: $E(M_k \mid I, Z_2) = \alpha_0 + \alpha_k I + \pi_k Z_1$
The first mediator measures personal aspects of health, using survey scales that measure health
behaviors related to personal attitudes (Courneya and McAuley, 1995), personal meanings, intentions, and
self-efficacy (Motl, 2000) for physical activity. The second mediator measures the social aspects of
health, using survey scales that measure health behaviors related to social norms, family norms (Gattshall,
2008), and social network analysis (Burt, 1984). The third mediator is an index of neighborhood
walkability constructed from the Neighborhood Environment Walkability Survey (Adams et al., 2009).
Because the personal and social mediators were quantified using different surveys, each using different
Likert scales, a weighted sum (by maximum score) was computed for each survey scale. Standard
deviation scores of each mediator were then used for analysis purposes. Details on each of the three
mediators are described in section 1.2.2. Z
1/2
are vectors of demographic covariates including age, gender,
Hispanic ethnicity, and educational attainment as an indicator of socioeconomic status. Participants‟
weight and height were measured twice using an electronically calibrated digital scale (Tanita WB-110A)
and stadiometer (PE-AIM-101) to the nearest 0.1 kg and cm, respectively. BMI was subsequently
computed for each of the two measurements using the standard formula (kg/m
2
), and the resulting average
90
of the two was taken as a participant‟s BMI. Waist circumference (cm) was also measured at each
occasion, and the average of the two was used for analysis purposes. Both outcomes were examined in
these analyses, as waist circumference may be a more relevant indicator of obesity-related health
conditions because it is directly related to unhealthy distributions of body fat (such as having a pear-
shaped torso).
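The outcome and mediator scoring described above can be sketched as follows. This is a minimal illustration, not the study's actual processing code: the function names and the example measurements are hypothetical, and the sample standard deviation is assumed for the standard deviation scores.

```python
from statistics import mean, stdev


def bmi(weight_kg, height_cm):
    """Body mass index (kg/m^2) from weight in kg and height in cm."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


def averaged_bmi(measurements):
    """Average the BMI values computed from repeated (weight_kg, height_cm) pairs."""
    return mean(bmi(w, h) for w, h in measurements)


def sd_scores(values):
    """Standard deviation (z) scores, assuming the sample standard deviation."""
    centre = mean(values)
    spread = stdev(values)
    return [(v - centre) / spread for v in values]


# Hypothetical participant measured twice (weight in kg, height in cm).
participant = [(70.2, 170.1), (70.4, 170.0)]
print(round(averaged_bmi(participant), 2))

# Hypothetical mediator scores converted to standard deviation scores.
print(sd_scores([1.0, 2.0, 3.0]))
```

BMI is computed per measurement and then averaged, which matches the order of operations stated in the text; averaging weight and height first would give a slightly different value.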
Both the Marginal and Joint models were considered for these analyses, both to echo and to
highlight points raised in the simulation study. Table 5.7 shows the results of the analysis, and it is
immediately clear that both the point and interval estimates for the EQS-based Joint and Marginal models
differ, and substantially more so for waist circumference. Moreover, under the Marginal model, for both BMI and waist
circumference, the magnitude of the indirect effects appears to decrease monotonically with
increasing response values, quite counter to what would be expected for such outcomes. On the other
hand, the Joint model clearly demonstrates larger indirect effect sizes for individuals with higher BMI and
waist circumference. For example, the effects at the 95th percentiles of both outcomes are about 2.5 to 3
times as large as the effect at the means or medians, suggesting that overweight and obese individuals are
likely to experience about a three-fold gain in body composition improvement due to improvements in
perceived neighborhood walkability that result from moving into a Smart Growth community. Moreover,
the Joint model allowed us to detect significant indirect effects of perceived walkability on BMI and waist
circumference at all quantiles considered, while the Marginal model would have indicated that the effect
of the preserve on these outcomes was not at all mediated by walkability at any quantile.
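The ratio comparison in this paragraph is simple arithmetic on the point estimates reported for the Bayesian Joint model in Table 5.7. The sketch below reproduces it; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Point estimates of the NEWS indirect effect, Bayesian Joint rows of Table 5.7.
joint_indirect = {
    "BMI": {"mean": -0.178, "0.5": -0.214, "0.95": -0.581},
    "Waist": {"mean": -0.397, "0.5": -0.359, "0.95": -1.335},
}


def tail_ratios(effects):
    """Ratio of the 95th percentile effect to the mean and to the median effect."""
    return {ref: effects["0.95"] / effects[ref] for ref in ("mean", "0.5")}


for outcome, effects in joint_indirect.items():
    ratios = tail_ratios(effects)
    print(outcome, {ref: round(value, 2) for ref, value in ratios.items()})
```

With these point estimates the ratios fall between roughly 2.7 and 3.7, so "about three-fold" is a reasonable summary, with the waist circumference ratios sitting at the upper end.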
While the personal and social components exhibited moderate correlation (ρ=0.5), their raw
correlations with the environmental component were at best weak (Cohen, 1988), with ρ=0.14 and 0.02,
respectively. The EQS model-based estimates of the latter two correlations were less than weak, with
both personal and social components having correlation of about 0.17 with the environmental component,
and about 0.52 with each other. As expected, the Bayesian model resulted in the same correlation matrix
regardless of which quantile of which response was of interest, and the resulting Bayesian model-based
correlation matrix was also a closer fit to the raw correlation matrix than the EQS model-based one (see
Table 5.8). Moreover, the EQS model failed to achieve suitable model fit for either BMI or waist
circumference, and this lack of fit is easily seen when comparing the first two correlation matrices in
Table 5.8. The difference between the two model-based correlation matrices thus explains the difference
in point estimates between the EQS-based model and the Bayesian mean model, though their credible
intervals overlap almost completely (see Table 5.7).
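The claim that the Bayesian model-based correlation matrix is closer to the raw correlations can be checked directly from the entries of Table 5.8. The sketch below uses two simple discrepancy summaries (maximum absolute difference and root mean square difference over the off-diagonal entries); these particular summaries are an assumption made here for illustration and are not the fit indices used in the dissertation.

```python
from math import sqrt

# Correlation matrices from Table 5.8 (order: Personal, Social, Environmental).
raw = [[1, 0.5, 0.14], [0.5, 1, 0.02], [0.14, 0.02, 1]]
eqs = [[1, 0.52, 0.17], [0.52, 1, 0.17], [0.17, 0.17, 1]]
bayes = [[1, 0.51, 0.17], [0.51, 1, 0.08], [0.17, 0.08, 1]]


def off_diagonal_diffs(a, b):
    """Absolute differences over the upper off-diagonal entries of two matrices."""
    size = len(a)
    return [abs(a[i][j] - b[i][j]) for i in range(size) for j in range(i + 1, size)]


def max_abs_diff(a, b):
    return max(off_diagonal_diffs(a, b))


def rms_diff(a, b):
    diffs = off_diagonal_diffs(a, b)
    return sqrt(sum(d * d for d in diffs) / len(diffs))


for label, model in (("EQS", eqs), ("Bayesian", bayes)):
    print(label, round(max_abs_diff(raw, model), 3), round(rms_diff(raw, model), 3))
```

Both summaries are driven mainly by the Social-Environmental entry, where the EQS model-based value (0.17) is far from the raw value (0.02).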
The results for the Bayesian quantile models in Table 5.7 highlight the importance of accounting
for the correlation between all potential mediators when one is interested in extreme quantiles of the
response distribution. While the approach taken in this paper may not be as conceptually appealing as
others rooted in the causal inference paradigm (Imai et al., 2010), it allows for relatively straightforward
point and interval estimation of quantile mediation for any number of mediators. Though there are other
issues at play that could further illuminate the effects of the Smart Growth community such as allowing
for measurement error in the mediators, this Bayesian approach can be easily extended to more
complicated models involving measurement error or longitudinal data, by exploiting the commonly used
data augmentation techniques that are used for many Bayesian methods (Gelman et al., 2004; Lee, 2007).
Outcome        Method             Quantile  αβ_NEWS  95% CI for αβ_NEWS  Σαβ_k + γ'  γ
BMI            EQS Marg.          Mean      -0.278   (-0.58, 0.02)       -0.89       -0.88
               Bayesian Marginal  Mean      -0.239   (-0.61, -0.01)      -0.71       -0.75
                                  0.5       -0.192   (-0.49, 0.02)       -2.31       -0.93
                                  0.85      -0.094   (-0.30, 0.03)       -4.10       -1.34
                                  0.95      -0.020   (-0.16, 0.09)       -3.98       -2.86
               EQS Joint          Mean      -0.208   (-0.49, 0.08)       -0.62       -0.62
               Bayesian Joint     Mean      -0.178   (-0.52, 0.05)       -0.59       -0.78
                                  0.5       -0.214   (-0.45, -0.05)      -0.58       -0.23
                                  0.85      -0.271   (-0.82, 0.07)       -0.63       -1.28
                                  0.95      -0.581   (-1.31, -0.12)      -2.03       -2.61
Waist Circum.  EQS Marg.          Mean      -0.723   (-1.43, -0.02)      -3.04       -2.87
               Bayesian Marginal  Mean      -0.531   (-1.38, -0.004)     -2.60       -2.51
                                  0.5       -0.362   (-1.08, 0.07)       -2.02       -1.83
                                  0.85      -0.192   (-0.49, 0.02)       -2.31       -0.92
                                  0.95      -0.054   (-0.37, 0.25)       -9.86       -5.57
               EQS Joint          Mean      -0.572   (-1.25, 0.09)       -2.44       -2.44
               Bayesian Joint     Mean      -0.397   (-1.17, 0.13)       -2.23       -2.63
                                  0.5       -0.359   (-0.73, -0.12)      -2.02       -2.05
                                  0.85      -0.851   (-1.81, -0.22)      -2.48       -1.15
                                  0.95      -1.335   (-2.73, -0.44)      -5.69       -5.67
Table 5.7. Analysis of HPS Data using Different Models
Method                Mediators      Personal  Social  Environmental
Raw Correlations      Personal       1         0.5     0.14
                      Social         0.5       1       0.02
                      Environmental  0.14      0.02    1
EQS Model-Based       Personal       1         0.52    0.17
                      Social         0.52      1       0.17
                      Environmental  0.17      0.17    1
Bayesian Model-Based  Personal       1         0.51    0.17
                      Social         0.51      1       0.08
                      Environmental  0.17      0.08    1
Table 5.8. Correlations between the three Mediators in the HPS model
5.4 Analysis Results for the Multilevel Mediation Model
As above, I is an indicator of whether or not a subject lives in the intervention community, and
the mediator of interest is an index of neighborhood walkability constructed from the Neighborhood
Environment Walkability Survey (Adams et al., 2009). Standard deviation scores of the mediator were
used for analysis purposes. Z_1 and Z_2 are vectors of demographic covariates including age, gender, Hispanic
ethnicity, study year, and educational attainment as an indicator of socioeconomic status, and their
corresponding parameters are also vectors. These were intentionally chosen to be independent of time,
and occurring naturally as pretreatment covariates. This latter criterion is also important in related causal
inference approaches, as it is a crux of both the sequential ignorability (Imai et al., 2010; VanderWeele,
2009) and d-separation (Pearl, 1998) criteria for identifying causal effects. Participants' weight and
height were measured twice during each wave using an electronically calibrated digital scale (Tanita
WB-110A) and stadiometer (PE-AIM-101) to the nearest 0.1 kg and cm, respectively. BMI was subsequently
computed for each of the two measurements using the standard formula (kg/m²), and the resulting average
of the two was taken as a participant's BMI. Waist circumference (cm) was also measured twice at each
wave, and the average of the two was used for analysis purposes. Both outcomes were examined in these
analyses, as waist circumference may be a more relevant indicator of obesity-related health conditions
because it is directly related to unhealthy distributions of body fat (such as having a pear-shaped torso).
The mediator, perceived walkability, was assumed to be the same at both waves of data collection.
This model is analogous to the 2→2→1 model of Krull and MacKinnon (2001). Effects of measurement
wave were assessed as well, though no treatment-by-wave, mediator-by-wave, or treatment-by-mediator
interactions were included, as such models are not identifiable using the product of coefficients (Imai et
al., 2010; Pearl, 2012). It should also be noted that, although measurement wave was not significantly
associated with BMI or waist circumference, other approaches to quantifying time have been (such as
months in residence), but these were omitted to keep the model simple. We fit four different models to the
data, in the same fashion as in the simulation study of section 4.2. Two multilevel quantile mediation models were
fit: one which allowed for only a random intercept, and one which let all model parameters be random effects.
The QCME and Bayesian models which assumed independence are also presented, to assess the benefits
of allowing the model parameters to be random effects. The models are defined below in equations 5.5
and 5.6. It should also be noted that, since the quantiles of BMI and waist circumference do not appear to
vary significantly by wave of measurement (figure 5.7), we let δ_1i = δ_2i = 0.
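(5.5) $Q_{Y_{ij}}(\tau \mid M_{ij}, I_i, Z_{1i}, T_{ij}) = \beta_{0i} + \beta_{1i}(\tau)\, NEWS_i + \gamma'_i(\tau)\, I_i + \boldsymbol{\beta}_{2i}(\tau)^{T} \mathbf{Z}_{1i} + \delta_{1i} T_{ij}$
(5.6) $E(NEWS_i \mid I_i, Z_{2i}, T_{ij}) = \alpha_{0i} + \alpha_{1i} I_i + \boldsymbol{\alpha}_{2i}^{T} \mathbf{Z}_{2i} + \delta_{2i} T_{ij}$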
The analysis results are shown in table 5.9, and they indicate that individuals at the upper
percentiles of the distributions of BMI and waist circumference experience substantially greater indirect
effects of the intervention, via perceived walkability, than those at the center of the distribution.
Furthermore, as in table 5.7, those at the 90th percentiles experienced indirect effects ~2-6 times larger
than those at the median, depending on which model was used. The random slope model also appears to
be less powerful at detecting significant indirect effects, as evidenced by the much wider credible
intervals, compared to the random intercept model. This reflects the simulation study results from table
4.2, where the additional error introduced for α_1i and β_1i results in more variable estimates (and hence
wider credible intervals) for the indirect effects. Since the estimated indirect effects are roughly the same
for both the random intercept and the random slope models, at all quantiles considered, we are inclined to
believe the random intercept model is more appropriate because it yields more efficient estimates of the
target parameters (i.e. smaller credible intervals not containing zero).
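The efficiency comparison between the two multilevel models can be made concrete by computing interval widths from the BMI rows of table 5.9. The sketch below does only that arithmetic; the helper names are illustrative, and "excludes zero" is used here as a simple stand-in for the notion of a detected indirect effect.

```python
# (estimate, lower, upper) for the BMI indirect effect, taken from table 5.9.
random_intercept = {
    "10th": (-0.13, -0.48, 0.18),
    "30th": (-0.22, -0.56, 0.04),
    "50th": (-0.31, -0.64, -0.07),
    "70th": (-0.56, -0.95, -0.19),
    "90th": (-1.5, -2.36, -0.88),
}
random_slope = {
    "10th": (-0.15, -1.86, 1.44),
    "30th": (-0.22, -1.83, 1.27),
    "50th": (-0.27, -1.9, 1.11),
    "70th": (-0.52, -2.17, 0.74),
    "90th": (-1.58, -3.68, -0.22),
}


def width(row):
    """Width of the credible interval in an (estimate, lower, upper) row."""
    _, lower, upper = row
    return upper - lower


def excludes_zero(row):
    """True when the credible interval lies entirely on one side of zero."""
    _, lower, upper = row
    return lower > 0 or upper < 0


for quantile in random_intercept:
    ri, rs = random_intercept[quantile], random_slope[quantile]
    print(quantile, round(width(ri), 2), round(width(rs), 2),
          excludes_zero(ri), excludes_zero(rs))
```

At every BMI quantile the random slope interval is at least twice as wide as the random intercept interval, while the point estimates differ by at most 0.08.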
The QCME and Bayesian Independence models generally yield estimated indirect
effects that differ in magnitude from those of the two multilevel models. Whenever the QCME
estimates are similar to those of the Random Intercept model, they are generally less efficient. In
particular, the estimated indirect effects at the 70th and 90th percentiles of BMI and waist circumference
are roughly comparable between the QCME and Random Intercept models, but the credible intervals
resulting from QCME are considerably wider than those of the latter. In fact, the QCME fails to detect
significant indirect effects at the median and lower quantiles of the waist circumference distribution,
while the Random Intercept model finds significant indirect effects at all of those quantiles. This reflects
the point made by Bauer et al. (2006), in that “identification of the variance components depends heavily
on the number of Level 1 observations per Level 2 unit, whereas the accuracy with which they are
estimated depends on the number of Level 2 units. In [their] simulation study, [they] encountered serious
difficulty estimating the model when the number of Level 1 observations was small (e.g., four)”. Since
there are only two observations (Level 1 units) per subject (Level 2 units), it is unlikely that we would be
able to reliably estimate variance parameters for both random intercepts and random slopes.
Similarly, comparing the Bayesian independence and Random Slope models reflects a well-
known fact about multilevel models. Specifically, we know that Level-1 measurements of an outcome
taken on the same Level-2 unit at different occasions will tend to be positively correlated (Fitzmaurice et
al., 2004). As a result, models which ignore the multilevel data structure will incorrectly have smaller
standard errors and shorter credible intervals. This is not to say that the Random Slope model estimates
are “correct”, and the Independence model therefore fails to detect the “true” effects. Rather, it simply
clarifies why assuming independence results in shorter credible intervals when compared to models which
explicitly account for the underlying correlation structure, i.e. the Random Slope model. Credible
intervals obtained from the Random Intercept model are shorter than those from both the Independence and Random
Slope models, and seem to represent a tradeoff between incorrectly ignoring correlation (and getting
incorrectly short credible intervals) and overparametrizing the model with more random effects than may
be reliably estimable (see the preceding paragraph).
αβ_NEWS (95% Credible Interval)

Outcome        Quantile  QCME                  Bayesian Independence  Bayesian Random Int.  Bayesian Random Slope
BMI            10th      -0.15 (-0.43, 0.06)   -0.10 (-0.61, 0.33)    -0.13 (-0.48, 0.18)   -0.15 (-1.86, 1.44)
               30th      -0.24 (-0.62, 0.03)   -0.17 (-0.76, 0.26)    -0.22 (-0.56, 0.04)   -0.22 (-1.83, 1.27)
               50th      -0.32 (-0.64, -0.04)  -0.24 (-0.62, 0.22)    -0.31 (-0.64, -0.07)  -0.27 (-1.9, 1.11)
               70th      -0.47 (-1.11, 0.01)   -0.43 (-0.97, -0.03)   -0.56 (-0.95, -0.19)  -0.52 (-2.17, 0.74)
               90th      -1.4 (-2.47, -0.53)   -1.19 (-2.04, -0.42)   -1.5 (-2.36, -0.88)   -1.58 (-3.68, -0.22)
Waist Circum.  10th      -0.55 (-1.35, 0.11)   -0.39 (-0.98, 0.09)    -0.5 (-0.84, -0.19)   -0.49 (-2.2, 1.01)
               30th      -0.49 (-1.3, 0.16)    -0.46 (-1.05, -0.03)   -0.61 (-1.06, -0.26)  -0.64 (-2.22, 0.71)
               50th      -0.61 (-1.51, 0.05)   -0.53 (-1.13, -0.53)   -0.69 (-1.19, -0.36)  -0.72 (-2.47, 0.67)
               70th      -1.29 (-2.4, -0.36)   -0.94 (-1.61, -0.29)   -1.2 (-1.85, -0.68)   -1.34 (-3.45, 0.2)
               90th      -3.06 (-5.46, -0.95)  -1.68 (-2.81, -0.62)   -2.15 (-3.21, -1.12)  -2.33 (-4.52, -0.77)
Table 5.9 Analysis of HPS Data using Different Multilevel Models
Figure 5.7 Quantile-Specific Effects of Wave of Measurement on BMI and Waist Circumference
5.5 Analysis Results for the Quantile Mediation Model with Latent Mediators
While the analysis results from sections 5.2-5.4 clearly demonstrate the benefits of using the
quantile mediation models proposed in this dissertation to assess mediational relationships at different
quantiles of the outcome distribution, none allowed for the common situation in which one or more of the
mediators are latent variables. This section presents the results of an analysis which did just that,
following the approach described in section 3.4. We use the same modeling setup as in section 5.4, with
three key differences. The model presented in equations 5.7-5.9 below assumes: 1) all model parameters
are fixed effects; 2) the data of interest are restricted to wave 1; and 3) the mediator is modeled as a latent
variable a la section 3.4. The indicators are denoted M_1, …, M_5, and are modeled as in equation 5.9.
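(5.7) $Q_{BMI_i}(\tau \mid M_i, I_i, Z_{1i}) = \beta_0 + \beta_1(\tau)\, NEWS_i + \gamma'(\tau)\, I_i + \boldsymbol{\beta}_2(\tau)^{T} \mathbf{Z}_{1i} + \delta_1 T_i$
(5.8) $E(NEWS_i \mid I_i, Z_{2i}) = \alpha_0 + \alpha_1 I_i + \boldsymbol{\alpha}_2^{T} \mathbf{Z}_{2i} + \delta_2 T_i$
(5.9) $(M_{1i}, M_{2i}, M_{3i}, M_{4i}, M_{5i})^{T} \sim N\big((1 \times NEWS_i,\ \lambda_2 \times NEWS_i,\ \lambda_3 \times NEWS_i,\ \lambda_4 \times NEWS_i,\ \lambda_5 \times NEWS_i)^{T},\ \lambda\big)$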
In order to model perceived walkability as a latent variable, five indicators were chosen from the
NEWS (Cerin et al., 2005; Adams et al., 2009) to represent the key characteristics of neighborhood
walkability. The items chosen were the responses to the items corresponding to subscales e-h and the
social interaction item, with items reverse-coded as necessary (cf. section 1.2.2). Z-scores of each item
were used so that every indicator was analyzed on the same scale, and to make the factor loadings
interpretable. The parameters in equations 5.7-5.9 were estimated at five quantiles of BMI and waist
circumference, to assess both how the indirect effects as well as the factor loadings varied across different
outcome quantiles.
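To illustrate the measurement model in equation 5.9, the sketch below simulates five indicators from a single latent walkability score, with the first loading fixed at 1. It is not the estimation procedure of section 3.4: the loadings are rounded values of the kind reported in table 5.10, and the standard normal latent score and independent unit-variance residuals are assumptions made purely for illustration.

```python
import random
from math import sqrt


def simulate_indicators(n, loadings, residual_sd=1.0, seed=0):
    """Draw n rows of indicators M_k = loading_k * latent + independent normal noise."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)  # assumed standard normal latent score
        rows.append([lam * latent + rng.gauss(0.0, residual_sd) for lam in loadings])
    return rows


def correlation(x, y):
    """Pearson correlation computed from first principles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# First loading fixed at 1; the others are illustrative values similar to table 5.10.
loadings = [1.0, 0.95, -0.8, -0.64, 0.65]
columns = list(zip(*simulate_indicators(2000, loadings)))

for k in range(1, 5):
    print(f"corr(M1, M{k + 1}) = {correlation(columns[0], columns[k]):.2f}")
```

Indicators with negative loadings (the third and fourth) correlate negatively with the first indicator, which is the pattern one would expect if those items capture features that make a neighborhood seem less walkable.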
The intercept terms Qη(Y|X,M) closely mirror the order statistics shown in table 5.3, with the terms
for both BMI and waist circumference taking approximately the values one would expect, conditional on the other
covariates. It is also clear that the indirect effects are larger at the upper percentiles, compared to the
median, for both BMI and waist circumference. Similar to the results in tables 5.7 and 5.9, the indirect
effects at the upper tails of the distributions of BMI and waist circumference are roughly 5-7 times larger
than those at their medians. Also, it was important to make sure that the factor
loadings did not vary by quantiles of either outcome. Clearly this is the case, as the factor loadings for
each of the indicators appear to be almost exactly the same at every quantile, and also between the models
for BMI and waist circumference.
The directions of the loadings for the third and fourth NEWS items (i.e. λ_3 and λ_4, corresponding to
Traffic Hazards and Crime) also make sense, since they represent aspects of the environment that would
make it appear less walkable. More importantly, comparison of the “Bayesian Marginal” results from
table 5.7 with those in table 5.10 illustrates a key point regarding latent variable models. That is to say, accounting
for measurement error in the five individual NEWS items “enables analysts to pool the information from
these multiple, imperfect measurements in hopes of more accurately representing the [mediator] in the
quantile regression” (Burgette and Reiter, 2011). This is clearly the case, as the Marginal model from
section 5.3 suggests that the indirect effects are smaller at the upper quantiles than at the median, whereas
the model in equations 5.7-5.9 clearly indicates otherwise. Comparing these two sets of results, it is clear
that using a simple summary score of the 5 NEWS items fails to properly incorporate the uncertainty in
the individual items in the models for BMI and waist circumference. The Bayesian model described in
section 3.4 clearly achieves such a purpose, thus resulting in estimates of the indirect effects that better
reflect our understanding of the Healthy Places data (cf. table 5.3).
Outcome        Parameter   10th %ile             30th %ile             Median                70th %ile             90th %ile
BMI            Qη(Y|X,M)   20.17 (17.61, 22.57)  24.22 (20.8, 27.07)   27.24 (23.96, 30.63)  35.84 (31.96, 39.69)  43.72 (39.71, 47.68)
               αβ(η)       -0.24 (-0.6, 0.03)    -0.17 (-0.58, 0.13)   -0.34 (-0.79, 0.01)   -1.75 (-2.91, -0.83)  -2.47 (-3.9, -1.26)
               λ_2         0.96 (0.81, 1.15)     0.95 (0.81, 1.13)     0.95 (0.8, 1.11)      0.93 (0.78, 1.11)     0.93 (0.78, 1.1)
               λ_3         -0.8 (-1.01, -0.63)   -0.8 (-0.99, -0.63)   -0.8 (-0.97, -0.64)   -0.83 (-1.04, -0.66)  -0.86 (-1.07, -0.68)
               λ_4         -0.65 (-0.86, -0.48)  -0.65 (-0.84, -0.47)  -0.64 (-0.82, -0.48)  -0.65 (-0.86, -0.47)  -0.69 (-0.9, -0.51)
               λ_5         0.65 (0.49, 0.83)     0.65 (0.5, 0.83)      0.65 (0.49, 0.82)     0.63 (0.48, 0.81)     0.61 (0.45, 0.79)
Waist Circum.  Qη(Y|X,M)   82.75 (77.49, 87.76)  89.01 (84.14, 94.04)  99.77 (94.3, 104.87)  114.6 (109.3, 119.9)  143.53 (137.7, 148.6)
               αβ(η)       -1.76 (-3.06, -0.75)  -1.24 (-2.39, -0.32)  -1.04 (-2.29, -0.13)  -5.12 (-8.04, -2.43)  -5.37 (-8.7, -2.37)
               λ_2         0.95 (0.8, 1.13)      0.95 (0.79, 1.12)     0.94 (0.79, 1.11)     0.89 (0.74, 1.06)     0.88 (0.72, 1.05)
               λ_3         -0.8 (-1.0, -0.62)    -0.78 (-0.98, -0.61)  -0.78 (-0.97, -0.61)  -0.83 (-1.02, -0.65)  -0.85 (-1.03, -0.67)
               λ_4         -0.64 (-0.84, -0.46)  -0.63 (-0.82, -0.46)  -0.63 (-0.8, -0.47)   -0.67 (-0.86, -0.5)   -0.7 (-0.89, -0.52)
               λ_5         0.67 (0.52, 0.86)     0.64 (0.49, 0.82)     0.64 (0.49, 0.8)      0.6 (0.44, 0.76)      0.6 (0.42, 0.78)
Table 5.10 Analysis of HPS Data across various quantiles for indirect effect of perceived walkability
6 Discussion
This dissertation has described a general approach to mediation analysis, which allows for the
examination of mediational hypotheses across different quantiles of an outcome distribution. The
different models covered in the preceding sections provide a framework that can accommodate many
different types of data, including those: 1) with two or more mediators; 2) with longitudinal or multilevel
structures; and 3) for which some or all the mediators are latent variables. These models allow one to
overcome many of the limitations of other mediation methods. Most notable among such limitations are:
1) commonly used SEM-based methods (Muthen, 1994; Bentler, 1980; Joreskog, 1970) do not readily
extend to quantiles of an outcome; and 2) even when quantile mediation is possible (Imai et al., 2010),
one is unable to fit mediation models for more than one mediator simultaneously, when mediators are
measured with error, or there exists an underlying multilevel or longitudinal data structure. More
importantly, these models allow one to assess complex mediational relationships at more meaningful
parts of an outcome distribution, thus allowing the relaxation of the common (and often unreasonable)
assumption that individuals at different parts of the outcome distribution should respond the exact same
way; that is, it allows for the explicit examination of Doksum's “proneness property” (1974).
Of course, there are important models that this dissertation does not cover. If one is interested in
modeling the outcome as a latent variable, the methods in the preceding sections clearly do not apply. A
related limitation is that they do not allow latent mediator variables to be modeled as quantile-specific
either. For multilevel or longitudinal data, the multilevel model described above does not allow for the
incorporation of latent variables, though such a model could easily be devised by combining the models
in sections 3.4 and 4.2. The important issue of Moderation (Baron and Kenny, 1986) is also not covered,
nor is the issue of allowing for mediator- and treatment-by-time interactions. The sections that follow
provide a more detailed discussion of the key findings and limitations of the methods detailed in the
above sections, and conclude by summarizing the resulting directions that future work in quantile
mediation ought to take.
6.1 Key Contributions
Section 2 of this dissertation has demonstrated that the decomposition of effects property for
conditional mean mediation models appears to hold for conditional quantiles, as long as the model is fully
recursive, assuming no moderation (Baron and Kenny, 1986), and there is no confounding bias. Of the
four possible methods for estimating quantile indirect effects, it appears that both the quantile analogue of
the Baron and Kenny approach (section 2) and the single mediator Bayesian quantile mediation model
(section 3) align with the QCME estimator, while the fitted value and control function approaches
suggested in the economics literature (Amemiya, 1982; S. Lee, 2007; Ma and Koenker, 2006; Powell,
1983) appear to be poorly suited to estimating quantile mediation. The fitted value method was not
explicitly developed to deal with quantiles other than the median, and so its suitability for correctly
estimating the quantile-specific parameters of a mediation model has not been established. Nevertheless,
it is similar in spirit to the Causal Steps and QCME approaches, since all three make the commonly
employed recursive assumption. The simulation results further corroborate this point. Thus, the
differences between the fitted value method and the Causal Steps and QCME methods in the HP analysis are likely
due to some unmeasured confounding in the mediator model, and to the fact that important mediators that
are correlated with the NEWS were not included in those models.
Though no formal proof is given, our simulation results suggest that the quantile analogue of the
Causal Steps method is also equivalent to the QCME approach. This indicates that researchers who are
more familiar with the Causal Steps method may follow the same paradigm to assess quantile mediation,
but do as is suggested in the methods section and instead conduct quantile regression on the model for the
outcome variable. We further recommend testing quantile mediation hypotheses using the PRODCLIN
method (MacKinnon and Fritz, 2007), since it achieves (asymptotically) the nominal type I error for
extreme quantiles and the other tests do not. Finally, we suggest that researchers examine how the
indirect effects change across the outcome and mediator distributions (e.g. Figures 3 and 4), as this results
in a clearer picture of how indirect effects can vary across the full range of individuals, as opposed to the
hypothetical “average” individual about whom researchers often make claims.
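As a rough illustration of interval estimation for a product of coefficients, the sketch below builds a Monte Carlo interval for αβ by sampling each coefficient from a normal distribution centred at its estimate. This is a simpler simulation-based stand-in and not the PRODCLIN algorithm itself; the function name, the coefficient values, and the standard errors are all hypothetical, and the two estimates are assumed to be independent.

```python
import random


def monte_carlo_product_interval(a, se_a, b, se_b, draws=20000, level=0.95, seed=0):
    """Percentile interval for the product a*b, assuming independent normal estimates."""
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lower = products[int(tail * draws)]
    upper = products[int((1.0 - tail) * draws) - 1]
    return lower, upper


# Hypothetical estimates: alpha from the mediator model, beta(tau) from the
# quantile outcome model at some quantile tau.
alpha_hat, se_alpha = 0.5, 0.1
beta_hat, se_beta = -0.6, 0.15

lower, upper = monte_carlo_product_interval(alpha_hat, se_alpha, beta_hat, se_beta)
print(f"indirect effect {alpha_hat * beta_hat:.2f}, interval ({lower:.2f}, {upper:.2f})")
```

Because the distribution of a product of two normal variables is generally skewed, the resulting interval need not be symmetric around the point estimate, which is the same feature that motivates distribution-of-the-product methods over a simple normal approximation.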
Section 3 generalized the quantile mediation model to the context of multiple mediators along the
causal path. Although the QCME approach allows for multiple mediators, outcomes and treatments (Imai,
2010), it fails to account for the correlation between those mediators because it fits models for each
mediator separately. The generalization of the quantile mediation model to allow for multiple correlated
mediators is an important advance, as illustrated in table 5.7. Nevertheless, the simulation results from
sections 2 and 3 indicate that a simple extension, to conditional quantiles, of the product of coefficients
approach embodied in conventional SEM-based techniques results in estimates of the indirect effect that
are nearly identical to those resulting from the QCME approach. While the two differ in their estimates
of the total effect, they appear to be equivalent for the estimation of quantile mediation. Moreover, the
assessment of the total effect absent any possible mediators is unnecessary if one's purpose is to assess
mediation (Kenny, 1998), and redundant when using the QCME method (Imai et al., 2010).
As the analysis of the HP data suggests, considering quantile mediation is very important for
continuous health outcomes such as BMI or Waist Circumference, because it allows for the examination
of mediation effects at different parts of the outcome distribution. The quantile trends evident in the
tables and figures above demonstrate that greater effects of improvements in the built environment can be
gained for those who are overweight or obese, compared to normal weight individuals, echoing Doksum's
original discussion of quantile treatment effects (Doksum, 1974). However, Kim and Muller (2004)
demonstrated that (under suitable conditions) the fitted value model results in biased estimates of β(η),
and recommend estimating a conditional quantile for each stage of the model, using the same quantile for
both stages. To assess whether this might be problematic, we also estimated the indirect effect of
Walkability for various quantiles of both Walkability and BMI. The left and right panels of Figure 5.6
display estimates of the indirect effect estimated using the Causal Steps Method, and using quantile
regression in both stages with different combinations of quantiles for Walkability (tau2) and BMI (tau1),
respectively. Figure 5.6 clearly demonstrates that the largest quantile-indirect effects occur near the center
of the distribution of Walkability. Moreover, if we average over tau2 for a given cross-section of the tau1
axis in the right-hand figure (say, the 85th percentile), the resulting estimate of the indirect effect is about
-1.02, which is nearly identical to the estimate of -0.99 resulting from using least squares to estimate α.
We find that the same holds true for other quantiles of BMI, as well as for quantiles of waist
circumference.
Also covered in this dissertation are models which incorporate latent variables (section 3) and
multilevel data structures (section 4). The simulation results of section 4 clearly indicate the perils of
assuming independence of sample observations in the presence of multilevel data, and reflect many of the
points made by Maxwell et al. (2007; 2011). Analysis results from section 5.4 further indicate that
failing to account for multilevel structure in the data may result in biased and inefficient estimates of
indirect effects at any outcome quantile of interest. Collectively, these models significantly extend the
ability to conduct mediation analysis on quantiles of the outcome, for a wider variety of data structures
(latent variables or multilevel data) than allowed by the QCME (Imai et al., 2010), or other commonly
used causal mediation approaches (e.g. VanderWeele, 2009).
6.2 Key Limitations
A clear limitation of the methods described above is that they do not allow for modeling the
outcome as a latent variable. Though it is reasonable to assume that one ought to be able to apply the
latent variable model of section 3.4, a great deal of thinking needs to go into whether and how these
models might be combined. For instance, if one wanted to model quantiles of a latent variable “body
composition” using BMI and waist circumference as observable indicators, could one model only the
latent variable as quantile-specific using the ALD, or would one have to model the indicators (as
functions of “body composition”) as quantile-specific as well? One might also ask what interpretations
might be ascribed to the parameters of either type of model. SEMs are also often concerned with model
fit, and this ought to be asked of such models as well. Such questions are important, and the implications
of the resulting types of models would need to be assessed carefully before developing any such
extensions. A related issue is that the indicators of the latent mediators were assumed to be continuous,
but often such variables will have an ordered categorical structure, which would require a different
modeling approach as detailed in section 6.3 below.
Nonlinearity in the direct and indirect effects has also not been covered here, nor has the issue of
moderation. Indeed one may reasonably posit that the mediating effect of perceived neighborhood
walkability may actually depend on whether an individual was randomized into the treatment group, and
such problems are dealt with in Imai et al.'s QCME approach (2010). One may also be interested in
nonlinear effects of mediators, which ought to be easily accommodated with the incorporation of
nonlinear terms (e.g. splines, etc.) into the models described above. Nonlinear models, such as logistic or
probit models, are beyond the scope of this dissertation, and may bear a substantially different
interpretation from quantile indirect effects. Quantile regression methods could also be used for another
related problem, when the usual assumptions of normality and homoscedasticity of the error distribution
are violated. However, this paper is concerned with assessing mediation across the whole response
distribution, and robust methods for mediation (Yuan and MacKinnon, 2013) serve a rather different
purpose than what is discussed here. A related issue is how to deal with ties when computing the
quantile-specific parameters (Wilcox, 2012), and the work presented in this dissertation completely
ignores that possibility.
The multilevel model of section 4 does not allow for examination of the important issues of
treatment- and mediator-by-time interactions, as simple extension of product of coefficient methods
would clearly not apply (Imai et al., 2010). Also, quantile latent growth curve models, which combine the
methods from sections 3.4 and 4.2, are not covered by the methods in this dissertation. However, as with
the treatment- and mediator-by-time issues, this extension would require further thought before such a
model could be developed. Entirely different approaches for operationalizing mediation would clearly be
required for those situations. One possibility is suggested in section 6.3 below to address these
limitations. Other areas of future research are also outlined, and some suggestions for possible directions
are given.
6.3 Future Work
A quantile latent growth curve model ought to be possible, by enriching the model defined by
equations 4.1-4.4 with the latent variable model of equations 3.15-3.16. It is likely that autoregressive
models would have to be accommodated for this to be possible, as well as models that allow for modeling
time-by-treatment and time-by-mediator effects. This latter problem is particularly challenging with
respect to identifiability of direct and indirect effects, as discussed below. Another challenging issue is
how to interpret parameters corresponding to lagged outcome or mediator values. For a model like what
Bollen and Curran present (2004), for example, it would be unclear how to interpret the autoregressive
parameters for an individual who moves from one quantile to another over two waves of measurement.
Would one interpret it as a rate of change from one quantile to another? Unlike models for the mean, for
which such parameters simply imply autocorrelation between repeated outcome or mediator
measurements, in the quantile setting one would need to seriously think through how to construct the
models so that the autoregressive parameters represent autocorrelation, and not something else (like rate
of change between quantiles over time).
However, the structure of the model defined by equations 4.5-4.9 could allow one to model time
in a multilevel fashion. One could augment equations 4.7-4.8, by modeling the random intercepts
explicitly as a function of time, as in equations 6.1 and 6.2. Such a model may even lead one to
revisit the observation depicted in figure 5.7, where time does not appear to have an effect at any
quantile of interest. Augmenting the model using equations 6.1 and 6.2 would explicitly allow the effect
of time to vary between subjects, and the resulting estimates of $\delta_1$ and $\theta_1$ may actually end up being
statistically significant. This extension is an important area of future work, as other covariates that occur
at higher levels of data aggregation (such as schools, clinics, or communities) could be included in a
similar fashion, thus allowing the explicit separation of effects occurring at different levels. This was
indeed the case in the Healthy PLACES study, in which data could be further aggregated by schools or
neighborhoods. The issue of Moderation could possibly be dealt with in this same manner as well, by
replacing the time variable $T_i$ in equation 6.1 with a treatment indicator, though Moderation is typically
not dealt with in this fashion.
(6.1)  $\alpha_{0i} \sim N(\delta_0 + \delta_1 T_i,\ \sigma^2_{\alpha_0})$
(6.2)  $\beta_{0i} \sim N(\theta_0 + \theta_1 T_i,\ \sigma^2_{\beta_0})$
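As a rough illustration of the mean structure in equation 6.1, the following sketch simulates subject-level random intercepts whose mean depends on T_i and then recovers the two coefficients by ordinary least squares. It is only a toy under stated assumptions: the parameter values, the design for T_i, and the helper names are hypothetical, and in the actual model the random intercepts are latent and would be estimated within the Bayesian quantile framework rather than regressed directly.

```python
import random

def simulate_random_intercepts(t_values, delta0, delta1, sigma, seed=0):
    """Draw alpha_0i ~ N(delta0 + delta1 * T_i, sigma^2), one draw per subject."""
    rng = random.Random(seed)
    return [rng.gauss(delta0 + delta1 * t, sigma) for t in t_values]

def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical design: 2000 subjects with T_i cycling through 0, 1, 2, 3.
t_values = [i % 4 for i in range(2000)]

# Hypothetical true values for delta_0, delta_1 and the intercept SD.
alpha0 = simulate_random_intercepts(t_values, delta0=1.0, delta1=0.5,
                                    sigma=0.3, seed=42)

# If the intercepts were observed, delta_0 and delta_1 would be recoverable
# from a simple regression of alpha_0i on T_i.
delta0_hat, delta1_hat = least_squares_line(t_values, alpha0)
print(round(delta0_hat, 2), round(delta1_hat, 2))
```

Replacing T_i with a school-level or neighborhood-level covariate in the same sketch would mimic the higher-level extension described in the text.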
The issue of treatment-by-mediator interactions, otherwise known as Moderation (Baron and
Kenny, 1986), may also arise when dealing with quantile mediation models. However, product of
coefficient methods do not apply when such interactions are present, even for the multilevel set up
suggested in the previous paragraph (Bauer et al., 2006). An entirely new way to operationalize quantile
mediation is necessary for such an issue to be addressed. Pearl (2012) suggests a way forward in this
regard, by first noting that the usual assumption of no treatment-by-mediator interaction, along with an
application of his “do-calculus”, allows one to arrive at the usual effect decomposition, in which the
indirect effect is defined in the usual way. He goes on to show that allowing for a treatment-by-mediator
interaction makes it necessary to define two separate indirect effects. In other words, “the portion of output change for which
mediation would be sufficient is αβ, while the portion for which mediation would be necessary is αβ+αδ”,
where δ is the parameter that corresponds to the X-by-M interaction term not present in equation 1.1.
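The two quantities in the quotation above can be checked numerically with a small sketch. It assumes one linear parametrization that is consistent with the quoted result: M = αX + e_M and Y = γX + βM + δXM + e_Y, with zero intercepts, zero-mean errors, and a binary change in X from 0 to 1. The symbol γ for the direct path and the function names are assumptions made for illustration, not notation taken from the dissertation.

```python
def expected_y(x, x_for_m, alpha, beta, gamma, delta):
    """E[Y(x, M(x_for_m))] under the assumed linear model with interaction.

    M = alpha * X + e_M,  Y = gamma * X + beta * M + delta * X * M + e_Y,
    with zero intercepts and zero-mean errors, so the expectation is obtained
    by plugging in E[M(x_for_m)] = alpha * x_for_m.
    """
    m = alpha * x_for_m
    return gamma * x + beta * m + delta * x * m

def decompose(alpha, beta, gamma, delta):
    """Effect decomposition for a change in X from 0 to 1."""
    args = (alpha, beta, gamma, delta)
    baseline = expected_y(0, 0, *args)
    total = expected_y(1, 1, *args) - baseline
    natural_direct = expected_y(1, 0, *args) - baseline
    natural_indirect = expected_y(0, 1, *args) - baseline
    return {
        "total": total,
        "direct": natural_direct,
        # Portion for which mediation would be sufficient: alpha * beta.
        "sufficient": natural_indirect,
        # Portion for which mediation would be necessary: total minus direct.
        "necessary": total - natural_direct,
    }

# Hypothetical coefficient values.
effects = decompose(alpha=0.5, beta=0.4, gamma=0.3, delta=0.2)
print(effects)
```

When δ = 0 the two indirect quantities coincide, which is the usual product-of-coefficients result.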
This same line of reasoning could be applied by defining the quantile indirect effect in the same
manner as the QCME (Imai et al., 2010, p. 10). However, such an extension does not clearly
accommodate latent variables, modeling multiple mediators simultaneously, or multilevel data structures.
In fact, causal mediation techniques are still in the very early stages of development. A cursory reading
of the abstracts for the essays in the recently released Handbook of Causal Analysis for Social
Research (Morgan, 2013) clearly demonstrates the lively ongoing debates not only about how best to conduct
causal mediation analysis, but also about the more fundamental issues of how to even define and identify causal mediation effects.
While Imai et al. (2010) provide a path forward, even the seemingly simple task of extending quantile
mediation analysis to accommodate moderation hypotheses proves extremely challenging. Indeed an
entire dissertation could be focused specifically on that issue alone!
The latent variable approach described above (sections 3.4 and 5.5) assumes that the observable
indicators are continuous manifest variables, and they were forced to be continuous in section 5.5 by
analyzing them on the z-score scale. However, the indicators in the NEWS are actually ordered
categorical variables (on a 4-point scale), and so unless the distribution of the four levels appears to be
Normal one cannot simply treat them as continuous and use a Normal distribution to quantify their
relationship with the latent variable Walkability. Lee et al. (2005) describe an EM algorithm (Dempster,
Laird and Rubin, 1977) for handling latent variables whose measurable indicators are ordered categorical
variables. They treat the latent variables as missing data, and then at each step of the algorithm they
generate a sample from the conditional distribution of the latent variables (given the observed indicators
$m_i$) using Gibbs sampling. Estimates of the latent factor scores are then given by the sample mean from
the last E-step, which yields the maximum likelihood estimate of the model parameters. Burgette and
Reiter (2012) develop an analogous approach using an ordered probit model (Albert and Chib, 1993),
with the two lowest cutoffs set to fixed values to identify the other parameters. Extending such models to
the latent variable approach described in sections 3.4 and 5.5 represents another important area for future
research. One could also address the issue of allowing BMI and waist circumference to be observed
indicators of a latent “obesity” variable, but it remains unclear whether and how one could interpret factor
loadings corresponding to such a variable if one was interested in quantiles of “obesity”, rather than the
mean.
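The ordered probit measurement link mentioned above can be sketched as follows: a continuous latent response is cut into four ordered categories by three cutoffs, and the category probabilities follow from the Normal distribution function. This is only the measurement link, not the EM algorithm of Lee et al. (2005) or the sampler of Burgette and Reiter. The cutoff values, the unit latent variance, and the function names are hypothetical.

```python
from statistics import NormalDist

def category_probabilities(latent_mean, cutoffs, latent_sd=1.0):
    """Probabilities of each ordered category for a Normal latent response.

    With cutoffs c_1 < c_2 < c_3, category k is observed when the latent
    response falls between c_{k-1} and c_k (with c_0 = -inf, c_4 = +inf).
    """
    dist = NormalDist(mu=latent_mean, sigma=latent_sd)
    cdf_values = [0.0] + [dist.cdf(c) for c in cutoffs] + [1.0]
    return [upper - lower for lower, upper in zip(cdf_values, cdf_values[1:])]

def to_category(latent_value, cutoffs):
    """Map a latent value to an ordered category 1..len(cutoffs)+1."""
    return 1 + sum(latent_value > c for c in cutoffs)

# Hypothetical cutoffs for a 4-point item such as those in the NEWS.
cutoffs = [-1.0, 0.0, 1.0]
probs = category_probabilities(latent_mean=0.0, cutoffs=cutoffs)
print([round(p, 3) for p in probs])
print(to_category(0.5, cutoffs))
```

In an identified model only some cutoffs would be fixed and the rest estimated; here all three are fixed purely for illustration.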
Model fit is also an important issue when dealing with mediation models, especially those
involving latent variables, as it allows for the testing of the causal assumptions encoded in diagrams like
figure 1.1 (Bollen, 1989; Bollen et al., 2013). Lee (2007) suggests possible ways forward for assessing
model fit. For Bayesian models, as in sections 3 and 4, the Bayes factor is a commonly used measure for
model comparison and assessing model fit. In summary, for two models $M_1$ and $M_2$, one can assess their
relative fit with the Bayes factor, defined as $P(\text{Data} \mid M_1)/P(\text{Data} \mid M_2)$, where each quantity is obtained
by integrating over the parameter space for each model. In many cases, however, such a
computation is analytically intractable. Lee overcomes this limitation by applying path sampling to the Bayesian
SEM (2007, pp. 115-126). It stands to reason that a similar approach can be taken to assess model fit, and
perform model comparison, for the methods in this dissertation as well. Indeed, this would be an
important next step for this research, as it would provide a formal framework for deciding between, say,
models with and without latent mediator variables.
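To make the definition of the Bayes factor concrete, the sketch below works through a deliberately simple case in which both marginal likelihoods are tractable: k successes in n Bernoulli trials, with a hypothetical model M1 that fixes the success probability at 0.5 and a hypothetical model M2 that places a uniform prior on it. This toy setting is an assumption chosen for illustration and is unrelated to the quantile mediation models or to path sampling; it only shows what "integrating over the parameter space" means.

```python
from math import comb

def marginal_fixed_p(k, n, p=0.5):
    """P(Data | M1): binomial likelihood with the success probability fixed."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def marginal_uniform_prior(k, n, grid=10_000):
    """P(Data | M2): binomial likelihood integrated over p ~ Uniform(0, 1).

    Uses a midpoint rule; the exact value of this integral is 1 / (n + 1).
    """
    width = 1.0 / grid
    total = 0.0
    for j in range(grid):
        p = (j + 0.5) * width
        total += comb(n, k) * p**k * (1 - p)**(n - k) * width
    return total

def bayes_factor(k, n):
    """Bayes factor P(Data | M1) / P(Data | M2) for the two toy models."""
    return marginal_fixed_p(k, n) / marginal_uniform_prior(k, n)

# Balanced data favour the fixed-probability model; lopsided data do not.
print(bayes_factor(5, 10))
print(bayes_factor(9, 10))
```

For models with many parameters the integral cannot be evaluated on a grid like this, which is the situation that motivates sampling-based estimates of the Bayes factor.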
Another issue that warrants consideration was raised by Kim and Muller (2004), who
demonstrated that two-stage estimation of $\beta_1(\eta)$ may be biased when estimation of the model for M is
based on least squares. Figure 5.8 shows an example of this from the Healthy Places study, where the
right panel showed that the indirect effect at extreme quantiles was rather different at different quantiles
of perceived walkability and BMI, and the left panel assumed that the quantile indirect effects were
instead constant (or homogeneous) across the distribution of walkability. We were able to obtain similar
estimates of the quantile indirect effect when averaging the indirect effects over the distribution of
walkability, as discussed above in 6.2, but this will not be the case for every mediator. In other situations,
the indirect effect should be allowed to vary over the distributions of both the mediator and the outcome.
Otherwise, one might encounter the very situation Kim and Muller (2004) described. In principle,
averaging over the distribution of the mediator (or any other covariates for that matter) is similar to
Pearl's formula for the indirect effect (2012, p. 7), which sums over the probabilistic difference in
potential outcomes over the mediator distribution. This is another area of future research that would
further benefit our understanding of mediational relationships beyond the average (over the mediator)
quantile indirect effects.
Other related methods such as Latent Class Analysis (McCutcheon, 1987) could benefit from
extensions to the quantile setting. Though it might be difficult to interpret the parameters of such a model,
longitudinal methods which allow for autoregressive model structures can also provide additional insights
into mediational relationships for outcome quantiles. Some quantile regression methods already exist for
dealing with autoregressive models (Koenker, 2005; Wei et al., 2006), and more thought could go into
extending these techniques to the mediation setting as well. On a similar note, instrumental variables
methods have been developed for simultaneous equations models (e.g. Chernozhukov and Hansen, 2005), and such
techniques have also seen use in mediation analysis. Methods which attempt to combine across these
traditions may also be worth pursuing. Finally, the issue of quantile crossing (He, 1997; Tokdar and
Kadane, 2011) is an important one, as the estimated quantiles of the outcome ought to be monotone in the
quantile level when mediational patterns are assessed across the outcome distribution.
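Quantile crossing can be detected with a very small check: for a fixed covariate profile, fitted values at increasing quantile levels should be non-decreasing. The sketch below flags a violation and then applies a simple monotone rearrangement (sorting the fitted values), which is a different and cruder technique than the methods cited above. The fitted values and function names are hypothetical.

```python
def has_crossing(fitted_quantiles):
    """True if fitted values decrease anywhere as the quantile level increases."""
    return any(later < earlier
               for earlier, later in zip(fitted_quantiles, fitted_quantiles[1:]))

def rearrange(fitted_quantiles):
    """Monotone rearrangement: sort fitted values so they no longer cross."""
    return sorted(fitted_quantiles)

# Hypothetical fitted BMI quantiles for one covariate profile, listed in
# increasing order of quantile level; the 0.50 fit falls below the 0.25 fit.
levels = [0.10, 0.25, 0.50, 0.75, 0.90]
fitted_bmi = [17.2, 18.9, 18.4, 21.0, 24.3]

print(has_crossing(fitted_bmi))
repaired = rearrange(fitted_bmi)
print(list(zip(levels, repaired)))
```

A check of this kind could be run on the direct and indirect effect paths separately before interpreting patterns across quantiles.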
References
Abrevaya, J. (2001). The effects of demographics and maternal behavior on the distribution of birth
outcomes. Empirical economics 26, 247-257.
Adams, M. A., Ryan, S., Kerr, J., et al. (2009). Validation of the Neighborhood Environment Walkability
Scale (NEWS) Items Using Geographic Information Systems. Journal of Physical Activity &
Health 6, S113-S123.
Albert, J. H., and Chib, S. (1993). Bayesian Analysis of Binary and Polychotomous Response Data.
Journal of the American Statistical Association 88, 669-679.
Amemiya, T. (1982). Two Stage Least Absolute Deviations Estimators. Econometrica 50, 689-711.
Anscombe, F. J., and Guttman, I. (1960). Rejection of Outliers. Technometrics 2, 123-147.
Baron, R. M., and Kenny, D. A. (1986). The moderator–mediator variable distinction in social
psychological research: Conceptual, strategic, and statistical considerations. Journal of
Personality and Social Psychology 51, 1173-1182.
Bentler, P. M. (1980). Linear structural equations with latent variables. Psychometrika 45, 289-308.
Beyerlein, A., Toschke, A. M., and von Kries, R. (2010). Risk factors for childhood overweight: shift of
the mean body mass index and shift of the upper percentiles: results from a cross-sectional study.
Int J Obes 34, 642-648.
Blundell, R., and Powell, J. L. (2007). Censored regression quantiles with endogenous regressors. Journal
of Econometrics 141, 65-83.
Bollen, K. A. (1989). Structural equations with latent variables. New York: Wiley.
Bollen, K. A., and Curran, P. J. (2004). Autoregressive latent trajectory (ALT) models: a synthesis of two
traditions. Sociological Methods & Research 32, 336-383.
Burgette, L. F., and Reiter, J. P. (2012). Modeling Adverse Birth Outcomes via Confirmatory Factor
Quantile Regression. Biometrics 68, 92-100.
Burgette, L. F., Reiter, J. P., and Miranda, M. L. (2011). Exploratory quantile regression with many
covariates: an application to adverse birth outcomes. Epidemiology 22, 859-866.
Burt, R. S. (1984). Network items and the general social survey. Social Networks 6, 293-339.
Casella, G., and Berger, R. L. (2002). Statistical inference, 2nd edition. Australia ; Pacific Grove, CA:
Thomson Learning.
Chernozhukov, V., and Fernández-Val, I. (2011). Inference for Extremal Conditional Quantile Models,
with an Application to Market and Birthweight Risks. The Review of Economic Studies 78, 559-
589.
Chernozhukov, V., and Hansen, C. (2005). An IV Model of Quantile Treatment Effects. Econometrica
73, 245-261.
Chernozhukov, V., and Hansen, C. (2006). Instrumental quantile regression inference for structural and
treatment effect models. Journal of Econometrics 132, 491-525.
Chesher, A. (2003). Identification in Nonseparable Models. Econometrica 71, 1405-1441.
Chib, S., and Greenberg, E. (1995). Understanding the Metropolis-Hastings Algorithm. The American
Statistician 49, 327-335.
Chou, C. P., Bentler, P. M., and Pentz, M. A. (1998). Comparisons of Two Statistical Approaches to
Study Growth Curves: The Multilevel Model and the Latent Curve Analysis. Structural Equation
Modeling: A Multidisciplinary Journal 5, 247-266.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences, 2nd edition. Hillsdale, N.J.: L.
Erlbaum Associates.
Cole, D. A., and Maxwell, S. E. (2003). Testing mediational models with longitudinal data: questions and
tips in the use of structural equation modeling. J Abnorm Psychol 112, 558-577.
Cole, T. J. (1988). Fitting Smoothed Centile Curves to Reference Data. Journal of the Royal Statistical
Society. Series A (Statistics in Society) 151, 385-418.
Conger, A. J. (1974). A Revised Definition for Suppressor Variables: a Guide To Their Identification and
Interpretation. Educational and Psychological Measurement 34, 35-46.
Courneya, K. S., and McAuley, E. (1995). Cognitive mediators of the social influence-exercise adherence
relationship: A test of the theory of planned behavior. Journal of Behavioral Medicine 18, 499-
515.
Curran, P. J. (2003). Have multilevel models been structural equation models all along? Multivariate
Behavioral Research 38, 529-568.
David, H. A. (1981). Order statistics, 2nd edition. New York: Wiley.
Doksum, K. (1974). Empirical Probability Plots and Statistical Inference for Nonlinear Models in the
Two-Sample Case. The Annals of Statistics 2, 267-277.
Duncan, O. D. (1966). Path Analysis: Sociological Examples. American Journal of Sociology 72, 1-16.
Dunton, G., McConnell, R., Jerrett, M., et al. (In Press). Organized Physical Activity in Young School
Children Predicts Subsequent 4-Year Change in Body Mass Index. Archives of Pediatrics and
Adolescent Medicine.
Dunton, G., McConnell, R., Jerrett, M., et al. (2012). Organized physical activity in young school children and
subsequent 4-year change in body mass index. Archives of Pediatrics & Adolescent Medicine
166, 713-718.
Dunton, G. F., Intille, S. S., Wolch, J., and Pentz, M. A. (2012). Investigating the impact of a smart
growth community on the contexts of children's physical activity using Ecological Momentary
Assessment. Health & Place 18, 76-84.
Dunton, G. F., Kaplan, J., Wolch, J., Jerrett, M., and Reynolds, K. D. (2009). Physical environmental
correlates of childhood obesity: a systematic review. Obesity Reviews 10, 393-402.
Durand, C. P., Andalib, M., Dunton, G. F., Wolch, J., and Pentz, M. A. (2011). A systematic review of
built environment factors related to physical activity and obesity risk: implications for smart
growth urban planning. Obesity Reviews 12, e173-e182.
Ewing, R., Schmid, T., Killingsworth, R., Zlot, A., and Raudenbush, S. (2003). Relationship Between
Urban Sprawl and Physical Activity, Obesity, and Morbidity. American Journal of Health
Promotion, 47-57.
Fox, J. (1979). Simultaneous Equation Models and Two-Stage Least Squares. Sociological Methodology
10, 130-150.
Fox, J. (1980). Effect Analysis in Structural Equation Models - Extensions and Simplified Methods of
Computation. Sociological Methods & Research 9, 3-28.
Fox, J. (1985). Effect Analysis in Structural-Equation Models II: Calculation of Specific Indirect Effects.
Sociological Methods & Research 14, 81-95.
Fritz, M. S., and MacKinnon, D. P. (2007). Required Sample Size to Detect the Mediated Effect.
Psychological Science 18, 233-239.
Gattshall, M. L. (2008). Validation of a survey instrument to assess home environments for physical
activity and healthy eating in overweight children. The international journal of behavioral
nutrition and physical activity 5, 3.
Gelfand, A. E., and Smith, A. F. M. (1990). Sampling-Based Approaches to Calculating Marginal
Densities. Journal of the American Statistical Association 85, 398-409.
Gelman, A., Carlin, J. B., Stern, H. S., and Rubin, D. B. (2004). Bayesian Data Analysis, Second edition.
Boca Raton, FL: Chapman and Hall/CRC.
Geman, S., and Geman, D. (1984). Stochastic Relaxation, Gibbs Distributions, and the Bayesian
Restoration of Images. IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
Geraci, M., and Bottai, M. (2007). Quantile regression for longitudinal data using the asymmetric Laplace
distribution. Biostatistics 8, 140-154.
Goodman, L. A. (1960). On the Exact Variance of Products. Journal of the American Statistical
Association 55, 708-713.
Hox, J. J., and Maas, C. J. M. (2001). The Accuracy of Multilevel Structural Equation Modeling With
Pseudobalanced Groups and Small Samples. Structural Equation Modeling: A Multidisciplinary
Journal 8, 157-174.
Huber, P. J. (1972). The 1972 Wald Lecture Robust Statistics: A Review. The Annals of Mathematical
Statistics 43, 1041-1067.
Imai, K. (2010). Causal Mediation Analysis Using R. In Advances in Social Science Research Using R 196,
129-154.
Imai, K., Keele, L., and Tingley, D. (2010). A general approach to causal mediation analysis.
Psychological Methods 15, 309-334.
Imai, K., and Tingley, D. (2012). A Statistical Method for Empirical Testing of Competing Theories.
American Journal of Political Science 56, 218-236.
Jerrett, M., McConnell, R., Chang, C. C. R., et al. (2010). Automobile traffic around the home and
attained body mass index: A longitudinal cohort study of children aged 10–18 years. Preventive
Medicine 50, Supplement, S50-S58.
Johnson, R. A., and Wichern, D. W. (2007). Applied multivariate statistical analysis, 6th edition. Upper
Saddle River, N.J.: Prentice Hall.
Johnston, J. (1984). Econometric methods, 3rd edition. New York: McGraw-Hill.
Judd, C. M., and Kenny, D. A. (1981). Process Analysis - Estimating Mediation in Treatment Evaluations.
Evaluation Review 5, 602-619.
Keesling, J. W. (1972). Maximum likelihood approaches to causal analysis. Unpublished dissertation,
University of Chicago.
Kenny, D. A. (1998). Data analysis in social psychology. Handbook of Social Psychology 1, 233.
Kenny, D. A., Korchmaros, J. D., and Bolger, N. (2003). Lower level mediation in multilevel models.
Psychol Methods 8, 115-128.
Kim, T.-H., and Muller, C. (2004). Two-stage quantile regression when the first stage is based on quantile
regression. Econometrics Journal 7, 218-231.
Kimm, S. Y. S., Glynn, N. W., Obarzanek, E., et al. (2005). Relation between the changes in physical
activity and body-mass index during adolescence: a multicentre longitudinal study. Lancet 366,
301-307.
Koenker, R. (2005). Quantile regression. Cambridge ; New York: Cambridge University Press.
Koenker, R., and Bassett, G., Jr. (1978). Regression Quantiles. Econometrica 46, 33-50.
Kozumi, H., and Kobayashi, G. (2011). Gibbs sampling methods for Bayesian quantile regression.
Journal of Statistical Computation and Simulation 81, 1565-1578.
Krull, J. L., and MacKinnon, D. P. (1999). Multilevel mediation modeling in group-based intervention
studies. Evaluation Review 23, 418-444.
Krull, J. L., and MacKinnon, D. P. (2001). Multilevel modeling of individual and group level mediated
effects. Multivariate Behavioral Research 36, 249-277.
Laird, N. M., and Ware, J. H. (1982). Random-effects models for longitudinal data. Biometrics 38, 963-
974.
Lee, S.-Y. (2007a). Structural equation modeling : a Bayesian approach. Chichester, England ; Hoboken,
NJ: Wiley.
Lee, S.-Y., Song, X.-Y., Skevington, S., and Hao, Y.-T. (2005). Application of Structural Equation
Models to Quality of Life. Structural Equation Modeling: A Multidisciplinary Journal 12, 435-
453.
Lee, S. (2007b). Endogeneity in quantile regression models: A control function approach. Journal of
Econometrics 141, 1131-1158.
Lee, S. Y. (2002). Bayesian selection on the number of factors in a factor analysis model.
Behaviormetrika 29, 23.
Ludtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., and Muthen, B. (2008). The
multilevel latent covariate model: a new, more reliable approach to group-level effects in
contextual studies. Psychol Methods 13, 203-229.
Luo, Y. X., Lian, H., and Tian, M. Z. (2012). Bayesian quantile regression for longitudinal data models.
Journal of Statistical Computation and Simulation 82, 1635-1649.
Lytle, L., Hearst, M., Fulkerson, J., et al. (2011). Examining the Relationships Between Family Meal
Practices, Family Stressors, and the Weight of Youth in the Family. Annals of Behavioral
Medicine 41, 353-362.
Ma, L., and Koenker, R. (2006). Quantile regression methods for recursive structural equation models.
Journal of Econometrics 134, 471-506.
MacKinnon, D. P. (2000). Equivalence of the mediation, confounding and suppression effect. Prevention
science 1, 173.
MacKinnon, D. P. (2008). Introduction to statistical mediation analysis. New York: Lawrence Erlbaum
Associates.
MacKinnon, D. P., and Dwyer, J. J. (1993). Estimating mediated effects in prevention studies. Evaluation
Review 17, 144-158.
MacKinnon, D. P., and Fritz, M. S. (2007). Distribution of the product confidence limits for the indirect
effect: Program PRODCLIN. Behavior Research Methods 39, 384-389.
MacKinnon, D. P., Lockwood, C. M., Hoffman, J. M., West, S. G., and Sheets, V. (2002). A comparison
of methods to test mediation and other intervening variable effects. Psychological Methods 7, 83-
104.
MacKinnon, D. P., Lockwood, C. M., and Williams, J. (2004). Confidence Limits for the Indirect Effect:
Distribution of the Product and Resampling Methods. Multivariate Behavioral Research 39, 99-
128.
MacKinnon, D. P., Warsi, G., and Dwyer, J. H. (1995). A Simulation Study of Mediated Effect Measures.
Multivariate Behavioral Research 30, 41-62.
Maxwell, S. E., and Cole, D. A. (2007). Bias in cross-sectional analyses of longitudinal mediation.
Psychol Methods 12, 23-44.
Maxwell, S. E., Cole, D. A., and Mitchell, M. A. (2011). Bias in Cross-Sectional Analyses of
Longitudinal Mediation: Partial and Complete Mediation Under an Autoregressive Model.
Multivariate Behavioral Research 46, 816-841.
Meredith, W., and Tisak, J. (1990). Latent Curve Analysis. Psychometrika 55, 107-122.
Montgomery, D. C. (2013). Design and analysis of experiments, 8th edition. Hoboken, NJ:
John Wiley & Sons, Inc.
Morgan, S. L. (2013). Handbook of causal analysis for social research. New York: Springer.
Motl, R. W. (2000). Factorial validity and invariance of questionnaires measuring social-cognitive
determinants of physical activity among adolescent girls. Preventive Medicine 31, 584.
Muthen, B., and Asparouhov, T. (2008). Growth mixture modeling: Analysis with non-Gaussian random
effects. In Longitudinal data analysis, G. Fitzmaurice, M. Davidian, G. Verbeke, and G.
Molenberghs (eds), 143-165. Boca Raton, FL: Chapman and Hall/CRC.
Muthen, B. O. (1994). Multilevel Covariance Structure-Analysis. Sociological Methods & Research 22,
376-398.
Newey, W. K., Powell, J. L., and Vella, F. (1999). Nonparametric estimation of triangular simultaneous
equations models. Econometrica 67, 565-603.
Ogden, C. L., Carroll, M. D., Curtin, L. R., Lamb, M. M., and Flegal, K. M. (2010). Prevalence of High
Body Mass Index in US Children and Adolescents, 2007-2008. JAMA: The Journal of the
American Medical Association 303, 242-249.
Pearl, J. (1998). Graphs, Causality, and Structural Equation Models. Sociological Methods & Research
27, 226-284.
Pearl, J. (2012). The Causal Mediation Formula-A Guide to the Assessment of Pathways and
Mechanisms. Prevention science 13, 426-436.
Pentz, M. A., Dunton, G., Wolch, J., et al. (2010). Design and Methods of the Healthy Places Trial: A
Study of the Effects of Smart Growth Planning Principles on Family Obesity Prevention. Annals
of Behavioral Medicine 39, 42-42.
Pentz, M. A., Jerrett, M., Spruijt-Metz, D., Wolch, J., Valente, T., and Chou, C.-P. (Submitted). Smart
growth community planning and prevention of family obesity risk: Design and methods of the
Healthy PLACES Trial, 2012. Multivariate Behavioral Research.
Pinheiro, J., Bates, D., DebRoy, S., Sarkar, D., and R Development Core Team (2013). nlme: Linear and Nonlinear
Mixed Effects Models. R package version 3.1-111.
Powell, J. L. (1983). The Asymptotic Normality of Two-Stage Least Absolute Deviations Estimators.
Econometrica 51, 1569-1575.
Preacher, K. J., Zyphur, M. J., and Zhang, Z. (2010). A general multilevel SEM framework for assessing
multilevel mediation. Psychol Methods 15, 209-233.
Raudenbush, S. W., and Bryk, A. S. (2002). Hierarchical linear models : applications and data analysis
methods, 2nd edition. Thousand Oaks: Sage Publications.
Reiersøl, O. (1941). Confluence Analysis by Means of Lag Moments and Other Methods of Confluence
Analysis. Econometrica 9, 1-24.
Robins, J. M., and Greenland, S. (1992). Identifiability and exchangeability for direct and indirect effects.
Epidemiology 3, 143-155.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of
the American Statistical Association 100, 322-331.
Rundle, A., Neckerman, K. M., Freeman, L., et al. (2009). Neighborhood Food Environment and
Walkability Predict Obesity in New York City. Environmental Health Perspectives 117, 442-447.
Schembre, S. M., Wen, C. K., Davis, J. N., Shen, E., et al. (2013). Eating breakfast more frequently is
cross-sectionally associated with greater physical activity and lower levels of adiposity in
overweight Latina and African American girls. Am J Clin Nutr 98, 275-281.
Shen, E., Chou, C.-P., Pentz, M. A., and Berhane, K. (Submitted-a). Quantile Mediation Models:
Methods for Assessing Mediation Across the Outcome Distribution. Multivariate Behavioral
Research.
Shen, E., Chou, C.-P., Pentz, M. A., and Berhane, K. (Submitted-b). Bayesian Quantile Mediation
Models. Statistics in Medicine.
Shi, J.-Q., and Lee, S.-Y. (2000). Latent Variable Models with Mixed Continuous and Polytomous Data.
Journal of the Royal Statistical Society. Series B (Statistical Methodology) 62, 77-87.
Silverman, B. W. (1986). Density estimation for statistics and data analysis. London ; New York:
Chapman and Hall.
Sobel, M. E. (1982). Asymptotic Confidence Intervals for Indirect Effects in Structural Equation Models.
Sociological Methodology 13, 290-312.
Sobel, M. E. (1987). Direct and Indirect Effects in Linear Structural Equation Models. Sociological
Methods & Research 16, 155-176.
Song, X.-Y., and Lee, S.-Y. (2001). Bayesian estimation and test for factor analysis model with
continuous and polytomous data in several populations. British Journal of Mathematical and
Statistical Psychology 54, 237-263.
Song, X.-Y., Lee, S.-Y., and Hser, Y.-I. (2008). A two-level structural equation model approach for
analyzing multivariate longitudinal responses. Statistics in Medicine 27, 3017-3041.
Song, X.-Y., Lu, Z.-H., Hser, Y.-I., and Lee, S.-Y. (2011). A Bayesian Approach for Analyzing
Longitudinal Structural Equation Models. Structural Equation Modeling: A Multidisciplinary
Journal 18, 183-194.
Tian, M. Z., and Chen, G. M. (2006). Hierarchical linear regression models for conditional quantiles.
Science in China Series A: Mathematics 49, 1800-1815.
Tofighi, D. (2011). RMediation: An R package for mediation analysis confidence intervals. Behavior
Research Methods 43, 692-700.
Tukey, J. W. (1965). Which part of the sample contains the information? Proceedings of the National
Academy of Sciences of the United States of America 53, 127.
Van Dyck, D., Cerin, E., Cardon, G., et al. (2010). Physical activity as a mediator of the associations
between neighborhood walkability and adiposity in Belgian adults. Health & Place 16, 952-
960.
VanderWeele, T. J. (2009). Marginal structural models for the estimation of direct and indirect effects.
Epidemiology 20, 18-26.
VanderWeele, T. J., Valeri, L., and Ogburn, E. L. (2012). The role of measurement error and
misclassification in mediation analysis: mediation and measurement error. Epidemiology 23, 561-
564.
Wang, H. J., Zhu, Z. Y., and Zhou, J. H. (2009). Quantile Regression in Partially Linear Varying
Coefficient Models. Annals of Statistics 37, 3841-3866.
Wang, J. (2012). Bayesian quantile regression for parametric nonlinear mixed effects models. Statistical
Methods and Applications 21, 279-295.
Wei, Y., Pere, A., Koenker, R., and He, X. (2006). Quantile regression methods for reference growth
charts. Statistics in Medicine 25, 1369-1382.
White, J., and Jago, R. (2012). Prospective Associations Between Physical Activity and Obesity Among
Adolescent Girls: Racial Differences and Implications for Prevention. Archives of Pediatrics &
Adolescent Medicine 166, 522-527.
Wiley, D. E. (1973). The identification problem for structural equation models with unmeasured
variables. In Structural equation models in the social sciences, A. S. Goldberger and O. D.
Duncan (eds), 69-83. New York, NY: Seminar Press.
Wolch, J., Jerrett, M., Reynolds, K., et al. (2011). Childhood obesity and proximity to urban parks and
recreational resources: A longitudinal cohort study. Health & Place 17, 207-214.
Wright, S. (1920). Correlation and causation Part I. Method of path coefficients. Journal of Agricultural
Research 20, 557-585.
Yu, K. M., and Moyeed, R. A. (2001). Bayesian quantile regression. Statistics & Probability Letters 54,
437-447.
Yuan, Y., and MacKinnon, D. P. (2009). Bayesian mediation analysis. Psychological Methods 14, 301-
322.
Yuan, Y., and MacKinnon, D. P. (2013). Robust mediation analysis. Psychological Methods to appear.
Yuan, Y., and Yin, G. (2010). Bayesian quantile regression for longitudinal studies with nonignorable
missing data. Biometrics 66, 105-114.
Yue, Y. R., and Rue, H. (2011). Bayesian inference for additive mixed quantile regression models.
Computational Statistics & Data Analysis 55, 84-96.
Zeger, S. L., and Karim, M. R. (1991). Generalized Linear Models With Random Effects; A Gibbs
Sampling Approach. Journal of the American Statistical Association 86, 79-86.
List of Acronyms
ACME: Average Causal Mediation Effect (Imai et al., 2010)
ALD: Asymmetric Laplace Distribution
BK: Baron and Kenny
BMI: Body Mass Index
CFA: Confirmatory Factor Analysis
CSA: Covariance Structure Analysis
CV: Control Variate (Lee, 2007; Ma and Koenker, 2006)
DSQR: Dual-Stage Quantile Regression (Kim & Muller, 2004)
GIS: Geographic Information System
HPS: Healthy Places Study
LSQR: Least Squares Quantile Regression
MCMC: Markov Chain Monte Carlo
MVPA: Moderate-to-Vigorous Physical Activity
NEWS: Neighborhood Environment Walkability Scale (Adams et al., 2009)
PA: Physical Activity
QCME: Quantile Causal Mediation Effects (Imai et al., 2010)
SEM: Structural Equation Model
WC: Waist Circumference
Asset Metadata
Creator: Shen, Ernest (author)
Core Title: Quantile mediation models: methods for assessing mediation across the outcome distribution
Contributor: Electronically uploaded by the author (provenance)
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Biostatistics
Publication Date: 11/09/2015
Defense Date: 10/16/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: mediation analysis, multilevel models, OAI-PMH Harvest, quantile regression, structural equation models
Format: application/pdf (imt)
Language: English
Advisor: Berhane, Kiros T. (committee chair), Chou, Chih-Ping (committee chair), Conti, David V. (committee member), Pentz, Mary Ann (committee member), Wilcox, Rand R. (committee member)
Creator Email: theshenami@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-345038
Unique identifier: UC11295244
Identifier: etd-ShenErnest-2146.pdf (filename), usctheses-c3-345038 (legacy record id)
Legacy Identifier: etd-ShenErnest-2146.pdf
Dmrecord: 345038
Document Type: Dissertation
Rights: Shen, Ernest
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Abstract
Recent introductions of quantile regression methods to mediation analysis have focused primarily on multi-step methods, such as dual-stage quantile regression, or causal mediation analysis. However, the various limitations of these approaches suggest the need for more flexible methods of dealing with complex mediation models involving multiple mediators and outcomes, or latent variables. By combining methods for Bayesian mediation analysis with those of Bayesian quantile regression, mediation can be characterized for any quantile of the response distribution for a large class of mediation models that are not easily handled by the multi-step approaches. Bayesian estimation and inference techniques for quantile mediation are proposed, and compared with existing approaches to mediation analysis, through simulation studies and analyses of data from the Healthy Places Study. Existing methods for Bayesian mediation analysis are extended in this dissertation in the following important ways: 1) modeling of mediational relationships via multiple correlated mediators is allowed for arbitrary quantiles of the outcome distribution