The design, implementation, and evaluation of Accelerated Longitudinal Designs
Nicholas J. Jackson
A dissertation presented to
The Faculty of USC Graduate School
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy (PSYCHOLOGY)
University of Southern California
Los Angeles, California
August 2018
Table of Contents
Acknowledgements
Abstract
Chapter 1: Introduction
1.1 Research design and the intersection of development and social change
1.2 Research objective and aims
1.3 Chapter Summary
1.4 Chapter References
Chapter 2: Design, cost, and attrition parameters and metrics in accelerated longitudinal designs
2.1 Design Parameters and Performance Metrics of an ALD
2.2 ALD Cost Equations and Cost Performance Metrics
2.3 Incorporating Attrition into the Cost Equations
2.4 Chapter Summary
2.5 Chapter References
Chapter 3: Statistical power in accelerated designs with linear growth and no between-cohort differences
3.1 Conceptualizing Age-Period-Cohort effects
3.2 Simulating an ALD
3.3 Statistical power of ALDs with no cohort differences
3.4 Chapter Summary
3.5 Chapter References
Chapter 4: Statistical power in accelerated designs with nonlinear de-escalating exponential growth and no between-cohort differences
4.1 Simulating Nonlinear Effects
4.2 Conceptualizing Nonlinear Models
4.3 Statistical power of nonlinear ALDs without cohort differences
4.4 Chapter Summary
4.5 Chapter References
Chapter 5: Bias, Estimator Efficiency, and Coverage Probability in the absence of between-cohort differences
5.1 Conceptualizing bias, estimator efficiency, and coverage probability
5.2 Bias, Estimator Efficiency, and Coverage Probability in the ALD
5.3 Chapter Summary
5.4 Chapter References
Chapter 6: Statistical power in accelerated designs with linear growth and in the presence of between-cohort differences
6.1 Models for age-by-cohort interaction
6.2 A note on simulating period effects
6.3 Statistical power of ALDs to detect the population fixed effect slope
6.4 Statistical power of ALDs to detect between-cohort variability
6.5 Chapter Summary
6.6 Chapter References
Chapter 7: Bias, Estimator Efficiency, and Coverage Probability in the presence of between-cohort differences in linear accelerated designs
7.1 Considerations for bias, efficiency, and coverage in the presence of between-cohort differences
7.2 Bias, efficiency, and coverage of the population slope in the presence of between-cohort differences
7.3 Bias and efficiency of the estimates for between-cohort differences
7.4 Chapter Summary
7.5 Chapter References
Chapter 8: Age trajectories of marijuana and cigarette use in the National Longitudinal Youth Survey (NLSY) using the ALD mixed model
8.1 Introduction
8.2 Methods
8.3 Results
8.4 Discussion
8.5 Chapter References
8.6 Tables
8.7 Figures
Chapter 9: General Discussion
9.1 Chapter specific reflections, limitations, and future directions
9.2 General reflections, limitations, and future directions
9.3 Conclusion
9.4 Chapter References
Appendix A: Acronyms and Abbreviations
Appendix B: Glossary of Terms
Technical Appendix: Statistical Programs
aldesign
aldcost
aldsim
Acknowledgements
There are many mentors and collaborators that have contributed to my education in
research and statistics over the years. My colleagues and friends at the University of Arizona
were responsible for my introduction to the research world where we spent many a late night
running in-home sleep studies. I owe a debt of gratitude to Iris, Amy, and Mikel for these
experiences which fostered a love for research and statistics and sent me down the path I find
myself on. My work at U of A helped me obtain a research position in the Division of Sleep
Medicine at the University of Pennsylvania where I would eventually become a biostatistician.
The researchers I worked with at Penn taught me to appreciate good statistical communication
and would constantly push the boundaries of my knowledge. The Penn Sleep Center will always
retain a special place in my heart as it is where I first considered myself a ‘real researcher’ and
more importantly where I would meet my future wife. With encouragement from my friends at
Penn I applied to doctoral programs and ended up at the University of Southern California. At
USC I’ve shared many coffees, beers, and conversations that have helped stimulate many ideas
and led to productive collaborations. More importantly, the friendships I developed from this
graduate program have provided much needed distraction and camaraderie that continues to
enrich my life. In my time away from USC I was fortunate to work with some of the brightest
and kindest statisticians at UCLA’s Department of Medicine Statistics Core. The commitment
these statisticians have towards improving upon research methodology has been an inspiration
for me and I will be forever grateful for their continued mentoring and encouragement of my
academic endeavors. Lastly, I wish to thank my parents, partner, and friends who have supported
and endured during these many long years of educational pursuits.
In thinking about the accelerated design and cultural influences in estimates of
development, I can’t help but reflect on the times we live in. As the accelerated design captures
unknown sources of cultural change I’m reminded of that Buffalo Springfield song: “There’s
something happening here … what it is ain’t exactly clear”. This sentiment perfectly sums up the
palpable change which we can’t quite identify but which seems to be impacting the world we
experience. My hopes are that the accelerated design will add clarity to what is happening here.
Abstract
Longitudinal designs are the gold standard for researchers studying within-subject
changes in age-related development. These designs are typically conducted using a single cohort
followed for a fixed period of time. However, single-cohort designs often necessitate a lengthy
time commitment from participants, sponsors, and researchers, which makes them vulnerable to
greater attrition and even premature termination. The time commitment for these designs also
means that the results may be obsolete by the time they are published, particularly if the
outcomes under study are sensitive to generational differences. Bell (1953) proposed the use of
an Accelerated Longitudinal Design (ALD) as a means to generate age-based trajectories over a
shortened duration to combat these issues. In the ALD multiple birth-cohorts are studied
simultaneously in a longitudinal fashion with overlap in the age distributions between the
cohorts. In this manner the same age span may be studied while reducing the number of
measurements per participant, the study duration, and study costs. These designs also allow for
the modeling of between-cohort differences, which are important for researchers interested in
developing age-based trajectories that generalize to multiple cohorts. While models that
incorporate cultural influence are increasingly relevant, there has not yet been widespread
adoption of these designs. Part of the hesitancy to use ALDs stems from their unfamiliarity, as
few methodological papers have demonstrated the efficacy of these designs for studying
development. We propose the use of cost equations to utilize the cost-savings of the ALD to
determine sample sizes that are of equal cost to a single-cohort design. The use of an equal cost
sample size allows for ALDs to have N’s that are 10-85% larger than in the single-cohort design,
thereby offsetting the potential loss of power in the ALD. We subsequently utilize Monte Carlo
simulation methods to demonstrate that the statistical power and bias of the ALD are comparable
to those of the single-cohort design for both linear and nonlinear models, and we discuss
considerations for when between-cohort differences in development are present. Lastly, we use
data from the National Longitudinal Survey of Youth (NLSY 1997) to demonstrate the ability of
an ALD to capture both within-person and between-cohort variability in marijuana and tobacco
use from the ages of 12 to 32. We additionally discuss considerations for the modeling of cohort
membership and alternate strategies for cohort inclusion. Results from the simulations and the
NLSY analysis suggest that ALDs should be the preferred longitudinal design for researchers studying
age-related development.
Chapter 1
Introduction
In the study of age-related (maturational) development, of primary concern is the ability
to estimate age-based trajectories in order to understand how traits or measures change across the
age distribution. Developmental models have broad applicability across the biomedical and
social sciences and have been used to understand how it is that we change as we age. For
example, in the biomedical sciences, developmental approaches are used to understand how bone
density changes across the life-span or how hormone levels change in a population. Similarly, in
the social sciences, developmental models have been used to understand when cognitive decline
begins to occur as well as to understand at what ages adolescents are most at risk for substance
use. How we design research studies and implement models to estimate these trajectories can
have a large impact on the interpretation of our findings. While developmental researchers are
primarily interested in maturational change, the traditional research designs used to study this
change make assumptions about the nature of development, often ignoring or confounding the
influence of generational change. While ignoring generational change can be appropriate for
outcomes that are developmentally homogeneous across long periods of time; for knowledge or
behavior that may be influenced by a rapidly changing culture, alternatives to the traditional
methods are needed.
1.1 Research design and the intersection of development and social change
1.1.1 How social change can influence development
One such example of the intersection of development and cultural influences is from the
cognitive sciences. Early researchers of cognition noted that older participants had poorer
cognitive ability when compared to younger participants; however, it was unclear if these changes
between age groups were due to normative maturational processes or were instead reflective of
generational differences (Kuhlen, 1940). Researchers postulated that perhaps the differing
social conditions between the generations were masquerading as maturational changes in
cognitive ability. In order to understand how this could be, we'll examine how technology and
culture impacted generational differences in the age distribution for cognition.
As the world turned into the 20th century, novel technological innovations promoted
economic prosperity, resulting in profound shifts in societal structure. Prior to 1850,
American society was hyper-local (Hall, 1984). Most Americans lived in rural towns of less than
2,500 people. Their work and leisure would occur within the confines of these small
communities where many would spend their entire lives. Rural living necessitated that a large portion
of the day be devoted to food production; as a result, most Americans worked in the
agricultural sector, often on small farms. With so much effort devoted to food production, there
was little time for leisurely or intellectual pursuits. Subsequently the population was uneducated
relative to today’s standards, as children were expected to work on farms rather than attend
school. As the effects of the first industrial revolution continued into a second industrial
revolution, Americans saw a restructuring of their economic life. Customers and merchants could
buy and sell goods from communities at a great distance thanks to rail-roads. The connection of
these communities through transportation systems allowed local economies to become more
hierarchically organized and develop regional specialization such that one area of the country
could provide goods for consumption in another area. This economic re-structuring, in concert
with technological advancements in farming that increased production levels, made it possible
for Americans to decrease their efforts in the pursuit of feeding themselves. Americans, now
with greater mobility, moved to larger population centers to find work in factories. From the
years 1890 to 1910, the percentage of Americans working in agriculture decreased (by ~11%) as
more people went to work in the industrial hotbeds of the time (Lebergott, 1966). The increased
productivity output brought on by technological advancements, as well as the ability of
manufacturers to access a large and interconnected market between states without tariffs, allowed
for a rapid increase in economic output which increased the value of labor and the standard of
living (Kirkland, 1961). Major increases in life expectancy and health occurred during this time,
with the average life expectancy for Whites jumping from 42 to 52 years over this same time
period (Arias, 2011; Hacker, 2010). Similarly, this period saw major declines in infant mortality.
While the changing economic situation bettered the lives for most Americans, children still
remained a large part of the work force. As the type of work children performed shifted from
agricultural to industrial, the increased dangers of the work environment for children weighed on
the nation's conscience. Though increased living standards afforded more parents the ability to
send their children to school, in 1890 roughly 32% of white male children (under 15 years old) were still
involved in the labor force (Carter & Sutch, 1996). The combination of rising living standards
and a changing economy necessitating a more educated work force would help shift cultural
values on the importance of compulsory education for children. By 1918, all states required
children to graduate elementary school (Katz, 1976). As moral outrage over harsh working
conditions for children grew, the US saw an increase in laws aimed at curbing child labor. With
the stock market crash of 1929 and subsequent economic depression, the public’s support to keep
children out of the labor force (i.e., to prevent job competition with adults) peaked, resulting in a
major drop in child labor by 1930 (fewer than 6% of white males under 15 years old). By the 1950's the majority
of children in the United States would attend high school (Snyder, 1993).
Given the profound generational differences in the opportunities for education and even
in the length and quality of schooling, it was not surprising that, as a result of these societal
changes, individuals from earlier birth cohorts (i.e., older participants) scored lower on cognitive
tests. The creation of compulsory education laws and the elimination of children from the
workforce are credited with increasing the cognitive abilities of the population. However, as
evidenced above, the driving forces behind these societal changes were multifaceted and
complex. Moreover, the technological and social changes throughout this time-period created
major shifts in the age trajectories of diseases, such as tuberculosis, which were first noted by
W.H. Frost in 1939. Frost found that those from younger cohorts (e.g. born 1910) consistently
showed lower levels of mortality at all ages compared to older cohorts born in the mid to late
1800’s. These declines were thought to arise from the general improvement in sanitation that
accompanied better living standards, as well as the increased efforts of public health officials over
the recent decades. The changes to cognition and tuberculosis mortality both demonstrate how
changes to technology and social structure are interrelated and can lead to profound generational
differences in social, behavioral, and biomedical outcomes.
1.1.2 Research design and social change
The idea that “...transformations of the social world modify people of different ages in
different way[s]...” is not new (Ryder, 1965). The influence of societal change on the population
distributions of physical and psychological measures are well-documented and have been shown
to play a major role in the development of personality (Baltes & Nesselroade, 1972), intelligence
(Schaie, Willis, & Pennak, 2005), and substance use (O'Malley, Bachman, & Johnston, 1984).
Studies of childhood and adolescent development are particularly sensitive to generational
differences (Carlsson & Karlsson, 1970; Baltes & Nesselroade, 1972), as these are years of rapid
maturational growth when environmental influences can have a great effect in shaping adult
behavior patterns.
Part of the error made by early researchers of cognitive development was to infer intra-
individual change from a research design of inter-individual differences. The cross-sectional
design employed by researchers consisted of collecting data on individuals of various ages at the
same time and generating a developmental curve by linking the age groups. Cross-sectional
designs are advantageous in that they are generally cheaper and faster to conduct relative to other
designs; however, when studying development, the cross-sectional approach confounds the effect
of age with birth cohort differences resulting in the inability to distinguish maturational from
generational changes.
It is for this reason that much of the developmental literature relies upon the use of single-cohort
longitudinal designs, where researchers follow the same cohort of subjects over time,
thereby disentangling the confounding of cohort and age. However, these designs preclude the
estimation of cohort influences, which may be particularly important when studying
developmental phenomena that are subject to changes in culture. Moreover, estimates from a
single cohort are unlikely to generalize to the overall developmental phenomenon unless the
outcome is invariant to cultural influences.
Both cross-sectional and longitudinal studies of development can be considered special
cases of the Age-Period-Cohort model. Proposed by Schaie (1965) as a general developmental
model, this framework identifies the effects of 1) age, how an outcome
changes across the life-span; 2) period, change due to measurement at a particular time; and 3)
cohort, changes due to generational differences. Period effects can also be considered as changes
in environment or cultural context; likewise, cohort differences can be considered differences in
cumulative life experience prior to measurement (Schaie, 1965). Both may reflect cultural shifts
that are either momentary (period) or long-lasting (cohort). The traditional developmental
designs do not allow for disentangling these effects as the cross-sectional design confounds age
and cohort and the single-cohort longitudinal design ignores cohort influences and confounds age
and period effects. Given that many medical and psychological outcomes have been shown to be
greatly influenced by cultural changes, it is no wonder that we find such discrepant results
between cross-sectional and single-cohort longitudinal studies (Baltes, 1968).
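The identity Age = Period - Cohort makes the confounding described above exact rather than approximate. The following sketch (an illustrative Python aside, not part of Schaie's formulation) shows how fixing one of the three dimensions forces the other two to move in lockstep:

```python
# Age = Period - Cohort is an exact linear dependency, so no single design can
# estimate all three effects without additional constraints.

# Cross-sectional design: one period, so age and cohort move in lockstep.
period = 2000
cross_sectional = [(period - c, period, c) for c in (1980, 1985, 1990)]
# With period fixed, every unit increase in age is a unit decrease in birth year.

# Single-cohort longitudinal design: one cohort, so age and period move together.
cohort = 1980
single_cohort = [(p - cohort, p, cohort) for p in (1990, 1995, 2000)]
# With cohort fixed, every unit increase in age is a unit increase in period.

print(cross_sectional)
print(single_cohort)
```

In the first list, age differences between participants are indistinguishable from cohort differences; in the second, maturational change is indistinguishable from period change, which is exactly the confounding described above.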
1.1.3 Accelerated longitudinal designs
In order to overcome the disadvantages of these designs, Richard Bell (1953; 1954)
proposed the Accelerated Longitudinal Design (ALD) which makes use of multiple cohorts
studied simultaneously in a longitudinal fashion. Figure 1.1 provides an example of this design
where 3 cohorts of children (born in 1980, 1978, and 1976) were examined over 6 years (4
measurement occasions) between the ages of 10 and 20. The measurement of multiple cohorts in
the same time-period, longitudinally, with overlap in the ages of study between cohorts, allows
for the determination of cohort influences on age trajectories (O'Brien, 2014; Yang & Land,
2006, 2016).
Figure 1.1: Example of an ALD
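The layout in Figure 1.1 can be written out as an age-by-occasion grid. The short Python sketch below reproduces it; the two-year occasion spacing follows from four occasions spanning six years, and the assumed start year (1990, when the youngest cohort turns 10) is chosen so the cohorts jointly cover ages 10 to 20. (The dissertation's own design tools are Stata programs, introduced later as aldesign; this is only an illustrative sketch.)

```python
# Age-by-occasion grid for the ALD in Figure 1.1: three birth cohorts
# (1980, 1978, 1976) measured at four occasions spaced two years apart.
cohorts = [1980, 1978, 1976]
occasions = [1990, 1992, 1994, 1996]  # assumed: first occasion when the 1980 cohort turns 10

# Each cohort's age at each occasion is simply occasion year minus birth year.
grid = {c: [year - c for year in occasions] for c in cohorts}
for c, ages in grid.items():
    print(f"cohort {c}: ages {ages}")

# The union of observed ages spans the full 10-20 range even though no
# single cohort is followed for more than six years.
all_ages = sorted({a for ages in grid.values() for a in ages})
print("ages covered:", all_ages)
```

Note the overlap: each adjacent pair of cohorts shares three of its four measured ages, and it is this overlap that lets the cohorts' segments be linked into a single age trajectory.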
This design has sometimes gone under the moniker of the Cross-Sequential (Farrington, 1991) or
Cohort-Sequential (Nesselroade & Baltes, 1979) design. Though Baltes and Nesselroade (1974,
1984) pointed out that developmental change is influenced by cultural movement and proposed
the regular use of ALDs in order to capture these effects, few studies have employed this design
despite the deficiencies of traditional designs in capturing cohort effects.
1.1.4 Advantages of ALDs
In addition to allowing for estimation of cohort effects, ALDs can also be used to provide
estimates of a generalized developmental trajectory by aggregating the maturational effects
across multiple cohorts. While trajectories from a single-cohort longitudinal design may offer the
best estimates for that particular cohort, the likelihood of that trajectory being applicable to other
cohorts is low if the trajectory is influenced by cultural shifts. With ALDs, the average estimate
across cohorts can be used to provide a developmental trajectory that generalizes to all of the
cohorts under study while also providing the ability to estimate cohort specific trajectories.
Naturally the use of the ALD to create a generalized trajectory is dependent on the nature of the
cohort differences. Outcomes that experience unidirectional cohort effects such as in our
cognition example (e.g. cognition improving with each successive generation) are more likely to
be applicable to a generalized trajectory than those where cohort differences are multidirectional
(i.e., sometimes advantageous, sometimes deleterious). Nevertheless, the identification of
developmental trajectories that are applicable to multiple cohorts in the population is an
obvious advantage over estimates that are applicable to only a single cohort.
Another advantage of the ALD is the ability to conduct a study of age-related
development over a substantially shortened time period. For example, a ten-year single-cohort
longitudinal study will take ten years for data collection; the implications for theory or policy
may be out of date by the time the results are published. Moreover, such an
undertaking requires ten years of continued funding as well as a long-term commitment by the
researchers to a question that may or may not prove fruitful. There are additional issues of
attrition and the effects of repeated testing which may bias the results from these single cohort
studies. In an ALD, the same age range of interest (e.g. age 10 to 20 years) can be studied over a
shorter period of time through planned missingness at the earlier periods for older cohorts and
later periods for younger cohorts (See Figure 1.1, Panel B). In addition to reducing the overall
time commitment for the study, the rates of attrition would decrease by having less follow-up.
The overall costs for the study would also be reduced by having fewer measurements per subject
as well as reducing staff and institutional (indirect) costs by decreasing the study duration.
1.1.5 ALDs: Designs for a rapidly changing culture?
Accelerated longitudinal designs may be particularly relevant for those studying
culturally influenced outcomes now. Though the technological innovations of the early 20th
century that led to increases in cognition and better health were mechanical, the early 21st
century has given rise to digital technologies that are influencing how we work, learn, and spend
our leisure (Harper & Leicht, 2015). These technologies have changed how we interact with each
other in social and economic transactions and are transforming the economic structure of our
society with long term impacts that we are still attempting to understand (Harper & Leicht,
2015). In this world of rapid cultural change, the ALD may have increased relevancy for various
fields of study interested in age-related change.
While models that incorporate unmeasured cultural influence are increasingly relevant,
there has not been widespread use of the accelerated longitudinal design. Part of the hesitancy to
adopt these designs may be due to unfamiliarity with them, as few substantive studies have
employed ALDs and even fewer methodological papers have demonstrated the efficacy of these
designs in studying development. Indeed, articles describing the sample size, power, and
cost considerations of these designs are virtually non-existent, with the recent exceptions of
Miyazaki and Raudenbush (2000), Moerbeek (2011), and Galbraith, Bowden, and Mander
(2017). Even with recent interest in these designs, none have shown how an ALD relates to the
single-cohort longitudinal design in the presence of shifting cultural influence (e.g. cohort/period
differences) nor have these designs been thoroughly examined in the context of nonlinear change
or attrition.
1.2 Research objective and aims
The objectives of this dissertation were to increase awareness of the cost, design,
simulation, and analytic considerations for accelerated longitudinal designs. In order to address
these goals, the following aims were proposed.
1.2.1 Aims
Aim 1: Investigate the role of design choices for the ALD and how these alter cost, power, and
sample size considerations relative to a single-cohort longitudinal design in the absence
of cohort effects
1.1. Develop language for and metrics of the design elements of the ALD.
1.2. Develop a cost model to allow for the cost comparison of an ALD to a single-cohort
longitudinal design.
1.3. Develop methods for simulating and analyzing data from an ALD.
1.4. Investigate how the design elements (e.g. cohorts, periods, cohort spacing, and period
spacing) impact the power of an ALD as well as the bias and efficiency of the slope
estimate.
Aim 2: Examine the estimates of age-related growth in the ALD in the presence of cohort
heterogeneity.
2.1 Develop methods for introducing and evaluating the amount of between-cohort
variation in intercepts and slopes.
2.2 Evaluate the impact of between-cohort variance on the power to detect the global
estimates of growth and between-cohort differences.
2.3 Determine the consequences of model misspecification when between-cohort
differences are present but not appropriately modeled.
2.4 Examine metrics of bias and efficiency when between-cohort variance is introduced.
Aim 3: Evaluate an ALD using real data on age trajectories of marijuana and tobacco use in the
National Longitudinal Survey of Youth (NLSY).
3.1 Describe estimates for age, period, and cohort effects of tobacco and marijuana use.
3.2 Demonstrate how the modeling of cohort influences can improve age trajectory
estimation in an ALD.
3.3 Examine how alternate specifications of cohort membership can impact ALD
interpretation.
1.2.2 Organization of the dissertation
Each of the above aims will be examined over the course of the subsequent seven
chapters (Chapters 2 thru 8) followed by a general discussion of chapter specific and overall
considerations, limitations, and future directions in Chapter 9. Each chapter is part
methodological (explaining definitions, equations, and methods to be used in the
chapter), part results (showing findings using figures and tables), and part discussion
(interpreting these findings in the context of the literature). Often the results and discussion
elements are interwoven and occur as they might in a book rather than a journal article. To this
end, tables and figures appear embedded in the text to allow for easy interpretation in proximity
to the text that describes them. At the end of each chapter a summary is presented which
highlights some of the main findings. The exception to this structure is Chapter 8, which is
written as a more traditional research paper with separate results and discussion as well as with
tables and figures appearing at the end of the manuscript.
Aim 1 is covered by chapters 2 thru 5 and will seek to understand the influence of design
choices in the absence of cohort effects using simulated data. This will give the reader a firm
grasp of the various design elements (Chapter 2) and how trade-offs in study duration,
measurement intervals, number of cohorts, and overlap between successive cohorts can alter the
cost (Chapter 2) and power of linear (Chapter 3) and nonlinear (Chapter 4) ALDs. The chapters
for Aim 1 will introduce the language of these design parameters that will be used throughout
this dissertation as well as provide a look at the performance of ALDs under these optimal
circumstances. Investigating the roles of the ALD in the absence of between-cohort differences is
important because those using single-cohort designs to study development are often assuming
that their estimates will generalize to nearby cohorts. By demonstrating that the ALD can be
equally powerful in capturing the age trajectories we can show that the ALD is the most sensible
design choice for researchers who would ordinarily use a single-cohort design. Additional
metrics such as bias and efficiency of the slope parameter are also assessed (Chapter 5). Aim 1
also introduces the concept of the 'equivalent cost sample size' as well as provides an overview
for the methods of simulating ALDs. Statistical programs written in Stata version 15 (College
Station, TX) to aid in the design and visualization (aldesign; Jackson, 2017a), cost calculation
(aldcost; Jackson, 2017b), and simulation (aldsim; Jackson, 2017c) of ALDs are also
presented.
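The 'equivalent cost sample size' idea can be illustrated with a deliberately simplified cost model: a fixed recruitment cost per subject plus a constant cost per measurement. Both the parameter values and the formula below are illustrative assumptions, not the dissertation's actual cost equations (those are developed in Chapter 2 and implemented in the Stata program aldcost):

```python
# Illustrative only: a toy cost model, not the dissertation's cost equations.
# Assume each subject costs `recruit` to enroll and `per_meas` per measurement;
# both values here are hypothetical.
def equal_cost_n(n_single, occ_single, occ_ald, recruit=500.0, per_meas=50.0):
    """Largest ALD sample size affordable on the single-cohort design's budget."""
    budget = n_single * (recruit + per_meas * occ_single)
    per_subject_ald = recruit + per_meas * occ_ald
    return int(budget // per_subject_ald)

# Single-cohort design: 200 subjects, 11 annual measurements (ages 10-20).
# ALD: 4 measurements per subject, as in Figure 1.1.
n_ald = equal_cost_n(200, 11, 4)
print(n_ald)  # 300: a 50% larger sample at equal cost
```

With these hypothetical costs, the ALD enrolls 50% more subjects for the same budget; the extra subjects are what offset the power lost by taking fewer measurements per person.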
Aim 2 (Chapters 6 & 7) seeks to understand how heterogeneous cohort effects can be
captured by an ALD, and the implications that between-cohort variance has for measures of
statistical power (Chapter 6) and bias (Chapter 7) of both the aggregate slope estimate and
between-cohort variance estimate. The chapters arising from Aim 2 use simulated data to
introduce the methods for incorporating and analyzing between-cohort differences and discuss
their implications for the aggregate estimates of the intercepts and slopes. We also briefly
discuss the generalizability of these aggregate estimates in the presence of cohort differences. Special attention
is paid to the consequences of choosing to simulate cohort differences based on the use of effect
sizes that either vary or are constant between cohorts. Additional examinations explore the
consequences of model misspecification when between-cohort differences are present in the data
but excluded from the model.
Lastly, Aim 3 (Chapter 8) will demonstrate the analysis of an ALD using real data. In this
penultimate chapter, data from the National Longitudinal Survey of Youth (NLSY) was used to
examine the developmental trajectories of past year and past month marijuana and cigarette use
for 6,800 participants followed annually from the ages of 12-17 in 1997 until 27-32 in 2011.
Although the NLSY is not often thought of as an ALD, the data structure of multiple birth
cohorts followed longitudinally allows for examination of the data as an ALD. The analytic
methods established in the prior chapters were used to assess the age and cohort effects on
substance use. Two different cohort structures were used, school grade cohort and birth cohort,
in order to examine how cohort choice altered the interpretation of the findings. Unlike in
previous chapters, historical period effects were also modeled to demonstrate the full
specification of the ALD mixed model.
The final chapter of the dissertation, Chapter 9, summarizes the key findings from each
chapter. Chapter specific limitations and future directions are discussed as well as more general
considerations for the design of ALDs that were not addressed in this manuscript.
1.3 Chapter Summary
Accelerated longitudinal designs present an opportunity for incorporating unmeasured
generational differences into statistical models for studying longitudinal development. In
addition to describing between-cohort differences, these designs allow for shortened study durations and lower levels of attrition, which increase the relevance and reduce the bias of research findings. In the subsequent chapters we will add to the body of knowledge on these
designs through investigations of the design parameters, cost considerations, statistical power,
and parameter bias. Monte Carlo simulation methods will be used to compare the performance of
the ALD to single-cohort longitudinal designs as a means to demonstrate their efficacy in
replacing traditional developmental methods, particularly in small samples. Real data will also be
used to demonstrate the application of the accelerated design in modeling between-cohort
variability in substance use trajectories.
1.4 Chapter References
1. Arias, E. (2011). National vital statistics reports Volume 64, Number 11. National Center for
Health Statistics, 64(11), 52.
2. Baltes, P. B. (1968). Longitudinal and cross-sectional sequences in the study of age and
generation effects. Human Development, 11(3), 145-171.
3. Baltes, P. B., & Nesselroade, J. R. (1972). Cultural change and adolescent personality
development: An application of longitudinal sequences. Developmental Psychology, 7(3),
244.
4. Baltes, P. B., & Nesselroade, J. R. (Eds.). (1979). Longitudinal research in the study of
behavior and development. Academic Press.
5. Baltes, P. B., & Nesselroade, J. R. (1984). Paradigm lost and paradigm regained: Critique of
Dannefer's portrayal of life-span developmental psychology. American Sociological Review,
49(6), 841-847.
6. Bell, R. Q. (1953). Convergence: An accelerated longitudinal approach. Child Development,
145-152.
7. Bell, R. Q. (1954). An experimental test of the accelerated longitudinal approach. Child
Development, 281-286.
8. Carlsson, G., & Karlsson, K. (1970). Age, cohorts and the generation of
generations. American Sociological Review, 710-718.
9. Carter, S. B., & Sutch, R. (1996). Fixing the facts: editing of the 1880 US Census of
occupations with implications for long-term labor-force trends and the sociology of official
statistics. Historical Methods: A Journal of Quantitative and Interdisciplinary History, 29(1),
5-24.
10. Farrington, D. P. (1991). Longitudinal research strategies: Advantages, problems, and
prospects. Journal of the American Academy of Child & Adolescent Psychiatry, 30(3), 369-
374.
11. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
12. Hacker, J. D. (2010). Decennial life tables for the white population of the United States,
1790–1900. Historical Methods, 43(2), 45-79.
13. Hall, P. D. (1984). The organization of American culture, 1700-1900: Private institutions,
elites, and the origins of American nationality. NYU Press.
14. Harper, C. L., & Leicht, K. T. (2015). Exploring social change: America and the world.
Routledge.
15. Jackson, N.J. (2017a). ALDESIGN: Stata program for the design of accelerated longitudinal
designs. Stata Version 15.0. revised 09.04.2017.
16. Jackson, N.J. (2017b). ALDCOST: Stata program for the cost computations of accelerated
longitudinal designs. Stata Version 15.0. revised 09.05.2017.
17. Jackson, N.J. (2017c). ALDSIM: Stata program for the simulation of accelerated longitudinal
designs. Stata Version 15.0. revised 09.07.2017.
18. Katz, M. S. (1976). A history of compulsory education laws. Bloomington, IN: Phi Delta
Kappa Educational Foundation.
19. Kirkland, E. C. (1961). Industry comes of age: Business, labor, and public policy, 1860-
1897 (Vol. 6). Holt, Rinehart and Winston.
20. Kuhlen, R. G. (1940). Social change: a neglected factor in psychological studies of the life
span. School & Society.
21. Lebergott, S. (1966). Labor force and employment, 1800–1960. In Output, employment, and
productivity in the United States after 1800 (pp. 117-204). NBER.
22. Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an
accelerated longitudinal design. Psychological Methods, 5(1), 44.
23. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal designs.
Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
24. Nesselroade, J. R., & Baltes, P. B. (1974). Adolescent personality development and historical
change: 1970-1972. Monographs of the Society for Research in Child Development, 1-80.
25. Nesselroade, J. R., & Baltes, P. B. (1984). Sequential strategies and the role of cohort effects
in behavioral development: Adolescent personality (1970–1972) as a sample case. Handbook
of Longitudinal Research, 1, 55-87.
26. O'Brien, R. (2014). Age-period-cohort models: Approaches and analyses with aggregate
data. CRC Press.
27. O'Malley, P. M., Bachman, J. G., & Johnston, L. D. (1984). Period, age, and cohort effects
on substance use among American youth, 1976-82. American Journal of Public
Health, 74(7), 682-688.
28. Ryder, N. B. (1965). The cohort as a concept in the study of social change. In Cohort
analysis in social research (pp. 9-44). Springer, New York, NY.
29. Schaie, K. W. (1965). A general model for the study of developmental
problems. Psychological Bulletin, 64(2), 92.
30. Schaie, K. W., Willis, S. L., & Pennak, S. (2005). An historical framework for cohort
differences in intelligence. Research in Human Development, 2(1-2), 43-67.
31. Sherman, L. E., Payton, A. A., Hernandez, L. M., Greenfield, P. M., & Dapretto, M. (2016).
The power of the like in adolescence: effects of peer influence on neural and behavioral
responses to social media. Psychological Science, 27(7), 1027-1035.
32. Snyder, T. D. (Ed.). (1993). 120 years of American education: A statistical portrait. DIANE
Publishing.
33. Stewart, J. S., Oliver, E. G., Cravens, K. S., & Oishi, S. (2017). Managing millennials:
Embracing generational differences. Business Horizons, 60(1), 45-54.
34. Yang, Y., & Land, K. C. (2006). A mixed models approach to the age-period-cohort analysis
of repeated cross-section surveys, with an application to data on trends in verbal test
scores. Sociological Methodology, 36(1), 75-97.
35. Yang, Y., & Land, K. C. (2016). Age-period-cohort analysis: New models, methods, and
empirical applications. Chapman and Hall/CRC.
Chapter 2
Design, cost, and attrition parameters and metrics in accelerated longitudinal designs
The study of age-related development is of primary concern across fields in the social and
behavioral sciences. Understanding when behaviors or traits emerge and subside across the life-
span is important for the development of programs designed to screen for, prevent, or intervene
on targeted outcomes. In psychology, age-related development has been used to study changes in
personality, intellect, and antisocial behavior. However, traditional research designs make
assumptions about the nature of this development, often ignoring or confounding the influence of
generational change. While ignoring generational change can be appropriate for outcomes that are developmentally homogeneous across long periods of time, for knowledge or behavior that may be influenced by a rapidly changing culture, alternatives to the traditional methods are
needed. For the social sciences, the notion that “...transformations of the social world modify
people of different ages in different way[s]...” is not new (Ryder, 1965). Generational differences
have been shown to play a major role in the development of personality (Baltes & Nesselroade,
1972), intelligence (Schaie, Willis, & Pennak, 2005), and substance use (O'Malley, Bachman, &
Johnston, 1984). Studies of childhood and adolescent development are particularly sensitive to
generational differences (Carlsson & Karlsson, 1970; Baltes & Nesselroade, 1972), as these are
years of rapid maturational growth when environmental influences can have a great effect in
shaping adult behavior patterns. Most importantly, results from traditional research designs may
be obsolete by the time they are published due to cultural shifts. As such, alternate methods are
needed that are sensitive to cultural shifts and can be conducted over shortened time spans. One such approach is the Accelerated Longitudinal Design (ALD). In order to understand ALDs, we must first have a clear understanding of the components of an ALD and the consequences of varying its parameters when considering an accelerated design. This chapter
establishes the language by which we can discuss the design of the ALD as well as proposes a
cost equation for use when comparing the ALD to the traditional single-cohort longitudinal
design.
2.1 Design Parameters and Performance Metrics of an ALD
The key feature of an ALD is its ability to cover the same age span as a traditional single-cohort longitudinal design but with fewer measurement occasions per subject. This is accomplished
through the addition of multiple cohorts, overlapping in ages, each with shortened follow-up
times that allow for the developmental trajectory to be created based on the average within-
person pattern across cohorts. The design parameters of an ALD consist of the number of cohorts
(Cn) and number of subjects per cohort (Nc), the cohort interval spacing or age difference between cohorts (Cs), the number of periods (Pn), and the period interval spacing or frequency of measurement (Ps). Each element plays a unique role in the design of the ALD, which is graphically
illustrated below. Figure 2.1 shows a design covering the age span of 10 to 20 years using 3
cohorts (Cn) separated by 2 years (Cs) each with 4 periods of measurement (Pn) spaced 2 years
apart (Ps) with the first measurement occurring in the year 2000.
Figure 2.1: Example of ALD Design
Of course, this example represents a balanced ALD where there are equal numbers of periods in
each cohort and the intervals of measurement are equivalent between cohorts. While the ALDs
discussed in this dissertation will be ‘balanced’ in this manner, alternate formulations of the
ALD exist which might be more beneficial depending on the research question.
2.1.1 The Number of Cohorts
The primary component that distinguishes the ALD from a traditional longitudinal design
is the addition of multiple cohorts to the design. The number of cohorts (Cn) plays an integral
role in both extending the age span under investigation as well as allowing for the capturing of
between-cohort differences. For competing designs where only the number of cohorts varies, it
can be seen that each additional cohort extends the amount of developmental time under investigation, known as coverage, as a function of the cohort spacing (Cs): by adding cohorts alone, the coverage can be increased by Cs*(Cn-1). This increase in coverage can be
easily seen in Figure 2.2 below (panel A), yet notably the duration of the study or length is not
affected (panel B) because the same number of periods are being measured regardless of the
number of cohorts.
Figure 2.2: Increasing the number of cohorts increases coverage but not length
As the coverage increases and the length of the study remains the same, the efficiency of the
design is said to increase. Design efficiency in this context is referring to the proportion of the
age-span (i.e. coverage) that can be extended beyond that allowed by the study duration (i.e.
length) alone and is a primary metric for describing the performance of an ALD. The formulas
for coverage, length, and design efficiency are provided below in equations 1 through 3.
coverage = Cs*(Cn-1) + Ps*(Pn-1) eq.1
length = Ps*(Pn-1) eq.2
design efficiency = 1 - length / coverage , range [0,1) eq.3
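Equations 1 through 3 can be translated into a short illustrative sketch, written here in Python rather than the Stata programs used in this dissertation (the function names are my own), and evaluated for the Figure 2.1 design:

```python
def coverage(Cn, Cs, Pn, Ps):
    """Age span covered by a balanced ALD (eq. 1)."""
    return Cs * (Cn - 1) + Ps * (Pn - 1)

def length(Pn, Ps):
    """Study duration in years (eq. 2)."""
    return Ps * (Pn - 1)

def design_efficiency(Cn, Cs, Pn, Ps):
    """Proportion of the coverage gained beyond the study length (eq. 3)."""
    return 1 - length(Pn, Ps) / coverage(Cn, Cs, Pn, Ps)

# Figure 2.1 design: 3 cohorts spaced 2 years apart, 4 periods every 2 years
print(coverage(Cn=3, Cs=2, Pn=4, Ps=2))           # 10 years of development
print(length(Pn=4, Ps=2))                         # 6-year study duration
print(design_efficiency(Cn=3, Cs=2, Pn=4, Ps=2))  # 0.4
```

As expected, the Figure 2.1 design covers the ten years from age 10 to 20 in a six-year study, for a design efficiency of 40%.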
While these formulas are specific to the balanced ALDs presented here, alternate specifications
could be created to allow for an unequal number of measurements or spacing intervals between
cohorts. What should be apparent form these formulas are that the study length is solely
determined by the period information alone while coverage is a function of both the length
(period information) and the cohort information. The design efficiency based on equation 3 can
range from 0 to approaching 1, with higher values indicating greater efficiency. More efficient
designs will cover a greater number of years of development with shorter study durations
resulting in greater efficiency. A value of 50% efficiency would indicate that the coverage is
double the amount of the length (e.g. a study that takes 4 years to complete but covers 8 years of
development). As we increase the number of cohorts, efficiency will increase, though the
increases result in diminishing returns as more cohorts are added (Figure 2.3).
Figure 2.3: Increasing the number of cohorts (Cn) increases design efficiency
The number of cohorts also influences the ability to detect generational or cohort
differences in the developmental trajectories. Designs with fewer cohorts will have less
opportunity to detect generational differences. How the ability to detect these differences
between cohorts may depend on the ALD design parameters will be explored in detail in Chapter
6; however, for now it suffices to know that with fewer cohorts there is less opportunity to examine cohort differences. When thinking about cohorts, it is important to consider what a cohort is. In
the context of an ALD we often think about the cohort as representing the birth cohort, whereby
we believe that those born around the same time will share a common experience and be exposed
to the same social changes. For our purposes we will use the term cohort to indicate a birth-
cohort and thus use the two terms interchangeably. However, one can certainly argue that cohort can be a stand-in for any group we believe to be homogeneous on some exposure of interest that is
time dependent. For example, while there is variation in the birth year for a given grade in school
(e.g. 8th grade, graduate school) it is reasonable to consider members of the same grade as
belonging to the same cohort given that they are undergoing a common experience that is
specific to a point in time. Similarly, cohort membership could be assigned based on fighting in a
war, exposure to a disease, or even retrospectively based on age of death. Regardless of how one
chooses to define the cohort, the important element is that the cohort should share some common
experience or exposure that is time dependent and which the researcher believes may impact the
outcomes they are interested in. Though for this dissertation we consider cohort to be a discrete
category of membership, one could examine cohort membership on a continuum as noted by
O'Brien (2014). Presenting birth cohort as discrete may present conceptual challenges. For
example, those born in January 1980 are likely more similar to those born in December 1979
than individuals born in December 1980 despite being from the same birth-cohort year. This
illustrates the importance of having a clear definition for cohort membership as well as
understanding that birth-cohorts exist more on a continuum than as discrete groups.
2.1.2 The Number of Subjects per Cohort
The number of subjects per cohort (Nc) defines the sample size within each cohort. In this dissertation it will be assumed that each cohort is of equal size, such that the total sample size is defined by Nc*Cn.
2.1.3 The Cohort Interval Spacing
The cohort interval spacing (Cs) represents the age difference between successive cohorts, which directly impacts the amount of overlap in measurements between cohorts. As explained in
the paragraphs above, the cohort interval works in concert with the number of cohorts and the
study length in order to modify the amount of coverage. This results in increasingly efficient
designs as the cohort interval is increased to allow for greater coverage (Figure 2.4).
Figure 2.4: Increasing the cohort interval spacing (Cs) increases design efficiency
We can also see that, similar to the effects of increasing Cn alone, the increases in efficiency by
Cs diminish as the interval spacing is lengthened. Given that both Cn and Cs play a role in
determining the amount of coverage, it is reasonable to wonder which may provide the most
effective means for increasing coverage with a minimal cost to design efficiency. Figure 2.5
shows the changes in design efficiency as we hold Cs constant and increase the Cn or alternately
hold Cn constant and increase the Cs.
Figure 2.5: When trying to increase coverage, increases in Cn are less detrimental to design
efficiency
As we can see above, the amount of change (gain) in design efficiency is generally greater if
increasing the Cn (mean change=3%) for a given Cs rather than keeping the Cn and increasing the
Cs (mean change=2%). Thus, for those interested in increasing coverage while minimizing the
potential loss to design efficiency, the incorporation of additional cohorts is generally preferable
to increasing the cohort interval spacing. Conversely, if the goal of the researcher is to decrease
the amount of coverage, then decreases to Cs will be the least impactful on design efficiency.
2.1.4 The Number of Periods
The number of periods (Pn) refers to the number of measurement occasions that occur for
each subject. As seen in equation 2, the number of periods in concert with the period interval
spacing (Ps) is responsible for the study length such that ALDs with a greater number of periods
will subsequently be longer as a function of Ps. As a result, a greater number of periods results in
greater study length and thus decreased design efficiency (Figure 2.6).
Figure 2.6: Increasing the number of periods (Pn) decreases design efficiency
As the number of periods increases, the ALD efficiency decreases, though the amount of
efficiency lost will decrease with each additional period of measurement. On average, this loss in
efficiency is approximately 5% per additional period.
2.1.5 The Period Interval Spacing
The period interval spacing (Ps) denotes the amount of time between periods of
measurement and is a component of the study length. Larger period spacings denote less frequent
measurement while smaller intervals indicate more frequent measurement. Accordingly, as the
period interval spacing increases (i.e. less frequent measurement) the study design efficiency
decreases (Figure 2.7).
Figure 2.7: Increasing the period interval spacing (Ps) decreases design efficiency
As both Pn and Ps play a role in determining the study length, we can examine which is more
beneficial for design efficiency when having to choose between increasing the number of periods
or the period spacings in order to decrease study length. Figure 2.8 shows the changes in design
efficiency as we hold Ps constant and increase the Pn or alternately hold Pn constant and increase
the Ps.
Figure 2.8: When trying to decrease length, decreases in Pn are more beneficial to design
efficiency
For researchers interested in decreasing the length of their ALD, decreases in Pn will generally
yield the greatest increase in efficiency (average loss of 7% per 1 Pn). Conversely, if a researcher
is interested in increasing the study length, increasing the period spacing is a better means for
minimizing the loss in design efficiency (average loss of 6.5% per 1 Ps).
2.1.6 The Interplay between the Cohort and Period Interval Spacing
The cohort (Cs) and period interval spacing (Ps) act together to impact the nature of the
overlap between the cohorts in an ALD. The ratio of Cs to Ps represents the forward lagging of
the period measurements for the proximate cohort. For example, a value of one-half would mean
that the subsequent cohort's first measurements appear halfway between the first and second
period of the initial cohort. When Cs=Ps, there is perfect overlap of the periods between cohorts
which allows each additional cohort to increase the age span under study by an additional period.
In the example below (Figure 2.9, panel A) the overlap between the cohorts is 3 periods.
Figure 2.9: The influence of Cs and Ps ratio on cohort overlap
In instances where the cohort interval spacing is greater than the period spacing (panel B) fewer
cohorts can cover the same age span, although the amount of overlap between cohorts is reduced.
In instances where the cohort interval spacing is less than the period spacing (panel C), more
cohorts will be needed to cover the same age span and the amount of overlap between cohorts is
greatly increased. The proportion of overlap between the periods of two successive cohorts is
another primary metric of ALD performance and is defined as:
overlap= (Ps*Pn - Cs) / (Ps*Pn) eq. 4
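As a numeric illustration of eq. 4 (a Python sketch, not the author's Stata code), consider a hypothetical design with four periods spaced two years apart:

```python
def overlap(Cs, Pn, Ps):
    """Proportion of periods shared by two successive cohorts (eq. 4)."""
    return (Ps * Pn - Cs) / (Ps * Pn)

# With Cs == Ps, three of the four periods are shared between cohorts:
print(overlap(Cs=2, Pn=4, Ps=2))   # 0.75
# A wider cohort spacing (Cs > Ps) reduces the overlap:
print(overlap(Cs=4, Pn=4, Ps=2))   # 0.5
# A gap between the cohorts' measured age ranges yields a negative value:
print(overlap(Cs=10, Pn=4, Ps=2))  # -0.25
```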
The values for overlap will be between 0 and 1 for most designs; however, values can fall below zero for designs where there is a gap in age between the last measurement for one cohort and the first measurement for the successive cohort. We can visually see how the overlap is related to the Cs/Ps ratio and Pn in Figure 2.10 below.
Figure 2.10: Cs/Ps ratio is linearly associated with cohort overlap
Ratio values of 1 indicate where Cs is equal to Ps (e.g. Fig 2.9, panel A), values greater than 1
indicate Cs > Ps (e.g. Fig 2.9, panel B), and values less than 1 indicate a Cs < Ps (e.g. Fig 2.9,
panel C). As the ratio increases, the proportion of overlap decreases linearly. This drop in
overlap can be offset through increases in the number of periods because as Pn increases the
overlap also increases. In general, it is desirable to have a greater amount of overlap between the
cohorts as this provides greater stability in generating an aggregate developmental trajectory.
When the amount of overlap is small, the developmental trajectory for the ages covered by a
given cohort will be largely dependent on that particular cohort. This is not desirable if one is
attempting to generate a trajectory that generalizes across cohorts. The implications of this will
be explored in detail in future chapters. Moreover, there is a tradeoff between overlap and design
efficiency such that while designs with ratios less than 1 will have the greatest overlap, they will
also have poor efficiency. The tradeoff between design efficiency and overlap can be best
understood graphically (Figure 2.11).
Figure 2.11: The tradeoff between design efficiency and overlap
In Figure 2.11, we can see that designs with fewer periods have a slightly better
performance curve for the design efficiency-overlap tradeoff. Though a greater number of
periods may increase the amount of overlap, it also reduces efficiency at a rate that is greater than
the increase in overlap. We can also note that the efficiency-overlap performance curve can be improved by increasing the number of cohorts, as this will increase the coverage of the design and thus allow for greater efficiency. This curve is more strongly impacted by changes in Cn than by changes in Pn. We have somewhat arbitrarily defined quadrants based on having
design efficiency or overlap values that are above or below 50%. For overlap, a value of 50%
indicates that half of the periods overlap between successive cohorts. While the implications for
the degree of overlap on statistical power and bias will be explored in later chapters, for now we
will utilize a value of 50% to differentiate low and high overlap. Similarly, design efficiency can
be categorized in this manner, whereby values with >50% efficiency represent efficient designs
where the age span coverage is more than double the study length. Designs in the lower left
quadrant represent those with both low efficiency and overlap. The overlap for these designs is
low due to the high Cs/Ps ratios (mean=4.3, N=4) as well as the design efficiency being low due
to only having 2 cohorts represented. These designs are undesirable, as they will not decrease the
study duration by much and would also have developmental trajectories that are less likely to
converge between cohorts as a result of their low overlap. Designs in the lower right quadrant
represent those with high overlap but poor design efficiency. Most of these designs will be those
with Cs/Ps ratios < 1, with the average ratio across designs in this quadrant (N=210) being 0.82.
These designs are ideal for those utilizing the ALD to measure between-cohort differences but
with little concern towards reducing the overall study length. Designs in the upper left quadrant
represent designs with low overlap and high design efficiency. These are designs (N=26) that had
very high Cs/Ps ratios (mean = 4.6) and would likely present some difficulty for estimating a
developmental trajectory that generalizes across cohorts due to the low overlap. The upper right
quadrant contains the designs that are high on both design efficiency and overlap. These are the
designs that provide the best tradeoffs between overlap and design efficiency. The ratios in these
are moderate and generally > 1 (mean=1.6, N=49) with 90% of the ratios being between 0.75 and
3.
2.2 ALD Cost Equations and Cost Performance Metrics
2.2.1 Cost Equation
Now that we have a better understanding of the design elements of the ALD, we can
begin to explore the implications these elements have for the cost of an ALD, particularly in
comparison to single cohort designs covering the same age span. Galbraith, Bowden, and
Mander (2017) proposed an equation for calculating the cost of an ALD based on the work of
Bloch (1986):
costALD= overhead + c1*N + c2*N*M + c3*L eq. 5
where c1 is the cost of recruiting a subject, c2 is the cost of taking a measurement, N is the total
number of subjects, M is the number of measurements per subject, c3 is ongoing cost per year,
and L is the study length or duration in years. Overhead costs refer to any costs that do not change with either the passage of time or the number of subjects. For example,
upfront costs for equipment, computers, and software might be independent of both the length of
the study and number of subjects. Costs for recruitment might include advertising, initial
screening for eligibility, as well as initial subject compensation. Costs for measurement could
include disposable equipment and supplies, printing costs for questionnaires, and measurement
specific participant compensation. Ongoing costs per year would mostly refer to researcher
salary costs as well as any auxiliary costs for space and storage not covered by the indirect costs
from a research grant.
Though Galbraith et al. (2017) express this equation more generally for all ALDs, we can rewrite equation 5 for the balanced ALDs we’ve been exploring as a function of the
design parameters:
costALD= overhead + c1*Cn*Nc + c2*Cn*Nc*Pn + c3*Ps*(Pn-1) eq. 6
The same design parameters for the ALD can then be used to compute the cost of the equivalent single-cohort longitudinal design (SCD) covering the same age span as the ALD by substituting the N,
M, and L of equation 5 with the following:
N = Cn*Nc eq. 7
L = Ps*(Pn-1) + Cs*(Cn-1) eq. 8
M = [(L / Ps) + 1] eq. 9
Resulting in the SCD cost equation of:
costSCD= overhead + c1*Cn*Nc
+ c2*Cn*Nc*[((Ps*(Pn-1) + Cs*(Cn-1)) / Ps) + 1]
+ c3*(Ps*(Pn-1) + Cs*(Cn-1)) eq. 10
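The two cost computations can be sketched as follows (an illustrative Python translation of eq. 5 with the substitutions in eqs. 7-9, not the aldcost Stata program; the overhead and cost parameters below are arbitrary placeholder values):

```python
def cost_ald(overhead, c1, c2, c3, Cn, Nc, Pn, Ps):
    """Cost of a balanced ALD (eq. 6)."""
    return (overhead + c1 * Cn * Nc        # recruitment
            + c2 * Cn * Nc * Pn            # measurements
            + c3 * Ps * (Pn - 1))          # ongoing yearly costs

def cost_scd(overhead, c1, c2, c3, Cn, Cs, Nc, Pn, Ps):
    """Cost of the equivalent single-cohort design (eq. 5 with eqs. 7-9)."""
    N = Cn * Nc                            # eq. 7: same total sample size
    L = Ps * (Pn - 1) + Cs * (Cn - 1)      # eq. 8: duration covering the full age span
    M = L / Ps + 1                         # eq. 9: measurements per subject
    return overhead + c1 * N + c2 * N * M + c3 * L

# Figure 2.1 design with placeholder costs (overhead=1000, c1=50, c2=20, c3=500)
print(cost_ald(1000, 50, 20, 500, Cn=3, Nc=100, Pn=4, Ps=2))        # 43000
print(cost_scd(1000, 50, 20, 500, Cn=3, Cs=2, Nc=100, Pn=4, Ps=2))  # 57000.0
```

With these placeholder values the ALD costs roughly a quarter less than the equivalent SCD, the savings coming from the reduced number of measurements per subject and the shorter duration.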
Putting forth the equation for the equivalent SCD is necessary in order to facilitate cost
comparisons between an ALD and SCD. The choice for M in a single cohort design that is the
equivalent to an ALD is a non-trivial problem. In most instances it seems reasonable to assume
that the researcher is interested in simply covering the same age span with the same frequency of
measurement (Ps), thus equation 9 will be suitable for most applications. However, it is
important to understand that exceptions exist which alter the cost equation. Additional period
measurements in the equivalent SCD may be necessary when there is less than perfect overlap in
the age at which period measurements occur between subsequent cohorts in the ALD. For
example, in Figure 2.12 below, the period spacing of 0.75 and cohort spacing of 0.5 result in an
inflated number of measurements required to measure the same ages in the equivalent SCD.
Figure 2.12: ALD with unique age measurements per cohort
If the goal is for the SCD to have information at the exact same ages as in the ALD, then the
equivalent SCD would necessitate 24 measurements per subject in this design resulting in an
inflated cost for the SCD. For this dissertation we will assume that the researcher's goals are to
cover the age-span with equal intervals (eq. 9).
2.2.2 Scaling the Cost Components
In many instances, the absolute costs per recruit or per measurement are likely not
known. In these cases, the cost parameters (c1, c2, c3) can be thought of as proportions of the
budget such that c1+c2+c3 = 1. This can help facilitate relative cost comparisons between
competing designs despite not offering an absolute magnitude of cost. Moreover, it is important
to note that equation 5 by Galbraith, Bowden, and Mander (2017) treats the cost parameters as values per unit (recruit, measurement, year). It is unlikely that many researchers would know what proportion of the budget a single measurement would cost. Rather, they would likely have a
clearer idea about what proportion of the budget can be devoted to all of the data collection. In
these instances, the cost parameters can be rescaled so that they represent the total proportion of
the budget for recruitment, measurement/data-collection, and duration. Doing this rescaling
requires dividing the cost parameters by their unit of measurement. For example, c1_scaled = c1 / N and c2_scaled = c2 / (N*M). What should be apparent from this re-scaling is that doing so for an individual cost equation is not meaningful. For example, the measurement costs are expressed by c2*N*M; substituting c2_scaled for c2 results in a computation of (c2 / (N*M))*N*M, which is equal to c2. However, the re-scaling of these costs becomes meaningful when used to compare
between the SCD and ALD.
2.2.3 Cost comparisons between single cohort and accelerated longitudinal designs
As noted in the previous paragraph, the re-scaling of the cost parameters to represent
proportions of a budget category rather than as amounts or proportions per unit is likely easier
for a researcher to comprehend. While re-scaling these cost parameters within the equation for an
SCD (eq. 10) or ALD (eq. 9) alone will not yield meaningful results, the re-scaling can be used
to create comparisons between the single cohort and accelerated longitudinal designs. If one
considers the proportions for each budget category to be representative of the costs in an SCD,
the re-scaled parameters from the SCD can be used in the cost equation for the ALD in order to
identify the amount of relative cost savings in each budget category. For example, if we want to
compare the cost savings for data collection between the designs we can re-scale the cost
parameter c2 based on the SCD (eq. 10) and then apply c2 scaled to the cost equation for the ALD
(eq. 6) and derive the cost savings for the ALD relative to the SCD:
% savings for measurement = c2 - (c2 / (N*M))*Cn*Nc*Pn eq. 11
This same logic can be applied to the cost savings for the duration costs (c3) as well. As can be
seen from the different cost equations (eqs. 10 and 6), the SCD and ALD will be assumed to
share the same total overhead as well as equivalent total costs for recruitment. A more general
equation can be used to compute the total proportion of cost savings for using an ALD over an
SCD covering the same age-span, regardless of the scaling of the cost parameters, by solving
each of the cost equations.
savings = 1 – (costALD / costSCD) eq. 12
Values for savings will range from 0 to 1 with higher values indicating greater cost savings in the
ALD relative to the SCD. Overhead and recruitment costs are the same between the designs, thus
the cost savings from the ALD are derived from reductions in the number of measurements taken
as well as the shortened length of the study. When the cost parameters have been rescaled as
proportion of their respective budget category, changes to the number of subjects per cohort (Nc)
will not result in changes to the savings. Using these equations, one can examine the cost trade-
offs of conducting an ALD versus a single-cohort longitudinal design at the same total N.
Moreover, these cost equations can be set equal to each other and solved for the number of
subjects per cohort (Nc) that would provide the same cost in the ALD as in the single-cohort
design.
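To make the comparison concrete, the savings computation of eq. 12 can be sketched as follows. This is an illustrative Python translation, not the chapter's Stata implementation; the forms of eqs. 6 and 10 (defined earlier in the chapter) are reconstructed here under two assumptions consistent with eq. 13: the ALD's duration cost term is c3*(Pn-1)*Ps, and the equivalent SCD covers the same age span with equal measurement intervals of width Ps.

```python
# Hedged reconstruction of the ALD (eq. 6) and equivalent-SCD (eq. 10) cost
# equations, used to compute the cost savings of eq. 12. All parameter values
# below are hypothetical.

def ald_cost(c1, c2, c3, Nc, Cn, Pn, Ps):
    # recruitment + measurement + duration (assumed form, consistent w/ eq. 13)
    return c1 * Nc * Cn + c2 * Nc * Cn * Pn + c3 * (Pn - 1) * Ps

def scd_cost(c1, c2, c3, Nc, Cn, Pn, Ps, Cs):
    T = (Pn - 1) * Ps + (Cn - 1) * Cs  # SCD length = full age-span coverage
    M = T / Ps + 1                     # equal-interval measurements (assumed)
    return c1 * Nc * Cn + c2 * Nc * Cn * M + c3 * T

def savings(c1, c2, c3, Nc, Cn, Pn, Ps, Cs):
    """Eq. 12: savings = 1 - cost_ALD / cost_SCD."""
    return 1 - ald_cost(c1, c2, c3, Nc, Cn, Pn, Ps) / \
               scd_cost(c1, c2, c3, Nc, Cn, Pn, Ps, Cs)

# Hypothetical design: 4 cohorts spaced 2 years apart, 5 annual measurements
s = savings(c1=1.0, c2=0.5, c3=2.0, Nc=50, Cn=4, Pn=5, Ps=1.0, Cs=2.0)
assert 0 < s < 1  # recruitment costs match, so savings come from c2 and c3
```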
2.2.4 Equivalent cost sample size
Examining the number of subjects per cohort in the ALD that are of equivalent cost to an
SCD covering the same age span is an important contribution to the design of ALDs. While prior
examinations of ALDs have examined statistical power assuming a total N that is equivalent
between an ALD and single-cohort design (Galbraith et al., 2017; Moerbeek, 2011), these
comparisons dismiss some of the main advantages of an ALD (i.e. shorter duration and fewer
measurements) which may unfairly diminish the power of these designs. As such, the
computation of the equivalent cost number of subjects per cohort (EqNc) will provide for a
sample size in the ALD which would cost the researcher the same amount as conducting an SCD
of sample size Nc*Cn. Evaluation of the EqNc will prove useful in later chapters of this
dissertation when examining statistical power between SCDs and ALDs as well as provide a
performance metric for the impact of varying design choices on costs. The computation of the
EqNc is provided in equation 13 below.
EqNc = [(costSCD - c3*Ps*(Pn-1)) / (Cn*(c1 + c2*Pn))] eq. 13
The computation of the EqNc can be conducted with either absolute or rescaled values for the
cost parameters and can additionally be reported as a percentage increase in the number of
subjects that can be afforded in the ALD (sample growth) as a means of evaluating cost
performance.
sample growth = 1 - Nc / EqNc eq. 14
Sample growth values will range from 0 to 1 with higher values indicating greater increases in
the sample size. When the cost parameters have been rescaled and treated as proportions of their
respective budget category, changes to Nc will not change the values for sample growth despite
the corresponding increases in EqNc.
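A minimal sketch of eqs. 13 and 14, again as an illustrative Python translation with hypothetical inputs (costSCD would come from eq. 10, defined earlier in the chapter):

```python
def eq_nc(cost_scd, c1, c2, c3, Cn, Pn, Ps):
    """Eq. 13: subjects per cohort affordable in the ALD at the SCD's cost."""
    return (cost_scd - c3 * Ps * (Pn - 1)) / (Cn * (c1 + c2 * Pn))

def sample_growth(Nc, eqnc):
    """Eq. 14: sample growth = 1 - Nc / EqNc."""
    return 1 - Nc / eqnc

cost_scd = 1320.0  # hypothetical SCD cost (as eq. 10 would produce)
e = eq_nc(cost_scd, c1=1.0, c2=0.5, c3=2.0, Cn=4, Pn=5, Ps=1.0)
g = sample_growth(Nc=50, eqnc=e)

# At equal total cost the ALD affords more than the original 50 subjects per
# cohort, so sample growth lands in (0, 1).
assert e > 50 and 0 < g < 1
```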
2.2.5 Evaluating costs for a given design
In many instances the true cost parameters may not be known, thus when evaluating the
cost performance of an ALD it may be important to examine a range of values for relative costs
of recruitment, measurement, and duration. By generating a distribution of cost savings and
sample growth across the potential cost spectrum, we can explore how changes to the cost
parameters influence these cost metrics. In Figure 2.13 below, we examine these cost
performance metrics for a single ALD by varying the values for the cost parameters (c1, c2, c3)
from 10% to 80% for each (such that c1+c2+c3=100%).
Figure 2.13: Relationship of cost parameters to cost savings and sample growth
As we can see from the figure, for ALDs where the recruitment costs (c1) are the majority of the
budget, the amount of savings and sample growth relative to the SCD will be minimized. This
makes sense given that cost equations for recruitment are equivalent between the ALD and SCD.
For a given cost of recruitment, increases to the measurement/data collection costs will minimize
the differences between the ALD and SCD, suggesting that much of the gain in savings (and
subsequently sample growth) for an ALD is derived from its ability to decrease the length of the
study relative to an SCD. This same figure can be reconstructed to show how for fixed
measurement costs (c2), increased costs for study length will increase savings and sample growth
(Figure 2.14, panel A); and how for fixed yearly costs (c3), increases in measurement costs (c2)
will also increase savings and sample growth linearly (Figure 2.14, panel B).
Figure 2.14: Alternate relationships of cost parameters to cost savings and sample growth
In this exemplar, where the costs have been re-scaled as proportions of the total budget, knowing
the values of any two cost parameters will yield information about the third. In order to evaluate
which of these cost parameters are most influential in changing the sample growth and savings,
we can plot the average values to get an idea about the average rate of change for increases in
each parameter. In Figure 2.15 below, the associations with the average rate of change are
displayed along with the linear slope (b) for each cost parameter per 10% increase.
Figure 2.15: Average sample growth and savings for cost parameter increases
For sample growth (panel A), the rate of change is greatest for changes in study length costs (c3),
followed closely by recruitment (c1). These will have opposing effects such that increases in
recruitment costs decrease sample growth while increases to study length costs will increase
sample growth. Changes to measurement costs (c2) have almost no impact on sample growth.
For savings (panel B), changes to the cost of recruitment (c1) will have the greatest impact
followed by changes to the cost for study length (c3), and then cost per measurement (c2). While
increases in recruitment costs decrease the average savings, increased measurement and length
costs increase the savings. As a result, the overall cost performance is best served when the
budget minimizes recruitment costs and maximizes costs associated with study length. It is
important to note that these comparisons of cost performance have thus far focused on a single
ALD with fixed design parameters (Cn, Cs, Pn, Ps) but with varying cost parameters. In order to
compare across different designs of an ALD, one would need to know the true fixed values for
these cost parameters or develop a summary measure.
2.2.6 Evaluating costs across competing ALDs
Given that a researcher exploring these designs may not know the values for these cost
parameters, we can develop a summary measure based on the average values for sample growth
and savings when the cost parameters are varied in increments of .05 (i.e. 5%) from 0.05 to 0.95
such that they sum to 1 for a given design. In this manner we can compare the average growth
and savings across competing designs.
When comparing the association of the cost performance measures across designs, there
is little variation in how sample growth and savings are associated with one another (Figure
2.16), suggesting that when we evaluate cost performance between designs we can utilize either
sample growth or savings as a singular cost performance metric. The association between the
two measures is not perfectly linear and is such that the average sample growth is higher than the
average cost savings.
Figure 2.16: Associations of cost performance measures between competing designs
The choice of savings as the singular cost performance metric for between design comparisons
seems sensible given the intuitiveness of the measure as opposed to the more obscure sample
growth. The number of additional subjects (sample growth) that a given savings may yield will
be dependent on the relative weighting of the cost parameters.
As the number of cohorts (Cn) increases, the average cost savings will increase
nonlinearly with diminishing savings for each additional cohort (Figure 2.17, panel A). This is
similar to the pattern seen for increases in the Cs/Ps ratio (panel C). Increases to the number of
periods (Pn) will decrease the savings, though with less negative impact on savings for each
additional period (panel B). These associations of these design parameters with cost savings
make sense given that they reflect advantages in duration (Cn, Cs/Ps) or number of measurements
(Pn) for the ALD relative to the SCD. Moreover, these patterns are similar to those seen for
design efficiency. As we will see in the next section, the design and cost performance metrics are
interconnected, allowing for a simplification of the metrics to evaluate an ALD.
Figure 2.17: Design parameter associations with cost savings
2.2.7 Associations of ALD performance and costs metrics
Now that we have determined to use savings as our primary cost metric for comparing
between designs, we can examine how savings is associated with both the performance metrics
of design efficiency and overlap. In Figure 2.18 below, we see the association of design
efficiency with savings for various designs.
Figure 2.18: Associations of design efficiency with cost savings
As we can see, there is a linear relationship between the average cost savings for a design
and the design efficiency, where designs with greater savings will also be more efficient. Indeed,
the correlations between these measures within a given design are greater than 0.9995 for the
designs displayed in Figure 2.18 above. There is also little variation between the slopes for the
alternate design parameters of Cn and Pn, though increases in Pn do seem to improve the design
efficiency-savings slope slightly. The lack of variability in these relationships is likely due to
how these savings values were generated, as across designs the average savings values will have
been derived using the same range of duration costs. Given that the study length is the primary
component of efficiency and the cost of study duration (c3) a major contributor of cost savings,
the lack of variation in these slopes between designs at the same fixed cost parameters should not
be surprising. The nature of this design efficiency-savings slope can be improved (i.e. greater
efficiency with greater savings) within a given design through higher costs for duration (at same
measurement costs (c2)) as displayed in Figure 2.19.
Figure 2.19: Increases to duration costs (c3) improve the design efficiency-savings slope
Given that design efficiency and savings exhibit this linear relationship, and that this
relationship does not vary greatly between designs at the same fixed costs (Figure 2.18); when
evaluating the performance of the design parameters for an ALD at fixed costs, we can utilize
either of these metrics.
When examining the trade-offs between design efficiency and overlap, as in Figure 2.11,
the substitution of savings for design efficiency does not alter the nature of the associations with
overlap. In Figure 2.20 below, the original design efficiency-overlap curve from Figure 2.11 is
displayed (panel A) alongside the savings-overlap curves for when savings are averaged across a
range of costs (panel B) as well as for low (panel C) and high (panel D) duration costs. The
y-axis for savings has been rescaled based on the design efficiency-savings slope
so as to allow for visual comparison with the design efficiency-overlap curve (panel A).
Figure 2.20: Equivalency between design efficiency-overlap curve and savings-overlap curve
As expected, based on the efficiency-savings associations, the substitution of savings for design
efficiency does not alter the nature of the relationship with overlap. In choosing to use design
efficiency or savings as a performance metric it seems clear that design efficiency should be
chosen because these values are invariant to changes in cost. The preference for design efficiency
also aligns with how most researchers might use these metrics: their research costs will be
fixed, and they will then have to determine an appropriate design given those fixed costs. Despite
this preference for using design efficiency over savings, this does not negate the usefulness of
these cost performance metrics. Both savings and sample growth can be useful for discerning
performance in instances where the cost parameters vary between designs. Moreover, though the
sample growth and savings will not increase with greater Nc, the absolute magnitude of the
equivalent cost number of subjects per cohort (EqNc) will increase resulting in increases to
statistical power.
2.3 Incorporating Attrition into the Cost Equations
2.3.1 Models for Attrition
One of the strengths of the ALD is that it is able to cover the same age span as a single
cohort design, but in less time, making it less vulnerable to attrition. Given that attrition is a
normal facet of longitudinal research, a fairer comparison between an ALD and single-cohort
design would examine the costs when attrition is present. Taking the approach of Galbraith et al.
(2017) that was originally proposed by Verbeke & Lesaffre (1999), we will utilize a Weibull
model for defining the probabilities of attrition. In order to compare dropout in an ALD to the
equivalent single-cohort design, the number of periods used when generating these probabilities
will be based on the number of periods that would be present in the equivalent single-cohort
design as defined in equation 9. By doing so, we ensure that the dropout rate for the ALD is
always less than in the single cohort design. Each participant has 1 to M measurements, for
which each measurement can be defined by the probability of dropping out of the study:
p = {p1, …, pj, …, pM}, where pj is the probability of having exactly j measurements
The probabilities in vector p are then determined based on the Weibull function. The periods of
measurement can be rescaled as proportions from 0 to 1 by using the following formula:
t*j = (tj - 1)/(M - 1), so that the first period (t1) equals zero and the last (tM) equals 1
Implied in this rescaling is that the period intervals are equivalent across all periods. While this is
certainly true for the ALDs described in this dissertation, depending on the design choices
specified, the equivalent single-cohort design may have unequal period spacing, which would
need to be accounted for. Let w represent the overall proportion of participants who drop out at
some point during the study and λ = -log(1-w). The proportion of individuals who have dropped
out at period t, assuming they remained in the study until period t, will be defined by λγt^(γ-1).
Without loss of generality, this can be expressed in terms of the probabilities for vector p as:
pj = (1-w)^(tj^γ) - (1-w)^(t(j+1)^γ), where pM = 1-w
In this manner, for a given value of w, the dropout serves as a function of γ whereby when γ=1,
the dropout is constant over the course of study indicating that missing data are presumed
missing completely at random (MCAR). When γ>1, dropout is concentrated towards the end of
the study and when γ<1 dropout occurs more often in the beginning of the study; that is, data are
presumed missing at random (MAR).
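The Weibull dropout model can be sketched as follows, as an illustrative Python translation of the probabilities above (the chapter's own computations use the author's Stata aldcost package):

```python
# Weibull dropout probabilities (Verbeke & Lesaffre, 1999): p[j-1] is the
# probability of having exactly j of M measurements, given overall dropout
# proportion w and shape gamma (gamma = 1: constant dropout over the study;
# gamma > 1: dropout concentrated late; gamma < 1: concentrated early).

def dropout_probs(M, w, gamma):
    t = [(j - 1) / (M - 1) for j in range(1, M + 1)]  # rescaled: t1=0, tM=1
    p = [(1 - w) ** (t[j - 1] ** gamma) - (1 - w) ** (t[j] ** gamma)
         for j in range(1, M)]
    p.append(1 - w)  # p_M: probability of completing all M measurements
    return p

p = dropout_probs(M=8, w=0.25, gamma=2.0)
assert abs(sum(p) - 1.0) < 1e-12  # the probabilities telescope to sum to one
assert abs(p[-1] - 0.75) < 1e-12  # p_M = 1 - w
```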
With the probabilities for having exactly j of M measurements stored in vector p, the
ALD and SCD cost equations can be modified to incorporate this attrition. For the ALD, the total
number of measurements when attrition is present (M*) can be computed using:
M*ALD = Pn*Nc*Cn - Pn*Σ(j=1 to Pn-1) Nc*Cn*pj + Σ(j=1 to Pn-1) j*Nc*Cn*pj eq. 15
M*SCD = Σ(j=1 to M) j*Nc*Cn*pj eq. 16
Substituting the number of measurements from equations 6 and 9 with the total number of
measurements from the equations above (M*) will allow for computation of costs when attrition
is present. The equations above assume that the amount of attrition (w) is related to study
duration and that the ALD will thus have less attrition relative to the SCD. In some
circumstances this may not be desirable such that the researcher would like to assume, for
example, 25% attrition in the SCD and ALD such that by the last period of measurement (M or
Pn) both studies will have had 25% of subjects with some amount of attrition. In this instance, the
vector of probabilities for the ALD should be constructed for periods 1 to Pn (as opposed to M)
such that t*j = (tj - 1)/(Pn - 1) with t1 = 0 and tPn = 1. The formula for M* in the ALD can then be
applied to be the same as in SCD (eq. 16), but where the summation occurs over j ranging from 1
to Pn.
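Eqs. 15 and 16 can be sketched in the same illustrative Python style. Here dropout_probs implements the Weibull model above, the design values are hypothetical, and the ALD's probability vector is built over the equivalent SCD's M periods so the ALD always experiences less dropout:

```python
def dropout_probs(M, w, gamma):
    """Weibull dropout model: probability of exactly j of M measurements."""
    t = [(j - 1) / (M - 1) for j in range(1, M + 1)]
    p = [(1 - w) ** (t[j - 1] ** gamma) - (1 - w) ** (t[j] ** gamma)
         for j in range(1, M)]
    p.append(1 - w)
    return p

def m_star_ald(Nc, Cn, Pn, p):
    """Eq. 15: subjects dropping out in periods 1..Pn-1 contribute j rather
    than Pn measurements; everyone else contributes all Pn."""
    lost = sum(p[j - 1] for j in range(1, Pn))
    partial = sum(j * p[j - 1] for j in range(1, Pn))
    return Pn * Nc * Cn - Pn * Nc * Cn * lost + Nc * Cn * partial

def m_star_scd(Nc, Cn, M, p):
    """Eq. 16: each subject with exactly j measurements contributes j."""
    return sum(j * Nc * Cn * p[j - 1] for j in range(1, M + 1))

Nc, Cn, Pn, M = 10, 2, 5, 9  # hypothetical design; M would come from eq. 9
p = dropout_probs(M, w=0.30, gamma=1.0)

# The ALD retains a larger share of its planned measurements than the SCD.
assert m_star_ald(Nc, Cn, Pn, p) / (Pn * Nc * Cn) > \
       m_star_scd(Nc, Cn, M, p) / (M * Nc * Cn)
```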
2.3.2 The metrics of attrition and between design comparisons
As is apparent from the cost equations modified to include attrition (eqs. 15, 16), the
impacts of attrition on costs are solely related to the cost of measurement. The metrics of savings
and sample growth are impacted by attrition, but in such a way that increases in attrition will
decrease the cost savings relative to an SCD. This is because under attrition the number of
measurements per subject will decrease at a faster rate in the SCD relative to the ALD resulting
in a decrease in savings for the ALD (as more measurements are retained). As a result, savings is
not the metric to examine when attrition is present, as we should primarily be interested in how
the combination of the design parameters and attrition can maximize the number of
measurements and sample size in the ALD. We can examine how the amount of attrition impacts
the number of measurements and subjects affected between an ALD and SCD for a given design
in Figure 2.21 below.
Figure 2.21: The impact of attrition on measurements and sample size
The attrition percentage (equation 17) for measurements is defined as the proportion of
measurements not taken as a result of attrition, or more formally as:
attrition percentage = ((M* - NcCnM) / (NcCnM))*100 eq. 17
In panel A, we can see that attrition in the number of measurements taken is always greater in the
SCD than in the ALD (below identity line). The same is true for examining the percentage of
subjects lost to attrition (panel B), where the SCD subject attrition will be equal to the specified
attrition parameter (w). For either, as the attrition rate (w) increases, the differences between the
SCD and ALD grow such that higher attrition rates affect the SCD more than the ALD. As the
distribution of the attrition (gamma) becomes concentrated towards the latter periods in the study
(i.e. as gamma increases), the differences are further enhanced affecting the SCD more than the
ALD. This can perhaps be better understood by examining panels C and D where the amount of
attrition savings (equation 18), defined as the difference in the SCD and ALD attrition
percentage, can be seen.
attrition savings = [((M*SCD - NcCnMSCD) / (NcCnMSCD))*100]
- [((M*ALD - NcCnMALD) / (NcCnMALD))*100] eq. 18
For the attrition savings in measurement (panel C), increases are modest as gamma increases
with a ~9-10% savings in the number of measurements in an ALD when attrition is high and
concentrated towards the study end. The attrition savings for measurements is maximized when
γ=2 and begins to decrease thereafter, suggesting that as the attrition becomes more extreme
towards the end of the study even those in the ALD would begin to suffer more from attrition
effects. Attrition savings in subjects (panel D) is substantial as attrition increases or becomes
concentrated towards the end of the study. This increase is nonlinear such that subsequent
increases in gamma result in increasingly diminished gains in attrition savings. For the design
specified above, when the attrition rate is 35% in the SCD and attrition is concentrated towards
the end of the study (γ=3), only ~ 5% of the subjects in the ALD would suffer from attrition.
Potentially saving 30% of the subjects from attrition by using the ALD over the SCD is an
astounding level of subject savings.
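Eqs. 17 and 18 can be sketched the same way. One hedge: the code below computes the attrition percentage as a positive share of planned measurements lost, matching the verbal definition (eq. 17 as printed carries the opposite sign), and the m_star inputs are illustrative values of the kind eqs. 15 and 16 would produce.

```python
def attrition_pct(m_star, Nc, Cn, M):
    """Share of planned measurements lost to attrition, as a percentage."""
    planned = Nc * Cn * M
    return (planned - m_star) / planned * 100

def attrition_savings(m_scd, M_scd, m_ald, M_ald, Nc, Cn):
    """Eq. 18: SCD attrition percentage minus ALD attrition percentage."""
    return (attrition_pct(m_scd, Nc, Cn, M_scd)
            - attrition_pct(m_ald, Nc, Cn, M_ald))

# Hypothetical design: the ALD (Pn = 5) retains 91.7 of its 100 planned
# measurements; the equivalent SCD (M = 9) retains 151.6 of 180.
s = attrition_savings(m_scd=151.6, M_scd=9, m_ald=91.7, M_ald=5, Nc=10, Cn=2)
assert s > 0  # the SCD loses the larger share, so the ALD shows a savings
```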
Within a given design, we saw that higher attrition and attrition concentrated towards the
end of the study would have the greatest impacts on attrition savings favoring the ALD. When
examining between designs with fixed attrition, increases to the Cs/Ps ratio and Cn will increase
the attrition savings (Figure 2.22) while increases to the Pn decrease the attrition savings.
Figure 2.22: ALD design effects on attrition savings
These effects on attrition savings are closely related to the effects on design efficiency, which
captures how the various design elements reduce the study length relative to the age-span
coverage. This is attributable to how we have defined attrition as related to study duration:
designs with better design efficiency will have less attrition in the ALD relative to the SCD and
hence higher attrition savings. When examining statistical power in Chapter 3, we would expect
that higher values of attrition savings will relate to increased statistical power as the ALD retains
a greater proportion of measurements and subjects
under attrition.
2.4 Chapter Summary
When considering an accelerated longitudinal design, we would expect most researchers
to have a clear idea in their minds about some of these design parameters. While the above
paragraphs explore changes to all of the parameters, we think it realistic that the researcher will
have a fixed age-span coverage in mind as well as some idea of the frequency of measurement
(e.g. period interval spacing Ps) required in order to capture changes in development. Given this,
we would suggest that researchers focus on the inferences as they pertain to the number of
cohorts, cohort interval spacing, and the resulting overlap between successive cohorts.
2.4.1 ALD Design Parameters and Performance Metrics
The basic design elements of an ALD are the number of cohorts (Cn), the number of
subjects per cohort (Nc), the cohort interval spacing (Cs), the number of periods (Pn) and the
period interval spacing (Ps). Varying configurations of these parameters results in studies of
differing length (time it takes to conduct the study) and coverage (age-span under study). ALDs
allow for covering the same age-spans with shorter lengths than in single-cohort designs, thereby
increasing the efficiency of the design. Increases to design efficiency can be accomplished by
increasing the number of cohorts (Cn) or the cohort interval spacing (Cs). This is because
increases in either result in increases to the total age span under study (coverage). When trying to
increase coverage, increases in Cn provide for a greater increase to efficiency over increases to
Cs. Increasing the number of periods (Pn) or period interval spacing (Ps) will decrease the design
efficiency because increases in either will increase the time it takes to complete the study
(length). When trying to decrease the length of a study while maximizing the increase to design
efficiency, it is more beneficial to decrease the Pn rather than the Ps. The ratio of Cs to Ps impacts
the amount of overlap in the ages of measurement between successive cohorts with higher values
indicating less overlap. There is a trade-off between design efficiency and overlap that is
determined by the Cs/Ps ratio, the Pn, and Cn. The design efficiency-overlap performance curve is
heavily impacted by the Cn, with higher values providing for better performance. Cs/Ps ratios that
are too high (> 3) or too low (< .75) will have poor performance. Within a given Cn, decreasing
the Pn will improve ALD performance trade-offs. All of the figures and underlying data from
section 2.1 were generated using the user-written aldesign package in Stata version 15.0 (Jackson,
2017a).
2.4.2 Cost Equations and Cost Metrics
Costs for an ALD can be discussed in terms of recruitment, measurement, and duration
and can be expressed as a function of the ALD design parameters (eq. 6). The cost for an SCD
covering the same age span (coverage) as the ALD can also be computed using the ALD
parameters (eq. 10). In the absence of specific cost information, costs can be rescaled to
represent proportions of the total budget per each budget category, facilitating cost comparisons
between the SCD and ALD. When comparing a given ALD design to its equivalent SCD, cost
performance metrics such as cost savings (eq. 12) can be used to get a sense of how an ALD
saves cost. The cost savings for an ALD are primarily derived from the shortened duration, thus
budgets with high duration costs (c3) will show even greater savings for the ALD relative to the
SCD. The sample size that would be possible in the ALD under the same costs of the equivalent
age-span SCD can be computed (eq. 13) and is termed the equal cost number of subjects per
cohort (EqNc). When expressed as a proportion of the initial Nc, the cost metric of sample growth
can also be evaluated (eq. 14). Cost savings and sample growth are closely related such that only
one is necessary for evaluating cost performance. Moreover, the design performance metric of
efficiency is closely related to savings with variability in this relationship derived from
differences in the relative weighting of the cost parameters. The efficiency-overlap design
performance curve mirrors that of a savings-overlap curve when adjusted for differential
relationships between design efficiency and savings due to varying costs. For comparing between
designs at fixed costs, the performance measures of design efficiency and overlap are preferred;
however, when costs vary, the total cost savings needs to be considered. Regardless of fixed or
variable costs, the EqNc will vary based on the design and cost parameters, which can have
implications for statistical power. Data and figures for section 2.2 were generated using the user-
written aldcost package in Stata version 15.0 (Jackson, 2017b).
2.4.3 Attrition
Attrition can be accounted for in the cost equations (eqs. 15 & 16) through use of a
Weibull model. The cost savings will be reduced under attrition because the SCD will have
fewer measurements to take due to its longer study length. As a result, we examine the attrition
savings (eqs. 17 & 18) as our primary metric when attrition is present. Attrition savings is
defined as the difference in the proportion of missing measurements or subjects between the
ALD and SCD. The attrition savings will always indicate greater savings in the ALD than the
SCD and is maximized when attrition is both high and concentrated towards the end of the study
(i.e. γ > 1). Data and figures for section 2.3 were generated using the user-written aldcost package in
Stata version 15.0 (Jackson, 2017b).
2.5 Chapter References
1. Baltes, P. B., & Nesselroade, J. R. (1972). Cultural change and adolescent personality
development: An application of longitudinal sequences. Developmental Psychology, 7(3),
244.
2. Bloch, D. A. (1986). Sample size requirements and the cost of a randomized clinical trial
with repeated measurements. Statistics in Medicine, 5(6), 663-667.
3. Carlsson, G., & Karlsson, K. (1970). Age, cohorts and the generation of
generations. American Sociological Review, 710-718.
4. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
5. Jackson, N.J. (2017a). ALDESIGN: Stata program for the design of accelerated longitudinal
designs. Stata Version 15.0. revised 09.04.2017.
6. Jackson, N.J. (2017b). ALDCOST: Stata program for the cost computations of accelerated
longitudinal designs. Stata Version 15.0. revised 09.05.2017.
7. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal
designs. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
8. O'Brien, R. (2014). Age-period-cohort models: Approaches and analyses with aggregate
data. CRC Press.
9. O'Malley, P. M., Bachman, J. G., & Johnston, L. D. (1984). Period, age, and cohort effects
on substance use among American youth, 1976-82. American Journal of Public
Health, 74(7), 682-688.
10. Schaie, K. W., Willis, S. L., & Pennak, S. (2005). An historical framework for cohort
differences in intelligence. Research in Human Development, 2(1-2), 43-67.
11. Ryder, N. B. (1965). The cohort as a concept in the study of social change. American
Sociological Review, 30(6), 843-861.
12. Verbeke, G., & Lesaffre, E. (1999). The Effect of Drop-Out on the Efficiency of
Longitudinal Experiments. Journal of the Royal Statistical Society: Series C (Applied
Statistics), 48(3), 363-375.
Chapter 3
Statistical power in accelerated designs with linear growth and no between-cohort
differences
Chapter 3 will explore how we can simulate accelerated longitudinal designs for linear
models and how these simulations can be used to answer questions about statistical power as
they relate to choices in the ALD design parameters, the amount of attrition, changing effect
sizes and growth reliability, and varying sample sizes and cost-influenced sample size. This
chapter explores these in the context of designs with linear growth and no between-cohort
differences in a ‘balanced’ ALD where the number and frequency of measurements are the same
between cohorts. Performance of the ALD in the absence of between-cohort differences is an
important topic, as researchers currently using single-cohort designs to study development are
making assumptions about the presence or absence of these differences. For most, there is an
implicit belief that the findings from their particular cohort will generalize to other cohorts. In
these scenarios, the use of the ALD is especially sensible as it allows for explicit testing of this
belief as well as provides a more expedient means of assessing growth when between-cohort
differences are absent.
These investigations are novel in that while both Moerbeek (2011) and Galbraith et al.
(2017) examine power, they fail to do so in instances where the period overlap between
successive cohorts is imperfect. That is, they only investigated instances where Cs was greater
than Ps and such values were integers. This examination was likely intentional on their part, as
imperfect overlap between the periods of successive cohorts requires more frequent
measurement in a single-cohort design. As a result, the inferences from their simulations are
limited to the special cases of the 'balanced' ALDs they represent. Moreover, the choice of N for
comparison in these simulations has, up until now, only been investigated assuming a total N that
is equivalent between an ALD and single-cohort design. This work will extend these simulations
to compare ALDs with Ns that are of equal cost to their single-cohort counterparts (EqNc, section
2.2). This is an important comparison, as otherwise some of the main advantages of an ALD (i.e. shorter duration and fewer measurements) are nullified when its power is compared against a single-cohort design. The previous work by Moerbeek (2011), Galbraith et al. (2017), and Miyazaki and
Raudenbush (2000) has focused on the influence of cohorts in understanding age-related change,
specifically in instances where there is no age-by-cohort interaction. This work will extend these
studies by further examining these designs and how statistical power is modified by the inclusion
of different numbers of cohorts.
3.1 Conceptualizing Age-Period-Cohort effects
All research designs for understanding development can be thought of as special cases of
the Age-Period-Cohort model. While cross-sectional studies of age-related development confound the effects of age and cohort, and single-cohort designs confound age and period effects, the accelerated longitudinal design can be used to estimate effects for age (changes due to maturation), period (changes due to time of measurement), and cohort (generational changes) on developmental trajectories simultaneously. Data from Mason, Mason, Winsborough, & Poole
(1973) provides a nice exemplar for understanding the roles that age, period, and cohort play in
the ALD. Figure 3.1 utilizes these data to show how age, cohort, and period would influence
these imagined trajectories. The effects represent a non-linear age-related change in the absence
of cohort or period differences as shown in panels A and B. In both, the levels and trajectories do
not differ between the cohorts at a given age. The individual trajectories for each cohort are
separated and labeled by their period of measurement in panel B to allow for an easier viewing
of these effects.
Figure 3.1: Exemplar of Age-Period-Cohort effects
Panels C and D show the effect of cohort in the absence of period effects. In panel C, the
trajectories for each cohort are the same, but have been shifted such that the younger cohorts
reach a minimum at an earlier age. It's important to note that this represents one type of cohort
difference, that is, differences in level. If the trajectories of the outcome are also influenced by
cohort (i.e. cohort-by-age interaction) then we might expect the pattern we see for cohort 3 in
panel D. When considering period effects alone, it is useful to consider how period was handled
in the prior panels. For example, in panels B and C, the values for measurements taken in the
same period are a direct function of either 1) the cohort interval spacing (Cs) (panel B) whereby
cohort and period are confounded or 2) the period interval spacing (Ps) (panel C) whereby period
and age are confounded. The value for the 1st period of the second cohort is at the same level as the first cohort in the 3rd period (panel B) and the 2nd period (panel C) for each of these scenarios.
When we examine period and age effects alone (panel E), we find that period 1 has the same
values across all the cohorts, demonstrating the absence of an age effect. This confounding of
period and age, when plotted across age for each cohort, may give the appearance of cohort
differences; however, when plotted by period (panel F, labeled by age at measurement) it can be seen that there are no cohort differences present. These figures help provide a conceptual
framework for understanding the influences of period and cohort. While this dissertation will
primarily focus on the influence of cohorts, it is important to have a broader understanding of
these designs to allow for flexibility in modeling. Section 3.2 below will provide us with
descriptions of statistical methods used for simulating and assessing accelerated designs.
3.2 Simulating an ALD
In order to evaluate the usefulness of the ALD we must first understand the elements for
simulating these designs such as the choice of sample size, effect size, attrition, linear and
nonlinear equations of cohort changes, the creation of age distributions, and the determination of
power and bias in the ALD. Each of these is described in detail below. Simulations of the
ALDs were conducted using the user-written package aldsim (Jackson, 2017) in Stata version
15.0 (College Station, TX). As we would expect researchers to have a specific age-span
coverage in mind when designing their study, these simulations will all have a fixed coverage
studying 10 years of development from the ages of 10 to 20.
3.2.1 Determining Sample Size and number of Simulations
It is recognized that the social sciences typically employ small samples, so the sample sizes for the simulations are designed to reflect this reality, with total Ns not exceeding 120 subjects. Simulations will also be conducted using the equivalent cost Nc described in Chapter 2, section 2.2. The number of simulations required for each problem may vary; however, runs will generally range between 1,000 and 2,000 iterations to account for model estimation failures and ensure a minimum of 1,000 successful simulations.
3.2.2 Power
The primary metric for evaluating the ALD in this chapter will be statistical power. Statistical
power will be determined through simulation only. The proportion of simulations yielding a
statistically significant (p<0.05) effect for a parameter will be taken as the power to detect that
effect. In this chapter, this refers to the testing of the fixed effect for age. When testing for the
ability to detect random variance components (i.e. cohort differences) in Chapter 6, statistical
significance will be determined through use of a likelihood-ratio test.
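The proportion-significant logic can be sketched in a few lines. The following is a generic Python illustration (not the Stata aldsim code used in this dissertation); `one_slope_test` is a hypothetical stand-in for a single simulated study, and a Monte Carlo standard error for the power estimate is included:

```python
import numpy as np
from scipy import stats

def estimate_power(simulate_pvalue, n_sims=1000, alpha=0.05, seed=1):
    """Estimate power as the proportion of simulations with p < alpha."""
    rng = np.random.default_rng(seed)
    pvals = np.array([simulate_pvalue(rng) for _ in range(n_sims)])
    power = float(np.mean(pvals < alpha))
    mc_se = np.sqrt(power * (1 - power) / n_sims)  # Monte Carlo error
    return power, mc_se

def one_slope_test(rng, n=60, slope=0.5, sd=1.0):
    """Hypothetical stand-in for one simulated study: an OLS test of a slope."""
    x = rng.uniform(0, 1, n)
    y = slope * x + rng.normal(0, sd, n)
    return stats.linregress(x, y).pvalue

power, se = estimate_power(one_slope_test, n_sims=500)
```

With 1,000 to 2,000 iterations, the Monte Carlo standard error of a power estimate is at most about 1.6 percentage points, which is why that range of iterations is used here.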
3.2.3 Determining effect sizes
In order to conduct the simulations necessary for the subsequent chapters a measure of
effect size is necessary. Hertzog, Lindenberger, Ghisletta, and von Oertzen (2006) introduce
growth curve reliability (GCR) as a measure for estimating the proportion of variation in the
outcome that is due to change or growth. The GCR can be conceptualized as the ratio of variance
explained by the growth parameters over the total variance. For linear models, the formula for
the total variance of the outcome for each observation (i) and subject (j) is provided below:
var(yij) = σ²u0 + xij²σ²u1 + 2xij σu01 + σ²ε eq. 2
The total variance (eq. 2) is a function of the random variation in intercept (σ²u0), the values for age (xij), random variation in the slope (σ²u1), the covariance of slope and intercept (σu01), and unexplained normally distributed residual variance (σ²ε). The GCR is then defined as:
GCR(xij) = (σ²u0 + xij²σ²u1 + 2xij σu01) / (σ²u0 + xij²σ²u1 + 2xij σu01 + σ²ε) eq. 3
As is evident from the formula for the GCR (eq. 3), the relative proportion of growth explained
by the slope is dependent upon the ages for which the GCR is being computed. For all
simulations, the GCR will be determined at the average starting age of the simulated studies (e.g.
10 years old). The GCR will also be constant across cohorts. What is also evident from equation
3 is that GCR is independent of the magnitude of the slope and thus also independent of the
traditional measure of effect size in these studies (i.e. slope). Therefore, this study has two components to consider when examining the effects of growth: 1) the amount of variation in growth accounted for by the model (i.e. GCR), and 2) the standardized growth rate (δ), defined as the ratio of the slope to its standard deviation. For most of the simulations, unless otherwise
specified, the growth rate will be held constant at 0.5 using a slope value of 2 and the default
GCR will be 0.5. The variability in intercept will be set equal to the variability in slope in these simulations, and the covariance of intercept and slope is presumed (and simulated as) 0, presenting a conservative estimate of these values. To illustrate what these GCR and effect size
values look like in terms of linear growth, Figure 3.2 below shows linear growth with respect to
the GCR and δ values.
Figure 3.2: Exemplar of GCR and Effect Size on Linear Growth
In essence, the GCR determines the amount of residual variance in these models. While all of the simulations presented in this dissertation will assume a homoscedastic error structure, in practice we might expect heteroscedastic error. Moreover, the distribution of this error is assumed to be normal. While we do not have to concern ourselves with violations of these assumptions in these simulated data, we should note that the results presented here represent only a limited set of scenarios, and in real data violations of these assumptions would need to be appropriately incorporated into the modeling.
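The GCR calculation in eq. 3, and the back-solving used to fix the residual variance for a target GCR, can be sketched as follows. This is a Python illustration (the function names are mine, not from aldsim), and it assumes age is coded 0 at the average starting age, so the GCR at baseline depends only on the intercept and residual variances:

```python
def gcr(x, var_u0, var_u1, cov_u01, var_eps):
    """Growth curve reliability (eq. 3) evaluated at age value x."""
    growth = var_u0 + x**2 * var_u1 + 2 * x * cov_u01
    return growth / (growth + var_eps)

def residual_for_gcr(target_gcr, x, var_u0, var_u1, cov_u01=0.0):
    """Back-solve eq. 3 for the residual variance yielding a target GCR at x."""
    growth = var_u0 + x**2 * var_u1 + 2 * x * cov_u01
    return growth * (1.0 - target_gcr) / target_gcr

# Chapter defaults: slope = 2 with delta = 0.5 implies sd(slope) = 4, so
# var_u1 = 16; var_u0 is set equal to var_u1 and cov_u01 = 0.
var_eps = residual_for_gcr(0.5, x=0.0, var_u0=16.0, var_u1=16.0)
```

With these defaults, the residual variance comes out equal to the intercept variance, giving a baseline GCR of 0.5 as specified above.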
3.2.4 Incorporating attrition
Attrition will be incorporated in the manner described in Chapter 2, section 2.3. Attrition
values will be varied from 0 to 35% in increments of 5%. Given that longitudinal designs will
generally be expected to experience greater attrition towards the end of the study, gamma values will be explored for γ≥1: specifically, γ=1 (MCAR) and γ=2 or γ=3 (MAR). These gamma values may be modified for the nonlinear models, as attrition distributed
towards the beginning of the design may be particularly problematic for nonlinear models where
the greatest amount of change occurs at the start of the age distribution (γ=0.5).
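The attrition mechanism itself is defined in Chapter 2, section 2.3. As a rough sketch of how a γ parameter shifts dropout toward the start or end of a study, one plausible implementation (my simplification, standing in for the Chapter 2 formulation) makes cumulative attrition a power function of study progress:

```python
import numpy as np

def retention_curve(total_attrition, n_periods, gamma):
    """Fraction of the sample retained at each period. Assumes (my
    simplification, not the Chapter 2 formula) that cumulative attrition
    grows as a power function of study progress: gamma = 1 spreads dropout
    evenly across periods, gamma > 1 pushes it toward the end of the study,
    and gamma < 1 front-loads it."""
    progress = np.arange(n_periods) / (n_periods - 1)  # 0 at baseline, 1 at end
    return 1.0 - total_attrition * progress ** gamma
```

For 35% total attrition over 5 periods, γ=3 retains more of the sample mid-study than γ=1, matching the intuition that higher γ defers dropout toward the final periods.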
3.2.5 Simulating the Age Distribution
Age at the first period will be simulated based on a truncated normal distribution; Age1 ~ N(µ, σ²). The lower (a) and upper (b) bounds that restrict the age range, as well as the mean starting age (µ), will be shifted dependent upon the cohort being simulated (Nk) and the fixed interval spacing between successive cohorts (Cs). Using these values, we can derive a Z score for the lower bound (Za) and upper bound (Zb) of these age distributions.
Nk = {1, … Cn−1, Cn}
Za = (a + Cs(Nk − 1) − (µ + Cs(Nk − 1))) / σ eq. 4a
Zb = (b + Cs(Nk − 1) − (µ + Cs(Nk − 1))) / σ eq. 4b
Using the standard normal cumulative distribution function (Ф) and its inverse (Ф⁻¹), we can generate a random normal distribution of ages at the first period (Age1) restricted to the interval between a and b using the following:
Age1 = µ + σ·Ф⁻¹(Ф(Za) + U(0,1)·(Ф(Zb) – Ф(Za))) eq. 5
where U(0,1) is the random uniform distribution. For subsequent periods after the first, the age
distribution will be determined by the fixed period interval spacing (Ps) and the current period
(Np) out of the total number of periods such that:
Np = {2,…Pn-1, Pn}
AgeNp = Age1 + Ps(Np -1) eq. 6
In this manner, the error distribution for the ages will be equivalent across periods within and
between cohorts. For the simulations presented in this dissertation, the age at first period for the
youngest cohort will be 10 years with a standard deviation of 0.5. The upper and lower bounds
will be 10+Ps and 10-Ps respectively so that there is minimal overlap in the age distributions
between periods.
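The sampling scheme in eqs. 4–6 can be sketched in Python using inverse-CDF sampling from a truncated normal. Function names are mine; the defaults (µ=10, σ=0.5, bounds µ ± Ps) follow the values stated above, and Cs=1.0 is an illustrative choice:

```python
import numpy as np
from scipy.stats import norm

def simulate_age1(n, cohort, mu=10.0, sigma=0.5, cs=1.0, ps=0.5, rng=None):
    """Ages at the first period for one cohort (eqs. 4a-5): a truncated
    normal whose mean and bounds shift by Cs per cohort, with bounds
    mu +/- Ps so the period age distributions barely overlap."""
    if rng is None:
        rng = np.random.default_rng()
    shift = cs * (cohort - 1)
    mu_k, a, b = mu + shift, mu - ps + shift, mu + ps + shift
    za, zb = (a - mu_k) / sigma, (b - mu_k) / sigma
    u = rng.uniform(size=n)
    return mu_k + sigma * norm.ppf(norm.cdf(za) + u * (norm.cdf(zb) - norm.cdf(za)))

def age_at_period(age1, period, ps=0.5):
    """Age at a later period: Age_Np = Age_1 + Ps * (Np - 1) (eq. 6)."""
    return age1 + ps * (period - 1)
```

Because the bounds shift together with the mean, every cohort's starting ages have the same truncated-normal shape, just displaced by Cs per cohort, which is what keeps the age error distribution equivalent across periods and cohorts.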
3.2.6 The use of APC methods for estimating ALDs
Various methods have been proposed to analyze APC models, of which the ALD is a
specific case. Because of the linear dependency between age, period, and cohort (Age=Period-
Cohort), historically researchers have recommended the exclusion of one of these factors (e.g. a
two-factor model) when modeling age-related development in multiple cohorts (Baltes, 1968;
Baltes & Nesselroade, 1970; Baltes, 1972). While exclusion of one factor is a convenient solution to breaking the linear dependency, in instances where it is believed that each factor carries a distinct causal effect, the exclusion of any single factor will bias the remaining parameter estimates. Early attempts at estimating unique effects for age, period, and cohort
advocated for the use of multiple classification analysis to fit multiple three-factor models with
equality constraints set between two levels of one of the factors (e.g. Period1=Period2;
Age1=Age2, etc.) (Mason et al., 1973). While these constraints allow for a single model to
estimate each of the APC contributions, the solutions provided by these models are not unique,
such that model fit cannot be used to discern between differently constrained models (O'Brien,
2014). Additionally, the application of these equality constraints is not grounded in theory but rather borne of a desire to solve the APC identification problem. Given that the model
estimates are sensitive to the choice in constraints (Mason & Smith, 1985) this approach
sacrifices achieving the best estimate of growth for the ease of modeling. Moreover, the additive
nature of these models ignores potential cohort-by-age and period-by-age interactions, which are known to exist in many substantive applications (Glenn, 1976). Other approaches such as
nonlinear transformation of one of the factors (Mason & Fienberg, 1985), the use of proxy
variables (Smith, Mason, & Fienberg, 1982), the creation of estimable functions (Holford,
1983), or even the use of principal components to create an intrinsic estimator (Yang, Fu, &
Land, 2004; Yang & Land, 2016) have been explored; though each carry drawbacks (Mason and
Wolfinger, 2001). As an alternative to these models Yang & Land (2006; 2008) and O'Brien,
Hudson, and Stockard (2008) proposed using an age-period-cohort mixed model approach using
maximum likelihood estimation for aggregate data in the form of:
yij = µ + αi + πj + υ0k + εij eq. 7
which models fixed effects of age (αi) and period (πj), with cohort (υ0k) random effects. While the
APC mixed model (APCMM) (or hierarchical APC as named by Yang) was designed as a solution
for the APC identification problem, it must be adapted when used in an ALD context, as the
ALD features repeated measurements within-person. As such, the equation can be rewritten to
include random variation in intercept at the person level, named the accelerated longitudinal
design mixed model (ALDMM):
yijk = µ + αi + πj + υ0i + υ0k + εijk eq. 8
Variations of the model can be employed where period effects are instead considered random, or perhaps with both cohort and period as crossed random effects (called the hierarchical APC; Yang & Land, 2016). For the simulations in this chapter, given that the data
generating mechanism assumes no between-cohort differences in intercept or slope (and no
period influences), we can utilize a revised version of the model which only models a random intercept per person and a random slope for age common across the cohorts (eq. 9).
yijk = β0 + β1xijk + υ0j + υ1j xijk + εijk eq. 9
From this equation (eq. 9) it can be seen that there is a fixed effect (β1) for age (x) which does not
vary between cohorts but does vary between subjects (υ1j xijk). The starting level (β0) is dependent
upon the variation in these levels between subjects (υ0j). No correlation between the random intercept and random slope is modeled, consistent with the covariance of 0 used in the simulations. As mentioned above when discussing the
GCR, the residual variance is assumed homoscedastic and normally distributed. While this is
true for these simulated data, should these models be fit to data that violate these assumptions,
this model would likely prove too conservative in its estimation. This ALDMM will be the
primary model used in the simulations for this chapter and is the basis of the models used in
subsequent chapters.
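Under eq. 9 with the chapter's default variance components, one outcome draw per subject-period can be generated as below. This is a numpy sketch of the data-generating step only (the model fitting in this dissertation is done in Stata), and the function name and defaults are mine:

```python
import numpy as np

def simulate_aldmm(ages, b0=0.0, b1=2.0, var_u0=16.0, var_u1=16.0,
                   var_eps=16.0, rng=None):
    """Generate outcomes under eq. 9: y = b0 + b1*x + u0_j + u1_j*x + e.
    `ages` is an (n_subjects, n_periods) array; the intercept and slope
    random effects are uncorrelated and residuals are homoscedastic
    normal, matching the chapter's simulation assumptions. The default
    variances give GCR = 0.5 and delta = b1 / sd(u1) = 0.5 at age coded 0."""
    if rng is None:
        rng = np.random.default_rng()
    n, p = ages.shape
    u0 = rng.normal(0, np.sqrt(var_u0), size=(n, 1))   # person intercepts
    u1 = rng.normal(0, np.sqrt(var_u1), size=(n, 1))   # person slopes
    eps = rng.normal(0, np.sqrt(var_eps), size=(n, p))  # residuals
    return b0 + (b1 + u1) * ages + u0 + eps
```

At the baseline age (coded 0), the marginal variance should equal σ²u0 + σ²ε = 32 under these defaults, which provides a quick sanity check on any simulated data set.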
3.3 Statistical power of ALDs with no cohort differences
In order to begin comparing statistical power between the ALD and SCD, we must first
identify which designs we will compare. For all of the following simulations, we will examine
designs that cover a 10-year age span. Table 3.1 below shows how the values for Cn, Cs, Pn, and
Ps were varied in coming up with these designs.
Table 3.1: Variation in design parameters for simulation
Design Parameter Range By Increment
Cn 2 to 5 By 1
Cs 0.5 to 4 By 0.5
Pn 2 to 20 By 1
Ps 0.5 to 4 By 0.5
The variation of these parameters results in 4,864 unique designs, of which 81 had coverage values of 10 years. Designs were additionally excluded if the number of measurements in the equivalent single-cohort design would be a non-integer (N=22) or if the overlap proportion was less than 0 (N=7), resulting in 52 possible designs to be initially explored. This first round of simulations was designed to compare the ALD to the SCD under circumstances that were optimal for the SCD (i.e. no cohort differences). In this manner, we can
get a sense of how well the ALD may perform even when the design should favor the SCD.
These conditions favoring the SCD are: no period or cohort differences, no attrition, and equal N
between the SCD and ALD. Each design was simulated 1,000 times with a total N of 60 utilizing
a linear growth model with GCR=0.5 and growth rate (δ) = 0.5.
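The design counts above can be reproduced with a short script. It computes coverage as Cs(Cn − 1) + Ps(Pn − 1), the equivalent-SCD measurement count as coverage/Ps + 1, and the overlap proportion as 1 − Cs/(Ps(Pn − 1)); these formulas are my reading of the Chapter 2 definitions, and half-unit integers keep the arithmetic exact:

```python
from itertools import product

# Work in half-units (1 unit = 0.5 years) so all arithmetic stays exact.
CN = range(2, 6)    # number of cohorts: 2 to 5
CS = range(1, 9)    # cohort interval spacing: 0.5 to 4.0 by 0.5
PN = range(2, 21)   # number of periods: 2 to 20
PS = range(1, 9)    # period interval spacing: 0.5 to 4.0 by 0.5
COVERAGE = 20       # 10 years of age-span coverage, in half-units

designs = list(product(CN, CS, PN, PS))
ten_year = [(cn, cs, pn, ps) for cn, cs, pn, ps in designs
            if cs * (cn - 1) + ps * (pn - 1) == COVERAGE]

# Keep designs whose equivalent-SCD measurement count (coverage/Ps + 1)
# is an integer, then those with a non-negative overlap proportion
# (1 - Cs / (Ps * (Pn - 1)) >= 0, i.e. Cs <= Ps * (Pn - 1)).
int_scd = [d for d in ten_year if COVERAGE % d[3] == 0]
usable = [d for d in int_scd if d[1] <= d[3] * (d[2] - 1)]
```

Under these definitions the script recovers the counts reported above: 4,864 designs in the grid, 81 with 10-year coverage, 22 dropped for a non-integer SCD measurement count, and 7 for negative overlap, leaving 52.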
3.3.1 Differences in power between the SCD and ALD
Statistical power in the SCD is almost universally higher than in the ALD (Figure 3.3)
and is related to the number of measurements in the equivalent SCD design such that a higher
number of measurements yields greater power in the SCD.
Figure 3.3: Relationship between ALD and SCD power
Given that for these simulations the study coverage is fixed at 10 years, the number of
measurements in the SCD is entirely determined by the Ps, where smaller period interval spacing
results in more measurements in the SCD, and thus greater SCD power. On average in these 52
designs, the ALD power was 52% while the SCD power was 82%. Despite these large differences, as we can see in the figure, there was wide variability in ALD power, with some designs equaling the power of the SCD.
3.3.2 Power in the SCD and ALD and the number of measurements
While the variability in power between SCDs is mostly determined by the number of
measurements, there is a high degree of variability in the power of the ALD for a given SCD. For
some designs, the difference in power between the SCD and ALD is minimal and may be related
to the number of cohorts and ratio between the cohort and period interval spacings (Figure 3.4).
Figure 3.4: For a given SCD, smaller Cs/Ps Ratios minimize power differences with the ALD
For SCDs with 21 measurements, the equivalent ALD will have statistical power similar to the
SCD when the Cs to Ps ratio is smaller. Increases to this ratio increase the differences in power
(favoring the SCD) and this rate of change is increased as the number of cohorts increases, thus a
smaller number of cohorts is preferable to minimize power differences from the SCD. Indeed, the 2- and 3-cohort designs were within 3% power of the SCD at the lowest Cs/Ps ratio, and the 4- and 5-cohort designs within 10%. This finding concurs with those of Moerbeek (2011) and Galbraith et al. (2017), who showed that, at a fixed number of subjects, designs with a greater number of cohorts were less statistically powerful. This occurs because we have fixed the amount of study coverage
and then varied the parameters. Given that we have fixed both the coverage to 10 (i.e. age span)
and the Ps to 0.5 (SCD M=21), the changes to the cohort spacing and number of cohorts reflect
changes to the number of periods in the ALD. Moreover, as we recall from Chapter 2, the
number of periods (Pn) is inversely related to the design efficiency.
Figure 3.5: Power in the ALD is related to the number of periods (Pn)
From Figure 3.5 we see that much like the SCD, the power in the ALD is related to the number
of measurements taken (panel A), with higher values for Pn increasing the power. For each
additional measurement, the power in the ALD in this exemplar increases ~5%. As a result, the
design efficiency is inversely related to power (panel B), as less efficient designs will more
closely approximate the SCD, and thus have higher statistical power. These findings closely
align with those of both Moerbeek (2011) and Galbraith et al. (2017) who found that greater
statistical power was achieved for designs with greater numbers of measurements when the total
sample size was fixed.
3.3.3 Effects of Period Spacing on ALD Power
Similarly, Moerbeek (2011) noted that for a fixed number of measurements, designs with less frequent observation (i.e. larger period spacing (Ps)) were more powerful. Hints of
this were seen for the SCD above in Figure 3.3, which showed differences in power based on the
number of measurements as determined by the Ps. In Figure 3.6 below, we can see how the total
number of measurements relates to statistical power at various period interval spacing for the
ALD.
Figure 3.6: Less frequent measurement (i.e. higher Ps) will increase power for the same number
of measurements
The rate of increase in power per measurement is faster with less frequent measurement. This has implications for the cost of these designs: for example, ~60% power can be achieved with a Ps of 2 using 300 measurements, or with a Ps of 0.5 using 780 measurements, more than double the number required. As such, designs with less frequent measurement will
provide greater statistical power for the same number of measurements.
In the scenarios proposed above, increases to the number of periods (Fig 3.5) or the
period interval spacing (Fig 3.6) can increase the power of these designs. In these exemplars,
where the coverage is fixed, changes to the period spacing will result in changes to the number
of periods (as well as cohort interval spacing) in order to accommodate the study coverage. As a
result, changes to either the period spacing or number will mirror each other (Figure 3.7).
Figure 3.7: Changes to Period Spacing and Period Number are equivalent
In Figure 3.7 we can see the rank ordering of the period spacing and period number and how
increases in one parameter are offset by decreases in the other. Given this, for the
researcher with a specific age-span coverage in mind, changes to period spacing or number will
offset one another. In most circumstances we would expect that the researcher, having some
notion of how frequently measurement needs to occur in order to capture change (Ps), would
choose to fix Ps and then modify the parameters of Cs and Cn in order to achieve the appropriate
coverage. As such, we will primarily focus on the impact of the number of cohorts and cohort
interval spacing on the power of these designs.
3.3.4 The impact of Cohort parameters on ALD Power
The number of cohorts (Cn) and cohort interval spacing (Cs) impact the ability of an ALD
to reduce the study length in order to accomplish the same amount of age-span coverage. As we
recall from Chapter 2, increases in the number of cohorts are offset by decreases in the number of
measurements per subject (Pn) in order to cover the same age-span in less time. The amount of
cohort interval spacing, relative to the period interval spacing and the number of periods, will
determine the amount of overlap between two successive cohorts. Prior to examining the
individual parameters of Cn and Cs, we will first look at the impact of their downstream
influences on length and overlap.
Figure 3.8: Study length and overlap are related to Power
Increases to the study length or the amount of overlap between cohorts will increase the power of
the ALD (Figure 3.8). For study length (panel A), designs with greater duration will have
greater power. This aligns with the findings of Galbraith et al. (2017) that showed non-linear
increases in power as duration increased for a fixed total N. The reason why this occurs should
be clear in light of previous results showing the associations between the number of
measurements and power (Figs. 3.5 & 3.6). When the study coverage is fixed, designs with a
longer duration will be ones that more closely approximate the SCD and will necessitate a larger
total number of measurements than shorter duration designs. In panel A, we also note that for a
fixed duration, designs with more frequent measurement (smaller Ps) will be more powerful. This
occurs, again, because of the resulting impact on the total number of measurements. There are no
noticeable differences in power between designs of differing number of cohorts, as in order to
achieve the same duration with a greater number of cohorts, the cohort interval spacing is simply
reduced. As a result, the total number of measurements between cohorts is the same for a given
period spacing and duration, resulting in similar power values.
For the proportion of overlap between cohorts (Figure 3.8, panel B), larger values are
associated with greater power. This is analogous to the examination conducted in Figure 3.3, as
low Cs/Ps ratios correspond to greater overlap between the cohorts. There is a non-linearity to
this increase such that the rate of power increase is greater when there is a higher degree of
overlap and this degree of non-linearity is more pronounced for designs with a greater number of
cohorts (Cn) and less frequent measurement (Ps). For a fixed value of overlap, designs with
fewer cohorts and more frequent measurement will be more statistically powerful, as these
designs will necessitate a greater number of measurements for the same amount of overlap. For a
given cohort number and period spacing, the increases in overlap, and subsequently power, are directly related to the decline in the cohort spacing (Cs), resulting in increases to Pn to cover the
same age spans. Thus, the non-linearity observed between overlap and power is primarily driven
by differences in the number of measurements which can be seen when we replace the y-axis in
Figure 3.8 panel B with the number of measurements instead of power in Figure 3.9 below.
Figure 3.9: The power-overlap association is driven by the total number of measurements.
While both Moerbeek (2011) and Galbraith et al. (2017) noted greater power with greater
overlap, neither noted that this association was nonlinear, likely due to limitations in the range of
overlap explored; nor that this association was largely driven by resulting changes to the total
number of measurements. Given that differences between designs with regard to overlap and
length are largely a function of the total number of measurements for a fixed N, we will examine
the remaining analyses with respect to both the number of subjects and the number of
measurements. Figure 3.10 can be used to understand the role of the number of cohorts and
cohort spacing on statistical power in the ALD.
Figure 3.10: The number of Cohorts and Cohort Interval Spacing on Power
As the cohort interval spacing (Cs) increases for a fixed frequency of measurement (Ps), the
statistical power will decrease. This aligns with our findings from Figure 3.3 which showed that
as the Cs/Ps ratio increases (i.e. less overlap) the statistical power decreases. This effect is greater
in designs with a greater number of cohorts because changes to the Cs reduce the number of
measurements by a factor of Nc*(Cn-1), thus a larger number of cohorts will see a greater
reduction in the number of measurements (and hence power) compared to a smaller number of
cohorts. When Cs is fixed (e.g. at 2) and Ps is modulated, we find that less frequent measurement
(higher Ps) increases the rate of change for the power curve. Thus, an overarching goal for design
would be to maximize the Ps while minimizing the Cs (i.e. have smaller Cs/Ps ratios). There are
of course practical considerations, as designs with Cs/Ps ratios less than 1 will minimize the
ability of the ALD to reduce the study length relative to an SCD (see Chapter 2).
Overall, we find that statistical power is greatest for a smaller number of cohorts. Why
this might occur should be apparent, as with fewer cohorts a greater proportion of the trajectory
is able to be explained by any single cohort. Indeed, as the number of cohorts decreases, the
number of measurements per cohort will increase to ensure the same age-span coverage. This
will increase the amount of study overlap between cohorts and reduce the design efficiency.
When there is a large number of cohorts, each cohort can explain only a small portion of the
overall trajectory as a result of fewer measurements per subject. This finding regarding cohorts
is contrary to the findings of Moerbeek (2011) and Galbraith et al. (2017), which showed that
fewer cohorts were preferable when the total sample size was fixed, but that more cohorts
yielded greater statistical power when the total number of measurements was fixed. To see why
this might be, we can consider the exemplar in Figure 3.11 which shows the power curves
between designs of different Cn across sample sizes (panel A) and number of measurements
(panel B) for scenarios where the overlap is high. The remaining simulations in this chapter will
examine scenarios where there is a high degree of overlap (Cs/Ps=1) as these provide the most
statistically powerful ALDs while still allowing for them to have modest design efficiency in
covering the age span over a shortened duration.
Examining these power curves as a function of both the sample size and the number of measurements allows flexibility for the researcher looking to create a design that minimizes either the total sample size or, alternately, the number of measurements.
Figure 3.11: Statistical power between number of cohorts by sample size and number of
measurements
We find that fewer cohorts yield the greatest power for both a fixed number of subjects (panel A) and a fixed number of measurements (panel B). The discrepancies between these simulations and those of Moerbeek (2011) and Galbraith et al. (2017) may have to do with the
differences in the growth curve reliability that was specified. As we will see in Section 3.3.5,
higher values of GCR can yield very different power curves for a fixed number of measurements
across the cohorts. Thus, the relationships specified above may be dependent on the GCR specified, which Moerbeek (2011) did not note. Table 3.2 below shows sample sizes (N) and
total number of measurements (M) needed to achieve 80% and 90% power under this design
where the overlap is high (Cs/Ps=1).
Table 3.2: Sample Sizes and Number of Measurements for achieving power with high overlap
Number of Cohorts (Cn)   N for 80% Power   M for 80% Power   N for 90% Power   M for 90% Power
1                        56                620               80                883
2                        67                668               91                905
3                        82                740               114               1030
4                        106               849               144               1156
5                        137               958               180               1263
As we can see, the differences between the sample sizes and number of measurements for the
SCD versus the ALD are slight for a smaller number of cohorts and show that while power loss
is present when using the ALD, the loss is minimal at larger sample sizes. Specifically, the power
differences between designs with different number of cohorts are more pronounced as the sample
size (Figure 3.12, panel A) or number of measurements (panel B) increases up until the point on
the power curve where the rate of change is greatest. After this inflection point, the differences
between the cohorts diminish.
Figure 3.12: Power differences as a function of number of cohorts
3.3.5 The role of GCR and Growth Rate on Power
The growth curve reliability and slope effect size both impact the power of these designs.
While the power curves in the preceding sections are derived from fixed values of GCR and
growth rate (δ) equal to 0.5, we can also examine how these power curves shift when the GCR or
growth rate is modulated. In Figure 3.13, we can see how increases in GCR from 0.5 to 0.7 and
0.9 result in increased power for a fixed sample size (panel A) or total number of measurements
(panel B).
Figure 3.13: Power curves are dependent on the growth curve reliability of the design
Interestingly, while previously we showed that designs with more cohorts had less power for a
fixed total number of measurements (Figure 3.11), Figure 3.13 panel B shows that this was a
function of the GCR. This potentially explains the previous discrepancies with Moerbeek (2011)
who showed that for a fixed number of measurements statistical power was greater with more
cohorts. Moerbeek employed GCR values greater than 0.65 while our previous simulations have
been using a GCR value of 0.5. For a fixed number of measurements, smaller GCR values yield
an inverse relationship between statistical power and the number of cohorts. However, as the
reliability of the growth curve increases (higher GCR), the relationship between power and number of
cohorts becomes positive, such that a greater number of cohorts yields greater power.
This can also be seen in Table 3.3 below, which shows the sample size and number of
measurements for achieving 80% and 90% power based on the GCR and number of cohorts used.
Table 3.3: Sample Sizes and Number of Measurements for achieving power with high overlap
and high GCR

Number of Cohorts (Cn) | GCR | N for 80% Power | M for 80% Power | N for 90% Power | M for 90% Power
1 | 0.7 | 43 | 471 | 61 | 668
2 | 0.7 | 48 | 481 | 64 | 641
3 | 0.7 | 53 | 477 | 70 | 630
4 | 0.7 | 60 | 479 | 83 | 666
5 | 0.7 | 77 | 540 | 109 | 760
1 | 0.9 | 35 | 388 | 46 | 502
2 | 0.9 | 37 | 367 | 51 | 512
3 | 0.9 | 39 | 353 | 53 | 473
4 | 0.9 | 39 | 308 | 56 | 447
5 | 0.9 | 40 | 279 | 59 | 416
Moreover, at higher values of growth curve reliability, the between-design differences in
power are minimized. Thus, while a larger number of cohorts provides the best power curve
when GCR is high and the number of measurements is fixed, the differences between the power
curves derived from different numbers of cohorts are minimized at this high GCR (Figure 3.14).
Figure 3.14: Higher GCR minimizes between Cn Power Differences
Changes to the GCR impact designs with a greater number of cohorts the most. As seen
in Figure 3.15, the amount of change in power relative to the power curve when GCR was 0.5
was greatest for larger-cohort designs when the GCR increased to 0.7 or 0.9. This was true both
when the number of subjects was fixed (panel A) and when the number of measurements was fixed (panel
B). This mirrors the findings of Moerbeek (2011), who showed that increases in the GCR from
0.65 to 0.90 increased the power and that these increases were more pronounced in the designs
with a greater number of cohorts.
Figure 3.15: Changes to the GCR have a greater impact on power for designs with a greater
number of cohorts.
These results suggest that designs with a greater number of cohorts will be more reactive
to changes in the reliability of the growth curve. In Table 3.4 below, we see that at a high GCR
of 0.9, the maximal increase in power for the SCD is only ~23% while the 5-cohort design
increases by ~47%.
Table 3.4: Maximal increases in Power from a GCR of 0.5

Number of Cohorts (Cn) | ΔPower for GCR=0.7 | ΔPower for GCR=0.9
1 | 0.146 | 0.229
2 | 0.181 | 0.268
3 | 0.205 | 0.303
4 | 0.240 | 0.393
5 | 0.233 | 0.467
Moreover, the shape of the power curves in Figure 3.15 suggests that while increases in the
sample size or number of measurements will accentuate differences between different GCRs,
these differences are minimized in terms of proportional change as the number of subjects or
measurements increases (panels C & D).
When modifying the growth rate (δ), defined as the rate of change (i.e. slope) over its
standard deviation, we unsurprisingly find that increases in the growth rate result in increases to
the power curve (Figure 3.16). This increase is more pronounced in instances where the number
of measurements is fixed (panels B and D). Unlike the GCR, a greater effect size does not
change the relative ordering of the power curves based on the number of cohorts in the design.
Instead, we find that the ordering by number of cohorts is still dependent on changes to the GCR
which can be seen by comparing the curves in panels A and B (where GCR=0.5) with those of
panels C and D (where GCR=0.9). It should also be noted that while the effect sizes used at the
low GCR of 0.5 were 0.5, 1.0, and 1.5, at the high GCR of 0.9 these effect sizes were
reduced to 0.5, 0.75, and 1.0.
Figure 3.16: Increased growth rate (δ) increases power
Table 3.5 below also shows the sample sizes needed to achieve 80% and 90% power for these
designs at the various effect sizes reported for both a GCR of 0.5 and 0.9. These can be
compared to the values for when the effect size was 0.5 in Tables 3.2 and 3.3.
Table 3.5: Required Sample Sizes to achieve 80% and 90% power at various effect sizes

Number of Cohorts (Cn) | Effect Size (GCR=0.5) | N for 80% Power | N for 90% Power | Effect Size (GCR=0.9) | N for 80% Power | N for 90% Power
1 | 1.0 | 63 | 80 | 0.75 | 35 | 46
2 | 1.0 | 72 | 92 | 0.75 | 37 | 51
3 | 1.0 | 85 | 114 | 0.75 | 39 | 53
4 | 1.0 | 106 | 120 | 0.75 | 39 | 56
5 | 1.0 | 120 | 120 | 0.75 | 40 | 59
1 | 1.5 | 16 | 20 | 1.0 | 10 | 12
2 | 1.5 | 17 | 24 | 1.0 | 10 | 13
3 | 1.5 | 22 | 28 | 1.0 | 10 | 15
4 | 1.5 | 27 | 36 | 1.0 | 10 | 16
5 | 1.5 | 36 | 50 | 1.0 | 10 | 16
The interplay between the GCR and effect size (δ) can be difficult to understand. As we saw in
Figure 3.15, when the effect size is held constant, increases to the GCR are more impactful for
designs with a greater number of cohorts; however, this change is diminished for larger effect
sizes (Figure 3.17, panels A and B), indicating that within-design differences in power due to GCR are
minimized at large effect sizes.
Figure 3.17: Interplay between GCR and Growth Rate (δ)
When the GCR is fixed and the differences in power are examined for going from a smaller
effect size (δ=0.5) to a larger one (δ=1.0), a similar pattern is observed. For a fixed sample size,
changes to effect size are more impactful for designs with a greater number of cohorts, and this
impact is reduced at higher GCR (Figure 3.17, panel C). However, when the total number of
measurements is fixed, we note that while changes to the effect size are more impactful on
designs with a greater number of cohorts at a lower GCR (0.5), the reverse occurs at high GCRs
(0.9), such that a change in effect size has a greater impact on designs with a smaller number of
cohorts (panel D).
Overall, these results highlight that modifying the GCR or effect size has a differential impact on
statistical power and that these two parameters affect the relative ordering of the power curves for
designs with differing numbers of cohorts. This is of great importance, as prior work by Moerbeek
(2011) showed that designs with a greater number of cohorts were the most powerful for a fixed
number of measurements. These simulations show
that this is not universally true and is dependent on the GCR. When the total sample size is fixed,
designs with a smaller number of cohorts are the most powerful. Increases to the GCR or growth
rate will increase the power, and changes to either will impact larger-cohort designs more, though
this effect diminishes as either the GCR or the growth rate increases. When the number of
measurements is fixed, designs with a small number of cohorts will be most powerful when the GCR
is small. As the GCR increases, the power differences between the cohorts are reduced and
designs with a greater number of cohorts become the most powerful. Increases to the growth rate
will also increase the power of these designs at a fixed number of measurements; however, the impact of
increasing the effect size differs across designs with different numbers of cohorts. When the GCR
is low, increases to the growth rate affect the power of larger-cohort designs more, while at higher
GCRs, the impact is greater for designs with a smaller number of cohorts.
3.3.6 Attrition and power loss in the ALD and SCD
In the previous sections, the examination of statistical power has been conducted under
situations of no attrition. Power analysis in the absence of attrition will yield the greatest
statistical power for an ALD or SCD; however, this does not mirror the realities of research
where subject attrition is common. In Figure 3.18 below, we can see how varying levels of
attrition (w) for a fixed gamma (γ=1) impact the power curves for these various designs. The
gamma value of 1 in these curves indicates a scenario where missing data are uniformly
distributed as Missing Completely at Random (MCAR). With our typical setup, panel A shows
the power curves for no attrition, 15% attrition, and 30% attrition as a function of sample size. It
is important to note that under the Weibull model used to define attrition (see section 2.3 in
Chapter 2), the sample size does not vary with the amount of attrition; rather, attrition reduces the
number of measurements for a given subject.
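The attrition process just described can be sketched per subject. The parametrization below, in which the cumulative probability of having dropped out by period t is w·(t/T)^γ, is an assumption consistent with the behavior described in the text (overall attrition equals w, γ = 1 spreads dropout evenly, higher γ concentrates it at the end of the study); it is not the dissertation's actual simulation code (ALDSIM).

```python
import random

def completed_measurements(pn, w, gamma, rng):
    """Simulate the number of measurements a subject completes under attrition.

    Hedged sketch of a Weibull-type dropout process: the cumulative probability
    of having dropped out by period t (of pn total) is taken to be
    w * (t / (pn - 1)) ** gamma. Overall attrition therefore equals w,
    gamma = 1 spreads dropout uniformly over the study, and larger gamma
    pushes dropout towards the end of the study.
    """
    u = rng.random()
    if u >= w:                               # this subject never drops out
        return pn
    # invert F(t) = w * (t/(pn-1))**gamma to locate the dropout time
    t = (pn - 1) * (u / w) ** (1.0 / gamma)
    return min(int(t) + 1, pn)               # measurements observed before dropout
```

With pn = 11 and w = 0.30, raising γ from 1 to 3 leaves the 30% dropout rate unchanged but shifts dropout later in the study, so more measurements per dropper are retained, which is the mechanism behind the power gains seen for higher γ.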
Figure 3.18: Power curves for attrition when data are MCAR.
For both the total number of subjects (panel A) and number of measurements (panel B), we
unsurprisingly find that designs with greater attrition will have less power at the same sample
size or number of measurements. Power differences between attrition levels are greater for a
fixed sample size (panel A) than for a fixed total number of measurements (panel B) and show that
designs with a greater number of cohorts are less sensitive to power changes as a result of
attrition. When attrition is fixed, but gamma values are varied (Figure 3.19), we find that as
attrition occurs more at the end of the study (i.e. higher gamma) the power is greater for all
designs.
Figure 3.19: Power curves for gamma when attrition is fixed.
Given that the designs being simulated are linear, it is not surprising that concentrating attrition
towards the end of the study would maximize power, as much of the trajectory would be able to
be explained by the earlier non-missing periods of measurement. The differences in power
between gamma values are slight and, taken together with the findings regarding attrition, suggest
that it is most desirable to have a small amount of attrition concentrated towards the end of the
study. Naturally, these results only extend to the limited circumstances they represent. Prior work
on attrition patterns by Timmons and Preacher (2015) has shown that attrition patterns for single-
cohort designs with linear growth are most powerful when attrition is concentrated towards the
center of the age distribution, as the power of the linear model is mostly determined by its anchor
points at the extremes of the distribution. In practice however, this pattern of attrition is not
likely to arise naturally.
When examining power-loss, defined as the difference in power between designs with
and without attrition, designs with a smaller number of cohorts experience greater power loss
(Figure 3.20) which is increased through greater attrition (w) or through decreased gamma
favoring uniform dispersion of missing data (panels A & B vs C & D).
Figure 3.20: Power-Loss as it pertains to number of cohorts (Cn), attrition (w), and gamma (γ)
The results from the power-loss figures largely confirm those from Figures 3.18 and 3.19,
showing that designs with less attrition and higher gamma are more powerful and subject to less
power loss. Moreover, these figures show how the SCD is subject to a greater amount of power-
loss at the same amount of attrition or gamma relative to the ALDs. Despite this, the maximal
power-loss in the ALDs in these scenarios does not differ substantially from that of the SCD,
particularly among ALDs with a smaller number of cohorts.
Table 3.6: Maximal Power loss as a function of gamma and attrition

Number of Cohorts (Cn) | Gamma (γ) | Power-Loss at w=15% | Power-Loss at w=30%
1 | 1 | -0.073 | -0.121
2 | 1 | -0.068 | -0.118
3 | 1 | -0.064 | -0.131
4 | 1 | -0.063 | -0.085
5 | 1 | -0.069 | -0.100
1 | 3 | -0.053 | -0.069
2 | 3 | -0.047 | -0.063
3 | 3 | -0.039 | -0.062
4 | 3 | -0.038 | -0.054
5 | 3 | -0.026 | -0.036
For designs where attrition is uniform, the differences in power for the SCD compared to the
ALD are within 2% regardless of the attrition amount (see Table 3.6). It is only when the
attrition is concentrated towards the end of the study that noticeable differences emerge, and only
then primarily for the larger-cohort designs. The 5-cohort design, for example, shows a halving
of the power-loss experienced by SCD when attrition is concentrated towards the end of the
study. Part of the reason why these particular ALDs do not show such great discrepancies in
power-loss compared to the SCD is that their high overlap results in a study duration that is not
much shorter than that of the SCD presented. For example, the 2-cohort design has only 1 less year
of measurement than the SCD. Regardless, these results seem to indicate that how the attrition is
distributed may matter more than the amount of attrition in these linear models.
Although Galbraith et al. (2017) showed that the amount of power-loss relative to the
same design without attrition was related to study duration, with designs of greater
duration experiencing the most power loss, we find that when the study age-span
coverage is fixed, this relationship depends on other design features related to the position on the power
curve. For example, when examining the power curves from Figure 3.18 above, we would find
for the single-cohort designs that for very small or very large N the power loss is minimized,
whereas in the central portions of the power curve the power loss would be greater. Despite
this, the length of the design is fixed for a given number of cohorts and varies only
through the reduction in length caused by the addition of cohorts. Thus, at any given point on the
power curve, the power-loss/length association is a function of the power for a given number of
cohorts, and between-length differences will depend on where along the power curve a plot
of loss versus length is created. For example, while power-loss is more pronounced for lower values
of gamma and higher values of attrition, the association with study length will vary as a result of
the sample size chosen (Figure 3.21).
Figure 3.21: Association of Study Length with Power Loss varies
In panel A, the sample size at N=60 shows a pattern similar to that described by Galbraith et al.
(2017), with increased duration resulting in greater power loss. However, as we can see in panel
B, this association is different when examining these values at a fixed sample size of 100. As
explained above, this occurs as a result of where on the power curve these values are taken from.
While all curves will show a pattern of minimal loss, maximal loss, and minimal loss again, this
pattern will occur more slowly for designs with a greater number of cohorts.
One design aspect that is likely to affect these power curves in the presence of attrition is
the period interval spacing (Ps). Designs with a greater frequency of measurement (lower Ps) will
have more periods and hence may be more robust to the loss in measurements. As seen in Figure
3.22, increasing the frequency in measurement increases power for a fixed sample size but
decreases the power for a fixed total number of measurements under attrition.
Figure 3.22: Changes to frequency of measurement impact power under attrition.
As explained previously, this results from the tradeoffs between Ps, Pn, and the sample size such
that more frequent measurement for a fixed N results in a greater number of measurements and
hence more power. For a fixed number of measurements, the reverse occurs, resulting in a
smaller sample size at the same number of measurements when the frequency of measurement is
increased. While the same pattern of power changes is observed for changes to the period
spacing under attrition as without attrition, the effect of changing the period interval spacing is
more pronounced when attrition is present. In Figure 3.23 the change in power for decreasing the
frequency of measurement from 0.5 to 1.0 is plotted as a function of sample size and number of
measurements for scenarios with and without attrition.
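The trade-off among Ps, Pn, N, and total M described here reduces to simple arithmetic. The sketch below is illustrative, for a single cohort with fixed age-span coverage, and the specific values (10-year span, N = 60, M = 660) are assumptions chosen to match the scale of the designs discussed in this chapter.

```python
def periods_per_subject(age_span, ps):
    """Pn for a cohort covering `age_span` years, measured every `ps` years."""
    return int(age_span / ps) + 1

age_span = 10.0

# Fixed N: more frequent measurement (lower Ps) yields more total measurements,
# and hence more power.
n = 60
for ps in (1.0, 0.5):
    pn = periods_per_subject(age_span, ps)
    print(f"Ps={ps}: Pn={pn}, total M={n * pn}")

# Fixed M: more frequent measurement implies fewer subjects at the same M,
# and hence less power.
m = 660
for ps in (1.0, 0.5):
    pn = periods_per_subject(age_span, ps)
    print(f"Ps={ps}: Pn={pn}, implied N={m / pn:.1f}")
```

Halving Ps from 1.0 to 0.5 nearly doubles Pn (11 to 21), so for fixed N the total measurement count nearly doubles, while for fixed M the implied sample size is nearly halved.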
Figure 3.23: Changes to Period Spacing (Ps) are more impactful on power when attrition is
present.
The change in power is more impactful for these ALDs when there is greater attrition (w).
For a fixed sample size (panel A), this results in a greater relative increase in power compared to
the design without attrition. These increases are also greater when the number of cohorts is
greater. For a fixed number of measurements, the same pattern is observed, with losses in power
minimized at higher levels of attrition (panel B). Changes to gamma (γ) do not demonstrate a
consistent pattern, as at lower levels of attrition (w=0.15) the differences between gamma values
are minimal. When attrition is higher (w=0.30), shifts in gamma favoring attrition towards the
end of the study show greater power gains/less power-loss for both designs. Overall, the benefits
of modifying the frequency of measurement are amplified when attrition is present and occurring
towards the end of the study. This indicates that less frequent measurement (higher Ps) is even
more desirable when attrition is present and the number of measurements is fixed. When the
number of subjects is fixed, this indicates that more frequent measurement is desirable under
attrition. This aligns with our expectations, namely that a greater number of measurements (lower
Ps) would decrease the proportion of measurements removed from the ALD, thus allowing for
greater power. While this assumption that designs with more frequent measurement have fewer
measurements removed was correct (Figure 3.24, panel A), the proportion of measurements
removed did not differ greatly between designs with different period spacings (panel B).
Figure 3.24: Changes in Period Spacing (Ps) have minimal impact on proportion of
measurements excluded.
Overall, the findings regarding attrition indicate that designs with a low level of attrition
concentrated towards the end of the study will be more powerful. The loss in power as a result of
attrition is minimized at the same sample size for designs with more cohorts, though designs with
more cohorts still have the least power for a given sample size. While Galbraith et al. (2017)
showed an association of longer study durations leading to greater power loss, this was more of a
function of the differences between designs with different numbers of cohorts when the study
age-span coverage is fixed, as it is in these exemplars. That said, if the same design were utilized
with the coverage increasing through a greater number of periods (Pn), we would expect greater
study length to be associated with greater power-loss. Lastly, increasing the frequency of measurement
when the sample size is fixed or decreasing the frequency of measurement (increasing the Ps)
when the number of measurements is fixed will have a greater impact on increasing the power
curves when attrition is present.
3.3.7 Equal cost sample size and the boosting of power in the ALD
One of the key features of the ALD is the ability to save costs relative to a single-cohort
design by reducing the number of measurements and the length of the study for the same amount of age-
span coverage. As we recall from Chapter 2, section 2.2, we can utilize these cost savings to solve
for the number of subjects that would be possible in an ALD that is of equal cost to the SCD.
The equal cost sample size (EqNc) allows us to explore how the cost savings in the ALD can
allow for even greater power through increases in the total N above that of the SCD. In Figure
3.25 below, we can see how the EqNc increases the power for these ALDs relative to when the N
is equal to that of the SCD.
Figure 3.25: Power curves for equal cost sample size.
For a fixed N (panel A), the equal cost sample size unsurprisingly outperforms the scenarios
where the N is equivalent to an SCD. These increases in power are greatest for designs with a
larger number of cohorts; for example, at an N of 120, power increases 14.5% in
the five-cohort ALD by using the equal cost N. Moreover, differences from the SCD are
minimized by using the EqNc: at an N of 120 the two-cohort design has power
equal to the SCD and the five-cohort design has only 8% less power than the SCD. The power
changes for the ALDs by using the EqNc can be seen in Table 3.7 below. While the SCD
achieves a power of 80% at an N of 56, at the EqNc only the 2- and 3-cohort designs come within
~5% of reaching the 80% threshold. However, when examining the SCD with 90% power at an
N of 80, both the 4- and 5-cohort designs show dramatic improvements in power, with 10% and
17% increases respectively, allowing both to surpass the 80% power threshold. When examining
the power curves for a fixed total number of measurements in Figure 3.25 (panel B), it is readily
apparent that the ALDs using the EqNc show power that is equivalent or greater than the SCD at
the same number of measurements. This indicates that for those looking to limit the total number
of measurements, the ALDs using the EqNc provide an equally powerful alternative to the SCD.
This too is not surprising as the Nc for these designs at the same M is much higher than in the
SCD.
Table 3.7: Power compared to the SCD when using the EqNc

Number of Cohorts (Cn) | EqNc at SCD N=56 | Power at N=56 | Power at N=EqNc | EqNc at SCD N=80 | Power at N=80 | Power at N=EqNc
2 | 60 | 0.736 | 0.762 | 86 | 0.867 | 0.886
3 | 66 | 0.668 | 0.742 | 93 | 0.819 | 0.874
4 | 74 | 0.600 | 0.703 | 104 | 0.745 | 0.841
5 | 84 | 0.527 | 0.676 | 120 | 0.653 | 0.821

(Columns 2-4 correspond to the SCD with N=56 and 80% power; columns 5-7 to the SCD with N=80 and 90% power.)
In Figure 3.26 we can see how the equal cost sample size and the corresponding number of
measurements increase relative to the designs where the sample size is exactly the same as in the SCD.
Figure 3.26: Relationship between original sample size and equal cost sample size
For both the sample size and number of measurements, use of the equal cost sample size allows
for increases that are on average approximately 8%, 17%, 30%, and 50% higher for ALDs of 2,
3, 4, and 5 cohorts respectively. Given that designs with a greater number of cohorts show
greater increases in the equal cost sample size, it is not surprising that these designs would also
show the greatest improvements in power as demonstrated in Figure 3.25. The reason why this
occurs has to do with how these simulations were conducted. By fixing the age-span coverage to
10 years, designs with a greater number of cohorts will have to be of shorter duration (length) in
order to cover the same age intervals. As a result, these decreases in duration result in a greater
cost savings for these ALDs (relative to the SCD) which allows for a more rapid increase in
sample size. For these computations it was assumed that the costs for recruitment, measurement,
and study duration were equal proportions of the total budget (i.e. 33% each). If these costs were
to shift more heavily towards duration (or measurement to a lesser extent) then we would expect
even greater gains in power as the cost savings of the ALD would be even greater relative to the
SCD.
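The equal cost logic can be sketched directly. The function below is a hedged reconstruction of the Chapter 2 cost equations under the stated assumption that recruitment, measurement, and duration each consume a third of the SCD budget; the unit-cost choices and function name are illustrative, and the resulting EqNc values approximate, but need not exactly reproduce, those reported in Table 3.7.

```python
def equal_cost_n(n_scd, pn_scd, len_scd, pn_ald, len_ald):
    """Solve for the ALD sample size (EqNc) whose total cost matches the SCD.

    Assumed cost model: total cost = c_r * N + c_m * N * Pn + c_d * length,
    with unit costs chosen so that recruitment, measurement, and duration each
    account for one third of the SCD budget, as stated in the text.
    """
    c_r = 1.0 / n_scd               # per-subject recruitment cost
    c_m = 1.0 / (n_scd * pn_scd)    # per-measurement cost
    c_d = 1.0 / len_scd             # per-year duration cost
    scd_cost = c_r * n_scd + c_m * n_scd * pn_scd + c_d * len_scd  # = 3.0
    # the ALD's duration cost does not depend on N, so solve linearly for N
    return (scd_cost - c_d * len_ald) / (c_r + c_m * pn_ald)

# SCD: 11 annual measurements over 10 years with N = 56.
# A 2-cohort ALD (10 measurements, 9 years) saves a little; a 5-cohort ALD
# (7 measurements, 6 years) saves enough to buy a noticeably larger sample.
print(round(equal_cost_n(56, 11, 10, 10, 9), 1))   # 2-cohort design
print(round(equal_cost_n(56, 11, 10, 7, 6), 1))    # 5-cohort design
```

Because the duration term drops fastest as cohorts are added, shifting the budget more heavily towards duration costs would enlarge the EqNc further, consistent with the point made above.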
The use of equivalent cost sample sizes may be particularly important in instances where
attrition is present. As we have already seen above (section 3.3.6), studies with attrition will have
less power than those without; however, studies with attrition concentrated towards the
end of the study will experience less power-loss relative to the SCD. As we may recall from
Chapter 2, the equal cost sample size is reduced when computing costs for a design with attrition.
This results from the minimization of cost savings under attrition: the loss of measurements in
the SCD from attrition reduces its costs, while costs in the ALD are less impacted, resulting in a
smaller cost savings. For this reason, when examining the equal cost sample size in the context
of attrition, we have computed the EqNc based on the SCD design without attrition. This is
sensible because in the budgeting of a study a researcher will not know a priori the actual amount
of attrition to anticipate. It seems reasonable to assume that budgeting would occur under the
assumption that all of the subjects would complete all measurement occasions even though that
assumption is patently false. When designs with the equal cost sample size are compared to those
without the EqNc it can be shown that the power gains are greater when the EqNc is employed
under conditions where there is attrition.
Figure 3.27: Power gains by using the EqNc with and without attrition
Figure 3.27 demonstrates that the power gains from using the equal cost sample size are greater when
attrition is present for a fixed N (panel A) or number of measurements (panel B). Though the
absolute power values will be less under attrition (as was seen in section 3.3.6), these figures
demonstrate that the benefits of using EqNc are more pronounced under attrition. In Figure 3.27
the attrition levels are high (30%) and concentrated towards the end of the study. Based on our
previous examinations of attrition, we would therefore expect these gains to be minimized as either
the attrition is decreased or becomes distributed more evenly across the age distribution. In Table
3.8 below, we can see how use of the EqNc improves the average power-loss when attrition is
present.
Table 3.8: Power Improvements for the EqNc under attrition

Number of Cohorts (Cn) | Sample N or EqNc | No Attrition Avg Power | Attrition (γ=2, w=30%) Avg Power | Avg Power-Loss
2 | N | 0.820 | 0.782 | -0.038
3 | N | 0.772 | 0.740 | -0.032
4 | N | 0.711 | 0.678 | -0.033
5 | N | 0.627 | 0.601 | -0.026
2 | EqNc | 0.831 | 0.797 | -0.033
3 | EqNc | 0.809 | 0.784 | -0.024
4 | EqNc | 0.780 | 0.757 | -0.022
5 | EqNc | 0.744 | 0.725 | -0.019
The findings for the equal cost sample size indicate that for fixed age-span coverage, the
EqNc outperforms designs employing an N equal to the SCD. These gains in power are more
pronounced for designs with a larger number of cohorts. The cost savings afford a large increase
in sample size (8-50% in these simulations), which
allows designs with a greater number of cohorts to become nearly as powerful as the
SCD. When attrition is present, though the overall power will be reduced, the relative gains in
power from using the EqNc will be greater. Together, these findings indicate that the use of the
EqNc is a viable way to make up for the power loss incurred when introducing additional cohorts to
estimate a linear developmental trajectory.
3.4 Chapter Summary
In this chapter we introduced methods for simulating accelerated designs, including
methods for generating the age distributions, determining the growth curve reliability and effect
sizes, and incorporating attrition. We also discussed the primary means by which these designs
are analyzed, the accelerated longitudinal design mixed model, and how statistical power is
assessed under this design.
Our findings regarding power indicate that designs with a smaller number of cohorts and
with a high degree of overlap in the age distributions between the cohorts will result in power
values that closely approximate that of the single-cohort design. We found that when the total
number of measurements was fixed and the growth curve reliability (GCR) was low, that designs
with a smaller number of cohorts were more powerful. This was in contrast to the findings of
Moerbeek (2011) who showed that designs with a greater number of cohorts were more powerful
for a fixed number of measurements. We demonstrated that this inconsistency was related to the
growth curve reliability of the simulations and that when the GCR was increased we mirrored the
results of Moerbeek. We also showed that the power associations with the amount of between-
cohort overlap in the age distributions were nonlinear, such that increases in overlap increased
power at an increasingly faster rate. This occurred due to the additional measurements per
subject that were necessary as overlap increased to ensure that the entire age span was covered.
We additionally saw that designs with less frequent measurement when the number of
measurements was fixed would show greater power, but that for a fixed sample size more
frequent measurement resulted in greater power. Moreover, we saw that ALDs with a greater
number of measurements, when the coverage was fixed, would more closely approximate the
SCD and thus have greater power.
We found that improvements to the GCR or effect size will improve the power of the
ALDs and minimize the power differences to the SCD. Changes to effect size were more
pronounced in designs with a larger number of cohorts when the GCR was low (0.5) and for
designs with a smaller number of cohorts when the GCR was high (0.9). Regardless for designs
with less than 5 cohorts, at the all of the GCRs specified, the ALD showed 80% power with
small sample sizes when the age overlap between the cohorts was high and the effect size was at
least 0.5.
For attrition, it was shown that designs with less attrition will have greater power and that
designs with attrition concentrated towards the end of the study will be most powerful. We noted that results
from other research on longitudinal design would suggest that attrition concentrated towards the
middle of the age distribution would likely be the most powerful configuration for linear models.
We also showed that designs with a smaller number of cohorts, particularly the SCD, were more
impacted by power loss due to attrition.
Lastly, we proposed the use of the Equal Cost Sample Size (EqNc) for conducting the
ALD, which allowed us to examine the power of an ALD through use of a sample size that
was greater than in the SCD but of equal cost to the SCD. Our results indicated that even the 5-
cohort ALD comes within 8% of the SCD's power when utilizing the EqNc. Moreover, the
sample size can be further increased if costs are predominantly related to study duration. We also
found that the use of the EqNc was helpful in mitigating attrition-related power loss.
3.5 Chapter References
1. Baltes, P. B. (1968). Longitudinal and cross-sectional sequences in the study of age and
generation effects. Human Development, 11(3), 145-171.
2. Baltes, P. B., & Nesselroade, J. R. (1970). Multivariate longitudinal and cross-sectional
sequences for analyzing ontogenetic and generational change: A methodological
note. Developmental Psychology, 2(2), 163.
3. Baltes, P. B., & Nesselroade, J. R. (1972). Cultural change and adolescent personality
development: An application of longitudinal sequences. Developmental Psychology, 7(3),
244.
4. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
5. Glenn, N. D. (1976). Cohort analysts' futile quest: Statistical attempts to separate age, period
and cohort effects. American Sociological Review, 41(5), 900-904.
6. Hertzog, C., Lindenberger, U., Ghisletta, P., & von Oertzen, T. (2006). On the power of
multivariate latent growth curve models to detect correlated change. Psychological
Methods, 11(3), 244.
7. Holford, T. R. (1983). The estimation of age, period and cohort effects for vital
rates. Biometrics, 311-324.
8. Jackson, N.J. (2017). ALDSIM: Stata program for the simulation of accelerated longitudinal
designs. Stata Version 15.0. revised 09.07.2017.
9. Mason, W. M., & Fienberg, S. E. (1985). Introduction: Beyond the identification problem.
In Cohort analysis in social research (pp. 1-8). Springer, New York, NY.
10. Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some
methodological issues in cohort analysis of archival data. American Sociological Review,
242-258.
11. Mason, W. M., & Smith, H. L. (1985). Age-period-cohort analysis and the study of deaths
from pulmonary tuberculosis. In Cohort Analysis in Social Research (pp. 151-227). Springer,
New York, NY.
12. Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an
accelerated longitudinal design. Psychological Methods, 5(1), 44.
13. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal
designs. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
14. O'Brien, R. M., Hudson, K., & Stockard, J. (2008). A mixed model estimation of age, period,
and cohort effects. Sociological Methods & Research, 36(3), 402-428.
15. O'Brien, R. (2014). Age-period-cohort models: Approaches and analyses with aggregate
data. CRC Press.
16. Smith, H. L., Mason, W. M., & Fienberg, S. E. (1982). Estimable functions of age, period,
and cohort effects: more chimeras of the age-period-cohort accounting framework: comment
on Rodgers. American Sociological Review, 47(6), 787-793.
17. Timmons, A. C., & Preacher, K. J. (2015). The importance of temporal design: How do
measurement intervals affect the accuracy and efficiency of parameter estimates in
longitudinal research?. Multivariate Behavioral Research, 50(1), 41-55.
18. Yang, Y., Fu, W. J., & Land, K. C. (2004). A methodological comparison of age-period-cohort models: The intrinsic estimator and conventional generalized linear models. Sociological Methodology, 34(1), 75-110.
19. Yang, Y., & Land, K. C. (2006). A mixed models approach to the age-period-cohort analysis
of repeated cross-section surveys, with an application to data on trends in verbal test
scores. Sociological Methodology, 36(1), 75-97.
20. Yang, Y., & Land, K. C. (2008). Age–period–cohort analysis of repeated cross-section
surveys: fixed or random effects? Sociological Methods & Research, 36(3), 297-326.
21. Yang, Y., & Land, K. C. (2016). Age-period-cohort analysis: New models, methods, and
empirical applications. Chapman and Hall/CRC.
Chapter 4
Statistical power in accelerated designs with nonlinear de-escalating exponential growth
and no between-cohort differences
Thus far we have examined statistical power in the context of linear models when cohort
differences in the developmental trajectories are absent. We saw how changes to the design
features (number of cohorts, interval spacing, number of periods, etc.) impacted power, and that using a sample size in the ALD of equal cost to the SCD allows these designs to offset the power loss from introducing additional cohorts. While models of linear growth are
common in the literature, many developmental phenomena follow nonlinear patterns of change.
Prior work has largely examined the use of ALDs in the context of linear developmental growth
(Galbraith, Bowden, & Mander, 2017; Moerbeek, 2011; Fitzmaurice, Laird, & Ware, 2004;
Miyazaki & Raudenbush, 2000). This chapter will explore the role of non-linear growth on
power in the ALD versus the single cohort design in the absence of cohort differences to assess
the utility of these designs to capture nonlinear change. Specifically, we will examine the ALD
in the context of de-escalating exponential nonlinear growth. There is good reason to believe that
ALDs for nonlinear designs might differ from their linear counterparts. For a linear model, only
two measurements per person are needed to adequately describe the change over time; however, in a nonlinear model the number of measurements per person (Pn) will play a much more integral
role in the ability to accurately capture developmental change. As a result, we may expect
differences in how the number of periods impacts the power curves between the multi-cohort
designs. Moreover, there is reason to believe that the frequency of measurement would
differentially impact nonlinear models. Increases to the period spacing (Ps), indicating less
frequent measurement, particularly around the period of greatest change in the nonlinear model
would likely induce additional bias in the slope estimates. This may be especially true when
there are a large number of cohorts for a fixed age-span coverage, as the addition of each cohort
reduces the number of measurements per subject and likewise limits the ability to capture non-
linearity.
4.1 Simulating Nonlinear Effects
While linear models are an easily interpreted representation of developmental change,
often age-related change behaves non-linearly such as in a logistic, exponential, or Gompertz
function. This is particularly true for models of adolescent development, where there is typically a period of low (or high) adoption of some trait or behavior, followed by a period of greater adoption or desistance, and concluded by a period of stasis in which few individuals are changing.
Unlike linear models, where the rate of change and the total amount of change is contained in a
single slope parameter, non-linear models can be adapted to separate these components. The
additive nonlinear model described by Grimm, Ram, and Hamagami (2011) is one such model
that separates these components.
yijk = β0 + β1(1-exp( -β2 xijk)) + υ0j + υ0k + υ1j xijk + υ1k xijk + εijk eq. 1
Many of the same parameters described in equation 9 from Chapter 3 are again presented here.
here. We additionally notice that our age variable (xijk) is now being modeled exponentially. The
change among subjects is now separated between the total change (β1) and an additional fixed
parameter (β2) that modulates the rate of approach to the upper asymptote and is held constant
between cohorts. While simulations for this chapter will utilize this model of exponential growth, an examination of the substantive problems presented in Chapter 9 may necessitate the use of the logistic-normal (eq. 2) or Gompertz (eq. 3) functions shown below.
yijk = β0 + β1/(1+exp(-β2 xijk)) + υ0j + υ0k + υ1j xijk + υ1k xijk + εijk eq. 2
yijk = β0 + β1[exp(-exp(-β2 xijk))] + υ0j + υ0k + υ1j xijk + υ1k xijk + εijk eq. 3
Excluded from the above nonlinear equations is the parameter that controls the age at
which the rate of change is greatest (β3). This is accomplished by re-centering age around this
inflection point, which is presumed in these simulations to be either the starting age for
exponential growth or the median of the age-span under study for logistic-normal growth; more
specifically:
Logistic-Normal: β3 = [AgeC1P1 + Ps(Pn-1) + Cs(Cn-1) ] / 2;
Exponential: β3 = AgeC1P1
where AgeC1P1 is the mean age at the first period of the first cohort (youngest) and for eqs. 1-3
above:
xijk = Ageijk - β3
It is recognized that this inflection parameter may also be influenced by cohort differences that
would likely be a function of the age difference between the cohorts (Cs); this is beyond the
scope of the current investigation. As such, the proposed simulations above will presume a
singular inflection point that does not vary by cohort as outlined for β3 above.
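The dissertation's simulations were carried out with the author's Stata package aldsim; purely as an illustrative sketch (not the package's implementation), the data-generating process of eq. 1, with age recentered at AgeC1P1 as described above, might look as follows in Python. All parameter names and default values here are hypothetical, chosen only to mirror the structure of eq. 1.

```python
import math
import random

def simulate_exponential_ald(n_per_cohort=40, Cn=3, Cs=2.0, Pn=8, Ps=1.0,
                             age_c1p1=10.0, b0=0.0, b1=2.0, b2=0.4,
                             sd_int=1.0, sd_slope=0.2, sd_err=1.0, seed=1):
    """Generate one data set from the de-escalating exponential model (eq. 1).

    The fixed part is b0 + b1*(1 - exp(-b2*x)) with x = age - b3, where
    b3 = age_c1p1 (age recentered at the first age of the youngest cohort,
    the inflection point for exponential growth).  Cohort- and subject-level
    random effects enter the intercept and the linear age term, as in eq. 1.
    """
    rng = random.Random(seed)
    b3 = age_c1p1
    rows = []  # (cohort, subject, age, y)
    for k in range(Cn):                     # cohorts spaced Cs years apart
        u0k = rng.gauss(0.0, sd_int)        # cohort intercept deviation
        u1k = rng.gauss(0.0, sd_slope)      # cohort slope deviation
        for j in range(n_per_cohort):
            u0j = rng.gauss(0.0, sd_int)    # subject intercept deviation
            u1j = rng.gauss(0.0, sd_slope)  # subject slope deviation
            for p in range(Pn):             # periods spaced Ps years apart
                age = age_c1p1 + k * Cs + p * Ps
                x = age - b3
                mean = b0 + b1 * (1.0 - math.exp(-b2 * x))
                y = mean + u0j + u0k + (u1j + u1k) * x + rng.gauss(0.0, sd_err)
                rows.append((k, j, age, y))
    return rows
```

Each call returns one simulated data set; repeating the call, fitting the nonlinear mixed model, and tallying significant tests of the growth parameters is what yields a Monte Carlo power estimate.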
4.2 Conceptualizing Nonlinear Models
Nonlinear models can be difficult to interpret. The inclusion of exponential transformations, coupled with unfamiliar parameters governing the amount and rate of change (e.g. β1 and β2 above), can hinder their regular adoption by researchers. It is for this
reason we will start by dissecting these components graphically to get a feel for how our usual
concepts of slope and change translate to the world of nonlinear modeling. In Figure 4.1,
examples of nonlinear growth curves for a logistic-normal (panel A) and de-escalating
exponential growth model (panel B) are displayed as a function of their values for slope (β1) and
rate of change (β2).
Figure 4.1: Examples of nonlinear growth
Both examples have the intercept value (β0) set to 0 and the inflection point (β3) set to age 15
(median of the age-span under study). For the logistic-normal growth, the intercept value
determines the initial value at the first age of measurement, which is the lower asymptote of the
developmental curve. The slope parameter (β1) determines the total amount of change in the
outcome, such that a slope value of 15 would indicate that the outcome will increase by 15 points
from the start to the end of the study. This is unlike the linear models in chapter 3 where the
slope value represented change per year and the total amount of change was the value of
slope*coverage. The slope plus the intercept will equal the upper asymptote for the outcome in
these models. Half of the total amount of change will occur prior to the inflection point (age=15)
and half will occur afterward. As seen in panel A, as the rate of change parameter (β2) increases
from 0.5 to 2, the shape of the nonlinearity is modified. Smaller values show a steady (nearly
linear) growth while larger values show a more dramatic increase from the intercept to the upper
asymptote (more S shaped). For the de-escalating exponential growth models (panel B), the
intercept value of 0 refers to the value of the outcome at the inflection point. The slope value will
then represent the total amount of change from the intercept (i.e. the inflection point) to the upper
asymptote. As a result, it is common to use the age at first measurement (i.e. age 10) as the
inflection point in the exponential model rather than the midpoint of age-span. The rate of
change parameter (β2) functions similarly as in the logistic-normal model, where smaller values
indicate a more constant growth and larger values a more rapid increase to the upper asymptote.
In Figure 4.2 below we can see how changes in the inflection point modify the shape of these
curves.
Figure 4.2: Changes in inflection point modify the developmental curve
For the logistic-normal model (panel A), the shifting of the inflection-point to age 17 slows down
the amount of change in early adolescence relative to the model with inflection point at age 15
(i.e. Fig 4.1A). While previously half of the total amount of change occurred by age 15, now that
amount of growth is delayed until age 17. Moreover, by shifting the inflection point, the models
with a slower rate of change are unable to reach the upper asymptote (slope+intercept) by the end
of the observation period at age 20. In the de-escalating exponential model (panel B), by shifting
the inflection point to be the starting age of the study we can more clearly see the characteristic
exponential growth curve with the upper asymptote determined by the slope plus intercept value.
While the simulations in this chapter will not model the inflection point, because the age values will be re-scaled per the description in section 4.1, it is important to note that the inflection point is also the age at which the growth rate is greatest in these models.
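The fixed (mean) parts of these two growth functions can be evaluated directly, which makes the asymptote and half-change interpretations above concrete. A minimal sketch, using the illustrative values from Figure 4.1 (β0 = 0, β1 = 15, and the inflection ages discussed above):

```python
import math

def exponential_mean(age, b0=0.0, b1=15.0, b2=0.5, b3=10.0):
    """De-escalating exponential growth (fixed part of eq. 1): the curve
    starts at b0 at the inflection (starting) age b3 and rises toward the
    upper asymptote b0 + b1."""
    return b0 + b1 * (1.0 - math.exp(-b2 * (age - b3)))

def logistic_mean(age, b0=0.0, b1=15.0, b2=0.5, b3=15.0):
    """Logistic-normal growth (fixed part of eq. 2): half of the total
    change b1 has occurred by the inflection age b3; b0 + b1 is the
    upper asymptote."""
    return b0 + b1 / (1.0 + math.exp(-b2 * (age - b3)))

# At its inflection (starting) age the exponential curve sits at b0 = 0 ...
assert abs(exponential_mean(10.0)) < 1e-12
# ... while the logistic curve has completed half of its total change of 15.
assert abs(logistic_mean(15.0) - 7.5) < 1e-12
```

Larger β2 values make either curve approach its asymptote more rapidly, which is exactly the contrast displayed across the panels of Figures 4.1 and 4.2.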
4.3 Statistical power of nonlinear ALDs without cohort differences
For these simulations, the non-linear additive model (eq. 1) demonstrating de-escalating exponential growth (Figures 4.1 and 4.2, panel B) was used. Thus, all further references to the
“nonlinear” model will refer to this specific type of nonlinearity. As in chapter 3, all of the
following simulations will examine designs that cover a 10-year age span. Each design was
simulated 1,000 times with GCR=0.5 and growth rate (δ) = 0.5. A slope (β1) value of 2 was used
with a rate of change (β2) equal to 0.4, reflecting a moderately fast growth in young-adolescence.
Because the estimation of these nonlinear models was computationally intensive, the variation in sample size was restricted to Ns of 40, 80, and 120. Simulations of the ALDs were conducted using the user-written package aldsim (Jackson, 2017) in Stata version 15.0 (College Station, TX). Figure 4.3 below shows the default model being simulated.
Figure 4.3: Default Exponential Growth Model
Using the above stated parameters for the nonlinear simulations, these models depict a
mechanism by which half of the total change occurs in the first two years of measurement. This
relatively rapid rate of growth was chosen intentionally, as these values will provide more
conservative estimates for the multi-cohort designs given that under most design scenarios it will
be the first (youngest) cohort that will explain 50% of the growth. This should bias the results to
favor the single cohort design and thus will provide a strong check on criticism that these
simulations were designed to favor the ALD approach.
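The power-estimation logic itself is simple: simulate many data sets, fit the model to each, and record the proportion of fits in which the growth parameter is significant. As a heavily simplified, hypothetical sketch of that loop (single cohort, no random effects, and β2 treated as known so that ordinary least squares suffices — simplifications the dissertation's Stata simulations do not make):

```python
import math
import random

def power_for_slope(n_sims=200, n=40, ages=range(10, 20), b1=2.0, b2=0.4,
                    sd_err=1.0, seed=2):
    """Crude Monte Carlo power sketch for detecting b1 in the exponential
    model.  With b2 known, z = 1 - exp(-b2 * (age - age0)) is a fixed
    regressor, so b1 can be tested by pooled ordinary least squares."""
    rng = random.Random(seed)
    age0 = min(ages)
    z = [1.0 - math.exp(-b2 * (a - age0)) for a in ages]
    rejections = 0
    for _ in range(n_sims):
        # pooled (z, y) pairs across n subjects measured at every age
        pts = [(zi, b1 * zi + rng.gauss(0.0, sd_err))
               for _ in range(n) for zi in z]
        zbar = sum(p[0] for p in pts) / len(pts)
        ybar = sum(p[1] for p in pts) / len(pts)
        sxx = sum((p[0] - zbar) ** 2 for p in pts)
        bhat = sum((p[0] - zbar) * (p[1] - ybar) for p in pts) / sxx
        resid = [p[1] - ybar - bhat * (p[0] - zbar) for p in pts]
        s2 = sum(r * r for r in resid) / (len(pts) - 2)
        t = bhat / math.sqrt(s2 / sxx)
        if abs(t) > 1.96:               # large-sample 5% two-sided test
            rejections += 1
    return rejections / n_sims
```

With the full nonlinear mixed model substituted for the OLS step, the same loop structure yields the power curves reported throughout this chapter.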
4.3.1 Relevancy of Linear ALD findings to Nonlinear ALDs
Chapter 3 sections 3.3.1-3.3.4 discussed the general function of the design parameters (i.e. Cn, Cs, Pn, Ps) on statistical power for linear models. For nonlinear designs we would expect similar results; that is, power in the equivalent SCD will be higher than power in the ALD. The tradeoffs between design efficiency (and its constituent components of overlap and length) and power would also hold true; that is, an inverse relationship exists, as designs with more measurements would have greater power but less design efficiency. Moreover, as in the linear model, we would find that increases in the study length or in the amount of overlap would increase power, and more so for designs with more frequent measurement. As in the linear model, we would expect smaller period spacings (i.e. more frequent measurement) to yield greater power for a fixed sample size; however, in the nonlinear model the placement of these period intervals may be particularly impactful. Figure 4.4 below shows how the period spacing impacts power in these designs.
Figure 4.4: Frequency of Measurement (Ps) and Power
Like the linear model, the nonlinear model shows a clear differentiation in the power curves for designs of differing frequency of measurement. Despite the fact that much of the change in these designs occurs in the first few years of measurement, designs with less frequent measurement prove to be more powerful at the same number of total measurements, similar to the linear model from Figure 3.6. This was contrary to the initial hypothesis, as it was expected that less frequent measurement would result in less ability to detect these changes. Though we don't see the hypothesized difference in terms of power, we might expect substantially more bias in the slope estimates at these higher period spacings when bias is examined in the subsequent chapter. While all of the simulations in this dissertation assume an
equal-interval period spacing, varying where the periods occur can yield less bias and greater statistical efficiency, particularly when measurements are concentrated closer to the periods of greater change (i.e. near curvature) (Timmons & Preacher, 2015), which is of particular importance to nonlinear designs.
It is not possible to quantitatively compare the linear and nonlinear models (to determine, for example, whether changes in the design parameters are more impactful in the nonlinear model), as differences in their specification do not allow for direct comparison. We can, however, note instances where there are similarities (and differences) in the behavior of the power curves. For instance, although our effect size for these nonlinear models is
equal to the linear model (i.e. 0.5) as is our default GCR (0.5), we show less power to detect
nonlinear growth at the same sample sizes and number of measurements than we do to detect
linear growth (Figure 4.5). Table 4.1 shows the sample sizes and number of measurements
required to achieve 80% power for the linear compared to nonlinear model of the same effect
size and GCR. It should be noted that because of the restricted sample sizes used for the nonlinear models, values for the 4- and 5-cohort designs could not be appropriately interpolated for 80% power and are thus reported as greater than (>) the values for the 3-cohort design.
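The interpolation just described amounts to finding where the simulated power curve crosses the target; when the curve never reaches the target within the simulated sample sizes, only a lower bound can be reported. A minimal sketch (the power values below are hypothetical, chosen only to illustrate the bracketing issue):

```python
def interpolate_n_for_power(ns, powers, target=0.80):
    """Linearly interpolate the sample size at which a monotone simulated
    power curve crosses `target`; returns None when the target is not
    bracketed by the simulated values, so only a bound can be reported."""
    pairs = list(zip(ns, powers))
    for (n0, p0), (n1, p1) in zip(pairs, pairs[1:]):
        if p0 <= target <= p1:
            return n0 + (target - p0) * (n1 - n0) / (p1 - p0)
    return None

# Hypothetical power values over the three simulated sample sizes:
assert round(interpolate_n_for_power([40, 80, 120], [0.55, 0.78, 0.92]), 1) == 85.7
# A curve that never reaches 80% cannot be interpolated, only bounded:
assert interpolate_n_for_power([40, 80, 120], [0.40, 0.55, 0.70]) is None
```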
Figure 4.5: Statistical power for linear and nonlinear growth as a function of sample size and
measurements
Table 4.1: Sample size and number of measurements for 80% power in the linear and nonlinear models

Cn    N (Linear)    N (Nonlinear)    M (Linear)    M (Nonlinear)
1     56            71               620           776
2     67            85               668           855
3     82            117              740           1055
4     106           >117             849           >1055
5     137           >117             958           >1055
Similar to our linear models (panels A & B), we find that statistical power is greatest for a
smaller number of cohorts (Figure 4.5) for a given sample size (panel C) or number of
measurements (panel D). Moreover, we can note that the between-cohort variability in power
appears to be greater for the nonlinear model. This is evident when comparing the ratio of the
sample sizes and number of measurements required to achieve 80% power in the ALDs (Table 4.1)
compared to the SCD, as this ratio is typically greater for the nonlinear model, suggesting greater
between-design differences. As previously explained in Chapter 3, the fact that power is greater
for a smaller number of cohorts is intuitive, as with a greater number of cohorts each cohort explains a smaller proportion of the overall developmental trajectory. In the linear model we showed that the relationship between the number of cohorts and power as a function of the number of measurements was largely dependent on the GCR. Higher GCR values
lead to agreement with Moerbeek (2011) and Galbraith (2017) who showed that more cohorts
yielded greater statistical power when the total number of measurements was fixed. Lower GCR
values lead to the opposite conclusion, indicating that a smaller number of cohorts was preferred
for a fixed number of measurements (see Figure 3.12 from Chapter 3). The next section will
explore whether this is the case for the nonlinear model.
4.3.2 The role of GCR and Growth Rate on Power
We can examine how the power curves shift when the GCR or growth rate is modulated,
just as we did for the linear model. In Figure 4.6, we can see how increases in GCR from 0.5 to
0.7 and 0.9 result in increased power for a fixed sample size (panel A) or total number of
measurements (panel B).
Figure 4.6: Power curves are dependent on the growth curve reliability of the design
Unlike in our linear model, we find that the relative ordering of the cohorts, in terms of power, does not appear to depend on the GCR. Even at a high GCR (i.e. 0.9) we find that the single cohort design still performs best for a fixed total number of measurements (panel B). Although Moerbeek (2011) and Galbraith et al. (2017) did not investigate these relationships in nonlinear models, it is interesting that the general pattern for the linear models is not the same in these nonlinear growth models. One might posit that at an even higher GCR we might see the designs
with larger cohorts outperform the smaller cohort designs. Indeed, our explanation for this
phenomenon in Chapter 3 was that the larger cohort designs benefited the most from increases to
the GCR and thus reflected a trade-off at a fixed number of measurements between adding more
subjects versus adding more measurements. Because the total number of measurements is a function of the number of subjects per cohort (Nc), the number of cohorts (Cn), and the number of periods per subject (Pn), a 2-cohort design must have a higher Pn per subject in order to have the same number of measurements as a 5-cohort design when the coverage is fixed. For example, consider the following two designs:
M = Nc*Cn*Pn
800 = 40*2*10
805 = 23*5*7
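This bookkeeping is simple enough to check directly (a trivial sketch):

```python
def total_measurements(n_per_cohort, Cn, Pn):
    """Total number of measurements: M = Nc * Cn * Pn."""
    return n_per_cohort * Cn * Pn

# Two fixed-coverage designs with (roughly) equal M:
assert total_measurements(40, 2, 10) == 800   # 2 cohorts: M driven by Pn
assert total_measurements(23, 5, 7) == 805    # 5 cohorts: M driven by Nc * Cn
```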
Although the numbers of measurements (M) are roughly equal, for our fixed-coverage simulations the total number of measurements for the 2-cohort design is primarily driven by the number of measurements per subject (Pn), while for the 5-cohort design it comes primarily from having a greater number of subjects (Nc*Cn) compared to the 2-cohort design. In the nonlinear models the
results indicate that even at high reliability (GCR=.9) it is more beneficial to have more
measurements than to have more subjects, whereas for the linear models with high GCR it is more
beneficial to gather more subjects. The inconsistency of this finding between the linear and
nonlinear models may also result from how the nonlinear model was specified. As mentioned in
the beginning of this section (4.3), the rate of approach in these models was fairly steep so as to
bias the results more favorably towards the single-cohort design. Given this, it would not be
unexpected that a design with a slower rate of approach would allow for the larger cohort designs
to have more power at the same number of measurements, possibly resulting in the inverse
relationship between power and number of cohorts that was previously found at high GCR in the
linear model. Indeed, Table 4.2 below, which shows the number of subjects and measurements required to achieve 80% and 90% power at these higher GCRs of 0.7 and 0.9, gives some indication of this pattern, as the number of measurements (M) to achieve 80% power at a GCR of 0.9 decreases as the number of cohorts increases.
Table 4.2: Sample sizes and number of measurements for achieving power with high overlap and high GCR

GCR    Cn    N (80% Power)    M (80% Power)    N (90% Power)    M (90% Power)
0.7    1          40               440              65               717
0.7    2          45               450              77               773
0.7    3          65               589             110               990
0.7    4          86               687            >110              >990
0.7    5         117               817            >110              >990
0.9    1          40               440              40               440
0.9    2          40               400              40               400
0.9    3          40               360              65               589
0.9    4          41               320              76
0.9    5          47                                84
In addition to the hints in Table 4.2, there are some graphical indications in the nonlinear
model that with a high enough GCR the reversal in the power ordering of the cohorts would take
place. Similar to the linear model, changes to the GCR impact designs with a greater number of
cohorts the most. As seen in Figure 4.7 below, the amount of change in power relative to the
power curve when the GCR was 0.5 was greater for larger cohort designs when the GCR
increased to 0.9 (and 0.7). This was true both when the number of subjects was fixed (panel A) and when the number of measurements was fixed (panel B).
Figure 4.7: Changes to the GCR have a greater impact on power for designs with a
greater number of cohorts.
This suggests that designs with a greater number of cohorts will be more reactive to changes in
the reliability of the growth curve, though to a lesser extent than they were in the linear model.
Moreover, the less than perfect rank ordering of the cohorts (i.e. Cn=4 > Cn=5) at the GCR of 0.9
and 0.7 suggests that higher GCRs are needed in nonlinear models in order to make adding more
subjects (i.e. more cohorts) more beneficial than adding more measurements. Table 4.3 shows
how the percentage increase in power from a GCR of 0.5 improves for ALDs with a greater
number of cohorts.
Table 4.3: Percentage increases in power from a GCR of 0.5

Cn    % Increase (GCR=0.7)    % Increase (GCR=0.9)
1            26.0                    45.1
2            23.5                    41.9
3            25.4                    48.2
4            37.5                    71.1
5            49.2                   100.1
When modifying the growth rate (δ) we find, as in the linear model, that increases in growth rate result in increases to the power curve (Figure 4.8). This increase is more pronounced in instances where the number of measurements is fixed (panels B and D) as well as when the GCR is higher (panels C and D). Similar to our findings for the GCR, a greater effect size does not change the relative ordering of the power curves based on the number of cohorts in the design. Indeed, we find that even at a high GCR (0.9, panels C and D) the ordering by number of cohorts does not differ as a function of effect size.
Figure 4.8: Increased growth rate (δ) increases power
When the effect size is held constant, increases to the GCR were more impactful for designs with
a greater number of cohorts, as seen in Table 4.3. This increase was even more pronounced as
the GCR went from 0.7 to 0.9. However, this change is minimized at larger effect sizes, as
shown by examining the change in power from a GCR of 0.5 to 0.9 for smaller (δ=0.5) and
larger (δ=1.0) effect sizes (Figure 4.9, panels A and B). This finding is intuitive in that larger
effect sizes, already responsible for increasing the power of the design, would have less room for
improvement through increasing the GCR than a smaller effect size. Moreover, the finding that
this differentially impacts designs with a greater number of cohorts also reflects the fact that the
designs with a greater number of cohorts are less powerful and as a result are more sensitive to
shifts in the GCR or effect size.
Figure 4.9: Interplay between GCR and Growth Rate (δ)
When the GCR is fixed and the change in power from a smaller effect size (δ=0.5) to a larger one (δ=1.0) is examined (panels C & D), similar patterns are observed. Changes
to the effect size have a greater impact on the low GCR designs and impact designs with a
greater number of cohorts more. The changes to power by increasing the effect size are reduced
as the sample size or number of measurements increase. Among these, designs with a greater
number of cohorts show the most benefits in power by increasing the effect size.
Overall, these results highlight that the differential impact on statistical power of modifying the GCR or effect size that was noticed in the linear models is also present in these nonlinear models. While Moerbeek (2011) showed that designs with a greater number of
cohorts were the most powerful for a fixed number of measurements, our results have shown that
this finding is dependent on the GCR, such that higher GCRs yield this effect. In the linear
models, this effect was noticed at a GCR of ~.7 and was prominent at a GCR of 0.9. For these
nonlinear models, the GCR of 0.9 shows hints of the reversal of this ordering, suggesting that a
higher GCR would be needed to observe this pattern. These results, as discussed above, highlight
the tradeoff between gathering more subjects (i.e. more cohorts, greater total N) versus a greater
number of measurements per subject (i.e. fewer cohorts, greater Pn). While conventional wisdom
on longitudinal designs indicates that it is more beneficial to add more measurements per subject
to increase the power of a design, this increase in power through increased reliability is only true
in instances where the GCR is lower. At a sufficiently high GCR (>0.7 linear; >0.9 nonlinear)
additional measurements do not increase power at the same rate as increasing subjects, hence the
general finding that designs with a greater number of cohorts yield greater power for a fixed
number of measurements at high GCR. Just as in the linear model, as the GCR increased, the
power increased and power differences between the designs were reduced. A similar phenomenon
occurred for effect sizes, indicating that the properties of the GCR and effect size are similar
across the linear and nonlinear models.
4.3.3 Attrition and power loss in the ALD and SCD
When examining attrition in the nonlinear model, we can see how varying levels of
attrition (w) for a fixed gamma (γ=1) impact the power curves for these various designs (Figure
4.10). The gamma value of 1 in these curves indicates a scenario where missing data are
uniformly distributed as Missing Completely at Random (MCAR). Panel A shows the power
curves for no attrition, 15% attrition, and 30% attrition as a function of sample size. As
mentioned in Chapter 3, the Weibull model used to define attrition will only impact the number of measurements per subject (Pn) for a subset of subjects; thus the total sample size (i.e. number of subjects) will not vary by the amount of attrition.
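Chapter 3's Weibull model is not reproduced here; purely as an illustrative sketch of the general idea, dropout can be imposed by drawing each subject's last completed period from a discretized survival curve governed by the overall attrition rate w and shape γ. The parameterization below is an assumption for illustration, not necessarily the dissertation's exact model.

```python
import random

def retained_periods(Pn, w, gamma, rng):
    """Number of periods a subject completes before dropping out.

    Illustrative parameterization (an assumption, not the dissertation's
    exact Weibull model): survival through period p (1-indexed) is
        S(p) = 1 - w * ((p - 1) / (Pn - 1)) ** gamma,
    so cumulative attrition reaches w by the final period.  gamma = 1
    spreads dropout evenly across periods, gamma > 1 concentrates it near
    the end of the study, and gamma < 1 near the beginning.
    """
    u = rng.random()                 # one uniform draw per subject
    for p in range(2, Pn + 1):
        survival = 1.0 - w * ((p - 1) / (Pn - 1)) ** gamma
        if u > survival:             # subject drops out before period p
            return p - 1
    return Pn                        # subject completes the study
```

Across many subjects, the fraction observed fewer than Pn times converges to w, while γ shifts where in the study the dropout occurs — which is exactly the distinction the simulations below exploit.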
Figure 4.10: Power curves for attrition when data are MCAR.
For the total number of subjects (panel A), we find that designs with greater attrition will have less power at the same sample size, resulting from fewer measurements per subject. For the total number of measurements (panel B), it initially appears as if designs with a greater amount of missing data may have slightly more power (or at least equal power to the no-attrition condition). However, this is more a byproduct of how these figures were constructed, using only
3 sample sizes (i.e. 40, 80, 120) instead of a full power curve. Within a design, those with a
greater amount of missing data will have a smaller total number of measurements. As a result of
having smaller values for the total number of measurements, the connection of their 3 points on
the figure can give the impression of a steeper increase for designs with greater attrition. While the rate of power increase is steeper for these designs with attrition, this reflects that the power per measurement is greater: attrition produces only a minimal decrease in power relative to the greater loss in the number of measurements. In truth, the power values, particularly for the
smaller sample sizes (e.g. N=40) are not very different from one another (within 1%) and when
equated on the total number of measurements provide the expected conclusion of designs with
greater attrition having less power (Figure 4.11).
Figure 4.11: Power values as a function of the total number of measurements when equated
within each cohort
These results also show that power differences between attrition levels are greater for a fixed sample size (Fig 4.10, panel A) than for a fixed total number of measurements (Fig 4.10, panel B)
and show that designs with a greater number of cohorts are less sensitive to power changes as a
result of attrition, similar to the results from our linear models. Overall, however, designs where
attrition is present will show greater between-design variability in power. When attrition was
fixed (w=0.15) but gamma values were varied (Figure 4.12), we found that there were few differences in power between designs with attrition distributed uniformly versus concentrated more at the end of the study (i.e. higher gamma); however, overall these nonlinear designs showed the same patterns as in the linear models.
Figure 4.12: Power curves for gamma when attrition is fixed.
Higher gamma yields greater power when the sample size and total number of measurements are fixed. This aligns with our findings in the linear model, which showed that concentrating attrition towards the end of the study maximized power. Indeed, compared to the linear models, these nonlinear models with attrition show less power loss overall and indicate that designs with low attrition concentrated towards the end of the study will minimize power loss (Table 4.4). Most notably, compared to the linear models, which suffered ~10% power loss with uniformly distributed attrition at 30%, these nonlinear models suffer less than 5% power loss.
Table 4.4: Maximal power loss as a function of gamma and attrition

Gamma (γ)    Cn    Power Loss (w=15%)    Power Loss (w=30%)
1             1         -0.023                -0.040
1             2         -0.039                -0.042
1             3         -0.034                -0.049
1             4         -0.024                -0.039
1             5         -0.042                -0.053
3             1         -0.017                -0.032
3             2         -0.032                -0.027
3             3         -0.015                -0.030
3             4         -0.015                -0.018
3             5         -0.024                -0.030
Similar to the linear models, these results tend to indicate that how the attrition is distributed matters more than the amount of attrition in terms of minimizing power loss. While the examples above only consider situations where attrition is distributed towards the end of the study, there is reason to believe that attrition distributed towards the beginning of the study would have a greater impact on power. If attrition were concentrated towards the beginning of the study, then we would expect a greater loss in power as the gamma values decreased from 1 towards 0, as this would cover the period of greatest change for these designs.
the need to consider the expected nature of the attrition in the study as well as the expected
developmental trajectory, as together these will impact the power of the design. Distribution of
the attrition concentrated towards the beginning of the study will be explored in the subsequent
section examining changes to the period spacing.
As we showed previously in the linear model, the associations between power and the
study length that were noted by Galbraith et al. (2017) are artifacts of the simulation using a
fixed age-span coverage and the subsequent tradeoffs between decreasing length and increased
number of cohorts. As a result, we will not explore these associations in the nonlinear models
here.
One design aspect that may be particularly important in nonlinear models and that is
likely to affect these power curves in the presence of attrition is the period spacing (Ps). While
we noted few differences in power as the attrition was concentrated towards the end of the design
(Figure 4.12), changes to the frequency of measurement are likely to impact these power curves
as a greater frequency of measurement can allow for more missing data while still accurately
describing the nature of the developmental change. As seen in Figure 4.13, increasing the
frequency in measurement (smaller Ps) increases power for a fixed sample size but decreases the
power for a fixed total number of measurements under attrition. As expected, there were few
differences resulting from changes to gamma, with most differences reflecting the stochastic
process of simulation. However, unlike the previous examinations of attrition in these models,
these simulations included a condition where gamma was set to 0.5 to indicate greater attrition at
the beginning of the study.
Figure 4.13: Changes to frequency of measurement impact power under attrition.
Given that these nonlinear designs experience the greatest amount of change at the beginning of
the design, it was expected that this would result in decreased power for these designs. The
results in Figure 4.13 above agree with this intuition and show a degradation in power when
attrition is concentrated towards the beginning of the study. Overall, the results for the nonlinear model are similar to those for the linear model, suggesting that the effects of Ps represent the tradeoffs between Ps, Pn, and the sample size: more frequent measurement for a fixed N results in a greater number of measurements and hence more power. For a fixed number of measurements, the reverse occurs, with increased frequency of measurement resulting in a smaller sample size.
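The Ps/Pn tradeoff under a fixed age-span can be made concrete with a small sketch. It assumes the coverage identity used throughout these chapters, age-span = (Cn − 1)·Cs + (Pn − 1)·Ps; the helper name is illustrative.

```python
def periods_per_subject(age_span, n_cohorts, cohort_spacing, period_spacing):
    """Measurements per subject (Pn) needed for a design to cover a fixed
    age-span, from: age_span = (Cn - 1) * Cs + (Pn - 1) * Ps."""
    per_cohort_span = age_span - (n_cohorts - 1) * cohort_spacing
    return round(per_cohort_span / period_spacing) + 1
```

For example, a single-cohort design covering 10 years at Ps = 0.5 needs 21 measurements per subject (matching the SCD used in these chapters), while a 3-cohort ALD with Cs = 2 needs 7 at annual measurement but 13 at twice-yearly measurement, so for a fixed total measurement budget, halving Ps nearly halves the affordable sample size.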
Overall, the findings regarding attrition indicate that designs with a low level of attrition
will be more powerful and that changes to the distribution of this attrition may be more impactful
than the amount of attrition in these designs. The loss in power as a result of attrition is
minimized at the same sample size for designs with more cohorts, though designs with more
cohorts still have the least power for a given sample size. Concentrating the attrition (gamma) towards the end of the study increased power, while distributing it uniformly or towards the beginning of the design decreased it; the differences between gamma values, however, were not typically large (within 1%). Decreases to the frequency of measurement (increasing the Ps) improved the power of these designs, though they did not appear to meaningfully change the impact of attrition on the power loss for these designs.
4.3.4 Equal cost sample size and the boosting of power in the ALD
As demonstrated previously, we can utilize the cost savings from the use of an ALD to increase the number of subjects available for study at a cost equal to that of an SCD covering the same age-span. The equal cost sample size (EqNc) for the linear models in chapter 3 showed us that use of the EqNc makes accelerated designs a feasible alternative to SCDs given
the improvements in power afforded by increasing the available sample size in ALDs. In the
exemplars below, the costs for recruitment, measurement, and duration were considered equal
(~33%) proportions of the budget. In Figure 4.14 below, we can see how the EqNc increases the
power for these ALDs relative to when the N is equal to that of the SCD for nonlinear models.
Figure 4.14: Power curves for equal cost sample size.
For a fixed N (panel A), the equal cost sample size universally outperforms the scenarios where
the N is equivalent to an SCD. These increases in power are greatest for designs with a larger
number of cohorts and show that, for example at an N of 120, power increases 19% in the 5-
cohort design and 9% for the 4-cohort design. Unlike in the linear model, none of the EqNc
designs were able to perform as well as the SCD despite the absolute gains in power. For a fixed
total number of measurements (panel B), power is also higher for the EqNc designs and even the
5-cohort EqNc design is as powerful as the 3-cohort design using the same sample size as the
SCD. This occurs because the Nc for these designs at the same number of measurements is much
higher than in the SCD. Table 4.5 below shows how power improves in these designs when using the EqNc rather than the sample size at which the SCD achieves 80% power. The improvements from using the EqNc are maximized for the larger cohort designs, with the 5-cohort design showing an approximately 10% improvement. The 2-cohort design, however, shows only a modest improvement of ~2%. Compared to the linear models (Table 3.7), these improvements are dramatically reduced, suggesting that a larger EqNc will be required for nonlinear models in order to make the power loss from use of the ALD more acceptable. One way to achieve this is by having a design with higher duration costs, which would allow for increasing the available EqNc.
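The equal cost sample size logic can be sketched with a toy cost model. This is a hypothetical three-component budget (per-subject recruitment, per-measurement, and per-year-of-study costs); the chapter 2 cost equations may differ in detail, so the sketch will not reproduce the exact EqNc values in Table 4.5, but it shows the mechanism: the shorter ALD duration frees budget that can be spent on additional subjects.

```python
def study_cost(n, periods, years, c_recruit, c_measure, c_year):
    """Hypothetical cost model: per-subject recruitment cost, per-measurement
    cost, and a per-year-of-study (duration) cost."""
    return c_recruit * n + c_measure * n * periods + c_year * years

def equal_cost_n(budget, periods, years, c_recruit, c_measure, c_year):
    """Largest sample size affordable within `budget` for a design with the
    given measurements per subject and study length."""
    remaining = budget - c_year * years
    return int(remaining // (c_recruit + c_measure * periods))
```

With costs chosen so recruitment, measurement, and duration each take roughly a third of the SCD budget, an ALD covering the same span in fewer years can enroll more subjects than the SCD for the same total cost, which is the EqNc.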
Table 4.5: Power compared to the SCD when using the EqNc (SCD reaches 80% power at N=71)

  Number of Cohorts (Cn)   EqNc at SCD N=71   Power at N=71   Power at N=EqNc
  2                        76                 0.755           0.774
  3                        82                 0.668           0.705
  4                        92                 0.618           0.684
  5                        107                0.519           0.621
As we found in section 4.3.2, it would be expected that at a higher GCR the power benefits of
adding more subjects through the EqNc would be increased. In Figure 4.15 we can see how when
using the EqNc the sample size and number of measurements increases relative to the designs
where the sample size is exactly the same as in the SCD. This figure is similar to the one presented in chapter 3 for the linear model, but now also includes the changes to the EqNc if the duration costs were to increase from 33% (equal cost to measurement and recruitment) to 66%.
Figure 4.15: Relationship between original sample size and equal cost sample size
As we can see, use of the equal cost sample size when there are equivalent costs for recruitment,
measurement, and duration allows for increases that are approximately 8%, 17%, 30%, and 50%
higher for ALDs of 2, 3, 4, and 5 cohorts respectively. However, if duration costs were to
increase to become two-thirds of the total budget, the EqNc could grow as much as 25%, 53%,
85%, and 118% for the 2, 3, 4, and 5 cohort designs respectively. As can be clearly seen, shifts in the cost per year of study (c3) can profoundly change the available EqNc and should be considered prior to undertaking these designs. Given that designs with a greater number of
cohorts show greater increases in the equal cost sample size, it is not surprising that these designs
would also show the greatest improvements in power as demonstrated in figure 4.14. More in-
depth reasoning as to why this occurs has been previously explained in chapter 3.
The use of equivalent cost sample sizes may be particularly important in instances where
attrition is present. As we have already seen above (section 4.3.3), studies with attrition will have
less power than those without; however, the amount of power loss will depend on how that
attrition is concentrated relative to the developmental curve. For our linear models, we were able
to show that designs with attrition benefited more (i.e. gained more power) from the use of the
equal cost sample size than those without attrition. In our nonlinear model, the results were not
as clear (Figure 4.16).
Figure 4.16: Power gains by using the EqNc with and without attrition
Figure 4.16 demonstrates that power gains by using the equal cost sample size appear to be
smaller when attrition is present for a fixed N (panel A) or number of measurements (panel B)
when there are a large number of cohorts. However, for ALDs with a smaller number of cohorts
(<=3) the EqNc under attrition appears to have greater power gains. This is contrary to our linear
results, which uniformly showed greater gains using the EqNc for the designs with attrition.
While we note that power gains under attrition are greater than or equivalent to the gains without
attrition at a higher N (i.e. 120) or number of measurements, the pattern is not as consistent as it
was for the linear models. Part of this discrepancy may be due to the fact that these nonlinear models are more sensitive to losses in measurements, such that the addition of subjects through the EqNc is unable to offset the power lost through the reduced number of measurements.
Table 4.6 shows the average power loss, with and without attrition, when the N is equal to that of the SCD versus when using the EqNc.
Table 4.6: Power improvements for the EqNc under attrition

  Sample   Cohorts (Cn)   Avg Power (No Attrition)   Avg Power (γ=2, w=30%)   Avg Power-Loss
  N        2              0.763                      0.740                    -0.024
  N        3              0.688                      0.656                    -0.032
  N        4              0.627                      0.612                    -0.014
  N        5              0.533                      0.531                    -0.002
  EqNc     2              0.777                      0.766                    -0.011
  EqNc     3              0.725                      0.714                    -0.011
  EqNc     4              0.680                      0.670                    -0.010
  EqNc     5              0.652                      0.637                    -0.015
As we can see, the power loss is generally smaller when using the EqNc, though this is not universally true.
The findings for the equal cost sample size in nonlinear models indicate that while the
EqNc outperforms the usual sample size, this improved performance is not usually enough to
offset the deficiencies in power by using a multi-cohort design when the measurement,
recruitment, and duration costs are equivalent. One way to offset this problem is a budget in which study duration costs make up a larger share, which would allow for a greater EqNc.
4.4 Chapter Summary
In comparison to the linear model, we found that the nonlinear growth model required
larger sample sizes to achieve the same power and that between-design differences in power
were greater in the nonlinear model. Unlike in the linear model, we did not find that designs with
a larger number of cohorts were more powerful when the number of measurements was fixed
and the GCR was high. Despite this, there were indications that this pattern would emerge with a
sufficiently high GCR. We similarly saw increases in power as a result of increases in the GCR
and effect size, as well as a reduction in between-design differences in power at high GCR and
effect size. These suggest similar properties between the linear and nonlinear versions of the
ALD in terms of response to changes in GCR and effect size.
As in the linear model, power differences under attrition were mostly driven by the
amount of attrition rather than how the missing data was distributed across the age distribution.
Power differences between attrition levels were also greater when the sample size was fixed than
when the number of measurements was fixed. The amount of power loss from attrition in the
nonlinear models (~5%) was found to be much less than in the linear models (~10%), likely due
to the fact that much of the developmental curve is established early in the age trajectory and
thus less impacted by attrition distributed towards the end of the age spectrum. Additionally, the
lower overall power levels in the nonlinear model likely contribute to the comparatively smaller
amounts of power loss under attrition. We additionally examined attrition patterns that occurred
at the beginning of the developmental trajectory which, as expected, showed the least amount of
power, though power differences as a result of gamma were slight. Results for attrition and
frequency of measurement were similar to the linear model, with more frequent measurement
preferred when the sample size is fixed and less frequent measurement when the number of
measurements is fixed.
For the equal cost sample size, we found that improvements in power were greater for
designs with a larger number of cohorts, however unlike in the linear models, no design was able
to achieve the same power as the SCD when recruitment, measurement, and duration costs were
equivalent. If the costs were to shift towards duration (e.g. 66%), this would allow for a substantial increase in the sample size (as much as a 118% increase for the 5-cohort design), which would allow for greater improvements in power.
4.5 Chapter References
1. Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2004). Applied longitudinal analysis. Hoboken, NJ: Wiley-Interscience.
2. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
3. Grimm, K. J., Ram, N., & Hamagami, F. (2011). Nonlinear growth curves in developmental
research. Child Development, 82(5), 1357-1371.
4. Jackson, N.J. (2017). ALDSIM: Stata program for the simulation of accelerated longitudinal
designs. Stata Version 15.0. revised 09.07.2017.
5. Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an
accelerated longitudinal design. Psychological Methods, 5(1), 44.
6. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal
designs. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
7. Timmons, A. C., & Preacher, K. J. (2015). The importance of temporal design: How do
measurement intervals affect the accuracy and efficiency of parameter estimates in
longitudinal research? Multivariate Behavioral Research, 50(1), 41-55.
Chapter 5
Bias, Estimator Efficiency, and Coverage Probability in the absence of between-cohort
differences
In addition to evaluating the statistical power of an accelerated design, this dissertation
will also address how the accelerated design performs in terms of the precision and accuracy to
detect longitudinal change. These are important considerations when choosing a design, as decrements to accuracy can lead a researcher to over- or underestimate the amount of growth that has occurred. Similarly, if accelerated designs reduce the precision of the growth estimates, then results from studies using an ALD to estimate development would be more variable (and hence less reliable) than those from a traditional longitudinal design. In this chapter we will
use the previous simulations from Chapters 3 (linear models) and 4 (nonlinear de-escalating
models) to explore the concepts of precision and accuracy in the ALD and compare metrics of
these to those of the single-cohort design. It is important to note that, as in the previous two
chapters, the results from these simulations represent scenarios that are favorable to the single-
cohort design, as these data are generated with an absence of between-cohort differences in
slopes or intercepts.
5.1 Conceptualizing bias, estimator efficiency, and coverage probability
Our primary metrics for this chapter will be bias, estimator efficiency, and coverage probability. Bias will be quantified as a percentage based on the difference of the estimated parameter ($\hat{\theta}$) from the population parameter ($\theta$), as described by Enders and Bandalos (2001), see equation 1.

$\text{bias} = \frac{\hat{\theta} - \theta}{\theta} \times 100$ ,   eq. 1
Bias can be thought of as a metric of accuracy which will inform us how well the accelerated
design is able to capture the true developmental trajectory. For both the linear and nonlinear
models the population slope parameter was a value of 2. Even the single-cohort design will be
expected to show some measure of bias, as the sample sizes in the simulations are sufficiently
small that the variability from between-subject differences in slope as well as the random
variation (i.e. GCR) will cause the estimated slopes to differ from the population parameter.
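Equation 1 amounts to a one-line computation over the simulated estimates. A minimal Python sketch (the function name is illustrative):

```python
def percent_bias(estimates, theta):
    """Percent bias (eq. 1): 100 * (mean simulated estimate - theta) / theta."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - theta) / theta
```

With the population slope θ = 2, an average simulated slope of 2.034 corresponds to a percent bias of 1.7, the ALD value reported in section 5.2.1 below.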
Estimator efficiency refers to the variability of the slope estimate (b) and is calculated by
taking the standard deviation of the slope estimates from the 1,000 simulations, see equation 2.
𝑒𝑓𝑓𝑖𝑐𝑖𝑒𝑛𝑐𝑦 =
6
∑ (9
:
, 9
;
)
=
>,@@@
:A>
B,CCC
, eq.2
Estimator efficiency will serve as our metric of precision, with larger values indicating less
precision in the estimation. This is not to be confused with our previous use of the term
efficiency to refer to the ability of the accelerated design to reduce the length of time it takes to
study a particular age-span. Efficiency in this chapter will thus refer to efficiency of the estimator
and not the aforementioned design efficiency. One expanded metric of estimator efficiency that
will be used is relative efficiency whereby the efficiency values for various ALDs will be
compared as a ratio of the efficiency values for the equivalent single-cohort design.
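Equation 2 and the relative-efficiency ratio can be sketched as follows (Python; function names are illustrative). Note the divisor is the number of replications, which is 1,000 in these simulations:

```python
import math

def estimator_efficiency(slopes):
    """Eq. 2: standard deviation of the simulated slope estimates; larger
    values indicate a less efficient (more variable) estimator."""
    mean_b = sum(slopes) / len(slopes)
    return math.sqrt(sum((b - mean_b) ** 2 for b in slopes) / len(slopes))

def relative_efficiency(ald_slopes, scd_slopes):
    """Ratio of ALD to SCD efficiency values; values above 1 mean the ALD
    estimator is more variable than the equivalent SCD estimator."""
    return estimator_efficiency(ald_slopes) / estimator_efficiency(scd_slopes)
```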
Coverage probability will be used as our metric for determining the type 1 error control of
these models. The coverage probability in these studies was calculated as the proportion of the
95% confidence intervals from the simulations that included the true population slope parameter.
Coverage probability should not be confused with the term coverage which has been used
throughout the previous chapters and which refers to the number of years of development under
study. Coverage probability can inform us about the standard errors of the slope as well as our
ability to control type 1 error. For example, when the coverage probability is greater than 95%
(our desired or nominal coverage probability), this indicates that the standard errors are likely
too large and result in a conservative test (type 1 error < 5%). When the coverage probability is
smaller than 95%, the standard errors would be too small resulting in a permissive test (type 1
error > 5%). For the simulations in this chapter we do not expect many differences in terms of coverage probability, as coverage is largely a function of bias and the standard error estimation. With normally distributed errors and no between-cohort variation, there is little possibility for large deviations in coverage; nevertheless, we introduce this term and concept in this chapter to aid understanding in the later chapters that introduce between-cohort variation.
5.2 Bias, Estimator Efficiency, and Coverage Probability in the ALD
The simulation conditions for the linear and nonlinear models are the same as outlined in
chapters 3 (section 3.3) and 4 (section 4.3) using the 52 possible designs from the initial
conditions laid out in Table 3.1. While results from the nonlinear model will be displayed
alongside those of the linear, it should be noted that direct comparison between the two is not
possible as these models have different growth structures.
5.2.1 Differences in bias, efficiency, and coverage probability between the SCD and ALD
Our initial examination of bias in the SCD and ALD was conducted by comparing the
average values of the bias between the two design types for the 52 designs (Figure 5.1) in the
linear (panel A) and nonlinear (panel B) growth models.
Figure 5.1: Bias in the ALD versus SCD
For the linear model (panel A), the overall bias in the SCD and ALD are both low, with bias
values below 5% for the ALD and below 3% for the SCD. Bias was generally higher in the ALD
with 35 of the 52 (67%) designs showing higher bias in the ALD (i.e. above the equality line)
with an average bias in the ALD of 1.7% versus 1% in the SCD. This low level of bias in the linear ALD at a small sample size (N=60) suggests good performance relative to the SCD. In terms of real
numbers, with the true growth estimate being a slope of 2, the SCD on average would produce a
slope of 2.02 whereas the ALD would produce a slope of 2.034, a very minimal difference.
In the nonlinear model (Figure 5.1, panel B), the bias range was large for both the SCD
(range 0 to 62%) and ALD (range 0 to 200%) and the bias in the ALD was more likely to be
greater than in the SCD (32 of 52=62%). The average bias for both was also high (~15% SCD,
32% ALD) which for this nonlinear model is likely a function of the low GCR of 0.5 and small
sample size (N=60). The high bias values in the SCD are especially indicative of these problems
as we would expect values closer to 0 for the best-case-scenario SCD model.
When comparing estimator efficiency between the designs we will label the plot of
efficiency values as "Inefficiency" on the y-axis to aid in the interpretation that higher
"Inefficiency" leads to a less efficient estimator. We find that for the linear model (Figure 5.2,
panel A), the ALD is always less efficient than the SCD with an average efficiency of 1.14
versus 0.70.
Figure 5.2: Efficiency in the ALD versus SCD
In the linear model (panel A) we also note that efficiency in the SCD is directly related to the
number of measurements such that designs with a greater number of measurements will be more
efficient and thus less variable in the slope estimates (smaller inefficiency value). This is
intuitive as a greater number of measurements would be expected to increase the reliability of the
slope estimate. On average, the inefficiency of the linear ALD was ~1.6 times greater than in the
SCD. In the nonlinear model (panel B) we surprisingly find no such relationship and moreover
note that 11 of the 52 designs show better efficiency in the ALD (mean=10) than in the SCD
(mean = 20), a likely indication that the growth estimates are too noisy at the given sample size
or GCR for the nonlinear specification. On average the inefficiency of the nonlinear ALD was
~2.4 times greater than in the SCD.
The coverage probability in these models was close to the nominal levels (0.95) for the
linear model (Figure 5.3, panel A) for both the ALD (mean=0.945) and SCD (mean=0.944)
with about equal chance (~60%) that the ALD would have greater coverage probability than the
SCD.
Figure 5.3: Coverage Probability in the ALD versus SCD
This is entirely expected as the results in Figure 5.1 indicate low bias for both models and our
expectation given a linear model with no cohort differences and normally distributed errors
would be that the standard errors (and hence coverage) would be appropriately estimated.
For the nonlinear model (panel B) we find that the coverage probability is similarly
underestimated for both the ALD (mean=0.86) and SCD (mean=0.86) suggesting that both
models have an inflated type 1 error rate. As mentioned in the description of the calculation of
coverage probability (section 5.1), changes to the coverage result from either bias or problems
with standard error estimation usually due to non-Gaussian residual error. In this case, given the
high bias from Figure 5.1 it seems likely that these decrements in coverage probability result
from issues pertaining to the bias in the growth estimates. Coverage probability was equivocal between the SCD and ALD, with the SCD showing coverage closer to the nominal level in ~55% of the models.
5.2.2 Bias, Efficiency, and associations with Cs/Ps Ratio and number of measurements
Changes to the Cs/Ps ratio or the number of measurements are not likely to impact the coverage probability, as this is generally invariant to changes relating to the frequency
or amount of measurement. However, estimator efficiency is likely to be impacted by these
metrics as increased frequency or total number of measurements will generally improve the
efficiency and should provide for more precise slope estimation. In Figure 5.4 below we examine
how the relative efficiency for ALDs referent to SCDs with 21 measurements performs for the
linear (panel A) and nonlinear models (panel B).
Figure 5.4: Smaller Cs/Ps Ratios minimize efficiency differences with the ALD
For SCDs with 21 measurements, the equivalent ALD will minimize the relative inefficiency in
the estimator when the Cs to Ps ratio is smaller (i.e. more overlap between successive cohorts).
This change in the relative efficiency is smaller for designs with a smaller number of cohorts, but
increases as the number of cohorts in the model increases. For a fixed age-span of 10 years, these
changes to the Cs/Ps ratio reflect the changes to estimator efficiency as a result of changes to the
number of measurements per subject (Pn).
Figure 5.5: Estimator efficiency in the ALD is related to the number of periods (Pn)
In Figure 5.5 we see that the estimator efficiency of a design improves (with diminishing returns)
as the number of measurements is increased. This occurs in both the linear (panel A) and
nonlinear models (panel B), though the nonlinear model shows much greater variability in this
association. This is similar to how the number of measurements relates to power, and not surprisingly, designs with better efficiency show greater statistical power (Figure 5.6).
Figure 5.6: Estimator efficiency is related to statistical power
As a result of the associations we have seen in efficiency with the number of measurements and
the Cs/Ps ratio it is no surprise that the patterns observed would differ (just as their power
counterparts did) with respect to the period interval spacing. Indeed, we saw hints of this in
Figure 5.2 where the estimator efficiency in the SCD differed by the total number of
measurements which represented shifts in the Ps. In Figure 5.7 below we can see how the period
interval modifies the association between the number of measurements and the estimator
efficiency for both the linear (panel A) and nonlinear models (panel B) such that the rate of
efficiency improvement is greater at a fixed number of measurements when the frequency of
measurement is less.
Figure 5.7: Less frequent measurement improves the estimator efficiency for a fixed number of
measurements
In the scenarios proposed above, increases to the number of periods (Fig 3.5) or the
period interval (Fig 5.7) can increase the power of these designs and hence improve the
efficiency. The reason this occurs is because both the period spacing and the number of periods
are related to each other in these designs as a result of the fixed age-span under study, wherein
changes to one result in changes to the other. This has been explained previously in Section
3.3.3.
For bias, we would generally not expect changes to the number of measurements to
reduce the bias substantially, nor for our linear models would we expect changes to the period
interval to impact bias. In Figure 5.8 below we can see how at a small total number of
measurements the estimates are biased in both the linear and nonlinear models, however this bias
quickly reduces to an equilibrium after around 500 measurements. What is also notable is that
this variability in the bias is greater for designs with a larger number of cohorts.
Figure 5.8: Association of total number of measurements with bias
For the nonlinear model (panel B) we are unable to draw the same figure as in the linear model
(panel A) due to the reduced number of simulations at various sample sizes, however the same
inference can be drawn that a very small total number of measurements is likely to increase the
overall amount of bias in the slope. Intuitively, we would expect more frequent measurement (smaller Ps) to yield less bias in the slope estimate, as it would be better able to capture changes in the development. In
Figure 5.9 we examine how changes to the period spacing impact the average bias across these
designs.
Figure 5.9: Frequency of measurement and bias in the ALD
In a truly linear model we would not anticipate frequency of measurement, outside of its
contribution to the total number of measurements, to have a great impact on bias as only two
measurements are needed to properly estimate a linear growth. We find that with less frequent
measurement, the ALD tends to underestimate the amount of growth (-0.7%) while at very
infrequent measurement (Ps = 2.5) the ALD overestimates the growth (~1.1%) (panel A). This
increase in the magnitude of bias is more likely related to the loss in the total number of
measurements for these designs with period interval spacing > 2. For annual or biennial
measurement, there was slight underestimation of the growth (~ -0.05 to -0.2%), however these
values largely indicate no bias in the linear ALD. For the nonlinear model, the frequency of
measurement, particularly around the period of rapid change, could lead to problems in accurate
estimation of the slope. One obvious issue is the ability to detect nonlinear change, as infrequent
measurement may give the impression of a linear model when in fact a nonlinear process is
occurring. While this is not a concern in this dissertation, as we have a priori specified the model
structure, for those designing a study this issue is critical. What is noticeable for our nonlinear
models is that the bias more than doubles as a result of decreasing the frequency of measurement, suggesting that for the nonlinear model, designs with yearly (Ps=1) or twice-yearly (Ps=0.5) measurements are necessary in order to avoid inducing a large amount of bias into the
slope estimate. While the average bias level is still high even with frequent measurement
(~14%), as explained previously this is more a function of the low default GCR (0.5) employed
for these models.
5.2.3 The impact of cohort overlap on efficiency and bias
The amount of overlap between successive cohorts has been previously shown to impact
the power of linear and nonlinear designs: designs with a greater amount of overlap between the cohorts have greater power, and this power increases nonlinearly, with a faster rate of increase at greater overlap. Indeed, just as we found that power and estimator efficiency were related in Figure 5.6 above, we would thus expect similar results when
examining the relationship of estimator efficiency to overlap in Figure 5.10 below.
Figure 5.10: The association of estimator efficiency with cohort overlap
As we can see for both the linear and nonlinear models, the increase in overlap results in an
improvement in efficiency and mirroring the results for power show that this improvement in
efficiency is greatest for designs with a larger number of cohorts. While the figure indicates that
these improvements tend to oscillate even as they trend downwards, these oscillations are
resultant from differences in the cohort and period interval spacing within the same amount of
overlap. The decrements to efficiency (i.e. higher inefficiency) within these localized windows
reflect a smaller Cs/Ps ratio while improvements represent a higher ratio. To examine how bias pertains to overlap, consider the exemplar below (Figure 5.11), which shows how cohort-level growth estimates might look in designs with more or less overlap in their trajectories and suggests implications for bias in the slope estimates. Please note that while Figure 5.11 shows subtle differences in intercept level, these are purely to distinguish the separate cohorts; the models assume cohorts with the same slope and intercept levels.
Figure 5.11: Exemplar of multi-cohort slopes by overlap amount
As should be apparent from the exemplar above, when the fixed effect of slope is estimated from
these models, designs with a high degree of overlap (dashed lines) will have each cohort
contribute more information to the global fixed effect estimate than those with low overlap (solid
lines). Most importantly, the high overlap extends the amount of information available at the
extremes of the age distribution, ensuring that the leverage points for the trajectory are more
representative of the growth from all of the cohorts instead of just the youngest and oldest
members. This occurs because in these designs where the coverage has been fixed, the increases
in overlap represent a decreasing of the cohort interval spacing (Cs) and subsequent increase in
the number of measurements per subject (Pn) to ensure that the same age-span coverage is
maintained. As a result, it is sensible to presume that the amount of overlap would impact the
bias of the growth estimates; however, in the special case where there are no slope or intercept
differences between the cohorts (as is the case in this chapter), we would not expect the bias
in the slope to depend on the amount of overlap (Figure 5.12). Even so, we note that the
variability in the bias is reduced at higher levels of overlap, which is particularly prominent in
the nonlinear model (panel B).
Figure 5.12: Bias in the slope and overlap in the absence of cohort differences
As shown in the figure, there is no discernible pattern between the amount of overlap and the
average amount of bias in the slope for each cohort design. Indeed, the figures show that there
are designs with both low overlap and low bias; however, the propensity towards low bias is more
common at higher overlap. The greater variability in the bias at lower overlap is likely due to
the added influence of any single cohort whose parameters differ from the group trajectory, as at
low overlap the influence of the youngest and oldest cohorts on the overall trajectory is
increased. At this small sample size, random variation from the simulation could also produce
this greater variability at lower overlap and explain why these bias values can occasionally
still be slight despite the low overlap.
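Under the simple parameterization used in this chapter, the trade-off between coverage and overlap can be sketched as follows. The formulas here are assumptions for illustration (overlap is taken as the shared portion of successive cohorts' age ranges), not the dissertation's formal definitions:

```python
def ald_design_metrics(c_n, c_s, p_n, p_s):
    """Basic ALD design metrics under assumed definitions.

    c_n: number of cohorts (Cn)
    c_s: cohort interval spacing (Cs)
    p_n: number of measurement periods per subject (Pn)
    p_s: period interval spacing (Ps)
    """
    span_per_cohort = (p_n - 1) * p_s                    # age range each cohort covers
    total_coverage = (c_n - 1) * c_s + span_per_cohort   # full age span of the design
    # Overlap between successive cohorts: shared portion of their age ranges
    overlap_len = max(0.0, span_per_cohort - c_s)
    overlap_prop = overlap_len / span_per_cohort if span_per_cohort > 0 else 0.0
    return total_coverage, overlap_prop
```

Holding coverage fixed, a 2-cohort design with 5 periods and a 5-cohort design with 2 periods span the same age range, but the former retains 75% overlap between successive cohorts while the latter has none, mirroring the trade-off described above.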
5.2.4 The impact of Cohort parameters on ALD Bias, Efficiency, and Coverage Probability
In the previous sections we noted how changes to the amount and frequency of
measurement related to changes in bias and estimator efficiency. While these covered aspects
pertaining to the period spacing (Ps) and number of periods (Pn), they neglect the differential
impact that the choice of the number of cohorts may play (Cn). While we saw some indications of
how the number of cohorts relates to estimator efficiency in Figures 5.4 and 5.10, we have yet to
examine how these metrics perform when the number of periods and the period spacing are fixed. In
the following simulations we will examine how changes to the number of cohorts impact these
measures at various sample sizes while fixing the cohort and period intervals to 1.
For bias we would not expect many changes with sample size, as metrics of bias are
known to be unaffected by changes to sample size. While the simulations largely show this, there
is some indication that at very small sample sizes the bias is inflated.
Figure 5.13: Bias by Cohort as a function of sample size
In the linear model (panel A) we can easily see how bias at sample sizes less than 40 is greater,
particularly for designs with a large number of cohorts. This occurs because the sample size per
cohort is reduced as the number of cohorts increases. Bias at higher sample sizes oscillates
around 0, indicating that changes across sample size are largely due to the stochastic nature of
the simulations. These same properties in the nonlinear model (panel B) are more difficult to
discern because of the limited sample size used, for reasons explained in Chapter 4. However, the
variability in the bias differs by cohorts for the linear (panel C) and nonlinear (panel D)
models: designs with a greater number of cohorts have a more variable bias, which increases at a
near linear rate per cohort in the linear model. Bias variability in the nonlinear model, though
increasing with the number of cohorts, is similar between the SCD and 2-cohort design and again
between the 3- and 4-cohort ALDs. This may have to do with the amount of overlap and the ability
of the oldest and youngest cohorts to exert greater leverage on the nature of the slope. While we
noted that mean levels of bias did not vary as a function of the overlap percentage in Figure
5.12, we saw that the estimates of bias were more variable at lower levels of overlap. In panel A
of Figure 5.13 above, though Ps and Cs are both fixed to 1, for a design with 5 cohorts to equal
the same age-span coverage as a design with only 2 cohorts, the number of periods per subject has
to be reduced, and hence the amount of overlap between the successive
cohorts. To test this, we examined the simulation data from Figure 5.12 and plotted the
variability in the bias estimates as a function of quintiles of the overlap proportion with a fixed
sample size.
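The percent-bias metric used throughout these comparisons, and its variability across simulation replicates, could be computed along these lines (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def bias_metrics(estimates, true_slope):
    """Percent bias and its variability across simulation replicates.

    estimates: array of slope estimates, one per simulated dataset
    true_slope: the generating (true) slope value
    """
    estimates = np.asarray(estimates, dtype=float)
    pct_bias = 100.0 * (estimates - true_slope) / true_slope
    # Mean percent bias and its between-replicate standard deviation
    return pct_bias.mean(), pct_bias.std(ddof=1)
```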
Figure 5.14: Variability in Bias and Overlap quintiles
As we can see in Figure 5.14 above, within each ALD there is an overall downward trend in bias
variability as the overlap proportion increases. Thus, while mean bias levels do not vary greatly
with the number of cohorts (Fig. 5.13), the variability in the bias is higher for designs with a
greater number of cohorts (Fig. 5.13), and this occurs largely as a function of the reduced
overlap in these designs, as evidenced by the overlap-bias variability relationship.
For estimator efficiency, we would expect patterns similar to the number-of-cohorts and
power associations described in Chapters 3 and 4, which showed greater power for a smaller number
of cohorts. Indeed, as seen in Figure 5.15, we find that as the sample
size increases the efficiency improves for both the linear (panel A) and nonlinear models (B).
Figure 5.15: Improvements in estimator efficiency with increased sample size
We additionally find that regardless of sample size, designs with a fewer number of cohorts show
the best estimator efficiency and that the between-design differences in efficiency increase for
designs with a larger number of cohorts. The rate of change in efficiency as a function of sample
size is also consistent between designs of differing cohort numbers as evidenced by the changes
in relative efficiency across the sample sizes which show a consistent ratio of ALD to SCD
efficiency (Figure 5.16) in the linear model (panel A). Though the nonlinear model (panel B)
appears to show changes in relative efficiency across sample size, these differences are
minimized after a sample size of 40; the same pattern is present, though less pronounced, in the
linear model.
Figure 5.16: The relative estimator efficiency within a design is consistent across sample size
Though the estimator efficiency and power are related, the relative efficiency and relative power
show different patterns. For power, we saw that increases in sample size showed greater gains in
relative power (Figure 3.11), indicating that the power gains for the ALD accrued at a faster
rate than in the SCD, which was not the case for the relative efficiency (Figure 5.16).
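One way to operationalize the relative estimator efficiency described above is as a ratio of empirical standard deviations of the slope estimates across replicates. This sketch assumes that convention, which may differ from the dissertation's exact metric:

```python
import numpy as np

def relative_efficiency(ald_estimates, scd_estimates):
    """Relative estimator (in)efficiency, sketched as the ratio of empirical
    standard deviations of the slope estimates across replicates.
    Values > 1 mean the ALD slope is estimated less precisely than the SCD's."""
    ald_sd = np.std(ald_estimates, ddof=1)
    scd_sd = np.std(scd_estimates, ddof=1)
    return ald_sd / scd_sd
```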
Given the low bias in the linear model and expected performance of the estimator
efficiency we would expect the coverage probability to not vary as a function of sample size for
the linear model. Similar to our results for bias, we find that at very low sample sizes (N < 40)
the coverage probability is underestimated, resulting in Type I error inflation (Figure 5.17,
panel A).
Figure 5.17: Coverage Probability as a function of sample size
At larger sample sizes the coverage probability oscillates as a result of random variation from
the simulation processes. There are no apparent differences in coverage probability between
designs with different numbers of cohorts in the linear model (panel A); the nonlinear model
(panel B), however, shows consistent underestimation at all sample sizes, and this
underestimation is more pronounced in designs with a greater number of cohorts. While part of
the discrepancy between the linear and nonlinear model is due to the limited sample size for the
nonlinear model, the consistent underestimation of coverage exposes a potential liability for the
nonlinear design and is believed to occur as a result of the seemingly low GCR (0.5) with which
these nonlinear models were simulated. There was no consistent pattern with regards to the
variability in the coverage probability for either the linear (panel C) or nonlinear (panel D)
model indicating that the observed fluctuations in coverage are most likely the result of
stochastic processes rather than related to design elements of the ALD.
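The coverage probability examined in this section can be computed as the share of replicates whose nominal 95% confidence interval contains the true slope. A minimal sketch assuming Wald-type intervals:

```python
import numpy as np

def coverage_probability(estimates, std_errors, true_slope, z=1.96):
    """Empirical coverage: share of replicates whose nominal 95% Wald CI
    contains the true slope. Values below 0.95 imply Type I error inflation."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    return np.mean((lower <= true_slope) & (true_slope <= upper))
```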
5.2.5 The role of GCR on bias, efficiency, and coverage probability
The growth curve reliability (GCR) and slope effect size (δ) have both been shown to
impact the power of the ALD design, however we have yet to examine the impact on bias,
efficiency, and coverage probability in these designs.
We will initially explore changes in the GCR as the effect size is fixed to δ=0.5. The
GCR represents the ratio of variability due to growth over the total variability in the system
(see Chapter 3 for a formal definition); as a result, it can be thought of as indexing the random
error of the model, with higher values indicating less random error. The GCR in both the linear
and nonlinear models was set to 0.5, 0.7, and 0.9 in the simulations below. For bias, our prior
expectation would be no change as a result of the GCR, as more or less random error in the
model should not impact the bias of the point estimate. In Figure 5.18 we explore bias in these
designs for the largest sample size of 120. We can see that the expectation of no association
between GCR and bias was only partially borne out, as the linear model (panel A) showed mostly
random fluctuation around the zero value.
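Taking the GCR as defined above (growth variance over total variance), a small sketch shows both the ratio and its inversion for setting up simulation conditions. The function names and the two-component variance split are illustrative assumptions:

```python
def growth_curve_reliability(var_growth, var_error):
    """Growth curve reliability under the assumed definition: variance due to
    growth over total variance. Higher GCR means less random error."""
    return var_growth / (var_growth + var_error)

def error_variance_for_target_gcr(var_growth, target_gcr):
    """Invert the ratio to find the residual variance that yields a target GCR,
    e.g. when setting up a simulation condition of 0.5, 0.7, or 0.9."""
    return var_growth * (1.0 - target_gcr) / target_gcr
```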
Figure 5.18: GCR and Bias
For the nonlinear model (panel B), however, we found that increases to the GCR continued to
show improvements in bias. These results were contrary to our expectations and showed that
increased GCR reduces the bias of the slope estimate. Why this occurs requires further
exploration, as changes to the amount of random error should not alter the point estimate of the
slope, only the precision with which we are able to estimate it. It highlights the sensitivity of
the nonlinear models to the growth curve reliability: at low values of the GCR, the high amount
of random error distorts the shape of the slope and results in higher bias. Table 5.1 below shows
how the bias changed in the linear and nonlinear models from a GCR of 0.5 to 0.9.
Table 5.1: Bias from low to high GCR

Model       Cohorts (Cn)   Bias% at N=120   Bias% at N=120   Fold-Change in Bias
                           (GCR=0.5)        (GCR=0.9)
Linear           1             -0.85            -0.87              0.98
                 2             -0.51            -0.72              0.71
                 3             -1.92            -0.87              2.21
                 4             -1.57            -1.71              0.92
                 5              1.47             0.89              1.66
Nonlinear        1              5.7              1.3               4.3
                 2              3.9              3.1               1.2
                 3             18.2              5.8               3.1
                 4             20.1              6.3               3.2
                 5             21.5              8.4               2.6
As we can see, the fold-change in bias was largely independent of the number of cohorts in the
model and showed that all designs came within 0.9% and 7% of the SCD bias at the 0.9 GCR for
the linear and nonlinear models respectively. The 2-cohort design in the nonlinear simulations
was an outlier with respect to the change in bias, suggesting a potential problem with the
estimation of these models in the 2-cohort scenario.
We would additionally expect greater improvements in precision and hence
improvements in the average estimator efficiency as we increase the GCR. In Figure 5.19, we
find exactly this: increases to the GCR improve efficiency in both the linear and nonlinear
models, and the improvements in efficiency are greater for designs with more cohorts. For the
linear SCD we find that moving from a 0.5 to a 0.9 GCR results, on average, in a 1.3-fold
improvement in efficiency, while for the 5-cohort design this increases to a 1.8-fold change
(Table 5.2).
Figure 5.19: Increased GCR improves estimator efficiency
Table 5.2: Average Estimator Efficiency from low to high GCR

Model       Cohorts (Cn)   Avg Eff (GCR=0.5)   Avg Eff (GCR=0.9)   Fold-Change in Eff
Linear           1               0.82                0.62                1.32
                 2               0.89                0.63                1.41
                 3               0.99                0.66                1.51
                 4               1.12                0.68                1.64
                 5               1.26                0.70                1.80
Nonlinear        1               4.5                 0.70                6.1
                 2              26.5                 2.0                13.4
                 3               9.4                 3.4                 2.8
                 4              22.3                 2.9                 7.7
                 5              18.3                 4.3                 4.2
While this pattern of designs with a greater number of cohorts showing greater improvements in
efficiency was not as clearly present in the nonlinear models (Table 5.2), this likely results
from the limited number of simulations from which these conclusions can be drawn for the
nonlinear models. Notably, the 2-cohort nonlinear design is an outlier in estimator efficiency
compared to the other ALDs, suggesting either that there are unique problems with the estimation
of 2-cohort nonlinear designs or, alternately, that these designs are more sensitive to variation
in simulation conditions given that both cohorts have high leverage on the overall trajectory.
Given the changes that we see for the estimator efficiency it would not be surprising to
find improved coverage probability at higher GCRs. In Figure 5.20 below we examine how
changes to the GCR impact coverage. While for the linear model (panel A) there is no clear
initial pattern, the seemingly high variability in the coverage at GCRs of 0.5 and 0.7 largely
reflects small changes in the coverage due to random variation in the simulation. Across the
GCRs, the cohorts fluctuate around a similar coverage probability that on average is below the
nominal 0.95 level, suggesting that both the SCD and ALDs fall short of nominal coverage across
the sample sizes chosen.
Figure 5.20: GCR and Coverage Probability
At the high GCR (0.9) the designs with a greater sample size (~120) approach the nominal
coverage (mean=0.946) suggesting that for linear models a GCR of 0.9 is needed with a fairly
large sample size (N=120) at a modest effect size (δ=0.5). We will see below how increases to
the effect size can lower the sample size requirement to reach the nominal coverage probability.
For the nonlinear model (panel B), the changes to coverage as a function of the GCR are more
apparent and display a pattern whereby increased GCR improves the coverage probability.
While at a low GCR (0.5) the coverage values are below the nominal level (mean=0.89), as the
growth curve becomes more reliable the coverage probability increases dramatically (mean=0.94),
with the best coverage at a GCR of 0.9 for the single-cohort design. We must interpret what has
occurred here in light of our findings for both bias and estimator efficiency. Although both the
bias (Fig. 5.18) and estimator efficiency (Fig. 5.19) improved with higher GCR, the average bias
across designs even at a large sample size (N=120) was still high (mean=5%), ranging from 1.3% in
the SCD to 8.4% in the 5-cohort ALD. The rate of change in bias was also faster than the rate of
change in the efficiency, resulting in individual estimates from these simulations that still
showed positive bias but now with confidence intervals wide enough to cover the true value.
In the above Figures (5.18-5.20) we saw how the ALDs and SCD appeared to vary in
their values for bias, efficiency, and coverage probability even at a high GCR (0.9). We can
confirm this by examining the between-cohort variability in these metrics. In Figure 5.21 we can
see how the between-cohort variability in bias (panels A & B), estimator efficiency (panels C &
D), and coverage probability (panels E & F) behaves as a function of GCR and sample size.
Figure 5.21: Between Cohort Variability in Bias, Efficiency, and Coverage Probability as
a function of GCR and Sample Size
Many of the figures above indicate that the variability between designs of different cohort
numbers (Cn) is smaller at larger sample sizes and that this variability reduction increases as
the GCR increases, highlighting the importance of good reliability in the estimates for
minimizing the consequences of the choice of how many cohorts to use in an ALD. For bias in
the linear models (panel A), after a sample size of approximately 40, the variability between the
cohorts was minimized and moreover the effect of having a high GCR became negligible (i.e.
within .1 SD). For the nonlinear model (panel B), the differences across sample size and GCR
were more pronounced, with between-cohort variability reductions in bias being much greater at
a higher GCR and large sample size. The between-cohort variability in the estimator efficiency
was heavily impacted by increases in the GCR. These showed a nearly 4-fold difference between
the high (0.9) and low (0.5) GCR in the variability of the efficiency regardless of sample size in
the linear model (panel C). Indeed, the within-GCR variability across sample size changed
minimally after a sample size of ~40, suggesting that changes to GCR will be more impactful for
reducing between-cohort variability in efficiency rather than increasing sample size. A similar
pattern was observed in the nonlinear model (panel D). For coverage probability, the between-
cohort variability was minimal regardless of GCR, particularly at larger sample sizes (N>40).
Unlike bias and efficiency, there was no consistent pattern to suggest that a higher or lower
GCR would minimize between-cohort differences in the linear model; the nonlinear model (panel F),
however, showed clear improvements with increased GCR. This aligns closely
with what we saw in Figure 5.17 which showed that within-design variability in coverage was
independent of the number of cohorts used. Given these findings and those from Figure 5.20, it
seems clear that while increased GCR may improve the coverage, this improvement occurs
regardless of the number of cohorts used in the linear model.
5.2.6 The role of Effect Size on bias, efficiency, and coverage probability and the interplay with
GCR
As we saw above, the growth curve reliability can have a large impact on improvements
in bias, estimator efficiency, and coverage probability. While the GCR relates to the amount of
random error in the developmental trajectory, the slope effect size (δ) pertains to the precision of
the slope parameter itself and is defined as the ratio of the slope to its standard deviation.
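The definition above can be illustrated with a toy single-cohort simulation in which the effect size δ = slope / SD(slope) is set directly. The generating model here (random slopes plus residual noise, pooled OLS slope estimation) is an assumed simplification for illustration, not the dissertation's exact simulation:

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_slope_estimates(n_subj=120, n_periods=5, mean_slope=0.5,
                             slope_sd=1.0, resid_sd=1.0, n_reps=200):
    """Toy single-cohort simulation: each subject's slope is drawn around
    mean_slope with SD slope_sd, so the effect size is
    delta = mean_slope / slope_sd. Returns the pooled OLS slope estimate
    from each replicate."""
    t = np.arange(n_periods, dtype=float)
    tc = t - t.mean()                       # centered time
    ests = []
    for _ in range(n_reps):
        slopes = rng.normal(mean_slope, slope_sd, size=n_subj)
        y = slopes[:, None] * t + rng.normal(0.0, resid_sd, size=(n_subj, n_periods))
        # Pooled OLS slope of y on centered time, averaged over subjects
        est = (y * tc).sum() / (n_subj * (tc ** 2).sum())
        ests.append(est)
    return np.array(ests)
```

Here δ = 0.5/1.0 = 0.5, matching the "medium" condition; the mean of the returned estimates should land near the generating slope of 0.5.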
When examining bias, we would expect that increases to effect size, which reflect
increases in precision around the slope, would decrease the bias in the slope estimate. Indeed,
this is exactly what we find for both the linear and nonlinear models in Figure 5.22 below.
Figure 5.22: Bias as a function of sample size and effect size (δ)
While the figure for the linear model (panel A) has been loess smoothed to make visualization
easier, it should be noted that there is variability in the bias across the sample size that oscillates
around the mean bias estimate. While there are large differences in the bias between the cohorts
at small sample sizes (N <=40), on average the amount of bias in the linear model is not terribly
different in the ALDs versus the SCD even at a smaller effect size of 0.5 and with a low GCR
(Figure 5.23, panel A).
Figure 5.23: Average Bias as a function of effect size (δ)
Table 5.3 shows the average bias values across the various effect sizes for the linear and
nonlinear models at a GCR of 0.5. The average bias in linear models was so similar (within
~0.7%) at all effect sizes for designs with 4 or fewer cohorts as to suggest that the use of ALDs
makes no difference in terms of bias. The same cannot be said for the nonlinear models (panel B),
where there are large differences in bias, especially at the smaller effect size (0.5), at which
the bias difference from the SCD can be as high as ~50%. While this reduces to within 5% of the
SCD bias at an effect size of 1.0 for most of the designs, curiously the 2-cohort design shows a
high level of bias at this effect size. While this could indicate a true bias value, it more
likely indicates a problem with the simulation for this cohort, given that the values for this
cohort and effect size appear radically different from the other ALD designs when looking at
changes across sample size in Figure 5.22.
Table 5.3: Average Bias across effect sizes at GCR=0.5

Model       Cohorts (Cn)   Avg Bias% (δ=0.5)   Avg Bias% (δ=1.0)   Avg Bias% (δ=1.5)
Linear           1              -0.25               -0.23               -0.08
                 2              -0.54                0.02               -0.19
                 3              -0.90                0.03               -0.30
                 4               0.11               -0.09                0.05
                 5               1.28               -0.26                0.43
Nonlinear        1               7.1                 2.6                 0.9
                 2             -28.7                13.2                 0.8
                 3              26.6                 6.1                 2.5
                 4              46.8                 4.3                 2.6
                 5              60.4                 6.9                11.0
For estimator efficiency we would also anticipate improved efficiency at greater effect
sizes. In the linear model, these improvements are readily apparent and when examining the
average efficiency across the sample sizes it is easily noted that increases to effect size
demonstrate diminishing returns on improvements in efficiency (Figure 5.24, panel A).
Figure 5.24: Impact of effect size on average estimator efficiency
Within a given effect size, the between-design differences in efficiency increase as additional
cohorts are added; as the effect size increases, these differences are minimized. For the
nonlinear model (panel B), we note that the 2-cohort design performs poorly at the small (0.5)
and medium (1.0) effect sizes and that differences between the cohorts are only minimized at the
largest effect size (1.5), and only for designs with fewer than 5 cohorts. These differences
are displayed in Table 5.4 below.
Table 5.4: Average Estimator Efficiency across effect sizes at GCR=0.5

Model       Cohorts (Cn)   Avg Eff (δ=0.5)   Avg Eff (δ=1.0)   Avg Eff (δ=1.5)
Linear           1              0.82              0.41              0.37
                 2              0.89              0.44              0.36
                 3              0.99              0.50              0.45
                 4              1.12              0.56              0.38
                 5              1.26              0.63              0.64
Nonlinear        1              4.5               0.4               0.2
                 2             26.5              22.9               1.0
                 3              9.4               1.9               0.4
                 4             22.3               3.5               1.0
                 5             18.3               7.8               5.6
The average coverage probability was not related to the effect size in the linear model
(Figure 5.25, panel A); in the nonlinear model (panel B), however, increased effect size resulted
in increases to the coverage. This was particularly pronounced for designs with a larger
number of cohorts, though designs with a smaller number overall showed better coverage. For
the nonlinear models, this suggests that improved precision reduces bias at a faster rate than it
improves efficiency, and that the greater variability in the larger-cohort designs allows for
this increase in coverage probability.
Figure 5.25: Impact of effect size on average coverage probability
The interplay between the GCR and effect size (δ) can be difficult to understand. As we
saw in the previous section, when the effect size was held constant for linear models, increases to
the GCR were not associated with changes in bias or coverage probability but were associated
with improvements in the estimator efficiency, particularly at larger sample sizes (Figures
5.18-5.20).
Also in the linear model, increases in effect size were shown to decrease bias and improve
efficiency, but with no effect on coverage. For the nonlinear model we saw an alternate pattern
whereby increases to the GCR decreased bias, improved efficiency, and paradoxically decreased
the coverage probability. Larger effect sizes were also shown to decrease bias and improve
efficiency while increasing the coverage probability. In order to better understand these two
measures of effect, we'll explore how the GCR and effect size interact with each other.
To examine bias, we created Figure 5.26, which looks at the change in bias between a
high (0.9) and low (0.5) GCR at a medium (0.5) and large (1.0) effect size (panels A & B). We
alternately explore the change in bias between the medium and large effect sizes at both the low
and high GCR (panels C & D).
Figure 5.26: Interplay between GCR and Effect Size (δ) on Bias
Table 5.5: Average Bias across effect sizes and GCR

Model       Cohorts (Cn)   Avg Bias%          Avg Bias%          Avg Bias%          Avg Bias%
                           (GCR=0.5, δ=0.5)   (GCR=0.5, δ=1.0)   (GCR=0.9, δ=0.5)   (GCR=0.9, δ=1.0)
Linear           1              -0.25              -0.23              -0.11              -0.07
                 2              -0.54               0.02              -0.38              -0.20
                 3              -0.90               0.03              -0.65              -0.33
                 4               0.11              -0.09               0.05               0.03
                 5               1.28              -0.26               0.48               0.25
Nonlinear        1               7.1                2.6                2.3                0.3
                 2             -28.7               13.2                3.5                1.0
                 3              26.6                6.1               11.8                2.0
                 4              46.8                4.3                5.0                2.0
                 5              60.4                6.9                6.9                3.3
Our findings regarding bias with GCR and effect size are similar to what was previously
discovered, with some differences. For the linear model, while previously we found GCR to be
independent of changes in bias (Fig 5.18), this observation was due to the plotting of these
associations for specific sample sizes. Given that the bias has been shown to oscillate as a
function of sample size this finding was entirely dependent on which sample sizes were chosen.
When bias values are averaged across the sample sizes in Figure 5.26 (panel A), we can see that
on average these values decrease as the GCR is increased when the effect size is medium (0.5),
though these decreases were minimal (<~1%). Table 5.5 above provides the numerical values for
in-depth study. At the larger effect size, however, there appears to be no relationship between
the
GCR and bias. For the nonlinear model (panel B) our results were similar to before which show
that increased GCR decreases bias. The amount of decrease was substantial for the ALDs at the
medium effect size ranging between a 15% to 53% drop, with the greatest decrease in the larger
cohort designs. At the larger effect size, the drop in bias was much less with most of the ALDs
experiencing a <4% drop. At a large effect size and small GCR, the ALDs (with exception of the
2-cohort outlier) were all within 5% of the SCD bias and this reduced to within 3% at the high
GCR. Looking at changes to effect size within a given GCR we found for the linear model (as
noted previously in Fig 5.23) that increased effect size decreased bias at both a large and small
GCR (panel C). With a large effect size in the linear model the bias values were all within 0.25%
of each other regardless of GCR for the ALDs and SCD. For the nonlinear model (panel D),
increases in effect size also decreased the bias, an effect that was stronger when the GCR was
low. At the high GCR, the average decrease in bias from increasing the effect size was ~4%.
For estimator efficiency, increases in efficiency occurred when the GCR was increased from
0.5 to 0.9 in the linear model at both the medium and large effect sizes (Figure 5.27, panel A),
though the rate of change was greater when the effect size was larger (δ=1.0), resulting in a
~2-fold increase in efficiency for the 5-cohort design. Table 5.6 below provides the numerical
values for comparison. Differences in efficiency between the designs were modest at the low GCR,
with a ~1.5-fold difference between the SCD and 5-cohort design at both effect sizes. When the
GCR was high (0.9), between-design differences were minimized to at most a 1.11-fold difference.
For the nonlinear model (panel B), the changes in efficiency were even greater, such that
increased GCR showed at minimum a 4-fold increase in efficiency, with the SCD improving 6-fold
when the effect size was small. Changes in estimator efficiency due to GCR were smaller at the
large effect size, with a 1.3- to 2-fold increase in efficiency. Between-design differences in
efficiency were minimized at the high GCR, particularly at the large effect size.
Figure 5.27: Interplay between GCR and Effect Size (δ) on Estimator Efficiency
Table 5.6: Average Efficiency across effect sizes and GCR

Model       Cohorts (Cn)   Avg Eff            Avg Eff            Avg Eff            Avg Eff
                           (GCR=0.5, δ=0.5)   (GCR=0.5, δ=1.0)   (GCR=0.9, δ=0.5)   (GCR=0.9, δ=1.0)
Linear           1              0.82               0.41               0.62               0.31
                 2              0.89               0.44               0.63               0.32
                 3              0.99               0.50               0.66               0.33
                 4              1.12               0.56               0.68               0.34
                 5              1.26               0.63               0.70               0.35
Nonlinear        1              4.5                0.4                0.70               0.3
                 2             26.5               22.9                2.0                0.3
                 3              9.4                1.9                3.4                0.4
                 4             22.3                3.5                2.9                0.6
                 5             18.3                7.8                4.3                0.6
When the effect size was modulated within a GCR, increases in effect size showed a similar rate
of change in efficiency regardless of the GCR, resulting in a 2-fold increase in efficiency in
the linear model (panel C). For the nonlinear model (panel D), efficiency in the SCD at the low
GCR (0.5) was worse than that of any of the ALDs at the high GCR, but at least 2-fold better than
that of any of the ALDs at the low GCR. Moreover, although the 2-cohort design exhibited an
unusual pattern with the effect size at a low GCR (see Fig 5.24, panel B), at a high GCR the
2-cohort design performs as expected, showing the next-best efficiency after the SCD.
When examining coverage probability, our previous results showed no association with
GCR in the linear model and no association with effect size in the linear or nonlinear model.
Paradoxically we found that in the nonlinear model, increases in GCR drastically reduced the
coverage of the slope confidence intervals. Our results below (Figure 5.28) when examining this
interaction of effect size and GCR confirmed these findings.
Figure 5.28: Interplay between GCR and Effect Size (δ) on Coverage Probability
For the linear model, some ALD designs may give the appearance of a trend with increased GCR
(panel A) or effect size (panel C); however, these changes are within 0.005 of each other and
more likely reflect random variation resulting from the simulations. Similarly, the relative
position of the cohorts in the linear model is not likely meaningful as there is no apparent
ordering to the designs with respect to their coverage. For the nonlinear model, we find
decreased coverage at high GCR and that this decrease is greater at a larger effect size and for
designs with more cohorts (panel B). This surprising finding likely indicates problems with the
default effect sizes specified for the nonlinear model. Within a GCR, there is no association
with increased effect size; however, mean coverage levels are lower at the higher GCR (panel D).
5.2.7 Attrition and power loss and the association with bias, efficiency, and coverage probability
Thus far in this chapter our simulations have occurred in the absence of attrition,
providing for a "best case scenario". In practice however, this is not a reasonable assumption as
most longitudinal studies, particularly those over a long time-span, will have some portion of
the participants drop out. The effects of attrition on power have been previously discussed in
Chapters
3 and 4, however this section will explore the role of attrition on the bias, estimator efficiency,
and the coverage probability. When examining attrition there are two aspects of concern, the
amount of attrition (w) and how that attrition is distributed (γ). As explained in previous
chapters, the attrition in these designs will impact the number of measurements per subject but
not the total number of subjects in the simulations. We will first examine the role of attrition on
our metrics when attrition is distributed uniformly (γ=1) indicating that the missing data are
Missing Completely at Random (MCAR). In Figure 5.29 below, we can see how increasing
attrition relates to changes in bias for our linear (panel A) and nonlinear (panel B) models.
Figure 5.29: Bias in the presence of attrition when data are MCAR.
As we might expect, when attrition is distributed uniformly across the developmental trajectory,
we find no association between the amount of attrition and the bias in the slope estimates. For the
linear model, changes to bias vary by ±0.5% as attrition changes within a design, but with no
clear pattern for when these changes to bias occur. For the nonlinear model, bias values vary by
as much as 60% from the no-attrition simulation with, again, no apparent pattern pertaining to
the amount of attrition.
Similar patterns are observed when examining the estimator efficiency in Figure 5.30. We
would naturally expect that as the number of measurements in the system is decreased, the
estimator would become less efficient. For the linear model (panel A) there appears to
be a slight increase (~0.1) in inefficiency for all designs as the attrition increases which reflects a
small change in efficiency (~10%) from the scenario with no attrition. For the nonlinear model
(panel B), changes to the attrition seem to have no association with the average efficiency of the
designs. While it seems likely that, similar to the linear model, there is some change occurring in
the nonlinear model, the variation from the randomness of the simulations likely overwhelms the
possibility of observing these small changes to efficiency as a result of attrition.
Figure 5.30: Estimator Efficiency in the presence of attrition when data are MCAR
When examining the coverage probability, we also unexpectedly find no association for the
linear or nonlinear models in Figure 5.31 below.
Figure 5.31: Coverage Probability in the presence of attrition when data are MCAR
While the above results show a minimal impact of attrition, this may be a consequence of how
the attrition was distributed: a uniform distribution of attrition, though it reduces the total number
of measurements, would not, on average, be expected to change the ability to estimate the slope
parameter.
For this reason, we also explored how these metrics behave when attrition is
fixed at 15% and the location of the attrition is varied. For the linear model we
modified the gamma values to concentrate attrition towards the end of the study
(e.g. γ=2 and 3), while for the nonlinear model we examined concentrations at the beginning of
the developmental curve (e.g. γ=0.5), where the growth rate is greatest, as well as towards the
end of development (e.g. γ=2). When applied to our examination of bias in Figure 5.32, we find
slight increases in the amount of bias, tending towards overestimation of the growth parameters
as attrition is concentrated towards the end of the study (i.e. higher gamma).
Figure 5.32: Bias when attrition is fixed and gamma is modulated
For the nonlinear model (panel B), we would have expected that there would be greater bias in
the estimates when the attrition was concentrated towards the beginning of the developmental
trajectory; particularly given that for the nonlinear trajectory this is the area of greatest change
that will help determine the slope value. However, we find that gamma has no association with
bias in the nonlinear models, with a ±2-14% change in bias across designs compared to when
the data were MCAR.
When examining estimator efficiency, there were no differences as a function of gamma.
The linear models (Figure 5.33, panel A) showed almost no change (<1.04-fold) from the
MCAR condition as attrition was concentrated towards the end of the study. A similar story was
found for the nonlinear model (panel B), where there was no consistent relationship between
efficiency and gamma, though changes in efficiency were more variable in the nonlinear model,
with a 1.5 to 3-fold change in efficiency from MCAR.
Figure 5.33: Estimator Efficiency when attrition is fixed and gamma is modulated
For the coverage probability, there was also no association with changes in gamma in
either of the models (Figure 5.34).
Figure 5.34: Coverage Probability when attrition is fixed and gamma is modulated
Overall, our findings on attrition indicate that neither the amount nor the distribution of
attrition has more than a minimal impact on the bias and efficiency of these designs, and no
impact on the coverage probability, in the linear and nonlinear models.
5.2.8 Equal cost sample size and the changes to bias, efficiency, and coverage probability
The equal cost sample size (EqNc) allows us to offset some of the decrements in bias and
efficiency that occur when using an ALD by boosting the sample size through the increased cost
savings of the ALD. The EqNc is specific to the particular cost function used, which in the
following scenarios represents the sample size available if the costs for recruitment,
measurement, and duration were equal (i.e. each 33.3% of the budget). If the cost distribution
were to shift away from duration, then the EqNc would decrease, as fewer subjects could be
afforded due to the reduced cost savings of the ALD.
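As a rough sketch of this idea (not the dissertation's actual cost equations, which are developed in Chapter 2), the EqNc can be thought of as the largest N affordable once the fixed duration cost is paid, with recruitment and measurement costs scaling per subject. All cost names and values below are illustrative assumptions.

```python
def equal_cost_n(budget, cost_recruit, cost_per_measure, cost_per_year,
                 waves_per_subject, study_years):
    """Largest sample size affordable for a fixed total budget (simplified sketch).

    Duration cost is fixed for the study; recruitment and measurement costs
    scale with N. An ALD that compresses the study duration spends less on
    duration, freeing budget for additional subjects.
    """
    remaining = budget - cost_per_year * study_years
    per_subject = cost_recruit + cost_per_measure * waves_per_subject
    return int(remaining // per_subject)

# A short accelerated design affords more subjects than a long single-cohort design
scd_n = equal_cost_n(3000, 10, 2, 100, 10, 10)  # hypothetical 10-year SCD
ald_n = equal_cost_n(3000, 10, 2, 100, 2, 2)    # hypothetical 2-year ALD
```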
For bias, use of the EqNc in the linear model decreased the bias by 0.5% on average,
which was enough for the ALDs with fewer than 5 cohorts to have a bias value less than the
single cohort design (Figure 5.35, panel A).
Figure 5.35: Reductions in Bias for the equal cost sample size.
For the nonlinear model (panel B), these reductions, though more substantial (mean = 22%),
allowed the ALD to equal the bias of the SCD in the 3-cohort model only. Surprisingly, the
2-cohort model showed an increase in bias, which likely reflects the aforementioned
problems with this particular design in the nonlinear models. Nevertheless, the reductions in bias
indicate that the EqNc is a potential mechanism to recoup some of the decrements that
result from utilizing an ALD for both the linear and nonlinear models. Table 5.7 below shows
the change in the average bias values when using the EqNc. As we can see, all of the linear
ALDs and the 3-cohort nonlinear ALD reduce bias values below that of the SCD.
Table 5.7: Average Bias using the EqNc

Model       Number of Cohorts (Cn)   N Avg Bias%   EqNc Avg Bias%   Difference
Linear      1                        0.58          -                -
            2                        0.34          -0.40            -0.74
            3                        0.32          0.26             -0.07
            4                        0.52          0.13             -0.39
            5                        -0.03         -0.30            -0.27
Nonlinear   1                        7.1           -                -
            2                        6.8           12.6             5.8
            3                        26.3          4.9              -21.4
            4                        46.9          6.8              -40.1
            5                        60.6          -3.0             -63.6
When examining the estimator efficiency, we similarly find that use of the EqNc improves
efficiency, with an average 1.10-fold improvement in the linear model
(Figure 5.36, panel A). Although the efficiency never reaches the level of the SCD, the
differences from the ALD at the EqNc are minimized, with only a 1.06-fold difference in
efficiency from the 2-cohort design and 1.25-fold from the 5-cohort design.
Figure 5.36: Improvements in Estimator Efficiency for the equal cost sample size
In the nonlinear model (panel B), a similar trend is noted, though the improvement in efficiency
for the ALDs is greater (mean = 1.15-fold). Compared to the SCD, the ALDs using the EqNc
maintain worse efficiency, ranging from a 1.4-fold to a 3.4-fold difference.
The values for the efficiencies at the EqNc are displayed in Table 5.8 below.
Table 5.8: Average Efficiency using the EqNc

Model       Number of Cohorts (Cn)   N Avg Eff   EqNc Avg Eff   Fold-Change
Linear      1                        0.66        -              -
            2                        0.72        0.70           1.02
            3                        0.80        0.74           1.08
            4                        0.91        0.81           1.13
            5                        1.02        0.83           1.23
Nonlinear   1                        4.5         -              -
            2                        4.3         10.3           0.4
            3                        9.3         6.4            1.5
            4                        22.3        15.4           1.4
            5                        18.5        15.1           1.2
Given that our previous examinations of sample size with coverage probability did not
yield any notable trends, we would not expect improvements in coverage by using a larger
sample size from the EqNc. As a result, we did not examine the role of the EqNc on coverage
probability.
Though the above examines how using the EqNc can improve upon bias and estimator
efficiency relative to the SCD, it assumes a scenario where no attrition has occurred. The use
of the EqNc may be particularly important in instances where attrition is present, as the ALD will be
less sensitive to attrition as well as able to support a larger sample size relative to the SCD. The
computation of the EqNc in the presence of attrition was conducted as described in section 3.3.7
of Chapter 3. When examining bias using the EqNc in the presence of attrition, we observe
that the changes in bias are reduced (Figure 5.37). For the linear model
(panel A), the decreases in bias from using the EqNc are more pronounced in designs
with more cohorts when attrition is not present. When attrition is present, the improvements in
bias are minimized, suggesting that use of the EqNc under attrition is not as impactful on bias as
when attrition is absent. This indicates that the loss of measurements per subject (Pn)
through attrition is more impactful on bias than the gain in sample size from the EqNc.
Figure 5.37: Changes in Bias by using the EqNc with and without attrition
Table 5.9: Average Bias using the EqNc under attrition

Model       Number of Cohorts (Cn)   N Avg Bias%   EqNc Avg Bias%   Difference
Linear      1                        0.66          -                -
            2                        0.72          0.70             0.98
            3                        0.80          0.74             0.92
            4                        0.91          0.81             0.89
            5                        1.02          0.83             0.81
Nonlinear   1                        4.5           -                -
            2                        4.3           10.3             2.4
            3                        9.3           6.4              0.7
            4                        22.3          15.4             0.7
            5                        18.5          15.1             0.8
In the nonlinear model, a similar pattern was observed, suggesting that the gains from using the
EqNc are reduced under attrition. Regardless, in both the linear and nonlinear models under
attrition, only the 2- and 3-cohort designs approach the same level of bias as the SCD when
utilizing the EqNc, as shown in Table 5.9.
When examining estimator efficiency, negative values indicate improvement in efficiency
in the EqNc versus the usual sample size (Figure 5.38). For the linear model (panel A), there
were minor differences in the change in efficiency when the model was constructed under
attrition conditions. The 2- and 3-cohort designs showed efficiency improvements of 1.2- and
1.03-fold respectively, whereas the 4- and 5-cohort designs showed 1.15- and 1.06-fold losses in
efficiency under attrition. Overall, these results indicate a negligible change in estimator
efficiency when the EqNc is employed under attrition for the linear model. For the nonlinear
model (panel B), the 3-cohort design showed a 1.4-fold loss in efficiency while the 2-, 4-, and 5-
cohort designs showed a 1.6 to 5-fold improvement in efficiency when using the EqNc under
attrition.
Figure 5.38: Changes in Estimator Efficiency by using the EqNc with and without attrition
For both models there was generally a pattern of efficiency improvement, or at worst a minimal
(<1.4-fold) loss in efficiency, when using the EqNc.
5.3 Chapter Summary
In this chapter we explored the bias, estimator efficiency, and coverage probability in the
accelerated design for both linear and nonlinear models when there was no between-cohort
variation.
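The three metrics summarized below can be computed from replicate estimates along the following general lines. This is a generic sketch; the function and argument names are ours, not the dissertation's simulation code.

```python
import numpy as np

def performance_metrics(estimates, std_errors, true_value):
    """Percent bias, empirical efficiency, and 95% CI coverage across replicates.

    estimates:   slope estimates from each simulation replicate
    std_errors:  the corresponding standard errors
    true_value:  the population (data-generating) slope
    """
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias_pct = 100.0 * (est.mean() - true_value) / true_value
    efficiency = est.std(ddof=1)  # empirical SD of the estimates; smaller is better
    lower, upper = est - 1.96 * se, est + 1.96 * se
    coverage = float(np.mean((lower <= true_value) & (true_value <= upper)))
    return bias_pct, efficiency, coverage
```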
5.3.1 Bias
For the linear model, bias in the ALD was always higher than in the SCD but was
always below 5% at low GCR. In the nonlinear model at low GCR, however, bias was
substantially higher for both the SCD and ALD, with average bias values of 15% and 32%
respectively. The sample size and total number of measurements were somewhat related to bias,
such that after an N of ~40 or more than 500 measurements the bias would reach an equilibrium
in the linear model. The frequency of measurement was minimally impactful on bias in the linear
models, showing higher bias with less frequent measurement as a result of the loss in the total
number of measurements. For the nonlinear model, however, the resulting increase in bias was
substantial, suggesting that for these simulations the frequency should be at least annual. While
we did not expect bias differences as a function of the amount of overlap between cohorts, we
found that the variability in the amount of bias was reduced for designs with high amounts of
overlap. The variability in bias was also greater, even with high overlap, for designs
with more cohorts, as these designs have comparatively less overlap even at the same
Cs/Ps ratios. Changes to the GCR improved bias estimates for the linear model from 0.5 to 0.7
but failed to show substantial differences beyond a GCR of 0.7. For the nonlinear model,
continued improvement to the GCR resulted in greater improvements in the slope bias, and for
most designs achieved <7% bias at a GCR of 0.9. Increases to the effect size resulted in
decreases to bias. For the linear model, even at the default effect size of 0.5, the between-design
differences were so slight as to suggest that use of an ALD will not contribute substantially to bias.
When attrition is present, we found no association of the amount or distribution of attrition with
changes in bias. This was somewhat surprising for the nonlinear model, as we might have
expected greater attrition at the beginning of the study to induce bias. However, as our attrition
was related to the parameters in the model (i.e. advancing age), it appears reasonable that the
modeling of these parameters under MAR proved successful. Use of the equal cost sample size
(EqNc) resulted in a substantial decrease in bias for both the linear and nonlinear models, with the
linear models showing less bias in the slope estimate than the SCD. For the nonlinear model,
though the EqNc substantially decreased the bias, this only translated into less bias relative to
the SCD for the 3-cohort design. When attrition was present, however, the use of the EqNc was
not as impactful, though it still provided less bias than not employing the EqNc.
5.3.2 Estimator Efficiency
Estimator efficiency in the ALD was always worse than in the SCD, with an average 1.6
and 2.4 fold-change for the linear and nonlinear models respectively. Having a greater number of
measurements was also related to improved efficiency; however, for a given total number of
measurements it is more beneficial to have less frequent measurement. Greater overlap in the age
distributions between the cohorts was also related to improved efficiency, and this improvement
was more beneficial for designs with a larger number of cohorts. Both the absolute and relative
efficiencies are better for designs with a smaller number of cohorts, and this improvement relative
to the SCD declines after a sample size of ~40. As the GCR increased, all designs showed
improvements in efficiency, particularly those with a larger number of cohorts. Increases to the
effect size showed improvements to the estimator efficiency as well as a reduction in between-
design differences, with the larger cohort designs showing the greatest benefit. Greater attrition
resulted in very slight losses of efficiency, likely related to the loss in the total
number of measurements available. The distribution of the attrition (i.e. gamma) had no impact
on the efficiency. Use of the EqNc improved efficiency in all of the ALDs; however, none
reached the level of the SCD, though the relative efficiency ranged from a 1.06 to
a 1.25 fold-change for the linear model. When the EqNc was used under attrition there were only
minor decrements (<1.4 fold-change) and, in the case of the linear models, no appreciable
change.
5.3.3 Coverage Probability
For the linear models, the ALD and SCD showed coverage close to the nominal level
(0.95), while the nonlinear models had consistently low coverage (~0.86) for both the ALD and
SCD, indicating inflated Type I error. We believe the nonlinear model shows such poor coverage
because of the low GCR (0.5) employed as part of the default simulations. We found no
differences in coverage probability as a function of the number of cohorts for the linear model
but did find cohort differences in the nonlinear model. This suggests that when coverage is
low there are marked improvements from using fewer cohorts or larger GCRs or effect sizes;
however, once the nominal coverage is approached, these differences disappear. This is likely the
reason why changes to the GCR had minimal impact on the coverage probability for the linear
models, whereas in the nonlinear model improvements to the GCR resulted in improved coverage.
We believe this occurred as a result of a faster decrease in bias than in estimator efficiency as the
GCR improved; the confidence intervals around the estimate became narrower but were also less
positively biased, allowing them to include the population estimate. Changes to the effect size did
not result in changes to the coverage probability in the linear models; however, in the nonlinear
models larger effect sizes dramatically improved the coverage for larger cohort designs. Neither
the amount nor the distribution of attrition had any meaningful impact on the coverage probability.
5.4 Chapter References
1. Enders, C. K., & Bandalos, D. L. (2001). The relative performance of full information
maximum likelihood estimation for missing data in structural equation models. Structural
equation modeling, 8(3), 430-457.
Chapter 6
Statistical power in accelerated designs with linear growth and
in the presence of between-cohort differences
In the previous chapters we saw how the accelerated design can compete with the
traditional single-cohort longitudinal design where there are no differences between the cohorts.
These scenarios mirrored the implicit assumptions of many researchers studying development;
that the developmental trajectory in their own study would generalize to other birth-cohorts from
around the same time period. For most, this assumption goes untested, either because the data
were not collected across multiple generations or because of a lack of awareness of the impact
that generational differences can have on the study of development. In this chapter we will explore how
differences in development between the cohorts can impact statistical power in the accelerated
design with regards to detecting a generalized developmental trajectory and the between cohort
variability in this trajectory.
The influence of cohorts has routinely been shown to modify developmental trajectories
(Ryder, 1965; Finkle, Reynolds, McArdle, & Pederson, 2007); however, few have examined the
statistical properties necessary to detect these effects or demonstrated the utility of a
generalized trajectory across the cohorts. The previous work by Raudenbush and Chan (1992;
1993) provided a framework for assessing differences between two cohorts through examining
the within-cohort trajectories between the cohorts where there is overlap in the age distribution.
Formal testing of differences was conducted through the use of an age-by-cohort fixed effect
interaction, which in the absence of statistical significance indicated that the cohorts should be
analyzed as having the same trajectory. This work was similar to that of McArdle, Anderson, and
Aber (1987) who assessed cohort convergence in a structural equation modeling framework
through tests for group invariance. Miyazaki and Raudenbush (2000) extended this work for
application to larger cohort designs using the likelihood-ratio (for ML estimation) and the Wald
test (for REML estimation) to test for convergence to a single trajectory, again using fixed
effects to estimate the age-by-cohort interaction. Other methodologists investigating accelerated
longitudinal designs such as Moerbeek (2011), have ignored the complications of the age-by-
cohort interaction entirely. Galbraith, Bowden, and Mander (2017) employed both the fixed
effects methods of Miyazaki and Raudenbush (2000) as well as investigated the modeling of
cohort effects as random effects. The incorporation of cohort effects as random effects is based
on the work of O'Brien, Hudson, and Stockard (2008) who proposed the use of an age-period-
cohort mixed model for aggregate data, described previously in Chapter 3. By allowing for
within-person longitudinal change, the APCMM can be turned into the accelerated longitudinal
design mixed model (ALDMM), variations of which have gone under the moniker of the
hierarchical APC model as proposed by Yang and Land (2006; 2008; 2016). While Galbraith et
al. (2017) do not discuss this model in much detail, they did note that the fixed
cohort effects methods are more powerful with fewer cohorts and less overlap and that the
opposite was true when utilizing the ALDMM. This chapter will go into greater detail on the
specifications of the ALDMM as well as present the methods and considerations when
simulating accelerated designs with between-cohort variability in the developmental trajectory.
While many of the investigations into ALDs echo the work of Duncan, Duncan, and Strycker
(2006) as well as Glenn (2005), who recommend that a single-cohort design be employed in
instances where age-by-cohort interactions exist, we will argue that this decision to trade off the
generalizability of an ALD for the decreased bias of a single-cohort design is antithetical to the
goals of most research projects. Moreover, we will demonstrate the performance of the
accelerated design in the presence of between-cohort variability and how the ratio of between-
cohort to within-subject variance influences decisions to utilize the accelerated design.
6.1 Models for age-by-cohort interaction
6.1.1 ALD Mixed Models for Cohort Differences
For simulating cohort differences, the simplest approach is a model which presumes no
interaction of cohort-by-age. In these models the growth trajectories from each cohort are
parallel to one another (i.e. same slope for age) but lagged as a function of their differences in
intercept (σ²u0). The between-cohort variation in starting level can be adjusted to examine
situations where between-cohort variability is greater than within-cohort variability and
vice-versa. If we recall
from Chapter 3, the most basic form of the ALDMM is equation 1 below, which shows the
assumed data generating mechanism for a model with i observations nested within j subjects
belonging to k cohorts.
yijk = β0 + β1xijk + υ0j + υ0k + εijk eq. 1
From this equation (eq. 1) it can be seen that there is a fixed effect (β1) for age (x) which does not
vary between subjects or cohorts. The starting level (β0) is dependent upon the variation in these
levels between subjects belonging to the same cohort (υ0j) and the variation between cohorts
(υ0k). There is no correlation between intercept and slope, as the slope (β1) is a fixed effect. This
equation can then be expanded (eq. 2) to allow for between-subject variation in the slope
parameter (υ1j), yet without variation in the slope between cohorts.
yijk = β0 + β1xijk + υ0j + υ0k + υ1j xijk + εijk eq. 2
In instances where the slope for age varies between the cohorts, we can add an additional
parameter to capture these cohort-driven differences in slope (υ1k).
yijk = β0 + β1xijk + υ0j + υ0k + υ1j xijk + υ1k xijk+ εijk eq. 3
In these models with random coefficients, the correlation between intercept and slope will be
specified as 0, reflecting a conservative estimate. In practice, the correlation between slope and
intercept is often negative, such that the amount of growth is minimized for those high
on the construct. Equation 3 represents the full specification of the ALDMM and will be the only
model used in this chapter. We will assume that the within-subject effects (level 2) are nested
within cohorts (level 3); however, alternate specifications such as crossed random effects could
be employed. For the purposes of simulating these models, it is desirable to apply constraints to
the random cohort variation in the intercept (υ0k) and slope (υ1k) so that between-cohort
differences have a consistent and logical directionality aligned with the theory behind their
differences; these constraints are described in the subsequent section.
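A minimal data-generating sketch for equation 3 follows, assuming normally distributed random effects with zero intercept-slope correlation. The function and parameter names are illustrative assumptions, not the dissertation's simulation code.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_aldmm(ages_by_cohort, n_per_cohort, beta0, beta1,
                   sd_u0j, sd_u1j, sd_u0k, sd_u1k, sd_e):
    """Generate y_ijk = b0 + b1*x + u0j + u0k + (u1j + u1k)*x + e  (eq. 3).

    All random effects are drawn independently (intercept-slope correlation = 0).
    Returns (cohort, subject, age, y) tuples.
    """
    rows = []
    for k, ages in enumerate(ages_by_cohort):
        u0k = rng.normal(0.0, sd_u0k)  # cohort intercept deviation
        u1k = rng.normal(0.0, sd_u1k)  # cohort slope deviation
        for j in range(n_per_cohort):
            u0j = rng.normal(0.0, sd_u0j)  # subject intercept deviation
            u1j = rng.normal(0.0, sd_u1j)  # subject slope deviation
            for x in ages:
                y = beta0 + beta1 * x + u0j + u0k + (u1j + u1k) * x + rng.normal(0.0, sd_e)
                rows.append((k, j, x, y))
    return rows

# 3 cohorts measured over overlapping 4-year age spans (values are illustrative)
data = simulate_aldmm([range(0, 4), range(2, 6), range(4, 8)], n_per_cohort=20,
                      beta0=10.0, beta1=2.0, sd_u0j=2.0, sd_u1j=4.0,
                      sd_u0k=1.0, sd_u1k=0.5, sd_e=1.0)
```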
6.1.2 Simulating Cohort Differences
Cohort differences can be created such that they represent unidirectional and equal
magnitude deviations from the youngest cohort as a function of the Cohort Intercept (CID) or
Cohort Slope (CSD) differences specified. Both the CID and CSD represent cohort differences and are
modeled as random normal variables. It is important to consider that the number of between-
cohort deviation values generated by incorporating random cohort effects is small; for k cohorts
there will be k values representing the cohort-specific deviations from the
population intercept or slope. As a result, though the random variation between the cohorts is
assumed to be normally distributed, this assumption cannot readily be tested for any of these
designs. The potential consequences of misspecifying the error distribution of the random
effects are likely to impact the bias in these random variance parameters and not the estimation
of the fixed effects (Verbeke & Lesaffre, 1997; McCulloch & Neuhaus, 2011). Both the CID and
CSD can be expressed as percentages of the intercept or slope in the youngest cohort, which for
simulation purposes may be advantageous for those unsure of the amount of cohort differences to
interject. Assuming the first cohort (k=1) is the youngest cohort, subsequent older cohorts will
have values that are greater than (CID > 0) or less than (CID < 0) the younger cohorts depending
upon the CID value. Similarly, older cohorts can have a steeper slope than younger cohorts when
CSD > 0 and vice versa. Moreover, the population fixed effects of the level (β0) and slope (β1)
are also dependent on these parameters such that:
β0 = (β0 c1 + (β0 c1 + CID(Cn-1)))/2 eq. 4a
β1 = (β1 c1 + (β1 c1 + CSD(Cn-1)))/2 eq. 4b
In the above equations (4a & 4b), where β0 c1 and β1 c1 represent the intercept and slope for the
youngest cohort, the CID and CSD represent the average difference between
subsequent cohorts. In practice, this allows the difference between subsequent cohorts to be
equal in a 2-cohort, 3-cohort, or kth-cohort design. This is termed the unrestricted growth
method for inducing cohort differences, which will be explained further below. The intercept or
slope of the kth cohort using the unrestricted method is given by:
β0 ck = β0 c1 + CID(k-1) eq. 5a
β1 ck = β1 c1 + CSD(k-1) eq. 5b
While holding the average difference between cohorts to be equal is ideal for investigating the
power to detect between-cohort differences in ALDs that have different numbers of cohorts, this
can have negative implications when trying to examine the power to detect slope differences.
Indeed, in these equations a 3-cohort design with CSD of 4 would have a larger population
averaged slope than a 2-cohort design with the same CSD, which would differentially impact
power and hamper between-design comparisons where the number of cohorts varies. As a result,
for simulations comparing the power to detect the fixed effect slope, an alternate equation will
have to be used whereby the CID and CSD represent the total amount of difference between the
youngest and oldest cohorts. This is termed the restricted growth method. For these, the
population fixed effects would be expressed as:
β0 = (β0 c1 + (β0 c1 + CID))/2 eq. 6a
β1 = (β1 c1 + (β1 c1 + CSD))/2 eq. 6b
These lead to the following equations for determining the intercept or slope of the kth cohort
using the restricted method:
β0 ck = β0 c1 + (CID/(Cn-1))(k-1) eq. 7a
β1 ck = β1 c1 + (CSD/(Cn-1))(k-1) eq. 7b
For both the unrestricted and restricted growth methods, the population level between-cohort
variance parameters (e.g. υ0k and υ1k) can then be calculated as the squared differences between
the cohort-specific slopes or intercepts (eqs. 5a, 5b, 7a, 7b) and the population level values
(eqs. 4a, 4b, 6a, 6b), summed and averaged across the number of cohorts in the design. To better
understand how the CID and CSD are used in these simulations, we present the following
graphical exemplars, which show how the CSD alters the population fixed effect slope and
between-cohort variance calculations.
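Equations 4 through 7 and the between-cohort variance computation can be sketched as follows, shown here for slopes (the intercept case is identical with CID in place of CSD). Function names are ours and purely illustrative.

```python
import numpy as np

def cohort_slopes(beta1_c1, csd, n_cohorts, method="unrestricted"):
    """Cohort-specific slopes via eq. 5b (unrestricted) or eq. 7b (restricted)."""
    k = np.arange(1, n_cohorts + 1)
    step = csd if method == "unrestricted" else csd / (n_cohorts - 1)
    return beta1_c1 + step * (k - 1)

def population_slope(slopes):
    # eqs. 4b/6b reduce to the midpoint of the youngest and oldest cohort slopes
    return (slopes[0] + slopes[-1]) / 2

def between_cohort_slope_variance(slopes):
    # squared deviations from the population slope, averaged over the cohorts
    return float(np.mean((slopes - population_slope(slopes)) ** 2))

# CSD = 4 with a youngest-cohort slope of 2 (the exemplar in Figures 6.1 and 6.2)
unres = cohort_slopes(2, 4, 3, "unrestricted")  # slopes 2, 6, 10; population slope 6
res = cohort_slopes(2, 4, 3, "restricted")      # slopes 2, 4, 6; population slope 4
```

Note that for a 2-cohort design the two methods return identical slopes, matching the special case discussed in the text.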
In Figure 6.1 below, we show how use of the CSD (value = 4) would impact ALDs with varying
numbers of cohorts where the slope in the youngest cohort has a starting value of 2. There are no
differences in the intercept between the cohorts. The cohort differences displayed ensure that
each subsequent cohort has a slope that is 4 points greater than the previous cohort (unrestricted
growth method). As a result, we utilize equation 4b to compute the population estimate for the
slope. Estimates and the computation of the between-cohort slope variance (υ1k) are also shown
within each figure. We call this form of interjecting cohort differences the unrestricted growth
method because the total amount of growth increases with the addition of cohorts.
Figure 6.1: Exemplar of ALDs with CSD of 4 using the unrestricted growth method
We can see that designs with an even number of cohorts (panels A & C) will have a population
slope that does not go through any one cohort's trajectory whereas for designs with an odd
number of cohorts (panels B & D) the population slope will reflect the value of the cohort in the
center of the age distribution. As explained above, using the CSD to hold constant the difference
between the slopes of successive cohorts can be ideal for assessing the power to detect between-
cohort variability because as more cohorts are added, the between-cohort slope variance (υ1k )
increases as we would expect, ostensibly resulting in greater power to detect these differences.
However, we note that the population level fixed effect slope (β1) will then differ between
designs with different numbers of cohorts, making power comparisons for the population
fixed effect slope difficult across designs. To address this issue, we can
consider the restricted growth method for incorporating cohort differences.
In the restricted growth method we limit the total amount of change in the slopes (or
intercepts) between cohorts to the value of the CSD (or CID) such that the slope difference
between the youngest and oldest cohorts will be equal to the CSD. This has the effect of limiting
the total amount of growth over the age-span to be equal across designs of differing numbers of
cohorts. Moreover, the population estimate for the slopes from these designs are equal (equation
6b), allowing for a fairer comparison of the power to detect these fixed effects between designs
of varying numbers of cohorts. Figure 6.2 below shows examples of these designs using the
restricted growth method with the same parameters as used in the unrestricted method.
Figure 6.2: Exemplar of ALDs with CSD of 4 using the restricted growth method
In addition to allowing for equal slopes across designs, the restricted method reduces the
population level between-cohort slope variance (υ1k) as the number of cohorts increases, thereby
decreasing the ability to detect between-cohort differences. Also of note is that for the
special case of the 2-cohort design, the unrestricted (Fig. 6.1 panel A) and restricted (Fig. 6.2
panel B) methods are equivalent.
Choosing between the use of the unrestricted or restricted growth method for
incorporating these differences should reflect the underlying theory as well as the goals of the
simulations. In most instances the restricted method is probably closer to the reality that
researchers imagine, as there is some total amount of difference between the generations and the
researcher must then choose the number of cohorts to use to capture this fixed amount
of difference. However, we can certainly imagine scenarios where a researcher, based on prior
literature, has an idea about the cohort differences between two successive cohorts and would
like to extrapolate what this might look like if the trends were to continue for additional cohorts.
Regardless, the difference between utilizing the unrestricted and restricted methods is only
relevant for conducting comparisons between designs of different numbers of cohorts, as within-
design comparisons are not confounded by the changes to the population fixed effects and
between-cohort variability when incorporating the CSD or CID.
When specifying these cohort effects, we are assuming mechanisms that are both
unidirectional and of equal magnitude between cohorts. This does not necessarily mirror reality,
as certainly the between-cohort effects could be nonlinear and accelerating or decelerating as
well as varying in the magnitude and direction of their deviation from the youngest cohort
between generations. Some cohorts might deviate from each other while others experience
convergence, and how far away these 'deviant' cohorts are from the center of the age distribution
would likely impact power (Miyazaki & Raudenbush, 2000). Nevertheless, unidirectional and
equal magnitude differences are convenient for simulation and would, in-general, align with
many theories of generational differences in developmental trajectories.
6.1.3 The ratio of between-cohort variability to within-cohort, between-subject variability
The impact of introducing between-cohort differences in the mean levels or trajectory is
not absolute, but instead is relative to the amount of variability that occurs within a given
cohort between the subjects. As we recall from Chapter 3, and as should be apparent from
equation 3 above, the between-subject variability in the slope and intercept is assumed to be equal
between cohorts. For example, our prior simulations have used a slope value of 2 and an effect
size of 0.5, which indicates that the between-subject variability in slope has a standard deviation of
4 (i.e. 2/4 = 0.5). In the presence of between-cohort differences in slope, as the slope increases in
each successive cohort, the effect size is thus increased as the 'signal' is amplified relative to the
constant standard deviation. Introducing the CID or CSD into these parameters, relative to the
within-cohort variability, will likely impact the ability to detect cohort differences as well as
impact the power of the fixed effect slope. In our example with a slope standard deviation of 4 in
each cohort, introduction of between-cohort variability that is less than 4 would likely have
minimal impact on power as the cohort specific between-subject variability is greater than the
between-cohort variability; however as this ratio increases to be equal to or greater than 1, we
would begin to expect greater effects on power. This ratio, termed the cohort intercept ratio
(CIR) for intercepts or cohort slope ratio (CSR) for the slope, will be a primary metric used to
quantify the potential impact of the CSD on the power to detect effects. The CIR and CSR are
described below for both the unrestricted and restricted methods for inducing cohort differences.
Unrestricted Growth Method
CIR = CID/σu0 eq. 8a
CSR = CSD/σu1 eq. 8b
Restricted Growth Method
CIR = (CID/(Cn-1))/σu0 eq. 9a
CSR = (CSD/(Cn-1))/σu1 eq. 9b
If we recall from Chapter 3, the terms σu0 and σu1 refer to the within-cohort standard deviations
of the intercept and slope, respectively. It is important to remember that the ratio can be changed
either through the CSD (or CID) or through the effect size, such that the same ratio can be
achieved with a low effect size and a high CSD or vice versa. While one might assume that
these ratios should be a function of the population variance parameters υ0k and υ1k, as we will see
from the results of the simulations, the impacts on power are more related to the changes in the
CIR and CSR specified relative to the cohort specific between-subject variation.
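The ratios in equations 8a-9b are simple to compute. The following Python sketch is illustrative (it is not the dissertation's Stata code, and the function and parameter names are assumptions):

```python
# Illustrative sketch of the cohort intercept/slope ratios (eqs. 8a-9b).
# All names and example values are assumptions for demonstration only.

def cohort_ratios(cid, csd, sigma_u0, sigma_u1, n_cohorts, restricted=False):
    """Return (CIR, CSR).

    Unrestricted method: CID/CSD is the difference between successive
    cohorts (eqs. 8a-8b). Restricted method: CID/CSD is the total
    difference from youngest to oldest cohort, so the per-step
    difference is CID/(Cn - 1) (eqs. 9a-9b).
    """
    if restricted:
        cid = cid / (n_cohorts - 1)
        csd = csd / (n_cohorts - 1)
    return cid / sigma_u0, csd / sigma_u1

# The text's example: within-cohort slope SD of 4. A total (restricted)
# CSD of 8 across 3 cohorts gives a per-step difference of 4, so CSR = 1.
print(cohort_ratios(cid=0, csd=8, sigma_u0=4, sigma_u1=4,
                    n_cohorts=3, restricted=True))  # (0.0, 1.0)
```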
6.1.4 Cohort differences as percentage change
In the previous sections we saw that between-cohort changes could be described as a
function of their absolute magnitude (CID or CSD) with regard to successive changes
(unrestricted growth) or total growth (restricted method) in section 6.1.2 as well as expressed as
a ratio of these between-cohort differences to the within-cohort variability termed the cohort
intercept (CIR) or slope ratios (CSR) in section 6.1.3. Though both describe the nature of these
between-cohort changes, neither is intuitive for a researcher looking to explore potential
designs and their consequences for statistical power or other metrics such as bias. It is for this
reason that we present the quantification of between-cohort differences as a percentage of the
intercept or slope in the youngest cohort. This allows a researcher with knowledge of the effects
in an SCD to explore how incorporating other cohorts that are within a percentile range of the
SCD values might impact power. The cohort intercept and slope percentage are defined by:
CIP = CID/β0c1 eq. 10a
CSP = CSD/β1c1 eq. 10b
The cohort intercept and slope percentages thus represent the percentage change in the estimate
either between the youngest cohort and the subsequent cohort, if the CSD is specified using the
unrestricted method, or as the total percentage change from the youngest to the oldest cohort if
using the restricted method. For the equations, the terms can be rearranged to specify the desired
CIP or CSP and then solved for the CID or CSD, which once obtained can be used in the
appropriate equations from the preceding sections to solve for the population effects, individual
cohort values, and ratios, depending on whether the researcher wishes to use the unrestricted or
restricted method for inducing between-cohort variability. This is particularly useful in the
restricted method when one wants to limit the total amount of growth between the youngest
and oldest cohorts to a certain percentage of the baseline growth. Describing
between-cohort differences in terms of percentage change can be more intuitive for researchers;
however, these percentages should be considered alongside the aforementioned cohort ratios, which
account for the within-cohort variability that is not captured in the percentage metrics.
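The rearrangement described above can be sketched as follows, assuming (per the text) that the CSP is taken relative to the slope in the youngest cohort; function names and example values are illustrative:

```python
# Sketch: pick a target CSP, solve eq. 10b for the CSD, then compute the
# restricted-method CSR (eq. 9b). Names and values are illustrative.

def csd_from_csp(csp, slope_youngest):
    """Rearranged eq. 10b: CSD = CSP * (slope in the youngest cohort)."""
    return csp * slope_youngest

def csr_restricted(csd, n_cohorts, sigma_u1):
    """Eq. 9b: CSR under the restricted growth method."""
    return (csd / (n_cohorts - 1)) / sigma_u1

# Limit total slope growth to 50% of the youngest cohort's slope (2.0)
# in a 3-cohort design with a within-cohort slope SD of 4.
csd = csd_from_csp(0.5, slope_youngest=2.0)
csr = csr_restricted(csd, n_cohorts=3, sigma_u1=4.0)
print(csd, csr)  # 1.0 0.125
```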
6.2 A note on simulating period effects
The models for period effects are not dissimilar to the models of cohort effects. Indeed, in
the situation where cohort differences (i.e. CID and CSD) are equal to zero, the above models
become models of period and age alone. Recognizing this, it becomes apparent that the slope of
age (β1) is a slope that is comprised of these two components, period effects and age effects.
While the primary focus of this paper is on ALDs, which by design emphasize the influence of
cohorts, the incorporation of period effects into these designs is possible, as is the
incorporation of period-by-cohort interactions. While running simulations to understand period
effects is beyond the scope of this dissertation, we provide equations below that could be used to
simulate linear fixed-effects for periods as well as incorporate differential shifts by cohort in the
slope for period, termed the period slope difference (PSD).
For incorporating a linear effect of period in these simulations, a simple fixed effects
slope could be added to equations 1-3 above (β2wijk) to reflect the impact of period influences on
development. For example, modification of equation 1 would yield:
yijk = β0 + β1xijk + β2wijk + υ0j + υ0k + εijk eq. 12
Where wijk are period effects and β2 the slope for these which will not differ between subjects or
cohorts. More complicated random effects structures could be employed which allow period to
vary between subjects (and/or cohorts) through the incorporation of additional random
coefficients in equation 2 or 3. For allowing period effects to differ by cohort, an equation
similar to that of the equations for cohort specific slopes of the CSD (eqs. 5b, 7b) could be used.
The period slope difference in the kth cohort can then be defined by:
β2ck = β2c1 + PSD(k-1) eq. 13
Where β2ck represents the slope for period in the kth cohort and β2c1 the slope for period in the
youngest cohort. In this equation 13, the period effects are linear and equally spaced between the
cohorts. This specification would necessitate the use of a random slope for period nested within
the cohort random intercept in equation 12 above.
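A minimal data-generating sketch of equations 12 and 13 might look like the following. The parameter values, the one-year cohort spacing, and the use of wave number as the period variable are illustrative assumptions, not the dissertation's settings:

```python
# Sketch of eqs. 12-13: linear age and period effects with the period
# slope shifting by PSD across cohorts. All values are illustrative.
import numpy as np

rng = np.random.default_rng(1)

def simulate_period(n_cohorts=3, n_per_cohort=50, n_waves=5,
                    b0=0.0, b1=2.0, b2_c1=0.5, psd=0.2,
                    sd_u0j=4.0, sd_u0k=1.0, sd_e=1.0):
    rows = []
    for k in range(n_cohorts):              # k = 0 is the youngest cohort
        b2_ck = b2_c1 + psd * k             # eq. 13 (k-1 with 1-based k)
        u0k = rng.normal(0, sd_u0k)         # cohort random intercept
        for j in range(n_per_cohort):
            u0j = rng.normal(0, sd_u0j)     # subject random intercept
            for t in range(n_waves):
                age = k + t                 # cohorts enter one year apart
                period = t                  # wave number stands in for period
                y = (b0 + b1 * age + b2_ck * period
                     + u0j + u0k + rng.normal(0, sd_e))  # eq. 12
                rows.append((k, j, age, period, y))
    return np.array(rows)

data = simulate_period()
print(data.shape)  # (750, 5): 3 cohorts x 50 subjects x 5 waves
```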
There are, of course, alternate means to model these period effects; however, the methods
described above illustrate one way the effects could be simulated and
analyzed. It should be noted that while these models may provide solutions to the
statistical identification problem for age, period, and cohort effects, they do not resolve the
theoretical problems posed by these models nor can they provide us with unbiased estimates for
each effect when age, period, and cohort effects are all present. As a result, the solutions to these
models may not be unique and require further consideration of the theory underlying their
specification rather than the blind implementation and interpretation of their output.
6.3 Statistical power of ALDs to detect the population fixed effect slope
6.3.1 Parameters of the Simulations
The following sections will examine statistical power to detect the population level fixed
effect slope as well as the ability to detect between-cohort differences. Power to detect the fixed
effect was determined as previously described in Chapter 3. For the between-cohort differences,
a likelihood ratio test comparing the ALD mixed model with and without a random intercept and
slope for cohort and nested age effects was used. The proportion of statistically significant
(p<0.05) likelihood ratio tests from the simulations was taken as the power to detect between-
cohort differences in slope. Table 6.1 below shows the ALD design conditions for the
simulations. All simulations were conducted using the user-written package aldsim (Jackson,
2017) in Stata version 15.0 (College Station, TX).
Table 6.1: Variation in design parameters for simulation

Design Parameters (Cn, Cs, Pn, Ps)   Overlap %     GCRs        Sample Sizes       CSPs (Restricted Growth)
2, 1, 10, 1                          90.0 (high)   0.5 & 0.9   10 to 120, by 10   0 to 100 by 10; 100 to 800 by 100
2, 3, 8, 1                           62.5 (low)    0.5 & 0.9   10 to 120, by 10   0 to 100 by 10; 100 to 800 by 100
3, 1, 9, 1                           88.8 (high)   0.5 & 0.9   10 to 120, by 10   0 to 100 by 10; 100 to 800 by 100
3, 3, 5, 1                           40.0 (low)    0.5 & 0.9   10 to 120, by 10   0 to 100 by 10; 100 to 800 by 100
4, 1, 8, 1                           87.5 (high)   0.5         10, 60, 120        0 to 100 by 10; 100 to 800 by 100
5, 1, 7, 1                           85.7 (high)   0.5         10, 60, 120        0 to 100 by 10; 100 to 800 by 100
Each of the simulations covered an age-span of 10 years. Starting parameters were similar to
those described in Chapter 3, such that the slope of the youngest cohort was a value of 2 with
standard deviation of 4 indicating an effect size of 0.5. The intercept for all cohorts was set to 0
(i.e. CID=0) with between-subject variability in the intercept equal to the slope SD (i.e. 4). There
was no correlation between the slope and intercept, reflecting a conservative test. GCRs of both
low (0.5) and high (0.9) growth reliability were used. Between-cohort slope differences were
generated using the restricted growth method outlined above and were varied from 0 to 16 in
increments of 2. These CSDs are generally presented as the cohort slope ratios (CSRs) outlined
in equations 8b and 9b above. One-thousand simulations were conducted using these parameters
across sample sizes ranging from 10 to 120 in increments of 10 for most designs.
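The Monte Carlo power procedure just described can be skeletonized as follows. The mixed-model fitting itself is omitted here, and referring the LRT statistic to a chi-square with 2 df (ignoring the boundary problem that arises when testing variance components) is an illustrative simplification rather than the dissertation's exact procedure:

```python
# Skeleton of the simulation-based power calculation: each replication
# fits the ALD mixed model with and without the cohort random intercept
# and slope, compares them by LRT, and power is the proportion of
# significant replications. The chi-square reference with df=2 is an
# illustrative assumption (it ignores boundary issues).
from scipy.stats import chi2

def lrt_pvalue(ll_full, ll_reduced, df=2):
    """LRT comparing the model with vs. without the cohort random
    intercept and slope (2 extra variance components)."""
    stat = 2.0 * (ll_full - ll_reduced)
    return chi2.sf(stat, df)

def estimate_power(pvalues, alpha=0.05):
    """Power = proportion of significant replications (p < alpha)."""
    return sum(p < alpha for p in pvalues) / len(pvalues)

# With 1,000 replications, estimate_power would receive 1,000 p-values.
print(estimate_power([0.01, 0.20, 0.03, 0.70]))  # 0.5
```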
As we can see in Table 6.1, we have limited our examination of the design parameters to
primarily those with 2 or 3 cohorts with both high (Cs=1) and low (Cs=3) overlap between the
successive cohorts. We anticipate that the limited scope of these simulations covering the
scenarios where the population slope is either estimated between the cohorts (Cn=2) or through
the cohort in the center of the age distribution (Cn=3) will be sufficient to generalize more
broadly to designs of a larger number of cohorts. To help verify this, we have conducted limited
simulations in the high overlap condition for designs with 4 or 5 cohorts.
Conspicuously absent from these simulations is the examination of the SCD. By its
definition the SCD cannot have cohort differences and thus is not relevant to this chapter, which
explores the influence of cohort differences. Nonetheless, we can consider how the choice to use
an SCD when in fact there are between-cohort differences would limit the generalizability of
such a developmental trajectory as well as compare power to detect the fixed effect slope to the
SCD power values.
Lastly, while the prior chapters examining power (Chapters 3 & 4) did so as both a
function of total sample size and the total number of measurements, this was largely to facilitate
comparisons with the prior work of both Moerbeek (2011) and Galbraith et al. (2017) who
examined power as a function of both metrics. However, we find that in most circumstances a
researcher looking to design a study would have an idea about the number of measurements
needed per person to capture developmental change a priori and would thus be primarily
interested in the number of subjects needed to have power to detect the change. It is for this
reason we present the tables and figures of this chapter as a function of sample size and not the
total number of measurements.
6.3.2 Understanding the elements of between-cohort slope variation
Before we can begin to understand the power to detect the population slope we must first
understand how the parameters of the population slope, between-cohort variance in slope, cohort
slope differences, and cohort slope ratio relate to one another in these simulations. In Figure 6.3
below, we explore these relationships to aid in our understanding of the results in the following
sections.
Figure 6.3: Relationships of population slope, CSP, between-cohort slope variance, CSD, and
CSR
In panel A, we find that, from our initial slope of 2 in the scenario of no between-cohort
differences, the population slope increases by 1 point for every 100% increase in the CSP,
corresponding to a 2 point increase in the CSD (second y-axis). This occurs regardless of the number of cohorts
because the cohort differences were created using the restricted growth method, meaning that the
total difference in the slopes between the youngest and oldest cohorts was restricted to the CSP.
As the reader can see, we investigated a wide range of differences with a maximal difference in
slope that was 800% larger in the oldest cohort. While the slope does not vary between-cohorts
within a given CSP, the amount of between-cohort variability will change as a function of the
number of cohorts (panel B), with designs with more cohorts showing less variability at the same
CSP (or CSD). We can understand this intuitively by re-examining Figure 6.2. As the number of
cohorts increases, the 'space' with which to fill the CSP gets increasingly reduced. These changes
occur nonlinearly, with larger CSPs exerting a greater influence on increasing the between-
cohort variance than smaller CSPs. Because the variance is larger for designs with a smaller
number of cohorts, we also find that the ratio of the cohort slope differences to the within-cohort
between subject slope variability (SD=4) is greater for designs with a smaller number of cohorts
(panel C). Termed the cohort slope ratio (CSR), these are expressed as a linear function of the
CSP and show how the between-cohort differences in the slope relate to within-cohort
differences. These are described in more detail in section 6.1.3. Lastly, the relationship of the
CSR to the between-cohort variance is shown (panel D). Designs with a greater number of
cohorts are shown to have greater between-cohort variability at the same CSR. This occurs
because a design with a larger number of cohorts can have a high CSD but low CSR as a result
of having a smaller on-average between-cohort slope difference at the same CSD as a smaller
cohort design. As a result, the variability at this seemingly low CSR is actually derived from a
much higher CSD in the large cohort design compared to the small; causing an inflation of the
between-cohort variance. It is also for this reason that the CSRs of larger cohort designs are
smaller than those of designs with fewer cohorts (panels C & D).
6.3.3 The generalizability of the population slope
While our simulations employed a large range of CSP values, it seems unlikely that all of
these could be considered acceptable for generating a model that generalizes across the cohorts.
To exemplify this, in Table 6.2 we show how the CSP relates to the average slope error for the
youngest and oldest cohorts regardless of design using the center of the age distribution at age
15.
Table 6.2: Misfit at Age=15 for various CSPs

CSP     Predicted Y (Y = Slp*15)   Difference from CSP at 0%   Avg Error %
0%      30.0                       0.0                         0%
10%     31.5                       1.5                         5%
20%     33.0                       3.0                         10%
30%     34.5                       4.5                         15%
40%     36.0                       6.0                         20%
50%     37.5                       7.5                         25%
60%     39.0                       9.0                         30%
70%     40.5                       10.5                        35%
80%     42.0                       12.0                        40%
90%     43.5                       13.5                        45%
100%    45.0                       15.0                        50%
200%    60.0                       30.0                        100%
300%    75.0                       45.0                        150%
400%    90.0                       60.0                        200%
500%    105.0                      75.0                        250%
600%    120.0                      90.0                        300%
700%    135.0                      105.0                       350%
800%    150.0                      120.0                       400%
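The arithmetic behind Table 6.2 can be reproduced in a few lines, assuming (as the restricted method with equally spaced cohorts implies) that the population slope lies midway between the youngest and oldest cohort slopes; the function name is illustrative:

```python
# Sketch reproducing Table 6.2: with a youngest-cohort slope of 2 and
# the restricted method, the population slope sits midway between the
# youngest and oldest cohort slopes, so the average error for those two
# cohorts is always CSP/2.

def table_row(csp, base_slope=2.0, age=15):
    """Return (predicted Y, difference from the CSP=0 row, avg error)."""
    pop_slope = base_slope * (1 + csp / 2)  # midway between the cohorts
    predicted_y = pop_slope * age
    diff = predicted_y - base_slope * age   # difference from the 0% row
    avg_error = csp / 2                     # always half the CSP
    return predicted_y, diff, avg_error

print(table_row(1.0))  # (45.0, 15.0, 0.5), matching the 100% row
```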
As we can see, the percentage error in the slope estimation for the youngest and oldest cohorts
will always be half that of the CSP specified. This holds true regardless of where along the age
distribution these values are created, the number of cohorts, or even the value of the slope in the
youngest cohort. Thus, researchers who want to create a population slope that generalizes to all of
the cohorts must keep in mind that as the CSP induces more variability between the cohort-
specific slopes, the error rate for the youngest and oldest cohorts will also increase. This, of
course, is only describing error rates between the population level slope and the cohort specific
slopes which represent the average change per subject. Depending on the amount of variability in
the cohort specific slope estimates, we may still find that the population slope is a decent
estimator for many of the participants at these CSPs. To illustrate this, we can visually examine
how changes to the CSP are differential depending on the CSR as modified by changes to the
effect size.
Figure 6.4: Changes to CSP are impacted by changes to the CSR
Even at a CSP of 100% (a doubling of the slope between-cohorts, panel A) the population slope
(dashed line) appears to be a fair estimator of the average changes occurring in both of the
cohorts. This is because despite the high CSP, the CSR remains low (0.5) reflecting that the
within-cohort variability is twice the size of the between-cohort variability. In panels A thru C,
the CSP is increased for a fixed effect size of 0.5, reflecting increases in the CSR. As the CSR
increases, the population slope becomes less of an accurate estimator of the cohort specific
changes. In panels D thru F we can see how this is dependent on the CSR more than the CSP, as
these panels have a CSP that is double that of the panels above, yet have the same CSR by
decreasing the effect size from 0.5 to 0.25. Though the population slopes are larger in panels D
thru F, the generalizability of these population slopes to both of the cohorts follows the same
trend as in panels A thru C with lower CSP and higher effect size. Nevertheless, we would expect
lower statistical power in these designs with the same CSR but greater slope differences (panels
D thru F) due to the greater within-cohort variability from the effect size.
Visualization of these slopes helps us see how having a low CSR is important for
improving the generalizability of the population slope to all of the cohorts in the ALD; however,
this is a subjective measure of generalizability. While we will examine metrics related to overall
model fit such as the AIC in subsequent sections, these do not have a magnitude that informs us
about the amount of misfit in the slopes for the individuals. In order to create this measure of
slope misfit we can compare the slope predictions in the ALD mixed model for each individual
to the true slopes derived from the individual level regression of Y on age. Equation 14 below
describes this procedure for computing the slope misfit across j individuals:
slope misfit = (1/n) Σj [(β1 + υ1j + υ1k) − bj] eq. 14
In order to calculate the slope misfit, regressions of Y on age are run for each individual in each
cohort, with the age slope for the jth individual defined as bj. We will subsequently refer to these
values as the individual's true slope. Each individual's true slope is then subtracted from their
predicted slope from the mixed model, which is based on the population level fixed effect slope (β1),
their within-cohort between-subject deviation from this slope (υ1j), and their cohort (k)
specific deviation (υ1k). These differences are then averaged across all of the
individuals in the dataset. A modified version of this equation can be used to find the average
absolute slope misfit by taking the absolute value of the differences between the true slope and
predicted slope in equation 14. In order to examine this misfit we ran simulations using the 2 and
predicted slope in equation 14. In order to examine this misfit we ran simulations using the 2 and
3-cohort designs described in Table 6.1, but for a single sample size of 1,200 and at the effect
sizes of both 0.5 and 0.25. In Figure 6.5 we can see how the absolute misfit differs by the
number of cohorts, the amount of overlap, effect size, and GCR.
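Equation 14 can be sketched in a few lines of Python; the inputs below are illustrative arrays standing in for actual output from an individual-level OLS fit and a mixed-model fit:

```python
# Sketch of eq. 14: compare each subject's OLS ("true") slope of Y on
# age with the mixed-model prediction beta1 + u1j + u1k. Inputs are
# illustrative, not output from a real model fit.
import numpy as np

def ols_slope(age, y):
    """Per-subject regression slope of y on age (the true slope b_j)."""
    return np.polyfit(age, y, 1)[0]

def slope_misfit(true_slopes, beta1, u1j, u1k, absolute=False):
    """Eq. 14: mean of (predicted slope - true slope) over subjects;
    absolute=True gives the average absolute slope misfit."""
    predicted = beta1 + np.asarray(u1j) + np.asarray(u1k)
    diffs = predicted - np.asarray(true_slopes)
    return float(np.mean(np.abs(diffs)) if absolute else np.mean(diffs))

# One subject whose OLS slope is 2.0; the model predicts 2.0 + 0.1 + 0.0.
b = [ols_slope([10, 11, 12], [20.0, 22.5, 24.0])]
print(round(slope_misfit(b, beta1=2.0, u1j=[0.1], u1k=[0.0]), 3))  # 0.1
```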
Figure 6.5: Absolute Average Slope Misfit across CSR by Effect Size and GCR
In the top panels (A & B), the effect size is held constant at 0.5 while in the bottom panels (C &
D) the effect size is at 0.25. The left panels show a low GCR (0.5) and the right panels show a
high GCR (0.9). We unsurprisingly find that the absolute misfit is greatest when both the effect
size and GCR are low (panel C). Increases to the effect size or GCR reduce the absolute
misfit of slopes from the models. Similarly, between-design differences are minimized at high
GCR and larger effect size (panel B). For any given GCR and effect size, the 2-cohort design will
have a lower absolute misfit than the 3-cohort design, though when accounting for overlap the 3-
cohort design with high overlap (Cs=1) will have lower misfit than the 2-cohort design with low
overlap (Cs=3). Additionally, the within-design, between-overlap differences in misfit are
reduced in the 2-cohort design compared to the 3-cohort design, suggesting that changes to
overlap are more impactful on the misfit for designs with a greater number of cohorts. What is
notable is that there do not appear to be differences in the absolute slope misfit as a function of
the cohort slope ratio. This is a result of the scaling on the y-axis as there are subtle differences
in the misfit as a function of the CSR which we will see below. Moreover, the reader may notice
that even at a CSR of 0 (i.e. no differences between cohorts) there is some level of misfit. This is
to be expected as the slopes are generated from a random normal distribution, thus the only way
for the slopes to have no misfit would be to eliminate between-subject differences in the slope.
This is why increasing the effect size (i.e. reducing the between-subject SD in slope) improves
the average absolute misfit in panels A & B. In Table 6.3 we can see the average absolute misfit
at a CSR of 0 for these various designs.
Table 6.3: Absolute Misfit at CSR = 0

Effect Size δ   GCR   SCD    Cn=2, high Overlap   Cn=2, low Overlap   Cn=3, high Overlap   Cn=3, low Overlap
0.25            0.5   5.99   6.77                 9.80                8.30                 20.62
0.25            0.9   1.95   2.16                 3.17                2.70                 6.79
0.50            0.5   3.00   3.38                 4.90                4.15                 10.31
0.50            0.9   0.97   1.08                 1.58                1.35                 3.39
Indeed, even at the best-case scenario for the ALDs (2 cohorts, high overlap, high GCR and
larger effect size) the average absolute misfit is 1.08 indicating that on average the slopes from
individual level regressions would differ from the mixed models by ~ 1.08 points. While the
SCD provides the lowest absolute misfit to the slopes, the ALD high overlap scenarios provide
for a close approximation to this value.
Though we did not see any differences in the misfit as a function of the CSR, as noted
above, this was due to the scaling of the absolute misfit parameters. When we examine these
changes across the CSR individually, we can see how changes to the CSR impact the slope misfit
in the 2 (Figure 6.6) and 3 (Figure 6.7) cohort ALDs.
Figure 6.6: Changes to the CSR and Change in Absolute Slope Misfit for the 2-Cohort ALD
When the amount of overlap is high (Cs=1; panels A & C), the amount of misfit increases from a
CSR of 0 with peak misfit between a CSR of 0.1 and 0.2 regardless of the effect size or GCR
specified (GCR changes not displayed). The absolute misfit then decreases from this peak as
the CSR increases, though it remains above the level observed when there were no between-cohort
differences. When the amount of overlap is decreased (Cs=3; panels B & D), an opposite pattern
is observed, with all values of the CSR showing lower absolute misfit than when there were no
between-cohort differences. In this low overlap condition, the misfit was minimized at a CSR of
0.1 with increases to the misfit as the CSR increases. For the 3-cohort design (Fig. 6.7), peak
differences occurred in the high overlap condition (panels A & C) at around a CSR of 0.05 with
subsequent decreases below the misfit rate at CSR of 0.
Figure 6.7: Changes to the CSR and Change in Absolute Slope Misfit for the 3-Cohort ALD
In the low overlap condition (panels B & D), the absolute misfit initially decreases at a CSR of
0.05, however the misfit then increases between 0.10 and 0.15 followed by a decrease to the
nadir at a CSR of 0.5. At all values of the CSR the absolute misfit is less than when the CSR was
0.
Despite the observed changes in the absolute misfit as a function of the CSR, the
magnitude of differences across CSRs is exceptionally small. Perhaps a better indicator of the
impact of the CSR on the misfit of these models is the average percentage difference between the
true slope and predicted slope which can be accomplished by dividing the numerator in equation
14 by the true slope (bj). In Figure 6.8 we can see how these percentage differences relate to the
CSR, with positive values indicating a steeper slope value for the predicted slope.
Figure 6.8: Average % Difference in Slopes
For the 2-cohort ALD (panel A), the high overlap scenario shows consistent underestimation by
the predicted slope, with low GCR designs showing the greatest underestimation, as do
designs with lower effect size. Differences from the true slope were minimized for CSRs
between 0.1 and 0.2 followed by a slight increase in difference and subsequent gradual decrease
as the CSR increased. For the low overlap designs, the percentage difference is positive
indicating overestimation by the mixed model with, again, the worst performance by designs
with both low GCR and small effect size. Differences were maximized for CSRs between 0.05
and 0.10 with a gradual decrease as the CSR increased. Overall these results for the 2-cohort
design indicate that a small amount of between-cohort variability is desirable for high overlap
designs and that no between-cohort variability or a moderate amount is desirable for designs with
low overlap. In the 3-cohort design (panel B), a similar ordering of the percentage differences is
shown, with designs of low GCR and effect size showing the greatest difference. Unlike in the 2-
cohort design, the mixed model predicted slopes overestimate the true slopes at all of the CSRs
with the exception of the no cohort differences condition (CSR=0) and very low CSR values in
the high GCR condition. In order to better visualize these changes, panel C provides a zoomed-in
perspective showing that in the high overlap condition differences are maximized between CSRs
of 0.05 and 0.10 while in the low overlap this shifts to 0.15. For both, the slope differences are
gradually reduced as the CSR increases indicating that for 3-cohort design low CSRs (<0.05) or
moderate CSRs are desirable to limit the slope differences.
Lastly, though we have discussed the generalizability of the population slope in the
context of the absolute slope misfit as well as the average percentage difference in the misfit,
neither measure captures how closely the population slope tracks the true slopes nor how
well the modeling of these slopes fits the data. In reality, as between-cohort differences get
larger the population slope generally becomes a worse approximation of the individual level
slopes; thus, minimizing the CSR is imperative to maximizing the generalizability of the
population slope. We can additionally examine fit indices such as Akaike's
Information Criterion (AIC) in relation to the CSR in Table 6.4.
Table 6.4: AIC (x100) across CSRs at GCR=0.5 and Effect Size = 0.5

CSR     Cn=2, high Overlap   Cn=2, low Overlap   Cn=3, high Overlap   Cn=3, low Overlap
0.00    1264.618             1018.927            1141.487             644.5454
0.025   1264.549             1018.867            1141.518             644.5834
0.05    1264.543             1018.817            1141.419             644.5887
0.10    1264.542             1018.986            1141.541             644.6132
0.15    1264.556             1019.003            1141.565             644.6417
0.20    1264.676             1019.017            1141.583             644.6623
0.25    1264.690             1019.027            1141.598             644.6775
0.50    1264.726             1019.056            1141.642             644.7225
1.00    1264.757             1019.084            1141.686             644.7656
For each design, as the CSR increases, the AIC also increases suggesting that designs with a
higher CSR do not fit the data as well. However, the within-design changes to the
AIC are too slight to suggest any real difference. More importantly, the 3-cohort design fits the
data better at all of the CSRs and for both the 2-cohort and 3-cohort designs, the design with the
least amount of overlap provides a better fit based on the AIC. The finding that
designs with more cohorts fit better makes sense, as the 3-cohort design has the
population slope pass through the cohort-specific slope of the cohort in the middle of the age
distribution, thereby providing a better fit. The decrease in overlap providing better fit is a bit
less intuitive as one might assume that greater overlap in the age distributions would provide a
population slope that is more similar to each of the cohorts.
6.3.4 Power to detect the population fixed-effect slope
The power to detect the population fixed effect is one of the main concerns in conducting
ALDs, as researchers of development are primarily interested in characterizing developmental
change. While some have argued that the use of ALDs is not appropriate when there are cohort
differences (Glenn, 2005) due to the fact that the developmental trajectory will not necessarily
represent the trajectory of any single cohort, we would argue that the pooling of information
from multiple cohorts maximizes the generalizability to a broader range of generations.
Now that we have a basic understanding of how the CSD, CSP, CSR, and between-cohort
slope variance relate to one another from the prior section, we can begin examining the role that
these play in the power of these designs. In Figure 6.9 below, we show how power to detect the
population slope relates to the amount of cohort slope differences expressed as a percentage
using the CSP for designs of various cohort and sample sizes. When the CSP value is 0, there
will be no difference in the slopes between the cohorts (i.e. population slope =2). As mentioned
above, within each CSD (and thus CSP), the population slope is the same across designs of
varying cohort or sample sizes.
Figure 6.9: Cohort Slope Percentage and the power to detect the population slope
We find that each design, regardless of the Cn or sample size, experiences an initial increase in
the power as the CSP moves from 0 to 100% of the slope value reflecting a doubling in the slope
in the oldest cohort. The reason why power would increase when inducing between-cohort slope
variance should be clear: the increase in slope differences results in an overall stronger
population fixed effect. In this instance the population slope increases for all designs from a
value of 2 to 4 (Fig. 6.3, panel A), while the within-cohort slope variability remains
constant at 4. As a result, the effect size (δ) has increased from 0.5 to 1.0, so an increase in the
power is not unexpected. This, of course, relies on it being reasonable to assume that the
variation in slopes between subjects within a cohort would not change between-cohorts even as
there are changes to the slopes between-cohorts. We generally find this assumption to be
reasonable, as a researcher studying a single cohort would likely assume that the between-subject
variability would generalize to a novel unseen cohort. Moreover, the face validity of this
assumption makes it clear that an attempt to maintain a constant effect size (δ) in the presence of
cohort differences would yield results that are not congruent with our experiences in examining
between-cohort differences. Figure 6.10 below provides an exemplar of an ALD with 3 cohorts
with in an initial slope of 2 with the total amount of change (restricted method) between the
youngest and oldest cohorts restricted to 50% of the initial slope. In panel A, the effect size in
each cohort increases while the between-subject variability in slope is held constant at an SD of
4. In panel B, the effect size is held constant at 0.5. As the slope increases in subsequent cohorts
the between-subject variability in slope (SD) is also increased to maintain a constant effect size.
Figure 6.10: Increasing effect size versus holding effect size constant in the presence of CSD
As we can see, the data in panel A more closely resemble our expectations of what these
between-cohort differences should look like. Nevertheless, in subsequent sections we will
examine instances where the effect size is held constant across cohorts despite increasing slopes.
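The restricted growth method and the resulting per-cohort effect sizes can be sketched in a few lines. The sketch below is an illustrative reconstruction from the quantities stated in the text (initial slope 2, within-cohort slope SD 4, CSP of 50% spread evenly across 3 cohorts); the function and variable names are our own, not those of the simulation code.

```python
# Sketch of the restricted growth method described for Figure 6.10:
# the total slope change between the youngest and oldest cohorts is a
# fixed percentage (CSP) of the initial slope, spread evenly across
# cohorts. Names below are illustrative, not from the simulation code.

def cohort_slopes(initial_slope, csp, n_cohorts):
    """Per-cohort slopes under the restricted growth method."""
    total_change = csp * initial_slope          # e.g. 0.5 * 2 = 1
    step = total_change / (n_cohorts - 1)       # even spacing
    return [initial_slope + k * step for k in range(n_cohorts)]

slopes = cohort_slopes(initial_slope=2.0, csp=0.50, n_cohorts=3)
print(slopes)                                   # [2.0, 2.5, 3.0]

# Panel A scenario: within-cohort slope SD held at 4, so the per-cohort
# effect size delta = slope / SD grows with each successive cohort.
sd_within = 4.0
deltas = [s / sd_within for s in slopes]
print(deltas)                                   # [0.5, 0.625, 0.75]
```

A CSP of 100% with two cohorts reproduces the doubling described earlier in the section: `cohort_slopes(2.0, 1.0, 2)` gives slopes of 2 and 4.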
Returning to our assessment of Figure 6.9, we also notice that while designs with more than
two cohorts continue to see an increase in power as the CSP increases, this is not the case for the
2-cohort ALD, which begins to lose power as the CSP is further increased. One
explanation may be that the 2-cohort design estimates the
population slope between the two cohorts and does not generate an estimate that passes through
any observed cohort. While this could explain discrepancies from the 3- and 5-cohort designs,
this would not explain why the 4-cohort design, which has a similar population slope estimation
method, does not behave similarly to the 2-cohort design. Alternatively, we might recall that the
ratio of the average between-cohort slope difference to the within-cohort slope variability is
much higher at the same CSP in the 2-cohort design, making comparisons by CSP unfair. In
Figure 6.11 below, we re-plot these power values as a function of the CSR instead.
Figure 6.11: Cohort Slope Ratio and the power to detect the population slope
Panel A shows power across the full spectrum of the CSRs investigated, while panel B examines
power more closely for CSRs less than 1. Comparing Figure 6.11 (using the CSR) to Figure 6.9
(using the CSP), we note that the relative ordering of the power values is reversed: at a fixed
CSP, designs with fewer cohorts were more powerful, but when accounting for the average
difference between slopes by using the CSR we find the opposite (i.e., designs with more
cohorts are more powerful at a fixed CSR). As is also apparent, particularly at the smaller
sample size, the rate of change in power slows after a CSR value of 1. While the pattern of the
2-cohort design losing power with increased CSR persists, we might suspect that this is due to
its much higher CSR values relative to the other designs. This is not the case, however, as CSR
values as high as 8 (a CSP of 3000%) were investigated post-hoc for both the 3- and 4-cohort
designs (not displayed) and neither showed a downward trend in power at these high CSRs. It
should be noted that the shift towards higher p-values (i.e., less power) for the population slope
at higher CSRs is present for the other designs as well, but in those designs the shift does not
appear to impact power even at very high CSRs (e.g., 8). Figure 6.12 below shows the p-value
curve for the 2- and 3-cohort designs at CSD values of 2, 6, and 12, which correspond to CSR
values of 0.5, 1.5, and 3 in the 2-cohort design and 0.25, 0.75, and 1.5 in the 3-cohort design.
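The metrics in this passage are related by a simple conversion. Treating the CSD as the total slope difference between the youngest and oldest cohorts, an interpretation consistent with the CSR values quoted for Figure 6.12 given a within-cohort slope SD of 4, the CSR follows directly. This is an illustrative sketch with our own function names, not the simulation code itself.

```python
# Converting between the cohort-difference metrics used in this
# chapter. CSR is the average adjacent between-cohort slope difference
# divided by the within-cohort slope SD. Values below assume the
# within-cohort slope SD of 4 and initial slope of 2 used throughout.

def csr_from_csd(csd_total, n_cohorts, sd_within):
    """CSR from the total youngest-to-oldest slope difference (CSD)."""
    avg_diff = csd_total / (n_cohorts - 1)
    return avg_diff / sd_within

def csr_from_csp(csp, initial_slope, n_cohorts, sd_within):
    """CSR from the cohort slope percentage (CSP, as a proportion)."""
    return csr_from_csd(csp * initial_slope, n_cohorts, sd_within)

# Reproduces the Figure 6.12 values: CSDs of 2, 6, and 12 give CSRs of
# 0.5, 1.5, 3 with 2 cohorts and 0.25, 0.75, 1.5 with 3 cohorts.
print([csr_from_csd(d, 2, 4.0) for d in (2, 6, 12)])  # [0.5, 1.5, 3.0]
print([csr_from_csd(d, 3, 4.0) for d in (2, 6, 12)])  # [0.25, 0.75, 1.5]

# And the table headers: a CSP of 100% is CSR 0.50 (Cn=2), 0.25 (Cn=3).
print(csr_from_csp(1.0, 2.0, 2, 4.0), csr_from_csp(1.0, 2.0, 3, 4.0))
```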
Figure 6.12: P-curve by CSD values
Apparent for both designs in the p-curve figure is that the distribution of p-values shifts towards
higher values as the CSR is increased. Even so, the 2-cohort design, despite showing a loss of
power at the highest CSR, has the majority of its p-values below 0.10.
Though we cannot know the impact on the SCD when cohort influences are present, we
can compare the sample sizes required to achieve 80% and 90% power under the various CSPs
(and CSRs) for the ALDs to those of the SCD. Table 6.5 below examines the sample sizes
required in these designs in the high overlap condition (Cs=1) at a low GCR (0.5). As
a reminder, the Ns for 80% and 90% power in the SCD were 56 and 80, respectively (Chapter 3,
Table 3.2).
Table 6.5: Sample Sizes for achieving power with high overlap at GCR=0.5
CSP (CSR-Cn2, CSR-Cn3) | Cn=2 80% Power N | Cn=2 90% Power N | Cn=3 80% Power N | Cn=3 90% Power N
0% (0.0, 0.0) 67 91 82 114
10% (0.05, 0.025) 63 86 78 108
20% (0.10, 0.05) 58 80 73 96
30% (0.15, 0.075) 53 76 67 88
40% (0.20, 0.10) 49 72 62 80
50% (0.25, 0.125) 47 67 58 76
60% (0.30, 0.15) 46 60 53 71
70% (0.35, 0.175) 44 58 50 67
80% (0.40, 0.20) 44 57 46 64
90% (0.45, 0.225) 43 56 42 60
100% (0.50, 0.25) 39 60 42 60
200% (1.0, 0.5) 44 86 28 43
300% (1.5, 0.75) 82 >120 24 35
400% (2.0, 1.0) >120 >120 20 29
500% (2.5, 1.25) >120 >120 18 28
600% (3.0, 1.5) >120 >120 16 27
700% (3.5, 1.75) >120 >120 15 25
800% (4.0, 2.0) >120 >120 14 24
With even a small amount of between-cohort variation, that is, where the total variation in the
slopes is within 20-30% (CSR 0.10-0.15), the sample size required to achieve 80 or 90% power
in the 2-cohort design is equivalent to that of the SCD. For the 3-cohort design this threshold
increases to a CSP between 40-60%, though this reflects the same CSR values of 0.10 to 0.15.
Despite showing a decrease in power after a CSP of 100%, the 2-cohort design in the presence
of these cohort differences still performs as well as the SCD up to a CSP of 200%. The change
at around 200% occurs where the CSR exceeds 1.0, which is likely meaningful as this is the
point at which the between-cohort variability in slope becomes the same size as the
within-cohort variability.
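The qualitative pattern in Table 6.5, with required sample sizes shrinking as the effect size grows, can be illustrated with a textbook approximation. The sketch below uses the one-sample normal approximation to power; it is NOT the mixed-model power computed in these simulations and only illustrates the direction of the effects of δ and N.

```python
# Back-of-envelope illustration (not the ALD mixed-model power) of why
# power rises when the effect size delta doubles from 0.5 to 1.0:
# the one-sample normal approximation, power ~ Phi(delta*sqrt(n) - z).
from math import sqrt
from statistics import NormalDist

def approx_power(delta, n, alpha=0.05):
    """Two-sided normal-approximation power for a standardized effect."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(delta * sqrt(n) - z_crit)

# Power grows with both delta and n; doubling delta has a large effect.
for n in (10, 30, 60):
    print(n, round(approx_power(0.5, n), 2), round(approx_power(1.0, n), 2))
```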
6.3.4 Power to detect the population fixed-effect slope and changes in Overlap and GCR
The previous section examined the power to detect the fixed-effect population slope in
the presence of between-cohort differences for designs with high overlap and low GCR (0.5).
We will next explore how power changes as a result of changes to the overlap and GCR, and
compare these changes to the corresponding scenarios with no between-cohort differences from
Chapter 3. Recall from Chapter 3 that increases to the GCR increased power, while decreases to
the amount of overlap, through increased Cs, decreased power. In
Figure 6.13 we can see how changes to the CSR impact the power curves of the 2- and 3-cohort
ALDs in both the high and low overlap conditions.
Figure 6.13: Power curves for the population slope by CSR and Overlap
For the CSRs specified (≤0.50), both the 2-cohort (panel A) and 3-cohort (panel B) designs
show that increases to the CSR improve the power curve and that the high overlap condition
always has greater power than the low overlap condition. For the 2-cohort design, with high
overlap, the power curve at CSR=0.10 is nearly equivalent to that of the SCD while the low
overlap curves are only similar to the SCD at a CSR of 0.50. For the 3-cohort design, the high
overlap condition also shows power nearly equivalent to the SCD at a CSR of 0.10 while the low
overlap condition does not outperform the SCD at any of the CSRs plotted. Examining the
average power across sample sizes for each overlap condition in Table 6.6, we note that, within
a given design, the between-overlap differences decrease as the CSR increases.
Table 6.6: Average Power Differences between Overlap Conditions by CSR
Cn = 2 | Cn = 3
CSP (CSR-Cn2, CSR-Cn3) | Cs=1 High Overlap Avg Power | Cs=3 Low Overlap Avg Power | Δ | %Δ | Cs=1 High Overlap Avg Power | Cs=3 Low Overlap Avg Power | Δ | %Δ
0% (0.0, 0.0) 69.0 56.0 -12.9 -18.8 62.4 27.4 -34.9 -56.0
10% (0.05, 0.025) 72.1 59.6 -12.5 -17.3 65.2 31.8 -33.4 -51.2
20% (0.10, 0.05) 74.6 62.8 -11.8 -15.8 68.1 36.0 -32.2 -47.2
30% (0.15, 0.075) 76.6 65.5 -11.1 -14.5 70.8 39.7 -31.1 -43.9
40% (0.20, 0.10) 78.2 67.9 -10.2 -13.1 73.1 43.0 -30.1 -41.1
50% (0.25, 0.125) 79.6 69.9 -9.7 -12.2 75.2 46.2 -29.0 -38.6
60% (0.30, 0.15) 80.6 71.6 -9.0 -11.2 77.2 49.6 -27.6 -35.7
70% (0.35, 0.175) 81.5 72.8 -8.7 -10.7 78.8 52.3 -26.5 -33.6
80% (0.40, 0.20) 82.0 73.8 -8.2 -10.0 80.0 54.9 -25.1 -31.4
90% (0.45, 0.225) 82.6 75.0 -7.6 -9.2 81.4 57.4 -24.0 -29.5
100% (0.50, 0.25) 82.8 75.4 -7.4 -9.0 82.3 58.4 -23.9 -29.0
200% (1.0, 0.5) 81.4 74.9 -6.5 -8.0 88.6 68.7 -19.9 -22.5
300% (1.5, 0.75) 74.5 68.9 -5.7 -7.6 91.3 73.5 -17.8 -19.5
400% (2.0, 1.0) 63.7 59.7 -4.0 -6.3 93.0 76.4 -16.6 -17.9
500% (2.5, 1.25) 50.5 48.8 -1.7 -3.4 94.0 78.5 -15.5 -16.5
600% (3.0, 1.5) 37.3 38.0 0.7 1.8 94.6 80.3 -14.4 -15.2
700% (3.5, 1.75) 25.9 28.4 2.6 9.9 95.1 81.6 -13.6 -14.2
800% (4.0, 2.0) 16.8 20.1 3.3 19.4 95.5 82.8 -12.8 -13.4
Moreover, the differences in average power in the 2-cohort design are much smaller (roughly
3-fold) than those in the 3-cohort design. Overall these data suggest that the power loss in the
ALD from decreases to the overlap is minimized when cohort differences are present, and that
these between-overlap differences are greater in designs with more cohorts.
When examining changes to the GCR we expect that increases to the GCR will increase
power across designs regardless of the CSR specified. In Figure 6.14 below we show that this is
indeed the case and that improvements to the GCR are more impactful when the CSR is low and
the overlap between cohorts is low. For example, in the 2-cohort (panel C) and 3-cohort (panel
D) designs with low overlap, the power curves at a CSR of 0.10 show power similar to the SCD.
Figure 6.14: Power curves for the population slope by CSR and GCR
In Table 6.7 we can examine the average power differences across sample sizes between
GCRs, across the CSRs, for both the high and low overlap conditions. As we can see, the
average difference between GCRs is minimized at a high CSR, and changes to the GCR are
more impactful when there is a low amount of overlap between cohorts.
Table 6.7: Average Power Differences between GCRs by CSR
Cn = 2 | Cn = 3
Overlap | CSP (CSR-Cn2, CSR-Cn3) | GCR=.5 Avg Power | GCR=.9 Avg Power | Δ | %Δ | GCR=.5 Avg Power | GCR=.9 Avg Power | Δ | %Δ
High
0% (0.0, 0.0) 69.0 84.7 15.7 22.8 62.4 83.2 20.9 33.4
10% (0.05, 0.025) 72.1 86.3 14.2 19.7 65.2 85.1 19.9 30.5
20% (0.10, 0.05) 74.6 87.8 13.2 17.7 68.1 86.5 18.3 26.9
30% (0.15, 0.075) 76.6 88.8 12.2 15.9 70.8 87.7 17.0 24.0
40% (0.20, 0.10) 78.2 89.6 11.5 14.7 73.1 88.8 15.7 21.5
50% (0.25, 0.125) 79.6 90.3 10.7 13.4 75.2 89.7 14.4 19.2
60% (0.30, 0.15) 80.6 90.6 10.0 12.4 77.2 90.5 13.3 17.2
70% (0.35, 0.175) 81.5 90.9 9.4 11.5 78.8 91.1 12.4 15.7
80% (0.40, 0.20) 82.0 91.2 9.1 11.1 80.0 91.8 11.7 14.7
90% (0.45, 0.225) 82.6 91.4 8.8 10.7 81.4 92.4 11.1 13.6
100% (0.50, 0.25) 82.8 91.4 8.6 10.4 82.3 92.7 10.4 12.6
200% (1.0, 0.5) 81.4 88.5 7.1 8.7 88.6 95.7 7.0 7.9
300% (1.5, 0.75) 74.5 80.7 6.2 8.3 91.3 96.8 5.5 6.0
400% (2.0, 1.0) 63.7 68.4 4.6 7.3 93.0 97.3 4.3 4.7
Low
0% (0.0, 0.0) 56.0 83.0 27.0 48.2 27.4 70.4 42.9 156.4
10% (0.05, 0.025) 59.6 84.4 24.8 41.6 31.8 73.9 42.0 131.9
20% (0.10, 0.05) 62.8 86.0 23.2 36.9 36.0 76.3 40.3 112.0
30% (0.15, 0.075) 65.5 87.2 21.7 33.1 39.7 78.4 38.7 97.5
40% (0.20, 0.10) 67.9 87.9 20.0 29.5 43.0 80.3 37.2 86.5
50% (0.25, 0.125) 69.9 88.6 18.8 26.8 46.2 81.8 35.6 77.1
60% (0.30, 0.15) 71.6 89.2 17.6 24.6 49.6 83.0 33.4 67.2
70% (0.35, 0.175) 72.8 89.5 16.7 23.0 52.3 84.3 32.0 61.3
80% (0.40, 0.20) 73.8 89.9 16.0 21.7 54.9 85.3 30.4 55.3
90% (0.45, 0.225) 75.0 90.2 15.2 20.3 57.4 86.3 28.9 50.4
100% (0.50, 0.25) 75.4 90.5 15.1 20.0 58.4 87.6 29.2 50.1
200% (1.0, 0.5) 74.9 87.4 12.5 16.7 68.7 92.2 23.5 34.3
300% (1.5, 0.75) 68.9 79.8 10.9 15.8 73.5 94.0 20.5 27.9
400% (2.0, 1.0) 59.7 67.7 8.0 13.3 76.4 95.0 18.6 24.4
Moreover, changes are more impactful in the 3-cohort ALD because of its lower average power
values at the low GCR. This further shows that the differences between the 2- and 3-cohort
designs are minimal when the GCR is high. To get a better idea of how the CSR impacts power
for the high-GCR ALDs, we have tabulated the required sample sizes for both 80% and 90%
power in Table 6.8 below. As a reminder from Chapter 3, the SCD required sample sizes of 35
and 46 for 80% and 90% power at a GCR of 0.9.
Table 6.8: Sample Sizes for achieving power at GCR=0.9
Overlap | CSP (CSR-Cn2, CSR-Cn3) | Cn=2 80% Power N | Cn=2 90% Power N | Cn=3 80% Power N | Cn=3 90% Power N
High
0% (0.0, 0.0) 39 54 41 56
10% (0.05, 0.025) 35 49 37 54
20% (0.10, 0.05) 32 43 34 50
30% (0.15, 0.075) 29 40 31 45
40% (0.20, 0.10) 28 38 29 42
50% (0.25, 0.125) 27 37 27 39
60% (0.30, 0.15) 26 36 26 37
70% (0.35, 0.175) 25 35 25 34
80% (0.40, 0.20) 24 35 24 31
90% (0.45, 0.225) 23 34 23 29
100% (0.50, 0.25) 23 35 23 29
200% (1.0, 0.5) 25 53 15 21
300% (1.5, 0.75) 50 >120 12 19
400% (2.0, 1.0) >120 >120 10 18
Low
0% (0.0, 0.0) 42 55 66 90
10% (0.05, 0.025) 38 51 59 79
20% (0.10, 0.05) 35 47 55 73
30% (0.15, 0.075) 32 44 51 68
40% (0.20, 0.10) 30 42 48 62
50% (0.25, 0.125) 29 40 44 59
60% (0.30, 0.15) 28 39 41 56
70% (0.35, 0.175) 27 38 38 53
80% (0.40, 0.20) 26 38 35 49
90% (0.45, 0.225) 25 38 33 47
100% (0.50, 0.25) 24 38 30 45
200% (1.0, 0.5) 27 63 22 32
300% (1.5, 0.75) 61 110 18 28
400% (2.0, 1.0) 110 110 16 26
In the high overlap condition, both the 2- and 3-cohort designs achieve power equal to the SCD
at a CSR of 0.05 for 80% power and between CSRs of 0.05 and 0.10 for 90% power. In the low
overlap condition there are greater differences between the 2- and 3-cohort designs: the
2-cohort design equals the power of the SCD at a CSR of 0.10 for 80% power and between 0.10
and 0.15 for 90% power. For the 3-cohort design these thresholds increase to a CSR of 0.20 and
between 0.20 and 0.225 for 80% and 90% power. Overall these results indicate that the
improvements in power from increasing the GCR are minimized when between-cohort
differences are present, and that designs with less overlap and more cohorts are more sensitive
to these changes in GCR (a finding similar to the one in Chapter 3).
6.3.5 Power to detect the population fixed-effect slope and the impact of a constant effect size
As mentioned previously in this chapter, the finding that increases to the CSR (or CSP)
result in increased power for the population slope is partially due to the fact that in these
simulations the variability in the slope was held constant despite the increases to the slopes in
the older cohorts. This is discussed at the beginning of section 6.3.3, and its impact is displayed
in Figure 6.10. Holding the variability constant increases the effect sizes in the older cohorts,
resulting in a population slope with a larger effect size when cohort differences are present than
when they are not (i.e., at CSR=0). While we believe that holding the slope variability constant
between cohorts is the correct assumption when simulating these designs, we will explore the
consequences of increasing the slope standard deviation in the older cohorts as the cohort
differences are increased, which ensures a constant effect size of 0.5 in each cohort. In Figure
6.15 below, we compare the power curves between designs with and without the effect size held
constant, across CSRs corresponding to cohort slope percentages ranging from 0 to 90%.
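The constant-effect-size variant amounts to scaling the within-cohort slope SD in proportion to each cohort's slope. The sketch below illustrates this with the restricted-method slopes from Figure 6.10; names and values are illustrative, not the simulation code.

```python
# Sketch of the constant-effect-size variant in section 6.3.5: as the
# restricted method raises the slope in older cohorts, the slope SD is
# raised proportionally so that delta = slope / SD stays at 0.5.
# Names are illustrative, not from the simulation code.

def constant_delta_sds(slopes, delta=0.5):
    """Within-cohort slope SDs that hold the effect size at delta."""
    return [s / delta for s in slopes]

slopes = [2.0, 2.5, 3.0]            # restricted method, CSP = 50%
sds = constant_delta_sds(slopes)
print(sds)                          # [4.0, 5.0, 6.0]
print([s / sd for s, sd in zip(slopes, sds)])  # [0.5, 0.5, 0.5]
```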
Figure 6.15: Power Curves by CSR with and without constant effect size
At a CSR of 0, power was equivalent between designs with and without the effect size held
constant. This is expected, as at a CSR of 0 even the simulations that do not hold the effect size
constant nevertheless have the same effect size in every cohort in the absence of between-cohort
differences. As the CSR increases, we still see increases in power in the constant effect size
simulations; however, the increases occur at a much slower rate than in the designs where the
effect size is allowed to vary between cohorts. The percentage difference between the two
design types increases with the CSR and is minimized at the largest sample size of N=120.
Table 6.9: Power difference between designs with and without constant effect size
CSR | Cn=2 high Overlap Power %Δ | Cn=2 low Overlap Power %Δ | Cn=3 high Overlap Power %Δ | Cn=3 low Overlap Power %Δ
N = 10
0.0 0 0 0 0
0.1 19.4 15.8 19.0 -2.0
0.2 30.8 16.1 34.2 20.1
0.3 39.4 24.3
0.4 48.0 15.0
N = 60
0.0 0 0 0 0
0.1 4.7 11.1 9.1 12.2
0.2 8.4 13.9 13.1 28.2
0.3 13.0 15.0
0.4 14.6 17.6
N = 120
0.0 0 0 0 0
0.1 0.7 4.3 1.3 11.4
0.2 0.5 4.7 2.1 16.3
0.3 0.6 5.3
0.4 2.0 5.2
As we can see in Table 6.9, power differences between the designs were greater at the smaller
sample sizes, increased as the CSR increased, and were generally greater for the 3-cohort
design. The low overlap condition in particular showed greater differences at larger sample
sizes. Overall, these analyses indicate that power differences between the two methods for
simulating these slopes can be minimized by using a larger sample size and by designing ALDs
with a greater amount of overlap between cohorts. Nevertheless, these proportional differences
in power were as high as 48%, reflecting an absolute difference of 12 percentage points in
power between the two methods.
6.3.6 Consequences of misspecification on the power to detect the population slope
One issue that we have not yet addressed with regard to the power to detect the slope in
the presence of between-cohort differences is that some researchers may have collected
multicohort data over time but implemented an analysis that assumes no between-cohort
differences. Indeed, collecting multicohort data is common when studying age-related changes,
yet not everyone analyzes the data in a manner that can capture between-cohort variability. In
this section we will examine the consequences of fitting a mixed model that excludes the
modeling of between-cohort variability when in fact cohort differences exist. This mixed model
is the same one specified in equation 3 at the beginning of this chapter, but
that excludes the terms for a random cohort intercept (υ0k) and random cohort-specific slope
(υ1k xijk). In Figure 6.16 we examine how the power to detect the slope changes when the
model is misspecified.
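As a stylized illustration of the mechanism behind this misspecification effect (not the chapter's mixed-model analysis), the sketch below shows that treating all subject slopes as independent ignores the shared cohort component and understates the standard error of the average slope, which manifests as higher apparent power. All names and parameter values are illustrative assumptions.

```python
# Pure-Python illustration of why ignoring cohort-level slope variance
# inflates apparent power: the naive SE treats all subject slopes as
# independent, while the cohort-aware SE recognizes that slopes within
# a cohort share a cohort component. Parameters are illustrative.
import random
from statistics import mean, stdev

random.seed(1)
K, n = 5, 20                        # cohorts, subjects per cohort
sd_cohort, sd_subject = 3.0, 4.0    # slope SDs at each level

naive_ses, cohort_ses = [], []
for _ in range(200):                # average over many datasets
    cohorts = []
    for _ in range(K):
        v = random.gauss(0, sd_cohort)   # shared cohort slope effect
        cohorts.append([v + random.gauss(0, sd_subject) for _ in range(n)])
    all_slopes = [s for c in cohorts for s in c]
    # Misspecified view: all K*n subject slopes treated as independent.
    naive_ses.append(stdev(all_slopes) / (K * n) ** 0.5)
    # Cohort-aware view: only the K cohort means are independent.
    cohort_means = [mean(c) for c in cohorts]
    cohort_ses.append(stdev(cohort_means) / K ** 0.5)

print(round(mean(naive_ses), 2), "<", round(mean(cohort_ses), 2))
```

The naive standard error is systematically smaller, so tests based on it reject more often, mirroring the greater power of the misspecified model seen here.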
Figure 6.16: Power Curves by CSR with and without model misspecification
For both the 2- and 3-cohort designs at high (panel A) and low (panel B) overlap, the
misspecified model shows greater power to detect the population slope. The differences in
power are slight when the overlap is high and the sample size large, but at low overlap the
power differences grow as the CSR increases. This is particularly true for the 2-cohort ALD,
which shows substantial power loss after a CSR of 1 in the correctly specified model but not in
the misspecified one. Despite this increased power to detect the slope through misspecification,
as we shall see in the subsequent chapter (section 7.2.5), the improved power comes with the
trade-off of increased bias in the slope estimates.
Table 6.10: Power difference between designs with and without model misspecification
CSR | Cn=2 high Overlap Power %Δ | Cn=2 low Overlap Power %Δ | Cn=3 high Overlap Power %Δ | Cn=3 low Overlap Power %Δ
N = 10
0.0 -6.0 -15.6 9.4 0
0.1 -2.2 6.6 -1.0 8.6
0.2 -0.2 2.7 -4.2 -3.0
0.3 0.3 3.2
0.4 -5.2 -2.4
N = 60
0.0 -4.5 -4.9 -1.7 -7.5
0.1 -2.2 -4.3 -2.6 -6.1
0.2 -2.8 -4.9 -3.2 -15.5
0.3 -3.2 -7.3
0.4 -4.5 -10.6
N = 120
0.0 0.0 1.0 -0.6 -1.6
0.1 -0.3 1.0 -0.9 -6.8
0.2 -1.1 -1.3 -0.9 -12.3
0.3 -2.0 -1.8
0.4 -2.5 -3.4
Negative values in Table 6.10 indicate that power is greater in the misspecified analysis.
As we can see, in general, the differences between the misspecified and correctly specified
analyses increase as the CSR increases. This occurs to an even greater extent for the design with
more cohorts, as well as when there is less overlap. At most, in this table, the misspecified
analysis results in a 15.5% proportional increase in power, which equates to a
10-percentage-point absolute difference in power between the two methods. While these results
might suggest a power benefit to analyzing multicohort data without cohort terms, especially
where between-cohort differences are present, the resulting bias in the slope (Chapter 7, section
7.2.5) and the inability to detect between-cohort variability are serious drawbacks.
6.4 Statistical power of ALDs to detect between-cohort variability
The previous section (6.3) examined how we simulated the designs in this
chapter (6.3.1), the interrelation among measures describing between-cohort variability (6.3.2),
the impact of this variability on the generalizability of the population slope (6.3.3), and how
these cohort differences translate into power to detect the fixed effect slope (6.3.4-6.3.6). We
will now explore how the aforementioned between-cohort differences in slope affect the ability
to detect this between-cohort variability. The power to detect between-cohort differences in slope
was determined by comparing a reduced model without cohort random intercept (υ0k) and
random slope (υ1k xijk) to the full model which incorporated these terms (equation 3) using a
likelihood-ratio test. The proportion of tests that were statistically significant at the 0.05 level
was taken to be the power to detect these effects.
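The power definition above, the proportion of simulated datasets whose test is significant at the 0.05 level, can be sketched with a simplified stand-in for the mixed-model likelihood-ratio test. The permutation test on subject-level slopes below, and all of its settings (function names, cohort slopes, subjects per cohort), are illustrative assumptions, not the chapter's actual simulation code.

```python
# Monte Carlo sketch of power as the proportion of significant tests.
# A permutation test on subject slopes stands in for the chapter's
# mixed-model likelihood-ratio test; all settings are illustrative.
import random
from statistics import mean

random.seed(7)

def between_cohort_stat(slopes, labels, n_cohorts):
    """Weighted between-cohort sum of squares of mean slopes."""
    groups = [[s for s, l in zip(slopes, labels) if l == k]
              for k in range(n_cohorts)]
    grand = mean(slopes)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups)

def perm_pvalue(slopes, labels, n_cohorts, n_perm=99):
    """Permutation p-value for between-cohort slope differences."""
    obs = between_cohort_stat(slopes, labels, n_cohorts)
    shuffled = labels[:]
    hits = 0
    for _ in range(n_perm):
        random.shuffle(shuffled)
        if between_cohort_stat(slopes, shuffled, n_cohorts) >= obs:
            hits += 1
    return (hits + 1) / (n_perm + 1)

def simulated_power(cohort_slopes, n_per_cohort, sd_within, n_sim=100):
    """Proportion of simulated datasets significant at the 0.05 level."""
    n_cohorts = len(cohort_slopes)
    labels = [k for k in range(n_cohorts) for _ in range(n_per_cohort)]
    sig = 0
    for _ in range(n_sim):
        slopes = [random.gauss(cohort_slopes[l], sd_within) for l in labels]
        if perm_pvalue(slopes, labels, n_cohorts) <= 0.05:
            sig += 1
    return sig / n_sim

# Within-cohort slope SD of 4, as in these simulations.
p_high = simulated_power([2.0, 6.0, 10.0], 20, 4.0)  # large cohort differences
p_low = simulated_power([2.0, 2.2, 2.4], 20, 4.0)    # negligible differences
print(p_high, p_low)
```

With large between-cohort differences nearly every replicate is significant, while with negligible differences the rejection rate stays near the nominal 0.05 level, the same logic by which the chapter's simulations estimate power.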
6.4.1 Power to detect between-cohort slope differences
We will initially examine the power to detect cohort differences in a scenario with
high overlap between the cohorts (Cs=1), low GCR (0.5), and a moderate effect size (0.5). In
Figure 6.17 we examine power as a function of the CSR at various sample sizes for designs with
varying numbers of cohorts.
Figure 6.17: Cohort Slope Ratio and the power to detect between-cohort slope variability
Panel A shows power across the full spectrum of CSRs investigated, while panel B
focuses on CSRs less than 1.0. At a given CSR we find that designs with a greater number of
cohorts show more power. This aligns with the findings of Galbraith et al. (2017), who noted that
when employing a random effects approach for detecting cohort differences, designs with more
cohorts showed more power. The opposite conclusion was reached when Galbraith employed a
fixed effects approach to modeling cohort variability, which we did not attempt here because the
additional degrees of freedom expended in a fixed-effects-only method would dampen power at
the small sample sizes presented. We additionally note that the power differences between
designs decrease as the number of cohorts increases. In Table 6.11 we can see the CSRs required
to achieve 80% and 90% power to detect these differences at sample sizes of N=60 and N=120.
Table 6.11: CSRs needed for 80% and 90% power to detect between-cohort differences at high
overlap and low GCR (0.5)
Power | N | Cn=2 CSR | Cn=3 CSR | Cn=4 CSR | Cn=5 CSR
80%
60 1.23 0.73 0.57 0.47
120 0.89 0.52 0.41 0.32
90%
60 1.38 0.86 0.64 0.52
120 0.95 0.64 0.46 0.36
Table 6.11 illuminates the amount of between-cohort variability needed for it to be
detectable in the various designs. Based on what we know about the performance of the
2-cohort model with respect to power to detect the fixed effect, the CSRs needed to detect
between-cohort differences are very close to the CSRs at which the power for the fixed effect
begins to decrease in these designs. This suggests that the 2-cohort design may not be viable for
detecting both a fixed effect population slope and between-cohort differences at the low GCR of
0.5 and the small sample sizes used here. A researcher employing the 2-cohort design would
therefore need to exercise caution and either minimize the CSR to focus on detecting the
population slope or have a larger CSR to detect between-cohort variability. In terms of CSP,
these CSR values indicate that for the 2-cohort design slope differences over 200% are needed
to have adequate power to detect cohort differences, a tripling of the slope of the youngest
cohort. Table 6.12 converts the CSRs from Table 6.11 into CSPs to give an idea of the amount
of change in slope between the cohorts needed to detect these differences.
Table 6.12: CSPs needed for 80% and 90% power to detect between-cohort differences at high
overlap and low GCR (0.5)
Power | N | Cn=2 CSP | Cn=3 CSP | Cn=4 CSP | Cn=5 CSP
80%
60 250% 290% 340% 380%
120 180% 210% 250% 260%
90%
60 280% 340% 380% 420%
120 190% 260% 280% 290%
For all of these designs at the low GCR, a very high CSP is needed to detect the
between-cohort differences, suggesting that seeking a slope that generalizes across cohorts is
antithetical to the goal of detecting between-cohort variability at these small sample sizes.
6.4.2 Power to detect cohort effects and changes in Overlap and GCR
Previous examinations of changes to overlap have routinely shown that as the overlap
between cohorts decreases, the power to detect the fixed effects also decreases. In this
section we will examine how changes to the overlap modify the power to detect between-cohort
differences in the slopes of ALDs. In Figure 6.18 we plot the power curves for the 2-cohort
(panel A) and 3-cohort (panel B) designs as a function of the CSR.
Figure 6.18: Power curves to detect between-cohort variability by CSR and Overlap
For the 2-cohort design, the differences between the overlap conditions are minimal, indicating
that the amount of overlap is unlikely to affect the power to detect cohort variability in this
design. For the 3-cohort design, however, the low overlap condition always shows lower power
than the high overlap condition, and this difference between the conditions is greater at lower
CSRs. In Table 6.13 we can see the sample sizes required to achieve 80% and 90% power at
various CSRs for both the high and low overlap conditions at a GCR of 0.5.
Table 6.13: Sample sizes for 80% and 90% power to detect between-cohort differences by CSR at low GCR (0.5)
Power | CSR | Cn=2, Cs=1 high overlap N | Cn=2, Cs=3 low overlap N | Cn=3, Cs=1 high overlap N | Cn=3, Cs=3 low overlap N
80%
0.5 >120 >120 >120 >120
1.0 78 77 35 43
1.5 36 37 21 24
2.0 24 25 17 18
90%
0.5 >120 >120 >120 >120
1.0 94 95 44 53
1.5 45 46 26 29
2.0 29 30 19 21
The sample size requirements for power in the 2-cohort design are roughly equivalent between
overlap conditions, while for the 3-cohort design the sample size is larger in the low overlap
condition, though only at lower CSRs. Overall, these results indicate that low overlap
contributes to decrements in power to detect between-cohort variation only for designs with a
greater number of cohorts, and even then only at lower CSRs. These results somewhat contradict
those of Galbraith et al. (2017), who showed that greater overlap yielded more power to detect
cohort effects. Part of this discrepancy may stem from the fact that Galbraith primarily examined
between-cohort differences as a function of the total number of measurements. When we graph
these differences as a function of the number of measurements in Figure 6.19 below, we indeed
find greater differences between the overlap conditions at a fixed number of measurements.
Figure 6.19: Power curves to detect between-cohort variability by CSR and Overlap as a
function of the total number of measurements
However, though the differences between conditions are exaggerated at a fixed number of
measurements, we instead find that the low overlap condition is more powerful for detecting
these between-cohort differences. This is in stark contrast with the findings of Galbraith when
using a random effects model for cohort differences, but does agree with Galbraith's findings
when modeling was performed using fixed cohort effects.
We found similar results for the GCR: improvements to the GCR had minimal
impact on the power to detect between-cohort variability (Figure 6.20). This was particularly
true in the high overlap conditions (panels A & B) of both designs.
Figure 6.20: Power curves for detecting between-cohort variability by CSR and GCR
In the low overlap condition (panels C & D), the impact of improved GCR on power was more
pronounced, though only for the 3-cohort design. In Table 6.14 below, the sample sizes
required to achieve 80% and 90% power at various CSRs for both the high and low overlap
conditions at a GCR of 0.9 are displayed.
Table 6.14: Sample sizes for 80% and 90% power to detect between-cohort differences by CSR at high GCR (0.9)
Power | CSR | Cn=2, Cs=1 high overlap N | Cn=2, Cs=3 low overlap N | Cn=3, Cs=1 high overlap N | Cn=3, Cs=3 low overlap N
80%
0.5 >120 >120 >120 >120
1.0 71 74 35 36
1.5 36 36 22 23
2.0 24 24 17 18
90%
0.5 >120 >120 >120 >120
1.0 88 88 43 45
1.5 45 45 27 28
2.0 29 29 20 21
Compared to the sample sizes required at a low GCR (Table 6.13), the results in Table 6.14
show minimal differences for the 2-cohort or high overlap designs. Indeed, only the 3-cohort
low overlap design shows any difference in the required sample size, and even then only at the
lower CSR of 1.0. As a result, the findings from changes to both GCR and overlap suggest that
neither is a major factor in determining the power to detect between-cohort differences.
6.4.3 Power to detect between-cohort variability and the impact of a constant effect size
If the simulations described above had been conducted using a constant effect size across
the cohorts, we would expect reductions in the power to detect between-cohort effects, resulting
from the increased slope variability within the older cohorts. Indeed, we find large reductions in
the power to detect these effects when using a constant effect size, even at the larger sample size
(N=120), in both the high (panel A) and low (panel B) overlap conditions (Figure 6.21).
Figure 6.21: Power Curves by CSR with and without constant effect size
When employing the constant effect size, the power to detect between-cohort variability never
rises above 50% for either design at the CSRs specified. Moreover, designs differing in the
number of cohorts show similar power curves, indicating that utilizing more cohorts is not a
successful strategy for boosting power under these conditions. For researchers interested in
detecting between-cohort variation under a constant effect size, much larger differences between
the cohorts would need to be present than the ones specified in these simulations, which
explored CSPs up to 800%.
6.5 Chapter Summary
6.5.1 Simulating cohort differences
In this chapter we introduced extensions to the ALD mixed model for capturing between-cohort
variation in developmental growth. We discussed ways of introducing between-cohort variation,
noting that an unrestricted growth method can hold constant the ratio of between- to
within-cohort variability when additional cohorts are added, whereas a restricted growth method
can hold the fixed effect slope constant between designs when additional cohorts are added. We
also showed that growth can more easily be expressed in percentages when there is uncertainty
about the amount of between-cohort variation to induce. Lastly, we introduced the cohort slope
(or intercept) ratio (CSR) to express the ratio of between- to within-cohort variability.
6.5.2 The generalizability of the slope
One concern when modeling between-cohort variability is the generalizability of the
fixed-effect population slope to the cohorts being modeled when age-by-cohort interactions are
known to exist. We showed that the average error rate for the population fixed effects will be
half the value of the cohort slope percentage, but that how 'nested' the population fixed-effect
slope is depends more on the cohort slope ratio than on the total amount of growth determined
by the percentage. We proposed a metric, slope misfit, defined as the average difference
between predicted and observed slopes, to quantify how well the model captured individual-level
changes. We found that designs with a smaller number of cohorts, greater overlap, larger effect
sizes, and higher GCR all showed lower levels of misfit. Changes to the CSR produced minimal
differences in slope misfit, and misfit was slightly higher when models implied cohort
differences by fitting cohort random effects when none were present.
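A minimal sketch of such a misfit metric, assuming it is the mean absolute difference between predicted and observed subject-level slopes (the exact formula is defined earlier in the chapter, and the function name and values here are hypothetical):

```python
def slope_misfit(predicted, observed):
    """Mean absolute difference between model-predicted and observed
    subject-level slopes (illustrative definition of 'slope misfit')."""
    pairs = list(zip(predicted, observed))
    return sum(abs(p - o) for p, o in pairs) / len(pairs)

# Hypothetical predicted vs. observed slopes for three subjects
print(round(slope_misfit([1.0, 1.2, 0.9], [1.1, 1.0, 1.0]), 3))  # → 0.133
```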
6.5.3 Power of the slope fixed effect
We saw that as the amount of between-cohort difference increased, the power to detect
the population slope increased. This resulted from the fact that higher CSPs produced
increasing effect sizes for subsequent cohorts when the between-subject slope variability was
held constant across cohorts. The 2-cohort design proved an exception: at higher CSRs its
power continued to decrease, though mostly after a CSP of ~200% (i.e., a CSR of ~1). Despite
this, with even small amounts of between-cohort variability (20-40%), the ALDs in the high
overlap condition began to require smaller sample sizes to achieve 80% power than the SCD.
While designs in the high overlap condition performed better in terms of power, the power
differences between overlap conditions shrank as the between-cohort differences increased,
suggesting that the amount of overlap becomes less important when between-cohort differences
are present. At high GCR, power differences between the 2- and 3-cohort designs were
minimized, and changes to the GCR were less impactful on power in the presence of
between-cohort differences.
When employing a constant effect size by increasing the standard deviation of the
slope in subsequent cohorts, we found that power still increased as between-cohort differences
increased, but at a less steep rate. Designs with fewer cohorts and greater overlap minimized
the power differences between the constant effect size and our typical (i.e. increasing effect
size) methods. For the CSRs investigated (<0.5), a maximal difference of 12 percentage points
was found between the methods.
We also examined the power of these models when between-cohort differences were present
but not modeled. These misspecified models yielded greater power to detect the fixed effect,
with the advantage over the correctly specified model growing as between-cohort differences
increased. Nevertheless, the subsequent chapter will demonstrate that this misspecification
comes at the cost of increased bias.
6.5.4 Power to detect between-cohort variability
The power to detect between-cohort differences (age-by-cohort interaction) was assessed
using a likelihood ratio test of the cohort random effects. We generally found that large cohort
differences were needed (200-400%) which correspond to larger CSRs (>0.5) in order to have
80% power to detect between-cohort effects. In many respects this makes the detection of
between-cohort differences antithetical to generating a generalized trajectory using small
samples. Differences in power as a result of overlap were minimal, particularly for the 2-cohort
design. While the 3-cohort design showed that less overlap decreased power, this effect was
diminished at larger CSRs. Changes to the GCR had minimal impact on the power to detect
between-cohort differences, though there were minor between GCR power differences in the low
overlap condition for the 3-chort design when the amount of between-cohort variability was low.
As a result, neither overlap or GCR were consider major factors in determining the power to
187
detect between-cohort variability. When employing the constant effect size, power was severely
reduced and between-cohort differences much larger than 800% would be necessary to achieve
80% power.
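The likelihood ratio test described above can be sketched as follows. This is an illustrative computation, not the dissertation's ALDSIM code: the log-likelihood values are hypothetical placeholders, and because a variance component is tested on the boundary of its parameter space, halving the 1-df p-value (a 50:50 chi-square mixture) is the usual correction.

```python
import math

def lrt_random_effect(ll_full, ll_reduced):
    """Likelihood-ratio statistic for adding cohort random effects,
    with a 1-df chi-square p-value computed via erfc. On the boundary
    of the parameter space, halving this p-value (a 50:50 chi-square
    mixture) is the usual correction."""
    stat = 2.0 * (ll_full - ll_reduced)
    p = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))  # chi2(1) upper tail
    return stat, p

# Hypothetical log-likelihoods from models with/without cohort effects
stat, p = lrt_random_effect(ll_full=-520.3, ll_reduced=-523.1)
print(round(stat, 2), round(p, 3))  # → 5.6 0.018
```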
6.6 Chapter References
1. Duncan, S. C., Duncan, T. E., & Strycker, L. A. (2006). Alcohol use from ages 9 to 16: A
cohort-sequential latent growth model. Drug and Alcohol Dependence, 81(1), 71-81.
2. Finkel, D., Reynolds, C. A., McArdle, J. J., & Pedersen, N. L. (2007). Cohort differences in
trajectories of cognitive aging. The Journals of Gerontology Series B: Psychological
Sciences and Social Sciences, 62(5), P286-P294.
3. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
4. Glenn, N. D. (2005). Cohort analysis (Vol. 5). London: Sage.
5. Jackson, N.J. (2017). ALDSIM: Stata program for the simulation of accelerated longitudinal
designs. Stata Version 15.0. revised 09.07.2017.
6. McCulloch, C. E., & Neuhaus, J. M. (2011). Misspecifying the shape of a random effects
distribution: why getting it wrong may not matter. Statistical Science, 388-402.
7. Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an
accelerated longitudinal design. Psychological Methods, 5(1), 44.
8. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal
designs. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
9. O'Brien, R. M., Hudson, K., & Stockard, J. (2008). A mixed model estimation of age, period,
and cohort effects. Sociological Methods & Research, 36(3), 402-428.
10. Raudenbush, S. W., & Chan, W. S. (1992). Growth curve analysis in accelerated longitudinal
designs. Journal of Research in Crime and Delinquency, 29(4), 387-411.
11. Raudenbush, S. W., & Chan, W. S. (1993). Application of a hierarchical linear model to the
study of adolescent deviance in an overlapping cohort design. Journal of Consulting and
Clinical Psychology, 61(6), 941.
12. Ryder, N. B. (1965). The cohort as a concept in the study of social change. American
Sociological Review, 30(6), 843-861.
13. Verbeke, G., & Lesaffre, E. (1999). The Effect of Drop-Out on the Efficiency of
Longitudinal Experiments. Journal of the Royal Statistical Society: Series C (Applied
Statistics), 48(3), 363-375.
14. Yang, Y., & Land, K. C. (2006). A mixed models approach to the age-period-cohort analysis
of repeated cross-section surveys, with an application to data on trends in verbal test
scores. Sociological Methodology, 36(1), 75-97.
15. Yang, Y., & Land, K. C. (2008). Age–period–cohort analysis of repeated cross-section
surveys: fixed or random effects? Sociological Methods & Research, 36(3), 297-326.
16. Yang, Y., & Land, K. C. (2016). Age-period-cohort analysis: New models, methods, and
empirical applications. Chapman and Hall/CRC.
Chapter 7
Bias, Estimator Efficiency, and Coverage Probability in the presence of between-cohort
differences in linear accelerated designs
Having examined the power to detect the fixed effect slope and between-cohort
differences in the previous chapter, we now examine the issues of bias, estimator efficiency, and
coverage probability in these designs when between-cohort differences are present. While
Chapter 5 contains the definitional aspects of these terms and their standard application, when
applied in the context of between-cohort variance these concepts can take on different meanings
or carry additional complications in what they actually measure. To the best of our knowledge,
no publication has addressed how bias, efficiency, or coverage behave in an ALD in the presence
of between-cohort differences; thus we believe this work to be entirely original.
7.1 Considerations for bias, efficiency, and coverage in the presence of between-cohort
differences
Bias in the slope estimation when employing a multi-cohort design is not necessarily
straightforward, as there are two potential forms of bias to consider: (1) the generalizability
of the population slope to the data in all of the cohorts, and (2) the bias in the estimation
of the population slope. The former, while not a traditional measure of bias per se, captures
the notion that our estimate of the population slope should represent an underlying 'truth'
about growth in all of the cohorts. We termed this the 'generalizability' of the slope estimate
and discussed it in more detail in Chapter 6, Section 6.3.3. This chapter will concern itself
with the more traditional concept of bias described by the latter. For the designs we examined
in the previous chapter, the population slope was not representative of any one cohort but
rather represented the average of the cohort-specific slopes defined by Equation 6 in Chapter
6. Deviations from this average will be considered the bias in detecting the population slope,
reported as a percentage change as described in Chapter 5. It should be noted that the way in
which between-cohort differences were created (unidirectionally, with equal spacing) means that
designs with an odd number of cohorts (e.g. 3 and 5) will have a population slope that passes
through the average within-cohort slope of the cohort in the middle of the age distribution. As
a result, we might expect bias estimates to be lower in these designs simply by virtue of how
the between-cohort differences were created.
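A minimal sketch of the percent-bias computation, assuming bias is the deviation of the mean estimate from the true slope expressed as a percentage of the true value (per the Chapter 5 definition; the function name and numbers are illustrative):

```python
def percent_bias(estimates, true_value):
    """Percent bias: deviation of the mean estimate from the true
    value, as a percentage of the true value (illustrative)."""
    mean_est = sum(estimates) / len(estimates)
    return 100.0 * (mean_est - true_value) / true_value

# Hypothetical simulated slope estimates around a true slope of 2.0
print(round(percent_bias([2.1, 1.9, 2.2, 2.0], 2.0), 2))  # → 2.5
```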
When examining bias in the between-cohort slope variance parameter, the expected
variance was computed as the sum of the squared differences between the cohort-specific slopes
and the population-level values, averaged across the number of cohorts in the design (see
Chapter 6 for equations). This was compared to the estimated between-cohort variance from the
model and expressed as a percentage difference. Because we utilized the restricted growth
method (explained in Chapter 6) in generating these between-cohort differences, the
between-cohort variance decreased as the number of cohorts increased. As mentioned in the
previous chapter, the assumption that these random effects are normally distributed cannot
easily be tested, owing to the sparse set of values needed to represent between-cohort
variation. As a result, the random effects distribution may be misspecified, potentially
resulting in increased bias in the between-cohort variance point estimate (McCulloch &
Neuhaus, 2011). Though these models were fit using maximum likelihood estimation, a
nonparametric likelihood approach could be utilized to account for incorrect specification of
the random effects distribution (Agresti, Caffo & Ohman-Strickland, 2004).
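This comparison can be sketched as follows; the function name and numbers are hypothetical, and the exact equations for the expected variance are given in Chapter 6:

```python
def between_cohort_variance_bias(cohort_slopes, population_slope, estimated_var):
    """Percent difference of the model-estimated between-cohort slope
    variance from its expectation: the mean squared deviation of the
    cohort-specific slopes from the population slope (illustrative)."""
    k = len(cohort_slopes)
    expected = sum((s - population_slope) ** 2 for s in cohort_slopes) / k
    return 100.0 * (estimated_var - expected) / expected

# Three cohorts around a population slope of 1.5; estimated variance 0.15
print(round(between_cohort_variance_bias([1.0, 1.5, 2.0], 1.5, 0.15), 1))  # → -10.0
```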
Estimator efficiency will be reported for both the fixed-effect population slope and
the between-cohort random slope variance parameter. As in Chapter 5, we will report efficiency
in both absolute and relative terms, with the relative values being the ratio of the efficiency
in the ALD to that in the SCD. To ensure a consistent and logical directionality in the
figures, we label the y-axis 'Inefficiency' so that larger values are more naturally
interpreted as indicating a less efficient estimator. Coverage probability will only be
reported for the population slope because the 95% confidence intervals for the random cohort
effects from these models, based on a Wald test, are likely to be asymptotically incorrect. As
a reminder from our description of coverage probability in Chapter 5: coverage greater than 95%
indicates large standard errors (or bias), resulting in a conservative test (type 1 error
< 5%), while coverage below 95% indicates smaller standard errors, resulting in a permissive
test (type 1 error > 5%). Additionally, coverage probability is a function of both the bias and
the standard error estimation, so changes to either can influence coverage in these designs.
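As a toy illustration of how coverage probability is estimated by simulation (a generic sketch with an assumed mean-of-noisy-draws estimator, not the dissertation's ALD simulation):

```python
import math
import random

def coverage_probability(true_slope, n_reps=2000, n=50, sd=1.0, seed=7):
    """Monte Carlo coverage of a Wald-type 95% CI for a slope,
    modeled here simply as the mean of noisy draws (toy estimator)."""
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_reps):
        draws = [true_slope + rng.gauss(0.0, sd) for _ in range(n)]
        est = sum(draws) / n
        se = sd / math.sqrt(n)  # known-sd standard error
        lo, hi = est - 1.96 * se, est + 1.96 * se
        covered += lo <= true_slope <= hi
    return covered / n_reps

print(coverage_probability(2.0))  # expected to be close to the nominal 0.95
```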
7.2 Bias, efficiency, and coverage of the population slope in the presence of between-cohort
differences
7.2.1 The population fixed-effect slope at high overlap
When examining bias in the population slope for the ALDs as a function of the cohort
slope ratio (CSR), we find that in the high overlap condition designs with a larger number of
cohorts generally show more bias, and that this bias initially grows as the CSR increases
(Figure 7.1A). The bias subsequently decreases to within 1% of the slope estimate for all
designs after a CSR of 0.5. Panels A & B show the ALDs at various sample sizes (N=10, 60, 120),
while panels C & D focus on just the largest sample size of 120. Additionally, panels A & C
show the full range of CSRs explored, from 0 to 4.0, while panels B & D focus on CSRs less than
0.5 to allow a more fine-grained view of what is occurring at low CSR values.
Figure 7.1A: Bias in detecting the population slope in the high overlap condition at GCR=0.5
Though we can see in the figures that the 2-cohort design tends to underestimate the population
slope at a large sample size while the 3-, 4-, and 5-cohort designs overestimate it at all CSRs
(panels C & D), this is not likely to generalize, as our previous investigations have shown
bias to be independent of sample size at sufficiently large Ns (e.g. N > 40). Additionally, an
interesting saw-tooth pattern is observed in all designs at low CSR (~0 to 0.30), such that
bias oscillates between local minima and maxima within this small window. For the 2-cohort
design, this pattern results in less overall bias in the slope as the %bias values become less
negative; for the other designs it results in increased bias. Overall, this seems to indicate
that increasing the CSR at these low values produces a trend toward overestimation of the slope
parameter, and that the shifts toward decreased bias occur around CSRs of 0.13 (5-cohort), 0.17
(4-cohort), 0.25 (3-cohort), and 0.5 (2-cohort), which all correspond to cohort slope
percentages (CSP) of 100% in these designs. This suggests that, in general, bias will increase
in these designs until the CSP exceeds 100%.
In Figure 7.1B we report efficiency (panel A) and relative efficiency (panel B) as a
function of sample size for the various ALDs and the SCD. It should be noted that the sample
sizes for the 4- and 5-cohort designs were restricted to values of 10, 60, and 120, as
explained in Chapter 6, Table 6.1; hence their hockey-stick appearance.
Figure 7.1B: Estimator efficiency of the population slope in the high overlap condition at
GCR=0.5
As we've shown previously (Chapter 5), the efficiency of the slope estimate is inversely
related to the number of cohorts used and improves as the sample size increases. We
additionally find that, in absolute terms (panel A), there are minimal differences in
efficiency between the varying CSR values, with larger CSRs showing higher inefficiency. For
the relative efficiency (panel B), we find that between-CSR differences are greater at larger
sample sizes and that, on average, the designs show a loss in efficiency that is 1.1, 1.3, 1.4,
and 1.6 times that of the SCD for the 2-, 3-, 4-, and 5-cohort designs, respectively.
For coverage probability in the high overlap condition, increases in the between-cohort
differences in slope result in increased coverage, reflecting a more conservative test with
lower type 1 error (Figure 7.1C). When there are no between-cohort differences, we found (as in
Chapter 5) that coverage oscillates around the nominal 0.95 level as a function of sample size
(panel A).
Figure 7.1C: Coverage probability of the population slope in the high overlap condition at
GCR=0.5
Although the 4- and 5-cohort designs appear to have coverage lower than the nominal level, this
is a result of the limited sample sizes used for these designs, which do not allow the
oscillation around 0.95 to be seen. When CSR values increase past ~0.3 (panel B), the average
coverage probability across the sample sizes increases, with the 2-cohort design showing the
most robustness against this change in coverage. As a consequence, ALDs with between-cohort
variation will have good type 1 error control for detecting the slope despite the increased
bias (Fig 7.1A) and power (Chapter 6, Fig 6.11).
7.2.2 Comparisons between high and low overlap conditions
Because bias tends to oscillate across sample sizes (as shown in Chapter 5), we will
henceforth report bias in terms of absolute value. Additionally, with the exception of very low
sample sizes (e.g. N ≤ 40), we do not anticipate changes in bias as a function of sample size.
As a result, we will subsequently report bias averaged across the sample sizes of 10 to 120.
Figure 7.2A shows the average bias for the 2- and 3-cohort designs (panels A & B) at a low GCR
(0.5) as a function of CSR and overlap. The bias value for the SCD is also reported for
comparison.
Figure 7.2A: Bias in detecting the population slope between overlap conditions at GCR=0.5
The average bias across sample sizes for the 2-cohort design increases and then decreases as
the CSR increases in the high overlap condition, remaining under that of the SCD (panels A &
C). Bias initially increased until a CSR of ~0.45-0.50, reflecting a CSP of 100%, after which
it decreased, becoming less than the SCD bias at a CSR of ~1.5 (CSP 200%). For the 3-cohort
design, the high overlap condition is best viewed when scaled with the 2-cohort design in panel
C. We find that bias increases until around a CSR of 0.17 (CSP 70%) and subsequently decreases,
becoming lower than that of the SCD at around a CSR of 0.35 (CSP ~140%). For the low overlap
condition (panel D), bias increases until a CSR of ~0.22 (CSP 90%), after which bias values
decrease, though never reaching the level of the SCD for the CSRs explored. From these figures
we can also see that the impact of decreasing the overlap is much greater in the 3-cohort
design, with a 16.7 fold-difference in maximal bias, whereas the 2-cohort design shows at most
a 12.8 fold-difference in maximal bias between overlap conditions. Overall, these results
suggest that at a low GCR (0.5), bias is acceptable for the 2- and 3-cohort high overlap
designs regardless of CSR, while in the low overlap condition only the 2-cohort design performs
well (i.e. within 5%).
For efficiency, based on the results from Chapter 5 where there were no between-cohort
differences (CSR=0), we would expect a decrease in overlap to yield a less efficient estimate
of the slope. Indeed, this still holds true when between-cohort differences are present, as
seen in Figure 7.2B as a function of sample size (panels A & B) and CSR (panels C & D). Both
the 2- and 3-cohort designs have worse efficiency in the low overlap condition, with 1.14 and
1.65 fold-differences in absolute efficiency between conditions averaged across CSRs and sample
sizes (panels A & C).
Figure 7.2B: Estimator efficiency of the population slope between overlap conditions at
GCR=0.5
The within-design differences in efficiency between differing CSRs were also greater in the low
overlap condition (panels C & D), particularly for the 3-cohort design, showing that the role
of higher CSR in decreasing efficiency is more pronounced when the overlap is low. Of
particular note, in the low overlap condition the rate of increase in inefficiency is more
pronounced at CSRs ≤ 0.5. For relative efficiency (panels B & D), as previously stated, in the
high overlap condition the 2- and 3-cohort designs showed 1.1 and 1.3 fold-differences from the
SCD; in the low overlap condition these increased to 1.4 and 2.4, demonstrating a greater loss
of efficiency for the 3-cohort design.
For coverage probability, we have now additionally incorporated the original SCD
coverage values as well as the coverage in the low overlap condition for the 2- and 3-cohort
ALDs. We find that when there are no between-cohort differences (CSR=0), coverage in the high
and low overlap conditions is similar to each other and to the SCD across the sample sizes
(panel A). When the CSR is increased, the 2-cohort low overlap design shows a pattern similar
to the high overlap condition, such that higher CSR results in greater average coverage and
consequently lower type 1 error (panel B).
Figure 7.2C: Coverage probability for the population slope between overlap conditions at
GCR=0.5
However, for the 3-cohort design, increased CSR in the low overlap condition initially
decreases coverage (increasing type 1 error), with increases in coverage not seen until a CSR
of ~0.25 (CSP 95%) and nominal coverage not achieved again until a CSR of ~0.7 (CSP 300%).
This initial drop in coverage for the 3-cohort low overlap design is likely due to the design's
high bias (Fig 7.2A) and poor efficiency (Fig 7.2B) at a low GCR. In the next section we will
examine how improving the growth curve reliability can improve these estimates in the presence
of between-cohort differences.
7.2.3 Comparisons at high GCR
When the GCR is increased to 0.9, we would expect bias to be reduced, and this is
indeed what we saw: all designs, regardless of overlap or CSR, showed bias values below those
at a GCR of 0.5. Moreover, all bias values are within 3% of the true population slope.
Interestingly, for the 2-cohort design, the low overlap condition shows underestimation of the
slope when CSR values are below 0.5 (CSP 100%) (Figure 7.3A, panel A). For both conditions,
peak bias occurs at a CSR of 0.05 (CSP 10%), lower than the CSR of 0.45 at which peak bias
occurred in the low overlap condition at a GCR of 0.5. For the 3-cohort design, peak bias
occurs at CSRs of 0.025 (CSP 10%) and 0.25 (CSP 100%) for the high and low overlap conditions,
respectively.
Figure 7.3A: Bias in detecting the population slope between overlap conditions at GCR=0.9
In the high overlap condition, the 3-cohort design generally shows greater bias than the
2-cohort design (panel C) at all CSRs, which aligns with what was found at low GCR. The
within-design difference between conditions was also reduced, such that the fold-difference
between conditions at maximal bias was ~6 for both the 2- and 3-cohort designs. Overall, at a
high GCR, the level of bias in the population slope is low, showing that introducing
between-cohort differences into the ALD is not a cause for concern with respect to bias. The
average bias values by overlap condition, CSR, and GCR are displayed below in Table 7.1. This
table shows that, with the exception of the 2-cohort high overlap design, improvements to the
GCR are more beneficial when between-cohort differences are present.
Table 7.1: Differences in bias between GCRs

Overlap  Cn  CSR  Avg Bias (GCR=0.5)  Avg Bias (GCR=0.9)  Fold-Difference Bias (ΔGCR)
-        1   -         -0.25               -0.11                2.30
High     2   0.0        0.78                0.50                1.55
High     2   0.5        0.51                0.41                1.23
High     2   1.0        0.28                0.26                1.08
High     2   2.0        0.22                0.17                1.32
High     3   0.0        0.65                0.58                1.13
High     3   0.5        0.85                0.34                2.48
High     3   1.0        0.49                0.23                2.14
High     3   2.0        0.27                0.15                1.81
Low      2   0.0        0.92                0.44                2.11
Low      2   0.5        2.61                0.53                4.90
Low      2   1.0        1.70                0.38                4.45
Low      2   2.0        0.70                0.25                2.81
Low      3   0.0        0.93                0.71                1.31
Low      3   0.5       16.99                1.95                8.73
Low      3   1.0        7.91                0.95                8.34
Low      3   2.0        2.23                0.41                5.46
For efficiency, we have previously found that increases in the GCR result in improved
efficiency when there are no cohort differences. In Figure 7.3B below, we can see that these
improvements also occur at a similar rate when between-cohort differences are present, for both
the absolute (panels A & B) and relative (panels C & D) efficiencies.
Figure 7.3B: Estimator efficiency of the population slope between GCRs of 0.5 and 0.9 at both
overlap conditions
When the GCR is increased to 0.9 in the high overlap condition (panels A & C), the differences
between the 2- and 3-cohort designs are minimized and their relative efficiencies compared to
the SCD are reduced. While previously the average relative efficiency at a low GCR was 1.1 and
1.3 in the high overlap condition, increasing the GCR results in a drop to 1.1 for both
designs. In the low overlap condition (panels B & D) there are even greater improvements in
efficiency from increasing the GCR. At a low GCR we had found the average relative efficiency
across sample sizes and CSRs to be 1.4 and 2.4 for the 2- and 3-cohort designs in the low
overlap condition; increasing the GCR improves these to 1.15 and 1.4, comparable to the values
for the high overlap/low GCR conditions. Table 7.2 shows that improvements to the GCR generally
benefit designs with a greater number of cohorts, less overlap, and lower CSRs. We can
additionally note that improvements to the GCR benefit the ALDs more than the SCD (1.32
fold-difference).
Table 7.2: Differences in efficiency between GCRs

Overlap  Cn  CSR  AbsEff(GCR=0.5)  AbsEff(GCR=0.9)  Fold-Diff Abs (ΔGCR)  RelEff(GCR=0.5)  RelEff(GCR=0.9)  Fold-Diff Rel (ΔGCR)
-        1   -         0.82             0.62             1.32                  -                -                -
High     2   0.0       0.88             0.63             1.39                  1.07             1.02             1.05
High     2   0.5       0.88             0.64             1.38                  1.07             1.02             1.05
High     2   1.0       0.89             0.65             1.37                  1.09             1.05             1.03
High     2   2.0       0.93             0.70             1.32                  1.14             1.12             0.99
High     3   0.0       0.98             0.66             1.49                  1.18             1.05             1.13
High     3   0.5       1.00             0.67             1.48                  1.21             1.07             1.13
High     3   1.0       1.02             0.70             1.45                  1.24             1.13             1.10
High     3   2.0       1.09             0.80             1.36                  1.35             1.33             1.01
Low      2   0.0       1.07             0.66             1.62                  1.31             1.06             1.23
Low      2   0.5       1.08             0.67             1.62                  1.32             1.08             1.23
Low      2   1.0       1.09             0.68             1.61                  1.34             1.10             1.22
Low      2   2.0       1.12             0.73             1.54                  1.38             1.20             1.16
Low      3   0.0       1.83             0.87             2.10                  2.21             1.39             1.59
Low      3   0.5       2.04             0.88             2.31                  2.48             1.41             1.76
Low      3   1.0       2.07             0.90             2.29                  2.49             1.45             1.72
Low      3   2.0       2.07             0.98             2.12                  2.51             1.61             1.56
When examining the coverage probabilities between GCRs, we find few differences
between the low and high GCR in the high overlap condition (Figure 7.3C, panels A & C); in
general, however, as the CSR increases, the high GCR design shows greater coverage.
Figure 7.3C: Coverage probability for the population slope between GCRs of 0.5 and 0.9 at both
overlap conditions
For the low overlap condition (panels B & D), though we previously found that coverage was
less than the nominal 0.95 for the 3-cohort design at a GCR of 0.5 when the CSR was low,
improvements to the GCR made this trend disappear. We posit that this results from the improved
bias and efficiency estimation at a high GCR for these larger-cohort designs, which improves
the coverage probability. Table 7.3 displays the coverage values for these designs at the high
and low overlap conditions for both GCRs, along with the difference between GCRs.
Table 7.3: Differences in coverage probability between GCRs

Overlap  Cn  CSR  Avg Coverage (GCR=0.5)  Avg Coverage (GCR=0.9)  Difference (GCR0.9 - GCR0.5)
-        1   -         0.942                   0.940                  -0.002
High     2   0.0       0.946                   0.944                  -0.002
High     2   0.5       0.968                   0.972                   0.005
High     2   1.0       0.990                   0.992                   0.002
High     2   2.0       0.999                   0.999                   0.001
High     3   0.0       0.949                   0.946                  -0.003
High     3   0.5       0.978                   0.985                   0.007
High     3   1.0       0.995                   0.998                   0.003
High     3   2.0       1.000                   1.000                   0.000
Low      2   0.0       0.946                   0.947                   0.001
Low      2   0.5       0.963                   0.973                   0.009
Low      2   1.0       0.985                   0.994                   0.008
Low      2   2.0       0.997                   0.999                   0.002
Low      3   0.0       0.941                   0.945                   0.005
Low      3   0.5       0.925                   0.980                   0.055
Low      3   1.0       0.968                   0.996                   0.028
Low      3   2.0       0.994                   1.000                   0.006
Overall, these results suggest that improvements to coverage as a result of GCR are most
beneficial for designs with a greater number of cohorts and low overlap. In the next section we
will examine how the metrics of bias, efficiency, and coverage probability change when
employing a constant effect size in generating the between-cohort differences.
7.2.4 Constant effect size
As was shown in Chapter 6 (Section 6.3.5), using a constant effect size between
cohorts by increasing the slope standard deviation resulted in a loss of power. In terms of
bias, we find that the use of a constant effect size is detrimental, increasing the peak bias
at a sample size of 120 in the high overlap condition (Figure 7.4A, panel A). Though the
3-cohort constant effect size design shows less bias, this is partially due to the stochastic
properties of the simulation when choosing a single sample size.
Figure 7.4A: Bias in detecting the population slope when using a constant effect size at GCR=0.5
In the low overlap condition (panel B), the differences from using the constant effect size
are greater for the 3-cohort design and attenuated in the 2-cohort design. While the 2-cohort
design generally maintains a lower bias in the original design that does not use the constant
effect size, the 3-cohort design experiences the opposite: the constant effect size yields a
bias that is 7.5 points lower than the original design, a pattern similar to what occurs in the
high overlap condition. This occurs despite the constant effect size showing less power for all
designs in the low overlap condition. These findings suggest that at high overlap a constant
effect size will increase the bias, while at low overlap the constant effect size will show
equivalent or less bias than the original design. We would expect designs with a greater number
of cohorts to show a similar pattern, but with smaller bias differences due to the effect size
in the high overlap condition and greater differences favoring the constant effect size in the
low overlap condition. Moreover, based on the patterns observed in these figures, we would
expect that a 4- or 5-cohort design with a constant effect size might show equal or less bias
than the original design in the high overlap condition.
When evaluating efficiency in these designs, we would expect poor performance when
using the constant effect size, as increases in the slope standard deviation with increasing
CSR would result in increasingly poor efficiency in estimating the slope. Figure 7.4B confirms
this: the constant effect size designs show efficiencies similar to the original designs at low
CSRs (<0.1) but quickly deteriorate thereafter.
Figure 7.4B: Efficiency of the population slope when using a constant effect size at GCR=0.5
Interestingly, the differences between the overlap conditions are slightly reduced across the
CSRs when using the constant effect size, suggesting that between-condition differences are
less impactful on efficiency in these designs.
With the constant effect size we find that the coverage probability is attenuated
relative to the original designs at the same CSRs (Figure 7.4C). Despite this, we generally
find that coverage increases for these designs as the CSR increases; however, while the
original designs tend toward an upper asymptote of 1.00, the constant effect size designs tend
toward an average coverage of 0.96.
Figure 7.4C: Coverage probability for the population slope when using a constant effect size at
GCR=0.5
Interestingly, the initial drop in coverage for the 3-cohort design in the low overlap
condition is also present for the constant effect size design, with shifts in the trend at the
same previously mentioned CSRs. This indicates that this loss in coverage is ubiquitous for
larger-cohort designs at low CSRs when the overlap is low; further work with 4- and 5-cohort
designs with low overlap would be needed to confirm this. Lastly, despite the poor efficiency
of the constant effect size designs and their higher bias in the high overlap condition, the
coverage probability is close to nominal and errs conservatively at higher CSRs, indicating
decent type 1 error control when creating between-cohort differences using a constant effect
size.
7.2.5 Misspecification of the model
We saw in Chapter 6 (Section 6.3.6) that collecting multicohort data and analyzing it
without incorporating cohort differences into the model resulted in greater power to detect the
population slope. Such models are occasionally seen in the literature, where an age-based trend
is modeled using data from multiple age cohorts studied over the same periods of time. Though
often not acknowledged as ALDs, the design and data collection methods follow the same process
as an ALD.
Despite these designs showing greater power when utilizing a misspecified mixed model
that excludes cohort differences, the consequences of this misspecification include inflated
bias. In Figure 7.5A below, we find that the misspecified model results in peak bias estimates
that are 2.7-5.6 times higher than in the correctly specified model when there is high overlap
between the cohorts; in the low overlap condition this ratio decreases to a 2.6-3.5
fold-difference.
Figure 7.5A: Bias in detecting the population slope with a misspecified model at GCR=0.5
In either condition, even at a CSR of 0 reflecting no cohort differences, the use of the
misspecified model showed higher bias than the correctly specified model. For the 2-cohort
design, bias values were always higher in the misspecified model, while for the 3-cohort design
bias values were similar between the two models at CSR values less than 0.15 (CSP 60%).
Beyond this CSR, using the misspecified model resulted in much higher bias for the 3-cohort
design.
Despite the increased bias from these misspecified designs, there were no meaningful
changes to the average efficiency when compared to the correctly specified design (Figure 7.5B).
Figure 7.5B: Efficiency of the population slope with a misspecified model at GCR=0.5
Given the greater amount of bias and similar efficiency to the correctly specified design
we would expect a lower amount of coverage probability in these designs, particularly for the 3-
cohort low overlap design. In Figure 7.5C we note how the coverage probability is generally
lower in the misspecified designs, and that these designs experience a greater difference between
overlap conditions than the correctly specified design. This is particularly noticeable in the 3-
cohort low overlap design (panel A) which shows an inflation of type 1 error to nearly 45% at a
CSR of 1.25. Indeed, even the 2-cohort design in both the high and low overlap conditions
demonstrates an inflated type 1 error rate at CSRs less than 1.0 (CSP 200%) (panel B).
Figure 7.5C: Coverage probability for the population slope with a misspecified model at
GCR=0.5
Overall the misspecified model shows greater bias and subsequently worse coverage for the
slope estimate. In scenarios where the overlap between cohorts is low (i.e. 3 years difference)
and the number of cohorts large, the coverage is so poor that utilizing the misspecified model is
not recommended.
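The mechanism behind this bias can be illustrated with a much simpler fixed-effects analogue (plain OLS rather than the mixed models used in the simulations; the cohort layout and all numbers below are invented). When cohort-level differences correlate with the ages at which each cohort happens to be observed, a pooled model with no cohort terms folds those differences into the age slope:

```python
import numpy as np

rng = np.random.default_rng(1)
true_slope = 0.5

rows = []
for k, start_age in enumerate([10, 12, 14]):        # 3 cohorts, staggered entry ages
    intercept_k = 1.0 * k                           # between-cohort intercept shift
    for _ in range(30):                             # 30 subjects per cohort
        for age in range(start_age, start_age + 4): # 4 annual measurements
            rows.append((k, age, intercept_k + true_slope * age + rng.normal(0, 0.5)))
data = np.array(rows, dtype=float)
cohort, age, y = data[:, 0], data[:, 1], data[:, 2]

# Misspecified: pooled regression with no cohort terms.
X_pooled = np.column_stack([np.ones_like(age), age])
slope_pooled = np.linalg.lstsq(X_pooled, y, rcond=None)[0][1]

# Cohort-adjusted: cohort indicator columns absorb the level differences.
D = np.column_stack([(cohort == k).astype(float) for k in range(3)])
slope_adjusted = np.linalg.lstsq(np.column_stack([D, age]), y, rcond=None)[0][-1]
```

Here the pooled slope is noticeably inflated above the true value of 0.5, while adding cohort terms recovers it; the mixed-model misspecification in Figure 7.5A operates the same way.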
7.3 Bias and efficiency of the estimates for between-cohort differences
7.3.1 Between-Cohort variance between overlap conditions
Detecting between-cohort slope variance when the between-cohort differences (i.e. CSR)
are slight will prove difficult. As we saw in the previous chapter, the power to detect between-
cohort differences is poor at low CSRs, suggesting that an ALD designed for detecting between-
cohort differences in small samples works against the goal of developing a generalized
developmental trajectory. As we will see when examining the bias and efficiency of the random
cohort variance estimates, large cohort differences are necessary in order to estimate these
values well at the small sample sizes used in this dissertation.
When examining bias as a function of the cohort slope ratio (CSR), we find that in the
high overlap condition, designs with a larger number of cohorts generally show less bias at the
same CSR. This makes sense: for a given distance between the observed cohorts, we would
expect less bias when there are more cohorts from which to draw the inference. Figure 7.6A
displays the bias truncated at 50% to allow for better visualization. It should be noted that the
curves for the 4- and 5-cohort designs are based on a limited sample size, which may impact their
shape. Bias values remain above 10% until a CSR of 0.7 (~225% CSP) for the 3-cohort design
and 1.25 (250% CSP) for the 2-cohort design, and are as high as 5,000 to 10,000% at very low
CSRs (<0.1).
Figure 7.6A: Bias in detecting between-cohort slope variance between overlap conditions at
GCR=0.5
In the low overlap condition, the association between the number of cohorts and the amount of
bias is reversed such that the 2-cohort design shows less bias than the 3-cohort design. Overall,
the difference in bias between the high and low overlap conditions is greater for the 3-cohort
design than the 2-cohort design.
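The metrics plotted in Figures 7.6A and 7.6B can be expressed compactly. The definitions below are the standard simulation-study forms (percent bias of the replicate mean, and a fold-difference of empirical standard deviations); they are assumed to match the usage established in earlier chapters, and the replicate values here are invented:

```python
import numpy as np

def percent_bias(estimates, true_value):
    """Relative bias of the replicate mean, in percent of the true value."""
    return 100.0 * (np.mean(estimates) - true_value) / true_value

def fold_inefficiency(estimates, reference):
    """Ratio of empirical SDs; values > 1 mean `estimates` is less efficient."""
    return np.std(estimates, ddof=1) / np.std(reference, ddof=1)

rng = np.random.default_rng(2)
true_var = 0.04
# Toy replicates of a between-cohort variance estimate that runs ~10% low.
reps = rng.normal(0.9 * true_var, 0.010, size=5_000)
pb = percent_bias(reps, true_var)          # close to -10 (percent)

# Toy fold-difference: the first estimator's sampling SD is twice the second's.
fi = fold_inefficiency(rng.normal(0.0, 2.0, 5_000), rng.normal(0.0, 1.0, 5_000))
```

Variance parameters are bounded below by zero, which is why percent bias can explode at very small true values, as in the 5,000-10,000% figures quoted above for CSR < 0.1.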
When evaluating the efficiency of the between-cohort variance, we can only examine the
absolute efficiency; without cohorts in the SCD, the relative efficiency is undefined for this
parameter. In Figure 7.6B below, we see that the efficiency worsens (greater inefficiency) as the
CSR increases and that designs with a smaller number of cohorts have the best performance in
terms of efficiency for a given CSR. However, this between-design loss in efficiency diminishes
as more cohorts are added.
Figure 7.6B: Efficiency of the between-cohort slope variance between overlap conditions at
GCR=0.5
Worse efficiency with higher CSR would be expected, as increasing the CSR also
increases the between-cohort variability. The differences between overlap conditions are
slight in the 2-cohort design, with the low overlap condition on average 1.04-fold less efficient.
As the CSR increases these differences become more pronounced, particularly for the 3-cohort
design, which shows greater between-condition differences than the 2-cohort design. For the
3-cohort design the average fold-difference between conditions across the CSRs is 1.13.
7.3.2 Comparisons at high GCR
We have previously seen that increases to the GCR improve the power, bias, and
efficiency of the fixed effects; however, we found minimal differences in the power to detect
between-cohort differences with increased GCR. Below we examine how bias and efficiency
of the estimate for between-cohort variance differ as a function of increases to the GCR.
In Figure 7.7A we plot the average bias of the random cohort variance estimate as a
function of the CSR in both the overlap conditions at the low (0.5) and high (0.9) GCR. Unlike
our results for power, we find that improvements to the GCR decrease the bias for the ALDs as
well as decrease the differences between the overlap conditions.
Figure 7.7A: Bias in detecting between-cohort slope variance between overlap conditions at
GCRs of 0.5 and 0.9
We additionally note that, while the relative ordering of the designs in the low overlap condition
at a GCR of 0.5 was reversed and favored the 2-cohort design, at a GCR of 0.9 the 3-cohort
design shows less bias regardless of the overlap condition. For both the 2- and 3-cohort
designs, the improvements from the GCR grow at higher levels of the CSR, suggesting that
changes to the GCR are most beneficial at larger CSRs. Table 7.4 shows the average bias within
and between the GCRs for these various designs as the CSR varies.
Table 7.4: Differences in bias between GCRs

Overlap  Cn  CSR    Avg Bias (GCR=0.5)  Avg Bias (GCR=0.9)  Fold-Difference Bias ΔGCR
High     2   0.25         -1.9                 7.9                  -0.24
High     2   0.5         -21.4               -13.4                   1.60
High     2   1.0         -13.0                -8.0                   1.63
High     2   2.0          -4.3                -2.2                   1.94
High     3   0.25         -7.2                 2.6                  -2.77
High     3   0.5         -14.6                -7.0                   2.10
High     3   1.0          -6.1                -2.6                   2.31
High     3   2.0          -1.0                 0.1                 -13.37
Low      2   0.25         -4.8                10.5                  -0.45
Low      2   0.5         -24.7               -13.6                   1.82
Low      2   1.0         -15.4                -7.9                   1.95
Low      2   2.0          -5.2                -2.2                   2.43
Low      3   0.25        -28.6                -5.5                   5.24
Low      3   0.5         -33.4               -12.2                   2.73
Low      3   1.0         -19.5                -5.0                   3.91
Low      3   2.0          -7.2                -0.7                   9.61
For efficiency, we find that increasing the GCR improves the efficiency of the variance
estimate, and that this improvement is larger when the overlap is low and for designs with a
greater number of cohorts (Figure 7.7B). Additionally, the between overlap condition differences
are reduced at a higher GCR.
Figure 7.7B: Efficiency of the between-cohort slope variance between overlap conditions at
GCRs of 0.5 and 0.9
We also note that as the CSR increases the differences between the high and low GCR are more
pronounced. Table 7.5 below shows how the average difference increases for each design as the
GCR improves.
Table 7.5: Differences in efficiency between GCRs

Overlap  Cn  CSR    Avg Eff (GCR=0.5)  Avg Eff (GCR=0.9)  Fold-Difference Eff ΔGCR
High     2   0.25         0.7                0.6                  1.04
High     2   0.5          1.2                1.2                  1.05
High     2   1.0          2.6                2.5                  1.07
High     2   2.0          5.8                5.4                  1.08
High     3   0.25         1.2                1.1                  1.03
High     3   0.5          2.3                2.1                  1.06
High     3   1.0          4.8                4.4                  1.08
High     3   2.0         11.2               10.4                  1.07
Low      2   0.25         0.7                0.7                  1.10
Low      2   0.5          1.3                1.2                  1.06
Low      2   1.0          2.8                2.5                  1.12
Low      2   2.0          6.2                5.5                  1.14
Low      3   0.25         1.1                1.1                  0.96
Low      3   0.5          2.4                2.2                  1.07
Low      3   1.0          5.9                4.7                  1.24
Low      3   2.0         14.2               10.9                  1.30
7.3.3 Constant effect size.
Using a constant effect size increases the within-cohort slope variability, which would be
expected to add noise to the detection of between-cohort differences. In Chapter 6 we saw that
this decreased the power to detect between-cohort differences, and we would expect it also to
increase bias and prove detrimental to efficiency.
In Figure 7.8A below we plot the bias in the between-cohort slope parameter for designs
using a constant effect size at both high and low overlap. We note that the bias for these designs
increases 2- and 4-fold for the 2- and 3-cohort designs, respectively. Moreover, the 3-cohort
design does not reach a bias below 10% at any CSR. The differences between overlap conditions
are also minimized when using these constant effect size designs.
Figure 7.8A: Bias in detecting between-cohort slope variance when using a constant effect size at
GCR=0.5
When evaluating efficiency in Figure 7.8B, we find that the loss in efficiency from using the
constant effect size was, on average, a 4- and 5-fold increase in inefficiency for the 2- and 3-
cohort designs. Additionally, there were few differences between the overlap conditions when
using the constant effect size.
Figure 7.8B: Efficiency of the between-cohort slope variance when using a constant effect size
at GCR=0.5
Overall these results suggest that use of the constant effect size for detecting between-cohort
differences is not ideal and that the original designs, in which the within-cohort slope variability
is restricted to be the same, are preferable.
7.4 Chapter Summary
In this chapter we examined the metrics of bias, estimator efficiency, and coverage
probability when between-cohort differences were present. We explored these for both the
population fixed effect slope as well as for the amount of between-cohort variance.
7.4.1 Bias in the population slope
We found that bias for the fixed effect slope generally increased (as high as 4% in the 5-
cohort design) as the amount of between-cohort variation was increased; however, the bias
subsequently decreased after a cohort slope percentage (CSP) of around 100% for most designs
in the high overlap condition. Compared to the SCD bias, the 2- and 3-cohort designs of the high
overlap condition generally showed lower amounts of bias. Conversely in the low overlap
condition, only the 2-cohort design showed acceptable amounts of bias with levels less than the
SCD after a CSP of 100%. Improvements to the GCR were more beneficial for bias in the
presence of between-cohort differences and showed that at high GCR (0.9) all designs regardless
of overlap were within 3% of the true population parameter.
Use of a constant effect size in the high overlap condition resulted in increased bias,
while in the low overlap condition the bias was comparable for the 2-cohort design and actually
improved in the 3-cohort design.
Using a misspecified model that excludes the modeling of cohort differences showed
greater power in Chapter 6; however, we find that the misspecification also results in
substantially larger bias in the slope parameter (~2.5-3.5 fold) regardless of overlap condition.
7.4.2 Estimator efficiency of the population slope
For efficiency in the high overlap conditions, we find few differences between differing
cohort slope ratios (CSRs), though greater between-cohort differences showed less efficiency.
The relative efficiency showed slightly greater differences between CSRs, particularly at larger
sample sizes, suggesting that the efficiency advantage of the SCD grew relative to the high-CSR
ALDs. In the low overlap conditions, we saw greater decrements in efficiency for designs with a
greater number of cohorts, as well as greater between-CSR differences in efficiency. Increases to
the GCR reduced the between-design differences in efficiency and showed the greatest
improvements when the CSR was low.
Use of the constant effect size results in large losses of efficiency that increase with
greater CSRs for either high or low overlap conditions. This occurs because of the resulting
increase in the between-subject slope standard deviation to maintain the constant effect size.
Use of a misspecified model did not result in changes to the estimator efficiency.
7.4.3 Coverage probability of the population slope
For coverage probability, increases in between-cohort variation improved coverage,
resulting in conservative tests with type 1 error less than 0.05 for CSR values greater than 0.3 in
the high overlap condition. At low overlap we still observed increased coverage at higher CSRs;
however, there were initial increases in type 1 error (i.e. less coverage) for CSRs below 0.5 in the
3-cohort design. When a high GCR was used, this pattern for the 3-cohort low overlap design
disappeared and all designs moved more quickly towards higher coverage and less type 1 error.
The coverage probability is largely unaffected by the use of the constant effect size and
tends towards a value of 0.96 as the CSRs increase, indicating decent type 1 error control.
The misspecified model resulted in very poor coverage, particularly for the 3-cohort
design, with coverage values as low as 0.55 indicating a very high likelihood of making a type 1
error.
7.4.4 Bias in estimating between-cohort variance
Bias in estimating the amount of between-cohort variability was high until a CSP of
~250% suggesting that large cohort differences are needed in order to accurately estimate the
amount of between-cohort variation. Increases to the GCR improved the bias, however values
were still large for CSRs < 0.5. Use of a constant effect size results in a 2-4 fold increase in bias
of the variance parameter, suggesting that if cohorts are to be simulated using a constant effect
size, much larger between-cohort differences will be needed.
7.4.5 Estimator efficiency of the between-cohort variance
As the CSR increased, the efficiency of the variance parameter decreased almost linearly.
Increases to the GCR yielded only minimal efficiency gains, generally a 1.03- to 1.2-fold
improvement. Use of the constant effect size resulted in a 4-5 fold decrease in efficiency.
Chapter 8
Age trajectories of marijuana and cigarette use in the National Longitudinal Survey of Youth
(NLSY) using the ALD mixed model
The previous chapters have demonstrated the utility of the accelerated designs using
simulated data. In this chapter we will explore the use of the accelerated design using real data
from a nationally representative survey to demonstrate 1) how the modeling of cohort differences
using the ALD mixed model improves estimation; 2) how cohort identification can impact
interpretation; and 3) how incorporating period effects and cohort contextual variables can alter
trajectory estimation.
8.1 Introduction
The study of age-related development is of primary concern across fields in the social and
behavioral sciences. Understanding when behaviors or traits emerge and subside across the life-
span is important for the development of programs designed to screen for, prevent, or intervene
on targeted outcomes. In psychology, age-related development has been used to study changes in
personality (Baltes & Nesselroade, 1972), intellect (Schaie, Willis, & Pennak, 2005), and
substance use (O’Malley, Bachman, & Johnston, 1984), among others. Schaie (1965) proposed a
general developmental model which conceptualized developmental models based on the
identification of age, time period, and cohort effects. As Schaie noted, all developmental models
are special cases of the age-period-cohort design whereby researchers attempt to disentangle the
influences of age (change due to maturation), period (change due to measurement at a particular
time), and cohort (change due to generational differences). While longitudinal designs are the
gold standard among researchers studying within-subject maturational changes, these designs are
often conducted using a single cohort followed for a fixed period of time. These single-cohort
longitudinal designs ignore the influence of cohorts and confound the effects of age and period.
Alternately researchers have generated age-based trajectories using cross-sectional designs
which ignore period and confound the effects of age and cohort. Bell (1953) proposed the use of
an Accelerated Longitudinal Design (ALD) as a means to generate age-based trajectories over a
shortened duration which would address both within-subject maturational changes and between-
subject cohort differences. These designs have in the past gone under the moniker of cross-
sequential (Farrington, 1991) or cohort-sequential (Baltes & Nesselroade, 1979). In the ALD
multiple birth-cohorts are studied simultaneously in a longitudinal fashion with overlap in the
age distributions between the cohorts. In this manner the same age span as in a traditional
longitudinal design may be studied while reducing the number of measurements per participant,
the overall study duration, and study costs. These designs also allow for the modeling of
between-cohort differences, which are important for researchers interested in developing age-
based trajectories that generalize to multiple cohorts. For outcomes such as substance use which
have historically been shown to be sensitive to generational changes (O'Malley, Bachman, &
Johnston,1984) the failure to account for these differences can impact the fit of the
developmental trajectories as well as limit the ability to understand how developmental trends
change across generations. In this paper we will examine how cohort influences can impact
within-subject age-related trajectories of tobacco and marijuana use in an ALD.
Over the past 50 years, concentrated efforts in tobacco control have resulted in the lowest
levels of cigarette use (17.8%) in the United States since the government began tracking such use
(http://www.cdc.gov/media/releases/2014/p1126-adult-smoking.html). At the same time, there
has been a relative increase in marijuana use as the social and legal landscape surrounding
marijuana has shifted towards greater acceptance of marijuana use (Grucza, Agrawal, Krauss,
Cavazos-Rehg, & Bierut, 2016; Schauer, Berg, Kegler, Donovan, & Windle, 2015; Hasin et al.,
2015). Among youth, the decline in cigarette use and increase in marijuana use is particularly
pronounced, with daily marijuana use among college students at an all-time high (Johnston,
O’Malley, Bachman, Schulenberg, & Miech, 2016). The decline in tobacco use has been credited
to increased education and public policy initiatives while increased marijuana use has been
attributed to shifting societal beliefs on the danger marijuana poses (Johnston et al., 2016).
Despite this, few studies of the US population have examined either substance with respect to
generational differences in use. A study by Keyes, Schulenberg, O'Malley, et al. (2011) showed
that for marijuana use, cohort-specific social norms influence an individual's use of marijuana
regardless of their own attitudes towards marijuana. Moreover, Miech, Johnston, O'Malley, et al.
(2015) showed a California-specific increase in adolescent marijuana use which trended with
greater acceptance following decriminalization in 2010. Research on
marijuana use in adults has found that recent increases in marijuana use were largely driven by
period effects suggesting that cultural shifts in marijuana use occurred equally across all
generations (Miech & Koester, 2012; Kerr, Lui, & Ye, 2018). This conflicts with estimates of
adolescent marijuana use which have instead shown cohort effects (Keyes, Schulenberg,
O'Malley, et al., 2011) with younger cohorts showing greater marijuana use. Tobacco research in
adolescents has also shown a cohort effect, with younger cohorts proving less likely to ever use
tobacco (Chen, Li, Unger, Liu, & Johnson, 2003). The lack of consensus on the role of period
and cohort effects in tobacco and marijuana use trends is partially due to a paucity of research on
these topics in the US population. Moreover, data from these studies come from cross-sectional
designs which further confound between-person differences with the effects of age, period, and
cohort. The aforementioned accelerated design can be used to disentangle these confounds;
however, while numerous studies have utilized accelerated designs in assessing marijuana and
tobacco age trends (Duncan, Tildesley, Duncan, & Hops, 1995; Duncan, Duncan, & Stryker,
2006; Bernat, Erickson, Widome, Perry, & Forster, 2008; Terry-McElrath & O'Malley, 2011;
Mathur, Erickson, Stigler, Forster, & Finnegan, 2013), few have addressed the role of cohorts in
influencing the age-based trajectories. The failure of these studies to model cohort differences
has likely induced bias into the global developmental trajectories (as shown in Chapter 7) and
moreover would result in attributing generational differences to the estimates of maturational
change. However, some studies have utilized the accelerated design to describe cohort
differences in substance use. Burns et al. (2017) showed significant cohort influences for both
tobacco and marijuana use (days using per week) for participants born between 1980 and 1991.
Using cohort fixed effects, older cohorts showed greater marijuana and tobacco use and for
tobacco showed an age-by-cohort interaction indicating steeper age slopes for the younger
cohorts (i.e. earlier tobacco involvement). Jager, Schulenberg, O’Malley, & Bachman (2013)
who examined recent (birth-cohorts 1980-1986) as well as more distant cohorts (1958-1973)
found that there were historical (period effects) declines in marijuana use coupled with an age-
by-cohort interaction in recent birth cohorts indicating a faster growth rate for the youngest
cohorts (i.e. earlier marijuana use). Because there are so few studies examining the role of cohort
influences on substance use development, the goal of this paper is to familiarize the reader with
the analytic issues concerning cohort modeling, as well as to contribute to the body of
knowledge on cohort effects in substance use involvement using an ALD. To this
end, we use data from the National Longitudinal Survey of Youth, a nationally representative
multicohort longitudinal design from adolescence to early adulthood (ages 12-32), in order to
assess the role of cohort influences in the developmental trajectories of tobacco and marijuana
use. We examine how the modeling of cohorts improves model fit for these developmental
trajectories as well as discuss the considerations surrounding the modeling of cohort influences.
8.2 Methods
8.2.1 Participants
The National Longitudinal Survey of Youth (NLSY) 1997 cohort is an ongoing study
containing 8,984 adolescents assessed annually from 1997, with initial ages of measurement
ranging from 12 to 18 years (born 1980-1984). Though not typically thought of as an accelerated
design, the NLSY has all of the characteristics of an ALD with the presence of multiple birth
cohorts studied longitudinally in the same periods of time. The NLSY comprises two samples:
the original sample of 6,748 persons designed to be representative of the US population in 1997
and an oversample of 2,236 Hispanic or black Americans. For this paper we have restricted our
analyses to the original sample of 6,748 persons. The NLSY employs a complex survey design
with sample weighting and variance correction for participants collected within a given stratum
and primary sampling unit. For this paper we will not utilize the sampling weights or
the variance corrections. Ignoring these components was necessary in order to avoid the
complexities that arise when utilizing hierarchical groups (e.g. cohort) that are not nested within
the same primary sampling unit in a multilevel model. This will have the effect of limiting the
generalizability of these results to this specific sample rather than the population that the NLSY
intends to represent. Lastly, the data presented in this paper are limited to the first 15 rounds
covering the years 1997 through 2011 corresponding to an age span from 12 to 32 due to the
availability of the data at the time of aggregation in July 2017.
8.2.2 Measures
The NLSY measures are reported by the participant to an interviewer either in-person or
via telephone interview. For sensitive questions, such as those regarding drug use or criminal
history, participants recorded their own responses on a laptop computer when interviewed in-
person. The majority of interviews occurred in-person (97% in 1997; 88% in 2011).
Parent/caregiver interviews were also conducted in 1997.
Questions concerning marijuana use and cigarette smoking were asked pertaining to any
use in the past 30 days and any use in the past year, and are reported as binary (yes/no) variables.
Use in the past 30 days can be considered a proxy for habitual use while use in the past year is
likely more indicative of occasional use or experimentation. Questions regarding marijuana were
agnostic with regard to method of ingestion (i.e. joints, pipes/water pipe, vaping, edibles etc.)
whereas questions on tobacco use were explicitly concerned with cigarette smoking and did not
capture information on other forms of nicotine use. An additional question was asked with
regard to negative health perceptions of smoking, where respondents reported whether or not
they believed smoking a pack or more of cigarettes per day increased the risk of heart disease.
Demographic variables such as age, sex, and race were recorded. Information on
schooling such as: the current level of schooling, the highest level of schooling completed, the
number of grades skipped, the number of grades repeated, and parent reported emotional or
learning problems for the participant were also collected. The Armed Services Vocational
Aptitude Battery (ASVAB) was also administered as a general measure of cognitive ability in
1999 and is reported as a percentile score with higher scores indicating better performance.
Because of the known associations of antisocial behavior with substance use we
additionally examined if the participant had ever been arrested. A delinquency index score was
also assessed in years 1997-2000 based on the participant's reporting (yes/no) of 10 items
concerning delinquent/criminal acts such as gang membership, carrying a gun, theft, destruction
of property, selling drugs, and arrest history (prior to 2000). Delinquency scores ranged from 0 to
10 with higher numbers indicating having performed more delinquent acts. The maximum score
across the years was used in this analysis.
8.2.3 Methods of Analysis for ALDs
The linear dependency between the parameters of age, period, and cohort (i.e.
Age = Period − Cohort) has resulted in various solutions to the age-period-cohort identification
problem. Historically researchers have recommended the exclusion of one of these parameters
(usually period) (Baltes, 1968; Baltes & Nesselroade, 1970; Baltes, 1972) or, alternately, using all
3 parameters but setting an arbitrary constraint between two levels of one of the factors (e.g.
Period1=Period2; Age1=Age2, etc.) (Mason, Winsborough, & Poole, 1973) to allow for
estimation. Other approaches such as nonlinear transformation of one of the factors (Mason &
Fienberg, 1985) as well as the use of proxy variables (Smith, Mason, & Fienberg, 1982) have
been explored, though each carries drawbacks (Mason & Wolfinger, 2001). Yang and Land
(2006; 2008) as well as O'Brien, Hudson, and Stockard (2008) recognized the utility of a mixed
model approach to the previous underidentification problems in age-period-cohort modeling. Yang and
Land (2016) provide an extension of these models to the accelerated design which we will term
the Accelerated Longitudinal Design Mixed Model (ALDMM), which also accounts for the
modeling of within-person changes that are present in ALDs but not in typical age-period-cohort
designs. Specification of the ALDMM is provided in equation 1 below:
yijk = β0 + β1xijk + υ0j + υ0k + υ1jxijk + υ1kxijk + εijk     (eq. 1)

The outcome (yijk) for the ith observation, jth subject, and kth cohort is a function of the average
starting level (β0) and rate of change (β1) for age (xijk), which are fit as fixed effects. The random
error variance (εijk) is assumed to be Gaussian distributed. Between-cohort (υ0k) and between-
subject (υ0j) random variation in the intercept is modeled, as well as random slopes for the
between-cohort (υ1k) and between-subject (υ1j) variability in the rate of change. For this
specification we will assume that the within-subject effects (level 3) are nested within cohorts
(level 2). The ALD mixed model provides a flexible framework for analyzing ALDs and can be
extended to include fixed or random period effects as well as modified to include nonlinear terms
for the age effects. Moreover, the ALDMM can be extended for generalized linear mixed models
in order to allow the modeling of non-Gaussian outcomes such as the binary substance use
outcomes in this paper.
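As a concrete and deliberately simplified sketch, the Gaussian case of eq. 1 can be fit with statsmodels' MixedLM, using cohort as the grouping factor, a random cohort intercept, and subject intercepts nested within cohorts as a variance component. The random slopes of eq. 1 are omitted for brevity, the data are simulated, and this illustrates only the model structure, not the dissertation's actual analysis code:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulate 3 overlapping birth cohorts with random cohort and subject
# intercepts; the true fixed-effect age slope (beta_1) is 0.25.
rng = np.random.default_rng(3)
rows = []
for k, start_age in enumerate([12, 13, 14]):        # 3 cohorts, high overlap
    u0k = rng.normal(0, 0.2)                        # cohort intercept deviation
    for s in range(40):
        u0j = rng.normal(0, 0.5)                    # subject intercept deviation
        for age in range(start_age, start_age + 5):
            rows.append(dict(cohort=k, subject=s, age=age,
                             y=1.0 + 0.25 * age + u0k + u0j + rng.normal(0, 0.4)))
df = pd.DataFrame(rows)

model = smf.mixedlm(
    "y ~ age", df, groups="cohort",
    re_formula="1",                                 # random cohort intercept
    vc_formula={"subject": "0 + C(subject)"},       # subject intercepts within cohort
)
fit = model.fit()
slope = fit.fe_params["age"]                        # estimate of beta_1
```

With only a handful of cohorts the cohort-level variance is estimated from very few draws, so convergence warnings and boundary estimates are common in practice; this echoes the small-k caveats for the random effects specification discussed in the text.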
The modeling of cohort effects as random allows for the breaking of the linear
dependency between age, period, and cohort when both age and period are included as fixed
effects in the model. Alternately we could model period effects as random and cohort effects as
fixed. The choice of modeling cohort effects as fixed or random is one that should be aligned
with the theory behind the model generation and is explained in Yang & Land (2008). When
cohort effects are specified as fixed, it is assumed that the indicator variables for cohorts explain
all of the between-cohort variability. As a result, if the goal of the study is to also test for the
effects of variables known to vary between-cohorts, then the use of fixed cohort effects may not
be appropriate. Additionally, this negates the ability to include contextual variables that explain
cohort differences which may result in underestimation of the standard errors. In instances where
there are many cohorts, the inclusion of cohort fixed effects will expend greater degrees of
freedom which will hamper statistical significance versus the modeling of a single parameter
when specifying the effects as random. However, the inclusion of fixed cohort effects facilitates
testing of age-by-cohort interactions where marginal values at a specific age can be tested
between cohorts when there is overlap in the age distributions between the cohorts. For the
random effects specification, it is assumed that the cohort effects result as a sample from a
population and that they are uncorrelated with the fixed effects. In instances where there are a
small number of cohorts (k <10) the ability to test for either of these assumptions may be limited.
Despite this, Snijders & Bosker (1999) (as reiterated in Yang & Land 2008) found that with
sufficient sample sizes in each cohort (nk > 100) the choice between using a fixed or random
effects specification does not differ with regard to model fit. The use of random effects
specification also allows for testing of the age-by-cohort interaction through likelihood ratio tests
of the random coefficient for age (υ1k) when using maximum likelihood estimation. Though use
of cohort random effects precludes the testing of marginal estimates for age-specific between-
cohort differences using traditional Wald tests in the age-by-cohort interaction, this testing can
be accomplished through use of nonparametric bootstrapping. Lastly, though the incorporation of
period or cohort as random effects solves the aforementioned model identification problems, the
mixed model approach (or any approach) does not necessarily result in unbiased estimates of
these effects. Thus, for the researcher utilizing these models, it should be acknowledged that the
choice of which variables to model as random effects will impact the estimates of the fixed
effects.
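The practical difference between the two specifications can be illustrated with the empirical-Bayes shrinkage weight that a random-effects model applies to each cohort's deviation from the grand mean. The following numpy sketch uses entirely hypothetical variance components and cohort sizes; it illustrates the shrinkage principle only and is not the model fit in this chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 5 cohorts whose true effects are drawn from N(0, tau^2),
# with n_k subjects per cohort and residual variance sigma^2. All values are
# illustrative, not estimates from the NLSY models.
tau2, sigma2, n_k = 0.25, 1.0, 50
true_effects = rng.normal(0.0, np.sqrt(tau2), size=5)
cohort_means = np.array([rng.normal(mu, np.sqrt(sigma2), n_k).mean()
                         for mu in true_effects])
grand_mean = cohort_means.mean()

# Fixed-effects specification: each cohort gets its own unpooled deviation.
fixed_est = cohort_means - grand_mean

# Random-effects specification: deviations are shrunk toward zero (the
# population mean) by the empirical-Bayes weight tau^2 / (tau^2 + sigma^2/n_k).
shrink = tau2 / (tau2 + sigma2 / n_k)
random_est = shrink * fixed_est

print("unpooled:", np.round(fixed_est, 3))
print("shrunken:", np.round(random_est, 3))
```

As the within-cohort sample size n_k grows, the weight approaches 1 and the two specifications converge, consistent with the Snijders & Bosker (1999) result cited above.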
8.2.5 Analytic Strategy
In modeling cohort differences, the choice of cohort can have implications for model fit
as well as the interpretation of cohort differences. While traditionally cohorts are thought of as
birth-cohorts, they can also represent any grouping where individuals share a common set of
experiences or exposures that are time dependent and believed to make them homogeneous on an outcome of interest. In this paper we will examine cohort membership in two ways: 1) using
traditional birth cohorts and 2) using grade in school at study entry. Though there is age variation
within a given grade, it may be reasonable to assume that those belonging to the same grade
share a common set of experiences and peers which make them more alike in terms of substance
use.
Between cohort differences on demographic and antisocial behavior variables were
assessed using chi-square tests for categorical variables and analysis of variance for continuous
variables. For modeling of the developmental trajectories, we first observed the functional form
for the age-substance use relationships using loess plots. We then tested a series of nested mixed
models with linear, quadratic, cubic, and quartic functional forms for the fixed effect age
association. Once the functional form was identified for the age-outcome association we then
tested for the presence of cohort-effects by comparing models with and without cohort random
intercepts and slopes based on equation 1 above using a likelihood ratio test. In these models
with polynomial age relationships, only the linear slope for age was fitted as a random
coefficient nested within cohort and subject random intercepts. Independent and unstructured
random effect covariance structures were additionally tested using likelihood ratio tests. We
additionally note the consequences for misspecification (i.e. excluding cohort differences) based
on the difference of the fixed effect age parameter estimates between models that
include/exclude random cohort effects as well as compare fit indices such as the Bayesian
Information Criteria (BIC). Using the fixed and random effects from these models we describe
the age and proportion at peak substance use for each cohort in tables. Lastly, we incorporate
models with fixed period effects and contextual variables that explain between-cohort
differences to draw inferences about historical trends and demonstrate how adding contextual
variables can improve model fit.
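The nested polynomial comparisons described above can be sketched as follows. The data-generating coefficients here are invented, and a Gaussian likelihood is used as a simplification; the models actually fit to the binary substance use outcomes would use a logistic link with random effects:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Invented data: a quadratic age trend plus Gaussian noise (coefficients are
# purely illustrative).
age = rng.uniform(12, 32, size=600)
y = 0.5 + 0.30 * age - 0.007 * age**2 + rng.normal(0, 0.5, size=age.size)

def gaussian_loglik(age, y, degree):
    """OLS fit of a degree-d polynomial in age; Gaussian log-likelihood at the MLE."""
    X = np.vander(age, degree + 1, increasing=True)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / y.size          # MLE of the residual variance
    return -0.5 * y.size * (np.log(2 * np.pi * s2) + 1.0)

# Likelihood ratio tests comparing each polynomial to the next higher order;
# each test has one degree of freedom (one added fixed-effect coefficient).
for d in (1, 2, 3):
    lrt = 2 * (gaussian_loglik(age, y, d + 1) - gaussian_loglik(age, y, d))
    print(f"degree {d} vs {d + 1}: LRT = {lrt:.2f}, p = {stats.chi2.sf(lrt, df=1):.4f}")
```

With data generated from a quadratic trend, the linear-versus-quadratic test should be highly significant while higher-order terms add little, mirroring the stopping rule used in the chapter.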
8.3 Results
8.3.1 Derivation of Sample Size
Of the 6,747 persons who were part of the original sample, 77 participants were eliminated from the
analysis for not having information on their school grade cohort membership (N=3) or
alternately belonging to a school grade cohort with too few members for meaningful estimation
(N=74) (e.g. grades 3-5 and 13-14). Of the remaining 6,670 participants only 1 provided no
information on either marijuana or cigarette use at any time point resulting in an analysis sample
size of 6,669 persons.
8.3.2 Sample demographics and between-cohort demographic differences
The sample demographics are described in Table 1. The sample was balanced on sex (51% male) and predominantly Non-Black/Non-Hispanic (69%), the majority of whom (95%) were white participants. Average age at the first measurement in 1997 was ~15 years with equal
distribution across the 5 birth cohorts from 1980 through 1984 (~20% each). When examining
cohort membership as a function of school grade, the majority were in 7th through 10th grade at the time of the first measurement.
Between school grade cohort differences for demographic and antisocial behavior
variables are shown in Table 2. Most notably we find that the older cohorts (i.e. 12th grade) are more likely to be male, go on to have a greater amount of schooling, are more likely to have skipped a grade, and score higher on the ASVAB test. Conversely, those in the youngest grade cohort (6th grade) were more likely to have a learning disability or emotional problems, to repeat a grade, and to get arrested, and they scored higher on the delinquency scale. A large component of these differences is due to how we created the school grade cohorts. Participants in the NLSY would typically be in 7th through 11th grade; thus those in 6th grade at study start are more likely to be students who were held back, while those in the 12th grade are students who were more advanced for their age. Each of these, as we shall see, can have consequences for our substance use outcomes.
Between birth cohort differences for demographic and antisocial behavior variables are
shown in Table 3. In comparison to our school grade cohorts we find fewer between-cohort
differences. This is because within each of these cohorts we would expect participants to be a random
sample as there are no external factors that would influence cohort membership in this case.
Interestingly we do note that the younger cohorts (e.g. 1984) were less likely to repeat a grade
and scored lower on delinquency, though this did not result in a lower probability of arrest.
8.3.3 Model development using cohort random effects
We initially explored the functional form for the age-substance use associations by using
loess plots of the outcomes. In Figure 1 we find the plots for cigarette (panels A & B) and
marijuana (panels C & D) use in the past year separated by school grade cohort (panels A & C)
and birth cohort (panels B & D) membership. Figure 2 shows a similar plot for past 30-day use.
A dashed line is used to represent the overall trajectory irrespective of cohort membership. It
should be noted that these loess plots are not modeling within-subject changes as there is some
measure of missing data. In supplementary Table S1 we find the average number of observations
per subject and minimum/maximum number of observations for the substance use outcomes by
cohort membership. In these plots we notice that a linear model is not likely to fit the data well; thus, for model fitting we explored the use of quadratic, cubic, and quartic age functions.
The ALD mixed models were fit using maximum likelihood estimation with a series of
nested models based on testing for model fit using likelihood ratio tests. After identifying the
functional form for age, a cohort random intercept and then a random coefficient for linear age
slope were added to the model. Random effects residual structures with (unstructured) and
without (independent) covariance were also tested using likelihood ratio tests. In these models,
no covariates or fixed period effects were included. In Table 4 we display the functional form for
age, the cohort random effects structure, model fit indices, and likelihood ratio test for
comparison to a model that excludes random cohort effects and only includes a random intercept
for person and random linear coefficient for age with unstructured covariance. For past year
substance use cubic (cigarettes) and quartic (marijuana) models of age were employed. For both,
significant age-by-cohort interactions were found; thus, all of the models included a random coefficient for age. Past year cigarette use showed greater differences from the model without
cohort differences than marijuana use based on the likelihood ratio test which suggests that the
cohort effects are stronger for past year cigarette use. Additionally, the models using the school
grade cohort show worse fit than those using the birth cohort information based on BIC. For past 30-day substance use, a quartic model for age was found to fit best; however, there was little evidence of cohort effects, with the exception of weak evidence for birth cohort level differences in cigarette use.
8.3.4 Age trends for substance use using cohort random effects
Figures 3 and 4 show past year and past 30-day use for each cohort (where applicable)
based on the models. Included in these figures are also the global trajectory which represents the
fixed effects estimates from the cohort model (dashed black line) as well as the marginal
estimates for a model without cohort differences incorporated (solid grey line). Additionally,
Table 5 details the peak use and age at peak use for marijuana and tobacco by cohort
membership which were solved for based on the first derivative of the mixed model equations.
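Solving for peak use from the first derivative can be sketched with numpy's polynomial tools. The cubic coefficients below are hypothetical values on the logit scale, chosen only so the curve peaks near age 21; they are not the fitted NLSY estimates:

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical fixed-effect coefficients (intercept, age, age^2, age^3) for a
# cubic trajectory of logit(P(use)) as a function of age; illustrative only.
traj = Polynomial([-17.319, 2.079, -0.081, 0.001])

d1, d2 = traj.deriv(1), traj.deriv(2)
crit = d1.roots()
crit = crit[np.isreal(crit)].real

# Keep critical points inside the observed age span (12-32) and require a
# negative second derivative so we pick a maximum, not a minimum.
peak_age = [a for a in crit if 12 <= a <= 32 and d2(a) < 0][0]
peak_prob = 1.0 / (1.0 + np.exp(-traj(peak_age)))    # inverse-logit of the peak

print(f"peak at age {peak_age:.1f}, predicted prevalence {peak_prob:.2f}")
```

The same approach applies to the quartic models, where the derivative is a cubic and may yield up to three real critical points to screen.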
Cigarette Use, Past Year. In Figure 3, panel A past year cigarette use is plotted as a
function of cubic age for each of the school grade cohorts. Peak use was greatest in the oldest
cohorts (i.e. 11th & 12th grade) and ranged from 45-49 percent across the cohorts (Table 5). Moreover, the older cohorts reached peak use at younger ages (12th grade, age=19) while the younger cohorts delayed cigarette use (6th grade, age=24). Though the younger cohorts showed a delay in cigarette use as well as lower levels of peak use, they persist with higher smoking rates in early adulthood, with an 8% difference between the 6th and 12th grade cohorts
by the age of 30. Part of this persistence of higher smoking rates into older age may have to do
with the characteristics of the younger cohort, such that they were more likely to show signs of delinquency. In panel B we observe the same past year cigarette use data, but now using the
birth cohorts. Based on the peak use data (Table 5) a similar trend is noticed where the older
cohorts show higher peak use (~52%) and younger ages at peak use (1980, age=19.5). Between-
cohort differences are also greater when using the birth cohorts, particularly at younger ages. For
example, at age 16 there is a 15% difference in the past year smoking rates between the youngest
and oldest cohorts. These differences attenuate in early adulthood with no greater than a 4-5%
difference in cigarette use between the cohorts after age 25. For both the school grade and birth
cohorts the global trajectory (dashed line) which represents the marginal estimates from the fixed
effects alone is roughly the same, though fit indices in Table 4 indicate the model using birth cohorts provides a better fit. Based on this global trajectory, from the age of 12 approximately 30% of participants had smoked a cigarette in the past year, with a rise in these rates peaking at 47% at
age 21 and then decreasing as the participants aged. By the end of the age span under study here
(age 32) past year cigarette use had dropped to 36%. Also plotted is the trajectory that would be
estimated if cohort effects were ignored in the model (solid grey line). If these models had been
misspecified and excluded cohort effects, the global trajectory would have indicated a peak use
of 48.5% at age 23 with a much more gradual decrease at older ages. Indeed, at the older ages the
discrepancies between the two models are greater than 5% in terms of their marginal predictions.
Marijuana Use, Past Year. In Figure 3, panel C past year marijuana use is plotted as a
function of quartic age for each of the school grade cohorts. Though peak use was on-average
greater in the oldest cohorts (i.e. 9th-12th grades), ranging from 28-31 percent across these cohorts,
the between-cohort variability was slight with even the younger cohorts showing usage as high
as 26-27 percent (Table 5). The older cohorts generally reached peak use at a younger age (~18),
however, the ages at peak use were within one year of each other across the cohorts, indicating only slight differences in the age distributions for marijuana use. In early
adulthood marijuana use decreased for all of the cohorts and most of the cohorts showed similar
levels of use. In panel D we observe the past year marijuana use data for the birth cohorts. A
more pronounced trend for the cohorts is noticed with respect to the age at peak use and amount
such that the oldest cohort used marijuana earlier (1 year) and with a greater proportion of people
using it (27% vs 31%). Between-cohort differences were also greater when using the birth
cohorts, particularly during late adolescence and early adulthood. While the oldest cohorts
showed the greatest marijuana use during late adolescence, in early adulthood the youngest
cohorts showed the highest levels of use with ~21% usage in the 1984 cohort at age 28 compared
to 16% in the 1980 cohort. For both the school grade and birth cohorts the global trajectory
(dashed line) showed low levels of use prior to age 14 (<15%) with sharp increases in marijuana
use peaking at around 18 (28%) and then more gradually declining with advancing age until
reaching an equilibrium at around 18% after age 26. Also plotted is the trajectory that would be
estimated if cohort effects were ignored in the model (solid grey line). This estimate did not diverge from the global trajectory that incorporated the cohort effects as greatly as it did when modeling cigarette usage, providing a close approximation to the global trajectory.
Comparing the past year cigarette and marijuana use data, cigarette use occurred at greater levels than marijuana use. However, peak marijuana use occurred at younger
ages than did cigarette smoking. Despite this, past year smoking was much higher at younger
ages suggesting that tobacco initiation is occurring prior to age 12 for at least 20% of the sample.
In contrast, marijuana initiation appears to be very much an adolescent onset phenomenon.
Cigarette Use, Past 30-Days. In Figure 4, panels A and B past 30-day cigarette use was
plotted as a function of quartic age. There were no between-cohort differences detected when
using school grade cohorts (panel A; Table 4). When modeling birth cohorts (panel B), a
significant random intercept for cohort was modeled (Table 4). Age at peak past 30-day cigarette
use was ~22 years across all birth cohorts as a result of the absence of an age-by-cohort
interaction. Peak use increased across the cohorts with the youngest cohort having 39.9% use and
the oldest cohort 43% past 30-day use. The global trajectory showed low habitual use prior to
age 14 (<15%) increasing until peak use at 22 (43%) followed by a gradual decrease to
equilibrium (32%) at around age 30. The model that did not include cohort effects provided a
close approximation to the cohort random intercept model depicted in Figure 4, panel B.
Marijuana Use, Past 30-Days. In Figure 4, panels C and D past 30-day marijuana use
was plotted as a function of quartic age. There were no between-cohort differences detected
when using either school grade cohorts (panel C) or birth cohorts (panel D; Table 4). The global
trajectory for past 30-day use showed low use prior to age 14 (<5%) with rapid increases until
peak use at around age 20 (~19%) followed by a decrease into early adulthood at around 12.5%
by age 29. Though an increase in use is modeled at ages 30 to 32, this is not necessarily a true
increase, as these data exist at the extreme of the age distribution and may exhibit high leverage
on the overall developmental trajectory.
8.3.5 Historical trends for substance use using period fixed effects
Historical period effects were incorporated into the previous random cohort models with
and without the covariates of sex and delinquency score which were shown to vary between
cohorts and were available for all participants. Period effects were modeled as categorical to
allow for the estimation of nonlinear period trends. An omnibus test for period effects was
conducted using a likelihood ratio test, the results of which can be found in Table 6. Overall,
models that included the covariate adjustment showed better fit based on BIC.
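The omnibus test for categorical period effects amounts to a likelihood ratio test with J - 1 degrees of freedom, where J is the number of survey periods. A simplified sketch with invented data and a Gaussian likelihood (again standing in for the logistic mixed models actually used):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Invented data: 15 annual periods (1997-2011), each with a small period-level
# shock, plus a linear age effect. All parameter values are illustrative.
n = 3000
periods = np.arange(1997, 2012)
period = rng.choice(periods, size=n)
age = rng.uniform(12, 32, size=n)
shock = dict(zip(periods, rng.normal(0, 0.15, periods.size)))
y = 0.2 + 0.05 * age + np.array([shock[p] for p in period]) + rng.normal(0, 0.5, n)

def fit_loglik(X, y):
    """OLS fit; Gaussian log-likelihood at the MLE."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    s2 = resid @ resid / y.size
    return -0.5 * y.size * (np.log(2 * np.pi * s2) + 1.0)

base = np.column_stack([np.ones(n), age])
# Categorical period effects: one dummy per period, referent to 1997.
dummies = np.column_stack([(period == p).astype(float) for p in periods[1:]])
full = np.column_stack([base, dummies])

# Omnibus likelihood ratio test for period with J - 1 degrees of freedom.
lrt = 2 * (fit_loglik(full, y) - fit_loglik(base, y))
p_val = stats.chi2.sf(lrt, df=periods.size - 1)
print(f"omnibus period LRT = {lrt:.1f} on {periods.size - 1} df, p = {p_val:.3g}")
```

Dummy coding with 1997 as the referent matches how the period coefficients are reported (as changes relative to 1997) in Figures 5 and 6.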
Cigarette Use, Past Year. For both the school grade and birth cohort random effect
models, a significant fixed period effect was found (Table 6). A plot of the change in the
probabilities of past year cigarette use referent to 1997 is found in Figure 5, panel A. These also represent the coefficients for period, converted to probabilities, referent to 1997. For the school grade cohort random effects model we find a sharp negative historical trend in past year usage in unadjusted models, indicating historical declines in cigarette smoking over the years. Conversely,
for the unadjusted birth-cohort random effects model we find that while smoking decreased from
1997 through 2001, afterward there was a slight increase in smoking rising back to 1997 levels
by 2011. The adjusted models provided the best fit based on BIC in Table 6 and the historical
trends for the school grade and birth-cohort models converged on a similar solution. In the
adjusted models, smoking remained below 1997 levels (by 15%), though it saw slight increases after 2005. The discrepancy between the unadjusted models highlights the importance of
incorporating contextual variables such as sex and delinquency while also examining model fit
statistics. Based on model fit statistics (i.e. BIC), the birth-cohort model provides a better fit for
the data.
Marijuana Use, Past Year. Significant period effects were also found for past year
marijuana use in both unadjusted and adjusted models for the school grade and birth cohort
models. In Figure 5, panel B a plot of the change in probabilities due to historical period effects
referent to 1997 is displayed. In the unadjusted models, both cohort models show nearly identical
period effects which indicate a gradual rise in marijuana use since 2004 but remain below 1997
levels in 2011. When adjusted for the sex and delinquency covariates, this trend increases,
especially for the school grade cohort model which shows a near linear increase of 3-4% per
year. In contrast the birth-cohort model shows a slighter increase of 1-2% per year after 2003.
Cigarette Use, Past 30-Days. As we recall from the section on cohort effects, when using
the school grade cohorts, no cohort effect was found while only a cohort random intercept was
found when using the birth cohorts. In modeling period effects, there was no significant effect of
historical period when modeling quartic age with person random effects; yet when modeled with
birth year random effects the historical period effect was significant (Table 6). In Figure 6, panel
A the period effects for the birth year cohort model are shown which indicate significant positive
historical effects of nearly 4% per year in the unadjusted and adjusted models. This discrepancy
in the finding of period effects when a random birth cohort effect is specified but not when it is
left out shows how the fixed effects of these mixed models are sensitive to the random effects
specifications.
Marijuana Use, Past 30-Days. For past 30-day marijuana use a model without any cohort
effects was selected as the best fitting from the previous section examining cohort effects. When
modeling fixed period effects, results indicated significant historical effects (Table 6) showing an
increase in marijuana use over time (Figure 6, panel B). Overall the period effects indicate a 4%
change from 1997 by 2001 and then a leveling of usage until around 2003 after which habitual
marijuana use increased at a rate of 1% per year. After adjusting for covariates these effects
increased to around 1.5% per year.
8.4 Discussion
In this study we demonstrated the use of a mixed model approach for assessing age
related development in accelerated longitudinal designs and showed how this modeling can be
used to identify age, period, and cohort effects in cigarette and marijuana use. In our study we
assessed changes in occasional (past year) and habitual (past 30-days) use from the ages of 12 to
32 in a nationally representative sample.
In particular we showed that incorporating cohort information is important for modeling
substance use outcomes as these have been shown to vary across temporal groups (O’Malley,
Bachman, & Johnston, 1984). We chose to model cohort effects in two ways using 1) school
grade and 2) birth cohorts. Our rationale was to show how choice of what comprises a cohort is
somewhat arbitrary and that this choice can have differential impact on the model fit as well as
the estimation of the fixed effects. In selecting school grade, we showed that while grouping
individuals based on their peer group membership may seem logical for studying substance use,
this type of cohort designation is confounded with nonrandom factors (e.g. delinquency) that
make it more likely for an individual to belong to a particular grade as well as be more
susceptible to substance use. This should serve as a caution to researchers looking to create
cohorts based on a common exposure or experience: if participants are able to select themselves into cohort membership, then the associations derived from the cohorts are likely
to be confounded by other factors. For most of the models, using birth cohorts provided better model fit; however, it should be apparent that birth cohort, though common in this type of modeling, is also an arbitrary distinction. For example, an individual born on December 31st, 1980 would be in a different birth cohort than an individual born on New Year's Day of 1981;
yet both are likely more similar to each other in terms of how they experience the social world
than they are to other members within their own birth cohorts. Indeed O’Brien (2015) discusses
the modeling of cohorts as a continuum in order to capture cohorts on a spectrum, however this
requires very large sample sizes. Instead we advocate for the investigation of contextual
variables that may explain cohort differences and their use as fixed effects to serve as surrogates
for within-cohort heterogeneity. Variables used as surrogates should be captured on a continuum
and have sufficient within-cohort variability to avoid complications of violating the assumptions
of independence between the random cohort effects and fixed effects. One potential marker of an
appropriate contextual variable would be the reduction in the cohort variance component as
suggested by O’Brien (2014).
In modeling the age trajectories, we found cubic age relationships for past year cigarette
use and quartic relationships for past year marijuana use and past 30-day cigarette and marijuana
use. This functional form for age was not based on theory but rather based on finding the best
fitting model. While we could have explored even higher order polynomials, in the absence of observed trends that could justify their use these would risk overfitting. Prior studies
using ALDs for substance use have modeled linear (Duncan, Duncan, & Stryker, 2006; Jager et
al., 2013; Mathur, Erickson, Stigler, Forster, & Finnegan, 2013) associations in adolescence
(<18), quadratic (Burns et al., 2017) relationships from adolescence to early adulthood, as well as
nonlinear models with categorical age (Duncan, Tildesley, Duncan, & Hops, 1995) in
adolescence. However, few studies with the exception of Burns et al. (2017) have examined
long-term age trends in substance use using within-subject data. In accordance with Burns et al.
(2017) and cross-sectional data from Monitoring the Future (Johnston et al., 2016) participants in
the NLSY show peak use for cigarettes and marijuana in their late teens and early 20’s.
However, unlike these studies, we found much higher levels for annual and 30-day prevalence of
cigarette use (47% & 41%) at these ages compared to an annual smoking prevalence of 35-38%
for these ages and corresponding time-periods in Monitoring the Future (MTF). This may reflect
differences in the sample characteristics, though both purport to be nationally representative.
Alternately the NLSY sample may be capturing individuals who were part of the increased
cohort trends in the early to mid 1990’s which saw rapid increases in youth smoking (Johnston et
al., 2016). For marijuana use, in the NLSY peak annual and 30-day prevalence occurred at
younger ages than for cigarettes (~18 years) and at levels (~28% annual; 19% 30-day) that were
below that of the MTF which ranged from 36-38% for annual prevalence and 21-24% for 30-day
prevalence. The findings that the levels of cigarette and marijuana use decrease into early adulthood and that the decrease in marijuana use is more pronounced are also consistent with Burns et al. and the MTF data, which show similar patterns.
In exploring cohort differences, we provided the reader with an understanding of a
methodology for analyzing accelerated designs; the ALD mixed model. We used this
methodology to test for cohort level differences and age-by-cohort interactions as well as utilized
the random effects to display the marginal cohort-specific estimates from these models. We
additionally provided an easily interpreted method for reporting cohort differences based on the
age and amount of substance used when at peak use. In the NLSY data we found significant
cohort differences in past year cigarette and marijuana use which showed that younger cohorts
were more likely to delay use and use at lower levels than older cohorts. However, with this delay and lower levels of use came a slowing of the rate of decline in early adulthood, leaving the younger cohorts with higher levels of use compared to the older cohorts. The persistence of these effects into early
adulthood for younger cohorts may be reflective of the ongoing societal changes where adult
responsibilities are delayed thereby allowing for greater substance use later in life (Jager et al.,
2013). This finding of cohort differences aligns with the findings from Burns et al. (2017) for
tobacco use, but not for marijuana use which in their data showed only level differences and not
differences in rate. However, Keyes, Schulenberg, & O’Malley (2011) also found a cohort effect
for past year marijuana use from cross-sectional data relating to the low levels of disapproval of
marijuana use. Chen et al. (2003) also found a cohort effect in cross-sectional data for tobacco
use which align with our data suggesting that younger cohorts are less likely to use tobacco in
adolescence. Unfortunately, Chen et al. was not able to examine these trends past the age of 17.
Though we found ample evidence to suggest cohort differences in annual prevalence,
when examining habitual use (past 30-days) we did not detect cohort effects with the exception
of level differences in past 30-day cigarette use between birth cohorts. The general absence of
cohort effects on habitual use suggests that while generational differences have resulted in
delayed cigarette and marijuana initiation and persistent experimentation later in adulthood, this
has not translated to changes in the number of habitual users.
Most importantly, we demonstrated that the failure to account for cohort effects in these
models results in bias in the estimates of the developmental trajectory. In our model of cigarette
use this resulted in estimates that were off by at least 5 percentage points in the global trajectory. Burns et al. (2017) also highlight the importance of formally testing for cohort differences and incorporating
them into the model estimation. The use of the global trajectory in the presence of cohort
differences in an ALD is controversial, as researchers in the past have suggested that models
should be stratified when differences are detected to allow for accurate fixed effects
interpretation (Glenn, 2005). We would instead suggest that the researcher should decide for
themselves when cohort differences are so great that an aggregated global trajectory is no longer
appropriate to describe the developmental pattern. Moreover, the fitting of random cohort effects
does not preclude the researcher from estimating cohort-specific age trajectories (as we did here)
and indeed these can be testing against each other within the context of the ALD mixed model
through bootstrapping.
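A nonparametric bootstrap of an age-specific between-cohort contrast can be sketched as follows. The subject-level predictions below are simulated stand-ins; in a real analysis each bootstrap resample would refit the ALD mixed model to the resampled subjects before recomputing the contrast:

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative subject-level predicted probabilities of use at a target age for
# two cohorts (all values invented for the sketch).
pred_1980 = rng.normal(0.42, 0.10, size=300)
pred_1984 = rng.normal(0.27, 0.10, size=300)

obs_diff = pred_1980.mean() - pred_1984.mean()

# Nonparametric bootstrap: resample subjects with replacement within each
# cohort and recompute the between-cohort difference at the target age.
boot = np.empty(2000)
for b in range(boot.size):
    s1 = rng.choice(pred_1980, size=pred_1980.size, replace=True)
    s2 = rng.choice(pred_1984, size=pred_1984.size, replace=True)
    boot[b] = s1.mean() - s2.mean()

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"cohort difference: {obs_diff:.3f} (95% percentile CI {lo:.3f}, {hi:.3f})")
```

A percentile interval excluding zero would indicate a cohort difference at that age, providing the marginal test that the random-effects specification does not permit through traditional Wald tests.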
When examining period effects, we showed positive effects on both past year and past
30-day cigarette and marijuana use which suggests more widespread social changes in these
behaviors that occur across cohorts. The period effects for marijuana use are consistent with the
cross-sectional findings from the MTF study (Johnston et al., 2016) as well as in Miech and Koester (2012) and Kerr, Lui, and Ye (2018). These are largely postulated to be in response to
the changing social norms and increased permissiveness surrounding marijuana use (Johnston et
al., 2016; Bachman, Johnston, & O'Malley, 1998; Miech & Koester, 2012). However, for
cigarette use, a positive secular trend is the opposite of what would be expected given increased
awareness of the harms of tobacco exposure and a seemingly more health-conscious society. For
annual cigarette use, the period trends, though significant, were flat across time and mostly
reflected decreases relative to 1997 levels after adjustment for covariates. This would align more
closely with the findings in the MTF which show a decreasing secular trend (Johnston et al.,
2016). For past 30-day use, we curiously found no period effect for the model without a cohort random effect and a strong positive period effect when modeled with a birth-cohort random
effect. The period effect was so strong that the numbers are not entirely believable, particularly
given what we know from other studies showing historical period declines in cigarette use.
Given the inconsistency of these findings with the known literature as well as seeming instability
in the detection of the effect, further exploration of period effects for habitual smoking is
warranted.
This unusual directionality for the period effect highlights one of the primary limitations of
the ALD mixed model. While it resolves the statistical identification problems of age-period-
cohort analysis, it only serves to resolve the linear dependence in the statistical model and does
not address the theoretical dependencies that will always exist in models attempting to estimate
unique effects for age, period, and cohort. This means that while these models can be estimated,
they provide no guarantee that they are creating unbiased estimates of true underlying
parameters. While this may make some researchers uncomfortable about using ALD mixed
models, one must recognize that the alternative solutions such as single-cohort longitudinal
designs or cross-sectional designs of development are only spared this problem in the sense that they ignore it, which is to say that these problems are inherent to all developmental models. One
safeguard is to build your models using underlying theory and background knowledge so that
you can distinguish between sensible and nonsensible results. Additionally, the modeling of age,
period, and cohort as combinations of fixed or crossed random effects can be used to find
solutions that provide the best fit for the data based on the information criteria.
In addition to the inherent limitations of these models, the current study attempts to make cohort inferences from a limited number of cohorts that are separated by only 5 years. While having cohorts in close time proximity likely improves the validity of
estimating a global trajectory for these models, it also hampers the ability to detect these
differences. Depending on the outcomes under study, it may not be reasonable to expect cohort
changes within such a short (5 year) time difference. For substance use, however, data from the
MTF has routinely shown cohort effects over this narrow range. The choice for modeling of
periods as fixed effects in these models also negates the potential differential impact of historical
change on participants of different ages (age-by-period interaction) which is not addressed in the
current specification of these models.
A further limitation is that these developmental trends in substance use were not analyzed
separately by sex, as much of the prior research suggests that men and women exhibit differential
vulnerabilities and patterns of substance involvement (e.g. Jager et al., 2013; Miech & Koester,
2012; Kerr, Lui, & Ye, 2018). While we included sex as a covariate to account for level
differences in cigarette and marijuana use, we did not stratify analyses by sex or alternately model
age-by-sex interactions. However, because the NLSY was balanced on sex, we argue that the
combined trajectory represents the average effects between the sexes and is not subject to
substantial bias as a result of sex specific influences.
Lastly, we did not take into account the sample weighting in this study which limits the
generalizability of these findings to the sample from which they are drawn. Despite this, the
underlying sample demographics are largely representative of the US population in terms of sex,
race, and geographic location and results closely resemble those of other large sample national
survey designs such as the MTF.
In summary, our study provides evidence for cohort effects in the annual prevalence of
cigarette and marijuana use, suggesting that younger cohorts are more likely to delay initiation
and use at lower levels in adolescence; however, this delay seems to result in prolonged use into
early adulthood.
8.5 Chapter References
1. Bachman, J. G., Johnston, L. D., & O'Malley, P. M. (1998). Explaining recent increases in
students' marijuana use: Impacts of perceived risks and disapproval, 1976 through
1996. American Journal of Public Health, 88(6), 887-892.
2. Baltes, P. B. (1968). Longitudinal and cross-sectional sequences in the study of age and
generation effects. Human development, 11(3), 145-171.
3. Baltes, P. B., & Nesselroade, J. R. (1970). Multivariate longitudinal and cross-sectional
sequences for analyzing ontogenetic and generational change: A methodological
note. Developmental Psychology, 2(2), 163.
4. Baltes, P. B., & Nesselroade, J. R. (1972). Cultural change and adolescent personality
development: An application of longitudinal sequences. Developmental Psychology, 7(3),
244.
5. Baltes, P. B., & Nesselroade, J. R. (Eds.). (1979). Longitudinal Research in the Study of
Behavior and Development. Academic Press.
6. Bell, R. Q. (1953). Convergence: An accelerated longitudinal approach. Child Development,
145-152.
7. Bernat, D. H., Erickson, D. J., Widome, R., Perry, C. L., & Forster, J. L. (2008). Adolescent
smoking trajectories: results from a population-based cohort study. Journal of Adolescent
Health, 43(4), 334-340.
8. Burns, A. R., Hussong, A. M., Solis, J. M., Curran, P. J., McGinley, J. S., Bauer, D. J., ... &
Zucker, R. A. (2017). Examining cohort effects in developmental trajectories of substance
use. International journal of behavioral development, 41(5), 621-631.
9. Chen, X., Li, G., Unger, J. B., Liu, X., & Johnson, C. A. (2003). Secular trends in adolescent
never smoking from 1990 to 1999 in California: an age–period–cohort analysis. American
Journal of Public Health, 93(12), 2099-2104.
10. Duncan, S. C., Duncan, T. E., & Strycker, L. A. (2006). Alcohol use from ages 9 to 16: A
cohort-sequential latent growth model. Drug & alcohol dependence, 81(1), 71-81.
11. Duncan, T. E., Tildesley, E., Duncan, S. C., & Hops, H. (1995). The consistency of family
and peer influences on the development of substance use in adolescence. Addiction, 90(12),
1647-1660.
12. Farrington, D. P. (1991). Longitudinal research strategies: Advantages, problems, and
prospects. Journal of the American Academy of Child & Adolescent Psychiatry, 30(3), 369-
374.
13. Glenn, N. D. (2005). Cohort analysis (Vol. 5). London: Sage.
14. Grucza, R. A., Agrawal, A., Krauss, M. J., Cavazos-Rehg, P. A., & Bierut, L. J. (2016).
Recent trends in the prevalence of marijuana use and associated disorders in the United
States. JAMA Psychiatry, 73(3), 300-301.
15. Hasin, D. S., Saha, T. D., Kerridge, B. T., Goldstein, R. B., Chou, S. P., Zhang, H., ... &
Huang, B. (2015). Prevalence of marijuana use disorders in the United States between 2001-
2002 and 2012-2013. JAMA Psychiatry, 72(12), 1235-1242.
16. Jager, J., Schulenberg, J. E., O'Malley, P. M., & Bachman, J. G. (2013). Historical variation
in drug use trajectories across the transition to adulthood: the trend toward lower intercepts
and steeper, ascending slopes. Development and psychopathology, 25(2), 527-543.
17. Johnston, L. D., O’Malley, P. M., Bachman, J. G., Schulenberg, J. E., & Miech, R. A. (2016).
Monitoring the Future national survey results on drug use, 1975-2015: Volume II, college
students and adults ages 19-55.
18. Kerr, W. C., Lui, C., & Ye, Y. (2018). Trends and age, period and cohort effects for
marijuana use prevalence in the 1984–2015 US National Alcohol Surveys. Addiction, 113(3),
473-481.
19. Keyes, K. M., Schulenberg, J. E., O'Malley, P. M., Johnston, L. D., Bachman, J. G., Li, G., &
Hasin, D. (2011). The social norms of birth cohorts and adolescent marijuana use in the
United States, 1976–2007. Addiction, 106(10), 1790-1800.
20. Mason, W. M., & Fienberg, S. E. (1985). Introduction: Beyond the identification problem.
In Cohort analysis in social research (pp. 1-8). Springer, New York, NY.
21. Mason, K. O., Mason, W. M., Winsborough, H. H., & Poole, W. K. (1973). Some
methodological issues in cohort analysis of archival data. American Sociological Review,
242-258.
22. Mason, W. M., & Wolfinger, N. H. (2001). Cohort analysis. In N. J. Smelser & P. B. Baltes
(Eds.), International Encyclopedia of Social and Behavioral Sciences, 1st Edition (pp. 1-24).
Oxford: Elsevier.
23. Mathur, C., Erickson, D. J., Stigler, M. H., Forster, J. L., & Finnegan Jr, J. R. (2013).
Individual and neighborhood socioeconomic status effects on adolescent smoking: a
multilevel cohort-sequential latent growth analysis. American journal of public
health, 103(3), 543-548.
24. Miech, R. A., Johnston, L., O’Malley, P. M., Bachman, J. G., Schulenberg, J., & Patrick, M.
E. (2015). Trends in use of marijuana and attitudes toward marijuana among youth before
and after decriminalization: The case of California 2007–2013. International Journal of Drug
Policy, 26(4), 336-344.
25. Miech, R., & Koester, S. (2012). Trends in US, past-year marijuana use from 1985 to 2009:
An age–period–cohort analysis. Drug & Alcohol Dependence, 124(3), 259-267.
26. O'Brien, R. M., Hudson, K., & Stockard, J. (2008). A mixed model estimation of age, period,
and cohort effects. Sociological methods & research, 36(3), 402-428.
27. O'Brien, R. (2014). Age-period-cohort models: Approaches and analyses with aggregate
data. CRC Press.
28. O'Malley, P. M., Bachman, J. G., & Johnston, L. D. (1984). Period, age, and cohort effects
on substance use among American youth, 1976-82. American Journal of Public
Health, 74(7), 682-688.
29. Schaie, K. W. (1965). A general model for the study of developmental
problems. Psychological Bulletin, 64(2), 92.
30. Schaie, K. W., Willis, S. L., & Pennak, S. (2005). An historical framework for cohort
differences in intelligence. Research in Human Development, 2(1-2), 43-67.
31. Schauer, G. L., Berg, C. J., Kegler, M. C., Donovan, D. M., & Windle, M. (2015). Assessing
the overlap between tobacco and marijuana: Trends in patterns of co-use of tobacco and
marijuana in adults from 2003–2012. Addictive behaviors, 49, 26-32.
32. Smith, H. L., Mason, W. M., & Fienberg, S. E. (1982). Estimable functions of age, period,
and cohort effects: more chimeras of the age-period-cohort accounting framework: comment
on Rodgers. American Sociological Review, 47(6), 787-793.
33. Snijders, T. & Bosker, R. (1999). Multilevel Analysis: An Introduction to Basic and
Advanced Multilevel Modeling. Thousand Oaks, CA: Sage.
34. Terry-McElrath, Y. M., & O'Malley, P. M. (2011). Substance use and exercise participation
among young adults: Parallel trajectories in a national cohort-sequential
study. Addiction, 106(10), 1855-1865.
35. Yang, Y., & Land, K. C. (2006). A mixed models approach to the age-period-cohort analysis
of repeated cross-section surveys, with an application to data on trends in verbal test
scores. Sociological Methodology, 36(1), 75-97.
36. Yang, Y., & Land, K. C. (2008). Age–period–cohort analysis of repeated cross-section
surveys: fixed or random effects? Sociological Methods & Research, 36(3), 297-326.
37. Yang, Y., & Land, K. C. (2016). Age-period-cohort analysis: New models, methods, and
empirical applications. Chapman and Hall/CRC.
8.6 Tables
Table 1: Demographic Characteristics of the Sample

Variable                              Values
Age at first visit, Mean ± SD         14.9 ± 1.4
Sex, Male %                           51%
Ethnicity
  Black, %                            16%
  Hispanic, %                         14%
  Mixed Race (Non-Hispanic), %        1%
  Non-Black/Non-Hispanic, %           69%
School Grade Cohort
  6th Grade, %                        8%
  7th Grade, %                        20%
  8th Grade, %                        21%
  9th Grade, %                        21%
  10th Grade, %                       18%
  11th Grade, %                       10%
  12th Grade, %                       2%
Birth Cohort
  1980, %                             19%
  1981, %                             21%
  1982, %                             21%
  1983, %                             21%
  1984, %                             19%
Table 2: Between School Grade Cohort Differences in Demographic, Schooling, and Antisocial Behavior Indicators

                                   6th Grade    7th Grade    8th Grade    9th Grade    10th Grade   11th Grade   12th Grade   P
Age at first visit, Mean ± SD      12.8 ± 0.6   13.4 ± 0.7   14.3 ± 0.7   15.3 ± 0.7   16.2 ± 0.6   16.9 ± 0.5   17.3 ± 0.4   -
Sex, Male %                        58.2%        52.5%        52.5%        51.7%        47.1%        47.0%        43.8%        <.001
Ethnicity
  Black, %                         17.4%        16.6%        16.0%        15.2%        14.4%        18.0%        14.3%        0.216
  Hispanic, %                      13.0%        14.8%        13.6%        13.3%        14.6%        9.9%         18.1%        0.216
  Mixed Race (Non-Hispanic), %     1.0%         1.5%         1.0%         1.3%         1.4%         0.7%         1.9%         0.216
  Non-Black/Non-Hispanic, %        68.7%        67.1%        69.4%        70.3%        69.6%        71.4%        65.7%        0.216
Learning Disability [1], %         15%          13%          12%          9%           10%          8%           4%           <.001
Years of Schooling [2], Mean ± SD  12.9 ± 3.2   13.4 ± 3.1   13.6 ± 3.0   13.6 ± 3.0   14.1 ± 2.7   14.6 ± 2.7   14.3 ± 2.6   <.001
Ever Repeat Grade [3], %           29.6%        20.1%        20.0%        20.0%        15.7%        5.0%         4.4%         <.001
Ever Skip Grade [3], %             3.8%         2.0%         1.9%         1.5%         3.7%         2.5%         10.0%        <.001
ASVAB Percentile [4], Mean ± SD    41.2 ± 29.3  48.5 ± 29.2  50.0 ± 28.7  50.6 ± 28.8  52.8 ± 28.1  58.5 ± 26.8  58.4 ± 25.3  <.001
Ever Arrested, %                   40.8%        35.7%        32.8%        36.5%        30.0%        23.8%        21.0%        <.001
Delinquency Score, Mean ± SD       1.81 ± 2.08  1.77 ± 2.00  1.79 ± 1.96  1.95 ± 2.17  1.86 ± 2.04  1.65 ± 1.94  1.25 ± 1.85  0.002

[1] N=5,916; [2] N=6,596; [3] N=4,641-4,660; [4] N=5,369
Table 3: Between Birth Cohort Differences in Demographic, Schooling, and Antisocial Behavior Indicators

                                   1980         1981         1982         1983         1984         P
Age at first visit, Mean ± SD      16.9 ± 0.4   15.9 ± 0.4   14.9 ± 0.4   13.9 ± 0.4   12.9 ± 0.4   -
Sex, Male %                        51.6%        48.6%        51.9%        51.3%        52.3%        0.324
Ethnicity
  Black, %                         17.9%        14.3%        16.2%        16.0%        15.4%        0.795
  Hispanic, %                      13.0%        13.9%        13.8%        13.6%        13.6%        0.795
  Mixed Race (Non-Hispanic), %     1.2%         1.1%         1.3%         1.0%         1.4%         0.795
  Non-Black/Non-Hispanic, %        67.9%        70.7%        68.7%        69.3%        69.6%        0.795
Learning Disability [1], %         11.0%        10.1%        9.9%         12.7%        9.8%         0.135
Years of Schooling [2], Mean ± SD  13.7 ± 2.9   13.7 ± 3.0   13.7 ± 3.1   13.7 ± 3.1   13.9 ± 2.8   0.423
Ever Repeat Grade [3], %           21.8%        17.3%        19.0%        16.9%        13.9%        <.001
Ever Skip Grade [3], %             3.1%         2.7%         2.2%         1.5%         2.7%         0.214
ASVAB Percentile [4], Mean ± SD    49.4 ± 29.1  50.5 ± 28.8  50.2 ± 28.9  50.2 ± 29.1  52.6 ± 28.1  0.130
Ever Arrested, %                   32.2%        33.7%        33.9%        32.1%        33.9%        0.739
Delinquency Score, Mean ± SD       2.00 ± 2.16  1.91 ± 2.09  1.92 ± 2.15  1.67 ± 1.93  1.56 ± 1.79  <.001

[1] N=5,916; [2] N=6,596; [3] N=4,641-4,660; [4] N=5,369
Table 4: ALD Mixed Model of Cohort Effects

Outcome | Cohort | Age Functional Form | Random Cohort Effects | Random Person Effects | BIC | ΔChi², df; P (comparison to no-cohort model)
Cigarettes, Past Year | School Grade | Age^3 | Cohort: Age, Cov(Un) | Person: Age, Cov(Un) | 64681.71 | 89.58, 3; <.001
Marijuana, Past Year | School Grade | Age^4 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 56670.67 | 13.28, 2; 0.001
Cigarettes, Past 30-Days | School Grade | Age^4 | NONE | Person: Age, Cov(Un) | 60097.84 | NA; NA
Marijuana [1], Past 30-Days | School Grade | Age^4 | NONE | Person: Age, Cov(Un) | 36650.48 | NA; NA
Cigarettes, Past Year | Birth Year | Age^3 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 64640.97 | 118.95, 2; <.001
Marijuana, Past Year | Birth Year | Age^4 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 56633.43 | 50.53, 2; <.001
Cigarettes, Past 30-Days | Birth Year | Age^4 | Cohort: | Person: Age, Cov(Un) | 60104.22 | 4.99, 1; 0.026
Marijuana [1], Past 30-Days | Birth Year | Age^4 | NONE | Person: Age, Cov(Un) | 36650.48 | NA; NA

[1] These are the same model due to the absence of cohort differences
Cohort: Age indicates a random intercept for cohort and a random coefficient for age
Ind = Independent Covariance; Un = Unstructured Covariance
Table 5: Substance use amounts and ages at peak use by cohorts

                       Cigarettes Past Year     Marijuana Past Year      Cigarettes Past 30-Days  Marijuana Past 30-Days
                       Peak Use  Age at Peak    Peak Use  Age at Peak    Peak Use  Age at Peak    Peak Use  Age at Peak
School Grade Cohort
  Global Trajectory    47.0%     21.00          28.2%     18.15          41.1%     22.25          19.3%     19.75
  6th Grade            46.0%     24.00          27.2%     18.35          -         -              -         -
  7th Grade            45.0%     22.75          27.4%     18.50          -         -              -         -
  8th Grade            45.0%     22.00          26.0%     18.40          -         -              -         -
  9th Grade            48.5%     20.25          29.4%     18.05          -         -              -         -
  10th Grade           48.5%     19.75          28.8%     17.90          -         -              -         -
  11th Grade           49.0%     19.25          30.8%     17.95          -         -              -         -
  12th Grade           49.0%     19.00          28.6%     18.25          -         -              -         -
Birth Cohort
  Global Trajectory    47.0%     21.0           28.2%     18.15          41.1%     22.25          19.3%     19.75
  1984                 43.0%     23.5           27.0%     18.70          39.9%     22.25          -         -
  1983                 43.5%     22.5           25.8%     18.55          40.1%     22.25          -         -
  1982                 46.0%     20.5           27.4%     18.10          40.3%     22.25          -         -
  1981                 50.0%     20.0           29.4%     17.85          42.3%     22.25          -         -
  1980                 52.0%     19.5           31.9%     17.65          42.7%     22.25          -         -
Table 6: ALD Mixed Model with Period Fixed Effects

Outcome | Cohort | Age Form | Random Cohort Effects [2] | Random Person Effects | Unadjusted: BIC; ΔChi², df; P | Covariate Adjusted [3]: BIC; ΔChi², df; P
Cigarettes, Past Year | School Grade | Age^3 | Cohort: Age, Cov(Un) | Person: Age, Cov(Un) | 64,262; 358, 14; <.001 | 63,250; 336, 14; <.001
Marijuana, Past Year | School Grade | Age^4 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 56,418; 177, 14; <.001 | 55,153; 198, 14; <.001
Cigarettes, Past 30-Days | School Grade | Age^4 | NONE | Person: Age, Cov(Un) | 60,241; 16, 14; 0.331 | 58,923; 16, 14; 0.316
Marijuana [1], Past 30-Days | School Grade | Age^4 | NONE | Person: Age, Cov(Un) | 36,717; 93, 14; <.001 | 35,440; 122, 14; <.001
Cigarettes, Past Year | Birth Year | Age^3 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 64,478; 301, 14; <.001 | 63,238; 308, 14; <.001
Marijuana, Past Year | Birth Year | Age^4 | Cohort: Age, Cov(Ind) | Person: Age, Cov(Un) | 56,653; 140, 14; <.001 | 55,182; 144, 14; <.001
Cigarettes, Past 30-Days | Birth Year | Age^4 | Cohort: | Person: Age, Cov(Un) | 60,228; 35, 14; 0.001 | 58,673; 34, 14; 0.002
Marijuana [1], Past 30-Days | Birth Year | Age^4 | NONE | Person: Age, Cov(Un) | 36,717; 93, 14; <.001 | 35,440; 122, 14; <.001

[1] These are the same model due to the absence of cohort differences
[2] Cohort: Age indicates a random intercept for cohort and a random coefficient for age; Ind = Independent Covariance; Un = Unstructured Covariance
[3] Sex and Delinquency score
Supplementary Table S1: Description of Number of Observations per Subject

Mean # of Obs [Min, Max]

                       Cigarette Use    Marijuana Use    Cigarette Use    Marijuana Use
                       Past Year        Past Year        Past 30-Days     Past 30-Days
School Grade Cohort
  6th Grade            13.8 [2, 15]     14.1 [2, 15]     13.7 [2, 15]     14.0 [2, 15]
  7th Grade            13.8 [3, 15]     14.0 [3, 15]     13.7 [3, 15]     14.0 [3, 15]
  8th Grade            13.8 [1, 15]     14.0 [1, 15]     13.7 [1, 15]     14.0 [1, 15]
  9th Grade            13.3 [1, 15]     13.6 [1, 15]     13.2 [1, 15]     13.6 [1, 15]
  10th Grade           13.3 [1, 15]     13.7 [1, 15]     13.2 [1, 15]     13.7 [1, 15]
  11th Grade           13.2 [1, 15]     13.6 [1, 15]     13.2 [1, 15]     13.6 [1, 15]
  12th Grade           12.7 [1, 15]     13.2 [1, 15]     12.7 [1, 15]     13.2 [1, 15]
Birth Cohort
  1980                 13.1 [1, 15]     13.5 [1, 15]     13.0 [1, 15]     13.5 [1, 15]
  1981                 13.3 [1, 15]     13.7 [1, 15]     13.2 [1, 15]     13.7 [1, 15]
  1982                 13.6 [1, 15]     13.9 [1, 15]     13.5 [1, 15]     13.9 [1, 15]
  1983                 13.8 [1, 15]     14.0 [1, 15]     13.8 [1, 15]     14.0 [1, 15]
  1984                 13.9 [2, 15]     14.1 [2, 15]     13.8 [2, 15]     14.0 [2, 15]
8.7 Figures

Figure 1: Loess plots of past year cigarette and marijuana use by cohort membership
Figure 2: Loess plots of past 30-day cigarette and marijuana use by cohort membership
Figure 3: Predictive margins of past year cigarette and marijuana use by cohort membership
Figure 4: Predictive margins of past 30-day cigarette and marijuana use by cohort membership
Figure 5: Historical period effects of past year cigarette and marijuana use by cohort model
Figure 6: Historical period effects of past 30-day cigarette and marijuana use by cohort model
Chapter 9
General Discussion
In this dissertation we introduced the language and nomenclature by which we can
discuss the design of accelerated longitudinal designs (Chapter 2) and demonstrated their use in
linear (Chapters 3 & 5) and nonlinear (Chapters 4 & 5) models in the absence and presence of
between-cohort differences (Chapters 6 & 7) in small samples. We discussed how these designs
can be simulated and analyzed (Chapters 3, 4, & 6) as well as the consequences that varying
design choices have on power (Chapters 3, 4, & 6), bias, estimator efficiency, and coverage
probability (Chapters 5 & 7). Lastly, we demonstrated the analysis of an accelerated design using
real data and provided considerations regarding the analytic methods and the choice of cohort
membership (Chapter 8).
9.1 Chapter specific reflections, limitations, and future directions
9.1.1 Chapter 2: The design elements of the ALD
In Chapter 2 we introduced the language by which to discuss the design elements of the
accelerated longitudinal design (ALD) as well as proposed cost equations to allow for
comparison between single-cohort and accelerated designs. We recommended metrics of design
efficiency and overlap for comparing between designs with fixed costs and the incorporation of
cost savings when cost parameters were variable. Most importantly, we recognized the utility of
these cost equations to take advantage of the cost savings in the ALD through shortened duration
to afford a larger sample size in comparison to the single-cohort design (SCD).
The introduction of the language of the design elements for the ALD is an important
contribution as prior methodological work has largely ignored describing these elements.
Through this language we were able to illustrate how the design parameters can be used to
quantify the many metrics of an ALD such as length, overlap, coverage, and design efficiency.
These metrics, though alluded to and occasionally discussed by both Moerbeek (2011) and
Galbraith, Bowden, and Mander (2017), were never mathematically defined until now. While the
equations presented are limited to the special case of the ‘balanced’ ALDs presented here
(i.e. an equal number and frequency of measurements between cohorts),
they nonetheless serve to inform the most common ALD scenarios. Mathematically defining
these terms also facilitated the extension of the cost equation by Galbraith et al. (2017). Though
Galbraith utilized a cost model, they failed to define this model based on the ALD design
parameters, thereby limiting its utility. By recognizing that this model could be re-expressed in
terms of the design parameters we were able to develop a cost model that could be used to
compare costs between the single-cohort and accelerated designs. As we showed in Chapters 3
and 4, this allows for the ALD to offset power loss due to the use of multiple cohorts to cover the
age span, thereby allowing ALDs to be as powerful as a single-cohort design. Though Galbraith
et al. (2017) assumed some known absolute value for costs in their simulations, we recognized
that when comparing between an SCD and ALD these cost parameters could be expressed as
proportions of total costs for a budget category thereby simplifying their use for researchers
attempting to compare designs. Lastly, we provided equations for the incorporation of attrition
into these cost equations which will generally reduce the cost savings of the ALD relative to the
SCD and represent a more realistic scenario of actual study costs.
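To illustrate how such a cost comparison can be used to buy sample size, the sketch below uses a deliberately simplified cost model (a per-subject recruitment cost plus a per-observation measurement cost) and solves for the ALD sample size affordable at the single-cohort design's budget. The specific cost values and measurement counts are hypothetical, not the chapter's actual cost equations:

```python
# Illustrative cost comparison between a single-cohort design (SCD) and an
# ALD. The cost model (per-subject recruitment + per-observation measurement,
# in arbitrary units) is a simplification for demonstration purposes only.

def total_cost(n_subjects, n_measurements, c_recruit, c_measure):
    """Cost = recruiting each subject + the cost of every measurement."""
    return n_subjects * c_recruit + n_subjects * n_measurements * c_measure

# SCD: one cohort followed over the full age span (e.g., 15 measurements each)
scd_cost = total_cost(n_subjects=100, n_measurements=15,
                      c_recruit=1.0, c_measure=1.0)

# ALD: the same age-span coverage is split across cohorts, so each subject
# contributes fewer measurements (e.g., 5); find the equal-cost sample size
ald_n = 100
while total_cost(ald_n + 1, 5, 1.0, 1.0) <= scd_cost:
    ald_n += 1
print(scd_cost, ald_n)
```

Under these toy numbers the shortened per-subject duration lets the ALD afford roughly 2.5 times as many subjects for the same total budget, which is the mechanism by which the equal cost sample size offsets the ALD's power loss in later chapters.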
The assessments in this chapter had some limitations. Though we examined the role of
costs, we did not have a general idea of the actual proportions of a budget that the proposed cost
categories typically amount to in a longitudinal design. It is for this reason that many of the
simulations utilizing the equal cost sample size in subsequent chapters assume equal costs
between the budget categories, which may not be realistic. While we acknowledge that the
relative costs will certainly differ depending on the nature of the study (e.g. cost for a
questionnaire measurement vs psychophysiology measurement), having some sense of the cost
distribution for the budget categories would allow subsequent simulations to be better informed.
Moreover, this would facilitate the assessment of cost-effectiveness with regards to the cost per
power or cost per bias-reduction in future analyses.
9.1.2 Chapter 3: The power of linear ALDs
In Chapter 3 we introduced the methods for simulating and analyzing linear ALDs in the
absence of between-cohort differences. Our primary metric was statistical power and our results
showed that while the ALD will have less power than an SCD, the power losses can be offset by
using a smaller number of cohorts with maximal overlap between the age distributions. With use
of the equal cost sample size, the simulations showed that the ALD is as powerful as the SCD
making it a reasonable choice when the researcher believes the outcome is invariant to
generational changes.
The investigations in Chapter 3 were novel in that they examined the power in the
absence of between-cohort differences similar to Moerbeek (2011) but also explored these at
lower growth curve reliability (GCR) as well as in the presence of attrition. The investigations at
low GCR showed that the prior work of Moerbeek (2011), which indicated designs with a greater
number of cohorts would be more powerful for a fixed number of measurements, was specific to
the higher GCRs Moerbeek investigated. At the low GCR we showed that designs with a smaller
number of cohorts would be the most powerful when the total number of measurements was
fixed. Because the age span coverage was fixed in these designs this demonstrates the tradeoff
between adding more measurements versus adding more subjects, showing that at low reliability
adding more measurements is beneficial for power while at high reliability it is more beneficial
to have more subjects. For attrition, while Galbraith et al. (2017) had also examined attrition,
they did so assuming either a uniform distribution (i.e. Missing Completely at Random) or
attrition concentrated towards the beginning of the study. Given that longitudinal designs
typically experience dropout as a function of study duration (i.e. concentrated towards the end of
the study), our examinations focused on attrition as a function of study duration (i.e. Missing at
Random). This work showed that the ALDs had less power loss under attrition compared to the
SCD and that this loss could further be minimized by utilizing the equal cost sample size. This
work could be extended by examining situations where non-random but unmodeled attrition is
introduced (i.e. Missing Not at Random) to explore how these models perform under these
worst-case scenarios. Ultimately our findings indicated that the ALD could be as powerful as the SCD
when using the equal cost sample size.
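A stripped-down version of such a power simulation can be sketched as follows. It substitutes per-subject OLS slopes and a Z criterion for the full mixed-model fit, omits random intercepts for brevity, and its parameter values (effect size, variance components, dropout rate) are illustrative rather than those used in the chapter:

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_power(n_subjects=40, n_waves=5, beta=0.30, sd_slope=0.50,
                   sd_resid=1.0, dropout_per_wave=0.05, n_sims=500):
    """Monte Carlo power for the mean linear slope under attrition that
    grows with study duration. Per-subject OLS slopes and |z| > 1.96
    stand in for the mixed-model test; all values are illustrative."""
    t = np.arange(n_waves)
    hits = 0
    for _ in range(n_sims):
        slopes = []
        for _ in range(n_subjects):
            # Later waves are progressively more likely to be missing
            observed = rng.random(n_waves) > dropout_per_wave * t
            if observed.sum() < 2:
                continue  # a slope needs at least two observations
            ti = t[observed]
            yi = (beta + rng.normal(0.0, sd_slope)) * ti \
                + rng.normal(0.0, sd_resid, ti.size)
            slopes.append(np.polyfit(ti, yi, 1)[0])  # per-subject OLS slope
        slopes = np.asarray(slopes)
        z = slopes.mean() / (slopes.std(ddof=1) / np.sqrt(slopes.size))
        hits += abs(z) > 1.96
    return hits / n_sims

power = simulate_power()
print(power)
```

Raising `dropout_per_wave` concentrates missingness at later waves, which is the duration-dependent (MAR-style) attrition pattern discussed above; setting it to zero recovers the complete-data case for comparison.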
We focused on using a mixed model approach based on the work of Yang and Land
(2006; 2008) and O’Brien, Hudson, and Stockard (2008) to allow for analyzing age on a
continuum. Similar modeling approaches could be applied using binned age categories which
would facilitate using a structural equation modeling approach such as the latent growth curve
analyses of McArdle and Hamagami (1992). It is important to note that the mixed model
approach utilized Z-scores for evaluating the significance of the fixed effect in these analyses.
Given the small sample sizes, the use of the Z-score would result in statistical tests that were
anticonservative (i.e. permissive), meaning that the p-values were likely too small. The
evaluation of the test statistics using Z-scores was intentional, to avoid controversy over
the correct denominator degrees of freedom, particularly in the later chapters employing within-
cohort nesting. Nevertheless, if ALDs are to be employed using small samples, the correct
denominator degrees of freedom will need to be established, particularly when more complex
random effect structures are used. Some approaches to this may be the use of a Satterthwaite or
Kenward-Roger approximation, or alternately the use of parametric bootstrapping. Future
avenues of research can be based on the assessment of the most appropriate denominator degrees
of freedom in evaluating the fixed effects for a given random effects structure in the ALD. We
also assume homoscedastic normally distributed uncorrelated random errors in our simulations.
In practice, longitudinal designs may exhibit serial correlation unrelated to the random
effects, thus investigations using alternate specifications of the ALD with autocorrelated errors
should be examined in simulations. Additionally, many types of growth data show non-normal
and heteroscedastic error. The simulations presented here do not address these issues and indeed
the application of these models, which assume normally distributed homoscedastic residuals,
would need to be modified to accommodate different residual structures of the data. Lastly, we
specified zero correlation between the intercepts and slopes as a means of simplifying the
number of parameters to vary. However, there will often be a negative correlation between these
parameters in longitudinal designs such that those initially higher on the outcome will experience
less growth (and vice versa). The constraining of the covariance to zero would generally result in
a more conservative estimate for the fixed effect. Future simulations will need to examine the
performance of these models when there is correlation between the random intercepts and slopes.
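Incorporating such a negative intercept-slope correlation into future simulations is a small change to the random-effects covariance matrix. The sketch below uses illustrative values (SDs of 1.0 and 0.5, r = -0.4) rather than any parameterization from the chapter:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random-effects parameters: intercept SD 1.0, slope SD 0.5,
# and a negative intercept-slope correlation of -0.4
mean = [10.0, 0.30]                        # average intercept and slope
sd_int, sd_slope, r = 1.0, 0.5, -0.4
cov = np.array([[sd_int**2,             r * sd_int * sd_slope],
                [r * sd_int * sd_slope, sd_slope**2]])

# Draw correlated (intercept, slope) pairs for 5,000 simulated subjects
effects = rng.multivariate_normal(mean, cov, size=5000)
intercepts, slopes = effects[:, 0], effects[:, 1]

# Subjects starting higher grow less, as the chapter anticipates
r_hat = np.corrcoef(intercepts, slopes)[0, 1]
print(round(r_hat, 2))
```

Setting `r = 0.0` recovers the uncorrelated case used in the chapter's simulations, making the comparison between the two specifications a one-line change.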
9.1.3 Chapter 4: The power of nonlinear ALDs
In Chapter 4 we examined the ALD for a nonlinear model when between-cohort
differences were absent. In comparison to the linear models we found that larger sample sizes
were needed to achieve the same amount of power. Moreover, for the nonlinear growth specified
attrition was less impactful on power than in the linear model. Lastly, despite the use of an equal
cost sample size, none of the nonlinear ALDs were able to achieve the same power as the SCD.
Despite this, at higher GCRs of 0.7 or 0.9, the nonlinear ALD was shown to have sufficient
power (~80%) at the small sample sizes used.
These simulations were novel in that few, if any, studies have examined the performance
of the ALD using nonlinear models. The prior methodological work on ALDs from Raudenbush
& Chan (1992), Moerbeek (2011), and Galbraith et al. (2017) have only explored these designs
using linear growth. While McArdle and Hamagami (1992) examined nonlinear growth, this
work was restricted to two repeated measurements in non-overlapping age-distributions with no
consideration towards the design elements of the ALD. The extension of ALDs to nonlinear
models is important, particularly for research that covers the lifespan, as many developmental
phenomena show nonlinear patterns. Future work will be needed to extend the current
simulations to examine their application to other forms of nonlinear growth such as quadratic and
logistic growth models which are common in many disciplines.
A primary problem in fitting the nonlinear growth models was the computation time. The
mixed model estimations for these designs were five to ten times slower than for the linear
models and had much higher model convergence failure rates which resulted in only fitting these
models for a limited set of sample sizes (i.e. N’s of 40, 80, and 120). Part of this issue likely had to do
with the nature of the GCR, as the default of 0.5 proved to be too low. Indeed plotting of the
samples at this GCR generally showed an inability to visually distinguish the overall nonlinear
pattern. Another indication that the default GCR was too low was the fact that even at the high
GCR of 0.9, the designs with a smaller number of cohorts were still more powerful for a fixed
number of measurements, which in the linear model only occurred at small GCRs. Future
simulations should examine these nonlinear models using greater growth curve reliability to
ascertain whether this pattern reverses at a sufficiently high GCR.
Attrition was found to be less impactful in the nonlinear models compared to the linear
models. One of the main reasons was likely the relatively fast rate of approach. With
half of the developmental change occurring in the first 2 years, the earlier time points that were
less subject to attrition would be responsible for most of the pattern of change. This and the
overall lower power levels in the nonlinear models likely explain why these models were less
subject to power loss from attrition. Further analyses will be needed to explore the role of the
rate of change on power in the ALDs, as one might expect that slower rates of change (i.e. more
linear) would allow for the nonlinear models to show power levels similar to the SCD.
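The relationship between the rate of approach and the shape of the observed trajectory can be parameterized with a negative-exponential curve, one common choice for this kind of growth (the chapter's exact functional form is not restated here, so treat the values as illustrative). With rate k = ln(2)/2 per year, half of the total change occurs by year 2; a smaller k yields a slower, more nearly linear trajectory over the same window:

```python
import numpy as np

# Negative-exponential growth toward an asymptote (illustrative values).
asymptote = 1.0
k_fast, k_slow = np.log(2) / 2, np.log(2) / 10

t = np.linspace(0, 10, 101)
y_fast = asymptote * (1 - np.exp(-k_fast * t))   # fast rate of approach
y_slow = asymptote * (1 - np.exp(-k_slow * t))   # slow, near-linear approach

# Proportion of total developmental change completed by year 2 at each rate
by_year2_fast = 1 - np.exp(-k_fast * 2)
by_year2_slow = 1 - np.exp(-k_slow * 2)
print(round(by_year2_fast, 3), round(by_year2_slow, 3))
```

Under the fast rate, the early waves (those least affected by duration-dependent attrition) carry most of the change, which is the mechanism offered above for why attrition cost the nonlinear models less power.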
9.1.4 Chapter 5: Slope bias, efficiency, and coverage for linear and nonlinear ALDs
In Chapter 5 we examined the bias, estimator efficiency, and coverage probability for the
results from the linear and nonlinear models when there were no between-cohort differences.
This is the first study to evaluate these metrics for the linear ALD and one of first for the
nonlinear ALD (McArdle & Hamagami, 1992). For the linear and nonlinear models we showed
that the ALD tended to overestimate the slope and that the bias was reduced with more frequent
measurement. This reduction was substantial for the nonlinear models suggesting at minimum
annual measurements. This was not surprising for nonlinear development, as a sufficient number
of measurements would be needed to adequately capture curvature in the nonlinear model to
ensure bias was minimized. Our findings for a minimum of annual measurement roughly equate
to 5-7 measurements per person. For the linear model, ALD bias was minimal (within 5%)
suggesting that any of the ALDs would perform well in terms of bias. For the nonlinear model,
higher GCRs were needed to reduce the bias to within 7% of the SCD. Use of the equal cost
sample size reduced bias for both the linear and nonlinear models, and the linear ALDs showed
less bias than the SCD when using the equal cost sample. One aspect of note was that bias
generally stabilized for the ALDs after a sample size of 40 or when greater than 500
measurements were used. The estimator efficiency for the ALDs was always worse than the
SCD, particularly for the nonlinear model which was also found by McArdle & Hamagami
(1992). Despite the ability of the ALD to reduce the bias to levels comparable to the SCD, the
ALD always provided a more variable estimate of the slope parameter, even when the equal cost
sample size was used. This is to be expected, as the reduced number of
measurements per subject in the ALD decreases the precision of the slope. It is for this reason that
with a fixed coverage a larger number of cohorts in the ALD will decrease the efficiency through
loss of measurements per subject. The coverage probability was nominal for the linear model
and showed less coverage (i.e. higher Type I error) when using the nonlinear model, indicating
that higher GCR designs must be employed when utilizing nonlinear models. Future work will
examine the role of residual simulation in these designs; in particular how changes to the GCR
may exert nonlinear changes on these metrics as well as the consequences of heteroscedastic or
correlated errors on the estimation of the ALD slopes.
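The three metrics examined in this chapter are straightforward to compute once simulation replicates are in hand. The estimates below are synthetic stand-ins drawn with a small deliberate overestimation, not output from the actual ALD simulations:

```python
import numpy as np

rng = np.random.default_rng(1)

true_slope = 0.30
# Synthetic stand-ins for slope estimates across 2,000 simulation
# replicates (mean 0.31, i.e. a slight overestimation of the true slope)
est = rng.normal(loc=0.31, scale=0.05, size=2000)
se = np.full_like(est, 0.05)              # nominal standard errors

# Relative bias of the slope estimator, in percent
relative_bias = (est.mean() - true_slope) / true_slope * 100

# Estimator efficiency, summarized here as the sampling variance of
# the estimates (smaller is more efficient)
efficiency = est.var(ddof=1)

# Coverage probability of the nominal 95% confidence intervals
lo, hi = est - 1.96 * se, est + 1.96 * se
coverage = np.mean((lo <= true_slope) & (true_slope <= hi))

print(round(relative_bias, 1), round(efficiency, 4), round(coverage, 3))
```

Comparing `efficiency` between an SCD and an ALD run at the same total cost reproduces the chapter's finding that the ALD's slope estimate remains more variable even when its bias matches the SCD's.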
9.1.5 Chapter 6: Between-cohort differences and power
In Chapter 6 we introduced the methods and models for incorporating between-cohort
variability. We examined issues pertaining to the generalizability of the global trajectory as well
as the impact of between-cohort differences on the power to detect the fixed effect slope and
between-cohort variance. We additionally explored the consequences for power when simulating
cohort differences by assuming a constant effect size as well as when the ALD mixed model is
misspecified to exclude the modeling of cohort effects when they are present.
Galbraith et al. (2017) have thus far provided the most comprehensive methodological
study of accelerated designs and included designs with between-cohort differences. However,
they did not describe how the between-cohort differences were created, the amount of
difference incorporated, or the considerations necessary to make cross-design comparisons
valid when incorporating these differences. As a result, this is the first study to describe
how to simulate between-cohort differences and their potential impact on the study of statistical
power in ALDs.
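To make the simulation approach concrete, the sketch below shows one minimal way to inject between-cohort slope differences: each successive cohort's mean slope is shifted by a fixed amount before subject-level random slopes are drawn. This is an illustrative Python rendering of the general idea only; the dissertation's actual implementation is the Stata program aldsim, and all parameter values and names here are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2018)  # seed is arbitrary

def simulate_ald(n_cohorts=3, n_per_cohort=20, n_periods=5,
                 cohort_interval=2, base_slope=0.5,
                 cohort_slope_diff=0.1, subj_slope_sd=0.2, resid_sd=1.0):
    """Simulate a linear ALD in which each successive cohort's mean
    slope is shifted by cohort_slope_diff (the between-cohort
    difference); subjects then draw individual slopes around their
    cohort's mean."""
    rows = []
    for c in range(n_cohorts):
        cohort_mean_slope = base_slope + c * cohort_slope_diff
        start_age = c * cohort_interval
        for s in range(n_per_cohort):
            slope_i = rng.normal(cohort_mean_slope, subj_slope_sd)
            for p in range(n_periods):
                age = start_age + p  # annual measurement
                y = slope_i * age + rng.normal(0.0, resid_sd)
                rows.append((c, s, age, y))
    return np.array(rows)  # columns: cohort, subject, age, y

data = simulate_ald()
```

Setting cohort_slope_diff to 0 recovers the no-difference designs of the earlier chapters, while increasing it relative to subj_slope_sd raises the CSR.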
In the examination of slope generalizability we graphically showed how measures of
cohort slope ratio (CSR) are more important than the more intuitive cohort slope percentage
(CSP) for determining the degree of nesting of the global fixed effect slope. We additionally
explored measures of slope misfit as a means to capture the magnitude of the average within-
person slope difference when estimating the slopes in an ALD mixed model. While information
criteria metrics (e.g. AIC; BIC) will provide a better guide for overall model fit, they do not have
a magnitude that is intuitive or interpretable on the scale of the parameters being estimated. We
saw that the slope misfit would provide an estimate of the differences between the estimated and
true slope parameters that was invariant to changes in the between-cohort differences. While this
metric was useful for understanding the magnitude of differences as a result of changes to the
number of cohorts, the cohort interval spacing, the GCR, effect size, and amount of overlap, it
did not prove to be a useful metric for understanding the generalizability of the fixed effect
slope. This notion of generalizability needs to be further explored, as a key element of the
usefulness of the ALD in the presence of between-cohort differences is whether the fixed effect
slope provides an appropriate summary of the aggregate effects. The partial pooling of cohort-specific
estimates should be considered a strength of the ALD when between-cohort differences are
minimal. In the presence of large differences however, one must consider the amount of
deviation from the fixed effect that they are willing to tolerate.
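Using the definitions from Appendix A, the two metrics differ only in their denominators, which is why they behave so differently: CSR scales the between-cohort slope difference by the between-subject slope standard deviation, while CSP scales it by the youngest cohort's slope. The short Python sketch below (hypothetical names; arbitrary values) makes the contrast explicit.

```python
def cohort_slope_metrics(cohort_slope_diff, subj_slope_sd, youngest_slope):
    """CSR: between-cohort slope difference relative to the
    between-subject slope standard deviation.
    CSP: the same difference expressed as a percentage of the
    youngest cohort's slope."""
    csr = cohort_slope_diff / subj_slope_sd
    csp = 100.0 * cohort_slope_diff / youngest_slope
    return csr, csp

# e.g. a 0.1 slope difference with subject-level slope SD 0.2
# and a youngest-cohort slope of 0.5
csr, csp = cohort_slope_metrics(0.1, 0.2, 0.5)  # csr = 0.5, csp = 20.0
```

The same absolute cohort difference can thus yield a modest CSP but a large CSR when between-subject variability is low, which may help explain why CSR proved more informative about nesting than CSP.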
In terms of power for the fixed effect slope, we saw that even slight increases in between-
cohort variability increased the power of the ALD. Given that with real data the between-cohort
differences are not likely to be exactly 0, as specified in the simulations, this indicates that the
ALDs will generally show good performance for detecting the fixed effect slope. One caveat is
that the 2-cohort design showed a loss in power as the between-cohort variability became greater
than the within-cohort variability (i.e. CSR > 1). We examined whether this phenomenon was
limited to the 2-cohort design alone by increasing the range of CSRs explored; however, we could
not find evidence for this effect in the 3-, 4-, or 5-cohort designs. This suggests that the 3-cohort
design is likely to be more robust by having high power even when cohort differences are
present. Of course, these simulations only examined between-cohort differences in slope, thus
further research will be needed to examine shifts solely due to mean levels as well as the likely
combination of intercept and slope differences. Moreover, these simulations assumed cohort
differences that were unidirectional and equal interval (i.e. linear). In practice this may not be the
case as cohort differences may be nonlinear or perhaps similar for some cohorts but not others.
As a result, more methodological work needs to be done to examine how power would be
affected when cohort differences are specified nonlinearly.
The power to detect between-cohort variability was generally poor except at high levels
of cohort variance. This showed that for small samples the goals of detecting a fixed effect slope
that generalized to all the cohorts and detecting between-cohort variance were antithetical (as we
would expect). There was some discrepancy between our power to detect these variance parameters
and the findings of Galbraith et al. (2017), who showed greater power at a fixed number of
measurements for designs with high overlap. Conversely, we saw that the low overlap designs
showed greater power in these conditions. Because Galbraith et al. (2017) do not include many
details on the nature of the between-cohort differences, it is difficult to know why these discrepancies arise;
thus further study on how cohort overlap contributes to the detection of between-cohort variance
is warranted.
9.1.6 Chapter 7: Bias, efficiency, and coverage when between-cohort differences are present
In Chapter 7 we examined the bias, estimator efficiency, and coverage probability of the
fixed effect slope and between-cohort variance estimate for linear models in the presence of
between-cohort differences.
We saw that initial increases in CSR resulted in increased bias, but that bias declined after a
CSP of ~100%. This indicates that bias could reach as much as 4-5% in a 5-cohort
design under high overlap conditions with low GCR. This would generally be considered a small
amount of bias, suggesting that the ALD performs well with regards to bias when between-
cohort slope differences are present. However, in the low overlap condition these bias values
increased to ~20% for the 3-cohort design, highlighting the importance of maintaining a high
amount of overlap when utilizing an ALD when between-cohort differences are expected.
Unfortunately, neither Galbraith et al. (2017) nor others have investigated the role of bias in an
ALD thus it is hard to know if these findings generalize across ALDs or are specific to the
parameters investigated. Findings for coverage probability also indicated that increases to CSR
improved coverage and thus decreased type 1 error, consistent with our findings of increased
power.
Perhaps the most important finding was that model misspecification in the presence of
between-cohort differences resulted in substantial increases in bias and very poor coverage
probability, suggesting that for researchers who have collected multi-cohort data, failing to
model cohort differences can increase bias and type 1 error. We saw in Chapter 8
how in practice the application of a misspecified model can result in large bias in the fixed
effects. This is particularly troubling if multi-cohort longitudinal studies routinely ignore the
modeling of cohort differences for outcomes that are known to be subject to generational
differences such as cognition and substance use. As a result, one overarching goal of the research
presented here is to increase awareness of the presence of cohort differences and the importance
of their modeling to reduce bias.
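The mechanism behind this bias can be shown with a deliberately simple, noise-free toy example (a Python sketch written for this summary, not the Chapter 7 simulation itself; all values are arbitrary): when cohorts both enter the study at different ages and differ in slope, pooling all observations into one age regression mixes the within-cohort slopes with the between-cohort trend.

```python
import numpy as np

# three cohorts, each observed at 5 annual ages; older cohorts enter older
cohort_slopes = [0.5, 0.8, 1.1]            # between-cohort slope differences
ages, ys, cohort = [], [], []
for c, slope in enumerate(cohort_slopes):
    for p in range(5):
        age = 4 * c + p                    # adjacent cohorts share one age
        ages.append(age)
        ys.append(slope * age)             # noise-free for clarity
        cohort.append(c)
ages = np.array(ages, dtype=float)
ys = np.array(ys, dtype=float)
cohort = np.array(cohort)

# misspecified model: pool all observations and ignore cohort
pooled_slope = np.polyfit(ages, ys, 1)[0]

# cohort-aware alternative: fit within each cohort, then average
within_slopes = [np.polyfit(ages[cohort == c], ys[cohort == c], 1)[0]
                 for c in range(3)]
mean_within = float(np.mean(within_slopes))

# pooled_slope (about 1.18) far exceeds mean_within (0.8): ignoring
# cohort differences has confounded aging with the cohort trend
```

In the mixed model framework used throughout this dissertation, the cohort random effects absorb that between-cohort trend, which is why the correctly specified model avoids this bias.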
9.2 General reflections, limitations, and future directions
There is no ‘ideal’ design for a particular problem of development, as each method
(single-cohort longitudinal, cross-sectional, ALD) will come with tradeoffs pertaining to the
types of effects that can be estimated (e.g. age, period, cohort), the length of time and
costs for conducting the study, and the power and estimation accuracy or precision of the effects.
The ALD is, however, well suited for studying outcomes that are subject to rapid generational
changes as it provides a means to estimate the magnitude of these effects as well as a shortened
time span for conducting the research, so that the findings remain relevant when reported.
In light of this and the previously reported findings, we envision the use of the
accelerated design in the following ways:
1) These methods can be employed by early investigators who often are interested in
within-subject change but typically lack the resources to conduct longitudinal research. The
accelerated design provides a mechanism to collect longitudinal data using small samples which
can be utilized as pilot data or otherwise provide proof of concept in applying for funding
opportunities that allow for larger scale designs. To this end, future work needs to be conducted
on outcomes that are invariant to generational change to demonstrate the performance of
trajectories with limited within-person repeated measurements (e.g. 2) and minimal or no overlap
in the age distributions, similar to the work of McArdle and Hamagami (1992). These could
conceivably allow a researcher to estimate a 20-year trajectory based on the collection of 1 year
of data, potentially providing justification for designs with longer term follow-up.
2) For developmental researchers who do not anticipate cohort differences, the ALD
should be the primary longitudinal design. Researchers currently utilizing single-cohort designs
for developmental research are implicitly rejecting the notion of between-cohort differences. In
these circumstances it would make the most fiscal sense to reduce the length of time for data
collection by utilizing an ALD. This would have additional benefits for the researcher in
allowing earlier publication of results as well as providing the means to statistically assess their
belief in the absence of between-cohort differences. To facilitate this, more methodological
work needs to be published in substantive journals to increase awareness of these designs.
3) For developmental researchers that are studying outcomes that are sensitive to cultural
or generational influences, the ALD is also a sensible choice. Only a design studying multiple
cohorts longitudinally will allow for inferences about the nature of within-person change,
generational differences, and historical period effects. While multiple cross-sectional
assessments of multiple cohorts will allow for inference on the changes due to age, period, and
cohort; these do not have the ability to assess within-person change that the ALD provides. More
research needs to be conducted on how differences in model specification and estimation result
in differing estimates for age, period, and cohort effects and under what conditions alternate
specifications result in better model fit.
4) For researchers who have collected multi-cohort longitudinal data, the methods of
analysis for the ALD are extremely relevant. Often in the literature, the modeling of age-based
trajectories from these studies ignores the fact that data arise from multiple cohorts. In doing so,
researchers are potentially biasing the parameter estimates by ignoring the differential effects of
cohorts. We saw the consequences of this through our simulations in Chapter 7 and in the
applied analyses of Chapter 8.
This study is one of the first to assess the issues of power in nonlinear ALDs as well as to
examine the metrics of bias, estimator efficiency, and coverage probability in either linear or
nonlinear ALDs. Though Galbraith, Bowden, and Mander (2017) examined the performance of
ALDs in the presence of between-cohort differences, this study examined these issues in detail
by discussing the ways in which to implement between-cohort variation as well as by providing
methods for assessing the impact of this variation. Though the methods presented here were
designed for small samples they are also relevant for larger sample designs. We chose to utilize
small samples in order to demonstrate that these designs can be effective for studying
longitudinal age-related change for researchers with budgetary constraints or recruitment issues.
The general recommendations from this study are to utilize ALDs with high amounts of
overlap between cohorts and to use as few cohorts as necessary to make the ALD cost and time
effective relative to the SCD. The use of a 3-cohort design provides a compromise between
maximizing power while minimizing study length, as this design also proved to be robust to
power changes when between-cohort variation was present (unlike the 2-cohort design which
showed eventual power loss). However, these simulations cover a limited range of values for the
design parameters and the possible combinations are too numerous to illustrate in this manuscript
(e.g. the growth curve reliability, effect size, type of growth, number of cohorts, cohort interval
spacing, number of subjects per cohort, the number of periods, period interval spacing, the
amount and distribution of attrition, amount of between-cohort variation, and methods of
analysis). These analyses also represent balanced ALDs in that the number of subjects in each
cohort is equal. In practice this may not be the case as certain cohorts may be more difficult to
recruit. Designs with the same total sample size but different Ns in each cohort are likely to have
differential effects on the power to detect the fixed effect slope and as a result should be
investigated.
Of particular importance from this study is the use of the equal cost sample size. The cost
savings from using an ALD can be used to increase the overall sample size which allows for
statistical power that is greater than in an SCD. No prior research has examined longitudinal
designs in this context. The benefits and applicability of the equal cost sample size are
dependent on the design features of the ALD; thus, it is recommended that those looking to
implement an ALD utilize simulation methods, such as the ones presented here, to explore the
potential costs and benefits of an ALD specifically tailored to their particular design
goals. In the course of writing this dissertation, programs for exploring and simulating ALDs
were created in Stata (aldesign, aldcost, and aldsim) which allow researchers to explore
various designs on their own. The specifics of these programs can be found in the technical
appendix at the end of this manuscript.
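The logic of the equal cost sample size can be sketched with the cost components defined in Appendix A: c1 (recruitment cost per subject), c2 (cost per measurement), and c3 (cost per year of study). The simple additive cost model below is an illustrative assumption written for this summary, not the dissertation's exact cost equations from Chapter 2 (which the aldcost program implements); all dollar values are arbitrary.

```python
import math

def study_cost(n, n_periods, length_years, c1, c2, c3):
    """Total cost under a simple linear model: recruitment +
    per-measurement costs + per-year overhead (illustrative form)."""
    return c1 * n + c2 * n * n_periods + c3 * length_years

def equal_cost_n(scd_cost, ald_periods, ald_length, c1, c2, c3):
    """Largest ALD sample size whose total cost does not exceed
    the SCD budget (solve the cost model above for n)."""
    return math.floor((scd_cost - c3 * ald_length) /
                      (c1 + c2 * ald_periods))

# SCD: N = 30 subjects, 15 annual measurements over a 15-year study
scd = study_cost(30, 15, 15, c1=100, c2=50, c3=1000)
# ALD: 3 cohorts, 5 annual measurements each, 5-year study
eq_n = equal_cost_n(scd, ald_periods=5, ald_length=5,
                    c1=100, c2=50, c3=1000)  # eq_n = 101
```

Under this illustrative configuration the 5-year ALD can afford over three times the SCD's 30 subjects at the same total cost; it is this sample growth that drives the equal-cost power gains reported above.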
While the methods in this dissertation address the simulation and estimation of
longitudinal trajectories in ALDs, often researchers are not just interested in characterizing a
trajectory but also in testing for between (or within) subject differences in the trajectories based
on some exposure. For accelerated designs, such tests require further consideration and, in some
instances, can present a barrier to their use. The researcher must consider whether the exposure
is equally possible across the age distribution; if not, careful consideration of the design
elements of the ALD becomes necessary. For example, if an exposure is more likely to occur at
younger ages, then the cohort interval spacing needs to be chosen to ensure that one cohort does
not contain all of the exposed subjects. Even if the exposure is present in the oldest cohorts, if
retrospective data on the exposure does not exist, then individuals belonging to the older cohorts
are more likely to be misclassified and attenuate between-group differences. Moreover, for
exposures that exhibit delayed (or sleeper) effects, the number of measurement occasions needs
to be sufficient to allow for the modeling of within-person changes. The types of measurements
needed to assess exposure for these designs also need to be considered. In the case of the analysis
of marijuana use trajectories from Chapter 8, if we were interested in characterizing early
problem users versus normal users, we would need to gather retrospective information from the
older cohorts on the age at first use. This is because habitual use in the first period of
measurement for the oldest cohort would be around age 18, which may reflect normative
behavior, as opposed to an endorsement of habitual use in the first period for the youngest
cohort, who were twelve. In instances where the exposure is age-dependent and not able to be
retrospectively assessed for some cohorts, alternative means of determining exposure group
membership need to be carefully considered or the ALD abandoned. One option might be to
utilize proxy variables to create a propensity score model based on the cohorts that are measured
within the age-range that is at risk. Applying the model to the cohorts that are outside the
window of risk would allow for estimating exposure group membership; however, the statistical
analysis of this data would need to account for the fact that group membership was estimated for
some individuals. None of these issues have been addressed by this dissertation and each require
further consideration in future analyses.
The analyses presented here primarily focus on the use of cohorts and treat these cohorts
as discrete groups. As mentioned in Chapters 2 and 8, cohort membership is a somewhat
arbitrary determination and in these studies was assigned based on calendar year at birth.
However, alternate definitions for cohort membership can be considered so long as they are time
dependent, such as participation in a war or exposure to a disease. Though in these simulations
cohorts were modeled discretely, cohort membership could be modeled on a continuum as
suggested by O’Brien (2014). This could present theoretical challenges to the disentangling of
age and cohort effects when modeling birth cohort as continuous, thus adequate age variability
within cohorts would need to be considered. O’Brien (2014) also mentions the use of
surrogate variables for defining cohort membership. To this end, latent class growth analysis
could be utilized for uncovering homogeneous groups with similar developmental patterns as a
means for ‘discovering’ cohort membership. These types of atheoretical approaches can provide
utility for understanding how theoretical cohorts align with empirically defined ones and may
illuminate the underlying characteristics that lead to cohort differences. In the ALD, we specify
cohort effects as random effects in a manner similar to those of Yang & Land (2006; 2008) and
O’Brien, Hudson, and Stockard (2008); however, alternate specifications such as fixed cohort
effects could instead be used. The choice for modeling cohort as a random effect was based on
flexibility of the ALD mixed model approach, as this allows for the specification of both age and
historical period fixed effects without resulting in model underidentification from the
collinearity of all three effects. As mentioned in Chapters 3 and 8 however, this specification
resolves the statistical problems of the model but not the underlying theoretical concerns. As a
result, this specification may not be the best fit for a particular problem and alternate
specifications fitting cohort, period, and age as either fixed or random effects should be
considered. Additionally, in the case of these designs utilizing a small number of cohorts, the
estimation of cohort random effects may be problematic, as there is little data from which to
form the estimates, which may contribute to bias. Indeed, we saw in Chapter 7
that there was substantial bias in the estimation of between-cohort variance except when very
large differences were present. Yang and Land (2016) proposed a fully Bayesian approach for
these hierarchical models as a means for modeling the uncertainty from small samples with
limited numbers of cohorts or periods. An examination of these approaches was an original
intent of this dissertation but was not included due to time constraints. This work should be
pursued along with the aforementioned methods for evaluating the denominator degrees of
freedom in small samples.
Though these analyses primarily focused on the aforementioned cohort effects, historical
period effects can also be relevant to the modeling of longitudinal change. In Chapter 6 we
briefly mention a means for introducing fixed period effects as well as the modeling of age-by-
period interactions. However, the simulations presented here did not address how these models
perform in the presence of period effects. This is particularly important as many of the same
outcomes (e.g. substance use; cognition) that are subject to generation-specific differences are
also likely affected by broader social changes that operate across the generations. The
ALD is uniquely situated to address the effects from age, period, and cohort because of the
presence of within-subject repeated measurement, which eliminates the age-cohort confounding
present in typical cross-sectional studies using APC modeling. In Chapter 8 we examined
substance use outcomes for historical period effects and utilized one specification of the ALD
mixed model. These results were largely inconclusive and highlight one of the previously
mentioned drawbacks of attempting to estimate unique age, period, and cohort effects: namely,
that the estimation methods resolve statistical identification issues but not the underlying
theoretical ones. As a result, estimates from these models are not guaranteed to be unbiased (or
unconfounded) and multiple alternative model specifications need to be explored in order to
understand which specifications provide a better fit to the data while still aligning with theory.
Future work should involve the simulation of these designs with known period effects in order to
understand how alternative model specifications result in changes to power and bias of the
estimates from these models. This line of work would be entirely novel as no research on this
topic currently exists.
9.3 Conclusion
Since Bell’s work on convergence (1953, 1954), accelerated designs have been in the
collective consciousness of researchers studying development. While the virtues of these designs
have historically been recognized (see Rao & Rao, 1966; Schaie & Strother, 1968; Woolson,
Leeper, & Clarke, 1978; Kemper & van’t Hof, 1978; Farrington, 1991; McArdle, Hamagami,
Elias, & Robbins, 1991), there have been few methodological papers examining the conditions
necessary for these designs to pose a valid alternative to the single-cohort longitudinal design. In
this manuscript, we expanded on the methodological work of Raudenbush and Chan
(1992;1993), Miyazaki and Raudenbush (2000), Moerbeek (2011), and Galbraith et al. (2017) to
examine how design elements of accelerated designs impact the cost and power relative to
single-cohort designs. We proposed new methods for assessing cost in these designs and
mechanisms for utilizing cost to compare sample size requirements between designs. Using the
framework of age-period-cohort modeling, we proposed methods for simulating and analyzing
these designs based on the mixed model approaches of Yang & Land (2006; 2008; 2016) and
O’Brien et al. (2008); leading to novel investigations on the performance of these designs in the
presence of between-cohort differences.
In the introduction to this manuscript we described how technological, social, and
cultural changes were interconnected and led to generational differences in cognitive ability.
We also noted how these same forces manifested in changes to health outcomes. It would be
folly to think we in the present are somehow immune from these same forces that shaped
behavioral and health outcomes in our ancestors. Indeed, one might argue that the rate of change,
both technological and cultural, has been accelerating; and yet our models for studying
change have not been adapted to capture this. As researchers, our goals towards understanding
within-person change are inherently linked to these technological and social sources of variation
that manifest themselves as cohort differences. As students of development we must seek new
ways to study longitudinal change and incorporate this variation into our modeling procedures so
that our estimates remain unbiased and relevant. The accelerated longitudinal design provides
one avenue by which we may achieve this goal.
9.4 Chapter References
1. Bell, R. Q. (1953). Convergence: An accelerated longitudinal approach. Child Development,
145-152.
2. Bell, R. Q. (1954). An experimental test of the accelerated longitudinal approach. Child
Development, 281-286.
3. Farrington, D. P. (1991). Longitudinal research strategies: Advantages, problems, and
prospects. Journal of the American Academy of Child & Adolescent Psychiatry, 30(3), 369-
374.
4. Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated longitudinal designs: an
overview of modelling, power, costs and handling missing data. Statistical Methods in
Medical Research, 26(1), 374-398.
5. Kemper, H. C. G., & Van't Hof, M. A. (1978). Design of a multiple longitudinal study of
growth and health in teenagers. European Journal of Pediatrics, 129(3), 147-155.
6. McArdle, J. J., & Hamagami, F. (1992). Modeling incomplete longitudinal and cross-
sectional data using latent growth structural models. Experimental aging research, 18(3),
145-166.
7. McArdle, J. J., Hamagami, F., Elias, M. F., & Robbins, M. A. (1991). Structural modeling of
mixed longitudinal and cross-sectional data. Experimental Aging Research, 17(1), 29-52.
8. Miyazaki, Y., & Raudenbush, S. W. (2000). Tests for linkage of multiple cohorts in an
accelerated longitudinal design. Psychological Methods, 5(1), 44.
9. Moerbeek, M. (2011). The effects of the number of cohorts, degree of overlap among
cohorts, and frequency of observation on power in accelerated longitudinal
designs. Methodology: European Journal of Research Methods for the Behavioral and Social
Sciences, 7(1), 11.
10. O'Brien, R. M., Hudson, K., & Stockard, J. (2008). A mixed model estimation of age, period,
and cohort effects. Sociological Methods & Research, 36(3), 402-428.
11. O'Brien, R. (2014). Age-period-cohort models: Approaches and analyses with aggregate
data. CRC Press.
12. Rao, M. N., & Rao, C. R. (1966). Linked cross-sectional study for determining norms and
growth rates: A pilot survey on Indian school-going boys. Sankhyā: The Indian Journal of
Statistics, Series B, 237-258.
13. Raudenbush, S. W., & Chan, W. S. (1992). Growth curve analysis in accelerated longitudinal
designs. Journal of Research in Crime and Delinquency, 29(4), 387-411.
14. Raudenbush, S. W., & Chan, W. S. (1993). Application of a hierarchical linear model to the
study of adolescent deviance in an overlapping cohort design. Journal of Consulting and
Clinical Psychology, 61(6), 941.
15. Schaie, K. W., & Strother, C. R. (1968). A cross-sequential study of age changes in cognitive
behavior. Psychological Bulletin, 70(6, Pt. 1), 671.
16. Woolson, R. F., Leeper, J. D., & Clarke, W. R. (1978). Analysis of incomplete data from
longitudinal and mixed longitudinal studies. Journal of the Royal Statistical Society. Series A
(General), 242-252.
17. Yang, Y., & Land, K. C. (2006). A mixed models approach to the age-period-cohort analysis
of repeated cross-section surveys, with an application to data on trends in verbal test
scores. Sociological Methodology, 36(1), 75-97.
18. Yang, Y., & Land, K. C. (2008). Age–period–cohort analysis of repeated cross-section
surveys: fixed or random effects? Sociological Methods & Research, 36(3), 297-326.
19. Yang, Y., & Land, K. C. (2016). Age-period-cohort analysis: New models, methods, and
empirical applications. Chapman and Hall/CRC.
Appendix A: Acronyms and Abbreviations
AIC Akaike’s information criteria
ALD accelerated longitudinal design
ALDMM accelerated longitudinal design mixed model
APC age-period-cohort. Typically refers to a design or model to identify these
components
APCMM age-period-cohort mixed model
BIC Bayesian information criteria
CID cohort intercept difference or average difference in intercepts between cohorts
CIP cohort intercept percentage or the average difference in intercepts between
cohorts expressed as a percentage of the youngest cohort
CIR cohort intercept ratio or ratio of between-cohort intercept difference to between-
subject intercept standard deviation
CSD cohort slope difference or average difference in slopes between cohorts
CSP cohort slope percentage or the average difference in slopes between cohorts
expressed as a percentage of the youngest cohort
CSR cohort slope ratio or ratio of between-cohort slope difference to between-subject
slope standard deviation
c1 cost for recruiting a subject
c2 cost for taking a measurement of a subject
c3 cost per year of study
Ci cohort interval spacing or difference in years between cohorts
Cn number of cohorts
EqNc equal cost number of subjects per cohort. Refers to sample sizes in the ALD that
are of equal cost to the sample size in the SCD
GCR growth curve reliability
L the length or duration of the study
M the number of measurements per person or the total number of measurements
depending on context
MAR missing at random
MCAR missing completely at random
N total sample size
Nc number of subjects per cohort
Pi period interval spacing or annual frequency of measurement
Pn number of periods or measurements per subject
SCD single-cohort design. A longitudinal design that follows a single cohort
δ growth rate or standardized effect size for the slope
w amount of attrition
γ gamma. Distribution of attrition.
Appendix B: Glossary of Terms
attrition percentage the proportion of measurements not taken as a result of attrition
attrition savings the difference in attrition percentage between the ALD and SCD
bias difference between estimated parameter and population parameter
reported as a percentage. A measure of accuracy.
coverage amount of time (or age span) under study in an accelerated design
coverage probability proportion of 95% confidence intervals from the simulations
containing the population parameter. Used as a measure of type 1
error control with values < 0.95 indicating greater type 1 error.
design efficiency the proportion of the age span (coverage) that can be analyzed for
a given study duration (length). More efficient designs will cover a
greater number of years of development with shorter study
duration in the accelerated design.
estimator efficiency standard deviation of the estimated parameter. A measure of
estimator precision.
length the length of time it takes to conduct the accelerated design. Can
also be referred to as study duration
overlap the proportion of overlap in the age distributions between two
successive cohorts in an accelerated design
relative efficiency standard deviation of the estimated parameter in the ALD relative
to the SCD. A measure of estimator precision.
sample growth the percentage increase in the number of subjects that can be
afforded in the accelerated design as a result of cost savings
relative to the single-cohort design.
savings the proportion of SCD costs that are saved by using an ALD.
Also referred to as cost savings.
slope misfit the average difference between an individual’s estimated slope from
a mixed model and an individual’s estimated slope from a
regression using the individual-level data. Used as a metric of
slope miscalculation in the ALD mixed model.
Technical Appendix: Statistical Programs
Below are the help files for the statistical programs aldesign, aldcost, and aldsim.
The source code for these can be downloaded as Stata *.ado files from nicholasjjackson.com.
Title
aldesign - Design elements for accelerated longitudinal designs (ALD)
Version
Code Version 1.0 on 09/04/2017 programmed in Stata Version 15.0 by
Nicholas J. Jackson
Syntax
aldesign, cnumber(numlist) cinterval(numlist) pnumber(numlist)
pinterval(numlist) [options]
options Description
Required Options
cnumber(numlist) Number of cohorts.
cinterval(numlist) The interval between cohorts in years.
pnumber(numlist) Number of periods.
pinterval(numlist) The interval between periods in years.
Other Options
startage(numlist) Age(s) of measurement at 1st period for the
youngest cohort. Default is 0.
startyear(numlist) Year(s) of measurement at 1st period.
graph Produces a graphical depiction of the design.
sep For use with graph option.
Separates panels A and B of the figure.
replace Replaces current data with content of output.
Description
aldesign illustrates the impact of design parameters for an
accelerated longitudinal design. This program is part of a larger
suite of programs for designing, simulating, and estimating
accelerated longitudinal designs. Please see aldcost, aldsim, and
aldest for more information.
Acronyms
ALD - Accelerated Longitudinal Design.
SCD - Single Cohort Design. Refers to the traditional longitudinal
design comprised of a single cohort.
Options
Required Options
cnumber(numlist); The number of cohorts may be specified as integer
values ≥1. Values can be expressed as a single number cnumber(3),
a list of numbers cnumber(1 3 4), or as a range of consecutive
values cnumber(2(1)6).
cinterval(numlist); The interval spacing (difference in age) between
two adjacent cohorts (in years) may be specified as values ≥0.
Values can be expressed as a single number cinterval(2), a list
of numbers cinterval(0.5 1 3), or as a range of consecutive
values cinterval(0.5(0.25)3).
pnumber(numlist); The number of periods may be specified as integer
values ≥1. Values can be expressed as a single number
pnumber(3), a list of numbers pnumber(1 3 4), or as a range of
consecutive values pnumber(2(1)6).
pinterval(numlist); The interval spacing (difference in age) between
two adjacent periods (in years) may be specified as values ≥0.
Values can be expressed as a single number pinterval(2), a list
of numbers pinterval(0.5 1 3), or as a range of consecutive
values pinterval(0.5(0.25)3).
Other Options
startage(numlist); The age(s) in years at the first period for the
youngest cohort. If left blank, the default startage is 0.
Values can be expressed as a single number startage(10), a list
of numbers startage(8 10 11), or as a range of consecutive values
startage(8(1)12).
startyear(numlist); The year(s) in which the first period of
measurement is to be taken. Only affects the behavior of the
graph option. If specified, any graphs will be produced with
indexing by the birth year of the cohorts based on the
calculations from startage and startyear. If left blank, the
default action is to define the cohort membership based on the
age of the cohort at the first period of measurement. Values can
be expressed as a single number startyear(2018), a list of
numbers startyear(2020 2023 2024), or as a range of consecutive
values startyear(2020(2)2026).
graph; When specified produces a graph of the design(s) specified.
Panel A shows age on the x-axis and cohort membership on the
y-axis with markers colored by the period of measurement. Panel
B shows period on the x-axis and cohort membership on the y-axis
with markers colored by the age at measurement.
sep; When specified with the graph option, causes Panels A and B of
the figure to be produced separately rather than in a single
combined figure.
replace; Replaces current data in memory with the output from
aldesign. This is a useful option for those who need to sort or
plot the output values.
Remarks
aldesign computes measures of study coverage, length, efficiency, and
overlap. Study coverage refers to the total number of years of
development under study and is defined as Ci*(Cn-1) + (Pn*Pi-Pi).
Study length is the duration it takes to conduct the ALD and is
defined as Pi*(Pn-1). Efficiency refers to the proportional reduction
in study length relative to coverage and is defined as (1 -
(length/coverage))*100. Values range from 0 to 100. An efficiency
value of 50% would indicate that the ALD has reduced the length of
time it takes to study these ages of development by half relative to a
single cohort design. Overlap refers to the proportion of periods
that measure the same ages between successive cohorts and is defined
as the number of periods that have this overlap divided by the total
number of periods (Pn).
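The coverage, length, and efficiency formulas above can be sanity-checked outside of Stata. The sketch below is plain Python (not part of the ado-file suite) and simply restates those equations:

```python
# Sanity check of the aldesign metrics: coverage, length, and efficiency.
# Cn = number of cohorts, Ci = cohort interval (years),
# Pn = number of periods, Pi = period interval (years).

def ald_metrics(cn, ci, pn, pi):
    length = pi * (pn - 1)                      # study duration in years
    coverage = ci * (cn - 1) + pi * (pn - 1)    # years of development covered
    efficiency = (1 - length / coverage) * 100  # % reduction vs. single-cohort design
    return coverage, length, efficiency

# The design from the Examples section: cn(5) ci(2) pi(1) pn(8)
coverage, length, efficiency = ald_metrics(cn=5, ci=2, pn=8, pi=1)
print(coverage, length, round(efficiency, 1))   # 15 years covered in a 7-year study
```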
Examples
A single design with graphical display
. aldesign, cn(5) ci(2) pi(1) pn(8) startage(12) graph
A group of 4 designs with graphical display for each
. aldesign, cn(5) ci(2) pi(.5(.5)2) pn(8) startage(12) graph
A group of designs (192 designs) without graphical display
. aldesign, cn(3(1)6) ci(1(.5)2) pi(.5(.5)2) pn(5(1)8)
startage(12)
A group of designs (192 designs) without graphical display and
replacing data in memory with output
. aldesign, cn(3(1)6) ci(1(.5)2) pi(.5(.5)2) pn(5(1)8)
startage(12) replace
Stored Results
aldesign stores the following in r():
Scalars
r(cn) Number of Cohorts
r(ci) Cohort interval spacing
r(pn) Number of Periods
r(pi) Period interval spacing
r(year_start) Year 1st period will be measured
r(age_start) Age of Youngest Cohort at 1st period
r(age_end) Age of Oldest Cohort at Last period
r(coverage) Number of years of development covered
by the ALD (age_end-age_start)
r(length) Study length or duration
r(efficiency) Proportion of 'coverage' reduced by the ALD
r(overlap_years) Years of overlap between successive cohorts
r(overlap_periods) # of measurements on the same age that
overlap between successive cohorts
r(overlap_prop) Proportion of total Periods that
overlap between successive cohorts
Matrices
r(X) Matrix containing all of the scalar outputs
Title
aldcost - Costs for Accelerated Longitudinal Designs (ALDs)
Version
Code Version 1.0 on 09/05/2017 programmed in Stata Version 15.0 by
Nicholas J. Jackson
Syntax
aldcost, c1(numlist) c2(numlist) c3(numlist) ncohort(numlist)
cnumber(numlist) cinterval(numlist) pnumber(numlist)
pinterval(numlist) [options]
options Description
Required Options
c1(numlist) Cost per study recruit.
c2(numlist) Cost per measurement.
c3(numlist) Cost per year of study.
ncohort(numlist) Number of people per cohort.
cnumber(numlist) Number of cohorts.
cinterval(numlist) The interval between cohorts in years.
pnumber(numlist) Number of periods.
pinterval(numlist) The interval between periods in years.
Other Options
prop Cost parameters (c*) are treated as proportions
that must sum to 1 (or 100) to be valid. When
using this option please consider the
scaledcost option below as well.
replace Replaces current data with content of output.
scaledcost Specifies that cost parameters (c*) are treated
as the total costs for that category (e.g.
recruitment, measurement, or duration) rather
than as per recruit, per measurement, or per
year. This is a useful option when specifying
prop. Please see Options below for more
information.
scalerefald For use with scaledcost. Reports SCD costs
relative to ALD rather than the reverse (ALD
costs relative to SCD)
scdexactM Computes exact number of measurements for the
Single Cohort Design (SCD). See Options for
more information.
Attrition Options (all must be specified together)
attrition(numlist) Amount of study attrition (w). Proportion of
participants who drop out at some point.
Details on methods for attrition found in
Remarks.
gamma(numlist) Specifies where drop-out should be
concentrated. γ>1, concentrated towards the
end. γ<1 dropout occurs more often in the
beginning.
attrtype(string) Specifies if attrition applied to the ALD
should be based on the study length in the
ALD or in the Single Cohort Design (SCD).
Options are ald or scd. Details on choosing
a method for attrition are in Options below.
Description
aldcost computes the cost for an accelerated longitudinal design
(ALD) and compares this cost to a single cohort design (SCD)
covering the same age-span. Metrics of cost savings and of the ALD
sample size that is cost-equivalent to the SCD are also
presented. This program is part of a larger suite of programs for
designing, simulating, and estimating accelerated longitudinal
designs. Please see aldesign, aldsim, and aldest for more
information.
Acronyms
ALD - Accelerated Longitudinal Design.
SCD - Single Cohort Design. Refers to the traditional longitudinal
design comprised of a single cohort.
Options
Required Options
Note: All numlist values can be expressed as a single number (e.g.
ncohort(100)), a list of numbers (e.g. cnumber(1 3 4)), or as a range
of consecutive values (e.g. cinterval(0.5(0.25)3)).
c1(numlist); The cost for recruiting 1 subject. Values can be in
dollars or as a proportion of the budget (see prop & scaledcost
options).
c2(numlist); The cost per 1 measurement for a subject. Values can be
in dollars or as a proportion of the budget (see prop &
scaledcost options).
c3(numlist); The cost per 1 year of study duration. Values can be in
dollars or as a proportion of the budget (see prop & scaledcost
options).
ncohort(numlist); The number of subjects per cohort may be specified
as integer values. This program assumes an equal N per cohort
such that the total sample size is defined as Nc*Cn. For designs
where this assumption does not hold (i.e. unequal N per cohort),
Nc should be modified so that Nc*Cn equals the total sample
size.
cnumber(numlist); The number of cohorts may be specified as integer
values ≥1.
cinterval(numlist); The interval spacing (difference in age) between
two adjacent cohorts (in years) may be specified as values ≥0.
pnumber(numlist); The number of periods may be specified as integer
values ≥1.
pinterval(numlist); The interval spacing (difference in age) between
two adjacent periods (in years) may be specified as values ≥0.
Other Options
prop; When specified, indicates that cost values have been specified
as proportions such that c1 + c2 + c3 = 1 (or 100). Only valid
values that sum to 1 (or 100) will be processed. This option is
useful when the cost parameters are specified as a range of
consecutive values (e.g. c1(.1(.05)1)) so that only sensible
results are shown. Please keep in mind that these proportions are
still reflecting the proportion of the budget to be spent per
measurement etc. If instead you would prefer these values to
reflect the proportion of the budget allocated to that category
please see the option scaledcost.
replace; Replaces current data in memory with the output from
aldcost. This is a useful option for those who need to sort or
plot the output values.
scaledcost; Often the cost per measurement, per recruitment, or per
study duration information is not known. Instead we might have
the total cost for that category (e.g. measurement) or a total
proportion of the budget for a category. When this is the case,
the scaledcost option should be specified as this will treat the
cost parameters as the totals for that category. This is very
useful when combined with prop as it allows you to more easily
see where the savings in the ALD occur (in measurement or
duration) relative to the SCD. When scaledcost is specified it is
presumed that the cost parameters (e.g. c1, c2, c3) are referring
to the costs in a single cohort design and thus the ALD costs are
presented relative to SCD. If you would prefer the cost
parameters to refer to the ALD and have these scaled costs for
the SCD calculated relative to the ALD, the additional option
scalerefald should be specified.
scalerefald; For use with scaledcost. Reports the scaled costs for
the SCD referent to the ALD. If this option is not specified, the
default behavior of scaledcost occurs which is to report the
scaled costs for the ALD referent to the SCD.
scdexactM; Computes the exact number of measurements for the single
cohort design. Typically the number of measurements in the SCD is
determined by the equation M = ([Pi*(Pn-1) + Ci*(Cn-1)] / Pi) + 1.
While in most instances this computation and scdexactM will be
the same, for some designs (particularly ones where Pi < Ci)
there may be additional measurements that occur at ages in the
ALD that would not be captured in the SCD if simply dividing the
age-span by the period interval. The aldesign program may
provide some graphical assistance in making these decisions. As
an example the command: aldcost, cn(5) ci(0.5) pn(8) pi(2) nc(60)
c1(0.2) c2(0.4) c3(0.4) would ordinarily show 9 measurements in
the SCD or 33 measurements if scdexactM is specified.
Attrition Options
Note: All Attrition Options must be specified together (i.e. one
cannot specify just attrition).
attrition(numlist); Specifies the attrition percent, such that by the
end of the study this % of subjects will have missing data.
Values must be between 0 and 1. This value is called w in
attrition equation in the Remarks section below.
gamma(numlist); Specifies where drop-out should be concentrated.
Values must be > 0. Values > 1 indicate drop-out concentrated
towards the end of the study. Values < 1 indicate drop-out
occurs more often in the beginning of the study. This value is
called γ in attrition equation in the Remarks section below.
attrtype(string); Indicates how attrition in the ALD should be
applied. Values allowed are scd or ald. When scd is specified,
the attrition in the ALD is calculated based on the values at
each measurement in the SCD. For example, if there will be 25%
attrition by the end of 10 measurements in the SCD, an ALD with
only 4 measurements will have its attrition values calculated at
the 4th measurement in the SCD. This will result in the ALD
having less than 25% drop-out by the end of the 4th measurement.
This has the effect of making the ALD a much better choice
relative to the SCD when attrition is applied. Using scd is the
recommended choice for most instances as this assumes that
drop-out is related to overall time in the study rather than
being specific to the study design. Choosing ald will result in
attrition values in the ALD being based on the number of
measurements in the ALD. For example, the attrition in the SCD
will still be calculated in the same manner, such that with 25%
attrition there will be 25% of subjects with some drop-out at the
end of 10 measurements. The attrition for an ALD with only 4
measurements however will now also be calculated so that there is
25% attrition by the end of the 4th measurement. This implies
that both the SCD and ALD should have the same drop-out rates
despite the SCD being a longer study. However, this option might
be desirable under circumstances where the interest is not in
comparing values to the SCD.
Remarks
Cost Calculation
Galbraith, Bowden, and Mander (2017) proposed a model for calculating
the cost of an ALD based on the work of Bloch (1986), which was
comprised of:
CostALD = overhead + (c1*N) + (c2*N*M) + (c3*L)
where c1 is the cost of recruiting a subject, c2 is the cost of
taking a measurement, N is the total number of subjects, M is the
number of measurements per subject, c3 is the ongoing cost per year, and
L is the study duration in years. To allow this equation to be
comparable with a single cohort longitudinal design, a modified
version has been created which explicitly incorporates the design
parameters of number of periods (Pn), period interval or years
between periods (Pi), number of cohorts (Cn), cohort interval in
years (Ci), and number of subjects per cohort (Nc). For an
accelerated longitudinal design, the cost function can be expressed
as:
CostALD = overhead + (c1*Cn*Nc) + (c2*Cn*Nc*Pn) +
(c3*[Pi*(Pn-1)])
where Cn*Nc=N, Pn=M, and Pi*(Pn-1)=L. The same design parameters for
the ALD can be used to compare the cost in the single cohort
longitudinal design through the substitution of N, M, and L with the
following:
CostSCD = overhead + (c1*Cn*Nc) + (c2*Cn*Nc*M) + (c3*[Pi*(Pn-1)
+ Ci*(Cn-1)])
It can be noted that overhead and the cost of recruitment are the same
in the ALD and SCD. Differences between the SCD and ALD come from the
shorter length (L) of the ALD as well as fewer measurements. In the
SCD, the number of measurements (M) that would be required to conduct
a traditional longitudinal design at the same ages of measurement as
the ALD is a non-trivial problem. By default, unless scdexactM is
specified, the number of measurements (M) in the SCD is assumed to
be:
M = ([Pi*(Pn-1) + Ci*(Cn-1)] / Pi) + 1
Using these equations, one can examine the cost trade-offs of
conducting an ALD versus a single-cohort longitudinal design at the
same total N.
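As an illustration only (plain Python, not the aldcost program itself), the two cost functions and the default SCD measurement count can be written directly from the equations above:

```python
# Cost equations for the ALD and the matching single-cohort design (SCD),
# following the Remarks above. Overhead is omitted (it cancels in comparisons).

def ald_cost(c1, c2, c3, nc, cn, ci, pn, pi):
    n = cn * nc                       # total subjects (N = Cn*Nc)
    length = pi * (pn - 1)            # ALD study duration (L)
    return c1 * n + c2 * n * pn + c3 * length

def scd_cost(c1, c2, c3, nc, cn, ci, pn, pi):
    n = cn * nc
    length = pi * (pn - 1) + ci * (cn - 1)   # SCD must span the full age range
    m = length / pi + 1                      # default number of SCD measurements
    return c1 * n + c2 * n * m + c3 * length

# Design from the Examples section: nc(100) cn(5) ci(2) pi(1) pn(8),
# c1(100) c2(200) c3(15000)
print(ald_cost(100, 200, 15000, 100, 5, 2, 8, 1))  # 955000
print(scd_cost(100, 200, 15000, 100, 5, 2, 8, 1))  # 1875000.0
```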
Cost Savings
Cost savings in the ALD, relative to SCD, are computed as:
Savings(%) = 1 - (CostALD/CostSCD)
Higher values represent greater savings in the ALD. For example, a
value of 50% would indicate that the ALD costs half that of an SCD
covering the same age-span.
Equivalent Cost Number of subjects per Cohort
The cost equations can be set equal to each other and solved for the
Nc that would provide the same cost in the ALD as in the single
cohort design. Solving these equations requires assumptions about
the nature of the costs for these study designs. We assume that the
values for the overhead, recruitment costs per subject (c1), cost per
measurement (c2) and costs per year of study (c3) are equivalent
parameters between the ALD and single-cohort design. We can obtain
the equivalent cost Nc by using the following equation:
ALD Equivalent Cost Nc = (CostSCD - c3*Pi*(Pn-1)) / (c1*Cn +
c2*Cn*Pn)
Computing the equivalent cost Nc allows for the researcher to
understand how funding for a single-cohort design can be used to
create an ALD with a greater total sample size, thereby increasing
study power in the ALD. When conducting simulation studies that
compare an SCD to an ALD, the use of this equivalent cost Nc may be
important as it allows for the creation of ALDs with N's that are
of equal cost to their single-cohort counterparts, which may be a
fairer comparison than simply comparing the designs under equal N's.
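The savings and equivalent-cost Nc formulas can likewise be checked with a short plain-Python sketch. The cost values are recomputed inline from the equations in the Cost Calculation subsection, using the dollar figures from the Examples section of this help file:

```python
# Cost savings and equivalent-cost N per cohort for the worked design:
# nc=100, cn=5, ci=2, pi=1, pn=8, c1=100, c2=200, c3=15000.
c1, c2, c3 = 100, 200, 15000
nc, cn, ci, pn, pi = 100, 5, 2, 8, 1

n = cn * nc
ald_len = pi * (pn - 1)
scd_len = pi * (pn - 1) + ci * (cn - 1)
scd_m = scd_len / pi + 1                          # default SCD measurement count

cost_ald = c1 * n + c2 * n * pn + c3 * ald_len
cost_scd = c1 * n + c2 * n * scd_m + c3 * scd_len

savings = 1 - cost_ald / cost_scd                 # proportion of SCD cost saved
eq_nc = (cost_scd - c3 * ald_len) / (c1 * cn + c2 * cn * pn)

print(round(savings, 3))   # about 0.49: the ALD costs roughly half the SCD
print(round(eq_nc, 1))     # about 208 subjects/cohort for the same budget
```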
Incorporating Attrition
Taking the approach of Galbraith, Bowden, & Mander (2017) that was
originally proposed by Verbeke & Lesaffre (1999), we will utilize a
Weibull model for defining the probabilities of attrition. Each
participant has 1 to M measurements, for which each measurement can
be defined by the probability of dropping out of the study p = {p_1,
... p_j ... , p_M}, where p_j is the probability of having exactly j
measurements. The probabilities in p are then determined based on the
Weibull function. The proportion of individuals who have dropped out
at period t, assuming they remained in the study until period t, will
be defined by:
λγt^(γ-1), where λ = -log(1-w) and w is the amount of attrition
and gamma (γ) the drop-out concentration.
The periods of measurement can be rescaled as proportions from 0 to 1
by using the following formula:
t_j*=(t_j-1)/(M-1)
The first period (t_1) is equal to zero and the last (t_M) is equal
to 1. Implied in this rescaling is that the period intervals are
equivalent across all periods.
Using these rescaled periods, we can compute the proportion of
subjects who drop out after measurement t_j by:
p_j = (1 - w)^((t_j*)^γ) - (1 - w)^((t_(j+1)*)^γ), where p_M = 1 - w
In this manner, for a given value of w, the dropout serves as a
function of γ whereby when γ=1, the dropout is constant over the
course of study. When γ>1, dropout is concentrated towards the end
of the study and when γ<1 dropout occurs more often in the beginning
of the study. When γ=1, missing data are presumed missing completely
at random (MCAR) and when γ≠1 data are presumed missing at random
(MAR).
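The drop-out probabilities above can be sketched in plain Python (not the aldcost implementation; it assumes equally spaced periods, as the rescaling requires):

```python
# Weibull drop-out probabilities p_1..p_M following Verbeke & Lesaffre (1999):
# survival S(t) = (1-w)^(t^gamma) on rescaled times t*_j = (j-1)/(M-1),
# p_j = S(t*_j) - S(t*_{j+1}) for j < M, and completers p_M = S(1) = 1 - w.

def dropout_probs(w, gamma, M):
    t = [(j - 1) / (M - 1) for j in range(1, M + 1)]   # rescaled periods, 0..1
    surv = [(1 - w) ** (tj ** gamma) for tj in t]      # proportion still enrolled
    p = [surv[j] - surv[j + 1] for j in range(M - 1)]  # exactly j+1 measurements
    p.append(surv[-1])                                 # complete all M measurements
    return p

# 25% attrition concentrated toward the end of the study (gamma=2), 8 periods:
p = dropout_probs(w=0.25, gamma=2, M=8)
print(round(sum(p), 6))   # probabilities sum to 1
print(round(p[-1], 2))    # 0.75 of subjects complete the study
```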
Metrics of Attrition
The amount of attrition with regard to the number of measurements and
number of subjects was computed for the ALD and SCD. The amount of
attrition (w) will be equal to the number of subjects who drop out by
the end of the study for the SCD. For the ALD, the number of subjects
with attrition at the end of the study will depend on whether attrtype
was specified as scd (actual attrition will be lower than specified)
or ald (attrition will be the same as in the SCD). Additionally, the
amount of attrition with respect to the number of measurements will be
computed. These values will be lower than the specified amount (w)
because of differences in when subjects drop out, resulting in varying
numbers of measurements, largely based on the gamma (γ) value
specified.
Examples
Costs for a design without attrition. Cost parameter values are
specified as real dollars.
. aldcost, nc(100) cn(5) ci(2) pi(1) pn(8) c1(100) c2(200)
c3(15000)
Costs for a design without attrition. Cost parameter values will be
varied and treated as proportions summing to 1 AND considered as
proportions representing the total proportion of that budget category
(rather than per recruit, measurement, etc.).
. aldcost, nc(100) cn(5) ci(2) pi(1) pn(8) c1(.1(.1)1)
c2(.1(.1)1) c3(.1(.1)1) prop scaledcost
Costs for a design with 25% attrition towards end of study (gamma=2).
Attrition values for ALD calculated based on SCD. Cost parameter
values are specified as real dollars.
. aldcost, nc(100) cn(5) ci(2) pi(1) pn(8) c1(100) c2(200)
c3(15000) attrition(0.25) gamma(2) attrtype(scd)
Stored Results
aldcost stores the following in r():
Scalars
r(c1) Cost per Recruit (or total recruitment costs if scaledcost was specified)
r(c2) Cost per Measurement (or total measurement costs if scaledcost was specified)
r(c3) Cost per Year of Study (or total duration costs if scaledcost was specified)
r(nc) Number of Subjects per Cohort
r(cn) Number of Cohorts
r(ci) Cohort interval spacing
r(pn) Number of Periods
r(pi) Period interval spacing
r(ald_N) ALD: Total Sample Size (Recruitment N)
r(ald_M) ALD: Total Number of Measurements
r(ald_L) ALD: Total Study Length
r(ald_costN) ALD: Total Recruitment Costs (c1*ald_N)
r(ald_costM) ALD: Total Measurement Costs (c2*ald_M)
r(ald_costL) ALD: Total Duration Costs (c3*ald_L)
r(ald_cost) ALD: Total Costs
r(scd_N) SCD: Total Sample Size (Recruitment N)
r(scd_M) SCD: Total Number of Measurements
r(scd_L) SCD: Total Study Length
r(scd_costN) SCD: Total Recruitment Costs (c1*scd_N)
r(scd_costM) SCD: Total Measurement Costs (c2*scd_M)
r(scd_costL) SCD: Total Duration Costs (c3*scd_L)
r(scd_cost) SCD: Total Costs
r(cost_save) ALD Cost Savings % (1- ald_cost/scd_cost)
r(ald_eqNC) ALD: Equivalent Cost N per Cohort
r(attrition) Attrition (w) proportion
r(gamma) Attrition Gamma (γ) value
r(ald_attrM) ALD Attrition: % of Measurements Missing
r(ald_attrN) ALD Attrition: % of Subjects Missing
r(scd_attrM) SCD Attrition: % of Measurements Missing
r(scd_attrN) SCD Attrition: % of Subjects Missing
Matrices
r(X) Matrix containing all of the scalar outputs
References
Bloch, D. A. (1986). Sample size requirements and the cost of a
randomized clinical trial with repeated measurements. Statistics
in Medicine, 5(6), 663-667.
Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated
longitudinal designs: an overview of modelling, power, costs and
handling missing data. Statistical Methods in Medical Research,
26(1), 374-398.
Verbeke, G., & Lesaffre, E. (1999). The effect of drop-out on the
efficiency of longitudinal experiments. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 48(3),
363-375.
Title
aldsim - Data Simulation for Accelerated Longitudinal Designs (ALDs)
Version
Code Version 1.0 on 09/29/2017 programmed in Stata Version 15.0 by
Nicholas J. Jackson
Syntax
aldsim, nc(number) cn(number) ci(number) pn(number) pi(number)
[options]
options Description
Design Options
nc(number) Number of people per cohort. Alternatively,
totn can be specified.
totn(number) Total sample size. nc=[totn/cn].
cn(number) Number of cohorts.
ci(number) The interval between cohorts in years.
Defaults to pi if not specified.
pn(number) Number of periods.
pi(number) The interval between periods in years.
Age Options
agestart(number) Age at 1st period for youngest cohort (starting
age).
agesd(number) Standard deviation of agestart. Used for when
ages are generated from Normal or Truncated
Normal distribution.
agelb(number) Lower Bound for agestart. Specifies minimum age
for the 1st period of youngest cohort. Used
for when ages are generated from Truncated
Normal or Uniform distribution. See Options
for more information.
ageub(number) Upper Bound for agestart. Specifies maximum age
for the 1st period of youngest cohort. Used
for when ages are generated from Truncated
Normal or Uniform distribution. See Options
for more information.
Slope and Intercept Options
slp(number) Age slope.
slpsd(number) Standard deviation of the age slope (a random
slope).
int(number) Intercept value.
intsd(number) Standard deviation of the intercept (a random
intercept).
slpintcor(number) Correlation between random intercept and slope.
Non-Linear Growth Options
nltype(string) Type of non-linear model. Options are log for
logistic-normal (default), exp for
exponential growth, or gom for Gompertz.
alpha(number) Rate of approach to the upper asymptote.
Treated as a fixed parameter unless alphasd
is specified.
alphasd(number) Standard deviation of the alpha parameter.
Specification turns these additive non-linear
models into multiplicative non-linear models.
ageinflect(number) Age at which the rate of change is greatest. By
default this value is constant between
cohorts and specified as the median of the
ages under study.
ageinflectshift(number)
Difference between cohorts in the age at
which the rate of change is the greatest.
Effect Size Options
effsize(number) Effect Size. Defined as effsize = slp/slpsd.
Default=0.5.
gcr(number) Growth curve reliability. Proportion of
variance explained by growth parameters. Can
be specified instead of resid option.
Default=0.8.
resid(number) Standard deviation of the residuals. Can be
specified in place of gcr.
Cohort Differences Options
cid(number) Cohort Intercept Differences. Difference in
intercept between successive cohorts.
cidratio(number) Cohort Intercept Difference Ratio. Ratio of cid
to intsd.
csd(number) Cohort Slope Differences. Difference in slope
between successive cohorts.
csdratio(number) Cohort Slope Difference Ratio. Ratio of csd to
slpsd.
fixedcohort Specifies fixed cohort differences (as opposed
to random). See Options for more information.
Period Differences Options
period(number) Slope for fixed effect of period.
periodshift(number) Difference in period slope between successive
cohorts.
Attrition Options (attrition, gamma, and attrtype must be specified
together)
attrition(numlist) Amount of study attrition (w). Proportion of
participants who drop out at some point.
Details on methods for attrition found in
Remarks.
gamma(numlist) Specifies where drop-out should be
concentrated. γ>1, concentrated towards the
end. γ<1 dropout occurs more often in the
beginning.
attrtype(string) Specifies if attrition applied to the ALD
should be based on the study length in the
ALD or in the Single Cohort Design (SCD).
Options are ald or scd. Details on choosing
a method for attrition are in Options below.
attrstatic Specifies how attrition probabilities should be
applied when choosing which observations to
remove.
Output Options
graph Produces a figure of the simulated data.
print Displays design information on-screen.
Description
aldsim simulates data from an accelerated longitudinal design (ALD)
for linear and non-linear models with and without attrition as
well as with options for cohort and period effects. Data from
this program can then be analyzed using standard mixed model
approaches. This program is part of a larger suite of programs
for designing, simulating, and estimating accelerated
longitudinal designs. Please see aldesign, aldcost, and aldest
for more information.
Acronyms
ALD - Accelerated Longitudinal Design.
SCD - Single Cohort Design. Refers to the traditional longitudinal
design comprised of a single cohort covering the same age span as
the ALD.
Options
Design Options
nc(number); The number of subjects per cohort may be specified as
integer values. This program assumes an equal N per cohort such
that the total sample size is defined as Nc*Cn. For designs where
this assumption does not hold (i.e. unequal N per cohort), Nc
should be modified so that Nc*Cn equals the total sample size.
Alternatively, users may specify the totn option, which provides
the total sample size and computes Nc = totn/Cn.
cn(number); The number of cohorts may be specified as integer values
≥1.
ci(number); The interval spacing (difference in age) between two
adjacent cohorts (in years) may be specified as values ≥0.
pn(number); The number of periods may be specified as integer values
≥1.
pi(number); The interval spacing (difference in age) between two
adjacent periods (in years) may be specified as values ≥0.
Age Options
agestart(number); The mean age (in years) at the first period for the
youngest cohort. When specified with agesd, a normal distribution
for ages at each period is created. When specified with agesd and
agelb and ageub, a truncated normal distribution is created.
agesd(number); The standard deviation of the age distribution at the
first period for the youngest cohort. When specified with
agestart, a normal distribution for ages at each period is
created. When specified with agestart and agelb and ageub, a
truncated normal distribution is created.
agelb(number); Lower bound (absolute minimum value) for the age
distribution at the first period for the youngest cohort. Must be
specified with ageub. When specified without agestart, agestart
= (agelb + ageub)/2. In absence of agesd, the age distribution is
a uniform distribution over ages agelb to ageub. When specified
with agesd and ageub, a truncated normal distribution is created.
ageub(number); Upper bound (absolute maximum value) for the age
distribution at the first period for the youngest cohort. Must be
specified with agelb. When specified without agestart, agestart
= (agelb + ageub)/2. In absence of agesd, the age distribution is
a uniform distribution over ages agelb to ageub. When specified
with agesd and agelb, a truncated normal distribution is created.
Slope and Intercept Options
slp(number); The value for the slope of age. Created as a random
slope when slpsd is specified.
slpsd(number); The standard deviation of the slope for age
(within-cohort variation). Allows the slope to be specified as a
random variable. Slope can alternately be specified as fixed by
setting slpsd=0.
int(number); The value for the intercept. Created as a random
intercept when intsd is specified.
intsd(number); The standard deviation of the intercept (within-cohort
variation). Allows the intercept to be specified as a random
variable. The intercept can alternately be specified as fixed by
setting intsd=0. Defaults to the value of slpsd if not specified.
slpintcor(number); Correlation between the slope and intercept. Set
to 0 if unspecified.
Non-Linear Growth Options
nltype(string); The type of non-linear model to use. Options are:
log Logistic-Normal Growth (default):
y = int + slp/(1+exp(-alpha*(age-ageinflect)))
exp Exponential Growth:
y = int + slp*(1-exp(-alpha*(age-ageinflect)))
gom Gompertz Function:
y = int + slp*exp(-exp(-alpha*(age-ageinflect)))
alpha(number); The rate of approach for the non-linear growth. When
alphasd is not specified, alpha is treated as fixed.
alphasd(number); The standard deviation for the rate of approach.
When alphasd is specified, alpha is treated as random normal
variable and the nonlinear models are considered multiplicative.
ageinflect(number); The age at which the rate of change is the
greatest. Age in the non-linear equations will be centered around
this value. When not specified: ageinflect = (agestart +
pi*(pn-1) + ci*(cn-1))/2
ageinflectshift(number); Specifies the between cohort differences
(fixed effects) in the ageinflect. Assumes linear effects for
the age-inflection shift such that the age at which the rate of
change is greatest for Cohort n will be: ageinflect* = ageinflect
+ ageinflectshift*(n-1)
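As a quick check on the three growth forms (plain Python, not the aldsim implementation): at age = ageinflect the logistic curve sits halfway between int and int + slp, the exponential curve sits at int, and the Gompertz curve sits at int + slp*exp(-1):

```python
import math

# The three non-linear growth functions offered by aldsim's nltype option.
def logistic(age, intercept, slp, alpha, ageinflect):
    return intercept + slp / (1 + math.exp(-alpha * (age - ageinflect)))

def exponential(age, intercept, slp, alpha, ageinflect):
    return intercept + slp * (1 - math.exp(-alpha * (age - ageinflect)))

def gompertz(age, intercept, slp, alpha, ageinflect):
    return intercept + slp * math.exp(-math.exp(-alpha * (age - ageinflect)))

# At the inflection age (age == ageinflect), exp(0) = 1, so:
print(logistic(14, 0, 10, 0.5, 14))            # 5.0  (halfway to the asymptote)
print(exponential(14, 0, 10, 0.5, 14))         # 0.0  (growth starts at the intercept)
print(round(gompertz(14, 0, 10, 0.5, 14), 3))  # 3.679 = 10*exp(-1)
```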
Effect Size Options
effsize(number); The effect size for the slope, defined as: effsize =
slp / slpsd. When not specified, default=0.5. The recommended
use of effsize is to specify it in conjunction with slp and
thereby avoid needing to specify the slpsd value.
gcr(number); The growth curve reliability. When not specified,
default=0.8. The GCR is defined as the proportion of variability
explained by the growth parameters and is computed by: gcr =
(intsd^2 + (agestart^2 * slpsd^2) +
(2*agestart*slpintcor*intsd*slpsd)) / (intsd^2 + (agestart^2 *
slpsd^2) + (2*agestart*slpintcor*intsd*slpsd) + resid^2). The
recommended use of gcr is to specify it instead of resid. Please
see Hertzog, Lindenberger, Ghisletta, and von Oertzen (2006) for
more information on the GCR.
resid(number); Standard deviation of the residuals distributed as a
random normal variable. When gcr is specified, resid will be
computed from the gcr.
Cohort Differences Options
cid(number); The Cohort Intercept Difference is defined as the
difference between successive cohorts in their intercept value.
When specified, the intercept for Cohort n is: int* = int +
cid*(n-1). As is apparent from the equation, the difference
between cohorts is treated linearly. By default, cid is drawn
from the positive values of a random normal distribution of mean
0 and sd=cid. That is: cid*=abs(rnormal(0, cid)). As a result,
the actual intercept difference between two successive cohorts may
be greater or less than the cid specified. If the intercept
differences are desired to be exactly cid, users should specify
the fixedcohort option.
cidratio(number); The ratio of the between-cohort to the
within-cohort intercept SD. Defined as: cidratio = cid / intsd.
Users can specify cid with cidratio in place of intsd.
csd(number); The Cohort Slope Difference is defined as the difference
between successive cohorts in their slope values. When specified,
the age slope for Cohort n is: slp* = slp + csd*(n-1). As is
apparent from the equation, the difference between cohorts is
treated linearly. By default, csd is drawn from the positive
values of a random normal distribution of mean 0 and sd=csd. That
is: csd*=abs(rnormal(0, csd)). As a result, the actual slope
difference between two successive cohorts may be greater or less
than the csd specified. If the slope differences are desired to
be exactly csd, users should specify the fixedcohort option.
csdratio(number); The ratio of the between-cohort to the
within-cohort slope SD. Defined as: csdratio = csd / slpsd. Users
can specify csd with csdratio in place of slpsd.
fixedcohort; Specifies that cid and csd be treated as fixed values as
opposed to being drawn from a random normal distribution. When
specified, the difference between two successive cohorts in their
intercept and slope values will be exactly cid and csd.
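Because cid and csd are drawn as |N(0, sd)| by default, the realized cohort differences average roughly 0.8 times the nominal value (the mean of a half-normal is sd*sqrt(2/pi)). A hypothetical Python sketch of the two sampling modes:

```python
import math
import random

def draw_cohort_diff(sd, fixed=False, rng=None):
    """Successive-cohort difference in intercept (cid) or slope (csd).

    Default behaviour mirrors cid* = abs(rnormal(0, sd)); with the
    fixedcohort option the difference is exactly sd.
    """
    if fixed:
        return sd
    rng = rng or random.Random()
    return abs(rng.gauss(0.0, sd))

# Mean of the half-normal draw is sd * sqrt(2/pi) ~= 0.798 * sd,
# so random draws typically undershoot the nominal difference.
```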
Period Differences Options
period(number); For linear models only. Allows a fixed linear period
effect of the form period*(period-1) that is constant across
cohorts.
periodshift(number); Used in conjunction with period. Allows for
fixed cohort differences in the period effect such that the
period slope for Cohort n is defined as: period* = period +
periodshift*(n-1).
Attrition Options
Note: The Attrition Options of attrition, gamma, and attrtype must be
specified together.
attrition(numlist); Specifies the attrition percent, such that by the
end of the study this % of subjects will have missing data.
Values must be between 0 and 1. This value is called w in the
attrition equation in the Remarks section below.
gamma(numlist); Specifies where drop-out should be concentrated.
Values must be > 0. Values > 1 indicate drop-out concentrated
towards the end of the study. Values < 1 indicate drop-out
occurs more often in the beginning of the study. This value is
called γ in the attrition equation in the Remarks section below.
attrtype(string); Indicates how attrition in the ALD should be
applied. Values allowed are scd or ald. When scd is specified,
the attrition in the ALD is calculated based on the values at
each measurement in the SCD. For example if there will be 25%
attrition by the end of 10 measurements in the SCD, an ALD with
only 4 measurements will have its attrition values calculated at
the 4th measurement in the SCD. This will result in the ALD
having less than 25% drop-out by the end of the 4th measurement.
This has the effect of making the ALD a much better choice
relative to the SCD when attrition is applied. Using scd is the
recommended choice for most instances as this assumes that
drop-out is related to overall time in the study rather than
being specific to the study design. Choosing ald will result in
attrition values in the ALD being based on the number of
measurements in the ALD. For example, the attrition in an SCD
would still be calculated in the same manner, such that with 25%
attrition there will be 25% of subjects with some drop-out at the
end of 10 measurements. The attrition for an ALD with only 4
measurements, however, would now also be calculated so that there
is 25% attrition by the end of the 4th measurement. This implies
that both the SCD and ALD should have the same drop-out rates
despite the SCD being a longer study. However, this option might
be desirable under circumstances where the interest is not in
comparing values to the SCD.
attrstatic; Removal of observations based on the attrition
probabilities (see Remarks below), by default is based on
comparison to a random uniform distribution. In practice, this
means that from one simulation to the next, the number of
subjects with exactly j measurements will vary. If instead the
user would like the number of subjects with exactly j
measurements to remain the same (static) from one simulation to
the next, attrstatic should be specified. attrstatic will
compare the attrition probabilities to the value of _n/_N for each
cohort, which will result in the same number of observations
being removed between different simulations with the same design
parameters.
Remarks
Development of the models
The models employed here are based on the age-period-cohort mixed
model specification for aggregate data from O'Brien, Hudson, and
Stockard (2008). By adapting their model to incorporate
within-person repeated measurement, the ALD Mixed Model can be
created.
y_ijk = b0 + b1*age + u0_j + u0_k + u1_j*age + u1_k*age + e_ijk
Where b0 is the fixed intercept, b1 the fixed slope, u0_j the
within-cohort between-person variability in intercept (random
intercept), u0_k the between-cohort variability in intercept (random
intercept), u1_j the within-cohort between person variability in
slope (random coefficient), u1_k the between-cohort variability in
slope (random coefficient), and e_ijk the residuals. Fixed (or
random) effects for period could also be specified in this general
description of the ALD Mixed Model (ALDMM).
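A minimal generative sketch of the ALDMM above, written in Python with hypothetical variance components (for simplicity, the cohort effects here are drawn as plain normals rather than aldsim's half-normal defaults):

```python
import random

def simulate_aldmm(n_cohorts=5, n_per_cohort=100, ages=(10, 11, 12, 13),
                   b0=0.0, b1=1.0, intsd=1.0, slpsd=0.5,
                   cohort_intsd=0.3, cohort_slpsd=0.1, resid=0.5, seed=1):
    """Draw y_ijk = b0 + b1*age + u0_j + u0_k + (u1_j + u1_k)*age + e_ijk."""
    rng = random.Random(seed)
    rows = []
    for k in range(n_cohorts):
        u0_k = rng.gauss(0.0, cohort_intsd)  # between-cohort intercept
        u1_k = rng.gauss(0.0, cohort_slpsd)  # between-cohort slope
        for j in range(n_per_cohort):
            u0_j = rng.gauss(0.0, intsd)     # within-cohort intercept
            u1_j = rng.gauss(0.0, slpsd)     # within-cohort slope
            for age in ages:
                e = rng.gauss(0.0, resid)
                y = b0 + b1 * age + u0_j + u0_k + (u1_j + u1_k) * age + e
                rows.append((k, j, age, y))
    return rows
```

Each row is (cohort, subject, age, y), i.e., one repeated measurement nested within person within cohort.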
Incorporating Attrition
Taking the approach of Galbraith, Bowden, & Mander (2017) that was
originally proposed by Verbeke & Lesaffre (1999), we will utilize a
Weibull model for defining the probabilities of attrition. Each
participant has 1 to M measurements, for which each measurement can
be defined by the probability of dropping out of the study p = {p_1,
... p_j ... , p_M}, where p_j is the probability of having exactly j
measurements. The probabilities in p are then determined based on the
Weibull function. The proportion of individuals who drop out at
period t, given that they remained in the study until period t (the
hazard), is defined by:
λγt^(γ-1), where λ = -log(1-w) and w is the amount of attrition
and gamma (γ) the drop-out concentration.
The periods of measurement can be rescaled as proportions from 0 to 1
by using the following formula:
t_j*=(t_j-1)/(M-1)
The first period (t_1) is equal to zero and the last (t_M) is equal
to 1. Implied in this rescaling is that the period intervals are
equivalent across all periods.
Using these rescaled periods, we can compute the proportion of
subjects who drop out after measurement t_j by:
p_j = (1 - w)^(t_j*^γ) - (1 - w)^(t_(j+1)*^γ), where p_M = 1 - w
In this manner, for a given value of w, the dropout serves as a
function of γ whereby when γ=1, the dropout is constant over the
course of study. When γ>1, dropout is concentrated towards the end
of the study and when γ<1 dropout occurs more often in the beginning
of the study. When γ=1, missing data are presumed missing completely
at random (MCAR) and when γ≠1 data are presumed missing at random
(MAR).
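Putting the pieces together, here is a hypothetical Python transcription of the Weibull drop-out probabilities. The probabilities always sum to 1, and the completer fraction p_M equals 1 - w:

```python
def attrition_probs(M, w, gamma):
    """Probability of having exactly j measurements, j = 1..M.

    w is the total attrition by the end of the study and gamma the
    drop-out concentration; survival at rescaled period t is
    (1 - w)^(t^gamma).
    """
    t = [(j - 1) / (M - 1) for j in range(1, M + 1)]  # rescaled to [0, 1]
    surv = [(1 - w) ** (tj ** gamma) for tj in t]     # proportion retained
    p = [surv[j] - surv[j + 1] for j in range(M - 1)]
    p.append(surv[-1])  # completers: p_M = 1 - w
    return p
```

For example, attrition_probs(10, 0.25, 1.0) spreads 25% total drop-out over ten measurements with a constant hazard, while gamma > 1 shifts the drop-out toward the later measurements.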
Examples
Linear model using a truncated normal age distribution. Default gcr
and effsize values are used. Simulation output parameters printed and
graphed. No between-cohort differences in intercept or slope.
. aldsim, nc(100) cn(5) ci(2) pi(2) pn(8) agestart(10)
agesd(0.5) agelb(9.5) ageub(10.5) print graph
Linear model using a truncated normal age distribution. gcr=0.7 and
effsize=3 with slp=10. Simulation output parameters printed and
graphed. No between cohort differences in intercept or slope.
. aldsim, nc(100) cn(5) ci(2) pi(2) pn(8) agestart(10)
agesd(0.5) agelb(9.5) ageub(10.5) gcr(0.7) effsize(3) slp(10)
print graph
Linear model using a truncated normal age distribution. gcr=0.7 and
effsize=3 with slp=10. Simulation output parameters printed and
graphed. The between-cohort slope SD is 4 times the within-cohort
slope SD. No between-cohort differences in intercept. Cohort
differences are specified as fixed parameters (i.e. the fixedcohort
option).
. aldsim, nc(100) cn(5) ci(2) pi(2) pn(8) agestart(10)
agesd(0.5) agelb(9.5) ageub(10.5) gcr(0.7) effsize(3) slp(10)
csdratio(4) fixedcohort print graph
References
Galbraith, S., Bowden, J., & Mander, A. (2017). Accelerated
longitudinal designs: an overview of modelling, power, costs and
handling missing data. Statistical Methods in Medical Research,
26(1), 374-398.
Hertzog, C., Lindenberger, U., Ghisletta, P., & von Oertzen, T.
(2006). On the power of multivariate latent growth curve models
to detect correlated change. Psychological Methods, 11(3), 244.
O'Brien, R. M., Hudson, K., & Stockard, J. (2008). A mixed model
estimation of age, period, and cohort effects. Sociological
Methods & Research, 36(3), 402-428.
Verbeke, G., & Lesaffre, E. (1999). The Effect of Drop‐Out on the
Efficiency of Longitudinal Experiments. Journal of the Royal
Statistical Society: Series C (Applied Statistics), 48(3),
363-375.
Abstract
Longitudinal designs are the gold standard for researchers studying within-subject changes in age-related development. These designs are typically conducted using a single cohort followed for a fixed period of time. However, single-cohort designs often necessitate a lengthy time commitment from participants, sponsors, and researchers, which makes them vulnerable to greater attrition and even premature termination. The time commitment for these designs also means that the results may be obsolete by the time they are published, particularly if the outcomes under study are sensitive to generational differences. Bell (1953) proposed the use of an Accelerated Longitudinal Design (ALD) as a means to generate age-based trajectories over a shortened duration to combat these issues. In an ALD, multiple birth cohorts are studied simultaneously in a longitudinal fashion, with overlap in the age distributions between the cohorts. In this manner, the same age span may be studied while reducing the number of measurements per participant, the study duration, and study costs. These designs also allow for the modeling of between-cohort differences, which are important for researchers interested in developing age-based trajectories that generalize to multiple cohorts. While models that incorporate cultural influence are increasingly relevant, there has not yet been widespread adoption of these designs. Part of the hesitancy to use ALDs stems from their unfamiliarity, as few methodological papers have demonstrated the efficacy of these designs for studying development. We propose cost equations that exploit the cost savings of the ALD to determine sample sizes of equal cost to a single-cohort design. The use of an equal-cost sample size allows ALDs to have N's that are 10-85% larger than in the single-cohort design, thereby offsetting the potential loss of power in the ALD.
We subsequently utilize Monte Carlo simulation methods to demonstrate that the statistical power and bias of the ALD are comparable to those of the single-cohort design for both linear and nonlinear models, and discuss considerations for when between-cohort differences in development are present. Lastly, we use data from the National Longitudinal Survey of Youth (NLSY 1997) to demonstrate the ability of an ALD to capture both within-person and between-cohort variability in marijuana and tobacco use from the ages of 12 to 32. We additionally discuss considerations for the modeling of cohort membership and alternate strategies for cohort inclusion. Results from the simulations and the NLSY suggest that ALDs should be the preferred longitudinal design for researchers studying age-related development.
Asset Metadata
Creator: Jackson, Nicholas J. (author)
Core Title: The design, implementation, and evaluation of accelerated longitudinal designs
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Psychology
Publication Date: 07/26/2018
Defense Date: 05/09/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: accelerated design, age-related change, longitudinal, mixed effects, Monte Carlo, OAI-PMH Harvest, sequential design
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: John, Richard (committee chair), Berhane, Kiros (committee member), Leventhal, Adam (committee member), Manis, Frank (committee member), Wilcox, Rand (committee member)
Creator Email: nicholas.jackson@usc.edu, njacks@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-26203
Unique identifier: UC11668790
Identifier: etd-JacksonNic-6493.pdf (filename), usctheses-c89-26203 (legacy record id)
Legacy Identifier: etd-JacksonNic-6493.pdf
Dmrecord: 26203
Document Type: Dissertation
Rights: Jackson, Nicholas J.
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA