Lifestyle-related Exposures and Diseases in Twins
by
Yang Yu
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR EPIDEMIOLOGY)
August 2018
Copyright 2018 Yang Yu
Acknowledgements
The research included in this dissertation could not have been performed if not for the assistance,
patience, and support of many individuals. First and foremost, I would like to thank my advisor, Dr.
Wendy Cozen, for your continuous support of my Ph.D. study and research, and for your patience, motivation,
enthusiasm, and immense knowledge. Your guidance helped me throughout the research and writing of
this dissertation, and I sincerely thank you for your confidence in me. I am also deeply thankful for
your warm support when my life was in crisis. As an international student far from my family, I found in
you not just a research mentor but also someone like family. Thank you!
I would also like to thank my esteemed committee members: Dr. Thomas M Mack, Dr. Joshua
Millstein, Dr. Jane Figueiredo, Dr. William Depaolo and Dr. Andre J Ouellette. I am so grateful for your
enduring support and critical feedback. Dr. Mack, thank you for starting the California Twin Program and
creating such a precious resource for twin research. Without it, my dissertation would not exist at all. Dr.
Millstein, I am grateful to you for working with me together and standing by my side at the microbiome
workshop and conference, and for all your help on the statistical methods in this dissertation. Dr.
Figueiredo, I am so appreciative of your infectious enthusiasm and motivation on our birth anomaly
project. Thanks to Dr. William Depaolo and Dr. Andre J Ouellette for your insightful comments as experts
in microbiology. It is my honor to have you all serve as members of my committee.
Many thanks to my wonderful current and past colleagues, Amie Hwang, Jun Wang, Niquelle
Brown, Laura Buchanan, Manny Garcia, David Hyde, and Adam Muench for your daily support and
encouragement.
I would also like to acknowledge my dearest friends, Mei Gong, Shu Cao, Zilu Zhang and many
others, for your friendship and for being there to support me whenever I needed it.
Finally, I would like to extend my deepest gratitude to my parents Pinghua Yang and Shouming
Yu. Without your love, support and understanding, I could never have completed this doctoral degree.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1 Twin studies and California Twin Program
1.1 Twin Studies
1.1.1 Twins
1.1.2 History of Twin Studies
1.1.3 Twin study designs in epidemiology studies
1.1.4 Advantages of Twin Studies
1.1.5 Limitations of Twin Studies
1.2 California Twin Program
Chapter 1 References
Chapter 2 Birth anomalies in monozygotic and dizygotic twins: Results from the California Twin Registry
2.1 Abstract
2.2 Introduction
2.2.1 Overview of Birth Anomalies
2.2.2 Selected types of Birth Anomalies
2.2.3 Birth Anomalies in Twins
2.3 Materials and Methods
2.3.1 Study Population
2.3.2 Birth Anomalies
2.3.3 Covariates
2.3.4 Statistical Analysis
2.4 Results
2.4.1 Study Population Characteristics
2.4.2 Prevalence and Trend in Selected Birth Anomalies
2.4.3 Pairwise Concordance of Selected Birth Anomalies
2.4.4 Parental Exposures
2.4.5 Birth Characteristics
2.4.6 Sensitivity Analyses
2.5 Discussion
Chapter 2 References
Chapter 3 Epidemiology of Colorectal Cancer (CRC)
3.1 CRC around the World
3.2 CRC in the United States
3.2.1 CRC in California
3.3 Risk factors
3.3.1 Non-Modifiable Risk Factors
3.3.2 Modifiable Risk Factors
3.4 Conclusions
Chapter 3 References
Chapter 4 Lifestyle Factors and Colorectal Cancer risk in California Twins
4.1 Abstract
4.2 Introduction
4.3 Materials and Methods
4.3.1 Study Population
4.3.2 Case Ascertainment
4.3.3 Personal and Lifestyle Factors
4.3.4 Statistical Analysis
4.4 Results
4.4.1 Study Population Characteristics
4.4.2 Personal and Lifestyle Factors Associated with CRC
4.4.3 Sensitivity Analysis
4.5 Discussion
Chapter 4 References
Chapter 5 Introduction of Colorectal Polyps and the Gut Microbiome
5.1 Colorectal Polyps and Lesions
5.1.1 Introduction
5.1.2 Neoplastic Colorectal Polyps
5.1.3 CRC Screening by Colonoscopy
5.2 The Gut Microbiome
5.2.1 The Gut Microbiome in Human Health
5.2.2 Current Concerns in a Gut Microbiome Study
5.3 Current Knowledge on the Gut Microbiome, Colorectal Polyps and CRC
5.3.1 The Gut Microbiome and CRC
5.3.2 The Gut Microbiome and Colorectal Polyps
5.3.3 Driver-Passenger Bacterial Model
5.3.4 Clinical Applications
Chapter 5 References
Chapter 6 Persistent Fecal Microbiome Alteration among California Twins with Colorectal Adenomas History
6.1 Abstract
6.2 Introduction
6.3 Materials and Methods
6.3.1 Study Population
6.3.2 Colorectal Polyps Ascertainment
6.3.3 Fecal Samples Collection
6.3.4 16S rRNA Gene Sequencing Analysis
6.3.5 Covariates Measures
6.3.6 Statistical Methods
6.4 Results
6.4.1 Study Population Characteristics
6.4.2 Diversity Analysis
6.4.3 Compositional Analysis
6.4.4 Fecal Microbiota Stability
6.4.6 Sensitivity Analysis (Previous Adenomas)
6.4.7 Sensitivity Analysis (Excluding Genus Bacteroides)
6.4.8 Diet and the Fecal Microbiota
6.5 Discussion
Chapter 6 References
Appendix A Zero-inflated negative binomial mixed model
Chapter 7 Summary and Future Perspectives
Chapter 7 References
List of Tables
Table 1.1. Published studies in the California Twin Program (CTP)
Table 2.1. Demographic characteristics by birth anomaly status
Table 2.2. Frequency of birth anomalies among twin pairs by concordance and zygosity
Table 2.3. Co-occurrence of multiple birth anomalies in the same individual
Table 2.4. Co-occurrence of birth anomalies in twin pairs
Table 2.5. Pairwise Concordance Ratio between MZ and DZ like-sex twins for each birth anomaly
Table 2.6. Parental smoking status and risk of birth anomalies
Table 2.7. Pairwise Concordance Ratio between MZ and DZ like-sex twins for each birth anomaly, stratified by parental smoking status
Table 2.8. Maternal age and risk of birth anomalies
Table 2.9. Parental education and risk of birth anomalies
Table 2.10. Low birth weight and risk of birth anomalies
Table 2.11. Birth order and risk of birth anomalies
Table 2.12. Sensitivity analysis (Double-respondent pairs only): Birth anomalies among twins by zygosity
Table 2.13. Sensitivity analysis (Double-respondent pairs only): Pairwise Concordance Ratio between MZ and DZ like-sex twins for each birth anomaly
Table 2.14. Sensitivity analysis (Double-respondent pairs only): Parental smoking status and risk of birth anomalies
Table 2.15. Percentage agreement for the shared birth-related factors or parental exposures within double-respondent twin pairs
Table 2.16. Comparisons of heritability estimation for each selected birth anomaly using concordance methods and structural equation modelling (SEM) method
Table 3.1. Known lifestyle factors related to CRC risk
Table 4.1. Characteristics of 90 double-responded like-sex twin pairs discordant for CRC
Table 4.2. By tumor subsites: Characteristics of 90 double-responded like-sex twin pairs discordant for CRC
Table 4.3. Lifestyle factors and the risk of CRC
Table 4.4. By tumor subsites: Lifestyle factors and the risk of CRC
Table 4.5. Dietary factors and the risk of CRC
Table 4.6. By tumor subsites: Dietary factors and the risk of CRC
Table 4.7. Relative lifestyle factors and the risk of CRC
Table 4.8. By tumor subsites: Relative lifestyle factors and the risk of CRC
Table 4.9. Sensitivity analysis: Lifestyle factors and the risk of CRC
Table 4.10. Sensitivity analysis: Dietary factors and the risk of CRC
Table 4.11. Sensitivity analysis: Relative lifestyle factors and the risk of CRC
Table 5.1. Most common diversity measurements in a human microbiome analysis
Table 5.2. Published human studies of gut microbiota and colorectal polyps: Study design and methods
Table 5.3. Findings on fecal microbiota diversity and composition associated with colorectal polyps from 8 published similar studies
Table 5.4. List of fecal bacteria associated with colorectal polyps from at least 2 independent studies selected from 7 fecal microbiota studies using 16S rRNA gene sequencing
Table 6.1. Demographic and polyp characteristics of the study participants
Table 6.2. Differentially abundant taxa by polyp status (1st fecal collection)
Table 6.3. Demographic characteristics of the subset study participants who completed 2nd fecal collections
Table 6.4. Differentially abundant taxa by polyp status (2nd fecal collection)
Table 6.5. Differentially abundant taxa by polyp status (1st fecal collection) for previous adenomas comparison
Table 6.6. Alpha diversity and demographic characteristics
Table 6.7. Dietary changes since colonoscopy
List of Figures
Figure 2.1. Five-year prevalence rate of selected birth anomalies (per 1,000 persons)
Figure 6.1a. Flowchart of the overall study population nested in the CTP: Participants selection
Figure 6.1b. Flowchart of the overall study population nested in the CTP: Sample collections
Figure 6.2. Alpha diversity (1st fecal collection) by polyp status
Figure 6.3. Weighted UniFrac-based PCoA (1st fecal collection) by polyp status
Figure 6.4. Genus-level microbial composition (1st fecal collection) by polyp status
Figure 6.5. Alpha diversity and Beta diversity of fecal microbiota (2nd fecal collection)
Figure 6.6. Fecal microbial stability between 2 collections and between individuals
Figure 6.7. Alpha diversity and Beta diversity of fecal microbiota (1st fecal collection) for previous adenomas comparison
Figure 6.8. Nine species of the genus Bacteroides (1st fecal collection)
Figure 6.9. Alpha diversity of fecal microbiota (1st fecal collection) after excluding the genus Bacteroides
Figure 6.10. Weighted UniFrac-based PCoA (1st fecal collection) after excluding the genus Bacteroides
Figure 6.11. Weighted UniFrac-based PCoA (1st fecal collection) by demographic characteristics
Abstract
Twin studies are a valuable form of selected sampling. They allow the heritability of complex traits
to be estimated while controlling for early environmental exposures, and they allow associations
between environmental factors and diseases to be examined while at least partially controlling for
genetic background. The California Twin Program (CTP) is a population-based cohort representing a
sample of twins born in California between 1908 and 1982, created by Dr. Thomas Mack and his colleagues
at USC. In this dissertation, I selected different subsets of twins from the CTP for three studies: 1) a
cross-sectional study describing the prevalence of selected birth anomalies in California twins and
investigating the effect of heritability and potential parental exposures on the occurrence of each selected
birth anomaly; 2) a nested matched case-control study examining the associations between various
lifestyle factors and prospectively identified colorectal cancer in twins, with cases identified by matching
the California Cancer Registry (CCR) and Cancer Surveillance Program (CSP) with the CTP; and 3) a
cross-sectional study investigating the relationship between the fecal microbiome and colorectal polyp
history in monozygotic twins. I hope to show the merit of twin studies in this genome sequencing era and to
contribute knowledge about causes of disease in relation to various lifestyle factors in a naturally
genetically controlled setting, thus eventually providing insight for developing strategies for disease
prevention, diagnosis and therapy.
Chapter 1 Twin studies and California Twin Program
1.1 Twin Studies
The classic twin study is established as an ideal study design with which to identify the common
genetic or environmental factors associated with a disease or phenotype. Most diseases, particularly
chronic diseases, are multifactorial complex phenotypes caused by multiple genetic variants,
shared early-life exposures, and unique environmental exposures later in life. The familial
aggregation of a complex disease reflects not only genetics but also the combined effect of common
environmental factors shared within a family. Because of this complexity, a standard
epidemiologic study commonly suffers from a lack of power, measurement bias and, most of all,
residual confounding from the failure to account for all the factors associated with the disease process.
Because twins are naturally matched on age, race/ethnicity, genetics and other early exposures, twin
studies can disentangle the shared genetic and environmental factors of a disease of interest with improved
power and a reliable data source. Therefore, twin studies continue to play an important role in the
postgenomic era, especially in the study of complex traits and diseases.
1.1.1 Twins
Twins are two live births produced by the same pregnancy. Twins can be derived from a single
fertilized egg that splits and develops into two embryos, producing genetically identical
monozygotic (MZ) twins. Alternatively, twins can be derived from two different eggs
fertilized by separate sperm at the same time, which develop into two embryos, producing dizygotic (DZ)
or fraternal twins, who share approximately 50% of their genome and are genetically no different from
ordinary siblings [1]. Therefore, the zygosity of twins indicates the degree of genomic identity between them.
By the nature of genetics, DZ twins can be of like or unlike sex, while MZ twins can only be like-sex pairs.
Since the 1970s, natural twinning rates have been highest in some African countries (17 or more
per 1000 live births), followed by Europe, the USA and India (9-16 per 1000 live births), and East Asia and
Oceania (less than 8 per 1000 live births) [2-5]. This rate varies by zygosity: the MZ twinning rate is
consistently 3-4 per 1000 live births across the world, while the DZ twinning rate accounts for most of
the variation among races [4, 6, 7]. The exact mechanism behind the geographic differences in twinning
rates is still unknown. Current evidence shows that DZ twin births are more common among older
mothers who underwent in vitro fertilization (IVF) [6], whereas the division of the zygote into two
embryos in MZ twinning is considered a spontaneous random event [6]. In the United States, the
twin birth rate was 33.3 per 1000 live births in 2009, a 76% increase from 1980 [8]. The rising twinning
trend has been attributed to advanced maternal age, prenatal multivitamin supplementation,
fertility drugs or treatments, and family history of multiple births [4, 5].
1.1.2 History of Twin Studies
The earliest documentation of twin studies goes back to the 5th century BCE, when Hippocrates
attributed disease similarity in twins to their shared material environment [9]. In 1869,
British scientist Sir Francis Galton became the first to use twins to study genetic and
environmental effects on human development and behavior. Galton proposed that twins were
initially similar due to their shared early environment and then grew to be different due to the non-shared
environment of adulthood, once they left their parents’ home. However, his work did not
differentiate between identical (MZ) twins and DZ twins [10]. It was not until 1924 that Curtis Merriman
first published the physiological difference between MZ and DZ twins: MZ twins are genetically
identical because they develop from the same ovum and sperm, while DZ twins are genetically no
different from ordinary siblings because they develop from different ova and sperm [10, 11]. In the
same year, German dermatologist Hermann Werner Siemens published a paper describing the classic
twin method. Siemens described the difference in the concordance of physical traits between MZ and
DZ twins and suggested that this difference reflected the heritability of the trait [10, 12]. Because
MZ twins share, on average, twice as much of their genome as DZ twins, a two-fold
concordance rate ratio for a dichotomous trait or phenotype among MZ twins compared to DZ twins
suggests that the trait or phenotype is partially heritable.
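The two-fold concordance rate ratio can be made concrete with a short calculation. The sketch below is only an illustration and is not part of the analyses in this dissertation: it assumes the pairwise definition of concordance (concordant pairs divided by all pairs with at least one affected twin), and the function names and pair counts are hypothetical.

```python
def pairwise_concordance(concordant_pairs: int, discordant_pairs: int) -> float:
    """Proportion of affected pairs in which both twins have the trait."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("at least one affected pair is required")
    return concordant_pairs / total


def concordance_ratio(mz_concordance: float, dz_concordance: float) -> float:
    """MZ concordance divided by DZ concordance."""
    if dz_concordance == 0:
        raise ValueError("DZ concordance is zero; the ratio is undefined")
    return mz_concordance / dz_concordance


# Hypothetical pair counts, invented for illustration only (not study data).
mz_rate = pairwise_concordance(concordant_pairs=30, discordant_pairs=70)
dz_rate = pairwise_concordance(concordant_pairs=15, discordant_pairs=85)
ratio = concordance_ratio(mz_rate, dz_rate)

print(f"MZ concordance: {mz_rate:.2f}")
print(f"DZ concordance: {dz_rate:.2f}")
print(f"MZ:DZ concordance ratio: {ratio:.1f}")
```

With these invented counts, the MZ concordance is twice the DZ concordance, which is the pattern the classic twin method reads as evidence that a trait is partially heritable.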
Since the 1980s, the need for large sample sizes in genetic epidemiology studies has led to the
emergence of twin registries worldwide. To date, more than 50 twin registries have been founded in countries
including the United States, Taiwan, Japan, Korea, and several European countries [13-15]. As a valuable
resource for scientific research, twin registries have contributed significantly to a wide range of
research areas, from prenatal life to adulthood, on traits of biomedical and social importance. Many
registries remain active and continue recruiting, collecting longitudinal phenotypic data together with
biological materials such as blood and tissue from twins in this genome sequencing and omics era. At the
time of this review, PubMed listed 36,229 entries for “twins” in human research, up from 713
entries for 1990 [14].
1.1.3 Twin study designs in epidemiology studies
The power of twin study designs arises from the fact that twins are fully (MZ twins) or
partially (DZ twins) genetically matched and share many early-life exposures, so the relative
differences within a twin pair can be attributed to their unique experiences. As a result, by comparing the
similarities and differences within twin pairs (e.g. MZ twins) and between twin pairs (e.g. MZ vs. DZ twin
pairs, or disease-concordant vs. discordant pairs), we are able to disentangle the effects of heritable
genetic factors, shared environment, and unshared individual environment on disease etiology.
Many innovative study designs have been developed to exploit the unique properties of twins to
investigate genetic and environmental influences in complex human diseases [15-18]. The following
discussion focuses on the study designs relevant to this dissertation.
Classic twin studies for genetic effects. As discussed above, the classic twin study is based on
phenotypic comparisons between MZ and DZ twins. MZ twins develop from the same fertilized egg and
share identical genotypes and family circumstances. Therefore, any phenotypic differences
between MZ twins are likely due to their environmental exposures. In contrast, DZ twins are like
regular siblings, who share on average 50% of their segregating genetic material. DZ twins are also the same
age and share, to a degree, the same early life exposures within the family. Under the assumption that
both MZ and DZ twins share unmeasured environmental factors to the same extent (“Equal
Environment Assumption”, EEA), any additional resemblance for a trait or phenotype among MZ twins
compared with DZ twins suggests a contribution from heritable genetic factors rather than from shared
environmental exposures in the family.
For dichotomous traits such as the presence or absence of a disease, the most common method
is to count the disease concordant twin pairs, in which both members of the pair have developed
the disease. The concordance can then be compared between MZ and DZ twins. Depending on the method of
twin ascertainment, one of two measures is used to estimate twin concordance: pairwise
concordance or probandwise concordance [19]:
\[
\text{Pairwise concordance} = \frac{C}{C + D}
\]
\[
\text{Probandwise concordance} = \frac{2C}{2C + D}
\]
where C is the number of concordant pairs and D is the number of discordant pairs.
Whether pairwise or probandwise concordance is computed depends on the method used
to assemble the twin study population, also called twin ascertainment. If a cohort of twins is
created based on twin status, such as from birth records, and then followed
prospectively to identify disease status, the ascertainment is considered random or
complete because it is population-based. Under random or complete ascertainment, pairwise
concordance is unbiased and sufficient to estimate the disease concordance [19]. However, if a cohort is
selected based on disease status and then linked to birth records to identify twin status, the
ascertainment is incomplete, because the disease status of each twin contributes to the probability of
the pair being recruited into the study [20]. In this situation, the members of a disease-concordant twin pair can
appear twice in the disease registry, while a discordant pair can appear only once, since only one twin
has the disease. As a result, concordant twin pairs are about twice as likely to be recruited into
the study as discordant twin pairs. Therefore, under incomplete ascertainment, probandwise
concordance is the only appropriate measure to estimate the disease concordance [19, 20]. For either
measure, when the EEA holds, a two-fold or higher excess in disease concordance among MZ twins
compared to DZ twins suggests a contribution of genetic factors shared by twins to the
disease etiology [21]. Witte et al. derived formulas for the estimators of both concordance
measures and likelihood-based statistical tests for the difference in concordance between MZ and
DZ twins [19].
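
As a minimal sketch of the two concordance measures defined above, the following Python functions compute pairwise and probandwise concordance from pair counts and compare MZ with DZ twins. The function names and the counts are hypothetical and chosen only for illustration; the likelihood-based estimators and tests of Witte et al. [19] are not reproduced here.

```python
def pairwise_concordance(concordant: int, discordant: int) -> float:
    """C / (C + D): proportion of affected pairs in which both twins are affected."""
    return concordant / (concordant + discordant)


def probandwise_concordance(concordant: int, discordant: int) -> float:
    """2C / (2C + D): proportion of affected twins whose co-twin is also affected."""
    return 2 * concordant / (2 * concordant + discordant)


# Hypothetical pair counts, for illustration only (not CTP data).
mz_counts = {"concordant": 20, "discordant": 80}
dz_counts = {"concordant": 10, "discordant": 190}

mz_probandwise = probandwise_concordance(**mz_counts)
dz_probandwise = probandwise_concordance(**dz_counts)
mz_dz_ratio = mz_probandwise / dz_probandwise

print(f"MZ probandwise concordance: {mz_probandwise:.3f}")
print(f"DZ probandwise concordance: {dz_probandwise:.3f}")
print(f"MZ:DZ ratio: {mz_dz_ratio:.2f}")
```

With these made-up counts the MZ:DZ ratio exceeds two, which under the EEA would be read as suggesting a genetic contribution.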
Two major concerns should be addressed in the classic twin study design. First, the key
assumption for correctly interpreting the results is the equal environment assumption (EEA), that is, MZ and
DZ twins share unmeasured non-genetic factors to the same extent. However, some have shown that MZ
twins may experience more environmental similarities than DZ twins, since parents may treat MZ twins
more similarly than DZ twins, and MZ twins are likely to stay closer to each other than DZ twins do and
thus influence each other’s behaviors as peers [21, 22]. In this case, the EEA is not met, and
the excess concordance for disease observed in MZ twins may reflect the combined effect of
greater genetic and environmental similarity. On the other hand, studies conducted among twins with
wrongly assigned zygosity showed that violation of the EEA might not lead to substantial bias in the
interpretation of the data [23]. Even so, results from classic twin studies should be interpreted with
caution. Second, the concordance estimator is a crude measure of the genetic effect on a trait of interest. The
concordance for MZ twins sets the upper limit on the individual risk attributable to genetic factors. The
difference in concordance between MZ and DZ twins cannot be interpreted as the exact proportion of
genetic contribution to the trait variation, since the absolute values of concordance are influenced by
the underlying prevalence of the trait [18]. More sophisticated methods, such as
variance components approaches, have been developed to estimate the heritability of a disease, defined as
the proportion of variance attributable to genetic factors [24]. In our view, these methods do
not improve on simple concordance measures for a dichotomous
disease, since they suffer from the same limitations and additionally assume an underlying
continuous and normally distributed “liability” variable, with disease occurring in people whose liability lies
above a given threshold. A true estimate of heritability would have to be based on whole-genome
sequence data.
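
To make the liability-threshold assumption concrete, the sketch below shows how a disease prevalence maps to a threshold on a standard normal liability scale, so that the fraction of people above the threshold equals the prevalence. It illustrates the assumption only and is not a heritability estimator; the function name and the 5% prevalence are hypothetical.

```python
from statistics import NormalDist


def liability_threshold(prevalence: float) -> float:
    """Threshold t on a standard normal liability L such that P(L > t) = prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must be strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


# Hypothetical prevalence of 5%, for illustration only.
threshold = liability_threshold(0.05)
print(round(threshold, 3))  # 1.645
```

A rarer disease corresponds to a higher threshold, which is one way to see why absolute concordance values depend on the underlying prevalence.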
Case-control studies for environmental effects. Comparisons between members of twin pairs are
useful for investigating environmental effects on a trait or phenotype. In a matched case-control study
using disease or trait discordant twin pairs, the twin with the disease or trait is designated as the “case”
and the unaffected co-twin is designated as the “control”. Individual exposures can be measured in
the usual way, as in any case-control study. This study design is also called the co-twin control method,
which is equivalent to any standard matched case-control study, except that twins are naturally
matched on partial (for DZ) or full (for MZ) genotype, age, race/ethnicity, early life exposures and other
family circumstances. Accordingly, conditional logistic regression is used to estimate the effect
size. Since discordance for disease in MZ twins is explained by unshared environmental factors, whereas
discordance in DZ twins reflects the combined effect of unshared genetic and environmental factors,
a difference between the associations observed in DZ twins and those in MZ twins can suggest the
magnitude of the contribution of unshared genetic factors in DZ twins and potential
gene-by-environment interactions to the disease etiology.
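
For the simplest co-twin control analysis, with a single binary exposure and one case and one control per pair, the conditional maximum likelihood estimate of the odds ratio reduces to the ratio of the two kinds of exposure-discordant pairs. The sketch below illustrates that special case only. The function name and counts are hypothetical, the confidence interval is a Wald approximation on the log scale, and a full conditional logistic regression would be needed for continuous exposures or covariate adjustment.

```python
import math


def matched_pair_odds_ratio(case_only_exposed: int, control_only_exposed: int):
    """Odds ratio and approximate 95% CI for 1:1 matched pairs with a binary exposure.

    case_only_exposed:    pairs in which only the affected twin was exposed
    control_only_exposed: pairs in which only the unaffected co-twin was exposed
    Pairs concordant for exposure carry no information and are ignored.
    """
    if case_only_exposed <= 0 or control_only_exposed <= 0:
        raise ValueError("both exposure-discordant counts must be positive")
    odds_ratio = case_only_exposed / control_only_exposed
    se_log_or = math.sqrt(1 / case_only_exposed + 1 / control_only_exposed)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts of exposure-discordant twin pairs (not CTP data).
odds_ratio, lower, upper = matched_pair_odds_ratio(30, 15)
print(f"OR = {odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Running the same calculation separately in MZ and DZ pairs gives the two associations whose difference is discussed above.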
In addition, twin studies can be useful for studying traits and phenotypes resulting from
prenatal exposures, such as birth defects. These are poorly studied because of the difficulty of measuring
the exposures correctly when the parents are unavailable or deceased. Reports of common
exposures from both members of a twin pair can be used for self-validation to reduce
misclassification. Twin pairs can also be used as the unit of exposure. In this method, a twin pair
with the disease (either or both twins have the disease or phenotype) is treated as a “case” and an
unaffected twin pair is treated as a “control”. The common environmental effect on a trait in
twin pairs can then be estimated as in any other case-control study using logistic regression. When the
sample size is large enough, a potential dose-response relationship between the common exposure and
a disease can be evaluated by comparing disease concordant and disease discordant twin pairs with
unaffected twin pairs using multinomial logistic regression. Finally, to maximize the potential of large
population-based twin registries, we can use the entire cohort to study environmental factors by
comparing diseased individuals (“cases”) with non-diseased individuals (“controls”).
The correlation within twin pairs can be taken into account using generalized estimating
equations, or removed by randomly selecting one twin from each pair for
logistic regression. If disease status is ascertained prospectively from a disease registry, this study
design can be extended to a prospective cohort study with the appropriate statistical methods.
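
The random-selection approach mentioned above can be sketched as follows: keep one randomly chosen twin per pair so that the remaining observations are independent before fitting an ordinary logistic regression. The record layout (`pair_id`, `twin_id`, `diseased`) and the function name are assumptions made for illustration, not the CTP data structure.

```python
import random
from collections import defaultdict


def select_one_twin_per_pair(records, seed=0):
    """Return one randomly chosen record per twin pair.

    `records` is an iterable of dicts with a hypothetical "pair_id" key.
    A fixed seed keeps the selection reproducible.
    """
    rng = random.Random(seed)
    members_by_pair = defaultdict(list)
    for record in records:
        members_by_pair[record["pair_id"]].append(record)
    return [rng.choice(members) for members in members_by_pair.values()]


# Hypothetical toy cohort: two double respondent pairs and one single respondent pair.
cohort = [
    {"pair_id": 1, "twin_id": "A", "diseased": True},
    {"pair_id": 1, "twin_id": "B", "diseased": False},
    {"pair_id": 2, "twin_id": "A", "diseased": False},
    {"pair_id": 2, "twin_id": "B", "diseased": False},
    {"pair_id": 3, "twin_id": "A", "diseased": True},
]
independent_sample = select_one_twin_per_pair(cohort)
print([(r["pair_id"], r["twin_id"]) for r in independent_sample])
```

Random selection discards information from the unselected twins, which is the price paid for independence; generalized estimating equations keep all observations and model the within-pair correlation instead.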
MZ twin studies. MZ twin study designs are extensions of the matched case-control study as well as
classic twin studies. As discussed above, the unique advantage of the MZ twin design is the ability to
study phenotypic discordance against a background of equivalent genotype and family circumstances. This ideally
matched case-control design allows the most confident conclusions about the causal associations
of interest. In classic twin studies, disease concordance in MZ twins indicates common genetic
factors related to the disease of interest, thus giving valuable information on disease penetrance. If
disease concordant MZ twins can be genotyped at candidate loci or marker loci, they may
offer specific advantages for identifying disease-specific genotypes or alleles.
Moreover, although MZ twins are genetically nearly identical, their genomes are not exactly the
same, which gives each twin a unique phenotype. Apart from environmental factors,
differences within MZ twin pairs are usually due to inactivation of different X chromosomes
in female MZ twins, unequal distribution of mitochondria, stochastic somatic mutations, stochastic
differences in T and B cell receptor rearrangement, and rearrangement of epigenetic patterns in utero with
further modifications throughout life [25-27]. Therefore, discordant MZ twin designs have received
much interest in recent years for studying molecular biology. Recent whole genome sequencing efforts
suggest that DNA sequence differences across the entire genome between MZ twins are not large
[28-30]. However, an exact estimate of this DNA variation has not been reported. With longitudinal
data collection from large twin registries and improved human genome sequencing technology,
increasing attention is being paid to the phenotypic impact of genetic, epigenetic, and biomarker changes,
along with environmental influences, throughout life in MZ twins.
1.1.4 Advantages of Twin Studies
As discussed above, the major advantage of twin studies is to allow disentanglement of the
shared genetic, shared environmental and unshared environmental factors for the trait of interest. With
the genetic similarity in the disease concordant twins, we can crudely estimate heritability of a disease
and further identify the disease specific genotypes that may be a causal component in the disease
etiology using genotyping and sequencing technology. In the meantime, by controlling for the genetic
influence on the disease in the disease discordant twin pairs, we are able to examine the associations
between the environmental factors and the disease. Particularly, when the studies are restricted to MZ
twin pairs with perfectly matched genetic and family background, there is higher confidence in assuming
a causal relationship (because of a more valid comparison). Finally, the comparison of the disease
occurrence between MZ and DZ twin pairs under a certain environmental influence serves as a crude
indicator of possible genetic susceptibility to the environmental effects in the disease.
Moreover, twin studies have a unique advantage in providing relatively reliable measures of
exposures and outcomes. Twins grow up together and are compared with each other throughout their
lives with respect to developmental, physical and behavioral differences. Therefore, when one twin
is unavailable, as in a single respondent twin pair, the responding co-twin can provide
proxy information for the nonresponding twin. Meanwhile, for a double respondent twin pair, in which
both twins respond, comparing each twin's self-report with the proxy information
provided by the co-twin can provide evidence on the accuracy of proxy measures and on the validity of
including single respondent twin pairs in the study. A study of breast cancer in female twins by our group
showed that the information obtained from the responding twin in single respondent twin pairs
is reliable for both twins and that twins are dependable proxies for each other [31]. Furthermore,
given their life-long comparison with each other, twins can report relative qualitative differences when
absolute values might be difficult to measure or recall, for example, which twin had the higher birth
weight rather than each twin's exact birth weight.
In addition, because of a natural interest in being twins, twins and their families are
more motivated to participate in research studies. Twins can help each other to participate and
follow up, since one twin can help to locate and establish communication with the other
twin. As a pair of individuals who usually have similar early life experiences, such as education,
twins are also more likely to interpret and answer questions in a similar way. Therefore, in practice,
studies with twins commonly have a good response rate and high compliance during follow-up [32].
1.1.5 Limitations of Twin Studies
Some limitations of twin studies, such as violation of the equal environment assumption (EEA) and
genetic differences between MZ twins, have been addressed above in the context of specific study designs.
Another common criticism of twin studies is that their results may not be directly
generalizable to the general population. Twins differ from the general population in their developmental
environment, as two fetuses grow simultaneously. In addition, genetic factors (e.g. a family history of twins) or
environmental influences (e.g. in vitro fertilization techniques) may increase the incidence of twinning [4, 5].
Therefore, twins are not a random sample of the general population. Nevertheless, studies on
personality and intelligence suggest that twins may differ little from their non-twin siblings [33].
In summary, no single study design can address all issues. It is important to understand the
properties of the sample studied in relation to the specific study design. Then, by taking advantage of the
nature of twins, we can maximize the scientific information gained from an appropriate research study.
1.2 California Twin Program
The California Twin Program (CTP) is housed at the University of Southern California as a population-based
cohort representing a sample of twins born in California between 1908 and 1982. The program
development and representativeness have been described elsewhere [34, 35]. Briefly, 256,616 multiple
births were identified through birth records and linked with California Department of Motor
Vehicles (DMV) records for address information between 1990 and 2001, and 161,109 individuals were
matched. Among the matched individuals, questionnaires were mailed to the 115,733 with valid addresses.
A total of 52,262 individuals, comprising 14,827 double respondent twin pairs (both members of the
pair responded) and 22,608 single respondent twin pairs (only one member of the pair responded),
completed and returned questionnaires. The crude overall response rate is 45.2%, which is comparable
with or higher than that of similar cohort studies [36].
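
The respondent counts and the response rate quoted above can be checked with a few lines of arithmetic. The sketch below uses only the figures given in this section; the variable names are illustrative.

```python
double_respondent_pairs = 14_827   # both twins responded
single_respondent_pairs = 22_608   # only one twin responded
questionnaires_mailed = 115_733    # individuals with valid addresses

# Each double respondent pair contributes two individuals, each single respondent pair one.
respondents = 2 * double_respondent_pairs + single_respondent_pairs
responding_pairs = double_respondent_pairs + single_respondent_pairs
crude_response_rate = respondents / questionnaires_mailed

print(respondents)                    # 52262
print(responding_pairs)               # 37435
print(f"{crude_response_rate:.1%}")   # 45.2%
```

The individual-level count (52,262) and the pair-level count (37,435) answer different questions, so it helps to state which unit a given analysis uses.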
In comparison with census data, the responding group is representative of the underlying
population with regard to age, sex, race and residential distribution. Some exceptions have been
described elsewhere in detail [34, 35]. The major differences are that younger females were
overrepresented in the cohort, especially in the double respondent pairs, relative to the California-born 1990
resident population, and that male respondents were more likely to be married while older female
respondents were more likely to be unmarried. This possibly results from incomplete records of
maiden names for older married females in the DMV system, which was founded in 1960. In addition,
compared to the 1990 resident population, the study cohort slightly overrepresented the Hispanic population
and had less education. Compared to the birth cohort from California records of multiple births,
the proportion of MZ twins in the responding group is slightly lower than estimated and the
proportion of DZ twins correspondingly slightly higher, but the distribution of zygosity and sex in twin
pairs is very similar.
Two versions of a 16-page questionnaire were sent to the cohort during two time periods based
on birth year: Version 1 was sent in 1992 to twins who were born before 1956; Version 2 was sent from
1998 to 2001 to twins who were born from 1957 to 1982. Both versions include questions on
demographic characteristics (such as age, sex, zygosity, education, height, weight), childhood growth and
development, reproductive history, medical history including diagnosed diseases in twins
and their family members and personal medication use, dietary preferences and frequency, and lifestyle
(such as smoking, alcohol consumption and exercise). Approximately 90% of the questions are identical
in the two versions, except that Version 2 has an updated list of diseases and food items. A
unique feature of this set of questionnaires is that some questions query the same factors for both the
respondent and his or her co-twin, which can be used to validate answers for double respondents or
to substitute proxy answers from a single respondent for his or her co-twin.
Thus, the California Twin Program has been a valuable resource for scientific research since its
establishment. Twenty-three related articles listed in Table 1.1 have been published in peer-reviewed
journals. Given the comprehensive cross-sectional measures of life experience collected with the CTP
questionnaires at the time of participation, the CTP data have been used for estimating the prevalence and
heritability of phenotypic traits and examining associations between risk factors and health outcomes. A
number of nested studies successfully re-contacted participating twins to collect updated or
additional information and biospecimens (e.g. DNA samples for genotyping and
sequencing, stool samples for microbiome analysis). Furthermore, several other projects nested in the CTP
are ongoing, involving environmental exposures (e.g. parental exposures, birth characteristics,
and lifestyle factors) and the potential influences of the internal exposome (e.g. DNA methylation,
microbiome) associated with disease phenotypes (e.g. birth anomalies, mononucleosis, and colorectal
cancers). Among them, three nested studies are included in this dissertation: 1) a cross-sectional study
to describe the prevalence of selected birth anomalies in California twins and investigate the crude
heritability and potential parental exposures for each birth anomaly (Chapter 2); 2) a nested matched
case-control study to examine the associations between various lifestyle factors and colorectal cancer
in twin cases and controls prospectively identified by the California Cancer Registry (CCR) and the Cancer
Surveillance Program (CSP) (Chapter 4); 3) a cross-sectional study to investigate the relationship
between the fecal microbiome and colorectal polyps in monozygotic twins (Chapter 6).
Chapter 1 References:
1. Cummings, M., Human heredity: principles and issues. 2015: Cengage learning.
2. Smits, J. and C. Monden, Twinning across the developing world. PLoS One, 2011. 6(9): p. e25239.
3. Imaizumi, Y., Trends of twinning rates in ten countries, 1972-1996. Acta geneticae medicae et
gemellologiae: twin research, 1997. 46(04): p. 209-218.
4. Imaizumi, Y., A comparative study of zygotic twinning and triplet rates in eight countries, 1972–
1999. Journal of biosocial science, 2003. 35(02): p. 287-302.
5. Imaizumi, Y., A comparative study of twinning and triplet rates in 17 countries, 1972-1996. Acta
geneticae medicae et gemellologiae: twin research, 1998. 47(02): p. 101-114.
6. Bortolus, R., et al., The epidemiology of multiple births. Human Reproduction Update, 1999. 5(2):
p. 179-187.
7. Oleszczuk, J.J., et al., Projections of population-based twinning rates through the year 2100. The
Journal of reproductive medicine, 1999. 44(11): p. 913-921.
8. Martin, J.A., et al., Three decades of twin births in the United States, 1980-2009. 2012: Citeseer.
9. Jones, W.H.S., E.T. Withington, and P. Potter, Hippocrates. Vol. 3. 1928: Loeb Classical Library.
10. Rende, R.D., R. Plomin, and S.G. Vandenberg, Who discovered the twin method? Behavior
genetics, 1990. 20(2): p. 277-285.
11. Merriman, C., The intellectual resemblance of twins. Psychological Monographs, 1924. 33(5): p. i.
12. Siemens, H.W., Die Zwillingspathologie: ihre Bedeutung, ihre Methodik, ihre bisherigen Ergebnisse. Berlin: Julius
Springer, 1924.
13. Boomsma, D.I., Twin registers in Europe: an overview. Twin Research, 1998. 1(1): p. 34-51.
14. Busjahn, A. and Y.-M. Hur, Twin registries: An ongoing success story. Twin Research and Human
Genetics, 2006. 9(06): p. 705-705.
15. Van Dongen, J., et al., The continuing value of twin studies in the omics era. Nature Reviews
Genetics, 2012. 13(9): p. 640-653.
16. MacGregor, A.J., et al., Twins: novel uses to study complex traits and genetic diseases. Trends in
Genetics, 2000. 16(3): p. 131-134.
17. Boomsma, D., A. Busjahn, and L. Peltonen, Classical twin studies and beyond. Nature reviews
genetics, 2002. 3(11): p. 872-882.
18. Snieder, H., X. Wang, and A.J. MacGregor, Twin methodology. In: eLS. John Wiley & Sons Ltd,
Chichester. http://www.els.net [doi: 10.1002/9780470015902.a0005421.pub2], 2010.
19. Witte, J.S., J.B. Carlin, and J.L. Hopper, Likelihood-based approach to estimating twin
concordance for dichotomous traits. Genet Epidemiol, 1999. 16(3): p. 290-304.
20. McGue, M., When assessing twin concordance, use the probandwise not the pairwise rate. 1992.
21. Tishler, P.V. and V.J. Carey, Can comparison of MZ-and DZ-twin concordance rates be used
invariably to estimate heritability? Twin Research and Human Genetics, 2007. 10(05): p. 712-717.
22. Guo, G. and E. Stearns, The social influences on the realization of genetic potential for
intellectual development. Social forces, 2002. 80(3): p. 881-910.
23. Kyvik, K., Generalisability and assumptions of twin studies. Advances in twin and sib-pair analysis,
2000: p. 67-78.
24. Fisher, R.A., XV.—The correlation between relatives on the supposition of Mendelian inheritance.
Transactions of the royal society of Edinburgh, 1919. 52(02): p. 399-433.
25. Martin, N., D. Boomsma, and G. Machin, A twin-pronged attack on complex traits. Nature
genetics, 1997. 17(4): p. 387-392.
26. Petronis, A., Human morbid genetics revisited: relevance of epigenetics. Trends in Genetics, 2001.
17(3): p. 142-146.
27. Machin, G.A., Some causes of genotypic and phenotypic discordance in monozygotic twin pairs.
American journal of medical genetics, 1996. 61(3): p. 216-228.
28. Li, R., et al., Somatic point mutations occurring early in development: a monozygotic twin study.
Journal of medical genetics, 2013: p. jmedgenet-2013-101712.
29. Veenma, D., et al., Copy number detection in discordant monozygotic twins of Congenital
Diaphragmatic Hernia (CDH) and Esophageal Atresia (EA) cohorts. European Journal of Human
Genetics, 2012. 20(3): p. 298-304.
30. Baranzini, S.E., et al., Genome, epigenome and RNA sequences of monozygotic twins discordant
for multiple sclerosis. Nature, 2010. 464(7293): p. 1351-1356.
31. Hamilton, A.S. and T.M. Mack, Use of twins as mutual proxy respondents in a case-control study
of breast cancer: Effect of item nonresponse and misclassification. American journal of
epidemiology, 2000. 152(11): p. 1093-1103.
32. Cockburn, M.G., et al., Twins as willing research participants: Successes from studies nested
within the California Twin Program. Twin Research and Human Genetics, 2006. 9(06): p. 927-932.
33. Deary, I.J., F.M. Spinath, and T.C. Bates, Genetics of intelligence. European Journal of Human
Genetics, 2006. 14(6): p. 690-700.
34. Cockburn, M., et al., The occurrence of chronic disease and other conditions in a large
population-based cohort of native Californian twins. Twin Res, 2002. 5(5): p. 460-7.
35. Cockburn, M.G., et al., Development and representativeness of a large population-based cohort
of native Californian twins. Twin Res, 2001. 4(4): p. 242-50.
36. Cozen, W., et al., The USC Adult Twin Cohorts: International Twin Study and California Twin
Program. Twin Res Hum Genet, 2013. 16(1): p. 366-70.
37. Ringold, D.A., et al., Further evidence for a strong genetic influence on the development of
autoimmune thyroid disease: the California twin study. Thyroid, 2002. 12(8): p. 647-653.
38. Hawkins, S.A., et al., An estimate of physical activity prevalence in a large population-based
cohort. Medicine and science in sports and exercise, 2004. 36(2): p. 253-260.
39. Zhang, L., et al., Functional allelic heterogeneity and pleiotropy of a repeat polymorphism in
tyrosine hydroxylase: prediction of catecholamines and response to stress in twins. Physiological
genomics, 2004. 19(3): p. 277-291.
40. Cozen, W., et al., Th1 and Th2 cytokines and IgE levels in identical twins with varying levels of
cigarette consumption. Journal of clinical immunology, 2004. 24(6): p. 617-622.
41. Hamilton, A.S., et al., Gender differences in determinants of smoking initiation and persistence in
California twins. Cancer Epidemiology Biomarkers & Prevention, 2006. 15(6): p. 1189-1197.
42. Chen, Y., et al., Naturally occurring human genetic variation in the 3′-untranslated region of
the secretory protein chromogranin A is associated with autonomic blood pressure regulation
and hypertension in a sex-dependent fashion. Journal of the American College of Cardiology,
2008. 52(18): p. 1468-1481.
43. Cozen, W., et al., Use of an electrostatic dust cloth for self-administered home allergen collection.
Twin Research and Human Genetics, 2008. 11(02): p. 150-155.
44. Ursin, G., et al., The relative importance of genetics and environment on mammographic density.
Cancer Epidemiology Biomarkers & Prevention, 2009. 18(1): p. 102-112.
45. Wang, L., et al., Neuropeptide Y1 receptor NPY1R: Discovery of naturally occurring human
genetic variants governing gene expression in cella as well as pleiotropic effects on autonomic
activity and blood pressure in vivo. Journal of the American College of Cardiology, 2009. 54(10):
p. 944-954.
46. Hwang, A., et al., Evidence of genetic susceptibility to infectious mononucleosis: a twin study.
Epidemiology and infection, 2012. 140(11): p. 2089-2095.
47. Hwang, A.E., et al., Childhood infections and adult height in monozygotic twin pairs. American
journal of epidemiology, 2013. 178(4): p. 551-558.
48. Miller, K.A., et al., Prevalence and predictors of recent skin examination in a population-based
twin cohort. Cancer Epidemiology Biomarkers & Prevention, 2015. 24(8): p. 1190-1198.
49. Silventoinen, K., et al., The CODATwins project: The cohort description of Collaborative Project of
Development of Anthropometrical Measures in Twins to study macro-environmental variation in
genetic and environmental effects on anthropometric traits. Twin Research and Human Genetics,
2015. 18(04): p. 348-360.
50. Jelenkovic, A., et al., Zygosity Differences in Height and Body Mass Index of Twins From Infancy
to Old Age: A Study of the CODATwins Project. Twin Research and Human Genetics, 2015. 18(05):
p. 557-570.
Table 1.1 Published studies in the California Twin Program (CTP)
Authors and Year Twins Study Purpose Study Design Statistical Method
Cockburn et al., 2001
[35]
29,951 pairs (MZ/DZ) Describe the development and characteristics in
CTP
Descriptive
Cockburn et al., 2002
[34]
36,965 pairs (MZ/DZ) Describe the occurrence of chronic disease and
other conditions in CTP;
Estimate the incident cancers and deaths during
10 years of follow-up
Descriptive
Ringold et al., 2002
[37]
13,708 pairs (MZ/DZ) Estimate genetic influences on autoimmune
thyroid disease
Classic twin
design
Pairwise/Probandwise concordance
measures for heritability
Hawkins et al., 2003
[38]
40,261 twins (MZ/DZ) Estimate physical activity prevalence;
Characterize the association between physical
activity and health outcome
Cross-
sectional
Logistic regression with randomly
selected one twin in pairs
Zhang et al., 2004 [39] 148 pairs (MZ/DZ) Test the relationship tyrosine hydroxylase
polymorphism, catecholamines secretion and
cardiovascular reactivity to stress;
Estimate heritability of autonomic function in
biochemical and physiological traits
Cross-
sectional;
Twin
phenotyping
Multivariate ANOVA;
Variance component analysis for
heritability
Cozen et al., 2004 [40] 45 pairs (Double
respondent MZ twins)
Study the association between Th1/Th2 cytokines
and IgE levels and cigarette smoking
Nested
prospective
cohort
ANCOVA with a term for twin pairs
Hamilton et al., 2006
[41]
32,359 pairs (MZ/DZ) Determine factors related to smoking phenotype;
Estimate genetic, shared environment and
nonshared environmental influences on smoking
initiation and persistence
Cross-
sectional
Logistic regression with randomly
selected one twin in pairs;
Structural equation model for
heritability
17
Authors and Year Twins Study Purpose Study Design Statistical Method
Chen et al., 2008 [42] 148 pairs (MZ/DZ) Determine common genetic variation at CHGA
locus related to hypertension;
Estimate heritability of "intermediate" traits
Cross-
sectional;
Twin
phenotyping
ANOVA;
Variance component analysis for
heritability
Cozen et al., 2008 [43] 32 twins (10 double
respondent MZ twin
pairs + 12 single
respondent MZ twin
pairs)
Compare allergy measurements using an
electrostatic dust cloth returned by mail vs. direct
collection of house hold dust samples
Descriptive;
Method
evaluation
Nonparametric correlation;
Kappa agreement
Ursin et al., 2009 [44] 553 pairs (Double
respondent MZ/DZ
female twins)
Investigate the importance of genetics and
environment on mammographic density
Matched
case-control
Generalized estimation equations
taking into account the correlation
between twins;
Conditional logistic regression;
Variance component analysis for
heritability
Wang et al., 2009 [45] 180 pairs (MZ/DZ) +
16 siblings
Identify the effect of genetic variants at NPY1R
locus on autonomic activity and blood pressure in
vivo;
Estimate heritability of autonomic traits
Cross-
sectional;
Twin
phenotyping
Generalized estimation equations;
Variance component analysis for
heritability
Cozen et al., 2012 [36] 36,965 pairs (MZ/DZ) Describe the characteristics and ongoing projects
in CTP
Descriptive
Hwang et al., 2012
[46]
6,926 pairs (Double
respondent MZ/DZ
twins)
Estimate genetic susceptibility to infectious
mononucleosis;
Predict the rate to develop IM in the unaffected
co-twin based on affected twin
Classic twin
design
Pairwise concordance measures for
heritability;
Cox proportional hazard function
and survival analysis
Hwang et al., 2013
[47]
305 pairs (Double
respondent MZ twins)
Investigate the association between childhood
infections and adult height
Matched
case-control
Conditional logistic regression
18
Authors and Year Twins Study Purpose Study Design Statistical Method
Miller et al., 2015 [48] | 37,435 pairs (MZ/DZ) | Estimate the prevalence of skin examination; Investigate sociodemographic and constitutional risk factors associated with skin examination | Cross-sectional | Logistic regression with randomly selected one twin in pairs
Silventoinen et al., 2015 [49] | 23,237 pairs (MZ/DZ) | Describe the CODATwins Project; As a part of the CODATwins Project | Descriptive |
Jelenkovic et al., 2015 [50] | 23,237 pairs (MZ/DZ) | As a part of the CODATwins Project; Study the zygosity differences in height and BMI of twins from infancy to old age | Descriptive | Linear regression; Levene’s clustered test
Chapter 2 Birth anomalies in monozygotic and dizygotic twins:
Results from the California Twin Registry
2.1 Abstract
Birth anomalies are a global problem and result in a worldwide health care burden. The causes
of birth anomalies can be genetic factors originating before conception, environmental factors acting
after conception, or both. Studies have suggested that heritable factors play an important role in
different types of congenital malformations. The effects of various environmental factors, such as
maternal smoking and maternal age, have been well studied. Here we present a cross-sectional study
including a total of 20,803 pairs of twins from the population-based California Twin Program (CTP). We
compared concordance in monozygotic (MZ) twin pairs to that in dizygotic (DZ) twin pairs for the most
common types of birth anomalies: clubfoot, oral cleft, spina bifida, muscular dystrophy, deafness,
strabismus, Down syndrome, and congenital heart defects. We also examined the associations between each
birth anomaly and birth outcomes (birth weight, rural/urban residency at birth, birth order) as well as
parental exposures (age, smoking, education). The twin study design provides valuable knowledge for
planning future studies to refine the understanding of the causes of selected birth anomalies.
2.2 Introduction
2.2.1 Overview of Birth Anomalies
A birth anomaly is defined as any anomaly involving body structure or function presenting at
birth, which can be diagnosed either at birth or later in life. Every year, approximately 6% of total births
worldwide and 3% of newborns in the United States are born with a structural or genetic birth anomaly
[1]. Serious birth anomalies can be life-threatening or lead to lifelong disability, thus resulting in
significant health care expenses. In 2009, more than 5,500 infants died due to birth anomalies,
accounting for 20% of infant deaths and making birth anomalies the leading cause of infant mortality in
the United States [2]. The hospital costs
for medical care for affected children totaled $8 billion annually [3, 4]. Despite changes in public health
policies and advances in prenatal and gestational care of mothers, the overall prevalence of major birth
anomalies has remained relatively stable in the United States since the 1970s [5]. Although the causes of
most major birth anomalies are unknown, the few known causes include genetic factors predisposing
before conception (preconception) and environmental causes occurring after conception but before
birth (post-conception). Concerns have been raised that an increase in the prevalence of a given cause
may lead to an increase in the prevalence of birth anomalies.
More than half of birth anomalies are categorized as multifactorial disorders. Unlike
Down syndrome, which is mainly caused by a chromosomal abnormality (trisomy 21), multifactorial
disorders are due to complex genetic and environmental interactions [6]. Examples of multifactorial
birth anomalies include congenital heart problems, cleft lip/cleft palate, and clubfoot, which commonly
involve malformations of an organ or a limb. Although different birth anomalies have distinct
etiologies, they nevertheless share common features, including strong familial aggregation, which
indicates a substantial genetic component, and associations with parental exposures such as aging and
smoking. A systematic review including 38 case-control and cohort studies published from 1959 to 2010
showed a significant association between maternal smoking and an increased risk for several of the most
common birth anomalies, for instance, orofacial clefts (OR=1.28; 95% CI= 1.20-1.36), clubfoot (OR=1.28;
95% CI= 1.10-1.47), and congenital heart problems (OR=1.09; 95% CI= 1.02-1.27) [7]. The relationship
between advanced maternal age and an increasing prevalence of birth anomalies, particularly Down
syndrome, and the potential biological mechanisms behind it have been well documented [8-10]. However,
compared with maternal exposures, the evidence relating offspring health to paternal exposures and age,
and to parental education, remains relatively weak, probably because these factors have been studied
less extensively and are complicated by socioeconomic status (SES) and access to health care.
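The odds ratios and confidence intervals quoted above follow the usual case-control arithmetic. As a minimal sketch, assuming a single 2x2 table and a Wald interval on the log scale (the pooled estimates in [7] come from a meta-analysis, not from this calculation), the function name and the counts below are hypothetical and are not data from any cited study:

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower, upper) with a Wald interval on the log scale.

    z=1.96 gives an approximate 95% confidence interval.
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for a Wald interval")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of log(OR): square root of the summed reciprocal cell counts.
    se_log_or = math.sqrt(sum(1.0 / cell for cell in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: 120 of 500 cases and 100 of 500 controls had mothers who smoked.
odds_ratio, lower, upper = odds_ratio_wald_ci(120, 380, 100, 400)
print(f"OR={odds_ratio:.2f}; 95% CI={lower:.2f}-{upper:.2f}")
```

Because the interval is symmetric on the log scale, the point estimate always lies inside it, and the product of the two limits equals the squared odds ratio. This gives a quick consistency check when reading reported estimates.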
2.2.2 Selected types of Birth Anomalies
Down Syndrome. Down syndrome is one of the most common genetic birth anomalies.
Worldwide, approximately 1 per 1000 babies was born with Down syndrome in 2010 [11]. In
2013, there were about 8.5 million individuals born with Down syndrome, resulting in 36,000 deaths [12,
13]. In the United States, the prevalence of this anomaly decreased from 2 per 1000 live births in the
1950s [14] to 1.4 per 1000 live births in the 2000s [1], largely due to prenatal screening programs and
abortions [14]. Each year, about 6,000 babies with Down syndrome are born in the United States [1].
Individuals with this syndrome always show some degree of intellectual disability and characteristic
facial features [15]. About 30-60% of babies with Down syndrome also have other health problems, such as
congenital heart defects (40%) [16], strabismus (~35%) [17], and other visual and hearing impairments
[16, 18-22]. The severity of these problems varies greatly. Down syndrome has no cure. Life expectancy is
about 50-60 years with proper health care in developed countries [23]. Medical costs for children aged
0-4 years with Down syndrome were 12 times higher than for children of the same age without Down
syndrome [24]. The risk of leukemia and testicular cancer is increased among individuals with Down
syndrome [23]. However, due to the extra expression of tumor suppressor genes on chromosome 21,
there are fewer solid tumor cases later in life [23, 25].
The major cause of Down syndrome is an extra copy of chromosome 21, which is also called
trisomy 21. However, how Down syndrome occurs and how many risk factors are involved in its etiology are
still unclear. Down syndrome affects all races equally [26]. Its risk is known to increase with
maternal age (1 in 84 at age 40). Women who are 35 years or older are more likely to have a pregnancy
affected by Down syndrome than younger women [27-29]. However, 70% of babies with Down
syndrome are born to women who are less than 35 years old, because younger women have more births
[30, 31]. A number of candidate environmental risk factors have been identified in epidemiological
studies. Smoking during pregnancy has been well studied. A number of studies reported no association
[32-34] or a negative association [35-40] between maternal smoking and the risk of Down syndrome. One
explanation for the negative association is that trisomic conceptions could be lost prenatally due to
natural selection among women who smoke [29]. Other environmental factors, such as parental alcohol
consumption, maternal irradiation, fertility drugs, oral contraceptives, and low SES, have been
investigated, but no association has been confirmed [29]. One study showed that parental education below
high school level increased the risk of Down syndrome by 28-29%; the association was significant for low
maternal education but only marginally significant for low paternal education [41]. In
addition, case reports of infants with low birth weight and Down syndrome have been frequently
described in the literature [42-44].
Among the available twin studies, only the Danish Twin Registry has reported Down syndrome frequencies:
6 cases per 10,000 in DZ twins and 2 cases per 10,000 in MZ twins [45].
Congenital heart problem. Congenital heart defects are problems in the structure and function
of the heart that are present at birth, with presentations ranging from no symptoms to severe,
life-threatening complex defects [46]. This is the most common birth anomaly and the leading cause of
birth anomaly-related deaths. About 40,000 infants are born with a heart problem each year in the United
States, corresponding to nearly 1 per 100 births [47, 48]. There are many types of congenital heart
problem. The most common type is a ventricular septal defect (VSD). The overall prevalence of congenital
heart problems increased from 50.3 per 10,000 in 1978-1983 to 86.4 per 10,000 in 2000-2005; some mild
types (e.g. VSD) increased while the other types remained stable [49]. However, for each type of heart
problem, the prevalence and trends vary by race/ethnicity [49, 50]. During 1999-2006, there were
41,494 deaths due to congenital heart problems in the United States, 48% of which occurred within the
first year of life [51]. Survival of infants with congenital heart problems depends on disease severity.
Among babies born with heart problems, 25% have a critical condition that generally requires surgery
or another medical procedure in the first year of life. About 69% of babies with critical conditions and
95% with non-critical conditions are expected to survive to age 18 [52]. In 2011, hospital costs for
individuals with congenital heart problems were $1.9 billion in the United States [53].
In most cases, the etiology of congenital heart problems is incompletely understood. Both
genetic and environmental factors appear to play some role. Compared to the population prevalence,
congenital heart problems showed strong familial aggregation in first-degree relatives, with risks
elevated 3- to 80-fold, indicating a genetic component in the disease etiology [54]. At least 15% of
congenital heart problems are associated with genetic conditions. Large chromosomal abnormalities (e.g.
Down syndrome, trisomies 13 and 18) cause about 8-10% of cases of congenital heart problems [55, 56].
Another 3-5% of cases are associated with single gene mutations [57], such as those affecting the heart
muscle protein MYH6 [58], the transcription factor GATA4 [58], the Notch signaling pathway [59], or the
Ras/MAPK pathway [60]. Known environmental risk factors account for about 2% of congenital heart problem
cases, including rubella or other infections during pregnancy, maternal diabetes mellitus, obesity,
maternal drug exposures (e.g. retinoids, NSAIDs, or marijuana), and maternal phenylketonuria [57, 61].
In addition, maternal lifestyle factors, such as alcohol use and cigarette smoking, have been reported
by a number of studies to increase the risk of congenital heart problems [62]. A pooled analysis of 25
published studies showed a small but significant increased risk for congenital heart problems from
maternal smoking during pregnancy [7]. Other factors, including maternal exposure to herbicides,
maternal education, maternal age >40 years, and paternal age >35 years, have been investigated, but
associations could not be confirmed because of limited information [61]. Studies also showed that
maternal nutrition is important to the risk of congenital heart problems. A meta-analysis suggested a
protective role of multivitamin supplements against the development of congenital heart problems [63].
The specific exposures may differ by the type of congenital heart problem [61].
Congenital heart problems have been examined in twin studies. A recent study in the Danish
Twin Registry found that 1.4% of twins had congenital heart problems in the period from 1977 to 2001.
Compared with randomly selected singletons, twins were 63% more likely to develop congenital heart
problems. In this study, there was no difference in the overall frequency of congenital heart problems
between MZ and DZ twins [64]. Similar results were observed in UK twins for congenital cardiovascular
anomalies between 1998 and 2002. In this population, the rates were 1.14% in twins vs.
0.78% in singletons [45]. Another twin study with 66 DZ twins in Italy found a higher recurrence of
congenital heart problems in DZ twins (13.6%) compared to non-twin siblings (3.8%) during 1999-2002.
This occurrence is higher than in previous reports, probably because the participants were specifically
selected for the use of echocardiography. The authors also tried to examine the differences in the
effects of maternal age, birth weight, and other reproductive factors on congenital heart problems by
comparing concordant twin pairs to discordant twin pairs. However, no significant results were found due
to the small sample size [65]. In addition, Kaiser Permanente in Northern California investigated 926 MZ
twins from 1996 to 2003 and showed a higher prevalence in MZ twins (7.5%) than the general population
rate (0.4-0.6%). Compared with those without heart problems, the patients with congenital heart
problems had a significantly lower gestational age at delivery and a lower birth weight [66].
Clubfoot. Clubfoot, also called congenital talipes equinovarus (CTEV), is a term for several kinds
of ankle and foot deformities usually present in one foot or both at birth. The
affected foot is rotated inward at the ankle. This anomaly is one of the most common birth
anomalies of the musculoskeletal system, affecting one in 1000 live births [67, 68]. The prevalence has
remained stable across the world from the 1950s [67] to the 2000s [68]. There are racial/ethnic
differences in the birth prevalence of clubfoot; Polynesians have the highest risk (6.8 per 1000 live
births), followed by whites (1.12 per 1000 live births), Hispanics (0.76 per 1000 live births), and
Chinese, with the lowest risk (0.39 per 1000 live births) [69]. In general, boys are affected twice as
frequently as girls [69]. Currently there is no method to prevent clubfoot. However, with early physical
treatment or surgery, most patients recover completely and are able to walk normally [70].
The etiology of clubfoot remains largely unknown. There are a number of theories for clubfoot
etiology [71]. Among them, the most widely accepted explanation is that this anomaly starts early in
pregnancy when the baby is small and cannot stay in one position for a long time [71]. In general, a
combination of heredity and other factors probably causes clubfoot. There is strong evidence from family
studies for a genetic component in the development of clubfoot. Twin studies show that the
concordance for clubfoot is 33% in MZ twins but 2.9% in DZ twins [67]. Furthermore, a higher
occurrence was found among first-degree relatives compared to more distant relatives [72]. Several
parental and environmental risk factors, such as maternal age, maternal education, marital status, and
prenatal care, have been implicated in the etiology of clubfoot [73-77]. However, the findings are
inconsistent across studies, mostly due to limited sample sizes. A recent pooled study
combining data from 10 states in the United States found a significantly increased risk of clubfoot
associated with low birth weight, being a twin, younger maternal age, maternal diabetes, lower maternal
education, and maternal smoking [69]. In addition, the Atlanta Birth Defects Case-Control Study (ABDCC)
found joint multiplicative effects of family history and smoking in the etiology of clubfoot, indicating
an interaction between genetic factors and tobacco exposure [72].
As early as 1974, a twin study of 1195 twins born in the Collaborative Perinatal Project sought to
describe the epidemiology of congenital malformations in twins. The authors found that 18.33% of twins
had malformations and that concordance rates were consistently higher in MZ twins than in DZ twins for
all conditions, although the difference was significant only for clubfoot [78]. Another large
population-based twin study using the Danish Twin Registry looked at 46,418 twins born in 1931-1982.
Based on self-report, the prevalence of clubfoot was 2.7 per 1000. The pairwise concordance was 0.17 for
MZ twins, 0.09 for like-sex DZ twins, and 0.05 for all DZ twins [79].
Cleft lip/Cleft palate. Cleft lip/cleft palate, also known as orofacial cleft, is a heterogeneous
group of disorders with an opening in the upper lip (cleft lip), and/or the roof of the mouth and/or the
soft tissue in the back of the mouth (cleft palate). It can be a single or combined condition. These
openings normally develop in early pregnancy and close by the 10th-12th week of pregnancy. Failure
to close occurs in 1-2 per 1000 births in the developed world [80]. Each year in the United States, about
4,400 babies are estimated to be affected by a cleft lip with or without a cleft palate and 2,650 babies
are born with a cleft palate [1]. During 2002-2006, the average prevalence was 7.75 per 10,000 live
births, with a stable trend in the United States [81]. Cleft lip occurs twice as often in males as in
females, while cleft palate without cleft lip occurs more frequently in females [80]. Asians and certain
groups of American Indians (1-4 per 1000 births) have a higher risk of cleft lip/cleft palate than whites
(~1 per 1000 births), and African populations have the lowest risk (0.18-1.7 per 1000 births) [82]. In
2013, 3,300 deaths worldwide were caused by this anomaly, down from 7,600 deaths in 1990 [83]. Children
with cleft lip/cleft palate often have other problems, such as difficulty in feeding and speaking, social
problems, or ear infections leading to hearing problems [82]. Most cases of this birth anomaly can be
successfully treated by surgery in the first 1-2 years of life. The outcome is good with appropriate
health care.
The causes of cleft lip/cleft palate are not well understood. Evidence suggests that they may be
caused by a combination of genes and environmental factors such as maternal lifestyle factors. The
development of the face is a complex morphogenetic event involving rapid cell differentiation and
proliferation that is highly susceptible to genetic factors and micro-environmental effects. Cleft
lip/cleft palate is listed as a feature of more than 200 specific genetic syndromes [84]. Concordance
rates for this anomaly are higher in MZ twins than in DZ twins [85], and the occurrence in siblings is
greater than that predicted by familial aggregation of environmental factors [86]. Linkage studies and
GWAS have identified hundreds of chromosomal regions and loci that may play a role in the etiology of
cleft lip/cleft palate through craniofacial development, for example, genetic polymorphisms in the
growth factor TGFA, the transcription factors MSX1 and IRF6, the nutrient metabolism factor MTHFR, and
the epigenetic regulatory enzyme PHF8 [82, 87, 88]. The environmental risk factors that might be
important to cleft lip/cleft palate include maternal smoking, alcohol use, poor nutrition, maternal
diabetes, viral infection during pregnancy, certain medicinal drugs, and teratogen exposure in early
pregnancy [82]. Maternal smoking during pregnancy has been consistently associated with an increase of
about 20% in the risk of cleft lip/cleft palate [89, 90]. Studies have also shown that fetuses carrying
certain predisposing genes may be at increased risk of developing cleft palate if their mothers smoke
[91]. Other factors, such as parental exposure to pesticides or lead, and SES, have been investigated in
relation to cleft lip/cleft palate. However, the findings were inconsistent and inconclusive [82].
The prevalence of cleft lip/cleft palate has been extensively studied in twins and their offspring
in Europe. The Danish Twin Registry published 6 articles regarding cleft lip/cleft palate [92-97]. For
Danish twins during the period 1970-1990, pairwise concordance for cleft lip/cleft palate was 43% for
MZ twins and 5% for DZ twins, while proband concordance was 60% for MZ twins and 10% for DZ twins
[93]. The ratio of MZ to DZ concordance was very similar to that in previous studies. An extensive
population-based cohort study in the same registry, linked to the Danish Facial Cleft Database, included
130,710 twins born from 1936 through 2004 in Denmark. The prevalence of cleft lip/cleft palate was 15.8
per 10,000 twins, which is not different from that in the general population (16.6 per 10,000
singletons). The proband concordance was higher in MZ twins (50%) than in DZ twins (8%). Compared to
non-twin siblings, DZ twins had a higher recurrence risk for cleft lip/cleft palate [97]. Furthermore,
the authors showed an increased recurrence risk for cleft lip/cleft palate among offspring of both
concordant cleft twins and concordant unaffected twins [96]. In comparison, Finnish researchers examined
cleft twin pairs born between 1948 and 1987 in Finland. They found that the prevalence of cleft
lip/cleft palate was 6.1 per 10,000. The concordance rates were 17% and 0% for MZ and DZ twins,
respectively, which are generally lower than those in Danish twins [98]. In addition, a small
Taiwanese twin study showed 26% concordance for all twins and 57% for MZ twins, which are higher than
the rates from European groups [99].
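Pairwise and proband concordance measure different things: the first is the share of affected pairs in which both twins are affected, the second is the share of affected twins (probands) whose co-twin is also affected. Assuming complete ascertainment, so that every affected twin counts as a proband, the two are linked by probandwise = 2 x pairwise / (1 + pairwise). The sketch below, with hypothetical function names, applies that conversion to the Danish pairwise figures quoted above; the results agree with the reported proband values after rounding.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Share of affected pairs in which both twins are affected."""
    return concordant_pairs / (concordant_pairs + discordant_pairs)


def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Share of probands with an affected co-twin.

    Assumes complete ascertainment: each concordant pair contributes two probands.
    """
    return 2 * concordant_pairs / (2 * concordant_pairs + discordant_pairs)


def pairwise_to_probandwise(pairwise):
    """Convert a pairwise rate to a probandwise rate under the same assumption."""
    return 2 * pairwise / (1 + pairwise)


# Pairwise concordances quoted in the text for Danish twins, 1970-1990.
mz_proband = pairwise_to_probandwise(0.43)
dz_proband = pairwise_to_probandwise(0.05)
print(f"MZ: {mz_proband:.2f}, DZ: {dz_proband:.2f}")  # prints "MZ: 0.60, DZ: 0.10"
```

Probandwise concordance is never lower than pairwise concordance, which is why the two sets of percentages in the same study differ.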
Spina Bifida. Spina bifida is a birth anomaly in which the backbone, and sometimes the spinal cord,
closes incompletely. It is a type of neural tube defect (NTD). Spina
bifida is among the most common severe birth anomalies, leading to potential physical and intellectual
disabilities. The severity depends on the location and size of the opening on the spinal cord [100]. The
worldwide incidence of spina bifida ranges from 0.1 to 5 per 1000 births depending on the country [101,
102]. In the United States, about 1,500 babies are born with spina bifida each year, with a prevalence of
0.2-0.4 per 1000 births in the 2000s [103]. Following the introduction of mandatory folic acid
fortification and the 1992 recommendation that women of childbearing age consume 400 µg of folic acid
daily, the prevalence of spina bifida declined from 0.25 per 1000 live births in 1991 to 0.20 in 2001
[104]. Spina bifida occurs more frequently among the Hispanic population (0.38 per 1000 births) and
non-Hispanic whites (0.31 per 1000 births). It is less common among the African population (0.27 per 1000
births) [1]. There is no cure for spina bifida. Most cases can be prevented by taking enough folic acid
before and during pregnancy. The open spinal defect can be closed by surgery before or after birth. A
study in Florida showed that the average hospital cost in 2012 for a baby with spina bifida was $21,900
in the first year of life [105].
The causes of spina bifida are still largely unknown but are believed to be a combination of
genetic and environmental factors. Compared to the general population, siblings of affected individuals
had a 3-8% increased risk [100]. Families with a child with spina bifida have a higher chance of having
another child with a neural tube defect or Down syndrome [106]. This evidence implies a genetic
component in the etiology of spina bifida. Furthermore, about 132 candidate genes with known
biological activity that are potentially associated with the development of spina bifida or other NTDs
were identified and published between 1994 and 2010 [107]. As discussed above, inadequate maternal
intake of folic acid is known to increase the risk of spina bifida. Moreover, maternal diabetes
and drug use (e.g. valproic acid or carbamazepine) have been consistently associated with an increased
risk [100]. Other risk factors, including lower maternal education, maternal age of 40 or older or
younger than 19, higher birth order, parental occupation with exposure to agrochemicals and animals, and
lower maternal caffeine consumption, have been investigated in relation to the development of spina
bifida [107]. However, most of the findings showed weak associations and could not be replicated across
studies [100].
Congenital Muscular Dystrophy. Congenital muscular dystrophy (CMD) is a heterogeneous group
of birth anomalies including 20 types of disorders. This birth anomaly usually presents at birth or within
the first few months of life with progressive muscle weakness and possible joint deformities,
leading to a shortened life span [108, 109]. CMD is rare. Currently available epidemiological studies
indicate that the incidence of CMD was 4.65 per 100,000 during 1979-1993 and the prevalence was 0.68
per 100,000 in 1993 in northeastern Italy [110], and that the prevalence was 0.8-2.5 per 100,000 by type
in 2000 in Sweden [111]. CMD is generally inherited in an autosomal recessive manner, although the
inheritance pattern differs by the specific type of CMD. In general, the known mutations were found in
proteins related to the dystrophin-glycoprotein complex and to the connections between muscle cells and
surrounding cellular structures [108, 112, 113]. In addition, there are some forms of CMD
without identified genetic defects [112], suggesting possible susceptibility to some environmental
factors. However, no suggestive environmental factor has been reported. There is no known cure or
appropriate treatment for any of the dystrophies at this time. Few individuals survive beyond their early
twenties [108, 114].
Cerebral Palsy. Cerebral palsy is a broad group of permanent movement disorders presenting in
early childhood with symptoms including poor coordination, stiff muscles, weak muscles, and tremors
[115]. This is the most common motor impairment in childhood and is caused by damage to, or abnormal
development of, the parts of the brain that control movement [115]. It affects about 2
per 1000 children. The rates are similar in developing and developed countries and increased slightly
from 1.5 per 1000 births in the 1960s to 2.5 in the 1990s [116, 117]. Males have a higher risk than
females [116]. However, mortality decreased from the 1960s and reached a plateau in the 1990s,
probably due to improved medical care [116]. In the United States, 764,000 children and adults were
estimated to have cerebral palsy in 2001. Each year, about 8,000 babies and infants and 1,300 children
of preschool age are diagnosed with this disorder [115]. The management of cerebral palsy aims not to
cure but to provide lifelong treatment that maximizes a person’s independence [115].
The known cause of cerebral palsy is damage to, or abnormal development of, the
developing brain during pregnancy, delivery, the first month of life, or, in some less
common cases, early childhood [116]. About 2% of cerebral palsy cases are inherited as an autosomal
recessive disease involving a candidate enzyme named glutamate decarboxylase-1 [118]. It has long been
known that there is an association between low birthweight and cerebral palsy. The prevalence among low
birthweight children is higher than that among normal birthweight children [119-121]. Consistent with
the birthweight findings, premature birth also increases the risk of cerebral palsy [122]. Moreover,
among normal birthweight children, low SES has been associated with an increased risk of cerebral palsy
[123, 124]. Studies of multiple births showed that cerebral palsy is 5 times more common in twins and 21
times more common in triplets than in singletons [125, 126]. As a result, several studies involving twins
were conducted to examine familial risk as evidence of a partly heritable etiology [126-129]. In summary,
the risk of cerebral palsy was 6 times higher in Swedish twins [129] and 9 times higher in Norwegian
twins [126] compared to non-twin siblings. Other suggested environmental factors include difficult
delivery, head trauma during the first few years of life, exposure to methylmercury, and certain
infections during pregnancy [115, 122, 130].
Congenital hearing loss. Congenital hearing loss is hearing loss present at birth or in early infancy,
commonly impairing the ability to acquire a spoken language. At birth, hearing problems occur in 3
per 1000 babies in developed countries and more than 6 per 1000 babies in developing countries [131].
Hearing loss increases with age from 3% at age 20-35 to 11% at age 44-55 to 43% at age 65-85 [132]. Males
account for 55% of individuals with congenital hearing loss [133]. Current treatment includes the use of
hearing aids or surgery for certain conditions.
Few epidemiological studies have been carried out on the causes of congenital hearing loss. Like
general hearing loss, congenital hearing loss is thought to have multiple causes, including genetic and
environmental factors. Congenital hearing loss can be inherited; the most common form is an
autosomal recessive non-syndromic hearing impairment, named DFNB1, associated with connexin 26
(GJB2) mutations involving defects in electrical synapses [134]. Furthermore, in a population-based
study of 134 children with congenital hearing loss in Australia, 8% of cases were associated with
known genetic syndromes, such as Down syndrome and Pendred syndrome [135]. Another study in England found
a 4 times higher occurrence of congenital hearing loss after a major rubella epidemic than in the absence
of an epidemic. In the same study, 36% of cases outside rubella epidemics had genetic syndromes [133].
Premature birth and low birthweight were associated with an increased risk of congenital hearing loss.
The risk is highest for those weighing less than 1500 g at birth [136]. Moreover, other congenital
infections, craniofacial anomalies, asphyxia, maternal diabetes, multiple births, and maternal alcohol
intake during pregnancy have been linked to congenital hearing loss [137, 138].
Strabismus. Strabismus, also called crossed eyes, is a condition preventing the eyes from aiming at
the same point in space. Strabismus is a common disorder that presents in 3-4% of children [139]. In Los
Angeles, California, the prevalence of strabismus was 2.4% and 2.5% among Hispanic and African
American children aged 6-72 months, respectively [140]. There are two major forms of strabismus:
esotropia and exotropia. The former is more common in children and the latter is more common in
adults 55-75 years of age [141]. Asian and African American populations are more likely to have
exotropia than Caucasians [142]. The current recommendation for strabismus is to diagnose and treat
(e.g. corrective glasses, pharmacologic treatment or strabismus surgery) as early as possible.
The causes of strabismus are still largely unknown. Current evidence indicates that its etiology
involves both genetic and environmental factors. Familial aggregation of strabismus has been observed,
with recurrence of 23% to 70% among family members who had a parent or sibling with
strabismus [143-145]. However, it is unclear whether strabismus itself or conditions underlying it are
genetic. Strabismus occurs in about 50% of persons with Down syndrome [146, 147],
44% of persons with cerebral palsy [148, 149], and 90% of patients with craniofacial dysostosis [150,
151]. Compared to children born at term, preterm children with low birthweight have a higher risk of
developing strabismus [152]. Furthermore, maternal smoking has been linked to an increased risk of
strabismus [153]. Cigarette smoking during pregnancy was associated with esotropia but not exotropia.
This association was modified by birth weight and gestational age [154].
Wilmer et al. reviewed twin studies on strabismus, including 3 published studies with
complete (systematic) ascertainment and 13 non-systematically ascertained studies. From the data of
the former 3 studies, the average prevalence of strabismus is 6.4% and the combined pairwise concordances
for MZ and DZ twins are 54% and 14%, respectively. These are slightly lower than the combined pairwise
concordances from the latter 13 studies, which are 66% for MZ and 19% for DZ twins. Using these
data, the authors also attempted to estimate the genetic and environmental contributions to this disorder
quantitatively by fitting a full ACE model [155].
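A full ACE model partitions variation in liability into additive genetic (A), shared environmental (C), and unique environmental (E) components, and is normally fitted by maximum likelihood. As a rough illustration only, the moment-based Falconer approximation below derives the three components from MZ and DZ twin correlations in liability. It is not the fitting procedure used in [155], the function name is hypothetical, and the input correlations are made-up values rather than results from the reviewed studies. The inputs are correlations, not the concordance percentages quoted above.

```python
def falconer_ace(r_mz, r_dz):
    """Approximate ACE variance shares from twin correlations.

    Expected correlations under the ACE model are r_MZ = A + C and
    r_DZ = A / 2 + C, with A + C + E = 1. Solving these gives the
    estimates below.
    """
    additive_genetic = 2 * (r_mz - r_dz)
    shared_environment = 2 * r_dz - r_mz
    unique_environment = 1 - r_mz
    return {"A": additive_genetic, "C": shared_environment, "E": unique_environment}


# Hypothetical liability correlations, chosen only to illustrate the arithmetic.
estimates = falconer_ace(r_mz=0.80, r_dz=0.50)
for component, share in estimates.items():
    print(f"{component}: {share:.2f}")
```

A negative estimate for C (an MZ correlation more than twice the DZ correlation) is usually read as a sign that a model with genetic dominance, rather than shared environment, should be considered.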
2.2.3 Birth Anomalies in Twins
Studies of twins offer a unique opportunity to investigate the potential roles of genetics and
shared early exposures. Identical (monozygotic; MZ) twins share 100% of their genome, and fraternal
(dizygotic; DZ) twins share, on average, 50%. Under the equal environment assumption (EEA),
that is, that MZ and DZ twins share their environment and exposures to the same degree, comparing
disease concordance between MZ and DZ twins is a classic twin-study approach to crudely estimating
disease heritability [156, 157]. When this assumption can be made, a higher disease concordance among
MZ twins than among DZ twins is a crude indicator that there may be a heritable component in the disease
etiology [158]. For birth anomalies, this heritable component can reflect genetic factors, shared prenatal
exposures, or both. Because twinning is rare and the prevalence of birth anomalies is low,
few epidemiological studies have investigated birth anomalies in twins, and those covered only a limited
set of anomalies. Most of these studies were conducted in Europe and focused more on genetic components
than on environmental exposures [159-163]. Details are given in the discussion of each specific birth
anomaly above.
In this study, taking advantage of the large number of twins in the population-based California Twin
Program (CTP) and its comprehensive questionnaire, we were able to explore the birth prevalence of
various anomalies, estimate their heritability, and investigate selected birth characteristics and
parental exposures in relation to the risk of selected birth anomalies.
2.3 Materials and Methods
2.3.1 Study Population
The study is based on the California Twin Program (CTP), developed and maintained at the
University of Southern California. CTP is a population-based cohort of twins born in California between
1908 and 1982. The development and representativeness of CTP have been described elsewhere [164, 165].
Briefly, 256,616 multiple births were identified through birth records and linked with California
Department of Motor Vehicles (DMV) records for address information between 1990 and 2001;
161,109 individuals were matched. Among the matched individuals, a 16-page screening questionnaire
was mailed to the 115,733 with valid addresses. A total of 52,262 individuals, comprising 14,827
double-respondent twin pairs (both members of the pair responded) and 22,608 single-respondent twin pairs
(only one member of the pair responded), completed and returned the questionnaires. The crude
overall response rate was 45.2%, which is comparable with or higher than that of similar cohort studies [166].
During 1998-2001, an updated version of the questionnaire with more detailed
information on development, diet and medical history, including congenital conditions, was sent to a
subset of twins born from 1957 to 1982. This study is based on this subset, including the 20,803 responses
of twin pairs (7,247 double-respondent pairs and 13,556 single-respondent pairs) who completed the
updated questionnaire. In comparison with census data and California multiple birth records, the
responding participants were representative of native California twins with regard to age, sex, zygosity
and residential distribution [164, 165].
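The respondent counts quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only the figures given in the text. The variable names are illustrative, and it assumes that each double-respondent pair contributes two individuals and each single-respondent pair contributes one.

```python
# Figures quoted in the Study Population text.
mailed = 115_733          # questionnaires mailed to valid addresses
double_pairs = 14_827     # pairs in which both twins responded
single_pairs = 22_608     # pairs in which only one twin responded

# Assumption: a double-respondent pair counts as two individuals,
# a single-respondent pair as one.
respondents = 2 * double_pairs + single_pairs
response_rate = respondents / mailed

# Updated-questionnaire subset, counted in pairs.
subset_pairs = 7_247 + 13_556

print(respondents)             # 52262
print(f"{response_rate:.1%}")  # 45.2%
print(subset_pairs)            # 20803
```

The totals reproduce the 52,262 respondents, the 45.2% crude response rate, and the 20,803 pairs reported in the text.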
This study was conducted in accordance with the Declaration of Helsinki, and no personally
identifiable data was used during this study. Approval for the study was obtained from the University of
Southern California Institutional Review Board.
2.3.2 Birth Anomalies
We considered the following birth anomalies: clubfoot, oral cleft, deafness, cerebral palsy,
muscular dystrophy, Down syndrome, spina bifida, strabismus (lazy eye) and congenital heart defects.
These conditions were listed in a section titled “Congenital Conditions” in the updated questionnaire.
Individuals were asked to report their own conditions (self-report) as well as those of their twins,
brothers/sisters and mothers/fathers. For double-respondent twin pairs, the disease condition is based
on self-reports from both respondent twins, whereas for single-respondent twin pairs, the disease
condition is based on self- and proxy-report from the only respondent twin. Concordant disease pairs
were defined as the pairs in which both twins have the same condition, while discordant disease pairs
are the pairs in which only one twin has the condition.
2.3.3 Covariates
Participants’ characteristics (sex, race/ethnicity and zygosity) were obtained from the
self-reported questionnaires and validated against birth records. Self-reported zygosity based on the
questionnaires was adjusted based on gender and similarity questions (“Were you as alike as two peas in a
pod?”, “How frequent did good friends or close relatives get you mixed up?”) and was confirmed by co-twins
in double-respondent twin pairs if available. The twins’ birth year and county of residence at birth were
taken from birth records obtained from Vital Statistics [164, 165]. A county was classified as rural at the
twins’ birth when its rural population was greater than its urban population, based on the
corresponding census data (www.mooseroots.com) for the year closest to the twins’ birth year. All other
variables were ascertained from the twins’ responses to the questionnaire. For double-respondent twin
pairs, all exposures from the questionnaires were based on the self-reports of both respondent twins,
whereas for single-respondent twin pairs, the exposures were based on the report of the single
respondent twin, if available.
The twins’ birth order was defined as the position of the twin birth among all of their mother’s
pregnancies that resulted in live births, and was categorized as 1st birth, 2nd birth, 3rd birth, or 4th or
later birth. For the co-twin control study, relative birth weight (“Which twin weighted more at birth?”) was
coded as a binary variable for twin pairs who were discordant for birth weight: “1” for the twin
member reported to have the lower birth weight vs. “0” for the co-twin in the pair.
All parental exposures, including maternal age, parental smoking history, mother’s education
and father’s education, were reported by the twins. Maternal age was grouped as 29 or younger vs. 30
or older. Based on answers to the question “Did your parents smoke cigarettes”, parental smoking
history was summarized as neither parent smoked, only father smoked, only mother smoked, or both
parents smoked, and then further combined as only one parent smoked and at least one parent smoked.
Each parent’s (mother or father) education was treated as a binary variable: 12 or fewer years of
education vs. 13 or more years of education.
To test the validity of the proxy responses from single-respondent twins’ questionnaires about
their twins, agreement on shared factors (relative birth weight, birth order, maternal age, parental
smoking and parental education) was evaluated, comparing self-reports to proxy reports in double-
respondent twin pairs. Proxy responses from single-respondent twins were included when agreement
on the variable of interest was high (>70%). Responses from double-respondent twin pairs with
consistent responses between members of the pair were included but those with inconsistent responses
or missing values were excluded from the analysis.
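The agreement check described above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the example reports are hypothetical, and dropping pairs with a missing report from the denominator is an assumption, since the text does not say how missing values entered the agreement calculation.

```python
def percent_agreement(pairs):
    """Fraction of double-respondent pairs whose two reports match.

    `pairs` is a list of (twin_a_report, twin_b_report) tuples; pairs with
    a missing report (None) are dropped first (an assumption of this sketch).
    """
    complete = [(a, b) for a, b in pairs if a is not None and b is not None]
    if not complete:
        raise ValueError("no pairs with both reports present")
    agree = sum(1 for a, b in complete if a == b)
    return agree / len(complete)


# Hypothetical reports on one shared factor, e.g. whether the mother smoked.
reports = [("yes", "yes"), ("no", "no"), ("yes", "no"), ("no", "no"), (None, "yes")]

agreement = percent_agreement(reports)
include_proxy = agreement > 0.70  # threshold stated in the text

print(f"{agreement:.0%}", include_proxy)  # 75% True
```

With this rule, a variable whose agreement exceeds 70% among double-respondent pairs would have its proxy reports from single-respondent pairs retained.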
2.3.4 Statistical Analysis
The major analyses were conducted in all 20,803 twin pairs unless otherwise specified. The
characteristics of the study population are shown separately for affected twin pairs (pairs with at least
one case of any selected birth anomaly) and unaffected twin pairs (pairs without any selected birth
anomaly). For each birth anomaly, the frequency and percentage of concordant, discordant and
non-diseased twin pairs are described separately by zygosity (monozygotic (MZ) twins, dizygotic (DZ) twins
and twins of unknown zygosity) and in total.
Pairwise concordance is the proportion of the pairs in which both twins are affected
(“concordant”) among the pairs in which at least one twin is affected. The value of pairwise concordance
is used to predict the disease status of the other twin in the pair given that one twin has the disease and
is calculated as follows:
\[
\text{Pairwise concordance} = \frac{\text{No. concordant pairs}}{\text{No. concordant pairs} + \text{No. discordant pairs}}
\]
The standard errors and χ² tests were calculated using methods developed by Witt et al [167]. An excess
in pairwise concordance for a disease in MZ twins relative to DZ twins indicates the presence of a heritable
contribution to this disease. To ensure a fair comparison, only MZ (n=6,752) and like-sex DZ
(n=7,326) pairs were included, since factors other than heritable ones (e.g. hormones) could contribute
to different outcomes in unlike-sex DZ pairs.
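The pairwise concordance and the MZ-to-DZ concordance ratio defined above can be computed as in the sketch below. The function names and the pair counts are hypothetical and chosen only for illustration. The standard errors and χ² tests of Witt et al are not reproduced here, because their details are not given in the text.

```python
def pairwise_concordance(concordant, discordant):
    """Concordant pairs divided by all pairs with at least one affected twin."""
    affected_pairs = concordant + discordant
    if affected_pairs == 0:
        raise ValueError("no affected pairs")
    return concordant / affected_pairs


def concordance_ratio(mz_counts, dz_counts):
    """Ratio of MZ to like-sex DZ pairwise concordance.

    Each argument is a (concordant, discordant) tuple of pair counts.
    """
    return pairwise_concordance(*mz_counts) / pairwise_concordance(*dz_counts)


# Hypothetical pair counts, not taken from the study tables.
mz = (30, 120)   # 30 concordant, 120 discordant MZ pairs
dz = (15, 135)   # 15 concordant, 135 discordant like-sex DZ pairs

print(pairwise_concordance(*mz))   # 0.2
print(pairwise_concordance(*dz))   # 0.1
print(concordance_ratio(mz, dz))   # 2.0
```

A ratio above 1 corresponds to the excess MZ concordance that the text interprets as a crude sign of a heritable contribution.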
To examine the effects of birth characteristics or parental exposures on each birth anomaly, two
analytic approaches were used, depending on the nature of the exposure and the statistical power.
First, a within-pair case-control study was performed in which the characteristics of the affected (case)
and unaffected (control) twins were compared using conditional logistic regression. Since the twins in a
pair share all parental and demographic factors, no confounders were adjusted for in the model. Second, for
shared birth characteristics and parental exposures, a between-pair analysis was performed with the
pair as the unit. Due to small numbers, pairs discordant and concordant for a birth anomaly were combined
as “affected pairs” and were compared to unaffected control pairs using a logistic regression model.
Based on prior knowledge, the between-pair analyses were adjusted for zygosity, gender, maternal age
at the twins’ birth, parental education and birth order whenever the adjustment variable was not itself
being tested as the main effect. To further assess potential trends in the effect of parental smoking (from
“neither parent smoked”, “only father smoked”, “only mother smoked”, to “both parents smoked”) and birth
order (from 1st birth, 2nd birth, 3rd birth, to 4th birth or later) on the selected birth anomaly, trend
tests with the corresponding contrasts were used to test for linear, quadratic or cubic trends.
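For intuition about the within-pair analysis: with 1:1 matched pairs and a single binary exposure, the conditional logistic regression estimate of the odds ratio reduces to the ratio of the two kinds of exposure-discordant pairs. The sketch below illustrates that special case with a Wald-type interval on the log scale. The function name and the counts are hypothetical, and the sketch is not the SAS procedure used in the study.

```python
import math


def matched_pair_or(case_exposed_only, control_exposed_only, z=1.96):
    """Conditional OR for 1:1 matched pairs with one binary exposure.

    case_exposed_only:    pairs where only the affected twin is exposed
                          (e.g. the affected twin had the lower birth weight)
    control_exposed_only: pairs where only the unaffected twin is exposed
    Returns (odds_ratio, lower, upper) with a Wald interval on the log scale.
    """
    b, c = case_exposed_only, control_exposed_only
    if b <= 0 or c <= 0:
        raise ValueError("both discordant-pair counts must be positive")
    odds_ratio = b / c
    se_log_or = math.sqrt(1 / b + 1 / c)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: in 120 pairs the affected twin was lighter,
# in 80 pairs the unaffected twin was lighter.
or_, lo, hi = matched_pair_or(120, 80)
print(f"OR={or_:.2f}, 95% CI={lo:.2f}-{hi:.2f}")
```

Because twins in a pair discordant for birth weight always differ in the exposure, every such pair contributes to one of the two counts, which is why no further adjustment for shared factors is needed in this special case.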
To further address the potential interaction between parental smoking and zygosity (a surrogate
of genetic factors) in twins on the selected birth anomalies, pairwise concordance was calculated
separately among affected twins who had at least one parent smoking and affected twins who had
neither parent smoking. Due to the limited sample size, only clubfoot, strabismus, congenital heart
problem and any birth anomaly were examined.
Sensitivity analyses were conducted by repeating the analyses in a subset comprising the
double-respondent twin pairs (N=7,242) with completely consistent and non-missing answers, in order
to examine the potential bias from the inclusion of single-respondent twin pairs in the main analyses.
All effects are reported as odds ratios with 95% confidence intervals. If any cell count
was less than 5, Fisher’s exact test was used for statistical testing. Otherwise, the likelihood ratio χ² test
was used. P values are reported for interaction tests and tests of concordance rates. P < 0.05 was
considered statistically significant. Statistical analysis was performed using SAS software (Version 9.4,
SAS Institute, Inc., Cary, North Carolina).
2.4 Results
2.4.1 Study Population Characteristics
Table 2.1 shows the characteristics of the California twin pairs in the study. Overall, females, DZ
twins and double-respondent twin pairs were more likely to report the presence of affected cases of a
birth anomaly. The risk of having a birth anomaly was higher in non-Hispanic white twins than in any
other ethnic/racial group. Affected twin pairs were more likely to be found in families in which both
parents smoked or in which the father had less education.
2.4.2 Prevalence and Trend in Selected Birth Anomalies
For any birth anomaly, the overall prevalence is 38.2 per 1,000 births and the prevalence for
concordant twin pairs is 7.6 per 1,000 pairs (Table 2.2). The prevalence of concordant disease pairs
ranges from undetectable numbers in Down syndrome to 1.4 per 1,000 pairs in congenital heart
problem. For each birth anomaly, there were consistently more concordant disease pairs among MZ
than among DZ twin pairs. However, there were not always more discordant disease pairs among MZ
than among DZ twin pairs, probably due to environmental factors such as maternal age. When we
compared frequencies of multiple anomalies in single individuals or in twin pairs, mental retardation
was more likely to co-occur in individuals with oral cleft, cerebral palsy or Down syndrome (6.9%, 13.6%
and 16.3%, respectively, Table 2.3). Strabismus was the most commonly reported anomaly in twins
whose co-twin reported the presence of other birth anomalies (7.7%-37.5%, Table 2.4).
The overall prevalence of any birth anomaly decreased slightly from 1957 to 1982 (Figure
2.1). The rates of oral cleft, strabismus, deafness and spina bifida dropped by 72%, 46%, 43% and 37%,
respectively. For cerebral palsy, the rate decreased in the 1960s and then increased slightly in the 1980s.
For the other anomalies, the trend was relatively stable during the study period. The overall prevalence
of any birth anomaly for the study period was 38 per 1,000 persons.
2.4.3 Pairwise Concordance of Selected Birth Anomalies
For any birth anomaly, pairwise concordance in MZ and DZ like-sex pairs was 17.15%
(SE=1.85%) and 7.82% (SE=1.14%), respectively (Table 2.5). The relative risk of MZ compared to DZ
like-sex pairs (concordance ratio) was 2.19 (p<0.0001). Except for Down syndrome and spina bifida, which
lacked concordant pairs (Table 2.5), all the other birth anomalies showed excess concordance among MZ
compared with DZ pairs. A significant difference in concordance between MZ and DZ pairs was observed for
clubfoot (concordance ratio=5.91, p=0.043) and strabismus (concordance ratio=2.52, p=0.001). No other
significant results were found. The results did not differ significantly when “concordance” was
redefined as a pair with a specific birth anomaly in one twin and any birth anomaly in the co-twin, instead
of a pair with the same birth anomaly in both twins. These findings were consistent when probandwise
concordances or structural equation modelling methods were used (Table 2.16).
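For readers comparing the two concordance measures, the relation below shows how a probandwise concordance follows from a pairwise one. It uses the standard definition of probandwise concordance under complete ascertainment, which is an assumption here because the text does not state how the values in Table 2.16 were computed. The symbols $C$ (number of concordant pairs) and $D$ (number of discordant pairs) are introduced only for this illustration.

```latex
\[
P_{\text{pair}} = \frac{C}{C + D},
\qquad
P_{\text{proband}} = \frac{2C}{2C + D} = \frac{2\,P_{\text{pair}}}{1 + P_{\text{pair}}}
\]
```

Because the probandwise value is a monotonically increasing function of the pairwise value, an MZ excess in one measure implies an MZ excess in the other.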
2.4.4 Parental Exposures
Percent agreement among double-respondent pairs for shared factors ranged from 75% to 93%
(Table 2.15). Thus, proxy reports from single-respondent twins were included for all
these variables.
In terms of parental exposures, parental smoking (Table 2.6) was associated with an increased
risk of spina bifida (OR=3.48; 95% CI=1.48-8.18) and strabismus (OR=1.61; 95% CI=1.28-2.03).
Interestingly, maternal smoking was marginally associated with a decreased risk of Down syndrome
(OR=0.34; 95% CI=0.12-1.01). Overall, the risk of any birth anomaly was significantly increased by 43%
among twin pairs whose parents both smoked (OR=1.43; 95% CI=1.23-1.67). For clubfoot and strabismus, the
concordance ratio between MZ and DZ among twins with smoking parents was 5-fold and 3-fold higher,
respectively, than that among twins with neither parent smoking. In contrast, for congenital heart
defects, the concordance ratio between MZ and DZ among twins with smoking parents was only about half
of that among twins with non-smoking parents (Table 2.7).
Maternal age (≥30 vs. <30 years, Table 2.8) was significantly associated with an increased risk
of Down syndrome (OR=3.34; 95% CI=1.21-9.23) but with a decreased risk of spina bifida (OR=0.29; 95%
CI=0.12-0.73) and congenital deafness (OR=0.68; 95% CI=0.46-0.99). No significant associations were
found between maternal education and any selected birth anomaly. Paternal education was associated
with a decreased risk of any birth anomaly (OR=0.87; 95% CI=0.75-0.99), and particularly of
Down syndrome (OR=0.32; 95% CI=0.11-0.97) and strabismus (OR=0.81; 95% CI=0.66-0.99) (Table 2.9).
2.4.5 Birth Characteristics
We also examined the effect of birth weight (within pairs) and birth order (between pairs) on
the risk of birth anomalies. Lower birth weight was associated with an increased risk of any birth
anomaly within twin pairs (OR=1.32; 95% CI=1.17-1.49, Table 2.10). Specifically, the occurrence of
deafness (OR=1.63; 95% CI=1.19-2.24), cerebral palsy (OR=1.83; 95% CI=1.23-2.76) and congenital heart
defects (OR=1.77; 95% CI=1.34-2.35) were significantly associated with lower birth weight. Twin pairs
who were first-born were more likely to be affected by strabismus than those born later (OR 2nd vs.
1st=0.99, 95% CI=0.79-1.24; OR 3rd vs. 1st=0.76, 95% CI=0.57-1.01; OR ≥4th vs. 1st=0.75, 95% CI=0.57-0.99;
quadratic trend p=0.008, Table 2.11).
2.4.6 Sensitivity Analyses
A total of 7,242 double-respondent twin pairs with consistent answers were included in the
sensitivity analyses, including 2,955 MZ twin pairs, 3,969 DZ twin pairs and 318 twin pairs of unknown
zygosity. All the analyses in this study were repeated using this subset. The results are reported for the
birth anomaly distribution by zygosity (Table 2.12), pairwise concordance (Table 2.13) and parental
smoking effects. For any birth anomaly, the overall prevalence was 47.6 per 1,000 births and the
prevalence of concordant twin pairs was 7.7 per 1,000 pairs; the latter is slightly higher than in the full
analysis (7.6 per 1,000). For each specific anomaly, the prevalence was similar, with small variations.
The significant excess concordance in MZ twin pairs (7.0% and 12.8%, respectively) over DZ twin pairs
(0.9% and 5.1%, respectively) was preserved for strabismus and any birth anomaly. Due to the smaller
sample size, we were unable to estimate risk for clubfoot and congenital heart defects. For the same
reason, we failed to find the association between smoking by both parents and the risk of spina
bifida (OR=1.7; 95% CI=0.49-6.23, Table 2.14) that was significant in the full analysis. Apart from that, the
other significant findings and the direction of all non-significant findings remained consistent.
Similarly consistent results were found in all the other analyses (data not shown).
2.5 Discussion
In this large population-based study of California twins, we observed a generally decreasing
trend in the prevalence of any birth anomaly from 1957 to 1982. The overall prevalence of any birth
anomaly for this period was 38 per 1,000 persons. Concordance for any birth anomaly in MZ twins was 2.2
times that in DZ like-sex twins. A significant excess of concordance for clubfoot (5.9 times) and strabismus
(2.5 times) suggested potential heritable contributions to their etiologies. In terms of parental
exposures, parental smoking was associated with an increased risk of spina bifida and strabismus. As
expected, advanced maternal age significantly increased the risk of Down syndrome. Within
twin pairs, the twin who had a lower birth weight than the co-twin was more likely to have
congenital deafness, cerebral palsy and congenital heart defects.
Previous studies have suggested an increased risk among twins for a number of different birth
anomalies across anatomical locations [97, 160-163, 168-170]. However, some population-based data
have found no excess risk among twins compared to singletons (e.g., for oral clefts) [15]. This question is
of increasing importance given that the birth rate for twins in the United States is increasing. The rate rose
76% from 1980 through 2009, from 18.9 to 33.3 per 1,000 live births [16], in part due to increasing
maternal age, use of fertility drugs or other fertility treatments, and changes in nutrition. The 2006 global
report on birth defects provided the worldwide prevalence of congenital malformations by group,
including the cardiovascular system (7.9 per 1000 live births) and oral cleft (1.4 per 1000 live births) [6].
Our study twins gave similar prevalence rates for congenital heart problem (7.5 per 1000 live births) and
cleft palate/cleft lip (1.25 per 1000 live births). These comparable statistics indicate no excess risk of
these two birth anomalies in California twins compared with the general population. However, California
twins appeared to be at a substantially greater risk for muscular dystrophy (0.6 per 1000 live births) and
spina bifida (1.3 per 1000 live births) compared with the general population (0.025 [111] and 0.35 [1] per
1000 live births, respectively). The large difference for muscular dystrophy is probably due to geographic
differences and limited evidence, since the only two available records for muscular dystrophy were both
from smaller populations in Europe [110, 111]. For spina bifida, the introduction of folate use during
pregnancy in the 1990s is the most likely reason for the excess risk in our twins, who were all born
before the 1990s. Additionally, a moderately greater prevalence of clubfoot, deafness and cerebral palsy
was also observed in California twins. The rate of clubfoot (2.9 per 1000 live births) is very close to that
reported in other twins, which also showed an approximately 2-fold increase in risk [79]. Unclear disease
definitions for the participants may have contributed to the excess risk of deafness and cerebral palsy as
well as the lower risk of strabismus. On the other hand, the smaller number of cases of Down syndrome in
our twin cohort is possibly because their condition might make them less likely to be identified in the DMV
records or less likely to return a usable questionnaire. For some conditions, however, the unaffected twin
or the closest relatives were the reporters, which might explain why only DZ twins affected by Down
syndrome were reported. In addition, differences in racial and ethnic composition between the general
population and CTP twins may be another explanation.
In this study, significant increases in concordance in MZ compared to DZ twin pairs, suggesting a
common intrauterine exposure and/or genetic susceptibility, were found for clubfoot and strabismus,
consistent with previous studies. Clubfoot is one of the most common birth anomalies involving
deformation of the musculoskeletal system [168]. Although the causes of clubfoot are still unclear, evidence
suggests that clubfoot is multifactorial in origin, involving both genetic factors [169, 170] and
environmental effects [171-173]. A Danish twin study found a self-reported prevalence of clubfoot of
2.7 per 1,000 persons in a twin cohort of similar birth years and a 3.4-fold excess in concordance comparing
MZ to DZ like-sex [174]. A recent GWAS first identified a mutation in the transcription factor PITX1,
which is responsible for pelvic morphology in vertebrates [168, 169]. Strabismus is a condition that prevents
the eyes from aiming at the same point in space at the same time. The prevalence of strabismus is 20-40 per
1,000 among the white population [175]. The occurrence of the common forms of strabismus is likely to
involve both genetic and environmental contributions. Previous twin studies showed 2-5 times
higher concordance for strabismus among MZ than DZ twins [155]. Family studies and linkage analysis
provided strong evidence of familial aggregation of strabismus as a Mendelian trait [175]. In addition, oral
cleft, a structural opening in the lip, the roof of the mouth, or the soft tissue in the back of the mouth, has
been shown to have an approximately 3-8 times excess in concordance in MZ compared to DZ twin pairs in
Danish and Finnish twin studies, in which the variation in the ratio is due to different syndromes [93, 98].
A number of genes have been identified that play a role in craniofacial development and are involved in the
morphogenesis of the midface [176, 177]. However, in our study, the difference between MZ and like-sex
DZ concordance for oral cleft was not statistically significant, although the excess in concordance was
greater than 2-fold. A possible explanation is that the concordance rate among MZ twins was more likely to
be underestimated than that among DZ twins. Certain cleft lip/palate syndromes associated with genetic
causes have been shown to co-occur with mental retardation [178], which was also observed in our study.
This combination could interfere with the ability to complete the questionnaire, particularly among
concordantly affected MZ pairs. For a similar reason, we were unable to measure the concordance in Down
syndrome.
Parental smoking, particularly maternal smoking, has been well studied in relation to the risk of
birth anomalies. Maternal smoking during pregnancy has been associated with an increased risk of
certain birth anomalies such as orofacial clefts, clubfoot and congenital heart problems [7, 91]. The normal
development of the fetus in utero could be disrupted by DNA damage, loss of essential
nutrients, teratogenic effects, or fetal hypoxia, which could be induced by exposure to tobacco smoke
toxins such as CO, nicotine, cadmium, and PAH [91]. With less evidence, paternal smoking during
preconception has also been linked with an increased risk of offspring anomalies, probably due to
smoking-induced DNA mutations in the germ line of the father or secondhand smoke exposure of the
mother [179]. In our study, we could not find associations for any selected birth anomaly, individually or
combined, when only one parent had smoked. However, when both parents smoked, the risks of spina bifida
and strabismus in offspring were significantly increased. A quadratically increasing trend was also observed
for the risk of clubfoot, spina bifida and strabismus across offspring for whom only the father smoked, only
the mother smoked, and both parents smoked, suggesting that paternal smoking might have a synergistic
effect on the etiology of these three anomalies. The positive association between maternal smoking and the
occurrence of clubfoot has been consistently reported by previous studies [69, 72, 76, 180-183].
Maternal smoking during pregnancy is one of the identified environmental risk factors associated with
strabismus [184-186]. However, a systematic review including 17 studies found no association between
maternal smoking and spina bifida [7]. The limited studies on paternal smoking or secondhand
smoking provided no strong evidence in relation to each birth anomaly [185, 187-191]. Marginal
positive associations were reported between paternal/secondhand smoking and clubfoot (OR=1.8; 95%
CI=0.97-3.37) [188] and spina bifida (OR=1.9; 95% CI=0.70-9.40) [191]. One possible explanation for our
failure to detect a significant association between maternal smoking and selected birth anomalies (e.g.
oral cleft), despite the observed increase in risk for most anomalies, is potential misclassification of
smoking exposure. The CTP questionnaire asked only about parental smoking history, without the
amount or timing of smoking. Thus, we did not have information on the timing or level of maternal
smoking, which would more likely lead to null findings, although we cannot be certain.
Only one study examined the risk of congenital anomalies of the heart, neural tube or limb associated
with parental cigarette smoking separately for father only, mother only and both parents smoking [188].
That study also did not observe an increased risk associated with maternal smoking in the absence of
paternal smoking, and found modestly elevated risks for congenital heart defects (OR=1.9; 95% CI=1.2-3.1)
and limb deficiencies (OR=1.7; 95% CI=0.96-2.9) when comparing both parents smoking to neither parent
smoking. In our study, we also found a marginal negative association between parental smoking and
Down syndrome after controlling for maternal age. Interestingly, our finding is similar to the results
of a number of studies [34, 37, 38, 40]. A proposed explanation for this negative association is that
fetuses with trisomy could be lost prenatally due to natural selection among women
who smoke [29].
In addition, certain populations with genetic polymorphisms in genes coding for proteins related to
inflammatory responses or metabolic pathways may be more susceptible to damage from tobacco
smoke [91]. Our concordance calculations stratified by parental smoking status for clubfoot and
strabismus suggest a potential interaction between heritable disease genes and parental smoking.
A recent Atlanta Birth Defects Case-Control study confirmed a significant departure from
additivity in the interaction between maternal smoking and family history in the etiology of clubfoot [72].
The Collaborative Prenatal Project of NIH reported that the heritability of certain types of strabismus
remained significant after adjustment for known environmental factors [192]. However, there is a
lack of evidence on the interaction between strabismus heritability and maternal smoking. Our
proposed hypothesis for these two birth anomalies is that the disease-causing genes may be closely related,
in genomic distance or protein function, to genes conferring susceptibility to tobacco smoke toxins, leading
to the greatest risk among MZ twins whose parents smoked. Further GWAS or NGS studies are needed to
test this hypothesis.
Congenital heart defect is the most common type of birth anomaly and the leading cause of birth
anomaly-related death. Congenital heart defect has been associated with both genetic factors
[193, 194] and environmental factors [195, 196]. A pooled analysis of 25 published studies showed a
small but significant increased risk of congenital heart problem from maternal smoking during
pregnancy [7]. Our study found results in the same direction that were not significant (concordance ratio=1.90,
p=0.121; OR for both parents smoking vs. neither smoking=1.23, 95% CI=0.88-1.72). Our explanation is
a lack of power to detect a small difference, together with possible misunderstanding of the definition of
congenital heart problem in the questionnaire, which might have mixed in adult heart problems unrelated
to birth. Furthermore, the concordance calculations stratified by parental smoking status for congenital
heart problem found a higher concordance rate among DZ twins whose parents smoked but a similar rate
among MZ twins regardless of parental smoking status, suggesting that genetic heritability and parental
smoking may act independently in the etiology of congenital heart problem.
Maternal age, parental education and other birth characteristics (e.g. low birth weight, birth
order and residency) are also important factors related to birth anomalies. It is well documented that
advanced maternal age increases the birth prevalence of chromosomal abnormalities, particularly Down
syndrome [8]. However, the evidence for an association between
maternal age and other, non-chromosomal birth anomalies is inconsistent [8, 74, 187, 197-205]. Our study
showed the expected increased risk of Down syndrome but an unexpectedly decreased risk of spina bifida and
congenital deafness among women 30 years or older compared with women younger than 30 years.
Five studies have investigated the associations between different types of birth anomalies
and young or advanced maternal age [74, 198, 201, 203, 206]. Among them, one large population-based
study in the Metropolitan Atlanta Congenital Defects Program (MACDP), involving a total of 1,050,616
singleton infants, found that young maternal age (14-19 years) was significantly associated with an elevated
risk of anencephaly/spina bifida (OR=1.81; 95% CI=1.30-2.52) and all ear defects (OR=1.57, 95% CI=1.10-
1.49) [203]. One of the limitations of this analysis is the cutoff for advanced maternal age. The most
common cutoff is age 35. However, due to the design of the CTP questionnaire in the 1990s, we could only
define older age as age at birth of 30 years or older, which may partially mask the true association.
Advanced maternal age is associated with adverse pregnancy outcomes, such as birth anomalies, low
birth weight or infertility, probably caused by the more frequent DNA mutations in germline cells and
detrimental effects on decidual and placental development [10]. On the other hand, advanced maternal
age is also associated with number of children, higher SES, higher income, better health care and more
stable living conditions. This complexity makes it harder to identify the true effects of
advanced maternal age. Thus, we may lack the power to separate the individual effects of maternal age,
birth order, parental education and residency as surrogates of SES and potential occupational exposures.
Low birth weight has been an important factor in studies of birth anomalies. The
population-based Metropolitan Atlanta Congenital Defects Program (MACDP) for 1978-1988 found that
the risk of birth defects was 80% higher for infants in low-birth-weight classes (≤ 2499 g) than
for those weighing 2500-3999 g (RR=1.8, 95% CI=1.7-1.8) [207]. However, low birth weight is commonly
associated with premature birth and/or intrauterine growth retardation, which also affect the risk of
birth anomalies. Cerebral palsy [119-121], clubfoot [69], congenital hearing loss [136] and strabismus
[152] have been linked to low birth weight. However, none of these studies controlled for genetic
background or for birth factors other than birth year, as our twin study did. After matching on
parental exposures, birth conditions and the full (MZ) or partial (DZ) genome by comparing birth
weight within twin pairs, we found an almost twofold increased risk of cerebral palsy, deafness and
congenital heart defects in the twin with the lower birth weight compared with the co-twin. These
associations persisted for cerebral palsy in MZ twins and for deafness and congenital heart defects in
like-sex DZ twins.
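As an illustration of the within-pair comparison described above, the following minimal sketch
computes a matched-pair odds ratio from disease-discordant twin pairs, with a Wald-type 95%
confidence interval. This is a standard matched-pair estimator and is not necessarily the model
fitted in this study; the function name and the pair counts are hypothetical.

```python
import math


def matched_pair_odds_ratio(n_lighter_affected, n_heavier_affected, z=1.96):
    """Conditional odds ratio from disease-discordant twin pairs.

    n_lighter_affected: pairs in which only the lower-birth-weight twin is affected
    n_heavier_affected: pairs in which only the higher-birth-weight twin is affected
    Returns (odds ratio, lower CI bound, upper CI bound).
    """
    if n_lighter_affected <= 0 or n_heavier_affected <= 0:
        raise ValueError("both discordant-pair counts must be positive")
    odds_ratio = n_lighter_affected / n_heavier_affected
    # Standard error of log(OR) for matched pairs
    se_log_or = math.sqrt(1 / n_lighter_affected + 1 / n_heavier_affected)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only (not taken from the study)
or_estimate, ci_lower, ci_upper = matched_pair_odds_ratio(40, 20)
```

Only pairs discordant for the disease contribute to this estimate, which is why factors shared
within a pair (parental exposures, and the full or partial genome) cancel out of the comparison.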
This study was based on data collected from a cross-sectional questionnaire and has several
limitations. First, self-reported zygosity could lead to non-differential misclassification of zygosity,
which would more likely lead to underestimation of the true difference. However, several studies,
including a study of 600 pairs of CTP twins, showed more than 95% agreement between self-reported
zygosity and molecular genetically determined zygosity [165, 208, 209]. Second, the twins in this study
were identified through DMV linkage, which might introduce selection bias, especially for conditions
(e.g. Down syndrome) that might affect the ability to appear in DMV records or to complete the
questionnaires, making affected twins less likely to be included in the study. This is an inherent
limitation of the study design, which might lower the prevalence of such conditions or attenuate the
concordance ratios, since concordant affected twin pairs, usually with more severe conditions than
discordant affected pairs, were less likely to be included in the study. However, our findings on the
association of Down syndrome with maternal age and maternal smoking were consistent with previous
studies, suggesting that the measures of these conditions, which might have been completed by an
unaffected co-twin or close relative, were reasonably reliable. Third, due to small sample size, we obtained
disease status and exposure using combined information from both self- and proxy-report, in which
proxy-report was used when self-report was missing. The birth anomalies in this study
were selected because they were listed in a “Congenital Conditions” section of the questionnaire, which
was a general survey designed decades ago with no specific study purpose. There was no detailed
information regarding treatment or subtypes. However, given that these conditions were present
at birth and might have affected life since birth or been corrected at some point in life, misclassification
of disease status would more likely take the form of a condition that is no longer present going
unreported in a self-report, or being missed in a proxy-report. In this case, our findings would be
underestimated. In terms of exposures, more than 75% agreement was found between proxy and
self-report for each shared exposure among double-respondent twin pairs, indicating that proxy
responses from single-respondent pairs were reasonably valid for these exposures. Furthermore,
sensitivity analyses restricted to double-respondent twin pairs with only self-reported disease status
and consistent exposure measures reproduced most of the major findings, except for non-significant but
directionally consistent results for the MZ excess in concordance for clubfoot and a lack of association
between parental smoking and spina bifida, probably due to lack of power, suggesting that potential
bias from proxy responses would be limited. Lastly, in
the classic twin study, MZ and DZ twins are assumed to share unmeasured environmental factors to the
same extent (the “equal environment assumption”), so any additional concordance among MZ over DZ
twins suggests heritable factors. However, it is possible that DZ twins experience a different uterine
environment from MZ twins. If this is true and relevant to birth anomalies, the contribution
attributed solely to heritable factors could be exaggerated.
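The proxy and self-report agreement check mentioned above can be sketched as follows. This is a
minimal illustration, assuming each exposure is coded as a categorical value per twin; it reports
simple percent agreement together with Cohen's kappa, which corrects for chance agreement. The
function name and the example data are hypothetical, and kappa is shown as a common companion
statistic rather than one reported in this study.

```python
def agreement(self_reports, proxy_reports):
    """Return (observed agreement, Cohen's kappa) for paired categorical reports."""
    if not self_reports or len(self_reports) != len(proxy_reports):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(self_reports)
    # Proportion of twins whose self-report matches the co-twin's proxy report
    observed = sum(s == p for s, p in zip(self_reports, proxy_reports)) / n
    # Agreement expected by chance, from the marginal category frequencies
    categories = set(self_reports) | set(proxy_reports)
    expected = sum(
        (self_reports.count(c) / n) * (proxy_reports.count(c) / n) for c in categories
    )
    kappa = (observed - expected) / (1 - expected) if expected < 1 else float("nan")
    return observed, kappa


# Hypothetical exposure reports (1 = exposed, 0 = unexposed), for illustration only
self_reports = [1, 1, 0, 0, 1, 0, 1, 0]
proxy_reports = [1, 1, 0, 0, 1, 0, 0, 1]
observed_agreement, kappa = agreement(self_reports, proxy_reports)
```

In this hypothetical example the observed agreement is 0.75 and kappa is 0.5, showing how a
seemingly high percent agreement can correspond to only moderate chance-corrected agreement.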
However, using twins as the study population to investigate birth anomalies has unique
strengths compared with a traditional observational study. The major advantage of a twin study is its
ability to disentangle the genetic factors, shared environmental factors and unshared environmental
factors in a physical trait [210-212]. Except for Down syndrome, which is generally caused by a
chromosomal abnormality (trisomy 21), all the other selected birth anomalies in this study are
multifactorial disorders due to complex genetic and environmental interactions [6]. In this study, the
pairwise concordance rate ratio between MZ and DZ twins provided a crude measure of the combined
effect of genetic components and common intrauterine exposures for each selected birth anomaly. The
examination of shared environmental factors (parental smoking, parental education, maternal age and
birth order) in twins suggested that they may contribute substantially to the common intrauterine
effects, since a mutation related to an anomaly could be inherited or be induced in the intrauterine
environment by parental exposures (GxE interaction). Our findings may offer directions for future
genetic studies. Furthermore, the prevalence and etiology of a birth anomaly may be different in twins,
who are often compromised by premature birth and low birth weight. Our study provided evidence with
which to compare the prevalence of birth anomalies in twins with that in the general population. Also,
the within-pair analysis of lower birth weight and each selected birth anomaly was able to remove the
influence of parental exposures and other birth-related factors, with fully or partially matched
genetic background. In addition, the self-reported information in twins can be validated by the
co-twin, providing more confidence than a standard study of unrelated persons. Finally, most current
knowledge of birth anomalies in twins comes from published studies in Europe, whose populations are
relatively homogeneous and very different from the US population. This study is the first twin study
in the United States to explore diverse types of birth anomalies, to estimate the heritable
contributions, and to provide evidence of their associations with a set of birth characteristics and
parental exposures. The unique twin study design may provide valuable clues for further understanding
their etiology, thus guiding future studies.
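The pairwise concordance rate ratio referred to above can be illustrated with a short sketch. It
assumes the usual definition of pairwise concordance (concordant affected pairs divided by all
pairs with at least one affected twin); the function names and counts are hypothetical and are
not results from this study.

```python
def pairwise_concordance(concordant_pairs, discordant_pairs):
    """Concordant affected pairs as a share of all pairs with an affected twin."""
    total = concordant_pairs + discordant_pairs
    if total == 0:
        raise ValueError("no affected pairs")
    return concordant_pairs / total


def concordance_ratio(mz_concordant, mz_discordant, dz_concordant, dz_discordant):
    """Ratio of MZ to DZ pairwise concordance; values above 1 indicate an MZ excess."""
    dz_rate = pairwise_concordance(dz_concordant, dz_discordant)
    if dz_rate == 0:
        raise ValueError("DZ concordance is zero; ratio is undefined")
    return pairwise_concordance(mz_concordant, mz_discordant) / dz_rate


# Hypothetical pair counts for illustration only
mz_rate = pairwise_concordance(10, 30)        # 10 concordant, 30 discordant MZ pairs
dz_rate = pairwise_concordance(5, 45)         # 5 concordant, 45 discordant DZ pairs
ratio = concordance_ratio(10, 30, 5, 45)
```

A ratio above 1, as in this hypothetical case, is what the text treats as a crude indication of
heritable factors combined with exposures that MZ twins share more closely than DZ twins.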
Chapter 2 References
1. Parker, S.E., et al., Updated National Birth Prevalence estimates for selected birth defects in the
United States, 2004-2006. Birth Defects Res A Clin Mol Teratol, 2010. 88(12): p. 1008-16.
2. Mathews, T. and M.F. MacDorman, Infant mortality statistics from the 2009 period linked
birth/infant death data set. National vital statistics reports, 2013. 61(8): p. 1-28.
3. Russo, C.A. and A. Elixhauser, Hospitalizations for birth defects, 2004. 2007.
4. Centers for Disease Control and Prevention, Economic costs of birth defects and cerebral
palsy--United States, 1992. MMWR. Morbidity and mortality weekly report, 1995. 44(37): p. 694.
5. Update on overall prevalence of major birth defects--Atlanta, Georgia, 1978-2005. MMWR Morb
Mortal Wkly Rep, 2008. 57(1): p. 1-5.
6. Christianson, A., C.P. Howson, and B. Modell, March of Dimes: Global report on birth defects.
Global report on birth defects. The hidden toll of dying and disable children. New York: White
Plains, 2006: p. 29-35.
7. Hackshaw, A., C. Rodeck, and S. Boniface, Maternal smoking in pregnancy and birth defects: a
systematic review based on 173 687 malformed cases and 11.7 million controls. Hum Reprod
Update, 2011. 17(5): p. 589-604.
8. Morris, J., D. Mutton, and E. Alberman, Revised estimates of the maternal age specific live birth
prevalence of Down's syndrome. Journal of medical screening, 2002. 9(1): p. 2-6.
9. Schmidt, L., et al., Demographic and medical consequences of the postponement of parenthood.
Human reproduction update, 2011. 18(1): p. 29-43.
10. Nelson, S.M., E.E. Telfer, and R.A. Anderson, The ageing ovary and uterus: new biological
insights. Human reproduction update, 2013. 19(1): p. 67-83.
11. Weijerman, M.E. and J.P. de Winter, Clinical practice. The care of children with Down syndrome.
Eur J Pediatr, 2010. 169(12): p. 1445-52.
12. Global, regional, and national incidence, prevalence, and years lived with disability for 301 acute
and chronic diseases and injuries in 188 countries, 1990-2013: a systematic analysis for the
Global Burden of Disease Study 2013. Lancet, 2015. 386(9995): p. 743-800.
13. Global, regional, and national age-sex specific all-cause and cause-specific mortality for 240
causes of death, 1990-2013: a systematic analysis for the Global Burden of Disease Study 2013.
Lancet, 2015. 385(9963): p. 117-71.
14. Menkes, J.H., H.B. Sarnat, and B.L. Maria, Child neurology. 2006: Lippincott Williams & Wilkins.
15. Faragher, R. and B. Clarke, Educating Learners with Down Syndrome: Research, Theory, and
Practice with Children and Adolescents. 2013: Routledge.
16. Epstein, C.J., The consequences of chromosome imbalance: principles, mechanisms, and models.
Vol. 18. 2007: Cambridge University Press.
17. Van Cleve, S.N. and W.I. Cohen, Part I: clinical practice guidelines for children with Down
syndrome from birth to 12 years. Journal of Pediatric Health Care, 2006. 20(1): p. 47-54.
18. McPhee, S.J., et al., Pathophysiology of disease: an introduction to clinical medicine. 2000: Lange
Medical Books/McGraw-Hill.
19. Hickey, F., E. Hickey, and K.L. Summar, Medical update for children with Down syndrome for the
pediatrician and family practitioner. Advances in pediatrics, 2012. 59(1): p. 137-157.
20. Bourgeois, B.F. and E. Dodson, Pediatric epilepsy: diagnosis and therapy. 2007: Demos Medical
Publishing.
21. Tecklin, J.S., Pediatric physical therapy. 2008: Lippincott Williams & Wilkins.
22. Wilson, G.N. and W.C. Cooley, Preventive management of children with congenital anomalies
and syndromes. Vol. 1. 2000: Cambridge University Press.
23. Malt, E., et al., Health and disease in adults with Down syndrome. Tidsskrift for den Norske
laegeforening: tidsskrift for praktisk medicin, ny raekke, 2013. 133(3): p. 290-294.
24. Boulet, S.L., et al., Health care expenditures for infants and young children with Down syndrome
in a privately insured population. J Pediatr, 2008. 153(2): p. 241-6.
25. Thomas-Tikhonenko, A., Cancer genome and tumor microenvironment. 2010: Springer.
26. Warkany, J., Congenital malformations. 1971, Chicago: Year Book Medical Publishers, Inc. p.
313-14.
27. Allen, E.G., et al., Maternal age and risk for trisomy 21 assessed by the origin of chromosome
nondisjunction: a report from the Atlanta and National Down Syndrome Projects. Human
genetics, 2009. 125(1): p. 41-52.
28. Ghosh, S., E. Feingold, and S.K. Dey, Etiology of Down syndrome: Evidence for consistent
association among altered meiotic recombination, nondisjunction, and maternal age across
populations. American Journal of Medical Genetics Part A, 2009. 149(7): p. 1415-1420.
29. Sherman, S.L., et al., Epidemiology of Down syndrome. Mental retardation and developmental
disabilities research reviews, 2007. 13(3): p. 221-227.
30. Adams, M.M., et al., Down's syndrome: recent trends in the United States. Jama, 1981. 246(7): p.
758-760.
31. Olsen, C.L., et al., The effects of prenatal diagnosis, population ageing, and changing fertility
rates on the live birth prevalence of Down syndrome in New York State, 1983–1992. Prenatal
diagnosis, 1996. 16(11): p. 991-1002.
32. Cuckle, H., et al., Maternal smoking habits and Down's syndrome. Prenatal diagnosis, 1990.
10(9): p. 561-567.
33. Källén, K., Down's syndrome and maternal smoking in early pregnancy. Genetic epidemiology,
1997. 14(1): p. 77-84.
34. Torfs, C.P. and R.E. Christianson, Effect of maternal smoking and coffee consumption on the risk
of having a recognized Down syndrome pregnancy. American journal of epidemiology, 2000.
152(12): p. 1185-1191.
35. Kline, J., et al., Maternal smoking and trisomy among spontaneously aborted conceptions.
American journal of human genetics, 1983. 35(3): p. 421.
36. Kline, J., et al., Cigarette smoking and trisomy 21 at amniocentesis. Genetic epidemiology, 1993.
10(1): p. 35-42.
37. Hook, E.B. and P.K. Cross, Cigarette smoking and Down syndrome. American journal of human
genetics, 1985. 37(6): p. 1216.
38. Hook, E. and P. Cross, Maternal cigarette smoking, Down syndrome in live births, and infant race.
American journal of human genetics, 1988. 42(3): p. 482.
39. Shiono, P.H., M.A. Klebanoff, and H.W. Berendes, Congenital malformations and maternal
smoking during pregnancy. Teratology, 1986. 34(1): p. 65-71.
40. Chen, C.-L., T.J. Gilbert, and J.R. Daling, Maternal smoking and Down syndrome: the confounding
effect of maternal age. American journal of epidemiology, 1999. 149(5): p. 442-446.
41. Torfs, C.P. and R.E. Christianson, Socioeconomic effects on the risk of having a recognized
pregnancy with Down syndrome. Birth Defects Research Part A: Clinical and Molecular
Teratology, 2003. 67(7): p. 522-528.
42. Mili, F., et al., Prevalence of birth defects among low-birth-weight infants: a population study.
American journal of diseases of children, 1991. 145(11): p. 1313-1318.
43. Pueschel, S.M., K.J. Rothman, and J. Ogilby, Birth weight of children with Down's syndrome.
American journal of mental deficiency, 1976. 80(4): p. 442-445.
44. Chen, A.T., et al., Chromosome studies in full-term, low-birth-weight, mentally retarded patients.
The Journal of pediatrics, 1970. 76(3): p. 393-398.
45. Glinianaia, S., J. Rankin, and C. Wright, Congenital anomalies in twins: a register-based study.
Human Reproduction, 2008. 23(6): p. 1306-1311.
46. Mendis, S., P. Puska, and B. Norrving, Global atlas on cardiovascular disease prevention and
control. 2011: World Health Organization.
47. Hoffman, J.I. and S. Kaplan, The incidence of congenital heart disease. Journal of the American
college of cardiology, 2002. 39(12): p. 1890-1900.
48. Reller, M.D., et al., Prevalence of congenital heart defects in metropolitan Atlanta, 1998-2005.
The Journal of pediatrics, 2008. 153(6): p. 807-813.
49. Bjornard, K., et al., Patterns in the prevalence of congenital heart defects, metropolitan Atlanta,
1978 to 2005. Birth Defects Research Part A: Clinical and Molecular Teratology, 2013. 97(2): p.
87-94.
50. Botto, L.D., A. Correa, and J.D. Erickson, Racial and temporal variations in the prevalence of
heart defects. Pediatrics, 2001. 107(3): p. e32-e32.
51. Gilboa, S.M., et al., Mortality resulting from congenital heart disease among children and adults
in the United States, 1999 to 2006. Circulation, 2010. 122(22): p. 2254-2263.
52. Oster, M.E., et al., Temporal trends in survival among infants with critical congenital heart
defects. Pediatrics, 2013. 131(5): p. e1502-e1508.
53. Riehle-Colarusso, T., et al., Congenital heart defects and receipt of special education services.
Pediatrics, 2015. 136(3): p. 496-504.
54. Øyen, N., et al., Recurrence of congenital heart defects in families. Circulation, 2009. 120(4): p.
295-301.
55. Hartman, R.J., et al., The contribution of chromosomal abnormalities to congenital heart defects:
a population-based study. Pediatric cardiology, 2011. 32(8): p. 1147-1157.
56. Miller, A., et al., Congenital heart defects and major structural noncardiac anomalies, Atlanta,
Georgia, 1968 to 2005. The Journal of pediatrics, 2011. 159(1): p. 70-78. e2.
57. van der Bom, T., et al., The changing epidemiology of congenital heart disease. Nature Reviews
Cardiology, 2011. 8(1): p. 50-60.
58. Srivastava, D., Making or breaking the heart: from lineage determination to morphogenesis. Cell,
2006. 126(6): p. 1037-1048.
59. Niessen, K. and A. Karsan, Notch signaling in cardiac development. Circulation research, 2008.
102(10): p. 1169-1181.
60. Tidyman, W.E. and K.A. Rauen, The RASopathies: developmental syndromes of Ras/MAPK
pathway dysregulation. Current opinion in genetics & development, 2009. 19(3): p. 230-236.
61. Jenkins, K.J., et al., Noninherited risk factors and congenital cardiovascular defects: current
knowledge a scientific statement from the American Heart Association Council on Cardiovascular
Disease in the Young: endorsed by the American Academy of Pediatrics. Circulation, 2007.
115(23): p. 2995-3014.
62. Kučienė, R. and V. Dulskienė, Selected environmental risk factors and congenital heart defects.
Medicina (Kaunas), 2008. 44(11): p. 827-832.
63. Goh, Y., et al., Prenatal multivitamin supplementation and rates of congenital anomalies: a
meta-analysis. Journal of obstetrics and gynaecology Canada: JOGC= Journal d'obstetrique et
gynecologie du Canada: JOGC, 2006. 28(8): p. 680-689.
64. Herskind, A.M., D.A. Pedersen, and K. Christensen, Increased prevalence of congenital heart
defects in monozygotic and dizygotic twins. Circulation, 2013: p. CIRCULATIONAHA. 113.002453.
65. Caputo, S., et al., Congenital heart disease in a population of dizygotic twins: an
echocardiographic study. International journal of cardiology, 2005. 102(2): p. 293-296.
66. Pettit, K., et al., Congenital heart defects in a large, unselected cohort of monochorionic twins.
Journal of Perinatology, 2013. 33(6): p. 457-461.
67. Equinovarus, T. and M. Varus, Family studies and the cause of congenital club foot. 1964.
68. Byron-Scott, R., et al., A South Australian population-based study of congenital talipes
equinovarus. Paediatric and perinatal epidemiology, 2005. 19(3): p. 227-237.
69. Parker, S.E., et al., Multistate study of the epidemiology of clubfoot. Birth Defects Research Part
A: Clinical and Molecular Teratology, 2009. 85(11): p. 897-904.
70. Siapkara, A. and R. Duncan, Congenital talipes equinovarus a review of current management.
Journal of Bone & Joint Surgery, British Volume, 2007. 89(8): p. 995-1000.
71. Miedzybrodzka, Z., Congenital talipes equinovarus (clubfoot): a disorder of the foot but not the
hand. Journal of anatomy, 2003. 202(1): p. 37-42.
72. Honein, M.A., L.J. Paulozzi, and C.A. Moore, Family history, maternal smoking, and clubfoot: an
indication of a gene-environment interaction. American journal of epidemiology, 2000. 152(7): p.
658-665.
73. Alderman, B.W., E.R. Takahashi, and M.K. LeMier, Risk indicators for talipes equinovarus in
Washington State, 1987-1989. Epidemiology, 1991: p. 289-292.
74. Hollier, L.M., et al., Maternal age and malformations in singleton births. Obstetrics &
Gynecology, 2000. 96(5, Part 1): p. 701-706.
75. Carey, M., et al., Risk factors for isolated talipes equinovarus in Western Australia, 1980–1994.
Paediatric and Perinatal epidemiology, 2005. 19(3): p. 238-245.
76. Cardy, A., et al., Pedigree analysis and epidemiological features of idiopathic congenital talipes
equinovarus in the United Kingdom: a case-control study. BMC musculoskeletal disorders, 2007.
8(1): p. 1.
77. Correa, A., et al., Reporting birth defects surveillance data 1968-2003. Birth defects research.
Part A, Clinical and molecular teratology, 2007. 79(2): p. 65.
78. Myrianthopoulos, N.C., Congenital malformations in twins: epidemiologic survey. Birth defects
original article series, 1974. 11(8): p. 1-39.
79. Engell, V., et al., Club foot: A twin study. Journal of bone and joint surgery. British volume,
2006. 88(3): p. 374-376.
80. Watkins, S.E., et al., Classification, epidemiology, and genetics of orofacial clefts. Clinics in plastic
surgery, 2014. 41(2): p. 149-163.
81. Tanaka, S.A., et al., Updating the epidemiology of cleft lip with or without cleft palate. Plastic and
reconstructive surgery, 2012. 129(3): p. 511e-518e.
82. Mossey, P.A., et al., Cleft lip and palate. The Lancet, 2009. 374(9703): p. 1773-1785.
83. Lozano, R., et al., Global and regional mortality from 235 causes of death for 20 age groups in
1990 and 2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet,
2013. 380(9859): p. 2095-2128.
84. Wong, F. and U. Hagg, An update on the aetiology of orofacial clefts. Hong Kong Medical Journal,
2004. 10(5): p. 331-336.
85. Little, J. and E. Bryan. Congenital anomalies in twins. in Seminars in perinatology. 1986. Elsevier.
86. Wyszynski, D.F., et al., Survey of genetic counselors and clinical geneticists regarding recurrence
risks for families with nonsyndromic cleft lip with or without cleft palate. American journal of
medical genetics, 1998. 79(3): p. 184-190.
87. Stanier, P. and G.E. Moore, Genetics of cleft lip and palate: syndromic genes contribute to the
incidence of non-syndromic clefts. Human molecular genetics, 2004. 13(suppl 1): p. R73-R81.
88. Loenarz, C., et al., PHF8, a gene associated with cleft lip/palate and mental retardation, encodes
for an Nε-dimethyl lysine demethylase. Human molecular genetics, 2010. 19(2): p. 217-222.
89. Little, J., A. Cardy, and R.G. Munger, Tobacco smoking and oral clefts: a meta-analysis. Bulletin
of the World Health Organization, 2004. 82(3): p. 213-218.
90. Honein, M.A., et al., Maternal smoking and environmental tobacco smoke exposure and the risk
of orofacial clefts. Epidemiology, 2007. 18(2): p. 226-233.
91. US Department of Health and Human Services, The health consequences of smoking--50 years of
progress. A report of the Surgeon General, 2014.
92. Christensen, K. and P. Fogh-Andersen, Isolated cleft palate in Danish multiple births, 1970-1990.
The Cleft palate-craniofacial journal, 1993. 30(5): p. 469-474.
93. Christensen, K. and P. Fogh-Andersen, Cleft lip (±cleft palate) in Danish twins, 1970–1990.
American journal of medical genetics, 1993. 47(6): p. 910-916.
94. Mitchell, L.E. and K. Christensen, Evaluation of family history data for Danish twins with
nonsyndromic cleft lip with or without cleft palate. American journal of medical genetics, 1997.
72(1): p. 120-121.
95. Christensen, K., The 20th century Danish facial cleft population-epidemiological and genetic-
epidemiological studies. The Cleft palate-craniofacial journal, 1999. 36(2): p. 96-104.
96. Grosen, D., et al., Recurrence risk for offspring of twins discordant for oral cleft: a population-
based cohort study of the Danish 1936-2004 cleft twin cohort. Am J Med Genet A, 2010.
152A(10): p. 2468-74.
97. Grosen, D., et al., Risk of oral clefts in twins. Epidemiology, 2011. 22(3): p. 313-9.
98. Nordström, R.E., et al., Cleft-twin sets in Finland 1948-1987. The Cleft palate-craniofacial journal,
1996. 33(4): p. 340-347.
99. Lin, Y., et al., Cleft of the lip and palate in twins. Changgeng yi xue za zhi/Changgeng ji nian yi
yuan= Chang Gung medical journal/Chang Gung Memorial Hospital, 1999. 22(1): p. 61-67.
100. Mitchell, L.E., et al., Spina bifida. The Lancet, 2004. 364(9448): p. 1885-1895.
101. Sandler, A.D., Children with spina bifida: key clinical issues. Pediatric Clinics of North America,
2010. 57(4): p. 879-892.
102. Burn, S. and J. Drake, Spina Bifida: Management and Outcome. JAMA, 2009. 301(4): p. 439-440.
103. Parker, S.E., et al., Updated national birth prevalence estimates for selected birth defects in the
United States, 2004–2006. Birth Defects Research Part A: Clinical and Molecular Teratology,
2010. 88(12): p. 1008-1016.
104. Mathews, T., M.A. Honein, and J.D. Erickson, Spina bifida and anencephaly prevalence—United
States, 1991–2001. MMWR Recomm Rep, 2002. 51(RR-13): p. 9-11.
105. Radcliff, E., et al., Hospital use, associated costs, and payer status for infants born with spina
bifida. Birth Defects Research Part A: Clinical and Molecular Teratology, 2012. 94(12): p. 1044-
1053.
106. Barkai, G., et al., Frequency of Down's syndrome and neural-tube defects in the same family. The
Lancet, 2003. 361(9366): p. 1331-1335.
107. Au, K.S., A. Ashley-Koch, and H. Northrup, Epidemiologic and genetic aspects of spina bifida
and other neural tube defects. Developmental disabilities research reviews, 2010. 16(1): p. 6-15.
108. Mercuri, E. and C. Longman, Congenital muscular dystrophy. Pediatric annals, 2005. 34(7): p.
560-568.
109. Sparks, S., et al., Congenital muscular dystrophy overview. 2012.
110. Mostacciuolo, M.L., et al., Genetic epidemiology of congenital muscular dystrophy in a sample
from north-east Italy. Human genetics, 1996. 97(3): p. 277-279.
111. Darin, N. and M. Tulinius, Neuromuscular disorders in childhood: a descriptive epidemiological
study from western Sweden. Neuromuscular Disorders, 2000. 10(1): p. 1-9.
112. Muntoni, F. and T. Voit, The congenital muscular dystrophies in 2004: a century of exciting
progress. Neuromuscular Disorders, 2004. 14(10): p. 635-649.
113. Griggs, R.C. and A.A. Amato, Muscular Dystrophies: Handbook of Clinical Neurology. Chapter 4
Congenital Muscular Dystrophy (Series Editors: Aminoff, Boller and Swaab) Vol. 101. 2011:
Elsevier.
114. Emery, A.E., The muscular dystrophies. The Lancet, 2002. 359(9307): p. 687-695.
115. Krigger, K.W., Cerebral palsy: an overview. Am Fam Physician, 2006. 73(1): p. 91-100.
116. Cans, C., J. De-la-Cruz, and M.-A. Mermet, Epidemiology of cerebral palsy. Paediatrics and Child
Health, 2008. 18(9): p. 393-398.
117. Paneth, N., T. Hong, and S. Korzeniewski, The descriptive epidemiology of cerebral palsy. Clinics
in perinatology, 2006. 33(2): p. 251-267.
118. Lynex, C.N., et al., Homozygosity for a missense mutation in the 67 kDa isoform of glutamate
decarboxylase in a family with autosomal recessive spastic cerebral palsy: parallels with Stiff-
Person Syndrome and other movement disorders. BMC Neurol, 2004. 4(1): p. 20.
119. Herder, G., [Cerebral palsy among children in Nordland 1977-91. Occurrence, etiology, disability].
Tidsskrift for den Norske laegeforening: tidsskrift for praktisk medicin, ny raekke, 1998. 118(5): p.
706-709.
120. Pharoah, P., et al., Epidemiology of cerebral palsy in England and Scotland, 1984–9. Archives of
Disease in Childhood-Fetal and Neonatal Edition, 1998. 79(1): p. F21-F25.
121. Liu, J., et al., Prevalence of cerebral palsy in China. International journal of epidemiology, 1999.
28(5): p. 949-954.
122. Odding, E., M.E. Roebroeck, and H.J. Stam, The epidemiology of cerebral palsy: incidence,
impairments and risk factors. Disability and rehabilitation, 2006. 28(4): p. 183-191.
123. Dowding, V. and C. Barry, Cerebral palsy: social class differences in prevalence in relation to
birthweight and severity of disability. Journal of Epidemiology and Community Health, 1990.
44(3): p. 191-195.
124. Dolk, H., S. Pattenden, and A. Johnson, Cerebral palsy, low birthweight and socio-economic
deprivation: inequalities in a major cause of childhood disability. Paediatric and perinatal
epidemiology, 2001. 15(4): p. 359-363.
125. Pharoah, P. and T. Cooke, Cerebral palsy and multiple births. Archives of Disease in Childhood-
Fetal and Neonatal Edition, 1996. 75(3): p. F174-F177.
126. Tollanes, M.C., et al., Familial risk of cerebral palsy: population based cohort study. BMJ, 2014.
349: p. g4294.
127. Pharoah, P., T. Price, and R. Plomin, Cerebral palsy in twins: a national study. Archives of Disease
in Childhood-Fetal and Neonatal Edition, 2002. 87(2): p. F122-F124.
128. Scher, A.I., The Risk of Mortality or Cerebral Palsy in Twins: A Collaborative Population-Based
Study. Pediatric Research, 2002. 52(5): p. 671-681.
129. Hemminki, K., et al., High familial risks for cerebral palsy implicate partial heritable aetiology.
Paediatric and perinatal epidemiology, 2007. 21(3): p. 235-241.
130. Blair, E. and L. Watson. Epidemiology of cerebral palsy. in Seminars in Fetal and Neonatal
Medicine. 2006. Elsevier.
131. Elzouki, A.Y., et al., Textbook of clinical pediatrics. pg.602. 2011: Springer Science & Business
Media.
132. Lasak, J.M., et al., Hearing loss: Diagnosis and management. Primary Care: Clinics in Office
Practice, 2014. 41(1): p. 19-31.
133. Fisch, L., Epidemiology of congenital hearing loss. Audiology, 1973. 12(5-6): p. 411-425.
134. Zelante, L., et al., Connexin26 mutations associated with the most common form of non-
syndromic neurosensory autosomal recessive deafness (DFNB1) in Mediterraneans. Human
molecular genetics, 1997. 6(9): p. 1605-1609.
135. Russ, S.A., et al., Epidemiology of congenital hearing loss in Victoria, Australia: Epidemiología de
la hipoacusia congénita en Victoria, Australia. International journal of audiology, 2003. 42(7): p.
385-390.
136. Yoshikawa, S., et al., The effects of hypoxia, premature birth, infection, ototoxic drugs,
circulatory system and congenital disease on neonatal hearing loss. Auris Nasus Larynx, 2004.
31(4): p. 361-368.
137. Kountakis, S.E., et al., Risk factors for hearing loss in neonates: a prospective study. American
journal of otolaryngology, 2002. 23(3): p. 133-137.
138. Church, M.W. and J.A. Kaltenbach, Hearing, speech, language, and vestibular disorders in the
fetal alcohol syndrome: a literature review. Alcoholism: Clinical and Experimental Research,
1997. 21(3): p. 495-512.
139. Mohney, B.G., Common forms of childhood strabismus in an incidence cohort. American journal
of ophthalmology, 2007. 144(3): p. 465-467.
140. Scholar, B.S.B.H., Prevalence of Amblyopia and Strabismus in African American and Hispanic
Children Ages 6 to 72 Months.
141. Roberts, J., M. Rowland, and N.C.f.H. Statistics, Refraction status and motility defects of persons
4-74 years, United States, 1971-1972. 1978: US Department of Health, Education, and Welfare,
Public Health Service, National Center for Health Statistics.
142. Ing, M. and S. Pang, The racial distribution of strabismus. Strabismus. New York: Grune &
Stratton, 1978: p. 107-9.
143. Maumenee, I.H., et al., Inheritance of congenital esotropia. Transactions of the American
Ophthalmological Society, 1986. 84: p. 85.
144. Griffin, J., et al., Heredity in congenital esotropia. Journal of the American Optometric
Association, 1979. 50(11): p. 1237-1242.
145. Schlossman, A. and B.S. Priestley, Role of heredity in etiology and treatment of strabismus. AMA
archives of ophthalmology, 1952. 47(1): p. 1-20.
146. Falls, H.F., Ocular changes in mongolism. Annals of the New York Academy of Sciences, 1970.
171(2): p. 627-636.
147. Harcourt, B., Strabismus affecting children with multiple handicaps. The British journal of
ophthalmology, 1974. 58(3): p. 272.
148. Breakey, A.S., Ocular findings in cerebral palsy. Archives of Ophthalmology, 1955. 53(6): p. 852.
149. Seaber, J.H. and A. Chandler, A five-year study of patients with cerebral palsy and strabismus.
Orthoptics: past, present, and future. New York: Stratton Intercontinental Medical Book Corp,
1976: p. 271-7.
150. Carruthers, J.D., Strabismus in craniofacial dysostosis. Graefe's archive for clinical and
experimental ophthalmology, 1988. 226(3): p. 230-234.
151. Miller, M. and E. Folk, Strabismus associated with craniofacial anomalies. The American
orthoptic journal, 1974. 25: p. 27-37.
152. Cosgrave, E., C. Scott, and R. Goble, Ocular findings in low birthweight and premature babies in
the first year: Do we need to screen? European Journal of Ophthalmology, 2008. 18(1): p. 104-
111.
153. Christianson, R.E., The relationship between maternal smoking and the incidence of
congenital anomalies. American Journal of Epidemiology, 1980. 112(5): p. 684-695.
154. Hakim, R.B. and J.M. Tielsch, Maternal cigarette smoking during pregnancy: a risk factor for
childhood strabismus. Archives of Ophthalmology, 1992. 110(10): p. 1459-1462.
155. Wilmer, J.B. and B.T. Backus, Genetic and environmental contributions to strabismus and phoria:
evidence from twins. Vision research, 2009. 49(20): p. 2485-2493.
156. Martin, N., D. Boomsma, and G. Machin, A twin-pronged attack on complex traits. Nat Genet,
1997. 17(4): p. 387-92.
157. Evans, D.M. and N.G. Martin, The validity of twin studies. GeneScreen, 2000. 1(2): p. 77-79.
158. Tishler, P.V. and V.J. Carey, Can comparison of MZ-and DZ-twin concordance rates be used
invariably to estimate heritability? Twin Research and Human Genetics, 2007. 10(05): p. 712-717.
159. Glinianaia, S.V., J. Rankin, and C. Wright, Congenital anomalies in twins: a register-based study.
Hum Reprod, 2008. 23(6): p. 1306-11.
160. Mastroiacovo, P., et al., Congenital malformations in twins: an international study. Am J Med
Genet, 1999. 83(2): p. 117-24.
161. Ramos-Arroyo, M.A., Birth defects in twins: study in a Spanish population. Acta Genet Med
Gemellol (Roma), 1991. 40(3-4): p. 337-44.
162. Doyle, P.E., et al., Congenital malformations in twins in England and Wales. J Epidemiol
Community Health, 1991. 45(1): p. 43-8.
163. Kallen, B., Congenital malformations in twins: a population study. Acta Genet Med Gemellol
(Roma), 1986. 35(3-4): p. 167-78.
164. Cockburn, M., et al., The occurrence of chronic disease and other conditions in a large
population-based cohort of native Californian twins. Twin Res, 2002. 5(5): p. 460-7.
165. Cockburn, M.G., et al., Development and representativeness of a large population-based cohort
of native Californian twins. Twin Res, 2001. 4(4): p. 242-50.
166. Cozen, W., et al., The USC Adult Twin Cohorts: International Twin Study and California Twin
Program. Twin Res Hum Genet, 2013. 16(1): p. 366-70.
167. Witte, J.S., J.B. Carlin, and J.L. Hopper, Likelihood-based approach to estimating twin
concordance for dichotomous traits. Genet Epidemiol, 1999. 16(3): p. 290-304.
168. Dobbs, M.B. and C.A. Gurnett, Update on clubfoot: etiology and treatment. Clinical orthopaedics
and related research, 2009. 467(5): p. 1146-1153.
169. Gurnett, C.A., et al., Asymmetric lower-limb malformations in individuals with homeobox PITX1
gene mutation. Am J Hum Genet, 2008. 83(5): p. 616-22.
170. Gurnett, C.A., et al., Impact of congenital talipes equinovarus etiology on treatment outcomes.
Dev Med Child Neurol, 2008. 50(7): p. 498-502.
171. Hootnick, D.R., et al., Congenital arterial malformations associated with clubfoot. A report of
two cases. Clin Orthop Relat Res, 1982(167): p. 160-3.
172. Dunn, P.M., Congenital postural deformities: perinatal associations. Proc R Soc Med, 1972. 65(8):
p. 735-8.
173. Bonnell, J. and R.L. Cruess, Anomalous insertion of the soleus muscle as a cause of fixed equinus
deformity. A case report. J Bone Joint Surg Am, 1969. 51(5): p. 999-1000.
174. Engell, V., et al., Club foot: a twin study. Journal of Bone & Joint Surgery, British Volume, 2006.
88(3): p. 374-376.
175. Engle, E.C., Genetic basis of congenital strabismus. Archives of ophthalmology, 2007. 125(2): p.
189-195.
176. Cox, T.C., Taking it to the max: the genetic and developmental mechanisms coordinating
midfacial morphogenesis and dysmorphology. Clinical genetics, 2004. 65(3): p. 163-176.
177. Beaty, T.H., et al., Evidence for gene-environment interaction in a genome wide study of
nonsyndromic cleft palate. Genetic epidemiology, 2011. 35(6): p. 469-478.
178. Siderius, L.E., et al., X-linked mental retardation associated with cleft lip/palate maps to
Xp11.3-q21.3. American journal of medical genetics, 1999. 85(3): p. 216-220.
179. De Santis, M., et al., Paternal exposure and counselling: experience of a Teratology Information
Service. Reproductive Toxicology, 2008. 26(1): p. 42-46.
180. Honein, M.A. and S.A. Rasmussen, Further evidence for an association between maternal
smoking and craniosynostosis. Teratology, 2000. 62(3): p. 145-146.
181. Dickinson, K.C., R.E. Meyer, and J. Kotch, Maternal smoking and the risk for clubfoot in infants.
Birth Defects Research Part A: Clinical and Molecular Teratology, 2008. 82(2): p. 86-91.
182. Skelly, A.C., et al., Talipes equinovarus and maternal smoking: A population-based case-control
study in Washington state. Teratology, 2002. 66(2): p. 91-100.
183. Kancherla, V., et al., Epidemiology of congenital idiopathic talipes equinovarus in Iowa, 1997–
2005. American Journal of Medical Genetics Part A, 2010. 152(7): p. 1695-1700.
184. Chew, E., et al., Risk factors for esotropia and exotropia. Archives of ophthalmology, 1994.
112(10): p. 1349-1355.
185. Orlebeke, J. and F. Koole, The effects of parental smoking and heredity on the etiology of
childhood strabismus: a twin study. Human & Experimental Toxicology, 1999. 18(4): p. 304-304.
186. Maconachie, G.D., I. Gottlob, and R.J. McLean, Risk factors and genetics in common comitant
strabismus: a systematic review of the literature. JAMA ophthalmology, 2013. 131(9): p. 1179-
1186.
187. Savitz, D.A., P.J. Schwingl, and M.A. Keels, Influence of paternal age, smoking, and alcohol
consumption on congenital anomalies. Teratology, 1991. 44(4): p. 429-440.
188. Wasserman, C.R., et al., Parental cigarette smoking and risk for congenital anomalies of the
heart, neural tube, or limb. Teratology, 1996. 53(4): p. 261-267.
189. Leonardi-Bee, J., J. Britton, and A. Venn, Secondhand smoke and adverse fetal outcomes in
nonsmoking pregnant women: a meta-analysis. Pediatrics, 2011. 127(4): p. 734-741.
190. Maconachie, G.D., I. Gottlob, and R.J. McLean, Risk factors and genetics in common comitant
strabismus: a systematic review of the literature. JAMA Ophthalmol, 2013. 131(9): p. 1179-86.
191. Zhang, J., et al., A case-control study of paternal smoking and birth defects. International journal
of epidemiology, 1992. 21(2): p. 273-278.
192. Podgor, M.J., N.A. Remaley, and E. Chew, Associations between siblings for esotropia and
exotropia. Archives of ophthalmology, 1996. 114(6): p. 739-744.
193. Richards, A.A. and V. Garg, Genetics of congenital heart disease. Current cardiology reviews,
2010. 6(2): p. 91.
194. Wilson, J.M., Essential Cardiology: Principles and Practice. Texas Heart Institute Journal/from the
Texas Heart Institute of St. Luke's Episcopal Hospital, Texas Children's Hospital, 2005. 32(4): p.
616.
195. Lucile Packard Children's Hospital at Stanford, Factors Contributing to Congenital Heart Disease. Retrieved 30 July 2010.
196. Mills, J.L., et al., Maternal obesity and congenital heart defects: a population-based study. The
American journal of clinical nutrition, 2010. 91(6): p. 1543-1549.
197. Ingalls, T.H., T.F. Pugh, and B. MacMahon, Incidence of anencephalus, spina bifida, and
hydrocephalus related to birth rank and maternal age. British journal of preventive & social
medicine, 1954. 8(1): p. 17-23.
198. Hay, S. and H. Barbano, Independent effects of maternal age and birth order on the incidence of
selected congenital malformations. Teratology, 1972. 6(3): p. 271-279.
199. Janerich, D.T., Maternal age and spina bifida: longitudinal versus cross-sectional analysis.
American journal of epidemiology, 1972. 96(6): p. 389-395.
200. Seidman, D.S., P. Ever-Hadani, and R. Gale, Effect of maternal smoking and age on congenital
anomalies. Obstetrics & Gynecology, 1990. 76(6): p. 1046-1050.
201. Baird, P.A., A.D. Sadovnick, and I.M. Yee, Maternal age and birth defects: a population study.
The Lancet, 1991. 337(8740): p. 527-530.
202. Milner, M., et al., The impact of maternal age on pregnancy and its outcome. International
Journal of Gynecology & Obstetrics, 1992. 38(4): p. 281-286.
203. Reefhuis, J. and M.A. Honein, Maternal age and non-chromosomal birth defects, Atlanta -
1968-2000: Teenager or thirty-something, who is at risk? Birth Defects Research Part A:
Clinical and Molecular Teratology, 2004. 70(9): p. 572-579.
204. Green, R.F., et al., Association of paternal age and risk for major congenital anomalies from the
National Birth Defects Prevention Study, 1997 to 2004. Ann Epidemiol, 2010. 20(3): p. 241-9.
205. Gill, S.K., et al., Association between maternal age and birth defects of unknown
etiology―United States, 1997–2007. Birth Defects Research Part A: Clinical and Molecular
Teratology, 2012. 94(12): p. 1010-1018.
206. Croen, L.A. and G.M. Shaw, Young maternal age and congenital malformations: a population-
based study. American journal of public health, 1995. 85(5): p. 710-713.
207. Birth Defects among Low Birth Weight Infants - Metropolitan Atlanta, 1978-1988. Morbidity and
Mortality Weekly Report: MMWR, 1991. 40(6): p. 99-106.
208. Jackson, R.W., et al., Determination of twin zygosity: a comparison of DNA with various
questionnaire indices. Twin Res, 2001. 4(1): p. 12-8.
209. Reed, T., et al., Verification of self-report of zygosity determined via DNA testing in a subset of
the NAS-NRC twin registry 40 years later. Twin Research and Human Genetics, 2005. 8(04): p.
362-367.
210. MacGregor, A.J., et al., Twins: novel uses to study complex traits and genetic diseases. Trends in
Genetics, 2000. 16(3): p. 131-134.
211. Boomsma, D., A. Busjahn, and L. Peltonen, Classical twin studies and beyond. Nature reviews
genetics, 2002. 3(11): p. 872-882.
212. Snieder, H., X. Wang, and A.J. MacGregor, Twin methodology. eLS, 2010.
Table 2.1. Demographic characteristics by birth anomaly status among 20,803 twin pairs participating in the
California Twin Program (Birth cohort 1957-1982). Affected pairs are twin pairs with at least one selected birth
anomaly; unaffected pairs are twin pairs without any selected birth anomaly.
Characteristic | Affected Twin Pairs (N=1,432), N (%) | Unaffected Twin Pairs (N=19,371), N (%)
Sex Pair
Male-Male 365 (25.49) 5450 (28.13)
Male-Female 416 (29.05) 5568 (28.74)
Female-Female 651 (45.46) 8353 (43.12)
Zygosity
MZ 414 (28.91) 6338 (32.72)
DZ 966 (67.46) 12344 (63.72)
Unknown 52 (3.63) 689 (3.56)
Race/Ethnicity
White 1086 (75.84) 13318 (68.75)
Hispanic 150 (10.47) 2880 (14.87)
African American 45 (3.14) 905 (4.67)
Others 104 (7.26) 1563 (8.07)
Unknown 47 (3.28) 705 (3.64)
County at Birth
Urban County 1365 (95.32) 18597 (96.00)
Rural County 64 (4.47) 747 (3.86)
Unknown 3 (0.21) 27 (0.14)
Birth Order
1st Birth 400 (27.93) 5425 (28.01)
2nd Birth 427 (29.82) 5565 (28.73)
3rd Birth 250 (17.46) 3454 (17.83)
4th or later Birth 309 (21.58) 4336 (22.38)
Unknown 46 (3.21) 591 (3.05)
Smoking Exposure
Neither Smoked 361 (25.21) 5812 (30.00)
Only Father Smoked 245 (17.11) 3877 (20.01)
Only Mother Smoked 163 (11.38) 2250 (11.62)
Both smoked 548 (38.27) 6013 (31.04)
Unknown 115 (8.03) 1419 (7.33)
Maternal Age at Birth
<25 503 (35.13) 6434 (33.21)
25 - 29 394 (27.51) 5338 (27.56)
30 - 34 281 (19.62) 4095 (21.14)
>= 35 158 (11.03) 2179 (11.25)
Unknown 96 (6.70) 1325 (6.84)
Mother's Education
12 or less years 657 (45.88) 8860 (45.74)
13 or more years 647 (45.18) 8723 (45.03)
Unknown 128 (8.94) 1788 (9.23)
Father's Education
12 or less years 598 (41.76) 7602 (39.24)
13 or more years 638 (44.55) 9274 (47.88)
Unknown 196 (13.69) 2495 (12.88)
Response
Double 543 (37.92) 6704 (34.61)
Single 889 (62.08) 12667 (65.39)
Mean Age at completion of Questionnaire 31.81 ± 6.61 30.96 ± 6.77
Table 2.2. Frequency of birth anomalies among twin pairs by concordance and zygosity in the California Twin
Program (Birth cohort 1957-1982, N=20,803 pairs).
Birth Anomaly | Monozygotic Twin Pair, N (%) | Dizygotic Twin Pair, N (%) | Unknown Zygosity, N (%) | Total, N (%)
Clubfoot
Concordant disease pairs 5 (0.07) 4 (0.03) 1 (0.13) 10 (0.05)
Discordant disease pairs 17 (0.25) 82 (0.62) 1 (0.13) 100 (0.48)
Non-disease pairs 6730 (99.67) 13224 (99.35) 739 (99.73) 20693 (99.47)
Oral Cleft
Concordant disease pairs 2 (0.03) 2 (0.02) 0 (0) 4 (0.02)
Discordant disease pairs 7 (0.1) 36 (0.27) 1 (0.13) 44 (0.21)
Non-disease pairs 6743 (99.87) 13272 (99.71) 740 (99.87) 20755 (99.77)
Deafness
Concordant disease pairs 10 (0.15) 8 (0.06) 3 (0.4) 21 (0.1)
Discordant disease pairs 70 (1.04) 128 (0.96) 4 (0.54) 202 (0.97)
Non-disease pairs 6672 (98.82) 13174 (98.98) 734 (99.06) 20580 (98.93)
Cerebral Palsy
Concordant disease pairs 3 (0.04) 5 (0.04) 0 (0) 8 (0.04)
Discordant disease pairs 44 (0.65) 81 (0.61) 7 (0.94) 132 (0.63)
Non-disease pairs 6705 (99.3) 13224 (99.35) 734 (99.06) 20663 (99.33)
Muscular Dystrophy
Concordant disease pairs 2 (0.03) 1 (0.01) 1 (0.13) 4 (0.02)
Discordant disease pairs 3 (0.04) 12 (0.09) 2 (0.27) 17 (0.08)
Non-disease pairs 6747 (99.93) 13297 (99.9) 738 (99.6) 20782 (99.9)
Down Syndrome
Concordant disease pairs 0 (0) 0 (0) 0 (0) 0 (0)
Discordant disease pairs 0 (0) 26 (0.2) 1 (0.13) 27 (0.13)
Non-disease pairs 6752 (100) 13284 (99.8) 740 (99.87) 20776 (99.87)
Spina Bifida
Concordant disease pairs 1 (0.01) 0 (0) 0 (0) 1 (0)
Discordant disease pairs 16 (0.24) 33 (0.25) 3 (0.4) 52 (0.25)
Non-disease pairs 6735 (99.75) 13277 (99.75) 738 (99.6) 20750 (99.75)
Strabismus
Concordant disease pairs 33 (0.49) 27 (0.2) 5 (0.67) 65 (0.31)
Discordant disease pairs 161 (2.38) 412 (3.1) 19 (2.56) 592 (2.85)
Non-disease pairs 6558 (97.13) 12871 (96.7) 717 (96.76) 20146 (96.84)
Congenital Heart Defects
Concordant disease pairs 13 (0.19) 14 (0.11) 2 (0.27) 29 (0.14)
Discordant disease pairs 70 (1.04) 169 (1.27) 14 (1.89) 253 (1.22)
Non-disease pairs 6669 (98.77) 13127 (98.63) 725 (97.84) 20521 (98.64)
Any birth anomaly
Concordant disease pairs 71 (1.05) 76 (0.57) 11 (1.48) 158 (0.76)
Discordant disease pairs 343 (5.08) 890 (6.69) 41 (5.53) 1274 (6.12)
Non-disease pairs 6338 (93.87) 12344 (92.74) 689 (92.98) 19371 (93.12)
Total 6752 13310 741 20803
Table 2.3. Co-occurrence of multiple birth anomalies in the same individual from the California Twin Program (Birth cohort 1957-1982, N=20,803 pairs), shown in N (%).
Proband disease (column) | Clubfoot | Oral Cleft | Deafness | Cerebral Palsy | Muscular Dystrophy | Down Syndrome | Mental Retardation | Spina Bifida | Strabismus (Lazy Eye) | Congenital Heart Problem | Other
Clubfoot 120 (83.92) 2 (2.78) 1 (0.35) 2 (0.91) 2 (4.65) 0 (0) 2 (0.9) 1 (1.39) 5 (0.62) 4 (1.13) 4 (1.31)
Oral Cleft 2 (1.4) 52 (72.22) 1 (0.35) 1 (0.45) 2 (4.65) 1 (2.33) 5 (2.24) 0 (0) 3 (0.37) 2 (0.56) 3 (0.98)
Deafness 1 (0.7) 1 (1.39) 244 (86.22) 5 (2.27) 2 (4.65) 2 (4.65) 5 (2.24) 1 (1.39) 9 (1.11) 8 (2.25) 5 (1.63)
Cerebral Palsy 2 (1.4) 1 (1.39) 5 (1.77) 148 (67.27) 3 (6.98) 0 (0) 30 (13.45) 3 (4.17) 18 (2.22) 7 (1.97) 3 (0.98)
Muscular Dystrophy 2 (1.4) 2 (2.78) 2 (0.71) 3 (1.36) 25 (58.14) 1 (2.33) 2 (0.9) 2 (2.78) 1 (0.12) 2 (0.56) 1 (0.33)
Down Syndrome 0 (0) 1 (1.39) 2 (0.71) 0 (0) 1 (2.33) 27 (62.79) 7 (3.14) 1 (1.39) 0 (0) 4 (1.13) 0 (0)
Mental Retardation 2 (1.4) 5 (6.94) 5 (1.77) 30 (13.64) 2 (4.65) 7 (16.28) 143 (64.13) 4 (5.56) 11 (1.36) 7 (1.97) 7 (2.29)
Spina Bifida 1 (0.7) 0 (0) 1 (0.35) 3 (1.36) 2 (4.65) 1 (2.33) 4 (1.79) 54 (75) 3 (0.37) 2 (0.56) 1 (0.33)
Strabismus (Lazy Eye) 5 (3.5) 3 (4.17) 9 (3.18) 18 (8.18) 1 (2.33) 0 (0) 11 (4.93) 3 (4.17) 722 (89.25) 20 (5.63) 17 (5.56)
Congenital Heart Problem 4 (2.8) 2 (2.78) 8 (2.83) 7 (3.18) 2 (4.65) 4 (9.3) 7 (3.14) 2 (2.78) 20 (2.47) 286 (80.56) 13 (4.25)
Other 4 (2.8) 3 (4.17) 5 (1.77) 3 (1.36) 1 (2.33) 0 (0) 7 (3.14) 1 (1.39) 17 (2.1) 13 (3.66) 252 (82.35)
Total 143 72 283 220 43 43 223 72 809 355 306
Table 2.4. Co-occurrence of birth anomalies in twin pairs from the California Twin Program (Birth cohort 1957-1982, N=20,803 pairs), shown in N (%).
Co-twin's disease (Row) / Proband disease (Column) | Clubfoot | Oral Cleft | Deafness | Cerebral Palsy | Muscular Dystrophy | Down Syndrome | Spina Bifida | Strabismus (Lazy Eye) | Congenital Heart Problem
Clubfoot 10 (41.67) 2 (15.38) 0 (0) 2 (8.33) 0 (0) 0 (0) 1 (12.5) 3 (3.19) 6 (11.54)
Oral Cleft 2 (8.33) 4 (30.77) 2 (4.88) 1 (4.17) 2 (20) 1 (12.5) 1 (12.5) 1 (1.06) 0 (0)
Deafness 0 (0) 2 (15.38) 21 (51.22) 4 (16.67) 1 (10) 1 (12.5) 0 (0) 7 (7.45) 5 (9.62)
Cerebral Palsy 2 (8.33) 1 (7.69) 4 (9.76) 8 (33.33) 1 (10) 1 (12.5) 1 (12.5) 4 (4.26) 2 (3.85)
Muscular Dystrophy 0 (0) 2 (15.38) 1 (2.44) 1 (4.17) 4 (40) 4 (50) 1 (12.5) 1 (1.06) 0 (0)
Down Syndrome 0 (0) 0 (0) 1 (2.44) 1 (4.17) 0 (0) 0 (0) 0 (0) 0 (0) 0 (0)
Spina Bifida 1 (4.17) 1 (7.69) 0 (0) 1 (4.17) 1 (10) 1 (12.5) 1 (12.5) 3 (3.19) 0 (0)
Strabismus (Lazy Eye) 3 (12.5) 1 (7.69) 7 (17.07) 4 (16.67) 1 (10) 0 (0) 3 (37.5) 65 (69.15) 10 (19.23)
Congenital Heart Problem 6 (25) 0 (0) 5 (12.2) 2 (8.33) 0 (0) 0 (0) 0 (0) 10 (10.64) 29 (55.77)
Total 24 13 41 24 10 8 8 94 52
Table 2.5. Pairwise Concordance Ratio between monozygotic twins (MZ, N=6,752 pairs) and dizygotic like-sex twins (DZ like-sex, N=7,326
pairs) for each birth anomaly in the California Twin Program (Birth cohort 1957-1982).
Proband Zygosity | Concordant ($n_{11}$) | Discordant ($n_d$) | Pairwise concordance¹ (%) | SE² (%) | Concordance Ratio³ | P-value⁴
Clubfoot
MZ | 5 | 17 | 22.73 | 8.93 | 5.91 | 0.043
DZ like-sex | 2 | 50 | 3.85 | 2.67
DZ | 4 | 82 | 4.65 | 2.27
Oral Cleft
MZ | 2 | 7 | 22.22 | 13.86 | 4.89 | 0.224
DZ like-sex | 1 | 21 | 4.55 | 4.44
DZ | 2 | 36 | 5.26 | 3.62
Deafness
MZ | 10 | 70 | 12.50 | 3.70 | 2.25 | 0.129
DZ like-sex | 4 | 68 | 5.56 | 2.70
DZ | 8 | 128 | 5.88 | 2.02
Cerebral Palsy
MZ | 3 | 44 | 6.38 | 3.57 | 1.40 | 0.699
DZ like-sex | 2 | 42 | 4.55 | 3.14
DZ | 5 | 81 | 5.81 | 2.52
Muscular Dystrophy
MZ | 2 | 3 | 40.00 | 21.91 | 2.80 | 0.315
DZ like-sex | 1 | 6 | 14.29 | 13.23
DZ | 1 | 12 | 7.69 | 7.39
Down Syndrome
MZ | 0 | 0 | N.A | N.A | N.A | N.A
DZ like-sex | 0 | 13 | 0.00 | 0.00
DZ | 0 | 26 | 0.00 | 0.00
Spina Bifida
MZ | 1 | 16 | 5.88 | 5.71 | Inf. | 0.303
DZ like-sex | 0 | 18 | 0.00 | 0.00
DZ | 0 | 33 | 0.00 | 0.00
Strabismus
MZ | 33 | 161 | 17.01 | 2.70 | 2.52 | 0.001
DZ like-sex | 17 | 235 | 6.75 | 1.58
DZ | 27 | 412 | 6.15 | 1.15
Congenital Heart Defects
MZ | 13 | 70 | 15.66 | 3.99 | 1.90 | 0.121
DZ like-sex | 9 | 100 | 8.26 | 2.64
DZ | 14 | 169 | 7.65 | 1.96
Any Birth Anomaly
MZ | 71 | 343 | 17.15 | 1.85 | 2.19 | <0.0001
DZ like-sex | 43 | 507 | 7.82 | 1.14
DZ | 76 | 890 | 7.87 | 0.87
¹ % Pairwise concordance = $n_{11} / (n_{11} + n_d)$
² $\mathrm{SE} = \sqrt{\mathrm{Var}} = \sqrt{n_{11} n_d / (n_{11} + n_d)^3}$
³ Relative pairwise concordance = MZ (% pairwise concordance) / DZ like-sex (% pairwise concordance)
⁴ Chi-square tests to assess the difference in pairwise concordance rates between MZ and DZ like-sex
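The footnote formulas for Table 2.5 can be checked directly against the pair counts. The following is a minimal sketch, not the code used for the dissertation: the function names are illustrative, and returning `inf` for a zero DZ concordance and `nan` for a group with no affected pairs are conventions assumed here to mirror the "Inf." and "N.A" entries in the table.

```python
import math


def pairwise_concordance(n11, nd):
    """Footnote 1: concordant pairs divided by all affected pairs."""
    return n11 / (n11 + nd)


def concordance_se(n11, nd):
    """Footnote 2: sqrt(n11 * nd / (n11 + nd)^3)."""
    return math.sqrt(n11 * nd / (n11 + nd) ** 3)


def concordance_ratio(mz, dz):
    """Footnote 3: MZ concordance divided by DZ like-sex concordance.

    mz and dz are (concordant, discordant) pair counts.
    Assumed conventions: nan if a group has no affected pairs,
    inf if the DZ concordance is zero.
    """
    if sum(mz) == 0 or sum(dz) == 0:
        return math.nan
    dz_conc = pairwise_concordance(*dz)
    if dz_conc == 0:
        return math.inf
    return pairwise_concordance(*mz) / dz_conc


# Clubfoot row of Table 2.5: MZ 5 concordant / 17 discordant,
# DZ like-sex 2 concordant / 50 discordant.
print(round(100 * pairwise_concordance(5, 17), 2))      # 22.73
print(round(100 * concordance_se(5, 17), 2))            # 8.93
print(round(concordance_ratio((5, 17), (2, 50)), 2))    # 5.91
```

The printed values reproduce the MZ clubfoot concordance, its standard error, and the clubfoot concordance ratio reported in the table.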
Table 2.6. Parental smoking status and risk of birth anomalies in the California Twin Program (Birth cohort 1957-1982, N=20,803).
Affected (Concordant+Discordant) vs. Unaffected | Neither Smoking, N (%) | Only Father Smoking, N (%) | Only Mother Smoking, N (%) | Both Smoking, N (%) | Father only vs. Neither, OR Adj.* (95% CI) | Mother only vs. Neither, OR Adj.* (95% CI) | Both vs. Neither, OR Adj.* (95% CI) | P-value for quadratic trend
Clubfoot
Affected (C+D) pairs | 23 (0.45) | 10 (0.30) | 11 (0.60) | 39 (0.76) | 0.63 (0.30 - 1.34) | 1.27 (0.61 - 2.61) | 1.53 (0.90 - 2.60) | 0.029
Unaffected pairs | 5086 (99.55) | 3303 (99.70) | 1837 (99.40) | 5123 (99.24) | 1 (ref) | 1 (ref) | 1 (ref)
Oral Cleft
Affected (C+D) pairs | 13 (0.25) | 6 (0.18) | 7 (0.38) | 10 (0.19) | 0.67 (0.25 - 1.80) | 1.47 (0.58 - 3.72) | 0.72 (0.31 - 1.67) | 0.517
Unaffected pairs | 5096 (99.75) | 3307 (99.82) | 1841 (99.62) | 5152 (99.81) | 1 (ref) | 1 (ref) | 1 (ref)
Deafness
Affected (C+D) pairs | 50 (0.98) | 39 (1.18) | 13 (0.70) | 60 (1.16) | 1.13 (0.73 - 1.73) | 0.69 (0.37 - 1.27) | 1.10 (0.75 - 1.62) | 0.291
Unaffected pairs | 5059 (99.02) | 3274 (98.82) | 1835 (99.30) | 5102 (98.84) | 1 (ref) | 1 (ref) | 1 (ref)
Cerebral Palsy
Affected (C+D) pairs | 36 (0.70) | 19 (0.57) | 10 (0.54) | 37 (0.72) | 0.92 (0.52 - 1.62) | 0.83 (0.41 - 1.68) | 1.16 (0.72 - 1.85) | 0.923
Unaffected pairs | 5073 (99.30) | 3294 (99.43) | 1838 (99.46) | 5125 (99.28) | 1 (ref) | 1 (ref) | 1 (ref)
Muscular Dystrophy
Affected (C+D) pairs | 6 (0.12) | 1 (0.03) | 1 (0.05) | 5 (0.10) | 0.23 (0.03 - 1.90) | 0.42 (0.05 - 3.54) | 0.70 (0.21 - 2.36) | 0.864
Unaffected pairs | 5103 (99.88) | 3312 (99.97) | 1847 (99.95) | 5157 (99.90) | 1 (ref) | 1 (ref) | 1 (ref)
Down Syndrome
Affected (C+D) pairs | 11 (0.22) | 4 (0.12) | 0 (0.00) | 5 (0.10) | 0.43 (0.13 - 1.37) | <0.001 (N.A.) | 0.34 (0.12 - 1.01) | 0.965
Unaffected pairs | 5098 (99.79) | 3309 (99.88) | 1848 (100.00) | 5157 (99.90) | 1 (ref) | 1 (ref) | 1 (ref)
Spina Bifida
Affected (C+D) pairs | 7 (0.14) | 4 (0.12) | 4 (0.22) | 25 (0.48) | 0.88 (0.25 - 3.03) | 1.60 (0.47 - 5.51) | 3.48 (1.48 - 8.18) | 0.026
Unaffected pairs | 5102 (99.86) | 3309 (99.88) | 1844 (99.78) | 5137 (99.52) | 1 (ref) | 1 (ref) | 1 (ref)
Strabismus
Affected (C+D) pairs | 130 (2.54) | 76 (2.29) | 54 (2.92) | 202 (3.91) | 0.91 (0.68 - 1.22) | 1.19 (0.86 - 1.65) | 1.61 (1.28 - 2.03) | 0.0005
Unaffected pairs | 4979 (97.46) | 3237 (97.71) | 1794 (97.08) | 4960 (96.09) | 1 (ref) | 1 (ref) | 1 (ref)
Congenital Heart Defects
Affected (C+D) pairs | 66 (1.29) | 51 (1.54) | 28 (1.52) | 82 (1.59) | 1.19 (0.82 - 1.73) | 1.18 (0.76 - 1.85) | 1.23 (0.88 - 1.72) | 0.491
Unaffected pairs | 5043 (98.71) | 3262 (98.46) | 1820 (98.48) | 5080 (98.41) | 1 (ref) | 1 (ref) | 1 (ref)
Any Birth Defects
Affected (C+D) pairs | 305 (5.97) | 188 (5.67) | 120 (6.49) | 433 (8.39) | 0.94 (0.78 - 1.14) | 1.10 (0.88 - 1.37) | 1.43 (1.23 - 1.67) | 0.0003
Unaffected pairs | 4804 (94.03) | 3125 (94.33) | 1728 (93.51) | 4729 (91.61) | 1 (ref) | 1 (ref) | 1 (ref)
* Adjusted for zygosity, gender, maternal age, parental education and birth order
Table 2.7. Pairwise Concordance Ratio between monozygotic twins (MZ, N=6,752 pairs) and dizygotic like-sex twins (DZ like-sex, N=7,326 pairs) for each
birth anomaly identified in California Twin Program (Birth cohort 1957-1982), stratified by parental smoking status.
Proband Zygosity | At least one parent smoked: Pairwise concordance (%) | Concordance Ratio | P-value | Neither parent smoked: Pairwise concordance (%) | Concordance Ratio | P-value | CRR*
Clubfoot
MZ | 25.00 | 10.00 | 0.020 | 16.67 | 2.00 | 1.000 | 5.00
DZ like-sex | 2.50 | | | 8.33
Cleft Lip/Cleft Palate
MZ | 33.33 | 5.67 | 0.155 | 0.00 | N.A | N.A | N.A
DZ like-sex | 5.88 | | | 0.00
Deafness
MZ | 12.77 | 1.91 | 0.329 | 11.11 | Inf. | 0.539 | N.A
DZ like-sex | 6.67 | | | 0.00
Cerebral Palsy
MZ | 3.33 | 0.47 | 0.605 | 11.76 | Inf. | 0.516 | N.A
DZ like-sex | 7.14 | | | 0.00
Muscular Dystrophy
MZ | 50.00 | N.A | 0.167 | 0.00 | 0.00 | 1.000 | N.A
DZ like-sex | 0.00 | | | 50.00
Down Syndrome
MZ | N.A | N.A | N.A | N.A | N.A | N.A | N.A
DZ like-sex | 0.00 | | | 0.00
Spina Bifida
MZ | 10.00 | Inf. | 0.400 | 0.00 | N.A | N.A | N.A
DZ like-sex | 0.00 | | | 0.00
Strabismus (Lazy Eye)
MZ | 19.66 | 3.40 | 0.0003 | 13.11 | 1.16 | 1.000 | 2.93
DZ like-sex | 5.79 | | | 11.32
Congenital Heart Problem
MZ | 16.33 | 1.73 | 0.274 | 16.67 | 3.67 | 0.226 | 0.47
DZ like-sex | 9.41 | | | 4.55
Any Birth Anomaly
MZ | 18.50 | 2.34 | <.0001 | 15.91 | 1.81 | 0.123 | 1.29
DZ like-sex | 7.89 | | | 8.77
* CRR = ratio of concordance ratios, comparing pairs with at least one parent who smoked vs. pairs in which neither parent smoked
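The CRR defined in the footnote is a ratio of two MZ to DZ like-sex concordance ratios, one per smoking stratum. As a minimal sketch (the function names are illustrative, and treating a stratum with zero DZ concordance as undefined is an assumption made here to match the "N.A" entries), it can be recomputed from the tabulated concordance percentages:

```python
import math


def stratum_ratio(mz_pct, dz_pct):
    """MZ / DZ like-sex concordance within one smoking stratum.

    Assumption for this sketch: the ratio is treated as undefined (nan)
    when the DZ like-sex concordance is zero.
    """
    if dz_pct == 0:
        return math.nan
    return mz_pct / dz_pct


def ratio_of_concordance_ratios(smoke_mz, smoke_dz, none_mz, none_dz):
    """CRR: concordance ratio when at least one parent smoked,
    divided by the concordance ratio when neither parent smoked."""
    exposed = stratum_ratio(smoke_mz, smoke_dz)
    unexposed = stratum_ratio(none_mz, none_dz)
    if math.isnan(exposed) or math.isnan(unexposed) or unexposed == 0:
        return math.nan
    return exposed / unexposed


# Strabismus row of Table 2.7 (concordance percentages).
print(round(ratio_of_concordance_ratios(19.66, 5.79, 13.11, 11.32), 2))  # 2.93
```

Because the inputs are already rounded to two decimals, recomputed values can differ from the table in the last digit for other rows.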
Table 2.8. Maternal age and risk of birth anomalies in the California Twin Program (Birth cohort 1957-1982,
N=20,803 pairs).
Affected (Concordant+Discordant) vs. Unaffected | Maternal age < 30 years, N (%) | Maternal age >= 30 years, N (%) | >= 30 vs. < 30: OR Adj.* | 95% CI | P-value
Clubfoot
Affected (C+D) pairs 63 (0.58) 23 (0.41) 0.73 0.43 - 1.24 0.246
Unaffected pairs 10725 (99.42) 5639 (99.59) 1 (ref)
Oral Cleft
Affected (C+D) pairs 21 (0.19) 16 (0.28) 1.55 0.75 - 3.20 0.240
Unaffected pairs 10767 (99.81) 5646 (99.72) 1 (ref)
Deafness
Affected (C+D) pairs 122 (1.13) 47 (0.83) 0.68 0.47 - 0.99 0.045
Unaffected pairs 10666 (98.87) 5615 (99.17) 1 (ref)
Cerebral Palsy
Affected (C+D) pairs 81 (0.75) 30 (0.53) 0.69 0.43 - 1.09 0.112
Unaffected pairs 10707 (99.25) 5632 (99.47) 1 (ref)
Muscular Dystrophy
Affected (C+D) pairs 13 (0.12) 1 (0.02) 0.18 0.02 - 1.52 0.116
Unaffected pairs 10775 (99.88) 5661 (99.98) 1 (ref)
Down Syndrome
Affected (C+D) pairs 7 (0.06) 13 (0.23) 3.34 1.21 - 9.23 0.020
Unaffected pairs 10781 (99.94) 5649 (99.77) 1 (ref)
Spina Bifida
Affected (C+D) pairs 37 (0.34) 6 (0.11) 0.29 0.12 - 0.73 0.008
Unaffected pairs 10751 (99.66) 5656 (99.89) 1 (ref)
Strabismus (Lazy Eye)
Affected (C+D) pairs 325 (3.01) 184 (3.25) 1.20 0.98 - 1.47 0.084
Unaffected pairs 10463 (96.99) 5478 (96.75) 1 (ref)
Congenital Heart Defects
Affected (C+D) pairs 165 (1.53) 79 (1.40) 0.84 0.62 - 1.13 0.243
Unaffected pairs 10623 (98.47) 5583 (98.60) 1 (ref)
Any Birth Anomaly
Affected (C+D) pairs 759 (7.04) 373 (6.59) 0.94 0.82 - 1.09 0.416
Unaffected pairs 10029 (92.96) 5289 (93.41) 1 (ref)
* Adjusted for zygosity, gender, parental education and birth order
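The odds ratios in Table 2.8 are adjusted for covariates, so they cannot be reproduced from the cell counts alone. As a rough sanity check, a crude (unadjusted) odds ratio with a Wald 95% confidence interval can be computed from the counts. This is only a sketch under that stated simplification, with an illustrative function name; it is not the model used in the dissertation.

```python
import math


def crude_odds_ratio(exp_aff, exp_unaff, ref_aff, ref_unaff, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval from a 2x2 table.

    exp_* are counts in the exposed group, ref_* in the reference group.
    All four cells must be non-zero.
    """
    odds_ratio = (exp_aff * ref_unaff) / (exp_unaff * ref_aff)
    # Standard error of log(OR) (Woolf's method).
    se_log = math.sqrt(1 / exp_aff + 1 / exp_unaff + 1 / ref_aff + 1 / ref_unaff)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Down syndrome row: maternal age >= 30 (13 affected, 5649 unaffected pairs)
# vs. < 30 (7 affected, 10781 unaffected pairs).
or_, lo, hi = crude_odds_ratio(13, 5649, 7, 10781)
print(round(or_, 2))  # 3.54
```

The crude estimate for Down syndrome is close to, but not the same as, the adjusted value of 3.34 in the table, as expected when covariates are included in the model.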
Table 2.9. Parental education and risk of birth anomalies in the California Twin Program (Birth cohort 1957-1982, N=20,803 pairs).
Affected (Concordant+Discordant) vs. Unaffected | Mother's Education <= 12 years, N (%) | Mother's Education > 12 years, N (%) | > 12 vs. <= 12, OR Adj.* (95% CI) | P-value | Father's Education <= 12 years, N (%) | Father's Education > 12 years, N (%) | > 12 vs. <= 12, OR Adj.* (95% CI) | P-value
Clubfoot
Affected (C+D) pairs | 52 (0.64) | 34 (0.41) | 0.61 (0.37 - 1.00) | 0.051 | 43 (0.58) | 43 (0.48) | 1.04 (0.64 - 1.69) | 0.879
Unaffected pairs | 8083 (99.36) | 8281 (99.59) | 1 (ref) | | 7395 (99.42) | 8969 (99.52) | 1 (ref) |
Oral Cleft
Affected (C+D) pairs | 18 (0.22) | 19 (0.23) | 1.10 (0.51 - 2.34) | 0.810 | 18 (0.24) | 19 (0.21) | 0.79 (0.37 - 1.68) | 0.539
Unaffected pairs | 8117 (99.78) | 8296 (99.77) | 1 (ref) | | 7420 (99.76) | 8993 (99.79) | 1 (ref) |
Deafness
Affected (C+D) pairs | 97 (1.19) | 72 (0.87) | 0.83 (0.58 - 1.19) | 0.311 | 90 (1.21) | 79 (0.88) | 0.82 (0.58 - 1.17) | 0.272
Unaffected pairs | 8038 (98.81) | 8243 (99.13) | 1 (ref) | | 7348 (98.79) | 8933 (99.12) | 1 (ref) |
Cerebral Palsy
Affected (C+D) pairs | 42 (0.52) | 69 (0.83) | 1.55 (0.99 - 2.43) | 0.053 | 42 (0.56) | 69 (0.77) | 1.10 (0.71 - 1.72) | 0.668
Unaffected pairs | 8093 (99.48) | 8246 (99.17) | 1 (ref) | | 7396 (99.44) | 8943 (99.23) | 1 (ref) |
Muscular Dystrophy
Affected (C+D) pairs | 9 (0.11) | 5 (0.06) | 0.70 (0.20 - 2.42) | 0.568 | 9 (0.12) | 5 (0.06) | 0.52 (0.15 - 1.80) | 0.302
Unaffected pairs | 8126 (99.89) | 8310 (99.94) | 1 (ref) | | 7429 (99.88) | 9007 (99.94) | 1 (ref) |
Down Syndrome
Affected (C+D) pairs | 12 (0.15) | 8 (0.10) | 1.07 (0.38 - 3.01) | 0.897 | 14 (0.19) | 6 (0.07) | 0.32 (0.11 - 0.97) | 0.043
Unaffected pairs | 8123 (99.85) | 8307 (99.90) | 1 (ref) | | 7424 (99.81) | 9006 (99.93) | 1 (ref) |
Spina Bifida
Affected (C+D) pairs | 21 (0.26) | 22 (0.26) | 1.17 (0.58 - 2.33) | 0.665 | 21 (0.28) | 22 (0.24) | 0.84 (0.42 - 1.68) | 0.627
Unaffected pairs | 8114 (99.74) | 8293 (99.74) | 1 (ref) | | 7417 (99.72) | 8990 (99.76) | 1 (ref) |
Strabismus (Lazy Eye)
Affected (C+D) pairs | 231 (2.84) | 278 (3.34) | 1.26 (0.98 - 1.55) | 0.051 | 237 (3.19) | 272 (3.02) | 0.81 (0.66 - 0.998) | 0.047
Unaffected pairs | 7904 (97.16) | 8037 (96.66) | 1 (ref) | | 7201 (96.81) | 8740 (96.98) | 1 (ref) |
Congenital Heart Defects
Affected (C+D) pairs | 121 (1.49) | 123 (1.48) | 1.05 (0.78 - 1.41) | 0.753 | 113 (1.52) | 131 (1.45) | 0.97 (0.72 - 1.30) | 0.833
Unaffected pairs | 8014 (98.51) | 8192 (98.52) | 1 (ref) | | 7325 (98.48) | 8881 (98.55) | 1 (ref) |
Any Birth Anomaly
Affected (C+D) pairs | 553 (6.80) | 579 (6.96) | 1.10 (0.95 - 1.26) | 0.194 | 537 (7.22) | 595 (6.60) | 0.87 (0.75 - 0.998) | 0.047
Unaffected pairs | 7582 (93.20) | 7736 (93.04) | 1 (ref) | | 6901 (92.78) | 8417 (93.40) | 1 (ref) |
* Adjusted for zygosity, gender, maternal age and birth order
Table 2.10: Low birth weight and risk of birth anomalies in total (N=20,803) and stratified by MZ (N=6,752) and DZ like-sex (N=7,326) twin pairs in the
California Twin Program (Birth cohort 1957-1982).
Discordant Pairs | Total: b/c* | OR Unadj. (95% CI) | P-value# | MZ: b/c* | OR Unadj. (95% CI) | P-value# | DZ like-sex: b/c* | OR Unadj. (95% CI) | P-value#
Clubfoot | 48/44 | 1.09 (0.71 - 1.68) | 0.755 | 7/8 | 0.88 (0.27 - 2.76) | 1.000 | 24/24 | 1.00 (0.54 - 1.84) | 1.000
Oral Cleft | 20/17 | 1.18 (0.59 - 2.39) | 0.743 | 3/3 | 1.00 (0.13 - 7.47) | 1.000 | 10/7 | 1.43 (0.49 - 4.42) | 0.629
Deafness | 109/67 | 1.63 (1.19 - 2.24) | 0.002 | 32/28 | 1.14 (0.67 - 1.97) | 0.699 | 39/23 | 1.70 (0.99 - 2.97) | 0.056
Cerebral Palsy | 73/40 | 1.83 (1.23 - 2.76) | 0.003 | 24/12 | 2.00 (0.96 - 4.39) | 0.065 | 21/17 | 1.24 (0.62 - 2.49) | 0.627
Muscular Dystrophy | 5/7 | 0.71 (0.18 - 2.61) | 0.774 | 1/1 | 1.00 (0.01 - 78.50) | 1.000 | 0/4 | 0.19 (<0.001 - 1.12) | 0.125
Down Syndrome | 15/8 | 1.88 (0.75 - 5.11) | 0.210 | 1/1 | 1.00 (0.01 - 78.50) | 1.000 | 8/5 | 1.60 (0.46 - 6.22) | 0.581
Spina Bifida | 26/17 | 1.53 (0.80 - 3.00) | 0.222 | 9/4 | 2.25 (0.63 - 10.00) | 0.267 | 8/9 | 0.89 (0.30 - 2.60) | 1.000
Strabismus (Lazy Eye) | 264/257 | 1.03 (0.86 - 1.23) | 0.793 | 69/75 | 0.92 (0.65 - 1.29) | 0.677 | 102/104 | 0.98 (0.74 - 1.30) | 0.945
Congenital Heart Defects | 143/81 | 1.77 (1.34 - 2.35) | <.0001 | 36/25 | 1.44 (0.84 - 2.50) | 0.200 | 60/30 | 2.00 (1.27 - 3.21) | 0.002
Any Birth Anomaly | 638/482 | 1.32 (1.17 - 1.49) | <.0001 | 157/142 | 1.11 (0.88 - 1.40) | 0.418 | 252/203 | 1.24 (1.03 - 1.50) | 0.024
* Matched 2x2 table cell numbers b and c: b = exposed case and unexposed co-twin, c = unexposed case and exposed co-twin.
# Fisher's Exact test
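For a matched-pair design like the one in Table 2.10, the unadjusted odds ratio is the ratio of the two discordant cells, b/c. The sketch below recomputes it and pairs it with a two-sided exact binomial (McNemar-type) p-value. That test is an assumption chosen for illustration: the footnote names Fisher's exact test, so the p-values need not match the table exactly. The function names are illustrative, and the table's handling of zero cells (for example 0/4 reported as 0.19) is not reproduced.

```python
from math import comb, inf


def matched_pair_or(b, c):
    """Matched-pair odds ratio: discordant cell b divided by discordant cell c."""
    return b / c if c else inf


def exact_discordant_p(b, c):
    """Two-sided exact binomial p-value for H0: b and c are equally likely.

    Under H0 the smaller discordant count follows Binomial(b + c, 0.5);
    the two-sided p-value doubles the lower tail and is capped at 1.
    """
    n = b + c
    k = min(b, c)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Deafness, all pairs: b = 109, c = 67.
print(round(matched_pair_or(109, 67), 2))  # 1.63
# Oral cleft, MZ pairs: b = c = 3, so the test is as non-significant as possible.
print(exact_discordant_p(3, 3))            # 1.0
```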
Table 2.11: Birth order and risk of birth anomalies in the California Twin Program (Birth cohort 1957-1982, N=20,803 pairs).
Affected (Concordant+Discordant) vs. Unaffected | 1st Birth, N (%) | 2nd Birth, N (%) | 3rd Birth, N (%) | 4th or later Birth, N (%) | 2nd vs. 1st, OR Adj.* (95% CI) | 3rd vs. 1st, OR Adj.* (95% CI) | 4th or later vs. 1st, OR Adj.* (95% CI) | P-value for quadratic trend
Clubfoot
Affected (C+D) pairs | 30 (0.61) | 26 (0.52) | 14 (0.46) | 16 (0.45) | 0.84 (0.50 - 1.43) | 0.74 (0.38 - 1.41) | 0.70 (0.36 - 1.38) | 0.323
Unaffected pairs | 4864 (99.39) | 4946 (99.48) | 3000 (99.54) | 3554 (99.55) | 1 (ref) | 1 (ref) | 1 (ref)
Oral Cleft
Affected (C+D) pairs | 10 (0.20) | 14 (0.28) | 4 (0.13) | 9 (0.25) | 1.29 (0.57 - 2.92) | 0.54 (0.16 - 1.76) | 0.89 (0.32 - 2.44) | 0.211
Unaffected pairs | 4884 (99.80) | 4958 (99.72) | 3010 (99.87) | 3561 (99.75) | 1 (ref) | 1 (ref) | 1 (ref)
Deafness
Affected (C+D) pairs | 41 (0.84) | 59 (1.19) | 29 (0.96) | 40 (1.12) | 1.45 (0.97 - 2.16) | 1.21 (0.75 - 1.98) | 1.53 (0.94 - 2.48) | 0.474
Unaffected pairs | 4853 (99.16) | 4913 (98.81) | 2985 (99.04) | 3530 (98.88) | 1 (ref) | 1 (ref) | 1 (ref)
Cerebral Palsy
Affected (C+D) pairs | 40 (0.82) | 29 (0.58) | 22 (0.73) | 20 (0.56) | 0.75 (0.46 - 1.22) | 1.05 (0.61 - 1.79) | 0.94 (0.52 - 1.71) | 0.532
Unaffected pairs | 4854 (99.18) | 4943 (99.42) | 2992 (99.27) | 3550 (99.44) | 1 (ref) | 1 (ref) | 1 (ref)
Muscular Dystrophy
Affected (C+D) pairs | 7 (0.14) | 2 (0.04) | 4 (0.13) | 1 (0.03) | 0.29 (0.06 - 1.42) | 1.11 (0.32 - 3.88) | 0.31 (0.04 - 2.72) | 0.921
Unaffected pairs | 4887 (99.86) | 4970 (99.96) | 3010 (99.87) | 3569 (99.97) | 1 (ref) | 1 (ref) | 1 (ref)
Down Syndrome
Affected (C+D) pairs | 2 (0.04) | 9 (0.18) | 1 (0.03) | 8 (0.22) | 3.54 (0.76 - 16.54) | 0.45 (0.04 - 5.08) | 1.96 (0.37 - 10.36) | 0.313
Unaffected pairs | 4892 (99.96) | 4963 (99.82) | 3013 (99.97) | 3562 (99.78) | 1 (ref) | 1 (ref) | 1 (ref)
Spina Bifida
Affected (C+D) pairs | 18 (0.37) | 6 (0.12) | 13 (0.43) | 6 (0.17) | 0.36 (0.14 - 0.90) | 1.50 (0.72 - 3.12) | 0.78 (0.29 - 2.11) | 0.099
Unaffected pairs | 4876 (99.63) | 4966 (99.88) | 3001 (99.57) | 3564 (99.83) | 1 (ref) | 1 (ref) | 1 (ref)
Strabismus
Affected (C+D) pairs | 163 (3.33) | 167 (3.36) | 80 (2.65) | 99 (2.77) | 0.99 (0.79 - 1.24) | 0.76 (0.57 - 1.01) | 0.75 (0.57 - 0.998) | 0.0081
Unaffected pairs | 4731 (96.67) | 4805 (96.64) | 2934 (97.35) | 3471 (97.23) | 1 (ref) | 1 (ref) | 1 (ref)
Congenital Heart Defects
Affected (C+D) pairs | 74 (1.51) | 63 (1.27) | 48 (1.59) | 59 (1.65) | 0.86 (0.61 - 1.20) | 1.13 (0.78 - 1.65) | 1.23 (0.83 - 1.81) | 0.093
Unaffected pairs | 4820 (98.49) | 4909 (98.73) | 2966 (98.41) | 3511 (98.35) | 1 (ref) | 1 (ref) | 1 (ref)
Any Birth Defects
Affected (C+D) pairs | 344 (7.03) | 349 (7.02) | 202 (6.70) | 237 (6.64) | 1.00 (0.86 - 1.17) | 0.96 (0.80 - 1.16) | 0.95 (0.78 - 1.15) | 0.4914
Unaffected pairs | 4550 (92.97) | 4623 (92.98) | 2812 (93.30) | 3333 (93.36) | 1 (ref) | 1 (ref) | 1 (ref)
* Adjusted for zygosity, gender, maternal age and parental education.
Table 2.12. Sensitivity analysis (Double-respondent pairs only): Birth anomalies among twins by zygosity in the
California Twin Program
Birth Anomaly | Monozygotic Twin Pair, N (%) | Dizygotic Twin Pair, N (%) | Unknown Zygosity, N (%) | Total, N (%)
Clubfoot
Concordant disease pairs 2 (0.07) 1 (0.03) 0 (0.00) 3 (0.04)
Discordant disease pairs 6 (0.20) 22 (0.55) 0 (0.00) 28 (0.39)
Non-disease pairs 2947 (99.73) 3946 (99.42) 318 (100.00) 7211 (99.57)
Oral Cleft
Concordant disease pairs 1 (0.03) 1 (0.03) 0 (0.00) 2 (0.03)
Discordant disease pairs 2 (0.07) 11 (0.28) 0 (0.00) 13 (0.18)
Non-disease pairs 2952 (99.90) 3957 (99.70) 318 (100.00) 7227 (99.79)
Deafness
Concordant disease pairs 6 (0.20) 2 (0.05) 1 (0.31) 9 (0.12)
Discordant disease pairs 28 (0.95) 35 (0.88) 1 (0.31) 64 (0.88)
Non-disease pairs 2921 (98.85) 3932 (99.07) 316 (99.37) 7169 (98.99)
Cerebral Palsy
Concordant disease pairs 2 (0.07) 1 (0.03) 0 (0.00) 3 (0.04)
Discordant disease pairs 17 (0.58) 22 (0.55) 0 (0.00) 39 (0.54)
Non-disease pairs 2936 (99.36) 3946 (99.42) 318 (100.00) 7200 (99.42)
Muscular Dystrophy
Concordant disease pairs 1 (0.03) 0 (0.00) 0 (0.00) 1 (0.01)
Discordant disease pairs 0 (0.00) 6 (0.15) 0 (0.00) 6 (0.08)
Non-disease pairs 2954 (99.97) 3963 (99.85) 318 (100.00) 7235 (99.90)
Down Syndrome
Concordant disease pairs 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Discordant disease pairs 0 (0.00) 4 (0.10) 0 (0.00) 4 (0.06)
Non-disease pairs 2955 (100.00) 3965 (99.90) 318 (100.00) 7238 (99.94)
Spina Bifida
Concordant disease pairs 0 (0.00) 0 (0.00) 0 (0.00) 0 (0.00)
Discordant disease pairs 6 (0.20) 11 (0.28) 0 (0.00) 17 (0.23)
Non-disease pairs 2949 (99.80) 3958 (99.72) 318 (100.00) 7225 (99.77)
Strabismus
Concordant disease pairs 7 (0.24) 3 (0.08) 2 (0.63) 12 (0.17)
Discordant disease pairs 93 (3.15) 174 (4.38) 8 (2.52) 275 (3.80)
Non-disease pairs 2855 (96.62) 3792 (95.54) 308 (96.86) 6955 (96.04)
Congenital
Heart Defects
Concordant disease pairs 5 (0.17) 0 (0.00) 1 (0.31) 6 (0.08)
Discordant disease pairs 33 (1.12) 56 (1.41) 4 (1.26) 93 (1.28)
Non-disease pairs 2917 (98.71) 3913 (98.59) 313 (98.43) 7143 (98.63)
Any birth
anomaly
Concordant disease pairs 29 (0.98) 21 (0.53) 6 (1.89) 56 (0.77)
Discordant disease pairs 198 (6.70) 362 (9.12) 17 (5.35) 577 (7.97)
Non-disease pairs 2728 (92.32) 3586 (90.35) 295 (92.77) 6609 (91.26)
Total 2955 3969 318 7242
Table 2.13. Sensitivity analysis (double-respondent pairs only): Pairwise concordance ratio between monozygotic twins (MZ) and
dizygotic like-sex twins (DZ like-sex) for each birth anomaly
Birth Anomaly / Proband Zygosity | Concordant (n11) | Discordant (nd) | Pairwise concordance¹ (%) | SE² (%) | Concordance Ratio³ | P-value⁴
Clubfoot
MZ | 2 | 6 | 25.00 | 15.31 | N.A | 0.102
DZ like-sex | 0 | 16 | 0.00 | 0.00 | |
DZ | 1 | 22 | 4.35 | 4.25 | |
Oral Cleft
MZ | 1 | 2 | 33.33 | 27.22 | 3.00 | 0.446
DZ like-sex | 1 | 8 | 11.11 | 10.48 | |
DZ | 1 | 11 | 8.33 | 7.98 | |
Deafness
MZ | 6 | 28 | 17.65 | 6.54 | 2.21 | 0.256
DZ like-sex | 2 | 23 | 8.00 | 5.43 | |
DZ | 2 | 35 | 5.41 | 3.72 | |
Cerebral Palsy
MZ | 2 | 17 | 10.53 | 7.04 | N.A | 0.135
DZ like-sex | 0 | 10 | 0.00 | 0.00 | |
DZ | 1 | 22 | 4.35 | 4.25 | |
Muscular Dystrophy
MZ | 1 | 0 | 100.00 | 0.00 | N.A | N.A
DZ like-sex | 0 | 2 | 0.00 | 0.00 | |
DZ | 0 | 6 | 0.00 | 0.00 | |
Down Syndrome
MZ | 0 | 0 | N.A | N.A | N.A | N.A
DZ like-sex | 0 | 2 | 0.00 | 0.00 | |
DZ | 0 | 4 | 0.00 | 0.00 | |
Spina Bifida
MZ | 0 | 6 | 0.00 | 0.00 | Inf. | N.A
DZ like-sex | 0 | 5 | 0.00 | 0.00 | |
DZ | 0 | 11 | 0.00 | 0.00 | |
Strabismus
MZ | 7 | 93 | 7.00 | 2.55 | 7.70 | 0.024
DZ like-sex | 1 | 109 | 0.91 | 0.90 | |
DZ | 3 | 174 | 1.69 | 0.97 | |
Congenital Heart Defects
MZ | 5 | 33 | 13.16 | 5.48 | N.A | 0.016
DZ like-sex | 0 | 37 | 0.00 | 0.00 | |
DZ | 0 | 56 | 0.00 | 0.00 | |
Any Birth Anomaly
MZ | 29 | 198 | 12.78 | 2.22 | 2.51 | 0.004
DZ like-sex | 12 | 224 | 5.08 | 1.43 | |
DZ | 21 | 362 | 5.48 | 1.16 | |
¹ % Pairwise concordance = n11/(n11+nd)
² SE = sqrt(Var) = sqrt(n11*nd/(n11+nd)^3)
³ Relative pairwise concordance = MZ (% pairwise concordance)/DZ like-sex (% pairwise concordance)
⁴ Chi-square test to assess the difference in pairwise concordance rates between MZ and DZ like-sex
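The footnote formulas for pairwise concordance, its standard error and the MZ versus DZ like-sex ratio can be checked with a short calculation. The following is a minimal sketch in Python, not the code used for the dissertation; the function names are hypothetical, and returning NaN when a quantity is undefined is an assumption of this sketch (the table prints N.A or Inf. in such cells). The example counts are the strabismus rows of Table 2.13.

```python
import math


def pairwise_concordance(n11: int, nd: int) -> float:
    """Pairwise concordance n11 / (n11 + nd), returned as a proportion."""
    total = n11 + nd
    if total == 0:
        return float("nan")  # no affected pairs: concordance is undefined
    return n11 / total


def pairwise_se(n11: int, nd: int) -> float:
    """Standard error sqrt(n11 * nd / (n11 + nd)^3), as a proportion."""
    total = n11 + nd
    if total == 0:
        return float("nan")
    return math.sqrt(n11 * nd / total ** 3)


def concordance_ratio(mz: tuple, dz: tuple) -> float:
    """Ratio of MZ to DZ pairwise concordance; each argument is (n11, nd)."""
    dz_conc = pairwise_concordance(*dz)
    if math.isnan(dz_conc) or dz_conc == 0:
        return float("nan")  # sketch assumption: undefined ratio reported as NaN
    return pairwise_concordance(*mz) / dz_conc


# Strabismus rows of Table 2.13: MZ (7 concordant, 93 discordant),
# DZ like-sex (1 concordant, 109 discordant).
mz, dz_like_sex = (7, 93), (1, 109)
print(f"{100 * pairwise_concordance(*mz):.2f}")     # 7.00
print(f"{100 * pairwise_se(*mz):.2f}")              # 2.55
print(f"{concordance_ratio(mz, dz_like_sex):.2f}")  # 7.70
```

These three values agree with the strabismus MZ row of the table.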
Table 2.14. Sensitivity analysis (double-respondent pairs only): Parental smoking status and risk of birth anomalies in twins
Columns: Affected (Concordant+Discordant) vs. Unaffected; Neither Smoking, N (%); Only Father Smoking, N (%); Only Mother Smoking, N (%); Both Smoking, N (%); Father only vs. Neither; Mother only vs. Neither; Both vs. Neither (each comparison: OR Adj.*, 95% CI)
Clubfoot
Affected (C+D) pairs 3 (0.15) 4 (0.35) 3 (0.53) 15 (0.88) 2.35 0.52 - 10.51 0.265 3.51 0.71 - 17.44 0.125
Unaffected pairs 1979 (99.85) 1140 (99.65) 563 (99.47) 1682 (99.12) 1 (ref)
1 (ref)
Oral Cleft
Affected (C+D) pairs 2 (0.10) 2 (0.17) 2 (0.35) 5 (0.29) 1.69 0.24 - 11.99 0.602 3.53 0.50 - 25.14 0.208
Unaffected pairs 1980 (99.90) 1142 (99.83) 564 (99.65) 1692 (99.71) 1 (ref)
1 (ref)
Deafness
Affected (C+D) pairs 15 (0.76) 15 (1.31) 6 (1.06) 18 (1.06) 1.75 0.85 - 3.59 0.127 1.40 0.54 - 3.64 0.485
Unaffected pairs 1967 (99.24) 1129 (98.69) 560 (98.94) 1679 (98.94) 1 (ref)
1 (ref)
Cerebral Palsy
Affected (C+D) pairs 13 (0.66) 6 (0.52) 4 (0.71) 13 (0.77) 0.80 0.30 - 2.12 0.655 1.08 0.35 - 3.32 0.897
Unaffected pairs 1969 (99.34) 1138 (99.48) 562 (99.29) 1684 (99.23) 1 (ref)
1 (ref)
Muscular Dystrophy
Affected (C+D) pairs 2 (0.10) 1 (0.09) 0 (0.00) 1 (0.06) 0.90 0.08 - 9.90 0.929 <0.001 N.A. 0.973
Unaffected pairs 1980 (99.90) 1143 (99.91) 566 (100.00) 1696 (99.94) 1 (ref)
1 (ref)
Down Syndrome
Affected (C+D) pairs 1 (0.05) 0 (0.00) 0 (0.00) 1 (0.06) <0.001 N.A. 0.963 <0.001 N.A. 0.974
Unaffected pairs 1981 (99.95) 1144 (100.00) 566 (100.00) 1696 (99.94) 1 (ref)
1 (ref)
Spina Bifida
Affected (C+D) pairs 4 (0.20) 1 (0.09) 1 (0.18) 6 (0.35) 0.43 0.05 - 3.88 0.454 0.88 0.10 - 7.85 0.905
Unaffected pairs 1978 (99.80) 1143 (99.91) 565 (99.82) 1691 (99.65) 1 (ref)
1 (ref)
Strabismus
Affected (C+D) pairs 67 (3.38) 43 (3.76) 24 (4.24) 83 (4.89) 1.12 0.76 - 1.65 0.585 1.27 0.79 - 2.04 0.332
Unaffected pairs 1915 (96.62) 1101 (96.24) 542 (95.76) 1614 (95.11) 1 (ref)
1 (ref)
Congenital Heart Defects
Affected (C+D) pairs 32 (1.61) 17 (1.49) 10 (1.77) 26 (1.53) 0.92 0.51 - 1.66 0.769 1.10 0.54 - 2.25 0.800
Unaffected pairs 1950 (98.39) 1127 (98.51) 556 (98.23) 1671 (98.47) 1 (ref)
1 (ref)
Any Birth Defects
Affected (C+D) pairs 123 (6.21) 81 (7.08) 49 (8.66) 156 (9.19) 1.01 0.85 - 1.20 0.885 1.16 0.96 - 1.41 0.127
Unaffected pairs 1859 (93.79) 1063 (92.92) 517 (91.34) 1541 (90.81) 1 (ref)
1 (ref)
* Adjusted for zygosity, gender, maternal age, parental education and birth order
Table 2.15. Percentage agreement for the shared birth-related factors or parental exposures within double-respondent
twin pairs (N=7,247 pairs) from the California Twin Program (Birth cohort 1957-1982).
Agreement on the shared factors within twin pairs | %
Did your parents smoke cigarettes? (Neither parents vs. Father only vs. Mother only vs. Both parents) | 81.1
How old was your biological mother when you were born? (< 30 vs. ≥ 30 years old) | 91.8
How many years of school did your mother finish? (≤ 12 vs. > 12 years) | 81.3
How many years of school did your father finish? (≤ 12 vs. > 12 years) | 80.3
Which twin weighed more at birth? (You vs. Your twin) | 75.0
Considering all of your mother’s pregnancies that resulted in live births, which pregnancy resulted in the birth of you and your twin? (1st vs. 2nd vs. 3rd vs. 4th or later) | 92.7
Table 2.16. Comparison of heritability estimates for each selected birth anomaly using concordance methods and the structural
equation modelling (SEM) method in the California Twin Program.
Birth Anomaly | Pairwise Concordance Ratio¹ (Table 3) | Probandwise Concordance Ratio² | SEM estimates³: Probandwise Concordance Ratio | SEM estimates³: Heritability (SE)
Clubfoot | 5.91 | 5.00 | 5.44 | 0.80 (0.21)
Oral Cleft | 4.89 | 4.18 | 4.50 | 0.65 (0.28)
Deafness | 2.25 | 2.11 | 2.00 | 0.36 (0.18)
Cerebral Palsy | 1.40 | 1.38 | 1.00 | 0.00 (0.00)
Muscular Dystrophy | 2.80 | 2.29 | 2.70 | 0.40 (0.33)
Spina Bifida | Inf. | Inf. | 8.00 | 0.55 (0.18)
Strabismus | 2.52 | 2.30 | 2.89 | 0.58 (0.12)
Congenital Heart Defects | 1.90 | 1.78 | 2.30 | 0.38 (0.17)
¹ Pairwise concordance ratios were computed as shown in Table 3 by comparing pairwise concordance between 6,752 MZ twin pairs and
7,326 DZ like-sex twin pairs, where pairwise concordance was calculated for MZ or DZ like-sex twin pairs as n11/(n11+nd), n11 is
the number of concordant twin pairs, and nd is the number of discordant twin pairs.
² Probandwise concordance ratios were computed from Table 3 by comparing probandwise concordance between 6,752 MZ twin pairs and
7,326 DZ like-sex twin pairs, where probandwise concordance was calculated for MZ or DZ like-sex twin pairs as 2n11/(2n11+nd),
n11 is the number of concordant twin pairs, and nd is the number of discordant twin pairs.
³ Structural equation modelling (SEM) estimates were computed using the R package “mets” by comparing 6,752 MZ and 13,310 DZ twin
pairs after adjustment for gender. The estimates included the probandwise concordance ratio for MZ vs. DZ as well as the
heritability estimates based on a classic ACE twin model. In an ACE model, “A” stands for the additive genetic component, while
“C” and “E” are the shared and non-shared environmental components, respectively. Heritability was calculated as
Var(A)/(Var(A)+Var(C)+Var(E)), the contribution of additive genetic factors to the total variance.
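The probandwise concordance and the ACE heritability ratio defined in the footnotes above can be written out as a minimal Python sketch. It only evaluates the two closed-form expressions; it does not reproduce the model fitting done with the R package "mets". The function names, the example pair counts and the variance components are hypothetical values chosen for illustration and are not results from this study.

```python
def probandwise_concordance(n11: int, nd: int) -> float:
    """Probandwise concordance 2*n11 / (2*n11 + nd), as a proportion."""
    denom = 2 * n11 + nd
    if denom == 0:
        return float("nan")  # no affected pairs: concordance is undefined
    return 2 * n11 / denom


def ace_heritability(var_a: float, var_c: float, var_e: float) -> float:
    """Heritability Var(A) / (Var(A) + Var(C) + Var(E)) in an ACE model."""
    total = var_a + var_c + var_e
    if total <= 0:
        raise ValueError("total variance must be positive")
    return var_a / total


# Hypothetical inputs for illustration only (not estimates from the thesis).
print(probandwise_concordance(n11=7, nd=93))                  # 14/107
print(ace_heritability(var_a=2.0, var_c=1.0, var_e=1.0))      # 0.5
```

Because each concordant pair contributes two probands, the probandwise value is never smaller than the pairwise value computed from the same counts.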
Figure 2.1. Five-year prevalence rate of selected birth anomalies (per 1,000 persons)
[Figure: x-axis, Birth Year in five-year groups (1955-1959, 1960-1964, 1965-1969, 1970-1974, 1975-1979, 1980-1984); y-axis,
5-year Prevalence Rate (per 1,000 persons), scale 0 to 24. Series: Clubfoot, Oral Cleft, Deafness, Cerebral Palsy, Muscular
Dystrophy, Down syndrome, Spina Bifida, Strabismus (Lazy Eye), Congenital Heart Defects.]
Chapter 3 Epidemiology of Colorectal Cancer
Colorectal cancer (CRC), also known as colon and rectal cancer or bowel cancer, is a cancer that develops in
the colon or rectum. It is one of the major causes of morbidity and mortality worldwide. CRC accounts for
around 10% of all cancer incidence, and its incidence rate rises with industrialization and economic
development. More than half of CRC patients die from this cancer. It is therefore important to apply existing
knowledge of the distribution and potential causes of CRC to cancer prevention. In this chapter, we review
the incidence, mortality and their trends for CRC both worldwide and in the United States, as well as the risk
factors associated with the incidence of CRC.
3.1 CRC around the World
CRC is the third most common cancer worldwide [1]. In 2008, there were about 1.2 million new CRC
cases globally, accounting for around 10% of all cancer incidence [2], with women (9.2%) and men (10.0%)
almost equally affected [3]. CRC incidence is not uniformly distributed around the world; it is mainly a disease
of countries with advanced industrialization and westernized lifestyles. Developed countries account for
two-thirds of all CRC cases [4]. Age-adjusted incidence rates range from more than 30 per 100,000 people in
the United States, Australia, New Zealand, Western Europe, and Japan to fewer than 5 per 100,000 in Africa
and South-Central Asia [2]. CRC incidence has been stabilizing or even slightly declining in some Western
European countries and the United States [5], but it has increased rapidly, almost doubling since the 1970s,
in countries that recently underwent economic transition, such as Japan and some Eastern European
countries [2, 4, 6].
CRC caused nearly 715,000 deaths in 2010, up from 490,000 in 1990 [7], making it the fourth most
common cause of cancer death after lung, stomach and liver cancer [1]. It represents around 8% of all cancer
deaths worldwide [7]. In parallel with incidence, mortality from CRC has been declining markedly in
high-incidence countries [6] but increasing significantly in newly westernized countries [4].
However, the mortality trend is affected by both incidence and survival. CRC survival depends strongly on
screening programs, diagnostic techniques, and the stage at diagnosis. The survival rate for all-stage CRC
has increased substantially since the 1960s [8]. In the most developed countries, where regular screening is
common, the 5-year overall survival rate is 40-60% [2]. By stage, the 5-year survival rate ranges from 90% for
localized cancer, through 70% at the regional stage, to 10% in distant metastatic cases [8, 9]. Differences in
the prevalence of screening and in access to diagnostic and treatment services are likely to account for
the global disparity in CRC survival.
3.2 CRC in the United States
In the United States, CRC was the third most common cancer in 2017 [10-12]. Based on records from
2012-2014, 4.3% of American men and women will develop CRC during their lifetime [10]. In 2014, the most
recent year with available U.S. statistics, 139,992 individuals were diagnosed with CRC, including
73,396 men and 66,596 women [13]. Over the last decade, the African-American population has had the highest
age-adjusted incidence rate for CRC [10, 14]. For 2017, it was estimated that 135,430 new cases of CRC would be
diagnosed [10]. CRC incidence rates have been declining since the mid-1980s, by an average of 2.7% each year
over the last 10 years [10]. This decreasing trend is largely due to screening programs for the early
detection of precancerous colon polyps.
In 2014, 51,651 people in the United States died from CRC, including 27,134 men and 24,517
women, making CRC the second leading cause of cancer death in both men and women [13]. It was
estimated that 50,260 people in the United States would die from CRC in 2017 [10]. The mortality rate for
CRC fell in all racial groups by an overall average of 2.5% each year during 2003-2014 [10]. The
5-year survival for CRC improved from 50% in the 1970s to 65% in 2014, with a similar trend in men and
women [10]. However, the improvement in survival was smaller among African Americans than among whites
[10, 14]. The survival rates by stage at diagnosis are similar to the worldwide rates [10].
According to the International Classification of Diseases for Oncology (ICD-O-3) [15], CRC cases can be
classified by location as colon cancer (codes C18.0-C18.9 and C26.0 for cecum, appendix, ascending colon,
hepatic flexure of colon, transverse colon, splenic flexure of colon, descending colon, sigmoid colon,
overlapping lesion of colon, colon NOS and intestinal tract NOS) or rectal cancer (codes C19.9 and C20.9 for
rectosigmoid junction and rectum NOS). Colon cancers are further designated as proximal (codes C18.0 and
C18.2-C18.5 for cecum, ascending colon, hepatic flexure of colon, transverse colon and splenic flexure of
colon, also known as “right colon”), distal (codes C18.6-C18.7 for descending colon and sigmoid colon, also
known as “left colon”), and other (codes C18.1, C18.8, C18.9, and C26.0 for appendix, overlapping lesion of
colon, colon NOS and intestinal tract NOS). Evidence suggests that etiologic mechanisms differ by tumor
location, leading to different clinical characteristics, drug responses and prognosis [16-19]. Based on the
SEER database for 2009-2013, the proximal colon is the most common tumor location, representing 41% of
newly diagnosed CRC, followed by the rectum (28%) and the distal colon (22%) [20]. Proximal colon cancer
is more likely to be diagnosed in women than in men and becomes more common with age, while rectal cancer
is the most common CRC diagnosed among both women and men younger than 50 years. Accordingly, older women
usually have the highest risk of proximal colon cancer. From 2000 to 2013, CRC incidence at each
tumor subsite increased slightly among people younger than 50 years and decreased steadily among the
older population [11]. CRC survival rates also differ by tumor subsite. Proximal colon tumors have been
associated with higher CRC mortality, even after controlling for stage at diagnosis and treatment [21].
The 5-year survival rate for distal colon cancer (69%) is higher than that for proximal colon cancer
[20]. Although previous studies suggested that colonoscopy is less effective at detecting polyps in the
proximal colon, two recent randomized trials in Europe found similar detection rates for adenomas in the
proximal and distal colon [22, 23].
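The subsite grouping described above amounts to a lookup from ICD-O-3 topography code to tumor location. A minimal sketch in Python follows, assuming the code sets exactly as listed in this section; the function name, the returned labels and the handling of codes outside these sets are hypothetical choices made for illustration.

```python
# Code groups as listed in the text (ICD-O-3 topography codes).
PROXIMAL_COLON = {"C18.0", "C18.2", "C18.3", "C18.4", "C18.5"}
DISTAL_COLON = {"C18.6", "C18.7"}
OTHER_COLON = {"C18.1", "C18.8", "C18.9", "C26.0"}
RECTUM = {"C19.9", "C20.9"}


def classify_crc_subsite(code: str) -> str:
    """Map an ICD-O-3 topography code to the CRC subsite group used here."""
    normalized = code.strip().upper()
    if normalized in PROXIMAL_COLON:
        return "proximal colon"
    if normalized in DISTAL_COLON:
        return "distal colon"
    if normalized in OTHER_COLON:
        return "other colon"
    if normalized in RECTUM:
        return "rectum"
    return "not colorectal"  # label chosen for this sketch


print(classify_crc_subsite("C18.0"))  # proximal colon
print(classify_crc_subsite("C18.7"))  # distal colon
print(classify_crc_subsite("C20.9"))  # rectum
```

Keeping the groups as explicit sets makes the mapping easy to audit against the code lists in the text.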
3.2.1 CRC in California
Based on reports from the California Cancer Registry (CCR), 14,604 Californians were diagnosed with CRC
in 2014 (10,155 colon cancer cases and 4,449 rectal cancer cases), and 6,247 deaths were due to this disease
(5,195 from colon cancer and 1,052 from rectal cancer) [12, 24]. In 2016, CRC was the third most common
cancer and the second leading cause of cancer-related death for both males and females in California.
CRC accounted for 10% and 8% of newly diagnosed cancers in California males and females, respectively,
and 9% of cancer-related deaths in both males and females [24, 25]. In 2013, approximately 90% of
new cases were diagnosed in Californians aged 50 or older. Like the national trend, both the incidence rate
and mortality of CRC have declined steadily for the four major racial/ethnic groups over the past 25 years,
largely because of screening. However, CRC incidence rates are not declining equally for all groups. In 2014,
non-Hispanic whites had the highest risk of CRC and Asian/Pacific Islanders the lowest. Compared with
the rest of the nation in 2009-2013, risk was higher among California Asian/Pacific Islanders (1%) but
lower among African Americans (3%), Hispanics (12%) and non-Hispanic whites (8%). Since 1988, non-Hispanic
whites in California have had the greatest drop in CRC incidence (-40%), followed by African
Americans (-40%), Asian/Pacific Islanders (-29%), and Hispanics (-13%) [24]. The difference can be partially
explained by disparities in access to CRC screening. People at average risk should begin routine screening
at age 50. In 2014, 55% of Californians older than 50 reported having had a colonoscopy or sigmoidoscopy
within the past five years, and 19% reported having had a fecal occult blood test (FOBT) in the past five
years. Screening rates were even lower among individuals with low incomes and among Hispanics [24].
Furthermore, CRC risk is not the same across Asian subgroups in California. In 2007, Japanese and Koreans
had higher CRC risk than other Asian subgroups. Unlike the declining trends for all other racial/ethnic
subgroups, an increasing trend in CRC rates was observed among Korean, Filipino and Vietnamese
residents of California during 1988-2007 [24, 25]. For Californians, the overall five-year survival
rate for CRC is 67.4%. By stage, five-year survival drops from 91.8% for localized disease to 71.9% at the
regional stage and to 14% in distant cases. However, only 42.2% of CRC cases are diagnosed at an early stage
[24].
3.3 Risk Factors
Although the overall incidence of CRC has been declining over the last decade, the burden of disease
remains high and is unevenly distributed across demographic subpopulations. Many risk factors have been
associated with CRC. These include factors that individuals cannot control, called “non-modifiable
risk factors” (e.g. age and genetic factors), and those that can be changed through lifestyle or
environment, called “modifiable risk factors” (e.g. diet and physical activity).
3.3.1 Non-Modifiable Risk Factors
Age. Older age is one of the strongest risk factors for the development of CRC. The rate of diagnosis
rises progressively from age 40 and increases sharply after age 50 [1, 26]. According to SEER
reports, more than 90% of new CRC cases were diagnosed at age 50 or older. Of these, about 55% occurred
between 65 and 84 years of age, and the pattern is similar when colon and rectal cancers are considered
separately [26]. Compared with those younger than 40 years, persons aged 60 to 79 years are more than 50
times more likely to develop CRC [26, 27]. However, recent studies have shown an increasing CRC incidence
rate among younger people, particularly those younger than 50 years who have high-risk conditions and are not
routinely screened (e.g. those with a family history or other predisposing conditions) [28-30].
Personal history of colorectal polyps. Colorectal polyps are the precursor lesions of CRC. A recent study
has shown that persons with a history of colorectal adenomatous polyps have a higher risk of
developing CRC than those with no history of colorectal polyps [31]. In the United States, nearly 19% of the
general population is estimated to develop colorectal adenomas [32]. More than 70% of CRC arises from
colorectal adenomatous polyps [33], especially from multiple polyps or from single polyps that are larger
than 1 cm, have a villous structure, or show high-grade dysplasia [34-36]. Those cases comprise approximately
95% of sporadic CRC [37]. Colorectal polyps can be detected and removed by screening procedures (e.g.
colonoscopy or sigmoidoscopy), which can potentially prevent 60% of deaths from CRC [38]. More details will
be discussed in Chapter 5.
Personal history of inflammatory bowel disease. Inflammatory bowel disease (IBD) is a group of
inflammatory conditions of the small intestine and colon, with two major types: ulcerative colitis and
Crohn disease. Ulcerative colitis causes inflammation and ulcers in the colon and rectum. Crohn disease is a
chronic inflammatory disorder that causes inflammation of the full thickness of the bowel wall and may
affect any part of the gastrointestinal tract from mouth to anus [27]. A history of IBD may increase the
overall risk of developing CRC by 4- to 20-fold [39, 40]. The risk increases with longer disease duration and
with more severe inflammation [41, 42]. The association between IBD and CRC is believed to result from the
progressive development of inflammation-induced dysplasia, which can gradually progress to cancer
[42]. Therefore, people with IBD are advised to be screened for CRC more frequently, regardless of age.
Personal history of allergic conditions. Recent epidemiological studies have suggested that allergic
conditions, such as asthma, hay fever, and food allergies, play a protective role in CRC risk [43-45]. A
potential explanation for the lower CRC risk among people with a history of allergic conditions is that they
are more likely to have a general predisposition to develop immunoglobulin E (IgE)-mediated reactions
against colorectal neoplasia [43]. However, this inverse association appears to be stronger for CRC
mortality than for CRC incidence, and stronger for combined allergic conditions than for a single
condition [46, 47]. In addition, a meta-analysis of 8 prospective studies and 8 case-control studies failed
to show a significant association between allergic conditions and CRC risk (either incidence or mortality) [48].
Therefore, the evidence to date is still limited, and further studies are needed to examine the association
between allergic conditions and CRC risk.
Family history of CRC or adenomatous polyps. People with a history of CRC or adenomatous polyps in
one or more relatives have a higher risk of developing CRC [27]. CRC cases with a family history account
for about 20% of all cases [1]. The risk is higher in people with a stronger family history, for example, a
history of CRC or adenomatous polyps in any first-degree relative at an age younger than 60, or in two or
more first-degree relatives at any age [49]. Although the reason is still unclear, inherited genes, shared
environmental factors, or a combination of the two are plausible causes of the increased CRC risk associated
with a family history.
Inherited genetic risk. Approximately 5-10% of CRC cases are a consequence of recognized hereditary
conditions [50]. There are two major genetic syndromes associated with CRC: hereditary nonpolyposis CRC
(HNPCC or Lynch syndrome) and familial adenomatous polyposis (FAP). HNPCC is the most common type,
present in about 2 to 6% of CRC cases [1, 27]. Mutations in the MLH1 and MSH2 genes, which are involved in
the DNA repair pathway, have been identified as causes of HNPCC [51]. On average, carriers of these
mutations develop HNPCC in their mid-40s [52], and their lifetime risk of CRC can be as high as 70-80% [53, 54].
Furthermore, HNPCC is also associated with an increased risk of a number of other cancers, such as
stomach, small bowel, uterine and pancreatic cancers [1]. FAP accounts for less than 1% of all CRC cases,
but CRC almost always occurs in people with FAP [1, 55]. Unlike HNPCC, FAP is usually diagnosed at a
relatively early age and is characterized by hundreds of polyps that typically begin malignant transformation
as early as age 20. For people with FAP, the lifetime risk of CRC approaches 100% by age 40 without
intervention [1, 55]. Mutations in the tumor suppressor gene APC, which are inherited in an
autosomal dominant manner, are responsible for FAP [56]. At least one affected parent is found in nearly
75-80% of persons with FAP [27].
In addition, two characterized pathways, the “gatekeeper” and the “caretaker” pathway, are
associated with sporadic CRC [57]. The gatekeeper pathway is involved in approximately 85% of sporadic CRC
cases and consists of a temporal sequence of genetic changes: inactivation of the tumor suppressor gene APC,
followed by activation of the K-ras oncogene and further mutations in TGF-β and p53 that drive subsequent
malignant transformation leading to CRC. As discussed above, this is also the pathway associated with FAP
[58]. The other 15% of sporadic CRC cases are caused by mutations in the caretakers, genes involved in the
DNA mismatch repair pathway, which lead to DNA microsatellite instability (MSI). This is also the pathway
involved in the development of HNPCC [59]. Moreover, epigenetic factors, such as abnormal methylation of the
promoter regions of tumor suppressor genes or mismatch repair genes, also play a role in the development of
CRC [60].
3.3.2 Modifiable Risk Factors
CRC is commonly considered to be an environmental disease. About 70-80% of CRC cases have been
attributed to the combined effects of cultural, social and lifestyle factors [61]. Accordingly, a number
of environmental factors associated with CRC risk have been identified and extensively studied (Table 3.1)
[1, 27, 62, 63]. If those risk factors were modified over the life course, a large proportion of CRC cases
could theoretically be prevented [64].
Physical Activity. There is abundant and consistent evidence from epidemiological studies that
higher levels of physical activity, in both frequency and intensity, reduce the risk of CRC, with a
potential dose-response effect [62]. A meta-analysis comparing the highest with the lowest level of
physical activity reported a 20% reduction in colon cancer risk among males and a 14% reduction among
females, but no significant reduction was found for rectal cancer [65]. Similar results were found in the
Continuous Update Project (CUP) of the American Institute for Cancer Research. CUP meta-analyses including
15 papers from cohort studies showed that CRC risk decreased by 3% per 5 MET hr/d increase in total physical
activity. The effect was stronger for colon cancer but weaker or absent for rectal cancer, and similar
effects were found for proximal and distal tumors [62]. The potential biological explanation for this
inverse association is that physical activity can stimulate gut motility, increase the metabolic rate, and
maximize oxygen uptake. As a result, regular physical activity over a long period improves the body’s
metabolic efficiency and capacity [27, 31, 62]. Higher physical activity is associated with lower blood
pressure, reduced inflammation, and decreased insulin levels and insulin resistance, and thus protects
against colon cancer [66].
Obesity. Lack of physical activity contributes to the increased incidence of obesity, another
factor associated with increased risk of CRC. A clear dose-response relationship has been suggested by a
number of meta-analyses including nearly 30 cohort studies [62]. The CUP meta-analysis showed that CRC risk
increases by 2% per 1 kg/m² increase in BMI. A stronger effect was found in men than in women, in the USA and
Asia than in Europe, and for colon cancer than for rectal cancer [62]. Two other published meta-analyses
reported that CRC risk is estimated to increase by 25% in overweight men and 50% in obese men, with similar
differences by sex and cancer site [67, 68]. Being overweight or obese is believed to create an
environment that initiates and promotes carcinogenesis by increasing estrogens, decreasing insulin
sensitivity, or stimulating the inflammatory response [62]. Moreover, a lower risk of CRC was shown among
individuals who use energy more efficiently, suggesting that the increased risk of CRC associated with
overweight and obesity may result not only from increased energy intake but also from differences in
metabolic efficiency [31, 64].
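Per-unit dose-response figures such as the 2% increase in CRC risk per 1 kg/m² of BMI can be scaled to larger exposure differences if a log-linear relationship is assumed. The sketch below makes that assumption explicit; it is an illustration only, since the cited meta-analyses are the source of the per-unit figure and the extrapolation is not a result reported in this chapter. The function name is hypothetical.

```python
def scaled_relative_risk(rr_per_unit: float, units: float) -> float:
    """Relative risk for `units` exposure increments under a log-linear model.

    Assumes the per-unit relative risk multiplies with each increment,
    i.e. RR(units) = rr_per_unit ** units.
    """
    if rr_per_unit <= 0:
        raise ValueError("relative risk must be positive")
    return rr_per_unit ** units


# A 2% increase per 1 kg/m^2, applied to a 5 kg/m^2 difference in BMI.
rr_bmi_5 = scaled_relative_risk(1.02, 5)
print(f"{rr_bmi_5:.3f}")  # 1.104
```

Under this assumption a 5 kg/m² difference corresponds to roughly a 10% higher risk, slightly more than five times the per-unit figure because the increments compound.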
Adult attained height. Published epidemiological studies have shown a consistent dose-response
relationship between increased adult attained height and increased CRC risk [62]. CUP meta-analyses
reported that each 5 cm increase in height raises the risk of CRC and colon cancer by 5% and 9%, respectively.
The increasing trend is statistically significant in both men and women for colon cancer and CRC, but only in
men for rectal cancer [62]. The biological mechanisms underlying this association are thought to involve the
factors that determine adult height, such as early-life nutrition, hormone profiles (growth
hormones, insulin-like growth factors, and sex hormones), and sexual maturation rate, which are also relevant
to cancer risk [69-71].
Cigarette smoking. Cigarette smoking is a known risk factor for several types of cancer, particularly
lung cancer. In 2009, the International Agency for Research on Cancer concluded that tobacco smoking is a
cause of CRC, with a stronger association for rectal cancer (RR=1.4-2.0) than for colon cancer
(RR=1.2-1.4) [72-74]. Smoking 40 cigarettes per day can raise the risk of CRC by 38% [75]. A meta-analysis
of 42 observational studies showed that current smokers have twice the risk of developing adenomatous
polyps compared with nonsmokers. Additionally, long-term smoking was found to be associated with larger
colorectal polyps [76]. The carcinogens found in tobacco induce cancer growth in the colon and rectum, thus
increasing the risk of CRC [77]. An estimated 12% of CRC deaths result from smoking [78].
Furthermore, men and women who smoke cigarettes were observed to have an earlier average age of onset for
CRC [78]. However, more recent studies have suggested that smoking may be associated only with specific
CRC tumor subtypes, such as p53 overexpression-positive tumors [79, 80].
Alcohol consumption. Regular consumption of alcohol is another factor that may increase the risk of
developing CRC. The evidence is generally consistent, and there is an apparent dose-response relationship
[62]. A pooled analysis showed a 41% increased risk for the group who drank the most alcohol [81].
Individuals who consume 2-4 alcoholic drinks per day over their lifetime have a 23% higher risk of developing
CRC than those who have less than 1 drink per day [72]. CUP meta-analyses including 15 cohort studies reported
a 10% increased risk of CRC and an 8% increased risk of colon cancer for every 10 g/day increase in
alcohol consumption. These effects are greater in men than in women, possibly because of higher overall
consumption in men, different types of alcoholic drinks, or hormone-related differences in alcohol
metabolism [62]. Furthermore, alcohol consumption is one of the factors associated with CRC onset at a younger
age [78, 82], as well as with increased tumor growth in the distal colon [83]. The carcinogenic agents may be
reactive metabolites of alcohol, such as acetaldehyde, or products mediated by the effects of
alcohol, for example prostaglandins, lipid peroxidation, or free-radical oxygen species [84]. Alcohol may
interact with smoking, in that DNA mutations induced by tobacco are repaired less efficiently in the
presence of alcohol [62]. Alcohol can also function as a solvent, enhancing the penetration of other
carcinogenic molecules into mucosal cells [84]. In addition, the more alcohol individuals consume, the lower
their intake of essential nutrients tends to be. As a result, their tissues may be more susceptible to
carcinogenesis [62, 84].
Diet. Dietary factors account for about 30% of cancers in developed countries and about 20% in
developing countries [85]. A number of dietary factors have been shown to strongly influence the
risk of CRC, and changes in food habits might reduce this cancer burden by up to 70% [86]. Based on current
epidemiological evidence, a list of dietary recommendations for cancer prevention has been issued by the
World Cancer Research Fund and the American Institute for Cancer Research [1].
One of the common dietary patterns positively associated with CRC is the so-called typical “Western”
diet, characterized by high consumption of red and processed meat, refined grains, potatoes, and foods containing
sugar, and low intake of fruit and vegetables [87-89]. This dietary pattern is usually high in fat, animal fat in
particular, which favors the development of a microbiota composition capable of degrading bile salts to
potentially carcinogenic N-nitroso compounds [90]. A substantial body of epidemiological evidence has
shown an increased risk with higher intake of red meat. CUP meta-analyses reported that the risk of CRC increases
by 17% for every 100 g/day increase in red meat intake, and by 18% for every 50 g/day increase in
consumption of processed meat [62]. This adverse association is stronger for colon cancer than for rectal cancer
[27, 90]. The potential underlying mechanisms for this association include the presence of heme iron in red
meats [27, 91, 92] and the formation, during cooking at high temperature, of heterocyclic amines and polycyclic aromatic
hydrocarbons, both of which are carcinogenic, particularly for colon cancer among people with a genetic
predisposition [27, 91, 93].
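The per-increment estimates quoted above can be rescaled to other intake levels if one assumes a log-linear dose-response relationship, that is, that the relative risk (RR) is multiplied by the same factor for each additional increment of intake. The following Python sketch illustrates that arithmetic only. The log-linear form and the function name scaled_relative_risk are assumptions made for illustration and are not taken from the CUP report.

```python
def scaled_relative_risk(rr_per_unit: float, unit_g_per_day: float,
                         intake_g_per_day: float) -> float:
    """Relative risk at a given intake, ASSUMING a log-linear dose-response.

    rr_per_unit      -- RR reported for one increment (e.g. 1.17)
    unit_g_per_day   -- size of that increment in g/day (e.g. 100)
    intake_g_per_day -- intake level of interest, relative to zero intake
    """
    return rr_per_unit ** (intake_g_per_day / unit_g_per_day)


# Figures quoted in the text: +17% per 100 g/day of red meat,
# +18% per 50 g/day of processed meat.
red_meat_rr = scaled_relative_risk(1.17, 100.0, 50.0)    # half an increment
processed_rr = scaled_relative_risk(1.18, 50.0, 100.0)   # two increments

print(f"Red meat, 50 g/day: RR = {red_meat_rr:.2f}")
print(f"Processed meat, 100 g/day: RR = {processed_rr:.2f}")
```

Under this assumption, 50 g/day of red meat corresponds to an RR of about 1.08 and 100 g/day of processed meat to an RR of about 1.39. The true dose-response curve may not be log-linear across the whole intake range, so these values are indicative only.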
In contrast, “healthy” dietary patterns, such as a “vegetable” or “prudent” diet, are inversely
associated with CRC risk [94-96]. These patterns usually consist of high intake of fruits and vegetables, fish and poultry,
whole-grain products, and low-fat dairy products. Based on 12 cohort studies, CUP meta-analyses showed a
consistent inverse association between CRC risk and foods containing dietary fiber (i.e. cereal, fruit, vegetables,
wholegrain and legumes) with a clear dose-response relationship [62]. The risk of developing CRC is reduced
by 10% for every 10 g/day increase in dietary fiber intake. The effect is apparent in both men and women [62].
Differences in dietary fiber intake have been postulated to contribute to the geographic
differences in CRC incidence rates [40]. The biological mechanisms for this inverse association are still
unclear. Dietary fiber has effects in the gastrointestinal tract, such as diluting fecal content, increasing
stool mass, and reducing transit time, that may ultimately decrease CRC risk [27, 63]. The gut microflora can
also interact with dietary carbohydrates in the colon to produce short-chain fatty acids (e.g. butyrate),
which have been observed to induce apoptosis, cell cycle arrest, and cell differentiation in experimental studies
[62]. In addition, dietary fiber intake is strongly correlated with intake of folate, which has also been suggested
to protect against CRC by preventing aberrant DNA methylation patterns [63, 97, 98]. Based on the
CUP report, every 100 g/day increase in consumption of non-starchy vegetables and fruit decreases CRC
risk by 2% and 3%, respectively [62]. The reduced risk is preserved but not statistically significant for colon
cancer and rectal cancer. Micronutrients (e.g. folate, vitamin B and methionine) and antioxidants (e.g.
vitamins C and E, selenium, flavonoids) from vegetables and fruits have anti-carcinogenic
properties [61]. However, because this food category is broad and disparate and its constituents are complex, the
evidence for these protective effects is still limited [62]. Moreover, long-term use of multivitamin
supplements has been associated with a reduced risk of CRC [63]. A pooled analysis of 13 cohort studies
reported that people who had ever used multivitamin supplements had a 12% lower risk of colon cancer than those who
had never used them [99]. The postulated mechanism for this potential beneficial role of multivitamins is attributed
to the potential chemopreventive roles of the single vitamins and minerals included in the multivitamin
supplement [100].
Moreover, consumption of milk and other foods containing vitamin D and calcium has been studied
for its association with CRC risk. The current evidence suggests a reduced risk or no association [63].
People who drank the most milk had a 10-15% lower risk of CRC or colon cancer compared with those
who had the lowest milk consumption [101]. Any effect of milk in reducing CRC risk is believed to be mediated
at least in part by calcium. Six of seven published cohort studies investigating calcium and CRC reported a
decreased risk with calcium supplements [62]. Use of calcium supplements has been shown to reduce the risk
of developing CRC by 24% in a published meta-analysis [101]. CRC risk has been reported
consistently to decrease as measures of vitamin D intake or status (plasma or serum levels) increase [62]. However, there is a lack of a
significant association between dietary vitamin D and CRC risk [101]. Calcium may reduce CRC risk by
binding bile and free fatty acids and promoting differentiation and apoptosis in intestinal cells, whereas both
calcium and vitamin D may restrain epithelial cell proliferation [61, 62, 102].
To date, no firm conclusions can be reached on the association between coffee consumption and risk
of CRC because the epidemiological evidence is inconsistent and varies with study design [103]. However,
growing evidence suggests that coffee consumption is a protective factor for CRC [104-106]. A recent
population-based case-control study, the Molecular Epidemiology of CRC (MECC) study, showed that coffee consumption
was associated with a 26% reduced risk of developing CRC in a dose-response manner for both colon and rectal
cancers. The inverse association was also found for decaffeinated coffee consumption alone [107]. A number
of bioactive compounds in coffee (e.g. chlorogenic acids, polyphenols, melanoidins, the diterpenes cafestol and kahweol, as
well as caffeine) are suggested to have chemopreventive properties, which might maintain colon
health and reduce CRC risk by improving colon motility and fecal output, changing microbiome composition,
reducing inflammation, decreasing oxidative stress, and modulating bile acid secretion [107, 108].
Other dietary factors, such as garlic, fish, cheese, and foods containing selenium, iron, or sugar, have also been
extensively studied and are summarized in the SER and CUP reports by the American Institute for Cancer Research [1, 62].
Of these, consistent evidence suggests that garlic probably protects against CRC in a dose-response manner
[62]. Apart from garlic, no conclusions about the other factors in relation to CRC risk can be drawn because the
evidence is limited.
Medications. Evidence from epidemiological and experimental studies has shown consistently that
aspirin and other nonsteroidal anti-inflammatory drugs (NSAIDs) reduce the risk of CRC. Most studies of
aspirin have reported a lower risk of CRC incidence or mortality [63]. A meta-analysis of randomized trials
showed that incidence and mortality due to CRC are reduced by 30-40% after 20 years of follow-up among
people who used aspirin for approximately 5 years. A greater protective effect was found for proximal colon
tumors [109]. Larger doses and longer use are probably associated with lower risk [110, 111]. The effectiveness
of NSAIDs may result from inhibition of cyclooxygenase (COX) enzymes,
whose expression is elevated in CRC [112]. Additionally, aspirin can inhibit proliferation and promote
apoptosis of colon cancer cells [113].
Observational studies suggest a 20-40% reduced risk of CRC among women taking postmenopausal
hormones [114]. The reduction was strongest among current users, and the duration of use might not
influence the risk of cancer [115]. The biological mechanisms proposed for this inverse association include
reduced bile acid synthesis, the secondary products of which are believed to initiate malignant change in the colonic
epithelium, and inhibition of proliferation and deregulation of growth in colonic mucosa through suppression of
the expression of the estrogen-receptor gene and serum insulin-like growth factor-1 (IGF-1) [116-118]. Moreover,
estrogen receptor hypermethylation increases with age, a known risk factor for CRC, suggesting that declining
levels of estrogen may be important to the etiology of CRC [119]. More recent data showed that
estrogen plus progestin use is associated only with MSI-low and microsatellite-stable CRC, and not with the MSI-H
subtype [120]. In addition, given the important role of hormones in CRC, reproductive factors have
also been examined in more than 20 epidemiological studies. However, no association was observed between CRC risk and age at
first birth or other parity data [63].
3.4. Conclusions
Over decades of epidemiological studies, several risk factors have been shown, with convincing
evidence, to have a causal association with CRC risk (Table 3.1). Obese people with a sedentary lifestyle and heavy
consumption of alcohol or cigarettes have a substantially greater risk of developing CRC. Although the exact
dietary patterns that increase CRC risk are still less clear, higher intake of red meat and processed meat and
lower intake of vegetables and fruits appear important. Aspirin and NSAIDs may reduce the risk. With
knowledge of these identified causes, we can implement cancer prevention strategies.
Maintenance of a healthy weight, regular exercise, and appropriate dietary habits, together with
targeted screening programs and early therapeutic intervention, could substantially reduce the cases and deaths
from CRC.
Chapter 3 References:
1. World Cancer Research Fund, A.I.f.C.R., Food, Nutrition, Physical Activity, and the Prevention of Cancer:
a Global Perspective. 2007.
2. Ferlay, J., et al., GLOBOCAN 2008 v1. 2, Cancer Incidence and Mortality Worldwide: IARC CancerBase
No. 10 [Internet]. International Agency for Research on Cancer, Lyon, France. Lyon, France, 2010.
3. Stewart, B. and C. Wild, World Cancer Report 2014. International Agency for Research on Cancer.
World Health Organization, 2014.
4. Boyle, P. and M.E. Leon, Epidemiology of colorectal cancer. British medical bulletin, 2002. 64(1): p. 1-
25.
5. Jemal, A., et al., Annual report to the nation on the status of cancer, 1975–2001, with a special feature
regarding survival. cancer, 2004. 101(1): p. 3-27.
6. Boyle, P. and J. Ferlay, Mortality and survival in breast and colorectal cancer. Nature Clinical Practice
Oncology, 2005. 2(9): p. 424-425.
7. Lozano, R., et al., Global and regional mortality from 235 causes of death for 20 age groups in 1990
and 2010: a systematic analysis for the Global Burden of Disease Study 2010. The Lancet, 2013.
380(9859): p. 2095-2128.
8. Edwards, B.K., et al., Annual report to the nation on the status of cancer, 1975-2006, featuring
colorectal cancer trends and impact of interventions (risk factors, screening, and treatment) to reduce
future rates. Cancer, 2010. 116(3): p. 544-573.
9. Ries, L.A., et al., SEER cancer statistics review, 1975-2003. 2006.
10. Howlader N, N.A., Krapcho M, Garshell J, Miller D, Altekruse SF, Kosary CL, Yu M, Ruhl J, Tatalovich
Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds), SEER Cancer Statistics Review, 1975-2014,
National Cancer Institute. Bethesda, MD, http://seer.cancer.gov/csr/1975_2012/, based on November
2016 SEER data submission, posted to the SEER web site, April 2017. 2017.
11. Siegel, R.L., et al., Colorectal cancer statistics, 2017. CA: a cancer journal for clinicians, 2017. 67(3): p.
177-193.
12. Society, A.C., Colorectal Cancer Facts & Figures 2017-2019. Atlanta: American Cancer Society, 2017.
13. Group, U.S.C.S.W., United States Cancer Statistics: 1999–2014 Incidence and Mortality Web-based
Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease Control and
Prevention, and National Cancer Institute. 2017.
14. Ghafoor, A., et al., Cancer statistics for African Americans. CA: a cancer journal for clinicians, 2002.
52(6): p. 326-341.
15. Percy, C., et al., International classification of diseases for oncology. 1990.
16. Loupakis, F., et al., Primary tumor location as a prognostic factor in metastatic colorectal cancer.
Journal of the National Cancer Institute, 2015. 107(3): p. dju427.
17. Lee, G., et al., Is right-sided colon cancer different to left-sided colorectal cancer?–a systematic review.
European Journal of Surgical Oncology (EJSO), 2015. 41(3): p. 300-308.
18. Nawa, T., et al., Differences between right- and left-sided colon cancer in patient characteristics,
cancer morphology and histology. Journal of gastroenterology and hepatology, 2008. 23(3): p. 418-423.
19. Iacopetta, B., Are there two sides to colorectal cancer? International journal of cancer, 2002. 101(5): p.
403-408.
20. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: North American
Association of Central Cancer Registries (NAACCR) Incidence-Cancer in North America (CiNA) Analytic
File, 1995-2013, for NHIA v2 Origin, Custom File With County, American Cancer Society (ACS) Facts and
Figures Projection Project. Springfield, IL: NAACCR, 2016.
21. Petrelli, F., et al., Prognostic survival associated with left-sided vs right-sided colon cancer: a
systematic review and meta-analysis. JAMA oncology, 2017. 3(2): p. 211-219.
22. Son, J.S., et al., Altered Interactions between the Gut Microbiome and Colonic Mucosa Precede
Polyposis in APCMin/+ Mice. PLoS One, 2015. 10(6): p. e0127985.
23. Morgan, X.C. and C. Huttenhower, Chapter 12: Human microbiome analysis. PLoS Comput Biol, 2012.
8(12): p. e1002808.
24. American Cancer Society. California Department of Public Health, C.C.R., California Cancer Facts &
Figures 2017 Oakland, CA: American Cancer Society, Inc., California Division, 2017.
25. Mann SC, G.B., Morris CR, Parikh-Patel A, Kizer KW, Kwong SL, Snipes KP, Cancer in California, 1988-
2011. Sacramento, CA: California Department of Public Health, Chronic Disease Surveillance and
Research Branch, June 2015., 2015.
26. Ries, L., et al., SEER Cancer Statistics Review, 1975–2005, National Cancer Institute 2008. SEER web site,
2008.
27. Haggar, F.A. and R.P. Boushey, Colorectal cancer epidemiology: incidence, mortality, survival, and risk
factors. Clinics in colon and rectal surgery, 2009. 22(4): p. 191.
28. O'Connell, J.B., et al., Rates of colon and rectal cancers are increasing in young adults. The American
Surgeon, 2003. 69(10): p. 866.
29. O'Connell, J.B., et al., Colorectal cancer in the young. The American journal of surgery, 2004. 187(3): p.
343-348.
30. Fairley, T.L., et al., Colorectal cancer in US adults younger than 50 years of age, 1998–2001. Cancer,
2006. 107(S5): p. 1153-1161.
31. de Jong, A.E., et al., Prevalence of adenomas among young individuals at average risk for colorectal
cancer. The American journal of gastroenterology, 2005. 100(1): p. 139-143.
32. Labianca, R., et al., Colorectal cancer: screening. Annals of oncology: official journal of the European
Society for Medical Oncology/ESMO, 2004. 16: p. ii127-32.
33. Schatzkin, A., et al., Interpreting precursor studies: what polyp trials tell us about large-bowel cancer.
Journal of the National Cancer Institute, 1994. 86(14): p. 1053-1057.
34. O'Brien, M.J., et al., The national polyp study. Gastroenterology, 1990. 98(2): p. 371-379.
35. Jass, J.R., Do all colorectal carcinomas arise in preexisting adenomas? World journal of surgery, 1989.
13(1): p. 45-51.
36. Schuman, B.M., H. Simsek, and R.C. Lyons, The Association of Multiple Colonic Adenomatous Polyps
with Cancer of the Colon∗. American Journal of Gastroenterology, 1990. 85(7).
37. Society, A.C., Colorectal Cancer Facts & Figures Special Edition 2005. Oklahoma City, OK: American
Cancer Society, 2005.
38. He, J. and J.E. Efron, Screening for colorectal cancer. Adv Surg, 2011. 45: p. 31-44.
39. Jawad, N., N. Direkze, and S.J. Leedham, Inflammatory bowel disease and colon cancer, in
Inflammation and Gastrointestinal Cancers. 2011, Springer. p. 99-115.
40. Janout, V. and H. Kollárová, Epidemiology of colorectal cancer. ACTA-UNIVERSITATIS PALACKIANAE
OLOMUCENSIS FACULTATIS MEDICAE, 2001: p. 5-10.
41. Xie, J. and S.H. Itzkowitz, Cancer in inflammatory bowel disease. World J Gastroenterol, 2008. 14(3): p.
378-89.
42. Triantafillidis, J.K., G. Nasioulas, and P.A. Kosmidis, Colorectal cancer and inflammatory bowel disease:
epidemiology, risk factors, mechanisms of carcinogenesis and prevention strategies. Anticancer Res,
2009. 29(7): p. 2727-37.
43. Tambe, N.A., et al., Atopic Allergic Conditions and Colorectal Cancer Risk in the Multiethnic Cohort
Study. American journal of epidemiology, 2015. 181(11): p. 889-897.
44. Hwang, C.Y., et al., Cancer risk in patients with allergic rhinitis, asthma and atopic dermatitis: a
nationwide cohort study in Taiwan. International journal of cancer, 2012. 130(5): p. 1160-1167.
45. Prizment, A.E., et al., History of allergy and reduced incidence of colorectal cancer, Iowa Women's
Health Study. Cancer Epidemiology Biomarkers & Prevention, 2007. 16(11): p. 2357-2362.
46. Turner, M.C., et al., Cancer mortality among US men and women with asthma and hay fever. American
journal of epidemiology, 2005. 162(3): p. 212-221.
47. Jacobs, E.J., et al., Hay fever and asthma as markers of atopic immune response and risk of colorectal
cancer in three large cohort studies. Cancer Epidemiology Biomarkers & Prevention, 2013. 22(4): p.
661-669.
48. Josephs, D., et al., Epidemiological associations of allergy, IgE and cancer. Clinical & Experimental
Allergy, 2013. 43(10): p. 1110-1123.
49. Boardman, L.A., et al., Colorectal cancer risks in relatives of young-onset cases: is risk the same across
all first-degree relatives? Clinical Gastroenterology and Hepatology, 2007. 5(10): p. 1195-1198.
50. Jackson-Thompson, J., et al., Descriptive epidemiology of colorectal cancer in the United States,
1998–2001. Cancer, 2006. 107(S5): p. 1103-1111.
51. Papadopoulos, N., et al., Mutation of a mutL homolog in hereditary colon cancer. Science, 1994.
263(5153): p. 1625-1629.
52. Weitz, J., et al., Colorectal cancer. Lancet, 2005. 365(9454): p. 153-65.
53. Jeter, J.M., W. Kohlmann, and S.B. Gruber, Genetics of colorectal cancer. Oncology (Williston Park, NY),
2006. 20(3): p. 269-76; discussion 285-6, 288-9.
54. Al-Sukhni, W., M. Aronson, and S. Gallinger, Hereditary colorectal cancer syndromes: familial
adenomatous polyposis and lynch syndrome. Surgical Clinics of North America, 2008. 88(4): p. 819-844.
55. Half, E., D. Bercovich, and P. Rozen, Familial adenomatous polyposis. Orphanet J Rare Dis, 2009. 4: p.
22.
56. Wilmink, A., Overview of the epidemiology of colorectal cancer. Diseases of the colon & rectum, 1997.
40(4): p. 483-493.
57. Kinzler, K.W. and B. Vogelstein, Cancer-susceptibility genes. Gatekeepers and caretakers. Nature, 1997.
386(6627): p. 761, 763.
58. Pino, M.S. and D.C. Chung, The chromosomal instability pathway in colon cancer. Gastroenterology,
2010. 138(6): p. 2059-2072.
59. Boland, C.R. and A. Goel, Microsatellite instability in colorectal cancer. Gastroenterology, 2010. 138(6):
p. 2073-2087. e3.
60. Schuebel, K.E., et al., Comparing the DNA hypermethylome with gene mutations in human colorectal
cancer. PLoS Genet, 2007. 3(9): p. e157.
61. Rasool, S., et al., A comparative overview of general risk factors associated with the incidence of
colorectal cancer. Tumor Biology, 2013. 34(5): p. 2469-2476.
62. World Cancer Research Fund, A.I.f.C.R., Colorectal Cancer 2011 Report: Food, Nutrition, Physical
Activity, and the Prevention of Colorectal Cancer (Continuous Update Project). 2011.
63. Hunter, J.D.P.a.D., Chapter 1: Colorectal Cancer: Epidemiology, Genetics of Colorectal Cancer, pg. 5-15.
Springer Science, LLC 2009, 2009.
64. Boyle, P. and J. Langman, ABC of colorectal cancer: Epidemiology. British Medical Journal, 2000.
321(7264): p. 805-808.
65. Harriss, D., et al., Lifestyle factors and colorectal cancer risk (2): a systematic review and
meta-analysis of associations with leisure-time physical activity. Colorectal Disease, 2009. 11(7): p. 689-
701.
66. Lee, K.-J., et al., Physical activity and risk of colorectal cancer in Japanese men and women: the Japan
Public Health Center-based prospective study. Cancer Causes & Control, 2007. 18(2): p. 199-209.
67. Moghaddam, A.A., M. Woodward, and R. Huxley, Obesity and risk of colorectal cancer: a meta-analysis
of 31 studies with 70,000 events. Cancer Epidemiology Biomarkers & Prevention, 2007. 16(12): p.
2533-2547.
68. Renehan, A.G., et al., Body-mass index and incidence of cancer: a systematic review and meta-analysis
of prospective observational studies. The Lancet, 2008. 371(9612): p. 569-578.
69. Jousilahti, P., et al., Relation of adult height to cause-specific and total Mortality: A prospective follow-
up study of 31, 199 middle-aged men and women in Finland. American journal of epidemiology, 2000.
151(11): p. 1112-1120.
70. Gunnell, D., et al., Height, leg length, and cancer risk: a systematic review. Epidemiologic reviews,
2001. 23(2): p. 313-342.
71. Wadsworth, M., et al., Leg and trunk length at 43 years in relation to childhood health, diet and family
circumstances; evidence from the 1946 national birth cohort. International Journal of Epidemiology,
2002. 31(2): p. 383-390.
72. Society, A.C., Colorectal Cancer Facts & Figures 2011-2013. 2011, American Cancer Society Atlanta, GA.
73. Paskett, E.D., et al., Association between cigarette smoking and colorectal cancer in the Women’s
Health Initiative. Journal of the National Cancer Institute, 2007. 99(22): p. 1729-1735.
74. Humans, I.W.G.o.t.E.o.C.R.t., Tobacco smoke and involuntary smoking. IARC monographs on the
evaluation of carcinogenic risks to humans/World Health Organization, International Agency for
Research on Cancer, 2004. 83: p. 1.
75. Liang, P.S., T.Y. Chen, and E. Giovannucci, Cigarette smoking and colorectal cancer incidence and
mortality: Systematic review and meta-analysis. International Journal of Cancer, 2009. 124(10): p.
2406-2415.
76. Botteri, E., et al., Cigarette smoking and adenomatous polyps: a meta-analysis. Gastroenterology,
2008. 134(2): p. 388-395. e3.
77. Secretan, B., et al., A review of human carcinogens—Part E: tobacco, areca nut, alcohol, coal smoke,
and salted fish. The lancet oncology, 2009. 10(11): p. 1033-1034.
78. Zisman, A.L., et al., Associations between the age at diagnosis and location of colorectal cancer and
the use of alcohol and tobacco: implications for screening. Archives of internal medicine, 2006. 166(6):
p. 629-634.
79. Diergaarde, B., et al., Cigarette smoking and genetic alterations in sporadic colon carcinomas.
Carcinogenesis, 2003. 24(3): p. 565-571.
80. Watson, A.J. and P.D. Collins, Colon cancer: a civilization disorder. Digestive diseases, 2011. 29(2): p.
222-228.
81. Cho, E., et al., Alcohol intake and colorectal cancer: a pooled analysis of 8 cohort studies. Annals of
internal medicine, 2004. 140(8): p. 603-613.
82. Tsong, W., et al., Cigarettes and alcohol in relation to colorectal cancer: the Singapore Chinese Health
Study. British journal of cancer, 2007. 96(5): p. 821-827.
83. Bazensky, I., C. Shoobridge-Moran, and L.H. Yoder, Colorectal cancer: an overview of the epidemiology,
risk factors, symptoms, and screening guidelines. Medsurg Nursing, 2007. 16(1): p. 46.
84. Pöschl, G. and H.K. Seitz, Alcohol and cancer. Alcohol and alcoholism, 2004. 39(3): p. 155-165.
85. Rasool, S., et al., Esophageal cancer: associated factors with special reference to the Kashmir Valley.
Tumori, 2012. 98(2): p. 191.
86. Willett, W.C., Diet and cancer: an evolving picture. Jama, 2005. 293(2): p. 233-234.
87. Williams, C.D., et al., Dietary patterns, food groups, and rectal cancer risk in Whites and African-
Americans. Cancer Epidemiology Biomarkers & Prevention, 2009. 18(5): p. 1552-1561.
88. Kim, M.K., et al., Dietary patterns and subsequent colorectal cancer risk by subsite: a prospective
cohort study. International journal of cancer, 2005. 115(5): p. 790-798.
89. Kesse, E., F. Clavel-Chapelon, and M.-C. Boutron-Ruault, Dietary patterns and risk of colorectal tumors:
a cohort of French women of the National Education System (E3N). American journal of epidemiology,
2006. 164(11): p. 1085-1093.
90. Larsson, S.C. and A. Wolk, Meat consumption and risk of colorectal cancer: a meta-analysis of
prospective studies. International journal of cancer, 2006. 119(11): p. 2657-2664.
91. Santarelli, R.L., F. Pierre, and D.E. Corpet, Processed meat and colorectal cancer: a review of
epidemiologic and experimental evidence. Nutrition and cancer, 2008. 60(2): p. 131-144.
92. Kabat, G., et al., A cohort study of dietary iron and heme iron intake and risk of colorectal cancer in
women. British journal of cancer, 2007. 97(1): p. 118-122.
93. Sinha, R., An epidemiologic approach to studying heterocyclic amines. Mutation
Research/Fundamental and Molecular Mechanisms of Mutagenesis, 2002. 506: p. 197-204.
94. Slattery, M.L., et al., Eating patterns and risk of colon cancer. American Journal of Epidemiology, 1998.
148(1): p. 4-16.
95. Dixon, L.B., et al., Dietary patterns associated with colon and rectal cancer: results from the Dietary
Patterns and Cancer (DIETSCAN) Project. The American journal of clinical nutrition, 2004. 80(4): p.
1003-1011.
96. Terry, P., et al., Prospective study of major dietary patterns and colorectal cancer risk in women.
American journal of epidemiology, 2001. 154(12): p. 1143-1149.
97. Feinberg, A.P. and B. Vogelstein. Alterations in DNA methylation in human colon neoplasia. in
Seminars in surgical oncology. 1987. Wiley Online Library.
98. Timbo, B.B., et al., Dietary supplements in a national survey: prevalence of use and reports of adverse
events. Journal of the American Dietetic Association, 2006. 106(12): p. 1966-1974.
99. Park, Y., et al., Intakes of vitamins A, C, and E and use of multiple vitamin supplements and risk of colon
cancer: a pooled analysis of prospective cohort studies. Cancer Causes Control, 2010. 21(11): p. 1745-
57.
100. Chau, R., et al., Multivitamin, calcium and folic acid supplements and the risk of colorectal cancer in
Lynch syndrome. International Journal of Epidemiology, 2016: p. dyw036.
101. Huncharek, M., J. Muscat, and B. Kupelnick, Colorectal cancer risk and dietary intake of calcium,
vitamin D, and dairy products: a meta-analysis of 26,335 cases from 60 observational studies.
Nutrition and cancer, 2008. 61(1): p. 47-69.
102. Chang, D., et al., Evaluation of oxidative stress in colorectal cancer patients. Biomedical and
Environmental Sciences, 2008. 21(4): p. 286-289.
103. Bidel, S., et al., Coffee consumption and risk of colorectal cancer. European journal of clinical nutrition,
2010. 64(9): p. 917-923.
104. Je, Y., W. Liu, and E. Giovannucci, Coffee consumption and risk of colorectal cancer: a systematic
review and meta-analysis of prospective cohort studies. Int J Cancer, 2009. 124(7): p. 1662-8.
105. Galeone, C., et al., Coffee consumption and risk of colorectal cancer: a meta-analysis of case-control
studies. Cancer Causes Control, 2010. 21(11): p. 1949-59.
106. Ainslie-Waldman, C.E., et al., Coffee intake and gastric cancer risk: the Singapore Chinese health study.
Cancer Epidemiol Biomarkers Prev, 2014. 23(4): p. 638-47.
107. Schmit, S.L., et al., Coffee Consumption and the Risk of Colorectal Cancer. Cancer Epidemiology
Biomarkers & Prevention, 2016. 25(4): p. 634-639.
108. Bai, Y., et al., Relationship between bladder cancer and total fluid intake: a meta-analysis of
epidemiological evidence. World J Surg Oncol, 2014. 12: p. 223.
109. Rothwell, P.M., et al., Long-term effect of aspirin on colorectal cancer incidence and mortality: 20-year
follow-up of five randomised trials. Lancet, 2010. 376(9754): p. 1741-50.
110. Flossmann, E. and P.M. Rothwell, Effect of aspirin on long-term risk of colorectal cancer: consistent
evidence from randomised and observational studies. Lancet, 2007. 369(9573): p. 1603-13.
111. Algra, A.M. and P.M. Rothwell, Effects of regular aspirin on long-term cancer incidence and metastasis:
a systematic comparison of evidence from observational studies versus randomised trials. Lancet Oncol,
2012. 13(5): p. 518-27.
112. Komiya, M., et al., Prevention and intervention trials for colorectal cancer. Jpn J Clin Oncol, 2013. 43(7):
p. 685-94.
113. Hamoya, T., et al., Effects of NSAIDs on the risk factors of colorectal cancer: a mini review. Genes and
Environment, 2016. 38(1): p. 1.
114. Mørch, L.S., et al., The influence of hormone therapies on colon and rectal cancer. European journal of
epidemiology, 2016: p. 1-9.
115. Grodstein, F., P.A. Newcomb, and M.J. Stampfer, Postmenopausal hormone therapy and the risk of
colorectal cancer: a review and meta-analysis. Am J Med, 1999. 106(5): p. 574-82.
116. McMichael, A.J. and J.D. Potter, Reproduction, endogenous and exogenous sex hormones, and colon
cancer: a review and hypothesis. 1980.
117. Lointier, P., D. Wildrick, and B. Boman, The effects of steroid hormones on a human colon cancer cell
line in vitro. Anticancer research, 1991. 12(4): p. 1327-1330.
118. Campagnoli, C., et al., Differential effects of oral conjugated estrogens and transdermal estradiol on
insulin-like growth factor 1, growth hormone and sex hormone binding globulin serum levels.
Gynecological Endocrinology, 2009.
119. Issa, J.-P.J., et al., Methylation of the oestrogen receptor CpG island links ageing and neoplasia in
human colon. Nature genetics, 1994. 7(4): p. 536-540.
120. Newcomb, P.A., et al., Estrogen plus progestin use, microsatellite instability, and the risk of colorectal
cancer in women. Cancer Research, 2007. 67(15): p. 7534-7539.
121. World Cancer Research Fund, A.I.f.C.R., Diet, nutrition, physical activity and colorectal cancer 2017.
2017.
Table 3.1. Lifestyle factors related to CRC risk (abstracted from World Cancer Research
Fund/American Institute for Cancer Research (2007, 2011, 2017) and Potter et al. 2009) [1, 27, 62, 63,
121]

Reported evidence   Decreases risk                   Increases risk
-----------------   ------------------------------   ---------------------------
Convincing          Physical activity                Processed meat
                                                     Alcoholic drinks
                                                     Body fatness
                                                     Adult attained height
                                                     Cigarette smoking

Probable            Wholegrains                      Red meat
                    Foods containing dietary fiber
                    Calcium supplements
                    Dairy products

Suggestive          Non-starchy vegetables           Foods containing haem iron
                    Fruits
                    Foods containing vitamin C
                    Vitamin D
                    Multivitamin supplements
                    Fish

No conclusion       Cereals (grains) and their products; potatoes; animal fat; poultry; shellfish
                    and other seafood; fatty acid composition; cholesterol; dietary n-3 fatty acid
                    from fish; legumes; garlic; non-dairy sources of calcium; foods containing
                    added sugars; sugar (sucrose); coffee; tea; caffeine; carbohydrate; total fat;
                    starch; glycaemic load; glycaemic index; folate; vitamin A; vitamin B6;
                    vitamin E; selenium; low fat; methionine; beta-carotene; alpha-carotene;
                    lycopene; retinol; energy intake; meal frequency; dietary pattern
Chapter 4 Lifestyle Factors and Colorectal Cancer risk in California Twins
4.1 Abstract
Associations between lifestyle factors and colorectal cancer have been studied previously, but
few studies have been able to control for genetic and early shared environmental factors. In this
study, we analyzed questionnaire data from 37,435 twin pairs born 1908-1982 and enrolled in the California
Twin Program (CTP) to obtain lifestyle-related factors, including anthropometric measures, alcohol
use, smoking, exercise, medication use, history of disease, and diet, and studied their associations
with the risk of colorectal cancer (CRC), which was identified through linkage to the California Cancer
Registry (CCR) and the Los Angeles County Cancer Surveillance Program (CSP). We performed a matched
case-control study within CRC-discordant like-sex twin pairs, using the unaffected co-twin as the control
(n = 90 pairs), to control for genetic background, shared early-life exposures, and gender
differences. We also stratified the analysis by twin zygosity and tumor subsite to allow for possibly different
etiologic effects.
We found that twins who consumed one more meal of vegetables per week than their co-twins had a 21%
lower risk of developing CRC (OR=0.79; 95% CI: 0.63-1.00). Among the individual food
categories, the protective association was most apparent for potatoes: twins who ate more potatoes had an 80%
lower CRC risk (OR=0.20; 95% CI: 0.04-0.71), and the association was stronger among monozygotic (MZ) twins (OR=0.10; 95%
CI: 0.002-0.70) and when the analysis was limited to proximal colon cancer (OR=0.11; 95% CI: 0.003-0.80). When
like-sex twins were separated into MZ and dizygotic (DZ) pairs, MZ twins who smoked at age 18 or
younger were 9 times more likely to have CRC than those who smoked at an older age (OR=9.00; 95% CI: 1.25-
394.48). A similar positive association was observed for distal colon cancer among all twins (OR=6.73; 95% CI:
1.22-Inf.). Moreover, CRC risk was elevated more than two-fold among DZ twins who reported more
alcoholic drinks (OR=2.27; 95% CI: 1.08-5.12). However, because of the small sample size, these findings should
be interpreted with caution. To increase power and validate our findings, we are planning to
conduct another linkage to California cancer registries for more cases diagnosed after 2010.
4.2 Introduction
Colorectal cancer (CRC) is the third most common cancer in both men and women [1] and the
second leading cause of cancer-related deaths in the United States [2]. The lifetime risk of CRC is 4.3% in
American men and women [1]. In 2017, 135,430 CRC cases were estimated to be newly diagnosed and
50,260 deaths from CRC were expected [1]. Both incidence and mortality of CRC in the United States
have been declining over the past 20 years [1], largely due to screening for the early detection of
precancerous colon polyps. However, the burden of disease remains high and is disproportionately
distributed across demographic subpopulations.
About 70-80% of CRC cases have been attributed to the combined effects of cultural,
social and lifestyle factors [3]. In particular, advanced industrialization and westernized lifestyles have
been associated with this disease. CRC risk is generally higher in developed countries [4].
Recent migration studies have reported increased CRC incidence rates among migrants from low-risk to
high-risk countries and among their offspring [5, 6]. In the United States, Japanese
migrants to Hawaii have a greater risk of developing CRC than native Japanese. In fact, CRC
incidence among the offspring of Japanese migrants to the United States has now risen close to
that in the white population, which is 3-4 times higher than the rates among the Japanese in Japan [7, 8].
The excess incidence among Japanese Americans is attributed to a number of environmental factors
associated with CRC risk. Therefore, if those risk factors can be identified and modified, a large
proportion of CRC cases may theoretically be prevented.
Personal and lifestyle factors that may modify the risk of CRC have been extensively studied in
the general population. The World Cancer Research Fund and the American Institute for Cancer Research
have conducted comprehensive meta-analyses and systematic literature reviews incorporating the fast-growing
epidemiological evidence to summarize the relationships between a variety of lifestyle factors
and CRC risk. Based on their reports, being overweight or obese, being tall, alcohol consumption,
cigarette smoking, and high intake of red meat and processed meat are associated with an increased risk of
CRC, whereas aspirin use, regular physical activity, and dietary fiber intake are associated with a
decreased risk [8-10]. Moreover, although individual study findings have been inconsistent, other
dietary factors (e.g. vegetables, fruits and coffee) as well as medications (e.g. postmenopausal
hormones) and supplements (e.g. multivitamins and calcium) have been reported to have
chemopreventive properties, which for foods and supplements are often attributed to their micronutrient
content [11].
Research has shown that heritable factors are also important in the etiology of CRC.
Approximately 5-10% of CRC cases are a consequence of recognized hereditary conditions [12]. Genetic
and epigenetic factors involved in the inactivation of tumor suppressor genes or in mutations of the DNA
mismatch repair pathway also contribute to sporadic CRC cases [13]. With the emergence and
improvement of genome sequencing technology, more and more environmental factors have been
reported to be associated with the expression of specific molecular pathways involved in the
development of specific subtypes of CRC. For example, recent studies have suggested that smoking may
be associated only with p53 overexpression-positive colorectal tumors [14, 15], and postmenopausal
estrogen plus progestin use may be beneficial only for MSI-low and microsatellite-stable CRC [16]. As a
result, controlling for genetic background is important in any study investigating the effects of lifestyle
factors on CRC risk.
Additionally, studies have suggested that the etiology differs between colon cancer and rectal
cancer, as well as between subsites of colon cancer (proximal and distal), leading to
distinct clinical characteristics, drug responses and prognosis [17-20]. The proximal colon, also known as
the right colon, includes the cecum, ascending colon, hepatic flexure, transverse colon and splenic flexure,
and is the most common tumor location, accounting for nearly 41% of newly diagnosed CRC [21]. The risk of
proximal colon cancer is higher in women than in men, and also increases with age [22]. On the other hand,
rectal cancer, composed of tumors in the rectosigmoid junction and rectum, represents 28% of CRC and is
usually more prevalent in younger populations [22]. The distal colon, also known as the left colon, comprises
the descending colon and sigmoid colon and lies between the proximal colon and the rectum; tumors there are
usually associated with higher survival rates than proximal colon tumors [21]. It has been
suggested that the underlying mechanisms behind these differences in CRC incidence and mortality
include differences in physiological, genetic and environmental factors [23]. However, knowledge of
specific etiological factors in relation to these colorectal subsites is still scarce.
The advantage of twin studies is that twins share genetic material: monozygotic (MZ)
twins are genetically identical and dizygotic (DZ) twins share on average 50% of their genes, like regular
siblings. In addition, twins are usually exposed to the same early life environment, and during their
growth they are assumed to be affected physically and psychologically in a similar way [24]. To date, three
observational studies have examined personal and lifestyle factors in relation to CRC risk
among twins, all of which were conducted in European twin cohorts [25-27]. Terry et al. analyzed a
Swedish twin cohort that included 498 CRC cases and reported that long-term heavy smoking was
associated with a nearly three-fold increased risk of CRC compared with never smoking [25]. Another
Swedish twin study (n=248 CRC cases) found no association between birth characteristics (birth weight
and birth length) and risk of CRC [27]. By combining two twin cohorts from Sweden and Finland
to increase the sample size (n=837 CRC cases), Lundqvist et al. investigated associations between
anthropometric measures and CRC risk. They found a non-significant 90% increased risk of colon cancer
among older obese subjects and a null result for height [26]. Given the limited evidence from twin study
designs, the purpose of our study was to further investigate whether possible associations of
anthropometric measures, alcohol use, smoking, exercise, medication use, history of disease (e.g. allergy,
CRC history in spouses) and diet with the risk of colorectal cancer persist after controlling for genetic
background and early environmental factors, using a twin cohort from the California Twin Program
(CTP).
4.3 Material and Methods
4.3.1 Study Population
The study is based on the California Twin Program (CTP), developed and maintained at the
University of Southern California. CTP is a population-based cohort of twins born in California
between 1908 and 1982. The development and representativeness of CTP have been described
elsewhere [28, 29]. Briefly, 256,616 multiple births were identified through birth records and linked with
the California Department of Motor Vehicles (DMV) for address information between 1990 and 2001, in
which 161,109 individuals were matched. Among the matched individuals, a 16-page screening
questionnaire was mailed to 115,733 with valid addresses. In 1998, the questionnaire was updated with
more detailed information on development, diet and medical history and sent to twins born from 1957
to 1982. This questionnaire is referred to as the “new version”. Accordingly, the questionnaire used before
1998, referred to as the “old version”, was used for twins born before 1957. A total of 52,262 individuals,
comprising 14,827 double-respondent twin pairs (both members of the pair responded) and 22,608
single-respondent twin pairs (only one member of the pair responded), completed and returned the
questionnaires. The crude overall response rate was 45.2%, which is comparable with or higher than that of
similar cohort studies [30]. This study is based on the whole cohort, including the 37,435 respondent twin
pairs in which at least one twin completed either version of the questionnaire. In comparison with
census data and California multiple birth records, the responding participants were representative of
native California twins with regard to age, sex, zygosity and residential distribution [28, 29].
This study was conducted in accordance with the Declaration of Helsinki with approval from the
University of Southern California Institutional Review Board.
4.3.2 Case Ascertainment
In 2010, all CTP twins, including twins who completed and returned CTP questionnaires
(respondent twins) as well as their co-twins who did not return the questionnaire (non-respondent
twins), were linked to the California Cancer Registry (CCR) and the Los Angeles County Cancer Surveillance
Program (CSP). The CCR is California’s statewide population-based cancer surveillance system, which
collects information about all cancers diagnosed among residents of California since 1988. CSP is the
population-based cancer registry for Los Angeles County. Since 1972, CSP has routinely collected
information on all newly diagnosed cancer cases among residents of the County and has been the Los
Angeles regional registry hub for CCR since 1988. The linkage to the CCR database was conducted
with the software Registry Plus Link Plus (Version 2.0) using all available variables, such as first and
last names, addresses, sex, date of birth, and social security number if available. In 2012, another linkage
was made to the CSP database to collect cancer cases diagnosed in Los Angeles prior to 1988. Therefore, we
were able to collect information on all cancer cases among CTP twins that were diagnosed during 1972-1987
from CSP and during 1988-2010 from CCR. As a result, 5,416 subjects who were diagnosed with at
least one type of cancer during 1972-2010 were identified among all 37,435 respondent twin pairs in
CTP. The colorectal cancer (CRC) diagnoses were coded according to the definition from SEER
(Surveillance, Epidemiology, and End Results Program)
(http://seer.cancer.gov/siterecode/icdo3_dwhoheme/): 1) SEER codes 21041-21049 for colon cancer
and 21051-21052 for rectal cancer were included; 2) ICD-O-3 [23] codes C18.0-C18.9 and C26.0 for
colon cancer, and C19.9 and C20.9 for rectal cancer, were included; 3) ICD-O-3 histology codes 9050-9055,
9140, and 9590-9992 were excluded. Within colon cancer, codes C18.0 and C18.2-C18.5 were assigned
to proximal colon tumors and codes C18.6-C18.7 to distal colon tumors, while other tumor
locations, including appendix, overlapping lesion of colon, colon NOS and intestinal tract NOS, were
coded by C18.1, C18.8, C18.9, and C26.0, respectively. Only primary CRC cases were included in this study.
In total, 356 primary CRC cases with their dates of diagnosis were identified by the CCR/CSP registries.
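The subsite assignment described above can be sketched as a simple lookup. This is a minimal illustration in Python, not the code used in the study; the function name `crc_subsite` and the label strings are hypothetical, while the code groupings are taken from the text.

```python
# Hypothetical helper: map an ICD-O-3 topography code to the CRC subsite
# groups described in the text (labels are illustrative).
PROXIMAL = {"C18.0", "C18.2", "C18.3", "C18.4", "C18.5"}
DISTAL = {"C18.6", "C18.7"}
OTHER_COLON = {"C18.1", "C18.8", "C18.9", "C26.0"}
RECTUM = {"C19.9", "C20.9"}


def crc_subsite(topography: str) -> str:
    """Return the subsite group for a topography code such as 'C18.2'."""
    code = topography.strip().upper()
    if code in PROXIMAL:
        return "proximal colon"
    if code in DISTAL:
        return "distal colon"
    if code in OTHER_COLON:
        return "other colon"
    if code in RECTUM:
        return "rectum"
    raise ValueError(f"not a colorectal topography code: {topography!r}")
```

For example, `crc_subsite("C18.7")` falls in the distal colon group and `crc_subsite("C20.9")` in the rectum group.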
CTP twins also reported CRC status for themselves and their co-twins in the questionnaires
at baseline. From these self- or co-twin proxy reports, we identified 212 CRC cases diagnosed before
participation in CTP. Of these, 48 cases overlapped with registry cases. Because of temporality issues,
we included only primary incident cases newly diagnosed after exposures were reported in the
questionnaire (i.e. prospective cases). Thus, the 212 self- or proxy-reported CRC cases and 26 registry-identified
cases diagnosed prior to the completion of the questionnaire were excluded. In total,
282 primary incident prospective CRC cases were included in this study, comprising 198 colon cancer cases,
82 rectal cancer cases, and 2 cases involving both colon and rectum.
In terms of the distribution in twin pairs, the 282 primary incident cases arose from 151
single-respondent twin pairs and 130 double-respondent twin pairs. However, because most lifestyle
factors in this study were personal measures that were not available in co-twin proxy reports,
single-respondent pairs were excluded from the analysis. Among the double-respondent twin pairs, 2 pairs
were concordant for CRC (one of which contained 1 incident case and 1 case diagnosed
before enrollment) and were left out of the study. Furthermore, among the 128 CRC-discordant twin
pairs, because of the greater differences in hormones and lifestyles between males and females, we also
excluded 38 unlike-sex (male-female) DZ twin pairs. In the end, 90 double-respondent, CRC-discordant
twin pairs, including 42 MZ pairs and 48 DZ like-sex pairs, were included in the analysis.
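The case and pair counts reported above can be checked with simple arithmetic. The sketch below assumes that the 48 self-reported cases overlapping with the registry are the ones removed from the registry total together with the 26 other prevalent registry cases; that reading is an interpretation of the text, and the variable names are illustrative.

```python
# Counts taken from the text; the subtraction logic is an assumed reading.
registry_cases = 356            # primary CRC cases found by CCR/CSP linkage
overlap_with_self_report = 48   # registry cases also self/proxy-reported at baseline
other_prevalent_registry = 26   # registry cases diagnosed before the questionnaire

incident_cases = registry_cases - overlap_with_self_report - other_prevalent_registry
assert incident_cases == 198 + 82 + 2  # colon + rectum + both sites

double_respondent_pairs = 130
concordant_pairs = 2
unlike_sex_dz_pairs = 38

discordant_pairs = double_respondent_pairs - concordant_pairs
analysis_pairs = discordant_pairs - unlike_sex_dz_pairs
assert analysis_pairs == 42 + 48  # MZ pairs + DZ like-sex pairs

print(incident_cases, discordant_pairs, analysis_pairs)
```

Under this reading the totals are internally consistent with the 282 incident cases, 128 discordant pairs, and 90 analyzed pairs stated in the text.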
4.3.3 Personal and Lifestyle Factors
Information about personal and lifestyle factors and covariates was based on the questionnaire
completed at CTP enrollment. For factors that were measured slightly differently between the
two versions of the questionnaire, we used the most straightforward measures and dichotomized the
categories so that the variables could be combined uniformly.
Subjects were asked about their weight and height when they completed the questionnaire.
Overweight and obesity were estimated by body mass index (BMI), which is calculated as weight in
kilograms divided by the square of height in meters. We calculated BMI at study baseline and used the
categories defined by the World Health Organization (WHO): <25 kg/m² (underweight/normal) vs. ≥25
kg/m² (overweight/obese) [31].
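As a minimal sketch of this calculation (function names are hypothetical, and units are assumed to have already been converted to kilograms and meters):

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if height_m <= 0:
        raise ValueError("height must be positive")
    return weight_kg / height_m ** 2


def bmi_category(value: float) -> str:
    """Dichotomize BMI at 25 kg/m^2, as described in the text."""
    return "overweight/obese" if value >= 25 else "underweight/normal"
```

For instance, a subject weighing 70 kg at 1.75 m has a BMI of about 22.9 and falls in the underweight/normal group.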
Alcohol use was measured by asking: “Have you had at least 10 drinks in your life?” and “How
many drinks did you consume in the past 14 days?”. Those who had 3 or more drinks per day or
drank on more than 3 days during the last 14 days were considered exposed, and all others with lower
consumption or no consumption were considered unexposed.
Smoking was evaluated based on a set of related questions. Age at starting smoking was
taken directly from the answer to “At what age did you start to smoke regularly?” and
dichotomized at age 18. The question “Did you smoke at least 100 cigarettes in life?”
was used to define ever and never smokers (answers of “yes” and “no”, respectively). Among ever
smokers, former and current smokers were defined based on the answer to “Have you smoked in the last
6 months?”. Thus, it should be noted that “former” and “current” refer to the
time of questionnaire completion and not the time of diagnosis. The number of cigarettes per day was
obtained directly from the answer to “How many cigarettes each day have you usually smoked?”. Then,
pack-years among smokers were calculated by multiplying the number of cigarettes per day by smoking
years and dividing by 20 cigarettes per pack, where smoking years measured the time from the
age at starting smoking to the age at stopping if specified, and otherwise to the time of diagnosis.
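The pack-year calculation described above can be illustrated as follows. This is a sketch under the stated definition; the function and parameter names are hypothetical, and ages are assumed to be in years.

```python
from typing import Optional


def pack_years(cigs_per_day: float, start_age: float,
               stop_age: Optional[float] = None,
               age_at_diagnosis: Optional[float] = None) -> float:
    """Cigarettes per day x smoking years / 20 cigarettes per pack.

    Smoking years run from start_age to stop_age when a stop age is given,
    otherwise to age_at_diagnosis (as described in the text).
    """
    end_age = stop_age if stop_age is not None else age_at_diagnosis
    if end_age is None:
        raise ValueError("need either stop_age or age_at_diagnosis")
    years = max(end_age - start_age, 0)  # guard against inconsistent ages
    return cigs_per_day * years / 20
```

For example, 20 cigarettes per day from age 18 to age 38 gives 20 pack-years.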
Participants were asked if they had regular physical activity and, if so, how often and for how long
they did activities that made them breathe hard. An exercise MET score in units of minutes/week was
calculated by adopting the guidelines for the International Physical Activity Questionnaire (IPAQ) short form
[32]. In brief, exercise intensity was calculated from the weekly frequency and duration of
two types of activities: activities such as jogging, swimming, aerobics, bicycling, tennis and weight-lifting that
make people breathe hard, and activities such as walking, playing golf and gardening that usually do not make
people breathe hard. Breathing-hard activities were treated as “vigorous” and the other activities were
treated as “moderate”, and the two were given different weights in the calculation. Thus, the total MET score was
computed as 8 times the vigorous intensity plus 4 times the moderate intensity. The MET score was then
dichotomized at 200 minutes/week into low activity vs. moderate/high activity.
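The scoring rule above can be sketched as follows. Two points are assumptions made for illustration and are not stated in the text: that each intensity equals sessions per week times minutes per session, and that a score exactly at the cutoff counts as moderate/high. Function names are hypothetical.

```python
def met_score(vig_sessions: float, vig_minutes: float,
              mod_sessions: float, mod_minutes: float) -> float:
    """Total weekly MET score: 8 x vigorous minutes + 4 x moderate minutes.

    Weekly minutes are assumed to be sessions per week x minutes per session.
    """
    vigorous = vig_sessions * vig_minutes
    moderate = mod_sessions * mod_minutes
    return 8 * vigorous + 4 * moderate


def activity_level(score: float, cutoff: float = 200) -> str:
    """Dichotomize at the cutoff (boundary handling is an assumption)."""
    return "low" if score < cutoff else "moderate/high"
```

Under these assumptions, two 20-minute walks per week score 160 (low), while three 30-minute jogging sessions score 720 (moderate/high).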
Measures of medication or supplement use included ever/never use of aspirin, multivitamins
or vitamins A/C/E, calcium and iron supplements. In the questionnaire, participants were asked to fill in
the bubbles if they were current users and if they had used these medications/supplements for a brief time,
over one year, or over 5 years. Because of the small sample size, we combined all the information and
defined as exposed those who had at least one bubble filled in. Otherwise, subjects were considered
unexposed.
History of allergy was a combined measure including conditions such as seasonal hay fever, allergic
asthma, allergic collapse (anaphylaxis), animal or egg allergy, plant allergy, drug allergy, unknown allergy,
or ever use of medication for asthma.
One unique feature of the CTP questionnaires is that subjects reported medical history not
only for themselves but also for their co-twins, parents, other siblings, children and spouses.
Therefore, we extracted personal history of colon polyps and history of CRC in first-degree
relatives as well as in spouses directly from the questionnaires. A history of CRC in
relatives suggests heritable risk, whereas a history of CRC in spouses may indicate deleterious
lifestyles shared in the home.
Dietary factors in this study included coffee preference, summary measures of the frequencies of
red meat, vegetable, and fruit and juice consumption, as well as 9 food categories dichotomized at their median
frequency. In the diet section of the CTP questionnaires, two measures were applied to a set of food items:
preference and frequency. Food frequency was used to evaluate the overall consumption of red meat
(beef and bacon, as well as pork, which was only available in the updated version of the questionnaire),
vegetables (beans, tomato, broccoli, spinach, greens and carrots), and fruits or juice by calculating the
average servings consumed per week, which were treated as continuous variables in the analysis. Moreover, in
order to evaluate the overall food pattern, we calculated a summary score indicating the adherence to
Mediterranean diet by adopting the algorithms from published studies [33, 34]. We selected 9 food
categories by combining the related food items listed in the food frequency section, including bread,
potatoes (potatoes/rice), fruits (citrus/other fruits), vegetables
(tomatoes/broccoli/spinach/greens/carrots), beans (red/pinto/lentils), fish (fish/tuna), red meat
(beef/hamburger/lamb), processed meat (bacon/hot dogs/sausage/cold cuts) and dairy (cottage
cheese/cheese/milk/yogurt). Some items could not be separated because they were
listed as the same food item, such as beef and hamburgers, or potatoes and rice. First, we calculated the
average frequency per week for each food category, and then obtained the questionnaire-specific and
sex-specific median values. A score of 1 was assigned to twins who consumed the food
category at a frequency greater than the median, while all others were assigned a score of 0. The
Mediterranean diet score was then calculated by adding the scores for bread, potatoes, fruits, vegetables,
beans and fish and subtracting the summed scores for red meat, processed meat and dairy, so that
higher scores indicate healthier dietary patterns. In addition, because coffee consumption was not listed in
the food frequency section, the preference question was used. In the food preference list, subjects were
asked whether they liked or disliked a certain food item. Subjects who stated a dislike of coffee were
considered unexposed, on the assumption that they were unlikely to drink it, while all other responses
(like it, take it or leave it) were considered exposed.
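The Mediterranean diet score described above can be sketched as follows. This is an illustration of the stated rule only; the dictionary keys, the function name, and the example values are hypothetical, and the medians are assumed to have been computed beforehand for the twin's questionnaire version and sex.

```python
# Categories that add to the score and categories that subtract from it.
POSITIVE = ("bread", "potatoes", "fruits", "vegetables", "beans", "fish")
NEGATIVE = ("red_meat", "processed_meat", "dairy")


def mediterranean_score(freq: dict, medians: dict) -> int:
    """Sum of above-median indicators for POSITIVE minus those for NEGATIVE.

    freq and medians map each category to a weekly frequency; a category
    scores 1 only when the frequency is strictly greater than its median.
    """
    def above(cat: str) -> int:
        return 1 if freq[cat] > medians[cat] else 0

    return sum(above(c) for c in POSITIVE) - sum(above(c) for c in NEGATIVE)


# Hypothetical example: every median is 3 servings per week.
medians = {c: 3 for c in POSITIVE + NEGATIVE}
twin = {"bread": 5, "potatoes": 4, "fruits": 7, "vegetables": 6, "beans": 1,
        "fish": 2, "red_meat": 4, "processed_meat": 1, "dairy": 3}
print(mediterranean_score(twin, medians))
```

By construction the score ranges from -3 to 6; the hypothetical twin above scores 4 on the positive categories and 1 on the negative ones.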
Another unique feature of the CTP questionnaires is that they ask about qualitative relative differences
between twins, for example, “Which twin smoked more cigarettes?”. Since twins grow up together and
often compare themselves with each other while growing up, these questions may be easier to answer,
particularly for aspects that are difficult to remember precisely or to measure quantitatively. In
addition, the accuracy of such questions can be evaluated by comparing the answers within double-respondent
twin pairs. In this study, we selected the available relative measures corresponding to the
direct measures discussed above, including higher birth weight, faster growth at puberty, taller, weighed
more, drank more alcohol, smoked more, exposed to more second-hand smoke, exercised more, and ate
more red meat, vegetables, or fruits. For double-respondent twin pairs, only consistent answers were
included in the analysis.
Other covariates, such as age at participation, sex, and race/ethnicity, were self-reported in the
questionnaire, whereas age at diagnosis of CRC, site and histology were extracted from registry data.
4.3.4 Statistical Analysis
Conditional logistic regression was performed to estimate odds ratios (OR), as measures of
relative risk. There was no need to adjust for age, sex, race and ethnicity because twins are matched on
these. All lifestyle factors described above were examined. Monozygotic (MZ) twins and dizygotic (DZ)
like-sex twins were analyzed separately as well as combined because the degree of control for
genotype differs between MZ twins (~100% matched) and DZ twins (~50% matched). All analyses were also
stratified by tumor subsite (proximal colon, distal colon and rectum) to examine potentially
different etiology. However, because of the small numbers in each subsite, we did not further stratify by
zygosity. Adjustments were made for analyses of BMI with exercise, alcohol use with
smoking, and red meat with vegetables and fruits. The adjusted results were not substantially
different from the unadjusted risk estimates but were less precise because of further reduced numbers.
Hence, we report only unadjusted results.
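For a single binary exposure in 1:1 matched pairs, the conditional maximum-likelihood odds ratio reduces to the ratio of the two kinds of exposure-discordant pairs, and an exact confidence interval follows from the binomial distribution of those pairs. The sketch below illustrates that special case in Python with hypothetical counts; it is not the SAS procedure used in the study, and the function names are illustrative.

```python
from math import comb, inf


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target: float, iters: int = 200) -> float:
    """Bisection on [0, 1] for an increasing function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def matched_pair_or(b: int, c: int, alpha: float = 0.05):
    """Odds ratio and exact (Clopper-Pearson based) CI for 1:1 matched pairs.

    b: pairs in which only the case is exposed
    c: pairs in which only the control is exposed
    """
    n = b + c
    if n == 0:
        raise ValueError("no exposure-discordant pairs")
    odds = lambda p: p / (1 - p) if p < 1 else inf
    or_hat = b / c if c > 0 else inf
    # Exact limits for p = P(case is the exposed twin | pair is discordant)
    p_lo = 0.0 if b == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(b - 1, n, p), alpha / 2)
    p_hi = 1.0 if c == 0 else _solve_increasing(
        lambda p: 1 - binom_cdf(b, n, p), 1 - alpha / 2)
    return or_hat, odds(p_lo), odds(p_hi)


# Hypothetical counts: 9 pairs with only the case exposed, 1 with only the control.
print(matched_pair_or(9, 1))
```

With so few discordant pairs the exact interval is very wide, which illustrates why estimates from small matched samples need cautious interpretation. When no control-only exposed pairs exist, the upper limit is infinite.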
Because lifestyle factors may take a long time to affect the development of CRC, and because CRC
may begin to develop well before diagnosis, and thus possibly before the questionnaire was completed,
sensitivity analyses were conducted to reduce this temporal ambiguity by repeating the analyses after excluding cases
diagnosed within the first 2 years after the survey, as well as within the first 7 years (midpoint between survey and
diagnosis).
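The exclusion rule for the sensitivity analyses can be sketched as a simple filter on the lag between survey and diagnosis. The record layout, the field names, and the treatment of a lag exactly equal to the threshold are assumptions made for illustration.

```python
def pairs_for_sensitivity(pairs, min_lag_years):
    """Keep pairs whose case was diagnosed at least min_lag_years after the survey.

    Each pair is a dict with hypothetical keys 'survey_year' and
    'diagnosis_year' (the diagnosis year of the affected twin).
    """
    return [p for p in pairs
            if p["diagnosis_year"] - p["survey_year"] >= min_lag_years]


# Hypothetical records, for illustration only.
example_pairs = [
    {"pair_id": 1, "survey_year": 1992, "diagnosis_year": 1993},
    {"pair_id": 2, "survey_year": 1992, "diagnosis_year": 1996},
    {"pair_id": 3, "survey_year": 1994, "diagnosis_year": 2005},
]
print(len(pairs_for_sensitivity(example_pairs, 2)),
      len(pairs_for_sensitivity(example_pairs, 7)))
```

The same matched analysis would then be repeated on each reduced set of pairs.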
All effects are reported as odds ratios (OR) with 95% confidence intervals. If a cell
count was less than 5, Fisher’s exact test was used for statistical testing if possible. Otherwise,
a likelihood ratio χ² test was used. P values were reported for the difference between groups. P < 0.05 was
considered significant. Statistical analysis was performed using SAS software (Version 9.4, SAS
Institute, Inc., Cary, North Carolina).
4.4 Results
4.4.1 Study Participant Characteristics
First, the characteristics of the study population were examined among the 42 MZ pairs and 48 DZ
like-sex pairs separately and combined (Table 4.1). Overall, more than 90% of twins were white.
There were twice as many affected male twins as female twins. Thirteen percent of affected twins were diagnosed before
the age of 50. About 8% of twins reported CRC cases in their first-degree relatives. Proximal colon cancer
was the most common CRC diagnosed in this study, representing 37.8% of cases, followed by
distal colon cancer (28.9%) and rectal cancer (20%). In addition, more than 90% of the measures of lifestyle
factors were from the old version of the questionnaire.
By zygosity (Table 4.1), nearly half of MZ twins in this study population completed the
questionnaire before the age of 50, compared with about 30% of DZ twins. Meanwhile,
MZ twins were more likely than DZ twins to be diagnosed with CRC at age 60 or younger. On the other hand,
there were more male DZ twin pairs than MZ pairs. More DZ twins than MZ twins reported a family history of CRC
in their first-degree relatives (10.4% vs. 4.8%). Compared with DZ twin cases, the CRC
tumors diagnosed in MZ twin cases were more likely to be located in the distal colon (42.9% vs. 16.7%) and
less likely to be in the proximal colon (28.6% vs. 45.8%). About 20% of CRC cases were rectal cancer in both MZ and
DZ twins.
By tumor subsite (Table 4.2), there were 34 twins diagnosed with proximal colon cancer, 26 with
distal colon cancer and 18 with rectal cancer. Cases at other tumor sites were excluded from the analysis because of
their small number. The overall distribution of each characteristic tended to be more similar between the colon
subsites (proximal and distal colon) than between colon cancer and rectal cancer. Compared with
colon cases, more rectal cancer cases completed the questionnaire at a relatively early age. Interestingly,
non-white twins were 2.5-4 times more likely to be diagnosed with rectal cancer than colon cancer, whereas
a family history of CRC in first-degree relatives was more often observed in twins with colon cancer than in those with
rectal cancer. There were more younger cases (age <60 years) with rectal cancer than with colon cancer,
whereas more proximal colon cancer cases were diagnosed at age 70 years or older than distal colon
cancer or rectal cancer cases. However, we did not observe distinct variation in sex across
the three subsites, all of which had twice as many males as females.
4.4.2 Personal and Lifestyle Factors Associated with CRC
To assess the association between each selected lifestyle factor and CRC risk, we first tested the
difference between the affected twins and their matched unaffected co-twins in all 90 like-sex twin
pairs discordant for CRC, and then conducted separate analyses in the 42 MZ and 48 DZ like-sex twin
pairs to examine whether genetic components might interact with any such associations, as well as within each
cancer subgroup by tumor subsite (34 proximal or 26 distal colon cancer cases, or 18 rectal cancer
cases) to examine potentially different etiology.
First, we examined all the selected personal and lifestyle factors that were measured from
direct questions in the CTP questionnaires, including anthropometric measures of height and BMI,
alcohol use, smoking status, exercise level, medication use (aspirin, multivitamin, calcium and iron
supplements), personal history of allergy and colon polyps, and history of CRC in spouses. We
did not find significant associations between these factors and CRC using all 90 pairs (Table 4.3). However, after
stratifying by zygosity (Table 4.3), we observed a strong positive association between earlier age at
starting smoking and the risk of CRC among MZ twins, in whom the CRC risk was 9 times
higher among twins who started to smoke at age 18 or earlier than among those who started after age 18
(OR=9.00; 95% CI: 1.25-394.48). There were no significant differences in any tested factor
within CRC-discordant DZ twins (Table 4.3) or within any subgroup of twin pairs by tumor subsite
(Table 4.4).
In terms of dietary habits in relation to CRC risk, we first assessed the relationships by
comparing intake levels relative to the study population median for each food category within
CRC-discordant twin pairs, with intake above the median considered high consumption and intake at or
below the median considered low consumption. Among the 9 selected food categories (bread, potatoes, fruits, vegetables, beans, fish, red meat,
processed meat and dairy), we found that higher consumption of potatoes was associated with an 80%
reduction in CRC risk (OR=0.20; 95% CI: 0.04-0.71, Table 4.5) in all twins. This protective association
appeared stronger in MZ twins (OR=0.10; 95% CI: 0.002-0.70, Table 4.5) and was also observed
when restricting to proximal colon cancer cases (OR=0.11; 95% CI: 0.003-0.80, Table 4.6). Based on
these 9 food categories, a summary score was calculated to indicate how closely the observed dietary patterns
adhered to the Mediterranean diet, which is generally considered a healthy dietary habit protective against CRC. Thus,
the higher the score, the healthier the dietary pattern. Although we observed apparent protective effects
of this dietary pattern in all the tested study groups regardless of zygosity or tumor
subsite, none of them was statistically significant (Table 4.5 and Table 4.6). We also
tested food frequency, counted as the meals consumed per week, for three food categories that
are commonly associated with CRC: red meat, vegetables and fruits. Of these, vegetable
intake showed a marginal inverse association with CRC risk among all 90 discordant like-sex twin pairs,
in which a 21% lower risk was associated with each additional meal of vegetables per week
(OR=0.79; 95% CI: 0.63-1.00, Table 4.5). Additionally, we did not find any association between coffee
preference and CRC risk.
Furthermore, we also examined relative measures of lifestyle factors by comparing which twin
had more exposure, based on consistent answers from both members of each twin pair. The factors
available in the CTP questionnaires included birth weight, adult height and weight, growth at puberty,
alcohol use, smoking and second-hand smoke exposure, exercise, and dietary factors (red meat,
vegetables and fruits). Among DZ like-sex pairs, twins who reported drinking more alcohol than their co-twins
were about twice as likely to develop CRC (OR=2.27; 95% CI: 1.08-5.12, Table 4.7).
However, this positive association was attenuated to marginal significance among all twins (OR=1.58;
95% CI: 0.94-2.69) because of the null finding in MZ twins. Unfortunately, our study lacked the power to
test such effects within the cancer subgroups defined by tumor subsite. For the other lifestyle factors, no
significant results were found in any of the study populations tested (Table 4.7 and Table 4.8).
4.4.3 Sensitivity Analysis
After excluding cases diagnosed in the first two years after the survey, 81 discordant, like-sex twin pairs were
left for this sensitivity analysis (Table 4.9 and Table 4.10). The significant protective effects of
vegetables and potatoes on CRC risk were retained with similar effect sizes, while the positive
association between early-age smoking and CRC was attenuated. In addition, the increased risk of
CRC among twins who reported drinking more alcohol than their co-twins became more apparent among
all eligible twins (OR=1.91; 95% CI: 1.10-3.40). In a second sensitivity analysis, conducted
in the 47 CRC-discordant twin pairs remaining after further excluding cases diagnosed in the first 7 years after the
survey, similar effects and effect sizes were observed (Table 4.9 and Table 4.10). Because of the
reduced numbers, we did not further stratify the sensitivity analyses by zygosity or tumor subsite.
4.5 Discussion
This study is the first to link the 37,435 twin pairs from the population-based CTP cohort [28,
29] to the population-based California cancer registries (CCR and CSP). According to an estimate
based on age-specific SEER incidence rates at enrollment, the expected number of incident cancer cases among
individuals in respondent pairs in CTP was 4,054 after 10 years of follow-up [28]. In fact, through 2010,
5,416 cancer cases were identified, including both prevalent and incident cases since 1972, of which
356 were primary CRC cases, accounting for 6.6% of all cancer cases. Of these, 90 double-respondent,
like-sex twin pairs who were discordant for CRC were eventually included in this study.
Given the high validity and coverage of the California cancer registries, the risk of potential misclassification
of diagnoses was minimized. Moreover, all disease status was prospectively collected, and thus the
risk of systematic errors such as recall and selection bias was minimized as well.
By comparing the affected twins (cases) with their unaffected co-twins (controls) within these 90
CRC-discordant twin pairs, we conducted a matched case-control study to examine the associations between
CRC risk and a variety of lifestyle factors measured by the CTP questionnaires at enrollment, while at
least partially controlling for genetic background and early-life exposures. Each association was
also tested separately by zygosity (MZ vs. DZ like-sex twin pairs) for possible gene-environment
interactions, and by CRC subgroup based on tumor subsite (proximal colon, distal colon, or rectum)
for potentially different etiologies. Our results confirmed the previously established adverse
associations of alcohol use and smoking with CRC risk, as well as the protective association of
vegetable consumption with the development of CRC. Our study also adds evidence for an inverse
association between CRC risk and potato intake. Except for vegetables, all of these associations
appeared to be zygosity-specific or tumor subsite-specific. However, they should be interpreted with
caution because of the small numbers in each subgroup.
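
As a worked illustration of the matched-pair comparison, the sketch below computes a conditional odds ratio from the two discordant cell counts, b (exposed case, unexposed co-twin) and c (unexposed case, exposed co-twin), as OR = b/c. The interval method is an assumption: the sketch uses an exact Clopper-Pearson interval for the proportion b/(b+c), transformed to the odds scale, which is one common exact choice and not necessarily the procedure used for the tables. The function names are hypothetical, and pairs with an empty discordant cell are not handled.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _root(f, iters=200):
    """Bisection root on [0, 1] of a function f that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def matched_pair_or(b, c, alpha=0.05):
    """Return (OR, lower, upper) for discordant-pair counts b and c (both >= 1)."""
    if b < 1 or c < 1:
        raise ValueError("this sketch does not handle empty discordant cells")
    n = b + c
    # Lower limit of p = b/n: the p at which P(X >= b) equals alpha/2.
    p_lo = _root(lambda p: (1 - binom_cdf(b - 1, n, p)) - alpha / 2)
    # Upper limit of p: the p at which P(X <= b) equals alpha/2.
    p_hi = _root(lambda p: alpha / 2 - binom_cdf(b, n, p))
    return b / c, p_lo / (1 - p_lo), p_hi / (1 - p_hi)


or_hat, lower, upper = matched_pair_or(9, 1)
print(f"{or_hat:.2f} ({lower:.2f}, {upper:.2f})")  # 9.00 (1.25, 394.48)
```

With b=9 and c=1 the sketch returns the same values as the table row with those counts, which suggests, without confirming, that a comparable exact method was used. The very wide interval shows how little information a handful of discordant pairs carries.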
Alcohol consumption can increase the risk of developing CRC. The current epidemiological
evidence is generally consistent, and there is an apparent dose-response relationship [9, 10]. A pooled
analysis showed a 41% increased risk for the group who drank the most alcohol [35]. Individuals who
consume 2-4 alcoholic drinks per day over their lifetime have a 23% higher risk of developing CRC than
those who have less than 1 drink per day [36]. The Continuous Update Project (CUP) of the American
Institute for Cancer Research conducted a meta-analysis of 16 published studies and reported a 7%
increase in CRC risk for every 10 g/day increase in alcohol consumption. Significant positive
associations with CRC were observed in both men and women, although they were more significant in men,
and across cancer subsites (colon cancer and rectal cancer) with similar effect sizes [10]. The
underlying mechanisms are still not well described. They probably involve genotoxic and carcinogenic
effects of the products produced or induced by alcohol, or the action of alcohol as a solvent that
promotes the penetration of carcinogenic molecules into mucosal cells [10, 37, 38]. In this study, we
assessed this association both quantitatively (alcohol consumption) and qualitatively (which twin drank
more). For the quantitative measures, although we observed positive associations that were generally
consistent across zygosity, sex, or tumor subsites, none of them was significant. One possible reason is
misclassification of the alcohol measures: the CTP questionnaire was a general survey, and the
related questions may not have been specific enough to quantify alcohol consumption, especially heavy
alcohol use, which usually has the more apparent effect. On the other hand, a significant positive
association was found among DZ twins using the relative question that identified the twin who drank
more, suggesting that this questionnaire design, which is unique to twins, may be less subject to such
measurement error. However, the null results among MZ twins, in contrast to DZ twins, might reflect
differences in genetic susceptibility to the effect of alcohol on the development of CRC; this should
be interpreted with caution because of the small numbers after stratification. In addition, after
excluding cases diagnosed in the first two years after the survey, the association in all twins became
more significant, suggesting that the effect of alcohol on the development of CRC might take time.
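
To make the per-increment dose-response estimate concrete, the sketch below rescales a relative risk reported per fixed increment to another dose. It assumes that risk is log-linear in dose, so the relative risk multiplies with each additional increment; that assumption and the function name `scaled_rr` are illustrative and are not taken from the CUP report.

```python
def scaled_rr(rr_per_increment, increment, dose):
    """Relative risk at `dose`, assuming risk is log-linear in dose."""
    return rr_per_increment ** (dose / increment)


# 7% higher risk per 10 g/day of alcohol, rescaled to 30 g/day.
rr_30g = scaled_rr(1.07, 10.0, 30.0)
print(round(rr_30g, 3))  # 1.225
```

Under this assumption, 30 g/day corresponds to roughly a 23% higher risk than no intake, because the per-increment effects compound rather than add.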
Cigarette smoking is a known cause of lung cancer, but it has also been linked to CRC. In 2009,
based on the available evidence, the International Agency for Research on Cancer concluded that tobacco
smoking is a cause of CRC, with a stronger association for rectal cancer (RR=1.4-2.0) than for colon
cancer (RR=1.2-1.4) [36, 39, 40]. Smoking 40 cigarettes per day can raise the risk of CRC by 40% and
nearly double the risk of CRC mortality [41]. Men and women who smoke cigarettes have been observed to
have an earlier average age of onset of CRC [42]. The carcinogens found in tobacco have been suggested
to initiate tumor growth in the colon and rectum, thus increasing the risk of CRC [43]. In this study,
we included direct measures of smoking status (never vs. ever/former/current), smoking level and
duration, and age at starting smoking, as well as relative measures of which twin had more exposure to
smoking or second-hand smoke, to examine the associations with CRC. However, most of our results were
null, probably also because of measurement error and the small numbers in the finer categories.
Interestingly, we found evidence that MZ twins who started smoking at an earlier age had a
significantly increased risk of CRC, particularly of distal colon cancer. We speculate that
adolescents and young adults might be more susceptible to the physiological and genetic changes in the
colon and rectum, probably more so in the distal colon, that are induced by carcinogens in cigarettes,
and that this susceptibility could be modulated by certain genetic predispositions. More interestingly,
such associations have in fact been observed frequently in CRC studies of smoking [44-46], but they
have seldom been addressed despite their potential for CRC prevention.
On the other hand, healthy dietary patterns, such as the Mediterranean diet, have been inversely
associated with CRC risk [33, 34, 47-49]. These patterns usually consist of high intakes of fruits and
vegetables, fish and poultry, and whole-grain products, and low intakes of red meat and processed meat.
However, although the findings are reasonably consistent across studies, the evidence for these
protective effects is still limited because of the wide and disparate food categories and the complex
food constituents. Accordingly, based on the food preference and frequency questionnaires in the CTP,
we examined the associations of CRC with preference for coffee, frequencies of (non-starchy) vegetable,
fruit, and red meat intake, and the 9 food categories used to calculate a summary score for the
Mediterranean diet in twins. We found a marginally significant dose-response association between
vegetable intake and reduced CRC risk, which is consistent with current knowledge. Based on the
systematic reviews and meta-analyses from CUP, every 100 g/day increase in consumption of non-starchy
vegetables decreases CRC risk by 2%. A combination of potential anti-tumorigenic agents released when
vegetables are digested in the gastrointestinal tract, such as dietary fiber, vitamins C and E, folic
acid, and many others, has been suggested to be responsible for the lower CRC risk [50]. CUP also showed
that this inverse association was more significant in men and was limited to colon cancer [9, 10].
However, our study may lack the power to test such differences. Moreover, we also observed a strong
protective association of the potatoes category with CRC risk among all twins, among MZ twins only, and
for proximal colon cancer cases only. Such associations have rarely been reported. A recent study in
Norwegian women reported that high potato consumption was associated with a 32% increase in CRC risk,
and that the association was more apparent for rectal cancer than for colon cancer [51]. However, an
animal study found that purple potatoes can prevent colon tumorigenesis by suppressing pro-inflammatory
signaling pathways and promoting cellular apoptosis [52]. We should also note that the potatoes
category in this study included not only potatoes but also rice, because the two were grouped into the
same question in the survey, which may have led to misclassification. For the same reason, we could not
separate white rice from brown rice; the latter, a whole grain, has been associated with reduced CRC
risk [10]. Similarly, such ambiguity in the definitions of food categories likely contributed to the
other null findings in this study.
The strengths of this study include twins matched on genotype and early-life exposures, incident CRC
cases from population-based registries, CRC subgroups defined by tumor subsite, and minimal recall bias
and misclassification of disease. However, there are still several limitations. First, as mentioned
above, non-differential misclassification of exposures may be the major source of the null findings,
and it also makes the positive findings difficult to interpret. In this study, all personal and
lifestyle factors were measured by self-report on the CTP questionnaires at enrollment, which had two
versions. To maximize power, we had to harmonize the two versions for some variables, such as
height, weight, aspirin use, and some food items. Moreover, in this 16-page general survey, some
questions were not specific enough to measure the selected lifestyle factors. For example, we had to
combine 4 questions to generate the conventional smoking variables, and some food categories in fact
contained different food items because they were grouped together in the survey: “red meat” included
beef and hamburger, while “potatoes” included potatoes and rice. Therefore, the definitions of such
variables need to be optimized and further validated. Second, although selection bias due to disease
status was eliminated by using incident cases, other selection biases may have arisen from the assembly
of the CTP cohort or the design of this study. Young female participants were overrepresented in the
CTP cohort, because the DMV linkage had difficulty identifying female twins who changed their surname
after marriage [28, 29]. As a result, fewer female CRC cases, especially proximal colon cancers, were
included in this study than would be expected from the general population [11]. In addition, because we
used a matched case-control design to control for genotype, only double-respondent twin pairs were
included in the study: most information on the non-responding co-twins in single-respondent twin pairs
was missing, and the agreement between self-reports and proxy reports, where available, was poor for
most of the selected lifestyle factors (agreement <50%). In this case, if double-respondent twin pairs
cared more about their health and were therefore more likely to have healthier lifestyles than
single-respondent twin pairs, their lifestyle choices could lead to selection bias, resulting in fewer
exposure-discordant pairs and reducing the precision of the tests. Third, although we started with a
large cohort and CRC is one of the most common cancers, the number of incident CRC cases after an
average of 12 years of follow-up was still limited, which in turn limited the size of the study
population. As a result, we might lack the power to adjust for other factors, test interactions, or
restrict to subgroups. Therefore, we are planning to conduct another linkage to the registries for
cases diagnosed after 2010. Last, since the effects of lifestyle factors on CRC are usually small to
moderate and may require time to develop, there could be temporal ambiguity between CRC diagnosis and
the survey. The consistent results from the sensitivity analyses that excluded cases diagnosed in the
first 2 years and in the first 7 years after the survey suggest that this temporal issue was minimal.
In conclusion, in this genetically and early-life matched case-control study of 90 twin pairs, our
findings on the associations of CRC risk with alcohol consumption, smoking, and vegetable intake
are consistent with current knowledge, which is usually derived from systematic reviews, pooled
analyses, or meta-analyses involving thousands of participants, suggesting the efficiency and validity
of twin studies. However, the current study has limited power to test such associations in subgroups,
such as those defined by zygosity or tumor subsite. We are planning to conduct another linkage to the
California cancer registries for cases diagnosed after 2010 to further validate our findings.
Chapter 4 references
1. Howlader N, Noone AM, Krapcho M, Garshell J, Miller D, Altekruse SF, Kosary CL, Yu M, Ruhl J,
Tatalovich Z, Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds), SEER Cancer Statistics
Review, 1975-2014, National Cancer Institute. Bethesda, MD,
http://seer.cancer.gov/csr/1975_2012/, based on November 2016 SEER data submission, posted
to the SEER web site, April 2017. 2017.
2. Group, U.S.C.S.W., United States Cancer Statistics: 1999–2014 Incidence and Mortality Web-
based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease
Control and Prevention, and National Cancer Institute. 2017.
3. Rasool, S., et al., A comparative overview of general risk factors associated with the incidence of
colorectal cancer. Tumor Biology, 2013. 34(5): p. 2469-2476.
4. Boyle, P. and M.E. Leon, Epidemiology of colorectal cancer. British medical bulletin, 2002. 64(1):
p. 1-25.
5. Janout, V. and H. Kollárová, Epidemiology of colorectal cancer. ACTA-UNIVERSITATIS
PALACKIANAE OLOMUCENSIS FACULTATIS MEDICAE, 2001: p. 5-10.
6. Johnson, I. and E. Lund, Review article: nutrition, obesity and colorectal cancer. Alimentary
pharmacology & therapeutics, 2007. 26(2): p. 161-181.
7. Boyle, P. and J. Langman, ABC of colorectal cancer: Epidemiology. British Medical Journal, 2000.
321(7264): p. 805-808.
8. World Cancer Research Fund, A.I.f.C.R., Food, Nutrition, Physical Activity, and the Prevention of
Cancer: a Global Perspective. 2007.
9. World Cancer Research Fund, A.I.f.C.R., Colorectal Cancer 2011 Report: Food, Nutrition, Physical
Activity, and the Prevention of Colorectal Cancer (Continuous Update Project). 2011.
10. World Cancer Research Fund, A.I.f.C.R., Diet, nutrition, physical activity and colorectal cancer
2017. 2017.
11. Haggar, F.A. and R.P. Boushey, Colorectal cancer epidemiology: incidence, mortality, survival,
and risk factors. Clinics in colon and rectal surgery, 2009. 22(4): p. 191.
12. Jackson-Thompson, J., et al., Descriptive epidemiology of colorectal cancer in the United States,
1998–2001. Cancer, 2006. 107(S5): p. 1103-1111.
13. Kinzler, K.W. and B. Vogelstein, Cancer-susceptibility genes. Gatekeepers and caretakers. Nature,
1997. 386(6627): p. 761, 763.
14. Diergaarde, B., et al., Cigarette smoking and genetic alterations in sporadic colon carcinomas.
Carcinogenesis, 2003. 24(3): p. 565-571.
15. Watson, A.J. and P.D. Collins, Colon cancer: a civilization disorder. Digestive diseases, 2011. 29(2):
p. 222-228.
16. Newcomb, P.A., et al., Estrogen plus progestin use, microsatellite instability, and the risk of
colorectal cancer in women. Cancer Research, 2007. 67(15): p. 7534-7539.
17. Loupakis, F., et al., Primary tumor location as a prognostic factor in metastatic colorectal cancer.
Journal of the National Cancer Institute, 2015. 107(3): p. dju427.
18. Lee, G., et al., Is right-sided colon cancer different to left-sided colorectal cancer?–a systematic
review. European Journal of Surgical Oncology (EJSO), 2015. 41(3): p. 300-308.
19. Nawa, T., et al., Differences between right- and left-sided colon cancer in patient
characteristics, cancer morphology and histology. Journal of gastroenterology and hepatology,
2008. 23(3): p. 418-423.
20. Iacopetta, B., Are there two sides to colorectal cancer? International journal of cancer, 2002.
101(5): p. 403-408.
21. Surveillance, Epidemiology, and End Results (SEER) Program. SEER*Stat Database: North American
Association of Central Cancer Registries (NAACCR) Incidence-Cancer in North America (CiNA)
Analytic File, 1995-2013, for NHIA v2 Origin, Custom File With County, American Cancer Society
(ACS) Facts and Figures Projection Project. Springfield, IL: NAACCR, 2016.
22. Siegel, R.L., et al., Colorectal cancer statistics, 2017. CA: a cancer journal for clinicians, 2017.
67(3): p. 177-193.
23. Li, F.-y., Colorectal cancer, one entity or three. Journal of Zhejiang University Science B, 2009.
10(3): p. 219-229.
24. Evans, D.M. and N.G. Martin, The validity of twin studies. GeneScreen, 2000. 1(2): p. 77-79.
25. Terry, P., et al., Long-term tobacco smoking and colorectal cancer in a prospective cohort study.
International journal of cancer, 2001. 91(4): p. 585-587.
26. Lundqvist, E., et al., Co-twin control and cohort analyses of body mass index and height in
relation to breast, prostate, ovarian, corpus uteri, colon and rectal cancer among Swedish and
Finnish twins. International journal of cancer, 2007. 121(4): p. 810-818.
27. Cnattingius, S., F. Lundberg, and A. Iliadou, Birth characteristics and risk of colorectal cancer: a
study among Swedish twins. British journal of cancer, 2009. 100(5): p. 803-806.
28. Cockburn, M., et al., The occurrence of chronic disease and other conditions in a large
population-based cohort of native Californian twins. Twin Res, 2002. 5(5): p. 460-7.
29. Cockburn, M.G., et al., Development and representativeness of a large population-based cohort
of native Californian twins. Twin Res, 2001. 4(4): p. 242-50.
30. Cozen, W., et al., The USC Adult Twin Cohorts: International Twin Study and California Twin
Program. Twin Res Hum Genet, 2013. 16(1): p. 366-70.
31. Organization, W.H., Obesity: preventing and managing the global epidemic. 2000: World Health
Organization.
32. Committee, I.R., Guidelines for data processing and analysis of the International Physical Activity
Questionnaire (IPAQ)–short and long forms. Retrieved September, 2005. 17: p. 2008.
33. Son, J.S., et al., Altered Interactions between the Gut Microbiome and Colonic Mucosa Precede
Polyposis in APCMin/+ Mice. PLoS One, 2015. 10(6): p. e0127985.
34. Morgan, X.C. and C. Huttenhower, Chapter 12: Human microbiome analysis. PLoS Comput Biol,
2012. 8(12): p. e1002808.
35. Cho, E., et al., Alcohol intake and colorectal cancer: a pooled analysis of 8 cohort studies. Annals
of internal medicine, 2004. 140(8): p. 603-613.
36. Society, A.C., Colorectal Cancer Facts & Figures 2011-2013. 2011, American Cancer Society
Atlanta, GA.
37. Pöschl, G. and H.K. Seitz, Alcohol and cancer. Alcohol and alcoholism, 2004. 39(3): p. 155-165.
38. Seitz, H.K. and F. Stickel, Molecular mechanisms of alcohol-mediated carcinogenesis. Nature
Reviews Cancer, 2007. 7(8): p. 599-612.
39. Paskett, E.D., et al., Association between cigarette smoking and colorectal cancer in the
Women’s Health Initiative. Journal of the National Cancer Institute, 2007. 99(22): p. 1729-1735.
40. Humans, I.W.G.o.t.E.o.C.R.t., Tobacco smoke and involuntary smoking. IARC monographs on the
evaluation of carcinogenic risks to humans/World Health Organization, International Agency for
Research on Cancer, 2004. 83: p. 1.
41. Liang, P.S., T.Y. Chen, and E. Giovannucci, Cigarette smoking and colorectal cancer incidence and
mortality: Systematic review and meta-analysis. International Journal of Cancer, 2009. 124(10):
p. 2406-2415.
42. Zisman, A.L., et al., Associations between the age at diagnosis and location of colorectal cancer
and the use of alcohol and tobacco: implications for screening. Archives of internal medicine,
2006. 166(6): p. 629-634.
43. Secretan, B., et al., A review of human carcinogens—Part E: tobacco, areca nut, alcohol, coal
smoke, and salted fish. The lancet oncology, 2009. 10(11): p. 1033-1034.
44. Leufkens, A.M., et al., Cigarette smoking and colorectal cancer risk in the European Prospective
Investigation into Cancer and Nutrition study. Clinical Gastroenterology and Hepatology, 2011.
9(2): p. 137-144.
45. Tsong, W., et al., Cigarettes and alcohol in relation to colorectal cancer: the Singapore Chinese
Health Study. British journal of cancer, 2007. 96(5): p. 821-827.
46. Chao, A., et al., Cigarette smoking and colorectal cancer mortality in the cancer prevention study
II. Journal of the National Cancer Institute, 2000. 92(23): p. 1888-1896.
47. Slattery, M.L., et al., Eating patterns and risk of colon cancer. American Journal of Epidemiology,
1998. 148(1): p. 4-16.
48. Dixon, L.B., et al., Dietary patterns associated with colon and rectal cancer: results from the
Dietary Patterns and Cancer (DIETSCAN) Project. The American journal of clinical nutrition, 2004.
80(4): p. 1003-1011.
49. Terry, P., et al., Prospective study of major dietary patterns and colorectal cancer risk in women.
American journal of epidemiology, 2001. 154(12): p. 1143-1149.
50. Steinmetz, K.A. and J.D. Potter, Vegetables, fruit, and cancer. II. Mechanisms. Cancer Causes &
Control, 1991. 2(6): p. 427-442.
51. Åsli, L.A., et al., Potato Consumption and Risk of Colorectal Cancer in the Norwegian Women and
Cancer Cohort. Nutrition and Cancer, 2017. 69(4): p. 564-572.
52. Charepalli, V., et al., Anthocyanin-containing purple-fleshed potatoes suppress colon
tumorigenesis via elimination of colon cancer stem cells. The Journal of nutritional biochemistry,
2015. 26(12): p. 1641-1649.
Table 4.1. Characteristics of 90 double-responded like-sex twin pairs in CA Twin Program, discordant
for CRC diagnosed at 1993-2009 and prospectively identified through CCR/CSP.

                        Total like-sex      Monozygotic         Dizygotic like-sex
                        twin pairs          twin pairs          twin pairs
Characteristics         N     %             N     %             N     %
Total twin pairs        90                  42                  48
Questionnaire version
  Old                   82    91.1          38    90.5          44    91.7
  New                   8     8.9           4     9.5           4     8.3
Age at participation
  <40                   10    11.1          6     14.3          4     8.3
  40-49                 26    28.9          15    35.7          11    22.9
  50-59                 27    30.0          11    26.2          16    33.3
  60-69                 16    17.8          7     16.7          9     18.8
  70+                   11    12.2          3     7.1           8     16.7
Sex
  Male-male             60    66.7          25    59.5          35    72.9
  Female-female         30    33.3          17    40.5          13    27.1
Race/Ethnicity
  White                 82    91.1          38    90.5          44    91.7
  Non-white             7     7.8           4     9.5           3     6.3
  Unknown or Missing    1     1.1           0     0.0           1     2.1
Family history of CRC in 1st degree relatives
  Yes                   7     7.8           2     4.8           5     10.4
  No                    83    92.2          40    95.2          43    89.6
Age at diagnosis
  <40                   1     1.1           1     2.4           0     0.0
  40-49                 11    12.2          6     14.3          5     10.4
  50-59                 31    34.4          18    42.9          13    27.1
  60-69                 24    26.7          10    23.8          14    29.2
  70+                   23    25.6          7     16.7          16    33.3
Tumor subsites
  Proximal Colon        34    37.8          12    28.6          22    45.8
  Distal Colon          26    28.9          18    42.9          8     16.7
  Rectum                18    20.0          8     19.1          10    20.8
  Others                12    13.3          4     9.5           8     16.7
Table 4.2. By tumor subsites: Characteristics of 90 double-responded like-sex twin pairs in CA Twin
Program, discordant for CRC diagnosed at 1993-2009 and prospectively identified through CCR/CSP.

                        Proximal Colon      Distal Colon        Rectum
Characteristics         N     %             N     %             N     %
Total twin pairs        34                  26                  18
Questionnaire version
  Old                   30    88.2          25    96.2          17    94.4
  New                   4     11.8          1     3.9           1     5.6
Age at participation
  <40                   3     8.8           2     7.7           2     11.1
  40-49                 10    29.4          7     26.9          7     38.9
  50-59                 9     26.5          8     30.8          6     33.3
  60-69                 8     23.5          5     19.2          2     11.1
  70+                   4     11.8          4     15.4          1     5.6
Sex
  Male-male             23    67.7          18    69.2          12    66.7
  Female-female         11    32.4          8     30.8          6     33.3
Race/Ethnicity
  White                 32    94.1          25    96.2          14    77.8
  Non-white             2     5.9           1     3.9           3     16.7
  Unknown or Missing    0     0.0           0     0.0           1     5.6
Family history of CRC in 1st degree relatives
  Yes                   4     11.8          3     11.5          0     0.0
  No                    30    88.2          23    88.5          18    100.0
Age at diagnosis
  <40                   0     0.0           0     0.0           1     5.6
  40-49                 4     11.8          4     15.4          0     0.0
  50-59                 12    35.3          8     30.8          9     50.0
  60-69                 6     17.7          9     34.6          5     27.8
  70+                   12    35.3          5     19.2          3     16.7
Table 4.3. Lifestyle factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program, in which cases
were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
OR = unadjusted odds ratio (ORUnadj.); Low and High = limits of the 95% CI.

                            Total like-sex twin pairs     Monozygotic twin pairs        Dizygotic like-sex twin pairs
                            (N=90)                        (N=42)                        (N=48)
Lifestyle Factors           b/c*   OR     Low    High     b/c*   OR     Low    High     b/c*   OR     Low    High
Anthropometric measures
  Height
    per 10 cm                      1.42   0.62   3.24            5.01   0.56   44.86           1.11   0.45   2.72
  BMI (kg/m²)
    >=25 vs. <25            14/8   1.75   0.69   4.82     4/2    2.00   0.29   22.11    10/6   1.67   0.55   5.58
Alcohol Use
  Lifetime total Number drinks
    >=10 vs. <10            10/4   2.50   0.72   10.92    5/2    2.50   0.41   26.25    5/2    2.50   0.41   26.25
  Drinks during last 14 days
    >=9 vs. <9              9/7    1.29   0.43   4.06     5/1    5.00   0.56   236.49   4/6    0.67   0.14   2.81
Smoking
  Age at starting smoking
    <= 18 vs. > 18          12/5   2.40   0.79   8.70     9/1    9.00   1.25   394.48   3/4    0.75   0.11   4.43
  Cigarette smoking status
    Ever vs. Never          10/13  0.77   0.30   1.90     4/3    1.33   0.23   9.10     6/10   0.60   0.18   1.82
    Former vs. Never        6/9    0.67   0.20   2.10     2/2    1.00   0.07   13.80    4/7    0.57   0.12   2.25
    Current vs. Never       4/4    1.00   0.19   5.37     2/1    2.00   0.10   117.99   2/3    0.67   0.06   5.82
  Pack-years (smokers only)
    per 10 pack-years              0.97   0.80   1.18            1.02   0.76   1.37            0.93   0.71   1.22
  Cigarette per day (smoker only)
    per 10 cigarettes              1.03   0.67   1.59            1.05   0.61   1.78            1.00   0.47   2.13
Exercise
  Regular exercise
    Yes vs. No              18/19  0.95   0.47   1.91     7/9    0.78   0.25   2.35     11/10  1.10   0.42   2.89
  Exercise MET Score
    <=200 vs. >200          7/10   0.70   0.23   2.04     2/7    0.29   0.03   1.50     5/3    1.67   0.32   10.73
Medicines /Supplements
  Aspirin
    Ever vs. Never          16/14  1.14   0.52   2.53     9/6    1.50   0.48   5.12     7/8    0.88   0.27   2.76
  Multivitamin
    Ever vs. Never          18/14  1.29   0.60   2.79     6/7    0.86   0.24   2.98     12/7   1.71   0.62   5.14
  Calcium
    Ever vs. Never          10/5   2.00   0.62   7.46     5/3    1.67   0.32   10.73    5/2    2.50   0.41   26.25
  Iron
    Ever vs. Never          6/4    1.50   0.36   7.23     4/3    1.33   0.23   9.10     2/1    2.00   0.10   117.99
History of Disease
  Allergy
    Ever vs. Never          18/17  1.06   0.52   2.19     4/7    0.57   0.12   2.25     14/10  1.40   0.58   3.52
  History of CRC in spouses
    Yes vs. No              2/0    2.41   0.29   Inf.     0/0    N.A.   N.A.   N.A.     2/0    2.41   0.29   Inf.
  Personal history of colon polyps
    Yes vs. No              2/3    0.67   0.06   5.82     1/2    0.50   0.01   9.61     1/1    1.00   0.01   78.50

* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.4. By tumor subsites: lifestyle factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program,
in which cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
OR = unadjusted odds ratio (ORUnadj.); Low and High = limits of the 95% CI.

                            Proximal Colon (N=34)         Distal Colon (N=26)           Rectum (N=18)
Lifestyle Factors           b/c*   OR     Low    High     b/c*   OR     Low    High     b/c*   OR     Low    High
Anthropometric measures
  Height
    per 10 cm                      0.95   0.29   3.17            1.17   0.25   5.51            13.92  0.74   263.01
  BMI (kg/m²)
    >=25 vs. <25            8/2    4.00   0.80   38.67    4/2    2.00   0.29   22.11    1/2    0.50   0.01   9.61
Alcohol Use
  Lifetime total Number drinks
    >=10 vs. <10            6/1    6.00   0.73   275.99   3/1    3.00   0.24   157.49   0/2    0.41   0.00   3.47
  Drinks during last 14 days
    >=9 vs. <9              3/3    1.00   0.13   7.47     3/2    1.50   0.17   17.96    2/1    2.00   0.10   117.99
Smoking
  Age at starting smoking
    <= 18 vs. > 18          3/2    1.50   0.17   17.96    5/0    6.73   1.22   Inf.     1/3    0.33   0.01   4.15
  Cigarette smoking status
    Ever vs. Never          2/6    0.33   0.03   1.86     3/2    1.50   0.17   17.96    3/2    1.50   0.17   17.96
    Former vs. Never        1/3    0.33   0.01   4.15     1/1    1.00   0.01   78.50    3/2    1.50   0.17   17.96
    Current vs. Never       1/3    0.33   0.01   4.15     2/1    2.00   0.10   117.99   0/0    NA     NA     NA
  Pack-years (smokers only)
    per 10 cigarettes              0.99   0.70   1.40            1.03   0.74   1.44            0.79   0.51   1.23
  Cigarette per day (smoker only)
    per 10 cigarettes              1.64   0.58   4.63            0.96   0.51   1.83            0.67   0.27   1.65
Exercise
  Regular exercise
    Yes vs. No              11/6   1.83   0.62   6.04     5/4    1.25   0.27   6.30     2/4    0.50   0.05   3.49
  Exercise MET Score
    <=200 vs. >200          4/3    1.33   0.23   9.10     1/2    0.50   0.01   9.61     1/4    0.25   0.01   2.53
Medicines /Supplements
  Aspirin
    Ever vs. Never          3/4    0.75   0.11   4.43     5/5    1.00   0.23   4.35     6/3    2.00   0.43   12.36
  Multivitamin
    Ever vs. Never          6/3    2.00   0.43   12.36    5/4    1.25   0.27   6.30     5/4    1.25   0.27   6.30
  Calcium
    Ever vs. Never          4/3    1.33   0.23   9.10     2/1    2.00   0.10   117.99   2/1    2.00   0.10   117.99
  Iron
    Ever vs. Never          1/2    0.50   0.01   9.61     3/2    1.50   0.17   17.96    1/0    1.00   0.05   Inf.
History of Disease
  Allergy
    Ever vs. Never          9/7    1.29   0.43   4.06     4/3    1.33   0.23   9.10     5/3    1.67   0.32   10.73
  History of CRC in spouses
    Yes vs. No              1/0    1.00   0.05   Inf.     0/0    NA     NA     NA       1/0    1.00   0.05   Inf.
  Personal history of colon polyps
    Yes vs. No              0/1    1.00   0.00   19.00    1/2    0.50   0.01   9.61     0/0    NA     NA     NA

* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.5. Dietary factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program, in which cases were
diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
OR = unadjusted odds ratio (ORUnadj.); Low and High = limits of the 95% CI.

                            Total like-sex twin pairs     Monozygotic twin pairs        Dizygotic like-sex twin pairs
                            (N=90)                        (N=42)                        (N=48)
Dietary Factors             b/c*   OR     Low    High     b/c*   OR     Low    High     b/c*   OR     Low    High
Coffee
  Like vs. Dislike          15/15  1.00   0.46   2.20     6/5    1.20   0.31   4.97     9/10   0.90   0.32   2.46
Bread
  > median vs. <= median    7/9    0.78   0.25   2.35     4/4    1.00   0.19   5.37     3/5    0.60   0.09   3.08
Potatoes
  > median vs. <= median    3/15   0.20   0.04   0.71     1/10   0.10   0.002  0.70     2/5    0.40   0.04   2.44
Fruits
  > median vs. <= median    14/18  0.78   0.36   1.66     6/6    1.00   0.27   3.74     8/12   0.67   0.24   1.77
Vegetables
  > median vs. <= median    13/17  0.77   0.34   1.67     3/6    0.50   0.08   2.34     10/11  0.91   0.35   2.36
Beans
  > median vs. <= median    10/12  0.83   0.32   2.11     4/6    0.67   0.14   2.81     6/6    1.00   0.27   3.74
Fish
  > median vs. <= median    15/11  1.36   0.59   3.28     5/6    0.83   0.20   3.28     10/5   2.00   0.62   7.46
Red meat
  > median vs. <= median    14/8   1.75   0.69   4.82     5/3    1.67   0.32   10.73    9/5    1.80   0.54   6.84
Processed meat
  > median vs. <= median    18/15  1.20   0.57   2.56     6/6    1.00   0.27   3.74     12/9   1.33   0.52   3.58
Dairy
  > median vs. <= median    13/13  1.00   0.43   2.34     5/5    1.00   0.23   4.35     8/8    1.00   0.33   3.06
Adhere to Mediterranean Diet
  per 1 unit increase              0.86   0.64   1.14            0.82   0.50   1.33            0.88   0.61   1.26
Red meat
  per 1 meal in a week             1.08   0.88   1.32            0.94   0.68   1.31            1.18   0.89   1.55
Vegetable
  per 1 meal in a week             0.79   0.63   1.00            0.77   0.55   1.10            0.81   0.59   1.11
Fruit
  per 1 meal in a week             0.93   0.82   1.05            0.96   0.78   1.17            0.91   0.78   1.06

* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.6. By tumor subsites: Dietary factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program,
in which cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
OR = unadjusted odds ratio (ORUnadj.); Low and High = limits of the 95% CI.

                            Proximal Colon (N=34)         Distal Colon (N=26)           Rectum (N=18)
Dietary Factors             b/c*   OR     Low    High     b/c*   OR     Low    High     b/c*   OR     Low    High
Coffee
  Like vs. Dislike          4/8    0.50   0.11   1.87     4/4    1.00   0.19   5.37     5/3    1.67   0.32   10.73
Bread
  > median vs. <= median    3/5    0.60   0.09   3.08     2/1    2.00   0.10   117.99   1/1    1.00   0.01   78.50
Potatoes
  > median vs. <= median    1/9    0.11   0.003  0.80     0/2    0.41   0.000  3.47     1/3    0.33   0.006  4.15
Fruits
  > median vs. <= median    6/10   0.60   0.18   1.82     7/3    2.33   0.53   13.98    1/1    1.00   0.01   78.50
Vegetables
  > median vs. <= median    8/9    0.89   0.30   2.60     4/1    4.00   0.40   196.99   1/5    0.20   0.004  1.79
Beans
  > median vs. <= median    5/2    2.50   0.41   26.25    2/6    0.33   0.03   1.86     2/1    2.00   0.10   117.99
Fish
  > median vs. <= median    5/7    0.71   0.18   2.61     7/2    3.50   0.67   34.53    2/1    2.00   0.10   117.99
Red meat
  > median vs. <= median    7/4    1.75   0.45   8.15     5/2    2.50   0.41   26.25    0/2    0.41   0.00   3.47
Processed meat
  > median vs. <= median    7/8    0.88   0.27   2.76     5/4    1.25   0.27   6.30     3/2    1.50   0.17   17.96
Dairy
  > median vs. <= median    9/4    2.25   0.63   10.00    3/5    0.60   0.09   3.08     1/0    1.00   0.05   Inf.
Adhere to Mediterranean Diet
  per 1 unit increase              0.78   0.51   1.19            1.35   0.78   2.34            0.47   0.12   1.84
Red meat
  per 1 meal in a week             1.00   0.78   1.29            1.15   0.77   1.74            0.90   0.40   2.06
Vegetable
  per 1 meal in a week             0.78   0.55   1.11            0.77   0.49   1.23            0.72   0.34   1.49
Fruit
  per 1 meal in a week             0.88   0.74   1.05            1.14   0.87   1.48            0.98   0.69   1.39

* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.7. Relative lifestyle factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program, in which
cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
Columns: Lifestyle Factors (Which twin has more?) | Total like-sex twin pairs (N=90) | Monozygotic twin pairs (N=42) | Dizygotic like-sex twin pairs (N=48)
Each cell: b/c*, OR Unadj. (95% CI Low-High)
Anthropometric measures
Birth weight more | 5/3 1.67 (0.32-10.73) | 3/1 3.00 (0.24-157.49) | 2/2 1.00 (0.07-13.80)
Faster growth at puberty | 26/26 1.00 (0.56-1.79) | 7/9 0.78 (0.25-2.35) | 19/17 1.12 (0.55-2.29)
Current taller | 34/29 1.17 (0.69-2.00) | 13/10 1.30 (0.53-3.31) | 21/19 1.11 (0.57-2.17)
Current weight more | 36/39 0.92 (0.57-1.49) | 16/17 0.94 (0.45-1.98) | 20/22 0.91 (0.47-1.75)
Alcohol Use
Alcohol drink more | 41/26 1.58 (0.94-2.69) | 16/15 1.07 (0.49-2.32) | 25/11 2.27 (1.08-5.12)
Smoking
Smoke more | 31/28 1.11 (0.64-1.92) | 13/12 1.08 (0.46-2.60) | 18/16 1.13 (0.54-2.36)
2nd hand smoke more | 29/25 1.16 (0.66-2.07) | 14/11 1.27 (0.54-3.10) | 15/14 1.07 (0.48-2.40)
Exercise
Exercise more | 33/37 0.89 (0.54-1.47) | 15/16 0.94 (0.43-2.03) | 18/21 0.86 (0.43-1.69)
Diet
Red meat more | 24/22 1.09 (0.59-2.04) | 7/12 0.58 (0.20-1.61) | 17/10 1.70 (0.74-4.15)
Vegetable more | 18/19 0.95 (0.47-1.91) | 4/10 0.40 (0.09-1.39) | 14/9 1.56 (0.63-4.07)
Fruit more | 16/17 0.94 (0.45-1.98) | 3/8 0.38 (0.06-1.56) | 13/9 1.44 (0.57-3.83)
* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin
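The relationship between the b/c counts and the unadjusted odds ratios in these tables can be illustrated with the standard matched-pair estimate, OR = b/c. The following is a minimal sketch and not the analysis code used for this dissertation: it assumes an exact conditional interval obtained from the Clopper-Pearson limits for p = b/(b+c), transformed by p/(1-p), and it does not handle rows with a zero cell. All function names are hypothetical.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def solve_increasing(f, target, tol=1e-12):
    """Bisection on [0, 1] for f(p) = target, where f is increasing in p."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def matched_pair_or(b, c, alpha=0.05):
    """Return (OR, lower, upper) from discordant-pair counts b and c.

    Assumption: exact (Clopper-Pearson) limits for p = b / (b + c),
    mapped to the odds-ratio scale by p / (1 - p).
    """
    if b <= 0 or c <= 0:
        raise ValueError("this sketch requires b > 0 and c > 0")
    n = b + c
    # Lower limit: the p at which P(X >= b) equals alpha / 2.
    p_lo = solve_increasing(lambda p: 1 - binom_cdf(b - 1, n, p), alpha / 2)
    # Upper limit: the p at which P(X <= b) equals alpha / 2.
    p_hi = solve_increasing(lambda p: 1 - binom_cdf(b, n, p), 1 - alpha / 2)
    return b / c, p_lo / (1 - p_lo), p_hi / (1 - p_hi)


# Example: b/c = 3/1 (birth weight, monozygotic pairs in Table 4.7).
print(matched_pair_or(3, 1))
```

For the 3/1 row, this sketch gives an OR of 3.00 with limits of about 0.24 and 157.5, in line with the tabulated values. Rows with a zero cell (such as 1/0) are reported in the tables with finite point estimates and would need a different estimator than the one sketched here.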
Table 4.8. By tumor subsites: Relative lifestyle factors and risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin
Program, in which cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
Columns: Lifestyle Factors (Which twin has more?) | Proximal Colon (N=34) | Distal Colon (N=26) | Rectum (N=18)
Each cell: b/c*, OR Unadj. (95% CI Low-High)
Anthropometric measures
Birth weight more | 3/1 3.00 (0.24-157.49) | 1/0 1.00 (0.05-Inf.) | 1/0 1.00 (0.05-Inf.)
Faster growth at puberty | 8/15 0.53 (0.20-1.34) | 8/3 2.67 (0.64-15.61) | 7/4 1.75 (0.45-8.15)
Current taller | 14/12 1.17 (0.50-2.76) | 9/7 1.29 (0.43-4.06) | 8/5 1.60 (0.46-6.22)
Current weight more | 14/15 0.93 (0.42-2.07) | 12/10 1.20 (0.48-3.10) | 6/8 0.75 (0.21-2.47)
Alcohol Use
Alcohol drink more | 17/9 1.89 (0.80-4.81) | 8/10 0.80 (0.27-2.25) | 10/5 2.00 (0.62-7.46)
Smoking
Smoke more | 10/11 0.91 (0.35-2.36) | 8/8 1.00 (0.33-3.06) | 6/6 1.00 (0.27-3.74)
2nd hand smoke more | 9/10 0.90 (0.32-2.46) | 9/7 1.29 (0.43-4.06) | 6/5 1.20 (0.31-4.97)
Exercise
Exercise more | 14/15 0.93 (0.42-2.07) | 10/11 0.91 (0.35-2.36) | 4/6 0.67 (0.14-2.81)
Diet
Red meat more | 10/6 1.67 (0.55-5.58) | 5/8 0.63 (0.16-2.17) | 4/5 0.80 (0.16-3.72)
Vegetable more | 9/6 1.50 (0.48-5.12) | 4/7 0.57 (0.12-2.25) | 4/1 4.00 (0.40-196.99)
Fruit more | 7/7 1.00 (0.30-3.34) | 4/3 1.33 (0.23-9.10) | 4/2 2.00 (0.29-22.11)
* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin
Table 4.9. Sensitivity analysis: lifestyle factors and the risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program, in which
cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR.
Columns: Lifestyle Factors | Diagnosis at 2 years after survey (N=81) | Diagnosis at 7 years after survey (N=47)
Each cell: b/c* (categorical comparisons only), OR Unadj. (95% CI Low-High)
Anthropometric measures
Height, per 10 cm | 1.56 (0.65-3.73) | 3.19 (0.85-11.94)
BMI (kg/m2), >=25 vs. <25 | 12/6 2.00 (0.70-6.50) | 7/5 1.40 (0.38-5.59)
Alcohol Use
Lifetime total Number drinks, >=10 vs. <10 | 10/3 3.33 (0.86-18.85) | 7/2 3.50 (0.67-34.53)
Drinks during last 14 days, >=9 vs. <9 | 9/5 1.80 (0.54-6.84) | 2/4 0.50 (0.05-3.49)
Smoking
Age at starting smoking, <= 18 vs. > 18 | 11/5 2.20 (0.71-8.08) | 8/2 4.00 (0.80-38.67)
Cigarette smoking status, Ever vs. Never | 9/12 0.75 (0.28-1.94) | 7/8 0.88 (0.27-2.76)
Cigarette smoking status, Former vs. Never | 5/8 0.63 (0.16-2.17) | 3/6 0.50 (0.08-2.34)
Cigarette smoking status, Current vs. Never | 4/4 1.00 (0.19-5.37) | 4/2 2.00 (0.29-22.11)
Pack-years (smokers only), per 10 pack-years | 0.96 (0.78-1.17) | 0.89 (0.67-1.18)
Cigarette per day (smokers only), per 10 cigarettes | 1.00 (0.64-1.55) | 0.94 (0.54-1.65)
Exercise
Regular exercise, Yes vs. No | 17/18 0.94 (0.46-1.94) | 12/12 1.00 (0.41-2.43)
Exercise MET Score, <=200 vs. >200 | 7/9 0.78 (0.25-2.35) | 5/7 0.71 (0.18-2.61)
Medicines /Supplements
Aspirin, Ever vs. Never | 15/12 1.25 (0.55-2.93) | 12/6 2.00 (0.70-6.50)
Multivitamin, Ever vs. Never | 18/14 1.29 (0.60-2.79) | 10/9 1.11 (0.41-3.09)
Calcium, Ever vs. Never | 9/5 1.80 (0.54-6.84) | 4/3 1.33 (0.23-9.10)
Iron, Ever vs. Never | 6/3 2.00 (0.43-12.36) | 2/2 1.00 (0.07-13.80)
History of Disease
Allergy, Ever vs. Never | 14/17 0.82 (0.38-1.78) | 11/12 0.92 (0.37-2.27)
History of CRC in spouses, Yes vs. No | 1/0 1.00 (0.05-Inf.) | 0/0 NA (NA-NA)
Personal history of colon polyps, Yes vs. No | 2/2 1.00 (0.07-13.80) | 2/0 2.41 (0.29-Inf)
* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.10. Sensitivity analysis: Dietary factors and risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin Program, in
which cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR. Sensitivity analysis was conducted in 2 subsets: CRC
diagnosis at 2 years after enrollment, and CRC diagnosis at 7 years after enrollment.
Columns: Dietary Factors | Diagnosis at 2 years after survey (N=81) | Diagnosis at 7 years after survey (N=47)
Each cell: b/c* (categorical comparisons only), OR Unadj. (95% CI Low-High)
Coffee, Like vs. Dislike | 13/14 0.93 (0.40-2.13) | 8/10 0.80 (0.27-2.25)
Bread, > median vs. <= median | 6/9 0.67 (0.20-2.10) | 3/7 0.43 (0.07-1.88)
Potatoes, > median vs. <= median | 2/15 0.13 (0.015-0.57) | 1/9 0.11 (0.003-0.80)
Fruits, > median vs. <= median | 11/18 0.61 (0.26-1.37) | 6/10 0.60 (0.18-1.82)
Vegetables, > median vs. <= median | 12/15 0.80 (0.34-1.83) | 9/8 1.13 (0.39-3.35)
Beans, > median vs. <= median | 9/11 0.82 (0.30-2.17) | 6/5 1.20 (0.31-4.97)
Fish, > median vs. <= median | 14/10 1.40 (0.58-3.52) | 7/5 1.40 (0.38-5.59)
Red meat, > median vs. <= median | 13/8 1.63 (0.62-4.52) | 8/4 2.00 (0.54-9.08)
Processed meat, > median vs. <= median | 18/10 1.80 (0.79-4.37) | 11/6 1.83 (0.62-6.04)
Dairy, > median vs. <= median | 11/12 0.92 (0.37-2.27) | 6/6 1.00 (0.27-3.74)
Adherence to Mediterranean Diet, per 1 unit increase | 0.80 (0.59-1.09) | 0.76 (0.51-1.13)
Red meat, per 1 meal in a week | 1.10 (0.89-1.35) | 1.16 (0.84-1.60)
Vegetable, per 1 meal in a week | 0.80 (0.63-1.01) | 0.88 (0.66-1.18)
Fruit, per 1 meal in a week | 0.90 (0.79-1.03) | 0.92 (0.78-1.09)
* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin.
Table 4.11. Sensitivity analysis: Relative lifestyle factors and risk of CRC in Like-sex, double response, CRC discordant twin pairs in CA Twin
Program, in which cases were diagnosed at 1993-2009 and prospectively identified through CSP/CCR. Sensitivity analysis was conducted in 2
subsets: CRC diagnosis at 2 years after enrollment, and CRC diagnosis at 7 years after enrollment.
Columns: Lifestyle Factors (Which twin has more?) | Diagnosis at 2 years after survey (N=81) | Diagnosis at 7 years after survey (N=47)
Each cell: b/c*, OR Unadj. (95% CI Low-High)
Anthropometric measures
Birth weight more | 5/3 1.67 (0.32-10.70) | 2/2 1.00 (0.07-13.80)
Faster growth at puberty | 24/23 1.04 (0.56-1.94) | 14/14 1.00 (0.44-2.26)
Current taller | 31/25 1.24 (0.71-2.19) | 18/14 1.29 (0.60-2.79)
Current weight more | 33/34 0.97 (0.58-1.62) | 22/16 1.38 (0.69-2.80)
Alcohol Use
Alcohol drink more | 40/21 1.91 (1.10-3.40) | 22/11 2.00 (0.93-4.57)
Smoking
Smoke more | 29/26 1.12 (0.63-1.97) | 14/15 0.93 (0.42-2.07)
2nd hand smoke more | 27/23 1.17 (0.65-2.14) | 18/11 1.64 (0.73-3.83)
Exercise
Exercise more | 31/31 1.00 (0.59-1.70) | 17/18 0.94 (0.46-1.94)
Diet
Red meat more | 21/20 1.05 (0.54-2.04) | 12/15 0.80 (0.34-1.83)
Vegetable more | 15/18 0.83 (0.39-1.75) | 7/14 0.50 (0.17-1.32)
Fruit more | 13/17 0.77 (0.34-1.67) | 6/12 0.50 (0.15-1.44)
* Matched 2x2 table cell number b and c: b=Exposed case & Unexposed co-twin, c=Unexposed case & Exposed co-twin
Chapter 5 Introduction to Colorectal Polyps and the Gut Microbiome
The gut microbiota, a complex community of microbes residing in the gut, has recently been linked
to colorectal cancer (CRC). However, most attention has been paid to the association between the
gut microbiota and advanced colorectal carcinoma rather than the early stages of colorectal
carcinogenesis. It is generally accepted that most CRCs arise from colorectal adenomatous polyps
through a stepwise process involving multiple sequential, inherited or acquired genetic mutations
and the corresponding morphological and functional alterations in the colon and rectum. The early
detection and removal of polyps in current screening programs have contributed to the substantial
decline in the incidence and mortality of CRC. Research on the role of the gut microbiota in the
progression from colorectal polyps to CRC has the potential to improve knowledge of the molecular
mechanisms of CRC and to translate this information into prevention and early detection, thus
reducing the burden of this disease.
5.1 Colorectal Polyps and Lesions
5.1.1 Introduction
Colorectal cancer (CRC) is the third most common cancer worldwide and a leading cause of
cancer mortality [1]. It is generally accepted that the development of CRC is a stepwise process
starting with abnormal cell proliferation and aberrant crypt foci, which result in the formation of
different types of colorectal polyps and lesions [2].
Colorectal polyps, particularly colorectal adenomas and serrated lesions, are listed as the
precursor lesions of CRC in the current WHO classification of tumors of the digestive system [3].
Approximately 95% of sporadic CRCs arise from these precursor lesions [4]. Observational studies
have shown that persons with a history of colorectal adenomatous polyps have a higher risk of
developing CRC than those with no history of colorectal polyps [5, 6]. Colorectal polyps can be
detected and removed by screening procedures, such as colonoscopy and polypectomy, thus reducing
CRC risk and potentially preventing 60% of deaths from CRC [7]. In the United States, nearly 19% of
the general population is estimated to have colorectal adenomatous polyps during their lifetime [8],
and the risk increases to as much as 40% among people over the age of 60 [9].
A classic stochastic model of CRC, referred to as the adenoma-carcinoma sequence [10], has
been proposed to describe the malignant evolution from polyps to CRC. In this model, loss of the
tumor suppressor gene APC (adenomatous polyposis coli) is associated with the initiation of aberrant
crypt foci from normal mucosa. This is followed by activation of the oncogene K-ras and inactivation
of the tumor suppressor gene p53, which are commonly observed in histological lesions such as small
adenomas and large polyps, and then by alterations of the molecular pathways involved in chromosomal
and microsatellite instability, mismatch repair, and CpG island methylation that lead to malignant
adenocarcinoma [10, 11]. These genetic and epigenetic alterations can be induced by chemical or
environmental factors; if they occur in stem cells, their effects can be permanent and can predispose
the cells to the accumulation of additional mutations [11, 12]. The most common environmental factors
that have been associated with colon polyps and CRC risk are lifestyle and dietary factors. Extensive
epidemiological studies have shown that obese people with a sedentary lifestyle and heavy
consumption of alcohol or cigarettes have a substantially greater risk of developing CRC. Moreover,
although the exact dietary patterns that increase CRC risk remain unclear, higher intake of red meat
and processed meat and lower consumption of vegetables and fruits appear to be important [1, 13].
The potential mechanisms underlying these associations are thought to involve an environment that is
permissive to the initiation and promotion of carcinogenesis in the colon and rectum, created by
modified gut mucosal function, altered hormone profiles, changed metabolic efficiency, or a
stimulated inflammatory response [1, 13]. Tumor progression can take from 5 to more than 20 years
[14, 15]. However, because the model describes a stochastic process in which different combinations
of genetic and environmental factors are involved at each step, the adenoma-carcinoma sequence can
follow different evolutionary paths, resulting in progression to carcinoma, stabilization at an
intermediate stage, or even regression to a less malignant stage.
5.1.2 Neoplastic Colorectal Polyps
Although every colorectal polyp has the capacity to progress to malignancy, only 2-5% of
polyps will actually develop into invasive cancer [16]. Knowledge and identification of the
histological type and pathological features of different colorectal polyps could be crucial for
predicting the potential for malignant transformation, stabilization, or even regression in the
evolution from adenoma to carcinoma. The following review focuses on the most common colorectal
polyps: conventional colorectal adenomas and serrated lesions, including hyperplastic polyps and
serrated adenomas.
Conventional Colorectal Adenomas
Colorectal adenomas are the most common neoplastic lesions in the colon and rectum,
accounting for 85-90% of detected colorectal polyps [17]. The prevalence of adenomas detected
by colonoscopy is at least 25% in men and 15% in women, and increases further with age [9]. Most
adenomas are circumscribed and polypoid, protrude to a varying extent into the bowel lumen, and are
usually redder than the surrounding mucosa. Based on their architecture, adenomas can be classified
as tubular, villous, or tubulovillous. Most adenomas are smaller than 1 cm and are pedunculated
polyps with tubular architecture. Villous adenomas are often larger and have leaf-like projections
of epithelium. Tubulovillous adenomas have a mixture of tubular and villous structures.
More than 70% of CRCs arise from colorectal adenomas [18], especially from high-risk advanced
adenomas, defined as adenomas with at least one of the following characteristics: 3 or more polyps,
any single polyp larger than 1 cm, villous structure, or high grade dysplasia [19-21]. In the general
population, the rate of malignant transformation from adenoma to carcinoma is estimated to be about
0.25% per year. However, advanced adenomas carry an annual risk of progression to cancer of up to 5%,
and the reported risk increases to 25% at the age of 55 years and to 40% at the age of 80 years [22].
The progression rate also increases with larger polyp size, greater polyp number, the presence of
high grade dysplasia, and the presence of proximal adenomas and villous architecture [22-26].
Patients with 1-2 tubular adenomas smaller than 1 cm and without high grade dysplasia are considered
to be at low risk, with no increased risk of CRC compared with the general population [27-32].
However, a recent meta-analysis of 7 studies including 11,387 patients demonstrated that the
incidence of post-polypectomy advanced neoplasia (advanced adenomas or cancer) among patients with
low risk adenomas at baseline was 3.6%, which was significantly higher than that among patients
without adenomas at the first colonoscopy (1.6%) (RR=1.8; 95% CI: 1.3-2.6) [33].
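To put the annual transition rates quoted above on a multi-year scale, the following sketch converts a constant annual risk into a cumulative risk. It assumes, purely for illustration, that the annual rate stays constant and that years are independent; the cited studies do not necessarily make these assumptions, and the function name is hypothetical.

```python
def cumulative_risk(annual_rate: float, years: int) -> float:
    """Probability of at least one transition within `years`,
    assuming a constant and independent annual transition rate."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be between 0 and 1")
    if years < 0:
        raise ValueError("years must be non-negative")
    return 1.0 - (1.0 - annual_rate) ** years


# Annual rates taken from the paragraph above; the 10-year horizon is illustrative.
for label, rate in (("adenomas overall", 0.0025), ("advanced adenomas", 0.05)):
    print(f"{label}: {cumulative_risk(rate, 10):.1%} over 10 years")
```

Under these assumptions, an annual rate of 0.25% corresponds to a cumulative risk of roughly 2.5% over 10 years, and an annual rate of 5% to roughly 40% over 10 years.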
The formation of colorectal adenomas begins within the colonic crypt, which contains stem cells
capable of self-renewal and of regenerating all intestinal cell types through a series of events
involving proliferation, differentiation, and migration toward the intestinal lumen [34]. The
APC/β-catenin signaling pathway is known to play a primary role in regulating stem cell
differentiation and function, which led to the proposed multistep genetic model of colorectal
carcinogenesis known as the conventional adenoma-carcinoma sequence [10]. At the initial step,
mutations in the APC tumor suppressor gene result in activation of the Wnt/β-catenin signaling
pathway. A stem cell that acquires an oncogenic mutation can expand to occupy the entire crypt
through genetic drift and selection, thus developing into a dysplastic aberrant crypt focus or
microadenoma [35]. The subsequent mutational activation of the oncogene K-ras, inactivation of the
tumor suppressor gene p53, and loss of heterozygosity at chromosome 18q are required for progression
to larger adenomas and early carcinomas [36]. Chromosomal instability (CIN), which results in gains
or losses of whole chromosomes or large portions of chromosomes, is often observed in benign
adenomas and increases with tumor progression [37]. In addition, conventional colorectal adenomas
are also a manifestation of Lynch syndrome, an autosomal dominant disorder caused by defects in DNA
mismatch repair genes that leads to the development of CRC with high microsatellite instability
(MSI-H) [38].
Serrated Lesions
Serrated lesions are defined as a heterogeneous group of polyps characterized morphologically
by a serrated or saw-toothed architecture of the epithelial compartment. About 10-15% of detected
colorectal polyps are characterized as serrated lesions [5], which ultimately give rise to
approximately 20-30% of CRCs. Lesions in this group include benign hyperplastic polyps, potentially
malignant serrated adenomas (sessile serrated adenomas/polyps and traditional serrated adenomas),
and mixed hyperplastic-adenomatous polyps with a combination of discrete hyperplastic and
non-serrated adenomatous components [39]. Unlike in conventional colorectal adenomas, aberrant
crypt foci, the earliest recognizable lesions in this group, are often non-dysplastic, and tumor
progression follows an alternative serrated pathway leading to microsatellite unstable CRCs, such as
BRAF mutant CIMP-positive (CpG island methylator phenotype) tumors [40-42].
Hyperplastic Polyps
Hyperplastic polyps are common, being present in up to 30% of asymptomatic adults undergoing
screening colonoscopy [42], and account for 80-90% of all serrated lesions [5]. They develop at an
earlier age than adenomas, with no evidence of the correlation with increasing age that is found for
adenomas [43-45]. Hyperplastic polyps usually appear as diminutive, pale, sessile lesions <5 mm in
size, and are frequently located in the left colon, particularly in the rectosigmoid colon, in
contrast to adenomas, which are evenly distributed throughout the colon and rectum [46, 47]. Based
on morphology, distribution, and molecular characteristics, distinct subtypes are recognized.
Microvesicular hyperplastic polyps are located predominantly in the distal colon or rectum and are
associated with BRAF mutations, which result in epithelial crowding and are responsible for the
serrated morphology [41, 42]. Goblet cell hyperplastic polyps are characterized by numerous goblet
cells with inconspicuous serration and are found exclusively in the distal left colon. K-ras
mutations were observed in approximately 50% of this subtype [41, 42].
In the traditional view, hyperplastic polyps are clinically insignificant lesions with no
potential for malignant progression to CRC [33, 42]. The clinical association between hyperplastic
polyps and adenomas has long been studied, particularly the question of whether distal hyperplastic
polyps serve as markers for synchronous proximal adenomas. The available evidence, weighting the
results from prospective studies [48-50] over those from retrospective studies [51, 52], suggests
that such an association is unlikely. In addition, an analysis of data from two large randomized
trials [53, 54] of vitamin and calcium supplements to prevent colorectal adenomas revealed that
patients with hyperplastic polyps alone at the first colonoscopy had a higher risk of hyperplastic
polyp recurrence than those without hyperplastic polyps (OR=3.67; 95% CI: 2.54-5.31), but a history
of hyperplastic polyps was not significantly associated with metachronous adenoma occurrence during
follow-up (OR=1.06; 95% CI: 0.73-1) [55]. There is molecular genetic evidence that hyperplastic
polyps are in fact benign early neoplasms, since they allow the accumulation of genetic changes in a
normal crypt cell and represent clonal expansion and neoplastic proliferation of colonic epithelial
cells carrying mutations or epigenetic alterations [56, 57]. Furthermore, it is commonly believed
that microvesicular hyperplastic polyps are the precursors of sessile serrated adenomas/polyps,
although the transition process may be slow or random [42]. Therefore, the distinction between
hyperplastic polyps and more advanced lesions is of clinical importance.
Sessile Serrated Adenomas/Polyps (SSA/Ps)
Sessile serrated adenomas/polyps (SSA/Ps) are a type of neoplastic colorectal polyp usually
located in the right colon, accounting for approximately 15-20% of all serrated lesions [5, 42]. They
are usually large (>5 mm), sessile, and yellow. SSA/Ps are often covered by a mucus cap and are
therefore difficult to detect endoscopically [58]. Despite their potential for malignant progression
to carcinoma, uncomplicated SSA/Ps lack the cytological dysplasia seen in conventional colorectal
adenomas. Lesions that combine conventional adenomas with different grades of dysplasia and serrated
lesions, referred to as “mixed hyperplastic-adenomatous polyps” in the past, are now suggested to be
termed “SSA/P with cytological dysplasia” [39]. SSA/Ps are associated with both synchronous and
metachronous CRC risk, but the degree of risk is still unknown [40]. As with conventional adenomas,
the risk of progression from SSA/Ps to carcinoma increases with larger polyp size and older patient
age. Some evidence suggests more aggressive biologic features in SSA/Ps among women than among men
[39, 59, 60].
There appears to be an alternative serrated pathway, rather than the traditional
adenoma-carcinoma sequence, regulating the transformation from non-dysplastic aberrant crypt foci to
microvesicular hyperplastic polyps, to SSA/Ps, to SSA/Ps with cytological dysplasia, and eventually
to invasive cancer. On a molecular level, BRAF mutations and a high level of CpG island methylation
(CIMP-H) at promoters contribute to the development of SSA/Ps from normal colonic mucosa. Impaired
mismatch repair, driven by methylation and the consequent suppressed expression of a major DNA
mismatch repair gene (MLH1), leads to tumorigenesis with the high microsatellite instability (MSI-H)
phenotype. In addition, it has been hypothesized that cytological dysplasia in conjunction with
methylation of MLH1 tends to increase the rate of additional mutations in tumor suppressor genes and
oncogenes, resulting in a faster rate of malignant progression in SSA/Ps than via the traditional
pathway [41, 61]. This is consistent with the observation that serrated adenocarcinomas are more
prevalent than SSA/Ps with cytological dysplasia [62, 63], and may also explain why most interval
cancers (CRC occurring before scheduled surveillance or between 6 and 36 months after a clearing
colonoscopy) have a CIMP-H MSI-H phenotype [64]. Another possible explanation is the failure to
detect SSA/Ps endoscopically at the initial examination. Moreover, the “mild mutator” pathway,
involving a higher frequency of K-ras mutations and promoter methylation and inactivation of the
MGMT gene, has been associated with CIMP-H MSS (microsatellite stable) tumors, which comprise 40% of
tumors that develop from SSA/Ps [57, 65, 66].
Traditional Serrated Adenomas (TSAs)
TSAs are a rare subtype of serrated lesions, making up 1-5% of all serrated lesions and
approximately 1% of colorectal polyps [5, 42]. Like most serrated lesions (hyperplastic polyps),
TSAs are predominantly located in the distal colon and the rectum. The lesions are often large
(>5 mm). About 67% of TSAs resemble conventional adenomas, with a pedunculated, polypoid
appearance, while the remaining lesions are sessile and are more likely to occur in the proximal
colon [42]. Cytologic dysplasia is often present in TSAs, and the risk of progression to carcinoma
is comparable to that of conventional adenomas [5, 67]. On a molecular level, TSAs have not been
well characterized. Cancers that develop from TSAs are frequently K-ras mutants with a CIMP-H MSS
phenotype [40-42]. However, BRAF mutations, alone or together with K-ras mutations, may also occur
[42, 68].
5.1.3 CRC Screening by Colonoscopy
The aim of cancer screening is to detect disease at an earlier stage in asymptomatic
individuals so that early intervention can reduce the incidence and mortality of the cancer. CRC is
a good candidate for screening because: (i) it has high incidence and prevalence, and is one of the
top causes of cancer mortality worldwide; (ii) it takes a long time (5-20 years) to develop from
precursor lesions (colorectal adenomatous polyps) to invasive cancer [14, 15]; (iii) adenomatous
polyps can be well managed during the screening procedure; and (iv) detection at an early stage
leads to better prognosis and survival. In the United States, the general guideline is to initiate
CRC screening at the age of 50 years for people at average risk, and earlier for people at
increased risk because of family history or certain medical conditions, with suggested follow-up
surveillance intervals based on the findings at the initial screening (the presence and features of
neoplasia, advanced neoplasia, or cancer) [69-71]. The current screening methods include the fecal
occult blood test (FOBT), fecal immunochemical test (FIT), stool DNA test, sigmoidoscopy,
colonoscopy, CT colonography, and barium enema with air contrast [70]. In 2015, 63% of the U.S.
population aged 50 or older was up to date with recommended screening, an increase from 34% in 2000
that was largely due to colonoscopy testing [72]. Apart from improved lifestyle (such as better
diet, more exercise, and less smoking and alcohol consumption) and better treatment, the uptake of
CRC screening has contributed to more than 50% of the reduction in incidence and mortality from CRC
since 1975 [69, 72].
Colonoscopy has been used as the primary screening tool for CRC in several countries in the
western world, such as Australia, Germany, Poland, Switzerland, and the United States, although the
majority of European countries have recommended FOBT-based programs. According to the US National
Health Interview Survey in 2015, 60% of adults aged 50 or older had had either a sigmoidoscopy in
the past 5 years or a colonoscopy in the past 10 years, while only 7% had had an FOBT or FIT in the
past year [72]. Colonoscopy is usually performed by a gastroenterologist to examine the entire colon
and rectum and to remove polyps if present. The procedure requires purgative bowel preparation to
completely cleanse the colorectum before the examination, and often requires sedation, since a long
instrument with a light and video camera, called a colonoscope, is inserted into the anus and moved
through the rectum and colon to the cecum. Most polyps detected during colonoscopy are cut off with
a wire loop or electric current for biopsy or removal, a procedure known as polypectomy [70, 72].
Therefore, colonoscopy is performed not only for screening but also for diagnosis after abnormal
results from other screening tests, such as FOBT. Compared with other screening tests, colonoscopy
can detect both cancer and precancerous adenomas throughout the entire colon and rectum, with 90%
sensitivity for large lesions (≥10 mm). The recommended interval before the next colonoscopy for a
patient at average risk with a normal result is 10 years, which is longer than that for
sigmoidoscopy (5 years) and for FOBT or FIT (annually) [70]. However, the limitations of
colonoscopy include its invasive nature and the need for sedation, potential complications such as
bowel tears and bleeding, its high cost, and the demand it places on colonoscopy capacity.
To date, no randomized controlled trials are available that show the benefit of screening
colonoscopy in reducing the incidence or mortality of CRC. The effectiveness of colonoscopy and
polypectomy is currently evaluated based on observational studies. Brenner et al. [73] conducted a
meta-analysis including 3 cohort studies and 3 case-control studies from the US, Germany,
Switzerland, and Canada [74-79] and reported a 40% to 60% lower risk of CRC incidence and death
after screening colonoscopy, although a significant reduction in mortality was found only for
proximal colon cancer. Furthermore, after a median follow-up of 16 years, CRC mortality among 2,602
patients who underwent initial colonoscopy and polypectomy in the National Polyp Study was 53%
lower (95% CI: 20-74%) than that in the general population from the SEER database [80]. More
scientific evidence is needed and will be provided by several ongoing randomized controlled trials
[81-84] investigating the protective effect of colonoscopy on CRC incidence and mortality, as well
as its superiority to other screening methods.
Patients who have undergone polypectomy at colonoscopy are at increased risk of developing
metachronous adenomas or even cancer later in life [28, 32, 85-88]. Possible explanations for this
finding include unfavorable individual characteristics, such as lifestyle and/or an unknown genetic
predisposition, and missed or incompletely resected lesions at the initial colonoscopy, which lead
to the development of new lesions in patients with a personal history of colorectal adenomas
[27, 29, 30, 88, 89]. Therefore, besides detecting neoplastic polyps and early-stage cancer,
screening colonoscopy also serves as the point of enrollment into an endoscopic surveillance
program for patients who have had precancerous adenomas.
As mentioned previously, colorectal polyps differ in their malignant potential according to
characteristics such as size, number, and morphologic or histologic type. Based on these factors,
identified at the initial colonoscopy, patients can be stratified into a low- or high-risk group to
determine the appropriate surveillance interval. According to the guidelines of the American
Gastroenterological Association (AGA) [90], the high-risk group, comprising patients with 3 or more
adenomas, or any adenoma with size ≥10 mm, villous histology, or high grade dysplasia, is advised
to undergo surveillance colonoscopy after a 3-year interval. Patients with 1-2 tubular adenomas of
size <10 mm and with low grade dysplasia, categorized as the low-risk group, are recommended to
undergo surveillance in 5-10 years, because they have no increased risk of CRC compared to the
general population. For serrated polyps, a 3-year surveillance interval is suggested for the
high-risk group (polyp size ≥10 mm and dysplasia) and a 5-year interval for the low-risk group
(polyp size <10 mm and no dysplasia). Currently, there is no specific recommendation for patients
who had only hyperplastic polyps removed at the initial colonoscopy.
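The risk stratification summarized above can be written as a small decision rule. This is an illustrative sketch of the rules as described in this section only; it is not a clinical tool or an official AGA algorithm. The class, field, and function names are hypothetical, and combinations that the text does not address return None.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AdenomaFindings:
    """Findings at the initial colonoscopy (hypothetical structure)."""
    count: int                       # number of adenomas found
    max_size_mm: float               # size of the largest adenoma
    villous: bool = False            # any villous histology
    high_grade_dysplasia: bool = False


def adenoma_surveillance_years(f: AdenomaFindings) -> Optional[Tuple[int, int]]:
    """Return the (earliest, latest) surveillance interval in years."""
    if f.count <= 0:
        return None  # no adenomas: not covered by the rules in this section
    high_risk = (
        f.count >= 3
        or f.max_size_mm >= 10
        or f.villous
        or f.high_grade_dysplasia
    )
    # High-risk group: 3 years; low-risk group (1-2 small tubular adenomas): 5-10 years.
    return (3, 3) if high_risk else (5, 10)


def serrated_surveillance_years(size_mm: float, dysplasia: bool) -> Optional[int]:
    """Return the surveillance interval in years for a serrated polyp."""
    if size_mm >= 10 and dysplasia:
        return 3
    if size_mm < 10 and not dysplasia:
        return 5
    return None  # mixed cases are not specified in the text


print(adenoma_surveillance_years(AdenomaFindings(count=2, max_size_mm=6)))
print(serrated_surveillance_years(size_mm=12, dysplasia=True))
```

Returning None for unaddressed combinations, rather than guessing an interval, keeps the sketch limited to what the section actually states.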
The risk of cancer from a colorectal adenoma should be eliminated by its complete removal
at colonoscopy, at least until new lesions form within the recommended surveillance interval.
However, the occurrence of interval CRC after colonoscopy shows that this approach is imperfect.
Interval CRCs are defined as CRCs occurring before scheduled surveillance or between 6 and 36
months after a complete colonoscopy; they account for 3-7% of all detected CRCs and are more likely
to be located in the proximal colon [91-93]. The most notable cause of interval cancers after polyp
removal is probably inadequate quality of the colonoscopy procedure. Missed lesions have been
reported to account for nearly half of interval cancers [94-96]. Moreover, interval CRCs were
frequently observed at the site of a previous polypectomy, suggesting the possibility of incomplete
resection [94-96]. Lesions in the proximal colon, especially SSA/Ps, are usually flat and sometimes
covered by mucus, which makes them more difficult to detect and remove completely at colonoscopy.
Furthermore, interval CRC tumors often display the CIMP-H MSI-H phenotype that is commonly
associated with fast-growing tumors arising from SSA/Ps [64, 97, 98]. In addition, poor adherence
to surveillance guidelines might be another reason for interval CRC. Gastroenterologists may lack
knowledge of the guidelines, or may not accept them, and so fail to give patients appropriate
recommendations [99], whereas patients may hesitate to complete surveillance colonoscopy on time,
probably because of its invasive nature or financial concerns [100]. Compared with surveillance on
schedule, delayed surveillance has been associated with an increased risk of advanced neoplasia as
well as invasive cancer [101].
Observational studies and trials [23, 102-105] have demonstrated that 20-50% of patients
will develop a metachronous neoplastic lesion detected at follow-up colonoscopy within 3-5 years
after colonoscopic removal of one or more adenomas. Of the neoplastic lesions newly diagnosed
at surveillance, up to 20% are advanced adenomas and nearly 1% are invasive CRC
[88, 94, 96, 106]. Knowledge of why patients who have undergone polypectomy have a higher risk of
developing advanced neoplasms or cancer than those with no polyps is still very limited. More
research is needed to investigate the factors that may facilitate or accelerate this malignant
transformation, so that we can optimize the screening-surveillance strategy based on risk
stratification, improve the quality of colonoscopy techniques, and develop new
preventive methods to further reduce the incidence of and mortality from CRC.
5.2 The Gut Microbiome
The human intestine contains a diverse community of microbial flora, known as the gut
microbiota. There are as many as 100 trillion bacteria, comprising more than 1,000 species, along
with viruses, archaea, and fungi that interact with the host, usually in symbiosis. Only 30% of the
microorganisms have been characterized [107]. The gut microbiota contains 10-fold more cells than
the total number of human cells and 150-fold more genes than the human genome. This collective
bacterial genome is referred to as the gut microbiome [108, 109]. Normally, the gut microorganisms
exist in a mutually beneficial symbiotic relationship with the host by training the host immune system
[110], protecting against opportunistic pathogens [111], harvesting nutrients and energy from diet
[112], fermenting non-digestible carbohydrates [113] and also maintaining intestinal barrier function
[114, 115]. However, when this balance is disrupted, a shift from a normal
microbiota dominated by commensals to dysbiosis results in an increase in pathobionts, which are
organisms with pathogenic potential, leading to the induction of inflammation [116]. Accordingly,
such changes in the gut microbiota have been shown to be associated with a variety of chronic
conditions or diseases, such as obesity, diabetes, inflammatory bowel disease (IBD), as well as
colorectal adenomas and CRC [117-120].
5.2.1 The Gut Microbiome in Human Health
The gut microbiome is a complicated micro-ecosystem residing in the human gastrointestinal
tract. It is extremely dynamic. The composition and functions of the microbial community assemble
and develop in the host from birth, and then continue to change, with a degree of temporal stability, over the life span
of an individual [121-123]. The microbiome interacts closely with host genetics and lifestyle as well as other
environmental factors, leading to great variation from person to person, and even between body
sites within the same individual. With the emergence of culture-independent genome sequencing
techniques, the growing understanding of the gut microbiota highlights its significant role in
humans as an indicator of and contributor to health and disease.
Development of the Gut Microbiota during Lifespan
It is commonly accepted that the human fetal environment is sterile and that microbial
colonization begins at birth [124]. When the baby leaves the uterus, microorganisms are
immediately acquired from both the mother and the environment. Compared to the adult
microbiome, the microbial communities in newborns are less diverse with a relatively low level of
Bacteroidetes and are highly volatile [123, 125]. Recent studies suggest that mode of delivery
(vaginal delivery vs. Cesarean delivery) may heavily influence the diversity and composition of the
early microbiome in newborns, although the long-term effect is still unknown [125-129]. The gut
bacterial composition of vaginally delivered infants resembles that of the mother’s vagina, while the
infants delivered via C-section are more likely to harbor the microorganisms dominant on maternal
skin as well as bacteria from the surrounding environment [125, 129]. The different microbiota due
to Cesarean delivery can be partially restored via vaginal microbial transfer [126]. The first four years
of life are critical for the development and maturation of the gut microbiome, which
may affect health outcomes later in life. More microorganisms acquired from the diet and
environment enrich the overall bacterial diversity in children’s gut microbiome. Breast-fed infants
exhibit more commensal bacteria with potential benefits but fewer putative pathogens than formula-fed
infants [130-133]. Meanwhile, the components of breast milk, such as bacteria derived from the
maternal gut microbiome, immunoglobulins, and prebiotic oligosaccharides, may further shape the
infant gut microbiome as well as promote the development of the infant immune system [134, 135].
Moreover, during the transition of the infant diet from breast milk to solid food, a functional shift of the
gut bacterial community in nutrient biosynthesis and metabolism has been observed, leading to
increased similarity to the adult gut microbiome composition [130, 136]. In addition, differences in the
development of the child gut microbiome are also associated with other early life exposures, such as
childhood illness and treatment (e.g. antibiotic use) and the hygiene environment (e.g. number of
siblings, exposure to pets, daycare attendance) [130, 133, 137, 138]. Less is known about
adolescents, in whom hormonal change is postulated to be a major factor affecting the gut microbiome besides
other childhood exposures [139].
In adulthood, the gut microbiome matures and remains relatively stable, with
more diverse microorganisms than in childhood [140]. A healthy adult’s gut is
dominated by bacteria from the phyla Firmicutes and Bacteroidetes, followed by the less
abundant phyla including Actinobacteria, Proteobacteria, Verrucomicrobia and Fusobacteria [141].
The Bacteroidetes are often associated with the fermentation of non-digestible carbohydrates to
metabolizable short chain fatty acids (SCFAs), resulting in several health benefits [142], whereas the
Firmicutes are mostly credited with energy harvest from the diet, and their ratio to Bacteroidetes is
commonly linked with body weight gain [143]. The adult gut microbiome, shaped by the host genome
and childhood experience, is highly individual-specific. However, similar functional gene
profiles have been observed in the gut microbiota across individuals with highly variable
compositions in several culture-independent sequencing studies, suggesting that healthy
adults share a functional core microbiome rather than a compositional core [118, 141, 144].
Longitudinal studies have shown that gut microbial compositions from the same individual are
more similar over time than those from different individuals [118, 145-148]. This temporal stability
likely represents an equilibrium state of the gut microbiome, which can be disturbed by
individual-level perturbations such as travel, dietary changes, changes in health status or medical
treatments [149]. A healthy gut microbial community can tolerate such perturbations and stress to a
certain extent before it shifts to a different equilibrium state, a property referred to as resilience,
which allows both the composition and function of the gut microbiota to be restored after such
changes [150]. With increasing age and declining health, microbial stability and resilience
decrease in the elderly. As a result, the gut microbiome of the elderly is substantially different from
that of younger adults, with a reduction in beneficial commensal bacteria such as Bacteroides and
Bifidobacterium and an increase in Enterobacteriaceae [151-154]. The factors contributing to the
observed changes in the elderly gut microbiome include dietary deficiency and an altered oral
microbiome due to impaired dental conditions, increased illness and use of medications, loss of
mobility with less physical activity, as well as dramatic changes in hormone levels, especially in
women [154, 155].
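The comparison of within-individual and between-individual similarity described in this paragraph can be illustrated with a minimal sketch. It assumes Bray-Curtis dissimilarity as the distance measure, which is one common choice and is not necessarily the measure used in the cited longitudinal studies; the subject labels and taxon counts are hypothetical.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def within_between(samples):
    """samples: list of (subject_id, abundance_vector) pairs.

    Returns the mean dissimilarity for sample pairs from the same subject
    and the mean dissimilarity for sample pairs from different subjects.
    """
    within, between = [], []
    for (s1, v1), (s2, v2) in combinations(samples, 2):
        (within if s1 == s2 else between).append(bray_curtis(v1, v2))

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical counts for three taxa, two time points per subject.
samples = [
    ("A", [50, 30, 20]),
    ("A", [45, 35, 20]),
    ("B", [10, 20, 70]),
    ("B", [15, 15, 70]),
]
mean_within, mean_between = within_between(samples)
print(mean_within, mean_between)
```

In this toy example the mean within-subject dissimilarity is much smaller than the mean between-subject dissimilarity, which is the pattern that temporal stability would produce.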
Factors Shaping the Gut Microbiota
Besides age, during the establishment, development and maintenance of the microbial
community in the human gut, host genetics and environmental factors such as diet, antibiotic use,
geographical location and social factors have a significant impact on the composition of the gut
microbiome. However, how these factors shape microbial diversity in the gut is still unknown, partly
because they are often confounded with each other. Early studies in human twins and mother-daughter
pairs found greater similarity in microbiota composition than between unrelated
individuals, suggesting a small host genetic effect on the microbiome [118, 123, 156-158]. A
subsequent twin study with a larger sample size reported a weak but significant heritability of the
gut microbiome, since monozygotic (MZ) twin pairs had slightly more similar microbiomes than
dizygotic (DZ) twin pairs [159]. The environmental factors highlighted above, particularly diet and lifestyle
preferences that are also heritable in families [160, 161], are possible explanations for the lack of a
strong effect of host genotype across twin studies. With better control of environmental
influences, quantitative trait loci (QTL) mapping studies in mice identified a set of genomic loci
significantly associated with microbial abundances [162-164]. Genome-wide association studies
(GWAS) extended these findings to human genes with roles in metabolism (e.g. APOA1, involved in the
secretion of bile acids) or innate/adaptive immunity (e.g. MYD88, responsible for TLR signaling, or
HLA genes, associated with T cell differentiation); the details have been summarized elsewhere
[165]. Moreover, a more recent 16S rRNA-based GWAS including 1,126 twin pairs from the
TwinsUK registry further confirmed the heritability of the microbiota in both bacterial diversity and individual
taxa, and identified the family Christensenellaceae as the most highly heritable taxon. The authors
replicated an association between Bifidobacterium and the lactase (LCT) gene locus using both a
candidate gene approach and GWAS, validating the finding of a higher level of Bifidobacteria in the
stools of lactose-intolerant individuals who consume milk. Additional genetic determinants of
the gut microbiome detected in UK twins have been linked to diet-sensing, metabolism and immune
defense, but still require validation across multiple studies with large numbers of subjects and better
control for diet [166].
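The reasoning that greater MZ than DZ similarity implies heritability can be made concrete with a small sketch. It assumes Falconer's classical approximation, h2 = 2(r_MZ - r_DZ), which is a simplification and not necessarily the model fitted in the twin studies cited above; the correlation values are hypothetical and chosen only for illustration.

```python
def falconer_estimates(r_mz, r_dz):
    """Falconer's approximation from twin-pair correlations of a trait.

    r_mz, r_dz: within-pair correlations (e.g. of a taxon's abundance)
    for monozygotic and dizygotic pairs.
    Returns (h2, c2): additive heritability and shared-environment share,
    each clamped to the interval [0, 1].
    """
    def clamp(value):
        return min(max(value, 0.0), 1.0)

    h2 = clamp(2.0 * (r_mz - r_dz))  # MZ pairs share twice the segregating genes of DZ pairs
    c2 = clamp(2.0 * r_dz - r_mz)    # similarity not explained by additive genetics
    return h2, c2


# Hypothetical correlations: MZ pairs only slightly more alike than DZ pairs.
h2, c2 = falconer_estimates(r_mz=0.30, r_dz=0.20)
print(round(h2, 2), round(c2, 2))
```

A small gap between the MZ and DZ correlations yields a modest heritability estimate, consistent with the "weak but significant" effect described in the text.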
Diet is one of the most important factors shaping the human gut microbiota, especially
during its development in early life. Once the microbiota is established, research has focused more on how
changes in diet alter the gut microbiome composition in relation to disease. Under a fiber-rich diet,
the fiber-degrading bacteria, such as the genera Prevotella and Xylanibacter, can metabolize the
nondigestible dietary fiber into SCFAs (e.g. acetate, propionate, and butyrate) [167], which exert
beneficial effects on glucose and energy homeostasis [168, 169], enhance the gut barrier function of
intestinal epithelial cells [170], and promote the generation of regulatory T (Treg) cells as well as
anti-inflammatory activities in the intestine [171, 172]. In contrast, under a fiber-deficient diet, some
gram-negative Bacteroides and Proteobacteria expand and impair the integrity of the mucus layer,
which is linked to metabolic endotoxemia-induced inflammation [173, 174]. The microbial
composition in the gut can respond to dietary changes rapidly [149]. It has been shown that shifting
from a high-fat, low-fiber “western” diet to a low-fat, plant-based diet can change the gut microbiota
within a day in both mice [175] and humans [176]. However, these microbial changes are relatively
small compared with the variation between individuals and are likely to be reversed once the dietary change ceases.
Thus, short-term changes in dietary patterns may not have major influences [176]. More notable
changes in species and gene content have been observed with longer-term dietary changes lasting from 10
days to a year [142, 176], suggesting that a long period of time may be required to complete the
systematic shift in the gut microbiota to an alternative stable state due to changes in diet [123, 176].
Furthermore, the gut microbiome can be modulated by dietary supplements, such as prebiotics or
probiotics. Prebiotics are nondigestible food components that induce expansion of beneficial
microbiota, for example, Bifidobacterium and Actinobacteria. The former reduces endotoxin
levels and improves gut barrier function [177], while the latter promotes the host’s immune
responses to protect against pathogens [178, 179]. Probiotics are live microorganisms, such as strains
of the genera Lactobacillus and Bifidobacterium, that are commonly found in yogurt and
confer a beneficial effect on the host when administered in adequate amounts [180, 181]. The
suggested mechanisms include restoration of a disturbed gut microbial community by boosting other
beneficial bacteria and inhibiting pathogen growth [182, 183], protection of mucosal integrity
[177], and activation of anti-inflammatory pathways [184]. When the gut microbiome is modulated
by a combination of probiotics and prebiotics, known as synbiotics, health benefits have been
reported in protection against food allergies [185] as well as reduction in CRC risk [186].
Although the gut microbiota is resilient and remains relatively stable within individuals over
time, one of the major external perturbations that alter the microbial community is antibiotic use. In
western countries, overuse of antibiotics as well as increased environmental cleanliness have
been negatively associated with bacterial diversity in the gut, thus contributing to the increasing
risk of allergic diseases, in line with “the hygiene hypothesis” [187]. Major alterations of the gut
microbial composition with decreased diversity and richness have been observed following antibiotic
treatment and have persisted for months to years, with some taxa lost permanently [145, 188-190].
After the initial antibiotic treatment, the reshaped microbial structure in the gut becomes
more vulnerable to colonization by exogenous microbes and to overgrowth of antibiotic-resistant
pathogens, leading to slow and incomplete recovery of the original microbiota, which likely
stabilizes in a different interim state that is more resilient to repeated use of antibiotics [146,
191]. Moreover, these long-term changes, associated with the activation of genes involved in the host
immune system or metabolism, might result in sustained proinflammatory states and ultimately
develop into chronic diseases [192, 193].
Apart from age, host genetics, diet and antibiotic use, the characteristics of the gut
microbiome can also differ with respect to gender [194, 195], geographic location [123, 196-198],
social factors (e.g. socioeconomic background, living conditions and lifestyle) [122, 194, 196, 199,
200], as well as season [201, 202]. The effect of each factor on microbiome composition is
likely explained by the combined effects of other factors. For example, the observed geographic
differences between rural and urban areas [198] or between western and nonindustrialized countries
[123, 203] may be attributable to variations in ancestry, dietary habits, social cultures,
hygiene conditions or other environmental influences. Therefore, to better understand the role of
the gut microbiota in human health, all these host and environmental factors should be taken into
consideration in both study design and evaluation.
Roles of the Gut Microbiota in Human Health
Over millions of years of human evolution, the gut microbiota has lived and evolved
together with human beings for mutual benefit. These microbes colonize the human intestine, adapt to the
gut microenvironment, respond to external environmental changes, and interact with the host,
thereby modulating various aspects of host physiology through metabolic, immunological and protective
functions in a healthy individual.
A major role of the gut microbiome is to produce important metabolites for its host [204]. As
discussed previously, bacterial fermenters, such as Bacteroidetes and butyrate-producing strains of
Clostridium, can digest dietary carbohydrates to SCFAs, which are involved in energy homeostasis [168,
169], intestinal barrier maintenance [170], anti-inflammatory and anti-carcinogenic effects [171, 172],
and reduction of oxidative stress [205]. Meanwhile, hydrogen gas produced during fermentation can
react with sulfate to generate hydrogen sulfide, which is toxic to the mucosal tissues with potential
pathological consequences [204]. In addition, the gut microbiota is also able to synthesize certain
vitamins (e.g. vitamins K and B) [206] and hydrolyze proteins into small peptides and then amino
acids to support the energy and metabolism of other bacteria in the colon [207], whereas some
bacteria can convert primary bile acids to secondary bile acids (e.g. deoxycholic acid and
lithocholic acid), leading to carcinogenic effects [208].
The gut microbiota is also important in protecting gut integrity. The intestinal epithelium,
composed of the mucus layer and the tight junctions, forms a barrier between the gut and the
environment and regulates the absorption of nutrients and water to maintain intestinal
homeostasis. Butyrate, an SCFA produced by the gut bacteria, can increase the
expression of tight junction proteins as well as promote the secretion of mucins, thus decreasing the
gut permeability and protecting the intestinal barrier [209, 210]. Germ-free mice showed an
impaired intestinal barrier with an incomplete mucus layer and low expression of tight junction proteins,
leading to susceptibility to dextran sulfate sodium (DSS)-induced colitis [211, 212].
Since the intestinal mucosa overlying the gut microbiota represents the largest surface in
contact with the antigens of the external environment, the host’s mucosal and systemic immune
systems have been developed and modulated by continuous and dynamic interactions with the
gut microbiota and their metabolites, in which both innate and adaptive immune systems are
involved. A detailed overview of gut immunology can be found elsewhere [213-215]. Briefly, in
the mucosa, innate immune cells express pattern recognition receptors (PRRs) on their
surface to recognize and bind to the non-specific microbe-associated molecular patterns (MAMPs)
from different bacteria, leading to a host immune response. Components of the bacterial cell wall (e.g.
lipopolysaccharide and peptidoglycan), flagellin or other microbial molecules are some common
MAMPs recognized by the host toll-like receptors (TLRs) or NOD-like receptors (NLRs) [216], which
can induce direct killing of invasive bacteria [217], produce anti-inflammatory cytokines (e.g. IL-10) to
maintain intestinal homeostasis [217], release pro-inflammatory mediators (e.g. IL-22) and
immunoglobulin A (IgA) to stimulate the adaptive immune responses [218, 219], and form oligomers
(inflammasomes) that serve as sensors of damage-associated patterns [214]. Moreover, in response to
the detection of commensal bacteria, a series of cell activations by dendritic cells and the production
of IL-1β by macrophages lead to the generation of regulatory T cells, which are important for suppressing
inflammatory responses and promoting immunological tolerance [220, 221]. The major sites for
adaptive immune responses include Peyer’s patches and isolated lymphoid follicles, which are
enriched with microfold cells (M cells) that allow certain commensal bacteria to be translocated and
recognized by dendritic cells and then presented to naïve T cells to activate B cells and thereby
promote IgA secretion as well as activate pro-inflammatory T cells and regulatory T cells [222]. Thus,
T cell differentiation is not only determined by self and non-self discrimination, but also
modulated by the gut microbiota [223]. Smaller Peyer’s patches and mesenteric lymph nodes,
reduced numbers of M cells and T lymphocytes, and decreased levels of gut secretory IgA have been
found in germ-free mice [224]. As a bridge between innate and adaptive immunity in the mucosa, IgA
plays a fundamental role in protecting the mucosal surface, contributing to host protection against invading
pathogens, and preventing the overgrowth of commensal microbes [225, 226]. In addition, IL-17
produced by T helper 17 (Th17) cells can be induced by bacterial colonization, thus stimulating the
production of antimicrobial peptides (AMPs) that are kept close to the intestinal mucosa by the mucus layer,
allowing local replication of intestinal bacteria while limiting their contact with the host epithelium
[227]. Overall, immune responses depend on the entire bacterial community and vary in response to
microbial densities.
As a physical barrier in the mucosa, the intestinal microbes also play an important role in
protection against pathogen colonization. It has been shown that antibiotic-treated mice are more
susceptible to infection by introduced colonic pathogens than untreated mice [228, 229].
Commensal bacteria and microbial pathogens often require the same niche to survive and thrive in
the human intestine. To prevent colonization by pathogens, commensal bacteria are able
to make the gut microenvironment unfavorable to pathogens by producing toxins and
bacteriocins [230], influencing the pH or oxygen concentration in the gut [111, 231], inhibiting
virulence gene expression in pathogens [232], or competing for nutrient sources or attachment sites
[233, 234]. Furthermore, another strategy for the commensals to control the growth of pathogenic
bacteria in the gut is to activate the host innate immunity via various MAMPs/PRRs pathways to
stabilize the mucosal barrier and produce anti-microbial substances [235]. However, some pathogens
have also developed effective strategies to use commensal bacteria or escape from the defense
mechanisms for their own dissemination [236].
In summary, the composition of the human gut microbial community is host-specific and dynamic
throughout an individual's lifetime, continuously interacting with both exogenous and endogenous
environments, and is ultimately involved in numerous aspects of human health. It is therefore not surprising that any imbalance of
this bacterial community, known as dysbiosis, as well as the presence or absence of a certain microbe
in response to specific modifications, is linked with many diseases, including both
systemic conditions (e.g. obesity, diabetes) and gut-related IBD or CRC, although the underlying
contribution of the gut microbiota to these diseases is still unclear. In the following sections, we will
focus on the influence of the gut microbiome on CRC development. However, before we review the
current scientific evidence, we need to address some concerns regarding human gut
microbiome studies.
5.2.2 Current Concerns in a Gut Microbiome Study
Due to the rapid growth and advancement of next-generation sequencing technologies,
accumulating evidence suggests that the composition or function of the gut microbiota differs
between human health and disease. Although it is still unclear whether the gut microbiota serves as a
contributor to or an indicator of human disease, its potential as a biomarker for disease detection
or a therapeutic tool to treat disease is drawing increasing attention. However, the field of
microbiome research in humans is still young. Like any other human genomic study, a microbiome
study requires the specific knowledge of biologists and clinicians as well as up-to-date analytic
methods from bioinformaticians and biostatisticians. Meanwhile, the extremely dynamic and
individual-specific characteristics of the human gut microbiota also create new challenges for
researchers to generate, interpret and reproduce the findings. Therefore, to conduct an
epidemiological study on the gut microbiome in human disease, researchers need to acquire
interdisciplinary knowledge and techniques in a timely manner. Recently, Claesson et al. [237]
published a comprehensive guide to microbiome analysis for clinicians in Nature Reviews and
provided recommendations based on current information. Building on that guide, we add here additional
insights from the perspective of an epidemiologist and a beginner in the human microbiome field,
based on our experience with a study of the relationship between colon adenomas and the fecal
microbiome using 16S ribosomal RNA (rRNA) gene sequencing technology.
Study Design
As in any epidemiological study, the study hypothesis is the key to deciding how to design a human
gut microbiome study. Given the extremely dynamic nature of the gut microbiome, a longitudinal
study with repeated sampling is always ideal. However, in reality, the budget is another critical
limiting factor, and the methods must be adjusted to fit the study purpose.
Sequencing methods
Because current culture-independent microbiome research originated with the emergence of
PCR-based sequencing technology, the advance from marker gene analysis (e.g. 16S rRNA
for bacteria and archaea or 18S rRNA for fungal communities) to shotgun whole-genome
sequencing has raised the question of which method to use. Metagenomic shotgun sequencing, in
which the total extracted DNA from both microbes and host is sequenced, has been favored because
of its high resolution down to the species level, its detection of rare taxa, and its ability to
incorporate microbial gene functional analysis as well as host genomic features in a single study
[238]. In particular, since the gut microbiota is highly variable in composition but relatively stable
in function across individuals and time, metagenomic analysis has the unique advantage of reducing
concerns about microbial compositional stability being affected by factors other than the factor under
study. However, applications of this newest method are currently limited by high
sequencing cost, extensive computational complexity, and still-developing analytic tools and
databases. Furthermore, besides microbial DNA sequences, transcribed microbial genes (mRNA)
representing metabolic functions can also be characterized by sequencing microbial RNA (RNAseq
or metatranscriptomic analysis). However, this method is even more expensive and complicated because of the
difficulty of extracting microbial RNA from host tissue or collecting undegraded RNA from stool
samples.
On the other hand, the 16S rRNA method, which sequences the 16S rRNA gene that is ubiquitous
in prokaryotes, is the most commonly used method to study human microbial composition. It is
nearly 20-30 times cheaper than shotgun sequencing and requires less computational skill, with more
integrated bioinformatic pipelines available for analysis. Meanwhile, its lower coverage requirement allows more
samples to be sequenced in one instrument run, leading to less bias from instrument fluctuations
between different runs. Nevertheless, compared to metagenomic results, microbial
abundance from 16S rRNA gene sequencing is resolved only to the genus level, and so could be a
combined measure of the different effects of various species or even strains within the same genus.
Moreover, the 16S rRNA method is more likely to be affected by PCR bias and by uneven copy numbers of the 16S
rRNA gene across different bacteria [238, 239]. Therefore, although most of the current microbiome
knowledge in both animal and human studies is based on 16S rRNA gene analysis, the findings should
be interpreted with more caution.
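The effect of uneven 16S rRNA gene copy numbers on abundance estimates can be shown with a short sketch. It assumes a simple correction in which read counts are divided by an assumed per-genome copy number and then renormalized; the taxon names, read counts and copy numbers are hypothetical, and in practice copy numbers are themselves uncertain estimates.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Convert 16S read counts to copy-number-adjusted relative abundances.

    read_counts:  dict taxon -> number of 16S reads assigned to the taxon
    copy_numbers: dict taxon -> assumed 16S gene copies per genome
    Returns dict taxon -> estimated fraction of cells in the community.
    """
    adjusted = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(adjusted.values())
    if total == 0:
        return {taxon: 0.0 for taxon in adjusted}
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical community: taxon_A carries 6 copies per genome, taxon_B carries 2.
reads = {"taxon_A": 600, "taxon_B": 400}
copies = {"taxon_A": 6, "taxon_B": 2}

raw = {taxon: count / sum(reads.values()) for taxon, count in reads.items()}
corrected = correct_for_copy_number(reads, copies)
print(raw)
print(corrected)
```

In this example taxon_A accounts for 60% of the reads but only about one third of the estimated cells after correction, so the apparent dominance of one taxon reverses.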
Furthermore, the next-generation sequencing approach is usually used for an agnostic study, in
which there is no candidate microorganism associated with the clinical question of interest. If the
study interest is a single bacterium or a limited number of bacteria with significant roles based on
previous knowledge, a quantitative PCR (qPCR) assay using established or species-specific PCR
primers is another choice for conducting an association study or validating agnostic findings, given its
low cost and lower analytic requirements.
Confounding factors
The microbiota composition in the gut or other body sites can be influenced by host
characteristics (e.g. age, ethnicity, BMI, disease history, or genomic background) and environmental
factors (e.g. diet or antibiotic use). Therefore, the study population should be carefully recruited with
specific inclusion and exclusion criteria based on the study hypothesis. Currently, because of its
substantial impact and potential long-term effect [145, 188-190], antibiotic use within the previous 6 months is
often an exclusion criterion for a gut microbiome study. Although it is recommended that all possible confounding factors be
recorded in as much detail as possible at the time of sample collection, what and
when to collect and how to use such information is still somewhat arbitrary. In most gut
microbiota studies, the majority of those factors have been neglected because of a lack of statistical power. As
a result, special handling of certain factors, such as restriction or matching in the study design or statistical
adjustment using mathematical models, should be considered before conducting a study. From our
point of view, host genome and dietary changes are the most common factors left out of current
microbiome studies, because they are difficult to measure and analyze. The former can be adjusted for
in a metagenomic study or handled by using matched samples, such as affected vs.
unaffected tissues, samples from different time points, or samples from discordant monozygotic
twins, whereas the effect of the latter is still often “forgotten” unless it is the exposure of interest. In
current practice, a validated dietary frequency questionnaire is a standard method of collecting
dietary information. However, such information mostly reflects the dietary habits of the past month
or year prior to the microbial sample collection. Since a short-term dietary change can transiently
modify the gut microbial composition within 24 hours [176], it may also be important to record
particular dietary departures from the regular habit during the one or two days before collecting the
samples. In addition, recent studies have observed seasonal variations in the gut microbiota [201, 202],
suggesting that the date or time of sample collection might be an additional confounding factor to
consider.
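As an illustration of how matching on host genome through disease-discordant monozygotic twins can be used in analysis, the following minimal sketch applies an exact two-sided sign test to within-pair differences. The sign test is chosen here only for simplicity and is an assumption, not the method used in this dissertation; the abundance values are hypothetical.

```python
from math import comb


def paired_sign_test(affected, unaffected):
    """Exact two-sided sign test on within-pair differences.

    affected[i] and unaffected[i] are measurements (e.g. relative abundance
    of one taxon) from the two members of discordant twin pair i.
    Returns (k, n, p): pairs where the affected twin is higher, the number
    of informative (non-tied) pairs, and the two-sided p-value.
    """
    diffs = [a - u for a, u in zip(affected, unaffected) if a != u]
    n = len(diffs)
    if n == 0:
        return 0, 0, 1.0
    k = sum(1 for d in diffs if d > 0)
    # Probability of a split at least this extreme under a fair coin.
    tail = sum(comb(n, i) for i in range(min(k, n - k) + 1)) / 2 ** n
    return k, n, min(1.0, 2 * tail)


# Hypothetical relative abundances of one taxon in six discordant MZ pairs.
affected_twin = [0.12, 0.30, 0.25, 0.18, 0.40, 0.22]
unaffected_twin = [0.10, 0.21, 0.27, 0.11, 0.33, 0.15]
k, n, p = paired_sign_test(affected_twin, unaffected_twin)
print(k, n, p)
```

Because each comparison is made within a genetically identical pair, host genotype cannot explain the within-pair differences, although factors that differ between co-twins, such as recent diet, still can.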
Sample types and collections
The gastrointestinal (GI) tract is the largest human organ, in which the numbers and
distributions of the microbiota vary greatly by sampling types and sites [240-242]. The common
sampling methods with their advantages and disadvantages were summarized by Claesson et al.
[237]. The choice of sample type depends on study purpose, disease state and collection
convenience. In particular, for a gut microbiome study of colon adenomas or CRC, the most common
sampling methods include fecal samples for both adenoma and CRC patients, or biopsy samples,
such as colorectal mucosal samples for adenomas and tumor or adjacent tissues for CRC. Fecal
samples are widely used because they are noninvasive and relatively easy to collect frequently, even by
the participants themselves. However, the fecal microbiome, as a surrogate for the gut microbial
community, is more likely to represent bacteria residing in the colon and rectum, such as the
families Bacteroidaceae, Prevotellaceae, Rikenellaceae, Lachnospiraceae and Ruminococcaceae, and to
underrepresent the microorganisms present in other regions of the
gastrointestinal tract, such as the small intestine, which is enriched with the families Lactobacillaceae and
Enterobacteriaceae [241]. Moreover, fecal samples might contain dead bacteria, and
sampling variables during collection and transportation are sometimes less well controlled than for biopsy samples.
Therefore, the fecal microbial community may be less informative than mucosal biopsies in
characterizing disease-associated dysbiosis or serving as a diagnostic tool, but more suitable as a
screening biomarker for frequent monitoring. Nevertheless, although biopsy samples represent the
gut microbiota at the onset of disease, patients who undergo the bowel preparation required
for the invasive procedure have been shown to experience an immediate and substantial change in the
intestinal microbiota compared with the community before the procedure. The alterations were found
in both fecal and biopsy samples, and the microbial composition was restored within 2 weeks
[243-248]. Hence, when planning a future study or evaluating results from previous studies, we
should keep in mind the difference between the fecal and mucosal microbiota, as well as the
potential confounding from the timing of collection in relation to disease and procedures. Additional
experimental support is often needed to verify the findings.
Another consideration in sampling strategy is sampling frequency. It has been debated
whether limited resources should be used to resample the same individuals over time or spent on
sampling more individuals only once. Current gut microbiome studies are mostly based on
one-time measures, which are considered sufficient to characterize the microbial community.
Several studies suggest that the adult human gut microbiota is relatively stable within individuals over
time compared with inter-individual differences, showing reproducible patterns in diversity
at different time points in the same setting [118, 141, 142, 148, 176, 249, 250]. Moreover, some
specific bacteria have been observed in the host gut for years [251]. However, intra-individual
changes in the microbiota can approach the variation between subjects at some time points due to
unmeasured confounding factors or by chance alone [147]. In this case, resampling over time will
provide more confidence in the findings as well as a more comprehensive picture of changes
(absence/presence or relative abundance) in the microbial community with respect to disease status
vs. different collections. In addition, repeated sampling may be difficult for biopsy samples in a
clinical setting, but is possible for fecal samples self-collected at home with high compliance rates
[252], thus enabling large-scale longitudinal epidemiological studies.
Sample transportation and short-term or long-term storage are also important factors in a
gut microbiome study, because these conditions have a great impact on the yield and quality of
microbial DNA or RNA. Samples for metatranscriptomic analysis likely need more stringent
conditions to minimize RNA degradation. The most widely accepted protocol is to freeze the
samples on dry ice or in liquid nitrogen immediately after collection and then store them at -80°C, since it is
not always possible to process samples promptly, particularly in large-scale studies [250]. However,
this method is not practical for fecal samples, which are often collected at home and temporarily stored
in a home freezer at -20°C before shipping and long-term storage at -80°C. Numerous studies have
assessed the effect of storage conditions and durations on the fecal microbiota composition by 16S
rRNA gene sequencing and have reported conflicting results: some found little
effect [253-257], while others found substantial differences [258, 259]. Meanwhile, the number of freeze-thaw
cycles has been associated with compositional alteration in the microbial community and
should be minimized [260]. Moreover, a preservative buffer such as RNAlater or another commercially
available storage kit, for example fecal occult blood test cards, can stabilize fecal samples for more
than a week at room temperature [259, 261]. RNAlater has also been used successfully for recovering
high yields of DNA and RNA from different sample types [259, 262, 263]. Therefore, whatever
method is used, the rule of thumb for sample collection is to be consistent across all samples and
to keep conditions constant, and any unexpected changes should be recorded carefully.
DNA/RNA extraction, 16S rRNA gene amplification, and sequencing
DNA and/or RNA extraction is a critical step in a microbiome study and can introduce
the most bias because of uneven cell lysis across the different microorganisms in the community. Ideally, all
cell types should be lysed equally through chemical (enzymatic) and/or mechanical methods, so that
the extracted microbial nucleic acids are representative of the microbial community in
the sample. However, different extraction protocols can produce distinct diversity profiles, because
cells vary in their susceptibility to different lysis methods. Several studies have shown that
mechanical lysis, such as bead beating of human fecal samples, likely gives
the highest diversity in 16S rRNA genes [264, 265] and better recovery of gram-positive bacteria [266],
leading to a more faithful representation of the microbial composition [267]. In addition, the wide use
of commercial extraction kits with stringent quality controls increases the likelihood of consistency
and reproducibility between studies, although different commercial kits may produce different
yields of DNA or RNA and different apparent bacterial compositions [268-273]. It is therefore important to
consider the types of samples and the expected microorganisms in the sample when choosing an
extraction method, and to be consistent across all samples.
Microbial nucleic acid extracts from human samples often contain a large amount of
host DNA or RNA, along with cellular debris. The microbial 16S rRNA gene or the total DNA needs
to be amplified by PCR to prepare the corresponding DNA library for the subsequent 16S rRNA
gene sequencing or metagenomic sequencing, respectively. Contamination by non-indigenous
bacteria or DNA from the lab environment or experimental reagents can significantly bias the results
away from the sample’s true composition [274], which is particularly critical for samples with low microbial
DNA concentrations. Proper negative and positive controls should be introduced at each step to
monitor for contamination and maximize the use of genomic materials.
In studies using a marker gene sequence to distinguish prokaryotic DNA from the human genome,
the 16S rRNA gene is the most commonly used, since it is found in all bacteria and archaea.
The full-length 16S rRNA gene is about 1500 base pairs long and consists of nine hypervariable regions
(V1-V9) separated by conserved interspace regions. The combinations of variable regions are
unique to bacterial or archaeal taxa and can be used for taxonomic classification by comparison with
reference databases, whereas the sequences of the interspace regions usually serve as templates to
design universal primers for PCR amplification of one to three variable regions in high-throughput
sequencing methods. However, the choice of PCR primers can affect the observed microbial diversity and
composition [275]. Although the primers are designed based on conserved sequences, some
taxa may have mismatches in those regions, resulting in underestimated abundance of such
taxa due to failure to amplify their DNA [276]. For instance, the V1-V3 regions are usually used to
study bacteria only because of their poor ability to identify archaeal taxa, while primers targeting the 16S
rRNA V3-V4 or V4 regions have been shown to be optimal for detecting both bacteria and archaea in human
fecal samples [275, 277]. Moreover, some variable regions, such as the V6 region of the 16S rRNA gene, are
not specific enough to discriminate among gut microorganisms [278, 279]. A longer 16S rRNA gene
amplicon does not always perform better for taxonomic classification. In fact, the specific targeted
hypervariable regions, in relation to sample types and microorganisms of interest, have the largest
effect [275, 277]. Furthermore, we should keep in mind that because microorganisms carry different copy
numbers of the 16S rRNA gene, estimates of their relative abundance can be inaccurate [280].
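As a minimal sketch of the copy-number issue, the Python snippet below divides each taxon's read count by an assumed 16S rRNA gene copy number before computing relative abundance. The function name, taxon labels and copy numbers are hypothetical and serve only to illustrate the arithmetic; in practice the copy number is unknown for many taxa, so such a correction is itself an approximation.

```python
def correct_for_copy_number(read_counts, copy_numbers):
    """Divide each taxon's reads by its 16S copy number, then renormalize to sum to 1."""
    adjusted = {taxon: read_counts[taxon] / copy_numbers[taxon] for taxon in read_counts}
    total = sum(adjusted.values())
    return {taxon: value / total for taxon, value in adjusted.items()}


# Hypothetical example: taxon_A carries 7 copies of the gene, taxon_B only 1.
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}

# Uncorrected relative abundance, directly from read counts.
raw = {taxon: n / sum(reads.values()) for taxon, n in reads.items()}
# Corrected relative abundance, in units of estimated genomes.
corrected = correct_for_copy_number(reads, copies)
```

In this toy case taxon_A accounts for 70% of the reads but only 25% of the estimated genomes after correction.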
Recent advances in DNA sequencing technology and computational biology have
revolutionized the microbiome field, allowing mechanistic evaluation of the relationships between
humans and their microbial symbionts. Since the first report of using the 16S rRNA gene as a marker to study
microbial phylogeny in 1977 [281], several sequencing technologies have been developed for microbiome
studies that use 16S rRNA-based methods as well as shotgun metagenomic analyses, including Sanger
sequencing (Applied Biosystems 3730xl DNA analyser), pyrosequencing (Roche 454 Genome
Sequencer GS, FLX and FLX Titanium), Illumina’s clonal arrays (Illumina GAIIx, MiSeq and HiSeq
system) and Thermo Fisher’s Personal Genome Machine (PGM) system (Thermo Fisher Ion Proton
system), detailed comparisons of which can be found elsewhere [282]. Among them, the 454
pyrosequencing and Illumina MiSeq or HiSeq systems are the current major platforms for sequencing
of 16S rRNA gene amplicons. An assessment of platform effects on 16S rRNA gene sequencing found
that the 454 pyrosequencing and Illumina MiSeq systems had a minor impact on quantitative
abundance estimations for the microbial community, but that MiSeq produced better-quality
reads for downstream data analysis, with higher throughput, than 454 [275]. In addition, compared to
MiSeq, the Illumina HiSeq can generate more reads with shorter read length, but requires a longer run
time and higher cost.
Bioinformatics Analysis
Large-scale sequencing data from 16S rRNA gene amplicon or shotgun metagenomic
sequencing present bioinformatic, statistical and computational challenges. Currently, there is a
wide range of open-source and commercial software and analytical packages available for microbiome
data analysis, from processing raw DNA sequence data through downstream analysis to
publication-quality figures and statistical results. Claesson et al. [237] provided a list of the
software options and packages used in published microbiome analyses for both 16S rRNA gene and
metagenomic sequencing data. Updated versions and novel methods continue to be developed
to keep pace with the rapid improvement of sequencing technologies and the growing
requirements of scientific research. The choice of such analytic tools should be tailored to the
specific research question and data type. In the following review, we focus our discussion on a
general workflow of 16S rRNA gene amplicon sequence analysis.
For 16S rRNA-based microbiome studies, the most popular bioinformatics pipelines are
Quantitative Insights Into Microbial Ecology (QIIME) [283] and mothur [284]. Both packages are free
and open source, with online tutorials and forums. Mothur is a single, large program for analysis. It is
convenient for a standard protocol, but implementing a specialized analysis step in its
large codebase is less flexible and can increase development time and support burden, potentially resulting in errors. By
splitting the workflow into steps with fully transparent scripts, QIIME users have more control over
the analysis and can easily build “name-brand” pipelines for their own needs. However, the downside
is that users must track down and install the individual tools. On January 1, 2018, QIIME 2, a more
integrated and extensible version with an automatic tracking system, replaced QIIME. Other software
and algorithms are also available independently, and may require more specialized knowledge to
apply.
A 16S rRNA gene analysis generally begins with quality filtering and demultiplexing of the raw
sequence reads. During PCR amplification, a barcode sequence that is unique to each sample is added to the primers.
Demultiplexing assigns each raw sequence to its sample of
origin based on these barcodes. Any reads that do not match any barcode are discarded at this
step. Because 16S rRNA gene sequences are clustered into operational taxonomic units (OTUs) based
on sequence similarity, sequencing errors can lead to spurious separate OTUs, thus overestimating
the microbial diversity [285, 286]. Therefore, sequences that fail to meet minimum quality thresholds,
based on read length, the presence of ambiguous base
calls, and quality scores that are specific to the sequencing instrument,
are removed from the downstream analysis. Moreover,
incomplete template extension during PCR amplification and sequencing can generate recombinant
sequences from different templates, known as “chimeras” [287]. These PCR artifacts can
inflate microbial richness estimates [288, 289]. Several software options have been
developed to detect and remove chimeric sequences, such as ChimeraSlayer [288], UCHIME and
Perseus. It has been noted that these different methods often disagree with each other and that the
impact of such noise is generally greater on α diversity than on β diversity [290].
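The demultiplexing and quality-filtering steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the barcode is assumed to sit at the start of each read, all barcodes are assumed to have the same length, and the function name and thresholds (min_length, min_mean_quality) are hypothetical rather than the defaults of QIIME or mothur. Chimera detection is not covered.

```python
def demultiplex(reads, barcode_to_sample, min_length=10, min_mean_quality=25):
    """Assign reads to samples by barcode and drop reads failing simple quality checks.

    reads: list of (sequence, per-base quality scores) tuples.
    barcode_to_sample: dict mapping barcode string -> sample name.
    """
    barcode_len = len(next(iter(barcode_to_sample)))  # assumes equal-length barcodes
    by_sample = {sample: [] for sample in barcode_to_sample.values()}
    for seq, quals in reads:
        sample = barcode_to_sample.get(seq[:barcode_len])
        if sample is None:
            continue  # read does not match any barcode: discard
        insert, insert_quals = seq[barcode_len:], quals[barcode_len:]
        if len(insert) < min_length or "N" in insert:
            continue  # too short, or contains an ambiguous base call
        if sum(insert_quals) / len(insert_quals) < min_mean_quality:
            continue  # mean quality score below threshold
        by_sample[sample].append(insert)
    return by_sample


# Hypothetical barcodes and reads for illustration.
barcodes = {"ACGT": "S1", "TGCA": "S2"}
reads = [
    ("ACGT" + "GATTACAGATTACA", [30] * 18),  # kept, sample S1
    ("TGCA" + "GATTNCAGATTACA", [30] * 18),  # ambiguous base, discarded
    ("GGGG" + "GATTACAGATTACA", [30] * 18),  # unknown barcode, discarded
    ("TGCA" + "CCCCGGGGAAAATT", [10] * 18),  # low quality, discarded
    ("TGCA" + "CCCCGGGGAAAATT", [35] * 18),  # kept, sample S2
]
assigned = demultiplex(reads, barcodes)
```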
After quality filtering, sequences are clustered into OTUs based on DNA sequence similarity.
The thresholds of sequence similarity vary for different taxonomic levels. A sequence identity of 97%
is the most common choice to denote bacterial species [291-293]. Closed-reference,
open-reference and de novo are the three major approaches for picking OTUs [290, 294, 295]. In
a reference-based clustering method, sequences are aligned to the known sequences in a reference
database and classified into taxonomic groups based on identity. The closed-reference algorithm is
based exclusively on the reference database of choice, and reads that do not align to the
reference sequences are discarded. This is a good option for quick classification when the
microorganisms in the samples are known to be thoroughly covered in the reference database, but
throwing away the information from unknown sequences can be problematic. In contrast, the de novo
method clusters 16S rRNA gene sequences solely based on similarity without any external reference.
It has been shown to be the optimal method since it is not restricted by the completeness of
reference databases [296], but it can be extremely time consuming. The open-reference clustering
algorithm is a two-step process that first picks OTUs using the closed-reference method and then
clusters the unmatched sequences de novo, thus leading to improved performance while retaining all
sequence data. In current microbiome studies, open-reference methods are usually recommended
due to their efficiency and completeness [297].
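The two-step logic of open-reference OTU picking can be illustrated with the following Python sketch. It is a simplification under explicit assumptions: sequences are taken to be pre-aligned and of equal length, identity is the plain fraction of matching positions, and each read joins the first centroid that reaches the threshold (a greedy rule). Real tools use alignment-based identity and more refined heuristics, and the function names and toy sequences here are hypothetical.

```python
def identity(a, b):
    """Fraction of matching positions; assumes pre-aligned, equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


def first_match(seq, centroids, threshold):
    """Return the name of the first centroid with identity >= threshold, else None."""
    for name, centroid in centroids.items():
        if identity(seq, centroid) >= threshold:
            return name
    return None


def open_reference_otus(seqs, reference, threshold=0.97):
    otus, unmatched = {}, []
    # Step 1 (closed-reference): match reads against the reference database only.
    for seq in seqs:
        name = first_match(seq, reference, threshold)
        if name is None:
            unmatched.append(seq)  # a pure closed-reference method would discard these
        else:
            otus.setdefault(name, []).append(seq)
    # Step 2 (de novo): greedily cluster the reads that did not hit the reference.
    denovo = {}
    for seq in unmatched:
        name = first_match(seq, denovo, threshold)
        if name is None:
            name = f"denovo{len(denovo) + 1}"
            denovo[name] = seq  # this read seeds a new OTU
        otus.setdefault(name, []).append(seq)
    return otus


# Toy 100-base sequences so that the 97% threshold is meaningful.
reference = {"ref1": "A" * 100}
seqs = ["A" * 98 + "CC", "G" * 100, "G" * 99 + "T", "A" * 90 + "C" * 10]
otus = open_reference_otus(seqs, reference)
otu_sizes = {name: len(members) for name, members in otus.items()}
```

Here one read falls into the reference OTU, two reads form a de novo OTU, and the last read seeds a second de novo OTU instead of being discarded.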
Following OTU clustering, the representative sequence of each OTU is compared
to a database for taxonomic classification, which assigns taxonomic labels to the OTUs. The most popular
classifiers include mothur [284] and the Ribosomal Database Project II (RDP) classifier [295]. For 16S
rRNA gene sequencing analysis, the Ribosomal Database Project (RDP) [298, 299], SILVA [300], and
Greengenes [301] are common reference databases used for OTU clustering in reference-based
methods as well as for taxonomic assignment. It is important to report the version of the database,
since these databases are periodically updated and the taxonomic information can change over
time. In addition, the coverage of the databases can vary across different sample types. To date,
approximately 95% of the human gut microbiota is well represented in the databases, whereas the
microorganisms from other human body sites or the mouse gut are much less well covered [302]. Sometimes,
because the database lacks taxonomic information, certain OTUs can only be assigned
incomplete taxonomic labels with missing lower taxonomic ranks. The resulting taxonomically assigned
OTU table, containing the read counts for each taxon per sample, is then ready for downstream
statistical analysis comparing microbial diversity and composition between subjects,
groups of subjects, and possibly within subjects over time.
Finally, the phylogenetic relationships among sequences can be inferred, either based on a
known phylogeny from the reference databases, such as SILVA or Greengenes, or by de novo
estimations using tools like NAST [303, 304] for sequence alignment and FastTree [305] for phylogeny
inference. This phylogenetic information is commonly incorporated into microbial diversity analysis
through the measures of Phylogenetic Diversity [306] and UniFrac distances [307].
Unlike the standardized protocols for human genome sequences, the bioinformatic
processes in microbiome sequencing analysis should be carefully tailored to each specific study
design. This involves not only the selection of a pipeline, but also the choices made at each
step of the pipeline and the parameters set in individual lines of code. A home-made recipe
is often needed in practice. One study compared the 16S rRNA sequencing results from fecal samples
using QIIME, mothur and MG-RAST, and found that similar microbial compositions were observed
regardless of the bioinformatics tool chosen [308]. However, in our hands-on experience, this may
not hold for a beginner in the human microbiome field, even when the QIIME pipeline is run
independently on the same sequencing data. The number of OTUs obtained with all default settings in QIIME
was much greater than the number produced by an experienced bioinformatician using her own
QIIME pipeline, leading to different results. Therefore, bioinformatic support from an
expert is usually required to generate robust microbiota data.
Statistical Analysis
The presence of a large number of microorganisms that dynamically interact with the
environment, the host, and each other results in unique challenges in statistical analysis for a human
microbiome study. Human health can be affected by a single bacterium or a set of microbes working
together, in the short term or chronically. As a result, the research questions range from “who is
there?”, “who is active?” and “what are they doing?” in one community, to comparing different
communities for “what is different?”, “how are they different?” and ultimately “why are they
different?”. A common workflow to analyze microbiota data includes diversity analysis for an overall
picture, differential abundance testing to pin down a single microbe, clustering and network analysis
to look for the affected microbial groups and their interactions with the ecosystem, and functional analysis
to link to potential mechanisms. However, the last two methods are not always feasible, given the
limitations of available statistical tools and the requirements for a large sample size and/or
metagenomic data. Standard statistical analysis and visualization for microbial data have been
implemented in pipelines such as QIIME and mothur. Other software and R packages (such as
vegan, phyloseq and ggplot2) are also available and provide more flexibility for the specific needs
arising from data characteristics.
Data normalization
Microbiome data are particularly challenging, because they contain many zeros and sequence
counts often vary over several orders of magnitude. In most cases, sequences observed only once or
only in a single sample, known as singletons, are removed from the downstream analysis. It is
common for a sequencing run to yield uneven sequence counts across samples simply for
technical rather than biological reasons, leading to spurious diversity estimates. Thus, microbiota
data should be standardized across samples before further analysis. Although it has been
debated recently, rarefying samples for normalization is the current standard implemented in most
microbiota data analyses, in which equal numbers of sequences are randomly selected from each
sample such that all samples have the same total count. The sequence count of the
sample with the smallest acceptable number of sequences is widely used as the number to
draw [309, 310]. As a result, researchers sometimes face a difficult trade-off between
retaining as many sequences as possible and excluding as few low-count samples as possible.
Moreover, rarefaction is not an ideal normalization method, because it discards
valuable information from samples with high sequence counts and possibly introduces errors in the
following analysis [311]. So far, rarefaction is commonly used for diversity measures, but it is not
necessary for analyzing OTU abundance in compositional analysis. Alternatives to rarefaction have
been proposed recently, including scaling the counts by the total sample sequence count or other
scaling factors [312-315], or performing transformations [316, 317]. However, each option has its
own statistical pitfalls and is subject to specific biases.
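A minimal sketch of rarefying, assuming subsampling without replacement to a fixed depth, is given below in Python. The function name, the toy OTU table and the chosen depth are hypothetical; samples with fewer reads than the depth are returned as None to make the trade-off described above explicit.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly draw `depth` reads without replacement from one sample.

    counts: dict mapping OTU -> read count. Returns None if the sample is too shallow.
    """
    pool = [otu for otu, n in counts.items() for _ in range(n)]  # one entry per read
    if len(pool) < depth:
        return None  # sample would be excluded at this depth
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    rarefied = {otu: 0 for otu in counts}
    for otu in rng.sample(pool, depth):
        rarefied[otu] += 1
    return rarefied


# Hypothetical OTU table with very uneven sequencing depth.
table = {
    "s1": {"otu1": 50, "otu2": 30, "otu3": 20},
    "s2": {"otu1": 5, "otu2": 500, "otu3": 1500},
    "s3": {"otu1": 4, "otu2": 6, "otu3": 0},
}
depth = 100
rarefied = {sample: rarefy(counts, depth) for sample, counts in table.items()}
```

With a depth of 100, sample s3 (10 reads) is lost entirely, while s2 keeps only 100 of its 2005 reads, which is the information loss criticized in [311].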
Diversity analysis
Because the human microbiota is a microecosystem, statistical analysis of human microbiota data adopts some
ecological concepts, such as α diversity, an estimate of mean diversity within a sample, and β
diversity, a comparison of diversity between samples. Methods for measuring α and β diversity
have been reviewed in detail elsewhere [310, 318]. Here, we focus on the most common
diversity measurements found in published microbiota studies (Table 5.1).
α diversity describes the number and distribution of microorganisms within a single
community. The common measurements include the number of observed OTUs, Chao 1 [319],
Phylogenetic Diversity (PD) whole tree [306] and Shannon index [320] (Table 5.1) [310]. The first
three measures consider the presence or absence of OTUs (richness), while Shannon index also takes
into account OTU abundance (richness and evenness). Chao 1 differs from the
number of observed OTUs (a simple count of the unique OTUs present in a sample) in that Chao 1 is more
sensitive to rare taxa with low abundance, since it uses the numbers of singletons and doubletons; this
estimator is not applicable to whole-metagenome taxonomic profiling. Additionally, PD whole tree incorporates
phylogenetic information by summing the total branch length of the phylogenetic tree spanning the
members of a community. Therefore, a greater α diversity usually indicates a more diverse microbial
composition in the community at the taxon and/or phylogenetic level. Since a reduction in diversity has
been associated with microbial dysbiosis, and thus possibly linked to human disease [321, 322],
α diversity is often applied in epidemiological studies to compare
healthy and disease states.
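Three of these α diversity measures can be computed directly from the OTU counts of one sample, as in the Python sketch below. Two choices are assumptions of this sketch rather than statements about any particular pipeline: Chao 1 is written in its bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)), which stays defined when there are no doubletons, and the Shannon index uses the natural logarithm (other tools may use base 2). PD whole tree is omitted because it requires a phylogenetic tree.

```python
import math


def observed_otus(counts):
    """Number of OTUs with at least one read (richness)."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao 1: adds a term based on singletons (F1) and doubletons (F2)."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon index with natural log: reflects both richness and evenness."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical OTU counts for a single sample (two singletons, one doubleton).
sample = [10, 5, 1, 1, 2, 0]
richness = observed_otus(sample)
chao1_estimate = chao1(sample)
shannon_index = shannon(sample)
```

For this sample the observed richness is 5 and the Chao 1 estimate is 5.5, reflecting the rare OTUs that were probably missed at this sequencing depth.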
For comparisons between samples, β diversity usually draws more attention since it
measures the extent to which two or more communities differ, thus evaluating how a microbial
community changes with different disease states or over time. Measurements of β diversity create a
distance matrix between samples, taking into account the presence or absence of OTUs and their
abundance or phylogeny [310, 323]. Among them, Bray-Curtis [323] and UniFrac distances [307, 324]
have been widely applied in microbiome studies. Bray-Curtis calculates the distance between
samples quantitatively based on the abundance of OTUs in each sample, while the phylogeny-based
UniFrac distances consider samples with more phylogenetically similar sequences to be more similar.
UniFrac can be calculated either weighted or unweighted for OTU abundance, known as the weighted
UniFrac distance and the unweighted UniFrac distance, respectively.
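As an illustration of how a β diversity distance matrix is built, the Python sketch below computes the Bray-Curtis dissimilarity, the sum of absolute differences in OTU counts divided by the total count of both samples, for every pair of samples. The sample names and counts are hypothetical, and UniFrac is not shown because it additionally requires a phylogenetic tree.

```python
def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two count vectors over the same OTUs."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0


def distance_matrix(table):
    """Pairwise Bray-Curtis matrix; rows and columns follow the order of `names`."""
    names = list(table)
    matrix = [[bray_curtis(table[i], table[j]) for j in names] for i in names]
    return names, matrix


# Hypothetical counts of three OTUs in three samples.
table = {"s1": [10, 0, 5], "s2": [5, 5, 5], "s3": [0, 20, 0]}
names, matrix = distance_matrix(table)
```

Samples s1 and s3 share no OTUs, so their dissimilarity is 1, the maximum; s1 and s2 overlap partly and have a dissimilarity of 1/3.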
Once the distances between samples are determined, the samples can be compared by their
similarities. Currently, the most commonly used approach is principal coordinates analysis
(PCoA), which places each sample in a multidimensional space based on the calculated distance
matrix. The direction in this space that separates the samples the most is the first principal coordinate,
and the direction that separates them second best is the second principal coordinate, and so on. Each principal
coordinate is reported with the percent of the variation explained by that dimension. Thus, samples can be
visualized in a 2D (the first two principal coordinates) or 3D (the first three principal coordinates)
scatterplot, where the axes represent the principal coordinates. Samples that are ordinated closer to
one another are more similar than those ordinated further apart. Color coding the samples by
metadata, for instance disease status, can reveal the factors driving the separation between samples.
The significance of group differences is commonly tested by permutational multivariate analysis
of variance (PERMANOVA), also known as nonparametric multivariate analysis of variance
(NPMANOVA) [325]. Other similarity-based clustering methods are also available, such as the neighbor-joining
method, the unweighted pair-group method with arithmetic mean (UPGMA), or the squash
clustering approach, which can be visualized in a dendrogram or a principal component analysis (PCA)
plot [326].
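The idea behind a one-factor PERMANOVA can be sketched in Python as follows. The pseudo-F statistic compares between-group to within-group sums of squared distances, and the p-value is the proportion of random relabelings that give a pseudo-F at least as large as the observed one. This is an illustrative sketch for a single grouping factor, not the implementation used by any pipeline; the function names, the toy distance matrix and the number of permutations are assumptions.

```python
import random
from itertools import combinations


def pseudo_f(dist, labels):
    """Pseudo-F statistic from a square distance matrix and one group label per sample."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for group in groups:
        idx = [i for i, label in enumerate(labels) if label == group]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_between = ss_total - ss_within
    return (ss_between / (len(groups) - 1)) / (ss_within / (n - len(groups)))


def permanova(dist, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between labels and distances
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy example: distances are 0.1 within a group and 0.9 between groups.
labels = ["case"] * 3 + ["control"] * 3
dist = [[0.0 if i == j else (0.1 if labels[i] == labels[j] else 0.9)
         for j in range(6)] for i in range(6)]
f_stat, p_value = permanova(dist, labels)
```

With only three samples per group, few distinct relabelings exist, so the p-value cannot become very small even though the groups are clearly separated; this illustrates why sample size matters for permutation tests.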
However, it should be noted that these ecological metrics are mainly exploratory, and it is
still unclear how to translate them into biological insights. Therefore, differences in
diversity measures between samples should be interpreted with caution.
Compositional analysis
When comparing samples or groups of samples, one fundamental question is what
differs in the microbial community compositions. In a standard OTU table derived from 16S rRNA
gene sequencing, the abundance of an OTU is its sequence count in each sample. To remove
the effect of uneven sampling depth, the relative abundance of each OTU per sample is calculated by
dividing each OTU count by the total sequence count for that sample, allowing fair comparisons
between samples. Depending on the sequencing method as well as the taxonomic rank of interest,
the number of OTUs or taxa can range from 10 to 10^5 or more. Thus, microbiota composition is often
represented by a zero-inflated, highly skewed, sparse data set with limited statistical power and
multiple-comparison issues. Moreover, the correlations within the microbial community and the
interactions between the community and the ecosystem, as well as potential variation from
technical sources, greatly increase the difficulty of statistically testing the association with a few states,
such as healthy and disease.
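The relative abundance calculation and the sparsity of a typical OTU table can be illustrated with a short Python sketch. The table, sample names and function names below are hypothetical.

```python
def relative_abundance(otu_table):
    """Divide each OTU count by the total sequence count of its sample."""
    result = {}
    for sample, counts in otu_table.items():
        total = sum(counts.values())
        result[sample] = {otu: (n / total if total else 0.0) for otu, n in counts.items()}
    return result


def zero_fraction(otu_table):
    """Fraction of cells in the OTU table that are zero (a simple measure of sparsity)."""
    cells = [n for counts in otu_table.values() for n in counts.values()]
    return sum(1 for n in cells if n == 0) / len(cells)


# Hypothetical OTU table: s2 was sequenced 20 times deeper than s1.
otu_table = {
    "s1": {"otu1": 30, "otu2": 70, "otu3": 0},
    "s2": {"otu1": 0, "otu2": 500, "otu3": 1500},
}
rel = relative_abundance(otu_table)
sparsity = zero_fraction(otu_table)
```

After scaling, each sample sums to 1, so otu2 can be compared between s1 (0.70) and s2 (0.25) despite the difference in sequencing depth. The proportions within a sample are no longer independent of each other, which is the compositional constraint that differential abundance tests need to account for.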
Many statistical methods have been developed to compare the absolute or relative
abundance of taxa between groups, known as differential abundance testing. Earlier,
nonparametric tests, such as the Wilcoxon rank sum test for two-group comparisons or the Kruskal-Wallis test for
multiple-group comparisons, were widely used, since OTU counts are not normally distributed [327]. However,
such nonparametric methods have limited use when other related taxa or covariates need to be
controlled for in the analysis. Subsequently, some statistical methods designed for RNA-Seq data, most
commonly DESeq [327], DESeq2 [328], edgeR [329] and baySeq [330], were applied to
microbiota data, which contain many more zeros than RNA-Seq data. Accordingly, metagenomeSeq
[331] and analysis of composition of microbiomes (ANCOM) [313] were designed specifically for
microbial compositional features. Of these, except for baySeq, which uses an empirical Bayes method for
parameter estimation, and ANCOM, which incorporates a nonparametric test, all the methods make some
assumptions about the data distribution. Generalized linear models (GLMs) with various choices of
distributional assumptions have long been considered in ecology for modeling count data; among them,
zero-inflated lognormal and zero-inflated negative binomial (NB) models have shown
advantages in dealing with sparsity and uneven sampling depth, as well as in adding random effects to
handle incompletely matched or time series data [332-335]. In addition, all the approaches mentioned
above assume that all the associated taxa nested within each upper-level taxon have the same direction
of effect, an assumption that is often violated, resulting in a substantial loss of power. This can be particularly
problematic for 16S rRNA gene sequencing results due to the relatively low resolution. As a result,
more sophisticated microbial association testing methods, including the microbiome regression-based
kernel association test (MiRKAT), microbiome-based sum of powered score tests (MiSPU), and the
optimal microbiome-based association test (OMiAT) with microbiome comprehensive association
mapping (MiCAM) [336], have been developed. To date, none of the available approaches is ideal,
and some of them continue to incorporate newer techniques. Thus, the choice of differential abundance
testing method should depend on the specific data structure.
The fact that relative abundances of taxa are compositional with inherent correlations is
often neglected by the classic differential abundance tests. Samples can be grouped according
to similar microbial compositions, whereas the overall microbiota community in samples can be
clustered into different sub-groups based on compositional or functional correlation. Either way is
particularly useful in reducing the complexity of the microbiota dataset and enabling epidemiological
studies with a large sample size. Classification methods have been proposed and have served this
purpose of dimensionality reduction when the metadata are categorical, including supervised
approaches that require knowledge of the predefined groups of the samples [337] and unsupervised
approaches without such prior knowledge about the samples [338]. Supervised classification
methods, such as random forests (RF) [339] and support vector machines (SVMs) [340], are usually
used to detect discriminatory taxa from the training data in order to predict the classification of the test
samples. A recent study combined the relative abundance of gut microbiota with a clinical
measure in stool to improve the sensitivity of an existing screening test for colonic lesions using a
random forest classification method [341]. Examples of other commonly used supervised
classification methods can be found elsewhere [337]. On the other hand, unsupervised
classification (clustering) can distinguish clusters susceptible to disease. Such clusters are usually
generated based on a between-sample distance matrix, such as Bray-Curtis or UniFrac distance, from which
the taxa discriminating the classes can be further identified. For example, enterotype, an appealing
concept, clusters human gut microbiome samples into discrete types based on community
composition [342]. However, this approach has been questioned recently, because the samples
may form a continuous gradient or an alternative structure rather than a
discrete clustering pattern, and the grouping outcome can be highly influenced by both the distance
metrics and the clustering algorithms chosen [338, 343]. Other clustering methods, such as PCoA and
Jensen-Shannon hierarchical clustering with Ward linkage, have also been used to assign communities
to samples [344, 345]. The current recommendation for unsupervised classification is that clusters
should be created with several different methods to ensure that the grouping outcome does not depend on
only one set of parameters [338].
Functional analysis
The most important but also the most difficult questions in a human microbiome study are
“what can they do?”, “who is active?” and “why are they different?”. Functional analysis provides
insights that link bacteria residing in humans to their molecular mechanisms, and ultimately to the
host’s functions. With the advancement of whole-genome sequencing technology, metagenomic or
metatranscriptomic analysis allows identification of the abundance of metabolically active bacteria in
relation to the host. Unfortunately, such functional analysis cannot be applied to 16S rRNA gene-based
studies, since no microbial genome sequence is available. Recently, bioinformatics software
such as PICRUSt [346] and Tax4Fun [347] has been developed to use inference algorithms to assign
metabolic functions to 16S rRNA genes by mapping their reads to reference genomes. However, their
results can only serve as exploratory tools for further studies. Furthermore, when possible,
using animal models or in vitro tests in a well-controlled experimental setting to validate the findings
from sequencing methods is still the most convincing functional assay to address the scientific
questions.
Taken together, in such a young field with rapid progress in both technologies and analytic
tools, the key to conducting a human microbiome study is to use current, accepted methods, keep
everything consistent across samples or studies where possible, and document and report each step
in as much detail as possible. Meanwhile, it remains challenging to compare results with previously
published studies. To detect potential biases, the study design (study population, sample type
and processing), the sequenced variable regions and platform, and the data processing and analysis
should all be taken into consideration. It is also essential that all data be stored in a
web-accessible database to allow comparisons across studies. The development of standardized
protocols for human microbiome studies is therefore an emerging area of inquiry. Last but not
least, although 16S rRNA-based methods still dominate human microbiota research, improved
technology and reduced cost are making metagenomic and metatranscriptomic analyses increasingly
attractive, as functional information can greatly advance knowledge in this field. Accordingly,
when designing a new study, researchers should keep pace with these changes thoughtfully.
5.3. Current Knowledge on the Gut Microbiome, Colorectal Polyps and CRC
5.3.1 The Gut Microbiome and CRC
Although the gut microbiome is important for maintaining host health, it has also been
associated with various diseases, such as inflammatory bowel disease (IBD) and obesity. It has been
established that up to 20% of the global cancer burden is attributable to infectious agents [348].
Since Helicobacter pylori provided the first link between bacteria and gastric cancer [349], more
attention has been paid to which gut microbes may influence the initiation and development of CRC,
and how. The gastrointestinal (GI) tract is the largest body site harboring the human microbiome.
However, the gut microbiota is unevenly distributed along the length of the GI tract. The bacterial
density averages 10^12 cells/ml in the large intestine, about 10^10 times higher than in the small
intestine, which interestingly corresponds to an approximately 13-fold higher risk of CRC than of
small intestine cancer [69, 108]. Significantly reduced tumor loads have been found in genetically
susceptible mouse and rat models of CRC raised in the absence of the intestinal microbiota compared
with those raised under conventional conditions [350-352]. With the recent development of
culture-independent genome sequencing techniques, accumulating evidence shows that CRC patients
have altered microbial diversity and composition in their colonic tissues and feces compared with
healthy controls [353-361], although some studies report conflicting results. Moreover, the gut
microbiota can be affected by a number of factors, such as age [123], gender [194, 195], host
genotype [164, 165], medication use (e.g. antibiotics) [145] and diet [149], which have also been
identified as risk factors for colorectal carcinogenesis [1, 13]. This suggests that these local
residents of the colon may serve as functional modulators of CRC development in response to
environmental changes, since they interact closely with the host mucosal metabolic and immune
systems. Therefore, regardless of whether the altered microbiome is an indicator of or a
contributor to disease, it is becoming clear that the microbiome provides potential biomarkers that
could be tested for the risk or presence of CRC.
Given the difficulty of disentangling the interplay between environmental exposures, host
genetics, and the gut microbiota, it is important to identify the microbial members or communities
that may be involved in the initiation and progression of CRC before elucidating the underlying
molecular mechanisms. To date, more than a thousand studies have been published on the composition
and structure of the CRC-associated community in both animal models and humans. However, the
findings are highly heterogeneous, partly because of differences in study design, sample type, and
technical factors. Additionally, the identified CRC-associated bacteria have been reported at
various taxonomic ranks, from the higher rank of phylum down to genus, species or strain, which may
obscure the conclusions. For example, the genus Bacteroides has frequently been detected in
CRC-associated microbiome studies, but the direction of the effect can be completely opposite.
Within this genus, the species B. fragilis has been suggested to play a pathogenic role in CRC
development, while other species, such as B. thetaiotaomicron, act as beneficial commensals that
produce nutrients and adapt to environmental changes and stress in the GI tract [362]. Besides,
members of the same bacterial genus are not necessarily each other's closest relatives; the genus
Clostridium, for example, includes members found in different bacterial families. Thus, incomplete
taxonomic ranks or incorrectly collapsed taxonomies can result in meaningless categories.
Meanwhile, associations between the identified microbes and CRC should also be confirmed by
functional assays in animal models or in vitro experiments when possible to address the underlying
molecular mechanism. We therefore summarize here the five bacterial species (Fusobacterium
nucleatum, Streptococcus bovis/gallolyticus, Bacteroides fragilis, Escherichia coli, Enterococcus
faecalis) whose association with colorectal carcinogenesis is supported by the most comprehensive
evidence. Other candidate oncogenic bacteria, such as Clostridium septicum and Helicobacter pylori,
and the potentially beneficial commensal Faecalibacterium prausnitzii have been described in detail
in relation to CRC elsewhere [363, 364].
Fusobacterium nucleatum
F. nucleatum is a Gram-negative anaerobic bacterium and a normal constituent of the human
oral cavity, where it has been studied extensively in relation to periodontal health [365].
Recently, accumulating evidence has linked F. nucleatum to an increased risk of CRC. F. nucleatum
is rarely detected in the gut microbiota of healthy individuals, but its prevalence is increased in
the mucosa of CRC patients [366]. A higher proportion of F. nucleatum has been found in CRC tumors
than in adjacent normal tissues, as well as in colorectal tissues with high-grade dysplasia and
advanced adenoma than in tissues from healthy controls [367-369]. Independent studies using
APC^Min/+ mice suggest that F. nucleatum may not only modify the tumor microenvironment by
activating the immune response, but also directly affect tumor growth, leading to increased tumor
size and number and shortened survival [370-373]. Gene expression studies and in vitro tests have
shown that this pathogen can activate the TLR4/MYD88/NF-κB signaling pathway to induce expression
of miRNA21, which has oncogenic properties [372], accompanied by increased numbers of
myeloid-derived suppressor cells and tumor-associated neutrophils/macrophages that are known to
promote carcinogenesis and tumor progression [371]. Moreover, proliferation of CRC cells is
enhanced when an F. nucleatum virulence factor binds to the cellular adhesion molecule E-cadherin,
with subsequent activation of several transcription factors [373]. As a result, F. nucleatum
accelerates CRC development without inducing colitis.
Streptococcus bovis/gallolyticus
S. bovis is a species of Gram-positive bacteria and has been linked to CRC for decades [374].
Enrichment of this bacterium has been found in both fecal and mucosal samples of CRC patients
compared with healthy controls, as well as in tumor tissues compared with non-tumor tissues
[375-378]. S. bovis has a potential role in inflammation-induced CRC: infection with S. bovis
results in increased expression of pro-inflammatory mediators (IL-1β, COX-2, and IL-8) and release
of prostaglandin E2 (PGE2) in both tumor tissues and CRC epithelial cells [375, 379]. After
administration of S. bovis, an increased number of aberrant crypts with hyper-proliferative
properties was observed in azoxymethane (AOM)-treated rats, and more infected mice developed
polyps than uninfected ones [379, 380]. A recent meta-analysis suggests that the association
between S. bovis and CRC is stronger for patients infected with S. bovis biotype I (now classified
as S. gallolyticus) than for patients infected with S. bovis biotype II [377].
Bacteroides fragilis
B. fragilis is an obligately anaerobic, Gram-negative bacterium that commonly resides
throughout the colon but is the least common Bacteroides in the fecal microbiota [381]. There are
two subtypes of B. fragilis: nontoxigenic B. fragilis (NTBF) and enterotoxigenic B. fragilis
(ETBF). The latter produces an enterotoxin named “fragilysin”, or B. fragilis metalloprotease toxin
(BFT), which induces diarrhea and promotes colon carcinogenesis [382]. B. fragilis has been
associated with both CRC and early-stage colorectal neoplasia, while ETBF is more likely to be
found in late-stage CRC tumors [383, 384]. Both B. fragilis and the bft gene are more abundant in
stools from CRC patients than in those from healthy individuals [360, 361, 385]. ETBF has been
shown to induce colitis and CRC in mouse models of intestinal neoplasia [386-388]. Fragilysin from
ETBF has proteolytic activity that degrades tight junction proteins and disrupts the epithelial
barrier, leading to increased gut permeability and damaged intestinal crypts [389-391]. ETBF also
activates the signal transducer and activator of transcription 3 (STAT3) pathway, characterized by
a Th17 response that induces secretion of the pro-inflammatory cytokine IL-17 [392]. In addition,
an in vitro study showed that BFT can degrade E-cadherin expressed in CRC cells and activate the
oncogene c-myc, thereby promoting cell proliferation and DNA damage [393, 394].
Escherichia coli
E. coli, a Gram-negative, facultatively anaerobic commensal bacterium, is commonly found in
the human gut and has a symbiotic relationship with the host. Several studies have linked E. coli
with CRC risk, but not all E. coli strains are pathogenic. E. coli strains are divided into 4
phylogenetic groups: A, B1, B2 and D. Fecal samples are usually rich in strains from groups A and
B1, whereas groups B2 and D contain the most pathogenic strains, some of which are associated with
chronic inflammation and are enriched in CRC patients [395]. A significantly higher prevalence of
E. coli strains has been observed in both mucosal samples and colonic biopsies from adenoma and CRC
patients than in those from healthy controls, but no difference was found between adenomas and
carcinomas [396-401]. Clinical studies show that cyclomodulins, virulence toxins produced by some
E. coli strains, can induce DNA damage and influence cellular proliferation, differentiation and
apoptosis; such cyclomodulin-producing E. coli are more prevalent in most cancerous samples,
suggesting a correlation with poorer CRC prognosis [396, 398, 399, 402]. In addition, another group
of E. coli strains, those harboring the polyketide synthase (pks) pathogenicity island, has been
associated with CRC. The pks island encodes colibactin, another genotoxin, which can induce DNA
double-strand breaks and chromosomal instability, as well as increase gene mutations, probably by
depleting the DNA mismatch repair system in vivo [403-405]. Recent in vitro experiments have shown
that pks-harboring E. coli strains can promote cancer cell growth and survival by inducing
pro-inflammatory and pro-carcinogenic mediators such as cyclooxygenase-2 (COX-2) and prostaglandin
E2 (PGE2), accompanied by reactive oxygen species (ROS) production [406-409]. Mouse models of CRC,
such as APC^Min/+ or xenograft mice, have confirmed that pks-harboring E. coli strains promote
intestinal tumorigenesis [396, 410].
Enterococcus faecalis
E. faecalis, a Gram-positive, facultatively anaerobic commensal bacterium, is usually harmless
in the human gut. However, recent studies have found that E. faecalis is overrepresented in fecal
samples from CRC patients compared with healthy controls, as well as in their tumors and adjacent
tissues compared with healthy mucosa [411, 412]. In il10-/- mice, infection with E. faecalis can
promote colitis and induce dysplasia and rectal carcinoma, probably via TLR2-mediated
proinflammatory gene expression and inhibition of the TGF-β/Smad signaling pathway, thus inducing
chronic inflammation [413, 414]. In addition, in vitro assays have shown that E. faecalis can
produce extracellular superoxide and hydrogen peroxide, both reactive oxygen species (ROS), which
induce DNA damage and chromosomal instability in colonic cells and have been associated with the
initiation of CRC [415, 416]. Extracellular superoxide produced during E. faecalis infection can
enhance COX-2 expression in macrophages and induce chromosomal instability (CIN) in colon
epithelial cells, leading to expression of progenitor and tumor stem cell markers and thereby
driving CRC tumorigenesis [417, 418].
As described above, several bacterial species have been associated with colorectal
carcinogenesis, and they may share the same molecular pathways. In summary, the underlying
molecular mechanisms include: 1) production of genotoxins and other virulence factors that may help
bacteria penetrate the gut mucosal barrier and invade intestinal epithelial cells, modulate
host-derived signaling pathways to activate carcinogenesis-promoting pathways, or induce DNA damage
and chromosomal instability to promote colorectal tumorigenesis; 2) microbial metabolism affecting
CRC development, possibly by regulating the generation of secondary bile acids that can promote
CRC, or through metabolic activation or inactivation of pro-carcinogenic pathways, food
phytochemicals or xenobiotics; 3) modulation of host defenses and inflammation pathways through
both innate and adaptive immune responses; 4) regulation of oxidative stress and anti-oxidative
defenses involving ROS/NOS production and the DNA repair system.
5.3.2 The Gut Microbiome and Colorectal Polyps
As asymptomatic preneoplastic lesions, colorectal polyps or adenomas are considered to be the
physiological starting point of the colorectal adenoma-carcinoma sequence. However, unlike the
CRC-associated microbial community, which has been examined by thousands of studies, few studies
until recently have focused on the relationship between the gut microbiota and this CRC precursor,
probably because colorectal polyps, a non-life-threatening condition, have a low rate of
progression to CRC and can easily be removed during screening procedures such as colonoscopy.
Etiologically, the presence of colorectal polyps is the consequence of underlying factors that
confer susceptibility to CRC, such as host genetic predisposition or a deleterious lifestyle, which
are unlikely to change even after the removal of polyps. Thus, the bacterial community at polyps
may contain the microbial features of colorectal carcinogenesis. On the other hand, human
interventions, such as polypectomy during colonoscopy, medication use, or dietary change, can alter
the colorectal microenvironment, and the composition and structure of the gut microbiota may adjust
in response, although it is unclear how stable such changes are. As a result, dynamic competition
between the “good” and the “bad” bacteria thriving in the colon occurs at this preneoplastic stage
as well as after polypectomy, potentially directing the disease outcome. Microbial profiling of
colorectal polyps is therefore important for recognizing the “bad” bacteria involved in CRC
progression as well as the “good” bacteria that prevent CRC, thus providing biomarkers for CRC
diagnosis, prevention and therapy.
By searching NCBI PubMed and Google Scholar for “gut microbiome/microbiota” or “fecal
microbiome/microbiota” and “colon polyps” or “colon adenomas” in human studies, we identified 18
published studies (Table 5.2) specifically designed to investigate the relationship between the gut
microbiota and colorectal polyps/adenomas, 9 of which also included CRC patients [341, 356, 368,
419-433]. Of these, 1 study used formalin-fixed tissues, 5 used mucosal biopsy samples, 10 used
fecal samples, and 2 included both fecal and mucosal samples. Despite potential publication bias,
most of them suggested that the gut microbial composition in fecal or mucosal samples differs
between polyp or adenoma patients and healthy controls, and 4 studies [341, 356, 424, 426] also
found differences in the bacterial community between colorectal polyps and CRC. As discussed in the
previous section, although the fecal microbiota is considered a proxy for the gut microbiome, its
diversity and composition are distinct from those of the mucosal microbes, which has also been
confirmed by Flemer and colleagues [427]. Thus, fecal samples are more suitable as a screening
biomarker for frequent monitoring, while mucosal biopsies are better for characterizing
disease-associated bacteria or serving as a diagnostic tool. Given the convenience of sample
collection and the potential for CRC prevention, we are more interested in studies using fecal
samples, and we also proposed one for colorectal polyps using 16S rRNA gene sequencing of human
stools (Chapter 6). After comparison with our own study design and methods, we focused on 8
previously published studies [341, 356, 420, 424, 426, 428, 429, 432] and summarized their findings
on fecal microbiota diversity and composition associated with colorectal polyps in Table 5.3.
Studies were excluded because of a distinct bioinformatics pipeline [431], no 16S rRNA sequencing
results available for fecal samples [422, 427], or lack of statistical tests [419]. Additionally,
we included the only metagenomic sequencing study [424] to compare the diversity results but not
the polyp-associated bacteria, since the sequencing method has a large impact on the observed
microbial composition [434].
In terms of study design, the sample size in the 8 selected studies ranged from single digits
to three digits per group. Regarding the timing of sample collection, fecal samples were collected
1 week to 3 months before bowel preparation and colonoscopy in 2 studies, 1-4 weeks after
colonoscopy in 2 studies, and 1-2 weeks before or after colonoscopy in 1 study; 2 studies gave no
information on timing, and, most interestingly, Peters and colleagues [432] combined samples from
two sites, one of which collected samples up to 4 months before colonoscopy and the other within 3
years after colonoscopy. As mentioned above, all studies except one [424] used 16S rRNA gene
sequencing, although no two studies used exactly the same bioinformatics methods.
Among the 8 studies (Table 5.3), 5 performed tests of both α diversity and β diversity
comparing polyp patients with healthy controls. Peters et al. [432] reported significantly
decreased richness as well as evenness in adenoma cases, while Feng et al. [424] found that the
fecal microbiota of advanced adenoma patients had significantly higher richness but not evenness.
Using PCA or PCoA on a β diversity matrix, 3 studies [420, 429, 432] observed distinct microbial
compositions in adenoma patients and healthy individuals.
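Because richness and evenness can move independently, as the contrasting reports above suggest, a
minimal sketch of how these two α diversity measures are commonly computed may be helpful. The
choice of observed richness, the Shannon index and Pielou's evenness is an assumption made for
illustration (the cited studies may have used other estimators), and the two count vectors are
hypothetical, not data from any study.

```python
import math

def richness(counts):
    """Observed richness: number of taxa detected at least once."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over the observed taxa."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H / ln(S); set to 0.0 here when fewer than two taxa are observed."""
    s = richness(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

# Hypothetical count vectors, not data from any cited study.
balanced_few = [50, 50, 0, 0, 0]   # two taxa, equally abundant
skewed_many = [96, 1, 1, 1, 1]     # five taxa, one dominant

for name, counts in [("balanced_few", balanced_few), ("skewed_many", skewed_many)]:
    print(name, richness(counts), round(shannon(counts), 3), round(pielou_evenness(counts), 3))
```

In this toy comparison the second sample is richer but far less even than the first, so a study
can plausibly report a change in one measure without a matching change in the other.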
To examine the polyp-associated taxa detected in these 7 fecal microbiota studies based on 16S
rRNA gene sequencing, we listed only those reported in at least 2 independent studies (Table 5.4),
because of concern about variance arising from different study designs and data processing methods.
The taxonomic ranks of the selected microorganisms ranged from genus, the lowest level, to class,
the highest. Because none of the selected studies reported results at the species level, given the
low resolution of 16S rRNA gene sequencing, no bacterial species was observed repeatedly. On the
other hand, inconsistency is likely at the phylum level, since a phylum is such a large taxonomic
category that it reflects mixed results from all the classes it includes. In summary, across the 7
selected fecal microbiota studies, we found consistent enrichment of the genus Sutterella in fecal
samples from polyp cases compared with healthy controls, whereas the genus Bacteroides, the
families Clostridiaceae and Lachnospiraceae, the order RF39, and the class Clostridia were
repeatedly reported as depleted in the fecal microbial community of adenoma patients. In addition,
in the only study examining different subtypes of colorectal polyps, by Peters and colleagues
[432], the genus Coprobacillus was less abundant in fecal samples from both adenoma and
hyperplastic polyp cases than in those from non-polyp controls, while the class
Gammaproteobacteria was overrepresented in adenoma cases but less abundant in hyperplastic polyp
cases, suggesting both similarities and differences in microbial composition between these two
subtypes.
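The selection rule used above (keep a taxon only if at least 2 independent studies report it) can
be sketched as a simple filter. This is a hedged illustration rather than the procedure actually
used to build Table 5.4: the function name, the record layout, and the additional requirement that
the studies agree on the direction of the effect are assumptions, and the study and taxon labels
are placeholders.

```python
from collections import defaultdict

def replicated_taxa(findings, min_studies=2):
    """Keep (taxon, direction) pairs reported by at least min_studies distinct studies."""
    support = defaultdict(set)
    for study, taxon, direction in findings:
        # A set ignores repeated reports of the same finding from one study.
        support[(taxon, direction)].add(study)
    return {key: sorted(studies) for key, studies in support.items()
            if len(studies) >= min_studies}

# Hypothetical placeholder records, not results from the cited studies.
findings = [
    ("study_A", "taxon_1", "enriched"),
    ("study_B", "taxon_1", "enriched"),
    ("study_A", "taxon_1", "enriched"),   # duplicate report from the same study
    ("study_A", "taxon_2", "depleted"),
    ("study_C", "taxon_2", "enriched"),   # opposite direction, so not counted as a replication
    ("study_B", "taxon_3", "depleted"),
]

replicated = replicated_taxa(findings)
print(replicated)
```

Keying on the taxon and the direction together keeps conflicting reports, such as the opposite
effects described for some taxa, from being counted as agreement.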
Although the interpretation of the apparent correlations between microbes and colorectal
polyps identified by 16S rRNA gene sequencing of human fecal samples may be limited, the overall
profile of microbial composition at this CRC precursor stage is appealing as a source of candidate
biomarkers to promote earlier identification of polyps or CRC and to design therapeutic agents that
prevent polyp recurrence or CRC progression. Recently, Baxter et al. [341] and Zackular et al.
[426] have shown that 16S rRNA-profiled microbial data from fecal samples, combined with clinical
data, can improve the accuracy of existing screening methods in discriminating healthy individuals
from patients with colorectal polyps or CRC. Of course, when possible, longitudinal study designs
are always valuable for identifying biomarkers that are present before the onset of CRC or for
detecting changes in the microbiota along the colorectal adenoma-carcinoma sequence.
5.3.3 Driver-Passenger Bacterial Model
The current results from studies of the colorectal polyp- or CRC-associated microbiota are
highly heterogeneous, although growing evidence has directly or indirectly linked a number of
microorganisms to CRC. Possible reasons include the long development time of the disease with its
various stages, the complexity and dynamics of the gut microbiome, and technical and analytical
difficulties. Given current knowledge, theories have been developed to address the possible role of
the gut microbiota in CRC initiation or progression, and the discrepancies have usually been
explained by biases arising from differences in study design and data processing methods. More
recently, a bacterial “driver-passenger” model was proposed for CRC, suggesting a temporal
association between different bacteria and cancer stages [435]. In the intestinal mucosa of
patients at risk of CRC, one or several microbes can start colonizing and function as “drivers” of
CRC because of their pro-carcinogenic properties: they can produce DNA-damaging compounds, increase
cell proliferation and intestinal barrier permeability, and induce chronic inflammation, thus
leading to the initiation of CRC. Enterococcus faecalis, certain Escherichia coli strains,
enterotoxigenic Bacteroides fragilis (ETBF), and members of the family Enterobacteriaceae such as
Shigella, Citrobacter and Salmonella have been suggested as candidate “driver” bacteria. Such
“driver” bacteria are usually overrepresented in the early stages of CRC, such as colonic adenomas,
but not in tumor tissues as the disease progresses. During the oncogenic process, the
pro-carcinogenic effects of “driver” bacteria can eventually alter the tumor microenvironment,
resulting in the selective outgrowth of “passenger” bacteria that gradually replace the bacterial
“drivers”. Depending on the properties of the new niche, the possible “passengers” can be pathogens
that promote CRC development, such as Fusobacterium nucleatum, Streptococcus bovis/gallolyticus
and, with less evidence, Clostridium septicum, or probiotic commensals (e.g. members of the family
Coriobacteriaceae including some Slackia and Collinsella strains, and the genera Roseburia and
Faecalibacterium) that can suppress tumor progression.
This hypothesized bacterial model, which provides a dynamic picture of how bacterial
activities affect disease outcome along the colorectal adenoma-carcinoma sequence, is appealing but
still requires more scientific proof. The current supporting evidence for this hypothesis comes
mostly from gut microbial findings of cross-sectional or case-control studies either of
preneoplastic lesions (colonic adenoma) or at disease onset (CRC tumors), with almost no knowledge
of what happens in between. Ideally, a prospective longitudinal microbiota study with frequent
follow-up of patients from health to disease would be the best option to fill the gap. However,
given an incubation period of at least 5-20 years for CRC and the lack of standardized protocols in
current human microbiome research, the financial burden and technical immaturity limit the
feasibility of such a study design at present. Furthermore, human interventions, such as
colonoscopy, dietary changes or medication use, can have a substantial impact on the natural
pathological relationship between the gut microbiota and CRC. Therefore, snapshots of the gut
microbiota composition on the way from adenoma to carcinoma still add valuable knowledge by
capturing the instantaneous interactions between the microbial community and CRC progression in the
colon, as well as any uncontrolled human or environmental influences.
5.3.4 Clinical Applications
The current focus of gut microbiota research in CRC has been on identifying pathogenic
bacteria associated with colorectal carcinogenesis and their underlying molecular mechanisms, and
then applying such knowledge for preventive, diagnostic and therapeutic purposes. For example,
Fusobacterium nucleatum is one of the pathogens that have been studied most extensively in relation
to CRC. Studies have shown that F. nucleatum is overrepresented in the fecal samples of CRC
patients and follows a gradient along the colorectal adenoma-carcinoma sequence, in which the
difference between adenoma patients and healthy individuals is much smaller than that between CRC
patients and healthy controls [367, 371, 426, 427]. This property may limit the use of F. nucleatum
as a preventive tool for detecting CRC at an early stage, but it holds potential as a prognostic
marker in CRC, given recent evidence of a negative association between F. nucleatum and CRC
survival [436]. Besides, understanding the mechanisms by which a candidate pathogen influences
tumorigenesis in the colon is also important for designing targeted CRC treatments. Most gut
microorganisms affect human health by shaping the host immune system. A microbial lectin (Fap2) can
mediate the enrichment of F. nucleatum in colorectal tumors by binding and activating TIGIT, an
immunoregulatory signaling receptor on T cells and NK cells, thus inhibiting the killing of tumor
cells by NK cells and tumor-infiltrating lymphocytes [437]. Accordingly, antibodies targeting F.
nucleatum Fap2 that restore the impaired host anti-tumor immunity could be developed to treat F.
nucleatum-positive CRC. In addition, bacteria-induced DNA damage could also be targeted as a
strategy for CRC prevention. Animal studies and in vitro experiments on human cells have shown that
compounds blocking the synthesis of colibactin can suppress colibactin-induced DNA damage and
consequently reduce tumor cell proliferation induced by pks-harboring E. coli [438].
Beyond targeting single carcinogenic bacteria in CRC, the emergence of high-throughput,
high-resolution genome sequencing technology allows an individual’s microbiome to be profiled
cost-efficiently. CRC is a multifactorial disease whose identified risk factors include genetic and
environmental factors that also modulate microbial composition and function in the colon. Members
of the gut microbiota likely cooperate rather than work alone to adapt to environmental changes and
shape the colorectal microenvironment, thus promoting disease or reflecting the health condition of
the host. Therefore, given the limited current knowledge of any single candidate CRC-associated
microbe, detecting a group of bacteria as a characteristic biomarker to discriminate healthy from
diseased states, or different disease stages, is also appealing for early detection or diagnosis of
CRC. Moreover, fecal samples, as a surrogate for an overall view of the luminal and mucosal
microbial environment, can be collected frequently without invasive procedures, making the use of
the fecal microbiota as a screening tool more feasible. Recent studies have shown that 16S
rRNA-profiled microbial data from fecal samples, combined with clinical data, can improve the
accuracy of existing screening methods [341, 426].
As a “forgotten organ” of the human body, the gut microbiota functions mainly to maintain
human health rather than to cause disease. Some bacteria and their metabolites can modify the
intestinal environment by shaping immune development and altering immune responses, thereby
reducing chronic inflammation [172, 439]. Thus, these beneficial microbes may not specifically
target CRC, but they consequently decrease CRC risk. For instance, there has long been speculation
about how dietary intervention can affect CRC risk. Gut microbes can break down food into secondary
metabolites that can lead to both pro- and anti-inflammatory responses [440]. Meanwhile, dietary
changes can alter the colorectal bacterial composition even within a day [441, 442]. Therefore,
dietary intervention strategies have been proposed to prevent or treat CRC by shifting the gut
microbial community towards anti-tumor properties. Direct administration of short-chain fatty acid
(SCFA) supplements, metabolites usually produced from fiber by the microbiota, has been shown to
reduce colonic inflammation and cellular proliferation, leading to suppression of colorectal
tumorigenesis in a mouse model [443]. Additionally, a combination of two prebiotics,
galacto-oligosaccharide and inulin, which also are products from microbial activities, has been
shown to inhibit aberrant crypt foci formation in rats [444]. Moreover, some probiotics,
particularly those containing lactic acid bacteria, have drawn increasing attention for the
prevention of CRC development. Various Lactobacillus strains have been shown to inhibit the
inflammatory environment associated with CRC by preventing DNA damage [445], promoting expression
of anti-inflammatory cytokines (e.g. IL-10) [446], or inducing cellular apoptosis to reduce
colorectal tumor growth [447].
Furthermore, from a therapeutic perspective, other microbes not associated with CRC may alter
the tumor microenvironment in response to chemotherapy or immunotherapy by stimulating the host
immune system. Cancer immunotherapies using immune checkpoint inhibitors that target the cytotoxic
T lymphocyte-associated protein 4 (CTLA-4) and programmed death protein 1 (PD-1) pathways to
activate T cell-mediated antitumor immunity have successfully treated several cancers, such as lung
cancer and melanoma, but have limited effects in CRC [448, 449]. Recent studies have shown that
some Bacteroides species are required for anticancer immunotherapy by CTLA-4 blockade in sarcomas
[450], while commensal Bifidobacterium can promote antitumor immunity, thus increasing anti-PD-L1
efficacy in melanoma [451]. This evidence sheds light on potential treatments for CRC that
manipulate the local microbial sidekicks to directly target tumors as well as to facilitate other
therapeutic methods.
Technological advancements in culture-independent methods, next-generation sequencing
and bioinformatics tools have opened the door to studying the role of the human gut microbiome in CRC
progression, leading to the potential for personalized medicine to improve CRC outcomes. However,
more evidence is still needed to support a causal relationship between the gut microbiota and CRC
and to understand the underlying biology, which is essential for planning any translational strategies
for CRC prevention, diagnosis and therapy.
Chapter 5 References:
1. World Cancer Research Fund / American Institute for Cancer Research, Food, Nutrition, Physical Activity, and the Prevention
of Cancer: a Global Perspective. 2007.
2. Muto, T., H. Bussey, and B. Morson, The evolution of cancer of the colon and rectum. Cancer,
1975. 36(6): p. 2251-2270.
3. Bosman, F.T., et al., WHO classification of tumours of the digestive system: Carcinoma of the
colon and rectum. 2010, IARC: World Health Organization.
4. American Cancer Society, Colorectal Cancer Facts & Figures Special Edition 2005. Oklahoma City, OK:
American Cancer Society, 2005.
5. Conteduca, V., et al., Precancerous colorectal lesions. International journal of oncology, 2013.
43(4): p. 973-984.
6. Haggar, F.A. and R.P. Boushey, Colorectal cancer epidemiology: incidence, mortality, survival,
and risk factors. Clinics in colon and rectal surgery, 2009. 22(4): p. 191.
7. He, J. and J.E. Efron, Screening for colorectal cancer. Adv Surg, 2011. 45: p. 31-44.
8. Labianca, R., et al., Colorectal cancer: screening. Annals of oncology: official journal of the
European Society for Medical Oncology/ESMO, 2004. 16: p. ii127-32.
9. Levine, J.S. and D.J. Ahnen, Adenomatous polyps of the colon. New England Journal of
Medicine, 2006. 355(24): p. 2551-2557.
10. Fearon, E.R. and B. Vogelstein, A genetic model for colorectal tumorigenesis. Cell, 1990. 61(5):
p. 759-767.
11. Sobhani, I., et al., Microbial dysbiosis and colon carcinogenesis: could colon cancer be
considered a bacteria-related disease? Therap Adv Gastroenterol, 2013. 6(3): p. 215-29.
12. Baker, M., Stem cells: Fast and furious. Nature, 2009. 458(7241): p. 962-965.
13. World Cancer Research Fund / American Institute for Cancer Research, Colorectal Cancer 2011 Report: Food, Nutrition,
Physical Activity, and the Prevention of Colorectal Cancer (Continuous Update Project). 2011.
14. de Jong, A.E., et al., Prevalence of adenomas among young individuals at average risk for
colorectal cancer. The American journal of gastroenterology, 2005. 100(1): p. 139-143.
15. Bonnington, S.N. and M.D. Rutter, Surveillance of colonic polyps: are we getting it right?
World journal of gastroenterology, 2016. 22(6): p. 1925.
16. Labianca, R., et al., Colorectal cancer: screening. Ann Oncol, 2005. 16 Suppl 2: p. ii127-32.
17. Spring, K.J., et al., High prevalence of sessile serrated adenomas with BRAF mutations: a
prospective study of patients undergoing colonoscopy. Gastroenterology, 2006. 131(5): p.
1400-1407.
18. Schatzkin, A., et al., Interpreting precursor studies: what polyp trials tell us about large-bowel
cancer. Journal of the National Cancer Institute, 1994. 86(14): p. 1053-1057.
19. O'Brien, M.J., et al., The national polyp study. Gastroenterology, 1990. 98(2): p. 371-379.
20. Jass, J.R., Do all colorectal carcinomas arise in preexisting adenomas? World journal of
surgery, 1989. 13(1): p. 45-51.
21. Schuman, B.M., H. Simsek, and R.C. Lyons, The Association of Multiple Colonic Adenomatous
Polyps with Cancer of the Colon. American Journal of Gastroenterology, 1990. 85(7).
22. Eide, T.J., Risk of colorectal cancer in adenoma-bearing individuals within a defined
population. International journal of cancer, 1986. 38(2): p. 173-176.
23. Winawer, S.J., et al., Randomized comparison of surveillance intervals after colonoscopic
removal of newly diagnosed adenomatous polyps. New England Journal of Medicine, 1993.
328(13): p. 901-906.
24. Winawer, S.J., et al., Prevention of colorectal cancer by colonoscopic polypectomy. New
England Journal of Medicine, 1993. 329(27): p. 1977-1981.
25. Butterly, L.F., et al., Prevalence of clinically important histology in small adenomas. Clinical
Gastroenterology and Hepatology, 2006. 4(3): p. 343-348.
26. Lieberman, D., et al., Polyp size and advanced histology in patients undergoing colonoscopy
screening: implications for CT colonography. Gastroenterology, 2008. 135(4): p. 1100-1105.
27. Atkin, W.S., B.C. Morson, and J. Cuzick, Long-term risk of colorectal cancer after excision of
rectosigmoid adenomas. New England Journal of Medicine, 1992. 326(10): p. 658-662.
28. Cottet, V., et al., Long-term risk of colorectal cancer after adenoma removal: a population-
based cohort study. Gut, 2012. 61(8): p. 1180-1186.
29. Brenner, H., et al., Role of Colonoscopy and Polyp Characteristics in Colorectal Cancer After
Colonoscopic Polyp Detection: A Population-Based Case–Control Study. Annals of internal
medicine, 2012. 157(4): p. 225-232.
30. Brenner, H., et al., Risk of colorectal cancer after detection and removal of adenomas at
colonoscopy: population-based case-control study. Journal of clinical oncology, 2012. 30(24):
p. 2969-2976.
31. Brenner, H., et al., Case-control study supports extension of surveillance interval after
colonoscopic polypectomy to at least 5 yr. The American journal of gastroenterology, 2007.
102(8): p. 1739.
32. Martínez, M.E., et al., A pooled analysis of advanced colorectal neoplasia diagnoses after
colonoscopic polypectomy. Gastroenterology, 2009. 136(3): p. 832-841.
33. Hassan, C., et al., Systematic review with meta-analysis: the incidence of advanced
neoplasia after polypectomy in patients with and without low-risk adenomas. Alimentary
pharmacology & therapeutics, 2014. 39(9): p. 905-912.
34. Radtke, F. and H. Clevers, Self-renewal and cancer of the gut: two sides of a coin. Science,
2005. 307(5717): p. 1904-1909.
35. Zeki, S.S., T.A. Graham, and N.A. Wright, Stem cells and their implications for colorectal
cancer. Nature Reviews Gastroenterology and Hepatology, 2011. 8(2): p. 90-100.
36. Pino, M.S. and D.C. Chung, The chromosomal instability pathway in colon cancer.
Gastroenterology, 2010. 138(6): p. 2059-2072.
37. Lengauer, C., K.W. Kinzler, and B. Vogelstein, Genetic instabilities in human cancers. Nature,
1998. 396(6712): p. 643-649.
38. Vasen, H.F., et al., Revised guidelines for the clinical management of Lynch syndrome (HNPCC):
recommendations by a group of European experts. Gut, 2013: p. gutjnl-2012-304356.
39. Snover, D., Serrated polyps of the colon and rectum and serrated polyposis. WHO
classification of tumours of the digestive system, 2010: p. 160-165.
40. Leggett, B. and V. Whitehall, Role of the serrated pathway in colorectal cancer pathogenesis.
Gastroenterology, 2010. 138(6): p. 2088-2100.
41. Snover, D.C., Update on the serrated pathway to colorectal carcinoma. Human pathology,
2011. 42(1): p. 1-10.
42. Bettington, M., et al., The serrated pathway to colorectal carcinoma: current concepts and
challenges. Histopathology, 2013. 62(3): p. 367-386.
43. Clark, J.C., et al., Prevalence of polyps in an autopsy series from areas with varying incidence
of large-bowel cancer. International journal of cancer, 1985. 36(2): p. 179-186.
44. Morimoto, L.M., et al., Risk factors for hyperplastic and adenomatous polyps. Cancer
Epidemiology and Prevention Biomarkers, 2002. 11(10): p. 1012-1018.
45. Isbister, W.H., Hyperplastic polyps. ANZ Journal of Surgery, 1993. 63(3): p. 175-180.
46. DiSario, J.A., et al., Prevalence and malignant potential of colorectal polyps in asymptomatic,
average-risk men. American Journal of Gastroenterology, 1991. 86(8).
47. Farraye, F.A. and M. Wallace, Clinical significance of small polyps found during screening with
flexible sigmoidoscopy. Gastrointestinal endoscopy clinics of North America, 2002. 12(1): p.
41-51.
48. Provenzale, D., et al., Risk for colon adenomas in patients with rectosigmoid hyperplastic
polyps. Annals of internal medicine, 1990. 113(10): p. 760-763.
49. Zauber, A., et al. The National Polyp Study (NPS): the association of colonic
hyperplastic polyps and adenomas. in American Journal of Gastroenterology. 1988.
Williams & Wilkins, 351 West Camden St, Baltimore, MD 21201-2436.
50. Sciallero, S., et al., Distal hyperplastic polyps do not predict proximal adenomas: results from
a multicentric study of colorectal adenomas. Gastrointestinal endoscopy, 1997. 46(2): p. 124-
130.
51. Ansher, A.F., et al., Hyperplastic colonic polyps as a marker for adenomatous colonic polyps.
American Journal of Gastroenterology, 1989. 84(2).
52. Provenzale, D., et al., Colon adenomas in patients with hyperplastic polyps. Journal of clinical
gastroenterology, 1988. 10(1): p. 46-49.
53. Greenberg, E.R., et al., A clinical trial of antioxidant vitamins to prevent colorectal adenoma.
New England journal of medicine, 1994. 331(3): p. 141-147.
54. Baron, J., et al., Calcium supplements for the prevention of colorectal adenomas. New
England Journal of Medicine, 1999. 340(2): p. 101-107.
55. Bensen, S.P., et al., Colorectal hyperplastic polyps and risk of recurrence of adenomas and
hyperplastic polyps. The Lancet, 1999. 354(9193): p. 1873-1874.
56. Jass, J.R., et al., Emerging concepts in colorectal neoplasia. Gastroenterology, 2002. 123(3): p.
862-876.
57. Iino, H., et al., DNA microsatellite instability in hyperplastic polyps, serrated adenomas, and
mixed polyps: a mild mutator pathway for colorectal cancer? Journal of Clinical Pathology,
1999. 52(1): p. 5-9.
58. Munding, J. and A. Tannapfel, Epidemiology of Colorectal Adenomas and Histopathological
Assessment of Endoscopic Specimens in the Colorectum. Visceral Medicine, 2014. 30(1): p. 10-
16.
59. Crockett, S.D., et al., Sessile serrated adenomas: an evidence-based guide to management.
Clinical Gastroenterology and Hepatology, 2015. 13(1): p. 11-26. e1.
60. Sweetser, S., T.C. Smyrk, and F.A. Sinicrope, Serrated colon polyps as precursors to colorectal
cancer. Clinical Gastroenterology and Hepatology, 2013. 11(7): p. 760-767.
61. Sheridan, T.B., et al., Sessile serrated adenomas with low-and high-grade dysplasia and early
carcinomas: an immunohistochemical study of serrated lesions “caught in the act”. American
journal of clinical pathology, 2006. 126(4): p. 564-571.
62. Higuchi, T., K. Sugihara, and J. Jass, Demographic and pathological characteristics of serrated
polyps of colorectum. Histopathology, 2005. 47(1): p. 32-40.
63. Mäkinen, M., Colorectal serrated adenocarcinoma. Histopathology, 2007. 50(1): p. 131-150.
64. Sawhney, M.S., et al., Microsatellite instability in interval colon cancers. Gastroenterology,
2006. 131(6): p. 1700-1705.
65. Whitehall, V.L., et al., Methylation of O–6-methylguanine DNA methyltransferase
characterizes a subset of colorectal cancer with low-level DNA microsatellite instability.
Cancer research, 2001. 61(3): p. 827-830.
66. Jass, J.R., et al., Characterisation of a subtype of colorectal cancer combining features of the
suppressor and mild mutator pathways. Journal of clinical pathology, 1999. 52(6): p. 455-460.
67. Aust, D.E., G.B. Baretton, and Members of the Working Group GI-Pathology of the German Society of Pathology, Serrated polyps of the colon and
rectum (hyperplastic polyps, sessile serrated adenomas, traditional serrated adenomas, and
mixed polyps)—proposal for diagnostic criteria. Virchows Archiv, 2010. 457(3): p. 291-297.
68. Rex, D.K., et al., Serrated lesions of the colorectum: review and recommendations from an
expert panel. The American journal of gastroenterology, 2012. 107(9): p. 1315-1329.
69. American Cancer Society, Cancer facts and figures 2016. 2016, Atlanta: American Cancer Society.
70. Levin, B., et al., Screening and surveillance for the early detection of colorectal cancer and
adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US
Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. CA: a
cancer journal for clinicians, 2008. 58(3): p. 130-160.
71. Qaseem, A., et al., Screening for colorectal cancer: a guidance statement from the American
College of Physicians. Annals of internal medicine, 2012. 156(5): p. 378-386.
72. American Cancer Society, Colorectal Cancer Facts & Figures 2017-2019. Atlanta: American Cancer Society,
2017.
73. Brenner, H., C. Stock, and M. Hoffmeister, Effect of screening sigmoidoscopy and screening
colonoscopy on colorectal cancer incidence and mortality: systematic review and meta-
analysis of randomised controlled trials and observational studies. BMJ, 2014. 348: p. g2467.
74. Cotterchio, M., et al., Colorectal screening is associated with reduced colorectal cancer risk: a
case–control study within the population-based Ontario Familial Colorectal Cancer Registry.
Cancer Causes and Control, 2005. 16(7): p. 865-875.
75. Kahi, C.J., et al., Effect of screening colonoscopy on colorectal cancer incidence and mortality.
Clinical gastroenterology and hepatology, 2009. 7(7): p. 770-775.
76. Manser, C.N., et al., Colonoscopy screening markedly reduces the occurrence of colon
carcinomas and carcinoma-related death: a closed cohort study. Gastrointestinal endoscopy,
2012. 76(1): p. 110-117.
77. Doubeni, C.A., et al., Screening colonoscopy and risk for incident late-stage colorectal cancer
diagnosis in average-risk adults. Ann Intern Med, 2013. 158(5): p. 312-320.
78. Nishihara, R., et al., Long-term colorectal-cancer incidence and mortality after lower
endoscopy. New England Journal of Medicine, 2013. 369(12): p. 1095-1105.
79. Brenner, H., et al., Reduced risk of colorectal cancer up to 10 years after screening,
surveillance, or diagnostic colonoscopy. Gastroenterology, 2014. 146(3): p. 709-717.
80. Zauber, A.G., et al., Colonoscopic polypectomy and long-term prevention of colorectal-cancer
deaths. New England Journal of Medicine, 2012. 366(8): p. 687-696.
81. Jover, R., et al., Rationale and design of the European Polyp Surveillance (EPoS) trials.
Endoscopy, 2016. 48(06): p. 571-578.
82. Kaminski, M.F., et al., The NordICC Study: rationale and design of a randomized trial on
colonoscopy screening for colorectal cancer. Endoscopy, 2012. 44(07): p. 695-702.
83. ClinicalTrials.gov. CONFIRM trial. Last updated on August 10, 2017. Available from
https://clinicaltrials.gov/ct2/show/NCT01239082.
84. ClinicalTrials.gov. SCREESCO trials. Last updated on October 2, 2017. Available from
https://clinicaltrials.gov/ct2/show/NCT02078804.
85. Saini, S.D., H.M. Kim, and P. Schoenfeld, Incidence of advanced adenomas at surveillance
colonoscopy in patients with a personal history of colon adenomas: a meta-analysis and
systematic review. Gastrointestinal endoscopy, 2006. 64(4): p. 614-626.
86. Yamaji, Y., et al., Incidence and recurrence rates of colorectal adenomas estimated by
annually repeated colonoscopies on asymptomatic Japanese. Gut, 2004. 53(4): p. 568-572.
87. Loeve, F., et al., Colorectal cancer risk in adenoma patients: A nation-wide study.
International journal of cancer, 2004. 111(1): p. 147-151.
88. Robertson, D.J., et al., Colorectal cancer in patients under close colonoscopic surveillance.
Gastroenterology, 2005. 129(1): p. 34-41.
89. Rex, D.K., et al., Colonoscopic miss rates of adenomas determined by back-to-back
colonoscopies. Gastroenterology, 1997. 112(1): p. 24-28.
90. Lieberman, D.A., et al., Guidelines for colonoscopy surveillance after screening and
polypectomy: a consensus update by the US Multi-Society Task Force on Colorectal Cancer.
Gastroenterology, 2012. 143(3): p. 844-857.
91. Kaminski, M.F., et al., Quality indicators for colonoscopy and the risk of interval cancer. New
England Journal of Medicine, 2010. 362(19): p. 1795-1803.
92. Bressler, B., et al., Rates of new or missed colorectal cancers after colonoscopy and their risk
factors: a population-based analysis. Gastroenterology, 2007. 132(1): p. 96-102.
93. Farrar, W.D., et al., Colorectal cancers found after a complete colonoscopy. Clinical
Gastroenterology and Hepatology, 2006. 4(10): p. 1259-1264.
94. Robertson, D.J., et al., Colorectal cancers soon after colonoscopy: a pooled multicohort
analysis. Gut, 2013: p. gutjnl-2012-303796.
95. Brenner, H., et al., Interval cancers after negative colonoscopy: population-based case-control
study. Gut, 2011: p. gutjnl-2011-301531.
96. le Clercq, C.M., et al., Postcolonoscopy colorectal cancers are preventable: a population-
based study. Gut, 2013: p. gutjnl-2013-304880.
97. Singh, S., et al., Prevalence, risk factors, and outcomes of interval colorectal cancers: a
systematic review and meta-analysis. The American journal of gastroenterology, 2014. 109(9):
p. 1375-1389.
98. Boland, C.R. and A. Goel, Microsatellite instability in colorectal cancer. Gastroenterology,
2010. 138(6): p. 2073-2087. e3.
99. Saini, S.D., et al., Why don't gastroenterologists follow colon polyp surveillance guidelines?:
results of a national survey. Journal of clinical gastroenterology, 2009. 43(6): p. 554-558.
100. Menees, S.B., et al., Adherence to recommended intervals for surveillance colonoscopy in
average-risk patients with 1 to 2 small (< 1 cm) polyps on screening colonoscopy.
Gastrointestinal endoscopy, 2014. 79(4): p. 551-557.
101. van Heijningen, E.-M.B., et al., Adherence to surveillance guidelines after removal of
colorectal adenomas: a large, community-based study. Gut, 2015: p. gutjnl-2013-306453.
102. Schatzkin, A., et al., Lack of effect of a low-fat, high-fiber diet on the recurrence of colorectal
adenomas. New England Journal of Medicine, 2000. 342(16): p. 1149-1155.
103. Martínez, M.E., et al., Adenoma characteristics as risk factors for recurrence of advanced
adenomas. Gastroenterology, 2001. 120(5): p. 1077-1083.
104. Lieberman, D.A., et al., Five-year colon surveillance after screening colonoscopy.
Gastroenterology, 2007. 133(4): p. 1077-1085.
105. Baron, J.A., et al., A randomized trial of aspirin to prevent colorectal adenomas. New England
Journal of Medicine, 2003. 348(10): p. 891-899.
106. Pabby, A., et al., Analysis of colorectal cancer occurrence during surveillance colonoscopy in
the dietary Polyp Prevention Trial. Gastrointestinal endoscopy, 2005. 61(3): p. 385-391.
107. Lagier, J.C., et al., Microbial culturomics: paradigm shift in the human gut microbiome study.
Clinical Microbiology and Infection, 2012. 18(12): p. 1185-1193.
108. Proctor, L.M., The human microbiome project in 2011 and beyond. Cell host & microbe, 2011.
10(4): p. 287-291.
109. Wu, G.D. and J.D. Lewis, Analysis of the human gut microbiome and association with disease.
Clinical gastroenterology and hepatology: the official clinical practice journal of the American
Gastroenterological Association, 2013. 11(7).
110. Chow, J., et al., Host–bacterial symbiosis in health and disease. Advances in immunology,
2010. 107: p. 243.
111. Fukuda, S., et al., Bifidobacteria can protect from enteropathogenic infection through
production of acetate. Nature, 2011. 469(7331): p. 543-547.
112. Sonnenburg, J.L., et al., Glycan foraging in vivo by an intestine-adapted bacterial symbiont.
Science, 2005. 307(5717): p. 1955-1959.
113. Cantarel, B.L., V. Lombard, and B. Henrissat, Complex carbohydrate utilization by the healthy
human microbiome. PloS one, 2012. 7(6): p. e28742.
114. Hullar, M.A., A.N. Burnett-Hartman, and J.W. Lampe, Gut microbes, diet, and cancer, in
Advances in nutrition and cancer. 2014, Springer. p. 377-399.
115. Jones, M.L., et al., The human microbiome and bile acid metabolism: dysbiosis,
dysmetabolism, disease and intervention. Expert opinion on biological therapy, 2014. 14(4): p.
467-482.
116. Akin, H. and N. Tozun, Diet, microbiota, and colorectal cancer. J Clin Gastroenterol, 2014. 48
Suppl 1: p. S67-9.
117. Hold, G.L., et al., Role of the gut microbiota in inflammatory bowel disease pathogenesis:
what have we learnt in the past 10 years. World J Gastroenterol, 2014. 20(5): p. 1192-1210.
118. Turnbaugh, P.J., et al., A core gut microbiome in obese and lean twins. Nature, 2009.
457(7228): p. 480-484.
119. Candela, M., et al., Inflammation and colorectal cancer, when microbiota-host mutualism
breaks. World J Gastroenterol, 2014. 20(4): p. 908-922.
120. Larsen, N., et al., Gut microbiota in human adults with type 2 diabetes differs from non-
diabetic adults. PloS one, 2010. 5(2): p. e9085.
121. Palmer, C., et al., Development of the human infant intestinal microbiota. PLoS biology, 2007.
5(7): p. e177.
122. Koenig, J.E., et al., Succession of microbial consortia in the developing infant gut microbiome.
Proceedings of the National Academy of Sciences, 2011. 108(Supplement 1): p. 4578-4585.
123. Yatsunenko, T., et al., Human gut microbiome viewed across age and geography. Nature,
2012. 486(7402): p. 222-227.
124. Escherich, T., The intestinal bacteria of the neonate and breast-fed infant. Clinical Infectious
Diseases, 1988. 10(6): p. 1220-1225.
125. Dominguez-Bello, M.G., et al., Delivery mode shapes the acquisition and structure of the
initial microbiota across multiple body habitats in newborns. Proceedings of the National
Academy of Sciences, 2010. 107(26): p. 11971-11975.
126. Dominguez-Bello, M.G., et al., Partial restoration of the microbiota of cesarean-born infants
via vaginal microbial transfer. Nature medicine, 2016. 22(3): p. 250-253.
127. Biasucci, G., et al., Cesarean delivery may affect the early biodiversity of intestinal bacteria.
The Journal of nutrition, 2008. 138(9): p. 1796S-1800S.
128. Bokulich, N.A., et al., Antibiotics, birth mode, and diet shape microbiome maturation during
early life. Science translational medicine, 2016. 8(343): p. 343ra82-343ra82.
129. Bäckhed, F., et al., Dynamics and stabilization of the human gut microbiome during the first
year of life. Cell host & microbe, 2015. 17(5): p. 690-703.
130. Thompson, A.L., et al., Milk-and solid-feeding practices and daycare attendance are
associated with differences in bacterial diversity, predominant communities, and metabolic
and immune function of the infant gut microbiome. Frontiers in cellular and infection
microbiology, 2015. 5.
131. Azad, M.B., et al., Gut microbiota of healthy Canadian infants: profiles by mode of delivery
and infant diet at 4 months. Canadian Medical Association Journal, 2013. 185(5): p. 385-394.
132. Fallani, M., et al., Intestinal microbiota of 6-week-old infants across Europe: geographic
influence beyond delivery mode, breast-feeding, and antibiotics. Journal of pediatric
gastroenterology and nutrition, 2010. 51(1): p. 77-84.
133. Penders, J., et al., Factors influencing the composition of the intestinal microbiota in early
infancy. Pediatrics, 2006. 118(2): p. 511-521.
134. Rogier, E.W., et al., Secretory antibodies in breast milk promote long-term intestinal
homeostasis by regulating the gut microbiota and host gene expression. Proceedings of the
National Academy of Sciences, 2014. 111(8): p. 3074-3079.
135. Hardy, H., et al., Probiotics, prebiotics and immunomodulation of gut mucosal defences:
homeostasis and immunopathology. Nutrients, 2013. 5(6): p. 1869-1912.
136. Ravcheev, D.A., et al., Polysaccharides utilization in human gut bacterium Bacteroides
thetaiotaomicron: comparative genomics reconstruction of metabolic and regulatory
networks. BMC genomics, 2013. 14(1): p. 873.
137. Mangin, I., et al., Amoxicillin treatment modifies the composition of Bifidobacterium species in
infant intestinal microbiota. Anaerobe, 2010. 16(4): p. 433-438.
138. Nermes, M., et al., Furry pets modulate gut microbiota composition in infants at risk for
allergic disease. Journal of Allergy and Clinical Immunology, 2015. 136(6): p. 1688-1690. e1.
139. Hopkins, M., R. Sharp, and G. Macfarlane, Age and disease related changes in intestinal
bacterial populations assessed by cell culture, 16S rRNA abundance, and community cellular
fatty acid profiles. Gut, 2001. 48(2): p. 198-205.
140. Saulnier, D.M., et al., Gastrointestinal microbiome signatures of pediatric patients with
irritable bowel syndrome. Gastroenterology, 2011. 141(5): p. 1782-1791.
141. Human Microbiome Project Consortium, Structure, function and diversity of the healthy human microbiome.
Nature, 2012. 486(7402): p. 207-214.
142. Ley, R.E., et al., Microbial ecology: human gut microbes associated with obesity. Nature, 2006.
444(7122): p. 1022-1023.
143. Turnbaugh, P.J., et al., An obesity-associated gut microbiome with increased capacity for
energy harvest. Nature, 2006. 444(7122): p. 1027-1031.
144. Qin, J., et al., A human gut microbial gene catalogue established by metagenomic sequencing.
Nature, 2010. 464(7285): p. 59-65.
145. Jakobsson, H.E., et al., Short-term antibiotic treatment has differing long-term impacts on the
human throat and gut microbiome. PloS one, 2010. 5(3): p. e9836.
146. Dethlefsen, L. and D.A. Relman, Incomplete recovery and individualized responses of the
human distal gut microbiota to repeated antibiotic perturbation. Proceedings of the National
Academy of Sciences, 2011. 108(Supplement 1): p. 4554-4561.
147. Caporaso, J.G., et al., Moving pictures of the human microbiome. Genome biology, 2011.
12(5): p. R50.
148. Costello, E.K., et al., Bacterial community variation in human body habitats across space and
time. Science, 2009. 326(5960): p. 1694-1697.
149. David, L.A., et al., Diet rapidly and reproducibly alters the human gut microbiome. Nature,
2014. 505(7484): p. 559-563.
150. Folke, C., et al., Regime shifts, resilience, and biodiversity in ecosystem management. Annual
Review of Ecology, Evolution, and Systematics, 2004. 35.
151. Mariat, D., et al., The Firmicutes/Bacteroidetes ratio of the human microbiota changes with
age. BMC microbiology, 2009. 9(1): p. 123.
152. Claesson, M.J., et al., Composition, variability, and temporal stability of the intestinal
microbiota of the elderly. Proceedings of the National Academy of Sciences, 2011.
108(Supplement 1): p. 4586-4591.
153. van Tongeren, S.P., et al., Fecal microbiota composition and frailty. Applied and
environmental microbiology, 2005. 71(10): p. 6438-6442.
154. Claesson, M.J., et al., Gut microbiota composition correlates with diet and health in the
elderly. Nature, 2012. 488(7410): p. 178-184.
155. Voreades, N., A. Kozil, and T.L. Weir, Diet and the development of the human intestinal
microbiome. Frontiers in microbiology, 2014. 5.
156. Van de Merwe, J., J. Stegeman, and M. Hazenberg, The resident faecal flora is determined by
genetic characteristics of the host. Implications for Crohn's disease? Antonie Van
Leeuwenhoek, 1983. 49(2): p. 119-124.
157. Zoetendal, E.G., et al., The host genotype affects the bacterial community in the human
gastronintestinal tract. Microbial ecology in health and disease, 2001. 13(3): p. 129-134.
158. Stewart, J.A., V.S. Chadwick, and A. Murray, Investigations into the influence of host genetics
on the predominant eubacteria in the faecal microflora of children. Journal of medical
microbiology, 2005. 54(12): p. 1239-1242.
159. Goodrich, J.K., et al., Human genetics shape the gut microbiome. Cell, 2014. 159(4): p. 789-
799.
160. Teucher, B., et al., Dietary patterns and heritability of food choice in a UK female twin cohort.
Twin Research and Human Genetics, 2007. 10(5): p. 734-748.
161. Vinkhuyzen, A., et al., Genetic influences on ‘environmental’ factors. Genes, Brain and
Behavior, 2010. 9(3): p. 276-287.
162. McKnite, A.M., et al., Murine gut microbiota is defined by host genetics and modulates
variation of metabolic traits. PloS one, 2012. 7(6): p. e39191.
163. Leamy, L.J., et al., Host genetics and diet, but not immunoglobulin A expression, converge to
shape compositional features of the gut microbiome in an advanced intercross population of
mice. Genome biology, 2014. 15(12): p. 552.
164. Benson, A.K., et al., Individuality in gut microbiota composition is a complex polygenic trait
shaped by multiple environmental and host genetic factors. Proceedings of the National
Academy of Sciences, 2010. 107(44): p. 18933-18938.
165. Spor, A., O. Koren, and R. Ley, Unravelling the effects of the environment and host genotype
on the gut microbiome. Nature Reviews Microbiology, 2011. 9(4): p. 279-290.
166. Goodrich, J.K., et al., Genetic determinants of the gut microbiome in UK twins. Cell host &
microbe, 2016. 19(5): p. 731-743.
167. Nøhr, M.K., et al., GPR41/FFAR3 and GPR43/FFAR2 as cosensors for short-chain fatty acids in
enteroendocrine cells vs FFAR3 in enteric neurons and FFAR2 in enteric leukocytes.
Endocrinology, 2013. 154(10): p. 3552-3564.
168. Delaere, F., et al., The role of sodium-coupled glucose co-transporter 3 in the satiety effect of
portal glucose sensing. Molecular metabolism, 2013. 2(1): p. 47-53.
169. Cani, P.D., et al., Gut microbiota fermentation of prebiotics increases satietogenic and incretin
gut peptide production with consequences for appetite sensation and glucose response after
a meal. The American journal of clinical nutrition, 2009. 90(5): p. 1236-1243.
170. Plöger, S., et al., Microbial butyrate and its role for barrier function in the gastrointestinal
tract. Annals of the New York Academy of Sciences, 2012. 1258(1): p. 52-59.
171. Maslowski, K.M., et al., Regulation of inflammatory responses by gut microbiota and
chemoattractant receptor GPR43. Nature, 2009. 461(7268): p. 1282-1286.
172. Arpaia, N., et al., Metabolites produced by commensal bacteria promote peripheral
regulatory T-cell generation. Nature, 2013. 504(7480): p. 451-455.
173. Cani, P.D., et al., Changes in gut microbiota control metabolic endotoxemia-induced
inflammation in high-fat diet–induced obesity and diabetes in mice. Diabetes, 2008. 57(6): p.
1470-1481.
174. Choi, Y., et al., Gut microbe-derived extracellular vesicles induce insulin resistance, thereby
impairing glucose metabolism in skeletal muscle. Scientific reports, 2015. 5.
175. Turnbaugh, P.J., et al., The effect of diet on the human gut microbiome: a metagenomic
analysis in humanized gnotobiotic mice. Science translational medicine, 2009. 1(6): p. 6ra14-
6ra14.
176. Wu, G.D., et al., Linking long-term dietary patterns with gut microbial enterotypes. Science,
2011. 334(6052): p. 105-108.
177. Cani, P.D., et al., Selective increases of bifidobacteria in gut microflora improve high-fat-diet-
induced diabetes in mice through a mechanism associated with endotoxaemia. Diabetologia,
2007. 50(11): p. 2374-2383.
178. Moore, J.H., et al., Defined nutrient diets alter susceptibility to Clostridium difficile associated
disease in a murine model. PloS one, 2015. 10(7): p. e0131829.
179. Ventura, M., et al., Genomics of Actinobacteria: tracing the evolutionary history of an ancient
phylum. Microbiology and molecular biology reviews, 2007. 71(3): p. 495-548.
180. Rijkers, G.T., et al., Health benefits and health claims of probiotics: bridging science and
marketing. British Journal of Nutrition, 2011. 106(9): p. 1291-1296.
181. Lisko, D.J., G.P. Johnston, and C.G. Johnston, Effects of Dietary Yogurt on the Healthy Human
Gastrointestinal (GI) Microbiome. Microorganisms, 2017. 5(1): p. 6.
182. Collado, M., J. Meriluoto, and S. Salminen, Role of commercial probiotic strains against
human pathogen adhesion to intestinal mucus. Letters in applied microbiology, 2007. 45(4): p.
454-460.
183. Isolauri, E., Probiotics in human disease. The American journal of clinical nutrition, 2001.
73(6): p. 1142S-1146S.
184. Eloe-Fadrosh, E.A., et al., Functional dynamics of the gut microbiome in elderly people during
probiotic consumption. MBio, 2015. 6(2): p. e00231-15.
185. Druart, C., et al., Modulation of the gut microbiota by nutrients with prebiotic and probiotic
properties. Advances in Nutrition: An International Review Journal, 2014. 5(5): p. 624S-633S.
186. Rafter, J., et al., Dietary synbiotics reduce cancer risk factors in polypectomized and colon
cancer patients. The American journal of clinical nutrition, 2007. 85(2): p. 488-496.
187. Azad, M.B., et al., Infant gut microbiota and the hygiene hypothesis of allergic disease: impact
of household pets and siblings on microbiota composition and diversity. Allergy, Asthma &
Clinical Immunology, 2013. 9(1): p. 15.
188. Dethlefsen, L., et al., The pervasive effects of an antibiotic on the human gut microbiota, as
revealed by deep 16S rRNA sequencing. PLoS biology, 2008. 6(11): p. e280.
189. Sullivan, Å., C. Edlund, and C.E. Nord, Effect of antimicrobial agents on the ecological balance
of human microflora. The Lancet infectious diseases, 2001. 1(2): p. 101-114.
190. Jernberg, C., et al., Long-term ecological impacts of antibiotic administration on the human
intestinal microbiota. The ISME journal, 2007. 1(1): p. 56-66.
191. Sommer, M.O., G. Dantas, and G.M. Church, Functional characterization of the antibiotic
resistance reservoir in the human microflora. Science, 2009. 325(5944): p. 1128-1131.
192. Gagliani, N., et al., The fire within: microbes inflame tumors. Cell, 2014. 157(4): p. 776-783.
193. Nobel, Y.R., et al., Metabolic and metagenomic outcomes from early-life pulsed antibiotic
treatment. Nature communications, 2015. 6.
194. Ding, T. and P.D. Schloss, Dynamics and associations of microbial community types across the
human body. Nature, 2014. 509(7500): p. 357-360.
195. Markle, J.G., et al., Sex differences in the gut microbiome drive hormone-dependent
regulation of autoimmunity. Science, 2013. 339(6123): p. 1084-1088.
196. Nakayama, J., et al., Diversity in gut bacterial community of school-age children in Asia.
Scientific reports, 2015. 5.
197. Schuebel, K.E., et al., Comparing the DNA hypermethylome with gene mutations in human
colorectal cancer. PLoS Genet, 2007. 3(9): p. e157.
198. Tyakht, A.V., et al., Human gut microbiota community structures in urban and rural
populations in Russia. Nature communications, 2013. 4.
199. Lax, S., et al., Longitudinal analysis of microbial interaction between humans and the indoor
environment. Science, 2014. 345(6200): p. 1048-1052.
200. de Mello, R.M., et al., Lactobacilli and bifidobacteria in the feces of schoolchildren of two
different socioeconomic groups: children from a favela and children from a private school.
Jornal de pediatria, 2009. 85(4): p. 307-314.
201. Smits, S.A., et al., Seasonal cycling in the gut microbiome of the Hadza hunter-gatherers of
Tanzania. Science, 2017. 357(6353): p. 802-806.
202. Davenport, E.R., et al., Seasonal variation in human gut microbiome composition. PloS one,
2014. 9(3): p. e90731.
203. De Filippo, C., et al., Impact of diet in shaping gut microbiota revealed by a comparative study
in children from Europe and rural Africa. Proceedings of the National Academy of Sciences,
2010. 107(33): p. 14691-14696.
204. Rowland, I., et al., Gut microbiota functions: metabolism of nutrients and other food
components. European Journal of Nutrition, 2017: p. 1-24.
205. van der Beek, C.M., et al., Role of short-chain fatty acids in colonic inflammation,
carcinogenesis, and mucosal protection and healing. Nutrition Reviews, 2017. 75(4): p. 286-
305.
206. Hill, M.J., Intestinal flora and endogenous vitamin synthesis. Eur J Cancer Prev, 1997. 6 Suppl
1: p. S43-5.
207. Neis, E.P., C.H. Dejong, and S.S. Rensen, The role of microbial amino acid metabolism in host
metabolism. Nutrients, 2015. 7(4): p. 2930-2946.
208. Ajouz, H., D. Mukherji, and A. Shamseddine, Secondary bile acids: an underrecognized cause
of colon cancer. World journal of surgical oncology, 2014. 12(1): p. 164.
209. Wang, H.-B., et al., Butyrate enhances intestinal epithelial barrier function via up-regulation
of tight junction protein Claudin-1 transcription. Digestive diseases and sciences, 2012. 57(12):
p. 3126-3135.
210. Gaudier, E., et al., Butyrate specifically modulates MUC gene expression in intestinal epithelial
goblet cells deprived of glucose. American Journal of Physiology-Gastrointestinal and Liver
Physiology, 2004. 287(6): p. G1168-G1174.
211. Johansson, M.E., et al., The inner of the two Muc2 mucin-dependent mucus layers in colon is
devoid of bacteria. Proceedings of the national academy of sciences, 2008. 105(39): p. 15064-
15069.
212. Hernández-Chirlaque, C., et al., Germ-free and antibiotic-treated mice are highly susceptible
to epithelial injury in DSS colitis. Journal of Crohn's and Colitis, 2016. 10(11): p. 1324-1335.
213. Mason, K.L., et al., Overview of gut immunology, in GI microbiota and regulation of the
immune system. 2008, Springer. p. 1-14.
214. Zmora, N., et al., Inflammasomes and intestinal inflammation. Mucosal Immunol, 2017. 10(4):
p. 865-883.
215. Powell, N. and T.T. MacDonald, Recent advances in gut immunology. Parasite Immunol, 2017.
39(6).
216. O'Hara, A.M. and F. Shanahan, The gut flora as a forgotten organ. EMBO reports, 2006. 7(7):
p. 688-693.
217. Bain, C.C. and A.M. Mowat, Macrophages in intestinal homeostasis and inflammation.
Immunological reviews, 2014. 260(1): p. 102-117.
218. Zindl, C.L., et al., IL-22–producing neutrophils contribute to antimicrobial defense and
restitution of colonic epithelial integrity during colitis. Proceedings of the National Academy
of Sciences, 2013. 110(31): p. 12768-12773.
219. Chu, V.T., et al., Eosinophils promote generation and maintenance of immunoglobulin-A-
expressing plasma cells and contribute to gut immune homeostasis. Immunity, 2014. 40(4): p.
582-593.
220. Bekiaris, V., E.K. Persson, and W.W. Agace, Intestinal dendritic cells in the regulation of
mucosal immunity. Immunological reviews, 2014. 260(1): p. 86-101.
221. Mortha, A., et al., Microbiota-dependent crosstalk between macrophages and ILC3 promotes
intestinal homeostasis. Science, 2014. 343(6178): p. 1249288.
222. Slack, E., M.L. Balmer, and A.J. Macpherson, B cells as a critical node in the microbiota–host
immune system network. Immunological reviews, 2014. 260(1): p. 50-66.
223. Lee, Y.K. and S.K. Mazmanian, Has the microbiota played a critical role in the evolution of the
adaptive immune system? Science, 2010. 330(6012): p. 1768-1773.
224. Round, J.L. and S.K. Mazmanian, The gut microbiota shapes intestinal immune responses
during health and disease. Nature Reviews Immunology, 2009. 9(5): p. 313-323.
225. Peterson, D.A., et al., IgA response to symbiotic bacteria as a mediator of gut homeostasis.
Cell host & microbe, 2007. 2(5): p. 328-339.
226. Macpherson, A.J., M.B. Geuking, and K.D. McCoy, Immunoglobulin A: a bridge between
innate and adaptive immunity. Current opinion in gastroenterology, 2011. 27(6): p. 529-533.
227. Ivanov, I.I., et al., Induction of intestinal Th17 cells by segmented filamentous bacteria. Cell,
2009. 139(3): p. 485-498.
228. Rupnik, M., M.H. Wilcox, and D.N. Gerding, Clostridium difficile infection: new developments
in epidemiology and pathogenesis. Nature Reviews Microbiology, 2009. 7(7): p. 526-536.
229. Bammann, L., W. Clark, and R. Gibbons, Impaired colonization of gnotobiotic and
conventional rats by streptomycin-resistant strains of Streptococcus mutans. Infection and
immunity, 1978. 22(3): p. 721-726.
230. Schamberger, G.P. and F. Diez-Gonzalez, Selection of recently isolated colicinogenic
Escherichia coli strains inhibitory to Escherichia coli O157: H7. Journal of food protection,
2002. 65(9): p. 1381-1387.
231. Marteyn, B., et al., Modulation of Shigella virulence in response to available oxygen in vivo.
Nature, 2010. 465(7296): p. 355-358.
232. Gantois, I., et al., Butyrate specifically down-regulates Salmonella pathogenicity island 1 gene
expression. Applied and environmental microbiology, 2006. 72(1): p. 946-949.
233. Boleij, A. and H. Tjalsma, Gut bacteria in health and disease: a survey on the interface
between intestinal microbiology and colorectal cancer. Biological Reviews, 2012. 87(3): p.
701-730.
234. Momose, Y., K. Hirayama, and K. Itoh, Competition for proline between indigenous
Escherichia coli and E. coli O157: H7 in gnotobiotic mice associated with infant intestinal
microbiota and its contribution to the colonization resistance against E. coli O157: H7.
Antonie Van Leeuwenhoek, 2008. 94(2): p. 165-171.
235. Salzman, N.H., M.A. Underwood, and C.L. Bevins. Paneth cells, defensins, and the commensal
microbiota: a hypothesis on intimate interplay at the intestinal mucosa. in Seminars in
immunology. 2007. Elsevier.
236. Giel, J.L., et al., Metabolism of bile salts in mice influences spore germination in Clostridium
difficile. PloS one, 2010. 5(1): p. e8740.
237. Claesson, M.J., A.G. Clooney, and P.W. O'Toole, A clinician's guide to microbiome analysis.
Nature Reviews Gastroenterology & Hepatology, 2017.
238. Petrosino, J.F., et al., Metagenomic pyrosequencing and microbial identification. Clinical
chemistry, 2009. 55(5): p. 856-866.
239. Acinas, S.G., et al., Divergence and redundancy of 16S rRNA sequences in genomes with
multiple rrn operons. Journal of bacteriology, 2004. 186(9): p. 2629-2635.
240. Lavelle, A., et al., Spatial variation of the colonic microbiota in patients with ulcerative colitis
and control volunteers. Gut, 2015: p. gutjnl-2014-307873.
241. Donaldson, G.P., S.M. Lee, and S.K. Mazmanian, Gut biogeography of the bacterial microbiota.
Nature Reviews Microbiology, 2016. 14(1): p. 20-32.
242. Sender, R., S. Fuchs, and R. Milo, Revised estimates for the number of human and bacteria
cells in the body. PLoS biology, 2016. 14(8): p. e1002533.
243. Shobar, R.M., et al., The effects of bowel preparation on microbiota-related metrics differ in
health and in inflammatory bowel disease and for the mucosal and luminal microbiota
compartments. Clinical and translational gastroenterology, 2016. 7(2): p. e143.
244. Jalanka, J., et al., Effects of bowel cleansing on the intestinal microbiota. Gut, 2014: p. gutjnl-
2014-307240.
245. O’Brien, C.L., et al., Impact of colonoscopy bowel preparation on intestinal microbiota. PLoS
One, 2013. 8(5): p. e62815.
246. Harrell, L., et al., Standard colonic lavage alters the natural state of mucosal-associated
microbiota in the human colon. PLoS One, 2012. 7(2): p. e32545.
247. Mai, V., et al., Effect of bowel preparation and colonoscopy on post-procedure intestinal
microbiota composition. Gut, 2006. 55(12): p. 1822-1823.
248. Mai, V. and O.C. Stine, Bowel preparation for colonoscopy: relevant for the gut’s microbiota?
Gut, 2015: p. gutjnl-2014-308937.
249. Flores, R., et al., Assessment of the human faecal microbiota: II. Reproducibility and
associations of 16S rRNA pyrosequences. European journal of clinical investigation, 2012.
42(8): p. 855-863.
250. Human Microbiome Project Consortium, A framework for human microbiome research. Nature, 2012. 486(7402):
p. 215-221.
251. Faith, J.J., et al., The long-term stability of the human gut microbiota. Science, 2013.
341(6141): p. 1237439.
252. Feigelson, H.S., et al., Feasibility of self-collection of fecal specimens by randomly sampled
women for health-related studies of the gut microbiome. BMC research notes, 2014. 7(1): p.
204.
253. Tedjo, D.I., et al., The effect of sampling and storage on the fecal microbiota composition in
healthy and diseased subjects. PloS one, 2015. 10(5): p. e0126685.
254. Dominianni, C., et al., Comparison of methods for fecal microbiome biospecimen collection.
BMC microbiology, 2014. 14(1): p. 103.
255. Lauber, C.L., et al., Effect of storage conditions on the assessment of bacterial community
structure in soil and human-associated samples. FEMS microbiology letters, 2010. 307(1): p.
80-86.
256. Fouhy, F., et al., The effects of freezing on faecal microbiota as determined using MiSeq
sequencing and culture-based investigations. PloS one, 2015. 10(3): p. e0119355.
257. Wu, G.D., et al., Sampling and pyrosequencing methods for characterizing bacterial
communities in the human gut using 16S sequence tags. BMC microbiology, 2010. 10(1): p. 1.
258. Bahl, M.I., A. Bergström, and T.R. Licht, Freezing fecal samples prior to DNA extraction affects
the Firmicutes to Bacteroidetes ratio determined by downstream quantitative PCR analysis.
FEMS microbiology letters, 2012. 329(2): p. 193-197.
259. Flores, R., et al., Collection media and delayed freezing effects on microbial composition of
human stool. Microbiome, 2015. 3(1): p. 33.
260. Sergeant, M.J., et al., High-throughput sequencing of 16S rRNA gene amplicons: effects of
extraction procedure, primer length and annealing temperature. PloS one, 2012. 7(5): p.
e38094.
261. Choo, J.M., L.E. Leong, and G.B. Rogers, Sample storage conditions significantly influence
faecal microbiome profiles. Scientific reports, 2015. 5.
262. Sherker, A.R., et al., Optimal preservation of liver biopsy samples for downstream
translational applications. Hepatology international, 2013. 7(2): p. 758-766.
263. Nechvatal, J.M., et al., Fecal collection, ambient preservation, and DNA extraction for PCR
amplification of bacterial and human markers from human feces. Journal of microbiological
methods, 2008. 72(2): p. 124-132.
264. Yuan, S., et al., Evaluation of methods for the extraction and purification of DNA from the
human microbiome. PloS one, 2012. 7(3): p. e33865.
265. Salonen, A., et al., Comparative analysis of fecal DNA extraction methods with phylogenetic
microarray: effective recovery of bacterial and archaeal DNA using mechanical cell lysis.
Journal of microbiological methods, 2010. 81(2): p. 127-134.
266. Santiago, A., et al., Processing faecal samples: a step forward for standards in microbial
community analysis. BMC microbiology, 2014. 14(1): p. 112.
267. Ariefdjohan, M.W., D.A. Savaiano, and C.H. Nakatsu, Comparison of DNA extraction kits for
PCR-DGGE analysis of human intestinal microbial communities from fecal specimens.
Nutrition journal, 2010. 9(1): p. 23.
268. Kennedy, N.A., et al., The impact of different DNA extraction kits and laboratories upon the
assessment of human gut microbiota composition by 16S rRNA gene sequencing. PloS one,
2014. 9(2): p. e88982.
269. Hart, M.L., et al., Comparative evaluation of DNA extraction methods from feces of multiple
host species for downstream next-generation sequencing. PloS one, 2015. 10(11): p.
e0143334.
270. Gerasimidis, K., et al., The effect of DNA extraction methodology on gut microbiota research
applications. BMC research notes, 2016. 9(1): p. 365.
271. Wesolowska-Andersen, A., et al., Choice of bacterial DNA extraction method from fecal
material influences community structure as evaluated by metagenomic analysis. Microbiome,
2014. 2(1): p. 19.
272. Mirsepasi, H., et al., Microbial diversity in fecal samples depends on DNA extraction method:
easyMag DNA extraction compared to QIAamp DNA stool mini kit extraction. BMC research
notes, 2014. 7(1): p. 50.
273. Becker, L., et al., Comparison of six commercial kits to extract bacterial chromosome and
plasmid DNA for MiSeq sequencing. Scientific reports, 2016. 6.
274. Salter, S.J., et al., Reagent and laboratory contamination can critically impact sequence-based
microbiome analyses. BMC biology, 2014. 12(1): p. 87.
275. Tremblay, J., et al., Primer and platform effects on 16S rRNA tag sequencing. Frontiers in
microbiology, 2015. 6.
276. Schloss, P.D., D. Gevers, and S.L. Westcott, Reducing the effects of PCR amplification and
sequencing artifacts on 16S rRNA-based studies. PloS one, 2011. 6(12): p. e27310.
277. Soergel, D.A., et al., Selection of primers for optimal taxonomic classification of environmental
16S rRNA gene sequences. The ISME journal, 2012. 6(7): p. 1440-1444.
278. Liu, Z., et al., Accurate taxonomy assignments from 16S rRNA sequences produced by highly
parallel pyrosequencers. Nucleic acids research, 2008. 36(18): p. e120-e120.
279. Kim, M., M. Morrison, and Z. Yu, Evaluation of different partial 16S rRNA gene sequence
regions for phylogenetic analysis of microbiomes. Journal of microbiological methods, 2011.
84(1): p. 81-87.
280. Kembel, S.W., et al., Incorporating 16S gene copy number information improves estimates of
microbial diversity and abundance. PLoS computational biology, 2012. 8(10): p. e1002743.
281. Woese, C.R. and G.E. Fox, Phylogenetic structure of the prokaryotic domain: the primary
kingdoms. Proceedings of the National Academy of Sciences, 1977. 74(11): p. 5088-5090.
282. Kuczynski, J., et al., Experimental and analytical tools for studying the human microbiome.
Nature Reviews Genetics, 2012. 13(1): p. 47-58.
283. Caporaso, J.G., et al., QIIME allows analysis of high-throughput community sequencing data.
Nature methods, 2010. 7(5): p. 335-336.
284. Schloss, P.D., et al., Introducing mothur: open-source, platform-independent, community-
supported software for describing and comparing microbial communities. Applied and
environmental microbiology, 2009. 75(23): p. 7537-7541.
285. Bokulich, N.A., et al., Quality-filtering vastly improves diversity estimates from Illumina
amplicon sequencing. Nature methods, 2013. 10(1): p. 57-59.
286. Kunin, V., et al., Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial
inflation of diversity estimates. Environmental microbiology, 2010. 12(1): p. 118-123.
287. Brakenhoff, R.H., J. Schoenmakers, and N.H. Lubsen, Chimeric cDNA clones: a novel PCR
artifact. Nucleic acids research, 1991. 19(8): p. 1949.
288. Haas, B.J., et al., Chimeric 16S rRNA sequence formation and detection in Sanger and 454-
pyrosequenced PCR amplicons. Genome research, 2011. 21(3): p. 494-504.
289. Lee, C.K., et al., Groundtruthing next-gen sequencing for microbial ecology–biases and errors
in community structure estimates from PCR amplicon pyrosequencing. PloS one, 2012. 7(9): p.
e44224.
290. Ley, R.E., et al., Evolution of mammals and their gut microbes. Science, 2008. 320(5883): p.
1647-1651.
291. Wang, X., et al., M-pick, a modularity-based method for OTU picking of 16S rRNA sequences.
BMC bioinformatics, 2013. 14(1): p. 43.
292. Cai, Y. and Y. Sun, ESPRIT-Tree: hierarchical clustering analysis of millions of 16S rRNA
pyrosequences in quasilinear computational time. Nucleic acids research, 2011. 39(14): p.
e95-e95.
293. Edgar, R.C., Search and clustering orders of magnitude faster than BLAST. Bioinformatics,
2010. 26(19): p. 2460-2461.
294. He, Y., et al., Stability of operational taxonomic units: an important but neglected property for
analyzing microbial diversity. Microbiome, 2015. 3(1): p. 20.
295. Wang, Q., et al., Naive Bayesian classifier for rapid assignment of rRNA sequences into the
new bacterial taxonomy. Applied and environmental microbiology, 2007. 73(16): p. 5261-
5267.
296. Westcott, S.L. and P.D. Schloss, De novo clustering methods outperform reference-based
methods for assigning 16S rRNA gene sequences to operational taxonomic units. PeerJ, 2015.
3: p. e1487.
297. Kopylova, E., et al., Open-source sequence clustering methods improve the state of the art.
mSystems, 2016. 1(1): p. e00003-15.
298. Bacci, G., et al., Evaluation of the performances of ribosomal database project (RDP) classifier
for taxonomic assignment of 16S rRNA metabarcoding sequences generated from Illumina-
Solexa NGS. Journal of genomics, 2015. 3: p. 36.
299. Cole, J.R., et al., Ribosomal Database Project: data and tools for high throughput rRNA
analysis. Nucleic acids research, 2013. 42(D1): p. D633-D642.
300. Pruesse, E., et al., SILVA: a comprehensive online resource for quality checked and aligned
ribosomal RNA sequence data compatible with ARB. Nucleic acids research, 2007. 35(21): p.
7188-7196.
301. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene database and workbench
compatible with ARB. Applied and environmental microbiology, 2006. 72(7): p. 5069-5072.
302. Werner, J.J., et al., Impact of training sets on classification of high-throughput bacterial 16s
rRNA gene surveys. The ISME journal, 2012. 6(1): p. 94-103.
303. DeSantis, T., et al., NAST: a multiple sequence alignment server for comparative analysis of
16S rRNA genes. Nucleic acids research, 2006. 34(suppl_2): p. W394-W399.
304. Caporaso, J.G., et al., PyNAST: a flexible tool for aligning sequences to a template alignment.
Bioinformatics, 2009. 26(2): p. 266-267.
305. Price, M.N., P.S. Dehal, and A.P. Arkin, FastTree 2–approximately maximum-likelihood trees
for large alignments. PloS one, 2010. 5(3): p. e9490.
306. Faith, D.P. and A.M. Baker, Phylogenetic diversity (PD) and biodiversity conservation: some
bioinformatics challenges. Evolutionary bioinformatics online, 2006. 2: p. 121.
307. Lozupone, C. and R. Knight, UniFrac: a new phylogenetic method for comparing microbial
communities. Applied and environmental microbiology, 2005. 71(12): p. 8228-8235.
308. Plummer, E., et al., A comparison of three bioinformatics pipelines for the analysis of preterm
gut microbiota using 16S rRNA gene sequencing data. Journal of Proteomics & Bioinformatics,
2015. 8(12): p. 283.
309. Brewer, A. and M. Williamson, A new relationship for rarefaction. Biodiversity and
Conservation, 1994. 3(4): p. 373-379.
310. Lozupone, C.A. and R. Knight, Species divergence and the measurement of microbial diversity.
FEMS microbiology reviews, 2008. 32(4): p. 557-578.
311. McMurdie, P.J. and S. Holmes, Waste not, want not: why rarefying microbiome data is
inadmissible. PLoS computational biology, 2014. 10(4): p. e1003531.
312. Friedman, J. and E.J. Alm, Inferring correlation networks from genomic survey data. PLoS
computational biology, 2012. 8(9): p. e1002687.
313. Mandal, S., et al., Analysis of composition of microbiomes: a novel method for studying
microbial composition. Microbial ecology in health and disease, 2015. 26(1): p. 27663.
314. Lovell, D., et al., Proportionality: a valid alternative to correlation for relative data. PLoS
computational biology, 2015. 11(3): p. e1004075.
315. Dillies, M.-A., et al., A comprehensive evaluation of normalization methods for Illumina high-
throughput RNA sequencing data analysis. Briefings in bioinformatics, 2013. 14(6): p. 671-683.
316. Martín-Fernández, J.-A., et al., Bayesian-multiplicative treatment of count zeros in
compositional data sets. Statistical Modelling, 2015. 15(2): p. 134-158.
317. Aitchison, J., The statistical analysis of compositional data. 1986.
318. Kuczynski, J., et al., Microbial community resemblance methods differ in their ability to detect
biologically relevant patterns. Nature methods, 2010. 7(10): p. 813-819.
319. Chao, A., Nonparametric estimation of the number of classes in a population. Scandinavian
Journal of statistics, 1984: p. 265-270.
320. Shannon, C.E. and W. Weaver, The mathematical theory of communication. 1949.
321. Mosca, A., M. Leclerc, and J.P. Hugot, Gut microbiota diversity and human diseases: should
we reintroduce key predators in our ecosystem? Frontiers in microbiology, 2016. 7.
322. Eckburg, P.B., et al., Diversity of the human intestinal microbial flora. Science, 2005.
308(5728): p. 1635-1638.
323. Chao, A., et al., Abundance-based similarity indices and their estimation when there are
unseen species in samples. Biometrics, 2006. 62(2): p. 361-371.
324. Lozupone, C., et al., UniFrac: an effective distance metric for microbial community
comparison. The ISME journal, 2010. 5(2): p. 169.
325. Anderson, M.J., A new method for non-parametric multivariate analysis of variance. Austral
ecology, 2001. 26(1): p. 32-46.
326. Matsen IV, F.A. and S.N. Evans, Edge principal components and squash clustering: using the
special structure of phylogenetic placement data for sample comparison. PLoS One, 2013.
8(3): p. e56859.
327. Wagner, B.D., C.E. Robertson, and J.K. Harris, Application of two-part statistics for
comparison of sequence variant counts. PloS one, 2011. 6(5): p. e20296.
328. Anders, S. and W. Huber, Differential expression analysis for sequence count data. Genome
biology, 2010. 11(10): p. R106.
329. Love, M.I., S. Anders, and W. Huber, Moderated estimation of fold change and dispersion for
RNA-seq data with DESeq2. Genome biology, 2014. 15(12): p. 550.
330. Hardcastle, T.J. and K.A. Kelly, baySeq: empirical Bayesian methods for identifying differential
expression in sequence count data. BMC bioinformatics, 2010. 11(1): p. 422.
331. Paulson, J.N., M. Pop, and H. Bravo, metagenomeSeq: statistical analysis for sparse high-
throughput sequencing. Bioconductor, 2014: p. p1-20.
332. Weiss, S., et al., Normalization and microbial differential abundance strategies depend upon
data characteristics. Microbiome, 2017. 5(1): p. 27.
333. Fang, R., et al., Application of zero-inflated negative binomial mixed model to human
microbiota sequence data. 2014, PeerJ PrePrints.
334. Zhang, X., et al., Negative binomial mixed models for analyzing microbiome count data. BMC
bioinformatics, 2017. 18(1): p. 4.
335. Thorsen, J., et al., Large-scale benchmarking reveals false discoveries and count
transformation sensitivity in 16S rRNA gene amplicon data analysis methods used in
microbiome studies. Microbiome, 2016. 4(1): p. 62.
336. Koh, H., M.J. Blaser, and H. Li, A powerful microbiome-based association test and a microbial
taxa discovery framework for comprehensive association mapping. Microbiome, 2017. 5(1): p.
45.
337. Knights, D., E.K. Costello, and R. Knight, Supervised classification of human microbiota. FEMS
microbiology reviews, 2011. 35(2): p. 343-359.
338. Koren, O., et al., A guide to enterotypes across the human body: meta-analysis of microbial
community structures in human microbiome datasets. PLoS computational biology, 2013. 9(1):
p. e1002863.
339. Breiman, L., Random forests. Machine learning, 2001. 45(1): p. 5-32.
340. Cortes, C. and V. Vapnik, Support-vector networks. Machine learning, 1995. 20(3): p. 273-297.
341. Baxter, N.T., et al., Microbiota-based model improves the sensitivity of fecal immunochemical
test for detecting colonic lesions. Genome medicine, 2016. 8(1): p. 37.
342. Arumugam, M., et al., Enterotypes of the human gut microbiome. Nature, 2011. 473(7346): p.
174-180.
343. Knights, D., et al., Rethinking “enterotypes”. Cell host & microbe, 2014. 16(4): p. 433-437.
344. Ravel, J., et al., Vaginal microbiome of reproductive-age women. Proceedings of the National
Academy of Sciences, 2011. 108(Supplement 1): p. 4680-4687.
345. Gajer, P., et al., Temporal dynamics of the human vaginal microbiota. Science translational
medicine, 2012. 4(132): p. 132ra52-132ra52.
346. Langille, M.G., et al., Predictive functional profiling of microbial communities using 16S rRNA
marker gene sequences. Nature biotechnology, 2013. 31(9): p. 814-821.
347. Aßhauer, K.P., et al., Tax4Fun: predicting functional profiles from metagenomic 16S rRNA
data. Bioinformatics, 2015. 31(17): p. 2882-2884.
348. zur Hausen, H., The search for infectious causes of human cancers: where and why. Virology,
2009. 392(1): p. 1-10.
349. Polk, D.B. and R.M. Peek, Helicobacter pylori: gastric cancer and beyond. Nature Reviews
Cancer, 2010. 10(6): p. 403-414.
350. Uronis, J.M., et al., Modulation of the intestinal microbiota alters colitis-associated colorectal
cancer susceptibility. PloS one, 2009. 4(6): p. e6026.
351. Sellon, R.K., et al., Resident enteric bacteria are necessary for development of spontaneous
colitis and immune system activation in interleukin-10-deficient mice. Infection and immunity,
1998. 66(11): p. 5224-5231.
352. Dove, W.F., et al., Intestinal neoplasia in the ApcMin mouse: independence from the microbial
and natural killer (beige locus) status. Cancer research, 1997. 57(5): p. 812-814.
353. Kostic, A.D., et al., Genomic analysis identifies association of Fusobacterium with colorectal
carcinoma. Genome research, 2012. 22(2): p. 292-298.
354. Geng, J., et al., Co-occurrence of driver and passenger bacteria in human colorectal cancer.
Gut pathogens, 2014. 6: p. 26-26.
355. Geng, J., et al., Diversified pattern of the human colorectal cancer microbiome. Gut pathogens,
2013. 5(1): p. 2.
356. Mira-Pascual, L., et al., Microbial mucosal colonic shifts associated with the development of
colorectal cancer reveal the presence of different bacterial and archaeal biomarkers. Journal
of gastroenterology, 2015. 50(2): p. 167-179.
357. Chen, W., et al., Human intestinal lumen and mucosa-associated microbiota in patients with
colorectal cancer. PloS one, 2012. 7(6): p. e39743.
358. Weir, T.L., et al., Stool microbiome and metabolome differences between colorectal cancer
patients and healthy adults. PloS one, 2013. 8(8): p. e70803.
359. Wu, N., et al., Dysbiosis signature of fecal microbiota in colorectal cancer patients. Microbial
ecology, 2013. 66(2): p. 462-470.
360. Wang, T., et al., Structural segregation of gut microbiota between colorectal cancer patients
and healthy volunteers. The ISME journal, 2012. 6(2): p. 320-329.
361. Sobhani, I., et al., Microbial dysbiosis in colorectal cancer (CRC) patients. PloS one, 2011. 6(1):
p. e16393.
362. Wexler, H.M., Bacteroides: the good, the bad, and the nitty-gritty. Clinical microbiology
reviews, 2007. 20(4): p. 593-621.
363. Gagnière, J., et al., Gut microbiota imbalance and colorectal cancer. World journal of
gastroenterology, 2016. 22(2): p. 501.
364. Yamamoto, M. and S. Matsumoto, Gut microbiota and colorectal cancer. Genes and
Environment, 2016. 38(1): p. 11.
365. Han, Y.W., Fusobacterium nucleatum: a commensal-turned pathogen. Current opinion in
microbiology, 2015. 23: p. 141-147.
366. Castellarin, M., et al., Fusobacterium nucleatum infection is prevalent in human colorectal
carcinoma. Genome research, 2012. 22(2): p. 299-306.
367. Flanagan, L., et al., Fusobacterium nucleatum associates with stages of colorectal neoplasia
development, colorectal cancer and disease outcome. European journal of clinical
microbiology & infectious diseases, 2014. 33(8): p. 1381-1390.
368. McCoy, A.N., et al., Fusobacterium is associated with colorectal adenomas. PloS one, 2013.
8(1): p. e53653.
369. Li, Y.-Y., et al., Association of Fusobacterium nucleatum infection with colorectal cancer in
Chinese patients. World journal of gastroenterology, 2016. 22(11): p. 3227.
370. Yang, Y., et al., Fusobacterium nucleatum Increases Proliferation of Colorectal Cancer Cells
and Tumor Development in Mice by Activating Toll-Like Receptor 4 Signaling to Nuclear
Factor-κB, and Up-regulating Expression of MicroRNA-21. Gastroenterology, 2017. 152(4): p.
851-866.e24.
371. Kostic, A.D., et al., Fusobacterium nucleatum potentiates intestinal tumorigenesis and
modulates the tumor-immune microenvironment. Cell host & microbe, 2013. 14(2): p. 207-
215.
372. Medina, P.P., M. Nolde, and F.J. Slack, OncomiR addiction in an in vivo model of microRNA-
21-induced pre-B-cell lymphoma. Nature, 2010. 467(7311): p. 86-90.
373. Rubinstein, M.R., et al., Fusobacterium nucleatum promotes colorectal carcinogenesis by
modulating E-cadherin/β-catenin signaling via its FadA adhesin. Cell host & microbe, 2013.
14(2): p. 195-206.
374. McCoy, W. and J. Mason 3rd, Enterococcal endocarditis associated with carcinoma of the
sigmoid; report of a case. Journal of the Medical Association of the State of Alabama, 1951.
21(6): p. 162-166.
375. Abdulamir, A.S., R.R. Hafidh, and F.A. Bakar, Molecular detection, quantification, and isolation
of Streptococcus gallolyticus bacteria colonizing colorectal tumors: inflammation-driven
potential of carcinogenesis via IL-1, COX-2, and IL-8. Molecular cancer, 2010. 9(1): p. 249.
376. Corredoira-Sánchez, J., et al., Association Between Bacteremia Due to Streptococcus
gallolyticus subsp. gallolyticus (Streptococcus bovis I) and Colorectal Neoplasia: A Case-
Control Study. Clinical infectious diseases, 2012. 55(4): p. 491-496.
377. Boleij, A., et al., Clinical Importance of Streptococcus gallolyticus infection among colorectal
cancer patients: systematic review and meta-analysis. Clinical Infectious Diseases, 2011. 53(9):
p. 870-878.
378. Klein, R.S., et al., Association of Streptococcus bovis with carcinoma of the colon. New
England Journal of Medicine, 1977. 297(15): p. 800-802.
379. Biarc, J., et al., Carcinogenic properties of proteins with pro-inflammatory activity from
Streptococcus infantarius (formerly S. bovis). Carcinogenesis, 2004. 25(8): p. 1477-1484.
380. Ellmerich, S., et al., Promotion of intestinal carcinogenesis by Streptococcus bovis.
Carcinogenesis, 2000. 21(4): p. 753-756.
381. Kuwahara, T., et al., Genomic analysis of Bacteroides fragilis reveals extensive DNA inversions
regulating cell surface adaptation. Proceedings of the National Academy of Sciences of the
United States of America, 2004. 101(41): p. 14919-14924.
382. Sears, C.L., et al., Association of enterotoxigenic Bacteroides fragilis infection with
inflammatory diarrhea. Clinical Infectious Diseases, 2008. 47(6): p. 797-803.
383. Purcell, R.V., et al., Colonization with enterotoxigenic Bacteroides fragilis is associated with
early-stage colorectal neoplasia. PloS one, 2017. 12(2): p. e0171602.
384. Boleij, A., et al., The Bacteroides fragilis toxin gene is prevalent in the colon mucosa of
colorectal cancer patients. Clinical Infectious Diseases, 2014. 60(2): p. 208-215.
385. Ulger Toprak, N., et al., A possible role of Bacteroides fragilis enterotoxin in the aetiology of
colorectal cancer. Clinical microbiology and infection, 2006. 12(8): p. 782-786.
386. Sanfilippo, L., et al., Bacteroides fragilis enterotoxin induces the expression of IL-8 and
transforming growth factor-beta (TGF-β) by human colonic epithelial cells. Clinical &
Experimental Immunology, 2000. 119(3): p. 456-463.
387. Housseau, F. and C.L. Sears, Enterotoxigenic Bacteroides fragilis (ETBF)-mediated colitis in
Min (Apc+/-) mice: a human commensal-based murine model of colon carcinogenesis. 2010,
Taylor & Francis.
388. Rhee, K.-J., et al., Induction of persistent colitis by a human commensal, enterotoxigenic
Bacteroides fragilis, in wild-type C57BL/6 mice. Infection and immunity, 2009. 77(4): p. 1708-
1718.
389. Wells, C.L., et al., Bacteroides fragilis enterotoxin modulates epithelial permeability and
bacterial internalization by HT-29 enterocytes. Gastroenterology, 1996. 110(5): p. 1429-1437.
390. Obiso, R., A.O. Azghani, and T.D. Wilkins, The Bacteroides fragilis toxin fragilysin disrupts the
paracellular barrier of epithelial cells. Infection and immunity, 1997. 65(4): p. 1431-1439.
391. Riegler, M., et al., Bacteroides fragilis toxin 2 damages human colonic mucosa in vitro. Gut,
1999. 44(4): p. 504-510.
392. Wu, S., et al., A human colonic commensal promotes colon tumorigenesis via activation of T
helper type 17 T cell responses. Nature medicine, 2009. 15(9): p. 1016-1022.
393. Sears, C.L., A.L. Geis, and F. Housseau, Bacteroides fragilis subverts mucosal biology: from
symbiont to colon carcinogenesis. The Journal of clinical investigation, 2014. 124(10): p. 4166.
394. Wu, S., et al., Bacteroides fragilis enterotoxin induces c-Myc expression and cellular
proliferation. Gastroenterology, 2003. 124(2): p. 392-400.
395. Escobar-Páramo, P., et al., Large-scale population structure of human commensal Escherichia
coli isolates. Applied and environmental microbiology, 2004. 70(9): p. 5698-5700.
396. Bonnet, M., et al., Colonization of the human gut by E. coli and colorectal cancer risk. Clinical
Cancer Research, 2014. 20(4): p. 859-867.
397. Prorok-Hamon, M., et al., Colonic mucosa-associated diffusely adherent afaC+ Escherichia coli
expressing lpfA and pks are increased in inflammatory bowel disease and colon cancer. Gut,
2013: p. gutjnl-2013-304739.
398. Buc, E., et al., High prevalence of mucosa-associated E. coli producing cyclomodulin and
genotoxin in colon cancer. PloS one, 2013. 8(2): p. e56964.
399. Maddocks, O.D., et al., Attaching and effacing Escherichia coli downregulate DNA mismatch
repair protein in vitro and are associated with colorectal adenocarcinomas in humans. PloS
one, 2009. 4(5): p. e5517.
400. Martin, H.M., et al., Enhanced Escherichia coli adherence and invasion in Crohn’s disease and
colon cancer. Gastroenterology, 2004. 127(1): p. 80-93.
401. Swidsinski, A., et al., Association between intraepithelial Escherichia coli and colorectal cancer.
Gastroenterology, 1998. 115(2): p. 281-286.
402. Raisch, J., et al., Colon cancer-associated B2 Escherichia coli colonize gut mucosa and promote
cell proliferation. World Journal of Gastroenterology: WJG, 2014. 20(21): p. 6560.
403. Vizcaino, M.I. and J.M. Crawford, The colibactin warhead crosslinks DNA. Nature chemistry,
2015. 7(5): p. 411-417.
404. Maddocks, O.D.K., K.M. Scanlon, and M.S. Donnenberg, An Escherichia coli effector protein
promotes host mutation via depletion of DNA mismatch repair proteins. MBio, 2013. 4(3): p.
e00152-13.
405. Cuevas-Ramos, G., et al., Escherichia coli induces DNA damage in vivo and triggers genomic
instability in mammalian cells. Proceedings of the National Academy of Sciences, 2010.
107(25): p. 11537-11542.
406. He, X., et al., Cross-talk between E. coli strains and a human colorectal adenocarcinoma-
derived cell line. Scientific reports, 2013. 3: p. 3416.
407. Raisch, J., et al., Intracellular colon cancer-associated Escherichia coli promote protumoral
activities of human macrophages by inducing sustained COX-2 expression. Laboratory
Investigation, 2015. 95(3): p. 296-307.
408. Choi, H., et al., Enteropathogenic Escherichia coli-induced macrophage inhibitory cytokine 1
mediates cancer cell survival: an in vitro implication of infection-linked tumor dissemination.
Oncogene, 2013. 32(41): p. 4960-4969.
409. Bronowski, C., et al., A subset of mucosa-associated Escherichia coli isolates from patients
with colon cancer, but not Crohn's disease, share pathogenicity islands with urinary
pathogenic E. coli. Microbiology, 2008. 154(2): p. 571-583.
410. Cougnoux, A., et al., Bacterial genotoxin colibactin promotes colon tumour growth by
inducing a senescence-associated secretory phenotype. Gut, 2014. 63(12): p. 1932-1942.
411. Zhou, Y., et al., Association of oncogenic bacteria with colorectal cancer in South China.
Oncotarget, 2016. 7(49): p. 80794.
412. Balamurugan, R., et al., Real-time polymerase chain reaction quantification of specific
butyrate-producing bacteria, Desulfovibrio and Enterococcus faecalis in the feces of patients
with colorectal cancer. Journal of gastroenterology and hepatology, 2008. 23(8pt1): p. 1298-
1303.
413. Ruiz, P.A., et al., IL-10 gene-deficient mice lack TGF-β/Smad signaling and fail to inhibit
proinflammatory gene expression in intestinal epithelial cells after the colonization with
colitogenic Enterococcus faecalis. The Journal of Immunology, 2005. 174(5): p. 2990-2999.
414. Balish, E. and T. Warner, Enterococcus faecalis induces inflammatory bowel disease in
interleukin-10 knockout mice. The American journal of pathology, 2002. 160(6): p. 2253-2257.
415. Huycke, M.M., V. Abrams, and D.R. Moore, Enterococcus faecalis produces extracellular
superoxide and hydrogen peroxide that damages colonic epithelial cell DNA. Carcinogenesis,
2002. 23(3): p. 529-536.
416. Huycke, M.M., W. Joyce, and M.F. Wack, Augmented production of extracellular superoxide
by blood isolates of Enterococcus faecalis. Journal of Infectious Diseases, 1996. 173(3): p.
743-745.
417. Wang, X., Y. Yang, and M.M. Huycke, Commensal bacteria drive endogenous transformation
and tumour stem cell marker expression through a bystander effect. Gut, 2015. 64(3): p. 459-
468.
418. Wang, X. and M.M. Huycke, Extracellular superoxide production by Enterococcus faecalis
promotes chromosomal instability in mammalian cells. Gastroenterology, 2007. 132(2): p.
551-561.
419. Brim, H., et al., Microbiome analysis of stool samples from African Americans with colon
polyps. PloS one, 2013. 8(12): p. e81352.
420. Chen, H.-M., et al., Decreased dietary fiber intake and structural alteration of gut microbiota
in patients with advanced colorectal adenoma. The American journal of clinical nutrition,
2013. 97(5): p. 1044-1052.
421. Sanapareddy, N., et al., Increased rectal microbial richness is associated with the presence of
colorectal adenomas in humans. The ISME journal, 2012. 6(10): p. 1858-1868.
422. Scanlan, P.D., et al., Culture-independent analysis of the gut microbiota in colorectal cancer
and polyposis. Environmental microbiology, 2008. 10(3): p. 789-798.
423. Shen, X.J., et al., Molecular characterization of mucosal adherent bacteria and associations
with colorectal adenomas. Gut microbes, 2010. 1(3): p. 138-147.
424. Feng, Q., et al., Gut microbiome development along the colorectal adenoma-carcinoma
sequence. Nature communications, 2015. 6: p. 6528.
425. Nugent, J.L., et al., Altered tissue metabolites correlate with microbial dysbiosis in colorectal
adenomas. Journal of proteome research, 2014. 13(4): p. 1921-1929.
426. Zackular, J.P., et al., The human gut microbiome as a screening tool for colorectal cancer.
Cancer prevention research, 2014. 7(11): p. 1112-1121.
427. Flemer, B., et al., Tumour-associated and non-tumour-associated microbiota in colorectal
cancer. Gut, 2017. 66(4): p. 633-643.
428. Goedert, J.J., et al., Fecal microbiota characteristics of patients with colorectal adenoma
detected by screening: a population-based study. EBioMedicine, 2015. 2(6): p. 597-603.
429. Hale, V.L., et al., Shifts in the fecal microbiota associated with adenomatous polyps. Cancer
Epidemiology and Prevention Biomarkers, 2017. 26(1): p. 85-94.
430. Ito, M., et al., Association of Fusobacterium nucleatum with clinical and molecular features in
colorectal serrated pathway. International journal of cancer, 2015. 137(6): p. 1258-1268.
431. Kasai, C., et al., Comparison of human gut microbiota in control subjects and patients with
colorectal carcinoma in adenoma: terminal restriction fragment length polymorphism and
next-generation sequencing analyses. Oncology reports, 2016. 35(1): p. 325-333.
432. Peters, B.A., et al., The gut microbiota in conventional and serrated precursors of colorectal
cancer. Microbiome, 2016. 4(1): p. 69.
433. Yoon, H., et al., Comparisons of Gut Microbiota Among Healthy Control, Patients With
Conventional Adenoma, Sessile Serrated Adenoma, and Colorectal Cancer. Journal of cancer
prevention, 2017. 22(2): p. 108.
434. Jovel, J., et al., Characterization of the gut microbiome using 16S or shotgun metagenomics.
Frontiers in microbiology, 2016. 7.
435. Tjalsma, H., et al., A bacterial driver–passenger model for colorectal cancer: beyond the usual
suspects. Nature Reviews Microbiology, 2012. 10(8): p. 575-582.
436. Mima, K., et al., Fusobacterium nucleatum in colorectal carcinoma tissue and patient
prognosis. Gut, 2015: p. gutjnl-2015-310101.
437. Abed, J., et al., Fap2 mediates Fusobacterium nucleatum colorectal adenocarcinoma
enrichment by binding to tumor-expressed Gal-GalNAc. Cell host & microbe, 2016. 20(2): p.
215-225.
438. Cougnoux, A., et al., Small-molecule inhibitors prevent the genotoxic and protumoural effects
induced by colibactin-producing bacteria. Gut, 2015: p. gutjnl-2014-307241.
439. Smith, P.M., et al., The microbial metabolites, short-chain fatty acids, regulate colonic Treg
cell homeostasis. Science, 2013. 341(6145): p. 569-573.
440. O'Keefe, S.J., et al., Products of the colonic microbiota mediate the effects of diet on colon
cancer risk. The Journal of nutrition, 2009. 139(11): p. 2044-2048.
441. David, L.A., et al., Diet rapidly and reproducibly alters the human gut microbiome. Nature,
2014. 505(7484): p. 559-63.
442. Walker, A.W., et al., Dominant and diet-responsive groups of bacteria within the human
colonic microbiota. The ISME journal, 2011. 5(2): p. 220-230.
443. Tian, Y., K. Wang, and G. Ji, P112 Short-chain fatty acids administration is protective in colitis-
associated colorectal cancer development. Journal of Crohn's & colitis, 2017. 11(suppl_1): p.
S132.
444. Qamar, T.R., et al., Novel combination of prebiotics galacto-oligosaccharides and inulin-
inhibited aberrant crypt foci formation and biomarkers of colon cancer in wistar rats.
Nutrients, 2016. 8(8): p. 465.
445. Zsivkovits, M., et al., Prevention of heterocyclic amine-induced DNA damage in colon and liver
of rats by different lactobacillus strains. Carcinogenesis, 2003. 24(12): p. 1913-1918.
446. Chen, Z.-Y., et al., Inhibitory Effects of Probiotic Lactobacillus on the Growth of Human Colonic
Carcinoma Cell Line HT-29. Molecules, 2017. 22(1): p. 107.
447. Del Carmen, S., et al., Anti-cancer effect of lactic acid bacteria expressing antioxidant
enzymes or IL-10 in a colorectal cancer mouse model. International immunopharmacology,
2017. 42: p. 122-129.
448. Le, D.T., et al., PD-1 blockade in tumors with mismatch-repair deficiency. New England
Journal of Medicine, 2015. 372(26): p. 2509-2520.
449. Brahmer, J.R., et al., Phase I study of single-agent anti–programmed death-1 (MDX-1106) in
refractory solid tumors: safety, clinical activity, pharmacodynamics, and immunologic
correlates. Journal of clinical oncology, 2010. 28(19): p. 3167-3175.
450. Vétizou, M., et al., Anticancer immunotherapy by CTLA-4 blockade relies on the gut
microbiota. Science, 2015. 350(6264): p. 1079-1084.
451. Sivan, A., et al., Commensal Bifidobacterium promotes antitumor immunity and facilitates
anti–PD-L1 efficacy. Science, 2015. 350(6264): p. 1084-1089.
Table 5.1. Most common diversity measurements in a human microbiome analysis (Adapted from Table 1, Lozupone and Knight, 2008 [310])

Only presence/absence of taxa considered

  Qualitative α diversity (Richness): diversity measurements within a single community
    OTUs-based¹:
      Total number of OTUs ($D$)
      Chao 1: $\mathrm{Chao1} = D + \dfrac{f_1^2}{2 f_2}$
    Phylogeny-based²:
      Phylogenetic Diversity: $PD = \sum_{i=1}^{n} b_i \, I(p_i > 0)$

  Qualitative β diversity: diversity measurements shared among communities
    OTUs-based¹:
      Jaccard index: $J = 1 - \dfrac{\sum_{i=1}^{n} \min(x_i^A, x_i^B)}{\sum_{i=1}^{n} \max(x_i^A, x_i^B)}$
    Phylogeny-based²:
      Unweighted UniFrac: $d^{(U)} = \dfrac{\sum_{i=1}^{n} b_i \, |I(p_i^A > 0) - I(p_i^B > 0)|}{\sum_{i=1}^{n} b_i}$

Additionally accounting for relative abundance of each taxon

  Quantitative α diversity (Richness and/or Evenness): diversity measurements within a single community
    OTUs-based¹:
      Shannon index: $H = -\sum_{i=1}^{n} p_i \ln(p_i)$
      Simpson index: $SI = 1 - \sum_{i=1}^{n} p_i^2$

  Quantitative β diversity: diversity measurements shared among communities
    OTUs-based¹:
      Bray-Curtis: $B = \dfrac{\sum_{i=1}^{n} |x_i^A - x_i^B|}{\sum_{i=1}^{n} (x_i^A + x_i^B)}$
    Phylogeny-based²:
      Weighted UniFrac: $d^{(W)} = \dfrac{\sum_{i=1}^{n} b_i \, |p_i^A - p_i^B|}{\sum_{i=1}^{n} b_i \, (p_i^A + p_i^B)}$

¹ OTUs-based formula:
  $n$: the number of OTUs;
  $D$: the number of unique OTUs observed in the sample;
  $f_1$: the number of OTUs for which only one read has been found in the sample;
  $f_2$: the number of OTUs for which two reads have been found in the sample;
  $p_i$: the fraction of reads that belong to OTU $i$;
  $x_i^A$ and $x_i^B$: the abundances of OTU $i$ in samples A and B, respectively.

² Phylogeny-based formula:
  $n$: the number of branches in the phylogenetic tree;
  $b_i$: the length of branch $i$;
  $p_i$: the proportion of taxa descending from branch $i$;
  $p_i^A$ and $p_i^B$: the proportion of taxa descending from branch $i$ for samples A and B, respectively;
  $I(p_i > 0)$: the indicator function that takes the value 1 if any taxon descending from branch $i$ is present in the sample, or 0 otherwise;
  $I(p_i^A > 0)$ and $I(p_i^B > 0)$: the indicator functions that take the value 1 if any taxon descending from branch $i$ is present in sample A and sample B, respectively, or 0 otherwise.
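As an illustration only, the OTU-based measures in Table 5.1 can be sketched in a few lines of Python. This is a minimal sketch, not the pipeline used in this dissertation (the analyses were run in QIIME). It assumes each sample is a plain list of OTU read counts in the same OTU order, and it uses the standard forms of the indices: Shannon with a leading minus sign, and Bray-Curtis with the sum of abundances in the denominator. The function names and the fallback used for Chao 1 when no doubletons are observed are assumptions made for this example.

```python
import math


def relative_abundances(counts):
    """Fractions p_i of reads per observed OTU (zero-count OTUs are dropped)."""
    total = sum(counts)
    return [c / total for c in counts if c > 0]


def chao1(counts):
    """Chao 1 richness: D + f1^2 / (2 * f2)."""
    observed = [c for c in counts if c > 0]
    d = len(observed)                           # D: observed OTUs
    f1 = sum(1 for c in observed if c == 1)     # singletons
    f2 = sum(1 for c in observed if c == 2)     # doubletons
    if f2 == 0:
        # Assumption: bias-corrected fallback to avoid division by zero.
        return d + f1 * (f1 - 1) / 2.0
    return d + f1 ** 2 / (2.0 * f2)


def shannon(counts):
    """Shannon index: H = -sum(p_i * ln(p_i))."""
    return -sum(p * math.log(p) for p in relative_abundances(counts))


def simpson(counts):
    """Simpson index: SI = 1 - sum(p_i^2)."""
    return 1.0 - sum(p * p for p in relative_abundances(counts))


def jaccard_distance(a, b):
    """Jaccard distance on presence/absence: 1 - sum(min) / sum(max)."""
    pa = [1 if x > 0 else 0 for x in a]
    pb = [1 if x > 0 else 0 for x in b]
    shared = sum(min(x, y) for x, y in zip(pa, pb))
    union = sum(max(x, y) for x, y in zip(pa, pb))
    return 1 - shared / union if union else 0.0


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|x_A - x_B| / sum(x_A + x_B)."""
    total = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0


# Hypothetical read counts for two samples over the same six OTUs.
sample_a = [10, 5, 2, 1, 1, 0]
sample_b = [8, 0, 2, 2, 0, 3]

print(chao1(sample_a))                       # 7.0
print(jaccard_distance(sample_a, sample_b))  # 0.5
print(bray_curtis(sample_a, sample_b))
```

In practice the two samples would first be rarefied to the same sequencing depth, since richness estimates and count-based distances depend on the number of reads per sample.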
Table 5.2. Published human studies of gut microbiota and colorectal polyps: Study design and methods
Study Sample Type Sample Collection Sample Size Study Method 16S rRNA Sequencing Process Reference
Scanlan et al.,
2008
Fecal Samples Time 0, 6 and 12
weeks after
colonoscopy;
Frozen samples
Healthy controls (n=20);
Polypectomized cases
(n=20);
Colon cancer cases (n=20)
16S rRNA gene
DGGE;
RISA
PCR amplification of partial 16S rRNA genes and
ribosomal intergenic spacer for profile analysis
[422]
Shen et al., 2010 Normal colonic
mucosal biopsy
At colonoscopy;
Frozen samples
Non-adenoma controls
(excluded previous
adenomas) (n=23);
Adenoma cases (n=21)
16S rRNA
sequencing;
T-RFLP;
FISH
16S rRNA gene amplification: 8F and 1492R primers;
Sequenced primers: M13-F and M13-R;
Sequence edit, assembly and alignment: Sequencher
v4.8;
Chimeras check: Bellerophon v3;
OTUs assignment: SeqMatch in RDP-II;
Taxonomic classification: naïve Bayesian rRNA
classifier and BLAST program
[423]
Sanapareddy et
al., 2012
Normal rectal mucosal
biopsy
At colonoscopy;
Frozen samples
Non-adenoma controls
(excluded previous
adenomas) (n=38);
Adenoma cases (n=33)
16S rRNA
pyrosequencing;
RT-qPCR
16S rRNA gene amplification and sequencing: F8-
R357 primers for V1-V2 region for Roche 454
titanium pyrosequencing;
Sequence filtering: RDP pipeline and OTU pipeline;
Chimeras check: ChimeraSlayer;
OTUs assignment: AbundantOTU;
Taxonomic classification: RDP Classifier 2.1
Tree generation: MOTHUR from the Silva reference
[421]
Brim et al., 2013 Fecal samples (African
Americans)
At least 2 months
after colonoscopy;
Frozen samples
Healthy controls (n=6);
Colon polyps cases (n=6)
16S rRNA
pyrosequencing;
16S rRNA-based
phylogenetic
microarray;
HITChip
16S rRNA gene amplification and sequencing: primers
for V1-V3 region for Roche 454 pyrosequencing;
Chimeras check: ChimeraSlayer;
Sequence process and OTUs assignment: MOTHUR;
Taxonomic classification: RDP Classifier
[419]
Chen et al., 2013 Fecal samples
(Chinese)
Not mentioned;
Frozen samples
Healthy controls (n=47);
Sex- and age-matched
advanced colorectal
adenoma cases (n=47)
16S rRNA
pyrosequencing;
Fecal SCFA analysis;
RT-qPCR
16S rRNA gene amplification and sequencing: 27F
and 533R primers for V1-V3 region for Roche 454 GS
FLX pyrosequencing;
Sequence process and OTUs assignment: MOTHUR;
Taxonomic classification: the SILVA database
[420]
McCoy et al., 2013 Normal rectal mucosal
biopsy
At colonoscopy;
Frozen samples
Non-adenoma controls
(n=67);
Adenoma cases (n=48);
Matched tumor and
normal tissue biopsies
from CRC patients (n=10)
16S rRNA
pyrosequencing
(CRC tumor tissues
only);
Fusobacterium
culture and RT-
qPCR (Adenoma
and CRC tissues)
16S rRNA gene amplification and sequencing: 27F
and 338R primers for V1-V3 region for Roche 454 GS
FLX pyrosequencing;
Sequence denoise: PyroNoise
Sequence process, OTUs assignment and Taxonomic
classification: QIIME
[368]
Mira-Pascual et
al., 2014
Fecal samples and
normal rectal mucosal
biopsy (lesion tissue
and adjacent normal
tissue for adenomas
and CRC cases)
Fecal samples: at
least 1 week before
colonoscopy;
Biopsy samples: at
colonoscopy;
Frozen samples
Age- and sex-matched:
Healthy controls (n=10);
Tubular adenomas cases
(n=11);
CRC adenocarcinoma
cases (n=7)
16S rRNA
pyrosequencing;
RT-qPCR
16S rRNA gene amplification and sequencing: 27F
and 533R primers for V1-V3 region for Roche 454 GS
FLX pyrosequencing;
Sequence process and Chimeras check: MOTHUR
OTUs assignment and Taxonomic classification: RDP
pyrosequencing pipeline
[356]
Nugent et al.,
2014
Normal rectal mucosal
biopsy
At colonoscopy;
Frozen samples
Non-adenoma controls
(n=15);
Adenoma cases (n=15)
16S rRNA
sequencing;
RT-qPCR;
Metabolomics
Similar to Shen et al., 2010.
[425]
Zackular et al.,
2014
Fecal samples 1-4 weeks after
colonoscopy;
Frozen samples
Healthy controls (n=30);
Adenomas cases (n=30);
CRC cases (n=30)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing: primers
for V4 region using the Illumina MiSeq Sequencing
platform;
Chimeras check: UCHIME
OTUs assignment: naïve Bayesian rRNA classifier
trained against the RDP training set;
Taxonomic classification: the SILVA database
[426]
Feng et al., 2015 Fecal samples
(Caucasians)
Not mentioned;
Frozen samples
Healthy controls (n=55);
Advanced adenoma cases
(n=42);
CRC cases (n=41)
Metagenomic
sequencing
Paired-end metagenomic sequencing: Illumina
platform;
Sequence quality controls and assembly:
SOAPdenovo v2.04;
Gene prediction: GeneMark v2.7d
Taxonomic classification: the IMG database (v400)
using an in-house pipeline with BLASTN
[424]
Ito et al., 2015 Formalin-fixed
paraffin-embedded
(FFPE) tissues
(Japanese)
At colonoscopy Hyperplastic polyps cases
(n=138);
Sessile serrated adenomas
cases (n=129);
Traditional serrated
adenomas cases (n=102);
Non-serrated adenomas
cases (n=131);
CRC cases (n=544)
qPCR Genomic DNA was extracted and amplified for
Fusobacterium nucleatum.
[430]
Goedert et al.,
2015
Fecal samples
(Chinese)
About 2 weeks
before colonoscopy;
RNAlater with
kanamycin treated
Normal controls (n=24);
Adenoma cases (n=20);
CRC cases (n=2)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing:
319F/806R primers for V3-V4 region using the
Illumina MiSeq 250PE instrument;
Sequence process: the pipeline of the Institute of
Genome Sciences, University of Maryland Medical
School;
OTUs assignment and Taxonomic classification: RDP
naïve Bayesian classifier;
Further analysis: QIIME
[428]
Peters et al., 2016 Fecal samples CDC samples
(n=398): up to 4
months before
colonoscopy;
NYU samples
(n=142): up to 3
years after
colonoscopy;
Collected on
Beckman Coulter
Hemoccult II SENSA
cards
Polyp-free controls
(n=323);
Conventional adenoma
cases (n=144): proximal
cases (n=87)/distal cases
(n=55), or non-advanced
cases (n=121)/advanced
cases (n=22);
Serrated polyp cases
(n=73): hyperplastic
polyps cases
(n=40)/sessile serrated
adenomas cases (n=33)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing:
F515/R806 primers for V4 region using the Illumina
MiSeq platform;
Chimeras check: ChimeraSlayer;
Sequence process: QIIME pipeline
Taxonomic classification: IMG/GG Greengenes
database
[432]
Kasai et al., 2016 Fecal samples
(Japanese)
Immediately before
colonoscopy
preparation;
Fresh samples
(stored at 4°C)
Non-adenoma controls
(n=49);
Adenoma cases (n=50);
CRC cases (n=9): Invasive
cancer cases
(n=3)/Carcinoma in
adenoma cases (n=6)
16S rRNA
sequencing;
T-RFLP
16S rRNA gene amplification and sequencing:
341F/806R primers for V3-V4 region using the
Illumina MiSeq platform;
Sequence process: fastqjoin program
Microbial identification: Metagenome@Kin software
from the TechnoSuruga Lab Microbial identification
Database DB-BA9.0
[431]
Baxter et al., 2016 Fecal samples Prior to colonoscopy
preparation or 1-2
weeks after
colonoscopy;
Frozen samples
Non-colonic lesion
controls (n=172);
Adenomas cases (n=198):
Non-advanced adenomas
cases (n=89)/Advanced
adenomas cases (n=109);
CRC cases (n=120): Stage I
cases (n=39)/Stage II cases
(n=35)/Stage III cases
(n=36)/Stage IV cases
(n=10)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing:
Custom barcoded primers for V4 region using the
Illumina MiSeq platform;
Sequence process: MOTHUR;
OTUs assignment: naïve Bayesian rRNA classifier
trained against the RDP training set;
Taxonomic classification: the SILVA database
[341]
Hale et al., 2017 Fecal samples 3 months before
colonoscopy;
Frozen samples
Non-adenoma controls
(n=547);
Adenoma cases (n=233)
(excluded hyperplastic
polyps/adenomas
<1cm/CRC cases)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing:
Performed on the Illumina MiSeq platform;
Sequence process, OTUs assignment and Taxonomic
classification: the IM-TORNADO bioinformatics
pipeline
[429]
Flemer et al., 2017 Mucosal biopsy
(CRC/polyps/controls);
Fecal samples
(CRC/controls)
Mucosal biopsy
(RNAlater treated):
at procedure;
Fecal samples
(Frozen): prior to
the bowel
preparation
Mucosal biopsy: Controls
(n=56)/Polyps cases
(n=21)/CRC cases (n=59);
Fecal samples: Age-
matched healthy controls
(n=37)/CRC cases (n=43)
16S rRNA
sequencing
16S rRNA gene amplification and sequencing:
Primers for V3-V4 region using the Illumina MiSeq
platform;
Sequence process: QIIME pipeline;
Chimeras check: ChimeraSlayer;
OTUs assignment: MOTHUR against the 16S rRNA
reference of RDP v14
[427]
Yoon et al., 2017 Normal rectal mucosal
biopsy
(Korean)
At colonoscopy Sex- and age (within 3
years)-matched for each
group:
Healthy controls (n=6);
Advanced adenomas cases
(n=6);
Sessile serrated adenomas
cases (n=6);
CRC cases (n=6)
16S rRNA
pyrosequencing
16S rRNA gene amplification and sequencing:
9F/541R primers for V1-V3 region for barcoded 454
pyrosequencing;
Chimera check: Bellerophon method against BLASTN;
Taxonomic classification: the EzTaxon-e database
[433]
Abbreviations: DGGE, denaturing gradient gel electrophoresis; RISA, ribosomal intergenic spacer analysis; FISH, fluorescence in situ hybridization; T-RFLP, terminal
restriction fragment length polymorphism; OTU, operational taxonomic unit; RDP, Ribosomal Database Project; RT-qPCR, real-time quantitative PCR; HITChip, the
Human Intestinal Tract Chip; SCFA, short-chain fatty acid;
Bold Studies: Selected studies based on the similarity to our colorectal polyp microbiome study in California twins (Chapter 6)
Table 5.3. Findings on fecal microbiota diversity and composition associated with colorectal polyps from 8 published studies with similar
study design and methods. Differences are between colorectal adenoma cases and polyp-free controls unless otherwise specified.
Study α Diversity β Diversity Taxa increased in Cases Taxa decreased in Cases Reference
Chen et al., 2013 Shannon Index: Similar
Chao1 index: Similar
PCA (Unweighted UniFrac):
p<0.001
Genus level: Enterococcus, Streptococcus Genus level: Bacteroides, Eubacterium,
Roseburia, Clostridium
[420]
Mira-Pascual et
al., 2014
NA PCoA (UniFrac): different,
but not as informative as
biopsies
Not found Not found [356]
Zackular et al.,
2014
NA NA OTU level: Ruminococcaceae, Clostridium
(OTU 60), Pseudomonas,
Porphyromonadaceae
OTU level: Bacteroides, Lachnospiraceae,
Clostridiales, Clostridium (OTUs 20/97/99)
[426]
Feng et al., 2015 * Richness (gene count/genus
count): increased in
advanced CA (p<0.05);
α-diversity (gene/genus):
increased in advanced CA
(p>0.05)
NA Species level: Bacteroides massiliensis,
Paraprevotella clara, Bacteroides dorei
Species level: Bifidobacterium animalis,
Streptococcus mutans
[424]
Goedert et al.,
2015
Richness and α-diversity
(Observed species, Chao 1
and PD whole tree, but not
Shannon index): slightly
higher in CA (all p>0.05)
Unweighted/Weighted
UniFrac distance: not
different (p>0.05)
Phylum level: Proteobacteria, TM7 Not found [428]
* The results were from metagenomic sequencing and were not taken into account for Table 5.4.
Peters et al., 2016 Richness (number of OTUs):
decreased in CA (p=0.03),
increased in HP (p=0.09),
not differ from non-
advanced CA;
α-diversity (Shannon
index): decreased in CA
(p=0.09), increased in HP
(p=0.07), not differ from
non-advanced CA
Unweighted/Weighted
UniFrac distance: distal CA
and advanced CA cases
differ from controls
(p<0.05);
Other case groups were not
different from controls.
CA vs. Controls:
Class level: Bacilli, Gammaproteobacteria;
Order level: Enterobacteriales;
Genus level: Actinomyces,
Corynebacterium, Streptococcus,
Clostridiales/NA/NA,
[Tissierellaceae]/Peptoniphilus,
Phascolarctobacterium, Sutterella, Dorea;
HP vs. Controls:
Genus level: Anaerostipes
CA vs. Controls:
Class level: Clostridia;
Family level: Ruminococcaceae,
Clostridiaceae, Lachnospiraceae;
Genus level: [Mogibacteriaceae]/NA,
RF39/NA/NA, Coprobacillus;
HP vs. Controls:
Class level: Gammaproteobacteria;
Genus level: Coprobacillus;
SSA vs. Controls:
Class level: Erysipelotrichi
[432]
Baxter et al., 2016 NA NA Not found Family level: Lachnospiraceae
[341]
Hale et al., 2017 Richness (number of OTUs)
and α-diversity (Shannon
index): decreased in CA
(p>0.05)
PCoA
(Unweighted/Weighted
UniFrac): different (p<0.05)
Phylum level: Bacteroidetes;
Class level: Bacteroidia,
Betaproteobacteria, Deltaproteobacteria;
Order level: Bacteroidales,
Burkholderiales;
Family level: Alcaligenaceae;
Genus level: Bilophila, Mogibacterium,
Sutterella
Phylum level: Firmicutes, Tenericutes,
Cyanobacteria;
Class level: Mollicutes, Clostridia, OPB41,
Bifidobacteriales, Actinobacteria, Bacilli,
Lentisphaeria;
Order level: RF39, Pasteurellales,
RsaHF231, Clostridiales, Bifidobacteriales;
Family level: Pasteurellaceae,
Clostridiaceae;
Genus level: Haemophilus, Veillonella,
Clostridium, Streptococcus
[429]
Abbreviations: PCA, principal component analysis; PCoA, principal coordinates analysis; NA, not available; CA, colorectal adenomas; HP, hyperplastic polyps; SSA,
sessile serrated adenomas
Table 5.4. List of fecal bacteria associated with colorectal polyps that have been reported from at least
2 independent studies selected from 7 fecal microbiota studies based on 16S rRNA gene sequencing
(Table 5.3).
Taxa Enriched in Polyp Cases Depleted in Polyp Cases
Class
Bacilli Peters et al., 2016 Hale et al., 2017
Clostridia
Peters et al., 2016
Hale et al., 2017
Order
Clostridiales Peters et al., 2016 Zackular et al., 2014
Hale et al., 2017
RF39
Peters et al., 2016
Hale et al., 2017
Family
Clostridiaceae Peters et al., 2016
Hale et al., 2017
Lachnospiraceae
Mira-Pascual et al., 2014
Zackular et al., 2014
Peters et al., 2016
Baxter et al., 2016
Ruminococcaceae Zackular et al., 2014 Peters et al., 2016
Genus
Bacteroides
Chen et al., 2013
Zackular et al., 2014
Clostridium
Chen et al., 2013
Hale et al., 2017
Zackular et al., 2014 * Zackular et al., 2014 *
Sutterella Peters et al., 2016
Hale et al., 2017
Streptococcus Chen et al., 2013
Peters et al., 2016 Hale et al., 2017
* The same genus name but from different OTUs (likely different families).
Chapter 6 Persistent Fecal Microbiome Alteration among California Twins
with Colorectal Adenomas History
6.1 Abstract
Background: The gut microbial community has recently been reported to be involved in colorectal cancer (CRC)
initiation and progression. However, less attention has been paid to the role of intestinal
bacteria in relation to colorectal polyps, a potential precursor of CRC. Because the gut microbiota is
shaped in early life and by genetic background, and can be strongly affected by other factors such
as age, diet, and medical conditions, we conducted a cross-sectional study to profile fecal microbial
composition as a surrogate of the gut microbiota in a set of monozygotic (MZ) twins, matched on genetics and
early childhood environment, whose previous polyp status was identified from their most recent colonoscopy.
Methods: We enrolled 95 individual twins from the California Twin Program (CTP). Of these, 49
(representing 35 twin pairs) who provided fecal samples had documented past colonoscopies and no
gastrointestinal diseases, cancer or recent antibiotic use; they comprised 8 with hyperplastic polyps (HP), 8 with
colorectal adenomas (CRA) and 33 controls with no polyp history (NP). The V4 hypervariable region of
the 16S rRNA gene was sequenced using the HiSeq 2500 platform and processed using Quantitative
Insights Into Microbial Ecology (QIIME) pipeline v1.8.0. Alpha diversity (number of operational
taxonomic units (OTUs), Shannon index, and Phylogenetic diversity (PD) whole tree) and beta diversity
(weighted and unweighted UniFrac distances) were calculated after rarefaction for diversity
comparisons by polyp status. Differentially abundant taxa associated with polyp status were identified
by comparing relative abundances of each taxon using a zero-inflated negative binomial mixed model
with a random effect taking twin pair status into account.
Results: CRA cases had higher richness in stool than NP controls, an association that was stronger
among those whose colonoscopy was performed within 3 years before fecal collection (p = 0.02). In terms
of overall microbiota composition, CRA cases differed significantly from NP controls (p = 0.02). In
taxon-based compositional analysis, fecal samples of CRA cases were depleted in a group of taxa including
the families Streptococcaceae, Micrococcaceae and Bacteroidaceae, and the genera Desulfovibrio, Rothia,
Streptococcus and Bacteroides, but enriched in the family Peptostreptococcaceae and the genus
Klebsiella. HP cases did not differ in diversity or composition from controls or CRA cases, though they
tended to be at an intermediate stage between controls and CRA cases. Fewer taxa were differentially
abundant between HP cases and controls.
Conclusions: Our results indicate that alterations in the fecal microbiome among colorectal
adenoma patients may persist even years after colonoscopy and polypectomy. These findings may have
implications for developing biomarkers for CRC prevention and treatment strategies.
6.2 Introduction
Colorectal cancer (CRC) is the third most common cancer in both men and women [1] and the
second leading cause of cancer-related deaths in the United States [2]. It is generally accepted
that the majority of CRCs arise from colorectal polyps, in particular colorectal adenomas and serrated
adenomas [3]. In the United States, nearly 19% of the general population is estimated to develop
colorectal polyps during their lifetime [4], and the risk increases up to 40% among people over the age
of 60 [5]. Colorectal adenomas and hyperplastic polyps are the two most common colorectal polyps
detected by screening; the former have the potential to progress to CRC, while the latter are
generally considered benign tissue growths [3, 6, 7]. Observational studies have shown that persons
with a previous history of colorectal adenomas have a higher risk of developing CRC than those with no
history of any polyps [8, 9].
The malignant evolution from colorectal polyps to CRC has been referred to as the “adenoma-
carcinoma sequence” [10], in which loss of the tumor suppressor gene APC (adenomatous polyposis coli)
has been associated with the initiation of aberrant crypt foci from normal mucosa, followed by
activation of the oncogene K-ras and inactivation of the tumor suppressor gene p53, which are commonly
observed in histological lesions such as small adenomas and large polyps, and then by alteration of the
molecular pathways involved in chromosomal and microsatellite instability, mismatch repair, and CpG
island methylation, leading to malignant adenocarcinoma [10, 11]. During this stepwise process, host
genotype and environmental factors (e.g. diet or lifestyle) can act at any step and modulate the
direction as well as the pace of the evolution, resulting in progression towards carcinoma,
stabilization at an intermediate condition, or even regression to a less malignant stage [10]. As a
result, although every colorectal polyp has malignant potential, only 2-5% of polyps will actually
develop into an invasive cancer [12], which generally takes from 5 to more than 20 years [13, 14].
Furthermore, colorectal polyps can be detected and removed using screening procedures, such as
colonoscopy and polypectomy, thus reducing CRC risk and potentially preventing 60% of deaths from CRC
[15]. However, epidemiological evidence suggests that 20-50% of patients will develop a metachronous
colorectal polyp, with an increased risk of advanced neoplasm or cancer, detected at follow-up
colonoscopy within 3-5 years after colonoscopic removal of one or more adenomas [16-24]. Possible
explanations include individual characteristics, such as advanced age, unfavorable lifestyle and/or
genetic predisposition, or missed or incompletely resected lesions at the initial colonoscopy [23,
25-28]. However, knowledge is still lacking on which factors may facilitate or accelerate this malignant
transformation after polypectomy.
The gut microbiota, the diverse community of microbes residing in the gastrointestinal tract,
has been postulated as an indicator of or contributor to CRC tumorigenesis, through interactions with
host metabolic and immune systems as well as environmental factors. Significantly reduced tumor loads
have been found in genetically mutated mouse and rat models of CRC susceptibility raised in the
absence of the intestinal microbiota compared with those raised under conventional conditions [29-31].
Technological advancements in culture-independent methods, next-generation sequencing and
bioinformatics tools have opened up human microbiome research, revealing an extremely dynamic
microbial community in the human gut, whose composition and functions assemble and develop during the
early years after birth and then remain relatively stable, with temporal variations, over the adult
lifespan of an individual [32-34]. A number of factors can greatly affect the gut microbiota, including
age [32-34], host genotype [35-39], medication use (e.g. antibiotics) [40-43] and diet [44-47].
Interestingly, those factors are also recognized as risk factors for CRC as well as colorectal polyps
[48, 49]. Recently, several bacterial species, such as Fusobacterium nucleatum, Streptococcus
bovis/gallolyticus, Bacteroides fragilis, Escherichia coli, and Enterococcus faecalis, have been
associated with the initiation and progression of CRC in both human and animal studies [50, 51]. The
suggested underlying mechanisms are that bacteria, or the metabolites and virulence factors they
produce, can permeabilize the gut mucosal barrier and invade intestinal epithelial cells, and can induce
DNA damage and chromosomal instability as well as impair the DNA repair system, thus modulating host
defense and the immune system and activating carcinogenesis-promoting pathways [52].
Unlike the numerous studies focused on gut microorganisms and CRC, few studies have
investigated the role of the microbial community in colorectal polyps, and even fewer have addressed the
intestinal bacterial composition after colonoscopy and polypectomy. Recently, a bacterial “driver-
passenger” model [53] has been proposed on top of the colorectal adenoma-carcinoma sequence, in
which CRC can be initiated by “driver” bacteria, which are usually abundant at the polyp stage or even
earlier and are eventually replaced by “passenger” bacteria that thrive better in the new intestinal
microenvironment, thus promoting or suppressing CRC tumorigenesis. Such a dynamic hypothesis raises
new interest in characterizing the microbial community along the path from colorectal polyps towards
CRC, or even before the formation of polyps, to better discriminate the malignant potential of
colorectal polyps even after polypectomy. For this purpose, fecal samples, whose microbiota is a proxy
for the intestinal bacterial community, serve as a better tool than mucosal biopsy tissues, since they
are noninvasive and easy to collect frequently, even by the participants themselves. Recent studies
[54, 55] have shown that 16S rRNA-profiled microbial data from fecal samples, combined with clinical
data, can improve the accuracy of existing screening methods in discriminating between healthy
individuals and patients with colorectal polyps or CRC. Here, by further controlling for host genetic
background and early life exposures, we conducted a cross-sectional study to profile the diversity and
composition of the fecal microbiota in a set of monozygotic (MZ) twins whose previous polyp history (no
polyp, colorectal adenomas or hyperplastic polyps) was identified based on the most recent colonoscopy.
6.3 Material and Methods
6.3.1 Study Population
The study is nested in the California Twin Program (CTP), developed and maintained at the
University of Southern California. CTP is a population-based cohort of twins born in California
between 1908 and 1982. The development and representativeness of CTP have been described
elsewhere [1, 2]. Briefly, 256,616 multiple births were identified through birth records and linked with
California Department of Motor Vehicles (DMV) records for address information between 1990 and 2001,
yielding 161,109 matched individuals. A 16-page screening questionnaire was mailed to the 115,733
matched individuals with valid addresses to collect information on demographic characteristics (such
as age, sex, zygosity, education, height, weight), childhood growth and development, reproductive
history, medical history including diagnosed diseases in twins and their family members and personal
medication use, dietary preference and frequency, and lifestyle (such as smoking, alcohol consumption
and exercise). A total of 52,262 individuals, comprising 14,827 double-respondent twin pairs (both
members of the pair responded; 5,542 MZ pairs, 8,764 DZ pairs and 521 pairs of unknown zygosity) and
22,608 single-respondent twin pairs (only one member of the pair responded; 5,974 MZ pairs, 15,930 DZ
pairs and 704 pairs of unknown zygosity), completed and returned the questionnaires. The crude overall
response rate was 45.2%, which is comparable with or higher than that of similar cohort studies [3]. In
1998, an updated version of the questionnaire with more detailed information on development, diet and
medical history was sent to a subset of twins born from 1957 to 1982. In comparison with census data
and California multiple birth records, the responding participants were representative of native
California twins with regard to age, sex, zygosity and residential distribution [1, 2].
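The respondent counts and the crude response rate quoted above can be cross-checked with a few lines of arithmetic. The sketch below is only an illustrative check using the numbers stated in the text; the variable names are ours, and it assumes each double-respondent pair contributes two respondents and each single-respondent pair one.

```python
# Counts quoted in the text above.
mailed = 115_733
double_pairs = {"MZ": 5_542, "DZ": 8_764, "unknown": 521}
single_pairs = {"MZ": 5_974, "DZ": 15_930, "unknown": 704}

n_double_pairs = sum(double_pairs.values())  # pairs in which both twins responded
n_single_pairs = sum(single_pairs.values())  # pairs in which only one twin responded

# Assumption: two respondents per double-respondent pair, one per single-respondent pair.
respondents = 2 * n_double_pairs + n_single_pairs
response_rate = respondents / mailed

print(n_double_pairs, n_single_pairs, respondents)
print(f"{100 * response_rate:.1f}%")
```

Under this assumption the zygosity subtotals sum to the stated pair counts, and the respondent total divided by the number of mailed questionnaires reproduces the reported crude response rate.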
In 2013, 626 double-respondent MZ twin pairs were identified based on information from the CTP
screening questionnaire collected in the 1990s. The selection criteria were: 1) twin pairs discordant
for self-reported colon polyp status; or 2) both twins had at least one recent examination,
including complete physical exam, chest X-ray, skin exam for cancer, stool exam for cancer, prostate test
(men only), or mammogram or PAP smear test (women only); or 3) age of 45 years or older in 2013.
Introduction letters inviting participation in this study were mailed to a set of 453 twins with valid
addresses. Of them, 161 twins were successfully contacted. An initial screening questionnaire was
completed by telephone to assess the inclusion criteria: self-reported history of colonoscopy within 10
years; no history of colorectal cancer, autoimmune disease, diabetes, HIV/AIDS, diverticulitis,
inflammatory bowel disease (IBD), celiac disease, ulcer, or gastrointestinal reflux; and no history of
surgery, major changes in diet, or any antibiotic or steroid use within three months. Among the 161
twins contacted, 39 refused and 19 were found ineligible. A package containing the stool specimen
collection kit, questionnaires to collect covariates and dietary information, HIPAA forms and informed
consent forms was mailed to the 103 eligible twins who verbally agreed to participate. To the twins who
completed the questionnaires and/or returned the stool specimen (1st collection), another stool kit and
follow-up questionnaires for updated covariate and dietary information were sent 3 months after their
1st collection (2nd collection). In total, 95 twins (55 pairs: 40 double-respondent twin pairs and 15
single-respondent twin pairs) returned the signed consents and questionnaires (Figure 1a). Of them, 93
twins returned the 1st collection of fecal samples and 57 of them completed the 2nd collection.
Meanwhile, we retrieved colonoscopy reports from 67 twins. After excluding 8 fecal samples that failed
quality control after sequencing, 4 twins whose medical reports showed a history of gastrointestinal
(GI) diseases, and 7 twins who had a previous history of colon adenomas, 49 twins
(35 pairs: 14 double-respondent twin pairs and 21 single-respondent twin pairs) from the 1st collection
and 35 twins (23 pairs: 12 double-respondent twin pairs and 11 single-respondent twin pairs) from the
2nd collection were included in the final analyses (Figure 1b).
6.3.2 Colorectal Polyps Ascertainment
In the initial screening questionnaire, all invited participants were asked about their history of
colon polyps. After the 1st collection, participants who returned fecal samples and indicated a
colonoscopy history in the questionnaire were asked for the contact information of the doctor who
performed the colonoscopy, or the clinic where it was performed, so that we could collect the
colonoscopy reports and the corresponding pathology reports if a colorectal polyp was present. The
colonoscopy/pathology reports were successfully retrieved from 67 twins. The reasons for missing
reports included no colonoscopy at all (N=6), missing pathology report to confirm the histological type
(N=3), or failure to receive reports by the end of the study (N=19). We reviewed all available medical
reports and collected the following information: number of colonoscopy/pathology reports, date of
procedure, type and conditions of exam, personal or family history of colon polyps or CRC (if any),
number of polyps, grade of dysplasia, and size/location/histological type (adenomas, hyperplastic
polyps, serrated adenoma/polyps) for each polyp. The colorectal polyp status (hyperplastic polyps or
colorectal adenomas) was determined based on the most recent exam; the single case with mixed adenomas
was assigned to the adenoma group. After excluding 7 twins without sequence data, 4 twins with a history
of GI diseases and 7 twins with a history of adenomas prior to the most recent status (no polyps or
hyperplastic polyps), 33 controls with no polyps (NP), 8 cases with hyperplastic polyps (HP) and 8 cases
with colorectal adenomas (CRA) were identified and included in the analysis.
6.3.3 Fecal Samples Collection
A kit that fits over the toilet seat, with instructions and required materials, was mailed to
participants for fecal collection. On the self-selected collection day, the participant attached the stool
catching pouch to the toilet seat. After defecating normally, the participant used the scoop on the lid
of each pre-labeled and pre-coded tube to obtain an aliquot (approximately 1 g each) from one stool. Two
tubes were pre-loaded with 5 mL RNAlater™ (Qiagen, Gaithersburg MD). Specimens were immediately collected
in the 2 pre-labeled tubes (Tube A for protein or metabolomics assays; Tube B for DNA extraction), placed
in an insulated soft-sided lunch box, stored in the refrigerator at 4°C overnight, and shipped on dry ice
to the University of Southern California. In RNAlater™, fecal microbiota diversity and composition remain
stable for more than one week at room temperature [56, 57]. At the end of collection, 93 twins (54 pairs:
39 double-respondent twin pairs and 15 single-respondent twin pairs) had returned fecal samples for the
1st collection, and 57 of them (32 pairs: 25 double-respondent twin pairs and 7 single-respondent twin
pairs) had completed the 2nd collection by sending back the follow-up specimens. A total of 150 samples,
each with a unique code number and no personal identifier, were frozen at -80°C and shipped to the Ravel
lab at the Institute of Genome Sciences (IGS), University of Maryland, for DNA extraction and sequencing.
6.3.4 16S rRNA Gene Sequence Analysis
Following microbial cell lysis, microbial DNA was extracted from the specimens for amplification of
16S rRNA sequences and stored for future analyses; the procedures have been described previously [58].
The universal primers 515F and 806R were used for PCR amplification of the V4 hypervariable region of the
16S rRNA gene. The purified amplicon mixtures, together with four negative controls to exclude
contamination during the PCR amplification process, were sequenced on the Illumina HiSeq 2500 platform.
Post-sequencing processing and analysis were performed using the Quantitative Insights Into
Microbial Ecology (QIIME) pipeline version 1.8.0 [59, 60].
The raw sequences were processed to concatenate forward and reverse reads and to match end
sequences and barcodes. Low-quality sequence reads were discarded, including reads with an average
Phred quality score below 20, paired reads in which at least one read was shorter than 75% of its
pre-trimming length, and reads with less than 60% similarity to the reference taxonomic
database. Chimeric sequences, which are PCR artifacts composed of two or more distinct template
sequences, were detected and removed using UCHIME [61]. Remaining sequences were clustered into
species-level operational taxonomic units (OTUs) at 97% identity using the open-reference
method [61] and the USEARCH61 algorithm [62], and then assigned taxonomic labels using the
Ribosomal Database Project (RDP) Bayesian classifier [63], resulting in 12 phyla, 25 classes, 45 orders, 97
families, 267 genera and 396 species. The GreenGenes taxonomic database version 13_8 [64] was the
reference database for both quality control and taxonomic assignment. To estimate phylogenetic
distance, OTU sequences were aligned with the PyNAST algorithm [65] and the phylogenetic tree was
constructed using the FastTree method [65]. OTUs with only one read or present in only one sample, as
well as 8 samples with fewer than 2000 reads, were excluded from further analyses. The number of
sequences obtained ranged from 37,994 to 436,864 per sample.
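The OTU and sample filtering rule described above can be illustrated with a minimal sketch. This is not the QIIME implementation: the function name, the dictionary-based table layout, the toy counts, and the order of the two steps (samples first, then OTUs) are all assumptions made for illustration.

```python
from collections import defaultdict

def filter_otu_table(table, min_sample_reads=2000):
    """Filter a table that maps sample ID -> {OTU ID: read count}.

    Hypothetical helper: drops samples with fewer than `min_sample_reads`
    reads, then drops OTUs with only one read or seen in only one sample.
    """
    # Step 1 (assumed to come first): remove shallow samples.
    kept = {s: otus for s, otus in table.items()
            if sum(otus.values()) >= min_sample_reads}

    # Step 2: tally total reads and number of samples for each OTU.
    total_reads = defaultdict(int)
    n_samples = defaultdict(int)
    for otus in kept.values():
        for otu, count in otus.items():
            if count > 0:
                total_reads[otu] += count
                n_samples[otu] += 1

    keep_otus = {o for o in total_reads
                 if total_reads[o] > 1 and n_samples[o] > 1}
    return {s: {o: c for o, c in otus.items() if o in keep_otus and c > 0}
            for s, otus in kept.items()}

# Toy data (invented counts, not from the study).
toy = {
    "S1": {"otu_a": 1500, "otu_b": 600, "otu_c": 1},
    "S2": {"otu_a": 2500, "otu_b": 0, "otu_d": 40},
    "S3": {"otu_a": 900, "otu_b": 10},
    "S4": {"otu_a": 1800, "otu_b": 300, "otu_d": 5},
}
filtered = filter_otu_table(toy)
print(sorted(filtered))
```

In the toy table, sample S3 falls below the read threshold and otu_c is a singleton, so both are removed.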
To measure the diversity of OTUs within a sample, alpha diversity was estimated as the number
of observed OTUs (for richness), the Shannon index (for richness and evenness, incorporating the
relative abundance of OTUs) [66], and phylogenetic diversity (PD) whole tree (using information from the
phylogenetic tree) [67]. To evaluate the phylogenetic similarity between the microbial communities of two
samples, the UniFrac distance, a beta diversity measure, was calculated from the branch lengths they
share on the phylogenetic tree; the weighted UniFrac distance takes the abundance of taxa into account
while the unweighted UniFrac distance does not [68, 69]. To remove bias due to unequal
sequencing depth across samples in diversity comparisons, the data were rarefied by randomly sampling
37,994 sequences from each sample 20 times without replacement. Both alpha diversity and beta
diversity were calculated by averaging over the 20 rarefied tables. Relative abundance was estimated
from the unrarefied data as the proportion of reads assigned to a taxon in each sample.
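The rarefaction and averaging procedure can be sketched as follows. This is a simplified illustration rather than the QIIME code: the function names, the toy counts, the rarefaction depth of 200 and the use of a base-2 logarithm for the Shannon index are assumptions, and PD whole tree and UniFrac are omitted because they require a phylogenetic tree.

```python
import math
import random

def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from {OTU: count}."""
    pool = [otu for otu, n in counts.items() for _ in range(n)]
    if len(pool) < depth:
        raise ValueError("sample has fewer reads than the rarefaction depth")
    sub = {}
    for otu in rng.sample(pool, depth):
        sub[otu] = sub.get(otu, 0) + 1
    return sub

def observed_otus(counts):
    """Richness: number of OTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)

def shannon(counts, base=2.0):
    """Shannon index; the log base is an assumption of this sketch."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total, base)
                for n in counts.values() if n > 0)

def mean_alpha(counts, depth, n_iter=20, seed=0):
    """Average richness and Shannon index over `n_iter` rarefied tables."""
    rng = random.Random(seed)
    tables = [rarefy(counts, depth, rng) for _ in range(n_iter)]
    return (sum(observed_otus(t) for t in tables) / n_iter,
            sum(shannon(t) for t in tables) / n_iter)

# Toy sample (invented counts); the study rarefied to 37,994 reads.
sample = {"otu_a": 500, "otu_b": 300, "otu_c": 150, "otu_d": 45, "otu_e": 5}
richness, h = mean_alpha(sample, depth=200)

# Relative abundance is taken from the unrarefied counts.
rel_abund = {otu: n / sum(sample.values()) for otu, n in sample.items()}
print(richness, round(h, 3))
```

Averaging over repeated subsamples reduces the noise introduced by any single random draw, which matters most for rare OTUs such as otu_e in the toy sample.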
6.3.5 Covariate Measures
Along with the fecal sample collection kit at each collection, a demographic questionnaire and
a dietary questionnaire were sent to the participating twins. The demographic information included age,
sex, race/ethnicity, current height and weight, recent use of probiotics and yogurt, and family history of
CRC. Years since last colonoscopy was calculated as the time interval between the date of fecal
collection and the date of the most recent colonoscopy.
The dietary questionnaire was adapted from the dietary section of the original CTP
questionnaire, consisting of a set of questions on preference and frequency for a list of food items. Food
frequency was used to evaluate the overall consumption of red meat (beef and bacon, as well as pork,
which was only available in the updated version of the questionnaire), vegetables (beans, tomato,
broccoli, spinach, greens and carrots), and fruits or juice by calculating the average servings consumed
per week for each twin, which were then dichotomized at the sex-specific median. A score of 1 was
assigned to twins who consumed the food category at a level equal to or greater than the median, and
a score of 0 to the others. In addition, a follow-up questionnaire was sent to the
participating twins to collect additional information on their dietary changes after the colonoscopy.
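The dichotomization at the sex-specific median can be sketched as below. The function name, the tuple layout and the toy serving values are hypothetical; only the rule itself (score 1 if at or above the median for that sex, otherwise 0) comes from the text.

```python
from statistics import median

def dichotomize_by_sex(servings):
    """servings: list of (twin_id, sex, weekly_servings) tuples.

    Returns {twin_id: 1 or 0}, where 1 means consumption at or above
    the median among twins of the same sex.
    """
    by_sex = {}
    for _, sex, value in servings:
        by_sex.setdefault(sex, []).append(value)
    cutoffs = {sex: median(values) for sex, values in by_sex.items()}
    return {twin: int(value >= cutoffs[sex]) for twin, sex, value in servings}

# Toy data (invented values, not from the study).
red_meat = [
    ("T01", "F", 2.0),
    ("T02", "F", 5.0),
    ("T03", "F", 3.5),
    ("T04", "M", 6.0),
    ("T05", "M", 1.0),
]
scores = dichotomize_by_sex(red_meat)
print(scores)
```

Because the cutoff is computed within each sex, a given number of weekly servings can score 1 for one sex and 0 for the other.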
6.3.6 Statistical Methods
All statistical analyses were performed in R version 3.3.0 [70].
Both single-respondent twins (only one member of a twin pair participated) and double-
respondent twin pairs (both members of a twin pair participated) were included in the analyses. To
account for the correlation between twins within the same pair, a random effect term for twin pair ID
was included in the mixed effect models, unless stated otherwise. Because of the small number in
each polyp group and the absence of apparent fecal microbiota differences in either diversity or overall
composition by any covariate, no covariates were adjusted for in the models; however, some of them,
such as age, race/ethnicity, sex, genetic background and unmeasured early childhood exposures,
were matched within the double-respondent MZ twin pairs. The main analyses were performed using the
fecal microbial data from the first collection to maximize statistical power. The second collection
was used to test microbial stability between the two collections and served as a sensitivity analysis.
To assess the association between polyp status and alpha diversity, we fitted a mixed ANOVA
model for each alpha diversity measure in both the full data and a subset limited to individuals
whose fecal samples were collected within 3 years after colonoscopy. Differences in beta diversity
by polyp status were visualized in a two-dimensional plot using principal coordinates analysis
(PCoA) based on the weighted UniFrac distance, in which each axis (principal coordinate)
is labeled with the percent of variation it explains. Whether the first two principal coordinates
jointly differed between any two polyp statuses was tested by the likelihood ratio test
(LRT) in mixed logistic models. The PCoA plot was generated using the R “vegan” and “phyloseq” packages.
P-values < 0.05 were considered significant.
To identify taxa associated with polyp status while accommodating the sparse, non-normally
distributed count data, we conducted differential abundance analysis at the phylum, class, order, family,
and genus levels, using a zero-inflated negative binomial mixed model (R “glmmADMB” package)
(Appendix A). The false discovery rate (FDR) was controlled using the Benjamini-Hochberg (B-H) procedure
[70] (function “p.adjust” in the R “stats” package) to correct for multiple testing at each taxonomic
level, and FDR-adjusted P values < 0.1 were considered significant. To quantify the effect size of the
differentially abundant taxa between any two polyp statuses, the estimated coefficients with 95%
confidence intervals (CIs) from the models were reported, which can be interpreted as the logarithm of
the relative difference in relative abundance between two polyp statuses after controlling for twin pair
status. This analysis was limited to taxa present in at least 25% of samples.
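The Benjamini-Hochberg adjustment applied at each taxonomic level can be sketched in a few lines. The analysis itself used R; the Python function below is an illustrative re-implementation of the standard step-up procedure, and the raw p-values are invented for the example.

```python
def benjamini_hochberg(pvalues):
    """Return B-H adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for five taxa at one taxonomic level.
raw = [0.001, 0.008, 0.039, 0.041, 0.60]
adj = benjamini_hochberg(raw)
significant = [p < 0.10 for p in adj]  # threshold used in this chapter
print([round(p, 5) for p in adj], significant)
```

Adjusting separately within each taxonomic level, as described above, means the number of tests m is the number of taxa tested at that level rather than the total across all levels.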
To assess microbial stability between the two fecal collections taken 3 months apart, regardless of
polyp status, intraclass correlation coefficients (ICC) (function “ICC” in the R “psych” package) were
calculated for both alpha diversity (number of OTUs, Shannon index and PD whole tree) and relative
abundances of the common genera among individuals who completed both collections. An ICC between 0.4
and 0.59 is generally considered a fair correlation, a value above this range an excellent one, and a
value below it a poor one. PCoA based on the weighted UniFrac distance was
also used to test the difference in beta diversity between the two collections by LRTs in conditional
logistic regression models. Additionally, the similarities in fecal microbial communities between the two
collections from the same individual (“Within individual”, N=35), between twins in the same twin pair
(“Within twin pair” from the 1st collection, N=14), and between 2 individuals who are not each other’s
twin, obtained by randomly pairing non-twin individuals (“Non-twin pair” from the 1st collection,
N=1162), were measured using unweighted UniFrac distances, and differences between groups were tested
using ANOVA with Tukey’s method to control for multiple testing. P-values < 0.05 were considered
significant.
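As an arithmetic aside, N=1162 equals the number of all possible pairings of two individuals from different twin pairs among the 49 first-collection participants (14 double-respondent pairs plus 21 single-respondent twins). The sketch below only verifies this count; the IDs are invented, and it is not a statement of how the pairs were actually generated.

```python
from itertools import combinations

def non_twin_pairs(pair_of):
    """All unordered pairs of individuals who belong to different twin pairs.

    pair_of maps individual ID -> twin pair ID (hypothetical layout).
    """
    return [(a, b) for a, b in combinations(sorted(pair_of), 2)
            if pair_of[a] != pair_of[b]]

# Build 14 double-respondent pairs and 21 single-respondent twins (invented IDs).
pair_of = {}
for k in range(14):
    pair_of[f"D{k:02d}A"] = f"D{k:02d}"
    pair_of[f"D{k:02d}B"] = f"D{k:02d}"
for k in range(21):
    pair_of[f"S{k:02d}A"] = f"S{k:02d}"

pairs = non_twin_pairs(pair_of)
print(len(pair_of), len(pairs))
```

The count follows from 49 × 48 / 2 = 1176 possible pairs minus the 14 within-twin pairs.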
Furthermore, to test whether individuals who had previously been diagnosed with colorectal adenomas
but had none at the most recent colonoscopy differed from those who were first diagnosed with
adenomas at the most recent colonoscopy, the main analyses were repeated by
comparing the previous adenomas (PA) group to the non-polyp (NP) group as well as to the original
colorectal adenomas (CRA) group.
6.4 Results
6.4.1 Study Participants Characteristics
We included a total of 49 colonoscopy-screened individuals in the current analysis, composed of 33
non-polyp controls (NP), 8 cases with hyperplastic polyps (HP), and 8 cases with colorectal adenomas
(CRA). In terms of twin pair status, these 49 individuals represented 35 twin pairs, including 21 single-
respondent twins and 14 double-respondent twin pairs. Of the 14 pairs, there were 1 HP/CRA discordant
pair, 4 NP/CRA discordant pairs and 9 NP/NP concordant pairs. Compared to NP controls, both HP cases
and CRA cases were slightly younger at participation, had consumed more probiotics or yogurt products
recently, and were more likely to have a family history of CRC (Table 6.1). In this study population,
over 80% of participants were white females, with more male participants among HP and CRA
cases. A slightly higher percentage of obese individuals was found among HP cases than among NP controls
and CRA cases (Table 6.1). The most recent colonoscopies obtained in this study were performed between
half a year and 13 years before fecal sample collection, and both polyp groups had been screened more
recently than NP controls (median 1.5 years for both HP and CRA cases vs. 5 years for NP controls prior
to participation; p HP vs. NP = 0.03 and p CRA vs. NP = 0.01, respectively) (Table 6.1). Although the
cases were younger, all participant groups had their last colonoscopy at the same median age of
approximately 52, which is consistent with the current recommendation to initiate CRC screening at age
50 years [71]. In terms of the fecal microbiota, the overall composition showed marginal separation by
participation age at 55 years (Figure 6.11, p = 0.08), with non-significantly higher alpha diversity
among the older participants (Table 6.6). In addition, more CRC cases were found among the families of
HP and CRA cases than those of NP controls (Table 6.1), and participants with a family history of CRC
had more diverse bacteria (Table 6.6, p observed OTUs = 0.04 and p Shannon = 0.06) but not significantly
different composition (Figure 6.11) in their stools compared to those without such a family history.
Polyp type and histology were assessed based on pathology reports. Of the 8 HP cases (Table 6.1),
2 had more than 2 polyps, one of whom had a polyp no smaller than 10 mm. Excluding 2 cases
lacking information, polyps located in the left colon were observed in 5 of the remaining 6 HP
cases. On the other hand, 10 polyps of varied sizes (diminutive to large) were found in one
case categorized as CRA; these had a mixed histology including hyperplastic polyps,
conventional colorectal adenomas and serrated adenomas, and were located along the colon and
rectum. Of the remaining 7 CRA cases, 3 had polyps in the right colon, 3 in the left colon, and
1 in the rectum. Additionally, high grade dysplasia was reported in one CRA case (Table 6.1).
Thus, based on the current risk stratification for colorectal adenomas [25, 27, 28, 72-74], 6 CRA cases
were considered at low risk for CRC, defined as 1-2 tubular adenomas less than 1 cm in size and
without high grade dysplasia, while the other two had advanced adenomas with greater
malignant potential.
6.4.2 Diversity Analysis
We first investigated microbial alpha diversity of the individuals according to polyp status. In
general, CRA cases tended to have the highest community diversity, followed by HP cases, with the
lowest found in NP controls, regardless of the alpha diversity measure (number of OTUs, Shannon
index, and PD whole tree) (Figure 6.2a). However, the differences were not significant. Since a longer
time interval since the previous colonoscopy might attenuate the differences, we restricted the
analysis to participants who had a colonoscopy within 3 years prior to fecal sample collection (Figure
6.2b). The diversity difference became more apparent between CRA cases and NP controls, but less
apparent between HP cases and NP controls. In particular, significantly greater richness (number of
OTUs) was found in the microbial community of CRA cases than in NP controls (p = 0.02).
To assess the overall fecal bacterial composition between samples, we used a PCoA plot based on
beta diversity (weighted UniFrac distance), which showed significantly different patterns between CRA
cases and NP controls (p = 0.02) (Figure 6.3). Although the coordinates of HP cases tended to fall
between those of CRA cases and NP controls in the plot, there was a marginally significant difference
between HP and CRA cases (p = 0.08) and no difference between HP cases and NP controls (p > 0.05)
(Figure 6.2).
6.4.4 Compositional Analysis
We next explored taxonomic features of the fecal microbiota by polyp status. First, across all 49
individuals, 3 phyla (Firmicutes, Actinobacteria and Bacteroidetes) dominated the fecal
samples, comprising 58%, 16% and 12% of the microbiota, respectively. For exploratory purposes,
relative abundances of the 15 most abundant genera across all samples were shown in a stacked box
plot by polyp status (Figure 6.4), in which the most abundant genus, Bacteroides, was located at the
bottom and the 15th most abundant genus at the top; the distance from the latter to the top boundary of
the plot (total relative abundance = 1 for each sample) was the sum of relative abundances across all
other genera. Visually, the average abundance of the genus Bacteroides decreased gradually from NP
controls to HP cases and then to CRA cases. The genus Collinsella tended to be more abundant in CRA
cases than in the other two groups, while the genus Methanobrevibacter
seemed depleted in HP cases compared to the other two groups. Additionally, incomplete
genus or family labels were probably due to missing taxonomic information in the
reference database or failure to align matched sequences; such taxa were presented at a higher
taxonomic level when possible.
Then, by fitting statistical models, we identified 17 taxa, ranked from phylum to genus, that
were differentially abundant (FDR-adjusted p < 0.10) in at least one of 3 pairwise
comparisons: CRA cases vs. NP controls, HP cases vs. NP controls, and CRA cases vs. HP cases (Table 6.2).
The resolution of the current 16S rRNA gene sequencing technique limits the ability to examine taxa at
the species level. Compared to NP controls, 2 taxa (the family Peptostreptococcaceae and the
genus Klebsiella) were more abundant in fecal samples from CRA cases, and 7 taxa (the families
Streptococcaceae, Micrococcaceae and Bacteroidaceae, and the genera Desulfovibrio, Rothia,
Streptococcus and Bacteroides) were depleted from the CRA fecal microbiota. Compared with the same
control group, HP cases exhibited a greater abundance of the phylum Firmicutes but a lower abundance
of 4 taxa (the class Alphaproteobacteria, the family [Odoribacteraceae], and the genera Actinomyces and
Butyricimonas). When we compared CRA cases to HP cases, 3 taxa (the class Betaproteobacteria, the
order Burkholderiales, and the family Peptostreptococcaceae) were overrepresented in CRA fecal
samples, while 4 taxa (the families Streptococcaceae and Micrococcaceae, and the genera Desulfovibrio
and Pseudomonas) were less abundant in adenoma cases. Furthermore, comparing the effect sizes
across comparisons, the lower abundances of the families Streptococcaceae and Micrococcaceae and the
genus Desulfovibrio, as well as the higher abundance of the family Peptostreptococcaceae, seemed to be
unique features of CRA cases, since these taxa were relatively similar between HP cases and NP controls
but significantly different in CRA cases. No such feature was found specific to HP cases.
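To make the FDR-adjusted threshold concrete, the following is a minimal sketch of the Benjamini-Hochberg adjustment. This section does not state which FDR procedure was applied, so the choice of Benjamini-Hochberg is an assumption, and the p-values are hypothetical rather than results from Table 6.2.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for five taxa in one pairwise comparison.
raw_p = [0.001, 0.008, 0.039, 0.041, 0.20]
adj_p = benjamini_hochberg(raw_p)
# Taxa passing the threshold used in this chapter (adjusted p < 0.10).
flagged = [i for i, q in enumerate(adj_p) if q < 0.10]
```

With these toy inputs the first four taxa pass the 0.10 threshold and the fifth does not.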
6.4.5 Fecal Microbiota Stability
The second fecal collection, 3 months after the first, provided an opportunity to
test the stability of the fecal microbiota between collections, as well as between individuals. There
were 35 individuals who completed both collections: 21 NP controls, 7 HP cases, and 7 CRA cases. In
terms of twin pairs, they came from 11 single-respondent twins and 12 double-respondent twin pairs,
which included 1 HP/CRA discordant pair, 4 NP/CRA discordant pairs, and 7 NP/NP concordant pairs.
Thus, single-respondent twins in the NP control group seemed to be the main source of loss to follow-up.
The distributions of demographic characteristics in this subset of participants were fairly similar to
those of the full sample from which they arose (Table 6.3), except that the remaining NP controls in
this subset seemed to have comparable consumption of probiotics or yogurt. Moreover, both diversity
analyses and taxonomic differential abundance tests were performed on the microbial data from this
second collection. We again observed increased alpha diversity for all three measures in CRA cases
compared to NP controls, but the differences were not significant, and no difference was found between
HP cases and NP controls (Figure 6.5a). Based on PCoA and the statistical tests, microbial composition
did not separate distinctly by polyp status. However, on visual inspection of the PCoA plot (Figure 6.5b),
the variance among CRA cases might have been driven by a single outlier, who was one of the advanced
adenoma patients. Unfortunately, we failed to reproduce any of the polyp-associated taxa that were
identified in the fecal microbiota from the first collection at a significance level of FDR-adjusted
p < 0.10 (Table 6.4).
We tested fecal microbial stability by calculating intraclass correlation coefficients (ICCs) between
the two fecal collections for both alpha diversity and taxonomic relative abundances. For the alpha
diversity measures (number of OTUs, Shannon index and PD whole tree), we pooled all 35 pairs of data
regardless of polyp status and found overall fair to excellent agreement between collections
(ICC_OTUs = 0.58, ICC_Shannon = 0.50 and ICC_PDtree = 0.67) (Figure 6.6a). Since variation in microbial
stability usually has more impact on the less abundant taxa, we plotted the ICCs calculated for the
genera shared by the microbial communities from the two collections against the mean relative abundance
of each genus across all samples, by polyp status (Figure 6.6b). As expected, we observed an overall
trend that more abundant genera were more stable between collections, with average ICCs ranging from
0.21 to 0.53. By polyp status, NP controls tended to have more consistent levels of genera with low
prevalence than either case group, while these less abundant genera seemed to be the least stable in HP
cases. Interestingly, these relative positions were reversed for genera with a mean relative abundance
greater than 10^-4, which were the most stable in HP fecal samples but the least stable in NP controls.
Furthermore, we compared the overall bacterial compositions between collections using PCoA on
weighted UniFrac distance (Figure 6.6c) and did not observe a significant difference (p = 0.31).
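As a sketch of how agreement between two collections can be quantified, the function below computes a one-way random-effects intraclass correlation, ICC(1,1), from paired measurements. The ICC form used in the analysis is not stated in this section, so the one-way model is an assumption, and the paired values are toy numbers rather than study data.

```python
def icc_oneway(pairs):
    """One-way random-effects ICC(1,1) for n subjects measured k times each."""
    n = len(pairs)
    k = len(pairs[0])
    grand_mean = sum(sum(p) for p in pairs) / (n * k)
    subject_means = [sum(p) / k for p in pairs]
    # Mean square between subjects and mean square within subjects.
    ms_between = k * sum((m - grand_mean) ** 2 for m in subject_means) / (n - 1)
    ms_within = sum(
        (x - m) ** 2 for p, m in zip(pairs, subject_means) for x in p
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Toy alpha-diversity values: (first collection, second collection) per person.
identical = [(100, 100), (150, 150), (200, 200)]
shifted = [(1, 2), (3, 4), (5, 6)]
icc_identical = icc_oneway(identical)
icc_shifted = icc_oneway(shifted)
```

An ICC near 1 means that most of the variance lies between individuals rather than between the two collections of the same individual.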
In addition, with a set of MZ twins, we were also able to compare intra-twin-pair (“Within twin pair”)
variation in fecal microbial composition with intra-individual (“Within individual”) and inter-individual
(“Non-twin pair”) variation using beta diversity, regardless of polyp status (Figure 6.6d). We confirmed
that the bacterial communities were more similar within the same individual over time than between
different individuals, regardless of whether they were twins or not (Within individual vs. Within twin
pair, p < 0.001; Within individual vs. Non-twin pair, p = 0.003). Meanwhile, the twin members of the same
pair were likely to share more fecal microbiota than any two unrelated individuals (Within twin pair vs.
Non-twin pair, p < 0.001).
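The three comparison groups can be assembled from a pairwise distance matrix as sketched below. The data layout, identifiers and distances are hypothetical. The actual analysis may also have restricted which sample pairs entered each group (for example, by collection), which this sketch does not attempt to reproduce.

```python
from itertools import combinations


def categorize_distances(samples, distance):
    """Group pairwise distances into the three comparison categories.

    samples:  {sample_id: (individual_id, twin_pair_id)}
    distance: {frozenset({sample_a, sample_b}): beta-diversity distance}
    """
    groups = {"Within individual": [], "Within twin pair": [], "Non-twin pair": []}
    for a, b in combinations(sorted(samples), 2):
        ind_a, pair_a = samples[a]
        ind_b, pair_b = samples[b]
        if ind_a == ind_b:
            label = "Within individual"   # same person, two collections
        elif pair_a == pair_b:
            label = "Within twin pair"    # co-twins
        else:
            label = "Non-twin pair"       # unrelated individuals
        groups[label].append(distance[frozenset((a, b))])
    return groups


# Toy data: s1 and s2 are two collections from one twin, s3 is the co-twin,
# and s4 is an unrelated participant. Distances are illustrative only.
toy_samples = {
    "s1": ("T1a", "P1"),
    "s2": ("T1a", "P1"),
    "s3": ("T1b", "P1"),
    "s4": ("U1", "P2"),
}
toy_distance = {
    frozenset(("s1", "s2")): 0.10,
    frozenset(("s1", "s3")): 0.30,
    frozenset(("s2", "s3")): 0.35,
    frozenset(("s1", "s4")): 0.60,
    frozenset(("s2", "s4")): 0.65,
    frozenset(("s3", "s4")): 0.70,
}
toy_groups = categorize_distances(toy_samples, toy_distance)
```

The resulting lists can then be compared with whichever two-sample test suits the distance distribution.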
6.4.6 Sensitivity Analysis - Previous Adenomas
Individuals with a history of colorectal polyps before the index
colonoscopy are usually excluded from other studies. In our study, we identified 5 participants (PA)
who had been shown to be free of colorectal polyps at their most recent colonoscopy, performed
within 2 years, but who had a history of adenomas 4-15 years earlier. Because our case definition was
based on the most recent colonoscopy, it was difficult to classify these PA individuals as cases or
controls. Thus, although including them in the analysis as controls did not change the findings, we
decided to exclude PA individuals from the main analysis and to test separately for possible differences
from our CRA cases and NP controls. Comparing PA individuals to NP controls, there were no significant
differences in either alpha diversity or beta diversity (Figure 6.7).
However, PCoA on weighted UniFrac distance displayed a distinct separation of the fecal microbiota
compositions between PA individuals and CRA cases. Examining the taxonomic relative abundances
(Table 6.5), we found that, compared to NP controls, fecal samples from PA individuals harbored more
microbes from the family Enterobacteriaceae in the order Enterobacteriales, as well as the genera
Peptostreptococcus and [Eubacterium], but fewer from an unidentified genus in the family
[Barnesiellaceae] and the genus Peptococcus. On the other hand, the microbiota in CRA cases
differed from that of PA individuals, with a more abundant genus Collinsella and two
unidentified genera from the families Peptostreptococcaceae and [Barnesiellaceae], and a less
abundant genus Finegoldia.
6.4.7 Sensitivity Analysis - Excluding the Genus Bacteroides
Compared to NP controls, we observed higher alpha diversity in the stools from CRA cases,
along with a reduced abundance of the genus Bacteroides. In this study, nine species, including an
unidentified group (B. unidentified, B. acidifaciens, B. caccae, B. coprophilus, B. eggerthii, B. fragilis, B.
ovatus, B. plebeius, and B. uniformis), were assigned to the genus Bacteroides. Of these, the unidentified
species group was the largest, accounting for nearly 8% of the fecal microbiota
across the samples. In relation to polyp status, the abundances of five Bacteroides species (B.
unidentified, B. caccae, B. fragilis, B. ovatus, and B. uniformis) were significantly decreased in the stools
of CRA cases compared to those of NP controls, while only B. caccae was less abundant in HP cases
than in NP controls (Figure 6.8, each p < 0.05).
Recent studies suggest that some Bacteroides species may produce antibacterial
peptides that limit pathogen colonization of the GI tract [104]. As a result, the less abundant Bacteroides
in CRA cases might permit the overgrowth of pathogens in the gut, leading to further
dysbiosis that could be observed as a more diverse microbiota in their fecal samples. To test this
hypothesis, we excluded all taxa from Bacteroides and then repeated the diversity analyses. After
removing Bacteroides, alpha diversity in relation to polyp status did not change greatly for either the
number of OTUs or the Shannon index. However, CRA cases seemed to have a more even distribution of
fecal bacteria than NP controls, particularly among those who had the colonoscopy within 3 years
(Figure 6.9, p for equitability = 0.08), suggesting possible overgrowth of rare species in response to
reduced control by Bacteroides.
Moreover, the PCoA based on the weighted UniFrac distance showed that, without Bacteroides, the
separation in the fecal bacterial community was attenuated between CRA cases and NP controls (Figure
6.10, p = 0.27), but slightly strengthened between HP cases and NP controls (Figure 6.10, p = 0.06). Thus,
the common genus Bacteroides may contribute more to the microbiota composition in CRA stools than
in HP stools.
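The effect of excluding one genus on richness, Shannon index and equitability can be sketched as follows. The OTU table, the OTU-to-genus mapping and the counts are hypothetical, and the natural logarithm is an assumption (equitability, being a ratio, does not depend on the log base). The sketch does not reproduce the rarefaction or other preprocessing of the actual pipeline.

```python
import math


def alpha_metrics(otu_counts):
    """Return (observed OTUs, Shannon index, equitability) for one sample."""
    counts = [c for c in otu_counts.values() if c > 0]
    total = sum(counts)
    shannon = -sum((c / total) * math.log(c / total) for c in counts)
    richness = len(counts)
    # Equitability (Pielou's evenness): Shannon divided by its maximum, ln(S).
    equitability = shannon / math.log(richness) if richness > 1 else 0.0
    return richness, shannon, equitability


def drop_genus(otu_counts, otu_genus, genus):
    """Remove every OTU assigned to `genus` before recomputing diversity."""
    return {otu: c for otu, c in otu_counts.items() if otu_genus.get(otu) != genus}


# Toy sample dominated by one Bacteroides OTU (counts are illustrative only).
toy_counts = {"otu1": 70, "otu2": 10, "otu3": 10, "otu4": 10}
toy_genus = {"otu1": "Bacteroides", "otu2": "genus_X", "otu3": "genus_Y", "otu4": "genus_Z"}

before = alpha_metrics(toy_counts)
after = alpha_metrics(drop_genus(toy_counts, toy_genus, "Bacteroides"))
```

In this toy sample, removing the dominant OTU leaves three equally abundant OTUs, so equitability rises to its maximum while richness drops by one.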
6.4.8 Diet and the Fecal Microbiota
Dietary habit is one of the major factors affecting the fecal microbiota [44-47]. However, our
study did not find significant differences in the overall fecal bacterial community by
probiotics/yogurt use or by consumption of red meat, vegetables or fruits/juice (Figure 6.11).
Although the microbiota tended to be more diverse in the stools of participants
who consumed more probiotics/yogurt or more vegetables, and less diverse in those who consumed more
red meat, none of these associations was statistically significant (Table 6.6).
Because the fecal sample collection and the dietary habit assessment were conducted after the
colonoscopy, dietary changes after the diagnosis of polyps, or even after the procedure alone, could be
one possible explanation for the fecal microbiota differences between polyp groups. Therefore, a
follow-up dietary change questionnaire was sent to all participating twins to evaluate whether and how
they changed their dietary habits after the colonoscopy. Of the 49 study participants, 42 twins
completed and returned the survey, an 85.7% response rate. Of them, 24 twins (57%)
indicated changes in their diets. Most of the changes reflected a healthier dietary pattern, including
less beef consumption but more vegetables, fruits, yogurt and/or probiotics. However, such
changes did not differ by polyp status (Table 6.7).
6.5 Discussions
In this cross-sectional study, which used a set of MZ twins from the California Twin Program to partially
control for host genotype and early life exposures, we found that fecal microbiota diversity and
composition differed between CRA cases and NP controls even years after colonoscopy and
polypectomy. Such differences were subtler between HP cases and NP controls. We also
examined microbial stability using a subset of participants who completed two fecal collections 3
months apart. We showed reproducible patterns in diversity between collections, suggesting that the
adult fecal microbial community is relatively stable within individuals over time compared to
inter-individual differences. Furthermore, the greater similarity of the microbiome within MZ twin pairs
than between any two unrelated individuals indicated a host genetic contribution to the gut microbial
environment.
Before we elaborate on our findings, it is worth noting that human microbiome research is a young
field still facing many challenges and opportunities. Despite the rapid growth of next-generation
genome sequencing technology and analytic tools, the current lack of standardized protocols results in
highly heterogeneous results even on the same research question, due to practical differences in
sample types and processing, sequencing platforms, and bioinformatics and statistical methods.
Moreover, the extremely dynamic nature of the human gut microbiota, which interacts with both the host
and the environment, makes it more difficult to interpret microbiome results and to
compare findings across studies. For this reason, we searched the online literature to date and
identified 18 published human studies specifically designed to investigate the relationship between the
gut microbiota and colorectal polyps or adenomas [54, 55, 75-90]. After carefully comparing them with
our study design, we limited our comparison to 8 similar studies using fecal samples [54, 55,
79, 81, 83, 85, 87, 89]. The exclusion criteria were mucosal biopsy samples only, no 16S rRNA
sequencing results available for fecal samples, a distinct “in-house” bioinformatics pipeline, or a lack
of statistical tests. Of these 8 similar studies, the study by Peters et al. [87] was the only one that
stratified the analysis by histological type of colorectal polyp, including hyperplastic polyps,
conventional adenomas, and other advanced adenomas, as in our study. Additionally, we included
the only metagenomic sequencing study, by Feng et al. [83], to compare the diversity results but not
the polyp-associated microbes, since sequencing methods have a great impact on the observed microbial
composition [91]. Furthermore, only polyp-associated taxa that were identified consistently across
at least 2 similar studies were considered reliable findings for comparison. The summaries are
described in detail in Chapter 5 (Table 5.2, 5.3 and 5.4).
Our finding of increased OTU richness and diversity in CRA fecal samples was initially unexpected,
because it is commonly accepted that loss of microbiota diversity due to
intestinal dysbiosis, that is, an unbalanced microbiota, is harmful and has been
associated with several diseases, such as obesity and inflammatory bowel disease (IBD) [92]. Thus, high
alpha diversity is usually regarded as an indicator of better health. Reduced diversity has also been
found in colorectal cancer patients, but the current evidence is still contradictory [11, 93,
94]. Of the 8 selected similar fecal microbiome studies on colorectal polyps, Peters et al. [87] reported
significantly decreased richness and evenness in adenoma cases, while Feng et al. [83] found that the
fecal microbiota of advanced adenoma patients had significantly higher richness but not evenness using
metagenomic sequencing. In this study, our adenoma cases were a combined group of low-risk
conventional adenomas (75%) and advanced cases (25%). Additionally, Peters et al. [87] also observed
higher alpha diversity in HP cases than in NP controls. Furthermore, besides other technical differences,
we should note that, unlike most of these similar studies, in which fecal samples were collected
either weeks before or months after colonoscopy, this study collected stools years
after colonoscopy and, when needed, polypectomy. As a result, we might have captured the gut microbiota
community at a different time point in relation to colorectal polyps. Although collection closer in time
to the disease condition is more likely to detect polyp-associated microbes, the fecal microbial
composition in pre-colonoscopy collections could still be affected by stress, a known factor influencing
the gut microbiota [93, 94], from an upcoming invasive procedure, while variation in post-colonoscopy
collections might be subject to the effects of the procedure itself (such as bowel preparation) or changes
in disease condition (such as removal of polyps). Some evidence suggests that fecal bacterial
composition can be restored within 2 weeks after colonoscopy [95-100]; however, the extent of this
recovery, and whether the restoration is associated with changes in condition, is still
unclear. As we mentioned in the introduction, the removal of polyps during colonoscopy can
substantially reduce CRC risk and death [15], but has also been linked to an increased rate of polyp
recurrence or even accelerated CRC development [16-24], possibly due to advanced age,
unfavorable lifestyles and/or genetic predispositions, or missed or incompletely resected lesions at the
initial colonoscopy [23, 25-28]. Such deleterious features could presumably be carried over into later life.
Interestingly, the large study conducted by Peters et al. [87] combined fecal samples from 2
study sites with different collection times: up to 4 months before colonoscopy at CDC (N=398) and within
3 years after colonoscopy at NYU (N=142). Unfortunately, they did not specifically address how similar the
fecal microbiota from the two sample sources were. In addition, our study suggested that people
are likely to change their diet in a healthier direction after colonoscopy, possibly due to the result of
the procedure, the procedure itself, or even aging. A healthy dietary habit has been associated with a
more diverse gut microbiota community [44-47]. However, we observed neither fecal microbiota
differences by food category nor dietary changes differing by polyp group. Possible reasons
are that the food frequency questionnaire, adapted from one developed 20 years ago,
may not accurately capture current dietary habits, or that we lacked a systematic
analysis of the nutrient components or metabolites that may be more important in a human gut
microbiome study.
In fact, beta diversity analysis in our study showed an apparent separation in fecal
microbial composition between CRA cases and NP controls, suggesting that the microbiota alteration
related to previous colorectal adenomas may also persist for years. Such differences in
overall composition have been consistently found in other similar studies [79, 87, 89]. In addition
to those predisposing risk factors, changes in behavior due to colorectal polyps could also
explain a distinct, more diverse microbiota in stool. For example, we noticed that
our polyp cases tended to consume more probiotics or yogurt than NP controls. Meanwhile, in an
exploratory examination comparing responses to the same food frequency questionnaire at CTP
recruitment in the 1990s, our polyp cases were more likely than NP controls to have changed to a
healthier diet (such as less meat and more vegetables) by the time of fecal collection. Moreover, a
follow-up questionnaire confirmed dietary change among nearly 60% of study participants after
colonoscopy. Such healthy dietary habits are usually associated with greater diversity in the gut
microbiota [47, 101, 102]. Overall, as a combined result of adenoma risk factors and any related or
unrelated life changes after adenoma removal, our observed microbiota alteration might not be exactly
the same as the one at the onset of adenomas, but it provides a snapshot of the gut microbial
environment in relation to colorectal polyps and the colonoscopy procedure years after the initial
colonoscopy, which, to our knowledge, has not been reported before. We think it is equally important to
identify polyp-associated pathogens for etiological study and diagnostic purposes and potential
protective bacteria for CRC prevention and therapy.
Accordingly, we profiled the taxonomic features unique to our CRA cases by testing their
differential abundances relative to NP controls. The major alterations in fecal microbiota composition
observed in CRA cases were the depleted abundances of the families Streptococcaceae, Micrococcaceae
and Bacteroidaceae, and the genera Desulfovibrio, Rothia, Streptococcus and Bacteroides, as well as the
overrepresentation of the family Peptostreptococcaceae and the genus Klebsiella. Of these, the
depletion of the genus Bacteroides in CRA cases has been consistently observed in two similar studies
[55, 79]. Bacteroides is commonly considered a group of protective bacteria in the colon, as a
producer of short chain fatty acids (SCFAs), which have anti-tumorigenic features [103]. Moreover, our
sensitivity analysis excluding the genus Bacteroides found an unchanged number but a more even
distribution of fecal bacteria in CRA cases compared with NP controls, suggesting that depleted
Bacteroides may result in the overgrowth of some rare species. This finding is consistent with current
evidence that some Bacteroides species can produce antibacterial peptides that prevent pathogens from
colonizing the gut [104]. However, the genus Bacteroides has also been frequently identified in
CRC-associated microbiome studies with conflicting effects [11, 93, 94]. Within this genus, the species
B. fragilis, particularly enterotoxigenic B. fragilis (ETBF), has been suggested to play a pathogenic
role in CRC development [50, 51], while other species, such as B. thetaiotaomicron, act as beneficial
commensals involved in nutrition pathways or in adapting to environmental changes and stress in the GI
tract [104]. As a result, the apparent effect of a genus in relation to a health condition might be a
mixed result of the effects and relative abundances of the species within this genus, which is true of
any higher taxonomic rank. Unfortunately, current 16S rRNA gene sequencing technology is limited in its
ability to identify rare species or bacterial substrains. For instance, our microbial data failed to
detect Fusobacterium nucleatum, one of the species of greatest interest in association with both
adenomas and CRC. In addition, the fecal microbiota, as a proxy for the microbial community along the
whole GI tract, may be less informative about local molecular function in the colon. Therefore, instead
of dissecting the underlying biology of each taxon, particularly when previous knowledge is
limited, we would rather treat the identified taxonomic features as an exploratory signature for CRA
cases, with potential use as a biomarker to better detect CRA in a follow-up surveillance program
after colonoscopy. Then, by incorporating the taxonomic pattern comparing CRA cases to HP cases, we
recognized a fecal taxonomic signature for CRA cases that differed from both HP cases and NP controls,
comprising depleted abundance of the families Streptococcaceae and Micrococcaceae and the
genus Desulfovibrio but greater abundance of the family Peptostreptococcaceae.
On the other hand, we also included hyperplastic polyps in this study for both fecal microbial
diversity and compositional analysis, which has not been addressed previously except in the study by
Peters et al. [87]. As a common histological type of colorectal polyp, hyperplastic polyps are usually
considered clinically insignificant lesions with no potential to progress to adenomas or CRC [6, 7].
However, accumulating evidence suggests that hyperplastic polyps may in fact be benign early neoplasms,
since they allow the accumulation of genetic changes in a normal crypt cell and represent clonal expansion
and neoplastic proliferation of colonic epithelial cells carrying mutations or epigenetic alterations [105,
106]. Nevertheless, consistent with the findings of Peters et al. [87], our HP cases did not differ in
either diversity or composition from controls or CRA cases, though they tended to be at an intermediate
stage between controls and CRA cases in our study. Compared to NP controls, HP cases exhibited a
greater abundance of the phylum Firmicutes but a lower abundance of the class Alphaproteobacteria, the
family [Odoribacteraceae], and the genera Actinomyces and Butyricimonas, none of which replicated the
HP-associated taxa identified by Peters et al. [87].
Moreover, our study showed, on average, fair to excellent correlation in alpha diversity and
taxonomic relative abundance between 2 fecal collections 3 months apart, though more variation was
observed in the less abundant taxa. The repeated measures were also quite similar in overall
microbiota composition, which was expected, since it is generally accepted that the human adult gut
microbiome is relatively stable over time unless there are substantial changes in health condition or
lifestyle. However, such microbial stability has rarely been reported in association studies, especially
among most of the participants as in our study, probably due to financial cost and data complexity.
Among the selected similar studies, Peters et al. [87] tested stability, but in only 4 volunteers. The
reason that our second fecal collection failed to reproduce the 17 polyp-associated taxa identified in
the main analysis using an agnostic approach is likely the small sample size of this study. Without
adjusting for multiple comparisons, we recovered 6 of them in the fecal microbiota 3 months later at a
significance level of p < 0.05, and 10 taxa displayed the same direction of differential abundance
in the comparisons of CRA cases vs. controls, HP cases vs. controls, and CRA cases vs. HP cases.
Another plausible reason is that the individuals who completed the second collection might have been
more concerned about their health, since the major source of loss to follow-up was single-respondent NP
controls. In this situation, if their concerns arose from a related family history or other prior
knowledge of colorectal polyps, the observed differences in polyp-associated fecal microbes at follow-up
could be attenuated, while their choice of a healthier diet than cases might spuriously enlarge the
differences in abundance of diet-related bacteria in fecal samples. Additionally, our MZ twins shared
more phylogenetic lineages within the same pair than any two unrelated individuals, supporting the view
that host genetic background and shared early childhood exposures are important in shaping the adult gut
microbiota [107, 108].
Strengths of this study include the use of MZ twins to partially control for host genome and early life
experience, a unique study design profiling fecal microbial signatures after colonoscopy, the inclusion
of different histological types of polyps (colorectal adenomas and hyperplastic polyps), and a test of
microbial stability. However, this study also has several limitations. First, this is a small study with
limited statistical power to control for other covariates or to conduct interaction or mediation analyses
testing for the potential interplay between the host, the gut microbiota, and the environment. Such
highly dynamic networks will be better described in larger studies. Additionally, although a genetically
and early childhood-matched MZ twin design recovered some power, inclusion of both single-respondent and
double-respondent twin pairs can also lead to analytic difficulty, especially for sparse, highly skewed
microbial data. However, despite the small numbers, the replication in the repeated measures enhanced
our confidence in our significant findings. Second, we did not know the polyp status at the time the
microbiota was measured, resulting in an inherent temporal ambiguity for such a “cross-sectional” study.
Our study has a unique design that captures fecal microbiota features after colonoscopy and the removal
of polyps, if any. It provides insight into systematic differences in the gut microbiota due to the
previous polyp condition as well as the procedures and life changes in response to the polyp history,
and could eventually be used to develop biomarkers as a diagnostic tool or for predicting the risk of
polyp recurrence or CRC development as part of a surveillance program. However, the lack of current
polyp status limits the application of our findings. A proposed longitudinal follow-up study tracking
participants' current polyp status would be better able to detect specific bacteria responsible for
polyp initiation and growth. Third, the use of the 16S rRNA gene sequencing method on fecal samples in
this study technically restricted our ability to detect polyp-associated species and investigate their
underlying biology. On the other hand, fecal samples are convenient and cost-efficient for frequent
sampling, which is also important for designing screening tools for CRC prevention. Future studies
incorporating colorectal mucosal samples and shotgun metagenomic sequencing will better characterize
the functional gut microbiota associated with colorectal polyps. Further limitations are the mostly
white female study population, whose findings may not generalize to males or other racial groups, and a
mixed adenoma group including both low-risk and advanced adenomas, which limits further application for
risk stratification.
In conclusion, our results indicate that alterations in the fecal microbiome among colorectal
adenoma patients may persist even years after colonoscopy and polypectomy, possibly due to
adenomas or other factors related to adenomas, such as screening procedures or lifestyle changes.
Although our study lacked the power to identify microbes separately associated with such factors, the
characterized group of fecal bacteria whose composition discriminates adenoma patients from controls in
this study might be used as a taxonomic signature to identify metachronous adenomas, which
are usually more advanced and prone to progress to CRC, in a post-colonoscopy surveillance program, or
provide insight into potentially beneficial bacteria that prevent the recurrence of adenomas or CRC
development. A large-scale longitudinal microbiome study in combination with advanced shotgun
metagenomic sequencing will be needed to provide a more comprehensive picture and to pinpoint the
functional microbes related to adenomas or other related factors, and thereby to predict the risk of
polyp recurrence or even CRC and to develop and optimize preventive and therapeutic strategies for CRC.
Chapter 6 references
1. Howlader N, N.A., Krapcho M, Garshell J, Miller D, Altekruse SF, Kosary CL, Yu M, Ruhl J,
Tatalovich Z,Mariotto A, Lewis DR, Chen HS, Feuer EJ, Cronin KA (eds), SEER Cancer Statistics
Review, 1975-2012, National Cancer Institute. Bethesda, MD,
http://seer.cancer.gov/csr/1975_2012/, based on November 2014 SEER data submission, posted
to the SEER web site, April 2015. 2015.
2. Group, U.S.C.S.W., United States Cancer Statistics: 1999–2012 Incidence and Mortality Web-
based Report. Atlanta (GA): Department of Health and Human Services, Centers for Disease
Control and Prevention, and National Cancer Institute. 2015.
3. Society, A.C., Colorectal Cancer Facts & Figures Special Edition 2005. Oklahoma City, OK:
American Cancer Society, 2005.
4. Labianca, R., et al., Colorectal cancer: screening. Annals of oncology: official journal of the
European Society for Medical Oncology/ESMO, 2004. 16: p. ii127-32.
5. Levine, J.S. and D.J. Ahnen, Adenomatous polyps of the colon. New England Journal of Medicine,
2006. 355(24): p. 2551-2557.
6. Bettington, M., et al., The serrated pathway to colorectal carcinoma: current concepts and
challenges. Histopathology, 2013. 62(3): p. 367-386.
7. Hassan, C., et al., Systematic review with meta-analysis: the incidence of advanced neoplasia
after polypectomy in patients with and without low-risk adenomas. Alimentary pharmacology
& therapeutics, 2014. 39(9): p. 905-912.
8. Conteduca, V., et al., Precancerous colorectal lesions. International journal of oncology, 2013.
43(4): p. 973-984.
9. Haggar, F.A. and R.P. Boushey, Colorectal cancer epidemiology: incidence, mortality, survival,
and risk factors. Clinics in colon and rectal surgery, 2009. 22(4): p. 191.
10. Fearon, E.R. and B. Vogelstein, A genetic model for colorectal tumorigenesis. Cell, 1990. 61(5): p.
759-767.
11. Sobhani, I., et al., Microbial dysbiosis and colon carcinogenesis: could colon cancer be considered
a bacteria-related disease? Therap Adv Gastroenterol, 2013. 6(3): p. 215-29.
12. Labianca, R., et al., Colorectal cancer: screening. Ann Oncol, 2005. 16 Suppl 2: p. ii127-32.
13. de Jong, A.E., et al., Prevalence of adenomas among young individuals at average risk for
colorectal cancer. The American journal of gastroenterology, 2005. 100(1): p. 139-143.
14. Bonnington, S.N. and M.D. Rutter, Surveillance of colonic polyps: are we getting it right? World
journal of gastroenterology, 2016. 22(6): p. 1925.
15. He, J. and J.E. Efron, Screening for colorectal cancer. Adv Surg, 2011. 45: p. 31-44.
16. Winawer, S.J., et al., Randomized comparison of surveillance intervals after colonoscopic
removal of newly diagnosed adenomatous polyps. New England Journal of Medicine, 1993.
328(13): p. 901-906.
17. Schatzkin, A., et al., Lack of effect of a low-fat, high-fiber diet on the recurrence of colorectal
adenomas. New England Journal of Medicine, 2000. 342(16): p. 1149-1155.
18. Martínez, M.E., et al., Adenoma characteristics as risk factors for recurrence of advanced
adenomas. Gastroenterology, 2001. 120(5): p. 1077-1083.
19. Lieberman, D.A., et al., Five-year colon surveillance after screening colonoscopy.
Gastroenterology, 2007. 133(4): p. 1077-1085.
20. Baron, J.A., et al., A randomized trial of aspirin to prevent colorectal adenomas. New England
Journal of Medicine, 2003. 348(10): p. 891-899.
21. le Clercq, C.M., et al., Postcolonoscopy colorectal cancers are preventable: a population-based
study. Gut, 2013: p. gutjnl-2013-304880.
22. Pabby, A., et al., Analysis of colorectal cancer occurrence during surveillance colonoscopy in the
dietary Polyp Prevention Trial. Gastrointestinal endoscopy, 2005. 61(3): p. 385-391.
23. Robertson, D.J., et al., Colorectal cancer in patients under close colonoscopic surveillance.
Gastroenterology, 2005. 129(1): p. 34-41.
24. Robertson, D.J., et al., Colorectal cancers soon after colonoscopy: a pooled multicohort analysis.
Gut, 2013: p. gutjnl-2012-303796.
25. Atkin, W.S., B.C. Morson, and J. Cuzick, Long-term risk of colorectal cancer after excision of
rectosigmoid adenomas. New England Journal of Medicine, 1992. 326(10): p. 658-662.
26. Rex, D.K., et al., Colonoscopic miss rates of adenomas determined by back-to-back colonoscopies.
Gastroenterology, 1997. 112(1): p. 24-28.
27. Brenner, H., et al., Role of Colonoscopy and Polyp Characteristics in Colorectal Cancer After
Colonoscopic Polyp Detection: A Population-Based Case–Control Study. Annals of internal
medicine, 2012. 157(4): p. 225-232.
28. Brenner, H., et al., Risk of colorectal cancer after detection and removal of adenomas at
colonoscopy: population-based case-control study. Journal of clinical oncology, 2012. 30(24): p.
2969-2976.
29. Uronis, J.M., et al., Modulation of the intestinal microbiota alters colitis-associated colorectal
cancer susceptibility. PloS one, 2009. 4(6): p. e6026.
30. Sellon, R.K., et al., Resident enteric bacteria are necessary for development of spontaneous colitis
and immune system activation in interleukin-10-deficient mice. Infection and immunity, 1998.
66(11): p. 5224-5231.
31. Dove, W.F., et al., Intestinal neoplasia in the ApcMin mouse: independence from the microbial
and natural killer (beige locus) status. Cancer research, 1997. 57(5): p. 812-814.
32. Palmer, C., et al., Development of the human infant intestinal microbiota. PLoS biology, 2007.
5(7): p. e177.
33. Koenig, J.E., et al., Succession of microbial consortia in the developing infant gut microbiome.
Proceedings of the National Academy of Sciences, 2011. 108(Supplement 1): p. 4578-4585.
34. Yatsunenko, T., et al., Human gut microbiome viewed across age and geography. Nature, 2012.
486(7402): p. 222-227.
35. Benson, A.K., et al., Individuality in gut microbiota composition is a complex polygenic trait
shaped by multiple environmental and host genetic factors. Proceedings of the National
Academy of Sciences, 2010. 107(44): p. 18933-18938.
36. Spor, A., O. Koren, and R. Ley, Unravelling the effects of the environment and host genotype on
the gut microbiome. Nature Reviews Microbiology, 2011. 9(4): p. 279-290.
37. Spor, A., O. Koren, and R. Ley, Unravelling the effects of the environment and host genotype on
the gut microbiome. Nat Rev Microbiol, 2011. 9(4): p. 279-90.
38. Goodrich, J.K., et al., Human genetics shape the gut microbiome. Cell, 2014. 159(4): p. 789-799.
39. Goodrich, J.K., et al., Genetic determinants of the gut microbiome in UK twins. Cell host &
microbe, 2016. 19(5): p. 731-743.
40. Dethlefsen, L., et al., The pervasive effects of an antibiotic on the human gut microbiota, as
revealed by deep 16S rRNA sequencing. PLoS biology, 2008. 6(11): p. e280.
41. Sullivan, Å., C. Edlund, and C.E. Nord, Effect of antimicrobial agents on the ecological balance of
human microflora. The Lancet infectious diseases, 2001. 1(2): p. 101-114.
42. Jernberg, C., et al., Long-term ecological impacts of antibiotic administration on the human
intestinal microbiota. The ISME journal, 2007. 1(1): p. 56-66.
43. Jakobsson, H.E., et al., Short-term antibiotic treatment has differing long-term impacts on the
human throat and gut microbiome. PloS one, 2010. 5(3): p. e9836.
44. Cipe, G., et al., Relationship between intestinal microbiota and colorectal cancer. World journal
of gastrointestinal oncology, 2015. 7(10): p. 233.
45. Nøhr, M.K., et al., GPR41/FFAR3 and GPR43/FFAR2 as cosensors for short-chain fatty acids in
enteroendocrine cells vs FFAR3 in enteric neurons and FFAR2 in enteric leukocytes. Endocrinology,
2013. 154(10): p. 3552-3564.
46. Turnbaugh, P.J., et al., The effect of diet on the human gut microbiome: a metagenomic analysis
in humanized gnotobiotic mice. Science translational medicine, 2009. 1(6): p. 6ra14-6ra14.
47. Wu, G.D., et al., Linking long-term dietary patterns with gut microbial enterotypes. Science, 2011.
334(6052): p. 105-108.
48. World Cancer Research Fund / American Institute for Cancer Research, Food, Nutrition, Physical
Activity, and the Prevention of Cancer: a Global Perspective. 2007.
49. World Cancer Research Fund / American Institute for Cancer Research, Colorectal Cancer 2011
Report: Food, Nutrition, Physical Activity, and the Prevention of Colorectal Cancer (Continuous
Update Project). 2011.
50. Sun, J. and I. Kato, Gut microbiota, inflammation and colorectal cancer. Genes & diseases, 2016.
3(2): p. 130-143.
51. Gao, R., et al., Gut microbiota and colorectal cancer. European Journal of Clinical Microbiology &
Infectious Diseases, 2017: p. 1-13.
52. Gagnière, J., et al., Gut microbiota imbalance and colorectal cancer. World journal of
gastroenterology, 2016. 22(2): p. 501.
53. Tjalsma, H., et al., A bacterial driver–passenger model for colorectal cancer: beyond the usual
suspects. Nature Reviews Microbiology, 2012. 10(8): p. 575-582.
54. Baxter, N.T., et al., Microbiota-based model improves the sensitivity of fecal immunochemical
test for detecting colonic lesions. Genome medicine, 2016. 8(1): p. 37.
55. Zackular, J.P., et al., The human gut microbiome as a screening tool for colorectal cancer. Cancer
prevention research, 2014. 7(11): p. 1112-1121.
56. Flores, R., et al., Collection media and delayed freezing effects on microbial composition of
human stool. Microbiome, 2015. 3(1): p. 33.
57. Choo, J.M., L.E. Leong, and G.B. Rogers, Sample storage conditions significantly influence faecal
microbiome profiles. Scientific reports, 2015. 5.
58. Qin, J., et al., A metagenome-wide association study of gut microbiota in type 2 diabetes. Nature,
2012. 490(7418): p. 55-60.
59. Kuczynski, J., et al., Using QIIME to analyze 16S rRNA gene sequences from microbial
communities. Current protocols in microbiology, 2012: p. 1E. 5.1-1E. 5.20.
60. Caporaso, J.G., et al., QIIME allows analysis of high-throughput community sequencing data.
Nature methods, 2010. 7(5): p. 335-336.
61. Edgar, R.C., et al., UCHIME improves sensitivity and speed of chimera detection. Bioinformatics,
2011. 27(16): p. 2194-2200.
62. Edgar, R.C., Search and clustering orders of magnitude faster than BLAST. Bioinformatics, 2010.
26(19): p. 2460-2461.
63. Wang, Q., et al., Naive Bayesian classifier for rapid assignment of rRNA sequences into the new
bacterial taxonomy. Applied and environmental microbiology, 2007. 73(16): p. 5261-5267.
64. DeSantis, T.Z., et al., Greengenes, a chimera-checked 16S rRNA gene database and workbench
compatible with ARB. Applied and environmental microbiology, 2006. 72(7): p. 5069-5072.
65. Caporaso, J.G., et al., PyNAST: a flexible tool for aligning sequences to a template alignment.
Bioinformatics, 2009. 26(2): p. 266-267.
66. Shannon, C.E. and W. Weaver, The mathematical theory of communication. 1949.
67. Faith, D.P. and A.M. Baker, Phylogenetic diversity (PD) and biodiversity conservation: some
bioinformatics challenges. Evolutionary bioinformatics online, 2006. 2: p. 121.
68. Lozupone, C. and R. Knight, UniFrac: a new phylogenetic method for comparing microbial
communities. Applied and environmental microbiology, 2005. 71(12): p. 8228-8235.
69. Lozupone, C., et al., UniFrac: an effective distance metric for microbial community comparison.
The ISME journal, 2010. 5(2): p. 169.
70. R Core Team, R: A language and environment for statistical computing. Vienna, Austria: R
Foundation for Statistical Computing, 2014.
71. Levin, B., et al., Screening and surveillance for the early detection of colorectal cancer and
adenomatous polyps, 2008: a joint guideline from the American Cancer Society, the US
Multi-Society Task Force on Colorectal Cancer, and the American College of Radiology. CA: a
cancer journal for clinicians, 2008. 58(3): p. 130-160.
72. Cottet, V., et al., Long-term risk of colorectal cancer after adenoma removal: a population-based
cohort study. Gut, 2012. 61(8): p. 1180-1186.
73. Brenner, H., et al., Case-control study supports extension of surveillance interval after
colonoscopic polypectomy to at least 5 yr. The American journal of gastroenterology, 2007.
102(8): p. 1739.
74. Martínez, M.E., et al., A pooled analysis of advanced colorectal neoplasia diagnoses after
colonoscopic polypectomy. Gastroenterology, 2009. 136(3): p. 832-841.
75. Brim, H., et al., Microbiome analysis of stool samples from African Americans with colon polyps.
PloS one, 2013. 8(12): p. e81352.
76. Sanapareddy, N., et al., Increased rectal microbial richness is associated with the presence of
colorectal adenomas in humans. The ISME journal, 2012. 6(10): p. 1858-1868.
77. Scanlan, P.D., et al., Culture-independent analysis of the gut microbiota in colorectal cancer
and polyposis. Environmental microbiology, 2008. 10(3): p. 789-798.
78. Shen, X.J., et al., Molecular characterization of mucosal adherent bacteria and associations with
colorectal adenomas. Gut microbes, 2010. 1(3): p. 138-147.
79. Chen, H.-M., et al., Decreased dietary fiber intake and structural alteration of gut microbiota in
patients with advanced colorectal adenoma. The American journal of clinical nutrition, 2013.
97(5): p. 1044-1052.
80. McCoy, A.N., et al., Fusobacterium is associated with colorectal adenomas. PloS one, 2013. 8(1):
p. e53653.
81. Mira-Pascual, L., et al., Microbial mucosal colonic shifts associated with the development of
colorectal cancer reveal the presence of different bacterial and archaeal biomarkers. Journal of
gastroenterology, 2015. 50(2): p. 167-179.
82. Nugent, J.L., et al., Altered tissue metabolites correlate with microbial dysbiosis in colorectal
adenomas. Journal of proteome research, 2014. 13(4): p. 1921-1929.
83. Feng, Q., et al., Gut microbiome development along the colorectal adenoma–carcinoma
sequence. Nature communications, 2015. 6.
84. Ito, M., et al., Association of Fusobacterium nucleatum with clinical and molecular features in
colorectal serrated pathway. International journal of cancer, 2015. 137(6): p. 1258-1268.
85. Goedert, J.J., et al., Fecal microbiota characteristics of patients with colorectal adenoma
detected by screening: a population-based study. EBioMedicine, 2015. 2(6): p. 597-603.
86. Kasai, C., et al., Comparison of human gut microbiota in control subjects and patients with
colorectal carcinoma in adenoma: terminal restriction fragment length polymorphism and next-
generation sequencing analyses. Oncology reports, 2016. 35(1): p. 325-333.
87. Peters, B.A., et al., The gut microbiota in conventional and serrated precursors of colorectal
cancer. Microbiome, 2016. 4(1): p. 69.
88. Flemer, B., et al., Tumour-associated and non-tumour-associated microbiota in colorectal cancer.
Gut, 2017. 66(4): p. 633-643.
89. Hale, V.L., et al., Shifts in the fecal microbiota associated with adenomatous polyps. Cancer
Epidemiology and Prevention Biomarkers, 2017. 26(1): p. 85-94.
90. Yoon, H., et al., Comparisons of Gut Microbiota Among Healthy Control, Patients With
Conventional Adenoma, Sessile Serrated Adenoma, and Colorectal Cancer. Journal of cancer
prevention, 2017. 22(2): p. 108.
91. Jovel, J., et al., Characterization of the gut microbiome using 16S or shotgun metagenomics.
Frontiers in microbiology, 2016. 7.
92. Mosca, A., M. Leclerc, and J.P. Hugot, Gut microbiota diversity and human diseases: should we
reintroduce key predators in our ecosystem? Frontiers in microbiology, 2016. 7.
93. Son, J.S., et al., Altered Interactions between the Gut Microbiome and Colonic Mucosa Precede
Polyposis in APCMin/+ Mice. PLoS One, 2015. 10(6): p. e0127985.
94. Morgan, X.C. and C. Huttenhower, Chapter 12: Human microbiome analysis. PLoS Comput Biol,
2012. 8(12): p. e1002808.
95. Shobar, R.M., et al., The effects of bowel preparation on microbiota-related metrics differ in
health and in inflammatory bowel disease and for the mucosal and luminal microbiota
compartments. Clinical and translational gastroenterology, 2016. 7(2): p. e143.
96. Jalanka, J., et al., Effects of bowel cleansing on the intestinal microbiota. Gut, 2014: p. gutjnl-
2014-307240.
97. O’Brien, C.L., et al., Impact of colonoscopy bowel preparation on intestinal microbiota. PLoS One,
2013. 8(5): p. e62815.
98. Harrell, L., et al., Standard colonic lavage alters the natural state of mucosal-associated
microbiota in the human colon. PLoS One, 2012. 7(2): p. e32545.
99. Mai, V., et al., Effect of bowel preparation and colonoscopy on post-procedure intestinal
microbiota composition. Gut, 2006. 55(12): p. 1822-1823.
100. Mai, V. and O.C. Stine, Bowel preparation for colonoscopy: relevant for the gut’s microbiota? Gut,
2015: p. gutjnl-2014-308937.
101. Rijkers, G.T., et al., Health benefits and health claims of probiotics: bridging science and
marketing. British Journal of Nutrition, 2011. 106(9): p. 1291-1296.
102. Lisko, D.J., G.P. Johnston, and C.G. Johnston, Effects of Dietary Yogurt on the Healthy Human
Gastrointestinal (GI) Microbiome. Microorganisms, 2017. 5(1): p. 6.
103. Ley, R.E., et al., Microbial ecology: human gut microbes associated with obesity. Nature, 2006.
444(7122): p. 1022-1023.
104. Wexler, H.M., Bacteroides: the good, the bad, and the nitty-gritty. Clinical microbiology reviews,
2007. 20(4): p. 593-621.
105. Jass, J.R., et al., Emerging concepts in colorectal neoplasia. Gastroenterology, 2002. 123(3): p.
862-876.
106. Iino, H., et al., DNA microsatellite instability in hyperplastic polyps, serrated adenomas, and
mixed polyps: a mild mutator pathway for colorectal cancer? Journal of Clinical Pathology, 1999.
52(1): p. 5-9.
107. Teucher, B., et al., Dietary patterns and heritability of food choice in a UK female twin cohort.
Twin Research and Human Genetics, 2007. 10(5): p. 734-748.
108. Vinkhuyzen, A., et al., Genetic influences on ‘environmental’ factors. Genes, Brain and Behavior,
2010. 9(3): p. 276-287.
Figure 6.1a. Flowchart of the overall study population nested in the CTP: participant selection
[Flowchart boxes, listed from top to bottom]
5,542 CTP MZ Double-respondent Twin
  Selection criteria based on the CTP in the 1990s:
    1) Self-reported polyp-discordant twin pairs (n=23 pairs); OR
    2) Both twins had at least one recent physical or screening examination (n=2,335 pairs).
Selected 2,358 MZ Twin Pairs
  Selection criteria based on the CTP in the 1990s:
    Batch 1: Self-reported polyp-discordant twin pairs (n=23 pairs);
    Batch 2: Age in 2013 ≥ 45 and both twins had 5+ exams (n=26 pairs);
    Batch 3: Age in 2013 ≥ 45 and both twins had 4+ exams (n=130 pairs);
    Batch 4: Age in 2013 ≥ 45 and both twins had 3+ exams (n=329 pairs);
    Batch 5: Age 50-54 in 2013 and one twin had 3+ exams (n=118 pairs).
Selected 626 MZ Twin Pairs
Letter invited: 453 Twins
Successfully contacted by telephone: 161 Twins
  Ineligible: 19 Twins
  Refused or did not agree: 39 Twins
  Eligibility criteria:
    1) Self-reported history of colonoscopy within 10 years;
    2) No history of cancer, autoimmune disease, diabetes, HIV/AIDS, diverticulitis, inflammatory
       bowel disease (IBD), celiac disease, ulcer, gastrointestinal reflux;
    3) No history of surgery, major changes in diet, or any antibiotic or steroid use within one month.
Eligible and verbally agreed to participate: 103 Twins
Returned signed consent: 95 Twins
(40 Double-respondent Twin Pairs + 15 Single-respondent Twin Pairs)
Figure 6.1b. Flowchart of the overall study population nested in the CTP: sample collection
[Flowchart boxes, listed from top to bottom]
Returned signed consent: 95 Twins
(40 Double-respondent Twin Pairs + 15 Single-respondent Twin Pairs)
  Excluded twins who failed to send fecal samples (N=2 twins)
Fecal collection (N=150):
  Collection 1: 93 twins
  Collection 2: 57 twins
Colonoscopy/pathology report collection: 67 Twins
  Further excluded twins due to GI disease history based on medical reports (N=4 twins)
Available for analyses:
  Collection 1: 56 twins (17 double-respondent twin pairs, 22 single-respondent twin pairs)
  Collection 2: 39 twins (14 double-respondent twin pairs, 11 single-respondent twin pairs)
  Excluded recent non-adenoma twins with a history of adenomas (N=7 twins)
  Excluded sequenced samples that failed to pass QC (N=8 samples)
Final analyses:
  Collection 1: 49 twins (14 double-respondent twin pairs, 21 single-respondent twin pairs)
  Collection 2: 35 twins (12 double-respondent twin pairs, 11 single-respondent twin pairs)
Table 6.1. Demographic and polyp characteristics of the study participants selected from the CTP: 33
controls with no polyp (NP), 8 cases with hyperplastic polyps (HP), and 8 cases with colorectal
adenomas (CRA).

| Characteristics | No Polyp (NP) | Hyperplastic Polyps (HP) | Adenomas (CRA) | P-value* (HP vs. NP) | P-value* (CRA vs. NP) |
|---|---|---|---|---|---|
| Number of twins^a | 33 | 8 | 8 | | |
| Participation age, median (IQR) | 57 (55 - 65) | 54 (51 - 54) | 54 (53 - 56) | 0.71 | 0.79 |
| Sex, N (%): Male | 1 (3.0) | 1 (12.5) | 2 (25.0) | 0.83 | 0.79 |
| Sex, N (%): Female | 32 (97.0) | 7 (87.5) | 6 (75.0) | | |
| Race, N (%): White | 27 (81.8) | 6 (75.0) | 8 (100.0) | 0.98 | NA |
| Race, N (%): Non-White | 6 (18.2) | 2 (25.0) | 0 (0.0) | | |
| BMI (kg/m2), N (%): BMI < 30 | 28 (84.8) | 6 (75.0) | 6 (85.7) | 0.82 | 0.94 |
| BMI (kg/m2), N (%): BMI ≥ 30 | 5 (15.2) | 2 (25.0) | 1 (14.3) | | |
| Recent probiotics/yogurt use, N (%): No | 15 (45.5) | 2 (25.0) | 2 (28.6) | 0.34 | 0.50 |
| Recent probiotics/yogurt use, N (%): Yes | 18 (54.5) | 6 (75.0) | 5 (71.4) | | |
| Family history of CRC, N (%): No | 20 (69.0) | 2 (50.0) | 3 (42.9) | 0.10 | 0.66 |
| Family history of CRC, N (%): Yes | 9 (31.0) | 2 (50.0) | 4 (57.1) | | |
| Years since last colonoscopy, median (IQR) | 5 (3 - 7) | 1.5 (0.75 - 2.25) | 1.5 (0 - 2.5) | 0.03 | 0.01 |
| Number of polyps: ≤ 2 | - | 6 | 7 | | |
| Number of polyps: > 2 | - | 2 | 1 | | |
| Polyp size: < 10 mm | - | 6 | 5 | | |
| Polyp size: ≥ 10 mm | - | 1 | 1 | | |
| Polyp size: Missing | - | 1 | 2 | | |
| Polyp location^b: Right colon | - | 1 | 3 | | |
| Polyp location^b: Left colon | - | 5 | 3 | | |
| Polyp location^b: Rectum | - | 0 | 1 | | |
| Polyp location^b: Mixed | - | 0 | 1 | | |
| Polyp location^b: Missing | - | 2 | 0 | | |
| High grade dysplasia | - | 0 | 1 | | |

Abbreviations: BMI, body mass index; CRC, colorectal cancer; IQR, interquartile range (25th to 75th percentiles).
* P-values are based on generalized linear mixed models with a random effect accounting for twin pair status, comparing
cases (HP/CRA) with controls (NP). There were no differences between hyperplastic polyp cases and adenoma cases.
^a Number of individual twins. Of them, there were 14 double-respondent twin pairs, including 1 pair of HP/CRA, 4
pairs of NP/CRA, and 9 pairs of NP/NP.
^b Polyps located in the cecum, ascending colon, hepatic flexure and transverse colon were considered right colon,
while polyps located in the splenic flexure, descending colon and sigmoid were considered left colon. Rectum
included both rectosigmoid and rectum. Individuals who had polyps in more than one location (right/left/rectum)
were classified as mixed.
Figure 6.2. Alpha diversity (number of OTUs, Shannon index, and PD whole tree) from the 1st fecal
collection by polyp status in (a) the full set of colonoscopy-screened participants (NP controls n=33, HP
cases n=8, CRA cases n=8), or (b) a subset of participants (NP controls n=11, HP cases n=7, CRA cases
n=5) whose colonoscopies were completed within 3 years prior to fecal collection. These indices were
calculated for 20 iterations of rarefied (37,994 sequences per sample) OTU tables, and the average
over the iterations was taken for each participant. P-values from an ANOVA mixed model with a random
effect accounting for twin pair status are shown. NP, no polyp; HP, hyperplastic polyps; CRA,
colorectal adenomas.
[Figure panels: Number of OTUs, Shannon Index, PD Whole Tree, each shown for (a) Full Set (n=49) and
(b) Subset (n=23); groups on the horizontal axis: NP, HP, CRA; p-value shown in the figure: p=0.02.]
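The rarefy-and-average procedure described in the caption of Figure 6.2 can be illustrated with a minimal sketch. It is not the QIIME pipeline used in this study: the OTU counts, the rarefaction depth of 200 reads, the fixed random seed, and the use of the natural logarithm for the Shannon index are all assumptions made for illustration, and PD whole tree is omitted because it requires a phylogenetic tree.

```python
import math
import random


def shannon_index(counts):
    """Shannon index (natural log, an assumption) from per-OTU read counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement and return per-OTU counts."""
    pool = [otu for otu, c in enumerate(counts) for _ in range(c)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    sub = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        sub[otu] += 1
    return sub


def mean_rarefied_alpha(counts, depth, iterations=20, seed=0):
    """Average the number of observed OTUs and the Shannon index over rarefactions."""
    rng = random.Random(seed)
    richness, shannon = [], []
    for _ in range(iterations):
        sub = rarefy(counts, depth, rng)
        richness.append(sum(1 for c in sub if c > 0))
        shannon.append(shannon_index(sub))
    return sum(richness) / iterations, sum(shannon) / iterations


# Hypothetical OTU counts for one participant (not study data).
toy_sample = [500, 300, 150, 40, 8, 2]
mean_otus, mean_shannon = mean_rarefied_alpha(toy_sample, depth=200)
print(mean_otus, mean_shannon)
```

Averaging over repeated rarefactions reduces the dependence of each participant's estimate on any single random subsample, which matters most for rare OTUs such as the last two in the toy sample.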
Figure 6.3. Weighted UniFrac-based principal coordinates analysis (PCoA) of the microbial
communities in the 1st fecal collection. Samples were obtained from NP controls
(n=33), HP cases (n=8) and CRA cases (n=8), represented by green, blue and red circles, respectively.
Axis 1 explains 33.5% of all variance while axis 2 explains 16.5%. Separation between the microbiota
in any two groups was tested for the combined effect of axis 1 and axis 2 by the likelihood ratio test
(LRT) in a mixed logistic model with a random effect accounting for twin pair status. NP, no polyp; HP,
hyperplastic polyps; CRA, colorectal adenomas.
[Figure axes: Axis 1 (33.5%), Axis 2 (16.5%); legend: Polyp Status (CRA, HP, NP). P-values shown in the
figure: HP vs. NP p=0.18; CRA vs. NP p=0.02; CRA vs. HP p=0.08.]
Figure 6.4. Genus-level microbial composition of the 1st collection fecal samples by polyp status.
The graph shows the 15 most abundant genera identified in the total samples, ordered from the
most abundant genus (red) to the least (purple). A missing genus name indicates that no genus was
assigned from the database, so the next available higher taxonomic rank was used. NP, no polyp;
HP, hyperplastic polyps; CRA, colorectal adenomas.
[Figure axes: Relative Abundance by Polyp Status: NP (n=33), HP (n=8), CRA (n=8).]
Table 6.2. Differentially abundant taxa^a between NP controls (n=33), CRA cases (n=8) and HP cases (n=8), identified from the 1st fecal collection.

| Taxon | Mean relative abundance^b | CRA vs. NP: coefficient^c (95% CI) | CRA vs. NP: P-value^d | HP vs. NP: coefficient^c (95% CI) | HP vs. NP: P-value^d | CRA vs. HP: coefficient^c (95% CI) | CRA vs. HP: P-value^d |
|---|---|---|---|---|---|---|---|
| Phylum | | | | | | | |
| k__Bacteria.p__Firmicutes | 0.488 | 0.08 (-0.08, 0.25) | 0.55 | 0.22 (0.06, 0.39) | 0.09 | -0.14 (-0.35, 0.07) | 0.68 |
| Class | | | | | | | |
| p__Proteobacteria.c__Alphaproteobacteria | 2.6E-03 | -0.23 (-0.41, -0.04) | 0.13 | -0.71 (-1.15, -0.27) | 0.03 | 1.09 (-1.82, 4.00) | 0.80 |
| p__Proteobacteria.c__Betaproteobacteria | 1.9E-03 | -0.27 (-1.82, 1.29) | 0.95 | -0.16 (-1.53, 1.21) | 0.91 | 1.06 (0.49, 1.64) | 0.01 |
| Order | | | | | | | |
| c__Betaproteobacteria.o__Burkholderiales | 1.9E-03 | 0.80 (-0.10, 1.70) | 0.30 | -0.43 (-0.73, -0.13) | 0.15 | 1.23 (0.44, 2.02) | 0.06 |
| Family | | | | | | | |
| o__Clostridiales.f__Peptostreptococcaceae | 0.010 | 1.85 (0.86, 2.85) | 0.02 | -0.66 (-2.05, 0.73) | 0.98 | 2.52 (1.00, 4.03) | 0.07 |
| o__Lactobacillales.f__Streptococcaceae | 0.038 | -1.35 (-2.12, -0.57) | 0.02 | 0.48 (-0.34, 1.31) | 0.91 | -1.83 (-3.01, -0.65) | 0.07 |
| o__Actinomycetales.f__Micrococcaceae | 9.0E-05 | -2.72 (-4.31, -1.13) | 0.02 | -0.07 (-1.54, 1.40) | 1.00 | -2.65 (-4.45, -0.86) | 0.07 |
| o__Bacteroidales.f__Bacteroidaceae | 0.101 | -1.22 (-2.04, -0.40) | 0.05 | -0.57 (-1.39, 0.26) | 0.86 | -0.65 (-1.70, 0.39) | 0.76 |
| o__Bacteroidales.f__.Odoribacteraceae. | 1.1E-03 | -0.45 (-1.91, 1.01) | 0.90 | -2.61 (-4.08, -1.14) | 0.03 | 2.16 (0.30, 4.02) | 0.33 |
| Genus | | | | | | | |
| f__Desulfovibrionaceae.g__Desulfovibrio | 1.1E-03 | -4.87 (-6.94, -2.79) | 0.0005 | -0.57 (-2.56, 1.42) | 1.00 | -4.29 (-6.91, -1.68) | 0.07 |
| f__Enterobacteriaceae.g__Klebsiella | 7.2E-03 | 2.22 (0.94, 3.49) | 0.04 | -0.45 (-2.64, 1.73) | 1.00 | 1.57 (-1.77, 4.90) | 0.89 |
| f__Micrococcaceae.g__Rothia | 8.0E-05 | -2.66 (-4.27, -1.06) | 0.04 | -0.05 (-1.55, 1.44) | 1.00 | -2.61 (-4.42, -0.79) | 0.18 |
| f__Streptococcaceae.g__Streptococcus | 3.3E-02 | -0.76 (-1.27, -0.26) | 0.08 | -0.15 (-1.07, 0.76) | 1.00 | -0.61 (-1.82, 0.59) | 0.89 |
| f__Bacteroidaceae.g__Bacteroides | 0.101 | -1.22 (-2.04, -0.40) | 0.08 | -0.57 (-1.39, 0.26) | 0.75 | -0.65 (-1.70, 0.39) | 0.89 |
| f__Actinomycetaceae.g__Actinomyces | 7.4E-04 | -0.53 (-1.32, 0.27) | 0.84 | -1.73 (-2.78, -0.67) | 0.08 | 1.20 (-0.15, 2.56) | 0.71 |
| f__.Odoribacteraceae..g__Butyricimonas | 4.2E-04 | -1.13 (-2.99, 0.74) | 0.84 | -4.32 (-6.28, -2.36) | 0.002 | 3.20 (0.74, 5.65) | 0.30 |
| f__Pseudomonadaceae.g__Pseudomonas | 1.2E-03 | 2.45 (-3.82, 8.73) | 0.90 | 10.07 (0.61, 19.52) | 0.52 | -7.61 (-11.82, -3.40) | 0.04 |

NP, no polyp; HP, hyperplastic polyps; CRA, colorectal adenomas.
^a Differentially abundant taxa were selected by tests using a zero-inflated negative binomial mixed model, in which a random effect was used to take twin pair status into
account. Any taxa (from phylum to genus) with a Benjamini-Hochberg (B-H) procedure adjusted p < 0.10 are included in the table.
^b Mean relative abundance of each selected taxon across all the samples.
^c Regression coefficients and 95% CIs were estimated by the zero-inflated negative binomial mixed model. Estimates above 0 indicate that the taxon is more abundant in
the index group than in the reference group, while estimates below 0 indicate that the taxon is less abundant in the index group than in the reference group.
The coefficients can be interpreted as the expected difference in log relative abundance between the index group and the reference group.
^d Benjamini-Hochberg (B-H) procedure adjusted P-values. Multiple comparisons were adjusted at each taxonomic level separately.
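The level-by-level adjustment described in the last footnote of Table 6.2 can be sketched as follows. This is an illustration of the Benjamini-Hochberg step-up procedure applied separately within each taxonomic level, not the code used for the analyses; the function names and the raw p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def adjust_by_level(raw):
    """Adjust p-values within each taxonomic level independently.

    raw: {level: {taxon: unadjusted p-value}}
    """
    out = {}
    for level, tests in raw.items():
        taxa = list(tests)
        adj = benjamini_hochberg([tests[t] for t in taxa])
        out[level] = dict(zip(taxa, adj))
    return out


# Hypothetical unadjusted p-values (not study results).
raw_pvalues = {
    "family": {"taxon_A": 0.01, "taxon_B": 0.04, "taxon_C": 0.03, "taxon_D": 0.005},
    "genus": {"taxon_E": 0.20, "taxon_F": 0.001},
}
adjusted = adjust_by_level(raw_pvalues)
print(adjusted)
```

Because each level is adjusted on its own, the number of tests m in the correction is the number of taxa tested at that level rather than the total across all levels.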
Table 6.3. Demographic characteristics of the subset of study participants who completed both fecal
collections: 21 controls with no polyp (NP), 7 cases with hyperplastic polyps (HP), and 7 cases with
colorectal adenomas (CRA).

| Characteristics | No Polyp (NP) | Hyperplastic Polyps (HP) | Adenomas (CRA) | P-value* (HP vs. NP) | P-value* (CRA vs. NP) |
|---|---|---|---|---|---|
| Number of twins^a | 21 | 7 | 7 | | |
| Participation age, median (IQR) | 62 (55 - 66) | 54 (51 - 54) | 55 (53 - 56) | 0.75 | 0.72 |
| Sex, N (%): Male | 1 (4.8) | 1 (14.3) | 1 (14.3) | 0.84 | 0.94 |
| Sex, N (%): Female | 20 (95.2) | 6 (85.7) | 6 (85.7) | | |
| Race, N (%): White | 18 (85.7) | 5 (71.4) | 7 (100.0) | 0.98 | NA |
| Race, N (%): Non-White | 3 (14.3) | 2 (28.6) | 0 (0.0) | | |
| BMI (kg/m2), N (%): BMI < 30 | 18 (85.7) | 5 (71.4) | 5 (83.3) | 0.74 | 0.99 |
| BMI (kg/m2), N (%): BMI ≥ 30 | 3 (14.3) | 2 (28.6) | 1 (16.7) | | |
| Recent probiotics/yogurt use, N (%): No | 8 (42.1) | 2 (33.3) | 3 (50.0) | 0.72 | 0.77 |
| Recent probiotics/yogurt use, N (%): Yes | 11 (57.9) | 4 (66.7) | 3 (50.0) | | |
| Family history of CRC, N (%): No | 13 (76.5) | 1 (33.3) | 2 (33.3) | 0.27 | 0.12 |
| Family history of CRC, N (%): Yes | 4 (23.5) | 2 (66.7) | 4 (66.7) | | |
| Years since last colonoscopy, median (IQR) | 5 (3 - 7) | 2 (1.5 - 2.5) | 2 (0.5 - 3.0) | 0.11 | 0.03 |

Abbreviations: BMI, body mass index; CRC, colorectal cancer; IQR, interquartile range (25th to 75th percentiles).
* P-values are based on generalized linear mixed models with a random effect accounting for twin pair status,
comparing cases (HP/CRA) with controls (NP). There were no differences between hyperplastic polyp cases and
adenoma cases.
^a Number of individual twins. Of them, there were 12 double-respondent twin pairs, including 1 pair of HP/CRA,
4 pairs of NP/CRA, and 7 pairs of NP/NP.
Figure 6.5. Alpha diversity (a) and beta diversity (b) of fecal microbiota from the 2nd collection.
Samples were obtained from NP controls (n=21), HP cases (n=7) and CRA cases (n=7),
represented by green, blue and red, respectively. NP, no polyp; HP, hyperplastic polyps; CRA,
colorectal adenomas.
[Figure panels: (a) Number of OTUs, Shannon Index, PD Whole Tree, each by group (NP, HP, CRA);
(b) Axis 1 (29.1%) and Axis 2 (24.7%), legend: Polyp Status (CRA, HP, NP). P-values shown in the
figure: HP vs. NP p=0.41; CRA vs. NP p=0.45; CRA vs. HP p=0.21.]
Table 6.4. The 17 significantly differentially abundant taxa^a identified from the 1st fecal collection, tested in the 2nd fecal collection
comparing NP controls (n=21), CRA cases (n=7) and HP cases (n=7).

| Taxon | Mean relative abundance^b | CRA vs. NP: coefficient^c (95% CI) | CRA vs. NP: P-value^d | HP vs. NP: coefficient^c (95% CI) | HP vs. NP: P-value^d | CRA vs. HP: coefficient^c (95% CI) | CRA vs. HP: P-value^d |
|---|---|---|---|---|---|---|---|
| Phylum | | | | | | | |
| k__Bacteria.p__Firmicutes | 0.576 | -0.01 (-0.21, 0.19) | 0.95 | 0.09 (-0.10, 0.29) | 0.35 | -0.10 (-0.34, 0.14) | 0.42 |
| Class | | | | | | | |
| p__Proteobacteria.c__Alphaproteobacteria | 3.0E-04 | -0.93 (-3.46, 1.59) | 0.47 | -2.73 (-5.43, -0.04) | 0.05 | 1.80 (-0.81, 4.41) | 0.18 |
| p__Proteobacteria.c__Betaproteobacteria | 1.8E-03 | 0.60 (-0.37, 1.57) | 0.23 | -0.67 (-1.65, 0.31) | 0.18 | 1.27 (0.12, 2.42) | 0.03 |
| Order | | | | | | | |
| c__Betaproteobacteria.o__Burkholderiales | 1.8E-03 | 0.60 (-0.37, 1.57) | 0.23 | -0.67 (-1.65, 0.31) | 0.18 | 1.27 (0.12, 2.42) | 0.03 |
| Family | | | | | | | |
| o__Clostridiales.f__Peptostreptococcaceae | 0.014 | 0.30 (-1.46, 2.07) | 0.74 | -1.55 (-3.55, 0.44) | 0.13 | 1.86 (-0.28, 3.99) | 0.09 |
| o__Lactobacillales.f__Streptococcaceae | 0.041 | -0.74 (-1.85, 0.38) | 0.19 | 0.41 (-0.69, 1.52) | 0.47 | -1.15 (-2.37, 0.07) | 0.07 |
| o__Actinomycetales.f__Micrococcaceae | 1.0E-04 | -1.22 (-2.78, 0.33) | 0.12 | 0.21 (-1.44, 1.86) | 0.81 | -1.43 (-3.17, 0.32) | 0.11 |
| o__Bacteroidales.f__Bacteroidaceae | 0.084 | -0.17 (-1.07, 0.73) | 0.71 | -0.74 (-1.64, 0.16) | 0.11 | 0.57 (-0.53, 1.68) | 0.31 |
| o__Bacteroidales.f__.Odoribacteraceae. | 8.2E-04 | 0.33 (-0.99, 1.65) | 0.62 | -4.51 (-5.94, -3.09) | 5.62E-10 | 4.84 (3.14, 6.55) | 2.55E-08 |
| Genus | | | | | | | |
| f__Desulfovibrionaceae.g__Desulfovibrio | 5.5E-04 | -0.11 (-2.37, 2.16) | 0.93 | -0.26 (-2.52, 2.01) | 0.82 | -3.12 (-35.17, 28.93) | 0.85 |
| f__Enterobacteriaceae.g__Klebsiella | 1.7E-02 | 1.38 (-0.32, 3.08) | 0.11 | -2.95 (-4.59, -1.32) | 4.00E-04 | 4.33 (2.34, 6.32) | 2.00E-05 |
| f__Micrococcaceae.g__Rothia | 9.0E-05 | -1.18 (-2.70, 0.34) | 0.13 | 0.25 (-1.36, 1.86) | 0.76 | -1.43 (-3.13, 0.28) | 0.10 |
| f__Streptococcaceae.g__Streptococcus | 4.0E-02 | -0.73 (-1.89, 0.43) | 0.22 | 0.41 (-0.72, 1.53) | 0.48 | -1.14 (-2.40, 0.12) | 0.08 |
| f__Bacteroidaceae.g__Bacteroides | 0.084 | -0.17 (-1.07, 0.73) | 0.71 | -0.74 (-1.64, 0.16) | 0.11 | 0.57 (-0.53, 1.68) | 0.31 |
| f__Actinomycetaceae.g__Actinomyces | 1.1E-03 | -0.40 (-1.18, 0.39) | 0.33 | 0.62 (-0.36, 1.60) | 0.21 | -1.02 (-2.03, 0.00) | 0.05 |
| f__.Odoribacteraceae..g__Butyricimonas | 3.7E-04 | 0.37 (-1.33, 2.07) | 0.67 | -22.03 (-5,371.26, 5,327.20) | 0.99 | 22.40 (-5,326.83, 5,371.63) | 0.99 |
| f__Pseudomonadaceae.g__Pseudomonas | 2.7E-03 | -0.29 (-3.34, 2.75) | 0.85 | -4.37 (-9.83, 1.09) | 0.12 | 4.08 (-1.85, 10.00) | 0.18 |

NP, no polyp; HP, hyperplastic polyps; CRA, colorectal adenomas.
^a The 17 differentially abundant taxa were selected based on the significant results from the 1st fecal collection at a Benjamini-Hochberg (B-H) procedure adjusted p < 0.10.
^b Mean relative abundance of each selected taxon across all the samples.
^c Regression coefficients and 95% CIs were estimated by the zero-inflated negative binomial mixed model. Estimates above 0 indicate that the taxon is more abundant in the
index group than in the reference group, while estimates below 0 indicate that the taxon is less abundant in the index group than in the reference group. The
coefficients can be interpreted as the expected difference in log relative abundance between the index group and the reference group.
^d P-values are not adjusted for multiple comparisons, because the purpose was to replicate the results from the 1st fecal collection; p < 0.05 was considered significant.
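Because the coefficients in Table 6.4 are described as differences in log relative abundance, exponentiating a coefficient gives an approximate fold difference between the index and reference groups. The sketch below assumes the natural logarithm, which is the usual link for negative binomial models but is not stated explicitly in the table notes; the helper name `fold_change` is hypothetical.

```python
import math


def fold_change(coef, ci_low, ci_high):
    """Convert a log-scale coefficient and its 95% CI to a fold difference.

    Assumes the coefficient is on the natural-log scale.
    """
    return math.exp(coef), math.exp(ci_low), math.exp(ci_high)


# Klebsiella, CRA vs. HP, 2nd fecal collection (values taken from Table 6.4).
fc, fc_low, fc_high = fold_change(4.33, 2.34, 6.32)
print(f"fold difference: {fc:.1f} (95% CI {fc_low:.1f} to {fc_high:.1f})")
```

Under that assumption, a coefficient of 4.33 corresponds to a relative abundance roughly 76 times higher in the index group. The very wide intervals for some taxa in the table, such as Butyricimonas in the HP comparisons, would translate into fold differences that carry essentially no information.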
Figure 6.6. Fecal microbial stability between 2 collections and between individuals.
(a) Intraclass correlation coefficients (ICCs) between 2 collections (N=35) in relation to alpha diversity,
regardless of polyp status. (b) Intraclass correlation coefficients (ICCs) between 2 collections (N=35)
in relation to the relative abundance, separately by polyp status. (c) Weighted UniFrac-based
principal coordinates analysis (PCoA) of the microbial communities. Samples were obtained from
participants (N=35) who completed both collections, represented by blue and red circles. Axis 1
explains 29.4% of all variance while axis 2 explains 22.3%. Separation between the microbiota in the two
collections was tested for the combined effect of axis 1 and axis 2 by the likelihood ratio test (LRT) in a
conditional logistic regression model. No statistical difference was observed between the 2 collections
(p=0.31). (d) Beta diversity (unweighted UniFrac distance) compared within the same individual
who completed both collections (“Within Individual”, N=35), within twin pairs from the 1st collection
(“Within Twin Pair”, N=14), and between any non-twin pairs from the 1st collection (“Non-Twin Pair”, N=1162),
regardless of polyp status. NP, no polyp; HP, hyperplastic polyps; CRA, colorectal adenomas.
[Figure panels: (a) ICC for No. of OTUs, Shannon, PD whole tree; (b) ICC against log10(mean relative
abundance), legend: CRA, NP, HP; (c) Axis 1 (29.4%) and Axis 2 (22.3%), C1 vs. C2: p=0.31;
(d) Unweighted UniFrac for Within Individual, Within Twin Pair, Non-Twin Pair. P-values shown in the
figure: P = 0.003; P < 0.001; P < 0.001.]
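The pair counts in panel (d) of Figure 6.6 follow from the composition of the 1st collection reported in Figure 6.1b (49 twins, of whom 14 double-respondent pairs). A short arithmetic check, with a hypothetical helper name, is:

```python
from math import comb


def pair_counts(n_individuals, n_twin_pairs):
    """Split all unordered pairs of individuals into twin and non-twin pairs."""
    all_pairs = comb(n_individuals, 2)
    return n_twin_pairs, all_pairs - n_twin_pairs


# 1st collection, final analyses: 49 twins, including 14 double-respondent pairs.
within_twin, non_twin = pair_counts(49, 14)
print(within_twin, non_twin)  # 14 1162
```

The 49 participants form 1,176 unordered pairs, and removing the 14 co-twin pairs leaves the 1,162 non-twin pairs given in the caption.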
Figure 6.7. Alpha diversity (a) and beta diversity (b) of fecal microbiota from the 1st collection,
comparing PA individuals (n=5) to NP controls (n=33) or CRA cases (n=8). NP, no polyp; PA, current no
polyp controls with a previous history of adenomas who had been excluded from the main analyses;
CRA, colorectal adenomas.
[Figure panels: (a) Number of OTUs, Shannon Index, PD Whole Tree, each by group (NP, PA, CRA);
(b) Axis 1 (33.3%) and Axis 2 (19.2%), legend: Polyp Status (CRA, PA, NP). P-values shown in the
figure: PA vs. NP p=0.77; CRA vs. PA p=0.001.]
272
Table 6.5. Differentially abundant taxa^a between NP controls (n=33), CRA cases (n=8) and PA individuals (n=5),
identified from the 1st fecal collection.

| Level | Taxon | Mean Relative Abundance^b | PA vs. NP: Coefficient^c (95% CI) | PA vs. NP: P-value^d | CRA vs. PA: Coefficient^c (95% CI) | CRA vs. PA: P-value^d |
|---|---|---|---|---|---|---|
| Order | c__Gammaproteobacteria.o__Enterobacteriales | 0.054 | 1.30 (1.21, 1.39) | <0.001 | -0.66 (-2.61, 1.30) | 0.73 |
| Family | o__Enterobacteriales.f__Enterobacteriaceae | 0.054 | 1.30 (1.21, 1.39) | <0.001 | -0.66 (-2.61, 1.30) | 0.81 |
| Family | o__Bacteroidales.f__.Barnesiellaceae | 2.68E-03 | -3.82 (-6.15, -1.50) | 0.03 | 1.96 (-0.80, 4.71) | 0.68 |
| Genus | f__Coriobacteriaceae.g__Collinsella | 0.067 | -0.85 (-1.58, -0.12) | 0.23 | 3.32 (1.97, 4.68) | <0.001 |
| Genus | f__Peptostreptococcaceae.g__ | 0.011 | -1.00 (-1.97, -0.04) | 0.28 | 3.07 (1.77, 4.37) | <0.001 |
| Genus | f__.Tissierellaceae.g__Finegoldia | 7.05E-05 | 2.53 (0.60, 4.46) | 0.13 | -4.62 (-7.07, -2.18) | 0.01 |
| Genus | f__.Barnesiellaceae.g__ | 0.572 | -3.82 (-6.14, -1.49) | 0.04 | 1.96 (-0.80, 4.71) | 0.68 |
| Genus | f__Peptostreptococcaceae.g__Peptostreptococcus | 3.08E-04 | 5.03 (3.00, 7.06) | <0.001 | -4.56 (-6.98, -2.14) | 0.01 |
| Genus | f__Erysipelotrichaceae.g__.Eubacterium | 0.013 | 2.09 (0.75, 3.43) | 0.06 | -1.99 (-3.60, -0.37) | 0.18 |
| Genus | f__Peptococcaceae.g__Peptococcus | 5.38E-04 | -7.32 (-11.20, -3.44) | 0.01 | 6.98 (1.87, 12.10) | 0.11 |

NP, no polyp; PA, current no-polyp controls with a previous history of adenomas, who had been excluded from the main analyses; CRA, colorectal adenomas.
a Differentially abundant taxa were selected using a zero-inflated negative binomial mixed model in which a random effect accounted for twin pair status.
All taxa (from phylum to genus) with a Benjamini-Hochberg (B-H) adjusted p < 0.10 are included in the table.
b Mean relative abundance of each selected taxon across all samples.
c Regression coefficients and 95% CIs were estimated by the zero-inflated negative binomial mixed model. Estimates above 0 indicate that the taxon is more abundant in
the index group than in the reference group, while estimates below 0 indicate that it is less abundant in the index group than in the reference group.
The coefficients can be interpreted as the expected difference in log relative abundance between the index group and the reference group.
d Benjamini-Hochberg (B-H) adjusted P-values. Multiple comparisons were adjusted for at each taxonomic level separately.
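The Benjamini-Hochberg adjustment named in footnotes a and d can be sketched as follows. This is a minimal illustration of the standard step-up procedure, not the software used in this dissertation; the function name and the raw p-values are hypothetical, and the adjustment would be run once per taxonomic level.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that each adjusted value is
    # capped by the adjusted values of all larger p-values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four taxa tested at one taxonomic level.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
# Taxa passing the adjusted p < 0.10 threshold used for the table.
selected = [i for i, q in enumerate(adj) if q < 0.10]
```

Applying the function separately to the order-level, family-level, and genus-level p-values mirrors the per-level adjustment described in footnote d.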
Figure 6.8. Nine identified species of the genus Bacteroides from the 1st fecal collection.
Differential abundance was tested using a zero-inflated negative binomial mixed model in which a
random effect accounted for twin pair status. NP, no polyp; HP, hyperplastic polyps;
CRA, colorectal adenomas.
[Figure 6.8: nine panels showing abundance by polyp status (NP, HP, CRA): B. unidentified, B. acidifaciens,
B. caccae, B. eggerthii, B. coprophilus, B. fragilis, B. plebeius, B. ovatus, B. uniformis.
P-values shown in the panels, in reading order: 0.01, 0.17, 0.99, 0.41, <0.001, <0.001, <0.001, 0.05, 0.81,
0.42, 0.80, 0.83, 0.002, 0.14, 1.00, 1.00, <0.001, 1.00.]
Figure 6.9. Alpha diversity (Number of OTUs, Shannon index, and Equitability) from the 1st fecal
collection by polyp status after excluding all taxa from the genus Bacteroides in (a) the full set of
colonoscopy-screened participants (NP controls n=33, HP cases n=8, CRA cases n=8), or (b) a subset
of participants (NP controls n=11, HP cases n=7, CRA cases n=5) whose colonoscopies were completed
within 3 years prior to fecal collection. These indices were calculated for 20 iterations of rarefied
(29,455 sequences per sample) OTU tables, and the average over the iterations was taken for each
participant. P-values from an ANOVA mixed model with a random effect accounting for twin pair status
are shown. NP, no polyp; HP, hyperplastic polyps; CRA, colorectal adenomas.
[Figure 6.9: Number of OTUs, Shannon Index, and Equitability by polyp status (NP, HP, CRA) for
(a) Full Set (n=49) and (b) Subset (n=23); annotated p-values: p=0.04, p=0.08.]
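The rarefaction-and-averaging step described in the Figure 6.9 caption can be sketched as below. This is an illustrative sketch only: the helper names, the toy OTU counts, and the depth of 200 reads are hypothetical, subsampling without replacement is assumed, and the base-2 logarithm in the Shannon index is an assumption (the caption does not state the base). The study itself used a depth of 29,455 sequences per sample and 20 iterations.

```python
import math
import random


def rarefy(counts, depth, rng):
    """Subsample `depth` reads without replacement from an OTU count vector."""
    # Expand the count vector into one entry per read, labelled by OTU index.
    pool = [otu for otu, n in enumerate(counts) for _ in range(n)]
    if depth > len(pool):
        raise ValueError("sample has fewer reads than the rarefaction depth")
    sub = [0] * len(counts)
    for otu in rng.sample(pool, depth):
        sub[otu] += 1
    return sub


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for n in counts if n > 0)


def shannon(counts, base=2.0):
    """Shannon index of a count vector (log base is an assumption)."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total, base) for n in counts if n > 0)


def mean_alpha_diversity(counts, depth, iterations=20, seed=0):
    """Average observed OTUs and Shannon index over repeated rarefactions."""
    rng = random.Random(seed)
    otus_sum, shannon_sum = 0.0, 0.0
    for _ in range(iterations):
        sub = rarefy(counts, depth, rng)
        otus_sum += observed_otus(sub)
        shannon_sum += shannon(sub)
    return otus_sum / iterations, shannon_sum / iterations


# Hypothetical OTU counts for one participant (six OTUs, 1,000 reads).
toy_sample = [500, 300, 150, 40, 8, 2]
mean_otus, mean_shannon = mean_alpha_diversity(toy_sample, depth=200)
```

Averaging over iterations reduces the noise that a single random subsample would add to each participant's diversity estimate.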
Figure 6.10. Weighted UniFrac-based principal coordinates analysis (PCoA) of the microbial
communities in the 1st fecal collection after excluding all taxa from the genus Bacteroides.
Samples were obtained from NP controls (n=33), HP cases (n=8) and CRA cases (n=8),
represented by green, blue and red circles, respectively. Axis 1 explains 25.3% of all variance while
axis 2 explains 18.8%. Separation between the microbiota in any two groups was tested for the
combined effect of axis 1 and axis 2 by the likelihood ratio test (LRT) in a mixed logistic model with a
random effect accounting for twin pair status. NP, no polyp; HP, hyperplastic polyps; CRA, colorectal
adenomas.
[Figure 6.10: PCoA by polyp status (CRA, HP, NP), Axis 1 (25.3%) vs. Axis 2 (18.8%);
HP vs. NP p=0.06, CRA vs. NP p=0.27, CRA vs. HP p=0.70.]
Table 6.6. Alpha diversity (Number of OTUs and Shannon index) from the 1st fecal collection by
study population characteristics (N=49).

| Characteristic | Category | N^a | Observed OTUs | Shannon |
|---|---|---|---|---|
| Participation age | ≥ 55 years | 32 | 875 (172) | 5.76 (0.68) |
| | < 55 years | 17 | 735 (155) | 5.37 (0.63) |
| | P-value^c | | 0.514 | 0.951 |
| Sex | Male | 4 | 781 (101) | 5.41 (0.32) |
| | Female | 45 | 830 (183) | 5.65 (0.70) |
| | P-value^c | | 0.489 | 0.414 |
| Race | White | 41 | 828 (187) | 5.61 (0.71) |
| | Non-White | 8 | 814 (130) | 5.72 (0.56) |
| | P-value^c | | 0.619 | 0.948 |
| Current BMI (kg/m²) | BMI < 30 | 40 | 853 (175) | 5.73 (0.66) |
| | BMI ≥ 30 | 8 | 719 (145) | 5.22 (0.62) |
| | P-value^c | | 0.096 | 0.096 |
| Recent probiotics/yogurt use | No | 19 | 801 (155) | 5.60 (0.73) |
| | Yes | 29 | 850 (189) | 5.68 (1.66) |
| | P-value^c | | 0.874 | 0.817 |
| Family history of CRC | No | 25 | 779 (164) | 5.48 (0.59) |
| | Yes | 15 | 942 (181) | 6.03 (0.76) |
| | P-value^c | | 0.040 | 0.063 |
| Years since last colonoscopy | ≤ 3 years | 24 | 827 (187) | 5.62 (0.72) |
| | > 3 years | 25 | 825 (172) | 5.63 (0.66) |
| | P-value^c | | 0.297 | 0.403 |
| Vegetable consumption | < Median^b | 17 | 766 (165) | 5.46 (0.68) |
| | ≥ Median^b | 28 | 850 (181) | 5.68 (0.71) |
| | P-value^c | | 0.210 | 0.430 |
| Fruits/juice consumption | < Median^b | 14 | 827 (139) | 5.55 (0.60) |
| | ≥ Median^b | 30 | 813 (198) | 5.63 (0.75) |
| | P-value^c | | 0.886 | 0.655 |
| Red meat consumption | < Median^b | 23 | 861 (182) | 5.70 (0.73) |
| | ≥ Median^b | 23 | 794 (151) | 5.52 (0.65) |
| | P-value^c | | 0.228 | 0.484 |

a Totals differ across characteristics because of missing data.
b The median value of each food category was calculated based on sex-specific food frequency from all the
participants who returned food frequency questionnaires.
c P-values were based on an ANOVA mixed model with a random effect accounting for twin pair status.
Figure 6.11. Weighted UniFrac-based principal coordinates analysis (PCoA) of the microbial
communities in the 1st fecal collection by study population characteristics (N=49). Separation
between the microbiota in two groups was tested for the combined effect of axis 1 and axis 2 by the
likelihood ratio test (LRT) in a mixed logistic model with a random effect accounting for twin pair status.
[Figure 6.11 panels: (a) Participation age; (b) Sex; (c) Race; (d) BMI; (e) Family history of CRC;
(f) Years since last colonoscopy; (g) Recent probiotics/yogurt use; (h) Vegetable consumption;
(i) Fruits/juice consumption; (j) Red meat consumption. P-values shown with the panels, in reading order:
0.081, 0.670, 0.679, 1.000, 0.149, 0.114, 0.277, 0.417, 0.885, 0.893.]
Table 6.7. A follow-up questionnaire for dietary habit changes since colonoscopy.

| Dietary Habit | Change | No Polyp (NP) | Hyperplastic Polyps (HP) | Adenomas (CRA) |
|---|---|---|---|---|
| Number of twins^a | | 30 | 7 | 5 |
| Diet Changed Since Colonoscopy, N (%) | No | 15 (50.0) | 0 (0.0) | 3 (60.0) |
| | Yes | 15 (50.0) | 7 (100.0) | 2 (40.0) |
| Beef Consumption^b, N (%) | Decreased | 12 (40.0) | 5 (71.4) | 0 (0.0) |
| | Increased | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Fruits Consumption^b, N (%) | Decreased | 0 (0.0) | 1 (14.3) | 0 (0.0) |
| | Increased | 10 (33.3) | 5 (71.4) | 1 (20.0) |
| Vegetables Consumption^b, N (%) | Decreased | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| | Increased | 10 (33.3) | 6 (85.7) | 1 (20.0) |
| Yogurt or Other Fermented Products Use^b, N (%) | Decreased | 4 (13.3) | 1 (14.3) | 0 (0.0) |
| | Increased | 6 (20.0) | 4 (57.1) | 0 (0.0) |
| Probiotics Use^b, N (%) | Decreased | 0 (0.0) | 1 (14.3) | 0 (0.0) |
| | Increased | 8 (26.7) | 4 (57.1) | 1 (20.0) |

a Number of individual twins who returned the follow-up questionnaire. The answers from 3 NP
controls, 1 HP case and 3 CRA cases were missing.
b For each dietary item, the numbers of individual twins who reported no change in
consumption since colonoscopy are not listed.
Appendix A. Zero-inflated negative binomial mixed model
Because microbiome count data are zero-inflated, highly skewed, and sparse, a zero-inflated
negative binomial model with a random effect accounting for twin pair status was fitted
to identify taxa that were differentially abundant by colorectal polyp status.

\begin{align*}
\log(\mu_{ijk}) &= \beta_{i0} + \beta_{1k}\,\mathrm{polyp}_{ij} + \log(\text{total count})_{ij} \\
\beta_{i0} &= b_{0k} + b_{ik}\, I\{\mathrm{twinpair}_{ij} = i\} \\
Y_{ijk} &\sim NB(\mu_{ijk}, \phi_k) \\
(Y_{ijk} \mid X_{ijk} = 0) &= 0, \quad (Y_{ijk} \mid X_{ijk} = 1) = 1 \\
X_{ijk} &\sim \mathrm{Bernoulli}(\pi)
\end{align*}

$Y_{ijk}$ denotes the observed count of taxon $k$ for twin $j$ in twin pair $i$, $\mu_{ijk}$ is the mean parameter
of the count distribution of taxon $k$ for twin $j$ in twin pair $i$, and $\phi_k$ is the dispersion parameter.
$X_{ijk}$ is a Bernoulli trial with probability $\pi$ of having zero counts. Twin pair refers to the ID for twin
pair status, which has a zero-mean random coefficient $b_{ik}$. The coefficients $b_{0k}$ and $\beta_{1k}$ are fixed
effects representing the grand mean and polyp status, respectively. The log total sequence count for each twin
is set as an offset. $\beta_{1k}$ is the estimated count coefficient for taxon $k$ by polyp status; its value is the
logarithm of the relative difference in taxon $k$'s counts between polyp statuses after controlling for twin pair
status and any other covariates. Thus, the sign of $\beta_{1k}$ indicates the direction of the differential
abundance for the comparison group: a value of 0 indicates equal abundance between polyp
statuses, $\beta_{1k} < 0$ indicates lower abundance, and $\beta_{1k} > 0$ indicates higher abundance.
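The mean structure with an offset and the zero-inflation component described in this appendix can be illustrated numerically as follows. This is a minimal sketch, not the fitting code used in the dissertation: it assumes the conventional zero-inflated negative binomial form (a structural zero with probability π, otherwise a negative binomial count), it assumes that the dispersion enters as the negative binomial "size" parameter, and all function names and parameter values are hypothetical. It evaluates the probability mass function only and does not estimate the random effect.

```python
import math


def nb_log_pmf(y, mu, size):
    """Log pmf of a negative binomial with mean `mu` and size (dispersion) `size`."""
    return (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
            + size * math.log(size / (size + mu))
            + y * math.log(mu / (size + mu)))


def zinb_log_pmf(y, mu, size, pi):
    """Log pmf of a zero-inflated NB: structural zero with probability `pi`."""
    if y == 0:
        # A zero can be structural or can come from the NB component.
        return math.log(pi + (1.0 - pi) * math.exp(nb_log_pmf(0, mu, size)))
    return math.log(1.0 - pi) + nb_log_pmf(y, mu, size)


def expected_count(b0, b_pair, beta1, polyp, total_count):
    """Mean count: exp(grand mean + pair effect + polyp effect + log offset)."""
    return math.exp(b0 + b_pair + beta1 * polyp + math.log(total_count))


# Hypothetical parameter values for one taxon and one twin pair.
mu_np = expected_count(b0=-6.0, b_pair=0.2, beta1=1.3, polyp=0, total_count=29455)
mu_cra = expected_count(b0=-6.0, b_pair=0.2, beta1=1.3, polyp=1, total_count=29455)
# With the log total count as an offset, exp(beta1) is the ratio of
# expected relative abundances between the two polyp groups.
rate_ratio = mu_cra / mu_np
```

Because the offset cancels in the ratio, the coefficient is interpretable on the relative abundance scale, which matches the interpretation given for the coefficients in Table 6.5.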
Chapter 7 Summary and Future Perspectives
Most diseases and conditions are caused by the combined effect of multiple genes acting together
with lifestyle and environmental factors; these are called complex or multifactorial diseases.
Unlike cystic fibrosis, which is caused by mutations in a single gene, or Down syndrome, which is caused
by an extra copy of chromosome 21, multifactorial diseases have no clear Mendelian pattern of inheritance
and are difficult to study and treat, because the specific factors underlying most of them have not yet
been identified. A person's life starts at conception with genetic predispositions that are determined by
the parents' genotypes and modulated by parental lifestyles, such as smoking and alcohol use. From
conception onward, this person continuously interacts with the environment and with other people,
forming a unique, integrated unit that maintains both endogenous and exogenous balance,
physiologically and mentally. This balance is dynamic and resilient to changes in lifestyle or the
environment up to a point, beyond which it breaks. A beneficial change can establish a new balance
that favors the person's health, while the imbalance caused by a deleterious change can eventually
lead to multifactorial disease. Therefore, recognizing the lifestyle and environmental changes that
affect health outcomes is essential for developing disease prevention, diagnosis, and therapy.
Twins provide a valuable resource for studying such multifactorial diseases, because comparing
monozygotic (MZ) twins (100% shared genome) with dizygotic (DZ) twins (on average 50% shared genome)
allows the genetic, shared environmental, and unshared environmental factors for the disease of interest
to be disentangled. Thus, comparing disease concordance between MZ and DZ twin pairs yields a crude
estimate of disease heritability. Repeating this comparison across levels of a given environmental
factor serves as an indicator of potential gene-environment interaction in the disease etiology.
In disease-discordant twin pairs, genetic background and shared early-life exposures are controlled
for, so unshared environmental factors associated with the disease can be identified. In combination
with emerging next-generation genome sequencing technologies and improved analytic tools, twin studies
are a powerful approach for identifying and understanding the biological mechanisms underlying disease.
Other advantages of twin studies include more valid and reliable measures of both exposures and outcomes,
because twins can validate each other's reports [1], as well as good response rates and high compliance
during follow-up [2]. In this dissertation, we presented three epidemiological studies of twins from a
population-based twin registry, the California Twin Program (CTP) [3, 4], to show the merit of twin
studies in the genome sequencing era and to contribute knowledge about several multifactorial diseases
in relation to various lifestyle factors in a naturally genetically controlled setting.
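The MZ versus DZ concordance comparison described above can be illustrated with a short sketch. The probandwise definition of concordance is an assumption made for illustration (the text does not state which definition was used), and the pair counts are hypothetical rather than results from the CTP.

```python
def probandwise_concordance(concordant_pairs, discordant_pairs):
    """Probandwise concordance: 2C / (2C + D).

    C is the number of pairs in which both twins are affected and D is the
    number of pairs in which exactly one twin is affected.
    """
    affected_in_concordant = 2 * concordant_pairs
    return affected_in_concordant / (affected_in_concordant + discordant_pairs)


# Hypothetical counts of affected pairs by zygosity.
mz_concordance = probandwise_concordance(concordant_pairs=12, discordant_pairs=48)
dz_concordance = probandwise_concordance(concordant_pairs=4, discordant_pairs=76)

# A ratio well above 1 is the crude signal of a genetic contribution.
mz_to_dz_ratio = mz_concordance / dz_concordance
```

In this hypothetical example the MZ concordance is 3.5 times the DZ concordance; a ratio near 1 would instead point toward shared environment rather than shared genes.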
In the first study (Chapter 2), we conducted a classic twin study of 20,803 twin pairs (7,247
double-respondent pairs and 13,556 single-respondent pairs) from the CTP who were born from
1957 to 1982, to explore diverse types of birth anomaly, to estimate their heritable contributions, and
to provide evidence of their associations with a set of shared birth characteristics and
parental exposures (lifestyles). Both birth anomalies and exposures were measured cross-sectionally
from the CTP questionnaires completed at enrollment in the 1990s-2000s, and excellent agreement (75-97%)
was observed within double-respondent pairs. Our major finding was a significant genetic
contribution to clubfoot and strabismus, identified by comparing concordance rates between MZ and DZ
twin pairs. In terms of parental lifestyles, parental smoking was associated with increased
risks of spina bifida and strabismus, and these risks, as well as the risk of clubfoot, increased
quadratically from neither parent smoking, to only the father smoking, to only the mother
smoking, to both parents smoking. In addition, despite the small numbers, we observed
additional genetic susceptibility to parental smoking among twins who had clubfoot or strabismus, but
less among twins with congenital heart defects. Although this twin study provided only crude
estimates and may have suffered from small sample size and potential selection bias related to
disease conditions, it sheds light on the need for a future genotyping or genome sequencing study to
identify the specific genetic factors (SNPs or genes) contributing to the development of clubfoot and
strabismus (and perhaps spina bifida and congenital heart defects as well) in relation to parental
smoking, an area in which knowledge remains very limited.
In the next study (Chapter 4), we turned to unshared lifestyle factors in relation to CRC risk in
twins, after controlling for genetic background and shared early-life exposures. Lifestyle factors,
including anthropometric measures, alcohol use, smoking, exercise, medication use, history of disease,
and dietary patterns, were assessed from the CTP questionnaires completed at enrollment in the
1990s-2000s. Primary incident CRC cases diagnosed after CTP enrollment were identified by linking
the entire CTP cohort of 37,435 twin pairs to two cancer registries, the California Cancer Registry
(CCR) and the Los Angeles County Cancer Surveillance Program (CSP), in 2010. We then conducted a
matched case-control study among 90 double-respondent like-sex twin pairs who were discordant for CRC.
Our results confirmed the previously established adverse associations of alcohol use and smoking with
CRC risk, as well as the protective effect of vegetable consumption on the development of CRC [5].
Our study also added evidence of an inverse association between CRC risk and potato intake. We further
tested the associations separately by zygosity (MZ and DZ twins) for possible gene-environment
interactions, and by tumor subsite (proximal colon, distal colon, and rectum) for potentially different
etiologies. Although we found some zygosity-specific and subsite-specific associations, they should be
interpreted with caution because of the small numbers in each subgroup.
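For a single binary exposure, the matched analysis of disease-discordant twin pairs described above reduces to a comparison of exposure-discordant pairs. The sketch below shows that special case, in which the conditional maximum likelihood odds ratio equals the ratio of the two kinds of exposure-discordant pairs. It is an illustration under stated assumptions, not the analysis code of Chapter 4: the function name, the Wald-type interval, and the exposure counts are hypothetical.

```python
import math


def matched_pair_odds_ratio(pairs):
    """Odds ratio for 1:1 matched pairs with a binary exposure.

    `pairs` holds one (case_exposed, control_exposed) tuple per twin pair.
    Only exposure-discordant pairs contribute to the estimate.
    """
    pairs = list(pairs)
    case_only = sum(1 for case, control in pairs if case and not control)
    control_only = sum(1 for case, control in pairs if control and not case)
    if case_only == 0 or control_only == 0:
        raise ValueError("odds ratio is undefined without both discordant types")
    odds_ratio = case_only / control_only
    # Wald-type 95% confidence interval on the log scale.
    se_log = math.sqrt(1.0 / case_only + 1.0 / control_only)
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log)
    return odds_ratio, (lower, upper)


# Hypothetical exposure pattern in 90 CRC-discordant twin pairs.
example_pairs = (
    [(True, False)] * 20      # only the affected twin was exposed
    + [(False, True)] * 10    # only the unaffected co-twin was exposed
    + [(True, True)] * 25     # both exposed (uninformative)
    + [(False, False)] * 35   # neither exposed (uninformative)
)
or_hat, (ci_low, ci_high) = matched_pair_odds_ratio(example_pairs)
```

The example makes visible why small numbers matter in subgroup analyses: only 30 of the 90 hypothetical pairs carry information, so the confidence interval is wide.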
Although the findings from the 2nd study may not be extremely exciting, they did show the
power of a twin study: only 90 twin pairs were needed to confirm results that had been established across
different studies of at least thousands of subjects. Moreover, this study prompted us to consider how to
make maximal use of the CTP data. For instance, we started with 37,435 twin pairs, but only 90 pairs
were included in the end. We had to discard the data from single-respondent twins and
unlike-sex twin pairs, which might also have introduced selection bias. If we want to keep
exactly the same study design, with the best control for genetic background, then, as we proposed in
Chapter 4, another linkage to obtain more cases diagnosed after 2010 is the best choice. On the other
hand, if we want to keep all the data, a statistical method is needed that takes into consideration
correlations within twins in double-respondent pairs, variation between double-respondent and
single-respondent pairs, variation between like-sex and unlike-sex pairs, adjustment for other
possible confounders, and even survival time. We tried breaking the match to perform a
traditional case-control analysis or a cohort analysis using all the unrelated CRC cases and a set of
randomly selected unrelated controls. However, probably because these analyses did not have the same
power to control for unmeasured confounders as a matched case-control twin study, the results were mostly
different and difficult to interpret. A mixed model with random effects that account for
the correlations within twins as well as variation by sex is another option, but it
may be subject to the same problem and to bias from violated model assumptions.
A permutation method is a further option, and we are considering testing it in the CTP data later.
Misclassification of exposures is another problem for the CTP data. The 16-page CTP
questionnaire covered all aspects of the participant's life, so how to extract the information
accurately and efficiently is a further consideration. Besides searching for optimized algorithms to
define variables, a dimension reduction method based on the data's own
structure also needs to be developed. It could also be an option to switch to a more prevalent cancer
type, such as lung cancer. If we can establish a standard method that deals with all the concerns
discussed above, it would be relatively easy to apply it to any disease or cancer of interest in the
CTP cohort.
The third study (Chapter 6) also concerned CRC, but focused on its precursor
lesions, colorectal polyps. A personal history of polyps has been associated with a higher risk of
CRC [6, 7]. Early detection and removal of polyps during screening colonoscopy can greatly reduce
CRC risk and mortality [8]. However, individual characteristics, such as advanced age,
unfavorable lifestyles, and/or genetic predispositions, or lesions that were missed or incompletely
resected at the initial colonoscopy [9-13], can lead to polyp recurrence or even CRC, sometimes at a
faster pace than the natural progression [13-21]. Moreover, lifestyle changes associated with a polyp
history or the colonoscopy procedure, such as adopting a healthier diet, can also modify this polyp-CRC
sequence. Therefore, characterizing clinical biomarkers after colonoscopy that predict polyp or CRC risk
is an interesting topic for the development of CRC prevention, diagnosis, and potential therapy, but it
has rarely been studied. With advances in 16S rRNA gene sequencing technology, the
microbiota in fecal samples has become an appealing candidate, since feces are easily collected without
an invasive procedure and serve as a proxy for the gut microbiome, which has been associated with
polyps and CRC [22, 23] and can be shaped by lifestyle factors such as diet [24-27]. We therefore
conducted a cross-sectional study that enrolled 95 individual MZ twins from the CTP cohort and included
in the final analysis 49 eligible twins, representing 35 MZ twin pairs, who provided both fecal samples
and past colonoscopy reports. Based on the most recent colonoscopy and pathology
reports, if any, we identified 33 no polyp (NP) controls, 8 hyperplastic polyps (HP) cases, and 8
colorectal adenomas (CRA) cases. By comparing microbial diversity and composition in fecal
samples using mixed models with a random effect accounting for twin status, we observed
distinct bacterial communities between CRA cases and NP controls and characterized a group of fecal
microbial signatures for CRA cases, after partially controlling for genetic background and early-life
exposures. This finding suggests that the previously reported alterations in the fecal microbiome of
colorectal adenoma patients may persist even years after colonoscopy
and polypectomy, probably owing not only to the adenomas themselves but also to other factors related to
adenomas, such as screening procedures or lifestyle changes. However, as discussed in Chapter 6, this
study lacked the power to pinpoint specific bacteria functionally associated with adenomas or with any
lifestyle factor because of the small sample size and technical limitations, and the cross-sectional
design limited our ability to link the CRA microbial signatures to the risk of polyp recurrence or CRC
progression. Thus, a large-scale longitudinal study combined with shotgun
metagenomic sequencing will be needed to obtain a more comprehensive picture and
to further understand the underlying biology. In addition, drawing on the findings of our second
study on CRC risk and lifestyle factors, it would also be interesting to investigate interaction or
mediation relationships between alcohol use, smoking, or vegetable intake and the gut microbiome
in the initiation and progression of CRC.
In summary, this dissertation suggests that twin studies incorporating newly
emerging sequencing technology continue to provide valuable insights for human studies by refining
the understanding of disease, identifying risk factors (such as lifestyles), and characterizing potential
biomarkers of disease. Our studies also emphasize the need for new statistical methods
specifically designed for the structure of twin data, as well as for an ongoing twin registry. The CTP
cohort has provided a valuable resource for a number of studies, including the three presented here.
However, the CTP is a cross-sectional cohort and no longer actively recruits new twins. The
current large-scale, ongoing, population-based twin registries are mainly in Europe (e.g., the TwinsUK
Registry) and Asia (e.g., the South Korean Twin Registry). In the US, there are several active twin
registries, such as the Mid-Atlantic Twin Registry (MATR) and the Avera Twin Registry, but none of them
covers the California population. Thus, we hope that the value of twins in human studies demonstrated
in this dissertation can encourage the search for resources to reactivate the CTP.
Chapter 7 References
1. Hamilton, A.S. and T.M. Mack, Use of twins as mutual proxy respondents in a case-control
study of breast cancer: Effect of item nonresponse and misclassification. American journal of
epidemiology, 2000. 152(11): p. 1093-1103.
2. Cockburn, M.G., et al., Twins as willing research participants: Successes from studies nested
within the California Twin Program. Twin Research and Human Genetics, 2006. 9(06): p. 927-932.
3. Cockburn, M., et al., The occurrence of chronic disease and other conditions in a large
population-based cohort of native Californian twins. Twin Res, 2002. 5(5): p. 460-7.
4. Cockburn, M.G., et al., Development and representativeness of a large population-based
cohort of native Californian twins. Twin Res, 2001. 4(4): p. 242-50.
5. World Cancer Research Fund/American Institute for Cancer Research, Diet, nutrition, physical
activity and colorectal cancer 2017. 2017.
6. Conteduca, V., et al., Precancerous colorectal lesions. International journal of oncology, 2013.
43(4): p. 973-984.
7. Haggar, F.A. and R.P. Boushey, Colorectal cancer epidemiology: incidence, mortality, survival,
and risk factors. Clinics in colon and rectal surgery, 2009. 22(4): p. 191.
8. He, J. and J.E. Efron, Screening for colorectal cancer. Adv Surg, 2011. 45: p. 31-44.
9. Atkin, W.S., B.C. Morson, and J. Cuzick, Long-term risk of colorectal cancer after excision of
rectosigmoid adenomas. New England Journal of Medicine, 1992. 326(10): p. 658-662.
10. Rex, D.K., et al., Colonoscopic miss rates of adenomas determined by back-to-back
colonoscopies. Gastroenterology, 1997. 112(1): p. 24-28.
11. Brenner, H., et al., Role of Colonoscopy and Polyp Characteristics in Colorectal Cancer After
Colonoscopic Polyp Detection: A Population-Based Case–Control Study. Annals of internal
medicine, 2012. 157(4): p. 225-232.
12. Brenner, H., et al., Risk of colorectal cancer after detection and removal of adenomas at
colonoscopy: population-based case-control study. Journal of clinical oncology, 2012. 30(24):
p. 2969-2976.
13. Robertson, D.J., et al., Colorectal cancer in patients under close colonoscopic surveillance.
Gastroenterology, 2005. 129(1): p. 34-41.
14. Winawer, S.J., et al., Randomized comparison of surveillance intervals after colonoscopic
removal of newly diagnosed adenomatous polyps. New England Journal of Medicine, 1993.
328(13): p. 901-906.
15. Schatzkin, A., et al., Lack of effect of a low-fat, high-fiber diet on the recurrence of colorectal
adenomas. New England Journal of Medicine, 2000. 342(16): p. 1149-1155.
16. Martínez, M.E., et al., Adenoma characteristics as risk factors for recurrence of advanced
adenomas. Gastroenterology, 2001. 120(5): p. 1077-1083.
17. Lieberman, D.A., et al., Five-year colon surveillance after screening colonoscopy.
Gastroenterology, 2007. 133(4): p. 1077-1085.
18. Baron, J.A., et al., A randomized trial of aspirin to prevent colorectal adenomas. New England
Journal of Medicine, 2003. 348(10): p. 891-899.
19. le Clercq, C.M., et al., Postcolonoscopy colorectal cancers are preventable: a population-based
study. Gut, 2013: p. gutjnl-2013-304880.
20. Pabby, A., et al., Analysis of colorectal cancer occurrence during surveillance colonoscopy in
the dietary Polyp Prevention Trial. Gastrointestinal endoscopy, 2005. 61(3): p. 385-391.
21. Robertson, D.J., et al., Colorectal cancers soon after colonoscopy: a pooled multicohort
analysis. Gut, 2013: p. gutjnl-2012-303796.
22. Sun, J. and I. Kato, Gut microbiota, inflammation and colorectal cancer. Genes & diseases,
2016. 3(2): p. 130-143.
23. Gao, R., et al., Gut microbiota and colorectal cancer. European Journal of Clinical
Microbiology & Infectious Diseases, 2017: p. 1-13.
24. Cipe, G., et al., Relationship between intestinal microbiota and colorectal cancer. World
journal of gastrointestinal oncology, 2015. 7(10): p. 233.
25. Nøhr, M.K., et al., GPR41/FFAR3 and GPR43/FFAR2 as cosensors for short-chain fatty acids in
enteroendocrine cells vs FFAR3 in enteric neurons and FFAR2 in enteric leukocytes.
Endocrinology, 2013. 154(10): p. 3552-3564.
26. Turnbaugh, P.J., et al., The effect of diet on the human gut microbiome: a metagenomic
analysis in humanized gnotobiotic mice. Science translational medicine, 2009. 1(6): p. 6ra14.
27. Wu, G.D., et al., Linking long-term dietary patterns with gut microbial enterotypes. Science,
2011. 334(6052): p. 105-108.
Abstract
Twin studies have been a valuable source of selected sampling for identification of genetic heritability of complex traits controlling for early environmental exposures or examination of associations between environmental factors and diseases at least partially controlling for genetic background. The California Twin Program (CTP) is a population-based cohort representing a sample of twins born in California between 1908 and 1982 created by Dr. Thomas Mack and his colleagues at USC. In this dissertation, I selected different subsets of twins from the CTP for 3 studies, including: 1) A cross-sectional study with the goal of describing the prevalence of selected birth anomalies in California Twins and investigating the effect of heritability and potential parental exposures on the occurrence of each selected birth anomaly
Asset Metadata
Creator: Yu, Yang (author)
Core Title: Lifestyle-related exposures and diseases in twins
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Molecular Epidemiology
Publication Date: 08/03/2018
Defense Date: 12/20/2017
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: birth anomaly, colon polyp, colorectal cancer, lifestyle factors, OAI-PMH Harvest, twin studies
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Cozen, Wendy (committee chair), Mack, Thomas (committee member), Millstein, Joshua (committee member), Ouellette, Andre (committee member)
Creator Email: yyfresh@gmail.com,yyu742@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-51343
Unique identifier: UC11670514
Identifier: etd-YuYang-6632.pdf (filename), usctheses-c89-51343 (legacy record id)
Legacy Identifier: etd-YuYang-6632.pdf
Dmrecord: 51343
Document Type: Dissertation
Rights: Yu, Yang
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement