AN INTEGRATIVE APPROACH FOR DETERMINING AGE RELATED
MACULAR DEGENERATION RISK FACTORS IN LATINOS
by
Corina Julia Shtir
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(BIOSTATISTICS)
May 2009
Copyright 2009 Corina Julia Shtir
Dedication
I sincerely thank my husband John Shtir for his untiring support and
encouragement throughout my graduate work.
Acknowledgements
I am grateful to all my committee members for the great support and guidance
they provided throughout the whole educational and research training process.
I thank Dr. Azen for his utmost support and encouragement throughout my
education at USC. His guidance as both statistical advisor and student
coordinator through both my MS and PhD training has always been effective and
very constructive.
Also, I thank Dr. Marjoram for his untiring help during our weekly sessions. His
critical thinking, research skills, and overall guidance have tremendously helped
shape my own thinking. Also, as a mentor he has provided me with both a mental
and an emotional balance that only the most fortunate students are able to obtain
during their training.
I especially wish to thank Dr. Varma for the great opportunity he has provided me
with by giving me the chance to analyze the LALES data, and for the financial
support throughout my PhD education. His strong support has kept me going and
has helped me achieve my education goal.
Also, I wish to thank Dr. Gauderman and Dr. Humayun for their time and effort in
contributing with their suggestions and support.
Table of Contents
Dedication
Acknowledgements
List of Tables
List of Figures
Abstract
Introduction: Age Related Macular Degeneration (AMD)
Chapter 1: Overview of GWAS, Population Admixture, and CNV Study
1.1 Integration of GWAS and Population Admixture Analyses
1.2 Integration of GWAS and CNV Analyses
1.3 Subject Selection for GWAS and CNV
1.4 Subject Selection for Population Admixture Study
1.5 Genotyping
Chapter 2: Genome-wide Identification of Genes Associated with Early Age-Related Macular Degeneration
2.1 GWAS Overview
2.2 GWAS Methods
2.3 GWAS Results
2.4 GWAS Discussion
Chapter 3: Variation in Genetic Admixture and Population Structure Among Latinos
3.1 Population Admixture Overview
3.2 Admixture Background
3.3 Materials and Methods
3.4 Statistical Analysis
3.5 Results
3.6 Discussion
Chapter 4: Copy Number Variation (CNV) Association Tests
4.1 CNV Objective
4.2 CNV Background
4.3 Methods for estimating variation in chromosomal location and number of altered DNA segments
4.3.1 GADA Algorithm
4.3.2 Helix Copy Number Analysis (CNAM) Algorithm
4.3.3 HELIX CNAM Association Test
4.4 Proposed Tests of Association for Copy Number Variation
4.4.1 Chi-square Test for Assessing the Effect of Change vs. Normal Copy State on Disease
4.4.2 Conditional Chi-square Test for Assessing the Effect of Lost, Normal, or Inserted DNA Copy on Disease Status
4.4.3 Wilcoxon gene-based Association Test
4.5 Copy Number Variation Analysis Results
Chapter 5: Copy Number Variation in the Framingham Study
5.1 Results
5.2 Discussion
Chapter 6: Integration of CNV, GWAS, and Population Admixture Results
6.1 Analysis Overview
6.2 Copy Number Variation
6.3 Population Admixture
Chapter 7: Conclusions
7.1 Marker exclusion criteria for GWAS vs. CNV studies
7.2 Factors that may induce the QQ-plot difference between segment- and gene-based CNV association tests
7.3 Role of Normalization in Conducting CNV analysis
7.4 Other tests to be considered when performing CNV analyses
7.5 What sort of study design should be considered when studying Latino populations?
7.6 Benefits of haplotype trend regression (HTR) analysis
References
List of Tables
Table 1.1: Age distribution and self-reported birthplace location across case-control and gender status
Table 2.1: Genome-wide Allele Association Test Results on the Whole Set of SNPs for the Additive Model
Table 2.2: Genome-wide Allele Association Test Results on the Tagged Set of SNPs for the Additive Model
Table 2.3: Genome-wide Allele Association Test Results on the Whole Set of SNPs for the Recessive Model
Table 2.4: Genome-wide Allele Association Test Results on the Tagged Set of SNPs for the Recessive Model
Table 2.5: Haplotype Trend Regression Results
Table 2.6: Linkage disequilibrium correlations for VLDLR and KCNV2 on chromosome 9p24.2
Table 3.1: LALES sample demographics
Table 3.2: Simulation summary statistics of ancestry clustering models
Table 3.3: Estimation of ancestry proportions for the LALES, MEC, and Native American populations
Table 3.4: Proportion of membership of each pre-defined population in each of the 5 clusters
Table 3.5: Estimation of ancestry proportions for the LALES and MEC Latinos by birthplace location
Table 3.6: Ancestry informative markers with difference in allele frequency (δ) greater than 0.3 between Native American and European ancestry among Latinos
Table 3.7: Comparison of ancestry proportion medians (1st : 3rd quartile) among LALES Latinos by birth location and case-control status
Table 3.8: Association test for AIMs with AMD in LALES Latinos
Table 3.9: Bootstrap simulation results for the increased and decreased sample size methods
Table 3.10: Allele frequency divergence among populations (net nucleotide distance) and expected heterozygosity between individuals
Table 4.1: Chi-square tabulation scheme for DNA copy vs. case/control status
Table 4.2: Comparison of association results for the JAK1 gene/SNPs
Table 4.3: Log2 ratio (SNP hybridization) association with early AMD
Table 4.4: CNV association with early AMD; p-values are based on the Wilcoxon test for average copy number (GADA segmentation)
Table 4.5: Genome-wide allele association signals included in CNV regions associated with early AMD
Table 5.1: Summary of gene-based CNV associations for GADA based segmentation (Wilcoxon rank-sum test)
List of Figures
Figure 2.1: AMD gene network and molecular functions based on the LALES genome-wide allelic test and HTR analyses
Figure 2.2: SNPs in Linkage Disequilibrium for VLDLR and KCNV2
Figure 3.1: Individual ancestry proportions for the LALES, MEC, and Native American sampled populations
Figure 3.2: Cluster ancestry distribution for the LALES, Multiethnic panel, and Native American sampled populations
Figure 3.3: Native American and European ancestry distribution among LALES cases and controls
Figure 3.4: Distribution of T-values for testing overall homozygosity and heterozygosity trends
Figure 3.5: Distribution of T-values for testing overall homozygosity and heterozygosity trends in LALES Latinos born within the US versus LALES Latinos born outside the US
Figure 3.6: Bootstrap re-sampling: distribution of European and Native American ancestry frequencies in LALES Latinos
Figure 4.1: CNV Illustration for Deletion and Duplication of DNA
Figure 4.2: DNA Fork and Stall Replication
Figure 4.3: CNV Hybridization Procedure
Figure 4.4: Modeling of probe hybridizations (Pique et al., 2008)
Figure 4.5: Numerical characterizations for the GADA piecewise component algorithm of representing CNV vectors (Pique et al., 2008)
Figure 4.6: GADA detection of CNV in Neuroblastoma Tumor Cell Lines
Figure 4.7: CNV analysis on Chromosome 1 for 303 LALES Latino subjects (101 AMD cases and 202 controls)
Figure 4.8: Representation of univariate and multivariate GoldenHelix CNAM segmentation algorithm (www.goldenhelix.com)
Figure 4.9: Illustration of copy number variation segmentation across four samples
Figure 4.10: Chromosome 1 CNV Segmentation (GADA algorithm)
Figure 4.11: JAK1 CNV on Chromosome 1 (GADA)
Figure 4.12: ASTN1 CNV segmentation
Figure 5.1: Q-Q Plot of p-values for gene-based average CNV associations for GADA and CNAM
Figure 5.2: Scatter plots
Abstract
Age-related macular degeneration (AMD) is the leading cause of
blindness among elderly individuals worldwide. Previous studies confirmed
different frequencies between ethnic groups for a series of risk alleles and a
pressing need to determine unidentified genetic factors characterizing the
pathogenesis of AMD. This study seeks to identify new genes associated with
risk for developing early AMD among Latinos, through the use of a genome wide
association study (GWAS) and through a copy number variation (CNV) analysis
that is based on a series of proposed CNV association tests. In addition, we
performed a population admixture study based on 233 ancestry informative
markers.
GWAS was conducted using single allele and haplotype trend regression
tests. Our overall results include a series of variants involved in oxidative stress
and inflammatory mechanisms that interact with the complement system. In
contrast to Caucasian based AMD pathways we identified a neurodevelopment
and brain injury repair mechanism that may constitute an essential and novel role
in the biogenesis of AMD among Latinos. Also, we proposed a series of
segment- and gene-based CNV association tests. In order to compare CNV
estimates of different algorithms we apply the GADA and CNAM programs, and
further report how the normalizing techniques, mean segment estimates, and
copy number association tests differ. We found wide discrepancies between SNP
hybridization normalization techniques, which in turn affect all downstream CNV
analyses.
Given the strong effect of population admixture on genetic association
studies, we estimated the ancestral load of LALES Latinos by comparing it to its
founder populations. We detected strong evidence for recent population
admixture in LALES Latinos, with overall estimates of 45.5% Native American,
40.1% European, 9.8% Asian, and 4.5% African-American ancestry. Gradients of
increasing Native American background (37.3% to 51.7%) and of
correspondingly decreasing European ancestry (45.3% to 29.6%) were observed
as a function of birth origin from North to South. The strongest excess of
homozygosity, a reflection of recent population admixture, was observed in non-
US born Latinos that recently populated the US. In conclusion we summarize the
lessons we have learned from integrating GWAS, CNV, and Population
Admixture studies.
Introduction
Age-related Macular Degeneration (AMD) is a multifactorial disease
caused by genetic defects that exacerbate the risk of developing AMD through
interactions with biological and environmental factors. Progressing phenotypes
consist of drusen deposits along Bruch’s membrane, choroidal
neovascularization, and atrophy of photoreceptors and retinal pigmented
epithelial (RPE) cells. As drusen deposits increase in size, they cover a larger
area of the RPE and risk for development of late AMD also increases (Herrera,
Didishvili et al. 2004; De Jong, Ikram et al. 2007). While early AMD does not
often affect vision adversely, late AMD frequently results in severe vision loss
and blindness. Consequently, AMD is the leading cause of blindness in
developed countries (Klein, Zeiss et al. 2005). The late form of AMD leads to
damage of photoreceptors and to an abnormal growth of blood vessels in the
choriocapillaries. Ultimately, hemorrhage and protein leakage beneath the
macula results in scarring, followed by irreversible loss of vision. Numerous
population-based studies and clinic based cohorts have found that certain types
of early AMD are more likely to progress to the vision impairing late AMD.
Interestingly, prevalence rates for the two forms of AMD appear to differ
by ancestry, suggesting that there may be differences in risk for AMD due to
population background. Prevalence rates for early AMD among Latinos are
similar to those found in Caucasians (9.4% LALES vs. 7.2% BMES vs. 15.6%
Beaver Dam) and in individuals of African descent (12.6% BES) (Klein, Klein et
al. 1992; Mitchell, Smith et al. 1995; Varma, Fraser-Bell et al. 2004; Leske, Wu et
al. 2006; Klein, Klein et al. 2007). However, incidence data indicate that only
1.5% of early AMD cases advance into late AMD in Latinos, while 3.4% of cases
progress in Caucasian cohorts. In longitudinal studies of Non-Hispanic Whites,
presence of large drusen deposits (>125 μm diameter) was shown to be
predictive of advanced AMD. Although at baseline the prevalence of large drusen
in the LALES Latinos was 14.5%, incidence of advanced AMD remained very low
(0.43%) compared to that of Caucasian individuals (1.4% for BMES; 1.6% for
Beaver Dam) (Mitchell, Smith et al. 1995; Klein, Klein et al. 2007), but was closer
to that found in populations of African descent – (0.7%) Barbados study (Leske,
Wu et al. 2006). This biologic paradox provides an excellent setting to explore
the different hypothesized biological mechanisms and genetic pathways that
could affect early AMD. Thus the LALES data provides a unique opportunity to
identify the genetic factors that are (1) associated with early AMD, and (2)
attributed to the Latino population. This study provides an analysis of these
issues.
Regardless of ethnicity, several genetic and environmental risk factors
have been consistently associated with the development of both early and late
AMD (Klein, Peto et al. 2004). A series of papers implicated the Y402H
polymorphism on complement factor H (CFH) in the development of AMD in
separate samples through the use of different association study techniques
(Souied, Benlian et al. 1998; Conley, Thalamuthu et al. 2005; Hageman,
Anderson et al. 2005; Haines, Hauser et al. 2005; Klein, Zeiss et al. 2005;
Zareparsi, Branham et al. 2005; Thakkinstian, Han et al. 2006). Both genome-
wide search and regionalized testing strategies yielded significant results for CFH
and established the importance of this marker in AMD etiology. While these
findings have largely examined risk for late AMD, a significant association has
also been found for early AMD (Tedeschi-Blok, Buckley et al. 2007). Despite
these agreements, the attributable risk explained by CFH appears to vary by
population (Grassi, Fingert et al. 2006). Notably, Grassi and colleagues
evaluated the frequency of the Y402H polymorphism on CFH among 5 different
ethnic populations and identified discordant frequencies among some ethnic
groups, suggesting that additional genetic, environmental, and/or gene-
environment interactions are essential, albeit as yet unidentified (Grassi, Fingert
et al. 2006). However, additional markers in the complement pathway have more
recently been associated with AMD (Dinu, Miller et al. 2007), including
complement factor B (CFB) (Klein, Zeiss et al. 2005), HtrA serine peptidase 1
(HTRA1) (Yang, Camp et al. 2006), C-reactive proteins (CRP) and polyanionic
surface markers (i.e. glycosaminoglycans) (Clark, Higman et al. 2006).
LALES Project Objective: The objective of the current study is to explore and
integrate a series of analytical methods that will allow us to gain a better
understanding of population admixture across Latinos, and to further assess how
it affects AMD genetic and molecular regulatory mechanisms. For this purpose
we study early AMD in Latinos ascertained through the Los Angeles Latino Eye
Study (LALES). In short, we perform (1) a genome-wide association study (GWAS)
for detecting AMD susceptibility loci, (2) a study of population structure and
admixture aimed at identifying the levels of ancestries composing Latino
populations, and (3) an implementation of the proposed copy number variation
(CNV) association tests. Thus, the aims of this project are to:
- Propose a genetic and molecular network involved in AMD among
Latinos: compare genetic risk factors for LALES Latino AMD-affecteds
with previously reported AMD pathways in Caucasian, Asian, and African
populations
- Estimate the levels of population admixture in the LALES cohort, and
study the effect of geographic birth origin on Native American and
European ancestry estimates
- Implement a series of methods for testing association between the
number and/or length of altered CNV segments and risk for AMD
- Compare CNV patterns of genetic expression with signals identified
through GWAS allelic and haplotype association tests
Chapter 1
Overview of GWAS, Population Admixture, and CNV Study
1.1 Integration of GWAS and Population Admixture Analyses: Based on a
genome-wide association study using 303 LALES Latinos (101 early AMD cases
and 202 controls) we identified a series of positive associations for genes
involved in several molecular pathways (i.e. complement system, immune
responses, central nervous system, and cardiovascular mechanisms). While
several of these genes have already been reported to be implicated in AMD
molecular mechanisms in Caucasian and Afro-Caribbean populations, various
other results are not yet confirmed in Caucasian, Asian, or African descent
individuals. Nevertheless, different risks for both forms of early and late AMD are
confirmed across various populations. Consequently, to account for the potential
effect of Latino ancestral admixture and/or structure we performed a separate
population admixture study through the analysis of 233 ancestry informative
markers genotyped for 500 LALES Latinos, a set of 355 individuals composed of
Asians (AS), Africans (AA), Europeans (EU), and US born Latinos, and for 30
Native American individuals. The findings of this study give supportive evidence
for recent population admixture in LALES Latinos, most of whom migrated recently
into the US from Mexico, El Salvador, and Guatemala. Moreover, we observe a
variation in population admixture and Native American (NA) inheritance among
Latino subgroups as a function of geographic distance from North to South (US
to El Salvador). We also identified a set of SNPs that distinguish between
European and Native American populations and will therefore be useful in future
studies aimed at studying admixture in Latinos.
The inclusion of individual ancestry proportions (NA, EU, AS, and AA) as
covariates for allelic or haplotype regression models will allow us to assess the
effect of population admixture on genetic risk factors. The study of admixture will
allow us to measure the amount of structure within Latinos, and the degree it
varies among different geographic locations. However, future genome-wide
studies on the LALES cohort or other Latino populations may account for the
effect of population structure through other statistical approaches designed to
eliminate the necessity of obtaining prior information of individual ancestries. For
instance, a different mode for assessing confounding due to population
stratification applies a complete-linkage hierarchical clustering algorithm (Purcell
et al, 2007) with the use of whole genome SNP data. Specifically, we could
cluster individuals into homogeneous subsets and perform a classical
multidimensional scaling to visualize substructure and to provide quantitative
indices of population genetic variation (Purcell, Neale et al. 2007). In short,
multidimensional scaling will produce a k-dimensional representation of
population substructure and allow for control of population stratification through
the use of the k identified covariates. Also, we may test the Euclidean identical by
state (IBS) distance metric as an alternative to the standard metric of proportional
sharing, an approach identical to principal-components analysis. While this
method has not been analyzed yet, its results will permit a better evaluation of
the role of ancestry estimates, and for a comparison of results between the two
types of structure adjustments.
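
The clustering step described above can be illustrated with a minimal sketch. It assumes complete genotypes coded 0/1/2 (minor-allele counts) and a fixed target number of clusters; the function names ibs_distance and complete_linkage are hypothetical, and the sketch is not the PLINK implementation of Purcell et al., which applies additional constraints when merging clusters.

```python
def ibs_distance(g1, g2):
    """1 minus the average proportion of alleles shared identical by state.

    g1, g2: genotype vectors coded 0/1/2 with no missing calls (assumption).
    """
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return 1.0 - shared / (2.0 * len(g1))


def complete_linkage(genotypes, n_clusters):
    """Agglomerative clustering of individuals with complete linkage.

    genotypes: one genotype vector per individual.
    Returns a list of clusters, each a sorted list of individual indices.
    """
    n = len(genotypes)
    dist = [[ibs_distance(genotypes[i], genotypes[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest pairwise distance between their members.
                d = max(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]
```

The resulting cluster labels, or the coordinates from a subsequent multidimensional scaling of the same distance matrix, could then enter an association model as covariates.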
1.2 Integration of GWAS and CNV Analyses: This proposal includes several
methods designed to assess the degree of association between disease status
and both length and number of altered DNA copies. CNV analysis has become a
major and indispensable tool in discovering chromosomal regions that include
alleles susceptible for disease. Several algorithms have already been developed
for detecting the genome location and copy numbers of recombined DNA
segments. However, I propose to exploit the effect of variation in DNA alteration
by developing statistical tests that will enable us to verify whether there is an
association between disease status and type of CNV. Moreover, through the
comparison of CNV and GWAS association results we will examine if an over-
expression or under-expression of gene product exacerbates risk for AMD.
Finally, based on the observed up- or down-regulation of gene products we will
attempt to infer potential genetic pathways that distinguish AMD cases from
controls, or different stages of disease (i.e. early from late AMD).
The following chapters include background information for the LALES cohort
(Chapter 1), results from the GWAS and Population Admixture analyses
(Chapters 2 and 3), description of proposed CNV association tests (Chapter 4),
summary of results based on the CNV analysis (Chapters 4 and 5), and a
summary of the lessons learned from completing the three major analyses
(GWAS, Population Admixture, CNV) (Chapters 6 and 7).
1.3 Subject Selection for Genome-wide Analysis and Copy Number
Variation: Data for these two studies came from the LALES study, a population-
based cohort of 6,357 Latinos with an average age (s.d.) of 56.7 (11.2) years,
who resided in 6 census tracts in Los Angeles County. Based on the results of a
clinical eye examination, a subset of 303 subjects (55.12% females and 44.88%
males), divided in a 1 to 2 case:control ratio, was genotyped for this study. Self-
reported geographic origins for these 303 subjects were 67.33% Mexico, 18.81%
USA, and 13.87% from other places (Table 1.1). A standardized, detailed
assessment of AMD phenotypes was employed for this study. Cases with early
AMD were identified through masked grading of fundus photographs, by the
presence of intermediate to large soft drusen in both eyes.
Control subjects lacked drusen in either eye and were of similar age and gender
to the cases. Masked grading of fundus photographs was performed by two
graders from the Wisconsin Ocular Epidemiology Grading Center, using a
modified version of the Wisconsin Age-Related Maculopathy Grading System
(Klein et al. 1991). Discrepancies between these 2 graders were resolved by a
senior grader. If there were still disagreements between all 3 gradings, the
grading was then determined by a trained ophthalmologist. The type of drusen
was determined based on the size, uniformity of appearance, and sharpness of
edges. Soft drusen deposits were divided into two categories: distinct, defined by
sharp margins and nodular appearance, and indistinct, defined by indistinct
margins and less solid appearance. LALES cases for this study were diagnosed
with bilateral soft drusen deposits. Institutional review board approval was
obtained from the Los Angeles County/University of Southern California Medical
Center Institutional Review Board. All study procedures adhere to the principles
outlined in the Declaration of Helsinki for research involving human subjects.
Table 1.1: Age distribution and self-reported birthplace location across case-
control and gender status

Demographics          Cases         Controls      Males         Females
Age, average (S.D.)   56.4 (10.6)   56.8 (11.5)   57.8 (10.9)   55.8 (11.4)
Birthplace (%)
  Mexico              67.32         67.33         64.71         70.05
  USA                 18.81         18.81         22.05         16.16
  Other               13.87         13.86         13.24         13.79

1.4 Subject Selection for Population Admixture Study: Three datasets were
compiled for the estimation of Latino ancestry, as part of an ongoing ocular
disease study for the LALES cohort. Two currently genotyped datasets were
based on (1) 538 LALES subjects in a 1:1 case control ratio (Varma, Paz et al.
2004) and (2) on 30 Native Americans (25 Mayans and 5 Mexican Indians)
collected through Coriell’s human population blood sample (http://ccr.coriell.org/).
In addition, genotypes were provided for 355 individuals comprising Latinos from
the Los Angeles area (non LALES), as well as Asians, African Americans, and
Europeans obtained through our collaborative work with the Multi Ethnic Cohort
(MEC) of the cancer study group at University of Southern California. We give a
brief description of the three datasets below.
LALES Subjects: 538 LALES participants (268 AMD cases : 268 controls)
were genotyped for the population admixture study. Details of the LALES cohort
design are described above.
Native American Subjects: In order to establish a reference set for the NA
ancestry among Latinos, we genotyped 25 Mayan Amerindian and 5 Mexican
Indian DNA samples from Coriell’s human population repository collection
(http://ccr.coriell.org/). The Mayan samples were specifically chosen because
they represent ancient Native American civilizations that lived before the arrival
of Europeans in what nowadays are eastern and southern Mexico, El Salvador,
Guatemala, Belize, and Honduras. Since the dispersion of geographic regions for
the LALES population covers Mexico and most of Central America, the Mayan
and Mexican Indian samples overlap the birth locations for most of the LALES
cohort, with the exception of those born within the US.
MEC Subjects: The Multiethnic/Minority Cohort (MEC) study is a
prospective cohort of approximately 215,000 individuals from California (mainly
Los Angeles County) and Hawaii. This study was established between 1993 -
1996 through a collaboration of the Cancer Research Center of Hawaii/University
of Hawaii and the University of Southern California. A subset of 355 MEC
participants were incorporated in the present study, namely 70 individuals from
Centre d'Etude du Polymorphisme Humain (CEPH), 40 Europeans, 70 African
Americans, 70 Latinos from the Los Angeles area, 35 Japanese, 35 Hawaiians,
18 Shanghai Chinese, and 17 Singapore Chinese.
1.5 Genotyping: A total of 303 LALES subjects were genotyped for the GWAS
and CNV analysis through the Affymetrix platform (Santa Clara, CA). The whole
genome amplification product was used as template for the GeneChip Mapping
500K for a total of 500,568 SNPs. Restriction enzymes HindIII or XbaI were used
for digesting the template. Restriction fragments were ligated to 250-2000 bp
adapter template fragments, which were further amplified, fragmented, and
hybridized to the 500k GeneChip Array.
Population Admixture Marker Selection: A set of 233 ancestry informative
markers (AIMs) had been genotyped for the MEC individuals from European,
Japanese, Chinese, native Hawaiian, African American, and Latino populations
as part of existing studies. These markers are spread throughout the whole
genome and exhibit a substantial difference in allele frequencies across
ethnicities from different geographical regions (Rosenberg, Li et al. 2003). In
addition, AIMs are specifically chosen to lack linkage with any known human
disease candidate locus and are hence selected to test and adjust association
studies for the ethnic matching of cases and controls. Given the existence of this
data, and our desire to incorporate it within our study, we ourselves genotyped
the LALES Latino sample and the Native American collection of individuals at the
same set of AIMs.
Population Admixture Genotyping: The 538 LALES and 30 NA subjects
were mapped through the Illumina Bead 384 platform for 233 AIMs (University of
Southern California Laboratory, Los Angeles, CA). A total of 176 SNPs out of 235
had genotyped call rates > 0.98 and were included in the analysis. Samples with
an overall genotype call rate ≤ 0.8 were also removed from analysis, resulting in
a final total of 500 LALES and 30 NA individuals being included in the analysis
(250 cases, 250 controls). The MEC sample was genotyped through the
384 Illumina GeneChip (University of Southern California Laboratory, Los
Angeles, CA).
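
The call-rate filters described above can be sketched as follows. This is a minimal illustration that assumes missing calls are stored as None and that both rates are computed on the unfiltered genotype matrix; the text does not state the order of the two steps, and the function names are hypothetical.

```python
def call_rate(values):
    """Fraction of non-missing genotype calls; None marks a missing call."""
    return sum(v is not None for v in values) / len(values)


def filter_by_call_rate(genotypes, snp_min=0.98, sample_min=0.8):
    """Apply the SNP and sample call-rate thresholds quoted in the text.

    genotypes: list of samples, each a list of per-SNP calls (0/1/2 or None).
    SNPs are kept when their call rate is > snp_min; samples are removed
    when their overall call rate is <= sample_min.
    Returns (kept_sample_indices, kept_snp_indices).
    """
    n_snps = len(genotypes[0])
    keep_snps = [j for j in range(n_snps)
                 if call_rate([row[j] for row in genotypes]) > snp_min]
    keep_samples = [i for i, row in enumerate(genotypes)
                    if call_rate(row) > sample_min]
    return keep_samples, keep_snps
```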
Chapter 2
Genome-wide Identification of Genes Associated with
Early Age-Related Macular Degeneration
2.1 GWAS Overview: The Los Angeles Latino Eye Study (LALES) cohort is a
population based, cross-sectional study. The main objective of this study was to
identify new genes associated with early Age-related Macular Degeneration
(AMD) among LALES Latinos, and to characterize the biological mechanisms
through which these genetic factors exacerbate the risk of AMD. A total of 101 early AMD
cases and 202 controls with an average age (s.d.) of 56.7 (11.2) years were
ascertained through the LALES cohort and genotyped for the 500k Affymetrix
platform. Cases were identified through masked grading of fundus photographs
by the presence of bilateral soft drusen. Age- and gender-matched controls
lacked drusen in either eye. Cases and controls were frequency-matched on
self-reported geographic origin: 67.32% Mexico, 18.81% USA, and
13.87% from other places. We performed genome-wide allelic association tests
followed by haplotype trend regression (HTR), the latter with the purpose of
improving the power for identifying multi-allelic cumulative effects in regions of
high linkage disequilibrium, signals which otherwise might fail to be detected
using a single allelic test of association. We report the discovery of AMD
associations (HTR p-values = $10^{-6}$ to $10^{-8}$; HTR permuted p-values = $10^{-2}$ to $10^{-3}$)
with variants involved in oxidative stress and retinal oxidoreductase metabolism,
along with immune and inflammatory mechanisms that activate the complement
system. However, in contrast to AMD pathways that characterize Caucasians, we
identified genes implicated in neurodevelopment and brain injury repair,
processes that may constitute essential mechanisms in the biogenesis of early
AMD among Latinos. Additional genes implicate the cardiovascular system, zinc
ion binding, angiogenesis, and apoptosis. Discordant frequencies between
different ethnic groups for AMD risk alleles stress the pressing need for
determining unidentified genetic factors implicated in the pathogenesis of both
early and late AMD. Only 1.5% of early AMD cases progress to advanced AMD
among Latinos, compared to 3.4% in non-Hispanic Whites. This study provides a
unique opportunity to decipher genetic factors that are associated with early
AMD, and are attributed to the Latino population.
2.2 GWAS Methods: Genotype frequencies for all SNPs were tested for Hardy–Weinberg
equilibrium (HWE) in the control population. SNPs with $p < 2 \times 10^{-8}$ for the
$\chi^2$ HWE test were eliminated from further analyses, the majority of
them most likely due to incorrect calls. A total of 443,283 (88.60%) SNPs with
HWE p-value $> 2 \times 10^{-8}$ were kept.
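
As an illustration of the HWE filter, the following minimal sketch computes the one-degree-of-freedom Pearson chi-square test from the three genotype counts observed in controls. The function names are hypothetical, and the use of the plain chi-square approximation (rather than an exact test) follows the description above but is otherwise an assumption about the software used.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square test of Hardy-Weinberg equilibrium from genotype counts.

    Expected counts are n*p^2, 2*n*p*q, and n*q^2, where p is the observed
    frequency of allele A. Returns the p-value for 1 degree of freedom.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # For 1 degree of freedom, P(chi2 > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(stat / 2.0))


def passes_hwe(n_aa, n_ab, n_bb, threshold=2e-8):
    """Keep a SNP when its HWE p-value exceeds the threshold from the text."""
    return hwe_chi2_pvalue(n_aa, n_ab, n_bb) > threshold
```

For example, counts of (50, 100, 50) match the Hardy–Weinberg expectation exactly and are kept, whereas counts of (100, 0, 100), with no heterozygotes at all, are removed.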
Inferring Missing Values: Based on Linkage Disequilibrium (LD)
correlations we applied the expectation maximization (EM) algorithm and inferred
missing values from neighboring markers as described by Chiano and Clayton [24].
In brief, after markers were sorted by physical position, the following algorithm
steps were applied for each marker with missing values: (1) within a window of
25 markers centered about the marker of interest the 8 highest LD markers were
selected; (2) an 8-marker haplotype was used to impute the missing value; (3)
the genotype was assigned only if the imputation probability was greater than or equal
to 0.99. Approximately 15% of missing values were inferred.
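Steps (1) and (3) above can be sketched as follows. This is only an illustration under stated assumptions: LD is approximated by the squared Pearson correlation of genotype codes (0, 1, or 2 minor alleles, `None` for missing), which is not necessarily the LD measure used in the study, and the haplotype-based imputation of step (2) is not shown. All function names are hypothetical.

```python
def pearson_r2(x, y):
    """Squared Pearson correlation over individuals typed at both markers."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic marker carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    return sxy * sxy / (sxx * syy)


def select_predictors(genotypes, target, window=25, n_best=8):
    """Step (1): pick the n_best markers in highest LD with the target.

    genotypes is a list of markers sorted by physical position; each marker
    is a list of per-individual genotype codes.
    """
    half = window // 2
    lo = max(0, target - half)
    hi = min(len(genotypes), target + half + 1)
    candidates = [j for j in range(lo, hi) if j != target]
    candidates.sort(
        key=lambda j: pearson_r2(genotypes[target], genotypes[j]), reverse=True
    )
    return candidates[:n_best]


def accept_imputation(probabilities, threshold=0.99):
    """Step (3): return the most probable genotype, or None if below threshold.

    probabilities maps each candidate genotype to its imputation probability.
    """
    best = max(probabilities, key=probabilities.get)
    return best if probabilities[best] >= threshold else None
```

With the 0.99 threshold, a call such as `accept_imputation({0: 0.6, 1: 0.4})` leaves the value missing, which is consistent with only a fraction of missing genotypes being filled in.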
Tagging SNPs: In order to (1) identify a subset of SNPs that captures the
majority of variation across the genome of the LALES population, and (2)
reduce the number of multiple tests, we applied the tagging-SNP method of
Carlson [25]. We considered only those SNPs whose minor-allele frequency
was greater than 0.05. Groupings of markers in strong correlation (LD r² ≥ 0.80)
with an individual marker or set of markers (tagging markers) were then
determined. This resulted in a set of 249,352 tag-SNPs, which we used in all
subsequent Haplotype Trend Regression (HTR) analyses.
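A greedy binning procedure in the spirit of the tagging step can be sketched as follows. It is a simplified illustration, assuming pairwise r² values are already available; it is not the exact algorithm of the cited method, and the name `greedy_tag_snps` and its input layout are hypothetical.

```python
def greedy_tag_snps(r2, maf, r2_min=0.80, maf_min=0.05):
    """Greedily choose tag SNPs.

    r2  : dict mapping frozenset({snp_i, snp_j}) -> pairwise LD r^2
    maf : dict mapping snp -> minor-allele frequency
    Returns a dict mapping each tag SNP to the set of SNPs it captures.
    """
    # Only SNPs with MAF above the cutoff are considered, as in the text.
    remaining = {s for s, f in maf.items() if f > maf_min}
    tags = {}
    while remaining:
        def partners(s):
            # SNPs still unassigned that are in strong LD with s.
            return {
                t for t in remaining
                if t != s and r2.get(frozenset((s, t)), 0.0) >= r2_min
            }

        # Pick the SNP that captures the most remaining SNPs
        # (sorted() makes ties deterministic).
        best = max(sorted(remaining), key=lambda s: len(partners(s)))
        captured = partners(best) | {best}
        tags[best] = captured
        remaining -= captured
    return tags
```

Each pass removes one bin of correlated markers, so the number of tests falls from the number of eligible SNPs to the number of bins.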
Allele Association Test: A genome-wide single-allele test of association was
first applied using an additive model. This method used the Armitage test for
allelic “trend” in an ordered case/control contingency table, where the ordering
depended on the number of minor alleles in the genotype: zero, one, or two. We
also tested genome-wide allelic association under recessive and dominant
models of inheritance by computing the Pearson chi-square test. False discovery
rate (FDR) p-values are given as a correction for multiple testing. The
FDR approach proceeds as follows: given m tested null hypotheses, of which R
are rejected (positive results) and V of the rejected hypotheses are in fact true
nulls (V = number of type I errors, or false positive results), the FDR equals the
probability of making at least one rejection times the expected proportion of false
positives among all rejected hypotheses, that is,
$\mathrm{FDR} = \Pr(R > 0)\, E\left(\frac{V}{R} \mid R > 0\right)$.
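The additive trend test and an FDR adjustment can be sketched as follows. This is a minimal illustration, assuming genotype weights 0, 1, and 2 for the Cochran-Armitage statistic and assuming the Benjamini-Hochberg step-up procedure for the FDR-adjusted p-values (the text defines the FDR but does not name the procedure). Function names are hypothetical.

```python
import math


def armitage_trend_test(cases, controls, weights=(0, 1, 2)):
    """Cochran-Armitage trend test on a 2 x 3 genotype table.

    cases / controls: counts of individuals carrying 0, 1, 2 minor alleles.
    Returns (chi-square statistic with 1 df, p-value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # column totals
    T = sum(t * (r * S - s * R) for t, r, s in zip(weights, cases, controls))
    bracket = sum(t * t * ni * (N - ni) for t, ni in zip(weights, n))
    for i in range(len(n)):
        for j in range(i + 1, len(n)):
            bracket -= 2 * weights[i] * weights[j] * n[i] * n[j]
    variance = R * S / N * bracket
    stat = T * T / variance
    # Chi-square (1 df) survival function.
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (assumed FDR procedure)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

With several hundred thousand tests, adjusted values near 1 are expected for all but the smallest raw p-values, which matches the pattern of FDR columns dominated by values close to 1.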
Haplotype Trend Regression: To identify chromosomal regions that harbor
sets of SNPs whose cumulative contributions affect the phenotype, but which
might otherwise escape detection through a single-allele association test, we
employed the HTR method developed by Zaykin, Westfall et al. (2002). HTR fits
an additive model of haplotype effects and tests for association of haplotype
frequencies with a discrete or a continuous phenotype by calculating how much
the odds of having the haplotype in both chromosomes are changed if the given
covariate changes by one unit (from 0 for controls to 1 for cases). In brief, HTR
estimates overall haplotype frequencies based on a series of marker genotypes,
applies the EM algorithm to compute haplotype probabilities for each
observation, and fits a linear regression of the response on a regression matrix
composed of the individuals' haplotype probabilities (Excoffier and Slatkin 1995).
A moving-window approach was used, with the number of SNPs in each window
pre-specified by the user (we explored a range of window sizes). For more details
on this method, please refer to the Appendix.
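The EM step that HTR relies on can be illustrated in its simplest form. The sketch below assumes a window of only two biallelic SNPs, where the double heterozygote is the only genotype with ambiguous phase; the windows analyzed in this chapter contain four or more SNPs, and the regression step is not shown. The function name is hypothetical.

```python
def em_two_snp_haplotypes(genotypes, n_iter=100, tol=1e-10):
    """EM estimate of two-SNP haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), each the minor-allele count (0, 1, 2).
    Returns frequencies of haplotypes [00, 01, 10, 11], where each digit is
    the allele (0 = major, 1 = minor) at SNP 1 and SNP 2.
    """
    alleles = {0: (0, 0), 1: (0, 1), 2: (1, 1)}
    base = [0.0] * 4  # haplotype counts with unambiguous phase
    n_double_het = 0
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1  # phase unknown: 00/11 or 01/10
            continue
        for a1, a2 in zip(alleles[g1], alleles[g2]):
            base[2 * a1 + a2] += 1
    total = 2 * len(genotypes)
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote carries 00/11.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        w = cis / (cis + trans) if cis + trans > 0 else 0.5
        # M-step: re-estimate frequencies from expected haplotype counts.
        counts = [
            base[0] + n_double_het * w,
            base[1] + n_double_het * (1 - w),
            base[2] + n_double_het * (1 - w),
            base[3] + n_double_het * w,
        ]
        new = [c / total for c in counts]
        converged = max(abs(a - b) for a, b in zip(new, freqs)) < tol
        freqs = new
        if converged:
            break
    return freqs
```

In a full HTR analysis, the per-individual haplotype probabilities implied by these frequencies would form the columns of the regression matrix.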
Permutation Tests: As is increasingly common, we used permutation tests
to assess the significance of HTR results (Herrera, Didishvili et al. 2004; Weiss,
Veenstra-Vanderweele et al. 2004; Bahlo, Stankovich et al. 2006; Li,
Atmaca-Sonmez et al. 2006; Dinu, Miller et al. 2007). In brief, we created 1000
data sets by permuting the phenotypes in our observed data. Each of these was
analyzed in a manner identical to the HTR performed on the original, unpermuted
data. The approach is extremely computationally intensive, so calculation of
genome-wide p-values was impractical. Instead, we calculated the
chromosome-wide p-value for a given haplotype as the proportion of permuted
replicates in which the smallest p-value across all haplotypes on that
chromosome was smaller than the p-value observed for that haplotype in the
original data.
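The chromosome-wide permutation p-value defined above can be sketched as follows. This is a minimal illustration: the helper names are hypothetical, and the per-replicate HTR analysis that produces each minimum p-value is assumed to have been run separately.

```python
import random


def permute_phenotypes(phenotypes, n_replicates=1000, seed=0):
    """Yield phenotype vectors with case/control labels shuffled."""
    rng = random.Random(seed)
    for _ in range(n_replicates):
        shuffled = list(phenotypes)
        rng.shuffle(shuffled)
        yield shuffled


def chromosome_wide_pvalue(observed_p, permuted_min_pvalues):
    """Proportion of replicates whose smallest p-value on the chromosome
    is smaller than the p-value observed for the haplotype of interest."""
    hits = sum(1 for p_min in permuted_min_pvalues if p_min < observed_p)
    return hits / len(permuted_min_pvalues)
```

With 1000 replicates the smallest nonzero value this estimator can return is 0.001, so the permutation p-values are resolved only to three decimal places.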
2.3 GWAS Results: The majority of genes for which we identified
AMD-associated SNPs fell into one of the following functional categories: visual
impairment, immune and inflammatory mechanisms, oxidative stress, CNS glial
cell activities, cardiovascular mechanisms, zinc ion binding, apoptosis, and
cancer. These genes, along with LD and haplotype findings, are diagrammed in
the gene network overview of Figures 2.1 and 2.2.
Genome-wide Allele Test of Association: We initially performed
genome-wide allele tests of association under the additive, recessive, and
dominant models of inheritance. Comparison of results from the recessive and
dominant analyses revealed that the recessive model is almost always a better
fit. Consequently, we present results only for the additive and recessive models
here. The 40 strongest signals are listed in Tables 2.1 and 2.2 for the additive
model, and in Tables 2.3 and 2.4 for the recessive model.
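The quantities reported for the recessive model (chi-square p-value, odds ratio, and confidence bounds) can be sketched as follows. This is an illustration under stated assumptions: the genotypes are collapsed to a 2 x 2 table (two minor alleles versus fewer), and the confidence bounds use a 95% Wald interval on the log odds ratio, which the text does not specify. The function name is hypothetical.

```python
import math


def recessive_test(cases, controls, z=1.96):
    """Recessive-model association from genotype counts.

    cases / controls: counts of individuals with 0, 1, 2 minor alleles.
    Returns (chi-square p-value, odds ratio, lower bound, upper bound).
    All four cells of the collapsed table must be nonzero.
    """
    a, b = cases[2], cases[0] + cases[1]        # cases: homozygous minor / other
    c, d = controls[2], controls[0] + controls[1]
    n = a + b + c + d
    # Pearson chi-square for a 2 x 2 table (1 df, no continuity correction).
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return p_value, odds_ratio, lower, upper
```

When few individuals are homozygous for the minor allele, the standard error is dominated by the smallest cell, which is why large odds ratios in such tables tend to come with very wide intervals.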
Figure 2.1: AMD gene network and molecular functions based on the LALES genome-wide allelic test and HTR analyses
Figure 2.2: SNPs in Linkage Disequilibrium for VLDLR and KCNV2
Table 2.1: Genome-wide Allele Association Test Results on the Whole Set of
SNPs for the Additive Model
CHR  Armitage P Value  Armitage FDR  Physical Position  Cytoband  SNP RS ID  Associated Gene
1  4.62E-05  8.10E-01  235,796,792  q43  rs2779401  RYR2
1  6.11E-05  7.88E-01  214,039,418  q41  rs4363405  USH2A*
3  1.64E-05  6.00E-01  117,051,004  q13.31  rs3772969  LSAMP
4  3.87E-05  8.48E-01  188,712,663  q35.2  rs6849870  FAT
5  3.84E-05  9.37E-01  5,766,635  p15.32  rs1560073  FLJ33360*
5  4.37E-05  8.73E-01  148,387,901  q33.1  rs1432794  SH3TC2
5  9.27E-05  9.25E-01  121,154,351  q23.1  rs1431950  FTMT
6  1.95E-06  8.56E-01  135,077,065  q23.2  rs9385695  ALDH8A1*
6  2.76E-06  6.05E-01  135,093,100  q23.2  rs9389212  ALDH8A1*
6  1.41E-05  6.88E-01  135,094,938  q23.2  rs4896089  ALDH8A1*
6  9.82E-05  9.16E-01  52,596,759  p12.2  rs9474260  TRAM2*
7  1.90E-05  5.55E-01  77,505,244  q21.11  rs12668947  MAGI2*
7  4.68E-05  7.61E-01  103,212,404  q22.1  ?  RELN*
7  4.83E-05  7.31E-01  2,770,036  p22.2  rs798485  GNA12*
8  7.60E-06  1.00E+00  77,943,140  q21.11  rs7822671  ZFHX4*
8  1.74E-05  5.88E-01  77,944,279  q21.11  rs6997459  PXMP3
8  8.71E-05  9.10E-01  119,276,210  q24.12  rs10955863  EXT1*
9  8.29E-06  9.09E-01  2,899,111  p24.2  rs10511437  RFX3
9  3.42E-05  9.39E-01  128,588,878  q33.3  rs10819234  ZBTB43*
9  5.67E-05  7.54E-01  14,002,182  p23  rs7848012  NFIB*
9  6.78E-05  7.83E-01  111,609,173  q31.3  rs10759366  PALM2
9  8.78E-05  8.96E-01  111,617,299  q31.3  rs10980064  PALM2*
10  1.16E-05  8.47E-01  82,738,032  q23.1  rs11187959  SH2D4B*
10  1.41E-05  7.74E-01  126,060,139  q26.13  rs12218361  OAT
10  3.86E-05  8.92E-01  82,739,125  q23.1  rs7919358  SH2D4B*
11  1.19E-05  7.45E-01  82,240,805  q14.1  rs7106307  PRCP*
11  1.82E-05  5.71E-01  82,274,542  q14.1  rs7928693  PRCP*
11  4.67E-05  7.88E-01  82,304,560  q14.1  rs11821336  FLJ25416*
11  6.63E-05  7.86E-01  123,493,541  q24.1  rs2276049  LOH11CR2A*
12  4.39E-05  8.38E-01  85,593,602  q21.32  rs2162673  MGAT4C*
12  4.58E-05  8.38E-01  2,518,202  p13.33  rs11062261  CACNA1C*
14  1.53E-05  6.72E-01  24,494,533  q12  rs2146916  STXBP6*
14  5.47E-05  7.75E-01  24,342,766  q12  rs9323560  STXBP6*
15  4.36E-05  9.11E-01  51,484,541  q21.3  rs1021746  ONECUT1*
17  4.79E-05  7.50E-01  51,055,766  q22  rs9891412  TMEM100
18  5.57E-05  7.64E-01  49,311,296  q21.2  rs2270954  MBD2*
18  8.53E-05  9.35E-01  57,509,359  q21.33  rs7242701  RNF152
20  7.27E-05  8.18E-01  54,248,065  q13.2  rs6069670  MC3R
22  6.47E-05  7.89E-01  29,893,555  q12.2  rs5994376  RNF185*
22  9.39E-05  8.96E-01  15,733,947  q11.1  rs738043  XKR3*
Note. * = Markers common in both the tagged set and in the whole set of SNPs;
CHR = Chromosome; FDR = False Discovery Rate; SNP RS ID = Single
Nucleotide Polymorphism Reference SNP (rs) Identification Number.
Table 2.2: Genome-wide Allele Association Test Results on the Tagged Set of
SNPs for the Additive Model
CHR  Armitage P Value  Armitage FDR  Physical Position  Cytoband  SNP RS ID  Associated Gene
1  6.11E-05  6.09E-01  214,039,418  q41  rs4363405  USH2A*
1  1.24E-04  7.94E-01  18,077,419  p36.13  rs622689  IGSF21
4  1.50E-04  8.69E-01  111,628,167  q25  rs1814951  ENPEP
5  3.84E-05  7.37E-01  5,766,635  p15.32  rs1560073  FLJ33360*
5  9.27E-05  7.23E-01  121,154,351  q23.1  rs1431950  FTMT
6  1.95E-06  4.87E-01  135,077,065  q23.2  rs9385695  ALDH8A1*
6  2.76E-06  3.44E-01  135,093,100  q23.2  rs9389212  ALDH8A1*
6  1.41E-05  5.03E-01  135,094,938  q23.2  rs4896089  ALDH8A1*
6  9.82E-05  6.99E-01  52,596,759  p12.2  rs9474260  TRAM2*
6  1.06E-04  7.36E-01  135,115,756  q23.2  rs4896099  ALDH8A1
7  1.90E-05  4.73E-01  77,505,244  q21.11  rs12668947  MAGI2*
7  4.68E-05  6.14E-01  103,212,404  q22.1  ?  RELN*
7  4.83E-05  5.73E-01  2,770,036  p22.2  rs798485  GNA12*
8  7.60E-06  6.32E-01  77,943,140  q21.11  rs7822671  ZFHX4*
8  8.71E-05  7.24E-01  119,276,210  q24.12  rs10955863  EXT1*
8  1.60E-04  9.06E-01  41,749,562  p11.21  rs4737009  ANK1
9  3.42E-05  7.76E-01  128,588,878  q33.3  rs10819234  ZBTB43*
9  8.78E-05  7.06E-01  111,617,299  q31.3  rs10980064  PALM2*
9  1.39E-04  8.46E-01  13,998,652  p23  rs7859915  NFIB*
10  1.16E-05  7.22E-01  82,738,032  q23.1  rs11187959  SH2D4B*
10  1.41E-05  5.87E-01  126,060,139  q26.13  rs12218361  OAT*
10  3.86E-05  6.88E-01  82,739,125  q23.1  rs7919358  SH2D4B*
11  1.19E-05  5.93E-01  82,240,805  q14.1  rs7106307  PRCP*
11  1.82E-05  5.05E-01  82,274,542  q14.1  rs7928693  PRCP*
11  4.67E-05  6.46E-01  82,304,560  q14.1  rs11821336  FLJ25416*
11  6.63E-05  6.12E-01  123,493,541  q24.1  rs2276049  LOH11CR2A*
12  4.39E-05  6.84E-01  85,593,602  q21.32  rs2162673  MGAT4C*
12  4.58E-05  6.72E-01  2,518,202  p13.33  rs11062261  CACNA1C*
14  1.53E-05  4.77E-01  24,494,533  q12  rs2146916  STXBP6*
14  5.47E-05  5.94E-01  24,342,766  q12  rs9323560  STXBP6*
14  1.65E-04  8.93E-01  24,493,787  q12  rs7157671  STXBP6
15  4.36E-05  7.24E-01  51,484,541  q21.3  rs1021746  ONECUT1*
15  1.11E-04  7.47E-01  55,531,372  q21.3  rs2271406  CGNL1
17  4.79E-05  5.97E-01  51,055,766  q22  rs9891412  TMEM100*
17  1.68E-04  8.94E-01  51,090,120  q22  rs4794607  TMEM100
18  5.57E-05  5.79E-01  49,311,296  q21.2  rs2270954  MBD2
18  1.36E-04  8.45E-01  49,289,928  q21.2  rs4939732  DCC
20  7.27E-05  6.47E-01  54,248,065  q13.2  rs6069670  MC3R
22  6.47E-05  6.20E-01  29,893,555  q12.2  rs5994376  RNF185*
22  9.39E-05  6.89E-01  15,733,947  q11.1  rs738043  XKR3*
Note. * = Markers common in both the tagged set and in the whole set of SNPs;
CHR = Chromosome; FDR = False Discovery Rate; SNP RS ID = Single
Nucleotide Polymorphism Reference SNP (rs) Identification Number.
Table 2.3: Genome-wide Allele Association Test Results on the Whole Set of SNPs for the Recessive Model
CHR  Chi-Sq. P Value  Chi-Sq. FDR  Odds Ratio  Odds LCB  Odds UCB  Physical Position  Cytoband  SNP RS ID  Associated Gene
1  1.09E-07  4.80E-02  20.16  4.52  89.82  175,121,140  q25.2  rs4652201  ASTN1*
1  2.53E-07  5.55E-02  18.82  4.23  83.67  175,099,236  q25.2  rs6425398  ASTN1*
1  2.73E-07  3.99E-02  18.73  4.21  83.25  175,113,685  q25.2  rs4652199  ASTN1
1  6.53E-05  1.00E+00  3.2  1.79  5.73  194,182,106  q31.3  rs5024503  KCNT2*
1  1.75E-04  1.00E+00  18.8  2.34  150.97  84,743,535  p22.3  rs3813605  GNG5*
1  2.33E-04  1.00E+00  0.21  0.09  0.52  65,140,630  p31.3  rs12743599  JAK1*
1  2.36E-04  1.00E+00  8.11  2.21  29.77  175,151,790  q25.2  rs4652206  ASTN1*
2  4.25E-05  1.00E+00  13.92  2.91  66.5  28,645,326  p23.2  rs4666096  PLB1*
2  4.61E-05  1.00E+00  3.08  1.77  5.37  169,865,126  q31.1  rs830964  LRP2*
2  6.72E-05  1.00E+00  12.1  2.63  55.72  31,350,182  p23.1  rs644736  EHD3*
2  2.10E-04  1.00E+00  2.77  1.6  4.8  204,491,827  q33.2  rs7596727  CTLA4*
3  2.16E-04  1.00E+00  4.4  1.91  10.15  212,075  p26.3  rs1516340  CHL1*
4  3.87E-05  1.00E+00  21.98  2.77  174.27  154,873,314  q31.3  rs2606328  RNF175
4  1.12E-04  1.00E+00  3.83  1.88  7.82  113,509,847  q25  rs1859142  ALPK1*
4  2.64E-04  1.00E+00  6.79  2.12  21.73  102,202,630  q23  rs6851231  PPP3CA*
5  8.48E-05  1.00E+00  3.11  1.74  5.56  125,421,412  q23.2  rs3849067  GRAMD3*
5  1.44E-04  1.00E+00  3.75  1.84  7.66  151,179,395  q33.1  rs2915891  G3BP1
5  1.89E-04  1.00E+00  3.11  1.68  5.74  115,298,766  q23.1  rs890785  ATG12
5  2.30E-04  1.00E+00  2.91  1.63  5.2  129,175,651  q23.3  rs1963407  CSS3
5  2.70E-04  1.00E+00  17.8  2.19  144.45  31,515,362  p13.3  rs9292428  RNASEN
6  1.59E-05  1.00E+00  5.76  2.41  13.74  135,093,100  q23.2  rs9389212  ALDH8A1*
6  1.36E-04  1.00E+00  3.54  1.8  6.96  157,876,496  q25.3  rs9457715  ZDHHC14*
6  1.60E-04  1.00E+00  0.24  0.11  0.52  138,037,827  q23.3  rs651973  OLIG3*
6  2.01E-04  1.00E+00  6.06  2.11  17.41  135,077,065  q23.2  rs9385695  ALDH8A1*
7  4.17E-05  1.00E+00  6.93  2.44  19.7  114,142,696  q31.1  rs4730641  FOXP2*
7  7.54E-05  1.00E+00  12.18  2.63  56.4  138,982,540  q34  rs1464798  HIPK2
7  8.94E-05  1.00E+00  5.93  2.22  15.88  114,147,908  q31.1  rs4730643  FOXP2
7  2.49E-04  1.00E+00  6.79  2.13  21.65  2,859,330  p22.2  rs1636255  CARD11
7  2.72E-04  1.00E+00  4.62  1.9  11.21  114,170,144  q31.1  rs684253  MDFIC
8  8.76E-06  7.69E-01  10.85  3.04  38.77  88,994,844  q21.3  rs7839686  WDR21C*
8  4.39E-05  1.00E+00  5.72  2.28  14.33  41,749,562  p11.21  rs4737009  ANK1*
8  8.39E-05  1.00E+00  3.3  1.79  6.1  15,805,264  p22  rs6994573  TUSC3*
9  8.96E-05  1.00E+00  3.91  1.91  7.98  318,006  p24.3  rs2296828  DOCK8*
9  9.14E-05  1.00E+00  3.78  1.88  7.6  19,647,975  p22.1  rs10738546  SLC24A2*
9  1.01E-04  1.00E+00  4.58  2.02  10.38  29,030,151  p21.1  rs12003345  LRRN6C
9  2.57E-04  1.00E+00  8.05  2.19  29.58  111,685,289  q31.3  rs12237174  PALM2*
10  3.73E-06  4.09E-01  3.59  2.05  6.26  82,739,125  q23.1  rs7919358  SH2D4B*
10  9.30E-05  1.00E+00  8.85  2.44  32.15  79,546,360  q22.3  rs2253909  RPS24*
10  1.58E-04  1.00E+00  3.99  1.87  8.5  76,596,709  q22.2  rs2657291  SAMD8
10  2.57E-04  1.00E+00  2.96  1.63  5.38  78,905,202  q22.3  rs42311  KCNMA1*
11  5.93E-05  1.00E+00  3.34  1.82  6.13  123,127,296  q24.1  rs11219274  ZNF202
11  2.24E-04  1.00E+00  6.03  2.08  17.43  13,530,401  p15.2  rs10500784  PTH*
12  2.41E-04  1.00E+00  3.14  1.67  5.91  98,623,660  q23.1  rs1500662  ANKS1B*
13  2.76E-04  1.00E+00  4.95  1.94  12.59  83,699,592  q31.1  rs4303371  SLITRK1*
14  9.54E-05  1.00E+00  4.16  1.95  8.87  70,671,979  q24.2  rs221901  PCNX
14  1.15E-04  1.00E+00  5.75  2.16  15.35  24,196,182  q12  rs7154849  GZMB*
14  1.59E-04  1.00E+00  5.14  2.04  12.94  75,845,696  q24.3  rs7155312  ESRRB
14  2.19E-04  1.00E+00  18.37  2.26  149.3  38,546,258  q21.1  rs8014021  SEC23A*
16  1.28E-04  1.00E+00  4.43  1.97  9.99  8,100,368  p13.2  rs9635544  C16orf68*
19  9.08E-05  1.00E+00  6.51  2.27  18.66  57,280,344  q13.33  rs16983430  ZNF616
Note. * = Markers common in both the tagged set and in the whole set of SNPs; CHR = Chromosome;
Chi-Sq. = Chi-Square; FDR = False Discovery Rate; LCB = Lower Confidence Bound; UCB = Upper Confidence Bound;
SNP RS ID = Single Nucleotide Polymorphism Reference SNP (rs) Identification Number.
Table 2.4: Genome-wide Allele Association Test Results on the Tagged Set of SNPs for the Recessive Model
CHR  Chi-Sq. P Value  Chi-Sq. FDR  Odds Ratio  Odds LCB  Odds UCB  Physical Position  Cytoband  SNP RS ID  Associated Gene
1  1.09E-07  2.73E-02  20.16  4.52  89.82  175,121,140  q25.2  rs4652201  ASTN1*
1  2.53E-07  3.15E-02  18.82  4.23  83.67  175,099,236  q25.2  rs6425398  ASTN1*
1  6.53E-05  1.00E+00  3.2  1.79  5.73  194,182,106  q31.3  rs5024503  KCNT2*
1  1.75E-04  1.00E+00  18.8  2.34  150.97  84,743,535  p22.3  rs3813605  GNG5*
1  2.33E-04  1.00E+00  0.21  0.09  0.52  65,140,630  p31.3  rs12743599  JAK1*
1  2.36E-04  1.00E+00  8.11  2.21  29.77  175,151,790  q25.2  rs4652206  ASTN1*
1  2.76E-04  1.00E+00  4.16  1.84  9.42  175,039,581  q25.2  rs1555070  PAPPA2
1  3.84E-04  1.00E+00  5.19  1.93  14  189,752,293  q31.2  rs609733  RGS18
2  4.25E-05  1.00E+00  13.92  2.91  66.5  28,645,326  p23.2  rs4666096  PLB1*
2  4.61E-05  1.00E+00  3.08  1.77  5.37  169,865,126  q31.1  rs830964  LRP2*
2  6.72E-05  1.00E+00  12.1  2.63  55.72  31,350,182  p23.1  rs644736  EHD3*
2  2.10E-04  1.00E+00  2.77  1.6  4.8  204,491,827  q33.2  rs7596727  CTLA4*
2  2.78E-04  1.00E+00  3.53  1.74  7.13  206,304,908  q33.3  rs1983343  NRP2
3  2.16E-04  1.00E+00  4.4  1.91  10.15  212,075  p26.3  rs1516340  CHL1*
3  3.06E-04  1.00E+00  5.83  2.02  16.85  4,717,779  p26.2  rs6774180  ITPR1
3  3.75E-04  1.00E+00  5.7  1.97  16.49  133,157,408  q22.1  rs1870712  CPNE4
4  1.12E-04  1.00E+00  3.83  1.88  7.82  113,509,847  q25  rs1859142  ALPK1*
4  2.64E-04  1.00E+00  6.79  2.12  21.73  102,202,630  q23  rs6851231  PPP3CA*
5  8.48E-05  1.00E+00  3.11  1.74  5.56  125,421,412  q23.2  rs3849067  GRAMD3*
5  1.89E-04  1.00E+00  3.11  1.68  5.74  115,298,766  q23.1  rs890785  ATG12*
6  1.59E-05  7.95E-01  5.76  2.41  13.74  135,093,100  q23.2  rs9389212  ALDH8A1*
6  1.36E-04  1.00E+00  3.54  1.8  6.96  157,876,496  q25.3  rs9457715  ZDHHC14*
6  1.60E-04  1.00E+00  0.24  0.11  0.52  138,037,827  q23.3  rs651973  OLIG3*
6  2.01E-04  1.00E+00  6.06  2.11  17.41  135,077,065  q23.2  rs9385695  ALDH8A1*
6  3.65E-04  1.00E+00  6.68  2.06  21.69  135,094,938  q23.2  rs4896089  ALDH8A1
7  7.54E-05  1.00E+00  12.18  2.63  56.4  138,982,540  q34  rs1464798  HIPK2
7  2.49E-04  1.00E+00  6.79  2.13  21.65  2,859,330  p22.2  rs1636255  CARD11
7  3.35E-04  1.00E+00  4.83  1.9  12.28  114,161,587  q31.1  rs983483  FOXP2
8  8.76E-06  5.46E-01  10.85  3.04  38.77  88,994,844  q21.3  rs7839686  WDR21C*
8  4.39E-05  1.00E+00  5.72  2.28  14.33  41,749,562  p11.21  rs4737009  ANK1*
8  8.39E-05  1.00E+00  3.3  1.79  6.1  15,805,264  p22  rs6994573  TUSC3*
8  3.72E-04  1.00E+00  17.03  2.1  138.18  131,741,750  q24.22  rs4645574  DDEF1
9  8.96E-05  1.00E+00  3.91  1.91  7.98  318,006  p24.3  rs2296828  DOCK8*
9  9.14E-05  1.00E+00  3.78  1.88  7.6  19,647,975  p22.1  rs10738546  SLC24A2*
9  2.57E-04  1.00E+00  8.05  2.19  29.58  111,685,289  q31.3  rs12237174  PALM2*
10  3.73E-06  3.10E-01  3.59  2.05  6.26  82,739,125  q23.1  rs7919358  SH2D4B*
10  9.30E-05  1.00E+00  8.85  2.44  32.15  79,546,360  q22.3  rs2253909  RPS24*
10  2.57E-04  1.00E+00  2.96  1.63  5.38  78,905,202  q22.3  rs42311  KCNMA1*
10  3.53E-04  1.00E+00  10.3  2.18  48.72  98,845,455  q24.1  rs17112401  SLIT1
11  2.24E-04  1.00E+00  6.03  2.08  17.43  13,530,401  p15.2  rs10500784  PTH*
12  2.41E-04  1.00E+00  3.14  1.67  5.91  98,623,660  q23.1  rs1500662  ANKS1B*
13  2.76E-04  1.00E+00  4.95  1.94  12.59  83,699,592  q31.1  rs4303371  SLITRK1*
14  1.15E-04  1.00E+00  5.75  2.16  15.35  24,196,182  q12  rs7154849  GZMB*
14  2.19E-04  1.00E+00  18.37  2.26  149.3  38,546,258  q21.1  rs8014021  SEC23A*
14  3.69E-04  1.00E+00  3.17  1.64  6.09  31,413,712  q12  rs2151783  NUBPL
16  1.28E-04  1.00E+00  4.43  1.97  9.99  8,100,368  p13.2  rs9635544  C16orf68*
16  3.04E-04  1.00E+00  4.57  1.88  11.09  79,809,347  q23.2  rs7199144  PKD1L2
17  2.86E-04  1.00E+00  5.32  1.98  14.3  51,090,120  q22  rs4794607  TMEM100
19  4.01E-04  1.00E+00  2.79  1.57  4.97  16,050,243  p13.12  rs2278006  TPM4
20  3.07E-04  1.00E+00  7.97  2.16  29.4  22,712,291  p11.21  rs2424473  SSTR4
Note. * = Markers common in both the tagged set and in the whole set of SNPs; CHR = Chromosome;
Chi-Sq. = Chi-Square; FDR = False Discovery Rate; SNP RS ID = Single Nucleotide Polymorphism
Reference SNP (rs) Identification Number; LCB = Lower Confidence Bound; UCB = Upper Confidence Bound.
Haplotype Trend Regression (HTR): We employed the HTR approach for
moving windows of varying sizes, starting with a minimum window of 4 SNPs and
then, when associations were found, subsequently increasing the number of
SNPs per window until statistical significance was lost. The 40 most significant
haplotypes are summarized in Table 2.5. Ten of these haplotypes are contained
within genes that themselves contained one or more SNPs ranked in the top 50
genome-wide allele test signals. These genes are: ALDH8A1, ASTN1, DCC,
GALNTL4, PAPPA2, PRCP, SH2D4B, SMARCA2, TUSC1, and VLDLR. Based
on these results we summarize in the following paragraphs the main genes in
which we detect a signal and for which potential defective mechanisms might
interfere with AMD-related pathways and/or interact with biological risk factors
predisposing for development of AMD.
Table 2.5: Haplotype Trend Regression Results
CHR  Associated Gene(s)  Haplotype SNP RS ID Variables  Cytoband  P Value  Permutation P Value
1  PAPPA2 - ASTN1  rs10913259 – rs16850236 – rs17378432 – rs1883432 – rs6425398 – rs17313104 – rs4652201 – rs4652202  1q25.2  2.02E-08  3.00E-03
1  ASTN1  rs17313104 – rs4652201 – rs4652202 – rs6664879 – rs9659386 – rs6664184 – rs10166122 – rs11587780  1q25.2  1.21E-07  7.00E-03
1  PAPPA2 - ASTN1  rs6656137 – rs10913259 – rs16850236 – rs17378432 – rs1883432 – rs6425398 – rs17313104 – rs4652201  1q25.2  2.30E-07  4.00E-03
1  ASTN1  rs6425398 – rs17313104 – rs4652201 – rs4652202 – rs6664879 – rs9659386 – rs6664184 – rs10166122  1q25.2  6.25E-07  7.00E-03
5  PRDM9  rs10051288 – rs5009031 – rs2973628 – rs1366396 – rs2914263 – rs2914255  5p14.2  1.19E-07  1.00E-03
5  AYTL2  rs27052 – rs16878933 – rs11952520 – rs31491 – rs2962064 – rs7700579  5p15.33  9.81E-07  2.0E-02
5  ZNF608  rs1472607 – rs11949944 – rs2244177 – rs179354 – rs11957402 – rs330433 – rs30743  5q23.2  1.30E-06  2.0E-03
5  SEMA5A  rs1205718 – rs40654 – rs183733 – rs805153 – rs1805927 – rs390322 – rs380206 – rs415024  5p15.2  1.86E-06  4.0E-02
5  TGFBI - LECT2 - FBXL21  rs30743 – rs4976360 – rs6882087 – rs17740150 – rs11747904 – rs2282790  5q31.1  2.14E-06  4.0E-02
6  KCNQ5  rs6910780 – rs3005765 – rs11960994 – rs4707991 – rs9351945 – rs4707995 – rs4615346 – rs9360603  6q13  6.047E-08  2.00E-03
6  PRIM2A  rs1119375 – rs6932152 – rs6914118 – rs7738502 – rs6914118 – rs7738502 – rs6919074 – rs1855410 – rs4541746  6p11.2  1.37E-07  5.00E-03
6  SGK - ALDH8A1  rs9321467 – rs6905115 – rs7765392 – rs46170076 – rs6569967 – rs17705152 – rs9385695  6q23.2  5.61E-07  7.00E-03
6  SGK  rs1474828 – rs9321467 – rs7765392 – rs1407107 – rs9385695  6q23.2  3.00E-07  5.00E-03
6  HACE1  rs4518532 – rs7765474 – rs9404463 – rs4476862 – rs9322774  6q16.3  1.95E-06  3.00E-02
9  C9orf71 - PIP5K1B - FAM122A  rs1411992 – rs10869043 – rs4213482 – rs11142941 – rs869950 – rs9644996 – rs7046236 – ? – rs17388292 – rs2183738  9q13  1.32E-08  1.00E-03
9  FAM125B  rs4838341 – rs10987167 – rs4838350 – rs882061 – rs10819130 – rs888228 – rs888225 – rs10760419  9q33.3  1.88E-07  3.00E-03
9  PALM2-AKAP2  rs10816909 – rs4978411 – rs12000943 – rs10739287 – rs10816913 – rs10816913 – rs7864336 – rs10816919 – rs260193  9q31.3  1.90E-07  3.00E-03
9  MTAP  rs7860126 – rs12004745 – rs10811624 – rs2039971 – rs7851125 – rs7027989 – rs3928894 – rs3931609  9p21.3  2.66E-08  1.00E-03
9  ASB6 - C9orf32 - C9orf50  rs10819541 – rs1220688 – rs7035800 – rs1231463 – rs4836660 – rs10988439 – rs2542242 – rs10988449  9q34.11  7.71E-08  2.00E-03
9  ALDH1B1  rs10118416 – rs4271056 – rs10973720 – rs4878191 – rs10814670 – rs7044431 – rs10973734 – rs1885491  9p13.1  4.66E-07  6.00E-03
9  ABCA1  rs3098264 – rs1809924 – rs1809921 – rs3132115 – rs2102121 – rs10991509 – rs10991522 – rs2937353  9q31.1  1.56E-06  2.00E-02
9  TUSC1  rs932983 – rs1318664 – rs1758727 – rs7856152 – rs4083606 – rs17176454 – rs10757522 – ? (snp1996307)  9p21.3  1.84E-06  3.00E-02
9  SMARCA2 - VLDLR  rs11789700 – rs13298799 – 13287382 – rs16938404 – rs10125660 – rs1535839 – rs1324202 – rs10448165  9p24.2  8.55E-07  1.00E-02
9  SMARCA2  rs16937675 – rs7040968 – rs10811539 – 1340186 – rs3793504 – rs7032394 – rs10491695 – rs3793511  9p24.3  2.59E-06  4.00E-02
10  ABCC2  rs17216282 – rs3740065 – rs7067971 – 2256700 – rs2256678  10q24.2  9.01E-07  2.00E-02
10  ABCC2  rs3740065 – rs7067971 – 2256700 – rs2256678 – rs2251478  10q24.2  1.13E-06  3.00E-02
10  PITRM1 - KLF6  rs7087863 – rs2814913 – rs7088720 – rs10795014 – rs2814925  10q23.1  1.30E-06  3.00E-02
10  SH2D4B  rs11187959 – rs7919358 – rs7912838 – rs7916201 – rs7097203  10q23.1  1.47E-06  3.00E-02
11  PRCP  rs7106307 – rs3793981 – rs11233352 – rs11233357  11p15.3  1.73E-06  4.00E-02
18  DCC  rs4940260 – rs4939732 – rs10502979 – rs8088048  18q21.2  4.63E-07  7.00E-03
18  SLC14A1  rs2298718 – rs11082467 – rs9955503 – rs474270  18q12.3  2.54E-06  3.00E-02
20  TMEPAI  rs191269 – rs4811944 – rs326822 – rs1334117  20q13.32  6.92E-08  1.00E-03
20  RSPO4  rs542451 – rs2223961 – rs2207323 – rs6086704  20p13  9.77E-07  6.00E-03
20  TMEPAI  rs7269897 – rs157087 – rs157092 – rs157093  20q13.31  2.01E-06  1.00E-03
22  FLJ44385  rs737356 – rs714007 – rs713905 – rs11705496  9q13.32  1.75E-06  6.00E-03
22  FLJ44385  rs9616300 – rs737356 – rs714007 – rs713905  9q13.32  3.43E-06  1.00E-02
22  FAM19A5  rs9615820 – rs738187 – rs713762 – 469999  22q13.31  3.87E-06  2.00E-02
X  MID1IP1  rs198781 – rs199895 – rs6520536 – rs199872 – rs199867 – rs198781  Xp11.14  1.04E-06  1.00E-02
X  SH2D1A  rs6649176 – rs6648583 – rs5956633 – rs1391622 – rs6608182 – rs5958445  Xq25  5.04E-06  4.00E-02
Note. These results are based on the HTR analysis for the tagged set of SNPs; CHR = Chromosome;
SNP RS ID = Single Nucleotide Polymorphism Reference SNP (rs) Identification Number.
Chromosome 1. Neural glial cells respond to amyloid beta (Aβ)
proteins, oligomers generated through inflammatory responses. Amyloid
oligomers are found in drusen deposits and RPE cells, and they are involved in
the pathogenesis of amyloid diseases such as AMD because of their toxicity
toward cells. It has therefore been hypothesized that the presence of oligomers
may lead to activation of the complement cascade and to formation of drusen.
On cytoband 1q25.2 we identified a susceptibility region for astrotactin 1
(ASTN1), containing a 9-SNP haplotype that spans 54.83 kilo-base pairs (Kbp).
ASTN1 is a neuronal protein that contributes, during brain development, to the
glial-guided movement of young postmitotic neuroblasts in cortical regions
(including those of the olfactory bulb). Four of these SNPs (rs4652201,
rs6425398, rs4652199, and rs4652206) ranked as the 1st, 2nd, 3rd, and 42nd
most associated SNPs for the recessive allelic test, with odds ratios (ORs) of
20.16, 18.82, 18.73, and 8.11, respectively. A parallel LALES study is currently
analyzing the implications and effects of ASTN1 expression levels on AMD
pathology. This work will be reported in Tedeschi-Blok et al. (2008).
On cytoband 1q25.2 there is strong linkage disequilibrium between
pappalysin-2 (PAPPA2) and downstream ASTN1 SNPs, over 56.94 Kbp. Five
PAPPA2 SNPs (rs6656137, rs10913259, rs16850236, rs17378432, rs1883432)
are part of two significant PAPPA2-ASTN1 haplotypes. Also, 20.19 Kbp
upstream of these five SNPs, an additional PAPPA2 marker (rs1555070) gave the
36th-ranked genome-wide recessive association result, with an OR of 4.16.
PAPPA2 is a metalloproteinase that specifically cleaves insulin-like growth
factor-binding protein 5 (IGFBP-5). Smith, Shen et al. (1999) verified through a
hypoxic mouse model that the interaction of PAPPA2 and IGFBP is essential for
the etiology of retinopathy, mainly because an inhibitory action of IGF directly
decreases VEGF enhancement of proliferative retinopathy, and of diabetic
retinopathy and retinopathy of prematurity. All of these outcomes are
characterized by similar neovascularizations. RPE cells in AMD patients
overexpress VEGF, a homodimeric glycoprotein that stimulates angiogenesis
beneath the retina (Frank, Amin et al. 1996; Lopez, Sippy et al. 1996; Spilsbury,
Garrett et al. 2000). The newly formed and unstable blood vessels leak fluid and
blood, and ultimately produce irreversible scarring and loss of vision (Kvanta
1995). Also, Ferrara, Gerber et al. (2003) conducted a study in
which histopathologic specimens of excised choroidal neovascular membrane
from patients with late AMD showed strong immunoreactive VEGF-A levels, with
expression of both VEGF-A mRNA and VEGF-A protein. This cascade of
interactions suggests a potentially modifiable role of PAPPA2 on IGF, and
ultimately on VEGF expression levels.
Chromosome 5. Eight markers (rs1205718, rs40654, rs183733, rs805153,
rs1805927, rs390322, rs380206, and rs415024) located in the semaphorin gene
SEMA5A formed a significant 35.25 Kbp haplotype on cytoband 5p15.2. This
region is in close proximity to the macular dystrophy retinal 3 (MCDR3) gene. In
mice, SEMA5A was shown to induce collapse of retinal ganglion cell (RGC)
growth cones, inhibit RGC axon regeneration, and participate in diverse immune
responses (Goldberg, Vargas et al. 2004). Also, based on quantitative RT-PCR,
Sugimoto, Fujikawa et al. (2006) established in bovines that enhanced
SEMA5A induces expression of several genes implicated in immune
responses, including IL-8. The IL-8-SEMA5A interaction generates a permissive
environment for developing AMD. Specifically, during early stages of AMD,
ingestion of oxidized photoreceptor outer segments stimulates the expression of
IL-8, an angiogenic factor that recruits macrophages into the subretinal space
(Higgins 2003). Macrophage recruitment in the healthy eye removes
accumulated debris (Nicolai and Eckardt 1993; Ambati, Anand et al. 2003).
However, as age increases, an accumulation of oxidized lipoproteins
alters normal macrophage behavior and upregulates the secretion of matrix
metalloproteases and glycosidases (Nicolai and Eckardt 1993; Galis 1994;
Krzystolik, Afshari et al. 2002; Ambati, Anand et al. 2003). Notably, the
semaphorin family also includes SEMA4A, a protein whose defects lead to
dominant retinitis pigmentosa (Abid, Ismail et al. 2006). A second haplotype on
chromosome 5 belongs to the PR-domain zinc finger protein 9 (PRDM9).
Besides the impact of PR-domain family genes on tumorigenesis, PRDM9 is also
required for zinc ion binding. Notably, the AREDS study showed a strong
correlation between zinc supplements and the degree of visual improvement in
treating late AMD patients (SanGiovanni, Chew et al. 2007).
Chromosome 6. We identified a 6-SNP haplotype (rs6905115, rs7765392,
rs46170076, rs6569967, rs17705152, and rs9385695) spanning 50.32 Kbp on
ALDH8A1. Notably, marker rs9385695, along with two additional SNPs
(rs9389212 and rs4896089), gave the 1st-, 2nd-, and 8th-ranked genome-wide
p-values for the additive allelic test. ALDH8A1 is a retinaldehyde
dehydrogenase that plays a key role in the 9-cis-retinoic acid biosynthesis
pathway by oxidizing retinaldehyde into retinoic acid. Interestingly, several
serum/glucocorticoid-regulated kinase (SGK1) SNPs are in moderate LD with
ALDH8A1 SNPs. Moreover, SGK1 marker rs9321467 is contained within the
same haplotype as the 6 ALDH8A1 SNPs. An additional set of 5 SGK1 SNPs
formed a 136.58 Kbp haplotype (rs1474828, rs9321467, rs7765392, rs1407107,
and rs9385695), upstream of the identified ALDH8A1 SNPs. SGK1 negatively
controls FOXO3, a repressor of CFH during oxidative stress. Furthermore, SGK1
behaves as a mineralocorticoid receptor (MR) target in the kidney, leading to
increased reabsorption of salt and water, and therefore to hypertension (Busjahn,
Aydin et al. 2002; F. Luca 2007). While SGK1 regulates blood pressure and
hypertension, Lee et al. (2001) established through a mouse model that its
activity is pronounced during embryogenesis and postnatal development, when
it becomes highly expressed in the brain. In addition, overexpression of SGK1
considerably enhances dendritic growth and branching in spinal cord neurons
(Yang, Lin et al. 2006) and stimulates neuronal and glial amino acid transport
(Bohmer C 2006).
Also on chromosome 6, cytoband q14 conferred association for the
potassium voltage-gated channel subunit KCNQ5 gene, for which we detected
an 8-SNP significant haplotype (rs6910780, rs3005765, rs11960994, rs470799,
rs9351945, rs4707995, rs4615346, and rs9360603) of length 88.03 Kbp.
KCNQ5 is expressed predominately in the retina and brain and has been linked
with Stargardt-like macular dystrophy, and cone-rod dystrophy (Kang Zhang,
Metzker et al. 2001). It is worth noting that potassium voltage-gated channel
subunits KCNQ1-KCNQ4 function in the same gene network as SGK1, for which
we detected 6 SNPs in association with AMD. Singh et al. (Singh, Ristau et al.
2005) investigated the genetic factors associated with macular drusen in Macaca
mulatta, a non-human primate, and identified marker D6S1036 on 6q14-15 to be
potentially associated with drusen formation. This marker spans ~8 Mb, and
includes KCNQ5 along with 9 additional genes, all believed to be implicated in
human macular degeneration syndromes.
Chromosome 9. The very low density lipoprotein receptor (VLDLR) gave
statistical significance for 3 markers (rs1535839, rs1324202, and rs10448165)
which joined with 5 SMARCA2 markers (rs11789700, rs13298799, rs13287382,
rs16938404, and rs10125660) to form an 8-SNP haplotype of size 39.21 Kbp.
This result corroborates the earlier work of Haines et al. (Haines, Schnetz-Boutaud
et al. 2006) who found the VLDLR rs2290465 marker at position
2,635,201 (9p24.2) to be associated with AMD in both a family based and a
case-control dataset. VLDLR, a receptor for apolipoprotein E (APOE), participates
in the transport and metabolism of lipids. Also, through its binding to Reelin,
VLDLR participates in neuronal signaling pathways by governing neuronal
migration during brain development and during brain injury repair (Trommsdorff,
Gotthardt et al. 1999). We also discovered a second haplotype formed by 8
SMARCA2 SNPs, between positions 2,134,687 and 2,160,512, upstream of the
VLDLR-SMARCA2 haplotype. SMARCA2, a member of the SWI-SNF protein
complex, is primarily responsible for chromatin remodeling but also
participates in retinoblastoma tumor suppressor activities (Strobeck, Reisman et
al. 2002; Kang, Cui et al. 2004). Further LD analysis in this region revealed r²
correlations in the range of 0.67 to 0.96 between 3 additional VLDLR SNPs and 4
KCNV2 SNPs (Figure 2.2; Table 2.6). These markers cover 15.87 Kbp,
downstream of the VLDLR-SMARCA2 haplotype. Deficiencies in the KCNV2
potassium channel subunit cause retinal cone dystrophy 3B (RCD3B), and
consequently result in lifelong visual loss, night blindness, and abnormal color
vision (Wu, Cowing et al. 2006). Although we did not find significant association
results in the set of 25 KCNV2 SNPs we genotyped, further investigation is
necessary to describe the possible molecular interplay between these three
genes.
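As a point of reference for the r² values reported in this region, the following is a minimal sketch of how the LD correlation between two biallelic SNPs can be computed from phased haplotype counts. The function name and the example counts are hypothetical, and the sketch assumes phased haplotypes; the software and estimation procedure actually applied to the LALES genotypes are not described in this passage.

```python
def ld_r2(n_AB, n_Ab, n_aB, n_ab):
    """Squared LD correlation r^2 between two biallelic SNPs.

    Arguments are counts of the four two-locus haplotypes (A/a at the
    first SNP, B/b at the second). Assumes phased haplotypes and that
    both SNPs are polymorphic in the sample.
    """
    n = n_AB + n_Ab + n_aB + n_ab
    p_A = (n_AB + n_Ab) / n          # frequency of allele A
    p_B = (n_AB + n_aB) / n          # frequency of allele B
    d = n_AB / n - p_A * p_B         # disequilibrium coefficient D
    return d * d / (p_A * (1 - p_A) * p_B * (1 - p_B))


# Hypothetical haplotype counts, not LALES data
print(ld_r2(50, 0, 0, 50))    # alleles always travel together
print(ld_r2(25, 25, 25, 25))  # alleles independent of each other
```

Values near 1 indicate that the two SNPs carry nearly the same information, which is why tightly correlated VLDLR and KCNV2 markers are difficult to separate statistically.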
Table 2.6: Linkage disequilibrium correlations for VLDLR and KCNV2 on
chromosome 9p24.2

LD SNP Pairs for VLDLR, KCNV2    χ²        P Value     LD R²
rs17656660, rs2168133            280.85    4.90E-63    0.96
rs10967598, rs1889109            264.42    1.87E-59    0.93
rs17656660, rs11793747           260.42    1.39E-58    0.93
rs12345016, rs1869594            175.48    4.71E-40    0.76
rs12345016, rs1889109            135.15    3.06E-31    0.67

Note. LD R² = Linkage Disequilibrium Correlation; χ² = Chi Square.
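The P values in Table 2.6 can be cross-checked against the χ² statistics under the assumption, made here only for illustration, that each test has 1 degree of freedom. In that case the upper-tail probability equals erfc(sqrt(χ²/2)). The sketch below uses only the Python standard library; the function and variable names are hypothetical.

```python
import math


def chi2_1df_pvalue(chi2):
    """Upper-tail probability of a chi-square statistic with 1 df.

    For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(chi2 / 2.0))


# (chi-square, reported P value) pairs copied from Table 2.6
TABLE_2_6 = [
    (280.85, 4.90e-63),
    (264.42, 1.87e-59),
    (260.42, 1.39e-58),
    (175.48, 4.71e-40),
    (135.15, 3.06e-31),
]


def relative_errors(rows=TABLE_2_6):
    """Relative gap between the recomputed and the tabulated P values."""
    return [abs(chi2_1df_pvalue(c) / p - 1.0) for c, p in rows]


for chi2, reported in TABLE_2_6:
    computed = chi2_1df_pvalue(chi2)
    print(f"chi2={chi2:7.2f}  computed={computed:.2e}  reported={reported:.2e}")
```

Under the 1-df assumption the recomputed values stay within about 1% of the tabulated ones, which is the level of agreement expected from rounding the χ² statistics to two decimals.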
A second region on chromosome 9 (cytoband p21.3) includes the tumor
suppressor candidate 1 (TUSC1) and MTA-phosphorylase (MTAP) genes. Both
of these genes contain significant 8-SNP haplotypes across 42.26 Kbp and 12.13
Kbp, respectively. This cytoband is well-documented for being abundant in
homozygous deletions common in malignant cell lines, gliomas, and leukemias
(Schmid, Sen et al. 2000). Down-regulation of TUSC1 is frequent in lung
tumorigenesis (Shan, Parker et al. 2004), while MTAP, a ubiquitously expressed
gene, is deficient in multiple types of cancer. Also, the gene network of MTAP
includes the retinoblastoma RB1 gene, along with genes (C1S and C4B) in the
complement and coagulation pathways.
Notably, cyclin-dependent kinase inhibitors p16 and p15 are co-localized in the
same region as MTAP and TUSC1. Several multi-kinase inhibitors are already
documented to be essential in blocking the angiogenesis process
common to progressive AMD (Wang, Shi et al. 2007).
A third region on chromosome 9 that showed susceptibility for AMD is
located on cytoband q31.1, and includes a haplotype of 8 loci in a 30.98 Kbp
region of the ATP-binding cassette sub-family A1 (ABCA1). The ABCA1 protein
functions as a cholesterol efflux pump for the cellular lipid removal pathway,
participates in the engulfment of apoptotic bodies, and is induced by bacterial
lipopolysaccharides. Remarkably, the alternative complement pathway is also
activated by bacterial lipopolysaccharides. Mutant phenotypes have an effect on
cholesterol metabolic processes and on cardiovascular and immune system
responses (Zwarts, Clee et al. 2002). Specifically, defects in ABCA1 cause a
deficiency in high density lipoprotein types 1 and 2 (HDLD1 and HDLD2),
premature artery disease, and accumulation of cholesteryl esters (Clee,
Zwinderman et al. 2001). Also, common variations in its noncoding regions seem
to considerably alter the severity of atherosclerosis (Singaraja, Fievet et al.
2002). Luibl et al (Luibl, Isas et al. 2006) demonstrated that AMD and
cardiomyopathy share commonalities with respect to protein misfolding and
pathogenesis (Mullins, Russell et al. 2000; Anderson, Talaga et al. 2004).
Moreover, it was shown that the principal regulatory pathways of AMD and
cardiomyopathy could be related to those of amyloid diseases such as
atherosclerosis, elastosis, amyloidosis, and dense deposit disease. All of these
cardiovascular and inflammatory phenotypes are predisposing factors for
developing AMD. The functional role of ABCA1 is of further interest because of
its molecular interconnection with 10 complement and coagulation genes,
including the AMD-related CFH and SERPINE1 genes. Moreover, it also
interacts with APOE, a critical protein in the formation of drusen deposits.
Notably, ABCA4 mutations occur at high frequencies in AMD patients, as
defects in ABCA4 lead to an accumulation of retinal derivatives in the RPE
(Allikmets, Shroyer et al. 1997).
Finally, we detected a fourth susceptible locus on chromosome 9, for a
haplotype of size 100.8 Kbp on cytoband q23.2. This association is for an 8-SNP
haplotype on the aldehyde dehydrogenase ALDH1B1 gene. This is the second
protein we identified in the aldehyde dehydrogenase family, the other paralog
being ALDH8A1 (6q23.2). ALDH1B1 exhibits oxidoreductase activity and
participates in the metabolism of corticosteroids and carbohydrates.
Chromosome 11. We identified the set of markers rs7106307,
rs3793981, rs11233352, and rs11233357 for the prolylcarboxypeptidase (PRCP)
gene. These SNPs form a significant haplotype of length 17.87 Kbp on cytoband
p15.3. Also, rs7106307 and rs7928693 are the 6th and 12th highest genome-wide
signals for the additive allele model. PRCP cleaves C-terminal amino acids linked
to proline in angiotensin II and III. Since angiotensin II modulates hypertension
by regulating both blood pressure and electrolyte balance, alterations in its
molecular structure may provide a permissive background for developing AMD.
Fourteen genes in the complement pathway participate in the molecular network
of PRCP, some of which (e.g., CFB and SERPING1) confer an increased risk for
acquiring AMD.
Chromosome 18. Four markers (rs4940260, rs4939732, rs10502979, and
rs8088048) produced a significant haplotype of length 53.64 Kbp on the DCC
tumor suppressor gene (cytoband 18q21.2), upstream of the autosomal dominant
cone rod dystrophy 1 (CORD1). If bound to the netrin ligand, DCC mediates axon
attraction of neuronal growth cones. However, when not coupled with netrin, it
functions as a dependence receptor for induction of apoptosis. Moreover, two
VEGF signaling pathway genes that interact with DCC are RAF1 and NRAS.
These are vital networks, mainly because in addition to controlling the
angiogenesis process, VEGF has also been shown to act as a survival factor for
microvascular endothelial cells by preventing apoptosis (Gupta, Kshirsagar et al.
1999).
Chromosome X. Cytoband Xp11.4 includes MID1IP1, a gene for which
we obtained a significant 6-SNP haplotype of length 44.84 Kbp. We further tested
whether these loci might have different effects in males vs. females and ran the
HTR analyses separately for each gender. Although the combined data showed
association between AMD and these markers, this association remained strong
only in males (raw p-value = 1.34E-06; permuted p-value = 1.00E-02). MID1IP1
controls the stabilization of microtubules, and is highly expressed in the brain and
spinal cord. However, previous studies found this cytoband to be abundant in a
series of genes (COD1, COD4, CSNB1, CSNB4, NYX, RPGR, OPA2, and AIED)
related to different retinal degeneration diseases. Based on the gender
difference in this chromosomal region, and given the interactive role of VLDLR
with APOEε2 in increasing the risk for AMD among males (Schmid, Sen et al.
2000), we will focus future work on identifying the various gender-based
genetic factors and their effect on Latinos.
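A permuted p-value such as the one quoted for the male-only analysis is generally obtained by reshuffling case-control labels and recomputing the test statistic. The sketch below illustrates the idea with a simple difference in group means as a stand-in statistic; it is not the HTR statistic, and the function name, inputs, and number of permutations are assumptions for illustration rather than the settings used in this study.

```python
import random


def permutation_pvalue(values, labels, n_perm=99, seed=0):
    """Empirical p-value for a difference in group means under label shuffling.

    values : per-subject score (e.g. a haplotype dosage); hypothetical input
    labels : 1 for case, 0 for control (both groups must be non-empty)
    Returns (1 + number of permuted statistics >= observed) / (1 + n_perm).
    """
    def stat(lab):
        cases = [v for v, g in zip(values, lab) if g == 1]
        ctrls = [v for v, g in zip(values, lab) if g == 0]
        return abs(sum(cases) / len(cases) - sum(ctrls) / len(ctrls))

    rng = random.Random(seed)
    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between score and status
        if stat(shuffled) >= observed:
            hits += 1
    return (1 + hits) / (1 + n_perm)
```

With this estimator the smallest attainable value is 1 / (1 + n_perm), so 99 permutations cannot return anything below 1.00E-02 regardless of how small the raw p-value is.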
2.4 GWAS Discussion: In this study we sought to identify novel variants
associated with early bilateral age-related macular degeneration in a population-
based sample of Latinos. This cohort consists of cases characterized by
intermediate to large soft drusen in both eyes, a major risk factor for progression
from early to late AMD. Therefore, the LALES sample provides us with an opportune
phenotypic background for identifying genetic variants and biological
mechanisms responsible for the initiation and evolution of AMD. As we discuss
later, our sample size is not particularly large. This motivated our use of HTR
analyses in an attempt to gain power. Primarily, HTR allowed for a better
identification of susceptible regions for which additive contributions of multiple
markers produced a detectable effect on phenotype, but for which single SNP
tests might fail to reach statistical significance.
Our overall findings suggest that several genetic factors contribute to
visual impairment through the activation of oxidative stress pathways, retinal
oxidoreductase metabolism, and through inflammatory and immune responses
overlapping both AMD and Alzheimer’s disease (AD) mechanisms. Additional
gene activities implicate biological mechanisms regulating glial guided axon
migration and regeneration of injured brain cells (including the olfactory bulb),
angiogenesis, Zinc ion binding, hypertension, and apoptosis. We explored the
principal biological pathways through which deleterious changes in these genes
might influence the genesis and evolution of AMD.
Growing evidence suggests that inflammatory and immune-mediated
events initiating complement responses play an essential role in the biogenesis
of AMD. The main complement pathways are (1) the classical complement
pathway activated by antibody-antigen binding, (2) the lectin pathway triggered
by oxidative stress and carbohydrates, and (3) the alternative pathway set off by
microbial pathogens, bacterial lipopolysaccharides, multimolecular aggregates
and cellular debris (Sivaprasad et al. 2006). Some of the main complement
regulators are complement factor H (CFH), a protein produced constitutively in
the liver but also synthesized by RPE cells, complement factor B (CFB) which
competes with CFH, and C-reactive protein (CRP). The convergence of these
three pathways cleaves protein C3 into C3a and C3b, modulators of the C5b-9
terminal complexes that facilitate cell lysis. CFH inhibits activation of C3, down-
regulates the alternative complement pathway by reducing bystander cellular
damage, and arrests CRP-dependent complement activation. Also, the
interaction of oxidative stress and CFH is a key inducer of the complement-based
increase in AMD risk. Mainly, oxidative stress reduces the ability of inflammatory
cytokine interferon-γ (INF-γ) to increase CFH expression in RPE cells (Wu, Lauer
et al. 2007).
AMD-Associated Polymorphisms Involved in Inflammatory and Oxidative
Mechanisms: Patients with early AMD show an accumulation of oxidized lipids
and cholesterol esters, non-degradable deposits that result from phagocytosis of
photoreceptor cellular outer segments rich in retinol (vitamin A) (Mata, Weng et
al. 2000; Ruberti, Curcio et al. 2003). Retinol and β-carotene (provitamin A)
undergo specific metabolic processes and generate new derivatives that either
induce light absorption for vision or regulate cellular growth and development
(Wald 1951; Gudas 1994). In order for retinol to induce gene expression, it must
be converted by ALDH8A1 into 9-cis-retinoic acid. Moreover, our detection of 8
markers in the ALDH8A1 gene is also of interest due to the verified effect of β‐
carotene in treating AMD patients. The Age-Related Eye Disease Study
(AREDS) confirmed that high-dose antioxidant vitamins such as β‐carotene
markedly slowed the progression rate of vision loss in patients affected with
advanced AMD (SanGiovanni, Chew et al. 2007). Future investigation of the
oxidoreductase role of ALDH8A1 and its activity through the retinoic acid
metabolic process on age-related cellular activities is required in order to better
understand its impact on AMD.
One potential modifier in the oxidative stress pathway is SGK1. Two main
factors mark the importance of SGK1. First, SGK1 negatively controls the pro-
apoptotic FOXO3 gene, which is the main repressor of CFH upon oxidative
stress. Suppression of CFH involves activator STAT1 and repressor FOXO3.
STAT1 mediates INF‐γ induced stimulation of CFH promoter activity. Under
normal conditions FOXO3 does not impede the activation of CFH promoter by
STAT1, as it is localized predominantly in the cytoplasm. However, Wu et al. (Wu,
Lauer et al. 2007) showed that after treatment with INF‐γ some FOXO3
enters the nucleus, where it preferentially binds STAT1 rather than its DNA
binding site on the CFH promoter. In the presence of oxidative stress, FOXO3
becomes acetylated, a reaction that reduces its interaction with STAT1,
increases its binding to the CFH promoter, decreases STAT1 occupancy of its
binding site, and reduces expression of CFH. Thus, an altered SGK1 protein may
generate a series of sequential events through FOXO3, CFH, and ultimately
complement responses. Secondly, SGK1 is a mineralocorticoid receptor (MR)
target in the kidney, leading to increased reabsorption of salt and water. Since
MR and glucocorticoid receptor (GR) utilize the same hormone responsive
elements (HREs), genetic variations in HREs are likely to affect both MR- and
GR-mediated transcriptional efficiency. Luca et al. (F. Luca 2007) have recently
tested the effect of variations in regulatory sequences of SGK1, and found that
they account for differences in prevalence of hypertension among populations.
Hypertension, in turn, has long been connected with an increased risk of
developing AMD (Klein, Deng et al. 2007). Therefore, the dual role of SGK1 as
an inhibitor of CFH and as a key regulator for hypertension suggests its critical
function in oxidative stress pathways and in creating a permissive environment
for AMD.
We also identified genetic variants in ABCA1 and VLDLR, whose functions
are interconnected with inflammatory mechanisms. The presence of bacterial
lipopolysaccharides activates both ABCA1 and the alternative complement
pathway. Thus, the effect of ABCA1 mutations on cardiovascular and immune
system responses (e.g., premature coronary artery disease, atherosclerosis)
suggests the formation of conditions predisposing to the development of AMD,
along with the activation of the alternative complement pathway that is common
in AMD patients.
VLDLR, the second inflammatory linked variant, interacts with APOE.
However, APOE, whose variants are APOEε2, APOEε3, and APOEε4, takes part in
divergent mechanisms leading either to protectiveness against AMD or to an
increased risk for AMD (Souied, Benlian et al. 1998; Schmidt, Saunders et al.
2000). APOEε2 increases the risk of AMD, while APOEε4 protects against it.
Produced in the retina by Müller glial cells and in the CNS by astrocytes
(Weeber, Beffert et al. 2002), APOE participates in the transport and distribution
of lipids. A 2001 international collaboration that pooled two American and one
European study reported that the APOEε4 variant was significantly protective
against AMD across gender and all age groups, while the APOEε2 variant
doubled the risk in males compared to females (Schmidt, Saunders et al. 2000).
Equally relevant for understanding the risk of acquiring AMD is the interaction
effect of VLDLR and APOEε4 on the initiation and progression of Alzheimer’s
disease, as the results of cyclical immune reactions are analogous in both of
these diseases. In particular, while amyloid beta (Aβ) proteins are present in
neuritic plaques that affect neurons in the brain tissue of Alzheimer’s disease
patients (Goldstein, Muffat et al. 2003; Yoshida, Ohno-Matsui et al. 2005), they
also initiate the complement and inflammatory responses specific to AMD,
accumulate in retinal drusen deposits (Johnson, Leitner et al. 2002; Dentchev,
Milam et al. 2003), and further promote angiogenesis by modulating the
expression of angiogenesis-related factors released from RPE cells (Yoshida,
Ohno-Matsui et al. 2005). Reinforcing this relationship, LaDu et al (F. Luca
2007) described a mechanism through which the LDLR protein whose receptor is
VLDLR, mediates an increase in APOE levels as a response to Aβ proteins, a
chain reaction that limits the inflammatory response by decreasing the secretion
of pro-inflammatory cytokines and oxidative stress molecules (Smith, Richey
Harris et al. 1997; Wallace, Geddes et al. 1997). Moreover, Okuizumi et al
(Okuizumi, Onodera et al. 1995) observed that among Japanese the presence of
two copies of the 5-repeat CGG allele of VLDLR and at least one copy of the
AMD protective APOEε4 variant conferred a relative risk of 8.7 for developing
AD. Given the interaction effects of APOE with VLDLR, along with the strong
linkage between KCNV2, SMARCA2, and VLDLR, we need to further investigate
the role of each of these variants and the interdependent steps through which
they might trigger susceptibility to AMD.
Central Nervous System Functions Associated with AMD: Astrocytes
produce APOE, a primary apolipoprotein in the brain that functions as an
anti-inflammatory agent by responding to the Aβ-induced glial activation. We found
AMD association with several polymorphisms located in genes expressed in the
central nervous system (CNS). This set of genes includes VLDLR (a receptor for
APOE), SGK1, DCC, SEMA5A, ASTN1, KCNQ5, and MID1IP1. Beyond the
variety of cellular functions these genes perform, their overall activities also
suggest that neurodevelopment and brain injury repair processes may
constitute essential mechanisms in the biogenesis of AMD among Latinos.
Overall, these genes participate during embryogenesis and brain development
in cellular mechanisms that control (1) neuronal transport (VLDLR), (2)
number of dendrites and neuronal and glial amino acid transport (SGK1), (3) axon
attraction of neuronal growth cones (DCC), (4) inhibition of RGC growth cones and
of axon regeneration (SEMA5A), and (5) glial guided migration of young
postmitotic neuroblasts (ASTN1). Also, KCNQ5 participates in excitability of
neurons, while VLDLR is also involved in neuronal signaling pathways active
during brain injury repair.
AMD-associated Polymorphisms Involved in Zinc Ion Binding: We
detected two genes (PRDM9 and ZNF608) involved in Zinc ion binding. The
AREDS study confirmed that a combination of antioxidants and Zinc
supplements reduced the risk of developing advanced stages of AMD by 25%,
while the risk of vision loss was reduced by 19%. Administering Zinc alone
decreased these risks by 21% and by 11%, respectively (SanGiovanni, Chew et
al. 2007). While these findings were observed in patients with late AMD, our
study is based on early AMD cases. Nevertheless, since the LALES cohort has a
significantly lower rate of late AMD compared to that of Caucasians, the roles of
PRDM9 and ZNF608 remain of particular interest for understanding whether they
play a role in the progression of AMD.
Other AMD-associated Mechanisms: We identified several genetic
variants that are involved in angiogenesis. Namely, PAPPA2, SEMA5A, PRCP,
and KCNQ5 are each contributors to the damaging subretinal neovascularization
process through their interaction with IGFBP, IL8, and VEGF, respectively. Also,
we detected sets of SNPs located in 5 tumor-related genes (PRDM9, TUSC1,
MTAP, DCC, and MID1IP1). Deleterious changes in any of these genes may
exacerbate a chain of cardiovascular, inflammatory and immune system
responses common in any of the tumor predisposing or AMD inducing pathways.
As noted earlier, this study is not without limitations. Firstly, the sample
size is relatively small and may therefore be somewhat under-powered.
Consequently, we take the view that our study is, to some degree, ‘hypothesis
generating’ rather than ‘hypothesis testing’. It is this concern that motivated our
use of a haplotype-based test in addition to standard single allele tests.
Secondly, it is known that the Latino population is, to some degree, admixed, and
we have not directly addressed this issue in the current analysis. In part, this is
because we believe that the LALES sample, drawn consistently from the city of
La Puente, is relatively homogeneous. To confirm this we
compared the self-reported birthplace locations across both cases and controls
and obtained an almost identical distribution of cases and controls. Using this
assessment, population structure does not appear to be a confounder in this
analysis. A more comprehensive analysis of this issue requires a better
understanding of what it means to be ‘Latino’. Consequently, we are in the
process of collecting 30 Native American samples, which can be used to inform a
more comprehensive analysis in future. Despite these limitations, this study
contributes valuable new insights into the type of genetic factors influencing the
risk of developing AMD. Over the next two decades, the prevalence of eye disease
and associated blindness is estimated to increase dramatically among aged
individuals worldwide.
It is therefore essential to identify the range of genetic, environmental, and
complex biological pathways that define the occurrence and progression of AMD,
along with the potential burden of eye disease in susceptible US populations.
Chapter 3
Variation in Population Admixture and Structure among Latinos
3.1 Population Admixture Overview: Population structure and admixture have
strong confounding effects on genetic association studies. Discordant
frequencies for age-related macular degeneration (AMD) risk alleles and for AMD
incidence and prevalence rates are reported across different ethnic groups. We
examined the genomic ancestry characterizing 538 Latinos drawn from the Los
Angeles Latino Eye Study [LALES] as part of an ongoing AMD-association study.
To help assess the degree of Native American ancestry inherited by Latino
populations we sampled 25 Mayans and 5 Mexican Indians obtained through
the Coriell Institute. Levels of European, Asian, and African descent in Latinos were
inferred through the USC Multiethnic Panel (USC MEP), which was formed from a
sample from the Multiethnic Cohort (MEC) study, the Singapore Chinese Health
Study, and a prospective cohort from Shanghai, China. A total of 233 ancestry
informative markers were genotyped for the 538 LALES Latinos, 30 Native
Americans, and 355 USC MEP individuals (African Americans, Japanese,
Chinese, European Americans, Latinos, and Native Hawaiians). Sensitivity of
ancestry estimates to relative sample size was considered.
We detected strong evidence for recent population admixture in LALES
Latinos, with overall estimates of 45.5% Native American (NA), 40.1% European
(EU), 9.8% Asian (AS), and 4.5% African-American (AA) ancestry.
Gradients of increasing NA background (37.3% to 51.7%) and of correspondingly
decreasing EU ancestry (45.3% to 29.6%) were observed as a function of birth
origin from North to South. The strongest excess of homozygosity, a reflection of
recent population admixture, was observed in non-US born Latinos that recently
populated the US. A set of 42 SNPs especially informative for distinguishing
between Native Americans and Europeans were identified, and could be valuable
reference markers for future studies.
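One simple way to judge how well a marker separates two ancestral groups is the absolute difference in allele frequency between them, often written δ. The sketch below ranks markers by δ; it is offered only as an illustration, since the criterion used to select the 42 SNPs is not specified in this overview. The function name, SNP labels, and frequencies are all hypothetical.

```python
def rank_aims_by_delta(freq_pop1, freq_pop2, top_n=None):
    """Rank markers by delta = |p1 - p2|, the allele-frequency difference.

    freq_pop1, freq_pop2 : dicts mapping SNP name -> reference-allele
    frequency in each population. Only SNPs present in both are ranked.
    """
    shared = freq_pop1.keys() & freq_pop2.keys()
    ranked = sorted(
        ((snp, abs(freq_pop1[snp] - freq_pop2[snp])) for snp in shared),
        key=lambda item: (-item[1], item[0]),  # largest delta first
    )
    return ranked if top_n is None else ranked[:top_n]


# Hypothetical reference-allele frequencies (not LALES data)
native_american = {"snpA": 0.90, "snpB": 0.55, "snpC": 0.20}
european = {"snpA": 0.15, "snpB": 0.50, "snpC": 0.60}

for snp, delta in rank_aims_by_delta(native_american, european):
    print(snp, round(delta, 2))
```

Markers with δ close to 1 are nearly diagnostic of ancestry, whereas markers with δ near 0 contribute little to distinguishing the two groups.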
These findings reflect the historic migration patterns of Native Americans
and suggest that while the ‘Latino’ label is used to categorize the entire
population, there exists a strong degree of heterogeneity within that population,
and that it will be important to assess this heterogeneity within future association
studies on Latino populations. Our study raises awareness of the diversity within
“Latinos” and the necessity to assess appropriate risk and treatment
management.
3.2 Admixture Background: In recent years great advances have been made
in discovering genetic variants associated with the biogenesis and progression of
a variety of complex diseases [e.g., (Gudmundsson, Sulem et al. 2007;
McPherson, Pertsemlidis et al. 2007; Rioux, Xavier et al. 2007; Saxena, Voight et
al. 2007; Scott, Mohlke et al. 2007; Sladek, Rocheleau et al. 2007;
Steinthorsdottir, Thorleifsson et al. 2007; Zeggini, Weedon et al. 2007)].
However, despite the relative success of association studies in mapping
susceptible loci, we are still frequently faced with a lack of replication across
different populations. One possible cause for this is our relatively poor
understanding of the degree of genetic diversity between populations.
Besides the variation in genetic make-up across ethnicities, we also often
observe a wide range in incidence and prevalence rates for any given disease
across different populations. Indeed, it is likely that this wide range of rates is
largely due to that variation. In admixed populations, for example, the difficulty
lies in appropriately matching cases and controls (often based on self-reported
ancestry). On the other hand, population substructure may inflate positive
associations and cause hidden confounding effects due to an underlying
difference in the distribution of ancestry between cases and controls (Lander and
Schork 1994; Altshuler, Kruglyak et al. 1998; Wacholder, Rothman et al. 2000;
Kittles, Chen et al. 2002; Thomas and Witte 2002; Cardon and Palmer 2003;
Wakeley and Lessard 2003; Hinds, Stokowski et al. 2004; Campbell, Ogburn et
al. 2005; Helgason, Yngvadottir et al. 2005; Balding 2006). If a particular
ancestral group has relatively lower disease prevalence rates, this will result in
an under-representation of that subgroup in cases versus controls. Loci with
dissimilar allele frequencies across populations may therefore induce spurious
associations with phenotype. For example, the CYP3A4-V gene variant and
prostate cancer are reported to be substantially less common among European
American than African American men; Kittles et al. studied 688 AAs and found
that a strongly significant association at CYP3A4-V for prostate cancer
became a non-significant signal after including ten ancestry informative markers
(AIMs) (Kittles, Chen et al. 2002). In another example, Campbell et al. found the
LCT-13910 C>T variant to be strongly associated with height across European
populations; however, after matching cases and controls for southeastern and
northwestern ancestry, the evidence diminished substantially (Reiner, Ziv et al.
2005). It is therefore essential to understand the degree of genetic structure
within a study population of interest, and to then design and interpret association
studies with reference to that structure. As Latinos form 15.1% of the total US
population, the largest minority ethnic group in the US, with close to 200 million
individuals projected by 2050 (US Census Bureau 2007), the likelihood that a
growing number of genome-wide association studies will involve that population
is also increasing, particularly for diseases that are common in this minority
population, such as type II diabetes. For these reasons, given that Latinos are an
admixed population, it is crucial to understand the specifics of that admixture.
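The confounding mechanism described above can be made concrete with a small calculation. The sketch below pools two ancestry groups in which the allele has no effect on disease within either group, and shows that the pooled allelic odds ratio nonetheless departs from 1 when both allele frequency and prevalence differ between groups. All numbers and names are hypothetical and chosen only to illustrate the effect.

```python
def pooled_allelic_odds_ratio(strata):
    """Allelic odds ratio in a pooled sample built from ancestry strata.

    strata : list of (population share, allele frequency, disease prevalence).
    Within each stratum the allele is assumed independent of disease, so any
    pooled odds ratio different from 1 is due to stratification alone.
    """
    case_weight = sum(w * k for w, p, k in strata)
    ctrl_weight = sum(w * (1 - k) for w, p, k in strata)
    p_case = sum(w * k * p for w, p, k in strata) / case_weight
    p_ctrl = sum(w * (1 - k) * p for w, p, k in strata) / ctrl_weight
    return (p_case / (1 - p_case)) / (p_ctrl / (1 - p_ctrl))


# Hypothetical values: (share, allele frequency, prevalence)
strata = [
    (0.5, 0.8, 0.10),  # group 1: allele common, disease more prevalent
    (0.5, 0.2, 0.02),  # group 2: allele rare, disease less prevalent
]

print(round(pooled_allelic_odds_ratio(strata), 2))
```

Analysing each stratum separately, or adjusting for estimated ancestry, removes this artifact, which is the motivation for characterizing admixture before interpreting association signals.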
With this goal in mind, we examine the ancestral landscape of Latinos
ascertained through the Los Angeles Latino Eye Study (LALES), the largest
visual impairment epidemiological cohort of Latinos in the US (Varma, Paz et al.
2004). As such, this cohort represents a unique opportunity to better decipher the
demographics of the Latino population. Several discrepancies in both disease
prevalence rates and in the range of genetic susceptibility loci have already been
confirmed in Latino studies. For instance, asthma prevalence and mortality are
the highest among Puerto Ricans and the lowest among Mexicans (Salari,
Choudhry et al. 2005). In a 2005 study, Salari et al. (Salari, Choudhry et al. 2005)
found a higher level of European ancestry among Mexican Americans to be
strongly associated with increased asthma severity, while a higher proportion of
Native American ancestry protected against asthma severity. Also, Choudhry et
al. (2006) observed a significant difference in allele frequencies between asthma
cases and controls (P = 0.0002) in Puerto Ricans, but not in Mexicans.
Los Angeles County contains the largest Latino population of any county
in the US, comprising 47% of LA County’s 10 million residents. The LALES study
is a population-based cohort composed of 6,357 Latinos, residing in 6 census
tracts of Los Angeles County, who originated mainly in the US, Mexico,
Guatemala, and El Salvador regions. Case-control status was based on
diagnosis for early AMD; cases were identified through the presence of bilateral
drusen deposits, while controls lacked drusen in either eye.
Age-related macular degeneration is a multifactorial disease caused by
genetic defects that exacerbate the risk of developing AMD through interactions
with biological and environmental factors. Preliminary evidence suggests that
there are differences for risk of AMD between various populations (Klein, Klein et
al. 1992; Mitchell, Smith et al. 1995; Allikmets, Shroyer et al. 1997; Klein and
Francis 2003; Klein, Peto et al. 2004; Varma, Fraser-Bell et al. 2004; Zareparsi,
Buraczynska et al. 2005; Leske, Wu et al. 2006; Klein, Klein et al. 2007; Klein,
Knudtson et al. 2007). In longitudinal studies of Non-Latino Whites, presence of
large drusen deposits was shown to be predictive of advanced/late AMD.
Although at baseline the prevalence of large drusen in the LALES Latinos was
14.5%, incidence of late AMD remained very low (0.43%) compared to that of
Caucasians (1.4% BMES; 1.6% Beaver Dam) (Mitchell, Smith et al. 1995; Klein,
Klein et al. 2007) but was closer to that of individuals of African descent (0.7% Barbados)
(Leske, Wu et al. 2006). While prevalence rates for early AMD among Latinos
are similar to those found in Caucasians [9.4% LALES vs. 7.2% Blue Mountains
Eye Study (BMES) vs. 15.6% Beaver Dam] and in individuals of African descent
(12.6% BES) (Klein, Klein et al. 1992; Mitchell, Smith et al. 1995; Varma, Fraser-Bell
et al. 2004; Leske, Wu et al. 2006), incidence data indicate that only 1.5%
of early AMD cases advance into late AMD in Latinos, while 3.4% of cases
progress in Caucasian cohorts. It is thus still debatable whether Caucasian-specific
AMD pathways are regulated by the same genetic mechanisms as those
attributed to African, Asian, or Latino populations.
The difficulty in defining Latino admixture rests in our relatively poor historical
understanding of the series of demographic events that converged into shaping
the modern Latino population from the source populations of the Americas,
Europe, Asia and Africa. However, the history of any population is written in its
genetic make-up, and that version is forgotten much more slowly than any
language-based version of the same history. While a number of studies defined
the admixed nature of Latinos to be mostly composed of Native American and
European descent, the complexity of studying Latinos is still compounded by the
fact that there is a considerable degree of heterogeneity between Native
Americans from differing geographic regions across the Americas (Collins-
Schramm, Chima et al. 2004; Krueger, Siddens et al. 2004; Salari, Choudhry et
al. 2005; Choudhry, Coyle et al. 2006; Price, Patterson et al. 2007). Wang et al.
examined genetic diversity in 29 Native American populations sampled from
North, Central, and South America, and compared them to Siberian populations
(Wang, Lewis et al. 2007). They depicted gradients of decrease in both genetic
diversity and similarity to immigrant Siberians as a function of geographic
distance from the Bering Strait. Moreover, an examination of ancient migration
patterns for Native American populations and a comparison to 54 other
indigenous populations worldwide suggested that coastal routes might have been
essential during ancient migrations of Native Americans. These factors add to the
intrinsic genetic variability among Latinos across the US. Unfortunately, the
relative paucity of available genome-wide data for the Native American
populations has made even the genetic data hard to interpret. Consequently, in
addition to the data inherent in the LALES study, we have also generated
genotype data for a number of Native American individuals.
Previous studies identified ancestry informative marker (AIM)
polymorphisms that exhibit large differences in allele frequencies across
populations of European, Asian, and African descent, and therefore confer
increased power for detecting levels of population stratification (Pfaff, Parra et al.
2001; Rosenberg, Pritchard et al. 2002; Collins-Schramm, Chima et al. 2004;
Rosenberg 2005; Price, Butler et al. 2008). A series of projects have since
followed, describing the effects these ancestries have on numerous genetic risk
factors (Yang, McElree et al. 1993; Bernardi, Arcieri et al. 1997; Menotti, Lanti et
al. 2000; Azofeifa, Hahn et al. 2004; Campbell, Ogburn et al. 2005; Reiner, Ziv et
al. 2005; Castellano 2006; Duerr, Taylor et al. 2006; Li, Atmaca-Sonmez et al.
2006; Ziv, John et al. 2006; Ziv, John et al. 2006; Shaffer, Kammerer et al. 2007).
However, such AIMs are liable to be less powerful when describing the ethnicity
of Latinos. For example, Mexican Americans contain a rather small percentage of
African heritage (< 6%), and are mostly composed of a mixture of European and
Native American ancestry (Yang, McElree et al. 1993; Bernardi, Arcieri et al.
1997; Menotti, Lanti et al. 2000; Salari, Choudhry et al. 2005; Choudhry, Coyle et
al. 2006; Ziv, John et al. 2006). The historical focus on the HapMap has meant
that a clear and comprehensive description of genetic admixture among
American Latinos has been lacking, and has only recently started to emerge
(Collins-Schramm, Chima et al. 2004; Salari, Choudhry et al. 2005; Conrad,
Jakobsson et al. 2006; Price, Patterson et al. 2007). For this purpose, the current
analysis is based on a set of 233 AIMs genotyped in five population samples:
LALES Latinos, Native Americans selected through the Coriell Institute for
Medical Research (http://ccr.coriell.org), and individuals of Asian, African, and
European descent from the USC Multiethnic Panel (USC MEP), which consists of
samples from the Multiethnic Cohort (MEC) (Kolonel, Henderson et al. 2000;
Pike, Kolonel et al. 2002) and two additional Chinese cohorts (Yuan, Ross et al.
1996; Wu, Seow et al. 2003). In this paper, we use this set of marker data to
infer the important demographic characteristics of Latinos. Our hope is that such
an understanding will enable investigators to increase the power of future
association studies based on Latino populations.
3.3 Materials and Methods: We used a set of 233 ancestry informative markers
(AIMs), dispersed throughout the genome, and chosen from a set of 400 markers
described in Smith et al. (Smith, Patterson et al. 2004). These SNPs exhibit a
substantial difference in allele frequencies across ethnicities (Rosenberg, Li et al.
2003). In addition, AIMs are specifically chosen to lack linkage with any known
human disease candidate. These SNPs had been previously genotyped among
the USC MEP. Because these data already existed and we wished to incorporate
them into our study, we genotyped the LALES sample and the Native American (NA)
collection of individuals at the same set of AIMs.
Study Subjects: Three datasets were compiled for the estimation of Latino
ancestry for the ongoing ocular disease study of the LALES cohort: LALES, NA,
and a multiethnic panel comprised of subjects from the MEC and Chinese
cohorts. We genotyped two distinct datasets for the same set of AIMs described
above: (1) 538 LALES subjects and (2) 30 Native Americans (25 Mayans and 5
Mexican Indians). A brief description of the three datasets is provided below.
LALES Subjects: 538 LALES participants (268 cases: 268 controls) with
an average age (s.d.) of 56.7 (11.2) years were genotyped for this study (Table
3.1). All LALES cases were diagnosed with early AMD through the detection of
bilateral, intermediate to large soft drusen deposits. Controls lacked drusen in
either eye and were matched with cases based on age and birthplace location.
Details of the LALES cohort design are described elsewhere (Klein, Davis et al.
1991; Varma, Paz et al. 2004; Fraser-Bell, Wu et al. 2006). Institutional review
board approval was obtained from the Los Angeles County/University of
Southern California Medical Center Institutional Review Board.
Table 3.1: Demographics for LALES subjects included in this analysis
LALES Demographics                             Cases           Controls
Age, Average (S.D.)      All (n = 500)         60.34 (11.37)   60.04 (11.54)
                         Males (n = 227)       59.62 (11.37)   60.74 (11.55)
                         Females (n = 273)     61.12 (12.40)   59.61 (11.54)
Birthplace, Percentage   Mexico                68.0            68.8
                         USA                   18.0            18.4
                         El Salvador           5.6             5.2
                         Guatemala             3.6             3.2
                         Other                 4.8             4.4
Note. S.D. = Standard Deviation; n = Number; Other = Other birthplace locations.
Native American Subjects: In order to establish a reference set for the NA
lineage in Latinos, we genotyped 25 Mayan Amerindian and 5 Mexican Indian
DNA samples from Coriell’s human population repository collection
(http://ccr.coriell.org/). The Mayan samples were specifically chosen because
they represent ancient Native American civilizations that lived before the arrival
of Europeans in what nowadays are eastern and southern Mexico, El Salvador,
Guatemala, Belize, and Honduras. Since the dispersion of geographic regions for
the LALES cohort covers Mexico and most of Central America, the Mayan and
Mexican Indian samples overlap the birth locations for most of the LALES cohort.
MEC Subjects: The USC MEP consists of 355 European, Japanese,
Native Hawaiian, African American, and Latino female subjects from the
Multiethnic Cohort (MEC) study (Kolonel, Henderson et al. 2000), 18 Chinese
males from a prospective cohort from Shanghai, China (Yuan, Ross et al. 1996),
17 females from the Singapore Health Study (Wu, Seow et al. 2003), and 40
parents from 20 CEU trios from HapMap (Haiman, Stram et al. 2003). This
multiethnic panel has been reported previously in de Bakker et al. (de Bakker,
Burtt et al. 2006) and Haiman et al. (Haiman, Stram et al. 2003). The Multiethnic
Cohort (MEC) study is a prospective cohort of approximately 215,000 individuals
from California and Hawaii (Kolonel, Henderson et al. 2000). This study was
established between 1993 and 1996 and includes men and women primarily from
five racial and ethnic populations in Hawaii and California (African Americans,
European Americans, Latinos, Japanese Americans and Native Hawaiians). The
USC MEP includes a subset of 250 MEC women without a history of cancer,
namely, 40 Europeans, 70 African Americans, 70 Latinos from the Los
Angeles area, 35 Japanese, and 35 Hawaiians.
Genotyping: The 538 LALES and 30 Native American subjects were
genotyped using the Illumina GoldenGate platform for the 233 AIMs (USC
Genomics Core Laboratory, Los Angeles, CA). The MEP panel samples were
genotyped using the same platform (USC Genomics Core Laboratory, Los
Angeles, CA). 176 SNPs out of 233 had genotype call rates > 0.98 and were
chosen for the present analysis. Samples with an overall genotype call rate ≤ 0.8
were removed from analysis, resulting in a total of 500 LALES (250 cases, 250
controls) and 30 Native American individuals being included in the analysis.
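The two call-rate filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the genotype layout (a dictionary of samples mapping SNP names to calls, with None for a missing call), the function names, and the order of the two filters (SNPs first, then samples scored on the retained SNPs) are all assumptions made for the example.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout: genotypes[sample][snp] is a genotype string such as
# "AG", or None when the genotype call is missing.
Genotypes = Dict[str, Dict[str, Optional[str]]]


def call_rate(calls: List[Optional[str]]) -> float:
    """Fraction of non-missing genotype calls in a list."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(genotypes: Genotypes,
                        snp_min: float = 0.98,
                        sample_min: float = 0.80) -> Tuple[List[str], List[str]]:
    """Keep SNPs with call rate > snp_min, then samples with call rate > sample_min.

    Assumption: sample call rates are computed on the retained SNPs only.
    """
    samples = list(genotypes)
    snps = list(next(iter(genotypes.values()))) if genotypes else []
    kept_snps = [s for s in snps
                 if call_rate([genotypes[ind][s] for ind in samples]) > snp_min]
    kept_samples = [ind for ind in samples
                    if call_rate([genotypes[ind][s] for s in kept_snps]) > sample_min]
    return kept_snps, kept_samples
```

With the default thresholds, a SNP is retained only if more than 98% of samples have a call, and a sample is dropped when its call rate is 0.80 or lower, mirroring the cut-offs quoted in the text.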
3.4 Statistical Analysis: We employed a series of methods to evaluate the level
of admixture among Latinos, to estimate the relative proportions of AA, AS, EU,
and NA background in both LALES and MEC Latinos, and to assess the
correlation of NA and EU ancestry with the LALES AMD case-control status.
Ethnic proportions were inferred through the Markov chain Monte Carlo (MCMC)
algorithm of Falush and Pritchard using the STRUCTURE 2.2 software package
(Pritchard, Stephens et al. 2000; Falush, Stephens et al. 2003; Falush, Stephens
et al. 2007). Besides obtaining the ancestry information representing the entire
LALES cohort sample, we determined the effect of continental region by
computing estimates for LALES Latinos born within the US and within Mexico,
Guatemala and El Salvador, respectively.
Assessment of Latino population admixture was performed using three
different statistics: (1) the Pearson chi-square test to identify SNPs in
Hardy-Weinberg disequilibrium, (2) an overall assessment across all AIMs of the
distribution of homozygous genotypes within each sampled population and also
of that within US-born vs. non-US born Latinos, and (3) a measure for excess
association between physically unlinked loci in LALES and in MEC Latinos.
Latino ancestry estimates were then used to identify a set of SNPs that are
especially informative for distinguishing between Native Americans and
Europeans, and could therefore be valuable reference markers for future
population admixture studies. Finally, since our population sample sizes vary
greatly across ethnic groups, we generated Bootstrap simulations to investigate
whether ancestry estimates might be influenced by this variance of sample sizes.
Estimation of Population Ancestry in Latinos: The genetic make-up of
Latinos was inferred using STRUCTURE 2.2 (Pritchard, Stephens et al. 2000;
Falush, Stephens et al. 2003; Falush, Stephens et al. 2007). We ran 45,000
burn-in repetitions and a further 50,000 iterations after the burn-in period.
Analysis was performed on the set of AIMs for the merged dataset of the LALES,
NA, and USC MEP. Since AIMs were selected for their lack of linkage with loci
known to be associated with human diseases, the inclusion of cases would be
unlikely to affect overall approximations. However, to avoid any potential
biases we report the population structure results based only on the inclusion of
LALES controls, referenced against the sets of 30 NA and the 355 subjects from
the USC MEP. (Results when cases were included were essentially the same.) In
addition, as part of our continuing LALES Latino eye study we also completed a
separate STRUCTURE analysis using only the 250 AMD cases. This additional
step allowed us to examine potential differences in ethnic background between
AMD cases and controls. The Wilcoxon signed test was also employed to assess
differences in the estimated NA and EU inheritance between AMD cases and
controls. Association between any of the AIMs and AMD status was tested using
an additive genetic model. Allelic regression analysis was also conducted by
including individual EU and NA ancestry estimates as model covariates for
assessing the strength of association between any of the AIMs and AMD. Final
p-values were corrected for multiple comparisons through Bonferroni adjustment
at the $2.84 \times 10^{-4}$ (or 0.05/176) threshold.
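As a quick check of the Bonferroni threshold quoted above, the arithmetic can be reproduced as follows. The function names and the example p-values are hypothetical and serve only to illustrate the comparison against the adjusted threshold.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni adjustment."""
    return alpha / n_tests


def significant_after_bonferroni(p_values: Dict[str, float],
                                 alpha: float = 0.05) -> List[str]:
    """Names of tests whose p-value falls below alpha / number of tests."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [name for name, p in p_values.items() if p < threshold]


# 176 AIMs were tested, so the per-SNP threshold is 0.05 / 176.
threshold = bonferroni_threshold(0.05, 176)
print(f"{threshold:.2e}")  # prints 2.84e-04
```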
When using STRUCTURE, accurately deciding the number of clusters, K,
that best describes a population’s substructure is a rather difficult task (Pritchard,
Stephens et al. 2000; Pritchard, Stephens et al. 2000; Pritchard and Donnelly
2001; Falush, Stephens et al. 2003; Falush, Stephens et al. 2007). Two main
factors influencing the estimation of K are (1) the probability of observing the data
given the predefined K parameter (i.e. Pr(X|K)), and (2) the existence of a
meaningful biological interpretation of K (Pritchard, Stephens et al. 2000).
Furthermore, in contrast to the relatively simple model used by
STRUCTURE, real populations are affected by inbreeding and isolation by
distance, with the consequence that allele frequencies vary in a relatively
continuous way across geographical regions, which may result in an
overestimation of the actual number of clusters present (Pritchard, Stephens et al.
2000; Falush, Stephens et al. 2003). Therefore, our preferred solution is to focus
on the value of K which not only captures most of the structure in a population,
but also offers an experimentally relevant interpretation. We ran the analysis
using different values of K and obtained the estimated log-likelihood of the data
(lnPr(X|K)) at each run. For each K-value three independent analyses were
completed to ensure that lnPr(X|K) estimates were consistent across runs.
The average likelihood from the three independent runs is reported for each K,
where the posterior probability of K can be computed as
$$\Pr(K = i) = \frac{e^{\ln \Pr(X \mid k_i)}}{\sum_{j=1}^{n} e^{\ln \Pr(X \mid k_j)}}.$$
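The posterior probability of K can be evaluated numerically as sketched below. Because the log-likelihoods are large negative numbers, exponentiating them directly underflows to zero, so the sketch subtracts the maximum first (the log-sum-exp device); this is a numerical convenience assumed here, not something stated in the text. The formula treats every candidate K as equally likely a priori, and the function name is hypothetical. The input values are the LnP(D) entries reported in Table 3.2.

```python
import math
from typing import Dict


def posterior_over_k(log_likelihoods: Dict[int, float]) -> Dict[int, float]:
    """Posterior probability of each K, given ln Pr(X|K), under a uniform prior."""
    max_ll = max(log_likelihoods.values())
    # Shift by the maximum so the largest term is exp(0) = 1 and the
    # remaining terms cannot overflow; the shift cancels in the ratio.
    weights = {k: math.exp(ll - max_ll) for k, ll in log_likelihoods.items()}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


# LnP(D) values as reported in Table 3.2 (averages of three runs per K).
ln_pr = {2: -122914.0, 3: -118002.0, 4: -116312.0, 5: -116186.0, 6: -116344.0}
posterior = posterior_over_k(ln_pr)
best_k = max(posterior, key=posterior.get)
```

On these values essentially all posterior mass falls on the K with the highest log-likelihood, which is why the text also weighs biological interpretability when choosing K rather than relying on Pr(X|K) alone.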
A second parameter of interest is the divergence in allele frequencies between
the K clusters, traditionally referred to as Wright’s $F_{st}$ measure (Wright 1951). The
current STRUCTURE implementation reports $F_K$, an analogue of $F_{st}$, proposed by
Falush et al. (2003) (Falush, Stephens et al. 2007). The $F_K$-based model allows
for variation in drift rates between populations, computing a different $F_K$ measure
for each of the K populations rather than assessing an overall $F_{st}$ measure across
all populations.
Identification of population structure and recent admixture: In a
random-mating population we expect genotypes to be in Hardy-Weinberg
equilibrium (HWE) (Hartl 1988). Deviations from this equilibrium are typically
thought to be due to population structure, selection or genotyping errors. For
example, admixture will cause a modification of genotype frequencies in a
population due to the influx of alleles from other populations (Law, Buckleton et
al. 2003). Deviations due to selection are unlikely in the present context given
that the AIMs were chosen to be optimal for distinguishing large scale population
mixtures and for making precise ancestry estimates (Smith,
Patterson et al. 2004). Given this, we checked among Latinos for deviations from
HWE in the set of 176 AIM SNPs using a Pearson’s chi-square test with one
degree of freedom.
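For a single biallelic SNP, a Pearson chi-square test of HWE with one degree of freedom can be sketched as below. This is an illustrative implementation using only the standard library, not the code used in the study; the function name and the example genotype counts are hypothetical. The p-value uses the identity that, for one degree of freedom, the upper-tail probability of a chi-square variate x equals erfc(sqrt(x/2)).

```python
import math
from typing import Tuple


def hwe_chi_square(n_AA: int, n_Aa: int, n_aa: int) -> Tuple[float, float]:
    """Pearson chi-square statistic and p-value (1 df) for HWE at one SNP."""
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # estimated frequency of allele A
    q = 1.0 - p
    observed = (n_AA, n_Aa, n_aa)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    # Skip classes with zero expected count (monomorphic SNPs).
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: the first set is in exact HWE, the second shows
# a strong excess of homozygotes.
chi2_eq, p_eq = hwe_chi_square(25, 50, 25)
chi2_hom, p_hom = hwe_chi_square(40, 20, 40)
```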
Besides testing for HW disequilibrium, we tested for excess homozygosity,
a trademark of recent admixture, in each of the Latino, NA, EU, AS, and AA
populations. Choudhry and Siegmund implemented the T statistic as a measure for
estimating the amount of deviation from HWE, and as a means for evaluating the
trend in homozygosity across all markers, where
$$T = \frac{X_{DD}/P_D + X_{dd}/P_d - N}{\sqrt{N}},$$
N is the total number of individuals, $P_D$ and $P_d$ denote estimated allele
frequencies, and $X_{DD}$ and $X_{dd}$ are the homozygote genotypic counts
(Choudhry, Coyle et al. 2006).
Under the assumption of HWE and based on the selection of randomly chosen
genome-wide loci, a standard normal distribution is expected to fit the
frequencies of the T-statistic (Choudhry, Taub et al. 2008), with
heterozygote frequencies distributed towards the left, homozygote counts
towards the right, and an overall mean/median centered around zero. The
observed distributions of Choudhry and Siegmund’s T-statistic values were
contrasted between the LALES sample, each population that comprises the USC
MEP, and Native Americans. We further searched for potential variation within
Latinos themselves by evaluating regional specific homozygosity trends of
individuals originating in different birthplace locations. A final analysis of
population admixture was conducted by assessing the degree of allelic
association between physically unlinked markers (Helgason, Yngvadottir et al.
2005; Tsai, Kho et al. 2006; Choudhry, Taub et al. 2008). Because the markers in
each pair are physically unlinked, any association between them would most likely
be due to recent admixture or population substructure.
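The homozygosity T statistic discussed above can be computed per marker from genotype counts, as in the sketch below. Two points are assumptions of this example rather than statements from the source: allele frequencies are estimated from the same genotype counts, and the denominator is taken as the square root of N, the scaling under which the statistic has approximately unit variance under HWE. The function name and the example counts are hypothetical.

```python
import math


def homozygosity_t(x_DD: int, x_Dd: int, x_dd: int) -> float:
    """T = (X_DD / P_D + X_dd / P_d - N) / sqrt(N) for one biallelic marker.

    Positive values indicate excess homozygosity and negative values excess
    heterozygosity, relative to Hardy-Weinberg expectations.
    """
    n = x_DD + x_Dd + x_dd
    p_D = (2 * x_DD + x_Dd) / (2 * n)  # estimated frequency of allele D
    p_d = 1.0 - p_D
    if p_D == 0.0 or p_d == 0.0:
        raise ValueError("T is undefined for a monomorphic marker")
    return (x_DD / p_D + x_dd / p_d - n) / math.sqrt(n)


# Hypothetical genotype counts (DD, Dd, dd) for three markers.
t_equilibrium = homozygosity_t(25, 50, 25)   # exact HWE proportions
t_homozygous = homozygosity_t(40, 20, 40)    # excess homozygotes
t_heterozygous = homozygosity_t(10, 80, 10)  # excess heterozygotes
```

Collecting T across all markers and plotting its distribution gives the kind of summary the text describes, with a shift to the right indicating excess homozygosity.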
Bootstrap Methods for Assessing the Effect of Sample Size on Population
Structure Inference: An emerging concern when assessing ancestral proportions
is the size of the genotyped samples within a given study. Two issues surface
when inferring population structure: (1) the minimum sample size requirement for
a given population, and (2) the difference in the size of the analyzed sub-
populations. There is a danger that estimates of population ancestry might be
influenced by the size of the (sub)population being analyzed. For example, it is
plausible to imagine that it is easier to identify a population for which we have a
large number of representatives than one with relatively few members. This is a
particular concern in our study, given the discrepancies between sample
sizes across ethnic groups, and this issue is not generally addressed in the
literature. To guard against this issue we employed two commonly used
techniques for adjusting sample sizes. First, smaller samples were inflated by
bootstrapping (i.e. sampling at random with replacement) until they reached the
LALES control sample size (n = 250 controls). Chinese and Japanese subjects
were merged and categorized as ‘Asians’, while White and CEPH samples were
grouped into a single ‘European’ population. We applied this scheme to inflate
each of the following samples: 70 African Americans, 70 Latinos from LA
(non-LALES), 35 Native Hawaiians, 70 Asians (35 Japanese, 18 Chinese from
Shanghai, and 17 Chinese from Singapore), and 110 Caucasians (40 CEPHs and
70 Europeans).
Through a second approach we reduced the size of the LALES control
cohort by selecting 70 individuals through random sampling without replacement.
Unselected individuals were excluded from the subsequent STRUCTURE
analysis. Each of the two schemes was repeated 100 times, and every resulting
dataset was analyzed with STRUCTURE 2.2 under the K = 4 model
parameterization used on the original data. We then reanalyzed the data to see if
our earlier conclusions remained true.
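The two resampling schemes can be sketched as follows. This is only an illustration of sampling with and without replacement: the sample identifiers, the seed, and the function names are hypothetical, and the sketch assumes that an inflated sample is drawn entirely by resampling the original individuals (the text does not say whether the original individuals were retained and then topped up). Running STRUCTURE on each replicate is outside the scope of the example.

```python
import random
from typing import List


def inflate_with_replacement(sample_ids: List[str], target_size: int,
                             rng: random.Random) -> List[str]:
    """Bootstrap a small sample up to target_size by drawing with replacement."""
    return [rng.choice(sample_ids) for _ in range(target_size)]


def reduce_without_replacement(sample_ids: List[str], target_size: int,
                               rng: random.Random) -> List[str]:
    """Shrink a large sample to target_size by drawing without replacement."""
    return rng.sample(sample_ids, target_size)


rng = random.Random(2009)  # arbitrary seed, chosen for reproducibility

# Placeholder identifiers standing in for genotyped individuals.
african_american = [f"AA_{i}" for i in range(70)]
lales_controls = [f"LALES_{i}" for i in range(250)]

# Scheme 1: inflate a 70-subject reference sample to the LALES control size.
inflated = inflate_with_replacement(african_american, 250, rng)

# Scheme 2: reduce the LALES controls to 70 individuals.
reduced = reduce_without_replacement(lales_controls, 70, rng)

# Each scheme would be repeated 100 times, giving 100 replicate data sets.
replicates = [inflate_with_replacement(african_american, 250, rng)
              for _ in range(100)]
```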
3.5 Results: A total of 500 out of 538 genotyped subjects were included in
the final analysis after a sample call rate test was performed at the 0.80 level.
Age, gender, and self-reported geographic birthplace distributions for the 500
LALES subjects are given in Table 3.1. Recent Latino-based population studies
reported various ancestry estimates between Puerto Ricans and Mexican
Americans (Krueger, Siddens et al. 2004; Salari, Choudhry et al. 2005;
Choudhry, Coyle et al. 2006; Choudhry, Taub et al. 2008). Overall, LALES birth
locations were distributed as 68.4% Mexico, 18.2% USA, 5.4% El Salvador,
3.4% Guatemala, and 4.6% from other places. There is little difference between
cases and controls in this respect, as would be expected given that the inclusion
criteria for cases and controls in the original LALES cohort (n = 6357) study
design required a matched frequency for birthplace location.
Estimation of LALES Population Structure and Admixture: STRUCTURE
analysis estimates are given in Table 3.2; the log likelihood of the data, lnPr(X|K),
and the corresponding allele frequency difference measure $F_K$ are summarized
for each K = {2, …, 6}. Reported results represent an average from 3 different
runs, all of which gave consistent results, reflecting proper MCMC convergence.
Population structure for the LALES, USC MEP and NA samples for each of the K
= {2, …, 6} cluster models are illustrated in Figure 3.1. Previous studies suggest
that Latinos are a mixture of three main source populations (Native American,
European, and Asian), with rather little African descent (Salari, Choudhry et al.
2005; Ziv, John et al. 2006; Wang, Lewis et al. 2007; Choudhry, Taub et al.
2008). For this reason, we focus on the modeling results of K = 4, which has
the second largest likelihood [lnPr(X|K=4) = -116312.20] and for which the LALES
Latino admixture is partitioned as 45.2% Native American, 40.1% European, 9.8%
Asian, and 4.9% African-American (Table 3.3). We estimated the Latino
admixture proportions from the inclusion of LALES controls only (n = 250).
However, LALES Latino cases (n = 250) exhibit substantially the same ethnic
admixture.
Figure 3.1: Individual ancestry proportions for the LALES, MEC, and Native American sampled populations
Legend:
AA – MEC African Americans; MEC – MEC Latinos; NH – MEC Native Hawaiians; JP – MEC Japanese (AS);
CH-Sh – China Shanghai (AS); CH-Si – China Singapore (AS); CEPH – CEPH (EU); White – MEC Whites (EU);
LALES – LALES Latinos; NA – Native American.
Table 3.2: Simulation summary statistics of ancestry clustering models
              Nucleotide Divergence Estimate       LALES Latino Ancestry Proportion Estimates   MEC Latino Ancestry Proportion Estimates
K  LnP(D)     Fst1  Fst2  Fst3  Fst4  Fst5  Fst6   C1    C2    C3    C4    C5    C6             C1    C2    C3    C4    C5    C6
2  -122914    0.38  0.27  -     -     -     -      0.05  0.95  -     -     -     -              0.06  0.94  -     -     -     -
3  -118002    0.27  0.31  0.53  -     -     -      0.06  0.39  0.56  -     -     -              0.07  0.45  0.48  -     -     -
4  -116312    0.30  0.31  0.61  0.14  -     -      0.05  0.40  0.45  0.10  -     -              0.06  0.45  0.37  0.12  -     -
5  -116186    0.20  0.17  0.00  0.28  0.62  -      0.02  0.14  0.17  0.04  0.63  -              0.04  0.25  0.18  0.06  0.48  -
6  -116344    0.00  0.02  0.21  0.17  0.29  0.64   0.02  0.10  0.14  0.03  0.36  0.36           0.03  0.20  0.13  0.04  0.31  0.28
Note. For each K, the results are based on the average of 3 independent MCMC runs, each of the 3 iterations
having been initialized through a different seed; LnP(D) = Log Likelihood Probability of the Data;
Fst = Nucleotide distance or divergence between predefined clusters; C = Inferred Cluster.
Table 3.3: Estimation of ancestry proportions for the LALES, MEC, and
Native American populations
Population             AA     EU     AS     NA
MEC African American   0.721  0.167  0.066  0.047
MEC Native Hawaiian    0.031  0.329  0.591  0.049
MEC Japanese           0.012  0.034  0.892  0.061
Chinese - Shanghai     0.013  0.018  0.916  0.052
Chinese - Singapore    0.014  0.02   0.934  0.032
MEC European           0.009  0.939  0.029  0.022
CEPH                   0.015  0.891  0.056  0.038
Native American        0.005  0.013  0.032  0.949
MEC Latinos            0.059  0.453  0.116  0.373
LALES Latinos          0.049  0.401  0.098  0.452
Note. Estimates for the LALES Latino population are based on the LALES
controls; European – EU; African American – AA; Asian – AS;
Native American – NA.
Nucleotide distance dispersions of individual ancestry vectors for K = 4 are
plotted in Figure 3.2, where each individual is mapped on the triangular
coordinates between Native American, European and ‘Other’ ethnicities. Each
individual is positioned proportional to his/her ancestral similarity to each of the
three reference groups. Individuals placed at a particular corner are completely
assigned to the corresponding population, whereas those in the centroid area are
equidistant from the three group lineages. Measures of allele frequency
divergence between the four predefined clusters and of expected heterozygosity
between individuals within the same cluster are given in Table 3.10.
Figure 3.2: Cluster ancestry distribution for the LALES, multiethnic panel,
and Native American sampled populations
Legend:
LALES Latinos – turquoise; MEC Latinos – pink; African American – red;
Asian – burgundy; European – orange; Native Hawaiian – green.
In comparison to LALES Latinos, those ascertained through the MEC
cohort show a stronger relatedness to Europeans (45.3% vs. 40.1%), with
correspondingly lower Native American ancestry (37.3% vs. 45.2%) (Table 3.3).
This discrepancy is likely to be a consequence of differences in the selection of
individuals for the two cohorts from different birthplaces. Roughly 18% of the
LALES Latinos were born within the US and 68% within Mexico, with smaller
proportions born in Guatemala and El Salvador (Table 3.1). For the MEC sample
these proportions are somewhat different, with 47% of Latinos born in the US,
34% in Mexico, 10% in Central/South America, and 4% in Cuba. Three MEC
Latino individuals were of unknown birth origin. When we split the data by birth
origin (i.e. US vs. Mexico vs. Central/South America or El
Salvador/Guatemala), even though there are some differences in EU and NA
proportions between MEC and LALES Latinos, we detect in both cohorts a
common trend, a gradient of linear increase in NA ancestry from North (US) to
South (El Salvador for LALES or South America for MEC) with a corresponding
decrease in European descent (Table 3.4). Specifically, we observed a closer
resemblance between LALES and MEC US-born Latinos, with 42.0% vs. 39.3%
NA and 42.3% vs. 39.0% EU background. Conversely, Mexican and El
Salvadoran/Guatemalan based LALES Latinos show increasingly higher NA
lineages (48.6% and 51.7%, respectively) with correspondingly lower EU
contributions (36.7% and 29.6%, respectively). Similarly, MEC Latinos born
within Central/South America inherit higher NA (51.2%) and lower EU (37.5%)
ancestries.
Moreover, individual NA and EU ancestry distributions between
Salvadorans/Guatemalans and the rest of the LALES cohort were significantly
different (Wilcoxon signed p = 0.012 and 0.009, respectively). Since relatively few
individuals were born in El Salvador and Guatemala, we included both LALES
cases and controls for the Wilcoxon tests. We note however that separate
analyses of LALES cases or controls gave relatively similar ancestry estimates
(Table 3.5; Figure 3.3), resulting in non-significant differences for any of the NA,
EU, AA, or AS proportions (Wilcoxon signed p = 0.24, 0.39, 0.79, and 0.23,
respectively). Furthermore, while 17 SNPs were associated with AMD at the 0.05
level, none remained significant after Bonferroni correction (Table 3.8).
Inclusion of individual NA and EU estimates in an allelic regression model gave
similar results. While, for ease of interpretation, we focus our results on the
assumption of four source populations, the highest log-likelihood was obtained
at K = 5 [lnPr(X|K=5) = -116186.10], for which a new substructure explains
63.0% of LALES and 47.6% of MEC Latino ancestry, but is found in none of the
four founder populations (Figure 3.1; Table 3.6).
Table 3.4: Estimation of ancestry proportions for the LALES and MEC
Latinos by birthplace location
Latinos Birth Region NA EU AA AS
LALES El Salvador + Guatemala 0.52 0.30 0.12 0.07
Mexico 0.49 0.37 0.11 0.04
USA 0.42 0.42 0.12 0.04
Other 0.35 0.42 0.10 0.14
MEC Central/South America 0.51 0.37 0.06 0.05
Mexico 0.39 0.39 0.08 0.13
USA 0.35 0.47 0.04 0.14
Note. LALES = Los Angeles Latino Eye Study; MEC = Multi Ethnic Cohort; EU –
European; AA – African American; AS – Asian; NA – Native American.
Table 3.5: Ancestry informative markers with difference in allele frequency
(δ) greater than 0.3 between Native American and European ancestry among
Latinos
Marker Information MAF Allele Frequency δ
Locus Allele CHR Latinos NA EU AS AA NA/EU NA/AS NA/AA
rs6662385 T 1 0.35 0.66 0.20 0.77 0.01 0.46 0.11 0.65
C 1 0.35 0.34 0.80 0.23 0.99 0.46 0.11 0.65
CV11745078 G 2 0.28 0.04 0.37 0.48 0.98 0.34 0.45 0.94
A 2 0.28 0.96 0.63 0.52 0.02 0.34 0.45 0.94
rs1275988 C 2 0.31 0.10 0.87 0.36 0.91 0.77 0.26 0.81
T 2 0.31 0.90 0.13 0.64 0.09 0.77 0.26 0.81
rs2060447 C 2 0.23 0.89 0.17 0.86 0.02 0.72 0.03 0.87
T 2 0.23 0.11 0.83 0.14 0.98 0.72 0.03 0.87
rs2625051 C 2 0.17 0.90 0.56 0.87 0.04 0.34 0.03 0.87
T 2 0.17 0.10 0.44 0.13 0.97 0.34 0.03 0.87
CV74522 T 3 0.45 0.21 0.50 0.74 0.19 0.30 0.53 0.02
C 3 0.45 0.79 0.50 0.26 0.81 0.30 0.53 0.02
rs3796384 C 3 0.45 0.10 0.41 0.86 0.24 0.30 0.75 0.14
G 3 0.45 0.90 0.59 0.15 0.76 0.30 0.75 0.14
rs7611703 T 3 0.47 0.43 0.72 0.70 0.10 0.30 0.27 0.33
C 3 0.47 0.57 0.28 0.31 0.90 0.30 0.27 0.33
rs1921877 C 4 0.48 0.28 0.61 0.73 0.15 0.34 0.46 0.13
T 4 0.48 0.72 0.39 0.27 0.85 0.34 0.46 0.13
rs262838 G 5 0.47 0.76 0.29 0.14 0.72 0.47 0.63 0.04
A 5 0.47 0.24 0.71 0.86 0.28 0.47 0.63 0.04
rs4702813 G 5 0.32 0.82 0.04 0.71 0.01 0.78 0.11 0.82
A 5 0.32 0.18 0.96 0.29 0.99 0.78 0.11 0.82
rs874973 A 5 0.13 0.89 0.58 0.94 0.06 0.31 0.06 0.82
G 5 0.13 0.11 0.42 0.06 0.94 0.31 0.06 0.82
rs900379 T 5 0.41 0.18 0.54 0.66 0.18 0.36 0.48 0.01
C 5 0.41 0.82 0.46 0.34 0.82 0.36 0.48 0.01
CV11635757 A 6 0.46 0.83 0.25 0.18 0.82 0.58 0.65 0.01
G 6 0.46 0.17 0.75 0.82 0.18 0.58 0.65 0.01
rs1480642 T 6 0.26 0.45 0.10 0.00 0.85 0.35 0.45 0.40
C 6 0.26 0.55 0.90 1.00 0.15 0.35 0.45 0.40
rs222541 G 6 0.26 0.91 0.50 0.70 0.02 0.41 0.21 0.89
C 6 0.26 0.09 0.50 0.30 0.98 0.41 0.21 0.89
rs606548 C 6 0.41 0.00 0.73 0.61 0.61 0.73 0.61 0.60
T 6 0.41 1.00 0.27 0.39 0.39 0.73 0.61 0.60
rs1031402 A 8 0.27 0.93 0.56 0.68 0.01 0.37 0.25 0.93
G 8 0.27 0.07 0.44 0.32 0.99 0.37 0.25 0.93
rs2045638 A 8 0.16 0.93 0.58 0.83 0.01 0.35 0.10 0.92
G 8 0.16 0.07 0.43 0.17 0.99 0.35 0.10 0.92
rs2076974 C 10 0.15 0.99 0.57 0.85 0.14 0.42 0.14 0.85
T 10 0.15 0.01 0.43 0.15 0.86 0.42 0.14 0.85
rs959354 C 11 0.09 0.01 0.39 0.08 0.88 0.39 0.07 0.88
CV11287912 C 12 0.34 0.37 0.87 0.96 0.24 0.50 0.58 0.14
A 12 0.34 0.63 0.13 0.05 0.76 0.50 0.58 0.14
rs2293048 T 12 0.42 0.68 0.15 0.12 0.57 0.52 0.56 0.10
C 12 0.42 0.32 0.85 0.88 0.43 0.52 0.56 0.10
rs4766807 A 12 0.30 0.92 0.45 0.62 0.01 0.47 0.30 0.91
T 12 0.30 0.08 0.55 0.38 1.00 0.47 0.30 0.91
rs7995033 T 13 0.47 0.76 0.41 0.13 0.96 0.34 0.63 0.20
C 13 0.47 0.25 0.59 0.87 0.04 0.34 0.63 0.20
CV1436495 C 14 0.30 0.10 0.40 0.51 0.00 0.30 0.41 0.09
G 14 0.30 0.90 0.61 0.49 1.00 0.30 0.41 0.09
rs6495569 A 15 0.14 0.99 0.69 0.81 0.17 0.31 0.18 0.82
G 15 0.14 0.01 0.31 0.19 0.83 0.31 0.18 0.82
rs2217271 C 16 0.26 0.88 0.50 0.70 0.24 0.38 0.18 0.64
A 16 0.26 0.13 0.50 0.30 0.76 0.38 0.18 0.64
rs878522 G 20 0.28 0.98 0.65 0.37 0.98 0.34 0.62 0.00
A 20 0.28 0.02 0.35 0.63 0.02 0.34 0.62 0.00
rs727563 T 22 0.40 0.03 0.60 0.79 0.24 0.57 0.76 0.21
C 22 0.40 0.98 0.40 0.21 0.76 0.57 0.76 0.21
Note. SNP ID = Marker Name; CHR = Chromosome; δ = Allele frequency
difference.
Table 3.6: Proportion of membership of each pre-defined population in each of
the 5 clusters
Population             AA     EU     AS     NA     5th Cluster
MEC African American   0.736  0.138  0.042  0.025  0.059
MEC Native Hawaiian    0.023  0.292  0.589  0.018  0.079
MEC Japanese           0.006  0.020  0.92   0.033  0.021
Chinese - Shanghai     0.006  0.008  0.945  0.027  0.014
Chinese - Singapore    0.007  0.009  0.956  0.014  0.013
MEC European           0.005  0.959  0.011  0.009  0.016
CEPH                   0.009  0.893  0.036  0.013  0.049
Native American        0.003  0.008  0.022  0.947  0.020
Latinos from LA        0.037  0.253  0.058  0.176  0.476
LALES Latinos          0.019  0.137  0.041  0.173  0.630
Note. Estimates for the LALES Latino population are based on the LALES
controls; European – EU; African American – AA; Asian – AS;
Native American – NA.
Figure 3.3: Native American and European ancestry distribution among LALES cases and controls
Legend:
Cases EU = European ancestry among cases; CASES NA = Native American ancestry among cases;
Controls EU = European ancestry among controls; Controls NA = Native American ancestry among controls;
NA All = Native American ancestry in both cases and controls; EU All= European ancestry in both cases
and controls.
Selection of markers informative for distinguishing between Native
American and African, Asian, and European ethnicity: Given the importance of
allowing for the effects of population structure within associations studies, it
would clearly be useful to determine a set of SNPs that might be helpful in
untangling admixture in Latinos. The HapMap data is not sufficient for this
purpose since it contains no Native American individuals. With this in mind, we
compared allele frequency estimates between AA, AS, EU, and NA populations,
with the intent of defining sets of SNPs informative for differentiating NAs from
other populations. Table 3.6 summarizes the chromosomal positions and allele
frequencies for 42 AIMs at which we detected at least a 30% difference in allele
frequencies (δ > 0.3) between NA and EU populations.
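The selection rule above can be illustrated with a minimal Python sketch. It is not the code used in this study: the function name `select_aims`, the marker labels, and the toy allele frequencies are hypothetical, and δ is assumed to be the absolute difference in the frequency of the same allele between the two populations.

```python
def select_aims(freq_na, freq_eu, threshold=0.3):
    """Return {marker: delta} for markers whose allele-frequency
    difference between the two populations exceeds the threshold."""
    selected = {}
    for marker, p_na in freq_na.items():
        if marker not in freq_eu:
            continue  # marker not typed in both populations
        delta = abs(p_na - freq_eu[marker])
        if delta > threshold:
            selected[marker] = round(delta, 3)
    return selected


# Hypothetical allele frequencies (illustration only, not study data)
na_freqs = {"snpA": 0.80, "snpB": 0.45, "snpC": 0.10}
eu_freqs = {"snpA": 0.20, "snpB": 0.40, "snpC": 0.55}
aims = select_aims(na_freqs, eu_freqs)
print(aims)
```

In this toy input only snpA and snpC exceed the δ > 0.3 cut-off; snpB differs by 0.05 and is dropped.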
Tests for population structure and recent admixture: The HWE test was
used as a means of detecting population structure and/or recent admixture. None
of the 176 analyzed AIMs failed HW equilibrium. However, the overall distribution
of genotype homozygosity shows a greater shift to the right (higher
homozygosity) in the LALES Latinos than in any of the four founder/reference
populations (Figure 3.4). This tendency is reduced in the MEC Latinos. Figure
3.5 suggests a potential explanation: it compares the distribution of
homozygosity in LALES Latinos born within the US with that of those born outside the US. Given
that the MEC population contains a larger proportion of individuals born
within the US, a smaller signature of increased homozygosity might be expected.
Finally, from a total of 15,931 pair-wise SNP combinations we obtained a
subset of 15,163 pairs formed by SNPs positioned on different chromosomes,
and of these unlinked pairs 10.0% were significantly associated in the LALES
cohort compared to 6.7% in MEC Latinos. These results point towards evidence
for recent population admixture in Latinos that have recently populated the US,
since they compose most of the LALES cohort (~ 82%).
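For readers unfamiliar with the HWE test used above, the following is a minimal sketch of a one-degree-of-freedom chi-square test for Hardy-Weinberg equilibrium at a single biallelic marker. It is an illustration under stated assumptions, not the software used in this study: the function name and the genotype counts are hypothetical, both alleles are assumed to be present, and the p-value uses the identity that the upper tail of a chi-square with 1 degree of freedom equals erfc(sqrt(x/2)).

```python
import math


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square test (1 d.f.) of Hardy-Weinberg equilibrium.

    Assumes a biallelic marker with both alleles observed, so that
    all expected genotype counts are positive.
    """
    n = n_AA + n_Aa + n_aa
    p = (2 * n_AA + n_Aa) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail, 1 d.f.
    return chi2, p_value


# Hypothetical genotype counts: an excess of homozygotes
chi2, p_value = hwe_chi_square(40, 20, 40)
print(chi2, p_value)
```

An excess of homozygotes relative to the Hardy-Weinberg expectation, as in the toy counts, inflates the statistic; this is the signature of structure or recent admixture discussed in the text.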
Figure 3.4: Distribution of T-values for testing overall homozygosity and
heterozygosity trends
Note. s.d. = standard deviation; quart. = quartile.
Figure 3.5: Distribution of T-values for testing overall homozygosity and
heterozygosity trends in LALES Latinos born within the US versus LALES
Latinos born outside the US
Note. s.d. = standard deviation; quart. = quartile.
Effect of Sample Size on Admixture Estimation: We used two
sampling techniques to explore the effect of relative sample sizes on inferred
ancestry. In a first approach, we sub-sampled the LALES cohort to produce a
sample of size 70, broadly consistent with the other sample sizes in our data.
Despite the wide variation of estimated NA and EU admixture proportions within
LALES individuals, this approach typically resulted in estimates broadly similar to
those resulting from the initial dataset analysis (Table 3.9). Estimated NA and EU
ancestries had a mean (s.d.) over 100 sampled datasets of 45.0% (2.0%) and
42.0% (2.0%), respectively, compared to the original estimates of 45.2% and
40.1%. These results suggest that a sample of size 70 is large enough to result
in reliable estimates of admixture proportions, at least in the present context.
Using a second, bootstrapping (sampling with replacement) approach we
increased the smaller AS, AA, EU, and NA datasets to 250 individuals each,
matching the size of the LALES control set. We report average ancestry
estimates over 100 samples (Table 3.9; Figure 3.6). Mean EU ancestry now
increased to 44.3% (s.d. = 0.6%), with a correspondingly lower NA ancestry (42.2%,
s.d. = 0.7%) in LALES Latinos. While this outcome is only suggestive, it does seem
that a sample size of 70 individuals per ethnic group is sufficient to obtain reliable
estimates of ancestry. However, if there is a perceived need to increase the size
of smaller samples by using boot-strapping, somewhat altered estimates of
admixture proportions may result.
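The two resampling schemes described above can be sketched as follows. This covers only the resampling mechanics; the ancestry estimation itself (a STRUCTURE run on each resampled dataset) is not reproduced here. The function names, individual identifiers, and the seed are hypothetical.

```python
import random


def subsample(ids, size, rng):
    """Sub-sample without replacement (first approach)."""
    return rng.sample(ids, size)


def bootstrap(ids, size, rng):
    """Inflate a small sample by sampling with replacement
    (second approach)."""
    return rng.choices(ids, k=size)


rng = random.Random(2009)  # fixed seed so the sketch is repeatable

# Hypothetical identifiers: 250 Latino controls, 70 reference individuals
latino_ids = [f"LALES_{i}" for i in range(250)]
reference_ids = [f"REF_{i}" for i in range(70)]

# 100 replicate datasets of each kind, as in the text
reduced_sets = [subsample(latino_ids, 70, rng) for _ in range(100)]
inflated_sets = [bootstrap(reference_ids, 250, rng) for _ in range(100)]
```

Each replicate would then be analysed separately, and the mean and standard deviation of the resulting ancestry estimates summarised across the 100 replicates. Note that the inflated sets necessarily contain duplicated individuals, which is one reason their estimates need not match those from the original data.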
Table 3.7: Comparison of ancestry proportion medians (1st : 3rd quartile) among LALES Latinos by birth location and case-control status
Birthplace   Ancestry          Cases                Controls
El Salvador  African American  0.057 (0.029:0.095)  0.034 (0.017:0.067)
El Salvador  European          0.358 (0.282:0.455)  0.272 (0.187:0.376)
El Salvador  Asian             0.053 (0.026:0.210)  0.058 (0.03:0.125)
El Salvador  Native American   0.483 (0.380:0.511)  0.581 (0.405:0.653)
Guatemala    African American  0.035 (0.018:0.093)  0.071 (0.018:0.089)
Guatemala    European          0.283 (0.229:0.316)  0.366 (0.289:0.444)
Guatemala    Asian             0.022 (0.018:0.102)  0.034 (0.022:0.043)
Guatemala    Native American   0.628 (0.506:0.674)  0.576 (0.515:0.609)
Mexico       African American  0.031 (0.016:0.057)  0.029 (0.013:0.059)
Mexico       European          0.404 (0.261:0.519)  0.408 (0.302:0.503)
Mexico       Asian             0.059 (0.029:0.124)  0.068 (0.038:0.131)
Mexico       Native American   0.460 (0.334:0.606)  0.452 (0.345:0.547)
Other        African American  0.058 (0.042:0.119)  0.051 (0.031:0.083)
Other        European          0.391 (0.285:0.626)  0.419 (0.370:0.501)
Other        Asian             0.047 (0.024:0.079)  0.054 (0.042:0.092)
Other        Native American   0.337 (0.115:0.555)  0.462 (0.366:0.511)
USA          African American  0.023 (0.013:0.047)  0.038 (0.016:0.073)
USA          European          0.422 (0.336:0.514)  0.378 (0.289:0.511)
USA          Asian             0.055 (0.037:0.090)  0.069 (0.034:0.12)
USA          Native American   0.457 (0.368:0.529)  0.457 (0.331:0.620)
ALL          African American  0.031 (0.016:0.058)  0.032 (0.013:0.064)
ALL          European          0.404 (0.284:0.513)  0.371 (0.290:0.502)
ALL          Asian             0.057 (0.029:0.118)  0.066 (0.036:0.126)
ALL          Native American   0.466 (0.345:0.594)  0.460 (0.345:0.576)
Table 3.8: Association test for AIMs with AMD in LALES Latinos
Marker CHR Position Corr-Trend-P* Ancestry-Adjusted-P* Armitage-P* Exact-Armitage-P* OR(Dd/dd) LCB(Dd/dd) UCB(Dd/dd) OR(DD/Dd) LCB(DD/Dd) UCB(DD/Dd)
CV1766307 1 65,161,877 0.04 >0.05 0.04 0.05 1.97 1.29 3.01 0.81 0.52 1.26
rs2055314 3 244,035 0.01 0.01 0.01 0.01 1.34 0.91 1.99 1.43 0.87 2.37
rs900379 5 44,405,413 0.04 >0.05 0.04 0.04 0.75 0.51 1.11 0.76 0.45 1.28
rs1482680 5 44,427,899 0.01 0.01 0.01 0.01 0.63 0.41 0.97 0.80 0.51 1.25
rs267071 5 116,976,402 0.01 0.02 0.01 0.02 0.57 0.39 0.83 1.29 0.59 2.84
rs3808013 7 103,585,438 0.03 0.01 0.03 0.03 1.04 0.69 1.56 1.81 1.13 2.90
rs7791496 7 148,494,607 0.05 0.03 0.05 0.05 1.87 1.28 2.75 0.62 0.34 1.16
rs2124036 8 126,717,316 0.03 0.02 0.03 0.04 0.72 0.48 1.07 0.81 0.50 1.31
rs7865808 9 2,903,620 0.04 0.03 0.04 0.04 1.59 0.97 2.61 2.02 0.20 20.34
rs2230808 9 106,602,625 0.01 0.02 0.01 0.02 0.55 0.38 0.80 1.58 0.66 3.78
rs1335870 13 19,803,776 0.02 0.01 0.02 0.03 1.51 1.02 2.22 1.17 0.50 2.77
rs6495569 15 79,447,292 0.05 0.02 0.05 0.06 1.34 0.87 2.06 1.79 0.58 5.51
rs188324 15 93,501,509 0.01 0.01 0.01 0.01 0.67 0.45 1.02 0.76 0.48 1.21
rs2217271 16 12,611,981 0.02 0.03 0.02 0.02 0.86 0.59 1.24 0.38 0.17 0.87
rs1557519 16 14,158,804 0.03 0.03 0.03 0.03 0.76 0.51 1.15 0.40 0.12 1.30
rs878522 20 45,050,583 0.01 0.01 0.01 0.01 0.79 0.54 1.15 0.46 0.22 0.93
rs727563 22 40,197,323 0.02 0.03 0.02 0.02 0.61 0.41 0.90 0.96 0.57 1.60
Note. * P-values are not Bonferroni-corrected. After Bonferroni correction none of the AIMs remain
significantly associated with AMD; CHR = Chromosome; Corr = Correlation; P = P value; OR = Odds Ratio;
LCB = Lower Confidence Bound; UCB = Upper Confidence Bound.
Table 3.9: Bootstrap simulation results for the increased and decreased sample
size methods
Bootstrap Method       Statistic  EU Ancestry  NA Ancestry
Increased Sample Size  Mean       0.44         0.42
                       Median     0.44         0.42
                       Variance   4.38E-05     5.50E-05
                       S.D.       6.61E-03     7.42E-03
                       Minimum    0.42         0.39
                       Maximum    0.47         0.44
Decreased Sample Size  Mean       0.42         0.45
                       Median     0.42         0.45
                       Variance   2.54E-04     3.43E-04
                       S.D.       0.02         0.02
                       Minimum    0.38         0.39
                       Maximum    0.46         0.48
Note. Original sample analysis gave estimates for LALES Latinos of 45.2%
Native American and 40.1% European ancestry; S.D. = Standard Deviation;
EU = European; NA = Native American.
Table 3.10: Allele frequency divergence among populations (net nucleotide
distance) and expected heterozygosity between individuals
Statistical Measure            Population        AS     NA     EU     AA
Allele Frequency Divergence§   Asian             *      -0.07  -0.14  -0.43
                               Native American   -0.07  *      -0.19  -0.54
                               European          -0.14  -0.19  *      -0.65
                               African American  -0.43  -0.54  -0.65  *
Expected Heterozygosity**                        1.91   2.03   1.99   2.11
Note. § Allele frequency divergence between estimated clusters; ** Expected
heterozygosity between individuals in the same cluster.
Figure 3.6: Bootstrap re-sampling: distribution of European and Native American
ancestry frequencies in LALES Latinos
Note. EU = European; NA = Native American.
3.6 Discussion: Population stratification may induce confounding associations in
case-control study designs (Thomas and Witte 2002). For example, association
studies of recently admixed populations may produce spurious allelic
associations for markers that are in linkage disequilibrium with a causal gene, a
reason for replication failures in other populations (Cardon and Palmer 2003;
Campbell, Ogburn et al. 2005; Clayton, Walker et al. 2005; Helgason,
Yngvadottir et al. 2005). It is therefore necessary to first assess the extent of
admixture when designing association studies that involve populations such as
Latinos. This issue is further complicated by the fact that ancestry may vary on a
local scale. For example, Marrero et al. identified one population in Brazil to be of
mostly Caucasian ancestry, while another self-identified Caucasian population
was in fact composed of 36% Native American and 16% African ethnicity
(Marrero, Das Neves Leite et al. 2005). While the degree of genetic variation
seen within ‘Latino’ populations is not well understood, it is likely to exist, and to
have been continuously shaped by the migration routes and time periods during
which immigrants arrived in the Americas, and by the regional densities of Native
American populations.
In this study we evaluated admixture in Latinos ascertained through the
Los Angeles Latino Eye Study, the most comprehensive eye disease study in the
US (Varma, Fraser-Bell et al. 2004; Varma, Paz et al. 2004; Varma and Torres
2004; Varma, Ying-Lai et al. 2004). Despite the growing evidence for the role of
the complement pathway in the development of AMD, discordant frequencies for a series
of AMD risk alleles have been reported between different ethnic groups
(Klein, Klein et al. 1992; Mitchell, Smith et al. 1995; Grassi, Fingert et al. 2006;
Klein, Klein et al. 2006; Klein, Klein et al. 2007; Tedeschi-Blok, Buckley et al.
2007). Our paper raises awareness of the diversity within “Latinos” themselves
and of the necessity to further assess appropriate risk and treatment
management. It also provides a resource for future invasive examination of
ancestry-specific AMD mechanisms or of other related biological pathways. A
distinctive characteristic of the LALES study is the ascertainment of Latinos from
different geographic regions, a factor that allowed us to better characterize the
extent of Native American and European variation within Latinos. Our findings
differ from two previous studies, in which Mexican Americans were estimated to
inherit higher NA (51.7%; 52.0%) and EU (44.9%; 45.0%) background compared
to Mexico-born LALES Latinos (48.6 % NA and 36.7 % EU) or to the overall
LALES cohort (45.2 % NA and 40.1 % EU), while having roughly similar African
contributions (Collins-Schramm, Chima et al. 2004; Salari, Choudhry et al. 2005).
This difference largely arises from the fact that our analysis focused on using K =
4 clusters (AA, AS, EU, and NA) in the STRUCTURE analysis, whereas the
earlier studies used K = 3 (AA, EU, and NA). When we replicate their approach
by excluding Asians and running an analysis with K = 3, we recover broadly the
same estimated ancestry proportions in both Mexican LALES Latinos (53.4 % NA
and 40.3 % EU) and the overall LALES cohort (49.3 % NA and 41.1 % EU).
Increased homozygosity is a commonly-used signature for admixture. In
our study we observe elevated levels of homozygosity in our Latino study
populations. The increase in homozygosity is higher in the LALES Latinos than in
those from the MEC cohort, an indicator of more recent population admixture
among Latinos that have migrated recently to the US. Indeed, when we
compared US with non-US born LALES Latinos, we observed an increase in the
level of homozygosity in the latter. Another indicator of recent admixture and/or
population structure is allelic associations between markers positioned on
different chromosomes. Among unlinked locus pairs, 10.0% were associated in LALES
Latinos versus 6.7% in MEC Latinos, an additional confirmation of heterogeneity within
Latinos. Finally, in an attempt to aid the design of future studies involving Latinos,
we reported a set of SNPs with high differences in allele frequencies between
Native American and European populations.
The issue of whether the results from a STRUCTURE analysis are
affected by discrepancies between sample sizes across ethnic groups within a
study design is not typically addressed. We approached this issue in two ways.
Firstly, we constructed new samples by sub-sampling the Latino population in our
existing study, sampling randomly without replacement, to produce new samples
in which the size of the Latino population was comparable to that of the other
ethnic groups within the data (n=70). Results of STRUCTURE analyses for these
new datasets were consistent with those obtained from the original data. This
suggests two things. First, unequal sample sizes do not appear to bias estimates
of ancestry, at least in the context of the present paper. Second, it supports
the belief that sample sizes of 25 or greater are typically sufficient to give
meaningful estimates of ancestry. In a second approach to this problem we tried
another common strategy, inflating sample sizes by boot-strapping. When
following this approach, ancestry estimates did appear to change from those
found in the original sample. While these results are clearly only suggestive, they
do imply that caution should be exercised before employing such an approach.
In summary, we found strong evidence for recent population admixture in
Latinos ascertained through the LALES cohort, most of whom have migrated
recently from outside the US into the LA area. By specifically incorporating, and
in some cases collecting genotype data for each of the likely source populations,
we were able to identify the ethnicity related to each component of the Latino
genetic make-up. The highest ancestral component was Native American, with
gradients of increasing NA ancestry as a function of birth origin from North to
South (US, Mexico, Guatemala, El Salvador).
These findings reflect the historic migration patterns of the NA population
and suggest that while the ‘Latino’ label is used to categorize the entire
population, there exists a strong degree of heterogeneity within that population,
and that it will be important to assess this heterogeneity, and control for it, within
future association studies on Latino populations.
Chapter 4
Copy Number Variation (CNV) Association Methods
4.1 CNV Objective: One of the most important challenges facing biology
nowadays is to understand how genetic variation affects phenotype. CNV
analysis has become a rapidly growing field focused on detecting the variation
and location of DNA copy number polymorphisms (CNPs), and the degree to
which CNPs influence phenotypic variation. Several algorithms have already
been developed for detecting the genome location and copy numbers of inherited
DNA segments. However, we propose to further exploit the effect of variation in
DNA alteration by developing a series of tests that will enable us to statistically
verify whether there is an association between disease status and the number
and/or amplitude of altered DNA. Moreover, the methods necessary for running
CNV tests of association will be implemented through a software program (CNA,
Copy Number Association) that will soon (2009) be made available on the
Bioconductor website for Bioinformatics (http://www.bioconductor.org).
4.2 CNV Background: The 2006 publication of Feuk et al. about human “copy-
number variation” (CNV) generated a paradigm shift that dramatically expanded
our understanding of genome differences between individuals (Feuk, MacDonald
et al. 2005). Namely, the discovery of human DNA copy number variation and its
effect on expression and regulation of genes strongly suggested that genetic
association studies based on SNPs alone were insufficient, whereas a combined
analysis of SNPs and CNVs would more efficiently expand the molecular
diagnosis of complex diseases. Feuk defined a CNV to be any chromosomal
change affecting between 1000 and approximately half a million nucleotides of
genomic DNA. Their study found an approximately 12% variation in the human
genome, shattering the formerly accepted idea, based on SNP comparisons, that
humans are more than 99.7% identical at the genome level. Chromosomal
rearrangements occur during the “fork stalling and template switching”
recombination process of cell division. Lee and Lupski et al. found that during
meiosis the copying of DNA stalls whenever there is a problem
with the transcription of DNA, allowing the duplication complex to switch and
copy a similar but different segment before transferring back and restarting at the
original fork (Lee, Carvalho et al. 2007). Thus, meiotic recombinant products are
composed of various numbers of deletions and/or insertions of DNA segments
(Figures 4.1 and 4.2).
Figure 4.1: CNV Illustration for Deletion and Duplication of DNA
Note. This image has been released into the public domain by LadyofHats
(http://commons.wikimedia.org/wiki/User:LadyofHats).
Figure 4.2: DNA Fork and Stall Replication
Note. This image has been released into the public domain by LadyofHats
(http://commons.wikimedia.org/wiki/User:LadyofHats).
While large scale copy number alterations affect disorders such as
Down’s syndrome, Pelizaeus-Merzbacher disease, or Klinefelter’s syndrome, regional
CNVs have been linked to cancers (e.g. loss of RB tumor suppressor in
retinoblastoma) and other complex disorders (e.g. hyperglycemia). Thus, an
overlap between CNVs and a map of single gene disorders identified 300 well
known disease-causing genes (Feuk, MacDonald et al. 2005). Moreover, a study
based on a high-resolution map of segmental DNA in the mouse genome reported
germline copy number polymorphisms in the CFH region, one of the well known
genetic risk factors for developing AMD (Graubert, Cahan et al. 2007).
The ubiquity of CNVs suggests that both deletions and gene
dosage increases may be underlying rare and common diseases. Nevertheless,
while genomic deletions in healthy individuals may not trigger a monogenic
disease, environmental and/or additional genetic factors may contribute to the
development of complex diseases.
4.3 Methods for estimating variation in chromosomal location and number
of altered DNA segments: Numerous algorithms (Kallioniemi, Kallioniemi et al.
1992; Pollack, Perou et al. 1999; Fridlyand 2004; Huang, Wei et al. 2004;
Olshen, Venkatraman et al. 2004; Zhao, Li et al. 2004; Hsu, Self et al. 2005;
Huang, Wu et al. 2005; Nannya, Sanada et al. 2005; Broet and Richardson 2006;
Lipson, Aumann et al. 2006; Marioni, Thorne et al. 2006) were developed to
evaluate the types and levels of hybridization in gene expression arrays. In
general, copy number alteration (CNA) probes allow co-hybridization between
fluorescently tagged genome (segments of interest) and reference genome
(normal DNA). Thus, the copy number for a region is directly proportional to the
relative intensity of the fluorescent probe (Figure 4.3).
Figure 4.3: CNV Hybridization Procedure
Note. Permission for Graphics from Transgene S.A.
Most of these algorithms are built on the assumption that the genome
contains relatively long DNA sequences composed of a constant number of
copies. For a given number of probes m mapping to a specific position, the
number of copies $c_m$ are assumed to be piecewise constant (PWC) and to have
discrete values (DIS). However, due to biological and technical contaminations,
these two properties cannot be observed, but instead a general model takes the
form $y_m = x_m + e_m$, where $x_m$ represents the average log intensity of hybridization,
and $e_m$ is an additive zero-mean white random process (Figure 4.4). A recent
summary study reviewed the performance and accuracy of CNV algorithms that
exploit the PWC and DIS assumptions, and found the circular binary
segmentation (CBS) proposed by Olshen et al (2004) to be most accurate, but
also one of the slowest methods (Olshen, Venkatraman et al. 2004; Pique-Regi,
Monso-Varona et al. 2008).
Figure 4.4: Modeling of probe hybridizations (Pique et al., 2008)
4.3.1 GADA Algorithm: While a range of techniques have dealt with CNV
segmentations, a newly developed method implemented by Pique et al in the
Genome Alteration Detection Algorithm (GADA) includes a combination of
approaches that achieve greater accuracy and faster computation time (Pique-
Regi, Monso-Varona et al. 2008). It is for this reason that the CNV association
tests proposed in here will be based on the segmentation results obtained
through the method of Pique et al (2008). GADA employs three main steps: (1) it
uses a compact linear algebra representation for the genome copy number from
normalized probe intensities, (2) it applies sparse Bayesian learning (SBL), and
(3) it uses a backward elimination (BE) procedure that ranks the inferred points
from the SBL approach and also efficiently adjusts the accuracy trade-off
between sensitivity and false discovery rate (FDR). The combined sequence of
SBL and BE achieves better accuracy because each method reduces the impact
of the assumptions made by the other.
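To make the piecewise constant observation model concrete (observed intensity = piecewise constant signal + zero-mean noise), the following minimal sketch simulates such a signal and recovers the segment means when the breakpoints are already known. It does not implement SBL or backward elimination and is not the GADA software; the function names, breakpoint positions, levels, and noise level are all hypothetical.

```python
import random


def piecewise_constant(breakpoints, levels, m):
    """Build x_0..x_{m-1}: levels[k] holds between consecutive breakpoints."""
    assert len(levels) == len(breakpoints) + 1
    x, k = [], 0
    for pos in range(m):
        while k < len(breakpoints) and pos >= breakpoints[k]:
            k += 1  # move to the next segment at each breakpoint
        x.append(levels[k])
    return x


def observe(x, sigma, rng):
    """Add zero-mean Gaussian noise e_m to the true signal x_m."""
    return [x_m + rng.gauss(0.0, sigma) for x_m in x]


def segment_means(y, breakpoints):
    """Average the observations within each segment (breakpoints known)."""
    edges = [0] + list(breakpoints) + [len(y)]
    return [sum(y[a:b]) / (b - a) for a, b in zip(edges, edges[1:])]


rng = random.Random(1)
x = piecewise_constant([30, 50], [0.0, 0.8, 0.0], 100)  # one hypothetical gain
y = observe(x, sigma=0.2, rng=rng)
print(segment_means(y, [30, 50]))
```

The hard part that GADA addresses is the step this sketch skips: inferring where the breakpoints are from y alone.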
In short, SBL identifies a set of breakpoints with a specified initial level of
sparseness controlled by the prior parameter a. An increase in a leads to a faster
convergence rate of the EM algorithm and to a sparser solution. Nonetheless, if a
is too large, overly rapid convergence of the EM algorithm is not
necessarily desirable, as a local minimum may consequently give a suboptimal
placement of breakpoints. Correction for EM local minima can be obtained by
testing the statistical evidence for each breakpoint. However, since SBL may still
include some spurious (false) breakpoints, they are then removed through the BE
procedure.
Backward Elimination is a greedy algorithm that ranks the breakpoints
provided by SBL through a critical value T used to establish the final degree of
desired sparseness. The flexibility in adjusting the final set of breakpoints is thus
controlled through both a and T, with high initial sensitivity acquired through SBL,
and with final adjustment for FDR through BE. Given the computational speed
and superior performance of GADA over previously developed ones, we
implement it for the genome-wide 500k Affymetrix chip LALES data, and further
use its segmentation output for the association tests we propose. A numerical
representation of the GADA segmentation technique is shown in Figure 4.5,
while a graphic illustration for detection of over-expression of the MYCN gene in
neuroblastoma tumor cell lines is given in Figure 4.6. In addition, a preliminary
test on chromosome 1 for the set of 303 LALES individuals genotyped for the
500k Affymetrix chip show the variation in CNVs across both chromosome and
subjects (Figure 4.7). A detailed analysis of this chromosome will be included in
the final report of this project.
Figure 4.5: Numerical characterizations for the GADA piecewise component
algorithm of representing CNV vectors (Pique et al, 2008)
Figure 4.6: GADA detection of CNV in Neuroblastoma Tumor Cell Lines (Pique et
al, 2008)
4.3.2 Helix Copy Number Analysis (CNAM) Algorithm: In order to better
compare the validity and overall results obtained through the GADA
segmentation algorithm we will also employ the recently released GoldenHelix
Copy Number Analysis method/module. CNAM is designed to find local
variations over markers and offers two types of segmenting methods: univariate
and multivariate. Both methods use the same algorithm. For a given sample, the
covariate is the mean of the intensities within each segment for that sample. The
two methods differ in the criteria for determining cut-points. The univariate
method segments each sample separately, while the multivariate method
searches for copy number regions that may be similar across all samples,
segmenting all samples simultaneously. The multivariate method is preferable for
detecting very small copy number regions, and for finding conserved regions,
such as CFH. A schematic illustration for the two methods is shown in Figure 4.8.
Figure 4.7: CNV analysis on Chromosome 1 for 303 LALES Latino subjects
Note. (A) Sensitivity and FDR parameters are: a = 0.2, T = 2.
(B) Sensitivity and FDR parameters are: a = 0.2, T = 3.
Figure 4.8: Representation of univariate and multivariate GoldenHelix CNAM
segmentation algorithm (www.goldenhelix.com)
The CNAM algorithm also uses a permutation test to verify copy number
segments. While a t-test could be used to test for differences
between adjacent pairs of segments, the final p-value would then
require a multiple-comparison correction over all cut-points. Since there is strong
correlation between adjacent markers, a Bonferroni adjustment would be too
conservative and therefore underestimate the true variation in DNA
segmentation. Instead, a permutation test is performed by randomizing the
marker information and performing the same univariate or multivariate procedure
as for the original statistic.
Briefly, if MaxP is the maximum pair-wise permuted p-value parameter, for each
adjacent pair of segments the CNAM permutation method applies the following
procedure:
1. Calculate the original sum of squared deviations from the means of the
two adjacent segments.
2. Set count = 1.
3. Do the following 10 ⁄ MaxP - 1 times:
3.1. Randomly shuffle the marker label columns (i.e. multivariate columns
are kept together).
3.2. Find the optimal two-way split that minimizes the sum of squared
deviations from the means within the random data.
3.3. If the randomly computed sum of squared deviations is less than or equal
to the original sum of squared deviations, set count = count + 1.
4. Compute the pair-wise p = count ⁄ (10 ⁄ MaxP).
If the permuted pair-wise p ≤ MaxP for all adjacent pair-wise segments, all of the
segments are statistically significant. Otherwise, CNAM repeats the procedure
with the optimal k - 1 way segmenting, until either all pair-wise segments are
found significant, or all the data in a region represents a single segment.
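The permutation procedure above can be sketched for the univariate case as follows. This is an illustrative reading of the listed steps, not GoldenHelix code: the function names are hypothetical, a single sample's intensities stand in for the marker columns, and the comparison against the original sum of squared deviations is assumed to happen inside the permutation loop.

```python
import random


def ssd(values):
    """Sum of squared deviations from the mean of one segment."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def two_segment_ssd(values, split):
    """SSD around the two segment means for a split at index `split`."""
    return ssd(values[:split]) + ssd(values[split:])


def best_split_ssd(values):
    """Smallest two-segment SSD over all possible two-way splits."""
    return min(two_segment_ssd(values, s) for s in range(1, len(values)))


def pairwise_permutation_p(values, split, max_p, rng):
    """Permuted p-value for one adjacent pair of segments."""
    original = two_segment_ssd(values, split)   # step 1
    n_total = round(10 / max_p)
    count = 1                                   # step 2
    shuffled = list(values)
    for _ in range(n_total - 1):                # step 3
        rng.shuffle(shuffled)                   # step 3.1
        if best_split_ssd(shuffled) <= original:  # steps 3.2 and 3.3
            count += 1
    return count / n_total                      # final pair-wise p


rng = random.Random(7)
clear_step = [0.0] * 10 + [1.0] * 10   # hypothetical, sharply separated pair
p_clear = pairwise_permutation_p(clear_step, 10, max_p=0.05, rng=rng)
flat = [0.5] * 20                      # hypothetical, no real cut-point
p_flat = pairwise_permutation_p(flat, 10, max_p=0.05, rng=rng)
```

With MaxP = 0.05 the loop runs 199 times, so the smallest attainable p-value is 1/200. A sharply separated pair is rarely matched by shuffled data and yields a small p, whereas a flat region is matched by every shuffle and yields p = 1.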
4.3.3. HELIX CNAM Association Test: The copy number segment covariates
identified through CNAM were discretized into three-state (-1,0,1) covariates at
thresholds that signify a transition between copy number states. Interactive tree
analysis was then performed to find the most significant segments. P-values
were generated from an analysis of deviance test, a mixture of chi-squared
distributions with 1 and 2 degrees of freedom. For each marker the segmenting
algorithm decides on a two or three way split which changes the number of
degrees of freedom. Adjusted p-values take into account the number of multiple
tests that were considered for each marker. A benefit of discretizing the data is
that it does not allow outliers (extremely small or large logR values) to have more
influence on p-values than those logRs close to the chosen threshold. However,
we note that since the ‘best’ chosen p-value of each marker is drawn from a chi-square
distribution that may differ in the degrees of freedom from those of the
other markers, a uniform distribution of p-values cannot be obtained. We report
adjusted p-values, where for each marker the adjustment is correcting for the
multiple testing of searching through all possible cut-points to find the optimal
ones.
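The discretization step can be illustrated with a short sketch. The thresholds of -0.3 and 0.3 are hypothetical placeholders, not the values chosen in this analysis, and the function name is an assumption.

```python
def discretize(log_ratios, loss_threshold=-0.3, gain_threshold=0.3):
    """Map segment logR values to copy states: -1 loss, 0 normal, 1 gain."""
    states = []
    for value in log_ratios:
        if value < loss_threshold:
            states.append(-1)
        elif value > gain_threshold:
            states.append(1)
        else:
            states.append(0)
    return states


# Hypothetical segment means, including two extreme values
states = discretize([-2.5, -0.4, -0.1, 0.0, 0.4, 3.0])
print(states)
```

The extreme values -2.5 and 3.0 receive the same codes as the near-threshold values -0.4 and 0.4, which is the sense in which outliers gain no extra influence on the resulting p-values.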
4.4 Proposed Tests of Association for Copy Number Variation: CNV
algorithms perform segmentations of sampled genomes into regions of lost,
normal, or inserted DNA material. However, these methods do not provide a
statistical measure for assessing levels of association between the number and
length of altered DNA copies and the risk for disease (e.g. case/control
status, different stages of a tumor). CNV association tests could offer a
quantification and clear interpretation of the effect of variation in recombinant
DNA product and of expression and regulation of genes. The following sections
propose several methods designed for the purpose of testing association
between disease and CNV status.
The difficulty in conducting a test of association between disease status
and CNV segmentation arises because one individual’s segmentation into
DNA copy states does not necessarily occupy the same physical genetic
positions as the segmentations of other individuals, nor do segments carry the
same number of copies for a given copy state k (i.e. lost, normal, or inserted
DNA). For example, while one individual may inherit a gain of DNA material
between chromosomal positions (j, j+1), a second individual may contain a range
of copy states k in the same (j, j+1) location. Thus, a straightforward comparison
of copy number variation at a particular position or region of interest requires
cautious consideration. Two main factors influence the type and interpretation of
the proposed CNV association tests: (1) the minimum segmentation length
shared by all samples, and (2) the arbitrary region of interest, which may be
composed of several types of shared or unshared DNA copy states (e.g. an entire
gene, promoter, or any genomic length sequence) (Figure 4.9). For each of these
two options a series of four association tests are described.
Minimally shared DNA copy segmentation: The first segmentation
option is determined by the distinct positions at which a variation in DNA
alteration occurs within each individual (Figure 4.9). Thus, for a set of individuals
i = 1, …, n, and for loci positions j = 1, …, m, the test of association for CNV
state may be computed for each segment length [j, j+1] or for the minimum set [j,
j+p] that intersects all samples, where j < p ≤ m. Boundary limits [j, j+p] are
physically mapped for all of the defined CNV fragments of every individual i = 1,
…, n. The set of intersection points for each CNV fragment across all individuals
forms a new set of shared DNA segments for which “Conditional” CNV Chi-
square tests are applied (Figure 4.9).
Figure 4.9: Illustration of copy number variation segmentation across four
samples
The advantage of assessing the CNV effect for a predefined or
minimum segment rests in the fact that the DNA state k for any given individual i
will remain constant (loss, normal, or gain), though it will vary across individuals.
This method is preferable for finding the effect of very small copy number
regions, and for assessing the significance of segmentations as predefined
through CNV algorithms. However, the drawback of this approach rests in the
lack of a broader interpretation for the role of DNA alteration in a particular region
of interest such as an entire gene, a region that could encompass a series of
sequential CNV segments. Also, due to the increased number of intersecting
CNV subsets across all samples, we need to adjust the Bonferroni corrected p-
values for a high number of segments, decreasing the power for detecting
significance. However, in order to avoid overcorrection of p-values, a permuted
method is also implemented.
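The construction of minimally shared segments can be sketched as follows: the breakpoints of all individuals are pooled, and each interval between consecutive pooled breakpoints carries exactly one copy state per individual. This is a minimal illustration under stated assumptions, not the CNA software: function names are hypothetical, each individual's segmentation is assumed to be a list of half-open (start, end, state) intervals covering the same range, and states are coded -1, 0, 1.

```python
def minimal_shared_segments(segmentations):
    """Pool all breakpoints and return the intervals between them."""
    cuts = set()
    for segments in segmentations:
        for start, end, _state in segments:
            cuts.update((start, end))
    edges = sorted(cuts)
    return list(zip(edges, edges[1:]))


def state_at(segments, start, end):
    """Copy state of one individual on the interval [start, end)."""
    for seg_start, seg_end, state in segments:
        if seg_start <= start and end <= seg_end:
            return state
    raise ValueError("interval is not contained in a single segment")


# Hypothetical segmentations of loci 0..9 for two individuals
ind1 = [(0, 4, 0), (4, 10, 1)]
ind2 = [(0, 6, 0), (6, 10, -1)]
shared = minimal_shared_segments([ind1, ind2])
table = {iv: [state_at(s, *iv) for s in (ind1, ind2)] for iv in shared}
print(table)
```

Each row of `table` is one shared segment with a single state per individual, which is the input a per-segment association test needs. The number of rows grows with the number of distinct breakpoints, which is why the multiple-testing burden increases under this option.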
Arbitrary DNA Region of Interest: The second option allows one to analyze
an entire region of interest, and therefore to better interpret the biological effect of
a gene’s copy variation on risk of disease. The complexity of evaluating an
arbitrary sequence flanked by any two loci (j, j+p), where j < p ≤ m, is due to (1)
the variation in altered states k within an individual’s genome between the two
bordering markers, and (2) the variation between individuals for number of copies
for either lost, normal, or inserted DNA in that region. To solve this discordance,
a series of approaches are proposed in sections 4.3.1 – 4.3.2. In short, a
summary statistic for either (1) changed versus normal DNA copy state, or (2)
lost, normal, and inserted DNA is computed for each individual. Chi-
square tests are then applied for each chosen region to test association between
CNV status and disease.
4.4.1 Chi-square Test for Assessing the Effect of Change vs. Normal Copy
State on Disease: In order to gain a preliminary estimation for the effect of
change (Loss or Gain) vs. normal DNA copy, a Chi-square test with 1 degree of
freedom is employed for testing association with case/control status. While loss
and gain are both coded as “change”, the advantage of this preliminary test is in
obtaining an overall estimate of the amount of abnormal DNA copy and its
association with disease status. Moreover, as sparse frequency in either loss or
gain of DNA material among individuals may occur for some CNV segments, the
inclusion of both loss and gain in counting “change” states avoids the possibility
of obtaining fewer than 5 counts per cell. Thus, a two by two table with
case/control vs. change/normal copy status is constructed for each DNA
segment. A Bonferroni p-value is then obtained as a correction for the total
number of tested segments. It is important to note that this test does not consider
the intensity or amplitude of copy number variation at a given position, but only
the copy status is recorded for frequency counting (i.e. change vs. normal copy).
Association for change/normal copy state and case/control status can be
applied either for each of the minimum shared DNA copies across all individuals,
or for an arbitrary sequence.
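As a minimal sketch of the change vs. normal test, the following computes the 1-degree-of-freedom Pearson Chi-square for each segment's two by two table and then applies the Bonferroni correction. The function names and the example counts are hypothetical, and the omission of a continuity correction is an assumption, since the text does not say whether one was used.

```python
import math


def chi2_change_vs_normal(case_change, case_normal, ctrl_change, ctrl_normal):
    """Pearson chi-square (1 df, no continuity correction) for a 2x2 table.

    Returns (statistic, p_value). Raises ValueError if any margin is zero,
    since the test is undefined in that case.
    """
    a, b, c, d = case_change, case_normal, ctrl_change, ctrl_normal
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / margins
    # Survival function of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def bonferroni(p_values):
    """Multiply each raw p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical counts per segment:
# (case change, case normal, control change, control normal).
segment_counts = [(30, 71, 35, 167), (12, 89, 25, 177)]
raw = [chi2_change_vs_normal(*counts)[1] for counts in segment_counts]
for counts, p, adj in zip(segment_counts, raw, bonferroni(raw)):
    print(counts, f"raw p = {p:.4g}", f"Bonferroni p = {adj:.4g}")
```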
4.4.2 Conditional Chi-square Test for Assessing the Effect of Lost, Normal,
or Inserted DNA Copy on Disease Status: Let the number of individual
samples i run from i = 1 to n, and loci positions range from j = 1 to m. Also, let
DNA Copy Loss = L; Normal DNA Copy = N, DNA Copy Insertion/Gain = G.
Table 4.1 illustrates the tabulation scheme for DNA copy vs. case/control status
when calculating Chi-square tests of association. Because we observe several
instances where lost, normal, or gained DNA fragments are
absent in both cases and controls, leading to zero-count cells, we describe a
conditional Chi-square test method that can be implemented for both a
categorical coding of DNA copy state (i.e. Lost – Normal – Inserted), and also for
a weighted measure that takes into account both the intensity of hybridization
and the length of hybridized DNA for a given DNA sequence.
Table 4.1: Chi-square tabulation scheme for DNA copy vs. case/control status
DNA Copy States Loss Normal Gain
Cases Cs_l Cs_n Cs_g
Controls Cn_l Cn_n Cn_g
Note. Cs = Case; Cn = Control; l = loss; n = normal; g = gain.
The conditional Chi-square technique is as follows: for segment j and across all i
individuals:
1. If all LNG cell frequencies are greater than or equal to 5, perform Chi-square on
2 degrees of freedom for a three by two table (LNG vs. case/control
status) → $\chi^2_{LNG}$
2. If there is NO Loss of DNA in BOTH cases and controls (i.e. Loss
frequency L = 0), perform a Chi-square test on 1 degree of freedom for
Normal vs. Gain in DNA copies → $\chi^2_{NG}$
3. If N = 0 in both cases and controls, perform a Chi-square test on 1 degree
of freedom for Loss vs. Gain in DNA copies → $\chi^2_{LG}$
4. If G = 0 in both cases and controls, perform a Chi-square test on 1 degree
of freedom for Normal vs. Loss in DNA copies → $\chi^2_{NL}$
5. If L < 5 in cases and L ≥ 5 in controls, or L ≥ 5 in cases and L < 5 in
controls, perform $\chi^2_{NG}$
6. If N < 5 in cases and N ≥ 5 in controls, or N ≥ 5 in cases and N < 5 in
controls, perform $\chi^2_{LG}$
7. If G < 5 in cases and G ≥ 5 in controls, or G ≥ 5 in cases and G < 5 in
controls, perform $\chi^2_{NL}$
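The seven rules above can be read as a small decision procedure. The sketch below is one possible reading, not the implementation used in this work: the names `pearson_chi2` and `conditional_chi2` are hypothetical, the rules are checked in the order listed, and configurations that none of the rules covers (for example L < 5 in both groups but not zero) return no test, which is an assumption because the text leaves them unspecified.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square for a 2 x c table with c in {2, 3}.

    Returns (statistic, df, p_value), or None if any margin is zero.
    """
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    if 0 in row_tot or 0 in col_tot:
        return None
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = len(col_tot) - 1
    # Closed-form chi-square survival functions for 1 and 2 df.
    p = math.erfc(math.sqrt(stat / 2)) if df == 1 else math.exp(-stat / 2)
    return stat, df, p


def conditional_chi2(cases, controls, min_count=5):
    """cases/controls: dicts with counts under keys 'L', 'N', 'G'.

    Returns (label, result), where label names the test that was chosen.
    """
    states = ("L", "N", "G")
    # Pair of states that is still tested when a given state is dropped.
    remaining = {"L": ("N", "G"), "N": ("L", "G"), "G": ("N", "L")}

    def pair_test(dropped):
        a, b = remaining[dropped]
        table = [[cases[a], cases[b]], [controls[a], controls[b]]]
        return a + b, pearson_chi2(table)

    # Rule 1: every cell has at least min_count observations.
    if all(group[s] >= min_count for group in (cases, controls) for s in states):
        table = [[cases[s] for s in states], [controls[s] for s in states]]
        return "LNG", pearson_chi2(table)
    # Rules 2-4: a state is absent in both cases and controls.
    for s in states:
        if cases[s] == 0 and controls[s] == 0:
            return pair_test(s)
    # Rules 5-7: a state is sparse in exactly one of the two groups.
    for s in states:
        if (cases[s] < min_count) != (controls[s] < min_count):
            return pair_test(s)
    return None, None  # not covered by the listed rules (assumption)
```

For example, `conditional_chi2({"L": 0, "N": 50, "G": 51}, {"L": 0, "N": 150, "G": 52})` falls under rule 2 and selects the Normal vs. Gain test.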
4.4.3 Wilcoxon gene-based Association Test: An alternative approach to the
Chi-square test is to relate SNP variation to CNV, and to adopt an approach akin
to the tag-SNP concept. Through this scheme we attempted to move past the
perils of a strategy based upon marginal tests by conducting Roger Pique’s
method, a gene-at-a-time analysis, relating mean copy number within a gene to
phenotype. Such an approach seems reasonable, succeeds in reducing the
analysis to a reasonable number of tests, and thereby avoids the worst excesses
of multiple comparison corrections.
For this purpose the proposed Wilcoxon test for computing the average, lost,
normal, or gained DNA difference between cases and controls has been applied
for every gene. For each chromosome we first tagged the location of each gene
according to the Affymetrix map information.
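To illustrate the gene-based comparison, the sketch below applies a Wilcoxon rank-sum test to per-individual mean copy numbers for one gene. It is a simplified stand-in, not the code used for the reported results: the function names are hypothetical, the p-value uses the normal approximation without tie or continuity corrections (an assumption), and the input values are invented.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def wilcoxon_rank_sum(cases, controls):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for cases, z score, p-value).
    """
    n1, n2 = len(cases), len(controls)
    ranks = average_ranks(list(cases) + list(controls))
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    return u, z, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical mean copy numbers within one gene, one value per individual.
case_means = [2.4, 2.1, 2.6, 2.0, 2.5]
control_means = [2.0, 1.9, 2.1, 2.0, 2.2, 1.8]
print(wilcoxon_rank_sum(case_means, control_means))
```

One test per gene keeps the number of comparisons at the number of annotated genes rather than the number of shared segments.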
Weighted Representation of CNV Status: Let the number of individuals range
from i = 1 to n, and the DNA copy segment s have length (j, j+p), where j = 1 to
m, and j < p ≤ m. Also, let the amplitude of copy numbers be denoted as: l for
segments with lost DNA copy, where -2 ≤ l < 1; n for segments with normal DNA
copy, where n = 1; and g for segments with gained DNA copy, where g > 1.
Define a region of interest to be a sequential collection of segments s that spans
either a gene or any other arbitrary DNA sequence (i.e. a gene of interest). A
measure for the amount of lost, gained, or normal DNA is set to be proportional
to both the length of copied DNA and the number of copies for that particular
segment. Then, for every individual i, the summary statistic for a copy state k (L,
N, G) equals the sum of the products of copy number (hybridization intensity) and
length (j, j+p) for each segment contained in the chosen region. Thus, the total
weighted LNG measure for individual i in a chromosomal region is given by
$L_i = \sum_{j=1}^{m} l \cdot \mathrm{length}(j, j+p)$; $N_i = \sum_{j=1}^{m} n \cdot \mathrm{length}(j, j+p)$; and by
$G_i = \sum_{j=1}^{m} g \cdot \mathrm{length}(j, j+p)$,
where $l \in [-2, 1)$, $n = 1$, $g > 1$, and $j < p \le m$. Individual summary measures $L_i$,
$N_i$, and $G_i$ are then summed up and their values are compared between cases
and controls through the conditional Chi-square algorithm described above. The
weighted CNV approach (by length and copy number) can be used to
differentiate whether the number of DNA copies or the length of copied DNA has
a stronger effect on gene expression and function. For example, we can better
compare the impact of high copy numbers even if for short DNA segments
compared to the effect of low copy numbers yet for longer DNA segments. In
addition, if we examine a DNA sequence across all samples such that the copy
state remains unchanged (i.e. either L, N, or G) within each individual’s genome
but varies between individuals, the LNG summary statistics will depend on the
inserted/deleted copy numbers rather than length. In contrast, the length of
altered DNA may be evaluated for samples containing relatively equal number of
copies. Nonetheless, the interaction or additive effect of length of copied/lost
DNA and number of DNA copies may be evaluated simultaneously.
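The weighted L, N, and G measures defined above can be computed per individual as in the following sketch. The segment representation (start, end, amplitude) and the function name `weighted_lng` are assumptions made for illustration, and segment length is taken as end minus start.

```python
def weighted_lng(segments):
    """Sum amplitude * length per copy state for one individual's region.

    segments: list of (start, end, amplitude) tuples covering the region.
    Amplitude < 1 counts as loss, == 1 as normal, > 1 as gain.
    """
    totals = {"L": 0.0, "N": 0.0, "G": 0.0}
    for start, end, amplitude in segments:
        length = end - start
        if amplitude < 1:
            state = "L"
        elif amplitude == 1:
            state = "N"
        else:
            state = "G"
        totals[state] += amplitude * length
    return totals


# Hypothetical region: positions in kb, amplitudes as copy-number estimates.
region = [(0, 100, 1), (100, 150, 3), (150, 400, 0.5)]
print(weighted_lng(region))
```

Under this weighting a short, high-amplitude gain and a longer, low-amplitude gain can contribute equally: 4 copies over 10 kb and 2 copies over 20 kb both add 40 to G, which is why length and copy number may also need to be examined separately.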
4.5 Copy Number Variation Analysis Results: The proposed CNV association
tests were applied on the chromosomal segmentations estimated through the
GADA and CNAM algorithms. Based on the GADA detection of copy number
variation we first implemented the proposed Chi square and Wilcoxon tests on
the GAW Framingham T2D data. A series of tests allowed us to conclude that
the Wilcoxon gene-based tests for measuring differences between cases and
controls for the gene-based average, lost, normal, or gained DNA were most
reliable. This experience allowed us to then focus our analysis on the LALES
data through the methods we accepted from analyzing the GAW data. Segment
based associations were also performed for the CNAM genome segmentation
and further compared with the Wilcoxon based results.
We report the CNV association results based on the Chi-square segment-
based tests, the Wilcoxon gene-based copy number association tests, and the
CNAM segment-based association tests. Through the previously completed
GWAS we found strong evidence for the role of two main pathways associated
with risk of developing early AMD: the SGK - FOXO3A - CFH complement
network and the ASTN1 neurodevelopment mechanism. These two pathways are
of particular interest, as the SGK gene, for which we found highly significant
haplotype association with AMD, seems to regulate the complement pathway in
both Caucasian and Latino populations.
The ASTN1 gene gave the strongest genome-wide significance signals
through both allelic and HTR analyses, and appears to be specific for Latinos
alone. In-situ hybridization (Tedeschi et al. – work in progress) showed over-
expression for the ASTN1 protein in AMD affected retinal tissue. Since both SGK
and ASTN1 genes present novel findings, we assessed their copy number
variation and the potential association with AMD. In addition, we analyzed the
copy polymorphisms and their associations with AMD for the most susceptible
GWAS genes we previously identified.
Notably, we found significant association through the Wilcoxon gene-
based test for three genes identified earlier in our GWAS: JAK1, RYR2, and DCC
(Tables 4.2 – 4.5). Moreover, we detect consistency in significant AMD
association for JAK1 across all three tests: logR SNP intensity association,
GWAS allelic test, and Wilcoxon gene copy number (Table 4.3). Figures 4.10
and 4.11 illustrate the patterns of copy number for JAK1. While we did not
explore in detail the role of JAK1 on early AMD, these results offer new insight
into what may prove to be a potential early AMD susceptible gene.
Table 4.2: Comparison of association results for the JAK1 gene/SNPs
TEST       CHR  Gene  P Value                     Physical Position  Cyto   SNP RS ID
GWAS       1    JAK1  2.33E-04                    65,140,630         p31.3  rs12743599
CNV        1    JAK1  1.000E-04                   whole gene
SNP Hybr.  1    JAK1  P: 2.48E-04; aP: 4.37E-02   65,162,070         p31.3  rs7538610
Note. P = raw P value; aP = adjusted P value; Cyto = Cytoband;
Hybr. = Hybridization.
Table 4.3: Log2 ratio (SNP hybridization) association with early AMD
CHR GENE dbSNP RS ID Position P Value aP Value
1 JAK1 rs7538610 65,162,070 2.48E-04 4.37E-02
1 ASTN1* rs2223301 175,267,126 6.47E-05 1.42E-02
1 ASTN1* rs742123 175,269,861 2.48E-04 4.37E-02
1 RYR2 rs10802581 235,265,216 1.21E-04 2.40E-02
1 RYR2 rs6703530 235,447,811 3.25E-06 1.16E-03
1 RYR2 rs16835058 235,487,632 2.40E-04 4.25E-02
1 RYR2 rs10925414 235,665,518 2.48E-04 4.37E-02
6 SGK rs6935311 134,952,731 9.88E-05 2.00E-02
6 FOXO3A rs508903 109,173,454 7.37E-05 1.00E-02
18 DCC rs7237478 49,628,036 3.10E-05 7.69E-03
18 DCC rs17417046 49,238,995 7.37E-05 1.58E-02
18 DCC rs7240099 48,066,941 1.88E-04 3.46E-02
18 DCC rs17755318 48,345,285 1.88E-04 3.46E-02
18 DCC rs8084280 48,980,747 2.48E-04 4.37E-02
Note. JAK1, RYR2, and DCC show CNV association with AMD; ASTN1 gave the
highest genome-wide association results and showed in-situ over-expression
[Latino specific]; SGK and FOXO3A are main upstream regulators of CFH [Latino
and Caucasian specific]; SGK gave HTR p-value = 3.00E-07 (permuted P-value
= 5.00E-03); DCC gave allelic and HTR association; aP = adjusted P value.
Assessment of the GADA segmentation for this gene suggests an
increase in the JAK1 copy number in early AMD cases (Figure 4.11). Similarly,
the RYR2 logR and its corresponding CNV association offer new evidence for
the validity of the GWAS findings we previously recorded. We also found
evidence for high association between AMD and both the SGK and the FOXO3
logR marker hybridization copies. While we explored the regulatory role of SGK
on FOXO3 and consequently on CFH, this is the first evidence we obtain for the
FOXO3 gene in our LALES study. The conjunction of CNV association results for
these two genes, and the interdependent biological role of SGK and FOXO3
(Chapter 2) substantiate the findings and hypothesis we proposed for their
function in the complement AMD pathway. We do note however, that ASTN1, for
which Tedeschi et al. obtained positive in-situ hybridization, did not give any
evidence for CNV association with AMD (Figure 4.12). This result may be caused
by the lack of an appropriate CNV association test. Specifically, a regression
analysis with predictor values equal to the copy number, and the phenotype
values as continuous variables (i.e. drusen size) would better depict the increase
in copy number (or over expression) observed in the in-situ hybridization. Based
on these results we wish to further explore the role of the complement system in
the activation of early AMD through SGK and FOXO3, and the role of ASTN1.
Also, while we are not yet clear on the exact role of JAK1 and RYR2 in AMD,
further exploration through either gene ontology or a more powerful GWAS
should benefit this study.
Table 4.4: CNV association with early AMD; p-values are based on the Wilcoxon
test for average copy number (GADA segmentation)
CHR Gene P-Value (Average CNV) CHR Gene P-Value (Average CNV)
1 JAK1 1.000E-04 9 C9orf68 5.304E-03
1 PADI2 4.377E-03 9 SLC1A1 5.304E-03
1 RYR2 7.875E-03 9 GNA14 1.114E-02
1 LRRC8B 1.386E-02 9 TMEM2 2.018E-02
1 ITPKB 1.762E-02 9 BARX1 2.534E-02
1 PGM1 1.809E-02 9 PHF2 2.534E-02
1 NGFB 2.576E-02 9 ANGPTL2 3.107E-02
1 C1orf140 2.618E-02 9 RALGPS1 3.107E-02
1 COG2 2.910E-02 10 ITGA8 4.730E-03
1 SLC25A24 2.910E-02 10 PAX2 6.065E-03
5 COMMD10 2.954E-03 10 EBF3 8.753E-03
5 RNF130 1.386E-02 10 DIP2C 1.186E-02
5 HMHB1 2.534E-02 10 SORCS3 1.267E-02
5 AP3B1 2.576E-02 10 HIF1AN 1.702E-02
5 GABRB2 3.383E-02 10 ZNF365 1.972E-02
6 HTR1B 4.527E-03 10 SPAG6 2.039E-02
6 KIAA1009 9.399E-03 10 C10orf108 2.534E-02
6 PPP1R14C 1.370E-02 10 LARP5 2.534E-02
6 BTBD9 2.379E-02 10 PIP5K2A 3.204E-02
6 C6orf213 2.448E-02 10 ERCC6 3.420E-02
6 SMPDL3A 2.448E-02 18 CDH2 2.950E-02
6 UNC5CL 2.991E-02 18 DCC 3.804E-02
Figure 4.10: Chromosome 1 CNV Segmentation (GADA algorithm). Axes: Chromosome 1
Position (Mbp) vs. Copy Number.
Note. ASTN1: highest genome-wide association results (allelic and HTR); over-expression
for in-situ hybridized retinal AMD affected tissues; potential early AMD
neurodevelopment pathway specific for Latinos. JAK1 and RYR2: significant CNV
association with AMD; JAK1 and RYR2 SNPs highly associated with early AMD
(p-values in the top 50 genome-wide signals).
Figure 4.11: JAK1 CNV on Chromosome 1 (GADA). Axes: Chromosome 1 Position (Mbp)
vs. Copy Number.
Figure 4.12: ASTN1 CNV segmentation. Axes: Chromosome 1 Position (Mbp) vs. Copy
Number.
Table 4.5: Genome-wide allele association signals included in CNV regions
associated with early AMD
CHR P Value Position Cytoband SNP RS ID Gene
1 4.62E-05 235,796,792 q43 rs2779401 RYR2
1 2.33E-04 65,140,630 p31.3 rs12743599 JAK1
18 1.36E-04 49,289,928 q21.2 rs4939732 DCC*
Note. *GWAS HTR: DCC haplotype rs4940260 - rs4939732 - rs10502979 -
rs8088048 P-value = 4.63*10^-7 and permutation p-value = 7.0*10^-3;
CHR = chromosome; SNP RS ID = SNP RS identification number.
Chapter 5
Copy Number Variation in the Framingham Study
In this project we tested for association between copy number variation and
diabetes in a subset of individuals from the Framingham Heart Study. We used
the 500K SNP data and called copy number variation using two algorithms: the
Genome Alteration Detection Algorithm of Pique-Regi et al. (Pique-Regi, Monso-
Varona et al. 2008) and the software of Golden Helix. We then tested for
association between copy number and diabetes using a gene-based analysis.
Our results show little evidence of association between copy number and
diabetes status. Furthermore our results indicate a relatively poor level of
agreement between top associations resulting from the two programs. We then
examined potential causes for this difference in behavior and the implications for
future studies. Based on these results we were able to infer more general
conclusions regarding the use of the segment based Chi-square CNV association
test vs. the gene based Wilcoxon CNV association test, and to also compare the
GADA vs. the CNAM algorithms for CNV estimation.
5.1 Results: Our analysis currently focuses on a subset of individuals with
diabetes. In order to protect against undiagnosed diabetes we insisted that cases
had a fasting plasma glucose measure of at least 126 mg/dl, while controls had a
measure less than 110 mg/dl. The controls were also frequency matched to
cases by 5-year age intervals, sex, and ever smoked status (age and smoking
history taken at baseline). Some individuals are related, but the majority are not
(at least by the pedigree information provided). This resulted in a final sample
size of 194 cases and 213 controls, for which we analyzed the 500K SNP data.
We then called copy number. Space prohibits a display of estimated copy
number here, but, in summary, we observed that previously known sites of copy
number variations (e.g., (Redon, Ishikawa et al. 2006)) were detected as such in
this data set, and that, for each method, there is clear correlation between the
locations at which copy number changes across individuals (despite the fact that
the algorithms call copy number independently for each individual). Moving on to a
more detailed analysis, in Table 5.1 we present a summary of the results for the
GADA and CNAM analyses for each chromosome. No gene attains genome-
wide significance when tested for association between copy number and
diabetes. However, some interesting features are observed. For example, under
the null hypothesis of no association, p-values should be uniformly distributed,
with each gene having a probability of 0.05 of being reported as 'significant' on
this basis. Thus, for each chromosome, the number of genes showing
association at the level p < 0.05 will be Binomially distributed. We used this to
test whether there was an over-(under-)representation of such associated genes
on any chromosome. After a multiple comparison correction several
chromosomes showed evidence of excess, or lack, of associated genes.
Furthermore there is a clear excess of small p-values for the CNAM analysis.
This can be seen in Figure 5.1, where we show a Q-Q plot of observed vs.
expected p-values genome-wide (we plot -log10 of the p-values). The Q-Q plots
reflect the relative over-(under-)abundance of small p-values resulting from the
CNAM (GADA) analysis. The distribution of the genes with the smallest p-values
when tested for association with diabetes varies along the genome for the GADA
and CNAM analyses. There appears to be little agreement between the two
methods. This raises the question of why the results of the methods show such
poor agreement among these genes. Recall that the first step for both methods is
the normalization of SNP intensities, but that the two methods employ different
normalization techniques. We therefore examined the degree of agreement
between the SNP intensities after normalization. We show this in Figure 5.2 (left
side), a scatter plot of the intensities resulting from the two schemes. The results
are striking: there is very poor correlation between the normalized intensities.
Thus, even if the methods were using identical routines to call copy number,
which they are not, we would expect to obtain widely different calls of copy
number, with consequent large differences between p-values resulting from
subsequent tests of association between copy number
and phenotype. A poor agreement in copy number between the two programs is
found. However, it is important to note that both methods use somewhat arbitrary
definitions of 'cut-points' CL and CU, defined such that if the normalized intensity
for a SNP is below CL it is determined to have a copy number less than 2,
whereas if normalized intensity is above CU the copy number is called as greater
than 2; otherwise the copy number is called as 2. These arbitrary cut-points are
different for the two methods and, combined with the differing results from
normalization, result in wildly different calls of copy number. Most often both
methods call copy number as 2, but it is seldom the case that the methods
simultaneously determine copy number to be other than 2 for any given SNP.
Figure 5.1: Q-Q Plot of p-values for gene-based average CNV associations for
GADA and CNAM
Note. The plot shows observed (x-axis) and expected (y-axis) values of -log10 p-
values resulting from the gene-based test for association between CNV and
diabetes using the GADA and CNAM analyses.
Figure 5.2: Scatter plots
Note. The left plot shows a scatter plot of normalized SNP intensities resulting
from the CNAM normalization (x-axis) and APT normalization used for the GADA
analysis (y-axis) for a randomly chosen individual. Each point corresponds to a
SNP. There is a striking lack of correlation between the results of the two
normalization routines. The right plot is a scatter plot of resulting copy number
calls for randomly chosen (but representative) regions along the genome. Again,
we note a striking lack of agreement between calls resulting from CNAM and
GADA.
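The cut-point rule described in the text (below CL: fewer than 2 copies; above CU: more than 2; otherwise 2) can be sketched as follows to show how the choice of cut-points alone changes the calls. The cut-point values and intensities are invented for illustration and are not those used by GADA or CNAM; here both sets of calls use the same intensities, whereas in the actual comparison the normalized intensities also differ.

```python
def call_copy_state(intensity, c_lower, c_upper):
    """Call copy number relative to 2 from a normalized intensity."""
    if intensity < c_lower:
        return "<2"
    if intensity > c_upper:
        return ">2"
    return "2"


def concordance(calls_a, calls_b):
    """Fraction of SNPs for which both sets of calls agree."""
    same = sum(a == b for a, b in zip(calls_a, calls_b))
    return same / len(calls_a)


# Hypothetical normalized intensities for six SNPs.
intensities = [-0.45, -0.30, -0.05, 0.10, 0.35, 0.50]
calls_a = [call_copy_state(x, -0.25, 0.25) for x in intensities]  # cut-points A
calls_b = [call_copy_state(x, -0.40, 0.40) for x in intensities]  # cut-points B
print(calls_a)
print(calls_b)
print(concordance(calls_a, calls_b))
```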
Table 5.1: Summary of gene-based CNV associations for GADA based
segmentation (Wilcoxon rank-sum test)
CHR  Affymetrix Genes  Genotyped Genes  %GADA Sig. Genes  %CNAM Sig. Genes
1 1650 1500 5.12 12.60
2 1046 974 0.72 16.32
3 893 829 0.36 13.51
4 661 634 1.31 10.57
5 722 680 7.45 22.35
6 883 822 1.84 18.86
7 709 655 3.09 6.87
8 539 505 10.18 1.78
9 632 582 0.87 4.81
10 636 603 1.84 8.96
11 1028 907 1.68 4.85
12 871 799 4.80 15.14
13 295 287 8.81 10.45
14 527 474 4.25 12.66
15 507 474 1.92 9.92
16 580 483 5.64 3.52
17 851 712 8.09 0.14
18 241 241 4.17 1.66
19 912 713 1.51 0.14
20 470 439 2.75 5.47
21 201 187 0.00 9.09
22 365 326 3.12 0.00
Total/Avg 15219 13826 3.50 9.70
Note. For each chromosome we show the overall number of genes as defined by
Affymetrix annotation files (column 2), the number of those genes for which we
had SNP intensity data (column 3), and the percentage of those genes in which
associated CNV was detected at the p = 0.05 level (uncorrected for multiple
comparisons) using GADA (column 4) and CNAM (column 5);
CHR = Chromosome; Sig. = Significant.
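The per-chromosome test for an excess or lack of associated genes described in Section 5.1 can be sketched as an exact Binomial tail calculation. This is an illustration only: the function names are hypothetical, the count of 152 genes for chromosome 5 is back-calculated from the percentage in Table 5.1, and the calculation treats genes as independent, which is an idealization given the correlation between neighboring regions.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)


def binom_tails(k, n, p=0.05):
    """Return (P[X <= k], P[X >= k]) for X ~ Binomial(n, p)."""
    lower = sum(binom_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binom_pmf(i, n, p) for i in range(k, n + 1))
    return min(1.0, lower), min(1.0, upper)


# Chromosome 5 in Table 5.1: 680 genotyped genes, 22.35% significant under
# CNAM, i.e. roughly 152 genes (count back-calculated from the percentage).
lower, upper = binom_tails(152, 680)
print(f"under-representation p = {lower:.3g}, over-representation p = {upper:.3g}")
```

The resulting per-chromosome p-values would still need the multiple comparison correction mentioned in the text, for example across the 22 autosomes.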
5.2 Discussion: The importance of CNV has only recently become appreciated.
The relationship of CNV to phenotypic variation is even less well developed. We
hope the analysis presented in this paper is a useful step forward in this area, but
it is certainly the case that it merely scratches the surface of what is likely to be
an extremely complex challenge. CNV lacks some of the 'neatness' of SNP data.
It does not occur at well-defined positions (i.e. the points at which CNV changes
are often different across individuals). Furthermore, for a variety of reasons, the
500K chip platforms are not ideal for detecting CNV when compared to more
recent platforms. It is also the case that many functional mutations occur outside
genes (in promoter regions, for example). As such, many regions of CNV may
not be detected. It is also likely to be challenging to detect small CNVs using
these technologies. An approach in which we break the genome into regions of
maximal length, such that copy number remains constant for each individual
within each region, results in an extremely large number of regions (around 82K
regions for the samples analyzed in this paper). While each region can be treated
as if it was a (multi-allelic) locus, and marginal tests can then be performed, such
tests are likely to be far from optimal. The principal reason for this is the
correlation between intervals. This is directly analogous to the situation with SNP
analysis, but may well be even more complex in this new setting. An alternative
approach might be to attempt to relate SNP variation to CNV, and adopt an
approach akin to the tag-SNP idea. In this paper we attempted to move past the
perils of a strategy based upon marginal tests by
conducting a gene-at-a-time analysis, relating mean copy number within a gene
to phenotype. Such an approach seems reasonable, succeeds in reducing the
analysis to a reasonable number of tests, and thereby avoids the worst excesses
of multiple comparison corrections. However, it should be noted that the very
poor agreement in results from the two analysis methods explored in this paper
indicates that substantial work remains to be done. It is this lack of agreement
that represents the principal lesson to be drawn from the current study.
Normalization is regarded as an important, but somewhat routine step in
analyses such as these. However, our paper demonstrates that the particular
method of normalization chosen can have a key influence on the results
obtained. In our case, the two normalization methods are both widely-used, and
appear inherently sensible, but result in normalized intensities that are very
poorly correlated across methods. Consequently, subsequent analyses will
produce wildly different results. As the phrase “garbage in, garbage out” reminds
us, it is important to ensure that such normalization routines are adding signal,
rather than noise to the data. As such, there is an urgent need for a widespread
comparison of normalization methods in order to better assess which of them
perform well. Finally, it should also be noted that, while we look at all genes in
the present study, there is little reason a priori to expect candidate genes chosen
on the basis of a SNP study (say) to also have a function due to CNV. It is
entirely possible that genes that affect phenotype through CNV will be distinct
from those that have an effect due to SNP polymorphism. In a recent study,
Stranger et al. (Stranger, Forrest et al. 2007), who examined data from Phase 1
of HapMap, noted that SNPs and CNVs captured 83.6% and 17.7%
(respectively) of the total detected genetic variation in the expression of around
15,000 genes, but that “the signals from the two types of variation had little
overlap”; other studies take a different view (e.g. McCarroll et al. (McCarroll,
Kuruvilla et al. 2008)).
Chapter 6
Integration of CNV, GWAS, and Population Admixture Results
6.1 Analysis Overview: During my PhD research I have
worked on several types of analyses, with the intent of exploring how diverse
statistical methods may be useful in investigating genetic risk factors. This
experience has also enabled me to plan a series of additional tests and methods
that I am interested in developing in the near future, especially in the realm of
copy number variation. It is also important to specify that due to the ethnic nature
of the analyzed cohort we had the opportunity to investigate the amount of
population admixture existent in Latinos ascertained through this study. While we
have not had the chance to perform additional tests with adjustment for
population admixture, our current findings suggest that there exists a strong
ancestry-driven component that should be incorporated in future Latino-based
association studies, namely the extent of population admixture and its effect on
producing false positives. The following paragraphs will summarize some of the
main lessons I have learned from the genome-wide and copy number variation
analyses, and from the study of population admixture.
Through the genome-wide study the goal was to either validate previously
reported results or to discover novel signals, and to further propose a probable
genetic and biological pathway responsible for the increased risk of developing
AMD. This task can be categorized in three parts: (a) the selection of SNPs that
were to be analyzed, (b) the methods used in analyzing the data, and (c) the
decisions made in selecting susceptible loci and in inferring a probable pathway
mechanism. Besides the exclusion of markers with call rate < 95% or in Hardy-
Weinberg disequilibrium [443,283 (88.60%) SNPs with a HWE p-value > 2*10^-8
were kept], an additional threshold of 0.05 for minor allele frequency (MAF) is
often considered. However, recent years have seen a reverse in the MAF
stringency criterion, favoring instead a summary of all significant alleles, with the
additional listing of those significantly associated yet rare alleles. This note is
important not only from the perspective of potential future replications of the
currently reported SNP associations, but also because genotyping of rare alleles
may be purposefully chosen when performing copy number variation analysis
(and therefore not eliminated from the dataset). Thus, for the current report I
have allowed the inclusion of SNPs with MAF < 0.05 and checked whether
any of the significant results were indeed rare in the study population.
Allelic associations were tested through additive, recessive, and dominant
models of inheritance. Comparison of results from the recessive and dominant
analysis revealed that the recessive model was almost always a better fit,
which is in accordance with most of the papers published on this particular
disease. Based on the top 50 allelic results for each of the three models of
inheritance, only 5, 1, and 0 had MAF < 0.05, respectively. Moreover, all of the 5
SNPs with strongest associations (ASTN1 p-values in the range of 10^-5 - 10^-7)
have MAF of 0.17 - 0.26. Of all significant alleles with MAF < 0.05, none
pertained to well known AMD biological pathways, and they were not pursued in the
discussion of the GWAS project. Since our sample size was not particularly
large, we used haplotype trend regression analysis (HTR) in an attempt to gain
power. Primarily, HTR allowed for a better identification of susceptible regions for
which additive contributions of multiple markers produced a detectable effect on
phenotype. Of the top 40 haplotypes we report in our study, only 10 were
contained within genes that themselves enclosed one or more SNPs ranked in
the top 50 genome-wide allele test signals. It is worth noting that for HTR we
determined a set of 249,352 tagging SNPs that captured the majority of variation
across the genome of the LALES population, and thus allowed for a reduction in
the number of multiple tests. The nearly twofold difference in the number of
SNPs used for the two analyses (allelic vs. HTR) may also account for some of
the lack of overlap between the two main sets of findings. Finally, we assessed
the validity of haplotype p-values by creating 100 permuted datasets and then
calculating
the chromosome-wide p-value for a given haplotype as the proportion of
permuted replicates in which the smallest p-value across all haplotypes on that
chromosome was smaller than the p-value observed in the original data.
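The chromosome-wide permutation p-value defined above reduces to a simple proportion once the smallest haplotype p-value has been recorded for each permuted dataset. The sketch below assumes those minima are already available; the function name and the example values are hypothetical, and the permutation of case/control labels and the HTR fits themselves are not shown.

```python
def chromosome_wide_p(observed_p, permuted_min_ps):
    """Proportion of permuted replicates whose smallest haplotype p-value
    on the chromosome is smaller than the observed haplotype p-value."""
    hits = sum(1 for p in permuted_min_ps if p < observed_p)
    return hits / len(permuted_min_ps)


# Hypothetical minima from four permuted datasets (100 were used in the study).
print(chromosome_wide_p(0.01, [0.5, 0.005, 0.2, 0.02]))
```

With R permuted datasets the estimate moves in steps of 1/R, so the number of permutations bounds the smallest non-zero p-value that can be reported.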
One of the underlying ‘drawbacks’ of this analysis was the rather small
sample size (101 cases : 202 controls) genotyped through a standard 500k
Affymetrix SNP chip dataset, and therefore, its effect on the power of detecting
signals at the 10^-8 significance level. When choosing which of the potential signals
should be described, I based my decisions on the following main criteria: (1) the
strongest signals across all chromosomes, (2) previously reported associations,
and (3) biological pathways known to be involved in the development of AMD.
Since we have not identified signals previously reported to be associated with
AMD, nor did we obtain significance at the genome level, we focused our efforts
on characterizing the strongest risk factors, and primarily on those loci involved in
main pathways of interest. We should note, however, that the current lack of
confirmation for previously reported findings may be due to several reasons.
First, this study is based on a Latino population cohort, unlike most of the earlier
projects which have been centered on Caucasian or African populations.
Second, this dataset was not generated with the intent of obtaining dense SNP
coverage in particular regions of interest. As such, for some of the genes that
have been abundantly reported in earlier candidate gene studies, we have
rather poor coverage, with some of the most widely reported SNPs not being
genotyped (e.g. the Y402H variant). Third, our study is based on subjects
diagnosed with early AMD, whereas most of the previous results have been
confirmed from studying late AMD cases, the advanced form of this disease.
The encouraging news, however, is that a parallel in-situ hybridization study of
subjects drawn from the same dataset verified the strongest signals of the
current GWAS (Trische et al, 2007). Nevertheless, even though weaker statistical
associations may be true risk factors, the absence of genome-wide
significance or of confirmation by other studies means that a replication study in a
similar Latino population, or additional in-situ hybridization work, is necessary.
Thus, in an attempt to best characterize susceptible regions and their role in the
development of AMD, I conducted a gene ontology search for each of the top allelic
and haplotype hits and further explored the biological functions of those genes
involved in well known AMD pathways. This in turn led to the proposal of several
regulatory mechanisms between selected markers and AMD, and to the
description of a series of events through which a given gene may trigger a
cascade of other downstream responses.
6.2 Copy Number Variation: CNV was the second major analysis we performed
on the 500k data, with the purpose of describing the type of genetic material
inherited by regions that harbor markers associated with AMD. For this purpose, I
proposed and ran a series of association tests informed by the copy number
variation segmentation and by disease status. We aimed the copy number (CN)
association tests at AMD susceptible regions selected through the allelic and
HTR GWAS results. Combining the two CNV methods (segmentation and
association) allowed us to assess the correlation between allelic and CNV
associations at a given region, and helped us to deduce whether a clear
distinction for the type of inherited DNA exists between cases and controls (gain
vs. normal vs. lost DNA), or whether a gene dosage effect is the main
determinant (i.e. two vs. three vs. more DNA copies).
While SNPs out of HWE are excluded from allelic association tests, this
criterion is not usually applied prior to the analytic detection of altered DNA.
Another issue similar to the common SNP threshold (MAF > 0.05) is that of
common CNVs. For example, it has been recently reported through the HapMap
project that more than 90% of the observed copy number differences are due to
common CNPs that segregate with a MAF > 1%, and more than 99% derive from
inheritance rather than new mutation. Thus, in comparison to SNP association
methods, the CNV ones only exclude markers with low call rate or those
associated with gender. Our CNV analyses allowed us to conclude that a SNP
association does not necessarily imply a CNV association at the same locus. Susceptible
loci may or may not translate into CN association, depending on whether there is
a deleterious insertion/deletion or gene dosage effect. Given these scenarios, it
is necessary that both SNP and CNV analyses are conducted concomitantly, so
as to better evaluate their combined or distinct effect on disease. We found
examples of these two instances in regions of most interest for this disease. For
example, the strongest allelic and HTR signals from our current GWAS study
produce no significant CNV association through either the segment-based Chi-square
tests or the nonparametric gene-based Wilcoxon tests (e.g. for
ASTN1). However, in-situ hybridization findings corroborate the CNV results we
obtained, showing an overall increased expression of DNA copy number in both
affected and control retinal tissue. Thus, dosage effect rather than
insertion/deletion is what characterizes this particular region/gene. An alternative
test that could better quantify the effect of gene dosage would be a regression
analysis that accounts for the individual number of inherited copies. However,
such an analysis would benefit from a new coding of the dependent variable,
such as size of drusen deposits or a step scale of AMD severity, rather than our
current dichotomous case/control categorization. Nonetheless, dosage effect
could be studied through a regression analysis on any dataset that offers
information on the severity of disease status (i.e. glucose levels in diabetes
patients, blood pressure, tumor size/type, etc). A second instance in our findings
was the positive SNP and CN association for a gene that is involved in one of the
most important AMD pathways.
However, one statistical aspect we observed from the copy number Chi-
square association results (segment based) was a somewhat moderate inflation
of significant p-values. Such a departure from a
uniform distribution of p-values could be attributed to the strong correlation that is
expected to exist between neighboring segments of a region, or to rare
CNVs (MAF < 1%) caused in many cases by singletons. For example, a recent
HapMap CNV study (McCarrol, 2008) observed that half of the CNV regions
were singleton CNVs. On the other hand, this inflation is not seen in the
gene-based Wilcoxon tests, which appear to be more robust because they
incorporate all of the information within a gene as part of the test
procedure.
A second characteristic that needs to be mentioned is that within 500k
arrays, most of the CNV regions that overlap segmental duplications are very
complex in nature (i.e. tandem repeat inheritance from multiple ancestors) and
have therefore been dropped from the original HapMap project. While we tried to
identify SNPs that tag CNVs by conducting a singleton (SNP) hybridization
association with disease and then mapping them against copy number regions,
we were constrained by the diminished density of effectively typed SNPs that tag
the repeat-rich regions in which CNVs are also enriched. We are therefore
aware that, owing to the chip design itself, there is a paucity of SNPs in regions that
would otherwise be very informative CNPs, so that copy number is currently not
well represented there. Thus, there is an ascertainment bias that does
not permit us to accurately detect such regions and perform accurate SNP/CNV
correlation analyses. Nonetheless, the comparison of the two main analyses
(GWAS and CNV) has given us a good description of the types of inherited DNA
material in regions susceptible to disease. Moreover, both segment-based and
gene-based CNV association tests are in agreement for most of the GWAS AMD
susceptible regions of interest. Also, the accord between CNV and in-situ
hybridization confers optimism for future analytic studies that aim to understand
biological mechanisms yet do not have the means or circumstances to invest in
additional laboratory work (such as the in-situ hybridization).
6.3 Population Admixture: A third component of this research project was the
assessment of population admixture in the Latino LALES cohort. This has been a
very useful experience, mainly because of our access to both the multiethnic
cohort (MEC), through which we incorporated European, Asian, and African
descent samples, and also due to the genotyped collection of Native American
samples. Given the heterogeneity of the LALES cohort, with Latinos originating
from at least four main regions outside the US, we were able to distinguish quite
well the differences in ancestral background and population admixture between
different geographic regions. Furthermore, our Native American ancestry
patterns in Latinos from across different geographic regions agree with previous
migration patterns reported for the Native American population. Aside from the
ancestry estimates and other informative parameters (e.g. the allele
divergence measure F_ST) that are computed by the STRUCTURE software,
we found the homozygosity trend calculations to be very useful. The two
measures (ancestry estimates and homozygosity distribution) were
complementary tools that helped us describe patterns of admixture and the
effects of recent migration on ancestry (i.e. comparing Latinos originating in El
Salvador versus those born within the US). In addition to these two tests we were
also able to compare linkage disequilibrium patterns between loci positioned on
different chromosomes, as a signature of recent population admixture. Our overall
assessment is that there is strong evidence for population admixture in Latinos,
and that this factor should certainly be incorporated in future association tests.
Because of our sample size constraint, we did not pursue this type of adjustment
at this time, but have proposed it as part of the future LALES grant which plans to
analyze a much larger sample for the same disease. A
complete-linkage hierarchical clustering algorithm could also be used to account for the
underlying population structure; however, we did not make use of this
method at this time.
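The comparison of linkage disequilibrium between loci on different chromosomes mentioned above can be illustrated with a small sketch. It uses the squared correlation between genotype dosages (0, 1, or 2 copies of an allele) as a simple proxy for LD; this choice, the function name `genotype_r2`, and the example genotypes are assumptions for illustration and are not necessarily the measure computed in this study. In a population without recent admixture, values for unlinked loci are expected to be close to zero.

```python
def genotype_r2(g1, g2):
    """Squared Pearson correlation between genotype dosages at two loci.

    Returns 0.0 when either locus shows no variation in the sample.
    """
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    v1 = sum((a - m1) ** 2 for a in g1)
    v2 = sum((b - m2) ** 2 for b in g2)
    if v1 == 0 or v2 == 0:
        return 0.0
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    return cov * cov / (v1 * v2)

# Hypothetical dosages for six individuals at two loci on different chromosomes.
locus_chr1 = [0, 1, 2, 2, 1, 0]
locus_chr7 = [0, 1, 2, 1, 1, 0]
print(genotype_r2(locus_chr1, locus_chr7))
```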
In summary, based on the GWAS and population admixture studies, we
cannot determine with certainty whether the main pathways we described
characterize early AMD affecteds for both Caucasian and Latino populations, or
whether they are specific for Latinos alone. A future replication study with
adjustment for population structure will facilitate these distinctions. Since we
have not adjusted for population structure, it is possible that some of the allelic
results are in fact false positives. However, this possibility is rather small for
this particular dataset, as the LALES ascertainment for cases and controls
matched individuals on origin of birth. Through our population admixture study
we found no significant difference in ancestry between cases and controls,
confirming in this case the well-designed case:control matching. Finally,
while the 500k Affymetrix chip was built for SNP rather than CNV
analysis, we have been able to use these data to (1) develop and run CNV
association tests, and (2) explain in large part the relationship between some of the
susceptible AMD regions and their copy number. On a side note, the field of CNV
studies is relatively new, and its current shortcoming is the paucity of available
tools and statistical tests necessary for completing an in-depth analysis.
However, this situation will certainly improve quickly, as a large
number of studies and tools are currently being developed for the purpose of efficiently
analyzing both SNP and CNV data.
Chapter 7
Conclusions
The following section summarizes some of the main issues related to the study
design and methodologies used throughout this project. In short, these topics
emerge from (1) the implementation of various analytical methods, (2) the study
design and genotyped array, and (3) the study of population admixture.
7.1 Marker exclusion criteria for GWAS vs. CNV studies:
Allele Frequency criteria: While a MAF threshold of 0.05 has been employed
routinely by previous GWA studies, in recent years investigators are instead
favoring the inclusion of all alleles that meet a call rate threshold of at least 0.95.
The inclusion of both rare and common alleles is now considered in view of
emerging population and CNV-based studies. As ethnic specific risk factors are
detected, and populations are associated with differential disease susceptibility, it
is often the case that the frequency of a mutation varies greatly among
ethnicities. For example, well known Mendelian mutations include the
HbS variant (sickle cell trait), with frequencies ranging from 7.1% in sub-Saharan
Africans to 0.17% in Caucasians (based on a report from the College of Medicine at the
University of Illinois), and the ΔF508 (cystic fibrosis) mutation, with a mean
carrier frequency of ~ 7.5% in northern European populations compared to 2.7%
in Turkey (Lucotte, Hazout et al. 1995). Also, while most studies on
neovascular AMD report MAF > 2% for the Y402H variant, this mutation has
ethnic variation; the 1277C allele frequency is much lower in Chinese (AMD,
11.3%; controls, 2.8%) (Lau, Chen et al. 2006) than in Caucasian populations
(AMD, 55.9%–59.0%; controls, 35.4%–39.9%) (Souied, Leveziel et al. 2005;
Magnusson, Duan et al. 2006; Sepp, Khan et al. 2006). Despite this difference,
the CFH polymorphism is significantly associated with neovascular AMD in these
populations, contributing with an OR of 4.4 in Chinese, and with ORs of 2.17,
2.32, 5.1, 5.6, and 6.93, in Utah, Iceland, UK, North American, and French
populations, respectively (Hageman, Anderson et al. 2005; Haines, Hauser et al.
2005; Souied, Leveziel et al. 2005; Magnusson, Duan et al. 2006; Sepp, Khan et
al. 2006). Nonetheless, these ORs are also influenced by various population
specific environmental responses. This issue is especially relevant to our study
since Latinos are an admixed population, and high risk alleles within this
population should be compared with those observed in its founder populations.
The second reason for the inclusion of rare alleles is their role in studying
CNVs. Although a MAF threshold of 0.05 was initially implemented to ensure that
reported signals are common and therefore have a better chance of being
replicated, Redon et al. (2006) determined through comparative genomic
hybridization that rare alleles often characterize CNV regions. Previous studies
focused primarily on allelic association tests rather than CNVs and therefore
earlier Affymetrix chips discarded SNPs from complex or non-Mendelian regions
of inheritance. Given the surging number of CNV projects, newly designed
arrays (i.e. SNP5.0 and 6.0) have incorporated ~ 100K probes that are
distributed in known regions of CNV. As CNV regions are identified and
genotyped, it will be interesting to determine the proportion of deleterious (or trait
influencing) yet rare SNPs identified through GWA studies that are also
correlated with CNVs.
HWE criteria: We excluded SNPs in Hardy-Weinberg disequilibrium from the
GWAS, but not from the CNV association study, as many CNVs do not follow
Mendelian inheritance. Future arrays will enable a better assessment of the
proportion of SNPs in HW disequilibrium that are related to CNVs, and of how
many of these loci or CNVs have an effect on phenotype. Since the 500K
Affymetrix platform was not designed to include SNPs located in well
documented CNV regions, this assessment was not pursued in the context of
CNV association tests.
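The difference between the two sets of marker exclusion criteria can be summarized in a short sketch. It assumes genotype counts per marker, the 0.95 call rate threshold quoted above, and a 1 d.f. Chi-square test of Hardy-Weinberg proportions. The function names and the HWE cutoff `hwe_alpha` are hypothetical placeholders, since the cutoff is not stated in this section, and the exclusion of gender-associated markers is not modeled.

```python
import math

def hwe_pvalue(n_aa, n_ab, n_bb):
    """P-value of a 1 d.f. Chi-square test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic marker: no departure can be measured
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return math.erfc(math.sqrt(stat / 2))  # upper tail of Chi-square, 1 d.f.

def keep_marker(n_aa, n_ab, n_bb, n_missing, for_cnv,
                min_call_rate=0.95, hwe_alpha=1e-3):
    """Apply the call rate filter always, and the HWE filter for GWAS only."""
    n_called = n_aa + n_ab + n_bb
    if n_called / (n_called + n_missing) < min_call_rate:
        return False
    if for_cnv:
        return True  # HWE is not used as an exclusion criterion for CNV tests
    return hwe_pvalue(n_aa, n_ab, n_bb) >= hwe_alpha

# Hypothetical marker with no heterozygotes: kept for CNV, dropped for GWAS.
print(keep_marker(50, 0, 50, 2, for_cnv=True),
      keep_marker(50, 0, 50, 2, for_cnv=False))
```

A marker with a strong heterozygote deficit, as in the example, is exactly the kind of SNP that a deletion could produce, which is why it is retained for the CNV analysis.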
7.2 Factors that may induce the QQ-plot difference between segment-
and gene-based CNV association tests: We observed a departure from a
uniform distribution for the segment-based Chi-square test p-values, with a
somewhat moderate inflation in significant p-values, whereas gene-based
Wilcoxon p-values conformed to a relatively uniform distribution. For N tests with
a preset significance level α, approximately αN tests would appear significant by
chance alone. Based on a total of ~ 82,000 shared genome-wide
segments/tests and for α = 0.05 we would expect ~ 4100 false positives.
Approximately 10% of Chi-square tests and ~ 3.6% of the Wilcoxon tests (499 of
13826 tested genes) were significant at the unadjusted 0.05 level.
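The arithmetic behind these figures can be checked directly. The sketch below uses only the counts quoted above; the function and variable names are illustrative.

```python
def expected_false_positives(n_tests, alpha):
    """Expected number of tests significant by chance alone: alpha * N."""
    return alpha * n_tests

n_segments = 82_000   # shared genome-wide segments tested with Chi-square
n_genes = 13_826      # genes tested with the Wilcoxon test
alpha = 0.05

expected_segments = expected_false_positives(n_segments, alpha)
expected_genes = expected_false_positives(n_genes, alpha)
observed_gene_fraction = 499 / n_genes

print(round(expected_segments), round(expected_genes, 1),
      round(100 * observed_gene_fraction, 1))
```

Under the null hypothesis about 4,100 segment tests and about 691 gene tests would be expected to fall below 0.05. The roughly 10% of significant Chi-square tests is therefore about twice the chance expectation, whereas the 499 significant Wilcoxon tests (about 3.6%) fall somewhat below it.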
For the Chi-square method we tested the set of DNA fragments obtained
through the intersection of all CNV breakpoints across cases and controls. High
variation in copy states or large number of singletons creates a large number of
breakpoints. As a note regarding the high number of segments detected within
this analysis, a recent HapMap CNV study (McCarrol et al., 2008) observed that
fully half of the CNV regions were singleton CNVs. We observed on average
60% of CNVs to be singletons. Our higher percentage could be attributed to a
high sensitivity and/or low FDR GADA parameter setting. However, the CNAM
based segmentation produced a similar number of segments to the GADA
algorithm. It is plausible that this discrepancy is due to the admixed nature
of Latinos. The main factor driving the increase in significant Chi-square p-values
is the large number of minimally shared segments for which cell counts of
deleted, normal, or inserted DNA are very low. On average, 86% of all segments
had ‘lost’/‘gained’ DNA cells collapsed into ‘changed’ DNA, transforming the
χ² tests from 2 d.f. (lost vs. normal vs. gained copy number) to 1 d.f. (normal vs.
change). Even after this we still obtained a high percentage of cells with a low
‘change’ count (13% of all ‘change’ cells were < 5; 20% were < 10). Based on
these findings, we conclude that segment-based CNV association tests should
be interpreted with caution, since many segments include sparse data for
loss or gain of copy number. In principle, such tests should be performed using
Fisher’s Exact Test, but computational considerations made this impossible in
our case. We note though that for those segments with high expected cell
counts, the test seems reliable. Still, the issue of correcting for multiple
comparisons and for correlation between adjacent loci remains to be addressed.
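The collapsing step and the sparse-cell problem can be made concrete with a small sketch. It computes the Pearson Chi-square test on a 2 x 3 table (2 d.f.), collapses 'lost' and 'gained' into 'changed' (1 d.f.), and compares the result with a two-sided Fisher's Exact Test on the collapsed table. The segment counts are hypothetical and the functions are illustrative; they are not the implementation used for the results reported here.

```python
import math

def chi_square_stat(table):
    """Pearson Chi-square statistic for a contingency table (list of rows)."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_tot[i] * col_tot[j] / n
            stat += (obs - exp) ** 2 / exp
    return stat

def chi_square_pvalue(stat, df):
    """Upper-tail p-value, using the closed forms for 1 and 2 d.f."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2))
    if df == 2:
        return math.exp(-stat / 2)
    raise ValueError("only 1 or 2 d.f. are supported in this sketch")

def collapse(table):
    """Merge the 'lost' and 'gained' columns into one 'changed' column."""
    return [[normal, lost + gained] for lost, normal, gained in table]

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]]."""
    r1, r2, c1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical segment: rows are cases and controls, columns are
# (lost, normal, gained) copy-number states.
segment = [[3, 95, 3], [1, 199, 2]]
p_2df = chi_square_pvalue(chi_square_stat(segment), df=2)
collapsed = collapse(segment)  # [[95, 6], [199, 3]]
p_1df = chi_square_pvalue(chi_square_stat(collapsed), df=1)
p_exact = fisher_exact_2x2(*collapsed[0], *collapsed[1])
print(p_2df, p_1df, p_exact)
```

In this example the expected 'changed' count among cases is 101 x 9 / 303 = 3, below the conventional minimum of 5, so the Chi-square approximation is questionable and the exact test is the safer reference.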
For our second approach we adopted a gene-based Wilcoxon test that
allowed us to (1) avoid the large number of segments/tests and sparse data, and
(2) eliminate some of the correlation among loci within neighboring
regions (i.e. intragenic). The uniform distribution of the resulting Wilcoxon p-values
may be attributed to the fact that this test avoids the sparse data problem discussed
earlier. Another advantage is that it lowers the number of tests being performed.
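A gene-based test of this kind can be sketched as follows. Each individual is summarized by the average copy number across the markers within a gene, and cases are compared with controls by a Wilcoxon rank-sum test. The sketch uses the normal approximation without tie or continuity correction, which is an assumption made for brevity; the copy-number calls and function names are hypothetical.

```python
import math

def average_copy_number(calls):
    """Per-individual summary: mean copy number across markers in a gene."""
    return sum(calls) / len(calls)

def midranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def rank_sum_test(cases, controls):
    """Rank sum of cases and two-sided p-value (normal approximation)."""
    n1, n2 = len(cases), len(controls)
    ranks = midranks(list(cases) + list(controls))
    w = sum(ranks[:n1])
    mean = n1 * (n1 + n2 + 1) / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean) / sd
    return w, math.erfc(abs(z) / math.sqrt(2))

# Hypothetical marker-level copy-number calls within one gene.
cases = [average_copy_number(c) for c in ([2, 3, 3], [2, 2, 3], [3, 3, 3])]
controls = [average_copy_number(c)
            for c in ([2, 2, 2], [2, 2, 2], [2, 2, 3], [2, 2, 2])]
w, p = rank_sum_test(cases, controls)
print(w, p)
```

Because copy-number summaries contain many ties, omitting the tie correction overstates the variance of the rank sum and makes this version somewhat conservative.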
7.3 Role of Normalization in Conducting CNV analysis: Normalization of raw
intensity values is a first step in conducting CNV analysis, and it is designed to
correct for artificial noise introduced at the SNP or sample level through either
batch effects or hardware defects. For the GAW data we ran two normalization
schemes, through the Affymetrix Power Tools (APT) and the CNAM implemented
algorithms. APT uses a median normalization approach, while CNAM uses a
proprietary scheme that is not clearly disclosed. A scatter plot of the normalized
intensities from these two programs showed a very poor correlation. This
strikingly different normalization outcome obviously affects all of the downstream
analyses, from copy number calling (even if both APT and CNAM used
similar calling methods) to the resulting Wilcoxon gene-based association p-
values we tested. We note that the mean intensity cutoff points for defining
normal copy differ between the two algorithms (i.e. -0.1 < mean intensity < 0.1
for GADA; -0.01 < mean intensity < 0.01 for most CNAM chromosome output
distributions). Nonetheless, it is most likely the case that both the GADA and
CNAM algorithms transform the mean intensity into copy number such that a
copy number of two is indeed called for most of the normally inherited SNPs. A
correct approach for comparing the two programs would therefore be to apply the
copy number calling on the same normalized data; however, such manipulation
was not possible at this time, given the restrictions of the CNAM data type import
and running procedure. This experience has led us to conclude that there is a
critical need for a better understanding of the different normalization techniques,
and for a detailed comparison between these methods.
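The sensitivity of copy number calls to the scale of the normalized data can be illustrated with a toy example. The sketch below centers a sample's log intensity ratios on their median and then labels a segment from its mean intensity using a symmetric cutoff. It is not the APT, GADA, or CNAM algorithm; the function names and intensity values are hypothetical, and only the two cutoffs (0.1 and 0.01) are taken from the text.

```python
from statistics import mean, median

def median_normalize(log_ratios):
    """Center one sample's log intensity ratios on its own median."""
    m = median(log_ratios)
    return [x - m for x in log_ratios]

def call_segment(segment_values, cutoff):
    """Label a segment 'loss', 'normal', or 'gain' from its mean intensity."""
    avg = mean(segment_values)
    if avg <= -cutoff:
        return "loss"
    if avg >= cutoff:
        return "gain"
    return "normal"

# Hypothetical normalized intensities for one segment (mean of about 0.05).
segment = [0.04, 0.06]
print(call_segment(segment, cutoff=0.1))   # cutoff on the GADA scale
print(call_segment(segment, cutoff=0.01))  # cutoff on the CNAM scale
```

The same segment is called normal under one cutoff and a gain under the other, which is why a fair comparison of calling methods requires that both be applied to identically normalized data.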
7.4 Other tests to be considered when performing CNV analyses: A variety
of tests can be constructed for testing CNV differences between cases and
controls. CNVs can affect phenotypes in many ways, by altering gene products
through either gene dosage alteration, disruption of genes, or through positional
effects. We tested for associations using Chi-square and Wilcoxon tests.
Numerous other methods could certainly be developed, such as CNV
regression analysis, ancestry specific CNV studies, and CNV-environment
interactions. Individual summary statistics similar to the ones we described
for the Wilcoxon tests (i.e. average copy number) could be incorporated as
predictive covariates in a regression model that evaluates gene dosage effect,
provided that there exists a continuous classification for the degree of disease
severity (i.e. glucose levels in diabetes patients, blood pressure, tumor size,
drusen size in AMD affecteds, etc.).
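As a minimal sketch of such a dosage regression, the example below fits an ordinary least squares line of a continuous severity measure on the per-individual average copy number of a gene. The data are invented for illustration, the function name is hypothetical, and a real analysis would also include covariates and a test of the slope.

```python
def simple_linear_regression(x, y):
    """Ordinary least squares slope and intercept for y regressed on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

# Hypothetical data: average copy number of one gene per individual, and a
# continuous severity measure such as drusen size (arbitrary units).
avg_copy_number = [2.0, 2.0, 2.3, 2.7, 3.0]
drusen_size = [60.0, 65.0, 90.0, 110.0, 125.0]

slope, intercept = simple_linear_regression(avg_copy_number, drusen_size)
print(slope, intercept)
```

A positive slope would indicate that severity increases with the number of inherited copies, which is the dosage effect that a dichotomous case/control coding cannot capture.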
For admixed populations such as Latinos, one could contrast CNV
patterns between subgroups with high variation in ancestry (i.e. Mexican Latinos
vs. Puerto Rican Latinos), and could further assess the effect of ancestry on
estimating segmentation breakpoints. For example, 26% of Caucasians vs. 63%
of African-Americans had ≥ 3 SULT1A1 copies in a recent CNV ancestry based
study (Hebbring et al., 2007). Analogous to ethnic specific allele frequencies
identified in allelic association studies, location and frequency of ethnic specific
CNVs should be incorporated in CNV association tests as a means for
addressing the issue of false positives in admixed cohorts.
7.5 What sort of study design should be considered when studying Latino
populations? The Latino population is admixed, a factor that may induce false
positive associations if not adjusted for in the analysis. However, in the LALES
cohort we detected a wide variation in levels of admixture and ancestral
proportions among Latinos themselves, depending on their birth/migration origin.
Region-based ancestry estimates, homozygosity trends, and association strength
between unlinked loci suggest that a ‘Latino’ classification alone is not fully
informative. We conclude that a study comprised of Latinos should also match
cases and controls on birth origin, since the difference between Native American
and European ancestry was as small as 0.3% or as large as 22.1 % for two
different geographic regions of our cohort. This differentiation is especially
relevant if genetic and biochemical pathways are believed to be ethnic specific
and can influence both rates of disease and genetic risk factors. A second
inference of our study was the influence of sample size on ancestry estimates.
An artificial inflation of smaller samples produced different estimates from those
based on original sample sizes or on smaller samples drawn from original sets.
As has been previously suggested by other authors, we determined from our
simulations that a minimal sample size of 25 individuals was indeed sufficient for
estimating population ancestry.
7.6 Benefits of haplotype trend regression (HTR) analysis: A drawback of the
LALES project was the small sample size (n = 300) for the 500K Affymetrix array,
and therefore a lack of power for detecting single allele significance at the 10^-8
level. Since haplotype analysis may increase the test power by exploiting the LD
between SNPs, we employed HTR as an alternative tool to the single allele
association study. Some of the haplotypes we report enclose SNPs that rank
highest in the single allelic tests, while others capture SNPs whose single effects
are not highly significant (or are non-significant). Even though SNPs identified
through HTR may in fact be in LD with the causative allele(s), haplotype
analysis allowed us to narrow down regions that may enclose a true risk factor.
Still, haplotype analysis may suffer from a loss of statistical power due to the
increased level of multiple comparison correction that is necessary. Also, if
markers are in perfect or high LD, it may not be statistically feasible to determine
the causative SNP(s). In an attempt to verify the validity of HTR results we
performed a permutation test through which we calculated the chromosome-wide
p-value for a given haplotype as the proportion of permuted replicates in which
the smallest p-value across all haplotypes on that chromosome was smaller than
the p-value observed for that haplotype on the original data.
References
Abid, A., M. Ismail, et al. (2006). "Identification of novel mutations in the SEMA4A
gene associated with retinal degenerative diseases." J Med Genet 43(4):
378-81.
Allikmets, R., N. F. Shroyer, et al. (1997). "Mutation of the Stargardt disease
gene (ABCR) in age-related macular degeneration." Science 277(5333):
1805-7.
Altshuler, D., L. Kruglyak, et al. (1998). "Genetic polymorphisms and disease." N
Engl J Med 338(22): 1626.
Ambati, J., A. Anand, et al. (2003). "An animal model of age-related macular
degeneration in senescent Ccl-2- or Ccr-2-deficient mice." Nat Med 9(11):
1390-7.
Anderson, D. H., K. C. Talaga, et al. (2004). "Characterization of beta amyloid
assemblies in drusen: the deposits associated with aging and age-related
macular degeneration." Exp Eye Res 78(2): 243-56.
Azofeifa, J., M. Hahn, et al. (2004). "The STR polymorphism (AAAAT)n within the
intron 1 of the tumor protein 53 (TP53) locus in 17 populations of different
ethnic groups of Africa, America, Asia and Europe." Rev Biol Trop 52(3):
645-57.
Bahlo, M., J. Stankovich, et al. (2006). "Detecting genome wide haplotype
sharing using SNP or microsatellite haplotype data." Hum Genet 119(1-2):
38-50.
Balding, D. J. (2006). "A tutorial on statistical methods for population association
studies." Nat Rev Genet 7(10): 781-91.
Bernardi, F., P. Arcieri, et al. (1997). "Contribution of factor VII genotype to
activated FVII levels. Differences in genotype frequencies between
northern and southern European populations." Arterioscler Thromb Vasc
Biol 17(11): 2548-53.
Bohmer C, P. M., Rajamanickam J, Jeyaraj S, Lang F (2006). "Mechanisms
underlying the regulation of neuronal and glial amino acid transport by the
serum and glucocorticoid inducible kinase SGK." Acta Physiologica
188(651).
Broet, P. and S. Richardson (2006). "Detection of gene copy number
changes in CGH microarrays using a spatially correlated mixture model."
Bioinformatics 22(8): 911-8.
Busjahn, A., A. Aydin, et al. (2002). "Serum- and glucocorticoid-regulated kinase
(SGK1) gene and blood pressure." Hypertension 40(3): 256-60.
Campbell, C. D., E. L. Ogburn, et al. (2005). "Demonstrating stratification in a
European American population." Nat Genet 37(8): 868-72.
Cardon, L. R. and L. J. Palmer (2003). "Population stratification and spurious
allelic association." Lancet 361(9357): 598-604.
Castellano, M. (2006). "Geographic ancestry, angiotensinogen gene
polymorphism, and cardiovascular risk." Hypertension 48(4): 562-3.
Choudhry, S., N. E. Coyle, et al. (2006). "Population stratification confounds
genetic association studies among Latinos." Hum Genet 118(5): 652-64.
Choudhry, S., M. Taub, et al. (2008). "Genome-wide screen for asthma in Puerto
Ricans: evidence for association with 5q23 region." Hum Genet 123(5):
455-68.
Clark, S. J., V. A. Higman, et al. (2006). "His-384 allotypic variant of factor H
associated with age-related macular degeneration has different heparin
binding properties from the non-disease-associated form." J Biol Chem
281(34): 24713-20.
Clayton, D. G., N. M. Walker, et al. (2005). "Population structure, differential bias
and genomic control in a large-scale, case-control association study." Nat
Genet 37(11): 1243-6.
Clee, S. M., A. H. Zwinderman, et al. (2001). "Common genetic variation in
ABCA1 is associated with altered lipoprotein levels and a modified risk for
coronary artery disease." Circulation 103(9): 1198-205.
Collins-Schramm, H., B. Chima, et al. (2004). "Mexican American ancestry-
informative markers: examination of population structure and marker
characteristics in European Americans, Mexican Americans, Amerindians
and Asians." Human Genetics 114: 263-271.
Conley, Y. P., A. Thalamuthu, et al. (2005). "Candidate gene analysis
suggests a role for fatty acid biosynthesis and regulation of the
complement system in the etiology of age-related maculopathy." Hum Mol
Genet 14(14): 1991-2002.
Conrad, D. F., M. Jakobsson, et al. (2006). "A worldwide survey of haplotype
variation and linkage disequilibrium in the human genome." Nat Genet
38(11): 1251-60.
De Bakker, P. I., N. P. Burtt, et al. (2006). "Transferability of tag SNPs in genetic
association studies in multiple populations." Nat Genet 38(11): 1298-303.
De Jong, F. J., M. K. Ikram, et al. (2007). "Complement factor h polymorphism,
inflammatory mediators, and retinal vessel diameters: the rotterdam
study." Invest Ophthalmol Vis Sci 48(7): 3014-8.
Dentchev, T., A. H. Milam, et al. (2003). "Amyloid-beta is found in drusen from
some age-related macular degeneration retinas, but not in drusen from
normal retinas." Mol Vis 9: 184-90.
Dinu, V., P. L. Miller, et al. (2007). "Evidence for association between multiple
complement pathway genes and AMD." Genet Epidemiol 31(3): 224-37.
Duerr, R. H., K. D. Taylor, et al. (2006). "A genome-wide association study
identifies IL23R as an inflammatory bowel disease gene." Science
314(5804): 1461-3.
Excoffier, L. and M. Slatkin (1995). "Maximum-likelihood estimation of molecular
haplotype frequencies in a diploid population." Mol Biol Evol 12(5): 921-7.
F. Luca, M. Z., S. Kashyap, S. Conzen, A. Di Rienzo (2007). SGK expression is
increased by an ancestral allele showing a latitudinal cline in human
populations. ASHG, San Diego, CA.
Falush, D., M. Stephens, et al. (2003). "Inference of population structure using
multilocus genotype data: linked loci and correlated allele frequencies."
Genetics 164(4): 1567-87.
Falush, D., M. Stephens, et al. (2007). "Inference of population structure using
multilocus genotype data: dominant markers and null alleles." Molecular
Ecology Notes.
Ferrara, N., H. P. Gerber, et al. (2003). "The biology of VEGF and its receptors."
Nat Med 9(6): 669-76.
Feuk, L., J. R. MacDonald, et al. (2005). "Discovery of human inversion
polymorphisms by comparative analysis of human and chimpanzee DNA
sequence assemblies." PLoS Genet 1(4): e56.
Frank, R. N., R. H. Amin, et al. (1996). "Basic fibroblast growth factor and
vascular endothelial growth factor are present in epiretinal and choroidal
neovascular membranes." Am J Ophthalmol 122(3): 393-403.
Fraser-Bell, S., J. Wu, et al. (2006). "Smoking, alcohol intake, estrogen use, and
age-related macular degeneration in Latinos: the Los Angeles Latino Eye
Study." Am J Ophthalmol 141(1): 79-87.
Fridlyand, J., et al. (2004). "Hidden Markov models approach to the analysis of
array CGH data." J Multivariate Anal 90: 132-153.
Galis, Z. S. (1994). "Increased expression of matrix metalloproteinases and
matrix degrading activity in vulnerable regions of human atherosclerotic
plaques." J. Clin. Invest. 94: 2493–2503.
Goldberg, J. L., M. E. Vargas, et al. (2004). "An oligodendrocyte lineage-specific
semaphorin, Sema5A, inhibits axon growth by retinal ganglion cells." J
Neurosci 24(21): 4989-99.
Goldstein, L. E., J. A. Muffat, et al. (2003). "Cytosolic beta-amyloid deposition
and supranuclear cataracts in lenses from people with Alzheimer's
disease." Lancet 361(9365): 1258-65.
Grassi, M. A., J. H. Fingert, et al. (2006). "Ethnic variation in AMD-associated
complement factor H polymorphism p.Tyr402His." Hum Mutat 27(9):921-5.
Graubert, T. A., P. Cahan, et al. (2007). "A High-Resolution Map of Segmental
DNA Copy Number Variation in the Mouse Genome." PLoS Genetics
3(1): e3.
Gudas, L. J. (1994). "Retinoids and vertebrate development." J Biol Chem
269(22): 15399-402.
Gudmundsson, J., P. Sulem, et al. (2007). "Genome-wide association study
identifies a second prostate cancer susceptibility variant at 8q24." Nat
Genet 39(5): 631-7.
Gupta, K., S. Kshirsagar, et al. (1999). "VEGF prevents apoptosis of human
microvascular endothelial cells via opposing effects on MAPK/ERK and
SAPK/JNK signaling." Exp Cell Res 247(2): 495-504.
Hageman, G. S., D. H. Anderson, et al. (2005). "A common haplotype in
the complement regulatory gene factor H (HF1/CFH) predisposes
individuals to age-related macular degeneration." Proc Natl Acad Sci U S
A 102(20): 7227-32.
Haiman, C. A., D. O. Stram, et al. (2003). "A comprehensive haplotype analysis
of CYP19 and breast cancer risk: the Multiethnic Cohort." Hum Mol
Genet 12(20): 2679-2692.
Haines, J. L., M. A. Hauser, et al. (2005). "Complement factor H variant
increases the risk of age-related macular degeneration." Science
308(5720): 419-21.
Haines, J. L., N. Schnetz-Boutaud, et al. (2006). "Functional candidate genes in
age-related macular degeneration: significant association with VEGF,
VLDLR, and LRP6." Invest Ophthalmol Vis Sci 47(1): 329-35.
Hartl, D. L. (1988). A Primer of Population Genetics. Sunderland, MA, Sinauer
Associates, Inc.
Helgason, A., B. Yngvadottir, et al. (2005). "An Icelandic example of the impact of
population structure on association studies." Nat Genet 37(1): 90-95.
Herrera, V. L., T. Didishvili, et al. (2004). "Genome-wide scan identifies novel
QTLs for cholesterol and LDL levels in F2[Dahl RxS]-intercross rats." Circ
Res 94(4): 446-52.
Higgins, G. T., et al. (2003). "Induction of angiogenic cytokine expression in
cultured RPE by ingestion of oxidized photoreceptor outer segments."
Invest Ophthalmol Vis Sci 44: 1775-1782.
Hinds, D. A., R. P. Stokowski, et al. (2004). "Matching strategies for genetic
association studies in structured populations." Am J Hum Genet 74(2):
317-25.
Hsu, L., S. G. Self, et al. (2005). "Denoising array-based comparative genomic
hybridization data using wavelets." Biostatistics 6(2): 211-26.
Huang, J., W. Wei, et al. (2004). "Whole genome DNA copy number changes
identified by high density oligonucleotide arrays." Hum Genomics 1(4):
287-99.
Huang, T., B. Wu, et al. (2005). "Detection of DNA copy number alterations using
penalized least squares regression." Bioinformatics 21(20): 3811-7.
Johnson, L. V., W. P. Leitner, et al. (2002). "The Alzheimer's A beta-peptide
is deposited at sites of complement activation in pathologic
deposits associated with aging and age-related macular degeneration."
Proc Natl Acad Sci U S A 99(18): 11830-5.
Kallioniemi, A., O. P. Kallioniemi, et al. (1992). "Comparative genomic
hybridization for molecular cytogenetic analysis of solid tumors." Science
258(5083): 818-21.
Kang, H., K. Cui, et al. (2004). "BRG1 controls the activity of the retinoblastoma
protein via regulation of p21CIP1/WAF1/SDI." Mol Cell Biol 24(3): 1188-99.
Zhang, K., M. Kniazeva, et al. (2001). "A 5-bp deletion in ELOVL4 is
associated with two related forms of autosomal dominant macular
dystrophy." Nat Genet 27(1): 89-93.
Kittles, R. A., W. Chen, et al. (2002). "CYP3A4-V and prostate cancer in African
Americans: causal or confounding association because of population
stratification?" Hum Genet 110(6): 553-60.
Klein, M. L. and P. J. Francis (2003). "Genetics of age-related macular
degeneration." Ophthalmol Clin North Am 16(4): 567-74.
Klein, R., M. D. Davis, et al. (1991). "The Wisconsin age-related maculopathy
grading system." Ophthalmology 98(7): 1128-34.
Klein, R., Y. Deng, et al. (2007). "Cardiovascular disease, its risk factors and
treatment, and age-related macular degeneration: Women's Health
Initiative Sight Exam ancillary study." Am J Ophthalmol 143(3): 473-83.
Klein, R., B. E. Klein, et al. (2007). "Fifteen-year cumulative incidence of age-
related macular degeneration: the Beaver Dam Eye Study."
Ophthalmology 114(2): 253-62.
Klein, R., B. E. Klein, et al. (2006). "Prevalence of age-related macular
degeneration in 4 racial/ethnic groups in the multi-ethnic study of
atherosclerosis." Ophthalmology 113(3): 373-80.
Klein, R., B. E. Klein, et al. (1992). "Prevalence of age-related maculopathy. The
Beaver Dam Eye Study." Ophthalmology 99(6): 933-43.
Klein, R., M. D. Knudtson, et al. (2007). "Statin use and the five-year
incidence and progression of age-related macular degeneration." Am J
Ophthalmol 144(1): 1-6.
Klein, R., T. Peto, et al. (2004). "The epidemiology of age-related macular
degeneration." Am J Ophthalmol 137(3): 486-95.
Klein, R. J., C. Zeiss, et al. (2005). "Complement factor H polymorphism in age-
related macular degeneration." Science 308(5720): 385-9.
Kolonel, L. N., B. E. Henderson, et al. (2000). "A multiethnic cohort in Hawaii and
Los Angeles: baseline characteristics." Am J Epidemiol 151(4): 346-57.
Krueger, S. K., L. K. Siddens, et al. (2004). "Differences in FMO2*1 allelic
frequency between Hispanics of Puerto Rican and Mexican descent."
Drug Metab Dispos 32(12): 1337-40.
Krzystolik, M. G., M. A. Afshari, et al. (2002). "Prevention of experimental
choroidal neovascularization with intravitreal anti-vascular endothelial
growth factor antibody fragment." Arch Ophthalmol 120(3): 338-46.
Kvanta, A. (1995). "Expression and regulation of vascular endothelial growth
factor in choroidal fibroblasts." Curr Eye Res 14(11): 1015-20.
Lander, E. S. and N. J. Schork (1994). "Genetic dissection of complex traits."
Science 265(5181): 2037-48.
Lau, L. I., S. J. Chen, et al. (2006). "Association of the Y402H polymorphism in
complement factor H gene and neovascular age-related macular
degeneration in Chinese patients." Invest Ophthalmol Vis Sci 47(8): 3242-
6.
Law, B., J. S. Buckleton, et al. (2003). "Effects of Population Structure and
Admixture on Exact Tests for Association Between Loci." Genetics 164(1):
381-387.
Lee, J. A., C. M. Carvalho, et al. (2007). "A DNA replication mechanism for
generating nonrecurrent rearrangements associated with genomic
disorders." Cell 131(7): 1235-47.
Leske, M. C., S. Y. Wu, et al. (2006). "Nine-year incidence of age-related
macular degeneration in the Barbados Eye Studies." Ophthalmology
113(1): 29-35.
Li, M., P. Atmaca-Sonmez, et al. (2006). "CFH haplotypes without the
Y402H coding variant show strong association with susceptibility to age-
related macular degeneration." Nat Genet 38(9): 1049-54.
Lipson, D., Y. Aumann, et al. (2006). "Efficient calculation of interval scores for
DNA copy number data analysis." J Comput Biol 13(2): 215-28.
Lopez, P. F., B. D. Sippy, et al. (1996). "Transdifferentiated retinal pigment
epithelial cells are immunoreactive for vascular endothelial growth factor in
surgically excised age-related macular degeneration-related choroidal
neovascular membranes." Invest Ophthalmol Vis Sci 37(5): 855-68.
Lucotte, G., S. Hazout, et al. (1995). "Complete map of cystic fibrosis mutation
DF508 frequencies in Western Europe and correlation between mutation
frequencies and incidence of disease." Hum Biol 67(5): 797-803.
Luibl, V., J. M. Isas, et al. (2006). "Drusen deposits associated with aging and
age-related macular degeneration contain nonfibrillar amyloid oligomers."
J Clin Invest 116(2): 378-85.
Magnusson, K. P., S. Duan, et al. (2006). "CFH Y402H confers similar risk of soft
drusen and both forms of advanced AMD." PLoS Med 3(1): e5.
Marioni, J. C., N. P. Thorne, et al. (2006). "BioHMM: a heterogeneous hidden
Markov model for segmenting array CGH data." Bioinformatics 22(9):
1144-6.
Marrero, A. R., F. P. Das Neves Leite, et al. (2005). "Heterogeneity of the
genome ancestry of individuals classified as White in the state of Rio
Grande do Sul, Brazil." Am J Hum Biol 17(4): 496-506.
Mata, N. L., J. Weng, et al. (2000). "Biosynthesis of a major lipofuscin
fluorophore in mice and humans with ABCR-mediated retinal and macular
degeneration." Proc Natl Acad Sci U S A 97(13): 7154-9.
McCarroll, S. A., F. G. Kuruvilla, et al. (2008). "Integrated detection and
population-genetic analysis of SNPs and copy number variation." Nat
Genet 40(10): 1166-74.
McPherson, R., A. Pertsemlidis, et al. (2007). "A common allele on chromosome
9 associated with coronary heart disease." Science 316(5830): 1488-91.
Menotti, A., M. Lanti, et al. (2000). "Coronary heart disease incidence in
northern and southern European populations: a reanalysis of the seven
countries study for a European coronary risk chart." Heart 84(3): 238-44.
Mitchell, P., W. Smith, et al. (1995). "Prevalence of age-related maculopathy in
Australia. The Blue Mountains Eye Study." Ophthalmology 102(10): 1450-
60.
Mullins, R. F., S. R. Russell, et al. (2000). "Drusen associated with aging and
age-related macular degeneration contain proteins common to
extracellular deposits associated with atherosclerosis, elastosis,
amyloidosis, and dense deposit disease." Faseb J 14(7): 835-46.
Nannya, Y., M. Sanada, et al. (2005). "A robust algorithm for copy number
detection using high-density oligonucleotide single nucleotide
polymorphism genotyping arrays." Cancer Res 65(14): 6071-9.
Nicolai, U. and C. Eckardt (1993). "The occurrence of macrophages in the retina
and periretinal tissues in ocular diseases." Ger J Ophthalmol 2(4-5): 195-
201.
Okuizumi, K., O. Onodera, et al. (1995). "Genetic association of the very low
density lipoprotein (VLDL) receptor gene with sporadic Alzheimer's
disease." Nat Genet 11(2): 207-9.
Olshen, A. B., E. S. Venkatraman, et al. (2004). "Circular binary segmentation for
the analysis of array-based DNA copy number data." Biostatistics 5(4):
557-72.
Pfaff, C. L., E. J. Parra, et al. (2001). "Population structure in admixed
populations: effect of admixture dynamics on the pattern of linkage
disequilibrium." Am J Hum Genet 68(1): 198-207.
Pike, M. C., L. N. Kolonel, et al. (2002). "Breast cancer in a multiethnic cohort in
Hawaii and Los Angeles: risk factor-adjusted incidence in Japanese
equals and in Hawaiians exceeds that in whites." Cancer Epidemiol
Biomarkers Prev 11(9): 795-800.
Pique-Regi, R., J. Monso-Varona, et al. (2008). "Sparse representation and
Bayesian detection of genome copy number alterations from microarray
data." Bioinformatics 24(3): 309-18.
Pollack, J. R., C. M. Perou, et al. (1999). "Genome-wide analysis of DNA copy-
number changes using cDNA microarrays." Nat Genet 23(1): 41-6.
Price, A. L., J. Butler, et al. (2008). "Discerning the Ancestry of European
Americans in Genetic Association Studies." PLoS Genetics 4(1): e236.
Price, A. L., N. Patterson, et al. (2007). "A genomewide admixture map for Latino
populations." Am J Hum Genet 80(6): 1024-36.
Pritchard, J. K. and P. Donnelly (2001). "Case-control studies of association in
structured or admixed populations." Theor Popul Biol 60(3): 227-37.
Pritchard, J. K., M. Stephens, et al. (2000). "Inference of population structure
using multilocus genotype data." Genetics 155(2): 945-59.
Pritchard, J. K., M. Stephens, et al. (2000). "Association mapping in structured
populations." Am J Hum Genet 67(1): 170-81.
Purcell, S., B. Neale, et al. (2007). "PLINK: a tool set for whole-genome
association and population-based linkage analyses." Am J Hum Genet
81(3): 559-75.
Redon, R., S. Ishikawa, et al. (2006). "Global variation in copy number in the
human genome." Nature 444(7118): 444-54.
Reiner, A. P., E. Ziv, et al. (2005). "Population structure, admixture, and aging-
related phenotypes in African American adults: the Cardiovascular Health
Study." Am J Hum Genet 76(3): 463-77.
Rioux, J. D., R. J. Xavier, et al. (2007). "Genome-wide association study
identifies new susceptibility loci for Crohn disease and implicates
autophagy in disease pathogenesis." Nat Genet 39(5): 596-604.
Rosenberg, N. A. (2005). "Algorithms for selecting informative marker panels for
population assignment." J Comput Biol 12(9): 1183-201.
Rosenberg, N. A., L. M. Li, et al. (2003). "Informativeness of genetic markers for
inference of ancestry." Am J Hum Genet 73(6): 1402-22.
Rosenberg, N. A., J. K. Pritchard, et al. (2002). "Genetic structure of human
populations." Science 298(5602): 2381-5.
Ruberti, J. W., C. A. Curcio, et al. (2003). "Quick-freeze/deep-etch visualization
of age-related lipid accumulation in Bruch's membrane." Invest
Ophthalmol Vis Sci 44(4): 1753-9.
Salari, K., S. Choudhry, et al. (2005). "Genetic admixture and asthma-
related phenotypes in Mexican American and Puerto Rican asthmatics."
Genet Epidemiol 29(1): 76-86.
SanGiovanni, J. P., E. Y. Chew, et al. (2007). "The relationship of dietary
carotenoid and vitamin A, E, and C intake with age-related macular
degeneration in a case-control study: AREDS Report No. 22." Arch
Ophthalmol 125(9): 1225-32.
Saxena, R., B. F. Voight, et al. (2007). "Genome-wide association analysis
identifies loci for type 2 diabetes and triglyceride levels." Science
316(5829): 1331-6.
Schmid, M., M. Sen, et al. (2000). "A methylthioadenosine phosphorylase
(MTAP) fusion transcript identifies a new gene on chromosome 9p21 that
is frequently deleted in cancer." Oncogene 19(50): 5747-54.
Schmidt, S., A. M. Saunders, et al. (2000). "Association of the apolipoprotein E
gene with age-related macular degeneration: possible effect modification
by family history, age, and gender." Mol Vis 6: 287-93.
Scott, L. J., K. L. Mohlke, et al. (2007). "A genome-wide association study of type
2 diabetes in Finns detects multiple susceptibility variants." Science
316(5829): 1341-5.
Sepp, T., J. Khan, et al. (2006). "Complement factor H variant Y402H is a major
risk determinant for geographic atrophy and choroidal neovascularization
in smokers and nonsmokers." Invest Ophthalmol Vis Sci 47(2): 536-40.
Shaffer, J., C. Kammerer, et al. (2007). "Genetic markers for ancestry are
correlated with body composition traits in older African Americans."
Osteoporosis International 18: 733-741.
Shan, Z., T. Parker, et al. (2004). "Identifying novel homozygous deletions by
microsatellite analysis and characterization of tumor suppressor candidate
1 gene, TUSC1, on chromosome 9p in human lung cancer." Oncogene
23(39): 6612-20.
Singaraja, R. R., C. Fievet, et al. (2002). "Increased ABCA1 activity protects
against atherosclerosis." J Clin Invest 110(1): 35-42.
Singh, K. K., S. Ristau, et al. (2005). "Mapping of a macular drusen susceptibility
locus in rhesus macaques to the homologue of human chromosome 6q14-
15." Exp Eye Res 81(4): 401-6.
Sivaprasad, S. and N. V. Chong (2006). "The complement system and
age-related macular degeneration." Eye 20(8): 867-72.
Sladek, R., G. Rocheleau, et al. (2007). "A genome-wide association study
identifies novel risk loci for type 2 diabetes." Nature 445(7130): 881-5.
Smith, L. E., W. Shen, et al. (1999). "Regulation of vascular endothelial growth
factor-dependent retinal neovascularization by insulin-like growth factor-1
receptor." Nat Med 5(12): 1390-5.
Smith, M. A., P. L. Richey Harris, et al. (1997). "Widespread peroxynitrite-
mediated damage in Alzheimer's disease." J Neurosci 17(8): 2653-7.
Smith, M. W., N. Patterson, et al. (2004). "A high-density admixture map for
disease gene discovery in African Americans." Am J Hum Genet 74(5):
1001-13.
Souied, E. H., P. Benlian, et al. (1998). "The epsilon4 allele of the apolipoprotein
E gene as a potential protective factor for exudative age-related macular
degeneration." Am J Ophthalmol 125(3): 353-9.
Souied, E. H., N. Leveziel, et al. (2005). "Y402H complement factor H
polymorphism associated with exudative age-related macular
degeneration in the French population." Mol Vis 11: 1135-40.
Spilsbury, K., K. L. Garrett, et al. (2000). "Overexpression of vascular endothelial
growth factor (VEGF) in the retinal pigment epithelium leads to the
development of choroidal neovascularization." Am J Pathol 157(1): 135-
44.
Steinthorsdottir, V., G. Thorleifsson, et al. (2007). "A variant in CDKAL1
influences insulin response and risk of type 2 diabetes." Nat Genet 39(6):
770-5.
Stranger, B. E., M. S. Forrest, et al. (2007). "Relative impact of nucleotide and
copy number variation on gene expression phenotypes." Science
315(5813): 848-53.
Strobeck, M. W., D. N. Reisman, et al. (2002). "Compensation of BRG-1 function
by Brm: insight into the role of the core SWI-SNF subunits in
retinoblastoma tumor suppressor signaling." J Biol Chem 277(7): 4782-9.
Sugimoto, M., A. Fujikawa, et al. (2006). "Evidence that bovine forebrain
embryonic zinc finger-like gene influences immune response associated
with mastitis resistance." Proc Natl Acad Sci U S A 103(17): 6454-9.
Tedeschi-Blok, N., J. Buckley, et al. (2007). "Population-based study of early
age-related macular degeneration: role of the complement factor H Y402H
polymorphism in bilateral but not unilateral disease." Ophthalmology
114(1): 99-103.
Thakkinstian, A., P. Han, et al. (2006). "Systematic review and meta-analysis of
the association between complement factor H Y402H polymorphisms and
age-related macular degeneration." Hum Mol Genet 15(18): 2784-90.
Thomas, D. C. and J. S. Witte (2002). "Point: population stratification: a problem
for case-control studies of candidate-gene associations?" Cancer
Epidemiol Biomarkers Prev 11(6): 505-12.
Trommsdorff, M., M. Gotthardt, et al. (1999). "Reeler/Disabled-like disruption of
neuronal migration in knockout mice lacking the VLDL receptor and ApoE
receptor 2." Cell 97(6): 689-701.
Tsai, H. J., J. Y. Kho, et al. (2006). "Admixture-matched case-control study: a
practical approach for genetic association studies in admixed populations."
Hum Genet 118(5): 626-39.
US Census Bureau (2007). Hispanic Heritage Month 2007.
Varma, R., S. Fraser-Bell, et al. (2004). "Prevalence of age-related macular
degeneration in Latinos: the Los Angeles Latino eye study."
Ophthalmology 111(7): 1288-97.
Varma, R., S. H. Paz, et al. (2004). "The Los Angeles Latino Eye Study: design,
methods, and baseline data." Ophthalmology 111(6): 1121-31.
Varma, R. and M. Torres (2004). "Prevalence of lens opacities in Latinos: the Los
Angeles Latino Eye Study." Ophthalmology 111(8): 1449-56.
Varma, R., M. Ying-Lai, et al. (2004). "Prevalence and risk indicators of visual
impairment and blindness in Latinos: the Los Angeles Latino Eye Study."
Ophthalmology 111(6): 1132-40.
Wacholder, S., N. Rothman, et al. (2000). "Population stratification in
epidemiologic studies of common genetic variants and cancer:
quantification of bias." J Natl Cancer Inst 92(14): 1151-8.
Wakeley, J. and S. Lessard (2003). "Theory of the effects of population
structure and sampling on patterns of linkage disequilibrium applied to
genomic data from humans." Genetics 164(3): 1043-53.
Wald, G. (1951). "The chemistry of rod vision." Science 113(2933): 287-91.
Wallace, M. N., J. G. Geddes, et al. (1997). "Nitric oxide synthase in reactive
astrocytes adjacent to beta-amyloid plaques." Exp Neurol 144(2): 266-72.
Wang, F. E., G. Shi, et al. (2007). "Receptor tyrosine kinase inhibitors AG013764
and AG013711 reduce choroidal neovascularization in rat eye." Exp Eye
Res 84(5): 922-33.
Wang, S., C. M. Lewis, et al. (2007). "Genetic variation and population structure
in native Americans." PLoS Genet 3(11): e185.
Weeber, E. J., U. Beffert, et al. (2002). "Reelin and ApoE receptors cooperate to
enhance hippocampal synaptic plasticity and learning." J Biol Chem
277(42): 39944-52.
Weiss, L. A., J. Veenstra-Vanderweele, et al. (2004). "Genome-wide association
study identifies ITGB3 as a QTL for whole blood serotonin." Eur J Hum
Genet 12(11): 949-54.
Wright, S. (1951). "The genetical structure of populations." Ann Eugen 15:
323-354.
Wu, A. H., A. Seow, et al. (2003). "HSD17B1 and CYP17 polymorphisms and
breast cancer risk among Chinese women in Singapore." Int J Cancer
104(4): 450-7.
Wu, H., J. A. Cowing, et al. (2006). "Mutations in the gene KCNV2
encoding a voltage-gated potassium channel subunit cause 'cone
dystrophy with supernormal rod electroretinogram' in humans." Am J Hum
Genet 79(3): 574-9.
Wu, Z., T. W. Lauer, et al. (2007). "Oxidative stress modulates complement
factor H expression in retinal pigmented epithelial cells by acetylation of
FOXO3." J Biol Chem 282(31): 22414-25.
Yang, H., C. McElree, et al. (1993). "Familial empirical risks for inflammatory
bowel disease: differences between Jews and non-Jews." Gut 34(4): 517-
24.
Yang, Y. C., C. H. Lin, et al. (2006). "Serum- and glucocorticoid-inducible
kinase 1 (SGK1) increases neurite formation through microtubule
depolymerization by SGK1 and by SGK1 phosphorylation of tau." Mol Cell
Biol 26(22): 8357-70.
Yang, Z., N. J. Camp, et al. (2006). "A variant of the HTRA1 gene increases
susceptibility to age-related macular degeneration." Science 314(5801):
992-3.
Yoshida, T., K. Ohno-Matsui, et al. (2005). "The potential role of amyloid beta in
the pathogenesis of age-related macular degeneration." J Clin Invest
115(10): 2793-800.
Yuan, J. M., R. K. Ross, et al. (1996). "Morbidity and mortality in relation to
cigarette smoking in Shanghai, China. A prospective male cohort study."
JAMA 275(21): 1646-50.
Zareparsi, S., K. E. Branham, et al. (2005). "Strong association of the Y402H
variant in complement factor H at 1q32 with susceptibility to age-related
macular degeneration." Am J Hum Genet 77(1): 149-53.
Zareparsi, S., M. Buraczynska, et al. (2005). "Toll-like receptor 4 variant D299G
is associated with susceptibility to age-related macular degeneration."
Hum Mol Genet 14(11): 1449-55.
Zaykin, D. V., P. H. Westfall, et al. (2002). "Testing association of statistically
inferred haplotypes with discrete and continuous traits in samples of
unrelated individuals." Hum Hered 53(2): 79-91.
Zeggini, E., M. N. Weedon, et al. (2007). "Replication of genome-wide
association signals in UK samples reveals risk loci for type 2 diabetes."
Science 316(5829): 1336-41.
Zhao, X., C. Li, et al. (2004). "An integrated view of copy number and allelic
alterations in the cancer genome using single nucleotide polymorphism
arrays." Cancer Res 64(9): 3060-71.
Ziv, E., E. M. John, et al. (2006). "Genetic ancestry and risk factors for
breast cancer among Latinas in the San Francisco Bay Area." Cancer
Epidemiol Biomarkers Prev 15(10): 1878-85.
Zwarts, K. Y., S. M. Clee, et al. (2002). "ABCA1 regulatory variants influence
coronary artery disease independent of effects on plasma lipid levels." Clin
Genet 61(2): 115-25.
Abstract
Age-related macular degeneration (AMD) is the leading cause of blindness among elderly individuals worldwide. Previous studies have confirmed that the frequencies of several risk alleles differ between ethnic groups, and have underscored the pressing need to identify the genetic factors that remain unknown in the pathogenesis of AMD. This study seeks to identify new genes associated with the risk of developing early AMD among Latinos, using a genome-wide association study (GWAS) and a copy number variation (CNV) analysis based on a series of proposed CNV association tests. In addition, we performed a population admixture study based on 233 ancestry informative markers.
Asset Metadata
Creator
Shtir, Corina Julia (author)
Core Title
An integrative approach for determining age related macular degeneration risk facors in Latinos
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Biostatistics
Publication Date
04/07/2009
Defense Date
12/17/2008
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
age related macular degeneration,CNV,GRAS,Latinos,OAI-PMH Harvest,population admixture
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Azen, Stanley Paul (committee chair), Gauderman, W. James (committee member), Humayun, Mark S. (committee member), Marjoram, Paul (committee member), Varma, Rohit (committee member)
Creator Email
corina@shtir.com,stir@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-m2057
Unique identifier
UC1317102
Identifier
etd-Shtir-2764 (filename),usctheses-m40 (legacy collection record id),usctheses-c127-208950 (legacy record id),usctheses-m2057 (legacy record id)
Legacy Identifier
etd-Shtir-2764.pdf
Dmrecord
208950
Document Type
Dissertation
Rights
Shtir, Corina Julia
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name
Libraries, University of Southern California
Repository Location
Los Angeles, California
Repository Email
cisadmin@lib.usc.edu