BREAST EPITHELIAL CELL TYPE SPECIFIC ENHANCERS AND
FUNCTIONAL ANNOTATION OF BREAST CANCER RISK LOCI
By
Suhn Kyong Rhie
A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(GENETIC, MOLECULAR, AND CELLULAR BIOLOGY)
May 2013
Copyright 2013 Suhn Kyong Rhie
Acknowledgements
Foremost, I would like to express my gratitude to my advisor, Dr. Gerhard A.
Coetzee, who has supported my research with patience, guidance, encouragement,
and caring. I am thankful for his support from beginning to end.
I would like to thank my thesis committee, Dr. Ite A. Laird-Offringa for
kindness, encouragement, and support. I am grateful to Dr. Christopher A. Haiman,
who guided me and enabled me to develop an understanding of epidemiology.
I would like to express my thanks to Dr. Young-Kwon Hong and Dr.
Michael R. Stallcup for their encouragement and useful discussions. I also thank Dr. Peter
Laird and Dr. Peter A. Jones, who provided insightful comments and questions
through the Jones Lab meetings.
I thank my lab members, Chunli Yan, Simon Coetzee, Houtan Noushmehr,
Omar Khalid, Li Jia, John Lai, and Dennis Hazelett. My research would not have been
possible without their help.
I would like to thank my colleagues, Hui Shen, Wanting Chen, Sheng Fang Su,
Genyuan Zhu, Inkyong Mah, and Young Kim for continuous friendship. I was so
happy to walk this journey with them. Many thanks to Dr. Kwang won Jeong and Dr.
Jueng soo You, who discussed many science topics with me.
Last but not least, I would like to thank my husband, Jae Mun Hugo Kim,
my mother, Ji Youn Choo Rhie, my father, Jong Soo Jason Rhie, and my family.
Without their understanding, support, and love, I would not have been able to finish this
journey. I also give all glory and thanks to God, who gives me strength, love, and
hope every day and night.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Genetic Predisposition to Breast Cancer (BCa)
Introduction
Breast Cancer
What is Breast cancer?
Classification of BCa
Breast cancer risk factors
Genetic Predisposition to Breast Cancer
Linkage analysis
Mutational screening
Positional cloning
GWAS
Risk loci within coding regions of genes
post-GWAS
Risk loci in non-coding regions
Thesis Perspective
Chapter 2: Breast epithelial cell enhancers
Introduction
Transcription and gene regulation by enhancers
Tissue specific enhancers
Enhancer identification methods (ChIP-seq, DNaseI-seq, FAIRE-seq)
Results and Discussion
Chapter 2 Summary
Chapter 3: Comprehensive Functional Annotation of Seventy-One Breast Cancer Risk Loci
Introduction
From index risk SNP to functional SNP
Results
1,005 potentially functional high LD SNPs in 71 breast cancer risk loci
Among 21 high LD SNPs in exons, only 2 SNPs in the ANKLE1 gene were predicted to be non-benign coding variants.
Among 76 high LD SNPs in TSS regions, 42 reside within response elements of transcription factors.
921 high LD SNPs were found at enhancers, and the enhancer activity varied among breast epithelial cells.
Discussion
Most TFs, which likely bind to high LD SNPs at enhancers
Chapter 3 Conclusions
Chapter 4: One example, BCE5 enhancer in 8q24
Introduction
8q24 risk loci
Results and Discussion
8q24 is in open chromatin region likely representing an enhancer nursery
BCE5 is defined as a biologically functional enhancer in breast cells
Differential nucleosome depletion and functionality of SNPs, rs28759353 and rs10087810.
BCE5 enhancer activity and its interaction with the c-MYC oncogene are cell type specific.
Chapter 4 Summary
Chapter 5: Functional analyses of GWASs
Two novel breast cancer susceptibility loci at 6q14 and 20q11
GWAS of breast cancer - two novel breast cancer susceptibility loci at 6q14 and 20q11
Functional annotation of two novel breast cancer susceptibility loci, 6q14 and 20q11
Four novel breast cancer susceptibility loci from meta-analysis
GWAS of ER-negative breast cancer in a meta-analysis - four susceptibility loci (Garcia-Closas, 2013)
Functional annotation on four risk loci
Novel Loci Associated with Body Mass Index Identified in a Large Genome-wide Association Study of Men and Women of African Ancestry
GWAS of Body Mass Index in meta-analysis
Biological Functions of the Three Novel Loci
Functional annotation on Schizophrenia GWAS risk Loci
GWAS on Schizophrenia (SCZ)
Functional annotation of SCZ risk loci
Chapter 6: Materials and Methods
Cell Culture
FAIRE
FAIRE-seq library construction and sequencing
Identification of FAIRE-seq peaks
Site-specific FAIRE
ChIP
Chromatin Immunoprecipitation-seq library construction and sequencing
Identification of ChIP-seq peaks
Histone modification ChIP-seq data for HMEC
DNaseI
DNaseI-seq data
Plasmid construction and Luciferase reporter assays
Annotation and Comparison between normal and cancer cells
Identification of the HMEC specific enhancer loci and MDAMB231 specific enhancer loci by enhancer status
Open chromatin nursery regions Identification
Circos for the open chromatin nursery regions
Gene expression analysis between breast cancer and normal tissues
Gene expression analysis between HMEC and MDAMB231 cells
Motif discovery for SNP
Motif discovery for breast epithelial cell type specific enhancers
Transcription factor and gene/protein interaction analysis
eQTL analyses
FunciSNP Method
FunciSNP package (Coetzee et al., 2012)
FunciSNP for the seventy-one breast cancer risk loci
FunciSNP for two novel breast cancer risk loci; 6q14 and 20q11 (Siddiq et al., 2012)
FunciSNP for the three novel ER negative breast cancer risk loci (Garcia-Closas, 2013)
FunciSNP for novel loci associated with Body Mass Index (BMI) (Monda et al., 2013)
FunciSNP for novel loci associated with schizophrenia (SCZ)
Functional SNP validation
Allele-specific Luciferase reporter assays
Allele-specific FAIRE
Chapter 7: Concluding Remarks
References
List of Tables
Table 2-1: Motif enrichment in HSEL and MSEL
Table 2-2: Motif enrichment in poised HSEL and active HSEL
Table 2-3: Motif enrichment in poised MSEL and active MSEL
Table 3-1: 71 Breast cancer risk index SNPs and high LD SNPs genomic locations
Table 3-2: FunciSNP results for TSS regional high LD SNPs.
Table 3-3: Differential expression analysis of the genes, which high LD TSS regional SNPs reside in.
Table 3-4: TSS regional high LD SNP motif analysis
Table 3-5: eQTL analyses on TSS regional SNP
Table 3-6: FunciSNP result for enhancer high LD SNPs coincide with 5 or more biofeatures
Table 3-7: Top 18 TF motifs for high LD SNPs in enhancers
Table 3-8: eQTL analyses on enhancer SNPs
Table 3-9: Breast Cancer Enhancer (BCE) regions used for luciferase assays.
Table 3-10: Oligonucleotide sequences used for cloning and qPCR.
Table 4-1: Oligonucleotide sequences used for cloning and qPCR
Table 5-1: A summary of the biofeature analysis and putative functional SNPs at 5q33 (Monda et al., 2013).
Table 5-2: A summary of the biofeature analysis and putative functional SNPs at 6q16 (Monda et al., 2013).
Table 5-3: A summary of the biofeature analysis and putative functional SNPs at 7p15 (Monda et al., 2013)
Table 5-4: A list of putative functional SNPs at SCZ risk loci
Table 5-5: A list of SCZ putative functional SNPs in coding exons
List of Figures
Figure 1-1: Breast cancer susceptibility loci and genes.
Figure 2-1: Chromatin structure of the epigenome in normal human cells.
Figure 2-2: Regulatory elements in seven different cell types.
Figure 2-3: Image of Human Mammary Epithelial Cell (HMEC) and MDAMB231 cancer cell.
Figure 2-4: Open chromatin region comparison between normal and breast cancer epithelial cells.
Figure 2-5: HSEL and MSEL
Figure 2-6: Genomic distribution of HSEL and MSEL
Figure 2-7: Expression level of nearby genes of the HSEL and MSEL
Figure 2-8: The expression value of nearby genes of HSEL and MSEL
Figure 2-9: Active and Poised enhancers
Figure 2-10: Active and Poised enhancers with FAIRE signals
Figure 2-11: Workflow diagram of transcription factor motif search between enhancer groups.
Figure 3-1: Identification of potential functional SNPs in 71 Breast cancer risk loci
Figure 3-2: 21 High LD SNPs in exon and effect of each variant to the respective protein.
Figure 3-3: An example of TSS regional SNPs, rs2303696, is located in the promoter region of ISYNA1.
Figure 3-4: An example of enhancer SNPs, rs76969790 likely alters a TAL1 response element.
Figure 3-5: Nine novel enhancers including high LD SNPs were identified in breast epithelial cells.
Figure 3-6: Most TFs, likely bind to high LD SNPs at enhancers, are involved in BCa tumorigenesis.
Figure 4-1: Breast cancer risk correlated SNPs were found in selected open chromatin nursery regions in genome.
Figure 4-2: Transcript level near BCE5 enhancer in 8q24 risk region.
Figure 4-3: Enhancer assays in 8q24 region.
Figure 4-4: Site-specific FAIRE assays near BCE5 region.
Figure 4-5: rs28759353 and rs10087810 at BCE5 enhancer are functional SNPs for breast cancer risk.
Figure 4-6: 8q24 region and 3C interaction frequency of risk loci with MYC.
Figure 5-1: UCSC Genome Browser view displaying chromatin features in 20q11 region.
Figure 5-2: UCSC Genome Browser view displaying chromatin features in 6q14 region.
Figure 5-3: Chromatin features for 1q32.1/MDM4, 1q32.1/LGR6, 2p24 and 16q12.2/FTO regions in normal human mammary epithelial cells (HMEC) and triple negative breast cancer cells (MDAMB231).
Figure 5-4: UCSC Genome Browser view of the three novel loci with FunciSNP results.
Figure 5-5: UCSC Genome Browser view of the risk SNP rs478190 and functional SNPs.
Figure 5-6: UCSC Genome Browser view of the risk SNP rs7269496 and functional SNPs.
Figure 5-7: UCSC Genome Browser view of the risk SNP rs2955368 and functional SNPs.
Abstract
Breast Cancer (BCa) genome-wide association studies (GWAS) revealed allelic
frequency differences between cases and controls at index single nucleotide
polymorphisms (SNPs). To date 71 loci have thus been identified and replicated.
More than 90% of them are in either introns or intergenic regions. Here we
hypothesize that at least some of the SNPs affect the activity of non-coding genomic
regulatory regions, such as enhancers.
To identify such elements in terms of nucleosome depletion and surrounding histone modifications, we performed genome-wide formaldehyde-assisted isolation of regulatory elements (FAIRE) with sequencing, and ChIP-seq for the histone modifications H3K4me1 and H3K27Ac, in human mammary epithelial cells (HMEC) and MDAMB231 breast cancer epithelial cells. We identified 2000 HMEC specific enhancer loci (HSEL) and 2000 MDAMB231 specific enhancer loci (MSEL), and characterized their enhancer status (poised or active) by annotating histone modifications. Additionally, by analyzing gene expression data in both cell types, we found that these enhancers affect nearby gene expression, and that this effect was significantly correlated with enhancer status.
To find possible functional SNPs in 71 BCa risk loci, we extracted all SNPs residing in 1 Mb windows around each breast cancer risk index SNP from the 1000 Genomes Project and identified correlated SNPs as defined by r². We used FunciSNP, an R/Bioconductor package developed in-house with input from me, to identify potentially functional SNPs at the 71 risk loci by intersecting them with chromatin biofeatures, including enhancers. We identified 1,005 SNPs in LD with the index SNPs (r² ≥ 0.5) in three categories: 21 in exons of 18 genes, 76 in transcription start site (TSS) regions of 25 genes, and 921 in enhancers. We found two correlated and
predicted non-benign coding variants (rs8100241 in exon 2 and rs8108174 in exon
3) of the gene, ANKLE1. Most putative functional LD SNPs, however, were found in
either epigenetically defined enhancers or in gene TSS regions. Fifty-five percent of
these non-coding SNPs are likely functional, since they affect response element (RE)
sequences of transcription factors. Unbiased analyses of SNPs at BCa risk loci
revealed new and overlooked mechanisms that may affect risk of the disease,
thereby providing a valuable resource for follow-up studies.
Chapter 1. Genetic Predisposition to Breast Cancer (BCa)
1.1 Introduction
Cancer is a class of diseases characterized by abnormal cell growth and invasion of other tissues. Malignant cells can also spread to other locations in the body through the blood and lymphatic systems, a process called metastasis. Cancer can arise in many tissues of the body, and cancers are classified according to the location of the malignant tumor. More than one hundred types of cancer have been reported (Fogh et al., 1977).
Unlike single-gene disorders such as sickle-cell anemia, which are caused by a mutation in a single gene, most cancers are complex disorders that arise from a combination of multiple factors, including multiple genes, lifestyle, and the environment (Boffetta and Nyberg, 2003). Complex disorders often cluster in families, which implies that inheritance also plays an important role in cancer predisposition (Lichtenstein et al., 2000). However, the complex interactions among these factors and their labyrinthine combinatorial effects on cancer prognosis have challenged researchers for decades. After the human genome was completely sequenced in 2003, various projects were started to catalog genetic variants in the human genome. Using identified genetic variants such as single nucleotide polymorphisms (SNPs), epidemiologists have performed genome-wide association studies (GWAS) linking SNP allele frequencies with several cancer phenotypes. However, most SNPs identified in this way are located in non-coding regions of the genome, and the functional and causal genetic variants have yet to be identified. Post-GWAS studies are therefore considered essential for understanding how risk SNPs contribute to carcinogenesis, and their results may eventually be used to determine risk a priori.
1.2 Breast Cancer
1.2.1 What is Breast cancer?
Breast cancer (BCa) is the most common non-skin cancer among American women. The American Cancer Society estimates that a woman’s lifetime chance of developing invasive breast cancer is about 1 in 8. Female breast cancer incidence rates increased for decades, but they decreased by about 2% per year from 1998 to 2007, mainly among women aged 50 or older; this decline has been largely attributed to reduced use of hormone therapy after menopause. However, breast cancer is still the second leading cause of cancer death in women, and about 1 in 35 women will die of breast cancer (2008). Early-stage breast cancer usually causes no symptoms; it is typically noticed as a lump or mass in the breast or detected by mammography. Therefore, screening strategies such as routine mammography are the main way to diagnose breast cancer at an early stage.
1.2.2 Classification of BCa
BCa can be classified using different parameters that influence treatment response and prognosis. The principal classification of breast cancer is the tumor-node-metastasis (TNM) staging system. The size of the tumor (T), the spread of the primary tumor to the lymph nodes (N), and the presence of distant metastases (M) determine the stage of cancer, from 0 to 4. Stage 4 denotes metastatic cancer, which is almost incurable (Singletary and Connolly, 2006).
Molecular receptor status is also used to categorize tumors into at least five major breast cancer subtypes: basal-like, human epidermal growth factor receptor 2 (HER2, also known as ERBB2) positive, estrogen receptor (ER) positive, luminal B, and luminal A. Basal-like tumors largely correspond to ‘triple-negative’ breast cancer [ER negative, progesterone receptor (PR) negative, HER2 negative]. The HER2-positive subtype expresses HER2 but not ER or PR. Luminal A tumors express ER and/or PR but not HER2, whereas luminal B tumors are ER and/or PR positive as well as HER2 positive (Perou et al., 2000; Sorlie et al., 2001). Luminal tumors tend to be associated with more favorable prognoses, whereas HER2-positive and basal-like tumors have been associated with the worst prognoses (Nielsen et al., 2004; Sotiriou and Pusztai, 2009).
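The receptor-based rules above can be written as a small decision function. This is a minimal sketch for illustration only: the function name `receptor_subtype` is hypothetical, it encodes only the ER/PR/HER2 rules summarized in this section, and it is not a clinical classifier. The intrinsic subtypes of Perou et al. and Sorlie et al. were defined by gene expression profiling, so receptor status is only an approximation of them.

```python
def receptor_subtype(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Map receptor status to a subtype label using the simplified rules in the text.

    Hypothetical illustration, not a clinical test: receptor status only
    approximates the expression-defined intrinsic subtypes.
    """
    hormone_receptor_positive = er_positive or pr_positive
    if hormone_receptor_positive and her2_positive:
        return "luminal B"  # ER and/or PR positive, HER2 positive
    if hormone_receptor_positive:
        return "luminal A"  # ER and/or PR positive, HER2 negative
    if her2_positive:
        return "HER2 positive"  # ER negative, PR negative, HER2 positive
    return "basal-like (triple negative)"  # ER, PR, and HER2 all negative


# Usage: a tumor negative for all three receptors
print(receptor_subtype(False, False, False))  # prints: basal-like (triple negative)
```

The order of the checks matters: hormone-receptor status is tested first, so a HER2-positive tumor that also expresses ER or PR is labeled luminal B rather than HER2 positive, matching the definitions above.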
1.2.3 Breast cancer risk factors
Many factors increase the risk of developing breast cancer: early onset of menstruation, late onset of menopause, radiation exposure, heavy alcohol consumption, a high-fat diet, obesity, first pregnancy after age 30, and genetic predisposition (Hulka and Stark, 1995). Breast cancer risk also has a substantial inherited component (Lichtenstein et al., 2000). For example, having a first-degree relative with breast cancer increases a woman’s risk of breast cancer about 2-fold. Additionally, several genetic loci are involved in the development of breast cancer (Martin and Weber, 2000). For instance, carriers of alterations in familial breast cancer genes such as BRCA1 or BRCA2 are at much higher risk (lifetime risk increases about ten-fold). Ethnic background also affects breast cancer incidence and death rates; for example, triple-negative breast cancer occurs at a higher rate among African-American women than in other ethnic groups (Carey et al., 2006). Because early detection is difficult, it is crucial to understand the development of breast cancer and its genetic predisposition.
1.3 Genetic Predisposition to Breast Cancer
1.3.1 Linkage analysis
Linkage analysis is a statistical technique used to locate genes involved in a trait or disease relative to chromosomal markers of known location. The method is based on the co-segregation of chromosomal markers with the disease within disease pedigrees. Genome-wide linkage analyses performed in large breast cancer pedigrees led to the identification of the breast cancer susceptibility genes BRCA1 and BRCA2 (Hall et al., 1990; Wooster et al., 1995).
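Co-segregation in linkage analysis is conventionally summarized by a LOD score: the base-10 logarithm of the likelihood of the pedigree data under linkage at recombination fraction θ, divided by the likelihood under no linkage (θ = 0.5). A LOD of 3 or more is the classical threshold for significant linkage. For R recombinant and NR non-recombinant meioses, LOD(θ) = log10[θ^R (1 − θ)^NR / 0.5^(R + NR)]. The sketch below is a simplification that assumes fully informative, phase-known meioses. The function names are hypothetical, and real pedigree analyses must also handle unknown phase, missing genotypes, and disease penetrance.

```python
import math
from typing import Tuple


def lod_score(recombinants: int, non_recombinants: int, theta: float) -> float:
    """LOD score for phase-known meioses at recombination fraction theta.

    LOD(theta) = log10[theta^R * (1 - theta)^NR / 0.5^(R + NR)]
    """
    if not 0.0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    n = recombinants + non_recombinants
    log_l_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1.0 - theta)
    log_l_unlinked = n * math.log10(0.5)
    return log_l_linked - log_l_unlinked


def max_lod(recombinants: int, non_recombinants: int) -> Tuple[float, float]:
    """Grid search over theta = 0.01, 0.02, ..., 0.50; returns (max LOD, theta)."""
    thetas = [k / 100 for k in range(1, 51)]
    return max((lod_score(recombinants, non_recombinants, t), t) for t in thetas)


# Usage: 2 recombinants among 20 informative meioses
best_lod, best_theta = max_lod(2, 18)
print(f"max LOD = {best_lod:.2f} at theta = {best_theta}")  # prints: max LOD = 3.20 at theta = 0.1
```

In this toy example, the maximum LOD is reached at θ = R/n = 0.1 and just exceeds the classical threshold of 3.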
BRCA1 and BRCA2 are large genes with important roles in DNA double-strand break repair and the maintenance of genomic stability. Many loss-of-function and truncating mutations in these genes significantly increase the risk of breast cancer; for instance, the BRCA1 mutations 185delAG and 5382insC and the BRCA2 mutations 6174delT and 999del5 are each independently associated with risk (Linger and Kruk, 2010). Mutations in these genes confer a high relative risk of breast cancer (RR > 10) with very high penetrance (Fig. 1-1). Early studies of large cancer families reported that the risk of breast cancer by age 70 may be about 87% for BRCA1 and 84% for BRCA2 mutation carriers (Ford et al., 1995). However, these mutations are largely confined to high-risk families and are very rare in the general population (Graeser et al., 2009).
1.3.2 Mutational screening
After BRCA1 and BRCA2 were identified as breast cancer susceptibility genes, studies were performed to understand their molecular pathogenesis. To identify additional candidate susceptibility genes, multiple mutational screens were performed beginning in the 1990s, in which the entire coding exons of candidate genes were sequenced in large numbers of cases and controls to find pathogenic mutations. These studies revealed that DNA repair-related genes such as TP53, CHEK2, ATM, BRIP1, and PALB2 are associated with hereditary breast cancer (Campeau et al., 2008; Walsh et al., 2006) (Fig. 1-1).
1.3.3 Positional cloning
Positional cloning, which finds causative genes near linked markers, revealed additional causative genes such as PTEN, STK11, and CDH1 (Guilford et al., 1998; Hemminki et al., 1998; Hemminki et al., 1997; Jenne et al., 1998; Nelen et al., 1996; Nelen et al., 1997). However, unlike BRCA1 and BRCA2, which were identified in breast cancer families, these genes were identified in families with other cancer syndromes (e.g., Cowden syndrome for PTEN, Peutz-Jeghers syndrome for STK11, and hereditary diffuse gastric cancer for CDH1) that are associated with an increased risk of breast cancer. Mutations in PTEN, STK11, and CDH1 confer a moderate relative risk of breast cancer (RR = 2 to 10) (Fig. 1-1). However, it is not certain how much mutations in these genes contribute to the attributable risk of familial breast cancer.
1.3.4 GWAS
In recent years, using new molecular technologies, molecular epidemiologists have performed numerous genome-wide association studies (GWAS), in which hundreds of thousands of common single-nucleotide polymorphisms (SNPs) are tested for association with a disease in thousands of people by comparing allele frequencies between cases and matched controls. These studies have identified many common alleles, each of which confers a relatively small risk of cancer (RR < 1.5) (Fig. 1-1). However, these small effect sizes, together with a lack of follow-up functional studies that identify molecular mechanisms, have hampered the interpretation of present and future GWAS.
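The case-control comparison at the heart of a GWAS can be illustrated for a single SNP with an allelic 2 × 2 test. This is a minimal sketch, not the analysis used in any particular study: the function name and the allele counts are hypothetical. Published GWAS typically use logistic regression with covariates (for example, ancestry principal components) rather than this simple allele-count test. The odds ratio it reports approximates the relative risk when the disease is rare.

```python
import math
from typing import Tuple


def allelic_association(case_alt: int, case_ref: int,
                        ctrl_alt: int, ctrl_ref: int) -> Tuple[float, float, float]:
    """Allelic odds ratio and 1-df chi-square test from allele counts.

    Rows are cases and controls; columns are alternative and reference alleles.
    Returns (odds_ratio, chi_square, p_value).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi_square += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_square / 2))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each);
# alternative allele frequency 0.30 in cases versus 0.26 in controls.
or_, chi2, p = allelic_association(1200, 2800, 1040, 2960)
print(f"OR = {or_:.2f}, chi2 = {chi2:.2f}, p = {p:.1e}")
```

With these made-up counts, the odds ratio is about 1.22 and the p-value falls between 10⁻⁵ and 10⁻⁴. That would not reach the conventional genome-wide significance threshold of 5 × 10⁻⁸, which is why detecting alleles of such modest effect requires very large samples.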
1.3.5 Risk loci within coding regions of genes
Although, as stated above, most risk loci from BCa GWAS are in non-coding regions of the human genome, a few coding variants have been identified. DCLRE1B, CASP8, BRCA2, and ANKLE1 are examples of genes in which breast cancer risk index SNPs reside in coding regions. Such variants are obvious candidates for a mechanistic role because they can alter the encoded protein. For instance, BRCA2, a regulator of DNA repair, is a well-known breast cancer susceptibility gene (see above). However, coding variants represent only a small minority of risk index SNPs. Moreover, index SNPs are most likely surrogates for many other SNPs in linkage disequilibrium (LD), because most GWAS used arrays containing a subset of SNPs, selected on the basis of HapMap data, to capture a large fraction of common genetic variation (Altshuler et al., 2010).
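Because index SNPs act as surrogates, the first step from an index SNP toward candidate functional variants is to find the SNPs correlated with it. The standard pairwise measure is r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B is the linkage disequilibrium coefficient computed from phased haplotypes. The sketch below assumes biallelic SNPs with alleles coded 0/1 on phased haplotypes (for example, from a reference panel such as the 1000 Genomes Project). The function names and the toy haplotypes are hypothetical; the default r² ≥ 0.5 cutoff mirrors the threshold reported in the Abstract.

```python
from typing import Dict, List, Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[k] and hap_b[k] are the alleles carried on the same haplotype k.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("need two non-empty haplotype vectors of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined for a monomorphic SNP")
    return d * d / denominator


def correlated_snps(index_hap: Sequence[int],
                    candidates: Dict[str, Sequence[int]],
                    threshold: float = 0.5) -> List[str]:
    """Names of candidate SNPs whose r^2 with the index SNP is >= threshold."""
    return [name for name, hap in candidates.items()
            if ld_r2(index_hap, hap) >= threshold]


# Toy haplotypes (hypothetical), eight haplotypes per SNP
index_snp = [1, 1, 1, 0, 0, 0, 0, 0]
candidates = {
    "snp_perfect_proxy": [1, 1, 1, 0, 0, 0, 0, 0],
    "snp_partial_proxy": [1, 1, 0, 0, 0, 0, 0, 0],
    "snp_unlinked": [1, 0, 1, 0, 1, 0, 1, 0],
}
print(correlated_snps(index_snp, candidates))  # prints: ['snp_perfect_proxy', 'snp_partial_proxy']
```

Here the partial proxy has r² ≈ 0.56 and passes the cutoff, while the unlinked SNP (r² ≈ 0.07) is excluded.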
1.4 post-GWAS
1.4.1 Risk loci in non-coding regions
More than ninety percent of breast cancer risk loci identified by GWAS are in non-coding regions such as introns and intergenic regions (and most of their correlated SNPs in LD also lie outside coding regions; see Chapter 3). Among them, more than forty-five percent are in gene deserts, with no annotated genes within ±50 kb of the risk locus. Knowledge of non-coding regions is rudimentary compared with that of protein-coding regions. However, recent ENCODE data dramatically demonstrate that the non-coding part of the genome is much more than simply ‘junk’ DNA and contains well-demarcated gene regulatory regions, in particular enhancers (Ecker et al., 2012). We have recently formulated a roadmap to address the functionality of risk SNPs in non-coding regions by characterizing gene regulatory regions with nucleosome and transcription factor occupancy and histone modifications (Freedman et al., 2011). Moreover, several research groups have annotated genomic regions (coding and non-coding) to identify candidate functional SNPs involved in complex diseases (Cowper-Sal Lari et al., 2012; Hardison, 2012; Maurano et al., 2012; Schaub et al., 2012). However, as more next-generation sequencing (NGS) data (chromatin annotations from consortia such as ENCODE), more loci (from meta-analyses and primary GWAS), and more SNPs at ever lower minor allele frequencies (from the 1000 Genomes Project) become available, further analyses using updated data and methods are needed for specific diseases such as BCa. The roles of these non-coding regions must therefore be studied to identify the functional and causal risk SNPs.
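Two annotation steps described in this section, asking whether a risk SNP lies in a gene desert (no annotated gene within ±50 kb) and whether it overlaps a regulatory element such as an enhancer, reduce to interval arithmetic. The sketch below is a simplified illustration, not the FunciSNP implementation. The `Interval` class, the `annotate_snp` function, the gene and enhancer names, and the coordinates are all hypothetical, and every interval is assumed to be on the same chromosome. In practice, genome-scale overlaps are usually computed with tools such as Bioconductor's GenomicRanges (`findOverlaps`) or BEDTools.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Interval:
    """Hypothetical genomic interval on one chromosome (0-based, half-open)."""
    name: str
    start: int
    end: int


def distance_to(pos: int, iv: Interval) -> int:
    """Distance in bp from a SNP position to an interval (0 if inside it)."""
    if pos < iv.start:
        return iv.start - pos
    if pos >= iv.end:
        return pos - (iv.end - 1)  # distance to the last base of the interval
    return 0


def annotate_snp(pos: int, genes: List[Interval], features: List[Interval],
                 window: int = 50_000) -> Dict[str, Union[bool, List[str]]]:
    """Flag gene-desert status and list overlapping regulatory features."""
    nearby = [g.name for g in genes if distance_to(pos, g) <= window]
    overlapping = [f.name for f in features if f.start <= pos < f.end]
    return {"gene_desert": not nearby,
            "genes_within_window": nearby,
            "overlapping_features": overlapping}


genes = [Interval("GENE_A", 100_000, 120_000)]
enhancers = [Interval("enhancer_1", 180_000, 181_000)]
print(annotate_snp(180_500, genes, enhancers))
# prints: {'gene_desert': True, 'genes_within_window': [], 'overlapping_features': ['enhancer_1']}
```

A SNP like this one, in an enhancer more than 50 kb from any gene, is the kind of non-coding candidate that gene-centric annotation would overlook.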
1.5 Thesis Perspective
This chapter provides a general introduction to breast cancer, covering its classification and risk factors. The main focus of this dissertation, however, is to understand the role of non-coding regions of the genome in breast cancer predisposition by identifying functional SNPs.
In Chapter 2, breast epithelial cell type specific enhancers are identified and characterized to reveal the role of non-coding DNA regions in normal and breast cancer genomes. Chapter 3 describes functional SNPs identified in seventy-one breast cancer risk loci. In Chapter 4, a specific enhancer, BCE5, in the chromosome 8q24 region is assessed and functional SNPs within it are identified. Chapter 5 describes functional analyses of two additional breast cancer GWAS and of GWAS on body mass index (BMI) and schizophrenia (SCZ). Chapter 6 explains the materials and methods in detail, including the in-house R package FunciSNP. Finally, Chapter 7 presents a discussion of post-GWAS approaches and breast cancer genome structure.
Figure 1-1. Breast cancer susceptibility loci and genes. All known breast cancer susceptibility genes are shown between the red and blue lines. So far, no genes have been identified below the blue line, and none have been found above the red line (Foulkes, 2008).
Foulkes, W.D. (2008). Inherited susceptibility to common cancers. N Engl J Med 359, 2143-2153.
Chapter 2. Breast epithelial cell enhancers
2.1 Introduction
2.1.1 Transcription and gene regulation by enhancers
The human genome comprises 22 pairs of autosomes and one pair of sex chromosomes. This single genome gives rise to several hundred distinct cell types, such as epithelial cells, neurons, blood cells, and germ cells. The human genome was sequenced in 2003, which allowed a profound understanding of primary DNA sequence. However, primary DNA sequence alone cannot answer the important question of how one genome can give rise to a large number of diverse cells and tissues. It has become apparent that dynamic gene expression, largely determined by regulatory networks and epigenetic mechanisms, may explain how cellular phenotypes are defined.
Genomic DNA is organized into chromatin. In each nucleosome core particle, 147 base pairs of DNA are wrapped around an octamer containing two copies each of the four core histone proteins, H2A, H2B, H3, and H4. In general, chromatin exists in two states: euchromatin and heterochromatin (Fig. 2-1) (Baylin and Jones, 2011). In heterochromatin, nucleosome arrays are folded into 30 nm fibers, forming tightly packaged chromatin. In this compact chromatin, most nuclear processes are inhibited; for instance, protein-coding genes in heterochromatin are generally not expressed. In euchromatin, by contrast, the chromatin is ‘open’ and loosely packaged, forming a ‘beads on a string’ structure. In this form, nuclear processes such as transcription can take place, and dynamic nucleosome positioning influences gene expression.
Nucleosome positions and histone marks play important roles in demarcating
regulatory elements such as promoters, enhancers, and repressed regions.
Specifically, nucleosome-depleted regions mark active open chromatin regions
where RNA polymerases and transcription factors bind and function in the
regulation of gene expression.
The binding of transcription factors to specific regulatory DNA sequences in
the genome is crucial for the precise and coordinated regulation of transcription.
Transcription factors bind both to promoter regions and to distal sites such as enhancers, which can form DNA loops to contact the genes they regulate. DNA looping and the interplay among transcription factors, co-activators, and co-repressors result in a particular transcriptional output of target genes. These dynamic processes at regulatory elements are essential for regulating gene expression, and non-coding regulatory elements at sites distal to the transcription start site (TSS) may in many cases be more important than promoters.
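Sequence-specific binding is commonly modeled with a position weight matrix (PWM): each candidate site is scored by summing, over positions, the log-odds of the observed base under the motif versus a background. A SNP that lowers the best score within a response element may weaken binding. The sketch below assumes a hypothetical four-base motif and a uniform background of 0.25 per base; it is not a real transcription factor motif (real matrices come from databases such as JASPAR), and the function names are hypothetical. Both DNA strands are scanned because transcription factors can bind either strand.

```python
import math
from typing import Dict, List

# Hypothetical 4-bp motif (best match "AGCT"), given as per-position base probabilities.
TOY_PWM: List[Dict[str, float]] = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
BACKGROUND = 0.25  # assumed uniform base composition
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def site_score(site: str, pwm: List[Dict[str, float]]) -> float:
    """Log2-odds score of one site against the background."""
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(site))


def best_score(seq: str, pwm: List[Dict[str, float]]) -> float:
    """Highest site score over all windows on both strands."""
    width = len(pwm)
    strands = (seq, seq.translate(COMPLEMENT)[::-1])  # forward and reverse complement
    return max(site_score(s[i:i + width], pwm)
               for s in strands for i in range(len(s) - width + 1))


def allele_effect(left: str, ref: str, alt: str, right: str,
                  pwm: List[Dict[str, float]]) -> float:
    """Reference minus alternative best score; positive means alt weakens the site."""
    return best_score(left + ref + right, pwm) - best_score(left + alt + right, pwm)


# The reference allele "A" completes the site AGCT; the alternative "C" disrupts it.
print(round(allele_effect("TT", "A", "C", "GCTTT", TOY_PWM), 3))  # prints: 2.807
```

The score difference equals log2(0.7/0.1), the change at the single mismatched motif position, which is why allele-specific motif scores are a convenient first filter for candidate functional SNPs.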
2.1.2 Tissue specific enhancers
Recently, a number of studies have investigated the roles of regulatory elements distal to genes. For instance, Song et al. analyzed regulatory elements in seven cell types: GM12878 (lymphoblastoid cells), K562 (chronic myeloid leukemia), HepG2 (hepatocellular carcinoma), HeLa-S3 (cervical carcinoma), HUVEC (human umbilical vein endothelial cells), NHEK (normal human epidermal keratinocytes), and H1-ES (human embryonic stem cells) (Song et al., 2011). The genomic locations of regulatory elements were classified into five groups: 2kb upstream of the transcription start site (TSS), first exon and intron, internal exons and introns, last exon and 2kb downstream, and intergenic regions. Interestingly, ubiquitous peaks, found in all seven cell types, were mainly located near TSSs, whereas cell type selective peaks, unique to each cell type, were located in non-coding regulatory regions such as enhancers in intergenic and intronic regions (Song et al., 2011) (Fig. 2-2). Another study, which annotated tissue-specific distal regulatory elements, revealed that differences in chromatin organization are key to the multiplicity of cell states (Zhu et al., 2013). According to this study, distal elements form clusters with similar cell-type specificities, and specific transcription factors are enriched in each cluster.
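The five location classes and the distinction between ubiquitous and cell type selective peaks used by Song et al. can be made concrete with a short sketch. It is a simplified illustration under stated assumptions, not the procedure of that study. Each peak is reduced to a single position (for example, its summit), the gene lies on the plus strand and is given as sorted exon intervals with inclusive ends, and peaks have already been matched across cell types as shared region identifiers. All names and coordinates are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def location_class(pos: int, exons: List[Tuple[int, int]], flank: int = 2000) -> str:
    """Assign a peak position to one of five classes relative to a plus-strand gene."""
    tss, tes = exons[0][0], exons[-1][1]
    second_exon_start = exons[1][0] if len(exons) > 1 else tes + 1
    last_exon_start = exons[-1][0]
    if tss - flank <= pos < tss:
        return "2kb upstream of TSS"
    if tss <= pos < second_exon_start:
        return "first exon and intron"
    if last_exon_start <= pos <= tes + flank:
        return "last exon and 2kb downstream"
    if tss <= pos <= tes:
        return "internal exon and intron"
    return "intergenic"


def ubiquitous_and_selective(peaks: Dict[str, Set[str]]) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Peaks shared by all cell types, and peaks unique to each cell type."""
    ubiquitous = set.intersection(*peaks.values())
    selective = {
        cell: ids - set().union(*(other for name, other in peaks.items() if name != cell))
        for cell, ids in peaks.items()
    }
    return ubiquitous, selective


gene = [(10_000, 10_200), (12_000, 12_100), (15_000, 15_500)]
for p in (9_000, 11_000, 13_000, 16_000, 30_000):
    print(p, location_class(p, gene))
```

Applied genome-wide, the first function would reproduce the kind of location breakdown shown in Fig. 2-2, and the second separates peaks shared by all cell types from those seen in only one.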
In human cancer, global loss of monoacetylation at lysine 16 and trimethylation at lysine 20 of histone H4 (H4K16 and H4K20) appears in the early phases of cell transformation and increases with tumor progression (Elsheikh et al., 2009). Additionally, DNA methylation, a biochemical process in which a methyl group is added to cytosine, alters gene expression and contributes to the cancer phenotype (Baylin and Jones, 2011).
Epigenetic changes are also critical for the development and progression of breast cancer. For example, tumor suppressor genes such as CDKN2/p16 are silenced via aberrant CpG island hypermethylation in breast cancer (Herman et al., 1995). Many histone acetyltransferases (HATs), which acetylate histones and thereby open chromatin, are involved in breast cancer (Kimura et al., 2005; Sterner and Berger, 2000; Yang, 2004). For instance, the well-characterized HAT proteins p300/CBP regulate cell proliferation and function as tumor suppressors (Phillips and Vousden, 2000).
Therefore, I hypothesize that enhancers may distinguish breast epithelial cells with different phenotypes. To identify specific enhancers related to two extreme breast epithelial phenotypes, we chose human mammary epithelial cells (HMEC) and the breast cancer cell line MDAMB231 as models, because they differ markedly in gene expression profiles and cell morphology (Fig. 2-3). Additionally, both cell types are ER negative. They were selected to avoid potential complications due to the presence of the estrogen receptor (ER) in certain BCa cells, and because BCa is known to be able to arise from the transformation of a normal ER-negative mammary epithelial cell (Prat and Perou, 2009).
2.1.3 Enhancer identification methods (ChIP-seq, DNaseI-seq, FAIRE-seq)
Epigenome-wide studies have identified various histone marks and chromatin features that annotate the genome and detect potential regulatory activity. For example, H3K4me3 is more likely associated with promoters (Heintzman et al., 2007; Mikkelsen et al., 2007), whereas H3K4me2 is associated with both promoters and enhancers (Heintzman et al., 2007). H3K4me1, on the other hand, is more likely associated with enhancers (Bernstein et al., 2005). H3K9Ac is associated with open chromatin regions (Bernstein et al., 2005; Heintzman et al., 2007), whereas H3K27Ac is more likely associated with active (engaged) regulatory elements (Bernstein et al., 2005; Heintzman et al., 2007). H3K27me3 is associated with polycomb-repressed regions (Prescott et al., 2007), whereas H3K36me3 and H4K20me1 are associated with transcribed regions. Chromatin immunoprecipitation (ChIP) assays using antibodies against these marks can therefore identify the corresponding regulatory elements.
Additionally, nucleosome-depleted regions can be detected as DNaseI hypersensitive sites and/or by Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) (Song et al., 2011). Both methods identify nucleosome-depleted regions, including transcription factor binding sites (Neph et al., 2012). However, the FAIRE assay appears to detect more nucleosome-depleted regions than DNaseI in introns, exons and non-promoter intergenic regions. In contrast, DNaseI hypersensitive sites are more enriched within 2kb of the TSS and within 5' exons and introns (Song et al., 2011).
Enhancers are non-directional regulatory elements that control gene expression at a distance on linear DNA (Ong and Corces, 2011). They are typically found in so-called non-coding DNA, such as intergenic regions and introns. Here, we mapped nucleosome-depleted regions genome-wide using FAIRE-seq in HMEC (normal breast epithelial cells) and MDAMB231 (triple-negative BCa cells). We then used enhancer histone marks (H3K4me1 and H3K27Ac) to refine the identified enhancers.
Figure 2-1. Chromatin structure of the epigenome in normal human cells.
Chromatin architecture is an important factor in the regulation of gene expression.
Nucleosome positioning and histone modifications demarcate regulatory elements
(Baylin and Jones, 2011).
Baylin, S.B., and Jones, P.A. (2011). A decade of exploring the cancer epigenome -
biological and translational implications. Nat Rev Cancer 11, 726-734.
Figure 2-2. Regulatory elements in seven different cell types. The percentages of ubiquitous and cell-type-selective open chromatin sites differ with respect to proximity to transcription start sites and the presence of CTCF (Song et al., 2011).
Song, L., Zhang, Z., Grasfeder, L.L., Boyle, A.P., Giresi, P.G., Lee, B.K., Sheffield, N.C.,
Graf, S., Huss, M., Keefe, D., et al. (2011). Open chromatin defined by DNaseI and
FAIRE identifies regulatory elements that shape cell-type identity. Genome Res 21,
1757-1767.
Figure 2-3. Images of human mammary epithelial cells (HMEC) and MDAMB231 breast cancer cells. HMEC and MDAMB231 cells were observed under a light microscope (200x magnification). (A) HMEC (B) MDAMB231
2.2 Results and Discussion
2.2.1 Nucleosome-depleted regions of two extreme breast epithelial cell types
To compare nucleosome-depleted regions between two extreme breast epithelial cell types, we performed FAIRE-seq using normal human mammary epithelial cells (HMEC) and breast cancer epithelial cells (MDAMB231). We identified 34,806 FAIRE-seq peaks genome-wide in the two cell types by selecting FAIRE signals significant at p < 10^-9 compared with input signals.
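The thesis does not name the peak caller or its statistical model. The sketch below is a hedged illustration only, and it assumes a MACS-style Poisson background. The input tag count in a window is scaled to the FAIRE sequencing depth to give the expected count. A window is kept when the upper-tail probability of the observed FAIRE count falls below 10^-9. The function names and the `min_lambda` floor are hypothetical.

```python
import math


def poisson_sf(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), summed term by term in log space so that
    very small tail probabilities (around 1e-9) stay accurate."""
    if k <= 0:
        return 1.0
    total, i = 0.0, k
    while True:
        term = math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
        total += term
        # Past the mode, terms shrink monotonically; stop once they are negligible.
        if i > lam and term <= total * 1e-16:
            break
        i += 1
    return min(total, 1.0)


def faire_window_pvalue(faire_tags, input_tags, faire_depth, input_depth, min_lambda=1.0):
    """Hypothetical helper: scale input tags to FAIRE depth to get the expected
    background count, with a floor so empty input windows are not degenerate."""
    lam = max(input_tags * faire_depth / input_depth, min_lambda)
    return poisson_sf(faire_tags, lam)


def is_significant(faire_tags, input_tags, faire_depth, input_depth, cutoff=1e-9):
    """Keep a window when its FAIRE enrichment over input passes the p-value cutoff."""
    return faire_window_pvalue(faire_tags, input_tags, faire_depth, input_depth) < cutoff
```

For example, 40 FAIRE tags against an expected 10 input-derived tags passes the 10^-9 cutoff, whereas 15 tags does not.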
To study the distribution of potential regulatory regions in each cell type, we divided FAIRE peaks into two distinct genomic categories (Fig. 2-4A):
promoters, defined as ±1,500bp from a known annotated TSS, and
putative enhancer regions, defined as >1,500bp from a TSS.
We are aware that not all FAIRE peaks (nucleosome-depleted regions) distal from an annotated TSS function as enhancers [e.g. insulators are also nucleosome depleted (Song et al., 2011)]; we therefore classified them as putative enhancers. In HMEC, 91.6% (n=17,450) of the FAIRE peaks were located in intergenic or intronic regions. A similarly high percentage of FAIRE peaks, 85% (n=13,401), was located in such regions in MDAMB231 cells (Fig. 2-4A).
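Below is a minimal sketch of the promoter/putative-enhancer split. It assumes that each peak is represented by its midpoint and that TSS coordinates are available as sorted per-chromosome lists. The function names and the midpoint convention are illustrative assumptions; only the ±1,500bp cutoff comes from the text.

```python
from bisect import bisect_left

PROMOTER_WINDOW = 1_500  # bp, the +/-1,500bp promoter definition used in the text


def distance_to_nearest_tss(pos, tss_sorted):
    """Distance from pos to the closest TSS in a sorted list, or None if the list is empty."""
    if not tss_sorted:
        return None
    i = bisect_left(tss_sorted, pos)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - pos)   # nearest TSS at or after pos
    if i > 0:
        candidates.append(pos - tss_sorted[i - 1])  # nearest TSS before pos
    return min(candidates)


def classify_peak(chrom, start, end, tss_by_chrom, window=PROMOTER_WINDOW):
    """Label a peak 'promoter' if its midpoint (an assumption) lies within the
    window of a TSS, otherwise 'putative_enhancer'."""
    mid = (start + end) // 2
    d = distance_to_nearest_tss(mid, tss_by_chrom.get(chrom, []))
    if d is not None and d <= window:
        return "promoter"
    return "putative_enhancer"
```

The percentages above are simple ratios of the resulting counts: 17,450/19,053 ≈ 91.6% for HMEC and 13,401/15,753 ≈ 85% for MDAMB231.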
Next, we studied FAIRE peaks throughout the genome in the two cell types in more detail. Interestingly, many nucleosome-depleted regions in the two cell types did not coincide (examples of coinciding and non-coinciding peaks are shown in Fig. 2-4B). Overall, only 6,453 FAIRE peaks overlapped between HMEC and MDAMB231 cells. This represents 34% of the 19,053 HMEC peaks and 41% of the 15,753 MDAMB231 peaks. To test whether these overlapping regions show any preference in genomic location, we assessed the number of overlapping FAIRE peaks at TSS and putative enhancer regions separately. At TSS regions, 73.7% of FAIRE peaks in normal cells were also FAIRE peaks in breast cancer cells. However, only 30.2% of FAIRE peaks in putative enhancer regions of normal cells were shared with breast cancer cells (Fig. 2-4C). This indicates that a relatively large number of nucleosome-depleted regions in non-coding regions (putative enhancers) are unique to either normal or breast cancer cells.
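The overlap percentages are ratios of the shared peak count to each cell type's total: 6,453/19,053 ≈ 34% for HMEC and 6,453/15,753 ≈ 41% for MDAMB231. The sketch below shows one way such a shared count could be obtained. It assumes half-open (BED-style) coordinates and counts a peak as shared if it overlaps any peak in the other set by at least 1bp. The thesis does not state its overlap criterion, so this rule and the helper names are assumptions.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate


def build_index(intervals):
    """Per chromosome: interval starts sorted ascending, plus a running max of ends."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        starts = [s for s, _ in ivs]
        max_end = list(accumulate((e for _, e in ivs), max))
        index[chrom] = (starts, max_end)
    return index


def overlaps_any(chrom, start, end, index):
    """True if [start, end) overlaps at least one indexed interval."""
    if chrom not in index:
        return False
    starts, max_end = index[chrom]
    i = bisect_left(starts, end)  # number of intervals that start before `end`
    return i > 0 and max_end[i - 1] > start


def count_shared(peaks_a, peaks_b):
    """Number of peaks in peaks_a that overlap any peak in peaks_b."""
    index = build_index(peaks_b)
    return sum(overlaps_any(c, s, e, index) for c, s, e in peaks_a)
```

Counting from each side can give slightly different numbers when one peak spans two peaks in the other set. The text reports a single shared count.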
To determine the genomic distribution of overlapping FAIRE peaks in putative enhancers, we categorized FAIRE peaks as overlapping intergenic, intronic, or other regions (i.e. 3'UTR, 5'UTR). In each category, about the same percentage of peaks was common to both cell types: 29.7%, 31.0% and 30.8% for intergenic, intronic and other regions, respectively (Fig. 2-4D). In summary, we found that >60% of nucleosome-depleted regions were unique to normal or breast cancer cells. These regions were evenly distributed between intergenic and intronic regions and most likely correspond to enhancers. This result is consistent with data obtained by comparing other cell types (Song et al., 2011).
Figure 2-4. Open chromatin region comparison between normal and breast cancer epithelial cells. (A) The distribution of annotated HMEC FAIRE peaks and MDAMB231 FAIRE peaks in the genome. (B) Genome browser views of an MDAMB231-only FAIRE peak (red) at 5q11.2 (chr5:56,121,000-56,136,000) (left), an overlapping FAIRE peak (yellow) at 6q22.33 (chr6:127,678,000-127,693,000) (middle), and an HMEC-only FAIRE peak (green) at 14q24.1 (chr14:69,005,000-69,020,000) (right). (C) Overlap between MDAMB231 (red) and HMEC (green) FAIRE peaks in the whole genome (ALL), within 1,500bp of the transcription start site (TSS), and more than 1,500bp from the TSS (Putative Enhancer). (D) Distribution of non-coding open chromatin regions in three categories: intergenic, intron, and other regions (i.e. 3'UTR, 5'UTR).
2.2.2 Breast epithelial cell type specific enhancers (HSEL and MSEL)
As stated above, FAIRE-seq peaks may mark not only enhancers but also insulators or other regulatory elements. To identify enhancers more specifically, we first performed ChIP-seq for the enhancer histone mark H3K4me1. We characterized 110,715 H3K4me1 ChIP-seq peaks genome-wide in the two cell types and assessed their genomic distribution. To identify breast epithelial cell type specific enhancers, we used the findPeaks software, which ranks sites of differential H3K4me1 enrichment between the two cell types. We then selected the top 2,000 H3K4me1 sites unique to each cell type. H3K4me1 sites found in HMEC but not in MDAMB231 were named HMEC specific enhancer loci (HSEL). H3K4me1 sites found in MDAMB231 but not in HMEC were named MDAMB231 specific enhancer loci (MSEL). By visualizing H3K4me1 enrichment at each HSEL (±5kb) in a heatmap (Ye et al., 2011), we confirmed that the HSEL are H3K4me1 sites in HMEC only and not in MDAMB231 (Fig. 2-5A). Calculating the mean density of H3K4me1 ChIP-seq tags further confirmed that the HSEL are unique H3K4me1 sites in HMEC (Fig. 2-5B). For example, one of the HSEL is located 10kb downstream of the TAPT1-AS1 gene, an antisense transcript of TAPT1 (Fig. 2-5C). TAPT1 encodes a putative transmembrane protein that may function downstream of homeobox C8 to transduce extracellular patterning information during axial skeleton development (Howell et al., 2007).
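findPeaks ranks differential sites with its own statistics, which are not reproduced here. The sketch below is a hypothetical stand-in. It normalizes H3K4me1 tag counts to 10^7 total tags, the normalization shown in the browser tracks of Fig. 2-5. It then ranks sites by fold change between the two cell types, using a pseudocount, and keeps the top 2,000. `min_fold` and `pseudo` are illustrative values, not parameters from the thesis.

```python
def normalize(count: float, total: float, target: float = 1e7) -> float:
    """Scale a raw tag count to `target` total tags."""
    return count * target / total


def top_specific_sites(sites, total_a, total_b, n=2000, min_fold=4.0, pseudo=1.0):
    """Rank sites by normalized A/B fold change and keep the top n.

    sites: iterable of (site_id, tags_in_a, tags_in_b).
    min_fold and pseudo are hypothetical choices, not values from the thesis.
    """
    scored = []
    for site_id, tags_a, tags_b in sites:
        fold = (normalize(tags_a, total_a) + pseudo) / (normalize(tags_b, total_b) + pseudo)
        if fold >= min_fold:
            scored.append((fold, site_id))
    scored.sort(reverse=True)  # highest fold change first
    return [site_id for _, site_id in scored[:n]]
```

Calling it with HMEC as set A gives HSEL-like sites. Swapping the two cell types gives MSEL-like sites.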
Conversely, at the MDAMB231 specific enhancer loci (MSEL), we detected H3K4me1 enrichment in MDAMB231 cells but almost none in HMEC (Fig. 2-5D). The mean density of H3K4me1 ChIP-seq tags at the MSEL was much higher in MDAMB231 than in HMEC (Fig. 2-5E). An example of the MSEL is shown in Fig. 2-5F. This H3K4me1 site is in an intron of the PRKAG2 gene, a member of the AMP-activated protein kinase gamma subunit family. PRKAG2 encodes a protein with four cystathionine beta-synthase domains (Ahmad et al., 2005; Burwinkel et al., 2005). Mutations in this gene are associated with Wolff-Parkinson-White syndrome, progressive conduction system disease and cardiac hypertrophy (Daniel and Carling, 2002; Gollob et al., 2002).
Next, we investigated the genomic distribution of the HSEL. We categorized genomic regions into:
transcription termination sites (TTS; -100bp to +1kb of the TTS),
promoter-TSS (-1kb to +100bp of the TSS),
non-coding exons, introns, intergenic regions, exons, 5'UTRs, and 3'UTRs.
Most of the HSEL were located in introns and intergenic regions (Fig. 2-6A). The HSEL were distributed throughout the genome from chr1 through chrX, and we did not detect enrichment on any particular chromosome (Fig. 2-6B). We also investigated the genomic distribution of the MSEL. As with the HSEL, most of the MSEL were located in introns and intergenic regions (Fig. 2-6C), and they were distributed across chromosomes chr1 through chrX (Fig. 2-6D).
Figure 2-5. HSEL and MSEL (A) H3K4me1 ChIP-seq tags from both cells at the HSEL, graphed in a heatmap (red: higher density) (B) Mean density of H3K4me1 ChIP-seq tags from both cells at the HSEL (C) An example of the HSEL located near the TAPT1-AS1 gene (D) H3K4me1 ChIP-seq tags from both cells at the MSEL, graphed in a heatmap (red: higher density) (E) Mean density of H3K4me1 ChIP-seq tags from both cells at the MSEL (F) An example of the MSEL located in an intron of the PRKAG2 gene
[Figure 2-5 panels: (A, D) heatmaps and (B, E) mean density profiles of MDAMB231 and HMEC H3K4me1 (-5kb to +5kb from peak center); (C) UCSC genome browser view of chr4:16,250,000-16,280,000 (hg19) at TAPT1-AS1; (F) chr7:151,535,000-151,545,000 (hg19) at PRKAG2. Tracks: MDAMB231 H3K4me1 tags (normalized to 1.00e+07) and HMEC H3K4me1 (ENCODE/Broad).]
Figure 2-6. Genomic distribution of HSEL and MSEL (A) Genomic distribution of
the HSEL (B) Distribution of the HSEL from chr1 to chrX (C) Genomic distribution of
the MSEL (D) Distribution of the MSEL from chr1 to chrX
[Figure 2-6 panels: (A, C) percentage of peaks in 3'UTR, 5'UTR, exon, intergenic, intron, non-coding, promoter-TSS and TTS regions; (B, D) number of peaks per chromosome]
2.2.3 RNA expression level change by enhancers
Once enhancers were identified, we asked which genes they target. We therefore studied gene expression patterns in HMEC and MDAMB231 using published microarray data generated from three replicates of each cell type (D'Amato et al., 2012). When we compared the expression levels of genes near specific HSEL, we found that expression levels correlated with H3K4me1 enrichment. For example, one of the HSEL is in an intron of the phosphatidylinositol-specific phospholipase C, X domain-containing 2 (PLCXD2) gene (Fig. 2-7A). Interestingly, an alternative promoter of another gene, pleckstrin homology-like domain, family B, member 2 (PHLDB2), lies 1kb downstream of this HSEL (Fig. 2-7B). Using the microarray data, we measured the expression levels of PLCXD2 and PHLDB2 in HMEC and MDAMB231. Both genes were up-regulated in HMEC compared to MDAMB231 (Fig. 2-7C,D). Because this HSEL lies about 200kb from the canonical promoter of PHLDB2, we investigated the expression levels of genes within ±200kb of the HSEL and MSEL genome-wide.
We calculated the distribution of expression values of genes near the HSEL (Fig. 2-8A). As expected, the median expression value of these genes was higher in HMEC than in MDAMB231. Likewise, expression values of genes near the MSEL were higher in MDAMB231 than in HMEC (Fig. 2-8B). We further plotted the expression fold change of genes within ±200kb of the HSEL and MSEL. Most genes near the HSEL were up-regulated in HMEC compared to genes without an HSEL. In contrast, most genes near the MSEL were up-regulated in MDAMB231 compared to genes without an MSEL (11,545 genes) (Fig. 2-7E).
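Below is a minimal sketch of the ±200kb nearby-gene comparison. It assumes the microarray values are already log2-scaled, so a log fold change is a difference of replicate means. It also assumes each gene is represented by a single TSS coordinate. The data layout and helper names are hypothetical. The brute-force scan is quadratic and is meant only to make the windowing explicit.

```python
from statistics import mean, median

WINDOW = 200_000  # bp, the +/-200kb window used in the text


def genes_near(enhancers, gene_tss, window=WINDOW):
    """enhancers: list of (chrom, start, end); gene_tss: dict gene -> (chrom, tss).
    Returns genes whose TSS lies within `window` of an enhancer midpoint."""
    near = set()
    for chrom, start, end in enhancers:
        mid = (start + end) // 2
        for gene, (g_chrom, tss) in gene_tss.items():
            if g_chrom == chrom and abs(tss - mid) <= window:
                near.add(gene)
    return near


def log2_fold_change(hmec_reps, mda_reps):
    """Assumes log2-scaled values, so HMEC vs MDAMB231 LFC is a difference of means."""
    return mean(hmec_reps) - mean(mda_reps)


def median_lfc_with_without(enhancers, gene_tss, expr):
    """expr: dict gene -> (hmec_replicates, mda_replicates).
    Returns (median LFC of genes near enhancers, median LFC of all other genes)."""
    near = genes_near(enhancers, gene_tss)
    lfc = {g: log2_fold_change(*reps) for g, reps in expr.items()}
    with_enh = [v for g, v in lfc.items() if g in near]
    without = [v for g, v in lfc.items() if g not in near]
    return median(with_enh), median(without)
```

With HSEL as input, a positive first value relative to the second mirrors the up-regulation in HMEC described above. Both groups must be non-empty, because `statistics.median` raises an error on an empty list.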
Figure 2-7. Expression level of nearby genes of the HSEL and MSEL (A) UCSC
genome browser screen shot at 3q13.2 (B) One of the HSEL is located in intron of
the PLCXD2 gene (red arrow) (C) The expression level of the PLCXD2 gene in both
cells (HMEC and MDAMB231) (D) The expression level of the PHLDB2 gene in both
cells (E) Log fold change of nearby gene expression boxplot for the HSEL and MSEL
[Figure 2-7 panels: (A) UCSC genome browser view of chr3:111,300,000-111,700,000 (hg19) showing CD96, ZBED2, PLCXD2, PHLDB2, ABHD10 and TAGLN3 with MDAMB231 and HMEC H3K4me1 tracks; (B) zoomed-in view of chr3:111,440,000-111,465,000 (PLCXD2, PHLDB2); (C) PLCXD2 expression value in HMEC and MDAMB231 (P<1.4e-06); (D) PHLDB2 expression value in HMEC and MDAMB231 (P<7.6e-08); (E) log fold change boxplot]
Figure 2-8. The expression value of nearby genes of HSEL and MSEL (A) Boxplot
of expression value for the genes with HSEL (B) Boxplot of expression value for the
genes with MSEL
2.2.4 Classification of enhancers
Recently, Zentner et al. and several other groups classified enhancers into several classes using histone modifications and nearby gene expression (Zentner and Scacheri, 2012). H3K27Ac is known to demarcate active enhancers, along with H3K4me1, the general mark for enhancers. Enhancers with high enrichment of both H3K4me1 and H3K27Ac are active. High enrichment of H3K4me1 but low H3K27Ac correlates with low gene expression, and such enhancers are considered poised (Zentner and Scacheri, 2012).
To determine whether the expression levels of nearby genes in our study also correlate with enhancer status as defined above, we classified the HSEL into poised and active groups using k-means linear clustering (Ye et al., 2011). We identified 1,270 poised HSEL, with high enrichment of HMEC H3K4me1 but low H3K27Ac, and 730 active HSEL, with high enrichment of both HMEC H3K4me1 and H3K27Ac (Fig. 2-9A,B). We classified the MSEL into poised and active groups in the same way. This identified 1,021 poised MSEL, with high MDAMB231 H3K4me1 but low H3K27Ac, and 979 active MSEL, with high MDAMB231 H3K4me1 and H3K27Ac (Fig. 2-9C,D).
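The thesis used k-means linear clustering on ChIP-seq density profiles (Ye et al., 2011). The sketch below is a deliberately simplified, hypothetical version of that step:
each enhancer is reduced to two numbers, mean H3K4me1 and mean H3K27Ac signal;
k-means with k=2 is seeded deterministically at the lowest- and highest-H3K27Ac sites;
the cluster with the higher H3K27Ac centroid is labeled active.

```python
def _dist2(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_two(points, iters=100):
    """Two-cluster k-means on (H3K4me1, H3K27Ac) tuples."""
    # Deterministic seeds: the lowest- and highest-H3K27Ac enhancers (feature index 1).
    centroids = [min(points, key=lambda p: p[1]), max(points, key=lambda p: p[1])]
    labels = []
    for _ in range(iters):
        labels = [0 if _dist2(p, centroids[0]) <= _dist2(p, centroids[1]) else 1
                  for p in points]
        new = []
        for k in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == k]
            # Keep the old centroid if a cluster becomes empty.
            new.append(tuple(sum(col) / len(members) for col in zip(*members))
                       if members else centroids[k])
        if new == centroids:  # converged
            break
        centroids = new
    return labels, centroids


def classify_enhancers(signals):
    """Return 'active' for the high-H3K27Ac cluster and 'poised' for the other."""
    labels, centroids = kmeans_two(signals)
    active = 0 if centroids[0][1] > centroids[1][1] else 1
    return ["active" if lab == active else "poised" for lab in labels]
```

Full profile clustering would use the whole ±5kb signal vector per site instead of two summary values. The two-feature version only illustrates the poised/active split.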
Next, we investigated the expression levels of genes near the poised and active enhancer groups defined above. We randomly selected 500 sites from each group: 500 poised HSEL, 500 active HSEL, 500 poised MSEL, and 500 active MSEL.
For the HSEL, genes near the poised HSEL had a significantly negative log fold change in expression compared to genes without an HSEL. In contrast, genes near the active HSEL were more significantly up-regulated in HMEC than in MDAMB231, compared to genes without an HSEL.
For the MSEL, genes near the poised MSEL had a more positive log fold change in expression than genes without an MSEL. Genes near the active MSEL were more significantly up-regulated in MDAMB231 than in HMEC, compared to genes without an MSEL (Fig. 2-9E).
Therefore, enhancer status, as defined by the abundance of histone modifications, was positively correlated with the expression of nearby genes (±200kb).
Figure 2-9. Active and poised enhancers (A) H3K4me1 and H3K27Ac ChIP-seq tags from HMEC at the HSEL, graphed in a heatmap (red: higher density) (B) Mean density of H3K4me1 and H3K27Ac ChIP-seq tags from HMEC at the poised HSEL (top) and the active HSEL (bottom) (C) H3K4me1 and H3K27Ac ChIP-seq tags from MDAMB231 at the MSEL, graphed in a heatmap (red: higher density) (D) Mean density of H3K4me1 and H3K27Ac ChIP-seq tags from MDAMB231 at the poised MSEL (top) and the active MSEL (bottom) (E) Log fold change of nearby gene expression boxplot for the poised/active HSEL and poised/active MSEL
2.2.5 Nucleosome-depleted regions in enhancers (FAIRE-seq)
Next, we compared the HSEL and MSEL with the FAIRE-seq data to superimpose nucleosome-depleted regions. We first intersected FAIRE-seq peaks from HMEC with the HSEL. In 1,004 of the 2,000 HSEL, HMEC FAIRE peaks were detected, flanked by well-positioned histone marks (Fig. 2-10A-D). Of these, 550 HSEL were poised and 454 were active. Similarly, 1,047 MSEL intersected with MDAMB231 FAIRE peaks flanked by histone marks (Fig. 2-10E-H). Of these, 512 were poised and 535 were active. Overall, regardless of enhancer status, about fifty percent of enhancers retained nucleosome-depleted regions.
Figure 2-10. Active and poised enhancers with FAIRE signals (A) H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from HMEC at the poised HSEL intersected with HMEC FAIRE signals, graphed in a heatmap (red: higher density) (B) Mean density of H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from HMEC at the poised HSEL overlapping with HMEC FAIRE signals (C) H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from HMEC at the active HSEL intersected with HMEC FAIRE signals, graphed in a heatmap (D) Mean density of H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from HMEC at the active HSEL overlapping with HMEC FAIRE signals (E) H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from MDAMB231 at the poised MSEL intersected with MDAMB231 FAIRE signals, graphed in a heatmap (F) Mean density of H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from MDAMB231 at the poised MSEL overlapping with MDAMB231 FAIRE signals (G) H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from MDAMB231 at the active MSEL intersected with MDAMB231 FAIRE signals, graphed in a heatmap (H) Mean density of H3K4me1, H3K27Ac ChIP-seq and FAIRE-seq tags from MDAMB231 at the active MSEL overlapping with MDAMB231 FAIRE signals
2.2.6 Motif analysis and enrichment
Nucleosome-depleted regions are known to mark regions occupied by transcription factors. According to Neph et al., nucleosome-depleted regions pinpoint the DNA nucleotides bound by the transcription factor NRF1, consistent with the results of in silico motif searches (Neph et al., 2012). Additionally, CENTIPEDE, a transcription factor binding prediction program that combines FAIRE peaks with in silico motif searches, was found to be relatively precise compared to other methods (He et al., 2012).
We therefore proceeded in three steps (Fig. 2-11):
1. We searched for motifs in FAIRE-defined nucleosome-depleted regions within the HSEL and MSEL using TRANSFAC and JASPAR (Matys et al., 2006; Portales-Casamar et al., 2010).
2. We restricted the results to response elements of transcription factors expressed in these cells according to the microarray data (D'Amato et al., 2012).
3. We performed chi-square tests between groups to identify transcription factor response elements significantly enriched in one group of enhancers relative to another.
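The thesis does not give the exact layout of its contingency tables. The sketch below assumes each comparison is a 2x2 table of enhancers with and without a given motif in each group. Under that assumption, a Pearson chi-square test with one degree of freedom can be computed as shown, using the identity that the 1-df chi-square tail probability equals erfc(sqrt(chi2/2)). No Yates continuity correction is applied. The expression cutoff in the hypothetical `expressed_only` helper is also an assumption.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (1 df, no Yates correction) for the table [[a, b], [c, d]].
    Returns (chi2, p)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))


def motif_enrichment(hits_a, total_a, hits_b, total_b):
    """Compare the fraction of enhancers carrying a motif in group A vs group B."""
    return chi_square_2x2(hits_a, total_a - hits_a, hits_b, total_b - hits_b)


def expressed_only(motif_hits, expression, min_expr):
    """Keep motifs whose transcription factor passes a (hypothetical) expression cutoff."""
    return {tf: h for tf, h in motif_hits.items()
            if expression.get(tf, float("-inf")) >= min_expr}
```

For example, 10 of 100 enhancers with a motif in one group versus 30 of 100 in the other gives chi2 = 12.5 and p ≈ 4e-4.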
When we compared the results of transcription factor motif searches between FAIRE regions in the HSEL and MSEL, several response elements, such as those of FOS, FOXA1 and FOXA2, were highly enriched in FAIRE regions in the MSEL compared to the HSEL (P<10^-6). In contrast, TP63 elements were more highly enriched in the HSEL than in the MSEL (P<10^-29) (Table 2-1).
FOXA1 and FOXA2 are transcription factors involved in breast cancer that are found mainly in enhancers. FOXA1 is known as a pioneer transcription factor that triggers the transcriptional competency of regulatory sites (Serandour et al., 2011). These proteins interact with the estrogen receptor (ER) and play an important role in ER-positive breast cancer (Mann et al., 2011).
FOS forms the AP1 transcription factor by dimerizing with JUN and is a classical integrator of extrinsic growth stimuli and environmental stress (Angel and Karin, 1991). The FOS motif is also highly enriched in clusters of distal elements in primary cells (Zhu et al., 2013).
TP63 acts as a sequence-specific DNA-binding transcriptional activator or repressor. It may be required, in conjunction with TP73/p73, for initiation of p53/TP53-dependent apoptosis in response to genotoxic insult and the presence of activated oncogenes (Peter et al., 2009; Yang et al., 1998). TP63 may also be involved in Notch signaling, probably by inducing JAG1 and JAG2, and may play a role in the regulation of epithelial morphogenesis.
We also compared the enriched motifs between the poised and active HSEL. Besides FOXA1 and FOXA2 (found above), NR3C1, NFATC2, FOXQ1, PBX1, GABPA, FOXF2, TFAP2A, NFIC, VDR, TFCP2, BHLHE41, and HMGA1 motifs were significantly and differentially enriched (Table 2-2). When we compared the enriched motifs between the poised and active MSEL, FOXA1, FLI1///EWSR1, GABPA, ELK1, MYB, and NR2F1 motifs were significantly and differentially enriched (Table 2-3). However, the FOS motif was not differentially enriched between poised and active enhancers (Tables 2-2 and 2-3), even though it is more highly enriched in the MSEL than in the HSEL. We therefore speculate that FOS may act as a pioneer factor regulating the activity of breast epithelial cell type specific enhancers.
Figure 2-11. Workflow diagram of the transcription factor motif search between enhancer groups.
FAIRE signals in HSEL/MSEL → TRANSFAC/JASPAR motif search → restriction of the motif search to expressed TFs → enrichment of motifs between groups (chi-square test)
Table 2-1. Motif enrichment in HSEL and MSEL
Columns: motif name; number of motif occurrences in the HSEL and in the MSEL; P-value; database.
Motif HSEL MSEL P-value Database
TP63 116 4 7.36184E-30 TRANSFAC
FOS 398 650 2.89561E-24 JASPAR
FOS 352 589 3.09326E-21 JASPAR
FOS 49 173 6.26727E-19 TRANSFAC
FOS 42 183 2.55706E-17 TRANSFAC
FOXA1 277 468 8.01437E-16 JASPAR
FOS 352 585 5.15082E-14 JASPAR
FOS 39 133 9.59355E-14 TRANSFAC
FOXA2 339 514 1.91691E-12 JASPAR
FOS 32 143 2.92473E-12 TRANSFAC
FOXA1 47 124 4.3675E-12 TRANSFAC
FOXA1 32 134 6.35589E-12 TRANSFAC
FOS 306 524 6.90782E-12 JASPAR
TCF4 95 215 2.32746E-10 TRANSFAC
FOXA1 49 117 3.29421E-10 TRANSFAC
FOXA1 34 127 4.05122E-10 TRANSFAC
FOXA2 9 59 1.40838E-08 TRANSFAC
FOXA1 40 129 1.47946E-08 TRANSFAC
FOXA1 43 89 2.56782E-07 TRANSFAC
FOXA1 28 99 2.91238E-07 TRANSFAC
FOXI1 232 347 4.48727E-07 JASPAR
FOXA1 34 122 4.80172E-07 TRANSFAC
FOXF2 128 214 3.0013E-06 JASPAR
FOXD1 121 202 6.77501E-06 JASPAR
NR4A2 226 326 1.06541E-05 JASPAR
FOXA1 38 94 0.000152604 TRANSFAC
MAF 220 324 0.000261159 TRANSFAC
FOXQ1 150 221 0.000286212 JASPAR
FOXO3 221 303 0.00032296 JASPAR
MYB 110 139 0.000554679 TRANSFAC
BACH2 11 36 0.00108292 TRANSFAC
RORA 208 257 0.003943546 JASPAR
NFATC2 250 320 0.004210301 JASPAR
TFCP2 223 200 0.005152595 TRANSFAC
HOXA7 41 31 0.007929921 TRANSFAC
FOXA2 9 59 0.007929921 TRANSFAC
REPIN1 64 81 0.007929921 TRANSFAC
REPIN1 64 81 0.007929921 TRANSFAC
FOXA2 9 59 0.007929921 TRANSFAC
HOXA9 40 34 0.007929921 TRANSFAC
HOXA9 40 34 0.007929921 TRANSFAC
NFE2L1 85 101 0.007929921 TRANSFAC
MITF 18 29 0.007929921 TRANSFAC
TP63 116 4 0.007929921 TRANSFAC
ZEB1 25 15 0.007929921 TRANSFAC
TP63 116 4 0.007929921 TRANSFAC
ZEB1 25 15 0.007929921 TRANSFAC
Table 2-2. Motif enrichment in poised HSEL and active HSEL
Columns: motif name; number of motif occurrences in the poised HSEL and in the active HSEL; P-value; database.
Motif Poised Active P-value Database
FOXA2 149 190 8.57142E-07 JASPAR
NR3C1 82 103 0.001556665 JASPAR
NFATC2 117 133 0.003437088 JASPAR
FOXQ1 66 84 0.004021317 JASPAR
PBX1 58 75 0.005447848 JASPAR
FOXA1 133 144 0.007836496 JASPAR
GABPA 167 170 0.018035385 JASPAR
FOXF2 58 70 0.021211545 JASPAR
TFAP2A 178 117 0.022461303 JASPAR
NFIC 204 141 0.045116749 JASPAR
VDR 78 37 0.003973454 TRANSFAC
TFAP2A 48 19 0.005425038 TRANSFAC
TFCP2 141 82 0.006009468 TRANSFAC
BHLHE41 74 41 0.038680133 TRANSFAC
HMGA1 26 34 0.048453367 TRANSFAC
Table 2-3. Motif enrichment in poised MSEL and active MSEL
Columns: motif name; number of motif occurrences in the poised MSEL and in the active MSEL; P-value; database.
Motif Poised Active P-value Database
FOXA1_1 84 50 0.0011194 TRANSFAC
FOXA1_2 77 45 0.001439 TRANSFAC
FLI1///EWSR1 318 381 0.001768741 JASPAR
GABPA 173 231 0.001811555 JASPAR
ELK1 38 63 0.017063412 JASPAR
MYB_2 56 36 0.0238437 TRANSFAC
NR2F1 134 173 0.028502632 JASPAR
2.3 Summary
In this study, using FAIRE-seq and ChIP-seq, we identified breast epithelial enhancers that are detected in only one of two breast epithelial cell types. The expression levels of genes near these enhancers (±200kb) differed according to the presence of such enhancers. Using the histone marks H3K4me1 and H3K27Ac, we sub-categorized the enhancers into poised and active groups. About fifty percent of the breast epithelial enhancers were poised, and the rest were active, with high enrichment of H3K27Ac. The expression levels of genes near such enhancers (±200kb) were correlated with enhancer status (poised or active). The targets of each enhancer and direct interactions among regulatory elements still need to be tested by other methods, such as 3C, TALEN, and transgenic mouse models. Nevertheless, our findings suggest that these enhancers regulate nearby gene expression. Regardless of enhancer status, about fifty percent of the enhancers contained nucleosome-depleted regions, where transcription factors may bind. By performing transcription factor motif searches, we identified TP63, FOS, FOXA1 and FOXA2 as candidate transcription factors that may be involved in regulating breast epithelial enhancers.
Chapter 3. Comprehensive Functional Annotation of
Seventy-One Breast Cancer Risk Loci
3.1 Introduction
3.1.1 From index risk SNP to functional SNP
Apart from a few high-penetrance mutations, such as those in the BRCA1 and BRCA2 genes (Mavaddat et al., 2010), most genetic risk of breast cancer (BCa) resides at multiple low-penetrance loci. These loci have more recently been identified by genome-wide association studies (GWASs) (Peng et al., 2011). In general, GWASs use single nucleotide polymorphisms (SNPs) to tag common genetic variation in linkage disequilibrium (LD) blocks in order to identify risk loci genome-wide. To date, 71 replicated and independent BCa risk loci have been identified (Ahmed et al., 2009; Antoniou et al., 2010; Easton et al., 2007; Fletcher et al., 2011; Garcia-Closas, 2013; Ghoussaini et al., 2013; Gudmundsson et al., 2007; Haiman et al., 2011; Hunter et al., 2007; Michailidou, 2013; Stacey et al., 2008; Stevens et al., 2012; Turnbull et al., 2010; Zheng et al., 2009). On average, there are thousands of SNPs in each LD block, and many of these SNPs are candidates for functional roles in BCa risk. For example, collectively more than 320,000 such SNPs are associated with BCa risk at the 71 loci. Because of this plethora of correlated SNPs in LD, much of the heritability of complex diseases such as BCa remains hidden and needs to be elucidated (Gibson, 2011). Identifying the mechanisms by which SNPs affect risk, as in the work described here for breast cancer, will facilitate a more comprehensive understanding of the genetic risk of complex diseases.
In the present study, we addressed the hypothesis that BCa risk SNPs reside
in functional genomic regions such as coding exons, TSS regions, and enhancers. To
identify potentially functional SNPs, we comprehensively analyzed 656,895 SNPs at
the 71 BCa risk loci from the 1000 Genomes Project data released in May 2012,
measuring their LD with the index SNPs and annotating them with 11 next-generation
sequencing (NGS) datasets, all generated in primary breast epithelial cells. In this
way, we found 1,005 potentially functional high LD SNPs. For 547 of these SNPs we
were able to frame specific hypotheses involving novel biological mechanisms: 2
SNPs cause non-benign codon changes in one gene, and 42 and 503 SNPs lie within
response elements of known transcription factors in TSS regions and enhancers,
respectively. This shortlist will help prioritize a manageable number of likely
functional SNPs and may also reveal previously hidden biological mechanisms in the
etiology of breast cancer.
3.2 Results
3.2.1 1,005 potentially functional high LD SNPs in 71 breast cancer risk loci
To date, 71 replicated risk loci for BCa have been identified primarily using
GWASs (Ahmed et al., 2009; Antoniou et al., 2010; Easton et al., 2007; Fletcher et al.,
2011; Ghoussaini et al., 2013; Gudmundsson et al., 2007; Haiman et al., 2011;
Hunter et al., 2007; Stacey et al., 2008; Stevens et al., 2012; Turnbull et al., 2010;
53
Zheng et al., 2009) (Garcia-Closas, 2013; Michailidou, 2013). The index SNPs
identified by GWASs occur mainly in non-coding DNA (33 intergenic, 33 in introns, 1
in a 3’UTR) and only 4 in coding exons (Fig. 3-1A) (Table 3-1). Although index SNPs
such as rs11571833 (Lys3326Term in BRCA2 gene) (Michailidou, 2013) seem to be
involved in known genetic mechanism of breast cancer tumorigenesis (Mavaddat et
al., 2010), the mechanisms for most of the other index SNPs are hidden. Additionally,
these index SNPs are most likely surrogates of many other SNPs in LD, since most of
the GWAS arrays were designed based on the Hapmap data to capture a large
fraction of common genetic variation (Altshuler et al., 2010). When we extracted
SNP data for Europeans from the 1000 genomes project released in May 2012
(2010), we found 308,010 very low LD (0≤r
2
<0.1), 11,438 low LD (0.1≤r
2
<0.5) and
3,508 high LD (r
2
≥0.5) SNPs at the 71 BCa risk loci (in a 1MB window surrounding
each index SNP) (Fig. 3-1B).
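The LD groups above are thresholds on r² between each SNP and its index SNP. As a minimal sketch, assuming phased, biallelic haplotypes coded 0/1 (as available for the 1000 Genomes Project samples), r² can be computed from allele and haplotype frequencies as r² = D² / [pA(1 − pA) pB(1 − pB)], where D = pAB − pA·pB, and then binned with the thresholds used in this chapter, with r² = 0 forming the separate "no LD" group of Fig. 3-1B. The function names and toy haplotypes are illustrative assumptions, not the FunciSNP code.

```python
from typing import Sequence


def r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """r^2 between two biallelic SNPs from phased haplotypes coded 0/1.

    hap_a[i] and hap_b[i] are the alleles carried on the same haplotype i.
    """
    n = len(hap_a)
    if n == 0 or n != len(hap_b):
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    p_ab = sum(a & b for a, b in zip(hap_a, hap_b)) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:  # a monomorphic SNP carries no LD information; report 0
        return 0.0
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def ld_category(r2: float) -> str:
    """Bin r^2 with the thresholds used in this chapter."""
    if r2 >= 0.5:
        return "high LD"
    if r2 >= 0.1:
        return "low LD"
    if r2 > 0:
        return "very low LD"
    return "no LD"
```

For example, r_squared([1, 1, 0, 0, 1, 0, 0, 0], [1, 1, 0, 0, 1, 0, 0, 1]) gives 0.6 (up to floating-point rounding), which ld_category assigns to "high LD".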
To identify potentially functional SNPs, we hypothesized that risk SNPs occur at
sites with some form of functionality. Candidate sites include coding exons,
regulatory regions near TSSs (TSS regions), and enhancers. To help assign potential
functionality, we performed a FunciSNP (Functional Integration of SNPs) analysis
(Coetzee et al., 2012). FunciSNP is an R/Bioconductor package developed in-house to
evaluate positional overlap between correlated SNPs at any disease or trait locus and
available chromatin biofeatures. Here, we chose exons, TSS regions (including
promoters), and enhancers as biofeatures to annotate the genome comprehensively.
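FunciSNP itself is written in R; the following Python sketch only illustrates the core positional-overlap test it performs, assuming biofeatures are supplied as (chromosome, start, end) intervals with 1-based, inclusive coordinates. The function names and the index layout are hypothetical and do not correspond to the FunciSNP API.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

# Hypothetical layout: per chromosome, parallel lists of merged interval starts and ends.
Index = Dict[str, Tuple[List[int], List[int]]]


def build_index(features: List[Tuple[str, int, int]]) -> Index:
    """Group (chrom, start, end) biofeature intervals by chromosome, then sort and merge them."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in features:
        by_chrom.setdefault(chrom, []).append((start, end))
    index: Index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged: List[List[int]] = []
        for start, end in intervals:
            if merged and start <= merged[-1][1] + 1:  # overlapping or adjacent interval
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: Index, chrom: str, pos: int) -> bool:
    """Return True if a SNP at chrom:pos lies inside any biofeature interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect_right(starts, pos) - 1  # last merged interval starting at or before pos
    return i >= 0 and pos <= ends[i]
```

Because the intervals are merged and sorted once, each SNP lookup is a binary search, which keeps the test cheap even when hundreds of thousands of SNPs are checked.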
Coding exon data were downloaded from the UCSC Table Browser (Dreszer et al.,
2012). TSS regions were defined as 3 kb windows centered on annotated transcription
start sites that contain one or more of the following biofeatures, all in human
mammary epithelial cells (HMEC): nucleosome depletion [DNase I sensitivity and/or
Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE) signals] and/or
histone modifications diagnostic of promoters (H3K4me3, H3K4me2, H3K9ac and/or
H3K27ac) (Heintzman et al., 2007; Song et al., 2011; Wang et al., 2008). Enhancers
were defined as intronic and intergenic regions (>1.5 kb from a TSS) in HMEC
containing one or more of the following biofeatures: nucleosome depletion (DNase I
sensitivity and/or FAIRE signals) and/or histone modifications diagnostic of
enhancers (H3K4me1, H3K4me2, H3K9ac and/or H3K27ac) (Heintzman et al., 2007;
Song et al., 2011; Wang et al., 2008).
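The two definitions above can be read as a simple decision rule. The sketch below encodes them under simplifying assumptions: the site's distance to the nearest annotated TSS and its genomic context are already known, "within the 3 kb window" is taken as |distance| ≤ 1.5 kb, signals are evaluated at the site itself, and the HMEC signals are given as a set of feature names (the names are hypothetical labels). It illustrates the stated criteria and is not the pipeline actually used.

```python
from typing import Optional, Set

OPEN_CHROMATIN = {"DNase1", "FAIRE"}  # nucleosome-depletion signals
PROMOTER_MARKS = {"H3K4me3", "H3K4me2", "H3K9ac", "H3K27ac"}
ENHANCER_MARKS = {"H3K4me1", "H3K4me2", "H3K9ac", "H3K27ac"}
HALF_WINDOW = 1_500  # a 3 kb window centered on the TSS spans +/- 1.5 kb


def classify_site(distance_to_tss: int, context: str, signals: Set[str]) -> Optional[str]:
    """Return 'TSS region', 'enhancer', or None for a site with the given HMEC signals.

    distance_to_tss: signed distance (bp) to the nearest annotated TSS.
    context: 'intron', 'intergenic', or anything else (e.g. 'exon').
    """
    if abs(distance_to_tss) <= HALF_WINDOW:
        # Inside the TSS window: open chromatin or a promoter mark is required.
        return "TSS region" if signals & (OPEN_CHROMATIN | PROMOTER_MARKS) else None
    if context in {"intron", "intergenic"} and signals & (OPEN_CHROMATIN | ENHANCER_MARKS):
        # More than 1.5 kb from a TSS in non-coding DNA with an enhancer signal.
        return "enhancer"
    return None
```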
To identify risk-correlated SNPs, FunciSNP extracted, for each index SNP, all known
SNPs from the 1000 Genomes Project database within a 1 Mb window spanning the
index SNP (2010). Biofeatures were then aligned with the positions of all curated SNPs
in each region, and for each SNP overlapping a biofeature, the r² and the distance to
the associated index SNP were calculated. Among 322,954 correlated SNPs (r² > 0),
22 percent were located at biofeatures (Fig. 3-1C). Several considerations apply when
defining risk SNPs in LD. On one hand, a low LD SNP may be the functional risk SNP,
poorly tagged by the index SNP. On the other hand, under the hypothesis that the
underlying functional alleles are common, high LD SNPs are more likely to be the risk
SNPs. We identified 1,005 SNPs in relatively high LD (r² ≥ 0.5) that overlap
biofeatures: 21 in exons, 76 in TSS regions and 921 in enhancers (Fig. 3-1D), at 60 of
the 71 BCa risk loci. Because a SNP can overlap more than one biofeature category,
these counts sum to more than 1,005. The selection process of potentially functional
variants is summarized in Figure 3-1E.
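A minimal sketch of the bookkeeping behind these counts, assuming hypothetical (SNP ID, r², category) annotation rows, counts distinct high LD SNPs separately from per-category counts. A SNP overlapping two categories (as rs8100241 and rs8108174 do, being exonic and also listed among the TSS regional SNPs in Table 3-2) is then counted once overall but once in each category. The function and row format are illustrative assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

HIGH_LD = 0.5  # r^2 threshold for "high LD" used in this chapter


def summarize(annotations: Iterable[Tuple[str, float, str]]) -> Tuple[int, Dict[str, int]]:
    """Count distinct high LD SNPs overlapping biofeatures, overall and per category.

    annotations: (snp_id, r2_with_index, category) rows; a SNP appears in several
    rows if it overlaps several biofeature categories.
    """
    per_category: Dict[str, Set[str]] = defaultdict(set)
    for snp_id, r2, category in annotations:
        if r2 >= HIGH_LD:
            per_category[category].add(snp_id)
    distinct: Set[str] = set().union(*per_category.values())
    return len(distinct), {cat: len(ids) for cat, ids in per_category.items()}


# Hypothetical rows: snpA and snpD each overlap two categories; snpC is filtered out.
rows = [
    ("snpA", 0.94, "exon"), ("snpA", 0.94, "TSS region"),
    ("snpB", 0.70, "enhancer"), ("snpC", 0.30, "enhancer"),
    ("snpD", 0.55, "TSS region"), ("snpD", 0.55, "enhancer"),
]
total, counts = summarize(rows)
print(total, counts)
```

Here the per-category counts (1 + 2 + 2 = 5) exceed the 3 distinct SNPs, mirroring how 21 + 76 + 921 exceeds 1,005.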
Figure 3-1. Identification of potentially functional SNPs in 71 breast cancer risk
loci. (A) Genomic distribution of the 71 replicated index SNPs for breast cancer risk
loci. (B) SNPs residing in 1 Mb windows around breast cancer risk index SNPs were
categorized into the four indicated groups by measuring LD in the EUR population.
(C) SNPs in each LD group were further analyzed according to whether their
locations coincide with biofeatures. (D) High LD SNPs within biofeatures were
categorized into three groups: exon, TSS region, and enhancer. (E) The entire
process is summarized in a flow diagram (Rhie et al., 2013).
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One.
[Figure 3-1 graphic. (A) Genomic location of the 71 index SNPs: exon 4, intron 33, intergenic 33, other (3'UTR) 1. (B) Number of SNPs by LD group: no LD 333,939; very low LD 308,010; low LD 11,438; high LD 3,508. (C) Number of SNPs with/without a biofeature: high LD 1,005/2,503; low LD 2,158/9,280; very low LD 67,399/240,611. (D) Number of high LD SNPs by biofeature: exon 21, TSS region 76, enhancer 921. (E) Flow diagram: all SNPs in the 1 Mb window of each index SNP → high LD SNPs (r² ≥ 0.5; n = 3,508) → overlapping biofeatures (n = 1,005) → exon (n = 21), TSS region (n = 76), enhancer (n = 921).]
Table 3-1. 71 breast cancer risk index SNPs and genomic locations of high LD SNPs
risk region #; Chr.; risk index SNP; number of high LD SNPs within TSS region biofeatures; number of high LD SNPs within enhancer biofeatures; number of high LD SNPs within coding exon biofeatures; nearest gene of index SNP; index SNP genomic location (intron, exon, intergenic)
1 1p11.2 rs11249433 EMBP1 intron
2 1p13.2 rs11552449 3 6 1 DCLRE1B exon(missense)
3 1p36.22 rs616488 16 1 PEX14 intron
4 1q32.1 rs4245739 8 21 MDM4 intron(3'UTR)
5 1q32.1 rs6678914 2 10 1 LGR6 intron
6 2p24.1 rs12710696 15 OSR1(200kb) intergenic
7 2q14.2 rs4849887 3 INHBB(100kb) intergenic
8 2q31.1 rs1550623 3 CDCA7(6kb) intergenic
9 2q31.1 rs2016394 3 DLX2(5kb) intergenic
10 2q33.1 rs1045485 1 CASP8 exon(missense)
11 2q35 rs13387042 TNP1(200kb) intergenic
12 2q35 rs16857609 24 DIRC3 intron
13 3p24.1 rs12493607 34 TGFBR2 intron
14 3p24.1 rs4973768 14 1 SLC4A7 3'UTR
15 3p26.1 rs6762644 15 ITPR1 intron
16 4q24 rs9790517 1 2 TET2 intron
17 4q34.1 rs6828523 ADAM29 intron
18 5p12 rs4415084 6 16 1 MRPS30(100kb) intergenic
19 5p15.33 rs10069690 1 TERT intron
20 5q11.2 rs10472076 RAB3C(30kb), PDE4D intergenic
21 5q11.2 rs1353747 3 PDE4D intron
22 5q11.2 rs889312 17 8 MAP3K1(60kb) intergenic
23 5q33.3 rs1432679 1 EBF1 intron
24 6p23 rs204247 2 5 1 RANBP9(10kb) intergenic
25 6p25.3 rs11242675 2 FOXQ1(3kb) intergenic
26 6q14.1 rs17529111 1 FAM46A(300kb) intergenic
27 6q25.1 rs2046210 1 4 1 C6orf97(5kb) intergenic
28 6q25.1 rs3757318 C6orf97 intron
29 7q35 rs720475 3 ARHGEF5 intron
30 8p12 rs9693444 42 C8orf75(80kb), DUSP4(250kb) intergenic
31 8q21.11 rs2943559 1 1 HNF4G intron
32 8q21.11 rs6472903 HNF4G(80kb) intergenic
33 8q24.21 rs11780156 42 PVT1(70kb) intergenic
34 8q24.21 rs13281615 38 POU5F1B(100kb) intergenic
35 9p21.3 rs1011970 1 CDKN2A,CDKN2B intron
36 9q31.2 rs10759243 2 KLF4(50kb) intergenic
37 9q31.2 rs865686 10 KLF4(500kb) intergenic
38 10p12.31 rs11814448 1 DNAJC1(20kb) intergenic
39 10p12.31 rs7072776 11 11 1 MLLT10(300bp) intergenic
40 10p15.1 rs2380205 2 11 ANKRD16(25kb) intergenic
41 10q21.2 rs10995190 ZNF365 intron
42 10q22.3 rs704010 24 ZMIZ1 intron
43 10q25.2 rs7904519 36 TCF7L2 intron
44 10q26.12 rs11199914 5 FGFR2(100kb) intergenic
45 10q26.13 rs2981582 1 15 FGFR2 intron
46 11p15.5 rs3817198 1 LSP1 intron
47 11q13.1 rs3903072 2 11 3 SNX32(15kb), OVOL1(15kb) intergenic
48 11q13.3 rs614367 1 CCND1(100kb) intergenic
49 11q24.3 rs11820646 1 BARX2(130kb) intergenic
50 12p11 rs10771399 1 62 PTHLH(30kb) intergenic
51 12p13.1 rs12422552 9 ATF7IP(80kb) intergenic
52 12q22 rs17356907 2 NTN4(20kb) intergenic
53 12q24.21 rs1292011 5 MED13L(400kb) intergenic
54 13q13.1 rs11571833 BRCA2 exon(nonsense)
55 14q13.3 rs2236007 3 1 PAX9 intron
56 14q24.1 rs2588809 39 RAD51B intron
57 14q24.1 rs999737 28 RAD51B intron
58 14q32.11 rs941764 CCDC88C intron
59 16q12.1 rs3803662 1 TOX3(5kb) intergenic
60 16q12.2 rs11075995 22 FTO intron
61 16q12.2 rs17817449 98 FTO intron
62 16q23.2 rs13329835 CDYL2 intron
63 17q23 rs6504950 7 10 1 STXBP4 intron
64 18q11.2 rs1436904 2 CHST9 intron
65 18q11.2 rs527616 1 AQP4(80kb) intergenic
66 19p13.11 rs2363956 2 2 2 ANKLE1 exon(missense)
67 19p13.11 rs4808801 3 49 3 ELL intron
68 19q13.31 rs3760982 39 KCNN4(1kb) intergenic
69 21q21.1 rs2823093 6 NRIP1(60kb) intergenic
70 22q12.2 rs132390 EMID1 intron
71 22q13.1 rs6001930 88 MKL1 intron
3.2.2 Among 21 high LD SNPs in exons, only 2 SNPs in the ANKLE1 gene were
predicted to be non-benign coding variants.
Twenty-one high LD SNPs (r² ≥ 0.5) were annotated in exons (Fig. 3-2A). The
majority (fifteen) are synonymous variants. Of the six missense variants, two,
rs8100241 and rs8108174 (both in ANKLE1 at the 19p13.11 locus) (Fig. 3-2B), are
predicted to cause non-benign changes by the SIFT and PolyPhen protein function
prediction programs (Adzhubei et al., 2010; Kumar et al., 2009) (Fig. 3-2C). The first
is in exon 2 (causing A31T) and the other in exon 3 (causing L94Q). Both SNPs are
equally and highly correlated (r² = 0.94) with the original GWAS index SNP,
rs2363956, which itself causes another non-benign amino acid change (L184W) in
exon 5 of ANKLE1, as predicted by PolyPhen. Thus, the three SNPs collectively form
two main haplotypes, which in turn encode two main protein isoforms, A-L-L and
T-Q-W, with likely functional consequences as predicted by SIFT and PolyPhen.
ANKLE1 is expressed in breast epithelial cells (Brachner et al., 2012; Rosenbloom et
al., 2012). It contains an ankyrin repeat likely involved in protein-protein
interactions. It is also an evolutionarily conserved, non-membrane-bound LEM
protein that shuttles between the nucleus and the cytoplasm and has an
enzymatically active GIY-YIG endonuclease domain (Brachner et al., 2012). This
multifunctional protein has the potential to affect many cellular phenotypes and
thus cancer risk. The two allelic variants need to be modeled in protein
structure-function assays to determine precisely the risk mechanisms involving
them. A final interesting genomic feature of the two correlated SNPs is that their
locations appear to carry histone H3K4me1, -me2 and -me3 signals (Fig. 3-2B),
pointing to possible additional roles as regulatory components that may in turn
affect expression levels of ANKLE1 and/or the nearby gene BABAM1. Such
multi-functional SNPs would add to the complexity of BCa risk. Interestingly, the
same locus was identified in a GWAS of ovarian cancer (Bolton et al., 2010),
suggesting that ANKLE1 may be involved more generally in cancers in women,
perhaps via hormone-mediated mechanisms.
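The step from three correlated SNPs to two main protein isoforms can be made explicit with a small sketch: each phased haplotype is translated into the residues it encodes at ANKLE1 positions 31, 94 and 184, and the resulting isoform labels are tallied. The 0/1 allele coding and the example haplotypes are hypothetical; real haplotype frequencies would come from phased reference data such as the 1000 Genomes Project.

```python
from collections import Counter
from typing import Iterable, Tuple

# ANKLE1 residue position and the residues encoded by allele 0 / allele 1 at
# rs8100241 (exon 2), rs8108174 (exon 3) and rs2363956 (exon 5), per the text.
# Coding the allele for A31/L94/L184 as 0 is an illustrative assumption.
POSITIONS = [(31, ("A", "T")), (94, ("L", "Q")), (184, ("L", "W"))]


def isoform(haplotype: Tuple[int, int, int]) -> str:
    """Translate the alleles at the three SNPs into an isoform label such as 'A-L-L'."""
    return "-".join(residues[allele] for (_, residues), allele in zip(POSITIONS, haplotype))


def isoform_counts(haplotypes: Iterable[Tuple[int, int, int]]) -> Counter:
    """Tally predicted ANKLE1 isoforms over a collection of phased haplotypes."""
    return Counter(isoform(h) for h in haplotypes)


# Hypothetical haplotypes: strong LD makes two allele combinations dominate.
example = [(0, 0, 0)] * 6 + [(1, 1, 1)] * 3 + [(1, 1, 0)]
print(isoform_counts(example).most_common())
```

With these toy haplotypes, A-L-L and T-Q-W account for 9 of 10 chromosomes, and the rare recombinant combination (T-Q-L) appears only once.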
Figure 3-2. Twenty-one high LD SNPs in exons and the effect of each variant on the
respective protein. (A) List of high LD SNPs (r² ≥ 0.5) in exons. The risk region
number is derived from Table 3-1, and SNPs are ordered by chromosome number. The
index SNP of each correlated SNP (corrSNP) and the r² between them are listed. The
distance from the index SNP to each corrSNP is shown along with the name of the
nearest gene. The type of each exon variant is also annotated. (B) Genome browser
view of two high LD SNPs, rs8100241 and rs8108174. The first track shows FunciSNP
results for exons. The name of each correlated SNP (rs number – r² value) is shown in
blue; the index SNP is shown in black. The bottom tracks are biofeature tracks,
RefSeq genes/mRNA/pseudogene tracks from UCSC Genes, common SNPs (version
137), and linkage disequilibrium (LD) blocks, measured by r² in phased CEU. Allele
frequencies of each SNP and a zoomed-in genome browser view for each SNP are
shown, including the amino acid changes caused by the missense variants. (C) The
effects of the missense amino acid changes on the respective proteins, as predicted
by SIFT and PolyPhen (Adzhubei et al., 2010; Kumar et al., 2009).
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One.
3.2.3 Among 76 high LD SNPs in TSS regions, 42 reside within response
elements of transcription factors.
Next, we studied the 76 high LD SNPs that reside in TSS regions of 25 genes
(Table 3-2). Fifty-two percent of these genes (13 of 25) are not only expressed in
breast tissues, but their expression levels also change during breast carcinogenesis
(2012; Finak et al., 2008; Ma et al., 2009; Radvanyi et al., 2005; Rhodes et al., 2004;
Richardson et al., 2006; Sorlie et al., 2003) (Table 3-3). As defined here, the 3 kb
windows centered on annotated TSSs contain not only proximal promoters but also
distal promoter elements, and perhaps nearby (proximal) enhancers. These genomic
regions are likely involved in regulating expression of the corresponding gene,
primarily through transcription factor (TF) binding, which SNPs may alter. The human
genome encodes approximately 2,600 DNA-binding proteins (Babu et al., 2004), and
a large number of ChIP-seq datasets covering many TFs have recently been
published (Ecker et al., 2012). However, because few high-quality antibodies are
available and ChIP assays require large numbers of cells, ChIP data are often biased
toward a subgroup of TFs. As a broader approach, we performed in silico searches
for TF response elements (REs) using four different tools: HOMER (ChIP-seq known
motifs), FIMO, Genome Trax (ChIP-seq TFBS), and HaploReg (TRANSFAC, JASPAR,
and PBM) (Grant et al., 2011; Heinz et al., 2010; Matys et al., 2006;
Portales-Casamar et al., 2010; Ward and Kellis, 2012). In this way, we assembled
datasets containing thousands of TF motifs. Our analyses indicate that 42 of the 76
high LD SNPs in TSS regions likely affect the binding of known TFs by altering their
REs. These SNPs were located within REs of 82 different TF motifs (Table 3-4). We
ranked the TFs by the number of SNPs affecting their REs across the risk loci; the top
motif was that of Specificity protein 1 (SP1), followed by that of Early Growth
Response 1 (EGR1). SP1 REs were affected by 6 TSS regional SNPs from 5 risk loci,
and SP1 binding is likely altered by the SNP alleles (Table 3-4). SP1 is known to be
involved in many cellular processes, including cell differentiation, cell growth,
apoptosis, response to DNA damage, and chromatin remodeling, and its expression is
up-regulated in breast cancer cells (Liu et al., 2000). It is therefore reasonable to
suggest that REs perturbed by our newly identified risk SNPs may alter SP1 binding
activity and thereby change the expression of SP1-regulated genes.
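The ranking step can be sketched as counting, for each TF, the distinct SNPs and distinct risk regions whose REs it matches. The rows below are a subset of Table 3-4 (all SP1-labeled rows plus a few others) joined to risk region numbers from Table 3-2. Collapsing database-specific names by upper-casing and dropping an "_N" suffix is an assumption about how names map to one TF; it does not merge related labels such as KROX or Egr_3, so the sketch does not attempt to reproduce the full ranking. On this subset it gives SP1 at 6 SNPs from 5 risk regions, matching the counts stated above.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (corr.snp.id, motif feature) pairs taken from Table 3-4 (subset).
MOTIF_HITS = [
    ("rs72679711", "SP1"), ("rs62331150", "SP1"), ("rs62331150", "Sp1_4"),
    ("rs62331150", "Sp1_3"), ("rs2303696", "SP1"), ("rs12770228", "SP1"),
    ("rs4951392", "SP1"), ("rs34853448", "SP1"), ("rs34853448", "Egr1"),
    ("rs62331150", "Klf4"), ("rs12770228", "Klf4"), ("rs8108174", "Myf"),
]
# corr.snp.id -> risk region number, taken from Table 3-2.
RISK_REGION = {
    "rs72679711": 55, "rs62331150": 16, "rs2303696": 67, "rs12770228": 39,
    "rs4951392": 4, "rs34853448": 39, "rs8108174": 66,
}


def normalize(feature: str) -> str:
    """Collapse database-specific motif names (e.g. 'Sp1_4') to one TF label (assumed rule)."""
    return feature.split("_")[0].upper()


def rank_tfs(hits: List[Tuple[str, str]], regions: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return (TF, distinct SNPs, distinct risk regions), most SNPs first."""
    snps_by_tf: Dict[str, Set[str]] = defaultdict(set)
    for snp, feature in hits:
        snps_by_tf[normalize(feature)].add(snp)
    table = [(tf, len(ids), len({regions[s] for s in ids})) for tf, ids in snps_by_tf.items()]
    # Ties are broken by number of regions, then alphabetically.
    return sorted(table, key=lambda row: (-row[1], -row[2], row[0]))


print(rank_tfs(MOTIF_HITS, RISK_REGION))
```

Counting distinct SNPs rather than rows matters here: rs62331150 matches SP1 in four separate rows across databases but is counted once.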
One example of a TSS regional SNP is rs2303696 (at the 19p13.11 risk locus),
which likely alters an SP1 RE. This SNP is highly correlated (r² = 0.81) with the index
SNP rs4808801, which is located 22 kb downstream of it (Table 3-2). The correlated
SNP is located in the promoter region of the inositol-3-phosphate synthase 1
(ISYNA1) gene, whose product catalyzes the de novo synthesis of myo-inositol
1-phosphate from glucose 6-phosphate (Fig. 3-3). Seelan et al. (2004) reported that
the interaction of E2F1 and SP1 at the ISYNA1 promoter regulates ISYNA1
expression. Additionally, ISYNA1 is expressed in breast tissue, and its expression
decreases 5- to 6-fold during invasive breast carcinogenesis (Table 3-3) (Finak et al.,
2008; Rhodes et al., 2004). We propose that this SNP may influence the regulatory
activity of the ISYNA1 promoter and thereby influence risk.
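To illustrate what "altering an RE" means quantitatively, a minimal sketch scores the reference and alternative sequences around a SNP with a log-odds position weight matrix (PWM) and compares the best-scoring windows that cover the SNP. The 4 bp GC-rich count matrix and the sequences are made-up toys, not the JASPAR SP1 matrix or the ISYNA1 promoter, and only the forward strand is scanned; tools such as HOMER, FIMO and HaploReg use their own matrices, background models and thresholds.

```python
import math
from typing import Dict, List

BASES = "ACGT"
PWM = List[Dict[str, float]]


def log_odds_pwm(counts: List[Dict[str, int]], pseudocount: float = 0.5,
                 background: float = 0.25) -> PWM:
    """Convert per-position base counts into a log2-odds PWM against a uniform background."""
    pwm: PWM = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        pwm.append({b: math.log2((column[b] + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_score(seq: str, pwm: PWM, snp_index: int) -> float:
    """Highest forward-strand PWM score over windows of seq that cover snp_index."""
    width = len(pwm)
    best = -math.inf
    first = max(0, snp_index - width + 1)
    last = min(snp_index, len(seq) - width)
    for start in range(first, last + 1):
        window = seq[start:start + width]
        best = max(best, sum(col[base] for col, base in zip(pwm, window)))
    return best


# Toy 4 bp count matrix favoring GGCG (hypothetical, NOT the JASPAR SP1 matrix).
TOY_COUNTS = [
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
    {"A": 0, "C": 10, "G": 0, "T": 0},
    {"A": 0, "C": 0, "G": 10, "T": 0},
]
toy_pwm = log_odds_pwm(TOY_COUNTS)
ref_score = best_score("ATGGCGTA", toy_pwm, 4)  # toy reference allele C at index 4
alt_score = best_score("ATGGAGTA", toy_pwm, 4)  # toy alternative allele A at index 4
print(round(ref_score, 2), round(alt_score, 2))
```

With this toy matrix the reference window GGCG scores about 7.23 and the alternative window GGAG about 2.84, so the C>A change would be predicted to weaken the toy site.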
Additionally, expression quantitative trait locus (eQTL) analyses were performed
using publicly available datasets to examine whether these TSS regional SNPs are
associated with messenger RNA (mRNA) levels (Boyle et al., 2012; Li et al., 2013). Of
the 76 high LD SNPs in TSS regions, 30 are associated with mRNA levels of nearby
genes (data not shown). For example, in estrogen receptor-positive breast cancer
tissues, expression levels of the C5orf35 gene differ according to the allele of
rs832552 (in the MAP3K1 promoter region) (Table 3-5).
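The eQTL results above come from published datasets and methods (Boyle et al., 2012; Li et al., 2013). As a minimal sketch of the underlying idea only, an additive eQTL model regresses expression on the number of copies of one allele (dosage 0, 1 or 2); the function name and data are made up, and real analyses also adjust for covariates and assess significance across many tests.

```python
import math
from typing import Sequence, Tuple


def eqtl_fit(dosage: Sequence[float], expression: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of expression on allele dosage (0, 1, 2).

    Returns (slope, intercept, pearson_r); the slope is the estimated change in
    expression per copy of the counted allele.
    """
    n = len(dosage)
    if n != len(expression) or n < 3:
        raise ValueError("need at least three paired observations")
    mx = sum(dosage) / n
    my = sum(expression) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosage, expression))
    sxx = sum((x - mx) ** 2 for x in dosage)
    syy = sum((y - my) ** 2 for y in expression)
    if sxx == 0 or syy == 0:
        raise ValueError("dosage and expression must both vary")
    slope = sxy / sxx
    return slope, my - slope * mx, sxy / math.sqrt(sxx * syy)


# Made-up genotypes and expression values for six samples.
slope, intercept, r = eqtl_fit([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(round(slope, 3), round(intercept, 3), round(r, 3))
```

Here the slope is about 1.0 and the intercept about 1.1, i.e. roughly one unit of expression gained per copy of the counted allele.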
Table 3-2. FunciSNP results for TSS regional high LD SNPs.
risk.region chr #.of.biofeatures corr.snp.id index.snp.id R² distance.from.index.SNP nearest.lincRNA.ID nearest.Gene
18 5p12 1 rs10462083 rs4415084 0.73 144222 TCONS_00009394 MRPS30
39 10p12.31 5 rs10828247 rs7072776 0.63 -210086 TCONS_00018449 MLLT10
39 10p12.31 4 rs10828248 rs7072776 0.67 -208323 TCONS_00018449 MLLT10
39 10p12.31 4 rs10828249 rs7072776 0.66 -208215 TCONS_00018449 MLLT10
47 11q13.1 3 rs10896064 rs3903072 0.7 57967 TCONS_00019670 EFEMP2
5 1q32.1 4 rs10920362 rs6678914 0.76 -3868 TCONS_00002217 LGR6
39 10p12.31 1 rs11012730 rs7072776 0.56 -221455 TCONS_00018449 MLLT10
2 1p13.2 1 rs11102701 rs11552449 0.91 1440 TCONS_00000279 DCLRE1B
4 1q32.1 1 rs11240753 rs4245739 0.74 -43329 TCONS_00000114 MDM4
5 1q32.1 1 rs12129763 rs6678914 0.98 -3145 TCONS_00002217 LGR6
39 10p12.31 4 rs12251016 rs7072776 0.66 -211024 TCONS_00018449 MLLT10
39 10p12.31 1 rs12253527 rs7072776 0.72 -213118 TCONS_00018449 MLLT10
39 10p12.31 3 rs12770228 rs7072776 0.57 -249308 TCONS_00018449 C10orf114
39 10p12.31 1 rs12779865 rs7072776 0.62 -203780 TCONS_00018449 MLLT10
55 14q13.3 1 rs12883049 rs2236007 0.82 -708 TCONS_00022475 PAX9
63 17q23 4 rs12937360 rs6504950 0.55 -10173 TCONS_00025435 STXBP4
50 12p11 4 rs141089797 rs10771399 0.5 -31718 TCONS_00020365 PTHLH
22 5q11.2 1 rs1466008 rs889312 0.56 181522 TCONS_00009404 SETD9
67 19p13.11 4 rs172032 rs4808801 0.69 62614 TCONS_00026930 ELL
63 17q23 1 rs17745344 rs6504950 0.99 -8152 TCONS_00025435 STXBP4
39 10p12.31 1 rs1802669 rs7072776 0.67 -205146 TCONS_00018449 MLLT10
18 5p12 7 rs1866406 rs4415084 0.73 147430 TCONS_00009394 MRPS30
22 5q11.2 1 rs192249 rs889312 0.59 165630 TCONS_00009404 SETD9
22 5q11.2 1 rs194059 rs889312 0.59 165532 TCONS_00009404 SETD9
8 2q31.1 2 rs2117131 rs1550623 0.74 8993 TCONS_00004500 CDCA7
67 19p13.11 2 rs2303696 rs4808801 0.81 -22257 TCONS_00026930 ISYNA1
67 19p13.11 1 rs2385089 rs4808801 0.88 -20707 TCONS_00026930 ISYNA1
22 5q11.2 1 rs252913 rs889312 0.59 163962 TCONS_00009404 SETD9
22 5q11.2 1 rs252914 rs889312 0.59 166266 TCONS_00009404 SETD9
22 5q11.2 5 rs252923 rs889312 0.57 173778 TCONS_00009404 SETD9
22 5q11.2 3 rs252925 rs889312 0.59 171889 TCONS_00009404 SETD9
2 1p13.2 4 rs2797412 rs11552449 0.52 -145423 TCONS_00000279 PHTF1
4 1q32.1 1 rs2926534 rs4245739 0.99 -54661 TCONS_00000114 PIK3C2B
4 1q32.1 1 rs3014606 rs4245739 0.99 -54692 TCONS_00000114 PIK3C2B
59 16q12.1 1 rs3095602 rs3803662 0.99 -5841 TCONS_00024415 TOX3
45 10q26.13 1 rs3135718 rs2981582 0.95 1552 TCONS_00018061 FGFR2
22 5q11.2 1 rs331499 rs889312 0.56 179039 TCONS_00009404 SETD9
22 5q11.2 1 rs33318 rs889312 0.56 176530 TCONS_00009404 SETD9
39 10p12.31 2 rs34853448 rs7072776 0.55 -223530 TCONS_00018449 MLLT10
18 5p12 5 rs3747479 rs4415084 0.73 146647 TCONS_00009394 MRPS30
18 5p12 4 rs3761648 rs4415084 0.73 145564 TCONS_00009394 MRPS30
18 5p12 4 rs3761649 rs4415084 0.73 145649 TCONS_00009394 MRPS30
18 5p12 4 rs3761650 rs4415084 0.73 145841 TCONS_00009394 MRPS30
2 1p13.2 2 rs3761936 rs11552449 0.99 1273 TCONS_00000279 DCLRE1B
24 6p23 4 rs405447 rs204247 0.95 -9424 TCONS_00011269 RANBP9
4 1q32.1 4 rs4245736 rs4245739 0.76 -32637 TCONS_00000114 MDM4
4 1q32.1 4 rs4245737 rs4245739 0.73 -32600 TCONS_00000114 MDM4
22 5q11.2 2 rs43183 rs889312 0.56 175908 TCONS_00009404 SETD9
4 1q32.1 1 rs4951075 rs4245739 0.73 -54396 TCONS_00000114 PIK3C2B
4 1q32.1 1 rs4951391 rs4245739 0.81 -34285 TCONS_00000114 MDM4
4 1q32.1 1 rs4951392 rs4245739 0.92 -34225 TCONS_00000114 MDM4
8 2q31.1 5 rs4972554 rs1550623 0.74 8123 TCONS_00004500 CDCA7
39 10p12.31 2 rs5011832 rs7072776 0.55 -250100 TCONS_00018449 C10orf114
16 4q24 6 rs62331150 rs9790517 0.98 -15765 TCONS_00007807 TET2
47 11q13.1 1 rs633800 rs3903072 0.78 55653 TCONS_00019670 EFEMP2
40 10p15.1 4 rs6602322 rs2380205 0.53 43845 TCONS_00018097 ANKRD16
55 14q13.3 1 rs67155512 rs2236007 0.98 -16498 TCONS_00022475 PAX9
40 10p15.1 1 rs685852 rs2380205 0.53 47883 TCONS_00018097 FBXO18
24 6p23 6 rs6905991 rs204247 0.6 -11244 TCONS_00011269 RANBP9
22 5q11.2 1 rs702691 rs889312 0.6 82642 TCONS_00009404 MAP3K1
63 17q23 4 rs7208403 rs6504950 0.55 -9804 TCONS_00025435 STXBP4
63 17q23 3 rs7222197 rs6504950 1 -8972 TCONS_00025435 STXBP4
63 17q23 4 rs7226272 rs6504950 0.55 -10024 TCONS_00025435 STXBP4
55 14q13.3 1 rs72679711 rs2236007 0.98 -16488 TCONS_00022475 PAX9
27 6q25.1 1 rs7763637 rs2046210 0.91 946 TCONS_00011198 CCDC170
66 19p13.11 2 rs8100241 rs2363956 0.94 -1230 TCONS_00026930 ANKLE1
66 19p13.11 3 rs8108174 rs2363956 0.94 -594 TCONS_00026930 ANKLE1
22 5q11.2 1 rs832533 rs889312 0.56 183268 TCONS_00009404 SETD9
22 5q11.2 1 rs832534 rs889312 0.56 181788 TCONS_00009404 SETD9
22 5q11.2 1 rs832535 rs889312 0.56 181450 TCONS_00009404 SETD9
22 5q11.2 1 rs832536 rs889312 0.55 180711 TCONS_00009404 SETD9
22 5q11.2 1 rs832540 rs889312 0.59 167318 TCONS_00009404 SETD9
22 5q11.2 1 rs832552 rs889312 0.61 81966 TCONS_00009404 MAP3K1
8 2q31.1 4 rs930313 rs1550623 0.98 6224 TCONS_00004500 CDCA7
63 17q23 1 rs9895808 rs6504950 1 -8029 TCONS_00025435 STXBP4
Table 3-3. Differential expression analysis of the genes in whose TSS regions high LD SNPs reside.
Gene name | Dataset | Comparison | Fold change | P-value | Reference
MLLT10 | Finak Breast | Invasive Breast Carcinoma Stroma vs. Normal | -2.577 | 2.52E-18 | Finak et al. 2008
EFEMP2 | Richardson Breast 2 | Ductal Breast Carcinoma vs. Normal | -2.712 | 3.19E-08 | Richardson et al. 2006
LGR6 | Ma Breast 4 | Invasive Breast Carcinoma Stroma vs. Normal | -3.97 | 3.54E-06 | Ma et al. 2009
LGR6 | Ma Breast 4 | Invasive Ductal Breast Carcinoma Epithelia vs. Normal | -2.991 | 1.39E-05 | Ma et al. 2009
LGR6 | Ma Breast 4 | Ductal Breast Carcinoma in situ Epithelia vs. Normal | -3.212 | 3.96E-05 | Ma et al. 2009
LGR6 | TCGA Breast | Invasive Lobular Breast Carcinoma vs. Normal | -5.796 | 2.54E-09 | Cancer Genome Atlas Network et al. 2012
LGR6 | TCGA Breast | Invasive Breast Carcinoma vs. Normal | -4.484 | 4.27E-12 | Cancer Genome Atlas Network et al. 2012
PAX9 | TCGA Breast | Invasive Ductal and Lobular Breast Carcinoma vs. Normal | 2.363 | 6.33E-08 | Cancer Genome Atlas Network et al. 2012
PAX9 | TCGA Breast | Invasive Lobular Breast Carcinoma vs. Normal | 4.002 | 1.69E-13 | Cancer Genome Atlas Network et al. 2012
CDCA7 | Finak Breast | Invasive Breast Carcinoma Stroma vs. Normal | -17.312 | 2.60E-22 | Finak et al. 2008
ISYNA1 | Finak Breast | Invasive Breast Carcinoma Stroma vs. Normal | -5.773 | 1.87E-19 | Finak et al. 2008
TOX3 | TCGA Breast | Invasive Ductal and Lobular Breast Carcinoma vs. Normal | 3.42 | 1.03E-11 | Cancer Genome Atlas Network et al. 2012
TOX3 | TCGA Breast | Intraductal Cribriform Breast Adenocarcinoma vs. Normal | 2.858 | 3.28E-11 | Cancer Genome Atlas Network et al. 2012
TOX3 | TCGA Breast | Mixed Lobular and Ductal Breast Carcinoma vs. Normal | 3.838 | 5.31E-06 | Cancer Genome Atlas Network et al. 2012
FGFR2 | Sorlie Breast 2 | Ductal Breast Carcinoma vs. Normal | -2.106 | 3.54E-09 | Sorlie et al. 2003
RANBP9 | Finak Breast | Invasive Breast Carcinoma Stroma vs. Normal | -18.405 | 3.07E-19 | Finak et al. 2008
TET2 | Radvanyi Breast | Invasive Ductal Breast Carcinoma vs. Normal | -2.208 | 3.80E-05 | Radvanyi et al. 2005
TET2 | Finak Breast | Invasive Breast Carcinoma Stroma vs. Normal | 2.444 | 5.22E-12 | Finak et al. 2008
ANKRD16 | TCGA Breast | Invasive Ductal and Lobular Carcinoma vs. Normal | 2.163 | 4.13E-21 | Cancer Genome Atlas Network et al. 2012
MAP3K1 | Richardson Breast 2 | Ductal Breast Carcinoma vs. Normal | -2.511 | 1.15E-05 | Richardson et al. 2006
C6orf97 | TCGA Breast | Intraductal Cribriform Breast Adenocarcinoma vs. Normal | 2.834 | 3.10E-21 | Cancer Genome Atlas Network et al. 2012
Table 3-4. TSS regional high LD SNP motif analysis
corr.snp.id chr start end feature score database
rs8108174 chr19 17393530 17393538 SCL/HPC7-Scl-ChIP-Seq/Homer 5.638456 FindMotif_ChIP-Seq
rs3761650 chr5 44808349 44808361 TATA-Box(TBP)/Promoter/Homer 5.842351 FindMotif_ChIP-Seq
rs4972554 chr2 174221008 174221023 PR(NR)/T47D-PR-ChIP-Seq(GSE31130)/Homer 6.32266 FindMotif_ChIP-Seq
rs8108174 chr19 17393527 17393535 SCL/HPC7-Scl-ChIP-Seq/Homer 6.47736 FindMotif_ChIP-Seq
rs43183 chr5 56207784 56207792 Smad3(MAD)/NPC-Smad3-ChIP-Seq(GSE36673)/Homer 6.580625 FindMotif_ChIP-Seq
rs12770228 chr10 21783627 21783639 EBF1(EBF)/Near-E2A-ChIP-Seq/Homer 6.791801 FindMotif_ChIP-Seq
rs3761649 chr5 44808160 44808172 TATA-Box(TBP)/Promoter/Homer 6.855115 FindMotif_ChIP-Seq
rs43183 chr5 56207782 56207792 Smad4(MAD)/ESC-SMAD4-ChIP-Seq(GSE29422)/Homer 7.045952 FindMotif_ChIP-Seq
rs43183 chr5 56207784 56207792 Smad2(MAD)/ES-SMAD2-ChIP-Seq(GSE29422)/Homer 7.189329 FindMotif_ChIP-Seq
rs8108174 chr19 17393527 17393539 MyoD(HLH)/Myotube-MyoD-ChIP-Seq/Homer 7.34401 FindMotif_ChIP-Seq
rs34853448 chr10 21809410 21809418 Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer 7.47877 FindMotif_ChIP-Seq
rs62331150 chr4 106069011 106069019 Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer 7.47877 FindMotif_ChIP-Seq
rs10896064 chr11 65641026 65641034 NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq/Homer 7.746746 FindMotif_ChIP-Seq
rs34853448 chr10 21809409 21809417 Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer 7.775198 FindMotif_ChIP-Seq
rs62331150 chr4 106069006 106069014 Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer 8.227184 FindMotif_ChIP-Seq
rs3747479 chr5 44809160 44809168 NF1-halfsite(CTF)/LNCaP-NF1-ChIP-Seq/Homer 8.312297 FindMotif_ChIP-Seq
rs12779865 chr10 21829155 21829165 Mef2a(MADS)/HL1-Mef2a.biotin-ChIP-Seq/Homer/ 8.409607 FindMotif_ChIP-Seq
rs62331150 chr4 106069005 106069013 Maz(Zf)/HepG2-Maz-ChIP-Seq(GSE31477)/Homer 8.73681 FindMotif_ChIP-Seq
rs405447 chr6 13713091 13713101 Oct4(POU/Homeobox)/mES-Oct4-ChIP-Seq/Homer 9.154166 FindMotif_ChIP-Seq
rs405447 chr6 13713097 13713107 Oct2(POU/Homeobox)/Bcell-Oct2-ChIP-Seq/Homer 9.324513 FindMotif_ChIP-Seq
rs62331150 chr4 106069005 106069019 UF1H3BETA -12 Haploreg
rs17745344 chr17 53048309 53048325 Foxl1_1 -3.4 Haploreg
rs685852 chr10 5934608 5934624 Mef2_5 -2.6 Haploreg
rs405447 chr6 13713102 13713088 Pou2f1_5 -2.3 Haploreg
rs10828247 chr10 21822852 21822867 Zfp281 -1.6 Haploreg
rs12779865 chr10 21829158 21829179 HNF1_7 -1.3 Haploreg
rs62331150 chr4 106069008 106069018 Sp1_4 -1.1 Haploreg
rs12251016 chr10 21821919 21821904 MafK -0.6 Haploreg
rs331499 chr5 56210937 56210920 Eomes -0.2 Haploreg
rs5011832 chr10 21782826 21782843 CDP_6 0.1 Haploreg
rs8108174 chr19 17393541 17393524 Ascl2 0.1 Haploreg
rs62331150 chr4 106068997 106069010 Sp1_3 0.2 Haploreg
rs3135718 chr10 123353883 123353868 Hic1_3 0.5 Haploreg
rs252923 chr5 56205665 56205650 AP-2_6 0.8 Haploreg
rs331499 chr5 56210935 56210923 TBX5_1 0.8 Haploreg
rs252923 chr5 56205665 56205651 AP-2_5 1 Haploreg
rs685852 chr10 5934607 5934625 Mef2_6 1 Haploreg
rs832534 chr5 56213655 56213672 LXR_3 1 Haploreg
rs10828249 chr10 21824724 21824747 HMG-IY_2 1.1 Haploreg
rs2797412 chr1 114302961 114302974 CEBPG 1.3 Haploreg
rs252923 chr5 56205649 56205664 AP-2_7 1.5 Haploreg
rs685852 chr10 5934607 5934623 Mef2_3 1.6 Haploreg
rs685852 chr10 5934609 5934625 Mef2_1 1.6 Haploreg
rs9915183 chr17 53047154 53047137 Six_2 1.9 Haploreg
rs10828247 chr10 21822870 21822856 KROX 2.3 Haploreg
rs62331150 chr4 106069003 106069017 Egr_3 2.5 Haploreg
rs11012730 chr10 21811481 21811492 Pou6f1_1 2.7 Haploreg
rs34853448 chr10 21809412 21809398 KROX 5.1 Haploreg
rs62331150 chr4 106069021 106069004 Sp4 5.1 Haploreg
rs62331150 chr4 106069003 106069017 KROX 5.7 Haploreg
rs331499 chr5 56210945 56210921 Brachyury_1 9 Haploreg
rs2385089 chr19 18550428 18550440 Sox_7 <-1.4 Haploreg
rs62331150 chr4 106069008 106069017 CAC-binding-protein <-1.8 Haploreg
rs62331150 chr4 106069020 106069004 Zfp740 <-2.2 Haploreg
rs12770228 chr10 21783625 21783639 Zic_4 <-2.5 Haploreg
rs2385089 chr19 18550426 18550443 Sox_13 <-2.7 Haploreg
rs832552 chr5 56113839 56113865 AIRE_1 <-3.1 Haploreg
rs832534 chr5 56213688 56213670 AP-4_1 <-4.5 Haploreg
rs405447 chr6 13713092 13713110 Pou2f1_1 <-6 Haploreg
rs405447 chr6 13713095 13713107 Pou2f1_8 <-7.9 Haploreg
rs10462083 chr5 44806744 44806730 CEBPB_2 >1.5 Haploreg
rs33318 chr5 56208421 56208409 NF-AT >1.8 Haploreg
rs9895808 chr17 53048451 53048437 CEBPB_2 >1.9 Haploreg
rs12937360 chr17 53046310 53046298 GATA_10 >2.5 Haploreg
rs8108174 chr19 17393522 17393540 Myf_3 >2.6 Haploreg
rs7208403 chr17 53046656 53046670 Irf_10 >3 Haploreg
rs832552 chr5 56113835 56113853 NR3C_9 >3.1 Haploreg
rs7208403 chr17 53046656 53046671 Irf_7 >3.3 Haploreg
rs3747479 chr5 44809168 44809156 GLI >3.5 Haploreg
rs7208403 chr17 53046656 53046670 Irf_6 >3.7 Haploreg
rs7208403 chr17 53046657 53046670 Irf_3 >3.7 Haploreg
rs8108174 chr19 17393538 17393526 HEN1_2 >4.2 Haploreg
rs702691 chr5 56114539 56114515 COMP1 >5.4 Haploreg
rs8108174 chr19 17393538 17393526 Myf_1 >8.4 Haploreg
rs8108174 chr19 17393534 17393523 Myf 8.77587 JASPARCORE2009
rs8108174 chr19 17393537 17393526 Myf 8.77587 JASPARCORE2009
rs12883049 chr14 37132057 37132068 Myf 9.34656 JASPARCORE2009
rs8108174 chr19 17393537 17393526 NHLH1 9.82053 JASPARCORE2009
rs4951392 chr1 204484617 204484624 HOXA5 10.4631 JASPARCORE2009
rs252923 chr5 56205660 56205667 Mafb 10.5409 JASPARCORE2009
rs72679711 chr14 37116284 37116275 SP1 10.926 JASPARCORE2009
rs141089797 chr12 28123358 28123366 TFAP2A 10.9905 JASPARCORE2009
rs62331150 chr4 106069010 106069019 SP1 11.2221 JASPARCORE2009
rs62331150 chr4 106069006 106069015 MZF1_5-13 11.2621 JASPARCORE2009
rs7208403 chr17 53046665 53046674 MAX 11.3473 JASPARCORE2009
rs12770228 chr10 21783628 21783637 EBF1 11.3938 JASPARCORE2009
rs67155512 chr14 37116269 37116278 EBF1 11.3938 JASPARCORE2009
rs2303696 chr19 18548881 18548890 SP1 11.458 JASPARCORE2009
rs12770228 chr10 21783633 21783642 SP1 11.5626 JASPARCORE2009
rs405447 chr6 13713092 13713106 Pou5f1 11.958 JASPARCORE2009
rs405447 chr6 13713105 13713091 Pou5f1 11.958 JASPARCORE2009
rs141089797 chr12 28123362 28123353 NF-kappaB 12.1024 JASPARCORE2009
rs12770228 chr10 21783634 21783643 Klf4 12.1799 JASPARCORE2009
rs3095602 chr16 52580499 52580508 MZF1_5-13 12.204 JASPARCORE2009
rs62331150 chr4 106069010 106069019 Klf4 12.3751 JASPARCORE2009
rs2303696 chr19 18548889 18548880 SP1 12.4991 JASPARCORE2009
rs4951392 chr1 204484617 204484608 SP1 12.4991 JASPARCORE2009
rs141089797 chr12 28123353 28123363 NFKB1 12.6329 JASPARCORE2009
rs34853448 chr10 21809415 21809405 Egr1 12.7278 JASPARCORE2009
rs832533 chr5 56215155 56215144 IRF1 13.8327 JASPARCORE2009
rs34853448 chr10 21809403 21809412 SP1 14.3307 JASPARCORE2009
rs62331150 chr4 106069018 106069009 SP1 14.3307 JASPARCORE2009
rs3014606 chr1 204464147 204464158 c-Myc 0.785 Transfac_ChIPSite
rs3747479 chr5 44809156 44809175 YY1 0.902 Transfac_ChIPSite
rs10828247 chr10 21822856 21822863 E2F-6 0.995 Transfac_ChIPSite
Figure 3-3. An example of a TSS regional SNP: rs2303696 is located in the promoter region of ISYNA1. A genome browser view of the TSS regional high LD SNP rs2303696 is shown. The first track shows FunciSNP results for TSS regions. The name of each correlated SNP (rs number – r² value) is shown and color-coded to indicate the number of biofeatures (Fig. S2A). The index SNP is shown in black. The lower tracks are biofeature tracks, RefSeq genes/mRNA/pseudogene tracks from UCSC Genes, common SNPs (version 137), and linkage disequilibrium (LD) blocks, with LD measured by r² in phased CEU. Allele frequencies of rs2303696 and the location of this SNP in an SP1 RE are shown.
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One
[Figure 3-3 image: tracks for FunciSNP TSS region; HMEC H3K4me2, H3K4me3, H3K9Ac, FAIRE, DNaseI, H3K4me1 and H3K27Ac; UCSC Genes (SSBP4, ISYNA1, ELL); common SNPs (137); and linkage disequilibrium (phased CEU r²), over chr19:18,550,000-18,570,000 (hg19, 19p13.11). Inset: rs2303696 (r² = 0.8061 with index SNP rs4808801) within an SP1 RE; allele frequencies T 54.034%, C 45.966%.]
Table 3-5. eQTL analyses of TSS regional SNPs
index.snp.id high.LD.snp.id r² target.gene eQTL.P-value cell.type reference
rs889312 rs832552 0.61 C5orf35 2.46e-6 Estrogen receptor positive breast cancer (Li et al., 2013)
rs889312 rs252913 0.59 C5orf35 1.36e-8 Estrogen receptor positive breast cancer (Li et al., 2013)
rs889312 rs331499 0.56 C5orf35 1.16e-11 Estrogen receptor positive breast cancer (Li et al., 2013)
rs889312 rs331499 0.56 MIER3 7.75e-6 Estrogen receptor positive breast cancer (Li et al., 2013)
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One
3.2.4 921 high LD SNPs were found at enhancers, and enhancer activity varied among breast epithelial cells.
Nine hundred twenty-one correlated SNPs (r² ≥ 0.5) were annotated in enhancers (Table 3-6). Of these, 503 SNPs likely affect known transcription factor binding by altering TF REs. By performing in silico searches of TF REs as described above for TSS regions, we identified 455 different transcription factor REs in which TF binding is likely to be altered by a risk-correlated SNP. We ranked these motifs by the number of SNPs affecting their REs and selected the top 18 for further analysis (see below) (Table 3-7). The top motif is that of T-cell acute lymphocytic leukemia 1 (TAL1; aka SCL), for which 28 enhancer SNPs at 16 BCa risk loci were identified. The next most frequently affected motifs are, in order, those of Eomesodermin (EOMES), Forkhead box P1 (FOXP1) and SP1. TAL1 is a transcription factor that acts in hemopoiesis, anti-apoptosis, angiogenesis, and other processes (Hansson et al., 2003; Palii et al., 2011; Visvader et al., 1991). It is expressed in breast tissue, and its expression decreases 2-3 fold during invasive breast carcinogenesis (2012; Rhodes et al., 2004). TAL1 also inhibits the expression of GATA3, a transcription factor that inhibits breast cancer metastasis (Ono et al., 1998; Yan et al., 2010).
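The ranking step can be illustrated with a short sketch. This is not the pipeline used in this work; it assumes the in silico motif scan has already produced (SNP, motif) hit pairs, and it counts each SNP at most once per motif, so that a SNP matching the same motif at two offsets or on both strands (as rs8108174 does for Myf in Table 3-4) is not double-counted. The function name `rank_motifs` and the SNP IDs in the usage example are hypothetical.

```python
from collections import defaultdict


def rank_motifs(snp_motif_hits, top_n=18):
    """Rank TF motifs by the number of distinct SNPs predicted to alter their RE.

    snp_motif_hits: iterable of (snp_id, motif_name) pairs, one per predicted
    SNP-in-RE overlap. A SNP may hit several motifs, and the same SNP/motif
    pair may appear more than once (e.g., both strands).
    """
    snps_per_motif = defaultdict(set)
    for snp_id, motif in snp_motif_hits:
        snps_per_motif[motif].add(snp_id)  # a set counts each SNP once per motif
    # Sort by descending SNP count, then by motif name so ties are deterministic.
    ranked = sorted(snps_per_motif.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [(motif, len(snps)) for motif, snps in ranked[:top_n]]


# Hypothetical hits: rs0002 matches TAL1 twice but is counted once.
hits = [
    ("rs0001", "TAL1"), ("rs0002", "TAL1"), ("rs0002", "TAL1"), ("rs0005", "TAL1"),
    ("rs0003", "Eomes"), ("rs0001", "SP1"), ("rs0004", "SP1"),
]
print(rank_motifs(hits, top_n=2))
```

With `top_n=18` the same function would return a list shaped like Table 3-7; a motif's position in the full sorted list corresponds to a rank such as the 51st place reported for FOXA1 in the Discussion.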
One example of a likely TAL1-affecting SNP is rs76969790 at the 5q11 risk locus (Fig. 3-4). This SNP is highly correlated (r² = 0.88) with the GWAS index SNP rs1353747, which is located 58 kb upstream of it. The correlated SNP lies in the large intron 10 of the PDE4D gene. PDE4D encodes an enzyme with 3',5'-cyclic-AMP phosphodiesterase activity that degrades cAMP, thereby regulating multiple signaling pathways and metabolic processes (e.g., GPCR and TOR signaling, cAMP metabolism) (Kim et al., 2010; Persani et al., 2000). Intron 10 of PDE4D is large (140 kb) and contains several enhancer histone marks together with nucleosome depletion signals (i.e., DNaseI and FAIRE). rs76969790 lies in close proximity to FAIRE and DNaseI signals and coincides exactly with enhancer histone marks (H3K4me1, H3K27ac, H3K4me2, and H3K9ac) (Fig. 3-4).
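Whether a SNP "coincides" with a mark reduces to an interval-overlap test against each biofeature track, and the number of tracks hit is presumably what the No.of.biofeature column of Table 3-6 reports. The sketch below is an illustration under stated assumptions, not FunciSNP's implementation: it assumes BED-style 0-based, half-open peak coordinates, uses a simple linear scan, and the function name and peak coordinates are hypothetical. The 5-biofeature cutoff mirrors the threshold used to select SNPs for Table 3-6 and the BCE clones.

```python
MIN_BIOFEATURES = 5  # threshold used for Table 3-6 / BCE selection


def coinciding_biofeatures(snp_chrom, snp_pos, tracks):
    """Return the sorted names of biofeature tracks whose peaks contain the SNP.

    tracks: dict mapping track name -> list of (chrom, start, end) peaks in
    0-based, half-open coordinates (assumed BED convention).
    snp_pos: 0-based SNP position.
    """
    hits = []
    for name, peaks in tracks.items():
        # A SNP coincides with a peak if start <= pos < end on the same chromosome.
        if any(chrom == snp_chrom and start <= snp_pos < end
               for chrom, start, end in peaks):
            hits.append(name)
    return sorted(hits)


# Hypothetical peaks for three HMEC tracks.
tracks = {
    "H3K4me1": [("chr5", 100, 500)],
    "H3K27Ac": [("chr5", 250, 400), ("chr5", 900, 1000)],
    "DNaseI": [("chr5", 600, 700)],
}
found = coinciding_biofeatures("chr5", 300, tracks)
print(found, len(found) >= MIN_BIOFEATURES)
```

For genome-wide data, an interval tree or a sorted sweep would replace the linear scan; the overlap rule itself stays the same.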
Additionally, we performed expression quantitative trait locus (eQTL) analysis, which tests whether SNP genotypes are associated with gene expression levels, as described above for TSS regions. Of the 921 high LD SNPs in enhancer regions, a relatively small number (65 SNPs) were associated with mRNA levels of nearby genes (not all data shown). This may reflect a genuine difference from the eQTL results for TSS regional high LD SNPs described above; alternatively, the sample sizes of the eQTL analyses may have been too small to detect association signals between risk loci and affected genes (Table 3-8).
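As an illustration of what a single SNP-gene eQTL test involves, the sketch below regresses expression on risk-allele dosage. The model is an assumption: the source does not specify it, and an additive linear model without covariates is used here only because it is a common choice. The function name and data are hypothetical; a p-value would be obtained from a t distribution with n - 2 degrees of freedom, which the sketch does not compute.

```python
import math
from statistics import mean


def eqtl_additive(dosages, expression):
    """OLS fit of expression = a + b * dosage (additive model, assumed).

    dosages: risk-allele counts (0, 1 or 2) per individual.
    expression: mRNA levels for the same individuals.
    Returns (slope b, t statistic for b).
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need matched data for at least 3 individuals")
    mx, my = mean(dosages), mean(expression)
    sxx = sum((x - mx) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("genotype is monomorphic in this sample")
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, expression))
    b = sxy / sxx
    a = my - b * mx
    # Residual sum of squares gives the standard error of the slope.
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(dosages, expression))
    se_b = math.sqrt(rss / (n - 2) / sxx)
    if se_b == 0:
        return b, math.inf  # perfect fit; t is unbounded
    return b, b / se_b


# Hypothetical genotypes and expression values for six individuals.
b, t = eqtl_additive([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 2.9, 3.1])
print(round(b, 3), round(t, 1))
```

Because the standard error shrinks roughly with the square root of the sample size, small eQTL cohorts can miss modest effects, which is the sample-size caveat raised in the paragraph above.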
To verify the activity of the identified enhancers, we performed in vitro enhancer assays by cloning approximately 1.2 kb regions containing the SNPs. We selected the 11 best SNP regions for cloning based on the number of coinciding chromatin biofeatures (5 or more) and named them breast cancer enhancer 1 (BCE1) through BCE11 (Tables 3-9 and 3-10). Dual luciferase assays in normal and breast cancer cells showed that 9 of the 11 regions retained enhancer activity above background (CT1 and CT2) in normal cells, breast cancer cells, or both (Fig. 3-5). Among the 9 active enhancers, BCE4, 5, and 8 had enhancer activity in both normal (HMEC and MCF10A) and breast cancer cells (MCF7 and MDAMB231). In contrast, BCE1, 2, and 11 showed enhancer activity only in normal HMEC. BCE7 had enhancer activity only in MCF7, an estrogen receptor (ER) positive breast cancer epithelial cell line. BCE3 retained enhancer activity in ER negative breast epithelial cells: MDAMB231, MCF10A, and HMEC.
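The normalization described in the Figure 3-5 legend (each luciferase activity divided by the average activity of the two negative controls) can be sketched as follows. The source states only that dual luciferase assays were used; normalizing firefly reporter signal to a co-transfected Renilla signal is the standard dual-luciferase design and is assumed here, as is averaging replicate wells. The function name and readings are hypothetical, and whether a value above 1 counts as "active" (and under what statistical test) is not specified in the source.

```python
from statistics import mean


def relative_activity(construct, controls):
    """Enhancer activity of one construct relative to negative controls.

    construct: list of (firefly, renilla) readings, one per replicate well.
    controls: dict of control name -> list of (firefly, renilla) readings
    (e.g., CT1 and CT2).
    """
    def mean_ratio(wells):
        # Firefly signal normalized to Renilla (assumed transfection control).
        return mean(f / r for f, r in wells)

    # Background = average of the per-control mean ratios.
    background = mean(mean_ratio(wells) for wells in controls.values())
    return mean_ratio(construct) / background


# Hypothetical readings for one BCE construct in one cell line.
controls = {"CT1": [(100, 50), (120, 60)], "CT2": [(90, 45), (110, 55)]}
bce = [(600, 50), (660, 55)]
print(relative_activity(bce, controls))
```

Computing the value separately per cell line is what allows patterns such as activity in HMEC only or in MCF7 only to be compared across BCEs.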
Table 3-6. FunciSNP results for enhancer high LD SNPs coinciding with 5 or more biofeatures
Risk.region chr No.of.biofeature corr.snp.id index.snp.id R² distance.from.index.SNP nearest.lincRNA nearest.Gene
6 2p24.1 8 rs4233763 rs12710696 0.74 3246 TCONS_00004704 MIR4757
6 2p24.1 8 rs4233764 rs12710696 0.74 3252 TCONS_00004704 MIR4757
50 12p11 7 rs10843065 rs10771399 0.83 21501 TCONS_00020365 PTHLH
50 12p11 7 rs788463 rs10771399 0.93 -2432 TCONS_00020365 PTHLH
34 8q24.21 7 rs28759353 rs13281615 0.63 -3255 TCONS_00015171 POU5F1B
56 14q24.1 7 rs1314913 rs2588809 0.83 39166 TCONS_00022775 RAD51B
67 19p13.11 7 rs12981981 rs4808801 0.71 66053 TCONS_00026930 ELL
71 22q13.1 7 rs73169087 rs6001930 0.73 148612 TCONS_00029427 MKL1
71 22q13.1 7 rs73169089 rs6001930 0.73 148620 TCONS_00029427 MKL1
3 1p36.22 7 rs636291 rs616488 0.6 -10118 TCONS_00000152 PEX14
57 14q24.1 7 rs17828907 rs999737 0.82 -23477 TCONS_00022775 ZFP36L1
6 2p24.1 6 rs12710697 rs12710696 0.74 165 TCONS_00004704 MIR4757
34 8q24.21 6 rs4871782 rs13281615 0.92 -3384 TCONS_00015171 POU5F1B
52 12q22 6 rs11836367 rs17356907 0.8 -292 TCONS_00020885 USP44
61 16q12.2 6 rs3751812 rs17817449 0.99 5093 TCONS_00024420 FTO
9 2q31.1 6 rs4519482 rs2016394 0.53 -9612 TCONS_00003049 DLX2
56 14q24.1 6 rs1316014 rs2588809 0.96 38942 TCONS_00022775 RAD51B
56 14q24.1 6 rs2767365 rs2588809 0.57 -53405 TCONS_00022775 RAD51B
45 10q26.13 6 rs10736303 rs2981582 0.62 -17860 TCONS_00018061 FGFR2
45 10q26.13 6 rs11200014 rs2981582 0.8 -17387 TCONS_00018061 FGFR2
71 22q13.1 6 rs16985899 rs6001930 0.73 87168 TCONS_00029427 MKL1
3 1p36.22 6 rs648324 rs616488 0.71 -9768 TCONS_00000152 PEX14
3 1p36.22 6 rs648399 rs616488 0.71 -9730 TCONS_00000152 PEX14
15 3p26.1 6 rs9812100 rs6762644 0.57 21025 TCONS_00005727 EGOT
30 8p12 6 rs6558136 rs9693444 0.88 13507 TCONS_00014555 C8orf75
57 14q24.1 6 rs79071597 rs999737 0.58 9872 TCONS_00022775 ZFP36L1
50 12p11 5 rs10843066 rs10771399 0.97 21717 TCONS_00020365 PTHLH
50 12p11 5 rs813722 rs10771399 0.74 -3036 TCONS_00020365 PTHLH
33 8q24.21 5 rs11783807 rs11780156 0.64 -105 TCONS_00014856 MIR1208
33 8q24.21 5 rs12546580 rs11780156 0.63 -14128 TCONS_00014856 MIR1208
33 8q24.21 5 rs1902778 rs11780156 0.64 209 TCONS_00014856 MIR1208
33 8q24.21 5 rs56152647 rs11780156 0.64 1280 TCONS_00014856 MIR1208
33 8q24.21 5 rs6470608 rs11780156 0.64 1929 TCONS_00014856 MIR1208
33 8q24.21 5 rs6992491 rs11780156 0.64 3262 TCONS_00014856 MIR1208
51 12p13.1 5 rs11055880 rs12422552 0.67 -3297 TCONS_00020723 ATF7IP
51 12p13.1 5 rs11055882 rs12422552 0.67 -1171 TCONS_00020723 ATF7IP
51 12p13.1 5 rs11055883 rs12422552 0.67 -1125 TCONS_00020723 ATF7IP
13 3p24.1 5 rs5020833 rs12493607 0.73 -12514 TCONS_00006465 TGFBR2
6 2p24.1 5 rs10206829 rs12710696 0.74 697 TCONS_00004704 MIR4757
6 2p24.1 5 rs4425044 rs12710696 0.74 -826 TCONS_00004704 MIR4757
6 2p24.1 5 rs6731836 rs12710696 1 1201 TCONS_00004704 MIR4757
6 2p24.1 5 rs6741887 rs12710696 0.91 -420 TCONS_00004704 MIR4757
53 12q24.21 5 rs10507259 rs1292011 0.71 -15211 TCONS_00020937 TBX3
53 12q24.21 5 rs1391715 rs1292011 0.66 -15287 TCONS_00020937 TBX3
53 12q24.21 5 rs7136795 rs1292011 0.6 -22915 TCONS_00020937 TBX3
53 12q24.21 5 rs7979536 rs1292011 0.7 -14434 TCONS_00020937 TBX3
34 8q24.21 5 rs10087810 rs13281615 0.82 -2888 TCONS_00015171 POU5F1B
34 8q24.21 5 rs10098985 rs13281615 0.52 -599 TCONS_00015171 POU5F1B
34 8q24.21 5 rs10956363 rs13281615 0.92 -3845 TCONS_00015171 POU5F1B
34 8q24.21 5 rs12541832 rs13281615 0.62 -2447 TCONS_00015171 POU5F1B
34 8q24.21 5 rs13262406 rs13281615 0.62 -1879 TCONS_00015171 POU5F1B
34 8q24.21 5 rs13270266 rs13281615 0.62 -1532 TCONS_00015171 POU5F1B
34 8q24.21 5 rs17465052 rs13281615 0.53 -1538 TCONS_00015171 POU5F1B
34 8q24.21 5 rs56898669 rs13281615 0.92 -4176 TCONS_00015171 POU5F1B
34 8q24.21 5 rs59351655 rs13281615 0.92 -4570 TCONS_00015171 POU5F1B
34 8q24.21 5 rs61044379 rs13281615 0.62 -3664 TCONS_00015171 POU5F1B
61 16q12.2 5 rs11075987 rs17817449 0.65 1794 TCONS_00024420 FTO
61 16q12.2 5 rs1121980 rs17817449 0.9 -4120 TCONS_00024420 FTO
61 16q12.2 5 rs11642015 rs17817449 0.93 -10873 TCONS_00024420 FTO
61 16q12.2 5 rs1558901 rs17817449 0.9 -10180 TCONS_00024420 FTO
61 16q12.2 5 rs1558902 rs17817449 0.93 -9793 TCONS_00024420 FTO
61 16q12.2 5 rs17817288 rs17817449 0.64 -5603 TCONS_00024420 FTO
61 16q12.2 5 rs17817497 rs17817449 0.99 2068 TCONS_00024420 FTO
61 16q12.2 5 rs3751813 rs17817449 0.57 5341 TCONS_00024420 FTO
61 16q12.2 5 rs3751814 rs17817449 0.99 5357 TCONS_00024420 FTO
61 16q12.2 5 rs55872725 rs17817449 0.93 -4244 TCONS_00024420 FTO
61 16q12.2 5 rs56313538 rs17817449 0.99 5467 TCONS_00024420 FTO
61 16q12.2 5 rs62033399 rs17817449 0.99 -2424 TCONS_00024420 FTO
61 16q12.2 5 rs62033400 rs17817449 1 -1579 TCONS_00024420 FTO
61 16q12.2 5 rs62048402 rs17817449 0.93 -10144 TCONS_00024420 FTO
61 16q12.2 5 rs7187250 rs17817449 0.99 -2821 TCONS_00024420 FTO
61 16q12.2 5 rs7193144 rs17817449 0.99 -2681 TCONS_00024420 FTO
61 16q12.2 5 rs8043757 rs17817449 0.99 83 TCONS_00024420 FTO
61 16q12.2 5 rs8050136 rs17817449 0.99 2908 TCONS_00024420 FTO
61 16q12.2 5 rs8051591 rs17817449 1 3385 TCONS_00024420 FTO
61 16q12.2 5 rs8055197 rs17817449 0.62 -10211 TCONS_00024420 FTO
61 16q12.2 5 rs8057044 rs17817449 0.73 -753 TCONS_00024420 FTO
61 16q12.2 5 rs8063057 rs17817449 1 -934 TCONS_00024420 FTO
61 16q12.2 5 rs9923147 rs17817449 0.9 -11818 TCONS_00024420 FTO
61 16q12.2 5 rs9923233 rs17817449 0.99 5831 TCONS_00024420 FTO
61 16q12.2 5 rs9923312 rs17817449 0.99 6000 TCONS_00024420 FTO
61 16q12.2 5 rs9923544 rs17817449 0.9 -11382 TCONS_00024420 FTO
61 16q12.2 5 rs9931900 rs17817449 0.95 5446 TCONS_00024420 FTO
61 16q12.2 5 rs9933509 rs17817449 0.96 4800 TCONS_00024420 FTO
61 16q12.2 5 rs9935401 rs17817449 1 3471 TCONS_00024420 FTO
61 16q12.2 5 rs9936385 rs17817449 0.99 5802 TCONS_00024420 FTO
61 16q12.2 5 rs9972653 rs17817449 0.99 996 TCONS_00024420 FTO
56 14q24.1 5 rs1555945 rs2588809 0.67 -35630 TCONS_00022775 RAD51B
56 14q24.1 5 rs2588827 rs2588809 0.64 -45987 TCONS_00022775 RAD51B
56 14q24.1 5 rs2588828 rs2588809 0.71 -46947 TCONS_00022775 RAD51B
56 14q24.1 5 rs2588831 rs2588809 0.59 -54206 TCONS_00022775 RAD51B
47 11q13.1 5 rs11227311 rs3903072 0.9 2191 TCONS_00019671 SNX32
18 5p12 5 rs4412123 rs4415084 0.83 213773 TCONS_00009394 MRPS30
18 5p12 5 rs7720551 rs4415084 0.97 1962 TCONS_00009648 MRPS30
67 19p13.11 5 rs11670392 rs4808801 0.51 36639 TCONS_00026930 ELL
14 3p24.1 5 rs55676236 rs4973768 0.57 -24415 TCONS_00005969 NEK10
71 22q13.1 5 rs17001997 rs6001930 0.87 28838 TCONS_00029427 MKL1
71 22q13.1 5 rs56108505 rs6001930 0.73 148416 TCONS_00029427 MKL1
71 22q13.1 5 rs718193 rs6001930 0.71 147830 TCONS_00029427 MKL1
71 22q13.1 5 rs73167069 rs6001930 0.86 2019 TCONS_00029427 SGSM3
71 22q13.1 5 rs73167097 rs6001930 1 19472 TCONS_00029427 SGSM3
71 22q13.1 5 rs73167098 rs6001930 1 19576 TCONS_00029427 SGSM3
3 1p36.22 5 rs2480785 rs616488 0.8 3043 TCONS_00000152 PEX14
3 1p36.22 5 rs596537 rs616488 0.71 -11506 TCONS_00000152 PEX14
3 1p36.22 5 rs597438 rs616488 0.71 -11710 TCONS_00000152 PEX14
3 1p36.22 5 rs620405 rs616488 0.71 -11421 TCONS_00000152 PEX14
3 1p36.22 5 rs622623 rs616488 0.71 -10958 TCONS_00000152 PEX14
3 1p36.22 5 rs660725 rs616488 0.71 -8671 TCONS_00000152 PEX14
3 1p36.22 5 rs662064 rs616488 0.7 -8964 TCONS_00000152 PEX14
15 3p26.1 5 rs2306881 rs6762644 0.66 11436 TCONS_00005727 EGOT
42 10q22.3 5 rs4980021 rs704010 0.54 -6586 TCONS_00018541 ZMIZ1
42 10q22.3 5 rs704006 rs704010 0.61 1826 TCONS_00018541 ZMIZ1
42 10q22.3 5 rs704007 rs704010 0.61 1679 TCONS_00018541 ZMIZ1
43 10q25.2 5 rs10885402 rs7904519 0.97 -12230 TCONS_00018339 TCF7L2
43 10q25.2 5 rs10885409 rs7904519 0.91 34145 TCONS_00018339 TCF7L2
43 10q25.2 5 rs11196180 rs7904519 0.62 -25898 TCONS_00018339 TCF7L2
43 10q25.2 5 rs11196205 rs7904519 0.91 33120 TCONS_00018339 TCF7L2
43 10q25.2 5 rs35936842 rs7904519 0.65 44845 TCONS_00018339 TCF7L2
43 10q25.2 5 rs4074718 rs7904519 0.89 -25310 TCONS_00018339 TCF7L2
43 10q25.2 5 rs4074720 rs7904519 0.89 -25430 TCONS_00018339 TCF7L2
43 10q25.2 5 rs7071302 rs7904519 0.91 43600 TCONS_00018339 TCF7L2
37 9q31.2 5 rs500708 rs865686 0.65 36629 TCONS_00015868 KLF4
37 9q31.2 5 rs630611 rs865686 0.84 36929 TCONS_00015868 KLF4
37 9q31.2 5 rs837993 rs865686 0.65 36289 TCONS_00015868 KLF4
37 9q31.2 5 rs837994 rs865686 0.84 38191 TCONS_00015868 KLF4
30 8p12 5 rs10087058 rs9693444 0.91 4926 TCONS_00014555 C8orf75
30 8p12 5 rs12155535 rs9693444 0.87 13897 TCONS_00014555 C8orf75
30 8p12 5 rs12155815 rs9693444 0.87 13920 TCONS_00014555 C8orf75
30 8p12 5 rs12546747 rs9693444 0.94 17723 TCONS_00014555 C8orf75
30 8p12 5 rs4732988 rs9693444 0.68 18168 TCONS_00014555 C8orf75
30 8p12 5 rs6558137 rs9693444 0.93 18034 TCONS_00014555 C8orf75
30 8p12 5 rs9314380 rs9693444 0.9 4777 TCONS_00014555 C8orf75
57 14q24.1 5 rs17755925 rs999737 0.81 -22736 TCONS_00022775 ZFP36L1
57 14q24.1 5 rs17828955 rs999737 0.82 -18037 TCONS_00022775 ZFP36L1
Table 3-7. Top 18 TF motifs for high LD SNPs in enhancers
motif_name number.of.SNPs.within.each.TF.motif's.RE
TAL1 28
Eomes 26
Foxp1 24
SP1 20
Pou2f1 19
MEF2A 18
TBX5 18
E2A 13
EWSR1-FLI1 13
AR 12
CRX 12
Egr1 11
Nkx2.5 11
NKX3-1 11
HNF1A 10
IRF1 10
Olig2 10
Zfx 10
Table 3-8. eQTL analyses of enhancer SNPs
SNP target.gene cell.type interaction reference
rs677029 BANF1 Monocytes cis (Zeller et al., 2010)
rs3760983 FLJ30469 Cortex cis (Myers et al., 2007)
rs3760983 KCNN4 Monocytes cis (Zeller et al., 2010)
rs10421887 KCNN4 Monocytes cis (Zeller et al., 2010)
rs4951409 MDM4 Monocytes cis (Zeller et al., 2010)
rs6679717 MDM4 Monocytes, Lymphoblastoid cis (Veyrieras et al., 2008; Zeller et al., 2010)
rs2628305 COX11 Monocytes cis (Zeller et al., 2010)
rs3786956 FLJ30469 Cortex cis (Myers et al., 2007)
rs3786956 KCNN4 Monocytes cis (Zeller et al., 2010)
rs4252725 MDM4 Monocytes, Lymphoblastoid cis (Veyrieras et al., 2008; Zeller et al., 2010)
rs7532236 MDM4 Monocytes cis (Zeller et al., 2010)
rs1469412 UBA52 Cortex cis (Myers et al., 2007)
rs1469412 ELL Monocytes cis (Zeller et al., 2010)
rs2369244 MDM4 Monocytes, Lymphoblastoid cis (Veyrieras et al., 2008; Zeller et al., 2010)
rs4951393 MDM4 Monocytes cis (Zeller et al., 2010)
rs10442 SSBP4 Lymphoblastoid cis (Veyrieras et al., 2008)
rs11960484 MGC33648 Monocytes cis (Zeller et al., 2010)
rs252905 MGC33648 Monocytes cis (Zeller et al., 2010)
rs7740686 C6orf97 Monocytes cis (Zeller et al., 2010)
rs9904377 COX11 Monocytes cis (Zeller et al., 2010)
Table 3-9. Breast Cancer Enhancer (BCE) regions used for luciferase assays.
risk.region.# chr #.of.biofeature corr.snp.id bio.feature index.snp.id R² nearest.lincRNA.ID nearest.TSS.Gene.Symbol clone.name
6 2p24.1 6 rs12710697 Enhancer_HMEC_HMMEnhancer rs12710696 0.74 TCONS_00004704 MIR4757 BCE1
5 rs10206829 Enhancer_HMEC_H3K27Ac rs12710696 0.74
5 rs6731836 Enhancer_HMEC_H3K9Ac rs12710696 1.00
5 rs6741887 Enhancer_HMEC_H3K4me1 rs12710696 0.91
5 rs4425044 Enhancer_HMEC_H3K4me2 rs12710696 0.74
4 rs4322801 Enhancer_HMEC_H3K9Ac rs12710696 0.74
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_FAIRE
6 2p24.1 8 rs4233763 Enhancer_HMEC_Uw_Rep2_DNaseI rs12710696 0.74 TCONS_00004704 MIR4757 BCE2
8 rs4233764 Enhancer_HMEC_HMMEnhancer rs12710696 0.74
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_H3K4me2
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_Uw_Rep1_DNaseI
Enhancer_HMEC_H3K27Ac
Enhancer_HMEC_FAIRE
14 3p24.1 5 rs55676236 Enhancer_HMEC_HMMEnhancer rs4973768 0.57 TCONS_00005969 NEK10 BCE3
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_H3K4me2
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_H3K27Ac
18 5p12 5 rs4412123 Enhancer_HMEC_H3K4me1 rs4415084 0.83 TCONS_00009394 MRPS30 BCE4
Enhancer_HMEC_H3K27Ac
Enhancer_HMEC_H3K4me2
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_HMMEnhancer
34 8q24.21 7 rs28759353 Enhancer_HMEC_H3K27Ac rs13281615 0.63 TCONS_00015171 POU5F1B BCE5
6 rs4871782 Enhancer_HMEC_H3K4me1 rs13281615 0.92
5 rs10087810 Enhancer_HMEC_H3K4me2 rs13281615 0.82
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_HMMEnhancer
Enhancer_HMEC_FAIRE
Enhancer_HMEC_Uw_Rep1_DNaseI
42 10q22.3 5 rs4980021 Enhancer_HMEC_H3K4me2 rs704010 0.54 TCONS_00018541 ZMIZ1 BCE6
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_Duke_DNaseI
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_HMMEnhancer
50 12p11 5 rs813722 Enhancer_HMEC_H3K4me2 rs10771399 0.74 TCONS_00020365 PTHLH BCE7
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_H3K27Ac
Enhancer_HMEC_HMMEnhancer
50 12p11 7 rs788463 Enhancer_HMEC_Uw_Rep2_DNaseI rs10771399 0.93 TCONS_00020365 PTHLH BCE8
Enhancer_HMEC_Uw_Rep1_DNaseI
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_H3K27Ac
Enhancer_HMEC_H3K4me2
Enhancer_HMEC_HMMEnhancer
Enhancer_HMEC_FAIRE
57 14q24.1 7 rs17828907 Enhancer_HMEC_H3K4me2 rs999737 0.82 TCONS_00022775 ZFP36L1 BCE9
Enhancer_HMEC_H3K9Ac
Enhancer_HMEC_FAIRE
Enhancer_HMEC_H3K27Ac
Enhancer_HMEC_HMMEnhancer
Enhancer_HMEC_H3K4me1
Enhancer_HMEC_Uw_Rep1_DNaseI
61 16q12.2 5 rs11642015 Enhancer_HMEC_H3K9Ac rs17817449 0.93 TCONS_00024420 FTO BCE10
5 rs1558901 Enhancer_HMEC_H3K27Ac rs17817449 0.90
5 rs1558902 Enhancer_HMEC_HMMEnhancer rs17817449 0.93
5 rs62048402 Enhancer_HMEC_H3K4me1 rs17817449 0.93
5 rs8055197 Enhancer_HMEC_H3K4me2 rs17817449 0.62
61 16q12.2 6 rs3751812 Enhancer_HMEC_H3K4me1 rs17817449 0.99 TCONS_00024420 FTO BCE11
5 rs3751813 Enhancer_HMEC_H3K9Ac rs17817449 0.57
5 rs11075987 Enhancer_HMEC_H3K27Ac rs17817449 0.65
5 rs3751814 Enhancer_HMEC_HMMEnhancer rs17817449 0.99
5 rs56313538 Enhancer_HMEC_H3K4me2 rs17817449 0.99
5 rs9931900 Enhancer_HMEC_FAIRE rs17817449 0.95
5 rs9933509 rs17817449 0.96
Table 3-10. Oligonucleotide sequences used for BCE cloning.
Region.name Forward.primer.(5'-3') Reverse.primer.(5'-3') Cloned.region
BCE1 TTGGGATATTGGAGGAGCTG TGAATGCTGCACGACTTACC chr2:19319749-19322150
BCE2 AAAGTCCTGTCACCCCACAG CCATCAGATCCAACCCATTT chr2:19323777-19324820
BCE3 TTTTCAAGACAATTAATAAGCCCATAA TTGCAATCTGTTGTGGGACA chr3:27391058-27392178
BCE4 GATACTGGCAAAAGCCCTGA CCTCTCAAATTGTGAAGATTGG chr5:44875448-44876553
BCE5 GGGGCATAGAATCAGTGGAGGG CTGGGGGTGAGGAAGCTAACTC chr8:128351899-128353121
BCE6 CTCAGGCCTTACCGATACCA TCAAGGTCCTGAGCCAGTCT chr10:80833975-80835085
BCE7 AACTCATTTTCTCAGTACCTCAGTCA TGGGGAAAATTCAGAGTCCA chr12:28151557-28152263
BCE8 CATCTCTAAAAATGTCTTGGCTACC TGGTGTTACCCAAATCCAAAA chr12:28152319-28152861
BCE9 ACCAGACAGCCTCTCAGGAA ATGGAACCTGGTGCTCAAAG chr14:69010612-69011886
BCE10 CTTTGGAGCCTGGCTTTATG TTTTCCCCTTCTTGCT chr16:53801475-53803839
BCE11 TGGTTGATCATCACCTCACC GCTGAGTGAGGTCCAAATGC chr16:53817841-53818941
CT1 GGGGTACCCCAAGTGGAACCAACTGACA GGGGTACCGGCCAAAAGAAAATGGCATA chr8:128497870-128499560
CT2 GGGGTACCGCATGCATTAGGGGAGAAAA GGGGTACCGTAGCTCACAGCCGAGATCC chr8:128512424-128514005
Figure 3-4. An example of an enhancer SNP: rs76969790 likely alters a TAL1 response element. A genome browser view of the enhancer SNP rs76969790 is shown. The first track shows FunciSNP results for enhancers. The name of each correlated SNP (rs number – r² value) is shown and color-coded to indicate the number of biofeatures (Fig. S2B). The index SNP is shown in black. The lower tracks are biofeature tracks, RefSeq genes/mRNA/pseudogene tracks from UCSC Genes, common SNPs (version 137), and linkage disequilibrium (LD) blocks, with LD measured by r² in phased CEU. Allele frequencies of rs76969790 and the location of this SNP in a TAL1 RE are shown.
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One
[Figure 3-4 image: tracks for FunciSNP Enhancer; HMEC H3K4me2, H3K4me3, H3K9Ac, FAIRE, DNaseI, H3K4me1 and H3K27Ac; common SNPs (137); linkage disequilibrium (phased CEU r²); and UCSC Genes (PDE4D), over chr5:58,350,000-58,450,000 (hg19, 5q11.2). Inset: rs76969790 (r² = 0.8878 with index SNP rs1353747) within a TAL1 RE; allele frequencies T 94.385%, G 5.615%.]
Figure 3-5. Nine novel enhancers containing high LD SNPs were identified in breast epithelial cells. Eleven enhancer regions, each containing FunciSNP-identified BCa high LD SNPs in epigenetically defined enhancers, were cloned and analyzed using dual luciferase assays in MCF7 (blue), MDAMB231 (red), MCF10A (orange) and HMEC (blue) cells. Each luciferase activity was divided by the average luciferase activity of the two negative controls, CT1 and CT2. The average value of the two negative controls is shown as a horizontal line (gray) across the breast cancer enhancers (BCEs).
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee, G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS One
3.3 Discussion
3.3.1 Most TFs that likely bind to high LD SNPs at enhancers are involved in BCa tumorigenesis
We found 1,005 potentially functional high LD SNPs across the 71 risk loci, the majority of which were located in enhancer regions. By performing in silico searches of transcription factor REs, we further identified the top 18 motifs (REs) affected by 10 or more enhancer SNPs (Table 3-7). Interestingly, most of these 18 transcription factors directly interact with each other (defined as protein-protein and/or protein-to-specific-gene-promoter interactions), giving 34 direct interactions among 15 proteins (Fig. 3-6A). Additionally, most of these transcription factors are linked to breast cancer tumorigenesis: we identified 51 direct interactions (defined as above) between the group containing the top 18 transcription factors and another group containing 52 known BCa tumorigenesis-related molecules (Fig. 3-6B). The number of relationships increases if indirect relationships between the groups and interactions within each group are included.
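The within-group and between-group counts above (e.g., 34 direct interactions among 15 of the 18 TFs, and 51 interactions between the TF group and the BCa tumorigenesis group) can be sketched as edge counting on an interaction network. This is a hypothetical illustration, not the IPA procedure: it treats interactions as undirected, deduplicates them, assumes the two groups are disjoint, and uses placeholder node names rather than real interaction data.

```python
def count_interactions(edges, group_a, group_b):
    """Count direct interactions within group_a and between group_a and group_b.

    edges: iterable of (x, y) pairs, treated as undirected and deduplicated.
    group_a, group_b: disjoint sets of node names (assumed).
    Returns (within_a, between, n_connected_a), where n_connected_a is the
    number of group_a members with at least one within-group interaction.
    """
    pairs = {frozenset(e) for e in edges if e[0] != e[1]}
    within = [p for p in pairs if p <= group_a]
    between = [p for p in pairs if len(p & group_a) == 1 and len(p & group_b) == 1]
    connected = set().union(*within)
    return len(within), len(between), len(connected)


# Placeholder TFs (TF1-TF4) and tumorigenesis molecules (M1-M2).
tfs = {"TF1", "TF2", "TF3", "TF4"}
bca = {"M1", "M2"}
edges = [("TF1", "TF2"), ("TF2", "TF1"), ("TF2", "TF3"),
         ("TF1", "M1"), ("M1", "M2"), ("TF3", "M2")]
print(count_interactions(edges, tfs, bca))
```

Here the duplicated TF1-TF2 edge is counted once, and the M1-M2 edge is ignored because it lies entirely within the second group, matching the paragraph's note that within-group interactions of that group were not included in the count of 51.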
We further highlight 32 genes that contain functional SNPs either in their exons or within their TSS regions. Using these 32 genes plus BRCA2, in which the nonsense index SNP rs11571833 resides, we performed an Ingenuity Pathway Analysis (IPA, www.ingenuity.com) to examine interactions among these genes and/or their protein products based on published data. We found only one direct interaction and two indirect interactions among the 33 genes (Albers et al., 2005; Eswarakumar et al., 2002; Lemonnier et al., 2001). We also analyzed the relationships between the 18 TFs (see above) and the proteins encoded by the 33 genes. Although some of these genes are understudied and currently lack information about their functions and locations, we observed that a number of the proteins interact with each other: 12 direct and 2 indirect relationships (Fig. 3-6C). Moreover, these TFs mediate interactions among the 33 BCa risk genes/proteins. For instance, SP1 binds to the promoter (-329bp to 324bp) of the FOSL1 gene, and SP1 also binds directly to another BCa risk protein, BRCA2 (Adiseshaiah et al., 2003; Tapias et al., 2008). BRCA2 binds to several fragments of the AR protein (amino acids 1-556 and 627-919) (Shin and Verma, 2003). In turn, AR binds to RANBP9 and CASP8 (Qi et al., 2007; Rao et al., 2002; Wellington et al., 1998). In prostate cancer cell lines, RANBP9 increases AR activity, and increased CASP8 protein levels enhance cleavage of AR protein (Evert et al., 2000; Rao et al., 2002; Tarlac and Storey, 2003; Wellington et al., 1998). Together, these relationships indicate that functional networks of the identified genes exist and may affect BCa risk.
Recently, Cowper-Sal Lari et al. (Cowper-Sal Lari et al., 2012) reported that FOXA1 binds to high LD SNPs in BCa more frequently than other transcription factors do. However, that study examined a limited number of transcription factors (ChIP-seq data for only 16 transcription factors in ER positive breast cancer cells) and a relatively small number of SNPs (obtained from the limited HapMap datasets) at only 44 BCa risk loci. For an updated, more comprehensive and unbiased analysis, we assessed high LD SNPs in TF REs within TSS regions and enhancers using the 1000 Genomes database, which contains not only rare variants but also SNPs not tagged by the HapMap project (2010; Altshuler et al., 2010; Reich et al., 2001). We further interrogated thousands of TF motifs in known datasets (Grant et al., 2011; Heinz et al., 2010; Matys et al., 2006; Ward and Kellis, 2012). Our top potentially affected TFs were SP1 at TSS regions and TAL1 at enhancers. SNPs in FOXA1 REs ranked only 51st in our priority list. The difference between the Cowper-Sal Lari et al. study and the work reported here is likely due to the limited numbers of TFs tested and SNPs assessed in that study.
Recently, it was reported that the average number of distal elements interacting with a TSS is 3.9, and the average number of TSSs interacting with a distal element is 2.5 (Ecker et al., 2012). Another study of genome structure revealed that active chromatin regions form inter-chromosomal contacts, with blocks of each chromosome interacting with blocks on other chromosomes to form a spatial nuclear structure (Kalhor et al., 2012). Therefore, a large number of chromosomal contacts and interactions are likely orchestrated by the three-dimensional organization of the nucleus. To find potential targets of enhancer regulatory elements beyond the eQTL analyses performed here, three approaches can be suggested for future study. First, looping interactions between enhancers and target genes can be detected by 3C (chromosome conformation capture) assays. To scan these interactions genome-wide, 3C-seq, 4C-seq, and 5C-seq may be performed (Ecker et al., 2012). ChIA-PET and Hi-C may also be employed to identify looping interactions, although the resolution of these methods is too low to capture precise contacts (Li et al., 2012; van Berkum et al., 2010). Second, transgenic mouse models with knocked-in conserved regulatory elements may be used to identify target genes and their activity (Ting et al., 2012). Lastly, targets of regulatory elements can be identified in vivo and in vitro by DNA knock-out methods such as transcription activator-like effector nucleases (TALENs) (Bedell et al., 2012).
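The two averages quoted from Ecker et al. (2012) describe the same set of enhancer-promoter contacts viewed from either side, which is why they differ. A minimal sketch on hypothetical contact pairs makes this concrete; the element IDs are invented, and the sketch says nothing about how contacts were actually called in that study.

```python
from collections import defaultdict
from statistics import mean


def mean_partners(contacts):
    """Mean interacting partners per TSS and per distal element.

    contacts: iterable of (distal_element_id, tss_id) interaction pairs.
    Only elements with at least one contact enter each average.
    """
    by_tss, by_distal = defaultdict(set), defaultdict(set)
    for distal, tss in contacts:
        by_tss[tss].add(distal)      # distal elements contacting each TSS
        by_distal[distal].add(tss)   # TSSs contacted by each distal element
    return (mean(len(v) for v in by_tss.values()),
            mean(len(v) for v in by_distal.values()))


# Hypothetical contacts: 5 pairs among 3 distal elements and 2 TSSs.
contacts = [("E1", "T1"), ("E2", "T1"), ("E3", "T1"), ("E1", "T2"), ("E2", "T2")]
print(mean_partners(contacts))
```

Both averages share the same number of contacts as numerator; they differ only in whether it is divided by the number of TSSs or by the number of distal elements.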
Newly identified regulatory elements coinciding with high LD SNPs do not necessarily target protein-coding genes; for instance, they can interact with long noncoding RNAs (lncRNAs) (Orom et al., 2010). We therefore further annotated each SNP identified by FunciSNP (Coetzee et al., 2012) for proximity to the nearest known lncRNA (Tables 3-2 and 3-6).
Figure 3-6. Most TFs that likely bind to high LD SNPs at enhancers are involved in BCa tumorigenesis. (A) Most of the TFs for the top 18 motifs affected by 10 or more high LD enhancer SNPs directly interact with each other. Each yellow circle represents a TF. The top TF is TAL1, followed by EOMES; the remaining TFs are arranged clockwise by rank. (B) Direct interactions between the group containing the top 18 TFs (yellow circles) and another group containing 52 known BCa tumorigenesis-related molecules (red squares) are shown. (C) The 32 proteins encoded by genes that contain functional SNPs either in their exons or within their TSS regions, plus BRCA2, in which a nonsense index SNP resides, are laid out using subcellular localization annotation. Each molecule is shown as a green hexagon. Interactions among these 33 genes/proteins are shown as green arrows (direct: solid line; indirect: dashed line). Interactions between the group containing the 33 genes/proteins and the group containing the top 18 TFs are shown as black arrows (direct: solid line; indirect: dashed line).
3.4 CONCLUSIONS
Since 2005, over 1,600 variants have been identified at p-value ≤ 5×10⁻⁸ for over 250 traits. Most of the index SNPs identified by GWASs are in noncoding DNA regions, making the assignment of functionality difficult (Altshuler et al., 2010). Despite the controversy surrounding the utility of GWAS, post-GWAS identification of mechanisms has become valuable for identifying the genomic targets of diseases. Here, we provide a functional rationale for 2 SNPs in exons, 42 SNPs in TSS regions and 503 SNPs in putative enhancers at 60 of the 71 BCa risk loci. These annotations are based on the assumption that functional alleles are common. This short list, drawn from more than 320,000 correlated risk SNPs, can guide follow-up fine-mapping and functional studies aimed at identifying disease-causing SNPs.
Chapter 4. One example: the BCE5 enhancer in 8q24
4.1 Introduction
4.1.1 8q24 risk loci
The 8q24 region falls in a gene desert: no annotated gene lies within 100 kb of several risk loci identified for many cancers (see below). This chromosomal region is of great interest because GWASs have identified risk SNPs in it not only for breast cancer but also for colorectal, prostate, and ovarian cancers (Beebe-Dimmer et al., 2008; Freedman et al., 2006; Ghoussaini et al., 2008; Gruber et al., 2007; Gudmundsson et al., 2007; Haiman et al., 2007; Meyer et al., 2009; Schumacher et al., 2007; Tomlinson et al., 2007; Yeager et al., 2007; Zanke et al., 2007). These risk SNPs fall into five distinct haplotype blocks in the 8q24 region: three blocks are associated with prostate cancer risk, one with a combination of prostate, colorectal and ovarian cancer risk, and one with breast cancer risk only (Ghoussaini et al., 2008). The proto-oncogene c-MYC is the closest gene and the most likely to be involved in risk, and functional studies suggest that these loci are enhancers that interact with the c-MYC promoter and thus likely regulate its expression level. The best-characterized example is rs6983267, a risk SNP for prostate and colorectal cancer (Jia et al., 2008; Pomerantz et al., 2009). However, the mechanisms of carcinogenesis at the other LD blocks of the 8q24 region, including the breast cancer block, are not known.
4.2 Results and Discussion
4.2.1 8q24 is an open chromatin region likely representing an enhancer nursery
By analyzing BCa risk regions in depth, I noticed that some loci tend to contain more nucleosome-depleted regions than other risk loci. I therefore hypothesized that open chromatin regions aggregate at certain regions in the genome. To test this hypothesis, we counted the FAIRE/DNaseI peaks in the 1 Mb window containing each risk index SNP at 12 breast cancer risk loci located >50 kb from the nearest gene, and divided each count by the total number of peaks in the genome. Interestingly, the 8q24.21 and 9q31.2 regions contained relatively more nucleosome-depleted regions in both normal and breast cancer cells, and we may view these loci as ‘open chromatin nursery regions’ (Fig. 4-1). Such open chromatin nursery regions may contain up to seven times more nucleosome-depleted regions than other breast cancer risk regions. Overall, the 8q24.21 region had the largest number of open chromatin regions in both normal and breast cancer cells, as well as the largest number of correlated SNPs coinciding with enhancer histone marks.
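The window-based peak count can be sketched as follows. This is a hypothetical illustration: it assumes a 1 Mb window centred on each index SNP, peaks given as (chrom, start, end) tuples, and that a peak counts if it overlaps the window at all; the toy peaks and locus names are invented.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]   # (chromosome, start, end)
Position = Tuple[str, int]    # (chromosome, position)


def window_peak_fraction(snp: Position, peaks: List[Peak],
                         window: int = 1_000_000) -> float:
    """Fraction of all genome-wide peaks that overlap a window centred on the SNP."""
    chrom, pos = snp
    lo, hi = pos - window // 2, pos + window // 2
    n_in_window = sum(1 for c, s, e in peaks if c == chrom and s <= hi and e >= lo)
    return n_in_window / len(peaks)


def rank_loci(snps: Dict[str, Position], peaks: List[Peak]) -> List[Tuple[str, float]]:
    """Rank index SNPs by the share of genome-wide peaks falling in their window."""
    fractions = {name: window_peak_fraction(pos, peaks) for name, pos in snps.items()}
    return sorted(fractions.items(), key=lambda kv: kv[1], reverse=True)


toy_peaks = [("chr8", 100, 200), ("chr8", 400_000, 400_500),
             ("chr8", 450_000, 450_100), ("chr9", 5_000_000, 5_000_200)]
toy_snps = {"locus_A": ("chr8", 300_000), "locus_B": ("chr9", 2_000_000)}
print(rank_loci(toy_snps, toy_peaks))  # [('locus_A', 0.75), ('locus_B', 0.0)]
```

Dividing by the genome-wide total makes loci comparable across cell types with different total peak counts; a fold difference such as "seven times more" would then be the ratio of two such fractions.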
Figure 4-1. Breast cancer risk correlated SNPs were found in selected open chromatin nursery regions in the genome. (A) The number of open chromatin regions in a 1 Mb window around each of twelve breast cancer risk index SNPs located in intergenic regions (>50 kb from the nearest gene), relative to the number of open chromatin regions in the whole genome, was analyzed using the Galaxy "overlapping pieces of intervals" tools, and open chromatin region nurseries were identified. (B, C) Breast cancer risk index SNPs from GWAS (black circles) and breast cancer risk correlated SNPs coinciding with enhancer histone marks (light blue circles: correlated SNPs with r² ≤ 0.5; dark blue circles: correlated SNPs with r² > 0.5) were annotated on chromosome 8 (B) and 9 (C) along with FAIRE signals (red: MDAMB231 significant peaks (p-value < 10⁻⁹); orange: MDAMB231 subpeaks, which include all FAIRE peaks identified by findPeaks from HOMER; dark green: HMEC significant peaks (p-value < 10⁻⁹); light green: HMEC subpeaks, which include all FAIRE peaks identified by findPeaks from HOMER).
4.2.2 BCE5 is defined as a biologically functional enhancer in breast cells
Among the 12 regions with enhancer activity identified in chapter 3, we decided to investigate BCE5 in detail (Fig. 3-5). BCE5 is located at 8q24, which contains an ‘open chromatin nursery region’ (see above) as well as the largest number of correlated SNPs found within biofeatures (see chapter 3). Additionally, no transcript was detected in breast epithelial cells near the 8q24 breast cancer risk region (Jia et al., 2009) (Fig. 4-2). First, we characterized the chromatin biofeatures of the 12 correlated SNPs identified by FunciSNP in the 8q24.21 region (Table 3-5). Near these 12 correlated SNPs, two large regions enriched for H3K4me1 and H3K27Ac were evident. To identify possible enhancers among them, we cloned 5 additional regions carrying enhancer histone marks, designated HKM1 through 5 (Fig. 4-3A). We then measured each region's potential enhancer activity with in vitro dual luciferase assays in HMEC and MDAMB231 cells, as before (Fig. 4-3B, C). Unlike the BCE5 region, the HKM1 through 5 regions showed no significant enhancer activity. We also examined the haplotypes of the 12 correlated SNPs in the 8q24 region (Fig. 4-3D, E). Although some SNPs in the HKM1 through 5 regions were highly correlated with the risk index SNP, rs13281615, these SNPs were not located in potential enhancers as judged by chromatin features. Therefore, only the BCE5 region had in vitro enhancer activity, a FAIRE peak at its center, and correlated risk SNPs.
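The normalization described for Fig. 4-3 (each luciferase activity divided by the average activity of the negative control, CT1) can be sketched as below. We assume, as is usual for dual luciferase assays, that each replicate is first expressed as a firefly/Renilla ratio; the construct names follow the text, but every number is invented and the function name is hypothetical.

```python
from statistics import mean
from typing import Dict, List, Tuple

Reading = Tuple[float, float]  # (firefly, Renilla) luminescence for one replicate


def relative_activity(readings: Dict[str, List[Reading]],
                      control: str = "CT1") -> Dict[str, List[float]]:
    """Firefly/Renilla ratio per replicate, divided by the mean ratio of the control."""
    ratios = {name: [f / r for f, r in reps] for name, reps in readings.items()}
    baseline = mean(ratios[control])
    return {name: [x / baseline for x in vals] for name, vals in ratios.items()}


# Invented luminescence values for illustration only.
toy = {
    "CT1": [(100.0, 50.0), (120.0, 60.0)],
    "BCE5": [(800.0, 50.0), (900.0, 60.0)],
    "HKM1": [(110.0, 55.0), (90.0, 45.0)],
}
for name, vals in relative_activity(toy).items():
    print(name, mean(vals))
```

Normalizing to the Renilla co-transfection control first corrects for well-to-well differences in transfection efficiency, so that dividing by CT1 compares constructs rather than wells.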
Next, we performed site-specific FAIRE-qPCR with 5 different primer sets in and around the BCE5 region to map the region of nucleosome depletion (Fig. 4-4, Table 4-1). Of the 5 primer sets (BCE5-1 through -5), the BCE5-3 primer set gave clearly the highest FAIRE signal. This result narrowed down the location of the active regulatory element to chr8:128352000-128352800 within BCE5.
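Percent-input values such as those plotted in Fig. 4-4 are commonly derived from qPCR threshold cycles by the percent-input method. The sketch below assumes that approach (near 100% amplification efficiency, and an input sample representing a known fraction of the starting material); the function name, the 10% input fraction and the Ct values are hypothetical and are not taken from the experiments above.

```python
import math
from typing import Dict, Tuple


def percent_input(ct_faire: float, ct_input: float, input_fraction: float = 0.1) -> float:
    """Percent of input recovered in FAIRE DNA, assuming ~100% PCR efficiency.

    The input Ct is first adjusted to represent 100% of the starting chromatin:
    Ct_adj = Ct_input - log2(1 / input_fraction).
    """
    ct_input_adj = ct_input - math.log2(1 / input_fraction)
    return 100 * 2 ** (ct_input_adj - ct_faire)


# Hypothetical Ct values (FAIRE, input) for three of the BCE5 primer sets.
toy_cts: Dict[str, Tuple[float, float]] = {
    "BCE5-1": (30.0, 25.0),
    "BCE5-3": (27.0, 25.0),
    "BCE5-5": (29.5, 25.0),
}
signals = {primer: percent_input(*cts) for primer, cts in toy_cts.items()}
peak_primer = max(signals, key=signals.get)
print(peak_primer, round(signals[peak_primer], 2))  # BCE5-3 2.5
```

Because each amplicon is compared with its own input, differences in primer efficiency between amplicons largely cancel, which is what allows the primer sets to be ranked against each other.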
Figure 4-2. Transcript levels near the BCE5 enhancer in the 8q24 risk region. Four SNPs lie near the BCE5 enhancer: rs4871782, rs28759353, rs57549811 and rs10087810. RNA transcript levels near the BCE5 enhancer were measured in HMEC (green) and MDAMB231 (red) cells using 4 different primer sets flanking the SNP locations. A GAPDH mRNA primer set was used as a positive control.
Figure 4-3. Enhancer assays in the 8q24 region. (A) UCSC Genome Browser screenshot of the HKM1 through 5 regions, with various chromatin marks, at the 8q24 breast cancer risk locus. FunciSNP results: breast cancer risk correlated SNPs identified by FunciSNP (red) and the risk index SNP (black). Correlated SNPs with 5 or more biofeatures are in bold. Five regions (HKM1 through 5) that include FunciSNP-identified breast cancer risk correlated SNPs in 8q24 were cloned to perform dual luciferase assays in HMEC (B) and MDAMB231 cells (C). Each luciferase activity was divided by the average luciferase activity of the negative control, CT1. (D, E) Linkage disequilibrium (LD) plot and haplotypes of the twelve FunciSNP-identified breast cancer risk correlated SNPs with the breast cancer risk index SNP, rs13281615.
Figure 4-4. Site-specific FAIRE assays near BCE5 region. Site-specific FAIRE
assays were performed near BCE5 enhancer region using 5 different primer sets (A)
in HMEC (B), MDAMB231 (C), MCF10A cells (D) and MCF7 cells (E).
[Figure panels: (A) BCE5 region showing the SNP and the positions of primer sets BCE5-1 to BCE5-5; (B) HMEC, (C) MDAMB231, (D) MCF10A and (E) MCF7 FAIRE near BCE5 (y axis: % input; x axis: DNA region, BCE5-1 to BCE5-5).]
Table 4-1. Oligonucleotide sequences used for cloning and qPCR
HKM cloning Forward primers (5'-3') Reverse primers (5'-3') Cloned regions
HKM 1 TCCCCGCGGCTTTGTCCTTGGCTTTCCAG TCCCCGCGGTTCAGCTTGCTCTGCTCTCA chr8:128343429-128345758
HKM 2 TCCCCGCGGGGAAGCAGGGTGTATTGTGG TCCCCGCGGTATGCCCCTAGCTCCATCTG chr8:128350867-128351906
HKM 3 TCCCCGCGGCCCAGGCCTCAAGTTATCTG TCCCCGCGGTCTTGTCTCTCTTTAACCCATGC chr8:128353117-128354806
HKM 4 TCCCCGCGGCACTGGGAAAACCAAGGAGA TCCCCGCGGGTTAGTGGGCCAAATCCATC chr8:128354814-128356139
HKM 5 TCCCCGCGGTGGTGGTGTGTGCCTGTAGT TCCCCGCGGATCTTGCTGGATTCCCACAC chr8:128341914-128343467
8q24CT1 GGGGTACCCCAAGTGGAACCAACTGACA GGGGTACCGGCCAAAAGAAAATGGCATA chr8:128497870-128499560
8q24CT2 GGGGTACCGCATGCATTAGGGGAGAAAA GGGGTACCGTAGCTCACAGCCGAGATCC chr8:128512424-128514005
8q24CT3 TCCCCGCGGTGGTGGTGTGTGCCTGTAGT TCCCCGCGGATCTTGCTGGATTCCCACAC chr8:128341914-128343467
FAIRE-qPCR Forward primers (5'-3') Reverse primers (5'-3') Amplified regions
BCE5-1 TGAGATCATAAGGTAAAAGGGGACA TCTTGGTTTTTCTCATATTTTTCTGG chr8:128351124-128351272
BCE5-2 CCAATCTCCTGCAGAGGTGCT GACTTGAGCCCAGGTCACCA chr8:128351512-128351662
BCE5-3 TGCAAGTGGAATGTACTGAAATCAA TCTGGCCCTTTAGCTCTTTGG chr8:128352446-128352595
BCE5-4 TGAATAACCTGGGTCCCAAA CCCAGAAGAACTTGGCATAGAGG chr8:128352823-128353029
BCE5-5 TCACCAATCACAGGGGAAGC CACTGGAATTCACAATAAGACATGAA chr8:128353161-128353287
cDNA-qPCR Forward primers (5'-3') Reverse primers (5'-3')
GAPDH mRNA TGAAGGTCGGAGTCAACGG AGAGTTAAAAGCAGCCCTGGTG
rs4871782 TTCTCAGGAGCAAGCTCAGAC CACTTCTCTCACCTTCCAGCTT
rs28759353 GCTGGAAGGTGAGAGAAGTGA TCAGTACATTCCACTTGCATACAG
rs57549811 GTGAGTAGGTGGCAGAGGTG TGTCTGGCTTACTTCTTCCTCTTC
rs10087810 CGGACAGAGAGGAGAAGAGG TGAGGATAGGGTAGAGGTGAAGA
4.2.3 Differential nucleosome depletion and functionality of SNPs rs28759353 and rs10087810
FunciSNP analysis identified three correlated SNPs (r² > 0.5) within BCE5: rs4871782, rs28759353, and rs10087810 (Fig. 4-5A). To determine whether the alleles of these three SNPs participate in nucleosome displacement (as measured by FAIRE signals), we performed allele-specific FAIRE in an HMEC cell strain heterozygous for all three SNPs. In allele-specific FAIRE, the SNP region is sequenced in FAIRE DNA and in input DNA (as control), and the relative peak heights of the two alleles are compared. For rs4871782, the FAIRE sample contained about the same relative amounts of the two alleles as the input. In contrast, for rs28759353, the FAIRE sample contained clearly more of the G allele than the input. Similarly, for rs10087810, more of the T allele was detected in the FAIRE sample than in the input (Fig. 4-5B). Note the high fidelity of the sequencing reactions between the FAIRE and input samples, as reflected by the nearly identical relative sizes of the peaks surrounding the SNP. These results strongly indicate that the rs28759353 G allele and the rs10087810 T allele (i.e. the GT haplotype) have a more open chromatin structure than the other alleles, and perhaps consequently a higher enhancer activity, which we tested next (see below).
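One way to quantify the allele-specific FAIRE comparison is to divide the allele ratio in the FAIRE chromatogram by the allele ratio in the input chromatogram, so that a value near 1 means no allelic bias. This is a hypothetical quantification sketch; the comparison above was made on the sequencing traces, and the function name and peak heights below are invented.

```python
from typing import Tuple

PeakHeights = Tuple[float, float]  # (allele 1 peak height, allele 2 peak height)


def allelic_enrichment(faire: PeakHeights, input_dna: PeakHeights) -> float:
    """(FAIRE allele1/allele2) / (input allele1/allele2); 1.0 means no allelic bias."""
    faire_ratio = faire[0] / faire[1]
    input_ratio = input_dna[0] / input_dna[1]
    return faire_ratio / input_ratio


# Invented chromatogram peak heights at a heterozygous SNP.
print(allelic_enrichment((900.0, 300.0), (500.0, 500.0)))  # 3.0: allele 1 enriched
print(round(allelic_enrichment((450.0, 420.0), (500.0, 470.0)), 2))
```

Dividing by the input ratio helps cancel biases shared by both traces, such as dye-specific differences in peak height, which is why near-identical flanking peaks in the FAIRE and input traces are reassuring.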
We analyzed the haplotypes of the rs28759353, rs4871782 and rs10087810 SNPs relative to the risk index SNP, rs13281615 (Easton et al., 2007). The GGTA haplotype (Fig. 4-5C) is associated with lower breast cancer risk because it is correlated with the non-risk (A) allele of rs13281615. The other haplotypes and their relative frequencies in Caucasians are also shown. To relate the allele-specific FAIRE results to enhancer activity, we next performed allele-specific in vitro enhancer assays using plasmids containing the different alleles of each SNP in the BCE5 region (Fig. 4-5D, E). Overall, we found that the risk allele of each SNP independently had lower enhancer activity. Together with the allele-specific FAIRE data, these results indicate that rs28759353 and rs10087810 are functional SNPs, with the non-risk allele showing more nucleosome depletion and higher enhancer activity in the in vitro assay. Although we do not understand the disparity between the two assays for rs4871782, it is probably related to their differing sensitivity for this particular SNP, with allele-specific FAIRE being the less sensitive. Nevertheless, we can conclude that the risk alleles have lower enhancer activity. Because reduced enhancer activity would be expected to raise risk only if it lowers expression of a gene that restrains tumorigenesis, the risk alleles likely affect candidate tumor suppressor gene(s), and c-MYC is therefore unlikely to be the target gene of this enhancer (see below).
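The LD values (r²) underlying the haplotype analysis can be computed from phased two-locus haplotypes with the standard formula r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], where D = p_AB − p_A p_B. The Python sketch below implements that formula; the function name, haplotype counts and allele labels are invented and do not correspond to the SNPs above.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[str, str]], allele1: str, allele2: str) -> float:
    """r^2 between two biallelic SNPs from phased two-locus haplotypes.

    allele1 is the allele counted at the first SNP, allele2 at the second SNP.
    """
    n = len(haplotypes)
    p_ab = sum(1 for x, y in haplotypes if x == allele1 and y == allele2) / n
    p_a = sum(1 for x, _ in haplotypes if x == allele1) / n
    p_b = sum(1 for _, y in haplotypes if y == allele2) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("one of the SNPs is monomorphic in this sample")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


# Invented counts of 100 phased haplotypes (alleles A/a at SNP 1, B/b at SNP 2).
haps = [("A", "B")] * 40 + [("A", "b")] * 10 + [("a", "B")] * 5 + [("a", "b")] * 45
print(round(ld_r2(haps, "A", "B"), 3))  # 0.495
```

r² does not depend on which allele is chosen at each SNP, so swapping "A" for "a" (or "B" for "b") gives the same value.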
Figure 4-5. rs28759353 and rs10087810 at the BCE5 enhancer are functional SNPs for breast cancer risk. (A) Locations of the three SNPs (blue: rs4871782; green: rs28759353; black: rs10087810) at BCE5 and of the breast cancer risk index SNP (red: rs13281615) in the 8q24.21 region. (B) Allele-specific FAIRE assays performed near the three candidate breast cancer risk SNPs at BCE5. Sequencing results of input DNA and FAIRE DNA are shown; nucleotide colors are blue for C, green for A, black for G and red for T. Sequences near each SNP are shown as double-stranded DNA (bottom). (C) Linkage disequilibrium (LD) plot (r²) and haplotypes of the three SNPs with the breast cancer risk index SNP, rs13281615 (Easton et al., 2007). Allele-specific in vitro dual luciferase assays were performed in HMEC (D) and MDAMB231 cells (E). Analysis of variance (ANOVA) was used to confirm the difference, and two-sided p-values between alleles were calculated using Student's t-test.
4.2.4 BCE5 enhancer activity and its interaction with the c-MYC oncogene are
cell type specific.
An obvious question is which gene the BCE5 enhancer targets. Studies have shown that several 8q24 gene-desert regions loop to the oncogene c-MYC, located some hundreds of kb telomeric (Ahmadiyeh et al., 2010; Jia et al., 2009; Pomerantz et al., 2009). Alternatively, Meyer et al. reported that the 8q24 prostate cancer risk region acts as a prostate cancer specific enhancer interacting with the PVT1 gene (Meyer et al., 2011). To test the enhancer activity of the BCE5 region in several breast epithelial cell types, we performed dual luciferase (Fig. 3-5) and FAIRE assays (Fig. 4-4). Interestingly, in vitro enhancer activity of the BCE5 region was detected in all the breast epithelial cells tested, although its level differed among cell types. In contrast, the level of nucleosome depletion at the BCE5 enhancer region differed markedly among breast epithelial cells. The BCE5 region is nucleosome depleted and surrounded by enhancer chromatin marks in normal breast epithelial cells (HMEC and MCF10A) and TN breast cancer cells (MDAMB231), but not in ER positive breast cancer cells (MCF7) (Fig. 4-4).
It has been reported that c-MYC interacts with the breast cancer risk region in 8q24 in ER positive breast cancer cells (MCF7), but not in normal breast epithelial cells (MCF10A) (Fig. 4-6) (Ahmadiyeh et al., 2010). The SNP alleles associated with lower BCa risk (i.e. relatively protective) at the BCE5 locus are G for rs28759353, T for rs10087810 and A for rs13281615. This haplotype also had higher enhancer activity in our in vitro assays and was more nucleosome depleted (Fig. 4-5). The alleles at the two functional SNPs, rs28759353 and rs10087810, may alter BCE5 enhancer activity, and hence the expression level of the target gene, by changing transcription factor binding affinity (e.g. GATA3, FOXP1, NANOG, EWSR1-FLI1, NKX2). We therefore suggest that a tumor suppressor gene, rather than c-MYC, is the most likely target of the BCE5 enhancer in normal breast epithelial cells and TN breast cancer cells, but not in ER positive breast cancer cells.
Figure 4-6. 8q24 region and 3C interaction frequency of risk loci with MYC. x
axis: genomic position (not drawn to scale); y axis: 3C interaction frequency ±1 SEM
of the constant fragment with each of the target fragments including MYC,
normalized to a 3C interaction within a housekeeping gene, FAM32A. (A) Schematic
diagram of the 8q24 risk loci. The closest genes as well as the locations of the
constant fragments (red ticks) and target fragments (black ticks) are shown. (B)
Normalized 3C interaction frequency of the breast cancer risk locus with MYC in a
normal breast epithelial line, MCF10A. (C) Normalized 3C interaction frequency of
breast cancer risk locus (yellow line plot) and prostate cancer region 2 (blue line
plot) in a breast cancer cell line, MCF-7 (Ahmadiyeh et al., 2010).
Ahmadiyeh, N., Pomerantz, M.M., Grisanzio, C., Herman, P., Jia, L., Almendro, V., He, H.H., Brown, M., Liu, X.S., Davis, M., et al. (2010). 8q24 prostate, breast, and colon cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl Acad Sci U S A 107, 9742-9746.
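The normalization in Fig. 4-6 (3C interaction frequency relative to an interaction within the housekeeping gene FAM32A, plotted as mean ± 1 SEM) can be illustrated with the short sketch below. It assumes one target and one FAM32A control measurement per biological replicate; the function name and values are hypothetical, and the original data are from Ahmadiyeh et al. (2010).

```python
import math
from statistics import mean, stdev
from typing import List, Tuple


def normalized_3c(target: List[float], control: List[float]) -> Tuple[float, float]:
    """Mean and SEM of per-replicate target/control 3C signal ratios."""
    if len(target) != len(control) or len(target) < 2:
        raise ValueError("need matched target/control values for >= 2 replicates")
    ratios = [t / c for t, c in zip(target, control)]
    sem = stdev(ratios) / math.sqrt(len(ratios))
    return mean(ratios), sem


# Invented replicate signals for one constant/target fragment pair.
m, sem = normalized_3c([0.6, 0.8, 0.7], [2.0, 2.0, 2.0])
print(f"{m:.3f} +/- {sem:.3f}")
```

Normalizing each replicate to an interaction that should not vary between cell lines, before averaging, is what makes frequencies from MCF10A and MCF-7 comparable on one axis.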
4.3 Summary
In this chapter, the BCE5 enhancer was found to be cell type specific among breast epithelial cells. BCE5 enhancer activity also varied among other cancers; for example, its activity was very high in PC3 (AR negative prostate cancer cells) and HCT116 (colorectal cancer cells), but low in LNCaP (AR positive prostate cancer cells) (Jia et al., 2009). Three SNPs, rs28759353, rs10087810, and rs4871782, were identified as functional variants that alter its enhancer activity. Although the odds ratios for rs28759353, rs10087810, and rs4871782 remain to be determined in additional epidemiological studies, these SNPs are highly correlated with the risk index SNP rs13281615 in the 8q24 region and represent the best functional, and even causal, candidates. Based on the 3C data and on the haplotypes linking these alleles to the risk index SNP, we speculate that a tumor suppressor gene is the most likely target of the BCE5 enhancer in normal breast epithelial cells and TN breast cancer cells, but not in ER positive breast cancer cells.
Chapter 5. Functional analyses of GWASs
In this chapter I summarize work I did in collaboration with large consortia, which resulted in three multi-author publications (Garcia-Closas, 2013; Monda et al., 2013; Siddiq et al., 2012) as well as one unpublished project. My contributions to these studies were to use FunciSNP (Coetzee et al., 2012), together with data from the 1000 Genomes Project, to annotate possible functional SNPs correlated with the novel index SNPs. As elaborated below, interesting functional hypotheses were discerned and raised for future follow-up. In each of the following four sections I briefly summarize the main findings of each project, followed by my functional analyses.
5.1 Two novel breast cancer susceptibility loci at 6q14 and 20q11
5.1.1 GWAS of breast cancer - two novel breast cancer susceptibility loci at
6q14 and 20q11 (Siddiq et al., 2012)
In addition to the several GWASs of breast cancer described in chapter 3, meta-analyses of GWASs of ER-negative disease were conducted by a large consortium. In total, 4,754 ER-negative cases and 31,663 controls from three GWASs were combined in a meta-analysis: the NCI Breast and Prostate Cancer Cohort Consortium (BPC3) (2,188 ER-negative cases; 25,519 controls of European ancestry), the Triple Negative Breast Cancer Consortium (TNBCC) (1,562 triple negative cases; 3,399 controls of European ancestry) and the African American Breast Cancer Consortium (AABC) (1,004 ER-negative cases; 2,745 controls). Additional genetic variants for ER-negative breast cancer were thus identified (Siddiq et al., 2012). When 86 SNPs at P ≤ 1×10⁻⁵ were carried forward for replication in an additional 11,209 breast cancer cases (946 with ER-negative disease) and 16,057 controls of Japanese, Latino and European ancestry, two novel breast cancer susceptibility loci were identified: rs17530068 at 6q14 (OR=1.16, P=2.5×10⁻⁷) and rs2284378 at 20q11 (combined two-stage OR=1.16; P=1.1×10⁻⁸) (Siddiq et al., 2012).
5.1.2 Functional annotation of two novel breast cancer susceptibility loci,
6q14 and 20q11
As stated above, my part of this project was to annotate functionality, for which I used FunciSNP. The novel breast cancer risk SNP rs2284378 is located in 20q11, where the ASIP, RALY and EIF2S2 genes are located. Agouti signaling protein (the product of the ASIP gene) was first described as an inhibitor of melanogenesis in human melanocytes (Stranger et al., 2012). ASIP is a melanocortin 1 receptor (MC1R) ligand that antagonizes the function of this transmembrane receptor (Scherer and Kumar, 2010). RALY encodes a member of the heterogeneous nuclear ribonucleoprotein (hnRNP) family and may play a role in pre-mRNA splicing and in embryonic development (Rhodes et al., 1997). EIF2S2 encodes eukaryotic translation initiation factor 2, subunit 2 beta, which is involved in the early steps of protein synthesis by forming a ternary complex with GTP and initiator tRNA. Deletion of Eif2s2 has been associated with suppression of testicular germ cell tumor incidence and with recessive lethality in mice (Heaney et al., 2009). Both RALY and EIF2S2 are expressed in many tissues, including the mammary gland (Database). However, rs2284378 was not correlated with expression of EIF2S2, RALY or ASIP in lymphocytes (Dimas et al., 2009), adipocytes or skin cells (Nica et al., 2011). Several other SNPs in the same LD block as rs2284378 (r² > 0.8) were significantly associated with expression of the nearby genes EIF2S2 and RALY. For example, the correlated SNP rs4911379 (r² = 0.96) is significantly associated with EIF2S2 expression in fibroblasts (P=3.6×10⁻⁴) (Dimas et al., 2009), and SNPs rs761238 and rs761236 (r² = 0.85) are associated with RALY expression in lymphocytes (P=8.3×10⁻⁴) (Nica et al., 2011).
Beyond these expression associations, several enhancer and promoter regions, defined by overlapping chromatin marks in human mammary epithelial cells (HMEC), were found at 20q11 (Fig. 5-1). SNPs in high LD with rs2284378 (r² > 0.7), such as rs4911395, rs4911396 and rs1007090, were found in the promoter region of RALY. Additionally, the correlated SNPs rs6142101, rs6087557 and rs4911408 (r² > 0.7) are present in the promoter region of EIF2S2, and rs1054534, rs1555075, rs2268086, rs2268088, rs4911401, rs2284388, rs2284389 and rs932388 are located in putative enhancer regions in introns of RALY. Coupled with the expression findings above, this suggests that variants at 20q11 may influence expression of multiple genes in mammary epithelial cells, as has been seen at the prostate cancer risk locus on chromosome 10q11 (Pomerantz et al., 2010).
In contrast, the association at 6q14 lies in a gene desert with no evidence of an open/active regulatory region in HMEC (Fig. 5-2). The closest gene (~262 kb away), family with sequence similarity 46, member A (FAM46A/C6orf37), encodes a protein of unknown function. However, rs17530068 and its correlated SNPs (r² > 0.1) were not associated with expression of FAM46A/C6orf37. Therefore, additional studies of these novel regions will be necessary to identify the underlying biologically relevant variant(s).
The SNP rs17530068 at chromosome 6q14 was associated with overall breast cancer risk and showed no differential association by ER status. The association of rs2284378 at 20q11, however, was stronger for ER-negative than for ER-positive breast cancer. Gene expression differs significantly among breast cancer subtypes (Curtis et al., 2012), yet many breast cancer GWASs to date have not focused on the ER-negative subtype, so the etiology of ER-negative disease remains largely unknown. This and new GWASs identifying loci associated with ER-negative and TN breast cancer should therefore provide insight into the biological mechanisms underlying the disease and lead to improvements in risk prediction and treatment.
Figure 5-1. UCSC Genome Browser view displaying chromatin features in the 20q11 region. The first track shows UCSC genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics). The second track shows the FunciSNP result: the name of each correlated SNP (rsnumber – r² value) is shown, colour coded to indicate the number of biofeatures (dark blue: 0, cyan: 4, brown: 8). The bottom tracks are biofeature tracks in human mammary epithelial cells (HMEC). We collected 5,680 SNPs from the 1000 Genomes Project European population that coincided with one or more biofeatures in a 1 Mb window around each of the three index SNPs. Of these, 66 SNPs had r² > 0.5 with the index SNP (Siddiq et al., 2012).
Siddiq, A., Couch, F.J., Chen, G.K., Lindstrom, S., Eccles, D., Millikan, R.C., Michailidou,
K., Stram, D.O., Beckmann, L., Rhie, S.K., et al. (2012). A meta-analysis of genome-
wide association studies of breast cancer identifies two novel susceptibility loci at
6q14 and 20q11. Hum Mol Genet 21, 5373-5384.
Figure 5-2. UCSC Genome Browser view displaying chromatin features in the 6q14 region. The first track shows UCSC genes (RefSeq, UniProt, CCDS, Rfam, tRNAs & Comparative Genomics). The second track shows the FunciSNP result: the name of each correlated SNP (rsnumber – r² value) is shown, colour coded to indicate the number of biofeatures (dark blue: 0, cyan: 4, brown: 8). The bottom tracks are biofeature tracks in human mammary epithelial cells (HMEC). There was no evidence of an open/active regulatory region in human mammary epithelial cells near the SNP rs17530068 at 6q14 (Siddiq et al., 2012).
5.2 Four novel breast cancer susceptibility loci from meta-analysis
5.2.1 GWAS of ER-negative breast cancer in a meta-analysis - four
susceptibility loci (Garcia-Closas, 2013)
Recently, another GWAS identified four ER-negative specific breast cancer susceptibility loci (Garcia-Closas, 2013). This study analyzed three GWASs in populations of European ancestry and replicated promising signals from each GWAS in the Breast Cancer Association Consortium (BCAC). In this meta-analysis, 4,193 estrogen receptor (ER) negative breast cancer cases and 35,194 controls were combined with a replication series of 40 studies (6,514 cases and 41,455 controls). Four loci were associated with ER-negative but not with ER-positive breast cancer (P>0.05 for ER-positive disease), suggesting specific etiologic pathways for invasive ER-negative disease. Two independent loci were located at chromosome 1q32.1: rs4245739 (P=3.9×10⁻¹³, OR=1.14, 95% CI 1.10-1.18) and rs6678914 (P=4.9×10⁻⁹, OR=1.10, 95% CI 1.07-1.14). The other two loci were located at 2p24.1 (rs12710696, P=1.5×10⁻⁸, OR=1.10, 95% CI 1.06-1.14) and 16q12 (rs11075995, P=2.0×10⁻⁸, OR=1.11, 95% CI 1.07-1.15).
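The reported odds ratios, confidence intervals and P values are linked: under a Wald test on the log odds ratio, the standard error can be recovered from the 95% CI as SE = [ln(upper) − ln(lower)] / (2 × 1.96), giving z = ln(OR)/SE. The sketch below applies this back-of-the-envelope check to rs4245739; it assumes a symmetric Wald interval on the log scale, the function name is hypothetical, and it is not intended to reproduce the published P values exactly.

```python
import math
from typing import Tuple

Z_95 = 1.959964  # two-sided 95% standard normal quantile


def wald_from_or_ci(odds_ratio: float, lower: float, upper: float) -> Tuple[float, float]:
    """Approximate Wald z statistic and two-sided P value from an OR and its 95% CI."""
    se_log_or = (math.log(upper) - math.log(lower)) / (2 * Z_95)
    z = math.log(odds_ratio) / se_log_or
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided


# rs4245739 values as quoted above: OR = 1.14, 95% CI 1.10-1.18.
z, p = wald_from_or_ci(1.14, 1.10, 1.18)
print(f"z = {z:.2f}, P = {p:.1e}")
```

With the two-decimal OR and CI the result is on the order of 10⁻¹³, consistent in magnitude with the reported P=3.9×10⁻¹³; exact agreement is not expected because the published OR and CI are rounded.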
The associations of the four SNPs with the disease differed significantly by ER status; none of the SNPs showed significant associations with ER-positive disease in an analysis of 25,227 ER-positive cases and 41,455 controls of European ancestry in BCAC. None of the four SNPs showed significant (P<0.05) associations in BCAC studies of Asian ancestry, and only the rs11075995 variant in 16q12.2 was associated in combined analyses of BCAC and the African-American Breast Cancer Consortium. Furthermore, none of the markers was significantly associated with age at onset of ER-negative disease (Garcia-Closas, 2013).
5.2.2 Functional annotation on four risk loci
As before, I performed putative functional annotation of the above loci. The SNP rs4245739 (1q32.1) is located in the 3’ region of the MDM4 oncogene. MDM4 is a repressor of TP53- and TP73-mediated transcription and is important for cell cycle regulation and apoptosis. rs4245739 resides in a linkage disequilibrium (LD) block of approximately 230 kb that also contains the tRNA-Lysine transcript and the genes PIK3C2B and LRRN2 (Fig. 5-3A). MDM4, tRNA-Lysine and PIK3C2B, but not LRRN2, are expressed in normal breast epithelium, breast cancer cell lines and breast tumors (Graham et al., 2010; Turashvili et al., 2007; Wang and Yan, 2011). No non-synonymous SNPs are correlated with rs4245739 (r² > 0.10) in the 1000 Genomes Project European ancestry populations; however, correlated SNPs are located in the promoter region of PIK3C2B (rs3014606, r² = 0.94; rs2926534, r² = 0.94) and in the tRNA-Lysine transcript (rs11240753, r² = 0.78; rs4951389, r² = 0.78). Variants in the MDM4 locus correlated with rs4245739 have also been associated with breast cancer in BRCA1 mutation carriers, who predominantly develop ER-negative tumors (Couch, F. et al., submitted to Nature Genetics, 2012). Thus, this region appears to be specifically associated with ER-negative disease rather than with overall breast cancer risk.
rs6678914 on chromosome 1q32.1 is located in intron 1 of the LGR6 gene (Fig. 5-3B). LGR6 and several other genes in this region, including UBE2T and PTPN7, are expressed in breast tumors (Turashvili et al., 2007). A correlated SNP (rs12032424, r² = 0.96) is located in a putative enhancer region in the same intron of LGR6 in normal breast epithelial cells, although not in the triple-negative breast cancer cell line MDAMB231 (Fig. 5-3B). rs6678914 is not correlated with non-synonymous SNPs in LGR6 (at r² > 0.10 in 1000 Genomes Project European ancestry populations).
The SNP rs12710696 on chromosome 2p24.1 is located in an intergenic region, more than 200 kb from the nearest gene (OSR1) (Fig. 5-3C). Because the region contains multiple overlapping chromatin marks in normal breast epithelial cells and in the triple-negative breast cancer cell line MDAMB231 (Fig. 5-3C), the allele marked by rs12710696 may influence a set of active enhancers.
The signal found on chromosome 16q12.2 is located in the fat mass and
obesity associated (FTO) gene. This signal is tagged by rs11075995, located in a
~40 kb LD block in intron 1 of FTO, within an enhancer region that appears to be
active in both normal and triple-negative breast cancer cells (Fig. 5-3D).
rs11075995 is located ~40 kb distal to a region that has been associated with
obesity (rs9939609) (Frayling et al., 2007) and with overall breast cancer risk
(rs17817449) (Michailidou, 2013). The obesity and overall breast cancer SNPs are
highly correlated (r2 = 0.99 between rs3751812, a surrogate of rs9939609, and
rs17817449); however, rs11075995 is not correlated with either of the two SNPs
(r2 < 0.04). Conditional analyses indicated that the ER-negative specific signal
(rs11075995) is independent of rs17817449. This adds to the increasing evidence
of distinct signals at the same locus for different subtypes of cancers occurring at the
same site, e.g. 5p15.33 (TERT-CLPTM1L) (Bojesen, S. E. et al., submitted to Nature
Genetics, 2012) and 14q24.1 (RAD51L1) (Michailidou, 2013) in breast cancer, and
5p15.33 (TERT-CLPTM1L) (Johnatty, S. E. et al., submitted to Nature Genetics, 2012)
and HNF1B (Pearce, C. L., submitted to Nature Genetics, 2012) in ovarian cancer.
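The r2 values quoted in this section (for example r2 = 0.99 between rs3751812 and rs17817449, and r2 < 0.04 for rs11075995) are the standard pairwise linkage disequilibrium statistic r2 = D^2 / (pA(1 - pA) pB(1 - pB)), where D = pAB - pA pB. As a minimal sketch, assuming phased haplotypes with alleles coded 0/1 at each SNP (the function name `ld_r2` and the toy haplotypes are illustrative, not taken from the source), the statistic can be computed as follows:

```python
from typing import Sequence, Tuple


def ld_r2(haplotypes: Sequence[Tuple[int, int]]) -> float:
    """Pairwise LD r^2 between two biallelic SNPs from phased haplotypes.

    Each haplotype is (allele_at_snp1, allele_at_snp2), coded 0/1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_a = sum(a for a, _ in haplotypes) / n  # frequency of allele 1 at SNP 1
    p_b = sum(b for _, b in haplotypes) / n  # frequency of allele 1 at SNP 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n  # 1-1 haplotype freq
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


# Toy example: allele 1 at SNP 1 always travels with allele 1 at SNP 2
print(ld_r2([(0, 0), (1, 1), (0, 0), (1, 1)]))  # 1.0
```

A value near 1 means one SNP is an almost perfect proxy for the other, which is why a highly correlated SNP lying in an enhancer can stand in for an index SNP that does not.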
In summary, the meta-analyses identified novel loci that are differentially
associated with ER-positive and ER-negative breast cancer, providing further
evidence that the subtypes of breast cancer have distinct etiologic pathways. Fine
mapping and functional studies of ER-negative susceptibility loci will be needed to
uncover the underlying biological mechanisms and to identify new targets for therapy
and prevention.
Figure 5-3. Chromatin features at the 1q32.1/MDM4, 1q32.1/LGR6, 2p24 and
16q12.2/FTO regions in normal human mammary epithelial cells (HMEC) and
triple-negative breast cancer cells (MDAMB231). The first track shows the FunciSNP
results: each correlated SNP is labeled with its rs number and r2 value and is color
coded by the number of overlapping biofeatures (dark blue: 0, green: 5, brown: 10).
The lower tracks show the biofeatures and the RefSeq gene, mRNA, and pseudogene
annotations. We collected 9,893 SNPs from the 1000 Genomes Project European
ancestry populations that coincided with one or more biofeatures within a 1 Mb
window of each of the four index SNPs. Of these, 64 SNPs had r2 ≥ 0.5 with an index
SNP, and 17 of the 64 coincided with five or more biofeatures (Garcia-Closas, 2013).
Figure 5-3: Continued.
Garcia-Closas, M., Couch, F.J., Lindstrom, S., Michailidou, K., Schmidt, M.K., Orr, N.,
Rhie, S.K., Riboli, E., Feigelson, H.S., Marchand, L.L. et al. (2013). Genome-wide
association studies identify four ER-negative specific breast cancer risk loci. Nat
Genet. 45, 392-398.
A. rs4245739 in 1q32.1/MDM4
B. rs6678914 1q32.1/LGR6
C. rs12710696 2p24
D. rs11075995 in 16q12.2/FTO
5.3 Novel Loci Associated with Body Mass Index Identified in a
Large Genome-wide Association Study of Men and Women of
African Ancestry
5.3.1 GWAS of Body Mass Index in meta-analysis (Monda et al., 2013)
Thirty-six common genetic loci for body mass index (BMI) had previously been identified,
but the underlying studies were performed predominantly in populations of European
ancestry. Monda et al. conducted a meta-analysis of GWAS to examine the association
of ~3 million SNPs with BMI in 39,144 men and women of African ancestry (Monda et
al., 2013). By following up the most significant associations in an additional 32,268
individuals of African ancestry, they identified novel BMI loci at 5q33 (GALNT10,
rs7708584, p = 3.4 × 10^-11) and 7p15 (MIR148A/NFE2L3, rs10261878,
p = 1.2 × 10^-10), and a highly suggestive locus at 6q16 (KLHL32, rs974417,
p = 6.9 × 10^-8), in the African ancestry sample. The study also confirmed that 32 of
the 36 previously established BMI variants displayed directionally consistent effect
estimates in our African ancestry GWAS (p = 9.7 × 10^-7).
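The P value for directional consistency can be checked with a one-sided binomial (sign) test: if effect directions were random, each of the 36 variants would be consistent with probability 0.5, and the probability of observing 32 or more consistent variants works out to about 9.7 × 10^-7, matching the reported value. That this is exactly the test Monda et al. used is an inference from the matching number, not something stated in this section; the function name below is illustrative.

```python
from math import comb


def sign_test_upper_tail(n_consistent: int, n_total: int) -> float:
    """One-sided P(X >= n_consistent) for X ~ Binomial(n_total, 0.5)."""
    if not 0 <= n_consistent <= n_total:
        raise ValueError("n_consistent must lie between 0 and n_total")
    # Count every outcome with at least n_consistent consistent directions
    tail = sum(comb(n_total, k) for k in range(n_consistent, n_total + 1))
    return tail / 2 ** n_total


# 32 of the 36 previously established BMI variants were directionally consistent
p = sign_test_upper_tail(32, 36)
print(f"{p:.1e}")  # 9.7e-07
```

Exact integer arithmetic via `math.comb` avoids any approximation: the tail sums to 66,712 of the 2^36 equally likely direction patterns.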
5.3.2 Biological Functions of the Three Novel Loci
The variant rs7708584 at chromosome 5q33 is located upstream of the gene
galactosamine:polypeptide N-acetylgalactosaminyltransferase 10 (GALNT10), whose
product catalyzes the first step in the synthesis of mucin-type oligosaccharides. The
protein is highly expressed in the small intestine and at intermediate levels in the
stomach, pancreas, ovary, thyroid gland and spleen (Cheng et al., 2002). Additionally,
GALNT2, another member of the GalNAc-transferase family, has been identified in
GWAS of HDL cholesterol and triglycerides (Kathiresan et al., 2009; Teslovich et al.,
2010; Waterworth et al., 2010).
The most significant SNP at chromosome 6q16 (rs974417) is intronic in the
kelch-like 32 gene (KLHL32). There are over 50 kelch-like genes in humans. Their
propeller domains bind substrate proteins and promote substrate ubiquitination,
thereby modulating protein function.
The variant at 7p15, rs10261878, is intergenic; the nearest gene is the
microRNA gene MIR148A, which has been found to be significantly up-regulated
during adipogenesis (Xie et al., 2009) as well as in human adipocytes (Ortega et al.,
2010). In addition, human miR-148a has been associated with panic disorder and
regulates an anxiety candidate gene, CCKBR (cholecystokinin B receptor), which has
been shown to play a regulatory role in the control of food intake (Clerc et al., 2007).
The next closest gene (241 kb from rs10261878) is nuclear factor (erythroid-derived
2)-like 3 (NFE2L3), which encodes a transcription factor that binds to antioxidant
response elements in target genes and appears to play a role in differentiation,
inflammation, and carcinogenesis (Chevillard and Blank, 2011). Previous GWAS
have identified associations between SNPs in the 1 Mb region surrounding
rs10261878 and waist-to-hip ratio (Heid et al., 2010), endometriosis (Painter et al.,
2011), type 1 diabetes (Barrett et al., 2009), and smoking behavior (Caporaso et al.,
2009); however, these signals appear to be independent of this risk SNP (r2 < 0.1).
To investigate the putative biological functions of the three novel loci, we
examined their associations with gene expression. The SNP rs7708584 at 5q33,
located near GALNT10, showed nominally significant (p < 0.05) associations with
GALNT10 expression (for two of its three available transcripts) in liver, omental fat,
and subcutaneous fat (p = 0.048, 0.00010, and 0.00017, respectively). Similarly,
rs10261878, near NFE2L3, was associated with NFE2L3 expression in the same three
tissues (p = 0.039, 0.015, and 0.036 for liver, omental fat, and subcutaneous fat,
respectively). However, other nearby SNPs not in LD with the risk SNPs (r2 < 0.02)
showed stronger associations with the expression levels of the respective
transcripts. The SNP rs974417 lies near the KLHL32 gene, but there was no
evidence of association between this locus and expression in the brain.
We did not find non-synonymous SNPs in GALNT10, KLHL32 or NFE2L3 that
were correlated (r2 > 0.2) with the most significant SNPs in these regions in the 1000
Genomes Project African ancestry populations (AFR). However, we did detect a
number of correlated SNPs (r2 > 0.5) in regulatory sequences, defined by
overlapping chromatin marks in multiple cell types including brain and adipose
tissue (Methods). Many of these SNPs (or good proxies for them in the 1000 Genomes
Project AFR, r2 range 0.59-1.0) are located in putative enhancer and promoter
regions (Tables 5-1, 5-2, and 5-3; Fig. 5-4). Taken together, these data suggest that
biologically relevant variants in all three regions may regulate genes involved in
BMI.
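The "No. of overlapping biofeatures" column in Tables 5-1 to 5-3 is, in essence, a count of how many annotation intervals contain each SNP. A minimal sketch of that count follows, assuming BED-style (0-based, half-open) intervals on a single chromosome and 1-based SNP positions as listed in the tables; the function name and the interval values in the example are hypothetical, and this is not the FunciSNP implementation.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def count_overlaps(snp_positions: Sequence[int],
                   intervals: Sequence[Tuple[int, int]]) -> List[int]:
    """Number of BED intervals [start, end) covering each 1-based SNP position."""
    starts = sorted(s for s, _ in intervals)
    ends = sorted(e for _, e in intervals)
    counts = []
    for pos in snp_positions:
        p0 = pos - 1  # convert the 1-based SNP coordinate to 0-based BED coordinates
        # intervals that started at or before p0, minus those that already ended by p0
        counts.append(bisect_right(starts, p0) - bisect_right(ends, p0))
    return counts


# Hypothetical biofeature intervals and SNP positions
features = [(100, 200), (149, 150), (150, 300)]
print(count_overlaps([101, 150, 151, 301], features))  # [1, 2, 2, 0]
```

Sorting starts and ends once and using binary search keeps the count fast even with tens of thousands of features, instead of testing every SNP against every interval.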
In the largest GWAS meta-analysis of African ancestry populations to date,
including as many as 71,412 individuals, two novel loci and one highly suggestive
locus influencing BMI were identified. Although the discovery of novel loci and the
replication of signals found in other populations may be affected by population
differences in demography and in LD with the functional variants, as well as by
genetic and environmental factors, this study provided evidence for a shared genetic
influence on BMI across populations. Additional studies of the functional alleles in
these regulatory regions will be needed to understand the genetic and biological
mechanisms underlying BMI. Together, these findings demonstrate the importance of
conducting genetic studies in diverse populations in order to identify novel
susceptibility loci for common traits.
Table 5-1. A summary of the biofeature analysis and putative functional SNPs at 5q33 (Monda et al., 2013).
Columns: SNPs correlated with the index SNP (r2 > 0.5) in 1KGP AFR; tested in Stage 1; correlated SNP tested in Stage 1 (r2 in AFR); SNP; Chr; position (hg19); Stage 1 beta; Stage 1 SE; Stage 1 P-value; Stage 1 N; no. of overlapping biofeatures (a)
rs17625484 Yes rs17625484 5 153513403 0.0427 0.0103 3.63E-05 38214 1
rs67791094 No rs7707654 (0.74) 5 153525840 1
rs7715256 No rs7708584 (0.59) rs7715256 5 153537893 -0.041 0.0082 5.87E-07 38177 2
rs7719067 No rs7708584 (0.77) rs7719067 5 153538241 0.0445 0.0083 8.68E-08 38198 2
rs55776503 No rs7707654 (0.86) 5 153539912 2
rs4569924 No rs7708584 (0.77) rs4569924 5 153540025 0.0445 0.0083 8.68E-08 38215 2
rs7708584 - Index Yes rs7708584 5 153543466 0.0498 0.0086 8.02E-09 38218.9 0
rs1366219 Yes rs1366219 5 153545698 0.0498 0.0086 8.02E-09 38218.9 8
rs2099044 Yes rs2099044 5 153548206 0.0509 0.0086 3.74E-09 38219.9 3
rs113042334 No rs7707654 (0.86) 5 153548856 1
rs56968070 No rs7707654 (0.98) 5 153550752 4
rs7707654 Yes rs7707654 5 153555871 0.0502 0.0108 3.12E-06 38221 15
rs58253496 No rs7707654 (0.98) 5 153560421 3
rs17115711 Yes rs17115711 5 153572862 0.0443 0.0102 1.50E-05 39140 22
rs17115719 Yes rs17115719 5 153574772 0.043 0.0102 2.64E-05 39094 15
rs55946756 No rs17115719 (1.0) 5 153577099 39
rs6865687 Yes rs6865687 5 153578190 0.0434 0.0102 2.22E-05 39134 17
rs58711554 No rs6865687 (0.97) 5 153579425 8
(a) Seventy-three chromatin features annotating regulatory elements in brain and adipose tissues were obtained from the NIH Epigenomics Roadmap, and known
DNaseI hypersensitive locations, FAIRE-seq peaks, and CTCF binding sites from more than 100 cell types were collected from the ENCODE data (see
Methods). Each correlated SNP is color coded in Figure 5-4 to reflect the number of biofeatures with which it overlaps.
Table 5-2. A summary of the biofeature analysis and putative functional SNPs at 6q16 (Monda et al., 2013).
Columns: SNPs correlated with the index SNP (r2 > 0.5) in 1KGP AFR; tested in Stage 1; correlated SNP tested in Stage 1 (r2 in AFR); SNP; Chr; position (hg19); Stage 1 beta; Stage 1 SE; Stage 1 P-value; Stage 1 N; no. of overlapping biofeatures (a)
rs13203153 Yes rs13203153 6 97374850 -0.0341 0.0088 0.000116 38220.9 16
rs2206568 No rs7761614 (0.88) 6 97382338 6
rs13198696 No rs7761614 (0.96) 6 97386246 19
rs7761614 Yes rs7761614 6 97388179 0.0338 0.0084 5.97E-05 39140 9
rs5025221 No rs6937736 (0.99) 6 97388516 6
rs35104491 No rs7769945 (0.98) 6 97390463 8
rs2223614 No rs6917254 (0.91) 6 97391272 13
rs6917254 Yes rs6917254 6 97391853 -0.0344 0.0086 6.77E-05 39103 10
rs6937736 Yes rs6937736 6 97399772 -0.0346 0.0086 6.14E-05 39134 3
rs6937950 Yes rs6937950 6 97399927 -0.0346 0.0086 6.14E-05 39136 3
rs7769945 Yes rs7769945 6 97400839 0.0347 0.0086 5.84E-05 39127 2
rs6568674 Yes rs6568674 6 97402165 -0.0345 0.0099 0.0005008 39071.8 1
rs6910273 No rs6568674 (1.0) 6 97403783 1
rs34663994 No rs6937736 (0.97) 6 97404417 2
rs13211612 Yes rs13211612 6 97406847 -0.0364 0.0087 3.12E-05 39140 5
rs17057164 Yes rs17057164 6 97410536 0.0393 0.0082 1.68E-06 39125 3
rs71562315 No rs13192767 (0.99) 6 97410922 1
rs13192767 Yes rs13192767 6 97411114 0.0366 0.0087 2.82E-05 38896 2
rs7743034 Yes rs7743034 6 97411894 -0.024 0.0079 0.002344 39139.9 4
rs11153285 Yes rs11153285 6 97413151 -0.0386 0.008 1.38E-06 39100 8
rs11153286 No rs11153285 (1.0) 6 97413636 8
rs10485381 Yes rs10485381 6 97413827 0.038 0.008 2.00E-06 39110 9
rs13205380 No rs13192767 (0.96) 6 97414063 9
rs11753834 No rs7759572 (1.0) 6 97414756 12
rs11754336 No rs7759572 (1.0) 6 97414775 12
rs7758454 No rs13211612 (0.98) 6 97415116 12
rs7759572 Yes rs7759572 6 97415135 0.0387 0.008 1.29E-06 39093 11
rs7758623 No rs7759572 (1.0) 6 97415238 9
rs34398537 No rs13192767 (1.0) 6 97416270 9
rs974417 - Index Yes rs974417 6 97419598 -0.0395 0.0082 1.49E-06 39120 0
rs7762215 No rs2179537 (1.0) 6 97421139 7
rs2143389 Yes rs2143389 6 97422681 -0.0391 0.0082 1.90E-06 39078 10
rs9791288 No rs2179537 (1.0) 6 97422915 11
rs2179537 Yes rs2179537 6 97422930 0.0241 0.0079 0.002248 39123 11
rs11756272 Yes rs11756272 6 97423683 0.0209 0.0079 0.008056 38497 13
rs2387762 No rs974417 (1.0) 6 97424463 8
rs7770535 No rs974417 (1.0) 6 97425095 1
rs13196401 No rs13195937 (1.0) 6 97427757 1
rs13195937 Yes rs13195937 6 97428128 -0.0265 0.0079 0.0007803 39117 2
(a) Seventy-three chromatin features annotating regulatory elements in brain and adipose tissues were obtained from the NIH Epigenomics Roadmap, and known
DNaseI hypersensitive locations, FAIRE-seq peaks, and CTCF binding sites from more than 100 cell types were collected from the ENCODE data (see
Methods). Each correlated SNP is color coded in Figure 5-4 to reflect the number of biofeatures with which it overlaps.
Table 5-3. A summary of the biofeature analysis and putative functional SNPs at 7p15 (Monda et al., 2013).
Columns: SNPs correlated with the index SNP (r2 > 0.5) in 1KGP AFR; tested in Stage 1; correlated SNP tested in Stage 1 (r2 in AFR); SNP; Chr; position (hg19); Stage 1 beta; Stage 1 SE; Stage 1 P-value; Stage 1 N; no. of overlapping biofeatures (a)
rs10261878 - Index Yes rs10261878 7 25950545 -0.0301 0.008 0.0001664 39100.9 0
rs4722543 Yes rs4722543 7 25953720 0.0276 0.0081 0.0006567 39078 2
rs9655264 Yes rs9655264 7 25955324 -0.0281 0.0081 0.0005228 39141 1
rs10223942 No rs4722544 (0.95) 7 25955552 3
rs4722544 Yes rs4722544 7 25956167 0.0664 0.0382 0.08185 2101 7
rs998357 Yes rs998357 7 25979526 -0.0312 0.0082 0.0001439 39084 2
rs7803374 Yes rs7803374 7 25979695 -0.0304 0.0083 0.0002557 38180 2
rs2893222 No rs7803374 (1.0) 7 25980231 2
rs6959363 No rs6977848 (0.83) 7 25984565 2
rs6947509 No rs10261878 (0.77) 7 25987772 8
rs6953596 No rs10261878 (0.82) 7 25989298 20
rs6977848 Yes rs6977848 7 25989520 0.0279 0.0084 0.000922 38059 37
rs62446277 No rs6977848 (0.86) 7 25989960 35
(a) Seventy-three chromatin features annotating regulatory elements in brain and adipose tissues were obtained from the NIH Epigenomics Roadmap, and known
DNaseI hypersensitive locations, FAIRE-seq peaks, and CTCF binding sites from more than 100 cell types were collected from the ENCODE data (see
Methods). Each correlated SNP is color coded in Figure 5-4 to reflect the number of biofeatures with which it overlaps.
Figure 5-4. UCSC Genome Browser view of the three novel loci with FunciSNP
results. (A) Genome browser view for rs7708584 at 5q33.2, (B) rs974417 at
6q16.1, and (C) rs10261878 at 7p15.2* (Monda et al., 2013).
*Genome browser tracks are ordered in the following manner: RefSeq genes, UCSC Genes, Human mRNAs from
GenBank, Human spliced ESTs, regulatory element histone marks from 7 cell lines (GM12878, H1-hESC, HSMM,
HUVEC, K562, NHEK, and NHLF) from ENCODE, and FunciSNP results. In the FunciSNP result track, each index
SNP is shown in black and each correlated SNP is color coded to reflect the number of biofeatures with which it
overlaps. The color key is shown at the bottom; the color ranges from blue (low number of biofeature overlaps)
to red (high number of biofeature overlaps). The r2 value of each correlated SNP is also shown to the right of the
rsID. We used 73 different chromatin features that annotate regulatory elements (i.e. enhancers, promoters)
in brain and adipose tissues from the NIH Epigenomics Roadmap (Bernstein et al., 2010), as well as known
DNaseI hypersensitive locations, FAIRE-seq peaks, and CTCF binding sites from more than 100 different cell
types, which were collected from the ENCODE data (Ernst et al., 2011). More information regarding each
chromatin mark and the overlapping SNPs is provided in Tables 5-1, 5-2, and 5-3.
Monda, K.L., Chen, G.K., Taylor, K.C., Palmer, C., Edwards, T.L., Lange, L.A., Ng, M.C.Y.,
Adeyemo, A.A., Allison, M.A., Bielak, L.F., et al. (2013). A meta-analysis identifies new
loci associated with body mass index in individuals of African ancestry. Nat Genet.
5.4 Functional Annotation of Schizophrenia GWAS Risk Loci
5.4.1 GWAS on Schizophrenia (SCZ) (preliminary data from the Psychiatric
Genetics Consortium, provided by James Knowles)
Schizophrenia (SCZ) is a mental disorder characterized by a breakdown of thought
processes and of normal emotional responses. The Schizophrenia Psychiatric Genome-
Wide Association Study (GWAS) Consortium (PGC) identified sixty-two novel SCZ
risk loci in individuals of European ancestry. Of the sixty-two index SNPs, fifty-three
were functionally annotated, and putative functional SNPs at these risk loci were
identified using FunciSNP analyses (Coetzee et al., 2012).
5.4.2 Functional annotation of SCZ risk loci
To determine whether these variants are involved in gene regulation, we
performed FunciSNP analyses using publicly available histone modification ChIP-seq
data (H3K4me1, H3K4me3, and H3K9Ac) and DNaseI-seq data from brain (substantia
nigra, hippocampus middle, germinal matrix, mid frontal lobe, inferior temporal lobe,
anterior caudate, and cingulate gyrus), fetal brain, and neurospheres (cortex-derived
and ganglionic eminence-derived neurospheres) (Bernstein et al., 2010). We collected
50,259 SNPs that coincided with one or more of these biofeatures within a 1 Mb
window of each of the 53 index SNPs. Of these, 60 SNPs in 15 risk loci were highly
correlated with an index SNP (r2 > 0.9) (Table 5-4).
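The selection just described (biofeature-overlapping SNPs within a 1 Mb window of an index SNP, kept only if highly correlated with it) can be sketched as a simple filter. This is a hypothetical illustration, not the FunciSNP code: the record fields, the reading of "1 Mb window" as ±500 kb around the index SNP, and the example values are assumptions, while the r2 > 0.9 cutoff and the ordering by number of overlapping biofeatures mirror Table 5-4.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidateSNP:
    # Hypothetical record; field names are illustrative only.
    snp_id: str
    tag_snp_id: str
    position: int
    tag_position: int
    r2: float
    n_biofeatures: int


def select_putative_functional(snps: List[CandidateSNP],
                               half_window: int = 500_000,
                               min_r2: float = 0.9) -> List[CandidateSNP]:
    """Keep SNPs near their index SNP that overlap >= 1 biofeature and exceed min_r2."""
    kept = [
        s for s in snps
        if abs(s.position - s.tag_position) <= half_window  # inside the assumed window
        and s.n_biofeatures >= 1                             # overlaps at least one biofeature
        and s.r2 > min_r2                                    # strictly above the LD cutoff
    ]
    # Order as in Table 5-4: most overlapping biofeatures first
    return sorted(kept, key=lambda s: s.n_biofeatures, reverse=True)


# Hypothetical candidates around one index SNP
example = [
    CandidateSNP("snp_a", "tag_1", 1_000_000, 1_010_000, 0.96, 17),
    CandidateSNP("snp_b", "tag_1", 1_200_000, 1_010_000, 0.85, 12),
    CandidateSNP("snp_e", "tag_1", 1_020_000, 1_010_000, 0.97, 20),
]
print([s.snp_id for s in select_putative_functional(example)])  # ['snp_e', 'snp_a']
```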
For example, the index SNP rs4788190 is located about 10 kb from the
TSS of the KCTD13 gene. Several SNPs in high LD with it, such as rs12716982,
rs12716973, rs4420550, and rs4424923, lie in the promoter of the KCTD13 gene (Fig.
5-5). Because the promoter histone mark H3K4me3 and the active promoter mark
H3K9Ac are enriched in this promoter region in fetal brain, brain, and neurosphere
cells, these high-LD SNPs may be involved in regulating the expression of
KCTD13.
The index SNP rs72694965 is located in a gene desert region. However, SNPs in
high LD with it, such as rs72694957, rs11589922, rs11591200, and rs72694956, are
located in an enhancer region 5 kb downstream of the PLEKHO1 gene. The enhancer
histone mark H3K4me1 and DNaseI-defined open chromatin are enriched at this
region in fetal brain and brain cells, but not in neurosphere-cultured cells. Because
enhancers are generally cell type specific, rs72694957, rs11589922, rs11591200,
and rs72694956 are candidate functional enhancer risk SNPs for SCZ (Fig. 5-6).
We also identified 15,010 SNPs located in the exons of genes in 52 risk loci
(a 1 Mb window around each of 52 index SNPs). Of these, only five SNPs were highly
correlated (r2 > 0.9) with two of the index SNPs, and only one of them, rs4584886, is a
missense variant; the other four are synonymous variants (Table 5-5). rs4584886 is
located in exon 6 of LRRC48 (leucine rich repeat containing 48). To test in silico
whether this missense change affects protein function, we used PolyPhen-2 and a
second prediction program (Adzhubei et al., 2010; Kumar et al., 2009). The results
indicate that the R (CGG) → W (TGG) amino acid substitution is probably damaging
to protein function (Fig. 5-7).
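Table 5-5 classifies each coding SNP as synonymous or missense, and the LRRC48 change described above is a C>T substitution at the first position of an arginine codon (CGG) that yields a tryptophan codon (TGG). The sketch below is an illustrative helper, not part of the original analysis: it translates the reference and alternative codons with the standard genetic code to make that classification explicit. Unlike PolyPhen-2, it says nothing about whether a substitution is damaging.

```python
from itertools import product

# Standard genetic code; codons enumerated in TCAG order (first base varies slowest)
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Classify a single-codon change as synonymous, missense, nonsense, or stop-loss."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"
    if ref_aa == "*":
        return "stop-loss"
    return f"missense ({ref_aa}/{alt_aa})"


# rs4584886 in LRRC48: CGG (Arg) -> TGG (Trp)
print(classify_codon_change("CGG", "TGG"))  # missense (R/W)
```

The output label follows the "(R/W)" notation used for rs4584886 in Table 5-5.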
Table 5-4. A list of putative functional SNPs at SCZ risk loci
corr.snp.id | No. of overlapping biofeatures | tag.snp.id | R.squared | nearest.Gene
rs12716972 17 rs4788190 0.96 KCTD13
rs4420550 13 rs4788190 0.97 KCTD13
rs610060 13 rs11717383 0.90 PPM1M
rs12716973 12 rs4788190 0.96 KCTD13
rs1805563 12 rs13071004 0.95 FXR1
rs34455584 12 rs13071004 0.94 FXR1
rs4727621 12 rs7811681 0.92 SRPK2
rs11584354 11 rs72694965 0.95 VPS45
rs2027349 11 rs72694965 0.93 VPS45
rs7189381 9 rs111402974 0.91 PLA2G15
rs72694957 7 rs72694965 0.96 PLEKHO1
rs7184821 6 rs111402974 0.95 ESRP2
rs7406982 6 rs2955368 0.96 C17orf39
rs917114 6 rs7811681 0.95 SRPK2
rs9936474 6 rs4788190 0.98 KCTD13
rs2955380 5 rs2955368 0.98 C17orf39
rs2955383 5 rs2955368 0.98 ATPAF2
rs35026580 5 rs12887734 0.95 BAG5
rs72694954 5 rs72694965 0.96 PLEKHO1
rs1993193 4 rs72930787 1.00 MIR4529
rs4609871 4 rs4788190 0.98 KCTD13
rs58426216 4 rs72694965 0.95 PLEKHO1
rs72694956 4 rs72694965 0.96 PLEKHO1
rs7622851 4 rs11717383 0.93 MIR135A1
rs11591200 3 rs72694965 0.96 PLEKHO1
rs1971546 3 rs111402974 0.91 PLA2G15
rs2237613 3 rs7811681 0.94 SRPK2
rs3801278 3 rs7811681 0.95 SRPK2
rs3801282 3 rs7811681 0.95 SRPK2
rs3801999 3 rs7811681 0.92 SRPK2
rs41562 3 rs7811681 0.94 SRPK2
rs4407079 3 rs4788190 0.98 KCTD13
rs4424923 3 rs4788190 0.97 KCTD13
rs832195 3 rs704367 0.92 THOC7
rs11589922 2 rs72694965 0.94 PLEKHO1
rs2240463 2 rs7811681 0.94 SRPK2
rs2299319 2 rs7811681 0.95 SRPK2
rs4730082 2 rs7811681 0.92 SRPK2
rs10277120 1 rs7811681 0.93 MLL5
rs10281886 1 rs7811681 0.93 MLL5
rs11223659 1 rs11223651 1.00 IGSF9B
rs12273290 1 rs10838601 0.95 DGKZ
rs12311439 1 rs4298967 0.93 CACNA1C
rs1824849 1 rs72694965 0.96 PLEKHO1
rs2000852 1 rs11223651 1.00 IGSF9B
rs2000853 1 rs11223651 0.96 IGSF9B
rs2955378 1 rs2955368 0.98 ATPAF2
rs3801281 1 rs7811681 0.95 SRPK2
rs41563 1 rs7811681 0.94 SRPK2
rs4808959 1 rs2905432 0.92 TSSK6
rs4938023 1 rs2514218 0.92 DRD2
rs6502633 1 rs2955368 0.98 ATPAF2
rs6589379 1 rs2514218 0.91 DRD2
rs66599006 1 rs7811681 0.95 SRPK2
rs7106715 1 rs11223651 1.00 IGSF9B
rs7106806 1 rs11223651 0.99 IGSF9B
rs7111031 1 rs2514218 0.92 DRD2
rs7122746 1 rs11223651 1.00 IGSF9B
rs7140568 1 rs12887734 0.95 BAG5
Figure 5-5. UCSC Genome Browser view of the risk SNP rs4788190 and
functional SNPs.
Figure 5-6. UCSC Genome Browser view of the risk SNP rs72694965 and
functional SNPs.
Table 5-5. A list of SCZ putative functional SNPs in coding exons
chr corr.snp.id tag.snp.id R.squared GeneSymbol Type of variant
19 rs1047361 rs2905432 0.94 TSSK6 synonymous
19 rs7250893 rs2905432 0.94 TSSK6 synonymous
17 rs2955355 rs2955368 0.99 C17orf39 synonymous
17 rs4368210 rs2955368 0.97 LRRC48 synonymous
17 rs4584886 rs2955368 0.98 LRRC48 missense variant (R/W)
Figure 5-7. UCSC Genome Browser view of the risk SNP rs2955368 and
functional SNPs.
Chapter 6. Materials and Methods
6.1 Cell Culture
HMEC cells were obtained from Lonza (Walkersville, MD) and cultured under the
recommended conditions. MDAMB231, MCF10A and MCF7 cells were obtained from
American Type Culture Collection (ATCC, Manassas, VA). MDAMB231 and MCF7
cells were cultured in DMEM with 5% FBS. MCF10A cells were cultured in
DMEM/F12 with 5% horse serum, 100 units/ml penicillin, 0.1 mg/ml streptomycin,
0.5 μg/ml hydrocortisone, 100 ng/ml cholera toxin, 10 μg/ml insulin, and 20 ng/ml
epidermal growth factor (EGF).
6.2 FAIRE
6.2.1 FAIRE-seq library construction and sequencing
FAIRE assays were performed as described (Giresi et al., 2007), with a number of
modifications. Briefly, intact cells were crosslinked with 1% formaldehyde in PBS,
and nuclei were then extracted and resuspended in SDS lysis buffer before
sonication. FAIRE DNA samples were purified by phenol/chloroform extraction. Input
DNA was also purified by phenol/chloroform extraction after the crosslinks were
reversed. Two independent libraries were made for each sample using barcoded
adapters. Each library was PCR amplified and re-validated by quantitative real-time
PCR (qPCR). Single-end sequencing (50 cycles) was performed on an Illumina HiSeq
by the Epigenome Center at the University of Southern California. The two
independent assays per condition were analyzed separately, and the data were then
combined to increase the depth of coverage.
6.2.2 Identification of FAIRE-seq peaks
After PCR artifacts and duplicates were removed with Samtools (Li et al., 2009), each
BAM file was filtered using a quality score threshold of 30. FAIRE-seq peaks were
identified using findPeaks from HOMER (http://biowhat.ucsd.edu/homer) (Heinz et al.,
2010), with a triangle-based distribution with a median length of 150 bp and a 99.0%
confidence interval for peak pairs that were unequal between sample and input. A
subpeak value of 0.6 with a trim float value of 0.3 was used for peak separation.
After peak identification, we calculated a p-value for each peak relative to input. To
be most stringent, only functional peaks (Heinz et al., 2010) with p < 10^-9 were used.
FAIRE-seq peaks within 3 kb windows centered on the annotated TSSs of genes were
used to define TSS regions, and peaks more than 1.5 kb from a TSS were used to
define enhancer regions for FunciSNP analysis.
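The rule for splitting peaks into TSS and enhancer regions (within a 3 kb window centered on an annotated TSS, i.e. within 1.5 kb of it, versus more than 1.5 kb away) can be sketched as follows. This assumes a single chromosome, ignores strand, uses the peak midpoint as the peak position, and treats a peak exactly 1.5 kb from a TSS as a TSS region; these choices and the function names are assumptions rather than details given in the source.

```python
from bisect import bisect_left
from typing import Sequence


def distance_to_nearest_tss(position: int, tss_sorted: Sequence[int]) -> int:
    """Distance (bp) from a position to the closest TSS; tss_sorted must be sorted."""
    if not tss_sorted:
        raise ValueError("no TSS positions supplied")
    i = bisect_left(tss_sorted, position)
    candidates = []
    if i < len(tss_sorted):
        candidates.append(tss_sorted[i] - position)      # closest TSS at or above position
    if i > 0:
        candidates.append(position - tss_sorted[i - 1])  # closest TSS below position
    return min(candidates)


def classify_peak(peak_start: int, peak_end: int, tss_sorted: Sequence[int],
                  half_window: int = 1_500) -> str:
    """'TSS' if the peak midpoint is within half_window bp of a TSS, else 'enhancer'."""
    midpoint = (peak_start + peak_end) // 2
    if distance_to_nearest_tss(midpoint, tss_sorted) <= half_window:
        return "TSS"
    return "enhancer"


# Hypothetical TSS coordinates on one chromosome
print(classify_peak(10_900, 11_100, [10_000, 50_000]))  # TSS
print(classify_peak(29_900, 30_100, [10_000, 50_000]))  # enhancer
```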
6.2.3 Site-specific FAIRE
Five primer sets (BCE5-1 through BCE5-5) spanning the BCE5 region were designed,
each yielding a PCR product of ~150 bp. The primer sequences are listed in Table
S1. FAIRE samples were used to perform quantitative real-time PCR (qPCR) on a
Bio-Rad DNA Engine Opticon Real-Time PCR System (Bio-Rad, Hercules, CA) using
SYBR FAST qPCR Kits from Kapa Biosystems (Woburn, MA).
6.3 ChIP
6.3.1 Chromatin Immunoprecipitation-seq library construction and
sequencing
Chromatin immunoprecipitation (ChIP) assays were performed as previously
described (Jia et al., 2009). Antibodies used were anti-AcH3-K9/K14 (06-599)
(Millipore Corp., Billerica, MA), anti-H3K4me1 (ab8895) (Abcam, Cambridge, MA),
anti-H3K27ac (ab4729) (Abcam, Cambridge, MA), and normal rabbit IgG (sc-2027)
(Santa Cruz Biotechnology, Santa Cruz, CA). Libraries were constructed and sequenced
as described above for FAIRE-seq.
6.3.2 Identification of ChIP-seq peaks
After PCR artifacts and duplicates were removed with Samtools, each BAM file was
filtered using a quality score threshold of 30. To identify epigenomic domains of
histone modification, the MACS program was applied (Zhang et al., 2008), with a
p-value cutoff of 1e-5 to select peaks significantly enriched over input DNA.
6.3.3 Histone modification ChIP-seq data for HMEC
Histone modification ChIP-seq data (H3K4me1, H3K4me2, H3K4me3, H3K9Ac and
H3K27Ac) in HMEC were obtained from accession number [GSE29611] through the
NCBI Gene Expression Omnibus portal. [GSE29611] was published as part of the
ENCODE project. Details of the ChIP assay protocol and data processing are available
at (http://genome.ucsc.edu/cgi-bin/hgTrackUi?db=hg19&g=wgEncodeBroadHistone).
Chromatin State Segmentation HMM data generated from the above ChIP-seq data
were obtained from accession number [GSE38163] and included in the FunciSNP
analyses of regulatory elements. NGS data within 3 kb windows centered on the
annotated transcription start sites of genes were used for TSS regions. For putative
enhancer regions, NGS data > 1.5 kb from TSS were used.
6.4 DNaseI
6.4.1 DNaseI-seq data
DNaseI-seq data in HMEC were obtained from accession number [GSE32970]
through the NCBI Gene Expression Omnibus portal. Additional DNaseI-seq data
generated by the University of Washington as part of the ENCODE project were
downloaded from (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeUwDnase/).
Detailed protocols are available at the following websites
(http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=307403817&c=chr1&g=wgEncodeOpenChromDnase and
http://genome.ucsc.edu/cgi-bin/hgTrackUi?hgsid=307403817&c=chr1&g=wgEncodeUwDnase).
NGS data within 3 kb windows centered on the annotated transcription start sites of
genes were used to define TSS regions for FunciSNP analysis. For putative enhancer
regions, NGS data > 1.5 kb from TSS were used.
6.5 Plasmid construction and Luciferase reporter assays
Eleven potential enhancer regions (~1,200 bp of sequence surrounding
nucleosome-depleted regions containing FunciSNP-identified correlated SNPs) were
amplified from genomic DNA using High Fidelity Platinum Taq DNA polymerase
(Invitrogen Corp., Carlsbad, CA). The amplified sequences were then subcloned
using SacII, EcoRI, BglII or KpnI restriction sites upstream of a thymidine kinase (TK)
minimal promoter-firefly luciferase vector. All clones were confirmed by sequencing.
The primer sequences used for subcloning are listed in Tables 3-10 and 4-1. HMEC,
MCF10A, MDAMB231, and MCF7 cells were transfected with the reporter plasmids
along with the constitutively active pRL-TK Renilla luciferase plasmid (Promega
Corp., Madison, WI) using Lipofectamine LTX Reagent (Invitrogen Corp., Carlsbad, CA)
according to the recommended protocol. Dual luciferase activities were measured as
previously described (Jia et al., 2009).
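Dual luciferase readings are conventionally normalized by dividing firefly activity by the co-transfected Renilla (pRL-TK) activity in each well. The sketch below assumes that normalization and additionally expresses each construct relative to a control vector, assumed here to be the TK minimal promoter-firefly luciferase vector without an insert; that choice of reference, the replicate layout, the function names, and the numbers are hypothetical, because the exact procedure is given here only by reference (Jia et al., 2009).

```python
from statistics import mean
from typing import Sequence


def normalized_activity(firefly: Sequence[float], renilla: Sequence[float]) -> float:
    """Mean firefly/Renilla ratio across paired replicate wells."""
    if len(firefly) != len(renilla) or not firefly:
        raise ValueError("need paired, non-empty replicate readings")
    return mean(f / r for f, r in zip(firefly, renilla))


def fold_over_control(construct_ff: Sequence[float], construct_rl: Sequence[float],
                      control_ff: Sequence[float], control_rl: Sequence[float]) -> float:
    """Normalized construct activity divided by normalized control-vector activity."""
    return (normalized_activity(construct_ff, construct_rl)
            / normalized_activity(control_ff, control_rl))


# Hypothetical triplicate readings for one enhancer construct and the control vector
print(fold_over_control([200, 220, 180], [100, 110, 90],
                        [50, 55, 45], [100, 110, 90]))  # 4.0
```

Normalizing to Renilla first corrects for well-to-well differences in transfection efficiency before constructs are compared.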
6.6 Annotation and Comparison between normal and cancer cells
Identified peaks were analyzed by using mergePeaks and annotatePeaks from
HOMER (http://biowhat.ucsd.edu/homer) (Heinz et al., 2010). Annotated positions
for promoters, exons, introns, intergenic regions and other features were based on
RefSeq transcripts and repeat annotations from the University of California, Santa Cruz.
6.7 Identification of the HMEC specific enhancer loci and
MDAMB231 specific enhancer loci by enhancer status
Differentially enriched H3K4me1 sites were identified from HMEC and MDAMB231
H3K4me1 ChIP-seq tags using findPeaks from HOMER
(http://biowhat.ucsd.edu/homer) (Heinz et al., 2010). To limit false-positive
peaks, a false discovery rate (FDR) of 0.10% was used as the cut-off, and the top 2,000
H3K4me1 sites were selected for the Lost and Gained H3K4me1 sets. The identified
H3K4me1 sites were plotted as heatmaps and line graphs generated with the
seqMINER software (Ye et al., 2011). Poised and active enhancers were
categorized using linear K-means clustering in seqMINER (Ye et al., 2011).
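The selection step above, an FDR filter followed by the top 2,000 sites, can be illustrated with a small Python sketch. It assumes the differential peaks have already been called, for example once with HMEC as target and MDAMB231 as background for Lost sites and once the other way round for Gained sites. It also assumes that each peak carries a fold change and an FDR value. The `DiffPeak` fields are hypothetical and do not reproduce HOMER's output columns.

```python
from typing import NamedTuple

class DiffPeak(NamedTuple):
    peak_id: str
    fold_change: float  # target tags / background tags (hypothetical field)
    fdr: float          # false discovery rate as a fraction; 0.10% == 0.001

def top_sites(peaks, max_fdr=0.001, top_n=2000):
    """Keep peaks passing the FDR cut-off, then return the top_n ranked by fold change."""
    passing = [p for p in peaks if p.fdr <= max_fdr]
    passing.sort(key=lambda p: p.fold_change, reverse=True)
    return passing[:top_n]

peaks = [DiffPeak("a", 5.0, 0.0005), DiffPeak("b", 9.0, 0.01),
         DiffPeak("c", 3.0, 0.0001), DiffPeak("d", 7.0, 0.001)]
print([p.peak_id for p in top_sites(peaks, top_n=2)])  # ['d', 'a']
```

Peak "b" has the largest fold change but is excluded because it fails the FDR cut-off. Ranking is applied only after filtering.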
6.8 Identification of open chromatin nursery regions
The numbers of FAIRE and DNaseI peaks within 1 Mb of index SNPs in intergenic
regions were calculated using the Galaxy "overlapping pieces of intervals" tool
(Giardine et al., 2005).
6.9 Circos for the open chromatin nursery regions
The list of tag SNPs was downloaded from the UCSC Genome Browser
(http://genome.ucsc.edu) and validated against the catalog of published Genome-Wide
Association Studies (http://www.genome.gov/gwastudies/). FAIRE-seq data and SNPs
were visualized genome-wide using Circos software (Krzywinski et al., 2009).
6.10 Gene expression analysis between breast cancer and normal
tissues
We compared gene expression levels between breast cancer and normal tissues
using the Oncomine database (September 2012 release) (Rhodes et al., 2004). This
database contains more than 674 datasets with information on 73,327 tissue
samples, including datasets with over 593 breast cancer samples (2012;
Finak et al., 2008; Ma et al., 2009; Radvanyi et al., 2005; Rhodes et al., 2004;
Richardson et al., 2006; Sorlie et al., 2003). For the differential expression analyses,
t-tests were performed with false discovery rates as the corrected measure of
significance, and the following cut-off thresholds were used: p < 10⁻⁴, fold change
> 2.0, and gene rank within the top 10%. The results of this analysis for the genes in
which high-LD TSS-region SNPs reside are listed in Table 3-3.
6.11 Gene expression analysis between HMEC and MDAMB231 cells
We compared gene expression levels between HMEC and MDAMB231 cells using
Affymetrix HG-U133 Plus 2.0 microarray data (GEO accession GSE33167; D'Amato et
al., 2012). Expression values of individual genes in both cell types were processed
and plotted as bar graphs using GEO2R (Barrett et al., 2013). Log fold changes in
gene expression between the cell groups were plotted as box plots in R (Gentleman
et al., 2004).
6.12 Motif discovery for SNP
In order to annotate SNP effects on regulatory motifs, sets of position weight
matrices (PWMs) from FIMO, HOMER (ChIP-seq known motifs), Genome Trax (ChIP-seq
TFBS) and HaploReg (TRANSFAC, JASPAR and PBM) were used (Grant et al., 2011;
Heinz et al., 2010; Matys et al., 2006; Portales-Casamar et al., 2010; Ward and Kellis,
2012). FIMO analysis was performed using the JASPAR CORE 2009 vertebrate motif
database downloaded from the MEME Suite
(http://tools.genouest.org/tools/meme/meme-download.html) (Grant et al., 2011),
with an output p-value threshold of 10⁻⁴. HOMER findMotifs analysis was performed
using HOMER's known motifs; each motif matrix was built from strong genome-wide
binding sites of the corresponding TF collected from published human ChIP-seq data.
A motif-matrix log-odds score cut-off of 5 was used for the findMotifs analysis.
Predicted ChIP-seq TFBS analysis from Genome Trax was performed with a motif score
cut-off of 0.7; its database contains motif matrices derived from the best-scoring TF
binding sites identified within ChIP-chip or ChIP-seq fragments. For HaploReg, a
stringent threshold of p < 4 × 10⁻⁸ was applied to the PWM score of each motif
instance. The change in log-odds (LOD) score between alleles was calculated and is
listed in Tables 3-4 and 3-7. Each identified motif RE was organized by SNP ID, and
TFs were ranked by the number of SNPs affecting their regulatory motifs (Table 3-7).
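A toy Python example can illustrate two steps: the change in LOD score between alleles, and the ranking of TFs by the number of SNPs that affect their motifs. The example assumes a probability PWM with pseudocounts already applied, so no entry is zero, and a uniform nucleotide background. FIMO and HaploReg also scan both strands and use their own background models and p-value conversion, so this is only a sketch of the idea. The PWM, SNP IDs and TF names are made up.

```python
import math
from collections import Counter

BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}  # assumed uniform background

def log_odds_score(seq, pwm, background=BACKGROUND):
    """Sum over motif positions of log2(P(base | position) / P(base | background))."""
    return sum(math.log2(column[base] / background[base]) for column, base in zip(pwm, seq))

def delta_lod(ref_seq, alt_seq, pwm):
    """Score change when the reference allele is replaced by the alternative allele."""
    return log_odds_score(alt_seq, pwm) - log_odds_score(ref_seq, pwm)

def rank_tfs(snp_tf_hits):
    """Rank TFs by the number of distinct SNPs that alter one of their motifs."""
    counts = Counter(tf for _snp, tf in set(snp_tf_hits))
    return counts.most_common()

# Toy 3-bp motif "AGT" with pseudocounts already applied (made-up numbers).
pwm = [{"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
       {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7}]
print(round(delta_lod("AGT", "ACT", pwm), 3))  # -2.807 (G->C at the middle position)
hits = [("rs1", "TF_A"), ("rs2", "TF_A"), ("rs1", "TF_A"), ("rs3", "TF_B")]
print(rank_tfs(hits))  # [('TF_A', 2), ('TF_B', 1)]
```

A negative change means the alternative allele weakens the predicted site. The duplicated ("rs1", "TF_A") pair is counted once because TFs are ranked by distinct SNPs.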
6.13 Motif discovery for breast epithelial cell type specific
enhancers
In order to find regulatory motifs in enhancers, sets of position weight matrices
(PWMs) from FIMO and Genome Trax were used (Grant et al., 2011; Heinz et al.,
2010; Matys et al., 2006; Portales-Casamar et al., 2010; Ward and Kellis, 2012).
FIMO analysis was performed using the JASPAR CORE 2009 vertebrate motif
database downloaded from the MEME Suite
(http://tools.genouest.org/tools/meme/meme-download.html) (Grant et al., 2011),
with an output p-value threshold of 10⁻⁴. Predicted ChIP-seq TFBS, predicted TFBS
in DNase hypersensitive regions, and TRANSFAC experimentally verified TFBS data
were obtained from Genome Trax. This database contains motif matrices derived from
the best-scoring TF binding sites identified within ChIP-chip or ChIP-seq fragments.
Enrichment of TF motifs in enhancer regions was assessed using chi-square tests
between groups.
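For a comparison of two enhancer groups, the chi-square test reduces to a 2 × 2 contingency table that records how many enhancers in each group do or do not contain a given motif. The minimal sketch below computes Pearson's statistic without continuity correction. It assumes that table layout, and the counts in the example are invented. For one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Rows: enhancer group 1 and group 2; columns: regions with / without the motif.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column must have a non-zero total")
    stat = n * (a * d - b * c) ** 2 / denominator
    # For 1 df, the chi-square survival function equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Invented counts: 30/100 group-1 enhancers vs 10/100 group-2 enhancers carry the motif.
stat, p = chi_square_2x2(30, 70, 10, 90)
print(round(stat, 2), p < 0.001)  # 12.5 True
```

When many motifs are tested, the resulting p-values would normally be corrected for multiple testing before any motif is called enriched.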
6.14 Transcription factor and gene/protein interaction analysis
Using Ingenuity Pathway Analysis (IPA, www.ingenuity.com), we obtained 52 molecules
involved in breast cancer tumorigenesis, as well as information on 18 TFs and 33
genes/proteins. IPA Path Explorer tools were used to identify direct and indirect
interactions among molecules, and IPA Path Designer tools were used to map the
annotated subcellular location of each molecule.
6.15 eQTL analyses
We performed expression quantitative trait locus (eQTL) analyses on the SNPs
identified by FunciSNP to examine whether they are associated with messenger RNA
(mRNA) levels of nearby genes. We used processed eQTL datasets from The Cancer
Genome Atlas (TCGA) breast cancer data for 15 breast cancer risk loci, as well as
publicly available datasets accessed through RegulomeDB (Boyle et al., 2012; Li et al., 2013).
6.16 FunciSNP Method
6.16.1 FunciSNP package (Coetzee et al., 2012)
FunciSNP is an R package developed in-house, freely available from Bioconductor
(Gentleman et al., 2004) and licensed under the GNU General Public License (GPLv3). We
built FunciSNP by integrating the Bioconductor core packages Rsamtools, GGtools,
VariantAnnotation, snpStats and ChIPpeakAnno. To run FunciSNP, GWAS index SNP
information and a set of user-defined NGS peak files (biofeatures) are required.
Any BED-formatted ChIP-seq data set, or any other collection of genomic regions, can
be used as a biofeature.
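One practical detail of the SNP-biofeature intersection is coordinate bookkeeping. BED files use 0-based, half-open intervals, whereas SNP positions from the 1000 Genomes Project are 1-based. FunciSNP performs this step with GenomicRanges. The Python sketch below only illustrates the convention, and its function names are hypothetical.

```python
def read_bed(lines):
    """Parse BED lines into (chrom, start, end) tuples, skipping blank/header lines."""
    regions = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.rstrip("\n").split("\t")
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions

def snp_in_biofeature(chrom, pos_1based, regions):
    """True if a 1-based SNP position lies in any 0-based, half-open BED interval.

    BED [start, end) covers 1-based positions start + 1 .. end.
    """
    return any(c == chrom and start < pos_1based <= end for c, start, end in regions)

bed_lines = ["track name=hypothetical_peaks\n", "chr1\t100\t200\tpeak1\n"]
regions = read_bed(bed_lines)
print(snp_in_biofeature("chr1", 100, regions), snp_in_biofeature("chr1", 101, regions))
# False True
```

An off-by-one error at this step would silently include or exclude SNPs that sit exactly on a peak boundary.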
FunciSNP extracts all available 1000 Genomes SNPs (1kgSNPs) within a user-defined
window around each index SNP (e.g., 1 Mb) directly from the public FTP servers of the
1000 Genomes Project. FunciSNP uses the Bioconductor GenomicRanges package (version
1.6.7) to intersect the 1kgSNPs with the biofeatures, and only 1kgSNPs that overlap a
biofeature are used to calculate the degree of correlation with the index SNP,
measured as r² and D′.
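The two LD measures can be computed from phased haplotypes, as in the sketch below. The sketch assumes biallelic SNPs coded 0/1 and phased haplotypes, with one pair of alleles per chromosome. snpStats, which FunciSNP actually uses, works from genotype data, so this sketch illustrates the definitions rather than that package. The definitions are D = pAB − pA·pB, r² = D² / (pA(1 − pA)pB(1 − pB)) and D′ = |D| / Dmax. The haplotype counts are invented.

```python
def ld_stats(haplotypes):
    """Return (r2, d_prime) for two biallelic SNPs from phased haplotypes.

    haplotypes: list of (allele_at_snp1, allele_at_snp2) pairs coded 0/1,
    one pair per chromosome (two per diploid individual).
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n           # frequency of allele 1 at SNP 1
    p_b = sum(h[1] for h in haplotypes) / n           # frequency of allele 1 at SNP 2
    p_ab = sum(1 for h in haplotypes if h == (1, 1)) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    r2 = d * d / denom
    # Dmax depends on the sign of D.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return r2, abs(d) / d_max

haps = [(1, 1)] * 2 + [(1, 0)] * 2 + [(0, 1)] * 1 + [(0, 0)] * 5  # invented haplotypes
r2, d_prime = ld_stats(haps)
print(round(r2, 3), round(d_prime, 3))  # 0.127 0.444
```

A correlated SNP would pass the r² cut-off of 0.5 used in this chapter only if the first value returned is at least 0.5. The example pair would therefore be discarded even though D′ is moderate.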
The snpStats package (available through Bioconductor) within FunciSNP provides
the mechanism to measure LD. Genomic annotations of the 1kgSNPs that fall within
biofeatures (distance to the nearest TSS, nearest lincRNA, and genomic context, such
as gene bodies and promoters) are also provided using VariantAnnotation
(version 1.0.5), TxDb.Hsapiens.UCSC.hg19.knownGene (version 2.6.2) and
ChIPpeakAnno (version 2.2.0); the latter is used to map the nearest genes and long
non-coding RNAs (lincRNAs). LincRNA information was obtained directly from the
UCSC Genome Table Browser.
Because only the 1kgSNPs overlapping the selected biofeatures are processed,
total FunciSNP execution time is reduced: on average 3.8 s per 10,000 bp window
centered on an index SNP, using a single central processing unit (CPU) core.
FunciSNP outputs its results as an annotated table, which can be exported and used
in other R packages. FunciSNP also contains functions that generate custom plots
and summary tables to help users visualize the data. Most of FunciSNP’s plots are
generated using ggplot2 (http://had.co.nz/ggplot2/).
6.16.2 FunciSNP for the seventy-one breast cancer risk loci
FunciSNP version 0.99 (Coetzee et al., 2012) was used to find correlated SNPs
that coincide with 11 independent ChIP-seq, FAIRE-seq and DNaseI-seq data sets in
TSS regions and putative enhancer regions. All SNPs from the 1000 Genomes
Project (data releases up to May 2012) (2010) residing in 1 Mb windows around each
breast cancer risk index SNP were analyzed in the EUR population (matching the
original GWAS), using an r² cut-off of 0.5 (Tables 3-2 and 3-6).
6.16.3 FunciSNP for two novel breast cancer risk loci; 6q14 and 20q11 (Siddiq
et al., 2012)
In an attempt to identify functionality at the two novel breast cancer risk loci (6q14
and 20q11), we used the open-source R/Bioconductor package FunciSNP version
0.99 (Coetzee et al., 2012), which systematically integrates the 1000 Genomes
Project SNP data (April 2012 data release) with chromatin features of interest. For
each of the two novel breast cancer markers, we analyzed all SNPs within a 1 Mb
window around the index variant that had r² ≥ 0.5 with the index SNP in the 1000
Genomes Project EUR populations. We assessed whether these SNPs were
co-located with 12 different chromatin features generated by next-generation
sequencing technologies, which capture open chromatin regions, promoters and
enhancers genome-wide in HMEC, as well as with known DNaseI hypersensitive
sites, FAIRE-seq peaks and CTCF-binding sites from >100 different cell types
collected by ENCODE (Ernst et al., 2011). We used the UCSC Genome Browser
(http://genome.ucsc.edu/) to display the correlated SNPs that overlap chromatin
features, together with the chromatin feature tracks (Figs. 5-1 and 5-2).
6.16.4 FunciSNP for the three novel ER negative breast cancer risk loci
(Garcia-Closas, 2013)
In an attempt to identify functionality in regions of interest, we used the open-source
R/Bioconductor package FunciSNP (Functional Integration of SNPs) version 0.1.14
(Coetzee et al., 2012), which systematically integrates the 1000 Genomes Project
SNP data (June 2011 data release) with chromatin features of interest. For each of
the four novel ER-negative breast cancer markers, we analyzed all SNPs within a
1 Mb window that were in linkage disequilibrium (r² > 0.5) with the index marker
(according to the 1000 Genomes Project CEU panel). We assessed whether these
SNPs were co-located with 13 different chromatin features generated by
next-generation sequencing technologies, which capture open chromatin regions and
enhancers genome-wide. The open chromatin states (H3K9ac, H3K14ac),
nucleosome-depleted regions (DNaseI, FAIRE), enhancers (H3K4me1) and
active/engaged enhancers (H3K27ac) were either generated by the Coetzee
Laboratory (University of Southern California, Los Angeles, CA) (Rhie et al., under
review) or harvested from the ENCODE project. All chromatin features were
identified in normal human mammary epithelial cells (HMEC) and triple-negative
breast cancer cells (MDAMB231). We used the UCSC Genome Browser
(http://genome.ucsc.edu/) to generate images of the potentially functional SNPs
identified by FunciSNP together with the chromatin feature tracks (Fig. 5-4).
6.16.5 FunciSNP for novel loci associated with Body Mass Index (BMI) (Monda
et al., 2013)
In an attempt to identify functionality in non-coding regions at the three loci, we
utilized FunciSNP version 0.99 (Coetzee et al., 2012), which systematically integrates
the 1000 Genomes SNP data (April 2012 release) with chromatin features of interest.
To capture regulatory elements, we used 73 different chromatin features generated
by next-generation sequencing technologies in brain and adipose tissues from the
NIH Epigenomics Roadmap (Bernstein et al., 2010), as well as known DNaseI
hypersensitive locations, FAIRE-seq peaks and CTCF binding sites from more than
100 different cell types collected by ENCODE (Ernst et al., 2011). All SNPs within a
1 Mb window around each index variant that had r² > 0.5 with the index SNP in the
1000 Genomes Project (1KGP) AFR populations were catalogued. We used the UCSC
Genome Browser (http://genome.ucsc.edu/) to illustrate the correlated SNPs that
overlap chromatin features from these tissues, as well as chromatin features from
seven cell lines used in the ENCODE Project (Figs. 5-5, 5-6 and 5-7). All results
from these analyses are provided in Tables 5-1, 5-2 and 5-3.
6.16.6 FunciSNP for novel loci associated with schizophrenia (SCZ)
(preliminary data from the Psychiatric Genetics Consortium, provided by
James Knowles)
In an attempt to identify functionality in non-coding regions at the three loci, we
utilized FunciSNP version 0.99 (Coetzee et al., 2012), which systematically
integrates the 1000 Genomes SNP data (April 2012 release) with chromatin features
of interest. To capture regulatory elements, we used 73 different chromatin
features generated by next-generation sequencing technologies in brain (substantia
nigra, hippocampus middle, germinal matrix, mid frontal lobe, inferior temporal
lobe, anterior caudate and cingulate gyrus), fetal brain and neurospheres (cortex-derived
and ganglionic eminence-derived neurospheres) from the NIH Epigenomics
Roadmap (Bernstein et al., 2010). All SNPs within a 1 Mb window around each index
variant that had r² > 0.9 with the index SNP in the 1000 Genomes Project (1KGP)
EUR populations were catalogued. We used the UCSC Genome Browser
(http://genome.ucsc.edu/) to illustrate the correlated SNPs that overlap chromatin
features from these tissues (Figs. 5-5, 5-6 and 5-7). All results from these analyses
are provided in Tables 5-4 and 5-5.
6.17 Functional SNP validation
6.17.1 Allele-specific Luciferase reporter assays
Point mutations were introduced to create enhancer-reporter constructs carrying
specific SNP alleles using the QuikChange site-directed mutagenesis kit (Agilent
Technologies Inc., Santa Clara, CA). To avoid bias from miniprep procedures, six
independent clones of each construct were made and confirmed by sequencing.
Each of the six independent clones of each construct was transfected into four
wells, and two luciferase assays were performed per well to record
luciferase-reading variation. Allele-specific fold activities are presented as
means ± SEM of the six independent clones of each allele. Analysis of variance
(ANOVA) was used to confirm the differences, and two-sided p-values between
alleles were calculated using Student's t-test.
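The summary statistics described above can be sketched as follows. The sketch assumes each clone is first reduced to a single fold-activity value, for example the mean of its four wells × two readings, which gives six values per allele. The pooled-variance Student's t statistic is then computed from those values. Converting t to a two-sided p-value requires the t distribution; `scipy.stats.ttest_ind` with its default `equal_var=True` could be used for that step. The clone values below are invented.

```python
import math
from statistics import mean, stdev

def mean_sem(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return mean(values), stdev(values) / math.sqrt(len(values))

def student_t(x, y):
    """Two-sample Student's t statistic with pooled variance; returns (t, degrees_of_freedom)."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    t = (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))
    return t, nx + ny - 2

# Invented per-clone fold activities (six independent clones per allele).
ref_allele = [2.0, 2.2, 1.8, 2.1, 1.9, 2.0]
alt_allele = [1.0, 1.2, 0.8, 1.1, 0.9, 1.0]
print(mean_sem(ref_allele), mean_sem(alt_allele))
print(student_t(ref_allele, alt_allele))
```

Treating clones rather than individual wells as the replicate unit matches the design above, where independent clones were used to avoid miniprep bias.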
6.17.2 Allele-specific FAIRE
PCR reactions were performed on FAIRE-isolated and input DNA using High Fidelity
Platinum Taq DNA polymerase (Invitrogen Corp., Carlsbad, CA) for 15 cycles, after
which the products were purified and re-amplified for 20 cycles to minimize PCR
artifacts. Purified DNA from these reactions was sequenced by the DNA Core Facility
at the University of Southern California using primers near the SNP locations
(Table 4-1). Each experiment was independently performed more than twice.
Chapter 7. Concluding Remarks
Since I have discussed each chapter in detail, I have just a few concluding remarks
here.
o We defined enhancers qualitatively and quantitatively in breast
epithelial cells and found that genome-wide FAIRE and histone
modification signals can be used for this purpose (chapter 2).
o By annotating functionality (exons, TSSs, and enhancers as defined
above) at seventy-one breast cancer risk loci, we identified 1,005
putative functional SNPs (chapter 3). We were not able to test the
effects of all of these SNPs on gene regulation and function.
However, by performing allele-specific enhancer assays on one
enhancer region (BCE5), we were able to show that SNPs residing
in this enhancer were functional (chapter 4). In future genetic
epidemiology studies, the odds ratios of these SNPs should be tested in
so-called fine-mapping analyses, which better define the chromosomal
boundaries of the strongest association signal.
o Some attempts have been made to elucidate functionality using
genetic and epidemiological approaches. For example, Lee et al. recently
performed fine mapping at one of the seventy-one breast cancer risk
loci, 14q24.1. By imputing a 3.93 Mb region flanking the index
SNP rs999737 in Stages 1 and 2 of the CGEMS study (5,692 cases and
5,576 controls), they determined that the initial index SNP, rs999737,
retained the strongest association with breast cancer risk (Lee et al.,
2012). However, as shown in chapter 2, rs999737 is located in an
intron of the RAD51B gene, and there are no promoter or enhancer
histone marks near this SNP. Shen et al. (submitted to Nature
Communications) have reported that one of the ovarian cancer risk SNPs,
confirmed by fine mapping, affects DNA methylation patterns, although
there were no nucleosome-depleted regions or regulatory-element
histone marks near the SNP. It is possible that allele-specific DNA
methylation near risk SNPs could provide some biological
understanding of the hidden etiology of the disease. However, as
shown in section 5.3.2, only a very small number (28) of SNPs
in LD at such risk loci are associated with gene expression. For
instance, at the novel BMI risk locus 5q33, neither the index SNP nor
any of the correlated SNPs was associated with the expression of any
gene. How, then, can one explain disease etiology from such risk SNPs
without apparent functionality? One likely reason may be that the
wrong cell type was analyzed, or that regulatory effects extended to
distant genes (even in trans, on other chromosomes). How does one
study situations like these further? One way would be to destroy
regulatory elements in different cell types and measure the effect on
genome-wide gene expression. This can be achieved by a TALEN approach
(Ding et al., 2013) or in mouse models (Ting et al., 2012).
Alternatively, the sample sizes of the eQTL analyses may have been
too small to detect an association between the risk loci and gene
expression. With larger eQTL sample sizes, such an association
between the SNPs and gene expression might become detectable.
o Functional analyses, as suggested in this thesis, may help to
detect and study risk mechanisms and the genes involved in disease
etiology. Much progress has already been made. Before 1953, we did not
know the structure of DNA. Before 2003, we did not have a complete
sequence of the human genome. It can be argued that at present we may
still know too little to explain the etiology of GWAS risk SNPs. However,
as shown in chapter 2, our understanding of the human genome,
epigenetic mechanisms and gene regulation has increased rapidly in
recent years. For instance, cell-type-specific and ubiquitous regulatory
elements have been identified, and the roles of non-coding RNAs such as
miRNAs and lincRNAs have been described in detail in recent studies
(Su et al., 2012).
o Additionally, I am convinced that we will eventually be able to connect
the dots between genomic studies, functionality and GWAS signals by
fully understanding the underlying biological mechanisms. Such
information will very likely be useful for disease diagnostics,
prevention and treatment strategies.
References
(2008). "World Cancer Report". International Agency for Research on Cancer.
(2010). A map of human genome variation from population-scale sequencing.
Nature 467, 1061-1073.
(2012). Comprehensive molecular portraits of human breast tumours. Nature 490,
61-70.
Adiseshaiah, P., Papaiahgari, S.R., Vuong, H., Kalvakolanu, D.V., and Reddy, S.P.
(2003). Multiple cis-elements mediate the transcriptional activation of human fra-1
by 12-O-tetradecanoylphorbol-13-acetate in bronchial epithelial cells. J Biol Chem
278, 47423-47433.
Adzhubei, I.A., Schmidt, S., Peshkin, L., Ramensky, V.E., Gerasimova, A., Bork, P.,
Kondrashov, A.S., and Sunyaev, S.R. (2010). A method and server for predicting
damaging missense mutations. Nat Methods 7, 248-249.
Ahmad, F., Arad, M., Musi, N., He, H., Wolf, C., Branco, D., Perez-Atayde, A.R.,
Stapleton, D., Bali, D., Xing, Y., et al. (2005). Increased alpha2 subunit-associated
AMPK activity and PRKAG2 cardiomyopathy. Circulation 112, 3140-3148.
Ahmadiyeh, N., Pomerantz, M.M., Grisanzio, C., Herman, P., Jia, L., Almendro, V., He,
H.H., Brown, M., Liu, X.S., Davis, M., et al. (2010). 8q24 prostate, breast, and colon
cancer risk loci show tissue-specific long-range interaction with MYC. Proc Natl
Acad Sci U S A 107, 9742-9746.
Ahmed, S., Thomas, G., Ghoussaini, M., Healey, C.S., Humphreys, M.K., Platte, R.,
Morrison, J., Maranian, M., Pooley, K.A., Luben, R., et al. (2009). Newly discovered
breast cancer susceptibility loci on 3p24 and 17q23.2. Nat Genet 41, 585-590.
Albers, M., Kranz, H., Kober, I., Kaiser, C., Klink, M., Suckow, J., Kern, R., and Koegl, M.
(2005). Automated yeast two-hybrid screening for nuclear receptor-interacting
proteins. Mol Cell Proteomics 4, 205-213.
Altshuler, D.M., Gibbs, R.A., Peltonen, L., Dermitzakis, E., Schaffner, S.F., Yu, F.,
Bonnen, P.E., de Bakker, P.I., Deloukas, P., Gabriel, S.B., et al. (2010). Integrating
common and rare genetic variation in diverse human populations. Nature 467, 52-
58.
Angel, P., and Karin, M. (1991). The role of Jun, Fos and the AP-1 complex in cell-
proliferation and transformation. Biochim Biophys Acta 1072, 129-157.
Antoniou, A.C., Wang, X., Fredericksen, Z.S., McGuffog, L., Tarrell, R., Sinilnikova, O.M.,
Healey, S., Morrison, J., Kartsonaki, C., Lesnick, T., et al. (2010). A locus on 19p13
modifies risk of breast cancer in BRCA1 mutation carriers and is associated with
hormone receptor-negative breast cancer in the general population. Nat Genet 42,
885-892.
Babu, M.M., Luscombe, N.M., Aravind, L., Gerstein, M., and Teichmann, S.A. (2004).
Structure and evolution of transcriptional regulatory networks. Curr Opin Struct
Biol 14, 283-291.
Barrett, J.C., Clayton, D.G., Concannon, P., Akolkar, B., Cooper, J.D., Erlich, H.A., Julier,
C., Morahan, G., Nerup, J., Nierras, C., et al. (2009). Genome-wide association study
and meta-analysis find that over 40 loci affect risk of type 1 diabetes. Nat Genet
41, 703-707.
Barrett, T., Wilhite, S.E., Ledoux, P., Evangelista, C., Kim, I.F., Tomashevsky, M.,
Marshall, K.A., Phillippy, K.H., Sherman, P.M., Holko, M., et al. (2013). NCBI GEO:
archive for functional genomics data sets--update. Nucleic Acids Res 41, D991-995.
Baylin, S.B., and Jones, P.A. (2011). A decade of exploring the cancer epigenome -
biological and translational implications. Nat Rev Cancer 11, 726-734.
Bedell, V.M., Wang, Y., Campbell, J.M., Poshusta, T.L., Starker, C.G., Krug Ii, R.G., Tan,
W., Penheiter, S.G., Ma, A.C., Leung, A.Y., et al. (2012). In vivo genome editing using a
high-efficiency TALEN system. Nature.
Beebe-Dimmer, J.L., Levin, A.M., Ray, A.M., Zuhlke, K.A., Machiela, M.J., Halstead-
Nussloch, B.A., Johnson, G.R., Cooney, K.A., and Douglas, J.A. (2008). Chromosome
8q24 markers: risk of early-onset and familial prostate cancer. Int J Cancer 122,
2876-2879.
Bernstein, B.E., Kamal, M., Lindblad-Toh, K., Bekiranov, S., Bailey, D.K., Huebert, D.J.,
McMahon, S., Karlsson, E.K., Kulbokas, E.J., 3rd, Gingeras, T.R., et al. (2005). Genomic
maps and comparative analysis of histone modifications in human and mouse. Cell
120, 169-181.
Bernstein, B.E., Stamatoyannopoulos, J.A., Costello, J.F., Ren, B., Milosavljevic, A.,
Meissner, A., Kellis, M., Marra, M.A., Beaudet, A.L., Ecker, J.R., et al. (2010). The NIH
Roadmap Epigenomics Mapping Consortium. Nat Biotechnol 28, 1045-1048.
Boffetta, P., and Nyberg, F. (2003). Contribution of environmental factors to cancer
risk. Br Med Bull 68, 71-94.
Bolton, K.L., Tyrer, J., Song, H., Ramus, S.J., Notaridou, M., Jones, C., Sher, T., Gentry-
Maharaj, A., Wozniak, E., Tsai, Y.Y., et al. (2010). Common variants at 19p13 are
associated with susceptibility to ovarian cancer. Nat Genet 42, 880-884.
Boyle, A.P., Hong, E.L., Hariharan, M., Cheng, Y., Schaub, M.A., Kasowski, M.,
Karczewski, K.J., Park, J., Hitz, B.C., Weng, S., et al. (2012). Annotation of functional
variation in personal genomes using RegulomeDB. Genome Res 22, 1790-1797.
Brachner, A., Braun, J., Ghodgaonkar, M., Castor, D., Zlopasa, L., Ehrlich, V., Jiricny, J.,
Gotzmann, J., Knasmuller, S., and Foisner, R. (2012). The endonuclease Ankle1
requires its LEM and GIY-YIG motifs for DNA cleavage in vivo. J Cell Sci 125, 1048-
1057.
Burwinkel, B., Scott, J.W., Buhrer, C., van Landeghem, F.K., Cox, G.F., Wilson, C.J.,
Grahame Hardie, D., and Kilimann, M.W. (2005). Fatal congenital heart glycogenosis
caused by a recurrent activating R531Q mutation in the gamma 2-subunit of AMP-
activated protein kinase (PRKAG2), not by phosphorylase kinase deficiency. Am J
Hum Genet 76, 1034-1049.
Campeau, P.M., Foulkes, W.D., and Tischkowitz, M.D. (2008). Hereditary breast
cancer: new genetic developments, new therapeutic avenues. Hum Genet 124, 31-42.
Caporaso, N., Gu, F., Chatterjee, N., Sheng-Chih, J., Yu, K., Yeager, M., Chen, C., Jacobs,
K., Wheeler, W., Landi, M.T., et al. (2009). Genome-wide and candidate gene
association study of cigarette smoking behaviors. PLoS One 4, e4653.
Carey, L.A., Perou, C.M., Livasy, C.A., Dressler, L.G., Cowan, D., Conway, K., Karaca, G.,
Troester, M.A., Tse, C.K., Edmiston, S., et al. (2006). Race, breast cancer subtypes, and
survival in the Carolina Breast Cancer Study. JAMA 295, 2492-2502.
Cheng, L., Tachibana, K., Zhang, Y., Guo, J., Kahori Tachibana, K., Kameyama, A., Wang,
H., Hiruma, T., Iwasaki, H., Togayachi, A., et al. (2002). Characterization of a novel
human UDP-GalNAc transferase, pp-GalNAc-T10. FEBS Lett 531, 115-121.
Chevillard, G., and Blank, V. (2011). NFE2L3 (NRF3): the Cinderella of the
Cap'n'Collar transcription factors. Cell Mol Life Sci 68, 3337-3348.
Clerc, P., Coll Constans, M.G., Lulka, H., Broussaud, S., Guigne, C., Leung-Theung-Long,
S., Perrin, C., Knauf, C., Carpene, C., Penicaud, L., et al. (2007). Involvement of
cholecystokinin 2 receptor in food intake regulation: hyperphagia and increased fat
deposition in cholecystokinin 2 receptor-deficient mice. Endocrinology 148, 1039-
1049.
Coetzee, S.G., Rhie, S.K., Berman, B.P., Coetzee, G.A., and Noushmehr, H. (2012).
FunciSNP: an R/bioconductor tool integrating functional non-coding data sets with
genetic association studies to identify candidate regulatory SNPs. Nucleic Acids Res.
Cowper-Sal Lari, R., Zhang, X., Wright, J.B., Bailey, S.D., Cole, M.D., Eeckhoute, J.,
Moore, J.H., and Lupien, M. (2012). Breast cancer risk-associated SNPs modulate the
affinity of chromatin for FOXA1 and alter gene expression. Nat Genet.
Curtis, C., Shah, S.P., Chin, S.F., Turashvili, G., Rueda, O.M., Dunning, M.J., Speed, D.,
Lynch, A.G., Samarajiwa, S., Yuan, Y., et al. (2012). The genomic and transcriptomic
architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346-352.
D'Amato, N.C., Ostrander, J.H., Bowie, M.L., Sistrunk, C., Borowsky, A., Cardiff, R.D.,
Bell, K., Young, L.J., Simin, K., Bachelder, R.E., et al. (2012). Evidence for phenotypic
plasticity in aggressive triple-negative breast cancer: human biology is recapitulated
by a novel model system. PLoS One 7, e45684.
Daniel, T., and Carling, D. (2002). Functional analysis of mutations in the gamma 2
subunit of AMP-activated protein kinase associated with cardiac hypertrophy and
Wolff-Parkinson-White syndrome. J Biol Chem 277, 51017-51024.
Database, G.S.G.t.S.B.C. http://www.itb.cnr.it/breastcancer/index.html.
Dimas, A.S., Deutsch, S., Stranger, B.E., Montgomery, S.B., Borel, C., Attar-Cohen, H.,
Ingle, C., Beazley, C., Gutierrez Arcelus, M., Sekowska, M., et al. (2009). Common
regulatory variation impacts gene expression in a cell type-dependent manner.
Science 325, 1246-1250.
Ding, Q., Lee, Y.K., Schaefer, E.A., Peters, D.T., Veres, A., Kim, K., Kuperwasser, N.,
Motola, D.L., Meissner, T.B., Hendriks, W.T., et al. (2013). A TALEN Genome-Editing
System for Generating Human Stem Cell-Based Disease Models. Cell Stem Cell 12,
238-251.
Dreszer, T.R., Karolchik, D., Zweig, A.S., Hinrichs, A.S., Raney, B.J., Kuhn, R.M., Meyer,
L.R., Wong, M., Sloan, C.A., Rosenbloom, K.R., et al. (2012). The UCSC Genome
Browser database: extensions and updates 2011. Nucleic Acids Res 40, D918-923.
Easton, D.F., Pooley, K.A., Dunning, A.M., Pharoah, P.D., Thompson, D., Ballinger, D.G.,
Struewing, J.P., Morrison, J., Field, H., Luben, R., et al. (2007). Genome-wide
association study identifies novel breast cancer susceptibility loci. Nature 447, 1087-
1093.
Ecker, J.R., Bickmore, W.A., Barroso, I., Pritchard, J.K., Gilad, Y., and Segal, E. (2012).
Genomics: ENCODE explained. Nature 489, 52-55.
Elsheikh, S.E., Green, A.R., Rakha, E.A., Powe, D.G., Ahmed, R.A., Collins, H.M., Soria,
D., Garibaldi, J.M., Paish, C.E., Ammar, A.A., et al. (2009). Global histone modifications
in breast cancer correlate with tumor phenotypes, prognostic factors, and patient
outcome. Cancer Res 69, 3802-3809.
Ernst, J., Kheradpour, P., Mikkelsen, T.S., Shoresh, N., Ward, L.D., Epstein, C.B., Zhang,
X., Wang, L., Issner, R., Coyne, M., et al. (2011). Mapping and analysis of chromatin
state dynamics in nine human cell types. Nature 473, 43-49.
Eswarakumar, V.P., Monsonego-Ornan, E., Pines, M., Antonopoulou, I., Morriss-Kay,
G.M., and Lonai, P. (2002). The IIIc alternative of Fgfr2 is a positive regulator of bone
formation. Development 129, 3783-3793.
Evert, B.O., Wullner, U., and Klockgether, T. (2000). Cell death in polyglutamine
diseases. Cell Tissue Res 301, 189-204.
Finak, G., Bertos, N., Pepin, F., Sadekova, S., Souleimanova, M., Zhao, H., Chen, H.,
Omeroglu, G., Meterissian, S., Omeroglu, A., et al. (2008). Stromal gene expression
predicts clinical outcome in breast cancer. Nat Med 14, 518-527.
Fletcher, O., Johnson, N., Orr, N., Hosking, F.J., Gibson, L.J., Walker, K., Zelenika, D.,
Gut, I., Heath, S., Palles, C., et al. (2011). Novel breast cancer susceptibility locus at
9q31.2: results of a genome-wide association study. J Natl Cancer Inst 103, 425-435.
Fogh, J., Fogh, J.M., and Orfeo, T. (1977). One hundred and twenty-seven cultured
human tumor cell lines producing tumors in nude mice. J Natl Cancer Inst 59, 221-
226.
Ford, D., Easton, D.F., and Peto, J. (1995). Estimates of the gene frequency of BRCA1
and its contribution to breast and ovarian cancer incidence. Am J Hum Genet 57,
1457-1462.
Foulkes, W.D. (2008). Inherited susceptibility to common cancers. N Engl J Med 359,
2143-2153.
Frayling, T.M., Timpson, N.J., Weedon, M.N., Zeggini, E., Freathy, R.M., Lindgren, C.M.,
Perry, J.R., Elliott, K.S., Lango, H., Rayner, N.W., et al. (2007). A common variant in the
FTO gene is associated with body mass index and predisposes to childhood and
adult obesity. Science 316, 889-894.
Freedman, M.L., Haiman, C.A., Patterson, N., McDonald, G.J., Tandon, A., Waliszewska,
A., Penney, K., Steen, R.G., Ardlie, K., John, E.M., et al. (2006). Admixture mapping
identifies 8q24 as a prostate cancer risk locus in African-American men. Proc Natl
Acad Sci U S A 103, 14068-14073.
Freedman, M.L., Monteiro, A.N., Gayther, S.A., Coetzee, G.A., Risch, A., Plass, C., Casey,
G., De Biasi, M., Carlson, C., Duggan, D., et al. (2011). Principles for the post-GWAS
functional characterization of cancer risk loci. Nat Genet 43, 513-518.
Garcia-Closas, M., Couch, F.J., Lindstrom, S., Michailidou, K., Schmidt, M.K., Orr, N.,
Rhie, S.K., Riboli, E., Feigelson, H.S., Le Marchand, L., et al. (2013). Genome-wide
association studies identify four ER-negative specific breast cancer risk loci. Nat
Genet 45, 392-398.
Gentleman, R.C., Carey, V.J., Bates, D.M., Bolstad, B., Dettling, M., Dudoit, S., Ellis, B.,
Gautier, L., Ge, Y., Gentry, J., et al. (2004). Bioconductor: open software development
for computational biology and bioinformatics. Genome Biol 5, R80.
Ghoussaini, M., Fletcher, O., Michailidou, K., Turnbull, C., Schmidt, M.K., Dicks, E.,
Dennis, J., Wang, Q., Humphreys, M.K., Luccarini, C., et al. (2013). Genome-wide
association analysis identifies three new breast cancer susceptibility loci. Nat Genet
44, 312-318.
Ghoussaini, M., Song, H., Koessler, T., Al Olama, A.A., Kote-Jarai, Z., Driver, K.E.,
Pooley, K.A., Ramus, S.J., Kjaer, S.K., Hogdall, E., et al. (2008). Multiple loci with
different cancer specificities within the 8q24 gene desert. J Natl Cancer Inst 100,
962-966.
Giardine, B., Riemer, C., Hardison, R.C., Burhans, R., Elnitski, L., Shah, P., Zhang, Y.,
Blankenberg, D., Albert, I., Taylor, J., et al. (2005). Galaxy: a platform for interactive
large-scale genome analysis. Genome Res 15, 1451-1455.
Gibson, G. (2011). Rare and common variants: twenty arguments. Nat Rev Genet 13,
135-145.
Giresi, P.G., Kim, J., McDaniell, R.M., Iyer, V.R., and Lieb, J.D. (2007). FAIRE
(Formaldehyde-Assisted Isolation of Regulatory Elements) isolates active regulatory
elements from human chromatin. Genome Res 17, 877-885.
Gollob, M.H., Green, M.S., Tang, A.S., and Roberts, R. (2002). PRKAG2 cardiac
syndrome: familial ventricular preexcitation, conduction system disease, and
cardiac hypertrophy. Curr Opin Cardiol 17, 229-234.
Graeser, M.K., Engel, C., Rhiem, K., Gadzicki, D., Bick, U., Kast, K., Froster, U.G.,
Schlehe, B., Bechtold, A., Arnold, N., et al. (2009). Contralateral breast cancer risk in
BRCA1 and BRCA2 mutation carriers. J Clin Oncol 27, 5887-5892.
Graham, K., de las Morenas, A., Tripathi, A., King, C., Kavanah, M., Mendez, J., Stone,
M., Slama, J., Miller, M., Antoine, G., et al. (2010). Gene expression in histologically
normal epithelium from breast cancer patients and from cancer-free prophylactic
mastectomy patients shares a similar profile. Br J Cancer 102, 1284-1293.
Grant, C.E., Bailey, T.L., and Noble, W.S. (2011). FIMO: scanning for occurrences of a
given motif. Bioinformatics 27, 1017-1018.
Gruber, S.B., Moreno, V., Rozek, L.S., Rennerts, H.S., Lejbkowicz, F., Bonner, J.D.,
Greenson, J.K., Giordano, T.J., Fearson, E.R., and Rennert, G. (2007). Genetic variation
in 8q24 associated with risk of colorectal cancer. Cancer Biol Ther 6, 1143-1147.
Gudmundsson, J., Sulem, P., Manolescu, A., Amundadottir, L.T., Gudbjartsson, D.,
Helgason, A., Rafnar, T., Bergthorsson, J.T., Agnarsson, B.A., Baker, A., et al. (2007).
Genome-wide association study identifies a second prostate cancer susceptibility
variant at 8q24. Nat Genet 39, 631-637.
Guilford, P., Hopkins, J., Harraway, J., McLeod, M., McLeod, N., Harawira, P., Taite, H.,
Scoular, R., Miller, A., and Reeve, A.E. (1998). E-cadherin germline mutations in
familial gastric cancer. Nature 392, 402-405.
Haiman, C.A., Chen, G.K., Vachon, C.M., Canzian, F., Dunning, A., Millikan, R.C., Wang,
X., Ademuyiwa, F., Ahmed, S., Ambrosone, C.B., et al. (2011). A common variant at
the TERT-CLPTM1L locus is associated with estrogen receptor-negative breast
cancer. Nat Genet 43, 1210-1214.
Haiman, C.A., Patterson, N., Freedman, M.L., Myers, S.R., Pike, M.C., Waliszewska, A.,
Neubauer, J., Tandon, A., Schirmer, C., McDonald, G.J., et al. (2007). Multiple regions
within 8q24 independently affect risk for prostate cancer. Nat Genet 39, 638-644.
Hall, J.M., Lee, M.K., Newman, B., Morrow, J.E., Anderson, L.A., Huey, B., and King, M.C.
(1990). Linkage of early-onset familial breast cancer to chromosome 17q21. Science
250, 1684-1689.
Hansson, A., Manetopoulos, C., Jonsson, J.I., and Axelson, H. (2003). The basic helix-
loop-helix transcription factor TAL1/SCL inhibits the expression of the p16INK4A
and pTalpha genes. Biochem Biophys Res Commun 312, 1073-1081.
Hardison, R.C. (2012). Genome-wide Epigenetic Data Facilitate Understanding of
Disease Susceptibility Association Studies. J Biol Chem 287, 30932-30940.
He, H.H., Meyer, C.A., Chen, M.W., Jordan, V.C., Brown, M., and Liu, X.S. (2012).
Differential DNase I hypersensitivity reveals factor-dependent chromatin dynamics.
Genome Res 22, 1015-1025.
Heaney, J.D., Michelson, M.V., Youngren, K.K., Lam, M.Y., and Nadeau, J.H. (2009).
Deletion of eIF2beta suppresses testicular cancer incidence and causes recessive
lethality in agouti-yellow mice. Hum Mol Genet 18, 1395-1404.
Heid, I.M., Jackson, A.U., Randall, J.C., Winkler, T.W., Qi, L., Steinthorsdottir, V.,
Thorleifsson, G., Zillikens, M.C., Speliotes, E.K., Magi, R., et al. (2010). Meta-analysis
identifies 13 new loci associated with waist-hip ratio and reveals sexual dimorphism
in the genetic basis of fat distribution. Nat Genet 42, 949-960.
Heintzman, N.D., Stuart, R.K., Hon, G., Fu, Y., Ching, C.W., Hawkins, R.D., Barrera, L.O.,
Van Calcar, S., Qu, C., Ching, K.A., et al. (2007). Distinct and predictive chromatin
signatures of transcriptional promoters and enhancers in the human genome. Nat
Genet 39, 311-318.
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y.C., Laslo, P., Cheng, J.X., Murre, C.,
Singh, H., and Glass, C.K. (2010). Simple combinations of lineage-determining
transcription factors prime cis-regulatory elements required for macrophage and B
cell identities. Mol Cell 38, 576-589.
Hemminki, A., Markie, D., Tomlinson, I., Avizienyte, E., Roth, S., Loukola, A., Bignell,
G., Warren, W., Aminoff, M., Hoglund, P., et al. (1998). A serine/threonine kinase
gene defective in Peutz-Jeghers syndrome. Nature 391, 184-187.
Hemminki, A., Tomlinson, I., Markie, D., Jarvinen, H., Sistonen, P., Bjorkqvist, A.M.,
Knuutila, S., Salovaara, R., Bodmer, W., Shibata, D., et al. (1997). Localization of a
susceptibility locus for Peutz-Jeghers syndrome to 19p using comparative genomic
hybridization and targeted linkage analysis. Nat Genet 15, 87-90.
Herman, J.G., Merlo, A., Mao, L., Lapidus, R.G., Issa, J.P., Davidson, N.E., Sidransky, D.,
and Baylin, S.B. (1995). Inactivation of the CDKN2/p16/MTS1 gene is frequently
associated with aberrant DNA methylation in all common human cancers. Cancer
Res 55, 4525-4530.
Howell, G.R., Shindo, M., Murray, S., Gridley, T., Wilson, L.A., and Schimenti, J.C.
(2007). Mutation of a ubiquitously expressed mouse transmembrane protein
(Tapt1) causes specific skeletal homeotic transformations. Genetics 175, 699-707.
Hulka, B.S., and Stark, A.T. (1995). Breast cancer: cause and prevention. Lancet 346,
883-887.
Hunter, D.J., Kraft, P., Jacobs, K.B., Cox, D.G., Yeager, M., Hankinson, S.E., Wacholder,
S., Wang, Z., Welch, R., Hutchinson, A., et al. (2007). A genome-wide association
study identifies alleles in FGFR2 associated with risk of sporadic postmenopausal
breast cancer. Nat Genet 39, 870-874.
Jenne, D.E., Reimann, H., Nezu, J., Friedel, W., Loff, S., Jeschke, R., Muller, O., Back, W.,
and Zimmer, M. (1998). Peutz-Jeghers syndrome is caused by mutations in a novel
serine threonine kinase. Nat Genet 18, 38-43.
Jia, L., Berman, B.P., Jariwala, U., Yan, X., Cogan, J.P., Walters, A., Chen, T., Buchanan,
G., Frenkel, B., and Coetzee, G.A. (2008). Genomic androgen receptor-occupied
regions with different functions, defined by histone acetylation, coregulators and
transcriptional capacity. PLoS One 3, e3645.
Jia, L., Landan, G., Pomerantz, M., Jaschek, R., Herman, P., Reich, D., Yan, C., Khalid, O.,
Kantoff, P., Oh, W., et al. (2009). Functional enhancers at the gene-poor 8q24 cancer-
linked locus. PLoS Genet 5, e1000597.
Kalhor, R., Tjong, H., Jayathilaka, N., Alber, F., and Chen, L. (2012). Genome
architectures revealed by tethered chromosome conformation capture and
population-based modeling. Nat Biotechnol 30, 90-98.
Kathiresan, S., Willer, C.J., Peloso, G.M., Demissie, S., Musunuru, K., Schadt, E.E.,
Kaplan, L., Bennett, D., Li, Y., Tanaka, T., et al. (2009). Common variants at 30 loci
contribute to polygenic dyslipidemia. Nat Genet 41, 56-65.
Kim, H.W., Ha, S.H., Lee, M.N., Huston, E., Kim, D.H., Jang, S.K., Suh, P.G., Houslay, M.D.,
and Ryu, S.H. (2010). Cyclic AMP controls mTOR through regulation of the dynamic
interaction between Rheb and phosphodiesterase 4D. Mol Cell Biol 30, 5406-5420.
Kimura, A., Matsubara, K., and Horikoshi, M. (2005). A decade of histone acetylation:
marking eukaryotic chromosomes with specific codes. J Biochem 138, 647-662.
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., Jones, S.J.,
and Marra, M.A. (2009). Circos: an information aesthetic for comparative genomics.
Genome Res 19, 1639-1645.
Kumar, P., Henikoff, S., and Ng, P.C. (2009). Predicting the effects of coding non-
synonymous variants on protein function using the SIFT algorithm. Nat Protoc 4,
1073-1081.
Lee, P., Fu, Y.P., Figueroa, J.D., Prokunina-Olsson, L., Gonzalez-Bosquet, J., Kraft, P.,
Wang, Z., Jacobs, K.B., Yeager, M., Horner, M.J., et al. (2012). Fine mapping of 14q24.1
breast cancer susceptibility locus. Hum Genet 131, 479-490.
Lemonnier, J., Hay, E., Delannoy, P., Fromigue, O., Lomri, A., Modrowski, D., and
Marie, P.J. (2001). Increased osteoblast apoptosis in apert craniosynostosis: role of
protein kinase C and interleukin-1. Am J Pathol 158, 1833-1842.
Li, G., Ruan, X., Auerbach, R.K., Sandhu, K.S., Zheng, M., Wang, P., Poh, H.M., Goh, Y.,
Lim, J., Zhang, J., et al. (2012). Extensive promoter-centered chromatin interactions
provide a topological basis for transcription regulation. Cell 148, 84-98.
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis,
G., and Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools.
Bioinformatics 25, 2078-2079.
Li, Q., Seo, J.H., Stranger, B., McKenna, A., Pe'er, I., Laframboise, T., Brown, M.,
Tyekucheva, S., and Freedman, M.L. (2013). Integrative eQTL-Based Analyses Reveal
the Biology of Breast Cancer Risk Loci. Cell 152, 633-641.
Lichtenstein, P., Holm, N.V., Verkasalo, P.K., Iliadou, A., Kaprio, J., Koskenvuo, M.,
Pukkala, E., Skytthe, A., and Hemminki, K. (2000). Environmental and heritable
factors in the causation of cancer--analyses of cohorts of twins from Sweden,
Denmark, and Finland. N Engl J Med 343, 78-85.
Linger, R.J., and Kruk, P.A. (2010). BRCA1 16 years later: risk-associated BRCA1
mutations and their functional implications. FEBS J 277, 3086-3096.
Liu, Y., Zhong, X., Li, W., Brattain, M.G., and Banerji, S.S. (2000). The role of Sp1 in the
differential expression of transforming growth factor-beta receptor type II in human
breast adenocarcinoma MCF-7 cells. J Biol Chem 275, 12231-12236.
Ma, X.J., Dahiya, S., Richardson, E., Erlander, M., and Sgroi, D.C. (2009). Gene
expression profiling of the tumor microenvironment during breast cancer
progression. Breast Cancer Res 11, R7.
Mann, M., Cortez, V., and Vadlamudi, R.K. (2011). Epigenetics of Estrogen Receptor
Signaling: Role in Hormonal Cancer Progression and Therapy. Cancers (Basel) 3,
1691-1707.
Martin, A.M., and Weber, B.L. (2000). Genetic and hormonal risk factors in breast
cancer. J Natl Cancer Inst 92, 1126-1135.
Matys, V., Kel-Margoulis, O.V., Fricke, E., Liebich, I., Land, S., Barre-Dirrie, A., Reuter,
I., Chekmenev, D., Krull, M., Hornischer, K., et al. (2006). TRANSFAC and its module
TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 34,
D108-110.
Maurano, M.T., Humbert, R., Rynes, E., Thurman, R.E., Haugen, E., Wang, H., Reynolds,
A.P., Sandstrom, R., Qu, H., Brody, J., et al. (2012). Systematic localization of common
disease-associated variation in regulatory DNA. Science 337, 1190-1195.
Mavaddat, N., Antoniou, A.C., Easton, D.F., and Garcia-Closas, M. (2010). Genetic
susceptibility to breast cancer. Mol Oncol 4, 174-191.
Meyer, A., Schurmann, P., Ghahremani, M., Kocak, E., Brinkhaus, M.J., Bremer, M.,
Karstens, J.H., Hagemann, J., Machtens, S., and Dork, T. (2009). Association of
chromosomal locus 8q24 and risk of prostate cancer: a hospital-based study of
German patients treated with brachytherapy. Urol Oncol 27, 373-376.
Meyer, K.B., Maia, A.T., O'Reilly, M., Ghoussaini, M., Prathalingam, R., Porter-Gill, P.,
Ambs, S., Prokunina-Olsson, L., Carroll, J., and Ponder, B.A. (2011). A functional
variant at a prostate cancer predisposition locus at 8q24 is associated with PVT1
expression. PLoS Genet 7, e1002165.
Michailidou, K., Hall, P., Gonzalez-Neira, A., Ghoussaini, M., Dennis, J., Milne, R.L.,
Schmidt, M.K., Chang-Claude, J., et al. (2013). Large-scale genotyping identifies 41
new loci associated with breast cancer risk. Nat Genet 45, 353-361.
Mikkelsen, T.S., Ku, M., Jaffe, D.B., Issac, B., Lieberman, E., Giannoukos, G., Alvarez, P.,
Brockman, W., Kim, T.K., Koche, R.P., et al. (2007). Genome-wide maps of chromatin
state in pluripotent and lineage-committed cells. Nature 448, 553-560.
Monda, K.L., Chen, G.K., Taylor, K.C., Palmer, C., Edwards, T.L., Lange, L.A., Ng, M.C.Y.,
Adeyemo, A.A., Allison, M.A., Bielak, L.F., et al. (2013). A meta-analysis identifies new
loci associated with body mass index in individuals of African ancestry. Nat Genet.
Myers, A.J., Gibbs, J.R., Webster, J.A., Rohrer, K., Zhao, A., Marlowe, L., Kaleem, M.,
Leung, D., Bryden, L., Nath, P., et al. (2007). A survey of genetic human cortical gene
expression. Nat Genet 39, 1494-1499.
Nelen, M.R., Padberg, G.W., Peeters, E.A., Lin, A.Y., van den Helm, B., Frants, R.R.,
Coulon, V., Goldstein, A.M., van Reen, M.M., Easton, D.F., et al. (1996). Localization of
the gene for Cowden disease to chromosome 10q22-23. Nat Genet 13, 114-116.
Nelen, M.R., van Staveren, W.C., Peeters, E.A., Hassel, M.B., Gorlin, R.J., Hamm, H.,
Lindboe, C.F., Fryns, J.P., Sijmons, R.H., Woods, D.G., et al. (1997). Germline
mutations in the PTEN/MMAC1 gene in patients with Cowden disease. Hum Mol
Genet 6, 1383-1387.
Neph, S., Vierstra, J., Stergachis, A.B., Reynolds, A.P., Haugen, E., Vernot, B., Thurman,
R.E., John, S., Sandstrom, R., Johnson, A.K., et al. (2012). An expansive human
regulatory lexicon encoded in transcription factor footprints. Nature 489, 83-90.
Nica, A.C., Parts, L., Glass, D., Nisbet, J., Barrett, A., Sekowska, M., Travers, M., Potter,
S., Grundberg, E., Small, K., et al. (2011). The architecture of gene regulatory
variation across multiple human tissues: the MuTHER study. PLoS Genet 7,
e1002003.
Nielsen, T.O., Hsu, F.D., Jensen, K., Cheang, M., Karaca, G., Hu, Z., Hernandez-
Boussard, T., Livasy, C., Cowan, D., Dressler, L., et al. (2004). Immunohistochemical
and clinical characterization of the basal-like subtype of invasive breast carcinoma.
Clin Cancer Res 10, 5367-5374.
Ong, C.T., and Corces, V.G. (2011). Enhancer function: new insights into the
regulation of tissue-specific gene expression. Nat Rev Genet 12, 283-293.
Ono, Y., Fukuhara, N., and Yoshie, O. (1998). TAL1 and LIM-only proteins
synergistically induce retinaldehyde dehydrogenase 2 expression in T-cell acute
lymphoblastic leukemia by acting as cofactors for GATA3. Mol Cell Biol 18, 6939-
6950.
Orom, U.A., Derrien, T., Beringer, M., Gumireddy, K., Gardini, A., Bussotti, G., Lai, F.,
Zytnicki, M., Notredame, C., Huang, Q., et al. (2010). Long noncoding RNAs with
enhancer-like function in human cells. Cell 143, 46-58.
Ortega, F.J., Moreno-Navarrete, J.M., Pardo, G., Sabater, M., Hummel, M., Ferrer, A.,
Rodriguez-Hermosa, J.I., Ruiz, B., Ricart, W., Peral, B., et al. (2010). MiRNA
expression profile of human subcutaneous adipose and during adipocyte
differentiation. PLoS One 5, e9022.
Painter, J.N., Anderson, C.A., Nyholt, D.R., Macgregor, S., Lin, J., Lee, S.H., Lambert, A.,
Zhao, Z.Z., Roseman, F., Guo, Q., et al. (2011). Genome-wide association study
identifies a locus at 7p15.2 associated with endometriosis. Nat Genet 43, 51-54.
Palii, C.G., Perez-Iratxeta, C., Yao, Z., Cao, Y., Dai, F., Davison, J., Atkins, H., Allan, D.,
Dilworth, F.J., Gentleman, R., et al. (2011). Differential genomic targeting of the
transcription factor TAL1 in alternate haematopoietic lineages. EMBO J 30, 494-509.
Peng, S., Lu, B., Ruan, W., Zhu, Y., Sheng, H., and Lai, M. (2011). Genetic
polymorphisms and breast cancer risk: evidence from meta-analyses, pooled
analyses, and genome-wide association studies. Breast Cancer Res Treat 127, 309-
324.
Perou, C.M., Sorlie, T., Eisen, M.B., van de Rijn, M., Jeffrey, S.S., Rees, C.A., Pollack, J.R.,
Ross, D.T., Johnsen, H., Akslen, L.A., et al. (2000). Molecular portraits of human
breast tumours. Nature 406, 747-752.
Persani, L., Lania, A., Alberti, L., Romoli, R., Mantovani, G., Filetti, S., Spada, A., and
Conti, M. (2000). Induction of specific phosphodiesterase isoforms by constitutive
activation of the cAMP pathway in autonomous thyroid adenomas. J Clin Endocrinol
Metab 85, 2872-2878.
Peter, A.K., Ko, C.Y., Kim, M.H., Hsu, N., Ouchi, N., Rhie, S., Izumiya, Y., Zeng, L., Walsh,
K., and Crosbie, R.H. (2009). Myogenic Akt signaling upregulates the utrophin-
glycoprotein complex and promotes sarcolemma stability in muscular dystrophy.
Hum Mol Genet 18, 318-327.
Phillips, A.C., and Vousden, K.H. (2000). Acetyltransferases and tumour suppression.
Breast Cancer Res 2, 244-246.
Pomerantz, M.M., Ahmadiyeh, N., Jia, L., Herman, P., Verzi, M.P., Doddapaneni, H.,
Beckwith, C.A., Chan, J.A., Hills, A., Davis, M., et al. (2009). The 8q24 cancer risk
variant rs6983267 shows long-range interaction with MYC in colorectal cancer. Nat
Genet 41, 882-884.
Pomerantz, M.M., Shrestha, Y., Flavin, R.J., Regan, M.M., Penney, K.L., Mucci, L.A.,
Stampfer, M.J., Hunter, D.J., Chanock, S.J., Schafer, E.J., et al. (2010). Analysis of the
10q11 cancer risk locus implicates MSMB and NCOA4 in human prostate
tumorigenesis. PLoS Genet 6, e1001204.
Portales-Casamar, E., Thongjuea, S., Kwon, A.T., Arenillas, D., Zhao, X., Valen, E.,
Yusuf, D., Lenhard, B., Wasserman, W.W., and Sandelin, A. (2010). JASPAR 2010: the
greatly expanded open-access database of transcription factor binding profiles.
Nucleic Acids Res 38, D105-110.
Prat, A., and Perou, C.M. (2009). Mammary development meets cancer genomics. Nat
Med 15, 842-844.
Prescott, J., Jariwala, U., Jia, L., Cogan, J.P., Barski, A., Pregizer, S., Shen, H.C.,
Arasheben, A., Neilson, J.J., Frenkel, B., et al. (2007). Androgen receptor-mediated
repression of novel target genes. Prostate 67, 1371-1383.
Qi, W., Wu, H., Yang, L., Boyd, D.D., and Wang, Z. (2007). A novel function of
caspase-8 in the regulation of androgen-receptor-driven gene expression. EMBO J 26, 65-75.
Radvanyi, L., Singh-Sandhu, D., Gallichan, S., Lovitt, C., Pedyczak, A., Mallo, G., Gish, K.,
Kwok, K., Hanna, W., Zubovits, J., et al. (2005). The gene associated with
trichorhinophalangeal syndrome in humans is overexpressed in breast cancer. Proc
Natl Acad Sci U S A 102, 11005-11010.
Rao, M.A., Cheng, H., Quayle, A.N., Nishitani, H., Nelson, C.C., and Rennie, P.S. (2002).
RanBPM, a nuclear protein that interacts with and regulates transcriptional activity
of androgen receptor and glucocorticoid receptor. J Biol Chem 277, 48020-48027.
Reich, D.E., Cargill, M., Bolk, S., Ireland, J., Sabeti, P.C., Richter, D.J., Lavery, T.,
Kouyoumjian, R., Farhadian, S.F., Ward, R., et al. (2001). Linkage disequilibrium in
the human genome. Nature 411, 199-204.
Rhie, S.K., Coetzee, S.G., Noushmehr, H., Yan, C., Kim, J.M., Haiman, C.A., and Coetzee,
G.A. (2013). Comprehensive annotation of seventy-one breast cancer risk loci. PLoS
One.
Rhodes, D.R., Yu, J., Shanker, K., Deshpande, N., Varambally, R., Ghosh, D., Barrette, T.,
Pandey, A., and Chinnaiyan, A.M. (2004). ONCOMINE: a cancer microarray database
and integrated data-mining platform. Neoplasia 6, 1-6.
Rhodes, G.H., Valbracht, J.R., Nguyen, M.D., and Vaughan, J.H. (1997). The p542 gene
encodes an autoantigen that cross-reacts with EBNA-1 of the Epstein Barr virus and
which may be a heterogeneous nuclear ribonucleoprotein. J Autoimmun 10, 447-
454.
Richardson, A.L., Wang, Z.C., De Nicolo, A., Lu, X., Brown, M., Miron, A., Liao, X.,
Iglehart, J.D., Livingston, D.M., and Ganesan, S. (2006). X chromosomal abnormalities
in basal-like human breast cancer. Cancer Cell 9, 121-132.
Rosenbloom, K.R., Dreszer, T.R., Long, J.C., Malladi, V.S., Sloan, C.A., Raney, B.J., Cline,
M.S., Karolchik, D., Barber, G.P., Clawson, H., et al. (2012). ENCODE whole-genome
data in the UCSC Genome Browser: update 2012. Nucleic Acids Res 40, D912-917.
Schaub, M.A., Boyle, A.P., Kundaje, A., Batzoglou, S., and Snyder, M. (2012). Linking
disease associations with regulatory information in the human genome. Genome Res
22, 1748-1759.
Scherer, D., and Kumar, R. (2010). Genetics of pigmentation in skin cancer--a review.
Mutat Res 705, 141-153.
Schumacher, F.R., Feigelson, H.S., Cox, D.G., Haiman, C.A., Albanes, D., Buring, J., Calle,
E.E., Chanock, S.J., Colditz, G.A., Diver, W.R., et al. (2007). A common 8q24 variant in
prostate and breast cancer from a large nested case-control study. Cancer Res 67,
2951-2956.
Seelan, R.S., Parthasarathy, L.K., and Parthasarathy, R.N. (2004). E2F1 regulation of
the human myo-inositol 1-phosphate synthase (ISYNA1) gene promoter. Arch
Biochem Biophys 431, 95-106.
Serandour, A.A., Avner, S., Percevault, F., Demay, F., Bizot, M., Lucchetti-Miganeh, C.,
Barloy-Hubler, F., Brown, M., Lupien, M., Metivier, R., et al. (2011). Epigenetic switch
involved in activation of pioneer factor FOXA1-dependent enhancers. Genome Res
21, 555-565.
Shin, S., and Verma, I.M. (2003). BRCA2 cooperates with histone acetyltransferases
in androgen receptor-mediated transcription. Proc Natl Acad Sci U S A 100, 7201-
7206.
Siddiq, A., Couch, F.J., Chen, G.K., Lindstrom, S., Eccles, D., Millikan, R.C., Michailidou,
K., Stram, D.O., Beckmann, L., Rhie, S.K., et al. (2012). A meta-analysis of genome-
wide association studies of breast cancer identifies two novel susceptibility loci at
6q14 and 20q11. Hum Mol Genet 21, 5373-5384.
Singletary, S.E., and Connolly, J.L. (2006). Breast cancer staging: working with the
sixth edition of the AJCC Cancer Staging Manual. CA Cancer J Clin 56, 37-47; quiz 50-51.
Song, L., Zhang, Z., Grasfeder, L.L., Boyle, A.P., Giresi, P.G., Lee, B.K., Sheffield, N.C.,
Graf, S., Huss, M., Keefe, D., et al. (2011). Open chromatin defined by DNaseI and
FAIRE identifies regulatory elements that shape cell-type identity. Genome Res 21,
1757-1767.
Sorlie, T., Perou, C.M., Tibshirani, R., Aas, T., Geisler, S., Johnsen, H., Hastie, T., Eisen,
M.B., van de Rijn, M., Jeffrey, S.S., et al. (2001). Gene expression patterns of breast
carcinomas distinguish tumor subclasses with clinical implications. Proc Natl Acad
Sci U S A 98, 10869-10874.
Sorlie, T., Tibshirani, R., Parker, J., Hastie, T., Marron, J.S., Nobel, A., Deng, S., Johnsen,
H., Pesich, R., Geisler, S., et al. (2003). Repeated observation of breast tumor
subtypes in independent gene expression data sets. Proc Natl Acad Sci U S A 100,
8418-8423.
Sotiriou, C., and Pusztai, L. (2009). Gene-expression signatures in breast cancer. N
Engl J Med 360, 790-800.
Stacey, S.N., Manolescu, A., Sulem, P., Thorlacius, S., Gudjonsson, S.A., Jonsson, G.F.,
Jakobsdottir, M., Bergthorsson, J.T., Gudmundsson, J., Aben, K.K., et al. (2008).
Common variants on chromosome 5p12 confer susceptibility to estrogen receptor-
positive breast cancer. Nat Genet 40, 703-706.
Sterner, D.E., and Berger, S.L. (2000). Acetylation of histones and transcription-
related factors. Microbiol Mol Biol Rev 64, 435-459.
Stevens, K.N., Fredericksen, Z., Vachon, C.M., Wang, X., Margolin, S., Lindblom, A.,
Nevanlinna, H., Greco, D., Aittomaki, K., Blomqvist, C., et al. (2012). 19p13.1 is a
triple-negative-specific breast cancer susceptibility locus. Cancer Res 72, 1795-
1803.
Stranger, B.E., Montgomery, S.B., Dimas, A.S., Parts, L., Stegle, O., Ingle, C.E.,
Sekowska, M., Smith, G.D., Evans, D., Gutierrez-Arcelus, M., et al. (2012). Patterns of
cis regulatory variation in diverse human populations. PLoS Genet 8, e1002639.
Su, S.F., Chang, Y.W., Andreu-Vieyra, C., Fang, J.Y., Yang, Z., Han, B., Lee, A.S., and
Liang, G. (2012). miR-30d, miR-181a and miR-199a-5p cooperatively suppress the
endoplasmic reticulum chaperone and signaling regulator GRP78 in cancer.
Oncogene.
Tapias, A., Ciudad, C.J., Roninson, I.B., and Noe, V. (2008). Regulation of Sp1 by cell
cycle related proteins. Cell Cycle 7, 2856-2867.
Tarlac, V., and Storey, E. (2003). Role of proteolysis in polyglutamine disorders. J
Neurosci Res 74, 406-416.
Teslovich, T.M., Musunuru, K., Smith, A.V., Edmondson, A.C., Stylianou, I.M., Koseki,
M., Pirruccello, J.P., Ripatti, S., Chasman, D.I., Willer, C.J., et al. (2010). Biological,
clinical and population relevance of 95 loci for blood lipids. Nature 466, 707-713.
Ting, M.C., Liao, C.P., Yan, C., Jia, L., Groshen, S., Frenkel, B., Roy-Burman, P., Coetzee,
G.A., and Maxson, R. (2012). An enhancer from the 8q24 prostate cancer risk region
is sufficient to direct reporter gene expression to a subset of prostate stem-like
epithelial cells in transgenic mice. Dis Model Mech 5, 366-374.
Tomlinson, I., Webb, E., Carvajal-Carmona, L., Broderick, P., Kemp, Z., Spain, S.,
Penegar, S., Chandler, I., Gorman, M., Wood, W., et al. (2007). A genome-wide
association scan of tag SNPs identifies a susceptibility variant for colorectal cancer
at 8q24.21. Nat Genet 39, 984-988.
Turashvili, G., Bouchal, J., Baumforth, K., Wei, W., Dziechciarkova, M., Ehrmann, J.,
Klein, J., Fridman, E., Skarda, J., Srovnal, J., et al. (2007). Novel markers for
differentiation of lobular and ductal invasive breast carcinomas by laser
microdissection and microarray analysis. BMC Cancer 7, 55.
Turnbull, C., Ahmed, S., Morrison, J., Pernet, D., Renwick, A., Maranian, M., Seal, S.,
Ghoussaini, M., Hines, S., Healey, C.S., et al. (2010). Genome-wide association study
identifies five new breast cancer susceptibility loci. Nat Genet 42, 504-507.
van Berkum, N.L., Lieberman-Aiden, E., Williams, L., Imakaev, M., Gnirke, A., Mirny,
L.A., Dekker, J., and Lander, E.S. (2010). Hi-C: a method to study the three-
dimensional architecture of genomes. J Vis Exp.
Veyrieras, J.B., Kudaravalli, S., Kim, S.Y., Dermitzakis, E.T., Gilad, Y., Stephens, M., and
Pritchard, J.K. (2008). High-resolution mapping of expression-QTLs yields insight
into human gene regulation. PLoS Genet 4, e1000214.
Visvader, J., Begley, C.G., and Adams, J.M. (1991). Differential expression of the LYL,
SCL and E2A helix-loop-helix genes within the hemopoietic system. Oncogene 6,
187-194.
Walsh, T., Casadei, S., Coats, K.H., Swisher, E., Stray, S.M., Higgins, J., Roach, K.C.,
Mandell, J., Lee, M.K., Ciernikova, S., et al. (2006). Spectrum of mutations in BRCA1,
BRCA2, CHEK2, and TP53 in families at high risk of breast cancer. JAMA 295, 1379-1388.
Wang, H., and Yan, C. (2011). A small-molecule p53 activator induces apoptosis
through inhibiting MDMX expression in breast cancer cells. Neoplasia 13, 611-619.
Wang, Z., Zang, C., Rosenfeld, J.A., Schones, D.E., Barski, A., Cuddapah, S., Cui, K., Roh,
T.Y., Peng, W., Zhang, M.Q., et al. (2008). Combinatorial patterns of histone
acetylations and methylations in the human genome. Nat Genet 40, 897-903.
Ward, L.D., and Kellis, M. (2012). HaploReg: a resource for exploring chromatin
states, conservation, and regulatory motif alterations within sets of genetically
linked variants. Nucleic Acids Res 40, D930-934.
Waterworth, D.M., Ricketts, S.L., Song, K., Chen, L., Zhao, J.H., Ripatti, S., Aulchenko,
Y.S., Zhang, W., Yuan, X., Lim, N., et al. (2010). Genetic variants influencing
circulating lipid levels and risk of coronary artery disease. Arterioscler Thromb Vasc
Biol 30, 2264-2276.
Wellington, C.L., Ellerby, L.M., Hackam, A.S., Margolis, R.L., Trifiro, M.A., Singaraja, R.,
McCutcheon, K., Salvesen, G.S., Propp, S.S., Bromm, M., et al. (1998). Caspase cleavage
of gene products associated with triplet expansion disorders generates truncated
fragments containing the polyglutamine tract. J Biol Chem 273, 9158-9167.
Wooster, R., Bignell, G., Lancaster, J., Swift, S., Seal, S., Mangion, J., Collins, N., Gregory,
S., Gumbs, C., and Micklem, G. (1995). Identification of the breast cancer
susceptibility gene BRCA2. Nature 378, 789-792.
Xie, H., Lim, B., and Lodish, H.F. (2009). MicroRNAs induced during adipogenesis
that accelerate fat cell development are downregulated in obesity. Diabetes 58,
1050-1057.
Yan, W., Cao, Q.J., Arenas, R.B., Bentley, B., and Shao, R. (2010). GATA3 inhibits breast
cancer metastasis through the reversal of epithelial-mesenchymal transition. J Biol
Chem 285, 14042-14051.
Yang, A., Kaghad, M., Wang, Y., Gillett, E., Fleming, M.D., Dotsch, V., Andrews, N.C.,
Caput, D., and McKeon, F. (1998). p63, a p53 homolog at 3q27-29, encodes multiple
products with transactivating, death-inducing, and dominant-negative activities. Mol
Cell 2, 305-316.
Yang, X.J. (2004). The diverse superfamily of lysine acetyltransferases and their
roles in leukemia and other diseases. Nucleic Acids Res 32, 959-976.
Ye, T., Krebs, A.R., Choukrallah, M.A., Keime, C., Plewniak, F., Davidson, I., and Tora, L.
(2011). seqMINER: an integrated ChIP-seq data interpretation platform. Nucleic
Acids Res 39, e35.
Yeager, M., Orr, N., Hayes, R.B., Jacobs, K.B., Kraft, P., Wacholder, S., Minichiello, M.J.,
Fearnhead, P., Yu, K., Chatterjee, N., et al. (2007). Genome-wide association study of
prostate cancer identifies a second risk locus at 8q24. Nat Genet 39, 645-649.
Zanke, B.W., Greenwood, C.M., Rangrej, J., Kustra, R., Tenesa, A., Farrington, S.M.,
Prendergast, J., Olschwang, S., Chiang, T., Crowdy, E., et al. (2007). Genome-wide
association scan identifies a colorectal cancer susceptibility locus on chromosome
8q24. Nat Genet 39, 989-994.
Zeller, T., Wild, P., Szymczak, S., Rotival, M., Schillert, A., Castagne, R., Maouche, S.,
Germain, M., Lackner, K., Rossmann, H., et al. (2010). Genetics and beyond--the
transcriptome of human monocytes and disease susceptibility. PLoS One 5, e10693.
Zentner, G.E., and Scacheri, P.C. (2012). The chromatin fingerprint of gene enhancer
elements. J Biol Chem 287, 30888-30896.
Zhang, Y., Liu, T., Meyer, C.A., Eeckhoute, J., Johnson, D.S., Bernstein, B.E., Nusbaum,
C., Myers, R.M., Brown, M., Li, W., et al. (2008). Model-based analysis of ChIP-Seq
(MACS). Genome Biol 9, R137.
Zheng, W., Long, J., Gao, Y.T., Li, C., Zheng, Y., Xiang, Y.B., Wen, W., Levy, S., Deming,
S.L., Haines, J.L., et al. (2009). Genome-wide association study identifies a new breast
cancer susceptibility locus at 6q25.1. Nat Genet 41, 324-328.
Zhu, J., Adli, M., Zou, J.Y., Verstappen, G., Coyne, M., Zhang, X., Durham, T., Miri, M.,
Deshpande, V., De Jager, P.L., et al. (2013). Genome-wide Chromatin State
Transitions Associated with Developmental and Environmental Cues. Cell 152, 642-
654.
Asset Metadata
Creator: Rhie, Suhn Kyong (author)
Core Title: Breast epithelial cell type specific enhancers and functional annotation of breast cancer risk loci
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Genetic, Molecular and Cellular Biology
Publication Date: 04/23/2013
Defense Date: 03/08/2013
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: breast cancer, enhancer, epigenetics, GWAS, OAI-PMH Harvest, predisposition, single nucleotide polymorphism
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Coetzee, Gerhard (Gerry) A. (committee chair), Haiman, Christopher A. (committee member), Laird-Offringa, Ite A. (committee member)
Creator Email: rhie@usc.edu, suhnrhie@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-241673
Unique identifier: UC11294785
Identifier: etd-RhieSuhnKy-1580.pdf (filename), usctheses-c3-241673 (legacy record id)
Legacy Identifier: etd-RhieSuhnKy-1580.pdf
Dmrecord: 241673
Document Type: Dissertation
Rights: Rhie, Suhn Kyong
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA