Functional characterization of colon cancer risk-associated enhancers: connecting risk loci to risk genes
Functional Characterization of colon cancer risk-associated enhancers:
connecting risk loci to risk genes
By
Yu Gyoung Tak (Esther)
B.S. (Seoul National University) 2007
DISSERTATION
Submitted in partial satisfaction of the requirements for the degree of
DOCTOR OF PHILOSOPHY
in
Genetics, Molecular and Cellular Biology
in the
FACULTY OF GRADUATE SCHOOL
of the
UNIVERSITY OF SOUTHERN CALIFORNIA
Approved:
______________________________
(Peggy J. Farnham, Ph.D.), Mentor
______________________________
(Ite A Laird-Offringa, Ph.D.), Chair
______________________________
(Gerry A. Coetzee, Ph.D)
Committee in Charge
MAY, 2016
Table of contents
Table of contents ............................................................................................................. 2
Abstract ............................................................................................................................ 6
Acknowledgements ......................................................................................................... 7
List of Figures .................................................................................................................. 9
List of Tables ................................................................................................................. 10
List of Supplementary Figures ..................................................................................... 11
Chapter 1. Introduction ............................................................................................... 13
1.1 The GWAS conundrum ..................................................................................................... 13
1.2 Prioritization of SNPs associated with a specific disease using functional
annotation ............................................................................................................................... 18
1.3 Prioritization of SNPs associated with a specific disease by linking to gene
expression ............................................................................................................................... 26
1.4 Experimental approaches to identify target genes of regulatory and eQTL SNPs .... 29
1.4.1 Deletion of regulatory elements harboring prioritized SNPs ........................................ 31
1.4.2 Epigenetic modification of regulatory elements that harbor prioritized risk SNPs ....... 34
1.4.3 Specific targeting of prioritized SNPs .......................................................................... 35
1.5 Disease-related functional analyses ............................................................................... 38
Chapter 2: Functional annotation of colon cancer risk SNPs ................................. 43
2.1 Abstract ............................................................................................................................. 43
2.2 Introduction ....................................................................................................................... 44
2.3 Results ............................................................................................................................... 46
2.3.1 CRC risk-associated SNPs linked to a specific gene .................................................. 46
2.3.2 CRC risk-associated SNPs in distal regulatory regions ............................................... 51
2.3.3 Effects of SNPs on binding motifs in the distal elements ............................................ 56
2.3.4 Expression analysis of candidate risk-associated genes ............................................ 57
2.3.5 The effect of enhancer deletion on the transcriptome ................................................. 62
2.4 Discussion ........................................................................................................................ 63
2.5 Methods ............................................................................................................................. 68
2.5.1 RNA-seq ...................................................................................................................... 68
2.5.2 ChIP-seq analysis ........................................................................................................ 69
2.5.3 Enhancer deletion ........................................................................................................ 69
2.5.4 Analysis of FunciSNP and correlated SNPs effects .................................................... 70
2.5.5 Batch effects analysis .................................................................................................. 70
2.5.6 eQTL analyses ............................................................................................................ 72
2.5.7 General data handling and visualization ...................................................................... 73
2.6 Supplementary figures for Chapter 2 ............................................................................... 74
Chapter 3: Effects on the transcriptome upon deletion of distal elements are
not correlated with the size of H3K27Ac peaks in human cells .............................. 84
3.1 Abstract ............................................................................................................................. 84
3.2 Introduction ....................................................................................................................... 85
3.3 Results ............................................................................................................................... 86
3.3.1 Deletion of CRC risk-associated enhancers can cause widespread effects on the
transcriptome ........................................................................................................................ 86
3.3.2 The size or presence of an H3K27Ac peak does not correlate with enhancer activity 94
3.3.3 Characterization of the genome-wide changes in gene expression upon enhancer
deletion ................................................................................................................................. 97
3.3.4 Cell growth is affected by deletion of enhancer E7 ..................................................... 98
3.3.5 4C analysis of E7 and E24 in HCT116 ...................................................................... 100
3.4 Discussion ...................................................................................................................... 102
3.5 Methods ........................................................................................................................... 107
3.5.1 Cell culture ................................................................................................................. 107
3.5.2 CRISPR/Cas9-mediated genome editing .................................................................. 107
3.5.3 PCR detection of cells having enhancer deletions .................................................... 108
3.5.4 RNA-seq .................................................................................................................... 108
3.5.5 Cell proliferation assays ............................................................................................ 110
3.5.6 Colony forming assays .............................................................................................. 110
3.5.7 ChIP-seq and analysis ............................................................................................... 110
3.5.8 4C-seq and analysis .................................................................................................. 111
3.6 Supplementary figures for Chapter 3 ........................................................................... 113
Chapter 4: Conclusions and Future Studies ........................................................... 122
4.1 Summary of main findings ............................................................................................ 122
4.2 Future directions ............................................................................................................ 123
4.2.1 Epigenome editing of risk-associated enhancers ...................................................... 123
4.2.2 Targeting combinatorial enhancers .......................................................................... 125
4.2.3 Targeting SNP using HDR mechanism .................................................................... 125
4.2.4 Disease-related functional assays .......................................................................... 125
Appendix A: Publications as a contributing author ................................................. 127
Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the
genome by association with GATA3 .................................................................................. 127
Appendix B: Supplementary data files in DVD ......................................................... 146
References ................................................................................................................... 148
Abstract
Genome-wide association studies (GWASs) have identified SNPs that are statistically
associated with diseases. Interestingly, most of the GWAS-identified SNPs and their high LD
SNPs are found in non-coding regions of the genome, decorated by regulatory elements
(enhancers, promoters, and nuclear structure-associated elements). This led to my general
hypothesis that disease-associated SNPs affect the expression of disease-associated genes
through the function of regulatory elements in which the SNPs are located. Among regulatory
elements, disease-associated SNPs are enriched in enhancers, which play an important role in
regulating cell-type specific gene expression. However, enhancers do not necessarily regulate
nearby genes. Using colorectal cancer (CRC) GWAS and epigenomic information for the active
enhancer mark (H3K27Ac), I identified enhancers harboring CRC risk-associated SNPs.
Employing a genome-editing tool (CRISPR/Cas9), I deleted several CRC risk-associated
enhancers along with control regions and performed RNA-seq assays to identify putative CRC
risk-associated genes. The putative target genes of each enhancer were assessed by a
chromosomal looping assay (4C), confirming direct physical interactions between enhancers and
promoters. Deletion of one of the CRC risk-associated enhancers (E7) led to decreased
expression of the MYC oncogene and reduced proliferation of HCT116 cells. Interestingly,
deletion of the E7 region in HEK293 cells caused a similar downregulation of MYC and reduced
cell proliferation as in HCT116, even though the H3K27Ac mark is not present in HEK293 cells. In
conclusion, by identifying genes regulated by CRC risk-associated enhancers harboring SNPs, I
have developed a general approach to connect risk loci to putative risk genes.
Acknowledgements
I would like to thank my mentor, Professor Peggy Farnham, for being my role model as
a scientist and a teacher during these past five years. Peggy is someone every student likes as a
mentor. She is one of the smartest people that I know. She has always guided me to become a
better researcher through many insightful discussions and suggestions. I deeply appreciate
that her office door is always open, which allows me to see her whenever I have issues. She is
always prepared to answer any questions and I always receive great feedback with positive
energy. In addition to scientific discussions, she showed me how to enjoy life through lab parties
and baseball games. I will never forget the several conferences that all lab members attended,
which were possible with Peggy’s great support. Not only did I have the opportunity to learn how to
make and present a poster at a conference, which gave me many opportunities to
communicate with other scientists, but I also had lots of fun with lab members in Italy and Santa Fe.
There are a lot of great things that I appreciate Peggy for, which were not described here, and I
really hope that I could be as lively, enthusiastic, insightful, and generous as Peggy when I am a
mentor. Peggy, thank you very much for being my mentor!!
I am also very grateful to my committee members Professor Ite A. Laird-Offringa, and
Professor Gerry Coetzee. When I was wondering about subjects for my Ph.D work in my first
year, Ite guided me towards epigenetics during my first rotation. Her kindness and warm
attention helped me to adjust to Ph.D life. I also have to thank Gerry for helping me think differently
about aspects of my experiments by providing lots of important questions and suggestions.
I thank all the past and present members of Peggy’s group. I learned a lot about
ChIP-seq from Seth Frietze and Heather Witt. Seth was a great teacher when I started my Ph.D
work in Peggy’s lab. Matt Grimmer and Adam Blattler were great senior grad students, giving me
lots of advice. Malaina and Lijing were amazing friends, and I enjoyed having scientific
discussions with them, and also having fun inside and outside of the lab. Yuli Hung was a great
Master’s student, and I enjoyed teaching her and talking to her about science. I enjoyed our PET
meetings with Yu Guo, Hang Yang, and Zhiefe Luo, which helped me to learn skills to become a
mentor. I also appreciate Suhn Kyong Rhie, Fides Lay, and Shannon Wood for their scientific
suggestions. I thank Charlie Nicolet, Helen Truong, and Selene Tyndale for their efforts to
provide me the best quality sequencing data. I am also grateful that I have always seen Charlie
and Peggy’s happy family life. I thank Vicky Yamamoto who always smiles and works really hard
till late night providing me company, listening to my stories, and giving me suggestions. I felt like
she was my older sister.
I also want to give special thanks to my former advisor in Korea, Sung Jin Kim, who
treated me as his daughter, providing advice and help for my work, health, and my family.
I also want to give thanks to my spiritual family in the USA and my real family in Korea.
Without their love and help, I could not have finished my Ph.D.
List of Figures
Figure 1.1 Making sense of GWAS: an overview ............................................................ 17
Figure 2.1 Identification of potential functional SNPs for CRC. ....................................... 49
Figure 2.2 Expression of risk-associated genes in colon cells. ....................................... 54
Figure 2.3 Linking a transcript to an enhancer using TCGA data. .................................. 61
Figure 2.4 Identification of genes affected by deletion of enhancer 7. ............................ 63
Figure 2.5 Summary of identified candidate genes correlated with increased risk for
CRC. ................................................................................................................................ 64
Figure 3.1 Genomic location and H3K27Ac profiles of E7, E24, 18qE, and 18qNE. ...... 88
Figure 3.2 Experimental schema. .................................................................................... 89
Figure 3.3 Gene expression and H3K27Ac changes upon deletion of E7 in HCT116
cells. ................................................................................................................................ 91
Figure 3.4 Gene expression and H3K27Ac changes upon deletion of E24 in HCT116
cells. ................................................................................................................................ 93
Figure 3.5 H3K27Ac, TCF7L2, and CTCF binding profiles of E7, E24, 18qE, and
18qNE. ........................................................................................................................... 95
Figure 3.6 Gene expression and H3K27Ac changes upon deletion of E7 in HEK293
cells. ................................................................................................................................ 96
Figure 3.7 Proliferation is affected by deletion of enhancer 7. ...................................... 100
Figure 3.8 4C analysis of enhancer E7 and E24. ......................................................... 102
Figure 4.1 Repressed CRC risk enhancer E21 ............................................................. 124
List of Tables
Table 1.1 Publicly available functional annotation programs. ............................................ 25
Table 1.2 Sources of eQTL databases ............................................................................... 29
Table 2.1 Summary of regions linked to CRC tag SNPs. ................................................... 50
Table 2.2 Expressed transcripts directly linked CRC index SNPs ..................................... 51
Table 2.3 Distal regulatory regions correlated with CRC tag SNPs ................................... 55
Table 2.4 Effects of SNPs on motifs in the distal regulatory regions. ................................. 57
Table 2.5 Linking transcripts to enhancers using TCGA data. ........................................... 60
Table 3.1 Altered gene expression upon enhancer deletion. ............................................. 94
List of Supplementary Figures
Supplementary figure 2.1 Correlated exon SNPs. .......................................................... 74
Supplementary figure 2.2 Analysis of RNA-seq data. ..................................................... 75
Supplementary figure 2.3 Correlated TSS SNPs. ........................................................... 76
Supplementary figure 2.4 ChIP-seq peak analysis. ........................................................ 77
Supplementary figure 2.5 Correlated enhancer SNPs. ................................................... 78
Supplementary figure 2.6 TCGA batch effects analysis. RNA-seq batch effects. ........... 79
Supplementary figure 2.7 TCGA batch effects analysis. CNV (GW SNP6 array) batch
effects. ............................................................................................................................. 80
Supplementary figure 2.8 TCGA batch effects analysis. DNA methylation (Infinium
HM450K microarray) batch effects. ............................................................................... 81
Supplementary figure 2.9 Expression analysis of genes identified by promoter and
exon SNPs and potential enhancer target genes in TCGA samples. ............................ 82
Supplementary figure 2.10 eQTL analysis summary. .................................................... 83
Supplementary figure 3.1 Guide RNA sequences. ........................................................ 113
Supplementary figure 3.2 Confirmation of enhancer deletions. .................................... 114
Supplementary figure 3.3 List of datasets. .................................................................... 115
Supplementary figure 3.4 PCA plots of RNA-seq data. ................................................. 116
Supplementary figure 3.5 Circos plots for top downregulated genes. ........................... 117
Supplementary figure 3.6 Comparison of TCF7L2 and CTCF binding patterns upon
E7 deletion in HCT116. ................................................................................................. 118
Supplementary figure 3.7 .............................................................................................. 120
Supplementary figure 3.8 Proliferation assays. ............................................................. 121
Supplementary figure 3.9 4C-seq information. .............................................................. 121
Chapter 1. Introduction
The following introductory chapter is being prepared for submission as a review article entitled “Making
Sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of
SNPs in non-coding regions of the human genome” for publication in the journal Epigenetics and
Chromatin. The authors will be Yu Gyoung Tak and Peggy J. Farnham. Yu Gyoung Tak reviewed all the
literature, wrote the first draft, and incorporated all suggestions and edits; Peggy Farnham provided
suggestions and helped to edit the document.
1.1 The GWAS conundrum
Considerable progress towards an understanding of complex diseases has been made in
recent years due to the development of high throughput genotyping technologies. Using
microarrays that contain millions of single nucleotide polymorphisms (SNPs), Genome Wide
Association Studies (GWASs) have identified SNPs that are associated with many complex
diseases or traits (Welter, MacArthur et al. 2014). Such studies rely on differences in the
frequency of a specific SNP in, for example, healthy (or control) vs. diseased (or affected)
populations. To date, ~14 million validated SNPs have been identified in human populations.
Although GWAS arrays do not contain all mapped SNPs, it is estimated that they do capture most
human genome variation through haplotype-based SNP imputation (Li, Willer et al. 2009, Kichaev
and Pasaniuc 2015). The SNPs identified by GWAS that are statistically significantly over-
represented in the disease (or affected) populations are called index SNPs and genomic regions
containing the index SNPs are called risk loci for that particular disease. As of February 2015,
2111 different association studies have identified 15,396 index SNPs for various diseases and
traits (www.genome.gov/gwastudies), with the number of identified SNP-disease/trait associations
increasing rapidly in recent years (Welter, MacArthur et al. 2014). However, there are several
issues that have made it difficult for researchers to explain disease risk using GWAS results.
First, unlike Mendelian diseases such as cystic fibrosis that are often caused by mutations in
coding regions of proteins, GWAS-identified disease-associated nucleotide differences are rarely
found in coding regions. Instead, most disease-associated index SNPs are located in non-coding
regions of the genome, equally proportioned between the intergenic and intronic compartments
(Freedman, Monteiro et al. 2011, Blattler, Yao et al. 2014). However, it is important to consider
that, as mentioned above, the index SNPs actually serve only as representatives for all the SNPs
in the same haplotype block, and it is equally likely that other SNPs in high linkage disequilibrium
(LD) with the array-identified index SNPs are causal for the disease. Because it was hoped that
disease-associated coding variants would be identified if the true causal SNPs were known,
investigators began expanding their analyses to include more than just the index SNPs. A
commonly used approach to investigate SNPs other than the index SNPs present on the standard
GWAS array has been to use LD calculation (Browning and Browning 2007, Howie, Donnelly et al.
2009, Li, Willer et al. 2010) together with the 1000 Genomes Project reference panels from 14
populations (Abecasis, Auton et al. 2012). Such approaches have generally expanded the list of
putative causal SNPs from less than 100 index SNPs for a particular disease or trait to thousands
of associated SNPs (Figure 1.1; High LD SNPs). For example, 727 SNPs are in high LD (r² > 0.5)
with 77 index SNPs linked to prostate cancer (Hazelett, Rhie et al. 2014). Unfortunately, most of
these LD-associated SNPs are also in non-coding regions of the genome. Similarly, SNPs
correlated with 25 colon cancer risk-associated index SNPs were analyzed (using r² > 0.5); 13
correlated SNPs were located in exons (only 2 of which were predicted to be damaging to the
protein structure), whereas 503 correlated SNPs were located in non-coding regions
corresponding to promoters or enhancers (Yao, Tak et al. 2014). An alternative approach called
fine mapping has also been used in attempts to move from the index SNP (which basically
identifies a large genomic region) to a more refined list of putative causative SNPs located within
the identified region. Fine-mapping studies employ dense genotyping arrays that contain all
common SNPs within the previously identified risk loci, which together with imputation (Browning
and Browning 2007, Howie, Donnelly et al. 2009, Li, Willer et al. 2010), allows investigators to
perform a more complete analysis of the risk regions (Figure 1.1; Fine-mapped SNPs)
(Spain and Barrett 2015). However, genotyping at this fine scale requires large
sample sizes to provide the statistical power needed to differentiate the causal SNPs from the
non-causal SNPs. In addition, creation of locus-specific genotyping arrays is quite expensive.
Therefore, most fine mapping analyses have been done by international consortia with shared
interests for specific diseases or traits; examples include the Immunochip (Trynka, Hunt et al.
2011), the Metabochip (Voight, Kang et al. 2012), the iCOGs array (Michailidou, Hall et al. 2013),
and the Oncoarray (http://epi.grants.cancer.gov/oncoarray/). The majority of fine-mapping
studies have been performed using European-ancestry populations in which LD blocks are longer
than in other populations and therefore many correlated SNPs are present per locus, exacerbating
the problems related to a need for large sample sizes to separate true candidate causal SNPs
from less significantly associated risk SNPs (Edwards, Beesley et al. 2013, Amin Al Olama,
Dadaev et al. 2015). However, recent fine-mapping studies of trans-ethnic populations have
shown better results in discovering candidate causal SNPs (Ong, Wang et al. 2012, Mahajan, Go
et al. 2014, Han, Hazelett et al. 2015, Kichaev and Pasaniuc 2015); trans-ethnic fine-mapping
increases statistical power by increasing the number of samples and also helps to avoid false
positives due to confounding factors of population stratification. Unfortunately, a recent multi-
ethnic analysis of prostate cancer risk SNPs found that, even after fine-mapping, most risk-
associated SNPs are located in non-coding regions (Han, Hazelett et al. 2015). Thus, the GWAS
field has been left with the conundrum as to how a single nucleotide change in a non-coding
region could confer increased risk for a specific disease. One possible answer to this puzzle is
that the SNPs cause changes in gene expression levels rather than causing changes in protein
function. This chapter provides a description of 1) advances in genomic and epigenomic
approaches that incorporate functional annotation of regulatory elements to prioritize for follow-up
studies the disease risk-associated SNPs that are located in non-coding regions of the genome, 2)
various computational tools that aid in identifying gene expression changes caused by the non-
coding disease-associated SNPs, and 3) experimental approaches to identify target genes of, and
study the biological phenotypes conferred by, non-coding disease-associated SNPs.
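
The LD-based expansion from index SNPs to correlated SNPs described in this section rests on the pairwise r² statistic. The following is a minimal Python sketch of that calculation, assuming phased haplotypes coded as 0 (reference allele) and 1 (alternate allele). The function names, the SNP labels, and the toy haplotypes are hypothetical and are not taken from the cited studies; published analyses use dedicated imputation and LD software with 1000 Genomes reference panels rather than code of this kind.

```python
def ld_r2(hap_a, hap_b):
    """Return r^2 between two biallelic SNPs from phased haplotypes (0/1 coded)."""
    if len(hap_a) != len(hap_b) or not hap_a:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # alternate-allele frequency at SNP A
    p_b = sum(hap_b) / n                      # alternate-allele frequency at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic SNP: r^2 undefined, reported as 0 here
    return d * d / denom


def expand_index_snp(index_hap, candidates, threshold=0.5):
    """Return names of candidate SNPs with r^2 above the threshold (assumed cutoff 0.5)."""
    return sorted(
        name for name, hap in candidates.items()
        if ld_r2(index_hap, hap) > threshold
    )


if __name__ == "__main__":
    # Hypothetical toy data: eight haplotypes, one index SNP, three candidates.
    index = [1, 1, 1, 1, 0, 0, 0, 0]
    candidates = {
        "snp_perfect": [1, 1, 1, 1, 0, 0, 0, 0],
        "snp_partial": [1, 1, 1, 0, 0, 0, 0, 0],
        "snp_unlinked": [1, 0, 1, 0, 1, 0, 1, 0],
    }
    print(expand_index_snp(index, candidates))
```

The threshold of 0.5 mirrors the r² > 0.5 cutoff quoted above for the prostate and colon cancer analyses; a stricter cutoff would return a shorter list of correlated SNPs.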
Figure 1.1 Making sense of GWAS: an overview
Shown is a flow chart of analytical and experimental steps that can be followed to understand how
a non-coding SNP can be associated with an increased risk for a specific disease. Index SNPs
are identified using GWAS arrays and then expanded to a larger set of SNPs (termed Refined
Associated SNPs) using LD scores and fine mapping. These Refined Associated SNPs are then
prioritized using functional annotation to identify Regulatory SNPs (Reg SNPs) or linkage to allele-
specific gene expression to identify eQTL SNPs, producing a set of Candidate Functional SNPs.
The Candidate Functional SNPs can either be studied directly or further refined by testing the Reg
SNPs for possible SNP-RNA linkages or by testing the eQTL SNPs for functional annotation. If the
Candidate Functional SNP (yellow arrowhead) lies within a distal regulatory element, it should be
deleted or modified using genomic nucleases or epigenomic toggle switches (Approach A);
putative target genes are then identified using RNA-seq. Distal regulatory elements that cause
changes in gene expression when deleted or modified can then be studied using allele-specific
analyses (Approach B); promoters harboring risk-associated SNPs (pink arrowhead) can be
directly studied using Approach B. As described in the text, the cells deleted for the distal
regulatory elements can be used to identify an appropriate phenotypic assay for analysis of the
candidate target genes. Then, the genes that show expression changes that are linked to distal
SNPs and the genes regulated by the promoter SNPs can be studied using those biological
assays to identify possible therapeutic targets and/or candidates for diagnostic tests. Finally,
looping assays can be performed to distinguish direct from indirect targets of the distal regulatory
elements. It is important to note that a gene whose expression is indirectly affected by a non-
coding SNP could be a more important diagnostic or therapeutic target than the directly affected
gene.
1.2 Prioritization of SNPs associated with a specific disease using functional
annotation
As noted above, the vast majority (~93%) of index SNPs in the GWAS catalog
that have been associated with specific diseases or traits lie within non-coding regions, as do
most SNPs in high LD with the index SNPs and most SNPs identified by fine-mapping (Figure 1.1;
collectively identified as Refined Associated SNPs).
The current hypothesis is that one or more of these refined risk-associated non-coding SNPs
cause changes in gene expression of a critical gene. However, functional follow-up experiments
(described in Section 1.4) are both expensive and time-consuming and one cannot test each
possible candidate SNP for causality. It is also important to note that although fine-mapping
usually results in a smaller number of associated SNPs than does LD calculation, fine-mapping
has only been performed for a relatively small number of disease-associated loci and therefore
most investigators are left with the problem of a fairly large list of “possible” causal SNPs. Clearly,
it is necessary to prioritize the list of refined associated SNPs for follow-up analyses. One way to
prioritize the list of SNPs is to identify those located in regulatory regions of the genome (Figure 1.1;
Regulatory SNPs). There are several types of elements involved in transcriptional regulation
including promoters, enhancers, and nuclear structure-associated elements such as CTCF
binding regions; each of these elements has been associated with non-coding SNPs.
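
The prioritization step described above amounts to an interval-overlap query: each Refined Associated SNP is kept only if its position falls inside an annotated regulatory region. The following Python sketch illustrates the idea under stated assumptions. The SNP identifiers, gene name, coordinates, and the ±2 kb promoter flank are all hypothetical values chosen for illustration, and regions are treated as half-open intervals; real analyses typically rely on dedicated annotation tools and experimentally derived peak files.

```python
def promoter_region(chrom, tss, label, flank=2000):
    """Build a promoter interval around a TSS (flank size is an assumed, user-defined value)."""
    return (chrom, max(0, tss - flank), tss + flank, label)


def annotate_snps(snps, regions):
    """Map each SNP id to the labels of regulatory regions that contain it.

    snps:    iterable of (snp_id, chrom, pos)
    regions: iterable of (chrom, start, end, label), half-open [start, end)
    SNPs that overlap no region are omitted from the result.
    """
    annotated = {}
    for snp_id, chrom, pos in snps:
        labels = [
            label
            for r_chrom, start, end, label in regions
            if r_chrom == chrom and start <= pos < end
        ]
        if labels:
            annotated[snp_id] = labels
    return annotated


if __name__ == "__main__":
    # Hypothetical regions: one promoter window and one enhancer peak.
    regions = [
        promoter_region("chr1", 10_000, "promoter:GENE1"),
        ("chr1", 50_000, 51_000, "enhancer:E_demo"),
    ]
    # Hypothetical Refined Associated SNPs.
    snps = [
        ("snpA", "chr1", 9_500),
        ("snpB", "chr1", 50_500),
        ("snpC", "chr1", 30_000),
        ("snpD", "chr2", 50_500),
    ]
    for snp_id, labels in annotate_snps(snps, regions).items():
        print(snp_id, labels)
```

The linear scan over regions keeps the sketch short; for genome-scale inputs a sorted-interval or tree-based lookup would be the more practical design.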
The first step in identifying Regulatory SNPs is to select from the list of Refined Associated
SNPs those that lie within regulatory regions. A promoter is a user-defined region, usually
corresponding to several kb surrounding a transcription start site (TSS) of a known coding or
non-coding gene. Thus, investigators can bioinformatically identify promoter SNPs. It is more difficult
to identify SNPs within enhancers because, unlike promoters, they do not occur at a defined
distance from a TSS. However, they can be identified by specific epigenomic profiles. Within
recent years, consortia such as the Encyclopedia of DNA Elements (ENCODE)
(ENCODE_Project_Consortium 2012) and the Roadmap Epigenomics Mapping Consortium
(REMC) (Bernstein, Stamatoyannopoulos et al. 2010, RoadmapEpigenomicsConsortium 2015)
have used a variety of genome-wide methods to study the chromatin state of non-coding regions
in the human genome in hundreds of different cell types (primary cell lines, immortalized cell lines,
and tissues). In these studies, enhancers have been identified using methods that detect open
chromatin, specific histone modifications, and enhancer RNAs (eRNAs). For example, DNase-
seq (Maurano, Humbert et al. 2012) has been used to identify DNase1-hypersensitive regions
(DHSs) that correspond to areas of open, accessible chromatin that contain binding motifs for
transcription factors (TFs). Although DHSs are generally a few Kb in length, DNase footprinting
(which combines deeply sequenced DNase-seq data with motif information) can help to more
precisely identify the critical nucleotides within a DHS site (Boyle, Song et al. 2011, Schaub, Boyle
et al. 2012). More recently, ATAC-seq, a method that employs an engineered Tn5 transposase to
measure chromatin accessibility, has been used to define genomic maps of open chromatin;
advantages of ATAC-seq include the requirement for fewer cells (500-50,000 cells) and fewer
experimental steps, as compared to DNase-seq (Buenrostro, Giresi et al. 2013). The entire set of
DHSs includes promoter regions, distal enhancer regions, and sites of binding of structural TFs.
To further refine the set of distal DHSs to include only active enhancers, investigators use the
method of ChIP-seq and antibodies specific to histone modifications. For example, potentially
active enhancers are identified as regions of open chromatin with flanking nucleosomes having
histone 3 marked by monomethylation of lysine 4 (H3K4me1), whereas nucleosomes flanking fully
active enhancers are marked by H3K4me1 and also by acetylation of lysine 27 (H3K27Ac)
(ENCODE_Project_Consortium 2012); enhancers also sometimes have low levels of histone H3
trimethylated on lysine 4 (H3K4me3), a mark that is quite strong at promoters. The H3K27Ac
mark at enhancers is likely a consequence of the binding of site-specific TFs (e.g. TCF7L2) that
recruit histone acetyltransferases such as EP300 and CBP. It is thought that the acetylation of the
histones flanking a DHS increases the net affinity of other TFs to the region of open chromatin
(Spitz and Furlong 2012). Thus, it seems logical that identifying active enhancers using TF ChIP-
seq data would also be possible. However, considering the fact that ChIP-seq patterns have been
identified for fewer than 150 transcription factors out of 1800 known TFs (and in only a few cell
types), the combination of DHS and histone modifications is more commonly used to identify
enhancers (Maurano, Humbert et al. 2012). Finally, a different approach to identifying active
enhancers has been used by the FANTOM5 project, which employed cap analysis of gene
expression (CAGE) to discover active enhancers that produce bidirectional capped RNA. Notably,
although very few enhancers were identified by this method, a high percentage of these
enhancers were validated by reporter assays (Andersson, Gebhard et al. 2014). It is also
important to note that enhancers are very cell type-specific and therefore enhancer mapping must
be performed in the cell type(s) that are relevant to the disease under study.
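The classification rules summarized above can be sketched in a few lines of Python. This is a minimal illustration, assuming each candidate region has already been reduced to yes/no calls for open chromatin and for each histone mark; the `Region` type, its field names, and the 2 kb promoter window are hypothetical choices made for the example and do not correspond to any published pipeline.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """One candidate region with precomputed (hypothetical) peak calls."""
    dist_to_tss: int      # distance in bp to the nearest TSS
    open_chromatin: bool  # overlaps a DHS or an ATAC-seq peak
    h3k4me1: bool         # flanking nucleosomes carry H3K4me1
    h3k27ac: bool         # flanking nucleosomes carry H3K27Ac


def classify(region: Region, promoter_window: int = 2000) -> str:
    """Label a region using the rules summarized in the text."""
    if abs(region.dist_to_tss) <= promoter_window:
        return "promoter"             # user-defined window around a TSS
    if not region.open_chromatin:
        return "unannotated"
    if region.h3k4me1 and region.h3k27ac:
        return "active enhancer"
    if region.h3k4me1:
        return "potentially active enhancer"
    return "other open chromatin"     # e.g. sites bound by structural TFs
```

Because enhancers are cell type-specific, the boolean calls would need to come from the cell type relevant to the disease under study.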
Several studies have shown that index SNPs and/or correlated SNPs that are in high LD
to the index SNPs are enriched in enhancer regions. For example, one study found that non-
coding index SNPs from 426 GWASs are enriched in enhancers present in the relevant cell types
and that several of the index SNPs created or disrupted TF motifs in the identified enhancers
(Ernst, Kheradpour et al. 2011). Also, Schaub et al. studied 4724 GWAS index SNPs associated
with 470 different phenotypes using ENCODE data, showing that 36% of the SNPs are in DHSs
and 20% are in a ChIP-seq peak in at least one cell line. When they extended their analyses to
SNPs that are in high LD (r^2 >0.8) with the index SNPs, the overlap increased by over two-fold
(Schaub, Boyle et al. 2012). These findings are consistent with a recent study in which
investigators used H3K27Ac ChIP-seq data from normal and colon cancer cells and found that
270 SNPs that have a high LD (r^2 >0.5) with 25 colorectal cancer index SNPs are located in
H3K27Ac sites; when the SNPs were limited to distal regions they identified 68 unique enhancers
(Yao, Tak et al. 2014). Similarly, combining H3K27Ac and H3K4me1 ChIP-seq data and DNase-
seq data from prostate cancer cells, Hazelett et al. identified 727 SNPs that were in high LD
(r^2 >0.5) with 77 prostate cancer risk SNPs; of these, 663 SNPs were in putative enhancer regions
(Hazelett, Rhie et al. 2014). Also, a recent fine-mapping study of Type 1 Diabetes (T1D) found
that fine-mapped T1D-associated SNPs are localized in active enhancers of thymus, T and B cells,
and CD34+ stem cells (Li, Chen et al. 2015, Onengut-Gumuscu, Chen et al. 2015).
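The r^2 thresholds quoted in these studies can be illustrated with a short calculation. The sketch below assumes unphased genotype dosages (0, 1, or 2 copies of an allele per individual) and uses the squared Pearson correlation between dosage vectors as an approximation of r^2; the function names and the toy data used to exercise them are hypothetical, and the cited studies relied on dedicated LD tools and reference panels rather than this code.

```python
def r_squared(g1, g2):
    """Squared Pearson correlation between two genotype dosage vectors (0/1/2)."""
    n = len(g1)
    m1, m2 = sum(g1) / n, sum(g2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(g1, g2))
    var1 = sum((a - m1) ** 2 for a in g1)
    var2 = sum((b - m2) ** 2 for b in g2)
    if var1 == 0 or var2 == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    return cov * cov / (var1 * var2)


def snps_in_ld(index_genotypes, candidates, threshold=0.8):
    """Return IDs of candidate SNPs whose r^2 with the index SNP exceeds the threshold."""
    return [snp_id for snp_id, genotypes in candidates.items()
            if r_squared(index_genotypes, genotypes) > threshold]
```

Lowering `threshold` from 0.8 to 0.5, as in the colorectal and prostate cancer studies above, enlarges the candidate list at the cost of including more weakly correlated SNPs.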
The working model for establishment and maintenance of active enhancers is that
transcription factors bind to the DNA, position the nucleosomes, and then serve to keep the region
in between the nucleosomes in an open conformation (Spitz and Furlong 2012). Thus, it is logical
to assume that risk-associated regulatory SNPs would have a higher likelihood of causality if they
disrupt a motif for a site-specific TF in the nucleosome-free region of an enhancer or DHS.
Unfortunately, although progress has been made in identifying in vivo motifs for TFs using ChIP-
seq data (Wang, Zhuang et al. 2013), the motifs for most site-specific TFs are not known.
However, programs have been developed that allow investigators to incorporate information about
the set of known TF motifs into SNP prioritization (Kilpinen, Waszak et al. 2013, Amin Al Olama,
Dadaev et al. 2015). Using such programs, regulatory SNPs located in motifs of TFs known to be
important in establishing or maintaining the phenotypic characteristics of specific cell types have
been identified. For example, motifbreakR (Amin Al Olama, Dadaev et al. 2015) can predict
TF motif disruptions for a large number of provided SNPs using several different sources of TF
motifs (see Table 1.1 for details). However, it should be noted that studies have shown that many
risk-associated SNPs (index SNPs and SNPs that are in high LD to index SNPs) are not precisely
located in the conserved binding motif of transcription factors but are in nearby regions (Heinz,
Romanoski et al. 2013, Farh, Marson et al. 2015). It is possible that such SNPs disrupt an as-of-
yet unknown motif for a TF that has not yet been characterized by ChIP-seq.
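The idea of scoring a SNP for motif disruption can be made concrete with a position weight matrix (PWM). The following is a minimal sketch, assuming a uniform background of 0.25 per base and a made-up four-position motif (`TOY_PWM`); it is not the algorithm used by motifbreakR or any other program named here, only an illustration of comparing the reference and alternate alleles against the same matrix.

```python
import math

# Hypothetical 4-position motif: probability of each base at each position.
TOY_PWM = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]


def log_odds(seq, pwm, background=0.25):
    """Sum of per-position log2 odds scores for a sequence as long as the PWM."""
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, seq))


def allele_score_change(ref_seq, alt_seq, pwm):
    """Score difference (alt minus ref); strongly negative values suggest disruption."""
    return log_odds(alt_seq, pwm) - log_odds(ref_seq, pwm)
```

A SNP that falls just outside the motif window leaves both scores unchanged, which is one reason the nearby, non-motif SNPs discussed above are harder to prioritize with this kind of analysis.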
Perhaps the TF for which the most ChIP-seq experiments have been performed is
CCCTC-binding factor (CTCF) (Holwerda and de Laat 2013, Ong and Corces 2014, Nichols
and Corces 2015). ENCODE, as well as many individual laboratories, have mapped CTCF
binding sites in a large number of human cell types. Such studies have revealed that CTCF binds
to promoter and enhancer regions, but it can also bind to regions of the genome that lack the
histone modifications that specify active promoters and enhancers. For example, in Panc1 cells,
15% of CTCF peaks are in promoters, 14% are in enhancers, and 71% are in neither promoters
nor enhancers (M. Gaddis and P. Farnham, unpublished data). Topologically-associating domains
(TADs), which demarcate large chromatin regions that interact via looping, are enriched for CTCF
binding sites at their boundaries, suggesting a role for CTCF-mediated looping in the maintenance
of TADs (Sexton and Cavalli 2015). CTCF is also thought to contribute to the overall 3-
dimensional structure of chromatin by forming a loop through which distal enhancers and
promoters can be brought into close proximity, perhaps leading to transcriptional activation of the
linked promoter (Ong and Corces 2014). CTCF has also been shown to serve as an insulator that
interferes with the interaction between an enhancer and a promoter and to block chromosome
position effects of transgenes (Ong and Corces 2014). Thus, regulatory SNPs that disrupt or
create a CTCF site would have high priority for follow-up analyses. A recent GWAS of a Chinese
population identified 3 index SNPs statistically associated with increased risk of lung cancer that
are located within CTCF ChIP-seq peaks in the A549 lung cancer cell line (Petit, Jourdain et al.
2015). In addition, Ding et al. identified statistically significant allele-specific CTCF binding data
from 50 lymphoblastoid cell lines (McDaniell, Lee et al. 2010) which were genotyped as a part of
the 1000 Genomes Project, providing a source of prioritized SNPs to study the involvement of
CTCF in disease risk (Ding, Ni et al. 2014). Interestingly, only 25% of these genetic variants are
exactly in the CTCF motif; however, most are located within 1kb of the motif (Ding, Ni et al. 2014).
This finding is consistent with the studies described above showing that many risk-associated
SNPs are not in the conserved binding motif of transcription factors but are in nearby regions. Of
course, it is not yet known if the SNPs that are nearby, but not in, CTCF motifs are functionally
relevant.
Finally, SNPs located within CpG sites have been studied for their relationship to disease.
Clearly, if a CpG site within a known motif for a TF is identified as a disease-associated SNP, it
could alter gene regulation simply by changing the affinity of the TF for that region. In fact, several
TFs do harbor CpG dinucleotides at critical positions in their motifs (Blattler and Farnham 2013,
Blattler, Yao et al. 2014). However, CpGs can also regulate gene expression in a more region-
specific way. CpG island methylation of promoter regions of tumor-suppressor genes is one of the
driving factors for cancer development (Jones 2012). In addition, recent studies have shown that
hyper- and hypo-methylation of distal elements can be linked to tumor-specific changes in gene
expression (Yao, Shen et al. 2015). Increased methylation of a promoter or enhancer is generally
thought to lead to transcriptional repression, whereas decreased methylation is thought to lead to
gene activation. Thus, a single change at a SNP (which disrupts or increases binding of a TF) can
lead to an altered epigenetic pattern of a larger region. Measuring methylation levels at 22,290
CpG dinucleotides in lymphoblastoid cell lines of 77 individuals from the HapMap project, Bell et al.
found 180 CpG sites in 173 genes that are associated with SNPs located nearby (within a 5 Kb
window) (Bell, Pai et al. 2011). Additionally, several diseases have been reported to be linked to
aberrant SNP-associated methylation at CpGs in promoter regions (Hitchins, Rapkins et al. 2011,
Dayeh, Olsson et al. 2013, Ye, Zhou et al. 2015). For example, Hitchins et al. found that a single
nucleotide variant in the 5’ UTR of the MLH1 gene resulted in increased methylation of the
promoter, leading to transcriptional repression. It has been suggested that the variant SNP
decreases recruitment of a TF, causing loss of protection from methylation on nearby CpG sites,
thus leading to Lynch syndrome.
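Whether a SNP itself removes or introduces a CpG dinucleotide can be checked directly from its alleles and the two flanking bases. The sketch below is a simplified illustration with a hypothetical function name; it assumes uppercase forward-strand bases and ignores methylation status, which must be measured experimentally.

```python
def cpg_effect(left_base, ref, alt, right_base):
    """Report whether a SNP destroys or creates a CpG dinucleotide.

    left_base and right_base are the nucleotides immediately flanking the SNP
    on the forward strand. Because CG reads the same on both strands, checking
    the forward strand is sufficient.
    """
    def has_cpg(allele):
        return (left_base + allele) == "CG" or (allele + right_base) == "CG"

    ref_cpg, alt_cpg = has_cpg(ref), has_cpg(alt)
    if ref_cpg and not alt_cpg:
        return "disrupts CpG"
    if alt_cpg and not ref_cpg:
        return "creates CpG"
    return "no change"
```

This check only covers SNPs that overlap a CpG directly; SNP-associated methylation of nearby CpGs, as in the MLH1 example, requires methylation data from genotyped samples.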
As described above, identification of Regulatory SNPs requires investigation as to whether
any of the Refined Associated SNPs fall within promoters, enhancers, TF binding sites, or CpG
dinucleotides. Although one could determine if any of the relatively small set of index SNPs for a
particular disease is located within a mapped regulatory element by simply visualizing the location
of the SNP and the location of functional elements on a genome browser, it would be quite
laborious to do this for the many hundreds of the SNPs in high LD with the index SNPs. Therefore,
several different programs have been developed that integrate genetic information (genotyping
and imputation data for GWAS index SNPs and SNPs in LD to index SNPs) with epigenetic
information (generated by DNase-seq, ChIP-seq, or DNA methylation assays) and chromatin
interaction data. Listed in Table 1.1 are some of the publicly available functional annotation
programs; each program has its own advantages and disadvantages. For example, RegulomeDB
(Boyle, Hong et al. 2012) and HaploReg (Ward and Kellis 2012) share similar features,
automatically providing all possible epigenetic information for all available cell types and tissues
for the input SNPs (the epigenetic maps are derived from the ENCODE and REMC databases).
However, neither program has options for analyzing only the relevant cell types for the disease-
associated SNPs. In contrast, FunciSNP (Coetzee, Rhie et al. 2012), GREGOR (Schmidt, Zhang
et al. 2015) and Enlight (Guo, Conti et al. 2015) allow users to add their specific epigenetic data
from the cell type of interest (which may not be in the public databases), providing a better
prioritization of the regulatory SNPs. Of note, GWAS3D (Li, Wang et al. 2013) and Enlight (Guo,
Conti et al. 2015) include an automatic analysis of chromatin interaction features (although such
data is not yet available for many cell types), and Enlight automatically generates plots showing
LD information and overlapping annotated features.
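The core operation shared by these programs, intersecting SNP positions with user-supplied biofeatures in .bed format, can be sketched as follows. This is an illustrative example only and does not reproduce the implementation of FunciSNP, GREGOR, or Enlight; the function names are hypothetical, and the code assumes that BED intervals are 0-based and half-open while SNP positions are 1-based.

```python
def parse_bed(lines):
    """Parse BED lines (chrom, start, end; 0-based, half-open) into tuples."""
    features = []
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank lines and header lines
        chrom, start, end = line.split()[:3]
        features.append((chrom, int(start), int(end)))
    return features


def snps_in_features(snps, features):
    """Return IDs of SNPs (chrom, 1-based position, ID) that fall inside any feature."""
    hits = []
    for chrom, pos, snp_id in snps:
        # A 1-based position p lies in a 0-based half-open interval when start < p <= end.
        if any(c == chrom and s < pos <= e for c, s, e in features):
            hits.append(snp_id)
    return hits
```

For the many hundreds of SNPs in high LD with index SNPs, a linear scan like this is adequate; genome-wide feature sets would normally call for an interval tree or a dedicated tool.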
Table 1.1 Publicly available functional annotation programs.

Tool: HaploReg
Type: Web server
Minimum input: rsID
Output: Overlapping annotated features and TF motif disruption information for SNPs (input) and correlated SNPs with r^2 >0.8
Epigenetic annotation file used: ChromHMM, DNase-seq, a library of position weight matrices (PWMs) from TRANSFAC, JASPAR, and protein binding array (PBM), and eQTL
URL: http://www.broadinstitute.org/mammals/haploreg/haploreg_v3.php
PMID: 22064851

Tool: RegulomeDB
Type: Web server
Minimum input: rsID
Output: Overlapping annotated features for SNPs (input) with scores that depend on the combination of overlapping annotated features, and UCSC genome browser showing overlapping features
Epigenetic annotation file used: TF binding, DNase-seq, FAIRE, DNase footprinting, eQTL, dsQTL, ChIP-exo and DNA methylation
URL: http://www.regulomedb.org
PMID: 22955989

Tool: FORGE
Type: Web server
Minimum input: rsID
Output: Overlapping DNase1 hotspots for SNP (input)
Epigenetic annotation file used: DNase1 hotspot
URL: http://browser.1000genomes.org/Homo_sapiens/UserData/Forge

Tool: rSNPBase
Type: Web server
Minimum input: rsID or gene name
Output: Proximal or distal transcriptional regulation, miRNA regulation, RNA binding protein mediated regulation, eQTL results for SNPs (input) and correlated SNPs (r^2 >0.8)
Epigenetic annotation file used: histone modification, TF bindings, CpG islands, RBP, miRNA data
URL: http://rsnp.psych.ac.cn/
PMID: 24285297

Tool: FunciSNP
Type: R package
Minimum input: GWAS index SNP information (chrom:position, rsID, population) in tab-delimited file, biofeature information in .bed format, user-defined r^2 value
Output: Overlapping annotated features for index SNP (input) and correlated SNPs whose r^2 values are user-defined
Epigenetic annotation file used: Any biofeature annotation information in .bed format
URL: https://github.com/labrazil/Coetzee_Seq_Analysis/tree/master/FunciSNP
PMID: 22684628

Tool: GREGOR
Type: A package run using perl code
Minimum input: A file containing a single column of index SNPs, biofeature information in .bed format, user-defined r^2 value
Output: Prioritized variants based on overlap with selected regulatory regions, enrichment analysis with P-values showing how index SNPs or correlated SNPs are enriched in annotated features compared to control SNPs
Epigenetic annotation file used: Any biofeature annotation information in .bed format
URL: http://csg.sph.umich.edu/GREGOR/index.php/site/index
PMID: 25886982

Tool: Enlight
Type: Web server
Minimum input: rsID, P-value
Output: Plots showing LD and overlapping annotated features for SNP (input)
Epigenetic annotation file used: ChromHMM, histone modification, DNA methylation, TF bindings, eQTL, Hi-C or customized BED file for biofeatures
URL: http://enlight.usc.edu/index.html
PMID: 25262152

Tool: GWAS3D
Type: Web server
Minimum input: rsID, P-value
Output: TF motif analysis and overlapping annotated features for SNPs (input)
Epigenetic annotation file used: 5C, Hi-C, ChIA-PET, ChromHMM, H3K27Ac, p300, CTCF, DHS (option for selecting cell lines relevant to disease)
URL: http://jjwanglab.org/gwas3d
PMID: 23723249

Tool: motifbreakR
Type: R package
Minimum input: SNP information in .bed or .vcf format
Output: Comprehensive TF binding site disruption at SNPs (input)
Epigenetic annotation file used: TF motif information from ScerTF, FlyFactorSurvey, hPDI, UniPROBE, JASPAR, ENCODE, Homer, Factorbook, HOCOMOCO
URL: https://github.com/Simon-Coetzee/motifBreakR
PMID: 26272984
1.3 Prioritization of SNPs associated with a specific disease by linking to
gene expression
A second way to prioritize risk-associated SNPs is to link the different SNP alleles to
changes in gene expression using population-based methods. These methods identify
“expression quantitative trait loci” (eQTL), which are defined as genomic regions that harbor one
or more nucleotide variants that correlate with differences in gene expression (Albert and
Kruglyak 2015). We note that although eQTLs are said to identify ‘loci’, most investigators use this
term to refer to specific nucleotides (i.e. SNPs) that correlate with differences in gene expression
(Albert and Kruglyak 2015). Expression-associated SNPs (Figure 1; eQTL SNPs) can be
bioinformatically associated with genes that are located in genomic regions near to or far from
the SNP in question. If associated with a nearby gene, the relationship is termed a “local eQTL”,
whereas SNPs associated with genes located farther away on the same chromosome or on
different chromosomes are called “distal eQTLs”. In many cases, local eQTLs work as cis-eQTLs,
which directly affect expression of nearby genes (usually limited to genes within 250 Kb to 1 Mb)
in an allele-specific manner (Albert and Kruglyak 2015, Gibson, Powell et al. 2015). In contrast,
trans-eQTLs cannot be applied to the study of allele-specific gene expression because they likely
affect expression of the identified gene as a secondary consequence of changes in direct target
genes. Most trans-eQTLs are distal eQTLs, being associated with genes found far from the SNP
on the same chromosome or on different chromosomes (Pai, Pritchard et al. 2015). However, it
should be noted that some trans-eQTLs can be local eQTLs; even though nearby the SNP under
study, the associated gene can be affected as a secondary consequence of gene expression
changes of a direct target gene. Most studies focus on cis-eQTLs (Pai, Pritchard et al. 2015)
because the much larger number of tests required to detect trans-eQTLs reduces statistical power (Westra, Peters et al. 2013).
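The local versus distal distinction is purely positional and can be expressed in a few lines. This is a minimal sketch with a hypothetical function name, assuming a 1 Mb window (the upper end of the range quoted above); it deliberately does not attempt to label an association as cis or trans, because that distinction concerns mechanism and cannot be derived from distance alone.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, window=1_000_000):
    """Label a SNP-gene pair as a local or distal eQTL by genomic distance alone."""
    if snp_chrom == gene_chrom and abs(snp_pos - gene_tss) <= window:
        return "local"
    return "distal"
```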
For eQTL analyses, SNPs are mapped using a genotyping array and mRNA abundance is
measured by microarray or, more commonly in recent studies, by RNA-seq using hundreds of
samples from cell lines or tissues that are relevant to the disease or traits under study. Statistical
methods are then used to associate SNPs with transcripts to identify eQTLs (Gibson, Powell et al.
2015); sources of eQTL databases are listed in Table 1.2. It is important to note that not all eQTL
SNPs have been linked to disease; in other words, the SNPs associated with gene expression
were not identified via GWAS. However, many studies have revealed that eQTLs can be identified
for some GWAS risk loci (testing index SNPs or SNPs in high LD with the index SNPs); in these
cases, the association of the SNP and expression of nearby genes was identified in a trait or
disease-specific manner (Nicolae, Gamazon et al. 2010, Zhong, Beaulaurier et al. 2010,
Ramasamy, Trabzuni et al. 2014, Zhang, Johnson et al. 2014). For example, type 2 Diabetes
index SNPs and high LD SNPs (r^2 >0.9) are enriched in the set of eQTL SNPs identified using
liver and fat tissues (Zhong, Beaulaurier et al. 2010). Also, eQTL SNPs identified using gene
expression datasets from blood showed enrichment for association with autoimmune disease, but
not with bipolar disorder or type 2 Diabetes (2015). On the other hand, the sets of genes located
nearby GWAS-identified SNPs are not always highly concordant with eQTL-associated genes
(2015), suggesting that some GWAS signals affect genes that are far away. Therefore, we cannot
conclude that the target genes of GWAS SNPs are the same genes identified by cis-eQTL SNPs.
For example, Musunuru et al. (Musunuru, Strong et al. 2010) used GWAS information to identify
a risk locus at 1p13 that is associated with both plasma low-density lipoprotein cholesterol (LDL-C)
and myocardial infarction (MI). Also, they used eQTL analysis of liver gene expression datasets to
determine if risk SNPs in the 1p13 region are associated with nearby genes, finding that two
GWAS index SNPs (rs646776 and rs12740374) were in eQTL with the SORT1 gene. The authors
suggest that the minor allele of rs12740374 creates a C/EBP binding site and results in increased
SORT1 expression, which contributes to the risk for LDL-C and MI. However, it should be noted
that SORT1 is not the nearest gene to rs12740374 and is located 123 Kb from the risk-associated
index SNP.
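The statistical association between a SNP and a transcript described above is, in its simplest form, a regression of expression on allele dosage. The sketch below is a bare-bones illustration under stated assumptions: genotypes are coded as 0, 1, or 2 copies of one allele, expression values are already normalized, and no covariates are modeled. The function name is hypothetical, and real eQTL studies add covariate adjustment and multiple-testing correction that are omitted here.

```python
def eqtl_slope(dosages, expression):
    """Least-squares slope and r^2 of expression regressed on allele dosage (0/1/2)."""
    n = len(dosages)
    mean_x, mean_y = sum(dosages) / n, sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype or expression has no variance")
    slope = sxy / sxx              # expression change per additional allele copy
    r2 = sxy * sxy / (sxx * syy)   # fraction of expression variance explained
    return slope, r2
```

A positive slope would correspond to the kind of relationship reported at 1p13, where one allele is associated with increased expression of the linked gene.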
Although some eQTLs are shared across different cell types, most eQTL associations are
cell-type specific (Emilsson, Thorleifsson et al. 2008, Dimas, Deutsch et al. 2009). These cell-type
specific eQTLs are often quite far from the gene they are associated with and tend to have small
effects on gene expression, reflecting the characteristics of enhancer elements (Dimas, Deutsch
et al. 2009). Using epigenetic information from ENCODE and REMC to functionally annotate 4085
intergenic eQTLs, investigators showed that the eQTLs which have the highest significance per
gene are enriched in transcription factor binding sites, enhancers, promoters, and open chromatin.
A recent study identified enrichment of eQTL SNPs in distal elements, but the SNP-gene
expression linkage only appeared upon immune stimulation of naïve monocytes (Fairfax,
Humburg et al. 2014), suggesting that new enhancers harboring eQTL SNPs were created by
immune stimuli. Several studies have suggested that changes in transcription factor binding are a
major result of cell type-specific eQTLs, leading to changes in chromatin structure, histone
modification, or methylation, with resultant changes in gene expression (McVicker, van de Geijn et
al. 2013). In a recent study correlating RNA-seq data from 103 matched tumor and normal colon
mucosa samples from Danish patients with germline genotyping from 90 patients, investigators
found that many of the identified eQTLs are tumor specific. Using ChIP-seq peaks from a colon
cancer cell line, they concluded that the tumor-specific eQTLs are associated with binding of
several transcription factors that show increased expression in tumors (Ongen, Andersen et al.
2014). Other evidence supporting an important role for transcription factor binding in the
mechanism by which eQTLs function is provided by meQTLs, defined as CpG sites in which DNA
methylation changes have association with SNPs that are several Kb away (Banovich, Lan et al.
2014). A recent study showed 23 SNPs out of 109 cancer GWAS SNPs from 13 different cancer
types had associations with methylation status (Heyn, Sayols et al. 2014). Banovich et al. showed
that meQTLs are frequently associated with changes in histone modification, DNase1
hypersensitivity, chromatin accessibility, and expression changes in nearby genes. As described
above, meQTLs are thought to affect TF binding, which in turn influences DNA methylation at
nearby CpG sites (Banovich, Lan et al. 2014). In the cases where meQTLs are eQTLs, a positive
correlation between methylation and expression was shown when meQTLs are not near a TSS
(median distance of ~7Kb) and a negative correlation between methylation and expression was
shown when meQTLs are near a TSS (median distance of ~1Kb), which is consistent with
findings that active promoters show low DNA methylation whereas bodies of actively transcribed
genes show high DNA methylation (Gutierrez-Arcelus, Lappalainen et al. 2013, Banovich, Lan et
al. 2014).
Table 1.2 Sources of eQTL databases

Tool: NCBI eQTL browser
Features: cis-eQTL from liver, lymphoblastoid, brain
URL: http://www.ncbi.nlm.nih.gov/projects/gap/eqtl/index.cgi

Tool: seeQTL
Features: browser for cis-eQTL and trans-eQTL from lymphoblastoid, brain, monocyte
URL: http://www.bios.unc.edu/research/genomic_software/seeQTL/
PMID: 22171328

Tool: Chicago eQTL
Features: QTL (eQTL, dsQTL, trQTL, exonQTL) from lymphoblastoid, brain, liver, fibroblast, T-cells
URL: http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/

Tool: GTEx Portal
Features: >60 tissues eQTL data and eQTL IGV browser
URL: http://www.gtexportal.org/home/
PMID: 25954001

Tool: GeneVar
Features: >5 tissues eQTL, meQTL data and visualization
URL: https://www.sanger.ac.uk/resources/software/genevar/
PMID: 20702402

Tool: Blood eQTL
Features: Blood cis- and trans-eQTLs
URL: http://genenetwork.nl/bloodeqtlbrowser/
PMID: 24013639

Tool: Geuvadis
Features: QTL (eQTL, mirQTL, trQTL) from lymphoblastoid cell lines
URL: http://www.ebi.ac.uk/Tools/geuvadis-das/
PMID: 24037378

1.4 Experimental approaches to identify target genes of regulatory and eQTL
SNPs
Although investigators often use either functional annotation or eQTL to identify prioritized
SNPs (Figure 1; collectively referred to as Candidate Functional SNPs), using a combination
approach may help to rank the individual lists for follow-up study. The set of Regulatory SNPs
(especially those obtained using high LD and not fine-mapping) is usually larger than the set of
eQTL SNPs. Therefore, determining if any of the large set of enhancers that harbor risk-
associated SNPs are also in eQTL with one or more genes may identify a subset of risk-
associated enhancers that have a higher probability of having an impact on gene expression.
Similarly, although the set of eQTL SNPs is usually not large, it is difficult to perform functional
follow-up studies of the entire set. Therefore, determining which of the eQTL SNPs are also
located in a regulatory region would help to prioritize the list. Having identified a set of Regulatory
and/or eQTL SNPs, the next logical step would seem to be functional follow-up studies of the
genes regulated by the SNP-harboring elements. However, it is not easy to determine the actual
target gene of a regulatory element. It is a commonly held assumption that a risk-associated SNP
that falls within a promoter region influences expression of that particular gene. In fact, if the gene
in question has a known biological function consistent with the possibility that it may influence
cellular phenotype in a manner consistent with the disease being studied, then investigators often
go straight to studying that gene. However, some have postulated that promoters can also have
enhancer activity and can influence the expression of other genes (Andersson, Sandelin et al.
2015). Thus, it may be premature to assume that SNPs located near to the 5’ end of a gene only
influence the regulation of that particular gene. It is even more difficult to predict what gene is
directly regulated by an enhancer because they are located distal from a TSS, can regulate genes
in an orientation-independent manner, and, most importantly, can skip over nearby genes to
regulate genes farther away. Thus, although one hypothesis is that the gene nearest to a
promoter or an enhancer that harbors a regulatory SNP is the disease-related gene, in most
cases this hypothesis has not been proven (or even tested). Therefore, it is best to take an
unbiased approach and experimentally identify target genes of the regulatory elements harboring
Candidate Functional SNPs. This can be accomplished by manipulating the genomic region
containing the SNPs in question and determining if expression of the putative target gene is in fact
altered and/or by testing physical interactions between the region harboring the SNP and a
putative target gene using looping assays.
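The combination approach described earlier in this section amounts to intersecting the list of Regulatory SNPs with the list of eQTL SNPs and carrying along the genes linked to each. A minimal sketch is given below; the function name and the data layout (a set of SNP IDs and a mapping from SNP ID to associated genes) are assumptions made for illustration.

```python
def prioritize(regulatory_snps, eqtl_genes_by_snp):
    """Keep Regulatory SNPs that are also eQTL SNPs, with their associated genes."""
    return {snp: sorted(eqtl_genes_by_snp[snp])
            for snp in sorted(regulatory_snps)
            if snp in eqtl_genes_by_snp}
```

The genes returned this way remain statistical candidates; as argued above, the actual target gene still has to be established experimentally.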
1.4.1 Deletion of regulatory elements harboring prioritized SNPs
We suggest that the first step toward identifying a target gene of a distal element should
be to delete or epigenetically modify the element and study subsequent effects on the
transcriptome (Figure 1; Approach A). We note that because deleting or inactivating an entire
promoter region would automatically eliminate expression from that gene (making it difficult to
determine the exact role of the SNP), we recommend that analysis of Candidate Functional SNPs
located in promoters should begin with specific targeting methods described below (Figure 1;
Approach B). A traditional method to study a distal regulatory element in the genome of cultured
cells or in a mouse model has been to remove or replace a wildtype regulatory element with a
mutated version using loxP and the Cre recombinase. In a recent study, the loxP-Cre
recombination method was used to delete an enhancer from the mouse genome that is located
within a region that corresponds to a region of the human genome harboring a colon cancer
GWAS SNP (Sur, Hallikas et al. 2012). Mice lacking this enhancer element were resistant to
intestinal tumor formation, possibly due to downregulation of Myc, which is located 335 kb from
the deleted sequence. Although these results are promising, there are several disadvantages in
using the loxP-Cre system. For example, creation of the plasmids needed for homologous
recombination is laborious and the insertion of the foreign loxP DNA sequence into the genome
could potentially affect gene expression (Meier, Bernreuther et al. 2010). Fortunately, recently
developed technologies that are based on zinc finger proteins (ZFPs), transcription activator-like
effectors (TALEs), or the clustered regularly interspaced short palindromic repeats
(CRISPR)/CRISPR-associated protein (Cas) have allowed researchers to investigate functionality
of genomic elements in the endogenous context in almost any organism (Hsu, Lander et al. 2014).
Using these genomic engineering platforms, regulatory elements can be deleted from the genome
without the introduction of exogenous sequences. In addition, the same genomic platforms can be
used to epigenetically alter the genomic sequences containing a risk-associated SNP.
Regulatory elements harboring Candidate Functional SNPs can be deleted using zinc
finger-based nucleases (ZFNs), TALE-based nucleases (TALENs), or Cas9-based nucleases
(CRISPR/Cas9) (Hsu, Lander et al. 2014, Sander and Joung 2014). ZFNs and TALENs work as
heterodimers, with each monomer consisting of multiple DNA binding domains and a partial Fok1
nuclease. The DNA binding domains of ZFNs are tandem arrays of C2H2 zinc fingers, with each
finger recognizing 3 bp of DNA; ZFNs are created such that each half of the heterodimer
recognizes between 9 and 18 bp of DNA at the target cut site. The DNA binding domains of TALENs
are composed of a tandem array of repetitive 33-35 amino acid modules, with each module
recognizing 1 bp of DNA; TALENs are usually created such that they recognize between 12 and 19
bp of DNA at the target cut site. Binding of a pair of heterodimeric ZFNs or TALENs at target
sequences leads to Fok1 dimerization and DNA double strand breaks (DSB) (Kim and Kim 2014).
Because TALENs can be assembled based on a single bp recognition schema, they can be
targeted to a larger percentage of the genome than can ZFNs, which are based on a 3 nt motif
schema. Also, the DNA binding domains of TALENs are easier to assemble than are zinc finger
domains. In contrast to ZFNs and TALENs, which rely on protein-target DNA interaction,
CRISPR nucleases use complementary binding between RNA and DNA (Sander and Joung
2014). The most widely used CRISPR/Cas9 system has two components: the Cas9 nuclease
and a single guide RNA (sgRNA) that can bind to a specific target DNA sequence and recruit Cas9 to that
genomic location, resulting in a double strand break (Barrangou 2014, Sander and Joung 2014).
Construction of CRISPRs requires only cloning RNA sequences that will hybridize to target sites
(Hsu, Lander et al. 2014). Because of the ease of cloning, reports of high targeting specificity
(Hsu, Lander et al. 2014), and accessibility of the guide RNAs to regions of methylated DNA (Hsu,
Scott et al. 2013), most investigators have begun using the CRISPR/Cas9 system to make DSBs
in human cells (Sander and Joung 2014).
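Because CRISPR targeting depends only on RNA-DNA complementarity, candidate guide sites can be listed by a simple sequence scan. The sketch below rests on an assumption not stated in the text: that the nuclease is Streptococcus pyogenes Cas9, which requires an NGG protospacer-adjacent motif (PAM) immediately 3' of a 20 nt target. The function name is hypothetical, only the forward strand is scanned, and no off-target scoring is attempted.

```python
def find_protospacers(seq, guide_len=20):
    """Return (start, protospacer, PAM) for forward-strand sites followed by an NGG PAM."""
    seq = seq.upper()
    sites = []
    # The last valid start leaves room for the protospacer plus a 3 nt PAM.
    for i in range(len(seq) - guide_len - 2):
        pam = seq[i + guide_len: i + guide_len + 3]
        if pam[1:] == "GG":
            sites.append((i, seq[i: i + guide_len], pam))
    return sites
```

To delete a regulatory element, one site would be chosen on each side of the element, so in practice the scan would be run on the flanking sequences and on their reverse complements as well.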
Deletion of regulatory elements by ZFNs, TALENs, or CRISPR nucleases requires targeting
functional nucleases (heterodimeric in the case of ZFNs or TALENs or monomeric in the case of
CRISPR nucleases) to both sides of the element. DSBs occur at both target sites, resulting in
local sequence alterations at each target site and loss of the intervening sequences. Recent
studies have shown that genomic regions ranging from several bp to more than 1 Mb can be
deleted (Li, Rivera et al. 2014, Webster, Barajas et al. 2014, Kraft, Geuer et al. 2015, Li, Shou et
al. 2015), with deletion efficiency having an inverse correlation with the size of the deleted region
(Canver, Bauer et al. 2014). The frequency of obtaining biallelic deletions in normal cells or
cancer cells having almost diploid chromosomal numbers is much higher than when multi-copy
genomic regions (created by amplification or increased chromosomal copy numbers) in cancer
cells are targeted. In most cases, many clones must be analyzed to identify cells that lack all
copies of the regulatory element under study. It is also important to keep in mind that if a
regulatory element plays a large role in controlling expression of an essential gene, then deletion
of all copies of that element from the genome could affect cell proliferation or survival (Canver,
Bauer et al. 2014); in this case, cells having monoallelic or partial loss of the copies of the element
(in the case of aneuploid cancer cells) must be analyzed. Several recent studies have used
genomic nucleases to delete regulatory elements and identify target genes. For example, Li et al.
deleted a 13 Kb section of a super-enhancer located 100 Kb downstream of the Sox2 gene and
observed ~90% downregulation of Sox2 gene expression (Li, Rivera et al. 2014). Meyer et al.
deleted a Vitamin D receptor (VDR) binding region located 10 Kb upstream of the Mmp13 gene
and found that VDR-mediated regulation of Mmp13 was abolished. They also deleted a RUNX2
binding region located 30 Kb upstream of Mmp13 and observed a complete loss of Mmp13
expression (Meyer, Benkusky et al. 2015). As described in Chapter 2 and Chapter 3, I have
deleted enhancers harboring Regulatory SNPs from colon cancer cells and have identified genes
whose expression is altered.
1.4.2 Epigenetic modification of regulatory elements that harbor prioritized risk
SNPs
An alternative method to identify target genes for distal Candidate Functional SNPs is to
modulate the chromatin state of the element using ZFs or TALEs fused to a chromatin modifying
domain or by using a Cas9 that has no nuclease activity (dCas9) fused to a chromatin modifying
domain; such engineered systems are termed “epigenetic toggle switches”. To mimic deletion of
an enhancer, epigenetic repressors can be employed. The lysine-specific histone demethylase
KDM1A (also known as LSD1) and a KRAB domain that recruits the KAP1/SETDB1 histone
methylase have been fused to TALEs and dCas9; constructs having KDM1A should decrease
active histone methylation marks whereas constructs having the KRAB domain should increase
inactivating histone methylation marks. One study has suggested that dCas9-KRAB is more
efficient than TALE-KRAB for inactivating enhancers, perhaps due to steric hindrance caused by
bound dCas9 in preventing recruitment of activating factors (Gao, Tsang et al. 2014). Another
study that targeted dCas9-LSD1 to the distal enhancers of Oct4 and Tbx3 showed loss of
H3K4me2 and a dramatic decrease of H3K27Ac at enhancer regions. Interestingly, the action of
dCas9-LSD1 was shown to be specific to enhancers, with very little consequences if targeted to
promoters. In contrast, in this study dCas9-KRAB was more effective at promoters, resulting in an
increase of H3K27me3 or H3K9me3 level at targeted promoters but not at targeted enhancers
(Kearns, Pham et al. 2015). To achieve the opposite effect, investigators have used domains such
as VP64, an activating domain that recruits histone acetylases (HATs), as well as the enzymatic
domain of the p300 HAT to increase the levels of active epigenetic marks at regulatory elements.
For example, Gao et al. modified enhancers that regulate the Oct4 gene. These enhancers are
normally only active in embryonic stem cells and are marked by the repressive histone
modification H3K27me3 in mouse embryonic fibroblasts. However, TALE-VP64 constructs
targeted to these enhancers decreased levels of H3K27me3 and increased levels of the active
marks H3K27Ac and H3K4me1 (Gao, Tsang et al. 2014). Also, a recent study showed that the
catalytic domain of the HAT P300 (P300core) fused to dCas9 could activate target enhancers and
promoters. In this study, a single gRNA targeting an enhancer region with dCas9-P300core was
sufficient to activate target gene expression, whereas other dCas9 activators required several
gRNAs to achieve high levels of gene expression (Hilton, D'Ippolito et al. 2015). The authors
suggested that the P300 domain is superior to the VP64 domain because P300 directly regulates
histone acetylation whereas VP64 must recruit a HAT. It is possible that many of the differences
in effectiveness of the various activating or repressing epigenetic toggle switches in the different
studies are due to specific features of the exact promoters and enhancers that were studied.
However, considering the ease of cloning guide RNAs, it seems that CRISPR/dCas9 constructs
such as dCas9-P300core and dCas9-LSD1 could become a standard method used to identify
target genes after turning on repressed enhancers or turning off activated enhancers, respectively.
1.4.3 Specific targeting of prioritized SNPs
Once deletion or epigenetic modification of a distal regulatory element has been shown to
have functional consequences, a more detailed analysis can be performed to compare the effects
of the risk and non-risk alleles and to identify specific nucleotides within the element important for
regulation; this same approach should be used to study the effect of a SNP on the activity of a
promoter region (Figure 1; Approach B). In traditional approaches, investigators have used
luciferase reporter assays to test individual TF binding sites of enhancers. Such studies require
removing putative enhancer elements from their native chromosomal location and ligating them
into luciferase constructs such that they regulate a heterologous promoter (Melnikov, Murugan et
al. 2012, Fortini, Tring et al. 2014). In addition to not using the correct promoter to test enhancer
elements, the choice of cell type could influence the results for enhancers, which function in a
highly cell-type specific manner. Another approach using mice involves pronuclear injection of
endogenous versus mutated enhancer sequences linked to a lacZ gene (Palmiter and Brinster
1986). These approaches have issues regarding copy number and position-dependent effects on
reporter gene activity and effects of foreign DNA sequences on the native genomic landscape that
perturb endogenous gene expression (Palmiter and Brinster 1986). More recent studies have
used genomic engineering to compare endogenous versus mutated regulatory elements. When
CRISPR/Cas9 makes a double stranded break, cells use either nonhomologous end-joining
(NHEJ) or homology-directed repair (HDR) to repair the break (Sander and Joung 2014). DNA
repair mediated by NHEJ is used when two CRISPR nucleases are targeted to either side of an
enhancer, resulting in local alterations at each target site and loss of the intervening sequences.
However, because NHEJ results in small insertions or deletions at the site of cleavage this
method can be used for disrupting transcription factor motifs if one guide RNA is precisely
targeted to the motif. Another way to study the precise effects of removing or altering a SNP is to
substitute a section of the genome with exogenously provided DNA, using the HDR pathway. By
providing, along with the guide RNAs and Cas9, a donor DNA fragment that is basically identical
to the genomic sequence but contains the alternative SNP allele or a mutation of a TF motif, a
precise exchange of genomic regions can be accomplished.
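The donor design described above can be illustrated with a short Python sketch. It is only a schematic of the idea (a fragment identical to the genomic sequence except for the SNP position), not a protocol used in this work; the function name, the arm length parameter, and the example sequence are hypothetical.

```python
def build_donor(genomic_seq, snp_index, alt_allele, arm_length):
    """Return a donor sequence that matches the genome except at the SNP.

    genomic_seq: reference sequence around the SNP (hypothetical input)
    snp_index:   0-based position of the SNP within genomic_seq
    alt_allele:  single base to place at the SNP position
    arm_length:  number of flanking bases kept on each side (homology arms)
    """
    if len(alt_allele) != 1:
        raise ValueError("alt_allele must be a single base")
    if genomic_seq[snp_index] == alt_allele:
        raise ValueError("alt_allele is identical to the genomic allele")
    # Left and right homology arms are copied unchanged from the genome.
    left_arm = genomic_seq[max(0, snp_index - arm_length):snp_index]
    right_arm = genomic_seq[snp_index + 1:snp_index + 1 + arm_length]
    return left_arm + alt_allele + right_arm


# Toy example: swap the G at index 4 for a T, keeping 3 bp arms.
donor = build_donor("AACCGGTTAA", 4, "T", 3)
print(donor)
```

Real donors use much longer homology arms than this toy example, and in practice the donor is often altered so that it is not itself cut by Cas9; neither point is modeled here.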
In one study, Vierstra et al. deleted three DHSs located 62, 58, and 55 Kb away from the
TSS of BCL11A, a TF that represses fetal hemoglobin (HbF) levels. Deletion of the DHSs located
at 55 and 58 Kb away using TALENs led to downregulation of BCL11A and increased level of
HbF, but no effect was seen after deletion of the DHS located 62 Kb away (Vierstra, Reik et al.
2015). This study provides an excellent example supporting the recommendation for deletion of
regulatory elements prior to performing more detailed mutational analyses of an element. In this
case, studies of individual binding sites in the DHS located 62 Kb away would not have been
useful. Following upon the deletion studies, Vierstra et al. then used ZFNs to disrupt five
transcription factor footprints in the enhancer located 58 Kb away from BCL11A and found that
disruption of one of the TF footprints led to reduction of BCL11A. Another method for identifying
critical regions of an enhancer is to use tiled guide RNAs and Cas9. Investigators used ~150 to
~200 different guide RNAs to target the +55, +58, and +62 DHS regions of the BCL11A locus.
They found that guide RNAs that disrupted the +58 DHS showed the most effect (Canver, Smith
et al. 2015). Even though HDR is less efficient compared to NHEJ, the fact that this mechanism
can be used to exchange DNA fragments between a plasmid and the genome makes this the
method of choice to study SNP-specific differences. Several studies have used CRISPR/Cas9
and HDR-mediated genome editing to change SNPs in mice and cell culture model systems
(Claussnitzer, Dankel et al. 2014, Lee, Ye et al. 2014, Long, McAnally et al. 2014, Yin, Xue et al.
2014, Han, Slivano et al. 2015). The most common method is to introduce plasmids that express
the guide RNAs and Cas9, along with a plasmid that contains the donor sequence (e.g. an
enhancer fragment that has the SNP changed to the other allele). Claussnitzer et al. transfected
guide RNAs along with Cas9 and donor DNA plasmids into cultured adipose cells to switch a
Type 2 Diabetes risk SNP to the non-risk SNP allele, affecting binding of a TF and causing a
decrease in target gene expression (Claussnitzer, Dankel et al. 2014). Other studies have
reported an increased efficiency of HDR-mediated genome editing using purified guide RNAs and
Cas9 mRNA in place of the expression plasmids and single stranded oligodeoxynucleotides
having homology arms in place of the double stranded DNA (Wang, Yang et al. 2013, Yang,
Wang et al. 2013). Using this strategy in a mouse model, Han et al. substituted a 5 nucleotide
sequence within an intronic region of the Cnn1 gene, which disrupted a CArG box for SRF factor
and caused a reduction in expression of Cnn1 (Han, Slivano et al. 2015). Finally, these genomic
tools can be used to study orientation dependence of a region harboring a Candidate Functional
SNP. CTCF-mediated loops are frequently formed in a convergent orientation involving
homodimerization of CTCF proteins located quite far apart on the genome, with the orientation of
the CTCF sites determining the choice of interaction between specific enhancers and promoters
(Ong and Corces 2011, Guo, Xu et al. 2015). Using 2 guide RNAs and Cas9, Guo et al. inverted
the region containing a CTCF binding site, switching the CTCF orientation with respect to
surrounding CTCF sites; they found that this inversion resulted in changes in gene expression
patterns (Guo, Xu et al. 2015).
1.5 Disease-related functional analyses
As described above, an integrated and ordered approach should be used to investigate the role of
non-coding SNPs in gene expression. Namely, a combination of deletion or modification of a
regulatory element plus eQTL analyses can provide a list of candidate target genes. However, an
analysis of non-coding risk-associated SNPs is not complete without further characterization of
how the gene(s) whose activity is influenced by a particular SNP affect initiation, progression, or
manifestation of the disease under study. Identifying the causal gene(s) will provide insights into
the disease and perhaps also provide new diagnostic or therapeutic targets. It is likely that the
results of manipulation of the regulatory element or the eQTL analyses identified more than one
candidate target gene. Thus, it may be difficult to know which of the genes whose expression is
linked to the SNP should be tested in phenotypic assays. If one of the candidate target genes is
tested with negative results, this could mean either that the candidate SNP is not really linked to
the disease, that the wrong assay was used, or that the wrong candidate gene was assayed. One
approach to deal with this uncertainty is to first analyze the effects of deleting or repressing the
element in a functional assay; if the element can be shown to affect a particular phenotype, then
individual candidate target genes can subsequently be studied using that same assay.
If studying GWAS loci related to cancer, methods that are used for functional follow-up
studies include proliferation and cell migration assays (Edwards, Beesley et al. 2013). However,
cultured cancer cell lines are not ideal model systems because of their genomic instability (which
leads to variable karyotypes) and because isolated cancer cell lines grown in tissue culture dishes
do not properly represent the complex environment of the cells in the context of either a normal
tissue or a tumor. Investigators have begun to use 3 dimensional organoids (Matano, Date et al.
2015), normal cell lines, or isogenic ES or iPS cells (Schwank, Koo et al. 2013, Grobarczyk,
Franco et al. 2015) to try to reproduce a more natural cellular environment for functional studies.
However, even these assays do not allow the study of effects seen only within a complex tissue. If
a mouse model exists that closely reproduces the human disease, then perhaps this would be the
ideal system to use; the phenotypic consequences of a SNP and/or putative causal target gene
may be more consequential in a living organism than in a short-term cell culture assay. For
example, when a mouse lacking a homologous enhancer that is associated with colon cancer in
humans (described above) was crossed to a mouse that spontaneously develops tumors in the
intestine and colon, the incidence of polyp formation was reduced in their offspring (Sur, Hallikas
et al. 2012). Another issue to consider is that an individual SNP (or regulatory element) may not
cause dramatic phenotypic differences. Instead, it may be necessary to study combinations of
SNPs. A recent study evaluating the combinatorial effects of SNPs showed that different SNPs in
the same LD block identified different enhancers that cooperatively regulate the same target gene
(Corradin, Saiakhova et al. 2014). Such studies suggest that altering an individual GWAS-
identified regulatory element may have fewer functional consequences than inactivation of a target
gene. However, if multiple target genes work together to contribute to disease risk then even
moving from SNP to target gene may not solve the problem. Perhaps investigators should use
multiplexing CRISPR/Cas9 systems (Cheng, Wang et al. 2013, Cong, Ran et al. 2013, Zalatan,
Lee et al. 2015) to simultaneously target many regulatory elements and/or putative target genes
from several different risk-associated loci to test for combinatorial effects in phenotypic assays
(Corradin and Scacheri 2014).
If an appropriate assay is identified whose outcome is influenced by loss or modification of
the SNP or regulatory element, then candidate target genes can be tested using that same assay
in the hopes of identifying the causal gene. Commonly used approaches to investigate the
function of a candidate causal gene include overexpressing an exogenous form of the gene (e.g.
using a cloned cDNA) or reducing levels of the endogenous gene using RNAi tools. More recently,
alternative approaches for overexpressing or repressing genes have been developed that are
based on the genomic engineering tools described above. For example, investigators have used
CRISPR/Cas9 nucleases to mutate coding regions (Xue, Chen et al. 2014) and epigenomic tools
such as TALEs and dCas9 fused to activator or repression domains have been used to regulate
the promoter of a gene of interest (Kabadi and Gersbach 2014). However, it is important to
consider that overexpressing a gene from a cDNA may not appropriately provide the correct
splice variant (Prelich 2012) and that inactivation methods such as siRNA, shRNA, or genomic
nucleases have the inherent problem of off-target effects (Sigoillot, Lyman et al. 2012).
Investigators often choose putative causal genes based on a) proximity to the regulatory
element, b) degree to which expression is affected, or c) a gene function that can be easily
imagined to contribute to the disease risk. Each of these choices is fraught with problems. For
example, as discussed above, genes are not necessarily near their regulatory elements. Another
confounding issue is that changes in mRNA do not always lead to similar changes in protein
levels (Ghazalpour, Bennett et al. 2011, Battle, Khan et al. 2015) and thus the genes that show
the largest changes in mRNA might not necessarily produce the largest changes in protein.
Finally, gene function is often assigned based on the first set of experiments performed on that
gene; many genes function in multiple networks, often in a tissue-specific manner. Therefore, it is
important to keep in mind that identifying a causal gene may require testing several different
candidates.
A limitation of the genomic and epigenomic editing technologies described above is that it
is hard to distinguish target genes directly regulated by an enhancer from genes whose
expression has been indirectly affected as a consequence of the expression changes of the direct
targets (e.g. changes in signaling pathways or proliferative states). However, physical interaction
assays can be used to attempt to distinguish genes that are directly vs. indirectly affected by a
risk-associated enhancer. Many interaction assays are based on principles of the chromosome
conformation capture (3C) assay, which involves capturing chromosome interactions by
formaldehyde cross-linking, followed by digestion with a restriction enzyme and subsequent
ligation of DNA regions that were brought together by protein-protein interactions; ligation
frequency between two loci is assessed using qPCR (Rivera and Ren 2013). Using 3C, Zhang et
al. investigated all possible interactions between a prostate-specific enhancer and genes that are
within an ~3 Mb window, identifying a single loop to a gene that is 1 Mb away from the enhancer
(Zhang, Cowper-Sal lari et al. 2012). However, the results from 3C assays are limited to a pre-
selected region, excluding the discovery of interactions with regions beyond the tested genomic
window. A modification of 3C, Circular Chromosome Conformation Capture followed by
sequencing (4C-seq), allows the investigation of all possible interactions mediated by a specific
enhancer by employing high throughput sequencing instead of qPCR. Using 4C-seq, investigators
showed that enhancers located within an intron of the FTO gene and harboring Obesity and Type
2 Diabetes GWAS-identified SNPs do not interact with the FTO promoter but instead interact with
the IRX3 gene which is located 500 Kb downstream (Smemo, Tena et al. 2014). Hi-C, another
variation of 3C, can be used to study all chromatin interactions within the genome. Unfortunately,
the majority of Hi-C experiments capture interactions separated by at least 1 Mb (Corradin and
Scacheri 2014) and thus may miss “close-by” enhancer-promoter loops. However, a recent
modification of Hi-C, called Capture Hi-C, which increases the resolution of the mapped
interactions, has been used to study colon cancer risk SNPs. These experiments identified
interactions that are enriched with colon cancer-specific transcription factor binding sites (Jager,
Migliorini et al. 2015). This technique was also used to identify short-range interactions between
an enhancer and a gene 26 Kb away (Hughes, Roberts et al. 2014). Therefore, to study
interactions between enhancers and promoters, Capture-C is a recommended method since it not
only provides a better resolution of ~1-2 Kb, but also can detect hundreds of interactions in one
experiment. Importantly, even though these looping assays that identify interactions between
regulatory elements harboring SNPs and promoters can provide clues as to the identity of putative
target genes, it is important to compare these results to those in which the regulatory element has
been experimentally manipulated. Genes whose expression is linked to the regulatory element and that are
also involved in promoter-enhancer loops are likely to be direct targets, whereas genes whose
expression is linked to the element but no loop is found can either be indirect targets or direct
targets that are difficult to identify due to limitations of the current looping assays; it is also
possible that enhancer-promoter loops will be identified that are not related to genes whose
expression changes upon manipulation of the enhancer. Finally, it is important to note that a
gene whose expression is indirectly affected by a non-coding SNP could be a more important
diagnostic or therapeutic target than the directly affected gene. Thus, it is important to identify both
the direct targets of a risk-associated enhancer and other genes affected by reduction of levels of
the direct targets.
2 Chapter 2: Functional annotation of colon cancer risk SNPs
The work described in this chapter has been published in Nature Communications, 5: 5114. doi:
10.1038/ncomms6114. Lijing Yao, Yu Gyoung Tak, Benjamin P. Berman and Peggy J. Farnham.
“Functional annotation of colon cancer risk SNPs”. Lijing Yao is responsible for all bioinformatic
analyses and assisted with manuscript preparation; Yu Gyoung Tak performed the enhancer
characterizations (Table 2.3 and Figure 2.4) and helped to edit the manuscript; Benjamin P.
Berman advised L.Y. in bioinformatic analyses and Peggy J. Farnham conceived the project and
wrote the manuscript.
2.1 Abstract
Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States.
Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms
(SNPs) associated with increased risk for CRC. A molecular understanding of the functional
consequences of this genetic variation has been complicated because each GWAS SNP is a
surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Using
genomic and epigenomic information, we tested the hypothesis that the GWAS SNPs and/or
correlated SNPs are in elements that regulate gene expression, identifying 23 promoters and 28
enhancers. Using gene expression data from normal and tumor cells, we identified 66 putative
target genes of the risk-associated enhancers (10 of which were also identified by promoter
SNPs). Employing CRISPR nucleases, we deleted one risk-associated enhancer and identified
genes showing altered expression. We suggest that similar studies should be performed to fully
characterize all CRC risk-associated enhancers.
2.2 Introduction
Colorectal cancer (CRC) ranks among the leading causes of cancer-related deaths in the
United States. The incidence of and death from CRC are in the top 3 of all cancers in the United
States for both men and women (http://apps.nccd.cdc.gov/uscs/toptencancers.aspx). It is
estimated that 142,820 men and women will be diagnosed with, and 50,830 men and women will
die of, cancer of the colon and rectum in 2013
(http://seer.cancer.gov/statfacts/html/colorect.html). A better understanding of the regulatory factors and signaling
pathways that are deregulated in CRC could provide new insight into appropriate
chemotherapeutic targets. Decades of studies have revealed that certain genes and pathways,
such as WNT, RAS, PI3K, TGF-B, p53, and mismatch repair proteins, are important in the
initiation and progression of CRC (Fearon 2011). In an attempt to obtain a more comprehensive
view of CRC, two new approaches have been used: exome sequencing of tumors and genome-
wide population analyses of human variation. The Cancer Genome Atlas (TCGA) has taken the
first of these new approaches in the hopes of moving closer to a full molecular characterization of
the genetic contributions to CRC, analyzing somatic alterations in 224 tumors (2012). These
studies again implicated the WNT, RAS, and PI3K signaling pathways. The second new approach
identifies single nucleotide polymorphisms associated with specific diseases using genome wide
association studies (GWAS). GWAS has led to the identification of thousands of single nucleotide
polymorphisms (SNPs) associated with a large number of phenotypes (Hindorff, Sethupathy et al.
2009, Manolio 2010). Such studies identify what are known as tag SNPs that are associated with
a particular disease. Specifically for CRC, 25-30 tag SNPs have been identified (Zanke,
Greenwood et al. 2007, Houlston, Webb et al. 2008, Houlston, Cheadle et al. 2010, Dunlop,
Dobbins et al. 2012, Bauer, Kamran et al. 2013, Ding, Lee et al. 2013).
Although identification of tag SNPs is an important first step in understanding the relationship
between human variation and risk for CRC, a major challenge in the post-GWAS era is to
understand the functional significance of the identified SNPs (Boyle, Hong et al. 2012). It is
critical to advance the field by progressing from a statistical association between genetic variation
and disease to a molecular understanding of the functional consequences of the genetic variation.
Progress towards this goal has been mostly successful when the genetic variation falls within a
coding region. Unfortunately, most SNPs identified as associated with human disease in large
GWAS studies are located within large introns or distal to coding regions, in what in the past has
been considered to be the unexplored territory of the genome. However, recent studies from the
ENCODE Consortium have shown that introns and regions distal to genes contain regulatory
elements. In particular, the ENCODE Consortium has made major progress in defining hundreds
of thousands of cell-type specific distal enhancer regions (2012, Frietze, Wang et al. 2012,
Zentner and Scacheri 2012). Comparison of GWAS SNPs to these enhancer regions has
revealed several important findings. For example, work from ENCODE and others has shown
that many GWAS SNPs fall within enhancers, DNase hypersensitive sites, and transcription factor
binding sites (Maurano, Humbert et al. 2012, Schaub, Boyle et al. 2012, Akhtar-Zaidi,
Cowper-Sal-lari et al. 2012). It is also clear that the SNP whose functional role is most strongly supported
by ENCODE data is often a SNP in linkage disequilibrium (LD) with the GWAS tag SNP, not the
actual SNP reported in the association study (Hardison 2012).
These recent reports clearly show that regulatory elements can help to identify important SNPs
(Boyle, Hong et al. 2012, Farnham 2012, Hardison 2012). However, the studies were performed
using all available ENCODE data and did not focus the functional analysis of cancer-associated
SNPs on the regulatory information obtained using the relevant cell types. Using epigenetic marks
obtained from normal colon and colon cancer cells, we have identified SNPs in high LD with
GWAS SNPs that are located in regulatory elements specifically active in normal and/or tumor
colon cells. Characterization of transcripts near CRC risk-associated promoters and enhancers
using RNA expression data allows the prediction of putative genes and non-coding RNAs
associated with an increased risk of colon cancer. Using genomic nucleases, we deleted one
risk-associated enhancer and compared the deregulated genes with those predicted to be targets
of that enhancer. Our studies suggest that transcriptome characterization after precise deletion of
a risk-associated enhancer will be a useful approach for post-GWAS analyses.
2.3 Results
2.3.1 CRC risk-associated SNPs linked to a specific gene
For our studies, we chose 25 tag SNPs, 4 of which have been associated with an increased
risk for CRC in Asia-derived case-control cohorts and the rest in Europe-derived case-control
cohorts; the genomic coordinates of each SNP can be found in Table 2.1. Of these 25 tag SNPs,
only one is found within an exon, occurring in the third exon of the MYNN gene and resulting in a
synonymous change that does not lead to a coding difference. However, there are hundreds of
SNPs in high LD with each tag SNP and it is possible that some of the high LD SNPs may reside
in coding exons. To address this possibility we used a bioinformatics program called FunciSNP to
identify SNPs correlated with CRC tag SNPs that also intersect the set of coding exons in the
human genome (Coetzee, Rhie et al. 2012). FunciSNP is an R/Bioconductor package that allows
a comparison of population-based correlated SNPs from the 1000 Genomes Project
(http://www.1000genomes.org/) with any set of chromatin biofeatures. In this initial analysis, we
chose coding exons from the Gencode 15 dataset (http://www.gencodegenes.org/releases/) as
the biofeature. Because LD varies with the population, to identify population-based correlated
SNPs we specified the Asian population for analysis of the 4 tag SNPs identified using Asian-
derived case-control cohorts and we specified the European population for analysis of the rest of
the tag SNPs. Using FunciSNP, we identified 240 unique SNPs that are correlated with the 25 tag
SNPs at an r² > 0.1 and are within a coding exon (see Supplementary Figure S2.1). We then
used snpeff (http://snpeff.sourceforge.net/ (Cingolani, Platts et al. 2012)) to determine that 40 of
these correlated SNPs create non-synonymous changes; however, limiting the SNPs to those
with an LD of r² > 0.5 with the tag SNP reduced the number to only 13. Using polyphen-2
(http://genetics.bwh.harvard.edu/pph2/ (Adzhubei, Schmidt et al. 2010)) and provean
(http://provean.jcvi.org/index.php (Choi, Sims et al. 2012)), only 2 potentially damaging SNPs at
r² > 0.5 were found, both in POU5F1B (Figure 2.1). At the less restrictive r² > 0.1, 4 other genes
were also found to harbor a damaging SNP (RHPN2, UTP23, LAMA5, and FAM186A). To
determine if these genes are expressed in colon cells, we performed two replicates of RNA-seq
for HCT116 cells and also used RNA-seq data from the Roadmap Epigenome Mapping
Consortium for normal sigmoid colon to examine expression. After analysis of both sets of RNA-
seq data, we categorized transcripts that are not expressed as having less than 0.5 FPKM
(Supplementary Figure S2.2). Analysis of the RNA-seq data revealed that POU5F1B and
FAM186A are not expressed in either the normal sigmoid colon or HCT116 cells (however,
these genes are expressed in a cohort of TCGA colon tumors; see Table 2.2).
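The LD and expression cutoffs used in this paragraph can be sketched in Python. This is an illustration only and is not the FunciSNP implementation; the record layout and function names are hypothetical. The r² value is computed from haplotype and allele frequencies using the standard definition, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB, and a transcript is treated as not expressed below 0.5 FPKM, as stated above.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic SNPs.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a, p_b: frequencies of allele A and allele B
    """
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


def filter_by_ld(snps, min_r2):
    """Keep correlated SNPs whose r^2 with the tag SNP exceeds min_r2.

    snps: list of dicts with at least an "r2" key (hypothetical layout).
    """
    return [s for s in snps if s["r2"] > min_r2]


def is_expressed(fpkm, cutoff=0.5):
    """A transcript with less than `cutoff` FPKM is called not expressed."""
    return fpkm >= cutoff
```

Applying `filter_by_ld` first with `min_r2=0.1` and then with `min_r2=0.5` mirrors the two stringency levels reported in the text.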
Another way to link a SNP to a particular gene is if the SNP falls within a promoter region. We
again used FunciSNP, but this time the biofeature analyzed corresponded to the region from
-2000 to +2000 nt of the transcription start site (TSS) of each transcribed gene (we analyzed
coding and non-coding transcripts from GENCODE V15). We chose to include 2 kb upstream
and downstream of the start site as the promoter proximal regions because several studies
(Koudritsky and Domany 2008, Stergachis, Haugen et al. 2013), as well as visual inspection of
the ENCODE TF ChIP-seq tracks, have shown that transcription factors can bind on either side of
a transcription start site. Using an r² > 0.1, we found 684 correlated promoter SNPs, which were
reduced to 233 SNPs at r² > 0.5 (Figure 2.1 and Supplementary Figure S2.3). Many of these
SNPs fall within the same promoter regions. When collapsed into distinct promoters, we identified
the TSS regions of 17 protein coding genes and 2 noncoding RNAs which are expressed in
HCT116 or sigmoid colon cells; promoter SNPs identified 4 additional expressed genes when a
larger number of TCGA colon tumor samples were analyzed (Table 2.2).
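The promoter assignment described above can be illustrated with a minimal sketch. It assumes that correlated SNPs and TSS positions are available as simple tuples; the function name `promoter_snps`, the tuple layouts, and the toy gene and SNP identifiers are hypothetical and are not part of FunciSNP. The sketch only shows the logic of assigning SNPs to a window of +/- 2 kb around each TSS and collapsing them into distinct promoters.

```python
from collections import defaultdict

def promoter_snps(snps, tss_sites, flank=2000, r2_min=0.5):
    """Group correlated SNPs by the promoter window (TSS +/- flank) they fall in.

    snps:      iterable of (snp_id, chrom, pos, r2) tuples (hypothetical layout)
    tss_sites: iterable of (gene, chrom, tss_pos) tuples (hypothetical layout)
    Returns {gene: [snp_id, ...]} using only SNPs with r2 > r2_min.
    """
    by_chrom = defaultdict(list)
    for gene, chrom, tss in tss_sites:
        by_chrom[chrom].append((gene, tss))

    hits = defaultdict(list)
    for snp_id, chrom, pos, r2 in snps:
        if r2 <= r2_min:
            continue  # below the LD threshold
        for gene, tss in by_chrom[chrom]:
            if abs(pos - tss) <= flank:
                hits[gene].append(snp_id)  # several SNPs may collapse into one promoter
    return dict(hits)

# Toy data (invented for illustration only)
toy_tss = [("GENE_A", "chr1", 10_000), ("GENE_B", "chr1", 50_000)]
toy_snps = [
    ("snp1", "chr1", 9_000, 0.8),
    ("snp2", "chr1", 11_500, 0.6),
    ("snp3", "chr1", 49_000, 0.3),
    ("snp4", "chr2", 10_000, 0.9),
]
strict = promoter_snps(toy_snps, toy_tss, r2_min=0.5)
relaxed = promoter_snps(toy_snps, toy_tss, r2_min=0.1)
```

In this toy case two SNPs collapse into a single promoter at the stringent threshold, and a second promoter appears only at the relaxed threshold, mirroring the reduction from SNP counts to promoter counts described in the text.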
Figure 2.1 Identification of potential functional SNPs for CRC.
A) Shown is the number of SNPs identified by FunciSNP in each of 3 categories for 25 colon cancer
risk loci (see Table 2.1 for information on each CRC risk SNP). For exons, only non-synonymous
SNPs are reported; the number of SNPs predicted to be damaging is given in parentheses;
see Table 2.2 for a list of the expressed genes associated with the correlated SNPs. For TSS
regions, the region from -2kb to +2kb relative to the start site of all transcripts annotated in
GENCODE V15, including coding genes and non-coding RNAs was used; see Table 2.2 for a list of
expressed transcripts associated with the correlated SNPs. B) For H3K27Ac analyses, ChIP-seq
data from normal sigmoid colon and HCT116 tumor cells were used; see Table 2.3 for further
analysis of distal regions harboring SNPs in normal and tumor colon cells. The SNPs having an
r² >0.1 that overlapped with H3K27Ac sites were identified separately for HCT116 and sigmoid colon
datasets. Because more than one SNP could identify the same H3K27Ac-marked region, the SNPs
were then collapsed into distinct H3K27Ac peaks. The sites that were within +/- 2 kb of a promoter
region were removed to limit the analysis to distal elements. To obtain a more stringent set of
enhancers, those regions having only SNPs with r² <0.5 were removed. The remaining set of 68
distal H3K27Ac sites was contained within 19 of the 25 risk loci. Visual inspection to identify only
the robust enhancers having linked SNPs not at the margins reduced the set to 27 enhancers
located in 9 of the 25 risk loci; an additional enhancer was identified in SW480 cells (see Table 2.3
for the genomic locations of all 28 enhancers). Color key: green=SNPs or H3K27Ac sites unique to
normal colon, red=unique to colon tumor cells, blue=present in both normal and tumor colon.
[Figure 2.1, panels A and B (flowcharts). A: CRC tag SNPs (25); genomic window (+/-200 kb) around tag SNP; extract all known SNPs (1000 Genomes database); select correlated SNPs (LD r² > 0.1). Exon: r² > 0.1: 40 (7), r² > 0.5: 13 (2). TSS region: r² > 0.1: 684, r² > 0.5: 233. H3K27Ac: r² > 0.1: 746, r² > 0.5: 270. B: SNPs with r² > 0.1 and overlapping H3K27Ac: 370, 236, 140; different H3K27Ac sites: 111, 47, 41; distal H3K27Ac sites: 96, 27, 32; distal H3K27Ac sites with r² > 0.5: 41, 18, 9; after visual inspection: 13, 13, 1.]
Table 2.1 Summary of regions linked to CRC tag SNPs.
The positions and classification of the CRC tag SNPs are based on the hg19 UCSC genome
browser reference genome; the hg19 reference alleles (Ref) and the alternative alleles (Alt) are
indicated; the risk alleles are in red. The number of exons having a non-synonymous,
damaging correlated SNP with an LD of r² >0.1 is reported; the 3 regions marked with an
asterisk are the only ones for which the damaging SNP has an LD of r² >0.5 with the tag SNP.
For TSS and enhancers, the number of different promoters or enhancers having at least one
SNP with an LD of r² >0.5 with the tag SNP is reported (note that a given TSS or enhancer can
be identified by more than one tag SNP; see Table 2.2 and Table 2.3 for more details). PMID
indicates the PubMed ID for a publication describing the identification of the tag SNP. A list of
all correlated SNPs with r² >0.1 in exons, TSS, or enhancers can be found in Supplementary
data file 2.1.
Tag SNP  Position  Ref/Alt  Exons  Protein-Coding TSS  Non-Coding TSS  Enhancers  PMID
rs6691170 chr1:222045446 G/T 0 0 2 0 20972440
rs6687758 chr1:222164948 A/G 0 0 3 0 20972440
rs10936599 chr3:169492101 C/T 0 4 1 0 20972440
rs647161 chr5:134499092 C/A 0 0 1 6 23263487
rs1321311 chr6:36622900 C/A 0 1 1 0 22634755
rs16892766 chr8:117630683 A/C 1 1 0 0 18372905
rs10505477 chr8:128407443 A/G *1 1 0 2 17618283
rs6983267 chr8:128413305 G/T *1 1 0 2 23266556
rs7014346 chr8:128424792 A/G *1 1 1 2 18372901
rs10795668 chr10:8701219 G/A 0 0 1 0 18372905
rs1665650 chr10:118487100 T/C 0 0 0 0 23263487
rs3824999 chr11:74345550 T/G 0 1 0 1 22634755
rs3802842 chr11:111171709 C/A 0 3 1 0 18372901
rs10774214 chr12:4368352 T/C 0 0 0 1 23263487
rs7136702 chr12:50880216 T/C 1 3 1 4 20972440
rs11169552 chr12:51155663 C/T 0 2 2 3 20972440
rs4444235 chr14:54410919 T/C 0 1 1 0 19011631
rs4779584 chr15:32994756 T/C 0 1 1 0 18372905
rs9929218 chr16:68820946 G/A 0 2 1 4 19011631
rs4939827 chr18:46453463 T/C 0 0 0 2 18372905
rs10411210 chr19:33532300 C/T 1 2 0 2 19011631
rs961253 chr20:6404281 C/A 0 0 0 0 19011631
rs2423279 chr20:7812350 T/C 0 0 0 0 23263487
rs4925386 chr20:60921044 T/C 1 1 3 4 20972440
rs5934683 chrX:9751474 T/C 0 0 0 0 22634755
Table 2.2 Expressed transcripts directly linked to CRC index SNPs
Only 3 damaging SNPs having an r² >0.1 were identified in the exons of genes expressed in
either HCT116 or normal sigmoid colon cells; of these, only UTP23 and RHPN2 were identified
as damaging by two different programs. RNAs expressed in HCT116 or sigmoid colon cells and
having a correlated SNP with r² >0.5 within +/- 2 kb of the TSS of protein coding transcripts or
non-coding RNAs are shown. The cases in which the tag SNP is located in the TSS region are in
bold and non-coding RNAs are in parentheses. We note that exon SNPs identified two additional
expressed genes (POU5F1B and FAM186A) and promoter SNPs identified 3 additional
expressed genes (FAM186A, LRRC34, and LRRIQ4) when a larger number of TCGA colon
tumor samples were analyzed.
Tag SNP      Exons   RNAs of TSS SNPs
rs10936599   -       ACTRT3, MYNN, (TERC)
rs1321311    -       CDKN1A
rs16892766   UTP23   EIF3H
rs7014346    -       (RP11-382A18.1)
rs3824999    -       POLD3
rs3802842    -       C11orf92, C11orf93, C11orf53
rs7136702    -       DIP2B
rs11169552   -       ATF1, DIP2B
rs4444235    -       BMP4
rs4779584    -       GREM1
rs9929218    -       CDH3, CDH1
rs10411210   RHPN2   GPATCH1, RHPN2
rs4925386    LAMA5   LAMA5
2.3.2 CRC risk-associated SNPs in distal regulatory regions
Most of the SNPs in LD with the CRC GWAS tag SNPs cannot be easily linked to a specific
gene because they do not fall within a coding region or a promoter-proximal region. However, it is
possible that a relevant SNP associated with increased risk lies within a distal regulatory element
of a gene whose function is important in cell growth or tumorigenicity. To address this possibility,
we used the histone modification H3K27Ac to identify active regulatory regions throughout the
genome of colon cancer cells or normal sigmoid colon cells. We used HCT116 H3K27Ac ChIP-seq
data (Frietze, Wang et al. 2012) produced in our lab for the tumor cells and we obtained
H3K27Ac ChIP-seq data for normal colon cells from the NIH Roadmap Epigenome Mapping
Consortium. The ChIP-seq data for both the normal and tumor cells included two replicates. To
demonstrate the high quality of the datasets, we called peaks on each replicate of H3K27Ac from
HCT116 and each replicate of H3K27Ac from sigmoid colon using Sole-search (Blahnik, Dou et
al. 2010, Blahnik, Dou et al. 2011) and compared the peak sets from the two replicates using the
ENCODE 40% overlap rule (after truncating both lists to the same number, 80% of the top 40% of
one replicate must be found in the other replicate and vice versa). After determining that the
HCT116 and sigmoid colon datasets were of high quality (Supplementary Figure S2.4), we
merged the two replicates from HCT116 and separately merged the two replicates from sigmoid
colon and called peaks on the two merged datasets; see Supplementary Table S2.2 for a list of
all ChIP-seq peaks. Using the merged peak lists from each of the samples as biofeatures in
FunciSNP, we determined that 746 of the 4894 SNPs that were in LD with a tag SNP at r² >0.1
were located in H3K27Ac regions identified in either the HCT116 or sigmoid colon peak sets; of
these, 270 SNPs had an r² >0.5 with a tag SNP (Figure 2.1 and Supplementary Figure S2.5).
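The replicate comparison described above (truncate both peak lists to the same length, then require that 80% of the top 40% of each replicate be found in the other) can be sketched as follows. This is an illustrative reading of the rule as stated in the text, not the Sole-search or ENCODE implementation; the function names, the peak tuple layout, and the assumption that each list is already sorted from strongest to weakest peak are hypothetical.

```python
def overlaps(peak, others):
    """True if peak (chrom, start, end) overlaps any peak in others."""
    chrom, start, end = peak
    return any(c == chrom and s < end and start < e for c, s, e in others)

def passes_overlap_rule(rep1, rep2, top_frac=0.4, min_found=0.8):
    """Check the reciprocal replicate-overlap rule on two ranked peak lists.

    rep1, rep2: lists of (chrom, start, end), assumed sorted best-first.
    """
    n = min(len(rep1), len(rep2))      # truncate both lists to the same number
    rep1, rep2 = rep1[:n], rep2[:n]
    top_n = int(n * top_frac)          # size of the "top 40%" set
    if top_n == 0:
        return False

    def frac_found(a, b):
        return sum(overlaps(p, b) for p in a[:top_n]) / top_n

    # The rule must hold in both directions
    return frac_found(rep1, rep2) >= min_found and frac_found(rep2, rep1) >= min_found

# Toy replicates (invented coordinates)
rep_a = [("chr1", i * 1000, i * 1000 + 200) for i in range(10)]
rep_b = [("chr1", i * 1000 + 50, i * 1000 + 250) for i in range(12)]
rep_c = [("chr2", i * 1000, i * 1000 + 200) for i in range(10)]
concordant = passes_overlap_rule(rep_a, rep_b)
discordant = passes_overlap_rule(rep_a, rep_c)
```

A linear scan over peaks is used here for clarity; genome-scale peak lists would normally call for an interval index.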
A comparison of the H3K27Ac peaks from normal and tumor cells indicated that the patterns
are very similar; in fact, ~24,000 H3K27Ac peaks are in common in the normal and tumor cells.
However, there are clearly some peaks unique to normal and some peaks unique to the tumor
cells. Therefore, we separately analyzed the normal and tumor H3K27Ac ChIP-seq peaks as
different sets of biofeatures using FunciSNP (Figure 2.1B). Of the 746 SNPs, 236 were located
in an H3K27Ac site common to both normal and tumor cells, whereas 140 were unique to tumor
and 370 were unique to normal cells. Visual inspection of the SNPs and peaks using the UCSC
genome browser showed that many of the identified enhancers harbored multiple correlated
SNPs. Reduction of the number of SNPs to the number of different H3K27Ac sites resulted in 47
common, 41 tumor-specific, and 111 normal-specific regions. Visual inspection also showed that
some of the H3K27Ac genomic regions corresponded to promoter regions (see Supplementary
Figure S2.4). Because promoter regions having correlated SNPs were already identified using
TSS regions (see above), we eliminated the promoter-proximal H3K27Ac sites, resulting in 27
common, 32 tumor-specific, and 96 normal-specific distal H3K27Ac regions. As the next
winnowing step, we selected only those enhancers having at least one SNP with an r² > 0.5,
leaving 18 common, 9 tumor-specific, and 41 normal-specific distal H3K27Ac regions. We noted
that some of the identified regions corresponded to low ranked H3K27Ac peaks. For our
subsequent analyses, we wanted to limit our studies to robust enhancers that harbor correlated
SNPs. Therefore, we visually inspected each of the genomic regions identified as having distal
H3K27Ac peaks harboring a correlated SNP. To prioritize the distal regions for further analysis,
we eliminated those for which the correlated SNP was on the edge of the region covered by the
H3K27Ac signal or corresponded to a very low-ranked peak. After inspection, we were left with a
set of 27 distal H3K27Ac regions in which a correlated SNP (r² >0.5) was well within the boundaries
of a robust peak (Figure 2.1B). To confirm our results, we repeated the analysis using H3K27Ac
data from a different colon cancer cell line, SW480, identifying only one additional enhancer
harboring risk SNPs for CRC. The genomic coordinates of each of these 28 enhancers, which are
clustered in 9 genomic regions, are listed in Table 2.3 (see also Supplementary Table S2.3).
Combining all data, enhancers in 5 of the 9 regions were identified in all 3 cell types and 8 of the 9
regions were identified in at least two of the cell types.
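The automated part of the winnowing described above (collapsing SNPs into distinct H3K27Ac peaks, removing promoter-proximal peaks, and keeping peaks with at least one SNP at r² > 0.5) can be sketched as below. The function name, the tuple layouts, and the toy coordinates are hypothetical, and the final visual-inspection step is not modeled.

```python
def distal_candidate_peaks(snps, peaks, tss_positions, flank=2000, r2_min=0.5):
    """Return {peak: [snp_id, ...]} for distal peaks with a well-correlated SNP.

    snps:          iterable of (snp_id, chrom, pos, r2)   (hypothetical layout)
    peaks:         iterable of (chrom, start, end)
    tss_positions: iterable of (chrom, pos)
    """
    # Step 1: collapse SNPs into the distinct peaks that contain them
    peak_snps = {}
    for snp_id, chrom, pos, r2 in snps:
        for peak in peaks:
            p_chrom, start, end = peak
            if chrom == p_chrom and start <= pos <= end:
                peak_snps.setdefault(peak, []).append((snp_id, r2))

    # Step 2: drop peaks lying within +/- flank of any TSS (promoter-proximal)
    def near_tss(peak):
        p_chrom, start, end = peak
        return any(c == p_chrom and start - flank <= t <= end + flank
                   for c, t in tss_positions)

    distal = {p: s for p, s in peak_snps.items() if not near_tss(p)}

    # Step 3: keep only peaks with at least one SNP above the r2 threshold
    return {p: [sid for sid, _ in s]
            for p, s in distal.items()
            if any(r2 > r2_min for _, r2 in s)}

# Toy data (invented for illustration only)
toy_peaks = [("chr1", 1000, 2000), ("chr1", 10000, 11000), ("chr1", 30000, 31000)]
toy_tss = [("chr1", 1500)]
toy_snps = [
    ("s1", "chr1", 1200, 0.9),    # falls in a promoter-proximal peak
    ("s2", "chr1", 10100, 0.2),
    ("s3", "chr1", 10500, 0.7),   # rescues the second peak at r2 > 0.5
    ("s4", "chr1", 30500, 0.3),   # third peak has only a weakly correlated SNP
    ("s5", "chr1", 50000, 0.9),   # not inside any peak
]
candidates = distal_candidate_peaks(toy_snps, toy_peaks, toy_tss)
```

Note that a retained peak keeps all of its correlated SNPs, including those below the stringent threshold, which matches the text's criterion of "at least one SNP" per enhancer.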
Figure 2.2 Expression of risk-associated genes in colon cells.
The left panel indicates if a transcript was identified by a SNP located in an exon or a TSS or is
nearby a risk-associated enhancer; the middle panel shows the expression values of each of
the 41 transcripts in sigmoid colon or HCT116 tumor cells; the right panel shows the fold
change of each transcript in the tumor cells (positive indicates higher expression in tumor).
Table 2.3 Distal regulatory regions correlated with CRC tag SNPs
The tag SNP and the correlated SNPs for 28 distal, robust H3K27Ac regions are indicated;
the enhancers that are found only in normal sigmoid colon are indicated with an asterisk.
The 3 nearest protein-coding RNAs and 3 nearest non-coding RNAs were identified using
the GENCODE V15 gene annotation; only those RNAs that are expressed in HCT116 or
sigmoid colon cells are shown (see also Supplementary Table S2.3).
Enhancer  Tag SNP                           No. of Correlated SNPs  Chromosome  Start      End        Location              Nearby Expressed Coding and Non-Coding RNAs
1         rs647161.ASN                      4                       chr5        134468409  134473214  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
2         rs647161.ASN                      6                       chr5        134474759  134478528  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
3*        rs647161.ASN                      4                       chr5        134520309  134523373  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
4         rs647161.ASN                      7                       chr5        134525698  134531612  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
5         rs647161.ASN                      7                       chr5        134543144  134548023  CTC-203F4.1 intron    H2AFY, PITX1
6         rs647161.ASN                      7                       chr5        134511610  134516426  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
7         rs10505477, rs6983267, rs7014346  3                       chr8        128412778  128414859  RP11-382A18.1 intron  MYC, RP11-382A18.1, RP11-382A18.2, RP11-255B23.3
8*        rs10505477, rs6983267, rs7014346  5                       chr8        128420412  128422114  RP11-382A18.1 intron  MYC, RP11-382A18.1, RP11-382A18.2, RP11-255B23.3
9*        rs3824999                         4                       chr11       74288844   74294943   Intergenic            POLD3, LIPT2, KCNE3, AP001372.2
10        rs10774214.ASN                    1                       chr12       4378128    4379840    Intergenic            CCND2, C12orf5
11*       rs7136702                         1                       chr12       50908239   50913757   DIP2B intron          DIP2B, LARP4
12        rs7136702                         2                       chr12       50938468   50940796   DIP2B intron          DIP2B, LARP4
13*       rs7136702                         2                       chr12       51018019   51020503   DIP2B intron          DIP2B, ATF1, LARP4
14*       rs11169552                        1                       chr12       50973150   50974328   DIP2B intron          DIP2B, LARP4
15        rs11169552                        1                       chr12       51012054   51014942   DIP2B intron          DIP2B, ATF1, LARP4
16        rs11169552, rs7136702             3                       chr12       51040371   51042207   DIP2B intron          DIP2B, ATF1
17        rs9929218                         1                       chr16       68740822   68742561   Intergenic            CDH3, CDH1, TMCO7
18*       rs9929218                         4                       chr16       68754658   68757192   Intergenic            CDH1, CDH3, TMCO7
19        rs9929218                         5                       chr16       68774214   68780161   CDH1 intron           CDH1, CDH3, TMCO7
20        rs9929218                         11                      chr16       68784044   68791839   CDH1 intron           CDH1, CDH3, TMCO7
21        rs4939827                         4                       chr18       46448530   46450772   SMAD7 intron          SMAD7, CTIF, DYM, RP11-15F12.1
22        rs4939827                         6                       chr18       46450800   46454601   SMAD7 intron          SMAD7, CTIF, DYM, RP11-15F12.1
23        rs10411210                        6                       chr19       33537339   33541195   RHPN2 intron          RHPN2, GPATCH1, C19orf40
24        rs10411210                        1                       chr19       33530860   33533823   RHPN2 intron          RHPN2, GPATCH1, C19orf40
25        rs4925386                         3                       chr20       60929861   60935447   LAMA5 intron          LAMA5, RPS21, CABLES2, RP11-157P1.4
26        rs4925386                         3                       chr20       60938278   60941762   LAMA5 intron          LAMA5, RPS21, CABLES2, RP11-157P1.4
27        rs4925386                         6                       chr20       60948726   60951918   Intergenic            LAMA5, RPS21, CABLES2
28        rs4925386                         6                       chr20       60955085   60958391   Intergenic            RPS21, LAMA5, CABLES2
2.3.3 Effects of SNPs on binding motifs in the distal elements
To determine possible effects of the correlated SNPs on transcription factor binding, we first
analyzed all SNPs having an r² >0.1 with the 25 CRC tag SNPs. Using position weight matrices
from Factorbook (Blattler, Yao et al. 2013), all correlated SNPs that fell within a critical position in
a transcription factor binding motif were identified (Supplementary Table S2.4). We identified
~800 SNPs that were predicted to impact binding of a transcription factor to a known motif.
However, most of these SNPs are not in regulatory regions important for CRC. Therefore, we
next limited our analysis to the set of correlated SNPs that fall within the 28 robust enhancers
(Supplementary Table S2.5). We found 80 SNPs that cause motif changes in a total of 124
motifs, representing binding sites for 40 different transcription factors. Using RNA-seq data, we
found that 36 of these factors are expressed in HCT116 and/or sigmoid colon cells (Table 2.4),
suggesting that perhaps the binding of these factors at the risk-associated enhancers is
influenced by the correlated SNPs. Of the 36 factors, most were expressed at either
approximately the same levels in normal and tumor colon or at higher levels in HCT116 cells than
in normal colon. However, several factors showed large decreases in gene expression in HCT116
as compared to sigmoid colon cells, including FOS and JUN which were ~10 fold higher in normal
colon and HNF4A and ETS1 which were 30-40 fold higher in normal colon; see Supplementary
Table S2.6.
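One common way to quantify the effect of a SNP on a position weight matrix is to compare log-odds scores of the reference and alternative alleles. The sketch below illustrates that general idea only; it is not necessarily the procedure used for the Factorbook motifs in this study, and the toy matrix, the uniform background of 0.25, and the function names are assumptions made for illustration.

```python
import math

def log_odds_score(seq, pwm, background=0.25):
    """Sum of log2(p / background) over motif positions.

    pwm is a list of {base: probability} dicts, one per motif position.
    """
    return sum(math.log2(col[base] / background) for base, col in zip(seq, pwm))

def snp_motif_effect(ref_seq, alt_seq, pwm):
    """Score difference (alt - ref); negative values suggest a weakened motif match."""
    return log_odds_score(alt_seq, pwm) - log_odds_score(ref_seq, pwm)

# Hypothetical 4-bp motif: positions 1, 2 and 4 are informative, position 3 is not
toy_pwm = [
    {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10},
    {"A": 0.10, "C": 0.10, "G": 0.70, "T": 0.10},
    {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    {"A": 0.10, "C": 0.70, "G": 0.10, "T": 0.10},
]
critical_change = snp_motif_effect("AGTC", "ATTC", toy_pwm)  # G>T at an informative position
neutral_change = snp_motif_effect("AGTC", "AGAC", toy_pwm)   # T>A at the uninformative position
```

A substitution at an informative position changes the score substantially, whereas a substitution at a degenerate position leaves it unchanged, which is the distinction behind restricting the analysis to critical motif positions.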
Table 2.4 Effects of SNPs on motifs in the distal regulatory regions.
Details concerning the impacted motifs (SNP position, sequence of reference and alternative
alleles, and the direction of the effect on the motif) can be found in Supplementary data file 2.4.
¹ identified by motif UA9, which was the top motif in NANOG hES ChIP-seq data; ² identified by
motif UA2, which was the top motif in PBX3 GM12878 ChIP-seq data; ³ identified by motif UA5;
⁴ identified by motif UA3, which was the top motif in ZBTB7A K562 ChIP-seq data.
AP1      EGR1   MYC/MAX  TCF12
AP2      ELF1   NR2C2    TCF7L2
BHLHE40  ELK4   PBX3¹    TEAD1
CEBPB    ESRRA  PRDM1    THAP1²
CREB1    ETS1   RUNX1    USF1
CTCF     GABP   RXRA     YY1
E2F1     GATA   SP1      ZBTB7A³
E2F4     GFI1   SREBF1   ZEB1
EBF1     HNF4A  STAT1    ZNF281
2.3.4 Expression analysis of candidate risk-associated genes
Although the genes identified by the exon or TSS SNPs are clearly good candidate genes for
analysis of their possible role in the development of colon cancer, it is difficult to definitively link a
target gene with a distal enhancer region because enhancers can function in either direction and
do not necessarily regulate the nearest gene. In fact, the ENCODE Consortium recently reported
that, on average, a distal element can physically associate with ~3 different promoter regions
(Sanyal, Lajoie et al. 2012). Also, only 27% of the distal elements showed an interaction with the
nearest TSS, although this increased to 47% when only expressed genes were used in the
analysis (Sanyal, Lajoie et al. 2012). Taken together, these analyses suggest that examining the
3 nearest genes may produce a reasonable list of genes potentially regulated by the CRC risk-
associated enhancers. Therefore, we used the GENCODE V15 dataset and identified the 3
nearest promoters of coding genes and 3 nearest promoters of non-coding transcripts around
each of the 28 enhancers (Supplementary Table S2.3). We next limited the nearby coding and
non-coding transcripts to those expressed in either sigmoid colon RNA or HCT116 cells (Table
2.3); we note that taking into account expression did not greatly change the list of coding
transcripts but eliminated most of the non-coding transcripts which tend to be expressed in a very
cell-type specific manner. Interestingly, several of the genes nearby the risk-associated
enhancers were also identified in the TSS analyses, suggesting that a putative causal gene
associated with CRC might be differentially regulated by risk-associated SNPs found in the
promoter and in a nearby enhancer (Figure 2.2). We note that in these cases, the promoters
and enhancers were identified by different risk-associated SNPs in high LD with a tag SNP, with
the promoters being identified by SNPs within +/- 2kb of the TSS and enhancers being identified
by distal SNPs. We further analyzed the expression levels of all genes directly linked to the risk
SNPs (by exons or TSS) and the expressed genes nearby the risk-associated enhancers in
normal colon and HCT116 tumor cells. Shown in Figure 2.2 are the expression levels of each of
the 41 transcripts and the fold change in expression in HCT116 vs. normal cells; several of these
genes display robust changes in expression in the tumor cells.
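The nearest-promoter selection described above can be sketched as follows: the 3 nearest TSSs are chosen first, and the list is then restricted to expressed transcripts. Measuring distance from the enhancer midpoint, the function name `nearest_genes`, and the toy annotation are assumptions made for illustration; the same function would be applied separately to coding and non-coding transcript annotations.

```python
def nearest_genes(enhancer, tss_sites, expressed=None, k=3):
    """Return the k genes whose TSS is nearest to the enhancer, then filter by expression.

    enhancer:  (chrom, start, end)
    tss_sites: iterable of (gene, chrom, tss_pos)   (hypothetical layout)
    expressed: optional set of expressed gene names applied after the k-nearest selection
    """
    chrom, start, end = enhancer
    midpoint = (start + end) // 2  # assumption: distance is measured from the midpoint
    candidates = [(abs(tss - midpoint), gene)
                  for gene, c, tss in tss_sites if c == chrom]
    nearest = [gene for _, gene in sorted(candidates)[:k]]
    if expressed is not None:
        nearest = [g for g in nearest if g in expressed]
    return nearest

# Toy annotation (invented for illustration only)
toy_enhancer = ("chr1", 100_000, 102_000)
toy_tss = [
    ("G1", "chr1", 90_000),
    ("G2", "chr1", 150_000),
    ("G3", "chr1", 105_000),
    ("G4", "chr1", 400_000),
    ("G5", "chr2", 101_000),
]
three_nearest = nearest_genes(toy_enhancer, toy_tss)
expressed_nearest = nearest_genes(toy_enhancer, toy_tss, expressed={"G1", "G2", "G4"})
```

Because the expression filter is applied after the selection, an enhancer can end up with fewer than 3 linked transcripts, as seen for several enhancers in Table 2.3.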
As a second approach to identify transcripts potentially regulated by the identified enhancers,
we developed a new statistical approach that employs RNA-seq data from TCGA. We selected
the 10 nearest genes 5’ of and the 10 nearest genes 3’ of each of the 28 enhancers. Because of
the difference in gene density in different regions of the genome, the 20-gene span ranged from
786 kb to 7.5 Mb, depending on the specific enhancer. Because several of the 28 enhancers are
clustered near each other, this resulted in a total of 182 unique genes. We downloaded the RNA-
seq data for 233 colorectal tumor samples and 21 colorectal normal samples from the TCGA data
download website (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm) and determined if any
of the 182 genes show a significant increase or decrease (>2 fold change and P value < 0.01) in
colon tumors vs. normal colon (see Methods and Supplementary Figure S2.6 for an analysis of
potential TCGA batch effects). We then eliminated those genes whose expression change did not
correspond to the nature of the enhancer (e.g. a tumor-specific enhancer should not regulate a
gene that is higher in normal cells), leaving a total of 39 possible genes whose expression might
be differentially regulated in colon cancer by the risk enhancers (Table 2.5). We note that 5 of the
genes shown to be differentially expressed in the TCGA data (MYC, PITX1, POU5F1B, C5orf20,
and CDH3) are also in the set of nearest 3 genes to an enhancer having CRC risk-associated
SNPs. We found that 0-6 differentially expressed genes were linked to an enhancer, with an
average of 4 transcripts per enhancer that showed correct differential expression in colon tumors.
Heatmaps of the expression of the 39 putative enhancer-regulated genes, as well as the
expression of the genes identified by exon and promoter SNPs, in the TCGA samples are shown
in Supplementary Figure S2.7. To determine if we could validate any of the putative enhancer
targets, we used eQTL analyses based on data from TCGA. We began by identifying the SNPs
within each of the 28 enhancers that are on the Affymetrix Genome-Wide SNP 6.0 array used by TCGA.
Unfortunately, these arrays include only 8% of the SNPs of interest (i.e. the exon, promoter, and
enhancer SNPs that are correlated with the CRC tag SNPs), greatly limiting our ability to
effectively utilize the eQTL methodology. However, we did identify two examples of allelic
expression differences in the set of putative enhancer targets that correlated with SNPs in an
enhancer region. Both of these SNPs fell within enhancer 19 and showed correlation with allelic
expression differences of the TMED6 gene (the two SNPs significantly associated with TMED6
expression, rs7203339 and rs1078621, had an FDR-adjusted P value < 0.1); enhancer 19 falls
within an intron of the CDH1 gene, which is 600 kb from the transcription start site of the TMED6
gene (Figure 2.3). A summary of the eQTL analysis of enhancer and promoter risk-associated
SNPs can be found in Supplementary Table S2.7 and Supplementary Figure S2.8.
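The filtering of the neighboring genes described above (a greater than 2-fold change, a P value below 0.01, and a direction of change consistent with the type of enhancer) can be sketched as below. The sketch assumes that a P value has already been computed for each gene by the test described in Methods; the pseudocount, the function names, the treatment of enhancers common to normal and tumor cells as unconstrained, and the toy expression values are all assumptions made for illustration.

```python
import math
from statistics import mean

def log2_fold_change(tumor, normal, pseudocount=1.0):
    """log2 ratio of mean tumor to mean normal expression (pseudocount is an assumption)."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))

def consistent_with_enhancer(log2_fc, enhancer_type):
    """A tumor-specific enhancer should not explain a gene that is higher in normal cells, and vice versa."""
    if enhancer_type == "tumor":
        return log2_fc > 0
    if enhancer_type == "normal":
        return log2_fc < 0
    return True  # assumption: enhancers present in both cell types impose no direction

def candidate_targets(genes, enhancer_type, min_abs_log2_fc=1.0, max_p=0.01):
    """genes: {name: (tumor_values, normal_values, precomputed_p_value)}.

    |log2 FC| > 1 corresponds to a more than 2-fold change.
    """
    kept = {}
    for name, (tumor, normal, p_value) in genes.items():
        fc = log2_fold_change(tumor, normal)
        if (abs(fc) > min_abs_log2_fc and p_value < max_p
                and consistent_with_enhancer(fc, enhancer_type)):
            kept[name] = round(fc, 2)
    return kept

# Toy expression values and P values (invented for illustration only)
toy_genes = {
    "UP":    ([30, 34, 32], [7, 7, 7], 0.001),
    "DOWN":  ([3, 3, 3], [31, 31, 31], 0.002),
    "FLAT":  ([10, 10], [9, 9], 0.001),
    "NOISY": ([40, 2, 90], [5, 5, 5], 0.2),
}
normal_specific = candidate_targets(toy_genes, "normal")
tumor_specific = candidate_targets(toy_genes, "tumor")
common = candidate_targets(toy_genes, "common")
```

In this toy example only the down-regulated gene is retained for a normal-specific enhancer and only the up-regulated gene for a tumor-specific enhancer, which is the consistency requirement applied in Table 2.5.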
Table 2.5 Linking transcripts to enhancers using TCGA data.
Shown is the subset of the 10 nearest 5’ and 10 nearest 3’ transcripts for each enhancer
that show significant gene expression differences in normal vs. tumor samples, as determined
using RNA-seq data from TCGA. The numbers in parentheses indicate the fold change, with
positive indicating a higher expression in tumors. The 7 normal-specific enhancers are shown
in bold and all genes correlated with these enhancers should be expressed higher in normal
cells and thus have a negative value. The R vs L designation indicates the direction and
relative location of the transcript with respect to each enhancer.
Region  Enhancer    Correlated transcripts
1       Enhancer1   PITX1(1.37_L1) C5orf20(-1.96_R2) TIFAB(-1.64_R3) CXCL14(-1.39_R5) SLC25A48(-1.34_R7)
1       Enhancer2   PITX1(1.37_L1) C5orf20(-1.96_R2) TIFAB(-1.64_R3) CXCL14(-1.39_R5) SLC25A48(-1.34_R7)
1       Enhancer3   C5orf20(-1.96_R2) TIFAB(-1.64_R3) CXCL14(-1.39_R5) SLC25A48(-1.34_R7)
1       Enhancer4   PITX1(1.37_L2) C5orf20(-1.96_R1) TIFAB(-1.64_R2) CXCL14(-1.39_R4) SLC25A48(-1.34_R6) TGFBI(2.74_R10)
1       Enhancer5   PITX1(1.37_L2) C5orf20(-1.96_R1) TIFAB(-1.64_R2) CXCL14(-1.39_R4) SLC25A48(-1.34_R6) TGFBI(2.74_R10)
1       Enhancer6   PITX1(1.37_L1)
2       Enhancer7   SQLE(1.8_L6) FAM84B(1.01_L2) POU5F1B(3.02_L1) MYC(1.58_R2) PVT1(2.44_R3) GSDMC(1.75_R4)
2       Enhancer8   none
3       Enhancer9   ARRB1(-1.15_R8)
4       Enhancer10  NRIP2(-1.07_L10) FOXM1(1.47_L9) TEAD4(2.15_L6) RAD51AP1(1.36_R5) GALNT8(-1.58_R9) KCNA6(-2.16_R10)
5       Enhancer11  LIMA1(-1.16_L4) METTL7A(-2.65_R3) POU6F1(-1.14_R9)
5       Enhancer12  RACGAP1(1.06_L10) ASIC1(1.49_L9) LIMA1(-1.16_L4) METTL7A(-2.65_R3) POU6F1(-1.14_R9)
5       Enhancer13  LIMA1(-1.16_L4) METTL7A(-2.65_R3) POU6F1(-1.14_R9)
5       Enhancer14  LIMA1(-1.16_L4) METTL7A(-2.65_R3) POU6F1(-1.14_R9)
5       Enhancer15  RACGAP1(1.06_L10) ASIC1(1.49_L9) LIMA1(-1.16_L4) METTL7A(-2.65_R3) POU6F1(-1.14_R9)
5       Enhancer16  RACGAP1(1.06_L10) ASIC1(1.49_L9)
6       Enhancer17  SMPD3(-1.3_L4) CDH3(6.24_L2) TMED6(-1.36_R10)
6       Enhancer18  SMPD3(-1.3_L4) TMED6(-1.36_R10)
6       Enhancer19  SMPD3(-1.3_L4) CDH3(6.24_L2) TMED6(-1.36_R10)
6       Enhancer20  SMPD3(-1.3_L4) CDH3(6.24_L2) TMED6(-1.36_R10)
7       Enhancer21  KATNAL2(-1.06_L9) ZBTB7C(-2.81_L3) LIPG(1.3_R6) ACAA2(-1.43_R7)
7       Enhancer22  KATNAL2(-1.06_L9) ZBTB7C(-2.81_L3) LIPG(1.3_R6) ACAA2(-1.43_R7)
8       Enhancer23  CHST8(-2_R9) KCTD15(-1.02_R10)
8       Enhancer24  CHST8(-2_R9) KCTD15(-1.02_R10)
9       Enhancer25  RBBP8NL(1.33_R3) C20orf166-AS1(-3.81_R6) SLCO4A1(3.19_R7) LOC100127888(2.45_R8) NTSR1(-2.34_R9) MRGBP(1.44_R10)
9       Enhancer26  RBBP8NL(1.33_R2) C20orf166-AS1(-3.81_R5) SLCO4A1(3.19_R6) LOC100127888(2.45_R7) NTSR1(-2.34_R8) MRGBP(1.44_R9)
9       Enhancer27  RBBP8NL(1.33_R3) C20orf166-AS1(-3.81_R6) SLCO4A1(3.19_R7) LOC100127888(2.45_R8) NTSR1(-2.34_R9) MRGBP(1.44_R10)
9       Enhancer28  RBBP8NL(1.33_R2) C20orf166-AS1(-3.81_R5) SLCO4A1(3.19_R6) LOC100127888(2.45_R7) NTSR1(-2.34_R8) MRGBP(1.44_R9)
Figure 2.3 Linking a transcript to an enhancer using TCGA data.
A) Shown are the location of enhancer 19 and the position of the three SNPs (in red) identified in
the eQTL studies and two other SNPs (in blue) identified by the FunciSNP analysis but not
present on the SNP array, in relation to the H3K27Ac, RNA-seq, and TCF7L2 ChIP-seq data for
that region. Also shown are the ENCODE ChIP-seq transcription factor tracks from the UCSC
genome browser. B) The expression of the TMED6 RNA is shown for samples having
homozygous or heterozygous alleles for 3 SNPs in enhancer 19. The upper and lower quartiles
of the box plots are the 75th and 25th percentiles, respectively. The whisker top and bottom are
the 90th and 10th percentiles, respectively. The horizontal line through the box is the median value. The
P value corresponds to the regression coefficient based on the residual expression level and the
germ line genotype. Sample size is listed under each genotype. C) A schematic of the gene
structure in the genomic region around enhancer 19 (yellow box) is shown; the arrows indicate
the direction of transcription of each gene. The 3 genes in the enhancer 19 region that showed
differential expression in normal vs tumor colon samples (Table 2.5) are indicated; of these,
only TMED6 was identified in the eQTL analysis.
2.3.5 The effect of enhancer deletion on the transcriptome
The expression analyses described above provide a list of genes that potentially are regulated by
the CRC risk-associated enhancers. However, it is possible that the enhancers regulate only a
subset of those genes and/or the target genes are at a greater distance than was analyzed. One
approach to identify targets of the CRC risk-associated enhancers would be to delete an
enhancer from the genome and determine changes in gene expression. As an initial test of this
method, we selected enhancer 7, located at 8q24. The region encompassing this enhancer has
previously been implicated in regulating expression of MYC (Bond-Smith, Banga et al. 2012),
which is located 335 kb from enhancer 7. We introduced guide RNAs that flanked enhancer 7,
along with Cas9, into HCT116 cells, and identified cells that showed deletion of the enhancer. We
then performed expression analysis using gene expression arrays, identifying 105 genes whose
expression was down-regulated by loss of the enhancer (Supplementary Table S2.8); the
closest one was MYC, which was expressed 1.5 times higher in control vs. deleted cells (Figure
2.4).
Figure 2.4 Identification of genes affected by deletion of enhancer 7.
(A) Shown are the expression differences (x axis) and the significance of the change (y axis)
of the genes in the control HCT116 cells vs. HCT116 cells having complete deletion of
enhancer 7. The Illumina Custom Differential Expression Algorithm was used to determine
P values to identify the significantly altered genes; 3 replicates each for the control and deleted
cells were used. Genes on chromosome 8 (the location of enhancer 7) are shown in blue.
The spot representing the MYC gene is indicated by the arrow. (B) Shown are all genes on
chromosome 8 that change in expression and the 10 genes showing the largest changes in
expression upon deletion of enhancer 7. The location of the enhancer is indicated and the
chromosome number is shown on the outside of the circle. (C) The genes identified as
potential targets using TCGA expression data are indicated; of these, MYC is the only one
showing a change in gene expression upon deletion of the enhancer.
2.4 Discussion
We have used the program FunciSNP (Coetzee, Rhie et al. 2012), in combination with
genomic, epigenomic, and transcriptomic data, to analyze 25 tag SNPs (and all SNPs in high LD
with those tag SNPs) that have been associated with an increased risk for CRC (Zanke,
Greenwood et al. 2007, Houlston, Webb et al. 2008, Houlston, Cheadle et al. 2010, Dunlop,
Dobbins et al. 2012, Bauer, Kamran et al. 2013, Ding, Lee et al. 2013). Taken together, we have
identified a total of 80 genes that may be regulated by risk-associated SNPs. Of these, 24 are
directly linked to a gene via a SNP within an exon or proximal promoter region and 56 additional
genes are putative target genes of risk-associated enhancers; see Figure 2.5 for a schematic
summary of the location of the tag and LD SNPs and associated genes and Supplementary
Table S2.10 for a complete list of genes and how they were identified.
Figure 2.5 Summary of identified candidate genes correlated with increased risk for CRC.
Shown are the 80 candidate genes identified in this study. For the gene names, green means
that the gene was only identified as a potential enhancer target; the other genes were identified as
direct targets either by an exon SNP or a TSS SNP; the putative enhancer target genes were
selected as described in the text. For each tag SNP, the relative number of SNPs that identified
an exon (red portion), a TSS (blue portion), or an enhancer (green portion) is shown by the bar
graph. The 9 genomic regions that harbor CRC risk enhancers are shown by the green
rectangles outside the circle.
[Figure 2.5 (circular plot). Chromosome labels: 1, 3, 5, 6, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20, X. Tag SNP labels: rs10795668, rs10774214, rs11169552, rs10411210, rs10936599, rs10505477, rs16892766, rs6687758, rs6691170, rs1665650, rs3802842, rs3824999, rs7136702, rs4444235, rs4779584, rs9929218, rs4939827, rs2423279, rs4925386, rs1321311, rs6983267, rs7014346, rs5934683, rs961253, rs647161. Gene labels: RP11-255B23.3, RP11-382A18.1, RP11-382A18.2, RP11-157P1.4, LIPG, AP001372.2, CATSPER3, GPATCH1, CABLES2, CDKN1A, C19orf40, ACTRT3, C11orf53, C11orf92, C11orf93, TMCO7, SMAD7, C12orf5, GREM1, CCND2, LAMA5, RHPN2, POLD3, H2AFY, UTP23, MYNN, LARP4, KCNE3, DIP2B, RPS21, PITX1, BMP4, EIF3H, CDH1, TERC, ATF1, DYM, MYC, CTIF, TGFBI, C5orf20, TIFAB, CXCL14, SLC25A48, SQLE, FAM84B, GSDMC, PVT1, ARRB1, LIPT2, NRIP2, FOXM1, TEAD4, RAD51AP1, GALNT8, KCNA6, RACGAP1, ASIC1, LIMA1, METTL7A, POU6F1, CDH3, SMPD3, TMED6, KATNAL2, ZBTB7C, ACAA2, CHST8, KCTD15, RBBP8NL, C20orf166-AS1, SLCO4A1, NTSR1, LOC100127888, MRGBP, POU5F1B, LRRC34, LRRIQ4, FAM186A, RP11-15F12.1.]
Of the 25 tag SNPs, only one is found within a coding exon, occurring in the third exon of
the MYNN gene and resulting in a synonymous change that does not lead to a coding difference.
However, by analysis of SNPs in high LD with the 25 tag SNPs, we identified 5 genes that harbor
damaging SNPs and which are expressed in colon cells (HCT116, normal sigmoid colon, or
TCGA tumors): POU5F1B, RHPN2, UTP23, LAMA5, and FAM186A. Interestingly, the
retrogene POU5F1B, which encodes a homolog of the stem cell regulator OCT4, has recently been
associated with prostate cancer susceptibility (Breyer, Dorset et al. 2014). We also identified 23
genes (21 coding and 2 non-coding) that harbor highly correlated SNPs in their promoter regions
and are expressed in colon cells. Several of the genes that we have directly linked to increased
risk for CRC by virtue of promoter SNPs show large changes in gene expression in tumor vs.
normal colon tissue. For example, TERC, the non-coding RNA that is a component of the
telomerase complex, was identified by a promoter SNP and has higher expression in a subset of
colon tumors (Supplementary Figure S2.7A). Similarly, CDH3 (P-cadherin) was identified by a
promoter SNP and shows increased expression in many of the colon tumors. Both TERC and
CDH3 have previously been linked to cancer (Cao, Bryan et al. 2008, Paredes, Figueiredo et al.
2012). Promoter SNPs also identified three uncharacterized proteins (c11orf93, c11orf92 and
c11orf53) clustered together on chromosome 11. Inspection of H3K4me3 and H3K27Ac ChIP-
seq signals suggested that these genes are in open chromatin in normal sigmoid colon, but not in
HCT116. Accordingly, the TCGA gene expression data showed that all 3 genes are down-
regulated in a subset of human CRC tumors (Supplementary Figure S2.7A). Additional genes
identified by promoter SNPs that have been linked to cancer include ATF1, BMP4, CDH1,
CDKN1A, EIF3H, GREM1, LAMA5, and RHPN2 (Garte 1993, Cai, Zhao et al. 2007, Chen, Xu et
al. 2008, 2012, Huang, Guo et al. 2012, Paredes, Figueiredo et al. 2012, Abba, Patil et al. 2013,
Carneiro, Figueiredo et al. 2013, Cheung, Athineos et al. 2013, Danussi, Akavia et al. 2013,
Karagiannis, Berk et al. 2013, Aran and Hellman 2014). For example, BMP4 is up-regulated in
the HCT116 cells and has been suggested to confer an invasive phenotype during progression of
colon cancer (Cai, Zhao et al. 2007). Interestingly, we also identified GREM1, an antagonist of
BMP proteins, and showed that expression of GREM1 is decreased in HCT116. The down-
regulation of the antagonist GREM1 and the up-regulation of the cancer-promoting BMP4 may
cooperate to drive colon cancer progression. LAMA5 is a subunit of laminin-10, laminin-11 and
laminin-15. Laminins, a family of extracellular matrix glycoproteins, are the major non-collagenous
constituent of basement membranes and have been implicated in a wide variety of biological
processes including cell adhesion, migration, signaling, and metastasis (Aumailley 2013).
We identified 28 enhancers, clustered in 9 genomic regions, that harbor correlated SNPs. It is
important to note that in our studies we have used the appropriate cell types and the appropriate
epigenetic mark to identify CRC-associated enhancers. Previous analyses have attempted to link
SNPs to enhancers by using transcript abundance, epigenetic marks, or transcription factor
binding from non-colon cell types (Carvajal-Carmona, Cazier et al. 2011). In contrast, we have
used normal and tumor cells from the colon. Of equal importance is the actual epigenetic mark
that is used to identify enhancers. A previous study used H3K4me1 to identify genomic regions
that were differently marked between normal and tumor colon cells (Akhtar-Zaidi, Cowper-Sal-lari
et al. 2012). However, although H3K4me1 is associated with enhancers, this mark does not
specifically identify active enhancers. Some regions marked by H3K4me1 are classified as “weak”
or “poised” enhancers and it is thought that these regions may become active in different cells or
developmental states (Bernstein, Stamatoyannopoulos et al. 2010). In contrast, H3K27Ac is
strongly associated with active enhancers (Bonn, Zinzen et al. 2012, Calo and Wysocka 2013)
and we feel that this mark is the most appropriate one for identification of CRC-associated risk
enhancers.
Although it is not possible to conclusively know a priori what gene is regulated by each of the
identified enhancers, we have derived a list of putative CRC risk-associated enhancer target
genes by examining gene expression data from HCT116 cells and from a large number of colon
tumors. Several of the genes that are possible enhancer targets are transcription factors that
have previously been linked to cancer, including H2AFY, MYC, SMAD7, PITX1, TEAD4, and
ZBTB7C. MYC, of course, has been linked to colon cancer by many studies because it
is a downstream mediator of WNT signaling, which is strongly correlated with colon cancer
(2012). In addition, PITX1, TEAD4, and ZBTB7C are all transcription factors that have been
previously linked to the control of cell proliferation, specification of cell fate, or regulation of
telomerase activity (Cao, Rabinovich et al. 2011, Home, Saha et al. 2012, Jeon, Kim et al. 2012,
Knosel, Chen et al. 2012). Also, PVT1 is a Myc-regulated non-coding RNA that may play a role in
neoplasia (Birney, Stamatoyannopoulos et al. 2007, Huppi, Pitt et al. 2012).
In conclusion, we have used epigenomic and transcriptome information from normal and
tumor colon cells to identify a set of genes that may be involved in an increased risk for the
development of colon cancer. We realize that we cast a rather large net by analyzing 10 genes 5’
and 10 genes 3’ of each enhancer. We note that 5 of the genes shown to be differentially
expressed in the TCGA data (MYC, PITX1, POU5F1B, C5orf20, and CDH3) are also in the set of
nearest 3 genes to an enhancer having CRC risk-associated SNPs. However, enhancers can
also work at large distances. In fact, the eQTL analysis identified TMED6 as a potential target of
enhancer 19 (over 600 kb away) and deletion of enhancer 7 identified MYC as a target (335 kb
away). Future analyses of the entire set of CRC risk-associated enhancers are required to
confirm the additional putative long range regulatory loops suggested by our studies. Such studies
will provide a high confidence list of genes which, when combined with the genes identified by the
TSS risk-associated SNPs, should be prioritized for analysis in tumorigenicity assays.
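
The "10 genes 5' and 10 genes 3'" net described above can be illustrated with a simple coordinate search. The following is a minimal sketch only, not the workflow actually used in this study: the gene names and coordinates are hypothetical, a gene is represented by a single TSS position, and 5' and 3' are taken as lower and higher genomic coordinates, which is an assumption.

```python
def nearest_genes(enhancer, genes, n=10):
    """Return the n nearest genes on each side of an enhancer.

    enhancer: (chrom, start, end); genes: list of (name, chrom, tss).
    "5'" and "3'" are treated as lower/higher coordinates (an assumption).
    """
    chrom, start, end = enhancer
    same_chrom = [g for g in genes if g[1] == chrom]
    # Genes upstream of the enhancer, closest first.
    left = sorted((g for g in same_chrom if g[2] < start),
                  key=lambda g: start - g[2])[:n]
    # Genes downstream of the enhancer, closest first.
    right = sorted((g for g in same_chrom if g[2] > end),
                   key=lambda g: g[2] - end)[:n]
    return [g[0] for g in left], [g[0] for g in right]


# Hypothetical toy annotation (names and coordinates are made up).
toy_genes = [("geneA", "chr8", 1000), ("geneB", "chr8", 5000),
             ("geneC", "chr8", 9000), ("geneD", "chr8", 20000),
             ("geneE", "chr1", 7000)]
left, right = nearest_genes(("chr8", 6000, 7000), toy_genes, n=2)
```

Restricting n to 3 in such a search corresponds to the "nearest 3 genes" subset mentioned above.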
2.5 Methods
2.5.1 RNA-seq
RNA-seq data was downloaded from the Reference Epigenome Mapping Center for
analysis of gene expression in sigmoid colon cells. For HCT116 colon cancer cells, RNA was
prepared using Trizol (Life Technologies, Carlsbad, CA), and paired-end libraries were prepared
using the Illumina TruSeqV2 Sample Prep Kit (Catalog #15596-026), starting with 1 µg total
RNA. Libraries were barcoded, pooled, and sequenced using an Illumina HiSeq. For analysis of
RNA-seq data, we used Cufflinks (Trapnell, Williams et al. 2010), a program of “alignment to
annotation” having discontinuous mapping to the reference genome. mRNA abundance was
measured by calculating FPKM (expected fragments per kilobase of transcripts per million
fragments sequenced), to allow inter-sample comparisons. We specified the -G option with the
GENCODE V15 comprehensive annotation so that the program ignores alignments that are not
structurally compatible with the reference transcripts provided. Two biological replicates were
performed and the mean FPKM of two biological replicates represents the expression of each
gene. We categorized genes into non-expressed, low expressed, and expressed based on the
distribution of the Gene FPKM (Figure S2.2) generated by the R package “ggplot2”.
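
The replicate averaging and the three expression categories can be sketched as follows. This is an illustration only: the FPKM values are hypothetical, the cutoffs of 0.5 and 2 are those given in the legend of Supplementary figure 2.2, and the treatment of values that fall exactly on a cutoff is an assumption.

```python
def mean_fpkm(rep1, rep2):
    """Average FPKM across two biological replicates, gene by gene."""
    return {gene: (rep1[gene] + rep2[gene]) / 2.0 for gene in rep1}


def expression_category(fpkm, low=0.5, high=2.0):
    """Classify a gene by its mean FPKM.

    Boundary handling (0.5 and 2 counted as low expressed) is an assumption.
    """
    if fpkm < low:
        return "non-expressed"
    if fpkm <= high:
        return "low expressed"
    return "expressed"


# Hypothetical FPKM values for illustration only.
rep1 = {"geneA": 0.1, "geneB": 1.0, "geneC": 30.0}
rep2 = {"geneA": 0.3, "geneB": 2.0, "geneC": 50.0}
categories = {g: expression_category(v) for g, v in mean_fpkm(rep1, rep2).items()}
```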
RNA-seq data for 233 colorectal tumor samples and 21 colorectal normal samples was
downloaded from the TCGA data download website
(https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm). The data were all generated on the Illumina HiSeq
platform, and mapped with the RSEM algorithm and normalized so that the third quartile for each
sample equals 1000. Entrez gene IDs were used for mapping to genomic locations using
GenomicRanges (http://www.bioconductor.org/packages//2.12/bioc/html/GenomicRanges.html).
To identify transcripts differentially expressed in the tumor samples, we selected the 10 nearest
genes 5’ of and the 10 nearest genes 3’ of each of the 28 enhancers. After removing the non-
expressed genes, we then log2-transformed the expression data [log2(RSEM+1)], and performed
a t test on gene expression between the normal group and the tumor group for each gene using
254 TCGA colorectal RNA-seq datasets. We selected genes that showed a statistically
significant 2-fold change in expression (P < 0.01, after adjustment by the Benjamini-Hochberg
false discovery rate method).
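
The selection step can be sketched as below. This is a minimal illustration under stated assumptions, not the code used for the analysis: the raw P values from the t test are assumed to be already computed, the gene names and numbers are hypothetical, and a 2-fold change is expressed as a difference greater than 1 in mean log2(RSEM+1).

```python
import math


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def log2_expr(rsem):
    """The log2(RSEM+1) transform described in the text."""
    return math.log2(rsem + 1)


def select_genes(results, alpha=0.01, min_abs_log2fc=1.0):
    """results: list of (gene, mean_log2_normal, mean_log2_tumor, raw_p)."""
    adj = benjamini_hochberg([r[3] for r in results])
    return [r[0] for r, q in zip(results, adj)
            if q < alpha and abs(r[2] - r[1]) > min_abs_log2fc]


# Hypothetical per-gene summaries for illustration only.
toy_results = [("geneA", 5.0, 7.5, 0.0001),
               ("geneB", 5.0, 5.2, 0.0001),
               ("geneC", 5.0, 8.0, 0.5)]
selected = select_genes(toy_results)
```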
To generate the heatmap showing expression of genes in the TCGA samples, we log2-transformed
the RNA-seq expression data of the 254 TCGA colorectal samples [log2(RSEM+1)].
Then we computed the mean and standard deviation of the expression of each gene ($\bar{X}$ and
$s$) and normalized gene expression as $Z = (X - \bar{X})/s$. Hierarchical clustering with Ward’s method was
applied to the normalized TSS/exon gene expression.
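
The per-gene normalization can be sketched as follows. This is an illustration with hypothetical values: the use of the sample standard deviation and the handling of genes with zero variance are assumptions, and the clustering step itself is not shown.

```python
import statistics


def zscore_rows(expr):
    """Z-score each gene across samples: Z = (X - mean) / sd."""
    normalized = {}
    for gene, values in expr.items():
        mean = statistics.mean(values)
        sd = statistics.stdev(values)  # sample standard deviation (an assumption)
        if sd == 0:
            # A gene with constant expression carries no signal; set Z to 0.
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(x - mean) / sd for x in values]
    return normalized


# Hypothetical log2(RSEM+1) values for three samples.
toy_expr = {"geneA": [1.0, 2.0, 3.0], "geneB": [4.0, 4.0, 4.0]}
z = zscore_rows(toy_expr)
```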
2.5.2 ChIP-seq analysis
Two replicate H3K27Ac ChIP-seq datasets from HCT116 cells and two replicate H3K27Ac
ChIP-seq datasets from normal sigmoid colon were analyzed using the Sole-search ChIP-seq
peak calling program (Blahnik, Dou et al. 2010, Blahnik, Dou et al. 2011) using the following
parameters (Permutation:5; Fragment:250; AlphaValue: 0.00010 = 1.0E-4; FDR: 0.00010 = 1.0E-
4; PeakMergeDistance:0; HistoneBlurLength:1200). Each dataset was analyzed separately and
also analyzed as a merged dataset for HCT116 or sigmoid colon. The merged H3K27Ac peaks
from HCT116 or sigmoid colon were analyzed using the GenomicRanges package of
Bioconductor to identify promoter vs distal peaks.
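
The promoter vs distal split can be illustrated with a short sketch. It does not reproduce the GenomicRanges calls used here; the peak and TSS coordinates are hypothetical, a single chromosome is assumed, and the +/- 2 kb window is taken from the enhancer definition given in Chapter 3, so its use at this step is an assumption.

```python
import bisect


def classify_peaks(peaks, tss_positions, window=2000):
    """Label each (start, end) peak on one chromosome as 'promoter' or 'distal'.

    A peak is 'promoter' if any TSS lies within `window` bp of the peak.
    """
    tss_sorted = sorted(tss_positions)
    labels = []
    for start, end in peaks:
        # First TSS at or after (start - window); promoter if it is <= end + window.
        i = bisect.bisect_left(tss_sorted, start - window)
        near = i < len(tss_sorted) and tss_sorted[i] <= end + window
        labels.append("promoter" if near else "distal")
    return labels


# Hypothetical peaks and TSS positions for illustration only.
labels = classify_peaks([(900, 1500), (10000, 10800), (4000, 4200)], [1000, 6500])
```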
2.5.3 Enhancer deletion
Guide RNAs designed to recognize chr8: 128412821-128412843 and chr8: 128414816-
128414838 (hg19) were cloned into a gRNA cloning vector (Addgene plasmid 41824) and
introduced into HCT116 cells by transfection, along with a plasmid encoding Cas9 and GFP.
Cells were sorted using a flow cytometer to capture the cells having high GFP signals and then
colonies were grown from single cells. Complete deletion of all alleles for enhancer 7 was
confirmed by PCR using primers flanking the enhancer. RNA analysis was performed in triplicate
using HumanHT-12 v4 Expression BeadChip arrays (Illumina), comparing the deleted cells to
parental HCT116 cells.
2.5.4 Analysis of FunciSNP and correlated SNPs effects
To identify SNPs that are correlated with the 25 CRC tag SNPs and that overlap with chromatin
biofeatures, we used the R package FunciSNP (Coetzee, Rhie et al. 2012), which is available in
Bioconductor. As biofeatures, we used H3K27Ac ChIP-seq data from HCT116 cells and sigmoid colon
tissue together with exon, intron, UTR, and TSS annotations generated from GENCODE V15. We
ran FunciSNP with the following parameters: +/- 200 kb around each of the 25 tag SNPs and
r² > 0.1. To analyze the potential effects of correlated SNPs on protein coding, we employed SnpEff
and Provean using the suggested default parameters. For analysis of the effects of SNPs on transcription
factor motifs, we employed a method developed by Dennis Hazellett (personal communication).
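
The filtering logic behind these parameters can be sketched as follows. This is not the FunciSNP interface; it is a minimal stand-alone illustration in which the function name, SNP identifiers, positions, and r² values are all hypothetical.

```python
def correlated_snps_in_biofeatures(tag_pos, snps, features,
                                   window=200_000, min_r2=0.1):
    """Keep SNPs within `window` bp of the tag SNP, with r2 above `min_r2`,
    that fall inside at least one biofeature.

    snps: list of (snp_id, position, r2_with_tag)
    features: list of (start, end, label) on the same chromosome
    """
    hits = []
    for snp_id, pos, r2 in snps:
        if abs(pos - tag_pos) > window or r2 <= min_r2:
            continue
        for start, end, label in features:
            if start <= pos <= end:
                hits.append((snp_id, label))
    return hits


# Hypothetical inputs for illustration only.
toy_snps = [("snp1", 1_050_000, 0.8), ("snp2", 1_300_000, 0.9),
            ("snp3", 990_000, 0.05), ("snp4", 1_100_000, 0.3)]
toy_features = [(1_049_000, 1_051_000, "H3K27Ac"),
                (1_099_500, 1_100_500, "TSS"),
                (989_000, 991_000, "exon")]
hits = correlated_snps_in_biofeatures(1_000_000, toy_snps, toy_features)
```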
2.5.5 Batch effects analysis
We note that TCGA has strict sample criteria. Each frozen primary tumor specimen has a
companion normal tissue specimen which could be blood/blood components (including DNA
extracted at the tissue source site), or adjacent normal tissue taken from greater than 2 cm from
the tumor. Each tumor and adjacent normal tissue specimen (if available) were embedded in
optimal cutting temperature (OCT) medium and a histologic section was obtained for review. Each
H&E stained case was reviewed by a board-certified pathologist to confirm that the tumor
specimen was histologically consistent with colon adenocarcinoma and the adjacent normal
specimen contained no tumor cells. The tumor sections were required to contain an average of
60% tumor cell nuclei (TCGA has found that this provides a sufficient proportion so that the tumor
signal can be distinguished from other cells), with less than 20% necrosis for inclusion in the study
per TCGA protocol requirements. To assess potential batch effects, we applied MBatch software,
which was developed by the MD Anderson Cancer Center and has been widely used to address
batch effects in TCGA Consortium studies (2012, TheCancerGenomeAtlas 2012), to perform hierarchical
clustering and Principal Component Analysis (PCA) on the
colorectal TCGA data sets: level 3 mRNA expression (RNA-seq Illumina HiSeq), level 3 DNA
methylation (Infinium HM450K microarray), and level 4 SNPs CNV by gene (GW SNP 6). We
assessed batch effects for two variables: batch ID and tissue source site. For hierarchical
clustering, MBatch uses the average linkage algorithm with 1 minus the Pearson correlation
coefficient as the dissimilarity measure. The samples were clustered after labeling with different
colors, each of which corresponds to a batch ID or a tissue source site (Figures S2.6a.1,
S2.6b.1, and S2.6c.1). For PCA, MBatch plotted four principal components (Figures S2.6a.2,3,
S2.6b.2, 3, and S2.6c.2, 3). Samples with the same batch ID (or tissue source site) were labeled
with the same color and shape and were connected to the batch centroids. The centroids were
computed by taking the mean across all samples in the same batch. To assess batch effects on
mRNA expression (Supplementary Fig. S2.6a), genes with zero values were removed and
normalized gene expression values were log2-transformed before analyzing batch effects. Batches
132 and 154 stood out in one comparison (Comp1 vs Comp2) but not in the other comparisons
(Suppl. Fig. 6a2). The remaining batches or tissue source sites did not stand out in clustering or in
any of the PCA plots; thus the data do not support a strong batch effect and all data were
used for analysis. When batch effects on CNV (Supplementary Fig. S2.6b) were analyzed, the
centroid for the NH tissue source site stood out among other batches. The remaining batches or
tissue source sites did not stand out in clustering or in any of the PCA plots. We did not apply
a correction to the data because (i) the NH site contained only two samples, and a centroid calculated
from only two samples is likely not accurate, (ii) the two samples within the NH batch were not far from other
individual samples, and (iii) two samples would not dramatically affect our analysis of 233
samples. When assessing batch effects on DNA methylation analysis, no batches or tissue
source sites stood out in clustering or in any of the PCA plots (Supplementary Fig. S2.6c). In
summary, none of the samples consistently show batch effects in both clustering and PCA
algorithms. Based on the above analysis, we believe that batch effects among the data sets are
not dramatically influencing our analysis.
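
The two quantities described above, the clustering dissimilarity (1 minus the Pearson correlation coefficient) and the batch centroids, can be sketched as follows. This is not the MBatch implementation; the function names, sample values, and batch labels are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def dissimilarity(x, y):
    """1 minus the Pearson correlation, as described for the clustering."""
    return 1.0 - pearson(x, y)


def batch_centroids(coords, batch_ids):
    """Mean coordinate (e.g. PC1, PC2) across samples sharing a batch ID."""
    grouped = {}
    for point, batch in zip(coords, batch_ids):
        grouped.setdefault(batch, []).append(point)
    return {b: tuple(sum(c) / len(pts) for c in zip(*pts))
            for b, pts in grouped.items()}


# Hypothetical PCA coordinates and batch labels for illustration only.
centroids = batch_centroids([(0.0, 0.0), (2.0, 2.0), (10.0, 0.0)], ["A", "A", "B"])
```

A centroid based on very few samples, such as batch "B" in this toy input, simply equals or closely tracks the individual samples, which is consistent with the caution expressed above about the two-sample NH site.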
2.5.6 eQTL analyses
We employed a two-step linear regression model that considers germline
genotype, somatic copy number variation, and DNA methylation at gene promoters to perform eQTL
analysis (Abba, Patil et al. 2013). We selected 228 patients with both tumor samples and
matched normal blood or normal tissue samples from the TCGA colorectal cancer data set. For
each of these patients, we obtained the germline genotypes from normal blood or normal tissue
samples using data from the GW SNP6 array platform. We directly downloaded gene-level
somatic copy number, gene isoform expression (from the RNAseqHiseq Illumina platform) and
DNA methylation data (from the HM450K platform) for each tumor sample from the TCGA data
download website
(http://gdac.broadinstitute.org/runs/analyses__2014_01_15/data/COAD/20140115/). To determine
DNA methylation of a promoter, we calculated the average DNA methylation from 100 bp upstream
to 700 bp downstream of the TSS for a transcript. We fit the germline genotype of patients, the
continuous DNA methylation level of promoters, and the CNV of matched tumor samples into the
two-step multivariate linear regression model. 60 SNPs, including 6 tag SNPs, 18 SNPs within
risk enhancers, and 45 SNPs within TSS regions, were present on the GW SNP6 array. eQTL
analyses were performed using these 60 SNPs and the genes identified by exon or TSS SNPs or
by differential expression analysis (see Table 2.2 and Table 2.5). To reduce false positives, we
excluded genes showing log2 expression less than 2 in over 90% of the samples. The Benjamini-
Hochberg method was used to correct the original P values, and an FDR of 0.1 was used as the
threshold of significant association.
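
Two of the preprocessing steps described above, the promoter methylation average and the low-expression filter, can be sketched as follows. The regression model itself is not shown. The probe positions and beta values are hypothetical, and handling the window in a strand-aware manner is an assumption.

```python
def promoter_window(tss, strand, upstream=100, downstream=700):
    """Interval from `upstream` bp 5' to `downstream` bp 3' of a TSS.

    Strand-aware handling is an assumption of this sketch.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    return tss - downstream, tss + upstream


def promoter_methylation(probes, tss, strand):
    """Average beta value of probes (position, beta) inside the window, or None."""
    lo, hi = promoter_window(tss, strand)
    betas = [beta for pos, beta in probes if lo <= pos <= hi]
    return sum(betas) / len(betas) if betas else None


def keep_gene(log2_expr, min_expr=2.0, max_low_fraction=0.9):
    """Exclude genes with log2 expression below 2 in over 90% of samples."""
    low = sum(1 for v in log2_expr if v < min_expr)
    return low / len(log2_expr) <= max_low_fraction


# Hypothetical probes for illustration only.
toy_probes = [(950, 0.2), (1500, 0.4), (1900, 0.9)]
plus_strand = promoter_methylation(toy_probes, 1000, "+")
minus_strand = promoter_methylation(toy_probes, 1000, "-")
```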
2.5.7 General Data Handling and Visualization
Throughout the analyses we used GenomicRanges to import, export and/or intersect
genomic data for plotting and annotation purposes; R version 3.0.0 (2013-04-03) was used for
all statistical analyses, the R function ‘image’ was used for heatmap generation, and package
‘ggplot2’ was used to generate scatterplots. To generate the circle plot, Circos software was used
(Krzywinski, Schein et al. 2009). All genomic location information is based on hg19.
2.6 Supplementary figures for chapter 2
Supplementary figure 2.1 Correlated exon SNPs.
Shown are correlated SNPs that fall within exons located within +/- 200 kb from each tag SNP.
The r2 value in relation to the tag SNP is shown on the x axis and the distance of the correlated
SNP from the tag SNP is shown on the y axis. The color indicates whether the exon is
contained within a protein coding (green), non-coding (red), or pseudogene (blue) transcript.
Supplementary figure 2.2 Analysis of RNA-seq data.
The top panels represent expression levels of coding (left panel) and non-coding (right panel)
RNAs in HCT116 and sigmoid colon samples. RNAs were divided into highly expressed (FPKM
higher than 2), modestly expressed (FPKM between 0.5 and 2), and not expressed (FPKM less
than 0.5). The number of genes in each category is shown in the table.
Supplementary figure 2.3 Correlated TSS SNPs.
Shown are correlated SNPs that fall within TSS regions located within +/- 200 kb from each tag
SNP. The r2 value in relation to the tag SNP is shown on the x axis and the distance of the
correlated SNP from the tag SNP is shown on the y axis. The color indicates whether the TSS
regulates a protein coding (green), non-coding (red), or pseudogene (blue) transcript.
Supplementary figure 2.4 ChIP-seq peak analysis.
The top panels show the H3K27Ac peak height vs peak rank in the HCT116 (left) and sigmoid
colon (right) ChIP-seq datasets; blue indicates peaks present in both replicates, green indicates
peaks present in only one replicate, and red indicates peaks present only in the merged dataset.
The bottom panels show the H3K27Ac peak height vs peak rank in the HCT116 (left) and
sigmoid colon (right) ChIP-seq datasets; blue indicates peaks located in promoter proximal
regions and red indicates peaks located in distal regions.
Supplementary figure 2.5 Correlated enhancer SNPs.
Shown are correlated SNPs that fall within distal H3K27Ac regions located within +/- 200 kb
from each tag SNP. The r2 value in relation to the tag SNP is shown on the x axis and the
distance of the correlated SNP from the tag SNP is shown on the y axis. The color indicates
whether the enhancer is found only in normal sigmoid colon (blue), only in HCT116 (green), or
in both normal and HCT116 cells (red).
Supplementary figure 2.6 TCGA batch effects analysis. RNA-seq batch effects.
a.1) Hierarchical clustering plot for mRNA expression data. a.2) PCA for mRNA expression,
showing first and second components comparison, second and third components comparison,
and third and fourth components comparison plots with samples connected by centroids according
to batch ID. a.3) PCA for mRNA expression, showing first and second components comparison,
second and third components comparison and third and fourth components comparison plots with
samples connected by centroids according to tissue source site (TSS).
Supplementary figure 2.7 TCGA batch effects analysis. CNV (GW SNP6 array) batch
effects.
b.1) Hierarchical clustering plots for copy number variation (CNV) data. b.2) PCA for CNV,
showing first and second components comparison, second and third components comparison
and third and fourth components comparison plots with samples connected by centroids
according to batch ID. b.3) PCA for CNV, showing first and second components comparison,
second and third components comparison and third and fourth components comparison plots
with samples connected by centroids according to TSS.
Supplementary figure 2.8 TCGA batch effects analysis. DNA methylation (Infinium
HM450K microarray) batch effects.
c.1) Hierarchical clustering plot for DNA methylation data. c.2) PCA for DNA methylation,
showing first and second components comparison, second and third components comparison
and third and fourth components comparison plots with samples connected by centroids
according to batch ID. c.3) PCA for DNA methylation, showing first and second components
comparison, second and third components comparison and third and fourth components
comparison plots with samples connected by centroids according to TSS.
Supplementary figure 2.9 Expression analysis of genes identified by promoter and exon
SNPs and potential enhancer target genes in TCGA samples.
(a) Shown are the expression levels for the genes identified by exon or TSS SNPs in normal and
colorectal TCGA tumor samples. The heatmap was created by unsupervised clustering of
normalized gene expression, using 21 normal and 233 tumor colon tissues from TCGA. Each
row represents a gene and each column is one of the normal (left panel) or tumor (right panel)
samples. (b) Shown are the expression differences (x axis) in the normal vs tumor samples and
the significance of the change (y axis) of the genes identified by promoter or exon SNPs. Genes
identified as differentially expressed are those with an adjusted P value <0.01 and log2 expression
difference >1; red indicates genes with a higher expression in tumors and blue indicates genes
with a lower expression in tumors. (c) Shown are the expression levels of the differentially
expressed genes in the set of the 10 nearest 5’ and 10 nearest 3’ genes for each of the 28
enhancers in normal and tumor TCGA samples; the heatmap was
created as described in panel (a). (d) Shown are the expression differences (x axis) in the normal
vs tumor samples and the significance of the change (y axis) of the 10 nearest 5’ and 10 nearest
3’ genes for each of the 28 enhancers. Genes identified as differentially expressed are those with
an adjusted P value <0.01 and log2 expression difference >1; the colors refer to the 9 different
regions within which the 28 enhancers are located (see Table 2.3).
Supplementary figure 2.10 eQTL analysis summary.
(a) Distribution of promoter methylation of all genes in normal and tumor samples. (b)
Distribution of gene level copy number variation for all genes. (c) Shown are the subset of all
related SNP-gene pairs identified by eQTL analysis with a non-adjusted P-value <0.01. (d)
Shown is the expression level for the three genotypes for each SNP-gene pair, colored by the
copy number variation of the gene. (e) Shown is the expression level for the three genotypes for
each SNP-gene pair, colored by the DNA methylation level at the promoter.
3 Chapter 3: Effects on the transcriptome upon deletion of distal
elements are not correlated with the size of H3K27Ac peaks in
human cells
The following chapter is currently under review as a research article entitled “Effects on the transcriptome
upon deletion of distal elements are not correlated with the size of H3K27Ac peaks in human cells” for
publication in the journal Genome Biology. The authors will be Yu Gyoung Tak, Yuli Hung, Lijing Yao,
Matthew R Grimmer, Albert Do, Mital S Bhakta, Henriette O’Geen, David J Segal, and Peggy J Farnham.
Yu Gyoung Tak created enhancer-deleted HCT116 and HEK293 clones, performed the RNA-seq, ChIP-seq,
and 4C-seq experiments, and performed analyses for the figures; PJF conceived of the overall project
design, directed the experimental analyses, and drafted the manuscript.
3.1 Abstract
Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms
(SNPs) associated with increased risk for colorectal cancer (CRC). A molecular understanding of
the functional consequences of this genetic variation is complicated because most GWAS SNPs
are located in non-coding regions. We used epigenomic information to identify H3K27Ac peaks in
HCT116 colon cancer cells that harbor SNPs associated with an increased risk for
CRC. Employing CRISPR/Cas9 nuclease, we deleted 2 CRC risk-associated H3K27Ac peaks
from HCT116 cells and analyzed effects on the transcriptome. Deletion of E7, which corresponds
to a small H3K27Ac peak, caused large-scale changes in gene expression, resulting in decreased
expression of many nearby genes. Deletion of E24, which corresponds to a large H3K27Ac peak,
caused changes in hundreds of genes, only one of which was within 1 Mb. As a comparison, we
showed that deletion of a robust H3K27Ac peak not associated with CRC had minimal effects on
the transcriptome. Interestingly, although there is no H3K27Ac peak in HEK293 cells in the E7
region, deletion of this region in HEK293 cells decreased expression of several of the same genes
that were downregulated in HCT116 cells, including the MYC oncogene. Accordingly, deletion of
E7 causes changes in cell proliferation in HCT116 and HEK293 cells. We show that the size or
presence of an H3K27Ac peak does not correlate with enhancer activity. We also show that
deletion of distal regulatory elements associated with CRC can have genome-wide effects on the
transcriptome.
3.2 Introduction
In our previous studies, we identified a set of 28 enhancers (defined as the presence of an
H3K27Ac peak located farther than +/- 2 kb from a transcription start site) that harbor single
nucleotide polymorphisms (SNPs) associated with an increased risk for colon cancer (Yao, Tak et
al. 2014). Our working hypothesis is that the different nucleotide sequence between the “risk-
associated” vs. “non risk-associated” SNPs affects activity of the enhancers, causing a change in
expression in genes (coding or non-coding) that can influence the balance between normal tissue
proliferation or differentiation vs. tumor initiation or progression. Enhancers are composed of
binding sites for many different site-specific DNA binding transcription factors (TFs) that are
thought to work in concert to provide cell type-specific functionality. For example, one of the first
characterized mammalian enhancers is the interferon beta enhanceosome, which is bound by 8
different TFs (Maniatis, Falvo et al. 1998, Panne 2008). Recent studies from the ENCODE
Project (ENCODE_Project_Consortium 2012) and the Roadmap Epigenome Mapping
Consortia (RoadmapEpigenomicsConsortium 2015) have identified hundreds of thousands of
enhancers, most of which include motifs for a variety of different TFs. The overall function of a
given enhancer is dependent upon several conditions, such as the number of motifs contained
within it, the extent to which the nucleotides within the enhancer match consensus binding motifs,
the expression level of the transcription factors that bind those motifs, and the location of the
enhancer with respect to chromatin boundaries. Because many TFs contribute to the overall
function of an enhancer, it is likely that single nucleotide changes within an enhancer will have
quite modest effects on the transcriptional output from a target promoter (Corradin and Scacheri
2014). Although modest effects in gene expression could have strong phenotypic outcomes over
the course of a long time period, such as during tumor development, the consequences of a
single nucleotide change in an enhancer may be difficult to observe in short term cell culture
assays. Thus, rather than analyzing the effect of a single SNP, our approach is to determine the
functional role of the enhancer as a whole by identifying genes that are responsive to loss of the
enhancer in colon cancer cells. To identify target genes of CRC risk-associated enhancers, we
analyzed clonal populations of cells in which specific CRC risk-associated enhancers have been
deleted. For comparison, we also analyzed an enhancer not associated with CRC and a distal
region that lacks the H3K27Ac mark. Our results suggest that the size or presence of an
H3K27Ac peak does not correlate with effects on gene expression. We show that deletion of distal
regulatory elements associated with CRC can affect nearby genes and also have genome-wide
effects on the transcriptome.
3.3 Results
3.3.1 Deletion of CRC risk-associated enhancers can cause widespread effects on
the transcriptome
We have chosen to use HCT116 colon cancer cells for our analysis of CRC risk-associated
enhancers because of the availability of histone modification data (Frietze, Wang et al. 2012),
whole genome DNA methylation data (Blattler, Yao et al. 2014), and ChIP-seq TF binding sites
(ENCODE_Project_Consortium 2012) from these cells. Of the previously identified 28 CRC risk-
associated enhancers, 14 are detected in HCT116 cells. However, they are not necessarily
amongst the top ranked enhancers in HCT116 cells (Figure 3.1A). Interestingly, 12 of the 14
CRC risk-associated enhancers in HCT116 cells are located within the introns of 6 protein-coding
genes and 1 non-coding RNA, whereas 2 of the enhancers are intergenic. In HCT116 cells,
approximately 50% of all H3K27Ac enhancers are intronic (Blattler, Yao et al. 2014); it is not clear
if the high preponderance of intronic enhancers in the set of CRC risk-associated enhancers is
due to a bias involving the choice of index SNPs on the GWAS array or due to a biological or
functional reason. However, others have also observed that GWAS SNPs (and SNPs in high
linkage disequilibrium with GWAS index SNPs) are more enriched in introns than in intergenic
regions (Schork, Thompson et al. 2013). We selected 2 risk enhancers, which reside in an intron
of the non-coding RNA CASC8 (enhancer E7, identified by rs6983267) and in an intron of the
RHPN2 protein-coding gene (enhancer E24, identified by rs10411210) for targeted deletion
(Figure 3.1B). Ectopic expression of RHPN2 promotes an invasive phenotype in neural stem cells
and astrocytes and RHPN2 amplification correlates with a highly aggressive phenotype that directs
the worst clinical outcomes in patients with glioblastoma (Danussi, Akavia et al. 2013). Although
the function of CASC8 is not known, SNPs within the long non-coding RNA CASC8 have also
been implicated in increased risk for gastric and breast cancer (Ma, Gu et al. 2014). However,
enhancers can work at a long distance and in either orientation and thus it is not certain that the
gene in which an enhancer resides is in fact regulated by that enhancer (Sagai, Hosoya et al.
2005, Farnham 2009, Sanyal, Lajoie et al. 2012, Zhou, Katsman et al. 2014). For example, in a
ChIA-PET study using an antibody for RNA polymerase II, Li et al. (Li, Ruan et al. 2012) identified
~20,000-30,000 enhancer-promoter loops in MCF7 or K562 cells. Of these, more than 40% of the
enhancers skipped over the nearest gene to loop to a farther one. Of course, it has not been proven
that all loops represent bona fide regulatory interactions or that all enhancer-mediated regulation
involves looping (Yao, Berman et al. 2015). However, these studies indicate that it is not a
certainty that the genes in which the enhancers reside are the ones that should be linked to an
increased risk for colon cancer. To gain further insight into the mechanisms by which the CRC
risk-associated enhancers may influence tumor development, we analyzed gene expression
changes in clonal populations of cells lacking each of these enhancers.
Figure 3.1 Genomic location and H3K27Ac profiles of E7, E24, 18qE, and 18qNE.
(A) Shown is a graph representing HCT116 H3K27Ac peaks, as identified using Sole-search (Blahnik,
Dou et al. 2010, Blahnik, Dou et al. 2011) to analyze the HCT116 H3K27Ac ChIP-seq data; all 35,932
peaks were plotted based on peak height (Y axis) vs. peak rank (X axis). The locations of the 14 CRC
risk-associated enhancers present in HCT116 cells are shown (yellow circles), with the larger red circles
indicating the CRC risk-associated enhancers deleted in this study. Also shown is the location of the
H3K27Ac peak called 18qE (blue circle); the green circle indicates that the 18qNE region was not called
as a peak. (B) For the 4 distal regions deleted in this study, the genomic location, the H3K27Ac pattern in
HCT116 cells, the combined H3K27Ac pattern in multiple cell types (the ENCODE Regulation Layered
H3K27Ac track) and TF binding data (the ENCODE Regulation Txn Factor ChIP track combines ENCODE
ChIP-seq data from many different transcription factors and cell lines in a relatively dense display) from
the UCSC genome browser are shown; the boxes indicate the approximate region that was deleted.
Figure 3.2 Experimental schema.
The overall approach used to analyze the function of an enhancer is shown, beginning with design of
guide RNAs used to guide Cas9 to the targeted enhancers and ending with RNA-seq analysis of control
and enhancer-deleted cells.
An overview of the strategy used to determine the function of the enhancers is shown in
Figure 3.2. Guide RNAs were designed that flank the H3K27Ac marks of the selected enhancers;
see Supplementary figure 3.1 for the genomic location and the sequences of all guide RNAs
used in this study. Plasmids expressing the guide RNAs and Cas9-GFP were transfected into
target cells; 48 hours later, cells were sorted for high GFP expression, and colonies were grown
from single cells. PCR analysis using a set of primers that flank the targeted genomic region and
a set of primers internal to the targeted region was used to identify clonal lines having deletion of
the targeted region (Supplementary figure 3.2). In general, fewer than 10% of the clones tested
after sorting for high GFP levels showed deletion of all alleles. After confirming that the targeted
region had been deleted, RNA-seq was performed to identify gene expression differences
between the cells with enhancer deletion vs. the control cells. For analysis of the CRC-associated
enhancers, we prepared control samples from four different clonal populations of FACS-selected
HCT116 cells that had been transfected with Cas9-GFP plus the guide RNA vector lacking
inserted guide RNAs. Each control clone was analyzed using RNA preps prepared from cell
cultures grown on different days. For enhancer-deleted cells, we also prepared triplicate RNA
samples from cells grown on the same days as the control clones, and sequenced matched
control and deleted samples in the same lane of a sequencer to prevent batch effects.
Information concerning all RNA-seq data sets analyzed in this study can be found in
Supplementary figure 3.3, quality control plots for the RNA-seq data can be found in
Supplementary figure 3.4, and a list of upregulated and downregulated genes for all enhancer
deletions can be found in Supplementary data file 3.1.
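The PCR screening logic described above can be summarized in a short sketch. The function name and the category labels are hypothetical and are not taken from the study; the sketch assumes that a clone is scored as fully deleted only when the flanking primers give the shorter deletion product and the internal primers give no product.

```python
def classify_clone(flanking_deletion_band: bool, internal_band: bool) -> str:
    """Classify a clone from two PCR readouts (hypothetical labels).

    flanking_deletion_band: the primers flanking the target give the short
        product expected when the region between the guide RNAs is lost.
    internal_band: the primers inside the target region give a product,
        meaning at least one allele still carries the region.
    """
    if flanking_deletion_band and not internal_band:
        return "all alleles deleted"
    if flanking_deletion_band and internal_band:
        return "partial deletion"
    return "no deletion detected"


if __name__ == "__main__":
    # Illustrative readouts only; these are not data from the study.
    for readout in [(True, False), (True, True), (False, True)]:
        print(readout, "->", classify_clone(*readout))
```

Only clones in the first category would be carried forward to RNA-seq under this scheme.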
The E7 CRC risk-associated enhancer harbors a GWAS CRC index SNP and resides near,
but not within, a region that contains a large number of super-enhancers (see Figure 3.3). Due to
the small size of the H3K27Ac peak (Figure 3.3C), we initially thought that most genes in the
region would be controlled by the super-enhancers and not by E7. Surprisingly, we found that
many genes within +/- 5 Mb from E7 were downregulated upon deletion of the enhancer (Figure
3.3A). Specifically, 5 genes were downregulated within +/- 1 Mb of E7, including the non-coding
CASC8 RNA (in which E7 resides), which was reduced 1.5 fold. Of note, deletion of E7 also had
effects on genes that were not nearby, causing the up- or downregulation of more than 1000
genes in the genome, most of which were on other chromosomes (Table 3.1). In fact, 25 genes
were reproducibly downregulated more than 5-fold in the E7-deleted cells; a list of all
downregulated genes in E7-deleted cells is found in Supplementary data file 3.1 and the
chromosomal location of the top 25 downregulated genes is shown in Supplementary figure 3.5.
We also noted that deletion of the small E7 enhancer resulted in loss of H3K27Ac peaks at the
edge of the nearby super-enhancer (Figure 3.3C) and at another super-enhancer located ~2 Mb
from E7; patterns of TCF7L2 closely match the H3K27Ac patterns in control and deleted cells
(Supplementary figure 3.6).
Figure 3.3 Gene expression and H3K27Ac changes upon deletion of E7 in HCT116 cells.
(A) The Y-axis shows log2 fold changes in gene expression in the E7-deleted cells throughout a +/- 5
Mb region from E7, with genes showing decreased expression upon enhancer deletion having negative
numbers and genes showing increased expression upon enhancer deletion having positive numbers.
(B) Shown are the H3K27Ac patterns in independent replicates of control and enhancer-deleted cells
within an ~3 Mb region near the E7 enhancer (located within the purple box, which is shown in an
expanded view in panel C). Also shown is the H3K4me3 pattern to identify promoter regions. The Y-
axis indicates the peak height and the X-axis indicates the genomic location; only those genes
significantly downregulated in the deleted cells are indicated in red below the H3K4me3 track. (C) The
H3K27Ac patterns in control and deleted cells are shown for the region near E7 (red arrow), showing
loss of the E7 H3K27Ac peak, as well as loss of H3K27Ac signal at the right side of the nearby large
enhancer region.
We next analyzed the effects of deletion of E24, which not only harbors a GWAS index
SNP associated with increased risk for CRC but is also the only robust enhancer within a 3.5 Mb
region that includes 76 expressed genes (note that most H3K27Ac peaks near E24 are promoter
regions, as indicated by the H3K4me3 track; Figure 3.4). We thought that this robust enhancer
would be responsible for regulation of several of the genes in the nearby region. Surprisingly, we
found that no genes near E24 reproducibly showed a greater than 1.5 fold change in
expression when E24 was deleted (Figure 3.4A). Thus, most genes within this region are either
not regulated by an enhancer or are regulated by an enhancer located quite far away. However,
the gene in which the enhancer resides (RHPN2) almost met the cut-off, showing a 1.44 fold
decrease upon deletion of E24. Deletion of E24 caused a similar reduction in levels of UBA2,
which is more than 1 Mb from E24. Similar to the results observed in E7-deleted cells, more than
one hundred genes far from E24 or on other chromosomes showed changes in expression in the
E24-deleted cells (Table 3.1); a list of all downregulated genes in E24-deleted cells is found in
Supplementary data file 3.1 and the chromosomal location of the top 25 downregulated genes is
shown in Supplementary figure 3.5.
Figure 3.4 Gene expression and H3K27Ac changes upon deletion of E24 in HCT116 cells.
(A) The Y-axis shows log2 fold changes in gene expression in the E24-deleted cells throughout a +/- 5
Mb region from E24, with genes showing decreased expression upon enhancer deletion having
negative numbers and genes showing increased expression upon enhancer deletion having positive
numbers. (B) Shown are the H3K27Ac patterns in independent replicates of control and enhancer-
deleted cells within an ~3.5 Mb region near the E24 enhancer (located within the purple box, which is
shown in an expanded view in panel C). Also shown is the H3K4me3 pattern to identify promoter
regions. The Y-axis indicates the peak height and the X-axis indicates the genomic location; only those
genes significantly downregulated (red) or upregulated (blue) in the deleted cells are indicated below
the H3K4me3 track. (C) The H3K27Ac patterns in control and deleted cells are shown for the region
nearby E24 (red arrow), showing loss of H3K27Ac only at the E24 H3K27Ac peak.
Table 3.1 Altered gene expression upon enhancer deletion.
Enhancer      Total number of   Upregulated   Total number of   Downregulated   Putative   Average    Fold     Distance from
              upregulated       genes         downregulated     genes           direct     RPKM in    change   enhancer to
              genes (>1.5 FC)   +/- 1 Mb      genes (>1.5 FC)   +/- 1 Mb        targets    controls            TSS
E7            590               0             565               5               FAM84B     7.45       1.50     843,000
                                                                                CCAT1      2.79       1.78     182,000
                                                                                CASC8      2.33       1.55     81,000
                                                                                MYC        112.3      2.31     335,000
                                                                                PVT1       7.96       1.59     395,000
E7 (HEK293)   166               0             295               1               MYC        28.5       5.10     335,000
E24           137               2             189               1               RHPN2      9.73       1.44     47,400
18qE          13                0             3                 0               N/A        N/A        N/A      N/A
18qNE         14                0             3                 0               N/A        N/A        N/A      N/A
The number of genes having >1.5 fold increase or decrease in expression in the genome and within +/-
1 Mb after deletion of each distal region is indicated. Also shown are the names of the putative direct
target genes, the average RPKM values for these genes in the control cells, the fold decrease in the
enhancer-deleted cells, and the distance of these genes from the distal regions. Note that RHPN2, the
gene in which E24 resides, was only decreased 1.44 fold but is shown here because no other nearby
genes were downregulated.
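The selection of putative direct targets in Table 3.1 (downregulated at the 1.5 fold cut-off and located within +/- 1 Mb of the deleted region) can be illustrated with a minimal sketch. The fold changes and distances are copied from Table 3.1; the type and function names are hypothetical, and the use of "greater than or equal to" at the cut-off is an assumption made here because the tabulated values are rounded (FAM84B is listed at 1.50).

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    fold_decrease: float     # control / deleted, as listed in Table 3.1
    distance_to_tss_bp: int  # distance from enhancer to TSS, as listed


# Values copied from Table 3.1 (HCT116 cells).
E7_GENES = [
    Gene("FAM84B", 1.50, 843_000),
    Gene("CCAT1", 1.78, 182_000),
    Gene("CASC8", 1.55, 81_000),
    Gene("MYC", 2.31, 335_000),
    Gene("PVT1", 1.59, 395_000),
]
E24_GENES = [Gene("RHPN2", 1.44, 47_400)]


def nearby_downregulated(genes: List[Gene],
                         min_fold: float = 1.5,
                         window_bp: int = 1_000_000) -> List[str]:
    """Return names of genes meeting the fold cut-off inside the window.

    Assumption: >= is used because the table values are rounded.
    """
    return [g.name for g in genes
            if g.fold_decrease >= min_fold
            and g.distance_to_tss_bp <= window_bp]


if __name__ == "__main__":
    print("E7:", nearby_downregulated(E7_GENES))
    print("E24:", nearby_downregulated(E24_GENES))
```

Under these settings RHPN2 is not returned for E24, which matches the note that it falls just short of the cut-off and is listed only because no other nearby gene was downregulated.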
3.3.2 The size or presence of an H3K27Ac peak does not correlate with enhancer
activity
As an approach to understanding the genome-wide changes in gene expression seen upon
deletion of E7 and E24, we also deleted a strong H3K27Ac peak on chromosome 18 (termed
18qE for “enhancer on chromosome 18q”). The 18qE region has a more robust H3K27Ac peak in
HCT116 cells than does E7 (Figure 3.1A) and is covered by H3K27Ac in many cell types (see the
layered H3K27Ac track in Figure 3.1B). Similar to E7 (Figure 3.5), 18qE is bound by TCF7L2, a
site-specific transcription factor that has been implicated as the downstream regulator of the WNT
pathway that drives the development of colon cancer (Bright-Thomas and Hargest 2003, Clevers 2004,
Clevers 2006, Segditsas and Tomlinson 2006, Gaddis, Gerrard et al. 2015) and
by CTCF, a multi-functional protein that is thought to be involved in gene regulation and
chromosomal structure (Splinter, Heath et al. 2006, Holwerda and de Laat 2013, Ong and Corces 2014,
Narendra, Rocha et al. 2015, Vietri Rudan, Barrington et al. 2015). However, we
note that the binding of TCF7L2 and CTCF is not as strong at 18qE as at E7. For comparison, we
also deleted another region on chromosome 18 (termed 18qNE for “no enhancer on chromosome
18q”) that does not have H3K27Ac or TCF7L2 binding but is bound by CTCF. Interestingly, we
found that deletion of the large 18qE enhancer had little effect on gene expression, causing
changes similar to those seen when the 18qNE region (which completely lacks H3K27Ac) was
deleted (Table 3.1).
Figure 3.5 H3K27Ac, TCF7L2, and CTCF binding profiles of E7, E24, 18qE, and 18qNE.
Shown are the H3K27Ac, TCF7L2, and CTCF binding patterns at (A) the E7 enhancer, (B) the E24
enhancer, (C) 18qE, an intergenic H3K27Ac peak on chromosome 18, and (D) 18qNE, an intergenic
region on chromosome 18 that lacks the H3K27Ac mark. The Y-axis indicates the peak height and the
X-axis indicates the genomic location; the boxes indicate the region deleted by the CRISPR/Cas9
method.
The differential effects on the transcriptome seen when E7 vs. 18qE was deleted from
HCT116 cells suggest that the presence of a robust H3K27Ac peak might not reflect enhancer
activity, but that other, as yet unknown, characteristics of the region may be critical.
Interestingly, although many cell types do show a strong H3K27Ac peak in the E7 region (see
Figure 3.1B), there is no H3K27Ac peak in HEK293 cells (Figure 3.6A). However, there is a very
strong CTCF binding site at E7 in HEK293 cells. To compare the effects of deleting a genomic
region that is marked by H3K27Ac in one cell type but not in another, we deleted E7 in HEK293
cells. We found that this region functioned as an enhancer in HEK293, causing changes in
hundreds of genes, including MYC, which showed a robust 5-fold decrease in expression (Figure
3.6B and Table 3.1); a list of all downregulated genes in HEK293 E7-deleted cells is found in
Supplementary data file 3.1 and the chromosomal location of the top 25 downregulated genes is
shown in Supplementary figure 3.5. Thus, the presence of an H3K27Ac signal is not required
for enhancer activity.
Figure 3.6 Gene expression and H3K27Ac changes upon deletion of E7 in HEK293 cells.
(A) Shown are the H3K27Ac, CTCF, and TCF7L2 binding patterns at E7 in HEK293 cells. The Y-
axis indicates the peak height and the X-axis indicates the genomic location; the boxes indicate the
deleted region. (B) The Y-axis shows log2 fold changes in gene expression in the E7-deleted cells
throughout a +/- 5 Mb region from E7, with genes showing decreased expression upon enhancer
deletion having negative numbers and genes showing increased expression upon enhancer
deletion having positive numbers; only those genes significantly downregulated in the deleted cells
are indicated in red below the H3K4me3 track.
3.3.3 Characterization of the genome-wide changes in gene expression upon
enhancer deletion
As noted above, in addition to changes in expression of nearby genes, we found that
deletion of the E7 and E24 enhancer regions caused changes in expression of hundreds of other
genes located throughout the genome. To alleviate concerns that these changes were due to
specific methods of differential gene expression analysis and/or to artifacts due to the
CRISPR/Cas9 technology, we performed several different follow-up experiments. First, we
compared the effects on the transcriptome upon deletion of E7 vs. E24 in HCT116 cells. Although
deletion of each enhancer causes hundreds of genes to change in a statistically significant,
reproducible manner, the patterns of gene expression changes were unique to each enhancer.
For example, we found that of the hundreds of downregulated genes, only 9 genes were
commonly downregulated more than 2-fold in the E7- vs E24-deleted cells (Supplementary data
file 3.1). In addition, as described in detail in the methods section, all differential gene expression
analysis was performed using a set of control clones that were selected in the same way as were
the clones of enhancer-deleted cells (i.e. sorting via FACS for single colonies after transfection
with Cas9-GFP and the guide RNA vector). Also, as described above, we show that deletion of
other genomic regions does not lead to genome-wide changes in gene expression; the 18qE and
18qNE clones have gene expression profiles essentially identical to those of the control clones,
indicating that the robust changes in gene expression observed when E7 and E24 were deleted
were not general effects due to the CRISPR/Cas9 technology. In addition, we note that we used
a different set of guide RNAs to delete E7 in HCT116 vs. HEK293 cells. In both cases, we
observed widespread changes in gene expression and identified MYC as a target gene. The use
of two different sets of guide RNAs alleviates concerns that the effects on gene expression are
due to specific off-target effects of the guide RNAs targeting E7.
As another approach to characterizing the effects of deletion of specific enhancers, we
performed a genome-wide analysis of H3K27Ac in the E7- and E24-deleted cells. We analyzed
the H3K27Ac patterns within +/- 100 Kb of upregulated and downregulated genes in the E7- and
E24-deleted cells. We found that many of the downregulated genes in E7-deleted cells
(Supplementary figure 3.7A) and in E24-deleted cells (Supplementary figure 3.7C) had
decreased H3K27Ac peaks nearby. In contrast, many of the upregulated genes in E7-deleted
cells (Supplementary figure 3.7B) and E24-deleted cells (Supplementary figure 3.7D) had
increased H3K27Ac peaks nearby; see Supplementary data file 3.2 for the differential H3K27Ac
peak analysis. For example, in E7-deleted cells there was a large downregulation of DPEP1 and
ANXA10 mRNA, with concomitant decrease in the H3K27Ac marks near those genes
(Supplementary figure 3.7A). In contrast, the H3K27Ac pattern was increased near upregulated
genes such as FGF9 and VSNL1 (Supplementary figure 3.7B). Similarly, in E24-deleted cells
there was a large downregulation of ARHGAP29 and NR2F1, with concomitant decreases in the
H3K27Ac marks (Supplementary figure 3.7C), whereas WNT11 and APOBEC3F showed
increases in RNA and increases in nearby H3K27Ac marks (Supplementary figure 3.7D). The
concomitant changes in expression and H3K27Ac patterns support the conclusion that the genes
identified as deregulated were not identified simply due to the method of RNA analysis.
3.3.4 Cell growth is affected by deletion of enhancer E7
Deletion of E7 in both HCT116 cells and in HEK293 cells caused decreased expression of
MYC, a well-characterized oncogene (Diolaiti, McFerrin et al. 2014, Gabay, Li et al. 2014,
McKeown and Bradner 2014). Therefore, it is possible that deletion of the E7 enhancer affected
cell proliferation via downregulation of MYC protein. To test this hypothesis, we performed cell
proliferation and colony forming assays in control and E7-deleted HCT116 and HEK293 cells.
HCT116 control and E7-deleted cells were plated at two different densities and grown for 72 or 96
hours. Cell proliferation (Figure 3.7A) and colony formation (Figure 3.7B) were reduced in E7-
deleted cells, as compared to control cells. As shown in Supplementary figure 3.8, E7-deleted
cells also have morphological changes and increased contact inhibition. The HEK293 cells in
which the enhancer was deleted showed drastic changes in cell morphology, suggesting that the
ability to form large colonies was affected upon enhancer deletion (Figure 3.7C and
Supplementary figure 3.8). Thus, deletion of the region containing rs6983267 caused downregulation
of MYC, large changes in the transcriptome, and changes in cell proliferation in both
HCT116 and HEK293 cells, despite the fact that the epigenomic profile of E7 is quite different in
the two cell types.
Figure 3.7 Proliferation is affected by deletion of enhancer 7.
(A) Shown are the relative cell numbers 72 hr after plating two different control clones or the E7-deleted
clone at 1 x 10^4 cells (top panel) or 96 hr after plating two different control clones or the E7-deleted
clone at 2.5 x 10^3 cells (bottom panel). The Y-axis indicates relative cell numbers from an
average of 4 replicate wells, as measured by absorbance of products of the WST-1 assay. Student’s t-test
was used for significance and * indicates a p-value less than 0.005. (B) Shown are colony forming
assays for control and E7-deleted HCT116 cells. 1,500 cells were seeded per well and after two weeks
colonies were stained with crystal violet. At least 3 replicates were performed; additional replicates are
shown in Supplementary figure 3.10. (C) Shown are colony forming assays for control and E7-deleted
HEK293 cells. 6,500 cells were seeded per well and after two weeks colonies were stained with crystal
violet. At least 3 replicates were performed; additional replicates are shown in Supplementary figure
3.10.
3.3.5 4C analysis of E7 and E24 in HCT116
Recent studies have begun to use looping assays to predict target genes of enhancers
(Yao, Berman et al. 2015).
Using the E7 enhancer as a bait, we performed 4C analyses and
observed extensive chromosomal interactions within a 100 Kb region near E7 (Figure 3.8A; see
Supplementary figure 3.9 for genomic coordinates of baits and primer sequences). However, the
MYC promoter is more than 330 Kb from the E7 enhancer and few interactions were detected at
this location. Others have suggested that using a promoter as bait may provide more meaningful
4C results (van de Werken, Landan et al. 2012).
For example, others have shown that when
more than one alternative enhancer around HS-26 is engaged with the α1 gene promoter
(possibly in a mutually exclusive fashion), the resulting window statistics are higher when
assaying from the promoter toward the enhancer region than when assaying from any particular
enhancer in the group of enhancers toward the promoter. Because E7 is located near many
very large enhancers, it is possible that it may be easier to detect promoter-to-enhancer region
interactions than interactions between a specific enhancer within that region and the MYC
promoter. Thus, we also used the MYC promoter as bait in a 4C assay. It should be noted that
the plotted contact intensities represent relative 4C-seq read counts for 5 Kb genomic windows
and reciprocal peak intensities observed when using the promoter vs. the enhancer as a bait do
not need to be identical. We found strong interactions around the MYC promoter as well as multiple
interactions in a 500 Kb region spanning from the MYC promoter to just past E7; this large region
is bounded by large H3K27Ac peaks on either side. Interestingly, the major peak detected by 4C
using the MYC promoter as a bait is the same major peak detected using E7 as bait, suggesting
that perhaps E7 and MYC are brought together via common interaction with an H3K27Ac-marked
region 100 Kb from E7. We also used E24 as a bait in 4C assays and found multiple interactions
of E24 with an ~100 Kb region on either side of the enhancer, encompassing the RHPN2 gene
(Figure 3.8B).
Figure 3.8 4C analysis of enhancer E7 and E24.
Shown are the 4Cseq profiles of HCT116 cells for (A) a bait near the promoter region of MYC and a bait
at the E7 region and (B) a bait at the E24 region. Grey dots are normalized contact intensities within a
1 Mb genomic range and the black line is the median of normalized intensities for 5 Kb running windows
with the 20th and 80th percentiles indicated as a grey band. The medians of normalized contact intensities
of different sizes of windows from 2 Kb to 50 Kb were color-coded ranging from 1 (red) to 0.001 (grey).
Significantly downregulated genes are shown above the 4C profiles. Also shown are the H3K27Ac,
CTCF, and TCF7L2 tracks for HCT116 cells, along with the ENCODE TF binding track.
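The running-window summary described in this legend (a median per window with a 20th to 80th percentile band) can be sketched as follows. This is a minimal illustration using only the Python standard library and is not the 4C-seq analysis pipeline used for the figure; the function name, the step size, and the decision to skip windows with fewer than two data points are assumptions made for the sketch.

```python
from statistics import median, quantiles


def window_summary(positions, intensities, window_bp=5_000, step_bp=1_000):
    """Summarize contact intensities in running genomic windows.

    Returns a list of (window_start, median, 20th percentile, 80th percentile).
    Windows holding fewer than two data points are skipped (assumption).
    """
    points = sorted(zip(positions, intensities))
    if not points:
        return []
    summaries = []
    start = points[0][0]
    last = points[-1][0]
    while start <= last:
        values = [v for p, v in points if start <= p < start + window_bp]
        if len(values) >= 2:
            # quantiles(..., n=10) returns the 9 decile cut points;
            # index 1 is the 20th percentile and index 7 is the 80th.
            deciles = quantiles(values, n=10)
            summaries.append((start, median(values), deciles[1], deciles[7]))
        start += step_bp
    return summaries


if __name__ == "__main__":
    # Hypothetical positions (bp) and normalized intensities, not study data.
    demo_positions = [0, 1_000, 2_000, 3_000, 4_000]
    demo_intensities = [1.0, 2.0, 3.0, 4.0, 5.0]
    for row in window_summary(demo_positions, demo_intensities, step_bp=5_000):
        print(row)
```

Repeating the call with different `window_bp` values (for example 2 Kb to 50 Kb) would give the multi-scale medians that the color-coded panel summarizes.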
3.4 Discussion
Employing CRISPR/Cas9 nucleases, we deleted 2 CRC risk-associated enhancers (E7 and
E24) from the genome of HCT116 colon cancer cells and analyzed effects on the transcriptome
and epigenome. In each case, we found widespread changes in expression and H3K27Ac
patterning in the enhancer-deleted cells. However, deletion of a robust H3K27Ac peak not
associated with CRC did not cause similar changes in gene expression. Interestingly, deletion of
the E7 region encompassing the CRC-associated risk SNP in HEK293 cells, at which there is no
detectable H3K27Ac peak, also caused widespread changes in gene expression. Finally, we
show that deletion of the E7 region affects gene expression and cell proliferation in both HCT116
and HEK293 cells, likely due to downregulation of the MYC oncoprotein.
Large-scale GWAS efforts have identified sets of SNPs associated with a particular disease,
such as colon cancer. If a SNP falls within an exon and changes the coding potential of that
gene, investigators have suggested that the identified gene is likely to be associated with
increased risk for that disease. However, most GWAS-identified SNPs do not fall within exons and
thus the mechanism by which these SNPs might affect disease has not been clear (Coetzee, Jia
et al. 2010, Freedman, Monteiro et al. 2011). Recent studies have shown that many of the GWAS
index SNPs and SNPs in high LD with the GWAS index SNPs fall within regulatory elements such
as promoters and enhancers. In the case of the SNPs falling close to transcription start sites, it
has been assumed that they cause changes in expression of the gene regulated by that promoter.
In the case of SNPs falling within enhancer regions (as defined by the H3K27Ac mark), it is more
difficult to understand how the SNP affects gene expression because enhancers can work at a
great distance, in either orientation, and often skip over the nearest TSS to regulate genes farther
away (Li, Ruan et al. 2012, Sanyal, Lajoie et al. 2012). We reasoned that precise deletion of an
enhancer should identify genes regulated (directly and indirectly) by that enhancer, with nearby
downregulated genes being possible direct target genes. As a test of this experimental approach,
we deleted 2 CRC risk-associated enhancers from the human genome. In each case, we found
that deletion of the CRC risk-associated enhancer caused reproducible changes in expression of
at least one gene within +/- 1 Mb of the enhancer. It is important to note that a promoter can be
regulated by several different enhancers; if a gene is regulated equally by two enhancers, then a
50% drop in RNA levels is what would be expected upon deletion of one of the enhancers. Most
of the nearby genes that were downregulated upon enhancer deletion were expressed at 20-65%
of their expression level in control cells, suggesting that the enhancers contributed to expression
of these genes, but were not the sole regulatory elements controlling the activity of the target
promoters.
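The relationship between a fold decrease and the fraction of expression that remains can be made explicit with a short worked example. The fold changes are copied from Table 3.1; the function name is hypothetical, and the conversion simply assumes that a fold decrease of F corresponds to 100/F percent of the control level.

```python
def percent_of_control(fold_decrease: float) -> float:
    """Convert a fold decrease (control / deleted) to percent of control."""
    return 100.0 / fold_decrease


# Fold decreases copied from Table 3.1.
FOLD_DECREASES = {
    "FAM84B (E7, HCT116)": 1.50,
    "CCAT1 (E7, HCT116)": 1.78,
    "CASC8 (E7, HCT116)": 1.55,
    "MYC (E7, HCT116)": 2.31,
    "PVT1 (E7, HCT116)": 1.59,
    "MYC (E7, HEK293)": 5.10,
    "RHPN2 (E24, HCT116)": 1.44,
}

if __name__ == "__main__":
    # A gene driven equally by two enhancers would show a 2-fold decrease,
    # i.e. 50% of control, after deletion of one of them.
    print("two equal enhancers:", percent_of_control(2.0))
    for gene, fold in FOLD_DECREASES.items():
        print(f"{gene}: {percent_of_control(fold):.1f}% of control")
```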
We note that deletion of the CRC risk-associated enhancers not only affected nearby
genes but also affected the regulation of hundreds of other genes, some on the same
chromosome but very far from the enhancer and still more located on other chromosomes. There
have not yet been sufficient studies to know how far a gene can be from an enhancer (or even if it
can be on a different chromosome) and still be a “direct” target. One approach that is used to
identify direct targets is to use 3-dimensional looping assays. A recent study showed that 57% of
enhancer-promoter loops span more than 100 Kb (Jin, Li et al. 2013). Accordingly, our 4C studies
identified interactions that spread between the MYC promoter and a 500 Kb region encompassing
E7, suggesting that the MYC promoter is a direct target of E7 even though it is quite far away.
These results are supported by a previous study in which the E7 enhancer region was shown to
interact with the MYC and PVT1 promoters in colon cancer cells (Pomerantz, Ahmadiyeh et al.
2009, Wright, Brown et al. 2010). Studies in other cell types have also provided evidence that E7
interacts with the MYC gene (Ahmadiyeh, Pomerantz et al. 2010, Dryden, Broome et al. 2014, Du,
Yuan et al. 2015, Jager, Migliorini et al. 2015). Interestingly, the major peak we detected by 4C
using the MYC promoter as a bait is the same major peak detected using E7 as a bait, suggesting
that perhaps E7 and MYC are brought together via common interaction with an H3K27Ac-marked
region 70 Kb from E7. We also observed a reduction in the lncRNA CCAT1 in the E7-deleted
HCT116 cells. Others have suggested that reduction of CCAT1 can reduce long-range
interactions between the MYC promoter and its enhancers (Xiang, Yin et al. 2014). Thus, it is
possible that direct downregulation of CCAT1 (which is located 182 Kb from E7) caused an
indirect downregulation of MYC. However, looping assays should be interpreted with caution, as
different assays give different results (Williamson, Berlivet et al. 2014). Also, it is not yet known if
all loops represent functional enhancer-promoter interactions or if some loops may play other
roles, such as structural components of chromatin (Yao, Berman et al. 2015).
It is likely that most of the altered (indirect) gene regulation in the E7-deleted cells is
initiated by reduction of MYC RNA, followed by changes in the MYC regulatory network.
Deregulated MYC expression is a hallmark feature of many types of cancer and
frequently predicts a poor patient outcome (Conacci-Sorrell, McFerrin et al. 2014, Gabay, Li et al.
2014). Accordingly, there is a strong push to develop therapeutic inhibitors of MYC function
(McKeown and Bradner 2014). Perhaps inactivation of cell type-specific enhancers that regulate
MYC expression is an alternative to developing chemotherapeutic agents that target MYC itself.
Interestingly, we have shown changes in cell proliferation upon deletion of E7 in both HCT116 and
in HEK293 cells. Similarly, Sur et al. deleted a region homologous to E7 in the mouse genome
and observed slight changes in Myc expression in the colon, but a significant change in tumor
formation in the colons of Apc^min mice (Sur, Hallikas et al. 2012). In addition, we have observed
that E7-deleted cells are very difficult to trypsinize from the cell culture plates (data not shown).
Others have previously reported that MYC is involved in suppressing cell adhesion molecules
(Coller, Grandori et al. 2000) and we find that many integrin molecules are significantly
upregulated at least 2-fold in E7-deleted cells (e.g. ITGA3, ITGA4, ITGA5, and ITGA6 in HCT116
and ITGA3, ITGA7, and ITGAV in HEK293; see Supplementary data file 3.1). These molecules
are not upregulated in the other deleted cells. We note that both MYC and PVT1 RNAs are
downregulated in HCT116 cells after deletion of E7. Tseng et al. (Tseng, Moriarity et al. 2014)
showed that PVT1 RNA levels and MYC protein expression correlate in primary human tumors
and, in HCT116 cells, loss of PVT1 results in reduced MYC protein, reduced proliferation, and
impaired colony formation in soft agar. Therefore, the reduction in PVT1 RNA that occurs upon
deletion of E7 may cause an even larger effect on MYC protein levels, and more effects on the
MYC regulatory network, than would be expected by the changes in MYC RNA alone.
We do not yet have a complete understanding of the regulatory networks affected in E24-
deleted cells. Our 4C experiments detected interactions within a 100 Kb region around E24 that
did encompass the RHPN2 TSS, which is 50 Kb from E24, and thus RHPN2 may be a direct
target of E24. RHPN2 is a Rho GTPase-binding protein that may be involved in the organization of the actin
cytoskeleton. As noted above, high expression of RHPN2 has been associated with various
cancers (Danussi, Akavia et al. 2013) and it is possible that altered regulation of the cytoskeleton
by reduction of RHPN2 could initiate changes in regulatory networks, causing the widespread
changes in gene expression that we observed in E24-deleted cells. We also note that UBA2, also
known as SAE2, is the next nearest downregulated gene. UBA2 is involved in sumoylation of
proteins and a recent study has shown that cells overexpressing MYC have proliferation defects
when UBA2/SAE2 levels are reduced (Kessler, Kahle et al. 2012). The authors suggest that
UBA2/SAE2 enables cells to tolerate the MYC hyperactivation that occurs in tumors and propose
that altering distinct subprograms of MYC-mediated transcription by SAE2 inactivation could be
exploited as a therapeutic strategy in MYC-driven cancers. Thus, reduction of UBA2 could lead to
changes in expression of many genes located throughout the genome. Although our 4C results
using E24 as bait did not identify interactions near UBA2, this could be because the distance to
UBA2 is well outside of the distance thought to be detectable by 4C (van de Werken, Landan et al.
2012). Most 4C-seq analyses have good reproducibility within 500 Kb from the bait because
high coverage near bait regions results in statistically robust contact profiles compared to far away
regions on the same chromosome or regions on other chromosomes where reproducibility is
lower because of less coverage (Raviram, Rocha et al. 2014). Therefore, due to limitations of the
4C-seq assay, we cannot rule out direct targets that are even further away than UBA2 and/or on
other chromosomes; we note that many transcription factors are up- or downregulated in the E24-
deleted cells and any of these could be responsible for altering a regulatory network.
We have used the CRISPR/Cas9 technology to delete 2 enhancers that harbor SNPs
associated with an increased risk for CRC. We show that loss of these enhancers causes
changes in expression of hundreds of genes. However, we also show that the size of the
H3K27Ac peak does not necessarily correlate with the effect of a distal element on gene
regulation. Our results are supported by a recent study in which 2000 predicted enhancers were
analyzed for activity in a reporter assay. The investigators found that enhancer fragments having
“weaker” H3K27Ac signals can drive expression as well as, if not better than, enhancer fragments
having “stronger” H3K27ac signals (Kwasnieski, Fiore et al. 2014). Clearly, further experiments
are required before the activity of distal regulatory elements can be accurately predicted.
3.5 Methods
3.5.1 Cell culture
The human cell lines (control and enhancer-deleted versions) HCT116 (ATCC #CCL-247) and
HEK293 (ATCC #CRL-1573) were grown at 37°C in 5% CO2 in Dulbecco’s Modified Eagle
Medium (DMEM) with 10% fetal bovine serum and 1% penicillin and streptomycin.
3.5.2 CRISPR/Cas9-mediated genome editing
The guide RNAs (gRNAs) flanking the target enhancer regions were designed using a web-based
tool (http://crispr.mit.edu), avoiding repeat regions in the hg19 genome. After identification of a
potential guide RNA, the 16-17 nucleotide region including the PAM sequence (NGG) was
BLASTed against the hg19 genome to confirm that it was unique in the genome; in addition, none
of the guide RNAs used in this study had a single-mismatch match elsewhere in the human genome. The target
DNA sequences of the gRNAs used in this study are listed in Supplementary figure 3.1. The 100
bp oligonucleotides containing the gRNA sequences were inserted into the gRNA Empty Vector
(Addgene, catalog#41824) according to the gRNA synthesis protocol (Mali, Yang et al. 2013).
The sequences of gRNA expression plasmids were confirmed through Sanger sequencing. To
delete an enhancer, two gRNA plasmids and a plasmid expressing Cas9-GFP (Addgene, catalog
#44719) were transiently transfected into HCT116 or HEK293 cells in a 6-well plate (with a
Cas9:gRNAs molar ratio of 1:22) using Lipofectamine 3000 (Life Technologies, catalog #3000008).
Genomic DNA was extracted from the transfected cells 48 hours post transfection using the
QIAamp DNA mini kit (Qiagen, catalog# 51306). Subsequently, PCR using primers flanking the
target enhancer regions was performed to check the deletion efficiency. Once enhancer deletion
was confirmed in a pool of transfected cells, cells with high GFP expression were identified using
fluorescence-activated cell sorting with the Aria II cell sorter (BD Biosciences). Sorted cells were
plated into individual wells of a 24-well plate and then re-plated as single cells in 10 cm dishes and
subsequently expanded for further analyses.
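
The uniqueness criterion described above (no exact or single-mismatch match elsewhere in the genome) can be illustrated with a minimal Python sketch. This is not the BLAST search used in this study; it is a brute-force scan over a toy sequence, and the function names, the toy genome, and the example target sequence are all hypothetical.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def hamming(a, b):
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def count_sites(genome, target, max_mismatches=1):
    """Count windows on either strand that match `target` within max_mismatches."""
    genome = genome.upper()
    target = target.upper()
    n = len(target)
    hits = 0
    # A set avoids double counting when the target equals its reverse complement.
    for query in {target, reverse_complement(target)}:
        for i in range(len(genome) - n + 1):
            if hamming(genome[i:i + n], query) <= max_mismatches:
                hits += 1
    return hits


# Hypothetical 16-nt target ending in an NGG PAM (TGG), embedded in a toy "genome".
target = "TCGATGCCTAGACTGG"
near_match = "TCGATTCCTAGACTGG"  # differs from the target at one position
unique_genome = "A" * 30 + target + "A" * 30
ambiguous_genome = unique_genome + near_match + "A" * 30

# A guide would be kept only if exactly one site (the intended one) is found.
is_unique = count_sites(unique_genome, target, max_mismatches=1) == 1
```

A real genome-wide search needs an indexed aligner or BLAST, as used here; the sketch only makes the acceptance rule explicit.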
3.5.3 PCR detection of cells having enhancer deletions
Clonal colonies from the 10 cm dishes were transferred and passaged into 24-well plates. When
cells were 80-90% confluent, genomic DNA was isolated from each clone using the QIAamp DNA
mini kit (Qiagen, catalog# 51306) and tested for deletion by PCR using GoTaq Green Master Mix
(Promega, catalog#M712) and primers flanking the target enhancer region. For those colonies
that showed loss of the enhancer region, a second PCR was performed using primers that should
detect the inner portion of the enhancer. In general, less than 10% of the clones tested after
sorting for high GFP levels showed biallelic deletion. For the CRC-associated enhancers,
H3K27Ac ChIP-seq data was also used to confirm loss of the enhancer signal (Supplementary
figure 3.2).
3.5.4 RNA-seq
RNA was collected from clonal cell populations using Trizol (Life Technologies, catalog #15596018).
To minimize batch effects, matched controls and deleted samples were
plated with similar confluency and harvested at the same time, RNA was extracted at the same
time, RNA libraries were prepared at the same time, and barcoded libraries were pooled and
sequenced together (see Supplementary figure 3.3). All RNA libraries were prepared using the
Illumina TruSeqV2 Sample Prep Kit (catalog #15596-026), starting with 1 µg total RNA. ERCC
spike-in mix (Thermo Fisher, catalog # 4456740) was added when the libraries were made so that
quality assessment could be performed on the RNA-seq data, which allowed the removal of
outliers caused by technical variations. Libraries were sequenced on a Nextseq500 with 75 bp
single reads. Raw reads were trimmed using the Quality Score method (minimal quality score 20,
minimal read length 25, trimming from both ends) and mapped to hg19 (Ensembl 72) using
Tophat2 (Trapnell, Pachter et al. 2009) installed in the Partek Flow version 3 program (Partek
Inc., St Louis, MO, USA). A matrix of raw fragment counts for each gene was generated from
the alignment files using the HTSeq Python package [53] with the GENCODE V19 annotation, and these
counts were used for differential gene expression analysis using the edgeR program [54]. For the
group of large datasets (Set A; see Supplementary figure 3.3), where there are 12 controls vs. 3
samples, genes with at least 1 count per million (CPM) in at least 6 samples were kept for
differential gene expression analysis. For groups of small datasets (Sets B and C;
Supplementary figure 3.3), consisting of 2 controls vs. 2 samples, genes with at least 1 CPM in
at least 2 samples were kept for differential gene expression analysis. In dataset A, after removing
genes with low counts, upper-quartile normalization was used for the GLM approach in edgeR,
which takes into account other covariates (e.g. date of RNA extraction; see Supplementary
figure 3.4) found in the PCA plots. For datasets B and C, filtered gene counts were normalized
using the Trimmed Mean of M-values (TMM) method for the negative binomial model in edgeR.
Among the differentially expressed genes with an FDR <0.05, lowly expressed genes were filtered
out if the average CPM or average RPKM value (quantified using Partek software) was less than
2 in the controls for downregulated genes or less than 2 in the deleted samples for upregulated genes.
Finally, genes with greater than 1.5-fold change were defined as differentially expressed genes
(i.e. the numbers shown in Table 3.1).
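
The expression filter and fold-change cut-off described in this section can be sketched in Python as follows. The analysis itself was performed with edgeR; this sketch only restates the filtering logic, computes CPM from raw column sums (without the normalization factors edgeR may apply), and uses hypothetical gene names, counts, and function names.

```python
import math


def counts_per_million(counts):
    """Convert raw counts (gene -> list of per-sample counts) to CPM values."""
    n_samples = len(next(iter(counts.values())))
    # Library size of each sample is the sum of its counts over all genes.
    lib_sizes = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        gene: [1e6 * row[j] / lib_sizes[j] for j in range(n_samples)]
        for gene, row in counts.items()
    }


def filter_by_cpm(counts, min_cpm=1.0, min_samples=2):
    """Keep genes with CPM >= min_cpm in at least min_samples samples."""
    cpm = counts_per_million(counts)
    return {
        gene: counts[gene]
        for gene, values in cpm.items()
        if sum(v >= min_cpm for v in values) >= min_samples
    }


def exceeds_fold_change(log2_fc, min_fold=1.5):
    """True if the absolute fold change is greater than min_fold."""
    return abs(log2_fc) > math.log2(min_fold)


# Hypothetical toy data: 2 controls followed by 2 deleted samples,
# each with a library size of exactly one million fragments.
toy_counts = {
    "geneA": [999995, 999995, 1000000, 999997],
    "geneB": [5, 5, 0, 0],
    "geneC": [0, 0, 0, 3],
}

# Small-dataset rule (Sets B and C): at least 1 CPM in at least 2 samples.
kept = filter_by_cpm(toy_counts, min_cpm=1.0, min_samples=2)
```

For Set A the same function would be called with min_samples=6, matching the threshold stated above.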
3.5.5 Cell proliferation assays
Control and E7-deleted HCT116 cells were counted by sorting single cells using a flow
cytometer and then plated into a 96-well plate at a density of 1 x 10^4 or 2.5 x 10^3 cells per well.
After 72 hours or 96 hours, cells were rinsed with 1X PBS and then 100 µl of PBS mixed with 10 µl
of WST-1 reagent (Roche, catalog # 05015944001) was added to each well. Cell proliferation was
measured using a microplate reader (HIDEX Chameleon V4.43) after incubation for 15 minutes at
37°C. The absorbance of formazan, the product of the WST-1 assay, was measured at a wavelength
of 450 nm.
3.5.6 Colony forming assays
HCT116 cells and HEK293 cells were counted by sorting single cells using a flow cytometer
and were seeded in 6-well plates at a density of 1.5 x 10^3 cells per well and 6.5 x 10^3 cells per
well, respectively. After a 2 week incubation, cells were fixed with 100% methanol for 10 minutes,
stained with 0.5% crystal violet for 30 minutes, and colonies were assessed after rinsing the
plates with water.
3.5.7 ChIP-seq and analysis
For Figure 3.1, H3K27Ac ChIP-seq data from HCT116 cells (ENCODE accession number
ENCSR000EUT) was downloaded from the UCSC genome browser and analyzed using the Sole-
search ChIP-seq peak calling program (Blahnik, Dou et al. 2010, Blahnik, Dou et al. 2011) using
the following parameters (Permutation:5; Fragment:250; AlphaValue: 0.00010 = 1.0E-4; FDR:
0.00010 = 1.0E-4; PeakMergeDistance:0; HistoneBlurLength:1200). For Figure 3.3 and Figure
3.4, H3K27Ac ChIP-seq samples for control and enhancer-deleted HCT116 cells were prepared
using an H3K27Ac antibody (Active Motif, catalog #39133, lot #21311004), as previously described
(O'Geen, Echipare et al. 2011), with minor modifications (complete protocol available upon
request). These ChIP-seq libraries were sequenced on a HiSeq2000 with 50 bp single end reads.
For Supplementary figure 3.6, the TCF7L2 and CTCF ChIP-seq samples in control and
E7-deleted HCT116 cells were prepared using a TCF7L2 antibody (CST, catalog #C48H11, lot #2)
or a CTCF antibody (Active Motif, catalog #61311, lot #23913002). The ChIP-seq libraries were
sequenced on a NextSeq500 with 75 bp single end reads. All ChIP-seq FASTQ files were
mapped to hg19 using BWA (default parameters). To identify decreased or increased H3K27Ac
peaks in enhancer-deleted cells (Supplementary figure 3.7), two biological replicates for each
group were used for differential peak analysis. The peak-calling prioritization pipeline (PePr) was
used to identify significant differential binding sites of H3K27Ac caused by enhancer deletions
(Zhang, Lin et al. 2014); default parameters for histone marks were used for analysis and
differential peaks having a p-value less than 1e-5 are reported in Supplementary figure 3.7.
3.5.8 4C-seq and analysis
4C-seq was performed as previously described (Simonis, Kooren et al. 2007, van de Werken, de
Vree et al. 2012) with the following modifications. Ten million cells were harvested and
cross-linked with 1% formaldehyde for 10 minutes at room temperature and cells were lysed to
obtain nuclei. Digestion of nuclei started with the addition of 200 units of DpnII (NEB, catalog # R0543L)
for 4 hours, followed by overnight incubation with another 200 units of DpnII, and then a third
aliquot of 200 units of DpnII was added the next day for a 4 hour incubation. DpnII was then
inactivated by heat, followed by ligation using 50 units of T4 ligase (Roche, catalog #
10799009001) overnight at 16˚C in a cold room. After confirming high efficiency of ligation, cross-
links were reversed at 65˚C overnight. Phenol-chloroform was used to isolate circularized ligation
products, followed by digestion with 50 units of Csp6I (Thermo, catalog # ER0211) at 37˚C
overnight to create trimmed circular ligation products. 200 ng of DNA was used for inverse PCR
using the Expand Long Template PCR system (Roche, catalog # 11681842001) to create a 4C
library for each bait. Bait and primer information is in Supplementary figure 3.9. PCR products
were purified using the Roche High Pure PCR Products Purification Kit and further purified
using the Qiagen purification kit. Samples were multiplexed and sequenced on a HiSeq2500
(Illumina) with a read length of 65 bp. 4C FASTQ files were trimmed by 10 bp from the 3’ end and were
analyzed using the 4Cseqpipe program (van de Werken, Landan et al. 2012). 4Cseqpipe was
used to extract sequences from a FASTQ file based on provided bait-specific primer sequences.
Primers were removed from the reads and 31 bp were mapped to the hg19 genome using the
4Cseqpipe mapper. The numbers of mapped reads are listed in Supplementary figure 3.3.
Normalized contact profiles were generated for each bait using the following parameters:
genomic range for normalized trend: 1 Mb; median resolution: 5 Kb window size.
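
The read-processing steps described above (3’ trimming, selection of reads that begin with the bait-specific primer, primer removal, and retention of 31 bp for mapping) can be illustrated with a short Python sketch. This is not the 4Cseqpipe implementation; the function name, the 24-nt primer, and the example read are hypothetical, and exact prefix matching is an assumption made for simplicity.

```python
def process_4c_read(read, bait_primer, trim_3prime=10, keep=31):
    """Return the fragment to map, or None if the read is unusable.

    Steps: trim `trim_3prime` bases from the 3' end, require that the read
    starts with the bait primer, remove the primer, keep the next `keep` bases.
    """
    trimmed = read[:-trim_3prime] if trim_3prime > 0 else read
    if not trimmed.startswith(bait_primer):
        return None  # read does not originate from this bait
    remainder = trimmed[len(bait_primer):]
    if len(remainder) < keep:
        return None  # too short after trimming and primer removal
    return remainder[:keep]


# Hypothetical 65-nt read: 24-nt primer + 31-nt captured fragment + 10-nt 3' tail.
example_primer = "ACGT" * 6
example_fragment = "GATCTTGACCGTAAGCTTGGCATCCGTAAGC"
example_read = example_primer + example_fragment + "T" * 10

mappable = process_4c_read(example_read, example_primer)
```

With a 65 bp read, 10 bp of trimming, and a primer of about 24 nt, 31 bp remain, which is consistent with the mapped length stated above; the actual primer lengths are given in Supplementary figure 3.9.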
3.6 Supplementary figures for chapter3
Supplementary figure 3.1 Guide RNA sequences.
Shown are the size of the deleted regions, the genomic locations of the enhancers, and the target DNA for
the guide RNAs. Two different sets of guide RNAs were used to delete E7 in HCT116 vs. HEK293 cells.
Supplementary figure 3.2 Confirmation of enhancer deletions.
The regions corresponding to the deleted sequences for each enhancer are indicated as bars; the
different colors refer to different pairs of guide RNAs used. There are 3 copies of E7 in HCT116 cells
and thus there are 3 bars shown for each clone; the other enhancers are in diploid regions of the
genome. (Panel A) deletion of all E7 alleles was confirmed by PCR (data not shown) and the loss of the
enhancer was confirmed through ChIP-seq; (Panel B) biallelic deletion was confirmed by PCR (data not
shown) for E24 and the loss of the enhancer was confirmed through ChIP-seq; (Panel C and D)
deletion was confirmed for 18qE and 18qNE by PCR using primers flanking each enhancer and internal
to each enhancer; (Panel E) deletion was confirmed for E7 in HEK293 by PCR.
Supplementary figure 3.3 List of datasets.
Shown is information relevant to the RNA-seq, ChIP-seq, and 4C-seq experiments performed for this study.
1. Study design of RNA-seq

A. HCT116 cells, sequenced on a Nextseq500 on 7/2/15. Replicate names were given by RNA extraction date.
Sample type     Clone #  Replicate  Sample name  USC ID      # of mapped reads
Control         1        A          C1_A         USC_RNA_56  16,835,185
Control         1        B          C1_B         USC_RNA_57  16,136,616
Control         1        C          C1_C         USC_RNA_58  12,973,689
Control         2        A          C2_A         USC_RNA_59  17,849,912
Control         2        B          C2_B         USC_RNA_60  16,536,346
Control         2        C          C2_C         USC_RNA_61  16,479,259
Control         3        A          C3_A         USC_RNA_62  16,880,730
Control         3        B          C3_B         USC_RNA_63  14,510,306
Control         3        C          C3_C         USC_RNA_64  12,879,816
Control         4        A          C4_A         USC_RNA_65  13,674,375
Control         4        B          C4_B         USC_RNA_66  14,736,300
Control         4        C          C4_C         USC_RNA_67  15,768,373
E7 Deletion     1        A          E7_A         USC_RNA_76  18,330,669
E7 Deletion     1        B          E7_B         USC_RNA_77  17,464,123
E7 Deletion     1        C          E7_C         USC_RNA_78  13,928,003
E24 Deletion    1        A          E24_A        USC_RNA_87  15,024,023
E24 Deletion    1        B          E24_B        USC_RNA_88  18,766,076
E24 Deletion    1        C          E24_C        USC_RNA_89  17,901,475
18qNE Deletion  1        A          NE1_A        USC_RNA_68  14,051,545
18qNE Deletion  2        B          NE2_B        USC_RNA_70  17,159,364
18qNE Deletion  3        A          NE3_A        USC_RNA_72  16,559,866

B. HCT116 cells, sequenced on a Nextseq500 on 5/5/15.
Sample type     Clone #  Sample name  USC ID      # of mapped reads
Control         1        C1           USC_RNA_48  24,680,074
Control         2        C2           USC_RNA_49  22,926,469
18qE Deletion   1        18qE_1       USC_RNA_52  24,837,265
18qE Deletion   2        18qE_2       USC_RNA_53  23,033,168

C. HEK 293T cells, sequenced on a Nextseq500 on 1/27/15.
Sample type     Clone #  Sample name    USC ID      # of mapped reads
Control         1        HEK_293T_C1    USC_RNA_40  28,851,361
Control         2        HEK_293T_C2    USC_RNA_41  26,748,220
E7 Deletion     1        HEK_293T_E7_1  USC_RNA_42  35,279,209
E7 Deletion     2        HEK_293T_E7_2  USC_RNA_43  31,865,398

2. Study design of ChIP-seq (HCT116 cells)
Date of sequencing  Sequencer   Antibody used  Sample type   Biological replicate #  Sample name        USC ID  # of mapped reads
9/9/14              Hiseq2000   H3K27Ac        Control       1                       H3K27Ac_Control_1  USC646  29,859,232
12/5/14             Hiseq2000   H3K27Ac        Control       2                       H3K27Ac_Control_2  USC659  27,282,365
9/9/14              Hiseq2000   H3K27Ac        E7 Deletion   1                       H3K27Ac_E7Del_1    USC652  26,250,803
12/5/14             Hiseq2000   H3K27Ac        E7 Deletion   2                       H3K27Ac_E7Del_2    USC660  31,175,944
9/9/14              Hiseq2000   H3K27Ac        E24 Deletion  1                       H3K27Ac_E24Del_1   USC648  26,389,565
12/5/14             Hiseq2000   H3K27Ac        E24 Deletion  2                       H3K27Ac_E24Del_2   USC664  28,661,150
5/13/15             Nextseq500  TCF7L2         Control       1                       TCF7L2_Control_1   USC731  50,016,717
5/13/15             Nextseq500  TCF7L2         E7 Deletion   1                       TCF7L2_E7Del_1     USC749  36,804,109
5/13/15             Nextseq500  TCF7L2         E7 Deletion   2                       TCF7L2_E7Del_2     USC750  38,985,416
5/13/15             Nextseq500  CTCF           Control       1                       CTCF_Control1      USC726  28,327,919
5/13/15             Nextseq500  CTCF           Control       2                       CTCF_Control2      USC727  33,291,505
5/13/15             Nextseq500  CTCF           E7 Deletion   1                       CTCF_E7Del_1       USC728  30,044,462

3. Study design of 4C-seq (HCT116 control cells, sequenced on a HiSeq2500 on 7/31/15)
Bait name  Biological replicate #  Sample name     USC ID     # of mapped reads
MYC        1                       4C_Control_MYC  USC_4C_2
E7         1                       4C_Control_E7   USC_4C_18
E24        1                       4C_Control_E24  USC_4c_14  416337
Supplementary figure 3.4 PCA plots of RNA-seq data.
Shown are principal components PC1 (x-axis) and PC2 (y-axis) for RNA-seq datasets for (A) control vs
E7-deleted HCT116 cells, (B) E24-deleted HCT116 cells, (C) 18qNE-deleted HCT116 cells, (D) 18qE-
deleted HCT116 cells, and (E) E7-deleted HEK293 cells.
Supplementary figure 3.5 Circos plots for top downregulated genes.
The circos plots show the chromosomal location of the top 25 downregulated genes (> 2 fold) for each
enhancer deletion (green for protein-coding genes and orange for non-coding genes, with the
thickness of the line indicating the extent of the fold-change). For 18qE and 18qNE only 3
downregulated genes were identified, and for 18qNE these downregulated genes were decreased less
than 2 fold.
Supplementary figure 3.6 Comparison of TCF7L2 and CTCF binding patterns upon E7
deletion in HCT116.
(A) Shown are the H3K27Ac, TCF7L2, and CTCF peaks within a 3 Mb region near E7, with
downregulated genes indicated in red. Expanded views of two super-enhancer regions are shown
below; deletion of E7 reduced H3K27Ac and TCF7L2 binding in these regions. (B) For
comparison, shown are the H3K27Ac, TCF7L2, and CTCF peaks near an upregulated gene
indicated in blue; deletion of E7 increased H3K27Ac and TCF7L2 binding in this region.
Supplementary figure 3.7
H3K27Ac and RNA-seq profiles for genes deregulated in E7- and E24-deleted cells. Shown is an
analysis of genes downregulated upon E7 deletion (A), upregulated upon E7 deletion (B),
downregulated upon E24 deletion (C) and upregulated upon E24 deletion (D). For each set of
analyses panel 1 shows which genes downregulated or upregulated greater than 1.5-fold have
decreased or increased peaks within +/- 100Kb from their TSS; panels 2 and 3 show the
H3K27Ac tracks and RNA-seq data in the control and deleted cells for specific genes.
Supplementary figure 3.8 Proliferation assays.
(A) Replicates of colony forming assays in control and E7-deleted HCT116 cells. (B) Photographs
of HCT116 vs E7-deleted cells showing that cells gained a more fibroblast-like morphology
upon deletion of E7. (C) Replicates of colony forming assays in control and E7-
deleted HEK293 cells.
Supplementary figure 3.9 4C-seq information.
Shown are the bait information and primer sequences for the 4C-seq experiments.
4 Chapter4: Conclusions and future studies
4.1 Summary of main findings
In the post-GWAS era, the central challenge is to determine the functional relevance
of SNPs that are statistically associated with diseases or traits. Unlike rare mutations that disrupt
coding regions of proteins, GWAS-identified SNPs (and associated high LD SNPs or fine
mapped SNPs) are mostly found in non-coding regions of the genome, making it difficult to
understand how they influence risk of a disease. However, with the efforts of ENCODE and
REMC, it has been found that non-coding regions are decorated by regulatory elements
(promoters, enhancers, and nuclear structure-associated elements) and GWAS-identified SNPs
(index SNPs), as well as their high LD SNPs, are enriched in these regulatory elements. I believe
that characterizing the function of an entire regulatory element should be performed prior to
investigating the functions of a specific SNP, and therefore I chose to study the function of
enhancers that harbor CRC risk-associated SNPs.
For my thesis work, I used genomic and epigenomic information to prioritize CRC risk-
associated SNPs, prioritized enhancers that harbor the SNPs, and then performed experimental
analyses of these enhancers using genome editing (CRISPR/Cas9) and a looping assay (4C) to
identify target genes of the enhancers. CRC risk-associated enhancers were chosen by
overlapping refined SNPs (identified using LD calculations of colon cancer index SNPs) with the
active enhancer mark H3K27Ac in HCT116 colon cancer cells. Among the 28 prioritized
enhancers, I deleted 2 CRC risk-associated enhancers, termed E7 and E24, that are in the
intronic regions of the non-coding gene CASC8 and the coding gene RHPN2, respectively. Each
deletion resulted in hundreds of gene expression changes, including the genes in which the
enhancers were located. However, the effects on the transcriptome were not correlated with the
size of the enhancers. Deletion of E7, which corresponds to a small H3K27Ac peak, caused
large-scale changes in gene expression, resulting in decreased expression of many nearby
genes. Deletion of E24, which corresponds to a large H3K27Ac peak, caused changes in
hundreds of genes, only one of which (RHPN2) was within 1 Mb. As a comparison, I showed that
deletion of a robust H3K27Ac peak not associated with CRC had minimal effects on the
transcriptome. Interestingly, although there is no H3K27Ac peak in HEK293 cells in the E7
region, deletion of this region in HEK293 cells decreased expression of several of the same
genes that were downregulated in HCT116 cells, including the MYC oncogene. Accordingly,
deletion of E7 caused changes in cell proliferation in HCT116 and HEK293 cells. Using a 4C
looping assay, I confirmed that E24 has a physical interaction with the RHPN2 promoter,
supporting the conclusion that RHPN2 is a direct target gene of E24. I also showed many
interactions between the E7 region and the MYC gene, consistent with the outcomes of deletion
of E7, which affects expression of many nearby genes. Importantly, I showed that deletion of E7
affects cell proliferation, as would be expected if it regulated MYC. In conclusion, by
identifying genes regulated by CRC-associated enhancers harboring SNPs, I have developed a
general approach to connect risk loci to putative risk genes.
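
The enhancer prioritization step summarized above (overlapping LD-refined SNPs with H3K27Ac peaks) amounts to an interval-overlap query, which can be sketched in Python as follows. This is an illustration only, not the pipeline used in this work; the function name, SNP identifiers, and coordinates are hypothetical, and peaks are treated as half-open intervals [start, end).

```python
def snps_in_peaks(snps, peaks):
    """Map each peak to the IDs of the SNPs that fall inside it.

    snps  : list of (snp_id, chrom, position)
    peaks : list of (chrom, start, end), half-open intervals [start, end)
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        for peak in peaks:
            peak_chrom, start, end = peak
            if chrom == peak_chrom and start <= pos < end:
                hits.setdefault(peak, []).append(snp_id)
    return hits


# Hypothetical peaks and SNPs; the coordinates are not taken from the data.
toy_peaks = [("chr8", 100, 200), ("chr19", 500, 900)]
toy_snps = [
    ("snp1", "chr8", 150),   # inside the chr8 peak
    ("snp2", "chr8", 250),   # outside any peak
    ("snp3", "chr19", 500),  # on the inclusive start of the chr19 peak
    ("snp4", "chr19", 900),  # on the exclusive end, so not counted
]

candidate_enhancers = snps_in_peaks(toy_snps, toy_peaks)
```

Peaks that receive at least one SNP would be the candidate risk-associated enhancers; for genome-scale data, a sorted index or a dedicated interval tool would replace the nested loop.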
4.2 Future directions
4.2.1 Epigenome editing of risk-associated enhancers
I found that obtaining complete deletion of the risk enhancers was quite difficult, especially when
the enhancer was located within an amplified region in a cancer cell line. Therefore, instead of
deleting enhancers (which requires Cas9 to cut simultaneously at two sites, followed by laborious
clonal selection for completely deleted alleles) I propose that a better method may be to activate
or repress the enhancers, which should also lead to expression changes of their target genes. By
using epigenetic toggle switches consisting of dCas9-LSD1 or dCas9-P300core along with a single
gRNA, an enhancer could be repressed or activated, respectively (see details in Chapter 1). This
method would be especially important for studying risk-associated enhancers that are highly
active in normal cells but repressed in cancer cell lines. It is difficult to study normal-specific
enhancers because of the lack of easily engineered normal cell lines. When studying these
enhancers in cancer cells, expression of the target genes is already low, making it difficult to
observe significant changes in target gene expression upon enhancer deletion. For example,
CRC risk enhancer E21 (Figure 4.1) is repressed in HCT116 cells as compared to normal
sigmoid colon tissue, and a recent study using luciferase assays showed that the risk haplotype
of this enhancer has lower enhancer activity than does the non-risk haplotype. However, using
dCas9-P300core along with an E21-specific gRNA (or several gRNAs that tile through the
enhancer region), the target genes of E21 could be activated. Using 4C, differences in
interactions between E21 and the target genes could be evaluated before and after epigenome
editing.
Figure 4.1 Repressed CRC risk enhancer E21
4.2.2 Targeting combinatorial enhancers
Given that several enhancers could affect the same target gene, I propose that several
risk-associated enhancers in the same risk locus should be deleted simultaneously. This should
be feasible either by deleting one very large region (up to 1 Mb has been successfully deleted,
ref) or by deleting several smaller regions corresponding to each enhancer in the region.
Alternatively, the enhancers could all be repressed using epigenomic toggle switches. The
effects on transcription upon deletion or inactivation of multiple enhancers could then be
compared to the results obtained when deleting each risk-associated enhancer individually.
4.2.3 Targeting SNPs using the HDR mechanism
For the risk-associated enhancers that show functional relevance to CRC, for example, E7 which
regulates the MYC oncogene and whose deletion results in reduced cell proliferation and less
colony formation, I propose that a comparison of the risk and non-risk alleles should be made.
The SNPs that are located in the E7 enhancer can be changed using a single gRNA, Cas9, and an
ssODN. After changing each SNP (or changing several SNPs together that are in the same
haplotype), I could compare the effects of the different alleles on target gene expression. Also,
by performing ChIP-seq for the TFs that bind to motifs containing the SNPs, I could investigate
the functional relevance of the SNPs on differential TF binding.
4.2.4 Disease-related functional assays
The putative target gene of E7 is MYC, a well-known oncogene. In contrast, the putative target
gene of E24 is RHPN2, which has not been previously associated with colon cancer. However, a
recent study showed that ectopic expression of RHPN2 promotes an invasive phenotype in
neural stem cells and astrocytes and that RHPN2 amplification correlates with a highly aggressive
phenotype that leads to the worst clinical outcomes in patients with glioblastoma (Danussi, Akavia
et al. 2013). Therefore, I propose to inactivate RHPN2 using epigenetic toggle switches directed
to the RHPN2 promoter or to knock out the RHPN2 gene using CRISPR/Cas9 to test whether
RHPN2 is a candidate oncogene in colon cancer. Then, cell proliferation assays could be
performed in control vs. RHPN2-depleted conditions. However, a mouse model may be a
better system for testing the phenotypic effects of risk-associated enhancers or risk genes. Even
though I have shown that deletion of E7 led to reduced proliferation of colon cancer cells,
differences in cell proliferation rates were hard to detect when cells were grown at a high density.
This supports the argument that cancer cell lines are not the best systems in which to observe phenotypic
changes because of their genomic instabilities and abnormalities in karyotype. Sur et al. showed
that the incidence of polyp formation was markedly reduced when an otherwise normal mouse lacking the
region homologous to the E7 region that I studied in human cells was crossed to a mouse that spontaneously
develops tumors in the intestine and colon. This result supports the idea that a mouse model
may be a better system to see more accurate phenotypic consequences of SNPs, regulatory
elements that have SNPs, or putative causal target genes. Therefore, I propose that E24 be
tested in the mouse colon cancer model system.
Appendix A: Publications as a contributing author
Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the
genome by association with GATA3
Frietze S, Wang R, Yao L, Tak YG, Ye Z, Gaddis M, Witt H, Farnham PJ, Jin VX. Cell type-
specific binding patterns reveal that TCF7L2 can be tethered to the genome by
association with GATA3. Genome Biol. 2012;13(9):R52. PMCID: PMC3491396
I contributed to this publication by performing ChIP-seq experiments revealing an association
between TCF7L2 and GATA3 as part of a project focusing on cell type-specific binding of
TCF7L2.
RESEARCH Open Access
Cell type-specific binding patterns reveal that
TCF7L2 can be tethered to the genome by
association with GATA3
Seth Frietze1†, Rui Wang2,3†, Lijing Yao1, Yu Gyoung Tak1, Zhenqing Ye3, Malaina Gaddis1, Heather Witt1,
Peggy J Farnham1* and Victor X Jin3*
Abstract
Background: The TCF7L2 transcription factor is linked to a variety of human diseases, including type 2 diabetes
and cancer. One mechanism by which TCF7L2 could influence expression of genes involved in diverse diseases is
by binding to distinct regulatory regions in different tissues. To test this hypothesis, we performed ChIP-seq for
TCF7L2 in six human cell lines.
Results: We identified 116,000 non-redundant TCF7L2 binding sites, with only 1,864 sites common to the six cell
lines. Using ChIP-seq, we showed that many genomic regions that are marked by both H3K4me1 and H3K27Ac are
also bound by TCF7L2, suggesting that TCF7L2 plays a critical role in enhancer activity. Bioinformatic analysis of the
cell type-specific TCF7L2 binding sites revealed enrichment for multiple transcription factors, including HNF4alpha
and FOXA2 motifs in HepG2 cells and the GATA3 motif in MCF7 cells. ChIP-seq analysis revealed that TCF7L2 co-
localizes with HNF4alpha and FOXA2 in HepG2 cells and with GATA3 in MCF7 cells. Interestingly, in MCF7 cells the
TCF7L2 motif is enriched in most TCF7L2 sites but is not enriched in the sites bound by both GATA3 and TCF7L2.
This analysis suggested that GATA3 might tether TCF7L2 to the genome at these sites. To test this hypothesis, we
depleted GATA3 in MCF7 cells and showed that TCF7L2 binding was lost at a subset of sites. RNA-seq analysis
suggested that TCF7L2 represses transcription when tethered to the genome via GATA3.
Conclusions: Our studies demonstrate a novel relationship between GATA3 and TCF7L2, and reveal important
insights into TCF7L2-mediated gene regulation.
Background
The TCF7L2 (transcription factor 7-like 2) gene encodes a
high mobility group box-containing transcription factor
that is highly up-regulated in several types of human cancer,
such as colon, liver, breast, and pancreatic cancer
[1-4]. Although TCF7L2 is sometimes called TCF4, there
is a helix-loop-helix transcription factor that has been
given the official gene name of TCF4 and it is important,
therefore, to be aware of possible confusion in the literature.
Numerous studies have shown that TCF7L2 is an
important component of the WNT pathway [3,5,6].
TCF7L2 mediates the downstream effects of WNT signaling
via its interaction with CTNNB1 (beta-catenin) and it
can function as an activator or a repressor, depending on
the availability of CTNNB1 in the nucleus. For example,
TCF7L2 can associate with the members of the Groucho
repressor family in the absence of CTNNB1. The WNT
pathway is often constitutively activated in cancers, leading
to increased levels of nuclear CTNNB1 and up-regulation
of TCF7L2 target genes [3]. In addition to being linked to
neoplastic transformation, variants in TCF7L2 are thought
to be the most critical risk factors for type 2 diabetes
[7-10]. However, the functional role of TCF7L2 in these
diseases remains unclear. One hypothesis is that TCF7L2
regulates its downstream target genes in a tissue-specific
manner, with a different cohort of target genes being
turned on or off by TCF7L2 in each cell type. One way to
* Correspondence: pfarnham@usc.edu; Victor.Jin@osumc.edu
† Contributed equally
1 Department of Biochemistry and Molecular Biology, Norris Comprehensive
Cancer Center, University of Southern California, Los Angeles, CA 90089, USA
3 Department of Biomedical Informatics, The Ohio State University, Columbus,
OH 43210, USA
Full list of author information is available at the end of the article
Frietze et al. Genome Biology 2012, 13:R52
http://genomebiology.com/content/13/9/R52
© 2012 Frietze et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.
Appendix B: Supplementary data files in DVD
Supplementary data file 2.1 Exon, TSS, and enhancer correlated SNPs. (xlsx file)
Individual worksheets are provided for all correlated SNPs identified in TSS regions, all
correlated SNPs found in exons, the subset of nonsynonymous correlated SNPs found in exons,
and all correlated SNPs found in the set of 28 enhancers. In all cases, the tag SNP, the LD of the
correlated SNP, and genomic information for each SNP are provided.
Supplementary data file 2.2. H3K27Ac ChIP-seq peaks. (xlsx file) The sets of H3K27Ac
peaks called on merged replicates for HCT116 and sigmoid colon datasets are provided in
individual worksheets.
Supplementary data file 2.3. Distal regulatory regions correlated with CRC tag
SNPs. (xlsx file) The tag SNP and the correlated SNPs for 28 distal, robust H3K27Ac regions
are indicated; each region is classified according to its presence or absence in HCT116 (H),
sigmoid colon (S), or SW480 (SW) cells. The 3 nearest protein-coding genes and 3 nearest non-
coding RNAs were identified using the GENCODE V15 gene annotation. The red text indicates
that the same gene was identified by SNPs within the TSS (r2 >0.5) and the blue color means
that the same gene was identified by a SNP within a coding exon (r2 >0.1); see Table 2.2. The
strike-out indicates that the RNA is present at less than 0.5 FPKM in HCT116 and sigmoid colon
cells; bold indicates that the transcript is higher than 2 FPKM.
Supplementary data file 2.4. Motif analysis of all correlated SNPs in enhancers.
(xlsx file) All SNPs having an r2> 0.1 with the 25 CRC tag SNPs were analyzed using motifs from
Factorbook (http://www.factorbook.org). For those SNPs that impacted a critical position of a
motif, it was determined if the change was predicted to be an improvement or a disruption. A
more restrictive list including only the subset of SNP-affected motifs within the robust enhancer
regions (using an r2> 0.5 cut-off) is shown in Supplementary data file 2.5.
Supplementary data file 2.5. Motif analysis of enhancer SNPs having r2 >0.5 with a
tag SNP. (xlsx file) Shown are the motifs in the risk-associated enhancers that are affected by
the correlated SNPs; “-” means that the alternative allele of the correlated SNP should disrupt TF
binding and “+” means that the alternative allele of the correlated SNP should improve TF
binding.
Supplementary data file 2.6. Expression of transcription factors with affected
motifs. (xlsx file) Shown are the names of the transcription factors (TFs), the expression levels in
HCT116 and sigmoid colon cells, the differential expression ratio (calculated both ways), and the
gene id and genomic location of each TF.
Supplementary data file 2.7. eQTL analysis results. (xlsx file) Shown are the results for
the SNP-gene pairs using 60 SNPs (6 tag SNPs, 18 SNPs within risk enhancers, and 45 SNPs
within TSS regions) present on the GW SNP6 array and the expression of genes identified by
exon or TSS SNPs or by differential expression analysis (see Table 2.2 and Table 2.5).
Supplementary data file 2.8. RNA analysis of cells having a deletion of enhancer 7.
(xlsx file) Shown are the genes whose expression decreased in the cells in which enhancer 7
was deleted, as determined using Illumina HumanHT-12 v4 Expression BeadChip arrays.
Supplementary data file 2.9. TCGA sample IDs. (xlsx file) The IDs for HM450K DNA
methylation, RNA-seq, SNP arrays, and copy number variation analyses for 228 TCGA samples
are provided, as well as the IDs for 254 samples used in normal-tumor gene expression
analyses.
Supplementary data file 2.10. Summary of putative CRC risk-associated genes.
(xlsx file) The names of the 80 genes identified as potentially involved in an increased risk for
colon cancer are indicated, along with the location of the SNP(s) that identified the gene. In some
cases, the gene was identified by more than one type of SNP. In the Enhancer column, “nearest
3” and “nearest 20” refer to the position of the gene relative to the location of the enhancer. In
the Expressed column, “Yes” means that the gene is expressed in colon cells and “TCGA” means
that the gene showed differential expression in a cohort of the TCGA normal vs. colon tumor
samples.
Supplementary data file 3.1. Genes deregulated upon enhancer deletion.
(xlsx file) The different worksheets show the genes upregulated or downregulated more than 1.5-fold upon
enhancer deletion, as compared to expression levels in control clones (selected after transfection
with Cas9-GFP plus gRNA empty vector).
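The 1.5-fold criterion used to sort genes into the worksheets can be sketched in Python as follows. This is only an illustration: the function name, argument names, and return labels are hypothetical, and the sketch assumes that a single expression value per gene is available for the deleted and the control cells.

```python
def classify_gene(deleted: float, control: float, cutoff: float = 1.5) -> str:
    """Classify a gene by its fold change in enhancer-deleted vs. control cells."""
    if deleted < 0 or control < 0:
        raise ValueError("expression values must be non-negative")
    # Multiplying instead of dividing avoids a division by zero when
    # one of the two values is 0.
    if deleted > cutoff * control:
        return "upregulated"
    if control > cutoff * deleted:
        return "downregulated"
    return "unchanged"
```

A gene at 3.0 in the deleted cells and 1.0 in the control cells would be labeled upregulated; a gene at exactly 1.5-fold would be left unchanged because the comparison is strict.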
Supplementary data file 3.2. H3K27Ac and RNA-seq profiles for genes deregulated
in E7- and E24-deleted cells.
(xlsx file) Shown is an analysis of genes downregulated upon E7 deletion (A), upregulated upon
E7 deletion (B), downregulated upon E24 deletion (C), and upregulated upon E24 deletion (D).
For each set of analyses, panel 1 shows which genes downregulated or upregulated more than
1.5-fold have decreased or increased peaks within ±100 kb of their TSS; panels 2 and 3
show the H3K27Ac tracks and RNA-seq data in the control and deleted cells for specific genes.
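The ±100 kb window test in panel 1 amounts to an interval overlap check, sketched below in Python. The function and parameter names are hypothetical. The sketch assumes that a peak counts when any part of it overlaps the window, which the legend does not state.

```python
def peak_near_tss(peak_start: int, peak_end: int, tss: int,
                  window: int = 100_000) -> bool:
    """Return True if the peak overlaps [tss - window, tss + window]."""
    if peak_start > peak_end:
        raise ValueError("peak_start must not exceed peak_end")
    # ASSUMPTION: partial overlap with the window is enough to count.
    return peak_start <= tss + window and peak_end >= tss - window
```

With a TSS at position 1,000,000, a peak spanning 1,050,000 to 1,051,000 falls inside the window, while a peak starting at 1,200,000 does not.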
148
References
(2012).
"Comprehensive
molecular
characterization
of
human
colon
and
rectal
cancer."
Nature
487(7407):
330-‐‑337.
(2012).
"An
integrated
encyclopedia
of
DNA
elements
in
the
human
genome."
Nature
489(7414):
57-‐‑74.
(2012).
"Method
of
the
Year
2011."
Nature
Methods
9:
1.
(2015).
"Human
genomics.
The
Genotype-‐‑Tissue
Expression
(GTEx)
pilot
analysis:
multitissue
gene
regulation
in
humans."
Science
348(6235):
648-‐‑660.
Abba,
M.,
N.
Patil,
K.
Rasheed,
L.
D.
Nelson,
G.
Mudduluru,
J.
H.
Leupold
and
H.
Allgayer
(2013).
"Unraveling
the
Role
of
FOXQ1
in
Colorectal
Cancer
Metastasis."
Mol
Cancer
Res
11(9):
1017-‐‑1028.
Abecasis,
G.
R.,
A.
Auton,
L.
D.
Brooks,
M.
A.
DePristo,
R.
M.
Durbin,
R.
E.
Handsaker,
H.
M.
Kang,
G.
T.
Marth
and
G.
A.
McVean
(2012).
"An
integrated
map
of
genetic
variation
from
1,092
human
genomes."
Nature
491(7422):
56-‐‑65.
Adzhubei,
I.
A.,
S.
Schmidt,
L.
Peshkin,
V.
E.
Ramensky,
A.
Gerasimova,
P.
Bork,
A.
S.
Kondrashov
and
S.
R.
Sunyaev
(2010).
"A
method
and
server
for
predicting
damaging
missense
mutations."
Nat
Methods
7(4):
248-‐‑249.
Ahmadiyeh,
N.,
M.
M.
Pomerantz,
C.
Grisanzio,
P.
Herman,
L.
Jia,
V.
Almendro,
H.
H.
He,
M.
Brown,
X.
S.
Liu,
M.
Davis,
J.
L.
Caswell,
C.
A.
Beckwith,
A.
Hills,
L.
Macconaill,
G.
A.
Coetzee,
M.
M.
Regan
and
M.
L.
Freedman
(2010).
"8q24
prostate,
breast,
and
colon
cancer
risk
loci
show
tissue-‐‑specific
long-‐‑range
interaction
with
MYC."
Proc
Natl
Acad
Sci
U
S
A
107(21):
9742-‐‑9746.
Akhtar-‐‑Zaidi,
B.,
R.
Cowper-‐‑Sal-‐‑lari,
O.
Corradin,
A.
Saiakhova,
C.
F.
Bartels,
D.
Balasubramanian,
L.
Myeroff,
J.
Lutterbaugh,
A.
Jarrar,
M.
F.
Kalady,
J.
Willis,
J.
H.
Moore,
P.
J.
Tesar,
T.
Laframboise,
S.
Markowitz,
M.
Lupien
and
P.
C.
Scacheri
(2012).
"Epigenomic
enhancer
profiling
defines
a
signature
of
colon
cancer."
Science
336(6082):
736-‐‑739.
Albert,
F.
W.
and
L.
Kruglyak
(2015).
"The
role
of
regulatory
variation
in
complex
traits
and
disease."
Nat
Rev
Genet
16(4):
197-‐‑212.
Amin
Al
Olama,
A.,
T.
Dadaev,
D.
J.
Hazelett,
Q.
Li,
D.
Leongamornlert,
E.
J.
Saunders,
S.
Stephens,
C.
Cieza-‐‑Borrella,
I.
Whitmore,
S.
Benlloch
Garcia,
G.
G.
Giles,
M.
C.
Southey,
L.
Fitzgerald,
H.
Gronberg,
F.
Wiklund,
M.
Aly,
B.
E.
Henderson,
F.
Schumacher,
C.
A.
Haiman,
J.
Schleutker,
T.
Wahlfors,
T.
L.
Tammela,
B.
G.
Nordestgaard,
T.
J.
Key,
R.
C.
Travis,
D.
E.
Neal,
J.
L.
Donovan,
F.
C.
Hamdy,
P.
Pharoah,
N.
Pashayan,
K.
T.
Khaw,
J.
L.
Stanford,
S.
N.
Thibodeau,
S.
K.
McDonnell,
D.
J.
Schaid,
C.
Maier,
W.
Vogel,
M.
Luedeke,
K.
Herkommer,
A.
S.
Kibel,
C.
Cybulski,
D.
Wokolorczyk,
W.
Kluzniak,
L.
Cannon-‐‑Albright,
H.
Brenner,
K.
Butterbach,
V.
Arndt,
J.
Y.
Park,
T.
Sellers,
H.
Y.
Lin,
C.
Slavov,
R.
Kaneva,
V.
Mitev,
J.
Batra,
J.
A.
Clements,
A.
Spurdle,
M.
R.
Teixeira,
P.
Paulo,
S.
Maia,
H.
Pandha,
A.
Michael,
A.
Kierzek,
K.
Govindasami,
M.
Guy,
A.
Lophatonanon,
K.
Muir,
A.
Vinuela,
A.
A.
Brown,
M.
Freedman,
D.
V.
Conti,
D.
149
Easton,
G.
A.
Coetzee,
R.
A.
Eeles
and
Z.
Kote-‐‑Jarai
(2015).
"Multiple
novel
prostate
cancer
susceptibility
signals
identified
by
fine-‐‑mapping
of
known
risk
loci
among
Europeans."
Hum
Mol
Genet.
Andersson,
R.,
C.
Gebhard,
I.
Miguel-‐‑Escalada,
I.
Hoof,
J.
Bornholdt,
M.
Boyd,
Y.
Chen,
X.
Zhao,
C.
Schmidl,
T.
Suzuki,
E.
Ntini,
E.
Arner,
E.
Valen,
K.
Li,
L.
Schwarzfischer,
D.
Glatz,
J.
Raithel,
B.
Lilje,
N.
Rapin,
F.
O.
Bagger,
M.
Jorgensen,
P.
R.
Andersen,
N.
Bertin,
O.
Rackham,
A.
M.
Burroughs,
J.
K.
Baillie,
Y.
Ishizu,
Y.
Shimizu,
E.
Furuhata,
S.
Maeda,
Y.
Negishi,
C.
J.
Mungall,
T.
F.
Meehan,
T.
Lassmann,
M.
Itoh,
H.
Kawaji,
N.
Kondo,
J.
Kawai,
A.
Lennartsson,
C.
O.
Daub,
P.
Heutink,
D.
A.
Hume,
T.
H.
Jensen,
H.
Suzuki,
Y.
Hayashizaki,
F.
Muller,
A.
R.
Forrest,
P.
Carninci,
M.
Rehli
and
A.
Sandelin
(2014).
"An
atlas
of
active
enhancers
across
human
cell
types
and
tissues."
Nature
507(7493):
455-‐‑461.
Andersson,
R.,
A.
Sandelin
and
C.
G.
Danko
(2015).
"A
unified
architecture
of
transcriptional
regulatory
elements."
Trends
Genet
31(8):
426-‐‑433.
Aran,
D.
and
A.
Hellman
(2014).
"Unmasking
risk
loci:
DNA
methylation
illuminates
the
biology
of
cancer
predisposition:
analyzing
DNA
methylation
of
transcriptional
enhancers
reveals
missed
regulatory
links
between
cancer
risk
loci
and
genes."
Bioessays
36(2):
184-‐‑
190.
Aumailley,
M.
(2013).
"The
laminin
family."
Cell
Adh
Migr
7(1):
48-‐‑55.
Banovich,
N.
E.,
X.
Lan,
G.
McVicker,
B.
van
de
Geijn,
J.
F.
Degner,
J.
D.
Blischak,
J.
Roux,
J.
K.
Pritchard
and
Y.
Gilad
(2014).
"Methylation
QTLs
are
associated
with
coordinated
changes
in
transcription
factor
binding,
histone
modifications,
and
gene
expression
levels."
PLoS
Genet
10(9):
e1004663.
Barrangou,
R.
(2014).
"RNA
events.
Cas9
targeting
and
the
CRISPR
revolution."
Science
344(6185):
707-‐‑708.
Battle,
A.,
Z.
Khan,
S.
H.
Wang,
A.
Mitrano,
M.
J.
Ford,
J.
K.
Pritchard
and
Y.
Gilad
(2015).
"Genomic
variation.
Impact
of
regulatory
variation
from
RNA
to
protein."
Science
347(6222):
664-‐‑667.
Bauer,
D.
E.,
S.
C.
Kamran,
S.
Lessard,
J.
Xu,
Y.
Fujiwara,
C.
Lin,
Z.
Shao,
M.
C.
Canver,
E.
C.
Smith,
L.
Pinello,
P.
J.
Sabo,
J.
Vierstra,
R.
A.
Voit,
G.
C.
Yuan,
M.
H.
Porteus,
J.
A.
Stamatoyannopoulos,
G.
Lettre
and
S.
H.
Orkin
(2013).
"An
erythroid
enhancer
of
BCL11A
subject
to
genetic
variation
determines
fetal
hemoglobin
level."
Science
342(6155):
253-‐‑
257.
Bell,
J.
T.,
A.
A.
Pai,
J.
K.
Pickrell,
D.
J.
Gaffney,
R.
Pique-‐‑Regi,
J.
F.
Degner,
Y.
Gilad
and
J.
K.
Pritchard
(2011).
"DNA
methylation
patterns
associate
with
genetic
and
gene
expression
variation
in
HapMap
cell
lines."
Genome
Biol
12(1):
R10.
Bernstein,
B.
E.,
J.
A.
Stamatoyannopoulos,
J.
F.
Costello,
B.
Ren,
A.
Milosavljevic,
A.
Meissner,
M.
Kellis,
M.
A.
Marra,
A.
L.
Beaudet,
J.
R.
Ecker,
P.
J.
Farnham,
M.
Hirst,
E.
S.
Lander,
T.
S.
Mikkelsen
and
J.
A.
Thomson
(2010).
"The
NIH
Roadmap
Epigenomics
Mapping
Consortium."
Nat
Biotechnol
28(10):
1045-‐‑1048.
Birney,
E.,
J.
A.
Stamatoyannopoulos,
A.
Dutta,
R.
Guigo,
T.
R.
Gingeras,
E.
H.
Margulies,
Z.
Weng,
M.
Snyder,
E.
T.
Dermitzakis,
R.
E.
Thurman,
M.
S.
Kuehn,
C.
M.
Taylor,
S.
Neph,
C.
M.
Koch,
S.
Asthana,
A.
Malhotra,
I.
Adzhubei,
J.
A.
Greenbaum,
R.
M.
Andrews,
P.
Flicek,
P.
J.
Boyle,
H.
Cao,
N.
P.
Carter,
G.
K.
Clelland,
S.
Davis,
N.
Day,
P.
Dhami,
S.
C.
Dillon,
M.
O.
Dorschner,
H.
Fiegler,
P.
G.
Giresi,
J.
Goldy,
M.
Hawrylycz,
A.
Haydock,
R.
Humbert,
K.
D.
James,
B.
E.
Johnson,
E.
M.
Johnson,
T.
T.
Frum,
E.
R.
Rosenzweig,
N.
Karnani,
K.
Lee,
G.
C.
150
Lefebvre,
P.
A.
Navas,
F.
Neri,
S.
C.
Parker,
P.
J.
Sabo,
R.
Sandstrom,
A.
Shafer,
D.
Vetrie,
M.
Weaver,
S.
Wilcox,
M.
Yu,
F.
S.
Collins,
J.
Dekker,
J.
D.
Lieb,
T.
D.
Tullius,
G.
E.
Crawford,
S.
Sunyaev,
W.
S.
Noble,
I.
Dunham,
F.
Denoeud,
A.
Reymond,
P.
Kapranov,
J.
Rozowsky,
D.
Zheng,
R.
Castelo,
A.
Frankish,
J.
Harrow,
S.
Ghosh,
A.
Sandelin,
I.
L.
Hofacker,
R.
Baertsch,
D.
Keefe,
S.
Dike,
J.
Cheng,
H.
A.
Hirsch,
E.
A.
Sekinger,
J.
Lagarde,
J.
F.
Abril,
A.
Shahab,
C.
Flamm,
C.
Fried,
J.
Hackermuller,
J.
Hertel,
M.
Lindemeyer,
K.
Missal,
A.
Tanzer,
S.
Washietl,
J.
Korbel,
O.
Emanuelsson,
J.
S.
Pedersen,
N.
Holroyd,
R.
Taylor,
D.
Swarbreck,
N.
Matthews,
M.
C.
Dickson,
D.
J.
Thomas,
M.
T.
Weirauch,
J.
Gilbert,
J.
Drenkow,
I.
Bell,
X.
Zhao,
K.
G.
Srinivasan,
W.
K.
Sung,
H.
S.
Ooi,
K.
P.
Chiu,
S.
Foissac,
T.
Alioto,
M.
Brent,
L.
Pachter,
M.
L.
Tress,
A.
Valencia,
S.
W.
Choo,
C.
Y.
Choo,
C.
Ucla,
C.
Manzano,
C.
Wyss,
E.
Cheung,
T.
G.
Clark,
J.
B.
Brown,
M.
Ganesh,
S.
Patel,
H.
Tammana,
J.
Chrast,
C.
N.
Henrichsen,
C.
Kai,
J.
Kawai,
U.
Nagalakshmi,
J.
Wu,
Z.
Lian,
J.
Lian,
P.
Newburger,
X.
Zhang,
P.
Bickel,
J.
S.
Mattick,
P.
Carninci,
Y.
Hayashizaki,
S.
Weissman,
T.
Hubbard,
R.
M.
Myers,
J.
Rogers,
P.
F.
Stadler,
T.
M.
Lowe,
C.
L.
Wei,
Y.
Ruan,
K.
Struhl,
M.
Gerstein,
S.
E.
Antonarakis,
Y.
Fu,
E.
D.
Green,
U.
Karaoz,
A.
Siepel,
J.
Taylor,
L.
A.
Liefer,
K.
A.
Wetterstrand,
P.
J.
Good,
E.
A.
Feingold,
M.
S.
Guyer,
G.
M.
Cooper,
G.
Asimenos,
C.
N.
Dewey,
M.
Hou,
S.
Nikolaev,
J.
I.
Montoya-‐‑Burgos,
A.
Loytynoja,
S.
Whelan,
F.
Pardi,
T.
Massingham,
H.
Huang,
N.
R.
Zhang,
I.
Holmes,
J.
C.
Mullikin,
A.
Ureta-‐‑Vidal,
B.
Paten,
M.
Seringhaus,
D.
Church,
K.
Rosenbloom,
W.
J.
Kent,
E.
A.
Stone,
S.
Batzoglou,
N.
Goldman,
R.
C.
Hardison,
D.
Haussler,
W.
Miller,
A.
Sidow,
N.
D.
Trinklein,
Z.
D.
Zhang,
L.
Barrera,
R.
Stuart,
D.
C.
King,
A.
Ameur,
S.
Enroth,
M.
C.
Bieda,
J.
Kim,
A.
A.
Bhinge,
N.
Jiang,
J.
Liu,
F.
Yao,
V.
B.
Vega,
C.
W.
Lee,
P.
Ng,
A.
Yang,
Z.
Moqtaderi,
Z.
Zhu,
X.
Xu,
S.
Squazzo,
M.
J.
Oberley,
D.
Inman,
M.
A.
Singer,
T.
A.
Richmond,
K.
J.
Munn,
A.
Rada-‐‑Iglesias,
O.
Wallerman,
J.
Komorowski,
J.
C.
Fowler,
P.
Couttet,
A.
W.
Bruce,
O.
M.
Dovey,
P.
D.
Ellis,
C.
F.
Langford,
D.
A.
Nix,
G.
Euskirchen,
S.
Hartman,
A.
E.
Urban,
P.
Kraus,
S.
Van
Calcar,
N.
Heintzman,
T.
H.
Kim,
K.
Wang,
C.
Qu,
G.
Hon,
R.
Luna,
C.
K.
Glass,
M.
G.
Rosenfeld,
S.
F.
Aldred,
S.
J.
Cooper,
A.
Halees,
J.
M.
Lin,
H.
P.
Shulha,
M.
Xu,
J.
N.
Haidar,
Y.
Yu,
V.
R.
Iyer,
R.
D.
Green,
C.
Wadelius,
P.
J.
Farnham,
B.
Ren,
R.
A.
Harte,
A.
S.
Hinrichs,
H.
Trumbower,
H.
Clawson,
J.
Hillman-‐‑Jackson,
A.
S.
Zweig,
K.
Smith,
A.
Thakkapallayil,
G.
Barber,
R.
M.
Kuhn,
D.
Karolchik,
L.
Armengol,
C.
P.
Bird,
P.
I.
de
Bakker,
A.
D.
Kern,
N.
Lopez-‐‑Bigas,
J.
D.
Martin,
B.
E.
Stranger,
A.
Woodroffe,
E.
Davydov,
A.
Dimas,
E.
Eyras,
I.
B.
Hallgrimsdottir,
J.
Huppert,
M.
C.
Zody,
G.
R.
Abecasis,
X.
Estivill,
G.
G.
Bouffard,
X.
Guan,
N.
F.
Hansen,
J.
R.
Idol,
V.
V.
Maduro,
B.
Maskeri,
J.
C.
McDowell,
M.
Park,
P.
J.
Thomas,
A.
C.
Young,
R.
W.
Blakesley,
D.
M.
Muzny,
E.
Sodergren,
D.
A.
Wheeler,
K.
C.
Worley,
H.
Jiang,
G.
M.
Weinstock,
R.
A.
Gibbs,
T.
Graves,
R.
Fulton,
E.
R.
Mardis,
R.
K.
Wilson,
M.
Clamp,
J.
Cuff,
S.
Gnerre,
D.
B.
Jaffe,
J.
L.
Chang,
K.
Lindblad-‐‑Toh,
E.
S.
Lander,
M.
Koriabine,
M.
Nefedov,
K.
Osoegawa,
Y.
Yoshinaga,
B.
Zhu
and
P.
J.
de
Jong
(2007).
"Identification
and
analysis
of
functional
elements
in
1%
of
the
human
genome
by
the
ENCODE
pilot
project."
Nature
447(7146):
799-‐‑816.
Blahnik,
K.
R.,
L.
Dou,
L.
Echipare,
S.
Iyengar,
H.
O'Geen,
E.
Sanchez,
Y.
Zhao,
M.
A.
Marra,
M.
Hirst,
J.
F.
Costello,
I.
Korf
and
P.
J.
Farnham
(2011).
"Characterization
of
the
contradictory
chromatin
signatures
at
the
3'
exons
of
zinc
finger
genes."
PLOS
One
6:
e17121.
Blahnik,
K.
R.,
L.
Dou,
H.
O'Geen,
T.
McPhillips,
X.
Xu,
A.
R.
Cao,
S.
Iyengar,
C.
M.
Nicolet,
B.
Ludaescher,
I.
Korf
and
P.
J.
Farnham
(2010).
"Sole-‐‑search:
An
integrated
analysis
program
for
peak
detection
and
functional
annotation
using
ChIP-‐‑seq
data."
Nucleic
Acids
Res
38:
e13.
151
Blattler,
A.
and
P.
J.
Farnham
(2013).
"Cross-‐‑talk
between
site-‐‑specific
transcription
factors
and
DNA
methylation
states."
J
Biol
Chem
288(48):
34287-‐‑34294.
Blattler,
A.,
L.
Yao,
Y.
Wang,
Z.
Ye,
V.
X.
Jin
and
P.
J.
Farnham
(2013).
"ZBTB33
binds
unmethylated
regions
of
the
genome
associated
with
actively
expressed
genes."
Epigenetics
Chromatin
6:
13.
Blattler,
A.,
L.
Yao,
H.
Witt,
Y.
Guo,
C.
M.
Nicolet,
B.
P.
Berman
and
P.
J.
Farnham
(2014).
"Global
loss
of
DNA
methylation
uncovers
intronic
enhancers
in
genes
showing
expression
changes."
Genome
Biol
15(9):
469.
Bond-‐‑Smith,
G.,
N.
Banga,
T.
M.
Hammond
and
C.
J.
Imber
(2012).
"Pancreatic
adenocarcinoma."
BMJ
344:
e2476.
Bonn,
S.,
R.
P.
Zinzen,
C.
Girardot,
E.
H.
Gustafson,
A.
Perez-‐‑Gonzalez,
N.
Delhomme,
Y.
Ghavi-‐‑
Helm,
B.
Wilczynski,
A.
Riddell
and
E.
E.
Furlong
(2012).
"Tissue-‐‑specific
analysis
of
chromatin
state
identifies
temporal
signatures
of
enhancer
activity
during
embryonic
development."
Nat
Genet
44(2):
148-‐‑156.
Boyle,
A.
P.,
E.
L.
Hong,
M.
Hariharan,
Y.
Cheng,
M.
A.
Schaub,
M.
Kasowski,
K.
J.
Karczewski,
J.
Park,
B.
C.
Hitz,
S.
Weng,
J.
M.
Cherry
and
M.
Snyder
(2012).
"Annotation
of
functional
variation
in
personal
genomes
using
RegulomeDB."
Genome
Res
22(9):
1790-‐‑1797.
Boyle,
A.
P.,
L.
Song,
B.
K.
Lee,
D.
London,
D.
Keefe,
E.
Birney,
V.
R.
Iyer,
G.
E.
Crawford
and
T.
S.
Furey
(2011).
"High-‐‑resolution
genome-‐‑wide
in
vivo
footprinting
of
diverse
transcription
factors
in
human
cells."
Genome
Res
21(3):
456-‐‑464.
Breyer,
J.
P.,
D.
C.
Dorset,
T.
A.
Clark,
K.
M.
Bradley,
T.
A.
Wahlfors,
K.
M.
McReynolds,
W.
H.
Maynard,
S.
S.
Chang,
M.
S.
Cookson,
J.
A.
Smith,
J.
Schleutker,
W.
D.
Dupont
and
J.
R.
Smith
(2014).
"An
Expressed
Retrogene
of
the
Master
Embryonic
Stem
Cell
Gene
POU5F1
Is
Associated
with
Prostate
Cancer
Susceptibility."
Am
J
Hum
Genet
94(3):
395-‐‑404.
Bright-‐‑Thomas,
R.
M.
and
R.
Hargest
(2003).
"APC,
beta-‐‑Catenin
and
hTCF-‐‑4;
an
unholy
trinity
in
the
genesis
of
colorectal
cancer."
Eur
J
Surg
Oncol
29(2):
107-‐‑117.
Browning,
S.
R.
and
B.
L.
Browning
(2007).
"Rapid
and
accurate
haplotype
phasing
and
missing-‐‑data
inference
for
whole-‐‑genome
association
studies
by
use
of
localized
haplotype
clustering."
Am
J
Hum
Genet
81(5):
1084-‐‑1097.
Buenrostro,
J.
D.,
P.
G.
Giresi,
L.
C.
Zaba,
H.
Y.
Chang
and
W.
J.
Greenleaf
(2013).
"Transposition
of
native
chromatin
for
fast
and
sensitive
epigenomic
profiling
of
open
chromatin,
DNA-‐‑binding
proteins
and
nucleosome
position."
Nat
Methods
10(12):
1213-‐‑
1218.
Cai,
J.,
Y.
Zhao,
Y.
Liu,
F.
Ye,
Z.
Song,
H.
Qin,
S.
Meng,
Y.
Chen,
R.
Zhou,
X.
Song,
Y.
Guo,
M.
Ding
and
H.
Deng
(2007).
"Directed
differentiation
of
human
embryonic
stem
cells
into
functional
hepatic
cells."
Hepatology
45:
1229-‐‑1239.
Calo,
E.
and
J.
Wysocka
(2013).
"Modification
of
enhancer
chromatin:
what,
how,
and
why?"
Mol
Cell
49(5):
825-‐‑837.
Canver,
M.
C.,
D.
E.
Bauer,
A.
Dass,
Y.
Y.
Yien,
J.
Chung,
T.
Masuda,
T.
Maeda,
B.
H.
Paw
and
S.
H.
Orkin
(2014).
"Characterization
of
genomic
deletion
efficiency
mediated
by
clustered
regularly
interspaced
palindromic
repeats
(CRISPR)/Cas9
nuclease
system
in
mammalian
cells."
J
Biol
Chem
289(31):
21312-‐‑21324.
Canver,
M.
C.,
E.
C.
Smith,
F.
Sher,
L.
Pinello,
N.
E.
Sanjana,
O.
Shalem,
D.
D.
Chen,
P.
G.
Schupp,
D.
S.
Vinjamur,
S.
P.
Garcia,
S.
Luc,
R.
Kurita,
Y.
Nakamura,
Y.
Fujiwara,
T.
Maeda,
G.
C.
Yuan,
F.
152
Zhang,
S.
H.
Orkin
and
D.
E.
Bauer
(2015).
"BCL11A
enhancer
dissection
by
Cas9-‐‑mediated
in
situ
saturating
mutagenesis."
Nature.
Cao,
A.
R.,
R.
Rabinovich,
M.
Xu,
X.
Xu,
V.
X.
Jin
and
P.
J.
Farnham
(2011).
"Genome-‐‑wide
analysis
of
transcription
factor
E2F1
mutant
proteins
reveals
that
N-‐‑
and
C-‐‑terminal
protein
interaction
domains
do
not
participate
in
targeting
E2F1
to
the
human
genome."
J
Biol
Chem
286(14):
11985-‐‑11996.
Cao,
Y.,
T.
M.
Bryan
and
R.
R.
Reddel
(2008).
"Increased
copy
number
of
the
TERT
and
TERC
telomerase
subunit
genes
in
cancer
cells."
Cancer
Sci
99(6):
1092-‐‑1099.
Carneiro,
P.,
J.
Figueiredo,
R.
Bordeira-‐‑Carrico,
M.
S.
Fernandes,
J.
Carvalho,
C.
Oliveira
and
R.
Seruca
(2013).
"Therapeutic
targets
associated
to
E-‐‑cadherin
dysfunction
in
gastric
cancer."
Expert
Opin
Ther
Targets
17(10):
1187-‐‑1201.
Carvajal-‐‑Carmona,
L.
G.,
J.
B.
Cazier,
A.
M.
Jones,
K.
Howarth,
P.
Broderick,
A.
Pittman,
S.
Dobbins,
A.
Tenesa,
S.
Farrington,
J.
Prendergast,
E.
Theodoratou,
R.
Barnetson,
D.
Conti,
P.
Newcomb,
J.
L.
Hopper,
M.
A.
Jenkins,
S.
Gallinger,
D.
J.
Duggan,
H.
Campbell,
D.
Kerr,
G.
Casey,
R.
Houlston,
M.
Dunlop
and
I.
Tomlinson
(2011).
"Fine-‐‑mapping
of
colorectal
cancer
susceptibility
loci
at
8q23.3,
16q22.1
and
19q13.11:
refinement
of
association
signals
and
use
of
in
silico
analysis
to
suggest
functional
variation
and
unexpected
candidate
target
genes."
Hum
Mol
Genet
20(14):
2879-‐‑2888.
Chen,
X.,
H.
Xu,
P.
Yuan,
F.
Fang,
M.
Huss,
V.
B.
Vega,
E.
Wong,
Y.
L.
Orlov,
W.
Zhang,
J.
Jiang,
Y.
H.
Loh,
H.
C.
Yeo,
Z.
X.
Yeo,
V.
Narang,
K.
R.
Govindarajan,
B.
Leong,
A.
Shahab,
Y.
Ruan,
G.
Bourque,
W.
K.
Sung,
N.
D.
Clarke,
C.
L.
Wei
and
H.
H.
Ng
(2008).
"Integration
of
external
signaling
pathways
with
the
core
transcriptional
network
in
embryonic
stem
cells."
Cell
133(6):
1106-‐‑1117.
Cheng,
A.
W.,
H.
Wang,
H.
Yang,
L.
Shi,
Y.
Katz,
T.
W.
Theunissen,
S.
Rangarajan,
C.
S.
Shivalila,
D.
B.
Dadon
and
R.
Jaenisch
(2013).
"Multiplexed
activation
of
endogenous
genes
by
CRISPR-‐‑
on,
an
RNA-‐‑guided
transcriptional
activator
system."
Cell
Res
23(10):
1163-‐‑1171.
Cheung,
E.
C.,
D.
Athineos,
P.
Lee,
R.
A.
Ridgway,
W.
Lambie,
C.
Nixon,
D.
Strathdee,
K.
Blyth,
O.
J.
Sansom
and
K.
H.
Vousden
(2013).
"TIGAR
is
required
for
efficient
intestinal
regeneration
and
tumorigenesis."
Dev
Cell
25(5):
463-‐‑477.
Choi,
Y.,
G.
E.
Sims,
S.
Murphy,
J.
R.
Miller
and
A.
P.
Chan
(2012).
"Predicting
the
functional
effect
of
amino
acid
substitutions
and
indels."
PLoS
One
7(10):
e46688.
Cingolani,
P.,
A.
Platts,
L.
Wang
le,
M.
Coon,
T.
Nguyen,
L.
Wang,
S.
J.
Land,
X.
Lu
and
D.
M.
Ruden
(2012).
"A
program
for
annotating
and
predicting
the
effects
of
single
nucleotide
polymorphisms,
SnpEff:
SNPs
in
the
genome
of
Drosophila
melanogaster
strain
w1118;
iso-‐‑
2;
iso-‐‑3."
Fly
(Austin)
6(2):
80-‐‑92.
Claussnitzer,
M.,
S.
N.
Dankel,
B.
Klocke,
H.
Grallert,
V.
Glunk,
T.
Berulava,
H.
Lee,
N.
Oskolkov,
J.
Fadista,
K.
Ehlers,
S.
Wahl,
C.
Hoffmann,
K.
Qian,
T.
Ronn,
H.
Riess,
M.
Muller-‐‑Nurasyid,
N.
Bretschneider,
T.
Schroeder,
T.
Skurk,
B.
Horsthemke,
D.
Spieler,
M.
Klingenspor,
M.
Seifert,
M.
J.
Kern,
N.
Mejhert,
I.
Dahlman,
O.
Hansson,
S.
M.
Hauck,
M.
Bluher,
P.
Arner,
L.
Groop,
T.
Illig,
K.
Suhre,
Y.
H.
Hsu,
G.
Mellgren,
H.
Hauner
and
H.
Laumen
(2014).
"Leveraging
cross-‐‑
species
transcription
factor
binding
site
patterns:
from
diabetes
risk
loci
to
disease
mechanisms."
Cell
156(1-‐‑2):
343-‐‑358.
Clevers,
H.
(2004).
"Wnt
breakers
in
colon
cancer."
Cancer
Cell
5(1):
5-‐‑6.
Clevers,
H.
(2006).
"Wnt/beta-‐‑catenin
signaling
in
development
and
disease."
Cell
127(3):
469-‐‑480.
153
Coetzee,
G.
A.,
L.
Jia,
B.
Frenkel,
B.
E.
Henderson,
A.
Tanay,
C.
A.
Haiman
and
M.
L.
Freedman
(2010).
"A
systematic
approach
to
understand
the
functional
consequences
of
non-‐‑protein
coding
risk
regions."
Cell
Cycle
9(2):
256-‐‑259.
Coetzee,
S.
G.,
S.
K.
Rhie,
B.
P.
Berman,
G.
A.
Coetzee
and
H.
Noushmehr
(2012).
"FunciSNP:
an
R/bioconductor
tool
integrating
functional
non-‐‑coding
data
sets
with
genetic
association
studies
to
identify
candidate
regulatory
SNPs."
Nucleic
Acids
Res
40(18):
e139.
Coller,
H.
A.,
C.
Grandori,
P.
Tamayo,
T.
Colbert,
E.
S.
Lander,
R.
N.
Eisenman
and
T.
R.
Golub
(2000).
"Expression
analysis
with
oligonucleotide
microarrays
reveals
that
MYC
regulates
genes
involved
in
growth,
cell
cycle,
signaling,
and
adhesion."
Proc.
Natl.
Acad.
Sci.
USA
97:
3260-‐‑3265.
Conacci-‐‑Sorrell,
M.,
L.
McFerrin
and
R.
N.
Eisenman
(2014).
"An
overview
of
MYC
and
its
interactome."
Cold
Spring
Harb
Perspect
Med
4(1):
a014357.
Cong,
L.,
F.
A.
Ran,
D.
Cox,
S.
Lin,
R.
Barretto,
N.
Habib,
P.
D.
Hsu,
X.
Wu,
W.
Jiang,
L.
A.
Marraffini
and
F.
Zhang
(2013).
"Multiplex
genome
engineering
using
CRISPR/Cas
systems."
Science
339(6121):
819-‐‑823.
Corradin,
O.,
A.
Saiakhova,
B.
Akhtar-‐‑Zaidi,
L.
Myeroff,
J.
Willis,
R.
Cowper-‐‑Sal
lari,
M.
Lupien,
S.
Markowitz
and
P.
C.
Scacheri
(2014).
"Combinatorial
effects
of
multiple
enhancer
variants
in
linkage
disequilibrium
dictate
levels
of
gene
expression
to
confer
susceptibility
to
common
traits."
Genome
Res
24(1):
1-‐‑13.
Corradin,
O.
and
P.
C.
Scacheri
(2014).
"Enhancer
variants:
evaluating
functions
in
common
disease."
Genome
Med
6(10):
85.
Danussi,
C.,
U.
D.
Akavia,
F.
Niola,
A.
Jovic,
A.
Lasorella,
D.
Pe'er
and
A.
Iavarone
(2013).
"RHPN2
Drives
Mesenchymal
Transformation
in
Malignant
Glioma
by
Triggering
RhoA
Activation."
Cancer
Res
73(16):
5140-‐‑5150.
Dayeh,
T.
A.,
A.
H.
Olsson,
P.
Volkov,
P.
Almgren,
T.
Ronn
and
C.
Ling
(2013).
"Identification
of
CpG-‐‑SNPs
associated
with
type
2
diabetes
and
differential
DNA
methylation
in
human
pancreatic
islets."
Diabetologia
56(5):
1036-‐‑1046.
Dimas,
A.
S.,
S.
Deutsch,
B.
E.
Stranger,
S.
B.
Montgomery,
C.
Borel,
H.
Attar-‐‑Cohen,
C.
Ingle,
C.
Beazley,
M.
Gutierrez
Arcelus,
M.
Sekowska,
M.
Gagnebin,
J.
Nisbett,
P.
Deloukas,
E.
T.
Dermitzakis
and
S.
E.
Antonarakis
(2009).
"Common
regulatory
variation
impacts
gene
expression
in
a
cell
type-‐‑dependent
manner."
Science
325(5945):
1246-‐‑1250.
Ding,
Q.,
Y.
K.
Lee,
E.
A.
Schaefer,
D.
T.
Peters,
A.
Veres,
K.
Kim,
N.
Kuperwasser,
D.
L.
Motola,
T.
B.
Meissner,
W.
T.
Hendriks,
M.
Trevisan,
R.
M.
Gupta,
A.
Moisan,
E.
Banks,
M.
Friesen,
R.
T.
Schinzel,
F.
Xia,
A.
Tang,
Y.
Xia,
E.
Figueroa,
A.
Wann,
T.
Ahfeldt,
L.
Daheron,
F.
Zhang,
L.
L.
Rubin,
L.
F.
Peng,
R.
T.
Chung,
K.
Musunuru
and
C.
A.
Cowan
(2013).
"A
TALEN
genome-‐‑
editing
system
for
generating
human
stem
cell-‐‑based
disease
models."
Cell
Stem
Cell
12(2):
238-‐‑251.
Ding,
Z.,
Y.
Ni,
S.
W.
Timmer,
B.
K.
Lee,
A.
Battenhouse,
S.
Louzada,
F.
Yang,
I.
Dunham,
G.
E.
Crawford,
J.
D.
Lieb,
R.
Durbin,
V.
R.
Iyer
and
E.
Birney
(2014).
"Quantitative
genetics
of
CTCF
binding
reveal
local
sequence
effects
and
different
modes
of
X-‐‑chromosome
association."
PLoS
Genet
10(11):
e1004798.
Diolaiti,
D.,
L.
McFerrin,
P.
A.
Carroll
and
R.
N.
Eisenman
(2014).
"Functional
interactions
among
members
of
the
MAX
and
MLX
transcriptional
network
during
oncogenesis."
Biochim
Biophys
Acta:
S1874-‐‑9399.
154
Dryden,
N.
H.,
L.
R.
Broome,
F.
Dudbridge,
N.
Johnson,
N.
Orr,
S.
Schoenfelder,
T.
Nagano,
S.
Andrews,
S.
Wingett,
I.
Kozarewa,
I.
Assiotis,
K.
Fenwick,
S.
L.
Maguire,
J.
Campbell,
R.
Natrajan,
M.
Lambros,
E.
Perrakis,
A.
Ashworth,
P.
Fraser
and
O.
Fletcher
(2014).
"Unbiased
analysis
of
potential
targets
of
breast
cancer
susceptibility
loci
by
Capture
Hi-‐‑C."
Genome
Res
24(11):
1854-‐‑1868.
Du,
M.,
T.
Yuan,
K.
F.
Schilter,
R.
L.
Dittmar,
A.
Mackinnon,
X.
Huang,
M.
Tschannen,
E.
Worthey,
H.
Jacob,
S.
Xia,
J.
Gao,
L.
Tillmans,
Y.
Lu,
P.
Liu,
S.
N.
Thibodeau
and
L.
Wang
(2015).
"Prostate
cancer
risk
locus
at
8q24
as
a
regulatory
hub
by
physical
interactions
with
multiple
genomic
loci
across
the
genome."
Hum
Mol
Genet
24(1):
154-‐‑166.
Dunlop,
M.
G.,
S.
E.
Dobbins,
S.
M.
Farrington,
A.
M.
Jones,
C.
Palles,
N.
Whiffin,
A.
Tenesa,
S.
Spain,
P.
Broderick,
L.
Y.
Ooi,
E.
Domingo,
C.
Smillie,
M.
Henrion,
M.
Frampton,
L.
Martin,
G.
Grimes,
M.
Gorman,
C.
Semple,
Y.
P.
Ma,
E.
Barclay,
J.
Prendergast,
J.
B.
Cazier,
B.
Olver,
S.
Penegar,
S.
Lubbe,
I.
Chander,
L.
G.
Carvajal-‐‑Carmona,
S.
Ballereau,
A.
Lloyd,
J.
Vijayakrishnan,
L.
Zgaga,
I.
Rudan,
E.
Theodoratou,
J.
M.
Starr,
I.
Deary,
I.
Kirac,
D.
Kovacevic,
L.
A.
Aaltonen,
L.
Renkonen-‐‑Sinisalo,
J.
P.
Mecklin,
K.
Matsuda,
Y.
Nakamura,
Y.
Okada,
S.
Gallinger,
D.
J.
Duggan,
D.
Conti,
P.
Newcomb,
J.
Hopper,
M.
A.
Jenkins,
F.
Schumacher,
G.
Casey,
D.
Easton,
M.
Shah,
P.
Pharoah,
A.
Lindblom,
T.
Liu,
C.
G.
Smith,
H.
West,
J.
P.
Cheadle,
R.
Midgley,
D.
J.
Kerr,
H.
Campbell,
I.
P.
Tomlinson
and
R.
S.
Houlston
(2012).
"Common
variation
near
CDKN1A,
POLD3
and
SHROOM2
influences
colorectal
cancer
risk."
Nat
Genet
44(7):
770-‐‑776.
Edwards,
S.
L.,
J.
Beesley,
J.
D.
French
and
A.
M.
Dunning
(2013).
"Beyond
GWASs:
illuminating
the
dark
road
from
association
to
function."
Am
J
Hum
Genet
93(5):
779-‐‑797.
Emilsson,
V.,
G.
Thorleifsson,
B.
Zhang,
A.
S.
Leonardson,
F.
Zink,
J.
Zhu,
S.
Carlson,
A.
Helgason,
G.
B.
Walters,
S.
Gunnarsdottir,
M.
Mouy,
V.
Steinthorsdottir,
G.
H.
Eiriksdottir,
G.
Bjornsdottir,
I.
Reynisdottir,
D.
Gudbjartsson,
A.
Helgadottir,
A.
Jonasdottir,
A.
Jonasdottir,
U.
Styrkarsdottir,
S.
Gretarsdottir,
K.
P.
Magnusson,
H.
Stefansson,
R.
Fossdal,
K.
Kristjansson,
H.
G.
Gislason,
T.
Stefansson,
B.
G.
Leifsson,
U.
Thorsteinsdottir,
J.
R.
Lamb,
J.
R.
Gulcher,
M.
L.
Reitman,
A.
Kong,
E.
E.
Schadt
and
K.
Stefansson
(2008).
"Genetics
of
gene
expression
and
its
effect
on
disease."
Nature
452(7186):
423-‐‑428.
ENCODE_Project_Consortium
(2012).
"An
integrated
encyclopedia
of
DNA
elements
in
the
human
genome."
Nature
489(7414):
57-‐‑74.
Ernst,
J.,
P.
Kheradpour,
T.
S.
Mikkelsen,
N.
Shoresh,
L.
D.
Ward,
C.
B.
Epstein,
X.
Zhang,
L.
Wang,
R.
Issner,
M.
Coyne,
M.
Ku,
T.
Durham,
M.
Kellis
and
B.
E.
Bernstein
(2011).
"Mapping
and
analysis
of
chromatin
state
dynamics
in
nine
human
cell
types."
Nature
473(7345):
43-‐‑
49.
Fairfax,
B.
P.,
P.
Humburg,
S.
Makino,
V.
Naranbhai,
D.
Wong,
E.
Lau,
L.
Jostins,
K.
Plant,
R.
Andrews,
C.
McGee
and
J.
C.
Knight
(2014).
"Innate
immune
activity
conditions
the
effect
of
regulatory
variants
upon
monocyte
gene
expression."
Science
343(6175):
1246949.
Farh,
K.
K.,
A.
Marson,
J.
Zhu,
M.
Kleinewietfeld,
W.
J.
Housley,
S.
Beik,
N.
Shoresh,
H.
Whitton,
R.
J.
Ryan,
A.
A.
Shishkin,
M.
Hatan,
M.
J.
Carrasco-‐‑Alfonso,
D.
Mayer,
C.
J.
Luckey,
N.
A.
Patsopoulos,
P.
L.
De
Jager,
V.
K.
Kuchroo,
C.
B.
Epstein,
M.
J.
Daly,
D.
A.
Hafler
and
B.
E.
Bernstein
(2015).
"Genetic
and
epigenetic
fine
mapping
of
causal
autoimmune
disease
variants."
Nature
518(7539):
337-‐‑343.
Farnham,
P.
J.
(2009).
"Insights
from
genomic
profiling
of
transcription
factors."
Nature
Rev
Genet
10:
605-‐‑616.
155
Farnham,
P.
J.
(2012).
"Thematic
Minireview
Series
on
Results
from
the
ENCODE
Project:
Integrative
Global
Analyses
of
Regulatory
Regions
in
the
Human
Genome."
J
Biol
Chem
287(37):
30885-‐‑30887.
Fearon,
E.
R.
(2011).
"Molecular
genetics
of
colorectal
cancer."
Annu
Rev
Pathol
6:
479-‐‑507.
Fortini,
B.
K.,
S.
Tring,
S.
J.
Plummer,
C.
K.
Edlund,
V.
Moreno,
R.
S.
Bresalier,
E.
L.
Barry,
T.
R.
Church,
J.
C.
Figueiredo
and
G.
Casey
(2014).
"Multiple
functional
risk
variants
in
a
SMAD7
enhancer
implicate
a
colorectal
cancer
risk
haplotype."
PLoS
One
9(11):
e111914.
Freedman,
M.
L.,
A.
N.
Monteiro,
S.
A.
Gayther,
G.
A.
Coetzee,
A.
Risch,
C.
Plass,
G.
Casey,
M.
De
Biasi,
C.
Carlson,
D.
Duggan,
M.
James,
P.
Liu,
J.
W.
Tichelaar,
H.
G.
Vikis,
M.
You
and
I.
G.
Mills
(2011).
"Principles
for
the
post-‐‑GWAS
functional
characterization
of
cancer
risk
loci."
Nat
Genet
43(6):
513-‐‑518.
Frietze,
S.,
R.
Wang,
L.
Yao,
Y.
G.
Tak,
Z.
Ye,
M.
Gaddis,
H.
Witt,
P.
J.
Farnham
and
V.
X.
Jin
(2012).
"Cell
type-‐‑specific
binding
patterns
reveal
that
TCF7L2
can
be
tethered
to
the
genome
by
association
with
GATA3."
Genome
Biol
13(9):
R52.
Gabay,
M.,
Y.
Li
and
D.
W.
Felsher
(2014).
"MYC
activation
is
a
hallmark
of
cancer
initiation
and
maintenance."
Cold
Spring
Harb
Perspect
Med
4(6).
Gaddis,
M.,
D.
Gerrard,
S.
Frietze
and
P.
J.
Farnham
(2015).
"Altering
cancer
transcriptomes
using
epigenomic
inhibitors."
Epigenetics
Chromatin
8:
9.
Gao,
X.,
J.
C.
Tsang,
F.
Gaba,
D.
Wu,
L.
Lu
and
P.
Liu
(2014).
"Comparison
of
TALE
designer
transcription
factors
and
the
CRISPR/dCas9
in
regulation
of
gene
expression
by
targeting
enhancers."
Nucleic
Acids
Res
42(20):
e155.
Garte,
S.
J.
(1993).
"The
c-‐‑myc
oncogene
in
tumor
progression."
Critical
Reviews
in
Oncog.
4(4):
435-‐‑449.
Ghazalpour,
A.,
B.
Bennett,
V.
A.
Petyuk,
L.
Orozco,
R.
Hagopian,
I.
N.
Mungrue,
C.
R.
Farber,
J.
Sinsheimer,
H.
M.
Kang,
N.
Furlotte,
C.
C.
Park,
P.
Z.
Wen,
H.
Brewer,
K.
Weitz,
D.
G.
Camp,
2nd,
C.
Pan,
R.
Yordanova,
I.
Neuhaus,
C.
Tilford,
N.
Siemers,
P.
Gargalovic,
E.
Eskin,
T.
Kirchgessner,
D.
J.
Smith,
R.
D.
Smith
and
A.
J.
Lusis
(2011).
"Comparative
analysis
of
proteome
and
transcriptome
variation
in
mouse."
PLoS
Genet
7(6):
e1001393.
Gibson,
G.,
J.
E.
Powell
and
U.
M.
Marigorta
(2015).
"Expression
quantitative
trait
locus
analysis
for
translational
medicine."
Genome
Med
7(1):
60.
Grobarczyk,
B.,
B.
Franco,
K.
Hanon
and
B.
Malgrange
(2015).
"Generation
of
Isogenic
Human
iPS
Cell
Line
Precisely
Corrected
by
Genome
Editing
Using
the
CRISPR/Cas9
System."
Stem
Cell
Rev
11(5):
774-‐‑787.
Guo,
Y.,
D.
V.
Conti
and
K.
Wang
(2015).
"Enlight:
web-‐‑based
integration
of
GWAS
results
with
biological
annotations."
Bioinformatics
31(2):
275-‐‑276.
Guo,
Y.,
Q.
Xu,
D.
Canzio,
J.
Shou,
J.
Li,
D.
U.
Gorkin,
I.
Jung,
H.
Wu,
Y.
Zhai,
Y.
Tang,
Y.
Lu,
Y.
Wu,
Z.
Jia,
W.
Li,
M.
Q.
Zhang,
B.
Ren,
A.
R.
Krainer,
T.
Maniatis
and
Q.
Wu
(2015).
"CRISPR
Inversion
of
CTCF
Sites
Alters
Genome
Topology
and
Enhancer/Promoter
Function."
Cell
162(4):
900-‐‑910.
Gutierrez-‐‑Arcelus,
M.,
T.
Lappalainen,
S.
B.
Montgomery,
A.
Buil,
H.
Ongen,
A.
Yurovsky,
J.
Bryois,
T.
Giger,
L.
Romano,
A.
Planchon,
E.
Falconnet,
D.
Bielser,
M.
Gagnebin,
I.
Padioleau,
C.
Borel,
A.
Letourneau,
P.
Makrythanasis,
M.
Guipponi,
C.
Gehrig,
S.
E.
Antonarakis
and
E.
T.
Dermitzakis
(2013).
"Passive
and
active
DNA
methylation
and
the
interplay
with
genetic
variation
in
gene
regulation."
Elife
2:
e00523.
156
Han,
Y.,
D.
J.
Hazelett,
F.
Wiklund,
F.
R.
Schumacher,
D.
O.
Stram,
S.
I.
Berndt,
Z.
Wang,
K.
A.
Rand,
R.
N.
Hoover,
M.
J.
Machiela,
M.
Yeager,
L.
Burdette,
C.
C.
Chung,
A.
Hutchinson,
K.
Yu,
J.
Xu,
R.
C.
Travis,
T.
J.
Key,
A.
Siddiq,
F.
Canzian,
A.
Takahashi,
M.
Kubo,
J.
L.
Stanford,
S.
Kolb,
S.
M.
Gapstur,
W.
R.
Diver,
V.
L.
Stevens,
S.
S.
Strom,
C.
A.
Pettaway,
A.
A.
Al
Olama,
Z.
Kote-‐‑Jarai,
R.
A.
Eeles,
E.
D.
Yeboah,
Y.
Tettey,
R.
B.
Biritwum,
A.
A.
Adjei,
E.
Tay,
A.
Truelove,
S.
Niwa,
A.
P.
Chokkalingam,
W.
B.
Isaacs,
C.
Chen,
S.
Lindstrom,
L.
Le
Marchand,
E.
L.
Giovannucci,
M.
Pomerantz,
H.
Long,
F.
Li,
J.
Ma,
M.
Stampfer,
E.
M.
John,
S.
A.
Ingles,
R.
A.
Kittles,
A.
B.
Murphy,
W.
J.
Blot,
L.
B.
Signorello,
W.
Zheng,
D.
Albanes,
J.
Virtamo,
S.
Weinstein,
B.
Nemesure,
J.
Carpten,
M.
C.
Leske,
S.
Y.
Wu,
A.
J.
Hennis,
B.
A.
Rybicki,
C.
Neslund-‐‑Dudas,
A.
W.
Hsing,
L.
Chu,
P.
J.
Goodman,
E.
A.
Klein,
S.
L.
Zheng,
J.
S.
Witte,
G.
Casey,
E.
Riboli,
Q.
Li,
M.
L.
Freedman,
D.
J.
Hunter,
H.
Gronberg,
M.
B.
Cook,
H.
Nakagawa,
P.
Kraft,
S.
J.
Chanock,
D.
F.
Easton,
B.
E.
Henderson,
G.
A.
Coetzee,
D.
V.
Conti
and
C.
A.
Haiman
(2015).
"Integration
of
multiethnic
fine-‐‑mapping
and
genomic
annotation
to
prioritize
candidate
functional
SNPs
at
prostate
cancer
susceptibility
regions."
Hum
Mol
Genet.
Han,
Y.,
O.
J.
Slivano,
C.
K.
Christie,
A.
W.
Cheng
and
J.
M.
Miano
(2015).
"CRISPR-‐‑Cas9
genome
editing
of
a
single
regulatory
element
nearly
abolishes
target
gene
expression
in
mice--brief
report."
Arterioscler
Thromb
Vasc
Biol
35(2):
312-315.
Hardison,
R.
C.
(2012).
"Genome-wide
Epigenetic
Data
Facilitate
Understanding
of
Disease
Susceptibility
Association
Studies."
J
Biol
Chem
287(37):
30932-30940.
Hazelett,
D.
J.,
S.
K.
Rhie,
M.
Gaddis,
C.
Yan,
D.
L.
Lakeland,
S.
G.
Coetzee,
B.
E.
Henderson,
H.
Noushmehr,
W.
Cozen,
Z.
Kote-Jarai,
R.
A.
Eeles,
D.
F.
Easton,
C.
A.
Haiman,
W.
Lu,
P.
J.
Farnham
and
G.
A.
Coetzee
(2014).
"Comprehensive
functional
annotation
of
77
prostate
cancer
risk
loci."
PLoS
Genet
10(1):
e1004102.
Heinz,
S.,
C.
E.
Romanoski,
C.
Benner,
K.
A.
Allison,
M.
U.
Kaikkonen,
L.
D.
Orozco
and
C.
K.
Glass
(2013).
"Effect
of
natural
genetic
variation
on
enhancer
selection
and
function."
Nature
503(7477):
487-492.
Heyn,
H.,
S.
Sayols,
C.
Moutinho,
E.
Vidal,
J.
V.
Sanchez-Mut,
O.
A.
Stefansson,
E.
Nadal,
S.
Moran,
J.
E.
Eyfjord,
E.
Gonzalez-Suarez,
M.
A.
Pujana
and
M.
Esteller
(2014).
"Linkage
of
DNA
methylation
quantitative
trait
loci
to
human
cancer
risk."
Cell
Rep
7(2):
331-338.
Hilton,
I.
B.,
A.
M.
D'Ippolito,
C.
M.
Vockley,
P.
I.
Thakore,
G.
E.
Crawford,
T.
E.
Reddy
and
C.
A.
Gersbach
(2015).
"Epigenome
editing
by
a
CRISPR-Cas9-based
acetyltransferase
activates
genes
from
promoters
and
enhancers."
Nat
Biotechnol
33(5):
510-517.
Hindorff,
L.
A.,
P.
Sethupathy,
H.
A.
Junkins,
E.
M.
Ramos,
J.
P.
Mehta,
F.
S.
Collins
and
T.
A.
Manolio
(2009).
"Potential
etiologic
and
functional
implications
of
genome-wide
association
loci
for
human
diseases
and
traits."
Proc
Natl
Acad
Sci
U
S
A
106(23):
9362-9367.
Hitchins,
M.
P.,
R.
W.
Rapkins,
C.
T.
Kwok,
S.
Srivastava,
J.
J.
Wong,
L.
M.
Khachigian,
P.
Polly,
J.
Goldblatt
and
R.
L.
Ward
(2011).
"Dominantly
inherited
constitutional
epigenetic
silencing
of
MLH1
in
a
cancer-affected
family
is
linked
to
a
single
nucleotide
variant
within
the
5'UTR."
Cancer
Cell
20(2):
200-213.
Holwerda,
S.
J.
and
W.
de
Laat
(2013).
"CTCF:
the
protein,
the
binding
partners,
the
binding
sites
and
their
chromatin
loops."
Philos
Trans
R
Soc
Lond
B
Biol
Sci
368(1620):
20120369.
Home,
P.,
B.
Saha,
S.
Ray,
D.
Dutta,
S.
Gunewardena,
B.
Yoo,
A.
Pal,
J.
L.
Vivian,
M.
Larson,
M.
Petroff,
P.
G.
Gallagher,
V.
P.
Schulz,
K.
L.
White,
T.
G.
Golos,
B.
Behr
and
S.
Paul
(2012).
"Altered
subcellular
localization
of
transcription
factor
TEAD4
regulates
first
mammalian
cell
lineage
commitment."
Proc
Natl
Acad
Sci
U
S
A
109(19):
7362-7367.
Houlston,
R.
S.,
J.
Cheadle,
S.
E.
Dobbins,
A.
Tenesa,
A.
M.
Jones,
K.
Howarth,
S.
L.
Spain,
P.
Broderick,
E.
Domingo,
S.
Farrington,
J.
G.
Prendergast,
A.
M.
Pittman,
E.
Theodoratou,
C.
G.
Smith,
B.
Olver,
A.
Walther,
R.
A.
Barnetson,
M.
Churchman,
E.
E.
Jaeger,
S.
Penegar,
E.
Barclay,
L.
Martin,
M.
Gorman,
R.
Mager,
E.
Johnstone,
R.
Midgley,
I.
Niittymaki,
S.
Tuupanen,
J.
Colley,
S.
Idziaszczyk,
H.
J.
Thomas,
A.
M.
Lucassen,
D.
G.
Evans,
E.
R.
Maher,
T.
Maughan,
A.
Dimas,
E.
Dermitzakis,
J.
B.
Cazier,
L.
A.
Aaltonen,
P.
Pharoah,
D.
J.
Kerr,
L.
G.
Carvajal-Carmona,
H.
Campbell,
M.
G.
Dunlop
and
I.
P.
Tomlinson
(2010).
"Meta-analysis
of
three
genome-wide
association
studies
identifies
susceptibility
loci
for
colorectal
cancer
at
1q41,
3q26.2,
12q13.13
and
20q13.33."
Nat
Genet
42(11):
973-977.
Houlston,
R.
S.,
E.
Webb,
P.
Broderick,
A.
M.
Pittman,
M.
C.
Di
Bernardo,
S.
Lubbe,
I.
Chandler,
J.
Vijayakrishnan,
K.
Sullivan,
S.
Penegar,
L.
Carvajal-Carmona,
K.
Howarth,
E.
Jaeger,
S.
L.
Spain,
A.
Walther,
E.
Barclay,
L.
Martin,
M.
Gorman,
E.
Domingo,
A.
S.
Teixeira,
D.
Kerr,
J.
B.
Cazier,
I.
Niittymaki,
S.
Tuupanen,
A.
Karhu,
L.
A.
Aaltonen,
I.
P.
Tomlinson,
S.
M.
Farrington,
A.
Tenesa,
J.
G.
Prendergast,
R.
A.
Barnetson,
R.
Cetnarskyj,
M.
E.
Porteous,
P.
D.
Pharoah,
T.
Koessler,
J.
Hampe,
S.
Buch,
C.
Schafmayer,
J.
Tepel,
S.
Schreiber,
H.
Volzke,
J.
Chang-Claude,
M.
Hoffmeister,
H.
Brenner,
B.
W.
Zanke,
A.
Montpetit,
T.
J.
Hudson,
S.
Gallinger,
H.
Campbell
and
M.
G.
Dunlop
(2008).
"Meta-analysis
of
genome-wide
association
data
identifies
four
new
susceptibility
loci
for
colorectal
cancer."
Nat
Genet
40(12):
1426-1435.
Howie,
B.
N.,
P.
Donnelly
and
J.
Marchini
(2009).
"A
flexible
and
accurate
genotype
imputation
method
for
the
next
generation
of
genome-wide
association
studies."
PLoS
Genet
5(6):
e1000529.
Hsu,
P.
D.,
E.
S.
Lander
and
F.
Zhang
(2014).
"Development
and
applications
of
CRISPR-Cas9
for
genome
engineering."
Cell
157(6):
1262-1278.
Hsu,
P.
D.,
D.
A.
Scott,
J.
A.
Weinstein,
F.
A.
Ran,
S.
Konermann,
V.
Agarwala,
Y.
Li,
E.
J.
Fine,
X.
Wu,
O.
Shalem,
T.
J.
Cradick,
L.
A.
Marraffini,
G.
Bao
and
F.
Zhang
(2013).
"DNA
targeting
specificity
of
RNA-guided
Cas9
nucleases."
Nat
Biotechnol
31(9):
827-832.
Huang,
G.
L.,
H.
Q.
Guo,
F.
Yang,
O.
F.
Liu,
B.
B.
Li,
X.
Y.
Liu,
Y.
Lu
and
Z.
W.
He
(2012).
"Activating
transcription
factor
1
is
a
prognostic
marker
of
colorectal
cancer."
Asian
Pac
J
Cancer
Prev
13(3):
1053-1057.
Hughes,
J.
R.,
N.
Roberts,
S.
McGowan,
D.
Hay,
E.
Giannoulatou,
M.
Lynch,
M.
De
Gobbi,
S.
Taylor,
R.
Gibbons
and
D.
R.
Higgs
(2014).
"Analysis
of
hundreds
of
cis-regulatory
landscapes
at
high
resolution
in
a
single,
high-throughput
experiment."
Nat
Genet
46(2):
205-212.
Huppi,
K.,
J.
J.
Pitt,
B.
M.
Wahlberg
and
N.
J.
Caplen
(2012).
"The
8q24
gene
desert:
an
oasis
of
non-coding
transcriptional
activity."
Front
Genet
3:
69.
Jager,
R.,
G.
Migliorini,
M.
Henrion,
R.
Kandaswamy,
H.
E.
Speedy,
A.
Heindl,
N.
Whiffin,
M.
J.
Carnicer,
L.
Broome,
N.
Dryden,
T.
Nagano,
S.
Schoenfelder,
M.
Enge,
Y.
Yuan,
J.
Taipale,
P.
Fraser,
O.
Fletcher
and
R.
S.
Houlston
(2015).
"Capture
Hi-C
identifies
the
chromatin
interactome
of
colorectal
cancer
risk
loci."
Nat
Commun
6:
6178.
Jeon,
B.
N.,
M.
K.
Kim,
W.
I.
Choi,
D.
I.
Koh,
S.
Y.
Hong,
K.
S.
Kim,
M.
Kim,
C.
O.
Yun,
J.
Yoon,
K.
Y.
Choi,
K.
R.
Lee,
K.
P.
Nephew
and
M.
W.
Hur
(2012).
"KR-POK
interacts
with
p53
and
represses
its
ability
to
activate
transcription
of
p21WAF1/CDKN1A."
Cancer
Res
72(5):
1137-1148.
Jin,
F.,
Y.
Li,
J.
R.
Dixon,
S.
Selvaraj,
Z.
Ye,
A.
Y.
Lee,
C.
A.
Yen,
A.
D.
Schmitt,
C.
A.
Espinoza
and
B.
Ren
(2013).
"A
high-resolution
map
of
the
three-dimensional
chromatin
interactome
in
human
cells."
Nature
503(7475):
290-294.
Jones,
P.
A.
(2012).
"Functions
of
DNA
methylation:
islands,
start
sites,
gene
bodies
and
beyond."
Nat
Rev
Genet
13(7):
484-492.
Kabadi,
A.
M.
and
C.
A.
Gersbach
(2014).
"Engineering
synthetic
TALE
and
CRISPR/Cas9
transcription
factors
for
regulating
gene
expression."
Methods
69(2):
188-197.
Karagiannis,
G.
S.,
A.
Berk,
A.
Dimitromanolakis
and
E.
P.
Diamandis
(2013).
"Enrichment
map
profiling
of
the
cancer
invasion
front
suggests
regulation
of
colorectal
cancer
progression
by
the
bone
morphogenetic
protein
antagonist,
gremlin-1."
Mol
Oncol
7(4):
826-839.
Kearns,
N.
A.,
H.
Pham,
B.
Tabak,
R.
M.
Genga,
N.
J.
Silverstein,
M.
Garber
and
R.
Maehr
(2015).
"Functional
annotation
of
native
enhancers
with
a
Cas9-histone
demethylase
fusion."
Nat
Methods
12(5):
401-403.
Kessler,
J.
D.,
K.
T.
Kahle,
T.
Sun,
K.
L.
Meerbrey,
M.
R.
Schlabach,
E.
M.
Schmitt,
S.
O.
Skinner,
Q.
Xu,
M.
Z.
Li,
Z.
C.
Hartman,
M.
Rao,
P.
Yu,
R.
Dominguez-Vidana,
A.
C.
Liang,
N.
L.
Solimini,
R.
J.
Bernardi,
B.
Yu,
T.
Hsu,
I.
Golding,
J.
Luo,
C.
K.
Osborne,
C.
J.
Creighton,
S.
G.
Hilsenbeck,
R.
Schiff,
C.
A.
Shaw,
S.
J.
Elledge
and
T.
F.
Westbrook
(2012).
"A
SUMOylation-dependent
transcriptional
subprogram
is
required
for
Myc-driven
tumorigenesis."
Science
335(6066):
348-353.
Kichaev,
G.
and
B.
Pasaniuc
(2015).
"Leveraging
Functional-Annotation
Data
in
Trans-ethnic
Fine-Mapping
Studies."
Am
J
Hum
Genet
97(2):
260-271.
Kilpinen,
H.,
S.
M.
Waszak,
A.
R.
Gschwind,
S.
K.
Raghav,
R.
M.
Witwicki,
A.
Orioli,
E.
Migliavacca,
M.
Wiederkehr,
M.
Gutierrez-Arcelus,
N.
I.
Panousis,
A.
Yurovsky,
T.
Lappalainen,
L.
Romano-Palumbo,
A.
Planchon,
D.
Bielser,
J.
Bryois,
I.
Padioleau,
G.
Udin,
S.
Thurnheer,
D.
Hacker,
L.
J.
Core,
J.
T.
Lis,
N.
Hernandez,
A.
Reymond,
B.
Deplancke
and
E.
T.
Dermitzakis
(2013).
"Coordinated
effects
of
sequence
variation
on
DNA
binding,
chromatin
structure,
and
transcription."
Science
342(6159):
744-747.
Kim,
H.
and
J.
S.
Kim
(2014).
"A
guide
to
genome
engineering
with
programmable
nucleases."
Nat
Rev
Genet
15(5):
321-334.
Knosel,
T.,
Y.
Chen,
S.
Hotovy,
U.
Settmacher,
A.
Altendorf-Hofmann
and
I.
Petersen
(2012).
"Loss
of
desmocollin
1-3
and
homeobox
genes
PITX1
and
CDX2
are
associated
with
tumor
progression
and
survival
in
colorectal
carcinoma."
Int
J
Colorectal
Dis
27(11):
1391-1399.
Koudritsky,
M.
and
E.
Domany
(2008).
"Positional
distribution
of
human
transcription
factor
binding
sites."
Nucleic
Acids
Res
36(21):
6795-6805.
Kraft,
K.,
S.
Geuer,
A.
J.
Will,
W.
L.
Chan,
C.
Paliou,
M.
Borschiwer,
I.
Harabula,
L.
Wittler,
M.
Franke,
D.
M.
Ibrahim,
B.
K.
Kragesteen,
M.
Spielmann,
S.
Mundlos,
D.
G.
Lupianez
and
G.
Andrey
(2015).
"Deletions,
Inversions,
Duplications:
Engineering
of
Structural
Variants
using
CRISPR/Cas
in
Mice."
Cell
Rep.
Krzywinski,
M.,
J.
Schein,
I.
Birol,
J.
Connors,
R.
Gascoyne,
D.
Horsman,
S.
J.
Jones
and
M.
A.
Marra
(2009).
"Circos:
an
information
aesthetic
for
comparative
genomics."
Genome
Res
19(9):
1639-1645.
Kwasnieski,
J.
C.,
C.
Fiore,
H.
G.
Chaudhari
and
B.
A.
Cohen
(2014).
"High-throughput
functional
testing
of
ENCODE
segmentation
predictions."
Genome
Res
24(10):
1595-1602.
Lee,
M.
N.,
C.
Ye,
A.
C.
Villani,
T.
Raj,
W.
Li,
T.
M.
Eisenhaure,
S.
H.
Imboywa,
P.
I.
Chipendo,
F.
A.
Ran,
K.
Slowikowski,
L.
D.
Ward,
K.
Raddassi,
C.
McCabe,
M.
H.
Lee,
I.
Y.
Frohlich,
D.
A.
Hafler,
M.
Kellis,
S.
Raychaudhuri,
F.
Zhang,
B.
E.
Stranger,
C.
O.
Benoist,
P.
L.
De
Jager,
A.
Regev
and
N.
Hacohen
(2014).
"Common
genetic
variants
modulate
pathogen-sensing
responses
in
human
dendritic
cells."
Science
343(6175):
1246980.
Li,
G.,
X.
Ruan,
R.
K.
Auerbach,
K.
S.
Sandhu,
M.
Zheng,
P.
Wang,
H.
M.
Poh,
Y.
Goh,
J.
Lim,
J.
Zhang,
H.
S.
Sim,
S.
Q.
Peh,
F.
H.
Mulawadi,
C.
T.
Ong,
Y.
L.
Orlov,
S.
Hong,
Z.
Zhang,
S.
Landt,
D.
Raha,
G.
Euskirchen,
C.
L.
Wei,
W.
Ge,
H.
Wang,
C.
Davis,
K.
I.
Fisher-Aylor,
A.
Mortazavi,
M.
Gerstein,
T.
Gingeras,
B.
Wold,
Y.
Sun,
M.
J.
Fullwood,
E.
Cheung,
E.
Liu,
W.
K.
Sung,
M.
Snyder
and
Y.
Ruan
(2012).
"Extensive
promoter-centered
chromatin
interactions
provide
a
topological
basis
for
transcription
regulation."
Cell
148(1-2):
84-98.
Li,
H.,
H.
Chen,
F.
Liu,
C.
Ren,
S.
Wang,
X.
Bo
and
W.
Shu
(2015).
"Functional
annotation
of
HOT
regions
in
the
human
genome:
implications
for
human
disease
and
cancer."
Sci
Rep
5:
11633.
Li,
J.,
J.
Shou,
Y.
Guo,
Y.
Tang,
Y.
Wu,
Z.
Jia,
Y.
Zhai,
Z.
Chen,
Q.
Xu
and
Q.
Wu
(2015).
"Efficient
inversions
and
duplications
of
mammalian
regulatory
DNA
elements
and
gene
clusters
by
CRISPR/Cas9."
J
Mol
Cell
Biol
7(4):
284-298.
Li,
M.
J.,
L.
Y.
Wang,
Z.
Xia,
P.
C.
Sham
and
J.
Wang
(2013).
"GWAS3D:
Detecting
human
regulatory
variants
by
integrative
analysis
of
genome-wide
associations,
chromosome
interactions
and
histone
modifications."
Nucleic
Acids
Res
41(Web
Server
issue):
W150-158.
Li,
Y.,
C.
M.
Rivera,
H.
Ishii,
F.
Jin,
S.
Selvaraj,
A.
Y.
Lee,
J.
R.
Dixon
and
B.
Ren
(2014).
"CRISPR
reveals
a
distal
super-enhancer
required
for
Sox2
expression
in
mouse
embryonic
stem
cells."
PLoS
One
9(12):
e114485.
Li,
Y.,
C.
Willer,
S.
Sanna
and
G.
Abecasis
(2009).
"Genotype
imputation."
Annu
Rev
Genomics
Hum
Genet
10:
387-406.
Li,
Y.,
C.
J.
Willer,
J.
Ding,
P.
Scheet
and
G.
R.
Abecasis
(2010).
"MaCH:
using
sequence
and
genotype
data
to
estimate
haplotypes
and
unobserved
genotypes."
Genet
Epidemiol
34(8):
816-834.
Long,
C.,
J.
R.
McAnally,
J.
M.
Shelton,
A.
A.
Mireault,
R.
Bassel-Duby
and
E.
N.
Olson
(2014).
"Prevention
of
muscular
dystrophy
in
mice
by
CRISPR/Cas9-mediated
editing
of
germline
DNA."
Science
345(6201):
1184-1188.
Ma,
G.,
D.
Gu,
C.
Lv,
H.
Chu,
Z.
Xu,
N.
Tong,
M.
Wang,
C.
Tang,
Y.
Xu,
Z.
Zhang,
B.
Wang
and
J.
Chen
(2014).
"Genetic
variant
in
8q24
is
associated
with
prognosis
for
gastric
cancer
in
a
Chinese
population."
J
Gastroenterol
Hepatol.
Mahajan,
A.,
M.
J.
Go,
W.
Zhang,
J.
E.
Below,
K.
J.
Gaulton,
T.
Ferreira,
M.
Horikoshi,
A.
D.
Johnson,
M.
C.
Ng,
I.
Prokopenko,
D.
Saleheen,
X.
Wang,
E.
Zeggini,
G.
R.
Abecasis,
L.
S.
Adair,
P.
Almgren,
M.
Atalay,
T.
Aung,
D.
Baldassarre,
B.
Balkau,
Y.
Bao,
A.
H.
Barnett,
I.
Barroso,
A.
Basit,
L.
F.
Been,
J.
Beilby,
G.
I.
Bell,
R.
Benediktsson,
R.
N.
Bergman,
B.
O.
Boehm,
E.
Boerwinkle,
L.
L.
Bonnycastle,
N.
Burtt,
Q.
Cai,
H.
Campbell,
J.
Carey,
S.
Cauchi,
M.
Caulfield,
J.
C.
Chan,
L.
C.
Chang,
T.
J.
Chang,
Y.
C.
Chang,
G.
Charpentier,
C.
H.
Chen,
H.
Chen,
Y.
T.
Chen,
K.
S.
Chia,
M.
Chidambaram,
P.
S.
Chines,
N.
H.
Cho,
Y.
M.
Cho,
L.
M.
Chuang,
F.
S.
Collins,
M.
C.
Cornelis,
D.
J.
Couper,
A.
T.
Crenshaw,
R.
M.
van
Dam,
J.
Danesh,
D.
Das,
U.
de
Faire,
G.
Dedoussis,
P.
Deloukas,
A.
S.
Dimas,
C.
Dina,
A.
S.
Doney,
P.
J.
Donnelly,
M.
Dorkhan,
C.
van
Duijn,
J.
Dupuis,
S.
Edkins,
P.
Elliott,
V.
Emilsson,
R.
Erbel,
J.
G.
Eriksson,
J.
Escobedo,
T.
Esko,
E.
Eury,
J.
C.
Florez,
P.
Fontanillas,
N.
G.
Forouhi,
T.
Forsen,
C.
Fox,
R.
M.
Fraser,
T.
M.
Frayling,
P.
Froguel,
P.
Frossard,
Y.
Gao,
K.
Gertow,
C.
Gieger,
B.
Gigante,
H.
Grallert,
G.
B.
Grant,
L.
C.
Groop,
C.
J.
Groves,
E.
Grundberg,
C.
Guiducci,
A.
Hamsten,
B.
G.
Han,
K.
Hara,
N.
Hassanali,
A.
T.
Hattersley,
C.
Hayward,
A.
K.
Hedman,
C.
Herder,
A.
Hofman,
O.
L.
Holmen,
K.
Hovingh,
A.
B.
Hreidarsson,
C.
Hu,
F.
B.
Hu,
J.
Hui,
S.
E.
Humphries,
S.
E.
Hunt,
D.
J.
Hunter,
K.
Hveem,
Z.
I.
Hydrie,
H.
Ikegami,
T.
Illig,
E.
Ingelsson,
M.
Islam,
B.
Isomaa,
A.
U.
Jackson,
T.
Jafar,
A.
James,
W.
Jia,
K.
H.
Jockel,
A.
Jonsson,
J.
B.
Jowett,
T.
Kadowaki,
H.
M.
Kang,
S.
Kanoni,
W.
H.
Kao,
S.
Kathiresan,
N.
Kato,
P.
Katulanda,
K.
M.
Keinanen-Kiukaanniemi,
A.
M.
Kelly,
H.
Khan,
K.
T.
Khaw,
C.
C.
Khor,
H.
L.
Kim,
S.
Kim,
Y.
J.
Kim,
L.
Kinnunen,
N.
Klopp,
A.
Kong,
E.
Korpi-Hyovalti,
S.
Kowlessur,
P.
Kraft,
J.
Kravic,
M.
M.
Kristensen,
S.
Krithika,
A.
Kumar,
J.
Kumate,
J.
Kuusisto,
S.
H.
Kwak,
M.
Laakso,
V.
Lagou,
T.
A.
Lakka,
C.
Langenberg,
C.
Langford,
R.
Lawrence,
K.
Leander,
J.
M.
Lee,
N.
R.
Lee,
M.
Li,
X.
Li,
Y.
Li,
J.
Liang,
S.
Liju,
W.
Y.
Lim,
L.
Lind,
C.
M.
Lindgren,
E.
Lindholm,
C.
T.
Liu,
J.
J.
Liu,
S.
Lobbens,
J.
Long,
R.
J.
Loos,
W.
Lu,
J.
Luan,
V.
Lyssenko,
R.
C.
Ma,
S.
Maeda,
R.
Magi,
S.
Mannisto,
D.
R.
Matthews,
J.
B.
Meigs,
O.
Melander,
A.
Metspalu,
J.
Meyer,
G.
Mirza,
E.
Mihailov,
S.
Moebus,
V.
Mohan,
K.
L.
Mohlke,
A.
D.
Morris,
T.
W.
Muhleisen,
M.
Muller-Nurasyid,
B.
Musk,
J.
Nakamura,
E.
Nakashima,
P.
Navarro,
P.
K.
Ng,
A.
C.
Nica,
P.
M.
Nilsson,
I.
Njolstad,
M.
M.
Nothen,
K.
Ohnaka,
T.
H.
Ong,
K.
R.
Owen,
C.
N.
Palmer,
J.
S.
Pankow,
K.
S.
Park,
M.
Parkin,
S.
Pechlivanis,
N.
L.
Pedersen,
L.
Peltonen,
J.
R.
Perry,
A.
Peters,
J.
M.
Pinidiyapathirage,
C.
G.
Platou,
S.
Potter,
J.
F.
Price,
L.
Qi,
V.
Radha,
L.
Rallidis,
A.
Rasheed,
W.
Rathman,
R.
Rauramaa,
S.
Raychaudhuri,
N.
W.
Rayner,
S.
D.
Rees,
E.
Rehnberg,
S.
Ripatti,
N.
Robertson,
M.
Roden,
E.
J.
Rossin,
I.
Rudan,
D.
Rybin,
T.
E.
Saaristo,
V.
Salomaa,
J.
Saltevo,
M.
Samuel,
D.
K.
Sanghera,
J.
Saramies,
J.
Scott,
L.
J.
Scott,
R.
A.
Scott,
A.
V.
Segre,
J.
Sehmi,
B.
Sennblad,
N.
Shah,
S.
Shah,
A.
S.
Shera,
X.
O.
Shu,
A.
R.
Shuldiner,
G.
Sigurdsson,
E.
Sijbrands,
A.
Silveira,
X.
Sim,
S.
Sivapalaratnam,
K.
S.
Small,
W.
Y.
So,
A.
Stancakova,
K.
Stefansson,
G.
Steinbach,
V.
Steinthorsdottir,
K.
Stirrups,
R.
J.
Strawbridge,
H.
M.
Stringham,
Q.
Sun,
C.
Suo,
A.
C.
Syvanen,
R.
Takayanagi,
F.
Takeuchi,
W.
T.
Tay,
T.
M.
Teslovich,
B.
Thorand,
G.
Thorleifsson,
U.
Thorsteinsdottir,
E.
Tikkanen,
J.
Trakalo,
E.
Tremoli,
M.
D.
Trip,
F.
J.
Tsai,
T.
Tuomi,
J.
Tuomilehto,
A.
G.
Uitterlinden,
A.
Valladares-Salgado,
S.
Vedantam,
F.
Veglia,
B.
F.
Voight,
C.
Wang,
N.
J.
Wareham,
R.
Wennauer,
A.
R.
Wickremasinghe,
T.
Wilsgaard,
J.
F.
Wilson,
S.
Wiltshire,
W.
Winckler,
T.
Y.
Wong,
A.
R.
Wood,
J.
Y.
Wu,
Y.
Wu,
K.
Yamamoto,
T.
Yamauchi,
M.
Yang,
L.
Yengo,
M.
Yokota,
R.
Young,
D.
Zabaneh,
F.
Zhang,
R.
Zhang,
W.
Zheng,
P.
Z.
Zimmet,
D.
Altshuler,
D.
W.
Bowden,
Y.
S.
Cho,
N.
J.
Cox,
M.
Cruz,
C.
L.
Hanis,
J.
Kooner,
J.
Y.
Lee,
M.
Seielstad,
Y.
Y.
Teo,
M.
Boehnke,
E.
J.
Parra,
J.
C.
Chambers,
E.
S.
Tai,
M.
I.
McCarthy
and
A.
P.
Morris
(2014).
"Genome-wide
trans-ancestry
meta-analysis
provides
insight
into
the
genetic
architecture
of
type
2
diabetes
susceptibility."
Nat
Genet
46(3):
234-244.
Mali,
P.,
L.
Yang,
K.
M.
Esvelt,
J.
Aach,
M.
Guell,
J.
E.
DiCarlo,
J.
E.
Norville
and
G.
M.
Church
(2013).
"RNA-guided
human
genome
engineering
via
Cas9."
Science
339(6121):
823-826.
Maniatis,
T.,
J.
V.
Falvo,
T.
H.
Kim,
C.
H.
Lin,
B.
S.
Parekh
and
M.
G.
Wathelet
(1998).
"Structure
and
function
of
the
interferon-B
enhanceosome."
Cold
Spring
Harb
Symp
Quant
Biol
63:
609-620.
Manolio,
T.
A.
(2010).
"Genomewide
association
studies
and
assessment
of
the
risk
of
disease."
N
Engl
J
Med
363(2):
166-176.
Matano,
M.,
S.
Date,
M.
Shimokawa,
A.
Takano,
M.
Fujii,
Y.
Ohta,
T.
Watanabe,
T.
Kanai
and
T.
Sato
(2015).
"Modeling
colorectal
cancer
using
CRISPR-Cas9-mediated
engineering
of
human
intestinal
organoids."
Nat
Med
21(3):
256-262.
Maurano,
M.
T.,
R.
Humbert,
E.
Rynes,
R.
E.
Thurman,
E.
Haugen,
H.
Wang,
A.
P.
Reynolds,
R.
Sandstrom,
H.
Qu,
J.
Brody,
A.
Shafer,
F.
Neri,
K.
Lee,
T.
Kutyavin,
S.
Stehling-Sun,
A.
K.
Johnson,
T.
K.
Canfield,
E.
Giste,
M.
Diegel,
D.
Bates,
R.
S.
Hansen,
S.
Neph,
P.
J.
Sabo,
S.
Heimfeld,
A.
Raubitschek,
S.
Ziegler,
C.
Cotsapas,
N.
Sotoodehnia,
I.
Glass,
S.
R.
Sunyaev,
R.
Kaul
and
J.
A.
Stamatoyannopoulos
(2012).
"Systematic
localization
of
common
disease-associated
variation
in
regulatory
DNA."
Science
337(6099):
1190-1195.
McDaniell,
R.,
B.
K.
Lee,
L.
Song,
Z.
Liu,
A.
P.
Boyle,
M.
R.
Erdos,
L.
J.
Scott,
M.
A.
Morken,
K.
S.
Kucera,
A.
Battenhouse,
D.
Keefe,
F.
S.
Collins,
H.
F.
Willard,
J.
D.
Lieb,
T.
S.
Furey,
G.
E.
Crawford,
V.
R.
Iyer
and
E.
Birney
(2010).
"Heritable
individual-specific
and
allele-specific
chromatin
signatures
in
humans."
Science
328(5975):
235-239.
McKeown,
M.
R.
and
J.
E.
Bradner
(2014).
"Therapeutic
strategies
to
inhibit
MYC."
Cold
Spring
Harb
Perspect
Med
4(10).
McVicker,
G.,
B.
van
de
Geijn,
J.
F.
Degner,
C.
E.
Cain,
N.
E.
Banovich,
A.
Raj,
N.
Lewellen,
M.
Myrthil,
Y.
Gilad
and
J.
K.
Pritchard
(2013).
"Identification
of
genetic
variants
that
affect
histone
modifications
in
human
cells."
Science
342(6159):
747-749.
Meier,
I.
D.,
C.
Bernreuther,
T.
Tilling,
J.
Neidhardt,
Y.
W.
Wong,
C.
Schulze,
T.
Streichert
and
M.
Schachner
(2010).
"Short
DNA
sequences
inserted
for
gene
targeting
can
accidentally
interfere
with
off-target
gene
expression."
FASEB
J
24(6):
1714-1724.
Melnikov,
A.,
A.
Murugan,
X.
Zhang,
T.
Tesileanu,
L.
Wang,
P.
Rogov,
S.
Feizi,
A.
Gnirke,
C.
G.
Callan,
Jr.,
J.
B.
Kinney,
M.
Kellis,
E.
S.
Lander
and
T.
S.
Mikkelsen
(2012).
"Systematic
dissection
and
optimization
of
inducible
enhancers
in
human
cells
using
a
massively
parallel
reporter
assay."
Nat
Biotechnol
30(3):
271-277.
Meyer,
M.
B.,
N.
A.
Benkusky
and
J.
W.
Pike
(2015).
"Selective
Distal
Enhancer
Control
of
the
Mmp13
Gene
Identified
through
Clustered
Regularly
Interspaced
Short
Palindromic
Repeat
(CRISPR)
Genomic
Deletions."
J
Biol
Chem
290(17):
11093-11107.
Michailidou,
K.,
P.
Hall,
A.
Gonzalez-Neira,
M.
Ghoussaini,
J.
Dennis,
R.
L.
Milne,
M.
K.
Schmidt,
J.
Chang-Claude,
S.
E.
Bojesen,
M.
K.
Bolla,
Q.
Wang,
E.
Dicks,
A.
Lee,
C.
Turnbull,
N.
Rahman,
O.
Fletcher,
J.
Peto,
L.
Gibson,
I.
Dos
Santos
Silva,
H.
Nevanlinna,
T.
A.
Muranen,
K.
Aittomaki,
C.
Blomqvist,
K.
Czene,
A.
Irwanto,
J.
Liu,
Q.
Waisfisz,
H.
Meijers-Heijboer,
M.
Adank,
R.
B.
van
der
Luijt,
R.
Hein,
N.
Dahmen,
L.
Beckman,
A.
Meindl,
R.
K.
Schmutzler,
B.
Muller-Myhsok,
P.
Lichtner,
J.
L.
Hopper,
M.
C.
Southey,
E.
Makalic,
D.
F.
Schmidt,
A.
G.
Uitterlinden,
A.
Hofman,
D.
J.
Hunter,
S.
J.
Chanock,
D.
Vincent,
F.
Bacot,
D.
C.
Tessier,
S.
Canisius,
L.
F.
Wessels,
C.
A.
Haiman,
M.
Shah,
R.
Luben,
J.
Brown,
C.
Luccarini,
N.
Schoof,
K.
Humphreys,
J.
Li,
B.
G.
Nordestgaard,
S.
F.
Nielsen,
H.
Flyger,
F.
J.
Couch,
X.
Wang,
C.
Vachon,
K.
N.
Stevens,
D.
Lambrechts,
M.
Moisse,
R.
Paridaens,
M.
R.
Christiaens,
A.
Rudolph,
S.
Nickels,
D.
Flesch-Janys,
N.
Johnson,
Z.
Aitken,
K.
Aaltonen,
T.
Heikkinen,
A.
Broeks,
L.
J.
Veer,
C.
E.
van
der
Schoot,
P.
Guenel,
T.
Truong,
P.
Laurent-Puig,
F.
Menegaux,
F.
Marme,
A.
Schneeweiss,
C.
Sohn,
B.
Burwinkel,
M.
P.
Zamora,
J.
I.
Perez,
G.
Pita,
M.
R.
Alonso,
A.
Cox,
I.
W.
Brock,
S.
S.
Cross,
M.
W.
Reed,
E.
J.
Sawyer,
I.
Tomlinson,
M.
J.
Kerin,
N.
Miller,
B.
E.
Henderson,
F.
Schumacher,
L.
Le
Marchand,
I.
L.
Andrulis,
J.
A.
Knight,
G.
Glendon,
A.
M.
Mulligan,
A.
Lindblom,
S.
Margolin,
M.
J.
Hooning,
A.
Hollestelle,
A.
M.
van
den
Ouweland,
A.
Jager,
Q.
M.
Bui,
J.
Stone,
G.
S.
Dite,
C.
Apicella,
H.
Tsimiklis,
G.
G.
Giles,
G.
Severi,
L.
Baglietto,
P.
A.
Fasching,
L.
Haeberle,
A.
B.
Ekici,
M.
W.
Beckmann,
H.
Brenner,
H.
Muller,
V.
Arndt,
C.
Stegmaier,
A.
Swerdlow,
A.
Ashworth,
N.
Orr,
M.
Jones,
J.
Figueroa,
J.
Lissowska,
L.
Brinton,
M.
S.
Goldberg,
F.
Labreche,
M.
Dumont,
R.
Winqvist,
K.
Pylkas,
A.
Jukkola-Vuorinen,
M.
Grip,
H.
Brauch,
U.
Hamann,
T.
Bruning,
P.
Radice,
P.
Peterlongo,
S.
Manoukian,
B.
Bonanni,
P.
Devilee,
R.
A.
Tollenaar,
C.
Seynaeve,
C.
J.
van
Asperen,
A.
Jakubowska,
J.
Lubinski,
K.
Jaworska,
K.
Durda,
A.
Mannermaa,
V.
Kataja,
V.
M.
Kosma,
J.
M.
Hartikainen,
N.
V.
Bogdanova,
N.
N.
Antonenkova,
T.
Dork,
V.
N.
Kristensen,
H.
Anton-Culver,
S.
Slager,
A.
E.
Toland,
S.
Edge,
F.
Fostira,
D.
Kang,
K.
Y.
Yoo,
D.
Y.
Noh,
K.
Matsuo,
H.
Ito,
H.
Iwata,
A.
Sueta,
A.
H.
Wu,
C.
C.
Tseng,
D.
Van
Den
Berg,
D.
O.
Stram,
X.
O.
Shu,
W.
Lu,
Y.
T.
Gao,
H.
Cai,
S.
H.
Teo,
C.
H.
Yip,
S.
Y.
Phuah,
B.
K.
Cornes,
M.
Hartman,
H.
Miao,
W.
Y.
Lim,
J.
H.
Sng,
K.
Muir,
A.
Lophatananon,
S.
Stewart-Brown,
P.
Siriwanarangsan,
C.
Y.
Shen,
C.
N.
Hsiung,
P.
E.
Wu,
S.
L.
Ding,
S.
Sangrajrang,
V.
Gaborieau,
P.
Brennan,
J.
McKay,
W.
J.
Blot,
L.
B.
Signorello,
Q.
Cai,
W.
Zheng,
S.
Deming-Halverson,
M.
Shrubsole,
J.
Long,
J.
Simard,
M.
Garcia-Closas,
P.
D.
Pharoah,
G.
Chenevix-Trench,
A.
M.
Dunning,
J.
Benitez
and
D.
F.
Easton
(2013).
"Large-scale
genotyping
identifies
41
new
loci
associated
with
breast
cancer
risk."
Nat
Genet
45(4):
353-361,
361e351-352.
Musunuru,
K.,
A.
Strong,
M.
Frank-Kamenetsky,
N.
E.
Lee,
T.
Ahfeldt,
K.
V.
Sachs,
X.
Li,
H.
Li,
N.
Kuperwasser,
V.
M.
Ruda,
J.
P.
Pirruccello,
B.
Muchmore,
L.
Prokunina-Olsson,
J.
L.
Hall,
E.
E.
Schadt,
C.
R.
Morales,
S.
Lund-Katz,
M.
C.
Phillips,
J.
Wong,
W.
Cantley,
T.
Racie,
K.
G.
Ejebe,
M.
Orho-Melander,
O.
Melander,
V.
Koteliansky,
K.
Fitzgerald,
R.
M.
Krauss,
C.
A.
Cowan,
S.
Kathiresan
and
D.
J.
Rader
(2010).
"From
noncoding
variant
to
phenotype
via
SORT1
at
the
1p13
cholesterol
locus."
Nature
466(7307):
714-719.
Narendra,
V.,
P.
P.
Rocha,
D.
An,
R.
Raviram,
J.
A.
Skok,
E.
O.
Mazzoni
and
D.
Reinberg
(2015).
"Transcription.
CTCF
establishes
discrete
functional
chromatin
domains
at
the
Hox
clusters
during
differentiation."
Science
347(6225):
1017-1021.
Nichols,
M.
H.
and
V.
G.
Corces
(2015).
"A
CTCF
Code
for
3D
Genome
Architecture."
Cell
162(4):
703-705.
Nicolae,
D.
L.,
E.
Gamazon,
W.
Zhang,
S.
Duan,
M.
E.
Dolan
and
N.
J.
Cox
(2010).
"Trait-associated
SNPs
are
more
likely
to
be
eQTLs:
annotation
to
enhance
discovery
from
GWAS."
PLoS
Genet
6(4):
e1000888.
O'Geen,
H.,
L.
Echipare
and
P.
J.
Farnham
(2011).
"Using
ChIP-Seq
Technology
to
Generate
High-Resolution
Profiles
of
Histone
Modifications."
Methods
Mol
Biol
791:
265-286.
Onengut-Gumuscu,
S.,
W.
M.
Chen,
O.
Burren,
N.
J.
Cooper,
A.
R.
Quinlan,
J.
C.
Mychaleckyj,
E.
Farber,
J.
K.
Bonnie,
M.
Szpak,
E.
Schofield,
P.
Achuthan,
H.
Guo,
M.
D.
Fortune,
H.
Stevens,
N.
M.
Walker,
L.
D.
Ward,
A.
Kundaje,
M.
Kellis,
M.
J.
Daly,
J.
C.
Barrett,
J.
D.
Cooper,
P.
Deloukas,
J.
A.
Todd,
C.
Wallace,
P.
Concannon
and
S.
S.
Rich
(2015).
"Fine
mapping
of
type
1
diabetes
susceptibility
loci
and
evidence
for
colocalization
of
causal
variants
with
lymphoid
gene
enhancers."
Nat
Genet
47(4):
381-386.
Ong,
C.
T.
and
V.
G.
Corces
(2011).
"Enhancer
function:
new
insights
into
the
regulation
of
tissue-‐‑specific
gene
expression."
Nat
Rev
Genet
12(4):
283-293.
Ong,
C.
T.
and
V.
G.
Corces
(2014).
"CTCF:
an
architectural
protein
bridging
genome
topology
and
function."
Nat
Rev
Genet
15(4):
234-246.
Ong,
R.
T.,
X.
Wang,
X.
Liu
and
Y.
Y.
Teo
(2012).
"Efficiency
of
trans-ethnic
genome-wide
meta-analysis
and
fine-mapping."
Eur
J
Hum
Genet
20(12):
1300-1307.
Ongen,
H.,
C.
L.
Andersen,
J.
B.
Bramsen,
B.
Oster,
M.
H.
Rasmussen,
P.
G.
Ferreira,
J.
Sandoval,
E.
Vidal,
N.
Whiffin,
A.
Planchon,
I.
Padioleau,
D.
Bielser,
L.
Romano,
I.
Tomlinson,
R.
S.
Houlston,
M.
Esteller,
T.
F.
Orntoft
and
E.
T.
Dermitzakis
(2014).
"Putative
cis-regulatory
drivers
in
colorectal
cancer."
Nature
512(7512):
87-90.
Pai,
A.
A.,
J.
K.
Pritchard
and
Y.
Gilad
(2015).
"The
genetic
and
mechanistic
basis
for
variation
in
gene
regulation."
PLoS
Genet
11(1):
e1004857.
Palmiter,
R.
D.
and
R.
L.
Brinster
(1986).
"Germ-line
transformation
of
mice."
Annu
Rev
Genet
20:
465-499.
Panne,
D.
(2008).
"The
enhanceosome."
Curr
Opin
Structural
Biol
18:
236-242.
Paredes,
J.,
J.
Figueiredo,
A.
Albergaria,
P.
Oliveira,
J.
Carvalho,
A.
S.
Ribeiro,
J.
Caldeira,
A.
M.
Costa,
J.
Simoes-Correia,
M.
J.
Oliveira,
H.
Pinheiro,
S.
S.
Pinho,
R.
Mateus,
C.
A.
Reis,
M.
Leite,
M.
S.
Fernandes,
F.
Schmitt,
F.
Carneiro,
C.
Figueiredo,
C.
Oliveira
and
R.
Seruca
(2012).
"Epithelial
E-
and
P-cadherins:
role
and
clinical
significance
in
cancer."
Biochim
Biophys
Acta
1826(2):
297-311.
Petit,
F.,
A.
S.
Jourdain,
M.
Holder-Espinasse,
B.
Keren,
J.
Andrieux,
M.
Duterque-Coquillaud,
N.
Porchet,
S.
Manouvrier-Hanu
and
F.
Escande
(2015).
"The
disruption
of
a
novel
limb
cis-regulatory
element
of
SHH
is
associated
with
autosomal
dominant
preaxial
polydactyly-hypertrichosis."
Eur
J
Hum
Genet
March
18,
Epub
ahead
of
print.
Pomerantz,
M.
M.,
N.
Ahmadiyeh,
L.
Jia,
P.
Herman,
M.
P.
Verzi,
H.
Doddapaneni,
C.
A.
Beckwith,
J.
A.
Chan,
A.
Hills,
M.
Davis,
K.
Yao,
S.
M.
Kehoe,
H.
J.
Lenz,
C.
A.
Haiman,
C.
Yan,
B.
E.
Henderson,
B.
Frenkel,
J.
Barretina,
A.
Bass,
J.
Tabernero,
J.
Baselga,
M.
M.
Regan,
J.
R.
Manak,
R.
Shivdasani,
G.
A.
Coetzee
and
M.
L.
Freedman
(2009).
"The
8q24
cancer
risk
variant
rs6983267
shows
long-‐‑range
interaction
with
MYC
in
colorectal
cancer."
Nat
Genet
41(8):
882-884.
Prelich,
G.
(2012).
"Gene
overexpression:
uses,
mechanisms,
and
interpretation."
Genetics
190(3):
841-854.
Ramasamy,
A.,
D.
Trabzuni,
S.
Guelfi,
V.
Varghese,
C.
Smith,
R.
Walker,
T.
De,
L.
Coin,
R.
de
Silva,
M.
R.
Cookson,
A.
B.
Singleton,
J.
Hardy,
M.
Ryten
and
M.
E.
Weale
(2014).
"Genetic
variability
in
the
regulation
of
gene
expression
in
ten
regions
of
the
human
brain."
Nat
Neurosci
17(10):
1418-1428.
Raviram,
R.,
P.
P.
Rocha,
R.
Bonneau
and
J.
A.
Skok
(2014).
"Interpreting
4C-Seq
data:
how
far
can
we
go?"
Epigenomics
6(5):
455-457.
Rivera,
C.
M.
and
B.
Ren
(2013).
"Mapping
human
epigenomes."
Cell
155(1):
39-55.
Roadmap
Epigenomics
Consortium
(2015).
"Integrative
analysis
of
111
reference
human
epigenomes."
Nature
518(7539):
317-330.
Sagai,
T.,
M.
Hosoya,
Y.
Mizushina,
M.
Tamura
and
T.
Shiroishi
(2005).
"Elimination
of
a
long-range
cis-regulatory
module
causes
complete
loss
of
limb-specific
Shh
expression
and
truncation
of
the
mouse
limb."
Development
132(4):
797-803.
Sander,
J.
D.
and
J.
K.
Joung
(2014).
"CRISPR-Cas
systems
for
editing,
regulating
and
targeting
genomes."
Nat
Biotechnol
32(4):
347-355.
Sanyal,
A.,
B.
R.
Lajoie,
G.
Jain
and
J.
Dekker
(2012).
"The
long-range
interaction
landscape
of
gene
promoters."
Nature
489(7414):
109-113.
Schaub,
M.
A.,
A.
P.
Boyle,
A.
Kundaje,
S.
Batzoglou
and
M.
Snyder
(2012).
"Linking
disease
associations
with
regulatory
information
in
the
human
genome."
Genome
Res
22(9):
1748-1759.
Schmidt,
E.
M.,
J.
Zhang,
W.
Zhou,
J.
Chen,
K.
L.
Mohlke,
Y.
E.
Chen
and
C.
J.
Willer
(2015).
"GREGOR:
evaluating
global
enrichment
of
trait-associated
variants
in
epigenomic
features
using
a
systematic,
data-driven
approach."
Bioinformatics
31(16):
2601-2606.
Schork,
A.
J.,
W.
K.
Thompson,
P.
Pham,
A.
Torkamani,
J.
C.
Roddey,
P.
F.
Sullivan,
J.
R.
Kelsoe,
M.
C.
O'Donovan,
H.
Furberg,
N.
J.
Schork,
O.
A.
Andreassen
and
A.
M.
Dale
(2013).
"All
SNPs
are
not
created
equal:
genome-wide
association
studies
reveal
a
consistent
pattern
of
enrichment
among
functionally
annotated
SNPs."
PLoS
Genet
9(4):
e1003449.
Schwank,
G.,
B.
K.
Koo,
V.
Sasselli,
J.
F.
Dekkers,
I.
Heo,
T.
Demircan,
N.
Sasaki,
S.
Boymans,
E.
Cuppen,
C.
K.
van
der
Ent,
E.
E.
Nieuwenhuis,
J.
M.
Beekman
and
H.
Clevers
(2013).
"Functional
repair
of
CFTR
by
CRISPR/Cas9
in
intestinal
stem
cell
organoids
of
cystic
fibrosis
patients."
Cell
Stem
Cell
13(6):
653-658.
Segditsas,
S.
and
I.
Tomlinson
(2006).
"Colorectal
cancer
and
genetic
alterations
in
the
Wnt
pathway."
Oncogene
25(57):
7531-7537.
Sexton,
T.
and
G.
Cavalli
(2015).
"The
role
of
chromosome
domains
in
shaping
the
functional
genome."
Cell
160(6):
1049-1059.
Sigoillot,
F.
D.,
S.
Lyman,
J.
F.
Huckins,
B.
Adamson,
E.
Chung,
B.
Quattrochi
and
R.
W.
King
(2012).
"A
bioinformatics
method
identifies
prominent
off-targeted
transcripts
in
RNAi
screens."
Nat
Methods
9(4):
363-366.
Simonis,
M.,
J.
Kooren
and
W.
de
Laat
(2007).
"An
evaluation
of
3C-based
methods
to
capture
DNA
interactions."
Nat
Methods
4(11):
895-901.
Smemo,
S.,
J.
J.
Tena,
K.
H.
Kim,
E.
R.
Gamazon,
N.
J.
Sakabe,
C.
Gomez-Marin,
I.
Aneas,
F.
L.
Credidio,
D.
R.
Sobreira,
N.
F.
Wasserman,
J.
H.
Lee,
V.
Puviindran,
D.
Tam,
M.
Shen,
J.
E.
Son,
N.
A.
Vakili,
H.
K.
Sung,
S.
Naranjo,
R.
D.
Acemel,
M.
Manzanares,
A.
Nagy,
N.
J.
Cox,
C.
C.
Hui,
J.
L.
Gomez-Skarmeta
and
M.
A.
Nobrega
(2014).
"Obesity-associated
variants
within
FTO
form
long-range
functional
connections
with
IRX3."
Nature
507(7492):
371-375.
Spain,
S.
L.
and
J.
C.
Barrett
(2015).
"Strategies
for
fine-mapping
complex
traits."
Hum
Mol
Genet
24(R1):
R111-119.
Spitz,
F.
and
E.
E.
Furlong
(2012).
"Transcription
factors:
from
enhancer
binding
to
developmental
control."
Nat
Rev
Genet
13(9):
613-626.
Splinter,
E.,
H.
Heath,
J.
Kooren,
R.-J.
Palstra,
P.
Klous,
F.
Grosveld,
N.
Galjart
and
W.
de
Laat
(2006).
"CTCF
mediates
long-range
chromatin
looping
and
local
histone
modification
in
the
beta-globin
locus."
Genes
Dev
20(17):
2349-2354.
Stergachis,
A.
B.,
E.
Haugen,
A.
Shafer,
W.
Fu,
B.
Vernot,
A.
Reynolds,
A.
Raubitschek,
S.
Ziegler,
E.
M.
LeProust,
J.
M.
Akey
and
J.
A.
Stamatoyannopoulos
(2013).
"Exonic
transcription
factor
binding
directs
codon
choice
and
affects
protein
evolution."
Science
342:
1367-1372.
Sur,
I.
K.,
O.
Hallikas,
A.
Vaharautio,
J.
Yan,
M.
Turunen,
M.
Enge,
M.
Taipale,
A.
Karhu,
L.
A.
Aaltonen
and
J.
Taipale
(2012).
"Mice
lacking
a
Myc
enhancer
that
includes
human
SNP
rs6983267
are
resistant
to
intestinal
tumors."
Science
338(6112):
1360-1363.
The
Cancer
Genome
Atlas
(2012).
"Comprehensive
molecular
characterization
of
human
colon
and
rectal
cancer."
Nature
487(7407):
330-337.
Trapnell,
C.,
L.
Pachter
and
S.
L.
Salzberg
(2009).
"TopHat:
discovering
splice
junctions
with
RNA-Seq."
Bioinformatics
25(9):
1105-1111.
Trapnell,
C.,
B.
A.
Williams,
G.
Pertea,
A.
Mortazavi,
G.
Kwan,
M.
J.
van
Baren,
S.
L.
Salzberg,
B.
J.
Wold
and
L.
Pachter
(2010).
"Transcript
assembly
and
quantification
by
RNA-Seq
reveals
unannotated
transcripts
and
isoform
switching
during
cell
differentiation."
Nat
Biotechnol
28(5):
511-515.
Trynka,
G.,
K.
A.
Hunt,
N.
A.
Bockett,
J.
Romanos,
V.
Mistry,
A.
Szperl,
S.
F.
Bakker,
M.
T.
Bardella,
L.
Bhaw-Rosun,
G.
Castillejo,
E.
G.
de
la
Concha,
R.
C.
de
Almeida,
K.
R.
Dias,
C.
C.
van
Diemen,
P.
C.
Dubois,
R.
H.
Duerr,
S.
Edkins,
L.
Franke,
K.
Fransen,
J.
Gutierrez,
G.
A.
Heap,
B.
Hrdlickova,
S.
Hunt,
L.
Plaza
Izurieta,
V.
Izzo,
L.
A.
Joosten,
C.
Langford,
M.
C.
Mazzilli,
C.
A.
Mein,
V.
Midah,
M.
Mitrovic,
B.
Mora,
M.
Morelli,
S.
Nutland,
C.
Nunez,
S.
Onengut-‐‑Gumuscu,
K.
Pearce,
M.
Platteel,
I.
Polanco,
S.
Potter,
C.
Ribes-‐‑Koninckx,
I.
Ricano-‐‑Ponce,
S.
S.
Rich,
A.
Rybak,
J.
L.
Santiago,
S.
Senapati,
A.
Sood,
H.
Szajewska,
R.
Troncone,
J.
Varade,
C.
Wallace,
V.
M.
Wolters,
A.
Zhernakova,
B.
K.
Thelma,
B.
Cukrowska,
E.
Urcelay,
J.
R.
Bilbao,
M.
L.
Mearin,
D.
Barisani,
J.
C.
Barrett,
V.
Plagnol,
P.
Deloukas,
C.
Wijmenga
and
D.
A.
van
Heel
(2011).
"Dense
genotyping
identifies
and
localizes
multiple
common
and
rare
variant
association
signals
in
celiac
disease."
Nat
Genet
43(12):
1193-‐‑1201.
Tseng, Y. Y., B. S. Moriarity, W. Gong, R. Akiyama, A. Tiwari, H. Kawakami, P. Ronning, B. Reuland, K. Guenther, T. C. Beadnell, J. Essig, G. M. Otto, M. G. O'Sullivan, D. A. Largaespada, K. L. Schwertfeger, Y. Marahrens, Y. Kawakami and A. Bagchi (2014). "PVT1 dependence in cancer with MYC copy-number increase." Nature 512(7512): 82-86.
van de Werken, H. J., P. J. de Vree, E. Splinter, S. J. Holwerda, P. Klous, E. de Wit and W. de Laat (2012). "4C technology: protocols and data analysis." Methods Enzymol 513: 89-112.
van de Werken, H. J., G. Landan, S. J. Holwerda, M. Hoichman, P. Klous, R. Chachik, E. Splinter, C. Valdes-Quezada, Y. Oz, B. A. Bouwman, M. J. Verstegen, E. de Wit, A. Tanay and W. de Laat (2012). "Robust 4C-seq data analysis to screen for regulatory DNA interactions." Nat Methods 9(10): 969-972.
Vierstra, J., A. Reik, K. H. Chang, S. Stehling-Sun, Y. Zhou, S. J. Hinkley, D. E. Paschon, L. Zhang, N. Psatha, Y. R. Bendana, C. M. O'Neil, A. H. Song, A. K. Mich, P. Q. Liu, G. Lee, D. E. Bauer, M. C. Holmes, S. H. Orkin, T. Papayannopoulou, G. Stamatoyannopoulos, E. J. Rebar, P. D. Gregory, F. D. Urnov and J. A. Stamatoyannopoulos (2015). "Functional footprinting of regulatory DNA." Nat Methods.
Vietri Rudan, M., C. Barrington, S. Henderson, C. Ernst, D. T. Odom, A. Tanay and S. Hadjur (2015). "Comparative Hi-C Reveals that CTCF Underlies Evolution of Chromosomal Domain Architecture." Cell Rep 10(8): 1297-1309.
Voight, B. F., H. M. Kang, J. Ding, C. D. Palmer, C. Sidore, P. S. Chines, N. P. Burtt, C. Fuchsberger, Y. Li, J. Erdmann, T. M. Frayling, I. M. Heid, A. U. Jackson, T. Johnson, T. O. Kilpelainen, C. M. Lindgren, A. P. Morris, I. Prokopenko, J. C. Randall, R. Saxena, N. Soranzo, E. K. Speliotes, T. M. Teslovich, E. Wheeler, J. Maguire, M. Parkin, S. Potter, N. W. Rayner, N. Robertson, K. Stirrups, W. Winckler, S. Sanna, A. Mulas, R. Nagaraja, F. Cucca, I. Barroso, P. Deloukas, R. J. Loos, S. Kathiresan, P. B. Munroe, C. Newton-Cheh, A. Pfeufer, N. J. Samani, H. Schunkert, J. N. Hirschhorn, D. Altshuler, M. I. McCarthy, G. R. Abecasis and M. Boehnke (2012). "The metabochip, a custom genotyping array for genetic studies of metabolic, cardiovascular, and anthropometric traits." PLoS Genet 8(8): e1002793.
Wang, H., H. Yang, C. S. Shivalila, M. M. Dawlaty, A. W. Cheng, F. Zhang and R. Jaenisch (2013). "One-step generation of mice carrying mutations in multiple genes by CRISPR/Cas-mediated genome engineering." Cell 153(4): 910-918.
Wang, J., J. Zhuang, S. Iyer, X. Y. Lin, M. C. Greven, B. H. Kim, J. Moore, B. G. Pierce, X. Dong, D. Virgil, E. Birney, J. H. Hung and Z. Weng (2013). "Factorbook.org: a Wiki-based database for transcription factor-binding data generated by the ENCODE consortium." Nucleic Acids Res 41(Database issue): D171-176.
Ward, L. D. and M. Kellis (2012). "HaploReg: a resource for exploring chromatin states, conservation, and regulatory motif alterations within sets of genetically linked variants." Nucleic Acids Res 40(Database issue): D930-934.
Webster, D. E., B. Barajas, R. T. Bussat, K. J. Yan, P. H. Neela, R. J. Flockhart, J. Kovalski, A. Zehnder and P. A. Khavari (2014). "Enhancer-targeted genome editing selectively blocks innate resistance to oncokinase inhibition." Genome Res.
Welter, D., J. MacArthur, J. Morales, T. Burdett, P. Hall, H. Junkins, A. Klemm, P. Flicek, T. Manolio, L. Hindorff and H. Parkinson (2014). "The NHGRI GWAS Catalog, a curated resource of SNP-trait associations." Nucleic Acids Res 42(Database issue): D1001-1006.
Westra, H. J., M. J. Peters, T. Esko, H. Yaghootkar, C. Schurmann, J. Kettunen, M. W. Christiansen, B. P. Fairfax, K. Schramm, J. E. Powell, A. Zhernakova, D. V. Zhernakova, J. H. Veldink, L. H. Van den Berg, J. Karjalainen, S. Withoff, A. G. Uitterlinden, A. Hofman, F. Rivadeneira, P. A. t Hoen, E. Reinmaa, K. Fischer, M. Nelis, L. Milani, D. Melzer, L. Ferrucci, A. B. Singleton, D. G. Hernandez, M. A. Nalls, G. Homuth, M. Nauck, D. Radke, U. Volker, M. Perola, V. Salomaa, J. Brody, A. Suchy-Dicey, S. A. Gharib, D. A. Enquobahrie, T. Lumley, G. W. Montgomery, S. Makino, H. Prokisch, C. Herder, M. Roden, H. Grallert, T. Meitinger, K. Strauch, Y. Li, R. C. Jansen, P. M. Visscher, J. C. Knight, B. M. Psaty, S. Ripatti, A. Teumer, T. M. Frayling, A. Metspalu, J. B. van Meurs and L. Franke (2013). "Systematic identification of trans eQTLs as putative drivers of known disease associations." Nat Genet 45(10): 1238-1243.
Williamson, I., S. Berlivet, R. Eskeland, S. Boyle, R. S. Illingworth, D. Paquette, J. Dostie and W. A. Bickmore (2014). "Spatial genome organization: contrasting views from chromosome conformation capture and fluorescence in situ hybridization." Genes Dev 28(24): 2778-2791.
Wright, J. B., S. J. Brown and M. D. Cole (2010). "Upregulation of c-MYC in cis through a large chromatin loop linked to a cancer risk-associated single-nucleotide polymorphism in colorectal cancer cells." Mol Cell Biol 30(6): 1411-1420.
Xiang, J. F., Q. F. Yin, T. Chen, Y. Zhang, X. O. Zhang, Z. Wu, S. Zhang, H. B. Wang, J. Ge, X. Lu, L. Yang and L. L. Chen (2014). "Human colorectal cancer-specific CCAT1-L lncRNA regulates long-range chromatin interactions at the MYC locus." Cell Res 24(5): 513-531.
Xue, W., S. Chen, H. Yin, T. Tammela, T. Papagiannakopoulos, N. S. Joshi, W. Cai, G. Yang, R. Bronson, D. G. Crowley, F. Zhang, D. G. Anderson, P. A. Sharp and T. Jacks (2014). "CRISPR-mediated direct mutation of cancer genes in the mouse liver." Nature 514(7522): 380-384.
Yang, H., H. Wang, C. S. Shivalila, A. W. Cheng, L. Shi and R. Jaenisch (2013). "One-step generation of mice carrying reporter and conditional alleles by CRISPR/Cas-mediated genome engineering." Cell 154(6): 1370-1379.
Yao, L., B. P. Berman and P. J. Farnham (2015). "Demystifying the secret mission of enhancers: linking distal regulatory elements to target genes." Crit Rev Biochem Mol Biol In Press.
Yao, L., H. Shen, P. W. Laird, P. J. Farnham and B. P. Berman (2015). "Inferring regulatory element landscapes and transcription factor networks from cancer methylomes." Genome Biol 16: 105.
Yao, L., Y. G. Tak, B. P. Berman and P. J. Farnham (2014). "Functional annotation of colon cancer risk SNPs." Nat Commun 5: 5114.
Ye, H., A. Zhou, Q. Hong, X. Chen, Y. Xin, L. Tang, D. Dai, H. Ji, M. Xu, D. W. Wang and S. Duan (2015). "Association of seven thrombotic pathway gene CpG-SNPs with coronary heart disease." Biomed Pharmacother 72: 98-102.
Yin, H., W. Xue, S. Chen, R. L. Bogorad, E. Benedetti, M. Grompe, V. Koteliansky, P. A. Sharp, T. Jacks and D. G. Anderson (2014). "Genome editing with Cas9 in adult mice corrects a disease mutation and phenotype." Nat Biotechnol 32(6): 551-553.
Zalatan, J. G., M. E. Lee, R. Almeida, L. A. Gilbert, E. H. Whitehead, M. La Russa, J. C. Tsai, J. S. Weissman, J. E. Dueber, L. S. Qi and W. A. Lim (2015). "Engineering complex synthetic transcriptional programs with CRISPR RNA scaffolds." Cell 160(1-2): 339-350.
Zanke, B. W., C. M. Greenwood, J. Rangrej, R. Kustra, A. Tenesa, S. M. Farrington, J. Prendergast, S. Olschwang, T. Chiang, E. Crowdy, V. Ferretti, P. Laflamme, S. Sundararajan, S. Roumy, J. F. Olivier, F. Robidoux, R. Sladek, A. Montpetit, P. Campbell, S. Bezieau, A. M. O'Shea, G. Zogopoulos, M. Cotterchio, P. Newcomb, J. McLaughlin, B. Younghusband, R. Green, J. Green, M. E. Porteous, H. Campbell, H. Blanche, M. Sahbatou, E. Tubacher, C. Bonaiti-Pellie, B. Buecher, E. Riboli, S. Kury, S. J. Chanock, J. Potter, G. Thomas, S. Gallinger, T. J. Hudson and M. G. Dunlop (2007). "Genome-wide association scan identifies a colorectal cancer susceptibility locus on chromosome 8q24." Nat Genet 39(8): 989-994.
Zentner, G. E. and P. C. Scacheri (2012). "The chromatin fingerprint of gene enhancer elements." J Biol Chem 287(37): 30888-30896.
Zhang, X., R. Cowper-Sal·lari, S. D. Bailey, J. H. Moore and M. Lupien (2012). "Integrative functional genomics identifies an enhancer looping to the SOX9 gene disrupted by the 17q24.3 prostate cancer risk locus." Genome Res 22(8): 1437-1446.
Zhang, X., A. D. Johnson, A. E. Hendricks, S. J. Hwang, K. Tanriverdi, S. K. Ganesh, N. L. Smith, P. A. Peyser, J. E. Freedman and C. J. O'Donnell (2014). "Genetic associations with expression for genes implicated in GWAS studies for atherosclerotic cardiovascular disease and blood phenotypes." Hum Mol Genet 23(3): 782-795.
Zhang, Y., Y. H. Lin, T. D. Johnson, L. S. Rozek and M. A. Sartor (2014). "PePr: a peak-calling prioritization pipeline to identify consistent or differential peaks from replicated ChIP-Seq data." Bioinformatics 30(18): 2568-2575.
Zhong, H., J. Beaulaurier, P. Y. Lum, C. Molony, X. Yang, D. J. Macneil, D. T. Weingarth, B. Zhang, D. Greenawalt, R. Dobrin, K. Hao, S. Woo, C. Fabre-Suver, S. Qian, M. R. Tota, M. P. Keller, C. M. Kendziorski, B. S. Yandell, V. Castro, A. D. Attie, L. M. Kaplan and E. E. Schadt (2010). "Liver and adipose expression associated SNPs are enriched for association to type 2 diabetes." PLoS Genet 6(5): e1000932.
Zhou, H. Y., Y. Katsman, N. K. Dhaliwal, S. Davidson, N. N. Macpherson, M. Sakthidevi, F. Collura and J. A. Mitchell (2014). "A Sox2 distal enhancer cluster regulates embryonic stem cell differentiation potential." Genes Dev 28(24): 2699-2711.
Abstract
Genome-wide association studies (GWAS) have identified SNPs that are statistically associated with diseases. Interestingly, most GWAS-identified SNPs, and the SNPs in high linkage disequilibrium (LD) with them, are found in non-coding regions of the genome that are decorated by regulatory elements (enhancers, promoters, and nuclear structure-associated elements). This led to my general hypothesis that disease-associated SNPs affect the expression of disease-associated genes through the function of the regulatory elements in which the SNPs are located. Among regulatory elements, disease-associated SNPs are enriched in enhancers, which play an important role in regulating cell type-specific gene expression. However, enhancers do not necessarily regulate nearby genes. Using colorectal cancer (CRC) GWAS data and epigenomic information for the active enhancer mark H3K27Ac, I identified enhancers harboring CRC risk-associated SNPs. Using the genome-editing tool CRISPR/Cas9, I deleted several CRC risk-associated enhancers, along with control regions, and performed RNA-seq assays to identify putative CRC risk-associated genes. The putative target genes of each enhancer were then assessed with a chromosomal looping assay (4C), confirming direct physical interactions between enhancers and promoters. Deletion of one of the CRC risk-associated enhancers (E7) led to decreased expression of the MYC oncogene and reduced proliferation of HCT116 cells. Interestingly, deletion of the E7 region in HEK293 cells caused a downregulation of MYC and a reduction in cell proliferation similar to those seen in HCT116 cells, even though the H3K27Ac mark is not present at this region in HEK293 cells. In conclusion, by identifying genes regulated by CRC risk-associated enhancers harboring SNPs, I have developed a general approach to connect risk loci to putative risk genes.
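As an illustration of the enhancer-identification step summarized in the abstract (finding H3K27Ac-marked regions that harbor risk-associated SNPs), the following is a minimal Python sketch of an interval-overlap lookup. It is not the pipeline used in the dissertation: the function names, the SNP identifiers, the peak names, and all coordinates are hypothetical, and the sketch assumes 0-based half-open (BED-style) coordinates and peaks that do not overlap one another on the same chromosome.

```python
from bisect import bisect_right
from collections import defaultdict


def index_peaks(peaks):
    """Group (chrom, start, end, name) peaks by chromosome, sorted by start."""
    by_chrom = defaultdict(list)
    for chrom, start, end, name in peaks:
        by_chrom[chrom].append((start, end, name))
    for intervals in by_chrom.values():
        intervals.sort()
    return by_chrom


def enhancers_with_risk_snps(snps, peaks):
    """Map each peak name to the ids of the SNPs located inside it.

    snps  : iterable of (snp_id, chrom, pos), pos 0-based
    peaks : iterable of (chrom, start, end, name), half-open [start, end)
    Assumes peaks on the same chromosome do not overlap each other.
    """
    by_chrom = index_peaks(peaks)
    starts = {c: [s for s, _, _ in iv] for c, iv in by_chrom.items()}
    hits = defaultdict(list)
    for snp_id, chrom, pos in snps:
        if chrom not in by_chrom:
            continue
        # Index of the last peak whose start is at or before the SNP position.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0:
            start, end, name = by_chrom[chrom][i]
            if start <= pos < end:
                hits[name].append(snp_id)
    return dict(hits)


# Toy data (hypothetical coordinates, not taken from the dissertation).
peaks = [
    ("chr8", 1000, 2000, "E1"),
    ("chr8", 5000, 6500, "E2"),
    ("chr11", 300, 900, "E3"),
]
snps = [
    ("snp_a", "chr8", 1500),   # inside E1
    ("snp_b", "chr8", 2000),   # at E1's exclusive end, so outside
    ("snp_c", "chr8", 6499),   # last base of E2
    ("snp_d", "chr11", 100),   # upstream of E3
    ("snp_e", "chr2", 50),     # chromosome with no peaks
]
print(enhancers_with_risk_snps(snps, peaks))
```

In practice the SNP list would typically include both the GWAS index SNPs and the SNPs in high LD with them, and the peaks would come from an H3K27Ac ChIP-seq peak caller; a genome-arithmetic tool would usually replace this hand-rolled lookup for real data sets.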
Asset Metadata
Creator
Tak, Yu Gyoung (Esther) (author)
Core Title
Functional characterization of colon cancer risk-associated enhancers: connecting risk loci to risk genes
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Genetic, Molecular and Cellular Biology
Publication Date
02/17/2016
Defense Date
10/09/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
colon cancer, GWAS, OAI-PMH Harvest, risk genes, risk-associated enhancers
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Laird-Offringa, Ite A. (committee chair), Coetzee, Gerry A. (committee member), Farnham, Peggy J. (committee member)
Creator Email
esther.ygt@gmail.com,ytak@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-209388
Unique identifier
UC11279169
Identifier
etd-TakYuGyoun-4110.pdf (filename),usctheses-c40-209388 (legacy record id)
Legacy Identifier
etd-TakYuGyoun-4110.pdf
Dmrecord
209388
Document Type
Dissertation
Rights
Tak, Yu Gyoung (Esther)
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA