       Functional Characterization of colon cancer risk-associated enhancers:
                                       connecting risk loci to risk genes

                                                                  By

Yu Gyoung Tak (Esther)

B.S. (Seoul National University) 2007

DISSERTATION  
Submitted in partial satisfaction of the requirements for the degree of  
DOCTOR OF PHILOSOPHY  
in  
Genetics, Molecular and Cellular Biology
in the
FACULTY OF GRADUATE SCHOOL
of the
UNIVERSITY OF SOUTHERN CALIFORNIA  
Approved:



______________________________                                                      
(Peggy J. Farnham, Ph.D.), Mentor


                                         ______________________________  
                                           (Ite A Laird-Offringa, Ph.D.), Chair                          


                                         ______________________________  
                                                   (Gerry A. Coetzee, Ph.D)


Committee in Charge
MAY, 2016




 Table of contents

Table of contents ............................................................................................................. 2
Abstract ............................................................................................................................ 6
Acknowledgements ......................................................................................................... 7
List of Figures .................................................................................................................. 9
List of Tables ................................................................................................................. 10
List of Supplementary Figures ..................................................................................... 11
Chapter 1. Introduction ............................................................................................... 13
1.1 The GWAS conundrum ..................................................................................................... 13
1.2 Prioritization of SNPs associated with a specific disease using functional  
annotation ............................................................................................................................... 18
1.3 Prioritization of SNPs associated with a specific disease by linking to gene
expression ............................................................................................................................... 26
1.4 Experimental approaches to identify target genes of regulatory and eQTL SNPs .... 29
1.4.1 Deletion of regulatory elements harboring prioritized SNPs ........................................ 31
1.4.2 Epigenetic modification of regulatory elements that harbor prioritized risk SNPs ....... 34
1.4.3 Specific targeting of prioritized SNPs .......................................................................... 35
1.5 Disease-related functional analyses ............................................................................... 38
 Chapter 2: Functional annotation of colon cancer risk SNPs ................................. 43
2.1 Abstract ............................................................................................................................. 43
2.2 Introduction ....................................................................................................................... 44

2.3 Results ............................................................................................................................... 46
2.3.1 CRC risk-associated SNPs linked to a specific gene .................................................. 46
2.3.2 CRC risk-associated SNPs in distal regulatory regions ............................................... 51
2.3.3 Effects of SNPs on binding motifs in the distal elements ............................................ 56
2.3.4 Expression analysis of candidate risk-associated genes ............................................ 57
2.3.5 The effect of enhancer deletion on the transcriptome ................................................. 62
2.4 Discussion ........................................................................................................................ 63
2.5 Methods ............................................................................................................................. 68
2.5.1 RNA-seq ...................................................................................................................... 68
2.5.2 ChIP-seq analysis ........................................................................................................ 69
2.5.3 Enhancer deletion ........................................................................................................ 69
2.5.4 Analysis of FunciSNP and correlated SNPs effects .................................................... 70
2.5.5 Batch effects analysis .................................................................................................. 70
2.5.6 eQTL analyses ............................................................................................................ 72
2.5.7 General data handling and visualization ...................................................................... 73
2.6 Supplementary figures for Chapter 2 ............................................................................... 74
Chapter 3: Effects on the transcriptome upon deletion of distal elements are  
 not correlated with the size of H3K27Ac peaks in human cells .............................. 84
3.1 Abstract ............................................................................................................................. 84
3.2 Introduction ....................................................................................................................... 85
3.3 Results ............................................................................................................................... 86
3.3.1 Deletion of CRC risk-associated enhancers can cause widespread effects on the
transcriptome ........................................................................................................................ 86
3.3.2 The size or presence of an H3K27Ac peak does not correlate with enhancer activity 94

3.3.3 Characterization of the genome-wide changes in gene expression upon enhancer
deletion ................................................................................................................................. 97
3.3.4 Cell growth is affected by deletion of enhancer E7 ..................................................... 98
3.3.5 4C analysis of E7 and E24 in HCT116 ...................................................................... 100
3.4 Discussion ...................................................................................................................... 102
3.5 Methods ........................................................................................................................... 107
3.5.1 Cell culture ................................................................................................................. 107
3.5.2 CRISPR/Cas9-mediated genome editing .................................................................. 107
3.5.3 PCR detection of cells having enhancer deletions .................................................... 108
3.5.4 RNA-seq .................................................................................................................... 108
3.5.5 Cell proliferation assays ............................................................................................ 110
3.5.6 Colony forming assays .............................................................................................. 110
3.5.7 ChIP-seq and analysis ............................................................................................... 110
3.5.8 4C-seq and analysis .................................................................................................. 111
3.6 Supplementary figures for Chapter 3 ........................................................................... 113
Chapter 4: Conclusions and Future Studies ........................................................... 122
4.1 Summary of main findings ............................................................................................ 122
4.2 Future directions ............................................................................................................ 123
4.2.1 Epigenome editing of risk-associated enhancers ...................................................... 123
4.2.2 Targeting combinatorial enhancers .......................................................................... 125
4.2.3 Targeting SNP using HDR mechanism .................................................................... 125
4.2.4 Disease-related functional assays ........................................................................... 125
Appendix A: Publications as a contributing author ................................................. 127
Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the  
genome by association with GATA3 .................................................................................. 127

Appendix B: Supplementary data files in DVD ......................................................... 146
References ................................................................................................................... 148

                                              Abstract

 Genome-wide association studies (GWASs) have identified SNPs that are statistically
associated with diseases. Interestingly, most of the GWAS-identified SNPs and their high LD
SNPs are found in non-coding regions of the genome, decorated by regulatory elements
(enhancers, promoters, and nuclear structure-associated elements). This led to my general
hypothesis that disease-associated SNPs affect the expression of disease-associated genes
through the function of regulatory elements in which the SNPs are located.  Among regulatory
elements, disease-associated SNPs are enriched in enhancers, which play an important role in
regulating cell-type specific gene expression. However, enhancers do not necessarily regulate
nearby genes. Using colorectal cancer (CRC) GWAS and epigenomic information for the active
enhancer mark (H3K27Ac), I identified enhancers harboring CRC risk-associated SNPs.
Employing a genome-editing tool (CRISPR/Cas9), I deleted several CRC risk-associated
enhancers along with control regions and performed RNA-seq assays to identify putative CRC
risk-associated genes. The putative target genes of each enhancer were assessed by
chromosomal looping assay (4C), confirming direct physical interactions between enhancers and
promoters. Deletion of one of the CRC risk-associated enhancers (E7) led to decreased
expression of the MYC oncogene and reduced proliferation of HCT116 cells. Interestingly,
deletion of the E7 region in HEK293 cells caused a similar downregulation of MYC and reduced
cell proliferation as in HCT116, even though the H3K27Ac mark is not present in HEK293 cells. In
conclusion, by identifying genes regulated by CRC risk-associated enhancers harboring SNPs, I
have developed a general approach to connect risk loci to putative risk genes.

 


                                  Acknowledgements

 
I would like to thank my mentor, Professor Peggy Farnham, for being my role model as
a scientist and a teacher during these past five years. Peggy is someone every student would like as a
mentor. She is one of the smartest people that I know. She has always guided me to become a
better researcher through many insightful discussions and suggestions. I am deeply thankful
that her office door is always open, which allows me to see her whenever I have issues. She is
always prepared to answer any questions and I always receive great feedback with positive
energy. In addition to scientific discussions, she showed me how to enjoy life through lab parties
and baseball games. I will never forget the several conferences that all lab members attended,
which was possible with Peggy’s great support. Not only did I have the opportunity to learn how to
make and present a poster at these conferences, which gave me many opportunities to
communicate with other scientists, but I also had lots of fun with lab members in Italy and Santa Fe.
There are many other things that I appreciate Peggy for that are not described here, and I
really hope that I can be as lively, enthusiastic, insightful, and generous as Peggy when I am a
mentor. Peggy, thank you very much for being my mentor!!
I am also very grateful to my committee members, Professor Ite A. Laird-Offringa and
Professor Gerry Coetzee. When I was wondering about subjects for my Ph.D. work in my first
year, Ite guided me towards epigenetics during my first rotation. Her kindness and warm
attention helped me to adjust to Ph.D. life. I also have to thank Gerry for helping me think differently
about aspects of my experiments by providing lots of important questions and suggestions.
I thank all the past and present members of Peggy’s group. I learned a lot about
ChIP-seq from Seth Frietze and Heather Witt. Seth was a great teacher when I started my Ph.D.
work in Peggy’s lab. Matt Grimmer and Adam Blattler were great senior grad students, giving me
lots of advice. Malaina and Lijing were amazing friends, and I enjoyed having scientific
discussions with them, and also having fun inside and outside of the lab. Yuli Hung was a great
Master's student, and I enjoyed teaching her and talking to her about science. I enjoyed our PET
meetings with Yu Guo, Hang Yang, and Zhiefe Luo, which helped me to learn skills to become a
mentor. I also appreciate Suhn Kyong Rhie, Fides Lay, and Shannon Wood for their scientific
suggestions. I thank Charlie Nicolet, Helen Truong, and Selene Tyndale for their efforts to
provide me with the best quality sequencing data. I am also grateful that I have always seen Charlie
and Peggy’s happy family life. I thank Vicky Yamamoto, who always smiles and works really hard
till late at night, for providing me company, listening to my stories, and giving me suggestions. I felt like
she was my older sister.
I also want to give special thanks to my former advisor in Korea, Sung Jin Kim, who
treated me as his daughter, providing advice and help for my work, health, and my family.
I also want to give thanks to my spiritual family in the USA and my real family in Korea.
Without their love and help, I could not have finished my Ph.D.

                                     List of Figures

 

 
Figure 1.1 Making sense of GWAS: an overview ............................................................ 17
Figure 2.1 Identification of potential functional SNPs for CRC. ....................................... 49
Figure 2.2 Expression of risk-associated genes in colon cells. ....................................... 54
Figure 2.3 Linking a transcript to an enhancer using TCGA data. .................................. 61
Figure 2.4 Identification of genes affected by deletion of enhancer 7. ............................ 63
Figure 2.5 Summary of identified candidate genes correlated with increased risk for
CRC. ................................................................................................................................ 64
Figure 3.1 Genomic location and H3K27Ac profiles of E7, E24, 18qE, and 18qNE. ...... 88
Figure 3.2 Experimental schema. .................................................................................... 89
Figure 3.3 Gene expression and H3K27Ac changes upon deletion of E7 in HCT116  
cells. ................................................................................................................................ 91
Figure 3.4 Gene expression and H3K27Ac changes upon deletion of E24 in HCT116
cells. ................................................................................................................................ 93
Figure 3.5 H3K27Ac, TCF7L2, and CTCF binding profiles of E7, E24, 18qE, and
18qNE. ........................................................................................................................... 95
Figure 3.6 Gene expression and H3K27Ac changes upon deletion of E7 in HEK293
cells. ................................................................................................................................ 96
Figure 3.7 Proliferation is affected by deletion of enhancer 7. ...................................... 100
Figure 3.8 4C analysis of enhancer E7 and E24. ......................................................... 102
Figure 4.1 Repressed CRC risk enhancer E21 ............................................................. 124


                                          List of Tables
Table 1.1 Publicly available functional annotation programs. ............................................ 25
Table 1.2 Sources of eQTL databases ............................................................................... 29
Table 2.1 Summary of regions linked to CRC tag SNPs. ................................................... 50
Table 2.2 Expressed transcripts directly linked to CRC index SNPs ..................................... 51
Table 2.3 Distal regulatory regions correlated with CRC tag SNPs ................................... 55
Table 2.4 Effects of SNPs on motifs in the distal regulatory regions. ................................. 57
Table 2.5 Linking transcripts to enhancers using TCGA data. ........................................... 60
Table 3.1 Altered gene expression upon enhancer deletion. ............................................. 94

                        List of Supplementary Figures
Supplementary figure 2.1 Correlated exon SNPs. .......................................................... 74
Supplementary figure 2.2 Analysis of RNA-seq data. ..................................................... 75
Supplementary figure 2.3 Correlated TSS SNPs. ........................................................... 76
Supplementary figure 2.4 ChIP-seq peak analysis. ........................................................ 77
Supplementary figure 2.5 Correlated enhancer SNPs. ................................................... 78
Supplementary figure 2.6 TCGA batch effects analysis. RNA-seq batch effects. ........... 79
Supplementary figure 2.7 TCGA batch effects analysis. CNV (GW SNP6 array) batch
effects. ............................................................................................................................. 80
Supplementary figure 2.8 TCGA batch effects analysis. DNA methylation (Infinium
HM450K microarray) batch effects.   ............................................................................... 81
Supplementary figure 2.9 Expression analysis of genes identified by promoter and  
exon SNPs and potential enhancer target genes in TCGA samples.   ............................ 82
Supplementary figure 2.10 eQTL analysis summary.   .................................................... 83
Supplementary figure 3.1 Guide RNA sequences. ........................................................ 113
Supplementary figure 3.2 Confirmation of enhancer deletions. .................................... 114
Supplementary figure 3.3 List of datasets. .................................................................... 115
Supplementary figure 3.4 PCA plots of RNA-seq data. ................................................. 116
Supplementary figure 3.5 Circos plots for top downregulated genes. ........................... 117
Supplementary figure 3.6 Comparison of TCF7L2 and CTCF binding patterns upon  
E7 deletion in HCT116. ................................................................................................. 118
Supplementary figure 3.7 .............................................................................................. 120

Supplementary figure 3.8 Proliferation assays. ............................................................. 121
Supplementary figure 3.9 4C-seq information. .............................................................. 121

Chapter 1. Introduction

The following introductory chapter is being prepared for submission as a review article entitled “Making
Sense of GWAS: using epigenomics and genome engineering to understand the functional relevance of
SNPs in non-coding regions of the human genome” for publication in the journal Epigenetics and
Chromatin. The authors will be Yu Gyoung Tak and Peggy J. Farnham.  Yu Gyoung Tak reviewed all the
literature, wrote the first draft, and incorporated all suggestions and edits; Peggy Farnham provided
suggestions and helped to edit the document.

1.1 The GWAS conundrum
         Considerable progress towards an understanding of complex diseases has been made in
recent years due to the development of high throughput genotyping technologies. Using
microarrays that contain millions of single nucleotide polymorphisms (SNPs), Genome Wide
Association Studies (GWASs) have identified SNPs that are associated with many complex
diseases or traits (Welter, MacArthur et al. 2014). Such studies rely on differences in the
frequency of a specific SNP in, for example, healthy (or control) vs. diseased (or affected)
populations. To date, ~14 million validated SNPs have been identified in human populations.
Although GWAS arrays do not contain all mapped SNPs, it is estimated that they do capture most
human genome variation through haplotype-based SNP imputation (Li, Willer et al. 2009, Kichaev
and Pasaniuc 2015). The SNPs identified by GWAS that are statistically significantly over-represented
in the disease (or affected) populations are called index SNPs and genomic regions
containing the index SNPs are called risk loci for that particular disease. As of February 2015,
2111 different association studies have identified 15,396 index SNPs for various diseases and
traits (www.genome.gov/gwastudies), with the number of identified SNP-disease/trait associations
increasing rapidly in recent years (Welter, MacArthur et al. 2014). However, there are several
issues that have made it difficult for researchers to explain disease risk using GWAS results.
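
As a rough illustration of the case/control frequency comparison that underlies a GWAS, the following Python sketch runs a Pearson chi-square test (1 degree of freedom) on allele counts for a single SNP. The function name and the allele counts are hypothetical and are not taken from any study cited here; actual GWAS analyses typically rely on dedicated software, adjust for covariates, and correct for the very large number of SNPs tested.

```python
import math


def allelic_association(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-square test on a 2x2 table of allele counts.

    Rows are cases and controls; columns are the alternate and
    reference alleles. Returns (chi2, p_value, odds_ratio).
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    total = case_alt + case_ref + ctrl_alt + ctrl_ref
    row_sums = [sum(row) for row in table]
    col_sums = [case_alt + ctrl_alt, case_ref + ctrl_ref]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, the chi-square tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical allele counts for one SNP (illustration only).
    chi2, p, odds = allelic_association(60, 40, 40, 60)
    print(f"chi2={chi2:.2f}  p={p:.4g}  OR={odds:.2f}")
```

A SNP would be reported as an index SNP only if such a test remained significant after multiple-testing correction across all SNPs on the array.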
        First, unlike Mendelian diseases such as cystic fibrosis that are often caused by mutations in
coding regions of proteins, GWAS-identified disease-associated nucleotide differences are rarely
found in coding regions. Instead, most disease-associated index SNPs are located in non-coding
regions of the genome, equally proportioned between the intergenic and intronic compartments
(Freedman, Monteiro et al. 2011, Blattler, Yao et al. 2014). However, it is important to consider
that, as mentioned above, the index SNPs actually serve only as representatives for all the SNPs
in the same haplotype block, and it is equally likely that other SNPs in high linkage disequilibrium
(LD) with the array-identified index SNPs are causal for the disease. Because it was hoped that
disease-associated coding variants would be identified if the true causal SNPs were known,
investigators began expanding their analyses to include more than just the index SNPs. A
commonly used approach to investigate SNPs other than the index SNPs present on the standard
GWAS array has been to use LD calculation (Browning and Browning 2007, Howie, Donnelly et al.
2009, Li, Willer et al. 2010) together with the 1000 Genomes Project reference panels from 14
populations (Abecasis, Auton et al. 2012). Such approaches have generally expanded the list of
putative causal SNPs from less than 100 index SNPs for a particular disease or trait to thousands
of associated SNPs (Figure 1.1; High LD SNPs).  For example, 727 SNPs are in high LD (r² > 0.5)
with 77 index SNPs linked to prostate cancer (Hazelett, Rhie et al. 2014). Unfortunately, most of
these LD-associated SNPs are also in non-coding regions of the genome.  Similarly, SNPs
correlated with 25 colon cancer risk-associated index SNPs were analyzed (using r² > 0.5); 13
correlated SNPs were located in exons (only 2 of which were predicted to be damaging to the
protein structure), whereas 503 correlated SNPs were located in non-coding regions
corresponding to promoters or enhancers (Yao, Tak et al. 2014). An alternative approach called
fine mapping has also been used in attempts to move from the index SNP (which basically
identifies a large genomic region) to a more refined list of putative causative SNPs located within
the identified region. Fine-mapping studies employ dense genotyping arrays that contain all
common SNPs within the previously identified risk loci, which together with imputation (Browning
and Browning 2007, Howie, Donnelly et al. 2009, Li, Willer et al. 2010), allows investigators to
perform a more complete analysis of the risk regions (Figure 1.1; Fine-mapped SNPs) (Spain and
Barrett 2015). However, genotyping at this fine scale requires large
sample sizes to provide the statistical power needed to differentiate the causal SNPs from the
non-causal SNPs. In addition, creation of locus-specific genotyping arrays is quite expensive.
Therefore, most fine mapping analyses have been done by international consortia with shared
interests for specific diseases or traits; examples include the Immunochip (Trynka, Hunt et al.
2011), the Metabochip (Voight, Kang et al. 2012), the iCOGs array (Michailidou, Hall et al. 2013),
and the Oncoarray (http://epi.grants.cancer.gov/oncoarray/).  The majority of fine-mapping
studies have been performed using European-ancestry populations in which LD blocks are longer
than in other populations and therefore many correlated SNPs are present per locus, exacerbating
the problems related to a need for large sample sizes to separate true candidate causal SNPs
from less significantly associated risk SNPs (Edwards, Beesley et al. 2013, Amin Al Olama,
Dadaev et al. 2015). However, recent fine-mapping studies of trans-ethnic populations have
shown better results in discovering candidate causal SNPs (Ong, Wang et al. 2012, Mahajan, Go
et al. 2014, Han, Hazelett et al. 2015, Kichaev and Pasaniuc 2015); trans-ethnic fine-mapping
increases statistical power by increasing the number of samples and also helps to avoid false
positives due to confounding factors of population stratification. Unfortunately, a recent multi-ethnic
analysis of prostate cancer risk SNPs found that, even after fine-mapping, most risk-associated
SNPs are located in non-coding regions (Han, Hazelett et al. 2015). Thus, the GWAS
field has been left with the conundrum as to how a single nucleotide change in a non-coding
region could confer increased risk for a specific disease. One possible answer to this puzzle is
that the SNPs cause changes in gene expression levels rather than causing changes in protein
function. This chapter provides a description of 1) advances in genomic and epigenomic
approaches that incorporate functional annotation of regulatory elements to prioritize for follow-up
studies the disease risk-associated SNPs that are located in non-coding regions of the genome, 2)
various computational tools that aid in identifying gene expression changes caused by the non-coding
disease-associated SNPs, and 3) experimental approaches to identify target genes of, and
study the biological phenotypes conferred by, non-coding disease-associated SNPs.  
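
The LD-based expansion of index SNPs described in this section (retaining SNPs with r² > 0.5) can be sketched as follows. This is a minimal illustration that assumes phased haplotypes coded as 0/1 alleles at two SNPs; the function name and the toy haplotypes are hypothetical, and the cited studies used dedicated tools together with 1000 Genomes Project reference panels rather than this code.

```python
def ld_r2(haplotypes):
    """Squared allelic correlation (r^2) between two biallelic SNPs.

    `haplotypes` is a list of (allele_a, allele_b) tuples, one per
    phased chromosome, with each allele coded as 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n    # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n    # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n

    d = p_ab - p_a * p_b                       # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Toy haplotypes (illustration only): the index SNP paired with
    # two hypothetical neighboring SNPs.
    pairs = {
        "neighbor_1": [(1, 1)] * 5 + [(0, 0)] * 5,
        "neighbor_2": [(1, 1), (1, 0), (0, 1), (0, 0)],
    }
    threshold = 0.5
    high_ld = [name for name, haps in pairs.items() if ld_r2(haps) > threshold]
    print(high_ld)
```

With real data the same threshold would be applied to every SNP within a window around each index SNP, which is how a list of fewer than 100 index SNPs can grow to thousands of correlated SNPs.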


 
Figure 1.1 Making sense of GWAS: an overview
Shown is a flow chart of analytical and experimental steps that can be followed to understand how
a non-coding SNP can be associated with an increased risk for a specific disease.  Index SNPs
are identified using GWAS arrays and then expanded to a larger set of SNPs (termed Refined
Associated SNPs) using LD scores and fine mapping.  These Refined Associated SNPs are then
prioritized using functional annotation to identify Regulatory SNPs (Reg SNPs) or linkage to allele-specific
gene expression to identify eQTL SNPs, producing a set of Candidate Functional SNPs.
The Candidate Functional SNPs can either be studied directly or further refined by testing the Reg
SNPs for possible SNP-RNA linkages or by testing the eQTL SNPs for functional annotation. If the
Candidate Functional SNP (yellow arrowhead) lies within a distal regulatory element, it should be
deleted or modified using genomic nucleases or epigenomic toggle switches (Approach A);
putative target genes are then identified using RNA-seq.  Distal regulatory elements that cause
changes in gene expression when deleted or modified can then be studied using allele-specific
analyses (Approach B); promoters harboring risk-associated SNPs (pink arrowhead) can be
directly studied using Approach B.  As described in the text, the cells deleted for the distal
regulatory elements can be used to identify an appropriate phenotypic assay for analysis of the
candidate target genes. Then, the genes that show expression changes that are linked to distal
SNPs and the genes regulated by the promoter SNPs can be studied using those biological
assays to identify possible therapeutic targets and/or candidates for diagnostic tests.  Finally,
looping assays can be performed to distinguish direct from indirect targets of the distal regulatory
elements. It is important to note that a gene whose expression is indirectly affected by a non-coding
SNP could be a more important diagnostic or therapeutic target than the directly affected
gene.
 

 
1.2 Prioritization of SNPs associated with a specific disease using functional
annotation
         As noted above, not only do the vast majority (~93%) of index SNPs in the GWAS catalog
that have been associated with specific diseases or traits lie within non-coding regions, but
most SNPs in high LD with the index SNPs and most SNPs identified by fine-mapping (Figure 1.1;
collectively identified as Refined Associated SNPs) are also located in non-coding regions.  
The current hypothesis is that one or more of these refined risk-associated non-coding SNPs
cause changes in gene expression of a critical gene. However, functional follow-up experiments
(described in Section 1.4) are both expensive and time-consuming and one cannot test each
possible candidate SNP for causality. It is also important to note that although fine-mapping
usually results in a smaller number of associated SNPs than does LD calculation, fine-mapping
has only been performed for a relatively small number of disease-associated loci and therefore
most investigators are left with the problem of a fairly large list of “possible” causal SNPs.  Clearly,
it is necessary to prioritize the list of refined associated SNPs for follow-up analyses.  One way to
prioritize the list of SNPs is to identify those located in regulatory regions of the genome (Figure 1;
Regulatory SNPs). There are several types of elements involved in transcriptional regulation
including promoters, enhancers, and nuclear structure-associated elements such as CTCF
binding regions; each of these elements has been associated with non-coding SNPs.
The first step in identifying Regulatory SNPs is to select from the list of Refined Associated
SNPs those that lie within regulatory regions.  A promoter is a user-defined region, usually
corresponding to several Kb surrounding a transcription start site (TSS) of a known coding or non-
coding gene. Thus, investigators can bioinformatically identify promoter SNPs.  It is more difficult
to identify SNPs within enhancers because, unlike promoters, they do not occur at a defined
distance from a TSS. However, they can be identified by specific epigenomic profiles. Within
recent years, consortia such as the Encyclopedia of DNA Elements (ENCODE)
(ENCODE_Project_Consortium 2012) and the Roadmap Epigenomics Mapping Consortium
(REMC) (Bernstein, Stamatoyannopoulos et al. 2010, RoadmapEpigenomicsConsortium 2015)
have used a variety of genome-wide methods to study the chromatin state of non-coding regions
in the human genome in hundreds of different cell types (primary cell lines, immortalized cell lines,
and tissues).  In these studies, enhancers have been identified using methods that detect open
chromatin, specific histone modifications, and enhancer RNAs (eRNAs).  For example, DNase-
seq (Maurano, Humbert et al. 2012) has been used to identify DNase1-hypersensitive regions
(DHSs) that correspond to areas of open, accessible chromatin that contain binding motifs for
transcription factors (TFs). Although DHSs are generally a few Kb in length, DNase footprinting
(which combines deeply sequenced DNase-seq data with motif information) can help to more
precisely identify the critical nucleotides within a DHS site (Boyle, Song et al. 2011, Schaub, Boyle
et al. 2012). More recently, ATAC-seq, a method that employs an engineered Tn5 transposase to
measure chromatin accessibility, has been used to define genomic maps of open chromatin;
advantages of ATAC-seq include the requirement for fewer cells (500-50,000 cells) and fewer
experimental steps, as compared to DNase-seq (Buenrostro, Giresi et al. 2013).  The entire set of
DHSs includes promoter regions, distal enhancer regions, and sites of binding of structural TFs.
To further refine the set of distal DHSs to include only active enhancers, investigators use the
method of ChIP-seq and antibodies specific to histone modifications. For example, potentially
active enhancers are identified as regions of open chromatin with flanking nucleosomes having
histone 3 marked by monomethylation of lysine 4 (H3K4me1), whereas nucleosomes flanking fully
active enhancers are marked by H3K4me1 and also by acetylation of lysine 27 (H3K27Ac)
(ENCODE_Project_Consortium 2012); enhancers also sometimes have low levels of histone H3
trimethylated on lysine 4 (H3K4me3), a mark that is quite strong at promoters. The H3K27Ac
mark at enhancers is likely a consequence of the binding of site-specific TFs (e.g. TCF7L2) that
recruit histone acetyltransferases such as EP300 and CBP. It is thought that the acetylation of the
histones flanking a DHS increases the net affinity of other TFs to the region of open chromatin
(Spitz and Furlong 2012).  Thus, it seems logical that identifying active enhancers using TF ChIP-
seq data would also be possible. However, considering the fact that ChIP-seq patterns have been
identified for fewer than 150 transcription factors out of 1800 known TFs (and in only a few cell
types), the combination of DHS and histone modifications is more commonly used to identify
enhancers (Maurano, Humbert et al. 2012). Finally, a different approach to identifying active
enhancers has been used by the FANTOM5 project, which employed cap analysis of gene
expression (CAGE) to discover active enhancers that produce bidirectional capped RNA. Notably,
although very few enhancers were identified by this method, a high percentage of these
enhancers were validated by reporter assays (Andersson, Gebhard et al. 2014). It is also
important to note that enhancers are very cell type-specific and therefore enhancer mapping must
be performed in the cell type(s) that are relevant to the disease under study.
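The selection of Regulatory SNPs described above amounts to an interval-overlap test between SNP positions and annotated regions. The following is a minimal Python sketch of that idea under stated assumptions: the promoter window of 2 Kb on either side of a TSS, all coordinates, and all function and variable names are hypothetical, and an active enhancer is approximated simply as a distal DHS position that also falls within an H3K27Ac peak.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(chrom: str, pos: int, intervals: List[Interval]) -> bool:
    """Return True if the position falls inside any interval on the same chromosome."""
    return any(c == chrom and s <= pos < e for c, s, e in intervals)


def promoter_windows(tss_list, flank=2000):
    """Build promoter windows of +/- flank bp around each TSS (flank is user-defined)."""
    return [(c, max(0, tss - flank), tss + flank) for c, tss in tss_list]


def classify_snp(chrom, pos, promoters, dhs, h3k27ac):
    """Label a SNP as promoter, active_enhancer, open_chromatin, or unannotated."""
    if overlaps(chrom, pos, promoters):
        return "promoter"
    in_dhs = overlaps(chrom, pos, dhs)
    if in_dhs and overlaps(chrom, pos, h3k27ac):
        return "active_enhancer"
    if in_dhs:
        return "open_chromatin"
    return "unannotated"


# Hypothetical toy annotations and SNP positions
promoters = promoter_windows([("chr1", 10_000)])
dhs = [("chr1", 9_500, 10_500), ("chr1", 50_000, 50_600), ("chr1", 80_000, 80_400)]
h3k27ac = [("chr1", 49_800, 51_000)]
snps = {
    "snpA": ("chr1", 10_100),
    "snpB": ("chr1", 50_300),
    "snpC": ("chr1", 80_100),
    "snpD": ("chr1", 120_000),
}
labels = {name: classify_snp(c, p, promoters, dhs, h3k27ac) for name, (c, p) in snps.items()}
print(labels)
```

In practice the annotation tracks would come from the cell type relevant to the disease, and the promoter window size would be chosen by the investigator.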
Several studies have shown that index SNPs and/or correlated SNPs that are in high LD
to the index SNPs are enriched in enhancer regions.  For example, one study found that non-
coding index SNPs from 426 GWASs are enriched in enhancers present in the relevant cell types
and that several of the index SNPs created or disrupted TF motifs in the identified enhancers
(Ernst, Kheradpour et al. 2011).  Also, Schaub et al. studied 4724 GWAS index SNPs associated
with 470 different phenotypes using ENCODE data, showing that 36% of the SNPs are in DHSs
and 20% are in a ChIP-seq peak in at least one cell line. When they extended their analyses to
SNPs that are in high LD (r^2 >0.8) with the index SNPs, the overlap increased by over two-fold
(Schaub, Boyle et al. 2012). These findings are consistent with a recent study in which
investigators used H3K27Ac ChIP-seq data from normal and colon cancer cells and found that
270 SNPs that have a high LD (r^2 >0.5) with 25 colorectal cancer index SNPs are located in
H3K27Ac sites; when the SNPs were limited to distal regions they identified 68 unique enhancers
(Yao, Tak et al. 2014). Similarly, combining H3K27Ac and H3K4me1 ChIP-seq data and DNase-
seq data from prostate cancer cells, Hazelett et al. identified 727 SNPs that were in high LD (r^2 >0.5)
with 77 prostate cancer risk SNPs; of these, 663 SNPs were in putative enhancer regions
(Hazelett, Rhie et al. 2014). Also, a recent fine-mapping study of Type 1 Diabetes (T1D) found
that fine-mapped T1D-associated SNPs are localized in active enhancers of thymus, T and B cells,
and CD34+ stem cells (Li, Chen et al. 2015, Onengut-Gumuscu, Chen et al. 2015).
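The r^2 thresholds quoted in these studies (for example r^2 >0.8 or r^2 >0.5) refer to the standard pairwise LD statistic, r^2 = D^2 / (pA(1-pA)pB(1-pB)), where D = pAB - pA*pB. As a minimal sketch, assuming phased haplotypes are available, the calculation and the threshold filter could look as follows; the haplotype data and the function names are hypothetical and are not taken from any of the cited studies.

```python
def ld_r2(haplotypes):
    """Return r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    with each allele coded as 0 (reference) or 1 (alternate).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n   # frequency of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n   # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                      # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    return d * d / denom


def snps_in_high_ld(pairs, threshold):
    """Names of candidate SNPs whose r^2 with the index SNP exceeds the threshold."""
    return sorted(name for name, haps in pairs.items() if ld_r2(haps) > threshold)


# Hypothetical phased haplotypes: (index SNP allele, candidate SNP allele)
toy = {
    "perfect_proxy": [(1, 1)] * 4 + [(0, 0)] * 6,
    "partial_proxy": [(1, 1)] * 3 + [(1, 0)] + [(0, 1)] + [(0, 0)] * 5,
    "unlinked": [(1, 1), (1, 0), (0, 1), (0, 0)],
}
print(snps_in_high_ld(toy, threshold=0.8))
print(snps_in_high_ld(toy, threshold=0.3))
```

Lowering the threshold enlarges the list of correlated SNPs, which is why the r^2 cutoff strongly influences how many candidate SNPs must be carried into functional annotation.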
 The working model for establishment and maintenance of active enhancers is that
transcription factors bind to the DNA, position the nucleosomes, and then serve to keep the region
in between the nucleosomes in an open conformation (Spitz and Furlong 2012). Thus, it is logical
to assume that risk-associated regulatory SNPs would have a higher likelihood of causality if they
disrupt a motif for a site-specific TF in the nucleosome-free region of an enhancer or DHS.
Unfortunately, although progress has been made in identifying in vivo motifs for TFs using ChIP-
seq data (Wang, Zhuang et al. 2013), the motifs for most site-specific TFs are not known.
However, programs have been developed that allow investigators to incorporate information about
the set of known TF motifs into SNP prioritization (Kilpinen, Waszak et al. 2013, Amin Al Olama,
Dadaev et al. 2015). Using such programs, regulatory SNPs located in motifs of TFs known to be
important in establishing or maintaining the phenotypic characteristics of specific cell types have
been identified.  For example, motifbreakR (Amin Al Olama, Dadaev et al. 2015) can predict
TF motif disruptions for a large number of provided SNPs using several different sources of TF
motifs (see Table 1 for details). However, it should be noted that studies have shown that many
risk-associated SNPs (index SNPs and SNPs that are in high LD to index SNPs) are not precisely
located in the conserved binding motif of transcription factors but are in nearby regions (Heinz,
Romanoski et al. 2013, Farh, Marson et al. 2015). It is possible that such SNPs disrupt an as-yet
unknown motif for a TF that has not yet been characterized by ChIP-seq.
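To illustrate what it means for a SNP to disrupt a TF motif, the following minimal Python sketch scores the reference and alternate alleles against a position weight matrix using a log-odds score. It is a generic illustration and is not the algorithm used by motifbreakR or any other program named here; the 4-bp matrix, the uniform background frequency, and the function names are hypothetical.

```python
import math


def log_odds_score(seq, pwm, background=0.25):
    """Sum of log2(probability / background) over the motif positions.

    pwm is a list of dicts mapping each base to its probability at that position.
    """
    return sum(math.log2(pwm[i][base] / background) for i, base in enumerate(seq))


def motif_disruption(ref_seq, snp_offset, alt_base, pwm):
    """Return (ref_score, alt_score, delta) for a SNP at snp_offset within the motif."""
    alt_seq = ref_seq[:snp_offset] + alt_base + ref_seq[snp_offset + 1:]
    ref_score = log_odds_score(ref_seq, pwm)
    alt_score = log_odds_score(alt_seq, pwm)
    return ref_score, alt_score, alt_score - ref_score


def position(consensus):
    """Hypothetical PWM column that strongly prefers one base."""
    return {b: (0.85 if b == consensus else 0.05) for b in "ACGT"}


# Hypothetical 4-bp motif with consensus ACGT
toy_pwm = [position(b) for b in "ACGT"]

# The reference allele matches the consensus; the alternate allele changes G to A
ref_score, alt_score, delta = motif_disruption("ACGT", snp_offset=2, alt_base="A", pwm=toy_pwm)
print(ref_score, alt_score, delta)
```

A strongly negative delta indicates that the alternate allele weakens the predicted binding site, whereas a positive delta would indicate that the SNP creates or strengthens a site.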

 
Perhaps the TF for which the most ChIP-seq experiments have been performed is
CCCTC-binding factor (CTCF) (Holwerda and de Laat 2013, Ong and Corces 2014, Nichols
and Corces 2015).  ENCODE, as well as many individual laboratories, have mapped CTCF
binding sites in a large number of human cell types. Such studies have revealed that CTCF binds
to promoter and enhancer regions, but it can also bind to regions of the genome that lack the
histone modifications that specify active promoters and enhancers.    For example, in Panc1 cells,
15% of CTCF peaks are in promoters, 14% are in enhancers, and 71% are in neither promoters
nor enhancers (M. Gaddis and P. Farnham, unpublished data). Topologically-associating domains
(TADs), which demarcate large chromatin regions that interact via looping, are enriched for CTCF
binding sites at their boundaries, suggesting a role for CTCF-mediated looping in the maintenance
of TADs (Sexton and Cavalli 2015). CTCF is also thought to contribute to the overall 3-
dimensional structure of chromatin by forming a loop through which distal enhancers and
promoters can be brought into close proximity, perhaps leading to transcriptional activation of the
linked promoter (Ong and Corces 2014). CTCF has also been shown to serve as an insulator that
interferes with the interaction between an enhancer and a promoter and to block chromosome
position effects of transgenes (Ong and Corces 2014). Thus, regulatory SNPs that disrupt or
create a CTCF site would have high priority for follow-up analyses.  A recent GWAS of a Chinese
population identified 3 index SNPs statistically associated with increased risk of lung cancer that
are located within CTCF ChIP-seq peaks in the A549 lung cancer cell line (Petit, Jourdain et al.
2015). In addition, Ding et al. identified statistically significant allele-specific CTCF binding in data
from 50 lymphoblastoid cell lines (McDaniell, Lee et al. 2010) which were genotyped as a part of
the 1000 Genomes Project, providing a source of prioritized SNPs to study the involvement of
CTCF in disease risk (Ding, Ni et al. 2014). Interestingly, only 25% of these genetic variants are
exactly in the CTCF motif; however, most are located within 1kb of the motif (Ding, Ni et al. 2014).
This finding is consistent with the studies described above showing that many risk-associated
SNPs are not in the conserved binding motif of transcription factors but are in nearby regions.  Of
course, it is not yet known if the SNPs that are nearby, but not in, CTCF motifs are functionally
relevant.
          Finally, SNPs located within CpG sites have been studied for their relationship to disease.
Clearly, if a CpG site within a known motif for a TF is identified as a disease-associated SNP, it
could alter gene regulation simply by changing the affinity of the TF for that region. In fact, several
TFs do harbor CpG dinucleotides at critical positions in their motifs (Blattler and Farnham 2013,
Blattler, Yao et al. 2014). However, CpGs can also regulate gene expression in a more region-
specific way. CpG island methylation of promoter regions of tumor-suppressor genes is one of the
driving factors for cancer development (Jones 2012). In addition, recent studies have shown that
hyper- and hypo-methylation of distal elements can be linked to tumor-specific changes in gene
expression (Yao, Shen et al. 2015). Increased methylation of a promoter or enhancer is generally
thought to lead to transcriptional repression, whereas decreased methylation is thought to lead to
gene activation. Thus, a single change at a SNP (which disrupts or increases binding of a TF) can
lead to an altered epigenetic pattern of a larger region. Measuring methylation levels at 22,290
CpG dinucleotides in lymphoblastoid cell lines of 77 individuals from the HapMap project, Bell et al.
found 180 CpG sites in 173 genes that are associated with SNPs located nearby (within a 5 Kb
window) (Bell, Pai et al. 2011). Additionally, several diseases have been reported to be linked to
aberrant SNP-associated methylation at CpGs in promoter regions (Hitchins, Rapkins et al. 2011,
Dayeh, Olsson et al. 2013, Ye, Zhou et al. 2015). For example, Hitchins et al. found that a single
nucleotide variant in the 5’ UTR of the MLH1 gene resulted in increased methylation of the
promoter, leading to transcriptional repression. It has been suggested that the variant SNP
decreases recruitment of a TF, causing loss of protection from methylation on nearby CpG sites,
thus leading to Lynch syndrome.  
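Whether a SNP itself creates or destroys a CpG dinucleotide can be checked directly from the flanking sequence. The following is a minimal Python sketch of that check; the sequences and function names are hypothetical. Because CG reads as CG on the complementary strand, counting on one strand is sufficient.

```python
def cpg_count(seq):
    """Count CpG dinucleotides (CG) in a sequence."""
    return seq.upper().count("CG")


def cpg_effect(flank5, ref, alt, flank3):
    """Report whether the alternate allele creates or destroys a CpG site."""
    ref_n = cpg_count(flank5 + ref + flank3)
    alt_n = cpg_count(flank5 + alt + flank3)
    if alt_n < ref_n:
        return "destroys_CpG"
    if alt_n > ref_n:
        return "creates_CpG"
    return "no_change"


# Hypothetical examples
print(cpg_effect("TTAC", "G", "A", "TTA"))  # reference TTACGTTA contains one CpG
print(cpg_effect("TTA", "T", "C", "GAA"))   # alternate TTACGAA contains one CpG
print(cpg_effect("TTA", "T", "A", "TAA"))   # neither allele forms a CpG
```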

 
As described above, identification of Regulatory SNPs requires investigation as to whether
any of the Refined Associated SNPs fall within promoters, enhancers, TF binding sites, or CpG
dinucleotides. Although one could determine if any of the relatively small set of index SNPs for a
particular disease is located within a mapped regulatory element by simply visualizing the location
of the SNP and the location of functional elements on a genome browser, it would be quite
laborious to do this for the many hundreds of the SNPs in high LD with the index SNPs. Therefore,
several different programs have been developed that integrate genetic information (genotyping
and imputation data for GWAS index SNPs and SNPs in LD to index SNPs) with epigenetic
information (generated by DNase-seq, ChIP-seq, or DNA methylation assays) and chromatin
interaction data. Listed in Table 1 are some of the publicly available functional annotation
programs; each program has its own advantages and disadvantages. For example, RegulomeDB
(Boyle, Hong et al. 2012) and HaploReg (Ward and Kellis 2012) share similar features,
automatically providing all possible epigenetic information for all available cell types and tissues
for the input SNPs (the epigenetic maps are derived from the ENCODE and REMC databases).
However, neither program has options for analyzing only the relevant cell types for the disease-
associated SNPs. In contrast, FunciSNP (Coetzee, Rhie et al. 2012), GREGOR (Schmidt, Zhang
et al. 2015) and Enlight (Guo, Conti et al. 2015) allow users to add their specific epigenetic data
from the cell type of interest (which may not be in the public databases), providing a better
prioritization of the regulatory SNPs. Of note, GWAS3D (Li, Wang et al. 2013) and Enlight (Guo,
Conti et al. 2015) include an automatic analysis of chromatin interaction features (although such
data is not yet available for many cell types), and Enlight automatically generates plots showing
LD information and overlapping annotated features.

 
                      Table 1.1 Publicly available functional annotation programs.

Columns: Tool; Type; Minimum input; Output; Epigenetic annotation file used; URL; PMID

Tool: HaploReg
Type: Web server
Minimum input: rsID
Output: Overlapping annotated features and TF motif disruption information for SNPs (input) and correlated SNPs with r^2 >0.8
Epigenetic annotation file used: ChromHMM, DNase-seq, a library of position weight matrices (PWMs) from TRANSFAC, JASPAR, and protein binding array (PBM), and eQTL
URL: http://www.broadinstitute.org/mammals/haploreg/haploreg_v3.php
PMID: 22064851

Tool: RegulomeDB
Type: Web server
Minimum input: rsID
Output: Overlapping annotated features for SNPs (input) with scores which depend on the combination of overlapping annotated features and UCSC genome browser showing overlapping features
Epigenetic annotation file used: TF binding, DNase-seq, FAIRE, DNase footprinting, eQTL, dsQTL, ChIP-exo and DNA methylation
URL: http://www.regulomedb.org
PMID: 22955989

Tool: FORGE
Type: Web server
Minimum input: rsID
Output: Overlapping DNase1 hotspots for SNP (input)
Epigenetic annotation file used: DNase1 hotspot
URL: http://browser.1000genomes.org/Homo_sapiens/UserData/Forge

Tool: rSNPBase
Type: Web server
Minimum input: rsID or gene name
Output: Proximal or distal transcriptional regulation, miRNA regulation, RNA binding protein mediated regulation, eQTL results for SNPs (input) and correlated SNPs (r^2 >0.8)
Epigenetic annotation file used: histone modification, TF bindings, CpG islands, RBP, miRNA data
URL: http://rsnp.psych.ac.cn/
PMID: 24285297

Tool: FunciSNP
Type: R package
Minimum input: GWAS index SNP information (chrom:position, rsID, population) in tab-delimited file, biofeature information in .bed format, user-defined r^2 value
Output: Overlapping annotated features for index SNP (input) and correlated SNPs whose r^2 values are user-defined
Epigenetic annotation file used: Any biofeature annotation information in .bed format
URL: https://github.com/labrazil/Coetzee_Seq_Analysis/tree/master/FunciSNP
PMID: 22684628

Tool: GREGOR
Type: A package run using perl code
Minimum input: A file containing single column of index SNP, biofeature information in .bed format, user-defined r^2 value
Output: Prioritized variants based on overlap with selected regulatory regions, enrichment analysis with P-values showing how index SNPs or correlated SNPs are enriched in annotated features compared to control SNPs
Epigenetic annotation file used: Any biofeature annotation information in .bed format
URL: http://csg.sph.umich.edu/GREGOR/index.php/site/index
PMID: 25886982

Tool: Enlight
Type: Web server
Minimum input: rsID, P-value
Output: Plots showing LD and overlapping annotated features for SNP (input)
Epigenetic annotation file used: chromHMM, histone modification, DNA methylation, TF bindings, eQTL, Hi-C or customized BED file for biofeatures
URL: http://enlight.usc.edu/index.html
PMID: 25262152

Tool: GWAS3D
Type: Web server
Minimum input: rsID, P-value
Output: TF motif analysis and overlapping annotated features for SNPs (input)
Epigenetic annotation file used: 5C, Hi-C, ChIA-PET, ChromHMM, H3K27Ac, p300, CTCF, DHS (option for selecting cell lines relevant to disease)
URL: http://jjwanglab.org/gwas3d
PMID: 23723249

Tool: motifbreakR
Type: R package
Minimum input: SNP information in .bed or .vcf format
Output: Comprehensive TF binding sites disruption at SNPs (input)
Epigenetic annotation file used: TF motif information from ScerTF, FlyFactorSurvey, hPDI, UniPROBE, JASPAR, ENCODE, Homer, Factorbook, HOCOMOCO
URL: https://github.com/Simon-Coetzee/motifBreakR
PMID: 26272984

 

1.3 Prioritization of SNPs associated with a specific disease by linking to
gene expression
         A second way to prioritize risk-associated SNPs is to link the different SNP alleles to
changes in gene expression using population-based methods. These methods identify
“expression quantitative trait loci” (eQTL), which are defined as genomic regions that harbor one
or more nucleotide variants that correlate with differences in gene expression (Albert and
Kruglyak 2015). We note that although eQTLs are said to identify ‘loci’, most investigators use this
term to refer to specific nucleotides (i.e. SNPs) that correlate with differences in gene expression
(Albert and Kruglyak 2015). Expression-associated SNPs (Figure 1; eQTL SNPs) can be
bioinformatically associated with genes that are located in genomic regions near to or far from
the SNP in question. If associated with a nearby gene, the relationship is termed a “local eQTL”,
whereas SNPs associated with genes located farther away on the same chromosome or on
different chromosomes are called “distal eQTLs”. In many cases, local eQTLs work as cis-eQTLs,
which directly affect expression of nearby genes (usually limited to genes within 250 Kb to 1 Mb)
in an allele-specific manner (Albert and Kruglyak 2015, Gibson, Powell et al. 2015). In contrast,
trans-eQTLs cannot be applied to the study of allele-specific gene expression because they likely
affect expression of the identified gene as a secondary consequence of changes in direct target
genes. Most trans-eQTLs are distal eQTLs, being associated with genes found far from the SNP
on the same chromosome or on different chromosomes (Pai, Pritchard et al. 2015). However, it
should be noted that some trans-eQTLs can be local eQTLs; even though it is near the SNP under
study, the associated gene can be affected as a secondary consequence of gene expression
changes of a direct target gene. Most studies focus on cis-eQTLs (Pai, Pritchard et al. 2015)
because detecting trans-eQTLs requires testing many more SNP-gene pairs, and the resulting
multiple-testing burden reduces statistical power (Westra, Peters et al. 2013).
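The positional distinction between local and distal eQTLs can be expressed as a simple distance rule. The following is a minimal Python sketch; the 1 Mb window is an assumption taken from the upper end of the range mentioned above, and the function name and coordinates are hypothetical. Note that this rule only classifies by position; whether an eQTL acts in cis or in trans is a mechanistic question that distance alone cannot answer.

```python
def classify_eqtl(snp_chrom, snp_pos, gene_chrom, gene_tss, window=1_000_000):
    """Classify a SNP-gene pair as a local or distal eQTL by genomic distance.

    window is the maximum SNP-to-TSS distance (in bp) still considered local.
    """
    if snp_chrom == gene_chrom and abs(snp_pos - gene_tss) <= window:
        return "local"
    return "distal"


# Hypothetical SNP-gene pairs
print(classify_eqtl("chr1", 1_000_000, "chr1", 1_400_000))  # same chromosome, 400 Kb apart
print(classify_eqtl("chr1", 1_000_000, "chr1", 5_000_000))  # same chromosome, 4 Mb apart
print(classify_eqtl("chr1", 1_000_000, "chr8", 1_000_000))  # different chromosomes
```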

 
For eQTL analyses, SNPs are mapped using a genotyping array and mRNA abundance is
measured by microarray or, more commonly in recent studies, by RNA-seq using hundreds of
samples from cell lines or tissues that are relevant to the disease or traits under study.  Statistical
methods are then used to associate SNPs with transcripts to identify eQTLs (Gibson, Powell et al.
2015); sources of eQTL databases are listed in Table 2. It is important to note that not all eQTL
SNPs have been linked to disease; in other words, the SNPs associated with gene expression
were not identified via GWAS. However, many studies have revealed that eQTLs can be identified
for some GWAS risk loci (testing index SNPs or SNPs in high LD with the index SNPs); in these
cases, the association of the SNP and expression of nearby genes was identified in a trait or
disease-specific manner (Nicolae, Gamazon et al. 2010, Zhong, Beaulaurier et al. 2010,
Ramasamy, Trabzuni et al. 2014, Zhang, Johnson et al. 2014). For example, type 2 Diabetes
index SNPs and high LD SNPs (r^2 >0.9) are enriched in the set of eQTL SNPs identified using
liver and fat tissues (Zhong, Beaulaurier et al. 2010). Also, eQTL SNPs identified using gene
expression datasets from blood showed enrichment for association with autoimmune disease, but
not with bipolar disorder or type 2 Diabetes (2015). On the other hand, the sets of genes located
near GWAS-identified SNPs are not always highly concordant with eQTL-associated genes
(2015), suggesting that some GWAS signals affect genes that are far away. Therefore, we cannot
conclude that the target genes of GWAS SNPs are the same genes identified by cis-eQTL SNPs.
For example, Musunuru et al. (Musunuru, Strong et al. 2010) used GWAS information to identify
a risk locus at 1p13 that is associated with both plasma low-density lipoprotein cholesterol (LDL-C)
and myocardial infarction (MI). Also, they used eQTL analysis of liver gene expression datasets to
determine if risk SNPs in the 1p13 region are associated with nearby genes, finding that two
GWAS index SNPs (rs646776 and rs12740374) were in eQTL with the SORT1 gene. The authors
suggest that the minor allele of rs12740374 creates a C/EBP binding site and results in increased
SORT1 expression, which contributes to the risk for LDL-C and MI.  However, it should be noted
that SORT1 is not the nearest gene to rs12740374 and is located 123 Kb from the risk-associated
index SNP.
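The statistical association between a SNP and a transcript that underlies an eQTL call can be illustrated with a simple linear regression of expression on genotype dosage (0, 1, or 2 copies of the alternate allele). The following is a minimal Python sketch with hypothetical data and function names. It is not the pipeline used by any of the cited studies; real analyses typically also adjust for covariates and correct for multiple testing.

```python
def eqtl_regression(dosages, expression):
    """Least-squares fit of expression on genotype dosage.

    Returns (slope, intercept, r_squared); the slope is the estimated
    change in expression per copy of the alternate allele.
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in expression)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0 or syy == 0:
        raise ValueError("genotype and expression must both vary across samples")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: nine samples, three per genotype class
dosages = [0, 0, 0, 1, 1, 1, 2, 2, 2]
expression = [1.0, 1.2, 0.8, 2.0, 2.1, 1.9, 3.0, 3.2, 2.8]
slope, intercept, r_squared = eqtl_regression(dosages, expression)
print(slope, intercept, r_squared)
```

In this toy example expression rises by about one unit per alternate allele, which is the kind of dosage-dependent pattern an eQTL analysis is designed to detect.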
       Although some eQTLs are shared across different cell types, most eQTL associations are
cell-type specific (Emilsson, Thorleifsson et al. 2008, Dimas, Deutsch et al. 2009). These cell-type
specific eQTLs are often quite far from the gene they are associated with and tend to have small
effects on gene expression, reflecting the characteristics of enhancer elements (Dimas, Deutsch
et al. 2009). Using epigenetic information from ENCODE and REMC to functionally annotate 4085
intergenic eQTLs, investigators showed that the eQTLs which have the highest significance per
gene are enriched in transcription factor binding sites, enhancers, promoters, and open chromatin.  
A recent study identified enrichment of eQTL SNPs in distal elements, but the SNP-gene
expression linkage only appeared upon immune stimulation of naïve monocytes (Fairfax,
Humburg et al. 2014), suggesting that new enhancers harboring eQTL SNPs were created by
immune stimuli. Several studies have suggested that changes in transcription factor binding are a
major result of cell type-specific eQTLs, leading to changes in chromatin structure, histone
modification, or methylation, with resultant changes in gene expression (McVicker, van de Geijn et
al. 2013). In a recent study correlating RNA-seq data from 103 matched tumor and normal colon
mucosa samples from Danish patients with germline genotyping from 90 patients, investigators
found that many of the identified eQTLs are tumor specific. Using ChIP-seq peaks from a colon
cancer cell line, they concluded that the tumor-specific eQTLs are associated with binding of
several transcription factors that show increased expression in tumors (Ongen, Andersen et al.
2014). Other evidence supporting an important role for transcription factor binding in the
mechanism by which eQTLs function is provided by meQTLs, defined as CpG sites in which DNA
methylation changes have association with SNPs that are several Kb away (Banovich, Lan et al.
2014). A recent study showed 23 SNPs out of 109 cancer GWAS SNPs from 13 different cancer
types had associations with methylation status (Heyn, Sayols et al. 2014). Banovich et al. showed
that meQTLs are frequently associated with changes in histone modification, DNase1
hypersensitivity, chromatin accessibility, and expression changes in nearby genes. As described
above, meQTLs are thought to affect TF binding, which in turn influences DNA methylation at
nearby CpG sites (Banovich, Lan et al. 2014). In the cases where meQTLs are eQTLs, a positive
correlation between methylation and expression was shown when meQTLs are not near a TSS
(median distance of ~7Kb) and a negative correlation between methylation and expression was
shown when meQTLs are near a TSS (median distance of ~1Kb), which is consistent with
findings that active promoters show low DNA methylation whereas bodies of actively transcribed
genes show high DNA methylation (Gutierrez-Arcelus, Lappalainen et al. 2013, Banovich, Lan et
al. 2014).  

                     Table 1.2 Sources of eQTL databases

Columns: Tool; Features; URL; PMID

Tool: NCBI eQTL browser
Features: cis-eQTL from liver, lymphoblastoid, brain
URL: http://www.ncbi.nlm.nih.gov/projects/gap/eqtl/index.cgi

Tool: seeQTL
Features: browser for cis-eQTL, and trans-eQTL from lymphoblastoid, brain, monocyte
URL: http://www.bios.unc.edu/research/genomic_software/seeQTL/
PMID: 22171328

Tool: Chicago eQTL
Features: QTL (eQTL, dsQTL, trQTL, exonQTL) from lymphoblastoid, brain, liver, fibroblast, T-cells
URL: http://eqtl.uchicago.edu/cgi-bin/gbrowse/eqtl/

Tool: GTEx Portal
Features: >60 tissues eQTL data and eQTL IGV browser
URL: http://www.gtexportal.org/home/
PMID: 25954001

Tool: GeneVar
Features: >5 tissues eQTL, meQTL data and visualization
URL: https://www.sanger.ac.uk/resources/software/genevar/
PMID: 20702402

Tool: Blood eQTL
Features: Blood cis- and trans-eQTLs
URL: http://genenetwork.nl/bloodeqtlbrowser/
PMID: 24013639

Tool: Geuvadis
Features: QTL (eQTL, mirQTL, trQTL) from lymphoblastoid cell lines
URL: http://www.ebi.ac.uk/Tools/geuvadis-das/
PMID: 24037378


1.4 Experimental approaches to identify target genes of regulatory and eQTL
SNPs
         Although investigators often use either functional annotation or eQTL to identify prioritized
SNPs (Figure 1; collectively referred to as Candidate Functional SNPs), using a combination
approach may help to rank the individual lists for follow-up study. The set of Regulatory SNPs
(especially those obtained using high LD and not fine-mapping) is usually larger than the set of
eQTL SNPs.  Therefore, determining if any of the large set of enhancers that harbor risk-
associated SNPs are also in eQTL with one or more genes may identify a subset of risk-
associated enhancers that have a higher probability of having an impact on gene expression.
Similarly, although the set of eQTL SNPs is usually not large, it is difficult to perform functional
follow-up studies of the entire set. Therefore, determining which of the eQTL SNPs are also
located in a regulatory region would help to prioritize the list. Having identified a set of Regulatory
and/or eQTL SNPs, the next logical step would seem to be functional follow-up studies of the
genes regulated by the SNP-harboring elements. However, it is not easy to determine the actual
target gene of a regulatory element.  It is a commonly held assumption that a risk-associated SNP
that falls within a promoter region influences expression of that particular gene. In fact, if the gene
in question has a known biological function consistent with the possibility that it may influence
cellular phenotype in a manner consistent with the disease being studied, then investigators often
go straight to studying that gene. However, some have postulated that promoters can also have
enhancer activity and can influence the expression of other genes (Andersson, Sandelin et al.
2015). Thus, it may be premature to assume that SNPs located near to the 5’ end of a gene only
influence the regulation of that particular gene.  It is even more difficult to predict what gene is
directly regulated by an enhancer because they are located distal from a TSS, can regulate genes
in an orientation-independent manner, and, most importantly, can skip over nearby genes to
regulate genes farther away. Thus, although one hypothesis is that the gene nearest to a
promoter or an enhancer that harbors a regulatory SNP is the disease-related gene, in most
cases this hypothesis has not been proven (or even tested). Therefore, it is best to take an
unbiased approach and experimentally identify target genes of the regulatory elements harboring
Candidate Functional SNPs. This can be accomplished by manipulating the genomic region
containing the SNPs in question and determining if expression of the putative target gene is in fact
altered and/or by testing physical interactions between the region harboring the SNP and a
putative target gene using looping assays.  

 
1.4.1 Deletion of regulatory elements harboring prioritized SNPs
We suggest that the first step toward identifying a target gene of a distal element should
be to delete or epigenetically modify the element and study subsequent effects on the
transcriptome (Figure 1; Approach A). We note that because deleting or inactivating an entire
promoter region would automatically eliminate expression from that gene (making it difficult to
determine the exact role of the SNP), we recommend that analysis of Candidate Functional SNPs
located in promoters should begin with specific targeting methods described below  (Figure 1;
Approach B). A traditional method to study a distal regulatory element in the genome of cultured
cells or in a mouse model has been to remove or replace a wildtype regulatory element with a
mutated version using loxP and the Cre recombinase. In a recent study, the loxP-Cre
recombination method was used to delete an enhancer from the mouse genome that is located
within a region that corresponds to a region of the human genome harboring a colon cancer
GWAS SNP (Sur, Hallikas et al. 2012). Mice lacking this enhancer element were resistant to
intestinal tumor formation, possibly due to down regulation of Myc, which is located 335 kb from
the deleted sequence.  Although these results are promising, there are several disadvantages in
using the loxP-Cre system.  For example, creation of the plasmids needed for homologous
recombination is laborious and the insertion of the foreign loxP DNA sequence into the genome
could potentially affect gene expression (Meier, Bernreuther et al. 2010). Fortunately, recently
developed technologies that are based on zinc finger proteins (ZFPs), transcription activator-like
effectors (TALEs), or the clustered regularly interspaced short palindromic repeats
(CRISPR)/CRISPR-associated protein (Cas) have allowed researchers to investigate functionality
of genomic elements in the endogenous context in almost any organism (Hsu, Lander et al. 2014).
Using these genomic engineering platforms, regulatory elements can be deleted from the genome
without the introduction of exogenous sequences. In addition, the same genomic platforms can be
used to epigenetically alter the genomic sequences containing a risk-associated SNP.  

 
Regulatory elements harboring Candidate Functional SNPs can be deleted using zinc
finger-based nucleases (ZFNs), TALE-based nucleases (TALENs), or Cas9-based nucleases
(CRISPR/Cas9) (Hsu, Lander et al. 2014, Sander and Joung 2014). ZFNs and TALENs work as
heterodimers, with each monomer consisting of multiple DNA binding domains and a partial Fok1
nuclease. The DNA binding domains of ZFNs are tandem arrays of C2H2 zinc fingers, with each
finger recognizing 3 bp of DNA; ZFNs are created such that each half of the heterodimer
recognizes between 9 and 18 bp of DNA at the target cut site. The DNA binding domains of TALENs
are composed of a tandem array of repetitive 33-35 amino acid modules, with each module
recognizing 1 bp of DNA; TALENs are usually created such that they recognize between 12 and 19
bp of DNA at the target cut site.  Binding of a pair of heterodimeric ZFNs or TALENs at target
sequences leads to Fok1 dimerization and DNA double strand breaks (DSB) (Kim and Kim 2014).
Because TALENs can be assembled based on a single bp recognition schema, they can be
targeted to a larger percentage of the genome than can ZNFs, which are based on a 3 nt motif
schema.  Also, the DNA binding domains of TALENs are easier to assemble than are zinc finger
domains. In contrast to ZFNs and TALENs, which rely on protein-target DNA interaction,
CRISPRs nucleases use complementary binding between RNA and DNA (Sander and Joung
2014). The most widely used CRISPR/CAS9 system has two components; the Cas9 nuclease
and a guide RNA (sgRNA) that can bind to specific target DNA sequence and recruit Cas9 to that
genomic location, resulting in a double strand break (Barrangou 2014, Sander and Joung 2014).
Construction of CRISPRs requires only cloning RNA sequences that will hybridize to target sites
(Hsu, Lander et al. 2014).  Because of the ease of cloning, reports of high targeting specificity
(Hsu, Lander et al. 2014), and accessibility of the guide RNAs to regions of methylated DNA (Hsu,
Scott et al. 2013), most investigators have begun using the CRISPR/Cas9 system to make DSBs
in human cells (Sander and Joung 2014).  
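To illustrate why guide design is comparatively simple, the short Python sketch below scans a DNA sequence for candidate Cas9 target sites. It assumes the commonly used S. pyogenes Cas9, a 20-nt protospacer, and an NGG protospacer-adjacent motif (PAM). These details, the function name find_cas9_targets, and the forward-strand-only search are illustrative assumptions and are not taken from the studies cited above.

```python
import re


def find_cas9_targets(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each forward-strand NGG site.

    Assumption: S. pyogenes Cas9 with an NGG PAM immediately 3' of the
    protospacer. Only the given strand is scanned.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    hits = []
    for match in pattern.finditer(seq):
        hits.append((match.start(), match.group(1), match.group(2)))
    return hits


if __name__ == "__main__":
    # Hypothetical sequence used only to demonstrate the function.
    demo = "ACGTACGTACGTACGTACGTAGGTTT"
    for start, protospacer, pam in find_cas9_targets(demo):
        print(start, protospacer, pam)
```

A fuller search would also scan the reverse complement and evaluate potential off-target matches elsewhere in the genome.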

      Deletion of regulatory elements by ZFNs, TALENs, or CRISPR nucleases requires targeting
functional nucleases (heterodimeric in the case of ZFNs or TALENs or monomeric in the case of
CRISPR nucleases) to both sides of the element. DSBs occur at both target sites, resulting in
local sequence alterations at each target site and loss of the intervening sequences. Recent
studies have shown that genomic regions ranging from several bp to more than 1 Mb can be
deleted (Li, Rivera et al. 2014, Webster, Barajas et al. 2014, Kraft, Geuer et al. 2015, Li, Shou et
al. 2015), with deletion efficiency having an inverse correlation with the size of the deleted region
(Canver, Bauer et al. 2014). The frequency of obtaining biallelic deletions in normal cells or
cancer cells having almost diploid chromosomal numbers is much higher than when multi-copy
genomic regions (created by amplification or increased chromosomal copy numbers) in cancer
cells are targeted. In most cases, many clones must be analyzed to identify cells that lack all
copies of the regulatory element under study. It is also important to keep in mind that if a
regulatory element plays a large role in controlling expression of an essential gene, then deletion
of all copies of that element from the genome could affect cell proliferation or survival (Canver,
Bauer et al. 2014); in this case, cells having monoallelic or partial loss of the copies of the element
(in the case of aneuploid cancer cells) must be analyzed. Several recent studies have used
genomic nucleases to delete regulatory elements and identify target genes. For example, Li et al.
deleted a 13 Kb section of a super-enhancer located 100 Kb downstream of the Sox2 gene and
observed ~90% downregulation of Sox2 gene expression (Li, Rivera et al. 2014). Meyer et al.
deleted a Vitamin D receptor (VDR) binding region located 10 Kb upstream of the Mmp13 gene
and found that VDR-mediated regulation of Mmp13 was abolished. They also deleted a RUNX2
binding region located 30 Kb upstream of Mmp13 and observed a complete loss of Mmp13
expression (Meyer, Benkusky et al. 2015).  As described in Chapter 2 and Chapter 3, I have
deleted enhancers harboring Regulatory SNPs from colon cancer cells and have identified genes
whose expression is altered.
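The two-cut deletion strategy described above can be sketched in a few lines of Python. This is an idealized illustration only: it assumes both nucleases cut cleanly at the stated positions and that the ends are rejoined without the small insertions or deletions that NHEJ typically introduces. The function name and the example sequence are hypothetical.

```python
def delete_between_cuts(seq, cut_left, cut_right):
    """Return (edited_sequence, deleted_segment) for two cut positions.

    Positions are 0-based indices of the first base to the right of each
    cut. Assumption: a clean junction with no indels at the repair site.
    """
    if not 0 <= cut_left <= cut_right <= len(seq):
        raise ValueError("cut positions must satisfy 0 <= left <= right <= len(seq)")
    deleted = seq[cut_left:cut_right]
    edited = seq[:cut_left] + seq[cut_right:]
    return edited, deleted


if __name__ == "__main__":
    # Hypothetical locus: flanking sequence, "enhancer", flanking sequence.
    locus = "AAAA" + "CCCCGGGG" + "TTTT"
    edited, deleted = delete_between_cuts(locus, 4, 12)
    print(edited, deleted, len(deleted))
```

In practice each clone would still need to be genotyped, because the junction sequence and the number of alleles deleted vary from clone to clone.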

1.4.2 Epigenetic modification of regulatory elements that harbor prioritized risk
SNPs
         An alternative method to identify target genes for distal Candidate Functional SNPs is to
modulate the chromatin state of the element using ZFs or TALEs fused to a chromatin modifying
domain or by using a Cas9 that has no nuclease activity (dCas9) fused to a chromatin modifying
domain; such engineered systems are termed “epigenetic toggle switches”.  To mimic deletion of
an enhancer, epigenetic repressors can be employed. The lysine-specific histone demethylase
KDM1A (also known as LSD1) and a KRAB domain that recruits the KAP1/SETDB1 histone
methylase have been fused to TALEs and dCas9; constructs having KDM1A should decrease
active histone methylation marks whereas constructs having the KRAB domain should increase
inactivating histone methylation marks. One study has suggested that dCas9-KRAB is more
efficient than TALE-KRAB for inactivating enhancers, perhaps due to steric hindrance caused by
bound dCas9 in preventing recruitment of activating factors (Gao, Tsang et al. 2014). Another
study that targeted dCas9-LSD1 to the distal enhancers of Oct4 and Tbx3 showed loss of
H3K4me2 and a dramatic decrease of H3K27Ac at enhancer regions. Interestingly, the action of
dCas9-LSD1 was shown to be specific to enhancers, with very few consequences when targeted to
promoters. In contrast, in this study dCas9-KRAB was more effective at promoters, resulting in an
increase in H3K27me3 or H3K9me3 levels at targeted promoters but not at targeted enhancers
(Kearns, Pham et al. 2015). To achieve the opposite effect, investigators have used domains such
as VP64, an activating domain that recruits histone acetylases (HATs), as well as the enzymatic
domain of the p300 HAT to increase the levels of active epigenetic marks at regulatory elements.  
For example, Gao et al. modified enhancers that regulate the Oct4 gene. These enhancers are
normally only active in embryonic stem cells and are marked by the repressive histone
modification H3K27me3 in mouse embryonic fibroblasts. However, TALE-VP64 constructs
targeted to these enhancers decreased levels of H3K27me3 and increased levels of the active
marks H3K27Ac and H3K4me1 (Gao, Tsang et al. 2014). Also, a recent study showed that the
catalytic domain of the HAT P300 (P300core) fused to dCas9 could activate target enhancers and
promoters. In this study, a single gRNA targeting an enhancer region with dCas9-P300core was
sufficient to activate target gene expression, whereas other dCas9 activators required several
gRNAs to achieve high levels of gene expression (Hilton, D'Ippolito et al. 2015). The authors
suggested that the P300 domain is superior to the VP64 domain because P300 directly regulates
histone acetylation whereas VP64 must recruit a HAT. It is possible that many of the differences
in effectiveness of the various activating or repressing epigenetic toggle switches in the different
studies are due to specific features of the exact promoters and enhancers that were studied.
However, considering the ease of cloning guide RNAs, it seems that CRISPR/dCas9 constructs
such as dCas9-P300core and dCas9-LSD1 could become a standard method used to identify
target genes after turning on repressed enhancers or turning off activated enhancers, respectively.  
1.4.3 Specific targeting of prioritized SNPs
         Once deletion or epigenetic modification of a distal regulatory element has been shown to
have functional consequences, a more detailed analysis can be performed to compare the effects
of the risk and non-risk alleles and to identify specific nucleotides within the element important for
regulation; this same approach should be used to study the effect of a SNP on the activity of a
promoter region (Figure 1; Approach B).  In traditional approaches, investigators have used
luciferase reporter assays to test individual TF binding sites of enhancers. Such studies require
removing putative enhancer elements from their native chromosomal location and ligating them
into luciferase constructs such that they regulate a heterologous promoter (Melnikov, Murugan et
al. 2012, Fortini, Tring et al. 2014).  In addition to not using the correct promoter to test enhancer
elements, the choice of cell type could influence the results for enhancers, which function in a
highly cell-type specific manner. Another approach using mice involves pronuclear injection of
endogenous versus mutated enhancer sequences linked to a lacZ gene (Palmiter and Brinster
1986). These approaches have issues regarding copy number and position-dependent effects on
reporter gene activity and effects of foreign DNA sequences on the native genomic landscape that
perturb endogenous gene expression (Palmiter and Brinster 1986). More recent studies have
used genomic engineering to compare endogenous versus mutated regulatory elements.  When
CRISPR/Cas9 makes a double stranded break, cells use either nonhomologous end-joining
(NHEJ) or homology-directed repair (HDR) to repair the break (Sander and Joung 2014). DNA
repair mediated by NHEJ is used when two CRISPR nucleases are targeted to either side of an
enhancer, resulting in local alterations at each target site and loss of the intervening sequences.  
In addition, because NHEJ results in small insertions or deletions at the site of cleavage, this
method can also be used to disrupt transcription factor motifs if a single guide RNA is precisely
targeted to the motif.  Another way to study the precise effects of removing or altering a SNP is to
substitute a section of the genome with exogenously provided DNA, using the HDR pathway.  By
providing, along with the guide RNAs and Cas9, a donor DNA fragment that is basically identical
to the genomic sequence but contains the alternative SNP allele or a mutation of a TF motif, a
precise exchange of genomic regions can be accomplished.
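The donor design described above can be illustrated with a minimal Python sketch: a donor fragment is built from the genomic sequence flanking the SNP, with the alternative allele substituted at the SNP position. The function name build_hdr_donor, the arm length, and the example sequence are hypothetical, and the sketch ignores practical considerations such as mutating the PAM to prevent re-cutting.

```python
def build_hdr_donor(ref_seq, snp_index, alt_allele, arm_len):
    """Return a donor sequence: left arm + alternative allele + right arm.

    ref_seq    : genomic sequence containing the SNP (reference allele)
    snp_index  : 0-based position of the SNP within ref_seq
    alt_allele : single base to introduce at the SNP position
    arm_len    : number of homologous bases kept on each side
    """
    if snp_index - arm_len < 0 or snp_index + 1 + arm_len > len(ref_seq):
        raise ValueError("homology arms extend beyond the supplied sequence")
    if ref_seq[snp_index] == alt_allele:
        raise ValueError("alternative allele is identical to the reference")
    left_arm = ref_seq[snp_index - arm_len:snp_index]
    right_arm = ref_seq[snp_index + 1:snp_index + 1 + arm_len]
    return left_arm + alt_allele + right_arm


if __name__ == "__main__":
    # Hypothetical 9-bp region with the SNP (A) at index 4.
    print(build_hdr_donor("ACGTACGTA", 4, "G", 3))
```

Apart from the substituted base, the donor is identical to the genomic sequence, which is what allows the HDR pathway to use it as a repair template.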
In one study, Vierstra et al. deleted three DHSs located 62, 58, and 55 Kb away from the
TSS of BCL11A, a TF that represses fetal hemoglobin (HbF) levels. Deletion of the DHSs located
at 55 and 58 Kb away using TALENs led to downregulation of BCL11A and increased level of
HbF, but no effect was seen after deletion of the DHS located 62 Kb away (Vierstra, Reik et al.
2015). This study provides an excellent example supporting the recommendation for deletion of
regulatory elements prior to performing more detailed mutational analyses of an element. In this
case, studies of individual binding sites in the DHS located 62 Kb away would not have been
useful. Following the deletion studies, Vierstra et al. then used ZFNs to disrupt five
transcription factor footprints in the enhancer located 58 Kb away from BCL11A and found that
disruption of one of the TF footprints led to reduction of BCL11A.  Another method for identifying
critical regions of an enhancer is to use tiled guide RNAs and Cas9. Investigators used ~150 to
~200 different guide RNAs to target the +55, +58, and +62 DHS regions of the BCL11A locus.
They found that guide RNAs that disrupted the +58 DHS showed the most effect (Canver, Smith
et al. 2015).  Even though HDR is less efficient compared to NHEJ, the fact that this mechanism
can be used to exchange DNA fragments between a plasmid and the genome makes this the
method of choice to study SNP-specific differences. Several studies have used CRISPR/Cas9
and HDR-mediated genome editing to change SNPs in mice and cell culture model systems
(Claussnitzer, Dankel et al. 2014, Lee, Ye et al. 2014, Long, McAnally et al. 2014, Yin, Xue et al.
2014, Han, Slivano et al. 2015). The most common method is to introduce plasmids that express
the guide RNAs and Cas9, along with a plasmid that contains the donor sequence (e.g. an
enhancer fragment that has the SNP changed to the other allele). Claussnitzer et al. transfected
guide RNAs along with Cas9 and donor DNA plasmids into cultured adipose cells to switch a
Type 2 Diabetes risk SNP to the non-risk SNP allele, affecting binding of a TF and causing a
decrease in target gene expression (Claussnitzer, Dankel et al. 2014). Other studies have
reported an increased efficiency of HDR-mediated genome editing using purified guide RNAs and
Cas9 mRNA in place of the expression plasmids and single stranded oligodeoxynucleotides
having homology arms in place of the double stranded DNA (Wang, Yang et al. 2013, Yang,
Wang et al. 2013). Using this strategy in a mouse model, Han et al. substituted a 5 nucleotide
sequence within an intronic region of the Cnn1 gene, which disrupted a CArG box for the SRF factor
and caused a reduction in expression of Cnn1 (Han, Slivano et al. 2015). Finally, these genomic
tools can be used to study orientation dependence of a region harboring a Candidate Functional
SNP. CTCF-mediated loops are frequently formed in a convergent orientation involving
homodimerization of CTCF proteins located quite far apart on the genome, with the orientation of
the CTCF sites determining the choice of interaction between specific enhancers and promoters
(Ong and Corces 2011, Guo, Xu et al. 2015). Using 2 guide RNAs and Cas9, Guo et al. inverted
the region containing a CTCF binding site, switching the CTCF orientation with respect to
surrounding CTCF sites; they found that this inversion resulted in changes in gene expression
patterns (Guo, Xu et al. 2015).
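The inversion experiment can be pictured as replacing the segment between two cut sites with its reverse complement, which flips the orientation of any motif inside it. The Python sketch below is an idealized illustration under that assumption (clean junctions, no indels); the function names and the example sequence are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def invert_between_cuts(seq, cut_left, cut_right):
    """Return seq with the segment between the two cuts inverted.

    Positions are 0-based indices of the first base to the right of each
    cut. Assumption: the excised segment is re-ligated in the opposite
    orientation without any sequence change at the junctions.
    """
    if not 0 <= cut_left <= cut_right <= len(seq):
        raise ValueError("cut positions must satisfy 0 <= left <= right <= len(seq)")
    segment = seq[cut_left:cut_right]
    return seq[:cut_left] + reverse_complement(segment) + seq[cut_right:]


if __name__ == "__main__":
    # Hypothetical locus with a 3-bp "motif" (CCG) between the cut sites.
    print(invert_between_cuts("AAAACCGTTTTT", 4, 7))
```

Applying the same inversion twice restores the original sequence, which is a convenient sanity check for this kind of model.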
1.5 Disease-related functional analyses
As described above, an integrated and ordered approach should be used to investigate the role of
non-coding SNPs in gene expression. Namely, a combination of deletion or modification of a
regulatory element plus eQTL analyses can provide a list of candidate target genes. However, an
analysis of non-coding risk-associated SNPs is not complete without further characterization of
how the gene(s) whose activity is influenced by a particular SNP affect initiation, progression, or
manifestation of the disease under study. Identifying the causal gene(s) will provide insights into
the disease and perhaps also provide new diagnostic or therapeutic targets. It is likely that the
results of manipulation of the regulatory element or the eQTL analyses identified more than one
candidate target gene. Thus, it may be difficult to know which of the genes whose expression is
linked to the SNP should be tested in phenotypic assays.  If one of the candidate target genes is
tested with negative results, this could mean either that the candidate SNP is not really linked to
the disease, that the wrong assay was used, or that the wrong candidate gene was assayed.  One
approach to deal with this uncertainty is to first analyze the effects of deleting or repressing the
element in a functional assay; if the element can be shown to affect a particular phenotype, then
individual candidate target genes can subsequently be studied using that same assay.  
If studying GWAS loci related to cancer, methods that are used for functional follow-up
studies include proliferation and cell migration assays (Edwards, Beesley et al. 2013). However,
cultured cancer cell lines are not ideal model systems because of their genomic instability (which
leads to variable karyotypes) and because isolated cancer cell lines grown in tissue culture dishes
do not properly represent the complex environment of the cells in the context of either a normal
tissue or a tumor. Investigators have begun to use 3 dimensional organoids (Matano, Date et al.
2015), normal cell lines, or isogenic ES or iPS cells (Schwank, Koo et al. 2013, Grobarczyk,
Franco et al. 2015) to try to reproduce a more natural cellular environment for functional studies.
However, even these assays do not allow the study of effects seen only within a complex tissue. If
a mouse model exists that closely reproduces the human disease, then perhaps this would be the
ideal system to use; the phenotypic consequences of a SNP and/or putative causal target gene
may be more consequential in a living organism than in a short-term cell culture assay. For
example, when a mouse lacking a homologous enhancer that is associated with colon cancer in
humans (described above) was crossed to a mouse that spontaneously develops tumors in the
intestine and colon, the incidence of polyp formation was reduced in their offspring (Sur, Hallikas
et al. 2012). Another issue to consider is that an individual SNP (or regulatory element) may not
cause dramatic phenotypic differences. Instead, it may be necessary to study combinations of
SNPs. A recent study evaluating the combinatorial effects of SNPs showed that different SNPs in
the same LD block identified different enhancers that cooperatively regulate the same target gene
(Corradin, Saiakhova et al. 2014). Such studies suggest that altering an individual GWAS-
identified regulatory element may have fewer functional consequences than inactivation of a target
gene. However, if multiple target genes work together to contribute to disease risk then even
moving from SNP to target gene may not solve the problem. Perhaps investigators should use
multiplexing CRISPR/Cas9 systems (Cheng, Wang et al. 2013, Cong, Ran et al. 2013, Zalatan,
Lee et al. 2015) to simultaneously target many regulatory elements and/or putative target genes
from several different risk-associated loci to test for combinatorial effects in phenotypic assays
(Corradin and Scacheri 2014).
If an appropriate assay is identified whose outcome is influenced by loss or modification of
the SNP or regulatory element, then candidate target genes can be tested using that same assay
in the hopes of identifying the causal gene. Commonly used approaches to investigate the
function of a candidate causal gene include overexpressing an exogenous form of the gene (e.g.
using a cloned cDNA) or reducing levels of the endogenous gene using RNAi tools. More recently,
alternative approaches for overexpressing or repressing genes have been developed that are
based on the genomic engineering tools described above. For example, investigators have used
CRISPR/Cas9 nucleases to mutate coding regions (Xue, Chen et al. 2014) and epigenomic tools
such as TALEs and dCas9 fused to activation or repression domains have been used to regulate
the promoter of a gene of interest (Kabadi and Gersbach 2014). However, it is important to
consider that overexpressing a gene from a cDNA may not appropriately provide the correct
splice variant (Prelich 2012) and that inactivation methods such as siRNA, shRNA, or genomic
nucleases have the inherent problem of off-target effects (Sigoillot, Lyman et al. 2012).
Investigators often choose putative causal genes based on a) proximity to the regulatory
element, b) degree to which expression is affected, or c) a gene function that can be easily
imagined to contribute to the disease risk.  Each of these choices is fraught with problems. For
example, as discussed above, genes are not necessarily near their regulatory elements. Another
confounding issue is that changes in mRNA do not always lead to similar changes in protein
levels (Ghazalpour, Bennett et al. 2011, Battle, Khan et al. 2015) and thus the genes that show
the largest changes in mRNA might not necessarily produce the largest changes in protein.  
Finally, gene function is often assigned based on the first set of experiments performed on that
gene; many genes function in multiple networks, often in a tissue-specific manner.  Therefore, it is
important to keep in mind that identifying a causal gene may require testing several different
candidates.  
 A limitation of the genomic and epigenomic editing technologies described above is that it
is hard to distinguish target genes directly regulated by an enhancer from genes whose
expression has been indirectly affected as a consequence of the expression changes of the direct
targets (e.g. changes in signaling pathways or proliferative states). However, physical interaction
assays can be used to attempt to distinguish genes that are directly vs. indirectly affected by a
risk-associated enhancer. Many interaction assays are based on principles of the chromosome
conformation capture (3C) assay, which involves capturing chromosome interactions by
formaldehyde cross-linking, followed by digestion with a restriction enzyme and subsequent
ligation of DNA regions that were brought together by protein-protein interactions; ligation
frequency between two loci is assessed using qPCR (Rivera and Ren 2013). Using 3C, Zhang et
al. investigated all possible interactions between a prostate-specific enhancer and genes that are
within an ~3 Mb window, identifying a single loop to a gene that is 1 Mb away from the enhancer
(Zhang, Cowper-Sal lari et al. 2012). However, the results from 3C assays are limited to a pre-
selected region, excluding the discovery of interactions with regions beyond the tested genomic
window. A modification of 3C, Circular Chromosome Conformation Capture followed by
sequencing (4C-seq), allows the investigation of all possible interactions mediated by a specific
enhancer by employing high throughput sequencing instead of qPCR. Using 4C-seq, investigators
showed that enhancers located within an intron of the FTO gene and harboring Obesity and Type
2 Diabetes GWAS-identified SNPs do not interact with the FTO promoter but instead interact with
the IRX3 gene which is located 500 Kb downstream (Smemo, Tena et al. 2014). Hi-C, another
variation of 3C, can be used to study all chromatin interactions within the genome. Unfortunately,
the majority of Hi-C experiments capture interactions separated by at least 1 Mb (Corradin and
Scacheri 2014) and thus may miss “close-by” enhancer-promoter loops. However, a recent
modification of Hi-C, called Capture Hi-C, which increases the resolution of the mapped
interactions, has been used to study colon cancer risk SNPs. These experiments identified
interactions that are enriched with colon cancer-specific transcription factor binding sites (Jager,
Migliorini et al. 2015). This technique was also used to identify short-range interactions between
an enhancer and a gene 26 Kb away (Hughes, Roberts et al. 2014). Therefore, to study
interactions between enhancers and promoters, Capture-C is a recommended method since it not
only provides a better resolution of ~1-2 Kb, but also can detect hundreds of interactions in one
experiment. Importantly, even though these looping assays that identify interactions between
regulatory elements harboring SNPs and promoters can provide clues as to the identity of putative
target genes, it is important to compare these results to those in which the regulatory element has
been experimentally manipulated.  Genes whose expression is linked to the regulatory element and that are
also involved in promoter-enhancer loops are likely to be direct targets, whereas genes whose
expression is linked to the element but no loop is found can either be indirect targets or direct
targets that are difficult to identify due to limitations of the current looping assays; it is also
possible that enhancer-promoter loops will be identified that are not related to genes whose
expression changes upon manipulation of the enhancer. Finally, it is important to note that a
gene whose expression is indirectly affected by a non-coding SNP could be a more important
diagnostic or therapeutic target than the directly affected gene. Thus, it is important to identify both
the direct targets of a risk-associated enhancer and other genes affected by reduction of levels of
the direct targets.

2 Chapter 2: Functional annotation of colon cancer risk SNPs

The work described in this chapter has been published in Nature Communications, 5: 5114. doi:
10.1038/ncomms6114. Lijing Yao, Yu Gyoung Tak, Benjamin P. Berman and Peggy J. Farnham.
“Functional annotation of colon cancer risk SNPs”. Lijing Yao is responsible for all bioinformatic
analyses and assisted with manuscript preparation; Yu Gyoung Tak performed the enhancer
characterizations (Table 2.3 and Figure 2.4) and helped to edit the manuscript; Benjamin P.
Berman advised L.Y. in bioinformatic analyses and Peggy J. Farnham conceived the project and
wrote the manuscript.


2.1 Abstract

 
              Colorectal cancer (CRC) is a leading cause of cancer-related deaths in the United States.
Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms
(SNPs) associated with increased risk for CRC. A molecular understanding of the functional
consequences of this genetic variation has been complicated because each GWAS SNP is a
surrogate for hundreds of other SNPs, most of which are located in non-coding regions. Using
genomic and epigenomic information, we tested the hypothesis that the GWAS SNPs and/or
correlated SNPs are in elements that regulate gene expression, identifying 23 promoters and 28
enhancers. Using gene expression data from normal and tumor cells, we identified 66 putative
target genes of the risk-associated enhancers (10 of which were also identified by promoter
SNPs).  Employing CRISPR nucleases, we deleted one risk-associated enhancer and identified
genes showing altered expression. We suggest that similar studies should be performed to fully
characterize all CRC risk-associated enhancers.  
 

2.2 Introduction

 
Colorectal cancer (CRC) ranks among the leading causes of cancer-related deaths in the
United States.  The incidence of and death from CRC is in the top 3 of all cancers in the United
States for both men and women (http://apps.nccd.cdc.gov/uscs/toptencancers.aspx). It is
estimated that 142,820 men and women will be diagnosed with, and 50,830 men and women will
die of, cancer of the colon and rectum in 2013
(http://seer.cancer.gov/statfacts/html/colorect.html). A better understanding of the regulatory factors and signaling
pathways that are deregulated in CRC could provide new insight into appropriate
chemotherapeutic targets.  Decades of studies have revealed that certain genes and pathways,
such as WNT, RAS, PI3K, TGF-B, p53, and mismatch repair proteins, are important in the
initiation and progression of CRC (Fearon 2011). In an attempt to obtain a more comprehensive
view of CRC, two new approaches have been used: exome sequencing of tumors and
genome-wide population analyses of human variation.  The Cancer Genome Atlas (TCGA) has taken the
first of these new approaches in the hopes of moving closer to a full molecular characterization of
the genetic contributions to CRC, analyzing somatic alterations in 224 tumors (2012). These
studies again implicated the WNT, RAS, and PI3K signaling pathways. The second new approach
identifies single nucleotide polymorphisms associated with specific diseases using genome-wide
association studies (GWAS). GWAS has led to the identification of thousands of single nucleotide
polymorphisms (SNPs) associated with a large number of phenotypes (Hindorff, Sethupathy et al.
2009, Manolio 2010). Such studies identify what are known as tag SNPs that are associated with
a particular disease. Specifically for CRC, 25-30 tag SNPs have been identified (Zanke,
Greenwood et al. 2007, Houlston, Webb et al. 2008, Houlston, Cheadle et al. 2010, Dunlop,
Dobbins et al. 2012, Bauer, Kamran et al. 2013, Ding, Lee et al. 2013).  
Although identification of tag SNPs is an important first step in understanding the relationship
between human variation and risk for CRC, a major challenge in the post-GWAS era is to
understand the functional significance of the identified SNPs (Boyle, Hong et al. 2012).  It is
critical to advance the field by progressing from a statistical association between genetic variation
and disease to a molecular understanding of the functional consequences of the genetic variation.
Progress towards this goal has been mostly successful when the genetic variation falls within a
coding region. Unfortunately, most SNPs identified as associated with human disease in large
GWAS studies are located within large introns or distal to coding regions, in what in the past has
been considered to be the unexplored territory of the genome. However, recent studies from the
ENCODE Consortium have shown that introns and regions distal to genes contain regulatory
elements.  In particular, the ENCODE Consortium has made major progress in defining hundreds
of thousands of cell-type specific distal enhancer regions (2012, Frietze, Wang et al. 2012,
Zentner and Scacheri 2012). Comparison of GWAS SNPs to these enhancer regions has
revealed several important findings. For example, work from ENCODE and others has shown
that many GWAS SNPs fall within enhancers, DNase hypersensitive sites, and transcription factor
binding sites (Akhtar-Zaidi, Cowper-Sal-lari et al. 2012, Maurano, Humbert et al. 2012, Schaub,
Boyle et al. 2012). It is also clear that the SNP whose functional role is most strongly supported
by ENCODE data is often a SNP in linkage disequilibrium (LD) with the GWAS tag SNP, not the
actual SNP reported in the association study (Hardison 2012).  
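Because the analyses in this chapter rely on SNPs that are in LD with a tag SNP, it may help to recall how the r² statistic is computed. The Python sketch below uses the standard definition for two biallelic SNPs, r² = D² / (pA(1 - pA) pB(1 - pB)) with D = pAB - pA pB, applied to phased haplotypes. The function name and the toy haplotypes are hypothetical and are not data from this study.

```python
def ld_r2(haplotypes):
    """Compute r^2 between two biallelic SNPs from phased haplotypes.

    haplotypes: list of (a, b) pairs, where a and b are 0 or 1 and give
    the allele carried at SNP A and SNP B on the same chromosome.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium
    return d * d / denom


if __name__ == "__main__":
    # Toy examples (hypothetical): perfectly correlated vs. independent SNPs.
    print(ld_r2([(1, 1), (1, 1), (0, 0), (0, 0)]))
    print(ld_r2([(1, 1), (1, 0), (0, 1), (0, 0)]))
```

A SNP with a high r² to the tag SNP is nearly interchangeable with it in the association signal, which is why such correlated SNPs must be considered as candidates.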
These recent reports clearly show that regulatory elements can help to identify important SNPs
(Boyle, Hong et al. 2012, Farnham 2012, Hardison 2012). However, the studies were performed
using all available ENCODE data and did not focus the functional analysis of cancer-associated
SNPs on the regulatory information obtained using the relevant cell types. Using epigenetic marks
obtained from normal colon and colon cancer cells, we have identified SNPs in high LD with
GWAS SNPs that are located in regulatory elements specifically active in normal and/or tumor
colon cells. Characterization of transcripts nearby CRC risk-associated promoters and enhancers
using RNA expression data allows the prediction of putative genes and non-coding RNAs
associated with an increased risk of colon cancer.  Using genomic nucleases, we deleted one
risk-associated enhancer and compared the deregulated genes with those predicted to be targets
of that enhancer. Our studies suggest that transcriptome characterization after precise deletion of
a risk-associated enhancer will be a useful approach for post-GWAS analyses.  

2.3 Results

 
2.3.1 CRC risk-associated SNPs linked to a specific gene

 
         For our studies, we chose 25 tag SNPs, 4 of which have been associated with an increased
risk for CRC in Asia-derived case-control cohorts and the rest in Europe-derived case-control
cohorts; the genomic coordinates of each SNP can be found in Table 2.1. Of these 25 tag SNPs,
only one is found within an exon, occurring in the third exon of the MYNN gene and resulting in a
synonymous change that does not lead to a coding difference. However, there are hundreds of
SNPs in high LD with each tag SNP and it is possible that some of the high LD SNPs may reside
in coding exons. To address this possibility we used a bioinformatics program called FunciSNP to
identify SNPs correlated with CRC tag SNPs that also intersect the set of coding exons in the
human genome (Coetzee, Rhie et al. 2012).  FunciSNP is an R/Bioconductor package that allows
a comparison of population-based correlated SNPs from the 1000 Genomes Project
(http://www.1000genomes.org/) with any set of chromatin biofeatures.  In this initial analysis, we
chose coding exons from the Gencode 15 dataset (http://www.gencodegenes.org/releases/) as
the biofeature.  Because LD varies with the population, to identify population-based correlated
SNPs we specified the Asian population for analysis of the 4 tag SNPs identified using Asian-
derived case-control cohorts and we specified the European population for analysis of the rest of
the tag SNPs. Using FunciSNP, we identified 240 unique SNPs that are correlated with the 25 tag
SNPs at an r² > 0.1 and are within a coding exon (see Supplementary Figure S2.1). We then
used snpeff (http://snpeff.sourceforge.net/ (Cingolani, Platts et al. 2012)) to determine that 40 of
these correlated SNPs create non-synonymous changes; however, limiting the SNPs to those
with an LD of r² > 0.5 with the tag SNP reduced the number to only 13. Using polyphen-2
(http://genetics.bwh.harvard.edu/pph2/ (Adzhubei, Schmidt et al. 2010)) and provean
(http://provean.jcvi.org/index.php (Choi, Sims et al. 2012)), only 2 potentially damaging SNPs at
r² > 0.5 were found, both in POU5F1B (Figure 2.1).  At the less restrictive r² > 0.1, 4 other genes
were also found to harbor a damaging SNP (RHPN2, UTP23, LAMA5, and FAM186A). To
determine if these genes are expressed in colon cells, we performed two replicates of RNA-seq
for HCT116 cells and also used RNA-seq data from the Roadmap Epigenome Mapping
Consortium for normal sigmoid colon to examine expression. After analysis of both sets of
RNA-seq data, we classified transcripts with less than 0.5 FPKM as not expressed
(Supplementary Figure S2.2). Analysis of the RNA-seq data revealed that POU5F1B and
FAM186A are not expressed in either the normal sigmoid colon or HCT116 cells (however,
these genes are expressed in a cohort of TCGA colon tumors; see Table 2.2).  
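The r² thresholds used above refer to the standard pairwise LD statistic, r² = D² / (pA(1-pA)pB(1-pB)), where D = pAB - pA·pB. The following is a minimal sketch of that calculation from phased haplotypes; it is not the FunciSNP implementation, and the function name and toy haplotypes are hypothetical.

```python
def ld_r2(haplotypes):
    """Pairwise LD r^2 for two biallelic SNPs.

    haplotypes: list of (a, b) tuples, one per phased chromosome,
    where a and b are 0 (reference) or 1 (alternative) at each SNP.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # alt-allele frequency at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # alt-allele frequency at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                                   # a monomorphic SNP carries no LD signal
    return d * d / denom


# Hypothetical toy haplotypes (not real 1000 Genomes data).
toy = [(1, 1)] * 4 + [(0, 0)] * 4 + [(1, 0), (0, 1)]
r2 = ld_r2(toy)
print(round(r2, 2), "passes r2 > 0.1:", r2 > 0.1, "passes r2 > 0.5:", r2 > 0.5)
```

A SNP pair like the toy one would be retained at the relaxed threshold but dropped at the stringent one.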
Another way to link a SNP to a particular gene is if the SNP falls within a promoter region.  We
again used FunciSNP, but this time the biofeature analyzed corresponded to the region from
-2000 to +2000 nt of the transcription start site (TSS) of each transcribed gene (we analyzed
coding and non-coding transcripts from GENCODE V15).  We chose to include 2 kb upstream
and downstream of the start site as the promoter proximal regions because several studies
(Koudritsky and Domany 2008, Stergachis, Haugen et al. 2013), as well as visual inspection of
the ENCODE TF ChIP-seq tracks, have shown that transcription factors can bind on either side of
a transcription start site. Using an r² > 0.1, we found 684 correlated promoter SNPs which were
reduced to 233 SNPs at r² > 0.5 (Figure 2.1 and Supplementary Figure S2.3).  Many of these
SNPs fall within the same promoter regions.  When collapsed into distinct promoters, we identified
the TSS regions of 17 protein coding genes and 2 noncoding RNAs which are expressed in
HCT116 or sigmoid colon cells; promoter SNPs identified 4 additional expressed genes when a
larger number of TCGA colon tumor samples were analyzed (Table 2.2).
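The promoter analysis amounts to intersecting correlated SNPs with a symmetric window around each TSS and then collapsing the SNPs by promoter. A minimal sketch, assuming SNPs and transcripts are given as plain tuples; the function name, gene names, rsIDs and coordinates below are hypothetical and do not come from GENCODE V15 or FunciSNP.

```python
def promoter_hits(snps, transcripts, flank=2000, min_r2=0.5):
    """Map each transcript to the correlated SNPs within +/- flank nt of its TSS.

    snps: list of (rsid, chrom, position, r2_with_tag_snp)
    transcripts: list of (name, chrom, tss_position)
    """
    hits = {}
    for name, t_chrom, tss in transcripts:
        for rsid, s_chrom, pos, r2 in snps:
            # The window is symmetric, so transcript strand does not matter here.
            if s_chrom == t_chrom and abs(pos - tss) <= flank and r2 > min_r2:
                hits.setdefault(name, []).append(rsid)
    return hits


# Hypothetical toy data.
transcripts = [("GENE_A", "chr1", 100000), ("GENE_B", "chr1", 150000)]
snps = [
    ("rsA", "chr1", 99000, 0.8),
    ("rsB", "chr1", 101500, 0.6),
    ("rsC", "chr1", 151000, 0.3),
    ("rsD", "chr1", 120000, 0.9),   # high LD, but not near any TSS
]
stringent = promoter_hits(snps, transcripts, min_r2=0.5)
relaxed = promoter_hits(snps, transcripts, min_r2=0.1)
print(len(stringent), "promoter(s) at r2 > 0.5;", len(relaxed), "at r2 > 0.1")
```

Several SNPs can map to the same promoter, which is why the number of distinct promoters is smaller than the number of promoter SNPs.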


 
Figure 2.1 Identification of potential functional SNPs for CRC.
A) Shown is the number of SNPs identified by FunciSNP in each of 3 categories for 25 colon cancer
risk loci (see Table 2.1 for information on each CRC risk SNP). For exons, only non-synonymous
SNPs are reported; parentheses indicate the number of SNPs that are predicted to be damaging;
see Table 2.2 for a list of the expressed genes associated with the correlated SNPs.  For TSS
regions, the region from -2kb to +2kb relative to the start site of all transcripts annotated in
GENCODE V15, including coding genes and non-coding RNAs, was used; see Table 2.2 for a list of
expressed transcripts associated with the correlated SNPs.  B)  For H3K27Ac analyses, ChIP-seq
data from normal sigmoid colon and HCT116 tumor cells were used; see Table 2.3 for further
analysis of distal regions harboring SNPs in normal and tumor colon cells.  The SNPs having an
r² > 0.1 that overlapped with H3K27Ac sites were identified separately for HCT116 and sigmoid colon
datasets. Because more than one SNP could identify the same H3K27Ac-marked region, the SNPs
were then collapsed into distinct H3K27Ac peaks.  The sites that were within +/- 2 kb of a promoter
region were removed to limit the analysis to distal elements. To obtain a more stringent set of
enhancers, those regions having only SNPs with r² < 0.5 were removed. This remaining set of 68
distal H3K27Ac sites was contained within 19 of the 25 risk loci.  Visual inspection to identify only
the robust enhancers having linked SNPs not at the margins reduced the set to 27 enhancers
located in 9 of the 25 risk loci; an additional enhancer was identified in SW480 cells (see Table 2.3
for the genomic locations of all 28 enhancers). Color key: green=SNPs or H3K27Ac sites unique to
normal colon, red=unique to colon tumor cells, blue=present in both normal and tumor colon.

[Figure 2.1, panel A (flowchart): CRC tag SNPs (25) -> genomic window (+/-200 kb) around tag SNP ->
extract all known SNPs (1000 Genomes database) -> select correlated SNPs (LD r² > 0.1).
Exon: r² > 0.1: 40 (7); r² > 0.5: 13 (2).  TSS region: r² > 0.1: 684; r² > 0.5: 233.
H3K27Ac: r² > 0.1: 746; r² > 0.5: 270.]
[Figure 2.1, panel B (counts given as normal-specific / common / tumor-specific): SNPs with r² > 0.1
and overlapping H3K27Ac: 370 / 236 / 140; different H3K27Ac sites: 111 / 47 / 41; distal H3K27Ac
sites: 96 / 27 / 32; distal H3K27Ac sites with r² > 0.5: 41 / 18 / 9. After visual inspection the
three categories contain 13, 13 and 1 sites.]

          Table 2.1 Summary of regions linked to CRC tag SNPs.

The positions and classification of the CRC tag SNPs are based on the hg19 UCSC genome
browser reference genome; the hg19 reference alleles (Ref) and the alternative alleles (Alt) are
indicated; the risk alleles are in red.  The number of exons having a non-synonymous,
damaging correlated SNP with an LD of r² > 0.1 are reported; the 3 regions marked with an
asterisk are the only ones for which the damaging SNP has an LD of r² > 0.5 with the tag SNP.
For TSS and enhancers, the number of different promoters or enhancers having at least one
SNP with an LD of r² > 0.5 with the tag SNP are reported (note that a given TSS or enhancer can
be identified by more than one tag SNP; see Table 2.2 and Table 2.3 for more details). PMID
indicates the PubMED ID for a publication describing the identification of the tag SNP. A list of
all correlated SNPs with r² > 0.1 in exons, TSS, or enhancers can be found in Supplementary
data file 2.1.




Tag SNP  Position  Ref/Alt  Exons  Protein-coding TSS  Non-coding TSS  Enhancers  PMID
rs6691170 chr1:222045446 G/T 0 0 2 0 20972440
rs6687758 chr1:222164948 A/G 0 0 3 0 20972440
rs10936599 chr3:169492101 C/T 0 4 1 0 20972440
rs647161 chr5:134499092 C/A 0 0 1 6 23263487
rs1321311 chr6:36622900 C/A 0 1 1 0 22634755
rs16892766 chr8:117630683 A/C 1 1 0 0 18372905
rs10505477 chr8:128407443 A/G *1 1 0 2 17618283
rs6983267 chr8:128413305 G/T *1 1 0 2 23266556
rs7014346 chr8:128424792 A/G *1 1 1 2 18372901
rs10795668 chr10:8701219 G/A 0 0 1 0 18372905
rs1665650 chr10:118487100 T/C 0 0 0 0 23263487
rs3824999 chr11:74345550 T/G 0 1 0 1 22634755
rs3802842 chr11:111171709 C/A 0 3 1 0 18372901
rs10774214 chr12:4368352 T/C 0 0 0 1 23263487
rs7136702 chr12:50880216 T/C 1 3 1 4 20972440
rs11169552 chr12:51155663 C/T 0 2 2 3 20972440
rs4444235 chr14:54410919 T/C 0 1 1 0 19011631
rs4779584 chr15:32994756 T/C 0 1 1 0 18372905
rs9929218 chr16:68820946 G/A 0 2 1 4 19011631
rs4939827 chr18:46453463 T/C 0 0 0 2 18372905
rs10411210 chr19:33532300 C/T 1 2 0 2 19011631
rs961253 chr20:6404281 C/A 0 0 0 0 19011631
rs2423279 chr20:7812350 T/C 0 0 0 0 23263487
rs4925386 chr20:60921044 T/C 1 1 3 4 20972440
rs5934683 chrX:9751474 T/C 0 0 0 0 22634755


             
                  Table 2.2 Expressed transcripts directly linked to CRC index SNPs

Only 3 damaging SNPs having an r² > 0.1 were identified in the exons of genes expressed in
either HCT116 or normal sigmoid colon cells; of these, only UTP23 and RHPN2 were identified
as damaging by two different programs. RNAs expressed in HCT116 or sigmoid colon cells and
having a correlated SNP with r² > 0.5 within +/- 2kb of the TSS of protein coding transcripts or
non-coding RNAs are shown. The cases in which the tag SNP is located in the TSS region are in
bold and non-coding RNAs are in parentheses. We note that exon SNPs identified two additional
expressed genes (POU5F1B and FAM186A) and promoter SNPs identified 3 additional
expressed genes (FAM186A, LRRC34, and LRRIQ4) when a larger number of TCGA colon
tumor samples were analyzed.  

Tag SNP      Exons   RNAs of TSS SNPs
rs10936599           ACTRT3, MYNN, (TERC)
rs1321311            CDKN1A
rs16892766   UTP23   EIF3H
rs7014346            (RP11-382A18.1)
rs3824999            POLD3
rs3802842            C11orf92, C11orf93, C11orf53
rs7136702            DIP2B
rs11169552           ATF1, DIP2B
rs4444235            BMP4
rs4779584            GREM1
rs9929218            CDH3, CDH1
rs10411210   RHPN2   GPATCH1, RHPN2
rs4925386    LAMA5   LAMA5


2.3.2 CRC risk-associated SNPs in distal regulatory regions

 
    Most of the SNPs in LD with the CRC GWAS tag SNPs cannot be easily linked to a specific
gene because they do not fall within a coding region or a promoter-proximal region. However, it is
possible that a relevant SNP associated with increased risk lies within a distal regulatory element
of a gene whose function is important in cell growth or tumorigenicity.  To address this possibility,
we used the histone modification H3K27Ac to identify active regulatory regions throughout the
genome of colon cancer cells or normal sigmoid colon cells.  We used HCT116 H3K27Ac
ChIP-seq data (Frietze, Wang et al. 2012) produced in our lab for the tumor cells and we obtained
H3K27Ac ChIP-seq data for normal colon cells from the NIH Roadmap Epigenome Mapping
Consortium. The ChIP-seq data for both the normal and tumor cells included two replicates. To
demonstrate the high quality of the datasets, we called peaks on each replicate of H3K27Ac from
HCT116 and each replicate of H3K27Ac from sigmoid colon using Sole-search (Blahnik, Dou et
al. 2010, Blahnik, Dou et al. 2011) and compared the peak sets from the two replicates using the
ENCODE 40% overlap rule (after truncating both lists to the same number, 80% of the top 40% of
one replicate must be found in the other replicate and vice versa).  After determining that the
HCT116 and sigmoid colon datasets were of high quality (Supplementary Figure S2.4), we
merged the two replicates from HCT116 and separately merged the two replicates from sigmoid
colon and called peaks on the two merged datasets; see Supplementary Table S2.2 for a list of
all ChIP-seq peaks. Using the merged peak lists from each of the samples as biofeatures in
FunciSNP, we determined that 746 of the 4894 SNPs that were in LD with a tag SNP at r² > 0.1
were located in H3K27Ac regions identified in either the HCT116 or sigmoid colon peak sets; of
these 270 SNPs had an r² > 0.5 with a tag SNP (Figure 2.1 and Supplementary Figure S2.5).  
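The replicate check described above (truncate both peak lists to the same length, then require that 80% of the top 40% of each replicate is found in the other) can be sketched as follows. This is an illustration of the rule as stated in the text, not the Sole-search or ENCODE code; the function names and the toy peak lists are hypothetical.

```python
def overlaps(p, q):
    """True if two peaks (chrom, start, end, score) share at least one base."""
    return p[0] == q[0] and p[1] < q[2] and q[1] < p[2]


def fraction_found(top, other):
    """Fraction of peaks in `top` that overlap any peak in `other`."""
    if not top:
        return 0.0
    return sum(1 for p in top if any(overlaps(p, q) for q in other)) / len(top)


def passes_overlap_rule(rep1, rep2, top_frac=0.4, min_found=0.8):
    """Apply the 40% overlap rule in both directions."""
    n = min(len(rep1), len(rep2))                       # truncate to the same number
    a = sorted(rep1, key=lambda p: p[3], reverse=True)[:n]
    b = sorted(rep2, key=lambda p: p[3], reverse=True)[:n]
    k = max(1, int(n * top_frac))                       # size of the top 40%
    return (fraction_found(a[:k], b) >= min_found
            and fraction_found(b[:k], a) >= min_found)


# Hypothetical replicates: rep2 is rep1 shifted by 50 bp, rep3 is unrelated.
rep1 = [("chr1", i * 1000, i * 1000 + 200, 100 - i) for i in range(10)]
rep2 = [("chr1", i * 1000 + 50, i * 1000 + 250, 90 - i) for i in range(10)]
rep3 = [("chr2", i * 1000, i * 1000 + 200, 100 - i) for i in range(10)]
print(passes_overlap_rule(rep1, rep2), passes_overlap_rule(rep1, rep3))
```

The quadratic overlap search is adequate for a toy example; real peak sets would normally be handled with an interval index.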
A comparison of the H3K27Ac peaks from normal and tumor cells indicated that the patterns
are very similar; in fact, ~24,000 H3K27Ac peaks are in common in the normal and tumor cells.  
However, there are clearly some peaks unique to normal and some peaks unique to the tumor
cells. Therefore, we separately analyzed the normal and tumor H3K27Ac ChIP-seq peaks as
different sets of biofeatures using FunciSNP (Figure 2.1B).  Of the 746 SNPs, 236 were located
in a H3K27Ac site common to both normal and tumor cells, whereas 140 were unique to tumor
and 370 were unique to normal cells.  Visual inspection of the SNPs and peaks using the UCSC
genome browser showed that many of the identified enhancers harbored multiple correlated
SNPs.  Reduction of the number of SNPs to the number of different H3K27Ac sites resulted in 47
common, 41 tumor-specific, and 111 normal-specific regions. Visual inspection also showed that
some of the H3K27Ac genomic regions corresponded to promoter regions (see Supplementary
Figure S2.4).  Because promoter regions having correlated SNPs were already identified using
TSS regions (see above), we eliminated the promoter-proximal H3K27Ac sites, resulting in 27
common, 32 tumor-specific, and 96 normal-specific distal H3K27Ac regions.  As the next
winnowing step, we selected only those enhancers having at least one SNP with an r² > 0.5,
leaving 18 common, 9 tumor-specific, and 41 normal-specific distal H3K27Ac regions.  We noted
that some of the identified regions corresponded to low ranked H3K27Ac peaks. For our
subsequent analyses, we wanted to limit our studies to robust enhancers that harbor correlated
SNPs. Therefore, we visually inspected each of the genomic regions identified as having distal
H3K27Ac peaks harboring a correlated SNP. To prioritize the distal regions for further analysis,
we eliminated those for which the correlated SNP was on the edge of the region covered by the
H3K27Ac signal or corresponded to a very low-ranked peak. After inspection, we were left with a
set of 27 distal H3K27Ac regions in which a correlated SNP (r² > 0.5) was well within the boundaries
of a robust peak (Figure 2.1B). To confirm our results, we repeated the analysis using H3K27Ac
data from a different colon cancer cell line, SW480, identifying only one additional enhancer
harboring risk SNPs for CRC. The genomic coordinates of each of these 28 enhancers, which are
clustered in 9 genomic regions, are listed in Table 2.3 (see also Supplementary Table S2.3).
Combining all data, enhancers in 5 of the 9 regions were identified in all 3 cell types and 8 of the 9
regions were identified in at least two of the cell types.  
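The automated part of the winnowing described above (collapse SNPs into distinct H3K27Ac peaks, remove promoter-proximal peaks, keep peaks with at least one SNP at r² > 0.5) can be sketched as below. The final visual-inspection step is a manual judgment and is not modelled. The function name, data layout and toy coordinates are hypothetical.

```python
def winnow(peaks, snps, tss_sites, flank=2000, stringent_r2=0.5):
    """Return {peak: [(rsid, r2), ...]} for distal peaks with a high-LD SNP.

    peaks: list of (chrom, start, end)
    snps: list of (rsid, chrom, position, r2_with_tag_snp)
    tss_sites: list of (chrom, tss_position)
    """
    # Step 1: collapse SNPs into the distinct peaks that contain them.
    snps_by_peak = {}
    for rsid, s_chrom, pos, r2 in snps:
        for peak in peaks:
            chrom, start, end = peak
            if s_chrom == chrom and start <= pos <= end:
                snps_by_peak.setdefault(peak, []).append((rsid, r2))

    # Step 2: drop peaks that come within +/- flank of any TSS.
    def near_tss(peak):
        chrom, start, end = peak
        return any(t_chrom == chrom and start <= tss + flank and end >= tss - flank
                   for t_chrom, tss in tss_sites)

    distal = {p: s for p, s in snps_by_peak.items() if not near_tss(p)}

    # Step 3: keep peaks with at least one SNP above the stringent threshold.
    return {p: s for p, s in distal.items()
            if any(r2 > stringent_r2 for _, r2 in s)}


# Hypothetical toy data: one promoter peak, one strong distal peak, one weak distal peak.
peaks = [("chr1", 10000, 12000), ("chr1", 50000, 53000), ("chr1", 90000, 91000)]
tss_sites = [("chr1", 11000)]
snps = [
    ("rs1", "chr1", 10500, 0.9),
    ("rs2", "chr1", 51000, 0.7),
    ("rs3", "chr1", 52000, 0.2),
    ("rs4", "chr1", 90500, 0.3),
]
enhancers = winnow(peaks, snps, tss_sites)
print(sorted(enhancers))
```

A peak is retained together with all of its correlated SNPs, including those below the stringent threshold, as long as one SNP exceeds it.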



 
Figure 2.2 Expression of risk-associated genes in colon cells.
The left panel indicates if a transcript was identified by a SNP located in an exon or a TSS or is
nearby a risk-associated enhancer; the middle panel shows the expression values of each of
the 41 transcripts in sigmoid colon or HCT116 tumor cells; the right panel shows the fold
change of each transcript in the tumor cells (positive indicates higher expression in tumor).


           Table 2.3 Distal regulatory regions correlated with CRC tag SNPs
 

 
   The tag SNP and the correlated SNPs for 28 distal, robust H3K27Ac regions are indicated;
the enhancers that are found only in normal sigmoid colon are indicated with an asterisk.
The 3 nearest protein-coding RNAs and 3 nearest non-coding RNAs were identified using
the GENCODE V15 gene annotation; only those RNAs that are expressed in HCT116 or
sigmoid colon cells are shown (see also Supplementary Table S2.3).
Enhancer  Tag SNP                           No. of correlated SNPs  Chromosome  Start      End        Location              Nearby expressed coding and non-coding RNAs
1         rs647161 (ASN)                    4                       chr5        134468409  134473214  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
2         rs647161 (ASN)                    6                       chr5        134474759  134478528  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
3*        rs647161 (ASN)                    4                       chr5        134520309  134523373  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
4         rs647161 (ASN)                    7                       chr5        134525698  134531612  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
5         rs647161 (ASN)                    7                       chr5        134543144  134548023  CTC-203F4.1 intron    H2AFY, PITX1
6         rs647161 (ASN)                    7                       chr5        134511610  134516426  CTC-203F4.1 intron    PITX1, CATSPER3, H2AFY
7         rs10505477, rs6983267, rs7014346  3                       chr8        128412778  128414859  RP11-382A18.1 intron  MYC, RP11-382A18.1, RP11-382A18.2, RP11-255B23.3
8*        rs10505477, rs6983267, rs7014346  5                       chr8        128420412  128422114  RP11-382A18.1 intron  MYC, RP11-382A18.1, RP11-382A18.2, RP11-255B23.3
9*        rs3824999                         4                       chr11       74288844   74294943   Intergenic            POLD3, LIPT2, KCNE3, AP001372.2
10        rs10774214 (ASN)                  1                       chr12       4378128    4379840    Intergenic            CCND2, C12orf5
11*       rs7136702 or rs11169552           1                       chr12       50908239   50913757   DIP2B intron          DIP2B, LARP4
12        rs7136702 or rs11169552           2                       chr12       50938468   50940796   DIP2B intron          DIP2B, LARP4
13*       rs7136702 or rs11169552           2                       chr12       51018019   51020503   DIP2B intron          DIP2B, ATF1, LARP4
14*       rs7136702 or rs11169552           1                       chr12       50973150   50974328   DIP2B intron          DIP2B, LARP4
15        rs7136702 or rs11169552           1                       chr12       51012054   51014942   DIP2B intron          DIP2B, ATF1, LARP4
16        rs11169552, rs7136702             3                       chr12       51040371   51042207   DIP2B intron          DIP2B, ATF1
17        rs9929218                         1                       chr16       68740822   68742561   Intergenic            CDH3, CDH1, TMCO7
18*       rs9929218                         4                       chr16       68754658   68757192   Intergenic            CDH1, CDH3, TMCO7
19        rs9929218                         5                       chr16       68774214   68780161   CDH1 intron           CDH1, CDH3, TMCO7
20        rs9929218                         11                      chr16       68784044   68791839   CDH1 intron           CDH1, CDH3, TMCO7
21        rs4939827                         4                       chr18       46448530   46450772   SMAD7 intron          SMAD7, CTIF, DYM, RP11-15F12.1
22        rs4939827                         6                       chr18       46450800   46454601   SMAD7 intron          SMAD7, CTIF, DYM, RP11-15F12.1
23        rs10411210                        6                       chr19       33537339   33541195   RHPN2 intron          RHPN2, GPATCH1, C19orf40
24        rs10411210                        1                       chr19       33530860   33533823   RHPN2 intron          RHPN2, GPATCH1, C19orf40
25        rs4925386                         3                       chr20       60929861   60935447   LAMA5 intron          LAMA5, RPS21, CABLES2, RP11-157P1.4
26        rs4925386                         3                       chr20       60938278   60941762   LAMA5 intron          LAMA5, RPS21, CABLES2, RP11-157P1.4
27        rs4925386                         6                       chr20       60948726   60951918   Intergenic            LAMA5, RPS21, CABLES2
28        rs4925386                         6                       chr20       60955085   60958391   Intergenic            RPS21, LAMA5, CABLES2

 
    2.3.3 Effects of SNPs on binding motifs in the distal elements

 
To determine possible effects of the correlated SNPs on transcription factor binding, we first
analyzed all SNPs having an r² > 0.1 with the 25 CRC tag SNPs.  Using position weight matrices
from Factorbook (Blattler, Yao et al. 2013), all correlated SNPs that fell within a critical position in
a transcription factor binding motif were identified (Supplementary Table S2.4). We identified
~800 SNPs that were predicted to impact binding of a transcription factor to a known motif.  
However, most of these SNPs are not in regulatory regions important for CRC.  Therefore, we
next limited our analysis to the set of correlated SNPs that fall within the 28 robust enhancers
(Supplementary Table S2.5). We found 80 SNPs that cause motif changes in a total of 124
motifs, representing binding sites for 40 different transcription factors. Using RNA-seq data, we
found that 36 of these factors are expressed in HCT116 and/or sigmoid colon cells (Table 2.4),
suggesting that perhaps the binding of these factors at the risk-associated enhancers is
influenced by the correlated SNPs.  Of the 36 factors, most were expressed at either
approximately the same levels in normal and tumor colon or at higher levels in HCT116 cells than
in normal colon. However, several factors showed large decreases in gene expression in HCT116
as compared to sigmoid colon cells, including FOS and JUN which were ~10 fold higher in normal
colon and HNF4A and ETS1 which were 30-40 fold higher in normal colon; see Supplementary
Table S2.6.  
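One common way to quantify the effect of a SNP on a motif is to score the reference and alternative sequences against a position weight matrix and compare the two log-odds scores. The sketch below illustrates that idea only; it is not the procedure used with the Factorbook matrices, and the 4-position matrix, the sequences and the uniform background are hypothetical.

```python
import math


def pwm_score(pwm, seq, background=0.25):
    """Sum of log2 odds (motif probability vs. uniform background) over a site."""
    return sum(math.log2(column[base] / background)
               for column, base in zip(pwm, seq))


def allele_effect(pwm, ref_seq, alt_seq):
    """Score change caused by the alternative allele (negative = weaker match)."""
    return pwm_score(pwm, alt_seq) - pwm_score(pwm, ref_seq)


# Hypothetical 4-position motif with consensus TGAC (one dict of base
# probabilities per position); real matrices are longer.
toy_pwm = [
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
]
delta = allele_effect(toy_pwm, ref_seq="TGAC", alt_seq="TGTC")
print("alternative allele weakens the motif:", delta < 0)
```

A SNP at a highly constrained position produces a large score change, whereas a SNP at a degenerate position would change the score very little.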

 

 

 

 

 


 
             Table 2.4 Effects of SNPs on motifs in the distal regulatory regions.

  Details concerning the impacted motifs (SNP position, sequence of reference and alternative
alleles, and the direction of the effect on the motif) can be found in Supplementary data file 2.4.
(1) identified by motif UA9, which was the top motif in NANOG hES ChIP-seq data; (2) identified by
motif UA2, which was the top motif in PBX3 GM12878 ChIP-seq data; (3) identified by motif UA5;
(4) identified by motif UA3, which was the top motif in ZBTB7A K562 ChIP-seq data.

AP1      EGR1   MYC/MAX  TCF12
AP2      ELF1   NR2C2    TCF7L2
BHLHE40  ELK4   PBX3(1)  TEAD1
CEBPB    ESRRA  PRDM1    THAP1(2)
CREB1    ETS1   RUNX1    USF1
CTCF     GABP   RXRA     YY1
E2F1     GATA   SP1      ZBTB7A(3)
E2F4     GFI1   SREBF1   ZEB1
EBF1     HNF4A  STAT1    ZNF281


2.3.4 Expression analysis of candidate risk-associated genes

Although the genes identified by the exon or TSS SNPs are clearly good candidate genes for
analysis of their possible role in the development of colon cancer, it is difficult to definitively link a
target gene with a distal enhancer region because enhancers can function in either direction and
do not necessarily regulate the nearest gene.  In fact, the ENCODE Consortium recently reported
that, on average, a distal element can physically associate with ~3 different promoter regions
(Sanyal, Lajoie et al. 2012).  Also, only 27% of the distal elements showed an interaction with the
nearest TSS, although this increased to 47% when only expressed genes were used in the
analysis (Sanyal, Lajoie et al. 2012).  Taken together, these analyses suggest that examining the
3 nearest genes may produce a reasonable list of genes potentially regulated by the CRC
risk-associated enhancers. Therefore, we used the GENCODE V15 dataset and identified the 3
nearest promoters of coding genes and 3 nearest promoters of non-coding transcripts around
each of the 28 enhancers (Supplementary Table S2.3). We next limited the nearby coding and
non-coding transcripts to those expressed in either sigmoid colon RNA or HCT116 cells (Table
2.3); we note that taking into account expression did not greatly change the list of coding
transcripts but eliminated most of the non-coding transcripts which tend to be expressed in a very
cell-type specific manner.  Interestingly, several of the genes nearby the risk-associated
enhancers were also identified in the TSS analyses, suggesting that a putative causal gene
associated with CRC might be differentially regulated by risk-associated SNPs found in the
promoter and in a nearby enhancer  (Figure 2.2).  We note that in these cases, the promoters
and enhancers were identified by different risk-associated SNPs in high LD with a tag SNP, with
the promoters being identified by SNPs within +/- 2kb of the TSS and enhancers being identified
by distal SNPs. We further analyzed the expression levels of all genes directly linked to the risk
SNPs (by exons or TSS) and the expressed genes nearby the risk-associated enhancers in
normal colon and HCT116 tumor cells.  Shown in Figure 2.2 are the expression levels of each of
the 41 transcripts and the fold change in expression in HCT116 vs. normal cells; several of these
genes display robust changes in expression in the tumor cells.
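The nearest-gene step described above (take the 3 nearest promoters, then keep only those that are expressed) can be sketched as follows. Measuring distance from the enhancer midpoint is an assumption made for this example, as are the function name and the toy genes; the 0.5 FPKM cut-off is the expression threshold stated earlier in the text.

```python
def nearest_expressed(enhancer, genes, k=3, min_fpkm=0.5):
    """Return the expressed genes among the k nearest promoters to an enhancer.

    enhancer: (chrom, start, end)
    genes: list of (name, chrom, tss_position, fpkm)
    """
    chrom, start, end = enhancer
    midpoint = (start + end) // 2                     # assumed reference point
    same_chrom = [g for g in genes if g[1] == chrom]
    nearest = sorted(same_chrom, key=lambda g: abs(g[2] - midpoint))[:k]
    # Expression filtering happens after the nearest k are chosen, so the
    # result may contain fewer than k genes.
    return [name for name, _, _, fpkm in nearest if fpkm >= min_fpkm]


# Hypothetical toy data.
toy_enhancer = ("chr1", 100000, 102000)
toy_genes = [
    ("G1", "chr1", 90000, 5.0),
    ("G2", "chr1", 105000, 0.1),    # nearest, but below the FPKM cut-off
    ("G3", "chr1", 130000, 12.0),
    ("G4", "chr1", 200000, 3.0),    # expressed, but not among the 3 nearest
    ("G5", "chr2", 101000, 8.0),    # different chromosome
]
print(nearest_expressed(toy_enhancer, toy_genes))
```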
As a second approach to identify transcripts potentially regulated by the identified enhancers,
we developed a new statistical approach that employs RNA-seq data from TCGA.  We selected
the 10 nearest genes 5’ of and the 10 nearest genes 3’ of each of the 28 enhancers. Because of
the difference in gene density in different regions of the genome, the 20-gene span ranged from
786 kb to 7.5 Mb, depending on the specific enhancer.  Because several of the 28 enhancers are
clustered near each other, this resulted in a total of 182 unique genes.  We downloaded the RNA-
seq data for 233 colorectal tumor samples and 21 colorectal normal samples from the TCGA data
download website (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm) and determined if any
of the 182 genes show a significant increase or decrease (>2 fold change and P value < 0.01) in
colon tumors vs. normal colon (see Methods and Supplementary Figure S2.6 for an analysis of
potential TCGA batch effects). We then eliminated those genes whose expression change did not
correspond to the nature of the enhancer (e.g. a tumor-specific enhancer should not regulate a
gene that is higher in normal cells), leaving a total of 39 possible genes whose expression might
be differentially regulated in colon cancer by the risk enhancers (Table 2.5). We note that 5 of the
genes shown to be differentially expressed in the TCGA data (MYC, PITX1, POU5F1B, C5orf20,
and CDH3) are also in the set of nearest 3 genes to an enhancer having CRC risk-associated
SNPs.  We found that 0-6 differentially expressed genes were linked to an enhancer, with an
average of 4 transcripts per enhancer that showed correct differential expression in colon tumors.
Heatmaps of the expression of the 39 putative enhancer-regulated genes, as well as the
expression of the genes identified by exon and promoter SNPs, in the TCGA samples are shown
in Supplementary Figure S2.7. To determine if we could validate any of the putative enhancer
targets, we used eQTL analyses based on data from TCGA.  We began by identifying the SNPs
within each of the 28 enhancers that are on the Illumina WG SNP6 array used by TCGA.  
Unfortunately, these arrays include only 8% of the SNPs of interest (i.e. the exon, promoter, and
enhancer SNPs that are correlated with the CRC tag SNPs), greatly limiting our ability to
effectively utilize the eQTL methodology.  However, we did identify two examples of allelic
expression differences in the set of putative enhancer targets that correlated with SNPs in an
enhancer region.  Both of these SNPs fell within enhancer 19 and showed correlation with allelic
expression differences of the TMED6 gene (the two SNPs significantly associated with TMED6
expression, rs7203339 and rs1078621, had an FDR-adjusted P value < 0.1); enhancer 19 falls
within the intron of the CDH1 gene, which is 600 kb from the transcription start site of the TMED6
gene (Figure 2.3).  A summary of the eQTL analysis of enhancer and promoter risk-associated
SNPs can be found in Supplementary Table S2.7 and Supplementary Figure S2.8.  
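The two filters applied to the TCGA expression data (a significance cut-off, then consistency between the direction of change and the type of enhancer) can be sketched as below. The sketch assumes fold changes are expressed as log2 values, so that a more than 2-fold change corresponds to an absolute value above 1; the function name, class labels and toy values are hypothetical.

```python
def consistent_targets(genes, enhancer_class, min_abs_log2fc=1.0, max_p=0.01):
    """Filter candidate target genes for one enhancer.

    genes: list of (name, log2_fold_change_tumor_vs_normal, p_value)
    enhancer_class: "tumor", "normal" or "common"
    """
    kept = []
    for name, log2fc, p_value in genes:
        # Significance filter: >2-fold change and P value < 0.01.
        if abs(log2fc) <= min_abs_log2fc or p_value >= max_p:
            continue
        # Direction filter: a tumor-specific enhancer should not regulate a gene
        # that is higher in normal cells, and vice versa.
        if enhancer_class == "tumor" and log2fc < 0:
            continue
        if enhancer_class == "normal" and log2fc > 0:
            continue
        kept.append(name)
    return kept


# Hypothetical toy values.
toy_de = [
    ("G1", 2.5, 0.001),
    ("G2", -1.8, 0.0005),
    ("G3", 0.4, 0.0001),   # significant, but below the fold-change cut-off
    ("G4", 3.0, 0.2),      # large change, but not significant
]
for cls in ("common", "tumor", "normal"):
    print(cls, consistent_targets(toy_de, cls))
```

For an enhancer present in both normal and tumor cells, genes changing in either direction are retained.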


          Table 2.5 Linking transcripts to enhancers using TCGA data.

Shown is the subset of the 10 nearest 5’ and 10 nearest 3’ transcripts for each enhancer
that show significant gene expression differences in normal vs. tumor samples, as determined
using RNA-seq data from TCGA. The numbers in parentheses indicate the fold change, with
positive indicating a higher expression in tumors. The 7 normal-specific enhancers are shown
in bold and all genes correlated with these enhancers should be expressed higher in normal
cells and thus have a negative value.  The R vs L designation indicates the direction and
relative location of the transcript with respect to each enhancer.

Region  Enhancer    Correlated transcripts
1       Enhancer1   PITX1 (1.37_L1), C5orf20 (-1.96_R2), TIFAB (-1.64_R3), CXCL14 (-1.39_R5), SLC25A48 (-1.34_R7)
1       Enhancer2   PITX1 (1.37_L1), C5orf20 (-1.96_R2), TIFAB (-1.64_R3), CXCL14 (-1.39_R5), SLC25A48 (-1.34_R7)
1       Enhancer3   C5orf20 (-1.96_R2), TIFAB (-1.64_R3), CXCL14 (-1.39_R5), SLC25A48 (-1.34_R7)
1       Enhancer4   PITX1 (1.37_L2), C5orf20 (-1.96_R1), TIFAB (-1.64_R2), CXCL14 (-1.39_R4), SLC25A48 (-1.34_R6), TGFBI (2.74_R10)
1       Enhancer5   PITX1 (1.37_L2), C5orf20 (-1.96_R1), TIFAB (-1.64_R2), CXCL14 (-1.39_R4), SLC25A48 (-1.34_R6), TGFBI (2.74_R10)
1       Enhancer6   PITX1 (1.37_L1)
2       Enhancer7   SQLE (1.8_L6), FAM84B (1.01_L2), POU5F1B (3.02_L1), MYC (1.58_R2), PVT1 (2.44_R3), GSDMC (1.75_R4)
2       Enhancer8   none
3       Enhancer9   ARRB1 (-1.15_R8)
4       Enhancer10  NRIP2 (-1.07_L10), FOXM1 (1.47_L9), TEAD4 (2.15_L6), RAD51AP1 (1.36_R5), GALNT8 (-1.58_R9), KCNA6 (-2.16_R10)
5       Enhancer11  LIMA1 (-1.16_L4), METTL7A (-2.65_R3), POU6F1 (-1.14_R9)
5       Enhancer12  RACGAP1 (1.06_L10), ASIC1 (1.49_L9), LIMA1 (-1.16_L4), METTL7A (-2.65_R3), POU6F1 (-1.14_R9)
5       Enhancer13  LIMA1 (-1.16_L4), METTL7A (-2.65_R3), POU6F1 (-1.14_R9)
5       Enhancer14  LIMA1 (-1.16_L4), METTL7A (-2.65_R3), POU6F1 (-1.14_R9)
5       Enhancer15  RACGAP1 (1.06_L10), ASIC1 (1.49_L9), LIMA1 (-1.16_L4), METTL7A (-2.65_R3), POU6F1 (-1.14_R9)
5       Enhancer16  RACGAP1 (1.06_L10), ASIC1 (1.49_L9)
6       Enhancer17  SMPD3 (-1.3_L4), CDH3 (6.24_L2), TMED6 (-1.36_R10)
6       Enhancer18  SMPD3 (-1.3_L4), TMED6 (-1.36_R10)
6       Enhancer19  SMPD3 (-1.3_L4), CDH3 (6.24_L2), TMED6 (-1.36_R10)
6       Enhancer20  SMPD3 (-1.3_L4), CDH3 (6.24_L2), TMED6 (-1.36_R10)
7       Enhancer21  KATNAL2 (-1.06_L9), ZBTB7C (-2.81_L3), LIPG (1.3_R6), ACAA2 (-1.43_R7)
7       Enhancer22  KATNAL2 (-1.06_L9), ZBTB7C (-2.81_L3), LIPG (1.3_R6), ACAA2 (-1.43_R7)
8       Enhancer23  CHST8 (-2_R9), KCTD15 (-1.02_R10)
8       Enhancer24  CHST8 (-2_R9), KCTD15 (-1.02_R10)
9       Enhancer25  RBBP8NL (1.33_R3), C20orf166-AS1 (-3.81_R6), SLCO4A1 (3.19_R7), LOC100127888 (2.45_R8), NTSR1 (-2.34_R9), MRGBP (1.44_R10)
9       Enhancer26  RBBP8NL (1.33_R2), C20orf166-AS1 (-3.81_R5), SLCO4A1 (3.19_R6), LOC100127888 (2.45_R7), NTSR1 (-2.34_R8), MRGBP (1.44_R9)
9       Enhancer27  RBBP8NL (1.33_R3), C20orf166-AS1 (-3.81_R6), SLCO4A1 (3.19_R7), LOC100127888 (2.45_R8), NTSR1 (-2.34_R9), MRGBP (1.44_R10)
9       Enhancer28  RBBP8NL (1.33_R2), C20orf166-AS1 (-3.81_R5), SLCO4A1 (3.19_R6), LOC100127888 (2.45_R7), NTSR1 (-2.34_R8), MRGBP (1.44_R9)

 
Figure 2.3 Linking a transcript to an enhancer using TCGA data.
A) Shown is the location of enhancer 19 and the position of the three SNPs (in red) identified in
the eQTL studies and two other SNPs (in blue) identified by the FunciSNP analysis but not
present on the SNP array, in relation to the H3K27Ac, RNA-seq, and TCF7L2 ChIP-seq data for
that region.  Also shown are the ENCODE ChIP-seq transcription factor tracks from the UCSC
genome browser.  B) The expression of the TMED6 RNA is shown for samples having
homozygous or heterozygous alleles for 3 SNPs in enhancer 19. The upper and lower quartiles
of the box plots are the 75th and 25th percentiles, respectively. The whisker top and bottom are
the 90th and 10th percentiles, respectively. The horizontal line through the box is the median value. The
P value corresponds to the regression coefficient based on the residual expression level and the
germ line genotype. Sample size is listed under each genotype. C) A schematic of the gene
structure in the genomic region around enhancer 19 (yellow box) is shown; the arrows indicate
the direction of transcription of each gene.  The 3 genes in the enhancer 19 region that showed
differential expression in normal vs tumor colon samples (Table 2.5) are indicated; of these,
only TMED6 was identified in the eQTL analysis.
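The regression referred to in panel B relates expression to germ line genotype. As a rough illustration only, the least-squares slope of expression on allele dosage (0, 1 or 2 copies of the alternative allele) can be computed as below. This sketch omits the covariate adjustment that yields residual expression and the P value calculation, and the function name and toy values are hypothetical.

```python
def genotype_slope(dosages, expression):
    """Least-squares slope and intercept of expression regressed on allele dosage."""
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy samples: two per genotype class.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_expression = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]
slope, intercept = genotype_slope(toy_dosages, toy_expression)
print("expression change per alternative allele:", round(slope, 3))
```

A slope that differs significantly from zero is what an eQTL test reports as an association between the SNP and the transcript.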
 

 
2.3.5 The effect of enhancer deletion on the transcriptome
 
The expression analyses described above provide a list of genes that potentially are regulated by
the CRC risk-associated enhancers.  However, it is possible that the enhancers regulate only a
subset of those genes and/or the target genes are at a greater distance than was analyzed.  One
approach to identify targets of the CRC risk-associated enhancers would be to delete an
enhancer from the genome and determine changes in gene expression.  As an initial test of this
method, we selected enhancer 7, located at 8q24.  The region encompassing this enhancer has
previously been implicated in regulating expression of MYC (Bond-Smith, Banga et al. 2012),
which is located 335 kb from enhancer 7.  We introduced guide RNAs that flanked enhancer 7,
along with Cas9, into HCT116 cells, and identified cells that showed deletion of the enhancer.  We
then performed expression analysis using gene expression arrays, identifying 105 genes whose
expression was down-regulated by loss of the enhancer (Supplementary Table S2.8); the
closest one was MYC, which was expressed 1.5 times higher in control vs. deleted cells (Figure
2.4).


 
   Figure 2.4 Identification of genes affected by deletion of enhancer 7.
   (A) Shown are the expression differences (x axis) and the significance of the change (y axis)
of the genes in the control HCT116 cells vs. HCT116 cells having complete deletion of
enhancer 7. The Illumina Custom Differential Expression Algorithm was used to determine
P values to identify the significantly altered genes; 3 replicates each for the control and deleted
cells were used. Genes on chromosome 8 (the location of enhancer 7) are shown in blue.  
The spot representing the MYC gene is indicated by the arrow. (B) Shown are all genes on
chromosome 8 that change in expression and the 10 genes showing the largest changes in
expression upon deletion of enhancer 7. The location of the enhancer is indicated and the
chromosome number is shown on the outside of the circle. (C) The genes identified as
potential targets using TCGA expression data are indicated; of these, MYC is the only one
showing a change in gene expression upon deletion of the enhancer.


2.4 Discussion

 
We have used the program FunciSNP (Coetzee, Rhie et al. 2012), in combination with
genomic, epigenomic, and transcriptomic data, to analyze 25 tag SNPs (and all SNPs in high LD
with those tag SNPs) that have been associated with an increased risk for CRC (Zanke,
Greenwood et al. 2007, Houlston, Webb et al. 2008, Houlston, Cheadle et al. 2010, Dunlop,
Dobbins et al. 2012, Bauer, Kamran et al. 2013, Ding, Lee et al. 2013). Taken together, we have
identified a total of 80 genes that may be regulated by risk-associated SNPs. Of these, 24 are
directly linked to a risk-associated SNP located within an exon or proximal promoter region and 56 additional
genes are putative target genes of risk-associated enhancers; see Figure 2.5 for a schematic
summary of the location of the tag and LD SNPs and associated genes and Supplementary
Table S2.10 for a complete list of genes and how they were identified.

 
Figure 2.5 Summary of identified candidate genes correlated with increased risk for CRC.  
Shown are the 80 candidate genes identified in this study.  For the gene names, green means
that it was only identified as a potential enhancer target; the other genes were identified as
direct targets either by an exon SNP or a TSS SNP; the putative enhancer target genes were
selected as described in the text. For each tag SNP, the relative number of SNPs that identified
an exon (red portion), a TSS (blue portion), or an enhancer (green portion) is shown by the bar
graph. The 9 genomic regions that harbor CRC risk enhancers are shown by the green
rectangles outside the circle.

[Figure 2.5 labels. Chromosomes: 1, 3, 5, 6, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20, X.
Tag SNPs: rs10795668, rs10774214, rs11169552, rs10411210, rs10936599, rs10505477, rs16892766,
rs6687758, rs6691170, rs1665650, rs3802842, rs3824999, rs7136702, rs4444235, rs4779584,
rs9929218, rs4939827, rs2423279, rs4925386, rs1321311, rs6983267, rs7014346, rs5934683,
rs961253, rs647161.
Candidate genes: RP11-255B23.3, RP11-382A18.1, RP11-382A18.2, RP11-157P1.4, LIPG, AP001372.2,
CATSPER3, GPATCH1, CABLES2, CDKN1A, C19orf40, ACTRT3, C11orf53, C11orf92, C11orf93, TMCO7,
SMAD7, C12orf5, GREM1, CCND2, LAMA5, RHPN2, POLD3, H2AFY, UTP23, MYNN, LARP4, KCNE3, DIP2B,
RPS21, PITX1, BMP4, EIF3H, CDH1, TERC, ATF1, DYM, MYC, CTIF, TGFBI, C5orf20, TIFAB, CXCL14,
SLC25A48, SQLE, FAM84B, GSDMC, PVT1, ARRB1, LIPT2, NRIP2, FOXM1, TEAD4, RAD51AP1, GALNT8,
KCNA6, RACGAP1, ASIC1, LIMA1, METTL7A, POU6F1, CDH3, SMPD3, TMED6, KATNAL2, ZBTB7C, ACAA2,
CHST8, KCTD15, RBBP8NL, C20orf166-AS1, SLCO4A1, NTSR1, LOC100127888, MRGBP, POU5F1B, LRRC34,
LRRIQ4, FAM186A, RP11-15F12.1.]

  65
 
  Of the 25 tag SNPs, only one is found within a coding exon, occurring in the third exon of
the MYNN gene and resulting in a synonymous change that does not alter the encoded protein.
However, by analysis of SNPs in high LD with the 25 tag SNPs, we identified 5 genes that harbor
damaging SNPs and which are expressed in colon cells (HCT116, normal sigmoid colon, or
TCGA tumors); these are POU5F1B, RHPN2, UTP23, LAMA5, and FAM186A. Interestingly, the
retrogene POU5F1B which encodes a homolog of the stem cell regulator OCT4 has recently been
associated with prostate cancer susceptibility (Breyer, Dorset et al. 2014). We also identified 23
genes (21 coding and 2 non-coding) that harbor highly correlated SNPs in their promoter regions
and are expressed in colon cells.   Several of the genes that we have directly linked to increased
risk for CRC by virtue of promoter SNPs show large changes in gene expression in tumor vs.
normal colon tissue.  For example, TERC, the non-coding RNA that is a component of the
telomerase complex, was identified by a promoter SNP and has higher expression in a subset of
colon tumors (Supplementary Figure S2.7A). Similarly, CDH3 (P-cadherin) was identified by a
promoter SNP and shows increased expression in many of the colon tumors.  Both TERC and
CDH3 have previously been linked to cancer (Cao, Bryan et al. 2008, Paredes, Figueiredo et al.
2012).  Promoter SNPs also identified three genes encoding uncharacterized proteins (C11orf93,
C11orf92, and C11orf53) clustered together on chromosome 11.  Inspection of H3K4me3 and H3K27Ac
ChIP-seq signals suggested that these genes are in open chromatin in normal sigmoid colon, but not in
HCT116.  Accordingly, the TCGA gene expression data showed that all 3 genes are down-
regulated in a subset of human CRC tumors  (Supplementary Figure S2.7A). Additional genes
identified by promoter SNPs that have been linked to cancer include ATF1, BMP4, CDH1,
CDKN1A, EIF3H, GREM1, LAMA5, and RHPN2 (Garte 1993, Cai, Zhao et al. 2007, Chen, Xu et
al. 2008, 2012, Huang, Guo et al. 2012, Paredes, Figueiredo et al. 2012, Abba, Patil et al. 2013,
Carneiro, Figueiredo et al. 2013, Cheung, Athineos et al. 2013, Danussi, Akavia et al. 2013,
Karagiannis, Berk et al. 2013, Aran and Hellman 2014).  For example, BMP4 is up-regulated in
the HCT116 cells and has been suggested to confer an invasive phenotype during progression of
colon cancer (Cai, Zhao et al. 2007).  Interestingly, we also identified GREM1, an antagonist of
BMP proteins, and showed that expression of GREM1 is decreased in HCT116. The down-regulation
of the antagonist GREM1 and the up-regulation of the cancer-promoting BMP4 may
cooperate to drive colon cancer progression. LAMA5 is a subunit of laminin-10, laminin-11 and
laminin-15. Laminins, a family of extracellular matrix glycoproteins, are the major non-collagenous
constituent of basement membranes and have been implicated in a wide variety of biological
processes including cell adhesion, migration, signaling, and metastasis (Aumailley 2013).
We identified 28 enhancers, clustered in 9 genomic regions, that harbor correlated SNPs. It is
important to note that in our studies we have used the appropriate cell types and the appropriate
epigenetic mark to identify CRC-associated enhancers.  Previous analyses have attempted to link
SNPs to enhancers by using transcript abundance, epigenetic marks, or transcription factor
binding from non-colon cell types (Carvajal-Carmona, Cazier et al. 2011).  In contrast, we have
used normal and tumor cells from the colon.  Of equal importance is the actual epigenetic mark
that is used to identify enhancers.  A previous study used H3K4me1 to identify genomic regions
that were differently marked between normal and tumor colon cells (Akhtar-Zaidi, Cowper-Sal-lari
et al. 2012). However, although H3K4me1 is associated with enhancers, this mark does not
specifically identify active enhancers. Some regions marked by H3K4me1 are classified as “weak”
or “poised” enhancers and it is thought that these regions may become active in different cells or
developmental states (Bernstein, Stamatoyannopoulos et al. 2010).  In contrast, H3K27Ac is
strongly associated with active enhancers (Bonn, Zinzen et al. 2012, Calo and Wysocka 2013)
and we feel that this mark is the most appropriate one for identification of CRC-associated risk
enhancers.  
Although it is not possible to conclusively know a priori what gene is regulated by each of the
identified enhancers, we have derived a list of putative CRC risk-associated enhancer target
genes by examining gene expression data from HCT116 cells and from a large number of colon
tumors.  Several of the genes that are possible enhancer targets are transcription factors that
have previously been linked to cancer, including H2AFY, MYC, SMAD7, PITX1, TEAD4, and
ZBTB7C.  MYC, of course, has been linked to colon cancer by many studies because it
is a downstream mediator of WNT signaling, which is strongly correlated with colon cancer
(2012).   In addition, PITX1, TEAD4, and ZBTB7C are all transcription factors that have been
previously linked to the control of cell proliferation, specification of cell fate, or regulation of
telomerase activity (Cao, Rabinovich et al. 2011, Home, Saha et al. 2012, Jeon, Kim et al. 2012,
Knosel, Chen et al. 2012).  Also, PVT1 is a Myc-regulated non-coding RNA that may play a role in
neoplasia (Birney, Stamatoyannopoulos et al. 2007, Huppi, Pitt et al. 2012).    
In conclusion, we have used epigenomic and transcriptome information from normal and
tumor colon cells to identify a set of genes that may be involved in an increased risk for the
development of colon cancer. We realize that we cast a rather large net by analyzing 10 genes 5’
and 10 genes 3’ of each enhancer.  We note that 5 of the genes shown to be differentially
expressed in the TCGA data (MYC, PITX1, POU5F1B, C5orf20, and CDH3) are also in the set of
nearest 3 genes to an enhancer having CRC risk-associated SNPs.  However, enhancers can
also work at large distances.  In fact, the eQTL analysis identified TMED6 as a potential target of
enhancer 19 (over 600 kb away) and deletion of enhancer 7 identified MYC as a target (335 kb
away).  Future analyses of the entire set of CRC risk-associated enhancers are required to
confirm the additional putative long range regulatory loops suggested by our studies. Such studies
will provide a high confidence list of genes which, when combined with the genes identified by the
TSS risk-associated SNPs, should be prioritized for analysis in tumorigenicity assays.




2.5 Methods

 
2.5.1 RNA-seq

 
         RNA-seq data were downloaded from the Reference Epigenome Mapping Center for
analysis of gene expression in sigmoid colon cells. For HCT116 colon cancer cells, RNA was
prepared using Trizol (Life Technologies, Carlsbad, CA), and paired-end libraries were prepared
using the Illumina TruSeqV2 Sample Prep Kit (Catalog #15596-026), starting with 1 µg total
RNA. Libraries were barcoded, pooled, and sequenced using an Illumina HiSeq. For analysis of
RNA-seq data, we used Cufflinks (Trapnell, Williams et al. 2010), a program of “alignment to
annotation” having discontinuous mapping to the reference genome. mRNA abundance was
measured by calculating FPKM (expected fragments per kilobase of transcript per million
fragments sequenced), to allow inter-sample comparisons. We specified the -G option with the
GENCODE V15 comprehensive annotation so that the program used only alignments that are
structurally compatible with the reference transcripts provided. Two biological replicates were
performed, and the mean FPKM of the two replicates represents the expression of each
gene. We categorized genes into non-expressed, low expressed, and expressed based on the
distribution of the gene FPKM (Figure S2.2), plotted with the R package “ggplot2”.  
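The replicate averaging and three-way categorization described above can be sketched in a few lines of Python. This is only an illustration, not the code used in the study: the cutoffs (0.5 and 2 FPKM) are taken from the legend of Supplementary figure 2.2, the handling of values that fall exactly on a cutoff is an assumption, and the gene names and FPKM values are hypothetical.

```python
from statistics import mean

# Cutoffs taken from the legend of Supplementary figure 2.2; how values that
# fall exactly on a cutoff are assigned is an assumption of this sketch.
LOW_CUTOFF = 0.5
HIGH_CUTOFF = 2.0


def categorize_gene(fpkm_rep1, fpkm_rep2):
    """Return the mean FPKM of two replicates and its expression category."""
    avg = mean([fpkm_rep1, fpkm_rep2])
    if avg < LOW_CUTOFF:
        category = "non-expressed"
    elif avg <= HIGH_CUTOFF:
        category = "low expressed"
    else:
        category = "expressed"
    return avg, category


# Hypothetical FPKM values for illustration only (not data from this study).
replicates = {
    "GENE_A": (0.1, 0.3),
    "GENE_B": (0.9, 1.5),
    "GENE_C": (12.0, 8.0),
}
categories = {gene: categorize_gene(*values) for gene, values in replicates.items()}
```

Here `categories` maps each hypothetical gene to its mean FPKM and category label.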
RNA-seq data for 233 colorectal tumor samples and 21 colorectal normal samples were
downloaded from the TCGA data download website (https://tcga-data.nci.nih.gov/tcga/dataAccessMatrix.htm).
The data were all generated on the Illumina HiSeq
platform, and mapped with the RSEM algorithm and normalized so that the third quartile for each
sample equals 1000. Entrez gene IDs were used for mapping to genomic locations using
GenomicRanges (http://www.bioconductor.org/packages//2.12/bioc/html/GenomicRanges.html).
To identify transcripts differentially expressed in the tumor samples, we selected the 10 nearest
genes 5’ of and the 10 nearest genes 3’ of each of the 28 enhancers. After removing the non-
expressed genes, we then log2-transformed the expression data [log2(RSEM+1)], and performed
a t test on gene expression between the normal group and the tumor group for each gene using
the 254 TCGA colorectal RNA-seq datasets. We selected genes that showed a statistically
significant 2-fold change in expression (P < 0.01, after adjustment by Benjamini and Hochberg’s
False Discovery Rate method).  
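As an illustration of the selection step, the following Python sketch applies the Benjamini-Hochberg adjustment to a list of raw P values and then applies the two cutoffs (adjusted P < 0.01 and a log2 difference greater than 1). It assumes the raw P values have already been obtained from a per-gene t test computed elsewhere; the function names and all numbers are hypothetical and are not taken from the study.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value to the smallest so that adjusted values
    # never increase as the raw P value decreases.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def mean_log2_difference(tumor_rsem, normal_rsem):
    """Mean log2(RSEM + 1) in tumors minus mean log2(RSEM + 1) in normals."""
    tumor = [math.log2(x + 1) for x in tumor_rsem]
    normal = [math.log2(x + 1) for x in normal_rsem]
    return sum(tumor) / len(tumor) - sum(normal) / len(normal)


def is_differentially_expressed(adjusted_p, log2_difference):
    """Apply the cutoffs: adjusted P < 0.01 and a log2 difference above 1."""
    return adjusted_p < 0.01 and abs(log2_difference) > 1


# Hypothetical raw P values for four genes (illustration only).
raw_p = [0.001, 0.008, 0.04, 0.5]
adjusted_p = benjamini_hochberg(raw_p)
# Hypothetical RSEM values for the first gene.
difference = mean_log2_difference([7, 15], [1, 3])
selected = is_differentially_expressed(adjusted_p[0], difference)
```

A log2 difference of 1 corresponds to a 2-fold change, so the second cutoff mirrors the fold-change criterion stated in the text.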
To generate the heatmap showing expression of genes in the TCGA samples, we log2-transformed
the expression data of the 254 TCGA colorectal RNA-seq samples [log2(RSEM+1)].
Then we computed the mean and standard deviation of the expression of each gene ($\bar{X}$
and $s$). We normalized gene expression by $Z = (X - \bar{X})/s$. Hierarchical clustering with
Ward’s method was applied to the normalized TSS/exon gene expression.  
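The per-gene normalization can be written as a short Python function. This is a minimal sketch: the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not state which estimator was used, and the input values are hypothetical.

```python
from statistics import mean, stdev


def z_scores(log2_expression):
    """Center and scale one gene's log2 expression values across samples."""
    gene_mean = mean(log2_expression)
    # Sample standard deviation; the choice of estimator is an assumption.
    gene_sd = stdev(log2_expression)
    return [(value - gene_mean) / gene_sd for value in log2_expression]


# Hypothetical log2(RSEM + 1) values for one gene across three samples.
normalized = z_scores([1.0, 2.0, 3.0])
```

After this transformation every gene has mean 0 and unit standard deviation, so genes with very different absolute expression levels can share one heatmap color scale.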
2.5.2 ChIP-seq analysis

         Two replicate H3K27Ac ChIP-seq datasets from HCT116 cells and two replicate H3K27Ac
ChIP-seq datasets from normal sigmoid colon were analyzed using the Sole-search ChIP-seq
peak calling program (Blahnik, Dou et al. 2010, Blahnik, Dou et al. 2011) using the following
parameters (Permutation:5; Fragment:250; AlphaValue: 0.00010 = 1.0E-4; FDR: 0.00010 = 1.0E-
4; PeakMergeDistance:0; HistoneBlurLength:1200). Each dataset was analyzed separately and
also analyzed as a merged dataset for HCT116 or sigmoid colon. The merged H3K27Ac peaks
from HCT116 or sigmoid colon were analyzed using the GenomicRanges package of
Bioconductor to identify promoter vs distal peaks.  
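The promoter vs distal split can be illustrated without GenomicRanges. The Python sketch below is not the analysis code used here; it assumes a window of +/- 2 kb around a TSS (borrowed from the enhancer definition given in the Introduction of Chapter 3), handles a single chromosome, and uses hypothetical coordinates.

```python
import bisect

# Window size in bp; an assumption borrowed from the +/- 2 kb definition of a
# distal H3K27Ac peak given in Chapter 3.
PROMOTER_WINDOW = 2000


def classify_peak(peak_start, peak_end, sorted_tss):
    """Label a peak 'promoter' if any TSS lies within the window of the peak.

    sorted_tss must be a sorted list of TSS positions on the same chromosome.
    """
    lo = bisect.bisect_left(sorted_tss, peak_start - PROMOTER_WINDOW)
    hi = bisect.bisect_right(sorted_tss, peak_end + PROMOTER_WINDOW)
    return "promoter" if hi > lo else "distal"


# Hypothetical TSS positions and peaks on one chromosome.
tss_positions = [10000, 50000]
near_tss = classify_peak(11500, 12500, tss_positions)
far_from_tss = classify_peak(20000, 21000, tss_positions)
```

Binary search over sorted TSS positions keeps the lookup fast even when tens of thousands of peaks are classified.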
2.5.3 Enhancer deletion

          Guide RNAs designed to recognize chr8: 128412821-128412843 and chr8: 128414816-
128414838 (hg19) were cloned into a gRNA cloning vector (Addgene plasmid 41824) and
introduced into HCT116 cells by transfection, along with a plasmid encoding Cas9 and GFP.  
Cells were sorted using a flow cytometer to capture the cells having high GFP signals and then
colonies were grown from single cells.  Complete deletion of all alleles for enhancer 7 was
confirmed by PCR using primers flanking the enhancer.  RNA analysis was performed in triplicate
using HumanHT-12 v4 Expression BeadChip arrays (Illumina), comparing the deleted cells to
parental HCT116 cells.
2.5.4 Analysis of FunciSNP and correlated SNPs effects

         To identify SNPs correlated with the 25 CRC tag SNPs and that overlap with chromatin
biofeatures, we used the R package FunciSNP (Coetzee, Rhie et al. 2012), which is available in
Bioconductor. We used H3K27Ac ChIP-seq data from HCT116 cells and sigmoid colon tissue, and
as biofeatures we used exon, intron, UTR, and TSS annotations generated from GENCODE V15. We
ran FunciSNP with the following parameters: +/- 200 kb around each of the 25 tag SNPs and
r² > 0.1. To analyze the potential effects of correlated SNPs on protein coding, we employed SnpEff
and Provean using the suggested default parameters. For analysis of the effects of SNPs on
transcription factor motifs, we employed a method developed by Dennis Hazelett (personal communication).  
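For readers unfamiliar with the r² threshold, the following Python sketch computes the standard linkage disequilibrium r² between two biallelic SNPs from phased haplotypes. It illustrates the textbook definition only and is not a description of how FunciSNP computes the statistic internally; the haplotypes are hypothetical.

```python
def ld_r_squared(haplotypes):
    """Compute r^2 between two biallelic SNPs.

    haplotypes is a list of (allele_a, allele_b) pairs coded as 0 or 1,
    one pair per phased chromosome.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n
    p_b = sum(b for _, b in haplotypes) / n
    p_ab = sum(a * b for a, b in haplotypes) / n
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    denominator = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denominator == 0:
        raise ValueError("r^2 is undefined when a SNP is monomorphic")
    return d * d / denominator


# Hypothetical haplotypes: perfectly linked alleles, then independent alleles.
perfect_ld = ld_r_squared([(1, 1), (1, 1), (0, 0), (0, 0)])
no_ld = ld_r_squared([(1, 1), (1, 0), (0, 1), (0, 0)])
```

A pair of SNPs would pass the filter described above when this value exceeds 0.1.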
2.5.5 Batch effects analysis

          We note that TCGA has strict sample criteria. Each frozen primary tumor specimen has a
companion normal tissue specimen which could be blood/blood components (including DNA
extracted at the tissue source site), or adjacent normal tissue taken from greater than 2 cm from
the tumor. Each tumor and adjacent normal tissue specimen (if available) were embedded in
optimal cutting temperature (OCT) medium and a histologic section was obtained for review. Each
H&E stained case was reviewed by a board-certified pathologist to confirm that the tumor
specimen was histologically consistent with colon adenocarcinoma and the adjacent normal
specimen contained no tumor cells. The tumor sections were required to contain an average of
60% tumor cell nuclei (TCGA has found that this provides a sufficient proportion so that the tumor
signal can be distinguished from other cells), with less than 20% necrosis for inclusion in the study
per TCGA protocol requirements. To assess potential batch effects, we applied MBatch software,
which was developed by the MD Anderson Cancer Center and has been widely used in TCGA
Consortium studies (2012, TheCancerGenomeAtlas 2012), to perform hierarchical
clustering and Principal Component Analysis (PCA) on the
colorectal TCGA data sets: level 3 mRNA expression (RNA-seq, Illumina HiSeq), level 3 DNA
methylation (Infinium HM450K microarray), and level 4 SNP array CNV by gene (GW SNP 6). We
assessed batch effects for two variables: batch ID and tissue source site. For hierarchical
clustering, MBatch uses the average linkage algorithm with 1 minus the Pearson correlation
coefficient as the dissimilarity measure. The samples were clustered after labeling with different
colors, each of which corresponds to a batch ID or a tissue source site. (Figures S2.6a.1,
S2.6b.1, and S2.6c.1). For PCA, MBatch plotted four principal components (Figures S2.6a.2,3,
S2.6b.2, 3, and S2.6c.2, 3). Samples with the same batch ID (or tissue source site) were labeled
as same color and shape and were connected to the batch centroids. The centroids were
computed by taking the mean across all samples in the same batch.  To assess batch effects on
mRNA expression (Supplementary Fig. S2.6a), genes with zero values were removed and
normalized gene expression values were log2-transformed before analyzing batch effects. Batches
132 and 154 stood out in one comparison (Comp1 vs Comp2) but not in the other comparisons
(Supplementary Fig. S2.6a.2). The remaining batches or tissue source sites did not stand out in clustering or in
any of the PCA plots; thus the data is not supportive of a strong batch effect and all data was
used for analysis. When batch effect on CNV (Supplementary Fig. S2.6b) was analyzed, the
centroid for the NH tissue source site stood out among other batches. The remaining batches or
tissue source sites did not stand out in clustering or in any of the PCA plots.  We did not apply
correction on the data because (i) there were only two samples and a centroid calculated by only
two samples is likely not accurate, (ii) the two samples within the NH batch were not far from other
individual samples, and (iii) two samples would not dramatically affect our analysis of 233
samples.  When assessing batch effects on DNA methylation, no batches or tissue
source sites stood out in clustering or in any of the PCA plots (Supplementary Fig. S2.6c).  In
summary, none of the samples consistently show batch effects in both clustering and PCA
algorithms. Based on the above analysis, we believe that batch effects among the data sets are
not dramatically influencing our analysis.
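Two quantities used in this batch effects analysis, the clustering dissimilarity (1 minus the Pearson correlation coefficient) and the batch centroids (the mean across all samples in a batch), are simple to compute. The Python sketch below illustrates both under stated assumptions; it is not MBatch code, the function names are hypothetical, and the example values are invented.

```python
import math
from collections import defaultdict


def pearson_dissimilarity(x, y):
    """Return 1 minus the Pearson correlation of two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def batch_centroids(coordinates, batch_ids):
    """Average the PCA coordinates of all samples sharing a batch label."""
    grouped = defaultdict(list)
    for point, batch in zip(coordinates, batch_ids):
        grouped[batch].append(point)
    return {
        batch: tuple(sum(axis) / len(points) for axis in zip(*points))
        for batch, points in grouped.items()
    }


# Hypothetical expression profiles and PCA coordinates (illustration only).
identical_trend = pearson_dissimilarity([1, 2, 3], [2, 4, 6])
opposite_trend = pearson_dissimilarity([1, 2, 3], [3, 2, 1])
centroids = batch_centroids([(0, 0), (2, 2), (10, 10)], ["A", "A", "B"])
```

The dissimilarity ranges from 0 (perfectly correlated samples) to 2 (perfectly anti-correlated samples). A centroid computed from very few samples, as in the NH case noted above, is simply the mean of those few points.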
2.5.6 eQTL analyses

         We employed a two-step linear regression model, which considers germline
genotype, somatic copy number variation, and DNA methylation at gene promoters, to perform eQTL
analysis (Abba, Patil et al. 2013). We selected 228 patients with both tumor samples and
matched normal blood or normal tissue samples from the TCGA colorectal cancer data set. For
each of these patients, we obtained the germline genotypes from normal blood or normal tissue
samples using data from the GW SNP6 array platform. We directly downloaded gene-level
somatic copy number, gene isoform expression (from the  RNAseqHiseq Illumina platform) and
DNA methylation data (from the HM450K platform) for each tumor sample from the TCGA data
download website
(http://gdac.broadinstitute.org/runs/analyses__2014_01_15/data/COAD/20140115/). To determine
DNA methylation of a promoter, we calculated the average DNA methylation from 100 bp upstream
to 700 bp downstream of the TSS for a transcript. We fit the germline genotype of patients, the
continuous DNA methylation level of promoters, and the CNV of matched tumor samples into the
two-step multivariate linear regression model. A total of 60 SNPs, including 6 tag SNPs, 18 SNPs within
risk enhancers, and 45 SNPs within TSS regions, were present on the GW SNP6 array.  eQTL
analyses were performed using these 60 SNPs and the genes identified by exon or TSS SNPs or
by differential expression analysis (see Table 2.2 and Table 2.5). To reduce false positives, we
excluded genes showing log2 expression less than 2 in over 90% of the samples. The
Benjamini-Hochberg method was used to correct the original P values, and an FDR of 0.1 was used as the
threshold of significant association.  
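The promoter methylation summary used in the regression can be sketched as follows. This Python example is illustrative only: it assumes the window runs from 100 bp upstream to 700 bp downstream of the TSS in the direction of transcription (the strand handling is an assumption), and the probe positions and beta values are hypothetical.

```python
def promoter_window(tss, strand, upstream=100, downstream=700):
    """Return the (start, end) promoter window around a TSS.

    Upstream and downstream are interpreted relative to the direction of
    transcription, which is an assumption of this sketch.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def promoter_methylation(probes, tss, strand):
    """Average the beta values of probes falling inside the promoter window.

    probes is an iterable of (position, beta) pairs on one chromosome.
    Returns None when no probe falls inside the window.
    """
    start, end = promoter_window(tss, strand)
    betas = [beta for position, beta in probes if start <= position <= end]
    return sum(betas) / len(betas) if betas else None


# Hypothetical HM450K-style probes (position, beta value).
probes = [(950, 0.2), (1000, 0.4), (1600, 0.6), (1800, 0.9)]
plus_strand_level = promoter_methylation(probes, tss=1000, strand="+")
minus_strand_level = promoter_methylation(probes, tss=1000, strand="-")
```

Returning None for promoters without probes makes missing values explicit, so such genes can be excluded before the regression rather than silently treated as unmethylated.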
2.5.7 General Data Handling and Visualization

         Throughout the analyses we used GenomicRanges to import, export and/or intersect
genomic data for plotting and annotation purposes; the R version 3.0.0 (2013-04-03) was used for
all statistical analyses, the R function ‘image’ was used for heatmap generation, and package
‘ggplot2’ was used to generate scatterplots. To generate the circle plot, Circos software was used
(Krzywinski, Schein et al. 2009).  All genomic location information is based on hg19.  


2.6 Supplementary figures for Chapter 2
 

 
Supplementary figure 2.1 Correlated exon SNPs.  
Shown are correlated SNPs that fall within exons located within +/- 200 kb from each tag SNP.
The r2 value in relation to the tag SNP is shown on the x axis and the distance of the correlated
SNP from the tag SNP is shown on the y axis. The color indicates whether the exon is
contained within a protein coding (green), non-coding (red), or pseudogene (blue) transcript.

Supplementary figure 2.2 Analysis of RNA-seq data.  
The top panels represent expression levels of coding (left panel) and non-coding (right panel)
RNAs in HCT116 and sigmoid colon samples. RNAs were divided into highly expressed (FPKM
higher than 2), modestly expressed (FPKM between 0.5 and 2), and not expressed (FPKM less
than 0.5). The number of genes in each category is shown in the table.

Supplementary figure 2.3 Correlated TSS SNPs.  
Shown are correlated SNPs that fall within TSS regions located within +/- 200 kb from each tag
SNP. The r2 value in relation to the tag SNP is shown on the x axis and the distance of the
correlated SNP from the tag SNP is shown on the y axis. The color indicates whether the TSS
regulates a protein coding (green), non-coding (red), or pseudogene (blue) transcript.  

Supplementary figure 2.4 ChIP-seq peak analysis.  
The top panels show the H3K27Ac peak height vs peak rank in the HCT116 (left) and sigmoid
colon (right) ChIP-seq datasets; blue indicates peaks present in both replicates, green indicates
peaks present in only one replicate, and red indicates peaks present only in the merged dataset.
The bottom panels show the H3K27Ac peak height vs peak rank in the HCT116 (left) and
sigmoid colon (right) ChIP-seq datasets; blue indicates peaks located in promoter proximal
regions and red indicates peaks located in distal regions.  

Supplementary figure 2.5 Correlated enhancer SNPs.  
Shown are correlated SNPs that fall within distal H3K27Ac regions located within +/- 200 kb
from each tag SNP. The r2 value in relation to the tag SNP is shown on the x axis and the
distance of the correlated SNP from the tag SNP is shown on the y axis. The color indicates
whether the enhancer is found only in normal sigmoid colon (blue), only in HCT116 (green), or
in both normal and HCT116 cells (red).  

Supplementary figure 2.6 TCGA batch effects analysis. RNA-seq batch effects.
a.1) Hierarchical clustering plot for mRNA expression data. a.2) PCA for mRNA expression,
showing first and second components comparison, second and third components comparison,
and third and fourth components comparison plots with samples connected to centroids according
to batch ID. a.3) PCA for mRNA expression, showing first and second components comparison,
second and third components comparison, and third and fourth components comparison plots with
samples connected to centroids according to tissue source site (TSS).  

Supplementary figure 2.7 TCGA batch effects analysis. CNV (GW SNP6 array) batch
effects.  
b.1) Hierarchical clustering plots for copy number variation (CNV) data. b.2) PCA for CNV,
showing first and second components comparison, second and third components comparison,
and third and fourth components comparison plots with samples connected to centroids
according to batch ID. b.3) PCA for CNV, showing first and second components comparison,
second and third components comparison, and third and fourth components comparison plots
with samples connected to centroids according to TSS.  

Supplementary figure 2.8 TCGA batch effects analysis. DNA methylation (Infinium
HM450K microarray) batch effects.    
c.1) Hierarchical clustering plot for DNA methylation data. c.2) PCA for DNA methylation,
showing first and second components comparison, second and third components comparison,
and third and fourth components comparison plots with samples connected to centroids
according to batch ID. c.3) PCA for DNA methylation, showing first and second components
comparison, second and third components comparison, and third and fourth components
comparison plots with samples connected to centroids according to TSS.  

Supplementary figure 2.9 Expression analysis of genes identified by promoter and exon
SNPs and potential enhancer target genes in TCGA samples.  
(a) Shown are the expression levels for the genes identified by exon or TSS SNPs in normal and
tumor colorectal TCGA samples. The heatmap was created by unsupervised clustering of
normalized gene expression, using 21 normal and 233 tumor colon tissues from TCGA. Each
row represents a gene and each column is one of the normal (left panel) or tumor (right panel)
samples. (b) Shown are the expression differences (x axis) in the normal vs tumor samples and
the significance of the change (y axis) of the genes identified by promoter or exon SNPs. Genes
identified as differentially expressed are those with an adjusted P value <0.01 and a log2 expression
difference >1; red indicates genes with a higher expression in tumors and blue indicates genes
with a lower expression in tumors. (c) Shown are the expression levels of the differentially
expressed genes in the set of the 10 nearest 5’ and 10 nearest 3’ genes for each of the 28
enhancers in normal and tumor TCGA samples; the heatmap was
created as described in panel (a). (d) Shown are the expression differences (x axis) in the normal
vs tumor samples and the significance of the change (y axis) of the 10 nearest 5’ and 10 nearest
3’ genes for each of the 28 enhancers. Genes identified as differentially expressed are those with
an adjusted P value <0.01 and a log2 expression difference >1; the colors refer to the 9 different
regions within which the 28 enhancers are located (see Table 2.3).  

Supplementary figure 2.10 eQTL analysis summary.    
(a) Distribution of promoter methylation of all genes in normal and tumor samples. (b)
Distribution of gene level copy number variation for all genes. (c) Shown is the subset of all
related SNP-gene pairs identified by eQTL analysis with a non-adjusted P-value <0.01. (d)
Shown is the expression level for the three genotypes for each SNP-gene pair, colored by the
copy number variation of the gene. (e) Shown is the expression level for the three genotypes for
each SNP-gene pair, colored by the DNA methylation level at the promoter.  

3 Chapter 3: Effects on the transcriptome upon deletion of distal
elements are not correlated with the size of H3K27Ac peaks in
human cells

 

 
The following chapter is under review as a research article entitled “Effects on the transcriptome
upon deletion of distal elements are not correlated with the size of H3K27Ac peaks in human cells” for
publication in the journal Genome Biology. The authors will be Yu Gyoung Tak, Yuli Hung, Lijing Yao,
Matthew R Grimmer, Albert Do, Mital S Bhakta, Henriette O’Geen, David J Segal, and Peggy J Farnham.
Yu Gyoung Tak created enhancer-deleted HCT116 and HEK293 clones, performed the RNA-seq, ChIP-seq,
and 4C-seq experiments, and performed analyses for the figures; PJF conceived of the overall project
design, directed the experimental analyses, and drafted the manuscript.

 

 

 
     
3.1 Abstract

 
        Genome-wide association studies (GWAS) have identified single nucleotide polymorphisms
(SNPs) associated with increased risk for colorectal cancer (CRC). A molecular understanding of
the functional consequences of this genetic variation is complicated because most GWAS SNPs
are located in non-coding regions. We used epigenomic information to identify H3K27Ac peaks in
HCT116 colon cancer cells that harbor SNPs associated with an increased risk for
CRC. Employing CRISPR/Cas9 nuclease, we deleted 2 CRC risk-associated H3K27Ac peaks
from HCT116 cells and analyzed effects on the transcriptome. Deletion of E7, which corresponds
to a small H3K27Ac peak, caused large-scale changes in gene expression, resulting in decreased
expression of many nearby genes. Deletion of E24, which corresponds to a large H3K27Ac peak,
caused changes in hundreds of genes, only one of which was within 1 Mb. As a comparison, we
showed that deletion of a robust H3K27Ac peak not associated with CRC had minimal effects on
the transcriptome. Interestingly, although there is no H3K27Ac peak in HEK293 cells in the E7
region, deletion of this region in HEK293 cells decreased expression of several of the same genes
that were downregulated in HCT116 cells, including the MYC oncogene. Accordingly, deletion of
E7 causes changes in cell proliferation in HCT116 and HEK293 cells. We show that the size or
presence of an H3K27Ac peak does not correlate with enhancer activity. We also show that
deletion of distal regulatory elements associated with CRC can have genome-wide effects on the
transcriptome.
    3.2 Introduction

 
            In our previous studies, we identified a set of 28 enhancers (defined as the presence of a
H3K27Ac peak located farther than +/- 2 kb from a transcription start site) that harbor single
nucleotide polymorphisms (SNPs) associated with an increased risk for colon cancer (Yao, Tak et
al. 2014).  Our working hypothesis is that the different nucleotide sequence between the “risk-
associated” vs. “non risk-associated” SNPs affects activity of the enhancers, causing a change in
expression in genes (coding or non-coding) that can influence the balance between normal tissue
proliferation or differentiation vs. tumor initiation or progression. Enhancers are composed of
binding sites for many different site-specific DNA binding transcription factors (TFs) that are
thought to work in concert to provide cell type-specific functionality. For example, one of the first
characterized mammalian enhancers is the interferon beta enhanceosome, which is bound by 8
different TFs (Maniatis, Falvo et al. 1998, Panne 2008). Recent studies from the ENCODE
Project (ENCODE_Project_Consortium 2012) and the Roadmap Epigenome Mapping
Consortia (RoadmapEpigenomicsConsortium 2015) have identified hundreds of thousands of
enhancers, most of which include motifs for a variety of different TFs. The overall function of a
given enhancer is dependent upon several conditions, such as the number of motifs contained
within it, the extent to which the nucleotides within the enhancer match consensus binding motifs,
the expression level of the transcription factors that bind those motifs, and the location of the
enhancer with respect to chromatin boundaries.  Because many TFs contribute to the overall
function of an enhancer, it is likely that single nucleotide changes within an enhancer will have
quite modest effects on the transcriptional output from a target promoter (Corradin and Scacheri
2014).  Although modest effects in gene expression could have strong phenotypic outcomes over
the course of a long time period, such as during tumor development, the consequences of a
single nucleotide change in an enhancer may be difficult to observe in short term cell culture
assays. Thus, rather than analyzing the effect of a single SNP, our approach is to determine the
functional role of the enhancer as a whole by identifying genes that are responsive to loss of the
enhancer in colon cancer cells. To identify target genes of CRC risk-associated enhancers, we
analyzed clonal populations of cells in which specific CRC risk-associated enhancers have been
deleted. For comparison, we also analyzed an enhancer not associated with CRC and a distal
region that lacks the H3K27Ac mark. Our results suggest that the size or presence of an
H3K27Ac peak does not correlate with effects on gene expression. We show that deletion of distal
regulatory elements associated with CRC can affect nearby genes and also have genome-wide
effects on the transcriptome.

 
3.3 Results
3.3.1 Deletion of CRC risk-associated enhancers can cause widespread effects on
the transcriptome
         We have chosen to use HCT116 colon cancer cells for our analysis of CRC risk-associated
enhancers because of the availability of histone modification data (Frietze, Wang et al. 2012),
whole genome DNA methylation data (Blattler, Yao et al. 2014), and ChIP-seq TF binding sites
(ENCODE_Project_Consortium 2012) from these cells.  Of the previously identified 28 CRC risk-
associated enhancers, 14 are detected in HCT116 cells. However, they are not necessarily
amongst the top ranked enhancers in HCT116 cells (Figure 3.1A). Interestingly, 12 of the 14
CRC risk-associated enhancers in HCT116 cells are located within the introns of 6 protein-coding
genes and 1 non-coding RNA, whereas 2 of the enhancers are intergenic. In HCT116 cells,
approximately 50% of all H3K27Ac enhancers are intronic (Blattler, Yao et al. 2014); it is not clear
if the high preponderance of intronic enhancers in the set of CRC risk-associated enhancers is
due to a bias involving the choice of index SNPs on the GWAS array or due to a biological or
functional reason. However, others have also observed that GWAS SNPs (and SNPs in high
linkage disequilibrium with GWAS index SNPs) are more enriched in introns than in intergenic
regions (Schork, Thompson et al. 2013).  We selected 2 risk enhancers, which reside in an intron
of the non-coding RNA CASC8 (enhancer E7, identified by rs6983267) and in an intron of the
RHPN2 protein-coding gene (enhancer E24, identified by rs10411210) for targeted deletion
(Figure 3.1B). Ectopic expression of RHPN2 promotes an invasive phenotype in neural stem cells
and astrocytes and RHPN2 amplification correlates with a highly aggressive phenotype that directs
the worst clinical outcomes in patients with glioblastoma (Danussi, Akavia et al. 2013). Although
the function of CASC8 is not known, SNPs within the long non-coding RNA CASC8 have also
been implicated in increased risk for gastric and breast cancer (Ma, Gu et al. 2014).  However,
enhancers can work at a long distance and in either orientation and thus it is not certain that the
gene in which an enhancer resides is in fact regulated by that enhancer (Sagai, Hosoya et al.
2005, Farnham 2009, Sanyal, Lajoie et al. 2012, Zhou, Katsman et al. 2014). For example, in a
ChIA-PET study using an antibody for RNA polymerase II, Li et al. (Li, Ruan et al. 2012) identified
~20,000-30,000 enhancer-promoter loops in MCF7 or K562 cells. Of these, more than 40% of the
enhancers skipped over the nearest gene to loop to a farther one. Of course, it has not been proven
that all loops represent bona fide regulatory interactions or that all enhancer-mediated regulation
involves looping (Yao, Berman et al. 2015). However, these studies indicate that it is not a
certainty that the genes in which the enhancers reside are the ones that should be linked to an
increased risk for colon cancer. To gain further insight into the mechanisms by which the CRC
risk-associated enhancers may influence tumor development, we analyzed gene expression
changes in clonal populations of cells lacking each of these enhancers.  


 
Figure 3.1 Genomic location and H3K27Ac profiles of E7, E24, 18qE, and 18qNE.  
(A) Shown is a graph representing HCT116 H3K27Ac peaks, as identified using Sole-search (Blahnik,
Dou et al. 2010, Blahnik, Dou et al. 2011) to analyze the HCT116 H3K27Ac ChIP-seq data; all 35,932
peaks were plotted based on peak height (Y axis) vs. peak rank (X axis).  The locations of the 14 CRC
risk-associated enhancers present in HCT116 cells are shown (yellow circles), with the larger red circles
indicating the CRC risk-associated enhancers deleted in this study.  Also shown is the location of the
H3K27Ac peak called 18qE (blue circle); the green circle indicates that the 18qNE region was not called
as a peak. (B) For the 4 distal regions deleted in this study, the genomic location, the H3K27Ac pattern in
HCT116 cells, the combined H3K27Ac pattern in multiple cell types (the ENCODE Regulation Layered
H3K27Ac track) and TF binding data (the ENCODE Regulation Txn Factor ChIP track combines ENCODE
ChIP-seq data from many different transcription factors and cell lines in a relatively dense display) from
the UCSC genome browser are shown; the boxes indicate the approximate region that was deleted.

 
Figure 3.2 Experimental schema.  
     The overall approach used to analyze the function of an enhancer is shown, beginning with design of
guide RNAs used to guide Cas9 to the targeted enhancers and ending with RNA-seq analysis of control
and enhancer-deleted cells.  

An overview of the strategy used to determine the function of the enhancers is shown in
Figure 3.2. Guide RNAs were designed that flank the H3K27Ac marks of the selected enhancers;
see Supplementary figure 3.1 for the genomic location and the sequences of all guide RNAs
used in this study. Plasmids expressing the guide RNAs and Cas9-GFP were transfected into
target cells; 48 hours later, cells were sorted for high GFP expression, and colonies were grown
from single cells.  PCR analysis using a set of primers that flank the targeted genomic region and
a set of primers internal to the targeted region was used to identify clonal lines having deletion of
the targeted region (Supplementary figure 3.2).  In general, less than 10% of the clones tested
after sorting for high GFP levels showed deletion of all alleles. After confirming that the targeted
region had been deleted, RNA-seq was performed to identify gene expression differences
between the cells with enhancer deletion vs. the control cells. For analysis of the CRC-associated
enhancers, we prepared control samples from four different clonal populations of FACS-selected
HCT116 cells that had been transfected with Cas9-GFP plus the guide RNA vector lacking
inserted guide RNAs. Each control clone was analyzed using RNA preps prepared from cell
cultures grown on different days.  For enhancer-deleted cells, we also prepared triplicate RNA
samples from cells grown on the same days as the control clones, and sequenced matched
control and deleted samples in the same lane of a sequencer to prevent batch effects.  
Information concerning all RNA-seq data sets analyzed in this study can be found in
Supplementary figure 3.3, quality control plots for the RNA-seq data can be found in
Supplementary figure 3.4, and a list of upregulated and downregulated genes for all enhancer
deletions can be found in Supplementary data file 3.1.
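The PCR genotyping logic described above (one primer pair flanking the targeted region, one pair internal to it) can be sketched as a simple decision rule. This is a minimal illustration and not the procedure used in this study: the function name and the two boolean readouts are hypothetical, and the sketch assumes that the flanking primers yield a detectable short product only when the intervening region has been deleted.

```python
def classify_clone(flanking_deletion_band: bool, internal_band: bool) -> str:
    """Classify a clonal line from two PCR readouts (hypothetical helper).

    flanking_deletion_band: True if primers flanking the target give the short
        product expected when the region between them is deleted (assumption).
    internal_band: True if primers inside the target region give a product,
        meaning at least one allele still carries the region.
    """
    if flanking_deletion_band and not internal_band:
        return "deleted on all alleles"
    if flanking_deletion_band and internal_band:
        return "deleted on some alleles"
    if internal_band:
        return "not deleted"
    return "inconclusive"


# Only clones in the first class would be carried forward to RNA-seq.
print(classify_clone(flanking_deletion_band=True, internal_band=False))
```

A rule of this kind makes explicit why both primer sets are needed: the flanking set alone cannot distinguish a clone with deletion of all alleles from one that retains an intact allele.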
The E7 CRC risk-associated enhancer harbors a GWAS CRC index SNP and resides near,
but not within, a region that contains a large number of super-enhancers (see Figure 3.3). Due to
the small size of the H3K27Ac peak (Figure 3.3C), we initially thought that most genes in the
region would be controlled by the super-enhancers and not by E7.  Surprisingly, we found that
many genes within +/- 5 Mb from E7 were downregulated upon deletion of the enhancer (Figure
3.3A).  Specifically, 5 genes were downregulated within +/- 1 Mb of E7, including the non-coding
CASC8 RNA (in which E7 resides), which was reduced 1.5 fold. Of note, deletion of E7 also had
effects on genes that were not nearby, causing the up- or downregulation of more than 1000
genes in the genome, most of which were on other chromosomes (Table 3.1). In fact, 25 genes
were reproducibly downregulated more than 5-fold in the E7-deleted cells; a list of all
downregulated genes in E7-deleted cells is found in Supplementary data file 3.1 and the
chromosomal location of the top 25 downregulated genes is shown in Supplementary figure 3.5.
We also noted that deletion of the small E7 enhancer resulted in loss of H3K27Ac peaks at the
edge of the nearby super-enhancer (Figure 3.3C) and at another super-enhancer located ~2 Mb
from E7; patterns of TCF7L2 binding closely match the H3K27Ac patterns in control and deleted cells
(Supplementary figure 3.6).  

 
Figure 3.3 Gene expression and H3K27Ac changes upon deletion of E7 in HCT116 cells.  
     (A) The Y-axis shows log2 fold changes in gene expression in the E7-deleted cells throughout a +/- 5
Mb region from E7, with genes showing decreased expression upon enhancer deletion having negative
numbers and genes showing increased expression upon enhancer deletion having positive numbers.
(B) Shown are the H3K27Ac patterns in independent replicates of control and enhancer-deleted cells
within an ~3 Mb region near the E7 enhancer (located within the purple box, which is shown in an
expanded view in panel C). Also shown is the H3K4me3 pattern to identify promoter regions. The Y-
axis indicates the peak height and the X-axis indicates the genomic location; only those genes
significantly downregulated in the deleted cells are indicated in red below the H3K4me3 track. (C) The
H3K27Ac patterns in control and deleted cells are shown for the region nearby E7 (red arrow), showing
loss of the E7 H3K27Ac peak, as well as loss of H3K27Ac signal at the right side of the nearby large
enhancer region.  

         We next analyzed the effects of deletion of E24, which not only harbors a GWAS index
SNP associated with increased risk for CRC but is also the only robust enhancer within a 3.5 Mb
region that includes 76 expressed genes (note that most H3K27Ac peaks near E24 are promoter
regions, as indicated by the H3K4me3 track; Figure 3.4). We thought that this robust enhancer
would be responsible for regulation of several of the genes in the nearby region.  Surprisingly, we
found that no genes nearby E24 reproducibly showed a greater than 1.5 fold change in
expression when E24 was deleted (Figure 3.4A). Thus, most genes within this region are either
not regulated by an enhancer or are regulated by an enhancer located quite far away.  However,
the gene in which the enhancer resides (RHPN2) almost met the cut-off, showing a 1.44 fold
decrease upon deletion of E24. Deletion of E24 caused a similar reduction in levels of UBA2,
which is more than 1 Mb from E24. Similar to the results observed in E7-deleted cells, more than
one hundred genes far from E24 or on other chromosomes showed changes in expression in the
E24-deleted cells (Table 3.1); a list of all downregulated genes in E24-deleted cells is found in
Supplementary data file 3.1 and the chromosomal location of the top 25 downregulated genes is
shown in Supplementary figure 3.5.

 
Figure 3.4 Gene expression and H3K27Ac changes upon deletion of E24 in HCT116 cells.  
     (A) The Y-axis shows log2 fold changes in gene expression in the E24-deleted cells throughout a +/- 5
Mb region from E24, with genes showing decreased expression upon enhancer deletion having
negative numbers and genes showing increased expression upon enhancer deletion having positive
numbers. (B) Shown are the H3K27Ac patterns in independent replicates of control and enhancer-
deleted cells within an ~3.5 Mb region near the E24 enhancer (located within the purple box, which is
shown in an expanded view in panel C).  Also shown is the H3K4me3 pattern to identify promoter
regions. The Y-axis indicates the peak height and the X-axis indicates the genomic location; only those
genes significantly downregulated (red) or upregulated (blue) in the deleted cells are indicated below
the H3K4me3 track. (C) The H3K27Ac patterns in control and deleted cells are shown for the region
nearby E24 (red arrow), showing loss of H3K27Ac only at the E24 H3K27Ac peak.  

Table 3.1 Altered gene expression upon enhancer deletion.
Enhancer | Total number of upregulated genes (>1.5 FC) | Upregulated genes +/- 1 Mb | Total number of downregulated genes (>1.5 FC) | Downregulated genes +/- 1 Mb | Putative direct targets | Average RPKM in controls | Fold change | Distance from enhancer to TSS (bp)
E7 | 590 | 0 | 565 | 5 | FAM84B | 7.45 | 1.50 | -843,000
 |  |  |  |  | CCAT1 | 2.79 | 1.78 | -182,000
 |  |  |  |  | CASC8 | 2.33 | 1.55 | 81,000
 |  |  |  |  | MYC | 112.3 | 2.31 | 335,000
 |  |  |  |  | PVT1 | 7.96 | 1.59 | 395,000
E7 (HEK293) | 166 | 0 | 295 | 1 | MYC | 28.5 | 5.10 | 335,000
E24 | 137 | 2 | 189 | 1 | RHPN2 | 9.73 | 1.44 | 47,400
18qE | 13 | 0 | 3 | 0 | N/A | N/A | N/A | N/A
18qNE | 14 | 0 | 3 | 0 | N/A | N/A | N/A | N/A
 
The number of genes having >1.5 fold increase or decrease in expression in the genome and within +/-
1 Mb after deletion of each distal region is indicated.  Also shown are the names of the putative direct
target genes, the average RPKM values for these genes in the control cells, the fold decrease in the
enhancer-deleted cells, and the distance of these genes from the distal regions.  Note that RHPN2, the
gene in which E24 resides, was only decreased 1.44 fold but is shown here because no other nearby
genes were downregulated.
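The counts summarized in Table 3.1 follow from two filters: a fold-change cut-off (>1.5 fold in either direction) and a positional window (+/- 1 Mb from the deleted region). The sketch below shows how such counts could be derived from per-gene expression values. It is an illustration under stated assumptions, not the analysis pipeline used in this study: the `Gene` record, the `summarize` function, and the use of the TSS as the gene position are hypothetical choices, and all RPKM values are assumed to be greater than zero.

```python
import math
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    chrom: str
    tss: int             # transcription start site (bp); assumed gene position
    control_rpkm: float  # mean RPKM in control clones (assumed > 0)
    deleted_rpkm: float  # mean RPKM in enhancer-deleted clones (assumed > 0)


def log2_fold_change(gene: Gene) -> float:
    """Negative values mean lower expression in the enhancer-deleted cells."""
    return math.log2(gene.deleted_rpkm / gene.control_rpkm)


def summarize(genes, enh_chrom, enh_pos, fc_cutoff=1.5, window=1_000_000):
    """Count genes changed by more than fc_cutoff, genome-wide and near the enhancer."""
    threshold = math.log2(fc_cutoff)
    counts = {"up": 0, "down": 0, "up_nearby": 0, "down_nearby": 0}
    for gene in genes:
        lfc = log2_fold_change(gene)
        if abs(lfc) <= threshold:
            continue  # change does not exceed the cut-off
        direction = "up" if lfc > 0 else "down"
        counts[direction] += 1
        if gene.chrom == enh_chrom and abs(gene.tss - enh_pos) <= window:
            counts[direction + "_nearby"] += 1
    return counts
```

In practice a reproducibility or significance criterion across replicates would be applied in addition to the fold-change cut-off; that step is omitted here for brevity.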
 

 
3.3.2 The size or presence of an H3K27Ac peak does not correlate with enhancer
activity
         As an approach to understanding the genome-wide changes in gene expression seen upon
deletion of E7 and E24, we also deleted a strong H3K27Ac peak on chromosome 18 (termed
18qE for “enhancer on chromosome 18q”).  The 18qE region has a more robust H3K27Ac peak in
HCT116 cells than does E7 (Figure 3.1A) and is covered by H3K27Ac in many cell types (see the
layered H3K27Ac track in Figure 3.1B). Similar to E7 (Figure 3.5), 18qE is bound by TCF7L2, a
site-specific transcription factor that has been implicated as the downstream regulator of the WNT
pathway that drives the development of colon cancer (Bright-Thomas and Hargest 2003,
Clevers 2004, Clevers 2006, Segditsas and Tomlinson 2006, Gaddis, Gerrard et al. 2015) and
by CTCF, a multi-functional protein that is thought to be involved in gene regulation and
chromosomal structure (Splinter, Heath et al. 2006, Holwerda and de Laat 2013, Ong and
Corces 2014, Narendra, Rocha et al. 2015, Vietri Rudan, Barrington et al. 2015).  However, we
note that the binding of TCF7L2 and CTCF is not as strong at 18qE as at E7. For comparison, we
also deleted another region on chromosome 18 (termed 18qNE for “no enhancer on chromosome
18q”) that does not have H3K27Ac or TCF7L2 binding but is bound by CTCF. Interestingly, we
found that deletion of the large 18qE enhancer had little effect on gene expression, causing
changes similar to those seen when the 18qNE region (which completely lacks H3K27Ac) was
deleted (Table 3.1).  

 
Figure 3.5 H3K27Ac, TCF7L2, and CTCF binding profiles of E7, E24, 18qE, and 18qNE.  
   Shown are the H3K27Ac, TCF7L2, and CTCF binding patterns at (A) the E7 enhancer, (B) the E24
enhancer, (C) 18qE, an intergenic H3K27Ac peak on chromosome 18, and (D) 18qNE, an intergenic
region on chromosome 18 that lacks the H3K27Ac mark. The Y-axis indicates the peak height and the
X-axis indicates the genomic location; the boxes indicate the region deleted by the CRISPR/Cas9
method.

The differential effects on the transcriptome seen when E7 vs. 18qE was deleted from
HCT116 cells suggest that the presence of a robust H3K27Ac peak might not reflect enhancer
activity, but that other, as of yet unknown characteristics of the region may be critical.  
Interestingly, although many cell types do show a strong H3K27Ac peak in the E7 region (see
Figure 3.1B), there is no H3K27Ac peak in HEK293 cells (Figure 3.6A). However, there is a very
strong CTCF binding site at E7 in HEK293 cells. To compare the effects of deleting a genomic
region that is marked by H3K27Ac in one cell type but not in another, we deleted E7 in HEK293
cells. We found that this region functioned as an enhancer in HEK293 cells, causing changes in
hundreds of genes, including MYC, which showed a robust 5-fold decrease in expression (Figure
3.6B and Table 3.1); a list of all downregulated genes in HEK293 E7-deleted cells is found in
Supplementary data file 3.1 and the chromosomal location of the top 25 downregulated genes is
shown in Supplementary figure 3.5.  Thus, the presence of an H3K27Ac signal is not required
for enhancer activity.

 
Figure 3.6 Gene expression and H3K27Ac changes upon deletion of E7 in HEK293 cells.
   (A) Shown are the H3K27Ac, CTCF, and TCF7L2 binding patterns at E7 in HEK293 cells. The Y-
axis indicates the peak height and the X-axis indicates the genomic location; the boxes indicate the
deleted region. (B) The Y-axis shows log2 fold changes in gene expression in the E7-deleted cells
throughout a +/- 5 Mb region from E7, with genes showing decreased expression upon enhancer
deletion having negative numbers and genes showing increased expression upon enhancer
deletion having positive numbers; only those genes significantly downregulated in the deleted cells
are indicated in red below the H3K4me3 track.

3.3.3 Characterization of the genome-wide changes in gene expression upon
enhancer deletion
          As noted above, in addition to changes in expression of nearby genes, we found that
deletion of the E7 and E24 enhancer regions caused changes in expression of hundreds of other
genes located throughout the genome. To alleviate concerns that these changes were due to
specific methods of differential gene expression analysis and/or to artifacts due to the
CRISPR/Cas9 technology, we performed several different follow-up experiments. First, we
compared the effects on the transcriptome upon deletion of E7 vs. E24 in HCT116 cells. Although
deletion of each enhancer causes hundreds of genes to change in a statistically significant,
reproducible manner, the patterns of gene expression changes were unique to each enhancer.  
For example, we found that of the hundreds of downregulated genes, only 9 genes were
commonly downregulated more than 2-fold in the E7- vs E24-deleted cells (Supplementary data
file 3.1).  In addition, as described in detail in the methods section, all differential gene expression
analysis was performed using a set of control clones that were selected in the same way as were
the clones of enhancer-deleted cells (i.e. sorting via FACS for single colonies after transfection
with Cas9-GFP and the guide RNA vector).  Also, as described above, we show that deletion of
other genomic regions does not lead to genome-wide changes in gene expression; the 18qE and
18qNE clones have essentially the same gene expression profiles as the control clones,
indicating that the robust changes in gene expression observed when E7 and E24 were deleted
were not general effects due to the CRISPR/Cas9 technology.  In addition, we note that we used
a different set of guide RNAs to delete E7 in HCT116 vs. HEK293 cells.  In both cases, we
observed widespread changes in gene expression and identified MYC as a target gene. The use
of two different sets of guide RNAs alleviates concerns that the effects on gene expression are
due to specific off-target effects of the guide RNAs targeting E7.
As another approach to characterizing the effects of deletion of specific enhancers, we
performed a genome-wide analysis of H3K27Ac in the E7- and E24-deleted cells. We analyzed
the H3K27Ac patterns within +/- 100 Kb of upregulated and downregulated genes in the E7- and
E24-deleted cells. We found that many of the downregulated genes in E7-deleted cells
(Supplementary figure 3.7A) and in E24-deleted cells (Supplementary figure 3.7C) had
decreased H3K27Ac peaks nearby. In contrast, many of the upregulated genes in E7-deleted
cells (Supplementary figure 3.7B) and E24-deleted cells (Supplementary figure 3.7D) had
increased H3K27Ac peaks nearby; see Supplementary data file 3.2 for the differential H3K27Ac
peak analysis. For example, in E7-deleted cells there was a large downregulation of DPEP1 and
ANXA10 mRNA, with concomitant decrease in the H3K27Ac marks near those genes
(Supplementary figure 3.7A).  In contrast, the H3K27Ac pattern was increased near upregulated
genes such as FGF9 and VSNL1 (Supplementary figure 3.7B). Similarly, in E24-deleted cells
there was a large downregulation of ARHGAP29 and NR2F1, with concomitant decreases in the
H3K27Ac marks (Supplementary figure 3.7C), whereas WNT11 and APOBEC3F showed
increases in RNA and increases in nearby H3K27Ac marks (Supplementary figure 3.7D). The
concomitant changes in expression and H3K27Ac patterns support the conclusion that the genes
identified as deregulated were not identified simply due to the method of RNA analysis.
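The comparison described above asks, for each deregulated gene, whether a nearby H3K27Ac peak changed in the same direction as the RNA. A minimal sketch of that check is given below. It is not the differential peak analysis reported in Supplementary data file 3.2: the `Peak` record and both function names are hypothetical, and the +/- 100 Kb window is assumed here to be measured from the TSS.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int
    end: int
    log2_change: float  # H3K27Ac signal, deleted vs. control (hypothetical field)


def nearby_peaks(chrom, tss, peaks, window=100_000):
    """Return peaks that overlap the interval [tss - window, tss + window]."""
    lo, hi = tss - window, tss + window
    return [p for p in peaks if p.chrom == chrom and p.end >= lo and p.start <= hi]


def has_concordant_peak(chrom, tss, expr_log2_fc, peaks, window=100_000):
    """True if any nearby peak changed in the same direction as the gene's RNA."""
    return any(
        p.log2_change * expr_log2_fc > 0
        for p in nearby_peaks(chrom, tss, peaks, window)
    )
```

A downregulated gene with a nearby peak that lost signal, or an upregulated gene with a nearby peak that gained signal, would count as concordant under this rule.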
 
3.3.4 Cell growth is affected by deletion of enhancer E7
         Deletion of E7 in both HCT116 cells and in HEK293 cells caused decreased expression of
MYC, a well-characterized oncogene (Diolaiti, McFerrin et al. 2014, Gabay, Li et al. 2014,
McKeown and Bradner 2014). Therefore, it is possible that deletion of the E7 enhancer affected
cell proliferation via downregulation of MYC protein. To test this hypothesis, we performed cell
proliferation and colony forming assays in control and E7-deleted HCT116 and HEK293 cells.  
HCT116 control and E7-deleted cells were plated at two different densities and grown for 72 or 96
hours. Cell proliferation (Figure 3.7A) and colony formation (Figure 3.7B) were reduced in E7-deleted
cells, as compared to control cells. As shown in Supplementary figure 3.8, E7-deleted
cells also have morphological changes and increased contact inhibition. The HEK293 cells in
which the enhancer was deleted showed drastic changes in cell morphology, suggesting that the
ability to form large colonies was affected upon enhancer deletion (Figure 3.7C and
Supplementary figure 3.8).  Thus, deletion of the region containing rs6983267 caused downregulation
of MYC, large changes in the transcriptome, and changes in cell proliferation in both
HCT116 and HEK293 cells, despite the fact that the epigenomic profile of E7 is quite different in
the two cell types.

 
Figure 3.7 Proliferation is affected by deletion of enhancer 7.    
   (A) Shown are the relative cell numbers 72 hr after plating two different control clones or the E7-deleted
clone at 1 x 10^4 cells (top panel) or 96 hr after plating two different control clones or the E7-deleted
clone at 2.5 x 10^3 cells (bottom panel). The Y-axis indicates relative cell numbers from an
average of 4 replicate wells, as measured by absorbance of products of the WST-1 assay. Student’s t-
test was used for significance and * indicates a p-value less than 0.005. (B) Shown are colony forming
assays for control and E7-deleted HCT116 cells. 1,500 cells were seeded per well and after two weeks
colonies were stained with crystal violet. At least 3 replicates were performed; additional replicates are
shown in Supplementary figure 3.10. (C) Shown are colony forming assays for control and E7-deleted
HEK293 cells. 6,500 cells were seeded per well and after two weeks colonies were stained with crystal
violet. At least 3 replicates were performed; additional replicates are shown in Supplementary figure
3.10.
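For readers who want to see the form of the comparison named in panel A, the sketch below computes a two-sample Student's t statistic from replicate wells. It is an illustration only: the function name is hypothetical, the pooled (equal-variance) form of the test is an assumption because the legend does not say which variant was used, and the absorbance values in the example are invented for demonstration and are not data from this study. Converting the statistic to a p-value requires the t distribution, which is left to a statistics package.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for a two-sample t-test with pooled variance."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df


# Invented absorbance readings from 4 replicate wells per condition.
control_wells = [1.00, 0.95, 1.05, 1.02]
deleted_wells = [0.60, 0.66, 0.58, 0.63]
t_value, dof = pooled_t_statistic(control_wells, deleted_wells)
print(f"t = {t_value:.2f} with {dof} degrees of freedom")
```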


 
3.3.5 4C analysis of E7 and E24 in HCT116
         Recent studies have begun to use looping assays to predict target genes of enhancers
(Yao, Berman et al. 2015). Using the E7 enhancer as a bait, we performed 4C analyses and
observed extensive chromosomal interactions within a 100 Kb region near E7 (Figure 3.8A; see
Supplementary figure 3.9 for genomic coordinates of baits and primer sequences). However, the
MYC promoter is more than 330 Kb from the E7 enhancer and few interactions were detected at
this location. Others have suggested that using a promoter as bait may provide more meaningful
4C results (van de Werken, Landan et al. 2012). For example, others have shown that when
more than one alternative enhancer around HS-26 is engaged with the α1 gene promoter
(possibly in a mutually exclusive fashion), the resulting window statistics are higher when
assaying from the promoter toward the enhancer region than when assaying from any particular
enhancer in the group of enhancers toward the promoter. Because E7 is present nearby many
very large enhancers, it is possible that it may be easier to detect promoter-to-enhancer region
interactions than interactions between a specific enhancer within that region and the MYC
promoter. Thus, we also used the MYC promoter as bait in a 4C assay.  It should be noted that
the plotted contact intensities represent relative 4C-seq read counts for 5 Kb genomic windows
and reciprocal peak intensities observed when using the promoter vs. the enhancer as a bait do
not need to be identical. We found high interactions around the MYC promoter as well as multiple
interactions in a 500 Kb region spanning from the MYC promoter to just past E7; this large region
is bounded by large H3K27Ac peaks on either side.  Interestingly, the major peak detected by 4C
using the MYC promoter as a bait is the same major peak detected using E7 as bait, suggesting
that perhaps E7 and MYC are brought together via common interaction with an H3K27Ac-marked
region 100 Kb from E7.  We also used E24 as a bait in 4C assays and found multiple interactions
of E24 with an ~100 Kb region on either side of the enhancer, encompassing the RHPN2 gene
(Figure 3.8B).  

 
Figure 3.8 4C analysis of enhancers E7 and E24.
Shown are the 4Cseq profiles of HCT116 cells for (A) a bait near the promoter region of MYC and a bait
at the E7 region and (B) a bait at the E24 region. Grey dots are normalized contact intensities within a
1 Mb genomic range and the black line is the median of normalized intensities for 5 Kb running windows
with the 20th and 80th percentiles indicated as a grey band. The medians of normalized contact intensities
of different sizes of windows from 2 Kb to 50 Kb were color-coded ranging from 1 (red) to 0.001 (grey).  
Significantly downregulated genes are shown above the 4C profiles. Also shown are the H3K27Ac,
CTCF, and TCF7L2 tracks for HCT116 cells, along with the ENCODE TF binding track.
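The trend line and grey band described in this legend are summaries of normalized contact intensities over running genomic windows. The sketch below shows one simple way to compute a running median with a 20th to 80th percentile band. It is a simplified illustration and not the 4C-seq software used to generate Figure 3.8: the function name, the 1 Kb step size, the centering of windows on each step, and the inclusive percentile method are all assumptions.

```python
import statistics


def running_window_profile(contacts, window=5_000, step=1_000):
    """Summarize (position, normalized_intensity) pairs over running windows.

    Returns a list of (window_center, median, 20th_percentile, 80th_percentile).
    Windows holding fewer than two data points are skipped.
    """
    contacts = sorted(contacts)
    if not contacts:
        return []
    first, last = contacts[0][0], contacts[-1][0]
    profile = []
    center = first
    while center <= last:
        lo, hi = center - window / 2, center + window / 2
        values = [v for pos, v in contacts if lo <= pos < hi]
        if len(values) >= 2:
            # n=5 yields cut points at the 20th, 40th, 60th, and 80th percentiles.
            cuts = statistics.quantiles(values, n=5, method="inclusive")
            profile.append((center, statistics.median(values), cuts[0], cuts[3]))
        center += step
    return profile
```

Repeating the same calculation with window sizes from 2 Kb to 50 Kb would give the multi-scale view described for the color-coded panel, although the exact procedure used there may differ.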
 


 
3.4 Discussion
         Employing CRISPR/Cas9 nucleases, we deleted 2 CRC risk-associated enhancers (E7 and
E24) from the genome of HCT116 colon cancer cells and analyzed effects on the transcriptome
and epigenome.  In each case, we found widespread changes in expression and H3K27Ac
patterning in the enhancer-deleted cells. However, deletion of a robust H3K27Ac peak not
associated with CRC did not cause similar changes in gene expression. Interestingly, deletion of
the E7 region encompassing the CRC-associated risk SNP in HEK293 cells, at which there is no
detectable H3K27Ac peak, also caused widespread changes in gene expression. Finally, we
show that deletion of the E7 region affects gene expression and cell proliferation in both HCT116
and HEK293 cells, likely due to downregulation of the MYC oncoprotein.
Large-scale GWAS efforts have identified sets of SNPs associated with a particular disease,
such as colon cancer.  If a SNP falls within an exon and changes the coding potential of that
gene, investigators have suggested that the identified gene is likely to be associated with
increased risk for that disease. However, most GWAS-identified SNPs do not fall within exons and
thus the mechanism by which these SNPs might affect disease has not been clear (Coetzee, Jia
et al. 2010, Freedman, Monteiro et al. 2011). Recent studies have shown that many of the GWAS
index SNPs and SNPs in high LD with the GWAS index SNPs fall within regulatory elements such
as promoters and enhancers. In the case of the SNPs falling close to transcription start sites, it
has been assumed that they cause changes in expression of the gene regulated by that promoter.
In the case of SNPs falling within enhancer regions (as defined by the H3K27Ac mark), it is more
difficult to understand how the SNP affects gene expression because enhancers can work at a
great distance, in either orientation, and often skip over the nearest TSS to regulate genes farther
away (Li, Ruan et al. 2012, Sanyal, Lajoie et al. 2012).  We reasoned that precise deletion of an
enhancer should identify genes regulated (directly and indirectly) by that enhancer, with nearby
downregulated genes being possible direct target genes. As a test of this experimental approach,
we deleted 2 CRC risk-associated enhancers from the human genome.  In each case, we found
that deletion of the CRC risk-associated enhancer caused reproducible changes in expression of
at least one gene within +/- 1 Mb of the enhancer.  It is important to note that a promoter can be
regulated by several different enhancers; if a gene is regulated equally by two enhancers, then a
50% drop in RNA levels is what would be expected upon deletion of one of the enhancers. Most
of the nearby genes that were downregulated upon enhancer deletion were expressed at 20-65%
of their expression level in control cells, suggesting that the enhancers contributed to expression
of these genes, but were not the sole regulatory elements controlling the activity of the target
promoters.  
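The reasoning above can be made concrete with a short calculation. The sketch below assumes a purely additive model in which each enhancer contributes a fixed share of a promoter's output; this model and both function names are illustrative assumptions, not a claim about how these enhancers actually combine. It also converts a fold decrease into the fraction of control expression that remains.

```python
def expected_remaining(weights, deleted_index):
    """Fraction of expression left after deleting one enhancer (additive model)."""
    total = sum(weights)
    return (total - weights[deleted_index]) / total


def remaining_fraction(fold_decrease):
    """Convert a fold decrease (e.g. 2.31) into the fraction of control expression."""
    return 1.0 / fold_decrease


# Two enhancers contributing equally: deleting one leaves half of the expression.
print(expected_remaining([1.0, 1.0], deleted_index=0))  # 0.5

# Fold decreases listed in Table 3.1 for putative direct targets.
for gene, fold in [("CASC8", 1.55), ("MYC", 2.31), ("MYC in HEK293", 5.10)]:
    print(gene, f"{remaining_fraction(fold):.0%} of control")
```

Under this model, a gene that retains more than half of its control expression after deletion of one enhancer is receiving most of its regulatory input from other elements.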
We note that deletion of the CRC risk-associated enhancers not only affected nearby
genes but also affected the regulation of hundreds of other genes, some on the same
chromosome but very far from the enhancer and still more located on other chromosomes. There
have not yet been sufficient studies to know how far a gene can be from an enhancer (or even if it
can be on a different chromosome) and still be a “direct” target. One approach that is used to
identify direct targets is to use 3-dimensional looping assays. A recent study showed that 57% of
enhancer-promoter loops span more than 100 Kb (Jin, Li et al. 2013). Accordingly, our 4C studies
identified interactions that spread between the MYC promoter and a 500 Kb region encompassing
E7, suggesting that the MYC promoter is a direct target of E7 even though it is quite far away.
These results are supported by previous studies in which the E7 enhancer region was shown to
interact with the MYC and PVT1 promoters in colon cancer cells (Pomerantz, Ahmadiyeh et al.
2009, Wright, Brown et al. 2010). Studies in other cell types have also provided evidence that E7
interacts with the MYC gene (Ahmadiyeh, Pomerantz et al. 2010, Dryden, Broome et al. 2014, Du,
Yuan et al. 2015, Jager, Migliorini et al. 2015).  Interestingly, the major peak we detected by 4C
using the MYC promoter as a bait is the same major peak detected using E7 as a bait, suggesting
that perhaps E7 and MYC are brought together via common interaction with an H3K27Ac-marked
region 70 Kb from E7.  We also observed a reduction in the lncRNA CCAT1 in the E7-deleted
HCT116 cells. Others have suggested that reduction of CCAT1 can reduce long-range
interactions between the MYC promoter and its enhancers (Xiang, Yin et al. 2014). Thus, it is
possible that direct downregulation of CCAT1 (which is located 182 Kb from E7) caused an
indirect downregulation of MYC.  However, looping assays should be interpreted with caution, as
different assays give different results (Williamson, Berlivet et al. 2014). Also, it is not yet known if
all loops represent functional enhancer-promoter interactions or if some loops may play other
roles, such as structural components of chromatin (Yao, Berman et al. 2015).
It is likely that most of the altered (indirect) gene regulation in the E7-deleted cells is
initiated by reduction of MYC RNA, followed by changes in the MYC regulatory network.
Deregulated MYC expression is a hallmark feature of many types of cancer and
frequently predicts a poor patient outcome (Conacci-Sorrell, McFerrin et al. 2014, Gabay, Li et al.
2014).  Accordingly, there is a strong push to develop therapeutic inhibitors of MYC function
(McKeown and Bradner 2014).  Perhaps inactivation of cell type-specific enhancers that regulate
MYC expression is an alternative to developing chemotherapeutic agents that target MYC itself.
Interestingly, we have shown changes in cell proliferation upon deletion of E7 in both HCT116 and
in HEK293 cells. Similarly, Sur et al. deleted a region homologous to E7 in the mouse genome
and observed slight changes in Myc expression in the colon, but a significant change in tumor
formation in the colons of Apc^min mice (Sur, Hallikas et al. 2012). In addition, we have observed
that E7-deleted cells are very difficult to trypsinize from the cell culture plates  (data not shown).
Others have previously reported that MYC is involved in suppressing cell adhesion molecules
(Coller, Grandori et al. 2000) and we find that many integrin molecules are significantly
upregulated at least 2-fold in E7-deleted cells (e.g. ITGA3, ITGA4, ITGA5, and ITGA6 in HCT116
and ITGA3, ITGA7, and ITGAV in HEK293; see Supplementary data file 3.1). These molecules
are not upregulated in the other deleted cells. We note that both MYC and PVT1 RNAs are
downregulated in HCT116 cells after deletion of E7. Tseng et al. (Tseng, Moriarity et al. 2014)
showed that PVT1 RNA levels and MYC protein expression correlate in primary human tumors
and, in HCT116 cells, loss of PVT1 results in reduced MYC protein, reduced proliferation, and
impaired colony formation in soft agar. Therefore, the reduction in PVT1 RNA that occurs upon
deletion of E7 may cause an even larger effect on MYC protein levels, and more effects on the
MYC regulatory network, than would be expected by the changes in MYC RNA alone.  
We do not yet have a complete understanding of the regulatory networks affected in E24-
deleted cells. Our 4C experiments detected interactions within a 100 Kb region around E24 that
did encompass the RHPN2 TSS, which is 50 Kb from E24, and thus RHPN2 may be a direct
target of E24. RHPN2 is a Rho GTPase-binding protein that may be involved in the organization of the actin
cytoskeleton. As noted above, high expression of RHPN2 has been associated with various
cancers (Danussi, Akavia et al. 2013) and it is possible that altered regulation of the cytoskeleton
by reduction of RHPN2 could initiate changes in regulatory networks, causing the widespread
changes in gene expression that we observed in E24-deleted cells. We also note that UBA2, also
known as SAE2, is the next nearest downregulated gene. UBA2 is involved in sumoylation of
proteins and a recent study has shown that cells overexpressing MYC have proliferation defects
when UBA2/SAE2 levels are reduced (Kessler,
  Kahle
  et
  al.
  2012).  The authors suggest that
UBA2/SAE2 enables cells to tolerate the MYC hyperactivation that occurs in tumors and propose
that altering distinct subprograms of MYC-mediated transcription by SAE2 inactivation could be
exploited as a therapeutic strategy in MYC-driven cancers. Thus, reduction of UBA2 could lead to
changes in expression of many genes located throughout the genome. Although our 4C results
using E24 as bait did not identify interactions near UBA2, this could be because the distance to
UBA2 is well outside of the distance thought to be detectable by 4C (van de Werken, Landan et
al. 2012). Most 4C-seq analyses have good reproducibility within 500 Kb from the bait because
high coverage near bait regions results in statistically robust contact profiles compared to far away
regions on the same chromosome or regions on other chromosomes where reproducibility is
lower because of less coverage (Raviram, Rocha et al. 2014). Therefore, due to limitations of the
4C-seq assay, we cannot rule out direct targets that are even further away than UBA2 and/or on
other chromosomes; we note that many transcription factors are up- or downregulated in the E24-
deleted cells and any of these could be responsible for altering a regulatory network.  

         We have used the CRISPR/Cas9 technology to delete 2 enhancers that harbor SNPs
associated with an increased risk for CRC.  We show that loss of these enhancers causes
changes in expression of hundreds of genes. However, we also show that the size of the
H3K27Ac peak does not necessarily correlate with the effect of a distal element on gene
regulation. Our results are supported by a recent study in which 2000 predicted enhancers were
analyzed for activity in a reporter assay. The investigators found that enhancer fragments having
“weaker” H3K27Ac signals can drive expression as well as, if not better than, enhancer fragments
having “stronger” H3K27Ac signals (Kwasnieski, Fiore et al. 2014). Clearly, further experiments
are required before the activity of distal regulatory elements can be accurately predicted.  

 
3.5 Methods
3.5.1 Cell culture
The human cell lines (control and enhancer-deleted versions) HCT116 (ATCC #CCL-247) and
HEK293 (ATCC #CRL-1573) were grown at 37°C, in 5% CO2 in Dulbecco’s Modified Eagle
Medium (DMEM) with 10% fetal bovine serum and 1% penicillin and streptomycin.  
3.5.2 CRISPR/Cas9-mediated genome editing
The guide RNAs (gRNAs) flanking the target enhancer regions were designed using a website
tool (http://crispr.mit.edu), avoiding repeat regions in the hg19 genome. After identification of a
potential guide RNA, the 16-17 nucleotide region including the PAM sequence (NGG) was
BLASTed against the hg19 genome to confirm that it was unique in the genome; none of the guide RNAs
used in this study had a 1-mismatch match elsewhere in the human genome. The target
DNA sequences of the gRNAs used in this study are listed in Supplementary figure 3.1. The 100
bp oligonucleotides containing the gRNA sequences were inserted into the gRNA Empty Vector
(Addgene, catalog #41824) according to the gRNA synthesis protocol (Mali, Yang et al. 2013).
The sequences of gRNA expression plasmids were confirmed through Sanger sequencing. To
delete an enhancer, two gRNA plasmids and a plasmid expressing Cas9-GFP (Addgene, catalog
#44719) were transiently transfected into HCT116 or HEK293 cells in a 6-well plate (with a
Cas9:gRNA molar ratio of 1:22) using Lipofectamine 3000 (Life Technologies, catalog #3000008).
Genomic DNA was extracted from the transfected cells 48 hours post transfection using the
QIAamp DNA mini kit (Qiagen, catalog# 51306). Subsequently, PCR using primers flanking the
target enhancer regions was performed to check the deletion efficiency. Once enhancer deletion
was confirmed in a pool of transfected cells, cells with high GFP expression were identified using
fluorescence-activated cell sorting with the Aria II cell sorter (BD Biosciences). Sorted cells were
plated into individual wells of a 24 well plate and then re-plated as single cells in 10cm dishes and
subsequently expanded for further analyses.
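
The guide selection step described at the start of this section can be illustrated with a minimal Python sketch. It is not the crispr.mit.edu tool or BLAST; it only scans the forward strand of a toy sequence for NGG PAM sites, takes the PAM-proximal 17 nucleotides (PAM included), and counts exact matches on both strands. The function names, the 20 nt protospacer length, and the toy sequence are assumptions made for illustration.

```python
import re

def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_ngg_sites(seq, protospacer_len=20):
    """Return (start, protospacer, pam) for each NGG PAM on the forward strand."""
    seq = seq.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % protospacer_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(seq)]

def pam_proximal(protospacer, pam, length=17):
    """PAM-proximal region (PAM included) used for the uniqueness check."""
    return (protospacer + pam)[-length:]

def count_hits(query, genome):
    """Count exact occurrences of query on either strand of a toy genome."""
    genome = genome.upper()
    hits = 0
    for q in {query, revcomp(query)}:
        pos = genome.find(q)
        while pos != -1:
            hits += 1
            pos = genome.find(q, pos + 1)
    return hits

# Toy sequence (hypothetical, not a real locus).
toy_genome = "TTTT" + "ACGT" * 5 + "AGG" + "CCCC"
sites = find_ngg_sites(toy_genome)
start, protospacer, pam = sites[0]
query = pam_proximal(protospacer, pam)
is_unique = count_hits(query, toy_genome) == 1
```

A real screen would also have to consider guides on the reverse strand and near matches with mismatches, which this sketch does not attempt.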
 
3.5.3 PCR detection of cells having enhancer deletions
Clonal colonies from the 10cm dishes were transferred and passaged into 24-well plates. When
cells were 80-90% confluent, genomic DNA was isolated from each clone using the QIAamp DNA
mini kit (Qiagen, catalog# 51306) and tested for deletion by PCR using GoTaq Green Master Mix
(Promega, catalog#M712) and primers flanking the target enhancer region. For those colonies
that showed loss of the enhancer region, a second PCR was performed using primers that should
detect the inner portion of the enhancer. In general, less than 10% of the clones tested after
sorting for high GFP levels showed biallelic deletion. For the CRC-associated enhancers,
H3K27Ac ChIP-seq data was also used to confirm loss of the enhancer signal (Supplementary
figure 3.2).  
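
The logic of the two-step PCR screen can be summarized in a short Python sketch. The category labels and the function name are hypothetical; the sketch only encodes the reasoning that a deletion-sized product from the flanking primers indicates that at least one allele was deleted, whereas a product from the internal primers indicates that at least one intact copy remains.

```python
def classify_clone(flanking_deletion_band, internal_band):
    """Interpret the two PCRs for one clone.

    flanking_deletion_band: True if the flanking primers give the short,
        deletion-sized product.
    internal_band: True if primers inside the enhancer still give a product.
    """
    if flanking_deletion_band and not internal_band:
        return "all copies deleted"
    if flanking_deletion_band and internal_band:
        return "partial deletion"
    if internal_band:
        return "no deletion"
    return "inconclusive"

def fraction_complete(clones):
    """Fraction of screened clones in which every copy was deleted."""
    complete = sum(1 for c in clones if classify_clone(*c) == "all copies deleted")
    return complete / len(clones)

# Toy screen of ten clones (made-up outcomes for illustration only).
toy_clones = [(True, False)] + [(True, True)] * 4 + [(False, True)] * 5
toy_fraction = fraction_complete(toy_clones)
```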

 
3.5.4 RNA-seq
         RNA samples were collected from clonal cell populations using Trizol (Life Technologies, catalog
#15596018). To minimize batch effects, matched controls and deleted samples were
plated with similar confluency and harvested at the same time, RNA was extracted at the same
time, RNA libraries were prepared at the same time, and barcoded libraries were pooled and
sequenced together (see Supplementary figure 3.3). All RNA libraries were prepared using the
Illumina TruSeqV2 Sample Prep Kit (catalog #15596-026), starting with 1 µg total RNA. ERCC
spike-in mix (Thermo Fisher, catalog # 4456740) was added when the libraries were made so that
quality assessment could be performed on the RNA-seq data, which allowed the removal of
outliers caused by technical variations. Libraries were sequenced on a Nextseq500 with 75 bp
single reads. Raw reads were trimmed using the Quality Score method (minimal quality score 20,
minimal read length 25, trimming from both ends) and mapped to hg19 (Ensembl 72) using
Tophat2 (Trapnell, Pachter et al. 2009) installed in the Partek Flow version 3 program (Partek
Inc., St Louis, MO, USA). A matrix of raw fragment counts for each gene was generated from
alignment files using the HTSeq Python package [53] with GENCODE V19 annotation and these
counts were used for differential gene expression analysis using edgeR [54]. For the
group of large datasets (Set A; see Supplementary figure 3.3), where there are 12 controls vs. 3
samples, genes with at least 1 count per million (CPM) in at least 6 samples were kept for
differential gene expression analysis. For groups of small datasets (Sets B and C;
Supplementary figure 3.3), consisting of 2 controls vs. 2 samples, genes with at least 1 CPM in
at least 2 samples were kept for differential gene expression analysis. In dataset A, after removing
lowly counted genes, upper-quartile normalization was used for the GLM approach in edgeR
which takes into account other covariates (e.g. date of RNA extraction; see Supplementary
figure 3.4) found in the PCA plots.  For datasets B and C, filtered gene counts were normalized
using the Trimmed Mean of M-value (TMM) method for the negative binomial model in edgeR.
Among the differentially expressed genes with an FDR <0.05, lowly expressed genes were filtered
out if the average CPM or average RPKM values (quantified using Partek software) were less than
2 in the controls for downregulated genes or 2 in the deleted samples for upregulated genes.
Finally, genes with greater than 1.5-fold change were defined as differentially expressed genes
(i.e. the numbers shown in Table 3.1).  
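
The filtering rules described above can be sketched in plain Python. This is not the edgeR analysis itself (normalization and the statistical test are not reproduced); it only illustrates the CPM-based pre-filter and the final thresholds on FDR, expression level, and fold change. The function names, dictionary keys, and toy values are assumptions made for illustration.

```python
def cpm(counts):
    """Counts per million for one sample given {gene: raw_count}."""
    total = sum(counts.values())
    return {gene: c * 1e6 / total for gene, c in counts.items()}

def keep_expressed(samples, min_cpm=1.0, min_samples=6):
    """Genes with CPM >= min_cpm in at least min_samples samples."""
    cpms = [cpm(s) for s in samples]
    return sorted(
        gene for gene in samples[0]
        if sum(c[gene] >= min_cpm for c in cpms) >= min_samples
    )

def call_de(results, fdr_cut=0.05, fc_cut=1.5, min_expr=2.0):
    """Apply the final thresholds to per-gene test results.

    Each result is a dict with keys gene, fdr, fold_change (deleted/control),
    ctrl_expr and del_expr (average CPM or RPKM in each group).
    """
    called = []
    for r in results:
        if r["fdr"] >= fdr_cut:
            continue
        if r["fold_change"] < 1.0 / fc_cut and r["ctrl_expr"] >= min_expr:
            called.append((r["gene"], "down"))
        elif r["fold_change"] > fc_cut and r["del_expr"] >= min_expr:
            called.append((r["gene"], "up"))
    return called

# Toy data (made-up numbers, hypothetical gene names).
toy_samples = [{"g1": 1_999_998, "g2": 2, "g3": 0}] * 2
toy_kept = keep_expressed(toy_samples, min_cpm=1.0, min_samples=2)
toy_results = [
    {"gene": "down1", "fdr": 0.01, "fold_change": 0.5, "ctrl_expr": 10.0, "del_expr": 5.0},
    {"gene": "up1", "fdr": 0.01, "fold_change": 2.0, "ctrl_expr": 1.0, "del_expr": 2.0},
    {"gene": "low", "fdr": 0.01, "fold_change": 0.5, "ctrl_expr": 1.0, "del_expr": 0.5},
    {"gene": "ns", "fdr": 0.20, "fold_change": 0.5, "ctrl_expr": 10.0, "del_expr": 5.0},
    {"gene": "small", "fdr": 0.01, "fold_change": 1.2, "ctrl_expr": 10.0, "del_expr": 12.0},
]
toy_called = call_de(toy_results)
```

For Set A the pre-filter would be called with min_samples=6, and for Sets B and C with min_samples=2, following the text above.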

 
3.5.5 Cell proliferation assays
        Controls and E7-deleted HCT116 cells were counted by sorting single cells using a flow
cytometer and then plated into a 96-well plate at a density of 1 x 10^4 or 2.5 x 10^3 cells per well.
After 72 hours or 96 hours, cells were rinsed with 1X PBS and then 100 µl of PBS mixed with 10 µl
of WST-1 reagent (Roche, catalog # 05015944001) was added to each well. Cell proliferation was
measured using a microplate reader (HIDEX Chameleon V4.43) after incubation for 15 minutes at
37°C.  The wavelength for measuring the absorbance of formazan, the product of WST-1 assay,
was 450nm.

 
3.5.6 Colony forming assays
         HCT116 cells and HEK293 cells were counted by sorting single cells using a flow cytometer
and were seeded in 6-well plates at a density of 1.5 x 10^3 cells per well and 6.5 x 10^3 cells per
well, respectively. After a 2 week incubation, cells were fixed with 100% methanol for 10 minutes,
stained with 0.5% crystal violet for 30 minutes, and colonies were assessed after rinsing the
plates with water.
 
3.5.7 ChIP-seq and analysis
         For Figure 3.1, H3K27Ac ChIP-seq data from HCT116 cells (ENCODE accession number
ENCSR000EUT) was downloaded from the UCSC genome browser and analyzed using the Sole-
search ChIP-seq peak calling program (Blahnik, Dou et al. 2010, Blahnik, Dou et al. 2011) using
the following parameters (Permutation:5; Fragment:250; AlphaValue: 0.00010 = 1.0E-4; FDR:
0.00010 = 1.0E-4; PeakMergeDistance:0; HistoneBlurLength:1200). For Figure 3.3 and Figure
3.4, H3K27Ac ChIP-seq samples for control and enhancer-deleted HCT116 cells were prepared
using an H3K27Ac antibody (Active motif catalog#39133, Lot#21311004), as previously described
(O'Geen, Echipare et al. 2011), with minor modifications (complete protocol available upon
request).  These ChIP-seq libraries were sequenced on a HiSeq2000 with 50 bp single end reads.
For Supplementary figure 3.6, the TCF7L2 and CTCF ChIP-seq samples in control and
E7-deleted HCT116 cells were prepared using a TCF7L2 antibody (CST, catalog #C48H11, lot#2)
or a CTCF antibody (Active motif, catalog #61311, lot# 23913002).   The ChIP-seq libraries were
sequenced on a NextSeq500 with 75 bp single end reads. All ChIP-seq  FASTQ files were
mapped to hg19 using BWA (default parameters). To identify decreased or increased H3K27Ac
peaks in enhancer-deleted cells (Supplementary figure 3.7), two biological replicates for each
group were used for differential peak analysis. A peak-calling prioritization pipeline (PePr) was
used to identify significant differential binding sites of H3K27Ac caused by enhancer deletions
(Zhang, Lin et al. 2014); default parameters for histone marks were used for analysis and
differential peaks having a p-value less than 1e-5 are reported in Supplementary figure 3.7.
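
As an illustration of how the differential peaks can be post-processed, the following Python sketch keeps peaks below the p-value cutoff and then selects those within a window around a gene's TSS (the +/- 100 Kb window is the one used in Supplementary figure 3.7). It does not reproduce PePr; the dictionary keys, function names, and toy coordinates are assumptions.

```python
def significant_peaks(peaks, p_cut=1e-5):
    """Keep differential peaks with a p-value below the cutoff."""
    return [p for p in peaks if p["pvalue"] < p_cut]

def peaks_near_tss(peaks, chrom, tss, window=100_000):
    """Peaks on the same chromosome that overlap [tss - window, tss + window]."""
    return [
        p for p in peaks
        if p["chrom"] == chrom
        and p["end"] >= tss - window
        and p["start"] <= tss + window
    ]

# Toy peaks with made-up coordinates and p-values.
toy_peaks = [
    {"chrom": "chrA", "start": 1_000, "end": 2_000, "pvalue": 1e-8},
    {"chrom": "chrA", "start": 500_000, "end": 501_000, "pvalue": 1e-9},
    {"chrom": "chrA", "start": 50_000, "end": 51_000, "pvalue": 1e-3},
    {"chrom": "chrB", "start": 1_000, "end": 2_000, "pvalue": 1e-8},
]
toy_sig = significant_peaks(toy_peaks)
toy_near = peaks_near_tss(toy_sig, "chrA", tss=60_000)
```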

 
3.5.8 4C-seq and analysis
         4C-seq was performed as previously described (Simonis, Kooren et al. 2007, van de
Werken, de Vree et al. 2012) with the following modifications. 10 million cells were harvested and
cross-linked with 1% formaldehyde for 10 minutes at room temperature and cells were lysed to
obtain nuclei. Digestion of nuclei started with adding 200 units of DpnII (NEB, catalog # R0543L)
for 4 hours, followed by overnight incubation with another 200 units of DpnII, and then a third
aliquot of 200 units of DpnII was added the next day for a 4 hour incubation. DpnII was then
inactivated by heat, followed by ligation using 50 units of T4 ligase (Roche, catalog #
10799009001) overnight at 16˚C in a cold room. After confirming high efficiency of ligation, cross-links
were reversed at 65˚C overnight. Phenol-chloroform was used to isolate circularized ligation
products, followed by digestion with 50 units of Csp6I (Thermo, catalog # ER0211) at 37˚C
overnight to create trimmed circular ligation products. 200 ng of DNA was used for inverse PCR
using the Expand Long Template PCR system (Roche, catalog # 11681842001) to create a 4C
library for each bait. Bait and primer information is in Supplementary figure 3.9. PCR products
were purified using the Roche High Pure PCR Products Purification Kit and further purified
using the Qiagen purification kit. Samples were multiplexed and sequenced on a HiSeq2500
(Illumina) with 65 bp reads. 4C FASTQ files were trimmed by 10 bp from the 3’ end and were
analyzed using the 4Cseqpipe program (van de Werken, Landan et al. 2012). 4Cseqpipe was
used to extract sequences from a FASTQ file based on provided bait-specific primer sequences.
Primers were removed from the reads and 31 bp were mapped to the hg19 genome using the
4Cseqpipe mapper. The numbers of mapped reads are listed in Supplementary figure 3.3.
Normalized contact profiles were generated for each bait using the following parameters:
genomic range for normalized trend: 1 Mb; median resolution: 5 Kb window size.
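
The read pre-processing steps described above (3' trimming, primer removal, and retention of 31 bp for mapping) can be sketched as follows. This is not how 4Cseqpipe is implemented internally; it is a minimal Python illustration, and the function name, the exact-prefix primer match, and the toy primer sequence are assumptions.

```python
def process_4c_read(read, primer, trim_3prime=10, map_len=31):
    """Return the fragment to be mapped, or None if the read is unusable.

    The read is trimmed at the 3' end, must begin with the bait-specific
    primer, and must leave at least map_len bases after primer removal.
    """
    trimmed = read[:-trim_3prime] if trim_3prime else read
    if not trimmed.startswith(primer):
        return None  # read does not come from this bait
    insert = trimmed[len(primer):]
    if len(insert) < map_len:
        return None  # too short to map after primer removal
    return insert[:map_len]

# Toy 65-base read: 24-base hypothetical primer + 31 bases + 10 bases trimmed away.
toy_primer = "ACGT" * 6
toy_read = toy_primer + "G" * 31 + "T" * 10
toy_fragment = process_4c_read(toy_read, toy_primer)
```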






 
 
 
 

 

3.6 Supplementary figures for Chapter 3




 
Supplementary figure 3.1 Guide RNA sequences.
Shown are the size of the deleted regions, the genomic locations of the enhancers, and the target DNA for
the guide RNAs. Two different sets of guide RNAs were used to delete E7 in HCT116 vs. HEK293 cells.  



























Supplementary figure 3.2 Confirmation of enhancer deletions.
The regions corresponding to the deleted sequences for each enhancer are indicated as bars; the
different colors refer to different pairs of guide RNAs used. There are 3 copies of E7 in HCT116 cells
and thus there are 3 bars shown for each clone; the other enhancers are in diploid regions of the
genome. (Panel A) deletion of all E7 alleles was confirmed by PCR (data not shown) and the loss of the
enhancer was confirmed through ChIP-seq; (Panel B) biallelic deletion was confirmed by PCR (data not
shown) for E24 and the loss of the enhancer was confirmed through ChIP-seq; (Panel C and D)
deletion was confirmed for 18qE and 18qNE by PCR using primers flanking each enhancer and internal
to each enhancer; (Panel E) deletion was confirmed for E7 in HEK293 by PCR.
 


Supplementary figure 3.3 List of datasets.  
Shown is information relevant to the RNA-seq, ChIP-seq, and 4C-seq experiments performed for this study.
 


1. Study design of RNA-seq

A. Sequenced 7/2/15 on a Nextseq500; cell line HCT116. Replicate names are given by RNA extraction date.
Sample type      Clone #   Replicate name   Sample name   USC ID       # of mapped reads
Control          1         A                C1_A          USC_RNA_56   16,835,185
Control          1         B                C1_B          USC_RNA_57   16,136,616
Control          1         C                C1_C          USC_RNA_58   12,973,689
Control          2         A                C2_A          USC_RNA_59   17,849,912
Control          2         B                C2_B          USC_RNA_60   16,536,346
Control          2         C                C2_C          USC_RNA_61   16,479,259
Control          3         A                C3_A          USC_RNA_62   16,880,730
Control          3         B                C3_B          USC_RNA_63   14,510,306
Control          3         C                C3_C          USC_RNA_64   12,879,816
Control          4         A                C4_A          USC_RNA_65   13,674,375
Control          4         B                C4_B          USC_RNA_66   14,736,300
Control          4         C                C4_C          USC_RNA_67   15,768,373
E7 deletion      1         A                E7_A          USC_RNA_76   18,330,669
E7 deletion      1         B                E7_B          USC_RNA_77   17,464,123
E7 deletion      1         C                E7_C          USC_RNA_78   13,928,003
E24 deletion     1         A                E24_A         USC_RNA_87   15,024,023
E24 deletion     1         B                E24_B         USC_RNA_88   18,766,076
E24 deletion     1         C                E24_C         USC_RNA_89   17,901,475
18qNE deletion   1         A                NE1_A         USC_RNA_68   14,051,545
18qNE deletion   2         B                NE2_B         USC_RNA_70   17,159,364
18qNE deletion   3         A                NE3_A         USC_RNA_72   16,559,866

B. Sequenced 5/5/15 on a Nextseq500; cell line HCT116.
Sample type      Clone/replicate #   Sample name   USC ID       # of mapped reads
Control          1                   C1            USC_RNA_48   24,680,074
Control          2                   C2            USC_RNA_49   22,926,469
18qE deletion    1                   18qE_1        USC_RNA_52   24,837,265
18qE deletion    2                   18qE_2        USC_RNA_53   23,033,168

C. Sequenced 1/27/15 on a Nextseq500; cell line HEK 293T.
Sample type      Clone/replicate #   Sample name     USC ID       # of mapped reads
Control          1                   HEK_293T_C1     USC_RNA_40   28,851,361
Control          2                   HEK_293T_C2     USC_RNA_41   26,748,220
E7 deletion      1                   HEK_293T_E7_1   USC_RNA_42   35,279,209
E7 deletion      2                   HEK_293T_E7_2   USC_RNA_43   31,865,398

2. Study design of ChIP-seq (cell line HCT116)
Date of sequencing   Sequencer    Antibody used   Sample type    Biological replicate #   Sample name         USC ID   # of mapped reads
9/9/14               Hiseq2000    H3K27Ac         Control        1                        H3K27Ac_Control_1   USC646   29,859,232
12/5/14              Hiseq2000    H3K27Ac         Control        2                        H3K27Ac_Control_2   USC659   27,282,365
9/9/14               Hiseq2000    H3K27Ac         E7 deletion    1                        H3K27Ac_E7Del_1     USC652   26,250,803
12/5/14              Hiseq2000    H3K27Ac         E7 deletion    2                        H3K27Ac_E7Del_2     USC660   31,175,944
9/9/14               Hiseq2000    H3K27Ac         E24 deletion   1                        H3K27Ac_E24Del_1    USC648   26,389,565
12/5/14              Hiseq2000    H3K27Ac         E24 deletion   2                        H3K27Ac_E24Del_2    USC664   28,661,150
5/13/15              Nextseq500   TCF7L2          Control        1                        TCF7L2_Control_1    USC731   50,016,717
5/13/15              Nextseq500   TCF7L2          E7 deletion    1                        TCF7L2_E7Del_1      USC749   36,804,109
5/13/15              Nextseq500   TCF7L2          E7 deletion    2                        TCF7L2_E7Del_2      USC750   38,985,416
5/13/15              Nextseq500   CTCF            Control        1                        CTCF_Control1       USC726   28,327,919
5/13/15              Nextseq500   CTCF            Control        2                        CTCF_Control2       USC727   33,291,505
5/13/15              Nextseq500   CTCF            E7 deletion    1                        CTCF_E7Del_1        USC728   30,044,462

3. Study design of 4C-seq (sequenced 7/31/15 on a HiSeq2500; cell line HCT116; sample type Control)
Bait name   Biological replicate #   Sample name      USC ID      # of mapped reads
MYC         1                        4C_Control_MYC   USC_4C_2
E7          1                        4C_Control_E7    USC_4C_18
E24         1                        4C_Control_E24   USC_4c_14   416337

Supplementary figure 3.4 PCA plots of RNA-seq data.
Shown are principal components PC1 (x-axis) and PC2 (y-axis) for RNA-seq datasets for (A) control vs
E7-deleted HCT116 cells, (B) E24-deleted HCT116 cells, (C) 18qNE-deleted HCT116 cells, (D) 18qE-
deleted HCT116 cells, and (E) E7-deleted HEK293 cells.



Supplementary figure 3.5 Circos plots for top downregulated genes.
The circos plots show the chromosomal location of the top 25 downregulated genes (> 2 fold) for each
enhancer deletion (green for protein-coding genes and orange for non-coding genes, with the
thickness of the line indicating the extent of the fold-change). For 18qE and 18qNE only 3
downregulated genes were identified and for 18qNE these downregulated genes were decreased less
than 2 fold.
 

 

Supplementary figure 3.6 Comparison of TCF7L2 and CTCF binding patterns upon E7
deletion in HCT116.
(A) Shown are the H3K27Ac, TCF7L2, and CTCF peaks within a 3 Mb region near E7, with
downregulated genes indicated in red. Expanded views of two super-enhancer regions are shown
below; deletion of E7 reduced H3K27Ac and TCF7L2 binding in these regions. (B) For
comparison, shown are the H3K27Ac, TCF7L2, and CTCF peaks near an upregulated gene
indicated in blue;  deletion of E7 increased H3K27Ac and TCF7L2 binding in this region.

 

Supplementary figure 3.7 H3K27Ac and RNA-seq profiles for genes deregulated in E7- and E24-deleted cells.
Shown is an analysis of genes downregulated upon E7 deletion (A), upregulated upon E7 deletion (B),
downregulated upon E24 deletion (C) and upregulated upon E24 deletion (D). For each set of
analyses panel 1 shows which genes downregulated or upregulated greater than 1.5-fold have
decreased or increased peaks within +/- 100Kb from their TSS; panels 2 and 3 show the
H3K27Ac tracks and RNA-seq data in the control and deleted cells for specific genes.
 

Supplementary figure 3.8 Proliferation assays.
(A) Replicates of colony forming assays in control and E7-deleted HCT116 cells. (B) Photographs
of control vs E7-deleted HCT116 cells showing that the cells gained a more fibroblast-like morphology
upon deletion of E7. (C) Replicates of colony forming assays in control and E7-deleted
HEK293 cells.
 

   
   
 


 
      Supplementary figure 3.9 4C-seq information.  
      Shown are the bait information and primer sequences for the 4C-seq experiments.

 

 




 

4 Chapter 4: Conclusions and future studies

 
4.1 Summary of main findings
         In the post-GWAS era, the conundrum of GWAS is to determine the functional relevance
of SNPs that are statistically associated with diseases or traits. Unlike rare mutations that disrupt
coding regions of proteins, GWAS-identified SNPs (and associated high LD SNPs or fine
mapped SNPs) are mostly found in non-coding regions of the genome, making it difficult to
understand how they influence risk of a disease. However, with the efforts of ENCODE and
REMC, it has been found that non-coding regions are decorated by regulatory elements
(promoters, enhancers, and  nuclear structure-associated elements) and GWAS-identified SNPs
(index SNPs), as well as their high LD SNPs, are enriched in these regulatory elements. I believe
that characterizing the function of an entire regulatory element should be performed prior to
investigating the functions of a specific SNP, and therefore I chose to study the function of
enhancers that harbor CRC risk-associated SNPs.  
         For my thesis work, I used genomic and epigenomic information to prioritize CRC risk-associated
SNPs, prioritized enhancers that harbor the SNPs, then performed experimental
analyses of these enhancers using genome editing (CRISPR/Cas9) and a looping assay (4C) to
identify target genes of the enhancers. CRC risk-associated enhancers were chosen by
overlapping refined SNPs (identified using LD calculations of colon cancer index SNPs) with the
active enhancer mark H3K27Ac in HCT116 colon cancer cells. Among the 28 prioritized
enhancers, I deleted 2 CRC risk associated enhancers, termed E7 and E24, that are in the
intronic regions of the non-coding gene CASC8 and the coding gene RHPN2, respectively. Each
deletion resulted in hundreds of gene expression changes, including the genes in which the
enhancers were located. However, the effects on the transcriptome were not correlated with the
size of the enhancers. Deletion of E7, which corresponds to a small H3K27Ac peak, caused
large-scale changes in gene expression, resulting in decreased expression of many nearby
genes. Deletion of E24, which corresponds to a large H3K27Ac peak, caused changes in
hundreds of genes, only one of which (RHPN2) was within 1 Mb. As a comparison, I showed that
deletion of a robust H3K27Ac peak not associated with CRC had minimal effects on the
transcriptome. Interestingly, although there is no H3K27Ac peak in HEK293 cells in the E7
region, deletion of this region in HEK293 cells decreased expression of several of the same
genes that were downregulated in HCT116 cells, including the MYC oncogene. Accordingly,
deletion of E7 causes changes in cell proliferation in HCT116 and HEK293 cells. Using a 4C
looping assay, I confirmed that E24 has a physical interaction with the RHPN2 promoter,
supporting the conclusion that RHPN2 is a direct target gene of E24. I also showed many
interactions between the E7 region and the MYC gene, consistent with the outcomes of deletion
of E7, which affects expression of many nearby genes. Importantly, I showed that deletion of E7
affects cell proliferation, as would be expected if it regulated MYC.  In conclusion, by
identifying genes regulated by CRC-associated enhancers harboring SNPs, I have developed a
general approach to connect risk loci to putative risk genes.
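
The enhancer prioritization step summarized above (overlapping refined SNPs with H3K27Ac peaks) amounts to an interval lookup, which can be sketched in Python as follows. The tuple layouts, function name, and toy identifiers and coordinates are assumptions for illustration, not the actual SNP or peak sets used in this work.

```python
def snps_in_peaks(snps, peaks):
    """Map each peak to the SNPs that fall inside it.

    snps:  list of (snp_id, chrom, position)
    peaks: list of (peak_id, chrom, start, end), both ends inclusive
    """
    hits = {}
    for peak_id, p_chrom, start, end in peaks:
        inside = [
            snp_id for snp_id, s_chrom, pos in snps
            if s_chrom == p_chrom and start <= pos <= end
        ]
        if inside:
            hits[peak_id] = inside
    return hits

# Toy inputs with hypothetical identifiers and coordinates.
toy_snps = [("snp1", "chrA", 150), ("snp2", "chrA", 900), ("snp3", "chrB", 150)]
toy_peak_set = [("peak1", "chrA", 100, 200), ("peak2", "chrA", 300, 400), ("peak3", "chrB", 100, 200)]
toy_hits = snps_in_peaks(toy_snps, toy_peak_set)
```

Peaks that contain at least one SNP (here peak1 and peak3) would be the candidate risk-associated enhancers; a linear scan is adequate for a few dozen regions, whereas genome-wide sets would normally use a sorted or indexed interval structure.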

 

 

 
4.2 Future directions

 
4.2.1 Epigenome editing of risk-associated enhancers

 
I found that obtaining complete deletion of the risk enhancers was quite difficult, especially when
the enhancer was located within an amplified region in a cancer cell line.  Therefore, instead of
deleting enhancers (which requires Cas9 to cut simultaneously at two sites, followed by laborious
clonal selection for completely deleted alleles) I propose that a better method may be to activate
or repress the enhancers, which should also lead to expression changes of their target genes. By
using epigenetic toggle switches consisting of dCas9-LSD1 or dCas9-P300core along with a single
gRNA, an enhancer could be repressed or activated, respectively (see details in Chapter 1). This
method would be especially important for studying risk-associated enhancers that are highly
active in normal cells but repressed in cancer cell lines.  It is difficult to study normal-specific
enhancers because of the lack of easily engineered normal cell lines. When studying these
enhancers in cancer cells, expression of the target genes is already low, making it difficult to
observe significant changes in target gene expression upon enhancer deletion. For example,
CRC risk enhancer E21 (Figure 4.1) is repressed in HCT116 cells as compared to normal
sigmoid colon tissue, and a recent study using luciferase assays showed that the risk haplotype
of this enhancer has lower enhancer activity than does the non-risk haplotype. However, using
dCas9-P300core along with an E21-specific gRNA (or several gRNAs that tile through the
enhancer region), the target genes of E21 could be activated. Using 4C, differences in
interactions between E21 and the target genes could be evaluated before and after epigenome
editing.  

 
Figure 4.1 Repressed CRC risk enhancer E21




4.2.2 Targeting combinatorial enhancers

 
Given that several enhancers could affect the same target gene, I propose that several
risk-associated enhancers in the same risk locus should be deleted simultaneously. This should
be feasible either by deleting one very large region (up to 1 Mb have been successfully deleted,
ref) or by deleting several smaller regions corresponding to each enhancer in the region.
Alternatively, the enhancers could all be repressed using epigenomic toggle switches. The
effects on the transcriptome upon deletion or inactivation of multiple enhancers could then be
compared to the results obtained when deleting each risk-associated enhancer individually.
 
4.2.3 Targeting SNPs using the HDR mechanism

 
For the risk-associated enhancers that show functional relevance to CRC, for example, E7 which
regulates the MYC oncogene and whose deletion results in reduced cell proliferation and less
colony formation, I propose that a comparison of the risk and non-risk alleles should be made.  
The SNPs that are located in the E7 enhancer can be changed using a single gRNA, Cas9, and an
ssODN. After changing each SNP (or changing several SNPs together that are in the same
haplotype), I could compare the effects of the different alleles on target gene expression. Also,
by performing ChIP-seq for the TFs that bind to motifs containing the SNPs, I could investigate
the functional relevance of the SNPs on differential TF binding.  
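
As a rough illustration of the ssODN-based allele swap proposed above, the following Python sketch builds a donor oligonucleotide that carries the alternative allele flanked by homology arms. The function name, the arm length, and the toy sequence are assumptions; a real design would also need to consider the cut site position and whether to mutate the PAM to prevent re-cutting, which is not addressed here.

```python
def design_ssodn(reference, snp_index, new_allele, arm=45):
    """Return a donor sequence carrying new_allele at the SNP position.

    reference: reference sequence around the SNP (string)
    snp_index: 0-based position of the SNP within reference
    arm:       number of bases of homology kept on each side (assumed value)
    """
    if len(new_allele) != 1:
        raise ValueError("expected a single-base allele")
    if reference[snp_index] == new_allele:
        raise ValueError("new allele matches the reference allele")
    left = max(0, snp_index - arm)
    right = min(len(reference), snp_index + arm + 1)
    return reference[left:snp_index] + new_allele + reference[snp_index + 1:right]

# Toy reference (hypothetical sequence, not the E7 enhancer).
toy_reference = "A" * 50 + "C" + "A" * 50
toy_donor = design_ssodn(toy_reference, snp_index=50, new_allele="T", arm=10)
```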

4.2.4 Disease-related functional assays

 
The putative target gene of E7 is MYC, a well-known oncogene. In contrast, the putative target
gene of E24 is RHPN2, which has not been previously associated with colon cancer. However, a
recent study showed that ectopic expression of RHPN2 promotes an invasive phenotype in
neural stem cells and astrocytes and RHPN2 amplification correlates with a highly aggressive
phenotype that directs the worst clinical outcomes in patients with glioblastoma (Danussi, Akavia
et al. 2013). Therefore, I propose to inactivate RHPN2 using epigenetic toggle switches directed
to the RHPN2 promoter or to knock out the RHPN2 gene using CRISPR/Cas9 to test whether
RHPN2 is a candidate oncogene in colon cancer.  Then, cell proliferation assays could be
performed in control vs. RHPN2-depleted conditions. However, perhaps a mouse model is a
better system for testing the phenotypic effects of risk-associated enhancers or risk genes. Even
though I have shown that deletion of E7 led to reduced proliferation of colon cancer cells,
differences in cell proliferation rates were hard to detect when cells were grown at a high density.
This supports the argument that cancer cell lines are not the best systems to see the phenotypic
changes because of their genomic instabilities and abnormalities in karyotype. Sur et al. showed
that the incidence of polyp formation was markedly reduced when an otherwise normal mouse lacking the
region homologous to the E7 region that I studied in human cells was crossed to a mouse that spontaneously
develops tumors in the intestine and colon. This result supports the idea that a mouse model
may be a better system to see more accurate phenotypic consequences of SNPs, regulatory
elements that have SNPs, or putative causal target genes. Therefore, I propose that E24 be
tested in the mouse colon cancer model system.





 


Appendix A: Publications as a contributing author
Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the
genome by association with GATA3  

 
Frietze S, Wang R, Yao L, Tak YG, Ye Z, Gaddis M, Witt H, Farnham PJ, Jin VX. Cell type-specific
binding patterns reveal that TCF7L2 can be tethered to the genome by
association with GATA3. Genome Biol. 2012;13(9):R52. PMCID: PMC3491396  
I contributed to this publication by performing ChIP-seq experiments revealing an association
between TCF7L2 and GATA3 as part of a project focusing on cell type-specific binding of
TCF7L2.  

 

 

 

 

RESEARCH Open Access
Cell type-specific binding patterns reveal that
TCF7L2 can be tethered to the genome by
association with GATA3
Seth Frietze(1†), Rui Wang(2,3†), Lijing Yao(1), Yu Gyoung Tak(1), Zhenqing Ye(3), Malaina Gaddis(1), Heather Witt(1),
Peggy J Farnham(1*) and Victor X Jin(3*)
Abstract
Background: The TCF7L2 transcription factor is linked to a variety of human diseases, including type 2 diabetes
and cancer. One mechanism by which TCF7L2 could influence expression of genes involved in diverse diseases is
by binding to distinct regulatory regions in different tissues. To test this hypothesis, we performed ChIP-seq for
TCF7L2 in six human cell lines.
Results: We identified 116,000 non-redundant TCF7L2 binding sites, with only 1,864 sites common to the six cell
lines. Using ChIP-seq, we showed that many genomic regions that are marked by both H3K4me1 and H3K27Ac are
also bound by TCF7L2, suggesting that TCF7L2 plays a critical role in enhancer activity. Bioinformatic analysis of the
cell type-specific TCF7L2 binding sites revealed enrichment for multiple transcription factors, including HNF4alpha
and FOXA2 motifs in HepG2 cells and the GATA3 motif in MCF7 cells. ChIP-seq analysis revealed that TCF7L2 co-
localizes with HNF4alpha and FOXA2 in HepG2 cells and with GATA3 in MCF7 cells. Interestingly, in MCF7 cells the
TCF7L2 motif is enriched in most TCF7L2 sites but is not enriched in the sites bound by both GATA3 and TCF7L2.
This analysis suggested that GATA3 might tether TCF7L2 to the genome at these sites. To test this hypothesis, we
depleted GATA3 in MCF7 cells and showed that TCF7L2 binding was lost at a subset of sites. RNA-seq analysis
suggested that TCF7L2 represses transcription when tethered to the genome via GATA3.
Conclusions: Our studies demonstrate a novel relationship between GATA3 and TCF7L2, and reveal important
insights into TCF7L2-mediated gene regulation.
Background
The TCF7L2 (transcription factor 7-like 2) gene encodes a
high mobility group box-containing transcription factor
that is highly up-regulated in several types of human can-
cer, such as colon, liver, breast, and pancreatic cancer
[1-4]. Although TCF7L2 is sometimes called TCF4, there
is a helix-loop-helix transcription factor that has been
given the official gene name of TCF4 and it is important,
therefore, to be aware of possible confusion in the litera-
ture. Numerous studies have shown that TCF7L2 is an
important component of the WNT pathway [3,5,6].
TCF7L2 mediates the downstream effects of WNT signal-
ing via its interaction with CTNNB1 (beta-catenin) and it
can function as an activator or a repressor, depending on
the availability of CTNNB1 in the nucleus. For example,
TCF7L2 can associate with the members of the Groucho
repressor family in the absence of CTNNB1. The WNT
pathway is often constitutively activated in cancers, leading
to increased levels of nuclear CTNNB1 and up-regulation
of TCF7L2 target genes [3]. In addition to being linked to
neoplastic transformation, variants in TCF7L2 are thought
to be the most critical risk factors for type 2 diabetes
[7-10]. However, the functional role of TCF7L2 in these
diseases remains unclear. One hypothesis is that TCF7L2
regulates its downstream target genes in a tissue-specific
manner, with a different cohort of target genes being
turned on or off by TCF7L2 in each cell type. One way to
* Correspondence: pfarnham@usc.edu; Victor.Jin@osumc.edu
† Contributed equally
1 Department of Biochemistry and Molecular Biology, Norris Comprehensive
Cancer Center, University of Southern California, Los Angeles, CA 90089, USA
3 Department of Biomedical Informatics, The Ohio State University, Columbus,
OH 43210, USA
Full list of author information is available at the end of the article
Frietze et al. Genome Biology 2012, 13:R52
http://genomebiology.com/content/13/9/R52
© 2012 Frietze et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.

Appendix B: Supplementary data files in DVD

 
Supplementary data file 2.1. Exon, TSS, and enhancer correlated SNPs. (xlsx file)
Individual worksheets are provided for all correlated SNPs identified in TSS regions, all
correlated SNPs found in exons, the subset of nonsynonymous correlated SNPs found in exons,
and all correlated SNPs found in the set of 28 enhancers. In all cases, the tag SNP, the LD of the
correlated SNP, and genomic information for each SNP are provided.  

Supplementary data file 2.2. H3K27Ac ChIP-seq peaks. (xlsx file) The sets of H3K27Ac
peaks called on merged replicates for the HCT116 and sigmoid colon datasets are provided in
individual worksheets.  

Supplementary data file 2.3. Distal regulatory regions correlated with CRC tag
SNPs. (xlsx file) The tag SNP and the correlated SNPs for 28 distal, robust H3K27Ac regions
are indicated; each region is classified according to its presence or absence in HCT116 (H),
sigmoid colon (S), or SW480 (SW) cells. The 3 nearest protein-coding genes and 3 nearest non-coding
RNAs were identified using the GENCODE V15 gene annotation. Red text indicates
that the same gene was identified by SNPs within the TSS (r2 > 0.5) and blue text indicates
that the same gene was identified by a SNP within a coding exon (r2 > 0.1); see Table 2.2.
Strike-out text indicates that the RNA is present at less than 0.5 FPKM in HCT116 and sigmoid colon
cells; bold text indicates that the transcript is present at more than 2 FPKM.  
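
The text styling described for this file can be read as a simple threshold rule on FPKM values. The following is a minimal sketch of that reading, not the procedure used to build the worksheet: the function name and the returned labels are hypothetical, and the rule for bold (more than 2 FPKM in at least one of the two samples) is an assumption, since the description does not say in which sample the 2 FPKM cut-off is applied.

```python
# Illustrative sketch only. The 0.5 and 2 FPKM thresholds come from the file
# description; the function name and labels are hypothetical.

def fpkm_style(hct116_fpkm: float, sigmoid_fpkm: float) -> str:
    """Return the text style a gene name would receive for the given FPKM values."""
    # Strike-out: below 0.5 FPKM in both HCT116 and sigmoid colon cells.
    if hct116_fpkm < 0.5 and sigmoid_fpkm < 0.5:
        return "strike-out"
    # Bold: ASSUMED here to mean above 2 FPKM in at least one of the two samples.
    if hct116_fpkm > 2 or sigmoid_fpkm > 2:
        return "bold"
    # Everything in between keeps the default style.
    return "plain"


if __name__ == "__main__":
    for values in [(0.1, 0.2), (3.0, 0.1), (1.0, 0.3)]:
        print(values, fpkm_style(*values))
```

Under this reading, a gene that is essentially silent in both samples is struck out, while a gene that is clearly expressed in either sample is shown in bold.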

Supplementary data file 2.4. Motif analysis of all correlated SNPs in enhancers.
(xlsx file) All SNPs having an r2 > 0.1 with the 25 CRC tag SNPs were analyzed using motifs from
Factorbook (http://www.factorbook.org). For those SNPs that impacted a critical position of a
motif, it was determined if the change was predicted to be an improvement or a disruption. A
more restrictive list, including only the subset of SNP-affected motifs within the robust enhancer
regions (using an r2 > 0.5 cut-off), is shown in Supplementary data file 2.5.

Supplementary data file 2.5. Motif analysis of enhancer SNPs having r2 > 0.5 with a
tag SNP. (xlsx file) Shown are the motifs in the risk-associated enhancers that are affected by
the correlated SNPs; "-" means that the alternative allele of the correlated SNP should disrupt TF
binding and "+" means that the alternative allele of the correlated SNP should improve TF
binding.  

Supplementary data file 2.6. Expression of transcription factors with affected
motifs. (xlsx file) Shown are the names of the transcription factors (TFs), their expression levels in
HCT116 and sigmoid colon cells, the differential expression ratio (calculated both ways), and the
gene id and genomic location of each TF.

Supplementary data file 2.7. eQTL analysis results. (xlsx file) Shown are the results for
the SNP-gene pairs using 60 SNPs (6 tag SNPs, 18 SNPs within risk enhancers, and 45 SNPs
within TSS regions) present on the GW SNP6 array and the expression of genes identified by
exon or TSS SNPs or by differential expression analysis (see Table 2.2 and Table 2.5).

Supplementary data file 2.8. RNA analysis of cells having a deletion of enhancer 7.
(xlsx file) Shown are the genes whose expression decreased in the cells in which enhancer 7
was deleted, as determined using Illumina HumanHT-12 v4 Expression BeadChip arrays.  

Supplementary data file 2.9. TCGA sample IDs.   (xlsx file) The IDs for HM450K DNA
methylation, RNA-seq, SNP arrays, and copy number variation analyses for 228 TCGA samples
are provided, as well as the IDs for 254 samples used in normal-tumor gene expression
analyses.  

Supplementary data file 2.10. Summary of putative CRC risk-associated genes.  
(xlsx file) The names of the 80 genes identified as potentially involved in an increased risk for
colon cancer are indicated, along with the location of the SNP(s) that identified the gene. In some
cases, the gene was identified by more than one type of SNP.  In the Enhancer column, “nearest
3” and “nearest 20” refer to the position of the gene relative to the location of the enhancer.  In
the Expressed column, Yes means that the gene is expressed in colon cells and TCGA means
that the gene showed differential expression in a cohort of the TCGA normal vs. colon tumor
samples.  

Supplementary data file 3.1. Genes deregulated upon enhancer deletion.  
(xlsx file) The different worksheets show the genes upregulated or downregulated more than 1.5-fold upon
enhancer deletion, as compared to expression levels in control clones (selected after transfection
with Cas9-GFP plus gRNA empty vector).  
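
The 1.5-fold criterion can be illustrated with a short sketch. This is only an assumed reading of the cut-off: the function name is hypothetical, the fold change is taken as the ratio of expression in enhancer-deleted cells to expression in control clones, and how replicates were averaged or normalized is not specified here.

```python
# Illustrative sketch only. The 1.5-fold threshold comes from the file
# description; the function name and the ratio definition are assumptions.

def classify_fold_change(control: float, deleted: float, threshold: float = 1.5) -> str:
    """Classify a gene by its expression change upon enhancer deletion."""
    if control <= 0 or deleted <= 0:
        raise ValueError("expression values must be positive")
    ratio = deleted / control  # assumed definition: deleted over control
    if ratio > threshold:
        return "upregulated"
    if ratio < 1 / threshold:
        return "downregulated"
    return "unchanged"


if __name__ == "__main__":
    print(classify_fold_change(100.0, 200.0))  # expression doubled
    print(classify_fold_change(100.0, 50.0))   # expression halved
    print(classify_fold_change(100.0, 120.0))  # change below the cut-off
```

Using a symmetric cut-off (above 1.5 or below 1/1.5) treats increases and decreases of the same magnitude equally, which is one common convention for fold-change filters.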


Supplementary data file 3.2. H3K27Ac and RNA-seq profiles for genes deregulated
in E7- and E24-deleted cells.
(xlsx file) Shown is an analysis of genes downregulated upon E7 deletion (A), upregulated upon
E7 deletion (B), downregulated upon E24 deletion (C), and upregulated upon E24 deletion (D).
For each set of analyses, panel 1 shows which of the genes downregulated or upregulated more than
1.5-fold have decreased or increased peaks within +/- 100 kb of their TSS; panels 2 and 3
show the H3K27Ac tracks and RNA-seq data in the control and deleted cells for specific genes.
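
The +/- 100 kb window around a TSS can be sketched as a simple interval test. The code below is an assumption-laden illustration, not the analysis pipeline: the function name and the tuple layout for peaks are hypothetical, and a peak is counted as being within the window if any part of it overlaps the window.

```python
# Illustrative sketch only. The 100 kb window comes from the file description;
# the data layout, the function name, and the overlap rule are assumptions.
from typing import List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end)


def peaks_near_tss(chrom: str, tss: int, peaks: List[Peak], window: int = 100_000) -> List[Peak]:
    """Return the peaks on `chrom` that overlap the interval tss +/- window."""
    lo, hi = tss - window, tss + window
    return [p for p in peaks if p[0] == chrom and p[1] <= hi and p[2] >= lo]


if __name__ == "__main__":
    example_peaks = [
        ("chr1", 950_000, 951_000),      # 50 kb upstream of the TSS
        ("chr1", 1_200_000, 1_201_000),  # 200 kb downstream, outside the window
        ("chr2", 950_000, 951_000),      # different chromosome
    ]
    print(peaks_near_tss("chr1", 1_000_000, example_peaks))
```

Comparing the peaks returned for control and enhancer-deleted cells would then indicate which deregulated genes also show a gain or loss of nearby peaks.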


 



References  

(2012).
 "Comprehensive
 molecular
 characterization
 of
 human
 colon
 and
 rectal
 cancer."
 
Nature
 487(7407):
 330-­‐‑337.
 
(2012).
 "An
 integrated
 encyclopedia
 of
 DNA
 elements
 in
 the
 human
 genome."
 Nature
 
489(7414):
 57-­‐‑74.
 
(2012).
 "Method
 of
 the
 Year
 2011."
 Nature
 Methods
 9:
 1.
 
(2015).
 "Human
 genomics.
 The
 Genotype-­‐‑Tissue
 Expression
 (GTEx)
 pilot
 analysis:
 
multitissue
 gene
 regulation
 in
 humans."
 Science
 348(6235):
 648-­‐‑660.
 
Abba,
 M.,
 N.
 Patil,
 K.
 Rasheed,
 L.
 D.
 Nelson,
 G.
 Mudduluru,
 J.
 H.
 Leupold
 and
 H.
 Allgayer
 
(2013).
 "Unraveling
 the
 Role
 of
 FOXQ1
 in
 Colorectal
 Cancer
 Metastasis."
 Mol
 Cancer
 Res
 
11(9):
 1017-­‐‑1028.
 
Abecasis,
 G.
 R.,
 A.
 Auton,
 L.
 D.
 Brooks,
 M.
 A.
 DePristo,
 R.
 M.
 Durbin,
 R.
 E.
 Handsaker,
 H.
 M.
 
Kang,
 G.
 T.
 Marth
 and
 G.
 A.
 McVean
 (2012).
 "An
 integrated
 map
 of
 genetic
 variation
 from
 
1,092
 human
 genomes."
 Nature
 491(7422):
 56-­‐‑65.
 
Adzhubei,
 I.
 A.,
 S.
 Schmidt,
 L.
 Peshkin,
 V.
 E.
 Ramensky,
 A.
 Gerasimova,
 P.
 Bork,
 A.
 S.
 
Kondrashov
 and
 S.
 R.
 Sunyaev
 (2010).
 "A
 method
 and
 server
 for
 predicting
 damaging
 
missense
 mutations."
 Nat
 Methods
 7(4):
 248-­‐‑249.
 
Ahmadiyeh,
 N.,
 M.
 M.
 Pomerantz,
 C.
 Grisanzio,
 P.
 Herman,
 L.
 Jia,
 V.
 Almendro,
 H.
 H.
 He,
 M.
 
Brown,
 X.
 S.
 Liu,
 M.
 Davis,
 J.
 L.
 Caswell,
 C.
 A.
 Beckwith,
 A.
 Hills,
 L.
 Macconaill,
 G.
 A.
 Coetzee,
 M.
 
M.
 Regan
 and
 M.
 L.
 Freedman
 (2010).
 "8q24
 prostate,
 breast,
 and
 colon
 cancer
 risk
 loci
 
show
 tissue-­‐‑specific
 long-­‐‑range
 interaction
 with
 MYC."
 Proc
 Natl
 Acad
 Sci
 U
 S
 A
 107(21):
 
9742-­‐‑9746.
 
Akhtar-­‐‑Zaidi,
 B.,
 R.
 Cowper-­‐‑Sal-­‐‑lari,
 O.
 Corradin,
 A.
 Saiakhova,
 C.
 F.
 Bartels,
 D.
 
Balasubramanian,
 L.
 Myeroff,
 J.
 Lutterbaugh,
 A.
 Jarrar,
 M.
 F.
 Kalady,
 J.
 Willis,
 J.
 H.
 Moore,
 P.
 J.
 
Tesar,
 T.
 Laframboise,
 S.
 Markowitz,
 M.
 Lupien
 and
 P.
 C.
 Scacheri
 (2012).
 "Epigenomic
 
enhancer
 profiling
 defines
 a
 signature
 of
 colon
 cancer."
 Science
 336(6082):
 736-­‐‑739.
 
Albert,
 F.
 W.
 and
 L.
 Kruglyak
 (2015).
 "The
 role
 of
 regulatory
 variation
 in
 complex
 traits
 and
 
disease."
 Nat
 Rev
 Genet
 16(4):
 197-­‐‑212.
 
Amin
 Al
 Olama,
 A.,
 T.
 Dadaev,
 D.
 J.
 Hazelett,
 Q.
 Li,
 D.
 Leongamornlert,
 E.
 J.
 Saunders,
 S.
 
Stephens,
 C.
 Cieza-­‐‑Borrella,
 I.
 Whitmore,
 S.
 Benlloch
 Garcia,
 G.
 G.
 Giles,
 M.
 C.
 Southey,
 L.
 
Fitzgerald,
 H.
 Gronberg,
 F.
 Wiklund,
 M.
 Aly,
 B.
 E.
 Henderson,
 F.
 Schumacher,
 C.
 A.
 Haiman,
 J.
 
Schleutker,
 T.
 Wahlfors,
 T.
 L.
 Tammela,
 B.
 G.
 Nordestgaard,
 T.
 J.
 Key,
 R.
 C.
 Travis,
 D.
 E.
 Neal,
 J.
 
L.
 Donovan,
 F.
 C.
 Hamdy,
 P.
 Pharoah,
 N.
 Pashayan,
 K.
 T.
 Khaw,
 J.
 L.
 Stanford,
 S.
 N.
 Thibodeau,
 
S.
 K.
 McDonnell,
 D.
 J.
 Schaid,
 C.
 Maier,
 W.
 Vogel,
 M.
 Luedeke,
 K.
 Herkommer,
 A.
 S.
 Kibel,
 C.
 
Cybulski,
 D.
 Wokolorczyk,
 W.
 Kluzniak,
 L.
 Cannon-­‐‑Albright,
 H.
 Brenner,
 K.
 Butterbach,
 V.
 
Arndt,
 J.
 Y.
 Park,
 T.
 Sellers,
 H.
 Y.
 Lin,
 C.
 Slavov,
 R.
 Kaneva,
 V.
 Mitev,
 J.
 Batra,
 J.
 A.
 Clements,
 A.
 
Spurdle,
 M.
 R.
 Teixeira,
 P.
 Paulo,
 S.
 Maia,
 H.
 Pandha,
 A.
 Michael,
 A.
 Kierzek,
 K.
 Govindasami,
 
M.
 Guy,
 A.
 Lophatonanon,
 K.
 Muir,
 A.
 Vinuela,
 A.
 A.
 Brown,
 M.
 Freedman,
 D.
 V.
 Conti,
 D.
 

  149
 
Easton,
 G.
 A.
 Coetzee,
 R.
 A.
 Eeles
 and
 Z.
 Kote-­‐‑Jarai
 (2015).
 "Multiple
 novel
 prostate
 cancer
 
susceptibility
 signals
 identified
 by
 fine-­‐‑mapping
 of
 known
 risk
 loci
 among
 Europeans."
 Hum
 
Mol
 Genet.
 
Andersson,
 R.,
 C.
 Gebhard,
 I.
 Miguel-­‐‑Escalada,
 I.
 Hoof,
 J.
 Bornholdt,
 M.
 Boyd,
 Y.
 Chen,
 X.
 Zhao,
 
C.
 Schmidl,
 T.
 Suzuki,
 E.
 Ntini,
 E.
 Arner,
 E.
 Valen,
 K.
 Li,
 L.
 Schwarzfischer,
 D.
 Glatz,
 J.
 Raithel,
 B.
 
Lilje,
 N.
 Rapin,
 F.
 O.
 Bagger,
 M.
 Jorgensen,
 P.
 R.
 Andersen,
 N.
 Bertin,
 O.
 Rackham,
 A.
 M.
 
Burroughs,
 J.
 K.
 Baillie,
 Y.
 Ishizu,
 Y.
 Shimizu,
 E.
 Furuhata,
 S.
 Maeda,
 Y.
 Negishi,
 C.
 J.
 Mungall,
 T.
 
F.
 Meehan,
 T.
 Lassmann,
 M.
 Itoh,
 H.
 Kawaji,
 N.
 Kondo,
 J.
 Kawai,
 A.
 Lennartsson,
 C.
 O.
 Daub,
 P.
 
Heutink,
 D.
 A.
 Hume,
 T.
 H.
 Jensen,
 H.
 Suzuki,
 Y.
 Hayashizaki,
 F.
 Muller,
 A.
 R.
 Forrest,
 P.
 
Carninci,
 M.
 Rehli
 and
 A.
 Sandelin
 (2014).
 "An
 atlas
 of
 active
 enhancers
 across
 human
 cell
 
types
 and
 tissues."
 Nature
 507(7493):
 455-­‐‑461.
 
Andersson,
 R.,
 A.
 Sandelin
 and
 C.
 G.
 Danko
 (2015).
 "A
 unified
 architecture
 of
 transcriptional
 
regulatory
 elements."
 Trends
 Genet
 31(8):
 426-­‐‑433.
 
Aran,
 D.
 and
 A.
 Hellman
 (2014).
 "Unmasking
 risk
 loci:
 DNA
 methylation
 illuminates
 the
 
biology
 of
 cancer
 predisposition:
 analyzing
 DNA
 methylation
 of
 transcriptional
 enhancers
 
reveals
 missed
 regulatory
 links
 between
 cancer
 risk
 loci
 and
 genes."
 Bioessays
 36(2):
 184-­‐‑
190.
 
Aumailley,
 M.
 (2013).
 "The
 laminin
 family."
 Cell
 Adh
 Migr
 7(1):
 48-­‐‑55.
 
Banovich,
 N.
 E.,
 X.
 Lan,
 G.
 McVicker,
 B.
 van
 de
 Geijn,
 J.
 F.
 Degner,
 J.
 D.
 Blischak,
 J.
 Roux,
 J.
 K.
 
Pritchard
 and
 Y.
 Gilad
 (2014).
 "Methylation
 QTLs
 are
 associated
 with
 coordinated
 changes
 
in
 transcription
 factor
 binding,
 histone
 modifications,
 and
 gene
 expression
 levels."
 PLoS
 
Genet
 10(9):
 e1004663.
 
Barrangou,
 R.
 (2014).
 "RNA
 events.
 Cas9
 targeting
 and
 the
 CRISPR
 revolution."
 Science
 
344(6185):
 707-­‐‑708.
 
Battle,
 A.,
 Z.
 Khan,
 S.
 H.
 Wang,
 A.
 Mitrano,
 M.
 J.
 Ford,
 J.
 K.
 Pritchard
 and
 Y.
 Gilad
 (2015).
 
"Genomic
 variation.
 Impact
 of
 regulatory
 variation
 from
 RNA
 to
 protein."
 Science
 
347(6222):
 664-­‐‑667.
 
Bauer,
 D.
 E.,
 S.
 C.
 Kamran,
 S.
 Lessard,
 J.
 Xu,
 Y.
 Fujiwara,
 C.
 Lin,
 Z.
 Shao,
 M.
 C.
 Canver,
 E.
 C.
 
Smith,
 L.
 Pinello,
 P.
 J.
 Sabo,
 J.
 Vierstra,
 R.
 A.
 Voit,
 G.
 C.
 Yuan,
 M.
 H.
 Porteus,
 J.
 A.
 
Stamatoyannopoulos,
 G.
 Lettre
 and
 S.
 H.
 Orkin
 (2013).
 "An
 erythroid
 enhancer
 of
 BCL11A
 
subject
 to
 genetic
 variation
 determines
 fetal
 hemoglobin
 level."
 Science
 342(6155):
 253-­‐‑
257.
 
Bell,
 J.
 T.,
 A.
 A.
 Pai,
 J.
 K.
 Pickrell,
 D.
 J.
 Gaffney,
 R.
 Pique-­‐‑Regi,
 J.
 F.
 Degner,
 Y.
 Gilad
 and
 J.
 K.
 
Pritchard
 (2011).
 "DNA
 methylation
 patterns
 associate
 with
 genetic
 and
 gene
 expression
 
variation
 in
 HapMap
 cell
 lines."
 Genome
 Biol
 12(1):
 R10.
 
Bernstein,
 B.
 E.,
 J.
 A.
 Stamatoyannopoulos,
 J.
 F.
 Costello,
 B.
 Ren,
 A.
 Milosavljevic,
 A.
 Meissner,
 
M.
 Kellis,
 M.
 A.
 Marra,
 A.
 L.
 Beaudet,
 J.
 R.
 Ecker,
 P.
 J.
 Farnham,
 M.
 Hirst,
 E.
 S.
 Lander,
 T.
 S.
 
Mikkelsen
 and
 J.
 A.
 Thomson
 (2010).
 "The
 NIH
 Roadmap
 Epigenomics
 Mapping
 
Consortium."
 Nat
 Biotechnol
 28(10):
 1045-­‐‑1048.
 
Birney,
 E.,
 J.
 A.
 Stamatoyannopoulos,
 A.
 Dutta,
 R.
 Guigo,
 T.
 R.
 Gingeras,
 E.
 H.
 Margulies,
 Z.
 
Weng,
 M.
 Snyder,
 E.
 T.
 Dermitzakis,
 R.
 E.
 Thurman,
 M.
 S.
 Kuehn,
 C.
 M.
 Taylor,
 S.
 Neph,
 C.
 M.
 
Koch,
 S.
 Asthana,
 A.
 Malhotra,
 I.
 Adzhubei,
 J.
 A.
 Greenbaum,
 R.
 M.
 Andrews,
 P.
 Flicek,
 P.
 J.
 
Boyle,
 H.
 Cao,
 N.
 P.
 Carter,
 G.
 K.
 Clelland,
 S.
 Davis,
 N.
 Day,
 P.
 Dhami,
 S.
 C.
 Dillon,
 M.
 O.
 
Dorschner,
 H.
 Fiegler,
 P.
 G.
 Giresi,
 J.
 Goldy,
 M.
 Hawrylycz,
 A.
 Haydock,
 R.
 Humbert,
 K.
 D.
 
James,
 B.
 E.
 Johnson,
 E.
 M.
 Johnson,
 T.
 T.
 Frum,
 E.
 R.
 Rosenzweig,
 N.
 Karnani,
 K.
 Lee,
 G.
 C.
 

  150
 
Lefebvre,
 P.
 A.
 Navas,
 F.
 Neri,
 S.
 C.
 Parker,
 P.
 J.
 Sabo,
 R.
 Sandstrom,
 A.
 Shafer,
 D.
 Vetrie,
 M.
 
Weaver,
 S.
 Wilcox,
 M.
 Yu,
 F.
 S.
 Collins,
 J.
 Dekker,
 J.
 D.
 Lieb,
 T.
 D.
 Tullius,
 G.
 E.
 Crawford,
 S.
 
Sunyaev,
 W.
 S.
 Noble,
 I.
 Dunham,
 F.
 Denoeud,
 A.
 Reymond,
 P.
 Kapranov,
 J.
 Rozowsky,
 D.
 
Zheng,
 R.
 Castelo,
 A.
 Frankish,
 J.
 Harrow,
 S.
 Ghosh,
 A.
 Sandelin,
 I.
 L.
 Hofacker,
 R.
 Baertsch,
 D.
 
Keefe,
 S.
 Dike,
 J.
 Cheng,
 H.
 A.
 Hirsch,
 E.
 A.
 Sekinger,
 J.
 Lagarde,
 J.
 F.
 Abril,
 A.
 Shahab,
 C.
 Flamm,
 
C.
 Fried,
 J.
 Hackermuller,
 J.
 Hertel,
 M.
 Lindemeyer,
 K.
 Missal,
 A.
 Tanzer,
 S.
 Washietl,
 J.
 Korbel,
 
O.
 Emanuelsson,
 J.
 S.
 Pedersen,
 N.
 Holroyd,
 R.
 Taylor,
 D.
 Swarbreck,
 N.
 Matthews,
 M.
 C.
 
Dickson,
 D.
 J.
 Thomas,
 M.
 T.
 Weirauch,
 J.
 Gilbert,
 J.
 Drenkow,
 I.
 Bell,
 X.
 Zhao,
 K.
 G.
 Srinivasan,
 
W.
 K.
 Sung,
 H.
 S.
 Ooi,
 K.
 P.
 Chiu,
 S.
 Foissac,
 T.
 Alioto,
 M.
 Brent,
 L.
 Pachter,
 M.
 L.
 Tress,
 A.
 
Valencia,
 S.
 W.
 Choo,
 C.
 Y.
 Choo,
 C.
 Ucla,
 C.
 Manzano,
 C.
 Wyss,
 E.
 Cheung,
 T.
 G.
 Clark,
 J.
 B.
 
Brown,
 M.
 Ganesh,
 S.
 Patel,
 H.
 Tammana,
 J.
 Chrast,
 C.
 N.
 Henrichsen,
 C.
 Kai,
 J.
 Kawai,
 U.
 
Nagalakshmi,
 J.
 Wu,
 Z.
 Lian,
 J.
 Lian,
 P.
 Newburger,
 X.
 Zhang,
 P.
 Bickel,
 J.
 S.
 Mattick,
 P.
 Carninci,
 
Y.
 Hayashizaki,
 S.
 Weissman,
 T.
 Hubbard,
 R.
 M.
 Myers,
 J.
 Rogers,
 P.
 F.
 Stadler,
 T.
 M.
 Lowe,
 C.
 L.
 
Wei,
 Y.
 Ruan,
 K.
 Struhl,
 M.
 Gerstein,
 S.
 E.
 Antonarakis,
 Y.
 Fu,
 E.
 D.
 Green,
 U.
 Karaoz,
 A.
 Siepel,
 J.
 
Taylor,
 L.
 A.
 Liefer,
 K.
 A.
 Wetterstrand,
 P.
 J.
 Good,
 E.
 A.
 Feingold,
 M.
 S.
 Guyer,
 G.
 M.
 Cooper,
 G.
 
Asimenos,
 C.
 N.
 Dewey,
 M.
 Hou,
 S.
 Nikolaev,
 J.
 I.
 Montoya-­‐‑Burgos,
 A.
 Loytynoja,
 S.
 Whelan,
 F.
 
Pardi,
 T.
 Massingham,
 H.
 Huang,
 N.
 R.
 Zhang,
 I.
 Holmes,
 J.
 C.
 Mullikin,
 A.
 Ureta-­‐‑Vidal,
 B.
 Paten,
 
M.
 Seringhaus,
 D.
 Church,
 K.
 Rosenbloom,
 W.
 J.
 Kent,
 E.
 A.
 Stone,
 S.
 Batzoglou,
 N.
 Goldman,
 R.
 
C.
 Hardison,
 D.
 Haussler,
 W.
 Miller,
 A.
 Sidow,
 N.
 D.
 Trinklein,
 Z.
 D.
 Zhang,
 L.
 Barrera,
 R.
 Stuart,
 
D.
 C.
 King,
 A.
 Ameur,
 S.
 Enroth,
 M.
 C.
 Bieda,
 J.
 Kim,
 A.
 A.
 Bhinge,
 N.
 Jiang,
 J.
 Liu,
 F.
 Yao,
 V.
 B.
 
Vega,
 C.
 W.
 Lee,
 P.
 Ng,
 A.
 Yang,
 Z.
 Moqtaderi,
 Z.
 Zhu,
 X.
 Xu,
 S.
 Squazzo,
 M.
 J.
 Oberley,
 D.
 Inman,
 
M.
 A.
 Singer,
 T.
 A.
 Richmond,
 K.
 J.
 Munn,
 A.
 Rada-­‐‑Iglesias,
 O.
 Wallerman,
 J.
 Komorowski,
 J.
 C.
 
Fowler,
 P.
 Couttet,
 A.
 W.
 Bruce,
 O.
 M.
 Dovey,
 P.
 D.
 Ellis,
 C.
 F.
 Langford,
 D.
 A.
 Nix,
 G.
 
Euskirchen,
 S.
 Hartman,
 A.
 E.
 Urban,
 P.
 Kraus,
 S.
 Van
 Calcar,
 N.
 Heintzman,
 T.
 H.
 Kim,
 K.
 
Wang,
 C.
 Qu,
 G.
 Hon,
 R.
 Luna,
 C.
 K.
 Glass,
 M.
 G.
 Rosenfeld,
 S.
 F.
 Aldred,
 S.
 J.
 Cooper,
 A.
 Halees,
 J.
 
M.
 Lin,
 H.
 P.
 Shulha,
 M.
 Xu,
 J.
 N.
 Haidar,
 Y.
 Yu,
 V.
 R.
 Iyer,
 R.
 D.
 Green,
 C.
 Wadelius,
 P.
 J.
 
Farnham,
 B.
 Ren,
 R.
 A.
 Harte,
 A.
 S.
 Hinrichs,
 H.
 Trumbower,
 H.
 Clawson,
 J.
 Hillman-­‐‑Jackson,
 A.
 
S.
 Zweig,
 K.
 Smith,
 A.
 Thakkapallayil,
 G.
 Barber,
 R.
 M.
 Kuhn,
 D.
 Karolchik,
 L.
 Armengol,
 C.
 P.
 
Bird,
 P.
 I.
 de
 Bakker,
 A.
 D.
 Kern,
 N.
 Lopez-­‐‑Bigas,
 J.
 D.
 Martin,
 B.
 E.
 Stranger,
 A.
 Woodroffe,
 E.
 
Davydov,
 A.
 Dimas,
 E.
 Eyras,
 I.
 B.
 Hallgrimsdottir,
 J.
 Huppert,
 M.
 C.
 Zody,
 G.
 R.
 Abecasis,
 X.
 
Estivill,
 G.
 G.
 Bouffard,
 X.
 Guan,
 N.
 F.
 Hansen,
 J.
 R.
 Idol,
 V.
 V.
 Maduro,
 B.
 Maskeri,
 J.
 C.
 
McDowell,
 M.
 Park,
 P.
 J.
 Thomas,
 A.
 C.
 Young,
 R.
 W.
 Blakesley,
 D.
 M.
 Muzny,
 E.
 Sodergren,
 D.
 
A.
 Wheeler,
 K.
 C.
 Worley,
 H.
 Jiang,
 G.
 M.
 Weinstock,
 R.
 A.
 Gibbs,
 T.
 Graves,
 R.
 Fulton,
 E.
 R.
 
Mardis,
 R.
 K.
 Wilson,
 M.
 Clamp,
 J.
 Cuff,
 S.
 Gnerre,
 D.
 B.
 Jaffe,
 J.
 L.
 Chang,
 K.
 Lindblad-­‐‑Toh,
 E.
 S.
 
Lander,
 M.
 Koriabine,
 M.
 Nefedov,
 K.
 Osoegawa,
 Y.
 Yoshinaga,
 B.
 Zhu
 and
 P.
 J.
 de
 Jong
 (2007).
 
"Identification
 and
 analysis
 of
 functional
 elements
 in
 1%
 of
 the
 human
 genome
 by
 the
 
ENCODE
 pilot
 project."
 Nature
 447(7146):
 799-­‐‑816.
 
Blahnik,
 K.
 R.,
 L.
 Dou,
 L.
 Echipare,
 S.
 Iyengar,
 H.
 O'Geen,
 E.
 Sanchez,
 Y.
 Zhao,
 M.
 A.
 Marra,
 M.
 
Hirst,
 J.
 F.
 Costello,
 I.
 Korf
 and
 P.
 J.
 Farnham
 (2011).
 "Characterization
 of
 the
 contradictory
 
chromatin
 signatures
 at
 the
 3'
 exons
 of
 zinc
 finger
 genes."
 PLOS
 One
 6:
 e17121.
 
Blahnik,
 K.
 R.,
 L.
 Dou,
 H.
 O'Geen,
 T.
 McPhillips,
 X.
 Xu,
 A.
 R.
 Cao,
 S.
 Iyengar,
 C.
 M.
 Nicolet,
 B.
 
Ludaescher,
 I.
 Korf
 and
 P.
 J.
 Farnham
 (2010).
 "Sole-­‐‑search:
 An
 integrated
 analysis
 program
 
for
 peak
 detection
 and
 functional
 annotation
 using
 ChIP-­‐‑seq
 data."
 Nucleic
 Acids
 Res
 38:
 
e13.
 

  151
 
Blattler,
 A.
 and
 P.
 J.
 Farnham
 (2013).
 "Cross-­‐‑talk
 between
 site-­‐‑specific
 transcription
 factors
 
and
 DNA
 methylation
 states."
 J
 Biol
 Chem
 288(48):
 34287-­‐‑34294.
 
Blattler,
 A.,
 L.
 Yao,
 Y.
 Wang,
 Z.
 Ye,
 V.
 X.
 Jin
 and
 P.
 J.
 Farnham
 (2013).
 "ZBTB33
 binds
 
unmethylated
 regions
 of
 the
 genome
 associated
 with
 actively
 expressed
 genes."
 Epigenetics
 
Chromatin
 6:
 13.
 
Blattler,
 A.,
 L.
 Yao,
 H.
 Witt,
 Y.
 Guo,
 C.
 M.
 Nicolet,
 B.
 P.
 Berman
 and
 P.
 J.
 Farnham
 (2014).
 
"Global
 loss
 of
 DNA
 methylation
 uncovers
 intronic
 enhancers
 in
 genes
 showing
 expression
 
changes."
 Genome
 Biol
 15(9):
 469.
 
Bond-­‐‑Smith,
 G.,
 N.
 Banga,
 T.
 M.
 Hammond
 and
 C.
 J.
 Imber
 (2012).
 "Pancreatic
 
adenocarcinoma."
 BMJ
 344:
 e2476.
 
Bonn,
 S.,
 R.
 P.
 Zinzen,
 C.
 Girardot,
 E.
 H.
 Gustafson,
 A.
 Perez-­‐‑Gonzalez,
 N.
 Delhomme,
 Y.
 Ghavi-­‐‑
Helm,
 B.
 Wilczynski,
 A.
 Riddell
 and
 E.
 E.
 Furlong
 (2012).
 "Tissue-­‐‑specific
 analysis
 of
 
chromatin
 state
 identifies
 temporal
 signatures
 of
 enhancer
 activity
 during
 embryonic
 
development."
 Nat
 Genet
 44(2):
 148-­‐‑156.
 
Boyle,
 A.
 P.,
 E.
 L.
 Hong,
 M.
 Hariharan,
 Y.
 Cheng,
 M.
 A.
 Schaub,
 M.
 Kasowski,
 K.
 J.
 Karczewski,
 J.
 
Park,
 B.
 C.
 Hitz,
 S.
 Weng,
 J.
 M.
 Cherry
 and
 M.
 Snyder
 (2012).
 "Annotation
 of
 functional
 
variation
 in
 personal
 genomes
 using
 RegulomeDB."
 Genome
 Res
 22(9):
 1790-­‐‑1797.
 
Boyle,
 A.
 P.,
 L.
 Song,
 B.
 K.
 Lee,
 D.
 London,
 D.
 Keefe,
 E.
 Birney,
 V.
 R.
 Iyer,
 G.
 E.
 Crawford
 and
 T.
 
S.
 Furey
 (2011).
 "High-­‐‑resolution
 genome-­‐‑wide
 in
 vivo
 footprinting
 of
 diverse
 transcription
 
factors
 in
 human
 cells."
 Genome
 Res
 21(3):
 456-­‐‑464.
 
Breyer,
 J.
 P.,
 D.
 C.
 Dorset,
 T.
 A.
 Clark,
 K.
 M.
 Bradley,
 T.
 A.
 Wahlfors,
 K.
 M.
 McReynolds,
 W.
 H.
 
Maynard,
 S.
 S.
 Chang,
 M.
 S.
 Cookson,
 J.
 A.
 Smith,
 J.
 Schleutker,
 W.
 D.
 Dupont
 and
 J.
 R.
 Smith
 
(2014).
 "An
 Expressed
 Retrogene
 of
 the
 Master
 Embryonic
 Stem
 Cell
 Gene
 POU5F1
 Is
 
Associated
 with
 Prostate
 Cancer
 Susceptibility."
 Am
 J
 Hum
 Genet
 94(3):
 395-­‐‑404.
 
Bright-­‐‑Thomas,
 R.
 M.
 and
 R.
 Hargest
 (2003).
 "APC,
 beta-­‐‑Catenin
 and
 hTCF-­‐‑4;
 an
 unholy
 
trinity
 in
 the
 genesis
 of
 colorectal
 cancer."
 Eur
 J
 Surg
 Oncol
 29(2):
 107-­‐‑117.
 
Browning,
 S.
 R.
 and
 B.
 L.
 Browning
 (2007).
 "Rapid
 and
 accurate
 haplotype
 phasing
 and
 
missing-­‐‑data
 inference
 for
 whole-­‐‑genome
 association
 studies
 by
 use
 of
 localized
 haplotype
 
clustering."
 Am
 J
 Hum
 Genet
 81(5):
 1084-­‐‑1097.
 
Buenrostro,
 J.
 D.,
 P.
 G.
 Giresi,
 L.
 C.
 Zaba,
 H.
 Y.
 Chang
 and
 W.
 J.
 Greenleaf
 (2013).
 
"Transposition
 of
 native
 chromatin
 for
 fast
 and
 sensitive
 epigenomic
 profiling
 of
 open
 
chromatin,
 DNA-­‐‑binding
 proteins
 and
 nucleosome
 position."
 Nat
 Methods
 10(12):
 1213-­‐‑
1218.
 
Cai,
 J.,
 Y.
 Zhao,
 Y.
 Liu,
 F.
 Ye,
 Z.
 Song,
 H.
 Qin,
 S.
 Meng,
 Y.
 Chen,
 R.
 Zhou,
 X.
 Song,
 Y.
 Guo,
 M.
 Ding
 
and
 H.
 Deng
 (2007).
 "Directed
 differentiation
 of
 human
 embryonic
 stem
 cells
 into
 functional
 
hepatic
 cells."
 Hepatology
 45:
 1229-­‐‑1239.
 
Calo,
 E.
 and
 J.
 Wysocka
 (2013).
 "Modification
 of
 enhancer
 chromatin:
 what,
 how,
 and
 why?"
 
Mol
 Cell
 49(5):
 825-­‐‑837.
 
Canver,
 M.
 C.,
 D.
 E.
 Bauer,
 A.
 Dass,
 Y.
 Y.
 Yien,
 J.
 Chung,
 T.
 Masuda,
 T.
 Maeda,
 B.
 H.
 Paw
 and
 S.
 
H.
 Orkin
 (2014).
 "Characterization
 of
 genomic
 deletion
 efficiency
 mediated
 by
 clustered
 
regularly
 interspaced
 palindromic
 repeats
 (CRISPR)/Cas9
 nuclease
 system
 in
 mammalian
 
cells."
 J
 Biol
 Chem
 289(31):
 21312-­‐‑21324.
 
Canver,
 M.
 C.,
 E.
 C.
 Smith,
 F.
 Sher,
 L.
 Pinello,
 N.
 E.
 Sanjana,
 O.
 Shalem,
 D.
 D.
 Chen,
 P.
 G.
 Schupp,
 
D.
 S.
 Vinjamur,
 S.
 P.
 Garcia,
 S.
 Luc,
 R.
 Kurita,
 Y.
 Nakamura,
 Y.
 Fujiwara,
 T.
 Maeda,
 G.
 C.
 Yuan,
 F.
 

  152
 
Zhang,
 S.
 H.
 Orkin
 and
 D.
 E.
 Bauer
 (2015).
 "BCL11A
 enhancer
 dissection
 by
 Cas9-­‐‑mediated
 in
 
situ
 saturating
 mutagenesis."
 Nature.
 
Cao,
 A.
 R.,
 R.
 Rabinovich,
 M.
 Xu,
 X.
 Xu,
 V.
 X.
 Jin
 and
 P.
 J.
 Farnham
 (2011).
 "Genome-­‐‑wide
 
analysis
 of
 transcription
 factor
 E2F1
 mutant
 proteins
 reveals
 that
 N-­‐‑
 and
 C-­‐‑terminal
 protein
 
interaction
 domains
 do
 not
 participate
 in
 targeting
 E2F1
 to
 the
 human
 genome."
 J
 Biol
 Chem
 
286(14):
 11985-­‐‑11996.
 
Cao,
 Y.,
 T.
 M.
 Bryan
 and
 R.
 R.
 Reddel
 (2008).
 "Increased
 copy
 number
 of
 the
 TERT
 and
 TERC
 
telomerase
 subunit
 genes
 in
 cancer
 cells."
 Cancer
 Sci
 99(6):
 1092-­‐‑1099.
 
Carneiro,
 P.,
 J.
 Figueiredo,
 R.
 Bordeira-­‐‑Carrico,
 M.
 S.
 Fernandes,
 J.
 Carvalho,
 C.
 Oliveira
 and
 R.
 Seruca
 
(2013).
 "Therapeutic
 targets
 associated
 to
 E-­‐‑cadherin
 dysfunction
 in
 gastric
 cancer."
 Expert
 Opin
 
Ther
 Targets
 17(10):
 1187-­‐‑1201.
 
Carvajal-­‐‑Carmona,
 L.
 G.,
 J.
 B.
 Cazier,
 A.
 M.
 Jones,
 K.
 Howarth,
 P.
 Broderick,
 A.
 Pittman,
 S.
 
Dobbins,
 A.
 Tenesa,
 S.
 Farrington,
 J.
 Prendergast,
 E.
 Theodoratou,
 R.
 Barnetson,
 D.
 Conti,
 P.
 
Newcomb,
 J.
 L.
 Hopper,
 M.
 A.
 Jenkins,
 S.
 Gallinger,
 D.
 J.
 Duggan,
 H.
 Campbell,
 D.
 Kerr,
 G.
 Casey,
 
R.
 Houlston,
 M.
 Dunlop
 and
 I.
 Tomlinson
 (2011).
 "Fine-­‐‑mapping
 of
 colorectal
 cancer
 
susceptibility
 loci
 at
 8q23.3,
 16q22.1
 and
 19q13.11:
 refinement
 of
 association
 signals
 and
 
use
 of
 in
 silico
 analysis
 to
 suggest
 functional
 variation
 and
 unexpected
 candidate
 target
 
genes."
 Hum
 Mol
 Genet
 20(14):
 2879-­‐‑2888.
 
Chen,
 X.,
 H.
 Xu,
 P.
 Yuan,
 F.
 Fang,
 M.
 Huss,
 V.
 B.
 Vega,
 E.
 Wong,
 Y.
 L.
 Orlov,
 W.
 Zhang,
 J.
 Jiang,
 Y.
 
H.
 Loh,
 H.
 C.
 Yeo,
 Z.
 X.
 Yeo,
 V.
 Narang,
 K.
 R.
 Govindarajan,
 B.
 Leong,
 A.
 Shahab,
 Y.
 Ruan,
 G.
 
Bourque,
 W.
 K.
 Sung,
 N.
 D.
 Clarke,
 C.
 L.
 Wei
 and
 H.
 H.
 Ng
 (2008).
 "Integration
 of
 external
 
signaling
 pathways
 with
 the
 core
 transcriptional
 network
 in
 embryonic
 stem
 cells."
 Cell
 
133(6):
 1106-­‐‑1117.
 
Cheng,
 A.
 W.,
 H.
 Wang,
 H.
 Yang,
 L.
 Shi,
 Y.
 Katz,
 T.
 W.
 Theunissen,
 S.
 Rangarajan,
 C.
 S.
 Shivalila,
 
D.
 B.
 Dadon
 and
 R.
 Jaenisch
 (2013).
 "Multiplexed
 activation
 of
 endogenous
 genes
 by
 CRISPR-­‐‑
on,
 an
 RNA-­‐‑guided
 transcriptional
 activator
 system."
 Cell
 Res
 23(10):
 1163-­‐‑1171.
 
Cheung,
 E.
 C.,
 D.
 Athineos,
 P.
 Lee,
 R.
 A.
 Ridgway,
 W.
 Lambie,
 C.
 Nixon,
 D.
 Strathdee,
 K.
 Blyth,
 
O.
 J.
 Sansom
 and
 K.
 H.
 Vousden
 (2013).
 "TIGAR
 is
 required
 for
 efficient
 intestinal
 
regeneration
 and
 tumorigenesis."
 Dev
 Cell
 25(5):
 463-­‐‑477.
 
Choi,
 Y.,
 G.
 E.
 Sims,
 S.
 Murphy,
 J.
 R.
 Miller
 and
 A.
 P.
 Chan
 (2012).
 "Predicting
 the
 functional
 
effect
 of
 amino
 acid
 substitutions
 and
 indels."
 PLoS
 One
 7(10):
 e46688.
 
Cingolani,
 P.,
 A.
 Platts,
 L.
 Wang
 le,
 M.
 Coon,
 T.
 Nguyen,
 L.
 Wang,
 S.
 J.
 Land,
 X.
 Lu
 and
 D.
 M.
 
Ruden
 (2012).
 "A
 program
 for
 annotating
 and
 predicting
 the
 effects
 of
 single
 nucleotide
 
polymorphisms,
 SnpEff:
 SNPs
 in
 the
 genome
 of
 Drosophila
 melanogaster
 strain
 w1118;
 iso-­‐‑
2;
 iso-­‐‑3."
 Fly
 (Austin)
 6(2):
 80-­‐‑92.
 
Claussnitzer,
 M.,
 S.
 N.
 Dankel,
 B.
 Klocke,
 H.
 Grallert,
 V.
 Glunk,
 T.
 Berulava,
 H.
 Lee,
 N.
 Oskolkov,
 
J.
 Fadista,
 K.
 Ehlers,
 S.
 Wahl,
 C.
 Hoffmann,
 K.
 Qian,
 T.
 Ronn,
 H.
 Riess,
 M.
 Muller-­‐‑Nurasyid,
 N.
 
Bretschneider,
 T.
 Schroeder,
 T.
 Skurk,
 B.
 Horsthemke,
 D.
 Spieler,
 M.
 Klingenspor,
 M.
 Seifert,
 
M.
 J.
 Kern,
 N.
 Mejhert,
 I.
 Dahlman,
 O.
 Hansson,
 S.
 M.
 Hauck,
 M.
 Bluher,
 P.
 Arner,
 L.
 Groop,
 T.
 
Illig,
 K.
 Suhre,
 Y.
 H.
 Hsu,
 G.
 Mellgren,
 H.
 Hauner
 and
 H.
 Laumen
 (2014).
 "Leveraging
 cross-­‐‑
species
 transcription
 factor
 binding
 site
 patterns:
 from
 diabetes
 risk
 loci
 to
 disease
 
mechanisms."
 Cell
 156(1-­‐‑2):
 343-­‐‑358.
 
Clevers,
 H.
 (2004).
 "Wnt
 breakers
 in
 colon
 cancer."
 Cancer
 Cell
 5(1):
 5-­‐‑6.
 
Clevers,
 H.
 (2006).
 "Wnt/beta-­‐‑catenin
 signaling
 in
 development
 and
 disease."
 Cell
 127(3):
 
469-­‐‑480.
 

  153
 
Coetzee,
 G.
 A.,
 L.
 Jia,
 B.
 Frenkel,
 B.
 E.
 Henderson,
 A.
 Tanay,
 C.
 A.
 Haiman
 and
 M.
 L.
 Freedman
 
(2010).
 "A
 systematic
 approach
 to
 understand
 the
 functional
 consequences
 of
 non-­‐‑protein
 
coding
 risk
 regions."
 Cell
 Cycle
 9(2):
 256-­‐‑259.
 
Coetzee,
 S.
 G.,
 S.
 K.
 Rhie,
 B.
 P.
 Berman,
 G.
 A.
 Coetzee
 and
 H.
 Noushmehr
 (2012).
 "FunciSNP:
 
an
 R/bioconductor
 tool
 integrating
 functional
 non-­‐‑coding
 data
 sets
 with
 genetic
 association
 
studies
 to
 identify
 candidate
 regulatory
 SNPs."
 Nucleic
 Acids
 Res
 40(18):
 e139.
 
Coller,
 H.
 A.,
 C.
 Grandori,
 P.
 Tamayo,
 T.
 Colbert,
 E.
 S.
 Lander,
 R.
 N.
 Eisenman
 and
 T.
 R.
 Golub
 
(2000).
 "Expression
 analysis
 with
 oligonucleotide
 microarrays
 reveals
 that
 MYC
 regulates
 
genes
 involved
 in
 growth,
 cell
 cycle,
 signaling,
 and
 adhesion."
 Proc.
 Natl.
 Acad.
 Sci.
 USA
 97:
 
3260-­‐‑3265.
 
Conacci-­‐‑Sorrell,
 M.,
 L.
 McFerrin
 and
 R.
 N.
 Eisenman
 (2014).
 "An
 overview
 of
 MYC
 and
 its
 
interactome."
 Cold
 Spring
 Harb
 Perspect
 Med
 4(1):
 a014357.
 
Cong,
 L.,
 F.
 A.
 Ran,
 D.
 Cox,
 S.
 Lin,
 R.
 Barretto,
 N.
 Habib,
 P.
 D.
 Hsu,
 X.
 Wu,
 W.
 Jiang,
 L.
 A.
 
Marraffini
 and
 F.
 Zhang
 (2013).
 "Multiplex
 genome
 engineering
 using
 CRISPR/Cas
 systems."
 
Science
 339(6121):
 819-­‐‑823.
 
Corradin,
 O.,
 A.
 Saiakhova,
 B.
 Akhtar-­‐‑Zaidi,
 L.
 Myeroff,
 J.
 Willis,
 R.
 Cowper-­‐‑Sal
 lari,
 M.
 Lupien,
 
S.
 Markowitz
 and
 P.
 C.
 Scacheri
 (2014).
 "Combinatorial
 effects
 of
 multiple
 enhancer
 variants
 
in
 linkage
 disequilibrium
 dictate
 levels
 of
 gene
 expression
 to
 confer
 susceptibility
 to
 
common
 traits."
 Genome
 Res
 24(1):
 1-­‐‑13.
 
Corradin,
 O.
 and
 P.
 C.
 Scacheri
 (2014).
 "Enhancer
 variants:
 evaluating
 functions
 in
 common
 
disease."
 Genome
 Med
 6(10):
 85.
 
Danussi,
 C.,
 U.
 D.
 Akavia,
 F.
 Niola,
 A.
 Jovic,
 A.
 Lasorella,
 D.
 Pe'er
 and
 A.
 Iavarone
 (2013).
 
"RHPN2
 Drives
 Mesenchymal
 Transformation
 in
 Malignant
 Glioma
 by
 Triggering
 RhoA
 
Activation."
 Cancer
 Res
 73(16):
 5140-­‐‑5150.
 
Dayeh,
 T.
 A.,
 A.
 H.
 Olsson,
 P.
 Volkov,
 P.
 Almgren,
 T.
 Ronn
 and
 C.
 Ling
 (2013).
 "Identification
 of
 
CpG-­‐‑SNPs
 associated
 with
 type
 2
 diabetes
 and
 differential
 DNA
 methylation
 in
 human
 
pancreatic
 islets."
 Diabetologia
 56(5):
 1036-­‐‑1046.
 
Dimas,
 A.
 S.,
 S.
 Deutsch,
 B.
 E.
 Stranger,
 S.
 B.
 Montgomery,
 C.
 Borel,
 H.
 Attar-­‐‑Cohen,
 C.
 Ingle,
 C.
 
Beazley,
 M.
 Gutierrez
 Arcelus,
 M.
 Sekowska,
 M.
 Gagnebin,
 J.
 Nisbett,
 P.
 Deloukas,
 E.
 T.
 
Dermitzakis
 and
 S.
 E.
 Antonarakis
 (2009).
 "Common
 regulatory
 variation
 impacts
 gene
 
expression
 in
 a
 cell
 type-­‐‑dependent
 manner."
 Science
 325(5945):
 1246-­‐‑1250.
 
Ding,
 Q.,
 Y.
 K.
 Lee,
 E.
 A.
 Schaefer,
 D.
 T.
 Peters,
 A.
 Veres,
 K.
 Kim,
 N.
 Kuperwasser,
 D.
 L.
 Motola,
 
T.
 B.
 Meissner,
 W.
 T.
 Hendriks,
 M.
 Trevisan,
 R.
 M.
 Gupta,
 A.
 Moisan,
 E.
 Banks,
 M.
 Friesen,
 R.
 T.
 
Schinzel,
 F.
 Xia,
 A.
 Tang,
 Y.
 Xia,
 E.
 Figueroa,
 A.
 Wann,
 T.
 Ahfeldt,
 L.
 Daheron,
 F.
 Zhang,
 L.
 L.
 
Rubin,
 L.
 F.
 Peng,
 R.
 T.
 Chung,
 K.
 Musunuru
 and
 C.
 A.
 Cowan
 (2013).
 "A
 TALEN
 genome-­‐‑
editing
 system
 for
 generating
 human
 stem
 cell-­‐‑based
 disease
 models."
 Cell
 Stem
 Cell
 12(2):
 
238-­‐‑251.
 
Ding,
 Z.,
 Y.
 Ni,
 S.
 W.
 Timmer,
 B.
 K.
 Lee,
 A.
 Battenhouse,
 S.
 Louzada,
 F.
 Yang,
 I.
 Dunham,
 G.
 E.
 
Crawford,
 J.
 D.
 Lieb,
 R.
 Durbin,
 V.
 R.
 Iyer
 and
 E.
 Birney
 (2014).
 "Quantitative
 genetics
 of
 CTCF
 
binding
 reveal
 local
 sequence
 effects
 and
 different
 modes
 of
 X-­‐‑chromosome
 association."
 
PLoS
 Genet
 10(11):
 e1004798.
 
Diolaiti,
 D.,
 L.
 McFerrin,
 P.
 A.
 Carroll
 and
 R.
 N.
 Eisenman
 (2014).
 "Functional
 interactions
 
among
 members
 of
 the
 MAX
 and
 MLX
 transcriptional
 network
 during
 oncogenesis."
 Biochim
 
Biophys
 Acta:
 S1874-­‐‑9399.
 

  154
 
Dryden,
 N.
 H.,
 L.
 R.
 Broome,
 F.
 Dudbridge,
 N.
 Johnson,
 N.
 Orr,
 S.
 Schoenfelder,
 T.
 Nagano,
 S.
 
Andrews,
 S.
 Wingett,
 I.
 Kozarewa,
 I.
 Assiotis,
 K.
 Fenwick,
 S.
 L.
 Maguire,
 J.
 Campbell,
 R.
 
Natrajan,
 M.
 Lambros,
 E.
 Perrakis,
 A.
 Ashworth,
 P.
 Fraser
 and
 O.
 Fletcher
 (2014).
 "Unbiased
 
analysis
 of
 potential
 targets
 of
 breast
 cancer
 susceptibility
 loci
 by
 Capture
 Hi-­‐‑C."
 Genome
 
Res
 24(11):
 1854-­‐‑1868.
 
Du,
 M.,
 T.
 Yuan,
 K.
 F.
 Schilter,
 R.
 L.
 Dittmar,
 A.
 Mackinnon,
 X.
 Huang,
 M.
 Tschannen,
 E.
 
Worthey,
 H.
 Jacob,
 S.
 Xia,
 J.
 Gao,
 L.
 Tillmans,
 Y.
 Lu,
 P.
 Liu,
 S.
 N.
 Thibodeau
 and
 L.
 Wang
 
(2015).
 "Prostate
 cancer
 risk
 locus
 at
 8q24
 as
 a
 regulatory
 hub
 by
 physical
 interactions
 with
 
multiple
 genomic
 loci
 across
 the
 genome."
 Hum
 Mol
 Genet
 24(1):
 154-­‐‑166.
 
Dunlop, M. G., S. E. Dobbins, S. M. Farrington, A. M. Jones, C. Palles, N. Whiffin, A. Tenesa, S. Spain, P. Broderick, L. Y. Ooi, E. Domingo, C. Smillie, M. Henrion, M. Frampton, L. Martin, G. Grimes, M. Gorman, C. Semple, Y. P. Ma, E. Barclay, J. Prendergast, J. B. Cazier, B. Olver, S. Penegar, S. Lubbe, I. Chander, L. G. Carvajal-Carmona, S. Ballereau, A. Lloyd, J. Vijayakrishnan, L. Zgaga, I. Rudan, E. Theodoratou, J. M. Starr, I. Deary, I. Kirac, D. Kovacevic, L. A. Aaltonen, L. Renkonen-Sinisalo, J. P. Mecklin, K. Matsuda, Y. Nakamura, Y. Okada, S. Gallinger, D. J. Duggan, D. Conti, P. Newcomb, J. Hopper, M. A. Jenkins, F. Schumacher, G. Casey, D. Easton, M. Shah, P. Pharoah, A. Lindblom, T. Liu, C. G. Smith, H. West, J. P. Cheadle, R. Midgley, D. J. Kerr, H. Campbell, I. P. Tomlinson and R. S. Houlston (2012). "Common variation near CDKN1A, POLD3 and SHROOM2 influences colorectal cancer risk." Nat Genet 44(7): 770-776.
 
Edwards, S. L., J. Beesley, J. D. French and A. M. Dunning (2013). "Beyond GWASs: illuminating the dark road from association to function." Am J Hum Genet 93(5): 779-797.
 
Emilsson, V., G. Thorleifsson, B. Zhang, A. S. Leonardson, F. Zink, J. Zhu, S. Carlson, A. Helgason, G. B. Walters, S. Gunnarsdottir, M. Mouy, V. Steinthorsdottir, G. H. Eiriksdottir, G. Bjornsdottir, I. Reynisdottir, D. Gudbjartsson, A. Helgadottir, A. Jonasdottir, A. Jonasdottir, U. Styrkarsdottir, S. Gretarsdottir, K. P. Magnusson, H. Stefansson, R. Fossdal, K. Kristjansson, H. G. Gislason, T. Stefansson, B. G. Leifsson, U. Thorsteinsdottir, J. R. Lamb, J. R. Gulcher, M. L. Reitman, A. Kong, E. E. Schadt and K. Stefansson (2008). "Genetics of gene expression and its effect on disease." Nature 452(7186): 423-428.
 
ENCODE Project Consortium (2012). "An integrated encyclopedia of DNA elements in the human genome." Nature 489(7414): 57-74.
 
Ernst, J., P. Kheradpour, T. S. Mikkelsen, N. Shoresh, L. D. Ward, C. B. Epstein, X. Zhang, L. Wang, R. Issner, M. Coyne, M. Ku, T. Durham, M. Kellis and B. E. Bernstein (2011). "Mapping and analysis of chromatin state dynamics in nine human cell types." Nature 473(7345): 43-49.
 
Fairfax, B. P., P. Humburg, S. Makino, V. Naranbhai, D. Wong, E. Lau, L. Jostins, K. Plant, R. Andrews, C. McGee and J. C. Knight (2014). "Innate immune activity conditions the effect of regulatory variants upon monocyte gene expression." Science 343(6175): 1246949.
 
Farh, K. K., A. Marson, J. Zhu, M. Kleinewietfeld, W. J. Housley, S. Beik, N. Shoresh, H. Whitton, R. J. Ryan, A. A. Shishkin, M. Hatan, M. J. Carrasco-Alfonso, D. Mayer, C. J. Luckey, N. A. Patsopoulos, P. L. De Jager, V. K. Kuchroo, C. B. Epstein, M. J. Daly, D. A. Hafler and B. E. Bernstein (2015). "Genetic and epigenetic fine mapping of causal autoimmune disease variants." Nature 518(7539): 337-343.
 
Farnham, P. J. (2009). "Insights from genomic profiling of transcription factors." Nature Rev Genet 10: 605-616.
 

Farnham, P. J. (2012). "Thematic Minireview Series on Results from the ENCODE Project: Integrative Global Analyses of Regulatory Regions in the Human Genome." J Biol Chem 287(37): 30885-30887.
 
Fearon, E. R. (2011). "Molecular genetics of colorectal cancer." Annu Rev Pathol 6: 479-507.
 
Fortini, B. K., S. Tring, S. J. Plummer, C. K. Edlund, V. Moreno, R. S. Bresalier, E. L. Barry, T. R. Church, J. C. Figueiredo and G. Casey (2014). "Multiple functional risk variants in a SMAD7 enhancer implicate a colorectal cancer risk haplotype." PLoS One 9(11): e111914.
 
Freedman, M. L., A. N. Monteiro, S. A. Gayther, G. A. Coetzee, A. Risch, C. Plass, G. Casey, M. De Biasi, C. Carlson, D. Duggan, M. James, P. Liu, J. W. Tichelaar, H. G. Vikis, M. You and I. G. Mills (2011). "Principles for the post-GWAS functional characterization of cancer risk loci." Nat Genet 43(6): 513-518.
 
Frietze, S., R. Wang, L. Yao, Y. G. Tak, Z. Ye, M. Gaddis, H. Witt, P. J. Farnham and V. X. Jin (2012). "Cell type-specific binding patterns reveal that TCF7L2 can be tethered to the genome by association with GATA3." Genome Biol 13(9): R52.
 
Gabay, M., Y. Li and D. W. Felsher (2014). "MYC activation is a hallmark of cancer initiation and maintenance." Cold Spring Harb Perspect Med 4(6).
 
Gaddis, M., D. Gerrard, S. Frietze and P. J. Farnham (2015). "Altering cancer transcriptomes using epigenomic inhibitors." Epigenetics Chromatin 8: 9.
 
Gao, X., J. C. Tsang, F. Gaba, D. Wu, L. Lu and P. Liu (2014). "Comparison of TALE designer transcription factors and the CRISPR/dCas9 in regulation of gene expression by targeting enhancers." Nucleic Acids Res 42(20): e155.
 
Garte, S. J. (1993). "The c-myc oncogene in tumor progression." Critical Reviews in Oncog. 4(4): 435-449.
 
Ghazalpour, A., B. Bennett, V. A. Petyuk, L. Orozco, R. Hagopian, I. N. Mungrue, C. R. Farber, J. Sinsheimer, H. M. Kang, N. Furlotte, C. C. Park, P. Z. Wen, H. Brewer, K. Weitz, D. G. Camp, 2nd, C. Pan, R. Yordanova, I. Neuhaus, C. Tilford, N. Siemers, P. Gargalovic, E. Eskin, T. Kirchgessner, D. J. Smith, R. D. Smith and A. J. Lusis (2011). "Comparative analysis of proteome and transcriptome variation in mouse." PLoS Genet 7(6): e1001393.
 
Gibson, G., J. E. Powell and U. M. Marigorta (2015). "Expression quantitative trait locus analysis for translational medicine." Genome Med 7(1): 60.
 
Grobarczyk, B., B. Franco, K. Hanon and B. Malgrange (2015). "Generation of Isogenic Human iPS Cell Line Precisely Corrected by Genome Editing Using the CRISPR/Cas9 System." Stem Cell Rev 11(5): 774-787.
 
Guo, Y., D. V. Conti and K. Wang (2015). "Enlight: web-based integration of GWAS results with biological annotations." Bioinformatics 31(2): 275-276.
 
Guo, Y., Q. Xu, D. Canzio, J. Shou, J. Li, D. U. Gorkin, I. Jung, H. Wu, Y. Zhai, Y. Tang, Y. Lu, Y. Wu, Z. Jia, W. Li, M. Q. Zhang, B. Ren, A. R. Krainer, T. Maniatis and Q. Wu (2015). "CRISPR Inversion of CTCF Sites Alters Genome Topology and Enhancer/Promoter Function." Cell 162(4): 900-910.
 
Gutierrez-Arcelus, M., T. Lappalainen, S. B. Montgomery, A. Buil, H. Ongen, A. Yurovsky, J. Bryois, T. Giger, L. Romano, A. Planchon, E. Falconnet, D. Bielser, M. Gagnebin, I. Padioleau, C. Borel, A. Letourneau, P. Makrythanasis, M. Guipponi, C. Gehrig, S. E. Antonarakis and E. T. Dermitzakis (2013). "Passive and active DNA methylation and the interplay with genetic variation in gene regulation." Elife 2: e00523.
 

Han, Y., D. J. Hazelett, F. Wiklund, F. R. Schumacher, D. O. Stram, S. I. Berndt, Z. Wang, K. A. Rand, R. N. Hoover, M. J. Machiela, M. Yeager, L. Burdette, C. C. Chung, A. Hutchinson, K. Yu, J. Xu, R. C. Travis, T. J. Key, A. Siddiq, F. Canzian, A. Takahashi, M. Kubo, J. L. Stanford, S. Kolb, S. M. Gapstur, W. R. Diver, V. L. Stevens, S. S. Strom, C. A. Pettaway, A. A. Al Olama, Z. Kote-Jarai, R. A. Eeles, E. D. Yeboah, Y. Tettey, R. B. Biritwum, A. A. Adjei, E. Tay, A. Truelove, S. Niwa, A. P. Chokkalingam, W. B. Isaacs, C. Chen, S. Lindstrom, L. Le Marchand, E. L. Giovannucci, M. Pomerantz, H. Long, F. Li, J. Ma, M. Stampfer, E. M. John, S. A. Ingles, R. A. Kittles, A. B. Murphy, W. J. Blot, L. B. Signorello, W. Zheng, D. Albanes, J. Virtamo, S. Weinstein, B. Nemesure, J. Carpten, M. C. Leske, S. Y. Wu, A. J. Hennis, B. A. Rybicki, C. Neslund-Dudas, A. W. Hsing, L. Chu, P. J. Goodman, E. A. Klein, S. L. Zheng, J. S. Witte, G. Casey, E. Riboli, Q. Li, M. L. Freedman, D. J. Hunter, H. Gronberg, M. B. Cook, H. Nakagawa, P. Kraft, S. J. Chanock, D. F. Easton, B. E. Henderson, G. A. Coetzee, D. V. Conti and C. A. Haiman (2015). "Integration of multiethnic fine-mapping and genomic annotation to prioritize candidate functional SNPs at prostate cancer susceptibility regions." Hum Mol Genet.
 
Han, Y., O. J. Slivano, C. K. Christie, A. W. Cheng and J. M. Miano (2015). "CRISPR-Cas9 genome editing of a single regulatory element nearly abolishes target gene expression in mice--brief report." Arterioscler Thromb Vasc Biol 35(2): 312-315.
 
Hardison, R. C. (2012). "Genome-wide Epigenetic Data Facilitate Understanding of Disease Susceptibility Association Studies." J Biol Chem 287(37): 30932-30940.
 
Hazelett, D. J., S. K. Rhie, M. Gaddis, C. Yan, D. L. Lakeland, S. G. Coetzee, B. E. Henderson, H. Noushmehr, W. Cozen, Z. Kote-Jarai, R. A. Eeles, D. F. Easton, C. A. Haiman, W. Lu, P. J. Farnham and G. A. Coetzee (2014). "Comprehensive functional annotation of 77 prostate cancer risk loci." PLoS Genet 10(1): e1004102.
 
Heinz, S., C. E. Romanoski, C. Benner, K. A. Allison, M. U. Kaikkonen, L. D. Orozco and C. K. Glass (2013). "Effect of natural genetic variation on enhancer selection and function." Nature 503(7477): 487-492.
 
Heyn, H., S. Sayols, C. Moutinho, E. Vidal, J. V. Sanchez-Mut, O. A. Stefansson, E. Nadal, S. Moran, J. E. Eyfjord, E. Gonzalez-Suarez, M. A. Pujana and M. Esteller (2014). "Linkage of DNA methylation quantitative trait loci to human cancer risk." Cell Rep 7(2): 331-338.
 
Hilton, I. B., A. M. D'Ippolito, C. M. Vockley, P. I. Thakore, G. E. Crawford, T. E. Reddy and C. A. Gersbach (2015). "Epigenome editing by a CRISPR-Cas9-based acetyltransferase activates genes from promoters and enhancers." Nat Biotechnol 33(5): 510-517.
 
Hindorff, L. A., P. Sethupathy, H. A. Junkins, E. M. Ramos, J. P. Mehta, F. S. Collins and T. A. Manolio (2009). "Potential etiologic and functional implications of genome-wide association loci for human diseases and traits." Proc Natl Acad Sci U S A 106(23): 9362-9367.
 
Hitchins, M. P., R. W. Rapkins, C. T. Kwok, S. Srivastava, J. J. Wong, L. M. Khachigian, P. Polly, J. Goldblatt and R. L. Ward (2011). "Dominantly inherited constitutional epigenetic silencing of MLH1 in a cancer-affected family is linked to a single nucleotide variant within the 5'UTR." Cancer Cell 20(2): 200-213.
 
Holwerda, S. J. and W. de Laat (2013). "CTCF: the protein, the binding partners, the binding sites and their chromatin loops." Philos Trans R Soc Lond B Biol Sci 368(1620): 20120369.
 
Home, P., B. Saha, S. Ray, D. Dutta, S. Gunewardena, B. Yoo, A. Pal, J. L. Vivian, M. Larson, M. Petroff, P. G. Gallagher, V. P. Schulz, K. L. White, T. G. Golos, B. Behr and S. Paul (2012). "Altered subcellular localization of transcription factor TEAD4 regulates first mammalian cell lineage commitment." Proc Natl Acad Sci U S A 109(19): 7362-7367.
 

Houlston, R. S., J. Cheadle, S. E. Dobbins, A. Tenesa, A. M. Jones, K. Howarth, S. L. Spain, P. Broderick, E. Domingo, S. Farrington, J. G. Prendergast, A. M. Pittman, E. Theodoratou, C. G. Smith, B. Olver, A. Walther, R. A. Barnetson, M. Churchman, E. E. Jaeger, S. Penegar, E. Barclay, L. Martin, M. Gorman, R. Mager, E. Johnstone, R. Midgley, I. Niittymaki, S. Tuupanen, J. Colley, S. Idziaszczyk, H. J. Thomas, A. M. Lucassen, D. G. Evans, E. R. Maher, T. Maughan, A. Dimas, E. Dermitzakis, J. B. Cazier, L. A. Aaltonen, P. Pharoah, D. J. Kerr, L. G. Carvajal-Carmona, H. Campbell, M. G. Dunlop and I. P. Tomlinson (2010). "Meta-analysis of three genome-wide association studies identifies susceptibility loci for colorectal cancer at 1q41, 3q26.2, 12q13.13 and 20q13.33." Nat Genet 42(11): 973-977.
 
Houlston, R. S., E. Webb, P. Broderick, A. M. Pittman, M. C. Di Bernardo, S. Lubbe, I. Chandler, J. Vijayakrishnan, K. Sullivan, S. Penegar, L. Carvajal-Carmona, K. Howarth, E. Jaeger, S. L. Spain, A. Walther, E. Barclay, L. Martin, M. Gorman, E. Domingo, A. S. Teixeira, D. Kerr, J. B. Cazier, I. Niittymaki, S. Tuupanen, A. Karhu, L. A. Aaltonen, I. P. Tomlinson, S. M. Farrington, A. Tenesa, J. G. Prendergast, R. A. Barnetson, R. Cetnarskyj, M. E. Porteous, P. D. Pharoah, T. Koessler, J. Hampe, S. Buch, C. Schafmayer, J. Tepel, S. Schreiber, H. Volzke, J. Chang-Claude, M. Hoffmeister, H. Brenner, B. W. Zanke, A. Montpetit, T. J. Hudson, S. Gallinger, H. Campbell and M. G. Dunlop (2008). "Meta-analysis of genome-wide association data identifies four new susceptibility loci for colorectal cancer." Nat Genet 40(12): 1426-1435.
 
Howie, B. N., P. Donnelly and J. Marchini (2009). "A flexible and accurate genotype imputation method for the next generation of genome-wide association studies." PLoS Genet 5(6): e1000529.
 
Hsu, P. D., E. S. Lander and F. Zhang (2014). "Development and applications of CRISPR-Cas9 for genome engineering." Cell 157(6): 1262-1278.
 
Hsu, P. D., D. A. Scott, J. A. Weinstein, F. A. Ran, S. Konermann, V. Agarwala, Y. Li, E. J. Fine, X. Wu, O. Shalem, T. J. Cradick, L. A. Marraffini, G. Bao and F. Zhang (2013). "DNA targeting specificity of RNA-guided Cas9 nucleases." Nat Biotechnol 31(9): 827-832.
 
Huang, G. L., H. Q. Guo, F. Yang, O. F. Liu, B. B. Li, X. Y. Liu, Y. Lu and Z. W. He (2012). "Activating transcription factor 1 is a prognostic marker of colorectal cancer." Asian Pac J Cancer Prev 13(3): 1053-1057.
 
Hughes, J. R., N. Roberts, S. McGowan, D. Hay, E. Giannoulatou, M. Lynch, M. De Gobbi, S. Taylor, R. Gibbons and D. R. Higgs (2014). "Analysis of hundreds of cis-regulatory landscapes at high resolution in a single, high-throughput experiment." Nat Genet 46(2): 205-212.
 
Huppi, K., J. J. Pitt, B. M. Wahlberg and N. J. Caplen (2012). "The 8q24 gene desert: an oasis of non-coding transcriptional activity." Front Genet 3: 69.
 
Jager, R., G. Migliorini, M. Henrion, R. Kandaswamy, H. E. Speedy, A. Heindl, N. Whiffin, M. J. Carnicer, L. Broome, N. Dryden, T. Nagano, S. Schoenfelder, M. Enge, Y. Yuan, J. Taipale, P. Fraser, O. Fletcher and R. S. Houlston (2015). "Capture Hi-C identifies the chromatin interactome of colorectal cancer risk loci." Nat Commun 6: 6178.
 
Jeon, B. N., M. K. Kim, W. I. Choi, D. I. Koh, S. Y. Hong, K. S. Kim, M. Kim, C. O. Yun, J. Yoon, K. Y. Choi, K. R. Lee, K. P. Nephew and M. W. Hur (2012). "KR-POK interacts with p53 and represses its ability to activate transcription of p21WAF1/CDKN1A." Cancer Res 72(5): 1137-1148.
 

Jin, F., Y. Li, J. R. Dixon, S. Selvaraj, Z. Ye, A. Y. Lee, C. A. Yen, A. D. Schmitt, C. A. Espinoza and B. Ren (2013). "A high-resolution map of the three-dimensional chromatin interactome in human cells." Nature 503(7475): 290-294.
 
Jones, P. A. (2012). "Functions of DNA methylation: islands, start sites, gene bodies and beyond." Nat Rev Genet 13(7): 484-492.
 
Kabadi, A. M. and C. A. Gersbach (2014). "Engineering synthetic TALE and CRISPR/Cas9 transcription factors for regulating gene expression." Methods 69(2): 188-197.
 
Karagiannis, G. S., A. Berk, A. Dimitromanolakis and E. P. Diamandis (2013). "Enrichment map profiling of the cancer invasion front suggests regulation of colorectal cancer progression by the bone morphogenetic protein antagonist, gremlin-1." Mol Oncol 7(4): 826-839.
 
Kearns, N. A., H. Pham, B. Tabak, R. M. Genga, N. J. Silverstein, M. Garber and R. Maehr (2015). "Functional annotation of native enhancers with a Cas9-histone demethylase fusion." Nat Methods 12(5): 401-403.
 
Kessler, J. D., K. T. Kahle, T. Sun, K. L. Meerbrey, M. R. Schlabach, E. M. Schmitt, S. O. Skinner, Q. Xu, M. Z. Li, Z. C. Hartman, M. Rao, P. Yu, R. Dominguez-Vidana, A. C. Liang, N. L. Solimini, R. J. Bernardi, B. Yu, T. Hsu, I. Golding, J. Luo, C. K. Osborne, C. J. Creighton, S. G. Hilsenbeck, R. Schiff, C. A. Shaw, S. J. Elledge and T. F. Westbrook (2012). "A SUMOylation-dependent transcriptional subprogram is required for Myc-driven tumorigenesis." Science 335(6066): 348-353.
 
Kichaev, G. and B. Pasaniuc (2015). "Leveraging Functional-Annotation Data in Trans-ethnic Fine-Mapping Studies." Am J Hum Genet 97(2): 260-271.
 
Kilpinen, H., S. M. Waszak, A. R. Gschwind, S. K. Raghav, R. M. Witwicki, A. Orioli, E. Migliavacca, M. Wiederkehr, M. Gutierrez-Arcelus, N. I. Panousis, A. Yurovsky, T. Lappalainen, L. Romano-Palumbo, A. Planchon, D. Bielser, J. Bryois, I. Padioleau, G. Udin, S. Thurnheer, D. Hacker, L. J. Core, J. T. Lis, N. Hernandez, A. Reymond, B. Deplancke and E. T. Dermitzakis (2013). "Coordinated effects of sequence variation on DNA binding, chromatin structure, and transcription." Science 342(6159): 744-747.
 
Kim, H. and J. S. Kim (2014). "A guide to genome engineering with programmable nucleases." Nat Rev Genet 15(5): 321-334.
 
Knosel, T., Y. Chen, S. Hotovy, U. Settmacher, A. Altendorf-Hofmann and I. Petersen (2012). "Loss of desmocollin 1-3 and homeobox genes PITX1 and CDX2 are associated with tumor progression and survival in colorectal carcinoma." Int J Colorectal Dis 27(11): 1391-1399.
 
Koudritsky, M. and E. Domany (2008). "Positional distribution of human transcription factor binding sites." Nucleic Acids Res 36(21): 6795-6805.
 
Kraft, K., S. Geuer, A. J. Will, W. L. Chan, C. Paliou, M. Borschiwer, I. Harabula, L. Wittler, M. Franke, D. M. Ibrahim, B. K. Kragesteen, M. Spielmann, S. Mundlos, D. G. Lupianez and G. Andrey (2015). "Deletions, Inversions, Duplications: Engineering of Structural Variants using CRISPR/Cas in Mice." Cell Rep.
 
Krzywinski, M., J. Schein, I. Birol, J. Connors, R. Gascoyne, D. Horsman, S. J. Jones and M. A. Marra (2009). "Circos: an information aesthetic for comparative genomics." Genome Res 19(9): 1639-1645.
 
Kwasnieski, J. C., C. Fiore, H. G. Chaudhari and B. A. Cohen (2014). "High-throughput functional testing of ENCODE segmentation predictions." Genome Res 24(10): 1595-1602.
 

Lee, M. N., C. Ye, A. C. Villani, T. Raj, W. Li, T. M. Eisenhaure, S. H. Imboywa, P. I. Chipendo, F. A. Ran, K. Slowikowski, L. D. Ward, K. Raddassi, C. McCabe, M. H. Lee, I. Y. Frohlich, D. A. Hafler, M. Kellis, S. Raychaudhuri, F. Zhang, B. E. Stranger, C. O. Benoist, P. L. De Jager, A. Regev and N. Hacohen (2014). "Common genetic variants modulate pathogen-sensing responses in human dendritic cells." Science 343(6175): 1246980.
 
Li, G., X. Ruan, R. K. Auerbach, K. S. Sandhu, M. Zheng, P. Wang, H. M. Poh, Y. Goh, J. Lim, J. Zhang, H. S. Sim, S. Q. Peh, F. H. Mulawadi, C. T. Ong, Y. L. Orlov, S. Hong, Z. Zhang, S. Landt, D. Raha, G. Euskirchen, C. L. Wei, W. Ge, H. Wang, C. Davis, K. I. Fisher-Aylor, A. Mortazavi, M. Gerstein, T. Gingeras, B. Wold, Y. Sun, M. J. Fullwood, E. Cheung, E. Liu, W. K. Sung, M. Snyder and Y. Ruan (2012). "Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation." Cell 148(1-2): 84-98.
 
Li, H., H. Chen, F. Liu, C. Ren, S. Wang, X. Bo and W. Shu (2015). "Functional annotation of HOT regions in the human genome: implications for human disease and cancer." Sci Rep 5: 11633.
 
Li, J., J. Shou, Y. Guo, Y. Tang, Y. Wu, Z. Jia, Y. Zhai, Z. Chen, Q. Xu and Q. Wu (2015). "Efficient inversions and duplications of mammalian regulatory DNA elements and gene clusters by CRISPR/Cas9." J Mol Cell Biol 7(4): 284-298.
 
Li, M. J., L. Y. Wang, Z. Xia, P. C. Sham and J. Wang (2013). "GWAS3D: Detecting human regulatory variants by integrative analysis of genome-wide associations, chromosome interactions and histone modifications." Nucleic Acids Res 41(Web Server issue): W150-158.
 
Li, Y., C. M. Rivera, H. Ishii, F. Jin, S. Selvaraj, A. Y. Lee, J. R. Dixon and B. Ren (2014). "CRISPR reveals a distal super-enhancer required for Sox2 expression in mouse embryonic stem cells." PLoS One 9(12): e114485.
 
Li, Y., C. Willer, S. Sanna and G. Abecasis (2009). "Genotype imputation." Annu Rev Genomics Hum Genet 10: 387-406.
 
Li, Y., C. J. Willer, J. Ding, P. Scheet and G. R. Abecasis (2010). "MaCH: using sequence and genotype data to estimate haplotypes and unobserved genotypes." Genet Epidemiol 34(8): 816-834.
 
Long, C., J. R. McAnally, J. M. Shelton, A. A. Mireault, R. Bassel-Duby and E. N. Olson (2014). "Prevention of muscular dystrophy in mice by CRISPR/Cas9-mediated editing of germline DNA." Science 345(6201): 1184-1188.
 
Ma, G., D. Gu, C. Lv, H. Chu, Z. Xu, N. Tong, M. Wang, C. Tang, Y. Xu, Z. Zhang, B. Wang and J. Chen (2014). "Genetic variant in 8q24 is associated with prognosis for gastric cancer in a Chinese population." J Gastroenterol Hepatol.
 
Mahajan, A., M. J. Go, W. Zhang, J. E. Below, K. J. Gaulton, T. Ferreira, M. Horikoshi, A. D. Johnson, M. C. Ng, I. Prokopenko, D. Saleheen, X. Wang, E. Zeggini, G. R. Abecasis, L. S. Adair, P. Almgren, M. Atalay, T. Aung, D. Baldassarre, B. Balkau, Y. Bao, A. H. Barnett, I. Barroso, A. Basit, L. F. Been, J. Beilby, G. I. Bell, R. Benediktsson, R. N. Bergman, B. O. Boehm, E. Boerwinkle, L. L. Bonnycastle, N. Burtt, Q. Cai, H. Campbell, J. Carey, S. Cauchi, M. Caulfield, J. C. Chan, L. C. Chang, T. J. Chang, Y. C. Chang, G. Charpentier, C. H. Chen, H. Chen, Y. T. Chen, K. S. Chia, M. Chidambaram, P. S. Chines, N. H. Cho, Y. M. Cho, L. M. Chuang, F. S. Collins, M. C. Cornelis, D. J. Couper, A. T. Crenshaw, R. M. van Dam, J. Danesh, D. Das, U. de Faire, G. Dedoussis, P. Deloukas, A. S. Dimas, C. Dina, A. S. Doney, P. J. Donnelly, M. Dorkhan, C. van Duijn, J. Dupuis, S. Edkins, P. Elliott, V. Emilsson, R. Erbel, J. G. Eriksson, J. Escobedo, T. Esko, E. Eury, J. C. Florez, P. Fontanillas, N. G. Forouhi, T. Forsen, C. Fox, R. M. Fraser, T. M. Frayling, P. Froguel, P. Frossard, Y. Gao, K. Gertow, C. Gieger, B. Gigante, H. Grallert, G. B. Grant, L. C. Grrop, C. J. Groves, E. Grundberg, C. Guiducci, A. Hamsten, B. G. Han, K. Hara, N. Hassanali, A. T. Hattersley, C. Hayward, A. K. Hedman, C. Herder, A. Hofman, O. L. Holmen, K. Hovingh, A. B. Hreidarsson, C. Hu, F. B. Hu, J. Hui, S. E. Humphries, S. E. Hunt, D. J. Hunter, K. Hveem, Z. I. Hydrie, H. Ikegami, T. Illig, E. Ingelsson, M. Islam, B. Isomaa, A. U. Jackson, T. Jafar, A. James, W. Jia, K. H. Jockel, A. Jonsson, J. B. Jowett, T. Kadowaki, H. M. Kang, S. Kanoni, W. H. Kao, S. Kathiresan, N. Kato, P. Katulanda, K. M. Keinanen-Kiukaanniemi, A. M. Kelly, H. Khan, K. T. Khaw, C. C. Khor, H. L. Kim, S. Kim, Y. J. Kim, L. Kinnunen, N. Klopp, A. Kong, E. Korpi-Hyovalti, S. Kowlessur, P. Kraft, J. Kravic, M. M. Kristensen, S. Krithika, A. Kumar, J. Kumate, J. Kuusisto, S. H. Kwak, M. Laakso, V. Lagou, T. A. Lakka, C. Langenberg, C. Langford, R. Lawrence, K. Leander, J. M. Lee, N. R. Lee, M. Li, X. Li, Y. Li, J. Liang, S. Liju, W. Y. Lim, L. Lind, C. M. Lindgren, E. Lindholm, C. T. Liu, J. J. Liu, S. Lobbens, J. Long, R. J. Loos, W. Lu, J. Luan, V. Lyssenko, R. C. Ma, S. Maeda, R. Magi, S. Mannisto, D. R. Matthews, J. B. Meigs, O. Melander, A. Metspalu, J. Meyer, G. Mirza, E. Mihailov, S. Moebus, V. Mohan, K. L. Mohlke, A. D. Morris, T. W. Muhleisen, M. Muller-Nurasyid, B. Musk, J. Nakamura, E. Nakashima, P. Navarro, P. K. Ng, A. C. Nica, P. M. Nilsson, I. Njolstad, M. M. Nothen, K. Ohnaka, T. H. Ong, K. R. Owen, C. N. Palmer, J. S. Pankow, K. S. Park, M. Parkin, S. Pechlivanis, N. L. Pedersen, L. Peltonen, J. R. Perry, A. Peters, J. M. Pinidiyapathirage, C. G. Platou, S. Potter, J. F. Price, L. Qi, V. Radha, L. Rallidis, A. Rasheed, W. Rathman, R. Rauramaa, S. Raychaudhuri, N. W. Rayner, S. D. Rees, E. Rehnberg, S. Ripatti, N. Robertson, M. Roden, E. J. Rossin, I. Rudan, D. Rybin, T. E. Saaristo, V. Salomaa, J. Saltevo, M. Samuel, D. K. Sanghera, J. Saramies, J. Scott, L. J. Scott, R. A. Scott, A. V. Segre, J. Sehmi, B. Sennblad, N. Shah, S. Shah, A. S. Shera, X. O. Shu, A. R. Shuldiner, G. Sigurdsson, E. Sijbrands, A. Silveira, X. Sim, S. Sivapalaratnam, K. S. Small, W. Y. So, A. Stancakova, K. Stefansson, G. Steinbach, V. Steinthorsdottir, K. Stirrups, R. J. Strawbridge, H. M. Stringham, Q. Sun, C. Suo, A. C. Syvanen, R. Takayanagi, F. Takeuchi, W. T. Tay, T. M. Teslovich, B. Thorand, G. Thorleifsson, U. Thorsteinsdottir, E. Tikkanen, J. Trakalo, E. Tremoli, M. D. Trip, F. J. Tsai, T. Tuomi, J. Tuomilehto, A. G. Uitterlinden, A. Valladares-Salgado, S. Vedantam, F. Veglia, B. F. Voight, C. Wang, N. J. Wareham, R. Wennauer, A. R. Wickremasinghe, T. Wilsgaard, J. F. Wilson, S. Wiltshire, W. Winckler, T. Y. Wong, A. R. Wood, J. Y. Wu, Y. Wu, K. Yamamoto, T. Yamauchi, M. Yang, L. Yengo, M. Yokota, R. Young, D. Zabaneh, F. Zhang, R. Zhang, W. Zheng, P. Z. Zimmet, D. Altshuler, D. W. Bowden, Y. S. Cho, N. J. Cox, M. Cruz, C. L. Hanis, J. Kooner, J. Y. Lee, M. Seielstad, Y. Y. Teo, M. Boehnke, E. J. Parra, J. C. Chambers, E. S. Tai, M. I. McCarthy and A. P. Morris (2014). "Genome-wide trans-ancestry meta-analysis provides insight into the genetic architecture of type 2 diabetes susceptibility." Nat Genet 46(3): 234-244.
 
Mali, P., L. Yang, K. M. Esvelt, J. Aach, M. Guell, J. E. DiCarlo, J. E. Norville and G. M. Church (2013). "RNA-guided human genome engineering via Cas9." Science 339(6121): 823-826.
 
Maniatis, T., J. V. Falvo, T. H. Kim, C. H. Lin, B. S. Parekh and M. G. Wathelet (1998). "Structure and function of the interferon-B enhanceosome." Cold Spring Harb Symp Quant Biol 63: 609-620.
 
Manolio, T. A. (2010). "Genomewide association studies and assessment of the risk of disease." N Engl J Med 363(2): 166-176.
 

Matano, M., S. Date, M. Shimokawa, A. Takano, M. Fujii, Y. Ohta, T. Watanabe, T. Kanai and T. Sato (2015). "Modeling colorectal cancer using CRISPR-Cas9-mediated engineering of human intestinal organoids." Nat Med 21(3): 256-262.
 
Maurano, M. T., R. Humbert, E. Rynes, R. E. Thurman, E. Haugen, H. Wang, A. P. Reynolds, R. Sandstrom, H. Qu, J. Brody, A. Shafer, F. Neri, K. Lee, T. Kutyavin, S. Stehling-Sun, A. K. Johnson, T. K. Canfield, E. Giste, M. Diegel, D. Bates, R. S. Hansen, S. Neph, P. J. Sabo, S. Heimfeld, A. Raubitschek, S. Ziegler, C. Cotsapas, N. Sotoodehnia, I. Glass, S. R. Sunyaev, R. Kaul and J. A. Stamatoyannopoulos (2012). "Systematic localization of common disease-associated variation in regulatory DNA." Science 337(6099): 1190-1195.
 
McDaniell, R., B. K. Lee, L. Song, Z. Liu, A. P. Boyle, M. R. Erdos, L. J. Scott, M. A. Morken, K. S. Kucera, A. Battenhouse, D. Keefe, F. S. Collins, H. F. Willard, J. D. Lieb, T. S. Furey, G. E. Crawford, V. R. Iyer and E. Birney (2010). "Heritable individual-specific and allele-specific chromatin signatures in humans." Science 328(5975): 235-239.
 
McKeown, M. R. and J. E. Bradner (2014). "Therapeutic strategies to inhibit MYC." Cold Spring Harb Perspect Med 4(10).
 
McVicker, G., B. van de Geijn, J. F. Degner, C. E. Cain, N. E. Banovich, A. Raj, N. Lewellen, M. Myrthil, Y. Gilad and J. K. Pritchard (2013). "Identification of genetic variants that affect histone modifications in human cells." Science 342(6159): 747-749.
 
Meier, I. D., C. Bernreuther, T. Tilling, J. Neidhardt, Y. W. Wong, C. Schulze, T. Streichert and M. Schachner (2010). "Short DNA sequences inserted for gene targeting can accidentally interfere with off-target gene expression." FASEB J 24(6): 1714-1724.
 
Melnikov, A., A. Murugan, X. Zhang, T. Tesileanu, L. Wang, P. Rogov, S. Feizi, A. Gnirke, C. G. Callan, Jr., J. B. Kinney, M. Kellis, E. S. Lander and T. S. Mikkelsen (2012). "Systematic dissection and optimization of inducible enhancers in human cells using a massively parallel reporter assay." Nat Biotechnol 30(3): 271-277.
 
Meyer, M. B., N. A. Benkusky and J. W. Pike (2015). "Selective Distal Enhancer Control of the Mmp13 Gene Identified through Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR) Genomic Deletions." J Biol Chem 290(17): 11093-11107.
 
Michailidou, K., P. Hall, A. Gonzalez-Neira, M. Ghoussaini, J. Dennis, R. L. Milne, M. K. Schmidt, J. Chang-Claude, S. E. Bojesen, M. K. Bolla, Q. Wang, E. Dicks, A. Lee, C. Turnbull, N. Rahman, O. Fletcher, J. Peto, L. Gibson, I. Dos Santos Silva, H. Nevanlinna, T. A. Muranen, K. Aittomaki, C. Blomqvist, K. Czene, A. Irwanto, J. Liu, Q. Waisfisz, H. Meijers-Heijboer, M. Adank, R. B. van der Luijt, R. Hein, N. Dahmen, L. Beckman, A. Meindl, R. K. Schmutzler, B. Muller-Myhsok, P. Lichtner, J. L. Hopper, M. C. Southey, E. Makalic, D. F. Schmidt, A. G. Uitterlinden, A. Hofman, D. J. Hunter, S. J. Chanock, D. Vincent, F. Bacot, D. C. Tessier, S. Canisius, L. F. Wessels, C. A. Haiman, M. Shah, R. Luben, J. Brown, C. Luccarini, N. Schoof, K. Humphreys, J. Li, B. G. Nordestgaard, S. F. Nielsen, H. Flyger, F. J. Couch, X. Wang, C. Vachon, K. N. Stevens, D. Lambrechts, M. Moisse, R. Paridaens, M. R. Christiaens, A. Rudolph, S. Nickels, D. Flesch-Janys, N. Johnson, Z. Aitken, K. Aaltonen, T. Heikkinen, A. Broeks, L. J. Veer, C. E. van der Schoot, P. Guenel, T. Truong, P. Laurent-Puig, F. Menegaux, F. Marme, A. Schneeweiss, C. Sohn, B. Burwinkel, M. P. Zamora, J. I. Perez, G. Pita, M. R. Alonso, A. Cox, I. W. Brock, S. S. Cross, M. W. Reed, E. J. Sawyer, I. Tomlinson, M. J. Kerin, N. Miller, B. E. Henderson, F. Schumacher, L. Le Marchand, I. L. Andrulis, J. A. Knight, G. Glendon, A. M. Mulligan, A. Lindblom, S. Margolin, M. J. Hooning, A. Hollestelle, A. M. van den Ouweland, A. Jager, Q. M. Bui, J. Stone, G. S. Dite, C. Apicella, H. Tsimiklis, G. G. Giles, G. Severi, L. Baglietto, P. A. Fasching, L. Haeberle, A. B. Ekici, M. W. Beckmann, H. Brenner, H. Muller, V. Arndt, C. Stegmaier, A. Swerdlow, A. Ashworth, N. Orr, M. Jones,
 J.
 Figueroa,
 J.
 Lissowska,
 L.
 Brinton,
 
M.
 S.
 Goldberg,
 F.
 Labreche,
 M.
 Dumont,
 R.
 Winqvist,
 K.
 Pylkas,
 A.
 Jukkola-­‐‑Vuorinen,
 M.
 Grip,
 
H.
 Brauch,
 U.
 Hamann,
 T.
 Bruning,
 P.
 Radice,
 P.
 Peterlongo,
 S.
 Manoukian,
 B.
 Bonanni,
 P.
 
Devilee,
 R.
 A.
 Tollenaar,
 C.
 Seynaeve,
 C.
 J.
 van
 Asperen,
 A.
 Jakubowska,
 J.
 Lubinski,
 K.
 
Jaworska,
 K.
 Durda,
 A.
 Mannermaa,
 V.
 Kataja,
 V.
 M.
 Kosma,
 J.
 M.
 Hartikainen,
 N.
 V.
 
Bogdanova,
 N.
 N.
 Antonenkova,
 T.
 Dork,
 V.
 N.
 Kristensen,
 H.
 Anton-­‐‑Culver,
 S.
 Slager,
 A.
 E.
 
Toland,
 S.
 Edge,
 F.
 Fostira,
 D.
 Kang,
 K.
 Y.
 Yoo,
 D.
 Y.
 Noh,
 K.
 Matsuo,
 H.
 Ito,
 H.
 Iwata,
 A.
 Sueta,
 A.
 
H.
 Wu,
 C.
 C.
 Tseng,
 D.
 Van
 Den
 Berg,
 D.
 O.
 Stram,
 X.
 O.
 Shu,
 W.
 Lu,
 Y.
 T.
 Gao,
 H.
 Cai,
 S.
 H.
 Teo,
 C.
 
H.
 Yip,
 S.
 Y.
 Phuah,
 B.
 K.
 Cornes,
 M.
 Hartman,
 H.
 Miao,
 W.
 Y.
 Lim,
 J.
 H.
 Sng,
 K.
 Muir,
 A.
 
Lophatananon,
 S.
 Stewart-­‐‑Brown,
 P.
 Siriwanarangsan,
 C.
 Y.
 Shen,
 C.
 N.
 Hsiung,
 P.
 E.
 Wu,
 S.
 L.
 
Ding,
 S.
 Sangrajrang,
 V.
 Gaborieau,
 P.
 Brennan,
 J.
 McKay,
 W.
 J.
 Blot,
 L.
 B.
 Signorello,
 Q.
 Cai,
 W.
 
Zheng,
 S.
 Deming-­‐‑Halverson,
 M.
 Shrubsole,
 J.
 Long,
 J.
 Simard,
 M.
 Garcia-­‐‑Closas,
 P.
 D.
 Pharoah,
 
G.
 Chenevix-­‐‑Trench,
 A.
 M.
 Dunning,
 J.
 Benitez
 and
 D.
 F.
 Easton
 (2013).
 "Large-­‐‑scale
 
genotyping
 identifies
 41
 new
 loci
 associated
 with
 breast
 cancer
 risk."
 Nat
 Genet
 45(4):
 353-­‐‑
361,
 361e351-­‐‑352.
 
Musunuru,
 K.,
 A.
 Strong,
 M.
 Frank-­‐‑Kamenetsky,
 N.
 E.
 Lee,
 T.
 Ahfeldt,
 K.
 V.
 Sachs,
 X.
 Li,
 H.
 Li,
 N.
 
Kuperwasser,
 V.
 M.
 Ruda,
 J.
 P.
 Pirruccello,
 B.
 Muchmore,
 L.
 Prokunina-­‐‑Olsson,
 J.
 L.
 Hall,
 E.
 E.
 
Schadt,
 C.
 R.
 Morales,
 S.
 Lund-­‐‑Katz,
 M.
 C.
 Phillips,
 J.
 Wong,
 W.
 Cantley,
 T.
 Racie,
 K.
 G.
 Ejebe,
 M.
 
Orho-­‐‑Melander,
 O.
 Melander,
 V.
 Koteliansky,
 K.
 Fitzgerald,
 R.
 M.
 Krauss,
 C.
 A.
 Cowan,
 S.
 
Kathiresan
 and
 D.
 J.
 Rader
 (2010).
 "From
 noncoding
 variant
 to
 phenotype
 via
 SORT1
 at
 the
 
1p13
 cholesterol
 locus."
 Nature
 466(7307):
 714-­‐‑719.
 
Narendra,
 V.,
 P.
 P.
 Rocha,
 D.
 An,
 R.
 Raviram,
 J.
 A.
 Skok,
 E.
 O.
 Mazzoni
 and
 D.
 Reinberg
 (2015).
 
"Transcription.
 CTCF
 establishes
 discrete
 functional
 chromatin
 domains
 at
 the
 Hox
 clusters
 
during
 differentiation."
 Science
 347(6225):
 1017-­‐‑1021.
 
Nichols,
 M.
 H.
 and
 V.
 G.
 Corces
 (2015).
 "A
 CTCF
 Code
 for
 3D
 Genome
 Architecture."
 Cell
 
162(4):
 703-­‐‑705.
 
Nicolae,
 D.
 L.,
 E.
 Gamazon,
 W.
 Zhang,
 S.
 Duan,
 M.
 E.
 Dolan
 and
 N.
 J.
 Cox
 (2010).
 "Trait-­‐‑
associated
 SNPs
 are
 more
 likely
 to
 be
 eQTLs:
 annotation
 to
 enhance
 discovery
 from
 GWAS."
 
PLoS
 Genet
 6(4):
 e1000888.
 
O'Geen,
 H.,
 L.
 Echipare
 and
 P.
 J.
 Farnham
 (2011).
 "Using
 ChIP-­‐‑Seq
 Technology
 to
 Generate
 
High-­‐‑Resolution
 Profiles
 of
 Histone
 Modifications."
 Methods
 Mol
 Biol
 791:
 265-­‐‑286.
 
Onengut-­‐‑Gumuscu,
 S.,
 W.
 M.
 Chen,
 O.
 Burren,
 N.
 J.
 Cooper,
 A.
 R.
 Quinlan,
 J.
 C.
 Mychaleckyj,
 E.
 
Farber,
 J.
 K.
 Bonnie,
 M.
 Szpak,
 E.
 Schofield,
 P.
 Achuthan,
 H.
 Guo,
 M.
 D.
 Fortune,
 H.
 Stevens,
 N.
 
M.
 Walker,
 L.
 D.
 Ward,
 A.
 Kundaje,
 M.
 Kellis,
 M.
 J.
 Daly,
 J.
 C.
 Barrett,
 J.
 D.
 Cooper,
 P.
 Deloukas,
 
J.
 A.
 Todd,
 C.
 Wallace,
 P.
 Concannon
 and
 S.
 S.
 Rich
 (2015).
 "Fine
 mapping
 of
 type
 1
 diabetes
 
susceptibility
 loci
 and
 evidence
 for
 colocalization
 of
 causal
 variants
 with
 lymphoid
 gene
 
enhancers."
 Nat
 Genet
 47(4):
 381-­‐‑386.
 
Ong,
 C.
 T.
 and
 V.
 G.
 Corces
 (2011).
 "Enhancer
 function:
 new
 insights
 into
 the
 regulation
 of
 
tissue-­‐‑specific
 gene
 expression."
 Nat
 Rev
 Genet
 12(4):
 283-­‐‑293.
 
Ong,
 C.
 T.
 and
 V.
 G.
 Corces
 (2014).
 "CTCF:
 an
 architectural
 protein
 bridging
 genome
 topology
 
and
 function."
 Nat
 Rev
 Genet
 15(4):
 234-­‐‑246.
 
Ong,
 R.
 T.,
 X.
 Wang,
 X.
 Liu
 and
 Y.
 Y.
 Teo
 (2012).
 "Efficiency
 of
 trans-­‐‑ethnic
 genome-­‐‑wide
 
meta-­‐‑analysis
 and
 fine-­‐‑mapping."
 Eur
 J
 Hum
 Genet
 20(12):
 1300-­‐‑1307.
 

  163
 
Ongen,
 H.,
 C.
 L.
 Andersen,
 J.
 B.
 Bramsen,
 B.
 Oster,
 M.
 H.
 Rasmussen,
 P.
 G.
 Ferreira,
 J.
 Sandoval,
 
E.
 Vidal,
 N.
 Whiffin,
 A.
 Planchon,
 I.
 Padioleau,
 D.
 Bielser,
 L.
 Romano,
 I.
 Tomlinson,
 R.
 S.
 
Houlston,
 M.
 Esteller,
 T.
 F.
 Orntoft
 and
 E.
 T.
 Dermitzakis
 (2014).
 "Putative
 cis-­‐‑regulatory
 
drivers
 in
 colorectal
 cancer."
 Nature
 512(7512):
 87-­‐‑90.
 
Pai,
 A.
 A.,
 J.
 K.
 Pritchard
 and
 Y.
 Gilad
 (2015).
 "The
 genetic
 and
 mechanistic
 basis
 for
 variation
 
in
 gene
 regulation."
 PLoS
 Genet
 11(1):
 e1004857.
 
Palmiter,
 R.
 D.
 and
 R.
 L.
 Brinster
 (1986).
 "Germ-­‐‑line
 transformation
 of
 mice."
 Annu
 Rev
 
Genet
 20:
 465-­‐‑499.
 
Panne,
 D.
 (2008).
 "The
 enhanceosome."
 Curr
 Opin
 Structural
 Biol
 18:
 236-­‐‑242.
 
Paredes,
 J.,
 J.
 Figueiredo,
 A.
 Albergaria,
 P.
 Oliveira,
 J.
 Carvalho,
 A.
 S.
 Ribeiro,
 J.
 Caldeira,
 A.
 M.
 
Costa,
 J.
 Simoes-­‐‑Correia,
 M.
 J.
 Oliveira,
 H.
 Pinheiro,
 S.
 S.
 Pinho,
 R.
 Mateus,
 C.
 A.
 Reis,
 M.
 Leite,
 
M.
 S.
 Fernandes,
 F.
 Schmitt,
 F.
 Carneiro,
 C.
 Figueiredo,
 C.
 Oliveira
 and
 R.
 Seruca
 (2012).
 
"Epithelial
 E-­‐‑
 and
 P-­‐‑cadherins:
 role
 and
 clinical
 significance
 in
 cancer."
 Biochim
 Biophys
 
Acta
 1826(2):
 297-­‐‑311.
 
Petit,
 F.,
 A.
 S.
 Jourdain,
 M.
 Holder-­‐‑Espinasse,
 B.
 Keren,
 J.
 Andrieux,
 M.
 Duterque-­‐‑Coquillaud,
 
N.
 Porchet,
 S.
 Manouvrier-­‐‑Hanu
 and
 F.
 Escande
 (2015).
 "The
 disruption
 of
 a
 novel
 limb
 cis-­‐‑
regulatory
 element
 of
 SHH
 is
 associated
 with
 autosomal
 dominant
 preaxial
 polydactyly-­‐‑
hypertrichosis."
 Eur
 J
 Hum
 Genet
 March
 18,
 Epub
 ahead
 of
 print.
 
Pomerantz,
 M.
 M.,
 N.
 Ahmadiyeh,
 L.
 Jia,
 P.
 Herman,
 M.
 P.
 Verzi,
 H.
 Doddapaneni,
 C.
 A.
 
Beckwith,
 J.
 A.
 Chan,
 A.
 Hills,
 M.
 Davis,
 K.
 Yao,
 S.
 M.
 Kehoe,
 H.
 J.
 Lenz,
 C.
 A.
 Haiman,
 C.
 Yan,
 B.
 
E.
 Henderson,
 B.
 Frenkel,
 J.
 Barretina,
 A.
 Bass,
 J.
 Tabernero,
 J.
 Baselga,
 M.
 M.
 Regan,
 J.
 R.
 
Manak,
 R.
 Shivdasani,
 G.
 A.
 Coetzee
 and
 M.
 L.
 Freedman
 (2009).
 "The
 8q24
 cancer
 risk
 
variant
 rs6983267
 shows
 long-­‐‑range
 interaction
 with
 MYC
 in
 colorectal
 cancer."
 Nat
 Genet
 
41(8):
 882-­‐‑884.
 
Prelich,
 G.
 (2012).
 "Gene
 overexpression:
 uses,
 mechanisms,
 and
 interpretation."
 Genetics
 
190(3):
 841-­‐‑854.
 
Ramasamy,
 A.,
 D.
 Trabzuni,
 S.
 Guelfi,
 V.
 Varghese,
 C.
 Smith,
 R.
 Walker,
 T.
 De,
 L.
 Coin,
 R.
 de
 
Silva,
 M.
 R.
 Cookson,
 A.
 B.
 Singleton,
 J.
 Hardy,
 M.
 Ryten
 and
 M.
 E.
 Weale
 (2014).
 "Genetic
 
variability
 in
 the
 regulation
 of
 gene
 expression
 in
 ten
 regions
 of
 the
 human
 brain."
 Nat
 
Neurosci
 17(10):
 1418-­‐‑1428.
 
Raviram,
 R.,
 P.
 P.
 Rocha,
 R.
 Bonneau
 and
 J.
 A.
 Skok
 (2014).
 "Interpreting
 4C-­‐‑Seq
 data:
 how
 
far
 can
 we
 go?"
 Epigenomics
 6(5):
 455-­‐‑457.
 
Rivera,
 C.
 M.
 and
 B.
 Ren
 (2013).
 "Mapping
 human
 epigenomes."
 Cell
 155(1):
 39-­‐‑55.
 
RoadmapEpigenomicsConsortium
 (2015).
 "Integrative
 analysis
 of
 111
 reference
 human
 
epigenomes."
 Nature
 19:
 317-­‐‑330.
 
Sagai,
 T.,
 M.
 Hosoya,
 Y.
 Mizushina,
 M.
 Tamura
 and
 T.
 Shiroishi
 (2005).
 "Elimination
 of
 a
 long-­‐‑
range
 cis-­‐‑regulatory
 module
 causes
 complete
 loss
 of
 limb-­‐‑specific
 Shh
 expression
 and
 
truncation
 of
 the
 mouse
 limb."
 Development
 132(4):
 797-­‐‑803.
 
Sander,
 J.
 D.
 and
 J.
 K.
 Joung
 (2014).
 "CRISPR-­‐‑Cas
 systems
 for
 editing,
 regulating
 and
 
targeting
 genomes."
 Nat
 Biotechnol
 32(4):
 347-­‐‑355.
 
Sanyal,
 A.,
 B.
 R.
 Lajoie,
 G.
 Jain
 and
 J.
 Dekker
 (2012).
 "The
 long-­‐‑range
 interaction
 landscape
 of
 
gene
 promoters."
 Nature
 489(7414):
 109-­‐‑113.
 
Schaub,
 M.
 A.,
 A.
 P.
 Boyle,
 A.
 Kundaje,
 S.
 Batzoglou
 and
 M.
 Snyder
 (2012).
 "Linking
 disease
 
associations
 with
 regulatory
 information
 in
 the
 human
 genome."
 Genome
 Res
 22(9):
 1748-­‐‑
1759.
 

  164
 
Schmidt,
 E.
 M.,
 J.
 Zhang,
 W.
 Zhou,
 J.
 Chen,
 K.
 L.
 Mohlke,
 Y.
 E.
 Chen
 and
 C.
 J.
 Willer
 (2015).
 
"GREGOR:
 evaluating
 global
 enrichment
 of
 trait-­‐‑associated
 variants
 in
 epigenomic
 features
 
using
 a
 systematic,
 data-­‐‑driven
 approach."
 Bioinformatics
 31(16):
 2601-­‐‑2606.
 
Schork,
 A.
 J.,
 W.
 K.
 Thompson,
 P.
 Pham,
 A.
 Torkamani,
 J.
 C.
 Roddey,
 P.
 F.
 Sullivan,
 J.
 R.
 Kelsoe,
 
M.
 C.
 O'Donovan,
 H.
 Furberg,
 N.
 J.
 Schork,
 O.
 A.
 Andreassen
 and
 A.
 M.
 Dale
 (2013).
 "All
 SNPs
 
are
 not
 created
 equal:
 genome-­‐‑wide
 association
 studies
 reveal
 a
 consistent
 pattern
 of
 
enrichment
 among
 functionally
 annotated
 SNPs."
 PLoS
 Genet
 9(4):
 e1003449.
 
Schwank,
 G.,
 B.
 K.
 Koo,
 V.
 Sasselli,
 J.
 F.
 Dekkers,
 I.
 Heo,
 T.
 Demircan,
 N.
 Sasaki,
 S.
 Boymans,
 E.
 
Cuppen,
 C.
 K.
 van
 der
 Ent,
 E.
 E.
 Nieuwenhuis,
 J.
 M.
 Beekman
 and
 H.
 Clevers
 (2013).
 
"Functional
 repair
 of
 CFTR
 by
 CRISPR/Cas9
 in
 intestinal
 stem
 cell
 organoids
 of
 cystic
 
fibrosis
 patients."
 Cell
 Stem
 Cell
 13(6):
 653-­‐‑658.
 
Segditsas,
 S.
 and
 I.
 Tomlinson
 (2006).
 "Colorectal
 cancer
 and
 genetic
 alterations
 in
 the
 Wnt
 
pathway."
 Oncogene
 25(57):
 7531-­‐‑7537.
 
Sexton,
 T.
 and
 G.
 Cavalli
 (2015).
 "The
 role
 of
 chromosome
 domains
 in
 shaping
 the
 functional
 
genome."
 Cell
 160(6):
 1049-­‐‑1059.
 
Sigoillot,
 F.
 D.,
 S.
 Lyman,
 J.
 F.
 Huckins,
 B.
 Adamson,
 E.
 Chung,
 B.
 Quattrochi
 and
 R.
 W.
 King
 
(2012).
 "A
 bioinformatics
 method
 identifies
 prominent
 off-­‐‑targeted
 transcripts
 in
 RNAi
 
screens."
 Nat
 Methods
 9(4):
 363-­‐‑366.
 
Simonis,
 M.,
 J.
 Kooren
 and
 W.
 de
 Laat
 (2007).
 "An
 evaluation
 of
 3C-­‐‑based
 methods
 to
 capture
 
DNA
 interactions."
 Nat
 Methods
 4(11):
 895-­‐‑901.
 
Smemo,
 S.,
 J.
 J.
 Tena,
 K.
 H.
 Kim,
 E.
 R.
 Gamazon,
 N.
 J.
 Sakabe,
 C.
 Gomez-­‐‑Marin,
 I.
 Aneas,
 F.
 L.
 
Credidio,
 D.
 R.
 Sobreira,
 N.
 F.
 Wasserman,
 J.
 H.
 Lee,
 V.
 Puviindran,
 D.
 Tam,
 M.
 Shen,
 J.
 E.
 Son,
 
N.
 A.
 Vakili,
 H.
 K.
 Sung,
 S.
 Naranjo,
 R.
 D.
 Acemel,
 M.
 Manzanares,
 A.
 Nagy,
 N.
 J.
 Cox,
 C.
 C.
 Hui,
 J.
 
L.
 Gomez-­‐‑Skarmeta
 and
 M.
 A.
 Nobrega
 (2014).
 "Obesity-­‐‑associated
 variants
 within
 FTO
 form
 
long-­‐‑range
 functional
 connections
 with
 IRX3."
 Nature
 507(7492):
 371-­‐‑375.
 
Spain,
 S.
 L.
 and
 J.
 C.
 Barrett
 (2015).
 "Strategies
 for
 fine-­‐‑mapping
 complex
 traits."
 Hum
 Mol
 
Genet
 24(R1):
 R111-­‐‑119.
 
Spitz,
 F.
 and
 E.
 E.
 Furlong
 (2012).
 "Transcription
 factors:
 from
 enhancer
 binding
 to
 
developmental
 control."
 Nat
 Rev
 Genet
 13(9):
 613-­‐‑626.
 
Splinter,
 E.,
 H.
 Heath,
 J.
 Kooren,
 R.-­‐‑J.
 Palstra,
 P.
 Klous,
 F.
 Grosveld,
 N.
 Galjart
 and
 W.
 de
 Laat
 
(2006).
 "CTCF
 mediates
 long-­‐‑range
 chromatin
 looping
 and
 local
 histone
 modification
 in
 the
 
beta-­‐‑globin
 locus."
 Genes
 Dev
 20(17):
 2349-­‐‑2354.
 
Stergachis,
 A.
 B.,
 E.
 Haugen,
 A.
 Shafer,
 W.
 Fu,
 B.
 Vernot,
 A.
 Reynolds,
 A.
 Raubitschek,
 S.
 
Ziegler,
 E.
 M.
 LeProust,
 J.
 M.
 Akey
 and
 J.
 A.
 Stamatoyannopoulos
 (2013).
 "Exonic
 
transcription
 factor
 binding
 directs
 codon
 choice
 and
 affects
 protein
 evolution."
 Science
 
342:
 1367-­‐‑1372.
 
Sur,
 I.
 K.,
 O.
 Hallikas,
 A.
 Vaharautio,
 J.
 Yan,
 M.
 Turunen,
 M.
 Enge,
 M.
 Taipale,
 A.
 Karhu,
 L.
 A.
 
Aaltonen
 and
 J.
 Taipale
 (2012).
 "Mice
 lacking
 a
 Myc
 enhancer
 that
 includes
 human
 SNP
 
rs6983267
 are
 resistant
 to
 intestinal
 tumors."
 Science
 338(6112):
 1360-­‐‑1363.
 
TheCancerGenomeAtlas
 (2012).
 "Comprehensive
 molecular
 characterization
 of
 human
 
colon
 and
 rectal
 cancer."
 Nature
 487(7407):
 330-­‐‑337.
 
Trapnell,
 C.,
 L.
 Pachter
 and
 S.
 L.
 Salzberg
 (2009).
 "TopHat:
 discovering
 splice
 junctions
 with
 
RNA-­‐‑Seq."
 Bioinformatics
 25(9):
 1105-­‐‑1111.
 
Trapnell,
 C.,
 B.
 A.
 Williams,
 G.
 Pertea,
 A.
 Mortazavi,
 G.
 Kwan,
 M.
 J.
 van
 Baren,
 S.
 L.
 Salzberg,
 B.
 
J.
 Wold
 and
 L.
 Pachter
 (2010).
 "Transcript
 assembly
 and
 quantification
 by
 RNA-­‐‑Seq
 reveals
 

  165
 
unannotated
 transcripts
 and
 isoform
 switching
 during
 cell
 differentiation."
 Nat
 Biotechnol
 
28(5):
 511-­‐‑515.
 
Trynka,
 G.,
 K.
 A.
 Hunt,
 N.
 A.
 Bockett,
 J.
 Romanos,
 V.
 Mistry,
 A.
 Szperl,
 S.
 F.
 Bakker,
 M.
 T.
 
Bardella,
 L.
 Bhaw-­‐‑Rosun,
 G.
 Castillejo,
 E.
 G.
 de
 la
 Concha,
 R.
 C.
 de
 Almeida,
 K.
 R.
 Dias,
 C.
 C.
 van
 
Diemen,
 P.
 C.
 Dubois,
 R.
 H.
 Duerr,
 S.
 Edkins,
 L.
 Franke,
 K.
 Fransen,
 J.
 Gutierrez,
 G.
 A.
 Heap,
 B.
 
Hrdlickova,
 S.
 Hunt,
 L.
 Plaza
 Izurieta,
 V.
 Izzo,
 L.
 A.
 Joosten,
 C.
 Langford,
 M.
 C.
 Mazzilli,
 C.
 A.
 
Mein,
 V.
 Midah,
 M.
 Mitrovic,
 B.
 Mora,
 M.
 Morelli,
 S.
 Nutland,
 C.
 Nunez,
 S.
 Onengut-­‐‑Gumuscu,
 
K.
 Pearce,
 M.
 Platteel,
 I.
 Polanco,
 S.
 Potter,
 C.
 Ribes-­‐‑Koninckx,
 I.
 Ricano-­‐‑Ponce,
 S.
 S.
 Rich,
 A.
 
Rybak,
 J.
 L.
 Santiago,
 S.
 Senapati,
 A.
 Sood,
 H.
 Szajewska,
 R.
 Troncone,
 J.
 Varade,
 C.
 Wallace,
 V.
 
M.
 Wolters,
 A.
 Zhernakova,
 B.
 K.
 Thelma,
 B.
 Cukrowska,
 E.
 Urcelay,
 J.
 R.
 Bilbao,
 M.
 L.
 Mearin,
 
D.
 Barisani,
 J.
 C.
 Barrett,
 V.
 Plagnol,
 P.
 Deloukas,
 C.
 Wijmenga
 and
 D.
 A.
 van
 Heel
 (2011).
 
"Dense
 genotyping
 identifies
 and
 localizes
 multiple
 common
 and
 rare
 variant
 association
 
signals
 in
 celiac
 disease."
 Nat
 Genet
 43(12):
 1193-­‐‑1201.
 
Tseng,
 Y.
 Y.,
 B.
 S.
 Moriarity,
 W.
 Gong,
 R.
 Akiyama,
 A.
 Tiwari,
 H.
 Kawakami,
 P.
 Ronning,
 B.
 
Reuland,
 K.
 Guenther,
 T.
 C.
 Beadnell,
 J.
 Essig,
 G.
 M.
 Otto,
 M.
 G.
 O'Sullivan,
 D.
 A.
 Largaespada,
 K.
 
L.
 Schwertfeger,
 Y.
 Marahrens,
 Y.
 Kawakami
 and
 A.
 Bagchi
 (2014).
 "PVT1
 dependence
 in
 
cancer
 with
 MYC
 copy-­‐‑number
 increase."
 Nature
 512(7512):
 82-­‐‑86.
 
van
 de
 Werken,
 H.
 J.,
 P.
 J.
 de
 Vree,
 E.
 Splinter,
 S.
 J.
 Holwerda,
 P.
 Klous,
 E.
 de
 Wit
 and
 W.
 de
 
Laat
 (2012).
 "4C
 technology:
 protocols
 and
 data
 analysis."
 Methods
 Enzymol
 513:
 89-­‐‑112.
 
van
 de
 Werken,
 H.
 J.,
 G.
 Landan,
 S.
 J.
 Holwerda,
 M.
 Hoichman,
 P.
 Klous,
 R.
 Chachik,
 E.
 Splinter,
 
C.
 Valdes-­‐‑Quezada,
 Y.
 Oz,
 B.
 A.
 Bouwman,
 M.
 J.
 Verstegen,
 E.
 de
 Wit,
 A.
 Tanay
 and
 W.
 de
 Laat
 
(2012).
 "Robust
 4C-­‐‑seq
 data
 analysis
 to
 screen
 for
 regulatory
 DNA
 interactions."
 Nat
 
Methods
 9(10):
 969-­‐‑972.
 
Vierstra,
 J.,
 A.
 Reik,
 K.
 H.
 Chang,
 S.
 Stehling-­‐‑Sun,
 Y.
 Zhou,
 S.
 J.
 Hinkley,
 D.
 E.
 Paschon,
 L.
 Zhang,
 
N.
 Psatha,
 Y.
 R.
 Bendana,
 C.
 M.
 O'Neil,
 A.
 H.
 Song,
 A.
 K.
 Mich,
 P.
 Q.
 Liu,
 G.
 Lee,
 D.
 E.
 Bauer,
 M.
 C.
 
Holmes,
 S.
 H.
 Orkin,
 T.
 Papayannopoulou,
 G.
 Stamatoyannopoulos,
 E.
 J.
 Rebar,
 P.
 D.
 Gregory,
 
F.
 D.
 Urnov
 and
 J.
 A.
 Stamatoyannopoulos
 (2015).
 "Functional
 footprinting
 of
 regulatory
 
DNA."
 Nat
 Methods.
 
Vietri
 Rudan,
 M.,
 C.
 Barrington,
 S.
 Henderson,
 C.
 Ernst,
 D.
 T.
 Odom,
 A.
 Tanay
 and
 S.
 Hadjur
 
(2015).
 "Comparative
 Hi-­‐‑C
 Reveals
 that
 CTCF
 Underlies
 Evolution
 of
 Chromosomal
 Domain
 
Architecture."
 Cell
 Rep
 10(8):
 1297-­‐‑1309.
 
Voight,
 B.
 F.,
 H.
 M.
 Kang,
 J.
 Ding,
 C.
 D.
 Palmer,
 C.
 Sidore,
 P.
 S.
 Chines,
 N.
 P.
 Burtt,
 C.
 
Fuchsberger,
 Y.
 Li,
 J.
 Erdmann,
 T.
 M.
 Frayling,
 I.
 M.
 Heid,
 A.
 U.
 Jackson,
 T.
 Johnson,
 T.
 O.
 
Kilpelainen,
 C.
 M.
 Lindgren,
 A.
 P.
 Morris,
 I.
 Prokopenko,
 J.
 C.
 Randall,
 R.
 Saxena,
 N.
 Soranzo,
 E.
 
K.
 Speliotes,
 T.
 M.
 Teslovich,
 E.
 Wheeler,
 J.
 Maguire,
 M.
 Parkin,
 S.
 Potter,
 N.
 W.
 Rayner,
 N.
 
Robertson,
 K.
 Stirrups,
 W.
 Winckler,
 S.
 Sanna,
 A.
 Mulas,
 R.
 Nagaraja,
 F.
 Cucca,
 I.
 Barroso,
 P.
 
Deloukas,
 R.
 J.
 Loos,
 S.
 Kathiresan,
 P.
 B.
 Munroe,
 C.
 Newton-­‐‑Cheh,
 A.
 Pfeufer,
 N.
 J.
 Samani,
 H.
 
Schunkert,
 J.
 N.
 Hirschhorn,
 D.
 Altshuler,
 M.
 I.
 McCarthy,
 G.
 R.
 Abecasis
 and
 M.
 Boehnke
 
(2012).
 "The
 metabochip,
 a
 custom
 genotyping
 array
 for
 genetic
 studies
 of
 metabolic,
 
cardiovascular,
 and
 anthropometric
 traits."
 PLoS
 Genet
 8(8):
 e1002793.
 
Wang,
 H.,
 H.
 Yang,
 C.
 S.
 Shivalila,
 M.
 M.
 Dawlaty,
 A.
 W.
 Cheng,
 F.
 Zhang
 and
 R.
 Jaenisch
 (2013).
 
"One-­‐‑step
 generation
 of
 mice
 carrying
 mutations
 in
 multiple
 genes
 by
 CRISPR/Cas-­‐‑mediated
 
genome
 engineering."
 Cell
 153(4):
 910-­‐‑918.
 
Wang,
 J.,
 J.
 Zhuang,
 S.
 Iyer,
 X.
 Y.
 Lin,
 M.
 C.
 Greven,
 B.
 H.
 Kim,
 J.
 Moore,
 B.
 G.
 Pierce,
 X.
 Dong,
 D.
 
Virgil,
 E.
 Birney,
 J.
 H.
 Hung
 and
 Z.
 Weng
 (2013).
 "Factorbook.org:
 a
 Wiki-­‐‑based
 database
 for
 

  166
 
transcription
 factor-­‐‑binding
 data
 generated
 by
 the
 ENCODE
 consortium."
 Nucleic
 Acids
 Res
 
41(Database
 issue):
 D171-­‐‑176.
 
Ward,
 L.
 D.
 and
 M.
 Kellis
 (2012).
 "HaploReg:
 a
 resource
 for
 exploring
 chromatin
 states,
 
conservation,
 and
 regulatory
 motif
 alterations
 within
 sets
 of
 genetically
 linked
 variants."
 
Nucleic
 Acids
 Res
 40(Database
 issue):
 D930-­‐‑934.
 
Webster,
 D.
 E.,
 B.
 Barajas,
 R.
 T.
 Bussat,
 K.
 J.
 Yan,
 P.
 H.
 Neela,
 R.
 J.
 Flockhart,
 J.
 Kovalski,
 A.
 
Zehnder
 and
 P.
 A.
 Khavari
 (2014).
 "Enhancer-­‐‑targeted
 genome
 editing
 selectively
 blocks
 
innate
 resistance
 to
 oncokinase
 inhibition."
 Genome
 Res.
 
Welter,
 D.,
 J.
 MacArthur,
 J.
 Morales,
 T.
 Burdett,
 P.
 Hall,
 H.
 Junkins,
 A.
 Klemm,
 P.
 Flicek,
 T.
 
Manolio,
 L.
 Hindorff
 and
 H.
 Parkinson
 (2014).
 "The
 NHGRI
 GWAS
 Catalog,
 a
 curated
 
resource
 of
 SNP-­‐‑trait
 associations."
 Nucleic
 Acids
 Res
 42(Database
 issue):
 D1001-­‐‑1006.
 
Westra,
 H.
 J.,
 M.
 J.
 Peters,
 T.
 Esko,
 H.
 Yaghootkar,
 C.
 Schurmann,
 J.
 Kettunen,
 M.
 W.
 
Christiansen,
 B.
 P.
 Fairfax,
 K.
 Schramm,
 J.
 E.
 Powell,
 A.
 Zhernakova,
 D.
 V.
 Zhernakova,
 J.
 H.
 
Veldink,
 L.
 H.
 Van
 den
 Berg,
 J.
 Karjalainen,
 S.
 Withoff,
 A.
 G.
 Uitterlinden,
 A.
 Hofman,
 F.
 
Rivadeneira,
 P.
 A.
 t
 Hoen,
 E.
 Reinmaa,
 K.
 Fischer,
 M.
 Nelis,
 L.
 Milani,
 D.
 Melzer,
 L.
 Ferrucci,
 A.
 
B.
 Singleton,
 D.
 G.
 Hernandez,
 M.
 A.
 Nalls,
 G.
 Homuth,
 M.
 Nauck,
 D.
 Radke,
 U.
 Volker,
 M.
 
Perola,
 V.
 Salomaa,
 J.
 Brody,
 A.
 Suchy-­‐‑Dicey,
 S.
 A.
 Gharib,
 D.
 A.
 Enquobahrie,
 T.
 Lumley,
 G.
 W.
 
Montgomery,
 S.
 Makino,
 H.
 Prokisch,
 C.
 Herder,
 M.
 Roden,
 H.
 Grallert,
 T.
 Meitinger,
 K.
 
Strauch,
 Y.
 Li,
 R.
 C.
 Jansen,
 P.
 M.
 Visscher,
 J.
 C.
 Knight,
 B.
 M.
 Psaty,
 S.
 Ripatti,
 A.
 Teumer,
 T.
 M.
 
Frayling,
 A.
 Metspalu,
 J.
 B.
 van
 Meurs
 and
 L.
 Franke
 (2013).
 "Systematic
 identification
 of
 
trans
 eQTLs
 as
 putative
 drivers
 of
 known
 disease
 associations."
 Nat
 Genet
 45(10):
 1238-­‐‑
1243.
 
Williamson,
 I.,
 S.
 Berlivet,
 R.
 Eskeland,
 S.
 Boyle,
 R.
 S.
 Illingworth,
 D.
 Paquette,
 J.
 Dostie
 and
 W.
 
A.
 Bickmore
 (2014).
 "Spatial
 genome
 organization:
 contrasting
 views
 from
 chromosome
 
conformation
 capture
 and
 fluorescence
 in
 situ
 hybridization."
 Genes
 Dev
 28(24):
 2778-­‐‑
2791.
 
Wright,
 J.
 B.,
 S.
 J.
 Brown
 and
 M.
 D.
 Cole
 (2010).
 "Upregulation
 of
 c-­‐‑MYC
 in
 cis
 through
 a
 large
 
chromatin
 loop
 linked
 to
 a
 cancer
 risk-­‐‑associated
 single-­‐‑nucleotide
 polymorphism
 in
 
colorectal
 cancer
 cells."
 Mol
 Cell
 Biol
 30(6):
 1411-­‐‑1420.
 
Xiang,
 J.
 F.,
 Q.
 F.
 Yin,
 T.
 Chen,
 Y.
 Zhang,
 X.
 O.
 Zhang,
 Z.
 Wu,
 S.
 Zhang,
 H.
 B.
 Wang,
 J.
 Ge,
 X.
 Lu,
 L.
 
Yang
 and
 L.
 L.
 Chen
 (2014).
 "Human
 colorectal
 cancer-­‐‑specific
 CCAT1-­‐‑L
 lncRNA
 regulates
 
long-­‐‑range
 chromatin
 interactions
 at
 the
 MYC
 locus."
 Cell
 Res
 24(5):
 513-­‐‑531.
 
Xue,
 W.,
 S.
 Chen,
 H.
 Yin,
 T.
 Tammela,
 T.
 Papagiannakopoulos,
 N.
 S.
 Joshi,
 W.
 Cai,
 G.
 Yang,
 R.
 
Bronson,
 D.
 G.
 Crowley,
 F.
 Zhang,
 D.
 G.
 Anderson,
 P.
 A.
 Sharp
 and
 T.
 Jacks
 (2014).
 "CRISPR-­‐‑
mediated
 direct
 mutation
 of
 cancer
 genes
 in
 the
 mouse
 liver."
 Nature
 514(7522):
 380-­‐‑384.
 
Yang,
 H.,
 H.
 Wang,
 C.
 S.
 Shivalila,
 A.
 W.
 Cheng,
 L.
 Shi
 and
 R.
 Jaenisch
 (2013).
 "One-­‐‑step
 
generation
 of
 mice
 carrying
 reporter
 and
 conditional
 alleles
 by
 CRISPR/Cas-­‐‑mediated
 
genome
 engineering."
 Cell
 154(6):
 1370-­‐‑1379.
 
Yao,
 L.,
 B.
 P.
 Berman
 and
 P.
 J.
 Farnham
 (2015).
 "Demystifying
 the
 secret
 mission
 of
 
enhancers:
 linking
 distal
 regulatory
 elements
 to
 target
 genes."
 Crit
 Rev
 Biochem
 Mol
 Biol
 In
 
Press.
 
Yao,
 L.,
 H.
 Shen,
 P.
 W.
 Laird,
 P.
 J.
 Farnham
 and
 B.
 P.
 Berman
 (2015).
 "Inferring
 regulatory
 
element
 landscapes
 and
 transcription
 factor
 networks
 from
 cancer
 methylomes."
 Genome
 
Biol
 16:
 105.
 

  167
 
Yao,
 L.,
 Y.
 G.
 Tak,
 B.
 P.
 Berman
 and
 P.
 J.
 Farnham
 (2014).
 "Functional
 annotation
 of
 colon
 
cancer
 risk
 SNPs."
 Nat
 Commun
 5:
 5114.
 
Ye,
 H.,
 A.
 Zhou,
 Q.
 Hong,
 X.
 Chen,
 Y.
 Xin,
 L.
 Tang,
 D.
 Dai,
 H.
 Ji,
 M.
 Xu,
 D.
 W.
 Wang
 and
 S.
 Duan
 
(2015).
 "Association
 of
 seven
 thrombotic
 pathway
 gene
 CpG-­‐‑SNPs
 with
 coronary
 heart
 
disease."
 Biomed
 Pharmacother
 72:
 98-­‐‑102.
 
Yin,
 H.,
 W.
 Xue,
 S.
 Chen,
 R.
 L.
 Bogorad,
 E.
 Benedetti,
 M.
 Grompe,
 V.
 Koteliansky,
 P.
 A.
 Sharp,
 T.
 
Jacks
 and
 D.
 G.
 Anderson
 (2014).
 "Genome
 editing
 with
 Cas9
 in
 adult
 mice
 corrects
 a
 disease
 
mutation
 and
 phenotype."
 Nat
 Biotechnol
 32(6):
 551-­‐‑553.
 
Zalatan,
 J.
 G.,
 M.
 E.
 Lee,
 R.
 Almeida,
 L.
 A.
 Gilbert,
 E.
 H.
 Whitehead,
 M.
 La
 Russa,
 J.
 C.
 Tsai,
 J.
 S.
 
Weissman,
 J.
 E.
 Dueber,
 L.
 S.
 Qi
 and
 W.
 A.
 Lim
 (2015).
 "Engineering
 complex
 synthetic
 
transcriptional
 programs
 with
 CRISPR
 RNA
 scaffolds."
 Cell
 160(1-­‐‑2):
 339-­‐‑350.
 
Zanke,
 B.
 W.,
 C.
 M.
 Greenwood,
 J.
 Rangrej,
 R.
 Kustra,
 A.
 Tenesa,
 S.
 M.
 Farrington,
 J.
 
Prendergast,
 S.
 Olschwang,
 T.
 Chiang,
 E.
 Crowdy,
 V.
 Ferretti,
 P.
 Laflamme,
 S.
 Sundararajan,
 S.
 
Roumy,
 J.
 F.
 Olivier,
 F.
 Robidoux,
 R.
 Sladek,
 A.
 Montpetit,
 P.
 Campbell,
 S.
 Bezieau,
 A.
 M.
 
O'Shea,
 G.
 Zogopoulos,
 M.
 Cotterchio,
 P.
 Newcomb,
 J.
 McLaughlin,
 B.
 Younghusband,
 R.
 
Green,
 J.
 Green,
 M.
 E.
 Porteous,
 H.
 Campbell,
 H.
 Blanche,
 M.
 Sahbatou,
 E.
 Tubacher,
 C.
 Bonaiti-­‐‑
Pellie,
 B.
 Buecher,
 E.
 Riboli,
 S.
 Kury,
 S.
 J.
 Chanock,
 J.
 Potter,
 G.
 Thomas,
 S.
 Gallinger,
 T.
 J.
 
Hudson
 and
 M.
 G.
 Dunlop
 (2007).
 "Genome-­‐‑wide
 association
 scan
 identifies
 a
 colorectal
 
cancer
 susceptibility
 locus
 on
 chromosome
 8q24."
 Nat
 Genet
 39(8):
 989-­‐‑994.
 
Zentner,
 G.
 E.
 and
 P.
 C.
 Scacheri
 (2012).
 "The
 chromatin
 fingerprint
 of
 gene
 enhancer
 
elements."
 J
 Biol
 Chem
 287(37):
 30888-­‐‑30896.
 
Zhang,
 X.,
 R.
 Cowper-­‐‑Sal
 lari,
 S.
 D.
 Bailey,
 J.
 H.
 Moore
 and
 M.
 Lupien
 (2012).
 "Integrative
 
functional
 genomics
 identifies
 an
 enhancer
 looping
 to
 the
 SOX9
 gene
 disrupted
 by
 the
 
17q24.3
 prostate
 cancer
 risk
 locus."
 Genome
 Res
 22(8):
 1437-­‐‑1446.
 
Zhang,
 X.,
 A.
 D.
 Johnson,
 A.
 E.
 Hendricks,
 S.
 J.
 Hwang,
 K.
 Tanriverdi,
 S.
 K.
 Ganesh,
 N.
 L.
 Smith,
 
P.
 A.
 Peyser,
 J.
 E.
 Freedman
 and
 C.
 J.
 O'Donnell
 (2014).
 "Genetic
 associations
 with
 expression
 
for
 genes
 implicated
 in
 GWAS
 studies
 for
 atherosclerotic
 cardiovascular
 disease
 and
 blood
 
phenotypes."
 Hum
 Mol
 Genet
 23(3):
 782-­‐‑795.
 
Zhang,
 Y.,
 Y.
 H.
 Lin,
 T.
 D.
 Johnson,
 L.
 S.
 Rozek
 and
 M.
 A.
 Sartor
 (2014).
 "PePr:
 a
 peak-­‐‑calling
 
prioritization
 pipeline
 to
 identify
 consistent
 or
 differential
 peaks
 from
 replicated
 ChIP-­‐‑Seq
 
data."
 Bioinformatics
 30(18):
 2568-­‐‑2575.
 
Zhong,
 H.,
 J.
 Beaulaurier,
 P.
 Y.
 Lum,
 C.
 Molony,
 X.
 Yang,
 D.
 J.
 Macneil,
 D.
 T.
 Weingarth,
 B.
 
Zhang,
 D.
 Greenawalt,
 R.
 Dobrin,
 K.
 Hao,
 S.
 Woo,
 C.
 Fabre-­‐‑Suver,
 S.
 Qian,
 M.
 R.
 Tota,
 M.
 P.
 
Keller,
 C.
 M.
 Kendziorski,
 B.
 S.
 Yandell,
 V.
 Castro,
 A.
 D.
 Attie,
 L.
 M.
 Kaplan
 and
 E.
 E.
 Schadt
 
(2010).
 "Liver
 and
 adipose
 expression
 associated
 SNPs
 are
 enriched
 for
 association
 to
 type
 
2
 diabetes."
 PLoS
 Genet
 6(5):
 e1000932.
 
Zhou,
 H.
 Y.,
 Y.
 Katsman,
 N.
 K.
 Dhaliwal,
 S.
 Davidson,
 N.
 N.
 Macpherson,
 M.
 Sakthidevi,
 F.
 
Collura
 and
 J.
 A.
 Mitchell
 (2014).
 "A
 Sox2
 distal
 enhancer
 cluster
 regulates
 embryonic
 stem
 
cell
 differentiation
 potential."
 Genes
 Dev
 28(24):
 2699-­‐‑2711. 
Asset Metadata
Creator Tak, Yu Gyoung (Esther) (author) 
Core Title Functional characterization of colon cancer risk-associated enhancers: connecting risk loci to risk genes 
Contributor Electronically uploaded by the author (provenance) 
School Keck School of Medicine 
Degree Doctor of Philosophy 
Degree Program Genetic, Molecular and Cellular Biology 
Publication Date 02/17/2016 
Defense Date 10/09/2015 
Publisher University of Southern California (original), University of Southern California. Libraries (digital) 
Tag colon cancer,GWAS,OAI-PMH Harvest,risk genes,risk-associated enhancers 
Format application/pdf (imt) 
Language English
Advisor Laird-Offringa, Ite A. (committee chair), Coetzee, Gerry A. (committee member), Farnham, Peggy J. (committee member) 
Creator Email esther.ygt@gmail.com,ytak@usc.edu 
Permanent Link (DOI) https://doi.org/10.25549/usctheses-c40-209388 
Unique identifier UC11279169 
Identifier etd-TakYuGyoun-4110.pdf (filename),usctheses-c40-209388 (legacy record id) 
Legacy Identifier etd-TakYuGyoun-4110.pdf 
Dmrecord 209388 
Document Type Dissertation 
Rights Tak, Yu Gyoung (Esther) 
Type texts
Source University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection) 
Access Conditions The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law.  Electronic access is being provided by the USC Libraries in agreement with the a... 
Repository Name University of Southern California Digital Library
Repository Location USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Abstract Genome-wide association studies (GWASs) have identified SNPs that are statistically associated with diseases. Interestingly, most GWAS-identified SNPs, and the SNPs in high linkage disequilibrium (LD) with them, are found in non-coding regions of the genome that are decorated by regulatory elements (enhancers, promoters, and nuclear structure-associated elements). This led to my general hypothesis that disease-associated SNPs affect the expression of disease-associated genes through the function of the regulatory elements in which the SNPs are located. Among regulatory elements, disease-associated SNPs are enriched in enhancers, which play an important role in regulating cell-type-specific gene expression. However, enhancers do not necessarily regulate nearby genes. Using colorectal cancer (CRC) GWAS data and epigenomic information for the active enhancer mark (H3K27Ac), I identified enhancers harboring CRC risk-associated SNPs. Employing a genome-editing tool (CRISPR/Cas9), I deleted several CRC risk-associated enhancers along with control regions and performed RNA-seq assays to identify putative CRC risk-associated genes. The putative target genes of each enhancer were assessed with a chromosomal looping assay (4C), which confirmed direct physical interactions between enhancers and promoters. Deletion of one of the CRC risk-associated enhancers (E7) led to decreased expression of the MYC oncogene and reduced proliferation of HCT116 cells. Interestingly, deletion of the E7 region in HEK293 cells caused downregulation of MYC and reduced cell proliferation similar to that seen in HCT116 cells, even though the H3K27Ac mark is not present in HEK293 cells. In conclusion, by identifying genes regulated by CRC risk-associated enhancers harboring SNPs, I have developed a general approach to connect risk loci to putative risk genes.