Understanding Prostate Cancer
Genetic Susceptibility and
Chromatin Regulation
Yu (Phoebe) Guo
Mentor: Peggy J. Farnham, Ph.D.
Doctor of Philosophy (CANCER BIOLOGY AND GENOMICS)
University of Southern California
May 2019
For Jieli Shen,
who is always there for me
For Dad,
who has always encouraged me to pursue what I want and has always been my role model
For Auntie Ye Guo, Grandma and families,
with whom I always felt at home
For Peggy,
a true great mentor that’s hard to find and impossible to forget
Table of Contents
TABLE OF CONTENTS 3
LIST OF FIGURES 5
LIST OF TABLES 7
ABSTRACT 8
CHAPTER 1. THESIS SYNOPSIS 9
PROSTATE CANCER 9
GENOME WIDE ASSOCIATION STUDY (GWAS) AND POLYGENIC RISK SCORE 9
PROSTATE CANCER GWAS AND THE GWAS CHALLENGE 9
REGULATION AND THE 3D GENOME 10
CHAPTER 2. MATERIALS AND METHODS 11
CELL CULTURE 11
CHIP-SEQ AND ANALYSIS 11
HI-C EXPERIMENTS AND ANALYSES 15
SNP ANNOTATION 15
CRISPR/CAS9 -MEDIATED GENOMIC DELETIONS 16
RNA-SEQ AND QPCR ANALYSIS 18
CHAPTER 3. COMPREHENSIVE ANNOTATION OF PROSTATE CANCER GWAS RISK LOCI 21
FINE-MAPPED RISK SNPS ARE MOSTLY IN NON-CODING REGIONS OF GENOME 21
MAPPING OF CANDIDATE CIS REGULATORY ELEMENTS IN PROSTATE CELLS 23
USING 3D CHROMATIN INTERACTION DATASETS TO PRIORITIZE CRES THAT FORM LONG-RANGE LOOPING 26
IDENTIFICATION OF RISK CRES THAT ARE RELATED TO PCA SUSCEPTIBILITY 27
CHAPTER 4. FUNCTIONAL VALIDATION OF RISK CRES 37
CRISPR MEDIATED CTCF BINDING SITE DELETION CAN IDENTIFY TARGET GENES 37
TWO RISK CTCF BINDING SITES THAT FORM LOOPS WERE SELECTED FOR FUNCTIONAL FOLLOW UPS 38
FUNCTIONAL VALIDATION OF RISK CBS IN 1Q21.3 LOCUS IDENTIFIED TARGET GENE KCNN3 39
FUNCTIONAL VALIDATION OF RISK CBS IN 12Q13.13 LOCUS 43
PCA RISK-ASSOCIATED CTCF LOOPS MAY SEQUESTER GENES FROM ENHANCERS LOCATED OUTSIDE THE LOOPS 48
KCNN3 AND KRT GENES AND PROSTATE CANCER 51
CHAPTER 5. SUMMARY AND DISCUSSION 53
THE ROLE OF CTCF IN REGULATING THE CANCER TRANSCRIPTOME 53
THE ENHANCER ADOPTION HYPOTHESIS. 54
IDENTIFICATION OF TRANSCRIPTION FACTORS LINKED TO INCREASED RISK OF PROSTATE CANCER. 55
REFERENCES 56
APPENDICES 1. TABLE OF ANNOTATED FINE-MAPPED SNPS 64
APPENDICES 2. COLLECTION OF PUBLICATIONS 74
List of figures
Figure 2-1 High confidence ChIP-seq peaks. .............................................................................. 12
Figure 3-1 Pie chart of SNP genomic location .............................................................................. 21
Figure 3-2 Identification and classification of H3K27Ac (a) and CTCF (b) sites in prostate cells. . 25
Figure 3-3 Experimental and analytical steps used to identify PCa risk-associated regulatory
elements involved in chromatin loops. ................................................................................ 27
Figure 3-4 PCa risk SNPs associated with H3K27Ac sites and chromatin loops. .......................... 28
Figure 3-5 PCa risk SNPs associated with CTCF sites and chromatin loops .................................. 32
Figure 3-6 Genomic circos plot (Krzywinski et al., 2009) of 22Rv1 risk CREs, risk intra-
chromosome loops and fine-mapped PCa GWAS SNPs ....................................................... 35
Figure 3-7 Gene Ontology (GO) biological processes that are enriched in the set of PCa
susceptibility related genes .................................................................................................. 36
Figure 4-1 Experimental workflow for functional investigation of PCa risk-associated CTCF sites.
.............................................................................................................................................. 38
Figure 4-2 Circos plots of risk loci on chromosome 1 (left) and chromosome 12 (right). ............ 39
Figure 4-3 KCNN3 is upregulated upon targeted deletion of the region encompassing the CTCF
site near rs12144978. ........................................................................................................... 41
Figure 4-4 Analysis of the rs12144978-associated chromatin loops. ........................................... 42
Figure 4-5 Deletion of PCa risk-associated CTCF site 1 in different cell lines. ............................ 43
Figure 4-6 Deletion of the region encompassing the PCa risk-associated CTCF site near
rs4919742 increases KRT gene expression. .......................................................................... 45
Figure 4-7 Analysis of the rs4919742-associated chromatin loops. ............................................. 46
Figure 4-8 Deletion of PCa risk-associated CTCF site 4 in different cell lines. ............................ 47
Figure 4-9 Genome-wide RNA-seq analysis of cells deleted for PCa risk-associated CTCF sites. . 48
Figure 4-10 PCa risk-associated CTCF loops encompass enhancer deserts. ................................ 49
Figure 4-11 PCa risk-associated CTCF loops may sequester genes from enhancers located
outside the loops. ................................................................................................................. 50
Figure 4-12 Knocking down of putative adopted enhancers decrease KCNN3 expression in site1
depletion cells. ..................................................................................................................... 51
Figure 4-13 KRT4 expression in TCGA Prostate Adenocarcinoma (PRAD) vs. matching normal
tissue and GTEx normal prostate tissues. ............................................................................ 52
List of tables
Table 2-1 ChIP-seq data sets ........................................................................................................ 13
Table 2-2 gRNAs ........................................................................................................................... 16
Table 2-3 CRISPR deletion site info .............................................................................................. 17
Table 2-4 genotype primers ......................................................................................................... 17
Table 2-5 qPCR primer (Zeisel, Yitzhaky, Bossel Ben-Moshe, & Domany, 2013) .......................... 18
Table 2-6 RNA-seq data sets ........................................................................................................ 19
Table 3-1 Exon SNPs ..................................................................................................................... 22
Table 3-2 Genes that have promoter SNPs .................................................................................. 23
Table 3-3 Prostate cells used in this study ................................................................................... 26
Table 3-4 Risk H3K27Ac sites in 22Rv1 ......................................................................................... 29
Table 3-5 Risk CTCF binding sites in 22Rv1 ................................................................................... 32
Abstract
Prostate cancer (PCa) is the leading cause of new cancer cases and the 3rd most common cause
of cancer death among men in the USA. Recent genome-wide association studies (GWAS) have
identified more than 100 loci associated with increased risk of prostate cancer, most of which
are in non-coding regions of the genome. Understanding the function of these non-coding risk
loci is critical to elucidate the genetic susceptibility to prostate cancer.
Results: I generated genome-wide regulatory element maps of normal and tumorigenic
prostate cells and, using genome-wide chromosome conformation capture data (in situ Hi-C)
from those same cells, I annotated the regulatory potential of 2,181 fine-mapped prostate
cancer risk-associated SNPs. I then predicted a set of target genes that are regulated by
prostate cancer risk-related H3K27Ac-mediated loops. I next identified prostate cancer risk-
associated CTCF sites involved in long-range chromatin loops. I used CRISPR-mediated deletion
to remove prostate cancer risk-associated CTCF anchor regions as well as the CTCF anchor
regions looped to the prostate cancer risk-associated CTCF sites. I observed up to 100-fold
increased expression of genes within the loops after deletion of the prostate cancer risk-
associated CTCF anchor regions.
Conclusions: I identified GWAS risk loci involved in long-range loops that function to repress
gene expression within chromatin loops. My studies provide new insights into the genetic
susceptibility to prostate cancer.
Chapter 1. Thesis synopsis
Prostate cancer
Prostate cancer (PCa) is the leading cause of new cancer cases and the 3rd most common cause
of cancer death among men in the USA (Siegel, Miller, & Jemal, 2017). Of note, 42% of prostate
cancer susceptibility can be accounted for by genetic factors, the highest among all cancer
types (O’Brien, 2000). Therefore, it is of critical importance to understand the underlying
genetic mechanisms that lead to PCa.
Genome wide association study (GWAS) and polygenic risk score
Investigators have used Genome-Wide Association Studies (GWAS) to investigate the genetic
components of risk for PCa. The first step in GWAS employs arrays of 1-5 million selected single
nucleotide polymorphisms (SNPs), which allows the identification of risk-associated haplotype
blocks in the human genome. Because large regions of the human genome are inherited in
blocks, each risk locus potentially contains many risk-associated SNPs. Fine-mapping studies are
next performed to more fully characterize these risk loci, identifying the SNPs that are in high
linkage disequilibrium with the GWAS-identified index SNP and that are most highly associated
with disease risk (as defined by allelic frequencies that are statistically most different between
cases and controls). Polygenic risk scores developed based on GWAS variants can be used to
assess disease risk.
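As a purely illustrative sketch (not a method used in this thesis), a polygenic risk score is commonly computed as a weighted sum of risk-allele counts. The SNP names, effect sizes, and the choice to treat a missing genotype as zero risk alleles are all hypothetical assumptions made for this example.

```python
from typing import Dict


def polygenic_risk_score(weights: Dict[str, float], dosages: Dict[str, int]) -> float:
    """Weighted sum of risk-allele counts.

    weights: per-SNP effect size (e.g. a log odds ratio); hypothetical values here.
    dosages: number of risk alleles carried at each SNP (0, 1, or 2).
    Assumption: a SNP absent from `dosages` contributes 0 to the score.
    """
    score = 0.0
    for snp, beta in weights.items():
        dosage = dosages.get(snp, 0)
        if dosage not in (0, 1, 2):
            raise ValueError(f"dosage for {snp} must be 0, 1, or 2, got {dosage}")
        score += beta * dosage
    return score


# Hypothetical SNPs and effect sizes, for illustration only.
example_weights = {"snp_a": 0.10, "snp_b": 0.25, "snp_c": 0.05}
example_dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
example_score = polygenic_risk_score(example_weights, example_dosages)
```

Scores of this kind are only meaningful relative to a reference population scored with the same weights.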
Prostate cancer GWAS and the GWAS challenge
To date, GWAS has identified around 170 prostate cancer risk loci (Al Olama et al., 2014; Berndt
et al., 2015; Eeles et al., 2009; Schumacher et al., 2018; Thomas et al., 2008). Subsequent
fine-mapping studies, employing both a multi-ethnic population and a single large European
population, identified at least 2,181 PCa risk-associated SNPs (Amin Al Olama et al., 2015; Han
et al., 2015, 2016). Although considerable progress has been made in identifying genetic
variation linked to disease, the task of defining the mechanisms by which individual SNPs
contribute to disease risk remains a great challenge. One reason for this lack of progress is
that the great majority of the risk-related SNPs lie in non-coding regions of the genome. In
fact, only ~1% of the genome encodes protein and very few disease-associated SNPs are
located in coding exons. Thus, the GWAS field has been left with the conundrum as to how a
single-nucleotide change in a non-coding region might confer increased risk for a specific
disease. These non-coding risk-associated SNPs clearly do not affect disease risk by changing
the function of a specific protein, but rather it is thought that a subset of these SNPs may
contribute to changes in levels of expression of a key protein or non-coding regulatory RNA
(Hazelett et al., n.d.; Rhie et al., 2013; Tak & Farnham, 2015; Tak et al., 2016; Yao, Tak, Berman,
& Farnham, 2014). Deciphering which risk-associated SNP is likely to be a functional SNP (i.e. a
SNP that contributes to changes in gene expression) and not simply a “hitchhiker” SNP is the
first step in a post-GWAS study (Hazelett et al., 2016; Tak & Farnham, 2015).
Regulation and the 3D Genome
I reasoned that risk-associated SNPs lying within regulatory elements are more likely to be
causal, rather than hitchhiker, SNPs. Therefore, my approach, described in detail below, was to
perform a comprehensive analysis of the regulatory potential of all prostate cancer risk-
associated SNPs identified by the fine-mapping studies, by comparing the location of each SNP
to regulatory elements (promoters, enhancers, insulators, and chromatin loop anchors) that are
active in prostate cells. Using this approach, I reduced the set of 2,181 fine-mapped PCa risk-
associated SNPs to a smaller set of ~300 candidate functional SNPs. After selecting the subset of
SNPs that are in active regulatory regions, I next assayed the effects of removal of a small
genomic region harboring a SNP-containing regulatory element on gene expression (Tak &
Farnham, 2015). Using CRISPR-mediated deletion of candidate functional PCa risk-associated
SNPs at two risk loci, I have identified long-range loops that function to repress gene
expression. These experiments and analyses, along with a discussion of potential future
experiments leading from my studies, are described in the following chapters. In addition, I have
included, as an Appendix, copies of manuscripts to which I contributed during my graduate
career at USC.
Chapter 2. Materials and methods
Cell culture
C4-2B cells were obtained from ViroMed Laboratories (Minneapolis, MN, USA). RWPE-1 (CRL-
11609), RWPE-2 (CRL-11610), 22Rv1 (CRL-2505), LNCaP (CRL-1740), and VCaP (CRL-2876) were
all obtained from American Type Culture Collection (ATCC). The human normal prostate
epithelial cells (PrEC) were obtained from Lonza (CC-2555, Lonza, Walkersville, MD, USA). Cells
were cultured according to the suggested protocols at 37 °C with 5% CO2. The medium used
to culture C4-2B (RPMI 1640), VCaP (DMEM), LNCaP (RPMI 1640), and 22Rv1 (RPMI 1640) was
supplemented with 10% fetal bovine serum (Gibco by Thermo Fisher, #10437036) plus 1%
penicillin and 1% streptomycin. For DHT experiments, 22Rv1 and LNCaP cells were grown in
phenol-red free RPMI 1640 with 10% charcoal-stripped fetal bovine serum for 48 h and then
treated with 10 nM DHT or vehicle for 4 h before harvest. RWPE-1 and RWPE-2 cells were
cultured in Keratinocyte Serum Free Medium kit (Thermo Fisher Scientific, 17005-042) without
antibiotics. PrEC cells were grown using the PrEGM Bullet Kit (Lonza, #CC-3166). All cell lines
were authenticated at the USC Norris Cancer Center cell culture facility by comparison to the
ATCC and/or published genomic criteria for that specific cell line; all cells were documented to
be free of mycoplasma. Pre-authentication was performed at Lonza (Walkersville, MD, USA) for
PrEC.
ChIP-seq and analysis
All ChIP-seq experiments were performed in duplicate according to a previously published protocol
(O’Geen, Echipare, & Farnham, 2011; O’Geen, Frietze, & Farnham, 2010; Rhie et al., 2018). For CTCF,
5 µg of antibody (Active Motif #61311) was used to precipitate 20 µg of chromatin from 22Rv1, PrEC,
RWPE-2, and VCaP (rep1) cells, and 10 µL of antibody (Cell Signaling #3418S) was used to
precipitate 20 µg of chromatin from LNCaP, C4-2B, RWPE-1, and VCaP (rep2) cells. For H3K27Ac,
8 µg of antibody (Active Motif #39133) was used to precipitate 20 µg of chromatin in all ChIP-seq
experiments. For H3K27me3, 10 µL of antibody (Cell Signaling #9733S) was used to precipitate
20 µg of 22Rv1 chromatin. All antibodies were validated according to ENCODE standards;
validation documents are available on the ENCODE portal (encodeproject.org). ChIP-seq
libraries were prepared using the Kapa Hyper Prep kit (Kapa #KK8503) according to the provided
protocol. Samples were sequenced on an Illumina HiSeq3000 machine using paired-end 100 bp
reads (except for the H3K27Ac LNCaP ChIP-seq libraries, which were sequenced using 50 bp
single-end reads). All ChIP-seq data were mapped to hg19 and peaks were called using MACS2 (Zhang et
al., 2008) after preprocessing data with the ENCODE3 ChIP-seq pipeline
(https://www.encodeproject.org/chip-seq/). High confidence (HC) peaks were defined as
peaks that were found in both duplicates for a given cell line/antibody combination, using the
intersectBed function from the bedtools suite (Quinlan & Hall, 2010).
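The replicate-overlap idea behind HC peaks can be sketched in plain Python. This is not the bedtools implementation and makes assumptions the text does not state: peaks are half-open BED-style intervals, an overlap of at least 1 bp counts, and the replicate 1 peak coordinates are the ones reported. The function name and the toy peaks are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chrom, start, end), half-open as in BED


def high_confidence_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Return replicate 1 peaks that overlap any replicate 2 peak by >= 1 bp."""
    # Group replicate 2 intervals by chromosome for quick lookup.
    by_chrom: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in rep2:
        by_chrom.setdefault(chrom, []).append((start, end))

    kept: List[Peak] = []
    for chrom, start, end in rep1:
        # Two half-open intervals overlap when each starts before the other ends.
        if any(start < e2 and s2 < end for s2, e2 in by_chrom.get(chrom, [])):
            kept.append((chrom, start, end))
    return kept


# Toy peaks, for illustration only.
rep1_peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2_peaks = [("chr1", 150, 250), ("chr2", 300, 400)]
hc = high_confidence_peaks(rep1_peaks, rep2_peaks)
```

The linear scan per peak is adequate for a sketch; a sorted sweep or interval tree would be the usual choice for genome-scale peak sets.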
Figure 2-1 High confidence ChIP-seq peaks. Shown are peak score vs. peak rank graphs for H3K27Ac or CTCF ChIP-seq datasets for normal (PrEC and RWPE-1)
and tumor (RWPE-2, LNCaP, VCaP, 22Rv1, and C4-2B) prostate cells; for LNCaP and 22Rv1 cells, ChIP-seq was
performed before and after addition of DHT.
Table 2-1 ChIP-seq data sets
| library id | target | cell | treatment | public.id | run.type | reads | macs2.peaks | HC peaks |
| USC1028 | CTCF | LNCaP | EtOH | ENCLB241MDZ | PE | 18440880 | 111866 | |
| USC1030 | CTCF | LNCaP | EtOH | ENCLB201WSI | PE | 29477797 | 139339 | |
| | CTCF | LNCaP | EtOH | | | | | 43157 |
| USC1029 | CTCF | LNCaP | DHT | ENCLB039OJJ | PE | 18951538 | 116824 | |
| USC1031 | CTCF | LNCaP | DHT | ENCLB380GDQ | PE | 31316289 | 323484 | |
| | CTCF | LNCaP | DHT | | | | | 54341 |
| USC1032 | CTCF | C4-2B | N.A. | ENCLB941BJR | PE | 13137698 | 69950 | |
| USC1033 | CTCF | C4-2B | N.A. | ENCLB281KTN | PE | 15057721 | 86218 | |
| | CTCF | C4-2B | N.A. | | | | | 56601 |
| USC1034 | CTCF | RWPE-1 | N.A. | ENCLB160JPF | PE | 21695068 | 82872 | |
| USC1035 | CTCF | RWPE-1 | N.A. | ENCLB323YFJ | PE | 16941694 | 137385 | |
| | CTCF | RWPE-1 | N.A. | | | | | 44545 |
| USC1089 | CTCF | VCaP | N.A. | ENCLB352TMT | PE | 21624456 | 84024 | |
| USC877 | CTCF | VCaP | N.A. | ENCLB697NZR | PE | 33842592 | 75619 | |
| | CTCF | VCaP | N.A. | | | | | 56596 |
| USC863 | CTCF | 22Rv1 | EtOH | ENCLB407QHY | PE | 48471492 | 70968 | |
| USC864 | CTCF | 22Rv1 | EtOH | ENCLB947RFP | PE | 51359449 | 75199 | |
| | CTCF | 22Rv1 | EtOH | | | | | 57102 |
| USC865 | CTCF | 22Rv1 | DHT | ENCLB204YIW | PE | 18439223 | 65694 | |
| USC866 | CTCF | 22Rv1 | DHT | ENCLB110WLV | PE | 54084839 | 73292 | |
| | CTCF | 22Rv1 | DHT | | | | | 51485 |
| USC875 | CTCF | RWPE-2 | N.A. | ENCLB933LMG | PE | 20453623 | 72758 | |
| USC876 | CTCF | RWPE-2 | N.A. | ENCLB666CZM | PE | 45822145 | 95163 | |
| | CTCF | RWPE-2 | N.A. | | | | | 62293 |
| USC873 | CTCF | PrEC | N.A. | ENCLB046TKT | PE | 36796888 | 80754 | |
| USC874 | CTCF | PrEC | N.A. | ENCLB116DCU | PE | 57939253 | 96716 | |
| | CTCF | PrEC | N.A. | | | | | 69945 |
| USC1054 | H3K27Ac | C4-2B | N.A. | ENCLB205HAN | PE | 20711977 | 69034 | |
| USC1055 | H3K27Ac | C4-2B | N.A. | ENCLB151CZX | PE | 35344839 | 96634 | |
| | H3K27Ac | C4-2B | N.A. | | | | | 50385 |
| USC1088 | H3K27Ac | VCaP | N.A. | ENCLB856OBQ | PE | 27033773 | 111827 | |
| USC882 | H3K27Ac | VCaP | N.A. | ENCLB904PZW | PE | 28781742 | 91846 | |
| | H3K27Ac | VCaP | N.A. | | | | | 81879 |
| USC463 | H3K27Ac | LNCaP | EtOH | GSM1249447 | SE | 37954028 | 113480 | |
| USC736.YG6 | H3K27Ac | LNCaP | EtOH | | SE | 3045875 | 105840 | |
| | H3K27Ac | LNCaP | EtOH | | | | | 75020 |
| USC464 | H3K27Ac | LNCaP | DHT | GSM1249448 | SE | 46907188 | 125476 | |
| USC737.YG7 | H3K27Ac | LNCaP | DHT | | SE | 2302834 | 80093 | |
| | H3K27Ac | LNCaP | DHT | | | | | 63792 |
| USC741 | H3K27Ac | RWPE-1 | N.A. | ENCLB154ZKP | PE | 56688191 | 103161 | |
| USC742 | H3K27Ac | RWPE-1 | N.A. | ENCLB807RBV | PE | 50371504 | 120010 | |
| | H3K27Ac | RWPE-1 | N.A. | | | | | 94688 |
| USC867 | H3K27Ac | 22Rv1 | EtOH | ENCLB605NOX | PE | 34857660 | 63808 | |
| USC868 | H3K27Ac | 22Rv1 | EtOH | ENCLB149SBT | PE | 20442766 | 59358 | |
| | H3K27Ac | 22Rv1 | EtOH | | | | | 48796 |
| USC869 | H3K27Ac | 22Rv1 | DHT | ENCLB989TCG | PE | 40266783 | 65529 | |
| USC870 | H3K27Ac | 22Rv1 | DHT | ENCLB707NFY | PE | 33188421 | 60428 | |
| | H3K27Ac | 22Rv1 | DHT | | | | | 48922 |
| USC880 | H3K27Ac | RWPE-2 | N.A. | ENCLB566UMR | PE | 18789916 | 61247 | |
| USC881 | H3K27Ac | RWPE-2 | N.A. | ENCLB446XXQ | PE | 35005114 | 73445 | |
| | H3K27Ac | RWPE-2 | N.A. | | | | | 61085 |
| USC878 | H3K27Ac | PrEC | N.A. | ENCLB588PFU | PE | 52794097 | 76323 | |
| USC879 | H3K27Ac | PrEC | N.A. | ENCLB910GSC | PE | 38818186 | 81579 | |
| | H3K27Ac | PrEC | N.A. | | | | | 67667 |
| USC1090 | INPUT | VCaP | N.A. | ENCLB968FVP | PE | 36585188 | N.A. | |
| USC887 | INPUT | VCaP | N.A. | ENCLB244AHP | PE | 27379266 | N.A. | |
| USC844 | INPUT | C4-2B | N.A. | ENCLB838PTL | PE | 17589820 | N.A. | |
| USC851 | INPUT | C4-2B | N.A. | ENCLB695NWD | PE | 22976342 | N.A. | |
| USC849 | INPUT | RWPE-1 | N.A. | ENCLB495RTI | PE | 67360775 | N.A. | |
| USC850 | INPUT | RWPE-1 | N.A. | ENCLB289ZPO | PE | 48201954 | N.A. | |
| USC852 | INPUT | LNCaP | EtOH | ENCLB827RAP | PE | 44718059 | N.A. | |
| USC853 | INPUT | LNCaP | EtOH | ENCLB793TTX | PE | 33594553 | N.A. | |
| USC871 | INPUT | 22Rv1 | EtOH | ENCLB509KOU | PE | 32043223 | N.A. | |
| USC872 | INPUT | 22Rv1 | EtOH | ENCLB527PRR | PE | 39920339 | N.A. | |
| USC885 | INPUT | RWPE-2 | N.A. | ENCLB798EFB | PE | 26891554 | N.A. | |
| USC886 | INPUT | RWPE-2 | N.A. | ENCLB563ZFD | PE | 12290319 | N.A. | |
| USC883 | INPUT | PrEC | N.A. | ENCLB881YCH | PE | 32016764 | N.A. | |
| USC884 | INPUT | PrEC | N.A. | ENCLB056QHH | PE | 32351751 | N.A. | |
| USC1335 | H3K27Ac | 22Rv1-11/12-D8 | N.A. | | SE | 52363105 | 33812 | |
| USC1337 | H3K27Ac | 22Rv1-11/12-D8 | N.A. | | SE | 32749924 | 26119 | |
| USC1336 | H3K27Ac | 22Rv1-11/12-F6 | N.A. | | SE | 16559716 | 70202 | |
| USC1338 | H3K27Ac | 22Rv1-11/12-F6 | N.A. | | SE | 20326664 | 32055 | |
| USC1341 | H3K27Ac | 22Rv1-22/23-F8 | N.A. | | SE | 7934908 | 28159 | |
| USC1339 | INPUT | 22Rv1-11/12-D8 | N.A. | | SE | 33672303 | N.A. | |
| USC1340 | INPUT | 22Rv1-11/12-F6 | N.A. | | SE | 47079049 | N.A. | |
| USC1342 | INPUT | 22Rv1-22/23-F8 | N.A. | | SE | 18922635 | N.A. | |
Hi-C experiments and analyses
In situ Hi-C experiments were performed following the original protocol of Rao et al. (S. S. Rao
et al., 2014) with minor modifications (Luo, Rhie, Lay, & Farnham, 2017); Hi-C libraries were
prepared by Dr. Fides Lay (a member of the Farnham laboratory). Hi-C datasets were processed
using HiC-Pro (Servant et al., 2015) to make normalized 10 kb resolution matrices.
Intra-chromosomal loops (50 kb-10 Mb range) were selected using Fit-Hi-C with a q-value < 0.05 (Ay,
Bailey, & Noble, 2014), as described in previous studies (Luo et al., 2017); Hi-C analyses were
performed by Dr. Suhn Rhie (a member of the Farnham laboratory). Hi-C chromatin interaction
heatmaps were visualized using HiCPlotter (Akdemir & Chin, 2015).
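The loop selection criteria above (same chromosome, 50 kb-10 Mb separation, q-value < 0.05) can be illustrated with a small filter. This is a sketch only and does not reproduce Fit-Hi-C itself: the record layout, the use of bin midpoints to measure distance, and the inclusive distance bounds are assumptions made for this example.

```python
from typing import List, NamedTuple


class Contact(NamedTuple):
    """One candidate interaction between two 10 kb bins (hypothetical layout)."""
    chrom1: str
    mid1: int      # midpoint of the first bin
    chrom2: str
    mid2: int      # midpoint of the second bin
    q_value: float


def select_loops(
    contacts: List[Contact],
    min_dist: int = 50_000,
    max_dist: int = 10_000_000,
    q_cutoff: float = 0.05,
) -> List[Contact]:
    """Keep intra-chromosomal contacts within the distance range and below the q cutoff."""
    selected = []
    for c in contacts:
        if c.chrom1 != c.chrom2:
            continue  # inter-chromosomal contacts are not considered
        distance = abs(c.mid2 - c.mid1)
        if min_dist <= distance <= max_dist and c.q_value < q_cutoff:
            selected.append(c)
    return selected


# Toy contacts, for illustration only.
toy_contacts = [
    Contact("chr1", 105_000, "chr1", 405_000, 0.01),    # kept
    Contact("chr1", 105_000, "chr1", 125_000, 0.001),   # too close
    Contact("chr1", 105_000, "chr2", 405_000, 0.001),   # different chromosomes
    Contact("chr1", 105_000, "chr1", 905_000, 0.20),    # not significant
]
loops = select_loops(toy_contacts)
```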
SNP annotation
Fine-mapped SNPs from previous studies (Amin Al Olama et al., 2015; Han et al., 2015,
2016) were curated and SNP information was extracted from dbSNP147. SNPs were annotated
(Appendices 1. Table of annotated fine-mapped SNPs) by their overlap with the genomic
coordinates of a) a comprehensive set of DHS downloaded from the ENCODE project portal at
encodeproject.org, b) H3K27Ac high confidence peaks, c) regions corresponding to +/- 1 kb from
CTCF high confidence peak summits, and d) chromatin loops and topologically associated
domains from Hi-C or Cohesin HiChIP data from GM12878 cells (Gomez-Marin et al., 2015; S. S.
Rao et al., 2014), RWPE-1 normal prostate cells (Luo et al., 2017), and 22Rv1 and C4-2B prostate
cancer cells (Rhie et al., in preparation); annotation was performed using the annotateBed
function in bedtools (Quinlan & Hall, 2010).
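Criterion (c) above can be illustrated with a minimal sketch that flags SNPs lying within 1 kb of a CTCF peak summit. It stands in for, and does not reproduce, the annotateBed step; the SNP names, positions, and summit coordinates are hypothetical, and treating a SNP exactly 1 kb from a summit as overlapping is an assumption.

```python
from typing import Dict, List, Tuple


def annotate_snps_near_summits(
    snps: Dict[str, Tuple[str, int]],
    summits: List[Tuple[str, int]],
    flank: int = 1000,
) -> Dict[str, bool]:
    """Map each SNP id to True if it lies within `flank` bp of any summit on its chromosome."""
    # Group summit positions by chromosome.
    summits_by_chrom: Dict[str, List[int]] = {}
    for chrom, pos in summits:
        summits_by_chrom.setdefault(chrom, []).append(pos)

    annotation: Dict[str, bool] = {}
    for snp_id, (chrom, pos) in snps.items():
        annotation[snp_id] = any(
            abs(pos - summit) <= flank for summit in summits_by_chrom.get(chrom, [])
        )
    return annotation


# Hypothetical SNPs and one hypothetical CTCF summit, for illustration only.
toy_snps = {"snp_1": ("chr1", 1500), "snp_2": ("chr1", 5000), "snp_3": ("chr2", 1500)}
toy_summits = [("chr1", 1000)]
toy_annotation = annotate_snps_near_summits(toy_snps, toy_summits)
```

The same pattern extends to the other annotation layers by swapping the summit windows for DHS, H3K27Ac peak, or loop anchor intervals.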
CRISPR/Cas9-mediated genomic deletions
gRNAs were cloned into the pSpCas9(BB)-2A-Puro (PX459) V2.0 plasmid (Addgene #62988)
following a previously published protocol (Ran et al., 2013); the sequences of all guide RNAs
used in this study can be found in Table 2-2. 22Rv1 cells (wild type or single deletion clones)
were transfected with guide RNA and Cas9 expression plasmids using Lipofectamine LTX with
PLUS reagent (Thermo Fisher, #15338100) according to the manufacturer’s protocol. At 24 h after
transfection, cells were treated with 2 µg/mL puromycin for 48-72 h (ensuring that the
untransfected control cells all died). The medium was then replaced with fresh medium without
puromycin and the cells were allowed to recover for 24-48 h. The cells were then harvested
for further analysis or dissociated and sorted into 96-well plates at 1 cell/well using flow
cytometry. The single cells were grown into colonies, then expanded to obtain clonal
populations for further analysis. Cell pools and single cells were harvested using QuickExtract
DNA Extraction Solution (Epicentre #QE9050) according to the manufacturer’s protocol and
genotyped by PCR using primers listed in Table 2-4.
To control for off-target effects, two independent sets of guide RNAs were designed and used when
assessing gene expression changes for site 1 and site 4 knockouts.
Table 2-2 gRNAs
| Guide RNA id | CTCF site targeted for deletion | Name | Score | Sequence w/ PAM |
| CTCF.gRNA05 | Site 3 | 1q21.3.UBE2Q1.TSS_R1 | 90 | TAAATGGCCCCACGAAGACCAGG |
| CTCF.gRNA06 | Site 3 | 1q21.3.UBE2Q1.TSS_L1 | 93 | CGGGCACGCTCGCCACGGCTCGG |
| CTCF.gRNA11 | Site 1 (rs12144978) | risk1q21.3_L1 | 92 | GAAGTGCTGCCGACGTGCACTGG |
| CTCF.gRNA12 | Site 1 (rs12144978) | risk1q21.3_R1 | 94 | TCTTCGGATTATTACTGCGGAGG |
| CTCF.gRNA21 | Site 4 (rs4919742, rs902774) | risk12q13.13_In4 | 97 | CGGCTAGGATTGACCGAAGCGGG |
| CTCF.gRNA22 | Site 4 (rs4919742, rs902774) | risk12q13.13_L1 | 93 | GTATGTGCATAGGCGCAGACAGG |
| CTCF.gRNA23 | Site 4 (rs4919742, rs902774) | risk12q13.13_R1 | 86 | ACATACATGTACCGTTCTCCAGG |
| CTCF.gRNA24 | Site 2 | 1q21.3.KCNN3-3'CBS_L | 95 | TACCCCGAGGTGCTCGCTACAGG |
| CTCF.gRNA25 | Site 2 | 1q21.3.KCNN3-3'CBS_R(88) | 88 | CTCTGGGAGTTCGTAAGACCTGG |
| CTCF.gRNA26 | Site 2 | 1q21.3.KCNN3-3'CBS_R(81) | 81 | GGGCTATCTAGCTCCAGAGTTGG |
| CTCF.gRNA35 | Site 1 (rs12144978) | 1q21.3-riskCBS-L2 | 93 | GATGGCTCTCTCTACCGAGTCGG |
| CTCF.gRNA36 | Site 1 (rs12144978) | 1q21.3-riskCBS-R2 | 89 | CCATCATCTCTCGGTTGCTATGG |
| CTCF.gRNA37 | Site 4 (rs4919742, rs902774) | 12q13.13-riskCBS-L2 | 85 | GTGCGCGTGTGTGTACTGTCAGG |
| CTCF.gRNA38 | Site 6 | 12q13.13-Site6-L | 90 | GTTGGGGGGGCACGATTTAAAGG |
| CTCF.gRNA39 | Site 6 | 12q13.13-Site6-R | 88 | GGCAGCATTCCATTCCGGAAAGG |
| CTCF.gRNA40 | Site 5 | 12q13.13-Site5-L | 97 | GCGTGGACGACTTACCGGGGAGG |
| CTCF.gRNA41 | Site 5 | 12q13.13-Site5-R | 91 | TAGAGTGCCGGCTTTAACACAGG |
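Because the PX459 plasmid encodes S. pyogenes Cas9, which recognizes an NGG PAM, each "Sequence w/ PAM" entry in Table 2-2 can be sanity-checked by splitting it into a protospacer and a PAM. The sketch below assumes the sequences are written 5' to 3' with a 20-nt protospacer followed by a 3-nt PAM; the helper names are hypothetical, and only three entries from the table are used as examples.

```python
from typing import Tuple


def split_guide(seq_with_pam: str, pam_len: int = 3) -> Tuple[str, str]:
    """Split a guide sequence into (protospacer, PAM), assuming the PAM is at the 3' end."""
    seq = seq_with_pam.strip().upper()
    return seq[:-pam_len], seq[-pam_len:]


def has_ngg_pam(seq_with_pam: str) -> bool:
    """True if the sequence is a 20-nt protospacer followed by an NGG PAM."""
    protospacer, pam = split_guide(seq_with_pam)
    valid_bases = set("ACGT")
    return (
        len(protospacer) == 20
        and set(protospacer) <= valid_bases
        and pam[0] in valid_bases
        and pam[1:] == "GG"
    )


# Three entries copied from Table 2-2.
guides = {
    "CTCF.gRNA12": "TCTTCGGATTATTACTGCGGAGG",
    "CTCF.gRNA23": "ACATACATGTACCGTTCTCCAGG",
    "CTCF.gRNA24": "TACCCCGAGGTGCTCGCTACAGG",
}
pam_check = {name: has_ngg_pam(seq) for name, seq in guides.items()}
```

A check like this catches transcription errors such as a dropped base, but says nothing about on-target efficiency or off-target risk.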
Table 2-3 CRISPR deletion site info
| CTCF site | Guide RNAs | Size of deletion (bp) | Coordinates |
| Site 1 | 11,12 | 1607 | chr1:154850252-154851859 |
| Site 1 | 35,36 | 1221 | chr1:154850375-154851596 |
| Site 2 | 24,26 | 395 | chr1:154642200-154642595 |
| Site 3 | 5,6 | 913 | chr1:154531542-154532455 |
| Site 4 | 22,23 | 2875 | chr12:53272450-53275325 |
| Site 4 | 21,37 | 1384 | chr12:53272552-53273936 |
| Site 5 | 40,41 | 1969 | chr12:52976094-52978063 |
| Site 6 | 38,39 | 5457 | chr12:52552496-52557953 |
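The deletion sizes in Table 2-3 equal the end coordinate minus the start coordinate of each region. The short sketch below recomputes them from the listed coordinates; the parsing helper is hypothetical and assumes the `chrN:start-end` format shown in the table.

```python
import re
from typing import Tuple


def parse_region(region: str) -> Tuple[str, int, int]:
    """Parse 'chr1:154850252-154851859' into (chrom, start, end)."""
    match = re.fullmatch(r"(chr[0-9A-Za-z]+):(\d+)-(\d+)", region.strip())
    if match is None:
        raise ValueError(f"unrecognized region string: {region!r}")
    chrom, start, end = match.group(1), int(match.group(2)), int(match.group(3))
    if end <= start:
        raise ValueError(f"end must be greater than start in {region!r}")
    return chrom, start, end


def deletion_size(region: str) -> int:
    """Size of the deleted region in bp, computed as end minus start."""
    _, start, end = parse_region(region)
    return end - start


# (site, guide RNA pair, coordinates, size reported in Table 2-3)
table_2_3 = [
    ("Site 1", "11,12", "chr1:154850252-154851859", 1607),
    ("Site 1", "35,36", "chr1:154850375-154851596", 1221),
    ("Site 2", "24,26", "chr1:154642200-154642595", 395),
    ("Site 3", "5,6", "chr1:154531542-154532455", 913),
    ("Site 4", "22,23", "chr12:53272450-53275325", 2875),
    ("Site 4", "21,37", "chr12:53272552-53273936", 1384),
    ("Site 5", "40,41", "chr12:52976094-52978063", 1969),
    ("Site 6", "38,39", "chr12:52552496-52557953", 5457),
]
sizes_agree = all(deletion_size(coords) == size for _, _, coords, size in table_2_3)
```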
Table 2-4 genotype primers
| Genotype | Location | Forward | Reverse | Amplicon size |
| delSite1 | in | tgaatgagagggctggacag | CCCCATAGCAACCGAGAGAT | 190 bp |
| delSite2 | in | ACTAGGTGCTTGACTCTGGG | CGAAGCACATTGAAGAGGGG | 141 bp |
| delSite3 | in | gctctcatggcgttttgtga | ttatggtgtcctcgcatcga | 138 bp |
| delSite4 | in | GGCTCAGGTTCTTATGCTGC | agaccttggctcgaaacaga | 189 bp |
| delSite5 | in | CAAAGTCCTGCTGTTGGTCC | GCTTCGACACCTCCTTACCT | 210 bp |
| delSite6 | in | catcctgcctctcaacatgc | tgtgctccagggtgaatgat | 252 bp |
| delSite1 | out | TGCTGATACCGTGTCTGGAA | CTCTTTCCCTTCTCAGCCCA | 1847 bp |
| delSite2 | out | CTCTGCAAATGCTCCACTGC | GCATACAGCAAGTGCTTAATAGTT | 806 bp |
| delSite3 | out | TGATCTCTCCGTCTTTCCCG | ATCAGCTTGCACACACAGAC | 1210 bp |
| delSite4 | out | tgctctgcttcctgacatga | GCTCACGTTTGTAACCCCAA | 3173 bp |
| delSite5 | out | CCACCACGCTATGGTTCAGT | TTCACACCTGAGGGCCTTTG | 2480 bp |
| delSite6 | out | GAATGGTCTTCCCCAGGCTT | TTGAGAGCCGTTCCCCTTTT | 6369 bp |
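One reading of Table 2-4, which the text does not spell out, is that the "out" primers flank the deleted region and the "in" primers lie within it. Under that assumption, and assuming the listed amplicon sizes refer to the wild-type allele, a clean deletion would shorten the "out" amplicon by the deletion size from Table 2-3. The sketch below works through that arithmetic for the first guide pair listed for each site; the pairing of primers with guide pairs is itself an assumption, and real alleles may differ because of small insertions or deletions at the junction.

```python
def expected_deleted_amplicon(wild_type_amplicon_bp: int, deletion_bp: int) -> int:
    """Expected 'out' primer amplicon after a clean deletion (wild-type size minus deletion)."""
    if deletion_bp >= wild_type_amplicon_bp:
        raise ValueError("deletion cannot be as large as or larger than the flanking amplicon")
    return wild_type_amplicon_bp - deletion_bp


# site: (wild-type 'out' amplicon from Table 2-4, deletion size from Table 2-3)
# Assumption: the first guide pair listed for each site is the one being genotyped.
out_amplicons = {
    "Site 1": (1847, 1607),
    "Site 2": (806, 395),
    "Site 3": (1210, 913),
    "Site 4": (3173, 2875),
    "Site 5": (2480, 1969),
    "Site 6": (6369, 5457),
}
expected_sizes = {
    site: expected_deleted_amplicon(wt, deleted)
    for site, (wt, deleted) in out_amplicons.items()
}
```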
RNA-seq and qPCR analysis
Total RNA was extracted from cell pools and cell populations derived from single cell colonies
using the TRIzol protocol (Thermo Fisher, #15596026) or DirectZol (Zymo, #R2062). For RNA-seq,
ERCC spike-in control mix 1 (Thermo Fisher, #4456704) was added before library preparation,
according to the manufacturer’s recommendation. Libraries were made using the Kapa Stranded mRNA kit
with beads (Kapa #KK8421). Samples were sequenced on an Illumina HiSeq3000 with single-end
50 bp read length. RNA-seq reads were aligned to Gencode v19 and counted using
STAR (Dobin et al., 2013). Differentially expressed genes were determined using edgeR
(McCarthy, Chen, & Smyth, 2012; Robinson, McCarthy, & Smyth, 2010) and batch effects were
corrected using the RUVg function of RUVseq (Risso, Ngai, Speed, & Dudoit, 2014). See Table
2-6 for more information about the RNA-seq libraries. For analysis of RNA from cell pools, cDNA
libraries were made using the Maxima kit (Thermo Fisher, #K1671). qPCR was performed using
SYBR Green (Bio-Rad, #1725275) and a Bio-Rad CFX96 machine (Bio-Rad, #1855196). See Table
2-5 for information concerning the primers used in RT-qPCR reactions.
For analysis of site 1 by RNA-seq, a 1607 bp region was deleted using guide RNAs 11+12; 2
independent clones were identified and each clone was analyzed in triplicate (Figure 4-3).
Effects of deleting site 1 on the expression of KCNN3 were also analyzed in a cell pool using
guide RNAs 11+12 or 35+36 (which deleted a 1221 bp region encompassing site 1), in wt cells
and in a cell pool that was previously deleted for a 913 bp region encompassing site 3 (Figure
4-4). Effects of deleting site 2 on the expression of KCNN3 were analyzed in a cell pool using
guide RNAs 24+26 (which deleted a 395 bp region encompassing site 2), in wt cells and in a cell
clone previously deleted for site 3 (Figure 4-4). For analysis of site 3 deletion by RNA-seq, a 913
bp region was deleted using guide RNAs 5+6; 3 independent clones were identified and each
clone was analyzed by RNA-seq. Effects of deleting site 3 in combination with deletions of site 1
and site 2 are described above. For analysis of site 4 by RNA-seq, a 2875 bp region was deleted
using guide RNAs 22+23; 2 independent clones were identified and each clone was analyzed in
triplicate by RNA-seq (Figure 4-6). Effects of deleting site 4 on the expression of KRT78 were
also analyzed in a cell pool using guide RNAs 21+37 to delete a 1384 bp region encompassing
site 4 plus guide RNAs 40+41 to delete a 1969 bp region encompassing site 5 or guide RNAs
38+39 to delete a 5457 bp region encompassing site 6 (Figure 4-7). Effects of deleting site 5 on
KRT78 expression were analyzed using guide RNAs 40+41 alone or in combination with guide
RNAs 38+39 to delete site 6. Finally, effects of deleting a 5457 bp region encompassing site 6 on
KRT78 expression were analyzed in a cell pool using guide RNAs 38+39 (Figure 4-7); combination
deletions are described above. See Table 2-3 for details of all guide RNA locations and deletion
sizes.
Table 2-5 qPCR primer (Zeisel, Yitzhaky, Bossel Ben-Moshe, & Domany, 2013)
| Gene | Forward | Reverse | UCSC qPCR primer pair name* |
| UBE2Q1 | GATCCTGTCCGCATCCACT | GTCATCAGACTCCACCGACC | UBE2Q1_uc001fff.1_11_1_1 |
| KCNN3 | GTCCACCATCATCCTTTTGG | GTACAGGATGCGCTCGTAGG | KCNN3_uc001ffp.3_5_2_1 |
| ADAR | GGTGGCCTTTTGGAGTACG | GACCCCCAACTTTTGCTTG | ADAR_uc001ffi.3_8_1_1 |
| ASH1L | ATTGTAGCCCTACCCGGAAA | GCTGCAGCAGACTATCCACA | ASH1L_uc009wqq.3_23_2_2 |
| CREB3L4 | CAGTGAGCTGCCCTTTGATG | ATCGGTCAGGAACAGGGTTT | CREB3L4_uc001fdq.2_3_2_1 |
| KRT8 | CCCTCAACAACAAGTTTGCC | TCCAGCATCTTGTTCTGCTG | KRT8_uc009zmm.1_6_1_1 |
| KRT78 | GCAGGAAACGAAAGTCCAGA | ACGCTGCTCAGCATCAGTG | KRT78_uc001sbc.1_2_1_1 |
| KRT79 | GATGGATCTGCATGGCAAAG | AGACACGTGGGTCTGCACTT | KRT79_uc001sbb.3_4_1_1 |
| KRT80 | AAAAGCCTGGAGAGCTTCGT | CCGACACATCCTTCACCTGT | KRT80_uc001rzw.3_4_1_1 |
| KRT4 | CAGCGTGGAGGACTTCAAGA | CAAAGTCATTCTCGGCTGCT | KRT4_uc001saz.3_6_1_1 |
Table 2-6 RNA-seq data sets
* For analysis of site 1 by RNA-seq, the region was deleted using one set of guide RNAs; 2 independent clones were
identified and each clone was analyzed in triplicate.
^ For analysis of site 3 by RNA-seq, the region was deleted using one set of guide RNAs; 3 independent clones were
identified and each clone was analyzed.
# For analysis of site 4 by RNA-seq, the region was deleted using 1 set of guide RNAs; 2 independent clones were
identified and each clone was analyzed in triplicate.
**For analysis of wt 22rv1 by RNA-seq, the cell line was treated with Cas9 without guide RNAs; 3 independent
clones were identified and each clone was analyzed.
| RNA-seq library ID | Genotype (chr targeted) | risk SNP targeted | gRNAs used | clone | totalReads | public.id |
| USCRNA249 | 22rv1 del site 1 (chr 1) * | rs12144978 | 11+12 | D8, rep1 | 20717044 | GSE118514 |
| USCRNA250 | 22rv1 del site 1 (chr 1) | rs12144978 | 11+12 | D8, rep2 | 17741591 | GSE118514 |
| USCRNA251 | 22rv1 del site 1 (chr 1) | rs12144978 | 11+12 | D8, rep3 | 17386454 | GSE118514 |
| USCRNA252 | 22rv1 del site 1 (chr 1) | rs12144978 | 11+12 | F6, rep1 | 15667354 | GSE118514 |
| USCRNA253 | 22rv1 del site 1 (chr 1) | rs12144978 | 11+12 | F6, rep2 | 16413779 | GSE118514 |
| USCRNA254 | 22rv1 del site 1 (chr 1) | rs12144978 | 11+12 | F6, rep3 | 15706497 | GSE118514 |
| USCRNA246 | 22rv1 del site 3 (chr 1) ^ | N.A. | 5+6 | B8, rep1 | 18860367 | GSE118514 |
| USCRNA247 | 22rv1 del site 3 (chr 1) | N.A. | 5+6 | F3, rep1 | 19257433 | GSE118514 |
| USCRNA248 | 22rv1 del site 3 (chr 1) | N.A. | 5+6 | H3, rep1 | 17475875 | GSE118514 |
| USCRNA258 | 22rv1 del site 4 (chr 12) # | rs4919742, rs902774 | 22+23 | A5, rep1 | 16517570 | GSE118514 |
| USCRNA259 | 22rv1 del site 4 (chr 12) | rs4919742, rs902774 | 22+23 | A5, rep2 | 17464954 | GSE118514 |
| USCRNA260 | 22rv1 del site 4 (chr 12) | rs4919742, rs902774 | 22+23 | A5, rep3 | 12881502 | GSE118514 |
| USCRNA261 | 22rv1 del site 4 (chr 12) | rs4919742, rs902774 | 22+23 | F8, rep1 | 16662551 | GSE118514 |
| USCRNA262 | 22rv1 del site 4 (chr 12) | rs4919742, rs902774 | 22+23 | F8, rep2 | 18270429 | GSE118514 |
| USCRNA263 | 22rv1 del site 4 (chr 12) | rs4919742, rs902774 | 22+23 | F8, rep3 | 16117771 | GSE118514 |
| USCRNA264 | 22rv1 wild type clone ** | N.A. | WT | A4, rep1 | 14482450 | GSE118514 |
| USCRNA265 | 22rv1 wild type clone | N.A. | WT | C6, rep1 | 16543670 | GSE118514 |
| USCRNA266 | 22rv1 wild type clone | N.A. | WT | G3, rep1 | 13543115 | GSE118514 |
Chapter 3. Comprehensive annotation of prostate cancer GWAS risk loci
Part of the work described in this chapter has been published in Guo, Y., Perez, A. A., Hazelett, D.
J., Coetzee, G. A., Rhie, S. K., & Farnham, P. J. (2018). CRISPR-mediated deletion of prostate
cancer risk-associated CTCF loop anchors identifies repressive chromatin loops. Genome
biology, 19(1), 160. Prostate cell Hi-C libraries used in this chapter were prepared by Fides Lay
and Hi-C analyses were performed by Suhn Rhie. Suhn Rhie also provided the DHS information used in
annotation. I performed the remainder of the experimental and analytical procedures.
In this chapter, I describe the approach by which I identified functional risk variants from
“noisy” GWAS results and tackled the GWAS challenge.
Fine-mapped risk SNPs are mostly in non-coding regions of genome
Genome wide association studies are conducted
using SNP microarrays to genotype surrogate index
SNPs. However, in order to identify causal SNPs and
understand the biology behind the GWAS results,
fine-mapping is needed. Through dense genotyping,
fine-mapping can further characterize each risk
variant in a linkage disequilibrium block and narrow
down from a large set of SNPs to a more precise pool
of SNPs that might be causal.
Several fine-mapping studies have been previously
performed to comprehensively expand the analysis
of the 100 known prostate cancer GWAS loci (as of
2015) (Amin Al Olama et al., 2015; Chen et al., 2008;
Han et al., 2016). Combining 3 published studies, I curated a list of 2,181 putative causal SNPs
for further mechanistic studies. As expected from analyses of SNPs related to other cancer
types (Elkon & Agami, 2017), most of the SNPs linked to an increased risk of prostate cancer are
located in non-coding regions. Only a small portion of the SNPs can be directly linked to a
coding gene, including 38 SNPs that are found in exons (Figure 3-1); these 38 SNPs identified 26
different genes. Table 3-1 lists all of the exonic SNPs and the corresponding genes. Of these, the
SNPs most likely to affect gene function are those that change the encoded amino acid.
However, only a few exonic SNPs cause non-synonymous changes; the functional effects of
several of these exonic SNPs have been previously characterized (Hazelett et al., 2014). The
vast majority of the prostate cancer risk-associated SNPs fall in the intron or "others"
categories (the latter consists mostly of intergenic regions). It is difficult to understand the function of
intronic or intergenic SNPs. However, studies have shown that non-coding SNPs tend to be
enriched in regulatory elements. SNPs that lie within regulatory elements in the non-coding
genome are more likely to be causal than the other non-exonic SNPs. Therefore, my next step
was to identify the SNPs that fall within regulatory elements.
Figure 3-1 Pie chart of SNP genomic location
Prostate cancer GWAS fine-mapped SNPs (n=2,181). Segment counts, in legend order: Intron, 1076; Others, 975;
Promoter, 51; Exon, 38; TTS, 41. TTS, Transcription termination site.
Table 3-1 Exon SNPs
SNP Gene
rs12473819 GGCX
rs6547621 GGCX
rs7605975 GGCX
rs2296763 ZBTB46
rs12573077 ARL3
rs1374961 FGF10
rs11765552 LMTK2
rs3801294 LMTK2
rs1469541 AC010969.1
rs73913955 AC010969.1
rs2061690 PBXIP1
rs6901971 GPRC6A
rs1810126 SLC22A3
rs2292334 SLC22A3
rs3088442 SLC22A3
rs3734650 ARMC2
rs3734651 ARMC2
rs1058588 VAMP8
rs4907792 RP11-56H2.2
rs114278123 KRT8P25
rs9854612 TMED10P2
rs2427533 ZGPAT
rs4919763 RPL7P41
rs2258056 RTEL1-TNFRSF6B
rs1048374 AP001258.4
rs11654557 ZNF652
rs6764769 ZBTB38
rs6152 AR
rs2274911 GPRC6A
rs3862792 CCND1
rs56095813 KRT8
rs13098 KRT8
rs115384359 PCAT1
rs76135898 PCAT1
rs77211883 PCAT1
rs3072 AC012065.7
rs9306895 AC012065.7
rs2257885 ARFRP1
Mapping of candidate cis regulatory elements in prostate cells
My goal in this study was to identify PCa risk-associated SNPs that lie within regulatory
elements, particularly elements that are involved in distal regulation. Regulatory elements are
sites of binding of transcription factors (TFs) and can be identified as regions of open chromatin
corresponding to DNase hypersensitive sites (DHS) or by the method of ChIP-seq using
antibodies to specifically modified histones. Because DHS sites identify regions of open
chromatin that closely correspond to the TF binding platform of regulatory elements, requiring
the SNPs to overlap a DHS will reduce the number of passenger SNPs that lie at the outer
margins of broad ChIP-seq peaks. Therefore, I began by determining which of the 2,181 fine-
mapped PCa SNPs are located within the set of known DHS. To capture as many SNPs within
regulatory elements as possible, I obtained an aggregated set of 2.89 million DHS peaks that
have been identified from a large number of human cell lines and tissues (downloaded from the
ENCODE project portal at encodeproject.org). Overlapping the genomic coordinates of these
DHS with the genomic locations of the set of fine-mapped PCa risk-associated SNPs identified
443 SNPs located within open chromatin. SNPs in this set have a greater likelihood of impacting
binding of transcription factors, as compared to SNPs not in open chromatin, and are more
likely to play a role in transcription regulation.
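The overlap step described above can be sketched as a point-in-interval lookup. This is only a minimal illustration, not the procedure actually used in this work: the function names, the BED-style half-open coordinates, and the toy peaks and SNP labels are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Index (chrom, start, end) peaks per chromosome.

    Coordinates are assumed to be 0-based and half-open (BED style).
    Overlapping peaks are merged so that one bisect lookup is enough.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = []
        for start, end in intervals:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def snps_in_peaks(snps, index):
    """Return the IDs of (snp_id, chrom, pos) SNPs that fall inside any peak."""
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect_right(starts, pos) - 1  # last peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.append(snp_id)
    return hits


# Hypothetical toy data, for illustration only.
toy_dhs = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
toy_snps = [("snpA", "chr1", 250), ("snpB", "chr1", 300),
            ("snpC", "chr2", 55), ("snpD", "chr3", 10)]
print(snps_in_peaks(toy_snps, build_index(toy_dhs)))
```

Merging the peaks first keeps the lookup simple when the aggregated DHS set contains overlapping peaks from different samples, as an aggregated multi-sample set is likely to.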
Because I used DHS sites from more than 100 cell or tissue samples, many of the SNP-
associated regulatory elements may not be active in prostate cells. Therefore, as a second step I
wanted to identify the subsets of DHS-localized SNPs that are within regulatory elements that
are active in prostate cells. There are 3 major cis regulatory element (CRE) categories:
promoters, enhancers and insulators. Promoters are defined as the region near a transcription
start site, active enhancers are distal elements that are usually marked by H3K27Ac (Creyghton
et al., 2010), and insulator elements are usually bound by the CTCF protein and regulate gene
expression by chromatin looping (Nora et al., 2017). In my SNP annotations, I defined
promoters as -1kb to +100bp from the transcription start site (TSS); to map active enhancers
and insulators genome wide, I performed H3K27Ac and CTCF ChIP-seq. Table 3-2 lists genes
that have risk SNPs in their promoters. Some of these SNPs also co-localize with H3K27Ac sites
or CTCF binding sites (CBS) within promoters. I will further discuss these promoters in Chapter
4.
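The promoter definition used here (-1kb to +100bp from the TSS) depends on the strand of the gene, because "upstream" points in opposite directions on the two strands. The sketch below illustrates that window under stated assumptions: the function names and the example TSS position are hypothetical, and treating both window ends as inclusive is a choice made only for this example.

```python
def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Return (start, end) of the promoter window around a TSS.

    For "+" strand genes upstream means smaller coordinates; for "-"
    strand genes the window is mirrored.
    """
    if strand == "+":
        return max(0, tss - upstream), tss + downstream
    if strand == "-":
        return max(0, tss - downstream), tss + upstream
    raise ValueError(f"unknown strand: {strand!r}")


def in_promoter(pos, tss, strand):
    """True if a SNP position lies inside the promoter window (ends inclusive)."""
    start, end = promoter_window(tss, strand)
    return start <= pos <= end


# Hypothetical TSS at position 5000, for illustration only.
print(promoter_window(5000, "+"))  # window for a plus-strand gene
print(promoter_window(5000, "-"))  # mirrored window for a minus-strand gene
```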
Table 3-2 Genes that have promoter SNPs
Name SNP
ANO7 rs76832527
BIK rs4988372
CCND1 rs55911137
FOXP4 rs2104506
HLA-DRA rs9268641, rs3129872, rs2395179, rs2395180, rs2395181, rs3129873, rs3129874, rs3129875
LIME1 rs2427534, rs914559
MDM4 rs4951391, rs4951392
MLPH rs13386290, rs67071074, rs58057291, rs2271809
MSMB rs10993994
OTX1 rs58235267
PMVK rs1109815
RFX6 rs339356
RP11-255B23.1 rs59765225
RP11-307C12.13 rs1109815
RP11-430C7.4 rs10900601
RP11-81K2.1 rs118099826, rs117801358
RP4-583P15.14 rs914559
SKIL rs78416326
SLC2A4RG rs1741708, rs2253823, rs2253829
TMED10P2 rs9869095
TRIM8 rs34032774
ZBTB38 rs7632381
ZGPAT rs1151622
Studies of cultured prostate cancer cells and sequencing of prostate cancers have revealed
multiple distinct subgroups of prostate cancer (Wedge et al., 2018), including prostate cancer
cells that are refractory to androgen treatment, that contain the androgen receptor splice
variant AR-V7, or that express fusion proteins such as TMPRSS2-ERG. Because I wished to
capture SNPs in regulatory elements that are present in multiple prostate cancer subgroups, as
well as in normal prostate cells, I performed H3K27Ac and CTCF ChIP-seq in 2 non-tumorigenic
prostate cell populations (PrEC and RWPE-1) and 5 prostate cancer cell lines (RWPE-2, 22Rv1,
C4-2B, LNCaP, and VCaP). PrEC are normal human prostate epithelial primary cells whereas
RWPE-1 is a normal prostate epithelial cell line that was immortalized by transfection with a
single copy of human papilloma virus 18 (Bello, Webber, Kleinman, Wartinger, & Rhim, 1997).
RWPE-2 cells are derived from RWPE-1 cells by transformation with the Kirsten murine sarcoma
virus (Bello et al., 1997). LNCaP is an androgen-sensitive prostate adenocarcinoma cell line
derived from a lymph node metastasis (Horoszewicz et al., 1980). C4-2B is a castration-
resistant prostate cancer cell line derived from a LNCaP xenograft that relapsed and
metastasized to bone after castration (Thalmann et al., 1994); C4-2B cells do not require
androgen for proliferation, having similar growth rates in the presence or absence of androgen
(Decker et al., 2012). VCaP cells are derived from a metastatic lesion to a lumbar vertebra of a
Caucasian male with hormone refractory prostate cancer; VCaP is a TMPRSS2-ERG fusion
positive prostate cancer cell line, expressing high levels of the androgen receptor splice variant
AR-V7 (Korenchuk et al., 2001). 22Rv1 is a castration-resistant human prostate carcinoma
epithelial cell line that is derived from an androgen-dependent CWR22 xenograft that relapsed
during androgen ablation (Sramkoski et al., 1999); this cell line also expresses the androgen
receptor splice variant AR-V7. Unlike most prostate cancer cell lines, 22Rv1 has an almost
diploid karyotype. Table 3-3 summarizes the information concerning the cell lines that I used in
my studies.
Each ChIP-seq experiment was performed in duplicate and, for 22Rv1 and LNCaP cells, in the
presence or absence of dihydrotestosterone (DHT), for a total of 18 datasets for each mark (36
ChIP-seq experiments in total). Peaks were called for individual datasets using MACS2 and the
ENCODE3 pipeline (Zhang et al., 2008) and only high confidence (HC) peaks (defined as those
peaks present in both replicates) were used for further analysis; see Figure 2-1 for ranked peak
graphs for each HC peak dataset. As shown in Figure 3-2, I identified 48,796-94,688 H3K27Ac
and 43,157-69,945 CTCF sites that were reproducible in the two replicates from each cell line
and growth condition. As expected from other studies, most of the H3K27Ac and CTCF sites are
located in introns or intergenic regions, with a small subset located in promoter regions
(defined as 1 kb upstream to 100 bp downstream of a known TSS).
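The idea of keeping only peaks that are present in both replicates can be illustrated with a simple overlap filter. This is a simplified sketch and not the ENCODE3 pipeline itself: the at-least-one-base overlap criterion, the function names, and the toy peaks are assumptions made for the example.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def high_confidence(rep1, rep2):
    """Keep the replicate 1 peaks that overlap any replicate 2 peak.

    The nested scan is quadratic; a genome-scale analysis would use a
    sorted or indexed interval structure instead.
    """
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


# Hypothetical toy peak calls, for illustration only.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
print(high_confidence(rep1, rep2))
```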
Figure 3-2 Identification and classification of H3K27Ac (a) and CTCF (b) sites in prostate cells.
H3K27Ac and CTCF ChIP-seq was performed in duplicate for each cell line; for 22Rv1 and LNCaP cells, ChIP-seq was
performed in duplicate in the presence or absence of DHT. Peaks were called for individual datasets using MACS2
and the ENCODE3 pipeline, then peaks present in both replicates were identified (high confidence peaks) and used
for further analysis. The location of the peaks was classified using the HOMER annotatePeaks.pl program and the
Gencode V19 database. The fraction of high confidence peaks in each category is shown on the Y axis, with the
number of peaks in each category for each individual cell line and/or treatment shown within each bar.
[Figure: stacked bar charts titled "H3K27Ac high confidence peaks" and "CTCF high confidence peaks". Y axis: Fraction (0.00 to 1.00). X axis: PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT, VCaP. Annotation categories: 3UTR, 5UTR, Exon, Intergenic, Intron, Others, Promoter, TTS.]
Table 3-3 Prostate cells used in this study
Using 3D chromatin interaction datasets to prioritize CREs that form long-range looping
Distal CREs regulate genes by forming 3-dimensional (3D) chromatin loops. Enhancers loop to
promoters and bring in TFs that regulate transcription. Two insulators can come
together and form a loop that insulates a local microenvironment from the rest
of the genome.
In previous studies, the Farnham lab found that deletion of a regulatory element that has active
histone marks does not always alter the transcriptome (Tak et al., 2016). This suggests that not
all regulatory elements (even if marked by H3K27Ac) are critically involved in gene regulation in
that particular cell type under those particular conditions (perhaps due to functional
redundancy of regulatory elements). I reasoned that one way to identify critical regulatory
elements could be to focus on the subset that is involved in chromatin looping. Although
analysis of Hi-C data suggests that many of the long-range chromatin loops (e.g. those that are
anchored by CTCF sites and that define TADs) are conserved across multiple cell types, intra-TAD
loops may be cell type-specific (S. S. Rao et al., 2014). Therefore, I used in situ Hi-C (S. S. Rao et
al., 2014) data from the Farnham lab for normal prostate RWPE-1 cells (Luo et al., 2017) and for the
prostate cancer cell lines C4-2B and 22Rv1 (Rhie et al., manuscript in preparation). Because high
resolution chromosome conformation assays require billions of NGS reads, our in-house Hi-C
analyses were limited to a resolution of 10kb, which makes it challenging to assign CREs, which are
usually 1-2 kb in size, to loop anchors. However, there are published Hi-C and cohesin HiChIP
datasets of GM12878 cells (Mumbach et al., 2016; S. S. Rao et al., 2014) that have 1kb
resolution. Therefore, I leveraged the conserved nature of chromatin loops and utilized
GM12878 looping data to further refine my annotation results from prostate cells, allowing me
to more precisely identify risk CREs that form high confidence loops.
Identification of risk CREs that are related to PCa susceptibility
Figure 3-3 illustrates the SNP annotation workflow. A comparison of the set of DHS-localized
SNPs to the union set of H3K27Ac or CTCF HC peaks from the prostate cells identified 222 PCa
risk-associated SNPs located within a DHS site that corresponds to an H3K27Ac peak (Figure
3-4) and 93 PCa risk-associated SNPs located within a DHS site that corresponds to a CTCF peak
(Figure 3-5).
I then overlapped PCa risk-associated DHS+, K27Ac+ SNPs with the genomic coordinates of the
anchors of the identified loops, identifying 203 SNPs located in the DHS portion of a H3K27Ac
ChIP-seq peak and associated with a chromatin loop (Figure 3-4); a list of these risk SNPs can be
found in Appendices 1. Table of annotated fine-mapped SNPs. Most of these SNPs are located
in intronic or intergenic regions and many are located in loops present in both prostate and
GM12878 cells. I performed similar analyses overlapping the PCa risk-associated DHS+, CTCF+
SNPs with the loop anchor regions and identified 85 SNPs located in the DHS portion of a CTCF
ChIP-seq peak and associated with a chromatin loop (Figure 3-5). Again, the majority of these
SNPs and co-localized risk CREs are located in intronic or intergenic regions.
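The loop-anchor overlap can be sketched by binning coordinates at the Hi-C resolution, which also shows why a 1-2 kb CRE cannot be localized more finely than the 10 kb anchor bin that contains it. The sketch below is an illustration only: the function names and the loop list are hypothetical, loops are assumed to be stored as the start coordinates of their two anchor bins, and only the CRE coordinates are taken from site C1 in Table 3-5.

```python
RESOLUTION = 10_000  # assumed Hi-C bin size in bp


def bins_for(start, end, res=RESOLUTION):
    """Bin indices touched by a half-open interval [start, end)."""
    return range(start // res, (end - 1) // res + 1)


def partner_anchors(cre, loops, res=RESOLUTION):
    """Return the anchor bins looped to the bin(s) containing a CRE.

    cre is (chrom, start, end); each loop is (chrom, anchor_a_start,
    anchor_b_start), both anchors on the same chromosome.
    """
    chrom, start, end = cre
    cre_bins = set(bins_for(start, end, res))
    partners = []
    for loop_chrom, a_start, b_start in loops:
        if loop_chrom != chrom:
            continue
        a_bin, b_bin = a_start // res, b_start // res
        if a_bin in cre_bins:
            partners.append((chrom, b_bin * res, (b_bin + 1) * res))
        elif b_bin in cre_bins:
            partners.append((chrom, a_bin * res, (a_bin + 1) * res))
    return partners


# CRE coordinates from site C1 (Table 3-5); the loops are hypothetical.
cre = ("chr1", 154850223, 154852082)
toy_loops = [("chr1", 154850000, 155100000),
             ("chr1", 100000, 154850000),
             ("chr2", 154850000, 155000000)]
print(partner_anchors(cre, toy_loops))
```

A promoter lookup on the returned partner bins would then give the looped promoters; at 10 kb resolution every promoter inside a partner bin is a candidate, which is one reason the higher-resolution GM12878 data are useful for refinement.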
[Figure: workflow diagram. Inputs: 2,181 fine-mapped PCa GWAS SNPs; DNase-seq from multiple consortiums; H3K27Ac ChIP-seq for 7 prostate cell lines; CTCF ChIP-seq for 7 prostate cell lines; Hi-C looping data for prostate cells. Step 1: SNPs in open chromatin; 443 SNPs. Step 2: PCa risk-associated H3K27Ac sites; 222 SNPs. PCa risk-associated CTCF sites; 93 SNPs. Step 3: PCa risk-associated H3K27Ac sites involved in loops; 201 SNPs. PCa risk-associated CTCF sites involved in loops; 85 SNPs.]
Figure 3-3 Experimental and analytical steps used to identify PCa risk-associated regulatory elements involved
in chromatin loops.
Step (1): The subset of 2,181 fine-mapped PCa-associated SNPs that overlap a DNase hypersensitive site was
identified. Step (2): H3K27Ac and CTCF ChIP-seq was performed in duplicate in two normal (PrEC and RWPE-1)
and five cancer (RWPE-2, 22Rv1, C4-2B, LNCaP, and VCaP) prostate cell lines; data were collected plus or minus
DHT for 22Rv1 and LNCaP cells, for a total of 18 datasets for each mark (36 ChIP-seq samples). The SNPs in open
chromatin sites (i.e., those that are contained within a DHS site) were then subdivided into those that overlap a
H3K27Ac or a CTCF site in prostate cells; the number of PCa-associated SNPs associated with the H3K27Ac or CTCF
sites is shown. Step (3): The PCa risk-associated H3K27Ac and CTCF sites were overlapped with Hi-C looping data,
and the subset of each type of site involved in chromatin loops was identified; the number of PCa-associated SNPs
associated with the H3K27Ac or CTCF sites involved in looping is shown.
The ultimate goal of this study is to find the genes that contribute to PCa susceptibility. Hence,
the next step is to identify the genes regulated by risk CREs through functional studies. I chose
22Rv1 as a model cell line because it is diploid, is easily transfected, and has been used as a
model of castration-resistant prostate cancer, which is an aggressive form of prostate cancer.
In 22Rv1, the 222 DHS+ SNPs within H3K27Ac marks are located in 41 distinct H3K27Ac sites
and the 93 DHS+ SNPs within CTCF peaks are located in 19 distinct CBS. As I indicated above,
some of these CREs are near a TSS, suggesting that they may perturb promoter function and
alter expression of the nearby genes. However, the majority of the SNP-containing CREs are
distal to genes. For these CREs, chromatin looping information can help identify their putative
target genes. Previous studies have suggested different mechanisms by which chromatin loops
can regulate gene expression. Loops can act either as an activator, by bringing a CRE close
to a TSS, or as a repressor, by insulating a gene from active regulatory elements (Flavahan et al.,
2016; Li et al., 2014). The tables below provide detailed information for each PCa risk SNP-
containing CRE and the promoters that they localize with, loop to, or insulate, for
H3K27Ac sites (Table 3-4) and CTCF binding sites (Table 3-5) in 22Rv1 cells.
Figure 3-4 PCa risk SNPs associated with H3K27Ac sites and chromatin loops.
Each row represents one of the 222 SNPs that are associated with both a DHS site and a H3K27Ac peak in normal
or tumor prostate cells (Appendices 1. Table of annotated fine-mapped SNPs). The location of each SNP was
classified using the Gencode V19 database. “Others” represents mostly intergenic regions. To identify the subset of
H3K27Ac-associated risk SNPs located in an anchor point of a loop, chromatin loops were identified using Hi-C data
from normal RWPE-1 prostate cells (Luo et al., 2017) or 22Rv1 and C4-2B prostate tumor cells (Rhie et al., in
preparation); Hi-C (S. S. Rao et al., 2014) and cohesin HiChIP data (Mumbach et al., 2016) from GM12878 were also
used.
[Figure: heatmap with one row per PCa fine-mapped GWAS SNP. Columns: H3K27Ac sites (Y/N) in PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT and VCaP; Annotation (Exon, Intron, Others, Promoter, TTS); Loops (Y/N) in RWPE-1 Hi-C, 22Rv1 Hi-C, C4-2B Hi-C, GM12878 Hi-C and GM12878 cohesin HiChIP. Labeled region: chr10q26.13.]
Table 3-4 Risk H3K27Ac sites in 22Rv1
Site ID | SNPs | Colocalized Promoters | Looped Promoters | Location
E1 | rs10908445,rs12069356 | . | UBE2Q1,RP11-350G8.9,TDRD10,ADAR,CHRNB2,RP11-61L14.6,UBE2Q1-AS1,KCNN3,SHE,AL606500.1 | chr1:154849564-154850407
E2 | rs1109815,rs58629129,rs877343 | PMVK,RP11-307C12.13 | UBE2Q1,TDRD10,ADAR,CHRNB2,RP11-350G8.9,UBE2Q1-AS1,RP11-61L14.6,SHE,AL606500.1 | chr1:154907847-154911231
E3 | rs11240753,rs11240754,rs4951389 | . | | chr1:204474823-204478037
E4 | rs4245736,rs4951391,rs4951392 | MDM4 | | chr1:204484474-204486668
E5 | rs11043132,rs11043133,rs11043135,rs11603101,rs7121039 | . | ASCL2 | chr11:2232428-2232969
E6 | rs11820019,rs144245804,rs180898041,rs184379418,rs191958513,rs3212859,rs3212880,rs36225067,rs36225073,rs3862792,rs3918296,rs3918298,rs55816909,rs55911137,rs74606104,rs76941912 | CCND1,ORAOV1 | MYEOV,TPCN2,RP11-554A11.7,MIR3164,AP000439.5,IFITM9P,RP11-554A11.9,RP11-554A11.8 | chr11:69448299-69472254
E7 | rs4919707 | . | AC107016.2,AC107016.1,KRT82,KRT84,KRT85,KRT18,RP11-1020M18.10,KRT8,EIF4B | chr12:53288877-53290929
E8 | rs13098,rs55958994 | KRT8 | KRT85 | chr12:53296493-53302895
E9 | rs7327286 | . | | chr13:73712720-73713564
E10 | rs7158115 | FERMT2 | RNA5SP384,RP11-589M4.3 | chr14:53417076-53419118
E11 | rs461251,rs684232 | VPS53 | AC108004.3,AC108004.2,DOC2B,AC108004.5,RPH3AL,C17orf97,RP11-1260E13.4,NXN,TIMM22,AC015884.1 | chr17:618340-619783
E12 | rs16948071,rs57030023,rs77653195,rs8072520 | . | RNU7-135P | chr17:47466463-47469298
E13 | rs11653701,rs11653775 | . | RNU7-135P,AC006487.1 | chr17:47472072-47472395
E14 | rs17765332,rs17765344,rs1859962,rs8071558,rs8072254,rs984434,rs9901566,rs9908087,rs991429 | . | RP11-57A1.1,SLC39A11,AC007461.2,AC007461.1,COG1,AC005152.3,AC005152.2,SOX9,RP11-84E24.2,RP11-84E24.3,RP11-1124B17.1,LINC00511 | chr17:69105863-69110755
E15 | rs58235267 | AC009501.4,OTX1 | AC016734.1,WDPCP,MDH1 | chr2:63276502-63278789
E16 | rs7591175 | . | SH2D6,Y_RNA,RNU7-64P,RN7SL251P,CAPG | chr2:85737695-85739360
E17 | rs1446668,rs2028900 | MAT2A | RETSAT,SH2D6,RN7SL113P,RN7SL251P,RP11-717A5.1,Y_RNA,ELMOD3,RNU7-162P,RNU7-64P,CAPG | chr2:85764672-85768996
E18 | rs12473819,rs7605975 | . | PEBP1P2,AC013262.1,RETSAT,SH2D6,RN7SL113P,RN7SL251P,AC105342.1,RP11-717A5.1,RP11-717A5.2,AC012075.2,ELMOD3,RNU7-162P,Y_RNA,RNU7-64P,CAPG | chr2:85772292-85774674
E19 | rs10172544,rs7568458 | GGCX,VAMP8 | PEBP1P2,RETSAT,SH2D6,RN7SL113P,TGOLN2,RN7SL251P,ELMOD3,RP11-717A5.2,RP11-717A5.1,Y_RNA,AC012075.1,RNU7-162P,RNU7-64P,CAPG | chr2:85788025-85789341
E20 | rs10187185,rs13386290,rs13408361,rs1463795,rs2292879,rs2325839,rs3087553,rs58057291,rs6431544,rs67071074,rs6724766,rs6737888,rs6760842,rs7569197,rs7571898,rs7583768 | MLPH | COPS8,PTMA,CAB39,PRLH,MGC4771,RAB17,COL6A3,MSL3P1,RNU6-1140P | chr2:238382258-238425384
E21 | rs10176842,rs10184904 | MLPH | AC107079.1 | chr2:238426998-238430490
E22 | rs10186470,rs11891348,rs11896232,rs12471291,rs13404177,rs13409241,rs13418586,rs13425580,rs13426236,rs13429533,rs1979762,rs2271809,rs2292884,rs2292885,rs7570882,rs7574337,rs7586312,rs7609087 | MLPH | AC107079.1 | chr2:238431100-238451177
E23 | rs111770284,rs76832527 | ANO7 | PPP1R7,PASK,AC005237.2 | chr2:242156584-242157281
E24 | rs2258056 | TNFRSF6B | SAMD10,ZNF512B | chr20:62328118-62328734
E25 | rs1056441,rs1741708,rs2253823,rs2253829,rs6122159 | LIME1,RP4-583P15.14,SLC2A4RG | | chr20:62370127-62373695
E26 | rs5759182 | MCAT | GOLGA2P4,POLDIP3,RNU12,RPL5P34 | chr22:43538647-43540024
E27 | rs7632756 | RUVBL1 | RP11-475N22.4,C3orf27,GATA2 | chr3:127816706-127817724
E28 | rs2871960 | ZBTB38 | | chr3:141120545-141122111
E29 | rs78416326 | SKIL | RNY5P3 | chr3:170074006-170076910
E30 | rs1534642 | RASSF6 | | chr4:74485765-74486701
E31 | rs17021972,rs998071 | . | BMPR1B | chr4:95589491-95592862
E32 | rs4714476,rs6935446,rs6935737,rs9381075 | FOXP4,RP11-328M4.2 | | chr6:41513420-41515237
E33 | rs10486567,rs17156041,rs67152137 | . | HOXA11-AS,HOTTIP,MIR196B,HOXA13,HOXA11,HOXA10,RP1-170O19.17,RP1-170O19.20,RP1-170O19.14,HOXA-AS4,RPL35P4,HOXA9 | chr7:27975464-27977211
E34 | rs11135756,rs11135757,rs13264646,rs13264869,rs1355600,rs4872162,rs6980627,rs6981463,rs7001308,rs7016506,rs715725 | . | RP11-175E9.1,NKX2-6,RP11-213G6.2,NKX3-1 | chr8:23448031-23450837
E35 | rs11135766,rs13262167,rs4872175 | NKX3-1 | FP15737,AC051642.1,RNU4-71P,SLC25A37,ENTPD4,CTC-756D1.1,CTC-756D1.3,CTC-756D1.2 | chr8:23533575-23541498
E36 | rs115266272,rs13254738,rs1456315,rs148663063,rs7001895,rs7012442,rs72725879 | . | | chr8:128102256-128106799
E37 | rs145698299,rs57010632,rs61265543 | . | | chr8:128110667-128114831
E38 | rs1984037,rs34251673,rs71509575 | . | | chr9:124424865-124426784
E39 | rs5964602 | . | RP6-22P16.1 | chrX:66742239-66748595
E40 | rs1204038,rs1204039,rs1204040,rs12396249,rs2255702,rs2361629,rs2767564,rs5918757,rs6152 | AR | | chrX:66761237-66809011
E41 | rs5919393 | . | | chrX:66822818-66826149
Figure 3-5 PCa risk SNPs associated with CTCF sites and chromatin loops
Each row represents one of the 93 SNPs that are associated with both a DHS site and a CTCF peak in normal or
tumor prostate cells (Appendices 1. Table of annotated fine-mapped SNPs). The location of each SNP was
classified using the Gencode V19 database. “Others” represents mostly intergenic regions. To identify the subset of
CTCF-associated risk SNPs located in an anchor point of a loop, chromatin loops were identified using Hi-C data
from normal RWPE-1 prostate cells (Luo et al., 2017) or 22Rv1 and C4-2B prostate tumor cells (Rhie et al., in
preparation); Hi-C (S. S. Rao et al., 2014) and cohesin HiChIP data (Mumbach et al., 2016) from GM12878 were also
used.
[Figure: heatmap with one row per PCa fine-mapped GWAS SNP. Columns: CTCF binding sites (Y/N) in PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT and VCaP; Annotation (Exon, Intron, Others, Promoter, TTS); Loops (Y/N) in RWPE-1 Hi-C, 22Rv1 Hi-C, C4-2B Hi-C, GM12878 Hi-C and GM12878 cohesin HiChIP.]
Table 3-5 Risk CTCF binding sites in 22Rv1
CBS ID | SNPs | Colocalized Promoters | Looped Promoters | Location
C1 | rs12144978 | . | UBE2Q1,TDRD10,ADAR,CHRNB2,RP11-350G8.9,UBE2Q1-AS1,RP11-61L14.6,SHE,AL606500.1 | chr1:154850223-154852082
C2 | rs877343 | PMVK,RP11-307C12.13 | UBE2Q1,TDRD10,ADAR,CHRNB2,RP11-350G8.9,UBE2Q1-AS1,RP11-61L14.6,SHE,AL606500.1 | chr1:154909844-154910469
C3 | rs7094325 | . | | chr10:104429435-104430076
C4 | rs4077456 | . | AC124057.5,TSPAN32,CD81,ASCL2,AC129929.5,Y_RNA,TRPM5,TSSC4 | chr11:2229930-2230330
C5 | rs10896450 | . | RP11-211G23.1,ORAOV1,AP000439.3 | chr11:69008106-69008703
C6 | rs55816909 | CCND1 | MYEOV,RP11-554A11.7,MIR3164,RP11-554A11.9,RP11-554A11.8 | chr11:69457969-69458640
C7 | rs3918296 | CCND1 | MYEOV,RP11-554A11.7,MIR3164,RP11-554A11.9,RP11-554A11.8 | chr11:69459008-69459581
C8 | rs4919742,rs902774 | . | KRT74,METTL7AP1,KRT85,AC121757.1,KRT82,KRT83,KRT84,C12orf80,LINC00592,KRT7,RP3-416H24.4,RP11-1020M18.10,KRT124P,KRT80 | chr12:53272331-53274356
C9 | rs7158115 | FERMT2 | RNA5SP384,RP11-589M4.3 | chr14:53417275-53418922
C10 | rs11653775 | . | RNU7-135P,AC006487.1 | chr17:47472133-47472878
C11 | rs7247241,rs8100395 | PPP1R14A | KCNK6,WDR87,YIF1B,AC026806.2,SIPA1L3,CATSPERG | chr19:38746719-38747640
C12 | rs3087553 | . | | chr2:238421025-238422115
C13 | rs2281926 | . | C20ORF135,ABHD16B,TPD52L2 | chr20:62383754-62384362
C14 | rs4988372 | BIK | ATP5L2,CYB5R3,RNU12,POLDIP3,Y_RNA,GOLGA2P4,RPL5P34 | chr22:43506266-43506900
C15 | rs1534642 | RASSF6 | | chr4:74485981-74486383
C16 | rs2395172,rs3129857 | . | PPP1R2P1,TAP1,TAP2,XXbac-BPG154L12.4,HLA-DOB,NOTCH4,MTCO3P1,PSMB8,MIR3135B,XXbac-BPG254F23.6,HLA-DQB1,PSMB9,HLA-DQB3,HLA-DQB2,TAPSAR1,HLA-DQB1-AS1,XXbac-BPG254F23.7,HLA-DQA1,HLA-DQA2 | chr6:32399741-32400321
C17 | rs6935446,rs6935737 | FOXP4,RP11-328M4.2 | | chr6:41514302-41514842
C18 | rs4646284 | . | | chr6:160581263-160582024
C19 | rs76387712 | . | | chr8:128219096-128219763
Figure 3-6 shows the genome-wide distribution of the PCa risk CREs and provides a visualization
of the long-range intra-chromosome loops they form in 22Rv1 cells. A total of 199 genes were identified
as either directly linked to risk SNPs/CREs or looped to a risk CRE. Gene Ontology (GO) analysis
of these 199 genes reveals key biological processes that are related to PCa susceptibility (Figure 3-7). These
risk-related loci are enriched for genes that play roles in various developmental processes,
gender-related development and differentiation, antigen processing, and keratinization.
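Fold enrichment for a GO term compares the fraction of the gene set annotated with the term to the corresponding fraction in the background. The sketch below shows that calculation together with a one-sided hypergeometric p-value. It is a generic illustration: the text does not state which GO tool or statistical test was used, the function names are hypothetical, and every count except the gene set size of 199 is made up for the example.

```python
from math import comb


def fold_enrichment(k, n, K, N):
    """(k/n) / (K/N): k of n set genes and K of N background genes carry the term."""
    return (k / n) / (K / N)


def hypergeom_pvalue(k, n, K, N):
    """Probability of seeing at least k annotated genes in a random draw of n."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


# n = 199 comes from the text; k, K and N are hypothetical.
k, n, K, N = 10, 199, 200, 20000
print(round(fold_enrichment(k, n, K, N), 2))
print(hypergeom_pvalue(k, n, K, N) < 0.05)
```

When many GO terms are tested at once, the resulting p-values would normally also be corrected for multiple testing.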
Figure 3-6 Genomic circos plot (Krzywinski et al., 2009) of 22Rv1 risk CREs, risk intra-chromosome loops and fine-
mapped PCa GWAS SNPs
From outer circle: hg19 chromosome ideogram; fine-mapped PCa GWAS SNPs shown in black; risk H3K27Ac sites in
22Rv1 shown in green; risk CTCF binding sites in 22Rv1 shown in red; risk intra-chromosome loops in 22Rv1 shown
in blue.
Figure 3-7 Gene Ontology (GO) biological processes that are enriched in the set of PCa susceptibility related
genes
199 genes were identified as directly perturbed by risk SNPs (i.e., the SNP is located within an exon or promoter
region) or through loops from a distal risk CRE to their promoter region. GO analysis was performed using this set
of genes vs the whole genome.
In the next chapter, I will explain how I tested the hypothesis that SNPs within regulatory
elements associated with chromatin loops are important in regulating the transcriptome. These
investigations further identified the target genes of selected risk CREs and provided an
understanding as to which biological processes may be affected by the PCa risk SNPs.
[Figure 3-7: bar chart. X axis: Fold enrichment (0 to 20). Y axis: GO terms, listed as: system development; animal organ development; tissue development; programmed cell death; epithelium development; reproductive structure development; epithelial cell differentiation; reproductive system development; sex differentiation; epidermis development; appendage development; limb development; skin development; development of primary sexual characteristics; gonad development; epidermal cell differentiation; male sex differentiation; keratinocyte differentiation; antigen processing and presentation; antigen processing and presentation of peptide antigen; antigen processing and presentation of exogenous antigen; development of primary male sexual characteristics; male gonad development; keratinization; antigen processing and presentation of exogenous peptide antigen; cornification.]
Chapter 4. Functional validation of risk CREs
Most of the work described in this chapter has been published in Guo, Y., Perez, A. A., Hazelett,
D. J., Coetzee, G. A., Rhie, S. K., & Farnham, P. J. (2018). CRISPR-mediated deletion of prostate
cancer risk-associated CTCF loop anchors identifies repressive chromatin loops. Genome
biology, 19(1), 160. Prostate cell Hi-C libraries used in this chapter were prepared by Fides Lay
and Hi-C analyses were performed by Suhn Rhie. Andrew Perez and Shannon Schreiner contributed to
ChIP-seq experiments. Charles Nicolet and Jenevieve Polin contributed to the experiment in
Figure 4-12. The remainder of the experimental and analytical procedures were done by Yu Guo.
In this chapter, I describe how I leveraged the CRISPR/Cas9 genome engineering tool and high
throughput sequencing assays to conduct functional studies of selected PCa-associated risk
regulatory elements, using the diploid, androgen-independent prostate cancer cell line 22Rv1
as a disease model.
CRISPR mediated CTCF binding site deletion can identify target genes
CTCF has been shown to affect gene regulation by several different mechanisms. For example,
TADs are formed by the interaction of two convergently bound CTCFs separated by a large
number of base pairs (500 kb to 1 Mb) (de Wit et al., 2015; Gomez-Marin et al., 2015; Guo et
al., 2015; S. S. Rao et al., 2014; Vietri Rudan et al., 2015); the physical interaction of the CTCFs
bound to each anchor point creates a chromatin loop. CTCF is also thought to influence
enhancer-mediated gene regulation, functioning in both positive and negative ways. For
example, CTCF may help to bring an enhancer closer in 3D space to a target promoter via its
ability to form intra-TAD loops with other CTCF sites. In contrast, binding of CTCF at a site
between an enhancer and promoter can, in some cases, block long-range regulation. To
determine if the PCa risk-associated CTCF anchor regions that I identified to be involved in
looping do in fact control the expression of specific genes, my next step was to use the
CRISPR/Cas9 system to delete PCa risk-associated CTCF anchor regions and then assess the
effects of these deletions on the transcriptome using RNA-seq (Figure 4-1; see also Table 2-2
for sequences of guide RNA). Unlike most PCa cells, 22Rv1 cells are diploid; therefore, I used
these cells for my CRISPR/Cas9 experiments.
Figure 4-1 Experimental workflow for functional investigation of PCa risk-associated CTCF sites.
Phase 1: Plasmids encoding guide RNAs that target sequences on each side of a PCa risk-associated CTCF site were
introduced into the PCa cell line 22Rv1 along with a Cas9 expression vector (see Chapter 2 for details). The resultant
cell pool was analyzed to determine deletion efficiency (red slashes represent alleles in each cell that harbor a CTCF
site deletion). Single cells were then selected and expanded into clonal populations for RNA-seq analysis. Phase 2:
After identifying the gene most responsive (within a ± 1-Mb window) to deletion of the region encompassing a risk-
associated CTCF site, plasmids encoding guide RNAs that target the risk-associated CTCF anchor region and/or the
regions encompassing the CTCF sites looped to the risk CTCF site and a Cas9 expression plasmid were introduced
into 22Rv1 cells; cell pools were analyzed by PCR to check deletion frequency and by RT-qPCR to measure
expression of the target gene.
Cells transfected with the CRISPR/Cas9 system become a heterogeneous population that
consists of wild-type cells, heterozygous knockout cells, and homozygous knockout cells. Clonal
expansion and screening are needed to obtain a homogeneous population of homozygous knockout
cells for RNA-seq (Phase 1 in Figure 4-1). However, the clonal expansion and screening
process is time consuming. Hence, after identifying putative target gene(s) in phase 1, I
introduced a phase 2, in which I used the results from phase 1 to guide my analysis of the effects of
CRISPR/Cas9-mediated targeting in a heterogeneous population using qPCR. Even though the
gene expression changes in the heterogeneous population may be subtle compared to those in
the clonal population, this approach allows me to check CRE function at a more rapid pace.
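The dilution of the signal in a cell pool can be illustrated with a small calculation. This is only a sketch under simplifying assumptions that are not taken from the experiments: edited cells are assumed to show the full clonal fold change, unedited cells are assumed to be unchanged, heterozygous cells are ignored, and the assay is assumed to average over all cells. The function name and the example numbers are hypothetical.

```python
def pooled_fold_change(clonal_fold_change: float, fraction_deleted: float) -> float:
    """Expected fold change of a mixed cell pool relative to unedited control.

    Simplifying assumptions (illustrative only): a fraction of cells carries
    the deletion and shows the clonal fold change, the remaining cells are
    unchanged (fold change of 1), and the measurement averages over all cells.
    """
    if not 0.0 <= fraction_deleted <= 1.0:
        raise ValueError("fraction_deleted must be between 0 and 1")
    return (1.0 - fraction_deleted) + fraction_deleted * clonal_fold_change


# Hypothetical example: a 100-fold clonal effect at different deletion frequencies.
for fraction in (0.1, 0.5, 1.0):
    print(fraction, pooled_fold_change(100.0, fraction))
```

Under these assumptions a strong clonal effect remains detectable in a pool even at a low deletion frequency, whereas a weak effect would be hard to distinguish from the control.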
Two risk CTCF binding sites that form loops were selected for functional follow ups
I chose to study two PCa risk-associated CTCF binding sites (CBS), one on Chr1 and one on
Chr12. These regions are both located in intergenic regions of the genome and thus are not
easily associated a priori with a specific target gene. Also, these regions are robustly bound by
CTCF in all nine CTCF peak sets and are identified as being involved in 3D chromatin looping in
all of the Hi-C or HiChIP datasets that I analyzed (Figure 4-3 A, Figure 4-6 A). The circos plot in
Figure 4-2 shows the genomic landscape of the two risk loci in 22Rv1 cells. Each locus harbors
multiple risk SNPs and risk CREs. 22Rv1 Hi-C identified many loops (shown in blue in the plot)
for the selected CREs. For each locus, two of the loops (out of the many) are reproducible in the
GM12878 Hi-C loops (S. S. Rao et al., 2014) and other independent data sets (shown in red in
the plot).
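One way to express "reproducible in an independent dataset" is to require that both anchors of a loop match the anchors of a loop in the reference set. The sketch below is an illustration, not the procedure used in this thesis: the 10-kb tolerance, the function names, and the toy coordinates are all assumptions.

```python
def anchors_match(a, b, tolerance):
    """True if intervals a and b (start, end) overlap or lie within `tolerance` bp."""
    return a[0] - tolerance <= b[1] and b[0] - tolerance <= a[1]


def reproducible_loops(query_loops, reference_loops, tolerance=10_000):
    """Keep query loops whose two anchors both match one reference loop.

    Each loop is a pair of anchors: ((left_start, left_end), (right_start, right_end)).
    """
    kept = []
    for left, right in query_loops:
        for ref_left, ref_right in reference_loops:
            if anchors_match(left, ref_left, tolerance) and anchors_match(right, ref_right, tolerance):
                kept.append((left, right))
                break
    return kept


# Toy coordinates (hypothetical): only the first query loop has a matching reference loop.
query = [((100_000, 105_000), (320_000, 325_000)),
         ((100_000, 105_000), (900_000, 905_000))]
reference = [((98_000, 103_000), (322_000, 327_000))]
print(reproducible_loops(query, reference))
```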
Although the chosen PCa risk-associated SNPs are not located precisely within the CTCF motif,
they are within the CTCF peaks. In a previous study of allele-specific differences in binding
strength of CTCF in 51 lymphoblastoid cell lines, the authors found that the majority of the
nucleotide changes associated with CTCF binding strength were within 1 kb of the CTCF-binding
motif (or in linkage disequilibrium with a variant within 1 kb of the motif), but very few were
actually in the CTCF motif itself (Ding et al., 2014).
Figure 4-2 Circos plots of risk loci on chromosome 1 (left) and chromosome 12 (right).
From outer circle: hg19 chromosome ideogram; fine-mapped PCa GWAS SNPs shown in black; risk H3K27Ac sites in
22Rv1 shown in green; risk CTCF binding sites in 22Rv1 shown in red; risk intra-chromosome loops in 22Rv1 shown
as blue links in the middle; risk high confidence loops shown as red links in the middle.
Functional validation of risk CBS in 1q21.3 locus identified target gene KCNN3
I began by deleting the CTCF anchor region on chr1 which is associated with the PCa risk-
associated SNP rs12144978. This SNP has a strong CTCF peak nearby, is located in an intergenic
region, and was identified to be involved in looping in five independent chromatin interaction
datasets (Figure 4-3 A). Hi-C data identified two high confidence risk loops (220 kb and 320 kb)
anchored by the PCa risk-associated CTCF site, referred to as site 1 in the following text; each loop
has convergent CTCF peaks at the anchor regions (Figure 4-3 B, C). Both loops were identified in
prostate Hi-C datasets as well as in GM12878 Hi-C and HiChIP datasets and can be visually
observed in the 22Rv1 Hi-C interaction heat map (blue circles in Figure 4-3 B). There are
additional CTCF sites near rs12144978. However, the other CTCF sites are 10 kb away from the
anchor region and therefore were not identified as being involved in statistically significant
loops with the prostate cancer risk-associated CTCF site; a browser snapshot of the CTCF ChIP-
seq data and the loops identified by Hi-C can be seen in Figure 4-10 A.
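The convergent orientation mentioned above can be stated as a simple rule: the CTCF motif at the upstream anchor lies on the forward strand and the motif at the downstream anchor lies on the reverse strand, so that the two motifs point toward each other. The following is a minimal sketch of that rule; the data layout and the strand labels in the example are hypothetical and are not the annotated orientations of sites 1 to 3.

```python
def is_convergent(upstream_strand: str, downstream_strand: str) -> bool:
    """True if the two anchor motifs point toward each other (+ upstream, - downstream)."""
    return upstream_strand == "+" and downstream_strand == "-"


# Hypothetical loops: (name, strand of upstream anchor motif, strand of downstream anchor motif).
loops = [
    ("loop_A", "+", "-"),  # convergent
    ("loop_B", "+", "+"),  # tandem
    ("loop_C", "-", "+"),  # divergent
]
for name, up, down in loops:
    print(name, is_convergent(up, down))
```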
Guide RNAs were introduced into 22Rv1 prostate cancer cells along with Cas9, and clonal
populations were analyzed to identify clones in which both Chr1 alleles were deleted for a
1607-bp region encompassing CTCF site 1. Using RNA-seq analysis of the clonal population, I
found that deletion of the anchor region harboring CTCF site 1 caused a large increase (almost
100-fold) in expression of KCNN3 (Figure 4-3 D), which is located within the loops anchored by
the PCa risk-associated CTCF site. Other genes within the same loops or within ± 1 Mb from the
risk CTCF site did not exhibit large changes in expression. However, other genes in the genome
did show changes in expression, most likely as an indirect effect of altered expression of the
nearby KCNN3 gene (Figure 4-9). To determine if deletion of the region encompassing CTCF site
3, which anchors the larger loop but does not have a PCa risk-associated SNP nearby, also
affected expression of KCNN3, I created clonal 22Rv1 cell populations having homozygous
deletion of a 913-bp region encompassing CTCF site 3. RNA-seq analysis revealed a modest
increase in the expression of KCNN3 in cells homozygously deleted for site 3 (Figure 4-3 E).
These data suggest that perhaps KCNN3 expression is regulated by maintaining its topological
associations within either the 220-kb or the 320-kb loop. If so, then deletion of the regions
encompassing both sites 2 and 3 may be required to see the same effect on KCNN3 expression
as seen upon deletion of site 1.
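Restricting the expression analysis to genes within ± 1 Mb of the deleted region amounts to an interval filter. The sketch below assumes that each gene is represented by a single transcription start site (TSS), which is a simplification. The deleted interval is the 1607-bp site 1 deletion (chr1:154850252-154851859) annotated in Figure 4-10; the gene names and TSS positions are hypothetical placeholders.

```python
WINDOW_BP = 1_000_000  # +/- 1 Mb, as in the analysis described above


def genes_in_window(gene_tss, chrom, site_start, site_end, window=WINDOW_BP):
    """Return names of genes whose TSS lies within `window` bp of the deleted site."""
    low, high = site_start - window, site_end + window
    return [name for name, (gene_chrom, tss) in gene_tss.items()
            if gene_chrom == chrom and low <= tss <= high]


# Hypothetical genes and TSS positions, for illustration only.
example_tss = {
    "gene_near": ("chr1", 154_700_000),
    "gene_far": ("chr1", 157_000_000),
    "gene_other_chrom": ("chr12", 154_700_000),
}
print(genes_in_window(example_tss, "chr1", 154_850_252, 154_851_859))
```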
Figure 4-3 KCNN3 is upregulated upon targeted deletion of the region encompassing the CTCF site near
rs12144978.
a A blowup of the CTCF peak information, genomic annotation, and looping information for rs12144978 from
Figure 3-2. b Hi-C chromatin interaction map of the region of chromosome 1 near rs12144978. The location of
the SNP is indicated by the blue line and arrow. The blue circles indicate the high confidence risk loops used in the
analysis. c Detailed schematic of the high confidence risk loops in which rs12144978 is involved, as identified by the
Hi-C chromatin interaction data. d Shown is the fold-change expression of all genes within a ± 1-Mb region near
rs12144978 in the cells deleted for a 1607-bp region encompassing the PCa risk-associated CTCF site (site 1); a
volcano plot illustrating the genome-wide analysis of the RNA-seq data can be found in Figure 4-9. The yellow X
indicates which CTCF site has been deleted. e Shown is the fold-change expression of all genes within a ± 1-Mb
region near rs12144978 in the cells deleted for a 913-bp region encompassing CTCF site 3. The yellow X indicates
which CTCF site has been deleted.
To test the effects of deletion of individual vs. multiple CTCF sites on KCNN3 gene expression, I
introduced guide RNAs (plus Cas9) to the regions encompassing CTCF sites 1, 2, or 3
individually, or guide RNAs targeting a combination of the regions, into 22Rv1 cells, harvested
the transfected cell pools, and then performed RT-qPCR to measure KCNN3 gene expression
(Figure 4-4). Introduction of the guide RNAs to delete a 1607-bp or 1221-bp region
encompassing CTCF site 1 elicited a 90-fold increase in KCNN3 expression, similar to the RNA-
seq result shown in Figure 4-3 D. Deletion of a 913-bp region encompassing site 3 showed a
modest (less than 2-fold) increase in expression of KCNN3 (similar to the RNA-seq results);
similar results were seen upon deletion of a 395-bp region encompassing site 2. Notably, the
combination of site 2 and 3 deletions did not cause a large increase in KCNN3 expression (~ 7-
fold). Rather, only when the region encompassing CTCF site 1 (which I identified as a PCa risk-
associated CTCF site) was deleted alone, or in combination with site 3, did KCNN3 expression
increase 100-fold.
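For readers unfamiliar with how RT-qPCR fold changes of this kind are typically derived, the sketch below uses the common 2^-ΔΔCt calculation. This is an assumption for illustration; this section does not state which quantification method was used, and the Ct values and function name are hypothetical.

```python
import math


def fold_change_ddct(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Relative expression by the 2^-ddCt method (assumes ~100% primer efficiency)."""
    delta_treated = ct_target_treated - ct_ref_treated    # normalize to reference gene
    delta_control = ct_target_control - ct_ref_control
    return 2.0 ** (-(delta_treated - delta_control))


# Hypothetical Ct values: the target amplifies 6 cycles earlier after deletion.
fc = fold_change_ddct(ct_target_treated=22.0, ct_ref_treated=18.0,
                      ct_target_control=28.0, ct_ref_control=18.0)
print(fc)             # fold change relative to the vector control
print(math.log2(fc))  # the same value on a log2 axis
```

Plotting on a log2 scale keeps a roughly 100-fold change and a less than 2-fold change readable on the same axis.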
Figure 4-4 Analysis of the rs12144978-associated chromatin loops.
Guide RNAs targeting regions encompassing CTCF site 1 (the PCa risk-associated CTCF site), CTCF site 2, and/or
CTCF site 3 (or the empty guide RNA vector as a control) were introduced into 22Rv1 prostate cancer cells, along
with Cas9. Cell pools were harvested, and KCNN3 expression was analyzed by RT-qPCR. Shown within the blue bars
is the fold change in KCNN3 expression in the pools that received guide RNAs vs. the vector control. The yellow X
indicates which CTCF site has been deleted.
I used the same sets of gRNAs to delete CTCF sites 1, 2, and 3 in HEK293T kidney cells and HAP1
chronic myelogenous leukemia cells. HEK293T and HAP1 cells have a similar CTCF binding
pattern at the 1q21.3 risk locus, but a slightly different H3K27Ac landscape (Figure 4-5 B). I
found that KCNN3 activation was only observed in 22Rv1 cells (Figure 4-5 A), suggesting that
the CTCF site 1-mediated regulation of KCNN3 might be prostate cell specific. Whether this
specific effect is related to differences in the CRE or in chromatin structure remains to be
explored.
[Figure 4-4 bar chart. Y axis: Cas9 CTRL, S1, S2, S3, S1+S3, S2+S3. X axis: KCNN3 relative expression level (log2 fold change). Fold-change labels as extracted: 83.7, 109.23, 6.68 (S2+S3), 1, 1.91, 1.40.]
Figure 4-5 Deletion of PCa risk-associated CTCF site 1 in different cell lines.
a Shown is the expression level of KCNN3 in control and cell pools transfected with guide RNAs targeting CTCF site
1, for 22Rv1, HAP1, and HEK293T cells. b Shown are CTCF and H3K27Ac ChIP-seq tracks for 22Rv1, HAP1, and
HEK293T cells.
Functional validation of risk CBS in 12q13.13 locus
I next assayed the effects of deletion of the region encompassing the CTCF site on chr12 with
the PCa risk-associated SNPs rs4919742 and rs902774. This PCa risk-associated CTCF peak,
referred to as site 4 in the following text, is also located in an intergenic region and was
identified to be involved in looping in five independent chromatin interaction datasets (Figure
4-6 A). Hi-C data identified two high confidence loops (300 kb and 715 kb) anchored by the PCa
risk-associated CTCF site; each loop has convergent CTCF peaks at the anchors (Figure 4-6 B, C).
Similar to the loops at CTCF site 1, both loops at CTCF site 4 were identified in prostate Hi-C
datasets as well as in GM12878 Hi-C data and can be visually observed in the Hi-C interaction
map (blue circles in Figure 4-6 B). Due to the higher resolution of the GM12878 Hi-C dataset,
the genomic locations of the anchor regions of the two high confidence risk loops were taken
from the GM12878 data. I note that there are additional CTCF sites near rs4919742. However,
the other sites were not identified to be in statistically significant high confidence loops linked
to the prostate cancer risk-associated CTCF site 4; a browser snapshot of the CTCF ChIP-seq
data and the loops identified by Hi-C can be seen in Figure 4-10 B.
Guide RNAs were introduced into 22Rv1 cells along with Cas9, and clonal populations were
analyzed to identify clones in which both chr12 alleles were deleted for a 2875-bp region
encompassing CTCF site 4. I found that deletion of this region caused a large increase in
expression of KRT78, KRT4, KRT79, and KRT80 (Figure 4-6 D). KRT78, KRT4, and KRT79 are
located within the 300-kb loop whereas KRT80 is outside of the 300-kb loop but within the
larger 715-kb loop, both of which are anchored by the PCa risk-associated CTCF site 4.
Figure 4-6 Deletion of the region encompassing the PCa risk-associated CTCF site near rs4919742 increases KRT
gene expression.
a A blowup of the CTCF peak information, genomic annotation, and looping information for rs4919742 from Figure
3-5. b Hi-C chromatin interaction map of the region of chromosome 12 near rs4919742. The location of the SNP is
indicated by the blue line and arrow. The blue circles indicate the high confidence risk loops used in the analysis. c
Detailed schematic of the high confidence risk loops in which rs4919742 is involved, as identified by the Hi-C
chromatin interaction data; there are 26 keratin genes within the loops. d Shown is the fold-change expression of
all genes within a ± 1-Mb region near rs4919742 in the cells deleted for a 2875-bp region encompassing the PCa
risk-associated CTCF site (site 4); a volcano plot illustrating the genome-wide analysis of the RNA-seq data can be
found in Figure 4-9. The yellow X indicates which CTCF site has been deleted.
To test the effects of deletion of individual vs. multiple CTCF sites on KRT gene expression, I
introduced guide RNAs (plus Cas9) to regions encompassing CTCF sites 4, 5, or 6 individually, or
guide RNAs that target a combination of the sites, into 22Rv1 cells, harvested the transfected
cell pools, and then performed RT-qPCR to measure KRT78 gene expression (Figure 4-7).
Introduction of guide RNAs that would delete a 2875-bp or a 1384-bp region encompassing PCa
risk-associated CTCF site 4 showed more than a 100-fold increase in KRT78 expression, similar
to the RNA-seq analyses shown in Figure 4-6 D. Deletion of 1969-bp and 5457-bp regions
encompassing CTCF sites 5 or 6, respectively (which are not associated with PCa), showed very
modest increases in expression of KRT78, whereas the combination of deletion of sites 5 and 6
did not increase KRT78 expression. The only large changes in KRT78 expression were in cells
deleted for the region encompassing CTCF site 4 alone or when deleted in combination with
other CTCF sites.
Figure 4-7 Analysis of the rs4919742-associated chromatin loops.
Guide RNAs targeting regions encompassing CTCF site 4 (the PCa risk-associated CTCF site), CTCF site 5, and/or
CTCF site 6 (or the empty guide RNA vector as a control) were introduced into 22Rv1 prostate cancer cells, along
with Cas9. Cell pools were harvested, and KRT78 expression was analyzed by RT-qPCR. Shown within the blue bars
is the fold change in KRT78 expression in the pools that received guide RNAs vs the vector control. The yellow X
indicates which CTCF site has been deleted.
Finally, I investigated the cell-type specificity of the response to deletion of site 4 by also
deleting this region using the same gRNA sets in HEK293T kidney cells and HAP1 chronic
myelogenous leukemia cells. The 12q13.13 locus has similar CTCF binding landscapes in
HEK293T, HAP1, and 22Rv1 cells. However, their H3K27Ac landscapes are distinct (Figure 4-8 B).
The KRT78 gene was upregulated (~ 25-fold) in both HEK293T and HAP1 cells when a
1.6-kb region encompassing PCa risk-associated CTCF site 4 on chr12 was deleted, but this
effect is modest compared to the more than 100-fold KRT78 increase in 22Rv1 cells (Figure 4-8 A),
suggesting that, again, this CRE might function in a prostate cell-specific manner. Additional
studies should be done to fully understand this specificity.
[Figure 4-7 bar chart. Y axis: Cas9 CTRL, S4, S5, S6, S4+S5, S4+S6, S5+S6. X axis: KRT78 relative expression level (log2 fold change). Fold-change labels as extracted: 118.19, 26.7, 34.39, 1, 1.72, 2.14, 0.89.]
Figure 4-8 Deletion of PCa risk-associated CTCF site 4 in different cell lines.
a Shown is the expression level of KRT78 in control and cell pools transfected with guide RNAs targeting CTCF site 4,
for 22Rv1, HAP1, and HEK293T cells. b Shown are CTCF and H3K27Ac ChIP-seq tracks for 22Rv1, HAP1, and
HEK293T cells.
Figure 4-9 Genome-wide RNA-seq analysis of cells deleted for PCa risk-associated CTCF sites.
Shown are volcano plots representing changes in gene expression, as compared to control 22Rv1 cells, in cells
having a homozygous deletion of CTCF PCa risk-associated site 1 (a) or site 4 (b).
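The volcano plots carry the annotation "p < 0.05 | FC > 1.5". A minimal sketch of how genes could be labeled with such cutoffs follows. It assumes that both criteria must be met and that the fold-change cutoff applies in both directions; neither assumption is confirmed by the figure, and the function name and example values are hypothetical.

```python
import math


def classify_gene(log2_fold_change, p_value, p_cutoff=0.05, fc_cutoff=1.5):
    """Label a gene as 'up', 'down', or 'not significant'.

    Assumes (for illustration) that a gene must pass both the p-value cutoff
    and the absolute fold-change cutoff to be called differentially expressed.
    """
    passes_p = p_value < p_cutoff
    passes_fc = abs(log2_fold_change) > math.log2(fc_cutoff)
    if not (passes_p and passes_fc):
        return "not significant"
    return "up" if log2_fold_change > 0 else "down"


# Hypothetical values, not taken from the RNA-seq data.
print(classify_gene(6.5, 1e-10))
print(classify_gene(-1.0, 0.01))
print(classify_gene(0.3, 0.001))
```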
PCa risk-associated CTCF loops may sequester genes from enhancers located outside the
loops
To gain insight into the mechanism by which the PCa risk-associated CTCF sites near SNPs
rs12144978 and rs4919742 may regulate expression of KCNN3 and KRT78, respectively, I
examined the pattern of H3K27Ac peaks in a large region surrounding each SNP (Figure 4-10).
Interestingly, in both cases, in the parental cells, the genomic regions within the loops that are
anchored by the PCa risk-associated SNP are devoid of the active enhancer mark H3K27Ac.
These are very large genomic regions (~200–600 kb) to lack any H3K27Ac peaks. This pattern
suggested two mechanisms by which these CTCF sites could potentially maintain expression of
KCNN3 and KRT78 at low levels. First, the loops may prevent activation of potential enhancers
by formation of a repressive chromatin structure. I determined that the loop regions anchored
by the two PCa risk-associated CTCF sites (site 1 on chr1 and site 4 on chr12) are both covered
by H3K27me3, which is known to be associated with polycomb-mediated gene silencing
(Conway, Healy, & Bracken, 2015); deletion of the risk-associated CTCF sites may result in the
formation of new enhancers within these previously repressed regions. Alternatively, the PCa
risk-associated CTCF sites may prevent the promoters of the KCNN3 and KRT78 genes from
interacting with a pre-existing active enhancer(s) located outside the loop (in this case, the
enhancer would be marked by H3K27Ac in both control and CRISPR-deleted cells).
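Whether a loop interior is devoid of H3K27Ac can be checked by testing for peaks that overlap the interval between the two anchors. The sketch below assumes half-open (start, end) coordinates on a single chromosome; the function names, peak list, and loop coordinates are hypothetical and do not reproduce the ChIP-seq data.

```python
def overlapping_peaks(peaks, start, end):
    """Return peaks (start, end) that overlap the half-open interval [start, end)."""
    return [(s, e) for (s, e) in peaks if s < end and e > start]


def is_enhancer_desert(peaks, start, end):
    """True if no H3K27Ac peak overlaps the region."""
    return not overlapping_peaks(peaks, start, end)


# Hypothetical peaks flanking, but not inside, a hypothetical loop interior.
h3k27ac_peaks = [(90_000, 91_500), (410_000, 412_000)]
loop_interior = (100_000, 400_000)
print(is_enhancer_desert(h3k27ac_peaks, *loop_interior))
```

Running the same check on peaks called before and after deletion of a CTCF site would distinguish the two mechanisms described above: newly formed enhancers would appear inside the loop only in the deleted cells, whereas a pre-existing enhancer outside the loop would be present in both.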
Figure 4-10 PCa risk-associated CTCF loops encompass enhancer deserts.
Shown are genome browser snapshots of CTCF, CTCF motifs with orientation, H3K27Ac, and H3K27me3 ChIP-seq
data for the regions near chromatin loops associated with the rs12144978 (a) or rs4919742 (b) risk SNPs. In each
panel, the H3K27Ac ChIP-seq track for cells deleted for the region encompassing the PCa risk-associated SNP is also
shown. Also shown are all fine-mapped SNPs in each locus and the high confidence risk loops identified by Hi-C
chromatin interaction data anchored by each SNP and the RefSeq gene track. Insets show blowups of the regions
containing the PCa risk-associated CTCF sites and PCa risk-associated H3K27Ac sites at each locus.
To distinguish these possibilities, I performed H3K27Ac ChIP-seq in clonal populations of cells
homozygously deleted for either the PCa risk-associated CTCF site 1 on chr1 or the site 4 on
chr12. Interestingly, I found that the regions remained as enhancer deserts, even after deletion
of the PCa risk-associated CTCF sites (Figure 4-10). This result is consistent with a previous
study in which global CTCF degradation did not lead to dramatic H3K27me3 decrease (Nora et
al., 2017). My data support a model in which the PCa risk-associated CTCF-mediated loops
insulate the KCNN3 and KRT78 promoters from nearby pre-existing active enhancers. Removing
the CBS leads to the loss of insulation, and promoters in the loop can now form ectopic contacts
with the nearby active enhancers, thus resulting in increased expression (Figure 4-11 A).
[Figure 4-10 browser tracks (22Rv1 cells): SNP, high confidence risk loop, CTCF, CTCF motif (red forward, blue reverse), H3K27Ac, H3K27me3, and H3K27Ac after deletion. Inset annotations: site 1 CRISPR deletion, chr1:154850252-154851859 (1607 bp); site 4 CRISPR deletion, chr12:53272450-53275325 (2875 bp).]
Similarly, the GWAS SNPs might increase risk by changing the binding affinity of CTCF at specific
sites, thus changing disease susceptibility-related gene expression (Figure 4-11 B).
Figure 4-11 PCa risk-associated CTCF loops may sequester genes from enhancers located outside the loops.
a Shown is a model for enhancer adoption that may occur upon deletion of a PCa risk-associated CTCF site; in this
model, the entire CTCF binding site (CBS) is removed and therefore the loop is broken. b Shown is a model for
enhancer adoption that occurs only at the allele that has less binding affinity for CTCF (due to the SNP); in this
model, the CBS remains, but the loop is broken due to reduced affinity of CTCF protein for the site.
For both CTCF site 1 and site 4, there are risk enhancers adjacent to the loop anchors (Figure
4-10, insets showing the H3K27Ac peaks). To test whether the previously insulated promoters of
the KCNN3 and KRT78 genes can form ectopic enhancer contacts upon CBS knockout, I used the
CRISPR/Cas9 system again to target 4 nearby active enhancers in the CTCF site 1 knockout cells
(Figure 4-12). RT-qPCR analysis of the KCNN3 mRNA in the transfected, heterogeneous cell
population showed ~20% decreased expression (Figure 4-12). Although these effects are
modest, the results suggest that these enhancers might contribute to increased KCNN3 gene
expression in the CTCF site 1-deleted cells. Further studies using homozygous knockouts and
combinations of enhancer deletions are needed to fully understand the effects of the different
enhancers.
Figure 4-12 Deletion of putative adopted enhancers decreases KCNN3 expression in site 1-deleted cells.
Guide RNAs that flanked H3K27Ac sites A, B, C, or D were transfected (along with Cas9) into cells that were deleted
for the CTCF site 1; RNA was harvested and analyzed for KCNN3 expression using qPCR. Deletion of each of the
nearby enhancers caused a decrease of ~20% in KCNN3 expression.
KCNN3 and KRT genes and prostate cancer
In this chapter, I identified KCNN3 and four KRT genes as targets of the two risk CBS. A link
between KCNN3, also known as SK3, and prostate cancer biology has been previously observed.
KCNN3 is a calcium-activated potassium channel that has been shown to enhance tumor cell
invasion in breast cancer and malignant melanoma (Potier et al., 2006). For example, Chantome
et al. (Chantome et al., 2013) have shown that the majority of breast and prostate cancer
samples from primary tumors or bone metastases (but not normal tissues) are positive for
KCNN3. Of note, the shRNA-mediated reduction of KCNN3 RNA did not result in changes in cell
proliferation, but rather resulted in a lower number of bone metastases in a nude mouse model
system. The bone is the most frequent site of prostate carcinoma metastasis with skeletal
metastases identified at autopsy in up to 90% of patients
dying from prostate carcinoma (Abrams, Spiro, &
Goldstein, 1950; Bubendorf et al., 2000; Rana et al., 1993).
Taken together with previous studies, our work suggests
that binding of CTCF to rs12144978 may, via its repressive
role on KCNN3 expression, play a protective role regarding
human prostate cancer. Of clinical relevance, edelfosine, a
glycerophospholipid with antitumoral properties that
inhibits SK3 channel activity, can inhibit migration and
invasion of cancer cells in vitro and in vivo in an SK3-
dependent manner, pointing towards a possible use of
edelfosine in prostate cancer treatment (Berthe et al.,
2016; Girault et al., 2011; Potier et al., 2011; Steinestel et
al., 2016).
The KRT genes encode keratin proteins that are
related to the keratinization/cornification process
mentioned in Figure 3-7. Although KRT78, KRT79, KRT80,
and KRT4 have not previously been associated with prostate
cancer, keratins have been identified as diagnostic markers
for metastatic melanoma (Wang, Li, & Chen, 2018) and
cervical cancers (Varga et al., 2017). Keratin expression is strongly associated with squamous cell
carcinoma (Rosendahl et al., 2012), which is a rare but aggressive form of prostate cancer (Malik
et al., 2011). Notably, KRT4 expression is much higher in normal
prostate tissue than in TCGA prostate adenocarcinoma (PRAD) tumors (Cancer Genome Atlas Research, 2015;
GTEx Consortium, 2015; Tang et al., 2017) (Figure 4-13), suggesting that the risk variants in this
locus may play a disruptive role regarding prostate cancer.
Figure 4-13 KRT4 expression in TCGA Prostate Adenocarcinoma (PRAD) vs. matching normal tissue and GTEx normal prostate tissues.
[Plot annotations: PRAD, n(Tumor) = 492, n(Normal) = 152.]
Chapter 5. Summary and Discussion
Some of the content in this chapter has been published in Guo, Y., Perez, A. A., Hazelett, D. J.,
Coetzee, G. A., Rhie, S. K., & Farnham, P. J. (2018). CRISPR-mediated deletion of prostate cancer
risk-associated CTCF loop anchors identifies repressive chromatin loops. Genome biology, 19(1),
160.
The role of CTCF in regulating the cancer transcriptome
I performed a comprehensive analysis of the regulatory potential of 2,181 fine-mapped PCa
risk-associated SNPs, identifying a subset of these SNPs that fall within DHS sites located within
either a H3K27Ac peak or a CTCF binding site defined by ChIP-seq datasets I produced for
normal and tumor prostate cells. After selecting the fine-mapped SNPs that fall within these
active regulatory regions, I next identified the subset of SNPs that lie within an anchor region of
a chromatin loop, using in situ Hi-C data from normal and tumor prostate cells. Using this
information, I predicted a set of target genes that are regulated by PCa risk-related H3K27Ac-
marked enhancers (Table 3-4, Table 3-5). Finally, I used CRISPR-mediated deletion to remove
CTCF anchor regions that encompass PCa risk-associated CTCF sites and also deleted regions
encompassing CTCF sites that fall within the anchor regions of the other ends of the loops. I
found that deletion of the region encompassing a PCa risk-associated CTCF site on chr1 or the
region encompassing a PCa risk-associated site on chr12 turns on a nearby gene located in an
enhancer desert. These results suggest that these two PCa risk-associated CTCF sites may
function by encaging cancer-relevant genes in repressive loops.
My studies can be put into context by discussion of other work that has focused on depletion of
CTCF or components of the cohesin complex from the cell. Nora et al. (Nora et al., 2017)
identified only a small number of genes (~ 200) that were upregulated more than 10-fold after
removal of CTCF from mES cells using an auxin degron system. The authors noted that not all
genes within a TAD responded in the same way to CTCF depletion and concluded that depletion
of CTCF triggers upregulation of genes that are normally insulated from neighboring enhancers
by a TAD boundary. Similarly, Rao et al. (S. S. P. Rao et al., 2017) found that auxin-mediated
depletion of RAD21 (a core component of cohesin) in HCT116 colon cancer cells led to the
upregulation of a small number of genes (~ 200 genes showed at least a 30% increase in
expression). These analyses of the transcriptional consequences of CTCF or RAD21 depletion
are similar to my studies of CRISPR-mediated CTCF site deletion. I found that deletion of non-
risk CTCF anchor regions had very modest effects on target gene regulation, even though Hi-C
data show interaction of these CTCF anchor regions with the PCa risk-associated CTCF sites.
However, the degree of upregulation that I observed upon deletion of the regions
encompassing PCa risk-associated CTCF sites is much greater than the majority of the effects
observed in the previous studies.
It is not yet clear if other GWAS-identified CTCF sites will also be important in gene regulation.
Gallagher et al. have proposed that a CTCF site near a SNP involved in risk for frontotemporal
lobar degeneration creates a loop that enhances expression of TMEM106B; however, because
the CTCF site was not deleted, the actual effect of the site on gene expression is not known
[38]. Several groups have studied other disease-related CTCF sites [39]. In most cases, the CTCF
sites have resided within a TAD boundary element and, when these sites are deleted, only
modest upregulation of a nearby gene has occurred. Thus, the very large increases in gene
expression that I observed upon deletion of regions encompassing PCa risk-associated CTCF
sites suggest that these CTCF sites may be members of a small set of sites that are highly critical
for robust gene regulation.
The enhancer adoption hypothesis.
In this study, I hypothesized that the upregulation of susceptibility-related genes upon CTCF site deletion is the result of
enhancer adoption. The enhancer adoption hypothesis has been proposed in various studies before.
For example, deletion of a TAD boundary was shown to increase expression of PAX3, WNT6,
and IHH (Lupianez et al., 2015) via a proposed mechanism of enhancer adoption made possible
by removal of a repressive loop. Enhancer adoption has also been linked to AML/MDS,
MonoMAC/Emberger syndromes, and medulloblastoma (Groschel et al., 2014; Northcott et al., 2014).
Also, investigators have shown that elimination of a boundary site of an insulated
neighborhood can modestly activate expression of an oncogene (Hnisz, Day, & Young, 2016;
Hnisz, Weintraub, et al., 2016). Other examples of enhancer adoption include a modest
upregulation of the Fbn2 gene when a CTCF site located 230 kb downstream is deleted (de Wit
et al., 2015) and a 3-fold increase in PDGFRA expression upon deletion of a CTCF site (Flavahan
et al., 2016). Interestingly, Ibn-Salem et al. searched the Human Phenotype Ontology database
and identified 922 deletion cases in which tissue-specific enhancers have been brought into the
vicinity of developmental genes as a consequence of a deletion that removed a TAD boundary.
They predicted that 11% of the phenotype effects of the deletions could be best explained by
enhancer adoption that occurs upon removal of a TAD boundary (Ibn-Salem et al., 2014). Future
studies that test these predictions would help to understand the global significance of
repressive 3D chromatin loops.
As noted above, I observed profound effects on KCNN3 and KRT78 gene expression when I
deleted regions encompassing CTCF sites related to increased risk for PCa. KCNN3 and
KRT78 are each located within large genomic regions that are devoid of the H3K27Ac mark. The
upregulation of KCNN3 and KRT78 upon deletion of the risk-associated CTCF regions could be
due to the creation of new active enhancers in the previous enhancer deserts, which are
covered by the repressive H3K27me3 mark in control cells. Alternatively, it has previously been
proposed that CTCF can limit gene expression by sequestering a gene within a loop and
preventing it from being regulated by nearby enhancers (Flavahan et al., 2016; Grimmer &
Costello, 2016). Therefore, it was possible that pre-existing enhancers, located outside the
enhancer deserts, gain access to the promoters of KCNN3 and KRT78 genes after deletion of the
regions encompassing the risk CTCF sites (i.e., an enhancer adoption model).
Using H3K27Ac ChIP-seq analysis of clonal cell populations homozygously deleted for the
regions encompassing the risk-associated CTCF sites, I showed that new active enhancers are
not created within the large enhancer deserts (Figure 4-10). Therefore, it is likely that the
increased expression of KCNN3 and KRT78 is due to adoption of an existing enhancer, not
creation of a new enhancer. Interestingly, through my analysis of H3K27Ac sites associated with
PCa risk, I identified an H3K27Ac site overlapping several PCa risk-associated SNPs that is ~70
kb from the KCNN3 transcription start site and an H3K27Ac site overlapping several PCa risk-
associated SNPs that is ~60 kb upstream from the KRT78 transcription start site. I note that in
each case, the PCa risk-associated H3K27Ac site is the closest H3K27Ac site to the deleted CTCF
site and is the first H3K27Ac at the edge of the enhancer desert. Thus, these PCa risk-associated
H3K27Ac sites may be involved in “enhancer adoption” by the promoters of the KCNN3 and
KRT78 genes in the cells deleted for the PCa risk-associated CTCF sites. Additional studies are
required using cells homozygously deleted for these enhancers to analyze their effects on gene
regulation.
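The comparison described above (checking whether deletion clones gain H3K27Ac peaks inside an enhancer desert, and finding the H3K27Ac site closest to a deleted CTCF site) can be sketched in a few lines of Python. This is a minimal illustration only, not the analysis pipeline used in this study: the function names are hypothetical, the coordinates are invented for the example, and peaks are assumed to be simple (start, end) intervals on a single chromosome.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) on one chromosome, half-open


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def new_peaks_in_region(control: List[Interval],
                        deleted: List[Interval],
                        region: Interval) -> List[Interval]:
    """Peaks from the deletion clone that lie in `region` and overlap no control peak."""
    return [p for p in deleted
            if overlaps(p, region) and not any(overlaps(p, c) for c in control)]


def nearest_peak(peaks: List[Interval], site: Interval) -> Interval:
    """Peak separated from `site` by the smallest gap (0 if they overlap)."""
    def gap(p: Interval) -> int:
        return max(0, p[0] - site[1], site[0] - p[1])
    return min(peaks, key=gap)


# Hypothetical coordinates, for illustration only (not the thesis data).
desert = (100_000, 400_000)        # region lacking H3K27Ac in control cells
ctcf_site = (410_000, 410_500)     # deleted risk-associated CTCF region
control_peaks = [(50_000, 52_000), (450_000, 453_000)]
deletion_peaks = [(50_100, 51_900), (450_200, 452_800)]

gained = new_peaks_in_region(control_peaks, deletion_peaks, desert)
closest = nearest_peak(control_peaks, ctcf_site)
print("new peaks in desert:", gained)
print("closest pre-existing peak:", closest)
```

In this toy example no new peak appears inside the desert, so the pre-existing peak nearest the deleted site would be the candidate for adoption, which mirrors the reasoning in the text. A real analysis would work from peak calls per chromosome and would need a threshold for what counts as a gained peak.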
Identification of transcription factors linked to increased risk of prostate cancer.
This study can be extended in several directions. By thoroughly
annotating GWAS SNPs that are within coding regions, within promoters, or within enhancers
or CTCF sites that are looped to promoters, I have identified ~200 genes that may be involved in
increased risk of prostate cancer. Of these, 21 are known transcription factors.
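The selection rule described above can be illustrated with a short Python sketch. It is a simplified reading of the text, not the annotation code used in this study: the `SnpRow` class and `is_prioritized` function are hypothetical, the "Loop" flag is assumed to mean that the element loops to a promoter, and the "Exon" annotation is treated as coding even though some exon rows in Appendix 1 are 3' UTRs. The example rows are taken from Appendix 1.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SnpRow:
    snp: str
    annotation: str   # e.g. "Exon", "Promoter", "Intron", "TTS", "Others"
    h3k27ac: bool     # SNP overlaps an H3K27Ac site
    ctcf: bool        # SNP overlaps a CTCF site
    loop: bool        # element participates in a chromatin loop (assumed: to a promoter)


def is_prioritized(row: SnpRow) -> bool:
    """Simplified rule: exon or promoter SNPs, or looped enhancer/CTCF SNPs."""
    if row.annotation in ("Exon", "Promoter"):
        return True
    return row.loop and (row.h3k27ac or row.ctcf)


# Rows copied from Appendix 1 (Y/N flags converted to booleans).
rows: List[SnpRow] = [
    SnpRow("rs7098889", "Others", True, False, True),
    SnpRow("rs34032774", "Promoter", True, False, False),
    SnpRow("rs12570611", "Others", True, False, False),
    SnpRow("rs1247938", "Others", False, True, False),
    SnpRow("rs1048374", "Exon", False, True, True),
]

selected = [r.snp for r in rows if is_prioritized(r)]
print(selected)
```

Under this rule, a SNP in an unlooped enhancer or CTCF site (such as rs12570611 or rs1247938) is not carried forward, whereas looped elements and promoter or exon SNPs are. Mapping the selected SNPs to genes would additionally require the loop anchors and gene annotations, which are not modeled here.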
One promising direction will be to understand the role of risk-related transcription factors. Dixit
et al. recently developed Perturb-seq, a technique that allows researchers to use a CRISPR
screening system to perturb multiple transcription factors at once and to use single-cell RNA-seq to
examine the resulting transcriptome changes. Perturb-seq enables the study of gene regulatory circuits in a
high-throughput manner. Using Perturb-seq to study these PCa susceptibility-related genes
might shed light on the disease mechanism.
Recently, new studies have pushed the total number of prostate cancer risk loci to around 170 (Schumacher
et al., 2018). Annotation of the novel loci might uncover more risk CREs that play a vital role in
gene regulation.
References
Abrams, H. L., Spiro, R., & Goldstein, N. (1950). Metastases in carcinoma; analysis of 1000
autopsied cases. Cancer, 3(1), 74–85. Retrieved from
https://www.ncbi.nlm.nih.gov/pubmed/15405683
Akdemir, K. C., & Chin, L. (2015). HiCPlotter integrates genomic data with interaction matrices.
Genome Biol, 16, 198. https://doi.org/10.1186/s13059-015-0767-1
Al Olama, A. A., Kote-Jarai, Z., Berndt, S. I., Conti, D. V., Schumacher, F., Han, Y., … Haiman, C. A.
(2014). A meta-analysis of 87,040 individuals identifies 23 new susceptibility loci for
prostate cancer. Nature Genetics. https://doi.org/10.1038/ng.3094
Amin Al Olama, A., Dadaev, T., Hazelett, D. J., Li, Q., Leongamornlert, D., Saunders, E. J., … Kote-
Jarai, Z. (2015). Multiple novel prostate cancer susceptibility signals identified by fine-
mapping of known risk loci among Europeans. Hum Mol Genet, 24(19), 5589–5602.
https://doi.org/10.1093/hmg/ddv203
Ay, F., Bailey, T. L., & Noble, W. S. (2014). Statistical confidence estimation for Hi-C data reveals
regulatory chromatin contacts. Genome Res, 24(6), 999–1011.
https://doi.org/10.1101/gr.160374.113
Bello, D., Webber, M. M., Kleinman, H. K., Wartinger, D. D., & Rhim, J. S. (1997). Androgen
responsive adult human prostatic epithelial cell lines immortalized by human
papillomavirus 18. Carcinogenesis, 18(6), 1215–1223. Retrieved from
https://www.ncbi.nlm.nih.gov/pubmed/9214605
Berndt, S. I., Wang, Z., Yeager, M., Alavanja, M. C., Albanes, D., Amundadottir, L., … Blot, W. J.
(2015). Two susceptibility loci identified for prostate cancer aggressiveness. Nature
Communications. https://doi.org/10.1038/ncomms7889
Berthe, W., Sevrain, C. M., Chantome, A., Bouchet, A. M., Gueguinou, M., Fourbon, Y., … Jaffres,
P. A. (2016). New Disaccharide-Based Ether Lipids as SK3 Ion Channel Inhibitors.
ChemMedChem, 11(14), 1531–1539. https://doi.org/10.1002/cmdc.201600147
Bubendorf, L., Schopfer, A., Wagner, U., Sauter, G., Moch, H., Willi, N., … Mihatsch, M. J. (2000).
Metastatic patterns of prostate cancer: an autopsy study of 1,589 patients. Hum Pathol,
31(5), 578–583. Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/10836297
Cancer Genome Atlas Research Network. (2015). The Molecular Taxonomy of Primary Prostate
Cancer. Cell, 163(4), 1011–1025. https://doi.org/10.1016/j.cell.2015.10.025
Chantome, A., Potier-Cartereau, M., Clarysse, L., Fromont, G., Marionneau-Lambot, S.,
Gueguinou, M., … Vandier, C. (2013). Pivotal role of the lipid Raft SK3-Orai1 complex in
human cancer cell migration and bone metastases. Cancer Res, 73(15), 4852–4861.
https://doi.org/10.1158/0008-5472.CAN-12-4572
Chen, X., Xu, H., Yuan, P., Fang, F., Huss, M., Vega, V. B., … Ng, H. H. (2008). Integration of
external signaling pathways with the core transcriptional network in embryonic stem cells.
Cell, 133(6), 1106–1117. https://doi.org/10.1016/j.cell.2008.04.043
Conway, E., Healy, E., & Bracken, A. P. (2015). PRC2 mediated H3K27 methylations in cellular
identity and cancer. Curr Opin Cell Biol, 37, 42–48.
https://doi.org/10.1016/j.ceb.2015.10.003
Creyghton, M. P., Cheng, A. W., Welstead, G. G., Kooistra, T., Carey, B. W., Steine, E. J., …
Jaenisch, R. (2010). Histone H3K27ac separates active from poised enhancers and predicts
developmental state. Proceedings of the National Academy of Sciences of the United States
of America, 107, 21931–21936. https://doi.org/10.1073/pnas.1016071107
de Wit, E., Vos, E. S., Holwerda, S. J., Valdes-Quezada, C., Verstegen, M. J., Teunissen, H., … de
Laat, W. (2015). CTCF Binding Polarity Determines Chromatin Looping. Mol Cell, 60(4),
676–684. https://doi.org/10.1016/j.molcel.2015.09.023
Decker, K. F., Zheng, D., He, Y., Bowman, T., Edwards, J. R., & Jia, L. (2012). Persistent androgen
receptor-mediated transcription in castration-resistant prostate cancer under androgen-
deprived conditions. Nucleic Acids Res, 40(21), 10765–10779.
https://doi.org/10.1093/nar/gks888
Ding, Z., Ni, Y., Timmer, S. W., Lee, B. K., Battenhouse, A., Louzada, S., … Birney, E. (2014).
Quantitative genetics of CTCF binding reveal local sequence effects and different modes of
X-chromosome association. PLoS Genet, 10(11), e1004798.
https://doi.org/10.1371/journal.pgen.1004798
Dobin, A., Davis, C. A., Schlesinger, F., Drenkow, J., Zaleski, C., Jha, S., … Gingeras, T. R. (2013).
STAR: ultrafast universal RNA-seq aligner. Bioinformatics, 29(1), 15–21.
https://doi.org/10.1093/bioinformatics/bts635
Eeles, R. A., Kote-Jarai, Z., Al Olama, A. A., Giles, G. G., Guy, M., Severi, G., … Easton, D. F.
(2009). Identification of seven new prostate cancer susceptibility loci through a genome-
wide association study. Nature Genetics. https://doi.org/10.1038/ng.450
Elkon, R., & Agami, R. (2017). Characterization of noncoding regulatory DNA in the human
genome. Nature Biotechnology. https://doi.org/10.1038/nbt.3863
Flavahan, W. A., Drier, Y., Liau, B. B., Gillespie, S. M., Venteicher, A. S., Stemmer-Rachamimov,
A. O., … Bernstein, B. E. (2016). Insulator dysfunction and oncogene activation in IDH
mutant gliomas. Nature, 529(7584), 110–114. https://doi.org/10.1038/nature16490
Girault, A., Haelters, J. P., Potier-Cartereau, M., Chantome, A., Pinault, M., Marionneau-Lambot,
S., … Vandier, C. (2011). New alkyl-lipid blockers of SK3 channels reduce cancer cell
migration and occurrence of metastasis. Curr Cancer Drug Targets, 11(9), 1111–1125.
Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/21999627
Gomez-Marin, C., Tena, J. J., Acemel, R. D., Lopez-Mayorga, M., Naranjo, S., de la Calle-
Mustienes, E., … Gomez-Skarmeta, J. L. (2015). Evolutionary comparison reveals that
diverging CTCF sites are signatures of ancestral topological associating domains borders.
Proc Natl Acad Sci U S A, 112(24), 7542–7547. https://doi.org/10.1073/pnas.1505463112
Grimmer, M. R., & Costello, J. F. (2016). Cancer: Oncogene brought into the loop. Nature,
529(7584), 34–35. https://doi.org/10.1038/nature16330
Groschel, S., Sanders, M. A., Hoogenboezem, R., de Wit, E., Bouwman, B. A., Erpelinck, C., …
Delwel, R. (2014). A single oncogenic enhancer rearrangement causes concomitant EVI1
and GATA2 deregulation in leukemia. Cell, 157(2), 369–381.
https://doi.org/10.1016/j.cell.2014.02.019
GTEx Consortium. (2015). Human genomics. The Genotype-Tissue Expression (GTEx) pilot
analysis: multitissue gene regulation in humans. Science, 348(6235), 648–660.
https://doi.org/10.1126/science.1262110
Guo, Y., Xu, Q., Canzio, D., Shou, J., Li, J., Gorkin, D. U., … Wu, Q. (2015). CRISPR Inversion of
CTCF Sites Alters Genome Topology and Enhancer/Promoter Function. Cell, 162(4), 900–
910. https://doi.org/10.1016/j.cell.2015.07.038
Han, Y., Hazelett, D. J., Wiklund, F., Schumacher, F. R., Stram, D. O., Berndt, S. I., … Haiman, C. A.
(2015). Integration of multiethnic fine-mapping and genomic annotation to prioritize
candidate functional SNPs at prostate cancer susceptibility regions. Hum Mol Genet,
24(19), 5603–5618. https://doi.org/10.1093/hmg/ddv269
Han, Y., Rand, K. A., Hazelett, D. J., Ingles, S. A., Kittles, R. A., Strom, S. S., … Haiman, C. A.
(2016). Prostate Cancer Susceptibility in Men of African Ancestry at 8q24. J Natl Cancer
Inst, 108(7). https://doi.org/10.1093/jnci/djv431
Hazelett, D. J., Conti, D. V, Han, Y., Al Olama, A. A., Easton, D., Eeles, R. A., … Coetzee, G. A.
(2016). Reducing GWAS Complexity. Cell Cycle, 15(1), 22–24.
https://doi.org/10.1080/15384101.2015.1120928
Hazelett, D. J., Rhie, S. K., Gaddis, M., Yan, C., Lakeland, D. L., Coetzee, S. G., … Coetzee, G. A.
(2014). Comprehensive functional annotation of 77 prostate cancer risk loci. PLoS Genet,
10(1), e1004102. https://doi.org/10.1371/journal.pgen.1004102
Hnisz, D., Day, D. S., & Young, R. A. (2016). Insulated Neighborhoods: Structural and Functional
Units of Mammalian Gene Control. Cell, 167(5), 1188–1200.
https://doi.org/10.1016/j.cell.2016.10.024
Hnisz, D., Weintraub, A. S., Day, D. S., Valton, A. L., Bak, R. O., Li, C. H., … Young, R. A. (2016).
Activation of proto-oncogenes by disruption of chromosome neighborhoods. Science,
351(6280), 1454–1458. https://doi.org/10.1126/science.aad9024
Horoszewicz, J. S., Leong, S. S., Chu, T. M., Wajsman, Z. L., Friedman, M., Papsidero, L., …
Sandberg, A. A. (1980). The LNCaP cell line--a new model for studies on human prostatic
carcinoma. Prog Clin Biol Res, 37, 115–132. Retrieved from
https://www.ncbi.nlm.nih.gov/pubmed/7384082
Ibn-Salem, J., Kohler, S., Love, M. I., Chung, H. R., Huang, N., Hurles, M. E., … Robinson, P. N.
(2014). Deletions of chromosomal regulatory boundaries are associated with congenital
disease. Genome Biol, 15(9), 423. https://doi.org/10.1186/s13059-014-0423-1
Korenchuk, S., Lehr, J. E., L, M. C., Lee, Y. G., Whitney, S., Vessella, R., … Pienta, K. J. (2001).
VCaP, a cell-based model system of human prostate cancer. In Vivo, 15(2), 163–168.
Retrieved from https://www.ncbi.nlm.nih.gov/pubmed/11317522
Krzywinski, M., Schein, J., Birol, I., Connors, J., Gascoyne, R., Horsman, D., … Marra, M. A.
(2009). Circos: an information aesthetic for comparative genomics. Genome Res, 19(9),
1639–1645. https://doi.org/10.1101/gr.092759.109
Li, Y., Rivera, C. M., Ishii, H., Jin, F., Selvaraj, S., Lee, A. Y., … Ren, B. (2014). CRISPR reveals a
distal super-enhancer required for Sox2 expression in mouse embryonic stem cells. PLoS
One, 9(12), e114485. https://doi.org/10.1371/journal.pone.0114485
Luo, Z., Rhie, S. K., Lay, F. D., & Farnham, P. J. (2017). A Prostate Cancer Risk Element Functions
as a Repressive Loop that Regulates HOXA13. Cell Rep, 21(6), 1411–1417.
https://doi.org/10.1016/j.celrep.2017.10.048
Lupianez, D. G., Kraft, K., Heinrich, V., Krawitz, P., Brancati, F., Klopocki, E., … Mundlos, S.
(2015). Disruptions of topological chromatin domains cause pathogenic rewiring of gene-
enhancer interactions. Cell, 161(5), 1012–1025. https://doi.org/10.1016/j.cell.2015.04.004
McCarthy, D. J., Chen, Y., & Smyth, G. K. (2012). Differential expression analysis of multifactor
RNA-Seq experiments with respect to biological variation. Nucleic Acids Res, 40(10), 4288–
4297. https://doi.org/10.1093/nar/gks042
Mumbach, M. R., Rubin, A. J., Flynn, R. A., Dai, C., Khavari, P. A., Greenleaf, W. J., & Chang, H. Y.
(2016). HiChIP: efficient and sensitive analysis of protein-directed genome architecture.
Nat Methods, 13(11), 919–922. https://doi.org/10.1038/nmeth.3999
Nora, E. P., Goloborodko, A., Valton, A. L., Gibcus, J. H., Uebersohn, A., Abdennur, N., …
Bruneau, B. G. (2017). Targeted Degradation of CTCF Decouples Local Insulation of
Chromosome Domains from Genomic Compartmentalization. Cell, 169(5), 930–944 e22.
https://doi.org/10.1016/j.cell.2017.05.004
Northcott, P. A., Lee, C., Zichner, T., Stutz, A. M., Erkek, S., Kawauchi, D., … Pfister, S. M. (2014).
Enhancer hijacking activates GFI1 family oncogenes in medulloblastoma. Nature,
511(7510), 428–434. https://doi.org/10.1038/nature13379
O’Brien, J. M. (2000). Environmental and heritable factors in the causation of cancer: analyses of
cohorts of twins from Sweden, Denmark, and Finland. Survey of Ophthalmology.
https://doi.org/10.1016/S0039-6257(00)00165-X
O’Geen, H., Echipare, L., & Farnham, P. J. (2011). Using ChIP-Seq Technology to Generate High-
Resolution Profiles of Histone Modifications. Methods Mol Biol, 791, 265–286.
https://doi.org/10.1007/978-1-61779-316-5_20
O’Geen, H., Frietze, S., & Farnham, P. J. (2010). Using ChIP-seq technology to identify targets of
zinc finger transcription factors. Methods Mol Biol, 649, 437–455.
https://doi.org/10.1007/978-1-60761-753-2_27
Potier, M., Chantome, A., Joulin, V., Girault, A., Roger, S., Besson, P., … Vandier, C. (2011). The
SK3/K(Ca)2.3 potassium channel is a new cellular target for edelfosine. Br J Pharmacol,
162(2), 464–479. https://doi.org/10.1111/j.1476-5381.2010.01044.x
Potier, M., Joulin, V., Roger, S., Besson, P., Jourdan, M. L., Leguennec, J. Y., … Vandier, C. (2006).
Identification of SK3 channel as a new mediator of breast cancer cell migration. Mol
Cancer Ther, 5(11), 2946–2953. https://doi.org/10.1158/1535-7163.MCT-06-0194
Quinlan, A. R., & Hall, I. M. (2010). BEDTools: a flexible suite of utilities for comparing genomic
features. Bioinformatics, 26(6), 841–842. https://doi.org/10.1093/bioinformatics/btq033
Ran, F. A., Hsu, P. D., Wright, J., Agarwala, V., Scott, D. A., & Zhang, F. (2013). Genome
engineering using the CRISPR-Cas9 system. Nat Protoc, 8(11), 2281–2308.
https://doi.org/10.1038/nprot.2013.143
Rana, A., Chisholm, G. D., Khan, M., Sekharjit, S. S., Merrick, M. V, & Elton, R. A. (1993). Patterns
of bone metastasis and their prognostic significance in patients with carcinoma of the
prostate. Br J Urol, 72(6), 933–936. Retrieved from
https://www.ncbi.nlm.nih.gov/pubmed/8306158
Rao, S. S., Huntley, M. H., Durand, N. C., Stamenova, E. K., Bochkov, I. D., Robinson, J. T., …
Aiden, E. L. (2014). A 3D map of the human genome at kilobase resolution reveals
principles of chromatin looping. Cell, 159(7), 1665–1680.
https://doi.org/10.1016/j.cell.2014.11.021
Rao, S. S. P., Huang, S. C., Glenn St Hilaire, B., Engreitz, J. M., Perez, E. M., Kieffer-Kwon, K. R., …
Aiden, E. L. (2017). Cohesin Loss Eliminates All Loop Domains. Cell, 171(2), 305–320 e24.
https://doi.org/10.1016/j.cell.2017.09.026
Rena D. Malik, George Dakwar, M. E. H., & Nicholas J. Sanfilippo, Andrew B. Rosenkrantz, S. S. T.
(2011). Squamous Cell Carcinoma of the Prostate. Review in Urology, 13(1), 56–60.
Rhie, S. K., Coetzee, S. G., Noushmehr, H., Yan, C., Kim, J. M., Haiman, C. A., & Coetzee, G. A.
(2013). Comprehensive functional annotation of seventy-one breast cancer risk Loci. PLoS
One, 8(5), e63925. https://doi.org/10.1371/journal.pone.0063925
Rhie, S. K., Yao, L., Luo, Z., Witt, H., Schreiner, S., Guo, Y., … Farnham, P. J. (2018). ZFX acts as a
transcriptional activator in multiple types of human tumors by binding downstream of
transcription start sites at the majority of CpG island promoters. Genome Res, 28, 310–
320. https://doi.org/10.1101/gr.228809.117
Risso, D., Ngai, J., Speed, T. P., & Dudoit, S. (2014). Normalization of RNA-seq data using factor
analysis of control genes or samples. Nat Biotechnol, 32(9), 896–902.
https://doi.org/10.1038/nbt.2931
Robinson, M. D., McCarthy, D. J., & Smyth, G. K. (2010). edgeR: a Bioconductor package for
differential expression analysis of digital gene expression data. Bioinformatics, 26(1), 139–
140. https://doi.org/10.1093/bioinformatics/btp616
Rosendahl, C., Cameron, A., Argenziano, G., Zalaudek, I., Tschandl, P., & Kittler, H. (2012).
Dermoscopy of squamous cell carcinoma and keratoacanthoma. Archives of Dermatology.
https://doi.org/10.1001/archdermatol.2012.2974
Schumacher, F. R., Al Olama, A. A., Berndt, S. I., Benlloch, S., Ahmed, M., Saunders, E. J., …
Eeles, R. A. (2018). Association analyses of more than 140,000 men identify 63 new
prostate cancer susceptibility loci. Nature Genetics. https://doi.org/10.1038/s41588-018-0142-8
Servant, N., Varoquaux, N., Lajoie, B. R., Viara, E., Chen, C. J., Vert, J. P., … Barillot, E. (2015).
HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol, 16, 259.
https://doi.org/10.1186/s13059-015-0831-x
Siegel, R. L., Miller, K. D., & Jemal, A. (2017). Cancer Statistics, 2017. CA: A Cancer Journal for
Clinicians. https://doi.org/10.3322/caac.21387
Sramkoski, R. M., Pretlow 2nd, T. G., Giaconia, J. M., Pretlow, T. P., Schwartz, S., Sy, M. S., …
Jacobberger, J. W. (1999). A new human prostate carcinoma cell line, 22Rv1. In Vitro Cell
Dev Biol Anim, 35(7), 403–409. https://doi.org/10.1007/s11626-999-0115-4
Steinestel, K., Eder, S., Ehinger, K., Schneider, J., Genze, F., Winkler, E., … Steinestel, J. (2016).
The small conductance calcium-activated potassium channel 3 (SK3) is a molecular target
for Edelfosine to reduce the invasive potential of urothelial carcinoma cells. Tumour Biol,
37(5), 6275–6283. https://doi.org/10.1007/s13277-015-4509-5
Tak, Y. G., & Farnham, P. J. (2015). Making sense of GWAS: using epigenomics and genome
engineering to understand the functional relevance of SNPs in non-coding regions of the
human genome. Epigenetics Chromatin, 8, 57. https://doi.org/10.1186/s13072-015-0050-4
Tak, Y. G., Hung, Y., Yao, L., Grimmer, M. R., Do, A., Bhakta, M. S., … Farnham, P. J. (2016).
Effects on the transcriptome upon deletion of a distal element cannot be predicted by the
size of the H3K27Ac peak in human cells. Nucleic Acids Res, 44(9), 4123–4133.
https://doi.org/10.1093/nar/gkv1530
Tang, Z., Li, C., Kang, B., Gao, G., Li, C., & Zhang, Z. (2017). GEPIA: A web server for cancer and
normal gene expression profiling and interactive analyses. Nucleic Acids Research.
https://doi.org/10.1093/nar/gkx247
Thalmann, G. N., Anezinis, P. E., Chang, S. M., Zhau, H. E., Kim, E. E., Hopwood, V. L., … Chung, L.
W. (1994). Androgen-independent cancer progression and bone metastasis in the LNCaP
model of human prostate cancer. Cancer Res, 54(10), 2577–2581. Retrieved from
https://www.ncbi.nlm.nih.gov/pubmed/8168083
Thomas, G., Jacobs, K. B., Yeager, M., Kraft, P., Wacholder, S., Orr, N., … Chanock, S. J. (2008).
Multiple loci identified in a genome-wide association study of prostate cancer. Nature
Genetics. https://doi.org/10.1038/ng.91
Varga, N., Mozes, J., Keegan, H., White, C., Kelly, L., Pilkington, L., … Jeney, C. (2017). The Value
of a Novel Panel of Cervical Cancer Biomarkers for Triage of HPV Positive Patients and for
Detecting Disease Progression. Pathol Oncol Res, 23(2), 295–305.
https://doi.org/10.1007/s12253-016-0094-1
Vietri Rudan, M., Barrington, C., Henderson, S., Ernst, C., Odom, D. T., Tanay, A., & Hadjur, S.
(2015). Comparative Hi-C Reveals that CTCF Underlies Evolution of Chromosomal Domain
Architecture. Cell Rep, 10(8), 1297–1309. https://doi.org/10.1016/j.celrep.2015.02.004
Wang, L. X., Li, Y., & Chen, G. Z. (2018). Network-based co-expression analysis for exploring the
potential diagnostic biomarkers of metastatic melanoma. PLoS One, 13(1), e0190447.
https://doi.org/10.1371/journal.pone.0190447
Wedge, D. C., Gundem, G., Mitchell, T., Woodcock, D. J., Martincorena, I., Ghori, M., … Eeles, R.
A. (2018). Sequencing of prostate cancers identifies new cancer genes, routes of
progression and drug targets. Nat Genet, 50(5), 682–692. https://doi.org/10.1038/s41588-018-0086-z
Yao, L., Tak, Y. G., Berman, B. P., & Farnham, P. J. (2014). Functional annotation of colon cancer
risk SNPs. Nat Commun, 5, 5114. https://doi.org/10.1038/ncomms6114
Zeisel, A., Yitzhaky, A., Bossel Ben-Moshe, N., & Domany, E. (2013). An accessible database for
mouse and human whole transcriptome qPCR primers. Bioinformatics.
https://doi.org/10.1093/bioinformatics/btt145
Zhang, Y., Liu, T., Meyer, C. A., Eeckhoute, J., Johnson, D. S., Bernstein, B. E., … Liu, X. S. (2008).
Model-based analysis of ChIP-Seq (MACS). Genome Biol, 9(9), R137.
https://doi.org/10.1186/gb-2008-9-9-r137
Appendices 1. Table of annotated fine-mapped SNPs
SNP Band Annotation Detail H3K27Ac CTCF Loop
rs7098889 chr10q11.23 Others . Y N Y
rs12241712 chr10q24.32 Intron intron (ENST00000462202.2, intron 1 of 4) Y N Y
rs34032774 chr10q24.32 Promoter promoter-TSS (ENST00000479004.1) Y N N
rs12773600 chr10q24.32 TTS TTS (ENST00000302424.7) Y N N
rs12773833 chr10q24.32 TTS TTS (ENST00000302424.7) Y N N
rs12570611 chr10q24.32 Others . Y N N
rs7094325 chr10q24.32 Others . Y Y N
rs12573077 chr10q24.32 Exon 3' UTR (ENST00000260746.5, exon 6 of 6) Y N N
rs11191371 chr10q24.32 Intron intron (ENST00000260746.5, intron 2 of 5) Y N N
rs17784294 chr10q24.32 Intron intron (ENST00000602660.1, intron 1 of 4) Y N Y
rs11598549 chr10q26.13 Intron intron (ENST00000395705.2, intron 1 of 4) Y Y Y
rs4962419 chr10q26.13 Intron intron (ENST00000460976.1, intron 2 of 2) Y Y Y
rs7077275 chr10q26.13 Intron intron (ENST00000460976.1, intron 2 of 2) Y Y Y
rs12769019 chr10q26.13 Intron intron (ENST00000460976.1, intron 2 of 2) Y Y Y
rs72853963 chr11p15.5 Others . Y N Y
rs4077456 chr11p15.5 Others . Y Y Y
rs4077457 chr11p15.5 Others . Y Y Y
rs7481129 chr11p15.5 Others . Y Y Y
rs4076054 chr11p15.5 Others . Y N Y
rs7395734 chr11p15.5 Others . Y N Y
rs6578999 chr11p15.5 Others . Y N Y
rs11603101 chr11p15.5 Others . Y N Y
rs7127900 chr11p15.5 Others . Y N Y
rs1048374 chr11q12.1 Exon exon (ENST00000501817.2, exon 4 of 5) N Y Y
rs4620729 chr11q13.3 Others . N Y Y
rs7939803 chr11q13.3 Others . Y N Y
rs7929962 chr11q13.3 Others . Y N Y
rs10896450 chr11q13.3 Others . N Y Y
rs11228583 chr11q13.3 Others . N Y Y
rs74606104 chr11q13.3 Others . Y N Y
rs76941912 chr11q13.3 Others . Y N Y
rs191958513 chr11q13.3 Others . Y N Y
rs184379418 chr11q13.3 Others . Y N Y
rs180898041 chr11q13.3 Others . Y N Y
rs144245804 chr11q13.3 Others . Y N Y
rs36225067 chr11q13.3 Others . Y Y Y
rs55816909 chr11q13.3 Intron intron (ENST00000536559.1, intron 1 of 1) Y Y Y
rs3918296 chr11q13.3 Intron intron (ENST00000536559.1, intron 1 of 1) Y Y Y
rs3862792 chr11q13.3 Exon exon (ENST00000542367.1, exon 1 of 2) Y N Y
rs3212880 chr11q13.3 TTS TTS (ENST00000545484.1) Y N Y
rs6580706 chr12q13.12 Others . Y N Y
rs73106451 chr12q13.13 Others . Y N Y
rs56207534 chr12q13.13 Others . Y N Y
rs73108429 chr12q13.13 Others . Y Y Y
rs4919742 chr12q13.13 Others . Y Y Y
rs902774 chr12q13.13 Others . Y Y Y
rs4919707 chr12q13.13 Others . Y N Y
rs55958994 chr12q13.13 Intron intron (ENST00000552150.1, intron 1 of 8) Y Y Y
rs4919743 chr12q13.13 Intron intron (ENST00000552150.1, intron 1 of 8) Y Y Y
rs1247938 chr12q24.21 Others . N Y N
rs7327286 chr13q22.1 Others . Y N Y
rs7158115 chr14q22.1 Intron intron (ENST00000554712.1, intron 1 of 3) Y Y Y
rs7140266 chr14q24.1 Intron intron (ENST00000553595.1, intron 5 of 5) Y N Y
rs2525530 chr14q24.1 Intron intron (ENST00000553595.1, intron 5 of 5) Y Y Y
rs17105852 chr14q24.1 Intron intron (ENST00000553595.1, intron 5 of 5) Y N Y
rs684232 chr17p13.3 Intron intron (ENST00000576019.1, intron 1 of 6) Y Y Y
rs461251 chr17p13.3 Intron intron (ENST00000576019.1, intron 1 of 6) Y Y Y
rs718961 chr17q12 Intron intron (ENST00000561193.1, intron 4 of 8) Y N Y
rs718960 chr17q12 Intron intron (ENST00000561193.1, intron 4 of 8) Y N Y
rs11651052 chr17q12 Intron intron (ENST00000561193.1, intron 1 of 8) Y Y Y
rs11263763 chr17q12 Intron intron (ENST00000561193.1, intron 1 of 8) Y Y Y
rs142415707 chr17q21.33 Intron intron (ENST00000510360.2, intron 2 of 2) N Y Y
rs77488129 chr17q21.33 Intron intron (ENST00000576461.1, intron 1 of 2) Y N Y
rs8072520 chr17q21.33 Intron intron (ENST00000576461.1, intron 2 of 2) Y N Y
rs57030023 chr17q21.33 Intron intron (ENST00000576461.1, intron 2 of 2) Y N Y
rs77653195 chr17q21.33 Intron intron (ENST00000576461.1, intron 2 of 2) Y N Y
rs11653775 chr17q21.33 Intron intron (ENST00000576461.1, intron 2 of 2) Y Y Y
rs8072254 chr17q24.3 Intron intron (ENST00000569074.1, intron 3 of 3) Y N Y
rs984434 chr17q24.3 Intron intron (ENST00000569074.1, intron 3 of 3) Y N Y
rs8102476 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) Y Y Y
rs12976534 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) Y N Y
rs12611084 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) Y N Y
rs4803934 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) N Y Y
rs8100395 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) N Y Y
rs7247241 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) N Y Y
rs7250689 chr19q13.2 Intron intron (ENST00000590510.1, intron 1 of 2) Y N Y
rs887391 chr19q13.2 Intron intron (ENST00000595837.1, intron 1 of 1) Y N Y
rs11673591 chr19q13.2 Intron intron (ENST00000595837.1, intron 1 of 1) Y N Y
rs6656494 chr1q21.3 Intron intron (ENST00000358505.2, intron 1 of 7) N Y Y
rs10908444 chr1q21.3 Intron intron (ENST00000358505.2, intron 1 of 7) N Y Y
rs7531728 chr1q21.3 Intron intron (ENST00000271915.4, intron 1 of 7) N Y Y
rs12069356 chr1q21.3 Others . Y N Y
rs12144978 chr1q21.3 Others . N Y Y
rs906593 chr1q21.3 Others . N Y Y
rs4845689 chr1q21.3 Others . N Y Y
rs4240874 chr1q21.3 Others . Y Y Y
rs4240875 chr1q21.3 Others . Y Y Y
rs877343 chr1q21.3 Promoter promoter-TSS (ENST00000368467.3) Y Y Y
rs4845694 chr1q21.3 Others . Y N Y
rs72702224 chr1q21.3 Others . Y N Y
rs4845695 chr1q21.3 Others . Y N Y
rs6686873 chr1q21.3 Others . Y N Y
rs2061690 chr1q21.3 Exon exon (ENST00000368463.3, exon 10 of 11) Y N Y
rs938126 chr1q32.1 Intron intron (ENST00000429009.1, intron 2 of 2) Y N Y
rs938125 chr1q32.1 Intron intron (ENST00000429009.1, intron 2 of 2) Y N Y
rs10900590 chr1q32.1 Intron intron (ENST00000429009.1, intron 2 of 2) Y N Y
rs4951382 chr1q32.1 Intron intron (ENST00000429009.1, intron 2 of 2) Y N Y
rs4951389 chr1q32.1 Others . Y N Y
rs11240755 chr1q32.1 Others . Y N Y
rs4245736 chr1q32.1 Intron intron (ENST00000471783.1, intron 1 of 5) Y Y Y
rs2169137 chr1q32.1 Intron intron (ENST00000367183.3, intron 2 of 2) N Y Y
rs6681905 chr1q32.1 Intron intron (ENST00000507825.2, intron 6 of 8) Y Y Y
rs2258056 chr20q13.33 Exon exon (ENST00000496281.1, exon 12 of 14) Y Y Y
rs1623866 chr20q13.33 Promoter promoter-TSS (ENST00000485858.1) Y N Y
rs2427530 chr20q13.33 Intron intron (ENST00000328969.5, intron 2 of 6) Y N N
rs2236510 chr20q13.33 Intron intron (ENST00000496820.1, intron 1 of 3) Y Y N
rs914559 chr20q13.33 Promoter promoter-TSS (ENST00000494776.1) Y Y N
rs1151624 chr20q13.33 TTS TTS (ENST00000465591.1) Y Y N
rs1056441 chr20q13.33 TTS TTS (ENST00000465591.1) Y N N
rs6122159 chr20q13.33 TTS TTS (ENST00000496820.1) Y N N
rs1741708 chr20q13.33 Promoter promoter-TSS (ENST00000493772.1) Y Y N
rs2253823 chr20q13.33 Promoter promoter-TSS (ENST00000473157.1) Y Y N
rs2253829 chr20q13.33 Promoter promoter-TSS (ENST00000473157.1) Y Y N
rs2281926 chr20q13.33 Intron intron (ENST00000302995.2, intron 3 of 6) N Y Y
rs4809334 chr20q13.33 Intron intron (ENST00000302995.2, intron 3 of 6) N Y Y
rs138029 chr22q13.1 Intron intron (ENST00000402203.1, intron 4 of 23) Y N Y
rs5759167 chr22q13.2 Others . Y N Y
rs4988372 chr22q13.2 Promoter promoter-TSS (ENST00000216115.2) Y Y Y
rs4988434 chr22q13.2 Intron intron (ENST00000216115.2, intron 3 of 4) Y N Y
rs5759182 chr22q13.2 Promoter promoter-TSS (ENST00000608052.1) Y N Y
rs7591175 chr2p11.2 Others . Y N Y
rs2166529 chr2p11.2 Others . Y N Y
rs6705971 chr2p11.2 Others . Y N Y
rs2028900 chr2p11.2 Intron intron (ENST00000306434.3, intron 1 of 8) Y N Y
rs1078004 chr2p11.2 TTS TTS (ENST00000469221.1) Y N Y
rs7605975 chr2p11.2 Exon 3' UTR (ENST00000233838.4, exon 15 of 15) Y N Y
rs12473819 chr2p11.2 Exon 3' UTR (ENST00000233838.4, exon 15 of 15) Y Y Y
rs6547621 chr2p11.2 Exon 3' UTR (ENST00000233838.4, exon 15 of 15) Y Y Y
rs7568458 chr2p11.2 Intron intron (ENST00000481541.1, intron 1 of 1) Y Y Y
rs1562323 chr2p11.2 TTS TTS (ENST00000432071.1) Y N Y
rs58235267 chr2p15 Promoter promoter-TSS (ENST00000484066.2) Y Y Y
rs9306894 chr2p24.1 TTS TTS (ENSG00000270100.1) N Y Y
rs9306895 chr2p24.1 Exon exon (ENST00000565841.1, exon 1 of 1) N Y Y
rs75166599 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs75689576 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs28485589 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs57878744 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs3762618 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs1574259 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs12052253 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs16860419 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs77167534 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y Y Y
rs12622724 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y N Y
rs74461567 chr2q31.1 Intron intron (ENST00000409532.1, intron 1 of 25) Y Y Y
rs59476799 chr2q31.1 Intron intron (ENST00000442417.1, intron 2 of 3) N Y Y
rs13408361 chr2q37.3 Others . Y N Y
rs7569197 chr2q37.3 Others . Y N Y
rs67071074 chr2q37.3 Promoter
promoter-TSS
(ENST00000264605.3)
Y Y Y
rs58057291 chr2q37.3 Promoter
promoter-TSS
(ENST00000264605.3)
Y Y Y
rs6760842 chr2q37.3 Intron
intron (ENST00000429898.1,
intron 1 of 2)
Y Y Y
rs6724766 chr2q37.3 Intron
intron (ENST00000264605.3,
intron 2 of 15)
Y N Y
rs10187185 chr2q37.3 Intron
intron (ENST00000264605.3,
intron 2 of 15)
Y N Y
rs6737888 chr2q37.3 Intron intron (ENST00000264605.3, intron 2 of 15) Y N Y
rs3087553 chr2q37.3 Intron intron (ENST00000468178.1, intron 4 of 11) Y Y Y
rs10184904 chr2q37.3 TTS TTS (ENST00000477501.1) Y N Y
rs10176842 chr2q37.3 Intron intron (ENST00000494110.1, intron 1 of 5) Y N Y
rs7570882 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs11896232 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs7586312 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs7574337 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs12471291 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs2292884 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs13426236 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs13429533 chr2q37.3 Intron intron (ENST00000410032.1, intron 7 of 13) Y N Y
rs192127563 chr2q37.3 Intron intron (ENST00000274979.8, intron 19 of 24) Y N Y
rs111770284 chr2q37.3 Intron intron (ENST00000274979.8, intron 19 of 24) Y Y Y
rs76832527 chr2q37.3 Promoter promoter-TSS (ENST00000487192.1) Y Y Y
rs78718649 chr3p12.1 Others . Y Y Y
rs712529 chr3q13.2 Intron intron (ENST00000355385.3, intron 15 of 19) Y N Y
rs7632756 chr3q21.3 Intron intron (ENST00000478892.1, intron 2 of 5) Y N Y
rs9814834 chr3q21.3 Intron intron (ENST00000484438.1, intron 1 of 2) Y N Y
rs6763931 chr3q23 Intron intron (ENST00000509842.1, intron 4 of 7) Y Y Y
rs2871960 chr3q23 Intron intron (ENST00000513249.1, intron 1 of 1) Y N N
rs1344674 chr3q23 Intron intron (ENST00000514251.1, intron 2 of 3) Y N N
rs1344672 chr3q23 Intron intron (ENST00000514251.1, intron 2 of 3) Y N N
rs78416326 chr3q26.2 Promoter promoter-TSS (ENST00000477216.1) Y N Y
rs6446944 chr4q13.3 Intron intron (ENST00000335049.5, intron 2 of 9) Y N Y
rs1957659 chr4q13.3 Intron intron (ENST00000335049.5, intron 2 of 9) Y N Y
rs13131954 chr4q13.3 Intron intron (ENST00000335049.5, intron 2 of 9) Y N Y
rs1534642 chr4q13.3 Intron intron (ENST00000512591.1, intron 1 of 2) Y Y Y
rs62312151 chr4q13.3 Others . Y N Y
rs4694629 chr4q13.3 Others . Y Y Y
rs998071 chr4q22.3 Others . Y N Y
rs7679673 chr4q24 Intron intron (ENST00000504082.1, intron 1 of 1) N Y Y
rs12652957 chr5p12 Intron intron (ENST00000264664.4, intron 1 of 2) N Y Y
rs987642 chr5p12 Intron intron (ENST00000264664.4, intron 1 of 2) N Y Y
rs4714476 chr6p21.1 Intron intron (ENST00000440194.1, intron 1 of 1) Y Y Y
rs6935446 chr6p21.1 Intron intron (ENST00000432751.1, intron 1 of 1) Y Y Y
rs6935737 chr6p21.1 Intron intron (ENST00000432751.1, intron 1 of 1) Y Y Y
rs9381075 chr6p21.1 Intron intron (ENST00000454812.1, intron 1 of 1) Y Y Y
rs3747744 chr6p21.1 Intron intron (ENST00000373057.3, intron 1 of 16) N Y Y
rs9296365 chr6p21.1 Intron intron (ENST00000373063.3, intron 2 of 16) Y N Y
rs4714485 chr6p21.1 Intron intron (ENST00000373063.3, intron 2 of 16) Y N Y
rs4714487 chr6p21.1 Intron intron (ENST00000373063.3, intron 3 of 16) Y N Y
rs3129857 chr6p21.32 Others . N Y Y
rs6908184 chr6q21 Intron intron (ENST00000436639.2, intron 1 of 9) Y Y Y
rs339328 chr6q22.1 Intron intron (ENST00000487683.1, intron 4 of 13) Y Y Y
rs339331 chr6q22.1 Intron intron (ENST00000487683.1, intron 4 of 13) Y N Y
rs6939447 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs6901126 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs9384082 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs9479517 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs13215402 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs6934121 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs6932847 chr6q25.2 Intron intron (ENST00000206262.1, intron 1 of 4) Y N Y
rs4646284 chr6q25.3 Others . Y Y Y
rs388170 chr6q25.3 Intron intron (ENST00000392145.1, intron 5 of 10) Y N Y
rs3106162 chr6q25.3 Intron intron (ENST00000454031.1, intron 13 of 16) Y N Y
rs12666442 chr7p15.2 Intron intron (ENST00000447620.1, intron 2 of 4) Y N Y
rs67152137 chr7p15.2 Intron intron (ENST00000447620.1, intron 2 of 4) Y N Y
rs10486567 chr7p15.2 Intron intron (ENST00000447620.1, intron 2 of 4) Y Y Y
rs989481 chr7q21.3 Others . Y N Y
rs6967926 chr7q21.3 Others . Y Y Y
rs13221897 chr7q21.3 Others . Y N Y
rs13221758 chr7q21.3 Others . Y N Y
rs12666406 chr7q21.3 Intron intron (ENST00000297293.5, intron 6 of 13) Y Y Y
rs6965016 chr7q21.3 Intron intron (ENST00000297293.5, intron 7 of 13) Y N Y
rs6465657 chr7q21.3 Intron intron (ENST00000297293.5, intron 9 of 13) Y N Y
rs6991461 chr8p21.2 Others . Y N Y
rs11994906 chr8p21.2 Others . Y N Y
rs11985071 chr8p21.2 Others . Y N Y
rs11998600 chr8p21.2 Others . Y N Y
rs4872162 chr8p21.2 Others . Y N Y
rs4872175 chr8p21.2 Others . Y N Y
rs11992541 chr8p21.2 Intron intron (ENST00000517825.1, intron 1 of 5) N Y Y
rs75500912 chr8q24.21 Intron intron (ENST00000523510.1, intron 2 of 3) Y N Y
rs73705708 chr8q24.21 Intron intron (ENST00000523510.1, intron 1 of 3) Y N Y
rs56006726 chr8q24.21 Promoter promoter-TSS (ENST00000519282.1) Y N Y
rs7001895 chr8q24.21 Others . Y N Y
rs114099351 chr8q24.21 Others . Y N Y
rs61265543 chr8q24.21 Others . Y N Y
rs16901996 chr8q24.21 Others . Y N Y
rs76784613 chr8q24.21 Others . N Y Y
rs76387712 chr8q24.21 TTS TTS (ENST00000500112.1) Y Y Y
rs139297625 chr8q24.21 Others . Y N Y
rs10114711 chr9q31.2 Others . Y N Y
rs1746822 chr9q31.2 Others . Y N Y
rs1746823 chr9q31.2 Others . Y N Y
rs1771725 chr9q31.2 Others . Y N Y
rs1746826 chr9q31.2 Others . Y N Y
rs1771718 chr9q31.2 Others . Y N Y
rs1746831 chr9q31.2 Others . Y N Y
rs1984037 chr9q33.2 Intron intron (ENST00000259371.2, intron 1 of 16) Y N Y
rs6530332 chrXp22.2 Intron intron (ENST00000380913.3, intron 1 of 9) Y N Y
rs6530333 chrXp22.2 Intron intron (ENST00000380913.3, intron 1 of 9) Y N Y
rs5933763 chrXp22.2 Intron intron (ENST00000380913.3, intron 1 of 9) Y N Y
rs5933765 chrXp22.2 Intron intron (ENST00000380913.3, intron 1 of 9) Y N Y
rs5964602 chrXq12 Others . Y N Y
rs6152 chrXq12 Exon exon (ENST00000513847.1, exon 1 of 4) Y Y Y
rs5919393 chrXq12 Intron intron (ENST00000374690.3, intron 1 of 7) Y N Y
Appendices 2. Collection of publications
RESEARCH Open Access
CRISPR-mediated deletion of prostate
cancer risk-associated CTCF loop anchors
identifies repressive chromatin loops
Yu Guo¹, Andrew A. Perez¹, Dennis J. Hazelett², Gerhard A. Coetzee³, Suhn Kyong Rhie¹* and Peggy J. Farnham¹,⁴*
Abstract
Background: Recent genome-wide association studies (GWAS) have identified more than 100 loci associated with
increased risk of prostate cancer, most of which are in non-coding regions of the genome. Understanding the function
of these non-coding risk loci is critical to elucidate the genetic susceptibility to prostate cancer.
Results: We generate genome-wide regulatory element maps and perform genome-wide chromosome conformation
capture assays (in situ Hi-C) in normal and tumorigenic prostate cells. Using this information, we annotate the regulatory
potential of 2,181 fine-mapped prostate cancer risk-associated SNPs and predict a set of target genes that are regulated
by prostate cancer risk-related H3K27Ac-mediated loops. We next identify prostate cancer risk-associated CTCF sites
involved in long-range chromatin loops. We use CRISPR-mediated deletion to remove prostate cancer risk-associated
CTCF anchor regions and the CTCF anchor regions looped to the prostate cancer risk-associated CTCF sites, and we
observe up to 100-fold increases in expression of genes within the loops when the prostate cancer risk-associated
CTCF anchor regions are deleted.
Conclusions: We identify GWAS risk loci involved in long-range loops that function to repress gene expression within
chromatin loops. Our studies provide new insights into the genetic susceptibility to prostate cancer.
Keywords: CTCF, GWAS SNPs, CRISPR, Prostate cancer, Chromatin looping
Background
Prostate cancer (PCa) is the leading cause of new cancer
cases and the third leading cause of cancer death among men in
the USA [1]. Of note, 42% of prostate cancer susceptibility
can be accounted for by genetic factors, the highest among
all cancer types [2]. Therefore, it is of critical importance
to understand the underlying genetic mechanisms that lead
to PCa. Investigators have used genome-wide association
studies (GWAS) to investigate the genetic components of
risk for PCa. The first step in GWAS employs arrays of 1–5
million selected single nucleotide polymorphisms (SNPs),
which allows the identification of risk-associated haplotype
blocks in the human genome. Because large regions of the
human genome are inherited in blocks, each risk locus
potentially contains many risk-associated SNPs. Fine-
mapping studies are next performed to more fully
characterize these risk loci, identifying the SNPs that
are in high linkage disequilibrium with the GWAS-
identified index SNP and that are most highly associ-
ated with disease risk (as defined by allelic frequencies
that are statistically most different between cases and
controls). To date, GWAS has identified more than 100
prostate cancer risk loci [3–7], with subsequent
fine-mapping studies employing both a multi-ethnic
and a single large European population identifying at
least 2,181 PCa risk-associated SNPs [8–10]. Although
considerable progress has been made in identifying
genetic variation linked to disease, the task of defining
the mechanisms by which individual SNPs contribute
to disease risk remains a great challenge. One reason
for this lack of progress is that the great majority of
the risk-related SNPs lie in non-coding regions of the
genome. Thus, the GWAS field has been left with the
* Correspondence: rhie@usc.edu; peggy.farnham@med.usc.edu
¹Department of Biochemistry and Molecular Medicine and the Norris
Comprehensive Cancer Center, Keck School of Medicine, University of
Southern California, 1450 Biggy Street, NRT 6503, Los Angeles, CA
90089-9601, USA
Full list of author information is available at the end of the article
© The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0
International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and
reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to
the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver
(http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Guo et al. Genome Biology (2018) 19:160
https://doi.org/10.1186/s13059-018-1531-0
conundrum as to how a single nucleotide change in a
non-coding region might confer increased risk for a
specific disease. These non-coding risk-associated
SNPs clearly do not affect disease risk by changing the
function of a specific protein but rather it is thought
that a subset of these SNPs may contribute to changes
in levels of expression of a key protein or non-coding
regulatory RNA [11–15]. Deciphering which risk-asso-
ciated SNP is likely to be a functional SNP (i.e., a SNP
that contributes to changes in gene expression) and
not simply a “hitchhiker” SNP is the first step in a
post-GWAS study [12, 16]. We reasoned that risk-associated
SNPs lying within regulatory elements are
more likely to be causal, rather than hitchhiker SNPs.
Therefore, our approach, described in detail below,
was to perform a comprehensive analysis of the regula-
tory potential of all prostate cancer risk-associated
SNPs identified by the fine-mapping studies, by com-
paring the location of each SNP to regulatory elements
(promoters, enhancers, insulators, and chromatin loop
anchors) that are active in prostate cells. Using this ap-
proach, we reduced the set of 2,181 fine-mapped PCa
risk-associated SNPs to a smaller set of ~300 candi-
date functional SNPs. After selecting the subset of
SNPs that are in active regulatory regions, we next
assayed the effects of removal of a small genomic re-
gion harboring a SNP-containing regulatory element
on gene expression [12]. Using CRISPR-mediated dele-
tion of candidate functional PCa risk-associated SNPs
at two risk loci, we have identified long-range loops
that function to repress gene expression.
Results
Identification of PCa risk-associated regulatory elements
Our goal in this study was to identify PCa risk-associ-
ated SNPs that are important in regulating gene expres-
sion (e.g., by their influence on the activity of distal
enhancers or via their involvement in maintaining 3D
chromatin structure). As described above, fine-mapping
has been previously performed to expand the set of
prostate cancer GWAS index SNPs to a larger set of
2,181 PCa risk-associated SNPs that are potential causal
variants [8–10]. As our first step (Fig. 1), we determined
which of the 2,181 fine-mapped PCa SNPs are located
within known DNase hypersensitive sites (DHS). We
began with this comparison because, in contrast to
ChIP-seq peaks of histone modifications which are fairly
broad, DHS sites identify relatively narrow regions of
open chromatin that closely correspond to the transcrip-
tion factor (TF) binding platform of regulatory elements.
By first requiring the SNPs to overlap a DHS, we reduce
the number of “false-positive” SNPs that lie at the outer
margins of broad ChIP-seq peaks. To capture as many
SNPs within regulatory elements as possible, we ob-
tained a set of 2.89 million DHS peaks that have been
identified from a large number of human cell lines and
tissues (downloaded from the ENCODE project portal at
encodeproject.org). Overlapping the genomic coordi-
nates of these DHS with the genomic locations of the set
of fine-mapped PCa risk-associated SNPs identified 443
SNPs located within open chromatin.
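As an illustration only, the overlap of SNP positions with DHS peaks described above can be sketched in Python. The function names, the toy identifiers and coordinates, and the 0-based half-open coordinate convention are assumptions made for this example; they are not taken from the pipeline used in this study.

```python
from bisect import bisect_right
from collections import defaultdict


def build_index(peaks):
    """Index (chrom, start, end) peaks for fast point lookup.

    Coordinates are assumed to be 0-based and half-open, [start, end).
    Overlapping or touching peaks on a chromosome are merged first.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in peaks:
        by_chrom[chrom].append((start, end))

    index = {}
    for chrom, intervals in by_chrom.items():
        merged = []
        for start, end in sorted(intervals):
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def snps_in_peaks(snps, index):
    """Return the IDs of (snp_id, chrom, pos) records that fall inside a peak."""
    hits = []
    for snp_id, chrom, pos in snps:
        if chrom not in index:
            continue
        starts, ends = index[chrom]
        i = bisect_right(starts, pos) - 1  # last merged peak starting at or before pos
        if i >= 0 and pos < ends[i]:
            hits.append(snp_id)
    return hits


# Toy data with hypothetical identifiers and coordinates.
toy_peaks = [("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)]
toy_snps = [("snpA", "chr1", 120), ("snpB", "chr1", 300), ("snpC", "chr2", 55)]
print(snps_in_peaks(toy_snps, build_index(toy_peaks)))
```

Merging the peaks before lookup keeps each query to a single binary search, which matters when the peak set is as large as the 2.89 million DHS peaks used here.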
Because we used DHS sites from more than 100 cell
or tissue samples, many of the SNP-associated regulatory
Fig. 1 Experimental and analytical steps used to identify PCa risk-associated regulatory elements involved in chromatin loops. Step (1): The subset
of 2,181 fine-mapped PCa-associated SNPs that overlap a DNase hypersensitive site was identified. Step (2): H3K27Ac and CTCF ChIP-seq was
performed in duplicate in two normal (PrEC and RWPE-1) and five cancer (RWPE-2, 22Rv1, C4-2B, LNCaP, and VCaP) prostate cell lines; data was
collected plus or minus DHT for 22Rv1 and LNCaP cells, for a total of 18 datasets for each mark (36 ChIP-seq samples). The SNPs in open chromatin
sites (i.e., those that are contained within a DHS site) were then subdivided into those that overlap a H3K27Ac or a CTCF site in prostate cells; the
number of PCa-associated SNPs associated with the H3K27Ac or CTCF sites is shown. Step (3): The PCa risk-associated H3K27Ac and CTCF sites were
overlapped with Hi-C looping data, and the subset of each type of site involved in chromatin loops was identified; the number of PCa-associated SNPs
associated with the H3K27Ac or CTCF sites involved in looping is shown
elements may not be active in prostate cells. Therefore,
as a second step, we identified the subsets of DHS-local-
ized SNPs that are within H3K27Ac or CTCF ChIP-seq
peaks that are present in prostate cells. Studies of cul-
tured prostate cancer cells and sequencing of prostate
cancers have revealed multiple distinct subgroups of
prostate cancer [17], including prostate cancer cells that
are refractory to androgen treatment, that contain the
androgen receptor splice variant AR-V7, or that express
fusion proteins such as TMPRSS2-ERG. Because we
wished to capture SNPs in regulatory elements that are
present in multiple prostate cancer subgroups, as well as
in normal prostate cells, we performed H3K27Ac and
CTCF ChIP-seq in two non-tumorigenic prostate cell
populations (PrEC and RWPE-1) and five prostate can-
cer cell lines (RWPE-2, 22Rv1, C4-2B, LNCaP, and
VCaP). PrEC are normal human prostate epithelial pri-
mary cells whereas RWPE-1 is a normal prostate epithe-
lial cell line that was immortalized by transfection with a
single copy of human papilloma virus 18 [18]. RWPE-2
cells were derived from RWPE-1 cells by transformation
with the Kirsten murine sarcoma virus [18]. LNCaP is
an androgen-sensitive prostate adenocarcinoma cell line
derived from a lymph node metastasis [19]. C4-2B is a
castration-resistant prostate cancer cell line derived from
a LNCaP xenograft that relapsed and metastasized to
bone after castration [20]; C4-2B cells do not require
androgen for proliferation, having similar growth rates
in the presence or absence of androgen [21]. VCaP cells
are derived from a metastatic lesion to a lumbar vertebra
of a Caucasian male with hormone-refractory prostate
cancer; VCaP is a TMPRSS2-ERG fusion-positive
prostate cancer cell line, expressing high levels of the an-
drogen receptor splice variant AR-V7 [22]. 22Rv1 is a
castration-resistant human prostate carcinoma epithelial
cell line that is derived from an androgen-dependent
CWR22 xenograft that relapsed during androgen abla-
tion [23]; this cell line also expresses the androgen re-
ceptor splice variant AR-V7. Unlike most prostate
cancer cell lines, 22Rv1 has an almost diploid karyotype.
Each ChIP-seq was performed in duplicate and, for
22Rv1 and LNCaP cells, in the presence or absence of
dihydrotestosterone (DHT), for a total of 18 datasets
for each mark (36 ChIP-seq experiments in total).
Peaks were called for individual datasets using MACS2
and the ENCODE3 pipeline [24], and only high confi-
dence (HC) peaks (defined as those peaks present in
both replicates) were used for further analysis; see
Additional file 1: Figure S1 for ranked peak graphs for
each HC peak dataset, Additional file 2: Table S1 for
information concerning all genomic datasets created in
this study, and Additional file 3: Table S2 for lists of
the HC ChIP-seq peaks for H3K27Ac and CTCF for
each cell line. As shown in Fig. 2, we identified
[Fig. 2 image: stacked bar charts titled "H3K27Ac high confidence peaks" and "CTCF high confidence peaks". Y axis: Fraction. X axis: PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT, VCaP. Legend (Annotation): 3UTR, 5UTR, Exon, Intergenic, Intron, Others, Promoter, TTS. Per-bar peak counts are not reproduced here.]
Fig. 2 Identification and classification of H3K27Ac (a) and CTCF (b) sites in prostate cells. H3K27Ac and CTCF ChIP-seq was performed in duplicate
for each cell line; for 22Rv1 and LNCaP cells, ChIP-seq was performed in duplicate in the presence or absence of DHT. Peaks were called for
individual datasets using MACS2 and the ENCODE3 pipeline, then peaks present in both replicates were identified (high confidence peaks) and
used for further analysis (see Additional file 3: Table S2). The location of the peaks was classified using the HOMER annotatePeaks.pl program and
the Gencode V19 database. The fraction of high confidence peaks in each category is shown on the Y axis, with the number of peaks in each
category for each individual cell line and/or treatment shown within each bar
48,796–94,688 H3K27Ac and 43,157–69,945 CTCF
sites that were reproducible in the two replicates from
each cell line and growth condition. As expected from
other studies, most of the H3K27Ac and CTCF sites
were either located in introns or were intergenic, with
a small subset located in promoter regions (defined as
1 kb upstream to +100 bp downstream from a known
TSS). A comparison of the set of DHS-localized SNPs
to the union set of H3K27Ac or CTCF HC peaks from
the prostate cells identified 222 PCa risk-associated
SNPs located within a DHS site that corresponds to an
H3K27Ac peak (Fig. 3) and 93 PCa risk-associated
SNPs located within a DHS site that corresponds to a
CTCF peak (Fig. 4).
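The two definitions used in this paragraph, high confidence peaks (present in both replicates) and promoter regions (1 kb upstream to +100 bp downstream of a TSS), can be illustrated with a minimal Python sketch. It assumes that "present in both replicates" can be approximated by a simple coordinate overlap and that the promoter window is applied in a strand-aware manner; the function names and toy coordinates are hypothetical and do not reproduce the ENCODE3 pipeline or HOMER.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def high_confidence(rep1, rep2):
    """Keep replicate-1 peaks that overlap any replicate-2 peak.

    Simplifying assumption: reproducibility is judged by overlap alone.
    The quadratic scan is only suitable for small toy inputs.
    """
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


def promoter_window(tss, strand, upstream=1000, downstream=100):
    """Return (start, end) of the promoter window around a TSS, strand-aware."""
    if strand == "+":
        return (max(0, tss - upstream), tss + downstream)
    # On the minus strand, "upstream" lies at higher coordinates.
    return (max(0, tss - downstream), tss + upstream)


def in_promoter(pos, tss, strand):
    """True if a position lies inside the promoter window (bounds inclusive)."""
    start, end = promoter_window(tss, strand)
    return start <= pos <= end


# Toy data with hypothetical coordinates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600)]
rep2 = [("chr1", 150, 250), ("chr1", 700, 800)]
print(high_confidence(rep1, rep2))
print(promoter_window(5000, "+"), promoter_window(5000, "-"))
```

The strand check matters because a site 900 bp to the right of a TSS is downstream of a plus-strand gene but upstream of a minus-strand gene.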
Using 3D chromatin interaction datasets to identify PCa
risk-associated enhancer and CTCF sites involved in long-
range looping
In previous studies, we found that deletion of a regulatory
element that has active histone marks does not always
alter the transcriptome [13]. This suggests that not all
regulatory elements (even if marked by H3K27Ac) are
critically involved in gene regulation in that particular cell
type under those particular conditions (perhaps due to
functional redundancy of regulatory elements). We rea-
soned that one way to identify critical regulatory elements
could be to focus on the subset that is involved in chromatin
looping. Although analysis of Hi-C data suggests that
many of the long-range chromatin loops (e.g., those that
are anchored by CTCF sites and that define topological
associating chromatin domains (TADs)) are common to
multiple cell types, intra-TAD loops may be cell type-specific
[25]. Therefore, we performed in situ Hi-C [25] in
normal prostate RWPE-1 cells [26] and in the prostate
cancer cell lines C4-2B and 22Rv1 (Rhie et al., manuscript
in preparation). For comparison, we also obtained Hi-C
and cohesin HiChIP datasets from GM12878 cells [25,
27]. We then overlapped PCa risk-associated DHS+,
K27Ac+ SNPs with the genomic coordinates of the an-
chors of the identified loops, identifying 203 SNPs located
in the DHS portion of a H3K27Ac ChIP-seq peak and as-
sociated with a chromatin loop (Fig. 3); a list of these risk
SNPs can be found in Additional file 4: Table S3. Most of
these SNPs are located in intronic or intergenic regions,
and many are located in loops present in both prostate
and GM12878 cells. We performed similar experiments
overlapping the PCa risk-associated DHS+, CTCF+ SNPs
with the loop anchor regions and identified 85 SNPs
located in the DHS portion of a CTCF ChIP-seq peak
and associated with a chromatin loop (Fig. 4); see
Additional file 4: Table S3. Again, the majority of these
SNPs are located in intronic or intergenic regions.
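The loop-anchor overlap described in this section can be sketched as follows. This is a minimal Python illustration, assuming each loop is represented by two anchor intervals plus a label for the dataset that reported it; the class names, dataset labels, and coordinates are hypothetical and are not the loop calls used in this study.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


class Loop(NamedTuple):
    left: Anchor
    right: Anchor
    source: str  # hypothetical label for the dataset that reported the loop


def snp_loops(snp_chrom, snp_pos, loops):
    """Return the loops whose left or right anchor contains the SNP position."""
    hits = []
    for loop in loops:
        for anchor in (loop.left, loop.right):
            if anchor.chrom == snp_chrom and anchor.start <= snp_pos < anchor.end:
                hits.append(loop)
                break  # count each loop once even if both anchors match
    return hits


def datasets_supporting(snp_chrom, snp_pos, loops):
    """Return the sorted set of dataset labels with a loop anchored at the SNP."""
    return sorted({loop.source for loop in snp_loops(snp_chrom, snp_pos, loops)})


# Toy loops with hypothetical coordinates and dataset labels.
toy_loops = [
    Loop(Anchor("chr1", 1000, 2000), Anchor("chr1", 221000, 222000), "setA"),
    Loop(Anchor("chr1", 1500, 2500), Anchor("chr1", 321000, 322000), "setB"),
    Loop(Anchor("chr2", 1000, 2000), Anchor("chr2", 9000, 10000), "setA"),
]
print(datasets_supporting("chr1", 1800, toy_loops))
```

Recording which datasets support each anchored SNP is what allows a statement such as "involved in looping in five independent chromatin interaction datasets" to be derived from the overlap.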
[Fig. 3 image: heat map in which each row is one of the PCa fine-mapped GWAS SNPs. Columns: H3K27Ac sites (Y/N) in PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT, and VCaP; Annotation (Exon, Intron, Others, Promoter, TTS); Loops (Y/N) in RWPE-1.HiC, 22Rv1.HiC, C4-2B.HiC, GM.HiC, and GM.Cohesin.HiChIP.]
Fig. 3 PCa risk SNPs associated with H3K27Ac sites and chromatin loops. Each row represents one of the 222 SNPs that are associated with both
a DHS site and a H3K27ac peak in normal or tumor prostate cells (Additional file 4: Table S3). The location of each SNP was classified using the
Gencode V19 database. “Others” represents mostly intergenic regions. To identify the subset of H3K27Ac-associated risk SNPs located in an
anchor point of a loop, chromatin loops were identified using Hi-C data from normal RWPE-1 prostate cells [26] or 22Rv1 and C4-2B prostate
tumor cells (Rhie et al., in preparation); Hi-C [25] and cohesin HiChIP data [27] from GM12878 was also used
Functional analysis of prostate cancer risk-associated
CTCF sites
CTCF has been shown to affect gene regulation by
several different mechanisms. For example, TADs are
formed by interaction of two convergently bound CTCFs
separated by a large number of base pairs (500 kb to
1 Mb) [25, 28–31]; the physical interaction of the CTCFs
bound to each anchor point creates a chromatin loop.
CTCF is also thought to influence enhancer-mediated
gene regulation, functioning in both positive and nega-
tive ways. For example, CTCF may help to bring an en-
hancer closer in 3D space to a target promoter via its
ability to form intra-TAD loops with other CTCF sites.
In contrast, binding of CTCF at a site between an
enhancer and promoter can, in some cases, block
long-range regulation (see the “Discussion” section). To
determine if PCa risk-associated CTCF anchor regions
that we identified to be involved in looping do in fact
control the expression of specific genes, we used the
CRISPR/Cas9 system to delete PCa risk-associated
CTCF anchor regions and then assessed the effects of
these deletions on the transcriptome (Fig. 5; see also
Additional file 5: Table S4 for sequences of guide RNAs
used for all deletion studies). Unlike most PCa cells,
22Rv1 cells are diploid; therefore, we have used these
cells for our CRISPR/Cas9 experiments. We chose to
study two PCa risk-associated CTCF anchor regions,
one on chr1 and one on chr12. These regions are both
located in intergenic regions of the genome and thus are
not easily associated a priori with a specific target gene.
Also, these regions are robustly bound by CTCF in all
nine HC peak sets and are identified as being involved
in 3D chromatin looping in all of the Hi-C or HiChIP
datasets that we analyzed. Although the chosen PCa
risk-associated SNPs are not located precisely within the
CTCF motif, they are within CTCF peaks. In a previous
study of allele-specific differences in binding strength of
CTCF in 51 lymphoblastoid cell lines, the authors found
that the majority of the nucleotide changes associated
with CTCF binding strength were within 1 kb of the
CTCF-binding motif (or in linkage disequilibrium with a
variant within 1 kb of the motif) but very few were actu-
ally in the CTCF motif itself [32].
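Two ideas from this paragraph lend themselves to a small worked example: the convergent orientation of the CTCF motifs at a pair of loop anchors, and the distance of a SNP from a CTCF motif (for example, within 1 kb). The Python sketch below assumes that motif strand is known for each anchor and that "convergent" means a forward-strand motif at the upstream anchor paired with a reverse-strand motif at the downstream anchor; the function names and coordinates are hypothetical.

```python
def classify_ctcf_pair(upstream_strand, downstream_strand):
    """Classify the relative orientation of CTCF motifs at two loop anchors.

    upstream_strand is the motif strand at the anchor with the lower coordinate.
    """
    if upstream_strand == "+" and downstream_strand == "-":
        return "convergent"  # motifs point toward each other, into the loop
    if upstream_strand == "-" and downstream_strand == "+":
        return "divergent"   # motifs point away from each other
    return "tandem"          # both motifs on the same strand


def distance_to_motif(snp_pos, motif_start, motif_end):
    """Distance in bp from a SNP to a motif occupying [motif_start, motif_end)."""
    if snp_pos < motif_start:
        return motif_start - snp_pos
    if snp_pos >= motif_end:
        return snp_pos - (motif_end - 1)
    return 0  # the SNP lies inside the motif


def near_motif(snp_pos, motif_start, motif_end, window=1000):
    """True if the SNP lies inside the motif or within `window` bp of it."""
    return distance_to_motif(snp_pos, motif_start, motif_end) <= window


# Toy example with hypothetical coordinates: a 19-bp motif at [1000, 1019).
print(classify_ctcf_pair("+", "-"))
print(distance_to_motif(1500, 1000, 1019), near_motif(1500, 1000, 1019))
```

A SNP can therefore lie outside the motif itself (distance greater than zero) and still fall within the 1-kb window in which most binding-associated variants were reported.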
We began by deleting the CTCF anchor region on
chr1 near PCa risk-associated SNP rs12144978. This
SNP has a strong CTCF peak nearby, is located in an
intergenic region, and was identified to be involved in
looping in five independent chromatin interaction data-
sets (Fig. 6a). Hi-C data identified two high confidence
risk loops (220 kb and 320 kb) anchored by the PCa
risk-associated CTCF site; each loop has convergent
CTCF peaks at the anchor regions (Fig. 6b, c). Both
[Fig. 4 image: heat map in which each row is one of the PCa fine-mapped GWAS SNPs. Columns: CTCF binding sites (Y/N) in PrEC, RWPE-1, RWPE-2, 22Rv1, 22Rv1+DHT, C4-2B, LNCaP, LNCaP+DHT, and VCaP; Annotation (Exon, Intron, Others, Promoter, TTS); Loops (Y/N) in RWPE-1.HiC, 22Rv1.HiC, C4-2B.HiC, GM.HiC, and GM.Cohesin.HiChIP.]
Fig. 4 PCa risk SNPs associated with CTCF sites and chromatin loops. Each row represents one of the 93 SNPs that are associated with both a
DHS site and a CTCF peak in normal or tumor prostate cells (Additional file 4: Table S3). The location of each SNP was classified using the Gencode
V19 database. “Others” represents mostly intergenic regions. To identify the subset of CTCF-associated risk SNPs located in an anchor point of a loop,
chromatin loops were identified using Hi-C data from normal RWPE-1 prostate cells [26] or 22Rv1 and C4-2B prostate tumor cells (Rhie et al., in
preparation); Hi-C [25] and cohesin HiChIP data [27] from GM12878 was also used
loops were identified in prostate Hi-C datasets as well as
in GM12878 Hi-C and HiChIP datasets and can be visu-
ally observed in the Hi-C interaction map (blue circles
in Fig. 6b). Due to the higher resolution of the
GM12878 Hi-C dataset, the genomic locations of the an-
chor regions of the two high confidence risk loops were
taken from the GM12878 data. We note that there are
additional CTCF sites near rs12144978. However, the
other CTCF sites are 10 kb away from the anchor region
and therefore were not identified as being involved in
statistically significant loops with the prostate cancer
risk-associated CTCF site; a browser snapshot of the
CTCF ChIP-seq data and the loops identified by Hi-C
can be seen in Fig. 10 and Additional file 1: Figure S3.
Guide RNAs were introduced into 22Rv1 prostate can-
cer cells along with Cas9, and clonal populations were
analyzed to identify clones in which both chr1 alleles
were deleted for a 1607-bp region encompassing CTCF
site 1. Using RNA-seq analysis of the clonal population,
we found that deletion of the anchor region harboring
CTCF site 1 caused a large increase (almost 100-fold) in
expression of KCNN3 (Fig. 6d), which is located within
the loops anchored by the PCa risk-associated CTCF
site. Other genes within the same loops or within ±
1 Mb from the risk CTCF site did not exhibit large
changes in expression. However, other genes in the gen-
ome did show changes in expression, most likely as an
indirect effect of altered expression of the nearby
KCNN3 gene (Additional file 1: Figure S2 and Add-
itional file 6: Table S5). To determine if deletion of the
region encompassing CTCF site 3, which anchors the
larger loop but does not have a PCa risk-associated SNP
nearby, also affected expression of KCNN3, we created
clonal 22Rv1 cell populations having homozygous
deletion of a 913-bp region encompassing CTCF site 3.
RNA-seq analysis revealed a modest increase in the ex-
pression of KCNN3 in cells homozygously deleted for
site 3 (Fig. 6e). These data suggest that perhaps KCNN3
expression is regulated by maintaining its topological as-
sociations within either the 220-kb or the 320-kb loop. If
so, then deletion of the regions encompassing both sites
2 and 3 may be required to see the same effect on
KCNN3 expression as seen upon deletion of site 1. To
test the effects of deletion of individual vs. multiple
CTCF sites on KCNN3 gene expression, we introduced
guide RNAs (plus Cas9) to the regions encompassing
CTCF sites 1, 2, or 3 individually, or guide RNAs target-
ing a combination of the regions, into 22Rv1 cells, har-
vested the transfected cell pools, and then performed
RT-qPCR to measure KCNN3 gene expression (Fig. 7).
Introduction of the guide RNAs to delete a 1607-bp or
1221-bp region encompassing CTCF site 1 elicited a
90-fold increase in KCNN3 expression, similar to the
RNA-seq result shown in Fig. 6. Deletion of a 913-bp re-
gion encompassing site 3 showed a modest (less than
2-fold) increase in expression of KCNN3 (similar to the
RNA-seq results); similar results were seen upon dele-
tion of a 395-bp region encompassing site 2. Notably,
the combination of site 2 and 3 deletions did not cause a
large increase in KCNN3 expression (~7-fold). Rather,
only when the region encompassing CTCF site 1 (which
we identified as a PCa risk-associated CTCF site) was
deleted alone, or in combination with site 3, did KCNN3
expression increase 100-fold.
We next assayed the effects of deletion of the region
encompassing the CTCF site on chr12 near the PCa
risk-associated SNP rs4919742. This PCa risk-associated
CTCF peak is also located in an intergenic region and
[Fig. 5 image: workflow diagram with boxes labeled "CRISPR-mediated deletion", "PCR to check deletion efficiency", "Isolation of single cells", "Clonal expansion", "RNA-seq of a cell clone", and "qRT-PCR of target gene", grouped into Phase 1 and Phase 2.]
Fig. 5 Experimental workflow for functional investigation of PCa risk-associated CTCF sites. Phase 1: Plasmids encoding guide RNAs that target
sequences on each side of a PCa risk-associated CTCF site were introduced into the PCa cell line 22Rv1 along with a Cas9 expression vector (see
the “Methods” section for details). The resultant cell pool was analyzed to determine deletion efficiency (red slashes represent alleles in each cell
that harbor a CTCF site deletion). Single cells were then selected and expanded into clonal populations for RNA-seq analysis. Phase 2: After
identifying the gene most responsive (within a ±1-Mb window) to deletion of the region encompassing a risk-associated CTCF site, plasmids
encoding guide RNAs that target the risk-associated CTCF anchor region and/or the regions encompassing the CTCF sites looped to the risk
CTCF site and a Cas9 expression plasmid were introduced into 22Rv1 cells; cell pools were analyzed by PCR to check deletion frequency and by
RT-qPCR to measure expression of the target gene
Guo et al. Genome Biology (2018) 19:160 Page 6 of 17
was identified to be involved in looping in five independ-
ent chromatin interaction datasets (Fig. 8a). Hi-C data
identified two loops (300 kb and 715 kb) anchored by
the PCa risk-associated CTCF site; each loop has conver-
gent CTCF peaks at the anchors (Fig. 8b, c). Similar to
the loops at CTCF site 1, both loops at CTCF site 4 were
identified in prostate Hi-C datasets as well as in
GM12878 Hi-C data and can be visually observed in the
Hi-C interaction map (blue circles in Fig. 8b). Due to
the higher resolution of the GM12878 Hi-C dataset, the
genomic locations of the anchor regions of the two high
confidence risk loops were taken from the GM12878
data. We note that there are additional CTCF sites near
rs4919742. However, the other sites were not identified
to be in statistically significant high confidence loops
linked to the prostate cancer risk-associated CTCF site
4; a browser snapshot of the CTCF ChIP-seq data and
the loops identified by Hi-C can be seen in Fig. 10 and
Additional file 1: Figure S4. Guide RNAs were intro-
duced into 22Rv1 prostate cancer cells along with Cas9,
and clonal populations were analyzed to identify clones
in which both chr12 alleles were deleted for a 2875-bp
region encompassing CTCF site 4. We found that dele-
tion of this region caused a large increase in expression
of KRT78, KRT4, KRT79, and KRT80 (Fig. 8d). KRT78,
KRT4, and KRT79 are located within the 300-kb loop
whereas KRT80 is outside of the 300-kb loop but within
the larger 715-kb loop, both of which are anchored by
the PCa risk-associated CTCF site 4. To test the effects
of deletion of individual vs. multiple CTCF sites on KRT
gene expression, we introduced guide RNAs (plus Cas9)
to regions encompassing CTCF sites 4, 5, or 6 individu-
ally, or guide RNAs that target a combination of the
sites, into 22Rv1 cells, harvested the transfected cell
Fig. 6 KCNN3 is upregulated upon targeted deletion of the region encompassing the CTCF site near rs12144978. a A blowup of the CTCF peak
information, genomic annotation, and looping information for rs12144978 from Fig. 4. b Hi-C chromatin interaction map of the region of chromosome
1 near the rs12144978. The location of the SNP is indicated by the blue line and arrow. The blue circles indicate the high confidence risk loops used in
the analysis. c Detailed schematic of the high confidence risk loops in which rs12144978 is involved, as identified by the Hi-C chromatin interaction
data. d Shown is the fold-change expression of all genes within a ±1-Mb region near rs12144978 in the cells deleted for a 1607-bp region
encompassing the PCa risk-associated CTCF site (site 1); a volcano plot illustrating the genome-wide analysis of the RNA-seq data can be
found in Additional file 1: Figure S2. The yellow X indicates which CTCF site has been deleted. e Shown is the fold-change expression of all
genes within a ±1-Mb region near rs12144978 in the cells deleted for a 913-bp region encompassing CTCF site 3. The yellow X indicates
which CTCF site has been deleted
pools, and then performed RT-qPCR to measure KRT78
gene expression (Fig. 9). Introduction of guide RNAs
that would delete a 2875-bp or a 1384-bp region encom-
passing PCa risk-associated CTCF site 4 showed more
than a 100-fold increase in KRT78 expression, similar to
the RNA-seq analyses shown in Fig. 8. Deletion of
1969-bp and 5457-bp regions encompassing CTCF sites
5 or 6, respectively (which are not associated with PCa),
showed very modest increases in expression of KRT78,
whereas the combination of deletion of sites 5 and 6 did
not increase KRT78 expression. The only large changes
in KRT78 expression were in cells deleted for the region
encompassing CTCF site 4 alone or when deleted in
combination with other CTCF sites.
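As an illustration of the convergent-orientation criterion mentioned above for the loop anchors, the following Python sketch checks whether two CTCF motifs point toward each other. It assumes the common convention that a convergent pair has a forward-strand (+) motif at the upstream anchor and a reverse-strand (-) motif at the downstream anchor. The function name, coordinates, and strands are hypothetical and are not taken from the datasets in this study.

```python
def is_convergent(anchor_a, anchor_b):
    """Return True if two CTCF motifs on one chromosome face each other.

    Each anchor is a (position, strand) tuple, with strand '+' or '-'.
    """
    # Order the anchors by genomic position so "upstream" is well defined.
    upstream, downstream = sorted([anchor_a, anchor_b], key=lambda a: a[0])
    return upstream[1] == "+" and downstream[1] == "-"

# Hypothetical anchors separated by 300 kb.
print(is_convergent((1_000_000, "+"), (1_300_000, "-")))  # motifs face each other
print(is_convergent((1_000_000, "-"), (1_300_000, "+")))  # motifs face away
```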
Finally, we investigated the cell-type specificity of the re-
sponse to deletion of the regions encompassing the PCa
risk-associated CTCF sites by also deleting these regions in
HEK293T kidney cells and HAP1 chronic myelogenous
leukemia cells. Although the KRT78 gene was upregulated
(~25-fold) in both HEK293T and HAP1 when a 1.6-kb region
encompassing the PCa risk-associated CTCF site on
chr12 was deleted (Additional file 1: Figure S3), deletion of
a 2.8-kb region encompassing the PCa risk-associated
CTCF site on chr1 in HEK293T or HAP1 cells did not
result in an increase in KCNN3 expression (Additional file 1:
Figure S4).
PCa risk-associated CTCF loops may sequester genes from
enhancers located outside the loops
To gain insight into the mechanism by which the PCa
risk-associated CTCF sites near SNPs rs12144978 and
rs4919742 may regulate expression of KCNN3 and KRT78,
respectively, we examined the pattern of H3K27Ac peaks
in a large region surrounding each SNP (Fig. 10). Interest-
ingly, in both cases, the genomic regions within the loops
that are anchored by the PCa risk-associated SNP are de-
void of the active enhancer mark H3K27Ac. These are
very large genomic regions (~200–600 kb) to lack any
H3K27Ac peaks. This pattern suggested two mechanisms
by which these CTCF sites could potentially maintain ex-
pression of KCNN3 and KRT78 at low levels. First, the
loops may prevent activation of potential enhancers by
formation of a repressive chromatin structure. We deter-
mined that the loop regions anchored by the two PCa
risk-associated CTCF sites (site 1 on chr1 and site 4 on
chr12) are both covered by H3K27me3, which is known to
be associated with polycomb-mediated gene silencing [33];
deletion of the risk-associated CTCF sites may result in
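[Fig. 7 bar chart. Title: KCNN3 Relative Expression Level; axis: Log2 Fold Change (ticks 0, 2, 4, 6); bar labels: S2+S3, S1+S3, S3, S2, S1, Cas9 CTRL; fold-change values printed in the chart, in extraction order: 83.7, 109.23, 6.68, 1, 1.91, 1.40]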
Fig. 7 Analysis of the rs12144978-associated chromatin loops. Guide RNAs targeting regions encompassing CTCF site 1 (the PCa risk-associated
CTCF site), CTCF site 2, and/or CTCF site 3 (or the empty guide RNA vector as a control) were introduced into 22Rv1 prostate cancer cells, along
with Cas9. Cell pools were harvested, and KCNN3 expression was analyzed by RT-qPCR. Shown within the blue bars is the fold change in KCNN3
expression in the pools that received guide RNAs vs. the vector control. The yellow X indicates which CTCF site has been deleted; the size of
each deletion can be found in Additional file 5: Table S4
the formation of new enhancers within these previously
repressed regions. Alternatively, the PCa risk-associated
CTCF sites may prevent the promoters of the KCNN3 and
KRT78 genes from interacting with a pre-existing active en-
hancer(s) located outside the loop (in this case, the enhan-
cer would be marked by H3K27Ac in both control and
CRISPR-deleted cells). To distinguish these possibilities, we
performed H3K27Ac ChIP-seq in clonal population of cells
homozygously deleted for either the PCa risk-associated
CTCF site 1 on chr1 or the site 4 on chr12. Interestingly,
we found that the regions remained as enhancer deserts,
even after deletion of the PCa risk-associated CTCF sites.
Our data support a model in which the PCa risk-associated
CTCF-mediated loops insulate the KCNN3 and
KRT78 promoters from nearby pre-existing active
enhancers.
Discussion
We performed a comprehensive analysis of the regula-
tory potential of 2,181 fine-mapped PCa risk-associated
SNPs, identifying a subset of these SNPs that fall within
DHS sites located within either a H3K27Ac peak or a
CTCF peak defined by ChIP-seq datasets we produced
for normal and tumor prostate cells. After selecting the
fine-mapped SNPs that fall within these active regulatory
regions, we next identified the subset of SNPs that lie
within an anchor region of a chromatin loop, using in
situ Hi-C data from normal and tumor prostate cells.
Using this information, we predicted a set of target
genes that are regulated by PCa risk-related H3K27Ac-
marked enhancers (Additional file 7: Table S6). Finally,
we used CRISPR-mediated deletion to remove CTCF an-
chor regions that encompass PCa risk-associated CTCF
sites and also deleted regions encompassing CTCF sites
that fall within the anchor regions of the other ends of
the loops. We found that deletion of the region encom-
passing a PCa risk-associated CTCF site on chr1 or the
region encompassing a PCa risk-associated site on chr12
turns on a nearby gene located in an enhancer desert.
Our results suggest that these two PCa risk-associated
CTCF sites may function by encaging cancer-relevant
genes in repressive loops.
We focused our studies on two PCa risk-associated
genomic loci (one on chr1 and one on chr12), each of
which harbors a CTCF site which is both near a SNP
identified by fine-mapping to be related to increased risk
for PCa and identified by in situ Hi-C analysis to be in-
volved in large chromatin loops. Upon deletion of the
Fig. 8 Deletion of the region encompassing the PCa risk-associated CTCF site near rs4919742 increases KRT gene expression. a A blowup of the
CTCF peak information, genomic annotation, and looping information for rs4919742 from Fig. 4. b Hi-C chromatin interaction map of the region
of chromosome 12 near rs4919742. The location of the SNP is indicated by the blue line and arrow. The blue circles indicate the high confidence
risk loops used in the analysis. c Detailed schematic of the high confidence risk loops in which rs4919742 is involved, as identified by the Hi-C
chromatin interaction data; there are 26 keratin genes within the loops. d Shown is the fold-change expression of all genes within a ±1-Mb
region near rs4919742 in the cells deleted for a 2875-bp region encompassing the PCa risk-associated CTCF site (site 4); a volcano plot illustrating the
genome-wide analysis of the RNA-seq data can be found in Additional file 1: Figure S2. The yellow X indicates which CTCF site has been deleted
region encompassing the PCa risk-associated CTCF site
on chr1, we found that KCNN3 expression was increased
~100-fold; no other genes within ±1 Mb of the risk
CTCF site on chr1 showed a large change in gene ex-
pression. Similarly, deletion of the region encompassing
the risk-associated CTCF site on chr12 caused a ~100-fold
increase in expression of KRT78; in this case,
four of the other nearby KRT genes also showed in-
creased expression, albeit not as high as KRT78. The
very large increases in gene expression that we observed
upon deletion of regions encompassing PCa risk-associ-
ated CTCF sites are interesting due to the fact that re-
moval of CTCF or the cohesin component RAD21 from
the cell has quite modest overall effects on the transcrip-
tome. Nora et al. [34] identified only a small number of
genes (~200) that were upregulated more than 10-fold
after removal of CTCF from mES cells using an auxin
degron system. The authors noted that not all genes
within a TAD responded in the same way to CTCF de-
pletion and concluded that depletion of CTCF triggers
upregulation of genes that are normally insulated from
neighboring enhancers by a TAD boundary. Similarly,
Rao et al. [35] found that auxin-mediated depletion of
RAD21 (a core component of cohesin) in HCT116 colon
cancer cells led to the upregulation of a small number of
genes (~200 genes showed at least a 30% increase in ex-
pression). These analyses of the transcriptional conse-
quences of CTCF or RAD21 depletion are similar to our
studies of CRISPR-mediated CTCF site deletion. How-
ever, the degree of upregulation that we observed upon
deletion of the regions encompassing PCa
risk-associated CTCF sites is much greater than the ma-
jority of the effects observed in the previous studies.
As noted above, we observed profound effects on gene
expression when we deleted regions encompassing
CTCF sites related to increased risk for PCa. To investi-
gate whether other nearby CTCF sites are also involved
in regulating gene expression, we also deleted two add-
itional CTCF sites on chr1 and two additional CTCF
sites on chr12 that are at the other end of the chromatin
loops formed by the risk-associated CTCF sites. We
found that on both chr1 and chr12, deletion of either
one of the CTCF sites that pair with the PCa
risk-associated CTCF site had little effect on gene ex-
pression. One might expect that simultaneous deletion
of both of the pairing CTCF anchors would cause an in-
crease in gene expression. However, single deletion of
the region encompassing the PCa risk-associated CTCF
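[Fig. 9 bar chart. Title: KRT78 Relative Expression Level; axis: Log2 Fold Change (ticks 0, 2, 4, 6); bar labels: S5+S6, S4+S6, S4+S5, S6, S5, S4, Cas9 CTRL; fold-change values printed in the chart, in extraction order: 118.19, 26.7, 34.39, 1, 1.72, 2.14, 0.89]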
Fig. 9 Analysis of the rs4919742-associated chromatin loops. Guide RNAs targeting regions encompassing CTCF site 4 (the PCa risk-associated
CTCF site), CTCF site 5, and/or CTCF site 6 (or the empty guide RNA vector as a control) were introduced into 22Rv1 prostate cancer cells, along
with Cas9. Cell pools were harvested, and KRT78 expression was analyzed by RT-qPCR. Shown within the blue bars is the fold change in KRT78
expression in the pools that received guide RNAs vs the vector control. The yellow X indicates which CTCF site has been deleted; the size of each
deletion can be found in Additional file 5: Table S4
site had much greater effects on expression than simul-
taneous removal of the other two sites. These results
demonstrate that the increased expression of KCNN3
and KRT78 is not simply a response to the method of
CRISPR-mediated deletion but rather suggest that the
regions encompassing the PCa risk-associated CTCF
sites are more important in regulating the expression of
these genes than are the CTCF sites at the other end of
the loops. Perhaps the PCa risk-associated CTCF sites
can establish repressive loops with other CTCF sites
upon deletion of the other ends of the original loops; we
note that there are several CTCF peaks with motifs ori-
ented in the correct direction that could possibly be
adopted as a new anchor for CTCF site 1 and site 4 if
the normal loop anchor sites are deleted. Also, it is pos-
sible that other, lower frequency interactions encompass-
ing KCNN3 or KRT78 (involving CTCF site 1 or site 4,
respectively) also create repressive loops (see Add-
itional file 8: Table S7 for a list of all loops involving
CTCF site 1 and site 4). Finally, it is also possible that
other transcription factors that bind to sequences near
CTCF site 1 or site 4 (within the regions targeted for de-
letion) serve as repressors of the KCNN3 and KRT78
promoters. In this case, CTCF-mediated looping may
not be the primary mechanism by which the expression
of two genes is kept at low levels.
Both KCNN3 and KRT78 are each located within large
genomic regions that are devoid of the H3K27Ac mark.
The upregulation of KCNN3 and KRT78 upon deletion of
the risk-associated CTCF regions could be due to the cre-
ation of new active enhancers in the previous enhancer
deserts, which are covered by the repressive H3K27me3
mark in control cells. Alternatively, it has previously been
proposed that CTCF can limit gene expression by seques-
tering a gene within a loop and preventing it from being
regulated by nearby enhancers [36, 37]. Therefore, it was
possible that pre-existing enhancers, located outside the
enhancer deserts, gain access to the promoters of KCNN3
and KRT78 genes after deletion of the regions encompassing
the risk CTCF sites (i.e., an enhancer adoption model).
Fig. 10 PCa risk-associated CTCF loops encompass enhancer deserts. Shown are genome browser snapshots of CTCF, CTCF motifs with orientation,
H3K27Ac, and H3K27me3 ChIP-seq data for the regions near chromatin loops associated with the rs12144978 (a) or rs4919742 (b) risk SNPs. In each
panel, the H3K27Ac ChIP-seq track for cells deleted for the region encompassing the PCa risk-associated SNP is also shown. Also shown are all fine-
mapped SNPs in each locus and the high confidence risk loops identified by Hi-C chromatin interaction data anchored by each SNP and the RefSeq
gene track. Insets show blowups of the regions containing the PCa risk-associated CTCF sites and PCa risk-associated H3K27Ac sites at each locus
H3K27Ac ChIP-seq analysis of clonal cell populations ho-
mozygously deleted for the regions encompassing the
risk-associated CTCF sites showed that new active en-
hancers are not created within the large enhancer deserts
(Fig. 10). Therefore, it is likely that the increased expression
of KCNN3 and KRT78 is due to adoption of an existing
enhancer, not creation of a new enhancer (Fig. 11).
We note that not all nearby genes are upregulated when
the regions encompassing the PCa risk-associated CTCF
sites are deleted. This suggests that there may be some
biochemical compatibility between enhancers and pro-
moters that is required for robust activation and/or that
other factors that prime a specific promoter for activation
must be present. Interestingly, through our analysis of
H3K27Ac sites associated with PCa risk (Fig. 3), we have
identified an H3K27Ac site overlapping several PCa
risk-associated SNPs that is ~70 kb from the KCNN3 transcription
start site (Fig. 10a) and an H3K27Ac site overlapping
several PCa risk-associated SNPs that is ~60 kb
upstream from the KRT78 transcription start site (Fig. 10b).
We note that in each case, the PCa risk-associated
H3K27Ac site is the closest H3K27Ac site to the deleted
CTCF site and is the first H3K27Ac at the edge of the en-
hancer desert. Thus, these PCa risk-associated H3K27Ac
sites may be involved in “enhancer adoption” by the promoters
of the KCNN3 and KRT78 genes in the cells deleted
for the PCa risk-associated CTCF sites.
Although the effects of deletion of other GWAS-related
CTCF sites have not been reported, Gallagher et al.
have proposed that a CTCF site near a SNP involved in
risk for frontotemporal lobar degeneration creates a loop
that enhances expression of TMEM106B; however, be-
cause the CTCF site was not deleted, the actual effect of
the site on gene expression is not known [38]. Several
groups have studied other disease-related CTCF sites
[39]. In most cases, the CTCF sites have resided within a
TAD boundary element and, when these sites are de-
leted, modest upregulation of a nearby gene has
occurred. For example, deletion of a TAD boundary was
shown to increase expression of PAX3, WNT6, and IHH
[40] via a proposed mechanism of enhancer adoption
made possible by removal of a repressive loop. Enhancer
adoption has also been linked to AML/MDS, MonoMAC/Emberger
syndromes, and medulloblastoma [41,
42]. Also, investigators have shown that elimination of a
boundary site of an insulated neighborhood can mod-
estly activate expression of an oncogene [43, 44]. Other
examples of enhancer adoption include a modest upreg-
ulation of the Fnb2 gene when a CTCF site located
230 kb downstream is deleted [30] and a 3-fold increase
in PDGFRA expression upon deletion of a CTCF site
[37]. Interestingly, Ibn-Salem et al. searched the Human
Phenotype Ontology database and identified 922 dele-
tion cases in which tissue-specific enhancers have been
brought into the vicinity of developmental genes as a
consequence of a deletion that removed a TAD boundary.
They predicted that 11% of the phenotype effects of the
deletions could be best explained by enhancer adoption
that occurs upon removal of a TAD boundary [45]. Future
studies that test these predictions would help to under-
stand the global significance of repressive 3D chromatin
loops.
Conclusions
We have identified PCa risk-associated CTCF anchor re-
gions that appear to function by creating a repressive
regulatory environment; deletion of these anchor regions
results in a very large increase (~100-fold) in expression
of KCNN3 (upon deletion of the CTCF site on chr1) or
KRT78 (upon deletion of the CTCF site on chr12). A
link between KCNN3, also known as SK3, and prostate
cancer biology has been previously observed. KCNN3 is
a calcium-activated potassium channel that has been
shown to enhance tumor cell invasion in breast cancer
and malignant melanoma [46]. For example, Chantome
et al. [47] have shown that the majority of breast and
[Fig. 11 schematic. Labels: CBS, CTCF, Enhancer, CBS Knockout]
Fig. 11 PCa risk-associated CTCF loops may sequester genes from enhancers located outside the loops. Shown is one potential model for gene
activation that occurs upon deletion of a PCa risk-associated CTCF site. In this model, the entire CTCF-binding site (CBS) is removed and therefore
the loop is broken, allowing an enhancer outside the original loop to increase the activity of a promoter located within the original loop
prostate cancer samples from primary tumors or bone
metastases (but not normal tissues) are positive for
KCNN3. Of note, the shRNA-mediated reduction of
KCNN3 RNA did not result in changes in cell prolifera-
tion, but rather resulted in a lower number of bone me-
tastases in a nude mouse model system. The bone is the
most frequent site of prostate carcinoma metastasis with
skeletal metastases identified at autopsy in up to 90% of
patients dying from prostate carcinoma [48–50]. Taken
together with previous studies, our work suggests that
binding of CTCF to rs12144978 may, via its repressive
role on KCNN3 expression, play a protective role regarding
human prostate cancer. Of clinical relevance, edelfosine,
a glycerophospholipid with antitumoral properties
that inhibits SK3 channel activity, can inhibit migration
and invasion of cancer cells in vitro and in vivo in an
SK3-dependent manner, pointing towards a possible use
of edelfosine in prostate cancer treatment [51–54]. Al-
though KRT78 has not previously been associated with
prostate cancer, it has been identified as a diagnostic
marker for metastatic melanoma [55] and cervical can-
cers [56]. Investigation of the function of other GWAS-
identified CTCF sites involved in chromatin loops may
reveal additional genes involved in the development or
diagnosis of prostate cancer.
Methods
Cell culture
C4-2B cells were obtained from ViroMed Laboratories
(Minneapolis, MN, USA). RWPE-1 (CRL-11609), RWPE-2
(CRL-11610), 22Rv1 (CRL-2505), LNCaP (CRL-1740), and
VCaP (CRL-2876) were all obtained from American Type
Culture Collection (ATCC). The human normal prostate
epithelial cells (PrEC) were obtained from Lonza (CC-2555,
Lonza, Walkersville, MD, USA). Cells were cultured according
to the suggested protocols at 37 °C with 5% CO2.
The medium used to culture C4-2B (RPMI 1640), VCaP
(DMEM), LNCaP (RPMI 1640), and 22Rv1 (RPMI 1640)
was supplemented with 10% fetal bovine serum (Gibco by
Thermo Fisher, #10437036) plus 1% penicillin and 1%
streptomycin. For DHT experiments, 22Rv1 and LNCaP
cells were grown in phenol-red free RPMI 1640 with
10% charcoal-stripped fetal bovine serum for 48 h and
then treated with 10 nM DHT or vehicle for 4 h before
harvest. RWPE-1 and RWPE-2 cells were cultured in
Keratinocyte Serum Free Medium kit (Thermo Fisher
Scientific, 17005-042) without antibiotics. PrEC cells
were grown using the PrEGM Bullet Kit (Lonza,
#CC-3166). All cell lines were authenticated at the USC
Norris Cancer Center cell culture facility by compari-
son to the ATCC and/or published genomic criteria for
that specific cell line; all cells were documented to
be free of mycoplasma. Pre-authentication was per-
formed at Lonza (Walkersville, MD, USA) for PrEC.
Detailed cell culture protocols are provided for each
cell line/primary cells in Additional file 9: Cell Culture
Protocols.
ChIP-seq
All ChIP-seq experiments were performed in duplicate according
to a previously published protocol [57–59]. Five micrograms
of CTCF antibody (Active Motif #61311) was
used to precipitate 20 μg chromatin for 22Rv1, PrEC,
RWPE-2, and VCaP (rep1) cells, and 10 μl of CTCF antibody
(Cell Signaling #3418S) was used to precipitate 20 μg
chromatin for LNCaP, C4-2B, RWPE-1, and VCaP (rep2) cells.
Eight micrograms of H3K27Ac antibody (Active Motif
#39133) was used to precipitate 20 μg chromatin for all
H3K27Ac ChIP-seq. Ten microliters of H3K27me3 antibody
(Cell Signaling #9733S) was used to precipitate
20 μg of 22Rv1 chromatin for H3K27me3 ChIP-seq. All antibodies
were validated according to ENCODE standards;
validation documents are available on the ENCODE portal
(encodeproject.org). ChIP-seq libraries were prepared
using the Kapa Hyper prep kit (Kapa #KK8503) according to
the provided protocol. Samples were sequenced on an Illumina
HiSeq3000 machine using paired-end 100-bp
reads (except for H3K27Ac-LNCaP ChIP-seqs, which were
sequenced using 50-bp single-end reads). All ChIP-seq
data were mapped to hg19, and peaks were called using
MACS2 [60] after preprocessing data with the ENCODE3
ChIP-seq pipeline (https://www.encodeproject.org/chip-seq/).
High confidence (HC) peaks (Additional file 3: Table
S2) were called by taking peaks that were found in both
duplicates for a given cell line/antibody combination using
the intersectBed function from the bedtools suite [61].
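The idea of keeping only peaks found in both duplicates can be sketched in plain Python. This is not a reimplementation of intersectBed and does not reflect the exact options used in the study. As an assumption, it keeps each replicate-1 peak that overlaps at least one replicate-2 peak by one base or more, using BED-style half-open coordinates. All names and coordinates are hypothetical.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if two half-open intervals [start, end) share at least one base."""
    return a_start < b_end and b_start < a_end

def high_confidence_peaks(rep1, rep2):
    """Keep replicate-1 peaks that overlap any replicate-2 peak.

    Peaks are (chrom, start, end) tuples. Assumption: a 1-bp overlap is enough.
    """
    rep2_by_chrom = defaultdict(list)
    for chrom, start, end in rep2:
        rep2_by_chrom[chrom].append((start, end))

    kept = []
    for chrom, start, end in rep1:
        if any(overlaps(start, end, s, e) for s, e in rep2_by_chrom[chrom]):
            kept.append((chrom, start, end))
    return kept

# Hypothetical toy peaks for two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr12", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr12", 300, 400)]
print(high_confidence_peaks(rep1, rep2))
```

For genome-scale peak sets, a sorted or interval-tree approach (as bedtools uses) would be far faster than this quadratic scan.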
Hi-C
In situ Hi-C experiments were performed following the
original protocol by Rao et al. [25] with minor modifications
[26]. Hi-C datasets were processed using
HiC-Pro [62] to make normalized 10-kb resolution
matrices. Intra-chromosomal loops (50 kb to 10 Mb
range) were selected using Fit-Hi-C with a q value < 0.05
[63], as we have described in previous studies [26].
Hi-C chromatin interaction heatmaps were visualized
using HiCPlotter [64].
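The loop selection criteria stated above (intra-chromosomal, 50 kb to 10 Mb, q value < 0.05) amount to a simple filter over candidate interactions. The sketch below assumes a hypothetical record layout with anchor midpoints and a precomputed q value, and treats the distance bounds as inclusive. It does not reproduce Fit-Hi-C itself, and the example q values are invented for illustration.

```python
from dataclasses import dataclass

MIN_DISTANCE = 50_000       # 50 kb
MAX_DISTANCE = 10_000_000   # 10 Mb
Q_CUTOFF = 0.05

@dataclass(frozen=True)
class Interaction:
    chrom1: str
    mid1: int      # midpoint of the first anchor bin
    chrom2: str
    mid2: int      # midpoint of the second anchor bin
    q_value: float

def select_loops(interactions):
    """Keep intra-chromosomal interactions in range with q below the cutoff."""
    selected = []
    for it in interactions:
        if it.chrom1 != it.chrom2:
            continue  # inter-chromosomal contacts are not considered
        distance = abs(it.mid2 - it.mid1)
        if MIN_DISTANCE <= distance <= MAX_DISTANCE and it.q_value < Q_CUTOFF:
            selected.append(it)
    return selected

# Hypothetical candidates; distances of 300 kb and 715 kb echo the loop sizes in the text.
candidates = [
    Interaction("chr12", 1_000_000, "chr12", 1_300_000, 0.001),
    Interaction("chr12", 1_000_000, "chr12", 1_715_000, 0.01),
    Interaction("chr12", 1_000_000, "chr12", 1_020_000, 0.001),  # too short
    Interaction("chr12", 1_000_000, "chr12", 2_000_000, 0.20),   # not significant
    Interaction("chr1", 1_000_000, "chr12", 1_300_000, 0.001),   # inter-chromosomal
]
print(len(select_loops(candidates)))
```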
SNP annotation
Fine-mapped SNPs from previous studies [8–10] were curated,
and SNP information was extracted from dbSNP147.
SNPs were annotated (Additional file 4: Table S3) by their
overlap with the genomic coordinates of (a) a comprehensive
set of DHS downloaded from the ENCODE project
portal at encodeproject.org, (b) H3K27Ac high confidence
peaks, (c) regions corresponding to ±1 kb from CTCF high
confidence peak summits, and (d) chromatin loops and
topologically associated domains from Hi-C or Cohesin
HiChIP data from GM12878 cells [25, 27], RWPE-1
normal prostate cells [26], and 22Rv1 and C4-2B
prostate cancer cells (Rhie et al., in preparation); an-
notation was performed using the annotateBed func-
tion in bedtools [61].
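As a rough illustration of criterion (c), the sketch below builds ±1-kb windows around peak summits and flags SNPs that fall inside any window. It stands in for the overlap logic only and is not the annotateBed call used in the study. The SNP names, coordinates, and function names are hypothetical, and treating window ends as inclusive is an assumption.

```python
FLANK = 1_000  # +/- 1 kb around each peak summit, as stated in the text

def summit_windows(summits, flank=FLANK):
    """Turn (chrom, summit_position) pairs into (chrom, start, end) windows."""
    return [(chrom, max(0, pos - flank), pos + flank) for chrom, pos in summits]

def annotate_snps(snps, windows):
    """Map each SNP name to True/False: does it fall inside any window?"""
    annotation = {}
    for name, chrom, pos in snps:
        annotation[name] = any(
            chrom == w_chrom and start <= pos <= end
            for w_chrom, start, end in windows
        )
    return annotation

# Hypothetical summits and SNP positions.
ctcf_summits = [("chr1", 5_000_000), ("chr12", 8_000_000)]
snps = [
    ("snp_A", "chr1", 5_000_600),   # 600 bp from a summit
    ("snp_B", "chr1", 5_002_000),   # 2 kb from a summit
    ("snp_C", "chr12", 7_999_100),  # 900 bp from a summit
]
print(annotate_snps(snps, summit_windows(ctcf_summits)))
```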
CRISPR/Cas9-mediated genomic deletions
gRNAs were cloned into pSpCas9(BB)-2A-Puro (PX459)
V2.0 plasmid (Addgene #62988) following the previously
published protocol [65]; the sequence of all guide RNAs
used in this study can be found in Additional file 5:
Table S4. 22Rv1 cells (wild type or single deletion
clones) were transfected with guide RNA and Cas9 ex-
pression plasmids using Lipofectamine LTX with PLUS
reagent (Thermo Fisher, #15338100) according to the
manufacturer’s protocol. After 24 h transfection, cells
were treated with 2 μg/mL puromycin for 48–72 h (en-
suring that the un-transfected control cells all died). The
media was then replaced with new media without puro-
mycin, and the cells were allowed to recover for 24–
48 h. The cells were then harvested for further analysis
or dissociated and sorted into 96-well plates with 1
cell/well using flow cytometry. The single cells were
grown into colonies, then expanded to obtain clonal
populations for further analysis. Cell pools and single
cells were harvested using QuickExtract DNA Extraction
Solution (Epicentre #QE9050) according to the manufacturer’s
protocol and genotyped by PCR using primers
listed in Additional file 5: Table S4.
RNA analyses
Total RNA was extracted from cell pools and cell populations
derived from single cell colonies using the TRIzol protocol
(Thermo Fisher, #15596026) or DirectZol (Zymo,
#R2062). For RNA-seq, ERCC spike-in control mix 1
(Thermo Fisher, #4456704) was added before library prep-
aration, according to the manufacturer’s suggestion. Li-
braries were made using the Kapa Stranded mRNA kit
with beads (Kapa #KK8421). Samples were sequenced on
an Illumina HiSeq3000 with single-end 50-bp read length.
RNA-seq results were aligned to Gencode v19, and reads
were counted using STAR [66]. Differentially expressed
genes were determined using edgeR [67, 68], and batch effects
were corrected using the RUVg function of RUVSeq
[69]. See Additional file 2: Table S1 for more information
about the RNA-seq libraries and Additional file 6: Table
S5 for the list of genes differentially expressed in cells har-
boring deletions of PCa risk-associated CTCF sites. For
analysis of RNA from cell pools, cDNA libraries were
made using the Maxima kit (Thermo Fisher, #K1671).
qPCR was performed using SYBR Green (Bio-Rad,
#1725275) and a Bio-Rad CFX96 machine (Bio-Rad,
#1855196). See Additional file 5: Table S4 for information
concerning the primers used in RT-qPCR reactions.
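The figures in this study report expression changes for genes within ±1 Mb of each deleted site and single out the most responsive gene. One way to express that selection is sketched below. It assumes that gene position is represented by the transcription start site and that "most responsive" means the largest absolute log2 fold change; neither detail is specified in the text. The gene names, positions, and values are hypothetical.

```python
WINDOW_BP = 1_000_000  # +/- 1 Mb around the deleted site

def most_responsive_gene(genes, site_chrom, site_pos, window=WINDOW_BP):
    """Return the gene within the window with the largest |log2 fold change|.

    Each gene is a dict with keys: name, chrom, tss, log2_fc.
    Returns None if no gene lies within the window.
    """
    nearby = [
        g for g in genes
        if g["chrom"] == site_chrom and abs(g["tss"] - site_pos) <= window
    ]
    if not nearby:
        return None
    return max(nearby, key=lambda g: abs(g["log2_fc"]))

# Hypothetical differential-expression results near a deleted site at chr1:10,000,000.
genes = [
    {"name": "GENE_A", "chrom": "chr1", "tss": 10_200_000, "log2_fc": 6.5},
    {"name": "GENE_B", "chrom": "chr1", "tss": 10_900_000, "log2_fc": 0.4},
    {"name": "GENE_C", "chrom": "chr1", "tss": 12_500_000, "log2_fc": 8.0},  # outside window
]
best = most_responsive_gene(genes, "chr1", 10_000_000)
print(best["name"])
```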
For the analysis of site 1 by RNA-seq, a 1607-bp region
was deleted using guide RNAs 11+12; two independent
clones were identified, and each clone was analyzed in
triplicate (Fig. 6). The effects of deleting site 1 on the expression
of KCNN3 were also analyzed in a cell pool using
guide RNAs 11+12 or 35+36 (which deleted a 1221-bp re-
gion encompassing site 1), in wt cells and in a cell pool
that was previously deleted for a 913-bp region encom-
passing site 3 (Fig. 7). The effects of deleting site 2 on the
expression of KCNN3 were analyzed in a cell pool using
guide RNAs 24+26 (which deleted a 395-bp region
encompassing site 2), in wt cells and in a cell clone previ-
ously deleted for site 3 (Fig. 7). For the analysis of site 3
deletion by RNA-seq, a 913-bp region was deleted using
guide RNAs 5+6; three independent clones were identified,
and each clone was analyzed by RNA-seq. The effects
of deleting site 3 in combination with deletions of site 1
and site 2 are described above. For the analysis of site 4 by
RNA-seq, a 2875-bp region was deleted using guide RNAs
22+23; two independent clones were identified and each
clone was analyzed in triplicate by RNA-seq (Fig. 8). The
effects of deleting site 4 on the expression of KRT78 were
also analyzed in a cell pool using guide RNAs 21+37 to
delete a 1384-bp region encompassing site 4 plus guide
RNAs 40+41 to delete a 1969-bp region encompassing site
5 or guide RNAs 38+39 to delete a 5457-bp region
encompassing site 6 (Fig. 9). The effects of deleting site 5
on KRT78 expression were analyzed using guide RNAs 40+41
alone or in combination with guide RNAs 38+39 to
delete site 6. Finally, the effects of deleting a 5457-bp region
encompassing site 6 on KRT78 expression were
analyzed in a cell pool using guide RNAs 38+39
(Fig. 9); combination deletions are described above;
see Additional file 5: Table S4 for details of all guide
RNA locations and deletion sizes.
Additional files
Additional file 1: Figure S1. High confidence ChIP-seq peaks. Figure S2.
Genome-wide RNA-seq analysis of cells deleted for PCa risk-associated
CTCF sites. Figure S3. Deletion of PCa risk-associated CTCF site 1 in different
cell lines. Figure S4. Deletion of PCa risk-associated CTCF site 4 in different
cell lines. (PDF 2538 kb)
Additional file 2: Table S1. List of ChIP-seq and RNA-seq datasets.
(XLSX 17 kb)
Additional file 3: Table S2. ChIP-seq peaks. (XLSB 21665 kb)
Additional file 4: Table S3. Annotated SNPs. (XLSX 12 kb)
Additional file 5: Table S4. Sequences of guide RNAs and primers.
(XLSX 15 kb)
Additional file 6: Table S5. RNA-seq analyses. (XLSX 4442 kb)
Additional file 7: Table S6. Predicted genes regulated by PCa-related
K27Ac sites. (XLSX 11 kb)
Additional file 8: Table S7. 22Rv1 Hi-C interactions involving CTCF site
1 and CTCF site 4. (XLSX 12 kb)
Additional file 9: Cell culture protocols. (PDF 388 kb)
Abbreviations
DHS: DNase hypersensitive site; DHT: Dihydrotestosterone; GWAS: Genome-
wide association studies; PCa: Prostate cancer; SNP: Single nucleotide
polymorphisms; TAD: Topological associating chromatin domain
Acknowledgements
We thank the ENCODE Consortium for data access, the USC/Norris Cancer
Center Molecular Genomics core, the Stanford Sequencing Center, the UCLA
Technology Center for Genomics & Bioinformatics, and the USC Center for
High-performance Computing (hpc.usc.edu). We also thank Jenevieve Polin,
Jiani Shi, and Charles Nicolet for the assistance with the transient CRISPR-
mediated deletion experiments.
Funding
This work was supported by the following grants from the National Institutes
of Health: R01CA136924, U54HG006996, and P30CA014089.
Availability of data and materials
All ChIP-seq, RNA-seq, and Hi-C data generated in this study are available at
the NCBI GEO with accession number GSE118514 [70]. Access to other publicly
available datasets from GEO or ENCODE [71–74] used in this study is detailed in
Additional file 2: Table S1.
Authors’ contributions
YG, SKR, DJH, GAC, and PJF conceived and designed the experiments. YG
and AP performed the experiments. YG and SKR performed the data analysis.
YG, SKR, and PJF wrote and edited the manuscript. All authors read and
approved the final manuscript.
Ethics approval and consent to participate
Not applicable
Consent for publication
Not applicable
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in
published maps and institutional affiliations.
Author details
1 Department of Biochemistry and Molecular Medicine and the Norris
Comprehensive Cancer Center, Keck School of Medicine, University of
Southern California, 1450 Biggy Street, NRT 6503, Los Angeles, CA
90089-9601, USA.
2 Department of Biomedical Sciences and the Samuel
Oschin Comprehensive Cancer Institute, Cedars-Sinai Medical Center, Los
Angeles, CA 90048, USA.
3 Van Andel Research Institute, Grand Rapids, MI
49503, USA.
4 Department of Biochemistry and Molecular Medicine and the
Norris Comprehensive Cancer Center, Keck School of Medicine, University of
Southern California, 1450 Biggy Street, NRT G511B, Los Angeles, CA
90089-9601, USA.
Received: 17 May 2018 Accepted: 9 September 2018
References
1. Siegel RL, Miller KD, Jemal A. Cancer statistics, 2017. CA Cancer J Clin. 2017;
67:7–30.
2. Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M,
Pukkala E, Skytthe A, Hemminki K. Environmental and heritable factors in
the causation of cancer--analyses of cohorts of twins from Sweden,
Denmark, and Finland. N Engl J Med. 2000;343:78–85.
3. Al Olama AA, Kote-Jarai Z, Berndt SI, Conti DV, Schumacher F, Han Y,
Benlloch S, Hazelett DJ, Wang Z, Saunders E, et al. A meta-analysis of 87,040
individuals identifies 23 new susceptibility loci for prostate cancer. Nat
Genet. 2014;46:1103–9.
4. Thomas G, Jacobs KB, Yeager M, Kraft P, Wacholder S, Orr N, Yu K, Chatterjee N,
Welch R, Hutchinson A, et al. Multiple loci identified in a genome-wide
association study of prostate cancer. Nat Genet. 2008;40:310–5.
5. Eeles RA, Kote-Jarai Z, Al Olama AA, Giles GG, Guy M, Severi G, Muir K,
Hopper JL, Henderson BE, Haiman CA, et al. Identification of seven new
prostate cancer susceptibility loci through a genome-wide association
study. Nat Genet. 2009;41:1116–21.
6. Berndt SI, Wang Z, Yeager M, Alavanja MC, Albanes D, Amundadottir L,
Andriole G, Beane Freeman L, Campa D, Cancel-Tassin G, et al. Two
susceptibility loci identified for prostate cancer aggressiveness. Nat
Commun. 2015;6:6889.
7. Amin Al Olama SA, Schumacher F: Prostate cancer meta-analysis of more
than 140,000 men identifies 63 novel prostate cancer susceptibility loci.
Nature Genetics in press.
8. Han Y, Rand KA, Hazelett DJ, Ingles SA, Kittles RA, Strom SS, Rybicki BA,
Nemesure B, Isaacs WB, Stanford JL, et al. Prostate cancer susceptibility in
men of African ancestry at 8q24. J Natl Cancer Inst. 2016;108(7). https://doi.
org/10.1093/jnci/djv431. https://www.ncbi.nlm.nih.gov/pubmed/26823525.
9. Han Y, Hazelett DJ, Wiklund F, Schumacher FR, Stram DO, Berndt SI, Wang
Z, Rand KA, Hoover RN, Machiela MJ, et al. Integration of multiethnic fine-
mapping and genomic annotation to prioritize candidate functional SNPs at
prostate cancer susceptibility regions. Hum Mol Genet. 2015;24:5603–18.
10. Amin Al Olama A, Dadaev T, Hazelett DJ, Li Q, Leongamornlert D, Saunders
EJ, Stephens S, Cieza-Borrella C, Whitmore I, Benlloch Garcia S, et al. Multiple
novel prostate cancer susceptibility signals identified by fine-mapping of
known risk loci among Europeans. Hum Mol Genet. 2015;24:5589–602.
11. Rhie SK, Coetzee SG, Noushmehr H, Yan C, Kim JM, Haiman CA, Coetzee GA.
Comprehensive functional annotation of seventy-one breast cancer risk loci.
PLoS One. 2013;8:e63925.
12. Tak YG, Farnham PJ. Making sense of GWAS: using epigenomics and
genome engineering to understand the functional relevance of SNPs in
non-coding regions of the human genome. Epigenetics Chromatin. 2015;8:57.
13. Tak YG, Hung Y, Yao L, Grimmer MR, Do A, Bhakta MS, O'Geen H, Segal DJ,
Farnham PJ. Effects on the transcriptome upon deletion of a distal element
cannot be predicted by the size of the H3K27Ac peak in human cells.
Nucleic Acids Res. 2016;44:4123–33.
14. Hazelett DJ, Rhie SK, Gaddis M, Yan C, Lakeland DL, Coetzee SG, Henderson
BE, Noushmehr H, Cozen W, Kote-Jarai Z, et al. Comprehensive functional
annotation of 77 prostate cancer risk loci. PLoS Genet. 2014;10:e1004102.
15. Yao L, Tak YG, Berman BP, Farnham PJ. Functional annotation of colon
cancer risk SNPs. Nat Commun. 2014;5:5114.
16. Hazelett DJ, Conti DV, Han Y, Al Olama AA, Easton D, Eeles RA, Kote-Jarai Z,
Haiman CA, Coetzee GA. Reducing GWAS complexity. Cell Cycle. 2016;15:22–4.
17. Wedge DC, Gundem G, Mitchell T, Woodcock DJ, Martincorena I, Ghori M,
Zamora J, Butler A, Whitaker H, Kote-Jarai Z, et al. Sequencing of prostate
cancers identifies new cancer genes, routes of progression and drug
targets. Nat Genet. 2018;50:682–92.
18. Bello D, Webber MM, Kleinman HK, Wartinger DD, Rhim JS. Androgen
responsive adult human prostatic epithelial cell lines immortalized by
human papillomavirus 18. Carcinogenesis. 1997;18:1215–23.
19. Horoszewicz JS, Leong SS, Chu TM, Wajsman ZL, Friedman M, Papsidero L,
Kim U, Chai LS, Kakati S, Arya SK, Sandberg AA. The LNCaP cell line--a new
model for studies on human prostatic carcinoma. Prog Clin Biol Res. 1980;
37:115–32.
20. Thalmann GN, Anezinis PE, Chang SM, Zhau HE, Kim EE, Hopwood VL,
Pathak S, von Eschenbach AC, Chung LW. Androgen-independent cancer
progression and bone metastasis in the LNCaP model of human prostate
cancer. Cancer Res. 1994;54:2577–81.
21. Decker KF, Zheng D, He Y, Bowman T, Edwards JR, Jia L. Persistent androgen
receptor-mediated transcription in castration-resistant prostate cancer under
androgen-deprived conditions. Nucleic Acids Res. 2012;40:10765–79.
22. Korenchuk S, Lehr JE, MClean L, Lee YG, Whitney S, Vessella R, Lin DL, Pienta
KJ. VCaP, a cell-based model system of human prostate cancer. In Vivo.
2001;15:163–8.
23. Sramkoski RM, Pretlow TG 2nd, Giaconia JM, Pretlow TP, Schwartz S, Sy MS,
Marengo SR, Rhim JS, Zhang D, Jacobberger JW. A new human prostate
carcinoma cell line, 22Rv1. In Vitro Cell Dev Biol Anim. 1999;35:403–9.
24. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE,
Nusbaum C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of
ChIP-Seq (MACS). Genome Biol. 2008;9:R137.
25. Rao SS, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT,
Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the
human genome at kilobase resolution reveals principles of chromatin
looping. Cell. 2014;159:1665–80.
26. Luo Z, Rhie SK, Lay FD, Farnham PJ. A prostate cancer risk element
functions as a repressive loop that regulates HOXA13. Cell Rep. 2017;21:
1411–7.
27. Mumbach MR, Rubin AJ, Flynn RA, Dai C, Khavari PA, Greenleaf WJ, Chang
HY. HiChIP: efficient and sensitive analysis of protein-directed genome
architecture. Nat Methods. 2016;13:919–22.
28. Gomez-Marin C, Tena JJ, Acemel RD, Lopez-Mayorga M, Naranjo S, de la
Calle-Mustienes E, Maeso I, Beccari L, Aneas I, Vielmas E, et al. Evolutionary
comparison reveals that diverging CTCF sites are signatures of ancestral
topological associating domains borders. Proc Natl Acad Sci U S A. 2015;
112:7542–7.
29. Vietri Rudan M, Barrington C, Henderson S, Ernst C, Odom DT, Tanay A,
Hadjur S. Comparative Hi-C reveals that CTCF underlies evolution of
chromosomal domain architecture. Cell Rep. 2015;10:1297–309.
30. de Wit E, Vos ES, Holwerda SJ, Valdes-Quezada C, Verstegen MJ, Teunissen
H, Splinter E, Wijchers PJ, Krijger PH, de Laat W. CTCF binding polarity
determines chromatin looping. Mol Cell. 2015;60:676–84.
31. Guo Y, Xu Q, Canzio D, Shou J, Li J, Gorkin DU, Jung I, Wu H, Zhai Y, Tang Y,
et al. CRISPR inversion of CTCF sites alters genome topology and enhancer/
promoter function. Cell. 2015;162:900–10.
32. Ding Z, Ni Y, Timmer SW, Lee BK, Battenhouse A, Louzada S, Yang F,
Dunham I, Crawford GE, Lieb JD, et al. Quantitative genetics of CTCF
binding reveal local sequence effects and different modes of X-
chromosome association. PLoS Genet. 2014;10:e1004798.
33. Conway E, Healy E, Bracken AP. PRC2 mediated H3K27 methylations in
cellular identity and cancer. Curr Opin Cell Biol. 2015;37:42–8.
34. Nora EP, Goloborodko A, Valton AL, Gibcus JH, Uebersohn A, Abdennur N,
Dekker J, Mirny LA, Bruneau BG. Targeted degradation of CTCF decouples
local insulation of chromosome domains from genomic
compartmentalization. Cell. 2017;169:930–44 e922.
35. Rao SSP, Huang SC, Glenn St Hilaire B, Engreitz JM, Perez EM, Kieffer-Kwon
KR, Sanborn AL, Johnstone SE, Bascom GD, Bochkov ID, et al. Cohesin loss
eliminates all loop domains. Cell. 2017;171:305–20 e324.
36. Grimmer MR, Costello JF. Cancer: oncogene brought into the loop. Nature.
2016;529:34–5.
37. Flavahan WA, Drier Y, Liau BB, Gillespie SM, Venteicher AS, Stemmer-
Rachamimov AO, Suva ML, Bernstein BE. Insulator dysfunction and
oncogene activation in IDH mutant gliomas. Nature. 2016;529:110–4.
38. Gallagher MD, Posavi M, Huang P, Unger TL, Berlyand Y, Gruenewald AL,
Chesi A, Manduchi E, Wells AD, Grant SFA, et al. A dementia-associated risk
variant near TMEM106B alters chromatin architecture and gene expression.
Am J Hum Genet. 2017;101:643–63.
39. Lupianez DG, Spielmann M, Mundlos S. Breaking TADs: how alterations of
chromatin domains result in disease. Trends Genet. 2016;32:225–37.
40. Lupianez DG, Kraft K, Heinrich V, Krawitz P, Brancati F, Klopocki E, Horn D,
Kayserili H, Opitz JM, Laxova R, et al. Disruptions of topological chromatin
domains cause pathogenic rewiring of gene-enhancer interactions. Cell.
2015;161:1012–25.
41. Groschel S, Sanders MA, Hoogenboezem R, de Wit E, Bouwman BAM,
Erpelinck C, van der Velden VHJ, Havermans M, Avellino R, van Lom K, et al.
A single oncogenic enhancer rearrangement causes concomitant EVI1 and
GATA2 deregulation in leukemia. Cell. 2014;157:369–81.
42. Northcott PA, Lee C, Zichner T, Stutz AM, Erkek S, Kawauchi D, Shih DJ,
Hovestadt V, Zapatka M, Sturm D, et al. Enhancer hijacking activates GFI1
family oncogenes in medulloblastoma. Nature. 2014;511:428–34.
43. Hnisz D, Day DS, Young RA. Insulated neighborhoods: structural and
functional units of mammalian gene control. Cell. 2016;167:1188–200.
44. Hnisz D, Weintraub AS, Day DS, Valton AL, Bak RO, Li CH, Goldmann J,
Lajoie BR, Fan ZP, Sigova AA, et al. Activation of proto-oncogenes by
disruption of chromosome neighborhoods. Science. 2016;351:1454–8.
45. Ibn-Salem J, Kohler S, Love MI, Chung HR, Huang N, Hurles ME, Haendel M,
Washington NL, Smedley D, Mungall CJ, et al. Deletions of chromosomal
regulatory boundaries are associated with congenital disease. Genome Biol.
2014;15:423.
46. Potier M, Joulin V, Roger S, Besson P, Jourdan ML, Leguennec JY, Bougnoux
P, Vandier C. Identification of SK3 channel as a new mediator of breast
cancer cell migration. Mol Cancer Ther. 2006;5:2946–53.
47. Chantome A, Potier-Cartereau M, Clarysse L, Fromont G, Marionneau-
Lambot S, Gueguinou M, Pages JC, Collin C, Oullier T, Girault A, et al. Pivotal
role of the lipid Raft SK3-Orai1 complex in human cancer cell migration and
bone metastases. Cancer Res. 2013;73:4852–61.
48. Abrams HL, Spiro R, Goldstein N. Metastases in carcinoma; analysis of 1000
autopsied cases. Cancer. 1950;3:74–85.
49. Rana A, Chisholm GD, Khan M, Sekharjit SS, Merrick MV, Elton RA. Patterns
of bone metastasis and their prognostic significance in patients with
carcinoma of the prostate. Br J Urol. 1993;72:933–6.
50. Bubendorf L, Schopfer A, Wagner U, Sauter G, Moch H, Willi N, Gasser TC,
Mihatsch MJ. Metastatic patterns of prostate cancer: an autopsy study of
1,589 patients. Hum Pathol. 2000;31:578–83.
51. Potier M, Chantome A, Joulin V, Girault A, Roger S, Besson P, Jourdan ML,
LeGuennec JY, Bougnoux P, Vandier C. The SK3/K(Ca)2.3 potassium channel
is a new cellular target for edelfosine. Br J Pharmacol. 2011;162:464–79.
52. Girault A, Haelters JP, Potier-Cartereau M, Chantome A, Pinault M,
Marionneau-Lambot S, Oullier T, Simon G, Couthon-Gourves H, Jaffres
PA, et al. New alkyl-lipid blockers of SK3 channels reduce cancer cell
migration and occurrence of metastasis. Curr Cancer Drug Targets.
2011;11:1111–25.
53. Berthe W, Sevrain CM, Chantome A, Bouchet AM, Gueguinou M, Fourbon Y,
Potier-Cartereau M, Haelters JP, Couthon-Gourves H, Vandier C, Jaffres PA.
New disaccharide-based ether lipids as SK3 ion channel inhibitors.
ChemMedChem. 2016;11:1531–9.
54. Steinestel K, Eder S, Ehinger K, Schneider J, Genze F, Winkler E, Wardelmann
E, Schrader AJ, Steinestel J. The small conductance calcium-activated
potassium channel 3 (SK3) is a molecular target for Edelfosine to reduce the
invasive potential of urothelial carcinoma cells. Tumour Biol. 2016;37:
6275–83.
55. Wang LX, Li Y, Chen GZ. Network-based co-expression analysis for exploring
the potential diagnostic biomarkers of metastatic melanoma. PLoS One.
2018;13:e0190447.
56. Varga N, Mozes J, Keegan H, White C, Kelly L, Pilkington L, Benczik M,
Zsuzsanna S, Sobel G, Koiss R, et al. The value of a novel panel of cervical
cancer biomarkers for triage of HPV positive patients and for detecting
disease progression. Pathol Oncol Res. 2017;23:295–305.
57. Rhie SK, Yao L, Luo Z, Witt H, Schreiner S, Guo Y, Perez AA, Farnham PJ. ZFX
acts as a transcriptional activator in multiple types of human tumors by
binding downstream of transcription start sites at the majority of CpG
island promoters. Genome Res. 2018. https://doi.org/10.1101/gr.228809.117.
https://www.ncbi.nlm.nih.gov/pubmed/29429977.
58. O'Geen H, Echipare L, Farnham PJ. Using ChIP-Seq technology to generate
high-resolution profiles of histone modifications. Methods Mol Biol. 2011;
791:265–86.
59. O'Geen H, Frietze S, Farnham PJ. Using ChIP-seq technology to identify
targets of zinc finger transcription factors. Methods Mol Biol. 2010;649:
437–55.
60. Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, Nusbaum
C, Myers RM, Brown M, Li W, Liu XS. Model-based analysis of ChIP-Seq
(MACS). Genome Biol. 2008;9:R137.
61. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics. 2010;26:841–2.
62. Servant N, Varoquaux N, Lajoie BR, Viara E, Chen CJ, Vert JP, Heard E, Dekker
J, Barillot E. HiC-Pro: an optimized and flexible pipeline for Hi-C data
processing. Genome Biol. 2015;16:259.
63. Ay F, Bailey TL, Noble WS. Statistical confidence estimation for Hi-C data
reveals regulatory chromatin contacts. Genome Res. 2014;24:999–1011.
64. Akdemir KC, Chin L. HiCPlotter integrates genomic data with interaction
matrices. Genome Biol. 2015;16:198.
65. Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. Genome
engineering using the CRISPR-Cas9 system. Nat Protoc. 2013;8:2281–308.
66. Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P,
Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner.
Bioinformatics. 2013;29:15–21.
67. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for
differential expression analysis of digital gene expression data.
Bioinformatics. 2010;26:139–40.
68. McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of
multifactor RNA-Seq experiments with respect to biological variation.
Nucleic Acids Res. 2012;40:4288–97.
69. Risso D, Ngai J, Speed TP, Dudoit S. Normalization of RNA-seq data using
factor analysis of control genes or samples. Nat Biotechnol. 2014;32:
896–902.
70. Guo Y, Perez AA, Lay FD, Rhie SK, Farnham PJ. CRISPR-mediated deletion of
prostate cancer risk-associated CTCF loop anchors identifies repressive
chromatin loops. NCBI GEO. 2018. Datasets: https://www.ncbi.nlm.nih.gov/
geo/query/acc.cgi?acc=GSE118514.
71. Davis CA, Hitz BC, Sloan CA, Chan ET, Davidson JM, Gabdank I, Hilton JA,
Jain K, Baymuradov UK, Narayanan AK, et al. The Encyclopedia of DNA
elements (ENCODE): data portal update. Nucleic Acids Res. 2018;46:D794–801.
72. Moore J, Purcaro MJ, Pratt HE, Epstein CB, Shoresh N, Adrian J, Kawli T,
Davis CA, Dobin A, Kaul R, et al. ENCODE phase III: building an
encyclopedia of candidate regulatory elements for human and mouse.
In: Revision; 2018.
73. ENCODE Project Consortium. An integrated encyclopedia of DNA elements in
the human genome. Nature. 2012;489:57–74.
74. Sloan CA, Chan ET, Davidson JM, Malladi VS, Strattan JS, Hitz BC, Gabdank I,
Narayanan AK, Ho M, Lee BT, et al. ENCODE data at the ENCODE portal.
Nucleic Acids Res. 2016;44:D726–32.
ZFX acts as a transcriptional activator in multiple
types of human tumors by binding downstream
from transcription start sites at the majority
of CpG island promoters
Suhn Kyong Rhie, Lijun Yao, Zhifei Luo, Heather Witt, Shannon Schreiner, Yu Guo,
Andrew A. Perez, and Peggy J. Farnham
Department of Biochemistry and Molecular Medicine and the Norris Comprehensive Cancer Center, Keck School of Medicine,
University of Southern California, Los Angeles, California 90089, USA
High expression of the transcription factor ZFX is correlated with proliferation, tumorigenesis, and patient survival in
multiple types of human cancers. However, the mechanism by which ZFX influences transcriptional regulation has not been
determined. We performed ChIP-seq in four cancer cell lines (representing kidney, colon, prostate, and breast cancers)
to identify ZFX binding sites throughout the human genome. We identified roughly 9000 ZFX binding sites and found
that most of the sites are in CpG island promoters. Moreover, genes with promoters bound by ZFX are expressed at higher
levels than genes with promoters not bound by ZFX. To determine if ZFX contributes to regulation of the promoters to
which it is bound, we performed RNA-seq analysis after knockdown of ZFX by siRNA in prostate and breast cancer cells.
Many genes with promoters bound by ZFX were down-regulated upon ZFX knockdown, supporting the hypothesis that
ZFX acts as a transcriptional activator. Surprisingly, ZFX binds at +240 bp downstream from the TSS of the responsive
promoters. Using Nucleosome Occupancy and Methylome Sequencing (NOMe-seq), we show that ZFX binds between the
open chromatin region at the TSS and the first downstream nucleosome, suggesting that ZFX may play a critical role in
promoter architecture. We have also shown that a closely related zinc finger protein ZNF711 has a similar binding pattern
at CpG island promoters, but ZNF711 may play a subordinate role to ZFX. This functional characterization of ZFX provides
important new insights into transcription, chromatin structure, and the regulation of the cancer transcriptome.
[Supplemental material is available for this article.]
Altered transcriptomes are a general characteristic of human cancers.
In many cases, the transcriptional dysregulation is driven
by altered expression levels or activity of transcription factors
(TFs) (Yao et al. 2015; Rhie et al. 2016). There are about 2000
DNA-binding TFs in the human genome, but little is known about
most of these regulators (Vaquerizas et al. 2009; Wingender et al.
2015). We previously identified distinct sets of TFs having
increased expression associated with different cancers (Yao et al.
2015; Rhie et al. 2016). In contrast, ZFX, a zinc finger protein
(ZNF) that contains a DNA binding domain, has been implicated
in the initiation or progression of many different types of human
cancers, including prostate cancer, breast cancer, colorectal cancer,
renal cell carcinoma, glioma, gastric cancer, gallbladder
adenocarcinoma, non-small cell lung carcinoma, and laryngeal squamous
cell carcinoma (Zhou et al. 2011; Fang et al. 2012; Jiang et al.
2012b,c; Nikpour et al. 2012; Li et al. 2013; Fang et al. 2014a,b;
Yang et al. 2014; Weng et al. 2015). In these previous studies, it
was shown that high expression of ZFX is linked to tumorigenesis,
and knocking down ZFX can suppress cellular proliferation and
increase the proportion of apoptotic cells (Fang et al. 2014a; Jiang
and Liu 2015; Yang et al. 2015; Yan et al. 2016). In addition,
high ZFX expression correlates with poor survival of cancer
patients (Jiang and Liu 2015; Li et al. 2015; Yang et al. 2015; Yan
et al. 2016). For example, ZFX expression is significantly related
to histological grade (P-value < 0.001) in gallbladder adenocarcinoma,
and patients that survived <1 yr were found to have
significantly higher ZFX expression than patients that survived >1 yr
(Weng et al. 2015). Taken together, these studies suggest that
ZFX may function as an oncogene. However, the mechanism by
which ZFX may influence transcriptional regulation in such a
diverse set of human tumors has not been determined.
ZFX is encoded on the X Chromosome and highly conserved
in vertebrates. Among the roughly 2000 site-specific DNA-binding
TFs, the C2H2 ZNFs are the largest class encoded in the human
genome. Although the biological functions of the majority of ZNFs
are unknown, the molecular functions of ZNFs include not only
sequence-specific binding to DNA but also protein–protein
interactions and RNA binding (Stubbs et al. 2011; Najafabadi et al.
2015). DNA-binding ZNFs generally have multiple, adjacent,
properly spaced zinc fingers in their DNA binding domain; ZNFs with
fewer than three properly spaced fingers are more likely to be
involved in protein–protein or protein–RNA interactions (Brown
2005; Brayer and Segal 2008). ZFX has 13 C2H2-type zinc fingers
in its putative DNA binding domain, the last nine of which are
properly spaced, supporting the hypothesis that ZFX is a DNA
Corresponding author: peggy.farnham@med.usc.edu
Article published online before print. Article, supplemental material, and
publication date are at http://www.genome.org/cgi/doi/10.1101/gr.228809.117.
Freely available online through the Genome Research Open Access option.
© 2018 Rhie et al. This article, published in Genome Research, is available under
a Creative Commons License (Attribution 4.0 International), as described at
http://creativecommons.org/licenses/by/4.0/.
Research
310 Genome Research 28:310–320 Published by Cold Spring Harbor Laboratory Press; ISSN 1088-9051/18; www.genome.org
binding factor. ZFX contains a large acidic
activation domain in addition to the
C2H2-type zinc finger-containing DNA
binding domain, suggesting that, in contrast
to the hundreds of ZNFs that contain
a KRAB repression domain, ZFX
may be a transcriptional activator.
Although the structure of ZFX suggests
that it is a DNA binding transcriptional
activator that is expressed at high
levels in many different types of cancers,
ZFX binding sites have not yet been
mapped in cancer cells. To understand
the mechanism by which ZFX may regulate
the cancer transcriptome, we performed
ChIP-seq, NOMe-seq, and RNA-seq
assays with knockdown experiments
in HEK293T kidney, HCT116 colon,
C4-2B prostate, and MCF-7 breast cancer
cells, identifying ZFX-binding sites and
ZFX-regulated genes throughout the human
genome.
Results
ZFX binds to CpG island promoter
regions
To profile the genome-wide binding sites
of ZFX, we performed two biological ZFX
ChIP-seq replicates using chromatin
from HEK293T kidney, HCT116 colon,
C4-2B prostate, and MCF-7 breast cancer
cells (Fig. 1A; see Supplemental Fig. S1 for
ZFX antibody validation and Supplemental
Table S1 for access information
for all genomic data sets). We chose to
use these cancer models because there is
a strong link in these four cancer types
between ZFX expression and cell proliferation,
tumor development, or patient
survival. For example, prostate cancer tissues
exhibit significantly higher ZFX
expression than benign prostatic hyperplasia
and adjacent tissues and siRNA-mediated
knockdown of ZFX suppresses
the proliferation of prostate cancer cells
and reduces the number of colonies in
colony forming assays (Jiang et al.
2012a). Similarly, expression of ZFX is
high in advanced invasive breast cancers
and knockdown of ZFX reduces the proliferation rate of breast cancer
cells (Yang et al. 2014). High expression of ZFX promotes tumor
growth of colon cancer cells and colorectal cancer patients with
high ZFX expression have poorer overall and disease-free survival
(Jiang and Liu 2015). Moreover, knockdown of ZFX suppresses
proliferation and invasion of colon cancer cell lines (Jiang and Liu
2015). Finally, ZFX is significantly up-regulated in renal cell
carcinomas (RCC) and has been suggested to be a strong predictor for
prognosis of RCC patients (Li et al. 2015). Also, knockdown of
ZFX expression in renal carcinoma cells results in significantly
inhibited proliferation and cell cycle progression (Fang et al. 2014a).
We identified roughly 9000 reproducible ZFX binding sites in
each cancer cell line (HEK293T: 9955; HCT116: 9039; C4-2B: 8708;
MCF-7: 9382). Annotation of the ZFX binding sites with respect to
different genomic regions showed that ∼80% of the sites are located
in a promoter region (±2 kb of a TSS). In each of the cell types
examined, only about 1000 sites are located in distal elements
(i.e., distal sites having H3K27ac or CTCF peaks or other distal sites
that are not marked with H3K27ac or CTCF) (Fig. 1B). To further
classify the promoter binding sites, we determined whether ZFX
preferentially binds to housekeeping, CpG island promoters, or
to more cell-type–specific, non-CpG island promoters. We found
Figure 1. Genome-wide ZFX binding profiles in multiple types of human tumors. (A) ZFX ChIP-seq
data for a region of ∼150 Mb of Chromosome X for HEK293T kidney, HCT116 colon, C4-2B prostate,
and MCF-7 breast cancer cells (left) and for a region of ∼6 kb near the ZFX promoter (right). (B) The
percentage of ZFX binding sites in promoters (±2 kb from the TSS), distal enhancers (H3K27ac), distal
insulators (CTCF not in enhancers), and at other locations is shown for ZFX ChIP-seq data for four cell types.
(C) The number of ZFX binding sites located in CpG island promoters versus non-CpG island promoters is
shown for the four cell types. (D) Heatmap of the ChIP-seq signal correlation for ZFX binding sites in the
four tumor types. (E) Venn diagrams of ZFX binding sites near promoters (left) and distal regions (right)
for HEK293T kidney, HCT116 colon, C4-2B prostate, and MCF-7 breast cancer cells. (F) Examples of
cell-type–specific and common ZFX binding sites.
that the majority of the ZFX peaks are in
CpG island promoters (Fig. 1C). In fact,
we identified more than 13,000 CpG island
promoters that are bound by ZFX
in the union of the four cell lines, with
∼60% of all active CpG island promoters
in a given cell type being bound by ZFX,
including a strong peak at the promoter
of the ZFX gene (Fig. 1A, right; Supplemental
Tables S2, S3A–D).
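The promoter annotation described above can be illustrated with a short sketch. This is not the pipeline used in the study; it only assumes that each peak is represented by its summit and that a peak counts as promoter-proximal when the summit lies within ±2 kb of a TSS. The function names (`classify_peaks`, `promoter_fraction`) and the toy coordinates are hypothetical.

```python
from bisect import bisect_left

PROMOTER_WINDOW = 2000  # +/-2 kb around a TSS, the window quoted in the text


def classify_peaks(peak_summits, tss_by_chrom, window=PROMOTER_WINDOW):
    """Label each (chrom, summit) pair as 'promoter' or 'distal'.

    peak_summits: list of (chrom, position) tuples (assumed input format).
    tss_by_chrom: dict mapping chrom -> list of TSS positions.
    """
    sorted_tss = {chrom: sorted(pos) for chrom, pos in tss_by_chrom.items()}
    labels = []
    for chrom, summit in peak_summits:
        positions = sorted_tss.get(chrom, [])
        i = bisect_left(positions, summit)
        # The nearest TSS is either just left or just right of the insertion point.
        neighbours = positions[max(i - 1, 0):i + 1]
        near = any(abs(summit - tss) <= window for tss in neighbours)
        labels.append("promoter" if near else "distal")
    return labels


def promoter_fraction(labels):
    """Fraction of peaks labeled 'promoter' (0.0 for an empty list)."""
    return labels.count("promoter") / len(labels) if labels else 0.0


# Toy example with made-up coordinates.
tss = {"chr1": [10_000, 50_000]}
peaks = [("chr1", 10_500), ("chr1", 30_000), ("chr1", 48_001), ("chr2", 100)]
labels = classify_peaks(peaks, tss)
```

Splitting the promoter class further into CpG island and non-CpG island promoters would require a CpG island annotation, which this sketch does not attempt.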
The ZFX binding pattern at promoter regions is very similar in different cancer types
Many oncogenic TFs bind to distinct cell-type–specific
distal regulatory elements
in different types of tumors (Rhie et al.
2016). However, the majority of ZFX
peaks are in promoter regions, suggesting
that ZFX may bind to regulatory elements
that are common to all cell types,
rather than to cell-type–specific regulatory
elements. To further analyze the ZFX
binding patterns, we compared the ZFX
binding sites in the four cancer cell lines;
the binding patterns are similar to each
other in general (correlation coefficient
>0.5) (Fig. 1D). We then separated the
peaks into TSS proximal and non-TSS
(>2 kb from a TSS). We found that the
ZFX binding sites in promoter regions
are largely shared among the four cell
lines, with ZFX commonly binding to
more than 5000 promoter peaks in all
four cell lines (Fig. 1E; Supplemental
Table S3E). In contrast, the distal sites
bound by ZFX are not always the same
in the different cell types. We note that
both the common and the cell-type–specific
ZFX binding sites are robust and
reproducible (Fig. 1F).
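The comparison of peak sets across cell lines can be sketched as an interval-overlap test. The example below is a minimal illustration under assumed conventions (half-open coordinates, any overlap of at least 1 bp counts as shared). It is not the procedure used to generate Fig. 1E, and the peak tuples and function names are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open intervals share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def common_peaks(reference, others):
    """Peaks from `reference` that overlap a peak in every peak set in `others`."""
    return [
        peak
        for peak in reference
        if all(any(overlaps(peak, q) for q in peak_set) for peak_set in others)
    ]


# Toy peak sets with made-up coordinates for three hypothetical cell lines.
line_a = [("chr1", 100, 200), ("chr1", 500, 600)]
line_b = [("chr1", 150, 250), ("chr1", 550, 650)]
line_c = [("chr1", 180, 300)]
shared = common_peaks(line_a, [line_b, line_c])
```

The nested scan is quadratic in the number of peaks, which is acceptable for a toy example; for peak sets of roughly 9000 sites per cell line, a sorted sweep or a dedicated interval tool would be the more usual choice.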
ZFX motifs are enriched at 240 bp
upstream of and downstream from
the TSS in CpG island promoters,
but ZFX prefers to bind downstream
from the TSS
To determine the preferred binding motif
for ZFX, we performed motif analyses
using 20-bp windows from ZFX peak
summits. ZFX has nine properly spaced
zinc fingers; because a zinc finger can
bind to 3 nt of DNA (Desjarlais and
Berg 1992), one would expect a 27-nt
motif if all of these fingers are involved
in DNA binding. However, the motif we
identified in the majority of the ZFX
peaks was only 8 nt (AGGCCTAG) with
a strong 6-nt consensus (AGGCCT) (Fig.
2A). This motif was originally identified
Figure 2. Characterization of ZFX motifs and binding sites. (A) Motifs enriched at summits of the ZFX
binding sites and the percentage of ZFX peaks having each motif for HEK293T kidney, HCT116 colon,
C4-2B prostate, and MCF-7 breast cancer cells; the same motifs were identified in the set of ZFX peaks
located near a TSS and the set of distal ZFX peaks. (B) Average ZFX ChIP-seq signal in C4-2B relative to ±2
kb from the motif AGGCCTAG. (C) Histogram of the number of motifs per ZFX peak. (D) Scatterplot of
the relationship between the number of motifs per ZFX peak and ZFX peak width. (E) Number of
AGGCCTAG motifs per TSS region (±2 kb from the TSS) for different groups of promoters: (light blue)
CpG island promoters; (blue) non-CpG island promoters; (light green) CpG island promoters bound
by ZFX; (green) non-CpG island promoters bound by ZFX; (light pink) CpG island promoters not bound
by ZFX; (red) non-CpG island promoters not bound by ZFX. Comparisons of data sets that show a
significant difference (adjusted P-value < 0.05) are marked with an asterisk. (F) Example of ZFX binding site
and motif position at CpG island promoters having one (MAP2K2) or more (ZNF260, SOCS6) motifs. (G)
Frequency of AGGCCTAG motifs located ±2 kb from the TSS of CpG island promoters, of non-CpG island
promoters, of CpG island promoters bound by ZFX, and of CpG island promoters not bound by ZFX; a
comparison to results obtained using a scrambled motif can be found in Supplemental Figure S2. (H)
Average ZFX ChIP-seq signal ±2 kb from the TSS of promoters bound by ZFX in MCF-7. (I) Heatmap
representing unsupervised clustering of ZFX ChIP-seq signals in MCF-7 cells at promoters bound by
ZFX that only have one TSS in a ±2 kb region (n = 3961). Also shown are examples of ZFX binding sites
from the light green cluster (promoters having a ZFX peak only upstream of the TSS), the red cluster
(promoters having a ZFX peak both up and downstream from the TSS), and the light blue cluster
(promoters having a ZFX peak only downstream from the TSS). Genes from each cluster are listed in
Supplemental Table S3F.
from ZNF711 ChIP-seq data from the brain tumor cell line SH-SY5Y
(Kleine-Kohlbrecher et al. 2010) and from ZFX ChIP-seq
data from mouse embryonic stem cells (Chen et al. 2008). We
also identified several other CT-containing motifs, such as the motif
for AP-2. It is possible that multiple sets of zinc fingers in ZFX
may recognize a repeating unit of a short motif. Alternatively, it
is possible that ZFX utilizes only a subset of its fingers to bind
DNA; we have previously described this situation for two artificial
six-finger ZNFs (Grimmer et al. 2014). To further analyze the ZFX
binding preferences, we focused on the AGGCCTAG motif, which
is found in >90% of the ZFX peaks in each cell line. We showed
that the motif is centered in the ZFX peaks, suggesting that this
motif directly recruits ZFX (Fig. 2B). However, we also found that
some ZFX peaks have many copies of the motif (Fig. 2C), with larger
peaks having, in general, more copies (Fig. 2D). Next, we
asked whether this motif is found in all promoter regions or only
in promoters bound by ZFX. We found that 48,373 of 57,820
promoters of all known genes in the human genome have at least
one copy of this motif ±2 kb from the TSS. In fact, there are multiple
copies of this motif in most promoter regions, with more motifs
in CpG island promoters (Fig. 2E). We note that promoters
bound by ZFX have, on average, slightly more motifs than promoters
not bound by ZFX. However, the number of motifs does not
necessarily correspond to the number of ZFX peaks in a promoter.
Some promoters, like MAP2K2, have one motif and one binding
site. Other promoters, such as ZNF260 and SOCS6, have multiple
copies of the motif, but not all of the motifs are bound by ZFX
(Fig. 2F).
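The motif counts per promoter described above can be illustrated with a minimal sketch. It assumes that a match on either strand counts as one occurrence and that overlapping matches are allowed; the function names are hypothetical, and this is not the HOMER-based procedure used in the study.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A", "N": "N"}
    return "".join(comp[base] for base in reversed(seq.upper()))


def count_motif(seq: str, motif: str = "AGGCCTAG") -> int:
    """Count motif matches on both strands of seq (overlaps allowed).

    A set of target strings is used so that a palindromic motif
    would not be counted twice at the same position.
    """
    seq = seq.upper()
    targets = {motif.upper(), reverse_complement(motif)}
    k = len(motif)
    return sum(seq[i:i + k] in targets for i in range(len(seq) - k + 1))


# Toy sequences (not real promoters): one forward-strand match,
# then one reverse-strand match overlapping one forward-strand match.
print(count_motif("TTAGGCCTAGTT"))
print(count_motif("CTAGGCCTAG"))
```

In practice the input would be the ±2 kb sequence around each TSS, which is outside the scope of this sketch.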
We further investigated the distribution of the AGGCCTAG
motif relative to the TSS. We found that this motif is symmetrically
enriched ±240 bp from the TSS, with CpG island promoters showing
greater enrichment than non-CpG island promoters and CpG
island promoters bound by ZFX showing greater enrichment than
those not bound by ZFX (Fig. 2G). The symmetrical distribution of
the motif ±240 bp relative to the TSS is unusual and specific for the
ZFX motif and not scrambled variants (Supplemental Fig. S2).
Therefore, we asked whether ZFX binding has a similar distribution
or, as for most TFs, if ZFX binds mainly upstream of the TSS.
We found that ZFX has a stronger preference for binding at +240
bp downstream from the TSS (Fig. 2H). The presence of
some peaks at −240 bp could be due to the inclusion of bidirectional
promoters. To test this hypothesis, we selected the ZFX
binding sites that have only one known TSS within a ±2 kb window
(n = 3961) and plotted heatmaps centered on each TSS (Fig.
2I; Supplemental Table S3F). Most of the time, ZFX is bound at
+240 bp of the TSS (e.g., FBXO6). However, there are a small number
of promoters (<10%) that have a ZFX peak at −240 bp (e.g.,
SEPT11), and some promoters that have two ZFX peaks symmetrically
located on either side of the TSS (e.g., SLC7A5). We conclude
that the presence of the motif, which is symmetrically enriched ±240
bp from the TSS, may be necessary, but is not sufficient, to recruit
ZFX because ZFX has a higher binding frequency at +240 bp
than at −240 bp.
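Positions such as +240 bp and −240 bp are strand-aware offsets from the TSS. The following minimal sketch shows one way to compute them, assuming 1-based genomic coordinates and a single TSS per promoter; the function names and the labels returned are hypothetical and are not taken from the analysis pipeline.

```python
def signed_offset(summit: int, tss: int, strand: str) -> int:
    """Offset of a peak summit from the TSS in the direction of transcription.

    Positive values are downstream from the TSS, negative values upstream.
    """
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    return summit - tss if strand == "+" else tss - summit


def classify_offset(offset: int, window: int = 2000) -> str:
    """Label a summit as upstream, downstream, at the TSS, or outside ±window."""
    if abs(offset) > window:
        return "outside"
    if offset > 0:
        return "downstream"
    if offset < 0:
        return "upstream"
    return "at_tss"


# A summit at genomic position 760 is downstream of a minus-strand TSS at 1000.
print(signed_offset(760, 1000, "-"), classify_offset(signed_offset(760, 1000, "-")))
```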
ZFX has properties of a transcriptional activator
To determine if ZFX functions as a transcriptional activator or repressor,
we separately analyzed expression levels of genes regulated
by promoters that are bound by ZFX versus those promoters not
bound by ZFX. We found that the median expression level of genes
with promoters bound by ZFX is much higher than the median expression
level of genes with promoters not bound by ZFX (Fig. 3A),
suggesting that ZFX may be a transcriptional activator. To gain insight
into possible mechanisms by which ZFX might activate transcription,
we used transient transfection with siRNA to knock
down the levels of ZFX in C4-2B prostate cancer cells and identified
1271 genes whose expression decreased and 1249 genes whose
expression increased upon reduction of ZFX levels in C4-2B cells
Figure 3. The role of ZFX in transcription regulation. Expression levels of genes with active promoters bound by ZFX and genes with active promoters not
bound by ZFX are shown for C4-2B (A) and MCF-7 (D); active promoters are defined by detectable expression of a transcript from that promoter in that
particular cell line. Volcano plots demonstrate differential gene expression after knockdown of ZFX in C4-2B (B) and MCF-7 (E). Comparisons of the
percentage of down-regulated versus up-regulated genes that have ZFX bound at their promoters are shown for C4-2B (C) and MCF-7 (F) cells.
ZFX activates transcription at CGI promoters
GenomeResearch 313
www.genome.org
(FDR <0.05, fold change >1.5) (Fig. 3B; Supplemental Table S4A).
Genes identified as responsive to changes in the level of a TF in-
clude both direct target genes and genes that are in downstream
signaling pathways regulated by the direct target genes (i.e., indirect
targets). One approach to identify direct ZFX target genes is
to determine which of the deregulated genes have ZFX binding
sites in their promoter regions. We found that the promoters of
744 of the 1271 down-regulated genes (58.5%), but only 143 promoters
of the 1249 up-regulated genes (11.4%) are bound by ZFX
in C4-2B cells (Fig. 3C).
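The percentages above follow directly from the reported counts. As a small worked check (the odds ratio is an added illustration and is not reported in the text):

```python
def percent(part: int, total: int) -> float:
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


# Counts reported for C4-2B cells: ZFX-bound promoters among deregulated genes.
down_bound, down_total = 744, 1271
up_bound, up_total = 143, 1249

down_pct = percent(down_bound, down_total)
up_pct = percent(up_bound, up_total)

# Odds of being ZFX-bound for down-regulated versus up-regulated genes
# (an illustrative summary statistic, not a value from the paper).
odds_ratio = (down_bound / (down_total - down_bound)) / (up_bound / (up_total - up_bound))

print(f"down-regulated bound: {down_pct:.1f}%")
print(f"up-regulated bound:   {up_pct:.1f}%")
print(f"odds ratio:           {odds_ratio:.1f}")
```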
We repeated the siRNA experiments in MCF-7 breast cancer
cells. Again, we found that genes having a bound ZFX in their promoter
had a median higher expression level than genes without a
bound ZFX (Fig. 3D). However, when we knocked down ZFX, we
identified only 418 genes whose expression decreased and 183
genes whose expression increased (Fig. 3E; Supplemental Table
S4B). Although the number of deregulated genes in the MCF-7
knockdown experiments is less than in the C4-2B knockdown experiments,
the down-regulated genes again have a higher percentage
of promoter-bound ZFX (59.1%) than do the up-regulated
genes (17.5%) (Fig. 3F).
One explanation for the smaller effect on the transcriptome
of MCF-7 cells could be inefficient
knockdown of ZFX. However, the
reduction in ZFX was similar in the
siRNA-treated C4-2B and MCF-7 cells
(Supplemental Fig. S3A). An alternative
explanation could be that another TF is
functionally redundant with ZFX in
MCF-7. C2H2 ZNFs comprise the largest
class of site-specific DNA-binding pro-
teins encoded in the human genome
and have arisen through gene duplica-
tion followed by mutation. Specifically,
ZFX is very similar to ZFY and ZNF711
(Supplemental Fig. S3B); ZFX and
ZNF711 are both encoded on the X Chromosome,
whereas ZFY is located on the Y
Chromosome. Overall protein homology
is 92% between ZFX and ZFY, with the
zinc finger domains having 97% homology,
suggesting that these two proteins
may have fully redundant activities
(Schreiber et al. 2014). However, MCF-7
are female breast cancer cells and thus
do not express ZFY (although C4-2B are
male, they also do not express ZFY).
There is 55% identity between the entire
ZFX and ZNF711 proteins, with the zinc
finger domains having 87% identity,
also suggesting that these two TFs may
have similar functions. ZNF711 is not expressed
in C4-2B, but it is expressed in
MCF-7 with the expression slightly increasing
upon knockdown of ZFX (Supplemental
Fig. S3C). To investigate the
possible functional redundancy of these
two TFs, in a set of siRNA experiments
independent of those shown in Figure 3, we
knocked down ZFX, ZNF711, or both
TFs simultaneously in MCF-7 (Supple-
mental Table S4C). We again observed
an increase in ZNF711 expression when ZFX was knocked down
(Fig. 4A). Several thousand genes changed upon knockdown of
ZFX, but very few genes changed upon knockdown of ZNF711.
However, we detected more differentially expressed genes (n=
1847, FDR <0.05, fold change >1.5) upon knockdown of both
ZFX and ZNF711 in MCF-7 cells than in the combined single
knockdown experiments (Fig. 4B). In the double knockdown, we
identified 371 additional down-regulated genes that have ZFX
bound to their promoters in MCF-7 cells (Fig. 4C). These findings
suggest that ZNF711 may substitute for ZFX when ZFX expression
is reduced by knockdown in MCF-7 cells; similar results were
found when ZNF711 and ZFX were knocked down in HEK293T
cells (Supplemental Fig. S4).
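The comparison between single and double knockdowns amounts to a set difference. The sketch below is illustrative only: FAAH and LIPT2 are named in the text as genes reduced in the double knockdown but not in the single knockdowns, whereas GENE_A, GENE_B, and GENE_C are hypothetical placeholders, and the function name is an assumption.

```python
def double_knockdown_specific(single_a, single_b, double):
    """Genes deregulated in the double knockdown but in neither single knockdown."""
    return set(double) - (set(single_a) | set(single_b))


# Toy down-regulated gene lists (placeholders except FAAH and LIPT2).
zfx_kd = {"GENE_A", "GENE_B"}
znf711_kd = {"GENE_C"}
double_kd = {"GENE_A", "GENE_B", "GENE_C", "FAAH", "LIPT2"}

print(sorted(double_knockdown_specific(zfx_kd, znf711_kd, double_kd)))
```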
To gain further support for the hypothesis that ZFX and
ZNF711 are functionally redundant, we performed ChIP-seq for
ZNF711 in MCF-7 cells. We identified 2708 ZNF711 binding sites
genome-wide (Supplemental Table S5), 98.6% of which overlapped
with ZFX binding sites in MCF-7 cells. As expected,
ZNF711 binding sites were also enriched at CpG island promoter
regions, both in MCF-7 cells and in SH-SY5Y cells (Supplemental
Fig. S5A,B). Unlike the ZFX binding pattern in the four cancer
cell types, ZNF711 appears to have more cell-type–specific binding
Figure 4. Combinatorial knockdown of ZFX and ZNF711 in MCF-7 cells. (A) ZFX and ZNF711 expression
levels upon knockdown of ZFX, ZNF711, or both TFs in MCF-7. Comparisons of data sets that show a
significant difference are marked with an asterisk (FDR <0.05). (B) Volcano plots demonstrating differential
gene expression after knockdown (kd) of ZFX, ZNF711, or both TFs. (C) Comparison of down-regulated
genes having ZFX bound at their promoters after knockdown of ZFX, ZNF711, or both TFs.
sites (Supplemental Fig. S5). A comparison of ChIP-seq tags shows
enrichment of ZNF711 at the promoter regions bound by ZFX, but
the ZNF711 ChIP-seq signals are weaker than the ZFX ChIP-seq signals
in MCF-7 cells (Fig. 5A), perhaps due to differences in expression
levels of the two TFs (see Fig. 4A). Importantly, the ZNF711
binding sites are also enriched at +240 bp downstream from the
TSS (Fig. 5B). A comparison of the set of ZNF711-bound active promoters
to the set of ZFX-bound active promoters revealed that 99%
of promoters bound by ZNF711 are also bound by ZFX (Fig. 5C).
This binding site redundancy supports the hypothesis that
ZNF711 can substitute for ZFX when ZFX is knocked down. For example,
FAAH and LIPT2 show statistically significantly reduced expression
in the double knockdown cells (FDR <0.05) but not in the
single knockdown cells, and the promoters of the FAAH and LIPT2
genes are bound by both ZFX and ZNF711 in MCF-7 cells (Fig. 5D).
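The fraction of ZNF711 sites that overlap ZFX sites can be illustrated with a minimal sketch. It assumes half-open (chrom, start, end) intervals and a requirement of at least 1 bp of overlap; the exact overlap criterion used in the study is not stated here, and the quadratic scan is for clarity only.

```python
def overlaps(a, b) -> bool:
    """True if two half-open intervals (chrom, start, end) share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def fraction_overlapping(query, reference) -> float:
    """Fraction of query intervals overlapping any reference interval."""
    if not query:
        return 0.0
    hits = sum(any(overlaps(q, r) for r in reference) for q in query)
    return hits / len(query)


# Toy peaks (hypothetical coordinates, not from the data sets in this study).
znf711_peaks = [("chr1", 100, 300), ("chr1", 1000, 1200), ("chr2", 50, 150)]
zfx_peaks = [("chr1", 250, 400), ("chr2", 100, 200)]

print(fraction_overlapping(znf711_peaks, zfx_peaks))
```

For genome-scale peak sets, a sorted sweep or an interval tree would be a more efficient choice than this pairwise comparison.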
ZFX binds adjacent to the first phased nucleosome downstream
from the TSS
As shown above, ZFX motifs are enriched at 240 bp both upstream
of and downstream from the TSS, but the majority of ZFX peaks are
located at +240 bp. However, it was possible that the promoters
activated by ZFX have a distinct binding pattern compared to all ZFX
peaks. Therefore, we compared ZFX binding patterns at promoters
of all expressed genes versus genes down-regulated or up-regulated
upon ZFX knockdown (Fig. 6A). Interestingly, the down-regulated
genes (genes that may be directly activated by ZFX) show a nicely
positioned ZFX bound at +240 bp. In contrast, the up-regulated
genes (genes that may be repressed or indirectly regulated by
ZFX) are not as highly enriched for ZFX at the +240 bp position;
rather there were more peaks at the −240 bp position for up-regulated
genes (especially in MCF-7 cells). Taken together, these results
suggest that ZFX may be a positive activator only when
bound at +240 bp of the TSS.
The preferred location of ZFX at +240 bp is a unique position
for a DNA binding TF. To further characterize the relationship between
the bound ZFX and open chromatin surrounding the TSS,
we used Nucleosome Occupancy and Methylome Sequencing
(NOMe-seq). This genome-wide method identifies nucleosome-depleted
regions (NDRs) and provides single molecule resolution
for both accessibility and DNA methylation, which can very precisely
identify specific TF binding sites (Kelly et al. 2012). When
we used NOMe-seq to profile accessibility and DNA methylation
in MCF-7 and C4-2B cells, we found that promoters bound by
ZFX have a more accessible region with lower levels of DNA methylation
near the TSS and more highly phased nucleosomes downstream
from the TSS compared to promoters that are active in those
cells but not bound by ZFX (Fig. 6B). Although ZFX ChIP-seq does
not allow precise positioning of the bound ZFX, it appears that the
summit of the ZFX peak is located in the NDR downstream from
both the TSS and a bound RNA Polymerase II (RNAPII), just upstream
of the first phased nucleosome (Fig. 6C,D). Indeed, >70%
of ZFX peaks that have a summit near +240 bp of the TSS overlapped
with NDRs called by NOMe-seq in MCF-7 and C4-2B cells
(Supplemental Table S6). Although Figure 6D shows the pattern
for all ZFX-bound promoters in MCF-7 cells, a similar pattern is
also seen if the small subset of promoters bound by ZFX only in
MCF-7 cells is analyzed (Supplemental Fig. S6).
Discussion
We profiled ZFX binding sites genome-wide in kidney, colon, prostate,
and breast cancer cell lines. Unlike many oncogenic TFs that
bind to distal elements, ZFX binds to the majority of CpG island
promoters that are active in cancer cells,
and many genes with promoters bound
by ZFX were down-regulated upon
knockdown. Surprisingly, ZFX binds at
+240 bp downstream from the TSS of
ZFX-regulated promoters, in the open
chromatin region between the TSS and
the first downstream nucleosome.
Genome-wide analyses of open chroma-
tin and DNA methylation demonstrate
that promoters bound by ZFX have a
more accessible region, with lower levels
of DNA methylation near the TSS and
more highly phased nucleosomes downstream
from the TSS, compared to promoters
that are active but not bound by
ZFX. Taken together, these findings support
the hypothesis that ZFX may act as a
transcription activator and play an important
role in maintaining a nucleosome-free
promoter region and/or in
positioning nucleosomes at many CpG
island promoters in the human genome.
In accordance with findings from
previous studies (Fang et al. 2012,
2014a; Jiang et al. 2012a; Yang et al.
2014, 2015), we found that the top categories
of genes affected by ZFX knockdown
are related to the cell cycle, to the
DREAM complex (which contains E2F
Figure 5. Comparison of ZFX and ZNF711 binding at promoters in MCF-7 cells. (A) Scatter plot of the
normalized ZFX versus ZNF711 ChIP-seq tags for the union set of ZFX and ZNF711 binding sites found in
promoters (ρ = 0.85, Spearman's rank correlation coefficient). (B) Average ZFX (red) and ZNF711 (blue)
ChIP-seq signal ±2 kb from the TSS of promoters bound by ZNF711 in MCF-7. (C) Comparison of expressed
genes having ZFX or ZNF711 bound at their promoters in MCF-7. (D) Examples of ZFX and
ZNF711 binding at CpG island promoters for two genes down-regulated upon knockdown of both
ZFX and ZNF711 in MCF-7.
family members), and/or to genes regulated by E2F family members
(Supplemental Fig. S7). For example, cell division cycle 27 homolog
(CDC27), a component of the anaphase promoting
complex/cyclosome that ubiquitinates Cyclin B (Lee and
Langhans 2012), and MutL homolog 3 (MLH3), which is implicated
in maintaining DNA replication and mismatch repair (Lipkin
et al. 2000), both have a ZFX binding site downstream from the
TSS, and their expression levels are decreased upon ZFX knockdown
in both MCF-7 and C4-2B. Thus, our results support previous
studies showing that ZFX expression is linked to cell proliferation. We
also mapped ZFX binding sites in human normal prostate epithelial
cells (PrEC) (Supplemental Fig. S8; Supplemental Table S3G).
Although the ZFX binding pattern in normal prostate cells is
very similar to the ZFX binding pattern in the prostate cancer
Figure 6. The relationship between ZFX binding and chromatin structure at promoters. (A) Average ZFX ChIP-seq signals ±2 kb from the TSS of
down-regulated (red), up-regulated (blue), and all (black) genes bound by ZFX in MCF-7 (top) and C4-2B (bottom). (B) Endogenous DNA methylation (HCG)
(black) and the accessibility (GCH) (green) levels from NOMe-seq data ±1 kb from the TSS of active promoters bound by ZFX and from the TSS of active
promoters not bound by ZFX in MCF-7 (top) and C4-2B (bottom). (C) Examples of ZFX binding sites with ZFX ChIP-seq, NOMe-seq, H3K4me3 ChIP-seq,
and RNA Polymerase II ChIP-seq signals in MCF-7 (top) and C4-2B (bottom). (D) Average ZFX (black), H3K4me3 (red), and RNA Polymerase II (orange)
ChIP-seq signals ±1 kb from the TSS of genes bound by ZFX in MCF-7. (E) A model demonstrating the relationship of ZFX to other components of CpG island
promoter structure. ZFX binds at +240 bp in the nucleosome-depleted region of CpG island promoters, between the general transcription preinitiation
complex (PIC) and the first nucleosome in the transcribed region. When ZFX is bound to this downstream site, it increases the expression levels of genes
involved in cell proliferation; the wavy lines represent RNA levels.
cell line, the ChIP-seq peaks in PrEC were considerably smaller,
suggesting that high ZFX expression in cancer cells may result in
stronger binding and higher expression of genes involved in cell
proliferation.
What distinguishes a “functional” bound ZFX from
a “nonfunctional” bound ZFX?
Although ZFX binds to approximately 8000–9000 promoters in a
given cell type, siRNA-mediated knockdown of ZFX resulted in altered
activity of only a subset of these promoters. Although the
ZFX motif is enriched symmetrically ±240 bp from the TSS, our results
suggest that ZFX acts as a transcriptional activator only when
bound at +240 bp. However, not all promoters with a ZFX bound at
+240 bp responded in the knockdown experiments. There are several
possibilities that can explain why reduction of ZFX levels only
affected a small percentage of promoters to which it is bound. First,
it is possible that the incomplete knockdown of ZFX by siRNA
treatment may have prevented the identification of all ZFX-regulated
genes. In the future, knockout of ZFX by CRISPR/Cas9 could
be performed to determine if a larger set of ZFX-regulated genes is
identified upon complete removal of ZFX from the cell. It is also
possible that cobinding of ZFX with other TFs is required for ZFX
to regulate transcription. Finally, there is the possibility that other
TFs are functionally redundant with ZFX. We tested the possibility
that ZNF711, a TF that shares high homology and a similar DNA
binding motif to ZFX, can substitute for ZFX. Indeed, ZNF711
binding sites are shared by ZFX binding sites, and we identified
several hundred additional ZFX-bound target genes that are
down-regulated upon knockdown of both ZFX and ZNF711; perhaps
complete loss of both proteins is required to observe the
full effect of ZFX on the transcriptome.
How does ZFX regulate transcription of CpG island promoters
from a downstream position?
There are two main types of transcriptional regulatory elements,
promoters and enhancers. Unlike enhancers, which are located
far from a TSS, are cell-type specific, and are closely linked to cellular
identity (Rhie et al. 2014, 2016), promoter elements are crucial
for basal transcription of genes. The majority of human promoters
are classified as CpG island promoters; these promoters are generally
active in most cell types (Deaton and Bird 2011). Interestingly,
ZFX is bound to most of the active CpG island promoters in a given
cell. Other TFs have been shown to preferentially bind to CpG island
promoters (Rozenberg et al. 2008; Jaeger et al. 2010;
Landolin et al. 2010; Blattler et al. 2013). However, these CpG
island-binding TFs tend to bind upstream of the TSS (Cao et al.
2011), whereas ZFX binds 240 bp downstream from +1.
Comparison of the binding patterns of ZFX with RNAPII and
H3K4me3 revealed that the bound ZFX is slightly downstream
from the RNAPII signal and slightly upstream of the downstream
peak of H3K4me3 signal (Fig. 6D). Although it is possible that
ZFX regulates release of a paused RNAPII, factors implicated in
this process are usually bound at +30 to +40 bp relative to the
TSS (Krumm et al. 1995; Shao and Zeitlinger 2017). It is unlikely
that ZFX is involved in splicing, since the binding site can be in
the first exon or at various places within the first intron, depending
on the size of the first exon. Moreover, RNAPII and H3K4me3 signals
are more enriched at ZFX-bound promoters than at promoters
not bound by ZFX (Supplemental Fig. S9); these findings are consistent
with a role for ZFX in transcriptional (not post-transcriptional)
regulation. ZFX does appear to be uniquely placed in
relation to the phased nucleosomes located downstream from
the TSS, and ZFX-bound promoters have a more open region
near the start site than do promoters that are active but not bound
by ZFX. Therefore, we postulate that perhaps ZFX is involved in
creating a nucleosome-depleted region in CpG island promoters
by recruiting the transcription preinitiation complex and/or in
positioning the downstream nucleosomes (Fig. 6E).
In conclusion, we profiled ZFX binding sites genome-wide in
kidney, colon, prostate, and breast cancer cells and found that ZFX
may function as a transcriptional activator, regulating as many as
60% of active CpG island promoters. Because tumor cells require
abnormally high levels of transcription for their inappropriate
proliferation and survival, increased overall transcription mediated by
ZFX may explain why this TF has been correlated with poor prognosis
for a variety of human cancers (Jiang and Liu 2015; Li et al.
2015; Yang et al. 2015; Yan et al. 2016). Future studies will focus
on testing the hypothesis that ZFX contributes to overall high levels
of transcription via a role in maintaining the large NDR found
at the ZFX-bound promoters. Our demonstration that ZNF711, a
TF highly related to ZFX, has a similar binding pattern suggests
that we may have identified a new class of regulatory TFs.
Further characterization of these TFs and their role in gene regulation
will provide important new insights into transcription, chromatin
structure, and the regulation of the cancer transcriptome.
Methods
Cell culture
The human kidney HEK293T (ATCC# CRL-3216), colon HCT116
(ATCC# CCL-247), and breast cancer MCF-7 (ATCC# HTB-22) cells
were obtained from ATCC (https://www.atcc.org/). The human
prostate cancer C4-2B cells were obtained from ViroMed
Laboratories. The human normal prostate epithelial cells (PrEC)
were obtained from Lonza (Cat# CC-2555). The corresponding
medium of each cell line (DMEM for HEK293T, McCoy's 5A for
HCT116, RPMI 1640 for C4-2B, DMEM for MCF-7) was supplemented
with 10% fetal bovine serum (Gibco by Thermo Fisher
Scientific) and 1% penicillin and streptomycin, and cells were grown
at 37°C with 5% CO2. PrEC cells were grown with PrEGM Bullet Kit (Prostate
Epithelial Cell Growth Medium with supplements), which was
obtained from Lonza (Cat# CC-3166). All cell stocks except primary
cells (PrEC) were authenticated at the USC Norris Cancer Center
cell culture facility by comparison to the ATCC and/or published
genomic criteria for that specific cell line; all cells are documented
to be free of mycoplasma. Preauthentication was performed at
Lonza for PrEC, and the first passage from the cultured cells was
used for the ChIP assay.
ChIP-seq
ZFX ChIP assays were performed in HEK293T, HCT116, C4-2B,
MCF-7, and PrEC cells using a ZFX antibody (Cat# L28B6 Lot# 1,
Cell Signaling Technology) according to ENCODE standards
(Blattler et al. 2014). The ZFX antibody was validated using
siRNAs, followed by Western blots to demonstrate loss of the detected
protein band (Supplemental Fig. S1). ZNF711 ChIP-seq experiments
in MCF-7 cells were performed using antibodies from
two different rabbits that were generated against ZNF711 amino
acids 1–358; these antibodies have been previously used in ChIP-seq
and were provided by Dr. Kristian Helin (Kleine-Kohlbrecher
et al. 2010). H3K4me3 and RNAPII ChIP-seq experiments in
C4-2B cells were performed using antibodies from Cell Signaling
Technology (Cat# 9751S) for H3K4me3 and BioLegend (Cat#
664906) for RNAPII. Each ZFX/ZNF711 ChIP-seq experiment in
cancer cells was performed using two biological replicates, and
ChIP-seq libraries were sequenced on an Illumina HiSeq. All
ChIP-seq data were mapped to hg19 and peaks were called using
MACS2 (Zhang et al. 2008) with the IDR tool
(https://github.com/nboley/idr) after preprocessing data with the ENCODE3
ChIP-seq pipeline (https://www.encodeproject.org/chip-seq/).
ZFX and ZNF711 binding sites are listed in Supplemental Tables
S3 and S5. A detailed description of ChIP-seq analyses can be
found in Supplemental Methods.
Motif analyses
To discover de novo motifs enriched in the ChIP-seq peaks, we collected
sequences of 20-bp windows of the ZFX peak summits and
used MEME version 4.10.1 (Bailey and Elkan 1994) with a minimum
motif width of 6 and a maximum motif width of 12, scanning
both strands of the DNA sequences. The discovered motifs
were very similar to the known motifs for ZNF711 and ZFX; the
AGGCCTAG motif found by HOMER (http://homer.ucsd.edu/homer/)
(Heinz et al. 2010) was originally identified from
ZNF711 ChIP-seq in SH-SY5Y (Kleine-Kohlbrecher et al. 2010)
and ZFX ChIP-seq in mouse embryonic stem cells (Chen et al.
2008). Therefore, we used known motifs to scan ZFX and
ZNF711 binding sites in four cell types using the findMotifsGenome.pl
script from HOMER to identify the enriched motifs and calculate
the percentage of regions with the motifs (Fig. 2A). The motifs
reported in Figure 2A are the enriched motifs (FDR <0.05) found in
>50% of ZFX peaks (sequences of 20-bp windows of the ZFX peak
summits) in each cell type. To further examine motif distribution
in promoters, we compared the ZFX motif (AGGCCTAG), 10 randomly
scrambled motifs having the same nucleotide composition
as the ZFX motif, and the ETS motif (Supplemental Fig. S2).
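Scrambled control motifs with the same nucleotide composition can be generated by shuffling the letters of the motif. The following is a minimal sketch under stated assumptions: the function name, the fixed random seed, and the exclusion of the original motif from the controls are illustrative choices and are not described in the text.

```python
import random
from collections import Counter
from math import factorial


def scrambled_motifs(motif: str, n: int = 10, seed: int = 0) -> list:
    """Return n distinct shuffles of motif, excluding the motif itself."""
    # Number of distinct permutations of the letters (multinomial coefficient).
    distinct = factorial(len(motif))
    for count in Counter(motif).values():
        distinct //= factorial(count)
    if n > distinct - 1:
        raise ValueError("not enough distinct scrambles for this motif")

    rng = random.Random(seed)
    found = set()
    while len(found) < n:
        letters = list(motif)
        rng.shuffle(letters)
        candidate = "".join(letters)
        if candidate != motif:
            found.add(candidate)
    return sorted(found)


controls = scrambled_motifs("AGGCCTAG", n=10)
print(controls)
```

Each control keeps the two A, three G, two C, and one T of AGGCCTAG, so any difference in positional enrichment reflects base order rather than base composition.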
siRNA knockdown, RT-qPCR, and RNA-seq
For transient knockdown, cells were transfected in triplicate with
100 nM of siRNA oligonucleotides targeting human ZFX (Cat#
L006572000005), ZNF711 (Cat# L008444020005), or control
oligonucleotides (Cat# D0018101005) using SMART pool
DharmaFECT transfection reagent 3 (Cat# T200301) for C4-2B
and reagent 1 (Cat# T200101) for MCF-7 (Dharmacon). Cells
were incubated for 24 h and transfected again with the same
concentration of siRNAs, and the incubation was continued for an
additional 24 h. RNA was extracted using TRIzol reagent (Cat#
15596-018, Thermo Fisher Scientific) following the manufacturer-suggested
protocol. cDNA was synthesized using the SuperScript
VILO cDNA Synthesis Kit (Cat# 11754-050, Life Technologies).
RNA-seq libraries were made using KAPA Stranded mRNA-Seq
Kit with KAPA mRNA Capture Beads (Cat# KK8421, Kapa
Biosystems) and sequenced on an Illumina HiSeq. To remove batch
effects, matched controls and knockdown samples were prepared
and sequenced at the same time. Differentially expressed genes
were selected using the Gene Specific Algorithm from Partek Flow
software with the upper quartile normalization method (Partek
Inc.). An FDR cutoff of 0.05 was used to select statistically significantly
differentially expressed genes. Differentially expressed genes
with absolute fold change >1.5 are listed in Supplemental Table S4.
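The selection thresholds (FDR <0.05 and absolute fold change >1.5) can be illustrated with a minimal sketch. It assumes Benjamini-Hochberg adjustment of per-gene P-values and a knockdown/control expression ratio as input; the adjustment method used by Partek Flow is not stated in the text, and the function names and gene labels are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_deregulated(genes, ratios, pvalues, fdr_cutoff=0.05, fc_cutoff=1.5):
    """Return {gene: 'down' or 'up'} for genes passing both thresholds.

    ratios are knockdown/control expression ratios (must be > 0).
    """
    fdr = benjamini_hochberg(pvalues)
    selected = {}
    for gene, ratio, q in zip(genes, ratios, fdr):
        abs_fold_change = max(ratio, 1.0 / ratio)
        if q < fdr_cutoff and abs_fold_change > fc_cutoff:
            selected[gene] = "down" if ratio < 1.0 else "up"
    return selected


# Toy input with placeholder gene names.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
ratios = [0.5, 2.0, 1.1, 3.0]
pvalues = [0.01, 0.04, 0.03, 0.5]
print(select_deregulated(genes, ratios, pvalues))
```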
NOMe-seq
The NOMe-seq method is a combination of the genome-wide
identification of open chromatin regions plus whole-genome
bisulfite sequencing (to identify methylated DNA). The first step
of the method is based on the treatment of chromatin with the
M.CviPI methyltransferase. This enzyme methylates Cs in the context
of GpC dinucleotides. GpC^m does not occur in the human genome
(the vast majority of DNA methylation in the human
genome is at CpG dinucleotides, not GpC dinucleotides) and
therefore there is no endogenous background of GpC^m. The enzyme
can only methylate GpC dinucleotides that are accessible
in the context of chromatin, i.e., not protected by nucleosomes
or other proteins that are tightly bound to the chromatin. The
second step of the method involves bisulfite treatment of the
M.CviPI-methylated chromatin, which converts unmethylated
Cs to Ts. This allows the distinction of GpC from GpC^m and CpG
from C^mpG. Using this method, NDRs are defined as regions having
increased GpC^m methylation over background (i.e., they are
in open regions and thus were methylated by the M.CviPI enzyme)
that are at least 140 bp in length. Because the bisulfite treatment
also allows the distinction of CpG from C^mpG, the endogenous
methylation status of the genome can also be obtained in the
same sequencing reaction. It is important to note that in contrast
to the induced GpC^m, which represents nucleosome-free, open
chromatin that is available for TF binding, the endogenous C^mpG
represents nucleosome-bound chromatin that is not available for
TF binding. Open chromatin is expected to have high levels of
GpC^m but low levels of C^mpG; thus, each of the two separate
methylation analyses serve as independent (but opposite) measures that
should provide matching chromatin designations (open versus
closed). C4-2B NOMe-seq data were generated as previously described
(Rhie et al. 2018) and sequenced using an Illumina HiSeq
2000 PE 100 bp to produce FASTQ files. FASTQ files of MCF-7
NOMe-seq data were obtained from GSE57498 (Taberlay et al.
2014). Each FASTQ file was aligned to a bisulfite-converted genome
(hg19) using BSMAP (Xi and Li 2009) and processed as previously
described (Rhie et al. 2018). To identify the methylation status of
CpG sites (in all HCG trinucleotides) and GpC sites (in all GCH
trinucleotides) from the BAM file, the Bis-SNP (Liu et al. 2012) program
was used and the Bis-tools was used to visualize DNA
methylation and accessibility signals
(https://github.com/dnaase/Bis-tools) (Lay et al. 2015). For identification of NDRs
(Supplemental Table S6), the findNDRs function in the aaRon R
package was used (https://github.com/astatham/aaRon).
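Two ideas in this section can be illustrated with a minimal sketch: the GCH and HCG trinucleotide contexts (H is A, C, or T in IUPAC notation) that separate accessibility from endogenous methylation, and the 140-bp minimum length for an NDR. The run-based caller below is a deliberate simplification and is not the algorithm implemented by findNDRs in aaRon; the function names and the fixed threshold are assumptions.

```python
def cytosine_context(seq: str, i: int) -> str:
    """Classify the cytosine at index i as 'GCH', 'HCG', 'GCG', or 'other'."""
    seq = seq.upper()
    if seq[i] != "C" or i == 0 or i == len(seq) - 1:
        return "other"
    prev_base, next_base = seq[i - 1], seq[i + 1]
    if prev_base == "G" and next_base == "G":
        return "GCG"   # ambiguous between the two readouts
    if prev_base == "G" and next_base in "ACT":
        return "GCH"   # accessibility readout (M.CviPI target)
    if prev_base in "ACT" and next_base == "G":
        return "HCG"   # endogenous CpG methylation readout
    return "other"


def call_ndrs(sites, threshold: float, min_len: int = 140):
    """Merge consecutive GCH sites above threshold into regions >= min_len bp.

    sites: list of (position, GCH methylation fraction), sorted by position.
    Returns a list of (start, end) tuples with inclusive coordinates.
    """
    ndrs, start, last = [], None, None
    for pos, level in sites:
        if level > threshold:
            if start is None:
                start = pos
            last = pos
        else:
            if start is not None and last - start + 1 >= min_len:
                ndrs.append((start, last))
            start = last = None
    if start is not None and last - start + 1 >= min_len:
        ndrs.append((start, last))
    return ndrs


# Toy GCH methylation profile (hypothetical values).
sites = [(100, 0.9), (150, 0.8), (260, 0.85), (300, 0.1), (350, 0.9), (400, 0.9)]
print(call_ndrs(sites, threshold=0.5))
```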
Gene Set Enrichment Analysis and Gene Ontology analysis
Differentially expressed genes upon ZFX knockdown were selected
using FDR cutoff 0.05 and fold change cutoff 1.5 in C4-2B cells
(Supplemental Table S4), and genes bound by ZFX were selected
for Gene Set Enrichment Analysis (GSEA) and Gene Ontology
(GO) analysis. The differentially expressed genes were used to identify
enriched gene sets using the GSEA tool (Subramanian et al.
2005). A hypergeometric test was used to calculate the P-value, and a false
discovery rate (Q-value) <0.05 was used to select significantly
enriched gene sets. The same differentially expressed genes were
analyzed for enrichment in particular GO categories using
the TopGO program
(https://bioconductor.org/packages/release/bioc/html/topGO.html).
A Fisher's exact test was performed, and
an adjusted P-value cutoff 0.05 was used to select statistically
significantly enriched GO categories (Supplemental Figs. S7, S8E).
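The hypergeometric test for gene set overlap can be written out directly. The sketch below is a minimal illustration with hypothetical argument names and toy numbers; it computes the one-sided upper-tail probability and does not reproduce the GSEA tool's implementation.

```python
from math import comb


def hypergeom_pvalue(k: int, K: int, n: int, N: int) -> float:
    """P(X >= k) for a hypergeometric overlap.

    N: genes in the background, K: genes in the gene set,
    n: genes in the query list, k: genes shared by the query list and gene set.
    """
    upper = min(K, n)
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return numerator / comb(N, n)


# Toy example: 5 of 5 query genes fall in a 5-gene set out of 10 background genes.
print(hypergeom_pvalue(5, 5, 5, 10))
```

The resulting P-values would then be corrected for multiple testing across gene sets to obtain the Q-values used for selection.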
Data access
All ChIP-seq, RNA-seq, and NOMe-seq data generated in this study
have been submitted to the NCBI Gene Expression Omnibus
(GEO; http://www.ncbi.nlm.nih.gov/geo/) under accession number
GSE102616. Access to other publicly available data sets from
GEO or ENCODE (The ENCODE Project Consortium 2012; Sloan
et al. 2016) used in this study is detailed in Supplemental Table S1.
Acknowledgments
We thank The ENCODE Project Consortium for data access, the
USC/Norris Cancer Center Molecular Genomics core, the
Stanford Sequencing Center, the UCLA Technology Center for
Genomics & Bioinformatics, USC’s Norris Medical Library bioin-
formatics service, and the USC Center for High-performance
Computing (hpcc.usc.edu). We also thank Kristian Helin for the
ZNF711 antibody, Charlie Nicolet for assistance with the ZNF711
ChIP-seq experiments, and members of the Farnham laboratory
for helpful discussions. This work was supported by the following
grants from the National Institutes of Health: R01CA136924
(NCI), U54HG006996 (NHGRI), and P30CA014089 (NCI).
Author contributions: S.K.R. and P.J.F. conceived and designed
the experiments; L.Y., Z.L., H.W., S.S., Y.G., and A.A.P. designed
and performed experiments in cell lines; S.K.R. and L.Y. performed
data analysis; and S.K.R. and P.J.F. wrote and edited the
manuscript.
References
Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization
to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol 2:
28–36.
Blattler A, Yao L, Wang Y, Ye Z, Jin VX, Farnham PJ. 2013. ZBTB33 binds
unmethylated regions of the genome associated with actively expressed
genes. Epigenetics Chromatin 6: 13.
Blattler A, Yao L, Witt H, Guo Y, Nicolet CM, Berman BP, Farnham PJ. 2014.
Global loss of DNA methylation uncovers intronic enhancers in genes
showing expression changes. Genome Biol 15: 469.
Brayer KJ, Segal DJ. 2008. Keep your fingers off my DNA: protein–protein
interactions mediated by C2H2 zinc finger domains. Cell Biochem Biophys
50: 111–131.
Brown RS. 2005. Zinc finger proteins: getting a grip on RNA. Curr Opin Struct
Biol 15: 94–98.
Cao AR, Rabinovich R, Xu M, Xu X, Jin VX, Farnham PJ. 2011. Genome-wide
analysis of transcription factor E2F1 mutant proteins reveals that
N- and C-terminal protein interaction domains do not participate in
targeting E2F1 to the human genome. J Biol Chem 286: 11985–11996.
Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, Wong E, Orlov YL, Zhang
W, Jiang J, et al. 2008. Integration of external signaling pathways with
the core transcriptional network in embryonic stem cells. Cell 133:
1106–1117.
Deaton AM, Bird A. 2011. CpG islands and the regulation of transcription.
Genes Dev 25: 1010–1022.
Desjarlais JR, Berg JM. 1992. Toward rules relating zinc finger protein
sequences and DNA binding site preferences. Proc Natl Acad Sci 89:
7345–7349.
The ENCODE Project Consortium. 2012. An integrated encyclopedia of
DNA elements in the human genome. Nature 489: 57–74.
Fang J, Yu Z, Lian M, Ma H, Tai J, Zhang L, Han D. 2012. Knockdown of zinc
finger protein, X-linked (ZFX) inhibits cell proliferation and induces
apoptosis in human laryngeal squamous cell carcinoma. Mol Cell Biochem
360: 301–307.
Fang Q, Fu WH, Yang J, Li X, Zhou ZS, Chen ZW, Pan JH. 2014a. Knockdown
of ZFX suppresses renal carcinoma cell growth and induces apoptosis.
Cancer Genet 207: 461–466.
Fang X, Huang Z, Zhou W, Wu Q, Sloan AE, Ouyang G, McLendon RE, Yu JS,
Rich JN, Bao S. 2014b. The zinc finger transcription factor ZFX is
required for maintaining the tumorigenic potential of glioblastoma
stem cells. Stem Cells 32: 2033–2047.
Grimmer MR, Stolzenburg S, Ford E, Lister R, Blancafort P, Farnham PJ.
2014. Analysis of an artificial zinc finger epigenetic modulator:
widespread binding but limited regulation. Nucleic Acids Res 42:
10856–10868.
Heinz S, Benner C, Spann N, Bertolino E, Lin YC, Laslo P, Cheng JX, Murre
C, Singh H, Glass CK. 2010. Simple combinations of lineage-determining
transcription factors prime cis-regulatory elements required for
macrophage and B cell identities. Mol Cell 38: 576–589.
Jaeger SA, Chan ET, Berger MF, Stottmann R, Hughes TR, Bulyk ML. 2010.
Conservation and regulatory associations of a wide affinity range of
mouse transcription factor binding sites. Genomics 95: 185–195.
Jiang J, Liu LY. 2015. Zinc finger protein X-linked is overexpressed in
colorectal cancer and is associated with poor prognosis. Oncol Lett 10:
810–814.
Jiang H, Zhang L, Liu J, Chen Z, Na R, Ding G, Zhang H, Ding Q. 2012a.
Knockdown of zinc finger protein X-linked inhibits prostate cancer
cell proliferation and induces apoptosis by activating caspase-3 and
caspase-9. Cancer Gene Ther 19: 684–689.
Jiang M, Xu S, Yue W, Zhao X, Zhang L, Zhang C, Wang Y. 2012b. The role
of ZFX in non-small cell lung cancer development. Oncol Res 20:
171–178.
Jiang R, Wang JC, Sun M, Zhang XY, Wu H. 2012c. Zinc finger X-chromosomal
protein (ZFX) promotes solid agar colony growth of osteosarcoma
cells. Oncol Res 20: 565–570.
Kelly TK, Liu Y, Lay FD, Liang G, Berman BP, Jones PA. 2012. Genome-wide
mapping of nucleosome positioning and DNA methylation within
individual DNA molecules. Genome Res 22: 2497–2506.
Kleine-Kohlbrecher D, Christensen J, Vandamme J, Abarrategui I, Bak M,
Tommerup N, Shi X, Gozani O, Rappsilber J, Salcini AE, et al. 2010. A
functional link between the histone demethylase PHF8 and the
transcription factor ZNF711 in X-linked mental retardation. Mol Cell 38:
165–178.
Krumm A, Hickey LB, Groudine M. 1995. Promoter-proximal pausing of
RNA polymerase II defines a general rate-limiting step after transcription
initiation. Genes Dev 9: 559–572.
Landolin JM, Johnson DS, Trinklein ND, Aldred SF, Medina C, Shulha H,
Weng Z, Myers RM. 2010. Sequence features that drive human promoter
function and tissue specificity. Genome Res 20: 890–898.
LayFD,LiuY,KellyTK,WittH,FarnhamPJ,JonesPA,BermanBP.2015.The
roleofDNAmethylationindirectingthefunctionalorganizationofthe
cancerepigenome. Genome Res25:467–477.
Lee SJ, Langhans SA. 2012. Anaphase-promoting complex/cyclosome pro-
teinCdc27isatargetforcurcumin-inducedcellcyclearrestandapopto-
sis. BMC Cancer12:44.
LiK,ZhuZC,LiuYJ,LiuJW,WangHT,XiongZQ,ShenX,HuZL,ZhengJ.
2013.ZFXknockdowninhibitsgrowthandmigrationofnon-smallcell
lung carcinomacelllineH1299. Int J Clin Exp Pathol6:2460–2467.
Li C, Li H, Zhang T, Li J, Ma F, Li M, Sui Z, Chang J. 2015. ZFX is a strong
predictor of poor prognosis in renal cell carcinoma. Med Sci Monit 21:
3380–3385.
Lipkin SM, Wang V, Jacoby R, Banerjee-Basu S, Baxevanis AD, Lynch
HT,ElliottRM,CollinsFS.2000.MLH3:aDNAmismatchrepairgeneas-
sociated with mammalian microsatellite instability. Nat Genet 24:
27–35.
LiuY,SiegmundKD,LairdPW,BermanBP.2012.Bis-SNP:combinedDNA
methylationandSNPcallingforBisulfite-seqdata.GenomeBiol13:R61.
Najafabadi HS, Mnaimneh S, Schmitges FW, Garton M, Lam KN, Yang A,
AlbuM,WeirauchMT,RadovaniE,KimPM,etal.2015.C2H2zincfin-
ger proteins greatly expand the human regulatory lexicon. Nat
Biotechnol33:555–562.
NikpourP,Emadi-BaygiM,Mohammad-HashemF,MaracyMR,Haghjooy-
Javanmard S. 2012. Differential expression of ZFX gene in gastric can-
cer. J Biosci37:85–90.
RhieSK,HazelettDJ,CoetzeeSG,YanC,NoushmehrH,CoetzeeGA.2014.
Nucleosome positioning and histone modifications define relation-
shipsbetweenregulatoryelementsandnearbygeneexpressioninbreast
epithelialcells. BMC Genomics15:331.
RhieSK,GuoY,TakYG,YaoL,ShenH,CoetzeeGA,LairdPW,FarnhamPJ.
2016.Identificationofactivatedenhancersandlinkedtranscriptionfac-
tors in breast, prostate, and kidney tumors by tracing enhancer net-
worksusing epigenetictraits. Epigenetics Chromatin9:50.
RhieSK,SchreinerS,FarnhamPJ.2018.Definingregulatoryelementsinthe
human genome using Nucleosome Occupancy and Methylome
Sequencing (NOMe-Seq). In CpG islands: methods and protocols, Vol.
1766 of Methods in molecular biology (ed. Vavouri T, Peinado MA).
Springer,NewYork(in press).
RozenbergJM,ShlyakhtenkoA,GlassK,RishiV,MyakishevMV,FitzGerald
PC, Vinson C. 2008. All and only CpG containing sequences are en-
richedinpromotersabundantlyboundbyRNApolymeraseIIinmulti-
pletissues. BMC Genomics9:67.
SchreiberF,PatricioM,MuffatoM,PignatelliM,BatemanA.2014.TreeFam
v9:anewwebsite,morespeciesandorthology-on-the-fly.Nucleic Acids
Res42:D922–D925.
Shao W, Zeitlinger J. 2017. Paused RNA polymerase II inhibits new tran-
scriptionalinitiation. NatGenet49:1045–1051.
SloanCA,ChanET,DavidsonJM,MalladiVS,StrattanJS,HitzBC,Gabdank
I, Narayanan AK, Ho M, Lee BT, et al. 2016. ENCODE data at the
ENCODEportal. Nucleic Acids Res44:D726–D732.
StubbsL,SunY,Caetano-AnollesD.2011.FunctionandevolutionofC2H2
zinc fingerarrays. Subcell Biochem52:75–94.
SubramanianA,TamayoP,MoothaVK,MukherjeeS,EbertBL,GilletteMA,
PaulovichA,PomeroySL,GolubTR,LanderES,etal.2005.Geneseten-
richment analysis: a knowledge-based approach for interpreting ge-
nome-wideexpressionprofiles. Proc Natl Acad Sci102:15545–15550.
Received August 11, 2017; accepted in revised form January 26, 2018.
Rhie et al. Epigenetics & Chromatin (2016) 9:50
DOI 10.1186/s13072-016-0102-4
RESEARCH
Identification of activated enhancers
and linked transcription factors in breast,
prostate, and kidney tumors by tracing
enhancer networks using epigenetic traits
Suhn Kyong Rhie¹, Yu Guo¹, Yu Gyoung Tak¹, Lijing Yao¹, Hui Shen², Gerhard A. Coetzee², Peter W. Laird²
and Peggy J. Farnham¹*
Abstract
Background: Although technological advances now allow increased tumor profiling, a detailed understanding
of the mechanisms leading to the development of different cancers remains elusive. Our approach toward under-
standing the molecular events that lead to cancer is to characterize changes in transcriptional regulatory networks
between normal and tumor tissue. Because enhancer activity is thought to be critical in regulating cell fate decisions,
we have focused our studies on distal regulatory elements and transcription factors that bind to these elements.
Results: Using DNA methylation data, we identified more than 25,000 enhancers that are differentially activated in
breast, prostate, and kidney tumor tissues, as compared to normal tissues. We then developed an analytical approach
called Tracing Enhancer Networks using Epigenetic Traits that correlates DNA methylation levels at enhancers with
gene expression to identify more than 800,000 genome-wide links from enhancers to genes and from genes to
enhancers. We found more than 1200 transcription factors to be involved in these tumor-specific enhancer networks.
We further characterized several transcription factors linked to a large number of enhancers in each tumor type,
including GATA3 in non-basal breast tumors, HOXC6 and DLX1 in prostate tumors, and ZNF395 in kidney tumors. We
showed that HOXC6 and DLX1 are associated with different clusters of prostate tumor-specific enhancers and confer
distinct transcriptomic changes upon knockdown in C42B prostate cancer cells. We also discovered de novo motifs
enriched in enhancers linked to ZNF395 in kidney tumors.
Conclusions: Our studies characterized tumor-specific enhancers and revealed key transcription factors involved
in enhancer networks for specific tumor types and subgroups. Our findings, which include a large set of identified
enhancers and transcription factors linked to those enhancers in breast, prostate, and kidney cancers, will facilitate
understanding of enhancer networks and mechanisms leading to the development of these cancers.
Keywords: DNA methylation, Epigenetics, Enhancer, Transcription factor, Networks
© The Author(s) 2016. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License
(http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium,
provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license,
and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/
publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Background
A single genome can give rise to several hundred dis-
tinct cell types that are genetically identical but display
different epigenetic marks at regulatory elements, leading
to altered gene expression. There are two main types of
regulatory elements involved in transcriptional activa-
tion, promoters and enhancers. Promoters are defined as
a relatively small region surrounding a transcription start
site (TSS) of a gene and are critical for basal transcription
of that gene. Enhancers are regulatory elements, containing
multiple transcription factor (TF) binding sites,
which can be far upstream or downstream of the gene
they regulate [1]. Of note, the state most consistently
linked to cellular identity is the ‘active enhancer’ state [2,
3]. In addition, previous studies have shown that epigenetic
changes at enhancers are significantly better than
those at promoters for predicting expression changes of
target genes in cancer [4, 5].
*Correspondence: peggy.farnham@med.usc.edu
¹ Department of Biochemistry and Molecular Medicine and the Norris
Comprehensive Cancer Center, Keck School of Medicine, University
of Southern California, 1450 Biggy Street, NRT G511B, Los Angeles, CA
90089-9601, USA
Full list of author information is available at the end of the article
Recent studies from the Encyclopedia of DNA Ele-
ments (ENCODE) and the Roadmap Epigenome Map-
ping Consortium (REMC) have shown that more than ten
thousand enhancers can be identified using epigenomic
marks in a given cell line or tissue [6, 7]. However, it is
not clear whether all of these enhancers are functional
[8] or which gene is regulated by each enhancer. One
enhancer may regulate multiple genes, one gene may be
regulated by multiple enhancers, and an enhancer does
not always regulate the nearest gene. In addition, we do
not have a complete understanding as to which TFs bind
to and activate each enhancer in a particular cell type.
Therefore, it is difficult to develop a priori a detailed
transcriptional regulatory network for a given cell type [1, 9].
In this study, we have used known enhancer regions
and have also performed Chromatin Immunoprecipi-
tation (ChIP) and Formaldehyde-Assisted Isolation of
Regulatory Elements (FAIRE) assays to annotate addi-
tional cell type-specific enhancers. Using these enhancer
regions, along with DNA methylation data generated
as part of The Cancer Genome Atlas (TCGA), we have
identified enhancers that are activated or inactivated
in breast, prostate, and kidney tumor tissues. To facili-
tate understanding of enhancer networks deregulated in
tumors, we have developed an approach called Tracing
Enhancer Networks using Epigenetic Traits (TENET),
which identifies enhancer and gene expression relation-
ships (links) genome-wide. Using TENET, with epig-
enomic and RNA expression data from breast, prostate,
and kidney tumor and normal tissue samples, we found
more than 25,000 differentially activated enhancers and
more than 1200 transcription factors involved in tumor-
specific enhancer networks. For example, we found that
hundreds of tumor-specific enhancers are linked to
GATA3 overexpression in non-basal breast tumors. We
showed that HOXC6 and DLX1, independent prognos-
tic markers of prostate tumors [10], are associated with
distinct clusters of tumor-specific enhancers and tran-
scriptomic changes in C42B prostate cancer cells upon
knockdown of HOXC6 and DLX1. We also discovered de
novo motifs specifically enriched in enhancers linked to
ZNF395 in kidney tumors. Our findings, which include
a large set of identified enhancers and TFs linked to
those enhancers in breast, prostate, and kidney cancers,
will facilitate understanding of disordered epigenetic
regulation and enhancer networks in tumor types and
subgroups.
Results
Identification of differentially methylated enhancers
in breast, prostate, and kidney tumor tissues
Technologies such as ChIP, FAIRE, and DNaseI assays
combined with sequencing [11] are generally used to
identify enhancers in cell lines. However, these assays
are not well suited to tissue samples because
they require a large number of cells, are time-consuming
to perform, and do not work well with frozen tissues.
In contrast, the analysis of DNA methylation using arrays
is easier, works well with frozen tissues, and can be performed
using very few cells [12]. If an enhancer region is
unmethylated, it corresponds to open chromatin that can
be bound by TFs and is given an active enhancer state.
On the other hand, if an enhancer region is methylated, it
reflects closed chromatin that is not bound by TFs and is
given an inactive enhancer state.
To identify activated and inactivated enhancers spe-
cific to breast, prostate, and kidney tumor tissue samples,
we assembled a large set of genomic coordinates that
includes regions previously identified as distal regula-
tory elements by ENCODE and REMC [6, 7] as well as
enhancer locations derived from H3K27Ac ChIP-seq
data specifically generated in our laboratory for this study
(e.g., H3K27Ac ChIP-seq for MCF7, MDAMB231, and
MCF10A breast cells and for C42B and RWPE1 prostate
cells). Because recent studies have shown that a nucle-
osome-depleted region (NDR) flanked on each side by
a nucleosome having the active enhancer histone mark
H3K27Ac is where TFs actually bind [5, 13], we used
public and newly generated Nucleosome Occupancy and
Methylome Sequencing (NOMe-seq), DNaseI-seq, and
FAIRE-seq datasets to further narrow enhancer regions
(see Additional file 1: Supplementary Methods for a
detailed description of the creation of the enhancer file
and Additional file 2: Table S1 for a list of datasets). These
narrowed regions represent the functional (TF binding)
compartment of the larger regions defined by ChIP-seq
data. The subset of these narrowed TF binding regulatory
regions represented by probes on the Illumina HM450
array was then identified for use in our study (Fig. 1).
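As a rough illustration of the region selection summarized above, the following Python sketch keeps enhancers located more than 1.5 kb from a TSS, trims them to overlapping open-chromatin peaks, and lists the array probes that fall inside the trimmed regions. It is not the pipeline used in the study (Additional file 1 describes that); the `Region` type, the function names, the half-open coordinate convention, and the probe IDs in the usage note are all assumptions made for the example.

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


def is_distal(region: Region, tss_by_chrom: Dict[str, List[int]],
              min_dist: int = 1500) -> bool:
    """True if every TSS on the same chromosome is more than min_dist bp away."""
    for tss in tss_by_chrom.get(region.chrom, []):
        if tss < region.start:
            dist = region.start - tss
        elif tss >= region.end:
            dist = tss - (region.end - 1)
        else:
            dist = 0  # TSS lies inside the region
        if dist <= min_dist:
            return False
    return True


def narrow_to_open_chromatin(enhancers: List[Region],
                             open_peaks: List[Region]) -> List[Region]:
    """Intersect each enhancer with each overlapping open-chromatin peak."""
    narrowed = []
    for enh in enhancers:
        for peak in open_peaks:
            if enh.chrom == peak.chrom and enh.start < peak.end and peak.start < enh.end:
                narrowed.append(Region(enh.chrom,
                                       max(enh.start, peak.start),
                                       min(enh.end, peak.end)))
    return narrowed


def probes_in_regions(probes: Dict[str, Tuple[str, int]],
                      regions: List[Region]) -> List[str]:
    """Return sorted IDs of probes whose CpG position falls in any region."""
    return sorted(
        probe_id for probe_id, (chrom, pos) in probes.items()
        if any(r.chrom == chrom and r.start <= pos < r.end for r in regions)
    )
```

For instance, with one hypothetical enhancer at chr1:10000-12000, a TSS at position 1000, and an open-chromatin peak at chr1:10500-10800, only a probe located between 10500 and 10799 would be retained.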
The DNA methylation profiles of the probes representing
the narrowed enhancer regions in tumor and normal
tissue samples were compared using 641 tumors and
66 normals for breast invasive carcinoma (BRCA), 333
tumors and 19 normals for prostate adenocarcinoma
(PRAD), and 318 tumors and 24 normals for kidney renal
clear cell carcinoma (KIRC) from TCGA. A major prob-
lem when characterizing tissues is the purity of each sam-
ple. For instance, TCGA has shown that the proportion of
normal cells and immune cells that are intermixed with
cancerous cells in a tumor tissue sample can greatly affect
the results of genetic and epigenetic analyses. Specifically,
DNA methylation analysis of prostate tumors was shown
to be heavily confounded by tumor purity, with leukocyte
infiltration being a major factor of tissue contamination
[14]. To alleviate the purity effects in our analyses, we
assessed the DNA methylation levels at enhancer regions
in normal cells from the same cell type as the tumors, as
well as in other normal cells such as leukocytes, smooth
muscles, and fibroblasts. For this study, we only classi-
fied probes as hypermethylated in tumors compared to
normal if these same probes did not also have high DNA
methylation levels in leukocytes; similarly, we only clas-
sified probes as hypomethylated in tumors compared to
normal if they did not also have low levels of DNA meth-
ylation in leukocytes (Additional file 1: Figures S1, S2).
Although this winnowing likely removed some probes
that displayed tumor-specific methylation changes, we
felt that it was best to reduce potential false positives
from our analyses.
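The leukocyte-based filter described above can be sketched as a small Python function. This is only an illustration of the logic: the function name and the beta-value cutoffs (0.2 and 0.8) are hypothetical and are not taken from the study.

```python
def passes_leukocyte_filter(direction: str, leukocyte_beta: float,
                            low: float = 0.2, high: float = 0.8) -> bool:
    """Decide whether a tumor-vs-normal methylation call survives the purity filter.

    direction: "hyper" (more methylated in tumor) or "hypo" (less methylated).
    leukocyte_beta: methylation level (0 to 1) of the same probe in leukocytes.
    low, high: hypothetical cutoffs for "low" and "high" methylation.
    """
    if direction == "hyper":
        # Reject if leukocytes are also highly methylated: infiltration
        # alone could explain the apparent gain of methylation.
        return leukocyte_beta < high
    if direction == "hypo":
        # Reject if leukocytes are also lowly methylated.
        return leukocyte_beta > low
    raise ValueError("direction must be 'hyper' or 'hypo'")
```

A probe called hypomethylated in tumors would therefore be dropped if its leukocyte methylation level were, say, 0.05, because infiltrating leukocytes could produce the same signal.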
Probes were categorized into four enhancer groups:
unmethylated in both normal and tumor samples (this
group is termed “always unmethylated” and represents
enhancers active in both normal and tumor samples),
methylated in both normal and tumor samples (this
group is termed “always methylated” and represents
enhancers inactive in both normal and tumor samples),
hypermethylated in tumors as compared to normal sam-
ples (this group is termed “normal-specific enhancers”
and represents enhancers active in normal but inactive
in tumors), and hypomethylated in tumors as compared
to normal tissues (this group is termed “tumor-specific
enhancers” and represents enhancers inactive in normal
and active in tumors) (Fig. 2a). We identified more than
50,000 probes that are differentially methylated, rep-
resenting ~25,000 different enhancers that are gained
or lost in the BRCA, PRAD, or KIRC samples (Addi-
tional file 3: Table S2). Interestingly, different fractions
of probes belonged to each enhancer group across tumor
types. For example, we identified relatively more “always
methylated” probes in PRAD than in BRCA and relatively
more hypomethylated probes in BRCA and KIRC than in
PRAD. When we further compared the activity state of
enhancers in the normals versus tumors for each tumor
type, we found both common and tumor type-specific
normal-to-tumor activity changes at these enhancers.
[Figure 1: schematic of the study design. Labels recovered from the figure: Breast cancer (BRCA), Kidney cancer (KIRC), Prostate cancer (PRAD); REMC+ENCODE cell type-specific enhancer marks; distal enhancer (>1.5 kb from TSS); nucleosome-depleted regions (NDR); DNA methylation CpG sites (mCpG); HM450; H3K27Ac]
Fig. 1 Study design. To define genomic regions for analysis of enhancer activity in tumor samples, we used the genomic coordinates of enhancers
identified by REMC and ENCODE for 98 tissues or cell lines, plus genomic coordinates of additional H3K27Ac ChIP-seq peaks from several cancer
cell lines and normal cells for breast, prostate, and kidney. We then selected the subset of these regulatory elements that are located >1.5 kb from
a known transcription start site (TSS), as defined using GENCODE v19. We further narrowed the regions by intersecting with the set of ENCODE
Master DNaseI-seq peaks from 125 tissues or cell lines or DNaseI-seq, FAIRE-seq, or NOMe-seq peaks of corresponding cell types (Additional file 2:
Table S1). The HM450 array probes that overlapped the narrowed enhancer regions were then used to study enhancer activity in normal and tumor
tissues
For example, among the ~6000–20,000 hypomethyl-
ated enhancer probes from the three tumor types (cor-
responding to enhancers gained in tumors), only 2514
probes identified tumor-specific enhancers in all 3 tumor
types, suggesting that there are critical TFs in each tis-
sue type that drive distinct breast, prostate, and kidney
tumor development (Additional file 1: Figure S3). Exam-
ples of tumor-specific enhancers identified using the
TCGA DNA methylation data that were confirmed to
have tumor-specific H3K27Ac ChIP signals in appropri-
ate tumor cell lines are shown in Fig. 2b.
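The four enhancer groups defined above can be illustrated with a short Python sketch. It simplifies the comparison to one summary methylation level per probe for normal samples and one for tumor samples; the cutoffs (0.3 and 0.7) and the function names are hypothetical, and the study's actual statistical procedure is not reproduced here.

```python
from collections import Counter
from typing import Optional

# Map (state in normal, state in tumor) to the group names used in the text.
GROUPS = {
    ("U", "U"): "always unmethylated",
    ("M", "M"): "always methylated",
    ("U", "M"): "normal-specific enhancer",  # hypermethylated in tumors
    ("M", "U"): "tumor-specific enhancer",   # hypomethylated in tumors
}


def methylation_state(beta: float, unmeth_max: float = 0.3,
                      meth_min: float = 0.7) -> Optional[str]:
    """Return "U", "M", or None for an intermediate level (hypothetical cutoffs)."""
    if beta <= unmeth_max:
        return "U"
    if beta >= meth_min:
        return "M"
    return None


def classify_probe(normal_beta: float, tumor_beta: float) -> Optional[str]:
    """Assign a probe to one of the four groups, or None if either state is unclear."""
    key = (methylation_state(normal_beta), methylation_state(tumor_beta))
    return GROUPS.get(key)


def count_groups(probe_levels: dict) -> Counter:
    """Count probes per group; probe_levels maps probe ID -> (normal, tumor)."""
    return Counter(
        group for group in (classify_probe(n, t) for n, t in probe_levels.values())
        if group is not None
    )
```

Under these assumed cutoffs, a probe with a level of 0.9 in normal tissue and 0.1 in tumors is labeled a tumor-specific enhancer, while a probe at 0.5 in either tissue is left unclassified.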
[Figure 2: a bar charts of the number of enhancer probes by enhancer type (Unmeth, Meth, Hypermeth, Hypometh) in normal (N) and tumor (T) samples for PRAD, BRCA, and KIRC. Probe counts recovered from the panel labels, in the order Unmeth, Meth, Hypermeth, Hypometh: PRAD 4471, 10,531, 4092, 6251; BRCA 3033, 5364, 7522, 19,882; KIRC 5172, 6657, 3910, 10,730. b genome browser views (hg19) of HM450 probes cg07324729 (chr5, LPCAT1; 753N and 753T H3K27Ac), cg26522368 (chr19, TRPM4; PrEC and C42B H3K27Ac), and cg08070476 (chr7, AGR3; HMEC and MCF7 H3K27Ac)]
Fig. 2 Identification of differentially methylated enhancer regions. a Differentially methylated enhancer probes located in epigenetically defined
enhancers were identified by using DNA methylation profiles from TCGA for breast (BRCA), prostate (PRAD), and kidney (KIRC) tumor tissues.
Unmeth: enhancer probes unmethylated in both normal and tumor samples; Meth: enhancer probes methylated in both normal and tumor
samples; Hypermeth: enhancer probes unmethylated in normals, but methylated in tumors; Hypometh: enhancer probes methylated in normals,
but unmethylated in tumors; the number of enhancer probes for each category is shown in parentheses. b Examples of hypomethylated enhancers
(i.e., tumor-specific enhancers) are shown for BRCA (center), PRAD (left), and KIRC (right). Genome browser screen shots show genomic coordinates,
HM450 probe location, UCSC genes, H3K27Ac ChIP-seq tracks in tumor (MCF7, C42B, and 753T) and normal (HMEC, PrEC, and 753N) cells, the
ENCODE layered ChIP-seq track for 161 TFs, and the ENCODE Master DNaseI hypersensitive site track for 125 cell types
Identification of linked genes for differentially methylated
enhancers
To understand how enhancer activity may contribute
to cancer initiation or progression, we associated
genome-wide gene expression changes with gain or loss
of enhancer activity, using an approach called TENET
(Additional file 1: Figures S1, S2, S4). Enhancers are
generally considered to regulate expression of their direct
target genes in a positive direction. Therefore, possible
direct targets are those in which a gained activity state
of the enhancer is associated with an increase in gene
expression. Of course, there will be many other genes
whose expression is also positively associated with the
activity of an enhancer (e.g., genes whose expression
increases as a consequence of the increased expression
of the direct target gene). The set of genes (direct and
indirect targets) whose expression is positively associated
with the normal-specific activity of an enhancer is indicated
as $E_N:G^+$, and the set of genes whose expression is
positively associated with the tumor-specific activity of
an enhancer is indicated as $E_T:G^+$ (Additional file 1: Figure
S4 top). Conversely, genes whose expression is negatively
correlated with enhancer activity are not likely to be
direct targets but instead may show decreased expression
due to changed expression of, for example, a transcriptional
repressor that is a direct target. The set of genes
whose expression is negatively associated with the normal-specific
activity of an enhancer is indicated as $E_N:G^-$,
and the set of genes whose expression is negatively associated
with the tumor-specific activity of an enhancer is
indicated as $E_T:G^-$ (Additional file 1: Figure S4 bottom).
In total, we identified ~800,000 enhancer:gene links in
these 4 categories ($E_N:G^+$, $E_T:G^+$, $E_N:G^-$, and $E_T:G^-$); see
Additional file 4: Table S3.
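To make the four link categories concrete, the sketch below assigns a probe:gene pair to one of them from per-sample methylation and expression values. It is a stand-in, not TENET itself: this section does not spell out the statistic TENET uses, so a plain Pearson correlation with a hypothetical cutoff (`min_abs_r`) is swapped in, enhancer activity is approximated as one minus the methylation level, and the plain-text labels `E_N:G+`, `E_T:G+`, `E_N:G-`, and `E_T:G-` stand for the four categories named above.

```python
from math import sqrt
from typing import List, Optional


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def link_category(enhancer_type: str, methylation: List[float],
                  expression: List[float],
                  min_abs_r: float = 0.5) -> Optional[str]:
    """Label a probe:gene pair, or return None if the association is weak.

    enhancer_type: "normal-specific" or "tumor-specific".
    methylation, expression: values for the same samples, in the same order.
    min_abs_r: hypothetical cutoff on the absolute correlation.
    """
    # Low methylation is read as high enhancer activity.
    activity = [1.0 - m for m in methylation]
    r = pearson(activity, expression)
    if abs(r) < min_abs_r:
        return None
    prefix = {"normal-specific": "E_N", "tumor-specific": "E_T"}[enhancer_type]
    return f"{prefix}:G{'+' if r > 0 else '-'}"
```

With this convention, a tumor-specific probe whose methylation falls as a gene's expression rises across samples would be labeled `E_T:G+`.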
Of high interest in our study of regulatory regions
involved in tumor development is the $E_T:G^+$ subset of
tumor-specific enhancers that are positively linked to
gene expression. Among the tumor-specific enhancer
probes (19,882 for BRCA, 6251 for PRAD and 10,730 for
KIRC; see Fig. 2a), only 10–20% were positively linked
(directly or indirectly) to genes. We identified 127,725
$E_T:G^+$ links between 4334 probes and 6948 genes for
BRCA, 25,428 $E_T:G^+$ links between 1120 probes and
5017 genes for PRAD, and 117,557 $E_T:G^+$ links between
2535 probes and 6629 genes for KIRC (Additional file 4:
Table S3, Additional file 1: Figure S5).
As noted above, the links include not only direct target
genes of the enhancers but also genes whose expression
is indirectly regulated by an enhancer due to secondary
or downstream effects. Hi-C and tethered chromatin
capture suggest that direct interactions between enhanc-
ers and promoters mostly occur on the same chromo-
some within topologically associating domains, which are
about 1 Mb in length and include 4–10 genes and several
hundred enhancers [15]. To identify potential direct tar-
get genes of the enhancer probes, the distance between
the enhancer probes and linked genes that are on the
same chromosome was measured. For example, among
the 127,725 $E_T:G^+$ links in BRCA, 7153 are within the
same chromosome. Of these, 313 $E_T:G^+$ enhancer:gene
links were found within a 1-Mb region. For PRAD and
KIRC, 83 and 212 $E_T:G^+$ enhancer:gene links were found
within a 1-Mb region, respectively (Fig. 3; Additional
file 1: Figure S6, Additional file 5: Table S4).
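The distance-based narrowing described above amounts to keeping links whose probe and gene share a chromosome and lie within 1 Mb of each other. A minimal Python sketch follows; representing each gene by a single TSS coordinate, and the probe and gene names used in the usage note, are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Position = Tuple[str, int]  # (chromosome, coordinate)


def candidate_direct_links(links: List[Tuple[str, str]],
                           probe_pos: Dict[str, Position],
                           gene_pos: Dict[str, Position],
                           max_dist: int = 1_000_000) -> List[Tuple[str, str, int]]:
    """Keep (probe, gene) links on the same chromosome within max_dist bp.

    Returns (probe, gene, distance) tuples in the input order.
    """
    kept = []
    for probe, gene in links:
        probe_chrom, probe_coord = probe_pos[probe]
        gene_chrom, gene_coord = gene_pos[gene]
        if probe_chrom != gene_chrom:
            continue  # links across chromosomes are treated as indirect here
        dist = abs(probe_coord - gene_coord)
        if dist <= max_dist:
            kept.append((probe, gene, dist))
    return kept
```

For example, a hypothetical probe at chr7:16,895,000 linked to a gene with a TSS at chr7:16,921,000 would be kept with a distance of 26,000 bp, while a link to a gene on chr16 would be dropped.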
TFs associated with the activity of many tumor-specific
enhancers
Studies from the ENCODE project reported that the
average number of enhancers directly interacting with
a promoter via looping is 3.9 [9]. Although our analy-
sis is not limited to direct interactions, the majority of
genes are associated with fewer than 5 enhancer probes
(Fig. 4a; Additional file 1: Figures S7, S8, Additional file 4:
Table S3). However, strikingly, in each tumor type, a sub-
set of genes is associated with many enhancer probes.
For BRCA, among the 6948 genes whose expression is
positively associated with activity of a tumor-specific
enhancer probe, 235 genes were associated with more
than 100 enhancer probes. For PRAD, among the 5017 genes
whose expression is positively associated with activity of
a tumor-specific enhancer probe, 91 genes were associ-
ated with more than 30 enhancer probes, and for KIRC,
among the 6629 genes whose expression is positively
associated with activity of a tumor-specific enhancer
probe, 178 genes were associated with more than 100
enhancer probes. The links between genes and enhancers
for those genes whose expression is positively associated
with a large number of enhancer probes can be viewed
in two ways. Either many enhancers regulate that gene,
or perhaps more likely if the gene is a TF, then the asso-
ciation can be reversed. In other words, high expression
of a TF can lead to increased occupancy (and hypometh-
ylation) at target enhancers of the TF. For each tumor
type, we identified a unique set of TFs that are linked to
enhancers; for BRCA we identified 710 TFs, for PRAD we
identified 540 TFs, and for KIRC we identified 731 TFs,
for a union set of ~1200 TFs (Fig. 4b; Additional file 6:
Table S5). Among those, for example, GATA3, SPDEF,
FOXA1, and ESR1 are TFs linked to hundreds of enhancers
in BRCA. Similarly, HOXC6, DLX1, and HOXC4
are top TFs linked to more than a hundred enhancers
in PRAD, whereas GLIS1, MAF, SAP30, TRIM15,
and ZNF395 are TFs linked to hundreds of enhancers
in KIRC. We further investigated top TFs linked to
hundreds of enhancers in breast, prostate, and kidney
tumors.
[Figure 3: three histograms with panel titles PRAD $E_T:G^+$, BRCA $E_T:G^+$, and KIRC $E_T:G^+$; x axis: distance between enhancer probe and gene (10 kb, 100 kb, 1 Mb, 10 Mb, 100 Mb, >100 Mb); y axis: number of enhancer probe:gene links]
Fig. 3 Distribution of enhancer probe:gene links on the same chromosome. Shown is the number of enhancer probe to gene links ($E_T:G^+$) on the
same chromosome by distance in BRCA (left, red), PRAD (center, blue), and KIRC (right, green)
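The per-gene tallies discussed in this section (how many enhancer probes are linked to each gene, which genes exceed a cutoff, and the union of TF sets across tumor types) can be sketched in Python as below. The function names and the toy inputs in the usage note are hypothetical; only the counting logic is illustrated.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def probes_per_gene(links: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct enhancer probes linked to each gene from (probe, gene) pairs."""
    probes_by_gene: Dict[str, Set[str]] = defaultdict(set)
    for probe, gene in links:
        probes_by_gene[gene].add(probe)
    return {gene: len(probes) for gene, probes in probes_by_gene.items()}


def highly_linked_genes(counts: Dict[str, int], min_probes: int) -> List[str]:
    """Genes linked to more than min_probes probes, most-linked first."""
    selected = [gene for gene, n in counts.items() if n > min_probes]
    return sorted(selected, key=lambda gene: (-counts[gene], gene))


def union_of_tf_sets(*tf_sets: Set[str]) -> Set[str]:
    """Union of the TF sets found in each tumor type."""
    return set().union(*tf_sets)
```

Because the same TF can be linked to enhancers in more than one tumor type, the size of the union is smaller than the sum of the per-tumor counts, which is consistent with the roughly 1200 TFs reported for sets of 710, 540, and 731.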
Characterization of TFs linked to breast tumor-specific
enhancers
GATA3 is a well-studied TF that has a long history of
association with breast cancer. Therefore, we investigated
the enhancer to gene links identified above for GATA3
using publicly available ChIP-seq data from a breast can-
cer cell line (Additional file 2: Table S1). GATA3 expres-
sion was associated with 829 breast tumor-specific
enhancer probes located on many different chromosomes
(Fig. 5a). We found that GATA3 ChIP-seq peaks from
the MCF7 ER+ breast cancer cell line were statistically
significantly enriched in the set of tumor-specific enhanc-
ers linked to GATA3 in BRCA by TENET, as compared
to all tumor-specific enhancers linked to genes or to all
tumor-specific enhancers (Fisher exact test, adj. p value
$< 4 \times 10^{-15}$) (Fig. 5b); this pattern of enrichment of the
appropriate ChIP-seq peaks in the TF-linked enhancer
probe sets was also found for FOXA1 and ESR1. This
analysis supports the conclusion that enhancers linked
to a TF by TENET include a subset of enhancers that
are bound by that TF. As an example, the GATA3-linked
tumor-specific enhancer probe cg04747693 is within the
H3K27Ac and GATA3 ChIP-seq signals in MCF7. When
we investigated this region using whole genome bisulfite
sequencing data from TCGA, we found that CpGs near
the GATA3 peak were unmethylated in breast but not
in the other tumor types. Importantly, this probe was
specifically unmethylated in non-basal breast tumors
(Fig. 5c), likely due to the higher expression of GATA3 in
luminal, as compared to basal tumors (Fig. 5d). We recognize
that the MCF7 ER+ breast cancer cell line does
not fully represent the 641 heterogeneous breast
tumor tissue samples that we used, and thus the MCF7
ChIP-seq data cannot validate all of the tumor-specific
enhancers we identified. Therefore, we performed motif
enrichment analyses for the GATA3-linked enhancers
and found that 24% of GATA3-linked enhancers contained
GATA3 motifs, validating TENET predictions.
[Figure 4: a histograms of the number of linked enhancer probes per gene (x axis) against frequency (y axis) for PRAD $E_T:G^+$ (91 genes with >30 linked enhancer probes), BRCA $E_T:G^+$ (235 genes with >100 linked enhancer probes), and KIRC $E_T:G^+$ (178 genes with >100 linked enhancer probes). b bar charts of the number of linked enhancer probes for the top 10 TFs: PRAD (HOXC6, DLX1, HOXC4, FOXA1, LASS4, SIM2, HOXC5, HOXB13, EZH2, OTX1); BRCA (GATA3, SPDEF, FOXA1, ESR1, MYBL2, FOXM1, XBP1, ZNF695, MYB, RCOR2); KIRC (GLIS1, MAF, SAP30, TRIM15, ZNF395, SCAF1, TFEC, CREB3L3, RUNX1, ZNF227)]
Fig. 4 Identification of TFs associated with the activity of many enhancers. a Shown is the number of linked enhancer probes per gene in the $E_T:G^+$
category for BRCA (left), PRAD (center), and KIRC (right). b The top 10 TFs identified to be linked to a large number of enhancer probes for BRCA (left),
PRAD (center), and KIRC (right)
In addition to the GATA3 motif, we also found that the
0
2
4
6
8
GATA3 FOXA1 ESR1
Percentage of enhancer probes
found from TF ChIP-seq
TF linked tumor specific enhancer probes
All linked tumor specific enhancer probes
All tumor specific enhancer probes
GATA3
a
c
2 kb hg19
88,110,000
BANP
16 _
1 _
102 _
1 _
Txn Factor ChIP
Master DNaseI HS
HM450 probe
UCSC Genes
MCF7 H3K27Ac
MCF7 GATA3
cg04747693
T
N
T
N
T
N
T
N
T
N
T
T
Whole genome bisulfite sequencing
N
N
BRCA
COAD
UCEC
LUAD
LUSC
READ
STAD
LumA
LumB
Her2
Her2
Basal
Scale
chr16:
4
8
12
16
0.00 0.25 0.50 0.75 1.00
DNA methylation level (cg04747693)
Gene expression level (GATA3)
BRCA PAM50 subtypes
T
N
LumA
LumB
Her2
Basal
Normal-like
Normal
b
d
Fig. 5 GATA3 is linked to many enhancers in breast tumor tissues. a Shown is a circos plot of the enhancers having an active state positively linked
to expression of GATA3. b Percentage of all tumor-specific enhancers (green), all tumor-specific enhancers linked to genes (red), and tumor-specific
enhancers linked to GATA3, FOXA1, or ESR1 (blue) expression that overlap with TF ChIP-seq for GATA3, FOXA1, or ESR1. c Genome browser screen
shots of an enhancer (containing probe cg04747693) having an active state in breast tumors, that is positively linked to expression of GATA3; the
probe is located within a H3K27Ac and a GATA3 peak in MCF7 cells and in a hypomethylated region specifically found in non-basal breast cancer
cells. d Shown is a scatterplot of the DNA methylation of the enhancer probe and GATA3 expression in normal and different subtypes of breast
tumor tissues
Page 8 of 17 Rhie et al. Epigenetics & Chromatin (2016) 9:50
GATA3-linked enhancers have known motifs of other
transcription factors (e.g., FOXA, ESR1, TCF), which
have been previously shown to work together with
GATA3 in breast cancer [16, 17]. Th ese results suggest
that through TENET analyses, we can identify sets of TFs
co-recruited to enhancers.
Characterization of prostate tumor-specific enhancer
networks
Unlike the association of GATA3 with breast cancer,
the TFs identified by TENET in prostate cancer have
not been well studied. The top 2 TFs associated with
the most tumor-specific enhancer probes in prostate
tumors, HOXC6 and DLX1, are each associated with
more than 100 tumor-specific enhancer probes (Fig. 4b).
Interestingly, HOXC6 and DLX1 were recently identi-
fied as top markers for prostate cancer, but little is known
about their target genes [10]. To further characterize
the highly linked TFs in the E_T:G+ category for PRAD,
we asked whether they are associated with the same or
different enhancers. When we performed clustering of the
enhancer probe:gene links for 59 different TFs linked to
more than 10 enhancer probes, we found that many of
the TFs shared subsets of linked enhancer probes. Sev-
eral clusters of TFs that are associated with the same
enhancers are marked by brackets with circled num-
bers in Fig. 6a. Cluster 1 contains HOXC6, HOXC4,
and HOXC5, cluster 2 contains HOXB13 and FOXA1,
and cluster 3 contains 19 TFs of which 8 are ZNFs (see
Additional file 6: Table S5 for the numbered list of the 59
TFs). However, HOXC6 and DLX1 mostly do not share
clusters of enhancer probes, suggesting that they regu-
late distinct sets of genes. To identify genes regulated
by these TFs, we used siRNA (performed in triplicate)
to reduce their expression in C42B prostate cancer cells,
followed by RNA-seq (Fig. 6b, c; Additional file 7: Table
S6, Additional file 8: Table S7). Interestingly, the sets of
genes changed by knockdown of each TF are different,
supporting the hypothesis that each TF has a distinct role
in prostate cancer.
Because there is a substantial heterogeneity in pros-
tate tumor samples, we felt it important to determine
the enhancer to gene link states of each tumor sample
to discover if any of the tumor-specific enhancers linked
to genes are enriched in subsets of tumors having previ-
ously identified characteristics. For example, in PRAD,
the most commonly found tumor subgroup has gene
fusions involving members of the E26 transformation-
specific (ETS) family of TFs, such as ERG, ETV1, ETV4,
and FLI1 [18]. Another common subgroup, which does
not carry ETS fusion genes, may have either a mutation
in the SPOP gene or a deletion of the CHD1 gene [19].
Additionally, mutations of the TP53, PTEN, FOXA1, or
IDH1 genes occur in subgroups of prostate cancers [14].
The clinical behavior and progression of prostate cancers
vary case by case [20], and an understanding of the
mechanisms leading to the development of the different
prostate cancer subgroups is in great demand. We therefore
more closely examined the 25,428 E_T:G+ links in
different subsets of 333 prostate tumors (Fig. 7).
Interestingly, we detected a set of enhancer:gene links
that are common across the tumors (e.g., the 1075 E_T:G+
links in cluster 1). The 115 genes linked by these enhancers
include genes that have been reported to be involved
in prostate cancer development. One example of a gene
which was associated with active states of enhancer
probes across all prostate tumors is the CAMKK2 gene,
an AR-regulated gene that is an upstream activator of the
AMP-dependent protein kinase (AMPK) and is involved
in catabolic pathways and physiologically relevant pro-
cesses such as cell cycle and cytoskeleton reorganization
[21]. Gene ontology analyses of these 115 linked genes
identified across all prostate tumors revealed that the
category of sequence-specific DNA binding is enriched
(Fisher exact test, adj. p value < $3.1 \times 10^{-3}$). Interestingly,
HOXC6 and DLX1, which we characterized above, were
also found in these “common” links, suggesting that they
may play a role in the development of the majority of
prostate tumors. However, in addition to the “common”
links, we identified more than 20,000 enhancer:gene links
that are uniquely enriched in a particular subgroup of
prostate tumors (Fig. 7). For example, cluster 2 of E_T:G+
links is enriched in ETS fusion-positive tumors, whereas
cluster 3 is enriched in tumors having a FOXA1 mutation.
Fig. 6 TFs linked to many tumor-specific enhancers in prostate tissues. a Unsupervised clustering of the enhancer probe:TF sets for PRAD for which
a TF is associated with more than 10 hypomethylated (tumor-specific) enhancer probes. The rows indicate the 59 TFs, and the columns indicate
the 536 hypomethylated enhancer probes linked to TFs; when there is a link, the cell is colored in black. On the top of the heatmap is shown the
chromosomal location for each enhancer probe. On the left side of the heatmap is shown the chromosomal location for each TF. TF number on the
right side indicates the TF rank, as determined by the number of linked enhancer probes for each TF (Additional file 6: Table S5). Three clusters of TFs
that are linked to the same enhancers are marked by brackets with circled numbers. b Volcano plots identifying genes differentially expressed upon
knockdown of HOXC6 and DLX1; triplicate control and knockdown samples were analyzed. c Venn diagrams of significantly down- or upregulated
genes upon knockdown of HOXC6 and DLX1
ZNF395 linked enhancers in kidney cancers have common
de novo motifs
To follow up on our identification of TFs linked to hun-
dreds of enhancers in kidney tumors, we further studied
enhancers linked to ZNF395. ZNF395, which encodes
a protein having one C2H2-type zinc finger domain, is
overexpressed in various tumors including kidney cancer
[22] and has been reported to be induced by hypoxia and
involved in IKK signaling [23]. The ZNF395 gene is
located at 8p21, and TENET identified hypomethylation
of enhancer probe cg12116192 (located ~500 kb from
the TSS of ZNF395) to be positively associated with the
expression of the ZNF395 gene (Fig. 8a, b), suggesting
that hypomethylation of this probe may be responsi-
ble for the increased expression of ZNF395 in kidney
tumors. Although ZNF395 has been repeatedly identi-
fied as a significant marker of renal cell carcinoma [24],
very little functional characterization of this TF has been
performed. As mentioned above, ZNF395 expression
level is positively associated with almost 200 hypometh-
ylated enhancer probes that are located throughout the
genome (Fig. 8c). ZNF395 has not been extensively stud-
ied, and there are no published ChIP-seq datasets or
motifs for this TF. However, if the linked enhancers are
in fact ZNF395-regulated enhancers, they may contain a
common motif. Therefore, we performed a de novo motif
search [25] on these 183 ZNF395-linked enhancers. Interestingly,
we found that two motifs (E-value < $3.2 \times 10^{-5}$)
are enriched; motif 1 is found in 182 of the 183 enhancers,
and motif 2 is found in 75 of the 183 enhancers. However,
less than 25% of enhancers that are linked to genes other
than ZNF395 in KIRC had these motifs and less than 20%
of all NDRs distal from a TSS had the motifs (Fig. 8d).
The fact that these motifs are found in essentially all of
the ZNF395-linked enhancers suggests that they may
be direct binding motifs for ZNF395. Although ZNF395
ChIP-seq data obtained using an antibody to the endogenous
protein have not been published, ChIP-seq data
obtained using a GFP antibody and K562 leukemia cells
harboring a GFP-tagged ZNF395 are available as part of
the ENCODE project. When we searched for motif 1 and
motif 2 in the distal GFP-ZNF395 K562 ChIP-seq peak
set, we found that 64% of the peaks contained motif 1 and
59% of the peaks contained motif 2. These results suggest
that not only have we identified the DNA binding motif
for ZNF395, they provide evidence that TENET can be
used to identify enhancers and derive de novo motifs for
TFs that have not yet been studied using ChIP-seq (Additional
file 9: Table S8).
Fig. 7 Heatmap of enhancer:gene links in prostate tumor tissues. Unsupervised clustering results using the E_T:G+ links (n = 25,428) for prostate
tumors (n = 333) with previously defined genomic alterations commonly found in prostate tumors and Gleason scores of the tumors [14]. Three
clusters of E_T:G+ links are marked by red-circled numbers
[Heatmap annotation tracks: fusion/overexpression, SPOP mutation, CHD1 CNA, FOXA1 mutation, IDH1 mutation, and Gleason score (3+3, 3+4, 4+3, >=8); genomic alteration legend: wildtype, fusion, overexpression, mutation, heterozygous loss, homozygous deletion; enhancer:gene link status: yes or no.]
Fig. 8 ZNF395-linked enhancers in kidney tumor tissues. a Genome browser screen shots near the tumor-specific enhancer probe cg12116192.
From top, shown are the genomic coordinates, HM450 probe location, UCSC genes, H3K27Ac ChIP-seq tracks in tumor (753T) and normal (753N)
cells, the ENCODE layered TF ChIP-seq track, the ENCODE Master DNaseI hypersensitive site track, and an intra-chromosomal TENET-identified link
between the enhancer probe cg12116192 and the ZNF395 gene; left shaded region is the enhancer probe cg12116192, and the right shaded region
is the transcription start site of ZNF395. b Scatterplot of the DNA methylation level of the enhancer probe cg12116192 and ZNF395 expression in
normal and tumor kidney tissues. c Circos plot of the 183 enhancers having an active state positively linked to expression of the ZNF395 gene in
KIRC. d Logos of two de novo motifs identified in the 183 enhancers linked to ZNF395 expression are shown on the left; fraction of regions with the
two motifs in the 183 ZNF395-linked enhancers, in 7767 enhancers identified using a GFP antibody in K562 cells expressing a GFP-tagged ZNF395,
in all linked enhancers identified in KIRC except those linked to ZNF395, and in all distal NDR regions used in this study
Discussion
To understand the mechanisms underlying breast, pros-
tate, and kidney cancers, we identified differentially active
enhancers in breast, prostate, and kidney tumor tissues,
as compared to normal tissues. By using an approach
developed here called Tracing Enhancer Networks using
Epigenetic Traits (TENET), which uses DNA meth-
ylation and gene expression levels to discover genome-
wide links from enhancers to genes and from genes to
enhancers, we discovered key TFs linked to tumor-spe-
cific enhancers (Fig. 4b, Additional file 6: Table S5). We
further validated binding of the TFs GATA3, FOXA1
and ESR1 to the enhancers activated in breast cancers
using publicly available ChIP-seq data. By performing
knockdown assays and RNA-seq of the TFs HOXC6 and
DLX1 in prostate cancer and annotating enhancer states
in each prostate tumor sample, we found that expres-
sion of HOXC6 and DLX1 is highly linked to enhancers
in the majority of prostate tumors, but they are associ-
ated with different enhancers and regulate distinct gene
sets. We also revealed important TFs linked to enhancers
activated in kidney cancer and further identified de novo
motifs enriched in enhancers linked to ZNF395.
Previous studies have shown that not all enhancers marked
by active epigenomic marks regulate gene
expression in the cells being studied [8, 26]. To prioritize
enhancers that may regulate gene expression
among all enhancers marked by active epigenomic
marks, we first identified ~50,000 probes (located within
nucleosome-depleted subregions of enhancers) that are
differentially methylated in breast, prostate, and kidney
tumors, as compared to normal tissues; these probes cor-
respond to ~25,000 different enhancer regions that have
gained or lost activity in the tumor tissues. Because previ-
ous studies have shown a significant association between
the DNA methylation level of an enhancer and the
expression level of a direct target gene of the enhancer
[27, 28], we then developed an approach (TENET) that
identifies statistically significantly associated relation-
ships (links) between DNA methylation and gene expres-
sion genome-wide using raw p values by calculating z
scores, empirical p values, and Wilcoxon rank sum test
p values. Although the number of enhancer to gene links
can vary depending on the settings of cut-offs, in general,
when we linked gene expression levels with enhancer
activity, we found that only ~20% of the enhancer probes
show a positive relationship between activity state and
expression of a gene. Of these, ~40% of the enhancer
probe:gene links are between enhancers and genes on
the same chromosome, ~15% are within 10 Mb of each
other, and only ~5% of the links are between an enhancer
and a gene located within 1 Mb of each other. In total,
we identified 608 enhancer probe:gene links (correspond-
ing to 383 unique enhancers) in which the expression of
a nearby gene (within 1 Mb) is positively associated with
the activity of the enhancer (Additional file 5: Table S4);
this set of links is the most likely to identify direct target
genes. However, it is impossible to determine whether
the associations are direct or indirect by comparing DNA
methylation and gene expression changes. Chromosomal
looping assays are often used to evaluate chromosomal
interactions; however, the tissues analyzed here are not
available for further experimental follow-up. Future stud-
ies will require the identification of tumor cell lines that
show the appropriate enhancer activity:gene expression
relationship (i.e., robust enhancer marks and high gene
expression).
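The distance breakdown described above can be illustrated with a minimal Python sketch. It is not part of TENET: the function names, the toy coordinates, and the choice to measure distance from the probe to the TSS are all assumptions made for illustration. Note also that the classes below are mutually exclusive, whereas the percentages quoted in the text are nested (links within 1 Mb are also within 10 Mb and on the same chromosome).

```python
def classify_link(enh_chrom, enh_pos, gene_chrom, tss_pos):
    """Assign one enhancer probe:gene link to a distance class.

    Distance is measured from the probe position to the gene TSS,
    which is an assumption of this sketch.
    """
    if enh_chrom != gene_chrom:
        return "different chromosome"
    distance = abs(enh_pos - tss_pos)
    if distance <= 1_000_000:
        return "same chromosome, within 1 Mb"
    if distance <= 10_000_000:
        return "same chromosome, within 10 Mb"
    return "same chromosome, beyond 10 Mb"


def class_fractions(links):
    """Fraction of links falling in each distance class."""
    counts = {}
    for link in links:
        label = classify_link(*link)
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(links) for label, n in counts.items()}


# Hypothetical links: (enhancer chrom, probe position, gene chrom, TSS position)
toy_links = [
    ("chr8", 27_750_000, "chr8", 28_250_000),   # 500 kb apart
    ("chr8", 27_750_000, "chr8", 33_000_000),   # 5.25 Mb apart
    ("chr8", 27_750_000, "chr8", 90_000_000),   # far apart, same chromosome
    ("chr8", 27_750_000, "chr12", 54_000_000),  # different chromosomes
]

for label, fraction in class_fractions(toy_links).items():
    print(f"{label}: {fraction:.2f}")
```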
Although most genes and enhancers were involved in
a relatively small number of links, we did identify some
genes linked to hundreds of enhancers located through-
out the genome. Although one may think that the most
statistically significant enhancer to gene links would
correspond to nearby, direct target genes, our results
revealed that many links between an enhancer and a gene
located far away or on another chromosome had very
strong relationships. Many of these genes are TFs, some
of which have previously been associated with the cancer
in which the link was identified (Additional file 6: Table
S5). For example, we identified GATA3 and FOXA1 as
highly linked TFs in breast tumors. GATA3 and FOXA1
act as pioneer factors essential for mammary morphogen-
esis, and GATA3 is required for estradiol stimulation of
cell cycle progression in breast cancer cells [29]. TENET
identified HOXC6 and DLX1 as important TFs in PRAD.
HOXC6 is involved in epithelial cell proliferation, and loss
of this gene induces apoptosis in prostate cancer cells [30,
31]. DLX1 encodes a distal-less homeobox 1 protein that
is reported to drive prostate cancer metastasis [32]. Upon
independent knockdown of these genes in C42B prostate
cancer cells, we found that for HOXC6, the top GO terms
for downregulated genes were mitotic cell cycle and cell
cycle (e.g., CDKN2C, CDK16, IGFBP3), and for DLX1,
genes involved in proliferation and androgen-respon-
sive genes were enriched in downregulated genes (e.g.,
CDCA7, MAGOH, MAD2L1). However, different genes
were identified upon knockdown of HOXC6 and DLX1,
suggesting that each of these TFs has a distinct role in
prostate cancer (Fig. 6c). This supports the recent finding
that the combination of three genes (HOXC6, DLX1,
and TDRD1) constitutes the top prognostic marker for
prostate cancer [10]. Some of the TENET-identified TFs
in KIRC (TFEC, RUNX1, ZNF395) have been previously
linked to kidney cancer. For example, the dysregulation
of TFEC, which belongs to the microphthalmia family of
TFs, leads to renal cell carcinoma [33], RUNX1 upregula-
tion is an important factor for clear cell renal carcinoma
survival [1, 34], and ZNF395 is known to play a role in
the pathogenesis of clear cell renal carcinoma [24], pos-
sibly affecting kidney cancer patient survival (Additional
file 1: Figure S9). Our analyses using ChIP-seq data for
TFs identified by TENET in BRCA and KIRC suggest
that enhancer:gene links identified by this approach may
help to identify specific TF binding sites and DNA bind-
ing motifs; this finding may be very beneficial for studies
of TFs for which we do not have available antibodies for
functional assays and for understanding TF-enhancer-
gene networks (Additional file 1: Figure S10).
Conclusions
In this study, we developed an approach (TENET) that
alleviated tumor purity issues and then identified more
than 800,000 enhancer probe to gene links in prostate,
breast, and kidney tissue samples using only DNA meth-
ylation and RNA-seq data (Additional file 5: Table S4).
We revealed TFs whose expression is linked to a large
number of tumor-specific enhancers and further characterized
selected TFs for each tumor type (e.g., BRCA:
GATA3, FOXA1, ESR1; PRAD: HOXC6, DLX1; KIRC:
ZNF395) and for specific tumor subgroups. However,
there are limitations to our analyses. For example, cur-
rently there is no H3K27Ac ChIP-seq data and no whole
genome bisulfite sequencing data available for the ~1400
tissue samples we used. Because the DNA methylation
data for the normal and tumor tissues are from HM450
arrays, we can only investigate enhancers represented
by these probes. For future studies, data from the Illu-
mina EPIC array (which has more enhancer probes) or
from whole genome bisulfite sequencing of normal and
tumor tissues can be used with TENET to comprehen-
sively identify tumor-specific changes in enhancer activ-
ity. We note that our approach can only identify enhancer
to gene links that show changes in samples. Therefore,
enhancers linked to genes that are expressed at a high
level across normal and tumor samples (even if regulated
by different enhancers in the tumors) as well as enhanc-
ers that are constitutively active across samples (even if
they regulate different genes in the normals vs. tumors)
cannot be identified. Finally, we stress that our approach
can also be applied to studies beyond cancer to charac-
terize enhancer networks for different types of case ver-
sus control datasets.
Methods
Cell culture growth conditions
The human prostate cancer C42B cells, obtained from
ViroMed Laboratories (Minneapolis, MN, USA), were
maintained in RPMI 1640 supplemented with 10% fetal
bovine serum (FBS). The human immortalized normal
prostate cell line RWPE1 (ATCC # CRL-11609) was
grown according to the manufacturer’s recommendation.
The human breast cancer MCF7 cells (ATCC # HTB-22)
were grown in DMEM supplemented with 10% FBS. For
estradiol stimulation, cells were grown in phenol red-free
medium with charcoal-stripped serum for several
days and treated with 100 nM estradiol for 45 min (as
a control, ethanol was added instead of estradiol). The
human immortalized normal breast cell line MCF10A
(ATCC # CRL-10317) was maintained in DMEM/F12
with 5% horse serum, 100 units/ml penicillin, 0.1 mg/ml
streptomycin, 0.5 µg/ml hydrocortisone, 100 ng/ml
cholera toxin, 10 µg/ml insulin, and 20 ng/ml epidermal
growth factor.
ChIP-seq
In C42B, RWPE1, MCF7, and MCF10A cells, H3K27Ac
ChIP assays were performed using H3K27Ac antibody
(Cat # 39133 Lot # 21311004, Active Motif, Carlsbad,
CA, USA or ab4729 Abcam, Cambridge, MA, USA), as
previously described [5, 35]. Each ChIP-seq experiment
was performed in duplicate, and ChIP-seq libraries were
sequenced on either Illumina Hiseq 2000 or Nextseq
500 machines. All ChIP-seq data were mapped to hg19
using BWA (default parameters), and peaks were called
using Sole-Search as previously described [8]. All ChIP-
seq data were deposited in GEO (accession number
GSE78913). Access to other publicly available ChIP-seq
datasets used in this study can be found in Additional
file 2: Table S1.
FAIRE-seq
FAIRE assays were performed in MCF10A cells as pre-
viously described [5]. Two independent libraries were
constructed and sequenced on Illumina Hi-Seq 2000.
FAIRE-seq data were mapped to hg19 using BWA
(default parameters), and peaks were called by using
Sole-Search (TF parameter, alpha value: 0.001, fdr: 0.001).
All FAIRE-seq data were also deposited in GEO under
accession number GSE78913.
siRNA knockdown, RT-qPCR, and RNA-seq
For transient knockdown, C42B cells were transfected
in triplicate with 100 nM of siRNA oligonucleotides of
human HOXC6 (Cat # L011871000005), DLX1 (Cat #
L011871000005), or control (Cat # D0018101005) using
SMART pool Dharmafect transfection reagent 3 (Dhar-
macon, Lafayette, CO, USA) for 72 h. RNA was extracted
using Trizol reagent (Cat # 15596-026, Life technolo-
gies, Carlsbad, CA, USA) following its protocol. cDNA
was synthesized using SuperScript® VILO™ cDNA Synthesis
Kit (Cat # 11754-050, Life technologies, Carlsbad,
CA, USA). qPCR was performed on cDNA using
SYBR Green (Cat # 172-5201, Bio-Rad, Hercules, CA,
USA) with primers listed in Additional file 8: Table S7.
RNA-seq libraries were made using KAPA Stranded
mRNA-Seq Kit with KAPA mRNA Capture Beads (Cat
# KK8420, Kapa Biosystems, Woburn, MA, USA). ERCC
RNA Spike-In Mix (Cat # 4456740, Thermo Fisher Scientific,
Waltham, MA, USA) was added to each library for
quality assessment. RNA-seq libraries were sequenced on
Illumina Nextseq 500 with 75-bp single reads. To remove
batch effects, matched controls and knockdown samples
were prepared and sequenced at the same time. All RNA-seq
data were deposited in NCBI GEO under accession
number GSE78913, and differentially expressed genes
were selected using the Gene Specific Algorithm from
Partek® Flow® software with the upper quartile normalization
method (Partek Inc., St. Louis, MO, USA).
We used an FDR cutoff of 0.05 to select statistically significantly
differentially expressed genes. Differentially
expressed genes with absolute fold change >1.5 are
listed in Additional file 7: Table S6.
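As an illustration of the two thresholds just described, the following Python sketch filters a small, made-up result table. It does not reproduce the Partek Flow workflow; the record type, the gene names, the signed fold-change convention, and the use of strict inequalities are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass
class GeneResult:
    gene: str
    fold_change: float  # signed fold change; negative means downregulated (assumed convention)
    fdr: float          # false discovery rate for the knockdown vs. control comparison


def select_de_genes(results, fdr_cutoff=0.05, min_abs_fc=1.5):
    """Keep genes with FDR below the cutoff and |fold change| above the threshold."""
    return [
        r for r in results
        if r.fdr < fdr_cutoff and abs(r.fold_change) > min_abs_fc
    ]


# Hypothetical results, for illustration only
example = [
    GeneResult("GENE_A", -2.1, 0.001),  # passes both filters
    GeneResult("GENE_B", 1.3, 0.010),   # fold change too small
    GeneResult("GENE_C", 1.8, 0.200),   # FDR too high
    GeneResult("GENE_D", 1.6, 0.049),   # passes both filters
]

for r in select_de_genes(example):
    print(r.gene, r.fold_change, r.fdr)
```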
Tracing enhancer networks using epigenetic traits (TENET)
To identify differentially methylated enhancers, the
genomic coordinates of enhancers identified by REMC
and ENCODE for 98 tissues or cell lines plus H3K27Ac
ChIP-seq peaks from breast, prostate, and kidney cells
were used. We then narrowed the regions by intersect-
ing with the set of ENCODE Master DNaseI-seq peaks
from 125 tissues or cell lines and DNaseI/FAIRE/NOMe-
seq peaks from breast, prostate, and kidney cells (Addi-
tional file 2: Table S1). Only distal regulatory elements
were used; these were located greater than 1.5 kb from
a known TSS and identified using GENCODE v19 [36].
DNA methylation HM450 data and RNA-seq data of
breast, prostate, and kidney tissues were downloaded
from the TCGA data portal (https://tcga-data.nci.nih.
gov/tcga/) and used to identify differentially methylated
enhancer probes and their associated genes by develop-
ing an approach called TENET (freely available to down-
load at http://farnhamlab.com/software). Importantly,
this method can predict enhancer:gene links using only
DNA methylation and RNA-seq data, which is easily
obtainable from frozen tissues. In step 1 of TENET, dif-
ferentially methylated enhancers in tissue samples are
identified, adjusting for tumor purity. In steps 2–4 of
TENET, relationships between enhancer activity and
gene expression levels are investigated genome-wide.
TENET was designed to detect enhancer activity changes
and enhancer:gene links that are specific to tumor subgroups.
The ability of TENET to annotate enhancer:gene
links genome-wide also allows the identification of a set
of key TFs for each tumor type. All enhancer to gene
links found can be summarized and visualized using the
tools in step 5 of TENET, which creates tables annotat-
ing enhancer to gene link states of each sample, statistic
tables, histograms, scatterplots, circos plots, and genome
browser tracks. A detailed explanation of TENET,
including information on installation, parameter settings,
and statistical methods, is available in Additional file 1:
Supplementary Methods.
Comparison of TF ChIP-seq with TENET results
We obtained GATA3, FOXA1, and ESR1 ChIP-seq from
ENCODE (Additional file 2: Table S1) and then tested
whether the TF peaks were found within ±100 bp of
each probe. Fisher exact tests were conducted between
groups, and p values were adjusted using the
Benjamini–Hochberg method.
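The comparison described above can be sketched in plain Python as follows. This is an illustrative reimplementation, not the code used in the study: the function names are hypothetical, probes and peaks are assumed to lie on a single chromosome, and the Fisher exact test is shown in its one-sided (enrichment) form, which the text does not specify.

```python
from math import comb


def probe_in_peak(probe_pos, peaks, window=100):
    """True if any (start, end) peak overlaps the probe position +/- window bp."""
    lo, hi = probe_pos - window, probe_pos + window
    return any(start <= hi and end >= lo for start, end in peaks)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p value for the 2x2 table [[a, b], [c, d]].

    a, b: probes in the TF-linked set with / without a peak
    c, d: probes in the comparison set with / without a peak
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    favorable = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return favorable / comb(n, row1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy usage with made-up coordinates and counts
peaks = [(1000, 1200)]
print(probe_in_peak(1290, peaks))
print(fisher_exact_greater(3, 1, 1, 3))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```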
Whole genome bisulfite sequencing (WGBS)
We used level 3 data of WGBS of breast invasive carci-
noma (BRCA), colon adenocarcinoma (COAD), uterine
corpus endometrioid carcinoma (UCEC), lung adenocar-
cinoma (LUAD), lung squamous cell carcinoma (LUSC),
rectum adenocarcinoma (READ), and stomach adeno-
carcinoma (STAD) from the TCGA data portal (https://
tcga-data.nci.nih.gov/tcga/), and we visualized these
datasets using the Integrative Genomics Viewer (IGV)
(https://www.broadinstitute.org/software/igv/).
Heatmap of E_T:G links for prostate tumors
Using a binary file of E_T:G+ links found by TENET
for PRAD in step 5, unsupervised clustering was performed
using a binary method for distance matrix
computation and Ward’s method for hierarchical clustering.
On the top of the heatmap, previously defined
genomic alterations commonly found in prostate
tumors and Gleason scores of the tumors are indicated
[14]. The images of prostate tumor tissues submitted to
TCGA were reviewed according to the American Joint
Committee on Cancer (AJCC) and assigned a Gleason
score, which describes how dangerous a prostate tumor
is in terms of how likely it is to metastasize; the higher
the Gleason score, the more likely that the tumor will grow and
spread quickly.
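To make the distance step concrete, here is a small Python sketch of a binary distance between 0/1 link-status vectors. It assumes that the "binary method" corresponds to the asymmetric binary (Jaccard-type) distance, in which positions that are 0 in both vectors are ignored; the text does not define it further. The vectors and names are hypothetical, and the Ward clustering step, which would take this matrix as input, is not shown.

```python
def binary_distance(x, y):
    """Asymmetric binary (Jaccard-type) distance between two 0/1 vectors.

    Positions where both vectors are 0 are ignored; the distance is the
    share of remaining positions at which the vectors disagree.
    """
    either = sum(1 for a, b in zip(x, y) if a or b)
    if either == 0:
        return 0.0  # two all-zero vectors are treated as identical here
    mismatched = sum(1 for a, b in zip(x, y) if bool(a) != bool(b))
    return mismatched / either


def distance_matrix(rows):
    """Pairwise binary distances between all rows."""
    n = len(rows)
    return [[binary_distance(rows[i], rows[j]) for j in range(n)] for i in range(n)]


# Hypothetical link-status vectors (1 = link present in that sample)
link_status = [
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
]

for row in distance_matrix(link_status):
    print(row)
```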
Gene ontology (GO) and GSEA analysis
To identify E_T:G+ links common across prostate
tumors, the dendrogram resulting from the unsupervised
clustering of E_T:G+ links was cut (k = 5), and 1075
links between 115 unique genes and 102 unique enhancer
probes were found (cluster 1 of Fig. 7). The 115 genes
were analyzed for enrichment in particular GO categories
using the TopGO program [37]. A Fisher exact test
was performed, and an adjusted p value cutoff of 0.05 was
used to select statistically significantly enriched GO
categories. For enrichment analysis of genes differentially
expressed in knockdown experiments, genes passing an FC
cutoff of 1.2 and an FDR cutoff of 0.05 were selected, and the
Fisher exact test described above was used to determine enriched
GO categories. The same differentially expressed genes
were used to identify enriched gene sets using the GSEA
(Gene Set Enrichment Analysis) tool [38]. A hypergeometric
test was used to calculate p values, and a false discovery
rate (q-value) <0.05 was used to select significantly
enriched gene sets.
Motif analysis for TF-linked enhancers
To discover de novo motifs enriched in the enhancers
linked to a TF, we collected sequences of 100-bp windows
of the CpG probes and used MEME version 4.10.1 [25]
with a minimum motif width of 6 and a maximum motif
width of 12, scanning both strands of the DNA sequences. To
provide a stringent analysis, we reported de novo motifs
found at enhancer probes using an E-value cutoff of 0.0001
that were found in >50% of the TF-linked enhancers; see
Additional file 9: Table S8. Two motifs (motif 1 and motif
2) were found to be enriched in the 183 enhancers linked
to ZNF395 with an E-value cutoff of 0.0001. FIMO version
4.10.1 [39] was used to scan distal (>1500 bp from
a TSS) ZNF395 ChIP-seq peaks in K562 cells expressing
eGFP-ZNF395 (n = 7767), non-ZNF395-linked enhancers
(n = 2352) from TENET in KIRC, and distal NDRs
defined using the ENCODE DNaseI master sites for 125
cell types (n = 2,391,038) for the presence of motif 1 and
motif 2; only loci with a match p value < 1 × 10^-4 were
counted (Fig. 8c).
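The two filters used in the motif scan (distal loci more than 1500 bp from a TSS, and motif matches with p value below 1 × 10^-4) can be sketched as follows. This is an illustration only, not the MEME or FIMO code: it assumes loci and TSSs on a single chromosome, uses a single coordinate per locus, and takes the best match p value per locus as already computed. All function names and the toy inputs are hypothetical.

```python
from bisect import bisect_left

def distance_to_nearest_tss(pos, sorted_tss):
    """Distance in bp from pos to the closest TSS (same chromosome).
    sorted_tss must be a non-empty, ascending list of coordinates."""
    i = bisect_left(sorted_tss, pos)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(pos - t) for t in candidates)

def is_distal(pos, sorted_tss, min_dist=1500):
    """True when the locus lies more than min_dist bp from every TSS."""
    return distance_to_nearest_tss(pos, sorted_tss) > min_dist

def fraction_with_motif(best_pvalue_per_locus, cutoff=1e-4):
    """Fraction of loci whose best motif match passes the p value cutoff."""
    hits = sum(1 for p in best_pvalue_per_locus.values() if p < cutoff)
    return hits / len(best_pvalue_per_locus)

if __name__ == "__main__":
    tss = [1000, 5000, 20000]                      # toy TSS coordinates
    peaks = {"peak1": 2000, "peak2": 8000, "peak3": 30000}
    distal = {name for name, pos in peaks.items() if is_distal(pos, tss)}
    print(sorted(distal))
    # Toy best-match p values for the distal loci only.
    print(fraction_with_motif({"peak2": 3e-5, "peak3": 2e-3}))
```

Sorting the TSS coordinates once and using a binary search keeps the distance lookup fast for large locus sets such as the master list of DNaseI sites.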
Survival analysis
A Kaplan–Meier survival analysis was used to estimate
the association of ZNF395 expression with the survival
of kidney cancer patients. Overall survival was calculated
using the R package survival, version 2.38
(http://CRAN.R-project.org/package=survival), from the date of
initial diagnosis of cancer to disease-specific death, or to
the last follow-up (in months) for patients who were alive. After
grouping kidney tumor samples into low (below mean)
and high (above mean) ZNF395 expression, a log rank
test was performed.
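The analysis above was run with the R survival package; the following Python sketch is only a stand-in that illustrates the two steps named in the text, a Kaplan-Meier estimate and a two-group log rank test. The function names are hypothetical, events are assumed to be coded 1 for death and 0 for censored (alive at last follow-up), and the follow-up times in the usage example are invented.

```python
from math import erfc, sqrt

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct death time.
    events: 1 = death observed, 0 = censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve

def logrank_p(times_a, events_a, times_b, events_b):
    """Two-sided log rank p value for two groups (lists of times/events)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    var = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group a
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group b
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if var == 0:
        return 1.0
    chi2 = obs_minus_exp ** 2 / var
    # Survival function of a chi-square variable with 1 degree of freedom.
    return erfc(sqrt(chi2 / 2))

if __name__ == "__main__":
    # Invented follow-up times in months for low and high expression groups.
    low_t, low_e = [5, 8, 12, 20], [1, 1, 0, 1]
    high_t, high_e = [15, 30, 42, 60], [1, 0, 0, 1]
    print(kaplan_meier(low_t, low_e))
    print(logrank_p(low_t, low_e, high_t, high_e))
```

Splitting samples at the mean expression value, as described in the text, would produce the two input groups; that step is omitted here because it only requires comparing each sample to the mean.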
Abbreviations
BRCA: breast invasive carcinoma; ChIP: chromatin immunoprecipitation;
ENCODE: encyclopedia of DNA elements; FAIRE: formaldehyde-assisted
isolation of regulatory elements; KIRC: kidney renal clear cell carcinoma;
NOMe-seq: nucleosome occupancy and methylome sequencing; PRAD: prostate
adenocarcinoma; REMC: roadmap epigenome mapping consortium; TCGA:
the cancer genome atlas; TENET: tracing enhancer networks using epigenetic
traits; TF: transcription factor; TSS: transcription start site.
Authors’ contributions
SKR conceived and designed the project, created analysis tools, performed
data analysis, designed, performed, and analyzed experiments in cell lines,
wrote and edited the manuscript. YG designed and performed experi-
ments in cell lines. YGT designed and performed experiments in cell lines.
LY contributed analysis tools. HS contributed analysis tools and edited the
manuscript. GAC participated in project design and edited the manuscript.
PWL participated in project design and edited the manuscript. PJF conceived
and designed the experiments, wrote and edited the manuscript. All authors
read and approved the final manuscript.
Additional files
Additional file 1. Supplementary Methods, Figure S1. Workflow
of TENET. Figure S2. Schematic diagrams explaining the methodology
of TENET. Figure S3. Comparison of enhancer probes in the three
tumor types. Figure S4. Identification of genes linked to differentially
methylated enhancers using TENET. Figure S5. Examples of enhancer
probe:gene links identified by TENET. Figure S6. Distribution of enhancer
probe:gene links on the same chromosome. Figure S7. Histograms of
TENET-identified enhancer:gene links (centered on genes). Figure S8.
Histograms of TENET-identified enhancer:gene links (centered on enhancers).
Figure S9. Survival curves of ZNF395 in KIRC. Figure S10. Example of a
3-layer (TF-enhancer-gene) network. Supplementary References.
Additional file 2: Table S1. Next generation sequencing data used for
analyses.
Additional file 3: Table S2. TENET parameter settings used for BRCA,
PRAD, and KIRC.
Additional file 4: Table S3. Summary of TENET results for all
enhancer:gene links.
Additional file 5: Table S4. List of E_T:G+ links found by TENET (a) BRCA,
(b) PRAD, and (c) KIRC.
Additional file 6: Table S5. List of TFs found from TENET E_T:G+ links for
BRCA, PRAD, and KIRC.
Additional file 7: Table S6. List of the most statistically significantly
differentially expressed genes after siRNA experiments for HOXC6 and DLX1.
Additional file 8: Table S7. Oligonucleotide sequences used for
RT-qPCR after siRNA experiments.
Additional file 9: Table S8. List of de novo motifs enriched in TFs found
from TENET E_T:G+ links for KIRC.
Page 16 of 17 Rhie et al. Epigenetics & Chromatin (2016) 9:50
Author details
1 Department of Biochemistry and Molecular Medicine and the Norris
Comprehensive Cancer Center, Keck School of Medicine, University of Southern
California, 1450 Biggy Street, NRT G511B, Los Angeles, CA 90089-9601, USA.
2 Van Andel Research Institute, Grand Rapids, MI 49503, USA.
Acknowledgements
We are grateful to all individuals who contributed to this study, as well as the
TCGA analysis working groups (specifically for BRCA, PRAD, and KIRC) and the
ENCODE and REMC Consortia for data access. We thank the USC/Norris Cancer
Center Next Generation Sequencing Facility for library construction and high-
throughput sequencing, and USC’s Norris Medical Library Bioinformatics Ser-
vice and Center for High-Performance Computing (hpc.usc.edu) for assisting
and supporting with data analysis and computation for the work described in
this paper. We also thank Professors Ben Berman and Stefano Lonardi for their
insightful comments.
Competing interests
The authors declare that they have no competing interests.
Availability of data and material
The datasets generated during this current study are available in GEO
(GSE78913). Accession numbers for all publicly available datasets used in this
study can be found in Additional file 2: Table S1.
Funding
This work was supported by the following grants from the NIH: R01CA136924
and R01CA190182.
Received: 26 August 2016 Accepted: 28 October 2016
References
1. Yao L, Shen H, Laird PW, Farnham PJ, Berman BP. Inferring regulatory
element landscapes and transcription factor networks from cancer
methylomes. Genome Biol. 2015;16:105.
2. Ernst J, Kheradpour P, Mikkelsen TS, Shoresh N, Ward LD, Epstein
CB, Zhang X, Wang L, Issner R, Coyne M, et al. Mapping and analysis
of chromatin state dynamics in nine human cell types. Nature.
2011;473(7345):43–9.
3. Rada-Iglesias A, Bajpai R, Swigut T, Brugmann SA, Flynn RA, Wysocka J. A
unique chromatin signature uncovers early developmental enhancers in
humans. Nature. 2011;470(7333):279–83.
4. Aran D, Sabato S, Hellman A. DNA methylation of distal regulatory
sites characterizes dysregulation of cancer genes. Genome Biol.
2013;14(3):R21.
5. Rhie SK, Hazelett DJ, Coetzee SG, Yan C, Noushmehr H, Coetzee GA.
Nucleosome positioning and histone modifications define relationships
between regulatory elements and nearby gene expression in breast
epithelial cells. BMC Genom. 2014;15:331.
6. ENCODE Project Consortium. An integrated encyclopedia of DNA ele-
ments in the human genome. Nature. 2012;489(7414):57–74.
7. Roadmap Epigenomics Consortium. Integrative analysis of 111 reference
human epigenomes. Nature. 2015;518(7539):317–30.
8. Tak YG, Hung Y, Yao L, Grimmer MR, Do A, Bhakta MS, O’Geen H, Segal
DJ, Farnham PJ. Effects on the transcriptome upon deletion of a distal
element cannot be predicted by the size of the H3K27Ac peak in human
cells. Nucleic Acids Res. 2016;44(9):4123–33.
9. Ecker JR, Bickmore WA, Barroso I, Pritchard JK, Gilad Y, Segal E. Genomics:
ENCODE explained. Nature. 2012;489(7414):52–5.
10. Leyten GH, Hessels D, Smit FP, Jannink SA, de Jong H, Melchers WJ, Cornel
EB, de Reijke TM, Vergunst H, Kil P, et al. Identification of a candidate
gene panel for the early diagnosis of prostate cancer. Clin Cancer Res.
2015;21(13):3061–70.
11. Meyer CA, Liu XS. Identifying and mitigating bias in next-genera-
tion sequencing methods for chromatin biology. Nat Rev Genet.
2014;15(11):709–21.
12. Farlik M, Sheffield NC, Nuzzo A, Datlinger P, Schonegger A, Klughammer J,
Bock C. Single-cell DNA methylome sequencing and bioinformatic inference
of epigenomic cell-state dynamics. Cell Rep. 2015;10(8):1386–97.
13. Taberlay PC, Statham AL, Kelly TK, Clark SJ, Jones PA. Reconfiguration of
nucleosome-depleted regions at distal regulatory elements accompanies
DNA methylation of enhancers and insulators in cancer. Genome Res.
2014;24(9):1421–32.
14. Cancer Genome Atlas Research Network. The molecular taxonomy of
primary prostate cancer. Cell. 2015;163(4):1011–25.
15. Jin F, Li Y, Dixon JR, Selvaraj S, Ye Z, Lee AY, Yen CA, Schmitt AD, Espinoza
CA, Ren B. A high-resolution map of the three-dimensional chromatin
interactome in human cells. Nature. 2013;503(7475):290–4.
16. Frietze S, Wang R, Yao L, Tak YG, Ye Z, Gaddis M, Witt H, Farnham PJ, Jin VX.
Cell type-specific binding patterns reveal that TCF7L2 can be tethered to
the genome by association with GATA3. Genome Biol. 2012;13(9):R52.
17. Theodorou V, Stark R, Menon S, Carroll JS. GATA3 acts upstream of FOXA1
in mediating ESR1 binding by shaping enhancer accessibility. Genome
Res. 2013;23(1):12–22.
18. Tomlins SA, Laxman B, Dhanasekaran SM, Helgeson BE, Cao X, Morris DS,
Menon A, Jing X, Cao Q, Han B, et al. Distinct classes of chromosomal
rearrangements create oncogenic ETS gene fusions in prostate cancer.
Nature. 2007;448(7153):595–9.
19. Barbieri CE, Baca SC, Lawrence MS, Demichelis F, Blattner M, Theurillat
JP, White TA, Stojanov P, Van Allen E, Stransky N, et al. Exome sequencing
identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate
cancer. Nat Genet. 2012;44(6):685–9.
20. Brawley OW, Ankerst DP, Thompson IM. Screening for prostate cancer. CA
Cancer J Clin. 2009;59(4):264–73.
21. Racioppi L. CaMKK2: a novel target for shaping the androgen-regulated
tumor ecosystem. Trends Mol Med. 2013;19(2):83–8.
22. Herwartz C, Castillo-Juarez P, Schroder L, Barron BL, Steger G. The
transcription factor ZNF395 is required for the maximal hypoxic induction
of proinflammatory cytokines in U87-MG cells. Mediat Inflamm.
2015;2015:804264.
23. Jordanovski D, Herwartz C, Pawlowski A, Taute S, Frommolt P, Steger G.
The hypoxia-inducible transcription factor ZNF395 is controlled by IkB
kinase-signaling and activates genes involved in the innate immune
response and cancer. PLoS ONE. 2013;8(9):e74911.
24. Dalgin GS, Holloway DT, Liou LS, DeLisi C. Identification and characteriza-
tion of renal cell carcinoma gene markers. Cancer Inform. 2007;3:65–92.
25. Bailey TL, Elkan C. Fitting a mixture model by expectation maximiza-
tion to discover motifs in biopolymers. Proc Int Conf Intell Syst Mol Biol.
1994;2:28–36.
26. Thakore PI, D’Ippolito AM, Song L, Safi A, Shivakumar NK, Kabadi AM,
Reddy TE, Crawford GE, Gersbach CA. Highly specific epigenome editing
by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat
Methods. 2015;12(12):1143–9.
27. Bell RE, Golan T, Sheinboim D, Malcov H, Amar D, Salamon A, Liron
T, Gelfman S, Gabet Y, Shamir R, et al. Enhancer methylation dynam-
ics contribute to cancer plasticity and patient mortality. Genome Res.
2016;26(5):601–11.
28. Thurman RE, Rynes E, Humbert R, Vierstra J, Maurano MT, Haugen E, Shef-
field NC, Stergachis AB, Wang H, Vernot B, et al. The accessible chromatin
landscape of the human genome. Nature. 2012;489(7414):75–82.
29. Eeckhoute J, Keeton EK, Lupien M, Krum SA, Carroll JS, Brown M. Positive
cross-regulatory loop ties GATA-3 to estrogen receptor alpha expression
in breast cancer. Cancer Res. 2007;67(13):6477–83.
30. Ramachandran S, Liu P, Young AN, Yin-Goen Q, Lim SD, Laycock
N, Amin MB, Carney JK, Marshall FF, Petros JA, et al. Loss of HOXC6
expression induces apoptosis in prostate cancer cells. Oncogene.
2005;24(1):188–98.
31. Hamid AR, Hoogland AM, Smit F, Jannink S, van Rijt-van de Westerlo
C, Jansen CF, van Leenders GJ, Verhaegh GW, Schalken JA. The role of
HOXC6 in prostate cancer development. Prostate. 2015;75(16):1868–76.
32. Chiang YT, Gout PW, Collins CC, Wang Y. Prostate cancer metastasis-driv-
ing genes: hurdles and potential approaches in their identification. Asian
J Androl. 2014;16(4):545–8.
33. Haq R, Fisher DE. Biology and clinical relevance of the micropthal-
mia family of transcription factors in human cancer. J Clin Oncol.
2011;29(25):3474–82.
34. Xiong Z, Yu H, Ding Y, Feng C, Wei H, Tao S, Huang D, Zheng SL, Sun J,
Xu J, et al. RNA sequencing reveals upregulation of RUNX1-RUNX1T1
gene signatures in clear cell renal cell carcinoma. Biomed Res Int.
2014;2014:450621.
35. Blattler A, Yao L, Witt H, Guo Y, Nicolet CM, Berman BP, Farnham PJ. Global
loss of DNA methylation uncovers intronic enhancers in genes showing
expression changes. Genome Biol. 2014;15(9):469.
36. Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M, Kokocinski
F, Aken BL, Barrell D, Zadissa A, Searle S, et al. GENCODE: the reference
human genome annotation for The ENCODE Project. Genome Res.
2012;22(9):1760–74.
37. Alexa A, Rahnenfuhrer J. topGO: enrichment analysis for Gene Ontology.
R package version 2.18.0. 2010.
38. Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA,
Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment
analysis: a knowledge-based approach for interpreting genome-wide
expression profiles. Proc Natl Acad Sci USA. 2005;102(43):15545–50.
39. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given
motif. Bioinformatics. 2011;27(7):1017–8.
Abstract
Prostate cancer (PCa) is the leading cause of new cancer cases and the 3rd most common cause of cancer death among men in the USA. Recent genome-wide association studies (GWAS) have identified more than 100 loci associated with increased risk of prostate cancer, most of which are in non-coding regions of the genome. Understanding the function of these non-coding risk loci is critical to elucidate the genetic susceptibility to prostate cancer.
Results: I generated genome-wide regulatory element maps of normal and tumorigenic prostate cells and, using genome-wide chromosome conformation capture data (in situ Hi-C) from those same cells, I annotated the regulatory potential of 2,181 fine-mapped prostate cancer risk-associated SNPs. I then predicted a set of target genes that are regulated by prostate cancer risk-related H3K27Ac-mediated loops. I next identified prostate cancer risk-associated CTCF sites involved in long-range chromatin loops. I used CRISPR-mediated deletion to remove prostate cancer risk-associated CTCF anchor regions as well as the CTCF anchor regions looped to the prostate cancer risk-associated CTCF sites. I observed up to 100-fold increased expression of genes within the loops after deletion of the prostate cancer risk-associated CTCF anchor regions.
Conclusions: I identified GWAS risk loci involved in long-range loops that function to repress gene expression within chromatin loops. My studies provide new insights into the genetic susceptibility to prostate cancer.
Asset Metadata
Creator: Guo, Yu (Phoebe) (author)
Core Title: Understanding prostate cancer genetic susceptibility and chromatin regulation
School: Keck School of Medicine
Degree: Doctor of Philosophy
Degree Program: Cancer Biology and Genomics
Publication Date: 05/10/2019
Defense Date: 11/16/2018
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: chromosome conformation, CRISPR/Cas9, GWAS, OAI-PMH Harvest, prostate cancer
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Maxson, Robert (committee chair), Alber, Frank (committee member), Farnham, Peggy (committee member)
Creator Email: yuguo.phoebe@gmail.com, yuguo@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c89-156575
Unique identifier: UC11660656
Identifier: etd-GuoYuPhoeb-7440.pdf (filename), usctheses-c89-156575 (legacy record id)
Legacy Identifier: etd-GuoYuPhoeb-7440.pdf
Dmrecord: 156575
Document Type: Dissertation
Rights: Guo, Yu (Phoebe)
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA