UNIVERSITY OF SOUTHERN CALIFORNIA
The Relationship Between DNA Methylation and Transcription Factor Binding in Colon Cancer Cells
Yu (Phoebe) Guo
PI: Dr. Peggy J. Farnham
Degree: Master of Science in Biochemistry and Molecular Biology
Degree Conferral Date: December 2014
Acknowledgments
I have to start my acknowledgments with great gratitude to my awesome mentor Dr.
Peggy Farnham. I would never have been able to finish this thesis without your
instruction. I want to thank you for being so patient during the past two years. I was
really impressed by how much you care about your students. You have done so much for
me, I am really thankful! If in the future I ever get a chance to mentor someone, I wish I
could be like you.
I want to thank my two lab mentors Heather Witt and Adam Blattler. Heather helped me
to start the lab and guided me through the ChIP-seq protocol. Adam taught me how to do
all the analyses which I knew nothing about before. I wish you guys all the best in the
future!
I want to thank all the Farnham lab members: Adam Blattler, Albert Do, Esther Tak,
Jordan Gaddis, Joy Hung, Lijing Yao, Matt Grimmer and Malaina Gaddis. The Lab was a
lot more fun with you guys all around.
I want to thank my committee members Dr. Gerry Coetzee and Dr. Zoltan Tokes for
kindly reviewing my thesis.
To the people in USC Epigenome Center Sequencing Core: Charles Nicolet, Selene
Tyndale and Helen Truong, thanks for helping me generate all the sequencing data.
I would like to thank my family members, especially my Dad and my aunt Ye. Thank you
so much for putting me through this program and letting me go across the whole Pacific
Ocean to pursue a higher degree in a foreign country. You never know how much it
means to me when you tell me to go for my dreams. I love you guys so so much! I also
want to thank uncle Zijing, aunt Yuehong and my cousin James. I love to spend
Christmas with you guys. It makes me feel that even though I am far away from home, at
least I am not alone here.
Last but not least, I want to thank Jieli Shen for being so supportive during the whole
time. Life is just a lot happier with you around! Thank you so much!
Table of Contents
Acknowledgments
List of Figures and Tables
Abstract
Introduction
  DNA methylation
  Repression mechanisms
  Study plan
Results
  ChIP-seq antibody specificity validation
  RNAPII recruitment is not enhanced by global promoter demethylation
  ZNF274 binding does not require DNA methylation
  DNA demethylation at MAX motifs allows more MAX binding
  CEBPB binds to methylated regions in vivo
Discussions
Materials and Methods
  Cell culture
  Western blottings and IP westerns
  qPCR
  Chromatin immunoprecipitation sequencing (ChIP-seq)
  Whole genome bisulfite sequencing (WGBS)
  ENCODE data access
  Data processing
References
List of Figures and Tables
Figure 1. Whole genome bisulfite sequencing comparative analysis of HCT116 and DKO1 cells (Blattler et al., submitted, 2014).
Figure 2. Site-specific transcription factors containing CpG dinucleotides within their recognition motif (Blattler et al., JBC, 2013).
Figure 3. ChIP-seq antibody validation.
Figure 4. Selection of high confidence peaks.
Figure 5. RNAPII data quality control.
Figure 6. ZNF274 data quality control.
Figure 7. MAX data quality control.
Figure 8. Overlapping peaks in HCT116 and DKO1 identifies binding sites that are responsive to changes in DNA methylation.
Figure 9. RNAPII gains few binding sites in DKO1 cells.
Figure 10. RNAPII ranked peak height in HCT116 and DKO1.
Figure 11. Tag density plots and % methylation of each RNAPII peak set.
Figure 12. Only a few DKO1 unique peaks are due to loss of DNA methylation.
Figure 13. Differential peak analysis reveals only 146 DKO1 unique peaks.
Figure 14. ZNF274 binding does not require DNA methylation.
Figure 15. ZNF274 ranked peaks in HCT116 and DKO1.
Figure 16. Tag density plots and % methylation of each ZNF274 peak set.
Figure 17. Examples of ZNF274 DKO1 unique peaks.
Figure 18. MAX gains many new binding sites in DKO1.
Figure 19. MAX ranked peaks in HCT116 and DKO1.
Figure 20. Tag density plots and % methylation of each MAX peak set.
Figure 21. Differential peak analysis reveals many DKO1 unique binding sites.
Figure 22. New binding sites contain the MAX motif.
Figure 23. MAX new binding sites are mostly at gene bodies and distal regions.
Figure 24. CEBPB has distinct binding patterns in HCT116 and DKO1 cells.
Figure 25. CEBPB ranked peaks in HCT116 and DKO1.
Figure 26. Analysis of DKO1 unique and HCT116 unique peaks identifies different motifs.
Figure 27. CEBPB binding sites are mostly at distal regulatory regions.
Table 1. Sequencing and peak metrics.
Table 2. RNA expression level of ATF4, CEBPB, ETS1 and ETS2 in HCT116 and DKO1 cells.
Table 3. Antibody information.
Table 4. Primer information.
Abstract
DNA methylation is an important epigenetic mark in the human genome. It is known as a
repressive mark that associates with stably silenced genes. Two hypotheses have been
proposed for how DNA methylation silences expression: a) the heterochromatin structure
that is associated with DNA methylation prevents gene expression, or b) DNA methylation
directly blocks transcription activator binding. In my study, chromatin
immunoprecipitation sequencing (ChIP-seq) and whole genome bisulfite sequencing (WGBS)
were used to test these two mechanisms in a global DNA demethylation model: HCT116
vs. DKO1 cells. ChIP-seq data for four proteins, RNA polymerase II (RNAPII), ZNF274,
MAX and CEBPB, were analyzed. My results suggest that DNA methylation has different
effects on different factors. For example, I found that a) promoter demethylation is not
sufficient to open heterochromatin and recruit RNAPII, b) ZNF274 binds independently of
DNA methylation, c) demethylation of MAX motifs leads to more binding, and d)
CEBPB binds to methylated regions and unmethylated regions, perhaps using different
protein partners. In conclusion, the function of DNA methylation on gene expression is
more complex than expected.
Introduction
DNA methylation
DNA methylation is an important epigenetic mark in the human genome. DNA
methylation usually occurs on the 5th carbon of cytosine and happens in the context of a
CpG dinucleotide. There are regions in the human genome that are more enriched for
CpG sites. These regions are called CpG islands (CGIs). CGIs usually are longer than
200 bp and have an observed-to-expected CpG ratio greater than 60% (Gardiner-Garden
& Frommer, 1987). Most CpG sites in the human genome are methylated except for
CpGs in CGI promoters which tend to be unmethylated and associated with actively
transcribed genes. Studies have shown that DNA methylated CGI promoters are stably
silenced (Mohn et al., 2008; Payer & Lee, 2008; Stein, Razin, & Cedar, 1982).
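The observed-to-expected CpG ratio mentioned above can be illustrated with a short calculation. The following is a minimal sketch, assuming the commonly used form of the ratio, (number of CpG x sequence length) / (number of C x number of G); the function names and the example sequences are hypothetical and were not part of the analyses in this thesis. Only the two criteria quoted in the text (length and ratio) are checked; the full published definition also includes a G+C content criterion, which is omitted here.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed-to-expected CpG ratio: (#CpG * N) / (#C * #G)."""
    seq = seq.upper()
    n = len(seq)
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    if c == 0 or g == 0:
        return 0.0  # no CpG is possible without both C and G
    return cpg * n / (c * g)


def looks_like_cgi(seq: str, min_len: int = 200, min_ratio: float = 0.6) -> bool:
    """Apply only the length and ratio criteria quoted in the text."""
    return len(seq) > min_len and cpg_obs_exp(seq) > min_ratio


# Hypothetical toy sequences, for illustration only.
cpg_rich = "CG" * 101   # 202 bp, every dinucleotide step alternates CG/GC
at_only = "AT" * 150    # 300 bp, contains no C or G
```

With these toy inputs, `looks_like_cgi(cpg_rich)` is true and `looks_like_cgi(at_only)` is false.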
DNA methylation is carried out by a group of enzymes called DNA methyltransferases
(DNMTs). There are three major DNMTs: DNMT1, DNMT3a and DNMT3b. DNMT3a
and DNMT3b have de novo methylation activity. On the other hand, DNMT1 is in charge
of the maintenance of DNA methylation on the daughter strand during the DNA
replication process. Although most normal cells express these DNMTs, there is an
engineered cell line called DKO1 (Rhee et al., 2002) which is derived from HCT116
human colon cancer cells but lacks most DNMT activity (due to a knockout of DNMT1
and DNMT3b). DKO1 has a 95% global DNA methylation loss (Figure 1), making
DKO1 an ideal model to study the consequences of DNA demethylation and further
elucidate the function of DNA methylation in transcription regulation.
Figure 1. Whole genome bisulfite sequencing comparative analysis of HCT116 and
DKO1 cells (Blattler et al., 2014). A. Light blue and pink tracks represent the
sequencing coverage along a segment of human chromosome 19 in HCT116 and DKO1
cells, respectively. Dark blue and red tracks illustrate the percent methyl-C/C in HCT116
and DKO1, respectively. Light-colored lines within the %methylation tracks represent the
average %methylation in the immediate region. CpG islands are shown above the RefSeq
genes as green bars. B. Box plot illustrating the percent methylation in promoters, gene
bodies, and random regions of the genome; the horizontal line in each box indicates the
median value. For HCT116 cells the median values are 30% (promoters), 84% (gene
bodies), and 84% (random regions) and for DKO1 cells the median values are <1%
(promoters), 13% (gene bodies), and 9% (random regions). C. The number of promoters
containing varying levels of methylation in HCT116 and DKO1 is shown; the minimum
and maximum DNA methylation values for the region between -100 and +700 relative to
the start site at the promoters in each group are indicated by the color key.
Repression mechanisms
DNA methylation at CGI promoters is known to be a repressive mark whereas the
function of DNA methylation at distal regulatory regions (including gene body and distal
regions) has not been as well studied.
There are two mechanisms that have been proposed for how DNA methylation silences
gene expression. The first is that the heterochromatin structure at methylated CGIs closes
these regions and, because of dense nucleosome packing, denies the general transcriptional
machinery access (for reviews, see Deaton & Bird, 2011; Klose & Bird,
2006). Thus, gene expression is shut down. However, whether DNA methylation initiates
the silencing event or merely locks in a previously established
heterochromatin structure remains unclear. The other proposed mechanism is that DNA
methylation at transcription factor binding motifs can directly block site specific factors
from recognizing the motifs. Therefore, downstream gene expression that is regulated by
these transcription factors will be repressed. The second mechanism requires the motif to
have the ability to be methylated. That is to say, there should be at least one CpG site
within the motif.
Study plan
The two proposed mechanisms are mostly based on the study of DNA methylation at CGI
promoters. Whether DNA methylation plays the same role at enhancers and other
functional elements has not been well-studied. Also, the possibility that DNA
demethylation can reverse the silencing process is an unaddressed question.
Using HCT116 and DKO1 cells as a DNA demethylation model and genome wide
experimental tools, I have tried to shed light on these two aspects in my studies. Whole
genome bisulfite sequencing (WGBS) was used to produce the methylome of both cell
lines and ChIP-seq was used to identify in vivo binding profiles of RNA polymerase II
and transcription factors. Site-specific transcription factors containing CpG within their
recognition motif (Figure 2) and other factors that have been suggested in the literature
to bind to methylated regions were tested.
Figure 2. Site-specific transcription factors containing CpG dinucleotides within
their recognition motif (Blattler & Farnham, 2013). An analysis of HOMER v4.3 and
FactorBook motif databases (78, 79) identified a small number of motifs having at least
one CpG dinucleotide at a critical position. These included motifs bound by members of
the ATF family (ATF1, ATF3), EGR1, members of the ETS family (ETS1, ELF1, ELK4,
and GABPA), SP1, the MYC family (MYC, MAX, NMYC, USF1, BHLHE40), ZBTB33,
CRE-binding factors, HIF1A, NRF1, and members of the E2F family. Shown are
examples of the motif for a member of each family that has a critical CpG in its
recognition sequence. ChIP-seq data for ATF3, EGR1, ELF1, SP1, USF1, and ZBTB33
produced by the ENCODE consortium (77) were compared with whole genome bisulfite
sequencing data (from Blattler et al., submitted, 2014); all data are from HCT116
colorectal cancer cells. On the right, the degree of DNA methylation of a region ±1500 bp
from the center of each ChIP-seq peak is plotted for those TFs. In all cases, DNA
methylation is absent from the center of the TF-binding sites. To determine the in vivo
relationship between TF binding and DNA methylation, experiments such as this must be
performed comparing ChIP-seq data with whole genome DNA methylation data in
matched cell types.
Results
ChIP-seq antibody specificity validation
In order to produce a high quality ChIP-seq experiment, it is critical to have an antibody
that specifically recognizes the target protein. I tested 9 antibodies in total, including
antibodies for RNAPII, ZNF274, CEBPB, MAX, MYC, E2F4, SP1 and KLF4. Four of
these worked in ChIP experiments (RNAPII, ZNF274, MAX, and CEBPB). In order to
further confirm their specificity, I performed IP western or western blotting (Figure 3).
RNAPII, ZNF274 and CEBPB antibodies showed high specificity in the blotting images;
however, the MAX protein was harder to detect (data not shown), perhaps due to its
relatively small size (19 kDa).
Figure 3. ChIP-seq antibody validation. The RNAPII panel is an IP western image.
INP stands for input and IP stands for immunoprecipitated sample. IgG stands for IgG
control. The ZNF274 and CEBPB panels are western blotting images. The cells used in
each experiment are also shown.
Defining ChIP-seq peaks for RNAPII, ZNF274, MAX, and CEBPB
Most ChIP-seq experiments were performed on two independent biological replicates.
Peaks were called for each replicate using either Sole-Search (Blahnik et al., 2010) for
RNAPII and ZNF274 or HOMER findPeaks (Heinz et al., 2011) for MAX and CEBPB.
Reproducible peaks were identified by overlapping the 2 replicates. Those peaks that
were present in both replicates were selected as high confidence (HC) peaks (Figure 4).
Figure 4. Selection of high confidence peaks. The 2 blue circles in the Venn diagram
represent 2 replicates of a ChIP-seq experiment. The overlap area represents peaks that
appear in both replicates. These peaks were selected as high confidence (HC) peaks.
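The replicate overlap described above can be sketched as a simple interval comparison. This is an illustrative sketch only, not the procedure actually run for this thesis: it assumes that any overlap of at least one base counts, that coordinates are half-open, and that the replicate-1 coordinates are reported for each HC peak. All names and example peaks are hypothetical.

```python
from typing import List, Tuple

# A peak is (chromosome, start, end) with half-open coordinates [start, end).
Peak = Tuple[str, int, int]


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def high_confidence_peaks(rep1: List[Peak], rep2: List[Peak]) -> List[Peak]:
    """Keep replicate-1 peaks that overlap any replicate-2 peak (assumed rule)."""
    return [p for p in rep1 if any(overlaps(p, q) for q in rep2)]


# Hypothetical toy peaks, for illustration only.
rep1 = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 100, 200)]
rep2 = [("chr1", 150, 250), ("chr2", 300, 400)]
hc = high_confidence_peaks(rep1, rep2)
```

The quadratic comparison is adequate for a toy example; for peak sets of the size in Table 1, a sorted sweep or an interval tree would be the more practical choice.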
To eliminate false positive peaks from downstream analyses, different additional data
quality control strategies were used on different factors according to the property of the
factor and the data sets.
For example, RNAPII only binds to promoters and peaks located outside of promoters are
very likely due to DNA looping. Therefore, a 4 kb window (+/- 2kb) around the
transcription start site (TSS) of all human genes was used to select for promoter proximal
RNAPII peaks. Only HC peaks that were located within this window were used in
downstream analyses (Figure 5).
Figure 5. RNAPII data quality control. RNAPII HC peaks located in promoter
proximal regions (+/-2kb from TSS) were used for analysis. The two tracks represent 2
replicates of the RNAPII ChIP-seq data. The black box below represents the 4 kb
window that was used for filtering.
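The promoter-proximal filter can be sketched as follows. This is a minimal illustration under stated assumptions: a peak is tested by its center position (the text does not specify whether the center or any part of the peak must fall in the window), and the TSS table is a hypothetical dictionary of positions per chromosome.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end)


def is_promoter_proximal(peak: Peak, tss: Dict[str, List[int]], flank: int = 2000) -> bool:
    """True when the peak center lies within +/- flank bp of any TSS (assumed rule)."""
    chrom, start, end = peak
    center = (start + end) // 2
    return any(abs(center - t) <= flank for t in tss.get(chrom, []))


def promoter_peaks(peaks: List[Peak], tss: Dict[str, List[int]], flank: int = 2000) -> List[Peak]:
    """Filter a peak list down to promoter-proximal peaks."""
    return [p for p in peaks if is_promoter_proximal(p, tss, flank)]


# Hypothetical TSS positions and peaks, for illustration only.
tss_positions = {"chr1": [10_000], "chr2": [50_000]}
peaks = [("chr1", 9_000, 9_400), ("chr1", 20_000, 20_400), ("chr2", 51_900, 52_100)]
kept = promoter_peaks(peaks, tss_positions)
```

Here the second peak is dropped because its center is more than 2 kb from the only TSS on chr1.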
ZNF274 has fewer binding sites as compared to other transcription factors (Frietze,
O’Geen, Blahnik, Jin, & Farnham, 2010). As a consequence, peak-calling programs tend
to overcall peaks and many of the very small peaks are simply noise (Figure 6).
Therefore, I used visual inspection to remove the most obvious false positive peaks.
These peaks were mostly in highly repetitive regions around centromeres. However,
since I only have one replicate of ZNF274-DKO1 ChIP-seq, this data set might still have
some false positives that remain even after this quality control step.
Figure 6. ZNF274 data quality control. Dark blue tracks are ZNF274-DKO1 ChIP-seq
visualization tracks and pink tracks represent peaks that were called by the peak calling
program. The upper panel shows examples of good ZNF274 peaks whereas the lower
panel shows false positive peaks.
I generated only one replicate each of MAX-DKO1 and MAX-HCT116 ChIP-seq data. To
obtain a reliable peak set for MAX-HCT116, I compared my data set with ENCODE
MAX-HCT116 data. I downloaded two MAX-HCT116 ChIP-seq raw data sets from
ENCODE and called peaks on all 3 datasets (mine and the two from ENCODE) with the
HOMER findPeaks program. Using a three-way overlap analysis, peaks that
were present in all three datasets were called HC peaks (Figure 7 A). For the single
replicate of MAX-DKO1 ChIP-seq, peaks were ranked according to their score; a cut-off
at rank 21,641 was selected to maintain a minimum peak score of 24 while obtaining a peak
number similar to that of the MAX-HCT116 HC peaks (Figure 7 B).
Figure 7. MAX data quality control. A. The three circles represent 3 replicates of
MAX-HCT116; peaks present in all the replicates are selected as high confidence peaks.
B. In the single MAX-DKO1 data set, ranked peaks were truncated to 21,641.
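The rank-based truncation used for the single MAX-DKO1 data set can be sketched as below. This is an illustration under assumptions, not the exact procedure: peaks are represented as hypothetical (identifier, score) pairs, and the cut-off rank is taken to be the number of peaks whose score is at least the chosen minimum.

```python
from typing import List, Tuple

ScoredPeak = Tuple[str, float]  # (peak identifier, peak score)


def truncate_by_score(peaks: List[ScoredPeak], min_score: float) -> List[ScoredPeak]:
    """Rank peaks by descending score and keep those at or above min_score."""
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    cutoff_rank = sum(1 for _, score in ranked if score >= min_score)
    return ranked[:cutoff_rank]


# Hypothetical toy scores, for illustration only.
toy_peaks = [("a", 10.0), ("b", 30.0), ("c", 24.0), ("d", 23.9)]
kept_peaks = truncate_by_score(toy_peaks, min_score=24)
```

In practice the resulting peak number would then be compared with the size of the HC peak set from the other cell line before fixing the cut-off.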
The CEBPB-HCT116 and CEBPB-DKO1 data sets did not require additional quality
control steps because I produced two replicates of each dataset. Therefore, I simply
selected as HC peaks those peaks that were present in both replicates of the HCT116
and DKO1 datasets.
                    RNAPII-HCT116  RNAPII-DKO  ZNF274-HCT116  ZNF274-DKO  CEBPB-HCT116  CEBPB-DKO   MAX-HCT116  MAX-DKO
Unique reads, Rep1  41,786,921     44,000,451  21,119,419     31,827,110  18,879,896    39,419,456  30,790,105  38,175,188
Peak number, Rep1   25,068         13,420      652            2,176       54,260        6,759       24,967      33,529
Unique reads, Rep2  14,961,748     27,558,447  32,176,543     NA          37,589,886    38,866,741  23,302,004  NA
Peak number, Rep2   18,194         7,001       1,514          NA          26,641        13,978      56,811      NA
Unique reads, Rep3  NA             NA          NA             NA          NA            NA          25,996,337  NA
Peak number, Rep3   NA             NA          NA             NA          NA            NA          62,180      NA
HC peaks            16,806         6,853       628            NA          15,048        5,310       18,835      NA
HC peaks after QC   12,004         5,795       385            683         15,048        5,310       18,835      21,641
Table 1. Sequencing and peak metrics.
After all data sets were properly processed (Table 1), the next step was to overlap
HCT116 peaks with DKO1 peaks for each factor. That produced 3 different peak sets for
each factor (Figure 8). Peaks that exist in both cell lines were defined as common peaks.
They represent those binding sites that were in HCT116 cells and were retained after DNA
demethylation in DKO1. Peaks that were found only in HCT116 were defined as HCT116
unique peaks. These peaks represent binding sites that are no longer bound by that factor
after demethylation in DKO1. Peaks that were found only in DKO1 were defined as
DKO1 unique peaks. These peaks represent new binding sites that become occupied by
that factor after demethylation in DKO1. Downstream analyses were conducted using
these three different peak sets for each factor.
Figure 8. Overlapping peaks in HCT116 and DKO1 identifies binding sites that are
responsive to changes in DNA methylation. The blue and red circles represent HCT116
and DKO1 peaks respectively. Overlapping (common) peaks and unique peaks are defined in the text.
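The three peak sets defined above can be expressed as a partition of the two HC peak lists. The sketch below is illustrative only: it assumes that any overlap of at least one base makes a peak "common" and that common peaks are reported with their HCT116 coordinates. The function names and toy peaks are hypothetical.

```python
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open


def overlaps(a: Peak, b: Peak) -> bool:
    """True when two peaks share at least one base on the same chromosome."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def partition_peaks(hct116: List[Peak], dko1: List[Peak]) -> Dict[str, List[Peak]]:
    """Split peaks into common, HCT116 unique and DKO1 unique sets."""
    common, hct116_unique = [], []
    for p in hct116:
        # A HCT116 peak is common if any DKO1 peak overlaps it.
        (common if any(overlaps(p, q) for q in dko1) else hct116_unique).append(p)
    dko1_unique = [q for q in dko1 if not any(overlaps(q, p) for p in hct116)]
    return {"common": common, "HCT116_unique": hct116_unique, "DKO1_unique": dko1_unique}


# Hypothetical toy peaks, for illustration only.
hct116_peaks = [("chr1", 100, 200), ("chr1", 500, 600)]
dko1_peaks = [("chr1", 150, 250), ("chr3", 10, 50)]
sets = partition_peaks(hct116_peaks, dko1_peaks)
```

Because one peak in one cell line can overlap several peaks in the other, the counts of common peaks measured from each side need not agree exactly.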
RNAPII recruitment is not enhanced by global promoter demethylation
RNAPII is the polymerase responsible for transcribing all protein-coding
genes. It binds to unmethylated promoters in euchromatin and transcribes downstream
genes into RNA. DNA methylation at a promoter is thought to silence gene expression by
forming heterochromatin, blocking RNAPII from binding at the promoter and repressing
gene expression. In my studies, I wanted to test the hypothesis that demethylation of
promoters in DKO1 cells would open heterochromatin and allow RNAPII to bind to
previously silenced promoters.
Around 30,000 promoters that used to be highly methylated in HCT116 cells are
demethylated in DKO1 cells (Figure 1 C). To test my hypothesis, I performed ChIP-seq
for RNAPII in DKO1 cells. If my hypothesis were correct, I would expect to identify around
30,000 new RNAPII binding sites (DKO1 unique peaks). However, an
overlap analysis of RNAPII peaks identified only 420 DKO1 unique peaks; 6,569
HCT116 unique peaks and 5,710 peaks common to HCT116 and DKO1 were also
identified (Figure 9). In other words, more than 6,000 RNAPII binding sites were lost after
global demethylation and only 420 were gained.
Figure 9. RNAPII gains few binding sites in DKO1 cells. Blue and red circles
represent RNAPII HCT116 and DKO1 peaks respectively.
I next ranked the RNAPII peaks according to their peak height; an inflection plot of peak
height vs. peak rank is shown in Figure 10. This analysis shows that not only do DKO1
cells have fewer peaks (5,759 peaks) than HCT116 (12,004 peaks), but DKO1 peaks are
also smaller than HCT116 peaks.
Figure 10. RNAPII ranked peak height in HCT116 and DKO1. Peak height is shown
on the Y axis and peak rank is shown on the X axis. The blue line represents DKO1
peaks and the red line represents HCT116 peaks.
I next generated tag density plots for the HCT116 unique, common and DKO1 unique peak
sets. All peaks were centered on the peak center and the average tag density across all
peaks was calculated at each position. The average % DNA methylation was also plotted
for each peak set (Figure 11). HCT116 unique peaks are well shaped and there are few
tags in these peaks in DKO1 cells. This suggests that these HCT116 RNAPII binding
sites are completely lost in DKO1 cells. Also, these sites are unmethylated in both cell
lines. Common peaks are defined as those sites that are present in HCT116 and are
retained in DKO1 cells after removal of DNA methylation. Common peaks are smaller
in DKO1 cells than in HCT116 cells, indicating that RNAPII binding was
weaker at the common sites after global demethylation. As for the DKO1 unique peaks,
the average tag density at these peaks in DKO1 cells was actually lower than in HCT116.
This suggests that many of the DKO1 unique sites were in fact also bound
by RNAPII in HCT116 cells (i.e. are false negatives in HCT116). Perhaps these peaks
were not called as peaks in HCT116 due to an issue with the peak-calling program or
perhaps they were a peak in one HCT116 replicate but not the other HCT116 replicate
(and thus did not appear in the HC peak set). I did note that, on average, DKO1 unique
peaks are more than 40% methylated in HCT116, suggesting that at least some newly
unmethylated promoters were bound by RNAPII in DKO1 but not HCT116. However, I
was not able to see them in the ChIP-seq tag density plot because they were
overshadowed by false negative RNAPII peaks when I plotted the average tag density for
all DKO1 unique peaks.
Figure 11. Tag density plots and % methylation of each RNAPII peak set. The red
line represents HCT116 cells and the blue line represents DKO1 cells. The X axis in each
plot is the distance from the peak centers. The upper three panels are RNAPII ChIP-seq
tag density plots for each peak set and the bottom three figures are the % methylation of
the same peak sets.
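The averaging behind a tag density plot can be sketched in a few lines. This is a simplified illustration, not the tool used to produce Figure 11: it assumes a single chromosome, one-base resolution, and tags represented by a single hypothetical position each.

```python
from collections import Counter
from typing import List


def average_tag_density(centers: List[int], tags: List[int], flank: int) -> List[float]:
    """Average tag count at each offset from -flank to +flank around peak centers."""
    if not centers:
        return [0.0] * (2 * flank + 1)
    counts = Counter(tags)  # tag position -> number of tags; missing positions count as 0
    profile = []
    for offset in range(-flank, flank + 1):
        total = sum(counts[c + offset] for c in centers)
        profile.append(total / len(centers))
    return profile


# Hypothetical toy data: two peak centers and five tag positions.
toy_centers = [10, 20]
toy_tags = [10, 10, 11, 20, 19]
profile = average_tag_density(toy_centers, toy_tags, flank=1)
```

Because the profile is a mean over all peaks in a set, a minority of strongly bound sites can be hidden by a majority of weak ones, which is the effect discussed above for the DKO1 unique peaks.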
To examine the possibility that false negatives were influencing my results, tags from
individual peaks were plotted using a heat map for all DKO1 unique peaks (Figure 12).
Clustering results reveal that only those peaks in the red bracket are real DKO1 unique
peaks, having higher tag densities in DKO1 than in HCT116. The methylation heat map
shows that this small set of peaks is located in regions that were highly methylated in
HCT116 cells.
Figure 12. Only a few DKO1 unique peaks are due to loss of DNA methylation.
Unsupervised clustering of the DKO1 unique RNAPII peaks was performed. Each row in
the heat map represents one DKO1 unique peak. The left 2 columns are RNAPII tags for
each peak in HCT116 and DKO1. The darker the blue, the lower the tag count, whereas
green represents higher tag counts. The two columns on the right represent percent
methylation in each peak. In this case, dark red indicates high DNA methylation and dark
blue indicates low DNA methylation.
To further confirm this result, the HOMER program getDifferentialPeaks
was used to quantify differential peaks in HCT116 and DKO1 cells. By using a fold
change cut-off of 1, I found 11,919 peaks that were lower in DKO1 cells (Down in DKO1
peaks) and 146 peaks that were higher in DKO1 cells (Up in DKO1 peaks) (Figure 13).
ChIP-seq tag density plots confirmed that each group of peaks is well shaped. A DNA
methylation heat map indicated that for the Down in DKO1 peaks, the binding sites are
unmethylated in both cell lines. However, for the Up in DKO1 peaks, the binding sites are
heavily methylated in HCT116. Thus, this analysis eliminated the false negative
HCT116 peaks and revealed that there are only 146 real new RNAPII sites that are
present in demethylated promoters.
Figure 13. Differential peak analysis reveals only 146 DKO1 unique peaks. The red
line represents HCT116 cells and the blue line represents DKO1 cells in the tag density
plots. The X axis in each plot and in the heat maps represents the distance from the peak
centers. For the heat map, dark blue indicates 0% methylation and dark red indicates 100%
methylation.
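The idea of classifying peaks by fold change in tag count can be sketched as follows. This is not HOMER's implementation and omits any significance test; it is a simplified illustration that assumes tag counts are normalized to tags per million and that a hypothetical pseudocount is added to avoid division by zero. All names and numbers are illustrative.

```python
def classify_peak(hct116_tags: float, dko1_tags: float,
                  hct116_total: float, dko1_total: float,
                  fold: float = 1.0, pseudo: float = 1.0) -> str:
    """Label a peak by comparing library-size-normalized tag counts."""
    # Normalize to tags per million so that sequencing depth does not drive the call.
    hct116 = (hct116_tags + pseudo) / hct116_total * 1e6
    dko1 = (dko1_tags + pseudo) / dko1_total * 1e6
    if dko1 > hct116 * fold:
        return "up_in_DKO1"
    if hct116 > dko1 * fold:
        return "down_in_DKO1"
    return "unchanged"


# Hypothetical tag counts in two libraries of one million tags each.
label_down = classify_peak(100, 10, 1e6, 1e6, fold=2.0)
label_up = classify_peak(10, 100, 1e6, 1e6, fold=2.0)
label_same = classify_peak(50, 50, 1e6, 1e6, fold=2.0)
```

Quantifying tags directly at each site, rather than relying on whether a peak was called in each cell line, is what removes the false negative problem described above.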
To summarize, even though more than 30,000 promoters were demethylated in DKO1
cells, only a small subset of these (146) were bound by RNAPII. For the rest of the
promoters, it appears that demethylation is not sufficient to recruit RNAPII.
ZNF274 binding does not require DNA methylation
ZNF274 is a zinc finger protein that targets the 3’ end of a subset of ZNF genes (Frietze
et al., 2010). It recruits KAP1 and the histone methyltransferase SETDB1 to the 3’ ends of
these genes, leading to deposition of the H3K9me3 repressive mark at those regions. Because ZNF274
binds in transcribed regions, its binding sites are in regions of high DNA methylation.
However, whether robust binding of ZNF274 requires DNA methylation is
unknown.
I overlapped ZNF274 ChIP-seq peaks from HCT116 and DKO1 cells and found that the
majority of HCT116 peaks (338 out of 385) are also in the DKO1 peak set; only 61
HCT116 peaks are unique. This suggested that demethylation does not prevent ZNF274
from binding to most of the sites to which it was bound in HCT116 cells (Figure 14). I
also identified 364 DKO1 unique ZNF274 peaks.
Figure 14. ZNF274 binding does not require DNA methylation. The blue and red
circles represent ZNF274 HCT116 and DKO1 peaks respectively.
Given that I have only 1 DKO1 dataset available for the analysis, the DKO1 unique peaks
might be false positive peaks. In an attempt to address this possibility, I plotted ZNF274
peak height vs. peak rank (Figure 15). I found that the number of big peaks is similar in
both DKO1 and HCT116 cells. Moreover, DKO1 has many small peaks with a peak height
lower than 25 tags. This plot supports the hypothesis that the DKO1 unique peaks are
small false positive peaks.
Figure 15. ZNF274 ranked peaks in HCT116 and DKO1. Peak height is shown on the
Y axis and peak rank is shown on the X axis. The blue line represents DKO1 HC peaks
and the red line represents HCT116 HC peaks.
I plotted ChIP-seq tag density plots and average % DNA methylation for ZNF274
HCT116 unique, common and DKO1 unique peaks sets (Figure 16). Because there are
only 61 HCT116 unique peaks, the plots are very jagged. Analysis of the common peaks
shows that these sites are robust in both cell lines. The % DNA methylation plot shows
that the DNA methylation level is distinctly different between the two cell lines. The
regions corresponding to the common sites are more than 80%
methylated in HCT116 cells but only 30% methylated in DKO1 cells. This suggests that
reduction of DNA methylation at these sites does not prevent ZNF274 binding. DKO1
unique peaks, on the other hand, have a % DNA methylation pattern similar to that of common
peaks in HCT116 and DKO1. However, the ChIP-seq tag density plot shows that DKO1
unique peaks have a very small average tag count compared to common peaks. This
further suggests that the DKO1 unique peaks are not real binding sites but instead may be
small false positive peaks.
Figure 16. Tag density plots and % methylation of each ZNF274 peak set. The red
lines represent HCT116 cells and blue lines represent DKO1 cells. The X axis in each
plot is the distance from peak centers. The upper three panels are ZNF274 ChIP-seq tag
density plots for each peak set and the bottom three panels are the % methylation of the
same peak sets.
As a final analysis of the DKO1 unique peaks, Figure 17 shows browser snapshots of
four examples of DKO1 unique peaks. None of the peaks shown are strongly bound by
ZNF274. I have therefore concluded that for ZNF274, few sites are lost in the
demethylated cells and very few, if any, robust new sites appear. Thus, even though
ZNF274 binding sites are in highly methylated regions (because they are in gene bodies),
ZNF274 binding itself does not require DNA methylation.
Figure 17. Examples of ZNF274 DKO1 unique peaks. The dark blue tracks are
ZNF274 ChIP-seq visualization tracks. The pink track represents DKO1 unique peaks that
were called by the program. Four examples are shown, two on chromosome 1 and two
on chromosome 19.
DNA demethylation at MAX motifs allows more MAX binding
MAX, MYC association factor X, is a transcription factor that forms homodimers and
heterodimers with other family members, such as MAD, MXI1, and the oncogene MYC.
These protein complexes compete for a common motif 5'-CACGTG-3'. The CG
dinucleotide at the center of the motif has the potential to be methylated. ENCODE data
shows that MAX tends to bind unmethylated sites in HCT116 cells (Figure 2).
The overlap analysis of MAX HCT116 peaks and DKO1 peaks identified 10,690
common peaks that remained bound after demethylation, 8,114 HCT116 unique peaks
representing binding sites no longer bound in DKO1, and 10,663 DKO1 unique peaks
representing new binding sites in DKO1 cells (Figure 18). The number of DKO1 unique
peaks for MAX is large compared to the RNAPII and ZNF274 analyses.
Figure 18. MAX gains many new binding sites in DKO1. The blue and red circles
represent MAX HCT116 and DKO1 peaks, respectively.
An inflection graph for all four replicates of the MAX data sets shows that the peak score
distribution is similar in each data set, regardless of the cell line (Figure 19).
Figure 19. MAX ranked peaks in HCT116 and DKO1. Peak height is shown on the Y
axis and peak rank is shown on the X axis. The blue, red and green lines represent 3
replicates of HCT116 peaks and the orange line represents DKO1 peaks.
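The ranked peak (inflection) plots amount to sorting peak heights from largest to smallest and plotting height against rank. A minimal Python sketch of that idea follows, assuming peak heights are available as a plain list of tag counts; the function names and the example heights are hypothetical and are not taken from the thesis data.

```python
def rank_peaks(heights):
    """Return (rank, height) pairs with the tallest peak at rank 1."""
    ordered = sorted(heights, reverse=True)
    return list(enumerate(ordered, start=1))


def fraction_below(heights, cutoff):
    """Fraction of peaks whose height is below the given tag cutoff."""
    if not heights:
        return 0.0
    return sum(1 for h in heights if h < cutoff) / len(heights)


# Hypothetical peak heights (tags per peak), for illustration only.
example_heights = [120, 15, 48, 22, 300, 18]
ranked = rank_peaks(example_heights)
small_fraction = fraction_below(example_heights, 25)
```

Plotting the second element of each pair against the first gives the kind of curve shown in the ranked peak figures; a data set dominated by small peaks shows a large value from `fraction_below`.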
ChIP-seq tag density plots of the three groups of peaks show that the HCT116 unique
peaks are real peaks (Figure 20). Common peaks are well bound by MAX in both cell
lines, with relatively fewer tags in DKO1 cells. DKO1 unique peaks appear to have an
average tag count similar to that of the common peaks in DKO1 cells. However, these
DKO1 unique peaks also have tags in HCT116 cells, and the average tag counts in both
cell lines are almost the same. This indicates that even though false positive DKO1
peaks are less likely to be an issue in this data set, false negative HCT116 peaks could
exist. DNA methylation heat
maps show that HCT116 unique peaks and common peaks are at regions that are
unmethylated in both cell lines (Figure 20). On the other hand, DKO1 unique peaks are
at regions that are highly methylated in HCT116 cells but unmethylated in DKO1 cells.
The DKO1 unique DNA methylation heat map in HCT116 shows that peak centers are
slightly less methylated in HCT116 compared to the edge of the peaks. However, a
differential peak analysis is still needed to eliminate the false negative HCT116 peaks
from DKO1 unique peaks and confirm the DNA methylation analysis result.
Figure 20. Tag density plots and % methylation of each MAX peak set. The red lines
represent HCT116 cells and blue lines represent DKO1 cells in the tag density plot. The
X axis in each plot and in the heat maps represents the distance from the peak centers.
For the heat map, dark blue indicates 0% methylation and dark red indicates 100%
methylation.
Differential peak analysis reveals 17,425 Down in DKO1 peaks and 6,418 Up in DKO1
peaks (Figure 21). The ChIP-seq tag density plots show that Up in DKO1 peaks have a
higher average tag count in DKO1 cells compared to HCT116 cells and Down in DKO1
peaks have a higher average tag count in HCT116 cells. Differential peak analysis shows
that over 6,000 of the 10,663 DKO1 unique peaks are real MAX binding sites that
appear after removal of DNA methylation. The remaining roughly 4,000 DKO1 unique
peaks are false negative HCT116 peaks. The DNA methylation heat map for each peak set
confirmed the result from Figure 20 that the new binding sites in DKO1 are at regions
that are demethylated in DKO1 (after removing the false negative HCT116 peaks, even
the peak centers are heavily methylated in HCT116) whereas other binding sites
(HCT116 unique and common peaks) are unmethylated in both cell lines.
Figure 21. Differential peak analysis reveals many DKO1 unique binding sites.
The red lines represent HCT116 cells and blue lines represent DKO1 cells in the tag
density plot. The X axis in each plot and in the heat maps represents the distance from the
peak centers. For the heat map, dark blue indicates 0% methylation and dark red indicates
100% methylation.
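The thesis does not state which tool or thresholds were used for the differential peak analysis, so the following Python sketch only illustrates the general idea of calling a peak "Up in DKO1" or "Down in DKO1" from tag counts. The 4-fold cutoff, the pseudocount, the scaling to 10 million tags, and all function names are assumptions made for illustration.

```python
def normalize(count, total_tags, scale=10_000_000):
    """Scale a raw tag count to a common sequencing depth (assumed: 10 million tags)."""
    return count * scale / total_tags


def classify_peak(hct116_tags, dko1_tags, fold_cutoff=4.0, pseudocount=1.0):
    """Label a peak by the DKO1/HCT116 ratio of depth-normalized tag counts.

    fold_cutoff and pseudocount are hypothetical settings, not values from the thesis.
    """
    ratio = (dko1_tags + pseudocount) / (hct116_tags + pseudocount)
    if ratio >= fold_cutoff:
        return "Up in DKO1"
    if ratio <= 1.0 / fold_cutoff:
        return "Down in DKO1"
    return "Unchanged"


# Hypothetical normalized tag counts for three peaks.
labels = [classify_peak(5, 60), classify_peak(80, 10), classify_peak(30, 35)]
```

Under this kind of scheme, a DKO1 unique peak that carries nearly the same number of tags in both cell lines would be labeled "Unchanged", which is how a false negative HCT116 peak could be separated from a genuinely new binding site.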
Motif density plots of the differential peaks show that both the Down in DKO1 peaks
and the Up in DKO1 peaks are enriched for the MAX motif 5’-CACGTG-3’ (Figure 22).
For the Up in DKO1 peaks, it is possible that DNA demethylation exposed previously
methylated motifs for MAX to recognize.
Figure 22. New binding sites contain the MAX motif. The MAX motif used is shown
by the motif logo. The green line represents MAX motif density at each position in the two
peak sets: Down in DKO1 and Up in DKO1.
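A motif density plot of this kind can be approximated by counting exact matches to 5'-CACGTG-3' in sequence windows centered on each peak and binning the matches by distance from the peak center. The Python sketch below assumes the window sequences have already been extracted; the 30 bp bin size mirrors the "-hist 30" setting described in the Methods, while the function names and the toy sequences are hypothetical. Because CACGTG is its own reverse complement, scanning one strand is enough for exact matches.

```python
from collections import Counter

MOTIF = "CACGTG"  # MAX motif; palindromic, so one strand suffices


def motif_offsets(sequence, motif=MOTIF):
    """Offsets of each motif midpoint relative to the center of the sequence."""
    seq = sequence.upper()
    center = len(seq) / 2
    offsets = []
    start = seq.find(motif)
    while start != -1:
        offsets.append(start + len(motif) / 2 - center)
        start = seq.find(motif, start + 1)
    return offsets


def motif_density(sequences, bin_size=30):
    """Average number of motifs per peak in each distance bin (keyed by bin start)."""
    if not sequences:
        return {}
    counts = Counter()
    for seq in sequences:
        for offset in motif_offsets(seq):
            counts[int(offset // bin_size) * bin_size] += 1
    return {b: c / len(sequences) for b, c in sorted(counts.items())}


# Two toy 100 bp windows: one motif at the center, one at the left edge.
example_windows = ["A" * 47 + "CACGTG" + "A" * 47, "CACGTG" + "A" * 94]
density = motif_density(example_windows)
```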
Location analysis of Up in DKO1 peaks shows that most of them are at distal regulatory
regions or in gene bodies (Figure 23). There are two possibilities for this observation:
either DKO1 demethylated promoters do not contain many MAX motifs or distal
regulatory regions are more sensitive to DNA demethylation.
Figure 23. New MAX binding sites are mostly at gene bodies and distal regions.
The pie chart shows the location analysis of MAX DKO1 unique peaks. Yellow
represents gene body regions, red represents promoter regions and blue represents distal
regions.
CEBPB binds to methylated regions in vivo
CEBPB is a B-ZIP protein that binds to the DNA major groove as either a homodimer or
a heterodimer with other factors. It regulates genes involved in cell proliferation,
differentiation, inflammation and metabolism. In vitro studies show that CEBPB
homodimers recognize a methylated motif 5’-TTGCGCAA-3’, whereas CEBPB|ATF4
heterodimers recognize a methylated 5’-CGATGCAA-3’ motif (Mann et al., 2013). I
performed ChIP-seq, an in vivo assay, in HCT116 and DKO1 cells to determine if the
CEBPB binding pattern changes upon DNA demethylation.
15,048 HC peaks were called for CEBPB in HCT116 cells and 5,310 HC peaks were
called in DKO1 cells. I overlapped the HCT116 peaks with the DKO1 peaks. The Venn
diagram shows that the majority of HCT116 peaks (12,400 out of 15,048 peaks) are no
longer bound by CEBPB after global demethylation; only about 20% of the peaks (2,642
peaks) are retained (Figure 24). Meanwhile, more than 2,000 new CEBPB binding sites
are created in DKO1 cells.
Figure 24. CEBPB has distinct binding patterns in HCT116 and DKO1 cells.
The blue and red circles represent CEBPB HCT116 and DKO1 peaks respectively.
An inflection plot of peak scores against peak score rank for all four CEBPB data sets
shows that peaks in HCT116 cells are slightly larger than in DKO1 cells (Figure 25).
Figure 25. CEBPB ranked peaks in HCT116 and DKO1. Peak height is shown on the
Y axis and peak rank is shown on the X axis. The blue and red lines represent 2 replicates
of HCT116 peaks and the green and purple lines represent 2 replicates of DKO1 peaks.
To measure the DNA methylation level at the CEBPB binding sites, ChIP-seq tag density
plots and DNA methylation heat maps were generated for each peak set. DKO1 unique
peaks have an average tag count that is twice as high in DKO1 cells as in HCT116 cells,
suggesting that there is no false positive or false negative issue for this peak set
(Figure 26 A).
The heat maps show that all the peak sets are at regions that are heavily methylated
(>80%) in HCT116 and demethylated in DKO1 (Figure 26 B). The peak centers are
more methylated than the flanking regions, suggesting that these binding sites might
contain motifs that have a methylated CpG at the peak center. For the HCT116 unique
peaks, this result suggests that 80% of CEBPB binding in HCT116 cells is DNA
methylation dependent. However, for the DKO1 unique peaks, DNA demethylation
actually leads to new CEBPB binding, suggesting that CEBPB is also capable of
recognizing unmethylated binding sites. This indicates that there might be two different
binding mechanisms in HCT116 and DKO1 cells. The common peaks could be the sites
that can be bound under both mechanisms.
I performed a de novo motif search for HCT116 unique peaks and DKO1 unique peaks to
determine if CEBPB recognizes a different motif in DKO1 cells as compared to HCT116
cells (Figure 26 C). However, the predominant motif for both peak sets is the typical
CEBPB binding motif 5’-TTGCGCAA-3’. Interestingly, the second most common motif
is different in the two cell types. In HCT116 unique peaks, the second motif is an ATF4
motif. This result supports the in vitro finding that CEBPB can form a heterodimer with
ATF4 and bind to methylated motifs (Mann et al., 2013). In DKO1 unique peaks, the
second motif is an ETS|IRF motif, suggesting that CEBPB can dimerize
with factors from a different protein family and recognize unmethylated motifs.
Figure 26. Analysis of DKO1 unique and HCT116 unique peaks identifies different
motifs. A. The red lines represent HCT116 cells and the blue lines represent DKO1 cells
in the tag density plots. The X axis in each plot and in the heat maps indicates the
distance from the peak centers. B. In the heat maps, dark blue indicates 0% methylation
and dark red indicates 100% methylation. C. The two most predominant motifs that come
out from de novo motif searches in HCT116 unique and DKO1 unique peaks.
Location analysis of each peak set shows that CEBPB is mainly binding at distal
regulatory regions (Figure 27).
Figure 27. CEBPB binding sites are mostly at distal regulatory regions. The pie
charts show the location analysis of CEBPB HCT116 unique, common, and DKO1
unique peaks. Yellow represents gene body regions, red represents promoter regions, and
blue represents distal regions.
Discussion
My analyses of ChIP-seq data for RNAPII and three transcription factors in HCT116 and
DKO1 cells revealed that DNA demethylation has a different impact on different factors.
Even though DNA methylation at CGI promoters is strongly associated with inactive
genes and heterochromatin, I found that promoter demethylation is not sufficient (and
perhaps not necessary) to open heterochromatin and recruit RNAPII. My studies show
that global demethylation leads to few new RNAPII binding sites. However, these few
binding sites do occur at newly unmethylated promoters, suggesting that DNA
methylation does play a role in preventing RNAPII binding at a small subset of promoters.
Beyond that subset, nucleosome positioning might play a more important role in stably
silencing genes. My studies suggest that DNA methylation is more of a lock than a
cause of heterochromatin. Also, the loss of RNAPII binding sites in DKO1 cells needs
further investigation. Whether the gene expression level at those promoters goes down or
whether only the poised RNAPII was lost in DKO1 cells remains unknown. To address
these questions, genome wide nucleosome occupancy and RNA expression profiles need
to be incorporated into the analyses.
My studies also suggest that ZNF274 binds independent of DNA methylation. Despite
the fact that ZNF274 binding sites are associated with DNA methylation, removal of
DNA methylation at these sites does not affect the majority of ZNF274 binding. ZNF274
binds to the 3’ end exons of ZNF genes, which are gene bodies. These gene bodies are
usually methylated, but clearly ZNF274 does not rely on DNA methylation to identify its
target sites. This conclusion serves as a caveat: an association between transcription
factor binding and DNA methylation does not necessarily indicate causation between the
two events.
MAX is a sequence specific transcription factor. Removal of DNA methylation allows
more MAX binding and these new binding sites are enriched for the MAX motif. Given
that the MAX motif contains a critical CpG at the center, methylation at these critical
CpG sites might prevent MAX from binding to these motifs. Therefore, I propose that
removal of DNA methylation in DKO1 cells exposes newly unmethylated motifs for
MAX to bind. These new binding sites are mostly at distal regulatory regions, perhaps
because promoter MAX motifs are already unmethylated in HCT116 cells. DNA
methylation analysis on all the MAX motifs should be done to further examine this
hypothesis. Also, MAX shares the same motif with some other factors, including MYC.
Whether these factors behave in the same manner as MAX should be tested. MYC plays a
vital role in tumor progression. If MYC binding is also affected by DNA demethylation,
perhaps demethylation will affect tumor progression. These are interesting questions to
look into in the future. Moreover, there are other factors that have critical CpGs in their
motifs (Figure 2). The same experimental approaches can be applied to these factors to
see if their binding in HCT116 and DKO1 cells changes in the same way.
My studies confirmed that CEBPB binds to methylated regions in vivo. DNA
demethylation leads to loss of >80% of the CEBPB binding sites. Motif analysis on the
lost sites and gained sites identified different motif sets, indicating that CEBPB might
function through different pathways in HCT116 cells vs. DKO1 cells. My studies
suggest that CEBPB dimerizes with ATF4 and recognizes a methylated motif and
dimerizes with an ETS family member to bind unmethylated motifs. To understand the
possible mechanisms by which DNA demethylation triggered this change, I investigated
the gene expression levels of CEBPB, ATF4 and ETS1 in HCT116 and DKO1 cells
(Table 2). RNA-seq data (Blattler et al., 2014) show that neither CEBPB nor ATF4
changes significantly (p<0.05) between HCT116 and DKO1 cells. However, ETS1 and
ETS2 are significantly up-regulated in DKO1 by 27.4 fold and 1.86 fold, respectively.
The absolute expression levels of these factors show that ATF4 is much more abundant
than CEBPB in both cell lines (about 16 fold in HCT116 and about 5 fold in DKO1). On
the other hand, ETS1 is almost not expressed in HCT116 but is expressed about 1.4 fold
higher than CEBPB in DKO1. This suggests that in HCT116 cells there might not be
enough ETS1 to take CEBPB to unmethylated sites. However, with the increased ETS1
expression in DKO1, there is now enough to dimerize with CEBPB and bind to
unmethylated sites.
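The expression comparisons in this paragraph can be checked directly against the RNA levels reported in Table 2. The short Python sketch below recomputes a signed fold change (DKO1 over HCT116) as a simple ratio of the two RNA levels; the helper name and the sign convention for decreases are assumptions. The fold changes reported in Table 2 presumably come from the RNA-seq analysis itself, so a plain ratio may differ slightly from the reported column.

```python
# RNA levels from Table 2: gene -> (HCT116, DKO1)
table2 = {
    "ATF4": (625.03, 501.254),
    "CEBPB": (38.5872, 99.6808),
    "ETS1": (5.21, 143.0695),
    "ETS2": (50.54735, 94.2069),
}


def signed_fold_change(hct116_level, dko1_level):
    """Ratio DKO1/HCT116; decreases are reported as negative fold changes (assumed convention)."""
    ratio = dko1_level / hct116_level
    return ratio if ratio >= 1.0 else -1.0 / ratio


fold_changes = {gene: signed_fold_change(h, d) for gene, (h, d) in table2.items()}

# Relative abundance of ETS1 versus CEBPB within DKO1 cells.
ets1_vs_cebpb_dko1 = table2["ETS1"][1] / table2["CEBPB"][1]

for gene, fc in fold_changes.items():
    print(f"{gene}: {fc:.2f}")
```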
Table 2. RNA expression level of ATF4, CEBPB, ETS1 and ETS2 in HCT116 and
DKO1 cells.
Gene name   DKO1 over HCT116 fold change   p-value    RNA level in HCT116   RNA level in DKO1
ATF4        -1.24677                       0.54889    625.03                501.254
CEBPB       2.42788                        0.360865   38.5872               99.6808
ETS1        27.4609                        0.008938   5.21                  143.0695
ETS2        1.86373                        0.037032   50.54735              94.2069
To understand the function of DNA methylation, one must consider the location of the
binding regions (promoter vs. distal) as well as the distribution of CpG sites. For the three
transcription factors I investigated in this study, two of them are actually affected by
DNA demethylation. Although it seems like DNA demethylation affects binding of these
two factors more at distal regulatory regions, promoter proximal regions are already more
unmethylated than distal regulatory regions in the HCT116 cells. Therefore, it is hard to
know if DNA methylation functions differently in distal vs promoter regions using this
demethylation DKO1 model. Also, CpG distribution at the binding sites needs to be taken
into consideration when drawing further conclusions.
Materials and Methods
Cell culture
The human cell lines HCT116 (ATCC #CCL-247) and DKO1 (Rhee et al. Nature, 2002)
were cultured in DMEM (Corning Cellgro) medium supplemented with 10% fetal bovine
serum (Gibco) and 1% penicillin/streptomycin at 37°C with 5.0% CO2; cells were
harvested for downstream experiments at 80% confluence.
Western blots and IP westerns
Nuclear extracts were prepared using cell lysis buffer with proteinase inhibitor. 5 seconds
of sonication was used to break the nuclei. The BCA assay was used to quantify nuclear
extract concentration.
For western blotting, 40ug of nuclear extract was used for each sample lane. After mixing
with sample loading buffer and denaturing at 95°C for 5mins, samples were loaded onto
premade SDS-PAGE gels (BIO RAD Mini-PROTEAN TGX Gels, Cat#456-9025) with
5ul protein ladder (BIO RAD Precision Plus Protein Dual Color Standards, Cat#161-0374)
in a separate lane. Gels were run at 200V for 40mins, followed by wet transfer at
4°C for 1hr at 90V. After 1hr of blocking, the membranes were incubated with
diluted primary antibody overnight. The next day, membranes were washed and
incubated with secondary antibody. Membranes were then scanned using a LI-COR
system.
For IP westerns, an additional overnight immunoprecipitation step was performed prior
to western blotting. 100ug of nuclear extract was used for each IP and IgG was used as a
negative control. When performing the western blotting, 40ug of nuclear extract was
used as input control for each sample.
See Table 3 for information on the antibodies that were used.
Target protein  Product information                                             Cat#            Lot#         Expected target size
RNAPII          Covance MMS-126R-200 RNA Polymerase II                          8WG16           D12LF03114   220kD
ZNF274          Abnova Anti-ZNF274 (420-530) mAb                                H00010782-M01   08064-4C12   90kD
MAX             Santa Cruz Biotechnology Max (C-17)X rabbit polyclonal IgG      sc-197X         B2712        19kD
CEBPB           Santa Cruz Biotechnology C/EBP β (C-19)X rabbit polyclonal IgG  sc-150X         A2113        48kD
SP1             Santa Cruz Biotechnology Sp1(E-3)X Mouse monoclonal IgG2a       sc-17824X       C0910        90kD
KLF4            Santa Cruz Biotechnology GKLF (H-180)X rabbit polyclonal IgG    sc-20691X       F1609        53kD
c-MYC           Santa Cruz Biotechnology c-Myc (N262) X rabbit polyclonal IgG   sc-764X         A2513        70kD
E2F4            Santa Cruz Biotechnology E2F-4 (C108)X rabbit polyclonal        sc-512X         E2212        70kD
E2F4            Santa Cruz Biotechnology E2F-4 (C20)X rabbit polyclonal         sc-866X         F2212        60kD
Table 3. Antibody information
qPCR
qPCR was performed for each ChIP-seq experiment after the ChIP and library making steps
to confirm the sample enrichment at target sites over input. For each antibody, three
target sites were chosen as positive controls and two other sites were chosen as negative
controls. Primers were designed for each site. Experiments were performed on a BIO-RAD
CFX96 real time system using BIO-RAD SsoFast EvaGreen Supermix (Cat. #172-5201). 1ul
of sample and 1ul of 10uM primers were used in each well. Measurements were done in
triplicate.
See Table 4 for detailed information on the positive control primers.
Label name   Forward sequence (5'-3')   Reverse sequence (5'-3')   Note
GAPDH-Pol    CGGCTACTAGCGGTTTTACG       GCTGCGGGCTCAATTTATAG       RNAPII positive
MAZ-Pol      GGCCCTTCAAATGTGAGGTA       GGGATGAGAAAATGGAGCAA       RNAPII positive
MYC-Pol      TCGAGCCATAAAAGGCAACT       CTCAATCTCGCTCTCGCTCT       RNAPII positive
SUPT5H       GAGGTCAACGGGTAGGTTCTC      TGCACACGCACTTACCTCTC       RNAPII positive
ZNF180-3’    TGATGCACAATAAGTCGAGCA      TGCAGTCAATGTGGGAAGTC       ZNF274 positive
ZNF333-3’    TGAAGACACATCTGCGAACC       TCGCGCACTCATACAGTTTC       ZNF274 positive
ZNF554-3’    CGGGGAAAAGCCCTATAAAT       TCCACATTCACTGCATTCGT       ZNF274 positive
SOD1         GTTTGCGTCGTAGTCTCCTG       TTCGTCGCCATAACTCGCTA       MAX positive
NR1D1        CTAGAGCCATGTGAGCCCTC       GGGGAATCCTCAGTGACAGG       MAX positive
TBCB         AACCGCAAAACACTGAGCAT       GGAGGTAGGAGTGGCAAGAG       MAX positive
TR1B1        GGTGGAGGAAAAGGAGAAGG       GTGTATGAGAGCGAGCGAGA       CEBPB positive
TMEM9        TTCTATTGGTTGCCAGAGCA       GTTTGGTCGGGTTTGAGAAA       CEBPB positive
PLIN2_UP     TGCTCAAGATGCTGGTATGC       CTCTAGGGCACATGGAAAGC       CEBPB positive
Table 4. Primer information
Chromatin immunoprecipitation sequencing (ChIP-seq)
Most of the ChIP-seq experiments were done in replicate using the previously published
lab protocol (Frietze et al. PLoS One, 2010).
Briefly, cells were cross linked using 1% formaldehyde and chromatin was sonicated
to small fragments (usually smaller than 600bp). 200ug to 1000ug of sonicated chromatin
was used in each ChIP-seq and 500ng of chromatin was saved as input. Chromatin was
incubated with antibody overnight at 4°C. StaphA or Protein A/G magnetic beads (Pierce
Protein A/G Magnetic Beads, Prod # 88803) were used the next day to immunoprecipitate
the target protein-DNA complex. After reverse crosslinking at 65°C overnight,
DNA fragments were purified using the QIAGEN QIAquick PCR Purification Kit (Cat#
28104). qPCR experiments were performed after ChIP to confirm successful IP. ChIP
samples that had more than 10 fold enrichment over input at target sites were made
into ChIP-seq libraries.
For making ChIP-seq libraries, ChIPed DNA fragments were repaired using Epicentre
End-It DNA Repair Kit (Cat# ER0720). An additional A base was added to the 3’ end of
the repaired fragments. Sequencing barcodes (BIOO NEXTflex DNA Barcodes) were
ligated to the DNA fragments. PCR was used to amplify adapter modified DNA
fragments. The reaction system was:
1. 10-24ul adapter modified DNA fragments
2. 25ul KAPA HiFi NGS MM
3. 1ul PCR primer mix (6.25uM)
4. Add elution buffer (EB) to a 50ul total reaction volume
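Because the volume of adapter modified DNA varies between 10ul and 24ul, the amount of elution buffer needed to reach the 50ul total changes from reaction to reaction. The following Python sketch works through that arithmetic using the volumes listed above; the function name and the range check are illustrative assumptions.

```python
def elution_buffer_volume(dna_ul, master_mix_ul=25.0, primer_ul=1.0, total_ul=50.0):
    """Volume of EB (ul) needed to bring the PCR reaction to the total volume."""
    # The listed reaction uses 10-24ul of adapter modified DNA.
    if not 10.0 <= dna_ul <= 24.0:
        raise ValueError("DNA volume should be between 10 and 24 ul")
    return total_ul - (dna_ul + master_mix_ul + primer_ul)


eb_for_10ul = elution_buffer_volume(10.0)
eb_for_24ul = elution_buffer_volume(24.0)
```

With 24ul of DNA the reaction is already at 50ul, so no elution buffer is added.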
Amplification was done using the following PCR protocol:
1. 98°C for 2mins
2. 98°C for 30s
3. 65°C for 30s
4. 72°C for 60s
5. Go to step 2, 10-12 cycles
6. 72°C for 4mins
7. Hold at 10°C
AMPure magnetic beads (BECKMAN COULTER Agencourt AMPure XP, A63881)
were used to purify DNA fragments throughout the library making process. After
purifying the amplification products, Qubit dsDNA HS Assay Kit (Q32851) and KAPA
Library Quantification Kit (KK4844) were used to quantify library concentration. Library
quality was checked using BioAnalyzer (Agilent). Library enrichments over input were
determined by qPCR. Qualified ChIP-seq libraries were sequenced on a HiSeq2000
sequencing machine. All ChIP-seq data was mapped to hg19 using BWA (default
parameters).
Whole genome bisulfite sequencing
Whole genome bisulfite sequencing (WGBS) data sets were generated in the lab by
Adam Blattler (Blattler et al., 2014).
Briefly, genomic DNA was collected from HCT116 and DKO1 cells using a Qiagen
QIAeasy DNA mini kit. Genomic DNA (2µg) was sonicated using a Covaris to an
average fragment size of 150bp. Achievement of the desired size range was verified
by BioAnalyzer (Agilent) analysis. Fragmented DNA was repaired to generate blunt ends
using the END-It kit (Epicentre Biotechnologies, Madison, WI) according to
manufacturer’s instructions. Following incubation, the treated DNA was purified using
AMPure XP beads from Agencourt. In general, magnetic beads were employed for all
nucleic acid purifications in the following protocol. Following end repair, A-tailing was
performed using the NEB dA-tailing module according to manufacturer’s instructions
(New England Biolabs, Ipswich, MA). Adapters with a 3’ ‘T’ overhang were then ligated
to the end-modified DNA. For whole genome bisulfite sequencing, modified Illumina
paired-end (PE) adapters were used in which cytosine bases in the adapter are replaced
with 5-methylcytosine bases. Depending on the specific application, we utilized either
Early Access Methylation Adapter Oligos that do not contain barcodes, or the adapters
present in later versions of the Illumina DNA Sample Preparation kits, which contain
both indices and methylated cytosines. Ligation was carried out using ultrapure, rapid T4
ligase (Enzymatics, Beverly, MA) according to manufacturer’s instructions. The final
product was then purified with magnetic beads to yield an adapter-ligation mix. Prior to
bisulfite conversion, bacteriophage lambda DNA that had been through the same library
preparation protocol described above to generate adapter-ligation mixes was combined
with the genomic sample adapter ligation mix at 0.5% w/w. Adapter-ligation mixes were
then bisulfite converted using the Zymo DNA Methylation Gold kit (Zymo Research,
Orange, CA) according to the manufacturer’s recommendations. Final modified product
was purified by magnetic beads and eluted in a final volume of 20 ul. Amplification of
one-half the adapter-ligated library was performed using Kapa HiFi-U Ready Mix in a
50ul total volume reaction for the following protocol:
1. 98°C for 2mins
2. 98°C for 30s
3. 65°C for 15s
4. 72°C for 60s
5. Go to step 2, 6 times
6. 72°C for 10mins
The final library product was examined on the Agilent BioAnalyzer, and then quantified
using the Kapa Biosystems Library Quantification kit according to manufacturer’s
instructions. Optimal concentrations to get the right cluster density were determined
empirically but tended to be higher than for non-bisulfite libraries. Libraries were plated
using the Illumina cBot and run on the Hi-Seq 2000 according to manufacturer’s
instructions using HSCS v 1.5.15.1. Image analysis and base calling were carried out
using RTA 1.13.48.0; deconvolution and fastq file generation were carried out using
CASAVA_v1.7.1a5. Raw reads were mapped using Bis-SNP (Liu, Siegmund, Laird, &
Berman, 2012), and percent methyl-C/C was calculated for every CpG dinucleotide in the
human genome.
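After bisulfite conversion, an unmethylated cytosine is read as T while a methylated cytosine is still read as C, so the percent methylation at a CpG can be expressed as the share of C calls among all C and T calls covering that cytosine. The Python sketch below shows only this arithmetic and is not a description of how Bis-SNP works internally; the function names and example counts are hypothetical.

```python
def percent_methylation(methylated_reads, unmethylated_reads):
    """Percent methyl-C/C at one CpG; None when no read covers the site."""
    total = methylated_reads + unmethylated_reads
    if total == 0:
        return None
    return 100.0 * methylated_reads / total


def mean_methylation(cpg_counts):
    """Average percent methylation over the covered CpGs in a region."""
    values = [percent_methylation(m, u) for m, u in cpg_counts]
    covered = [v for v in values if v is not None]
    return sum(covered) / len(covered) if covered else None


# Hypothetical (methylated, unmethylated) read counts for four CpGs in one region.
example_cpgs = [(8, 2), (9, 1), (0, 0), (3, 7)]
region_mean = mean_methylation(example_cpgs)
```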
ENCODE data access
Two of the MAX ChIP-seq datasets in HCT116 were available via ENCODE and were
downloaded from the UCSC browser (accession number wgEncodeEH003223).
Data processing
ChIP-seq peak calling: RNAPII and ZNF274 peaks were called using Sole-Search with
default settings. MAX and CEBPB peaks were called using the HOMER findPeaks script with
options “-style factor -region”.
Overlap analysis: RNAPII and ZNF274 overlap analysis was carried out using Sole-
Search gff overlap function with default settings. MAX and CEBPB overlap analysis was
carried out using HOMER mergePeaks script with the option “-d 100”.
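The common and unique peak sets described above follow from a distance-based comparison of peak positions. The Python sketch below illustrates that logic under the assumption that two peaks count as the same site when their centers lie within 100 bp on the same chromosome, which is how the “-d 100” setting is interpreted here; it is not the HOMER implementation, and the function names and coordinates are hypothetical.

```python
from bisect import bisect_left
from collections import defaultdict


def index_centers(peaks):
    """Group peak centers by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, center in peaks:
        by_chrom[chrom].append(center)
    for centers in by_chrom.values():
        centers.sort()
    return by_chrom


def has_neighbor(index, chrom, center, max_dist):
    """True if the index holds a center within max_dist of the given position."""
    centers = index.get(chrom, [])
    i = bisect_left(centers, center - max_dist)
    return i < len(centers) and centers[i] <= center + max_dist


def split_peaks(peaks_a, peaks_b, max_dist=100):
    """Return (common peaks from A, peaks unique to A, peaks unique to B)."""
    index_a, index_b = index_centers(peaks_a), index_centers(peaks_b)
    common = [p for p in peaks_a if has_neighbor(index_b, p[0], p[1], max_dist)]
    unique_a = [p for p in peaks_a if not has_neighbor(index_b, p[0], p[1], max_dist)]
    unique_b = [p for p in peaks_b if not has_neighbor(index_a, p[0], p[1], max_dist)]
    return common, unique_a, unique_b


# Hypothetical peak centers as (chromosome, position).
hct116 = [("chr1", 1000), ("chr1", 5000), ("chr2", 300)]
dko1 = [("chr1", 1050), ("chr2", 900), ("chr3", 10)]
common, hct116_unique, dko1_unique = split_peaks(hct116, dko1)
```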
Annotate peaks: HOMER annotatePeaks.pl script was used to annotate each peak
against hg19 with location, nearest genes etc. Other options used are “-d” and “-ratio”
(when annotating peaks with WGBS tags).
Tag density plots: To create the ChIP-seq tag density plots, HOMER annotatePeaks.pl
script was used with options “-hist 30 -size 3000” to average ChIP-seq tags relative to all
peak centers.
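Conceptually, the tag density plot is a histogram of tag positions relative to peak centers, averaged over all peaks in a set. The Python sketch below illustrates this with the 3000 bp window and 30 bp bins given above; it assumes a single chromosome, applies no sequencing depth normalization, and uses hypothetical function names and coordinates, so it is a simplification rather than a reproduction of the annotatePeaks.pl output.

```python
def tag_density(peak_centers, tag_positions, window=3000, bin_size=30):
    """Average tags per peak in each bin, as (bin start relative to center, value)."""
    if not peak_centers:
        return []
    half = window // 2
    totals = [0] * (window // bin_size)
    for center in peak_centers:
        for pos in tag_positions:
            offset = pos - center
            if -half <= offset < half:
                totals[(offset + half) // bin_size] += 1
    n_peaks = len(peak_centers)
    return [(-half + i * bin_size, t / n_peaks) for i, t in enumerate(totals)]


# Hypothetical coordinates on one chromosome.
example_peaks = [10000, 20000]
example_tags = [10000, 10010, 19990, 25000]
density_profile = tag_density(example_peaks, example_tags)
```

The same binning applies to the DNA methylation plots if the per-bin value is an average methylation ratio instead of a tag count.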
Motif density plots: To create the motif density plots, HOMER annotatePeaks.pl script
was used with options “-hist 30 -size 3000 -m”. The motif file that was used comes from
the HOMER transcription factor motif database.
De novo motif searches: HOMER findMotifsGenome.pl script was used to find motifs
that were enriched in a set of peaks. Option “-len 8” was used in each search.
Single peak heat map: To create the single peak heat maps, peaks were annotated with
tag counts using HOMER annotatePeaks.pl script (with option “-ratio” when plotting
DNA methylation). The tags in each peak were plotted as a heat map using the heatmap.2
function in the R gplots package.
DNA methylation heat maps: To create the DNA methylation heat maps, HOMER
annotatePeaks.pl script was used with options “-hist 30 -size 3000 -ratio” to average
WGBS tags relative to all peak centers. The resulting bins were plotted as heat maps
using the heatmap.2 function in the R gplots package.
References
Blahnik, K. R., Dou, L., O’Geen, H., McPhillips, T., Xu, X., Cao, A. R., … Farnham, P. J.
(2010). Sole-Search: an integrated analysis program for peak detection and
functional annotation using ChIP-seq data. Nucleic Acids Research, 38(3), e13.
doi:10.1093/nar/gkp1012
Blattler, A., & Farnham, P. J. (2013). Cross-talk between site-specific transcription
factors and DNA methylation states. The Journal of Biological Chemistry, 288(48),
34287–94. doi:10.1074/jbc.R113.512517
Blattler, A., Yao, L., Witt, H., Guo, Y., Nicolet, C. M., Berman, B. P., & Farnham, P. J.
(2014). Global loss of DNA methylation uncovers intronic enhancers in genes
showing expression changes. (Submitted)
Deaton, A. M., & Bird, A. (2011). CpG islands and the regulation of transcription. Genes
& Development, 25(10), 1010–22. doi:10.1101/gad.2037511
Frietze, S., O’Geen, H., Blahnik, K. R., Jin, V. X., & Farnham, P. J. (2010). ZNF274
recruits the histone methyltransferase SETDB1 to the 3’ ends of ZNF genes. PloS
One, 5(12), e15082. doi:10.1371/journal.pone.0015082
Gardiner-Garden, M., & Frommer, M. (1987). CpG islands in vertebrate genomes.
Journal of Molecular Biology, 196(2), 261–82. Retrieved from
http://www.ncbi.nlm.nih.gov/pubmed/3656447
Heinz, S., Benner, C., Spann, N., Bertolino, E., Lin, Y. C., Laslo, P., … Glass, C. K.
(2010). Simple combinations of lineage-determining transcription factors prime cis-
regulatory elements required for macrophage and B cell identities. Molecular Cell,
38(4), 576–589. doi:10.1016/j.molcel.2010.05.004
Klose, R. J., & Bird, A. P. (2006). Genomic DNA methylation: the mark and its
mediators. Trends in Biochemical Sciences, 31(2), 89–97.
doi:10.1016/j.tibs.2005.12.008
Liu, Y., Siegmund, K. D., Laird, P. W., & Berman, B. P. (2012). Bis-SNP: Combined
DNA methylation and SNP calling for Bisulfite-seq data. Genome Biology, 13(7),
R61. doi:10.1186/gb-2012-13-7-r61
Mann, I. K., Chatterjee, R., Zhao, J., He, X., Weirauch, M. T., Hughes, T. R., & Vinson,
C. (2013). CG methylated microarrays identify a novel methylated sequence bound
by the CEBPB|ATF4 heterodimer that is active in vivo. Genome Research, 23(6),
988–97. doi:10.1101/gr.146654.112
Mohn, F., Weber, M., Rebhan, M., Roloff, T. C., Richter, J., Stadler, M. B., … Schübeler,
D. (2008). Lineage-specific polycomb targets and de novo DNA methylation define
restriction and potential of neuronal progenitors. Molecular Cell, 30(6), 755–66.
doi:10.1016/j.molcel.2008.05.007
Payer, B., & Lee, J. T. (2008). X chromosome dosage compensation: how mammals keep
the balance. Annual Review of Genetics, 42, 733–72.
doi:10.1146/annurev.genet.42.110807.091711
Rhee, I., Bachman, K. E., Park, B. H., Jair, K.-W., Yen, R.-W. C., Schuebel, K. E., …
Vogelstein, B. (2002). DNMT1 and DNMT3b cooperate to silence genes in human
cancer cells. Nature, 416(6880), 552–6. doi:10.1038/416552a
Stein, R., Razin, A., & Cedar, H. (1982). In vitro methylation of the hamster adenine
phosphoribosyltransferase gene inhibits its expression in mouse L cells.
Proceedings of the National Academy of Sciences, 79(June), 3418–3422.
Abstract
DNA methylation is an important epigenetic mark in the human genome. It is known as a repressive mark that associates with stably silenced genes. Hypotheses proposed for how DNA methylation silences expression include a) that the heterochromatin structure associated with DNA methylation prevents gene expression and b) that DNA methylation directly blocks transcription activator binding. In my study, chromatin immunoprecipitation sequencing (ChIP-seq) and whole genome bisulfite sequencing (WGBS) were used to test these two mechanisms in a global DNA demethylation model: HCT116 vs. DKO1 cells. ChIP-seq data for four proteins (RNA polymerase II, ZNF274, MAX and CEBPB) were analyzed. My results suggest that DNA methylation has different effects on different factors. For example, I found that a) promoter demethylation is not sufficient to open heterochromatin and recruit RNAPII, b) ZNF274 binds independent of DNA methylation, c) demethylation of MAX motifs leads to more binding, and d) CEBPB binds to methylated regions and unmethylated regions, perhaps using different protein partners. In conclusion, the function of DNA methylation on gene expression is more complex than expected.
Asset Metadata
Creator: Guo, Yu (Phoebe) (author)
Core Title: The relationship between DNA methylation and transcription factor binding in colon cancer cells
School: Keck School of Medicine
Degree: Master of Science
Degree Program: Biochemistry and Molecular Biology
Publication Date: 09/16/2014
Defense Date: 06/16/2014
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: CEBPB, DKO1, DNA methylation, Max, OAI-PMH Harvest, RNAPII, transcription factor binding, ZNF274
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Farnham, Peggy J. (committee chair), Coetzee, Gerhard (Gerry) A. (committee member), Tokes, Zoltan A. (committee member)
Creator Email: yuguo.phoebe@gmail.com, yuguo@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-c3-479522
Unique identifier: UC11286817
Identifier: etd-GuoYuPhoeb-2956.pdf (filename), usctheses-c3-479522 (legacy record id)
Legacy Identifier: etd-GuoYuPhoeb-2956.pdf
Dmrecord: 479522
Document Type: Thesis
Rights: Guo, Yu (Phoebe)
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA