Effects of chromatin regulators during carcinogenesis
(USC Thesis Other)
EFFECTS OF CHROMATIN REGULATORS DURING CARCINOGENESIS

by
Christopher E. Duymich

A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the Requirements for the Degree
DOCTOR OF PHILOSOPHY
(GENETIC, MOLECULAR, AND CELLULAR BIOLOGY)

MAY 2016

Copyright 2016 Christopher E. Duymich

EPIGRAPH

“Science, my boy, is made of errors, but of errors that it is good to commit, for they lead little by little to the truth.”
--- Jules Verne, Voyage au centre de la Terre

DEDICATION

I dedicate this work to my family and to the people whose support and confidence have enabled me to pursue my goals and aspirations in science.

ACKNOWLEDGMENTS

I would like to thank my supervisor, Dr Peter A. Jones, for his guidance and support throughout the course of my PhD. Discussions with Dr Gangning Liang, together with his support and guidance, also provided tremendous help during this arduous process. Without the help of both, none of the research contained in this thesis would have been possible.

TABLE OF CONTENTS

Epigraph
Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1 Epigenetic and Genetic Regulation in Cancer
  Introduction
  Genetic Aberrations and Cancer Development
    Tumor Suppressors
    Oncogenes
    Driver and Passenger Gene Mutations
  Epigenetic Modification Functions
    DNA Methylation
    Histone Modifications
    Nucleosome Positioning
  Epigenetic Aberrations in Cancer
    DNA Methylation
    Histone Modifications
    Nucleosome Positioning
  An Overview of Urothelial Cell Carcinoma
  Genetic Aberrations of Urothelial Cell Carcinoma
    Allelic Losses and Chromosomal Alterations
    Signatures of Mutations and Frequency
    Fibroblast Growth Factor Receptor 3 Alterations
    Alterations in the PI3K Pathway
    MAPK Pathway Activation
    TP53 and RB Alterations
  Epigenome and Related Alterations in UCC
  Overview of Thesis Research
Chapter 2 Comprehensive Characterization of Mutations, Epigenetic Changes and Gene Expression in Human Urothelial Cell Carcinoma Cell Lines
  Introduction
  Material and Methods
    Cell Culture
    Illumina Infinium HM450 DNA Methylation Assay
    AcceSssIble Assay
    Whole Exome Sequencing
    Detection of Somatic Mutations
    Detection of DNA Copy Number Alterations
    Publicly Available Data Sets
  Results
    Genetic Background of Cell Lines
    Methylation Background of Cell Lines
    Gene Expression Changes in Bladder Cancer Caused by Epigenetic Changes
    Pathways Altered in UCC
  Discussion
Chapter 3 Low Stage Recurrent UCCs Have Common Ancestral Clones
  Introduction
  Materials and Methods
    Illumina Infinium HM450 DNA Methylation Assay
    Whole Exome Sequencing
    Detection of Somatic Mutations
    Publicly Available Data Sets
  Results
    Clinical and Pathological Data
    Somatic DNA Alterations
    DNA Methylation Alterations
    Clonal Evolution Analysis Identifies an Ancestral Clone in Every Patient
    Public Mutations Indicate Early Events in Bladder Cancer
  Discussion
Chapter 4 DNMT3B Isoforms without Catalytic Activity Stimulate Gene Body Methylation as Accessory Proteins in Somatic Cells
  Introduction
  Materials and Methods
    Cell lines and Drug treatment
    DNA Methyltransferase Isoform Constructs
    Protein Extraction and Western Blot Analysis
    RNA isolation and mRNA expression by qRT-PCR
    Illumina Infinium HM450 DNA Methylation data processing
    RNA-Seq Data Collection and Analysis
  Results
    Stable Re-Introduction of DNMT Isoforms
    DNA Methylation Restoration by DNMT3B Isoforms is Independent of Catalytic Activity
    DNMT3B Isoforms can increase the Remethylation Rate after Treatment with a DNA Methylation Inhibitor
    Over-expression of DNMT3s in different cancer types
  Discussion
Chapter 5 Whole-Exome Sequencing Processing Methodology
  Introduction
  Materials and Methods
    Sample Preparation and Sequencing
    Publicly Available Resources
  Results
    FastQ Files and Quality Control
    Alignment to the Genome, Sorting, Marking Duplicates and Indexing Files
    Using the Genome Analysis Toolkit (GATK)
    Annotating, Filtering and Combining Multiple Samples
    Final Filtering and Isolation of Variants
    Predicting Copy Number Variations from WES Sequencing Data
  Discussion
Chapter 6 Discussion
  Summary
  Final Conclusion and Perspective
References
Appendix
  Comment on “The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity”
  Expert Commentary on “ΔDNMT3B4-del Contributes to Aberrant DNA Methylation Patterns in Lung Tumorigenesis”

LIST OF TABLES

Table 2.1 Primer List for Validation of Specific Gene Mutations
Table 2.2 Urothelial Cell Carcinoma Cell Line Information
Table 2.3 Epigenetic Related Genes and Respective Categories
Table 3.1 Clinical and Pathological Data for 30 UCC Tumors from 13 Patients
Table 3.2 Metachronous Tumor Locations
Table 4.1 PCR Sequences for DNMTs and Cloning

LIST OF FIGURES

Figure 1.1 Schematic Representation of Urothelial Cell Carcinoma Stages
Figure 1.2 Mutations in Non Muscle-Invasive and Muscle-Invasive UCC
Figure 2.1 Overview of Mutations in UCC Cell Lines
Figure 2.2 Frequently Mutated Genes in UCC
Figure 2.3 Key Mutations in KDM6A and ARID1A
Figure 2.4 Copy Number Gains in UCC Cell Lines
Figure 2.5 Copy Number Losses in UCC Cell Lines
Figure 2.6 Copy Number Alterations in UCC Cell Lines
Figure 2.7 Overview of DNA Methylation Changes in UCC Cell Lines
Figure 2.8 Summary of Promoter Hyper and Hypo DNA Methylation in UCC
Figure 2.9 Gene Expression Changes in UCC
Figure 2.10 AcceSssIble Assay Workflow
Figure 2.11 Chromatin Accessibility Alterations in UCC Cell Lines
Figure 2.12 Epigenetic Alterations Confirmed in Patient Samples
Figure 2.13 Genetically and Epigenetically Altered Genes and Pathways in UCC
Figure 3.1 Metachronous Bladder Tumor Timelines for 13 Patients
Figure 3.2 Somatic Mutations and Mutational Signatures in UCC
Figure 3.3 Mutational Signatures Identified from Tumor Samples
Figure 3.4 Frequently Mutated Genes in Non Muscle-Invasive UCC
Figure 3.5 DNA Methylation Aberrations Per Tumor
Figure 3.6 CGI Context of DNA Methylation Aberrations Per Tumor
Figure 3.7 Genes with Promoter Methylation Changes in UCC
Figure 3.8 P03-P12 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes
Figure 3.9 P13-P21 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes
Figure 3.10 P26-P30 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes
Figure 3.11 Mutations and Mutational Signatures in Respect to Public and Private Branches
Figure 3.12 DNA Methylation Changes Occurring in Public and Private Branches with Respect to Genomic CpG Location
Figure 3.13 30 Frequently Mutated Genes Found in the Public Branch
Figure 3.14 Frequently Mutated Genes in the Public Branch in 3 or more Patients
Figure 3.15 Non Muscle-Invasive and Muscle-Invasive UCCs Somatic Mutations in MLL2 and MUC16
Figure 3.16 Somatic Mutations of MUC16 in 5 Different Cancers
Figure 3.17 Mucin Family Members are Frequently Mutated in UCC
Figure 3.18 Somatic Mutation Locations of all Mutated Mucin Family Members
Figure 3.19 Frequency of Mutations and Promoter Methylation in Public Branches
Figure 4.1 HCT116 Derivative Cell Line DNA Methylation Comparison
Figure 4.2 Schematic Diagram of DNMT3A, Selected DNMT3B Isoforms and DNMT3L
Figure 4.3 Stable Reintroductions of DNMT3B Isoforms and DNMT3L in DKO8 and 3BKO Cell Lines
Figure 4.4 DNMT3B Isoforms and DNMT3L Restore DNA Methylation at Specific CpG Sites
Figure 4.5 Genomic Context of DNMT3B Isoforms and DNMT3L Target Sites
Figure 4.6 DNMT3B Isoforms Restore DNA Methylation at Specific CpG Sites After 5-Aza-CdR Treatment
Figure 4.7 Group I CpGs are DNMT3B Target Sites in 3BKO
Figure 4.8 Group IV CpG Sites Rebound Independently of DNMT3B
Figure 4.9 Differential Expression Level of DNMT3A and 3B Between Normal and Various Tumor Tissues
Figure 4.10 The Changing Landscape of de novo DNMTs During Development
Figure 5.1 Whole-Exome Sequencing (WES) Data Processing Overview
Figure 5.2 FASTQ Files and Quality Check
Figure 5.3 Step-by-Step SNV and INDEL Calling Part 1
Figure 5.4 Step-by-Step SNV and INDEL Calling Part 2
Figure 5.5 Sequence Alignments Before and After GATK-based Read Realignment
Figure 5.6 Step-by-Step SNV and INDEL Calling Part 3
Figure 5.7 Step-by-Step SNV and INDEL Calling Part 4
Figure 5.8 Alternative to VQSR for SNV and INDEL Filtering
Figure 5.9 Variant Annotation Preparation
Figure 5.10 Filtering, Combining and Annotating VCF Files
Figure 5.11 Isolating DNA Sequence Variants for 55 WES Samples
Figure 5.12 Pre- and Post- Analysis of Variant Files
Figure 5.13 CNV Using Control-FREEC

ABSTRACT

Traditionally, cancer has been viewed as a disease driven by the accumulation of genetic alterations; this view has since been expanded to include epigenetic modifications. Recent developments such as whole exome sequencing (WES) have enabled researchers to analyze the entire coding region of the genome in cancer tissue samples and to identify genetic mutations that may cause tumor onset or progression, leading to improved disease prognosis and treatment. These studies have unexpectedly shown that genes that organize the epigenome are frequently mutated in many tumor types. The effects of these mutations on the epigenetic landscape (e.g., DNA methylation, histone modifications, and nucleosome positioning) are as yet unknown.
Given that the disruption of normal epigenetic patterns can contribute to carcinogenesis, it is critical to understand how the cancer epigenome has been altered and how it can be reversed by pharmacological intervention. I designed a pipeline to process WES data for the purpose of identifying somatic alterations in bladder cancer cell lines as well as primary bladder cancer tissues. I characterized the genetic and epigenetic alterations in bladder cancer cell lines and compared them to uncultured tumors to validate the cell lines as model systems for studying bladder cancer tumorigenesis. Specifically, I confirmed genetic alterations previously identified in bladder cancer, as well as DNA methylation alterations detected using arrays. Additionally, I dissected early epigenetic and genetic alterations that occur in recurrent non muscle-invasive bladder cancer using metachronous uncultured tumor samples. Both DNA methylation and genetic analyses show clonal evolution from ancestral clones in all patients, confirming a common initiation of tumorigenesis. I have also identified a new functional role as accessory proteins for DNA methyltransferase 3B (DNMT3B) isoforms lacking catalytic domains. These accessory proteins are able to stimulate DNA methylation in a de novo context and to partially restore DNA methylation levels following treatment with an FDA-approved demethylating agent, 5-Aza-CdR. Taken together, this dissertation presents a detailed view of the genome and epigenome alterations that contribute to tumor initiation and development.

CHAPTER 1
EPIGENETIC AND GENETIC REGULATION IN CANCER

INTRODUCTION

Cancer arises from an accumulation of genetic and epigenetic alterations and accounts for more than 8.2 million deaths a year worldwide (Stewart and Wild). Eight hallmarks of cancer were proposed to contribute to oncogenesis (Hanahan and Weinberg, 2011).
These include sustaining proliferative signaling, evading growth suppressors, resisting cell death, inducing angiogenesis, enabling replicative immortality, activating invasion and metastasis, deregulating cellular energetics and avoiding immune destruction. Additionally, two enabling characteristics underlie the canonical hallmarks: genome instability and mutation, and tumor-promoting inflammation (Hanahan and Weinberg, 2011). Alterations in the genome influence changes to the epigenome, which can cause cellular transformation into cancer (Simó-Riudalbas and Esteller, 2014). In this chapter, I will present a brief overview of cancer genetics, a detailed look at the normal and cancer epigenomes, and the mechanisms of disease for bladder cancer.

GENETIC ABERRATIONS AND CANCER DEVELOPMENT

In the traditional view, cancer arises from a single cell through a succession of somatic mutations, each of which confers a specific growth advantage over normal cells (Armitage, 1985). This accumulation of variants in mutant cells eventually allows for aberrant growth in the specific microenvironment (Weinberg, 2007). Recent studies have highlighted how widely the number of somatic mutations varies across 30 different cancers, ranging from 0.001 per megabase in pilocytic astrocytoma to 400 per megabase in melanoma (Alexandrov et al., 2013). Mutations and alterations in the genome include single nucleotide variants (SNVs), insertions or deletions (INDELs), amplifications and translocations. Driver gene mutations usually occur in tumor suppressors or oncogenes, whereas a large proportion of mutations, in these and other genes, are passengers that do not confer any growth advantage (Vogelstein et al., 2013).

Tumor Suppressors

Tumor suppressor genes inhibit the progression of cell transformation required for cancer development. To abolish tumor suppressor function, inactivating mutations, gene silencing or homozygous deletions are required.
The earliest identified tumor suppressor gene was retinoblastoma 1 (RB1), which, interestingly, required “two hits” for gene inactivation (Knudson, 1971). That is, a mutation in one allele was not sufficient to cause gene inactivation because the non-mutated allele still gave rise to a functional gene product. This is not true for all tumor suppressor genes, notably TP53, one of the most frequently mutated genes in cancer. Heterozygous TP53 mutations have a dominant negative effect, since the mutated protein inhibits the function of the normal protein (O'Connor et al., 1993). The “two-hit” theory has also been expanded to include epigenetic alterations such as DNA methylation (Jones and Laird, 1999). In 84% of colorectal cancer patients, methylation of the MLH1 promoter, leading to gene inactivation, is one such epigenetic hit (Herman et al., 1998).

Oncogenes

Genes that normally contribute to cellular growth and proliferation (proto-oncogenes) and become hyperactive are considered oncogenes (Santos et al., 1983). Fifty-four oncogenes have been identified in the human genome, most of them through studies of retroviruses that had accidentally incorporated proto-oncogenes into proviral DNA (Weinberg, 1994). Normal cells express proto-oncogenes, which are required for cellular functions; in transformed cells, however, oncogene mutations promote hyperactivity that leads to increased cellular proliferation. Different mutational and epigenetic mechanisms allow proto-oncogenes to become oncogenes. Examples include RAS, which is activated through point mutations, and ABL, which is activated by chromosomal translocation with BCR, resulting in the BCR-ABL fusion protein (Bishop, 1983; Chissoe et al., 1995).

Driver and Passenger Gene Mutations

Currently, 125 driver genes (genes containing driver gene mutations) have been identified: 71 tumor suppressors and 54 oncogenes (Vogelstein et al., 2013).
Gene mutations that confer growth advantages are considered driver mutations, while the rest are purely passenger mutations (Haber and Settleman, 2007; Maley et al., 2004; Simpson, 2009). As an example, truncating mutations within the region of the APC gene encoding the N-terminal 1,600 amino acids are usually driver gene mutations, but SNVs throughout the rest of the gene, as well as in the C-terminal coding region, are passenger gene mutations (Bozic et al., 2010). It is estimated that there are 2-8 mutated driver genes per tumor across different cancers, while the remaining mutations in each tumor are purely passengers (Vogelstein et al., 2013). In addition to mutated driver genes, there are epigenetic driver genes, which are aberrantly expressed because of epigenetic modification rather than mutation. To identify the complete set of cancer driver genes, both in vitro and in vivo experiments are necessary (Vogelstein et al., 2013).

EPIGENETIC MODIFICATION FUNCTIONS

The epigenome contributes to normal cellular processes through functional elements that include DNA methylation, nucleosome positioning, histone variants and modifications, as well as non-coding RNAs. These elements work in concert during mammalian development, as well as in differentiated somatic cells, to regulate the transcriptome in all cell types. Due to the importance of the epigenome for tissue identity, aberrant alterations of the epigenome may be instrumental in initiating tumorigenesis and disease development (Baylin and Jones, 2011; Sandoval and Esteller, 2012; Sharma et al., 2010).

DNA Methylation

In mammalian cells, the covalent addition of a methyl group to the C-5 position of cytosines in a 5’-CG-3’ (CpG) dinucleotide context is referred to as DNA methylation and is one of the most studied epigenetic components.
These dinucleotides are found throughout the genome, but are enriched in CpG islands (CGIs), defined as DNA regions of more than 500bp with a GC content of more than 55% and an observed-to-expected CpG dinucleotide ratio of at least 0.65 (Takai and Jones, 2002). To obtain this ratio, the observed value is the number of CpG dinucleotides in a given sequence window, and the expected value is calculated by multiplying the number of C bases by the number of G bases in that window and dividing by the window size. Methylated cytosines in CpGs are prone to spontaneous deamination, producing TpGs and thereby depleting the CpG content throughout the genome. Although CGIs are located primarily at promoters of genes and are generally unmethylated, 90% of CpG dinucleotides are found outside CGIs and are usually methylated (Meissner et al., 2008). DNA methylation at CGIs within promoters of genes may be associated with permanent transcriptional silencing of those genes. Non-CGI DNA methylation has been implicated in controlling gene expression, although less is known about its function (Jones, 2012).

DNA methylation is a key epigenetic mechanism responsible for stable gene silencing in key biological processes, including the establishment and maintenance of normal tissue-specific gene expression patterns, X-chromosome inactivation, silencing of parasitic transposable elements and genomic imprinting (Bird, 2002; Esteller, 2011; Jurkowska et al., 2011).

The enzymes responsible for DNA methylation are the DNA methyltransferases (DNMTs). The family comprises three enzymatically active members, DNMT1, DNMT3A and DNMT3B, as well as the catalytically inactive DNMT3L (DNA methyltransferase 3-like), which helps establish DNA methylation patterns during development (Hata et al., 2002). Maintenance methylation is catalyzed by DNMT1, which copies DNA methylation patterns from the parental to the daughter strand during replication (Lei et al., 1996).
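
The CpG island criteria given earlier in this section can be sketched for a single sequence window in a few lines of Python. This is an illustrative sketch only, not the search procedure of Takai and Jones (2002): the function names, the treatment of one fixed window rather than a genome scan, and the choice of inclusive or exclusive thresholds are all assumptions made here.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in the window that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for one window.

    observed = number of CG dinucleotides in the window
    expected = (number of C * number of G) / window length
    """
    seq = seq.upper()
    if not seq:
        return 0.0
    observed = seq.count("CG")
    expected = seq.count("C") * seq.count("G") / len(seq)
    # A window with no C or no G has no expected CpGs; report 0.0 (assumption).
    return observed / expected if expected else 0.0


def is_cpg_island(seq: str,
                  min_len: int = 500,
                  min_gc: float = 0.55,
                  min_ratio: float = 0.65) -> bool:
    """Apply the three criteria to a single window (hypothetical helper).

    Assumption: length is tested as strictly greater than min_len, while
    GC content and the obs/exp ratio are tested as greater than or equal
    to their thresholds.
    """
    return (len(seq) > min_len
            and gc_content(seq) >= min_gc
            and cpg_obs_exp(seq) >= min_ratio)
```

For example, a 600bp window made only of repeated CG passes all three tests, whereas a 200bp window of the same composition fails on length alone. A practical island finder would slide such a window along the sequence and merge adjacent passing windows.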
DNMT1 interacts with Ubiquitin-Like PHD And RING Finger Domain-Containing Protein 1 (UHRF1) to specifically target regions undergoing DNA replication (Bostick et al., 2007), and more recent studies have shown that the DNMT3s are also involved in this process (Jeong et al., 2009; Sharma et al., 2011). DNMT3A and DNMT3B are primarily de novo methyltransferases since they show no preference for hemi-methylated DNA, although they have been shown to interact with DNMT1 for maintenance purposes (Liang et al., 2002a; Okano et al., 1999; Rhee et al., 2002). DNMT3A was found to have two different isoforms, while DNMT3B has more than thirty (Gopalakrishnan et al., 2009; Ostler et al., 2007; Wang et al., 2006b; Xie et al., 1999). The DNMT3s use nucleosomal DNA as a substrate (Sharma et al., 2011) and are responsible for converting the largely unmethylated epigenome in embryonic stem (ES) cells. Genome-wide DNA methylation changes during an epigenetic reprogramming phase are thought to cause ES cell differentiation (Zhu et al., 2013).

Histone Modifications

The establishment of the epigenome also relies on the coordination of other epigenetic factors, including nucleosomes, to efficiently organize the genome. Nucleosomes consist of 147bp of DNA wrapped around a core histone octamer made up of two copies each of the H2A, H2B, H3, and H4 histone proteins. These histone proteins can be covalently modified at specific residues on the N-terminal tail by modifications such as methylation, ubiquitination, sumoylation, acetylation, or phosphorylation. Based on the modifications of the histone tails, euchromatic or heterochromatic regions are demarcated throughout the genome (Zhou et al., 2011).

Nucleosome Positioning

Nucleosome positioning throughout the genome is important in determining whether genes are accessible to transcription factors, polymerases or other proteins for transcription.
With regard to transcription start sites (TSSs) of genes, nucleosome-depleted regions at the promoter allow transcription factors to bind and activate gene expression. Early studies in S. cerevisiae identified nucleosome-depleted regions upstream of TSSs (Bernstein et al., 2004; Lee et al., 2004; Sekinger et al., 2005). Recent genome-wide sequencing advances enabled researchers not only to confirm the presence of nucleosome-depleted regions, but also to describe how they provide areas for transcription machinery to access the DNA (Yuan et al., 2005). Our lab pioneered two different techniques for studying chromatin accessibility in addition to DNA methylation: AcceSssIble and nucleosome occupancy methylome-sequencing (NOMe-seq) (Kelly et al., 2012; Pandiyan et al., 2013; You et al., 2011). Collectively, epigenetic factors including DNA methylation, histone modifications and nucleosome positions work in concert to regulate the expression of genes (Portela and Esteller, 2010; Segal et al., 2006; Zhou et al., 2011).

EPIGENETIC ABERRATIONS IN CANCER

The epigenetic landscape can be distorted in cancer, similar to the genetic alterations that also occur. The epigenome is known to acquire global DNA methylation changes, altered histone modification patterns and nucleosome reorganization, all of which play roles in altering the transcriptome during cancer initiation and progression (Esteller, 2007; Sharma et al., 2010).

DNA Methylation

Although DNA methylation is critical for mammalian development as mentioned previously, genome-wide studies have shown that aberrations of normal tissue DNA methylation patterns are a hallmark of diseases, including cancer (Jones and Baylin, 2007; Sharma et al., 2010). These aberrations include global DNA hypomethylation at distal regulatory regions and repetitive elements, which may lead to chromosome instability and transcriptome alterations, and CGI DNA hypermethylation, which causes gene silencing (Bird, 2002; Ehrlich, 2009).
Studies of gene silencing associated with CGI promoter DNA hypermethylation were among the first functional studies to show that DNA methylation could lead to inactivation of the tumor suppressors RB1 (Greger et al., 1989) and p16INK4a (Herman et al., 1995). Recently, gene body DNA methylation was shown to have a functional role in gene expression (Kulis et al., 2012; Maunakea et al., 2010; Varley et al., 2013) and may potentially be exploited as an additional cancer therapeutic target (Yang et al., 2014). As a result of rapid technological advancements, DNA methylation patterns and other epigenetic components can be mapped across the entire genome, thus revealing distinct patterns in cancerous tissues compared to the originating tissue (Kundaje et al., 2015), including in bladder cancer (Network, 2014a; Wolff et al., 2010b). However, the establishment of these patterns and the mechanisms involved are not well understood. DNMTs have been identified as a possible cause of DNA methylation aberrations because they are often overexpressed in cancer, specifically with respect to the DNMT3B isoforms (Chen et al., 2003; Gopalakrishnan et al., 2009; Robertson et al., 1999; Wang et al., 2006b; Weisenberger et al., 2004). DNMT1 overexpression is responsible for the loss of imprinting with DNA hypermethylation of the imprinted IGF2/H19 locus, leading to biallelic IGF2 expression (Biniszkiewicz et al., 2002). High CXCR4 expression levels in breast cancer are due to non-canonical expression of DNMT1 and DNMT3B, which are necessary for the maintenance of promoter DNA methylation of CXCR4 (Biniszkiewicz et al., 2002; Przybylski et al., 2010). Non-small cell lung cancer expresses the delta DNMT3B isoforms (DNMTΔ3B), which lack the N-terminal domain of DNMT3B and are thought to be responsible for DNA hypermethylation of the RASSF1A and p16INK4a promoters (Wang et al., 2007; Wang et al., 2006a; Wang et al., 2006b).
DNMT3B3 and DNMT3B4, two inactive DNMT3B isoforms, are able to stimulate de novo DNA methylation by forming a complex with DNMT3A, although it is not fully clear how these inactive forms affect locus-specific targeting (Gordon et al., 2013).

Histone Modifications

As mentioned before, DNA methylation is only one branch of epigenetics and is not responsible for all alterations to the transcriptome. Specifically, histone tail modifications can also be disrupted in cancer, potentially altering the chromatin architecture. It has been shown that DNA hypermethylation of tumor suppressor genes is linked to the loss of the histone H3 lysine 4 trimethylation (H3K4me3) modification and to gains of the histone H3 lysine 9 monomethylation (H3K9me1) and H3K27me3 modifications (Hamamoto et al., 2004; Viré et al., 2006). The altered distribution of histone methyl marks in cancer cells is associated with aberrant expression of histone methyltransferases and demethylases (Chi et al., 2010). Inactivating mutations in SETD2, a histone methyltransferase, and KDM6A, a histone demethylase, were found in renal cell carcinoma. In addition, frequent mutations of KDM6A were found in urothelial cell carcinoma (UCC) (Dalgliesh et al., 2010; Gui et al., 2011). Cancer cells also aberrantly express histone deacetylases, which, in combination with the deregulated histone methyltransferases and histone demethylases, results in transcriptionally inactive histone modifications in gene promoter regions, as well as a compact chromatin structure (Jones and Baylin, 2007; Nakagawa et al., 2007; Sharma et al., 2010; Shi, 2007). Chromatin compaction does silence a gene, but to permanently turn off gene expression, promoter DNA methylation is required. This process is used in tumorigenesis and occurs when the enhancer of zeste homolog 2 (EZH2) protein catalyzes the repressive tri-methylation modification on H3K27, leading to the compaction of chromatin.
Following EZH2’s repressive histone modification, DNMTs are recruited for DNA methylation, which subsequently causes permanent gene silencing (Viré et al., 2006). Although this de novo hypermethylation at gene promoters results in an epigenetic switch during cancer development, from gene silencing by Polycomb, repressive histone modifications and chromatin reorganization to silencing by DNA methylation (Gal-Yam et al., 2008; Kondo et al., 2008; Schlesinger et al., 2007; Widschwendter et al., 2007), DNA hypomethylation of these regions does not necessarily re-open the chromatin to transcription factors (Lay et al., 2015; Lin et al., 2007). The interplay between histone modifications and DNA methylation is an important aspect of epigenetic studies for cancer biology, since alterations in one can affect the other.

Nucleosome Positioning

As mentioned previously, nucleosome positioning regulates expression of genes by altering DNA accessibility for regulatory proteins. Expression alterations in cancer are influenced by nucleosome positioning and chromatin restructuring. Chromatin remodeling complexes are often found to be malfunctioning in cancer. The chromatin remodeling complex SWI/SNF mediates ATP-dependent chromatin changes and targets genes involved in cell growth, development and differentiation (Brown et al., 2002; Liu et al., 2001). Protein subunits in the SWI/SNF complex are often mutated in human cancer cells (Reisman et al., 2009; Roberts and Orkin, 2004). BAF47, BRM, BRG1, ARID1A and PBRM1 are subunits in the SWI/SNF complex and are frequently mutated in various cancers, including chronic myeloid leukemia, ovarian clear cell carcinoma, renal clear cell carcinoma and urothelial cell carcinoma (Grand et al., 1999; Gui et al., 2011; Guo et al., 2013; Network, 2013, 2014a; Wiegand et al., 2010).
How DNA methylation and nucleosome positioning interact remains unresolved, although there has been significant progress on how these two epigenetic components work together (Kelly et al., 2012; Lay et al., 2015). Nucleosome positioning, combined with incorporation of the histone variant H2A.Z, allows for chromatin reorganization and gene reactivation at LINE-1 repetitive elements and the MLH1 locus (Lin et al., 2007; Wolff et al., 2010a; Yang et al., 2012). The frequent mutations in epigenetically related components highlight their importance in tumorigenesis.

AN OVERVIEW OF UROTHELIAL CELL CARCINOMA

Urothelial Cell Carcinoma (UCC) is the fifth most commonly diagnosed cancer in Western countries, following prostate, breast, colon and lung cancers, with approximately 380,000 new cases and 150,000 deaths per year worldwide (Ferlay et al., 2010). Common risk factors associated with UCC include cigarette/tobacco use, chronic urinary tract infections, arsenic exposure and occupational exposure to carcinogens in the rubber or fossil fuel industry (Brandau and Böhle, 2001; Feng et al., 2002; Kantor et al., 1984). UCC is diagnosed as non-muscle-invasive (NMIBC) or muscle-invasive bladder cancer (MIBC) and is staged by the extent of surrounding tissue invasion (Tis-T4) using the Tumor-Node-Metastasis system (TNM system) (Figure 1.1) (Sobin et al., 2009). Patients with NMIBC represent 75% of UCC cases, and it is estimated that 60% of these will have a recurrent tumor (Babjuk et al., 2013). On one hand, common alterations in NMIBC are seen in the Ras mitogen-activated protein kinase (MAPK) pathway, resulting in constitutive activation of this extracellular signal transduction pathway involved in cellular proliferation (Wolff et al., 2005). On the other hand, the retinoblastoma and p53 pathways are inactivated in most MIBC cases.
Genes in these three pathways can be altered either through genetic or epigenetic alterations, such that conclusions based on either mechanism are difficult to ascertain (Markl and Jones, 1998; Sarkar et al., 2000; Wolff et al., 2005). Advances in sequencing technology have allowed researchers to comprehensively study genome-wide bladder cancer alterations. Several studies have identified genes involved in the aforementioned pathways, as well as chromatin remodeling genes. By combining the works of several research groups that reported genetic alterations in NMIBC and MIBC, potential associations with the two types are evident (Figure 1.2) (Balbás-Martínez et al., 2013; Guo et al., 2013; Iyer et al., 2013; Network, 2014a; Nordentoft et al., 2014; Ross et al., 2014). Although NMIBC and MIBC have been well characterized and both show diverse genetic backgrounds, there is still limited knowledge about the molecular events causing NMIBC and MIBC disease progression (Wolff et al., 2005).

Figure 1.1: Schematic Representation of Urothelial Cell Carcinoma Stages. Illustration of Urothelial cell carcinoma stages: Ta, non-invasive papillary carcinoma; T1, tumor invades subepithelial connective tissue; T2a, tumor invades superficial muscle; T2b, tumor invades deep muscle; T3, tumor invades perivesical tissue and T4, tumor invades adjacent tissues and organs. Stages are represented according to the Tumor-Node-Metastasis (TNM) nomenclature system.
[Figure 1.1 labels: Lumen, Urothelium, Lamina propria, Inner muscle, Outer muscle; stages Ta, T1, T2a, T2b, T3, T4]

Figure 1.2: Mutations in Non Muscle-Invasive and Muscle-Invasive UCC. Frequencies of 53 mutated genes in UCC separated by non muscle-invasive (n=50, black) and muscle-invasive tumors (n=418, red).
[Figure 1.2 plot: "Summary of UCC Mutated Genes from 50 Ta and 418 T1-T4"; axis: % Frequency (0 to 50); legend: Ta gI-II, Ta gIII-T4; gene groups: Non-Muscle-Invasive Low-Grade UCC, Both, Muscle-Invasive High-Grade UCC; genes shown: CDKN2A/B, CCND1, ERBB2, LPHN3, WNK1, MDM2, INADL, MCL1, CPAMD8, FGFR1, TNC, EGFR, CCND3, TP53, RB1, TSC1, NEB, PDZD2, MLL3, FANCA, NF1, PCDHA9, OBSCN, ATM, BRAF, MYCBP2, MLL2, ASH1L, ARID1A, ELF3, VCAN, LRRC7, DCHS2, OSMR, FAT1, CREBBP, CACNA1D, C1orf173, MAPK8IP3, ERCC2, ZFYVE26, HRAS, LGALS8, ZFP36L1, RBM10, XIRP2, RANBP2, PIK3CA, EP300, STAG2, RARG, KDM6A, FGFR3]

Genetic Aberrations of Urothelial Cell Carcinoma

Allelic Losses and Chromosomal Alterations

There are only a small number of genetic rearrangements in NMIBCs, in contrast to MIBCs, which have many different alterations, potentially the result of the failure of the non-homologous end-joining system (Bentley et al., 2004; Morrison et al., 2014). MIBCs have inactivating mutations in DNA repair and DNA damage checkpoint genes including ataxia-telangiectasia mutated (ATM), Fanconi anemia complementation group A (FANCA) and excision repair cross-complementation group 2 (ERCC2) (Figure 1.2). NMIBCs have a higher frequency of mutations in stromal antigen 2 (STAG2), a component of the cohesin complex that plays important roles in chromatid segregation (Figure 1.2) (Balbás-Martínez et al., 2013; Solomon et al., 2013; Taylor et al., 2014). Additionally, chromosome 9 deletion is frequent in both NMIBC and MIBC and affects the tumor suppressor genes cyclin-dependent kinase inhibitor 2A (CDKN2A) and 2B (CDKN2B) and tuberous sclerosis 1 (TSC1), as these genes are located on this chromosome (Cairns et al., 1994; Platt et al., 2009; Sjödahl et al., 2011; Tsai et al., 1990; Williamson et al., 1995). Besides the homozygous deletion of CDKN2A, regions 2q36, 9p21.3, 11p11, 18p11 and 19q12 are also reportedly homozygously deleted in UCC (Veltman et al., 2003; Voorter et al., 1995).
Signatures of Mutations and Frequency

The NMIBC somatic mutation frequency is estimated at 50-302 exonic mutations per sample with a median mutation rate of 1.8 per megabase, while reports on MIBC found ~300 exonic mutations per sample with a median mutation rate of 6-8 per megabase (Network, 2014a; Nordentoft et al., 2014). This somatic mutation rate for UCC is only surpassed by melanoma and lung cancer, both of which are dominated by C:G to T:A transitions. Recent advances in bioinformatics have allowed for the identification of mutational signatures associated with cancer and have been applied to UCC samples, identifying an APOBEC (apolipoprotein B mRNA-editing enzyme, catalytic polypeptide-like) mutation signature in one-third of UCCs (Alexandrov et al., 2013; Nordentoft et al., 2014). Expression of APOBEC3B is significantly upregulated in UCC, potentially explaining the mutational signature containing a high incidence of C:G to T:A mutations in the TCW sequence context, where W is T or A (Burns et al., 2013; Lawrence et al., 2013; Network, 2014a; Nordentoft et al., 2014).

Fibroblast Growth Factor Receptor 3 Alterations

NMIBCs are also notable for the high frequency (~80%) of activating FGFR3 mutations, which are associated with favorable patient outcome (Cappellen et al., 1999). Mutant FGFR3 activates the RAS-MAPK pathway and phospholipase Cγ (PLCγ), causing increased cell survival and proliferation (di Martino et al., 2009). FGFR3 and RAS gene family mutations are mutually exclusive, which suggests that they affect the same pathway (Jebar et al., 2005). FGFR3 mutations are found in 21% of UCCs that invade the lamina propria (T1) and in 16% of MIBCs (T2-4). Interestingly, FGFR3 mutations are not found in carcinoma in situ (CIS) tumors, indicating that NMIBCs do progress, but through a different pathway than CIS tumors (Billerey et al., 2001).
Few tumors have concurrent FGFR3 and TP53 mutations: low stage and grade UCCs (Ta-T1, grade 1-2) typically have an activating FGFR3 mutation, while high stage and grade UCCs (T2-T4, grade 2-3) usually have inactivating TP53 mutations (Bakkar et al., 2003; Wallerand et al., 2005).

Alterations in the PI3K Pathway

Genes involved in the PI3K pathway are also mutated in UCC, including the upstream activators EGFR, ERBB2 and ERBB3 (Gui et al., 2011; Network, 2014a). EGFR or ERBB2 overexpression leads to RAS activation, which in turn activates the PI3K pathway. Additional activating PIK3CA pathway mutations occur in 25% of NMIBCs and confer a proliferative and migratory advantage (Juanpere et al., 2012; Kompier et al., 2010; López-Knowles et al., 2006; Platt et al., 2009; Ross et al., 2013; Sjödahl et al., 2011). PTEN, which negatively regulates PI3K, commonly shows loss of heterozygosity (LOH) in MIBC, as well as an association with downregulated TP53 expression and poor outcome (Platt et al., 2009; Puzio-Kuter et al., 2009). Other PI3K pathway alterations include the aforementioned TSC1 mutations.

MAPK Pathway Activation

FGFR3 and RAS (HRAS or KRAS) mutations in UCC are mutually exclusive, suggesting that they share similar functions. Because of the low level and frequency of RAS mutations in NMIBC and MIBC, there is a possibility of other non-redundant functions. Activating FGFR3 mutations result in activation of the MAPK, but not the PI3K, signaling pathway, even though PIK3CA mutations co-occur with FGFR3 mutations in NMIBC (Juanpere et al., 2012; Kompier et al., 2010). This suggests cooperative activation of the PI3K and MAPK pathways, while the role of FGFR3 in the activation of the MAPK pathway needs further elucidation.

TP53 and RB Alterations

Detection of DNA damage for repair and cell-cycle arrest is coordinated by p53, and TP53 mutations are the most common genetic change in cancer (Williams and Stein, 2004).
Normally, a cell that is unable to repair DNA damage undergoes apoptosis induced by activated p53, which also stops the cell cycle, resulting in RB hypophosphorylation and inhibition of cell-cycle progression (Stein et al., 1998). p53, MDM2 and CDKN2A form an autoregulatory feedback loop, with decreased CDKN2A expression caused by TP53 downregulation (Robertson and Jones, 1998). TP53 mutations are linked to smoking and also to higher UCC grade and stage (Wallerand et al., 2005). RB is involved in many different cellular functions and helps to control expression of genes involved in the key cellular processes of proliferation, differentiation and apoptosis. The RB protein accomplishes these gene expression alterations by interacting with chromatin, DNA-modifying enzymes and transcription factors (Robertson et al., 2000). Non-dividing urothelial cells have a hypophosphorylated RB protein, which interacts with members of the E2F transcription factor family, thus preventing their binding to DNA and consequently cellular proliferation (Garcia-España et al., 2005). Alterations in RB expression or changes in cyclin-CDK complexes, which are responsible for the phosphorylation of RB, can result in uncontrolled cell growth (Chatterjee et al., 2004). UCC patients with no RB expression, as well as those with high expression of RB, have poor prognosis (Cote et al., 1998). Patients with high levels of RB also have elevated levels of cyclin D1 and/or loss of CDKN2A, which causes functional RB inactivation (Chatterjee et al., 2004). The interaction between RB and AP-2, which controls expression of E-cadherin and thereby maintains the epithelial phenotype, is lost in patients not expressing RB, thus linking UCC tumorigenesis to de-differentiation (Batsché et al., 1998). RB1 LOH increases from 19% in NMIBCs to 60% in MIBCs (Wada et al., 2000).
Epigenome and Related Alterations in UCC

NMIBC and MIBC show distinct DNA methylation patterns: DNA hypomethylation at non-CpG islands in NMIBC and DNA hypermethylation at CpG islands in MIBC (Reinert et al., 2011; Wolff et al., 2010b). There is the potential for an epigenetic field defect in UCC, which was highlighted by the identification of LINE-1 DNA hypomethylation across the entire urothelium of UCC patients (Wolff et al., 2010a; Wolff et al., 2010b; Wolff et al., 2008). Additionally, repressive histone modifications (H3K9me3 and H3K27me3) were shown to play an important role, together with DNA methylation, in silencing genes involved in cellular homeostasis in UCC patients (Dudziec et al., 2012). Interestingly, in MIBC, at least one genetic alteration of a chromatin regulator occurred in 89% of patients (Network, 2014a). The lysine-specific demethylase 6A (KDM6A), responsible for removing the H3K27me3 modification and subsequent euchromatin formation, is frequently mutated in UCC (>25%) (Figure 1.2) (Gui et al., 2011). ARID1A, a component of the SWI/SNF chromatin-remodeling complex, is also mutated in 22% of UCCs, potentially altering the chromatin architecture. MLL2 methylates H3K4, leading to euchromatin formation, and is mutated in 17% of UCCs. Taken together, these data highlight that genes involved in epigenetic processes are mutated in one-fifth of UCCs (Gui et al., 2011; Network, 2014a). Inactivating alterations in these epigenetic-related genes are predicted to cause gene silencing and explain the vast epigenome changes in UCC. Other epigenetically-related genes mutated at >5% in MIBC include MLL, MLL3, CREBBP, NCOR1, CHD6, CHD7 and SRCAP (Gui et al., 2011; Network, 2014a).

OVERVIEW OF THESIS RESEARCH

As summarized previously, much is known about epigenetics and cancer development. However, relationships between epigenetic modifications and genetic mutations, as well as their impact on tumorigenesis, remain unclear.
Moreover, little is known regarding the mechanisms of epigenetic alterations in cancer and how proteins involved in this process become dysfunctional. The development of next-generation sequencing (NGS) techniques has allowed for studying connections between genetic alterations and changes in the epigenomic landscape rather than a focused look at individual loci. My graduate dissertation is based on combining epigenetic and genetic differences between normal and diseased cells. With the advancements in genome-wide DNA methylation arrays, as well as NGS technology, we combined data from different platforms to better understand human bladder cancer. First, I comprehensively characterized UCC cell lines with respect to genetic alterations, DNA methylation and chromatin accessibility alterations. These cell lines provide useful model systems to study different aspects of UCC in detail, and by combining epigenetics and genetics, we can better understand the alterations occurring in UCC. Next, I analyzed primary NMIBC and MIBC samples on both the genetic and epigenetic levels to elucidate important events in the disease. Third, I investigated the role of DNA methyltransferases, their ability to establish DNA methylation patterns and their roles during tumorigenesis. In addition, I designed a pipeline for analyzing whole-exome sequencing data so that somatic mutations can be identified and integrated with the epigenetic datasets. Finally, the last chapter will summarize my work and reflect on how my findings have advanced the knowledge of epigenetics, bladder cancer and NGS data analysis.

CHAPTER 2

COMPREHENSIVE CHARACTERIZATION OF MUTATIONS, EPIGENETIC CHANGES AND GENE EXPRESSION IN HUMAN UROTHELIAL CELL CARCINOMA CELL LINES

INTRODUCTION:

Urothelial Cell Carcinoma (UCC) ranks fifth among all cancers in Western countries and results in approximately 380,000 new cases and 150,000 deaths per year worldwide (Ferlay et al., 2010).
UCC tumors are staged using the Tumor-Node-Metastasis system (TNM system) describing the extent of surrounding tissue invasion (Tis-T4) (Figure 1.1) (Sobin et al., 2009). Although 75% of patients have non muscle-invasive UCCs, it is estimated that 60% of these will have a recurrent tumor while 10-15% progress to muscle-invasive UCC (Babjuk et al., 2013). Activating mutations in FGFR3 occur in 68-88% of non muscle-invasive UCC Ta tumors and are linked to lower risk of recurrence (Billerey et al., 2001; van Rhijn et al., 2002). Mutations in TP53 have been linked to smoking in muscle-invasive cancer and these mutations are predominantly mutually exclusive to mutations in FGFR3 (Bakkar et al., 2003; Wallerand et al., 2005). Due to the molecular events causing disease progression and the limited treatment options available, patients with muscle-invasive UCC typically undergo radical cystectomy. There are many studies that characterize genetic aberrations of muscle-invasive UCC through whole-exome sequencing (WES) (Alexandrov et al., 2013; Balbás-Martínez et al., 2013; Gui et al., 2011; Guo et al., 2013; Iyer et al., 2013; Kandoth et al., 2013; Network, 2014a; Ross et al., 2014). UCC is heterogeneous in its mutational landscape compared to other types of cancer, which typically have aberrations in a specific group of genes with similar functions (Kandoth et al., 2013). Frequently mutated genes in muscle-invasive UCC include STAG2 (13%), ARID1A (22%), MLL2 (17%), KDM6A (28%), CREBBP (15%), EP300 (13%), TP53 (41%), TSC1 (8%), RB1 (15%), HRAS (8%), FGFR3 (13%) and PIK3CA (18%) (Forbes et al., 2011; Gui et al., 2011; Network, 2014a). In addition, UCCs have frequent amplification of regions containing oncogenes like CCND1, ERBB2 and MDM2 and loss of heterozygosity (LOH) in regions containing tumor suppressors like CDKN2A, DBC1, PTEN, RB1, TP53 and TSC1 (Forbes et al., 2011; Goebell and Knowles, 2010).
Interestingly, chromatin remodeling genes, including KDM6A, MLL2, CREBBP, EP300 and ARID1A, are frequently mutated in UCC, potentially causing downstream epigenetic changes (Gui et al., 2011; Network, 2014a). Besides the genetic changes, there are many DNA methylation changes that occur in UCC. DNA methylation alterations are involved in both the initiation and progression of carcinogenesis (Goodman and Watson, 2002). DNA hypermethylation can occur at promoters of tumor suppressor genes to silence them, while global DNA hypomethylation can lead to genomic instability (Putiri and Robertson, 2011). DNA methylation in gene bodies is also correlated with gene expression and is implicated in cancer (Kulis et al., 2012; Maunakea et al., 2010; Varley et al., 2013; Yang et al., 2014). Methylation-sensitive arbitrarily primed PCR studies have revealed altered DNA methylation patterns in UCC (Gonzalgo et al., 1997; Liang et al., 2002b; Liang et al., 1998; Markl et al., 2001). UCC patients have high levels of de novo DNA methylation occurring at promoter CpG islands, suggesting epigenetic silencing (Byun et al., 2007; Friedrich et al., 2005; Salem et al., 2000a; Salem et al., 2000b; Wolff et al., 2010b; Wolff et al., 2008). Additionally, those changes can be detected in urine sediments (Friedrich et al., 2004; Su et al., 2014). There is a high level of DNA hypermethylation that occurs when low stage tumors advance into muscle-invasive disease, although such progression is rare and occurs in only 10-15% of recurrent UCC patients (Bilgrami et al., 2014). The following genes are impacted by either genetic or epigenetic alterations causing their aberrant expression in UCC: STAG2, ARID1A, MLL2, KDM6A, CREBBP, EP300, HRAS, FGFR3, PIK3CA, RASSF1A, APC, MGMT, PTCH, TSC1, RB1, PTEN, TP53, DAPK, FHIT, CDH1, CDKN2B and CDKN2A (Cairns, 2007; Knowles, 2007). To comprehensively study the genetic and epigenetic alterations in UCC, we took advantage of available UCC cell lines.
In this study we characterized the genetic and epigenetic aberrations in 15 different UCC cell lines. We identified mutated genes as well as DNA copy number alterations using WES. Genome-wide DNA methylation and chromatin accessibility were detected with the 450K DNA methylation platform. Additionally, gene expression information was incorporated into the analysis, showing the possible downstream effects of the epigenetic and genetic alterations. These findings were further validated with muscle-invasive UCC patient data from The Cancer Genome Atlas (TCGA) for biological relevance.

MATERIAL AND METHODS:

Cell Culture

The majority of the cell lines were obtained from muscle-invasive UCC tumors, except for RT4 and UROTSA (Table 2.2). LD71, LD137, LD583, LD605, LD611, LD630 and LD692 were maintained in DMEM (Dulbecco's Modified Eagle’s Medium) containing 1% penicillin/streptomycin and 10% inactivated fetal bovine serum. HT-1376 and UM-UC-3 were maintained in MEM (Minimum Essential Medium) containing 1% penicillin/streptomycin and 10% inactivated fetal bovine serum. J82, TCCSUP and SCaBER were maintained in MEM containing 1% penicillin/streptomycin, 10% inactivated fetal bovine serum, 1% MEM non-essential amino acids and 1mM sodium pyruvate. 5637 was maintained in RPMI (Roswell Park Memorial Institute) 1640 medium containing 1% penicillin/streptomycin and 10% inactivated fetal bovine serum. RT4 and T24 were maintained in McCoy’s 5A medium containing 1% penicillin/streptomycin and 10% inactivated fetal bovine serum. UROTSA was maintained in Low Glucose DMEM containing 1mg/mL glucose, 1% penicillin/streptomycin and 10% inactivated fetal bovine serum. The cell lines used in this study are listed in Table 2.2.

Illumina Infinium HM450 DNA Methylation Assay

DNA methylation was assessed using Illumina’s Infinium HumanMethylation450 (HM450) BeadChip and was performed at the USC Epigenome Center according to the manufacturer's specifications.
The array examines the DNA methylation status of 482,421 CpG sites and each is reported as a beta value, ranging from 0 (unmethylated) to 1 (fully methylated). CpG probes that had a detection p-value greater than 0.05, were located within 15 base pairs of a single-nucleotide polymorphism, mapped to multiple locations or were on the sex chromosomes were excluded from further analysis in all samples, leaving 385,826 probes.

AcceSssIble Assay

Nuclei preparation and CpG methyltransferase treatment were performed as described previously (Pandiyan et al., 2013). DNA was purified by phenol/chloroform extraction and ethanol precipitation. 1µg of DNA was bisulfite converted using the EZ DNA Methylation™ Kit (Zymo Research).

Whole Exome Sequencing

5µg of genomic DNA from cell lines was sent to Genewiz for whole exome sequencing. Exome library enrichment was performed with Agilent SureSelect Exome library preparation following the manufacturer's specifications. Sequencing was performed on Illumina’s HiSeq2500 platform in High Output mode in a 2x100bp paired-end (PE) configuration. Each sample had a minimum of 120 million reads and an average exonic coverage of 30X per base.

Detection of Somatic Mutations

High-quality paired-end reads were gap-aligned to the NCBI human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA). Local realignment of BWA-aligned reads was performed using the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). Somatic mutations were called based on the BWA alignments by VarScan (Koboldt et al., 2009) following these heuristic rules: (i) samples were covered sufficiently (≥10x); (ii) average base quality was no less than 20; (iii) variants were represented in at least 20% of the total reads from the sample and (iv) variants had at least 3 reads in the sample. The same criteria were used for the somatic indels called by GATK.
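The four heuristic rules above can be expressed as a simple filter. The following Python sketch is illustrative only and is not the VarScan or GATK implementation; the `VariantCall` record and the `passes_heuristics` function are hypothetical names introduced here, and only the threshold values come from the text.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical summary of one candidate variant in one sample."""
    depth: int                # total reads covering the site
    variant_reads: int        # reads supporting the variant allele
    avg_base_quality: float   # average base quality at the site


def passes_heuristics(call, min_depth=10, min_base_quality=20,
                      min_variant_fraction=0.20, min_variant_reads=3):
    """Apply rules (i) to (iv) from the text to a single candidate variant."""
    if call.depth < min_depth:                      # (i) coverage >= 10x
        return False
    if call.avg_base_quality < min_base_quality:    # (ii) base quality >= 20
        return False
    if call.variant_reads / call.depth < min_variant_fraction:  # (iii) >= 20% of reads
        return False
    return call.variant_reads >= min_variant_reads  # (iv) >= 3 variant reads
```

For instance, a site covered by 50 reads with 12 variant-supporting reads and an average base quality of 30 passes all four rules, whereas a site with 10 variant reads out of 100 fails rule (iii).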
False positive calls were further reduced by removing variants that met any of the following filtering criteria: (i) Phred-like scaled consensus scores and SNP quality scores of <20; (ii) mapping qualities of variants <30; (iii) indels found in only one DNA strand; and (iv) variants within 30bp of an indel. Previously described germline variants with a minor allele frequency >1% were eliminated by cross-referencing somatic mutations to dbSNP (version 138) and SNP data sets from the 1000 Genomes Project. Variants were annotated using ANNOVAR (Wang et al., 2010a).

Detection of DNA Copy Number Alterations

Aligned BAM files from whole-exome sequencing data were used for determining DNA copy number alterations. Normal urothelium sequencing data from 4 different patient samples were used as a control for copy number variation (CNV). FASTQ files from the 4 patients were combined and aligned to the genome using BWA, duplicate reads were removed, and the alignments were annotated using hg19. BED files were then created from the BAM files using an algorithm from the bedtools package that calculates, for each base, the ratio of reads to the average number of reads in a region. BED files from the normal urothelium were then compared to BED files produced for each cell line using the program Control-FREEC (Boeva et al., 2012; Boeva et al., 2011). The following rules were followed to calculate CNVs: a minimum of 50 reads in the control sample, step sizes of 250bp with window sizes of 500bp, a break point threshold of 1.5, and adjustment for GC content and for regions targeted by exome sequencing. CNVs were then deemed statistically significant using the Wilcoxon and Kolmogorov-Smirnov tests with p-values <0.001 as a threshold cutoff.

Publicly Available Data Sets

Expression data for cell lines was obtained from The Cancer Cell Line Encyclopedia (http://www.broadinstitute.org/ccle/home).
Cell line expression data was obtained using Affymetrix U133+2 arrays, with the raw Affymetrix CEL files converted to a single value for each probe set using Robust Multi-array Average (RMA) and normalized using quantile normalization (Irizarry et al., 2003). RNASeqV2 Level 3 data from The Cancer Genome Atlas (TCGA) was obtained from the publicly available data portal (http://cancergenome.nih.gov/dataportal/). Reads were aligned using MapSplice and quantification was performed using RSEM (Wang et al., 2010b). 19 different normal urothelium samples were averaged and used as a reference for calculating the fold change (FC) of each of 182 different UCC samples. Genes in which 20% of the UCC samples had a FC more than 2.5 or less than -2.5 were kept for further analysis. Level 3 processed HM450 DNA methylation data from 130 UCC patients were obtained from TCGA and used in the analysis (Network, 2014a).

Table 2.1 Primer List for Validation of Specific Gene Mutations

Sanger Sequencing
Locus                Primer      Sequence (5' to 3')
KDM6A (NM_021140)    Sense       GCCATTTCAACAGCAACACC
                     Antisense   AAACGGTCCAAATTTCAGCA
ARID1A (NM_006015)   Sense       GCCACACAGTTCCAGCAGAG
                     Antisense   CGGTGATACCGAGATGTCCA

Table 2.2 Urothelial Cell Carcinoma Cell Line Information

Name      Sex   Primary or Metastasis   Stage (Ta-T4)   Grade (1-4)
UROTSA    F     NORMAL                  N/A             N/A
RT4       M     Primary                 T2              2
HT-1376   F     Primary                 T2              3
J82       M     Primary                 T3              3
LD137     M     Primary                 ND              4
LD583     F     Primary                 ND              4
LD605     M     Primary                 ND              3
LD611     F     Metastasis              ND              4
LD630     F     Primary                 ND              4
LD692     M     Metastasis              ND              4
LD71      M     Primary                 ND              3
SCABER    M     Primary                 T4              4
T24       F     Primary                 T3              3
TCC5637   M     Primary                 ND              2
TCCSUP    F     Primary                 T4              4
UM-UC-3   M     Primary                 ND              3

Table 2.3 Epigenetic Related Genes and Respective Categories

Epigenetic Modifying Group: Chromatin Remodeling
Genes (HGNC): ARID1A, ARID1B, CHD1, CHD2, CHD3, CHD4, CHD5, CHD6, CHD7, CHD8, HMG20B, SMARCA1, SMARCA2, SMARCA4, SMARCA5, SMARCAL1, SMARCB1, SMARCC1, SMARCC2, SMARCD1, SMARCD2, SMARCD3, SMARCE1, SRCAP, ASF1A, ASF1B, ASXL1, BMI1, CBX2, CBX4, CBX8, COREST, NCOR1, NCOR2, PHC1, PHC2, PHC3, RBBP4, RBBP5, RBBP7, EED, FACT, GATAD2B, HCF1, HCFC1, HLTF, HPH1, HPH2, HPH3, MEL18, MI-2A, MI-2B, MYSM1, NCOA6, PCGF2, PRDM14, PRDM16, PRDM2, RBAP46, RBAP48, RCOR1, RING1, RING1B, RIZ1, SATB1, SATB2, SCMH1, SSRP1 and SUZ12

Epigenetic Modifying Group: Histone Modifications
Genes (HGNC): ASH1L, ASH2, ASH2L, DOT1L, EHMT2, EP300, EZH1, EZH2, HAT1, HDAC1, HDAC10, HDAC11, HDAC2, HDAC3, HDAC4, HDAC5, HDAC6, HDAC7, HDAC8, HDAC9, JARID1C, JARID2, KAT2A, KAT2B, KAT5, KAT6A, KAT6B, KAT7, KAT8, KDM1A, KDM1B, KDM2A, KDM2B, KDM3A, KDM3B, KDM4A, KDM4B, KDM4C, KDM4D, KDM4DL, KDM4E, KDM5A, KDM5B, KDM5C, KDM5D, KDM5DP1, KDM6A, KDM6B, KMT5B, KMT5C, KMT8, MLL, MLL2, MLL3, MLL4, MLL5, NSD1, NSD2, WHSC1, SET1A, SETD1A, SET1B, SETD1B, SETD2, SETDB1, SUV39H1, SUV39H2, SUV420H1, SUV420H2, BAP1, BRE1, CBP, CREBBP, MYST1, MYST2, MYST3, MYST4, RNF2, RNF20, CXXC1, DPY30, JMJD3, LSD1, MENIN, MEN1, MOF, MORF, PYGO1, PYGO2, SIRT1, USP16, USP21, USP22, WDR5 and WDR82

Epigenetic Modifying Group: DNA Methylation
Genes (HGNC): DNMT1, DNMT3A, DNMT3B, DNMT3L, IDH1, IDH2, MBD1, MBD2, MBD3, MBD3L1, MBD3L2, MBD3L3, MBD3L4, MBD3L5, MBD4, MBD5, MBD6, MECP2, TET1, TET2 and TET3

RESULTS:

Genetic Background of Cell Lines

To investigate the genetic background of the urothelial cell carcinoma (UCC) cell lines we used WES to identify genetic mutations. Sequencing the exonic regions identifies mutations in the coding regions of genes, which lead to direct changes in protein sequences. Sixteen different cell lines were chosen for WES: 1 SV40-immortalized normal urothelium cell line (UROTSA), 1 non-invasive UCC cell line (RT4), 12 invasive UCC cell lines (HT1376, J82, LD137, LD583, LD605, LD630, LD71, SCABER, T24, TCCSUP, TCC5637 and UMUC3) and 2 UCC cell lines derived from UCC metastases (LD611 and LD692) (Table 2.2). LD583 and LD611 are the primary and metastatic tumor cell lines, respectively, derived from the same patient (Markl and Jones, 1998).
From the WES data of the 16 cell lines we observed on average 461 (range 202-1,237) variants in protein coding regions and 339 (range 146-901) variants with functional impacts as defined by a variant effect predictor, snpEFF (Figure 2.1 A and B) (Cingolani et al., 2012). UCC is the 4th most highly mutated cancer, with 1-10 mutations per megabase, whereas melanoma has up to 400 mutations per megabase (Alexandrov et al., 2013; Network, 2014a). Since epigenetic related genes are mutated in UCC, we highlighted those genes that are implicated in chromatin remodeling, histone modification or DNA methylation (Table 2.3) and found an average of 7 (range 2-20) functional variants in epigenetically related genes per cell line (Figure 2.1 C). We then focused on mutations found in 32 genes from the literature that have a high mutational frequency in UCC patient samples (Network, 2014a).

Figure 2.1: Overview of Mutations in UCC Cell Lines. (A) Protein coding mutations per cell line. (B) Mutations per cell line classified by snpEFF impact annotations (excluding silent mutations). (C) Subset of functional mutations per cell line in epigenetic related genes and frequently mutated genes found by TCGA (excludes silent mutations).

[Figure 2.1 panels: (A) All Identified Mutations in Protein Coding Regions; (B) Impacts of Nonsynonymous Mutations (High, Moderate, Low); (C) Mutations in Epigenetic Related Genes and Frequently Mutated Genes. X-axis: the 16 cell lines; y-axis: # of Mutations; legend: INDEL, MISSENSE, NONSENSE, SILENT.]

Figure 2.2: Frequently Mutated Genes in Urothelial Cell Carcinoma. Each row represents a gene and each column is a different cell line. Green font is a normal cell line, orange font is a low stage cell line and blue font is a pair of cell lines from the same patient. Gene insertion/deletion (yellow), missense (blue), nonsense (red), and silent (green) mutations are filled in. The frequency of mutations is shown on the right compared to the frequency of mutations in 130 UCC patient samples from TCGA.

[Figure 2.2 layout: rows (top to bottom) TP53, MLL2, ARID1A, KDM6A, PIK3CA, EP300, CDKN1A, ERCC2, FGFR3, RB1, STAG2, ERBB3, RXRA, ELF3, CDKN2A, ZFP36L1, ZFR2, RHOA; columns (left to right) LD692, UMUC3, LD137, J82, LD583, LD611, HT1376, LD605, LD630, LD71, RT4, T24, TCCSUP, SCABER, TCC5637, UROTSA; side bar chart: % Frequency (0-100) in the cell lines and in TCGA UCC.]

On average there were 3 (range 1-8) functional mutations per cell line in this subset of genes (Figure 2.1 C). A closer look at the specific genes and their types of mutations showed a high rate of missense, nonsense and insertion/deletion (indel) mutations in these frequently mutated genes (Figure 2.2). The frequency of mutations in specific genes from the 15 UCC cell lines matched closely with the mutation frequency of those same genes from a summary of 130 primary high stage UCC tumors (Figure 2.2).
CDKN2A was the only gene with a higher frequency of alterations in the cell lines, which is not surprising because cell lines with homozygous deletions of CDKN2A can be more easily established in vitro (Spruck et al., 1994). As expected, there was a high occurrence of mutations in TP53, the majority of which were non-synonymous mutations. Mutated TP53 and deletion of CDKN2A were previously reported in UMUC3 cells, but we also found them in LD583, LD611 and LD630 cells (Spruck et al., 1994). Low sequence coverage prevented the detection of the TP53 mutation in the LD137 cell line and the 2bp deletion of CDKN2A in the SCABER cell line that we previously reported (Spruck et al., 1994). Some of these highly mutated genes are involved in epigenetic regulation, including ARID1A, EP300, KDM6A and MLL2. We also confirmed findings of homozygous deletions of CDKN2A in 7 out of the 15 UCC cell lines (Figure 2.2). Additionally, we did not observe any cell line with non-synonymous functional mutations in both FGFR3 and TP53, confirming the low frequency of concurrent mutations in these genes (Figure 2.2) (Bakkar et al., 2003). UMUC3 did have a missense mutation in TP53 and a silent mutation in FGFR3, but the latter does not have an impact on the protein.

We confirmed mutations in the epigenetically related genes KDM6A and ARID1A, as defined by gene ontology annotations, using Sanger sequencing in LD71 and T24 cells. These showed a homozygous frame-shift mutation and a heterozygous nonsense mutation in KDM6A, respectively (Figure 2.3 A). The homozygous nonsense mutation in ARID1A was confirmed in LD692 cells (Figure 2.3 B). These mutations are detrimental to the protein structure since they are truncating.

In addition to somatic mutations, we were also interested in DNA copy number changes in the UCC cell lines because regions with DNA amplifications commonly harbor oncogenes while regions of DNA deletions often contain tumor suppressor genes.
Although our lab previously identified allelic losses occurring at chromosomes 9, 11 and 17 in patient samples (Tsai et al., 1990), we wanted to confirm those findings in our 15 cell lines. Regions with copy number alterations can be identified using normal urothelium WES data as a reference. A limitation of using WES data is the restriction to exonic regions, which makes it harder to identify copy number changes between genes. The mean number of copy number variations per cell line was 68 (range 48 to 85), with an average of 18 gains (range 1-34) and 50 losses (range 36-71). The most frequent changes involved partial losses of 4p (100% of cell lines) and gains of 2q (54% of cell lines) (Figure 2.4 and Figure 2.5). Other losses occurring in 7 cell lines or more included 2q (93% of cell lines), 5p (93% of cell lines), 6p (93% of cell lines), 3q (87% of cell lines), 1q (67% of cell lines), 2p (67% of cell lines), 11p (67% of cell lines), 21q (67% of cell lines), 22q (67% of cell lines), 12p (60% of cell lines), 1p (54% of cell lines), 8p (54% of cell lines), 13q (54% of cell lines), 14q (54% of cell lines), 19p (54% of cell lines) and 9p (47% of cell lines) (Figure 2.5). Other gains occurring in 7 cell lines or more included 4q (47% of cell lines) and 5p (47% of cell lines) (Figure 2.4). These findings confirm previous findings from our lab, such as allelic losses at chromosomes 9p and 11p (Tsai et al., 1990). A summary of the location of chromosome losses and gains found in at least 2 different cell lines is diagrammed in Figure 2.6, with the most striking results being partial gains of chromosomes 5 and 8 and losses of chromosomes 3, 4, 8 and 15. Important genes are found in these regions; e.g. CDKN2A is located at chromosome 9p21.3 and was found homozygously deleted in 7/15 cell lines.
In addition, cancer related genes that fell into regions of chromosome losses for at least 2 cell lines included RB1, CREBBP, NCOR1, PTEN, FGFR1/3, WHSC1, TERT, BCR, ARID2, HRAS and SMARCB1. Chromosome regions that were amplified in at least 2 cell lines contained the following cancer related genes: CCND1, EGFR, PPARG, MDM2, ERBB2, CCNE1, MYC, FGFR3, TERT, IL7R, LIFR and MYCL1.

Although the genetic alterations and frequencies of mutations in the cell lines are similar to those found in patients with muscle-invasive UCC, the cell lines cover different stages of the disease. RT4 cells were derived from a patient with a grade 1 tumor and are the only non-invasive cell line. SCABER is a squamous cell carcinoma cell line. UROTSA is an immortalized normal urothelium cell line; it showed the fewest overall mutations but carried one mutation in RHOA, a gene found to be frequently mutated in UCC. Since genetic mutations are only one way of altering gene function, we were also interested in gene expression alterations caused by DNA methylation, because of the potential correlation between genetic and epigenetic alterations and their roles during tumorigenesis.

Figure 2.3: Key Mutations in KDM6A and ARID1A. (A) Validated mutations using Sanger sequencing for KDM6A in LD71 and T24. (B) Validated mutations using Sanger sequencing for ARID1A in LD692. Amino acids (AA) are in blue, DNA is in green and differences are in red text. Homozygous mutations (HOM) and heterozygous mutations (HET) are indicated.
[Figure 2.3 content]

(A) KDM6A
REF AA#     590  591  592  593  594
REF AA      Ser  Ala  Glu  Val  Leu
REF DNA     TCA  GCA  GAA  GTT  CTG
LD71 AA     Ser  Ala  Glu  Phe  *
LD71 DNA    TCA  GCA  GAG  TTC  TGA     Glu592GluPhe*   GA>G   HOM
T24 AA      Ser  Ala  *
T24 DNA     TCA  GCA  TAA               Glu592*         G>T    HET

(B) ARID1A
REF AA#     1858 1859 1860 1861 1862
REF AA      Leu  Ala  Lys  Val  Asp
REF DNA     TTG  GCC  AAG  GTG  GAC
LD692 AA    Leu  Ala  *
LD692 DNA   TTG  GCC  TAG               Lys1860*        A>T    HOM

Figure 2.4: Copy Number Gains in UCC Cell Lines. Each row represents a cytoband and each column is a different cell line. UROTSA was used as a reference and is not included for this reason. Orange font is a low stage cell line and blue font is a pair of cell lines from the same patient. Genomic region amplifications are red. The frequency of copy number gains (% Frequency, 0-100) is shown on the right.

[Figure 2.4 data: 1 = gain, 0 = no gain. Columns, left to right: LD692, LD583, LD611, HT1376, T24, J82, LD605, TCCSUP, LD630, LD137, LD71, SCABER, UMUC3, TCC5637, RT4]
2q11.2     1 1 1 0 1 0 1 0 1 1 1 0 0 0 0
16p13.3    1 0 1 1 1 0 1 0 1 1 1 0 0 0 0
19p12      1 1 1 0 1 0 1 0 1 1 1 0 0 0 0
19p13.11   1 1 1 0 1 0 1 0 1 1 1 0 0 0 0
4q13.2     1 1 1 0 1 0 1 0 1 0 1 0 0 0 0
5p12       1 1 1 1 0 1 0 1 0 1 0 0 0 0 0
5p13.1     1 1 1 1 0 1 0 1 0 1 0 0 0 0 0
5p13.2     1 1 1 1 0 1 0 1 0 1 0 0 0 0 0
17p11.2    1 1 1 1 1 1 0 0 1 0 0 0 0 0 0
18p11.21   1 1 1 0 1 0 1 0 0 1 1 0 0 0 0
19q13.2    1 1 1 0 1 0 1 0 1 0 1 0 0 0 0
19q13.31   1 1 1 0 1 0 1 0 1 0 1 0 0 0 0
5p13.3     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p14.1     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p14.2     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p14.3     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p15.1     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p15.2     1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
5p15.31    1 1 1 1 0 1 0 1 0 0 0 0 0 0 0
6p22.1     0 0 1 1 0 1 0 1 0 0 1 0 1 0 0
5p15.32    1 1 1 1 0 0 0 1 0 0 0 0 0 0 0
5p15.33    1 1 1 1 0 0 0 1 0 0 0 0 0 0 0
11q22.1    1 1 1 0 0 0 0 0 0 0 0 1 0 1 0
11q22.2    1 1 1 0 0 0 0 0 0 0 0 1 0 1 0
13q14.3    1 0 0 0 0 0 1 0 0 1 1 0 1 0 0
18p11.32   1 1 0 0 0 0 1 0 1 1 0 0 0 0 0
19q13.41   1 1 0 0 1 1 0 0 1 0 0 0 0 0 0
1p22.1     0 1 1 1 0 0 0 0 0 0 0 1 0 0 0
4p16.3     1 1 1 0 1 0 0 0 0 0 0 0 0 0 0
7q11.21    1 1 0 0 1 0 0 0 1 0 0 0 0 0 0
11q13.3    0 1 1 0 0 0 0 0 0 0 0 1 1 0 0
11q13.4    0 1 1 0 0 0 0 0 0 0 0 1 1 0 0
11q22.3    1 0 0 0 0 0 1 0 0 0 0 1 0 1 0
13q11      1 0 0 0 1 0 1 0 0 1 0 0 0 0 0
15q21.2    1 1 1 0 0 0 1 0 0 0 0 0 0 0 0
19q13.42   1 1 0 0 1 0 0 0 1 0 0 0 0 0 0
22q11.21   1 0 0 0 1 0 0 0 0 0 0 0 1 0 1

Figure 2.5: Copy Number Losses in UCC Cell Lines. Each row represents a cytoband and each column is a different cell line. UROTSA was used as a reference and is not included for this reason. Orange font is a low stage cell line and blue font is a pair of cell lines from the same patient. Genomic region losses are blue. The frequency of copy number losses (% Frequency, 0-100) is shown on the right.

[Figure 2.5 data: 1 = loss, 0 = no loss. Columns, left to right: J82, LD605, RT4, UMUC3, LD630, LD71, SCABER, LD583, T24, TCC5637, TCCSUP, HT1376, LD137, LD611, LD692]
4p15.32    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
4p16.3     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
6p22.1     1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
11p15.5    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
21q22.2    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
22q12.3    1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2q11.2     1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
5p15.33    1 1 0 1 1 1 1 1 1 1 1 1 1 1 1
6p21.32    1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
6p21.33    1 1 1 1 1 1 1 1 1 1 0 1 1 1 1
3q13.33    1 1 1 0 1 1 1 1 1 0 1 1 1 1 1
21q21.3    1 1 0 0 1 1 1 1 1 1 1 1 1 1 1
12p13.31   1 1 1 1 0 1 1 1 1 1 1 0 1 1 0
13q12.13   1 1 1 1 1 0 0 0 1 0 1 1 1 1 1
13q34      1 1 1 1 1 0 1 1 1 1 1 1 0 0 0
1q32.3     1 1 1 1 1 0 1 0 1 1 1 0 1 0 0
2p12       0 1 1 0 1 1 0 1 1 0 1 0 1 1 1
10q26.3    1 1 1 0 1 0 1 0 1 1 1 1 0 0 1
14q11.2    1 0 1 1 1 1 0 1 0 0 1 1 1 0 1
19q13.43   1 1 1 1 0 1 0 1 1 1 1 0 0 1 0
1p22.2     1 1 1 1 1 1 1 0 1 1 0 0 0 0 0
1q42.2     1 1 1 1 1 0 1 0 1 1 1 0 0 0 0
19p12      1 1 1 0 1 1 0 1 0 0 0 0 1 1 1
22q11.23   0 1 1 0 1 0 0 1 1 1 1 0 0 1 1
1q42.13    1 1 0 1 1 1 1 0 1 1 0 0 0 0 0
8p23.1     1 0 0 1 1 1 1 0 0 0 0 0 1 1 1
8p23.3     1 0 1 1 0 0 1 1 0 0 1 1 0 0 1
9p21.3     0 1 1 1 1 1 0 1 0 0 0 0 0 1 0
12q12      1 0 0 1 0 0 1 1 0 0 0 1 1 1 0
14q32.33   1 1 1 1 0 0 1 0 0 1 0 1 0 0 0
21q22.3    1 0 1 1 0 1 1 0 0 1 0 1 0 0 0
5q33.3     1 0 0 1 1 0 0 0 0 0 1 1 1 0 0
8p11.22    0 0 0 0 0 1 1 1 0 0 1 0 1 1 0
16p13.3    0 1 1 1 0 1 0 0 0 1 0 1 0 0 0

Figure 2.6: Overview of Copy Number Alterations of UCC Cell Lines. Top chromosome map shows regions of copy number gains (red) and the bottom chromosome map shows regions of copy number losses (blue). Only copy number changes occurring in 2 or more different cell lines are summarized.

[Figure 2.6 panels: Copy Number Gains (top) and Copy Number Losses (bottom). X-axis: Chromosome # (1-22); y-axis: position, 0 Mb to 250 Mb.]

Methylation Background of Cell Lines

To investigate the DNA methylation status of the UCC cell lines, we used the Illumina 450K Human DNA Methylation array (450K) to interrogate 482,421 CpG sites. The 450K array covers 99% of RefSeq genes, including promoters, transcribed regions and intragenic regions, as well as 96% of CpG islands (CGI). To capture cancer-specific DNA hypermethylation events we identified unmethylated CpG sites (mean β-value <0.1) from 450K array data of 19 normal urothelium patient samples from TCGA and applied a cut-off of ≥0.3 β-value for the cell line to be considered "hypermethylated".
Cancer-specific DNA hypomethylation events were captured by using methylated CpG sites (mean β-value >0.9) from the same 19 normal urothelium patient samples from TCGA and applying a cut-off of <0.7 β-value for the cell line to be considered "hypomethylated". This yielded 174,530 CpG sites for further analysis. In total, the average number of aberrant DNA methylation events per cell line was 19,553 (range 12,005-30,519), of which a mean of 10,230 CpGs were hypermethylated (range 4,919-24,750) and a mean of 9,323 CpGs were hypomethylated (range 1,736-21,968) (Figure 2.7 A). Generally, there was a larger percentage of hypermethylated CpGs than hypomethylated CpGs, as indicated in the bottom panel of Figure 2.7 A. Cancer-specific DNA hypermethylation was most frequently seen at CpG sites contained in CGI, whereas hypomethylation usually occurred at CpG sites found in non-CGI. On average, 73% of hypermethylated CpG sites per cell line were located in CGI (range 68-79%), and 88% of hypomethylated CpG sites per cell line were located in non-CGI (range 85-96%) (Figure 2.7 B).

Figure 2.7: Overview of DNA Methylation Changes in UCC Cell Lines. (A) Top bar chart shows the relative number of hyper/hypo methylated CpGs in each cell line compared to normal tissue, with percent total shown below. (B) Separation of hyper/hypo methylated CpGs showing CGI context (top) and genomic location (bottom).
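The hyper- and hypomethylation calls described above can be sketched as a per-probe classification. This is a minimal illustration under stated assumptions: the function names, the label strings and the probe identifiers in the example are hypothetical, and only the four beta-value cut-offs (0.1, 0.3, 0.9 and 0.7) come from the text.

```python
def classify_cpg(normal_mean_beta: float, cell_line_beta: float) -> str:
    """Label one CpG site using the cut-offs given in the text.

    Sites that are neither unmethylated (<0.1) nor methylated (>0.9) in
    normal urothelium are treated here as not informative (an assumption).
    """
    if normal_mean_beta < 0.1:
        # Unmethylated in normal tissue: call a gain at beta >= 0.3.
        return "hypermethylated" if cell_line_beta >= 0.3 else "unchanged"
    if normal_mean_beta > 0.9:
        # Methylated in normal tissue: call a loss at beta < 0.7.
        return "hypomethylated" if cell_line_beta < 0.7 else "unchanged"
    return "not_informative"


def count_aberrant(normal_means: dict, cell_line_betas: dict) -> dict:
    """Count hyper- and hypomethylated probes for a single cell line."""
    counts = {"hypermethylated": 0, "hypomethylated": 0}
    for probe, normal_beta in normal_means.items():
        if probe not in cell_line_betas:
            continue  # probe filtered out or missing for this cell line
        label = classify_cpg(normal_beta, cell_line_betas[probe])
        if label in counts:
            counts[label] += 1
    return counts


# Hypothetical probe names and beta values, for illustration only.
normal = {"probe_a": 0.05, "probe_b": 0.95, "probe_c": 0.50, "probe_d": 0.02}
cell_line = {"probe_a": 0.80, "probe_b": 0.40, "probe_c": 0.90, "probe_d": 0.10}
summary = count_aberrant(normal, cell_line)
```

The reported per-cell-line means are consistent with this two-class scheme, since 10,230 hypermethylated plus 9,323 hypomethylated CpGs gives the stated average of 19,553 aberrant events.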
[Figure 2.7 panels: (A) Differentially Methylated CpG Sites Compared to Normal Tissue, shown as # of CpG Sites (0-32,000) and as % of CpG Sites (0-100), with legend Hypermethylated / Hypomethylated. (B) Hypermethylation and Hypomethylation panels showing % of CpG Sites (0-100) by CGI context (CGI / NON CGI) and by genomic location (Body / Other / Promoter). X-axis in all panels: the 16 cell lines.]

CpG sites were also classified into 3 categories: promoter, gene body and other regions.
Interestingly, there was no specific preference for CpG sites located in gene bodies to be hyper- or hypomethylated, with 55% and 58%, respectively. However, we did see a larger percentage of gene promoter hypermethylation than hypomethylation, with 24% and 5%, respectively. In addition, there was a lower percentage of hypermethylated CpG sites in "other" genomic regions compared to hypomethylated CpG sites: 21% and 37%, respectively (Figure 2.7 B).

To investigate potential DNA methylation changes that could influence gene expression we looked at DNA hyper- and hypomethylation occurring at gene promoters. CpG sites that were aberrantly methylated in at least 4 cell lines were examined. In total, 5,323 CpG sites were hypermethylated at gene promoters, representing 2,048 genes, and 1,425 CpG sites were hypomethylated at gene promoters, representing 998 genes (Figure 2.8 A and B, respectively). Although we previously reported that DNA methylation in uncultured tumors does not dramatically change in culture (Markl et al., 2001), we wanted to verify these methylation changes in patient tumors. We interrogated DNA methylation at the same CpG sites using 450K data from 126 UCC patients from TCGA (Figure 2.8 A and B, right panels). Many hypermethylated and hypomethylated CpG sites were present not only in the cell lines but also in primary tumors. The differentially methylated CpG sites were more striking in the cell lines than in the primary tumors, which can be explained by tumor heterogeneity. Thus, we verify again on the global scale that immortalized cell lines reflect the DNA methylation status of primary tumors (Figure 2.8).

Figure 2.8: Summary of Promoter Hyper and Hypo DNA Methylation in UCC. Heatmap of DNA methylation where each row represents a CpG site and each column is a different cell line. Yellow indicates methylated and blue indicates unmethylated. Hyper and hypo methylation events are separated into A and B, respectively.
Normal is the mean of 19 different normal urothelium samples.

[Figure 2.8 panels: (A) Hypermethylation, 5,323 CpG Sites; (B) Hypomethylation, 1,425 CpG Sites. Column groups in each panel: CGI annotation, Normal, the 15 UCC cell lines, and Muscle-Invasive UCC (n=126).]

These data confirmed previous studies regarding global DNA methylation in bladder cancer (Network, 2014a; Wolff et al., 2010b).

Gene Expression Changes in Bladder Cancer Caused by Epigenetic Changes

Having identified which genes were mutated or had promoter methylation changes, we next determined whether these aberrations cause altered gene expression. We used RNA-seq data from TCGA for 182 tumors and compared them to 19 normal urothelium samples (Network, 2014a). A total of 11,369 genes were expressed in either normal urothelium or the tumor samples with minimum RNA-Seq by Expectation-Maximization (RSEM) values greater than 100 (Figure 2.9 A). We calculated the fold-change (FC) of expression of tumor to normal tissue and found 276 genes with a ≥2.5 FC increase and 764 genes with a ≥2.5 FC decrease in UCC (Figure 2.9 A). We compared the top mutated genes in UCC to the differentially expressed genes and found only 84 genes that had both mutations and an expression change. This leaves 956 genes with differential expression patterns that result from other mechanisms, possibly DNA methylation or nucleosome positioning changes (Figure 2.9 B). As we and others have previously shown, DNA methylation alterations at gene promoters are not the only epigenetic alterations influencing gene expression; histone modifications and nucleosome positioning changes also contribute (Bell et al., 2011; Shen and Laird, 2013; You and Jones, 2012).
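The fold-change selection described above, together with the rule from the Methods that 20% of tumors must exceed a 2.5-fold change, can be sketched as follows. Several details are assumptions made for illustration rather than the procedure used for the thesis data: the signed fold-change convention (ratios below 1 reported as negative reciprocals), the pseudocount that guards against division by zero, counting each direction separately, and all function names.

```python
def signed_fold_change(tumor: float, normal_mean: float,
                       pseudocount: float = 1.0) -> float:
    """Signed FC: ratio if >= 1, otherwise the negative reciprocal.

    Assumed convention: a 2.5-fold decrease is reported as -2.5.
    """
    ratio = (tumor + pseudocount) / (normal_mean + pseudocount)
    if ratio <= 0:
        return float("-inf")  # only reachable when pseudocount is 0
    return ratio if ratio >= 1 else -1.0 / ratio


def expression_call(tumor_values, normal_values,
                    threshold: float = 2.5, min_fraction: float = 0.20):
    """Return 'up', 'down' or None for one gene.

    normal_values are averaged into a single reference, as in the text;
    a gene is called when at least min_fraction of tumors pass the cut-off.
    If both directions qualify, 'up' is returned first (arbitrary choice).
    """
    normal_mean = sum(normal_values) / len(normal_values)
    fcs = [signed_fold_change(t, normal_mean) for t in tumor_values]
    needed = min_fraction * len(fcs)
    if sum(fc > threshold for fc in fcs) >= needed:
        return "up"
    if sum(fc < -threshold for fc in fcs) >= needed:
        return "down"
    return None


# Toy RSEM-like values for one hypothetical gene: 2 of 5 tumors are elevated.
call = expression_call([300, 300, 100, 100, 100], [100, 100])
```

The gene counts in the text are internally consistent: 276 up-regulated plus 764 down-regulated genes gives 1,040 differentially expressed genes, which equals the 84 that are also mutated plus the 956 that are not.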
Since nucleosome occupancy is recognized as an important regulator of gene expression by providing or preventing accessibility for transcription machinery, we wanted to assess chromatin accessibility in the cell lines. We used the AcceSssIble Assay developed in our lab (Figure 2.10) (Pandiyan et al., 2013). This assay utilizes the 450K array and an exogenous CpG methyltransferase, M.SssI, to interrogate chromatin accessibility and DNA methylation at the same CpG site by comparison with nuclei not treated with enzyme. CpG sites that are methylated (>0.7 β-value) cannot be analyzed by this method, so chromatin accessibility measurements are limited to regions that are unmethylated (Figure 2.10) (Pandiyan et al., 2013). Although promoters of genes may be unmethylated, indicating the potential for gene expression, transcription factors may nevertheless not be able to bind in these regions.

Using the CpG sites that are aberrantly methylated in the UCC cell lines, we investigated CpG accessibility at those same sites. We used DNaseI hypersensitivity (DHS) data from ENCODE for normal urothelium as a reference because it shows accessible and inaccessible chromatin. Only CpG sites at gene promoters were considered for analysis. We found 2,468 genes which were inaccessible in normal urothelium but accessible in at least 20% of the UCC cell lines (Figure 2.11 A). Interestingly, 17% of those genes were methylated in normal urothelium and were found to be hypomethylated in the cell lines. 8,409 genes were accessible in normal urothelium but became inaccessible in at least 20% of the UCC cell lines (Figure 2.11 B). Only 18% of these genes were unmethylated in normal urothelium and became methylated in the cell lines.

[Figure 2.9 panel labels: (A) 11,369 Total Genes; y-axis log2 FC (Tumor to Normal); 276 Genes 2.5 FC Up in Cancer; 764 Genes 2.5 FC Down in Cancer. (B) Mutated Genes; Differentially Expressed Genes.]

Figure 2.9: Gene Expression Changes in UCC.
(A) Gene expression changes for 11,369 genes sorted from largest fold-change increase in tumors to largest fold-change decrease. The red line represents the average of 182 samples with the standard deviation shown in gray. Blue horizontal dashed lines represent a 2.5 fold increase or decrease cut-off. (B) Venn diagram of the differentially expressed genes (green circle) and the top mutated genes in UCC (red circle).

[Figure 2.9 B Venn counts: 956 differentially expressed only, 84 shared, 884 mutated only.]

Figure 2.10: AcceSssIble Assay Workflow. Nuclei are extracted from cells and split into two fractions: a no enzyme sample and an M.SssI CpG methyltransferase sample. The control no enzyme fraction is used to assess endogenous DNA methylation, while the treated sample interrogates regions of DNA accessible to the CpG methyltransferase. DNA is purified from both fractions and undergoes bisulfite conversion and analysis on the 450K array. Processed β-values from the enzyme treated sample are subtracted from the corresponding β-values from the untreated sample, resulting in a footprint of chromatin accessibility. Regions with endogenous DNA methylation β-values above 0.7 are excluded from analysis due to the inability to detect increased methylation values at these sites.

[Figure 2.10 workflow labels: Isolated Nuclei; Control No Enzyme Treatment; M.SssI Treatment; DNA Purification; Bisulfite Conversion; Infinium 450K Microarray; Endogenous DNA Methylation; Post-M.SssI DNA Methylation; Subtraction In Silico; Chromatin Accessibility. Legend: Unmethylated DNA, Methylated DNA, Accessible DNA, Inaccessible DNA.]

[Figure 2.11 panel labels: 2,468 Genes Accessibility Increasing (17% methylation dependent); 8,409 Genes Accessibility Decreasing (18% methylation dependent). Venn counts for accessibility increasing versus increased expression: 2396, 72, 204. Venn counts for accessibility decreasing versus decreased expression: 7912, 497, 267. 72 Genes Accessibility Increasing and Increased Expression; 497 Genes Accessibility Decreasing and Decreased Expression; pie labels 57% and 47% (Methylation Dependent); groups G1, G2, G3, G4.]

Figure 2.11: Chromatin Accessibility Alterations in UCC Cell Lines.
(A) Pie chart showing all genes with accessibility increases in at least 20% of the UCC cell lines and the fraction with accompanying DNA methylation losses. (B) Pie chart showing all genes with accessibility decreases in at least 20% of the UCC cell lines and the fraction with accompanying DNA methylation gains. (C) Venn diagram of genes which gained accessibility and genes which had increased expression in tumors. (D) Venn diagram of genes which decreased accessibility and genes which had decreased expression in tumors. (E and F) Genes whose expression changes matched epigenetic alterations, shown as pie charts of chromatin accessibility changes independent of and dependent on DNA methylation. G1 are genes which lose expression in the tumors and have become methylated and inaccessible. G2 are genes which also lose expression but only have become inaccessible. G3 are genes which gain expression in the tumors and have become unmethylated and accessible. G4 are genes which also gain expression but only have become accessible.

Overlapping the 276 genes that increased expression in UCC tumors, we found that 72 genes could be explained by epigenetic changes of chromatin accessibility increases with or without DNA hypomethylation (Figure 2.11 C and E). Genes with decreased expression in UCC tumors (497) could be regulated by chromatin accessibility decreases with or without DNA hypermethylation (Figure 2.11 D and F).

A limitation of using cell lines is that there may be artifacts from the cell culture process or from the immortalization method. To assess the biological significance of these epigenetic and genetic changes we compared our cell line data to TCGA's UCC sample set for expression, methylation and mutation information. Taking the DNA methylation and accessibility changes found in the UCC cell lines, we asked whether the methylation changes were consistent with patient data and how the expression of these genes is affected.
We were able to isolate 4 different groups which had epigenetic changes that also correlated with gene expression changes in 435 genes (Figure 2.11 E and F and Figure 2.12). We calculated the change in accessibility of the UCC cell lines compared to DHS data of normal urothelium (Delta Accessibility), the DNA methylation changes of the UCC cell lines and of 126 patient samples compared to normal urothelium (Delta Methylation), and the gene expression fold changes (FC) of 182 patient samples compared to normal urothelium. Group 1 (G1) has DNA methylation and chromatin accessibility dependent changes causing decreased gene expression levels, and contains 343 CpG sites representing 162 different genes. Group 2 (G2) has DNA methylation and chromatin accessibility dependent changes causing increased gene expression levels, consisting of 6 CpG sites representing 6 different genes. Group 3 (G3) is chromatin accessibility dependent without DNA methylation changes, causing decreased gene expression levels, consisting of 780 CpG sites located in 332 different genes. Group 4 (G4) has chromatin accessibility changes without DNA methylation changes causing increased gene expression levels, consisting of 78 CpG sites representing 23 different genes. G1 and G3 represent genes that are normally expressed in the urothelium and predominantly have CGI promoters. G2 and G4 represent genes with low expression in normal urothelium and have only non-CGI promoters (Figure 2.12).

Pathways Altered in UCC

Although some genes with epigenetic changes correlated with expression changes, we wanted to verify the mutation status of those genes by checking the top frequently mutated genes. We used TCGA's UCC mutation data and isolated 968 genes that were mutated in at least 6 different patients and were also interrogated on the 450K array. Only 45 genes were both mutated and epigenetically altered, while 923 genes were solely mutated and 390 were solely epigenetically altered (Figure 2.13 Top Venn Diagram).
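The gene-level overlap reported above is a three-way partition of two gene sets. A minimal sketch of that bookkeeping is shown below; the `venn_counts` function and the placeholder gene names are hypothetical, and the toy sets are not the thesis gene lists.

```python
def venn_counts(mutated, epigenetic) -> dict:
    """Partition two gene collections into the three regions of a Venn diagram."""
    mutated, epigenetic = set(mutated), set(epigenetic)
    return {
        "mutated_only": len(mutated - epigenetic),
        "both": len(mutated & epigenetic),
        "epigenetic_only": len(epigenetic - mutated),
    }


# Placeholder identifiers, for illustration only.
toy_mutated = ["GENE_A", "GENE_B", "GENE_C"]
toy_epigenetic = ["GENE_C", "GENE_D"]
toy_counts = venn_counts(toy_mutated, toy_epigenetic)

# Consistency check on the totals stated in the text.
total_mutated = 923 + 45      # genes mutated in at least 6 patients
total_epigenetic = 390 + 45   # genes with epigenetic and expression changes
```

With the reported counts, the mutated total is 968 and the epigenetically altered total is 435, matching the 968 frequently mutated genes and the 435 genes with correlated epigenetic and expression changes given in the text.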
Interestingly, genes in 7 different pathways that were either mutated or epigenetically altered have previously been implicated in cancer, such as cadherin 11 (CDH11) in retinoblastoma and CDKN2A in UCC (Figure 2.13 Bottom Venn Diagram) (Marchong et al., 2004; Markl and Jones, 1998). Additionally, we previously reported the alternative transcripts of endothelin signaling genes in a melanoma cell line as well as hypermethylation of the gene promoter (Pao et al., 2001; Tsutsumi et al., 1999). The cadherin and Wnt signaling pathways contained genes that were mutated, genes that were epigenetically altered and genes that were both mutated and epigenetically altered (Figure 2.13 Bottom Pathway Lists).

Figure 2.12: Epigenetic Alterations Confirmed in Patient Samples. Heatmap where each row represents a CpG and each column is a different dataset. Leftmost panel shows CGI context. Second panel is the normal accessibility from DHS of normal urothelium from ENCODE. Third panel is the delta accessibility changes compared to normal urothelium for the 15 UCC cell lines. Fourth panel is the normal DNA methylation average of 19 patients. Fifth panel is the delta DNA methylation for the 15 UCC cell lines. Sixth panel shows delta DNA methylation for 126 patients with UCC. Seventh panel is the average log2 expression values for 19 normal urothelium RNA-seq samples. The last panel shows the fold-change expression of tumor to normal of 182 RNA-seq samples.
[Figure 2.12 heatmap. Column labels: UCC ΔACC, UCC ΔMETH, TCGA ΔMETH, TCGA FC RNA-seq. Legends: CGI / Non-CGI; Methylation (0 to 1); ΔMethylation (Loss to Gain); Accessibility (Closed to Open); ΔAccessibility (Loss to Gain); log2 (RSEM) (0 to 20); FC RNA-seq (Loss to Gain). Row groups: G1, G2, G3, G4.]

These genes/pathways have been shown to play important roles in cancer development. Notable pathways that were only affected by epigenetic changes included angiogenesis, apoptosis and interleukin signaling pathways.
VEGF, PDGF, TP53 and hypoxia response pathways were notable pathways found to have been genetically altered alone and have been implicated in UCC previously. These findings provide potentially new therapeutic targets, since not only are genetic alterations occurring in UCC but epigenetic alterations account for a large proportion of the changes.

Figure 2.13: Genetically or Epigenetically Altered Genes and Pathways in UCC. (A) Venn diagram showing overlap of frequently mutated genes and epigenetically altered genes in UCC with expression alterations. (B) Venn diagram showing overlap of pathways affected by mutations and pathways affected by epigenetic alterations. (C) Lists of pathways found in each category that have a p-value < 0.05.
[Figure 2.13 content. Gene Venn diagram: 923 Top Mutated Genes only, 45 shared, 390 Epigenetically Altered Genes only. Pathway Venn diagram: 19 Top Mutated Pathways only, 7 shared, 16 Epigenetically Altered Pathways only.]
Top Mutated Pathways only: Axon guidance mediated by netrin; Beta1 adrenergic receptor signaling; Beta2 adrenergic receptor signaling; Cytoskeletal regulation by Rho GTPase; Endogenous cannabinoid signaling; Hedgehog signaling; G-protein-Gq alpha and Go alpha mediated; Huntington disease; Hypoxia response via HIF activation; Insulin/IGF pathway-protein kinase B; Ionotropic glutamate receptor; Metabotropic glutamate receptor group I; Metabotropic glutamate receptor group III; Nicotinic acetylcholine receptor signaling; Notch signaling; p53 and feedback loops 2; PDGF signaling; VEGF signaling.
Shared pathways: B cell activation; Cadherin signaling; Endothelin signaling; Histamine H1 receptor mediated signaling; Inflammation by chemokine/cytokine signaling; Integrin signaling; Wnt signaling.
Epigenetically Altered Pathways only: 5HT2 type receptor mediated signaling; Alzheimer disease-amyloid secretase; Alzheimer disease-presenilin; Angiogenesis; Apoptosis signaling; Axon guidance mediated by Slit/Robo; CCKR signaling map; Gonadotropin releasing hormone receptor; Histidine biosynthesis; Insulin/IGF pathway-mitogen activated protein kinase kinase/MAP kinase; Interleukin signaling; Oxidative stress response; Oxytocin receptor mediated signaling; Pyrimidine Metabolism; TGF-beta signaling; Toll receptor signaling.

DISCUSSION

Next-generation sequencing technology is one of the pivotal resources to analyze and understand biological systems. The relatively low costs associated with whole-exome sequencing, DNA methylation and gene expression arrays allow for integrated analysis to understand and characterize cancer. Experimental models are very important to study the molecular mechanisms underlying UCC. Here we have performed whole-exome sequencing, DNA methylation arrays and chromatin accessibility experiments on 15 UCC cell lines. We were also able to use publicly available resources for gene expression status and comparisons to normal urothelium and UCC patient samples. We characterized gene mutations, DNA copy number alteration, epigenetic and chromatin status of key genes in UCC. This allows for a comprehensive molecular understanding of the individual UCC cell lines for use as possible model systems to study bladder cancer. Large consortia characterized mutations in many UCC cell lines but only focused on 1,651 genes (Barretina et al., 2012). Although that study was comprehensive for a specific gene list, using exome sequencing we were able to identify protein coding mutations in 15 different UCC cell lines. This allowed us to confirm frequently mutated genes in UCC such as TP53, FGFR3, MLL2, ARID1A and KDM6A and enables individual cell lines to be chosen as specific models for UCC. In addition to mutations we also confirmed copy number alterations that are important in UCC such as the homozygous deletion of CDKN2A. DNA methylation alterations in UCC have been shown to occur genome-wide (Gonzalgo et al., 1997; Liang et al., 2002b; Liang et al., 1998; Markl et al., 2001). Specifically, global hypomethylation decreases genome stability while hypermethylation of tumor suppressors can cause decreased expression (Putiri and Robertson, 2011).
High stage UCC is known to have DNA hypermethylation at gene promoters, which was also observed in the UCC cell lines. Interestingly, some UCC cell lines exhibited a large proportion of DNA hypomethylation, which is thought to contribute to chromosome instability and is more prominent in low stage UCC (Eden et al., 2003; Wolff et al., 2010b). DNA methylation alone is not able to explain changes in gene expression; thus chromatin context was also investigated. Chromatin accessibility was able to explain more expression alterations than DNA methylation context alone, further highlighting the importance of both epigenetic areas for the first time in UCC. The combination of gene expression, DNA methylation, chromatin accessibility and mutational context of genes revealed that the epigenetic and genetic changes may overlap but are usually independent. This implies that only one mechanism is necessary for altering gene expression and that either the epigenetic context or the genetic context can possibly be altered to achieve the same result, especially if the genes belong to the same signaling pathway. This opens the door to potentially novel therapeutic strategies that include epigenetically altered genes, which can contribute to the tumor phenotype. These UCC cell lines provide a basis to study different aspects of the disease, more specifically with an understanding of which genes are mutated or epigenetically affected.

CHAPTER 3
LOW STAGE RECURRENT UCCS HAVE COMMON ANCESTRAL CLONES

INTRODUCTION

Urothelial Cell Carcinoma (UCC) ranks fifth among all cancers in Western countries and results in approximately 150,000 deaths per year and more than 380,000 new cases worldwide (Ferlay et al., 2010). Although 75% of patients have non muscle-invasive UCCs, it is estimated that 50% of these will have a recurrent tumor that could also be pathologically muscle-invasive (Babjuk et al., 2013).
A lower risk of recurrence and better clinical outlook is associated with activating mutations in FGFR3 that occur in 68-88% of non muscle-invasive UCC Ta tumors (Billerey et al., 2001; van Rhijn et al., 2002). TP53 mutations are predominant in muscle-invasive UCC, are linked to smoking and are mostly mutually exclusive of FGFR3 mutations (Bakkar et al., 2003; Wallerand et al., 2005). Recent studies have characterized genetic aberrations of muscle-invasive UCC through whole-exome sequencing (WES) (Alexandrov et al., 2013; Balbás-Martínez et al., 2013; Gui et al., 2011; Guo et al., 2013; Iyer et al., 2013; Kandoth et al., 2013; Network, 2014a; Ross et al., 2014), although few groups have investigated non muscle-invasive UCC (Balbás-Martínez et al., 2013; Nordentoft et al., 2014). Muscle-invasive UCC is heterogeneous in its mutational landscape compared to other types of cancer, which typically have aberrations in a specific group of genes with similar functions (Kandoth et al., 2013). Frequently mutated genes in muscle-invasive UCC include TP53 (41%), KDM6A (28%), ARID1A (22%), PIK3CA (18%), MLL2 (17%), CREBBP (15%), RB1 (15%), EP300 (13%), FGFR3 (13%), STAG2 (13%), HRAS (8%) and TSC1 (8%) (Forbes et al., 2011; Gui et al., 2011; Network, 2014a). Besides the genetic aberrations there are many DNA methylation changes that occur in muscle-invasive UCC. DNA methylation alterations are known to be involved in both the initiation and progression of tumorigenesis (Goodman and Watson, 2002). Muscle-invasive UCC patients show global DNA methylation alterations (Network, 2014a; Wolff et al., 2010b) and high levels of de novo DNA methylation occurring at promoter CpG islands, which suggests epigenetic silencing (Byun et al., 2007; Friedrich et al., 2005; Friedrich et al., 2004; Salem et al., 2000a; Salem et al., 2000b; Wolff et al., 2010b; Wolff et al., 2008).
Although a small percentage (10-15%) of low stage tumors advance to high stage UCC, promoter DNA methylation can increase during this process (Bilgrami et al., 2014). Recurrent non muscle-invasive UCC tumors are thought to come either from a series of carcinogenic insults resulting in independent tumors or from the spread of a single clone via intraepithelial migration or implantation, with evidence supporting both claims (Cheng et al., 2002; Hafner et al., 2001; Sidransky et al., 1992). An early study of recurrent muscle-invasive UCC used X chromosome inactivation to assess clonal expansion from a single transformed cell and found the same X chromosome inactivation in all cases (Sidransky et al., 1992). Additionally, recent studies confirmed those early findings with metachronous UCCs that were all low stage by identifying genetic alterations in 4 patients and finding a common ancestor in all cases (Nordentoft et al., 2014). However, there are no large WES studies on metachronous non muscle-invasive UCCs with epigenetic analysis included for clonal ancestral analysis.

Since the genetic and epigenetic alterations in non muscle-invasive UCC are not well defined, we sought to improve our understanding of these alterations and of the clonal composition of recurrent low-stage UCCs. To comprehensively study both the genetic and the epigenetic alterations of 13 patients with non-invasive UCC, we used WES to identify mutated genes and the 450K DNA methylation array to detect genome-wide DNA methylation alterations. These platforms allowed for the identification of an early ancestral clone in all 13 patients on a genetic and an epigenetic basis. Additionally, the originating ancestral tumor cell of these patients was found to have potentially initiating events with mutations in MLL2 and MUC16 but not FGFR3.
MATERIALS AND METHODS

Illumina Infinium HM450 DNA Methylation Assay

DNA methylation was assessed using Illumina’s Infinium HumanMethylation450 (HM450) BeadChip and was performed at the USC Epigenome Center according to the manufacturer’s specifications. Normalization and β-value calculation were performed using the R package methylumi. The array examines the DNA methylation status of 482,421 CpG sites and each is reported as a β-value, ranging from 0 (unmethylated) to 1 (fully methylated). CpG probes with a detection p-value greater than 0.05, located within 15 base pairs of a single-nucleotide polymorphism, mapped to multiple locations, or on the sex chromosomes were excluded from further analysis in all samples, leaving 385,826 probes. CpG sites with an average β-value <0.1 or >0.9 in the 19 normal urothelium samples were isolated from the 385,826 CpG sites on the array, leaving 174,531 CpG sites for downstream analysis. DNA hypermethylation was defined as CpG sites with a β<0.1 in normal urothelium and >0.3 in the tumor, while DNA hypomethylation was defined as CpG sites with a β>0.9 in normal urothelium and <0.7 in the tumor.

Whole Exome Sequencing

Macrodissected tissue samples were kindly provided by Dr Yong-June Kim as described previously (Kim et al., 2015). 5 µg of genomic DNA from tissues was sent to Genewiz for whole exome sequencing. Exome library enrichment was performed with Agilent SureSelect Exome library preparation following the manufacturer’s specifications. Sequencing was performed on Illumina’s HiSeq2500 High Output mode platform in a 2x100 bp paired-end (PE) configuration. Each sample had a minimum of 120 million reads and an average exonic coverage of 30X per base.

Detection of Somatic Mutations

This process and analysis are comprehensively explained in Chapter 5; briefly, high-quality paired-end reads were gap-aligned to the NCBI human reference genome (hg19) using the Burrows-Wheeler Aligner (BWA).
Local realignment of BWA-aligned reads was performed using the Genome Analysis Toolkit (GATK) (McKenna et al., 2010). Somatic mutations were called based on the BWA alignments by VarScan (Koboldt et al., 2009) following the heuristic rules: (i) samples are covered sufficiently (≥10x); (ii) average base quality is no less than 20; (iii) variants are represented in at least 20% of the total reads from the sample; and (iv) variants have at least 3 reads in the sample. The same criteria were used for the somatic indels called by GATK using PINDEL. Further reduction of false positive calls was accomplished by excluding calls that met the following filtering criteria: (i) Phred-like scaled consensus scores and SNP quality scores of <20; (ii) mapping qualities of variants <30; (iii) indels found in only one DNA strand; and (iv) variants within 5 bp of an indel. Elimination of previously described germline variants with a minor allele frequency >1% was done by cross-referencing somatic mutations to dbSNP (version 142) and SNP data sets from the 1000 Genomes Project. Variants were then annotated using ANNOVAR and SnpEFF (Cingolani et al., 2012; Wang et al., 2010a).

Publicly Available Data Sets

Processed somatic mutation data were collected from The Cancer Genome Atlas (TCGA) for bladder urothelial carcinoma (BLCA), lung adenocarcinoma (LUAD), lung squamous cell carcinoma (LUSC), colon adenocarcinoma (COAD), rectum adenocarcinoma (READ) and kidney renal clear cell carcinoma (KIRC) (Network, 2012a; Network, 2012b, 2013, 2014a, b). Level 3 processed HM450 DNA methylation data for the 19 normal urothelium samples were obtained from TCGA and used in the analysis (Network, 2014a).
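The four heuristic rules listed under Detection of Somatic Mutations can be illustrated with a minimal Python sketch. This is not the VarScan implementation and does not use VarScan's options; the `VariantCall` record, its field names and the function `passes_heuristics` are hypothetical names introduced only for illustration, and the thresholds are the ones quoted in the text.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """Hypothetical per-site summary for one sample (illustration only)."""
    depth: int                # total reads covering the site
    alt_reads: int            # reads supporting the variant allele
    mean_base_quality: float  # average base quality at the site


def passes_heuristics(call, min_depth=10, min_base_quality=20,
                      min_fraction=0.20, min_alt_reads=3):
    """Apply rules (i)-(iv) from the text to a single candidate variant."""
    if call.depth < min_depth:                    # (i) coverage >= 10x
        return False
    if call.mean_base_quality < min_base_quality:  # (ii) base quality >= 20
        return False
    if call.alt_reads < min_alt_reads:            # (iv) at least 3 variant reads
        return False
    # (iii) variant allele present in at least 20% of the reads
    return call.alt_reads / call.depth >= min_fraction


# Made-up candidates, not data from this study.
candidates = [
    VariantCall(depth=30, alt_reads=9, mean_base_quality=35.0),
    VariantCall(depth=30, alt_reads=5, mean_base_quality=35.0),
    VariantCall(depth=8, alt_reads=4, mean_base_quality=35.0),
]
kept = [c for c in candidates if passes_heuristics(c)]
```

In this made-up example only the first candidate is kept: the second falls below the 20% read fraction and the third is covered by fewer than 10 reads.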
Table 3.1 Clinical and Pathological Data for 30 UCC Tumors from 13 Patients

Patient  Tumor   Operation  Stage    Grade  Grade  Same
Number   Number  Date                WHO    ISUP   Area
03       1       04/11/00   T1N0M0   3      2      Yes
03       2       10/27/00   T2N0M0   3      2      Yes
05       1       10/19/07   T1N0M0   3      2      Maybe
05       2       02/18/08   T1N0M0   3      2      Maybe
07       1       10/14/99   T1N0M0   3      2      No
07       2       01/21/02   T2N0M0   3      2      No
10       1       07/20/01   TaN0M0   2      1      NA
10       2       09/09/05   T2N0M0   3      2      NA
12       1       07/09/09   T1N0M0   3      2      Yes
12       2       02/08/10   T3N1M0   3      2      Yes
13       1       12/10/03   T1N0M0   2      ND     Yes
13       2       12/06/04   T1N0M1   3      2      Yes
15       1       02/26/07   T1N0M0   3      2      NA
15       2       09/06/12   T1N3M1   3      2      NA
20       1       04/23/99   T1N0M0   3      2      NA
20       2       07/31/08   T1N0M0   3      2      NA
20       3       10/12/12   T1N0M0   3      2      NA
21       1       10/29/01   TaN0M0   2      1      Yes
21       2       06/26/07   TaN0M0   3      2      Yes
26       1       11/24/10   TaN0M0   2      2      NA
26       2       04/06/11   TaN0M0   2      1      NA
26       3       08/25/11   TaN0M0   1      1      NA
26       4       03/09/12   TaN0M0   2      1      NA
27       1       12/16/05   T1N0M0   3      2      Yes
27       2       10/30/06   T2N0M0   3      2      Yes
28       1       03/05/03   T1N0M0   3      2      NA
28       2       09/20/04   TaN0M0   1      ND     NA
30       1       11/08/99   TaN0M0   1      1      NA
30       2       05/15/01   TaN0M0   2      1      NA
30       3       09/30/02   TaN0M0   2      1      NA

Table 3.2 Metachronous Tumor Locations

Patient  Tumor #1  Tumor #2  Tumor #3  Tumor #4
Number
03       [image]   [image]   NA        NA
05       [image]   [image]   NA        NA
07       [image]   [image]   NA        NA
10       [image]   [image]   NA        NA
12       [image]   [image]   NA        NA
13       [image]   [image]   NA        NA
15       [image]   [image]   NA        NA
20       [image]   [image]   [image]   NA
21       [image]   [image]   NA        NA
26       [image]   [image]   [image]   [image]
27       [image]   [image]   NA        NA
28       [image]   [image]   NA        NA
30       [image]   [image]   [image]   NA

RESULTS

Clinical and Pathological Data

Tumors from 13 different patients who initially presented with low stage recurrent bladder cancer (Ta-T1) were collected in addition to their follow-up tumors, which ranged from 4-162 months after the initial tumor collection (Figure 3.1). Tumor-adjacent histologically normal-appearing bladder tissue was also collected for analysis and comparison. Pathological characteristics are shown in Table 3.1 and Table 3.2, including the tumor locations and summaries by three expert genitourinary pathologists and their findings on the stage and grade (WHO and ISUP standards) for each tumor specimen (Figure 1.1). Patient (03-30) and tumor (1-4) numbers are also indicated and are used throughout the study for clarity.
Surgical reports indicated 5 patients (P03, P12, P13, P21 and P27) had the recurrent tumor in the same anatomical location as the previously resected tumor (Table 3.1 and Table 3.2). Additionally, our dataset contains 5 patients (P03, P07, P10, P12 and P27) who progressed to a higher stage disease (Table 3.1).

Somatic DNA Alterations

Each tumor displayed a large number of DNA alterations, including single-nucleotide variants (SNVs) and insertions/deletions (INDELs). We observed an average of 271 (13-1014) variants per sample in protein coding regions after filtering out variants found in the germline, 1000-genome database and dbSNP142 (Figure 3.2 A) (Abecasis et al., 2012; Sherry et al., 2001). Overall, C>T mutations were the most prevalent SNV in our dataset, with large percentages of C>T mutations in a CpG context accounting for 8-40% of SNVs per sample (Figure 3.2 B).

Figure 3.1: Metachronous Bladder Tumor Timelines for 13 Patients. Time (months) indicating initial tumor collection and all subsequent follow-up tumors from each patient.
[Figure 3.1 plot. x-axis: Follow-Up Time (Months), 0 to 150; one timeline per patient (P03 to P30) with tumor numbers marked.]

Figure 3.2: Somatic Mutations and Mutational Signatures in UCC. (A) Distribution of SNVs and INDELs for the individual tumors from each patient in protein coding regions. (B) SNV mutation type profile as a cumulative bar plot showing different types of transversions or transitions. (C) SNV mutational signature contributions of either APOBEC or Aging. (D) Distribution of nonsynonymous mutations with respect to potential functional impact.
[Figure 3.2 panels. (A) y-axis: # of Mutations (0 to 1200); legend: SILENT, INDEL, MISSENSE, NONSENSE. (B) y-axis: % of SNVs (0 to 100); legend: C>A/G and T>A/G, C>T NonCpG, C>T CpG, T>C. (C) y-axis: Signature Contribution (0.00 to 0.08); legend: Aging, APOBEC. (D) y-axis: # of Mutations (0 to 1000); legend: Low, Moderate, High. x-axis in all panels: tumors (1 to 4) grouped by patient (P03 to P30).]

Figure 3.3: Mutational Signatures Identified from Tumor Samples. Aging (A) and APOBEC (B) signatures are displayed according to the 96 different substitution classifications with respect to trinucleotide context. The percentage of SNVs contributing to each signature is shown as a summary of all 30 WES UCC tumor samples.
[Figure 3.3 panels. x-axis: trinucleotide contexts grouped by substitution type (C>A, C>G, C>T, T>A, T>C, T>G); y-axis: Signature Contribution (0.0 to 2.0). Panel titles: (A) Aging Identified Mutational Signature; (B) APOBEC Identified Mutational Signature.]

These SNVs arise at sites that become spontaneously deaminated, a process that normally operates in the germ line but occurs at a higher level in cancer (Pfeifer, 2006; Welch et al., 2012). We further analyzed the mutations in the trinucleotide context of each SNV from patients to identify mutational signatures.
This analysis identified two distinct mutation spectra: the mutations attributed to aging, consisting of C>T mutations in a CpG context, and the APOBEC mutation signature, C>T/G in a TCW context where W is an A or T (Figure 3.2 C and Figure 3.3). The APOBEC signature is caused by overactivity of members of the APOBEC family of cytidine deaminases (Di Noia and Neuberger, 2007; Nik-Zainal et al., 2012). Thirteen of 30 tumors were enriched for the APOBEC mutation signature and in most cases the mutation signature contribution, whether it was aging or APOBEC, was consistent between each patient’s tumors (Figure 3.2 C). To assess the potential function of protein coding mutations we isolated nonsynonymous mutations and characterized them according to predicted impact (Figure 3.2 D). On average we found 204 (12-924) potentially functional mutations per tumor, with an average of 41 (1-568) mutations per sample with a high impact. P20’s 1st tumor showed a large number of INDELs, possibly due to the high level of DNA damage and error-prone double-strand break repair that is seen in UCC (Bentley et al., 2004; Morrison et al., 2014). We identified 40 genes that were mutated in at least 20% of the UCCs (Figure 3.4). MLL2 (63%), MLL3 (23%), WNK1 (23%), KDM6A (20%) and TP53 (20%) were among the top mutated genes that were previously found significantly mutated in muscle-invasive UCC (Gui et al., 2011; Network, 2014a).

In summary, our somatic mutation findings are consistent with other whole-exome sequencing studies on non muscle-invasive UCC. The total number of observed mutations is similar to other recent studies on low-stage non muscle-invasive UCC and also lower than high-stage muscle-invasive UCC (~300) (Network, 2014a; Nordentoft et al., 2014). Muscle-invasive UCC was found to have different mutational signatures, which include a signature of aging and also APOBEC, both of which we also identified in our non muscle-invasive UCC patient dataset.
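The two sequence contexts named above (C>T at CpG for the aging signature, C>T/G at TCW for the APOBEC signature) can be made concrete with a small Python sketch. It only labels a single SNV by its trinucleotide context; it is not the signature-extraction procedure used in this study, and the function names `to_pyrimidine` and `label_snv` and the label strings are assumptions introduced for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def to_pyrimidine(context, alt):
    """Express an SNV with a C or T reference base (96-class convention).

    context is the 3-base reference sequence centered on the mutated base;
    alt is the variant base. Purine-reference SNVs are reverse-complemented.
    """
    context, alt = context.upper(), alt.upper()
    if context[1] in "CT":
        return context, alt
    return context.translate(COMPLEMENT)[::-1], alt.translate(COMPLEMENT)


def label_snv(context, alt):
    """Label one SNV by the contexts described in the text (hypothetical labels)."""
    tri, alt = to_pyrimidine(context, alt)
    five_prime, ref, three_prime = tri
    if ref == "C" and alt == "T" and three_prime == "G":
        return "aging-like"    # C>T in a CpG context
    if (ref == "C" and alt in ("T", "G")
            and five_prime == "T" and three_prime in ("A", "T")):
        return "APOBEC-like"   # C>T/G in a TCW context, W = A or T
    return "other"
```

For example, under these rules `label_snv("TGA", "A")` is first rewritten as a C>T change in a TCA context and is therefore labeled APOBEC-like, while `label_snv("ACG", "T")` is labeled aging-like.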
In addition, our data confirmed findings in which approximately 1 out of 3 UCC patients have APOBEC mutational signatures in non muscle-invasive UCC (Network, 2014a; Nordentoft et al., 2014). We found mutations in genes from studies of muscle-invasive UCC in our non muscle-invasive UCC dataset, indicating them as potential driver mutations.

Figure 3.4: Frequently Mutated Genes in UCC. Forty genes that are mutated in the UCC tumors at a minimum frequency of 20%. Genes having statistically significant levels of mutation in muscle-invasive UCC are shown by *.
[Figure 3.4 bar chart. x-axis: Frequency (%), 0 to 100. Genes, in axis order: ZNF595, VRTN, TP53, TERF2IP, SYNE2, SLC38A1, SIPA1, SCN4A, RYR2, PKHD1L1, MAGEB6, KDM6A, IL1RAP, HECW1, GAA, FUK, DHX37, CMYA5, CDR2L, CCDC18, CBLC, C17orf70, WNK1, SRCAP, MUC12, MLL3, DYNC2H1, DNAH6, CLSTN2, LYST, GPR98, LAMB1, TTN, PLEC, MUC4, HRNR, ATN1, MUC6, MUC16, MLL2.]

DNA Methylation Alterations

In addition to somatic mutations, we were also interested in DNA methylation alterations and analyzed these on the Infinium 450K array. We compared the tumors to an average of 19 publicly available normal bladder urothelium samples (Network, 2014a). Hypermethylated CpG sites were defined as CpG sites with β<0.1 in the normal urothelium and β>0.3 in the tumor, while hypomethylated CpG sites were defined as CpG sites with β>0.9 in the normal urothelium and β<0.7 in the tumor. We chose these parameters, and confirmed low leukocyte contamination, to alleviate tumor purity concerns, because the DNA methylation differences could then only be attributed to tumor cell DNA, as previously described (Network, 2014a). A consequence of these strict parameters is that they prevent us from investigating allele-specific DNA methylation changes, although these would already be difficult to distinguish in impure samples. We found 4,086 CpGs to be hypermethylated on average (1,494-8,824) and 6,688 CpGs to be hypomethylated on average (1,262-14,343) (Figure 3.5).
On average 32% of hypermethylated CpGs and 11% of hypomethylated CpGs were in promoters, while 44% of hypermethylated CpGs and 57% of hypomethylated CpGs were in gene bodies per patient (Figure 3.5). In relation to CpG island (CGI) context, 76% of hypermethylated CpGs and 12% of hypomethylated CpGs were found in CGIs (Figure 3.6). DNA hypermethylation occurred predominantly at gene promoters while DNA hypomethylation mainly occurred in gene bodies, which is consistent with previous studies (Gaudet et al., 2003; Jones, 2012; Laird and Jaenisch, 1996; Wolff et al., 2010b).

Figure 3.5: DNA Methylation Aberrations Per Tumor. (A) Total number of CpG sites hypermethylated (β<0.1 in normal and β>0.3 in tumor, yellow) and hypomethylated (β>0.9 in normal and β<0.7 in tumor, blue) compared to normal urothelium per tumor sample. (B) Distribution of locations for hypermethylated (top) and hypomethylated (bottom) CpG sites with respect to promoter (maroon), gene body (orange) and all other locations (gray).
[Figure 3.5 panels. (A) y-axis: # of CpGs (0 to 18000); legend: Hypermethylated, Hypomethylated. (B) Panel titles: Hypermethylation of CpG Sites, Hypomethylation of CpG Sites; y-axis: % of CpG Sites (Differentially Methylated CpG Sites Compared to Normal Tissue), 0 to 100; legend: Promoter, Gene Body, Other. x-axis in all panels: tumors (1 to 4) grouped by patient (P03 to P30).]

Figure 3.6: CGI Context of DNA Methylation Aberrations Per Tumor. Distribution of locations for hypermethylated (top) and hypomethylated (bottom) CpG sites with respect to CGI (green) or non-CGI (gray).
[Figure 3.6 panels. Panel titles: Hypermethylation of CpG Sites, Hypomethylation of CpG Sites; y-axis: % of CpG Sites (Differentially Methylated CpG Sites Compared to Normal Tissue), 0 to 100; legend: CGI, NON. x-axis: tumors (1 to 4) grouped by patient (P03 to P30).]

Figure 3.7: Genes with Promoter Methylation Changes in UCC. (A) 64 genes that have promoter hypermethylation occurring at a minimum frequency in the UCCs of 60%. (B) 43 genes that have promoter hypomethylation occurring at a minimum frequency in the UCCs of 60% (right).
[Figure 3.7 bar charts. x-axis: Frequency (%), 0 to 100.
(A) Hypermethylation, genes in axis order: ZIC1, TOX2, PHYHIPL, NPY, LOC645323, HSPA1L, HOXA7, GDF7, GABRA2, DRD4, ADCY8, ZNF418, TMEM106A, T, NRN1, MIR129-2, IGF2BP1, DDX25, VSX1, SLIT2, SIX6, RASSF1, RALYL, GRP, EYA4, CSDAP1, C1QL2, C17orf104, ZNF177, ZDBF2, PITX2, MYO15B, MIR137, KCNA3, GHSR, GABRA4, CCDC81, WBSCR17, SOX17, PCDHB4, NKX2-4, HOXA9, ADCYAP1, SALL3, RSPO2, PCDHB15, OLIG2, MIR124-2, HIST1H4F, DCC, BHLHE23, PCDHGB7, PCDHAC1, PAX1, GALR1, BOLL, PCDH8, MSC, CDO1, UNCX, SLC32A1, MARCH11, NKX2-6, PENK.
(B) Hypomethylation, genes in axis order: TKTL2, PRM2, PRKACA, OR51I2, MYH13, MIR432, KIF2B, KCNIP4, ELMO1, DCAF4L2, DEC1, SLC6A16, SLC30A8, OR5D18, MIR410, MIR1974, LOC644936, LOC388965, FMO3, COX8C, C13orf16, OR6T1, NUDT9P1, MIR549, DPPA2, CCDC42, SLC17A2, MYADML, MIR539, LOC285954, DEFB133, NTF3, REG1A, ODZ3, NACAP1, MIR412, MIR409, LOC729678, TM4SF19, THSD7B, CEACAM19, MITF, CRP.]

To investigate whether DNA methylation changes at gene promoters occurred at genes previously implicated in bladder cancer, we focused on the most frequently hypermethylated and hypomethylated promoters in our dataset (Figure 3.7). We found 64 genes that had promoter hypermethylation occurring in at least 60% of the tumors (18 of 30 non muscle-invasive UCCs). Interestingly, pathway analysis revealed cadherin signaling pathway members PCDH8 (83%), PCDHAC1 (80%), PCDHGB7 (80%), PCDHB15 (77%) and PCDHB4 (73%), suggesting the potential silencing of multiple genes in this pathway by DNA methylation (Figure 3.7). We also detected hypermethylation of RASSF1 (66%), which was previously found to be hypermethylated in UCCs and used as a detection marker for UCCs from urine sediments (Dulaimi et al., 2004). Additionally, we identified 43 genes with hypomethylated promoters occurring in at least 60% of the tumors (18 of 30), although no specific pathways associated with these were identified (Figure 3.7). These results, in addition to others, show that DNA hypermethylation can possibly drive tumorigenesis and/or progression in bladder cancer (Heyn and Esteller, 2012; Sharma et al., 2010).
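The β-value definitions of hypermethylation and hypomethylation used in this section can be written out as a short Python sketch. It is a sketch under stated assumptions rather than the analysis code of this study: the function names `classify_cpg` and `count_calls`, the `"not_called"` label and the probe identifiers are hypothetical, and the β-values are made up for illustration.

```python
def classify_cpg(beta_normal, beta_tumor):
    """Call one CpG site from its mean normal β-value and its tumor β-value."""
    if beta_normal < 0.1 and beta_tumor > 0.3:
        return "hypermethylated"   # unmethylated in normal, gained in tumor
    if beta_normal > 0.9 and beta_tumor < 0.7:
        return "hypomethylated"    # fully methylated in normal, lost in tumor
    return "not_called"


def count_calls(normal_mean, tumor):
    """Tally calls for one tumor; both arguments map probe ID to β-value."""
    counts = {"hypermethylated": 0, "hypomethylated": 0, "not_called": 0}
    for probe, beta_normal in normal_mean.items():
        if probe in tumor:  # skip probes missing from the tumor sample
            counts[classify_cpg(beta_normal, tumor[probe])] += 1
    return counts


# Made-up probe IDs and β-values, not data from this study.
normal_mean = {"probe_a": 0.05, "probe_b": 0.95, "probe_c": 0.50, "probe_d": 0.04}
tumor = {"probe_a": 0.45, "probe_b": 0.60, "probe_c": 0.90, "probe_d": 0.20}
calls = count_calls(normal_mean, tumor)
```

Because a call requires the normal β-value to be below 0.1 or above 0.9, a site such as `probe_c` with intermediate methylation in normal tissue is never called, whatever its tumor value; this is the strictness that the text relates to tumor purity.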
Clonal Evolution Analysis Identifies an Ancestral Clone in Every Patient To gain insight into whether metachronous tumors from one patient came from a common ancestral clone, we compared mutations and DNA methylation alterations of each patient’s tumors (Figure 3.8, Figure 3.9 and Figure 3.10). For clarity, mutations that are common/ancestral from the patient’s metachronous tumors are deemed “public” and those that are unique are considered “private.” Figure 3.8: P03-P12 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes. Evolution trees for each patient representing all nonsynonymous mutations as well as DNA hypermethylation and DNA hypomethylation changes. For each tree the total number of events is written at the bottom. The percentage of events contributing to the ancestral branch, (black), private tumor 1 branch (green) and private tumor 2 branch (blue) is shown. Mutations Hypermethylation Hypomethylation 20% 40% 40% n=6725 7.2% 90% 3.2% n=1262 2.8% 51% 46% n=6389 60% 16% 24% n=4439 27% 49% 24% n=4200 43% 18% 39% n=3399 1.5% 96% 2.3% n=8824 20% 14% 67% n=2677 27% 25% 48% n=4150 50% 18% 32% n=2839 12% 87% 1.1% n=93 19% 76% 4.8% n=83 17% 19% 64% n=119 48% 34% 17% n=99 56% 32% 12% n=329 P05 4 mo. P10 50 mo. P03 7 mo. P07 27 mo. P12 7 mo. 79 9.1% 30% 61% n=670 n=973 16% 16% 68% n=14116 n=8045 n=3285 34% 13% 53% n=4908 80 Mutations Hypermethylation Hypomethylation P15 66 mo. P21 68 mo. P13 12 mo. P20 162 mo. 16% 9% 21% 19% 35% 22% 8% 23% 15% 32% 1.6% 1.7% 1.8% 94% 0.3% 6.1% 66% 28% n=9354 34% 33% 33% n=5692 21% 16% 63% n=7977 51% 18% 31% n=1494 28% 33% 39% n=314 18% 23% 59% n=148 Figure 3.9: P13-P21 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes. Evolution trees for each patient representing all nonsynonymous mutations as well as DNA hypermethylation and DNA hypomethylation changes. For each tree the total number of events is written at the bottom. 
The percentage of events contributing to the ancestral branch, (black), private tumor 1 branch (green), private tumor 2 branch (blue) and private tumor 3 branch (yellow) is shown. 14% 32% 54% n=7529 48% 7.2% 45% n=2225 n=5113 n=1973 81 Mutations Hypermethylation Hypomethylation P28 19 mo. P27 11 mo. P30 35 mo. 64% 4% 19% 12% 26% 2% 64% 8% 24% 13% 25% 5% 33% 0.069% 98% 1.8% n=2876 5.3% 70% 25% n=2749 17% 36% 48% n=316 12% 21% 68% n=183 n=323 Figure 3.10: P26-P30 Evolution Trees Inferred by Somatic Mutations and DNA Methylation Changes. Evolution trees for each patient representing all nonsynonymous mutations as well as DNA hypermethylation and DNA hypomethylation changes. For each tree the total number of events is written at the bottom. The percentage of events contributing to the ancestral branch, (black), private tumor 1 branch (green), private tumor 2 branch (blue), private tumor 3 branch (yellow) and private tumor 4 branch (purple) is shown. n=393 n=3483 P26 16 mo. 20% 5% 8% 5.3% 50% 7% 4% 22% 7% 16% 22% 10% 9% 14% 45% 7% 17% 0.3% 1.3% 16% 13% n=14353 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 3 = 1 2 = 1 2 3 4 = 1 2 = 1 2 = 1 2 3 0 200 400 600 800 1000 # of Mutations LOW MODERATE HIGH P03 P05 P07 P10 P12 P13 P15 P20 P21 P26 P27 P28 P30 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 3 = 1 2 = 1 2 3 4 = 1 2 = 1 2 = 1 2 3 0 50 100 % of SNVs C>A/G and T>A/G C>T NonCpG C>T CpG T>C P03 P05 P07 P10 P12 P13 P15 P20 P21 P26 P27 P28 P30 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 3 = 1 2 = 1 2 3 4 = 1 2 = 1 2 = 1 2 3 0 400 800 1200 # of Mutations INDEL MISSENSE NONSENSE SILENT P03 P05 P07 P10 P12 P13 P15 P20 P21 P26 P27 P28 P30 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 = 1 2 3 = 1 2 = 1 2 3 4 = 1 2 = 1 2 = 1 2 3 0.00 0.04 0.08 Signature Contribution APOBEC Aging P03 P05 P07 P10 P12 P13 P15 P20 P21 P26 P27 P28 P30 Figure 3.11: Mutations and Mutational Signatures in Respect to Public and Private Branches. 
(A) Distribution of SNVs and INDELs in protein coding regions for the individual patients, separated by public (=) or private tumor branches (1, 2, 3 or 4). (B) SNV mutation type profile as a cumulative bar plot showing the different types of transversions and transitions. (C) SNV mutational signature contributions of either APOBEC or Aging. (D) Distribution of nonsynonymous mutations with respect to potential functional impact.

[Figure 3.12: two cumulative bar plots (Hypermethylation of CpG Sites, Hypomethylation of CpG Sites) for P03-P30, split by public (=) and private (1-4) branches. Axis: % of CpG Sites; legend: Promoter, Gene Body, Other.]

Figure 3.12: DNA Methylation Changes Occurring in Public and Private Branches with Respect to Genomic CpG Location. (A) DNA hypermethylation profile as a cumulative bar plot of CpG sites with respect to promoter (maroon), gene body (orange) and all other locations (gray). (B) DNA hypomethylation profile as a cumulative bar plot of CpG sites with respect to promoter (maroon), gene body (orange) and all other locations (gray). Public DNA methylation changes for each patient are indicated by “=” and private tumors are indicated by tumor number (1-4).

Comparing the public and private mutations, we did not see any dramatic differences with respect to the aforementioned mutation counts, transitions/transversions or mutational signatures of this dataset (Figure 3.11 A-C). Interestingly, the distribution of mutational signatures in the private mutations of some tumors from P03, P05 and P26 differed from that in the public branch and in the private branches of the other tumors from those patients (Figure 3.11 C).
Nonsynonymous mutations in protein coding regions with potential functional roles represented on average 38% (0.3-68%) of the total mutations per patient in the public branch, corresponding to an average of 120 (1-411) public mutations per patient (Figure 3.8, Figure 3.9, Figure 3.10 and Figure 3.11 D). We believe this confirms the clonal evolution of these tumors from a common ancestral cell. Interestingly, P03, P05 and P20 all exhibit an early divergence from the ancestral cell, with only 4, 1 and 3 public mutations, respectively. DNA methylation alterations were also examined for each patient to assess clonal expansion from a common ancestor. Public DNA hypermethylation accounted on average for 36% of alterations per patient, while public DNA hypomethylation accounted on average for 29% of alterations per patient (Figure 3.8, Figure 3.9 and Figure 3.10). Additionally, the public and private DNA methylation changes did not differ in the distribution of CpG sites with respect to genomic location (Figure 3.12). These data further support the aforementioned genetic findings that the metachronous tumors of these patients share a common ancestral cell. Additionally, the high levels and prevalence of DNA methylation alterations point to the possibility that epigenetic changes can precede genetic ones (Eads et al., 2000; Niwa et al., 2010; Shen et al., 2005; Wolff et al., 2010b; Yan et al., 2006).

Public Mutations Indicate Early Events in Bladder Cancer

Next, we wanted to identify genes that were frequently mutated and were found in the public branch of patients. We found 30 genes that were publicly mutated in at least 2 patients (Figure 3.13), including MLL2 (54%), MUC16 (38%), ATN1 (23%), HRNR (23%), SRCAP (23%), TP53 (23%), and WNK1 (23%) (Figure 3.13 and Figure 3.14).
Although MLL2, SRCAP, WNK1 and TP53 have been reported to be significantly mutated in bladder cancer (Network, 2014a; Nordentoft et al., 2014), we see that MLL2 and MUC16 are potentially early mutations in bladder cancer because they appear in the public branches (Figure 3.13 and Figure 3.15). To confirm that mutations in MLL2 and MUC16 are early or potentially driving mutations in bladder cancer, we compared the mutations we identified to the larger cohort of UCCs available from TCGA. Mutations in MLL2 did not show a hotspot in the coding region of the gene when comparing non muscle-invasive to muscle-invasive UCC (Figure 3.15 A). On the other hand, we observed a highly significant (P < 0.0001) localization of mutations in the region of MUC16 coding for the tandem Sperm protein, Enterokinase and Agrin (SEA) domains when comparing non muscle-invasive to muscle-invasive UCC (Figure 3.15 B). To see whether mutations in the SEA domains of MUC16 could be found in other cancers, we looked at 5 different cancers with available somatic mutation data and found that none of them exhibited the same mutational localization (Figure 3.16). This indicates that the mutation localization in the SEA domains may be specific to non muscle-invasive UCC. Other mucin genes also contained mutations, with MUC16 having the most public mutations amongst patients, although no specific domain was targeted in any of the other mucin family members (Figure 3.17 and Figure 3.18). The SEA domains of MUC16 contain sites of glycosylation; by cross-referencing known sites of glycosylation with our mutation data, we found no mutation at a potential post-translational modification site (data not shown).

Figure 3.13: 30 Frequently Mutated Genes Found in the Public Branch. Chart of the top genes with frequent mutations in the public branch. Each column is a different patient and each row is a different gene.
For P20, P26 and P30, public mutations are mutations shared in at least 2 of the metachronous tumors.

[Figure 3.13: chart of public mutation counts per gene (rows) and patient (columns P03, P05, P07, P10, P12, P13, P15, P20, P21, P26, P27, P28, P30). Gene rows: MLL2, MUC16, ATN1, HRNR, SRCAP, TP53, WNK1, ACACB, ARID1A, ATXN1, CDR2L, CLSTN2, DNAH6, DYNC2H1, GPR98, HIPK3, HSPG2, KDM6A, LAMB1, LYST, MLL3, MUC12, MUC4, MUC5B, MUC6, PKHD1L1, PLEC, SH3TC2, SYNE2, TTN.]

Figure 3.14: Frequently Mutated Genes in the Public Branch in 3 or More Patients. Chart of the top genes with frequent mutations in the public branch. Each column is a different patient, while each gene has two rows, with the first indicating public mutations (black) and the second indicating private mutations (green). For P20, P26 and P30, public mutations are mutations shared in at least 2 of the metachronous tumors.

[Figure 3.14: chart of public and private mutation counts per patient (P03-P30) for WNK1, MLL2, MUC16, ATN1, HRNR, SRCAP and TP53.]

Figure 3.15: Non Muscle-Invasive and Muscle-Invasive UCC Somatic Mutations in MLL2 and MUC16. (A) Mutation locations for the MLL2 gene in non muscle-invasive UCC (top) and muscle-invasive UCC (bottom). (B) Mutation locations for the MUC16 gene in non muscle-invasive UCC (top) and muscle-invasive UCC (bottom). Student’s t-test was used to assess domain-specific mutations, with **** indicating p < 0.0001.
[Figure 3.15: mutation maps (N- to C-terminus) for non muscle-invasive and muscle-invasive UCC. (A) MLL2, annotated with the SET domain, F/Y rich domains, PHD-like zinc-binding domain and PHD-finger; marked NS. (B) MUC16, annotated with the SEA domains (function unknown) and the O-glycosylated extracellular domain; marked ****. Legend: non-synonymous variants, INDELs.]

Figure 3.16: Somatic Mutations of MUC16 in 5 Different Cancers. Mutation locations for MUC16 in 5 selected cancers. Student’s t-test was used to assess domain-specific mutations, with *** indicating p < 0.001 and NS indicating not significant.

[Figure 3.16: MUC16 mutation maps (N- to C-terminus) with panel labels Lung Adenocarcinoma, Lung Squamous Cell, Colon and Rectum Adenocarcinoma, Kidney RCC Carcinoma; significance marks NS, NS, NS and ***. Annotations: SEA domains (function unknown), O-glycosylated extracellular domain. Legend: non-synonymous variants, INDELs.]

Figure 3.17: Mucin Family Members are Frequently Mutated in UCC. Chart of the mucin family of genes with frequent mutations in the public branch. Each column is a different patient, while each gene has two rows, with the first indicating public mutations (black) and the second indicating private mutations (green). For P20, P26 and P30, public mutations are mutations shared in at least 2 of the metachronous tumors.

[Figure 3.17: chart of public and private mutation counts per patient (P03-P30) for MUC16, MUC12, MUC4, MUC5B, MUC6, MUC17 and MUC13.]

Figure 3.18: Somatic Mutation Locations of all Mutated Mucin Family Members. Mutation locations for MUC16, MUC12, MUC4, MUC5B, MUC6, MUC13, and MUC17 genes.

[Figure 3.18: mutation maps with VWD and SEA domain annotations for MUC5B (5762), MUC4 (2169), MUC16 (14507 of 22152), MUC13 (512), MUC12 (5478), MUC6 (2439) and MUC17 (4493).]

Epigenetic alterations are potentially as important as genetic alterations in cancer progression but occur more frequently.
We compared the most frequently mutated genes with the most frequently hyper- and hypomethylated gene promoters found in the public branches and found dramatic differences in their frequencies (Figure 3.19). The top 30 genes with promoter hypermethylation or hypomethylation were altered at a frequency above 54%, while the top 30 genes with somatic mutations were altered at a frequency above 20%. Interestingly, the cadherin-signaling pathway genes PCDH8, PCDHGB7, PCDHAC1, PCDHB15 and PCDHB4 were found hypermethylated in the public branch. Although studies have identified mutations in KDM6A and PIK3CA as early events in bladder cancer, mutations in MUC16 and MLL2 have not been implicated as early mutational events (Nordentoft et al., 2014). We checked MUC16 somatic mutation data for various cancers including UCC and found that there was no mutational clustering in those cancers. Since TCGA’s UCC sample set was biased towards high stage UCC (T3-T4) and we did not see the same mutational clustering in the SEA coding domains there, we believe this could indicate a unique aspect of low stage non muscle-invasive UCC. Although mutations in the SEA domains of MUC16 did not disrupt glycosylation sites, these alterations could still prevent the protein and its extracellular domain from folding correctly or prevent the glycosylation of a nearby site (Maverakis et al., 2015). Additionally, early DNA methylation events include silencing of genes involved in the cadherin-signaling pathway.

Figure 3.19: Frequency of Mutations and Promoter Methylation in Public Branches. (A) Top 30 frequently mutated genes in the public branch sorted by frequency. (B) Top 30 frequently hypermethylated gene promoters in the public branch sorted by frequency. (C) Top 30 frequently hypomethylated gene promoters in the public branch sorted by frequency.
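The public/private bookkeeping and the per-gene frequencies reported above can be illustrated with a minimal sketch. It assumes the rule stated in the figure captions (an alteration is public when it is shared by at least 2 of a patient's metachronous tumors, and private when it is unique to one tumor). The function names, variant keys and toy cohort are hypothetical and are not taken from the thesis pipeline.

```python
from collections import Counter


def split_public_private(tumor_variants, min_shared=2):
    """Split one patient's variants into public and private sets.

    tumor_variants maps a tumor label to the set of variant keys found in it.
    A variant is public when it occurs in at least `min_shared` tumors and
    private when it occurs in exactly one tumor.
    """
    counts = Counter(v for variants in tumor_variants.values() for v in set(variants))
    public = {v for v, n in counts.items() if n >= min_shared}
    private = {
        tumor: {v for v in variants if counts[v] == 1}
        for tumor, variants in tumor_variants.items()
    }
    return public, private


def public_frequency(public_genes_by_patient):
    """Percentage of patients in which each gene carries a public alteration."""
    n_patients = len(public_genes_by_patient)
    counts = Counter(g for genes in public_genes_by_patient.values() for g in set(genes))
    return {gene: 100.0 * n / n_patients for gene, n in counts.items()}


# Toy patient with three tumors (hypothetical variant keys).
tumors = {"T1": {"v1", "v2", "v3"}, "T2": {"v1", "v2", "v4"}, "T3": {"v1", "v5"}}
public, private = split_public_private(tumors)
print(sorted(public), {t: sorted(v) for t, v in private.items()})

# Toy cohort of 13 patients in which GENE_A is public in 7 of them.
cohort = {f"P{i:02d}": ({"GENE_A"} if i < 7 else set()) for i in range(13)}
print(round(public_frequency(cohort)["GENE_A"]))  # 7 of 13 patients, about 54%
```

With 13 patients, the reported frequencies of 54%, 38% and 23% are consistent with a gene being publicly altered in 7, 5 and 3 patients, respectively.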
[Figure 3.19: three horizontal bar plots with axis Frequency (%) from 0 to 100. Gene labels as extracted:
(A) Public Mutations: TTN, SYNE2, SH3TC2, PLEC, PKHD1L1, MUC6, MUC5B, MUC4, MUC12, MLL3, LYST, LAMB1, KDM6A, HSPG2, HIPK3, GPR98, DYNC2H1, DNAH6, CNTN4, CLSTN2, CDR2L, ARID1A, ACACB, WNK1, TP53, SRCAP, HRNR, ATN1, MUC16, MLL2.
(B) Public Promoter Hypermethylation: EYA4, CSDAP1, CCDC81, ZDBF2, SOX17, SALL3, RSPO2, PITX2, PCDHB4, PCDHB15, PCDHAC1, PAX1, NKX2-4, HOXA9, HIST1H4F, GHSR, DCC, BOLL, BHLHE23, ADCYAP1, UNCX, PCDHGB7, PCDH8, NKX2-6, MSC, CDO1, SLC32A1, GALR1, MARCH11, PENK.
(C) Public Promoter Hypomethylation: LOC100128554, KCNIP4, GRAMD3, ELMO1, DCAF4L2, CCDC42, C13orf16, PRKACA, OR6T1, OR5D18, NTF3, MYH13, MYADML, MIR1974, LOC644936, DEFB133, COX8C, ODZ3, NACAP1, MIR539, MIR412, MIR409, LOC729678, DPPA2, CEACAM19, TM4SF19, THSD7B, REG1A, MITF, CRP.]

DISCUSSION

Multifocal bladder cancer is thought to originate either from carcinogenic insults resulting in independent tumors or from a single clone spreading via intraepithelial migration (Cheng et al., 2002; Hafner et al., 2001; Sidransky et al., 1992). Both concepts have been extensively researched and there is evidence for each on the genetic and epigenetic levels (Knowles and Hurst, 2015; Wolff et al., 2010b). With the increased popularity of next-generation sequencing, identifying the mechanisms and origins behind multifocal bladder cancer is becoming easier. Deep sequencing of tumors allows for the identification of heterogeneity within a single tumor and can also be used to reconstruct phylogenetic trees of related tumors (Fischer et al., 2014). Genetic alterations and DNA methylation alterations combined can more precisely reconstruct the tumor ancestry of patients with multifocal bladder cancer. Muscle-invasive UCC is genetically well characterized, including the identification of frequent mutations in epigenetic-related genes, but there is limited knowledge of recurrent non muscle-invasive UCC (Network, 2014a; Nordentoft et al., 2014).
Our results indicate the clonal expansion of a single clone in metachronous bladder tumors from 13 patients. Using WES and DNA methylation arrays, we show common ancestral aberrations on both the genetic and epigenetic levels, furthering the idea that UCC arises from a single tumor-initiating event. Furthermore, mutations in MLL2 and MUC16 could be early events in bladder cancer because of the high frequency with which they occur in the public branches of patients’ tumors. Although MLL2 is frequently mutated in muscle-invasive UCC, this has not been shown in non muscle-invasive UCC, nor has MLL2 been described as a potentially early event (Gui et al., 2011; Network, 2014a). Our results also show that mutations occur at a lower frequency than DNA hypermethylation and hypomethylation alterations at gene promoters. This implicates DNA methylation as potentially playing an important role in disease recurrence and/or progression. Interestingly, DNA methylation changes occurring at promoters revealed multiple genes involved in the cadherin-signaling pathway. Cadherin pathway genes have been shown to be silenced by DNA methylation in UCC as well as in other cancer types and play roles in tumor invasion (Kuphal et al., 2009; Wolff et al., 2010b; Yamamoto et al., 2008). We found 5 cadherin-pathway genes (PCDH8, PCDHAC1, PCDHGB7, PCDHB15 and PCDHB4) whose promoters were hypermethylated in the public branch of DNA methylation alterations. Recently, PCDH8 silencing by DNA methylation was shown to correlate with muscle-invasive UCC and could be used as a diagnostic marker for monitoring recurrence in non muscle-invasive UCC (Lin et al., 2013; Lin et al., 2014). Overall, our study reveals the clonal ancestry of UCC patients’ recurrent tumors. This further solidifies the idea of UCC as the outgrowth of a single tumor cell, based on alterations of both the genome and the epigenome.
Additionally, early mutations or DNA methylation changes in key genes could be facilitating the recurrence or progression of bladder cancer.

CHAPTER 4
DNMT3B ISOFORMS WITHOUT CATALYTIC ACTIVITY STIMULATE GENE BODY METHYLATION AS ACCESSORY PROTEINS IN SOMATIC CELLS

INTRODUCTION

DNA methylation is a key epigenetic mechanism that participates in stable gene silencing in key biological processes, including the establishment and maintenance of tissue-specific gene expression patterns, X-chromosome inactivation, silencing of parasitic transposable elements and genomic imprinting (Bird, 2002; Esteller, 2011; Jurkowska et al., 2011). Although DNA methylation is critical for mammalian development, genome-wide studies show that aberrations of normal tissue DNA methylation patterns are a hallmark of cancer and other diseases (Jones and Baylin, 2007; Sharma et al., 2010). Cytosines in a CpG context are methylated by the transfer of a methyl group from S-adenosylmethionine, catalyzed by DNA methyltransferases (DNMTs) (Goll and Bestor, 2005). The DNMT family comprises four members: DNMT1, DNMT3A and DNMT3B, as well as DNMT3L (DNA methyltransferase 3-like), which is required for the establishment of DNA methylation patterns during development (Gowher et al., 2005). Maintenance of DNA methylation is carried out by DNMT1, which copies DNA methylation patterns from the parental to the daughter strand during replication (Chen et al., 2007). Both DNMT3A and DNMT3B serve as de novo methyltransferases during embryonic development, but they also help maintain DNA methylation patterns in somatic cells, since DNMT1 cannot perform this function alone and requires the ongoing participation of DNMT3A/B (Chen et al., 2003; Jones and Liang, 2009; Liang et al., 2002a). DNMT3A has two different isoforms while DNMT3B has more than thirty isoforms (Gopalakrishnan et al., 2009; Ostler et al., 2007; Wang et al., 2006a; Xie et al., 1999).
Expression patterns of the latter are highly conserved between humans and mice, suggesting that these isoforms are biologically significant (Okano et al., 1998). Many studies have identified the specific roles of DNMTs in DNA methylation during development, but the role of aberrant expression levels of DNMTs and their isoforms in cancer, especially DNMT3B, in leading to global DNA methylation changes is still unclear (Chen et al., 2003; Gopalakrishnan et al., 2009; Okano et al., 1999; Ostler et al., 2007; Robertson et al., 1999; Saito et al., 2002; Wang et al., 2006a; Xie et al., 1999). Several studies have investigated the roles of DNMTs in more detail, demonstrating that DNMT3B participates in gene body methylation or remethylation by targeting the H3K36m3 modification (Baubec et al., 2015; Liao et al., 2015; Yang et al., 2014). Disruption of the catalytic domains of all three DNMTs was recently characterized in human ES cells by CRISPR/Cas9 genome editing (Liao et al., 2015). However, the disruption of DNMT3B1 left a truncated version very similar to the DNMT3B3 isoform in these cells (Liao et al., 2015). Furthermore, this study also found that, although DNMT3B1 is highly expressed in ES cells, DNMT3B1 expression decreases in somatic cells and DNMT3B3, which has a disrupted catalytic domain, becomes the dominant isoform expressed (Liao et al., 2015). Thus the role of DNMT3B isoforms in ES and somatic cells is not well characterized.

In this study, we sought to identify the target sites of DNMT3B isoforms on a genome-wide level, and their functional roles, by restoring the expression of a representative panel of DNMT3B isoforms and DNMT3L in DNMT3B-deficient cells. We confirmed that transcribed regions of genes are the favored de novo DNA methylation target. We also show that isoforms of DNMT3B can stimulate DNA methylation in cells with decreased methylation, as well as remethylation of gene bodies after DNA methylation inhibitor treatment.
Together, our results suggest that DNMT3B isoforms can act as accessory proteins that interact with catalytically active enzymes to re-establish DNA methylation and could be one of many key factors for the initiation of de novo DNA methylation during tumorigenesis.

MATERIALS AND METHODS

Cell lines and Drug treatment

The HCT116 derivative cell lines DKO8 and 3BKO were cultured in McCoy’s 5A medium containing 1% penicillin/streptomycin and 10% inactivated fetal bovine serum at 37°C in a humidified atmosphere containing 5% CO2. Cell lines were treated transiently with 0.3 µM 5-Aza-CdR (Sigma-Aldrich) for 24 hours, followed by a medium change.

DNA Methyltransferase Isoform Constructs

Human DNMT isoforms 3B1, Δ3B2, 3B3, Δ3B4 and 3L, containing a MYC-tag DNA sequence ligated to the 5’ ends, were amplified from pIRESpuro/Myc constructs (Choi et al., 2011) (a modified version of the pIRESpuro3 vector, Clontech), a gift from Allen Yang (USC). Catalytically inactive mutants containing a cysteine to serine alteration at position 651 of the DNMT3B1 protein and position 452 of the DNMTΔ3B2 protein were established as previously described (Sharma et al., 2011). MYC-tagged DNMT sequences were cloned into the pLJM1 lentivirus vector at the EcoRI and AgeI sites using In-Fusion HD PCR Cloning Plus (Clontech) following the manufacturer’s protocol. To produce lentivirus for the specific constructs, the vesicular stomatitis virus envelope protein G expression construct pMD.G1, the packaging vector pCMV ΔR8.91 and the transfer vector pLJM1 were used as previously described (Ou et al., 2009). All vectors were amplified and purified using the PureYield™ Plasmid Maxiprep system (Promega), according to the manufacturer’s instructions. Primer sequences for cloning can be found in Table 4.1. The HCT116 derivative cell lines DKO8 and 3BKO were stably transfected with a lentivirus and cells expressing DNMTs were selected in the presence of 2 µg/mL puromycin for three weeks.
Protein Extraction and Western Blot Analysis

Cells were trypsinized, washed with PBS and resuspended in RIPA buffer (50 mM Tris-HCl, pH 8.0, 150 mM NaCl, 1% NP-40, 0.5% DOC, 0.1% SDS) with protease inhibitors. The cells were then sonicated on ice and cellular debris was removed by centrifugation. 2 µg of protein was mixed with SDS/β-mercaptoethanol loading buffer and resolved on a BioRad 4-15% gradient SDS/PAGE gel. Antibodies against the Myc-epitope tag (Millipore, 05-724) and β-actin (Sigma, A2228) were used. Proteins were visualized using the ECL detection system (Thermo Scientific) and BioRad’s ChemiDoc™ system.

RNA isolation and mRNA expression by qRT-PCR

Puromycin-resistant polyclonal cells, stably transfected with DNMT isoforms, were used for total RNA extraction using the RNeasy Mini kit (Qiagen). 1 µg of total RNA was treated with 10 units of DNaseI (Roche) at 25°C for 20 minutes, followed by the addition of EDTA to a final concentration of 8 mM and heat inactivation at 75°C for 10 minutes. Typically, 1 µg of DNase-treated RNA was reverse transcribed using iScript™ Reverse Transcription Supermix (BioRad), according to the manufacturer’s instructions. PCR reactions were performed using KAPA SYBR® FAST Universal 2X qPCR Master Mix. Primer sequences used in qRT-PCR can be found in Table 4.1.

Illumina Infinium HM450 DNA Methylation data processing

DNA was extracted after 6 weeks of DNMT re-expression. DNA methylation was assessed using Illumina’s Infinium HumanMethylation450 (HM450) BeadChip array, run at the USC Epigenome Center according to the manufacturer’s specifications. The array examines the DNA methylation status of 482,421 CpG sites, each of which is reported as a beta value ranging from 0 (unmethylated) to 1 (fully methylated). CpG probes with a detection p-value >0.05, located within 15 base pairs of a single-nucleotide polymorphism or located in gene deserts were excluded from further analysis in all samples, leaving 385,826 probes.
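The three exclusion criteria can be sketched as a simple filter. This is a minimal illustration rather than the pipeline used in the study: the record layout and field names (`detection_p`, `snp_distance`, `in_gene_desert`) are hypothetical, and the sketch assumes that a probe is dropped from all samples as soon as its detection p-value exceeds 0.05 in any one sample.

```python
def filter_probes(probes, max_detection_p=0.05, max_snp_distance=15):
    """Return the IDs of probes that pass all three exclusion criteria.

    Each probe is a dict with hypothetical keys:
      id             -- probe identifier
      detection_p    -- list of detection p-values, one per sample
      snp_distance   -- distance in bp to the nearest SNP, or None if none is annotated
      in_gene_desert -- True if the probe lies in a gene desert
    """
    kept = []
    for probe in probes:
        # Assumption: failing detection in any sample removes the probe everywhere.
        if any(p > max_detection_p for p in probe["detection_p"]):
            continue
        distance = probe["snp_distance"]
        if distance is not None and distance <= max_snp_distance:
            continue
        if probe["in_gene_desert"]:
            continue
        kept.append(probe["id"])
    return kept


# Toy records (hypothetical probe IDs and annotations).
example = [
    {"id": "cg_a", "detection_p": [0.01, 0.02], "snp_distance": None, "in_gene_desert": False},
    {"id": "cg_b", "detection_p": [0.01, 0.20], "snp_distance": None, "in_gene_desert": False},
    {"id": "cg_c", "detection_p": [0.01, 0.01], "snp_distance": 10, "in_gene_desert": False},
    {"id": "cg_d", "detection_p": [0.01, 0.01], "snp_distance": None, "in_gene_desert": True},
]
print(filter_probes(example))
```

Applying one shared probe list to every sample keeps the beta-value matrices of the different cell lines directly comparable.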
Analysis and beta value calculations were performed as described elsewhere (De Carvalho et al., 2012; Pandiyan et al., 2013; Yang et al., 2014). CpG sites with a beta-value change of 0.2 (20%) were considered target sites for analyses, as described elsewhere (Choi et al., 2011).

RNA-Seq Data Collection and Analysis

RNASeqV2 Level 3 data from The Cancer Genome Atlas (TCGA) was obtained from the publicly available data portal (http://cancergenome.nih.gov/dataportal/). Reads were aligned using MapSplice and gene expression was quantified using RSEM (Wang et al., 2010b).

Table 4.1 PCR Sequences for DNMTs and Cloning

Locus                    Primer      Sequence (5’ to 3’)

qRT-PCR Primers
TBP (NM_003194)          Sense       GCCCGAAACGCCGAATAT
                         Antisense   CCGTGGTTCGTGGCTCTCT
DNMT3B (NM_006892)       Sense       CTGCCGGTGTTTCTGTGTGG
                         Antisense   TGTAACAGCTCCAGGGCTCC
DNMT3L (NM_175867)       Sense       TACGCACGGCCCAAGC
                         Antisense   ACCAGATTGTCCACGAACATCC

In-Fusion Primer Sequences for DNMTs
3B1/M (NM_006892)        Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCCTATTCACATGCAAAGTAGTCCTTC
Δ3B2/M (NM_006892)       Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCCTATTCACATGCAAAGTAGTCCTTC
3B3 (NM_006892)          Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCCTATTCACATGCAAAGTAGTCCTTC
3B4 (NM_006892)          Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCTTAACTGTTCATCCCGGGT
Δ3B4 (NM_006892)         Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCCTATTCACATGCAAAGTAGTCCTTC
3L (NM_175867)           Sense       GCTAGCGCTACCGGTATGGAGCAGAAGCTGATCT
                         Antisense   CGAGGTCGAGAATTCATGGCGGCCATCCCAGCCCTGGAC

RESULTS

Stable Re-Introduction of DNMT Isoforms

We elucidated the role of DNMT3B and its isoforms in DNA methylation by using the 3BKO and DKO8 derivatives of the HCT116 colon cancer cell line, both of which have homozygous deletions of DNMT3B, with DKO8 additionally carrying a hypomorphic DNMT1 allele (DNMT1ΔE2-5) that yields a dramatically reduced protein level (Egger et al., 2006; Rhee et al., 2002).
Globally, there is about 3% less DNA methylation in 3BKO and 22% less DNA methylation in DKO8 compared to the parental line HCT116 (Figure 4.1 A and B) (Rhee et al., 2002). We selected a representative panel of DNMT3B isoforms as well as DNMT3L, which is catalytically inactive and expressed in ES but not in somatic cells (Chen and Li, 2006; Gowher et al., 2005), to use as a positive control for accessory proteins (Figure 4.2). The panel consisted of DNMT3B1, the canonical full-length protein; DNMT3B3, a catalytically inactive isoform (Gordon et al., 2013) expressed in somatic and ES cells but overexpressed in some types of cancer (Weisenberger et al., 2004); DNMT3B4, a catalytically inactive isoform that interacts with DNMT3s and reduces their DNA binding ability (Gordon et al., 2013); DNMTΔ3B2, an N-terminal truncated isoform mainly overexpressed in lung cancer (Wang et al., 2006a); DNMTΔ3B4, an N-terminal truncated isoform that also lacks the PWWP domain (Wang et al., 2006a); and DNMT3L, which is expressed only during embryonic development (Lucifero et al., 2007). In addition, we engineered DNMT3B1-M and DNMTΔ3B2-M, which are catalytically inactive mutants containing a cysteine to serine inactivating point mutation at position 651 and 452, respectively (Sharma et al., 2011).

Figure 4.1: HCT116 Derivative Cell Line DNA Methylation Comparison. (A) Heatmap showing the 10% (n=39,605) most differentially methylated CpG sites between the 3 cell lines. Each row is an individual CpG site. Color scale ranges from cold to warm (β-value 0-1, 0-100% methylated). (B) Boxplot of all CpG sites from the 450K array showing the DNA methylation distribution of the HCT116, 3BKO and DKO8 cell lines.

[Figure 4.1: (A) heatmap with columns HCT116, DKO8 and 3BKO and a methylation β-value scale from 0 to 1; (B) boxplots for HCT116, 3BKO and DKO8, axis: β-value of all CpG sites (0.0-1.0).]

Figure 4.2: Schematic Diagram of DNMT3A, Selected DNMT3B Isoforms and DNMT3L.
Schematic diagram of DNMT3A (3A), DNMT3B isoforms (3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M and Δ3B4) and DNMT3L (3L) showing the conserved PWWP domain (purple), PHD-like domain (green) and DNMT catalytic motifs (black). There are 5 catalytic motifs (I, IV, VI, IX and X) in DNMT3A and DNMT3B, all of which are absent in DNMT3L. A red colored motif VI indicates the inactivating mutations (Cys to Ser) of amino acids 651 and 452 in 3B1-M and Δ3B2-M, respectively. 3B4 has a frame shift with a unique protein sequence shown in orange. The ^ indicates alternative splicing.

[Figure 4.2: schematic rows for 3A, 3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M, Δ3B4 and 3L, with the PWWP and PHD-like domains and the catalytic domain motifs I, IV, VI, IX and X marked.]

Eight weeks after transfection we obtained polyclonal cell populations expressing a specific DNMT3B isoform or DNMT3L in DKO8 and 3BKO cells. Different RNA expression levels were seen for each of the DNMTs, but larger variations were apparent in the protein levels, possibly reflecting differential stabilities of the various isoforms outside of the putative DNMT3A/3B complex (Figure 4.3 A-D). Consistent with our previous study (Sharma et al., 2011), the mutant form of DNMT3B1 was expressed at a higher protein level than its wild-type form; however, no significant differences in the restoration of DNA methylation between the two constructs could be observed (Figure 4.4 A). We also know that excess DNMT3B protein is degraded when not bound to nucleosomes, which may explain the differences between the mRNA and protein levels of these DNMTs (Sharma et al., 2011).

Figure 4.3: Stable Reintroductions of DNMT3B Isoforms and DNMT3L in DKO8 and 3BKO Cell Lines.
(A and B) mRNA expression levels of endogenous DNMT3B in HCT116 and of exogenous DNMT3B isoforms and DNMT3L, assessed by qRT-PCR and normalized to the expression of the TATA Box Binding Protein (TBP), in DKO8 and 3BKO cells respectively. Error bars indicate the standard deviation from the mean of 3 replicates. The empty vector (EV) cell line is the transfection control. (C and D) Protein expression levels of exogenous MYC-tagged DNMT isoforms by western blot analysis in DKO8 and 3BKO cells respectively. β-ACTIN was used as a loading control.

[Figure 4.3: (A) DKO8 and (B) 3BKO bar plots, axis: Relative Expression to TBP (0.0-2.5); DKO8 samples: HCT116, EV, 3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M, Δ3B4, 3L; 3BKO samples: HCT116, EV, 3B1, 3B1-M, 3B3, 3B4, 3L, with three constructs marked x. (C) DKO8 and (D) 3BKO western blots for MYC-tagged DNMT and β-ACTIN.]

[Figure 4.4 B: heatmap of 54,911 CpGs with columns HCT116, DKO8 (Endogenous Methylation) and 3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M, Δ3B4, 3L (Methylation Changes [Construct-EV]); annotation tracks labelled Accessibility, CGI (CGI, NON-CGI) and Location (Promoter, Gene Body, Others); scales: methylation β-value 0-1 and methylation difference value 0-0.35.]

Figure 4.4: DNMT3B isoforms and DNMT3L restore DNA methylation at specific CpG sites. (A) Boxplots showing the distribution of DNA methylation levels of 385,826 CpG sites for each indicated cell line. The global methylation level of HCT116 cells is included for comparison to the derivative cell line DKO8 EV. * indicates a significant difference between the EV and an individual DNMT-expressing cell line using a Student t-test with a Benjamini and Hochberg adjusted p-value (<0.05). (B) Heatmap showing 54,911 CpG sites in DKO8 cells expressing the indicated DNMT isoforms. CpG sites in genomic locations targeted by DNMT3Bs and 3L with respect to promoter (maroon), gene body (orange) and other regions, excluding promoters and gene bodies (gray), are shown in the left panel. CpG sites in CpG islands are represented in green in the left panel.
Endogenous methylation levels in HCT116 and DKO8 cells are represented in the middle panel by a cold to warm color scale (β-value 0-1, 0-100% methylated), where every row represents one CpG site. DNA methylation differences between DNMT isoforms and empty vector (EV) are shown in the right panel on a green to red color scale (0=no change).

[Figure 4.4 A: boxplots for HCT116, EV, 3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M, Δ3B4 and 3L, axis: β-value of all CpG sites (0.0-1.0); significant differences marked with *.]

DNA Methylation Restoration by DNMT3B Isoforms is Independent of Catalytic Activity

Global DNA methylation levels 8 weeks after transfection with the isoforms were compared to the parental HCT116 cell line and represented as a boxplot and heatmap (Figure 4.4 A and B). As expected, DNMT3L could restore DNA methylation in the transfected cells and showed the strongest overall restoration of DNA methylation compared to the DNMT3B isoforms (Figure 4.4 B). Interestingly, the majority of sites showing increased DNA methylation were also methylated in the parental HCT116 cells (Figure 4.4 B). Most of these restored sites are located in gene bodies, which are enriched in H3K36m3, a characteristic of actively transcribed regions (Baubec et al., 2015; Venkatesh et al., 2012; Yang et al., 2014). We previously showed that the H3K36m3 mark is not lost in the more severely demethylated derivative cell line DKO1, which has lost 90% of DNA methylation (Yang et al., 2014). This finding is consistent with other observations that DNMT3B recognizes the H3K36m3 modification written by SETD2 (Baubec et al., 2015). Individually, the sites most targeted by DNMT3B are located in gene bodies and non-CpG islands regardless of the isoform when compared to the background distribution of CpG sites on the array (Figure 4.4 B and Figure 4.5 A and B).
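The comparison of targeted sites against the array background can be illustrated with a minimal sketch. It assumes that a target site is a CpG whose beta value in a construct-expressing line exceeds the EV value by at least 0.2 (the Methods state a 0.2 change without giving a direction), and that the z-test named in the Figure 4.5 caption is a one-sample test of a proportion against the background proportion. The exact form of the test is not given in the text, and all function names and example numbers are hypothetical.

```python
import math


def target_sites(beta_construct, beta_ev, min_gain=0.2):
    """CpG IDs whose beta value rose by at least `min_gain` relative to EV."""
    return [
        cpg
        for cpg, beta in beta_construct.items()
        if cpg in beta_ev and beta - beta_ev[cpg] >= min_gain
    ]


def proportion_z_test(k, n, p0):
    """One-sample z-test of an observed proportion k/n against background p0.

    Returns the z statistic and the two-sided p-value under the normal
    approximation.
    """
    p_hat = k / n
    se = math.sqrt(p0 * (1.0 - p0) / n)
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Toy beta values (hypothetical probe IDs).
construct = {"cg_a": 0.90, "cg_b": 0.35, "cg_c": 0.10}
empty_vector = {"cg_a": 0.20, "cg_b": 0.30, "cg_c": 0.10}
print(target_sites(construct, empty_vector))

# Toy enrichment: 80 of 100 target sites in gene bodies vs. a 50% background.
z, p = proportion_z_test(80, 100, 0.5)
print(z, p < 0.001)
```

Testing against the background proportion matters because the array itself does not sample promoters, gene bodies and other regions evenly.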
Furthermore, DNMT3B1 and its mutated form DNMT3B1-M show similar restoration of DNA methylation even though the latter is catalytically inactive, consistent with previous findings (Figure 4.4 A and B) (Sharma et al., 2011). In addition, although DNMT3B3 has been considered a catalytically inactive isoform (Gordon et al., 2013), DNA methylation was strongly increased by its presence.

Figure 4.5: Genomic Context of DNMT3B Isoforms and DNMT3L Target Sites. (A) Distribution of genomic locations targeted by DNMT3Bs and 3L with respect to promoter (maroon), gene body (orange) and other regions excluding promoters and gene bodies (gray), from the 450K array. Using a z-test, p-values <0.001 are annotated with ***. (B) Distribution of genomic locations targeted by DNMT3Bs and 3L with respect to CpG island (green) or non-CpG island (gray) from the 450K array. The background distribution is the representative population of CpG sites that could be targeted by the individual DNMTs. Using a z-test, p-values <0.001 are annotated with ***.

[Figure 4.5 panels: (A) "Targeted CpG Sites (%)" (0-100) by Location (Promoter, Gene Body, Others) and (B) "Targeted CpG Sites (%)" (0-100) by CGI status (CGI, NON-CGI), each for Background, 3B1, 3B1-M, 3B3, 3B4, Δ3B2, Δ3B2-M, Δ3B4 and 3L; *** marks p<0.001.]

Both the DNMTΔ3B2 and Δ3B2-M isoforms, which lack the N-terminal region, show lower but statistically significant abilities to increase DNA methylation levels independently of catalytic ability (Figure 4.4 A and B). DNMTΔ3B4, which has no PWWP domain, had the lowest ability to induce remethylation. DNMT3B4 was also weak at inducing remethylation, confirming the finding of its decreased binding affinity for DNA (Gordon et al., 2013). The abilities of DNMT3B and its isoforms to restore methylation are mainly dependent on the presence of the N-terminal region but not of a catalytic domain in most of the isoforms (Figure 4.2 and Figure 4.4 A and B).
The exception here is DNMT3B4, which loses the ability to restore DNA methylation and has previously been shown to cause hypomethylation by binding to active DNMT3s and reducing their DNA binding affinities (Gordon et al., 2013; Saito et al., 2002). The loss of DNA or nucleosome binding affinity of DNMT3B4 could be due to its specific truncated C-terminal structure, which should be studied further. However, our previous studies have demonstrated that the ability of DNMT3B1 to bind nucleosomes is important in maintaining DNA methylation levels, while loss of the N-terminal domain could dramatically decrease the binding ability and increase protein degradation (Jeong et al., 2009; Sharma et al., 2011). Unexpectedly, most DNMT3B isoforms can restore DNA methylation at similar target regions independently of the presence of functional catalytic domains. These findings suggest that besides its functional role as a methyltransferase, DNMT3B may also act as an accessory protein to recruit DNMT3A and stimulate DNA methylation.

DNMT3B Isoforms can Increase the Remethylation Rate after Treatment with a DNA Methylation Inhibitor

We recently showed that inhibiting gene body methylation by transient 5-Aza-CdR treatment decreases the expression of some genes and that restoration of this methylation requires DNMT3B (Yang et al., 2014). We demonstrated that there are four groups (I-IV) of demethylated targets based on the rate of remethylation in HCT116 cells after treatment with 5-Aza-CdR (Figure 4.6) (Yang et al., 2014). The CpG sites of Group I were enriched in gene bodies and in the H3K36me3 modification, showed the fastest rates of remethylation and were almost fully remethylated 42 days after treatment, while the CpG sites in Group IV showed the slowest rate of remethylation and remained demethylated at day 42 (Yang et al., 2014).
To test whether DNMT3B isoforms would have different functional roles during remethylation, we introduced a subset of isoforms into the 3BKO cell line (Figure 4.2 and Figure 4.3 B and D) and subsequently treated the cells transiently with 5-Aza-CdR for 24 hours. We observed decreased DNA methylation 5 days after treatment and restoration of DNA methylation 42 days post treatment, consistent with previous studies (Figure 4.6, Figure 4.7 and Figure 4.8) (Yang et al., 2014). DNMT3L and DNMT3B1, 3B1-M and 3B3 stimulated DNA remethylation most strongly at Group I CpGs (Figure 4.7). As expected, DNMT3B4 did not influence the rebound methylation of this group, since it has decreased binding affinity for DNA (Gordon et al., 2013).

Figure 4.6: DNMT3B Isoforms Restore DNA Methylation at Specific CpG Sites After 5-Aza-CdR Treatment. Previously described Groups I-IV were used to show average DNA methylation values for each 3BKO cell line expressing a specific DNMT3 isoform at days 0, 5, 10 and 42 after a 24-hour 5-Aza-CdR treatment (Yang et al., 2014).

[Figure 4.6 panels: Group 1, Group 2, Group 3 and Group 4; x-axis "Days Post Treatment" (0, 5, 10, 42), y-axis "DNA Methylation (β-Value)" (0.4-1.0); series EV, 3B1, 3B1-M, 3B3, 3B4, 3L and HCT.]

[Figure 4.7 boxplots: β-value (0.4-1.0) at three time points; labels HCT116, 3BKO, EV, 3B4, 3B3, 3B1, 3B1-M and 3L.]

Figure 4.7: Group I CpG Sites are DNMT3B Target Sites in 3BKO. Heatmaps and boxplots showing previously defined Group I CpG sites in cell lines expressing different DNMTs before 24-hour 5-Aza-CdR treatment and at Day 5 and Day 42 post treatment (Yang et al., 2014).
Individual CpG sites falling in genomic locations targeted by DNMT3Bs and 3L with respect to promoter (maroon), gene body (orange) and other regions, excluding promoters and gene bodies (gray), are shown in the left panel. CpG sites in CpG islands are represented in green in the left panel. Endogenous methylation levels in cell lines are represented by a cold to warm color scale (β-value 0-1, 0-100% methylation), where every row represents one CpG site in the 3 right panels.

[Figure 4.7 heatmap: Group I, 2,521 CpG Sites; panels Day 0, Day 5 and Day 42; annotation tracks Location (Promoter, Gene Body, Others) and CGI (CGI, NON-CGI); methylation β-value 0 to 1.]

[Figure 4.8 panels: boxplots of β-value (0.4-1.0) at three time points with labels HCT116, 3BKO, EV, 3B4, 3B3, 3B1, 3B1-M and 3L; heatmap of Group IV, 10,907 CpG Sites, at Day 0, Day 5 and Day 42 with annotation tracks Location (Promoter, Gene Body, Others) and CGI (CGI, NON-CGI); methylation β-value 0 to 1.]

Figure 4.8: Group IV CpG Sites Rebound Independently of DNMT3B. Heatmaps and boxplots showing previously defined Group IV CpG sites in cell lines expressing different DNMTs before 24-hour 5-Aza-CdR treatment and at Day 5 and Day 42 post treatment (Yang et al., 2014). Individual CpG sites falling in genomic locations targeted by DNMT3Bs and 3L with respect to promoter (maroon), gene body (orange) and other regions, excluding promoters and gene bodies (gray), are shown in the left panel. CpG sites in CpG islands are represented in green in the left panel. Endogenous methylation levels in cell lines are represented by a cold to warm color scale (β-value 0-1, 0-100% methylation), where every row represents one CpG site in the 3 right panels.

As expected, Group IV CpG sites were confirmed not to be DNMT3B target regions, and the same was true for DNMT3L (Figure 4.8). These results therefore confirm that DNMT3B can act as an accessory protein like DNMT3L in transcribed regions in somatic cells.
Over-expression of DNMT3s in Different Cancer Types

The role of DNMT3B as an accessory protein may also contribute to aberrant de novo methylation during tumorigenesis, since overexpression of DNMTs (Kautiainen and Jones, 1986), especially DNMT3s and their isoforms, is commonly seen in tumors (Jones and Liang, 2009; Linhart et al., 2007; Rhee et al., 2002; Robertson et al., 1999). Our previous work suggests that DNMT3A requires DNMT3B for restoration of methylation in somatic cells and that DNMT3B is also required for remethylation after 5-Aza-CdR treatment (Sharma et al., 2011; Yang et al., 2014). However, the functional role of DNMT3B and its isoforms in a complex with DNMT3A is still not clear. DNMTs are overexpressed in cancer cells (Robertson et al., 1999), although detailed analysis of the expression profiles has not been described. We took advantage of The Cancer Genome Atlas (TCGA) project, which includes RNA-seq data on 8 primary tumor types, to study expression levels of DNMT3s in normal tissues and their corresponding tumors. We confirmed overexpression of DNMT3A in 6 out of 8 cancer types and DNMT3B overexpression in all cancer types, compared to normal tissues (Figure 4.9).

Figure 4.9: Differential Expression Level of DNMT3A and 3B Between Normal and Various Tumor Tissues. Normalized read counts from RNA-Seq data were calculated by RNA-Seq by Expectation-Maximization (RSEM) for DNMT3A, DNMT3B and DNMT3B isoforms from matched normal and tumor tissue samples. Expression fold change is shown as log2(tumor/normal) for each cancer type. N indicates the number of sample pairs for each tumor type. The Mann-Whitney unpaired statistical test was used to assess expression differences on log2(RSEM) values between tumor and normal tissue. P-values <0.05, <0.01 or <0.001 are annotated with *, ** or ***, respectively.
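The two quantities named in the Figure 4.9 caption can be sketched in a few lines. This is an illustration only, not the analysis code used for the figure: the function names, the pseudocount of 1 (added so that zero counts do not break the logarithm) and the numbers are all assumptions, and a real analysis would take the Mann-Whitney p-value from a statistics library rather than stop at the U statistic.

```python
import math

def log2_fold_changes(tumor, normal, pseudocount=1.0):
    """Per-pair log2(tumor/normal) for matched samples.

    The pseudocount is an assumption of this sketch, not a value from the text.
    """
    if len(tumor) != len(normal):
        raise ValueError("matched tumor/normal pairs are required")
    return [math.log2((t + pseudocount) / (n + pseudocount))
            for t, n in zip(tumor, normal)]

def mann_whitney_u(x, y):
    """U statistic of x versus y: pairs with x_i > y_j count 1, ties count 0.5."""
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5
    return u

# Made-up normalized counts for three matched pairs (not TCGA data).
tumor_counts = [40.0, 15.0, 7.0]
normal_counts = [4.0, 3.0, 7.0]

fold_changes = log2_fold_changes(tumor_counts, normal_counts)
log_tumor = [math.log2(v + 1.0) for v in tumor_counts]
log_normal = [math.log2(v + 1.0) for v in normal_counts]
u_statistic = mann_whitney_u(log_tumor, log_normal)
```

A positive fold change means higher expression in the tumor than in its matched normal tissue. A U statistic close to its maximum (the product of the two sample sizes, 9 here) means that nearly every tumor value exceeds nearly every normal value.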
[Figure 4.9 panels: log2(tumor/normal) expression (y-axis -10 to 10) of 3A, 3B and the isoforms 3B1, 3B2, 3B3, 3B4, 3B5, 3B6 and 3B8 for Bladder Cancer (BLCA, n=19), Breast Cancer (BRCA, n=110), Colon Adenocarcinoma (COAD, n=41), Head/Neck Squamous Cell Carcinoma (HNSC, n=41), Hepatocellular Carcinoma (LIHC, n=50), Kidney Cancer Papillary (KIRP, n=30), Lung Adenocarcinoma (LUAD, n=57) and Lung Squamous Cell Carcinoma (LUSC, n=50); asterisks mark significance.]

In addition, multiple isoforms of DNMT3B were also significantly over-expressed in these types of cancers (Figure 4.9), including the catalytically inactive DNMT3B3. Please note that delta isoforms, such as DNMTΔ3B2 and DNMTΔ3B4, are not included in this analysis because of the difficulties in distinguishing these isoforms using RNA-seq data (Figure 4.9).

DISCUSSION

In this study we investigated the DNA methylation target sites of DNMT3B isoforms on a genome-wide level by restoring the expression of a representative panel of DNMT3B isoforms in two different DNMT3B-deficient cell lines. Genome-wide analysis of DNA methylation changes reveals that catalytic activity is not required for the induced DNA methylation, indicating an accessory protein role for DNMT3B isoforms. The presence of the N-terminal and PWWP domains was found to be important for DNMT3B function and its role as an accessory protein. Preferential binding of DNMT3B isoforms to regions known to be enriched with the H3K36me3 modification was found in our earlier work (Yang et al., 2014).
Additionally, we confirmed the aberrant expression of DNMT3B isoforms in various cancers using publicly available datasets. Taken together, we show that DNMT3B can act as an accessory protein to restore DNA methylation of gene bodies in differentiated cells. Our results and other studies have shown that DNMT3Bs are able to bind to active DNMT3A molecules in vivo and in vitro (Gordon et al., 2013). The data here show clearly that DNMT3B1 can act as an accessory protein even when it is catalytically inactive, although it does require intact N-terminal and PWWP domains. Since catalytically inactive DNMT3B1 can establish de novo DNA methylation patterns in both DNMT1- and DNMT3B-deficient cells, it seems most likely that DNMT3B acts as an accessory protein for DNMT3A rather than DNMT1 (Sharma et al., 2011). Although the catalytic activity of DNMT3B3 is controversial, our study suggests that DNA methylation in the presence of the DNMT3B3 isoform could be caused by it acting as an accessory protein in differentiated cells. The changing landscape of catalytically active and inactive proteins in ES and differentiated cells is depicted in Figure 4.10. DNMT3L has been identified as a DNMT3A accessory protein that stimulates de novo methylation in embryonic stem (ES) cells (Bourc'his et al., 2001; Gowher et al., 2005; Guo et al., 2015; Neri et al., 2013; Ooi et al., 2007), while UHRF1 has been recognized as a DNMT1 accessory protein that maintains DNA methylation patterns in embryonic and differentiated cells (Bostick et al., 2007; Sharif et al., 2007). Our findings, for the first time, suggest a role for DNMT3B as an accessory protein for DNMT3A to stimulate and restore DNA methylation. The results also raise the possibility that DNMT3B isoforms can act as accessory proteins to themselves, especially when different isoforms are expressed simultaneously and interact with each other to boost DNA methylation (Gordon et al., 2013).
Furthermore, DNMT3L recruits DNMT3s to bind specifically to nucleosomes without H3K4 methylation and stimulates de novo methylation during embryonic development (Lucifero et al., 2007; Ooi et al., 2007). Our finding that DNMT3B and DNMT3L target overlapping regions suggests that H3K36me3 could also be a target signal for DNMT3L.

Figure 4.10: The Changing Landscape of de novo DNMTs During Development. In ES cells the catalytically active CpG methyltransferases DNMT3A1 (3A1), DNMT3A2 (3A2) and DNMT3B1 (3B1) are expressed, as well as the accessory protein DNMT3L (3L) (Aapola et al., 2000; Bourc'his et al., 2001; Chen et al., 2002; Jia et al., 2007). Our work suggests that catalytically inactive DNMT3B3 (3B3) may also participate as an accessory protein in the establishment of DNA methylation patterns. Following differentiation, 3A2 and 3L are no longer expressed while expression of 3B1 is decreased dramatically, leaving 3A1 and the accessory protein 3B3 as potential mediators of methylation (Aapola et al., 2000; Chen et al., 2002; Lei et al., 1996). Our work also suggests that 3B3 preferentially targets 3A1 to gene bodies. Active methyltransferase enzymes and accessory proteins are shown in red and blue boxes respectively.

[Figure 4.10 diagram labels: ES Cells, Differentiation, Differentiated Cells; boxes 3A1, 3A2, 3B1, 3B3 and 3L.]

In conclusion, we demonstrate that DNMT3B can act as an accessory protein to restore DNA methylation specifically in gene bodies in differentiated cells. This finding also suggests that DNMT3B and its catalytically inactive isoforms may play key roles in collaborating with DNMT3A, like DNMT3L in ES cells, for initiating and maintaining DNA methylation in transcribed regions. Our findings might also help explain the presence of aberrant DNA methylation patterns during tumorigenesis because of the overexpression of DNMT3A and DNMT3B isoforms.
CHAPTER 5
WHOLE-EXOME SEQUENCING PROCESSING METHODOLOGY

INTRODUCTION

Next-generation sequencing (NGS) represents a significant breakthrough in the field of cancer genetics. The popularity of whole-exome sequencing (WES) is rising due to its low cost and its focus on potentially functional mutations in coding regions. WES covers only 1-1.5% of the human genome, yet approximately 85% of the known protein-coding disease-causing variants are found in these regions, although GWAS hits are found in non-exonic regions (Ng et al., 2009; Rosenfeld et al., 2012). WES generates enormous amounts of sequence data, which require considerable processing and interpretation before disease-causing variants can be identified. The typical WES workflow involves several steps: sequence data quality check, mapping to a reference genome, variant calling and calibration, variant annotation, filtering and finally isolation of variants of interest (Figure 5.1) (McKenna et al., 2010; Tetreault et al., 2015). The quality of variant detection depends highly on the accuracy of NGS base calling and can be plagued by sequencing errors and detection issues. Improvements in sequencing technology have increased base-calling accuracy to above 99.5% (Massingham and Goldman, 2012). Since the base-calling accuracy is extremely high, researchers can now focus on the quality of the sequence data using tools like FASTQC or Picard. These tools can quickly estimate how many reads map to the reference genome, the average depth of coverage and the presence of overrepresented sequences or PCR duplicates (Bioinformatics; Picard).

Figure 5.1: Whole-Exome Sequencing (WES) Data Processing Overview. Specific tools and resources are indicated on the left and what each of those packages contributes to the WES workflow is detailed on the right.
FASTQC
• Quality Check
BWA
• Index Reference Genome
• Align Reads to Reference Genome
Picard
• Sort SAM File
• Mark Duplicates
• Index BAM File
GATK
• INDEL Realignment
• Base Recalibration
• Variant Calling and Recalibration of Variant Quality Scores
SnpEff
• Annotate SNVs and INDELs
Filtering
• Remove SNVs and INDELs with Bad Coverage/Quality
• Remove Germline Events using dbSNP
Isolation
• Isolate Variants based on a Biological Question
• Assess SNVs and INDELs for Function

NGS data analysis relies heavily on the ability to map sequence reads to a reference genome, and multiple tools are available to do this, including MAQ, bowtie, bowtie2 and BWA (Langmead and Salzberg, 2012; Langmead et al., 2009; Li and Durbin, 2009; Li et al., 2008). These tools are very efficient at aligning sequence reads to a reference genome, although BWA is better at allowing gapped alignments, which is essential for calling insertions or deletions (INDELs) from NGS data (Li and Durbin, 2009). A critical step in WES data analysis is variant calling. SAMtools and the Genome Analysis Tool Kit (GATK) are two widely used toolkits that lead the way in NGS variant calling and provide numerous tools to answer most biological questions (Li et al., 2009a; McKenna et al., 2010). GATK is more robust than other toolkits, including SAMtools, because of its accurate statistical modeling of sequencing errors (Abecasis et al., 2012). All of these tools are, however, limited by the sequencing depth of the project; studies with approximately 100-150X sequence coverage (~20X average coverage per exon) are 95% sensitive in their variant calling pipelines.
BWA combined with GATK has 99-99.7% selectivity in the detection and calling of single nucleotide variants (SNVs) and 80% selectivity for INDELs, which outperformed the following pipelines: Short Oligonucleotide Analysis Package (SOAP), BWA-SNVer, Genomic Next-generation Universal MAPper (GNUMAP) and BWA-SAMtools (Clement et al., 2010; Li et al., 2009b; O'Rawe et al., 2013; Wei et al., 2011). Proper variant annotation provides the link between identified SNVs or INDELs and their functional effects. The most common approach uses gene annotations and classifies variants according to their functional category (e.g., synonymous, nonsense, missense, splicing, etc.). Although gene annotations provide important information, some variants are polymorphisms present in the normal population; data from resources like dbSNP and The 1000 Genomes Project (1000G) therefore also need to be included in the variant annotation process (Abecasis et al., 2012; Sherry et al., 2001). Appending more information to variants besides prevalence in the population is also necessary and can be done with ANNOVAR, SnpEff or VEP (Cingolani et al., 2012; VEP; Wang et al., 2010a). These tools are used on WES data to identify recurrent aberrations in large sample studies, especially cancer genome studies (Nadaf et al., 2015). WES data can further be used to identify copy-number variations (CNVs), although these variations are limited to exonic regions. CNV calling from WES data can be done using read depth and allele frequency with the Control-FREEC tool (Boeva et al., 2012; Boeva et al., 2011). The advantage of using WES to detect CNVs is the ability to use both read depth and allele frequency to determine regions that have loss of heterozygosity (LOH) or CNVs, whereas traditional techniques based on allele frequency alone can miss balanced amplifications (e.g. where AABB is present instead of AB) (Wong et al., 2004).
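The AABB example can be made concrete with a toy calculation. The sketch below is not the Control-FREEC algorithm; it is a deliberately idealized illustration (pure tumor sample, no noise, hypothetical function names) of why allele frequency alone cannot separate AB from AABB while read depth can.

```python
def b_allele_frequency(a_copies, b_copies):
    """Fraction of reads expected to carry the B allele."""
    total = a_copies + b_copies
    if total == 0:
        raise ValueError("at least one copy is required")
    return b_copies / total

def depth_ratio(a_copies, b_copies, normal_copies=2):
    """Expected read depth relative to a diploid (two-copy) region."""
    return (a_copies + b_copies) / normal_copies

def describe(a_copies, b_copies):
    """Idealized label from the two signals combined."""
    baf = b_allele_frequency(a_copies, b_copies)
    ratio = depth_ratio(a_copies, b_copies)
    if ratio > 1.0:
        return "balanced amplification" if baf == 0.5 else "amplification"
    if ratio < 1.0:
        return "deletion"
    if baf in (0.0, 1.0):
        return "copy-neutral LOH"
    return "no change"

# AB and AABB share the same allele frequency but differ in depth.
ab = (b_allele_frequency(1, 1), depth_ratio(1, 1))
aabb = (b_allele_frequency(2, 2), depth_ratio(2, 2))
```

Here `ab` is (0.5, 1.0) and `aabb` is (0.5, 2.0): the allele frequency is identical, so only the doubled depth reveals the amplification. Conversely, AA in place of AB keeps the depth ratio at 1.0 and is visible only through the allele frequency.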
To analyze 55 WES data sets, we developed the following pipeline for interpreting multiple WES samples. We started with raw NGS data and performed sequence quality checks before aligning the reads to a reference genome with BWA. Picard and GATK were then used to remove duplicate reads, identify INDELs and SNVs, and perform base quality recalibration and variant quality score recalibration. This decreased the false discovery rate of variants. SnpEff was used to annotate all variants with functional information. These variants were then filtered and evaluated for biological significance using dbSNP and 1000G (Figure 5.1). We also used the WES data to identify CNVs present in each sample using Control-FREEC. This workflow provides an example of using established and verified tools to identify potentially functional SNVs and INDELs from WES.

MATERIALS AND METHODS

Sample Preparation and Sequencing

A total of 55 samples were collected for WES: cell pellets from 16 cell lines (Table 2.2), and DNA from 30 UCC tumors and 9 adjacent normal urothelium samples from 13 patients, kindly provided by Dr Yong-June Kim (Table 3.1). Genomic DNA (5 µg) was isolated from samples using Qiagen's DNeasy Blood and Tissue Kit following the manufacturer's specifications. DNA was sent to Genewiz for whole-exome sequencing. Exome library enrichment was performed with Agilent SureSelect Exome library preparation following the manufacturer's specifications. Sequencing was performed on Illumina's HiSeq2500 platform in High Output mode in a 2x100bp paired-end (PE) configuration. Each sample had a minimum of 30X coverage per base.

Publicly Available Resources

FASTQC is a freely available resource that can be downloaded from the following web address: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc (Bioinformatics). FASTQC is a java-based program that allows for quick quality control analysis of FastQ files from next-generation sequencing projects.
BWA is a software package for mapping sequences against a reference genome and is freely available for use (Li and Durbin, 2009). Bedtools provides many useful analysis tools for genomic analysis of large datasets and is available for download (Quinlan and Hall, 2010). Control-FREEC can determine CNVs from NGS data and is available for use with WES as well (Boeva et al., 2012; Boeva et al., 2011). The Genome Analysis Tool Kit (GATK) is a collection of tools for analyzing NGS data, including pre- and post-processing of sequence data (McKenna et al., 2010). SnpEff is a functional annotation tool for outputs from variant calling pipelines including GATK (Cingolani et al., 2012). Picard is another toolkit that is useful for NGS data and is available for download and use at the following web address: http://picard.sourceforge.net (Picard).

RESULTS

FastQ Files and Quality Control

Before running any analysis on sequence reads, we first evaluated the quality of the sequence data. Since the sequence data are stored in FastQ files, it is important to understand what is actually contained in this type of file. Each read in a FastQ file consists of four lines containing important information about the sequence data (Figure 5.2 A). The first line, starting with @, designates the name of the read and also contains information about the position on the flow-cell in the following order: @<Instrument Name>:<Run-ID>:<flow-cell-ID>:<flow-cell lane>:<tile within the flow-cell lane>:<x-coordinate of the detected cluster>:<y-coordinate of the detected cluster> <member of a pair>:<Y or N if the read is filtered>:<0 or even number when control bits are off>:<multiplex index>. The second and fourth lines are the actual sequence read and the quality score per base, respectively. They are separated by a third line consisting of a + symbol, which can optionally be followed by the read name. High-quality FastQ files are necessary for downstream analysis, and it is important to verify the quality of the files before proceeding.
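The four-line record described above can be read with a short parser. This is a minimal sketch, not part of the pipeline used in this chapter: the field names and the `FastqRecord` type are hypothetical, the seven-field read name (including the lane) follows the layout that the example read in Figure 5.2 A appears to use, and the quality line is assumed to be Phred+33 encoded.

```python
from dataclasses import dataclass
from typing import Dict, List

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding

@dataclass
class FastqRecord:
    header: Dict[str, str]
    sequence: str
    qualities: List[int]

def parse_header(line: str) -> Dict[str, str]:
    """Split the '@' line into flow-cell position and pair information."""
    if not line.startswith("@"):
        raise ValueError("FASTQ header must start with '@'")
    location, _, pair_info = line[1:].partition(" ")
    loc_keys = ["instrument", "run_id", "flowcell_id", "lane", "tile", "x", "y"]
    pair_keys = ["read", "filtered", "control_bits", "index"]
    loc_values = location.split(":")
    pair_values = pair_info.split(":")
    if len(loc_values) != len(loc_keys) or len(pair_values) != len(pair_keys):
        raise ValueError("unexpected header layout")
    fields = dict(zip(loc_keys, loc_values))
    fields.update(zip(pair_keys, pair_values))
    return fields

def parse_record(lines: List[str]) -> FastqRecord:
    """Parse one record: header, sequence, '+' separator, quality string."""
    header, sequence, separator, quality = (s.strip() for s in lines)
    if not separator.startswith("+"):
        raise ValueError("third line must start with '+'")
    if len(sequence) != len(quality):
        raise ValueError("sequence and quality lengths differ")
    scores = [ord(ch) - PHRED_OFFSET for ch in quality]
    return FastqRecord(parse_header(header), sequence, scores)

# The example read shown in Figure 5.2 A.
example = [
    "@GWZHISEQ01:385:C6WCJACXX:1:1101:1909:2498 1:N:0:AGCAGGAA",
    "TACAGAGATAAAATGTACCATAGAAATCTTATCTAAATCTGTCTGGAGACGTAAAATA"
    "AGATGACAACAAACTTTTAAGTACCCTTTGAAAAACTTAAGCA",
    "+",
    "CCCFFFFFHHHHHJJHHJIJJIJJJJIJJJJJJIIJJJJJJJJJJIIJJJJJJGIIIJ"
    "JIJJJJJJIIJJJJJJJHIHHHHHHFFFFFFFEDEDCDDDDCD",
]
record = parse_record(example)
mean_quality = sum(record.qualities) / len(record.qualities)
```

Under the Phred+33 assumption, every base in this read decodes to a score between 34 and 41, so `mean_quality` is well above 30.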
Using the publicly available FASTQC package, we validated the quality of each sequence read from the corresponding FastQ files. In general, the most important aspect to investigate in FastQ files is the per-base quality score within each sequence read. The optimal quality score is ≥30, independent of the location of the base in question within each sequenced read. Since we generated sequence data using 100bp paired-end reads, we ensured quality throughout the entire sequence read (Figure 5.2 B). Each FastQ file was verified to have at least 90% of its sequence reads with an average quality score of 30 or above.

Figure 5.2: FASTQ Files and Quality Check. (A) Example data contained in FASTQ files. The first line in black specifies the name of the read with information about position on a flow-cell. The second line in red is the actual sequence read. The third line in blue is always a + sign. The fourth line in green is the quality score of each base in the read in an encoded format. (B) Boxplot of quality scores per base for an entire FASTQ file separated by position in each read. Quality scores above 30 are considered high quality for each individual base.

(A)
@GWZHISEQ01:385:C6WCJACXX:1:1101:1909:2498 1:N:0:AGCAGGAA
TACAGAGATAAAATGTACCATAGAAATCTTATCTAAATCTGTCTGGAGACGTAAAATAAGATGACAACAAACTTTTAAGTACCCTTTGAAAAACTTAAGCA
+
CCCFFFFFHHHHHJJHHJIJJIJJJJIJJJJJJIIJJJJJJJJJJIIJJJJJJGIIIJJIJJJJJJIIJJJJJJJHIHHHHHHFFFFFFFEDEDCDDDDCD

[Figure 5.2 B boxplot: x-axis "Position in Read (bp)", y-axis "Quality Score Value".]

Alignment to the Genome, Sorting, Marking Duplicates and Indexing Files

Following FastQ quality checks, we aligned the sequence reads to a human genome build, considering three different aligners: Bowtie, Bowtie2 and the Burrows-Wheeler Aligner (BWA). Bowtie is very stringent and requires short reads, and it does not allow gapped alignments.
Bowtie2 is good for longer read lengths and allows gapped alignments, but is not as memory-efficient as BWA, which also allows gapped read alignments. BWA is widely used for whole-exome sequencing (WES) and provided higher-quality mapping for the data in this study. Specifically, BWA takes the two FastQ files of each sample, which hold the forward and reverse reads separately, and matches the sequence read pairs before alignment to the human genome build (hg19). The genome build was indexed and transformed before proceeding in order to optimize performance. This is accomplished using an index function on the reference genome FASTA file. Paired-end reads were aligned using the BWA mem algorithm with the options M and R, resulting in an aligned SAM file (example line of code in Figure 5.3 A). We specified the M option to mark shorter split hits so that the output could be used with Picard, a toolkit of java-based functions for analyzing and processing Illumina NGS data, and the R option to write the read group header line for each sequence in the output SAM file. Picard sorting was performed by chromosome and position (SortSam function) and the result was output as a binary version of a SAM file (BAM) (example line of code in Figure 5.3 B). The sorted BAM file was then further processed to mark and remove duplicate reads (MarkDuplicates function) (example line of code in Figure 5.3 C). Finally, the file was indexed (BuildBamIndex function) using Picard (Figure 5.3 D).

Figure 5.3: Step-by-Step SNV and INDEL Calling Part 1. (A) Example line of code for using BWA with options specifying the read group information to be added to the file. The reference genome is also indicated and is necessary for BWA. (B) Sorting SAM files using PICARD. (C) Marking duplicate PCR sequence reads using PICARD. (D) Creating a new index for the newly generated BAM file.
Functions or specific tools are indicated in red, input files in blue, output files in green and automatically generated output files in orange.

(A)
bwa mem -M \
    -R '@RG\tID:WES_1\tSM:1\tPL:IHS\tLB:libWES_1\tPU:unitWES_1' \
    hg19.fasta WES_1_R1.fastq.gz WES_1_R2.fastq.gz > WES_1.sam

(B)
java -jar picard/SortSam.jar \
    INPUT=WES_1.sam \
    OUTPUT=WES_1.bam \
    SORT_ORDER=coordinate

(C)
java -jar picard/MarkDuplicates.jar \
    INPUT=WES_1.bam \
    OUTPUT=WES_1_dedup_reads.bam \
    METRICS_FILE=WES_1_metrics.txt

(D)
java -jar picard/BuildBamIndex.jar \
    INPUT=WES_1_dedup_reads.bam
# automatically generated output: WES_1_dedup_reads.bai

Using the Genome Analysis Tool Kit (GATK)

Although the sequence reads were aligned to the latest genome build, we further refined each BAM file by adjusting reads around insertions/deletions (INDELs). Realignment of reads around INDELs is necessary because these regions generate a high number of false-positive SNVs when the INDEL is located at the end of a sequence read. GATK addresses this issue in two steps. First, an index of possible INDEL targets is made with the RealignerTargetCreator tool; then local sequence read realignment is performed on those target areas with the IndelRealigner tool (example lines of code in Figure 5.4 A and B). Importantly, a list of possible "known" INDELs was incorporated into the analyses and was used as a reference to create the target region realignment indices. These files can be downloaded from NCBI or from GATK databases (McKenna et al., 2010). The need for determining INDELs in sequencing data is shown in Figure 5.5. In this example, sequence reads aligned using BWA (Figure 5.5 A) were compared to those realigned using GATK tools (Figure 5.5 B). Using GATK, specific reads around an INDEL were realigned for a more robust sequence analysis. Although each sequencing file has already been assigned a quality score for each sequenced base, these values were recalibrated for further downstream filtering and analysis using GATK's BaseRecalibrator tool.
This comprises four separate steps. First, the sequence data are analyzed for patterns of covariation, resulting in a table of information to be used in the second step (example line of code in Figure 5.4 C). Second, another round of covariation analysis is performed on the data using the base quality score recalibration (BQSR) option in order to perform on-the-fly recalibration based on the initial pass (example line of code in Figure 5.4 D). Third, a GATK tool generated plots to compare base quality scores before and after base recalibration (example line of code in Figure 5.4 E). Lastly, the actual recalibrated base quality scores were applied to the BAM file (example line of code in Figure 5.4 F). SNVs and INDELs were called from the sequence files and then filtered based on the new read quality information. GATK includes another useful tool, HaplotypeCaller, that is used to call SNVs and INDELs. This tool can filter variants, but we used it only to create a variant call format (VCF) file, which we filtered in downstream analyses. Using this tool, we called SNVs based on high-quality phred-scaled scores. Anything with a phred-scaled score below 10 was not called and anything above 30 was considered to be of sufficient quality. Variants with phred-scaled scores between 10 and 30 were automatically annotated with "LowQual" to separate the low- and high-confidence SNVs (example line of code in Figure 5.6 A). Since we did not use HaplotypeCaller to filter SNVs, we filtered SNVs and INDELs by applying either variant quality score recalibration (VQSR) or a hard filtering technique, depending on the total number of variants. VQSR is more robust, but only works on datasets that contain more than 20,000 variants. We performed VQSR on all the samples that had at least 20,000 variants, while the rest were subjected to hard filtering criteria.
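The thresholds in this passage follow from the definition of a phred-scaled score, Q = -10 log10(P), where P is the probability that the call is wrong. The sketch below illustrates that relationship and the two decisions described above. It is not GATK code: the function names are hypothetical, and treating a score of exactly 30 as passing is an assumption, since the text only specifies "below 10", "between 10 and 30" and "above 30".

```python
def phred_to_error_probability(q):
    """Invert Q = -10 * log10(P): a score of 30 means P = 0.001."""
    return 10 ** (-q / 10)

def label_call(qual, emit_threshold=10.0, call_threshold=30.0):
    """Return None (not called), 'LowQual' or 'PASS' for a phred-scaled score.

    Assumption of this sketch: a score equal to call_threshold passes.
    """
    if qual < emit_threshold:
        return None
    if qual < call_threshold:
        return "LowQual"
    return "PASS"

def choose_filtering_strategy(n_variants, min_for_vqsr=20000):
    """VQSR for samples with at least 20,000 variants, hard filters otherwise."""
    return "VQSR" if n_variants >= min_for_vqsr else "hard-filter"

# Error probabilities at the two thresholds used in the text.
p_at_10 = phred_to_error_probability(10)
p_at_30 = phred_to_error_probability(30)
```

A score of 10 therefore corresponds to a 1 in 10 chance that the call is wrong, and a score of 30 to a 1 in 1,000 chance, which is why calls between the two thresholds are kept but flagged.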
VQSR was performed on SNVs in two steps: 1) using training datasets such as dbSNP, HapMap, Illumina’s Omni Genotype Array and the 1000 Genomes Project to first calculate confidence intervals of SNVs; 2) applying the newly generated quality score to each SNV (example lines of code in Figure 5.6 B and C) (Abecasis et al., 2012; Sherry et al., 2001). To run VQSR on INDELs, the same method was applied, but the training set was the gold standard INDELs (example lines of code in Figure 5.7 A and B) (Mills et al., 2011).

For the alternative hard-filtering method, we used the GATK tool SelectVariants to handle SNVs and INDELs separately and then applied specific filtering criteria to each. We first isolated the SNVs from the VCF file using SelectVariants (example line of code in Figure 5.8 A). SNVs with quality by depth (QD) <2, Fisher strand bias (FS) >60, mapping quality (MQ) <40, haplotype score >13, mapping quality rank sum <-12.5 or read position rank sum <-8 were then filtered out with the VariantFiltration tool within GATK (example line of code in Figure 5.8 B). INDELs were selected from the VCF file in the same manner as SNVs (example line of code in Figure 5.8 C). To filter INDELs, we used the parameters QD<2, FS>200 and read position rank sum <-20 (example line of code in Figure 5.8 D). We recombined the results of SNV filtering and INDEL filtering using GATK's CombineVariants tool in order to create a uniform VCF file (example line of code in Figure 5.8 E). The filtering process annotates every variant, which was displayed as either “PASS” for high-confidence variants or “LowQual” for low-confidence variants that did not meet the aforementioned criteria. Only variants that passed these criteria were included in the following filtering process.

Figure 5.4: Step-by-Step SNV and INDEL Calling Part 2.
(A) Example of using GATK's RealignerTargetCreator function. (B) Line of code showing how realignment around INDELs outputs a newly aligned BAM file. (C) Step 1 of base recalibration, which creates a file with analyzed covariation information. (D) Step 2 of base recalibration, where on-the-fly base recalibration is performed. (E) Generation of informational plots showing results before and after base recalibration. (F) Applying base recalibration scores to the original BAM file. Functions or specific tools are indicated in red, input files in blue, output files in green and automatically generated output files in orange.

(A)
java -jar GenomeAnalysisTK.jar \
  -T RealignerTargetCreator \
  -R hg19.fasta \
  -I WES_1_dedup_reads.bam \
  -known Gold_standard.indels.hg19.vcf \
  -o WES_1_target_intervals.list

(B)
java -jar GenomeAnalysisTK.jar \
  -T IndelRealigner \
  -R hg19.fasta \
  -I WES_1_dedup_reads.bam \
  -targetIntervals WES_1_target_intervals.list \
  -known Gold_standard.indels.hg19.vcf \
  -o WES_1_realigned_reads.bam

(C)
java -jar GenomeAnalysisTK.jar \
  -T BaseRecalibrator \
  -R hg19.fasta \
  -I WES_1_realigned_reads.bam \
  -knownSites dbsnp_138.hg19.vcf \
  -knownSites Gold_standard.indels.hg19.vcf \
  -o WES_1_recal_data.table

(D)
java -jar GenomeAnalysisTK.jar \
  -T BaseRecalibrator \
  -R hg19.fasta \
  -I WES_1_realigned_reads.bam \
  -knownSites dbsnp_138.hg19.vcf \
  -knownSites Gold_standard.indels.hg19.vcf \
  -BQSR WES_1_recal_data.table \
  -o WES_1_post_recal_data.table

(E)
java -jar GenomeAnalysisTK.jar \
  -T AnalyzeCovariates \
  -R hg19.fasta \
  -before WES_1_recal_data.table \
  -after WES_1_post_recal_data.table \
  -plots WES_1_recalibration_plots.pdf

(F)
java -jar GenomeAnalysisTK.jar \
  -T PrintReads \
  -R hg19.fasta \
  -I WES_1_realigned_reads.bam \
  -BQSR WES_1_recal_data.table \
  -o WES_1_recal_reads.bam
# automatically generated output: WES_1_recal_reads.bai

Figure 5.5: Sequence Alignments Before and After GATK-based Read Realignment. (A) Sequence reads after initial alignment using BWA at an INDEL site.
The arrow indicates a sequence read that is not aligned properly and would possibly result in a SNV call at that site. (B) Sequence reads after realignment by GATK's IndelRealigner and BaseRecalibrator tools. The end of the sequence read with a C was readjusted to map correctly to the reference genome, and it now includes the INDEL that is represented in the majority of the other sequence reads.

[Figure 5.5 image: sequence read alignments at an INDEL site, labeled PAK6, panels A and B.]

Figure 5.6: Step-by-Step SNV and INDEL Calling Part 3. (A) An example of the GATK variant calling function code. (B) SNV quality score recalibration from GATK. (C) Application of the variant quality score recalibration on SNVs. Functions or specific tools are indicated in red, input files in blue and output files in green.

(A)
java -jar GenomeAnalysisTK.jar \
  -T HaplotypeCaller \
  -R hg19.fasta \
  -I WES_1_recal_reads.bam \
  --genotyping_mode DISCOVERY \
  -stand_emit_conf 10 \
  -stand_call_conf 30 \
  -o WES_1_raw_variants.vcf

(B)
java -jar GenomeAnalysisTK.jar \
  -T VariantRecalibrator \
  -R hg19.fasta \
  -nt 16 \
  -input WES_1_raw_variants.vcf \
  -resource:hapmap,known=false,training=true,truth=true,prior=15.0 hapmap_hg19.vcf \
  -resource:omni,known=false,training=true,truth=true,prior=12.0 omni_hg19.vcf \
  -resource:1000G,known=false,training=true,truth=false,prior=10.0 1000G.hg19.vcf \
  -resource:dbsnp,known=true,training=false,truth=false,prior=2.0 dbsnp_138.hg19.vcf \
  -an DP -an QD -an FS -an MQRankSum -an ReadPosRankSum \
  -mode SNP \
  -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
  -recalFile WES_1_recalibrate_SNP.recal \
  -tranchesFile WES_1_recalibrate_SNP.tranches \
  -rscriptFile WES_1_recalibrate_SNP_plots.R

(C)
java -jar GenomeAnalysisTK.jar \
  -T ApplyRecalibration \
  -R hg19.fasta \
  -nt 16 \
  -input WES_1_raw_variants.vcf \
  -mode SNP --ts_filter_level 99.0 \
  -recalFile WES_1_recalibrate_SNP.recal \
  -tranchesFile WES_1_recalibrate_SNP.tranches \
  -o WES_1_recalibrated_snps_raw_indels.vcf

Figure 5.7: Step-by-Step SNV and INDEL Calling Part 4. (A) INDEL variant quality score recalibration using GATK. (B) Application of the variant recalibration on INDELs. Functions or specific tools are indicated in red, input files in blue and output files in green.

(A)
java -jar GenomeAnalysisTK.jar \
  -T VariantRecalibrator \
  -R hg19.fasta \
  -nt 16 \
  -input WES_1_recalibrated_snps_raw_indels.vcf \
  -resource:mills,known=true,training=true,truth=true,prior=12.0 Gold_indels.hg19.vcf \
  -an DP -an FS -an MQRankSum -an ReadPosRankSum \
  -mode INDEL \
  -tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 \
  --maxGaussians 4 \
  -recalFile WES_1_recalibrate_INDEL.recal \
  -tranchesFile WES_1_recalibrate_INDEL.tranches \
  -rscriptFile WES_1_recalibrate_INDEL_plots.R

(B)
java -jar GenomeAnalysisTK.jar \
  -T ApplyRecalibration \
  -R hg19.fasta \
  -nt 16 \
  -input WES_1_recalibrated_snps_raw_indels.vcf \
  -mode INDEL --ts_filter_level 99.0 \
  -recalFile WES_1_recalibrate_INDEL.recal \
  -tranchesFile WES_1_recalibrate_INDEL.tranches \
  -o WES_1_recalibrated_variants.vcf

Figure 5.8: Alternative to VQSR for SNV and INDEL Filtering. (A) Selecting SNVs from the list of raw variants using SelectVariants from GATK. (B) Applying filtering criteria to SNVs. (C) Selecting INDELs from the list of raw variants. (D) Applying filtering criteria to INDELs. (E) Combining the filtering information for SNVs and INDELs into one VCF file. Functions or specific tools are indicated in red, input files in blue and output files in green.
(A)
java -jar GenomeAnalysisTK.jar \
  -T SelectVariants \
  -R hg19.fasta \
  -V WES_1_raw_variants.vcf \
  -selectType SNP \
  -o WES_1_raw_snps.vcf

(B)
java -jar GenomeAnalysisTK.jar \
  -T VariantFiltration \
  -R hg19.fasta \
  -V WES_1_raw_snps.vcf \
  --filterExpression "QD < 2.0 || FS > 60.0 || MQ < 40.0 || HaplotypeScore > 13.0 || MappingQualityRankSum < -12.5 || ReadPosRankSum < -8.0" \
  --filterName "my_snp_filter" \
  -o WES_1_filtered_snps.vcf

(C)
java -jar GenomeAnalysisTK.jar \
  -T SelectVariants \
  -R hg19.fasta \
  -V WES_1_raw_variants.vcf \
  -selectType INDEL \
  -o WES_1_raw_indels.vcf

(D)
java -jar GenomeAnalysisTK.jar \
  -T VariantFiltration \
  -R hg19.fasta \
  -V WES_1_raw_indels.vcf \
  --filterExpression "QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0" \
  --filterName "my_indel_filter" \
  -o WES_1_filtered_indels.vcf

(E)
java -jar GenomeAnalysisTK.jar \
  -T CombineVariants \
  -R hg19.fasta \
  -V:WES_1_SNP WES_1_filtered_snps.vcf \
  -V:WES_1_INDEL WES_1_filtered_indels.vcf \
  -setKey null \
  -o WES_1_recalibrated_variants.vcf

Annotating, Filtering and Combining Multiple Samples

Each VCF file was processed further by adding variant coverage so that the highest quality SNVs and INDELs could be extracted. To do this, we used a combination of GATK tools and SnpEff. We used the VariantAnnotator tool within GATK to add the coverage of each variant to our VCF file (example line of code in Figure 5.9 A). SnpEff is a package that predicts the potential impact of each variant, and was used to prepare the file for later filtering based on this information (example line of code in Figure 5.9 B). Next, we used GATK's VariantAnnotator tool to add the SnpEff annotations to the VCF file so that it could be filtered by SnpSift (example line of code in Figure 5.9 C). The VCF was then partially filtered using this added information by setting specific rules for handling INDELs and SNVs separately.
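For reference, the hard-filtering expressions of Figure 5.8 B and D can be restated as a small Python sketch. It assumes the INFO annotations of a variant have already been parsed into a dictionary; the function names are hypothetical, and treating a missing annotation as not triggering its rule is an assumption of this sketch that may differ from how GATK evaluates missing values.

```python
# Each rule returns True when the annotation value should cause filtering.
SNV_RULES = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 60.0,
    "MQ": lambda v: v < 40.0,
    "HaplotypeScore": lambda v: v > 13.0,
    "MappingQualityRankSum": lambda v: v < -12.5,
    "ReadPosRankSum": lambda v: v < -8.0,
}

INDEL_RULES = {
    "QD": lambda v: v < 2.0,
    "FS": lambda v: v > 200.0,
    "ReadPosRankSum": lambda v: v < -20.0,
}


def failed_rules(info, rules):
    """Return the names of the annotations that trigger a filter (OR logic)."""
    return [name for name, is_bad in rules.items()
            if name in info and is_bad(info[name])]


def hard_filter(info, variant_type):
    """Return 'PASS' or the filter name used in Figure 5.8 for this variant type."""
    if variant_type == "SNP":
        rules, filter_name = SNV_RULES, "my_snp_filter"
    else:
        rules, filter_name = INDEL_RULES, "my_indel_filter"
    return filter_name if failed_rules(info, rules) else "PASS"
```

Because the expressions are joined with `||`, a single failing annotation is enough to filter a variant.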
INDEL filtering relied on two pieces of information stored in the VCF files from the previous steps. INDELs with a GATK variant quality score ≥30 and a “PASS” flag from the previous GATK steps were retained for downstream analyses (example lines of code in Figure 5.7 B and Figure 5.10 A top). To filter SNVs, we used the depth of sequencing (DP), genotype quality score (GQ), allele frequency (AF) and allele sequencing depth (AD) information contained in the VCF file. We kept SNVs with both DP≥10 and GQ>60 and either AF=0.5 and AD≥5 or AF=1 and AD<3 (example line of code in Figure 5.10 A bottom). These rules eliminated low-frequency mutations that lacked sufficient sequencing coverage and those for which the allele count could not be determined because of either low sequence coverage or an insufficient number of paired reads. Following these rules, the INDELs and SNVs that remained have a 99.9% chance of being correct.

Figure 5.9: Variant Annotation Preparation. (A) Annotation of variants using the VariantAnnotator tool from GATK by adding variant coverage information. (B) Using SnpEff to prepare the VCF file for the addition of functional mutation information. (C) Adding the newly prepared SnpEff information to the VCF file. Functions or specific tools are indicated in red, input files in blue and output files in green.
(A)
java -Xmx2g -jar GenomeAnalysisTK.jar \
  -R hg19.fasta \
  -nt 8 \
  -T VariantAnnotator \
  -I WES_1_recal_reads.bam \
  -o WES_1_recalibrated_variants_annotated.vcf \
  -A Coverage \
  --variant WES_1_recalibrated_variants.vcf \
  --dbsnp dbsnp_138.hg19.vcf

(B)
java -Xmx4g -jar snpEff/snpEff.jar \
  -v -o gatk \
  GRCh37.64 \
  WES_1_recalibrated_variants_annotated.vcf \
  > WES_1_recalibrated_variants_annotated_snpeff.vcf

(C)
java -Xmx4g -jar GenomeAnalysisTK.jar \
  -T VariantAnnotator \
  -nt 16 \
  -R hg19.fasta \
  -A SnpEff \
  --variant WES_1_recalibrated_variants_annotated.vcf \
  --snpEffFile WES_1_recalibrated_variants_annotated_snpeff.vcf \
  -o WES_1_recalibrated_variants_annotated_snpeff_GATK.vcf

Figure 5.10: Filtering, Combining and Annotating VCF Files. (A) Filtering INDELs using the SnpSift function within SnpEff with specific quality scores (top). Filtering SNVs by coverage, allele frequency and genotype quality (bottom). (B) Combining multiple WES samples into one merged VCF file using the GATK CombineVariants function. (C) Extracting specific pieces of information from the merged VCF to generate a user-friendly tab-delimited file. Functions or specific tools are indicated in red, input files in blue and output files in green.
(A, top)
cat WES_1_recalibrated_variants_annotated_snpeff_GATK.vcf | \
  java -jar snpEff/SnpSift.jar \
  filter "((( exists INDEL ) & (QUAL >= 20)) | (QUAL >= 30 )) & (FILTER = 'PASS')" \
  > WES_1_recalibrated_variants_annotated_snpeff_GATK_INDEL_filt.vcf

(A, bottom)
cat WES_1_recalibrated_variants_annotated_snpeff_GATK_INDEL_filt.vcf | \
  java -jar snpEff/SnpSift.jar \
  filter "(DP >= 10) & ( GEN[0].GQ > 60 ) & ((( AF = 0.500 ) & ( GEN[0].AD[0] >= 5 )) | ((AF = 1.00 ) & ( GEN[0].AD[0] < 3 )))" \
  > WES_1_recalibrated_variants_annotated_snpeff_GATK_Filtered.vcf

(B)
java -jar GenomeAnalysisTK.jar \
  -T CombineVariants \
  -R hg19.fasta \
  -V:WES_1 WES_1_recalibrated_variants_annotated_snpeff_GATK_Filtered.vcf \
  -V:WES_2 WES_2_recalibrated_variants_annotated_snpeff_GATK_Filtered.vcf \
  -o ALL_WES_recalibrated_variants_annotated_snpeff_GATK_Filtered.vcf

(C)
java -jar snpEff/SnpSift.jar \
  extractFields -s "," -e "." \
  ALL_WES_recalibrated_variants_annotated_snpeff_GATK_Filtered.vcf \
  CHROM POS ID REF ALT QUAL \
  "GEN[*].GT" "GEN[*].AD" "GEN[*].DP" "GEN[*].GQ" "GEN[*].PL" \
  AC AF AN DP MQ QD \
  SNPEFF_AMINO_ACID_CHANGE SNPEFF_EFFECT SNPEFF_EXON_ID \
  SNPEFF_FUNCTIONAL_CLASS SNPEFF_GENE_BIOTYPE \
  SNPEFF_GENE_NAME SNPEFF_IMPACT set \
  > ALL_WES_recalibrated_variants_annotated_snpeff_GATK_Filtered_snpEff.txt

Since we generated WES data on multiple samples, we combined all variants into one dataset for further analysis. The GATK CombineVariants function was used to merge the individually filtered VCF files into one VCF file (example line of code in Figure 5.10 B). The resulting merged VCF file was then reformatted so that the relevant information was extracted and organized into a usable file type. The SnpSift function within SnpEff was used to convert the merged VCF file to a text file (example line of code in Figure 5.10 C).
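As a supplementary illustration, the two SnpSift filter expressions in Figure 5.10 A can be restated in Python. This sketch mirrors the expressions as written in the figure and assumes the relevant VCF fields have already been parsed into plain values; the function and parameter names are hypothetical.

```python
def passes_quality(qual, filter_field, is_indel):
    """First expression: (INDEL with QUAL >= 20, or any variant with QUAL >= 30) and FILTER = PASS."""
    quality_ok = (is_indel and qual >= 20) or qual >= 30
    return quality_ok and filter_field == "PASS"


def passes_genotype(dp, gq, af, ad0):
    """Second expression: coverage, genotype quality and allele-frequency rules.

    ad0 is the first AD value (GEN[0].AD[0] in the expression); in GATK
    output this is the read count supporting the reference allele.
    """
    heterozygous = af == 0.5 and ad0 >= 5   # AF = 0.500 with at least 5 reference reads
    homozygous = af == 1.0 and ad0 < 3      # AF = 1.00 with fewer than 3 reference reads
    return dp >= 10 and gq > 60 and (heterozygous or homozygous)


def keep_variant(qual, filter_field, is_indel, dp, gq, af, ad0):
    """Apply both expressions in sequence, as the two piped commands do."""
    return passes_quality(qual, filter_field, is_indel) and passes_genotype(dp, gq, af, ad0)
```

For example, a variant with QUAL 1699.77, a PASS flag, DP 53, GQ 99, AF 1.00 and one reference read satisfies both expressions.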
Using SnpSift, we selected the pieces of information needed for further processing, and included the following fields: chromosome, genomic position, reference allele, alternative allele, quality score, genotype, genotype depth, genotype quality score, allele count, allele frequency, allele number, coverage depth, mapping quality, amino acid change, variant effect, exon affected, variant functional class, gene class, gene name, variant impact and the sample(s) in which the variant was found. The resulting variant list is in a format suitable for the final filtering steps and analysis.

Final Filtering and Isolation of Variants

Since we were interested in variants with potential functional effects that might cause phenotypic changes, we isolated non-synonymous variants in protein-coding regions. Using the 55 WES datasets from our lab as an example, the general filtering described in the aforementioned steps resulted in a total of 473,988 variants before the final filtering criteria were applied (Figure 5.11 A). To make this list of variants manageable, we applied specific rules to clean up our dataset. We first removed variants found in intergenic regions or regions with no association with a gene, leaving 115,896 total variants (Figure 5.11 A). The resulting list was further refined by removing any variant that was common in the population, with a minor allele frequency >1%, or that could be mapped to multiple locations. This was performed using the dbSNP_142 build, and resulted in 40,272 total variants. We next removed any variants that fell in repetitive regions because of the high level of sequencing errors that occur in these regions, leaving 38,969 total variants. Since dbSNP is constantly updated and may not yet list every polymorphism, we also removed any variants that were represented in more than three of our samples or were present in the corresponding normal tissue samples.
As a result of this filtering, a total of 33,414 variants remained. Since our interest was in identifying potential functional mutations, we eliminated any mutation located outside of protein-coding regions, resulting in 25,574 variants. Additionally, we removed synonymous variants, leaving 16,894 potentially biologically relevant variants (Figure 5.11 A). Although 16,894 is a large number, it summarizes 55 combined WES samples, with an average of 466 variants per sample (range 201 to 1,068) (Figure 5.11 B). The importance of the filtering and annotation process can be visualized in Figure 5.12. Initially, we started with a raw VCF file containing information on SNVs or INDELs generated by the GATK HaplotypeCaller tool (Figure 5.6 A), but the format was difficult to decipher and use (Figure 5.12 A). Through the process described above, we clearly annotated and filtered the variants, and exported the results in an interpretable file format (Figure 5.12 B). To further demonstrate the importance of the aforementioned filtering, we used the ARID1A gene, in which three different variants were initially identified.

Figure 5.11: Isolating DNA Sequence Variants for 55 WES Samples. (A) The numbers on the left indicate the number of variants still present in the dataset after the filtering criterion on the right was applied. (B) Final count of mutations per individual WES sample from the 16,894 total variants in the final data file.

(A)
473,988   Partially filtered variants from 55 WES samples
115,896   Only variants which have a potential impact on the gene function or expression
40,272    Filtered common SNVs with minor allele frequency >1% and mapping to multiple locations using dbSNP
38,969    Removed variants in repeat regions
33,414    Removed variants that were represented in more than 3 samples or for which the corresponding normal showed the variant as a germline event
25,574    Variants which are only found in protein coding regions
16,894    Variants which are considered non-synonymous

(B)
[Plot: number of mutations (y-axis, 0 to 1200) for each WES sample (x-axis, samples 1 to 55).]

Figure 5.12: Pre- and Post-Analyses of Variant Files. (A) Examples of raw variants in VCF format from the GATK HaplotypeCaller before any annotations were added. (B) Example of annotated file using the methods described in the text. The red colored line indicates retention of the variant, the blue content represents an example line that was filtered because of insufficient frequency in this specific data set, and green indicates a filtered variant based on the minor allele frequency from dbSNP.

(A)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 1
chr1 27107116 . A T 1699.77 . AC=2;AF=1.00;AN=2;BaseQRankSum=-0.666;ClippingRankSum=0.600;DP=53;FS=0.000;MLEAC=2;MLEAF=1.00;MQ=60.00;MQ0=0;MQRankSum=0.600;QD=32.07;ReadPosRankSum=-1.266 GT:AD:DP:GQ:PL 1/1:1,51:52:99:1728,118,0
chr1 27107650 . TAA T 137.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.201;ClippingRankSum=1.524;DP=25;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.201;QD=5.51;ReadPosRankSum=0.299 GT:AD:DP:GQ:PL 0/1:7,7:14:99:175,0,159
chr1 27107263 rs3841356 T TC 716.73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.609;ClippingRankSum=-0.997;DP=70;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.15;MQ0=0;MQRankSum=-0.748;QD=0.50;ReadPosRankSum=-0.582 GT:AD:DP:GQ:PL 0/1:46,11:57:73:73,0,962

(B) (fields shown as rows, one column per variant)
Field                      Variant 1        Variant 2                  Variant 3
#CHROM                     chr1             chr1                       chr1
POS                        27107116         27107650                   27107263
ID                         .                .                          rs3841356
REF                        A                TAA                        T
ALT                        T                TA,T                       TC
QUAL                       1699.77          110.73                     716.73
GEN[*].GT                  1/1              0/1                        0/1
GEN[*].AD                  1,51             .                          25,33
GEN[*].DP                  52               24                         58
GEN[*].GQ                  99               99                         99
GEN[*].PL                  1728,118,0       .                          754,0,540
AC                         2                1,0                        1
AF                         1                0.500,0.00                 0.5
AN                         2                2                          2
DP                         59               628                        269
MQ                         60               .                          .
QD                         32.07            .                          .
SNPEFF_AMINO_ACID_CHANGE   K1860*           .                          .
SNPEFF_EFFECT              STOP_GAINED      DOWNSTREAM                 DOWNSTREAM
SNPEFF_EXON_ID             19               .                          .
SNPEFF_FUNCTIONAL_CLASS    NONSENSE         NONE                       NONE
SNPEFF_GENE_BIOTYPE        protein_coding   nonsense_mediated_decay    protein_coding
SNPEFF_GENE_NAME           ARID1A           ARID1A                     ARID1A
SNPEFF_IMPACT              HIGH             MODIFIER                   MODIFIER
set                        WES_1            WES_1-WES_2-WES_3-WES_4    WES_1

Two ARID1A variants were filtered from the final dataset either by their frequency in the entire WES dataset or by dbSNP, as shown in blue and green, respectively. The variant shown in red is an example of a SNV that passes all the filtering criteria and is ready for interpretation of its biological significance (Figure 5.12 B).

Predicting Copy Number Variations from WES Sequencing Data

Another use of WES data is the prediction of copy number variations (CNVs). There are many methods available to call CNVs; Control-FREEC is a free and reliable tool that is compatible with WES data (Boeva et al., 2012; Boeva et al., 2011). This tool takes the previously generated BAM files as input and identifies CNVs across all exonic regions that are sufficiently covered in WES samples. To prepare for CNV calling, the BAM file from Figure 5.4 F was converted to a BED file using bedtools (example line of code in Figure 5.13 A). We used bedtools genome coverage (genomecov) to determine the coverage of all the exonic regions for Control-FREEC using a floating window (example line of code in Figure 5.13 A). Control-FREEC requires a configuration file that specifies the parameters for CNV calling (example of configuration file in Figure 5.13 B). This file contains the standard protocol that was determined to be sufficient for WES data. We specifically set window and step sizes to 500 and 250 bp, respectively. The window refers to the number of base pairs (bp) averaged together, while the step size is the number of bp the window moves after each averaging event. We also specified the expected ploidy, the minimum read count before calling a CNV and a reference CNV file.

Figure 5.13: CNV Using Control-FREEC.
(A) Use of bedtools to generate a BED file from the BAM file. (B) Example of Control-FREEC configuration file for WES. (C) Running Control-FREEC, followed by assigning p-values to the called CNVs using an R script available from Control-FREEC. Functions or specific tools are indicated in red, input files in blue and output files in green. (D) Example of CNVs for an individual WES sample. Blue is loss of CN, red is gain of CN and green is normal CN.

(A)
bedtools genomecov -ibam WES_1_dedup_reads.bam -bg -scale FLOAT -g HG19_Chrom_Sizes.txt > WES_1_Normalized_RC.bed

(B)
[general]
chrLenFile = hg19length.len
noisyData = TRUE
ploidy = 2
readCountThreshold = 50
samtools = samtools
sex = XY
step = 250
window = 500
breakPointThreshold = 1.5
forceGCcontentNormalization = 2
printNA = FALSE
breakPointType = 4
maxThreads = 6
chrFiles = ChromosomeFASTQ
degree = 3

[sample]
mateFile = WES_1_Normalized_RC.bed
inputFormat = BED
mateOrientation = 0

[control]
mateFile = ALL_Normalized_RC.bed
inputFormat = BED
mateOrientation = 0

[target]
captureRegions = truseq_exome_targeted_regions.hg19.bed

(C)
freec -conf config_WES_1_Normalized.txt
cat assess_significance.R | R --slave --args WES_1_Normalized_RC.bed_CNVs WES_1_Normalized_RC.bed_ratio.txt

(D)
[Plot: copy number (y-axis, ticks 0 to 6) versus position on chromosome 3 (x-axis, 0 to 2.0e+08).]

The reference CNV file is important for WES data; a summary of 15 normal WES samples was used as the control for comparison. These WES samples had no CNV changes and provided a baseline against which to compare all tumor samples. CNVs called by Control-FREEC were statistically analyzed using both the Wilcoxon test and the Kolmogorov-Smirnov test (example lines of code in Figure 5.13 C), and CNVs with p<0.001 were kept. To visualize the CNVs for a WES sample, scatterplots were generated for each chromosome showing the CNV versus position of variation (Figure 5.13 D).
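The window and step parameters, and the p-value cutoff applied to the called CNVs, can be illustrated with a minimal Python sketch. This is not how Control-FREEC is implemented: the function names are hypothetical, the shortened windows at the end of a region are an assumption, and requiring both tests to reach p<0.001 is one reading of the text, which does not say whether one or both tests had to be significant.

```python
def sliding_windows(region_start, region_end, window=500, step=250):
    """Yield half-open (start, end) windows; consecutive windows overlap by window - step."""
    start = region_start
    while start < region_end:
        # The final windows are truncated at the region end (an assumption of this sketch).
        yield (start, min(start + window, region_end))
        start += step


def window_means(depths, window=500, step=250):
    """Average per-base depth within each window of a list of depths."""
    return [sum(depths[s:e]) / (e - s)
            for s, e in sliding_windows(0, len(depths), window, step)]


def keep_cnv(wilcoxon_p, ks_p, alpha=0.001):
    """Keep a CNV only if both test p-values are below alpha (assumed interpretation)."""
    return wilcoxon_p < alpha and ks_p < alpha
```

With a 500 bp window and a 250 bp step, each base away from the region edges contributes to two consecutive averages, which smooths the coverage profile before breakpoints are sought.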
DISCUSSION

WES technology represents a significant advancement in high-throughput genomic analysis of the coding regions of more than 20,000 genes. The ability to compare normal and tumor DNA sequences allows for the identification of variations potentially linked to a disease. With these advances in sequencing technology, however, challenges arise in analyzing the generated data and extracting relevant information. Here I describe how WES data were quality checked, mapped to a reference human genome build, realigned around called INDELs, annotated for both SNVs and INDELs, annotated with publicly available datasets and gene information, filtered using specific criteria and evaluated for function. All steps were performed using resources that have been shown to provide a quality analysis of WES data.

Detecting and calling SNVs and INDELs from WES data is a tedious process but can now be streamlined using freely available resources. However, the interpretation of these results in the context of specific diseases is still challenging. Although publicly available databases, such as dbSNP and 1000 Genomes, help provide population-based information on variants, the majority of variants are of unknown significance (Xue et al., 2015). The distinction between a rare polymorphism that has a low minor allele frequency and a disease-causing variant is still not clear. As more SNVs in potentially disease-causing genes are identified, the confidence in the variants identified from WES and GWAS studies will increase, although INDELs are still challenging. This indicates the necessity of verifying NGS datasets with additional sample collections in order to definitively link specific variants in a certain gene to a particular disease.

In summary, robust identification and calling of SNVs and INDELs from WES datasets was performed using verified and established tools like GATK, Picard, BWA and SnpEff.
Proper mapping of sequence reads was showcased by realigning reads around INDELs, and the filtering and annotation processes were shown to lead to specific variants for further interrogation (Figure 5.5 and Figure 5.12). These tools, used with the aforementioned parameters and with filtering criteria based on the information they generate, can effectively yield high-confidence variants for further studies.

CHAPTER 6 DISCUSSION

SUMMARY

Comprehensive cancer genetic and epigenetic studies using NGS tools reveal a high frequency of mutations in epigenetic-related genes as well as distinct alterations to the epigenome (Kundaje et al., 2015; Vogelstein et al., 2013; You and Jones, 2012). Bladder cancer is enriched for mutations in genes that regulate the epigenome, in addition to altered DNA methylation patterns (Gui et al., 2011; Network, 2014a; Reinert et al., 2011; Wolff et al., 2010b). Over 75% of muscle-invasive bladder cancers had at least one inactivating mutation in a chromatin regulatory gene, indicating potentially new therapeutic pathways for bladder cancer (Network, 2014a). Therapies that use drugs to target chromatin modifications could prove useful for bladder cancer treatments based on their mutation spectra (Filippakopoulos et al., 2010). While genetic alterations are prevalent in cancer, epigenome alterations occur more frequently and represent a unique area for diagnosing and treating cancer. Thus the focus of this thesis was to further elucidate the role of epigenetic and genetic alterations in bladder tumorigenesis as well as to interrogate the consequences of aberrant expression of specific epigenetic machinery. An ideal therapy to treat cancer would target epigenetic machinery such as DNA methylation with inhibitors as well as tumorigenic cells with cytotoxic drugs. To be able to provide this directed treatment, one must characterize and understand the aberrations present in both the epigenome and genome of a specific disease.
First, I comprehensively characterized the genetic and epigenetic landscapes of mainly high-stage muscle-invasive UCC-derived cell lines. Each in vitro model system was characterized using WES, DNA methylation arrays and the AcceSssIble assay, with comparisons to uncultured primary tumor data. Global analysis of mutations and DNA methylation revealed similar findings in the cell lines and the uncultured tumors. Alterations to the epigenome explained more downstream gene expression changes than genetic alterations did. Additionally, aberrantly expressed genes were explained more by chromatin accessibility-dependent changes than by DNA methylation changes. This study also provides a basis for studying different aspects of the disease with an emphasis on genetic and epigenetic backgrounds.

Having characterized high-stage muscle-invasive UCC using cell lines, I focused on uncultured tumors that were low-stage non-muscle-invasive UCC. This study was done on 13 patients with metachronous tumors, with a focus on the genetic and epigenetic aberrations that were shared between tumors from the same patient. I found that in all cases of low-stage recurrent bladder cancer there was a common ancestral cell population that had mutations in key genes including MLL2 and MUC16. Additionally, I found that epigenetic alterations occurred more frequently than genetic alterations and could also be used to detect clonal cell populations.

Due to the high level of DNA methylation alterations that occur during tumorigenesis, I sought to understand how DNA methyltransferases affect the methylome. To do this, I interrogated the ability of DNA methyltransferase 3B isoforms to establish DNA methylation in a knockout cell line. I found that DNMT3B isoforms preferentially targeted gene bodies as accessory proteins to DNMT3A.
I also confirmed DNMT3B's functional ability to restore methylation after 5-Aza-CdR treatment at target regions marked with the H3K36me3 modification. In addition, pan-cancer analysis revealed overexpression of DNMT3B isoforms in cancer, indicating a potential relevance to the aberrant DNA methylation patterns seen in cancerous tissues.

Finally, I designed an analysis pipeline to process next-generation sequence data with the goal of detecting somatic variants. Analysis of whole-exome sequence data remains challenging both technically and logistically, even with publicly available resources. High-quality somatic variants are difficult to detect without sufficient sequencing coverage and without addressing the heterogeneous cell populations in primary tissue samples. Using publicly available databases and resources as well as strict technical parameters to detect somatic variants, I have addressed these issues and applied the pipeline to both cell lines and patient samples. Further study using higher genomic coverage is necessary to detect even smaller subclonal cell populations in primary tumors, and could potentially identify the first tumor-initiating event, whether it be genetic or epigenetic.

FINAL CONCLUSION AND PERSPECTIVE

Applications utilizing genome-wide approaches to study genetics and epigenetics allow for detailed analysis and discovery of alterations in cancer, and this combination will provide foundations for personalized medicine. Integrating epigenome information and comparing different tissue types to one another has provided a solid basis for studying the molecular events of human diseases (Kundaje et al., 2015). As WES and GWAS studies find more and more SNVs in potentially disease-causing genes, the confidence in the variants will increase, but this is only possible as sample sizes increase as well.
Understanding how to process and identify the key mutational or epigenetic drivers is critical for elucidating the molecular mechanisms guiding and initiating tumorigenesis (Vogelstein et al., 2013). My own study barely scratched the surface of understanding the interplay between genetics and epigenetics and their roles in carcinogenesis, but the genome-wide data that I have gathered and analyzed can provide a foundation for future research to elucidate mechanisms of the disease.

REFERENCES:

Aapola, U., Kawasaki, K., Scott, H.S., Ollila, J., Vihinen, M., Heino, M., Shintani, A., Minoshima, S., Krohn, K., Antonarakis, S.E., et al. (2000). Isolation and initial characterization of a novel zinc finger gene, DNMT3L, on 21q22.3, related to the cytosine-5-methyltransferase 3 gene family. Genomics 65, 293-298.
Abecasis, G.R., Auton, A., Brooks, L.D., DePristo, M.A., Durbin, R.M., Handsaker, R.E., Kang, H.M., Marth, G.T., McVean, G.A., and Consortium, G.P. (2012). An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56-65.
Alexandrov, L.B., Nik-Zainal, S., Wedge, D.C., Aparicio, S.A., Behjati, S., Biankin, A.V., Bignell, G.R., Bolli, N., Borg, A., Børresen-Dale, A.L., et al. (2013). Signatures of mutational processes in human cancer. Nature 500, 415-421.
Armitage, P. (1985). Multistage models of carcinogenesis. Environ Health Perspect 63, 195-201.
Babjuk, M., Burger, M., Zigeuner, R., Shariat, S.F., van Rhijn, B.W., Compérat, E., Sylvester, R.J., Kaasinen, E., Böhle, A., Palou Redorta, J., et al. (2013). EAU guidelines on non-muscle-invasive urothelial carcinoma of the bladder: update 2013. Eur Urol 64, 639-653.
Bakkar, A.A., Wallerand, H., Radvanyi, F., Lahaye, J.B., Pissard, S., Lecerf, L., Kouyoumdjian, J.C., Abbou, C.C., Pairon, J.C., Jaurand, M.C., et al. (2003). FGFR3 and TP53 gene mutations define two distinct pathways in urothelial cell carcinoma of the bladder. Cancer Res 63, 8108-8112.
Balbás-Martínez, C., Sagrera, A., Carrillo-de-Santa-Pau, E., Earl, J., Márquez, M., Vazquez, M., Lapi, E., Castro-Giner, F., Beltran, S., Bayés, M., et al. (2013). Recurrent inactivation of STAG2 in bladder cancer is not associated with aneuploidy. Nat Genet 45, 1464-1469. Barretina, J., Caponigro, G., Stransky, N., Venkatesan, K., Margolin, A.A., Kim, S., Wilson, C.J., Lehár, J., Kryukov, G.V., Sonkin, D., et al. (2012). The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 483, 603-607. Batsché, E., Muchardt, C., Behrens, J., Hurst, H.C., and Crémisi, C. (1998). RB and c-Myc activate expression of the E-cadherin gene in epithelial cells through interaction with transcription factor AP-2. Mol Cell Biol 18, 3647-3658. Baubec, T., Colombo, D.F., Wirbelauer, C., Schmidt, J., Burger, L., Krebs, A.R., Akalin, A., and Schübeler, D. (2015). Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature. Baylin, S.B., and Jones, P.A. (2011). A decade of exploring the cancer epigenome - biological and translational implications. Nat Rev Cancer 11, 726-734. Bell, O., Tiwari, V.K., Thomä, N.H., and Schübeler, D. (2011). Determinants and dynamics of genome accessibility. Nat Rev Genet 12, 554-564. Bentley, J., Diggle, C.P., Harnden, P., Knowles, M.A., and Kiltie, A.E. (2004). DNA double strand break repair in human bladder cancer is error prone and involves microhomology-associated end-joining. Nucleic Acids Res 32, 5249-5259. Bernstein, B.E., Liu, C.L., Humphrey, E.L., Perlstein, E.O., and Schreiber, S.L. (2004). Global nucleosome occupancy in yeast. Genome Biol 5, R62. Bilgrami, S.M., Qureshi, S.A., Pervez, S., and Abbas, F. (2014). Promoter hypermethylation of tumor suppressor genes correlates with tumor grade and invasiveness in patients with urothelial bladder cancer. Springerplus 3, 178. 
Billerey, C., Chopin, D., Aubriot-Lorton, M.H., Ricol, D., Gil Diez de Medina, S., Van Rhijn, B., Bralet, M.P., Lefrere-Belda, M.A., Lahaye, J.B., Abbou, C.C., et al. (2001). Frequent FGFR3 mutations in papillary non-invasive bladder (pTa) tumors. Am J Pathol 158, 1955-1959. Biniszkiewicz, D., Gribnau, J., Ramsahoye, B., Gaudet, F., Eggan, K., Humpherys, D., Mastrangelo, M.A., Jun, Z., Walter, J., and Jaenisch, R. (2002). Dnmt1 overexpression causes genomic hypermethylation, loss of imprinting, and embryonic lethality. Mol Cell Biol 22, 2124-2135. Bioinformatics, B. FASTQC Available at: http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc. Bird, A. (2002). DNA methylation patterns and epigenetic memory. Genes Dev 16, 6-21. Bishop, J.M. (1983). Cellular oncogenes and retroviruses. Annu Rev Biochem 52, 301-354. Boeva, V., Popova, T., Bleakley, K., Chiche, P., Cappo, J., Schleiermacher, G., Janoueix-Lerosey, I., Delattre, O., and Barillot, E. (2012). Control-FREEC: a tool for assessing copy number and allelic content using next-generation sequencing data. Bioinformatics 28, 423-425. Boeva, V., Zinovyev, A., Bleakley, K., Vert, J.P., Janoueix-Lerosey, I., Delattre, O., and Barillot, E. (2011). Control-free calling of copy number alterations in deep-sequencing data using GC-content normalization. Bioinformatics 27, 268-269. Bostick, M., Kim, J.K., Estève, P.O., Clark, A., Pradhan, S., and Jacobsen, S.E. (2007). UHRF1 plays a role in maintaining DNA methylation in mammalian cells. Science 317, 1760-1764. Bourc'his, D., Xu, G.L., Lin, C.S., Bollman, B., and Bestor, T.H. (2001). Dnmt3L and the establishment of maternal genomic imprints. Science 294, 2536-2539. Bozic, I., Antal, T., Ohtsuki, H., Carter, H., Kim, D., Chen, S., Karchin, R., Kinzler, K.W., Vogelstein, B., and Nowak, M.A. (2010). Accumulation of driver and passenger mutations during tumor progression. Proc Natl Acad Sci U S A 107, 18545-18550. Brandau, S., and Böhle, A. (2001). Bladder cancer. I. 
Molecular and genetic basis of carcinogenesis. Eur Urol 39, 491-497. Brown, R.C., Pattison, S., van Ree, J., Coghill, E., Perkins, A., Jane, S.M., and Cunningham, J.M. (2002). Distinct domains of erythroid Krüppel-like factor modulate chromatin remodeling and transactivation at the endogenous beta-globin gene promoter. Mol Cell Biol 22, 161-170. Burns, M.B., Temiz, N.A., and Harris, R.S. (2013). Evidence for APOBEC3B mutagenesis in multiple human cancers. Nat Genet 45, 977-983. Byun, H.M., Wong, H.L., Birnstein, E.A., Wolff, E.M., Liang, G., and Yang, A.S. (2007). Examination of IGF2 and H19 loss of imprinting in bladder cancer. Cancer Res 67, 10753-10758. Cairns, P. (2007). Gene methylation and early detection of genitourinary cancer: the road ahead. Nat Rev Cancer 7, 531-543. Cairns, P., Mao, L., Merlo, A., Lee, D.J., Schwab, D., Eby, Y., Tokino, K., van der Riet, P., Blaugrund, J.E., and Sidransky, D. (1994). Rates of p16 (MTS1) mutations in primary tumors with 9p loss. Science 265, 415-417. Cappellen, D., De Oliveira, C., Ricol, D., de Medina, S., Bourdin, J., Sastre-Garau, X., Chopin, D., Thiery, J.P., and Radvanyi, F. (1999). Frequent activating mutations of FGFR3 in human bladder and cervix carcinomas. Nat Genet 23, 18-20. Chatterjee, S.J., George, B., Goebell, P.J., Alavi-Tafreshi, M., Shi, S.R., Fung, Y.K., Jones, P.A., Cordon-Cardo, C., Datar, R.H., and Cote, R.J. (2004). Hyperphosphorylation of pRb: a mechanism for RB tumour suppressor pathway inactivation in bladder cancer. J Pathol 203, 762-770. Chen, T., Hevi, S., Gay, F., Tsujimoto, N., He, T., Zhang, B., Ueda, Y., and Li, E. (2007). Complete inactivation of DNMT1 leads to mitotic catastrophe in human cancer cells. Nat Genet 39, 391-396. Chen, T., and Li, E. (2006). Establishment and maintenance of DNA methylation patterns in mammals. Curr Top Microbiol Immunol 301, 179-201. Chen, T., Ueda, Y., Dodge, J.E., Wang, Z., and Li, E. (2003). 
Establishment and maintenance of genomic methylation patterns in mouse embryonic stem cells by Dnmt3a and Dnmt3b. Mol Cell Biol 23, 5594-5605. Chen, T., Ueda, Y., Xie, S., and Li, E. (2002). A novel Dnmt3a isoform produced from an alternative promoter localizes to euchromatin and its expression correlates with active de novo methylation. J Biol Chem 277, 38746-38754. Cheng, L., Gu, J., Ulbright, T.M., MacLennan, G.T., Sweeney, C.J., Zhang, S., Sanchez, K., Koch, M.O., and Eble, J.N. (2002). Precise microdissection of human bladder carcinomas reveals divergent tumor subclones in the same tumor. Cancer 94, 104-110. Chi, P., Allis, C.D., and Wang, G.G. (2010). Covalent histone modifications--miswritten, misinterpreted and mis-erased in human cancers. Nat Rev Cancer 10, 457-469. Chissoe, S.L., Bodenteich, A., Wang, Y.F., Wang, Y.P., Burian, D., Clifton, S.W., Crabtree, J., Freeman, A., Iyer, K., and Jian, L. (1995). Sequence and analysis of the human ABL gene, the BCR gene, and regions involved in the Philadelphia chromosomal translocation. Genomics 27, 67-82. Choi, S.H., Heo, K., Byun, H.M., An, W., Lu, W., and Yang, A.S. (2011). Identification of preferential target sites for human DNA methyltransferases. Nucleic Acids Res 39, 104-118. Cingolani, P., Platts, A., Wang, l.L., Coon, M., Nguyen, T., Wang, L., Land, S.J., Lu, X., and Ruden, D.M. (2012). A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80-92. Clement, N.L., Snell, Q., Clement, M.J., Hollenhorst, P.C., Purwar, J., Graves, B.J., Cairns, B.R., and Johnson, W.E. (2010). The GNUMAP algorithm: unbiased probabilistic mapping of oligonucleotides from next-generation sequencing. Bioinformatics 26, 38-45. Cote, R.J., Dunn, M.D., Chatterjee, S.J., Stein, J.P., Shi, S.R., Tran, Q.C., Hu, S.X., Xu, H.J., Groshen, S., Taylor, C.R., et al. (1998). 
Elevated and absent pRb expression is associated with bladder cancer progression and has cooperative effects with p53. Cancer Res 58, 1090-1094. Dalgliesh, G.L., Furge, K., Greenman, C., Chen, L., Bignell, G., Butler, A., Davies, H., Edkins, S., Hardy, C., Latimer, C., et al. (2010). Systematic sequencing of renal carcinoma reveals inactivation of histone modifying genes. Nature 463, 360-363. De Carvalho, D.D., Sharma, S., You, J.S., Su, S.F., Taberlay, P.C., Kelly, T.K., Yang, X., Liang, G., and Jones, P.A. (2012). DNA methylation screening identifies driver epigenetic events of cancer cell survival. Cancer Cell 21, 655-667. di Martino, E., L'Hôte, C.G., Kennedy, W., Tomlinson, D.C., and Knowles, M.A. (2009). Mutant fibroblast growth factor receptor 3 induces intracellular signaling and cellular transformation in a cell type- and mutation-specific manner. Oncogene 28, 4306-4316. Di Noia, J.M., and Neuberger, M.S. (2007). Molecular mechanisms of antibody somatic hypermutation. Annu Rev Biochem 76, 1-22. Dudziec, E., Gogol-Döring, A., Cookson, V., Chen, W., and Catto, J. (2012). Integrated epigenome profiling of repressive histone modifications, DNA methylation and gene expression in normal and malignant urothelial cells. PLoS One 7, e32750. Dulaimi, E., Uzzo, R.G., Greenberg, R.E., Al-Saleem, T., and Cairns, P. (2004). Detection of bladder cancer in urine by a tumor suppressor gene hypermethylation panel. Clin Cancer Res 10, 1887-1893. Eads, C.A., Lord, R.V., Kurumboor, S.K., Wickramasinghe, K., Skinner, M.L., Long, T.I., Peters, J.H., DeMeester, T.R., Danenberg, K.D., Danenberg, P.V., et al. (2000). Fields of aberrant CpG island hypermethylation in Barrett's esophagus and associated adenocarcinoma. Cancer Res 60, 5021-5026. Eden, A., Gaudet, F., Waghmare, A., and Jaenisch, R. (2003). Chromosomal instability and tumors promoted by DNA hypomethylation. Science 300, 455. 
Egger, G., Jeong, S., Escobar, S.G., Cortez, C.C., Li, T.W., Saito, Y., Yoo, C.B., Jones, P.A., and Liang, G. (2006). Identification of DNMT1 (DNA methyltransferase 1) hypomorphs in somatic knockouts suggests an essential role for DNMT1 in cell survival. Proc Natl Acad Sci U S A 103, 14080-14085. Ehrlich, M. (2009). DNA hypomethylation in cancer cells. Epigenomics 1, 239-259. Esteller, M. (2007). Cancer epigenomics: DNA methylomes and histone-modification maps. Nat Rev Genet 8, 286-298. Esteller, M. (2011). Cancer Epigenetics for the 21st Century: What's Next? Genes Cancer 2, 604-606. Feng, Z., Hu, W., Rom, W.N., Beland, F.A., and Tang, M.S. (2002). 4-aminobiphenyl is a major etiological agent of human bladder cancer: evidence from its DNA binding spectrum in human p53 gene. Carcinogenesis 23, 1721-1727. Ferlay, J., Shin, H.R., Bray, F., Forman, D., Mathers, C., and Parkin, D.M. (2010). Estimates of worldwide burden of cancer in 2008: GLOBOCAN 2008. Int J Cancer 127, 2893-2917. Filippakopoulos, P., Qi, J., Picaud, S., Shen, Y., Smith, W.B., Fedorov, O., Morse, E.M., Keates, T., Hickman, T.T., Felletar, I., et al. (2010). Selective inhibition of BET bromodomains. Nature 468, 1067-1073. Fischer, A., Vázquez-García, I., Illingworth, C.J., and Mustonen, V. (2014). High-definition reconstruction of clonal composition in cancer. Cell Rep 7, 1740-1752. Forbes, S.A., Bindal, N., Bamford, S., Cole, C., Kok, C.Y., Beare, D., Jia, M., Shepherd, R., Leung, K., Menzies, A., et al. (2011). COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res 39, D945-950. Friedrich, M.G., Chandrasoma, S., Siegmund, K.D., Weisenberger, D.J., Cheng, J.C., Toma, M.I., Huland, H., Jones, P.A., and Liang, G. (2005). Prognostic relevance of methylation markers in patients with non-muscle invasive bladder carcinoma. Eur J Cancer 41, 2769-2778. 
Friedrich, M.G., Weisenberger, D.J., Cheng, J.C., Chandrasoma, S., Siegmund, K.D., Gonzalgo, M.L., Toma, M.I., Huland, H., Yoo, C., Tsai, Y.C., et al. (2004). Detection of methylated apoptosis-associated genes in urine sediments of bladder cancer patients. Clin Cancer Res 10, 7457-7465. Gal-Yam, E.N., Egger, G., Iniguez, L., Holster, H., Einarsson, S., Zhang, X., Lin, J.C., Liang, G., Jones, P.A., and Tanay, A. (2008). Frequent switching of Polycomb repressive marks and DNA hypermethylation in the PC3 prostate cancer cell line. Proc Natl Acad Sci U S A 105, 12979-12984. Garcia-España, A., Salazar, E., Sun, T.T., Wu, X.R., and Pellicer, A. (2005). Differential expression of cell cycle regulators in phenotypic variants of transgenically induced bladder tumors: implications for tumor behavior. Cancer Res 65, 1150-1157. Gaudet, F., Hodgson, J.G., Eden, A., Jackson-Grusby, L., Dausman, J., Gray, J.W., Leonhardt, H., and Jaenisch, R. (2003). Induction of tumors in mice by genomic hypomethylation. Science 300, 489-492. Goebell, P.J., and Knowles, M.A. (2010). Bladder cancer or bladder cancers? Genetically distinct malignant conditions of the urothelium. Urol Oncol 28, 409-428. Goll, M.G., and Bestor, T.H. (2005). Eukaryotic cytosine methyltransferases. Annu Rev Biochem 74, 481-514. Gonzalgo, M.L., Liang, G., Spruck, C.H., Zingg, J.M., Rideout, W.M., and Jones, P.A. (1997). Identification and characterization of differentially methylated regions of genomic DNA by methylation-sensitive arbitrarily primed PCR. Cancer Res 57, 594-599. Goodman, J.I., and Watson, R.E. (2002). Altered DNA methylation: a secondary mechanism involved in carcinogenesis. Annu Rev Pharmacol Toxicol 42, 501-525. Gopalakrishnan, S., Van Emburgh, B.O., Shan, J., Su, Z., Fields, C.R., Vieweg, J., Hamazaki, T., Schwartz, P.H., Terada, N., and Robertson, K.D. (2009). 
A novel DNMT3B splice variant expressed in tumor and pluripotent cells modulates genomic DNA methylation patterns and displays altered DNA binding. Mol Cancer Res 7, 1622-1634. Gordon, C.A., Hartono, S.R., and Chédin, F. (2013). Inactive DNMT3B splice variants modulate de novo DNA methylation. PLoS One 8, e69486. Gowher, H., Liebert, K., Hermann, A., Xu, G., and Jeltsch, A. (2005). Mechanism of stimulation of catalytic activity of Dnmt3A and Dnmt3B DNA-(cytosine-C5)-methyltransferases by Dnmt3L. J Biol Chem 280, 13341-13348. Grand, F., Kulkarni, S., Chase, A., Goldman, J.M., Gordon, M., and Cross, N.C. (1999). Frequent deletion of hSNF5/INI1, a component of the SWI/SNF complex, in chronic myeloid leukemia. Cancer Res 59, 3870-3874. Greger, V., Passarge, E., Höpping, W., Messmer, E., and Horsthemke, B. (1989). Epigenetic changes may contribute to the formation and spontaneous regression of retinoblastoma. Hum Genet 83, 155-158. Gui, Y., Guo, G., Huang, Y., Hu, X., Tang, A., Gao, S., Wu, R., Chen, C., Li, X., Zhou, L., et al. (2011). Frequent mutations of chromatin remodeling genes in transitional cell carcinoma of the bladder. Nat Genet 43, 875-878. Guo, G., Sun, X., Chen, C., Wu, S., Huang, P., Li, Z., Dean, M., Huang, Y., Jia, W., Zhou, Q., et al. (2013). Whole-genome and whole-exome sequencing of bladder cancer identifies frequent alterations in genes involved in sister chromatid cohesion and segregation. Nat Genet 45, 1459-1463. Guo, X., Wang, L., Li, J., Ding, Z., Xiao, J., Yin, X., He, S., Shi, P., Dong, L., Li, G., et al. (2015). Structural insight into autoinhibition and histone H3-induced activation of DNMT3A. Nature 517, 640-644. Haber, D.A., and Settleman, J. (2007). Cancer: drivers and passengers. Nature 446, 145-146. Hafner, C., Knuechel, R., Zanardo, L., Dietmaier, W., Blaszyk, H., Cheville, J., Hofstaedter, F., and Hartmann, A. (2001). 
Evidence for oligoclonality and tumor spread by intraluminal seeding in multifocal urothelial carcinomas of the upper and lower urinary tract. Oncogene 20, 4910-4915. Hamamoto, R., Furukawa, Y., Morita, M., Iimura, Y., Silva, F.P., Li, M., Yagyu, R., and Nakamura, Y. (2004). SMYD3 encodes a histone methyltransferase involved in the proliferation of cancer cells. Nat Cell Biol 6, 731-740. Hanahan, D., and Weinberg, R.A. (2011). Hallmarks of cancer: the next generation. Cell 144, 646-674. Hata, K., Okano, M., Lei, H., and Li, E. (2002). Dnmt3L cooperates with the Dnmt3 family of de novo DNA methyltransferases to establish maternal imprints in mice. Development 129, 1983-1993. Herman, J.G., Merlo, A., Mao, L., Lapidus, R.G., Issa, J.P., Davidson, N.E., Sidransky, D., and Baylin, S.B. (1995). Inactivation of the CDKN2/p16/MTS1 gene is frequently associated with aberrant DNA methylation in all common human cancers. Cancer Res 55, 4525-4530. Herman, J.G., Umar, A., Polyak, K., Graff, J.R., Ahuja, N., Issa, J.P., Markowitz, S., Willson, J.K., Hamilton, S.R., Kinzler, K.W., et al. (1998). Incidence and functional consequences of hMLH1 promoter hypermethylation in colorectal carcinoma. Proc Natl Acad Sci U S A 95, 6870-6875. Heyn, H., and Esteller, M. (2012). DNA methylation profiling in the clinic: applications and challenges. Nat Rev Genet 13, 679-692. Irizarry, R.A., Hobbs, B., Collin, F., Beazer-Barclay, Y.D., Antonellis, K.J., Scherf, U., and Speed, T.P. (2003). Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics 4, 249-264. Iyer, G., Al-Ahmadie, H., Schultz, N., Hanrahan, A.J., Ostrovnaya, I., Balar, A.V., Kim, P.H., Lin, O., Weinhold, N., Sander, C., et al. (2013). Prevalence and co-occurrence of actionable genomic alterations in high-grade bladder cancer. J Clin Oncol 31, 3133-3140. Jebar, A.H., Hurst, C.D., Tomlinson, D.C., Johnston, C., Taylor, C.F., and Knowles, M.A. (2005). 
FGFR3 and Ras gene mutations are mutually exclusive genetic events in urothelial cell carcinoma. Oncogene 24, 5218-5225. Jeong, S., Liang, G., Sharma, S., Lin, J.C., Choi, S.H., Han, H., Yoo, C.B., Egger, G., Yang, A.S., and Jones, P.A. (2009). Selective anchoring of DNA methyltransferases 3A and 3B to nucleosomes containing methylated DNA. Mol Cell Biol 29, 5366-5376. Jia, D., Jurkowska, R.Z., Zhang, X., Jeltsch, A., and Cheng, X. (2007). Structure of Dnmt3a bound to Dnmt3L suggests a model for de novo DNA methylation. Nature 449, 248-251. Jones, P.A. (2012). Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat Rev Genet 13, 484-492. Jones, P.A., and Baylin, S.B. (2007). The epigenomics of cancer. Cell 128, 683-692. Jones, P.A., and Laird, P.W. (1999). Cancer epigenetics comes of age. Nat Genet 21, 163-167. Jones, P.A., and Liang, G. (2009). Rethinking how DNA methylation patterns are maintained. Nat Rev Genet 10, 805-811. Juanpere, N., Agell, L., Lorenzo, M., de Muga, S., López-Vilaró, L., Murillo, R., Mojal, S., Serrano, S., Lorente, J.A., Lloreta, J., et al. (2012). Mutations in FGFR3 and PIK3CA, singly or combined with RAS and AKT1, are associated with AKT but not with MAPK pathway activation in urothelial bladder cancer. Hum Pathol 43, 1573-1582. Jurkowska, R.Z., Jurkowski, T.P., and Jeltsch, A. (2011). Structure and function of mammalian DNA methyltransferases. Chembiochem 12, 206-222. Kandoth, C., McLellan, M.D., Vandin, F., Ye, K., Niu, B., Lu, C., Xie, M., Zhang, Q., McMichael, J.F., Wyczalkowski, M.A., et al. (2013). Mutational landscape and significance across 12 major cancer types. Nature 502, 333-339. Kantor, A.F., Hartge, P., Hoover, R.N., Narayana, A.S., Sullivan, J.W., and Fraumeni, J.F. (1984). Urinary tract infection and risk of bladder cancer. Am J Epidemiol 119, 510-515. Kautiainen, T.L., and Jones, P.A. (1986). DNA methyltransferase levels in tumorigenic and nontumorigenic cells in culture. 
J Biol Chem 261, 1594-1598. Kelly, T.K., Liu, Y., Lay, F.D., Liang, G., Berman, B.P., and Jones, P.A. (2012). Genome-wide mapping of nucleosome positioning and DNA methylation within individual DNA molecules. Genome Res 22, 2497-2506. Kim, Y.W., Yoon, H.Y., Seo, S.P., Lee, S.K., Kang, H.W., Kim, W.T., Bang, H.J., Ryu, D.H., Yun, S.J., Lee, S.C., et al. (2015). Clinical Implications and Prognostic Values of Prostate Cancer Susceptibility Candidate Methylation in Primary Nonmuscle Invasive Bladder Cancer. Dis Markers 2015, 402963. Knowles, M.A. (2007). Tumor suppressor loci in bladder cancer. Front Biosci 12, 2233-2251. Knowles, M.A., and Hurst, C.D. (2015). Molecular biology of bladder cancer: new insights into pathogenesis and clinical diversity. Nat Rev Cancer 15, 25-41. Knudson, A.G. (1971). Mutation and cancer: statistical study of retinoblastoma. Proc Natl Acad Sci U S A 68, 820-823. Koboldt, D.C., Chen, K., Wylie, T., Larson, D.E., McLellan, M.D., Mardis, E.R., Weinstock, G.M., Wilson, R.K., and Ding, L. (2009). VarScan: variant detection in massively parallel sequencing of individual and pooled samples. Bioinformatics 25, 2283-2285. Kompier, L.C., Lurkin, I., van der Aa, M.N., van Rhijn, B.W., van der Kwast, T.H., and Zwarthoff, E.C. (2010). FGFR3, HRAS, KRAS, NRAS and PIK3CA mutations in bladder cancer and their potential as biomarkers for surveillance and therapy. PLoS One 5, e13821. Kondo, Y., Shen, L., Cheng, A.S., Ahmed, S., Boumber, Y., Charo, C., Yamochi, T., Urano, T., Furukawa, K., Kwabi-Addo, B., et al. (2008). Gene silencing in cancer by histone H3 lysine 27 trimethylation independent of promoter DNA methylation. Nat Genet 40, 741-750. Kulis, M., Heath, S., Bibikova, M., Queirós, A.C., Navarro, A., Clot, G., Martínez-Trillos, A., Castellano, G., Brun-Heath, I., Pinyol, M., et al. (2012). Epigenomic analysis detects widespread gene-body DNA hypomethylation in chronic lymphocytic leukemia. Nat Genet 44, 1236-1242. 
Kundaje, A., Meuleman, W., Ernst, J., Bilenky, M., Yen, A., Heravi-Moussavi, A., Kheradpour, P., Zhang, Z., Wang, J., Ziller, M.J., et al. (2015). Integrative analysis of 111 reference human epigenomes. Nature 518, 317-330. Kuphal, S., Martyn, A.C., Pedley, J., Crowther, L.M., Bonazzi, V.F., Parsons, P.G., Bosserhoff, A.K., Hayward, N.K., and Boyle, G.M. (2009). H-cadherin expression reduces invasion of malignant melanoma. Pigment Cell Melanoma Res 22, 296-306. Laird, P.W., and Jaenisch, R. (1996). The role of DNA methylation in cancer genetics and epigenetics. Annu Rev Genet 30, 441-464. Langmead, B., and Salzberg, S.L. (2012). Fast gapped-read alignment with Bowtie 2. Nat Methods 9, 357-359. Langmead, B., Trapnell, C., Pop, M., and Salzberg, S.L. (2009). Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol 10, R25. Lawrence, M.S., Stojanov, P., Polak, P., Kryukov, G.V., Cibulskis, K., Sivachenko, A., Carter, S.L., Stewart, C., Mermel, C.H., Roberts, S.A., et al. (2013). Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214-218. Lay, F.D., Liu, Y., Kelly, T.K., Witt, H., Farnham, P.J., Jones, P.A., and Berman, B.P. (2015). The role of DNA methylation in directing the functional organization of the cancer epigenome. Genome Res 25, 467-477. Lee, C.K., Shibata, Y., Rao, B., Strahl, B.D., and Lieb, J.D. (2004). Evidence for nucleosome depletion at active regulatory regions genome-wide. Nat Genet 36, 900-905. Lei, H., Oh, S.P., Okano, M., Jüttermann, R., Goss, K.A., Jaenisch, R., and Li, E. (1996). De novo DNA cytosine methyltransferase activities in mouse embryonic stem cells. Development 122, 3195-3205. Li, H., and Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754-1760. Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., Marth, G., Abecasis, G., Durbin, R., and Subgroup, G.P.D.P. (2009a). 
The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-2079. Li, H., Ruan, J., and Durbin, R. (2008). Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res 18, 1851-1858. Li, R., Li, Y., Fang, X., Yang, H., Wang, J., and Kristiansen, K. (2009b). SNP detection for massively parallel whole-genome resequencing. Genome Res 19, 1124-1132. Liang, G., Chan, M.F., Tomigahara, Y., Tsai, Y.C., Gonzales, F.A., Li, E., Laird, P.W., and Jones, P.A. (2002a). Cooperativity between DNA methyltransferases in the maintenance methylation of repetitive elements. Mol Cell Biol 22, 480-491. Liang, G., Gonzalgo, M.L., Salem, C., and Jones, P.A. (2002b). Identification of DNA methylation differences during tumorigenesis by methylation-sensitive arbitrarily primed polymerase chain reaction. Methods 27, 150-155. Liang, G., Salem, C.E., Yu, M.C., Nguyen, H.D., Gonzales, F.A., Nguyen, T.T., Nichols, P.W., and Jones, P.A. (1998). DNA methylation differences associated with tumor tissues identified by genome scanning analysis. Genomics 53, 260-268. Liao, J., Karnik, R., Gu, H., Ziller, M.J., Clement, K., Tsankov, A.M., Akopian, V., Gifford, C.A., Donaghey, J., Galonska, C., et al. (2015). Targeted disruption of DNMT1, DNMT3A and DNMT3B in human embryonic stem cells. Nat Genet. Lin, J.C., Jeong, S., Liang, G., Takai, D., Fatemi, M., Tsai, Y.C., Egger, G., Gal-Yam, E.N., and Jones, P.A. (2007). Role of nucleosomal occupancy in the epigenetic silencing of the MLH1 CpG island. Cancer Cell 12, 432-444. Lin, Y.L., Ma, J.H., Luo, X.L., Guan, T.Y., and Li, Z.G. (2013). Clinical significance of protocadherin-8 (PCDH8) promoter methylation in bladder cancer. J Int Med Res 41, 48-54. Lin, Y.L., Wang, Y.L., Ma, J.G., and Li, W.P. (2014). Clinical significance of protocadherin 8 (PCDH8) promoter methylation in non-muscle invasive bladder cancer. J Exp Clin Cancer Res 33, 68. 
Linhart, H.G., Lin, H., Yamada, Y., Moran, E., Steine, E.J., Gokhale, S., Lo, G., Cantu, E., Ehrich, M., He, T., et al. (2007). Dnmt3b promotes tumorigenesis in vivo by gene-specific de novo methylation and transcriptional silencing. Genes Dev 21, 3110-3122. Liu, R., Liu, H., Chen, X., Kirby, M., Brown, P.O., and Zhao, K. (2001). Regulation of CSF1 promoter by the SWI/SNF-like BAF complex. Cell 106, 309-318. Lucifero, D., La Salle, S., Bourc'his, D., Martel, J., Bestor, T.H., and Trasler, J.M. (2007). Coordinate regulation of DNA methyltransferase expression during oogenesis. BMC Dev Biol 7, 36. López-Knowles, E., Hernández, S., Malats, N., Kogevinas, M., Lloreta, J., Carrato, A., Tardón, A., Serra, C., and Real, F.X. (2006). PIK3CA mutations are an early genetic alteration associated with FGFR3 mutations in superficial papillary bladder tumors. Cancer Res 66, 7401-7404. Maley, C.C., Galipeau, P.C., Li, X., Sanchez, C.A., Paulson, T.G., and Reid, B.J. (2004). Selectively advantageous mutations and hitchhikers in neoplasms: p16 lesions are selected in Barrett's esophagus. Cancer Res 64, 3414-3427. Marchong, M.N., Chen, D., Corson, T.W., Lee, C., Harmandayan, M., Bowles, E., Chen, N., and Gallie, B.L. (2004). Minimal 16q genomic loss implicates cadherin-11 in retinoblastoma. Mol Cancer Res 2, 495-503. Markl, I.D., Cheng, J., Liang, G., Shibata, D., Laird, P.W., and Jones, P.A. (2001). Global and gene-specific epigenetic patterns in human bladder cancer genomes are relatively stable in vivo and in vitro over time. Cancer Res 61, 5875-5884. Markl, I.D., and Jones, P.A. (1998). Presence and location of TP53 mutation determines pattern of CDKN2A/ARF pathway inactivation in bladder cancer. Cancer Res 58, 5348-5353. Massingham, T., and Goldman, N. (2012). All Your Base: a fast and accurate probabilistic approach to base calling. Genome Biol 13, R13. 
Maunakea, A.K., Nagarajan, R.P., Bilenky, M., Ballinger, T.J., D'Souza, C., Fouse, S.D., Johnson, B.E., Hong, C., Nielsen, C., Zhao, Y., et al. (2010). Conserved role of intragenic DNA methylation in regulating alternative promoters. Nature 466, 253-257. Maverakis, E., Kim, K., Shimoda, M., Gershwin, M.E., Patel, F., Wilken, R., Raychaudhuri, S., Ruhaak, L.R., and Lebrilla, C.B. (2015). Glycans in the immune system and The Altered Glycan Theory of Autoimmunity: a critical review. J Autoimmun 57, 1-13. McKenna, A., Hanna, M., Banks, E., Sivachenko, A., Cibulskis, K., Kernytsky, A., Garimella, K., Altshuler, D., Gabriel, S., Daly, M., et al. (2010). The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res 20, 1297-1303. Meissner, A., Mikkelsen, T.S., Gu, H., Wernig, M., Hanna, J., Sivachenko, A., Zhang, X., Bernstein, B.E., Nusbaum, C., Jaffe, D.B., et al. (2008). Genome-scale DNA methylation maps of pluripotent and differentiated cells. Nature 454, 766-770. Mills, R.E., Pittard, W.S., Mullaney, J.M., Farooq, U., Creasy, T.H., Mahurkar, A.A., Kemeza, D.M., Strassler, D.S., Ponting, C.P., Webber, C., et al. (2011). Natural genetic variation caused by small insertions and deletions in the human genome. Genome Res 21, 830-839. Morrison, C.D., Liu, P., Woloszynska-Read, A., Zhang, J., Luo, W., Qin, M., Bshara, W., Conroy, J.M., Sabatini, L., Vedell, P., et al. (2014). Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer. Proc Natl Acad Sci U S A 111, E672-681. Nadaf, J., Majewski, J., and Fahiminiya, S. (2015). ExomeAI: detection of recurrent allelic imbalance in tumors using whole-exome sequencing data. Bioinformatics 31, 429-431. Nakagawa, M., Oda, Y., Eguchi, T., Aishima, S., Yao, T., Hosoi, F., Basaki, Y., Ono, M., Kuwano, M., Tanaka, M., et al. (2007). Expression profile of class I histone deacetylases in human cancer tissues. Oncol Rep 18, 769-774. 
Neri, F., Krepelova, A., Incarnato, D., Maldotti, M., Parlato, C., Galvagni, F., Matarese, F., Stunnenberg, H.G., and Oliviero, S. (2013). Dnmt3L antagonizes DNA methylation at bivalent promoters and favors DNA methylation at gene bodies in ESCs. Cell 155, 121-134. Network, C.G.A. (2012a). Comprehensive molecular characterization of human colon and rectal cancer. Nature 487, 330-337. Network, C.G.A.R. (2012b). Comprehensive genomic characterization of squamous cell lung cancers. Nature 489, 519-525. Network, C.G.A.R. (2013). Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature 499, 43-49. Network, C.G.A.R. (2014a). Comprehensive molecular characterization of urothelial bladder carcinoma. Nature 507, 315-322. Network, C.G.A.R. (2014b). Comprehensive molecular profiling of lung adenocarcinoma. Nature 511, 543-550. Ng, S.B., Turner, E.H., Robertson, P.D., Flygare, S.D., Bigham, A.W., Lee, C., Shaffer, T., Wong, M., Bhattacharjee, A., Eichler, E.E., et al. (2009). Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461, 272-276. Nik-Zainal, S., Alexandrov, L.B., Wedge, D.C., Van Loo, P., Greenman, C.D., Raine, K., Jones, D., Hinton, J., Marshall, J., Stebbings, L.A., et al. (2012). Mutational processes molding the genomes of 21 breast cancers. Cell 149, 979-993. Niwa, T., Tsukamoto, T., Toyoda, T., Mori, A., Tanaka, H., Maekita, T., Ichinose, M., Tatematsu, M., and Ushijima, T. (2010). Inflammatory processes triggered by Helicobacter pylori infection cause aberrant DNA methylation in gastric epithelial cells. Cancer Res 70, 1430-1440. Nordentoft, I., Lamy, P., Birkenkamp-Demtröder, K., Shumansky, K., Vang, S., Hornshøj, H., Juul, M., Villesen, P., Hedegaard, J., Roth, A., et al. (2014). Mutational context and diverse clonal development in early and late bladder cancer. Cell Rep 7, 1649-1663. O'Connor, P.M., Jackman, J., Jondle, D., Bhatia, K., Magrath, I., and Kohn, K.W. (1993). 
Role of the p53 tumor suppressor gene in cell cycle arrest and radiosensitivity of Burkitt's lymphoma cell lines. Cancer Res 53, 4776-4780. O'Rawe, J., Jiang, T., Sun, G., Wu, Y., Wang, W., Hu, J., Bodily, P., Tian, L., Hakonarson, H., Johnson, W.E., et al. (2013). Low concordance of multiple variant-calling pipelines: practical implications for exome and genome sequencing. Genome Med 5, 28. Okano, M., Bell, D.W., Haber, D.A., and Li, E. (1999). DNA methyltransferases Dnmt3a and Dnmt3b are essential for de novo methylation and mammalian development. Cell 99, 247-257. Okano, M., Xie, S., and Li, E. (1998). Cloning and characterization of a family of novel mammalian DNA (cytosine-5) methyltransferases. Nat Genet 19, 219-220. Ooi, S.K., Qiu, C., Bernstein, E., Li, K., Jia, D., Yang, Z., Erdjument-Bromage, H., Tempst, P., Lin, S.P., Allis, C.D., et al. (2007). DNMT3L connects unmethylated lysine 4 of histone H3 to de novo methylation of DNA. Nature 448, 714-717. Ostler, K.R., Davis, E.M., Payne, S.L., Gosalia, B.B., Expósito-Céspedes, J., Le Beau, M.M., and Godley, L.A. (2007). Cancer cells express aberrant DNMT3B transcripts encoding truncated proteins. Oncogene 26, 5553-5563. Ou, C.Y., Kim, J.H., Yang, C.K., and Stallcup, M.R. (2009). Requirement of cell cycle and apoptosis regulator 1 for target gene activation by Wnt and beta-catenin and for anchorage-independent growth of human colon carcinoma cells. J Biol Chem 284, 20629-20637. Pandiyan, K., You, J.S., Yang, X., Dai, C., Zhou, X.J., Baylin, S.B., Jones, P.A., and Liang, G. (2013). Functional DNA demethylation is accompanied by chromatin accessibility. Nucleic Acids Res 41, 3973-3985. Pao, M.M., Tsutsumi, M., Liang, G., Uzvolgyi, E., Gonzales, F.A., and Jones, P.A. (2001). The endothelin receptor B (EDNRB) promoter displays heterogeneous, site specific methylation patterns in normal and tumor cells. Hum Mol Genet 10, 903-910. Pfeifer, G.P. (2006). Mutagenesis at methylated CpG sequences. 
Curr Top Microbiol Immunol 301, 259-281. Picard. Available from: http://picard.sourceforge.net. Platt, F.M., Hurst, C.D., Taylor, C.F., Gregory, W.M., Harnden, P., and Knowles, M.A. (2009). Spectrum of phosphatidylinositol 3-kinase pathway gene alterations in bladder cancer. Clin Cancer Res 15, 6008-6017. Portela, A., and Esteller, M. (2010). Epigenetic modifications and human disease. Nat Biotechnol 28, 1057-1068. Przybylski, M., Kozłowska, A., Pietkiewicz, P.P., Lutkowska, A., Lianeri, M., and Jagodzinski, P.P. (2010). Increased CXCR4 expression in AsPC1 pancreatic carcinoma cells with RNA interference-mediated knockdown of DNMT1 and DNMT3B. Biomed Pharmacother 64, 254-258. Putiri, E.L., and Robertson, K.D. (2011). Epigenetic mechanisms and genome stability. Clin Epigenetics 2, 299-314. 172 Puzio-Kuter, A.M., Castillo-Martin, M., Kinkade, C.W., Wang, X., Shen, T.H., Matos, T., Shen, M.M., Cordon-Cardo, C., and Abate-Shen, C. (2009). Inactivation of p53 and Pten promotes invasive bladder cancer. Genes Dev 23, 675-680. Quinlan, A.R., and Hall, I.M. (2010). BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841-842. Reinert, T., Modin, C., Castano, F.M., Lamy, P., Wojdacz, T.K., Hansen, L.L., Wiuf, C., Borre, M., Dyrskjøt, L., and Orntoft, T.F. (2011). Comprehensive genome methylation analysis in bladder cancer: identification and validation of novel methylated genes and application of these as urinary tumor markers. Clin Cancer Res 17, 5582-5592. Reisman, D., Glaros, S., and Thompson, E.A. (2009). The SWI/SNF complex and cancer. Oncogene 28, 1653-1668. Rhee, I., Bachman, K.E., Park, B.H., Jair, K.W., Yen, R.W., Schuebel, K.E., Cui, H., Feinberg, A.P., Lengauer, C., Kinzler, K.W., et al. (2002). DNMT1 and DNMT3b cooperate to silence genes in human cancer cells. Nature 416, 552-556. Roberts, C.W., and Orkin, S.H. (2004). The SWI/SNF complex--chromatin and cancer. Nat Rev Cancer 4, 133-142. 
Robertson, K.D., Ait-Si-Ali, S., Yokochi, T., Wade, P.A., Jones, P.L., and Wolffe, A.P. (2000). DNMT1 forms a complex with Rb, E2F1 and HDAC1 and represses transcription from E2F-responsive promoters. Nat Genet 25, 338-342.
Robertson, K.D., and Jones, P.A. (1998). The human ARF cell cycle regulatory gene promoter is a CpG island which can be silenced by DNA methylation and down-regulated by wild-type p53. Mol Cell Biol 18, 6457-6473.
Robertson, K.D., Uzvolgyi, E., Liang, G., Talmadge, C., Sumegi, J., Gonzales, F.A., and Jones, P.A. (1999). The human DNA methyltransferases (DNMTs) 1, 3a and 3b: coordinate mRNA expression in normal tissues and overexpression in tumors. Nucleic Acids Res 27, 2291-2298.
Rosenfeld, J.A., Mason, C.E., and Smith, T.M. (2012). Limitations of the human reference genome for personalized genomics. PLoS One 7, e40294.
Ross, J.S., Wang, K., Al-Rohil, R.N., Nazeer, T., Sheehan, C.E., Otto, G.A., He, J., Palmer, G., Yelensky, R., Lipson, D., et al. (2014). Advanced urothelial carcinoma: next-generation sequencing reveals diverse genomic alterations and targets of therapy. Mod Pathol 27, 271-280.
Ross, R.L., Askham, J.M., and Knowles, M.A. (2013). PIK3CA mutation spectrum in urothelial carcinoma reflects cell context-dependent signaling and phenotypic outputs. Oncogene 32, 768-776.
Saito, Y., Kanai, Y., Sakamoto, M., Saito, H., Ishii, H., and Hirohashi, S. (2002). Overexpression of a splice variant of DNA methyltransferase 3b, DNMT3b4, associated with DNA hypomethylation on pericentromeric satellite regions during human hepatocarcinogenesis. Proc Natl Acad Sci U S A 99, 10060-10065.
Salem, C., Liang, G., Tsai, Y.C., Coulter, J., Knowles, M.A., Feng, A.C., Groshen, S., Nichols, P.W., and Jones, P.A. (2000a). Progressive increases in de novo methylation of CpG islands in bladder cancer. Cancer Res 60, 2473-2476.
Salem, C.E., Markl, I.D., Bender, C.M., Gonzales, F.A., Jones, P.A., and Liang, G. (2000b). PAX6 methylation and ectopic expression in human tumor cells. Int J Cancer 87, 179-185.
Sandoval, J., and Esteller, M. (2012). Cancer epigenomics: beyond genomics. Curr Opin Genet Dev 22, 50-55.
Santos, E., Reddy, E.P., Pulciani, S., Feldmann, R.J., and Barbacid, M. (1983). Spontaneous activation of a human proto-oncogene. Proc Natl Acad Sci U S A 80, 4679-4683.
Sarkar, S., Jülicher, K.P., Burger, M.S., Della Valle, V., Larsen, C.J., Yeager, T.R., Grossman, T.B., Nickells, R.W., Protzel, C., Jarrard, D.F., et al. (2000). Different combinations of genetic/epigenetic alterations inactivate the p53 and pRb pathways in invasive human bladder cancers. Cancer Res 60, 3862-3871.
Schlesinger, Y., Straussman, R., Keshet, I., Farkash, S., Hecht, M., Zimmerman, J., Eden, E., Yakhini, Z., Ben-Shushan, E., Reubinoff, B.E., et al. (2007). Polycomb-mediated methylation on Lys27 of histone H3 pre-marks genes for de novo methylation in cancer. Nat Genet 39, 232-236.
Segal, E., Fondufe-Mittendorf, Y., Chen, L., Thåström, A., Field, Y., Moore, I.K., Wang, J.P., and Widom, J. (2006). A genomic code for nucleosome positioning. Nature 442, 772-778.
Sekinger, E.A., Moqtaderi, Z., and Struhl, K. (2005). Intrinsic histone-DNA interactions and low nucleosome density are important for preferential accessibility of promoter regions in yeast. Mol Cell 18, 735-748.
Sharif, J., Muto, M., Takebayashi, S., Suetake, I., Iwamatsu, A., Endo, T.A., Shinga, J., Mizutani-Koseki, Y., Toyoda, T., Okamura, K., et al. (2007). The SRA protein Np95 mediates epigenetic inheritance by recruiting Dnmt1 to methylated DNA. Nature 450, 908-912.
Sharma, S., De Carvalho, D.D., Jeong, S., Jones, P.A., and Liang, G. (2011). Nucleosomes containing methylated DNA stabilize DNA methyltransferases 3A/3B and ensure faithful epigenetic inheritance. PLoS Genet 7, e1001286.
Sharma, S., Kelly, T.K., and Jones, P.A. (2010). Epigenetics in cancer. Carcinogenesis 31, 27-36.
Shen, H., and Laird, P.W. (2013). Interplay between the cancer genome and epigenome. Cell 153, 38-55.
Shen, L., Kondo, Y., Rosner, G.L., Xiao, L., Hernandez, N.S., Vilaythong, J., Houlihan, P.S., Krouse, R.S., Prasad, A.R., Einspahr, J.G., et al. (2005). MGMT promoter methylation and field defect in sporadic colorectal cancer. J Natl Cancer Inst 97, 1330-1338.
Sherry, S.T., Ward, M.H., Kholodov, M., Baker, J., Phan, L., Smigielski, E.M., and Sirotkin, K. (2001). dbSNP: the NCBI database of genetic variation. Nucleic Acids Res 29, 308-311.
Shi, Y. (2007). Histone lysine demethylases: emerging roles in development, physiology and disease. Nat Rev Genet 8, 829-833.
Sidransky, D., Frost, P., Von Eschenbach, A., Oyasu, R., Preisinger, A.C., and Vogelstein, B. (1992). Clonal origin of bladder cancer. N Engl J Med 326, 737-740.
Simpson, A.J. (2009). Sequence-based advances in the definition of cancer-associated gene mutations. Curr Opin Oncol 21, 47-52.
Simó-Riudalbas, L., and Esteller, M. (2014). Cancer genomics identifies disrupted epigenetic genes. Hum Genet 133, 713-725.
Sjödahl, G., Lauss, M., Gudjonsson, S., Liedberg, F., Halldén, C., Chebil, G., Månsson, W., Höglund, M., and Lindgren, D. (2011). A systematic study of gene mutations in urothelial carcinoma; inactivating mutations in TSC2 and PIK3R1. PLoS One 6, e18583.
Sobin, L.H., Gospodarowicz, M.K., Wittekind, C., International Union against Cancer., and ebrary Inc. (2009). TNM classification of malignant tumours (Chichester, West Sussex, UK; Hoboken, NJ: Wiley-Blackwell), pp. xx, 310 p.
Solomon, D.A., Kim, J.S., Bondaruk, J., Shariat, S.F., Wang, Z.F., Elkahloun, A.G., Ozawa, T., Gerard, J., Zhuang, D., Zhang, S., et al. (2013). Frequent truncating mutations of STAG2 in bladder cancer. Nat Genet 45, 1428-1430.
Spruck, C.H., Gonzalez-Zulueta, M., Shibata, A., Simoneau, A.R., Lin, M.F., Gonzales, F., Tsai, Y.C., and Jones, P.A. (1994). p16 gene in uncultured tumours. Nature 370, 183-184.
Stein, J.P., Ginsberg, D.A., Grossfeld, G.D., Chatterjee, S.J., Esrig, D., Dickinson, M.G., Groshen, S., Taylor, C.R., Jones, P.A., Skinner, D.G., et al. (1998). Effect of p21WAF1/CIP1 expression on tumor progression in bladder cancer. J Natl Cancer Inst 90, 1072-1079.
Stewart, B.W., and Wild, C.P. World cancer report 2014, pp. 1 online resource (632 pages) illustrations (some color), maps.
Su, S.F., de Castro Abreu, A.L., Chihara, Y., Tsai, Y., Andreu-Vieyra, C., Daneshmand, S., Skinner, E.C., Jones, P.A., Siegmund, K.D., and Liang, G. (2014). A panel of three markers hyper- and hypomethylated in urine sediments accurately predicts bladder cancer recurrence. Clin Cancer Res 20, 1978-1989.
Takai, D., and Jones, P.A. (2002). Comprehensive analysis of CpG islands in human chromosomes 21 and 22. Proc Natl Acad Sci U S A 99, 3740-3745.
Taylor, C.F., Platt, F.M., Hurst, C.D., Thygesen, H.H., and Knowles, M.A. (2014). Frequent inactivating mutations of STAG2 in bladder cancer are associated with low tumour grade and stage and inversely related to chromosomal copy number changes. Hum Mol Genet 23, 1964-1974.
Tetreault, M., Bareke, E., Nadaf, J., Alirezaie, N., and Majewski, J. (2015). Whole-exome sequencing as a diagnostic tool: current challenges and future opportunities. Expert Rev Mol Diagn 15, 749-760.
Tsai, Y.C., Nichols, P.W., Hiti, A.L., Williams, Z., Skinner, D.G., and Jones, P.A. (1990). Allelic losses of chromosomes 9, 11, and 17 in human bladder cancer. Cancer Res 50, 44-47.
Tsutsumi, M., Liang, G., and Jones, P.A. (1999). Novel endothelin B receptor transcripts with the potential of generating a new receptor. Gene 228, 43-49.
van Rhijn, B.W., Montironi, R., Zwarthoff, E.C., Jöbsis, A.C., and van der Kwast, T.H. (2002). Frequent FGFR3 mutations in urothelial papilloma. J Pathol 198, 245-251.
Varley, K.E., Gertz, J., Bowling, K.M., Parker, S.L., Reddy, T.E., Pauli-Behn, F., Cross, M.K., Williams, B.A., Stamatoyannopoulos, J.A., Crawford, G.E., et al. (2013). Dynamic DNA methylation across diverse human cell lines and tissues. Genome Res 23, 555-567.
Veltman, J.A., Fridlyand, J., Pejavar, S., Olshen, A.B., Korkola, J.E., DeVries, S., Carroll, P., Kuo, W.L., Pinkel, D., Albertson, D., et al. (2003). Array-based comparative genomic hybridization for genome-wide screening of DNA copy number in bladder tumors. Cancer Res 63, 2872-2880.
Venkatesh, S., Smolle, M., Li, H., Gogol, M.M., Saint, M., Kumar, S., Natarajan, K., and Workman, J.L. (2012). Set2 methylation of histone H3 lysine 36 suppresses histone exchange on transcribed genes. Nature 489, 452-455.
VEP. The Bioinformatics Knowledgeblog has VEP. Available at: http://bioinformatics.knowledgeblog.org.
Viré, E., Brenner, C., Deplus, R., Blanchon, L., Fraga, M., Didelot, C., Morey, L., Van Eynde, A., Bernard, D., Vanderwinden, J.M., et al. (2006). The Polycomb group protein EZH2 directly controls DNA methylation. Nature 439, 871-874.
Vogelstein, B., Papadopoulos, N., Velculescu, V.E., Zhou, S., Diaz, L.A., and Kinzler, K.W. (2013). Cancer genome landscapes. Science 339, 1546-1558.
Voorter, C., Joos, S., Bringuier, P.P., Vallinga, M., Poddighe, P., Schalken, J., du Manoir, S., Ramaekers, F., Lichter, P., and Hopman, A. (1995). Detection of chromosomal imbalances in transitional cell carcinoma of the bladder by comparative genomic hybridization. Am J Pathol 146, 1341-1354.
Wada, T., Louhelainen, J., Hemminki, K., Adolfsson, J., Wijkström, H., Norming, U., Borgström, E., Hansson, J., Sandstedt, B., and Steineck, G. (2000). Bladder cancer: allelic deletions at and around the retinoblastoma tumor suppressor gene in relation to stage and grade. Clin Cancer Res 6, 610-615.
Wallerand, H., Bakkar, A.A., de Medina, S.G., Pairon, J.C., Yang, Y.C., Vordos, D., Bittard, H., Fauconnet, S., Kouyoumdjian, J.C., Jaurand, M.C., et al. (2005). Mutations in TP53, but not FGFR3, in urothelial cell carcinoma of the bladder are influenced by smoking: contribution of exogenous versus endogenous carcinogens. Carcinogenesis 26, 177-184.
Wang, J., Bhutani, M., Pathak, A.K., Lang, W., Ren, H., Jelinek, J., He, R., Shen, L., Issa, J.P., and Mao, L. (2007). Delta DNMT3B variants regulate DNA methylation in a promoter-specific manner. Cancer Res 67, 10647-10652.
Wang, J., Walsh, G., Liu, D.D., Lee, J.J., and Mao, L. (2006a). Expression of Delta DNMT3B variants and its association with promoter methylation of p16 and RASSF1A in primary non-small cell lung cancer. Cancer Res 66, 8361-8366.
Wang, K., Li, M., and Hakonarson, H. (2010a). ANNOVAR: functional annotation of genetic variants from high-throughput sequencing data. Nucleic Acids Res 38, e164.
Wang, K., Singh, D., Zeng, Z., Coleman, S.J., Huang, Y., Savich, G.L., He, X., Mieczkowski, P., Grimm, S.A., Perou, C.M., et al. (2010b). MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res 38, e178.
Wang, L., Wang, J., Sun, S., Rodriguez, M., Yue, P., Jang, S.J., and Mao, L. (2006b). A novel DNMT3B subfamily, DeltaDNMT3B, is the predominant form of DNMT3B in non-small cell lung cancer. Int J Oncol 29, 201-207.
Wei, Z., Wang, W., Hu, P., Lyon, G.J., and Hakonarson, H. (2011). SNVer: a statistical tool for variant calling in analysis of pooled or individual next-generation sequencing data. Nucleic Acids Res 39, e132.
Weinberg, R.A. (1994). Oncogenes and tumor suppressor genes. CA Cancer J Clin 44, 160-170.
Weinberg, R.A. (2007). The biology of cancer (New York: Garland Science).
Weisenberger, D.J., Velicescu, M., Cheng, J.C., Gonzales, F.A., Liang, G., and Jones, P.A. (2004). Role of the DNA methyltransferase variant DNMT3b3 in DNA methylation. Mol Cancer Res 2, 62-72.
Welch, J.S., Ley, T.J., Link, D.C., Miller, C.A., Larson, D.E., Koboldt, D.C., Wartman, L.D., Lamprecht, T.L., Liu, F., Xia, J., et al. (2012). The origin and evolution of mutations in acute myeloid leukemia. Cell 150, 264-278.
Widschwendter, M., Fiegl, H., Egle, D., Mueller-Holzner, E., Spizzo, G., Marth, C., Weisenberger, D.J., Campan, M., Young, J., Jacobs, I., et al. (2007). Epigenetic stem cell signature in cancer. Nat Genet 39, 157-158.
Wiegand, K.C., Shah, S.P., Al-Agha, O.M., Zhao, Y., Tse, K., Zeng, T., Senz, J., McConechy, M.K., Anglesio, M.S., Kalloger, S.E., et al. (2010). ARID1A mutations in endometriosis-associated ovarian carcinomas. N Engl J Med 363, 1532-1543.
Williams, S.G., and Stein, J.P. (2004). Molecular pathways in bladder cancer. Urol Res 32, 373-385.
Williamson, M.P., Elder, P.A., Shaw, M.E., Devlin, J., and Knowles, M.A. (1995). p16 (CDKN2) is a major deletion target at 9p21 in bladder cancer. Hum Mol Genet 4, 1569-1577.
Wolff, E.M., Byun, H.M., Han, H.F., Sharma, S., Nichols, P.W., Siegmund, K.D., Yang, A.S., Jones, P.A., and Liang, G. (2010a). Hypomethylation of a LINE-1 promoter activates an alternate transcript of the MET oncogene in bladders with cancer. PLoS Genet 6, e1000917.
Wolff, E.M., Chihara, Y., Pan, F., Weisenberger, D.J., Siegmund, K.D., Sugano, K., Kawashima, K., Laird, P.W., Jones, P.A., and Liang, G. (2010b). Unique DNA methylation patterns distinguish noninvasive and invasive urothelial cancers and establish an epigenetic field defect in premalignant tissue. Cancer Res 70, 8169-8178.
Wolff, E.M., Liang, G., Cortez, C.C., Tsai, Y.C., Castelao, J.E., Cortessis, V.K., Tsao-Wei, D.D., Groshen, S., and Jones, P.A. (2008). RUNX3 methylation reveals that bladder tumors are older in patients with a history of smoking. Cancer Res 68, 6208-6214.
Wolff, E.M., Liang, G., and Jones, P.A. (2005). Mechanisms of Disease: genetic and epigenetic alterations that drive bladder cancer. Nat Clin Pract Urol 2, 502-510.
Wong, K.K., Tsang, Y.T., Shen, J., Cheng, R.S., Chang, Y.M., Man, T.K., and Lau, C.C. (2004). Allelic imbalance analysis by high-density single-nucleotide polymorphic allele (SNP) array with whole genome amplified DNA. Nucleic Acids Res 32, e69.
Xie, S., Wang, Z., Okano, M., Nogami, M., Li, Y., He, W.W., Okumura, K., and Li, E. (1999). Cloning, expression and chromosome locations of the human DNMT3 gene family. Gene 236, 87-95.
Xue, Y., Ankala, A., Wilcox, W.R., and Hegde, M.R. (2015). Solving the molecular diagnostic testing conundrum for Mendelian disorders in the era of next-generation sequencing: single-gene, gene panel, or exome/genome sequencing. Genet Med 17, 444-451.
Yamamoto, E., Toyota, M., Suzuki, H., Kondo, Y., Sanomura, T., Murayama, Y., Ohe-Toyota, M., Maruyama, R., Nojima, M., Ashida, M., et al. (2008). LINE-1 hypomethylation is associated with increased CpG island methylation in Helicobacter pylori-related enlarged-fold gastritis. Cancer Epidemiol Biomarkers Prev 17, 2555-2564.
Yan, P.S., Venkataramu, C., Ibrahim, A., Liu, J.C., Shen, R.Z., Diaz, N.M., Centeno, B., Weber, F., Leu, Y.W., Shapiro, C.L., et al. (2006). Mapping geographic zones of cancer risk with epigenetic biomarkers in normal breast tissue. Clin Cancer Res 12, 6626-6636.
Yang, X., Han, H., De Carvalho, D.D., Lay, F.D., Jones, P.A., and Liang, G. (2014). Gene Body Methylation Can Alter Gene Expression and Is a Therapeutic Target in Cancer. Cancer Cell 26, 1-14.
Yang, X., Noushmehr, H., Han, H., Andreu-Vieyra, C., Liang, G., and Jones, P.A. (2012). Gene reactivation by 5-aza-2'-deoxycytidine-induced demethylation requires SRCAP-mediated H2A.Z insertion to establish nucleosome depleted regions. PLoS Genet 8, e1002604.
You, J.S., and Jones, P.A. (2012). Cancer genetics and epigenetics: two sides of the same coin? Cancer Cell 22, 9-20.
You, J.S., Kelly, T.K., De Carvalho, D.D., Taberlay, P.C., Liang, G., and Jones, P.A. (2011). OCT4 establishes and maintains nucleosome-depleted regions that provide additional layers of epigenetic regulation of its target genes. Proc Natl Acad Sci U S A 108, 14497-14502.
Yuan, G.C., Liu, Y.J., Dion, M.F., Slack, M.D., Wu, L.F., Altschuler, S.J., and Rando, O.J. (2005). Genome-scale identification of nucleosome positions in S. cerevisiae. Science 309, 626-630.
Zhou, V.W., Goren, A., and Bernstein, B.E. (2011). Charting histone modifications and the functional organization of mammalian genomes. Nat Rev Genet 12, 7-18.
Zhu, J., Adli, M., Zou, J.Y., Verstappen, G., Coyne, M., Zhang, X., Durham, T., Miri, M., Deshpande, V., De Jager, P.L., et al. (2013). Genome-wide chromatin state transitions associated with developmental and environmental cues. Cell 152, 642-654.

APPENDIX
In addition to the work described in this thesis, I was also able to critically examine peer-reviewed articles. I worked with Dr Jessica Charlet to address concerns about a recently published paper that identified somatic mutations in cancer cell lines (Barretina et al., 2012). Together we identified a common mutation found in cell lines that should have been identified as a polymorphism. Additionally, Dr Gangning Liang and I provided an expert commentary on an article related to a specific DNMT3B isoform in lung cancer. I have attached the related comment and commentary that resulted from these collaborations.

COMMENT ON “THE CANCER CELL LINE ENCYCLOPEDIA ENABLES PREDICTIVE MODELLING OF ANTICANCER DRUG SENSITIVITY”
Jessica Charlet, Christopher E. Duymich, Gangning Liang, Peter A. Jones
Keck School of Medicine of USC, Los Angeles, California, USA.
The Cancer Cell Line Encyclopedia (CCLE) combines genetic and pharmacological data for 947 human cancer cell lines, including gene expression, copy number, mutation status and pharmacological compound screening. Large data sets like these represent an extremely valuable asset for all future cell line-based research.
However, regarding the sequencing data on the mutational status of >1600 genes, one should interpret the mutation annotation format (MAF) carefully, as it can be misleading in scoring mutations against a human “reference genome”. An in-frame deletion of three nucleotides, leading to a non-synonymous coding change in the chromatin remodeler gene CHD1 in 279 of the cancer cell lines, caught our attention since it suggested that the “mutation” might in fact be a germline polymorphism. This deletion was further studied in our laboratory through qPCR of genomic DNA with specific primer sets targeting the CHD1 sequence with the deletion (allele “B”) and without (allele “A”), in order to estimate the allele frequencies. Analysis of human white blood cell samples from 32 individuals showed an “A” allele frequency of 70% and a “B” allele frequency of 30%, with both alleles in equilibrium in the investigated population, consistent with the Hardy-Weinberg equation. Seventeen human transitional cell cancer cell lines available in our laboratory showed approximately the same distribution. The data from three of the cancer cell lines presented in the CCLE (UM-UC-3, HT-1376 and RT4) imply the presence of the “B” allele only; however, our analysis shows the presence of the “A” allele as well. Subjectively, the reader is led to the conclusion that the CHD1 in-frame deletion is linked to cancer. However, on the basis of our findings, this seems very unlikely. Nevertheless, the CCLE surely represents a highly important resource for every researcher who works with cancer cell lines and wants a rapid first impression of the genetics of the cell line they are currently working on.
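The allele frequencies quoted in the comment can be related to expected genotype proportions through the Hardy-Weinberg relation, p² + 2pq + q² = 1. The sketch below is only an illustration of that arithmetic and not the authors' analysis: the function name `hardy_weinberg_expected` is hypothetical, and the only values taken from the text are the "A" allele frequency (0.70), the "B" allele frequency (0.30) and the sample size (32 individuals). The comment does not report observed genotype counts, so none are assumed here.

```python
def hardy_weinberg_expected(p_a: float, n_individuals: int) -> dict:
    """Expected genotype frequencies and counts under Hardy-Weinberg equilibrium.

    p_a is the frequency of allele "A"; the frequency of allele "B" is 1 - p_a.
    (Hypothetical helper, written for illustration only.)
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("allele frequency must lie between 0 and 1")
    if n_individuals < 0:
        raise ValueError("number of individuals must be non-negative")

    q_b = 1.0 - p_a
    frequencies = {
        "AA": p_a ** 2,        # homozygous, no deletion
        "AB": 2 * p_a * q_b,   # heterozygous
        "BB": q_b ** 2,        # homozygous for the deletion
    }
    return {
        genotype: {"frequency": freq, "expected_count": freq * n_individuals}
        for genotype, freq in frequencies.items()
    }


# Values quoted in the comment: 70% "A", 30% "B", 32 individuals.
result = hardy_weinberg_expected(0.70, 32)
for genotype, values in result.items():
    print(f"{genotype}: frequency={values['frequency']:.2f}, "
          f"expected count={values['expected_count']:.2f}")
```

With p = 0.70 and q = 0.30, the expected proportions are about 49% AA, 42% AB and 9% BB, which for 32 individuals corresponds to roughly 15.7, 13.4 and 2.9 people. A cell line scored as "B only" would therefore fall in the rarest expected class, which is one way to see why detecting the "A" allele in those lines matters.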

Commentary
Unique Role for a DNA Methyltransferase Isoform in Lung Cancer
Christopher E. Duymich, Gangning Liang⁎
Department of Urology, USC Norris Comprehensive Cancer Center, Keck School of Medicine, University of Southern California, Los Angeles, CA 90089, USA

Aberrant DNA methylation patterns are one of the most studied epigenetic components in cancer development. In certain cancers, altered expression of DNA methyltransferase (DNMT) family members could potentially cause this DNA methylation re-patterning. Specifically, DNMT3B has over thirty different isoforms with varied expression in tissues (Ostler et al. 2007). Recent studies have shown that DNMT3B is responsible for gene body methylation by recognizing the H3K36me3 modification found in gene bodies, especially gene body remethylation after treatment with a DNA methylation inhibitor (5-Aza-CdR) (Baubec et al. 2015; Yang et al. 2014). The methylated gene bodies showed positive correlations with gene expression and, additionally, hypomethylation of gene bodies could lead to downregulation of gene expression (Yang et al. 2014). Non-small cell lung cancer (NSCLC) is one such disease which has overexpression of DNMT3B, including a specific subfamily which lacks an N-terminal domain (ΔDNMT3B) (Wang et al. 2006). In this issue of EBioMedicine, the work presented by Ma et al. (Ma et al. 2015) provides exciting developments for NSCLC with findings connecting the aberrant DNA methylation patterns during tumorigenesis with the predominantly expressed DNMT3B isoform: ΔDNMT3B4-del, a truncated DNMT3B isoform lacking exons 21 and/or 22 which contain the catalytic domain.

DNMTs responsible for DNA methylation establishment include: DNMT1, DNMT3A and DNMT3B, as well as DNMT3L (DNA methyltransferase 3-like).
DNMT3A and B are the de novo methyltransferases responsible for developing new methylation patterns in the genome as well as maintenance of stable gene silencing related to key biological processes. Generally, tumorigenesis involves DNA hypermethylation at CpG islands (CGI) as well as global DNA hypomethylation (Jones 2012). Although there are many studies that focus on a specific DNMT and its role during development and cancer, there is still a knowledge gap with respect to how the plethora of DNMT isoforms contributes to DNA methylation alterations.

The work by Ma et al. builds on the foundation from earlier studies that primary NSCLC tumors aberrantly express ΔDNMT3B, with ΔDNMT3B4 being the most prevalent isoform (Wang et al. 2007). Additionally, ΔDNMT3B4 can have a truncated methyltransferase domain (ΔDNMT3B4-del) that was shown to facilitate DNA methylation changes in transgenic mice similar to early initiating events in tumorigenesis. Interestingly, approximately half of the NSCLC tumors expressed DNMT3B-del variants. Aberrant expression of ΔDNMT3B4-del could not fully explain the development of NSCLC; however, this finding highlights how DNMT3B isoforms could be potential targets for cancer therapy.

The observed DNA methylation changes could be initiation events for tumorigenesis in NSCLC and are interesting because they show the impact of one aberrantly expressed DNMT isoform in normal tissue. The transgenic mouse experiments by Ma et al., where ΔDNMT3B4-del was exogenously introduced to the genome, resulted in lung specific global hypomethylation, a hallmark of aberrant DNA methylation found during early tumorigenesis (Jones and Baylin 2007). The global hypomethylation could be caused by ΔDNMT3B4-del interacting with, or preventing interactions between, other DNMT family members, or by the loss of its catalytic domain, but this requires further in-depth functional studies. The functions of ΔDNMT3B4-del are not fully understood, nor are the effects of the induced global hypomethylation during tumorigenesis, potentially due to its truncated catalytic domain.
Nevertheless, the authors used in vitro model systems to study ΔDNMT3B4-del, finding cells became arrested in the G2/M phase, had increased abnormal DNA content and elevated levels of DNA damage response genes consistent with lung tumorigenesis. Furthermore, the authors show that, although ΔDNMT3B4-del on its own is incapable of inducing tumorigenesis, in combination with carcinogen exposure the transgenic animals develop adenocarcinoma formations. Taken together, these results suggest that ΔDNMT3B4-del plays a critical role in promoting genomic instability, providing the groundwork for tumor initiation and formation in NSCLC.

These intriguing findings raise multiple new questions. Could other DNMT3B isoform family members provide the first necessary methylome changes resulting in tumor initiation? How do DNMT family members interact with each other when one constituent is overproduced? Will targeting specific DNMTs be necessary in cancer treatment and prevention? Answering these questions will be pivotal in providing functional characteristics about DNA methyltransferases and their roles in tumorigenesis. Furthermore, it will provide a possible new foundation of how specific genes changing the epigenome can influence more drastic changes leading to tumor evolution.

One problem with targeting DNA methylation is the availability of specific inhibitors for DNMTs. Although 5-azacytidine (5-Aza-CR) and 5-aza-2′-deoxycytidine (5-Aza-CdR) are FDA approved for treatment of myeloid malignancies, they lead to the trapping and degradation of DNMTs resulting in passive DNA methylation loss after replication (Ghoshal et al. 2005; Kuo et al. 2007). The best type of therapy would be one that targets specific proteins like ΔDNMT3B4-del to restore normal expression of the DNMT family instead of targeting all constituents and possibly causing off-target effects. The identification of drugs that could specifically target one DNA methyltransferase will be an interesting field of research with a potential therapeutic impact on various diseases.

Conflicts of Interest
The authors declare no conflicts of interest.

References
Baubec, T., Colombo, D.F., Wirbelauer, C., Schmidt, J., Burger, L., Krebs, A.R., Akalin, A., Schübeler, D., 2015. Genomic profiling of DNA methyltransferases reveals a role for DNMT3B in genic methylation. Nature.
Ghoshal, K., Datta, J., Majumder, S., Bai, S., Kutay, H., Motiwala, T., Jacob, S.T., 2005. 5-Aza-deoxycytidine induces selective degradation of DNA methyltransferase 1 by a proteasomal pathway that requires the KEN box, bromo-adjacent homology domain, and nuclear localization signal. Mol. Cell. Biol. 25, 4727–4741.
Jones, P.A., 2012. Functions of DNA methylation: islands, start sites, gene bodies and beyond. Nat. Rev. Genet. 13, 484–492.
Jones, P.A., Baylin, S.B., 2007. The epigenomics of cancer. Cell 128, 683–692.
Kuo, H.K., Griffith, J.D., Kreuzer, K.N., 2007. 5-Azacytidine induced methyltransferase-DNA adducts block DNA replication in vivo. Cancer Res. 67, 8248–8254.
Ma, M.Z., Carrillo, J., Lin, R., Bhutani, M., Pathak, A., Ren, H., Li, Y., Song, J., Mao, L., 2015. ΔDNMT3B4-del Contributes to Aberrant DNA Methylation Patterns in Lung Tumorigenesis. EBioMedicine 2, 1340–1350.
Ostler, K.R., Davis, E.M., Payne, S.L., Gosalia, B.B., Expósito-Céspedes, J., Le Beau, M.M., Godley, L.A., 2007. Cancer cells express aberrant DNMT3B transcripts encoding truncated proteins. Oncogene 26, 5553–5563.
Wang, L., Wang, J., Sun, S., Rodriguez, M., Yue, P., Jang, S.J., Mao, L., 2006. A novel DNMT3B subfamily, DeltaDNMT3B, is the predominant form of DNMT3B in non-small cell lung cancer. Int. J. Oncol. 29, 201–207.
Wang, J., Bhutani, M., Pathak, A.K., Lang, W., Ren, H., Jelinek, J., He, R., Shen, L., Issa, J.P., Mao, L., 2007. Delta DNMT3B variants regulate DNA methylation in a promoter-specific manner. Cancer Res. 67, 10647–10652.
Yang, X., Han, H., De Carvalho, D.D., Lay, F.D., Jones, P.A., Liang, G., 2014. Gene body methylation can alter gene expression and is a therapeutic target in cancer. Cancer Cell 26, 1–14.

EBioMedicine 2 (2015) 1272–1273. DOI of original article: http://dx.doi.org/10.1016/j.ebiom.2015.09.002.
⁎ Corresponding author. E-mail address: gliang@usc.edu (G. Liang).
http://dx.doi.org/10.1016/j.ebiom.2015.09.020
2352-3964/© 2015 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Abstract
Traditionally, cancer has been viewed as a disease driven by the accumulation of genetic alterations; however, this view has recently been expanded to include epigenetic modifications. Recent developments, such as whole exome sequencing (WES), have enabled researchers to analyze the entire genomic coding regions in cancer tissue samples to determine possible genetic mutations causing tumor onset/progression, leading to improved disease prognosis and treatment therapies. These studies have unexpectedly shown that genes that organize the epigenome are frequently mutated in many tumor types. The effects of these mutations on the epigenetic landscape, e.g. alterations in DNA methylation, histone modification, and nucleosome positioning, are as yet unknown. Given that the disruption of normal epigenetic patterns can contribute to carcinogenesis, it is critical to understand how the cancer epigenome has been altered and how it can be reversed by pharmacological intervention.

I designed a pipeline to process WES data for the purpose of identifying somatic alterations in bladder cancer cell lines as well as primary bladder cancer tissues. I characterized the genetic and epigenetic alterations in bladder cancer cell lines and compared them to uncultured tumors to validate them as model systems for studying bladder cancer tumorigenesis. Specifically, I confirmed findings of genetic alterations previously identified in bladder cancer, as well as DNA methylation alterations using arrays. Additionally, I dissected early epigenetic and genetic alterations that occur in recurrent non-muscle-invasive bladder cancer using metachronous uncultured tumor samples. These findings show clonal evolution from ancestral clones in all patients using both DNA methylation and genetic analysis, confirming a common initiation of tumorigenesis.

I have also identified a new functional role as accessory proteins for DNA methyltransferase 3B (DNMT3B) isoforms lacking catalytic domains. These accessory proteins are able to stimulate DNA methylation both in a de novo context and by partially restoring DNA methylation levels following treatment with an FDA-approved demethylating agent, 5-Aza-CdR. Taken together, this dissertation presents a detailed view of the genome and epigenome alterations that contribute to tumor initiation and development.
Linked assets
University of Southern California Dissertations and Theses
Conceptually similar
Identification of novel epigenetic biomarkers and microRNAs for cancer therapeutics
Functional role of chromatin remodeler proteins in cancer biology
Functional DNA methylation changes in normal and cancer cells
Role of DNA methyltransferases 3A and 3B in inheritance of DNA methylation patterns
DNA methylation inhibitors and epigenetic regulation of microRNA expression
DNA methylation changes in the development of lung adenocarcinoma
Integrated genomic & epigenomic analyses of glioblastoma multiforme: Methods development and application
Epigenetic mechanisms driving bladder cancer
CpG poor promoter SULT1C2 regulated by DNA methylation and is induced by cigarette smoke condensate in lung cell lines
Developing a robust single cell whole genome bisulfite sequencing protocol to analyse circulating tumor cells
Epigenetic regulation of non CPG island gene promoters
Integrative genomic and epigenomic analysis of human cancer
Using epigenetic toggle switches to repress tumor-promoting gene expression
DNA hypermethylation: its role in colorectal tumorigenesis and potential clinical applications
Identification and characterization of cancer-associated enhancers
Understanding DNA methylation and nucleosome organization in cancer cells using single molecule sequencing
Ancestral inference and cancer stem cell dynamics in colorectal tumors
Identification and characterization of PR-Set7 and histone H4 lysine 20 methylation-associated proteins
Breast epithelial cell type specific enhancers and functional annotation of breast cancer risk loci
DNA methylation as a biomarker in human reproductive health and disease
Asset Metadata
Creator
Duymich, Christopher E. (author)
Core Title
Effects of chromatin regulators during carcinogenesis
School
Keck School of Medicine
Degree
Doctor of Philosophy
Degree Program
Genetic, Molecular and Cellular Biology
Publication Date
02/23/2016
Defense Date
11/02/2015
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
bladder cancer, DNA methylation, epigenetics, genetics, OAI-PMH Harvest
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Coetzee, Gerhard (committee chair), Jones, Peter A. (committee member), Stallcup, Michael (committee member)
Creator Email
c.duymich@gmail.com,duymich@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c40-214550
Unique identifier
UC11279303
Identifier
etd-DuymichChr-4147.pdf (filename),usctheses-c40-214550 (legacy record id)
Legacy Identifier
etd-DuymichChr-4147.pdf
Dmrecord
214550
Document Type
Dissertation
Rights
Duymich, Christopher E.
Type
texts
Source
University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA