The development of targeted transcription factor transposition and understanding chromatin dynamics in hypertrophic cardiomyopathy
The development of Targeted Transcription factor Transposition and understanding chromatin dynamics in hypertrophic cardiomyopathy

by Justin Cayford

A Dissertation Presented to the FACULTY OF THE USC GRADUATE SCHOOL, UNIVERSITY OF SOUTHERN CALIFORNIA, In Partial Fulfillment of the Requirements for the Degree DOCTOR OF PHILOSOPHY (MOLECULAR BIOLOGY)

December 2021

Copyright 2021 Justin Cayford

Acknowledgments

Firstly, I would like to thank my advisor Dr. Lin Chen for all of the support and guidance throughout the years in which this work was completed. It was always a pleasure to be able to talk about new and exciting methods and projects to try and find better ways of generating data to understand biology in exciting ways. I would also like to thank everyone in the Chen lab for all the discussions and support that have been given to me throughout this process. Thank you to Anupam, Calista, Celja, Gary, James, and Katie for all the randomly timed walks, support, advice, and help since we entered graduate school. I would also like to thank Dr. James Clarke for all the questions you had to endure. I also thank everyone in the Vijayanand Lab who helped with the experimentation on the cardiomyocytes, specifically Dr. Vivek Chandra, who helped with the HiChIP assay, Dr. Sourya Bhattacharyya, who was always there to answer my questions about the analysis of the HiChIP assay, and Dr. Pandurangan Vijayanand, who allowed me to complete the experimentation in the lab and gave me the tools to succeed in graduate school and beyond. This work really would not have been possible without you all.

I want to thank my parents, Bob and Cindy, brother, Jordan, and sister, Mackenzie, for all the support and love throughout my entire journey. Lastly, I want to thank my soon-to-be wife, Janna, who has put up with the countless days of extra work, long drives to and from LA, and the amazing trips which allowed me to unwind. This work is dedicated to Janna and my family, who have always been there for me.
Contents

Acknowledgments
List of Tables
List of Figures
Abbreviations
Abstract
1 Introduction
  1.1 Histones Discovery to the Epigenome
  1.2 Histone Modifications and Post-Translational Modifications
    1.2.1 Histone Acetylation
    1.2.2 CpG Islands
    1.2.3 Cis-regulatory Elements
    1.2.4 Post Translational Modifications
  1.3 Chromatin Accessibility
    1.3.1 Enzymatic Cleavage of Accessible Chromatin
    1.3.2 Transcription Factors and Chromatin Remodeling
    1.3.3 DNA Footprinting
  1.4 The 3D Genome
    1.4.1 Genome-wide Associated Studies
    1.4.2 Quantitative Trait Loci
  1.5 Chromatin Immunoprecipitation followed by Sequencing (ChIP-Seq)
    1.5.1 ChIP in brief
    1.5.2 Encyclopedia of DNA Elements
    1.5.3 ChIPmentation
    1.5.4 ChIP-Seq Cell reduction to Single-cell and like methods
    1.5.5 ChIP Current limitations
  1.6 Chromosome Capture Methods
    1.6.1 ChIA-PET and HiChIP
  1.7 Tn5 transposase
    1.7.1 Tn5 Mechanism
    1.7.2 Tn5 Hyperactivity
  1.8 Hypertrophic cardiomyopathy background
    1.8.1 HCM Mutations
  1.9 HCM Disease Mechanism
    1.9.1 HDAC Background
    1.9.2 MEF2
2 Materials and Methods
  2.1 Materials
    2.1.1 Kits Used
    2.1.2 Primer Creation
  2.2 Methods
    2.2.1 Agar resistance plates
    2.2.2 Transformation
    2.2.3 Glycerol Stocks, plasmid sequencing, and seed cultures for AMP resistant plasmids
    2.2.4 Small-scale Induction Test
    2.2.5 Tn5:mHDAC Plasmid Generation
    2.2.6 Tn5 and Tn5:mHDAC Expression
    2.2.7 Stacked SDS-PAGE gel preparation
    2.2.8 Tn5 and Tn5:mHDAC Induction SDS-PAGE QC
    2.2.9 Tn5 Protein Purification
    2.2.10 Preparation of NRVMs
    2.2.11 ChIP-Seq
    2.2.12 HiChIP Assay
    2.2.13 ChIP-Seq Analysis
    2.2.14 HiChIP Analysis
3 Targeted Transcription Factor Transposition
  3.1 Introduction
  3.2 Assay Design
  3.3 Results
    3.3.1 Optimal Purification of Tn5
    3.3.2 Activity Optimization of Tn5
    3.3.3 Tn5:mHDAC fusion plasmid creation
    3.3.4 Tn5:mHDAC protein expression
    3.3.5 Tn5:mHDAC binding to MEF2
    3.3.6 Tn5:mHDAC Specificity and Assay testing
    3.3.7 Mutation of Tn5:mHDAC
  3.4 Discussion
4 Computational Analysis of HCM HiC
  4.1 Introduction
    4.1.1 Experimental Design
  4.2 Results
    4.2.1 Method 1 Results
    4.2.2 Method 2 Results
  4.3 Discussion
5 Understanding NPPA and NPPB gene regulation in HCM
  5.1 Introduction
  5.2 Results
    5.2.1 H3K27ac ChIP-Seq
    5.2.2 H3K27ac HiChIP Optimization, Preparation, and Sequencing
    5.2.3 H3K27ac HiChIP Initial Filtering Analysis
    5.2.4 H3K27ac HiChIP FitHiChIP Analysis
    5.2.5 H3K27ac HiChIP Correlation to Gene Expression
    5.2.6 NPPA and NPPB Loci Interactions background
    5.2.7 NPPA and NPPB Loci Interactions
  5.3 Discussion
6 Discussion
7 References

List of Tables

1 List of H3 Histone Modifications
2 List of Tn5 Mutations
3 List of HCM Implicated Genes
4 List of Plasmids
5 List of Oligonucleotides

List of Figures

1.1 Overview of Chromatin Accessibility
1.2 Chromatin Remodeling
1.3 Overview of DNA Footprinting
1.4 eQTL Analysis
1.5 ChIP-Seq Methods Overview
1.6 3D Methods Overview
1.7 Overview of DNA Footprinting
1.8 Understanding HCM Mutation Targets
1.9 HCM Pathway
1.10 HDAC Classes and MEF2 Binding
1.11 HDAC Structure
1.12 General MEF2 overview
1.13 HCM Hypothetical 3D Interactions
3.1 General Experimental Design for Targeted Transcription factor Transposition
3.2 Tn5 plasmid and purification
3.3 Optimization of Tn5 activity
3.4 Tn5:mHDAC Plasmid Strategy
3.5 Tn5:mHDAC plasmid generation
3.6 Tn5:mHDAC Optimization
3.7 EMSA for Tn5:mHDAC binding affinity to MEF2
3.8 Tn5:mHDAC specificity assay to MEF2
3.9 Tn5 Mutations
4.1 HiC Filter Strategies
4.2 HiC Analysis and ChIP-Seq Filtering
4.3 Healthy vs HCM RNA-Seq Differential Analysis
4.4 Healthy vs HCM Looping Differences
4.5 Healthy vs HCM Looping Differences
5.1 General H3K27ac ChIP-Seq and HiChIP Experiment Scheme
5.2 Sonication Optimization and H3K27ac ChIP-Seq of NRVMs
5.3 H3K27ac ChIP-Seq Results
5.4 H3K27ac HiChIP Optimization and Sequencing Results
5.5 HiCPro and FitHiChIP Pipeline Outputs
5.6 FitHiChIP Pipeline Workflow
5.7 FitHiChIP Differential Looping Breakdown and stimulation chemicals
5.8 Classification of Differential loop/gene overlaps
5.9 NPPA and NPPB Locus Background
5.10 NPPA and NPPB Local Loci Interactions
5.11 Extended NPPA/NPPB Locus interactions and TF ChIP-Seq
5.12 Extended NPPA/NPPB Locus (2) interactions and TF ChIP-Seq
5.13 Complete NPPA/NPPB Loci interactions model

Abbreviations

• sc - Single-cell
• 1D, 2D, 3D, 4D - first dimension, second dimension, third dimension, fourth dimension
• HCM - hypertrophic cardiomyopathy
• DNA - deoxyribonucleic acid
• kb, mb - kilobase, megabase
• Nickase - nicking endonuclease
• HPLC - high pressure liquid chromatography
• TTT - Targeted Transcription factor Transposition
• Tn5:mHDAC - Tn5 fused with a minimal binding motif of HDAC4 via a GSG linker
• AMP - ampicillin
• Tspn - transposon
• GA - Gibson Assembly
• MEF2 - Myocyte Enhancer Factor 2
• HDAC - Histone deacetylase
• NPPA - Natriuretic peptide A
• NPPB - Natriuretic peptide B
• RE - Regulatory Element

Abstract

To better understand the disease hypertrophic cardiomyopathy (HCM), numerous techniques were utilized and a novel assay was developed. The driving aim of this body of work was to probe the ability of myocyte enhancer factor 2 (MEF2) to alter chromatin structure and, as a result, gene expression upon hypertrophic activation. Due to some technical limitations of chromatin immunoprecipitation followed by sequencing (ChIP-Seq), a novel method was attempted which eliminated the use of antibodies to determine the binding positions of MEF2 genome wide.
The third aspect of this work was to use a well-known histone modification for active gene expression (H3K27ac) and test the 2D (ChIP-Seq) and 3D (HiChIP) interactions being perturbed in the disease. These data were then used to determine if there was a molecular mechanism of 3D chromatin structural changes driving the up- and down-regulation of disease-associated genes. This was the first example of using HiChIP in an HCM system and showed altered chromatin states in genes responsible for the disease. Specifically, these data showed the importance of MEF2 and histone deacetylase 4 (HDAC4) at the natriuretic peptide A and B (NPPA/NPPB) locus, both in the regulation of gene expression and in how chromatin dynamics can be perturbed on a local scale, which has been missed by HiC experiments. The use of HiChIP also allowed the locus to be understood on a much broader scale than had been previously reported using 4C experiments. Overall, the locus showed how HiChIP data could be applied to a well-studied locus to further the understanding of the role of 3D interactions in gene expression regulation.

In summary, these experiments were conducted in order to better understand the role MEF2:HDAC interactions play in the regulation of chromatin structure and gene expression. The assay development provided a proof of concept for a new type of ChIP-Seq which would eliminate the need for the standard antibody approach. The addition of Tn5 to a specific chemical probe that binds a target would also allow the assay to be multiplexed for numerous targets and has the potential to be reduced to the single-cell level. This tool could have a large impact on the field by helping to better understand the roles of transcription factors in 3D chromatin conformation and their importance in gene regulation. The computational approach highlighted the need for assays like HiChIP, as the combination of ChIP-Seq and HiC on a computational level was not successful.
In the final aspect of the project, the goal of obtaining 3D chromatin changes between healthy and HCM conditions was completed using H3K27ac HiChIP. The work focused on an essential locus in heart failure and showed the role MEF2:HDAC interactions have in the expression of both NPPA and NPPB.

1 Introduction

1.1 Histones Discovery to the Epigenome

To understand disease and generate potential therapeutics, it is important to determine which genes are being altered in diseased cells when compared to healthy cells. There has been a concerted effort throughout pharmaceuticals and other research to gain a better understanding of disease-causing genes [1–4]. From single-cell (sc) prokaryotes to the highly developed, multicellular eukaryotes, there is regulation of which genes are being transcribed [2, 4, 5]. There are numerous levels of control throughout the genome and, as the complexity of the organism increases, the regulation of its genes also becomes more complex. To gain a fundamental understanding of this regulation, decades of work have been completed, and as we have moved from one-dimensional (1D) analysis into 3D and even 4D, ever deeper complexities have emerged.

The main focus throughout this work will be to understand how the organization of the cell nucleus, specifically the chromatin, can affect gene regulation. Here, one of the most common genetic heart diseases, hypertrophic cardiomyopathy (HCM), was probed to better understand which genes were important in the disease and how chromatin structure was helping to drive the disease phenotype [2, 6–10]. This work was completed both through a computational analysis of previously published work, which probed both the gene expression levels and the corresponding 3D structures [10], and through new experimentation discussed in greater detail below.
However, as the main topics in this work are centered around chromatin structure and its importance to gene regulation and expression, an introduction to chromatin and gene regulation is included here.

To determine how genes are activated or silenced, we have to look at how they are structured within the 3D deoxyribonucleic acid (DNA) landscape. For this, a brief summary of chromatin and epigenetic controls is included here. Chromatin, a term coined by Flemming in 1881 [11, 12], is generally regarded as DNA wrapped around histones, together with transcription factors (TFs) bound to other regions of the DNA [13]. There has been a lot of advancement in the understanding of what chromatin is and what its components are since 1881, including the classic experiments of Avery, MacLeod, and McCarty in 1944, which showed DNA to be the key genetic material [14]. That experiment fundamentally changed the dogma in the field away from the idea that proteins were the main genetic material passed from cell to cell [14].

There were other leaps in the understanding of chromatin between the discovery of DNA as the main component of the genetic material and the coining of the term nucleosome in 1975 by Oudet et al. [15]. Although these steps will be summarized here in a few key experiments, what the researchers were able to accomplish with the limited technology available is fascinating. Some of the most important discoveries, such as the structure of DNA determined by Watson and Crick in 1953 [16] (with the contributions of Franklin), are essential to the timeline of understanding the complex nature of gene expression.

Once it was known that DNA was the genetic material [14] and the overall structure of DNA had been determined [16], the next step was to determine how genes were being expressed. Although histones had been discovered by Kossel et al. in 1884 [11, 17], there was little work studying which proteins interact with DNA until the mid 1960s.
At that point, Allfrey et al. made the first associations between histone modifications, such as acetylation and methylation, and gene expression [18]. Allfrey showed an important correlation between histone acetylation and activated gene expression [11, 18]. Despite the discovery that histone modifications are important for gene expression, it was nearly a decade later that the general chromatin models still in use today were first proposed [19]. The first high-resolution images of chromatin were produced in 1974 by Olins et al., and the well-known term 'beads on a string' was coined for how histones and DNA are arranged [19]. Later in the same year, Kornberg and Thomas further developed the model with the discovery of different sub-units within the histone, which have since been termed H2A, H2B, H3, and H4 [11, 20]. The histone complex with DNA wrapped around it was then termed a nucleosome the subsequent year [11, 15]. It was not until 1997 that Luger et al. determined the nucleosome to be composed of two copies of each of the histone proteins (H2A, H2B, H3, and H4), bound by 147 bp of DNA wrapped around this core [21]. It took over a century of breakthroughs to determine exactly what the components of the chromatin Flemming observed in 1881 were.

Although the field of genetic regulation and control was still active from the 1970s through the 1990s, it was not until the mid to late 1990s that the next, and continuing, phase of epigenetics began [11, 12]. In 1993, Turner helped to propose the idea that information for gene regulation could be contained on the known histone tails [22]. He looked at many aspects of this epigenetic control, including the dynamic process of turning genes on and off with varying modifications, giving more evidence to the earlier work completed by Allfrey in 1964 [11, 22].
So, less than thirty years ago, there was evidence that histone modifications are directly implicated in gene expression [18, 22] and that these modifications are dynamic and under some degree of environmental control [22]. Since the turn of the century, many discoveries have helped to prove pioneers like Allfrey [18], Olins [19], Kornberg [20], Richmond [23], and Turner [22] correct in their models [11]. Up to that point, the models did not encompass many proteins or TFs besides the nucleosome complex, but this changed at the dawn of the new-age epigenetic revolution [12].

Two important areas of epigenetics are the modifications made to histones, such as methylation and acetylation, and the open regions of DNA between histones, generally termed chromatin accessibility. Both of these aspects are vital in epigenetic control and often intertwine with each other, as regulatory mechanisms are able to alter the histone modifications either directly or indirectly [24–28]. However, to gain a full appreciation of chromatin accessibility and the factors which utilize these stretches of accessible DNA, an understanding of the histone modifications which can regulate gene expression is helpful.

1.2 Histone Modifications and Post-Translational Modifications

Histone modifications are one of the most important factors for gene expression and will be an important point of discussion throughout this work. While there are numerous modifications across all the different components of the histone complex, the majority of the gene regulation marks are found on the H3 histone [11, 12, 29, 30]. Due to this, the focus of discussion here will be the H3 modifications; a comprehensive list of other modifications was compiled by Zhao and Garcia [30]. Table 1 gives a compressed list of H3 histone modifications responsible for gene regulation.
Although all are important factors, some of the more important epigenetic marks, such as H3K4me2, H3K4me3, H3K27ac, and H3K27me3, will be discussed further [11, 12, 31]. Other histone marks, such as H3K9ac, are important for histone deacetylases (HDACs), which will be discussed in greater detail in another section [12, 32].

As mentioned above, the earliest work on histone modifications was completed by Allfrey, and a lot of important work was completed between his experiments and the current understanding of the complexities of histone modifications relating to gene expression. An important aspect is the ability of these modifications to recruit other factors (TFs) to help carry out the function the mark is associated with (gene activation or repression) [12, 13, 29, 31]. There are numerous regulatory controls within the genome, some of which include promoters, enhancers, insulators, and repressors. A brief review will be given here because histone modification functions are often directly correlated to these controls [12, 13, 31].

Although there are numerous histone modifications, such as phosphorylation [33], deimination [34, 35], O-GlcNAcetylation [36], ADP ribosylation [37, 38], ubiquitylation [39], sumoylation [40], proline isomerization [41], and tail clipping [42], two of the most important modifications for gene expression are acetylation and methylation [12, 13, 29, 31]. There are likely to be many more factors and controls of the histones which have yet to be discovered. The control mechanisms of gene regulation also increase drastically as the complexity of an organism increases [4].

1.2.1 Histone Acetylation

Although the many marks mentioned above are also important for regulation, one of the most important marks for gene expression is histone acetylation. Histone acetylation happens mostly on lysine residues of histone tails and negates the positive charge on the lysine residue [29, 43].
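The effect of acetylation on tail charge can be made concrete with a rough count. The sketch below is illustrative only and is not part of the original work. It assumes the first 40 residues of the canonical human histone H3 tail and a deliberately simplified charge model (+1 for each unmodified lysine or arginine, -1 for each aspartate or glutamate, with histidine and the termini ignored). The function name `tail_charge` is hypothetical.

```python
# First 40 residues of the canonical human histone H3 N-terminal tail
# (one-letter codes, numbered from 1 as in marks such as K4, K9, K27).
# Assumption: this sequence is supplied for illustration, not taken from the thesis.
H3_TAIL = "ARTKQTARKSTGGKAPRKQLATKAARKSAPATGGVKKPHR"


def tail_charge(seq, acetylated=()):
    """Approximate net charge of a peptide near neutral pH.

    Simplified model: +1 per unmodified Lys (K) or Arg (R), -1 per Asp (D)
    or Glu (E). Lysines whose 1-based positions appear in `acetylated`
    are treated as neutral. Histidine and the termini are ignored.
    """
    for pos in acetylated:
        if not 1 <= pos <= len(seq) or seq[pos - 1] != "K":
            raise ValueError(f"position {pos} is not a lysine")
    charge = 0
    for pos, residue in enumerate(seq, start=1):
        if residue == "K":
            if pos not in acetylated:
                charge += 1
        elif residue == "R":
            charge += 1
        elif residue in "DE":
            charge -= 1
    return charge


unmodified = tail_charge(H3_TAIL)
k27ac = tail_charge(H3_TAIL, acetylated={27})
print(unmodified, k27ac)
```

Under this simplified model the unmodified tail carries a net charge of +13, and acetylating K27 alone lowers it to +12.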
The action of acetylation of the histone tails at specific locations helps to activate gene expression [18, 29, 44, 45]. Two classes of enzymes are essential in histone acetylation: histone acetyltransferases (HATs) and histone deacetylases (HDACs) [29]. HATs are complex, as there are more than thirty types in humans, which are generally divided into two groups (A and B) [29, 43, 46].

Table 1: List of H3 Histone Modifications

Amino Acid | Mod | Mod Repeats | Function | Reference(s)
--- | --- | --- | --- | ---
R2 | me | 1 | Gene expression | Chen et al. 1999; Schurter et al. 2001; Greer and Shi 2012
K4 | ac | | Transcription activation at some promoters | Guillemette et al. 2011
 | me | 1 | Transcriptional activation | Wang et al. 2001a; Nishioka et al. 2002a; Wilson et al. 2002
 | me | 1;3 | Gene activation | Zegerman et al. 2002
 | me | 2;3 | Transcriptional activation | Hamamoto et al. 2004
 | me | 1;2;3 | Transcriptional activation (All) | Strahl et al. 1999; Lee et al. 2007; Xiao et al. 2011
 | | | Gene activation | Greer and Shi 2012
 | | | Enhancer function | Herz et al. 2013
R8 | me | 1 | Transcriptional repression | Pal et al. 2004
K9 | ac | | Transcriptional activation | Grant et al. 1999; Nowak and Corces 2000
 | me | 1;2 | Transcriptional repression | Tachibana et al. 2001; 2002; Ogawa et al. 2002; Xin et al. 2003
 | me | 2;3 | Pericentric heterochromatin | O'Carroll et al. 2000; Rea et al. 2000; Lachner et al. 2001; Peters et al. 2001
 | | | Transcriptional repression | Schultz et al. 2002; Yang et al. 2002; Dodge et al. 2004; Wang et al. 2004
S10 | ph | | Transcriptional activation | Lo et al. 2001
 | | | Transcriptional activation | Sassone-Corsi et al. 1999; Thomson et al. 1999; Cheung et al. 2000; Clayton et al. 2000
 | | | Transcriptional up-regulation | Anest et al. 2003; Yamamoto et al. 2003
K14 | ac | | Transcriptional activation | Brownell et al. 1996; Kuo et al. 1996; Mizzen et al. 1996; Schiltz et al. 1999
R17 | me | 1 | Transcriptional activation | Chen et al. 1999; Schurter et al. 2001; Bauer et al. 2002; Daujat et al. 2002
K18 | ac | | Transcriptional activation | Grant et al. 1999; Schiltz et al. 1999
 | ac | | Transcriptional activation | Daujat et al. 2002
K23 | ac | | Transcriptional activation | Grant et al. 1999
 | ac | | Transcriptional activation | Daujat et al. 2002
K27 | me | 3 | Transcriptional repression | Cao et al. 2002; Czermin et al. 2002; Kuzmichev et al. 2002; Muller et al. 2002; Su et al. 2003
K36 | ac | | Promoter mark on active genes | Morris et al. 2007
 | me | 1;2;3 | Transcription activation | Edmunds et al. 2008; Wang et al. 2007
P38 | iso | | Gene expression | Nelson et al. 2006
Y41 | ph | | Gene expression | Dawson et al. 2009
R43 | me | 2 | Transcriptional activation | Casadio et al. 2013
K56 | ac | | Transcriptional activation; DNA damage | Xu et al. 2005; Ozdemir et al. 2005; Masumoto et al. 2005
 | me | 3 | Heterochromatin | Jack et al. 2013
K64 | ac | | Nucleosome dynamics and transcription | Di Cerbo et al. 2014
 | me | 3 | Pericentric heterochromatin | Daujat et al. 2009

• Mod (column 2) refers to the post-translational modification made on the listed amino acid: me = methylation, ac = acetylation, ph = phosphorylation, and iso = isomerization.
• Mod Repeats (column 3) refers to the number of times that modification is repeated on the given amino acid.
• A blank cell carries over the value from the row above.

The two types are quite distinct in their processes, but both are essential for proper regulation. Type A HATs have more implications in the epigenetic control of gene regulation since they are found in the nucleus and are known to acetylate histones within chromatin [29, 43]. They are vastly important to gene expression in this regard and are often referred to as transcriptional activators, such as p300 [29, 44]. It would be convenient if type A HATs, such as p300, were able to activate gene expression alone, simply by adding acetyl groups to specific regions of the histone tails; however, this is not the case [44, 47]. Very large complexes generally form at these sites, with many factors coming together at the same location to ensure properly controlled gene expression [29, 44, 45, 47, 48].
Although less important for direct gene regulation, type B HATs are generally found in the cytoplasm, where they acetylate histones, which is important for histone regulation and fate [29, 49]. The other group of enzymes critical in histone acetylation are HDACs, which will be discussed in much greater detail below. Their function is to reverse the acetylation added by HATs, such that the lysine residue returns to a positive charge [29, 44, 50]. Due to this reversal of lysine acetylation, gene expression is mostly reduced, highlighting the important function of histone acetylation [29]. As a result, the action of HDACs is to silence and negatively regulate gene expression, which will be a major topic throughout this work.

Because of these functions, histone acetylation is an important aspect of this work. The specific mark probed here was the acetylation of lysine 27 on histone 3 (H3K27ac), as it has been shown to be one of the most important marks indicating active gene expression [4, 29, 31, 43, 44, 51]. Due to this, a loss of H3K27ac corresponds to a reduction in gene expression, which makes it an ideal modification for understanding how the chromatin landscape changes with expression [4, 29, 31, 43, 44, 51, 52]. Also, since the activation of genes often occurs with large complexes [44], it has been shown that the 3D chromatin structure also changes when gene expression is turned off, which will be discussed in greater detail below [4, 13, 51, 53–56].

1.2.2 CpG Islands

Apart from histone methylation, another important regulatory mechanism of gene expression is the ability of cells to methylate cytosine in DNA. There are numerous DNA methyltransferases which play impactful roles in gene expression and in other essential cellular tasks, such as maintaining methylated marks during cellular replication [4, 57–59].
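Because the CpG islands covered in this section are characterized by sequence composition, a candidate region can be screened directly from its sequence. The following is a minimal Python sketch and is not part of the original analysis. It uses a G+C threshold of 65% and a minimum length of 200 bp as stand-ins for the "about 65%" and "a few hundred base pairs" figures given in this section. It also reports the observed/expected CpG ratio, a commonly used additional measure that is an assumption here rather than something the text specifies. All function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of bases in `seq` that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: count(CG) / (count(C) * count(G) / length)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    expected = c * g / len(seq)
    return seq.count("CG") / expected


def looks_like_cpg_island(seq, min_len=200, min_gc=0.65):
    """Screen a region on length and G+C content only.

    min_len and min_gc are illustrative defaults (assumptions), chosen to
    mirror the rough figures in the text rather than a formal definition.
    """
    return len(seq) >= min_len and gc_fraction(seq) >= min_gc


# Toy sequences for illustration only.
rich = "CG" * 150   # 300 bp, all G/C
poor = "AT" * 150   # 300 bp, no G/C
print(looks_like_cpg_island(rich), looks_like_cpg_island(poor))
```

A real screen would slide a window along a chromosome and merge adjacent passing windows. The single-region check above only shows how the composition criteria translate into code.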
However, one of the most important DNA feature with cytosine methylation with a formation called CpG islands [4, 13, 29]. A CpG island refers to numerous repeats of cystine and guanine directly adjacent to each other in the genome, with the cystine at the 5’ end [?, ?, ?, ?]. Generally, there need to be a few hundred to a few thousand base pairs with an unproportionately highly (about 65%) guanine and cystine ratio [60, 61]. Interestingly, about 75% of the cytosines in a CpG are methylated in mammals [62]. Un- like histone methylation where the presence of methylation can either help to activate or silence genes, cystine methylation within CpG regions silence gene expression [?, ?, ?, ?]. The CpG islands are often associated with activation of genes in mammals with them being implicated in promoters to help start gene expression [31, 60, 62]. In these regions there is little methy- lation, which helps to show the complexities of gene regulation. Although CpG islands were not probed in this work, it is important to realize the large number of gene expression controls found in the genome. 1.2.3 Cis-regulatory Elements There are quite a few regulatory mechanisms which are vital in gene expression but some of the most important and well understood are promoters and enhancers [29,63,64]. These regulatory regions are often referred to as Cis-regulatory elements (CREs) as they are important elements which help to control gene expression [63]. Promoters and enhancers help to regulate gene expression but they have different properties and will be discussed here [63–65]. Promoters are the closest elements to a gene and are essential for their expression. One of the control mechanisms within promoters were briefly discussed above as CpG islands. Amazingly, about 70% of human promoters have a CpG island [61, 66]. Due to the control of methylation within these cites, promoters could be activated or deactivated with either methyl- 7 transferases or demethyltransferases. 
Although this is one control mechanism for promoters, another important aspect is the binding of transcription factors to help start gene activation or gene silencing. In order to activate transcription, a large complex has to be formed through the binding of multiple transcription factors [67–69]. As will be discussed below, the promoters do not have to be close in the linear DNA but within the 3D landscape of the nucleus, so a promoter could be acting on a gene which is tens or even hundreds of thousands of base pairs away from it in the linear DNA space [53–55, 70, 71]. Although promoters are essential in this process, they are generally highly conserved within the genome [63]. Though promoter dysregulation has implications, it is much more common to have dysregulation in enhancer elements [63]. The ability of enhancers to regulate proximal genes has been widely understood [54], and the field has begun to understand how enhancers interact with genes further away, aided by 3D techniques such as the chromosome conformation capture methods and HiChIP [53]. Recently, HiChIP studies of CREs in immune cells [56, 72] have expanded on the importance of both distal enhancers and promoters. For these reasons, the 3D landscape has to be taken into account to fully understand how the genome is regulated.

1.2.4 Post Translational Modifications

Although over 200 post translational modifications (PTMs) which are able to regulate proteins through various mechanisms have been discovered [73, 74], there are two which are implicated in this work and will be briefly discussed: phosphorylation (phos) and O-GlcNAcylation (O-GlcNAc). The most common PTM, phos, has around 700,000 potential target residues in human proteins [?,?]. Ubersax and Ferrell [?]
have an extensive review on protein phosphorylation, which describes the general mechanism of phosphorylation as the transfer of a phosphate group from ATP to a target protein, specifically to serine, threonine, and tyrosine residues [?]. They also went into detail about how these phosphorylation sites are essential for the proper regulation of proteins through many different mechanisms [?]. Another recent review, by Ardito et al, gives an extensive overview of potential phosphorylation mechanisms related to cancer [?]. The phosphorylation of specific residues in proteins can regulate specific pathways in the cell [?, ?] and has been implicated in hypertrophic cardiomyopathy [2]. Another important PTM in understanding the mechanism of HCM is O-GlcNAc [2]. This PTM adds an O-linked N-acetylglucosamine to serine or threonine residues [?] and has been linked to stress responses [?, ?]. The PTM has recently been linked to many human diseases [?, ?, ?], including Parkinson's disease [?] and HCM [2]. Interestingly, the PTM is regulated by either O-GlcNAc transferase (addition) or O-GlcNAcase (removal), so the complex nature of this PTM can be studied in numerous ways [?, ?]. Since the types of residues which can be modified are known, specific mutations can be made in proteins known to carry O-GlcNAc to further understand diseases such as Parkinson's disease [?]. These two PTMs are important for the health of cells, and further understanding of them will have large impacts on many diseases.

1.3 Chromatin Accessibility

As has been discussed above, there are many control mechanisms used by cells to regulate gene expression. Although this was not an exhaustive list, another major aspect of gene expression control is the accessibility of chromatin. Accessibility refers to regions of the genome which do not contain nucleosomes; these regions have been implicated in high gene expression using a few different techniques, which will be discussed in greater detail here [13, 20, 28, 75–84].
The general concept is that chromatin is more accessible in regions which are more actively transcribed and where fewer factors, such as nucleosomes or TFs, are bound to the DNA, as shown in figure 1.1a [13]. Amazingly, only around 3% of the genome is accessible at a given time [27]. The accessibility of chromatin has a large effect on gene expression, and a lack of accessibility in heterochromatic regions makes the chromatin more rigid and unavailable for other factors to bind [13, 51].

Figure 1.1: (A) Figure 1 from Klemm et al [13], which shows the different accessibility states of chromatin. In the "closed" state, chromatin is inaccessible due to being tightly bound to histones. In the "permissive" state, TFs are able to bind by removing nucleosomes or shifting them around, and in the "open" state, TFs and other transcriptional activators such as Pol II can bind to the DNA [13]. Pol II, RNA polymerase II; TF, transcription factor. (B-D) Adapted from Klemm et al's Figure 2 [13]. (B) Overview of DNase-Seq: openly accessible DNA is randomly cut by the endonuclease DNase I. If a TF or another factor, such as a nucleosome, is bound, the cutting will happen at the edge of the steric hindrance. After cutting, the samples have adapters ligated and are sent for sequencing [13]. (C) Overview of MACC, in which MNase is used to fragment open chromatin but is sterically hindered by nucleosomes and TFs. (D) Overview of ATAC-Seq, where Tn5 cuts and adds a sequencing barcode at accessible chromatin regions. Tn5 is hindered by TFs or nucleosomes, and after cutting, sequencing libraries are prepared [13, 28].

1.3.1 Enzymatic Cleavage of Accessible Chromatin

Chromatin accessibility first came to light in the early 1970s, when chromatin was fragmented and specific regions of the genome were shown to have fewer nucleosomes [13, 75]. This work and other work throughout the 1970s using
DNA endonucleases [20, 76] were the first examples of a potentially important feature of chromatin [13]. The method at the time was to use Southern blots to determine the regions of interest which were more readily cut by a DNA endonuclease [13]. Although subsequent work on chromatin accessibility was completed from the late 1970s onward using Southern blots, the method was limited in the amount of sequence and genome-wide information that could be obtained. The next big breakthrough in the field was the advent of DNase-Seq in 2006 [77, 78]. The new method, published in July of 2006 by Crawford et al [77] and Sabo et al [78], created a new way to understand these accessible regions by using the enzyme DNase I, which cleaves DNA in an unbiased way [77, 78]. Crawford et al and Sabo et al combined this enzyme with the then-newer technique of microarrays, which allowed up to 1% of the genome sequence to be tested for the presence of accessible regions [77, 78]. The technique exposed nuclei from different cell types to different concentrations of DNase I, added a biotinylated linker to the resulting fragments for enrichment, amplified the fragments using polymerase chain reaction (PCR), and then loaded the samples onto a customized microarray where fragments with sequence homology would be recognized [77]. The use of multiple concentrations of DNase I was important because regions differ in their state of accessibility [77]. Some of the accessible regions were more readily cut by DNase I, indicating a hypersensitivity which could be correlated with higher expression or less regulatory machinery nearby. This technique has been extremely impactful on the field and has paved the way for the newer techniques widely used today.
However, the next leap for the technique happened with the advent of Next-generation sequencing, when DNase-Seq was adapted from microarrays to full genome sequencing by Boyle et al in 2008, as shown in figure 1.1b [79]. Although Boyle's method was able to generate high-throughput sequencing results, the protocol was very complicated and produced a lot of background [13,79]. Another paper, published a year later by Hesselberth et al, addressed both the complexity of the protocol and the amount of background [13, 80]. Overall, the DNase-Seq method was able to correlate accessible chromatin with gene expression, with at least 80% of the results from DNase-Seq being around enhancers [13, 77–80]. A similar method to DNase-Seq is MNase accessibility (MACC), which uses micrococcal nuclease (MNase) instead of DNase I to fragment the DNA, as shown in figure 1.1c [13,82–84]. MACC produces results similar to DNase-Seq but is more highly enriched at H3K27ac sites [13]. Although the two methods agree across most regions, there are differences in the regions where gene expression ends [13]. Although DNase-Seq and MACC have been important techniques, arguably the most important breakthrough for chromatin accessibility has been the invention of the assay for transposase-accessible chromatin using sequencing (ATAC-Seq) in 2013 [28]. The ATAC-Seq method not only simplified the workflow but was also able to decrease the number of cells required from the millions to around 50,000 [28, 80]. The method was improved further in 2017 by Corces et al [81], also from the Greenleaf lab, to need only around 500 cells. ATAC-Seq utilizes a transposase enzyme, Tn5, which will be discussed in greater detail below. Briefly, Tn5 fragments the DNA of isolated nuclei and tags ("tagments") it with barcodes compatible with Next-Gen sequencing.
Tn5 also does not readily cut heterochromatin, as the DNA is too tightly packed around nucleosomes and other factors which maintain the heterochromatic state [28]. The ATAC-Seq protocol incubates live cells in a hypotonic buffer to isolate the cell nuclei, which can then be subjected to Tn5 fragmentation via insertion of a barcode through transposition. After the tagmentation reaction, the DNA can be purified and subsequently amplified via PCR before sequencing, as shown in figure 1.1d [28]. Beyond the decrease in cell number, another large breakthrough of this technique was how much more quickly it could be completed. In comparison to DNase-Seq or MACC, which take a few days of labor-intensive work, ATAC-Seq can be completed in a few hours [28, 80]. This has since allowed the technique to be used on a very wide scope and to provide chromatin accessibility data for a wide range of cells. The general protocols for these three techniques can be seen in figure 1.1 b-d, which shows how the methods all share the same concept. All three cut accessible chromatin regions, but they use different enzymes and accomplish the cutting in different ways. Although DNase-Seq and MACC can be used, ATAC-Seq is the quickest method, uses the fewest cells, and produces some of the highest quality data [13, 28]. As a result, it is now the most used technique of the three and has been continually scaled down to the single-cell level (discussed below). These assays will continue to be widely used and important for the understanding of gene expression. However, a major downside of ATAC-Seq is the necessity of using live cells [28], which makes the assay challenging for clinical samples or other hard-to-obtain samples. As a result of this limitation, the assay was not used in the studies outlined here; instead, a deeper understanding of H3K27ac was obtained using ChIP-Seq and HiChIP (discussed below).
1.3.2 Transcription Factors and Chromatin Remodeling

The power of TFs to bind to DNA and help regulate gene expression is a large component of their function, and around 90% of the regions where TFs bind are in accessible regions [27]. There are over 55 different families of TF binding proteins in eukaryotes, which include a variety of factors important for both chromatin structure and gene expression [4,85]. Some of the most important factors are those which help with chromatin remodeling, such as a family which helps to remove nucleosomes which have already been wrapped around DNA [4, 86]. Although this might seem like a somewhat superfluous action, it has far-reaching effects on structure and gene expression [4, 86]. The Snf2 class of proteins can recognize a variety of marks or other bound TFs and is then able to eject specific nucleosomes, allowing other factors to bind to the DNA (figure 1.2a) [4, 13, 86]. These proteins are also responsible for helping to assemble nucleosomes with proper spacing, for which Clapier et al provided an excellent review in 2017 [86]. There are four major models for chromatin accessibility in which TFs play an important role, and they will be explained briefly here [4, 13, 51, 86]. The first is a direct response to an Snf2 protein excising a nucleosome (figure 1.2b (1)). The removal of a nucleosome creates a more accessible region for other factors to bind, such as TFs which can regulate gene expression [4, 86]. The model described is likely a more simplistic version of what would happen in vivo; however, that does not mean it does not occur [4, 86]. The second model in figure 1.2b describes a case in which an architectural protein (AP) is bound to the DNA near the nucleosome. APs are generally considered to be proteins which bind to

Figure 1.2: (A) Adapted from Figures 1 and 5 of Clapier et al [86].
The Snf2 remodeling family goes to specific sites in chromatin and is able to perform many different functions. Shown here are the ability to assemble nucleosomes with proper spacing and order, the removal of nucleosomes to make the chromatin more accessible, and the editing of nucleosomes by the addition or removal of histone variants [86]. (B) Figure 5 from Klemm et al [13], showing four different models for chromatin remodeling (CR): (1) shows the chromatin access from (A), (2) shows the ability to remove a nucleosome and the subsequent addition of a TF, (3) shows the recruitment of a Snf2 family member to excise a nucleosome, and (4) shows a TF binding to a nucleosome and the subsequent recruitment of a Snf2 family member [13, 86]. TF, transcription factor; CR, chromatin remodeler; AP, architectural protein; SF, secondary TF.

DNA to provide a variety of functions, including acting as a place-holder to protect certain regions of the DNA from other TFs which might activate or repress gene expression [4]. These proteins likely compete passively with the activators or repressors and are an important regulatory feature of chromatin [4]. So, as shown in figure 1.2b (2), as the concentration of a TF is increased it becomes capable of replacing the AP, which can then recruit an Snf2 protein downstream to excise a nucleosome, allowing a co-factor or secondary TF (SF) to bind and change the accessibility and dynamics of the region of chromatin. The third model in figure 1.2b begins to bring in the 3D architecture of the genome and how gene activation or repression would occur in vivo. A TF already bound to the chromatin (TF’) recruits another TF, which is able to recruit a Snf2 protein to excise a nucleosome, resulting in an accessible region which is far away in the linear DNA but close in the 3D landscape [4, 86]. After the chromatin is accessible, a SF can bind in the region and form a complex, stabilizing a 3D loop or region of chromatin.
This regulation could be the formation of the transcription start machinery or various other regulatory mechanisms [4, 13, 51, 86]. The last model, shown in figure 1.2b (4), is the ability of TFs to bind DNA which is wrapped around a nucleosome [4]. In this case, similar to the others, the bound TF would recruit an Snf2 protein to excise the nucleosome, allowing a region of previously inaccessible chromatin to be bound by a SF. This model is very similar to the second model; the difference is that the TF binds directly to histone-wrapped DNA instead of outcompeting an AP [4]. In all of these models, TFs play an essential role in both chromatin accessibility and gene expression. Two important TFs, MEF2 and HDACs, will be described in great detail below and will be a major focus of this work. These factors are likely to follow the third model, as will be discussed later. However, these are only two examples of the numerous factors in the genome, and these regulatory mechanisms are just starting to be understood [4, 13, 51, 55, 87]. The accessibility of chromatin brings together many aspects of control for gene regulation. As TFs are bound, nucleosomes excised, and histones modified, there are various outcomes which are highly coordinated throughout complex life forms [4, 88]. It can be difficult to imagine all of these factors playing key roles, but the complexity of gene expression is vast and there are likely more regulatory controls yet to be discovered. As both the known control mechanisms and those yet to be discovered are understood, they could pave the way for understanding diseases better and, in turn, for generating new classes of therapeutics to control accessibility and gene regulation.
1.3.3 DNA Footprinting

Another powerful capability of DNase-Seq [80], and more recently of ATAC-Seq [28], is DNA footprinting, which determines which TFs or other factors are bound to the DNA [28, 72, 89, 90]. DNA footprinting is not a new concept and was first discussed in 1978 by Galas and Schmitz [89]. In their experiment, they conducted a DNase assay and then tested for the binding of the lac repressor to the lac operator [89]. The experiment showed the ability to visualize the sequences which were being bound by a TF [89]. Although the original 1978 technique relied on gel electrophoresis, today Next-gen sequencing can be used with the techniques mentioned above, such as DNase-Seq or ATAC-Seq [28,80,90]. The steric hindrance of a TF or nucleosome bound to the DNA prevents the selected enzyme from cutting the DNA near that site, meaning there will be a lack of sequencing reads at that location. A simplified sequencing readout is shown in figure 1.3, where there is a valley at the location of a bound TF or nucleosome. This has allowed for powerful bioinformatic analysis of the factors which are bound near highly accessible chromatin, helping to give a better understanding of how genes are regulated by CREs [53, 55, 87]. Recently, ATAC-Seq was applied to 32 different immune cell types by Calderon et al in 2019, which showcased not only the quality of the data but also the ability to run thousands of samples in a high-throughput and cost-effective manner [87]. Calderon et al were able to show differences in chromatin accessibility and correlate those data with gene-expression data to better understand how immune cells differentiate and which genes are important in that transition [87]. They were able to confirm their findings with other immune cell research completed using different techniques, which also highlights the power of chromatin accessibility as a tool.
Chromatin accessibility has also recently been shown to be important for understanding the dynamics of cancer in tumors through the Tumor Atlas Network [91]. These breakthrough studies help to show the importance of chromatin accessibility in gene regulation and the factors which help to control the mechanism of gene expression [13, 28, 53, 55, 72, 77–84, 87]. Combining these techniques with others discussed below could help to further understand how gene expression is happening and which factors are important. The complexities of gene regulation are extensive, and this section only begins to scratch the surface of the control mechanisms seen in humans and throughout life. Although DNA footprinting analysis can be completed with DNase-Seq and ATAC-Seq, Chromatin Immunoprecipitation followed by sequencing (ChIP-Seq) is one of the most powerful tools to determine TF binding and will be reviewed in detail below [27]. The use of ATAC-Seq for DNA footprinting can reduce the need for TF ChIP-Seq; however, the information is not as specific as with TF ChIP-Seq. The reason, illustrated in figure 1.3, is that multiple co-factors could form a complex in which only the direct DNA:TF interaction is determined. Although co-factors are often known, ChIP-Seq remains the preferred method for being certain about whether a TF is bound nearby to the DNA.

Figure 1.3: (A) A cartoon of a TF bound to a random string of DNA. (B) An endonuclease or other enzymatic fragmentation is not able to cut where there is a factor bound to the DNA. Due to this steric hindrance, when the results are sequenced, there is a depletion of sequencing reads where the factor was bound. (C) Where there is a depletion in the sequencing reads, the original TF binding sequence can be determined, and if the sequence is recognized by a known TF, that TF was likely bound at that position. TF, transcription factor.
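The read-depletion signature described in this section can be sketched as a simple scan over per-base cut counts. This is a minimal illustration only: the function name `find_footprints`, the window sizes, the depletion threshold, and the toy counts are all assumptions, and it is not the algorithm of any published footprinting tool.

```python
from typing import List, Tuple


def find_footprints(
    cuts: List[int],
    width: int = 4,
    flank: int = 4,
    max_ratio: float = 0.25,
) -> List[Tuple[int, int]]:
    """Return (start, end) windows whose mean cut count is depleted
    relative to the mean of the flanking positions on both sides."""
    hits = []
    for start in range(flank, len(cuts) - width - flank + 1):
        end = start + width
        center = sum(cuts[start:end]) / width
        left = sum(cuts[start - flank:start]) / flank
        right = sum(cuts[end:end + flank]) / flank
        flank_mean = (left + right) / 2
        # A footprint is a valley: low signal inside, high signal outside.
        if flank_mean > 0 and center <= max_ratio * flank_mean:
            hits.append((start, end))
    return hits


# Toy per-base cut counts (assumed): high signal with a valley at 6-9.
cuts = [9, 10, 11, 10, 9, 10, 1, 0, 1, 0, 10, 9, 11, 10, 9, 10]
footprints = find_footprints(cuts)
print(footprints)
```

The sequence under each reported window would then be compared against known TF binding motifs, which is the step that the footprint alone cannot resolve when several co-factors form a complex.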
1.4 The 3D Genome

The 3D genome refers to the structure of the genome as it would be in vivo. A common way to think of the genome is as a linear, or 2D, string of nucleotides in which pieces which are far apart on the line would not be close to each other. However, through work from Dekker and others, the 3D genome has been shown to bring regions of DNA tens of thousands of bases or further apart into close proximity when the DNA is condensed within a nucleus [51,54,70,92]. Methods such as chromosome conformation capture (3C) [54] and HiC [70] will be discussed in greater detail below, but they have opened up a whole new level of complexity and understanding of how the genome is regulated by regions which can be brought together in a 3D context.

1.4.1 Genome-wide Association Studies

To gain a deeper understanding of the 3D genome and the implications of the overall chromatin structure, it is important to understand that single nucleotide polymorphisms (SNPs) can have an impact on diseases [51, 93, 94]. The first genome-wide association study (GWAS) was completed by Ozaki et al in 2002, in which the authors genotyped over 1,100 people to determine whether certain SNPs would be a marker for myocardial infarction [95]. The goal of any GWAS is to try to determine how a disease is occurring, as well as the area in the genome that is responsible for it [51, 95–98]. The general readout from a GWAS is a set of specific loci in the genome which have a higher frequency of mutation compared to a control group [51, 95]. From there, genes which are proximal to those mutations are examined, and the mutations have been found to cluster around enhancers [97–99]. Although over one million GWAS SNPs have been found [99], the vast majority of the SNPs are in non-coding regions of the DNA and the hits have not been vastly successful in targeting nearby genes [51, 94].
Although there are many reported cases of successful targets generated from GWAS [94], with the advent of 3D technologies more studies are revealing that SNPs are likely interacting with long-range partners and causing mis-regulation of genes far away in linear distance [51, 53–56, 87].

1.4.2 Quantitative Trait Loci

As a more complex understanding of the 3D genome accumulates and new targets are generated from the numerous GWAS [94], looking at the regions around each SNP is essential. For this, quantitative trait loci (QTLs) are useful. In a general sense, QTLs are regions of the genome which are responsible for specific phenotypes [100]. The idea behind a QTL is that it is a genetic locus which is able to influence specific phenotypes, and a specific locus can affect numerous traits [100]. Amazingly, these QTLs are known in many cases to affect multiple sites [101], indicating there is some sort of 3D structure which brings multiple regions close to each other to regulate a specific cluster of genes [102, 103]. A more specific type of QTL is the expression quantitative trait locus (eQTL), a region which is correlated with mRNA levels in gene expression data [94, 103]. As RNA-Sequencing data have become higher quality, these linkages can be made and give information about the exact region which is being transcribed. To see whether these regions change in individuals with disease, GWAS SNPs can be checked for overlap with QTLs found in the data and linked to nearby QTLs [94, 104]. Figure 1.4 shows a basic example of eQTL analysis, in which a SNP within a specific genomic region can have an effect on gene expression [93, 103]. In the example shown in figure 1.4b, the potential SNP genotypes "AC" and "CA" drastically reduce the amount of expression of Gene A.
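A minimal sketch of the genotype-to-expression comparison in this example is given below. It regresses expression on the number of copies of the alternate allele by least squares, which is one simple way an eQTL effect can be summarized. The function names, the choice of "C" as the alternate allele, the inclusion of a "CC" genotype, and the expression values are all assumptions made for illustration and are not data from this work.

```python
from typing import List


def allele_dosage(genotype: str, alt: str = "C") -> int:
    """Count copies of the alternate allele in a two-letter genotype."""
    return genotype.upper().count(alt)


def eqtl_slope(genotypes: List[str], expression: List[float], alt: str = "C") -> float:
    """Least-squares slope of expression on alternate-allele dosage."""
    x = [allele_dosage(g, alt) for g in genotypes]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(expression) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("all samples share one genotype; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, expression))
    return sxy / sxx


# Toy data (assumed): expression of Gene A drops with each copy of "C".
genotypes = ["AA", "AA", "AC", "CA", "CC", "CC"]
expression = [10.0, 10.0, 6.0, 6.0, 2.0, 2.0]
slope = eqtl_slope(genotypes, expression)
print(slope)
```

A negative slope corresponds to the situation in figure 1.4b, where carrying the alternate allele lowers expression; a real analysis would also test whether the slope is statistically significant.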
If Gene A was shown to have altered expression levels in a disease in which there was a GWAS SNP nearby, the corresponding locus (eQTL) could be an explanation for the disease phenotype. Although this has been a powerful tool to determine which loci are impactful for GWAS SNPs, with about a third of some disease-specific GWAS SNPs overlapping directly with eQTLs, there are many which do not, as they are in non-coding regions [51, 56, 104]. In more recent studies, it has become apparent that these SNPs in non-coding regions are important in the 3D landscape and can influence genes which were previously unknown, as can be seen in figure 1.4c [56]. This leads again to the importance of the spatial proximity of key DNA elements within the nucleus. If an important regulatory element tens of thousands of base pairs away is altered in some way, the effect on gene expression could be important for disease [56, 72, 81, 87]. Other types of QTLs have been discovered in recent years, including chromatin QTLs (chQTLs or hQTLs), which use ChIP-Seq to determine whether changes in histone modifications within a specific locus are tied to altered gene expression [105]. The importance of QTL analysis is that it points the researcher toward the loci within the genome which have an impact on gene expression and toward determining how that difference occurs. Although it is not able to explain exactly how gene expression is being altered, it gives many clues, narrows down where researchers should be looking for potential therapeutics, and focuses research on the most impactful areas in the genome [94,100–103].

1.5 Chromatin Immunoprecipitation followed by Sequencing (ChIP-Seq)

1.5.1 ChIP in brief

Chromatin Immunoprecipitation (ChIP) has been a widely used method to facilitate the understanding of in vivo chromatin dynamics and TF landscapes over the last few decades [106–108].
Figure 1.4: (A) A cartoon of an arbitrary region within the genome. Gene A has a GWAS SNP nearby which could affect gene expression. (B) Adapted from Figure 1 of Nica et al [103], where different genotypes (shown as AA, AC, and CA) would have an effect on the expression levels of Gene A. If low expression was associated with disease, this could potentially explain the disease phenotype. (C) A cartoon depiction of how an eQTL can help to regulate genes which are far away in 2D space. The eQTL acts in cis with Gene B but also acts in trans with Gene A due to the 3D location of the eQTL. GWAS, Genome-wide Association Studies.

The main principle behind ChIP is to determine where a specific target, whether a histone modification or a TF, is binding on the DNA. This is accomplished by enriching a protein target with an antibody specific to that target and enriching the DNA:protein complex by washing away any DNA which is not bound by the antibody [106]. The first example of ChIP was completed by David Gilmour and John Lis in 1984, when they used ultraviolet (UV) irradiation to cross-link protein to DNA to determine the distribution of RNA polymerase in an inducible Escherichia coli (E. coli) lac operon [106]. The protocol included numerous steps, including sonication of the cells after fixation to fragments of about 600 base pairs (bp), the use of antibodies to a specific target, washing of the DNA bound by antibody, and subsequent purification of the DNA [106]. Through this newly developed protocol, they successfully determined binding positions of RNA polymerase by inducing specific regions of the lac operon to see which regions were enriched with their anti-RNA polymerase antibody [106]. This was the first example in which fixed in vivo TF-DNA complexes were enriched and quantified. Gilmour and Lis followed up the study the following year to further understand the binding of RNA polymerase II (RNA pol II) at heat shock genes in Drosophila melanogaster (D.
melanogaster) [107]. This was an important development, as they moved from a bacterial assay to a much more complex system in flies. Later that year, an important development still used in most ChIP assays was added to the emerging assay: instead of using UV irradiation for fixation, a chemical approach with formaldehyde was used to mediate crosslinking [109]. This protocol was optimized further by the same group, leading to a protocol very similar to the one used today. The big change was the addition of a RIPA buffer, still ubiquitous within the field today [110]. However, one of the issues with the protocol was the processing of samples after cell fixation, which involved around 72 hours of ultra-centrifugation [111]. A new protocol, developed in 1992, eliminated this step for most experiments by diluting the sample in other buffers [112]. This allowed the protocol to be completed more quickly and therefore to be used more widely. It also allowed laboratories which might not have had access to an ultra-centrifuge to complete the assay. Another big improvement came in 1996, when polymerase chain reaction (PCR) was used with the assay for the first time, allowing for the enrichment of the ChIP product [113]. The addition of PCR gave a significant increase in signal-to-noise, of about 10- to 20-fold over negative control samples [113]. Also, experimenters could then design PCR primers which overlapped specific regions of DNA to determine whether the ChIP signal was present at a specific locus. The method of ChIP signal enriched by PCR is still used today when full sequencing results are not desired. The next milestone for the assay was the development of ChIP-on-chip in 1999 [114]. This method utilized microarrays containing an array of DNA fragments, to which enriched ChIP DNA could be hybridized [114].
These microarrays helped determine what the fragments of the ChIP sample were, allowing for the first examples of a potential genome-wide analysis [114]. The big breakthrough with this adaptation of the ChIP assay was that it allowed customized microarrays to be developed. Although this method was costly, it gave an almost genome-wide analysis and was quickly adopted within the field over the next few years to study samples ranging from budding yeast to mammalian cells [114–119]. These data were a large leap forward in understanding the complex binding of TFs to multiple positions in vivo throughout the genome, specifically promoter regions, helping to determine important TFs in many cell processes [117, 119]. As a result of these pioneering studies, not only did the microarrays become higher resolution and more specific, but the field was also led into the sequencing age. Although ChIP-on-chip was successfully used, the next large leap in the technology, and the adaptation which made ChIP so widely used, came with the integration of high-throughput sequencing. Due to the overall success of ChIP-on-chip, it should not be a surprise that there was a concerted effort to use ChIP with high-throughput sequencing methods. In May of 2007, the Zhao group won the race to publish the new technique, termed chromatin immunoprecipitation followed by sequencing (ChIP-Seq) [122]. Their protocol for sequencing their histone ChIP DNA (H 3 K 2 me 3 ) was to complete end repair and adapter ligation on the ChIP DNA fragments, followed by PCR and finally sequencing to generate the genome-wide analysis [122]. Two other groups published similar protocols in June and August of that year, all using similar methods and all using the Illumina/Solexa system [122–124]. The overall protocol has had some variations since these methods were described, but the general workflow of ChIP-Seq has not been altered too drastically.
However, there have been some improvements, specifically to the library preparation of the samples.
Figure 1.5: (A) Adapted from Klein and Hainer’s Figure 2 [120], showing the workflow of ChIP-Seq and CUT&RUN. ChIP-Seq uses antibodies on fragmented chromatin to enrich for a specific TF or histone modification. CUT&RUN uses an MNase attached to Protein A (pA-MNase) to directly cut the chromatin where the specific antibody has already bound. (B) Adapted from Figure 2 of Hoffman et al [121], showing the mechanism of formaldehyde fixation. (C) Adapted from Figure 2 of Hoffman et al [121], depicting how fixation affects the DNA and binds protein to DNA.
Ever since the inception of ChIP-Seq in 2007, the general protocol has not been altered greatly. The protocol was briefly described above, and more detail can be found in figure 1.5a. Cells are first fixed with formaldehyde (1%) for around ten minutes, after which they are either snap-frozen in liquid nitrogen or used right away for a ChIP assay [109]. After fixation, the chromatin is fragmented using sonication or enzymatic digestion, such as with MNase or DNase I [125–127]. The optimal sonication conditions often depend on the target, such as a histone modification or a TF. Generally, for histone modification ChIP assays a fragment size of 150-500 bp is optimal, while TFs can vary but often require less sonication (300-750 bp) due to the risk of losing the target epitope with too much sonication [93, 128, 129]. A good practice with ChIP assays is to run time courses on the fragmentation method (enzymatic or mechanical) to determine the optimal conditions (figure 1.5a). After sonication, the immunoprecipitation (IP) can be completed with an antibody targeting a specific factor, bound to magnetic beads, which allows for DNA enrichment and the removal of weakly bound complexes after an overnight incubation with the sonicated sample [106, 122–124, 126].
Post IP, the samples can be amplified and sent for Next-Generation (NextGen) sequencing [122] (figure 1.5a). Afterwards, reads can be mapped to the appropriate genome, and the locations of the target marks can be determined by calling statistically significant peaks with programs such as Model-based Analysis of ChIP-Seq (MACS) [130] or Methylated DNA Immunoprecipitation (MEDIPs) [131].
1.5.2 Encyclopedia of DNA Elements
The importance of this assay has been highlighted by continued support from the National Institutes of Health (NIH) in the form of the Encyclopedia of DNA Elements (ENCODE) Project Consortium, which started its pilot studies in 2004 with the goal of finding and describing "all functional elements in the human genome sequence" [1]. The project is in its fourth phase, has been funded continuously since 2004, and has remained focused on the initial goal of understanding the functional elements throughout the genome. In the pilot phase of the project, a portion of the work specifically used ChIP-chip and ChIP assays to understand the transcriptional machinery. After three years, the first findings were published, with over 120 ChIP-chip data-sets made available on a wide range of TFs and histone modifications [132]. These data-sets were important in discovering new promoters and enhancers important to transcription [132]. The consortium also highlighted the importance of histone modifications for their ability to indicate that an active transcription start site (TSS) is nearby [132]. As a result of a successful phase I, the project was renewed in 2007 for over $80 million over four years [133]. One of the primary assays used in this phase of the project was ChIP-Seq. Although the project focused on ChIP-Seq work, only "13 of about 60 known histone modifications and 120 of about 1,800 [TFs]" have been completed [134]. As a result, much more work has to be completed to further understand the genome.
Although other methods are helping to understand these complexities, ChIP has remained an important technique and will continue to be in the years ahead [108, 135].
1.5.3 ChIPmentation
The integration of Illumina’s Tn5 tagmentation enzyme into the assay, with the development of ChIPmentation in 2015, helped to increase the signal-to-noise, reduce cell input requirements, and reduce the cost of the assay [136]. This method used the standard ChIP-Seq protocol but changed a step of the library preparation, increasing the signal-to-noise and allowing less starting material to be used when preparing a library for sequencing [129, 136]. Tn5 is used to insert a DNA barcode post ChIP and prior to any DNA purification. While the chromatin is still attached to the magnetic beads, Tn5 is added to the enriched ChIP sample, which is subsequently tagmented with a barcode. As a result, the tagmented sample can then be PCR amplified from very minimal input (as little as 100 pg) and sequenced [129, 136]. Previous sequencing methods using blunt-end ligation required large amounts of DNA input (> 15 ng). Due to this constraint, it was common for enriched ChIP-Seq DNA to have to be amplified by whole genome amplification prior to any sequencing amplification [52]. The extra step could increase the risk of PCR bias and the loss of minor peaks through the over-representation of more common peaks within a sample. As a result, removing the amplification prior to sequencing helped to improve signal-to-noise, and ChIPmentation has been my preferred ChIP-Seq method since [128, 129, 136].
1.5.4 ChIP-Seq Cell reduction to Single-cell and like methods
Even though the general protocol has not been altered much in the past decade, there have been efforts to micro-scale the assay and increase signal-to-noise, which have led to the development of new technologies. The first ChIP-Seq experiments mentioned above all used upwards of 20 million cells [122–124].
The large cell requirements were a limitation of the assay for use in clinical settings; however, there have been concerted efforts to reduce these numbers [52, 129, 136–138]. These methods and similar techniques have been successful in micro-scaling clinical samples down to the low thousands of cells [52, 129, 139], and have opened the door to studying both rarer cell populations and more samples. The main improvements have come from the better availability of specific antibodies as well as improved NextGen sequencing techniques. However, to fully understand the dynamics of chromatin or TFs within the cell, there is a need for single-cell resolution data. Bulk techniques still have the issue of averaging the signal over the whole population of cells included in the assay. Thus, if a population is not completely homogeneous, the cell-to-cell variability will likely not be captured [140, 141]. These issues were first brought to light by single-cell RNA-Seq (scRNA-Seq), which highlighted differences within samples that were previously thought to be homogeneous [140, 142–145]. At that point, chromatin and TF dynamics could not yet be probed at the single-cell level. The first example of single-cell ChIP-Seq (scChIP-Seq) used complex microfluidic devices in a technique called Drop-ChIP [145]. This groundbreaking technique used MNase to fragment the genome of a cell and then separated the fragments into individual droplets within a microfluidic device [145]. The DNA contained within each droplet was then barcoded with a unique sequence before the samples were pooled again to complete the IP in a bulk manner [145]. This allowed for the de-multiplexing of the fragments in post-sequencing computational analysis [145]. Drop-ChIP was the first example of scChIP-Seq and was a big leap forward, but due to its use of specialized microfluidic machinery, the technique has not been widely adopted.
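The de-multiplexing step described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each pooled read begins with a fixed-length cell barcode followed by the genomic insert; the function name, the barcode length, and the toy reads are hypothetical and are not taken from the Drop-ChIP protocol [145].

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def demultiplex(reads: Iterable[str], barcode_length: int) -> Dict[str, List[str]]:
    """Group pooled reads by their leading barcode (one barcode per cell).

    Assumption: the first `barcode_length` bases of each read are the
    cell barcode and the remainder is the genomic insert.
    """
    by_cell = defaultdict(list)
    for read in reads:
        barcode = read[:barcode_length]
        insert = read[barcode_length:]
        by_cell[barcode].append(insert)
    return dict(by_cell)


# Toy pooled reads from two hypothetical cells (barcodes AAAC and GGGT).
pooled_reads = ["AAAC" + "GATTACA", "GGGT" + "CCATTG", "AAAC" + "TTGACC"]
cells = demultiplex(pooled_reads, barcode_length=4)
print(cells)
```

Real pipelines would also need to tolerate sequencing errors within the barcode, which this sketch does not attempt.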
Since the development of Drop-ChIP, two notable scChIP-Seq approaches have been developed: an improvement of the CUT&RUN (and similarly CUT&TAG) assays, as well as a newer technique called simultaneous indexing and tagmentation-based ChIP-seq (itChIP-Seq), which are capable of going down to single-cell assays [146–149]. These protocols have a big advantage over Drop-ChIP in that they only require a cell sorting machine rather than more complex microfluidics, allowing the protocols to be more widely used. Although these techniques reach the single-cell level, the ability to probe TFs is still challenging. This is likely due to the limited abundance of the DNA being enriched in the assay from a single cell, which increases the background noise. This is a fundamental issue with the single-cell (sc) techniques, in that the overall signal captured from single cells is low [145, 149]. To overcome this issue, further improvements to the assay are needed in order to capture all of the intended interactions. The CUT&RUN methods have two other distinct disadvantages at this time. The first is having to use living cells [146]. The inability to use formaldehyde fixation limits the use of the technique with clinical samples, due to the long time necessary to collect the samples. For larger cohorts, collection can take many months and up to years, meaning the samples have to be processed at completely different times or a method of preservation will need to be added to the protocol. Thus, methods that allow for cell fixation have the advantage that all sample preparation can be completed at a given time, reducing batch effects. The second issue, which could affect the data quality of the techniques and lies at the core of the assay, is the unknown effect of removing DNA fragments from the hypotonic nucleus.
Although the assay is completed at near 0 °C in the case of CUT&RUN [146], the chromatin will likely be negatively affected as a result. The problem would be increased with the use of CUT&TAG [148], where the cutting of the genome is completed at 37 °C. Although the data quality is fairly high, even at around 200 cells, there are likely to be background effects caused by this issue. The other issue currently affecting the scChIP methods is the lack of complete coverage of the genome. This has affected most sc methods, including scRNA-Seq, scATAC-Seq, CUT&RUN, itChIP, and others [28, 140, 146, 149]. The coverage from these methods is drastically lower than that of bulk data, indicating a fundamental limitation in that the techniques only probe specific areas of the genome. Once these areas are perturbed, this likely causes a secondary effect of losing other factors. One of the best current methods to overcome this issue is to test numerous cells and combine the data to match up with bulk data [149]. This tactic has been able to recover most of the bulk data but shows the variability between cells to be fairly high. It is more likely that the methods have not yet been perfected than that the cells have such large variation, although this leaves open the question of how heterogeneous the cells within a large population are. Although the protocols have come very far since the first example in 1984 by Gilmore and Lis, there is still a lot of work needed to understand the complex function of the dynamic genome. As such, it is important to continue to develop and improve methods. As we begin to better understand how the genome is organized, we can understand more fully how cells regulate genes and, with enough time, uncover how to prevent and treat disease.
1.5.5 ChIP Current limitations
Since its advent in the mid 1980s, ChIP and now ChIP-Seq have been used to better understand the genome and how genes are regulated.
Although the technique has stood the test of time, there are a few problems associated with the assay, some of which are generally ignored by the field. There are three distinct issues with current protocols: the number of cells required, the use of formaldehyde for cell fixation, and the more general issue of the use of antibodies. Newly developed methods have ameliorated one or more of these issues, but none has reduced or eliminated all three. The first limitation, which has received the most work and was discussed above, is the number of cells required for the assay. As mentioned, there are single-cell methods for ChIP-Seq, but the inability to use formaldehyde-fixed cells has continued to be a detriment to the wide-scale clinical use of these techniques [146, 148, 149]. Other techniques that can use formaldehyde-fixed cells, such as Drop-ChIP, rely on complex microfluidic devices, making widespread use challenging [147]. Although obtaining sc-level data is important, there is another limiting factor to the cell usage in ChIP-Seq: the ability to sonicate a limited number of cells. Most sonicators require a few hundred thousand cells for optimal sonication, so even for microscaled ChIP-Seq [128, 129] with < 100,000 cells, the sample is usually a dilution of a sonication of 300,000 cells. This means that even for a 10,000 cell sample, the chromatin being used comes from an ensemble of a much larger number of cells, and the ChIP-Seq sample is a dilution of that larger batch. In order to achieve a true micro-scaled ChIP-Seq experiment with limited, fixed cells, methods such as itChIP, which first tagment and barcode a small sample of cells, are ideal [149]. There are still some issues with methods that use enzymatic fragmentation, as there would be more bias towards either specific sequences or open chromatin regions [28, 149].
So, while methods are continually improving, it is important to weigh the pros and cons of each method before completing a specific type of ChIP-Seq. If the number of cells is not a concern, sonication would be the best, least biased approach, but ensemble batch effects are then more prevalent. As the methods improve, it is likely only a matter of time before sc ChIP-Seq on fixed cells is possible with very high resolution data. A second limitation, which is not discussed much in the field, is the use of antibodies. It has been estimated that around $800 million is wasted globally on inconsistent antibodies in biological research per year, with the United States contributing about half of the total [150, 151]. Part of this issue was shown in a 2008 study conducted by Berglund et al, in which over half of 6,000 commonly used antibodies failed to target only the specific target listed [150, 152]. Strikingly, a study attempting to replicate 53 preclinical studies which used various antibodies was only successful in 6 [153]. Although these studies did not all involve ChIP-Seq, the inconsistency is a prevailing issue, especially with TFs [151]. This issue is likely caused both by the manufacturing method of bleeding mice and by formaldehyde fixation altering the antibody binding motif. The other limitation, described as a black box by Gavrilov et al, is the use of formaldehyde to fix cells [154]. Gavrilov et al give a few examples of how this problem affects all of the ChIP and long-range chromatin interaction procedures, and their work will be summarized here [154]. The mechanism of formaldehyde fixation in chromatin is known and can be seen in figure 1.5b-c [155]. Formaldehyde reacts with amino groups from proteins and DNA bases to form an intermediate, which then reacts with a second amino group to condense and fix the cells [111, 155].
Fixation is more of an issue for the crosslinking of TFs to DNA than for histones to DNA [154]. As discussed above, the antibody issue is likewise more pronounced with TFs. For a TF to be fixed onto DNA, there has to be an available amino group near the DNA which is accessible to formaldehyde and close enough to form a fixed product (figure 1.5b-c) [154, 155]. There is evidence of issues with TFs being fixed to DNA in vivo, which can result in incorrect conclusions [156, 157]. These types of studies could have far-reaching effects for the 3D-centric methods which will be discussed below. If the fixation procedure is able to alter the structure, then some of the information gained from these assays could be false positives [156]. The fixation issue is also likely a reason why TF ChIP-Seq is harder to complete than histone modification ChIP-Seq [129]. Certain factors, such as CTCF and cohesin, seem to be fixed readily, which is likely due to their accessibility for fixation, and this could help to explain why some TFs are very challenging to complete ChIP on [53, 93]. The problem could be a lack of the target TF being fixed rather than an issue with the antibody. If this is the case, then other methods of fixation will have to be utilized in order to capture the TFs which are not able to be fixed to DNA [154]. Alternatively, methods which do not require fixation could be used; a potential method is discussed below.
1.6 Chromosome Capture Methods
ChIP-Seq has been an important technique for understanding where histone modifications and TFs are located within a 2D space; however, it lacks the ability to show any 3D information. The 3D context is essential when determining how gene regulation occurs, because enhancers and promoters do not necessarily affect only the genes which are close in a 2D manner [53, 54, 70, 92, 158–161].
As more research has gone into understanding the 3D interactions within the nucleus, a deeper understanding of how gene regulation happens has emerged [93]. In Schmiedel et al, we showed using 4C that a single nucleotide polymorphism (SNP) altered a binding site for a known 3D mediator, CTCF, which caused gene dysregulation in asthma [93]. To give a deeper understanding of the 3D techniques and how they have been built upon over the years, a summary is provided here. The first and simplest method was developed in 2002 by Dekker et al and was termed chromosome conformation capture, or 3C. The concept behind the experiment was to find out which regions of DNA were close together in the 3D landscape. To do this, Dekker et al devised a clever method which used formaldehyde-fixed cells and digested the nuclear DNA with a restriction enzyme (EcoRI), which recognizes a sequence of six base pairs (GAATTC) and cuts the DNA. Since the cut site of the enzyme was palindromic, two pieces of DNA with EcoRI cut sites that were spatially nearby could be ligated together via intramolecular ligation [54]. Afterwards, a specific region of the genome would be tested with qPCR to see whether the two regions of DNA could form a product using various PCR primers (figure 1.6). There were many downsides to the method, the main one being that there had to be prior evidence that the two regions were in close contact, and specific primers had to be created for that purpose. 3C is commonly described as a one versus one reaction, in that one locus of the genome is probed for one specific contact. The method was published in 2002, before the advent of very high-throughput sequencing, but it was one of the first examples of probing which long-range DNA regions were close to each other within the nucleus and has paved the way for the 3D genome work completed today [54].
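The properties of the EcoRI recognition site described above can be checked with a short sketch. It confirms that GAATTC equals its own reverse complement (the sense in which the site is palindromic), locates sites in a toy sequence, and estimates how often a six base pair site is expected to occur. The spacing estimate assumes that all four bases are equally likely and independent, which real genomes do not satisfy; the function names and the toy sequence are illustrative only.

```python
from typing import List

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same on both strands."""
    return site.upper() == reverse_complement(site)


def find_sites(sequence: str, site: str) -> List[int]:
    """Return the 0-based start positions of every occurrence of the site."""
    sequence, site = sequence.upper(), site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def expected_site_spacing(site: str) -> int:
    """Expected bp between sites, assuming uniform, independent bases."""
    return 4 ** len(site)


ECORI = "GAATTC"
toy_sequence = "ATGAATTCGGCTAGAATTCTT"  # hypothetical sequence with two sites

print(is_palindromic(ECORI))            # True
print(find_sites(toy_sequence, ECORI))  # [2, 13]
print(expected_site_spacing(ECORI))     # 4096
```

Under the same assumption, a four base pair site would be expected roughly every 256 bp, which is why shorter recognition sites give more frequent cuts.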
Four years after 3C, two groups simultaneously published almost identical improvements to 3C, termed circular chromosome conformation capture (4C) [158, 159]. These methods expanded on the original 3C idea and allowed for the use of microarrays or sequencing to determine the DNA near a specific target (figure 1.6). The expansion made 3C a one versus all reaction, such that all the interactions found at a specific region could be determined [158, 159]. The methods were very similar, but Zhao et al used a restriction enzyme with a more frequent palindromic cut site (TTAA) in order to capture more interactions of a specific region [159]. They allowed the circular ligation to occur as in 3C but designed primers near a known target region to complete an inverse PCR [54, 159]. This was a clever idea which allowed any DNA region near the target to be identified, since the 3’ and 5’ ends would be from the known region and all the DNA in the middle would be from a sequence that was spatially nearby.
Figure 1.6: Adapted from Wei et al Figure 1 [162], showing the differences and similarities between 3D chromosome capture methods. 3C, 4C, 5C, HiC, and HiChIP all use proximity ligation as the first step in the reaction. 3C then uses known targets to enrich a specific region of DNA before determination via sequencing or other methods, generating a one versus one data set [54]. 4C adds a secondary cutting and re-ligation event to circularize the DNA fragment. Afterwards, a primer targeting a known region amplifies the unknown proximal DNA before analysis through microarray or sequencing to generate a one versus all data set [158, 159]. 5C used randomized primers to amplify all regions, generating a many versus many data set [160]. HiC was the first example of an all versus all data-set, in which the ligation products were enriched via biotin and all of the products were sequenced [70].
ChIA-PET was the first example of an enriched HiC, which completed a ChIP before proximity ligation and subsequent sequencing, generating a ChIP-enriched all versus all data set [161]. HiChIP completed the general HiC protocol but enriched the sample via ChIP prior to biotin pull down and subsequent sequencing, generating similar data to ChIA-PET with fewer cells and higher quality data [53].
Simonis et al took a different approach and used the restriction enzyme HindIII, which cuts more infrequently because its palindromic recognition site is longer (AAGCTT). This allowed the tails of the cut regions to be larger for the ligation of nearby DNA sequences [158]. Since the fragment sizes were larger, they completed a secondary digestion with DpnII, which has a much shorter recognition site (GATC), so that the fragments could then be circularized and inverse PCR could occur as in Zhao et al (figure 1.6) [158, 159]. These two methods were developed independently and accomplished the same idea in slightly different ways. They were a large improvement over 3C because of the ability to obtain all the contact regions around a specific target [158, 159]. The methods were a large step forward in the chromosome capture techniques and are still used. Also in 2006, Dostie et al, with Dekker as the corresponding author, developed an improvement of 3C termed chromosome conformation capture carbon copy (5C), which was designed to detect all of the interactions in a large region (> 400 kb) [160]. Although the method did not capture all the interactions, 5C drastically increased the amount of coverage over previous methods. The protocol added a step which used a multiplexed primer system to generate a library of all the 3C interactions in the reaction [160]. This allowed the products to be amplified further with a secondary set of primers and used in either high-throughput sequencing or on microarrays (figure 1.6) [160].
Although much more information was obtained using this method, which was generally termed many versus many, the 5C primers used prior to amplification for sequencing were costly and the data could be challenging to interpret. Although 3C was the most popular method, 4C and 5C were still widely used and helped to further understand gene regulation and general chromosome structure [54, 159, 160]. However, the most popular method for understanding chromosome structure, Hi-C, was published in 2009 by Lieberman-Aiden et al [70]. The method was once again developed under the supervision of Dekker and was successful in expanding upon the 5C many versus many concept with the first example of capturing all versus all interactions [70]. The method followed Simonis’ approach of using HindIII as the restriction enzyme, but instead of DpnII it used sonication after adding a biotinylated dNTP to the sticky-end ligation [70, 158]. This allowed only regions which were ligated back together after restriction enzyme cutting to be enriched [70]. After this enrichment, primers were added to the ends of the DNA molecules, which were then sequenced in a high-throughput manner (figure 1.6) [70]. The ability to sequence a vast number of fragments through the new sequencing methods allowed HiC not only to be developed but also to be widely used in the field. The Lin Chen lab developed an analogous method in 2012, termed tethered chromosome conformation capture, which first biotinylated protein prior to ligation in order to reduce the chance of non-specific ligation [92]. These methods have been extremely powerful in their ability to find which regions of the genome are in close contact with each other, which has allowed for a much deeper understanding of not only how gene regulation happens, but also how the nucleus is able to condense DNA.
Although there have been improvements to the method over the years (down to the sc level [163]), the resolution of the assay has been a constant issue. Despite some of the downsides to the assay, it remains a useful and widely used technique to probe genome structure.
1.6.1 ChIA-PET and HiChIP
The amount of data generated in HiC techniques is large and often requires very high sequencing depth [70]. Although all versus all interactions are important, researchers are often interested in how specific regions of the genome are structured and do not need all of the information provided by HiC. In such cases, the interactions associated with a specific TF or histone modification could be all that is needed, and so a method which attempted to combine ChIP-Seq and HiC was developed and termed chromatin interaction analysis by paired-end tag sequencing (ChIA-PET) [161]. The method was published a month after HiC and used a very similar technique, but instead of enzymatic fragmentation, sonication was used and linkers were ligated onto the ends of the sonicated fragments. From there, the fragments were ligated together and ChIP-Seq was subsequently completed (figure 1.6) [161]. The method successfully captured interactions using a variety of histone modifications and TFs but required a very large number of cells (> 20 million) as well as deep sequencing [161]. The method has been used but is technically challenging, and it was recently developed further down to the sc level [164]. However, even the newest method relies on highly sophisticated microfluidic devices and remains technically challenging [164].
Although ChIA-PET was the first example of highly enriched 3D ChIP, a method developed by Mumbach et al in 2016, termed HiChIP, has quickly been adopted as a better alternative [53].
The HiChIP method was published at the same time as an analogous method called Proximity Ligation-Assisted ChIP-seq (PLAC-Seq) [165]; the difference lies in the library preparation, where HiChIP used a Tn5-based approach [53] while PLAC-Seq used blunt-end adapter addition. For simplicity, HiChIP will be used to refer to both methods throughout this manuscript. Overall, the methods followed the same concept as ChIA-PET but did so in a more efficient manner by first completing the usual HiC protocol ligation. However, instead of using biotin to enrich the DNA, a ChIP was completed (figure 1.6) [53, 165]. The result was that fewer cells could be used (> 1 million) and a much higher quality data-set was obtained, as judged by the overlap of ChIP-Seq peaks with HiChIP compared to ChIA-PET [53, 161]. The method has shifted the way in which gene expression is understood, as the important 3D interactions helping to guide transcription have been discovered [53, 55, 72, 87]. These important studies have shown how even GWAS SNPs can interact with regions very far away to help cause pathological phenotypes [87]. The HiChIP method was used here to help further understand the transcriptional differences within HCM.
1.7 Tn5 transposase
Transposases (tnp) were first discovered in 1950 by McClintock in maize [166]. She was able to show that maize could restore gene activation through the removal of specific regions of DNA [166]. Interestingly, she also noticed that the insertion of the DNA increased the frequency of mutation, indicating a potential mechanism to help restore mutated genes [166]. This cut and paste mechanism, where a piece of DNA is removed and inserted somewhere else, was termed a transposition event [166]. Tnps have subsequently been studied in great detail, and there is evidence they are some of the most common genes throughout life [167].
The reason for this is not known, but it is thought that, since tnps have the ability to cut out genes and insert them in other places, this type of event could help with the diversification of life and with copying genes from one organism to another [167]. Although there are numerous versions of tnps throughout the tree of life, a specific family from Shewanella and E. coli named Tn5 [168] will be a main focus of this work, both in NextGen sequencing approaches and in the technology development. One of the most impactful scientists for the development and wide use of Tn5 today is Reznikoff, who seems to have had a hand in every major discovery of Tn5 from the early 1980s until the mid 2000s. These discoveries will be briefly discussed here, from the early discovery of Tn5 to the hyperactive version which is used in many techniques, including library preparations from Illumina (Nextera) and molecular techniques such as ATAC-Seq [28]. Since the discovery of tnps in 1950, the general mechanism behind tnps has been elucidated and is widely conserved between various families [169]. Although some of the mechanisms vary between families, the mechanism for Tn5 is very much cut and paste, where specific DNA sequences are recognized, cut from the genome, and then inserted at another, random position (figure 1.7b). The mechanism of Tn5 transposition is discussed in greater detail below.
1.7.1 Tn5 Mechanism
Tn5 has been known since 1975, when it was first discovered as part of another study looking at kanamycin resistance [168, 170]. The study showed that kanamycin resistance could be transferred to another genome via transposition [170]. Further work by others showed this phenomenon was consistent with bleomycin and streptomycin resistance [168, 171, 172].
These two impactful studies by Auerswald et al [171] and Mazodier et al [172] showed that the transposition events had conserved sequences, subsequently termed IS50L and IS50R, which flanked the resistance genes [168, 169]. Although these sequences were highly conserved, they differed in a few positions (figure 1.7a). These sequences are essential for the enzyme to be active, although the binding sequence has since been optimized into what was termed the mosaic end (ME) sequence (figure 1.7a) [173]. The optimization was completed by combining the IS50L and IS50R sequences to generate the highest affinity binding motif for Tn5 [173] (figure 1.7a). The generalized mechanism can be seen in figure 1.7b and is explained here. The IS50R and IS50L sequences will be referred to as the ME sequence, both for simplicity and because the ME sequence has been shown to bind more readily and is the preferred sequence to use when creating the Tn5 dimer [173]. The first step of the mechanism is for a Tn5 monomer to bind to the ME sequence. Tn5 can bind other sequences similar to the ME using about 40 amino acid residues (26-65), but their lower affinity makes dimer formation much less stable [168, 174]. As a result of the lower stability, complexes which do not bind an ME sequence are not able to perform transposition [168, 174]. The next stage, Tn5 dimerization, is complex and involves conformational changes within the protein [169, 175–178]. Without a bound ME sequence, the N-terminus and C-terminus of the enzyme have been shown to contact each other, which in turn blocks Tn5 dimerization [168, 175]. Gradman and Reznikoff showed in 2008 that with an N-terminal deletion, dimerization was not able to occur, showing that conformational changes have to occur both in the binding of the ME sequence and in the subsequent dimerization [175, 178].
So, although the dimerization is shown as a simple mechanism in figure 1.7b, many biochemical changes have to occur for this process to take place.

Similarly to the dimerization step, the DNA excision step is complex and involves a few different steps. One of the most important molecules in this step is Mg2+, as it is required for the active site to perform the excision reaction [168]. Excision is a multi-step process in which the enzyme uses hydrolysis reactions to complete sequential single strand breaks, generating the double strand break needed for excision through a hairpin intermediate [168, 175, 179–182]. Once the DNA has been excised, the insertion process goes through a similar mechanism of single and subsequent double strand DNA breaks, but the enzyme utilizes a 3’ hydroxyl attack of the hairpin intermediate to insert the ME sequence into the target DNA [168, 182]. After the insertion event, there is a remaining nick in the phosphate backbone of the DNA which can subsequently be repaired by mechanisms in the cell [168]. This mechanism took over 20 years of study to work out, but it has given insight into both how Tn5 works and which alterations could be made to improve or decrease the efficiency of the enzyme.

Figure 1.7: (A) The transposon structure for Tn5 with the IS50L and IS50R sequences and the combined ME sequence. The bold sequences mark the differences between IS50L and IS50R, and the ME sequence is color coordinated to show whether each position came from IS50L (green) or IS50R (yellow). This is adapted from Figure 1 (Reznikoff 2008) [168, 173]. (B) The general mechanism for Tn5, from binding through transposon transfer. A Tn5 monomer binds to the ME sequence and then forms a dimer with another bound Tn5 monomer. After dimerization, the transposon is excised, forming a new complex of the Tn5 dimer bound to the transposon. The complex can then find another piece of DNA (DNA B) to insert the transposon into.
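The cut-and-paste logic described in this section can be sketched as a toy string model. This is only an illustration under stated assumptions, not a model of the enzyme: the 19-bp ME sequence is taken from the Tn5MErev primer listed in the Materials section, while the cargo sequence, the flanking bases, the target DNA, and the function names are hypothetical, and the biochemistry (dimerization, Mg2+, the hairpin intermediate, nick repair) is ignored entirely.

```python
import random

# 19-bp mosaic end (ME), as it appears in the Tn5MErev primer of this thesis.
ME = "CTGTCTCTTATACACATCT"


def revcomp(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def excise(donor, me=ME):
    """Cut out the segment flanked by an ME and an inverted ME.

    Returns (transposon, donor_without_transposon); the transposon is None
    if no ME-flanked segment is found (toy stand-in for 'no transposition').
    """
    left = donor.find(me)
    if left == -1:
        return None, donor
    right = donor.find(revcomp(me), left + len(me))
    if right == -1:
        return None, donor
    end = right + len(me)
    return donor[left:end], donor[:left] + donor[end:]


def insert(target, transposon, rng):
    """Paste the transposon into the target at a random position."""
    pos = rng.randrange(len(target) + 1)
    return target[:pos] + transposon + target[pos:], pos


rng = random.Random(0)            # fixed seed so the sketch is repeatable
cargo = "ATGAAACCC"               # hypothetical cargo between the two MEs
donor = "GGGG" + ME + cargo + revcomp(ME) + "TTTT"   # hypothetical DNA A
target = "ACGTACGTACGT"           # hypothetical DNA B

transposon, donor_after = excise(donor)
target_after, pos = insert(target, transposon, rng)
print(donor_after, pos, target_after)
```

The sketch keeps only the two properties emphasized in the text: recognition depends on the ME sequences at both ends, and the insertion position in the target is random.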
1.7.2 Tn5 Hyperactivity

While the mechanism was being worked out, a number of modifications were made to the enzyme to increase its activity. One of the methods to increase the activity of Tn5 was briefly discussed above and dealt with the ME sequence. Once the ME sequence was discovered, it was shown to greatly increase the activity of Tn5 insertion [168, 169, 173]. There have been many studies mutating Tn5 to determine the effect of specific changes, which are listed in table 2. Although many of the listed mutations were made to determine the mechanism of Tn5, a few resulted in hyperactivity of Tn5. The general studies will be summarized briefly here.

In 2007, Vaezeslami, Sterling, and Reznikoff published a study which looked at the formation of the Tn5 dimer, in which altering specific amino acids reduced the ability of Tn5 to dimerize. With these mutations, they were able to hinder the activity of the enzyme [178]. While this was useful in determining mechanistic traits of the enzyme, it did not help to increase the ability for transposition [178]. Also in 2007, Reznikoff’s group published another work on the general reduction of Tn5 activity to better understand the dimerization process [183]. In 2008, Klenchin et al also identified mutations which helped to clarify the overall transposition mechanism because they diminished the enzymatic activity [184]. They made a few mutations which were implicated in the strand transfer mechanism, but none of them increased the activity of the enzyme [184].

Although the aforementioned studies worked on the mechanism of Tn5 transposition, an important study completed in 1998 by Goryshin, Yu, and Reznikoff included three mutated versions of Tn5 which are still used today [168, 185, 186]. These are M56A, E54K, and L372P, and they each act on a different aspect of the Tn5 mechanism.
M56A blocked the synthesis of a natural Tn5 inhibitor, Inh [185]. Although this is an indirect increase of activity, it allowed more transposition events to occur. The next mutation, E54K, had to do with the binding of Tn5 to the ME sequence [185]. By increasing the ability of Tn5 to bind to the ME sequence, the efficiency of the reaction was increased and more events were possible [185]. The last mutation was L372P, which increased the ability of Tn5 to form dimers [185]. Although each mutant individually increased the number of transposition events dramatically, the increase was still less than about 5-10 fold [185]. However, when the mutants were combined, the transposition events increased by at least 10^3-fold over the standard Tn5 [185].

Tn5 has become an extremely useful protein since the work of Reznikoff and others from the mid 1980s. Its use has become widely known and accepted since the advent of ATAC-Seq and Illumina’s Nextera library preparation kits [28]. New technologies have been driven by Tn5, and it has become a staple enzyme in molecular biology, especially since enzyme purified in the lab has been demonstrated to be of as high quality as industrially produced enzyme [186]. Its ability to provide faster and more cost effective library preparation and to determine chromatin accessibility will keep it useful in the coming years. With newer technologies such as improved Omni-ATAC [81], CoBATCH [187], CUT&TAG [148], and itChIP [149], improvements on previous techniques will continue to occur.

Mutation | Mutation Function | Citation
M56A | Blocks the synthesis of Inh (a natural inhibitor of Tn5) | Goryshin; Yu; and Reznikoff (1998)
E54K | Hyperactivity by enhancing the binding of ME | Goryshin; Yu; and Reznikoff (1998)
L372P | Hyperactivity by altering the dimerization potential of Tn5 | Goryshin; Yu; and Reznikoff (1998)
R189C | Increased target preference by about 4 fold. Reduces strand transfer by 2.5 fold | Adams et al (2007)
K212M | Decreases the target insertion specificity and reduces transposition efficiency more than 4 fold | Adams et al (2007)
K249A | Slight reduction in transposition efficiency (reproducible) | Adams et al (2007)
R250A | Slight reduction in transposition efficiency (reproducible) | Adams et al (2007)
R104A | Slight reduction in transposition efficiency (reproducible). Reduces strand transfer by 2.5 fold | Adams et al (2007)
Q118A | Slight reduction in transposition efficiency (reproducible) | Adams et al (2007)
K160A | Reduced transposition efficiency more than 4 fold | Adams et al (2007)
K164A | Greatly reduced transposition efficiency by about 75 fold | Adams et al (2007)
R210A | Greatly reduced transposition efficiency by about 30 fold | Adams et al (2007)
H213A | Reduced transposition efficiency more than 4 fold | Adams et al (2007)
R342A | Completely impaired dimerization complex formation | Vaezeslami; Sterling; and Reznikoff (2007)
E344A | Enhanced hyperactivity. This is because it is able to form dimerization products | Vaezeslami; Sterling; and Reznikoff (2007)
N348A | Completely impaired dimerization complex formation | Vaezeslami; Sterling; and Reznikoff (2007)
S438A | Completely impaired dimerization complex formation | Vaezeslami; Sterling; and Reznikoff (2007)
K439A | Completely impaired dimerization complex formation | Vaezeslami; Sterling; and Reznikoff (2007)
S445A | Completely impaired dimerization complex formation | Vaezeslami; Sterling; and Reznikoff (2007)
R210A | Helps with the 3’ non-transferred strand of DNA | Klenchin et al (2008)
Y319 | Helps with the 3’ non-transferred strand of DNA | Klenchin et al (2008)
R322 | Helps with the 3’ non-transferred strand of DNA | Klenchin et al (2008)

Table 2: List of Tn5 Mutations
• This is not an exhaustive list of the mutations made on Tn5; the entries are used to highlight how it would be possible to either increase or decrease the activity of Tn5 if needed.
Gene | Locus | Type | Frequency (%)
TTN | 2q31 | Giant filament | Rare
MYH7 | 14q11.2-q12 | Thick filament | 25–40
MYH6 | 14q11.2-q12 | Thick filament | Rare
MYL2 | 12q23-q24.3 | Thick filament | Rare
MYL3 | 3p21.2-p21.3 | Thick filament | Rare
MYBPC3 | 11p11.2 | Intermediate filament | 25–40
TNNT2 | 1q32 | Thin filament | 3–5
TNNI3 | 19p13.4 | Thin filament | 1–5
TPM1 | 15q22.1 | Thin filament | 1–5
ACTC | 15q14 | Thin filament | Rare
TNNC1 | 3p21.1 | Thin filament | Rare
ACTN2 | 1q42-q43 | Z-disc | Rare
ANKRD1 | 10q23.31 | Z-disc | Rare
CSRP3 | 11p15.1 | Z-disc | Rare
LDB3 | 10q22.2-q23.3 | Z-disc | Rare
MYOZ2 | 4q26-q27 | Z-disc | Rare
TCAP | 17q12-q21.1 | Z-disc | Rare
VCL | 10q22.1-q23 | Z-disc | Rare
CALR3 | 19p13.11 | Calcium-Handling | Rare
CASQ2 | 1p13.3-p11 | Calcium-Handling | Rare
JPH2 | 20q13.12 | Calcium-Handling | Rare
PLN | 6q22.1 | Calcium-Handling | Rare
RYR2 | 1q42.1-q43 | Calcium-Handling | Rare

Table 3: List of HCM Implicated Genes
• This was adapted from Table 1 of Landstrom and Ackerman [188], which shows the most commonly mutated genes in HCM and their loci.
• The Frequency (column 4) shows the percentage of HCM cases which have a mutation in the corresponding gene. If the Frequency is below 1%, it is listed as Rare.

1.8 Hypertrophic cardiomyopathy background

Heart disease is one of the most common causes of death in the United States [189]. Although there are numerous types of heart disease with different triggers, there are often issues with the cardiomyocyte cells [190–192]. Hypertrophic cardiomyopathy (HCM) is a genetic heart disease which affects cardiomyocytes, specifically in the left ventricle wall [193–195]. For reasons that are not yet understood, the thickness of the left-ventricle wall increases [193]. The disease affects about 1 in 500 individuals, making it one of the most common genetic heart diseases [6, 192, 195, 196]. One of the most prominent HCM researchers, Barry Maron, explains that not only is the disease able to affect individuals of all ages, but it is also the "most common cause of sudden death in people under 30 years of age" [191, 196].
Although there has been a concerted effort to better understand the disease on a genetic and epigenetic level, the disease has not always had a distinct name since it was first described by Donald Teare in 1958, in a study in which 7 of the 8 patients died before turning 50 years old [193]. Although the death rate of the disease has since been found to be much lower, at about 1% [197–199], the mortality rate cannot be ignored due to the high prevalence of the disease in the human population. The mortality rate has also decreased due to early diagnosis and preventative measures such as implantable cardioverter-defibrillators (ICDs) or clinical advice to avoid certain strenuous activities [199]. While HCM remains a cause of sudden cardiac death (SCD) in young individuals and in athletes, management of the condition has been shown to reduce the incidence of death if the disease is known [188].

Since Teare’s initial study, there have been many names associated with the disease (over 75 [6]), and the most common method to diagnose the disease was through an electrocardiogram or invasive techniques [192]. However, in late 1989, Jarcho et al first discovered a potential HCM causing locus on chromosome 14 [192, 194]. This work was quickly followed up by Geisterfer-Lowrance et al in 1990, who found a mutation in a myofilament gene, β-myosin heavy chain (MYH7), to be a potential cause of HCM, and this work has been followed up extensively to confirm these findings [200, 201]. Since the first MYH7 mutation was found in 1990, the number of potential disease causing mutations has increased drastically, to dozens of genes and potentially more than 1,400 mutations [188, 192, 202–210]. Due to the large number of genes and subsequent mutations found in HCM, it has been challenging to fully understand how the disease occurs, and this was a major focus of this body of work.
1.8.1 HCM Mutations

In 2010, Landstrom and Ackerman pointed out that although many mutations have been discovered, the known mutations have not been all that useful for making a clinical prognosis in HCM [188]. As mentioned above, numerous genes are implicated in HCM. They are summarized in table 3 and include numerous myofilament genes, Z-disc genes (boundaries between muscle tissues), and genes which handle calcium [188, 192, 202–210]. Although the genes in table 3 are not exhaustive, they summarize the most common genes found [188].

There are three major categories of mutations, which target different aspects of the heart: genes in the filament, genes in the Z-disc, and genes involved in the ability of cells to handle calcium [188]. The filaments in cardiomyocytes are responsible for muscle contraction, which is essential to pump blood throughout the body [211]. A general mechanism of how the myosin filaments work can be seen in figure 1.8a. The Z-disc is another important aspect of this complex, as can be seen in figure 1.8a-b, with genes such as titin (TTN) and troponin C (TNNC1) [209]. These Z-disc genes help to anchor the whole complex together, but they also have important properties as signaling molecules, specifically in the heart, and in rare cases can cause HCM [209, 212–219]. The last major class of mutations in HCM are those which affect Ca2+ signaling [188, 220]. Although these are considered to be rare in HCM, they have helped to uncover the importance of this signaling in the overall HCM disease pathway [188, 220].

One of the issues with only looking at the mutations is that they do not explain the whole picture of the disease [188]. To gain a deeper understanding of the disease, it is important to know which genes are being differentially expressed, because that would help to explain what the mutations are doing and how to then generate therapeutics against the resulting mutations.
Since technologies such as clustered regularly interspaced short palindromic repeats (CRISPR) using the Cas9 protein [221] are still being developed for human use, targeting the result of a disease causing mutation helps to gain a full understanding of how the disease is able to function. So, while mutations are important to understand, the focus of this work is to look at the resulting effects on gene expression and chromatin dynamics within the disease.

Figure 1.8: (A) Figure 7b from Sen-Chowdhry et al [199] showing the cyclic mechanism of how myosin filaments are able to contract and relax. The process involves many different factors and is an ATP driven cycle. The top shows the binding of ATP to the myosin head, which undergoes ATP hydrolysis to allow the myosin head to bind to the F-actin filament. Interestingly, the inorganic phosphate (Pi) helps to facilitate the binding to limit the rate of contraction; once it is released, the binding strengthens and contraction happens, which releases the ADP and in turn the F-actin from the myosin head [199]. (B) Figure 2 from Seidman and Seidman 2011 [209] showing a zoomed in version of the HCM mutation genes within the filaments. The genes listed in black (MYH7, MYBPC3, TNNT2, TPM1, MYL2, MYL3, and ACTC) are heavily associated with HCM and make up an important complex within the filament. The positions of the Z-disc genes TNNC1, TTN, and MYH6 can also be seen [209]. Abbreviations: (A) ATP, adenosine triphosphate; ADP, adenosine diphosphate; TnC, troponin C; TnI, troponin I; TnT, troponin T. (B) MYH7, myosin heavy chain; MYBPC3, myosin-binding protein C; TNNT2, troponin T; TPM1, tropomyosin; MYL2, myosin regulatory light chain; MYL3, myosin essential light chain; ACTC, actin alpha cardiac muscle; TNNC1, troponin C; TTN, titin; MYH6, myosin heavy chain.

1.9 HCM Disease Mechanism

Although mutations help to explain some of the causes, they do not fully address the biological mechanism of the disease.
As discussed above, HATs and HDACs play a crucial role in gene expression and was a likely [18, 29, 44, 45]. Over the past twenty years there have been many insights into the molecular mechanisms of HCM and the subsequent stress put on the heart. Through this work, a few major proteins have been implicated, including the β-adrenergic receptor (β-AR), calcium/calmodulin-dependent protein kinase II (CaMKII), protein kinase A (PKA), histone deacetylase 4 (HDAC4), and one of its binding partners, myocyte enhancer factor 2 (MEF2) [2, 7, 9, 67–69, 222–225]. Although all of these components play a key role in the mechanism, the two proteins that interact with chromatin are MEF2 and HDAC4. These studies have been completed in a variety of ways, including various mutant mouse models [2, 7–9, 222, 226, 227], mouse muscle myoblast cell lines such as C2C12 (ATCC: CRL-1772), and invasive methods such as transverse aortic constriction (TAC) [10, 224, 228]. TAC is one of the most common techniques used when studying HCM, as it overloads the heart and creates a response similar to what is seen in the disease [222–225, 229–231]. Another model used to study the HCM mechanism is neonatal rat ventricular myocytes (NRVMs), which can be harvested and subsequently stimulated with norepinephrine to induce a hypertrophic response [190, 223, 232]; this model was used in the ChIP and HiChIP studies in this work. These models have helped to uncover the mechanistic and molecular understanding of HCM and will likely form the basis for future therapeutics.

The mechanism of HCM has best been described by Backs, who has extensively studied the disease over the past two decades [2, 7–9, 222, 224, 225]. The concise mechanism can be seen in figure 1.9. It starts with the activation of β-AR, a G protein-coupled receptor which is stimulated by adrenaline and some of its derivatives.
There are two distinct pathways within the mechanism, one of which is maladaptive and responsible for triggering HCM, while the other maintains proper cardiac function. If the β-AR is stimulated chronically, the activation of CaMKII is favored, leading to a maladaptive cellular response [2, 233, 234]. However, if the stimulation only occurs for a short amount of time, the activation of PKA is favored, maintaining cardiac integrity [2, 235].

Figure 1.9: Adapted from Jebessa et al Figure 5 [2], depicting the HCM pathway. β-AR stimulation activates protein kinase A (PKA) and calcium/calmodulin-dependent protein kinase II (CaMKII), giving two distinct pathways. One begins with an O-GlcNAcylated CaMKII, which maladaptively phosphorylates serine residues on histone deacetylase 4 (HDAC4), facilitating its relocation to the cytoplasm via the chaperone protein 14-3-3 (not shown). The relocation of HDAC4 activates MEF2 genes such as nr4a1 [2]. The MEF2 genes being activated have been correlated with glucose handling, maladaptive O-GlcNAcylation, and improper calcium handling, resulting in impaired cardiac function. The other side of the pathway activates PKA to phosphorylate perilipin 5 (PLIN5), allowing the serine protease abhydrolase domain containing 5 (ABHD5) to cleave the first 201 amino acids of an O-GlcNAcylated HDAC4 (HDAC4-NT). The HDAC4-NT retains its ability to silence MEF2 [2].

The side of the pathway which retains proper cardiac function begins with the activation of PKA. Activated PKA phosphorylates a lipid-droplet-associated protein, perilipin 5 (PLIN5) [2]. This allows a newly discovered serine protease, abhydrolase domain containing 5 (ABHD5), to cleave HDAC4 at tyrosine 201. Interestingly, HDAC4 cannot be cleaved if it is phosphorylated at serine 632 (S632), but it can be cleaved when it is O-GlcNAcylated at serine 642 (S642) (figure 1.11c) [2, 236].
After the cleavage event, the first 201 amino acid residues of HDAC4, the N-terminal fragment (HDAC4-NT), retain their ability to bind and continue to silence MEF2. The HDAC4-NT maintains proper cardiac function while avoiding phosphorylation by CaMKII in the maladaptive pathway [2, 9, 236].

The maladaptive pathway begins with the activation of an O-GlcNAcylated CaMKII, which allows for phosphorylation at S632 on HDAC4, aiding in the relocation of HDAC4 to the cytoplasm via a 14-3-3 chaperone protein. The effect of HDAC4 relocation on the cell is the loss of MEF2 silencing, allowing pathological genes to be activated. Some of those genes are responsible for glucose and Ca2+ handling, which has a negative effect on cardiac function downstream [2, 7–9, 222, 236, 237].

1.9.1 HDAC Background

HDAC4 is the most important regulator in the HCM pathway, as it silences MEF2 genes which lead to cardiac dysfunction if activated (figure 1.9). HDACs have been studied in great detail and have been broken down into four major classes with a total of 18 different variants [32, 50, 238, 239]. Three of the classes of HDAC can be seen in figure 1.10a, along with the relative similarity of their deacetylase domains [240, 241]. Although the class III HDACs are technically HDACs, they have a different type of catalytic domain and will not be discussed in detail here [240, 242]. Although HDAC4 is a class IIa HDAC, the other classes will briefly be discussed to give background on the diversity and importance of this TF. The general purpose of HDACs was briefly discussed above: their function is to remove acetylation marks on chromatin [32]. As a result of the removal of histone acetylation, there can be aberrant expression of certain genes in the area, even leading to certain cancers [31, 32, 238, 240, 241, 243]. Instead, the focus here will be on the structure of HDAC4 and how it correlates to HCM.
The first aspect which will be discussed is the binding of HDAC4 to MEF2, which was discovered in the Lin Chen lab in 2003 [68]. These data revealed that there was a conserved sequence for binding MEF2 among class IIa HDACs, and even some conservation in p300, which binds in the same position on MEF2 as the HDACs (figure 1.10b) [68]. Cabin1, a known repressor of MEF2, was used to generate the crystal structure [68, 69]. The binding sequence for other class IIa HDACs is also shown in figure 1.10b, as the binding motif is very similar. Also of note is that p300, a transcriptional activator, has conserved binding to the same MEF2 site, which will be discussed in more detail below. In figure 1.10c, a crystal structure from Han et al shows the positioning of the bound HDAC and how a small region of less than 30 residues forms the binding [68].

Because the MEF2 binding domain within the HDAC protein is small, there are many other regions of note which are important to HCM. Figure 1.11a summarizes the various domains of HDAC4 in a concise manner. Shown are the relative binding positions for MEF2 and for serum response factor (SRF), which has been implicated in HCM gene stability [7, 222], as well as the binding positions for CaMKII and PKA. Interestingly, the deacetylase domain of HDAC4, and of class II HDACs in general, does not have much activity [237, 244], and these HDACs often recruit class I HDACs or methyltransferases to inhibit gene expression [237, 245, 246]. So, while the deacetylase domain is not inherently implicated in HCM, there are likely large TF domains in and around MEF2 silenced regions. The serine residues implicated in 14-3-3 binding and subsequent relocation to the cytoplasm are shown in black (S246, S467, and S632) [2, 222]. The serine residue in red (S642) is the site of O-GlcNAcylation, which protects from CaMKII binding and phosphorylation of S632 (figure 1.11c) [2, 9, 237].
The relative position of the tyrosine cleavage site used by ABHD5 is also shown [2]. As shown in figure 1.11b, after ABHD5 cleavage, the HDAC4-NT consists of residues 1-201 and contains two important structural components: the MEF2 binding site and an α-helix formed from residues 62-153 [2, 67]. So, the HDAC4-NT retains its binding and subsequent silencing of MEF2, but it also contains a glutamine rich α-helix, which allows a linear helix to be formed [67]. Interestingly, the α-helix can form higher order structures of at least tetramers, and it is likely that the tetramers can interact as well, forming still higher order structures (figure 1.11d) [67]. Although the function of this α-helix is not fully understood, it is likely to play an important structural role in the silencing of MEF2. It is possible that large regions of chromatin are segregated in a super-silencing manner via these α-helix interactions, especially because this region is conserved in the HDAC4-NT, which is known to maintain cardiac stability. These questions about the function of the α-helix as a structural component of chromosomal structure will be probed in this work [2, 67].

Figure 1.10: (A) Adapted from Figure 1 (Seto et al [241]) and Figure 3 (Parbin et al [240]), showing Class I, IIa, IIb, and IV HDACs. To the right of each protein, the total number of amino acids in the protein is shown. The darkened section of each protein shows the location of the deacetylase domain. The phylogenetic tree [240] between the different classes is shown to the left. (B) Conserved binding between Cabin1, Class IIa HDACs, and p300, adapted from Figure 1a of Han et al [68]. The top bar refers to the region where the binding to MEF2 happens (positions from 2159-2180). Bold and red indicate the residues facilitating the binding. (C) Figure 1c from Han et al [68] showing the crystal structure of Cabin1 bound to a MEF2 dimer (depicted as A and B).
A theoretical crystal structure of how the helix would function to bring multiple MEF2 proteins into the same 3D space is shown in figure 1.11e. As a result of HDAC relocalization to the cytoplasm in the maladaptive pathway, the chromosomal structure around those regions is altered, which could help to explain how errant expression of MEF2 genes occurs [10].

There could be many evolutionary reasons for this protective structural adaptability of HDAC4-NT, but a major one is that it would provide a way to control the expression of numerous genes by segregating them into silenced regions. HDAC4 is a known regulator of MEF2 and is expressed widely both in the heart and during muscle development [7, 68, 247–254]. HDACs also have the ability to turn off genes which have recently been activated by HATs and RNA pol II, by resetting the chromatin back to its pre-transcribed state via the removal of the acetyl groups [32, 254]. HDAC4, and subsequently HDAC4-NT, has been shown to have a positive effect on the ability of mice to exercise [2, 3, 222], and adding HDAC4-NT back into errant cardiomyocytes has been shown to rescue the HCM phenotype [3]. So, the use of this protein has been of interest in the field. One possible therapy involves giving patients a derivative of the HDAC4-NT protein to restore the proper silencing of the MEF2 target genes [3]. Another therapy, proposed by Backs, is a small molecule to block the binding of CaMKII to HDAC4, preventing the accumulation of HDAC4 in the cytoplasm [2]. Due to these factors, a deeper understanding of not only the chromatin landscape but also the subsequent gene expression regulated by HDAC would be important to the field. Because HDAC shares a binding location with p300 on MEF2, the inhibition of MEF2 would block active transcription, and the higher order structure would maintain a high local concentration of HDAC4 around a specific region [2, 67]. MEF2 plays a key role in numerous functions, which will be discussed below [255].

Figure 1.11: (A) A zoomed in structure of HDAC4, adapted from Figure 8a (Backs et al) [222]. The structure shows the relative MEF2 (yellow) and SRF (blue) binding positions as well as the docking positions for CaMKII (red) and PKA (green). The positions of phosphorylation important for relocation to the cytoplasm are also shown. A cleavage site at tyrosine 201 (Y201) is used by a protease to maintain the MEF2 binding [222]. CaMKII, calcium/calmodulin-dependent protein kinase II. (B) A zoomed in picture of the N-terminal fragment of HDAC4 (HDAC4-NT), which is the first 201 residues. This sequence has two features: the α-helix formed from residues 62-153, shown in blue, and the MEF2 binding sequence, shown in yellow. (C) Adapted from Figure 7 (Kronlage et al [236]) showing (1) the phosphorylation event by CaMKII when HDAC4 is not O-GlcNAcylated at position S642, and (2) how, if HDAC4 is O-GlcNAcylated at S642, CaMKII is no longer able to phosphorylate S632, which inhibits chaperone protein 14-3-3 from relocating HDAC4 to the cytoplasm. (D) Figure 1 from Guo et al [67] depicting the crystal structure of the α-helix formation. This shows how four of these 90 residue sections can form higher order structures, such as the dimer which is shown. The individual formations are shown in different colors (red, blue, green, and purple). The right depiction is a side view of the left structure. (E) A theoretical view of how an HDAC bound to MEF2 could bring together different regions of chromatin. The colors are associated with the HDAC bound to MEF2 and the corresponding α-helix. This whole region has not been crystallized, so this is a combination of MEF2:HDAC (PDB: 1N6J [68]) and the HDAC4 α-helix (PDB: 2VQV [67]).
The work presented here will be focused on chromatin structural changes occurring due to the loss of nuclear HDAC4 in HCM and the subsequent differential gene expression [10]. HDAC4 has a unique ability to act as a repressor in distinct ways: by being an antagonist against transcriptional activators such as p300 [67–69], by using its deacetylation domain to recruit class I HDACs and other chromatin remodelers of nearby histone acetyl groups [2, 240, 241], and by potentially sterically inhibiting other transcriptional activating co-factors through forming tetrameric or even higher order structures within the nucleus [67] (figure 1.11d-e). This makes HDAC4 an exciting protein to understand further.

1.9.2 MEF2

The MEF2 family has numerous overlapping functions in eukaryotes, but a general theme in humans and other vertebrates is high activity in muscle tissues, specifically in development [255]. A nice summary of the different MEF2 functions was generated by Pon et al [255] and can be seen in figure 1.12a. MEF2 is critical in the development of both skeletal and heart muscle; however, the constant activation of MEF2 genes as seen in HCM can be maladaptive to the cells [2, 7, 256, 257]. Due to the importance of this TF, there have been many studies trying to better understand both its binding and the expression of the genes it regulates.

The four classes of MEF2 (a-d), as seen in figure 1.12, have very similar structures. The first 57 residues of the protein make up a conserved DNA binding motif termed the MADS box [258]. These 57 residues share on average over 97% conservation between the four classes in humans [259] and have been shown to have a conserved binding sequence of CTA(A/T)4TAG [259–262]. The next 29 residues are implicated in MEF2 dimerization, which is shown in figure 1.10c, where A is one MEF2 molecule and B is another [68, 69, 259]. There is also very high conservation in this region, with about 85% of the sequence conserved between the four groups.

Figure 1.12: (A) Figure 1 from Pon et al [255] showing the numerous and redundant functions of the MEF2 family of proteins. As can be seen, there are many overlapping functions as well as independent functions. These proteins are expressed in many different cell types and are essential for development [255]. (B) Figure 2a from He et al showing the ability of p300 to bind to 3 MEF2 molecules (PDB: 3P57).

Figure 1.13: (A) Cartoon version showing the endogenous silencing of the MEF2 dimer by HDAC4, preventing proximal transcription. Upon β-AR stimulation, HDAC4 is released into the cytoplasm, and p300 can then bind and drive HCM activation. (B) A theoretical 3D view of (a). HDAC4 could form tetrameric structures to lock in a large region of chromatin, but upon stimulation, p300 could bind to three of the MEF2 molecules, causing chromatin remodeling. (C) Crystal structure evidence to support the model shown in (b). On the left is the theoretical crystal structure shown in figure 1.11e, and on the right is the crystal structure of p300 binding to MEF2 as shown in figure 1.12b.

Beyond the binding of HDAC4 and Cabin1 to MEF2 (figure 1.10c), an important co-factor helps to express the genes in close proximity to a bound MEF2. It has been shown that the transcriptional activator p300 can bind up to three MEF2 dimers [263] (figure 1.12b). This has implications for HCM activation: when HDAC4 is segregated into the cytoplasm upon HCM activation, the MEF2 genes have to be activated by other factors, and it is likely that p300 is the main TF driving the expression of the maladaptive genes. The ability of p300 to bind 3 MEF2 dimers would have consequences for the chromatin structure in HCM. Although there have been a number of studies to determine differential gene expression between healthy and HCM tissue [2, 3, 7, 8, 10, 264–266], the specific genes being regulated by MEF2 have not been fully characterized.
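As a small illustration of the MADS-box consensus CTA(A/T)4TAG quoted above, the motif can be written as a regular expression and scanned against a sequence. This is a minimal sketch and not the analysis pipeline used in this work: the function name is hypothetical, and the test sequence is the M2RS-EMSA-top oligo listed in the primer table of the Materials section, which is described there as a 3X MEF2a binding sequence.

```python
import re

# MEF2 consensus CTA(A/T)4TAG: CTA, then four A-or-T bases, then TAG.
MEF2_SITE = re.compile(r"CTA[AT]{4}TAG")


def find_mef2_sites(seq):
    """Return (start, matched_sequence) for each non-overlapping consensus hit."""
    return [(m.start(), m.group()) for m in MEF2_SITE.finditer(seq.upper())]


# Top strand of the EMSA probe from the primer table (M2RS-EMSA-top).
probe_top = "GGCTATTTTTAGGGCCTCGAGGGCTATTTTTAGGGCCTCGAGGGCTATTTTTAGCA"

for start, site in find_mef2_sites(probe_top):
    print(start, site)
```

Because the consensus is its own reverse complement pattern (CTA and TAG swap, and A/T stays A/T), scanning one strand is sufficient for this exact pattern; a degenerate or position-weighted motif would need both strands to be scanned.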
Some of the work completed by Backs and others on class II HDACs has started to reveal that certain genes, such as NR4A1 (figure 1.9), help to regulate glucose handling in the cells, causing the maladaptive response to β-AR stimulation [2]. A hypothetical model showing the linear view and the subsequent implications for the 3D structure can be seen in figure 1.13a-c. The simplest way of seeing the implication of p300 activation is shown in figure 1.13a, where the binding of HDAC4 to a MEF2 dimer inhibits any transcriptional activity; upon β-AR stimulation, however, p300 is able to bind to the MEF2 dimer, causing the activation of HCM genes. This 2D view of HCM activation does not capture the potential structural changes within the nucleus that allow specific genes to be activated. A cartoon of the activation can be seen in figure 1.13b, where the tetrameric structure of HDAC-NT segregates an entire region of chromatin, which could be formed of trans regions. The crystal structure evidence for this model can be seen in figure 1.13c. However, upon β-AR stimulation, p300 binds but can only bind a total of three MEF2 molecules [263]. The stoichiometric change from at least four MEF2 TFs segregated into one space to three would have consequences for the gene expression profile. This model was tested here using both computational and experimental methods, and there was evidence of chromatin-based changes between the healthy and HCM states, as will be discussed throughout the rest of the manuscript.

2 Materials and Methods

2.1 Materials

2.1.1 Kits Used

Zymo ChIP DNA Clean & Concentrator (D5201). From New England BioLabs (NEB): Gibson Assembly (E2611S), Q5 High-Fidelity 2X Master Mix (M0492L), Q5 Site-Directed Mutagenesis Kit (E0554), and NEBNext Ultra II DNA Library Prep Kit for Illumina (E7645L). From Qiagen: QIAprep Spin Miniprep Kit, MinElute Gel Extraction Kit, QIAquick PCR Purification Kit. The kits were used following the manufacturers' suggested protocols.
2.1.2 Primer Creation

All primers were synthesized by IDT (www.idtdna.com) with standard desalting conditions.

Plasmid Name | Description | Source
pTXB1-Tn5 | Tn5 transposase fused to Mxe intein and chitin-binding domain. AMP resistance | Picelli (2014)
3XMEF2-luc | Plasmid used for TTT proof of concept of targeted transposition | Prywes Lab (unpublished)
Table 4: List of Plasmids

Name | Sequence | Purpose
Tn5MErev | 5’ - /5Phos/CTGTCTCTTATACACATCT - 3’ | Phosphorylated reverse primer for Tn5 transposon
Tn5ME-A | 5’ - TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG - 3’ | Tn5 transposon primer with custom barcode
mHDAC insert | 5’ - GGTAGCGGCAGCGGTAGCCCGAAGGGCACCGGTCGTGCGGTGGCGAGCACCGAGGTTAAGCAGAAACTGCAAGAGTTCCTGCTGAGCAAACACCACCACCACCACCAC - 3’ | Insert for mHDAC in pTXB1-Tn5
mHDAC-FWD | 5’ - tggcgcagggcattaaaatcGGGTCTGGGAGCGGCT - 3’ | FWD primer for mHDAC Gibson assembly
mHDAC-REV | 5’ - agtgcatctcccgtgatgcaGTGGTGATGGTGATGG - 3’ | REV primer for mHDAC Gibson assembly
Tn5-linear-FWD | 5’ - tgcatcacgggagatgcactagttgc - 3’ | FWD linearization primer for pTXB1-Tn5 plasmid for Gibson assembly
Tn5-linear-REV | 5’ - gattttaatgccctgcgccatcaggt - 3’ | REV linearization primer for pTXB1-Tn5 plasmid for Gibson assembly
mHDAC-Sanger-1 | 5’ - TCGTGAAGCGGATATTCATG - 3’ | Primer 1 for Sanger sequencing to determine mHDAC insertion into Tn5 plasmid
mHDAC-Sanger-2 | 5’ - CTGGGCTATCTGGATAAAGG - 3’ | Primer 2 for Sanger sequencing to determine mHDAC insertion into Tn5 plasmid
mHDAC-Sanger-3 | 5’ - GACGTCGACCAAACACAACA - 3’ | Primer 3 for Sanger sequencing to determine mHDAC insertion into Tn5 plasmid
M2RS-EMSA-top | 5’ - GGCTATTTTTAGGGCCTCGAGGGCTATTTTTAGGGCCTCGAGGGCTATTTTTAGCA - 3’ | 3X MEF2a binding sequence from 3XMEF2-luc plasmid. Top strand
M2RS-EMSA-bot | 5’ - TGCTAAAAATAGCCCTCGAGGCCCTAAAAATAGCCCTCGAGGCCCTAAAAATAGCC - 3’ | 3X MEF2a binding sequence from 3XMEF2-luc plasmid. Bottom strand
M2R-DNA-FWD | 5’ - gttcatactgttgagcaatt - 3’ | FWD primer for the creation of the linear DNA to test Tn5:mHDAC specificity
M2R-DNA-REV | 5’ - tcacatgttctttcctgc - 3’ | REV primer for the creation of the linear DNA to test Tn5:mHDAC specificity
Table 5: List of Oligonucleotides

2.2 Methods

2.2.1 Agar resistance plates

Plates were made by combining 31 g of 2xYT media and 15.5 g of agar in 1 L of nanopure water, which was then autoclaved. The solution was cooled to 55 °C and the required antibiotic was added to a 1X concentration. With a Bunsen burner used to keep the workbench sterile, 10 mL of media was added to each plate and allowed to dry for about 5 minutes. Afterwards, the plates were capped, sealed with parafilm, and stored at 4 °C for up to 6 weeks.

2.2.2 Transformation

50-150 ng of plasmid was gently mixed with 50 µL of E. coli, spun down quickly, and incubated on ice for 30 minutes. The E. coli was then heat shocked at 42 °C in a thermomixer rotating at 300 RPM. The sample was then incubated on ice for 2 minutes. 1 mL of 2xYT media was added to the sample, which was incubated in a thermomixer at 37 °C and 1000 RPM for 1 hour. The E. coli was pelleted by centrifugation at 13,000 RPM for 1 minute. 750 µL of supernatant was removed, the pellet was resuspended in the remaining media, and the sample was plated on an ampicillin (AMP) resistance plate (100 µg/mL). The plate was incubated upside down at 37 °C overnight.

2.2.3 Glycerol Stocks, plasmid sequencing, and seed cultures for AMP resistant plasmids

Individual colonies were picked and placed in 5 mL of 2xYT media supplemented with 100 µg/mL AMP in a 15 mL round bottom flask. These were then incubated for at least 6 hours in a shaker at 37 °C and 220 RPM. Afterwards, 200 µL of 80% glycerol was mixed with 800 µL of the culture. The glycerol stock was then stored at -80 °C for long-term storage.
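The plate and glycerol stock recipes above are simple proportional arithmetic. A minimal Python sketch follows; the function names and the 0.5 L batch size are hypothetical and chosen only for illustration, while the masses and volumes come from sections 2.2.1 and 2.2.3.

```python
# Per-litre masses for the agar plates, taken from section 2.2.1.
PLATE_MEDIA_G_PER_L = {"2xYT": 31.0, "agar": 15.5}


def scale_recipe(volume_l, grams_per_litre):
    """Scale per-litre masses (g) to a batch of volume_l litres."""
    return {name: grams * volume_l for name, grams in grams_per_litre.items()}


def final_percent(stock_percent, stock_volume, other_volume):
    """Final % (v/v) of a stock after mixing it with another liquid."""
    return stock_percent * stock_volume / (stock_volume + other_volume)


def plates_per_batch(batch_ml, ml_per_plate=10.0):
    """Whole plates that can be poured from a batch, ignoring losses."""
    return int(batch_ml // ml_per_plate)


half_batch = scale_recipe(0.5, PLATE_MEDIA_G_PER_L)  # hypothetical 0.5 L batch
glycerol = final_percent(80.0, 200.0, 800.0)         # 200 uL of 80% + 800 uL culture
print(half_batch, plates_per_batch(1000.0), glycerol)
```

With the volumes given in the text, the glycerol stock works out to 16% (v/v) glycerol, and a 1 L batch at 10 mL per plate gives at most 100 plates.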
The remaining culture could be used to verify the plasmid sequence: the plasmid was isolated with a Qiagen MiniPrep following the manufacturer's recommended protocol and eluted in 35 µL of warm elution buffer (heated to 55 °C). The plasmid sequence was verified with Sanger sequencing using custom primers. Alternatively, the bacterial culture could be used as a seed culture for a small-scale induction test (see Small-scale Induction Test) or for an expression experiment (see Tn5 Expression).

2.2.4 Small-scale Induction Test

The E. coli cultures were grown for about 4 hours at 37 °C with rotation until the OD reached 0.7-1. 1 mL of the culture was removed for a QC gel for the induction. Afterwards, isopropyl β-D-1-thiogalactopyranoside (IPTG) was added to the culture to a final concentration of 0.25 mM and it was incubated at 23 °C with rotation for another 4 hours. The OD was checked and the optimal value was 1.3-1.6. 1 mL of culture was removed after the incubation was finished to test for induction efficiency by SDS-PAGE (see below).

2.2.5 Tn5:mHDAC Plasmid Generation

To insert the 108 bp minimal HDAC domain into the Tn5 plasmid (Addgene Plasmid 60240) [186], the full-length plasmid was sent to GenScript, where their CloneEZ method was used to insert the desired 108 bp and generate the new Tn5:mHDAC plasmid.

2.2.6 Tn5 and Tn5:mHDAC Expression

The seed culture was added to a 1 liter 2xYT culture supplemented with 1X AMP. The cultures were grown for about 4 hours at 37 °C with rotation until the OD reached 0.7-1. 1 mL of the culture was removed for a QC gel for the induction. Afterwards, IPTG was added to the culture to a final concentration of 0.25 mM and it was incubated at 23 °C with rotation for another 4 hours. The OD was checked and was around 1.3-1.6. 1 mL of culture was removed prior to the addition of IPTG and after the incubation was finished to test for induction efficiency. The culture was pelleted by centrifugation in a JLA9.1 rotor at 7,500g for 15 minutes at 4 °C.
The pellet was collected and stored at -20 °C.

2.2.7 Stacked SDS-PAGE gel preparation

Need to fill in this section with correct values in the table... can get this online.

2.2.8 Tn5 and Tn5:mHDAC Induction SDS-PAGE QC

Induction of the target protein was verified by sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE). The pre- and post-induction samples were pelleted by centrifugation (13,000 RPM for 1 minute) and diluted with water using the following equations:

• Pre-Induction: (0.8 / Pre OD Reading) × 300
• Post-Induction: (Post OD Reading / Pre OD Reading) × (Volume of Pre-Induction)

After dilution, 12 µL was mixed with 3 µL of a 5X SDS loading buffer and quickly spun down. The samples were boiled at 95 °C for 7 minutes and then quickly spun down again. The samples were then loaded on a stacked 12% SDS-PAGE gel and run for 2 hours at 200 volts at 4 °C. The gel was then stained with Coomassie for 20 minutes at room temperature and destained with SDS-PAGE destain buffer for 2-6 hours.

2.2.9 Tn5 Protein Purification

The purification was completed in a similar manner as previously described [186]. The pellet was removed from -20 °C, left at room temperature for 15-20 minutes, and placed in a pre-cooled metal sonication beaker. The pellet was broken up in complete HEGX buffer (10 mL per 1 gram of cell pellet) and sonicated. After the sonication was complete, the suspension was centrifuged using a JA25.50 rotor at 16,000 RPM for 30 minutes at 4 °C. The supernatant was then transferred to a beaker, and 526 µL of 5% PEI per 10 mL of supernatant was added dropwise while mixing with a stir bar. The mixture was then centrifuged with a JA25.50 rotor at 12,000 RPM for 10 minutes at 4 °C. The supernatant was collected as the crude lysate. A 10 mL chitin gravity column was prepared by washing with 10-15 column volumes (CVs) of HEGX buffer (20 mM HEPES-KOH at pH 7.2, 0.8 M NaCl, 1 mM EDTA, 10% glycerol, 0.2% Triton X-100).
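The OD-based dilution rule in section 2.2.8 and the per-volume reagent amounts in section 2.2.9 can be sketched in a few lines of Python. This is only an illustration: the text does not state the unit of the 300 factor (µL is assumed here), the OD readings and pellet mass in the example are hypothetical, and the function names are not from the original protocol.

```python
def pre_induction_volume(pre_od, reference_od=0.8, base_volume=300.0):
    """Water volume for the pre-induction sample: (0.8 / pre OD) x 300."""
    if pre_od <= 0:
        raise ValueError("OD reading must be positive")
    return reference_od / pre_od * base_volume


def post_induction_volume(post_od, pre_od, pre_volume):
    """Water volume for the post-induction sample: (post OD / pre OD) x pre volume."""
    if pre_od <= 0:
        raise ValueError("OD reading must be positive")
    return post_od / pre_od * pre_volume


def hegx_volume_ml(pellet_g):
    """Complete HEGX buffer for lysis: 10 mL per gram of cell pellet."""
    return 10.0 * pellet_g


def pei_volume_ul(supernatant_ml):
    """5% PEI added dropwise: 526 uL per 10 mL of supernatant."""
    return 526.0 * supernatant_ml / 10.0


# Hypothetical readings: pre-induction OD 0.8, post-induction OD 1.6.
pre_vol = pre_induction_volume(0.8)
post_vol = post_induction_volume(1.6, 0.8, pre_vol)
print(pre_vol, post_vol, hegx_volume_ml(2.5), pei_volume_ul(25.0))
```

The intent of the two dilution formulas appears to be loading a comparable amount of cells per lane, so that band intensity differences reflect induction rather than culture density.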
The crude lysate was run through the column, which was then washed with 20 CVs of HEGX. 25 mL of HEGX supplemented with 100 mM sodium 2-mercaptoethanesulfonate (MESNa) was added to the top of the column and 10 mL was allowed to run through before the column was closed and capped for 36-48 hours at 4 °C. The product was eluted in twelve 2 mL fractions, with 70-90% of the product in the first 5 fractions. The fractions were visualized by SDS-PAGE and those containing the protein were concentrated using an Amicon Ultra concentrator (EMD Millipore catalog number UFC503096) at 15,000g for 30 minutes at 4 °C. The Tn5 was stored at -20 °C as previously described [186].

2.2.10 Preparation of NRVMs

The preparation and stimulation of the neonatal rat ventricular myocytes were completed as previously described [3].

2.2.11 ChIP-Seq

ChIP-Seq was completed as previously described, with minor alterations discussed below [56, 128, 129].

2.2.12 HiChIP Assay

HiChIP was completed as previously described, with a few alterations discussed below [53, 56].

2.2.13 ChIP-Seq Analysis

The samples were mapped to the Rattus norvegicus (rn6) genome and differential peaks were called as previously described, with a few alterations [56]. The differential regions were also determined using the MEDIPS package [131]. The settings used to create the MEDIPS MSets were: extend = 0, shift = 0, window_size = 500, BSgenome = BSgenome.Rnorvegicus.UCSC.rn6, and paired = TRUE. To determine differential regions between different MSets, MEDIPS.meth was used with the following conditions: p.adj = "BH", diff.method = "edgeR", CNV = FALSE, and MeDIP = FALSE [131]. The differential peaks (MACS2 [267]) or regions (MEDIPS [131]) were then correlated to nearby genes (within 10 kb) using a custom R script. These genes were subsequently used as a list for RNA-Seq or HiChIP analysis.
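The custom R script used to correlate peaks with nearby genes is not reproduced in the text. As an illustration only, the step can be sketched in Python as an interval test with a 10 kb window. The sketch assumes distance is measured to the gene body (the original script may have used the TSS instead), and the gene names and coordinates in the example are hypothetical.

```python
from collections import defaultdict


def genes_near_peaks(peaks, genes, window=10_000):
    """Return sorted names of genes lying within `window` bp of any peak.

    peaks: iterable of (chrom, start, end)
    genes: iterable of (chrom, start, end, name)
    """
    genes_by_chrom = defaultdict(list)
    for chrom, start, end, name in genes:
        genes_by_chrom[chrom].append((start, end, name))

    hits = set()
    for chrom, peak_start, peak_end in peaks:
        for gene_start, gene_end, name in genes_by_chrom.get(chrom, ()):
            # The gene is "nearby" if it overlaps the peak padded by `window`.
            if gene_start <= peak_end + window and gene_end >= peak_start - window:
                hits.add(name)
    return sorted(hits)


# Hypothetical example data (not from the thesis).
example_genes = [
    ("chr1", 50_000, 60_000, "GeneA"),
    ("chr1", 200_000, 210_000, "GeneB"),
    ("chr2", 50_000, 60_000, "GeneC"),
]
example_peaks = [("chr1", 65_000, 65_500), ("chr2", 100_000, 100_400)]
nearby = genes_near_peaks(example_peaks, example_genes)
print(nearby)
```

The nested loop is quadratic in the worst case; for genome-scale peak sets a sorted or interval-tree lookup would be the usual replacement, but the selection rule stays the same.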
2.2.14 HiChIP Analysis

HiChIP analysis was completed as previously described with a few alterations [53, 56], which will be discussed in greater detail below.

3 Targeted Transcription Factor Transposition

3.1 Introduction

The desire to better understand where regulatory elements are located within the genome has been an ongoing pursuit, as seen through the development of assays such as ChIP-Seq, ATAC-Seq, HiC, HiChIP, and other similar methods [28, 53, 54, 92, 122, 160]. Each newly developed technique comes with its own benefits and limitations. As discussed, one of the limitations of ChIP-Seq and subsequent ChIP-Seq-like methods, such as CUT&RUN, it-ChIP, and HiChIP, is the use of antibodies [53, 146, 149] [cite antibody problems]. That obstacle was one of the main motivations to develop a new method, termed Targeted Transcription factor Transposition (TTT), which would not rely on antibodies but would instead use a chemical probe in their place.

Single-cell genomic data have drastically increased our knowledge of gene regulation by helping to eliminate ensemble effects. The goal of this work was to develop an assay able to add a barcode that enriches a specific target, without the use of antibodies, within a living single cell. If a technology were developed in which the position of a histone or TF could be determined over time or in response to a stimulus, many useful insights could be obtained. The system could also be calibrated to tag the position of a target using different markers at various times. This would allow for dynamic analysis of a specific target's location in a single cell over time, something currently only possible through high-resolution microscopy [268]. The potential of TTT for this system made it an exciting project, as its development began prior to assays like CUT&TAG or it-ChIP [148, 149].
To develop TTT, a method had to be designed in which a probe could bind to a specific target and then leave a barcode in the genome where the probe was bound. A similar strategy had been attempted for targeted therapeutics, using a transposase linked to zinc fingers [269, 270]. Although this is an active area of research, the unpredictability and off-target insertion of the transposase can cause issues when using it as a therapeutic [269, 270]. However, using this approach in an in vitro system would allow the idea to be applied much more readily, because off-target effects have less impact. A large number of TF binding motifs have been discovered, and TTT would make use of the known motifs to design a library of probes that chemically bind to the target of interest.

In this chapter my work on TTT will be described. Although the ultimate goal was not achieved, the proof-of-concept foundation was laid and future work on this project could complete the initial goals outlined above. The work had several important parts: purifying Tn5 and a modified version of Tn5, testing the binding of TFs to DNA and of the modified Tn5 to the target, and developing an in vitro system to test the efficiency of targeted transposition. Finally, Tn5 was mutated to reduce the off-target insertion of a barcode. In the future, the modified Tn5 plasmids could be used as a template for new probes.

3.2 Assay Design

The original approach taken in the project was to fuse a nicking endonuclease (nickase) to a probe in order to create a nick on one strand of DNA near a specific target. The nick would subsequently be filled in with a chemically modified dNTP (figure 3.1a).
If the technique was able to nick the DNA in a targeted manner, the plan was to use either a biotin enrichment similar to the chromosome capture methods or, with the addition of a fluorescently labeled dNTP, a newer type of sequencing from BioNano, a fluorescence-based nanochannel sequencing technology [cite bionano]. Although this plan was set into action, it was quickly replaced by a different version before any preliminary data were generated.

The new version went in a slightly different direction, in which a transposase would be used instead of a nickase. There were a few reasons for this: the success of ATAC-Seq [28] and ChIPmentation [136] showed that the use of Tn5 would be readily accepted by the community; Tn5 can be controlled to insert a customized string of DNA as a barcode; and previous work by Feng et al. showed success in fusing a transposase to a zinc-finger domain (ZFD) to direct specific insertion of a DNA barcode near a DNA sequence [270]. However, for the method to be applicable, a few alterations to Feng's work were needed. In that work, ISY100, a Tc1/mariner family transposon from Synechocystis, was used [270]. Although this transposon was capable of being inserted into the human genome, it was less well known in the field and less active compared to Tn5 [cite tn5 vs isy100] [269]. While ISY100 could be used, another drawback was the slightly longer binding sequence it required (analogous to Tn5's ME sequence): 30 bp instead of 19 bp for Tn5 [270] [cite tn5 ME sequence]. The 11 bp difference would mean that during sequencing, after insertion of a unique barcode, there would be fewer reads to the genome and the reads would be further away from the target TF or histone modification. Another advantage of using Tn5 was the straightforward protocol to purify the protein with a high yield, without the need for high-pressure liquid chromatography (HPLC) [186].
The next modification was to change the ZFD to a chemical probe that would target a specific TF rather than a DNA sequence. The Lin Chen lab had extensive experience with purifying MEF2 and HDAC and with understanding the binding between them [68, 69]. Due to MEF2's implication in HCM, probing MEF2 binding sites was of great interest [222]. From the work of Han et al. in 2003 and 2005, the MEF2 binding domain of HDAC4 was known [68, 69]. The MEF2 binding motifs of HDAC4, 5, 7, 9, and Cabin1 were used to generate an optimal binding sequence for MEF2 (figure 3.1b). A few alterations were made to the HDAC4 binding domain, using crystal structure modeling to determine an optimal binding sequence (figure 3.1b-c). The best-fitting residues were used to make the probe bind as tightly as possible. A six-histidine (6XHis) tag was also added in case a secondary purification was desired. The 108 bp sequence of the minimal HDAC4 binding motif (mHDAC) can be found in table 5. The GSG linker used in Feng's work was kept, as it was a flexible linker that had previously worked for their targeted transposition [270]. The number of GSG repeats for optimal linker size would have to be determined experimentally and would need to be tested in future work. The hypothetical stoichiometry model of the modified Tn5 protein bound to MEF2 and DNA can be seen in figure 3.1c.

Once the design of the potential probe was finalized, the goal was to purify both the standard Tn5 and the modified Tn5, which had a minimal binding domain of HDAC4 fused to it via a GSG linker (Tn5:mHDAC). After the purification of these constructs, it was necessary to test their activity, to determine whether Tn5:mHDAC could bind MEF2 with specificity, and finally to determine whether there was targeted insertion around a DNA-bound MEF2.

Figure 3.1: (a) General method layout showing the idea behind the project. This scheme shows a chemical probe linked to a marker generator bound to a TF of interest.
The first iteration of the method used a nickase and the second iteration used a Tn5 transposase. The marker in the case of the second iteration was a unique barcode inserted by the Tn5. (b) Adapted from Figure 1 of Han et al. [68], showing the different binding motifs for class II HDACs and another MEF2 binder, Cabin1. The amino acids in red are conserved hydrophobic residues and those in blue are conserved lysine, phenylalanine, or alanine. The numbers indicate the position in the full-length proteins [68]. (c) Figure 5 from Han et al. showing the hydrophobic binding groove of MEF2 where the class II HDACs bind. The same color scheme is used as in (b). (d) A hypothetical model of the DNA:MEF2 and Tn5:mHDAC complex. The gray color is random DNA, blue is MEF2, red is HDAC, pink and tan are Tn5 transposase, and cyan is the Tn5 transposon. The yellow and green residues are where the GSG linker would attach. Assembled in PyMOL using Tn5 (1MUH) [180], MEF2:HDAC (1N6J) [68], and a constructed DNA molecule.

3.3 Results

3.3.1 Optimal Purification of Tn5

The first stage of the method development was to purify Tn5. This served as an initial proof of concept for Tn5:mHDAC, but mainly as a way to set up the purification as previously reported [186]. The same Tn5 plasmid used by Picelli et al., pTXB1-Tn5, which fuses Tn5 with an Mxe intein and a chitin binding domain (CBD), was obtained through Addgene (figure 3.2a). The plasmid was designed to optimize purification after induction of the Tn5 protein. The CBD allowed the initial purification to be completed with chitin beads after sonication of an IPTG-induced E. coli cell pellet. Picelli et al. included the Mxe intein (described above) so that the addition of a thiol, such as sodium 2-mercaptoethanesulfonate (MESNa), to the column would cleave the Tn5 off of the chitin beads. In a gravity column the Tn5 would settle at the bottom of the column.
Once the column was eluted, the majority of the Tn5 protein was found in the first 15 mL (figure 3.2b right panel) [186].

The purification steps are described above (sections 2.2.3 - 2.2.11) and the general process will be described briefly again here. The first step was to transform the plasmid into a BL21DE3 E. coli strain, plate it onto 1X AMP resistant agar plates, and leave the plates overnight at 37 °C. The next day, colonies were picked, grown, and sent for sequencing at Laragen Corporation (using the T7 and T7-terminator primers option) as described in section 2.5.2. After confirming the plasmid sequence, a small-scale induction was verified before completing a 1 L induction with IPTG. A typical example of pre- and post-induction samples can be seen in figure 3.2b (left panel, green lanes), where the induced sample showed the whole fused Tn5, including the Mxe intein and the CBD, at about 75 kDa. After purification and concentration, the unassembled Tn5 was stored at -20 °C as a 55% glycerol stock created by adding 1.1 vol of 100% glycerol and 0.33 vol of 2x Tn5 dialysis buffer to the concentrated Tn5 sample [186].

First attempts at purifying the protein were successful but yielded a low quantity of purified Tn5 (figure 3.2b left panel). Further optimization of the purification was completed and the yield was found to be higher under certain conditions, briefly described here. The first condition was to use a fresh seed culture for 1 L expansions.

Figure 3.2: (a) pTXB1-Tn5 (Tn5) plasmid map used in the experiment [186], generated in SnapGene. (b) SDS-PAGE of the pre- and post-optimized purification of Tn5. The gel was run at 200 V for 120 minutes at 4 °C and then stained in Coomassie blue for 1 hour before being destained for at least 3 hours. Left panel: The pre-optimized conditions with a comparison of pre- and post-induced samples (green lanes). The wash is post binding of the Tn5 to the chitin beads. Fractions (orange lanes) were combined and concentrated as described in section 2.2.11.
Right panel: Optimized conditions for purification of Tn5. The crude sample was taken prior to loading onto the chitin column. The flowthrough is the sample which did not bind to the chitin column. The washes were completed with HEGX buffer and a sample was taken after 10 CVs were put through the column. The fractions (orange lanes) are prior to any concentration of the sample.

Although a seed culture could be stored for a few weeks at 4 °C, the efficiency of induction was noticeably increased when using a seed culture the morning after an overnight expansion. The next condition was initially described by Picelli et al. but was not followed in the original iterations of the purification [186]. The condition was to cool the culture from 37 °C to about 10 °C by leaving it at 4 °C for about 10-15 minutes prior to inducing it with 0.25 mM IPTG [186]. The final optimizations all centered around using fresh reagents. It was found to be important to use fresh chitin beads, MESNa, and HEGX. The pure MESNa powder was dissolved in water, so it is likely that over time the thiol group was hydrolyzed, reducing the effectiveness of the intein cleavage. Although the HEGX buffer was stable for a few weeks at 4 °C, the purifications had less background when the buffer was made fresh (data not shown). A comparison between the pre- and post-optimized conditions can be seen in figure 3.2b.

3.3.2 Activity Optimization of Tn5

After the purification of Tn5 was completed, the activity of the enzyme had to be tested, and for this a similar approach was taken as previously described [186]. As described above, to create the activated Tn5, the enzyme had to be dimerized using a transposon (tspn). The first stage was to anneal the primers Tn5MErev and Tn5ME-A (sequences in table 5).
To anneal the primers, a simple protocol was run in which 50 µL of 100 µM of each primer were mixed together and incubated in a PCR machine at 80 °C for 1 minute, after which the temperature was stepped down by 2 °C every minute until the sample reached 10 °C. To test the annealing of the primers, an EMSA gel was run as shown in figure 3.3a. The optimal result was a single higher band in the annealed condition, which indicated complete annealing. Faint bands of either Tn5ME-A or Tn5MErev were acceptable, as long as the dominant product was the annealed primer (ME). After verification, the annealed primer could be stored at -80 °C for at least 6 months.

Once annealing had been confirmed, assembly of the Tn5 dimer was completed by mixing 0.125 vol of ME with 0.4 vol 10% glycerol, 0.12 vol 2x Tn5 dialysis buffer, and 0.36 vol of 1.85 mg/mL Tn5 [186]. The concentration of Tn5 depended on each purification, but the desired final concentration was 11.5 - 12.5 µM, so if the concentration was higher or lower, the volume ratios were slightly altered to keep the final concentrations constant, with the exception of the annealed primers, as that depended on each batch. The dimerization of Tn5 was a straightforward protocol in which the components, after mixing, were incubated at RT for 1 hour. This allowed numerous conditions to be tested readily in order to obtain optimal ratios, as shown in figure 3.3b-d.

To test the assembly and activity of Tn5, various amounts of Tn5 were used to fragment 50 ng of DNA over 10-20 minutes at 55 °C [186]. If the assembly failed or there were problems with the assay, it was visible in this step. The reaction buffer used for the fragmentation assays differed slightly: TD buffer from the ATAC-Seq protocol [28] was used instead of the buffers outlined by Picelli et al., due to higher Tn5 activity (data not shown).
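The annealing and assembly steps described in this section can be checked with a short Python sketch. It is an illustration under stated assumptions, not the protocol itself: the oligo sequences are copied from table 5 (the 5' phosphate is omitted), the function names are hypothetical, and the volume fractions are used exactly as listed, even though they sum to 1.005 rather than 1.

```python
def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def duplex_length(long_oligo, short_oligo):
    """Duplex length if short_oligo pairs with the 3' end of long_oligo, else 0."""
    target = reverse_complement(short_oligo)
    return len(target) if long_oligo.upper().endswith(target) else 0


def annealing_ramp(start_c=80, stop_c=10, step_c=2):
    """Temperatures (deg C) held for one minute each during annealing."""
    return list(range(start_c, stop_c - 1, -step_c))


# Volume fractions from the assembly recipe in the text. They sum to 1.005,
# so they are treated as proportions of the nominal reaction volume.
ASSEMBLY_FRACTIONS = {
    "annealed ME": 0.125,
    "glycerol": 0.4,
    "2x Tn5 dialysis buffer": 0.12,
    "Tn5 (1.85 mg/mL)": 0.36,
}


def assembly_volumes(total_ul):
    """Component volumes (uL) for a nominal assembly reaction of total_ul."""
    return {name: fraction * total_ul for name, fraction in ASSEMBLY_FRACTIONS.items()}


TN5ME_A = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
TN5ME_REV = "CTGTCTCTTATACACATCT"  # 5' phosphate not represented

print(duplex_length(TN5ME_A, TN5ME_REV))      # 19
print(len(annealing_ramp()))                  # 36
print(assembly_volumes(50.0)["annealed ME"])  # 6.25
```

Tn5MErev pairs with the last 19 bases of Tn5ME-A, leaving a 14-base single-stranded 5' tail. The ramp amounts to 36 one-minute holds from 80 °C down to 10 °C, and a nominal 50 µL assembly uses 6.25 µL of annealed ME.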
The first attempts at assembling the Tn5 correctly failed, as shown in figure 3.3b. It was discovered that the concentration of Tn5 after purification was about tenfold higher than thought and, as a result, it was likely there was not enough ME to form dimerized Tn5. The lack of an ME band at the bottom of the gel (< 100 bp) indicated this. The excess Tn5 in the reaction was also the likely reason the DNA cannot be seen in the gels in figure 3.3b, because the DNA was trapped by the Tn5.

Once the quantification issue was realized, various amounts of ME were tested in Tn5 dimerization reactions, as seen in figure 3.3c. For a 50 µL reaction, the standard protocol called for 6.25 µL of ME, but a lot of ME remained after the reaction (< 100 bp). Almost no ME was visible with a commercial Tn5 (Illumina) (figure 3.3d), which prompted testing of the amount required for the homemade Tn5. As seen in figure 3.3c, a titration of ME was completed to determine the amount required without affecting the efficiency of fragmentation. When the volume of ME was reduced to 1 - 1.5 µL, the amount of remaining sequence < 100 bp was drastically reduced. Although the fragment size was slightly increased, the input DNA was completely fragmented. The amount of Tn5 within the reaction was also important for the efficiency of fragmentation. The DNA fragmentation was unstable when excess Tn5 was used (figure 3.3d). An important note about the quantification issue is that once the concentration of Tn5 was corrected, batch 2 (B2) showed fragmentation activity (figure 3.3d).

Figure 3.3: (a) Annealing of the Tn5 tspn. 1 µM of sample was loaded in each lane. The pre-annealed samples are in the green lanes and the blue lane is the annealed sample. EMSA using an 8% TBE gel run for 30 minutes at 4 °C. (b) 1.2% agarose gels showing Tn5 failing to cleave 50 ng of DNA. 0 Tn5 conditions indicate the addition of 1 µL of the Tn5 storage buffer.
Left panel: Different concentrations of Tn5 in the reaction, using the first batch of purified Tn5 (B1). The gel was run for 30 minutes at 125 V. Right panel: Two different batches (B2 for batch 2). The gel was run for 50 minutes at 75 V. (c) 1.2% agarose gel run for 40 minutes at 100 V. A titration of the amount of ME used in the assembly of the Tn5 dimer. The fragmentation assay was run for 20 minutes at 55 °C. The Tn5 was able to assemble, as shown by the fragmented DNA. (d) 1.2% agarose gel run for 45 minutes at 100 V. This was the first example of Tn5 cutting the DNA after incubation for 20 minutes at 55 °C. Standard Illumina Tn5 was used as a control and variable amounts of the annealed ME were used to assemble the Tn5 for either B2 or batch 3 (B3). The amount of Tn5 was also varied (1 µL was used in the standard lanes and 2.5 µL was used in the 2.5x lanes). (e) 1.2% agarose gel run for 60 minutes at 75 V. 50 ng of DNA was added to each lane and the samples were incubated for the indicated time at 55 °C. The amount of ME used to assemble the Tn5 dimer is listed. These results showed the assay ran to completion (most fragments < 1 kb) within 7 minutes, as previously described [186].

The last optimization was to determine the appropriate time for fragmentation. The time required to complete the fragmentation assay in Picelli et al. was 7 minutes at 55 °C [186]. Once the other optimizations were completed, the quality of the Tn5 was judged by the time necessary for complete fragmentation (majority of fragments < 500 bp). A time course was completed on the Tn5, shown in figure 3.3e, which confirmed high-quality Tn5 and matched previously published work [186]. The fragment sizes all followed roughly the same distribution regardless of the amount of ME included in the assembly reaction, indicating that the concentration of Tn5 was the most important factor in assembly.
Thus, the lowest volume of ME was sufficient to assemble the majority of Tn5 in the assembly reaction (figure 3.3e).

3.3.3 Tn5:mHDAC fusion plasmid creation

After purification and confirmation of high-quality Tn5, the next stage was to create a modified Tn5 plasmid with mHDAC fused to it (Tn5:mHDAC). The first step was to determine the best cloning technique. As discussed above, the mHDAC domain was 108 bp in length, which made it either a small or a large insert depending on the method being used [271–273]. The Tn5 plasmid was 8 kb in length (figure 3.2a) and the insertion of a relatively small fragment of 108 bp was expected to be a little challenging. As will be discussed in this section, the process ended up not being successful and the plasmid was ultimately generated by GenScript with their CloneEZ method.

There were two methods which could have been used for this process: a PCR-based method to ligate the modified linear plasmid into a circular plasmid [cite this?] and Gibson Assembly (GA) [271–273]. Ultimately, the decision was made to use GA due to both the expertise within the laboratory and the availability of commercial kits. The experimental design for GA, described in figure 3.4a, uses overlapping regions of two or more fragments to join them together with sticky-end ligation [271]. The GA strategy used here was to synthesize the 108 bp of mHDAC first and add the overlaps by PCR afterwards. This was because some optimization of the overlap length might have been needed, and ordering new extension primers was judged to be easier. As a result, mHDAC-FWD and mHDAC-REV were ordered and used for subsequent PCR (table 5). The primers were created with a 20 bp overlap due to the size of the Tn5 plasmid (New England BioLabs) [cite NEB protocol?].

Figure 3.4: (a) Figure 1 from Gibson et al. [271] showing the process of Gibson Assembly. The strategy is to have overlapping primers, such that the two intersections will come together and be ligated back together via sticky-end ligation. (b) Sequence for the insert of mHDAC with the associated primers. The blue bar indicates the position of insertion (position 1470 of the Tn5 plasmid). The bar above the feature names indicates the amino acid (numbered below) within the given feature. The number at the end of each row is the DNA position in the plasmid. The primers are shown with their sequence, and their direction is indicated by the arrow direction. The plasmid map was generated using SnapGene.

The plasmid sequence and primers around the insertion point for the mHDAC are shown in figure 3.4b. It was important to insert the sequence between the Tn5 transposase and the Mxe intein because of the intein cleavage. As a result, the insertion was directed to bp position 1470. The linearization of the Tn5 plasmid was straightforward and completed with a single PCR reaction (figure 3.5a) using the Tn5-linear-FWD and Tn5-linear-REV primers (table 5). As some contaminating products were seen on the gel, the linear plasmid fragment at 8 kb was cut out of the gel and gel purified. Although this resulted in around a 50% yield compared to a PCR Clean-Up (data not shown), it produced a very clean linear product (figure 3.5c-d).

The GA insert of mHDAC was created via PCR using the mHDAC-FWD and mHDAC-REV primers (table 5). The product was visualized via gel electrophoresis as shown in figure 3.5a. The product was cleaned up with the Qiagen PCR Clean-Up Kit, but after GA failed, it was determined that it also required gel purification in order to generate GA products (figure 3.5c-d). For unknown reasons, GA continually failed when the gel extraction was not completed. Once this issue was resolved, the Tn5:mHDAC product formed, as shown by the large band in figure 3.5d. Although the band was consistent with a formed plasmid, after gel purification of the product and transformation into E.
coli, no colonies formed on ampicillin (AMP) resistance plates. The GA process was repeated successfully numerous times; however, no colonies formed even though positive controls, such as the pUC control in figure 3.5c, were successful. As a result, the assembly was abandoned and the plasmid and mHDAC were sent to a commercial provider, GenScript, whose CloneEZ method was successful.

Figure 3.5: (a) 1.2% agarose gel (60 min at 120 V) showing the PCR product from adding the overlapping tails to the mHDAC insert (108 bp). The final product was 145 bp. (b) 1.2% agarose gel (25 min at 120 V) showing linearization of the pTXB1-Tn5 plasmid. A and B correspond to two independently purified Tn5 plasmids and the linearization was completed in duplicate as indicated by A1, A2, B1, B2. (c) 1.2% agarose gel (30 min at 120 V) showing the Gibson assembly using a positive pUC control (included with the Gibson Assembly Kit) and the Tn5:mHDAC attempt. As shown in the "Gibson pUC" lane, the positive control assembled. There was no assembly in the Tn5 tests. The ratio indicates moles of plasmid : moles of mHDAC insert. (d) 1.2% agarose gel (40 min at 100 V) showing a positive result from the Gibson assembly for the Tn5:mHDAC assembly. The ratio indicates moles of plasmid : moles of mHDAC insert. (e) The optimized codon sequence provided by GenScript for E. coli. This was the final DNA sequence used for the mHDAC insert.

3.3.4 Tn5:mHDAC protein expression

The Tn5:mHDAC plasmid was transformed into BL21(DE3) E. coli and the sequence was verified via Sanger sequencing (section 3.3.1) using the primers mHDAC-Sanger1 - 3 (table 5) and T7 and T7-Rev (provided by Laragen Corporation). After induction of Tn5:mHDAC was confirmed in a small-scale induction (figure 3.6a), bacterial stocks were created from colonies B and C and subsequently stored at -80 °C for future induction experiments.
Afterwards, protein expression was completed for both Tn5 and Tn5:mHDAC in the same batch (figure 3.6b). This was done to limit batch effects and because previous batches of Tn5 were over 6 months old and had started to lose efficiency [186]. Purifying both proteins at the same time was also important to avoid discrepancies in the assembly and activity assays. The purification resulted in the expected size shift between Tn5 and Tn5:mHDAC (53 kDa versus 57 kDa). This indicated mHDAC had been correctly inserted into the protein and expressed without issue (figure 3.6b).

The next stage was to assemble the two Tn5 proteins and, as with the standard Tn5, there were problems assembling the Tn5:mHDAC. Even though the various amounts of ME previously used with Tn5 were tested, there was no fragmentation of the DNA (figure 3.6c). Although the Tn5 and Tn5:mHDAC were quantified in the same manner, the Tn5:mHDAC preparation contained more protein and had to be diluted in order to obtain successful assembly. The reason for the quantification differences was unknown, and the best way to determine the correct concentration was to complete titrations of Tn5 and ME to find the correct ratio for each batch. Standard Tn5 was less variable but often required a few assembly conditions to identify the most active (data not shown).

However, once the assembly had been confirmed, the robustness of the assay can be seen in figure 3.6d. The DNA was fragmented for 7 minutes at 55 °C, and the first two Tn5 lanes showed the standard reaction conditions (1 enzymatic unit (U)) with optimal fragmentation. The numbers in subsequent lanes were the number of U added to each reaction. It was predicted that adding fewer U would decrease the amount of fragmentation. The result showed consistency between batches, the same activity between Tn5 and Tn5:mHDAC, and fragmentation that corresponded to the amount of U added (figure 3.6d).
3.3.5 Tn5:mHDAC binding to MEF2

For the proof-of-concept to show a positive result, the ability of Tn5:mHDAC to bind MEF2 needed to be confirmed. The first step was to ensure the MEF2 used in the experiments was capable of binding to DNA. For this, MEF2a (MEF2 for simplicity) was purified as previously described [68], and the 3x MEF2a binding sequence from the plasmid 3XMEF2-luc (table 4) was used. Since the binding sequence was only 10 bp in length with a small spacer in between, primers M2RS-EMSA-top and M2RS-EMSA-bot were purchased and annealed together (table 5). The resulting 56 bp stretch of DNA was the same sequence found in the 3XMEF2-luc plasmid, which allowed the plasmid to be used in future experiments (discussed below).

To confirm the MEF2:DNA complex was being formed, an EMSA was completed by loading 1 µM of the M2RS-EMSA DNA and increasing the concentration of MEF2 until all three binding sites were occupied, as seen in figure 3.7a. The amount of MEF2 needed to occupy all 3 sites was found to be around 2.5 µM, which was an almost stoichiometric ratio. The difference was likely due to different quantification methods. When the Tn5:mHDAC tests were completed, a slightly lower concentration was used because it was not favorable to have MEF2 that was not bound to DNA in the assay. After confirming MEF2 binding to the M2RS-EMSA DNA, the next step was to determine if Tn5:mHDAC could bind to the complex, as seen in figure 3.7b-c.

The first test of Tn5:mHDAC, seen in figure 3.7b, was an extended titration using 2.5 µM of DNA and MEF2 and a variable amount of 12.5 µM Tn5:mHDAC ranging from 0.5 µL to 7 µL. These reactions were left at RT for 30 minutes, which allowed time for the Tn5:mHDAC to bind the MEF2:DNA complex. The result showed a supershift once the concentration of Tn5:mHDAC was increased, showing the Tn5:mHDAC was capable of binding the MEF2:DNA complex.
The titration for the required Tn5:mHDAC amount was tightened in figure 3.7c, where 5 µM of DNA and MEF2 was used with a titration of 0-6 µL of the 12.5 µM Tn5:mHDAC. The result showed that about 4 µL added to the 5 µM DNA (second-to-last lane) was sufficient to shift the majority of the MEF2:DNA complex. As a result, the same ratio of Tn5:mHDAC was used for subsequent reactions.

Figure 3.6: (a) 12% SDS-PAGE run for 2 hours at 200 V. The pre- and post-IPTG induction of four different Tn5:mHDAC colonies. The arrow indicates the Tn5:mHDAC protein, which was expected at around 85 kDa. (b) 12% SDS-PAGE run for 2 hours at 200 V. A purification gel for Tn5 and Tn5:mHDAC completed in the same batch. Tn5 has a molecular weight of 53 kDa and Tn5:mHDAC has a molecular weight of 57 kDa. There is a shift up for the Tn5:mHDAC, indicating the mHDAC probe was in the plasmid. (c) 1.2% agarose gel run for 30 minutes at 110 V. This gel compared Tn5 assembled with 1.5 µL of ME and a titration of ME for two batches (B1 and B2) of Tn5:mHDAC. The assembly failed, indicating a quantification issue with the enzyme. (d) 1.2% agarose gel run for 45 minutes at 100 V. Activity assay for Tn5 and two batches of Tn5:mHDAC. The numbers indicate the number of enzymatic units used in the lane. The less Tn5 or Tn5:mHDAC was used, the less fragmentation there was.

3.3.6 Tn5:mHDAC Specificity and Assay testing

After confirming that Tn5:mHDAC bound the MEF2:DNA complex, the next stage was to test whether Tn5:mHDAC could specifically insert a barcode near MEF2. To accomplish this, a linear piece of DNA containing the 3x MEF2 sites was created by PCR on the 3XMEF2-luc plasmid backbone. For this, M2R-DNA-FWD and M2R-DNA-REV were used to create a fragment (M2R-DNA) as seen in figure 3.8a, where Fragment A was 500 bp away from the closest MEF2 binding site and Fragment B was 850 bp from the nearest MEF2 binding site. This strategy was used in order to create multiple fragment lengths if needed.
As will be discussed in detail below, the strategy was used as a proof-of-concept but was inherently flawed due to Tn5’s inability to insert near TFs bound to DNA.

The subsequent experiments were to determine both the ideal time and the concentration of enzyme needed for further study. To create the conditions, MEF2 was first bound to DNA as described previously [68] with a minor adjustment of the molar concentrations. For these reactions, MEF2 was added at a 2.5x molar concentration compared to the M2R-DNA. After the MEF2:DNA complex had formed, Tn5 or Tn5:mHDAC was added to the reaction at a 5x molar concentration compared to the M2R-DNA, such that the final molar ratios were 1 M2R-DNA : 2.5 MEF2a : 5 Tn5 or Tn5:mHDAC. The reaction was left at RT without MgCl2 for 30 minutes as in earlier tests (figure 3.7b-c). After the incubation, the reaction was started by adding MgCl2 to a final concentration of 25 µM for the time labeled in figure 3.8b. The reaction was also completed at a lower temperature of 37 °C, compared to 55 °C in previous assays, because the higher temperature could have affected the ability of MEF2 to stay in complex with DNA. An added benefit was that having the Tn5 work at 37 °C supported the long-term goal of completing the assay in living cells.

The result showed that Tn5:mHDAC (top gel) not only cut a fragment at 850 bp more readily, but also required only 1-3 minutes to do so. Without the addition of MEF2 (blue lanes), the DNA was not preferentially fragmented anywhere. Another interesting result was that at the ten-minute mark there was less specificity, likely due to more Tn5:mHDAC fragmenting the 850 bp fragment, although the exact reason was not explored further. In contrast, using Tn5, the fragment at 850 bp was less prevalent at the 1-3 minute mark and became more frequent as the assay time was increased to 10 minutes (figure 3.8b).
The activity of the Tn5 was lower than expected when compared to the Tn5:mHDAC without MEF2 present in the reaction; this result indicated specificity of the Tn5:mHDAC for MEF2.

Figure 3.7: (a) EMSA of MEF2a binding to DNA (b) EMSA of Tn5:mHDAC binding to MEF2:DNA (c) EMSA of MEF2:DNA and then MEF2:DNA:(Tn5:mHDAC)

There was a flaw in the design of the experiment, which is highlighted in figure 3.8c: the presence of MEF2 created a sterically unavailable region of the DNA molecule which Tn5 could not cut. This phenomenon is well known in ATAC-Seq, where TF footprinting uses it to identify TFs bound to DNA. In figure 3.8c, the assay was run for three minutes using the same conditions as in figure 3.8b, except that variable amounts of the enzymes were used. Although the Tn5:mHDAC lanes had less background noise, it was impossible to say with confidence that the specificity was higher. For this reason, another type of assay would have been needed to show the specificity. The plan to fix the issue is discussed in the future directions section.

3.3.7 Mutation of Tn5:mHDAC

The final step taken during the development of TTT was to mutate Tn5 to reduce its activity. The motivation was to reduce the number of off-target transposition events. The strategy was to limit the ability of Tn5:mHDAC to bind DNA on its own, so transposition events would only happen when the protein was bound to a MEF2 complex. All other Tn5:mHDAC would not be able to bind DNA, which would eliminate the need to wash away the unbound complex and would prevent the complex from binding DNA without specificity.

Many mutations have been made in Tn5 (discussed in detail in section 1.7.2). However, since it was not desired to reduce the transposition activity itself, only residues thought to interact with DNA were considered. Due to the negative charge of DNA, positively charged residues such as lysine (K) and arginine (R) often interact directly with DNA [cite?].
After looking into the potential interface between DNA and a Tn5 dimer, two distinct residues were found: K291 and R250. As seen in figure 3.9a, the positions of K291 and R250 indicate likely binding within the major and minor grooves of the target DNA, so when the Tn5 dimer was formed, there would be four positions of binding to DNA. Although more modeling would need to be done, it appeared the K291 residue would fit into the minor groove and the R250 residue would correspond to the major groove. The residues were located far from the DDE active site and from the residues important for tspn binding, so point mutations were unlikely to alter either the transposition activity or dimerization.

Figure 3.8: (a) Design of the DNA fragment being used. (b) Experiment testing the time needed for the experiment (c) Testing various concentrations of the DNA

For these reasons, both residues were mutated to a negatively charged residue, glutamic acid (E). By changing the charge of the residue, it was hypothesized the interface of Tn5 with DNA would become repulsive due to charge-charge interactions. If the hypothesis was correct, unless the Tn5 was held in close contact with DNA, the negative charge at the interface would eliminate any off-target insertion of the DNA barcode. To create the double mutants, Q5 Site-Directed Mutagenesis (NEB) was used. The mutants were created using the primers R250E-FWD, R250E-REV, K291E-FWD, and K291E-REV, and the sequences were confirmed with Sanger sequencing from Laragen (figure 3.9b). The R250E mutation altered the original bases from CGT to GAG and the K291E mutation altered the bases from AAA to GAG (figure 3.9b). The double mutants of Tn5 and Tn5:mHDAC each carried both the R250E and K291E substitutions.

After confirmation of the mutagenesis, the two proteins were purified as discussed previously without any optimization required (data not shown).
To ensure there were no batch effects, both mutants and the standard enzymes were purified at the same time. Afterwards, the activity was tested as previously discussed (55 °C for 7 minutes) with two separate dimerized assemblies (figure 3.9c). As can be seen, the double-mutant Tn5 and Tn5:mHDAC showed a drastic reduction in fragmentation when compared to the unmutated Tn5 and Tn5:mHDAC. There was almost zero fragmentation with the mutants, and in some cases it was difficult to see any noticeable difference between the control and the fragmented sample.

Although this was where the project was wrapped up for the present, the next experiments would have been to verify the dimerization of the mutant Tn5 samples using EMSA, to check that the point mutations did not hinder dimerization. The following step would have been to repeat experiments similar to those in figure 3.8b-c. The proof of concept for the mutation of Tn5 showed a positive result but was not explored further in this body of work. In future work (discussed below), the mutations would likely reduce off-target insertion and favor targeted insertion. It was left unknown whether the mutations destroyed all insertion, but as seen in figure 3.9c, the DNA appeared to have been fragmented slightly.

Figure 3.9: (a)

3.4 Discussion

In this chapter, TTT was developed, which sought to create a new TF ChIP-Seq-like assay. The goal of the method was to map TF binding patterns in a live, single cell without having to use antibodies to enrich the target DNA. The assay, although not completed, showed the possibility of preferentially inserting a barcode at a given position near a target TF. The method involved many techniques, including protein purification and mutagenesis, and allowed the flexibility of creating a probe for numerous different targets, including potential drugs. The drug could be chemically linked to the Tn5 through expressed protein ligation (EPL).
The assay started with standard Tn5 and evolved into the creation of a mutated Tn5, which limited the transposition activity. The initial assay had a fundamental flaw due to the sterics of MEF2 bound to DNA, so future work would need to replace it with a different assay. One proposed improvement would be to use a much longer fragment of DNA for the MEF2 to bind. The reasoning would be to limit the chance Tn5 would cut directly adjacent to the MEF2, as there would be more positions where the Tn5 could insert. Since the cut-off point of the Tn5 was around 100-500 bp, the fragment lengths of 500 bp and 850 bp were likely too small to be cut more than once. If each side of the fragment (figure 3.8a) were a few kb in length, the targeted insertion of Tn5:mHDAC would likely be more pronounced.

Another improvement to the tagmentation assay could have been to create an assay which inserted a secondary resistance marker into the 3XMEF2-luc plasmid. This could have been completed by transforming two plasmids with the same origin of replication into E. coli. The first plasmid would produce Tn5 and carry one type of resistance gene flanked by Tn5 ME sequences. The second plasmid would then be transformed into that cell population, and the cells plated on double-resistance bacterial plates. Because both plasmids share one origin of replication, only one plasmid would be maintained, so any colonies formed on the double-resistance plate would have had to result from a transposition between the two plasmids. The colonies could then be sequenced to determine if the resistance gene was near the MEF2 binding sites with either NextGen Sequencing, Sanger Sequencing, or PCR (to determine if the length of the product was altered near the MEF2 binding sites). The system would have to be optimized, but it would have allowed a lot of flexibility both for Tn5 mutagenesis and for future targets to be tested in a high-throughput manner.
The mutations described were specific to the Tn5:DNA interface, so there should not have been any reduction in the ability of the enzyme to dimerize or to insert the tspn. Although this was not explicitly tested, a reduction would have been unlikely given the positions of the mutations and the known regions of importance for both dimerization and the active site. Further work would be needed to verify this, but the mutants had the desired effect of reducing DNA fragmentation (figure 3.9c). A final test of whether the mutant Tn5:mHDAC could still insert when targeted was not completed and would need to be completed in future work to confirm or reject the hypothesis that the mutants retain the ability to insert, which would increase the signal-to-noise.

The TTT method could have been used in a few different ways. The first would be to replace ChIP-Seq for specific targets, as the chemical probe could not only bind more tightly to its target, but methods such as phage display could also produce very specific binders over many rounds of selection [cite a phage display here]. The probe would also be chemically the same for every experiment, limiting potential batch effects, and would be produced in E. coli, reducing the cost compared to antibodies. These advantages would have made the method an alternative to TF ChIP-Seq.

Another way the method could have been used was to run it alongside an ATAC-Seq assay. If the modified enzyme and standard Tn5 were added to the cells at the same time and allowed to incubate long enough for the probe to bind its target, the enzymes could then have been activated with MgCl2. The standard ATAC-Seq could have been used, and the specific binding position of a target could be determined at the same time as the chromatin accessible regions. The method would be used to confirm DNA footprinting currently completed with ATAC-Seq [28, 80, 90].

One of the most exciting possibilities for the assay was the ability to multiplex with many different targets.
Since a unique barcode could be added for numerous different probes, numerous TFs could have been probed at the same time. Assuming the signal-to-noise was high with the mutant Tn5:mHDAC, a library of probes could create a landscape of TF binding. Combining multiplexed TF binding positions with assays such as HiChIP would help to understand the 3D chromatin structure and how TFs such as CTCF and MEF2 are able to mediate long-range interactions. These interactions could then be correlated with gene expression to gain a deeper understanding of the complex system within cells.

4 Computational Analysis of HCM HiC

4.1 Introduction

To gain a deeper understanding of the chromatin dynamics of HCM, this chapter builds on the 2017 study by Rosa-Garrido et al, which used RNA-Seq and HiC to look for a correlation between differential looping and gene expression [10]. The authors used mouse cardiomyocytes to compare healthy, CTCF knockout (CTCF-KO), and TAC conditions. The study did not discover striking differences between the CTCF-KO and TAC conditions but found some when each was compared to the healthy samples. The authors linked the differentially expressed genes (DEGs) to more localized chromatin dynamic changes rather than to complete TADs being altered [2, 3, 10, 190]. Rosa-Garrido et al mention this issue and note that while a CTCF-KO was able to mimic some of the features of heart failure, specifically HCM, there were still a number of changes which were not found in the TAC samples [10].

Due to the relatively coarse resolution of HiC (five kilobases) and the high background noise in the HiC protocol, smaller changes mediated by proteins or other factors might not be visible. Although newer techniques, such as MicroC [274], have sought to obtain even higher resolution HiC data by using MNase instead of restriction enzymes, the assays still have high background issues and are complex to analyze at a gene expression level.
Although there has been a concerted effort by the field to understand and analyze HiC-generated contacts with tools such as HiCCUPS [275], Juicer [276], Fit-HiC [277], HiC-Pro [278], and others, the statistical results inevitably differ depending on the tool being used [279]. As a result, it can be challenging to be confident in the small changes which affect gene expression.

The goal of this project was to determine if there were MEF2/HDAC4-mediated loops that were being altered in HCM, as the hypothetical model would suggest (figure 1.13). To test this, two distinct approaches were created to analyze the HiC data generated in the Rosa-Garrido et al manuscript [10], in order to have confidence in the result. Although they used similar methodology, key differences included reversing the order of the filters and the HiC resolution used.

These two approaches highlight the challenge of using HiC to infer ChIP-Seq interactions and why the field has generally not gone down this computational direction, preferring instead new methods such as HiChIP [53] to enrich HiC data [280]. Although there was some overlap between the two methods, there were too many variables to control to obtain statistically defensible results. These variables will be discussed throughout this chapter. Although the main conclusion was the necessity for HiChIP on cardiomyocytes to better understand the gene expression of HCM, the computational approaches developed here enabled higher quality analysis in future work.

4.1.1 Experimental Design

As mentioned in the Introduction, the main goal of this project was to determine if MEF2-mediated looping could be discovered using publicly available data-sets.
Since the advent of HiC in 2009 by Lieberman-Aiden et al [70], there have been over 5,000 publications citing the work and, as such, thousands of HiC maps have been generated in various organisms, testing a multitude of diseases and other biological questions such as the cell cycle. Along with the thousands of ChIP-Seq experiments completed by the ENCODE consortium alone, if these two types of data could be combined, very specific questions about TF- or histone-mediated looping could be answered without further experimentation [280].

The first method described here was similar to what was previously described [280], where the authors attempted to use HiC, ChIP-Seq, and RNA-Seq data to determine the regulation of gene expression through the 3D chromatin landscape and the effect of a range of TFs or histone modifications. The result of that study was very speculative and the authors had a very broad scope, so the initial methodology generated here was designed to look at very specific contacts. Due to the mechanistic evidence of MEF2/HDAC localization being lost under hypertrophic activation (figure 1.13), this type of analysis could have been insightful to better understand HCM and to try to confirm the hypothetical model. The second approach was mainly generated as a way to confirm the findings of the first approach, because it essentially worked backwards and, if the findings were strong, should have reached the same conclusions. The overall workflow of the computational methods can be seen in figure 4.1 and will be discussed here.

The first stage in Method 1 was to use the dump command in the Juicer package [276], which allowed the data to be separated by chromosome and written in a format from which the necessary Fit-HiC input files could be generated from the HiC data of Rosa-Garrido et al [10]. After the generation of input files with a custom R script, Fit-HiC [277] was used to filter the HiC data such that only the strongest contacts were kept for subsequent analysis.
The Fit-HiC package uses a number of statistical tools, such as splines and binomial distributions, to generate p-values and subsequent q-values for each contact generated during a HiC experiment [277]. The result is a set of high-confidence HiC contacts which are unlikely to be the product of random ligation events or other experimental bias. The specific settings used for these sets of experiments will be discussed in the results section.

After the initial Fit-HiC filtering, differential contacts between the healthy and TAC (HCM) samples were determined and the regions which did not contain any contacts were removed. The contacts were then separated into categories according to whether they were present in only one sample or shared between them, by subtracting the contact counts from the normalized contact counts generated in Fit-HiC. After this enrichment, publicly available MEF2 ChIP-Seq data [281] was used to keep only contacts which overlapped a ChIP-Seq peak [280]. The final stage of the analysis pipeline was to correlate the contacts containing a ChIP-Seq peak with the RNA-Seq data from the Rosa-Garrido manuscript [10]. The filtered contacts would then form a quasi-HiChIP data-set in which the intra-TAD regions mediated by MEF2 could be examined. Although MEF2 was the focus, the pipeline was written in such a way that any ChIP-Seq data could be used, including other TFs or histone modifications.

The second approach (Method 2) also used the Juicer package [276] to dump the data, and additionally used the eigenvector function to bin and determine HiC compartments at a 100 kb resolution. The next step was to use these data to assess all the HiC contacts at a low resolution of 100 kb (1 bin) to determine where the changes between the healthy and HCM samples were. After binning the genome and keeping the regions which had differential contacts within the 100 kb window, the next filter removed any bins which did not contain a DEG [10].
The final filter was then to remove any of the regions which did not have a MEF2 ChIP-Seq peak within a region of 1 bin.

Figure 4.1: (a)

The two methods used the same data and, as will be discussed in the results section, obtained somewhat variable results. However, it is important to understand the data which was used so that the caveats of the analysis can be appreciated. The first issue was the challenge of obtaining MEF2 ChIP-Seq data in cardiomyocytes. Since the HiC and RNA-Seq data being utilized were from mouse cardiomyocytes [10], mouse ChIP-Seq data was required. The Cistrome Data Browser (www.dc2.cistrome.org) was used to find MEF2 ChIP-Seq experiments which had been completed in mice. Of the three studies probing MEF2C [281–283], only one passed the quality control filters applied by Cistrome. One study completed MEF2C ChIP-Seq in mouse cardiomyocytes [282], but it had high background and fewer than 500 total peaks called with a 10-fold enrichment (www.dc2.cistrome.org). The second study used B-lymphocytes [283] and had similar issues with high background and low enrichment. As a result, the remaining option was to use the MEF2C tracks generated by Telese et al [281] in mouse cortical neurons, which passed all of the quality control checks.

The other option would have been to use MEF2A ChIP-Seq tracks, of which there were also only three. Only one sample passed the quality control measures, and it came from the same paper by Telese et al in cortical neurons. The other two studies were completed in HL-1 (ATTC number: SCC065) cardiomyocytes [284] and C2C12 (ATCC number: CRL-1772) myoblasts [285]. Although these cell types are closer to primary mouse cardiomyocytes, the difference in binding and the lack of high-quality data made the choice of ChIP-Seq track challenging.
Ultimately, the MEF2C track from Telese et al was used in the analysis because of the data quality and the conserved nature of MEF2. It is likely the binding of MEF2C is conserved throughout developmental stages and, after cell differentiation, the regions are sequestered by a similar mechanism, as the use of MEF2C by cells is ubiquitous in many tissues during development [255]. Although MEF2A could have been used, the main MEF2 implicated in HCM has been MEF2C [2, 7, 8, 223]. Due to these factors, the MEF2C track from cortical neurons was used in the analysis. In future, it would be ideal to have all the conditions from the same cells to complete this analysis.

The RNA-Seq raw files were downloaded from Rosa-Garrido et al and subsequently run through an RNA-Seq pipeline (https://github.com/ndu-UCSD/LJI_RNA_SEQ_PIPELINE_V2) to determine the differential gene expression data for healthy and HCM. The raw data was used to ensure the data processing was completed in a way in which all the significant genes could be determined. The pipeline will be discussed in a little more detail in the results section, but the overall method maps reads to the genome and uses DESeq2 [286] to determine the DEGs.

The goal of these computational approaches was to determine if there were DEGs implicated in HCM being mediated by MEF2C through chromatin rearrangement. Although this method used ChIP-Seq tracks from a different mouse tissue [281] and compared them to in vivo work completed by Rosa-Garrido et al [10], the results might have been able to explain some genes directly controlled by MEF2C. Although the method described focused on MEF2 because of the mechanistic evidence of maladaptive MEF2 regulation, this technique could readily be used to study a variety of other TFs.

4.2 Results

4.2.1 Method 1 Results

The first method first needed to determine the best HiC package to use for filtering the data.
There are a number of tools which can be used to generate high confidence contacts, and previous work has shown the strengths and weaknesses of various tools [279]. For this analysis, two tools developed in the Ay lab, Fit-HiC [277] and a newer tool, Mustache [biorxiv], were tested. Both packages attempt to filter the most important HiC contacts but do so in different ways. Fit-HiC is less stringent and allows many more contacts to be retained through the analysis [277], while Mustache is a newly developed tool which seeks to generate the most important contacts for genome structure. Fit-HiC uses spline fitting [277], whereas Mustache uses Gaussian distributions to probe the maximum peak frequencies, similar to how single molecule fluorescence microscopy filters based on the highest intensity. Due to this difference, Fit-HiC retains much more of the background and lower frequency contacts, which could be important for more localized changes in chromatin structure. For these reasons, both packages were used to test which would generate the most relevant contacts.

To run the Fit-HiC package, the HiC data from the Rosa-Garrido manuscript had to be converted into the proper file formats for the package. To complete this, a HiC data processing package called Juicer [276] was used. The "dump" command from Juicer was used to extract chromosome specific contacts from the .hic file: java -jar ../juicer_tools.jar dump observed KR <hic_file.hic> 1 1 BP 5000 <outfile_file.txt>. The "1 1" indicates the chromosomes to be unpacked and the "KR" indicates the use of Knight-Ruiz balancing to normalize the number of reads across the sample [276]. These settings retained the 5 kb resolution and prepared the files to be used in subsequent analysis with Fit-HiC. The file contained all the contacts from the sequencing file, which could then be read into R.
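One way the dumped contacts could be reshaped for Fit-HiC is sketched below in Python. This is only an illustration and not the script used in this work. It assumes the dump output is a three-column text file (start of bin 1, start of bin 2, count), that Fit-HiC takes an interactions table of (chr1, mid1, chr2, mid2, count) and a fragments table of (chr, extra field, mid, marginal count, mappable flag), and that rounding the KR-normalized counts to integers is acceptable. The function name `dump_to_fithic` is hypothetical.

```python
import math
from collections import defaultdict


def dump_to_fithic(dump_lines, chrom, resolution=5000):
    """Convert Juicer 'dump' rows (bin_start1, bin_start2, count) into
    Fit-HiC style interaction and fragment records for one chromosome."""
    interactions = []
    marginals = defaultdict(int)
    half = resolution // 2
    for line in dump_lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        start1, start2 = int(fields[0]), int(fields[1])
        count = float(fields[2])
        if math.isnan(count):
            continue  # balancing can produce NaN for sparse bins
        count = int(round(count))  # assumption: integer counts are wanted
        if count == 0:
            continue
        # Bins are represented by their midpoints.
        mid1, mid2 = start1 + half, start2 + half
        interactions.append((chrom, mid1, chrom, mid2, count))
        marginals[mid1] += count
        if mid2 != mid1:
            marginals[mid2] += count
    # Assumed fragment columns: chr, extra field (0), midpoint,
    # marginal contact count, mappable flag (1).
    fragments = [(chrom, 0, mid, total, 1)
                 for mid, total in sorted(marginals.items())]
    return interactions, fragments
```

The two returned lists would then be written out as tab-separated files, one pair per chromosome.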
The two required files for Fit-HiC were the <fragments> and <interactions> files [277], which were generated using a custom R script. After the generation of those files for each chromosome, Fit-HiC could then be run using the following settings: fithic -r 5000 -l -f <fragments> -i <interactions> -o -L 10000 -U 2000000 -x intraOnly. The settings indicate the following: 5000 was the resolution of the HiC data, 10000 was the minimum distance between contacts (10 kb), 2000000 was the maximum distance between contacts (2 Mb), and intraOnly tells the package to only look within the same chromosome. Only the intrachromosomal contacts were used because the changes in gene expression are likely to involve close contacts within TADs, as was shown in the Rosa-Garrido results [10]. The output from Fit-HiC retained about 1.2 million contacts with a q-value less than 0.005, with about 350,000 contacts overlapping between healthy and HCM (figure 4.2a).

After filtering the HiC files with Fit-HiC, the next step in the analysis was to remove any contacts that did not have a ChIP-Seq peak on either end of the contact. To do this, a custom R script was written which would take the mean position of each MEF2C ChIP-Seq peak [281] and extend it by a user defined number of bins in either direction, where 1 bin is 5 kb (figure 4.2b). Although various bin numbers were tested, extending out more than 2 bins in each direction greatly increased the number of contacts. As shown in figure 4.2c, the number of contacts was reduced almost by half when using a 1 bin extension instead of 2 bins. Since the objective of this study was to determine the local changes, using a single bin was also desired because it should have correlated to the direct binding or loss of binding of MEF2C.

Figure 4.2: (a)

After filtering for the contacts with MEF2C ChIP-Seq peaks, it was found there were many contacts which had similar positions.
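The peak-based contact filter described above can be illustrated with a short Python sketch. It is not the custom R script from this work. It assumes contacts are given as pairs of bin start coordinates, peaks as (start, end) pairs on the same chromosome, and that a contact is kept when at least one of its ends falls within the extended peak window. The names `peak_bins` and `filter_contacts` are hypothetical.

```python
def peak_bins(peaks, resolution=5000, extend=1):
    """Return the set of bin indices covered by each peak midpoint,
    extended by `extend` bins in both directions."""
    bins = set()
    for start, end in peaks:
        centre_bin = ((start + end) // 2) // resolution
        for offset in range(-extend, extend + 1):
            bins.add(centre_bin + offset)
    return bins


def filter_contacts(contacts, peaks, resolution=5000, extend=1):
    """Keep contacts with a peak window on at least one end."""
    allowed = peak_bins(peaks, resolution, extend)
    return [
        (a, b) for a, b in contacts
        if a // resolution in allowed or b // resolution in allowed
    ]
```

Raising `extend` from 1 to 2 widens every peak window from 3 to 5 bins, which is why the number of retained contacts grows quickly with the extension.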
So, the final stage of the ChIP-Seq filtering was to merge contacts which were within 1 bin on both sides of the contacts. This was a multi-step process which involved first exporting the contacts to a BED file and then using the bedtools package [287] to merge any windows which were overlapping. To do this, the bedtools merge command was used with the following settings: bedtools merge -d 1000 -i <filtered ChIP HiC result.bed> >. This command merged the windows which were within 1 kb of each other, which merged the contacts 1 bin away due to the rounding being conducted. After the merging, the 16,209 contacts were reduced to 12,150 (figure 4.2d). The final number of contacts represented the number which had a MEF2C ChIP-Seq peak within a 10 kb range on either side of the contact and was about 1% of the total contacts filtered by Fit-HiC.

Surprisingly, there was little evidence of MEF2 dimers or higher order structures in the data, as only 7% of the total HiC loops had a dimer (data not shown). As a result, it was not possible to determine if the MEF2 dimers (and higher order structures) mediated by HDAC4 were being altered between healthy and HCM. The lack of these structures is likely an artifact of the lower resolution of HiC, and other methods such as HiChIP could be used to determine if these MEF2:HDAC4 higher order structures are present in vivo as seen in in vitro crystallography studies [67–69], discussed in the Introduction.

The final stage of the computational analysis in Method 1 was to correlate these contacts with the RNA-Seq profiles generated by Rosa-Garrido et al [10]. The raw RNA sequencing results were downloaded and subsequently analyzed using a customized Python pipeline. The pipeline mapped the sequencing reads using an RNA-Seq aligner, STAR (version 2.7) [288]. After the mapping, the sample raw counts were merged and general quality control of the sequencing data was completed using Qualimap2 [289].
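Returning to the window-merging step described above, the effect of merging with a 1 kb distance can be sketched in Python. This is a simplified stand-in for illustration, not bedtools itself. It assumes half-open BED-style intervals and that two windows on the same chromosome are merged when the gap between them is at most `max_gap`. The function name `merge_intervals` is hypothetical.

```python
def merge_intervals(intervals, max_gap=1000):
    """Merge (chrom, start, end) intervals whose gap is <= max_gap."""
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last and last[0] == chrom and start - last[2] <= max_gap:
            # Close enough to the previous window: extend it.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(m) for m in merged]
```

With 5 kb bins, adjacent bins have a gap of 0 and are therefore always merged, while bins separated by a full empty bin (a 5 kb gap) stay separate.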
The samples which passed the QC process with acceptable mapping were then used for subsequent analysis. Differential gene expression analysis was then run on the RNA-Seq samples using DESeq2 [286] to determine the differentially expressed genes between the healthy and TAC conditions. The DESeq command [286] was used on the raw RNA-Seq counts in R using the triplicate samples in both the healthy and TAC conditions, which resulted in almost 1,900 differentially expressed genes with an adjusted p-value of less than 0.05 (figure 4.3).

Figure 4.3: (a)

To determine whether some of the most differentially expressed genes matched what has previously been published, known HCM genes were checked. Known upregulated genes such as collagen, type V, alpha 2 (COL5A2), endothelin 3 (EDN3), and thrombospondin 4 (THBS4), as well as genes known to be downregulated in HCM such as angiopoietin 1 (ANGPT1) and potassium inwardly-rectifying channel, subfamily J, member 3 (KCNJ3), behaved as expected from previous studies [2, 3, 190, 222] (figure 4.3b).

The final stage of the RNA analysis was to find the DEGs which were contained within the ChIP-Seq filtered contacts. For this, a filtration similar to the ChIP-Seq filter was completed using a custom R script. After the filtration, about 15% of the DEGs (273 total) were found within a filtered looping domain among the 12,150 ChIP-Seq filtered contacts. A standard deviation threshold of 0.6 was set on all the discovered genes, meaning that the read counts between samples could not have a standard deviation greater than 0.6 to be kept in the analysis. This was done to eliminate genes whose read counts varied widely between samples. Overall, about 60% of the genes (169) were upregulated in HCM and 104 were downregulated (figure 4.3c).
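The gene filtration described above can be sketched in Python as follows. This is a hedged illustration rather than the custom R script. It assumes each gene is represented by a single position, that a gene counts as inside a looping domain when that position lies between the two loop anchors, and that the standard deviation threshold is applied to the replicate values within each condition; the scale of those values is not specified in the text and is left to the caller. The name `genes_in_loops` and the record layout are hypothetical.

```python
from statistics import stdev


def genes_in_loops(genes, loops, max_sd=0.6):
    """Keep genes that fall inside a loop and have stable replicates.

    genes: list of dicts with keys name, chrom, pos, log2fc,
           healthy (replicate values) and tac (replicate values).
    loops: list of (chrom, start, end) tuples.
    Returns (kept genes, number upregulated, number downregulated).
    """
    kept = []
    for gene in genes:
        inside = any(
            chrom == gene["chrom"] and start <= gene["pos"] <= end
            for chrom, start, end in loops
        )
        stable = (stdev(gene["healthy"]) <= max_sd
                  and stdev(gene["tac"]) <= max_sd)
        if inside and stable:
            kept.append(gene)
    up = sum(1 for gene in kept if gene["log2fc"] > 0)
    return kept, up, len(kept) - up
```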
The result matched expectations given the outstanding hypothesis that gene activation is mediated by HDAC4 relocalizing to the cytoplasm, allowing previously silenced genes to be activated by p300 through mechanisms that are not fully understood (figure 1.9).

To check whether some of the most differentially expressed genes which overlapped with ChIP-Seq peaks were what was expected, the list was once again checked against the known genes. The top genes after ChIP-Seq filtering were similar to the top genes in the RNA-Seq, which indicated there was likely an overlap between MEF2 regulation and the expression of the genes. The top hits included both upregulated and downregulated genes. Some of the top upregulated genes were synaptopodin 2 like (SYNPO2L) [290, 291] and specifically with MEF2 [291–293] (figure 4.4a-c). Other noteworthy upregulated genes found in the analysis were xin actin binding repeat containing 2 (XIRP2) [294, 295], collagen type VIII alpha 1 chain (COL8A1) [296, 297], and fibrillin 1 (FBN1) [2, 298–307]. Some of the known downregulated genes found in the analysis and linked to TAC HCM were phospholamban (PLN) [291, 308–312] (figure 4.4d-f), ANGPT1 [313–315] and KCNJ3 [2,3,190,222].

Figure 4.4: (a)

To determine whether different mechanisms were in place, the sizes of the loops were constrained to either 1.5-2 Mb, 1.0-1.5 Mb, or 0.5-1 Mb to test for a correlation between loop size and differential expression (figure 4.4g). However, in doing this analysis, a false discovery rate (FDR) was tested using randomized loops of the same sizes picked directly from the HiC looping file. Using the same number of loops at the given sizes, the FDR was extremely high when testing over 500 iterations of randomized loops, as shown in figure 4.4g. The mean number of genes passing the filter was calculated, and the FDR was extremely high with a minimum of 55%.
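The randomization test described above can be sketched in Python. This is an illustration under stated assumptions, not the code used in this work. It assumes the FDR is estimated as the mean number of genes recovered by randomly drawn loops divided by the number recovered by the real filtered loops, that the random loops are drawn from a size-matched pool of candidate loops, and that genes and loops lie on a single chromosome. The names `count_genes_in_loops` and `empirical_fdr` are hypothetical.

```python
import random


def count_genes_in_loops(gene_positions, loops):
    """Count genes whose position lies inside at least one loop."""
    return sum(
        any(start <= pos <= end for start, end in loops)
        for pos in gene_positions
    )


def empirical_fdr(gene_positions, real_loops, candidate_loops,
                  iterations=500, seed=0):
    """Estimate the FDR as mean(null gene count) / observed gene count."""
    rng = random.Random(seed)
    observed = count_genes_in_loops(gene_positions, real_loops)
    if observed == 0:
        raise ValueError("no genes found in the real loops")
    null_counts = []
    for _ in range(iterations):
        # Draw the same number of loops as in the real set.
        sampled = rng.sample(candidate_loops, len(real_loops))
        null_counts.append(count_genes_in_loops(gene_positions, sampled))
    mean_null = sum(null_counts) / iterations
    return min(1.0, mean_null / observed)
```

An estimate near 1 means random loops recover about as many DEGs as the filtered loops do, which is the situation reported here.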
These results indicated the filtering method was flawed in that the coverage of the loops, even those controlled by MEF2, was not sufficient to pick out specific genes. Because the 1,900 DEGs were used, the loops which contained a gene were already pre-selected to be associated with HCM. As a result of this extremely high FDR, it was not possible to draw many conclusions from the results, and a different approach (Method 2) was needed to eliminate any genes which were not found in both analyses or to determine whether it was feasible to complete this type of filtration.

4.2.2 Method 2 Results

The second method attempted to run the same analysis, but in a different order and with a different way to analyze the initial HiC data. Because the high coverage in Method 1 caused a high FDR, this method used the eigenvector (compartment analysis) to bin the genome into 100 kb windows. This unbiased approach allowed for the determination of specific windows which contained DEGs and then subsequently windows which contained both a DEG and a MEF2 ChIP-Seq peak. The motivation for creating this method was to determine whether the regions potentially controlled by MEF2 could be identified by combining the results of Method 1 and 2 to help decrease the FDR found in the methods.

Figure 4.5: (a)

The first step of the method was to generate the eigenvectors (corrected interaction maps or ICE analysis), which allow for the visualization of regions which have either a higher or lower contact frequency than would be expected in another region [70, 316]. The eigenvector values are usually termed compartment A for euchromatic regions or compartment B for heterochromatic regions [316] (figure 4.5a). Although the regions are not exact in whether the compartment is euchromatic or heterochromatic, the analysis gives a good approximation of regions which are being expressed or not.
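As a hedged illustration of how eigenvector values translate into compartment calls, the Python sketch below labels each bin as A or B and lists the bins whose label differs between two conditions. It assumes the eigenvectors have already been oriented so that positive values correspond to compartment A (the sign of an eigenvector is otherwise arbitrary), and that bins with a zero or missing value are left unassigned. The function names are hypothetical.

```python
import math


def compartment_labels(eigenvector):
    """Label each bin 'A' (positive) or 'B' (negative); None if unusable."""
    labels = []
    for value in eigenvector:
        if value is None or math.isnan(value) or value == 0:
            labels.append(None)
        else:
            labels.append("A" if value > 0 else "B")
    return labels


def compartment_swaps(healthy, tac):
    """Return (list of (bin index, 'X->Y') swaps, fraction of bins swapped)."""
    swaps = []
    compared = 0
    pairs = zip(compartment_labels(healthy), compartment_labels(tac))
    for index, (before, after) in enumerate(pairs):
        if before is None or after is None:
            continue  # skip bins that cannot be compared
        compared += 1
        if before != after:
            swaps.append((index, before + "->" + after))
    fraction = len(swaps) / compared if compared else 0.0
    return swaps, fraction
```

Bins whose values sit close to zero in both conditions can flip label with only a small change in value, so the magnitude of the change is worth inspecting alongside the label.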
This analysis would be useful if there were windows which had compartment swaps that correlated to DEGs and, further down the pipeline, could be shown to contain MEF2 ChIP-Seq peaks. To complete the eigenvector analysis on the dataset, the eigenvector command from Juicer [276] was used. The following command was used: java -jar ../juicer_tools.jar eigenvector KR <hic_file.hic> <chromosome_number> BP 100000 -p. In brief, the settings call the eigenvector command from juicer_tools.jar and complete KR normalization given a HiC file. The analysis can be computationally intense, so it is best to complete it one chromosome at a time at a fairly large window size. In the example above, the window size was 100 kb (100000), and lowering the window size dramatically increased computational time (from minutes at 100 kb to hours at 50 kb - data not shown). The last parameter, "-p", was used as an override to allow the analysis to complete; without it, the command did not allow the computer to utilize the correct amount of memory.

To determine whether the eigenvector analysis was correct, the eigenvectors were compared to publicly available H3K27ac ChIP-Seq data. Since the analysis showed good correlation to the ChIP-Seq tracks in both the healthy and TAC samples, the eigenvector analysis could then be used to determine which of the 100 kb bins contained DEGs. There were not many bins which swapped from either compartment A to B or B to A (around 7.5% genome wide), and the majority (>95%) of those changes were around the limits between A and B (data not shown). Overall, there were not many drastic compartment switches between the two samples, confirming the results from Rosa-Garrido et al [10] that there was no drastic chromatin remodeling upon entering TAC.

The next stage of the analysis was to look at which compartments had DEGs using the already analyzed results which were used in Method 1 (figure 4.3a).
To do this, a custom R script was written to filter which 100 kb bins contained a DEG. After obtaining that list, the script then expanded the window to 3 bins (or 300 kb) in each direction and kept only the bins which also contained a MEF2 ChIP-Seq peak (similar to how Method 1 filtered). This narrowed down the number of genes from around 1,800 to 291 (figure 4.3b).

The next stage was to determine the FDR of the genes being discovered. This was completed using a custom R script which used completely randomized regions of the chromosome as ChIP-Seq peaks to see if they were discovering any genes. The same number of peaks was used in the mock sample and run through the same script to eliminate bins without peaks. The resulting FDR over 500 iterations was quite high at around 25% (figure 4.3c). Due to the high FDR, the analysis of the genes was not conducted because it was likely the genes being filtered could not be correlated to MEF2, only shown to be differentially expressed between the two conditions. There was some overlap between the genes discovered by the two methods (about 40%), but given the high FDR it would not be possible to say with certainty whether this was due to falsely discovered results or true findings.

4.3 Discussion

This chapter sought to answer two questions. Was it possible to use publicly available HiC and ChIP-Seq data to discover protein mediated interactions, and was the MEF2:HDAC4 complex able to mediate looping and affect gene expression in HCM? Through the analysis of the differential looping between healthy and TAC and correlating to differentially expressed genes, it was not possible to answer either of these questions due to the high FDR in the methods that were tested.
Although generating enriched HiC data has been attempted previously [280], the computational field has not adopted the practice and has instead relied on newer techniques such as ChIA-PET [161] and, more recently, HiChIP [53,55]. The reason could be the great difficulty in computationally enriching HiC data, as highlighted by the two methods used here. As a result, experimental enrichment has been the best method to overcome the hurdle seen in this analysis.

The analysis using both methods found differential loops; however, due to the high coverage of HiC, it was not possible to distinguish signal from noise in a statistically relevant manner. The two approaches (Method 1 and 2) attempted to analyze the data in separate pipelines to determine if there was a superior way to enrich HiC data from ChIP-Seq results. Even though the methods were not able to yield results with a low FDR, Method 2 had about half the FDR (23% compared to 55%), which is a good starting point for future work. The binning of the genome allowed for a less biased approach, and the use of the eigenvectors could allow for more statistical significance to be determined if the eigenvector could be tied to TPM and ChIP-Seq peak results. However, it is difficult to determine whether the FDR could be lowered using such approaches.

There was an overlap of about 40% of the genes being discovered by both methods, which was much lower than expected and highlighted the issues with the FDR. If the analysis were robust, there should have been a higher correlation between the two datasets since the inputs were the same. It is not unlikely that many of the genes found in both datasets are mediated by MEF2, but other experimentation, such as MEF2 or H3K27ac HiChIP, would be needed to confirm this result.

The issues with the analysis could have come from any of the data inputs used in the two methods. The first input was the HiC, which generates a very high coverage over the whole genome [70].
The issue with the coverage was that the chance of a loop being present at any point in the genome was quite high, even when down-sizing the dataset to have the same number of loops being filtered by the analysis (figure 4.4g). Although the Method 1 filtration left far fewer contacts after using the well established HiC package Fit-HiC [277], there were still over one million contacts spread across the genome. Since the loops were constrained to 2 Mb, the possibility of there being a DEG within any of the loops was high. Method 2 attempted to ameliorate this by starting with an unbiased approach, using the eigenvector in hopes of being able to tie the expression or repression of a gene back to the eigenvector value, but this attempt also did not work. The likely cause, as mentioned above, is that the coverage of the HiC was too high to filter out enough of the contacts to see the small changes between healthy and TAC.

The second issue was the use of the publicly available ChIP-Seq data, which was discussed in greater detail above. The main issue revolved around using data from a different cell type and from a different experiment. For this analysis to be fully robust, the ideal case would be to have both HiC and ChIP-Seq data from the same set of cells and the same experimental conditions. However, the need for this has been almost eliminated because if an experiment could run both HiC and ChIP-Seq, the best experimental approach would be HiChIP and the need for the filtration would be negated. Due to the advancement of molecular tools like HiChIP, this type of computational filtration is not as necessary. Although there are still many challenges with HiChIP, specifically with TFs, the filtration of HiC using TF ChIP-Seq is not a recommended alternative. The filtration method described here could potentially be aided by the use of machine learning or other computational techniques.
The last input, the RNA-Seq, caused the fewest issues, although using the DEGs made it challenging to separate spurious genes from actual hits. Although using all genes in the FDR filtration was attempted, the results did not differ sufficiently from those obtained using the DEGs (data not shown). The high FDR and the enrichment of the data for DEGs between the two conditions made it impossible to distinguish true positives from false positives. Even when comparing the two separate methods, the overlap was not sufficient to resolve these issues.

The two methods used were similar in their goal but reversed the order of operations, which should have generated analogous results. However, as was shown, the methods not only came up with different sets of genes but both also suffered from a high FDR. In summary, Method 1 generated 273 genes which could have been mediated by MEF2 but had an extremely high FDR of over 50%. Method 2 was able to capture more genes (291) with a much lower FDR of 23%. The overlap between the two methods was also not high, with only 106 genes overlapping (about 40%), and given the high FDR in both methods, the chance of the overlapping genes being artifacts is quite high.

As a result of these issues, the genes mediated by MEF2 ChIP-Seq peaks in the HCM data could not be determined. However, experimental methods, such as HiChIP, give a way to enrich the signal of HiC [53]. To complete this type of computational approach in the future, there would need to be more control over the datasets generated, specifically having ChIP-Seq completed on the same cells in which the HiC was completed. As was discussed, however, this approach would be limited by the high coverage of the HiC libraries, and HiChIP would be a better alternative. More sophisticated methods could also be attempted, such as machine learning algorithms, but that was beyond the scope of this manuscript.
5 Understanding NPPA and NPPB gene regulation in HCM

5.1 Introduction

The goal of this chapter was to further understand the chromatin dynamics of HCM. Although other approaches like Rosa-Garrido et al [10] and numerous studies from the Backs lab and others [2, 5, 7–9, 222–224, 235, 237, 245, 246, 264–266] have indicated differential gene regulation and chromatin remodeling accompanying HCM, there have not been many extensive studies probing genome-wide chromatin perturbations upon activation of the disease. A recent study completed HiC but did not find decisive chromatin remodeling [10]. Some loci have been extensively studied, such as the natriuretic peptide hormones A and B (NPPA and NPPB) loci, with recent work completing 4C with in vivo knockdowns [5, 264, 266]. However, with the advent of newer technologies such as HiChIP, specific changes in the chromatin could be probed. The genome-wide studies have not discovered many perturbations of the chromatin landscape [10], while studies of specific loci like NPPA/NPPB have [5,264,266], suggesting the changes seen in the disease were within intra-TAD domains.

Research has focused on MEF2 in HCM due to its strong association with the disease [2, 7–9, 67, 222, 223, 237], but the implication for chromatin when HDAC is relocalized into the cytoplasm remains unknown and was the main focus of the work here. Although there has been a focus on MEF2 and HDAC, there are likely other factors regulating gene expression and looping structures, either acting in cooperation with MEF2 or with other unknown TFs. It would be ideal to study the exact genes being regulated by MEF2, as attempted computationally above, but the lack of a robust ChIP-Seq grade antibody made this not feasible for ChIP-Seq or HiChIP.
However, one of the main marks of gene expression is the histone modification H3K27ac, which marks cis regulatory elements [56], so it was an ideal candidate to understand how gene expression was being altered genome-wide upon HCM activation.

The overall scheme of the experiment can be seen in figure 5.1a, b. Left ventricle cardiomyocytes from 1-3 day old rats were cultured for 10 days. After the cells had grown, they were split into two: half the cells were grown under standard conditions and the other half were stimulated with a strong stimulus, norepinephrine, for 72 hours. Afterwards, the cells were fixed at a final concentration of 1% formaldehyde for 10 minutes before being snap frozen in liquid nitrogen. The samples were then stored at −80 °C until either H3K27ac ChIP-Seq or HiChIP was completed. The resulting data was used to understand cis regulation between the non stimulated and stimulated (HCM model) conditions and to create a genome-wide dataset the community could then use to probe specific loci of interest.

To understand histone modifications genome-wide, ChIP-Seq was used, with the caveat that it retains only 2D information and therefore misses the 3D regulation of gene expression. To overcome that challenge, H3K27ac HiChIP was also completed to understand the chromatin dynamics in HCM. Since it was known that looping domains could be activated or inactivated [70], correlating DEGs to differential loops narrowed down the exact regions being targeted by the disease. Using this data made it possible to narrow down specific loci with known MEF2 binding domains, but it also allowed for genome-wide analysis to discover other, less well understood mechanisms within the disease.

H3K27ac HiChIP has been utilized in the field since its publication in 2016 and has been used to study GWAS SNPs and cell differentiation to gain better insights into chromatin regulation of gene expression [53, 55, 56, 87].
Using these concepts of altered chromatin structure with active histone modification, a similar approach was applied here to better understand HCM specific loci changes. Although it would have been interesting to complete HiChIP on MEF2, the lack of ChIP-Seq grade antibodies and the challenge of completing HiChIP on TFs made it less feasible. Since the H3K27ac antibody was very robust, it was quickly optimized with ChIP-Seq and HiChIP, and it allowed for a less biased approach to understanding genome-wide gene regulation.

The work presented here mainly focused on the NPPA/NPPB locus to better understand the mechanisms of gene regulation in HCM as well as determine the role of MEF2. The ability of cells to have regions of chromatin quickly activated by the removal of a TF like HDAC from the nucleus could explain how a cell could respond quickly to environmental stimulus. The mechanism of HDAC cleaving suggests some loops would be important as a response factor but chronic activation would lead to disease. For the response to be activated in a time-sensitive manner, having a poised region for active histone modification would allow for rapid transcription. This would then enable the cell to activate a gene and in turn upregulate the histone modification H3K27ac [56]. The hypothesis presented here is that the perturbations in chromatin structure seen in HCM are within larger TADs and that the intra-TAD domains are altered at a local level which cannot be seen in experiments such as HiC [10]. To test this, H3K27ac ChIP and HiChIP were used to probe the cis regulatory looping.

Figure 5.1: (A) General scheme of the stimulation of neonatal rat left ventricle cardiomyocytes (NRVMs). 1-3 day old left ventricles were harvested and cultured for 10 days in a stable growth medium. Afterwards, the cells were split and half continued to grow under standard conditions for 72 hours and the other half of the cells were stimulated with a strong adrenergic receptor (AR) stimulus, norepinephrine, for the same amount of time. Afterwards, the cells were formaldehyde fixed (1% final concentration) for 10 minutes and subsequently snap frozen in liquid nitrogen. (B) Adapted from Chandra et al [56], showing the workflow of the experiment. Frozen formaldehyde fixed samples were used to complete both H3K27ac ChIP-Seq and HiChIP. These experiments resulted in understanding cis regulatory elements from ChIP-Seq (top) and the subsequent looping of these elements to promoters or other enhancers (bottom).

The ability of MEF2:HDAC to segregate at least four regions of chromatin [67] suggested loop remodeling upon relocalization of HDAC to the cytoplasm. To confirm this, publicly available ChIP-Seq tracks were used to match up MEF2 binding sites and the differential HiChIP looping domains between non stimulated (NS) and stimulated (S) cardiomyocytes. If four chromatin loci came together under non stimulated conditions and the stoichiometry of the loci was altered upon stimulation, combining the loops with ChIP-Seq data would suggest regulation by MEF2 at the loci. Here, the main focus of the work was the NPPA and NPPB loci due to their importance in HCM and other heart diseases [5]. Although there were many other loci identified which had differential loops between the S and NS conditions, the local interactions in these loci linked MEF2 to small intra-TAD chromatin changes. The genome-wide data generated here will be an extensive resource for the community wanting to study and understand particular loci in greater detail than has previously been published.

5.2 Results

5.2.1 H3K27ac ChIP-Seq

To complete a high quality histone modification ChIP-Seq experiment, there were many factors which needed to be considered, as was discussed above.
One of the key aspects of the optimization was the sonication of fixed chromatin to the correct size of 150-750 bp [128, 129]. For this, the first step was to fix the cells in 1% formaldehyde for 10 minutes followed by snap freezing the pellets and storing them at −80 °C. To determine the optimal sonication time (cycle number), a time-course was used as previously described [128]. For this experiment, the cells were sonicated in a Covaris S2 sonicator and tested with the following settings: Duty Factor: 10%, Intensity: 2 or 5, Cycles per burst: 200, Seconds per cycle: 10, 70 µL volume. The result, as seen in figure 5.2a, showed that for the higher intensity (5) settings, only about 120 seconds of sonication were required, while for the lower intensity settings (2), about 4 minutes was needed to obtain fragment sizes of 100-750 bp.

After the sonication was optimized, the sample pellets were resuspended in ChIP-Seq sonication lysis buffer such that the concentration of cells did not exceed 2M cells per 70 µL of buffer [129]. After the cells were lysed in 70 µL, they were sonicated for 4 minutes with the lower intensity (2) settings. To determine the size of the fragments, a fraction was removed, decrosslinked overnight, purified, quantified, and then run on an agarose gel as previously described [128, 129]. The result was that the majority of the DNA fragments in all the samples were between 100-750 bp (figure 5.2b). Although there were some larger fragments, this was acceptable as an over sonicated sample can dramatically increase the background signal in a ChIP-Seq experiment [128].

From there, 100,000 cell equivalents of DNA (500 ng) of the samples were used to run a ChIP assay with an H3K27ac antibody (Diagenode Catalog number: C15410196, Lot: A1723-0041D) on an automated ChIP-Seq platform, IP Star, as previously described [128, 129].
After the IP washes, the samples were tagmented in a similar manner to ChIPmentation [136] and then decrosslinked and eluted before subsequent Illumina-based sequencing [128, 129]. After sequencing, the samples were run through a ChIP-Seq processing package, ChIPLine (github.com/ay-lab/ChIPLine), to determine the mapping quality and generate the readable sample files. All of the samples had over 15 million sequencing reads except two, as shown in figure 5.2c. There were fewer reads in the NS DMSO samples, but they were able to be merged into the NS samples (discussed below). The samples were merged together using the merge command of the Samtools package [317]. The genome was binned into 500 bp regions using the MEDIPS package [131], and the average Spearman correlation of the bins between NS samples was slightly higher than between S samples (0.94, blue box, and 0.92, red box, respectively) (figure 5.3a). This was completed by using the MEDIPS.correlation(method=Pearson) command in R. Due to the low read counts in the NS DMSO conditions and their high correlation with the other samples, they were merged with the standard NS samples to give a total of about 70M reads. Since the read counts in the S samples were above 100M, the addition of the S DMSO conditions was not necessary, as these samples were not used in the subsequent HiChIP experiments (discussed below).

Figure 5.2: (A) Sonication optimization time-course for neonatal rat left ventricle myocytes (NRVMs) using an S2 Covaris sonicator. The two conditions shared the following settings: Duty Factor: 10%, Cycles per burst: 200, Seconds per cycle: 10, 70 µL volume. The left panel (High intensity) used Intensity: 5 and the right panel (Low intensity) used Intensity: 2. The optimal settings used for the ChIP-Seq were in lane 17. (B) Sonication results for NRVMs used in the ChIP-Seq experiments. They were sonicated for 4 minutes at Intensity 2, such that the majority of the fragment sizes were between 100-750 bp. Each lane had 200-250 ng of purified DNA and was run on a 1.2% agarose gel for 35 minutes at 100 volts. (C) The mapping results from H3K27ac ChIP-Seq on stimulated (S), stimulated with DMSO (S DMSO), non stimulated (NS), and non stimulated with DMSO (NS DMSO) samples. The Reads in Peaks indicates the number of reads which were in regions called as peaks by MACS2 [267].

Overall, the sequencing quality control measures, such as percentage of mapped reads, uniquely mapped reads, and the number of reads in peaks, indicated high quality samples (figure 5.2c). In order to generate the statistically relevant peaks from these files, MACS2 [267] was used with standard settings (qvalue < 0.05). As a result, there were around 78,000 peaks identified in NS and about 101,000 in S (figure 5.3b). The extra peaks found in S were not unexpected, as it is known that there is a large upregulation of genes in HCM [2, 3, 6–10, 190]. The increase in the H3K27ac profile was expected because new enhancers and promoter regions would be activated or upregulated. To determine where there was a significant increase or decrease of H3K27ac between NS and S, the MEDIPS package [131] was used. To do this, the following command was used within the MEDIPS package in R: MEDIPS.meth(MSet1 = <NS group>, MSet2 = <S group>, p.adj = "BH", diff.method = "edgeR", CNV = FALSE, MeDIP = F), which probed all of the regions for differential counts between the samples. Afterwards, the regions were filtered using the MEDIPS.selectSig command to an adjusted pvalue < 0.05. Finally, the overlapping regions were merged with the MEDIPS.mergeFrames command so that bins within 1 kb were combined. As shown in figure 5.3c, there were about 3,300 H3K27ac peaks that were upregulated in NS and about 7,900 in S. The resulting merged samples (figure 5.3c) showed that a high quality ChIP-Seq experiment was completed with a high signal-to-noise ratio.
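The bin-merging step described above (combining significant bins that lie within 1 kb of each other) can be illustrated with a short sketch. This is a generic interval-merging routine written in Python, not the MEDIPS implementation; the function name `merge_bins`, the `max_gap` parameter, and the example coordinates are all assumptions made for illustration.

```python
def merge_bins(bins, max_gap=1000):
    """Merge (chrom, start, end) bins whose gap is at most max_gap bp.

    Hypothetical helper: a plain interval merge, assumed to approximate
    the effect of combining significant bins within 1 kb.
    """
    merged = []
    for chrom, start, end in sorted(bins):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous region: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            # Different chromosome or too far away: start a new region.
            merged.append((chrom, start, end))
    return merged


# Hypothetical significant 500 bp bins (not taken from the data set).
sig_bins = [
    ("chr1", 1000, 1500),
    ("chr1", 1500, 2000),
    ("chr1", 2800, 3300),
    ("chr1", 9000, 9500),
    ("chr2", 1000, 1500),
]
regions = merge_bins(sig_bins)
print(regions)
```

In this toy input the first three chr1 bins collapse into a single region because each gap is under 1 kb, while the distant chr1 bin and the chr2 bin stay separate.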
In figure 5.3d, two loci showed the merged sequencing files for NS and S (visualized in the WashU Epigenome Browser with the following settings: Aggregated=mean and smooth=0). The left panel showed the beta cytoskeletal actin (ACTB) locus, a known cardiomyocyte housekeeping gene locus [318, 319], which had high levels of H3K27ac in both samples without any differential peaks. The right panel showed the paired box 5 (PAX5) locus; PAX5 is a protein which has been implicated in immune response [320] and is not expressed in resting or stimulated NRVMs [3, 190, 319]. The loci of differential H3K27ac often fell around genes which were also differentially expressed genes (DEGs). Two examples of loci with differential H3K27ac (up and down regulated regions shown as gray bars) can be seen in figure 5.3e.

Figure 5.3: (A) Spearman correlation and correlation plots between the non stimulated (NS) and stimulated (S) conditions. The blue box represents the non stimulated conditions and the red box represents the stimulated conditions. The Spearman correlations were determined using the MEDIPS package [131]. (B) The ChIP-Seq samples were merged into either NS or S. The number of peaks called using MACS2 [267] which had a qvalue < 0.05. (C) The number of Differential Regions (pvalue < 0.05) called by the MEDIPS package [131]. This shows a larger amount of H3K27ac in the S conditions compared to the NS conditions. (D) ChIP-Seq tracks showing (Left) the locus of actin beta (ACTB), a known cardiomyocyte housekeeping gene [318, 319]. The right panel shows the paired box 5 (PAX5) locus, which is an immune response gene and not expressed in cardiomyocytes [3, 190, 319, 320]. Tracks were visualized in the WashU Epigenome Browser. (E) Differential ChIP-Seq loci between NS and S. (Left) The Sodium Voltage-Gated Channel Beta Subunit 3 (SCN3B) locus showing upregulation of H3K27ac (pvalue < 0.05). Upregulated regions shown in gray. (Right) The Iroquois-class homeodomain protein (IRX5) locus showing downregulation of H3K27ac in S conditions (pvalue < 0.05). The downregulated regions shown in gray. Tracks were visualized in the WashU Epigenome Browser.

The left panel showed the sodium voltage-gated channel beta subunit 3 (SCN3B) locus, which has been implicated as an upregulated gene in HCM [3, 319, 321]. The locus showed an upstream upregulated enhancer in a non-coding region of the DNA as well as a large upregulation at the gene's promoter. There was also a downstream region which was upregulated, but without HiChIP or other 3D capture data, linkage between the SCN3B promoter and the intragenic region in the GRAM domain containing 1B (GRAMD1B) gene would be challenging to determine. In the right panel, the locus for the Iroquois-class homeodomain protein (IRX5) was shown, where the gray bars indicated downregulation of H3K27ac in the S conditions. IRX5 has been shown to have an important role in myocytes and specifically the left ventricle of the heart [319, 322, 323]. There was a downstream enhancer in S in a non-coding region. There was also an upstream enhancer, located in the intragenic region of alpha-ketoglutarate-dependent dioxygenase (FTO), which was also downregulated. Interestingly, the downregulation in the intragenic region of FTO did not have an effect on the expression levels of the gene in previous reports such as Dai et al. in 2020 [319]. The implication these differential enhancers and promoters had on specific gene expression was not possible to ascertain from ChIP-Seq data due to the 1D nature of the assay. Although loci with up or down regulated H3K27ac often fell in regions with known corresponding gene expression profiles, it was not possible to tie them to specific enhancers or promoters.
This was due to the inability to tie cis-acting regulatory mechanisms to their interacting partners in 3D space. These positive results served as an initial proof-of-concept for differential H3K27ac levels between NS and S, and also indicated that the antibody and ChIP workflow would be sufficient for HiChIP.

5.2.2 H3K27ac HiChIP Optimization, Preparation, and Sequencing

The bulk of this chapter will focus on the H3K27ac HiChIP and subsequent analysis. Although ChIP-Seq has been a powerful tool used to better understand local and 1D changes in chromatin or TF binding, without understanding the 3D interactions, the exact nature of the cis regulatory elements cannot be determined. As a result, it has been a challenge in the field to fully understand how specific genes are being activated or inactivated [93]. Here, H3K27ac ChIP-Seq was used to show that many regions of chromatin were up or down regulated in HCM, which indicated potential regions of interest in the disease. However, without the ability to tie certain enhancers and promoters together or to implicate specific TFs or genomic loci together, the power of ChIP-Seq was diminished. Fortunately, with the use of newer tools such as HiChIP [53], a deeper understanding of the 3D interactions implicating specific TFs, promoters, or enhancers in HCM was obtained. As shown in figure 1.6, HiChIP combined the first steps of HiC with ChIP-Seq, which was completed prior to biotin enrichment and subsequent sequencing. In order to optimize the assay, the first steps were to ensure the conditions for proximity ligation and subsequent sonication were optimal. As shown in figure 5.4a, the conditions were optimized in GM12878 cells. To run the optimization, two 5 million fixed cell pellets were resuspended in 500 µL of HiC Lysis buffer [70] and washed twice to isolate the nuclei before digesting with 200 U of MboI (4-base cutter) as previously described [53, 325].
After the digestion of the DNA, the next step was to ligate the fragments back together without the addition of biotin-dATP. The biotin fill-in was not completed in the initial optimization (figure 5.4a) because streptavidin enrichment was not performed. The ligation was left overnight instead of the 4 hours indicated in the protocol [53, 56] to ensure complete religation of the samples. Based on the results, further optimization or modification of the first steps was not required. The sonication was also based on the previous sonication optimization (figure 5.2a), but the higher intensity setting (5) was used for 2.5 minutes. The sonication could be slightly longer because the H3K27ac histone modification was very robust and the chances of removing the mark were much lower than if a TF was being enriched [129]. After the initial optimization was shown to be robust, the HiChIP assay was completed with the help of Vivek Chandra at the La Jolla Institute, who has expertise in running the HiChIP assay [56]. Although the ChIP-Seq had been optimized previously for 100k cells, with HiChIP the input was about 10-fold higher, so the same methods could not be used. The major difference was that the high number of cells eliminated the ability to use the automated platform (IP Star) if a single batch of ChIP was to be completed. As a result, after the overnight ligation and subsequent sonication, the ChIP-Seq was completed manually.

Figure 5.4: (A) HiChIP optimization for the pre-ChIP steps. A 0.6% agarose gel was run for 60 minutes at 75 volts showing the sample prior to digestion with a restriction enzyme (lanes 2-3), digestion with MboI (lanes 4-5), overnight ligation (lanes 6-7), and the optimized sonication of the religated sample. The sonication was completed for 2.5 minutes with Intensity 5. (B) Fragment Analyzer track of the completed HiChIP library which was used for subsequent sequencing. The median fragment size was around 650 bp. (C) The total number of sequenced reads per HiChIP sample. (D) Figure adapted from Servant et al. [324] showing what was called as Valid Pairs and Invalid Pairs. Valid Pairs were reads which had Forward-Reverse (FR) reads, Reverse-Forward (RF) reads, Forward-Forward (FF) reads, or Reverse-Reverse (RR) reads. The invalid pairs were reads which did not have two complete separate loci.

The HiChIP workflow was labor intensive, especially on day 1, because the assay ran from the frozen cell pellet through to the incubation of the religated and sonicated sample with H3K27ac antibody. There were no major changes from the protocol described previously [56] and the steps will be briefly summarized here. The cell pellets containing an estimated 2M cells had their nuclei isolated before being digested by 200 U of MboI restriction enzyme. The sticky ends were filled in with biotin-dATP using DNA Polymerase I, Large (Klenow) Fragment, followed by religation of the filled ends with T4 DNA ligase for 4 hours [53, 56]. During the end repair and ligation of the cells, 7 µg of H3K27ac antibody (Diagenode Catalog number: C15410196, Lot: A1723-0041D) was incubated with 25 µL of protein A-coated magnetic beads (Dynabeads) and subsequently washed with ChIP RIPA buffer [53, 56]. The religated cell pellet was resuspended and sonicated to a median peak around 450 bp (data not shown). Afterwards, the sonicated chromatin was diluted and incubated with H3K27ac antibody-coated protein A beads overnight at 4 °C with rotation. The following day, standard ChIP-Seq washing, overnight decrosslinking, and purification were completed [56]. On the third day of the protocol, adapters (NEBNext Ultra II Library Kit) were attached to the purified DNA before Streptavidin C-1 beads were used to capture the biotinylated DNA. After the Streptavidin C-1 bead capture was completed, the samples were amplified to generate the final sequencing libraries.
After amplification, the libraries were purified using AMPure XP beads (Beckman Coulter Life Sciences) to get a final size of about 300-800 bp, with an average size of about 650 bp (figure 5.4b) [56]. To sequence the libraries, 50 bp paired-end sequencing was completed on an Illumina NovaSeq6000 and the total number of reads can be seen in figure 5.4c. Because some libraries were sequenced more deeply than others, all samples were scaled to have the same number of reads after filtering the data.

5.2.3 H3K27ac HiChIP Initial Filtering Analysis

As was shown in figure 5.4c, there were at least 400M sequencing reads in each sample, which was enough to generate high quality data. After the sequencing, the next step in the analysis was to use the HiC-Pro pipeline [324] to remove any background signal and assign the cis interacting contacts which were between 10 kb and 3 Mb away from each other. The pipeline removed any contacts which were over-represented by PCR duplication or were inter-chromosomal. The general process of the HiC-Pro pipeline will be briefly explained here. The first step of the pipeline was to map the samples to the rn6 genome using bowtie2 (v2.3.3.1) with the following global options: --very-sensitive -L 30 --score-min L,-0.6,-0.2 --end-to-end --reorder and local options: --very-sensitive -L 20 --score-min L,-0.6,-0.2 --end-to-end --reorder as previously described [56]. The aligned sequences were then paired together and put through the HiC-Pro pipeline [324, 326]. The second step was to determine the valid pairs, which removed any pairs not meeting a number of parameters. Some of the parameters included requiring a MboI cut-site, a specific library fragment size, the removal of invalid ligation products (discussed below), and the removal of pairs over-represented due to PCR duplication. One of the major filtration mechanisms of the pipeline was to eliminate the Invalid Pairs (figure 5.4d).
As previously reported, in HiChIP experiments there were about 50% Invalid Pairs [53, 55, 56, 165, 326], and in the data set a similar number was observed, as shown in figure 5.5a,b. The invalid pairs, including Singletons, Dangling Ends, Self Circles, and Dumped pairs (figure 5.4d right panel), were removed from the analysis. These types of pairs were generally caused by regions which were not digested completely, not filled in properly, or not religated [324]. There were four types of valid contacts: Forward Reverse (FR), Reverse Forward (RF), Forward Forward (FF), and Reverse Reverse (RR) (figure 5.4d left panel) [324]. These interactions were from different regions of the genome as denoted by the blue and red lines (> 10 kb and < 3 Mb from each other). There is a junction in the middle (black dashed line), which represented the religation event, and in this analysis a MboI cut site was also required in that region. An equal distribution of these valid contacts (FR, RF, FF, and RR) was expected if the samples had completely random religation events [324]. As can be seen in figure 5.5b, each Valid Pair type had about 25% representation in all three samples (NS, S1, and S2). This was a positive quality check to ensure the Valid Pairs were being correctly called by the HiC-Pro pipeline. After determining the Valid Pairs, sequences that were over-represented in the analysis because of PCR bias were termed Duplicate Contact Pairs and subsequently removed (figure 5.5a,c). Although the number of duplicates removed was slightly higher than in previous HiChIP reports (about 45% vs as few as 30% [165]), the number was on par with other experiments such as ChIA-PET [327]. In the NS conditions, there was a slightly higher number of duplicated contact pairs (56%), but the number was still below the 60% threshold for a mediocre sample [324].
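The orientation check described above can be sketched in a few lines of Python. This is a simplified illustration and not the HiC-Pro code, which also assigns reads to restriction fragments before classifying them. The function names `pair_type` and `type_fractions`, the tuple layout, and the example read pairs are assumptions.

```python
from collections import Counter


def pair_type(pos1, strand1, pos2, strand2):
    """Return FR, RF, FF or RR for a read pair ordered by genomic position."""
    first, second = sorted([(pos1, strand1), (pos2, strand2)])
    code = {"+": "F", "-": "R"}
    return code[first[1]] + code[second[1]]


def type_fractions(pairs):
    """Fraction of pairs in each orientation class."""
    counts = Counter(pair_type(*p) for p in pairs)
    total = sum(counts.values())
    return {t: counts.get(t, 0) / total for t in ("FR", "RF", "FF", "RR")}


# Hypothetical read pairs: (position 1, strand 1, position 2, strand 2).
example_pairs = [
    (100_000, "+", 150_000, "-"),  # FR
    (400_000, "+", 200_000, "-"),  # RF once ordered by position
    (100_000, "+", 900_000, "+"),  # FF
    (500_000, "-", 600_000, "-"),  # RR
]
fractions = type_fractions(example_pairs)
print(fractions)
```

With random religation each class is expected near 0.25, so a strong deviation in `fractions` would flag a problem with digestion or ligation.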
Due to these positive results, the remaining reads were moved on to the final stage of the HiC-Pro pipeline, which was to determine whether a contact would be valid or invalid. The final contact filtration was completed to determine whether the interaction was cis (valid) or trans (invalid). A general definition of a cis contact in the pipeline was an intrachromosomal contact with 10 kb - 3 Mb in distance between the two ligated fragments (figure 5.4d). The other type of contact interaction was trans, or interchromosomal, contacts. For a Valid Pair to be considered a Valid Contact, the interaction had to be cis; if the contact was acting in trans, it was considered an Invalid Contact and removed from the subsequent analysis. It was expected there would be at least 40% of the contacts acting in trans [324]. However, that number was generated in humans and for a standard HiC experiment, so the total of about 50% trans interactions seen here was in range of previous H3K27ac HiChIP publications [55, 56]. Overall, 10-15% of the initial pairs were accepted as Valid Contacts in the analysis (figure 5.5a). In order to generate the Valid Contacts, the following HiC-Pro pipeline [278] -s commands were used: mapping, proc_hic, quality control, and build_contact_maps. The graphs shown in figure 5.5 were generated using HiC Pro Alignment Summary and custom R scripts. After the generation of the Valid Contacts, the next stage of the analysis was to determine the significant contacts in each sample, followed by differential analysis between the NS and S conditions. For this, FitHiChIP [326] was used.

Figure 5.5: (A) The filtering from all of the pairs from sequencing to the number of contacts taken into the FitHiChIP analysis. Each of the samples was separated (blue - non stimulated, red - stimulated replicate 1, and purple - stimulated replicate 2) and further separated into four categories. The All Pairs column takes into account all of the sequencing reads, separated into Invalid and Valid Pairs. The second column (blue bars) separated the Valid Pairs into the type of pairs (Forward-Reverse (FR), Reverse-Forward (RF), Forward-Forward (FF), and Reverse-Reverse (RR)). The third column (green bars) probed all of the Valid Pairs to determine the number of PCR duplicated pairs. The last column (purple bars) probed the Valid Contact Pairs to determine the Valid Contacts which were between 10 kb and 3 Mb from each other. The percentages are based on the total number of Pairs in the first column. (B) A zoomed in graph of the first two columns from (A). (C) A zoomed in graph of the last two columns from (A). The percentages are based on the total number of Valid Pairs (green). (D) A summary chart of the FitHiChIP outputs using different P2P settings within the FitHiChIP package.

5.2.4 H3K27ac HiChIP FitHiChIP Analysis

After completion of the HiC-Pro pipeline, the samples were run through the FitHiChIP pipeline [326] to determine the significant contacts. The general scheme of the FitHiChIP pipeline is shown in figure 5.6 and will be quickly summarized here. The first step in the pipeline was to take the cis read pairs (valid pairs) generated using the HiC-Pro pipeline. Afterwards, the merged ChIP-Seq peaks (called by MACS2 [267]) were used as an input to retain contacts which contained a peak on at least one of the two contact points (Peak2ALL [326]). The filtered contacts then underwent bias regression and spline fitting to generate the most significant contacts. The final stages were to merge the nearby contacts and complete the subsequent differential analysis between the stimulated and non stimulated conditions. An in depth analysis of these steps can be seen below. The minimum number of contacts needed to generate high quality data in FitHiChIP was 20M [326].
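The cis and distance criteria that define a Valid Contact in this analysis can be written as a small predicate. This is a minimal Python sketch under stated assumptions: the function name `is_valid_contact` is hypothetical, positions are taken to be single coordinates per fragment, and the 10 kb and 3 Mb bounds are treated as inclusive, which the text does not specify.

```python
def is_valid_contact(chrom1, pos1, chrom2, pos2, low=10_000, high=3_000_000):
    """True for an intrachromosomal (cis) pair separated by low..high bp.

    Inclusive bounds are an assumption made for this sketch.
    """
    if chrom1 != chrom2:
        return False  # trans (interchromosomal) contacts are discarded
    return low <= abs(pos2 - pos1) <= high


# Hypothetical pairs: (chrom 1, position 1, chrom 2, position 2).
candidate_pairs = [
    ("chr1", 100_000, "chr1", 150_000),    # cis, 50 kb apart: kept
    ("chr1", 100_000, "chr2", 150_000),    # trans: removed
    ("chr1", 100_000, "chr1", 105_000),    # cis but under 10 kb: removed
    ("chr1", 100_000, "chr1", 4_500_000),  # cis but over 3 Mb: removed
]
kept = [p for p in candidate_pairs if is_valid_contact(*p)]
print(len(kept), "of", len(candidate_pairs), "pairs kept")
```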
After the contacts were run through the HiC-Pro pipeline filtration and combined, the NS sample generated 22M Valid Contacts from the initial 440M unpaired sequencing reads, and the S1 and S2 samples had 65M and 34M, respectively (figure 5.7a). Although there were more contacts in the S samples, the values were all scaled down to 22M at this stage in the analysis to ensure proper comparisons between the samples [326]. After the Valid Contacts were downsized to 22M for all samples, the FitHiChIP package [326] was run using the following settings: PeakFile=<narrow.peak (generated by MACS2 [267])>, IntType=3, Binsize=5000, LowDistThr=10000, UppDistThr=3000000, BiasType=1, MergeInt=1, Qvalue=0.01, and P2P=<0 and 1>. To quickly summarize these settings, the input ChIP-Seq peak file, which was used to retain only contacts that interacted with ChIP-Seq peaks, was the one generated from the merged ChIP-Seq samples. IntType=3 was used to generate contacts which encapsulated at least one peak from the given ChIP-Seq input [326]. The reason not to limit the analysis to contacts which contained a ChIP-Seq peak overlap on both contact points was that, for H3K27ac, it was likely there would be looping domains not regulated directly by the histone modification. There were potential domains which lacked H3K27ac but whose looping was regulated by various TFs such as MEF2, CTCF, and GATA. The LowDistThr and UppDistThr set a limit on the distance of the Valid Contact to be between 10 kb and 3 Mb; the same limit was also used in the HiC-Pro pipeline, so this setting could be further adjusted as required. BiasType=1 was used for a coverage bias regression instead of ICE bias regression, MergeInt=1 merged overlapping contacts (within 1 bin), and Qvalue=0.01 set the threshold to retain any contact which had a significant qvalue (< 0.01) after FDR analysis. The last option listed was P2P, which had two inputs of either 0 (P2P0) or 1 (P2P1). Running both options was important, as it allowed either a stringent (P2P1) background or a looser background (P2P0). The package developers recommended running both settings and using P2P0 for lower sequencing depth [326]. The more stringent setting (1) would increase the stringency of the background so that only the highly significant loops were called. The less stringent setting (0) would allow the less significant loops to be called, which was important for sub-looping properties and non-scaffolding loops. Although the sequencing depth in the samples was not considered high for FitHiChIP (> 50 M), it was useful to use both of the P2P settings throughout the analysis. The P2P1 setting was important as it gave much more stringent loops, so the most important local regulatory looping could be visualized more clearly, and it was used for all HiChIP looping figures. P2P0 was used for differential analysis. In order to determine the best settings, both were used throughout the initial significance determination and subsequent differential analysis. The ability to use both sets of background, whether stringent or loose, allowed for a deeper understanding of the dynamics of H3K27ac looping in the NS vs S conditions.

Figure 5.6: Adapted from Bhattacharyya et al. Figure 1 [326]. The general workflow for the FitHiChIP pipeline. The first step was to take the paired-end Fastq files and bin and align the reads. Afterwards, only pairs which had a distance between 10 kb and 3 Mb were retained. The pairs were then combined with the MACS2 peaks [267] to generate the pairs which had a ChIP-Seq peak on at least one of the contact points. The pairs then underwent Bias Regression and Spline fitting before determining statistical relevance. The pairs could then be merged and subsequent differential loop analysis completed.
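Scaling every sample down to the depth of the smallest one, as was done here with the 22M Valid Contacts, can be sketched as random subsampling. The text does not state how the scaling was implemented, so this Python example assumes uniform sampling without replacement. The function name `downsample`, the fixed seed, and the toy contact lists are illustrative assumptions.

```python
import random


def downsample(contacts, target, seed=0):
    """Randomly keep `target` contacts (all of them if fewer are available)."""
    if len(contacts) <= target:
        return list(contacts)
    rng = random.Random(seed)  # fixed seed keeps the subsample reproducible
    return rng.sample(contacts, target)


# Toy stand-ins for the per-sample Valid Contact lists (scaled-down sizes).
samples = {
    "NS": list(range(22)),
    "S1": list(range(65)),
    "S2": list(range(34)),
}
depth = min(len(c) for c in samples.values())
scaled = {name: downsample(c, depth) for name, c in samples.items()}
print({name: len(c) for name, c in scaled.items()})
```

Fixing the seed is a design choice for reproducibility; a different seed would select a different, equally valid subset.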
After running FitHiChIP on the NS, stimulated replicate 1 (S1), and stimulated replicate 2 (S2) samples, there were about 90k significant interactions called with P2P0 and about 25k with P2P1 (figure 5.5d). These values were in line with previously published studies [53, 56, 165] and although the P2P1 number was a little lower, the numbers were acceptable to continue with the analysis. Overall, a high percentage of the P2P0 contacts (about 40%) were merged, while a lower percentage of the P2P1 contacts were merged (about 25%), which suggested the higher stringency P2P1 setting eliminated the less significant, sub-looping domains. The next step of the analysis was to determine where there was differential looping between the NS, S1, and S2 samples. For this, the DiffAnalysisHiChIP.r script within the FitHiChIP package [326] was used. For --AllLoopList, the output files from FitHiChIP for the samples were used: ./FitHiChIP/.../Coverage_Bias/FitHiC_BiasCorr/<sample>.interactions_FitHiC.bed. The rn6 chromosome size file (--ChrSizeFile) was downloaded from ftp://hgdownload.cse.ucsc.edu/ and the chromosome sizes were extracted. The --ChIPAlignFileList files were generated from the merged ChIP-Seq files using the bigWigToBedGraphWig tool [328] with the following command: ./bigWigToBedGraphWig. The following comparisons were completed for the differential analysis: NS vs S1 and NS vs S2. However, there was an option within the differential analysis to do the comparison NS vs S1 and S2. For this, a few more options were required: --ReplicaCount 2,1 --ReplicaLabels1 S1,S2 --ReplicaLabels2 NS, which indicated there were two replicates for the S samples and one for the NS. The differential analysis in the package used edgeR [329] to determine differential looping domains. One of the settings was to limit the minimum fold change of contacts to be 1.
Due to the nature of the experiment, the FDR threshold was not used because it was discovered the differential loops were likely to be very similar (data not shown). In most of the recently reported H3K27ac HiChIP data sets, the differential analysis is used between various cell types [55, 56, 72, 81, 330], making the exact settings needed for disease comparisons within the same cell type unknown. In total, there were six differential analysis runs completed, as seen in figure 5.7a,b, with various pvalue thresholds in order to determine the optimal results. The different settings yielded from around 10,000 differential loops down to about 900 in the most strict conditions. The condition which used both S1 and S2 to compare to NS (Both-D) yielded almost the same number of contacts, but many of them were not found in either the S1 versus NS (S1-D) or the S2 versus NS (S2-D) comparison for both P2P0 and P2P1, as seen in figure 5.7c,d. To select a pvalue threshold, the desired number of differential loops to carry forward was under 5,000, since visual inspection of the results showed that larger sets started to yield too many irrelevant loci throughout the genome [326]. For this reason, a pvalue < 0.0075 for differential loops was used. The P2P setting was also selected to be P2P0 to ensure the lower signal loops would be retained. As expected, all of the P2P1 loops were also contained within the P2P0 samples [326] (data not shown).

Figure 5.7: (A) Number of Differential Loops at different pvalue thresholds and P2P settings. NS vs S1 S2 compares the non stimulated condition with both of the stimulated conditions. NS vs S1 compares non stimulated with stimulated replicate 1 and NS vs S2 compares non stimulated with stimulated replicate 2. (B) Graphical representation of (A) showing the number of differential loops in each condition. Both-D refers to NS vs S1 S2, S1-D refers to NS vs S1, and S2-D refers to NS vs S2. (C) Venn Diagram of the number of overlapping differential loops between the three conditions with a P2P setting of 0. (D) Venn Diagram of the number of overlapping differential loops between the three conditions with a P2P setting of 1. (E) The chemical structure of norepinephrine. (F) The chemical structure of phenylephrine. (G) The types of differential loops which were retained in the analysis. The differentially expressed gene (DEG) had to be contained within the loop (left) or intragenically (right). (H) The non-differential loops were removed regardless of whether there were DEGs contained within (left) or intragenically (right).

Interestingly, regardless of the pvalue selected, for the S1-D samples there were no overlapping differential loops between Both-D or S2-D (p < 0.01 shown for simplicity) (figure 5.7c,d). If the loop was called in S1-D, it was found in both sub samples. S1-D had an overlap of over 70% with S2-D (figure 5.7c). S2-D also had a very high overlap, with over 60% of contacts. The Both-D samples generally performed more poorly, with an overlap fraction of around 27.5%. The reason for this discrepancy was likely due to uneven sampling when comparing replicates to a single sample in edgeR and, as a result, only the S1-D and S2-D were used for subsequent analysis. The final settings used for the differential looping were P2P0 with a pvalue < 0.0075. The next stages of the analysis were focused on determining if the differential peaks lay in loci known to be implicated in HCM.

5.2.5 H3K27ac HiChIP Correlation to Gene Expression

After the differential loops were determined, the next stage of the analysis was to check if there were any differentially expressed genes (DEGs) which correlated with differential loops. For this, a publicly available RNA-Seq data set was used [319]. Dai et al. used very similar stimulation conditions to those used in the samples here but used a slightly different stimulus, phenylephrine [?], instead of norepinephrine.
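The overlap fractions quoted above for the differential loop sets can be computed with simple set operations. The Python sketch below assumes two loops are "the same" when both binned anchors match exactly, which is an assumption and not necessarily the criterion used in the analysis. The function name `overlap_fraction` and the example loops are hypothetical.

```python
def overlap_fraction(loops_a, loops_b):
    """Fraction of loops in loops_a that are also present in loops_b."""
    a, b = set(loops_a), set(loops_b)
    return len(a & b) / len(a) if a else 0.0


# Hypothetical loops as (chrom, anchor 1 bin start, anchor 2 bin start).
s1_d = [
    ("chr1", 100_000, 250_000),
    ("chr1", 400_000, 900_000),
    ("chr2", 50_000, 120_000),
    ("chr3", 10_000, 80_000),
]
s2_d = [
    ("chr1", 100_000, 250_000),
    ("chr1", 400_000, 900_000),
    ("chr2", 50_000, 120_000),
    ("chr5", 300_000, 700_000),
    ("chr6", 20_000, 60_000),
]
print(overlap_fraction(s1_d, s2_d), overlap_fraction(s2_d, s1_d))
```

The measure is asymmetric because the denominator is the size of the first set, which is why the S1-D and S2-D overlaps reported in the text need not be equal.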
Although the chemicals are very similar (figure 5.7e,f), norepinephrine is known to induce a stronger stimulation response in heart tissues [331]. Phenylephrine has still been shown to induce a strong HCM response [319, 332], so this data set was still able to show similar RNA-Seq expression profiles as previously described [2, 190, 222, 232]. As a result, the data was used to correlate specific DEGs that were near the differential loops. To define DEGs, DESeq2 [286] was used and only genes which had a p-value < 0.05 and a log2(expression fold change) > 1.5 were retained. Two types of DEG were retained using a custom R script: DEGs that were either fully encompassed within the differential loop or located intragenically (figure 5.7g,h). These were not the only possible ways a cell could regulate gene expression; there could also have been a mechanism where a DEG was contained within a loop that was not differential. This type of non-differential loop could be a preformed looping domain, poised for gene expression. To complete this filtration, the differential loops were correlated to the DEGs which met one of these criteria because it has been shown that differential looping correlates with DEGs [53, 55, 56, 72]. As mentioned above, the differential loops were constrained to a p-value < 0.0075 as well as a log2(contact count) < 1.5 for the analysis. As part of the filtration of differential loops and DEGs, the contacts were classified into promoters or enhancers. The H3K27ac peaks were classified as promoters if they were within 500 bp of a transcription start site (TSS) and all other peaks were labeled as enhancers. The specific types of enhancers were not segregated, to keep the analysis from diverging in too many directions. The differential loops were then classified by whether they were interactions between promoters and enhancers (P-E), promoter and promoter (P-P), or enhancer and enhancer (E-E).
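The promoter/enhancer labeling and the P-P, P-E, E-E classification described above can be sketched as follows. This is an illustrative Python version, not the custom R script used in the analysis. The function names, the assumption that TSS positions are supplied per chromosome as a plain list, and the example coordinates are all hypothetical.

```python
def anchor_class(peak_start, peak_end, tss_positions, window=500):
    """Label a peak 'P' if any TSS lies within `window` bp of it, else 'E'."""
    for tss in tss_positions:
        if peak_start - window <= tss <= peak_end + window:
            return "P"
    return "E"


def loop_class(anchor1, anchor2):
    """Combine two anchor labels into P-P, P-E or E-E (order-independent)."""
    pair = "".join(sorted([anchor1, anchor2], reverse=True))  # 'P' sorts first
    return {"PP": "P-P", "PE": "P-E", "EE": "E-E"}[pair]


# Hypothetical TSS positions on one chromosome and two loop anchors (peaks).
tss_on_chr1 = [10_200, 75_000]
left = anchor_class(9_000, 9_800, tss_on_chr1)     # TSS 400 bp downstream
right = anchor_class(40_000, 41_000, tss_on_chr1)  # no TSS nearby
print(left, right, loop_class(left, right))
```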
P-P interactions can make up a large number of H3K27ac HiChIP interactions [56] and it has been shown that P-P interactions can impact transcription in yeast [333] and humans [56]. Since the gene regulation mechanisms in HCM are not fully understood, these interactions were of interest because local chromatin changes might be regulated by other types of interactions. Despite these small caveats, the differential HiChIP log2(contact count) was compared to the log2(expression fold change) to determine if there were any correlations between how the HiChIP looping was changing and the DEGs (figure 5.8a). Each point was a DEG that correlated to a differential loop, and the relative expression difference was shown. The significant interactions were then color coded by the type of interaction between the two contacts (P-P, P-E, or E-E). There appeared to be some artificial cut-offs introduced in the FitHiChIP differential analysis, with a strict cutoff for log2(contact counts) over 5.5 or below -5.5 and few loops between 1.5-5. This was likely caused during normalization and edgeR analysis and was seen in all the differential analyses completed here (data not shown). The graph in figure 5.8a showed a similar pattern for both of the replicates, S1-D and S2-D. The significant interactions had the three different classes and the breakdown of those interactions can be seen in figure 5.8b. Although there were slightly more interactions in the S2-D sample (686 versus 500), the overall breakdown of the interaction types was similar. The P-P interactions were much fewer in number compared to the P-E or E-E interactions, and the majority of the focus in the analysis was on either P-E or E-E interactions. Since the type of enhancer was not further classified, it was expected that these would make up a large number of the interactions. The P-E subset was interesting as it implied the regulation of the gene was based on promoter activation.
Interestingly, the E-E interactions, which made up over 50% of the classes, indicated some gene regulation occurred within the looping domains. These interactions were useful in looking at specific loci within the genome but were not explored in great detail genome-wide. As a result, they were classified and could be used in the future to look at specific interactions of interest, serving as a resource for the community. Although the classifications of the significant regions were interesting, a surprising finding was how the HiChIP loops affected overall gene expression. Although it was known that repressors have an important role in gene expression [334], there is often a large focus in the field on enhancers and promoters [4, 55, 56, 335] without much attention being drawn to the role of silencers in gene regulation. As shown in figure 5.8c, the majority of interactions (70%) showed that differential HiChIP loops had a negative rather than a positive regulatory effect on gene expression. The loss of a loop was associated with an up regulation of gene expression in about 35% of the regions, while the loss was associated with a down regulation of gene expression in less than 10% of the correlated regions. Along the same lines, when a HiChIP loop was gained between NS and S, about 35% of the significant contacts also saw a down regulation of the gene (figure 5.8c). There was also almost double the amount of gained loops with upregulated genes, indicating there was still some regulation of newly formed loops. The data suggested that silencers in S (likely MEF2/HDAC interactions [2, 3, 7–9, 202, 222, 232, 233]) are able to remodel and regulate chromatin through small looping changes (the median differential loop was about 60 kb (figure 5.7a)).
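The four-way comparison summarized in figure 5.8c (loop gained or lost versus gene up or down regulated) can be sketched as a simple tally. The sketch below is a hypothetical illustration rather than the analysis code used here; the function names are invented, and the sign convention (positive log2 values mean stronger in S than in NS) is an assumption.

```python
from collections import Counter


def quadrant(loop_log2fc, expr_log2fc):
    """Assign a (loop, gene) pair to one of the four classes.

    Assumed convention: positive values mean higher in S than in NS.
    """
    loop = "gained" if loop_log2fc > 0 else "lost"
    expr = "up" if expr_log2fc > 0 else "down"
    return f"{loop}/{expr}"


def quadrant_fractions(pairs):
    """Fraction of (loop_log2fc, expr_log2fc) pairs falling in each class."""
    counts = Counter(quadrant(loop, expr) for loop, expr in pairs)
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


def opposing_fraction(pairs):
    """Share of pairs where loop and expression change in opposite directions."""
    fractions = quadrant_fractions(pairs)
    return fractions.get("gained/down", 0.0) + fractions.get("lost/up", 0.0)
```

Under this convention, the "gained/down" and "lost/up" classes together correspond to the negative regulatory relationship discussed above.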
The impact on gene expression and remodeling in S is likely due to more TFs than just MEF2/HDAC, which suggests these regions were poised in some manner for transcription in a quick response to stimuli. This concept of poised chromatin regions being regulated by MEF2/HDAC was also consistent with HDAC4-NT, in which these genes can be quickly activated and deactivated as the levels of HDAC in the nucleus vary upon stimulation. The higher order structure of HDAC4-NT would allow for the conservation of the silencing mechanism, and local rearrangement would not occur. Studies which activate HDAC-NT would be needed to further confirm this finding. However, this hypothesis of DEGs being controlled by silencers such as HDAC would also explain why differences in HiC experiments have not been found [10]. The changes in S were on a local level within the chromatin and there were few instances of brand new H3K27ac marks around the differential loops.

Figure 5.8: (A) HiChIP contact count fold change versus RNA-Seq expression fold change (log2 values). Each dot represents a differentially expressed gene (DEG). Red represents S1 Diff differential loops and purple indicates S2 Diff differential loops. The significance threshold was set at log2(1.5). The significant interactions were then categorized based on whether they interacted between Promoter-Promoter (P-P) (orange), Promoter-Enhancer (P-E) (yellow), or Enhancer-Enhancer (E-E) (blue). The enhancers were not subdivided into further classifications. (B) Pie chart of the percentage of the significant differential loops and DEGs. (C) Correlation between the type of interaction and the effect of differential loop and DEG. There were four types of interactions: gain/loss of a HiChIP loop and increased/decreased gene expression. There was not a significant difference between the types of interactions: P-P, P-E, and E-E.
There was a general up regulation of H3K27ac (about 8,000 peaks (figure 5.3c)), which indicated the genes were more active, but the NS H3K27ac was still present. As a molecular mechanism, this would allow cells to quickly respond to various stimuli in order to activate and subsequently deactivate specific regions of the chromatin. The minor changes in 3D interactions would not disrupt global chromatin structure, which has been shown to be mediated by TADs clamped by CTCF/Cohesin [54, 70]. Studying the role of minor chromatin changes in genome-wide structure has only recently become possible with the invention of assays like HiChIP [53] and ChIA-PET [336], so wide-scale silencing mechanisms could be more common than previously thought, as suggested by this data. The data generated here between NS and S contains a vast amount of information and will allow for in-depth 3D analysis of numerous loci in greater detail than previously possible. The body of the work presented here was focused on one specific locus, leaving many other regions to be probed in the future.

5.2.6 NPPA and NPPB Loci Interactions background

A highly studied region in HCM is the natriuretic peptide hormone a and b (NPPA/NPPB) locus, due to its importance in HCM, heart development, and other cardiac diseases [3, 264, 266, 319, 337]. The two genes are essential in heart development [264] and both are expressed in heart tissue throughout development. Although NPPB expression is drastically reduced in mature tissue [3, 319], up regulation has been directly correlated to heart disease [246, 338, 339]. Recently, the two proteins have been used as molecular markers for heart disease and heart failure in the clinical setting [266, 340, 341]. The locus has been extensively studied, but its structural regulation is not fully characterized. The locus has been shown to be evolutionarily conserved [266, 342–345], so it is important to understand how the 3D landscape is changed under disease conditions.
The locus is unique because the two genes are separated by only about 10 kb (figure 5.9a) but have vastly different gene expression regulation. NPPA was expressed at an almost 10-fold higher rate than NPPB in NS cardiomyocytes [319], which has led to the question of how cells are able to segregate the two genes. Sergeeva et al noted that there have been other similarly structured loci (Iroquois and Hox) that shared regulatory mechanisms [264, 346, 347], leading to the hypothesis of conserved regulation in the NPPA/NPPB locus. Interestingly, upon stimulation in cardiomyocytes, NPPA and NPPB both had a large up regulation of gene expression and were two of the highest upregulated genes in S conditions [3, 319], with over 10-fold the expression levels compared to NS. The locus was also one of the top regions which had a DEG contained within a differential loop. Instead of focusing directly on the differential loop, the whole locus was observed because there have been intensive studies completed on the region to understand the gene regulation [264, 265, 343, 348–350]. A few regulatory elements (REs) both upstream and downstream of NPPA and NPPB have been implicated, involving independent mechanisms in development and cardiac stress [264, 265, 351, 352]. A large region around the gene cluster (300 kb) was highly correlated through GWAS with heart disease [349]. The main REs highlighted by Sergeeva et al [264] were the NPPB promoter and a region from -27 to -22 kb from the NPPA promoter (figure 5.9a). The authors also mentioned a region -40 to -35 kb from NPPA which they hypothesized was an enhancer for both NPPA and NPPB [264] that was regulated by TF binding [334]. The chromatin had up regulation of H3K27ac in the promoter regions of both NPPA and NPPB [246], which recruited p300 to activate the genes further [353].
A very recent study by the same group looked further into the upstream enhancer region and showed competitive gene expression between NPPA and NPPB. Interestingly, the deletion of this enhancer region caused hypertrophy [5]. The authors of the study developed a model for the interactions in the locus between the upstream regulatory element and the chloride transport protein 6 (CLCN6) promoter (figure 5.9b). The overall structure of the locus was altered depending on whether NPPA or NPPB was being expressed, due to different regulatory elements interacting with the particular gene being expressed. The authors in Man et al. found the interaction between the downstream RE was enough to cause activation of the gene and there was competition between the two genes to be expressed [5]. When the element was knocked out, neither gene was expressed [5], indicating the importance of the element to the whole locus.

These data suggested the locus, specifically NPPB, has MEF2/HDAC regulation, and upon hypertrophic activation, when HDAC is localized into the cytoplasm, gene expression and chromatin remodeling can occur because MEF2 is known to recruit and bind p300 [69]. When HDAC was knocked down in cardiomyocytes in vivo there was a drastic up regulation of NPPB [354]. Altogether, this locus has been studied in depth, and the work completed here further analyzed the mechanism of NPPA/NPPB activation and the resulting implications for chromatin structure.

5.2.7 NPPA and NPPB Loci Interactions

Recent studies completed by the Christoffels group [5, 264–266] have had a large impact on the understanding of the NPPA/NPPB locus and have begun to show how the locus is regulated. As mentioned above, their most recent study (January 2021) [5] helped to identify the -40 to -35 kb region from NPPA as a major RE through extensive knockdown assays in vivo. These REs have been highlighted in the H3K27ac ChIP-Seq data completed here (figure 5.10a, gray bars).
Man et al showed the removal of RE1a almost completely downregulated both NPPA and NPPB but did not affect the expression of genes in the rest of the locus [5]. Interestingly, the knockdown of RE1b did not have an effect on gene expression [5]. The removal of the RE2 and RE3 regions caused an up regulation of NPPA but not NPPB, which led to the conclusion of co-regulation of the two genes by RE1a [5]. The paper was able to correlate the RE1a region to a stress response but was unable to determine any REs downstream from NPPA which could also be co-regulating the locus. Although the group completed 4C experiments in the region [5, 264], the resolution and scope of the analysis was limited by the bait sequence selected. As a result, the full extent of interactions could not be completely understood; however, the work completed here has allowed all the interactions within the locus to be characterized. Figure 5.10b shows the interactions in NS and S conditions within the locus Man et al. focused on [5]. The data showed overlap between all of the regions outlined in Man et al.; however, their data was not expanded to include the CLCN6 promoter [5]. The data suggested a new loop at RE1a - RE5 formed upon stimulation and caused a perturbation in chromatin looping. This site was in the predicted downstream region from NPPA and appears to have had a large impact on the structure of the locus.

Figure 5.9: (A) The natriuretic peptide hormones a and b (NPPA and NPPB) locus, which includes CLCN6, PLOD1, and RGD1305350. The upstream regulatory element (RE) was identified from Sergeeva et al [264]. (B) Adapted from Figure 5 of Man et al [5], depicting the interactions in the locus from (A). A CTCF clamp forms between PLOD1 and CLCN6. The upstream RE can interact with either NPPA (yellow) or NPPB (green), causing chromatin remodeling. The exact regulation of these interactions is not fully understood.
Through the looping, RE5 was brought close to RE1a but not RE6, which was a known mouse embryonic stem cell (ESC) TAD from the CLCN6 promoter to about 125 kb upstream [5, 355, 356]. As a result, the hypothesized RE5 was confirmed here and expanded the potential regulatory mechanisms within the locus. The loop from RE1a to RE6 was lost in the S conditions, as can be seen in figure 5.10b, which resulted in chromatin rearrangement within the expanded locus. RE6 not only looped to RE1a but also to RE3 (NPPB promoter region), which then looped to RE1b, bringing all the regions close in 3D proximity. The data suggested an explanation for why the deletion of RE1b did not have an effect on transcription [5]. The NS loop seen from RE1a to RE2 likely maintained the segregation of NPPB from the main regulatory control mechanisms in the locus. The deletion of RE1b would not have an effect on RE1a - RE3 - RE6 being close in proximity, which kept transcription of NPPA consistent. However, when Man et al knocked out RE2 and RE3, there was a large up regulation of NPPA but not NPPB [5]. The HiChIP data completed in this study suggested a possible explanation for this observation as well. As can be seen in the S conditions (figure 5.10b bottom panels), there were interactions between RE1a - RE3 - RE5, which indicated the regions were close in 3D proximity. Also of note, there were no interactions between RE1a or RE1b and RE2 in the S conditions. To understand the implication, the looping scaffold in NS could be further investigated to show the network of RE regions in the locus, as shown in figure 5.10c. In 3D space, all of the REs would be close in proximity. However, by removing RE2 and RE3, the network of loops lost its structural integrity, leaving only the RE1a to RE6 loop, which led to up regulation of NPPA [5].
This led to the hypothesis that when the cells are not stimulated, RE1a has a natural preference for NPPA activation (figure 5.9b left panel), as has been shown by 4C experiments [5]. These data suggested the co-regulatory behavior of RE1a naturally has a preference for NPPA, and only upon the removal of potential regulatory silencers like MEF2/HDAC is NPPB activated, as seen in HCM [69, 319]. There was a new looping domain in S which was much stronger in S1, but a variation was also present in S2. RE4 was right at the promoter of NPPA, which suggested regulatory control and a potential mechanism for the up regulation of NPPA.

Figure 5.10: (A) H3K27ac tracks of the NPPA/NPPB locus for non stimulated (No Stim) and stimulated (Stim) conditions. The regulatory elements (RE) noted in Man et al [5] are highlighted in gray and show overlapping regions with H3K27ac. Tracks were generated using the WashU Epigenome Browser. (B) Combined H3K27ac ChIP-Seq and HiChIP experiments for non stimulated (blue) and stimulated conditions (red and purple). The looping pattern is altered between stimulated and non stimulated and corresponded to newly identified REs highlighted in gray. (C) The looping interactions between REs in the non stimulated sample. (D) The looping interactions between REs in the stimulated replicate 1 sample. (E) The looping interactions between REs in the stimulated replicate 2 sample. (F) The consensus looping interactions between REs in both stimulated samples. The RE4 interaction was unique to the stimulated conditions, while the RE6 interaction was completely lost. (G) Adapted from Figure 3 of Wu et al [261], containing the consensus sequences for MEF2. These sequences were used to probe the locus for possible MEF2 binding sites. (H) Vertical bars indicate all potential MEF2 binding sites in the locus. The gray bars are associated with the RE position. There were MEF2 overlaps near the majority of the REs, suggesting the role of MEF2 in the locus.
Overall, the looping was maintained from either RE1a to RE4 (S1) or RE2 to RE4 (S2), as shown in figure 5.10d, e. The consensus interactions (figure 5.10f) showed RE4 tied the locus together and, along with stronger interactions with RE5, allowed for the up regulation of both NPPB and NPPA. Due to the co-regulatory mechanism of RE1 [5], the stronger interactions to the NPPB promoter (RE1a to RE3) and to the NPPA promoter (RE1a to RE4) indicated the looping was stronger with the loss of the RE1a to RE6 regulatory loop. This suggested the importance of both RE4 and RE6 in the expression regulation of NPPA/NPPB and chromatin structure. To better understand the importance of RE4, the locus was tested for potential MEF2 binding sites, since NPPB was known to be upregulated when HDAC was knocked out in NRVMs [354]. If MEF2 binding was linked to the RE, MEF2/HDAC interactions could be responsible for the chromatin remodeling. In figure 5.10g, the consensus binding motifs for MEF2 [261] are shown; these sequences were then searched throughout the locus. As can be seen in figure 5.10h, there were two locations in the REs which had a MEF2 binding site, with many others proximal to other REs. The binding position within RE1a was confirmed via human ChIP-Seq in Man et al [5], which suggested MEF2 regulation was occurring in the locus. Interestingly, the only other overlap was in RE4, which was an S specific loop. Due to the lack of interactions seen to RE4 in NS and the known up regulation of NPPB without HDAC [354], it is likely the RE1a - RE4 loop was mediated by the removal of HDAC to the cytoplasm. Although further testing would be required to confirm this, there was overwhelming evidence provided by the various studies mentioned here combined with the HiChIP data. Although there was evidence of MEF2/HDAC regulation in the locus, current literature did not highlight the importance of the CLCN6 promoter (RE6) because most studies limited the studied region to a 100 kb locus.
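The motif search described above can be illustrated with a short Python sketch. It is not the procedure used to produce figure 5.10h: the function names are hypothetical, and the motif string is the commonly cited MEF2 consensus YTA(A/T)4TAR written in IUPAC codes, which is an assumption and not necessarily the exact set of sequences taken from Wu et al [261].

```python
import re

# IUPAC nucleotide codes needed for the motif, expressed as regex classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "Y": "[CT]", "R": "[AG]", "W": "[AT]", "N": "[ACGT]",
}

# Assumption: commonly cited MEF2 consensus, YTA(A/T)4TAR.
MEF2_CONSENSUS = "YTAWWWWTAR"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    return sequence.translate(_COMPLEMENT)[::-1]


def find_motif_sites(sequence, motif=MEF2_CONSENSUS):
    """Return sorted 0-based start positions of motif matches on either strand.

    Positions are reported in forward-strand coordinates.
    """
    sequence = sequence.upper()
    core = "".join(IUPAC[base] for base in motif)
    pattern = re.compile(f"(?=({core}))")  # lookahead keeps overlapping hits
    starts = {m.start() for m in pattern.finditer(sequence)}
    n, k = len(sequence), len(motif)
    for m in pattern.finditer(reverse_complement(sequence)):
        starts.add(n - m.start() - k)  # map back to forward coordinates
    return sorted(starts)


def sites_in_region(sites, region_start, region_end, motif_len=len(MEF2_CONSENSUS)):
    """Keep sites whose full motif lies inside [region_start, region_end)."""
    return [s for s in sites if region_start <= s and s + motif_len <= region_end]
```

A search of this kind only reports sequence matches; it does not indicate whether a site is occupied, which is why the overlap with ChIP-Seq data is examined separately.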
When the locus was expanded to the larger TAD domain, however, there were more discoveries which highlighted the importance of RE6, RE4 and RE5. Once the locus was zoomed out further, an important enhancer (RE7) between two F-box proteins, FBXO6 and FBXO44, was discovered (figure 5.11a). Upon stimulation, the interaction between RE6 - RE7 was completely lost. The overall structure of RE1a - RE7 was maintained directly in the S1 sample and indirectly in S2 between RE1a - RE5 - RE7, which indicated the region was a potential intra-TAD boundary, as shown in previous HiC work [10]. The looping data in the S1 sample suggested RE7 loops to RE4, a known MEF2 region.

To explore this concept further, publicly available mouse (reference genome mm10) ChIP-Seq data was used to determine if any important TFs overlapped the locus and specifically RE6. As shown in figure 5.11b, the following ChIP-Seq data was used: H3K27ac [357], CTCF [358], Nk2 Homeobox 5 (NKX2-5) [359], p300 [358], GATA Binding Protein 4 (GATA4) [360], MEF2A [361], and MEF2C [361]. The MEF2 ChIP-Seq samples were completed in neuronal cells while the other samples were in the heart epithelium. There were a very limited number of publicly available ChIP-Seq tracks for MEF2, so the use of the neuronal samples was required. The neurons are known to express MEF2 and there is a conservation of MEF2 sites after differentiation [255]. NKX2-5, GATA4, and MEF2 have been implicated as core cardiac TFs and are known to act cooperatively with each other to stabilize chromatin structure [5]. The mm10 ChIP-Seq data from numerous different experiments (figure 5.11b) also showed overlap between the various tracks, which highlighted the co-binding ability. The H3K27ac mm10 data showed a high correlation to the NRVM data completed here and was used as a benchmark for assigning the REs (due to the conserved nature of the locus) [266, 342–345].
The NKX2-5 peaks showed an overlap with all the assigned REs, supporting the importance of the TF to the locus, and suggested the possible role of MEF2 throughout the locus. As shown in figure 5.11b, there was overlap of NKX2-5 and MEF2 at RE4, RE6, and RE7, which supported the hypothesis of MEF2 regulation. These data suggested HDAC bound to MEF2 bridged RE6 to RE7 and, upon stimulation, the binding switched to MEF2/p300 from RE4 to RE7, which caused NPPA up regulation. The evidence from the HiChIP data showed the importance of RE6 at the CLCN6 promoter and the loss of MEF2/HDAC regulation.

The data presented here suggested the locus extensively studied by Man et al [5] was one of three intra-TAD domains within the whole 600 kb TAD. The other two major looping domains were between RE7 and an even more distal RE8 region 300 kb downstream at the mechanistic target of rapamycin kinase (MTOR) and UbiA prenyltransferase domain-containing protein 1 (UBIAD1) promoters (figure 5.12a). In NS, RE6 - RE8 formed a large loop, but upon stimulation, the contact at RE8 swapped to form a new domain with RE7. These data suggested there was long-range regulation at RE6. Looping at RE6 was lost in S; instead, the loop swapped to RE7, which in turn looped to RE4, an identified MEF2 binding region (figure 5.11b).

Overall, the perturbations in H3K27ac HiChIP looping, paired with the extensive understanding of the small locus between RE1 and RE6 [5] and the ChIP-Seq overlaps, gave an understanding of the regulation of the entire TAD. A hypothetical model of the interactions is shown in figure 5.13a, b, which suggested how MEF2/HDAC interactions regulate the locus based on the HiChIP data. The four main interactions which regulate the locus in NS had NKX2-5, MEF2, and GATA overlapping ChIP-Seq peaks (figure 5.12b) and also formed HiChIP contacts (figure 5.12a).

Figure 5.11: (A) H3K27ac ChIP-Seq and HiChIP of the extended NPPA/NPPB locus highlight a new regulatory element (RE), RE7. All REs are highlighted by gray bars and labeled below. The tracks were visualized in the WashU Epigenome browser. (B) Publicly available mm10 ChIP-Seq tracks in the overlapping region to the rn6 data in (A). The tracks are: H3K27ac [357], CTCF [358], Nk2 Homeobox 5 (NKX2-5) [359], p300 [358], GATA Binding Protein 4 (GATA4) [360], MEF2A [361], and MEF2C [361]. The H3K27ac track was used as a benchmark for the RE placement. The gray bars highlight the specific RE labeled below the tracks. The tracks were visualized in the WashU Epigenome browser.

Figure 5.12: (A) H3K27ac ChIP-Seq and HiChIP of the further extended NPPA/NPPB locus highlight a new regulatory element (RE), RE8. All REs are highlighted by gray bars and labeled below. The tracks were visualized in the WashU Epigenome browser. (B) Publicly available mm10 ChIP-Seq tracks in the overlapping region to the rn6 data in (A). The tracks are: H3K27ac [357], CTCF [358], Nk2 Homeobox 5 (NKX2-5) [359], p300 [358], GATA Binding Protein 4 (GATA4) [360], MEF2A [361], and MEF2C [361]. The H3K27ac track was used as a benchmark for the RE placement. The gray bars highlight the specific RE labeled below the tracks. The tracks were visualized in the WashU Epigenome browser.

Figure 5.13: (A) Interactions of the complete NPPA/NPPB locus under non stimulated conditions. Regulatory elements (RE) with a circle did not contain overlap with all three TFs: MEF2, NKX2-5, and GATA4. REs with boxes indicate the overlap. Purple arrows indicate proximity in 3D space; orange indicates a CTCF dimer formation. (B) Interactions of the complete NPPA/NPPB locus under stimulated conditions. Regulatory elements (RE) with a circle did not contain overlap with all three TFs: MEF2, NKX2-5, and GATA4. REs with boxes indicate TF overlaps. Purple arrows indicate proximity in 3D space; orange indicates a CTCF dimer formation.
Given the ability of MEF2/HDAC to form tetrameric structures [67], the data suggested MEF2/HDAC lock the TAD with the support of CTCF, GATA, and NKX2-5 and keep NPPA activated while segregating NPPB. There were some common features between S and NS. The data suggested at least two CTCF sites were retained, and likely more. There was a strong CTCF site upstream of RE0 which was likely binding upstream from RE1, ensuring the proximal distance of RE1 to the main gene cluster, as shown by a purple arrow in figure 5.13a. This was important to note because without the ability of RE4 to localize into the same 3D space as RE1, NPPB could not be activated. These data suggested MEF2 mediated interactions are required. Man et al [5] showed the competitive nature of RE1, but the authors did not suggest a mechanism for how NPPB was not activated in healthy cardiomyocytes [319]. The hypothesis presented here is that the segregation of RE0 - RE6 - RE7 - RE8 eliminated the ability of MEF2 at RE4 to come into close enough proximity to activate NPPB. Through this mechanism, the second CTCF site between RE0 - RE1 and RE5 - RE6 was maintained, as shown by Man et al [5].

Upon stimulation (figure 5.13b), due to HDAC's segregation to the cytoplasm, RE4 was relocated to the main cluster between RE0 and RE8 with the support of NKX2-5 and GATA. This then allowed for the competition of NPPB and NPPA to occur with RE1. The p300 binding data shown in figure 5.12b suggested these three loci were bound by p300, and thus helped to explain the activation of the locus. The remodeling of RE4 - RE0 - RE8 explained how the RE6 loop was lost under stimulation. The looping from RE7 - RE8 varied slightly between the stimulated replicates (figure 5.12a), which indicated the interaction was less strong than in NS. The looping of NKX2-5 and GATA likely kept the locus close in proximity, but without HDAC segregation of the MEF2 sites, strong looping was lost.
5.3 Discussion

In the study presented here, H3K27ac ChIP-Seq and HiChIP were completed on NS and S NRVMs in order to compare an HCM model and determine cis regulatory elements being altered in the disease. The project began with the harvesting of 1-3 day old neonatal rat left ventricle cardiomyocytes, and the cells were cultured for ten days before stimulating half of the cells to induce an HCM phenotype. The cells were then formaldehyde fixed and stored before running both ChIP-Seq and HiChIP using H3K27ac antibodies. The goal of the study was to understand cis regulatory elements in HCM and whether MEF2/HDAC were able to perturb local chromatin interactions.

The first part of the analysis was to optimize and complete H3K27ac ChIP-Seq on both the NS and S conditions. The assay was completed with 100,000 cell equivalents in four replicates for each condition on an automated platform using a previously published protocol [128, 129]. The replicates had a high Spearman correlation of > 0.9; correlations between S and NS were slightly lower but still high (figure 5.4a). The peaks were called using MACS2 [267], and there was an expected increase of H3K27ac peaks in S, which suggested the up regulation of genes in HCM [3, 319]. About 10,000 of the peaks were statistically different, with the majority being upregulated in S (figure 5.4b, c). These findings matched what was expected, as a larger number of genes was known to be activated under HCM [2, 3, 7, 8, 190, 222, 319].

After the ChIP-Seq and subsequent analysis was completed, the H3K27ac HiChIP assay was run. There were three HiChIP samples (NS, S1, and S2), which allowed for the comparison between NS and S. After sequencing, the analysis began with mapping reads to the rn6 genome, followed by the HiC-Pro pipeline [324] and finally the FitHiChIP package [326].
After mapping, there were at least 400 million reads in the samples (figure 5.4c), so HiC-Pro was used to restrict the contacts to those which were valid FR, RF, FF, or RR contacts (figure 5.5b), were not PCR duplicates, and were cis interactions (the second contact was between 10 kb and 3 Mb away from the first) (figure 5.5c). This filtering reduced the number of contacts by about 90% (figure 5.5a), which was in line with previous H3K27ac HiChIP studies [53, 56]. The fewest number of valid contacts (22 million pairs) was in the NS sample, and as a result the S1 and S2 samples were down-sized to the same amount (figure 5.5d). The FitHiChIP package was then used to determine both the most statistically important contacts and the differential loops between NS and S [326]. Some settings had to be optimized to determine the statistically important contacts, but the main choice was between using P2P0 and P2P1. Although both were used in this work, they had different purposes. The number of contacts recommended for statistical analysis using P2P1 was about 50 million [326], so this setting was only used for visualization of the HiChIP loops. As a result, the differential analysis was completed using the P2P0 settings and a p-value cutoff of < 0.0075 (figure 5.7a-d).

The last bioinformatic analysis completed was to determine the differential loops which also contained differentially expressed genes (DEGs). A custom R script was used to accomplish this by looking within the differential loops for DEGs that were within or intragenic. The other regions, although likely containing other gene regulatory mechanisms, were not further probed in the study. The type of interacting loop (P-P, P-E, E-E) did not reveal much about the mechanism of HCM gene regulation, except that there was an enrichment of P-E and E-E interactions, which suggested the important role of enhancers in the disease.
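The cis distance filter and the down-sizing step described above can be sketched as follows. This illustrates the logic only and is not HiC-Pro's implementation or the pipeline used here; the function names, the tuple layout of a contact pair, and the random-subsampling strategy with a fixed seed are assumptions made for the example.

```python
import random

MIN_DIST_BP = 10_000      # 10 kb lower bound from the text
MAX_DIST_BP = 3_000_000   # 3 Mb upper bound from the text


def keep_cis_in_range(pairs, min_dist=MIN_DIST_BP, max_dist=MAX_DIST_BP):
    """Keep contact pairs on the same chromosome within the distance window.

    Each pair is assumed to be a tuple (chrom1, pos1, chrom2, pos2).
    """
    return [
        p for p in pairs
        if p[0] == p[2] and min_dist <= abs(p[3] - p[1]) <= max_dist
    ]


def downsample_to_smallest(samples, seed=0):
    """Randomly subsample every sample to the size of the smallest one.

    `samples` maps a sample name (e.g. "NS", "S1", "S2") to its list of pairs.
    """
    target = min(len(pairs) for pairs in samples.values())
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return {name: rng.sample(pairs, target) for name, pairs in samples.items()}
```

Equalizing the depth in this way keeps differences in loop calls between NS and S from simply reflecting differences in the number of valid pairs.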
A recent HiChIP study in immune cells found around 50% P-P interactions [56], so the depletion of that interaction type when correlating to DEGs could be interesting to investigate (figure 5.8a, b).

An interesting finding was highlighted in figure 5.8c, which showed the gaining of a differential loop generally led to a decreased expression of a gene (2-3 fold higher than increased expression) and the loss of a differential loop generally led to an increase of expression of the gene (3-5 fold higher than decreased expression). This finding was potentially HCM specific and could indicate the importance of silencers in gene regulation. It has been shown that MEF2/HDAC interactions are important in the disease [2, 3, 7, 8, 190, 222, 319]. The data suggested when a loop was lost in S, a silencer (such as HDAC) could explain the increased gene expression. Although it was not explored, the gaining of a loop was likely a reaction to the same mechanism of HDAC segregation, causing genes which used to be cis to activators to be segregated to new chromatin regions and subsequently down regulated. The genome-wide study was concluded here, although there were many interesting loci identified. Of note, five of the top six genes were classical HCM genes [2, 3, 190, 319]: NPPA and NPPB [5, 264–266], xin actin binding repeat containing 2 (XIRP2) [?, 294, 295], actin alpha 1 (ACTA1), and SCN3B [321]. These genes have also been known to impact development, which suggested a similar mechanism, but only the NPPA/NPPB locus was explored in great detail.

The NPPA/NPPB locus had been implicated in HCM and other cardiovascular disease and had been extensively studied previously [5, 264–266]; although the genes are about 10 kb from each other, they have vastly different expression. The work here built upon a recent study by Man et al [5], which determined a competitive regulatory element (RE) upstream from the genes, as shown in figure 5.9b.
Although there were extensive knockouts and experiments completed in that work, there were more REs, which have been further explored here. The hypothesis of competitive activation by the RE1 region was confirmed here using different methods, and the data here suggested the mechanism of the competition to be from MEF2/HDAC interactions. More experimentation would be needed to confirm the finding. Man et al probed the locus and found three REs through knockout, 4C, and expression experiments [5]. The HiChIP data generated here located six more REs throughout the entire 600 kb TAD (figure 5.12). There was an interesting correlation found between a few of the REs responsible for the strongest looping and ChIP-Seq. There were NKX2-5, GATA4, and MEF2 overlaps found when compared with publicly available ChIP-Seq data. Although the binding of these factors was sometimes medium to low level peaks, the correlation with the REs was very strong. All of the REs contained all of the marks except RE1-3, which lacked MEF2 (figure 5.12b). These data suggested that MEF2/HDAC played an important role in the locus and regulated looping with known co-regulators NKX2-5 and GATA4 [3, 5, 264].

In NS, there was a cluster of four REs which localized together and all contained MEF2/NKX2-5/GATA4 overlaps: RE0, RE6, RE7, and RE8. It has been hypothesized here that HDAC bound to MEF2 would form tetrameric segregation of chromatin (figure 1.11d, e). The data suggested that while RE1 was not regulated by MEF2, it was still in close contact with the main hub of RE0, 6, 7, 8, as shown in figure 5.13a. The segregation of RE1 was likely mediated by two CTCF binding pairs upstream and downstream of RE0. The structure of the locus kept NPPA proximal to RE1, which allowed for the activation of the gene in NS. The long-range looping patterns were altered in S, which further supported the regulation of the locus by MEF2.
Once stimulated, there was a known upregulation of both NPPA and NPPB, which has not previously been explained. As previously mentioned, NPPB was upregulated when HDAC was mutated [354], which suggests a role for MEF2 at the locus. Here, the data suggested a role for RE4, which showed MEF2 binding in mice. Looping to RE4 occurred in both S replicates, while NS lacked looping to RE4. This suggested a reorganization of the chromatin that allowed upregulation not only of NPPA but also of NPPB. As shown in figure 5.13b, the central hub of RE0, 6, 7, 8 was altered upon stimulation: RE6 was removed and RE4 was bound instead. The RE7 interaction was also loosened, as indicated by the varied looping from RE8 in the S2 condition. It was likely the loop was retained by CTCF instead of MEF2, but further studies are needed to confirm this. The reduction of four proximal MEF2 sites to three was also supported by the overlap of p300 (shown previously to bind up to three MEF2 (figure 1.12b) [67]). RE4 joining RE0 and RE8 in the center cluster also implied that RE1 was proximal. RE4's position between NPPA and NPPB suggests how both genes are so highly upregulated [3, 319], because it would bring both genes close to RE1 and allow for the competitive nature of the locus shown by Man et al [5].

These findings have been missed by previous experiments such as HiC [10] because the chromatin changes were localized. Since the main TAD and most of the intra-TAD domains were not altered, the resolution of HiC would not be high enough to discover these changes between NS and S. However, by using HiChIP and publicly available ChIP-Seq, a mechanistic model could be determined. More studies are required to confirm these findings, but the data suggested new insights into the regulation of this highly studied locus. These insights highlighted the importance of MEF2 in the locus and its implications for gene regulation.
The data also support the hypothesis of the structural importance of HDAC within the nucleus while also explaining how NPPB could be upregulated by an HDAC mutant. These findings will be of great interest in the field because little has been done to identify the exact role of MEF2 interactions in HCM. The protective nature of HDAC-NT on the NPPA/NPPB locus has previously been identified [3], which supported the evidence of HDAC interactions even further. The data shown here were generated genome-wide, so other loci like those mentioned above could be probed in greater detail by the community. The resource could be paired with many other analyses to confirm previously hypothesized interactions and regulatory elements in HCM. The data will be of high value to the community.

6 Discussion

The work presented here had a central focus of determining the role of MEF2 in hypertrophic cardiomyopathy (HCM), and three approaches were taken to accomplish this. The first was the development of a new technology to enable the understanding of MEF2 binding, with the hope of scaling to an increasing number of targets. A second, computational, approach was tested using publicly available data-sets in order to understand how the genome was being rearranged upon hypertrophic activation using HiC and ChIP-Seq data. Finally, using a newer technology in the field, HiChIP, 3D interactions in healthy and stimulated cardiomyocytes revealed many insights into the mechanism of the disease as well as MEF2/HDAC interactions in a specific locus. The data-set was generated genome-wide and will be a useful tool in the field to understand the numerous loci being perturbed in HCM.

Although the projects presented here began with the same goal in mind, different approaches and a wide variety of techniques were used at each stage.
These projects built a deep understanding of both the experimental and the computational analysis, and through that a deeper understanding of the entire set of work discussed here.

There remain numerous unknowns in HCM. The work here attempted to answer the underlying question of how the disease progresses at the chromatin level but did not address the 1,400 single nucleotide polymorphisms (SNPs) in at least 15 commonly mutated genes [192, 225], which the HiChIP data could be used to investigate further. With the data it would be possible to determine overlaps between differential loops and disease-causing SNPs and so correlate a SNP to a functional loop. The exact mechanism of the effect of SNPs and the nature of the common gene mutations have not been fully explored, but this is an active area of research in the field [2, 3, 5].

The original goal of the work presented here was to develop a new technology to overcome the shortcomings of ChIP-Seq using a Tn5 based assay. The TTT development showed the ability to specifically insert a barcode via Tn5 using a chemical probe instead of other methods like CUT&TAG [148]. However, many more experiments were required to generate a working technology for the field. The first required experiments were to test the system either in vitro using a plasmid delivery system or in an in vivo system. An in vitro plasmid system was planned using the 3XMEF2-luc plasmid, where MEF2 would have been incubated with the plasmid and subsequently cut using Tn5:mHDAC. The fragments would then have been analyzed with either Sanger sequencing or next-generation sequencing, with the goal of finding the Tn5 inserted barcode proximal to the MEF2 binding sites in the plasmid. This would give an idea of the specificity of the assay, as the exact place of the insertion could be graphed and a test of statistical significance could determine whether the probe was specific for MEF2 insertion.
A follow-up experiment to the in vitro study could have involved a more complex double plasmid system in E. coli. This would have involved a plasmid which could express Tn5:mHDAC and could specifically insert an antibiotic resistance gene. The general concept was to have one plasmid containing inducible Tn5:mHDAC and MEF2, a MEF2 binding site, and one antibiotic resistance gene. The E. coli containing the plasmid could then be expanded and stored for subsequent transformation with a second plasmid. The secondary plasmid, containing the same origin of replication (to prevent both plasmids from being copied to the next generation), would have another antibiotic resistance gene flanked by the Tn5 ME sequence. Once the system was induced, the Tn5:mHDAC would be expressed and would bind to the ME sequences, allowing for the transposition of the second resistance gene into the original plasmid at the MEF2 binding sequence. The E. coli could then be plated on double resistance plates, and colonies would be picked and sequenced to determine where the secondary resistance gene was inserted. This would allow for a more complex version of the in vitro study mentioned above.

In conjunction with the plasmid experiment, another avenue of the work was the mutation of Tn5 to reduce the ability of the enzyme to bind to DNA. Specific mutations were introduced, and the initial studies showed a drastic reduction of enzyme activity. This area of research was not fully explored and would need to be revisited in order to determine its effectiveness. This was an extremely promising direction for the project because it would allow only regions bound by the target to be tagmented. Although not calculated, one of the concerns about the assay was the off-target insertions of the barcode and the ability to multiplex the assay with a number of TF targets.
Reducing the enzyme's own DNA binding would potentially have allowed for the removal of wash steps and increased the ability to have a large number of probes per sample.

The power of the assay also came from the ability to create a small chemical probe which could have been customized to any TF of interest. Techniques such as phage display could have generated probes which tightly bound to their target relatively quickly. Once the target probe was generated, the entire system was completed in E. coli, which would have allowed for rapid production of the probe in an extremely cost-effective manner. The antibodies for MEF2 were inconsistent in ChIP-Seq experiments (data not shown), and there are many other examples of TFs for which reliable ChIP-Seq data are challenging to obtain. There could be many explanations for this, but the use of antibodies is a likely reason. If a TF is bound by many different factors in vivo, the motif the antibody targets might not be sterically available. The probes here could be generated with co-factors in mind and designed to bind regions outside the known binding surfaces.

Even though the objective of creating the new assay was not fully realized, numerous biochemical and molecular biology techniques had to be used, including cloning, protein expression and purification, and subsequent wet-lab experiments with the enzyme. The TTT method had preliminary results suggesting the proof of concept was working. If the technique were developed further and more probes could be generated, it would have a high impact on the community, especially for TFs which have inconsistent antibodies like MEF2. As there has been a bigger push towards single-cell (sc) genomics, this project would have a large impact in the field for its ability to be reduced to the single-cell level using techniques similar to itChIP [149].
However, one of the biggest advantages over methods like itChIP and CUT&TAG would be the ability to multiplex the targets. If the assay were optimized down to that level, multiple TF targets could be added to the sample and, if the background noise was low enough, genome-wide mapping of many factors could be completed within the same cell. In addition, with advancements in technology such as the recent development of sc ATAC-Seq combined with RNA-Seq in the same cell (the Chromium Single Cell Multiome ATAC + Gene Expression kit by 10x Genomics), there was the possibility of targeting multiple TFs and obtaining the gene expression profiles within the same cells. If it were possible to obtain so many different measurements at the single-cell level, there would be a vast amount of insight gained into disease and cellular functions. So, although the project was not seen to completion, the ability of the project to have a future impact in the field is still strong. The proof of concept showed the ability of the assay to work, and as the field continues to use Tn5 in assays such as ATAC-Seq [28], itChIP-Seq [149], and CUT&TAG [148], technology development like the one presented here will be important.

The result of the computational approach highlighted the challenging aspects of understanding 3D chromatin interactions, especially when mediated by TFs. The goal of the project was to understand how MEF2 and other TFs were mediating chromatin dynamics in HCM using publicly available HiC data [10] and various ChIP-Seq data. Due to the high sequencing coverage found in HiC, it was hypothesized that looping domains mediated by specific TFs could be filtered out using computational approaches. Although a similar attempt had been published previously [280], the results were mixed and the field did not adopt the type of analysis combining HiC with ChIP-Seq.
However, that work was completed in a less specific manner, so the question remained whether MEF2 mediated loops in HCM could be discovered with the HiC data. There were a few caveats in the experiment, but the biggest one was the inability to obtain MEF2 ChIP-Seq data on the same type of cells as were used in the HiC experiments [10]. This issue highlighted the reasoning behind the experiment, because reliable MEF2 ChIP-Seq data have been a challenge to obtain both in the studies relating to this work (data not shown) and by others. This method was an attempt to circumvent the need to complete TF ChIP-Seq on every condition. If it were possible to use publicly available ChIP-Seq data from similar cells, the data-sets generated by numerous labs could be combined to gain a much deeper understanding of gene regulation in HCM as well as in other systems. If the HiC and ChIP-Seq experiments had been completed in the same cells, the result would likely have been a higher signal-to-noise ratio, but that was not tested here.

The input data for the computational approaches were HiC, RNA-Seq, and ChIP-Seq. The HiC and RNA-Seq were completed using the same samples [10] and, as mentioned before, the ChIP-Seq data came from other publicly available data. The HiC needed to be processed through the Juicer package [276] for use in customized filtering R scripts. The RNA-Seq was completely re-analyzed from the sequencing files and ultimately run through the DESeq2 package [286] to determine the differentially expressed genes (DEGs). The ChIP-Seq did not require any additional analysis because the MACS2 [267] peak files were used.

The first approach was to take HiC data and filter out background signal using the FitHiC package [277]. Afterwards, the differential loops were probed for MEF2 ChIP-Seq peak overlaps before determining if any of those loops contained a DEG.
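The filtering order of the first approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the R scripts used in this work: the `Interval` and `Loop` types, the function names, and all coordinates and gene names are hypothetical toy values, a peak is assumed to "overlap" a loop when it touches either anchor, and a loop is assumed to "contain" a DEG when the gene overlaps the span between the two anchors. No Juicer, FitHiC, MACS2, or DESeq2 output is read.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a: Interval, b: Interval) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


@dataclass(frozen=True)
class Loop:
    anchor1: Interval
    anchor2: Interval   # assumed to be on the same chromosome as anchor1
    differential: bool  # assumed to be called upstream (e.g. after background filtering)


def loop_span(loop: Loop) -> Interval:
    """Region from the start of the leftmost anchor to the end of the rightmost."""
    return Interval(
        loop.anchor1.chrom,
        min(loop.anchor1.start, loop.anchor2.start),
        max(loop.anchor1.end, loop.anchor2.end),
    )


def candidate_genes(loops, peaks, degs):
    """Differential loops -> keep those with a peak at an anchor -> collect DEGs inside."""
    hits = set()
    for loop in loops:
        if not loop.differential:
            continue
        has_peak = any(
            overlaps(p, loop.anchor1) or overlaps(p, loop.anchor2) for p in peaks
        )
        if not has_peak:
            continue
        span = loop_span(loop)
        hits.update(name for name, gene in degs.items() if overlaps(gene, span))
    return hits


# Hypothetical toy inputs, not real data.
toy_loops = [
    Loop(Interval("chr1", 10_000, 15_000), Interval("chr1", 200_000, 205_000), True),
    Loop(Interval("chr1", 400_000, 405_000), Interval("chr1", 600_000, 605_000), True),
    Loop(Interval("chr2", 10_000, 15_000), Interval("chr2", 90_000, 95_000), False),
]
toy_peaks = [Interval("chr1", 12_000, 12_300), Interval("chr2", 11_000, 11_300)]
toy_degs = {
    "geneA": Interval("chr1", 100_000, 110_000),
    "geneB": Interval("chr1", 500_000, 510_000),
    "geneC": Interval("chr2", 50_000, 60_000),
}

print(sorted(candidate_genes(toy_loops, toy_peaks, toy_degs)))
```

In the toy data only the first loop is both differential and peak-anchored, so only the gene inside its span is reported. Because the last step only ever looks at DEGs, every hit is differentially expressed by construction, which is one reason a filter of this shape cannot by itself separate signal from background.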
The filtration yielded 273 genes. Although many of these were correlated to HCM, the set was not a true representation of the disease as there was over a 50% FDR in the genes. The first reason for this was that only the DEGs were probed, so any hit that came out was already differentially expressed, making it difficult to determine whether the signal was real or not. The second reason was that the coverage of the loops was so high in the HiC that even the differential loops covered most of the chromosomes. This resulted in a high FDR because the high number of loops increased the potential to include a gene. Although there were constraints put on the size of the loops, a higher stringency of loops was likely required for the analysis to work well. This issue was echoed when filtering the ChIP-Seq files, because there was no filter for the most significant peaks and the analysis depended on the thresholds set by the experiment the file was downloaded from. For future work, increasing the stringency of these two settings would likely decrease the FDR because only the strongest connections would be put into the analysis. The larger issue was the coverage of the HiC loops, so the stringency on loops would be more important than on the ChIP-Seq data.

The second approach went about the analysis in the opposite order: the genome was first binned to 100 kb with the Eigenvector command within Juicer [276]. The next step was to take only bins which had a DEG, and finally all bins which did not contain a ChIP-Seq peak were removed. The biggest issue with this approach was that the compartments (generated by the eigenvector) were not greatly altered between the healthy and HCM conditions, reducing some of the statistical power [10]. A similar issue of high FDR was discovered, although at 25% it was lower than in the first approach. This approach was slightly better because it was less biased, but it was still not enough to generate meaningful data.
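The bin-based order of the second approach can be sketched in the same way. Again this is a minimal illustration under assumptions rather than the analysis that was run: the function names and the toy genes and peaks are hypothetical, bins are fixed 100 kb windows indexed by integer division, and the compartment (eigenvector) values that Juicer produces are not modeled at all.

```python
BIN_SIZE = 100_000  # 100 kb bins, matching the bin size described in the text


def bins_for(chrom, start, end, bin_size=BIN_SIZE):
    """Set of (chrom, bin_index) touched by the half-open interval [start, end)."""
    first = start // bin_size
    last = (end - 1) // bin_size
    return {(chrom, i) for i in range(first, last + 1)}


def candidate_bins(degs, peaks, bin_size=BIN_SIZE):
    """Keep bins that hold a DEG, then drop bins with no ChIP-Seq peak."""
    deg_bins = {}
    for name, (chrom, start, end) in degs.items():
        for b in bins_for(chrom, start, end, bin_size):
            deg_bins.setdefault(b, set()).add(name)

    peak_bins = set()
    for chrom, start, end in peaks:
        peak_bins |= bins_for(chrom, start, end, bin_size)

    return {b: genes for b, genes in deg_bins.items() if b in peak_bins}


# Hypothetical toy inputs, not real data.
toy_bin_degs = {
    "geneA": ("chr1", 120_000, 130_000),
    "geneB": ("chr1", 450_000, 460_000),
}
toy_bin_peaks = [("chr1", 150_000, 150_400), ("chr2", 450_100, 450_500)]

print(candidate_bins(toy_bin_degs, toy_bin_peaks))
```

With 100 kb bins, any peak anywhere in the same window as a DEG keeps that bin, which illustrates why coarse binning admits many coincidental gene and peak pairings.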
To improve this approach, smaller binning of the genome would likely be needed; however, it was computationally demanding and would likely also increase the background signal. Adopting a much stronger stringency for the data in these two methods might still not be enough, as this type of analysis has not been adopted by the field, highlighting the challenge of combining the data. Although there was about a 40% overlap in the candidate genes discovered between the two methods, this was likely due to the use of DEGs and the high coverage of the HiC.

Although computational approaches have become more and more powerful, there were limits to the power of analysis here. Machine learning could be used to discover patterns in the HiC and correlate them back to the ChIP-Seq and RNA-Seq, but it was not employed here. If a system could be created using these three assays in the same cells over many replicates, it might be possible to generate a data-rich training set for machine learning. However, the number of replicates needed would be so high that it does not seem feasible at this time even in highly studied cell lines such as GM12878, much less in disease models. As computational programs become more powerful it might not be necessary to run as many experiments, but that appears to be still many years away.

Overall, this approach highlighted the need for experimentation, and the best alternative was a method which combined both HiC and ChIP-Seq: HiChIP. Although there is no perfect assay, the combination of ChIP and HiC has helped with increasing the signal and has allowed for deeper insights into disease [53, 330] and cell differentiation [56], and it will likely be improved further. The current limitations of high cell numbers (> 1 million) and the lack of many TFs shown to work beyond the highly abundant ones such as p300, CTCF, and cohesin [53] will have to be overcome to fully understand complexities in gene regulation.
As previously mentioned, MEF2 antibodies were inconsistent, so H3K27ac was used instead because of its ability to enrich for cis regulatory elements.

The final study in this work was to understand the chromatin dynamics of HCM using H3K27ac ChIP-Seq and HiChIP. The previous projects (TTT and the computational approach) had the same goal but fell short of it. With HiChIP, many of the shortcomings were overcome, since it allowed genome-wide analysis of looping changes in the cis regulatory elements. The approach eliminated the need for an inconsistent MEF2 antibody and avoided the high background noise of HiC experiments. The experiment was able to discover previously unknown regulatory elements in at least one locus and generated a unique data set in an HCM model which will be impactful in the field for understanding the dynamics of the disease at high resolution, genome-wide.

There were three major components to the project: running both the ChIP-Seq and HiChIP, the genome-wide computational analysis, and the extensive investigation into the NPPA/NPPB locus. The decision to use H3K27ac was based on its ability to capture cis regulatory elements; if MEF2 was regulating gene activation, there was a high likelihood of differential levels of the histone modification as gene expression changed. The ChIP-Seq confirmed differential H3K27ac loci which were highly correlated to known HCM DEGs, serving as a positive control for the overall scope of the project. Although ChIP-Seq was not used extensively in this study, the data could be probed by the community in greater detail to understand the specific loci being perturbed in HCM. These ChIP-Seq experiments were of high quality and were used as input data for the HiChIP analysis (through the FitHiChIP package [326]), which allowed for a powerful understanding of the genome-wide HiChIP studies.
Altered H3K27ac levels were tied to looping, so the changing chromatin dynamics were understood in greater detail and suggested the implication of specific loops in subsequent gene expression.

One of the caveats of the HiChIP experiment was the number of samples run (n=3), which had to do with the cost of the assay and a limited number of cells. The cost of running HiChIP was quite high because of the number of paired-end reads required for differential analysis (around 400 million) [56, 326]. Although sequencing costs have continued to decrease, this was still a major consideration when planning these experiments. Ideally, at least 3 replicates in each condition would be completed, but in this study there were only two replicates for the stimulated condition and one for the non-stimulated. In follow-up work it would be ideal to increase this number to ensure the interactions were consistent across many samples.

In designing future HiChIP experiments, a compromise could be to complete 4-5 replicates, sequence each to about 100M reads, and combine them into one sample to understand the major interactions changing between conditions [56]. Although this could potentially obscure the minor changes in looping frequencies between the samples, it would increase the statistical power for the major looping domains. This would also reduce any technical variability from one replicate to another. For example, if there were 4 samples in each of the conditions (8 total samples) and the stimulated samples all contained the interaction with RE4 in conjunction with the loss of the interaction at RE6, there would be much stronger evidence to support the claim of RE6 chromatin segregation away from the central hub of RE0, 7, 8. A downside of completing many replicates would be the number of cells required (a limiting factor in the study presented here) and the cost of running the assay.
For each of the conditions, at least one million cells would be required, so the suggested compromise of 4-5 replicates would in turn increase the cell number by 4-5 fold. If cell number were not a consideration, this method of running the experiment would be ideal. However, there would still be the cost of running the assay over that many replicates. There are many expensive steps in the protocol, including biotinylated dNTPs, ChIP-Seq grade antibodies, and library preparations. If the cost associated with more replicates was not a large concern, the major cost of sequencing would stay about the same but would be spread out over the samples.

Despite these potential improvements to the experiments completed here, the results showed many interesting changes throughout the genome. The bulk of the analysis presented here was completed at the NPPA and NPPB locus, but the data generated were genome-wide. There were many interesting loci which the field would be interested in studying, but this was not completed here.

A more in-depth analysis of known HCM loci would be vital to fully understand the chromatin rearrangement in HCM. One of the genome-wide analyses to be completed in the future would be to correlate differential looping to some of the 1,400 HCM GWAS SNPs [192, 225]. To do this, methods similar to Chandra et al [56] could be used to link QTL analysis with HiChIP looping. Although many of the SNPs were associated with missense mutations within genes [192], it would be expected that some of the SNPs would be able to regulate chromatin [56, 93, 362]. The hypothesis would be that some of the differential loops overlap the HCM GWAS SNPs, causing maladaptive looping and helping to advance the severity of the disease. This analysis would support the concept of chromatin looping regulation if SNPs located in non-coding regions of the genome could be linked to TF binding sites [93].
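The proposed overlap of GWAS SNPs with differential loops amounts to a position-in-interval lookup. The following Python sketch shows one way it could be set up; it is an assumption-laden illustration, not a method from this work or from Chandra et al [56]. The tuple layouts, the function name, and the SNP and loop identifiers are all hypothetical, a SNP is counted only when it falls inside a loop anchor, and no QTL statistics are computed.

```python
from collections import defaultdict


def snps_at_differential_anchors(snps, loops):
    """Map each SNP id to the differential loops whose anchors contain it.

    snps:  iterable of (snp_id, chrom, pos) with 0-based positions
    loops: iterable of (loop_id, chrom, [(start, end), (start, end)], is_differential)
           with half-open anchor intervals on a single chromosome
    """
    hits = defaultdict(list)
    for snp_id, chrom, pos in snps:
        for loop_id, loop_chrom, anchors, is_differential in loops:
            if not is_differential or loop_chrom != chrom:
                continue
            if any(start <= pos < end for start, end in anchors):
                hits[snp_id].append(loop_id)
    return dict(hits)


# Hypothetical toy inputs, not real SNPs or loops.
toy_snp_loops = [
    ("loop1", "chr1", [(10_000, 15_000), (200_000, 205_000)], True),
    ("loop2", "chr1", [(50_000, 55_000), (300_000, 305_000)], False),
]
toy_snps = [
    ("snpA", "chr1", 12_500),   # inside an anchor of a differential loop
    ("snpB", "chr1", 52_000),   # inside an anchor, but the loop is not differential
    ("snpC", "chr2", 12_500),   # same position, different chromosome
]

print(snps_at_differential_anchors(toy_snps, toy_snp_loops))
```

The nested loop is adequate for a toy example; for roughly 1,400 SNPs against a genome-wide loop set, an interval index or a sorted sweep would be the more usual choice.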
If these data found an enrichment of CTCF, MEF2, GATA4, or NKX2-5, there would be strong evidence of TF mediated chromatin rearrangement, which would help to explain the molecular mechanism of the specific SNP.

Another follow-up experiment would be to complete MEF2 and HDAC ChIP-Seq in NS and S NRVMs to understand the binding differences between the two conditions. The data would then further support the differential loops and could be directly tied to specific TFs and loci being targeted by the reduction of HDAC. These TF ChIP-Seq experiments were attempted, but due to the challenge of inconsistent antibodies, full optimization of the assay was not completed (data not shown). In order to complete the optimization, numerous conditions of sonication and cell numbers would need to be tested, but the biggest issue would be the antibodies for MEF2 and the class II HDACs, which have been challenging. To run an optimal ChIP-Seq, the number of cells would likely have to be 5-10 million per replicate in order to get reproducible data, which would require at least 5-10 rats per replicate [3] for each TF. However, if the ideal ChIP-Seq were completed, not only would the data have a high impact on the field in furthering the understanding of the MEF2 controlled loci, but it would potentially allow for subsequent HiChIP experiments targeting MEF2 or HDAC. The use of H3K27ac in the data presented here was able to pinpoint potential MEF2 regulation in the NPPA and NPPB locus and, when the data are explored further, will likely suggest many other loci. However, without the MEF2 and HDAC ChIP-Seq in the NRVMs, correlation remains a challenge. There have been some reports of tagged MEF2 in NRVMs [?,?]; however, these experiments generally used ChIP-PCR or ChIP, and whole-genome analysis was not completed.
The NPPA/NPPB locus was the major locus studied in this work, and there remain numerous potential follow-up experiments in the locus to confirm the hypothesis presented in figure 5.13. Man et al's recent work [5] used a strong and comprehensive strategy to understand the complex nature of the locus and pinpoint its important REs. A similar strategy would be required to complete the characterization of the entire 600 kb locus, and it would start with using CRISPR interference to knock down specific TFs and REs. To confirm the hypothesis within the locus, HiChIP experiments might not be required. Since the REs have been laid out by the data here, 4C experiments could potentially be used with the specific RE regions as the bait sequences. However, HiChIP would likely provide higher resolution data, as shown here when compared to previous studies [5, 264, 266], and would still be preferred.

One of the first experiments would be the knockdown of HDAC to determine if the HiChIP looping signature in the locus adopted the stimulated state, as would be expected due to upregulation of NPPB [354]. Due to the depletion of HDAC, RE4 would be able to re-localize to RE0 and RE8, causing the upregulation of NPPB, which would confirm the role of MEF2/HDAC interactions in the locus. In conjunction with the knockdown of HDAC, specific REs could be targeted by CRISPR interference. The most interesting REs to target would be RE4 and RE6, due to the loss of RE6 looping and the gain of RE4 looping under stimulated conditions. Each of these knockdowns would explain a different aspect of the looping within the locus.
If RE6 were knocked down, it would not be expected to have a dramatic effect on the structure in non-stimulated cells because there would be redundancy in the locus with GATA and NKX2-5 mediating the looping domains, so a similar looping pattern would be expected for both the stimulated and non-stimulated conditions. However, if RE4 were targeted and knocked out, the data suggest there would not be upregulation of NPPA or NPPB because the locus would not localize to RE0 and RE8 upon stimulation, as the MEF2 binding site would be lost. As a result, the looping would stay similar to the non-stimulated condition and the locus would be slightly less stringently packed around RE0, 6, 7, 8, but no major rearrangements would be seen. Although the data suggested the importance of RE7 to the whole locus, it is hypothesized that targeting that region would not have much of an effect on the NPPA/NPPB locus, but the overall effect remains unknown. The data suggest other minor REs identified, such as RE2, RE3, and RE5, were used by the locus to fine-tune transcription regulation and likely do not play an important role in the overall structure of the locus. The CTCF positions noted in figure 5.13 could also be targeted to reveal their importance to the locus. It is predicted there would not be much of an effect on the locus from knocking the regions out because of the redundancy of looping from the TFs known to be in the locus, such as MEF2, GATA, and NKX2-5, although further studies would be needed to confirm this.

Although this study has revealed many other REs in the NPPA/NPPB locus, confirmation using in vivo studies would be required. However, the identification of the REs helped further the understanding of the mechanism of NPPB activation and also gave supporting evidence for MEF2/HDAC 3D interactions. The ability of the four loci, all of which had MEF2 peaks, to come together was compelling in vivo evidence to support crystallography work on HDAC's ability to form tetrameric structures.
Beyond that, there was also strong evidence from p300 ChIP-Seq overlapping three of these loci, as predicted by the Chen lab previously [67]. This evidence will have a high impact for the field, specifically at the actively researched NPPA/NPPB locus.

The data presented here will also be a strong resource for the HCM community, as there has not been a genome-wide HiChIP assay completed in the system, so many loci can be specifically probed and further characterized. It is likely many of these regions will be regulated in similar manners based on the known mechanism of the disease [2], and similar correlations using publicly available ChIP-Seq can be made. Overall, the data were high-quality H3K27ac ChIP-Seq and HiChIP which showed many loci being perturbed by stimulation conditions and will be a resource used by many in the field.

The far-reaching goal of the project when it began was to further understand the role of MEF2/HDAC interactions in HCM. The first challenge was to develop a new assay to overcome the issues in generating high quality MEF2 ChIP-Seq in NRVMs. Although the method completed initial proof of concept studies, TTT would generate only 2D information, so there would still be a need for other 3D assays. If the assay is fine-tuned in the future, it would be of high impact both for the whole field studying TFs and because it would make it possible to generate high quality data on MEF2 binding positions. If the MEF2 data were generated using TTT or standard ChIP-Seq, the data would be highly useful for interpreting the genome-wide 3D interactions from the HiChIP completed here.

The challenge of obtaining these data highlighted the importance of computational analysis. It was originally hypothesized that using ChIP-Seq experiments to filter out HiC interactions would be possible in order to generate TF specific looping patterns. However, as was shown, this process continues to be challenging due to many factors.
The main issue highlighted here was that the coverage of HiC was too high to generate statistically significant interactions, so the hits being generated could not be confirmed. Even when attempting to complete the computational approach in two different ways, the FDR of loci was still too high to come to unambiguous conclusions. It was likely the analysis was selecting for specific loci being controlled by MEF2 such as PLN, which also had differential looping in the H3K27ac data (data not shown), but the overall findings could not be statistically confirmed.

As a result of these first two studies, the objective of the project was shifted from a protein (MEF2) centric analysis to a cis regulatory centric analysis with the target of H3K27ac. Not only did this allow for the completion of the ChIP-Seq and HiChIP experiments, it also created a resource which would be used more extensively in the field. This was because the information generated by the cis regulatory elements can help to explain more about overall gene regulation than a specific mark. Although confirmation of the hypothesis generated here would need MEF2 binding positions in NRVMs in future work, those data would be used as a supportive element instead of the main discovery tool, a role which the data here can fill.

There is still a lot of work to be completed both in the NPPA and NPPB locus and in the genome-wide understanding of HCM. The data generated here are one resource the community can use to further explore one of the most common heart diseases, HCM.

7 References

[1] ENCODE Project Consortium et al. The encode (encyclopedia of dna elements) project. Science, 306(5696):636–640, 2004.
[2] Zegeye H Jebessa, Kumar D Shanmukha, Matthias Dewenter, Lorenz H Lehmann, Chang Xu, Friederike Schreiter, Dominik Siede, Xue-Min Gong, Barbara C Worst, Giuseppina Federico, et al. The lipid-droplet-associated protein abhd5 protects the heart through proteolysis of hdac4.
Nature metabolism, 1(11):1157–1167, 2019.
[3] Jianqin Wei, Shaurya Joshi, Svetlana Speransky, Christopher Crowley, Nimanthi Jayathilaka, Xiao Lei, Yongqing Wu, David Gai, Sumit Jain, Michael Hoosien, et al. Reversal of pathological cardiac hypertrophy via the mef2-coregulator interface. JCI insight, 2(17), 2017.
[4] Paul B Talbert, Michael P Meers, and Steven Henikoff. Old cogs, new tricks: the evolution of gene expression in a chromatin context. Nature Reviews Genetics, 20(5):283, 2019.
[5] Joyce CK Man, Karel van Duijvenboden, Peter HL Krijger, Ingeborg B Hooijkaas, Ingeborg van der Made, Corrie de Gier-de Vries, Vincent Wakker, Esther E Creemers, Wouter de Laat, Bastiaan J Boukens, et al. Genetic dissection of a super enhancer controlling the nppa-nppb cluster in the heart. Circulation Research, 2021.
[6] Barry J Maron. Hypertrophic cardiomyopathy: a systematic review. Jama, 287(10):1308–1320, 2002.
[7] Johannes Backs, Kunhua Song, Svetlana Bezprozvannaya, Shurong Chang, and Eric N Olson. Cam kinase ii selectively signals to histone deacetylase 4 during cardiomyocyte hypertrophy. The Journal of clinical investigation, 116(7):1853–1864, 2006.
[8] Johannes Backs, Paula Stein, Thea Backs, Francesca E Duncan, Chad E Grueter, John McAnally, Xiaoxia Qi, Richard M Schultz, and Eric N Olson. The γ isoform of cam kinase ii controls mouse egg activation by regulating cell cycle resumption. Proceedings of the National Academy of Sciences, 107(1):81–86, 2010.
[9] Johannes Backs, Thea Backs, Stefan Neef, Michael M Kreusser, Lorenz H Lehmann, David M Patrick, Chad E Grueter, Xiaoxia Qi, James A Richardson, Joseph A Hill, et al. The δ isoform of cam kinase ii is required for pathological cardiac hypertrophy and remodeling after pressure overload. Proceedings of the National Academy of Sciences, 106(7):2342–2347, 2009.
[10] Manuel Rosa-Garrido, Douglas J Chapski, Anthony D Schmitt, Todd H Kimball, Elaheh Karbassi, Emma Monte, Enrique Balderas, Matteo Pellegrini, Tsai-Ting Shih, Elizabeth Soehalim, et al. High-resolution mapping of chromatin conformation in cardiac myocytes reveals structural remodeling of the epigenome in heart failure. Circulation, 136(17):1613–1625, 2017.
[11] Donald E Olins and Ada L Olins. Chromatin history: our view from the bridge. Nature reviews Molecular cell biology, 4(10):809–814, 2003.
[12] C David Allis and Thomas Jenuwein. The molecular hallmarks of epigenetic control. Nature Reviews Genetics, 17(8):487, 2016.
[13] Sandy L Klemm, Zohar Shipony, and William J Greenleaf. Chromatin accessibility and the regulatory epigenome. Nature Reviews Genetics, 20(4):207–220, 2019.
[14] Oswald T Avery, Colin M MacLeod, and Maclyn McCarty. Studies on the chemical nature of the substance inducing transformation of pneumococcal types: induction of transformation by a desoxyribonucleic acid fraction isolated from pneumococcus type iii. The Journal of experimental medicine, 79(2):137–158, 1944.
[15] P Oudet, M Gross-Bellard, and P Chambon. Electron microscopic and biochemical evidence that chromatin structure is a repeating unit. Cell, 4(4):281–300, 1975.
[16] James D Watson, Francis HC Crick, et al. Molecular structure of nucleic acids. Nature, 171(4356):737–738, 1953.
[17] Albrecht Kossel. Ueber die chemische Beschaffenheit des Zellkerns.... PA Norstedt, 1911.
[18] VG Allfrey, R Faulkner, and AE Mirsky. Acetylation and methylation of histones and their possible role in the regulation of rna synthesis. Proceedings of the National Academy of Sciences of the United States of America, 51(5):786, 1964.
[19] Ada L Olins and Donald E Olins. Spheroid chromatin units (ν bodies). Science, 183(4122):330–332, 1974.
[20] Roger D Kornberg. Chromatin structure: a repeating unit of histones and dna. Science, 184(4139):868–871, 1974.
[21] Karolin Luger, Armin W Mäder, Robin K Richmond, David F Sargent, and Timothy J Richmond. Crystal structure of the nucleosome core particle at 2.8 Å resolution. Nature, 389(6648):251–260, 1997.
[22] Bryan M Turner. Decoding the nucleosome. Cell, 75(1):5–8, 1993.
[23] TJ Richmond, JT Finch, B Rushton, D Rhodes, and A Klug. Structure of the nucleosome core particle at 7 Å resolution. Nature, 311(5986):532–537, 1984.
[24] James E Brownell, Jianxin Zhou, Tamara Ranalli, Ryuji Kobayashi, Diane G Edmondson, Sharon Y Roth, and C David Allis. Tetrahymena histone acetyltransferase a: a homolog to yeast gcn5p linking histone acetylation to gene activation. Cell, 84(6):843–851, 1996.
[25] Geoffrey P Dann, Glen P Liszczak, John D Bagert, Manuel M Müller, Uyen TT Nguyen, Felix Wojcik, Zachary Z Brown, Jeffrey Bos, Tatyana Panchenko, Rasmus Pihl, et al. Iswi chromatin remodellers sense nucleosome modifications to determine substrate preference. Nature, 548(7669):607–611, 2017.
[26] Cheol-Koo Lee, Yoichiro Shibata, Bhargavi Rao, Brian D Strahl, and Jason D Lieb. Evidence for nucleosome depletion at active regulatory regions genome-wide. Nature genetics, 36(8):900–905, 2004.
[27] Robert E Thurman, Eric Rynes, Richard Humbert, Jeff Vierstra, Matthew T Maurano, Eric Haugen, Nathan C Sheffield, Andrew B Stergachis, Hao Wang, Benjamin Vernot, et al. The accessible chromatin landscape of the human genome. Nature, 489(7414):75–82, 2012.
[28] Jason D Buenrostro, Beijing Wu, Howard Y Chang, and William J Greenleaf. Atac-seq: a method for assaying chromatin accessibility genome-wide. Current protocols in molecular biology, 109(1):21–29, 2015.
[29] Andrew J Bannister and Tony Kouzarides. Regulation of chromatin by histone modifications. Cell research, 21(3):381–395, 2011.
[30] Yingming Zhao and Benjamin A Garcia. Comprehensive catalog of currently documented histone modifications. Cold Spring Harbor perspectives in biology, 7(9):a025064, 2015.
[31] Vicky W Zhou, Alon Goren, and Bradley E Bernstein. Charting histone modifications and the functional organization of mammalian genomes. Nature Reviews Genetics, 12(1):7–18, 2011.
[32] Zhibin Wang, Chongzhi Zang, Kairong Cui, Dustin E Schones, Artem Barski, Weiqun Peng, and Keji Zhao. Genome-wide mapping of hats and hdacs reveals distinct functions in active and inactive genes. Cell, 138(5):1019–1031, 2009.
[33] Hidemasa Goto, Yoshihiro Yasui, Erich A Nigg, and Masaki Inagaki. Aurora-b phosphorylates histone h3 at serine28 with regard to the mitotic chromosome condensation. Genes to cells, 7(1):11–17, 2002.
[34] Graeme L Cuthbert, Sylvain Daujat, Andrew W Snowden, Hediye Erdjument-Bromage, Teruki Hagiwara, Michiyuki Yamada, Robert Schneider, Philip D Gregory, Paul Tempst, Andrew J Bannister, et al. Histone deimination antagonizes arginine methylation. Cell, 118(5):545–553, 2004.
[35] Yanming Wang, Joanna Wysocka, Joyce Sayegh, Young-Ho Lee, Julie R Perlin, Lauriebeth Leonelli, Lakshmi S Sonbuchner, Charles H McDonald, Richard G Cook, Yali Dou, et al. Human pad4 regulates histone arginine methylation levels via demethylimination. Science, 306(5694):279–283, 2004.
[36] Kaoru Sakabe, Zihao Wang, and Gerald W Hart. β-n-acetylglucosamine (o-glcnac) is part of the histone code. Proceedings of the National Academy of Sciences, 107(46):19915–19920, 2010.
[37] Paul O Hassa, Sandra S Haenni, Michael Elser, and Michael O Hottiger. Nuclear adp-ribosylation reactions in mammalian cells: where are we today and where are we going? Microbiology and Molecular Biology Reviews, 70(3):789–829, 2006.
[38] Raga Krishnakumar and W Lee Kraus. Parp-1 regulates chromatin structure and transcription through a kdm5b-dependent pathway. Molecular cell, 39(5):736–749, 2010.
[39] Hengbin Wang, Liangjun Wang, Hediye Erdjument-Bromage, Miguel Vidal, Paul Tempst, Richard S Jones, and Yi Zhang. Role of histone h2a ubiquitination in polycomb silencing.
Nature, 431(7010):873–878, 2004.
[40] Yuzuru Shiio and Robert N Eisenman. Histone sumoylation is associated with transcriptional repression. Proceedings of the National Academy of Sciences, 100(23):13225–13230, 2003.
[41] Zhongzhou Chen, Jianye Zang, Johnathan Whetstine, Xia Hong, Foteini Davrazou, Tatiana G Kutateladze, Michael Simpson, Qilong Mao, Cheol-Ho Pan, Shaodong Dai, et al. Structural insights into histone demethylation by jmjd2 family members. Cell, 125(4):691–702, 2006.
[42] Elizabeth M Duncan, Tara L Muratore-Schroeder, Richard G Cook, Benjamin A Garcia, Jeffrey Shabanowitz, Donald F Hunt, and C David Allis. Cathepsin l proteolytically processes histone h3 during mouse embryonic stem cell differentiation. Cell, 135(2):284–294, 2008.
[43] Ronald Richman, Louis G Chicoine, MP Collini, Richard G Cook, and C David Allis. Micronuclei and the cytoplasm of growing tetrahymena contain a histone acetylase activity which is highly specific for free histone h4. The Journal of cell biology, 106(4):1017–1026, 1988.
[44] X-J Yang and EHAT Seto. Hats and hdacs: from structure, function and regulation to novel strategies for therapy and prevention. Oncogene, 26(37):5310–5310, 2007.
[45] Miyong Yun, Jun Wu, Jerry L Workman, and Bing Li. Readers of histone modifications. Cell research, 21(4):564–578, 2011.
[46] Andrew F Neuwald. Gcn5-related histone n-acetyltransferases belong to a diverse superfamily that includes the yeast spt10 protein. Trends Biochem. Sci., 22:154–155, 1997.
[47] Patrick A Grant, Laura Duggan, Jacques Côté, Shannon M Roberts, James E Brownell, Reyes Candau, Reiko Ohba, Tom Owen-Hughes, C David Allis, Fred Winston, et al. Yeast gcn5 functions in two multisubunit complexes to acetylate nucleosomal histones: characterization of an ada complex and the saga (spt/ada) complex. Genes & development, 11(13):1640–1650, 1997.
[48] Joseph Torchia, Christopher Glass, and Michael G Rosenfeld.
Co-activators and co-repressors in the integration of transcriptional responses. Current opinion in cell biology, 10(3):373–383, 1998.
[49] MR Parthun. Hat1: the emerging cellular roles of a type b histone acetyltransferase. Oncogene, 26(37):5319–5328, 2007.
[50] Xiang-Jiao Yang and Serge Grégoire. Class ii histone deacetylases: from sequence to function, regulation, and clinical implication. Molecular and cellular biology, 25(8):2873–2884, 2005.
[51] Peter Hugo Lodewijk Krijger and Wouter De Laat. Regulation of disease-associated gene expression in the 3d genome. Nature reviews Molecular cell biology, 17(12):771, 2016.
[52] Gregory Seumois, Lukas Chavez, Anna Gerasimova, Matthias Lienhard, Nada Omran, Lukas Kalinke, Maria Vedanayagam, Asha Purnima V Ganesan, Ashu Chawla, Ratko Djukanović, et al. Epigenomic analysis of primary human t cells reveals enhancers associated with th2 memory cell differentiation and asthma susceptibility. Nature immunology, 15(8):777, 2014.
[53] Maxwell R Mumbach, Adam J Rubin, Ryan A Flynn, Chao Dai, Paul A Khavari, William J Greenleaf, and Howard Y Chang. Hichip: efficient and sensitive analysis of protein-directed genome architecture. Nature methods, 13(11):919, 2016.
[54] Job Dekker, Karsten Rippe, Martijn Dekker, and Nancy Kleckner. Capturing chromosome conformation. science, 295(5558):1306–1311, 2002.
[55] Maxwell R Mumbach, Ansuman T Satpathy, Evan A Boyle, Chao Dai, Benjamin G Gowen, Seung Woo Cho, Michelle L Nguyen, Adam J Rubin, Jeffrey M Granja, Katelynn R Kazane, et al. Enhancer connectome in primary human cells identifies target genes of disease-associated dna elements. Nature genetics, 49(11):1602, 2017.
[56] Vivek Chandra, Sourya Bhattacharyya, Benjamin J Schmiedel, Ariel Madrigal, Cristian Gonzalez-Colin, Stephanie Fotsing, Austin Crinklaw, Gregory Seumois, Pejman Mohammadi, Mitchell Kronenberg, et al. Promoter-interacting expression quantitative trait loci are enriched for functional genetic variants.
Nature Genetics, pages 1–10, 2020.
[57] Assaf Zemach and Daniel Zilberman. Evolution of eukaryotic dna methylation and the pursuit of safer sex. Current Biology, 20(17):R780–R785, 2010.
[58] Jason T Huff and Daniel Zilberman. Dnmt1-independent cg methylation contributes to nucleosome positioning in diverse eukaryotes. Cell, 156(6):1286–1297, 2014.
[59] Christina Ambrosi, Massimiliano Manzo, and Tuncay Baubec. Dynamics and context-dependent roles of dna methylation. Journal of molecular biology, 429(10):1459–1475, 2017.
[60] Mehrnaz Fatemi, Martha M Pao, Shinwu Jeong, Einav Nili Gal-Yam, Gerda Egger, Daniel J Weisenberger, and Peter A Jones. Footprinting of mammalian promoters: use of a cpg dna methyltransferase revealing nucleosome positions at a single molecule level. Nucleic acids research, 33(20):e176–e176, 2005.
[61] Serge Saxonov, Paul Berg, and Douglas L Brutlag. A genome-wide analysis of cpg dinucleotides in the human genome distinguishes two distinct classes of promoters. Proceedings of the National Academy of Sciences, 103(5):1412–1417, 2006.
[62] Kamel Jabbari and Giorgio Bernardi. Cytosine methylation and cpg, tpg (cpa) and tpa frequencies. Gene, 333:143–149, 2004.
[63] Patricia J Wittkopp and Gizem Kalay. Cis-regulatory elements: molecular mechanisms and evolutionary processes underlying divergence. Nature Reviews Genetics, 13(1):59–69, 2012.
[64] Michael Bulger and Mark Groudine. Functional and mechanistic diversity of distal transcription enhancers. Cell, 144(3):327–339, 2011.
[65] Mike Levine. Transcriptional enhancers in animal development and evolution. Current Biology, 20(17):R754–R763, 2010.
[66] Aimée M Deaton and Adrian Bird. Cpg islands and the regulation of transcription. Genes & development, 25(10):1010–1022, 2011.
[67] Liang Guo, Aidong Han, Darren L Bates, Jue Cao, and Lin Chen. Crystal structure of a conserved n-terminal domain of histone deacetylase 4 reveals functional insights into glutamine-rich domains.
Proceedings of the National Academy of Sciences, 104(11):4297–4302, 2007.
[68] Aidong Han, Fan Pan, James C Stroud, Hong-Duk Youn, Jun O Liu, and Lin Chen. Sequence-specific recruitment of transcriptional co-repressor cabin1 by myocyte enhancer factor-2. Nature, 422(6933):730, 2003.
[69] Aidong Han, Ju He, Yongqing Wu, Jun O Liu, and Lin Chen. Mechanism of recruitment of class ii histone deacetylases by myocyte enhancer factor-2. Journal of molecular biology, 345(1):91–102, 2005.
[70] Erez Lieberman-Aiden, Nynke L Van Berkum, Louise Williams, Maxim Imakaev, Tobias Ragoczy, Agnes Telling, Ido Amit, Bryan R Lajoie, Peter J Sabo, Michael O Dorschner, et al. Comprehensive mapping of long-range interactions reveals folding principles of the human genome. science, 326(5950):289–293, 2009.
[71] Robin Andersson and Albin Sandelin. Determinants of enhancer and promoter activities of regulatory elements. Nature Reviews Genetics, pages 1–17, 2019.
[72] M Ryan Corces, Jeffrey M Granja, Shadi Shams, Bryan H Louie, Jose A Seoane, Wanding Zhou, Tiago C Silva, Clarice Groeneveld, Christopher K Wong, Seung Woo Cho, et al. The chromatin accessibility landscape of primary human cancers. Science, 362(6413), 2018.
[73] Pablo Minguez, Luca Parca, Francesca Diella, Daniel R Mende, Runjun Kumar, Manuela Helmer-Citterich, Anne-Claude Gavin, Vera Van Noort, and Peer Bork. Deciphering a global network of functionally associated post-translational modifications. Molecular systems biology, 8(1):599, 2012.
[74] Pablo Minguez, Ivica Letunic, Luca Parca, and Peer Bork. Ptmcode: a database of known and predicted functional associations between post-translational modifications in proteins. Nucleic acids research, 41(D1):D306–D311, 2012.
[75] Dean R Hewish and Leigh A Burgoyne. Chromatin sub-structure. the digestion of chromatin dna at regularly spaced sites by a nuclear deoxyribonuclease. Biochemical and biophysical research communications, 52(2):504–510, 1973.
[76] Carl Wu, Paul M Bingham, Kenneth J Livak, Robert Holmgren, and Sarah CR Elgin. The chromatin structure of specific genes: I. evidence for higher order domains of defined dna sequence. Cell, 16(4):797–806, 1979.
[77] Gregory E Crawford, Sean Davis, Peter C Scacheri, Gabriel Renaud, Mohamad J Halawi, Michael R Erdos, Roland Green, Paul S Meltzer, Tyra G Wolfsberg, and Francis S Collins. Dnase-chip: a high-resolution method to identify dnase i hypersensitive sites using tiled microarrays. Nature methods, 3(7):503–509, 2006.
[78] Peter J Sabo, Michael S Kuehn, Robert Thurman, Brett E Johnson, Ericka M Johnson, Hua Cao, Man Yu, Elizabeth Rosenzweig, Jeff Goldy, Andrew Haydock, et al. Genome-scale mapping of dnase i sensitivity in vivo using tiling dna microarrays. Nature methods, 3(7):511–518, 2006.
[79] Alan P Boyle, Sean Davis, Hennady P Shulha, Paul Meltzer, Elliott H Margulies, Zhiping Weng, Terrence S Furey, and Gregory E Crawford. High-resolution mapping and characterization of open chromatin across the genome. Cell, 132(2):311–322, 2008.
[80] Jay R Hesselberth, Xiaoyu Chen, Zhihong Zhang, Peter J Sabo, Richard Sandstrom, Alex P Reynolds, Robert E Thurman, Shane Neph, Michael S Kuehn, William S Noble, et al. Global mapping of protein-dna interactions in vivo by digital genomic footprinting. Nature methods, 6(4):283–289, 2009.
[81] M Ryan Corces, Alexandro E Trevino, Emily G Hamilton, Peyton G Greenside, Nicholas A Sinnott-Armstrong, Sam Vesuna, Ansuman T Satpathy, Adam J Rubin, Kathleen S Montine, Beijing Wu, et al. An improved atac-seq protocol reduces background and enables interrogation of frozen tissues. Nature methods, 14(10):959–962, 2017.
[82] Jakub Mieczkowski, April Cook, Sarah K Bowman, Britta Mueller, Burak H Alver, Sharmistha Kundu, Aimee M Deaton, Jennifer A Urban, Erica Larschan, Peter J Park, et al. Mnase titration reveals differences between nucleosome occupancy and chromatin accessibility. Nature communications, 7(1):1–11, 2016.
[83] Britta Mueller, Jakub Mieczkowski, Sharmistha Kundu, Peggy Wang, Ruslan Sadreyev, Michael Y Tolstorukov, and Robert E Kingston. Widespread changes in nucleosome accessibility without changes in nucleosome occupancy during a rapid transcriptional induction. Genes & development, 31(5):451–462, 2017.
[84] James Allan, Ross M Fraser, Tom Owen-Hughes, and David Keszenman-Pereyra. Micrococcal nuclease does not substantially bias nucleosome mapping. Journal of molecular biology, 417(3):152–164, 2012.
[85] Lakshminarayan M Iyer, Vivek Anantharaman, Maxim Y Wolf, and L Aravind. Comparative genomics of transcription factors and chromatin proteins in parasitic protists and other eukaryotes. International journal for parasitology, 38(1):1–31, 2008.
[86] Cedric R Clapier, Janet Iwasa, Bradley R Cairns, and Craig L Peterson. Mechanisms of action and regulation of atp-dependent chromatin-remodelling complexes. Nature reviews Molecular cell biology, 18(7):407–422, 2017.
[87] Diego Calderon, Michelle LT Nguyen, Anja Mezger, Arwa Kathiria, Fabian Müller, Vinh Nguyen, Ninnia Lescano, Beijing Wu, John Trombetta, Jessica V Ribado, et al. Landscape of stimulation-responsive chromatin across diverse human immune cells. Nature genetics, pages 1–12, 2019.
[88] Liang Song, Shao-shan Carol Huang, Aaron Wise, Rosa Castanon, Joseph R Nery, Huaming Chen, Marina Watanabe, Jerushah Thomas, Ziv Bar-Joseph, and Joseph R Ecker. A transcription factor hierarchy defines an environmental stress response network. Science, 354(6312), 2016.
[89] David J Galas and Albert Schmitz. Dnaase footprinting: a simple method for the detection of protein-dna binding specificity. Nucleic acids research, 5(9):3157–3170, 1978.
[90] Jeff Vierstra and John A Stamatoyannopoulos. Genomic footprinting. Nature methods, 13(3):213–221, 2016.
[91] Orit Rozenblatt-Rosen, Aviv Regev, Philipp Oberdoerffer, Tal Nawy, Anna Hupalowska, Jennifer E Rood, Orr Ashenberg, Ethan Cerami, Robert J Coffey, Emek Demir, et al. The human tumor atlas network: charting tumor transitions across space and time at single-cell resolution. Cell, 181(2):236–249, 2020.
[92] Reza Kalhor, Harianto Tjong, Nimanthi Jayathilaka, Frank Alber, and Lin Chen. Genome architectures revealed by tethered chromosome conformation capture and population-based modeling. Nature biotechnology, 30(1):90, 2012.
[93] Benjamin Joachim Schmiedel, Grégory Seumois, Daniela Samaniego-Castruita, Justin Cayford, Veronique Schulten, Lukas Chavez, Ferhat Ay, Alessandro Sette, Bjoern Peters, and Pandurangan Vijayanand. 17q21 asthma-risk variants switch ctcf binding and regulate il-2 production by t cells. Nature communications, 7(1):1–14, 2016.
[94] Bogdan Pasaniuc and Alkes L Price. Dissecting the genetics of complex traits using summary association statistics. Nature Reviews Genetics, 18(2):117–127, 2017.
[95] Kouichi Ozaki, Yozo Ohnishi, Aritoshi Iida, Akihiko Sekine, Ryo Yamada, Tatsuhiko Tsunoda, Hiroshi Sato, Hideyuki Sato, Masatsugu Hori, Yusuke Nakamura, et al. Functional snps in the lymphotoxin-α gene that are associated with susceptibility to myocardial infarction. Nature genetics, 32(4):650–654, 2002.
[96] Lucia A Hindorff, Praveen Sethupathy, Heather A Junkins, Erin M Ramos, Jayashri P Mehta, Francis S Collins, and Teri A Manolio. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proceedings of the National Academy of Sciences, 106(23):9362–9367, 2009.
[97] Matthew T Maurano, Richard Humbert, Eric Rynes, Robert E Thurman, Eric Haugen, Hao Wang, Alex P Reynolds, Richard Sandstrom, Hongzhu Qu, Jennifer Brody, et al. Systematic localization of common disease-associated variation in regulatory dna. Science, 337(6099):1190–1195, 2012.
[98] Marc A Schaub, Alan P Boyle, Anshul Kundaje, Serafim Batzoglou, and Michael Snyder. Linking disease associations with regulatory information in the human genome. Genome research, 22(9):1748–1759, 2012.
[99] Anshul Kundaje, Wouter Meuleman, Jason Ernst, Misha Bilenky, Angela Yen, Alireza Heravi-Moussavi, Pouya Kheradpour, Zhizhuo Zhang, Jianrong Wang, Michael J Ziller, et al. Integrative analysis of 111 reference human epigenomes. Nature, 518(7539):317–330, 2015.
[100] Complex Trait Consortium et al. The nature and identification of quantitative trait loci: a community’s view. Nature reviews. Genetics, 4(11):911, 2003.
[101] Gaël Yvert, Rachel B Brem, Jacqueline Whittle, Joshua M Akey, Eric Foss, Erin N Smith, Rachel Mackelprang, and Leonid Kruglyak. Trans-acting regulatory variation in saccharomyces cerevisiae and the role of transcription factors. Nature genetics, 35(1):57–64, 2003.
[102] Matthew V Rockman and Leonid Kruglyak. Genetics of global gene expression. Nature Reviews Genetics, 7(11):862–872, 2006.
[103] Alexandra C Nica and Emmanouil T Dermitzakis. Expression quantitative trait loci: present and future. Philosophical Transactions of the Royal Society B: Biological Sciences, 368(1620):20120362, 2013.
[104] Dan L Nicolae, Eric Gamazon, Wei Zhang, Shiwei Duan, M Eileen Dolan, and Nancy J Cox. Trait-associated snps are more likely to be eqtls: annotation to enhance discovery from gwas. PLoS Genet, 6(4):e1000888, 2010.
[105] Fabian Grubert, Judith B Zaugg, Maya Kasowski, Oana Ursu, Damek V Spacek, Alicia R Martin, Peyton Greenside, Rohith Srivas, Doug H Phanstiel, Aleksandra Pekowska, et al. Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Cell, 162(5):1051–1065, 2015.
[106] David S Gilmour and John T Lis. Detecting protein-dna interactions in vivo: distribution of rna polymerase on specific bacterial genes. Proceedings of the National Academy of Sciences, 81(14):4275–4279, 1984.
[107] David S Gilmour and John T Lis. In vivo interactions of rna polymerase ii with genes of drosophila melanogaster. Molecular and cellular biology, 5(8):2009–2018, 1985.
[108] ENCODE Project Consortium et al. A user’s guide to the encyclopedia of dna elements (encode). PLoS biology, 9(4):e1001046, 2011.
[109] Mark J Solomon and Alexander Varshavsky. Formaldehyde-mediated dna-protein crosslinking: a probe for in vivo chromatin structures. Proceedings of the National Academy of Sciences, 82(19):6470–6474, 1985.
[110] Mark J. Solomon, P. G. Lund Larsen, and Alexander Varshavsky. Mapping protein-dna interactions in vivo with formaldehyde: Evidence that histone h4 is retained on a highly transcribed gene. Cell, 53:937–947, 1988.
[111] Valerio Orlando. Mapping chromosomal proteins in vivo by formaldehyde-crosslinked-chromatin immunoprecipitation. Trends in biochemical sciences, 25(3):99–104, 2000.
[112] Peter C Dedon, Johann A Soults, C David Allis, and Martin A Gorovsky. A simplified formaldehyde fixation and immunoprecipitation technique for studying protein-dna interactions. Analytical biochemistry, 197(1):83–90, 1991.
[113] Andreas Hecht, Sabine Strahl-Bolsinger, and Michael Grunstein. Spreading of transcriptional represser sir3 from telomeric heterochromatin. Nature, 383(6595):92–96, 1996.
[114] Yuval Blat and Nancy Kleckner. Cohesins bind to preferential sites along yeast chromosome iii, with differential regulation along arms versus the centric region. Cell, 98(2):249–259, 1999.
[115] Vishwanath R Iyer, Christine E Horak, Charles S Scafe, David Botstein, Michael Snyder, and Patrick O Brown. Genomic binding sites of the yeast cell-cycle transcription factors sbf and mbf. Nature, 409(6819):533–538, 2001.
[116] Jason D Lieb, Xiaole Liu, David Botstein, and Patrick O Brown. Promoter-specific binding of rap1 revealed by genome-wide maps of protein–dna association. Nature genetics, 28(4):327–334, 2001.
[117] Bing Ren, François Robert, John J Wyrick, Oscar Aparicio, Ezra G Jennings, Itamar Simon, Julia Zeitlinger, Jörg Schreiber, Nancy Hannett, Elenita Kanin, et al. Genome-wide location and function of dna binding proteins. Science, 290(5500):2306–2309, 2000.
[118] Tong Ihn Lee, Nicola J Rinaldi, François Robert, Duncan T Odom, Ziv Bar-Joseph, Georg K Gerber, Nancy M Hannett, Christopher T Harbison, Craig M Thompson, Itamar Simon, et al. Transcriptional regulatory networks in saccharomyces cerevisiae. science, 298(5594):799–804, 2002.
[119] Amy S Weinmann, Stephanie M Bartley, Theresa Zhang, Michael Q Zhang, and Peggy J Farnham. Use of chromatin immunoprecipitation to clone novel e2f target promoters. Molecular and cellular biology, 21(20):6820–6832, 2001.
[120] David C Klein and Sarah J Hainer. Genomic methods in profiling dna accessibility and factor localization. Chromosome Research, 28(1):69–85, 2020.
[121] Elizabeth A Hoffman, Brian L Frey, Lloyd M Smith, and David T Auble. Formaldehyde crosslinking: a tool for the study of chromatin complexes. Journal of Biological Chemistry, 290(44):26404–26411, 2015.
[122] Artem Barski, Suresh Cuddapah, Kairong Cui, Tae-Young Roh, Dustin E Schones, Zhibin Wang, Gang Wei, Iouri Chepelev, and Keji Zhao. High-resolution profiling of histone methylations in the human genome. Cell, 129(4):823–837, 2007.
[123] David S Johnson, Ali Mortazavi, Richard M Myers, and Barbara Wold. Genome-wide mapping of in vivo protein-dna interactions. Science, 316(5830):1497–1502, 2007.
[124] Tarjei S Mikkelsen, Manching Ku, David B Jaffe, Biju Issac, Erez Lieberman, Georgia Giannoukos, Pablo Alvarez, William Brockman, Tae-Kyung Kim, Richard P Koche, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature, 448(7153):553–560, 2007.
[125] SC Elgin. The formation and function of dnase i hypersensitive sites in the process of gene activation. Journal of Biological Chemistry, 263(36):19259–19262, 1988.
[126] Istvan Albert, Travis N Mavrich, Lynn P Tomsho, Ji Qi, Sara J Zanton, Stephan C Schuster, and B Franklin Pugh. Translational and rotational settings of h2a.z nucleosomes across the saccharomyces cerevisiae genome. Nature, 446(7135):572–576, 2007.
[127] Assaf Weiner, Tsung-Han S Hsieh, Alon Appleboim, Hsiuyi V Chen, Ayelet Rahat, Ido Amit, Oliver J Rando, and Nir Friedman. High-resolution chromatin dynamics during a yeast stress response. Molecular cell, 58(2):371–386, 2015.
[128] Justin Cayford, Sara Herrera-da la Mata, Benjamin Joachim Schmiedel, Pandurangan Vijayanand, and Grégory Seumois. A semiautomated chip-seq procedure for large-scale epigenetic studies. Journal of Visualized Experiments, 162, 2020.
[129] Diana Youhanna Jankeel, Justin Cayford, Benjamin Joachim Schmiedel, Pandurangan Vijayanand, and Grégory Seumois. An integrated and semiautomated microscaled approach to profile cis-regulatory elements by histone modification chip-seq for large-scale epigenetic studies. In Type 2 Immunity, pages 303–326. Springer, 2018.
[130] Yong Zhang, Tao Liu, Clifford A Meyer, Jérôme Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, Richard M Myers, Myles Brown, Wei Li, et al. Model-based analysis of chip-seq (macs). Genome biology, 9(9):R137, 2008.
[131] Matthias Lienhard, Christina Grimm, Markus Morkel, Ralf Herwig, and Lukas Chavez. Medips: genome-wide differential coverage analysis of sequencing data derived from dna enrichment experiments. Bioinformatics, 30(2):284–286, 2013.
[132] ENCODE Project Consortium et al. Identification and analysis of functional elements in 1% of the human genome by the encode pilot project. nature, 447(7146):799, 2007.
[133] National Human Genome Research Institute (NHGRI), Mar 2017. https://www.nih.gov/about-nih/what-we-do/nih-almanac/national-human-genome-research-institute-nhgri.
[134] Brendan Maher. The encyclopaedia. Nature, 489, 2012.
[135] ENCODE Project Consortium et al.
An integrated encyclopedia of dna elements in the human genome. Nature, 489(7414):57–74, 2012.
[136] Christian Schmidl, André F Rendeiro, Nathan C Sheffield, and Christoph Bock. Chipmentation: fast, robust, low-input chip-seq for histones and transcription factors. Nature methods, 12(10):963, 2015.
[137] K Mark Ansel, Rebecca J Greenwald, Suneet Agarwal, Craig H Bassing, Silvia Monticelli, Jeneen Interlandi, Ivana M Djuretic, Dong U Lee, Arlene H Sharpe, Frederick W Alt, et al. Deletion of a conserved il4 silencer impairs t helper type 1–mediated immunity. Nature immunology, 5(12):1251–1259, 2004.
[138] Pandurangan Vijayanand, Grégory Seumois, Laura J Simpson, Sarah Abdul-Wajid, Dirk Baumjohann, Marisella Panduro, Xiaozhu Huang, Jeneen Interlandi, Ivana M Djuretic, Daniel R Brown, et al. Interleukin-4 production by follicular helper t cells requires the conserved il4 enhancer hypersensitivity site v. Immunity, 36(2):175–187, 2012.
[139] John Arne Dahl and Philippe Collas. A quick and quantitative chromatin immunoprecipitation assay for small cell samples. Front Biosci, 12:4925–4931, 2007.
[140] Alex K Shalek, Rahul Satija, Xian Adiconis, Rona S Gertner, Jellert T Gaublomme, Raktima Raychowdhury, Schraga Schwartz, Nir Yosef, Christine Malboeuf, Diana Lu, et al. Single-cell transcriptomics reveals bimodality in expression and splicing in immune cells. Nature, 498(7453):236–240, 2013.
[141] Tomer Kalisky and Stephen R Quake. Single-cell genomics. Nature methods, 8(4):311–314, 2011.
[142] Barbara Treutlein, Doug G Brownfield, Angela R Wu, Norma F Neff, Gary L Mantalas, F Hernan Espinoza, Tushar J Desai, Mark A Krasnow, and Stephen R Quake. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell rna-seq. Nature, 509(7500):371–375, 2014.
[143] Anoop P Patel, Itay Tirosh, John J Trombetta, Alex K Shalek, Shawn M Gillespie, Hiroaki Wakimoto, Daniel P Cahill, Brian V Nahed, William T Curry, Robert L Martuza, et al.
Single-cell rna-seq highlights intratumoral heterogeneity in primary glioblastoma. Science, 344(6190):1396–1401, 2014.
[144] Yong Wang, Jill Waters, Marco L Leung, Anna Unruh, Whijae Roh, Xiuqing Shi, Ken Chen, Paul Scheet, Selina Vattathil, Han Liang, et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature, 512(7513):155–160, 2014.
[145] Assaf Rotem, Oren Ram, Noam Shoresh, Ralph A Sperling, Alon Goren, David A Weitz, and Bradley E Bernstein. Single-cell chip-seq reveals cell subpopulations defined by chromatin state. Nature biotechnology, 33(11):1165–1172, 2015.
[146] Peter J Skene and Steven Henikoff. An efficient targeted nuclease strategy for high-resolution mapping of dna binding sites. Elife, 6:e21856, 2017.
[147] Michael P Meers, Terri D Bryson, Jorja G Henikoff, and Steven Henikoff. Improved cut&run chromatin profiling tools. Elife, 8:e46314, 2019.
[148] Hatice S Kaya-Okur, Steven J Wu, Christine A Codomo, Erica S Pledger, Terri D Bryson, Jorja G Henikoff, Kami Ahmad, and Steven Henikoff. Cut&tag for efficient epigenomic profiling of small samples and single cells. Nature communications, 10(1):1–10, 2019.
[149] Shanshan Ai, Haiqing Xiong, Chen C Li, Yingjie Luo, Qiang Shi, Yaxi Liu, Xianhong Yu, Cheng Li, and Aibin He. Profiling chromatin states using single-cell itchip-seq. Nature cell biology, 21(9):1164–1172, 2019.
[150] Andrew Bradbury and Andreas Plückthun. Reproducibility: Standardize antibodies used in research. Nature, 518(7537):27–29, 2015.
[151] Anand Venkataraman, Kun Yang, Jose Irizarry, Mark Mackiewicz, Paolo Mita, Zheng Kuang, Lin Xue, Devlina Ghosh, Shuang Liu, Pedro Ramos, et al. A toolbox of immunoprecipitation-grade monoclonal antibodies to human transcription factors. Nature methods, 15(5):330, 2018.
[152] Lisa Berglund, Erik Björling, Per Oksvold, Linn Fagerberg, Anna Asplund, Cristina Al-Khalili Szigyarto, Anja Persson, Jenny Ottosson, Henrik Wernérus, Peter Nilsson, et al.
A genecentric human protein atlas for expression profiles based on antibodies. Molecular & cellular proteomics, 7(10):2019–2027, 2008. [153] C Glenn Begley and Lee M Ellis. Raise standards for preclinical cancer research. Nature, 483(7391):531–533, 2012. [154] Alexey Gavrilov, Sergey V Razin, and Giacomo Cavalli. In vivo formaldehyde cross- linking: it is time for black box analysis. Briefings in functional genomics, 14(2):163– 165, 2015. 174 [155] Valerio Orlando, Helen Strutt, and Renato Paro. Analysis of chromatin structure by in vivo formaldehyde cross-linking. Methods, 11(2):205–214, 1997. [156] Sascha Beneke, Kirstin Meyer, Anja Holtz, Katharina H¨ uttner, and Alexander B¨ urkle. Chromatin composition is changed by poly (adp-ribosyl) ation during chromatin im- munoprecipitation. PloS one, 7(3):e32914, 2012. [157] Leonid Teytelman, Deborah M Thurtle, Jasper Rine, and Alexander van Oudenaarden. Highly expressed loci are vulnerable to misleading chip localization of multiple unre- lated proteins. Proceedings of the National Academy of Sciences, 110(46):18602–18607, 2013. [158] Marieke Simonis, Petra Klous, Erik Splinter, Yuri Moshkin, Rob Willemsen, Elzo De Wit, Bas Van Steensel, and Wouter De Laat. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture–on-chip (4c). Nature genetics, 38(11):1348–1354, 2006. [159] Zhihu Zhao, Gholamreza Tavoosidana, Mikael Sj¨ olinder, Anita G¨ ond¨ or, Piero Mar- iano, Sha Wang, Chandrasekhar Kanduri, Magda Lezcano, Kuljeet Singh Sandhu, Umashankar Singh, et al. Circular chromosome conformation capture (4c) uncovers extensive networks of epigenetically regulated intra-and interchromosomal interactions. Nature genetics, 38(11):1341–1347, 2006. [160] Jos´ ee Dostie, Todd A Richmond, Ramy A Arnaout, Rebecca R Selzer, William L Lee, Tracey A Honan, Eric D Rubio, Anton Krumm, Justin Lamb, Chad Nusbaum, et al. 
Chromosome conformation capture carbon copy (5c): a massively parallel solution for mapping interactions between genomic elements. Genome research, 16(10):1299–1309, 2006. [161] Melissa J Fullwood, Mei Hui Liu, You Fu Pan, Jun Liu, Han Xu, Yusoff Bin Mohamed, Yuriy L Orlov, Stoyan Velkov, Andrea Ho, Poh Huay Mei, et al. An oestrogen-receptor- -bound human chromatin interactome. Nature, 462(7269):58–64, 2009. [162] Zong Wei, David Huang, Fan Gao, Wen-Hsuan Chang, Woojin An, Gerhard A Coet- zee, Kai Wang, and Wange Lu. Biological implications and regulatory mechanisms of long-range chromosomal interactions. Journal of Biological Chemistry, 288(31):22369– 22377, 2013. [163] Takashi Nagano, Yaniv Lubling, Tim J Stevens, Stefan Schoenfelder, Eitan Yaffe, Wendy Dean, Ernest D Laue, Amos Tanay, and Peter Fraser. Single-cell hi-c reveals cell-to-cell variability in chromosome structure. Nature, 502(7469). [164] Meizhen Zheng, Simon Zhongyuan Tian, Daniel Capurso, Minji Kim, Rahul Mau- rya, Byoungkoo Lee, Emaly Piecuch, Liang Gong, Jacqueline Jufen Zhu, Zhihui Li, et al. Multiplex chromatin interactions with single-molecule precision. Nature, 566(7745):558–562, 2019. [165] Rongxin Fang, Miao Yu, Guoqiang Li, Sora Chee, Tristin Liu, Anthony D Schmitt, and Bing Ren. Mapping of long-range chromatin interactions by proximity ligation-assisted chip-seq. Cell research, 26(12):1345–1348, 2016. 175 [166] Barbara McClintock. The origin and behavior of mutable loci in maize. Proceedings of the National Academy of Sciences, 36(6):344–355, 1950. [167] Ramy K Aziz, Mya Breitbart, and Robert A Edwards. Transposases are the most abun- dant, most ubiquitous genes in nature. Nucleic acids research, 38(13):4207–4217, 2010. [168] William S Reznikoff. Transposon tn 5. Annual review of genetics, 42:269–286, 2008. [169] WILLIAM S Reznikoff. The tn5 transposon. Annual review of microbiology, 47(1):945– 963, 1993. [170] Douglas E Berg, Julian Davies, Bernard Allet, and Jean-David Rochaix. 
Transposition of r factor genes to bacteriophage lambda. Proceedings of the National Academy of Sciences, 72(9):3628–3632, 1975. [171] E-A Auerswald, G Ludwig, and H Schaller. Structural analysis of tn5. In Cold Spring Harbor symposia on quantitative biology, volume 45, pages 107–113. Cold Spring Har- bor Laboratory Press, 1981. [172] Philippe Mazodier, Pascale Cossart, Evelyne Giraud, and Francis Gasser. Completion of the nucleotide sequence of the central region of tn 5 confirms the presence of three resistance genes. Nucleic acids research, 13(1):195–205, 1985. [173] Todd A Naumann and William S Reznikoff. Tn5 transposase with an altered specificity for transposon ends. Journal of bacteriology, 184(1):233–240, 2002. [174] Christian D Adams, Bernhard Schnurr, Dunja Skoko, John F Marko, and William S Reznikoff. Tn5 transposase loops dna in the absence of tn5 transposon end sequences. Molecular microbiology, 62(6):1558–1568, 2006. [175] Richard J Gradman and William S Reznikoff. Tn5 synaptic complex formation: role of transposase residue w450. Journal of bacteriology, 190(4):1484–1487, 2008. [176] Michael D Weinreich, Lisa Mahnke-Braam, and William S Reznikoff. A functional analysis of the tn5 transposase identification of domains required for dna binding and multimerization. Journal of molecular biology, 241(2):166–177, 1994. [177] Dona York and William S Reznikoff. Purification and biochemical analyses of a monomeric form of tn 5 transposase. Nucleic acids research, 24(19):3790–3796, 1996. [178] Soheila Vaezeslami, Rachel Sterling, and William S Reznikoff. Site-directed mutagene- sis studies of tn5 transposase residues involved in synaptic complex formation. Journal of bacteriology, 189(20):7436–7441, 2007. [179] Mindy Steiniger-White and William S Reznikoff. The c-terminal helix of tn5 trans- posase is required for synaptic complex formation. Journal of Biological Chemistry, 275(30):23127–23133, 2000. 
[180] Douglas R Davies, Igor Y Goryshin, William S Reznikoff, and Ivan Rayment. Three- dimensional structure of the tn5 synaptic complex transposition intermediate. Science, 289(5476):77–85, 2000. 176 [181] Sally S Twining, Igor Y Goryshin, Archna Bhasin, and William S Reznikoff. Functional characterization of arginine 30, lysine 40, and arginine 62 in tn5 transposase. Journal of Biological Chemistry, 276(25):23135–23143, 2001. [182] Archna Bhasin, Igor Y Goryshin, and William S Reznikoff. Hairpin formation in tn5 transposition. Journal of Biological Chemistry, 274(52):37021–37029, 1999. [183] Christian D Adams, Bernhard Schnurr, John F Marko, and William S Reznikoff. Pulling apart catalytically active tn5 synaptic complexes using magnetic tweezers. Journal of molecular biology, 367(2):319–327, 2007. [184] Vadim A Klenchin, Agata Czyz, Igor Y Goryshin, Richard Gradman, Scott Lovell, Ivan Rayment, and William S Reznikoff. Phosphate coordination and movement of dna in the tn5 synaptic complex: role of the (r) yrek motif. Nucleic acids research, 36(18):5855– 5862, 2008. [185] Igor Yu Goryshin and William S Reznikoff. Tn5 in vitro transposition. Journal of Biological Chemistry, 273(13):7367–7374, 1998. [186] Simone Picelli, ˚ Asa K Bj¨ orklund, Bj¨ orn Reinius, Sven Sagasser, G¨ osta Winberg, and Rickard Sandberg. Tn5 transposase and tagmentation procedures for massively scaled sequencing projects. Genome research, 24(12):2033–2040, 2014. [187] Qianhao Wang, Haiqing Xiong, Shanshan Ai, Xianhong Yu, Yaxi Liu, Jiejie Zhang, and Aibin He. Cobatch for high-throughput single-cell epigenomic profiling. Molecular cell, 76(1):206–216, 2019. [188] Andrew P Landstrom and Michael J Ackerman. Mutation type is not clinically useful in predicting prognosis in hypertrophic cardiomyopathy. Circulation, 122(23):2441–2450, 2010. 
[189] Emelia J Benjamin, Paul Muntner, Alvaro Alonso, Marcio S Bittencourt, Clifton W Callaway, April P Carson, Alanna M Chamberlain, Alexander R Chang, Susan Cheng, Sandeep R Das, et al. Heart disease and stroke statistics-2019 update a report from the american heart association. Circulation, 2019. [190] Jian Qin Wei, Lina Shehadeh, James Mitrani, Monica Pessanha, Tatiana I Slepak, Keith A Webster, and Nanette H Bishopric. Quantitative control of adaptive cardiac hypertrophy by acetyltransferase p300. Circulation, 118(9):934, 2008. [191] Barry J Maron. Hypertrophic cardiomyopathy. Circulation, 106(19):2419–2421, 2002. [192] Barry J Maron, Martin S Maron, and Christopher Semsarian. Genetics of hypertrophic cardiomyopathy after 20 years: clinical perspectives. Journal of the American College of Cardiology, 60(8):705–715, 2012. [193] Donald Teare. Asymmetrical hypertrophy of the heart in young adults. British heart journal, 20(1):1, 1958. [194] John A Jarcho, William McKenna, JA Peter Pare, Scott D Solomon, Randall F Hol- combe, Shaughan Dickie, Tatjana Levi, Helen Donis-Keller, JG Seidman, and Chris- tine E Seidman. Mapping a gene for familial hypertrophic cardiomyopathy to chromo- some 14q1. New England Journal of Medicine, 321(20):1372–1378, 1989. 177 [195] Barry J Maron, Robert O Bonow, Richard O Cannon III, Martin B Leon, and Stephen E Epstein. Hypertrophic cardiomyopathy. New England Journal of Medicine, 316(14):844–852, 1987. [196] Barry J Maron, Julius M Gardin, John M Flack, Samuel S Gidding, Tom T Kurosaki, and Diane E Bild. Prevalence of hypertrophic cardiomyopathy in a general population of young adults: echocardiographic analysis of 4111 subjects in the cardia study. Circu- lation, 92(4):785–789, 1995. [197] Barry J Maron, Ethan J Rowin, Susan A Casey, Mark S Link, John R Lesser, Ray- mond HM Chan, Ross F Garberich, James E Udelson, and Martin S Maron. 
Hyper- trophic cardiomyopathy in adulthood associated with low cardiovascular mortality with contemporary management strategies. Journal of the American College of Cardiology, 65(18):1915–1928, 2015. [198] Barry J Maron, Ethan J Rowin, Susan A Casey, John R Lesser, Ross F Garberich, Deepa M McGriff, and Martin S Maron. Hypertrophic cardiomyopathy in children, adolescents, and young adults associated with low cardiovascular mortality with con- temporary management strategies. Circulation, 133(1):62–73, 2016. [199] Srijita Sen-Chowdhry, Daniel Jacoby, James C Moon, and William J McKenna. Up- date on hypertrophic cardiomyopathy and a guide to the guidelines. Nature Reviews Cardiology, 13(11):651, 2016. [200] Anja AT Geisterfer-Lowrance, Susan Kass, Gary Tanigawa, Hans-Peter V osberg, William McKenna, Christine E Seidman, and JG Seidman. A molecular basis for fa- milial hypertrophic cardiomyopathy: a cardiac myosin heavy chain gene missense mutation. Cell, 62(5):999–1006, 1990. [201] A Woo, H Rakowski, JC Liew, MS Zhao, CC Liew, TG Parker, M Zeller, ED Wigle, and MJ Sole. Mutations of the myosin heavy chain gene in hypertrophic cardiomyopathy: critical functional sites determine prognosis. Heart, 89(10):1179–1185, 2003. [202] Hugh Watkins, Anthony Rosenzweig, Dar-San Hwang, Tatjana Levi, William McKenna, Christine E Seidman, and JG Seidman. Characteristics and prognostic implications of myosin missense mutations in familial hypertrophic cardiomyopathy. New England Journal of Medicine, 326(17):1108–1114, 1992. [203] Hideshi Niimura, Linda L Bachinski, Somkiat Sangwatanaroj, Hugh Watkins, Al- bert E Chudley, William McKenna, Arni Kristinsson, Robert Roberts, Michael Sole, Barry J Maron, et al. Mutations in the gene for cardiac myosin-binding protein c and late-onset familial hypertrophic cardiomyopathy. New England Journal of Medicine, 338(18):1248–1257, 1998. 
[204] Michael J Ackerman, Sara L VanDriest, Steve R Ommen, Melissa L Will, Rick A Nishimura, A Jamil Tajik, and Bernard J Gersh. Prevalence and age-dependence of ma- lignant mutations in the beta-myosin heavy chain and troponin t genes in hypertrophic cardiomyopathy: a comprehensive outpatient perspective. Journal of the American Col- lege of Cardiology, 39(12):2042–2048, 2002. 178 [205] Sara L Van Driest, Michael J Ackerman, Steve R Ommen, Rameen Shakur, Melissa L Will, Rick A Nishimura, A Jamil Tajik, and Bernard J Gersh. Prevalence and severity of “benign” mutations in the-myosin heavy chain, cardiac troponin t, and-tropomyosin genes in hypertrophic cardiomyopathy. Circulation, 106(24):3085–3090, 2002. [206] Pascale Richard, Philippe Charron, Lucie Carrier, C´ eline Ledeuil, Theary Cheav, Claire Pichereau, Abdelaziz Benaiche, Richard Isnard, Olivier Dubourg, Marc Burban, et al. Hypertrophic cardiomyopathy: distribution of disease genes, spectrum of mutations, and implications for a molecular diagnosis strategy. Circulation, 107(17):2227–2232, 2003. [207] J Erdmann, S Daehmlow, S Wischke, Mm Senyuva, U Werner, J Raible, N Tanis, S Dy- achenko, M Hummel, R Hetzer, et al. Mutation spectrum in a large cohort of unrelated consecutive patients with hypertrophic cardiomyopathy. Clinical genetics, 64(4):339– 349, 2003. [208] Paal Skytt Andersen, Ole Havndrup, Lotte Hougs, Karina M Sørensen, Morten Jensen, Lars Allan Larsen, Paula Hedley, Alex Rojas Bie Thomsen, Johanna Moolman-Smook, Michael Christiansen, et al. Diagnostic yield, interpretation, and clinical utility of mu- tation screening of sarcomere encoding genes in danish hypertrophic cardiomyopathy patients and relatives. Human mutation, 30(3):363–370, 2009. [209] Christine E Seidman and JG Seidman. Identifying sarcomere gene mutations in hy- pertrophic cardiomyopathy: a personal history. Circulation research, 108(6):743–750, 2011. [210] David J Tester and Michael J Ackerman. 
Genetic testing for potentially lethal, highly treatable inherited cardiomyopathies/channelopathies in clinical practice. Circulation, 123(9):1021–1037, 2011. [211] Maria E Zoghbi, John L Woodhead, Richard L Moss, and Roger Craig. Three- dimensional structure of vertebrate cardiac muscle myosin filaments. Proceedings of the National Academy of Sciences, 105(7):2386–2390, 2008. [212] Masahiko Hoshijima. Mechanical stress-strain sensors embedded in cardiac cytoskele- ton: Z disk, titin, and associated structures. American Journal of Physiology-Heart and Circulatory Physiology, 290(4):H1313–H1325, 2006. [213] Christian Geier, Andreas Perrot, Cemil Ozcelik, Priska Binner, Damian Counsell, Katrin Hoffmann, Bernhard Pilz, Yvonne Martiniak, Katja Gehmlich, Peter FM van der Ven, et al. Mutations in the human muscle lim protein gene in families with hypertrophic cardiomyopathy. Circulation, 107(10):1390–1395, 2003. [214] Takeharu Hayashi, Takuro Arimura, Manatsu Itoh-Satoh, Kazuo Ueda, Shigeru Hohda, Natsuko Inagaki, Megumi Takahashi, Hisae Hori, Michio Yasunami, Hirofumi Nishi, et al. Tcap gene mutations in hypertrophic cardiomyopathy and dilated cardiomyopathy. Journal of the American College of Cardiology, 44(11):2192–2201, 2004. [215] J Martijn Bos, Rainer N Poley, Melissa Ny, David J Tester, Xiaolei Xu, Matteo Vatta, Jef- frey A Towbin, Bernard J Gersh, Steve R Ommen, and Michael J Ackerman. Genotype– phenotype relationships involving hypertrophic cardiomyopathy-associated mutations in titin, muscle lim protein, and telethonin. Molecular genetics and metabolism, 88(1):78– 85, 2006. 179 [216] Jeanne L Theis, J Martijn Bos, Virginia B Bartleson, Melissa L Will, Josepha Binder, Matteo Vatta, Jeffrey A Towbin, Bernard J Gersh, Steve R Ommen, and Michael J Ack- erman. Echocardiographic-determined septal morphology in z-disc hypertrophic car- diomyopathy. Biochemical and biophysical research communications, 351(4):896–902, 2006. 
[217] Vlad C Vasile, Steve R Ommen, William D Edwards, and Michael J Ackerman. A missense mutation in a ubiquitously expressed protein, vinculin, confers susceptibility to hypertrophic cardiomyopathy. Biochemical and biophysical research communications, 345(3):998–1003, 2006. [218] Vlad C Vasile, Melissa L Will, Steve R Ommen, William D Edwards, Timothy M Olson, and Michael J Ackerman. Identification of a metavinculin missense mutation, r975w, associated with both hypertrophic and dilated cardiomyopathy. Molecular genetics and metabolism, 87(2):169–174, 2006. [219] Adriana Osio, Lily Tan, Suet N Chen, Raffaella Lombardi, Sherif F Nagueh, Sanjay Shete, Robert Roberts, James T Willerson, and Ali J Marian. Myozenin 2 is a novel gene for human hypertrophic cardiomyopathy. Circulation research, 100(6):766–768, 2007. [220] Andrew P Landstrom, Noah Weisleder, Karin B Batalden, J Martijn Bos, David J Tester, Steve R Ommen, Xander HT Wehrens, William C Claycomb, Jae-Kyun Ko, Moonsun Hwang, et al. Mutations in jph2-encoded junctophilin-2 associated with hypertrophic cardiomyopathy in humans. Journal of molecular and cellular cardiology, 42(6):1026– 1035, 2007. [221] Le Cong, F Ann Ran, David Cox, Shuailiang Lin, Robert Barretto, Naomi Habib, Patrick D Hsu, Xuebing Wu, Wenyan Jiang, Luciano A Marraffini, et al. Multiplex genome engineering using crispr/cas systems. Science, 339(6121):819–823, 2013. [222] Johannes Backs, Barbara C Worst, Lorenz H Lehmann, David M Patrick, Zegeye Jebessa, Michael M Kreusser, Qiang Sun, Lan Chen, Claudia Heft, Hugo A Katus, et al. Selective repression of mef2 activity by pka-dependent proteolysis of hdac4. J Cell Biol, 195(3):403–415, 2011. [223] Jianqin Wei, Shaurya Joshi, Svetlana Speransky, Christopher Crowley, Nimanthi Jay- athilaka, Xiao Lei, Yongqing Wu, David Gai, Sumit Jain, Michael Hoosien, et al. Rever- sal of pathological cardiac hypertrophy via the mef2-coregulator interface. JCI insight, 2(17), 2017. 
[224] Michael M Kreusser, Lorenz H Lehmann, Stanislav Keranov, Marc-Oskar Hoting, Michael Kohlhaas, Jan-Christian Reil, Kay Neumann, Michael D Schneider, Joseph A Hill, Dobromir Dobrev, et al. The cardiac camkii genes and contribute redundantly to adverse remodeling but inhibit calcineurin-induced myocardial hypertrophy. Circula- tion, pages CIRCULATIONAHA–114, 2014. [225] Lorenz H Lehmann, Zegeye H Jebessa, Michael M Kreusser, Axel Horsch, Tao He, Mariya Kronlage, Matthias Dewenter, Viviana Sramek, Ulrike Oehl, Jutta Krebs- Haupenthal, et al. A proteolytic fragment of histone deacetylase 4 protects the heart from failure by regulating the hexosamine biosynthetic pathway. Nature medicine, 24(1):62, 2018. 180 [226] Dawinder S Sohal, Mai Nghiem, Michael A Crackower, Sandra A Witt, Thomas R Kim- ball, Kevin M Tymitz, Josef M Penninger, and Jeffery D Molkentin. Temporally reg- ulated and tissue-specific gene manipulations in the adult and embryonic heart using a tamoxifen-inducible cre protein. Circulation research, 89(1):20–25, 2001. [227] Ramtin Agah, Peter A Frenkel, Brent A French, Lloyd H Michael, Paul A Overbeek, Michael D Schneider, et al. Gene recombination in postmitotic cells. targeted expres- sion of cre recombinase provokes cardiac-restricted, site-specific rearrangement in adult ventricular muscle in vivo. The Journal of clinical investigation, 100(1):169–179, 1997. [228] Josep M Colomer, Lan Mao, Howard A Rockman, and Anthony R Means. Pressure overload selectively up-regulates ca2+/calmodulin-dependent protein kinase ii in vivo. Molecular endocrinology, 17(2):183–192, 2003. [229] Richard D Patten and Monica R Hall-Porter. Small animal models of heart failure: development of novel therapies, past and present. Circulation: Heart Failure, 2(2):138– 144, 2009. [230] Steven R Houser, Kenneth B Margulies, Anne M Murphy, Francis G Spinale, Gary S Francis, Sumanth D Prabhu, Howard A Rockman, David A Kass, Jeffery D Molkentin, Mark A Sussman, et al. 
Animal models of heart failure: a scientific statement from the american heart association. Circulation research, 111(1):131–150, 2012. [231] Daniel A Richards, Mark J Aronovitz, Timothy D Calamaras, Kelly Tam, Gregory L Martin, Peiwen Liu, Heather K Bowditch, Phyllis Zhang, Gordon S Huggins, and Robert M Blanton. Distinct phenotypes induced by three degrees of transverse aortic constriction in mice. Scientific Reports, 9(1):1–15, 2019. [232] Nanette H Bishopric and Larry Kedes. Adrenergic regulation of the skeletal alpha- actin gene promoter during myocardial cell hypertrophy. Proceedings of the National Academy of Sciences, 88(6):2132–2136, 1991. [233] Wang Wang, Weizhong Zhu, Shiqiang Wang, Dongmei Yang, Michael T Crow, Rui- Ping Xiao, and Heping Cheng. Sustained 1-adrenergic stimulation modulates car- diac contractility by ca2+/calmodulin kinase signaling pathway. Circulation research, 95(8):798–806, 2004. [234] Thomas H Fischer, Jonas Herting, Theodor Tirilomis, Andr´ e Renner, Stefan Neef, Karl Toischer, David Ellenberger, Anna F¨ orster, Jan D Schmitto, Jan Gummert, et al. Ca2+/calmodulin-dependent protein kinase ii and protein kinase a differentially regulate sarcoplasmic reticulum ca2+ leak in human cardiac pathology. Circulation, 128(9):970– 981, 2013. [235] Matthias Dewenter, Albert von der Lieth, Hugo A Katus, and Johannes Backs. Cal- cium signaling and transcriptional regulation in cardiomyocytes. Circulation research, 121(8):1000–1020, 2017. [236] Mariya Kronlage, Matthias Dewenter, Johannes Grosso, Thomas Fleming, Ulrike Oehl, Lorenz H Lehmann, Inˆ es Falc˜ ao-Pires, Adelino F Leite-Moreira, Nadine V olk, Hermann-Josef Gr¨ one, et al. O-glcnacylation of histone deacetylase 4 protects the dia- betic heart from failure. Circulation, 140(7):580–594, 2019. 181 [237] Tao He, Jiale Huang, Lan Chen, Gang Han, David Stanmore, Jutta Krebs-Haupenthal, Metin Avkiran, Marco Hagenm¨ uller, and Johannes Backs. 
Cyclic amp represses patho- logical mef2 activation by myocyte-specific hypo-phosphorylation of hdac5. Journal of Molecular and Cellular Cardiology, 2020. [238] Annemieke JM de Ruijter, Albert H van GENNIP, Huib N Caron, Stephan Kemp, and Andr´ e BP van KUILENBURG. Histone deacetylases (hdacs): characterization of the classical hdac family. Biochemical Journal, 370(3):737–749, 2003. [239] Xiang-Jiao Yang and Edward Seto. The rpd3/hda1 family of lysine deacetylases: from bacteria and yeast to mice and men. Nature reviews Molecular cell biology, 9(3):206– 218, 2008. [240] Sabnam Parbin, Swayamsiddha Kar, Arunima Shilpi, Dipta Sengupta, Moonmoon Deb, Sandip Kumar Rath, and Samir Kumar Patra. Histone deacetylases: a saga of per- turbed acetylation homeostasis in cancer. Journal of Histochemistry & Cytochemistry, 62(1):11–33, 2014. [241] Edward Seto and Minoru Yoshida. Erasers of histone acetylation: the histone deacetylase enzymes. Cold Spring Harbor perspectives in biology, 6(4):a018713, 2014. [242] Roy A Frye. Characterization of five human cdnas with homology to the yeast sir2 gene: Sir2-like proteins (sirtuins) metabolize nad and may have protein adp-ribosyltransferase activity. Biochemical and biophysical research communications, 260(1):273–279, 1999. [243] Paola Gallinari, Stefania Di Marco, Phillip Jones, Michele Pallaoro, and Christian Steink¨ uhler. Hdacs, histone deacetylation and gene transcription: from molecular bi- ology to cancer therapeutics. Cell research, 17(3):195–211, 2007. [244] Michael Haberland, Rusty L Montgomery, and Eric N Olson. The many roles of his- tone deacetylases in development and physiology: implications for disease and therapy. Nature Reviews Genetics, 10(1):32–42, 2009. [245] Lorenz H Lehmann, Barbara C Worst, David A Stanmore, and Johannes Backs. His- tone deacetylase signaling in cardioprotection. Cellular and Molecular Life Sciences, 71(9):1673–1690, 2014. 
[246] Mathias Hohl, Michael Wagner, Jan-Christian Reil, Sarah-Anne M¨ uller, Marcus Tauch- nitz, Angela M Zimmer, Lorenz H Lehmann, Gerald Thiel, Michael B¨ ohm, Johannes Backs, et al. Hdac4 controls histone methylation in response to elevated cardiac load. The Journal of clinical investigation, 123(3):1359–1370, 2013. [247] Qing Lin, John Schwarz, Corazon Bucana, and Eric N Olson. Control of mouse cardiac morphogenesis and myogenesis by transcription factor mef2c. Science, 276(5317):1404–1407, 1997. [248] Rick B Vega, Koichi Matsuda, Junyoung Oh, Ana C Barbosa, Xiangli Yang, Eric Mead- ows, John McAnally, Chris Pomajzl, John M Shelton, James A Richardson, et al. Histone deacetylase 4 controls chondrocyte hypertrophy during skeletogenesis. Cell, 119(4):555–566, 2004. 182 [249] Christina Karamboulas, Albert Swedani, Chris Ward, Ashraf S Al-Madhoun, Sharon Wilton, Sophie Boisvenue, Alan G Ridgeway, and Ilona S Skerjanc. Hdac activity reg- ulates entry of mesoderm cells into the cardiac muscle lineage. Journal of cell science, 119(20):4305–4314, 2006. [250] Timothy A McKinsey, Chun-Li Zhang, Jianrong Lu, and Eric N Olson. Signal- dependent nuclear export of a histone deacetylase regulates muscle differentiation. Na- ture, 408(6808):106–111, 2000. [251] Tong Zhang, Michael Kohlhaas, Johannes Backs, Shikha Mishra, William Phillips, Na- taliya Dybkova, Shurong Chang, Haiyun Ling, Donald M Bers, Lars S Maier, et al. Camkii isoforms differentially affect calcium handling but similarly regulate hdac/mef2 transcriptional responses. Journal of Biological Chemistry, 282(48):35078–35087, 2007. [252] Chun Li Zhang, Timothy A McKinsey, and Eric N Olson. Association of class ii histone deacetylases with heterochromatin protein 1: potential role for histone methylation in control of muscle differentiation. Molecular and cellular biology, 22(20):7302–7312, 2002. [253] Zhengke Wang, Gangjian Qin, and Ting C Zhao. Hdac4: mechanism of regulation and biological functions. 
Epigenomics, 6(1):139–150, 2014. [254] Amy Wang, Siavash K Kurdistani, and Michael Grunstein. Requirement of hos2 histone deacetylase for gene activity in yeast. Science, 298(5597):1412–1414, 2002. [255] Julia R Pon and Marco A Marra. Mef2 transcription factors: developmental regulators and emerging cancer genes. Oncotarget, 7(3):2297, 2016. [256] Ralston M Barnes, Ian S Harris, Eric J Jaehnig, Kimberly Sauls, Tanvi Sinha, Anabel Rojas, William Schachterle, David J McCulley, Russell A Norris, and Brian L Black. Mef2c regulates outflow tract alignment and transcriptional control of tdgf1. Develop- ment, 143(5):774–779, 2016. [257] Michael P Verzi, David J McCulley, Sarah De Val, Evdokia Dodou, and Brian L Black. The right ventricle, outflow tract, and ventricular septum comprise a restricted expression domain within the secondary/anterior heart field. Developmental biology, 287(1):134–145, 2005. [258] Zsuzsanna Schwarz-Sommer, Peter Huijser, Wolfgang Nacken, Heinz Saedler, and Hans Sommer. Genetic control of flower development by homeotic genes in antirrhinum ma- jus. Science, 250(4983):931–936, 1990. [259] Roy Pollock and Richard Treisman. Human srf-related proteins: Dna-binding properties and potential regulatory targets. Genes & development, 5(12a):2327–2341, 1991. [260] Martin V Thai, Suresh Guruswamy, Kim T Cao, Jeffrey E Pessin, and Ann Louise Olson. Myocyte enhancer factor 2 (mef2)-binding site is required forglut4 gene expression in transgenic mice regulation of mef2 dna binding activity in insulin-deficient diabetes. Journal of Biological Chemistry, 273(23):14285–14292, 1998. 183 [261] Wenwu Wu, Xiaotai Huang, Jian Cheng, Zhenggang Li, Stefan de Folter, Zhuoran Huang, Xiaoqian Jiang, Hongxia Pang, and Shiheng Tao. Conservation and evolution in and among srf-and mef2-type mads domains and their binding sites. Molecular biology and evolution, 28(1):501–511, 2011. [262] John N Milligan and Emmitt R Jolly. 
Identification and characterization of a mef2 tran- scriptional activator in schistosome parasites. PLoS Negl Trop Dis, 6(1):e1443, 2012. [263] Ju He, Jun Ye, Yongfei Cai, Cecilia Riquelme, Jun O Liu, Xuedong Liu, Aidong Han, and Lin Chen. Structure of p300 bound to mef2 on dna reveals a mechanism of en- hanceosome assembly. Nucleic acids research, 39(10):4464–4474, 2011. [264] Irina A Sergeeva, Ingeborg B Hooijkaas, Jan M Ruijter, Ingeborg Van Der Made, Nina E De Groot, Harmen JG van de Werken, Esther E Creemers, and Vincent M Christoffels. Identification of a regulatory domain controlling the nppa-nppb gene cluster during heart development and stress. Development, 143(12):2135–2146, 2016. [265] Irina A Sergeeva and Vincent M Christoffels. Regulation of expression of atrial and brain natriuretic peptide, biomarkers for heart development and disease. Biochimica et Biophysica Acta (BBA)-Molecular Basis of Disease, 1832(12):2403–2413, 2013. [266] Joyce Man, Phil Barnett, and Vincent M Christoffels. Structure and function of the nppa–nppb cluster locus during heart development and disease. Cellular and Molecular Life Sciences, 75(8):1435–1444, 2018. [267] Yong Zhang, Tao Liu, Clifford A Meyer, J´ erˆ ome Eeckhoute, David S Johnson, Bradley E Bernstein, Chad Nusbaum, Richard M Myers, Myles Brown, Wei Li, et al. Model-based analysis of chip-seq (macs). Genome biology, 9(9):R137, 2008. [268] Bo Huang, Wenqin Wang, Mark Bates, and Xiaowei Zhuang. Three-dimensional super-resolution imaging by stochastic optical reconstruction microscopy. Science, 319(5864):810–813, 2008. [269] Xiaofeng Feng and Sean D Colloms. In vitro transposition of isy100, a bacterial insertion sequence belonging to the tc1/mariner family. Molecular microbiology, 65(6):1432– 1443, 2007. [270] Xiaofeng Feng, Amy L Bednarz, and Sean D Colloms. Precise targeted integration by a chimaeric transposase zinc-finger fusion protein. Nucleic acids research, 38(4):1204– 1216, 2009. 
Abstract
To better understand hypertrophic cardiomyopathy (HCM), numerous techniques were used and a novel assay was developed. The driving aim of this body of work was to probe the ability of myocyte enhancer factor 2 (MEF2) to alter chromatin structure and, as a result, gene expression upon hypertrophic activation. Because of technical limitations of chromatin immunoprecipitation followed by sequencing (ChIP-Seq), a novel method was attempted that eliminated the use of antibodies to determine the binding positions of MEF2 genome-wide.

The third aspect of this work was to use a well-known histone modification associated with active gene expression (H3K27ac) to test the 2D (ChIP-Seq) and 3D (HiChIP) interactions perturbed in the disease. These data were then used to determine whether a molecular mechanism of 3D chromatin structural changes drives the up- and down-regulation of disease-associated genes. This was the first example of using HiChIP in an HCM system, and it showed altered chromatin states at genes responsible for the disease. Specifically, these data showed the importance of MEF2 and histone deacetylase 4 (HDAC4) at the natriuretic peptide A and B (NPPA/NPPB) locus, both in the regulation of gene expression and in how chromatin dynamics can be perturbed on a local scale that has been missed by HiC experiments. The use of HiChIP also allowed the locus to be understood on a much broader scale than had previously been reported using 4C experiments. Overall, the locus showed how HiChIP data can be applied to a well-studied locus to further the understanding of the role of 3D interactions in the regulation of gene expression.

Overall, these experiments were conducted to better understand the role that MEF2:HDAC interactions play in the regulation of chromatin structure and gene expression. The assay development provided a proof of concept for a new type of ChIP-Seq that would eliminate the need for the standard antibody approach. Coupling Tn5 to a specific chemical probe that binds a target would also allow the assay to be multiplexed for numerous targets and potentially reduced to the single-cell level. This tool could have a large impact on the field by helping to clarify the roles of transcription factors in 3D chromatin conformation and their importance in gene regulation.

The computational approach highlighted the need for assays like HiChIP, as combining ChIP-Seq and HiC at a computational level was not successful. In the final aspect of the project, the goal of identifying 3D chromatin changes between healthy and HCM conditions was achieved using H3K27ac HiChIP. The work focused on an essential locus in heart failure and showed the role that MEF2:HDAC interactions have in the expression of both NPPA and NPPB.
Asset Metadata
Creator: Cayford, Justin (author)
Core Title: The development of targeted transcription factor transposition and understanding chromatin dynamics in hypertrophic cardiomyopathy
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Molecular Biology
Degree Conferral Date: 2021-12
Publication Date: 11/10/2021
Defense Date: 03/24/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: 3D chromatin, ChIP-seq, HCM, HDAC, HiChIP, hypertrophic cardiomyopathy, MEF2, NPPA, NPPB, OAI-PMH Harvest, Tn5
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Chen, Lin (committee chair), Aparicio, Oscar (committee member), Arnheim, Norman (committee member), Pratt, Matthew (committee member)
Creator Email: cayford@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC16659545
Unique identifier: UC16659545
Legacy Identifier: etd-CayfordJus-10211
Document Type: Dissertation
Rights: Cayford, Justin
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu