Measuring, modeling and identifying factors that influence eukaryotic DNA replication
(USC Thesis Other)
Measuring, Modeling and Identifying Factors That Influence Eukaryotic DNA Replication

by Simon Robert Vincent Knott

A Dissertation Presented to the Faculty of the USC Graduate School, University of Southern California, in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy (Computational Biology and Bioinformatics)

May 2011

Copyright 2011 Simon Robert Vincent Knott

Dedication

I dedicate this work to my family for the undying support they have provided me both prior to and throughout my academic life. I have never felt alone in this process and have always known that, regardless of how things were progressing, my family was standing with me.

To my parents Verner and Katherine, for their support and encouragement throughout my life. They have dedicated their lives to their children and we would not be where we are or the people we are today without them.

To my wife Audrey, for her patience throughout this process. I thank you for understanding what needed to be done and for giving me your loving support while I did it.

Acknowledgments

I would like to first acknowledge my supervisors Dr. Oscar Aparicio and Dr. Simon Tavaré. Prior to coming to USC I planned to work strictly in the field of Computational Biology. While moving through the preliminary course-work here my decision to pursue a career in computational biology was affirmed; however, I gained a newfound interest in molecular mechanisms. I realized that to venture into molecular biology would be an overwhelming task and that I would be starting from scratch in terms of working knowledge at the bench. After a very brief discussion, Dr. Aparicio agreed to me joining his lab and within the first five minutes of working there I broke his lab equipment... and he laughed. From that point on his patience has never been in question. His office door has always been open for his students and his home phone number posted by the lab phone for lab members to use at will. I thank Dr.
Aparicio for guiding me through this process and especially for giving me the resources to ask and answer questions thoroughly without restrictions.

Dr. Tavaré is at USC for approximately half of the year. However, his work ethic and dedication to his students (in combination with video chat) have allowed him to act as a full-time supervisor to me throughout my PhD. As with Dr. Aparicio, it takes an adventurous professor to agree to supervise a computational student taking on a hybrid (computational and molecular) role. Although a thesis with these two aspects generally moves more slowly than a purely computational one, he has been nothing but enthusiastic and supportive of the extra experiments it has taken to tie up loose ends. When computational problems arose he was always able to help guide me to a simple, elegant solution. His knowledge of statistics and the scientific process has been inspiring to say the least.

In addition to Drs. Aparicio and Tavaré I would like to thank the other members of my thesis committee (Dr. Andrew Smith and Dr. Peter Laird). Dr. Smith's door has been open for discussions throughout this process and his guidance has added significantly to the work discussed in Chapter III of this thesis. I've had the opportunity to give talks in front of Dr. Laird on three occasions and on each of these he has asked thought-provoking questions that have guided subsequent experiments and computational designs.

I would like to thank the past and present members of the Aparicio Lab (Chris Viggiani, Shawn Szyjka, Yuan Zhong, Tittu Nellimoottil, Jared Peace and Zach Ostrow) for creating a fun atmosphere to do science in. When joining the Aparicio lab, as mentioned above, I had almost no working knowledge of the bench and Chris, Shawn and Yuan all helped guide me in this aspect. Also, as soon as I started there I was handed excellent data from all three, which allowed me to begin the computational side of my thesis immediately.
Tittu is also performing a hybrid thesis and we have shared ideas, successes and frustrations with working at the bench throughout this process. Finally, Jared and Zach have both been instrumental in completing the work discussed in Chapter VI of this thesis. I'd also like to thank our neighbors (the Arbeitman Lab, specifically Michelle Arbeitman, Justin Dalton and J.P. Masly) for their helpful and fun discussions (which usually occurred over beers on Fridays).

Finally, I'd like to thank the computational biology students that entered the program with me in 2006. Together we went through an intensive set of course-work and I would not have completed it without them. When motivation ran low with any of us, the others were there for encouragement. In particular, I'd like to thank Kjong Lehmann and Sudeep Srivastava for patiently answering all of my Unix questions. Also included in the group of peers that helped me tremendously throughout my PhD is Reza Kalhor. Reza performed some of the analysis described in Chapter VI and, more importantly, was instrumental in getting the 4C experiments to work in yeast.

Table of Contents

Dedication
Acknowledgments
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
  1.1 DNA Replication Overview
    1.1.1 Sequence Specificity
    1.1.2 Replication Protein Loading and Modification
  1.2 S-phase Impediments
  1.3 Replication Origin Timing
  1.4 Measuring DNA Replication and Other Genomic Events
    1.4.1 Site Specific Replication Origin Function
    1.4.2 Genome-wide Replication Analysis
    1.4.3 Chromatin Analysis
  1.5 Modeling DNA Replication
  1.6 Factors Affecting Replication Origin Firing
    1.6.1 Chromatin Structure
    1.6.2 Chromatin Modifications
    1.6.3 Transcription
  1.7 Chapter Summary and Thesis Outline
Chapter 2: Strategies for Analyzing BrdU-IP-chip Datasets
  2.1 Background
  2.2 Within-Array Normalization
  2.3 Between-Array Normalization
    2.3.1 Location Normalization
    2.3.2 Scale Normalization
  2.4 Peak Detection and Quantification
    2.4.1 Detection of Enriched Regions
    2.4.2 Detection of de novo Replication Peaks
    2.4.3 Peak Filtration Based on a priori Information
  2.5 Validation of Methods
  2.6 Chapter Summary
Chapter 3: The Spatio-Temporal Map of Yeast DNA Replication
  3.1 Background
  3.2 A High-Resolution Temporal Profile of DNA Replication Throughout S-phase
  3.3 Replication Origin Firing Rates in HU and Cdc45 Binding in G1-phase are Predictive of Replication Timing Schedules
  3.4 Unperturbed Transcription Does Not Influence Replication Timing or Direction in S. cerevisiae Cells
  3.5 Chromatin Organization Correlates with Replication Timing
  3.6 Chapter Summary
Chapter 4: Modeling DNA Replication
  4.1 Background
  4.2 A Stochastic Model Of DNA Replication
  4.3 Model Implementation
    4.3.1 Cell Asynchrony
    4.3.2 Concentration of Limiting Factor
    4.3.3 Distribution of Limiting Factor
    4.3.4 Firing Times, Fork Movement and Termination
  4.4 Model Fitting and Selection
  4.5 Analysis of the Fitted Model
  4.6 Chapter Summary
Chapter 5: Rpd3 Co-regulates Replication Origin Firing and Flanking Transcriptional Activity
  5.1 Background
  5.2 Rpd3 Regulates the Initiation of Many Replication Origins
  5.3 Rpd3S Significantly Modulates the Initiation of Only a Few Selected Replication Origins
  5.4 Rpd3L Mediates the Rpd3-dependent Effect on Replication Origin Firing
  5.5 Deletion of Putative Rpd3L-Targeting Factors Deregulates Few Rpd3L-Regulated Replication Origins
  5.6 Rpd3L-Regulated Replication Origins are Locally Associated with Sin3-Rpd3-Regulated Transcription and Chromatin Binding
  5.7 Chapter Summary
Chapter 6: Rpd3 Co-regulates Replication Origin Firing and Flanking Transcriptional Activity
  6.1 Background
  6.2 Fkh1 and Fkh2 Regulate the Firing Dynamics of Non-Centromeric Replication Origins
  6.3 Fkh1 and Fkh2 Regulate CDC45 Binding at Forkhead-Regulated Replication Origins
  6.4 Many Replication Origins are Bound by Fkh1 and Fkh2
  6.5 Forkhead-Regulated Origins are not Spatially Associated with Forkhead-Regulated Genes
  6.6 Fkh1 and Fkh2 Influence Nucleosome-Phasing Around the ACS
  6.7 Fkh1 and Fkh2 Regulate Long-Range Chromatin Interactions
  6.8 Chapter Summary
Chapter 7: Discussion
  7.1 Summary of Results
  7.2 Interpretation of Results
  7.3 Concluding Remarks
Bibliography
Appendices
  Appendix A: Supplemental Tables
  Appendix B: Supplemental Figures

List of Tables

A.1 Replication Origins that Fire in HU in WT Cells (BrdU-IP-chip)
A.2 Replication Origins that Fire in HU in WT Cells (BrdU-IP-seq)
A.3 BrdU Peaks not Associated with OriDB Replication Origins
A.4 Replication Origins Deregulated in rpd3Δ Cells
A.5 Replication Origins Deregulated in Rpd3S Mutants
A.6 Replication Origins Deregulated in dep1Δ Cells
A.7 Rpd3-Dependent Histone Deacetylation at Rpd3-Regulated Replication Origins
A.8 DNA Binding Factors Associated with Rpd3L-Regulated Origins
A.9 Replication Origins Analyzed for Forkhead-Regulation

List of Figures

2.1 Illustration of Loess Normalization for BrdU-IP-chip Data
2.2 Testing ChIP-chip Normalization Methods
2.3 Within-Array Normalization
2.4 Autocorrelation Analysis of Unnormalized vs. Normalized BrdU-IP-chip Data
2.5 Location Normalization of BrdU-IP-chip Data
2.6 Scale Normalization of BrdU-IP-chip Data
2.7 Identification of Enriched Regions in BrdU-IP-chip Data
2.8 Validation of BrdU-IP-chip Normalization and Analysis
3.1 BrdU-IP-Seq Dataset Production
3.2 BrdU-Pulse-IP-seq Chromosomal Plot
3.3 Replication Origin and TER Predictions
3.4 Replication Timing vs. Replication Factor Binding
3.5 Transcription vs. Replication
3.6 Chromatin Structure vs. Replication
4.1 A Model of DNA Replication
4.2 Model Error and Posterior Parameter Distributions
4.3 Real vs. Simulated Data
4.4 Identification of Fork Pause Site
5.1 Rpd3 Regulates Many Replication Origins
5.2 RPD3L is Responsible for Most RPD3 Replication Origin Regulation
5.3 Global Analysis of WT Replication Profiles Compared to those of RPD3, RPD3S and RPD3L Mutants
5.4 Correlation of RPD3, RPD3S and RPD3L Replication Profiles
5.5 Rpd3L-Regulated Replication Origins Colocalize with Rpd3-Regulated Genes
6.1 Fkh1 and Fkh2 Regulate Many Replication Origins
6.2 WT vs. fkh1Δfkh2Δ Replication Timing Schedules
6.3 ORC, MCM and CDC45 Binding in WT and fkh1Δfkh2Δ Cells
6.4 Many Replication Origins are Bound by Fkh1 and Fkh2
6.5 Genes Flanking Forkhead-Regulated Origins do not Show Corresponding Forkhead-Regulation
6.6 RNA-Pol-II Binding is not Altered at Forkhead-Regulated Replication Origins
6.7 Nucleosome Positioning Around the ACS in WT and fkh1Δfkh2Δ Cells
6.8 Fkh1 and Fkh2 Regulate Long Range Chromatin Interactions
6.9 Fkh1 and Fkh2 Regulate Replication Origin Firing Through Formation of Domain Swapped Dimers
B.1 Testing ChIP-chip Normalization Methods on Noisy Data
B.2 Within-Array Normalization on a "Noisy" rpd3Δ Dataset
B.3 Symmetry Measurements
B.4 rpd3Δ Probes Plotted in the Chromosomal Plane
B.5 Real and Simulated BrdU-IP-seq Pulse Experiments
B.6 Density of Nucleosome Positions around All Origins and in Two Replicates
B.7 Budding Indices of WT Cells
B.8 BrdU-Pulse-IP-seq Real vs. Simulated Read Counts
B.9 WT vs rpd3Δ Replication Profiles
B.10 Early S-phase Replication Profiles Identify Rpd3S- and Rpd3L-Regulated Origins
B.11 Global Comparison of BrdU Peak-heights in Rpd3S and Rpd3L Mutants with Corresponding Wild-Type Peaks
B.12 WT and Fkh-mutant BrdU-IP-seq Profiles in HU
B.13 WT and fkh1Δfkh2Δ BrdU-Pulse-IP-chip Profiles

Abstract

For cells to proliferate, the genome must be replicated exactly once per cell cycle in a timely and accurate manner. Making this task difficult are multiple other genomic processes, such as transcription and DNA repair, that are concurrently operating on the same genomic template. Replication initiates at specific loci called replication origins that must undergo a series of protein loadings before they can begin to replicate. Although this loading schedule takes place at all origins, individual origins fire at distinct and conserved times during S-phase. It has been suggested that origin firing schedules are defined by their propensity to attract rate limiting replication factors from limited pools (where origins with higher propensities replicate earlier and origins with lower propensities replicate later). This model has not been validated and, furthermore, the factors determining an origin's propensity to attract replication factors remain poorly understood. In higher eukaryotes, replication timing has been linked to epigenetic inheritance and genomic stability. Thus, determining which factors dictate origin timing schedules is important for understanding the mechanisms driving development and healthy cell proliferation.

This current work investigates the molecular kinetics that drive the S. cerevisiae replication schedule and also begins to uncover what coordination they exhibit with concurrently operating genomic processes. To understand better these timing dynamics, we begin by developing molecular and computational tools to analyze replication timing genome-wide.
Using these tools we produce a novel dataset that represents the highest fidelity temporal map of DNA replication to date. Next, to identify novel candidate limiting factors to DNA replication, we describe and analyze (in the context of this temporal map) two additional datasets designed to capture both pre-S-phase replication protein loading and global origin efficiencies. Through analysis of these data we determine that, in G1-phase, the earliest replicating origins show a high propensity to attract Cdc45 (a replication factor that is limited in its nuclear concentration in G1-phase). Following this, we devise and computationally implement a detailed theoretical model of DNA replication to test the hypothesis that origin firing dynamics (and hence genome-wide replication times) are determined by their ability to recruit replication factors from limited pools. After validating this model we identify factors that, in unperturbed cells, are correlated with origin firing dynamics. These include nucleosome positioning around the origin and the clustering of origins in the nucleus in late G1-phase.

Previous work has demonstrated that histone acetylation around an origin promotes its early replication. Specifically, others have shown that when the histone-deacetylase Rpd3 is removed from the cell several origins increase in their activity. To test the scope of Rpd3's action at origins, we have analyzed Rpd3 mutant cells genome-wide for their origin replication activities. We determined that approximately one-third of origins are suppressed by Rpd3 action. By targeting the individual complexes that Rpd3 operates in, we determined that its action at origins is through its role in transcriptional repression at the gene promoter, as opposed to its broad action as a suppressor of spurious transcription events.
Furthermore, we demonstrate that the regions surrounding Rpd3-regulated origins are deacetylated by Rpd3 and also that these regions are enriched for Rpd3 binding and Rpd3-regulated genes.

Finally, we introduce the forkhead transcription factors Fkh1 and Fkh2 as two novel regulators of origin function. We demonstrate that Fkh1 alone regulates 50 origins and that in cells where both Fkh1 and Fkh2 action is removed, over one-half of all origins show deregulation. Furthermore, these factors are the first to be identified that have both repressive and excitatory action at origins (~100 origins are activated by Fkh1 and Fkh2 while 80 are repressed; Fkh-excited and -repressed, respectively). As mentioned above, Cdc45 association at origins in G1-phase is predictive of their function. We demonstrate that in fkh1Δfkh2Δ cells this factor is depleted at Fkh-excited origins. Furthermore, we demonstrate that Fkh-excited origins are not found near the centromere (CEN); in contrast, Fkh-repressed origins include many origins that localize at the CEN. Finally, we determine that Fkh1 and Fkh2 likely have their action at origins by regulating the formation of long-range chromatin interactions. Furthermore, we show evidence suggesting that, to regulate these interactions, individual forkhead proteins bind at different origins and then dimerize to bring these origins together in the nucleus.

Chapter 1: Introduction

1.1 DNA Replication Overview

In order for cells to move from one cell cycle to the next their genetic blueprint (DNA) must be duplicated completely and accurately in a timely manner. Replication proteins load and initiate replication at a set of loci called replication origins that are distributed throughout the genome. As replication proceeds a subset of the proteins responsible for initiation move away from their respective origins in a bidirectional manner, unwinding and replicating the chromosome as they progress (replisome).
Eventually, the replication forks from neighboring origins converge, terminating replication of the region. This process occurs on a genomic template that is actively engaged in other important processes such as transcription and repair. In order for these tasks to proceed concurrently, coordination between their nuclear and chromatin environments is essential.

1.1.1 Sequence Specificity

In eukaryotes, origins vary in size and sequence specificity between organisms. In S. cerevisiae, origins were first identified as Autonomously Replicating Sequences (ARSs) that, when inserted into plasmids harboring selectable genetic markers and transformed into cells lacking those markers, allowed those cells to propagate under selective growth conditions [247]. The ARSs identified thus far are found predominantly in intergenic regions where they are protected from active transcription [175,282]. Sequence dissection of confirmed ARSs has revealed a T-rich ARS Consensus Sequence (ACS) that is required for their function [27, 175, 235]. This sequence serves as the main binding site for the Origin Recognition Complex (ORC), a six-subunit protein complex required for replication initiation [18, 48, 65, 141]. In addition to the ACS, several other surrounding loci have been found to mediate ARS activity. These elements are housed within the B domain, which constitutes the 100 bp downstream of the ACS [35]. Although the elements within the B domain are not conserved at all origins, they have been found in various combinations at each of the ARSs that have been studied in detail thus far [175]. Element B1 is found nearest the ACS and appears to aid in ORC binding. In addition, B1 has been implicated in the binding of additional replication factors, as some base substitutions within these loci significantly reduce ARS function without hindering ORC binding [201].
Upstream of B1 is element B2, which has been implicated in DNA unwinding, as it binds the single stranded DNA (ssDNA) binding protein RPA and is contained within a region that unwinds easily (DNA unwinding element; DUE) [158]. Further upstream of B1 is element B3, which is found only at a subset of ARSs. B3 binds the transcription factor Abf1 (ARS Binding Factor 1), whose presence near origins enhances their ability to initiate replication. Abf1 stimulates origin initiation from distances up to 1000 kbp; however, its mechanism for doing so remains unknown [270,271]. Finally, furthest upstream of the ACS lies the B4 element, which has been shown to increase origin activity through a yet unknown mechanism [94].

In S. pombe, ARS stability assays have failed to identify an ACS [39,44,54,121,228]. As in S. cerevisiae, all ARSs are found in intergenic regions. In fact, approximately 50% of S. pombe intergenic regions bind ORC and appear to function as origins [44]. The determining factor of whether or not such a region can act as an origin appears to be the distance between its flanking genes as well as the abundance of adenine (A) and thymidine (T) residues it contains (its AT content) [44].

As in S. pombe, neither ARS stability assays nor surveys of replication factor binding sites have succeeded in identifying a unique ACS in multicellular organisms [129,151,160]. In metazoans, only 30% of the sites at which ORC binds function as origins. These sites are, however, enriched for high AT-content and CpG islands (which are found at promoter regions of active genes) [129,151,160]. Of the functional replication origins that have been identified in higher eukaryotes two classes have emerged. The first class consists of specific sites of initiation (similar to S. cerevisiae origins) that produce a localized initiation event (e.g. the Drosophila chorion locus, the human lamin B2 origin and the human β-globin locus) [1, 124, 185].
The second class forms broad regions throughout which initiation events are dispersed (e.g. the Chinese hamster dihydrofolate reductase locus) [50].

1.1.2 Replication Protein Loading and Modification

Although the size and sequence specificity of origins vary significantly between organisms, the set of core proteins required for replication initiation and fork progression is highly conserved. ORC is the best characterized of such factors. In S. cerevisiae and D. melanogaster ORC appears to be bound at origins throughout the cell cycle [12,187,222,254]. In contrast, the ORC complex is cleared from the DNA during mitosis, only to be recruited again at early G1-phase in S. pombe, X. laevis and mammalian cells [32, 128, 172, 211, 256, 281]. ORC binding depends on the association of Orc1 and ATP [14,18,37,125]. Interestingly, ATP-hydrolysis is not required for ORC binding; in fact it is inhibited upon ORC binding DNA [37,125].

When ORC is bound to an origin at the end of mitosis, two proteins (Cdc6 and Cdt1) are recruited to that origin separately [40, 152, 254]. Like Orc1, Cdc6 binds ATP but does not require hydrolysis to bind. However, ATP-hydrolysis is required for its functional role in replication at the origin once bound [59, 193, 275]. The requirements for Cdt1 recruitment vary from one organism to the next. In Xenopus egg extracts only ORC is required for its association with origins, whereas in S. pombe both ORC and Cdc6 are required for binding [152,181]. Both Cdc6 and Cdt1 appear to act independently to recruit Mcm2-7 complexes to origins [40, 181]. The Mcm2-7 complex forms a donut-like hexameric structure that has ATP-hydrolysis-dependent helicase activity (ability to unwind double stranded DNA; dsDNA) [26, 38, 98, 119, 135, 224, 231, 278, 291]. This property, along with its essential role in initiation and fork movement, suggests that it acts at the replication fork to unwind DNA and provide single stranded templates for synthesis [12, 134].
The assembly of ORC, Cdc6, Cdt1 and the Mcm2-7 complex together is known as DNA licensing. As a group, bound to a replication origin, this set of proteins is referred to as a pre-replicative complex (pre-RC; reviewed in [17]).

As cells proceed through G1-phase the pre-RC matures into the pre-Initiation-Complex (pre-IC) with the recruitment of Cdc45, Sld3 and Dpb11 (reviewed in [49]). Cdc45 loads prior to DNA unwinding and is required for the loading/assembly of several polymerases onto the DNA template [10,11,164,165,272,296]. Cdc45 associates with several DNA polymerases, MCM and RPA (which acts to prevent rewinding of DNA back into dsDNA) [11, 113, 164, 219, 263]. These associations have led to the hypothesis that Cdc45 acts as a central core around which the replication fork complex is built. Sld3 associates with Cdc45 and this association is required for initiation [113]. Dpb11 is hypothesized to recruit DNA Pol ε to origins as they co-immunoprecipitate [157]. Furthermore, defects in Dpb11 cause hypersensitivity to DNA-damaging agents [13,273]. Two additional proteins that have been implicated in initiation control are Mcm10 and Sld2. Evidence suggests that Mcm10 is involved in both initiation and fork elongation as it shows genetic interactions with ORC, members of the Mcm2-7 complex, Cdc45 and DNA Pol [90,117,162]. Sld2 interacts with Dpb11 and is also required for DNA initiation [112,273].

The assembly and initiation of the pre-RC/IC is carefully controlled by two kinases, Dbf4-Dependent Kinase (DDK) (Cdc7-Dbf4) and Cyclin-Dependent Kinase (CDK) (reviewed in [17]). DDK acts at the origin upon entry into S-phase and its recruitment there requires ORC in S. cerevisiae and the Mcm2-7 complex in Xenopus egg extracts [104, 189]. Once associated with an origin, DDK helps to recruit Sld3, GINS and Cdc45 [286]. In vitro evidence suggests that DDK also phosphorylates DNA Pol and Cdc45 [84,183].
Furthermore, it has recently been demonstrated that another essential role for DDK is to relieve an inhibitory activity of Mcm4 through phosphorylation [233].

In yeast there is only one CDK that controls the cell cycle (Cdc28) whereas several CDKs function in multicellular organisms. CDKs drive S-phase when they associate with the appropriate S-phase specific cyclins. In S. cerevisiae, Cdc28 acts with two cyclins (Clb5 and Clb6). In X. laevis CDK association with origins is dependent on both ORC and Cdc6 [69]. The CDKs have been shown to act after Mcm2-7 loading but before Cdc45 binding, where they target ORC, Mcm2-7 and Cdc6 for phosphorylation [272, 295]. The CDK activity is essential for replication initiation; however, when the phosphorylations on ORC, Mcm2-7 and Cdc6 are prevented by site-directed mutagenesis, replication proceeds unhindered, indicating that a yet unidentified protein requires CDK phosphorylation for replication initiation to take place [52,58,103,145,178,191,265].

Although the primary role for CDKs at origins is to allow initiation and replication to proceed, a secondary and perhaps equally important role is preventing origins from initiating replication more than once in a given cell cycle (re-replication). Both initiation and prevention of re-replication require high levels of CDK in the nucleus [43, 47, 93, 194]. When CDK is depleted during G2/M-phases re-replication occurs [43,182,194]. CDKs inhibit re-replication by phosphorylating ORC, Mcm2-7 and Cdc6 [107, 178]. Both Orc2 and Orc6 are phosphorylated by CDK after the beginning stages of S-phase at sites that, when removed, promote re-replication [178]. Although the exact mechanism by which these modifications prevent replication remains elusive, some evidence (in Xenopus egg extracts) suggests that they cause a release of ORC from DNA [92, 212]. Furthermore, ORC is also bound by Clb5, which inhibits ORC from forming new pre-RCs [277].
CDK-mediated phosphorylation of the Mcm2-7 complex has varying outcomes across organisms, each of which appears to inhibit replication. In S. cerevisiae, CDK targeting of Mcm2-7 proteins results in their exportation from the nucleus [177]. In mammals, phosphorylation removes Mcm2-7 helicase activity [99]. Also, in X. laevis, CDK-mediated phosphorylation of Mcm4 correlates with its release from the chromosome [133]. Finally, like the Mcm2-7 complex in yeast, Cdc6, once targeted by CDKs, appears to be depleted from the nucleus either through degradation (in S. cerevisiae and S. pombe) or nuclear exportation (mammals) [52,68,106,218].

The initial melting of origin DNA is dependent on Cdc45 and RPA [272]. Furthermore, the DNA polymerases responsible for initial synthesis are likely recruited through interactions with Cdc45 or RPA [11,164,255,272,296]. The first nucleic acid synthesis is carried out by DNA Polymerase α (DNA Pol α). This polymerase synthesizes an RNA primer molecule from which DNA replication can proceed bidirectionally [95]. Although DNA Pol α performs the first nucleic acid synthesis, its loading requires DNA Polymerase ε (DNA Pol ε) at the origin [157,164]. Due to the anti-parallel nature of DNA (one strand runs 5′ → 3′ while the other runs 3′ → 5′) at a given replication fork, two mechanisms of synthesis must take place. As the helicase unwinds DNA, the strand that is 3′ → 5′ with respect to fork movement is replicated continuously (leading strand synthesis). In contrast, because template DNA can only be copied in the 3′ → 5′ direction, the second strand is replicated in a discontinuous manner (lagging strand synthesis) [95]. First, DNA Pol α synthesizes a short RNA fragment in the opposite direction of the fork. Then the remainder of the gap between this primer and the previously synthesized primer is filled by DNA Polymerase δ (DNA Pol δ) [95]. Once at the upstream RNA fragment, DNA Pol δ displaces the RNA to form a single stranded flap.
This flap is subsequently bound by RPA and then cleaved by Fen1 (possibly in coordination with the endonuclease Dna2). Following this, the resulting nick in the nascent strand is sealed by DNA ligase I [71]. DNA Pol α can only synthesize 30 nucleotides before it falls off the template strand. Furthermore, it lacks the 3'→5' exonuclease activity that other polymerases use to proofread newly replicated sequences [71]. Due to these constraints, after priming, a DNA-polymerase switch between DNA Pol α and a high-fidelity polymerase occurs. Evidence suggests that this switch is to DNA Pol ε on the leading strand and DNA Pol δ on the lagging strand [179]. To ensure high processivity of replication, a third complex (PCNA) is included at the replisome. This complex forms a ring-shaped structure that surrounds the template strand and binds DNA polymerase to act like a sliding clamp. PCNA is able to surround DNA when RFC hydrolyzes ATP to temporarily open the clamp and allow it to close around DNA [154]. Several processes that do not directly involve DNA synthesis must also occur for DNA replication to take place. Due to the helical structure of DNA, as the fork moves forward the chromosomal regions preceding it become supercoiled and the replicated sister chromatids behind it become catenated. To solve this problem, Topoisomerases I and II (TopI and TopII, respectively) relax these damaging structures with a series of cuts and ligations (TopI and TopII can relax supercoils but only TopII can remove precatenates) [147]. Furthermore, TopII has been implicated at replication termination sites where converging replication forks collide and form precatenates and catenates [61]. Also, to identify sister chromatid pairs for proper segregation after DNA synthesis, the two daughter DNA molecules remain bound to one another by the protein complex cohesin (Ssc1-4, Eco1, Smc1,3), DNA Polymerase , Ctf8,18, Dcc1, Pds5 and Rfc2-5 [232].
The cohesin complex has also been implicated in initiation; in Drosophila, ORC binding shows high correlation with cohesin localization [151].

1.2 S-phase Impediments

The DNA damage response pathways are an active area of research, but are not a focus of this thesis. However, endogenous impediments to replication are analyzed and many of the experiments are performed during an intra-S-phase block to replication, so each of these topics is outlined below. In the healthy cell, replication forks still need to contend with protein-DNA complexes that can be either relatively static (site-specific DNA binding proteins) or active (e.g. RNA polymerase). A prime example of this is the frequently transcribed rDNA locus on S. cerevisiae Chromosome XII. When forks reach the rDNA locus, replication fork pausing takes place (presumably to allow ongoing transcription to complete) and this pausing appears to be guided by Tof1 and Csm3 [31, 167]. Tof1 is also required for pausing at tRNA genes and CENs [88]. It co-immunoprecipitates with Mcm4, indicating that it may act to slow the helicase [173]. To traverse such loci after pausing, the cell has evolved proteins such as the DNA helicase Rrm3 and the kinase Mec1. Without Rrm3 the cell displays excess pausing, increased inter-chromosomal recombination and breakage at these sites [100, 260]. Mec1 also appears to play a role at sites of slow replication. When this protein is deleted, cells show a prolonged S-phase and increased breakage [36]. In addition to endogenous impediments, replication may be required in the presence of exogenous threats such as nucleotide depletion and UV damage. These events can be mimicked in the lab with hydroxyurea (HU) and methyl methanesulfonate (MMS). HU depletes the nucleotide pool and MMS methylates purines and can cause DNA breaks. To deal with these threats the cell has evolved the intra-S-phase checkpoint.
This pathway has two branches, the first of which deals with replication stress (such as the depletion of nucleotides by HU) and the second of which deals with DNA damage (as is encountered under exposure to UV or MMS). In both pathways there are sensors that detect genomic distress, adaptors that activate a phosphorylation cascade, and effectors that amplify the signal cascade. In S. cerevisiae, Mec1 and Tel1 play the major roles in sensing DNA damage or replication stress. In mec1 mutant cells, chromosomal abnormalities form in the absence of HU and MMS [36]. Mec1 responds to both replication stress and DNA damage, and Tel1 responds to dsDNA breaks and (to a lesser extent) DNA damage [102,264]. The Rad17-Mec3-Ddc1 complex is also required to initiate the damage checkpoint. This complex associates with ssDNA in an RPA-dependent manner [153]. The ssDNA specificity is likely because ssDNA forms when the replication fork encounters damaged DNA or is inhibited by decreased nucleotide pools [258]. When Mec1 is activated by DNA damage (MMS) it phosphorylates Rad9, and when it responds to replication stress (HU) it activates Mrc1 [4,60,186,266]. In both instances the Rad53 kinase is eventually activated [220, 249]. Mrc1 phosphorylation is required to prevent late origin firing in the presence of HU [4]. Furthermore, multiple lines of evidence indicate that Mrc1 is responsible for fork maintenance in the presence of HU [116,261]. Rad53, when activated, drives transcription of the DNA repair genes, inhibits late origin firing, stabilizes the replication fork and inhibits cell cycle progression [6,144, 221, 234, 240, 257, 274]. Furthermore, phosphorylated Rad53 can auto-phosphorylate other Rad53 molecules, leading to amplification of the checkpoint signal [250]. It is hypothesized that Rad53 prevents late origin firing in the presence of HU by phosphorylating Dbf4, which leads to the dissociation of DDK complexes from origins [56,276].
Interestingly, when cells are left in the presence of HU for long periods of time they undergo the same origin firing schedule as in normal conditions, but with an overall delay in replication (i.e. late origins do eventually fire), indicating that Rad53 acts at both early and late origins to delay initiation [7].

1.3 Replication Origin Timing

Although all functioning origins must follow the same protein-loading and modification schedule to initiate replication, each has a unique and conserved time during S-phase at which it shows the greatest propensity to fire. Some origins initiate replication promptly upon entry into S-phase while others are delayed in their initiation, which can lead to their passive replication by forks from neighboring origins [287]. However, given unlimited time (i.e. if passive replication is prevented) most origins eventually fire [122,223,269]. The reasons why cells have evolved to have origin-specific firing probabilities and firing schedules remain unclear. One explanation for this in metazoans (where a definitive link between replication timing and transcription activity has been demonstrated) is that replication is mechanistically connected with the establishment and inheritance of transcriptional states (reviewed in [78]). Under this theory, when replication occurs, distinct chromatin states (which dictate transcriptional activity) are themselves replicated in a timing-dependent manner, resulting in their inheritance by the newly replicated sister chromatids. A second hypothesis is that cells have evolved to ensure replication initiates most frequently from sites that minimize the probability of collisions between active replication and transcription complexes (discussed above and reviewed in [126]). These events can hinder both processes and cause genomic instability [197].
Under either assumption, understanding the mechanisms by which origin activities are regulated is an important step towards understanding functional and, subsequently, altered cell proliferation. It has been hypothesized that origin efficiencies and firing times are determined by the ability of individual origins to attract replication proteins from limited pools within the nucleus (we refer to this ability as propensity to fire, or PF; [205]). In this model, origins with the highest PF accumulate the majority of the limiting factors in early S-phase, allowing only these origins to fire. As the replicons associated with these early origins terminate, their limiting factor is released and free to be recruited to unfired origins. This allows less efficient origins that have not been passively replicated to fire under a later and more variable timing schedule. Which replication factors may act as limiting factors in such a system remains unknown, although several candidates have been proposed (Cdc7 and Mcm2-7 [190, 288]). Furthermore, we have yet to elucidate the exact mechanism by which origins display differential PFs.

1.4 Measuring DNA Replication and Other Genomic Events

1.4.1 Site Specific Replication Origin Function

The replication schedule is highly dependent on origin firing dynamics. In order to understand these dynamics one must first know where origins are located in the genome. A first assay to test for origin activity at specific loci is the ARS stability assay [247], in which discrete regions of the genome are placed in vectors harboring a centromere (CEN) and a selectable marker. Cells lacking the selectable marker, when transformed with those vectors, are only able to grow on selective media if the inserted genomic region contains an ARS. Thus, by strategically walking along the chromosome at sufficiently high resolution, each segment can be analyzed for activity by monitoring its corresponding transformants for growth on selective media [176,235,253].
A second assay that not only identifies sites of initiation but also indicates the proportion of cells in which the site initiates is two-dimensional agarose gel electrophoresis (2-D gel) analysis [66]. The speed with which a DNA molecule moves through an agarose gel by electrophoresis is influenced not only by its size, but also by its shape. The mass of a molecule is inversely related to the speed with which it moves. Circular molecules (or "bubbles") resulting from replication initiation move more slowly than linear molecules, as do branched replication fork structures. This property, in combination with strategic electrophoretic field application, allows various DNA structures to be discerned from each other when they are run in a gel and observed through fluorescent labeling/imaging. This allows one, for a given region of the genome, to calculate the ratio of linear:forked:circular (non-replicating:passively replicating:initiating) DNA. Thus, by comparing the circular DNA to the other structures one can determine 1) if the region undergoes replication initiation and 2) what the ratio of initiation to passive replication is. As described above, the loading of ORC, MCM and Cdc45 is required for any locus to act as an origin. Chromatin Immunoprecipitation (ChIP) of these proteins coupled with PCR allows one to probe whether these proteins are enriched at any given site [241,242]. In ChIP, proteins are fixed to the DNA in vivo (typically by reversibly cross-linking them with formaldehyde). After chromatin isolation, DNA-protein complexes of interest are immunoprecipitated with appropriate antibodies and de-cross-linked, and the DNA is then purified. The isolated DNA from these experiments can then be quantified by PCR to determine its abundance relative to genomic DNA and a negative control (a known genomic region without origin activity).
This has been performed in asynchronous cells (ORC), in cells synchronized in G1-phase (MCM) and at specific time intervals leading up to and through S-phase to determine the protein loading schedules (ORC, MCM and Cdc45) [12,281]. The thymidine analog 5-bromo-2'-deoxyuridine (BrdU), when available to the replication machinery, is readily inserted into replicating DNA at sites where thymidine is normally incorporated. Antibodies specific for BrdU allow one to isolate, by immunoprecipitation (BrdU-IP), BrdU-incorporated regions of the genome. When cells are synchronously released into S-phase in the presence of BrdU, the replication schedule of any locus can then be analyzed by PCR of BrdU-IP material [252]. This is typically performed by harvesting subcultures at various times throughout S-phase. When replication origin activities are being analyzed, cells may be released into S-phase in the presence of BrdU and HU to prevent fork invasion (from neighboring origins) at the origin of interest. Under these conditions, BrdU-incorporated DNA exclusively reflects initiation events at the origin. With the realization that the spatial organization of the genome is important for replication activity, a third class of methods is growing increasingly important. These methods harness imaging technologies to view specific regions of the genome in live cells or on combed DNA fibers that have been extracted from live cells [123, 262]. In methods that image live cells, specific loci of interest are genetically modified to include repeats of binding elements for specific fluorescently labeled proteins. When the cell is examined by fluorescence microscopy, the bound loci fluoresce and their location in the nucleus with respect to other engineered loci can be ascertained. These experiments were valuable in demonstrating the existence of replication factories in yeast cells [123].
1.4.2 Genome-wide Replication Analysis

The advent of tiling microarrays and high-throughput sequencing has allowed several of the origin-specific assays described above to be adapted for whole-genome analysis. The first of these assays was ChIP on tiling arrays (ChIP-chip; this has subsequently been extended to include the sequencing of isolated DNA as well, so-called ChIP-seq) [204, 208]. In such data the sites of high signal signify likely sites of protein enrichment. This method has been applied to multiple organisms to identify putative origins through their binding of initiation proteins (ORC and MCM) [57,151,282,283]. It has also been employed to identify chromatin marks as well as the sites of action for multiple chromatin modifiers [196,207]. A second class of techniques uses the physical structure of replicating DNA to identify origins. In the first, DNA from proliferating cells is denatured, which releases ssDNA from around origins that have initiated synthesis and those that have not. When these strands are separated by size, only DNA from origins that have fired will be in the lowest size fraction. This DNA can then be analyzed by microarrays or sequenced to identify putative origin sites [30]. In a second strategy, DNA is fragmented gently and then placed in an unsolidified agarose gel. After the agarose solidifies it forms a matrix that traps the bubbles of DNA corresponding to the fired origins. When an electrophoretic field is applied to the gel, linear or branched, but not circular, DNA moves away from the initial site in the gel [163]. The molecules at this site can then be isolated by gel purification and analyzed by array hybridization or DNA sequencing to identify putative origins (reviewed in [75]). Simply comparing replicated regions with non-replicated regions can also identify origins.
The simplest form of this analysis involves synchronously releasing cells into S-phase and then harvesting their genomic DNA after a short time or in the presence of HU (to ensure only DNA around origins is replicated) [85]. When this genomic DNA is labeled and hybridized to tiling arrays or sequenced, the problem is simply to identify regions of increased copy number. To make this identification easier, a second option is to analyze replicated DNA after it has been isolated by BrdU-IP (as described above) with either high-throughput technique [116, 267]. With this technique enriched signal is much simpler to detect, as background signal should theoretically be zero, while in the copy number approach the ratio of enriched to non-enriched signal is theoretically 2:1. In the first study to determine the genome-wide replication timing schedule of yeast, cells were grown in heavy isotopes to label DNA, arrested in G1-phase and then released into S-phase. When released, the cells were transferred to media containing light isotopes and harvested at multiple time-points, and the replicated DNA was separated from unreplicated DNA based on its density (heavy-heavy or HH vs. heavy-light or HL). Following this, the ratio of HL (replicated DNA) to HH (unreplicated DNA) was determined for each genomic region at each time-point using microarrays. In this analysis, regions whose HL signal was high in arrays corresponding to early time points are early replicating, while regions that show no HL signal until later time points are replicated later. Using computational analysis and several reference positions (whose replication timing schedule was known) the timing schedule for the yeast genome was then approximated genome-wide [200]. The copy number method and BrdU-IP method have also been used to track DNA replication through all of S-phase to determine the timing schedules of individual origins or replication domains, and also to track fork movement along the chromosome (compared in [87]).
In the case of BrdU, this can be performed in the presence of constant BrdU, which results in a cumulative view of replication as time proceeds, or it can be performed by pulsing individual cell cultures for short time intervals. When the subcultures are analyzed, each shows enrichment only where DNA replication occurred during its corresponding pulse interval [251]. An alternative to this is to perform ChIP on a replication fork protein at various time points throughout S-phase [229]. With either method, individual analysis of the resultant samples allows one to track fork movement accurately through time. Unfortunately, quantifying the amount of replication at each time interval is difficult with the latter method.

1.4.3 Chromatin Analysis

DNA does not exist in a bare state within the nucleus. It is wrapped around protein complexes composed of histones to form nucleosomes and packaged further into various folds and loops within the nucleus [202]. In general, regions of tightly packaged chromatin are silent and regions of looser chromatin are active [280]. Furthermore, various covalent modifications are made on the histones, which are associated with specific genomic tasks (reviewed in [127]). The study of chromatin structure and modifications has received great attention, mainly due to several genome-wide tools that have allowed study of these structures and modifications as they pertain to replication. One such assay is the nucleosome positioning assay. Here chromatin is cross-linked and digested with Micrococcal Nuclease (MNase). DNA that is wound around histones is protected from MNase and remains undigested, while nucleosome-free DNA and linker regions between nucleosomes are digested. The undigested DNA is purified and analyzed with arrays or sequencing to determine the positions of nucleosomes throughout the genome, and hence also the average occupancy of nucleosomes at various genomic regions [64,138].
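As a rough illustration of how nucleosome occupancy might be derived from sequenced MNase-protected fragments, the following Python sketch counts, for each base, how many fragments cover it, and tallies fragment midpoints as approximate nucleosome centers. The function names, the half-open coordinate convention and the toy input are assumptions made for this example only; they are not taken from the cited studies.

```python
def occupancy(fragments, chrom_length):
    """Per-base count of protected fragments covering each position.

    fragments: iterable of (start, end) pairs, 0-based and half-open
    (an assumed convention for this sketch).
    """
    diff = [0] * (chrom_length + 1)
    for start, end in fragments:
        diff[start] += 1   # coverage rises where a fragment begins
        diff[end] -= 1     # and falls just past its last base
    coverage, running = [], 0
    for delta in diff[:chrom_length]:
        running += delta
        coverage.append(running)
    return coverage


def midpoint_counts(fragments, chrom_length):
    """Number of fragments whose midpoint falls at each position."""
    counts = [0] * chrom_length
    for start, end in fragments:
        counts[(start + end) // 2] += 1
    return counts


if __name__ == "__main__":
    toy_fragments = [(0, 4), (2, 6)]  # hypothetical, far shorter than real fragments
    print(occupancy(toy_fragments, 8))
    print(midpoint_counts(toy_fragments, 8))
```

Regions with persistently low coverage in such a profile would be candidates for nucleosome-free regions, subject to normalization for sequencing depth and MNase sequence bias, which this sketch ignores.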
A second class of chromatin evaluation methods involves mapping long-range interactions. The first of these to be developed was Chromatin Conformation Capture (3C) [46]. In its simplest form, chromatin is cross-linked so that contacts between loci engaged in long-range in trans interactions are fixed. Following fixation, digestion of the DNA and a subsequent intramolecular ligation of cross-linked DNA molecules are performed at low concentration to form ligated structures joining the interacting genomic loci. The low concentration promotes intramolecular ligation of cross-linked loci, while disfavoring intermolecular ligation of uncross-linked genomic loci. In 3C, both regions for which an interaction is being probed must be known in advance. Primers are designed such that they produce a product if the two regions of interest are ligated together. Thus, the existence of a product in such assays represents a positive long-range chromatin interaction. The 3C method has been modified to allow global analysis of loci interacting with a locus of interest (the bait), and is termed Chromatin Conformation Capture on Chip (4C) [236]. Here, the primer design allows the amplification of all molecules ligated to the bait. These amplified materials can then be analyzed for enrichment across the genome to determine which sites were in contact with the initial bait locus. A further extension of the 3C/4C methods allows for the analysis of the global set of long-range chromatin contacts, not limited to a single bait locus [142]. In this method (Hi-C), the ligation step is altered to ensure that ligated structures contain biotin molecules and universal primers. Using this criterion, these structures are isolated and sequenced with paired-end sequencing. In the resulting data, a read pair that has one region of the genome at one end and a second region of the genome at the other end would represent the presence of a ligated structure containing the two regions.
If a statistically enriched number of such reads is present in the data, then an interaction between the two loci can be inferred.

1.5 Modeling DNA Replication

Mathematical modeling of replication allows one to test hypotheses regarding the kinetics that dictate a given organism's replication schedule. Models have been developed for S. cerevisiae, S. pombe, X. laevis and mammalian cells. Although the complexity and accuracy of these models are increasing, they have yet to fully explain the replication process at a molecular level. One system that has been analyzed thoroughly through such means is the embryonic Xenopus cell. In this system, S-phase completes within 20 minutes and imaging methods have revealed that this is because the rate of origin firing (initiations per unit of time) increases as S-phase progresses, but then slows near the end of S-phase [24,76,86,146,155]. Two separate groups have modeled DNA combing results in this system to determine why there is an increased rate of origin firing as S-phase progresses. In one model a limiting replication factor randomly distributes to origins and causes firing with a specified probability when it encounters an origin [76]. To fit the model accurately to imaging data, the amount of limiting factor in the system and the probability with which an origin fires upon encountering the factor had to be increased as S-phase progressed. In a second model, after associating with DNA, initiation factors search linearly along the strands until they find origins (at which time those origins fire [72]). To fit this model to the experimental results, the amount of initiation factor in the system had to be increased as S-phase progressed. The S. pombe replication schedule is also approximately 20 minutes long. In this yeast, origins show a variety of relatively conserved replication firing propensities [85].
The most accurate model of this system has origins firing stochastically based on exponential firing times (with the mean of each origin's firing time being proportional to its peak height in HU copy number experiments; [148]). In order for this model to complete replication in a timely manner, the total sum of firing propensities had to be kept constant throughout S-phase. This meant that as S-phase progressed the propensity of unfired origins to fire increased. Of all organisms discussed thus far, the S. cerevisiae timing schedule appears to be most conserved from one cell cycle to the next. The simplest model of this system is deterministic, where origins fire at their experimentally defined initiation times and forks travel at a rate of 3 kbp/min [243]. This simple model showed that the replication-timing schedule could be partially explained by replication origin firing schedules. However, on more than half of all chromosomes the simulated data correlated poorly with experimental results. Furthermore, the model did little to explain the causes of the well-defined firing times of S. cerevisiae origins. The first stochastic model of S. cerevisiae replication assigned each origin a Gaussian firing time density in which the means for each origin were defined by plasmid maintenance assays and the fork rate was set constant at 1.6 kbp/min [45, 67]. The variance of these distributions was set at various levels during simulations, and the simulations were repeated many times and averaged to compare their behavior to experimental data [7]. The model failed to recapitulate origin-firing efficiencies when a single variance was used for all origins and also when each origin was assigned a variance proportional to its mean firing time. Only when variances for each origin were estimated using a genetic algorithm did the model recapitulate experimental observations. These fitted variances showed a general increase with activation time.
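The Gaussian firing-time idea can be sketched in a few lines of Python. This is only an illustrative toy under stated assumptions, not the published implementation: origins are hypothetical dictionaries with a position in kbp, a mean firing time and a standard deviation in minutes; forks move at a constant rate; and a position is replicated by whichever fork reaches it first. An origin is counted as firing actively in a simulated cell only if no fork from another origin arrives before its own firing time.

```python
import random


def replication_times(origins, firing_times, positions, fork_rate):
    """Earliest time each position (kbp) is reached by a fork from any origin."""
    return [
        min(t + abs(x - o["pos"]) / fork_rate for o, t in zip(origins, firing_times))
        for x in positions
    ]


def simulate(origins, positions, fork_rate=1.6, n_cells=1000, seed=0):
    """Average replication time per position and active-firing fraction per origin.

    Each origin dict holds hypothetical keys: "pos" (kbp), "mean" and "sd" (min).
    """
    rng = random.Random(seed)
    mean_rep = [0.0] * len(positions)
    active = [0] * len(origins)
    origin_pos = [o["pos"] for o in origins]
    for _ in range(n_cells):
        # Draw one Gaussian firing time per origin, truncated at the start of S-phase.
        times = [max(0.0, rng.gauss(o["mean"], o["sd"])) for o in origins]
        for i, r in enumerate(replication_times(origins, times, positions, fork_rate)):
            mean_rep[i] += r / n_cells
        # An origin fires actively only if no other fork reached it first.
        at_origin = replication_times(origins, times, origin_pos, fork_rate)
        for i, t in enumerate(times):
            if t <= at_origin[i]:
                active[i] += 1
    return mean_rep, [a / n_cells for a in active]


if __name__ == "__main__":
    toy_origins = [  # hypothetical values, not measured data
        {"pos": 0.0, "mean": 10.0, "sd": 3.0},
        {"pos": 16.0, "mean": 30.0, "sd": 8.0},
    ]
    profile, efficiency = simulate(toy_origins, [0.0, 8.0, 16.0])
    print(profile, efficiency)
```

Setting every standard deviation to zero reduces the sketch to the deterministic case described above, in which each origin fires at a fixed time. Fitting per-origin variances (for example with a genetic algorithm, as in the cited work) is outside the scope of this example.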
A second set of analytical models was subsequently proposed [205]. The first of these was similar to the one described above, in that it contained for each origin a location, the time at which the origin had fired in half of the cell population and a third parameter that was analogous to the variance of the origin's firing time (except that here it was defined as the length of time between that origin firing in a quarter and in three-quarters of cells; [288]). In a second model the latter two parameters were replaced by a single value (which the authors claim represents the amount of limiting factor bound to a given origin) and the origins were fit with individual firing time densities based on the sigmoid function [288]. After fitting the parameters and then simulating replication profiles, the authors were able to reproduce the experimental data well.

1.6 Factors Affecting Replication Origin Firing

1.6.1 Chromatin Structure

Local chromatin environment has been most closely linked with origin selection and timing [9,188,216]. Chromatin structure influences the dynamics of genomic events by regulating the availability of individual loci to process-specific proteins. The most basic building block of chromatin is the nucleosome, which constitutes 147 bp of DNA wrapped around a histone octamer (containing two copies each of histones H2A, H2B, H3 and H4). When higher order chromatin packaging is ignored, these structures appear as 10 nm wide beads on a string, where beads (nucleosomes) are interspersed by linker (nucleosome-free) DNA. This level of packaging is the most accessible to functional protein complexes (e.g. replication and transcription factors). The next level of packaging involves the condensation of 10 nm fibers into more dense 30 nm fibers by Histone H1. DNA that is packaged loosely is referred to as euchromatin while that which is packaged tightly is referred to as heterochromatin. Generally, euchromatin is transcriptionally active while heterochromatin is silent (reviewed in [280]).
In the nucleus, even higher order packaging results when these fibers are folded into three-dimensional (3-D) structures and domains [202]. Thus, within the nucleus, genomic features that are linearly distal may, in actuality, be proximal and may also influence each other's chromatin environment. An initial hypothesis for why ORC binds only 600 of the 2000 ACSs in the S. cerevisiae genome is that local chromatin structure (specifically nucleosome positioning) impedes binding at some sites. The first studies aimed at determining whether this was the case focused on ARS1 (an early, efficient origin). At ARS1, well-positioned nucleosomes were found on both sides of the ACS (which was nucleosome-free) both on a plasmid and in the chromosome [143, 259]. Furthermore, experimentally forced nucleosome encroachment into the ACS resulted in a loss of ORC binding [237]. Interestingly, displacement of the nucleosome flanking the ACS also disrupted pre-RC assembly and origin activity, indicating that, in addition to the Nucleosome-Free Region (NFR), the precise positioning of nucleosomes around the ACS is important for origin function [143]. Genome-wide nucleosome mapping with the MNase assay further confirmed this observation: ARS regions are in general devoid of nucleosomes, and precise analysis of the ACSs within these ARSs revealed that these loci show a conserved nucleosome pattern with well-defined nucleosomes flanking each ACS [3,19,57,227]. The NFRs surrounding ACSs appear to be (at least partially) defined by sequence. Nucleosomes reconstituted on bare DNA in the absence of trans-acting factors (e.g. pre-RCs or ICs) show an NFR around the ACS [114]. Furthermore, ACSs that are not functional origins are more likely to contain nucleosomes, indicating that nucleosome positioning is a likely determinant of origin function. In higher eukaryotes such as Drosophila and Chinese hamster egg cells, ORC is also found to localize to NFRs.
ORC appears to influence the precision with which nucleosomes are positioned around the ACSs. Nucleosomes in the absence of trans-acting factors (ORC) do not show the specific, well-defined positions that they do in the presence of ORC [57]. Furthermore, Orc1 contains a bromo-adjacent homology domain (BAH domain), which in other proteins has been shown to interact with nucleosomes and silence chromatin [55,184,293]. In S. cerevisiae, removal of this domain causes disruption of ORC binding at a subset of origins and subsequent loss of precision in their surrounding nucleosome pattern [170]. Furthermore, nucleosomes surrounding ORC binding sites undergo active nucleosome turnover and exchange (although this may facilitate ORC binding rather than be caused by it; [115,214]). Thus, in terms of nucleosome occupancy (at least in S. cerevisiae) origins undergo two stages. In Stage I, an NFR is established based mainly on the ACS sequence composition. This NFR permits ORC binding, which then allows entry into Stage II, where ORC facilitates the precise positioning of nucleosomes around the ACS. On a larger scale, as expected, origins are more active in euchromatin than they are in heterochromatin [91, 149, 246]. The organization of origins within the nucleus has also been implicated in origin selection and timing control. The presence of 1000 discrete foci (replication factories) representing clusters of 10-100 replicons has been observed in mammals, indicating that origins cluster with one another in the nucleus [20, 101, 150, 171]. Within these factories are proteins belonging to the pre-RC complex as well as several chromatin remodelers [41]. Some of these factors stably associate with the replication factories (e.g. PCNA), while others show dynamic contacts [244].
In yeast, the earliest origins cluster with one another in G1-phase, indicating that the spatial organization of origins within the nucleus is important in determining their firing schedules [53]. On a larger scale, in mammalian cells, the early and late replicating domains are housed in separate sub-compartments within the nucleus [216]. The factors responsible for this clustering have not yet been identified, although cohesin has been implicated in their formation in mammals [81]. Furthermore, the mechanisms by which clustering benefits origins in terms of initiation time and probability remain elusive.

1.6.2 Chromatin Modifications

Density and accessibility of the DNA within chromatin can be influenced by covalent modifications to residues within the conserved histone tails. Individual modifications have also been implicated in specific genomic tasks including transcription and DNA repair. Modifications include acetylation, methylation, ubiquitination, sumoylation, phosphorylation, citrullination and ribosylation (reviewed in [127]). Acetylation is the most extensively studied histone modification. Protein complexes that directly cause acetylation are referred to as histone acetyltransferases (HATs) and factors that remove acetyl groups are called histone deacetylases (HDACs). Acetylated histones are usually associated with an open and active chromatin state (with the exception of lysine 12 in histone H4, for which there is evidence of silencing capabilities). Conversely, deacetylated histones are associated with dense, silent chromatin (reviewed in [127]). Thus, HATs are associated with activation and HDACs are associated with silencing. Acetylation is thought to reduce chromatin density by neutralizing the basic charge of the histone tail, thus reducing the histone's affinity for DNA (reviewed in [279]). Histone acetylation is the class of modification that has been shown to be most correlated with replication initiation.
In yeast, when Cdc6 is conditionally knocked out (hindering pre-RC loading), deletion of the HDAC Sir2 rescues approximately 20% of origins [42]. This rescue appears to be independent of Sir2's action as a silencer, because deletion of Sir3 and Sir4 has no effect on these origins. It seems that Sir2 directly inhibits the loading of the pre-RC by negatively regulating H4K14 acetylation, as mutations that mimic this modification also rescue origins from Cdc6 knockdown. The Sin3-Rpd3 histone deacetylase complex, known for its role as a gene-specific transcriptional repressor, also regulates initiation timing. In S. cerevisiae, Rpd3 deletion causes significantly earlier initiation of some non-telomeric, late-firing origins, along with increased acetylation of the histones flanking these origins [9,268]. The observation that targeting a histone acetylase adjacent to a late-firing origin advances its time of initiation supports the idea that local acetylation at origins influences their initiation probabilities [79,268].

As expected, in contrast to the HDACs Sir2 and Rpd3, HATs activate origins. The histone acetylase binding to ORC (Hbo1) protein is recruited to origins via its interaction with Cdt1 and, once there, has been shown to interact with Orc1 and Mcm2 [29,97,166]. In Xenopus extracts Hbo1 is required for pre-RC formation, and in yeast its deletion results in a delayed S-phase [51,96]. Furthermore, tethering Hbo1 to the Drosophila chorion locus results in increased origin activity [2]. The exact mechanism by which Hbo1 has its effect is in dispute, however, as it has been shown to acetylate Orc2, Mcm2 and Cdc6 in vitro, leaving open the possibility that Hbo1's direct action on these proteins could be responsible for their action at origins [96].

Methylation, unlike acetylation, is most notably associated with silencing (reviewed in [127]).
In many of the systems studied, it is not the initial methylation by a histone methyltransferase (HMT) that causes the silencing but a downstream alteration brought about by methylation-dependent recruitment of other modifiers (e.g. HDACs [127]). Several lines of evidence have also implicated methylation marks in origin selection/timing. The ENCODE project identified H3K4me2 and H3K4me3 as being positively correlated with the timing of large human replication domains and also found H3K27me3 to be negatively correlated [22]. In a separate analysis, Ryba et al. determined that enrichment for H3K4me1,2,3, H3K9Ac, H3K27Ac, H3K36me3 and H3K27me3 was correlated with replication timing, but did not find evidence for H3K27me3 repression [216].

1.6.3 Transcription

The coordination between transcription and replication is perhaps most evident when examining the positions of active origins with respect to genes. In yeast, almost all origins reside in intergenic regions, where they are protected from the transcriptional apparatus [175, 282]. Functional origins that are contained within inactive genes are made dysfunctional when those genes are activated [168]. This might initially suggest a negative correlation between the two processes, but this is not the case. In metazoans, late-replicating regions of the genome are associated with transcriptionally silent genes, and actively transcribed genes are typically replicated in very early S-phase (reviewed in [225]). This correlation has not, however, been observed in yeast, and exceptions have been identified in Drosophila [226]. In Drosophila, over two-thirds of ORC binding sites are found near transcription start sites [151]. Also, recent studies indicate that the majority of functional origins in mouse and human cells co-localize with the CpG islands of active genes [30, 230]. Finally, in metazoans, multiple transcription factors (e.g.
Myc, E2F1, Rb and Myb) have been implicated in ORC and pre-RC assembly, suggesting that some of these factors may specify sites at which origins are to assemble.

The co-localization of origins with genes and the observed correlations in their activities could be a consequence of chromatin structure. The regions upstream of the promoters of active genes are typically nucleosome-free, so origin localization at these sites could simply be due to the availability of an open chromatin state [105,227,292]. Also, a genome-wide analysis of Myc, E2F1, Rb and Myb binding determined that these factors show high overlap in their localization and that ORC binding within these sites of overlap was more probable when more transcription factors were found to bind there, indicating that the accessibility of the site for protein binding, not necessarily the presence of specific factors, was important for origin selection [213]. Finally, promoter regions typically contain specific epigenetic marks such as H3K4me2 and H3K9Ac (which have been linked to active origins), leaving open the possibility that a shared preference for these modifications leads to the observed co-localization of origins and genes [127]. Regardless, these pieces of evidence together support the notion that transcription and replication are both activated by similar chromatin environments.

1.7 Chapter Summary and Thesis Outline

Chromatin structure influences the dynamics of genomic events by regulating the availability of pertinent loci to process-specific proteins. Modifying chromatin structure to accommodate one genomic task inevitably alters the kinetics of proximal processes. In eukaryotes, DNA replication occurs on a chromatin template that is actively engaged in DNA transcription, defense and repair. Replication initiates at sites called origins, which are distributed throughout the genome in primarily intergenic regions, where they are most protected from transcriptional interruption. However, in the tightly packed S.
cerevisiae genome, intergenic regions are small (536 bp), so their chromatin environment is likely influenced by flanking genic activity. Furthermore, due to 3-D chromatin packaging, genomic features that are linearly distal to origins can, in actuality, be proximal in the nucleus and can thus also influence their chromatin environment. For DNA replication to occur accurately and concurrently with these fundamental genomic processes, coordination between their structural requirements is essential.

The work described in this document can be divided into three areas. In the first, we focus on measuring DNA replication dynamics. In particular, we demonstrate that many of the current methods for analyzing genome-wide genetic datasets are inappropriate for datasets aimed at capturing replication activity. We introduce several methods that aid in this analysis and then apply these methods to capture replication dynamics in yeast cells with high fidelity. In the second research area we attempt to model these dynamics. Specifically, we develop a molecular model for how replication timing of the genome is defined by upstream events. We then implement the model computationally and demonstrate that it can reproduce these timing schedules. In the model, origins are assigned a propensity to attract important replication proteins; in the third area of research we attempt to determine the molecular kinetics that define these propensities. Specifically, we analyze transcription, chromatin modifications and chromatin structure in WT cells to draw correlations between replication and these concurrent processes. We then analyze replication in cells where an important regulator of histone acetylation (Rpd3) is removed and determine that it represses many origins from firing. Finally, we identify two new replication factors, Fkh1 and Fkh2, that broadly regulate origin firing.
We determine that these proteins act to regulate non-CEN-proximal origins by promoting long-range interactions between them. We also provide evidence that they promote these interactions by binding at separate origins and dimerizing to bring the origins together in the nucleus. Taken as a whole, our research in these areas promotes the idea that chromatin structure at origins defines their ability to replicate by allowing them preferential access to rate-limiting replication proteins.

Chapter 2: Strategies for Analyzing BrdU-IP-chip Datasets

Chapter Disclosure: This work has been published in Knott, S.R.V., Viggiani, C.J., Aparicio, O.M. and Tavaré, S. Strategies for analyzing highly enriched IP-chip datasets. BMC Bioinformatics, 10:305, doi:10.1186/1471-2105-10-305, 2009. Christopher J. Viggiani performed all BrdU-IP-chip experiments.

2.1 Background

Chromatin immunoprecipitation on tiling array (ChIP-chip) studies attempt to identify genomic features such as protein binding [83,137] or histone modification/occupancy [196, 210]. In the former, the regions of interest are generally small, resulting in a low proportion of enriched probes, and the data can be considered to come from one of two distributions, enriched or non-enriched. In contrast, the regions analyzed in the latter studies are generally large and can have multiple levels of enrichment within and between them, making their analysis more difficult. BrdU-IP-chip datasets have characteristics that are similar to those of histone modification/occupancy experiments. While computational tools have been developed to address the analytical issues associated with mRNA-chip and protein binding ChIP-chip studies, the highly enriched IP-chip datasets described above pose unique problems requiring new investigative strategies.

Analyses of BrdU-IP-chip experiments aim to distinguish true biological signals (DNA replication activity) from array noise and to examine those signals for magnitude and associated genomic features.
Microarray datasets (specifically from two-color platforms) typically contain errors resulting from sample handling, preferential amplification and labeling bias, making this task difficult. In attempts to correct for this, several ChIP-chip studies have incorporated mock controls into their experimental design [5, 196]. Under this protocol, for each experiment a mock sample (DNA acquired with a non-specific antibody or no antibody at all) is hybridized against the same total DNA as the experimental sample. Following array quantification, true positive signals are identified as those that are significantly higher in the experimental data than in the mock data. Recently, it has been shown that without these controls the false positive rate can be high [196]. Unfortunately, the use of these controls significantly increases the cost of each experiment and, furthermore, the strategy fails to address issues pertinent to studies aimed at comparing the magnitude of signals across different experimental conditions.

Computational alternatives to the use of mock controls have been developed to work with two-color array data. These typically involve a within-array normalization step aimed at eliminating intensity bias (where $M = \log_2(\mathrm{IP}/\mathrm{Total})$ values show dependence on their corresponding $A = (\log_2(\mathrm{IP}) + \log_2(\mathrm{Total}))/2$ values) and can be followed by a between-array normalization step to remove location and scale variation across multiple experiments [25, 239, 289, 290]. Simple loess normalization is usually used in mRNA-chip studies for within-array normalization, based on the assumption that the M-values should follow a symmetric distribution [239, 289, 290]. Briefly, probes are plotted in the MA plane and a loess curve is fitted to the data. To remove the intensity bias, the resultant curve is then subtracted from the probe M-values.
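As an illustration of the M/A definitions and the simple loess step just described, the following is a minimal sketch only. The function names, the `span` parameter and the use of a plain local linear fit with tricube weights (no robustness iterations) are assumptions made for this example; they are not taken from the software used in this work, and the quadratic running time makes it suitable for small inputs only.

```python
import math

def ma_values(ip, total):
    """Return per-probe (M, A) lists from IP and Total channel intensities."""
    m = [math.log2(i / t) for i, t in zip(ip, total)]
    a = [(math.log2(i) + math.log2(t)) / 2 for i, t in zip(ip, total)]
    return m, a

def loess_fit(x, y, span=0.4):
    """Local linear regression with tricube weights (hypothetical helper)."""
    n = len(x)
    k = max(2, int(math.ceil(span * n)))  # neighbours used in each local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        dists = sorted(abs(xi - x0) for xi in x)
        h = dists[k - 1] or 1e-12
        w = [(1 - min(abs(xi - x0) / h, 1.0) ** 3) ** 3 for xi in x]
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(ybar + slope * (x0 - xbar))
    return fitted

def loess_normalize(m, a, span=0.4):
    """Subtract the fitted curve from the M-values to remove intensity bias."""
    curve = loess_fit(a, m, span)
    return [mi - ci for mi, ci in zip(m, curve)]
```

Because every probe contributes to the fitted curve, a large fraction of enriched probes will pull the curve upward, which is the weakness of this simple scheme on highly enriched data.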
While mRNA-chip M-values typically follow a symmetric distribution, array studies involving chromatin immunoprecipitation are often associated with asymmetric empirical M-distributions [192]. To remove the intensity bias in ChIP-chip data, Peng et al. [192] proposed a two-step process in which an initial data transformation is performed under the assumption that chromosomally neighboring probes should have minimal difference in their M-values (with the exception of probes bordering bound and unbound regions). Probes are first plotted in the $\Delta M$ vs. $\Delta A$ plane, where the $\Delta M$ and $\Delta A$ values are the differences between the M- and A-values of neighboring probes, respectively. Under their assumption, when plotted in this plane probe data should have a slope equal to zero. With this in mind, the line of best fit to the probes in this plane is taken as the x-axis for a modified MA plane into which the probes are transformed; we refer to this line as the rotation line. Following this, a modified loess normalization step is performed in which the loess curve is fitted to data points within two standard deviations of the median.

If comparisons are to be made across experiments after within-array normalization, between-array normalization is typically applied to remove differences between the empirical M-distributions of the arrays that are not attributable to true biological variation. For ChIP-chip data, Yang et al. [290] proposed scale normalizing by a value proportional to the median absolute deviation (MAD). Others have proposed quantile normalization [25, 289], which forces the M-values of all experiments to follow the same empirical distribution.

Here we demonstrate that current methods for normalizing ChIP-chip datasets may be unsuitable for BrdU-IP-chip experiments, and we describe a novel algorithm for within-array normalization that is robust to the nuances of protein binding and histone modification/occupancy ChIP-chip and BrdU-IP-chip datasets.
For each experiment, the algorithm identifies a subset of putative background probes and uses it to transform the data onto a plane where the intensity bias of the dataset is low. We then employ these subsets in between-array normalization and peak identification strategies to prepare the data for downstream analysis.

We illustrate the strategies proposed here on four replicate wild-type (WT) and four replicate mutant S. cerevisiae datasets. The mutants are rpd3Δ cells (these strains are analyzed in further detail in Chapter V). All datasets were produced from DNA harvested from cells one hour after release from α-factor into HU. The well-studied replication landscape of WT S. cerevisiae cells in HU, and the known subset of origins whose replication activity is altered in rpd3Δ cells, allow us to test the signal identification and quantification capabilities of our methods in the context of cross-experiment analysis.

2.2 Within-Array Normalization

To remove the intensity bias present in the BrdU-IP-chip data (Figure 2.1A) we first attempted simple loess normalization with default parameter settings. Figures 2.1A and 2.1B show the result of this normalization on the "cleanest" (as measured by autocorrelation of probe M-values along the genome; cf. [130]) WT dataset. Under the assumption that in the presence of HU earlier and more efficient origins fire in a higher percentage of cells than do later, less efficient origins, we expect that the amount of IP DNA, and thus the M-values, associated with active origins will have larger magnitudes than those associated with less active origins.

Figure 2.1: (A) The density of all WT probes on the MA plane (red) before normalization (probes within ARS1 are denoted with green dots). During loess normalization a loess curve is fitted to the probes in this plane. (B) Probes on the MA plane after the loess curve has been subtracted from their M-values. Note that the M-values of ARS1 probes have been pulled towards 0.
The green points on the MA plots signify probes within ARS1 (an origin that fires early and efficiently in HU [221]), and these can be used as a measure of the normalization procedure's performance. Due to the high percentage of BrdU-enriched probes, the loess curve is pulled away from the background probe set (non-BrdU-enriched probes) during fitting. As a result, when these curves are used for normalization they artificially lower the M-values of some significantly BrdU-enriched probes (e.g. probes within ARS1).

Next we applied the two-step within-array normalization scheme for ChIP-chip data proposed in [192] to BrdU-IP-chip data, again using default parameter settings. Figures 2.2A and 2.2B show the probes of the "cleanest" WT and rpd3Δ datasets, respectively, plotted in the $\Delta M$ vs. $\Delta A$ plane. The rotation lines identified in this plane do not follow the slope of the background distribution in the MA plane. After probes have been transformed using these lines, a residual intensity bias remains that seems to be more prominent in the rpd3Δ data (Figures 2.2C & 2.2D). Unfortunately, this residual bias appears significant enough to affect the modified loess step, resulting in a normalized probe set with characteristics similar to those of probes after simple loess normalization (a sloping background distribution and artificially lowered ARS1 probe M-values; Figures 2.2E & 2.2F). When these methods are applied to a slightly "noisier" (as measured by autocorrelation once more) rpd3Δ dataset, they define a rotation line whose slope has the opposite sign to that of the background distribution (Supplemental Figure B.1), leading to a more obviously incorrect transformation.

The methods proposed in [192] were developed under the assumption that probe M-values follow one of two distributions (enriched or non-enriched) and that these distributions have relatively low variance (i.e., enriched probes have similar M-values).
While this assumption is generally valid for ChIP-chip data, it does not hold for BrdU-IP-chip experiments. Figure 2.2G shows that the replicated regions are wide (up to 30 kbp) and, due to the asynchrony of replication fork movement across the cell population, there is no sharp boundary between enriched and non-enriched regions, but rather an incremental decrease in M-values on either side of each peak apex. We suggest that these characteristics, which differ from those of typical ChIP-chip data, are the reason why the method proposed in [192] is sub-optimal for BrdU-IP-chip datasets.

Figure 2.2: Illustration of the method proposed in [192] for normalization of BrdU-IP-chip data. Each probe in the WT (A) and rpd3Δ (B) datasets is plotted in the $\Delta M$ vs. $\Delta A$ plane and a line of best fit, which should run parallel to the slope of the background distribution, is identified. The WT (C) and rpd3Δ (D) probes transformed onto the modified MA plane, with probes from within ARS1 highlighted (green). Following this transformation a loess curve is fitted to probes within 2 standard deviations of the median M-value. WT (E) and rpd3Δ (F) probes after the final loess normalization step. (G) Raw M-values of WT probes plotted in the chromosomal plane (Chromosome XIII shown here).

Although the data transformation proposed in [192] is not appropriate for BrdU-IP-chip data, we agree with their strategy of first transforming probe intensities onto an appropriate plane before further normalization. Thus, to remove intensity bias we have developed a data rotation method, robust to the nuances of both ChIP-chip and BrdU-IP-chip data, that we employ prior to the modified loess normalization step. We demonstrate our transformation on the "clean" rpd3Δ dataset, as it best displays the analytical issues associated with BrdU-IP-chip arrays; for analysis of the "noisier" rpd3Δ dataset see Supplemental Figure B.2.
An MA plot of the raw rpd3Δ data shows that the background probes (dark region), under the correct transformation, have a dense and relatively symmetric empirical M-distribution (Figure 2.3A). As shown in [192], this is a characteristic feature of ChIP-chip data, and thus the methods described below will also be applicable to such data. We propose a data transformation that takes advantage of, and searches for, a subset $S$ of the $N$ probes whose distribution best follows these characteristics. After the probes in $S$ are identified we define a rotation line that follows their slope in the MA plane and adopt it as the x-axis for a modified MA plane.

Figure 2.3: (A) rpd3Δ probes plotted in the MA plane (ARS1 probes are indicated with green dots). (B) The background probe subset plotted in the MA plane. The first and second principal component axes are used as the new set of axes in the data rotation. (C) Probes plotted in a modified MA plane after data rotation. A loess curve is then fitted to the probes within two standard deviations of the median M-value. (D) Probes plotted in the modified MA plane after loess normalization is complete.

To identify $S$ we first search for the $D$ densest subsets of probes $S_1, S_2, \ldots, S_D$ with sizes $k_1 = N/D, k_2 = 2N/D, \ldots, k_D = N$. Here, the density of a probe set is measured by the size of its minimum spanning tree in the MA plane. $D$ is a parameter that determines the granularity of the algorithm (we use $D = 100$ here; for a more precise solution $D$ can be increased at the expense of running time). Finding the $k$-vertex minimum spanning tree in a dataset of size $N \ge k$ is an NP-hard problem known as k-Minimum Spanning Tree (k-MST). Instead of solving this directly, we employ a time-optimized version of an approximation algorithm aimed at identifying only the set of probes contained in the k-MST rather than the actual k-MST [70]. The algorithm proposed in [70] is polynomial in time, but current tiling array feature counts are now in the millions.
To reduce its search space, and hence its running time, we have modified the algorithm in [70] by integrating an initial greedy step. First, probes are binned into the cells of a uniformly spaced $128 \times 128$ grid ($I$) in the MA plane. Following this, cells of $I$ (which we denote by $I_{i,j}$, $1 \le i, j \le 128$) and their probes are added to a set $C$ in descending order of the number of probes ($|I_{i,j}|$) they contain, until $k - N/D \le |C| \le k$, where $|C|$ is the total number of probes in the cells of $C$. Following this, "layers" of cells neighboring $C$ are added to a set $Q$ until $|C| + |Q| \ge k$. More precisely, when a new neighboring "layer" is to be added to $Q$, its cell set is defined as

$$\{\, I_{i,j} : I_{i,j} \not\subset Q \cup C \ \wedge\ \exists\, u, v \in \{-1, 0, 1\} \ \text{s.t.}\ I_{i+u,j+v} \subset Q \cup C \,\}.$$

We then alter the algorithm in [70] so that all probes in $C$ are included in the final $k$-probe solution and the search space for the additional $k - |C|$ probes is constrained to the cells in $Q$. In [70] the authors employ a set of grids $G_0, G_1, \ldots, G_n$ whose cells each have a corresponding list $L$. To ensure the above constraints are followed, we initialize the lists corresponding to the cells of the finest grid, $G_0$ (a $256 \times 256$ grid here), as follows:

$$\text{if cell} \subset C: \quad L(p) = \begin{cases} x_0 & \text{if } p = m, \\ \infty & \text{otherwise,} \end{cases}$$

$$\text{else if cell} \subset Q: \quad L(p) = \begin{cases} x_0 & \text{if } p \le m, \\ \infty & \text{otherwise,} \end{cases}$$

$$\text{else:} \quad L(p) = \infty,$$

where $x_0$ and $m$ are the width of, and number of probes in, the cell corresponding to $L$, respectively. After $L$ has been computed for each of the cells in $G_0$, the algorithm proceeds as described in [70], with the following modifications: (i) for a larger cell $c$ and corresponding list $L$, if $r$ of the probes in $c$ are contained in $C$, $L(p) = \infty$ for $p < r$; (ii) $L(r)$ is calculated by merging all lists corresponding to subcells of $c$ that are contained in $C$; and (iii) for $r < q \le k$, $L(q)$ is calculated by merging $L(r)$ with all lists corresponding to subcells of $c$ that are not contained in $C$.
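The initial greedy step (binning, building the dense core set C, then growing layers of neighboring cells into Q) can be sketched as follows. This is a rough illustration under stated assumptions, not the implementation used here: the function name `greedy_cells`, the rule of stopping at the first cell that would push |C| above k, and the fallback that seeds Q with the densest cell when C is empty are all choices made for this example.

```python
from collections import Counter

def greedy_cells(points, k, grid=128):
    """Return (C, Q) as sets of (i, j) cell indices on a grid x grid lattice."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_lo, y_lo = min(xs), min(ys)
    x_span = (max(xs) - x_lo) or 1.0
    y_span = (max(ys) - y_lo) or 1.0

    def cell(p):
        i = min(grid - 1, int((p[0] - x_lo) / x_span * grid))
        j = min(grid - 1, int((p[1] - y_lo) / y_span * grid))
        return i, j

    counts = Counter(cell(p) for p in points)

    # Core set C: densest cells first, never exceeding k probes (assumed rule).
    core, n_core = set(), 0
    for c, cnt in counts.most_common():
        if n_core + cnt > k:
            break
        core.add(c)
        n_core += cnt

    # Fringe set Q: whole layers of neighbouring cells until |C| + |Q| >= k.
    fringe, n_fringe = set(), 0
    if not core:  # assumed fallback when even the densest cell exceeds k
        densest = counts.most_common(1)[0][0]
        fringe.add(densest)
        n_fringe = counts[densest]
    while n_core + n_fringe < k:
        covered = core | fringe
        layer = {
            (i + u, j + v)
            for (i, j) in covered
            for u in (-1, 0, 1)
            for v in (-1, 0, 1)
            if (i + u, j + v) not in covered
            and 0 <= i + u < grid and 0 <= j + v < grid
        }
        if not layer:
            break
        fringe |= layer
        n_fringe += sum(counts[c] for c in layer)
    return core, fringe
```

The point of the step is that the expensive k-MST approximation then only has to choose the remaining k - |C| probes from the cells in Q, rather than from the whole array.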
After completion, the final set of $k$ probes used for subsequent analysis is that corresponding to $L(k)$ for the $1 \times 1$ grid $G_n$ (see [70] for further details).

Following this, we search for the smallest of the $D$ subsets whose "symmetry" measure $R$ (defined below) is greater than an experiment-specific cutoff $R_C$ (also defined below), and $S$ is defined by this subset of probes (see Supplemental Figure B.3). To assess the symmetry of probes in the set $S_i$ we calculate the first and second principal components, $PC^i_1$ and $PC^i_2$ respectively, of its probes in the MA plane, and define its symmetry measure $R_i$ by

$$R_i = \log\left( \frac{\sum_{\text{Probes} \in MST_i} \mathbf{1}(PC^i_2\ \text{value} > c_i)}{\sum_{\text{Probes} \in MST_i} \mathbf{1}(PC^i_2\ \text{value} \le c_i)} \right),$$

where $MST_i$ denotes the minimum spanning tree of the subset, $\mathbf{1}$ denotes the indicator of a set, and the cutoff $c_i$ is determined as the median of the $PC^i_2$-values of the set $S_{0.2N}$. We choose this subset size because we know a priori that less than 80% of probes are enriched in the experimental conditions being analyzed (this ensures that this subset contains primarily background probes; for other experimental conditions this subset size can be altered accordingly).

We define $S$ as the set of size $k_j$, where $j = \min\{ m \le D : R_m \le R_{m+1} \le \cdots \le R_D,\ R_m \ge R_C \}$ and $R_C = 2 \times$ the standard deviation of $R_1, R_2, \ldots, R_{0.2N}$. This choice is motivated by the observation that if $k_i$ is the size of the largest subset of size at most $|S|$, then the values $R_1, R_2, \ldots, R_i$ fluctuate around a value close to 0, whereas the values $R_{i+1}, R_{i+2}, \ldots, R_D$ incrementally increase, as enriched probes are only included in the numerator of the ratio defining $R$ (Supplemental Figure B.3). The cutoff value $R_C$ is dependent on the a priori knowledge that at most 80% of all probes are enriched.

After $S$ is identified, all probes are transformed into the plane whose x and y axes correspond to its first and second principal components, $PC_1$ and $PC_2$ respectively (Figure 2.3B).
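The rotation onto the principal axes of the background subset can be sketched in a few lines. This is a minimal illustration, assuming probes are given as (A, M) pairs and that the background subset has already been chosen; the function name `rotate_to_background_axes` is hypothetical, and the closed-form angle is the standard expression for the major axis of a 2-D covariance matrix.

```python
import math

def rotate_to_background_axes(background, probes):
    """Express (A, M) probes in the PC1/PC2 axes of the background subset."""
    n = len(background)
    ca = sum(a for a, _ in background) / n
    cm = sum(m for _, m in background) / n
    saa = sum((a - ca) ** 2 for a, _ in background)
    smm = sum((m - cm) ** 2 for _, m in background)
    sam = sum((a - ca) * (m - cm) for a, m in background)
    # Angle of the first principal axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sam, saa - smm)
    c, s = math.cos(theta), math.sin(theta)
    # New x is the PC1 coordinate, new y is the PC2 coordinate.
    return [((a - ca) * c + (m - cm) * s, -(a - ca) * s + (m - cm) * c)
            for a, m in probes]
```

Since the angle lies between -90 and 90 degrees, a probe that sits above the background in M keeps a non-negative second coordinate after rotation, so enrichment still points "up" in the modified plane.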
Following the rotation, the modified loess step proposed in [192] is applied to the data (with default parameter settings). Although the large numbers of enriched probes "pull" the loess curve away from the background distribution (Figure 2.3C), the data transformation ensures that the loess normalization does not distort the data and that the majority of the residual intensity bias is removed (Figure 2.3D).

The autocorrelation of probe M-values along the chromosome decreases with increasing array noise and intensity bias, and should therefore increase when within-array normalization methods are carefully applied [130, 192]. To assess our methods, we calculated the autocorrelations of both the WT and rpd3Δ datasets prior to and after application of our within-array normalization scheme at lags of 0 to 100 probes (corresponding to distances of 0 to 3000 base pairs). Figure 2.4 demonstrates that the proposed strategies reduce the intensity bias-related noise inherent in BrdU-IP-chip experiments. In addition, the autocorrelation of the WT data is weaker than that of rpd3Δ. We think that this is due to the mutant array having a higher proportion of enriched probes, as noise appears to be more significant in non-enriched regions (compare Figure 2.2G and Supplemental Figure B.4).

2.3 Between-Array Normalization

2.3.1 Location Normalization

When comparing the within-array normalized data across different experiments, further normalization is needed to correct for the fact that the M-values in $S_{0.2N}$ can have different locations. For example, when comparing the MA plots of WT and rpd3Δ after within-array normalization, the median is much lower in rpd3Δ (Figures 2.5A and B). When these data are plotted along the chromosome we see that the baseline of the rpd3Δ plot is artificially lower than that of WT (Figure 2.5C). If not corrected, this would result in errors when testing for differences between WT and rpd3Δ peaks.
To correct for this, for each experiment we propose subtracting the median M-value of its $S_{0.2N}$, as calculated after within-array normalization (Figure 2.5D and E). This strategy successfully normalizes the baseline across arrays, allowing comparisons between experimental conditions to be performed more accurately (Figure 2.5F).

Figure 2.4: The correlation structure of the WT and rpd3Δ datasets before and after within-array normalization. y-axis: Spearman rank correlation. x-axis: lag, measured as number of probes along a chromosome.

Figure 2.5: (A) WT probes (after within-array normalization) plotted in the MA plane. The location parameter is the median M-value of $S_{0.2N}$. (B) rpd3Δ probes (after within-array normalization) plotted in the MA plane. (C) WT and rpd3Δ probes plotted in the chromosomal plane (Chromosome XIII). (D) WT probes plotted in the MA plane after location normalization. (E) rpd3Δ probes plotted in the MA plane after location normalization. (F) WT and rpd3Δ probes plotted in the chromosomal plane (Chromosome XIII) after location normalization.

2.3.2 Scale Normalization

We observe noticeable scale differences in the empirical M-distributions of experimental replicates. Before performing comparisons across various conditions, these experimental errors should be eliminated without removing differences attributable to true biological variation. We tested the existing strategies for scale normalization (MAD scaling and quantile normalization) and found that signal differences observed consistently between WT and rpd3Δ replicates, which we attribute to true replication landscape changes in rpd3Δ, are removed when either is applied (data not shown). With MAD scaling, differences between larger enrichment peaks are removed, and with quantile normalization virtually all biological differences are eliminated.
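For concreteness, the location step and a plain quantile normalization can be sketched as below. This is a minimal sketch under assumptions: the function names are hypothetical, the background subset is passed in as a list of probe indices, all arrays are assumed to hold the same probes in the same order, and ties are broken by input order. The quantile routine is written so that it can be applied to whichever group of arrays the analyst chooses to normalize together.

```python
from statistics import median

def location_normalize(m_values, background_idx):
    """Subtract the median M-value of the background subset from every probe."""
    shift = median(m_values[i] for i in background_idx)
    return [m - shift for m in m_values]

def quantile_normalize(arrays):
    """Force equal-length arrays to share one empirical distribution."""
    n = len(arrays[0])
    # For each array, probe indices ordered from smallest to largest M-value.
    orders = [sorted(range(n), key=arr.__getitem__) for arr in arrays]
    # Mean across arrays of the values holding each rank.
    rank_means = [
        sum(arr[order[r]] for arr, order in zip(arrays, orders)) / len(arrays)
        for r in range(n)
    ]
    normalized = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized
```

Which arrays are passed to `quantile_normalize` together determines which differences survive: any difference in distribution between the arrays in one call is removed by construction.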
Here we propose a modified quantile normalization procedure where the M-values of each set of replicates are normalized together [25], but not with replicates from other experimental conditions (e.g. the WT replicates are quantile normalized with one another separately from the rpd3Δ replicates). This forces replicates to better resemble each other (removing experimental error) without removing true biological differences. Figure 2.6A shows the peak heights from the four WT replicate datasets (for peak identification and quantification see below) plotted against their averages (before scale normalization). The scale differences result in discrepancies between replicate peaks with larger heights, which can be a source of false negatives when testing for peak height changes (e.g. the larger variation in peak heights results in a smaller t-statistic). Figure 2.6B shows that, when the modified quantile normalization strategy is applied, these size-dependent differences are removed.

Figure 2.6: (A) Peak heights of each WT replicate, calculated before scale normalization, plotted against the average height across replicates. (B) Peak heights of each WT replicate, calculated after scale normalization, plotted against the average height across replicates.

2.4 Peak Detection and Quantification

There are several ways in which peak identification and quantification can be performed. For example, we might average the observations from replicate experiments to get a single set of potential peaks for each experimental condition. Because there are often multiple peaks within a given enriched region that may be lost if averaging across replicates is used, we have found it better to identify peaks within each replicate, and then compare peaks across replicates (and perhaps conditions) using further alignment.

2.4.1 Detection of Enriched Regions

Several algorithms have been developed to identify enriched genomic regions in ChIP-chip data [5, 28, 73, 108, 130, 140, 192, 199, 283].
Many of these use Hidden Markov Models (HMMs) with two probe states, corresponding to enriched and non-enriched. Others have proposed simpler methods, such as setting an enrichment threshold based on the variability of the array noise [192]. Here we calculate a final enrichment cutoff, used below to identify positive signals, by taking advantage of the characteristics of the distribution of the M-values of background probes. We employ a strategy similar to that proposed in [73]: identify all probes whose M-values are less than the median of the set $S_{0.2N}$, as recomputed after within-array and between-array normalization, reflect them about this value, and set the cutoff to twice the sample standard deviation of the resulting distribution (Figure 2.7). We note that we could also use this distribution to provide P-values for ranking probes, but we do not explore this further here.

Figure 2.7: Peaks identified by the present method in a single replicate are marked with red stars. Probes in blocks called enriched by the HMM (posterior probability $\ge 0.5$) are marked in blue and probes from non-enriched blocks are grey. Notice the agreement between the calls. Further details are provided in the text.

2.4.2 Detection of de novo Replication Peaks

To identify individual replication peaks, we begin by fitting a loess curve to the normalized data on the chromosomal plane. Following this, a sliding window is applied to search for all regions with a continuous increase in smoothed M-values for at least 20 probes (approximately 0.6 kbp) followed by a continual decrease for at least 20 probes (typical replication peaks are relatively symmetric about one apex; this choice can be changed for other types of data). We assign each peak a height equal to the median of the non-smoothed M-values within 500 bp of its apex and accept it as a potential positive if its height is greater than the enrichment cutoff (Figure 2.7).
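The reflection-based cutoff and the rise-then-fall peak search can be sketched as follows. This is a simplified illustration under assumptions: the function names and arguments are hypothetical, the smoothed track is supplied by the caller (any loess implementation could produce it), `min_run` counts consecutive rises or falls between adjacent probes, and the cutoff is returned as twice the standard deviation on the assumption that the background median is near zero after location normalization.

```python
from statistics import median, stdev

def enrichment_cutoff(m_values, center):
    """Twice the sample SD of the sub-center probes reflected about `center`."""
    lower = [m for m in m_values if m < center]
    mirrored = lower + [2 * center - m for m in lower]
    return 2 * stdev(mirrored)

def find_peaks(smoothed, raw, positions, cutoff, min_run=20, half_width=500):
    """Return (apex position, height) for rise-then-fall regions above cutoff."""
    n = len(smoothed)
    up = [0] * n    # consecutive rises ending at probe i
    down = [0] * n  # consecutive falls starting at probe i
    for i in range(1, n):
        up[i] = up[i - 1] + 1 if smoothed[i] > smoothed[i - 1] else 0
    for i in range(n - 2, -1, -1):
        down[i] = down[i + 1] + 1 if smoothed[i] > smoothed[i + 1] else 0
    peaks = []
    for i in range(n):
        if up[i] >= min_run and down[i] >= min_run:
            # Height: median of the raw M-values within half_width bp of the apex.
            near = [raw[j] for j in range(n)
                    if abs(positions[j] - positions[i]) <= half_width]
            height = median(near)
            if height > cutoff:
                peaks.append((positions[i], height))
    return peaks
```

Taking the height from the non-smoothed values near the apex, rather than from the smoothed curve, keeps the height estimate independent of the smoothing span.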
After potential peaks have been identified for each experiment, we align them across replicates with a dynamic programming algorithm. To identify peaks that are present across a set of $r$ replicates we perform a multiple global alignment on their replicate-specific locations using a version of the Needleman-Wunsch algorithm [174] similar to the one described in [209]. Each element $A$ of the alignment set $\mathcal{A}$ is represented in the form of a sequence of tuples:

$$A = \big( (C_1, \{(E_{11}, L_{11}), \ldots, (E_{1n_1}, L_{1n_1})\}),\ (C_2, \{(E_{21}, L_{21}), \ldots, (E_{2n_2}, L_{2n_2})\}),\ \ldots \big)$$

The first element $C$ of each tuple defines the chromosomal origin of a peak. The second element in the tuple, $\{(E_1, L_1), (E_2, L_2), \ldots, (E_v, L_v)\}$ say, is a set of tuples consisting of experiment labels ($E$) and corresponding chromosomal locations ($L$) of peaks that are identified as aligned in experiments $E_1, \ldots, E_v$. The method starts with the peak locations identified above in each experiment; the peaks in the $j$th experiment can be represented in the form

$$A_j = \big( (C^j_1, \{(j, L^j_{11})\}),\ (C^j_2, \{(j, L^j_{21})\}),\ \ldots \big).$$

The algorithm proceeds by successively calculating all pairwise alignments and alignment distances between sequences in $\mathcal{A}$ with the Needleman-Wunsch algorithm, each time replacing the most similar pair with its alignment:

    while $|\mathcal{A}| > 1$
        $(x, y) = \arg\min_{u,v} |\mathrm{Alignment}(A_u, A_v)|$
        $\mathcal{A} = (\mathcal{A} \setminus \{A_x, A_y\}) \cup \{\mathrm{Alignment}(A_x, A_y)\}$
    end
    return $\mathcal{A}$

where $|\mathrm{Alignment}(\cdot, \cdot)|$ is equal to the bottom right-hand corner of the Needleman-Wunsch distance matrix calculated during an alignment. During an alignment, if peaks $(C, \{(E, L)\})$ and $(C', \{(E', L')\})$ from two inputs are deemed close enough, they are merged into a single peak $(C'', \{(E'', L'')\})$ in the output alignment. This new peak has chromosomal origin $C'' = C' = C$, and $\{(E'', L'')\} = \{(E, L)\} \cup \{(E', L')\}$. Peaks that are not deemed close enough are not merged and their values are inserted separately into the new alignment.
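The merge-the-closest-pair loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the implementation used here: the distance function and gap value are parameters whose defaults are placeholders, two peaks are treated as "close enough" when their distance does not exceed the gap value, and each input list is assumed to be sorted by chromosome and coordinate.

```python
import math

def peak_distance(p, q):
    """Placeholder distance: largest coordinate difference, infinite across chromosomes."""
    if p[0] != q[0]:
        return math.inf
    return max(abs(lp - lq) for _, lp in p[1] for _, lq in q[1])

def align(a, b, dist=peak_distance, gap=2000):
    """Needleman-Wunsch style global alignment of two peak lists.

    A peak is (chromosome, [(experiment, location), ...]).
    Returns (merged_list, alignment_cost)."""
    n, m = len(a), len(b)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap
    for j in range(1, m + 1):
        cost[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            match = cost[i - 1][j - 1] + d if d <= gap else math.inf
            cost[i][j] = min(match, cost[i - 1][j] + gap, cost[i][j - 1] + gap)
    merged, i, j = [], n, m
    while i > 0 or j > 0:  # trace back, merging matched peaks
        if i > 0 and j > 0:
            d = dist(a[i - 1], b[j - 1])
            if d <= gap and cost[i][j] == cost[i - 1][j - 1] + d:
                merged.append((a[i - 1][0], a[i - 1][1] + b[j - 1][1]))
                i, j = i - 1, j - 1
                continue
        if i > 0 and cost[i][j] == cost[i - 1][j] + gap:
            merged.append(a[i - 1])
            i -= 1
        else:
            merged.append(b[j - 1])
            j -= 1
    # Unmatched peaks can come out of the traceback out of order, so re-sort.
    merged.sort(key=lambda p: (p[0], min(loc for _, loc in p[1])))
    return merged, cost[n][m]

def align_all(peak_lists, gap=2000):
    """Repeatedly replace the cheapest-to-align pair with its alignment."""
    pool = list(peak_lists)
    while len(pool) > 1:
        candidates = ((align(pool[u], pool[v], gap=gap), u, v)
                      for u in range(len(pool)) for v in range(u + 1, len(pool)))
        (merged, _), u, v = min(candidates, key=lambda t: t[0][1])
        pool = [p for k, p in enumerate(pool) if k not in (u, v)] + [merged]
    return pool[0]
```

Peaks present in every replicate are then the entries of the final list whose member list contains one entry per experiment label.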
It remains to define the distance measure to be used in the Needleman-Wunsch algorithm. For peaks $P = (C, \{(E_u, L_u)\})$ and $P' = (C', \{(E'_v, L'_v)\})$, we set

$$\mathrm{Dist}(P, P') = \begin{cases} \infty, & \text{if } C \neq C', \\ \max_{u,v} \{ |L_u - L'_v| \}, & \text{otherwise.} \end{cases}$$

The gap penalty is the maximum distance permitted between two aligned peaks. Here we set it to 2000, as an empirical analysis across experiments showed that several large corresponding peaks had coordinate differences up to 1700 bp.

2.4.3 Peak Filtration Based on a priori Information

Following the above alignment, peaks present across all replicates are aligned with the known/predicted origins reported in the OriDB database [180]. This second alignment allows us to further confirm the validity of peaks with a priori knowledge of origin locations which, in turn, allows for an in-depth analysis of the chromosomal features surrounding the start point of each peak. We align peaks with known/predicted origin locations (as listed in OriDB) to remove some false positives and to determine the precise genomic loci from which each BrdU peak emanates. OriDB lists origins in one of three categories: confirmed (confirmed with an ARS stability assay), likely (inferred in two or more experiments) or dubious (inferred in only one experiment). Based on the assumption that peaks are more likely associated with confirmed than dubious origins, we perform peak/origin alignments in a three-step process designed to align peaks with the highest ranking origin in their vicinity.

We begin with the final sequence of peak locations ($A = \mathcal{A}$) and three sets of chromosomally ordered origin locations $O_C$, $O_L$ and $O_D$ (corresponding to confirmed, likely and dubious origin sets, respectively). An origin location in one of these sets is a triplet $O = (O_{ch}, O_s, O_e)$ giving its chromosome, its starting coordinate and its ending coordinate, respectively.
The alignment proceeds as follows:

1) $T$ = peak/origin pairs $\in \mathrm{Alignment}(A, O_C)$
2) $A = A \setminus \{P : P \in T\}$ (peaks already paired are removed)
3) $Q$ = peak/origin pairs $\in \mathrm{Alignment}(A, O_L)$
4) $T = T \cup Q$
5) $A = A \setminus \{P : P \in T\}$
6) $Q$ = peak/origin pairs $\in \mathrm{Alignment}(A, O_D)$
7) $T = T \cup Q$

and the final set of peak/origin pairs is held in the set $T$. Although we employ the same gap penalty as during the alignment of replicates described above, we alter the distance function to reflect the fact that peaks located between the start and end coordinates of an origin should have a distance of zero from that origin. Thus, we define the distance between a peak $P = (C, \{(E_u, L_u)\})$ and an origin $O$ as follows:

$$\mathrm{Dist}(P, O) = \begin{cases} \infty, & \text{if } C \neq O_{ch}, \\ 0, & \text{if } \exists\, u \text{ s.t. } O_s \leq L_u \leq O_e, \\ \max\big(O_s - \max_u L_u,\ \min_u L_u - O_e\big), & \text{otherwise.} \end{cases}$$

2.5 Validation of Methods

To validate our method of enrichment detection we fitted an HMM [283] to the average normalized M-values of non-overlapping 1000 bp blocks of probes. The algorithm assigns to each such block the posterior probability of that block being in an enriched region. These probabilities can be used to rank and call potential enriched regions. Here, blocks with posterior probabilities ≥ 0.5 were called as enriched. A comparison of the HMM approach with the one presented here shows substantial agreement in positive peak calls (see Figure 2.7).

To validate our peak identification strategies experimentally, we compared the set of peaks identified here (in WT cells in HU) with those identified in two previous studies [7, 285] where alternatives to the BrdU-IP-chip assay (density shift assay and copy number assay, respectively) were employed to map origins that fire in WT cells in HU. There were 141 origins found to fire in HU in [285] and 290 in [7]. Here we identified 251 origins as active in HU (Supplemental Table A.1), with 107 (43 percent) overlapping with those identified in [285] and 198 (79 percent) with those identified in [7].
In total 224 (89 percent) of the origins we identified as active were found to fire in at least one of the two previous studies (Figure 2.8A), and 238 of the origins we detected as firing were detected in at least one other study (as listed in OriDB [180]). Examination of the BrdU profiles for Chromosome VI origins, whose initiation timings and efficiencies have been carefully characterized [67, 287], shows that BrdU peak heights reflect origin characteristics such as replication timing and origin efficiency. For example, the early, efficient origins ARS603.5, ARS605, ARS606, and ARS607 exhibit large BrdU peaks that span up to 20 kbp, reflecting that replication forks from early origins travel up to 10 kbp before stalling in HU (Figure 2.7). In contrast, the late-firing origins ARS601/602 and ARS603 incorporate much less or no BrdU (Figure 2.7), reflecting their inhibition by HU in most cells. The relatively early, but inefficient, origin ARS608 shows an intermediate-sized BrdU peak, and the very inefficient ARS604 exhibits no significant BrdU incorporation. ARS602.5, ARS603.1, and Dub were more recently identified and their timings and efficiencies have not been determined [63, 283].

Figure 2.8: (A) 251 origins are found to fire in this BrdU-IP-chip analysis as compared to the 290 identified in [7] and 141 in [285]. Of the 251 origins identified here, 224 (89 percent) were identified in at least one of the other two studies. (B) 142 WT peak heights (calculated here) plotted against their times of replication (as calculated in [200]). The Spearman Rank Correlation between peak heights and time of replication was found to be -0.78. (C) A comparison of WT and rpd3 peak heights shows significant increases (empirical Bayes t-test, p ≤ 0.001) in rpd3 heights at origins ARS603, ARS1413 and ARS501, while the same analysis shows no change (empirical Bayes t-test, p > 0.001) at origins ARS607 and ARS1.
In addition to these Chromosome VI origins, large BrdU peaks are found at other well-characterized early-firing origins that fire efficiently in HU (e.g., ARS305 and ARS306; data not shown), and small BrdU peaks are found at well-characterized late-firing origins that are inhibited by HU (e.g., ARS501, ARS1413; data not shown) [9, 12, 221]. Thus, BrdU incorporation levels generally reflect the native origin efficiencies of early origins and the inefficient firing of late origins resulting from their inhibition by the checkpoint. However, it is impossible a priori, from these BrdU profiles alone, to distinguish whether small BrdU peaks reflect low efficiency or late replication timing.

To confirm, globally, that our array normalization and peak identification/quantification methods assign peak heights that are proportional to origin timing/efficiency, we compared the WT peak heights developed here to their times of replication ($T_{reps}$) reported in [200]. We found that BrdU peak heights are significantly anti-correlated with $T_{reps}$ (Spearman's Rank Correlation of -0.78), indicating that high BrdU peaks are associated with early, efficiently firing origins, while lower BrdU peaks are associated with later-firing, less efficient origins (Figure 2.8B).

To examine our ability to identify true biological variation across experimental conditions, we tested for peak height differences in the WT and rpd3 datasets (with empirical Bayes t-tests [238]) and compared these results to those in [9]. In this previous study three independent methods were used to compare the replication activity of five origins (ARS607, ARS1, ARS603, ARS1413 and ARS501) in WT and rpd3 cells. These three methods showed no significant difference between WT and rpd3 cells in origin firing times at ARS607 or ARS1 but found advanced origin firing in the rpd3 cells at ARS603, ARS1413 and ARS501.
Comparisons of BrdU peak heights at these origins demonstrate significant peak height differences at ARS603, ARS1413 and ARS501 (p ≤ 0.001 for all), but no significant differences at ARS607 or ARS1 (p = 0.122 and 0.21 respectively) (Figure 2.8C).

2.6 Chapter Summary

The BrdU-IP-chip assay provides an effective technique to identify replication activity across the genome, and furthermore, the signal magnitude in these data is proportional to the percentage of cells in a culture that fire at each origin. As whole-genome analysis of replication dynamics continues to develop, a proper strategy for analyzing these and other datasets with similar characteristics is essential. Here we have shown that traditional strategies for dealing with expression and protein binding ChIP-chip experiments may be sub-optimal for the analysis of these types of data. We have developed strategies for both within-array and between-array normalization that are able to accommodate highly enriched datasets. Furthermore, we have presented peak identification, quantification and alignment tools that use a priori knowledge to remove both false positives and negatives. We have tested these methods both statistically and through a comparative analysis with previous studies to show that they are able to identify enriched regions correctly and that the array normalization and peak identification/quantification strategies are effective in detecting biologically meaningful changes in experiments performed under different conditions.

Chapter 3: The Spatio-Temporal Map of Yeast DNA Replication

Chapter Disclosure: Jared M. Peace performed the nucleosome positioning experiments.

3.1 Background

The development of genome-wide methods to capture replication, transcription and chromatin structures and modifications opens the possibility of assessing how such factors influence one another. In this chapter we investigate the molecular kinetics that drive the S.
cerevisiae replication schedule and begin to uncover what coordination they exhibit with concurrently operating genomic processes. To better understand these timing dynamics we begin by introducing a novel dataset that represents the highest fidelity temporal map of DNA replication to date. Next, to identify novel candidate limiting factors to DNA replication, we characterize (in the context of this temporal map) two additional datasets designed to capture both pre-S-phase replication protein loading and global origin efficiencies. Finally, we analyze transcription activity, epigenetic environment, nucleosome positioning, and 3-D genome structure to ascertain which of these features influence origin function to help establish the conserved spatio-temporal replication schedule.

3.2 A High-Resolution Temporal Profile of DNA Replication Throughout S-phase

To elucidate, with high fidelity, the temporal program of S. cerevisiae chromosomal replication, we combined BrdU pulse-labeling and immunoprecipitation [251] with high-throughput DNA sequencing (Illumina), an approach we term BrdU-IP-Seq. A culture of BrdU-incorporating, but otherwise wild-type, cells was synchronized in late G1-phase with α-factor and released from the block into S-phase. Every six minutes, an aliquot of the culture was removed, incubated with BrdU for 12 minutes, and harvested for analysis. Both Fluorescence Activated Cell Sorting (FACS) and the quantity of bulk DNA extracted from individual aliquots suggested that DNA replication occurred from 12 through 84 minutes after release, so we selected those samples for BrdU-IP-Seq (Figure 3.1A and B).
DNA sequencing read counts for each time point were binned by genomic location (50 bp non-overlapping bins) and scale-normalized based on their corresponding IP quantity. For a bin $j$ and pulse-interval $i$ we scale the bin count $B_{ij}$ as follows:

$$B_{ij} = B_{ij} \cdot \frac{\sum_{i} \sum_{j} B_{ij}}{\sum_{j} B_{ij}} \cdot \frac{IP_i}{\sum_{i} IP_i},$$

where $IP_i$ is the amount of immunoprecipitated DNA obtained from BrdU-pulse-interval $i$. This method ensures that, for a given pulse-interval, the amount of IP material obtained is reflected in the final sequence count (Figure 3.1B and 3.1C).

Figure 3.1: (A) Fluorescence Activated Cell Sorting analysis of cells released from G1-phase into fresh YEPD media. (B) Genomic and IP DNA extracted from each 12 minute BrdU-pulse. (C) Raw and temporally normalized Solexa read counts obtained for each BrdU-pulse.

Following scaling, bins were assigned the median count of their neighbors within 2500 bp. A plot of the normalized data for Chromosome X shows the temporal pattern of BrdU incorporation (Figure 3.2A; for plots of all chromosomes see Supplemental Figure B.5). Initially, isolated peaks indicative of replication origin activity are observed, with new, isolated peaks emerging later, reflecting distinct activation kinetics for these origins. BrdU incorporation at individual origins occurs over several intervals, reflecting a temporal distribution of initiation across the population. These peaks broaden and split into two diverging peaks, indicative of replication forks emanating bi-directionally from origins. Eventually, the migrating forks merge with oncoming forks, forming broad, shallow peaks, inferred to be replication termination (TER) regions.

Figure 3.2: (A) BrdU-IP-seq was performed on cells released from G1-phase and pulsed with BrdU in time intervals (12-24, 18-30, ..., 72-84). Read counts for Chromosome X are shown here. Replication origin firing is highlighted with green circles. Empirically inferred fork locations are highlighted with dashed lines. Forks are observed emanating from firing origins and then converging at TERs (highlighted with red circles). (B) The time of replication was calculated for each locus based on the read counts at each time-interval in A. Firing origins exhibit troughs and TERs are represented with peaks.

With these data, we calculated the Times of Replication (TReps) for each 50 bp bin based on the number of sequence reads that mapped to it at each time interval as follows:

$$\mathrm{TRep} = \frac{\sum_{T \in \{6, 12, \ldots, 78\}} T \cdot C_T}{\sum_{T \in \{6, 12, \ldots, 78\}} C_T}$$

where $C_T$ is the read count in the region at time interval $[T - 6, T + 6]$. These TReps correlate well with a previously reported dataset (Spearman's Rank Correlation = 0.78; [200]). Because an early-replicating locus has a small TRep, troughs in the TRep profile should represent origins, while peaks should represent TERs (Figure 3.2A and 3.2B). Thus, to identify possibly novel origins and TERs we searched for troughs and peaks (with a minimum trough-trough and peak-peak separation distance of 10 kbp) in the TRep profiles. This analysis identified 284 troughs (firing origins), 282 of which map to previously identified origins (Figure 3.3A), as well as 234 peaks, predicting TERs (Figure 3.3B). A previous analysis of BrdU-IP-chip data predicted the locations of 71 TERs, 66 of which are included in our dataset. Thus, with this analysis we were able to more than triple the total number of putative TERs.

3.3 Replication Origin Firing Rates in HU and Cdc45 Binding in G1-phase are Predictive of Replication Timing Schedules

We have previously reported that when cells are synchronously released into HU and analyzed with BrdU-IP-chip, BrdU peak-heights associated with origins correlate with their TReps (Chapter II). To test this hypothesis with the new sequence-based data, we performed BrdU-IP-Seq on cells released from G1-phase into HU for 1 hr.
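As an aside, the read-count-weighted TRep defined in Section 3.2 can be sketched in a few lines of Python. The dictionary layout (interval midpoint in minutes mapped to read count) and the function name are assumptions made for illustration; returning `None` for a bin with no reads is also a choice made here, since the text does not discuss unlabelled bins.

```python
def time_of_replication(counts):
    """Read-count-weighted mean time for one bin.

    counts maps an interval midpoint T (minutes) to the read count C_T
    observed in the pulse interval [T - 6, T + 6]."""
    total = sum(counts.values())
    if total == 0:
        return None  # no labelled reads in this bin, so TRep is undefined
    return sum(t * c for t, c in counts.items()) / total

# A bin labelled mostly in the pulse centred on 24 minutes.
example = time_of_replication({18: 10, 24: 30, 30: 10})
```

Because TRep is a mean over the labelling distribution, a bin replicated early in most cells but late in a few will have a TRep pulled toward the later time.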
Figure 3.4A shows reads plotted along Chromosome VI; 307 BrdU peaks that aligned to origins (as listed in OriDB) were detected. In addition to these origins we have also detected strain-specific origin firing at 45 other sites (described in Chapter VI; Supplemental Table A.2). For a more robust analysis we have chosen to include these loci in our subsequent analysis, as they represent functional origins with the potential to fire. We computed the number of reads that mapped to within 2500 bp of each of these origins in the HU experiment (Figure 3.4F; hereon referred to as HU-efficiencies) and in each of the BrdU time-course intervals (Figure 3.4F). The correlation between HU-efficiencies and mean time of read-mapping for each origin was high (Spearman's Rank Correlation = 0.92), validating our previous assertion that HU peak-heights reflect origin-firing times.

Figure 3.3: TRep profiles were analyzed for peaks and troughs (corresponding to TERs and origins) with minimum inter-peak or inter-trough separation distances of 10 kbp. (A) Troughs correspond to sites where origins (as listed in OriDB) have been identified (pink dots). (B) Peaks correspond to sites flanked by previously identified origins (pink dots). Green dots indicate TERs that have been identified in previous studies.

Figure 3.4: (A) BrdU-IP-seq analysis of cells released from G1-phase into YEPD + HU for one hour (Chromosome III). (B) BrdU-Pulse-IP-seq (cells and data were analyzed as described for Figure 3.2A; Chromosome III). (C) ChIP-chip of ORC (performed on an asynchronous cell culture). (D) ChIP-chip of MCM2+4 (performed on G1-arrested cells). (E) ChIP-chip of CDC45 (performed on G1-arrested cells). (F) The number of reads from (A) mapping to within 500 bp of each origin was calculated and origins were sorted based on these counts (HU-efficiency; left panel).
The number of reads mapping to these same regions was then computed for each time interval in the BrdU-pulse data and sorted based on HU-efficiency (second from the left panel). ORC, MCM and CDC45 binding at each origin was also quantified with ChIP-chip and sorted based on HU-efficiency (right panel).

To determine whether origin firing dynamics dictate most (if not all) of the genome-wide replication schedule, we performed linear regression analysis using the mean and variance of each 10 kbp region's TRep as the dependent variables, where TRep variance for a given genomic region was calculated as:

$$\sigma^2_{\mathrm{TRep}} = \frac{\sum_{T \in \{6, 12, \ldots, 78\}} C_T \,(T - \mathrm{TRep})^2}{\sum_{T \in \{6, 12, \ldots, 78\}} C_T}$$

We included as predictors the distance to and HU-efficiency of the nearest origin for each region. Also included as a predictor was an interaction term composed of the individual measures' product (distance to nearest origin × 1/HU-read-count). The results show that, individually, the distance to and HU-efficiency of the nearest origin are both predictive of a given locus' TRep (P-value < 0.001), and that the interaction term provided no further predictive power. Furthermore, the HU-efficiency of the nearest origin alone was predictive of each locus' TRep variance (P-value < 0.001). This indicates that both the mean and variance of each genomic region's TRep are determined by local origin firing dynamics.

To determine whether differential occupancy at origins of specific replication proteins might be predictive of origin activity, we performed ChIP-chip of ORC (in unsynchronized cells), Mcm2+4 (in G1-arrested cells) and Cdc45 (in G1-arrested cells), the levels of which have been correlated with origin efficiency or timing; enriched regions were detected and assigned a peak strength with the methods described in Chapter II, while unbound origins were assigned a peak strength of zero. The data show that ORC binds 303 origins, Mcm2+4 binds 350 origins, and Cdc45 binds 29 origins (Figure 3.4 C-F).
To determine whether the levels of these origin-protein associations were predictive of origin efficiencies, we performed multivariate regression analysis using HU-efficiency as the dependent variable and ORC, Mcm2+4 and Cdc45 binding values as the predictors. The results show that both ORC and Cdc45 binding levels at origins are predictive of their HU-efficiencies (P-value < 0.001), whereas Mcm2+4 binding levels do not correlate with origin efficiencies.

3.4 Unperturbed Transcription Does Not Influence Replication Timing or Direction in S. cerevisiae Cells

Above we have demonstrated that, in S. cerevisiae cells, origin HU-efficiencies are predictive of the entire genome's replication timing schedule and also that these efficiencies are dependent on the association of a limited replication factor (Cdc45). In the next sections we examine which concurrent genomic processes are correlated with, and perhaps causative of, origin activity. First, to determine if local genic architecture correlates with origin function we mapped each of the 253 ACSs identified in [57] to intergenic regions. We found that these mapped preferentially to converging intergenic loci (P-value = 0.0013). We also found that the diverging intergenic regions to which origins mapped were significantly wider than other ACS-containing intergenic regions (P-value < 0.001) and also wider than non-ACS-containing diverging intergenic regions (P-value < 0.001). However, we found no significant correlation between intergenic class and origin efficiency (tested with ANOVA using HU-efficiency as output and intergenic class as treatment).

To determine if genetic activity (as opposed to architecture) in the vicinity of origins affects their efficiency, we computed, for each origin, a direction-specific transcription profile that represents the flanking genic activity in G1-arrest. We binned (50 bp bins) and averaged probe-values from the G1-phase arrested Watson and Crick expression arrays presented in [80].
We then computed a direction-specific transcription profile surrounding each ACS. If the ACS was in a tandem intergenic region it was assigned a transcription profile from bins within 10 kb of the ACS from the expression array corresponding to the flanking genes' direction (i.e. if both genes are transcribed in the Watson direction the corresponding array was used). For converging intergenic regions the left side of the profile was computed from the Watson array and the right side from the Crick array (vice versa for diverging intergenic regions; Figure 3.5A). A visual inspection of each individual intergenic class of ACSs (sorted by HU-efficiency) shows no defined patterns of transcriptional activity that correlate with origin efficiency. To confirm this statistically we performed regression analysis using the flanking transcripts' lengths, expressions (in G1-arrest) and distances from the ACS as the predictors and found no significant relationships with replication activity. Therefore, in contrast to higher eukaryotes, S. cerevisiae replication origin firing dynamics are not correlated with local transcription activity.

Figure 3.5: (A) Transcription profiles around each ACS were calculated and then sorted (separately for each intergenic class) based on their associated ACS's HU-efficiency. (B) Transcription between collision sites and their flanking origins was computed in both co- and anti-directions (with respect to replication fork movement). Collision sites are sorted based on the distance between their two flanking origins. (C) tRNAs, Long Terminal Repeats and Transposable Elements that are transcribed in the direction with and opposite to the direction in which replication forks move through them.

In [245] the authors demonstrated that in bacteria, the genome is organized to minimize the probability of collisions between converging transcription and replication complexes. To test if S. cerevisiae cells employ this strategy, for each 50 bp region of the genome, we calculated both a Watson and a Crick TRep expression profile ($\mathrm{Txn}_{\mathrm{TRep}}$) that, for each genomic region, represents the level of transcription (in a given direction) at the time it is replicated. For this analysis we used expression arrays 0, 5, ..., 85 minutes from [80] (we refer to these as $\mathrm{Txn}(0), \mathrm{Txn}(5), \ldots, \mathrm{Txn}(85)$). For each of these arrays (keeping strand specificity) we calculated the mean probe level for each 50 bp bin in the genome. Then for each BrdU-pulse interval we calculated both a Watson and a Crick genome-wide expression profile ($\mathrm{Txn}_{\mathrm{BrdU}}$) using a weighted average of all expression arrays:

$$\mathrm{Txn}_{\mathrm{BrdU}}(t) = \frac{\sum_{T \in \{0, 5, \ldots, 85\}} |t - T|^{-1}\, \mathrm{Txn}(T)}{\sum_{T \in \{0, 5, \ldots, 85\}} |t - T|^{-1}},$$

where $\mathrm{Txn}_{\mathrm{BrdU}}(t)$ represents the transcription activity during time interval $[t - 6, t + 6]$. Following this, for each 50 bp bin, we calculated the Watson and Crick $\mathrm{Txn}_{\mathrm{TRep}}$ profiles as follows:

$$\mathrm{Txn}_{\mathrm{TRep}} = \frac{\sum_{t \in \{6, 12, \ldots, 78\}} C_t\, \mathrm{Txn}_{\mathrm{BrdU}}(t)}{\sum_{t \in \{6, 12, \ldots, 78\}} C_t}.$$

Using these measures, we calculated the mean transcriptional activity moving toward and away from each putative TER, between it and its flanking origins (corresponding to co- and anti-directional transcription with respect to the incoming replication fork; Figure 3.5B and C). We found no difference between transcription activity moving in the same or opposite direction as replication forks in these regions (P-value > 0.1; signed rank test), indicating that S. cerevisiae cells have developed a mechanism to prevent collisions between converging replication and transcription complexes that is different from that observed in bacteria. To determine if the genome is organized such that highly transcribed tRNAs, Transposable Elements or Long Terminal Repeats are positioned to minimize transcription/replication complex collisions, we mapped each such feature to the regions analyzed in Figures 3.5B and C. For each feature we determined if it was replicated in the direction with or against its transcription directionality (Figure 3.5C).
We found that none of these features shows preferential orientation with respect to the replication fork (P-value > 0.1; hypergeometric test).

3.5 Chromatin Organization Correlates with Replication Timing

Chromatin modifications and structure have been shown to regulate local genomic processes (reviewed in [127]). In yeast, of the histone modifications studied, acetylation has been most closely linked with replication timing [9, 42, 79, 268], whereas in higher eukaryotes both the acetylation and methylation states within a replication domain are predictive of its replication timing schedule [213, 216]. Here we took advantage of an existing genome-wide map of yeast histone modifications (H3K9ac, H3K14ac, H4ac, H3K4me1, H3K4me2, H3K4me3 and H3K36me3 [196]) to perform a correlation-based analysis of how modifications localizing to origins affect those origins' HU-efficiency. For each of the 352 origins that mapped to an ACS identified in [57] we calculated the mean ChIP-chip probe signal within 500 bp from arrays corresponding to each modification (Figure 3.6A). To test whether any modification was predictive of HU-efficiency we performed linear regression analysis using modification signals as predictors and HU-efficiencies as the dependent variable. To take into account the possibility that a combinatorial histone code may exist, we also included an interaction term as a predictor for each pair of modifications in the regression model. With this analysis we found that no individual modification and no pair of modifications was predictive of origin function. With a simpler univariate analysis we found that the modification that correlated most closely with HU-efficiency was H3K79me3 (Spearman's Rank Correlation = 0.11).

Two recent studies have analyzed nucleosome positioning around the ACS and both have concluded that no correlation exists between nucleosome positioning and origin firing dynamics.
In one of these studies MNase in combination with high-density microarrays was used [19], and in the other single-end sequencing was performed [57]. Of the two studies, the sequencing-based approach offered a more precise nucleosome map. However, even with this analysis the nucleosome density map may not be of a high enough fidelity to identify any small differences that may exist in nucleosome placements at late vs. early origins. With single-end sequencing, to identify putative nucleosome positions one must merge two initial analyses (corresponding to the inferred left and right boundaries of each nucleosome). We reasoned that a more accurate map could be assembled by instead using paired-end sequencing. With this strategy we analyzed asynchronous and G1-arrested WT cells in duplicate as in [15]. Paired-end reads were mapped and the density of their midpoints (the mean of the paired ends' mappings) along the chromosome was calculated with a kernel density function. Peaks in this density curve that were separated by a minimum distance of 140 bp were then identified (giving an initial set of putative nucleosome positions). To filter dubious sites, for each experiment we simulated 100 such density curves by first building a mock MNase sequence library (where a read's mapped length was sampled from the real data's empirical size distribution and its genomic location was sampled randomly from the genome). For each mock dataset we applied a kernel density curve and identified peaks as described above for the real data. Peaks in the real data were only considered further if they were higher than 99% of the simulated peaks within 1000 bp of their apex. Nucleosome positions from biological replicates were then aligned with the dynamic programming alignment algorithm described in Chapter II. Only aligned positions (nucleosome positions that were inferred in both replicates) were kept for subsequent analysis.
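The midpoint-density and peak-filtering steps can be sketched as follows. This is a simplified illustration under stated assumptions: the text does not specify the kernel or bandwidth, so a Gaussian kernel with a 20 bp bandwidth truncated at four bandwidths is used here, peak selection is greedy by height, and all function names are hypothetical.

```python
import math

def midpoint_density(midpoints, length, bandwidth=20.0):
    """Gaussian kernel density of fragment midpoints, evaluated at every base."""
    density = [0.0] * length
    reach = int(4 * bandwidth)  # truncate each kernel at four bandwidths
    for mid in midpoints:
        center = int(round(mid))
        for x in range(max(0, center - reach), min(length, center + reach + 1)):
            density[x] += math.exp(-0.5 * ((x - mid) / bandwidth) ** 2)
    return density

def call_peaks(density, min_separation=140):
    """Keep the tallest local maxima that lie at least min_separation bp apart."""
    maxima = [x for x in range(1, len(density) - 1)
              if density[x] > density[x - 1] and density[x] >= density[x + 1]]
    kept = []
    for x in sorted(maxima, key=lambda pos: density[pos], reverse=True):
        if all(abs(x - k) >= min_separation for k in kept):
            kept.append(x)
    return sorted(kept)

def passes_null(height, simulated_heights, quantile=0.99):
    """True if the peak is taller than the given fraction of nearby simulated peaks."""
    if not simulated_heights:
        return True
    below = sum(h < height for h in simulated_heights)
    return below / len(simulated_heights) >= quantile
```

In use, `simulated_heights` would hold the heights of peaks called within 1000 bp of the real apex across the mock libraries, so that each real peak is compared with a local rather than a genome-wide null.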
Figure 3.6B shows the nucleosome positions that were inferred to exist within 1000 bp of each origin's ACS (ACS locations were taken from [57]). As was determined in [57], the ACS of almost all origins is nucleosome free. Furthermore, as discussed in [19], a significant variability in nucleosome positioning is observed across origins (in both asynchronous and G1-arrested cells); however, no clear correlation between positioning or NFR size and HU-efficiency is observable.

To perform a more direct comparison of nucleosome positioning as it relates to HU-efficiency, for both the asynchronous and G1-arrested datasets we developed a density curve of nucleosome locations around the ACS for the top and bottom quartile of origins (as measured by HU-efficiency; for the nucleosome density curves developed with all origins, see Supplemental Figure B.6A). Figure 3.6C shows that the size of the NFR does not appear to be different between efficient and non-efficient origins. However, the phasing of nucleosomes appears to be less defined for ACSs in the lower quartile at distances greater than 500 bp away from the ACS. For example, in both the asynchronous and G1-arrested cells the fourth nucleosome upstream of the ACS is shifted towards the ACS in the lower-quartile origins (with respect to the top-quartile origins; green arrows). Furthermore, the nucleosome upstream of this (orange arrows) is well defined in the top-quartile origins but appears to be split between two locations in the bottom-quartile origins. Finally, the third nucleosome downstream of the ACS appears almost non-existent in the bottom-quartile origins (magenta arrow). These same observations are made when replicates are analyzed individually (Supplemental Figure B.6B).
Figure 3.6: (A) Average ChIP-chip signal of histone modifications within 500 bp of each ACS-aligned origin. (B) Nucleosome positions within 1000 bp of each ACS-aligned origin in asynchronous (left panel) and G1-arrested (right panel) cells. (C) Nucleosome density (with respect to distance from the ACS) in the top- and bottom-quartile origins (as measured by HU-efficiency) in asynchronous (left panel) and G1-arrested (right panel) cells. (D) Origins sorted by their Hi-C clustering order (left panel). For each cluster, Cdc45 binding (via ChIP-chip) is shown in the second-from-left panel, average HU-efficiency is shown in the third-from-left panel and average timing (BrdU-Pulse-IP-seq) is shown in the rightmost panel.

As outlined in Chapter I, origins cluster in the nucleus prior to initiating replication. In a recent study, Duan et al. [53] produced a Hi-C dataset to capture long-range chromatin interactions (in G1-arrested yeast cells). With these data they performed a 2-D clustering of early origins (i.e., origins that are not dependent on Clb5 and are not regulated by Rad53; [159]). With this analysis they identified two clusters of origins that interact with each other. To ask whether early origins cluster separately from later origins, we performed the same analysis on the 352 origins identified with BrdU-HU; 126 origins whose defined regions (as listed in OriDB) overlapped two EcoRI or HindIII restriction fragments were not analyzed. With the Hi-C data, we first built a 2-D interaction matrix in which the value at row i and column j represents the interaction distance (a value between 0 and 4; as defined in [53]) between origins i and j. Following this, the interaction matrices for the EcoRI and HindIII datasets were summed and the 2-D clustering algorithm defined in [53] was applied. Figure 3.6D (left panel) shows that one large, well-defined cluster of 80 origins is formed.
In [53] origin clustering was performed separately using the EcoRI and HindIII datasets and only early origins were analyzed. We reason that this is why they captured two clusters of origins rather than one. Of the 226 origins that were analyzed, 25 showed Cdc45 binding, and 21 of these origins were contained in the cluster (P-value < 0.001, hypergeometric test; Figure 3.6D, second-from-left panel). Next, we calculated the mean HU-efficiency for each cluster and found the 80-origin cluster to have high HU-efficiency compared to the origins outside of the cluster (P-value < 0.001, rank-sum test; Figure 3.6D, second-from-right panel). We do note that, individually, several other clusters showed higher HU-efficiency than the large cluster, but each of these sets contained fewer than 5 origins. To determine how origin clustering in the nucleus relates to the timing profile of origins, for each time-point in the BrdU-Pulse-IP-seq dataset we calculated the mean number of reads that mapped to origins within each cluster. Figure 3.6D (right panel) shows that, in concordance with the HU results, origins within the large cluster have a very early replication timing schedule.

3.6 Chapter Summary

We have developed a high-fidelity, spatio-temporal map of S. cerevisiae chromosomal replication using BrdU-Pulse-IP-seq. With this dataset we have extracted over 150 novel termination sites and have further confirmed many existing origin locations. With a second BrdU-IP-seq dataset we have confirmed that HU-efficiencies are highly predictive of origin timing. Furthermore, we have demonstrated that for any genomic region the mean and variability of its TRep are defined by the distance to and the efficiency of its closest origin, indicating that the entire timing schedule of the genome is defined by individual origin firing times. By analyzing ORC, MCM and Cdc45 binding we have determined that ORC and Cdc45 binding are predictive of origin timing.
In higher eukaryotes local transcription activity is correlated with origin function. We have shown that this does not hold in S. cerevisiae cells. Furthermore, unlike in higher eukaryotes, where origins typically localize to the 5' ends of genes, yeast origins preferentially localize to the 3' ends of genes. Finally, unlike in bacteria, the S. cerevisiae genome is not organized to minimize collisions between replication and transcription machineries, indicating that eukaryotic cells have evolved to resolve these collisions after they have occurred.

We have also analyzed chromatin modifications and structure in relation to local replication activity. Based on the available data, we have determined that histone modifications do not correlate with replication timing. Through analysis of nucleosome positioning, we have demonstrated that some differences exist between efficient and inefficient origins in their nucleosome maps. Briefly, the phasing of nucleosomes at more efficient origins is well defined for up to 1000 bp away from the origin, while it becomes less defined at 500 bp from the ACS at less efficient origins. Finally, we have analyzed the 3-D structure of the genome and determined that a large subset of origins that bind the majority of Cdc45, and fire efficiently in HU and early in unperturbed S-phase, cluster tightly and separately from inefficient origins in G1-arrested cells. Taken together, this suggests that origin timing is likely defined by the propensity of an origin to cluster with others. Once clustered, these origins share the ability to attract limiting replication factors, which in turn allows them to fire early in S-phase.

Chapter 4: Modeling DNA Replication

4.1 Background

In Chapter I a molecular model for the replication timing schedule of the cell was introduced. Briefly, in this model each origin's firing time/efficiency is defined by its ability to attract replication factors from a limited nuclear pool (propensity for factors, PF).
Early origins are those with a high PF, and in late G1-phase and early S-phase these origins recruit the majority of the limited pool. This allows these origins to fire while, by depleting the pool of factors, concurrently reducing the probability that origins with lesser PFs fire. Eventually, as the early origins fire and their replicons terminate, their associated factors are returned to the nuclear pool and made available to the unfired, later origins. As this process continues, the later origins eventually attract enough factor(s) and fire.

In Chapter II we demonstrated that an origin's HU-efficiency correlates well with its replication timing schedule. We have shown that Cdc45 is limited in its late G1-phase localization (relative to ORC and MCM). Furthermore, the origins that Cdc45 binds in late G1-phase show high HU-efficiency and early replication. Finally, we have shown evidence suggesting that both early origin clustering and well-phased nucleosomes around the origin ACS promote HU-efficiency and early firing. Taken together, this evidence points to a molecular model where an origin's nuclear location and local nucleosome positioning, at least partially, define its PF. The clustering of several origins likely creates co-localized clusters of limiting factors, which in turn increases the probability of a clustered origin recruiting such a factor.

As outlined in Chapter I, several attempts have been made to model the yeast replication timing schedule, but all have involved assigning a density to origins' timing schedules rather than letting the timing schedule be defined by upstream events (i.e., by a molecular model driving the timing kinetics). In this chapter we present a mathematical model for replication that is based on and extends this proposed molecular model. The extensions to Rhind's model [205] that we have made are based on the observation that replication factors accumulate at replication factories [41, 244].
We hypothesize that the concentration of factors at a factory defines the firing probability of each unfired, clustered origin. We also hypothesize that when the factor concentration at a factory reaches a threshold (likely coinciding with a surplus for the unfired origins), the probability of the origins within that factory firing moves towards one. Furthermore, due to our previous observation that Cdc45 is limiting in G1- and early S-phase and the fact that Cdc45 travels with the replication fork, we hypothesize that the limiting factor(s), once associated with a fired origin, remain with its replicon until it terminates.

4.2 A Stochastic Model Of DNA Replication

The model simulates S-phase in a synchronous and genetically homogeneous cell culture by stochastically modeling DNA synthesis separately in multiple, individual cells. In each cell the linear organization of origins along the chromosomes is identical. Figure 4.1A shows two cells containing one chromosome (red solid line) and four origins (red circles); the size of each origin's corresponding circle represents its PF (note that origins have the same PF in each cell). In the model, cells release asynchronously from G1-phase; however, within a cell culture, cells share a common mean time of release (see Model Implementation). We simulate limiting factors as one protein (LF) whose concentration ([LF]) after G1-release is defined by a deterministic concentration curve that increases through time to simulate increased translation of the limiting protein(s) (in Figure 4.1A dark blue rectangles represent newly translated LF within the nucleus). This [LF] curve was tested against two alternatives; in one, [LF] was held constant throughout S-phase, and in the second, [LF] first increased and, after a specified time, decreased (to simulate protein degradation). In comparison to these other curves, the monotonically increasing curve and its associated model fit experimental data with the highest accuracy (see Model Selection).
Note that in Figure 4.1 Cell II does not begin to translate LF until after time-point II, indicating that it has released from G1-phase later than Cell I. As each cell progresses through S-phase its origins accumulate LF in their nuclear vicinity, stochastically, with a rate proportional to their PF. In Figure 4.1 dark blue solid arrows represent free LF ($LF_{free}$) being distributed to origins. The light blue rectangle at each origin represents the amount of LF it has accumulated at that time. In order for an origin to fire it must have accumulated at least one unit of LF in its nuclear vicinity (in Figure 4.1A one unit of LF is represented as the square box with the green top surrounding each origin). After an origin has accumulated a threshold amount $\theta$ of LF it fires deterministically ($\theta$ of LF is represented as the blue rectangle surrounding each origin; note $\theta > 1$). While an origin, i, holds an amount of LF such that $1 < LF_i < \theta$, it fires with a probability proportional to that amount (i.e., the more LF it has accumulated, the higher its probability of firing). After an origin has fired, the amount of LF it has accumulated exceeding one unit is returned to $LF_{free}$ and it no longer recruits LF. For example, in Figure 4.1A, in Cell I the second origin from the right has accumulated one unit of LF at time-point II and has thus gained the potential to fire. Between time-points II and III it has gained an additional amount and fired. Red arrows leading into an origin indicate that it has fired between the current and previous time-points. After firing, the LF exceeding one unit that has accumulated at the origin is released to $LF_{free}$ (dashed dark blue arrows leaving the origin and the light blue rectangle with dark blue border) and its PF is set to zero (as indicated by its corresponding circle becoming empty). The required unit of LF at a fired origin is distributed between the two resultant replication forks (note that in Figure 4.1A one unit of LF remains with the origin).
In Figure 4.1A newly synthesized DNA resulting from firing is represented by the diverging red dashed arrows emanating from the origin. Fork movement in the model is stochastic and follows a mean rate F that is shared by all regions of the genome and between cells in a culture. When a replication fork from a fired origin encounters an unfired origin (i.e., passive replication), the unfired origin's accumulated LF is returned to $LF_{free}$ and its PF is set to zero. For example, for the second-from-left origin at time-point IV in Cell I, the yellow circle indicates it has been passively replicated and the yellow arrow and yellow-bordered LF represent its released LF. When two converging forks from neighboring origins collide, they terminate and their associated LF (one-half unit per fork) is returned to $LF_{free}$. For example, the rightmost and second-from-right origins' forks collide in Figure 4.1A, in Cell I at time-point IV (green arrows and green-bordered LF represent the newly freed factors being returned to the nuclear pool). Finally, when a replication fork reaches the end of the chromosome it terminates and its associated LF (one-half unit) is returned to $LF_{free}$ (e.g., the rightmost origin's right-moving fork in Cell I at time-point V; light blue arrows and light blue-bordered LF represent the factor being returned to the nuclear pool).

We fit this model to the BrdU-Pulse-IP-seq dataset described in Chapter III using Approximate Bayesian Computation (ABC [16, 156, 198]; see Model Fitting and Selection for details). This technique requires a measure of similarity between simulated and real data. To compare simulations of a cell culture's replication schedule (produced with the above model) with the data, we perform simulated BrdU-Pulse-IP-seq experiments on individual populations. For each cell, we keep track of which genomic regions are replicated during each experimental time interval (0-12, 6-18, ..., 78-90 minutes).
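As a small illustration of this bookkeeping, the following Python sketch lists the overlapping pulse windows and reports which of them would label a region replicated at a given time. The function names and the half-open treatment of window boundaries are assumptions made for illustration, not details taken from the implementation.

```python
def pulse_windows(first_start=0, last_start=78, step=6, width=12):
    """Overlapping labelling windows: (0, 12), (6, 18), ..., (78, 90) minutes."""
    return [(start, start + width)
            for start in range(first_start, last_start + 1, step)]

def windows_labelling(rep_time, windows):
    """Indices of the windows in which a region replicated at rep_time
    (minutes after release) would incorporate BrdU; start <= time < end."""
    return [k for k, (start, end) in enumerate(windows) if start <= rep_time < end]

windows = pulse_windows()
```

Because consecutive windows overlap by six minutes, a region replicated after the first six minutes is generally recorded in two adjacent windows.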
The regions replicated during each interval are what would be experimentally extracted if the cells were pulsed with BrdU during these times (Figure 4.1B shows which regions would be extracted from the two cells shown in Figure 4.1A with BrdU-Pulse intervals of I-II, I-III, ..., VI-VII). For each interval, we combine and stochastically fragment (to a mean length of 500 bp) the replicated regions from all cells to simulate BrdU-IP and DNA shearing (Figure 4.1C). Finally, a simulated size-selection procedure is performed (selecting for 300-700 bp fragments) and the resultant reads (Figure 4.1D) are mapped and binned (50 bp bins) by chromosomal location (Figure 4.1E).

Figure 4.1: (A) Two cells with four origins (red circles indicate origin locations) are simulated, where LF begins accumulating at time I-II in Cell I and II-III in Cell II (dark blue boxes). As it accumulates, LF is distributed to origins (dark blue arrows) based on their PF (the size of each red circle indicates an origin's relative PF magnitude). After an origin has accumulated one unit of LF (blue boxes with green tops surrounding origins) it fires stochastically, but until it has fired it continues to accumulate LF. If it accumulates $\theta$ of LF (larger blue box with blue top) it fires deterministically and forks begin to emanate from it (red dashed lines; arrows indicate fork direction). When an origin is replicated passively its PF is set to zero (yellow circles) and its accumulated LF is returned to the free pool (yellow arrows and blue rectangles with yellow borders).
When a fork collides with another fork or reaches the end of the chromosome its associated LF is returned to the pool (green arrows and light blue boxes with green borders represent limiting factor being returned from fork-fork collisions, while light blue arrows and light blue boxes with light blue borders represent limiting factor being returned from forks reaching the telomere). (B) To simulate BrdU-Pulse-IP-seq, for all cells the regions that are replicated at each time are recorded, combined and segmented (C). Following this, size selection is performed (D) and the segments are then mapped (E).

4.3 Model Implementation

In the model, each origin has a genomic location and a PF. We have previously described a BrdU-IP-seq dataset that was produced with WT cells released from G1-phase into HU for one hour (Chapter III). Since HU blocks fork progression and thus prevents most passive replication, and because sequencing analysis lacks the non-linearity of peak signal observed with array-based methods (due to probe saturation), we hypothesized that sequence reads mapping to an origin would be predictive of its PF in this dataset. Thus, for each origin we assigned a location equal to its peak apex and a PF proportional to the number of reads that mapped to within 500 bp of its apex (PFs were linearly scaled to sum to one for each simulated cell). The free parameters in the model are used to define the delay/asynchrony with which cells release from G1-phase, [LF] in the nucleus after release from G1-phase, the variability with which $LF_{free}$ is distributed to origins between cells, and the rate of fork movement.

4.3.1 Cell Asynchrony

Cell asynchrony is defined by a single parameter D. At the beginning of a simulation each cell in the culture is assigned a delay $D_i \sim \text{exponential}(D)$ for $i = 1, 2, \ldots, N$, where N is the number of cells in the culture. As the simulation ensues, the [LF] of a cell, i, is kept at zero until $D_i$ minutes have passed.
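The delay rule can be sketched in Python as follows. This is a hedged illustration rather than the implementation used here: it assumes D is the mean of the exponential distribution (the text does not say whether D is a mean or a rate), the function names are hypothetical, and `linear_curve` is only a placeholder for the shared [LF] curve.

```python
import random

def sample_delays(n_cells, mean_delay, rng):
    """Draw one G1-release delay per cell; mean_delay plays the role of D
    (assumed to be the mean of the exponential, in minutes)."""
    return [rng.expovariate(1.0 / mean_delay) for _ in range(n_cells)]

def lf_in_cell(t, delay, curve):
    """[LF] in a cell at culture time t: zero before the cell's release,
    then the shared curve evaluated at the time elapsed since release."""
    return curve(t - delay) if t >= delay else 0.0

def linear_curve(elapsed):
    """Placeholder [LF] curve for illustration only (not a fitted curve)."""
    return 2.0 * elapsed

rng = random.Random(0)
delays = sample_delays(5, mean_delay=8.0, rng=rng)
levels_at_10_min = [lf_in_cell(10.0, d, linear_curve) for d in delays]
```

Cells with longer delays simply see the same curve shifted later in time, which is what produces the asynchrony within a simulated culture.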
Once $D_i$ minutes have passed, the [LF] curve $LF_C$ defined below is used to determine the [LF] of cell i. Thus, after a group of cells has been simulated for t minutes, a cell i within that culture holds $LF_C(t - D_i)$ within its nucleus if $t \geq D_i$ and 0 if $t < D_i$.

4.3.2 Concentration of Limiting Factor

All cells share the same [LF] curve, as it is deterministically defined. We have modeled [LF] with three different schemes. In the simplest scheme, $LF'_C$, we set [LF] to a constant value ($LF_{const}$) throughout S-phase ($LF'_C(t) = LF_{const}$ for $t = 1, 2, \ldots, 90$). In a second scheme, $LF''_C$, [LF] begins at an initial level $LF_O$ and then increases (as S-phase progresses) to $LF_{MAX}$. We implement this increase with a generalized logistic function (GLF; [206]). $LF_O$ and $LF_{MAX}$ are incorporated into the GLF as follows:

$$LF''_C(t) = \frac{LF_{MAX} - LF_O}{1 + e^{-LF_G (t - LF_T)}} - \frac{LF_{MAX} - LF_O}{1 + e^{LF_G LF_T}} + LF_O,$$

where $LF_G$ and $LF_T$ are the growth rate and inflection point of the GLF. The third scheme, $LF'''_C$, extends $LF''_C$ to have [LF] decrease at a specified time after reaching $LF_{MAX}$ (simulating protein degradation). For this, only one additional parameter ($LF_W$) is required, representing the amount of time that [LF] remains at its maximum level before being degraded. For $t = 0, 1, \ldots, t_{max}$, where $t_{max}$ is the smallest t such that $LF''_C(t; LF_O, LF_T, LF_G, LF_{MAX}) \geq Q_{0.99}(LF''_C(LF_O, LF_T, LF_G, LF_{MAX}))$, $LF'''_C(t) = LF''_C(t)$ ($Q_{0.99}$ represents the 0.99 quantile). From $t_{max}$ to $t_{max} + LF_W - 1$, $LF'''_C(t) = LF''_C(t_{max})$. From $t_{max} + LF_W$ until the end of S-phase, $LF'''_C$ is defined by $LF''_C(LF_O, LF_T, LF_G, LF_{MAX})$ on $-\infty, \ldots, t_{max}$, reflected about the vertical plane (at $t_{max}$) and adjusted to have a minimum of zero rather than $LF_O$. Thus, at time $t \geq t_{max} + k$ we define $LF'''_C$ as:

$$LF'''_C(t) = \frac{LF''_C([2 t_{max} + k - 1] - t) - LF_O}{LF''_C(t_{max}) - LF_O}\, LF''_C(t_{max}).$$

4.3.3 Distribution of Limiting Factor

As S-phase progresses, LF is distributed to origins using a Dirichlet r.v.
with parameters that are proportional to the origins' corresponding PFs. As described above, $PF_i$ (for $i = 1, 2, \ldots, n$, where n is the number of origins in the genome) are scaled such that they sum to one. The variance of a Dirichlet random variable decreases monotonically as the sum of its input parameters increases. Therefore, by multiplying each PF by an additional free parameter $\alpha$ to give $\{E_1, E_2, \ldots, E_n\} = \alpha\{PF_1, PF_2, \ldots, PF_n\}$, we allow the variance of the output variables $\{X_1, X_2, \ldots, X_n\}$ to be free in the model:

$$\mathrm{Var}(X_i) = \frac{\alpha PF_i(\alpha - \alpha PF_i)}{\alpha^2(\alpha + 1)} = \frac{PF_i(1 - PF_i)}{\alpha + 1}.$$

The result of this is that the variance (across a cell population) of the amount of $LF_{free}$ that is distributed to an origin is inversely related to $\alpha$. Therefore, by increasing $\alpha$ we decrease the variance of each origin's firing time. Before an origin has accumulated one unit of LF in its nuclear vicinity its probability of firing is zero. When an origin, i, has accumulated $LF_i$ where $1 \leq LF_i < \theta$, it fires according to a Bernoulli trial with success probability $[LF_i - 1]/[\theta - 1]$. If $LF_i$ reaches a value $\geq \theta$ then it fires deterministically. When i fires, the amount of LF exceeding one is returned to the nuclear pool to be recruited by other origins.
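The firing rule and the effect of scaling the Dirichlet parameters can be made concrete with a short Python sketch. It is illustrative only: the function names, the argument names `threshold` (the deterministic-firing level) and `scale` (the multiplier applied to the PFs), and the gamma-based Dirichlet sampler are assumptions introduced here, not the original implementation.

```python
import random

def firing_probability(lf, threshold):
    """Per-time-step firing probability for an origin holding lf units of LF.
    threshold is the deterministic-firing level and must exceed 1."""
    if threshold <= 1:
        raise ValueError("threshold must be greater than 1")
    if lf < 1:
        return 0.0
    if lf >= threshold:
        return 1.0
    return (lf - 1.0) / (threshold - 1.0)

def dirichlet_variance(pf, scale):
    """Variance of one Dirichlet component when the parameters are
    scale * PF_i and the PF_i sum to one."""
    return pf * (1.0 - pf) / (scale + 1.0)

def sample_dirichlet(pfs, scale, rng):
    """Draw the shares of free LF handed to each origin by normalising
    independent gamma draws (a standard way to sample a Dirichlet)."""
    draws = [rng.gammavariate(scale * pf, 1.0) for pf in pfs]
    total = sum(draws)
    return [d / total for d in draws]

shares = sample_dirichlet([0.5, 0.3, 0.2], scale=50.0, rng=random.Random(0))
```

With `scale` large the shares stay close to the PFs in every cell, while a small `scale` lets individual cells deviate strongly, which is how a single parameter controls cell-to-cell variability in firing times.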
Thus, at a given time-point, where the set Fired contains the origins that have already fired, we distribute $LF_{free}$ to origins in an iterative manner as follows:

$$
\begin{aligned}
&LF_{free} = \max\Big(L_C(t) - \sum_{i=1}^{n} LF_i,\ 0\Big)\\
&[E_1, E_2, \ldots, E_n] = \alpha\,[PF_1, PF_2, \ldots, PF_n]\\
&\textbf{while}\ (\exists\, E_i > 0 \wedge LF_{free} > 0)\\
&\quad [D_1, D_2, \ldots, D_n] \sim \mathrm{Dirichlet}(E_1, E_2, \ldots, E_n)\\
&\quad [LF_1, LF_2, \ldots, LF_n] = [LF_1, LF_2, \ldots, LF_n] + LF_{free}\,[D_1, D_2, \ldots, D_n]\\
&\quad \forall i \notin Fired:\ \textbf{if}\ \mathrm{Bernoulli}\big(\min(\theta - 1,\ \max(LF_i - 1,\ 0))/(\theta - 1)\big),\ Fired = Fired \cup \{i\}\\
&\quad LF_{free} = \max\Big(0,\ L_C(t) - \sum_{i \notin Fired} LF_i + \sum_{i \in Fired} (LF_i - 1)\Big)\\
&\quad \forall i \in Fired:\ LF_i = 1\ \&\ PF_i = 0\\
&\quad [PF_1, PF_2, \ldots, PF_n] = [PF_1, PF_2, \ldots, PF_n] \Big/ \sum_{i=1}^{n} PF_i\\
&\quad [E_1, E_2, \ldots, E_n] = \alpha\,[PF_1, PF_2, \ldots, PF_n]\\
&\textbf{end}
\end{aligned}
$$

4.3.4 Firing Times, Fork Movement and Termination

At time t, after the LF distribution stage, newly fired origins are assigned a firing time of t. We keep track of all origins' left and right forks with parameters L and R. At time t the uncollapsed forks of a fired origin (say i) travel stochastically with a mean rate F (a free parameter that cells within a culture share). $L_i$ and $R_i$ at time t + 1 are set as follows:

$$L_i(t+1) = L_i(t) - f, \qquad R_i(t+1) = R_i(t) + f,$$

where $f \sim \text{exponential}(F)$. If, at time t, two neighboring origins' forks meet (say, where i is the leftmost origin and j is the rightmost origin), we assign a termination site with linear interpolation:

$$R_i(t) = L_j(t) = R_i(t-1) + [L_j(t-1) - R_i(t-1)]\,\frac{R_i(t) - R_i(t-1)}{L_j(t-1) - L_j(t)}.$$

4.4 Model Fitting and Selection

For the three models corresponding to the [LF] schemes $LF'_C$, $LF''_C$ and $LF'''_C$, there are five, eight and nine parameters, respectively, to be fitted to the BrdU-Pulse-IP-seq data. All models include parameters D, $LF_{MAX}$ (for $LF'_C$, $LF_{const} = LF_{MAX}$), $\alpha$, $\theta$ and F. Scheme $LF''_C$ requires three additional parameters ($LF_O$, $LF_T$ and $LF_G$), while scheme $LF'''_C$ requires these three and a fourth ($LF_W$). As mentioned previously, to fit these we have applied ABC, which requires the assignment of prior distributions to each parameter.
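For readers unfamiliar with ABC, its simplest rejection form can be sketched in Python as below. The toy simulator, the absolute-difference distance and all function names are assumptions for illustration; the actual fitting uses the replication model and the binned read counts described in this chapter.

```python
import random

def abc_rejection(simulate, sample_prior, observed, distance,
                  n_draws, keep_fraction, rng):
    """Draw parameters from the prior, simulate data with each draw, and keep
    the draws whose simulated data lie closest to the observed data."""
    scored = []
    for _ in range(n_draws):
        theta = sample_prior(rng)
        scored.append((distance(simulate(theta, rng), observed), theta))
    scored.sort(key=lambda pair: pair[0])
    n_keep = max(1, int(n_draws * keep_fraction))
    return [theta for _, theta in scored[:n_keep]]

def toy_simulate(rate, rng):
    """Toy stand-in for the replication model: the 'data' are the mean of
    200 exponential fork steps with the given mean rate."""
    return sum(rng.expovariate(1.0 / rate) for _ in range(200)) / 200

rng = random.Random(1)
posterior = abc_rejection(
    simulate=toy_simulate,
    sample_prior=lambda r: r.uniform(1, 3000),  # uniform prior on the rate
    observed=1400.0,                            # illustrative summary value
    distance=lambda sim, obs: abs(sim - obs),
    n_draws=2000,
    keep_fraction=0.01,                         # keep the best 1% of draws
    rng=rng,
)
```

The retained draws form an empirical approximation to the posterior; no likelihood is ever evaluated, which is what makes the approach practical for a stochastic simulation model.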
We estimated D based on the observed budding indices of the cells when they were released from G1-phase (Supplemental Figure B.7). Some cells begin to bud at 36 minutes and virtually all have budded by 60 minutes; thus, we set the prior distribution of $D \sim \text{uniform}(0, 24)$. For $LF_{MAX}$ we reasoned that within a cell, all origins could potentially fire immediately upon release from G1-phase if $LF_O > n$. This was not observed in the data, so a value much greater than this would not be appropriate and would result in many poorly performing simulations. To prevent this we set $LF_{MAX} \sim \text{uniform}(0, 1.5n)$. We had no experimental observations with which to estimate $\alpha$, except that we had previously analyzed models where $\alpha$ was set constant at 1. Based on the observation that these simulations performed reasonably well, we chose to sample this parameter as uniform(0, 100). As was the case for $\alpha$, we had no experimental results on which to base an estimate of $\theta$. For an initial test we decided to simulate this parameter as uniform(0, 5) (as seen below, this estimate was accurate). Fork rates have been previously estimated in the range 1.5-3 kbp/min. Based on this we set a prior $F \sim \text{uniform}(1, 3000)$ (in bp/min). In G1-arrested cells 30 origins bind Cdc45 (an essential protein for firing with two copies per origin). We reasoned that $LF_O$ should be set such that a similar number of origins could be bound immediately upon entry into S-phase, so we set $LF_O \sim \text{uniform}(0, 70)$. For $LF_G$ we used the full range of likely growth rates, $LF_G \sim \text{uniform}(0, 1)$, and we sampled $LF_T$ from all times within S-phase, $LF_T \sim \text{uniform}(0, 84)$.

For each modeling scheme we simulated 1M sets of model parameters and, with each parameter vector, simulated a BrdU-Pulse-IP-seq experiment from a population of ten cells. To make comparisons between simulated and experimental data more accurate, simulated reads that mapped to repetitive regions were not considered further.
Furthermore, bin counts for each simulation were scaled so that their global sum equals that of the experimental data. We calculated each simulation's error as the global difference between experimental and simulated binned read counts. Figure 4.2 shows the error distribution for the three model schemes. The error for $LF'_C$ is noticeably higher than for both $LF''_C$ and $LF'''_C$; thus we did not consider it further. To compare $LF''_C$ and $LF'''_C$ we performed a rank-sum test on the errors of their top 1% of simulations. With this we found no significant difference between the tails of the two error curves (P-value > 0.1). Thus, for subsequent analysis we chose the simpler model ($LF''_C$) and estimated the posterior parameter distributions as the empirical distribution of the parameters from the top 1% of simulations.

The posterior distribution of D is centered on 8 minutes, which corresponds quite well with budding indices. The posterior distribution of $\alpha$ is close to the upper extremity of the prior (100), but its density appears to fall as it approaches 100, making an adjustment to the prior unnecessary. $LF_O$ shows a posterior distribution that indicates its levels are close to zero at the onset of S-phase. This may seem contradictory to the slightly higher levels of Cdc45 binding that we observe in G1-arrested cells; however, one must take into account the possibility that the amount of Cdc45 bound to origins at this stage is insufficient for firing, and also the fact that other limiting factors likely exist. As predicted with the prior distribution selection, the maximum amount of LF ($LF_{MAX}$) reaches close to the number of origins in the system. The parameter with the broadest distribution is $LF_G$, whose posterior mode is 0.2. A possible reason for this is that, for a given time-point, the same [LF] can be reached with a lesser value of $LF_G$ if it is combined with a larger $LF_{MAX}$.
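The trade-off between growth rate and maximum can be illustrated by evaluating the second-scheme curve directly. The Python sketch below assumes the curve is the shifted logistic of Section 4.3.2 (equal to $LF_O$ at t = 0); the function names and all numerical values are illustrative choices, not fitted parameters.

```python
import math

def glf(t, lf_o, lf_max, lf_g, lf_t):
    """[LF] at t minutes after release: a logistic rise with growth rate lf_g
    and inflection time lf_t, shifted so that the curve equals lf_o at t = 0."""
    span = lf_max - lf_o
    rise = span / (1.0 + math.exp(-lf_g * (t - lf_t)))
    offset = span / (1.0 + math.exp(lf_g * lf_t))
    return rise - offset + lf_o

def matching_max(target, t, lf_o, lf_g, lf_t):
    """LF_MAX for which the curve passes through target at time t, given the
    other parameters (the curve is linear in lf_max - lf_o)."""
    unit = glf(t, 0.0, 1.0, lf_g, lf_t)  # curve shape for lf_o = 0, span = 1
    return lf_o + (target - lf_o) / unit

# A faster-growing curve, and the larger maximum a slower-growing curve
# would need in order to reach the same [LF] 30 minutes after release.
fast = glf(30, lf_o=0.0, lf_max=300.0, lf_g=0.2, lf_t=25.0)
slow_max = matching_max(fast, 30, lf_o=0.0, lf_g=0.1, lf_t=25.0)
```

Because different pairs of growth rate and maximum give the same concentration at a given time, data from any one time-point cannot pin the growth rate down on their own, which is consistent with a broad posterior.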
The most defined posterior distribution is that of $LF_T$; these posterior values correspond to the GLF inflecting from exponential to logarithmic growth at 25 minutes. Finally, we extracted a fork rate with mode 1400 bp/min. This is slightly lower than other estimates and may reflect the fact that we have added some origins to the model that do not actually fire in WT cells.

Figure 4.2: For each model scheme 1M variable simulations were performed and for each simulation a mock BrdU-Pulse-IP-seq dataset was produced. (A) Error distributions for each scheme. (B) For each parameter a posterior distribution was assigned that was equal to the parameter's empirical distribution in the top 1% of simulations from A. In each plot the x-axis spans the prior probability distribution of the model parameter.

4.5 Analysis of the Fitted Model

To produce a final dataset, we simulated replication in 1M cells, where each cell's parameters were sampled from the posterior parameter vectors. The results of these simulations were then combined and processed as a single BrdU-Pulse-IP-seq dataset. A comparison of real and simulated mapped reads at each time-point shows that the model begins and completes replication at similar times to those indicated in the real data (Supplemental Figure B.8). Furthermore, the maximum replication activity peaks at approximately the same time (~36 minutes) in both datasets. Heat maps of read counts along the chromosome through time also reveal highly similar replication dynamics in the real and simulated data (Figure 4.3A; individual chromosomal plots are shown in Supplemental Figure B.5 (right panels)). To better quantify this global similarity we calculated the running correlation of bin counts across all chromosomes and through time. Disregarding repetitive regions (bins with < 90% mappability), we find that the model produces data that correlate well with experimental results (Spearman's rank correlation = 0.92).
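A minimal Python sketch of a mappability-masked Spearman correlation between real and simulated bin counts follows. It computes a single correlation over the retained bins rather than a running correlation, and the function names, the tie handling and the toy values are assumptions for illustration.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            result[order[k]] = mean_rank
        start = end + 1
    return result

def spearman(x, y):
    """Spearman's rank correlation: the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def masked_spearman(real, simulated, mappability, min_mappability=0.9):
    """Correlate only the bins whose mappability is at least min_mappability."""
    keep = [i for i, m in enumerate(mappability) if m >= min_mappability]
    return spearman([real[i] for i in keep], [simulated[i] for i in keep])

# Toy bin counts; the last bin is repetitive and is excluded by the mask.
rho = masked_spearman([5, 9, 2, 7, 100], [4, 8, 1, 6, 0], [1.0, 1.0, 0.95, 0.9, 0.5])
```

A rank-based measure is a natural choice here because it is insensitive to the overall scaling applied to the simulated bin counts.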
As demonstrated above, origin firing times dictate much (if not all) of the genome's replication schedule. To examine more closely whether the proposed model reproduces experimental firing times, we performed the same analysis on the simulated dataset that was performed for Figure 3.4B. We then sorted both the experimental and simulated temporal read counts by inherent origin efficiencies (HU read counts). Figure 4.3B shows that the simulated temporal firing schedules recapitulate the experimental schedules well (correlation of mean TReps = 0.93). It has been previously reported that later origins fire more variably with respect to time than do early origins. We find that the simulated mean TReps are highly correlated with their variance (Spearman's rank correlation = 0.91).

As simulated origins fire accurately (with respect to the experimental data), if the fork dynamics represented in the model are correct, then the sites at which their corresponding forks meet should be predictive of real termination sites. To test simulated fork dynamics, we analyzed the simulated dataset to identify TERs with the same analysis that was performed for Figure 3.3B. A heat map of the TReps surrounding the putative termination sites shows that they are flanked by earlier replicating regions, corresponding to converging replication forks traversing the flanking regions before terminating at the predicted sites (Figure 4.3C, right panel). To determine the accuracy of the predictions, we also produced a heat map representing the same regions' TReps as calculated with the experimental data (Figure 4.3C, left panel). A visual comparison reveals strikingly similar TRep profiles (simulated vs. experimental) around these sites. To quantify this similarity we aligned the collision sites identified with the experimental data with the predicted sites and determined that 96% of the predicted sites map to TERs within 10 kb (median distance = 552 bp).

An examination of the TERs that differ most significantly in their location from the predicted sites mainly reveals regions that are flanked by origins whose firing appears to be less efficient in the experimental data than predicted in the simulated data. As a result, in the experimental data the TERs are predominantly determined by more efficient origins in the neighborhood. Another phenomenon we have observed is fork stalling. An example of this is seen at ARS310, where the right fork appears to stall for 12 minutes at a tRNA shortly after leaving the origin. Fork stalling at such sites is not accounted for in the model; thus the predicted termination site between ARS310 and ARS313 is shifted to the right with respect to the true site of fork collision (Figure 4.4).

Figure 4.3: 1M parameter vectors were sampled from the posterior parameter distributions to produce a final simulated dataset. (A) A heat map of real and simulated read counts on each chromosome at each time-point. Chromosomes are separated by yellow lines. (B) For each origin, real and simulated read counts (within 500 bp) were calculated at each time-point to get a replication timing schedule. Origins were then sorted based on their HU-efficiency. (C) Sites where collision probabilities are high (according to the model) are compared to the same sites in the experimental data. Origin locations are highlighted with magenta circles.

Figure 4.4: Distances between TERs predicted using simulated vs. real data were calculated and sites where large discrepancies existed were analyzed empirically. An example is shown here of fork stalling (which is not accounted for in the model) causing a shift of the experimentally predicted TER relative to the TER predicted with the simulated data.

4.6 Chapter Summary

We have proposed a model of the molecular kinetics that drive the spatio-temporal schedule of yeast chromosomal replication. In this model, origins must accumulate replication factors from a limited nuclear pool in order to fire. The propensity of individual origins to accumulate these factors (which is likely defined by nuclear location and co-localization with other origins) defines the distribution of their firing times. Origins that are able to recruit sufficient amounts of the limiting pool ensure their early firing while simultaneously reducing the probability that other origins fire (by reducing their available pool of limiting factors). After an origin has accumulated a concentration of limiting factor greater than a minimum threshold, it fires with a probability proportional to this concentration. Eventually, if the origin has not fired and the amount of factors it has accumulated in its vicinity reaches a maximum threshold, it fires deterministically. After firing, the limiting factors travel with the replication fork and are released when the associated replisome terminates. After this event, the associated limiting factors are returned to the free pool to be recruited by other unfired origins.

We have computationally implemented this model, giving each origin a propensity proportional to its HU-efficiency. We have determined that for the model to recapitulate experimentally observed firing schedules, the limiting factor in the system must be negligible at the onset of S-phase but must then increase to an amount roughly equivalent to the number of origins in the genome. With this model we are able to reproduce experimentally observed firing schedules of origins and also observe that the variance of origin firing times positively correlates with mean firing time. Furthermore, we are able to predict TERs that correspond well with others that have been experimentally defined and predict many new TERs.
Chapter 5: Rpd3 Co-regulates Replication Origin Firing and Flanking Transcriptional Activity

Chapter Disclosure: This work has been published in Knott, S.R.V., Viggiani, C.J., Tavaré, S. and Aparicio, O.M. Genome-wide replication profiles indicate an expansive role for Rpd3L in regulating replication initiation timing or efficiency, and reveal genomic loci of Rpd3 function in Saccharomyces cerevisiae. Genes Dev, 23: 1077-90, 2009. Christopher J. Viggiani performed all BrdU-IP-chip experiments except those corresponding to the ume6Δ and ash1Δ datasets.

5.1 Background

The finding in Chapter III, that histone modifications (specifically acetylation) do not correlate with the HU-efficiency of origins, contradicts multiple other reports suggesting that HDACs disrupt origin function. Specifically, in yeast, when the HDAC Rpd3 is deleted, significantly earlier initiation of at least some non-telomeric, late-firing origins occurs, along with increased acetylation of histones flanking these origins [9,268]. Furthermore, targeting of a HAT adjacent to a late-firing origin advances its time of initiation [79, 268]. The recent description of two functionally distinct Rpd3 complexes, large (Rpd3L) and small (Rpd3S), presented the opportunity to elucidate more clearly the mechanism of Rpd3's effect on replication timing [33, 109, 120] and resolve this contradiction.

Rpd3L represents the previously characterized transcriptional regulator, which is recruited by sequence-specific DNA-binding proteins such as Ume6 to promoters, where this complex typically represses gene expression by deacetylating proximal histones [110, 111, 215]. Unlike the large complex, Rpd3S is nonspecifically recruited to actively transcribed regions, where it deacetylates chromatin in the wake of the transcription elongation machinery and suppresses spurious transcription initiation from cryptic start sites within ORFs [33, 109, 139].
Conceivably, Sin3-Rpd3 may affect replication timing through Rpd3L's function as a promoter-specific regulator of gene expression, and/or through Rpd3S's function in condensing chromatin within transcribed regions that flank origins in the proximal intergenic regions. In fact, Set2, which recruits Rpd3S to chromatin, has been suggested to play a role in negatively regulating DNA replication [23].

Sequence-specific DNA-binding proteins normally target Rpd3 to specific promoters to regulate transcription; however, its mechanism of targeting to origins remains unclear. Deletion of the gene-specific repressor Ume6, which recruits Rpd3, does not advance the timing of late origin ARS603, which is proximal to a potential Ume6-binding site, whereas deletion of RPD3 does [9]. Because of the apparent lack of correlation between gene expression levels and replication timing in yeast, and because deletion of RPD3 affects chromatin acetylation throughout extensive regions of the genome and is not restricted to promoters of regulated genes, it has been suggested that Rpd3 acts in an untargeted manner to affect origins [268].

Previous studies have addressed the genome-wide function of Rpd3 in transcriptional control through analysis of gene expression levels in rpd3Δ and sin3Δ cells, and analysis of Rpd3 and Sin3 chromatin binding by chromatin immunoprecipitation (ChIP) [21, 62, 83, 132, 207, 217]. In contrast, the genome-wide scope of origin deregulation in rpd3Δ cells is unknown.

5.2 Rpd3 Regulates the Initiation of Many Replication Origins

It has previously been demonstrated that deletion of RPD3 advances the initiation timing of specific, late-firing origins, which allows these origins to escape inhibition by the HU-induced intra-S phase checkpoint [9]. This previous study also suggested that the effect of Rpd3 on replication timing might be quite broad across the genome, as deletion of RPD3 suppresses the defect in late origin activation and the resulting slow S phase of clb5Δ cells.
To explore the genomic scope of Rpd3's influence over origin function, we generated BrdU replication profiles in HU for rpd3Δ cells, in quadruplicate, as described above for wild-type cells. We identified 304 significant BrdU peaks, including 282 known origins as well as 22 peaks at loci previously predicted only in single studies according to OriDB. An additional 41 significant BrdU peaks that did not align with any previously predicted origins are reported in Supplemental Table A.3, but have not been analyzed further. Whereas most large peaks representing early-firing origins are very similar in size in the two strains, there are numerous smaller (or absent) BrdU peaks in wild-type cells that are larger in rpd3Δ cells (Figure 5.1A-C and Supplemental Figure B.9). For instance, we described previously the advanced firing, and resulting escape from HU-induced inhibition, of the late origin ARS603 in strains lacking RPD3. The wild-type replication profile shows slight but significant incorporation of BrdU, consistent with this late origin firing inefficiently in HU (Figure 5.1A); the rpd3Δ replication profile, however, shows a significantly greater BrdU peak height (empirical-Bayes-moderated t-test, P < 0.001). Similarly, the data show significantly larger BrdU peaks at other late origins previously identified as advanced-firing in rpd3Δ cells, including ARS501 and ARS1413 (Figure 5.1B,C). Thus, increases in BrdU peak heights at these origins in rpd3Δ cells reflect their advanced initiation, resulting in more efficient firing in HU.

Global comparison of corresponding BrdU peak heights between strains reveals a total of 104 BrdU peaks that are significantly different in height (empirical-Bayes-moderated t-test, P < 0.001) in the rpd3Δ strain, including 53 origins that failed to initiate (or initiated below the level of detection) in the wild-type strain (Figure 5.1E and Supplemental Table A.4). Data points significantly above the diagonal reflect increased levels of origin firing in the rpd3Δ cells.
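The peak-height comparisons above rely on an empirical-Bayes-moderated t-test. The text does not specify the software or the prior, so the following Python sketch only illustrates the standard form of the moderated statistic, in which each peak's variance is shrunk toward a prior variance before the t-statistic is formed. The function name moderated_t is hypothetical, and the prior degrees of freedom d0 and prior variance s0_sq are passed in by hand here, whereas a full analysis would estimate them from all peaks.

```python
import math

def moderated_t(mutant, wild_type, d0, s0_sq):
    """Two-group moderated t-statistic for one peak (illustrative sketch).

    d0 and s0_sq are the prior degrees of freedom and prior variance. In an
    empirical-Bayes analysis they are estimated from all peaks; supplying
    them directly is an assumption made to keep the example short.
    """
    n1, n2 = len(mutant), len(wild_type)
    m1, m2 = sum(mutant) / n1, sum(wild_type) / n2
    ss = sum((x - m1) ** 2 for x in mutant) + sum((x - m2) ** 2 for x in wild_type)
    d = n1 + n2 - 2                                   # residual degrees of freedom
    s_sq = ss / d                                     # ordinary pooled variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)   # variance shrunk toward the prior
    t = (m1 - m2) / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    return t, d0 + d                                  # statistic and its degrees of freedom
```

With d0 = 0 the function reduces to the ordinary pooled-variance t-statistic. A larger d0 pulls peaks with few replicates toward the prior variance, which stabilizes the test when only duplicate or quadruplicate measurements are available.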
Rpd3-regulated origins are overrepresented in their dependence on Clb5; of the 104 Rpd3-regulated origins, 74 are located within regions whose replication timing schedules were previously reported as being dependent on Clb5 action [159] (hypergeometric test, P = 3.8 × 10^-5). Interestingly, the BrdU peak at ARS1206 is significantly smaller in the rpd3Δ strain, suggesting that the deletion of RPD3 has a unique effect on this origin.

Figure 5.1: Early S-phase replication profiles identify Rpd3-regulated origins. Wild-type (WT) and rpd3Δ cells blocked in G1 phase with α-factor were synchronously released into fresh medium containing HU plus BrdU for 1 h and harvested for BrdU-IP-chip analysis. Data were processed as described in the Materials and Methods; Chromosomes VI (A), V (B), and XIV (C) are shown. Individual peaks are denoted with gray dots, and peaks that are significantly different in height (P < 0.001) between the strains are denoted with red dots; origins discussed in the text are indicated. (D) Horizontal lines in the plots indicate the lower quartile, median (red line), and upper quartile of wild-type BrdU peak heights assigned to the corresponding TRep quartile (as indicated on the X-axis). Outliers are displayed with a red plus (+) sign. (E) BrdU peak heights determined for each origin in the rpd3Δ strain are plotted against the corresponding peak heights in the wild type; peaks that are significantly different in height in the rpd3Δ cells are indicated with red dots.

Only two significant BrdU peaks are detected at telomere-proximal origins (defined here as within 20 kb of a telomere) in wild-type cells (one is deregulated in rpd3Δ cells), consistent with the late and/or inefficient nature of subtelomeric origins. However, because of this small number and because probe coverage of origins annotated to these regions is
poor (due to lack of sequence specificity), limiting the number of origins available for analysis, we cannot make a strong conclusion about the role of Rpd3 in subtelomeric regions. In summary, this analysis identifies 104 origins that incorporate BrdU to a significantly different degree in rpd3Δ cells, indicating that the Sin3-Rpd3 complex impacts the initiation timing and/or replication efficiencies of many origins genome-wide.

5.3 Rpd3S Significantly Modulates the Initiation of Only a Few Selected Replication Origins

Rpd3S deacetylates histones in the wake of the transcription elongation machinery (for review, see [136]). Disruption of Rpd3S results in histone hyperacetylation of much of the genome [139, 203], suggesting that Rpd3S might be responsible for the Rpd3-dependent delay of replication initiation. To test the role of Rpd3S in modulating origin firing, we generated replication profiles in mutant strains that specifically disrupt the chromatin recruitment of Rpd3S (set2Δ, eaf3Δ, and rco1Δ) without affecting the function of Rpd3L. Visual examination of the replication profiles indicates that the set2Δ, eaf3Δ, and rco1Δ profiles more closely resemble wild-type than the rpd3Δ profiles (Figure 5.2A; Supplemental Figure B.10A). This suggests that Rpd3S does not mediate most of the observed effects of Rpd3 on origin activity.

Global comparison of BrdU peak heights in each of the Rpd3S mutant strains versus wild-type confirms that disruption of Rpd3S function does not deregulate origin initiation as RPD3 deletion does (cf. Figures 5.3A-C and 5.1E). In contrast to the rpd3Δ mutant profile, only six BrdU peaks were significantly greater in the set2Δ, eaf3Δ, and rco1Δ strains (empirical-Bayes-moderated t-test, P ≤ 0.001; t-tests were performed with data pooled from duplicate set2Δ, eaf3Δ, and rco1Δ experiments; Supplemental Table A.5). To ensure that these individual mutations were fully inactivating Rpd3S function we also analyzed eaf3Δ set2Δ and eaf3Δ rco1Δ strains. The results show replication profiles strikingly similar to those of the single mutants (cf. Figure 5.2A and Supplemental Figure B.10A-B; cf. Figure 5.3A-C and Supplemental Figure B.11A-B). Taken together, these results indicate that Rpd3S modulates the activation of a handful of origins, but is not responsible for the much more extensive Rpd3-dependent regulation of origin firing demonstrated above.

Figure 5.2: Early S-phase replication profiles identify Rpd3S- and Rpd3L-regulated origins. eaf3Δ (A), dep1Δ (B), and dep1Δ eaf3Δ (C) cells were analyzed as described in the legend for Figure 5.1, and the resulting data for Chromosome XIV are shown overlaid with the wild-type (WT) and rpd3Δ profiles. Peaks meeting significance criteria for initiation in rpd3Δ cells are indicated with green dots.

Whereas disruption of Rpd3S function results in only a small number of origins that are significantly deregulated, our whole-genome analysis reveals slight but reproducible increases in BrdU incorporation at many origins in these mutants (Figure 5.2A, 5.3A-C; Supplemental Figures A.6A, A.7A,B); a Wilcoxon signed rank test on the differences between BrdU peak heights in Rpd3S mutants and wild-type cells confirmed the significance of this global difference (P = 6.17 × 10^-31). This indicates that, in addition to the primary role in delaying a handful of late origins, the chromatin modifications by Rpd3S have a minor, global effect on origin timing or efficiency. Moreover, the well-studied late origin ARS603 is among the six origins significantly deregulated in the Rpd3S mutant strains (Supplemental Table A.4). These findings underscore the value of employing whole-genome approaches to identify subtle but pervasive differences in replication timing or efficiency, and reveal the limits of relying on representative origins (whose regulation may in fact be somewhat unique) as indicative of an entire class.
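The global shift in peak heights described above was assessed with a Wilcoxon signed rank test on paired mutant and wild-type peak heights. As an illustration only, the Python sketch below computes the signed-rank statistic with a normal approximation; it is not the code used for the reported P-value, the function name wilcoxon_signed_rank is hypothetical, and the variance is not corrected for ties, which is a simplifying assumption.

```python
import math

def wilcoxon_signed_rank(mutant, wild_type):
    """Paired Wilcoxon signed rank test, normal approximation (illustrative sketch).

    Zero differences are dropped and tied absolute differences receive
    average ranks. No tie correction is applied to the variance.
    Returns (W+, z, two-sided p).
    """
    diffs = [m - w for m, w in zip(mutant, wild_type) if m != w]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))    # two-sided normal tail probability
    return w_plus, z, p
```

Because the test uses only the signs and ranks of the paired differences, many small but consistent increases across origins can yield a very small P-value even when few individual peaks pass a per-origin significance threshold.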
Figure 5.3: Global comparison of BrdU peak heights in Rpd3S and Rpd3L mutants with corresponding wild-type peaks. BrdU peak heights for each origin in rco1Δ (A), eaf3Δ (B), set2Δ (C), dep1Δ (D), cti6Δ (E), dep1Δ eaf3Δ (F), ume6Δ (G), and ash1Δ (H) mutant strains are plotted against the corresponding BrdU peak heights in wild-type (WT) cells; peaks that are significantly different in height from the wild type are indicated with red dots.

5.4 Rpd3L Mediates the Rpd3-dependent Effect on Replication Origin Firing

To determine Rpd3L's role in regulating origin initiation, we specifically disrupted Rpd3L by deleting DEP1 or CTI6, two components unique to the large complex, and performed BrdU-IP-chip analysis in HU as in the previous experiments. The replication profiles of dep1Δ and cti6Δ mutants closely resemble those of rpd3Δ cells, with significant increases in most of the same affected origins (Figure 5.2B; Supplemental Figure B.10B). Global comparison of BrdU peak heights illustrates many significantly larger peaks in dep1Δ and cti6Δ cells, reminiscent of rpd3Δ cells (Figure 5.3D,E); here the t-tests are based on quadruplicate experiments in the dep1Δ strain (Supplemental Table A.6). For instance, 94 peaks are significantly larger in the dep1Δ strain than in the wild-type strain, and 81 of these are also deregulated in rpd3Δ cells (hypergeometric test, P = 7.96 × 10^-14). These results strongly suggest that Rpd3L mediates the widespread effect on replication initiation timing or efficiency that we demonstrated for Rpd3.

To compare objectively the replication profiles for all mutant strains, we systematically determined how each mutant profile differed from the wild-type profile. Wild-type peak heights were subtracted from each of the corresponding mutant peak heights, and Pearson's correlation coefficient (ρ) was used to measure pairwise correlations between the resulting differences (Figure 5.4).
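The comparison just described (subtract wild-type peak heights from each mutant, then correlate the resulting difference vectors pairwise) can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code behind Figure 5.4; the function names and the dictionary layout are assumptions, and any strain names used with it in examples would be placeholders.

```python
import math

def difference_vector(mutant_heights, wt_heights):
    """Mutant minus wild-type peak height at each origin (same origin order)."""
    return [m - w for m, w in zip(mutant_heights, wt_heights)]

def pearson(x, y):
    """Pearson's correlation coefficient between two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(mutant_profiles, wt_heights):
    """Pairwise correlations between difference vectors.

    mutant_profiles maps a strain name to its list of peak heights.
    Returns a dict keyed by (strain_a, strain_b).
    """
    diffs = {name: difference_vector(h, wt_heights)
             for name, h in mutant_profiles.items()}
    return {(a, b): pearson(diffs[a], diffs[b]) for a in diffs for b in diffs}
```

Correlating differences from wild type, rather than raw peak heights, removes the strong shared signal from early, efficient origins that fire similarly in every strain, so the coefficient reflects only how each mutant departs from wild type.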
This analysis yields significant correlations between the Rpd3L mutant strains, dep1Δ and cti6Δ, and the rpd3Δ strain (ρ = 0.88 for dep1Δ vs. rpd3Δ; ρ = 0.74 for cti6Δ vs. rpd3Δ), demonstrating that replication patterns in these strains differ from wild-type in similar ways and that most of the same origins are deregulated in each. The slightly weaker correlation between cti6Δ and rpd3Δ probably reflects incomplete disruption of Rpd3L function by CTI6 deletion (M. Carrozza and J. Workman, pers. comm.). The data also show significant correlations (mean ρ = 0.87) between all of the Rpd3S targeting mutant strains (set2Δ, eaf3Δ, rco1Δ, set2Δ eaf3Δ, and eaf3Δ rco1Δ), indicating that these mutations all result in replication profiles that differ from the wild-type in similar ways. However, these mutant profiles do not correlate well with the rpd3Δ profile (ρ < 0.5), confirming that the Rpd3S mutant strains do not deregulate origin firing to the extent that RPD3 deletion does.

Deletion of both large and small complex subunits even more closely phenocopies rpd3Δ, as deletion of EAF3 in either dep1Δ or cti6Δ results in replication profiles that correlate more strongly with rpd3Δ than the respective Rpd3L single mutant strains (Figure 5.2C, 5.3F, Supplemental Figure B.10B and A.11C). Strains with deletion of RPD3 together with an Rpd3S- (eaf3Δ rpd3Δ and set2Δ rpd3Δ) or Rpd3L- (cti6Δ rpd3Δ) specific gene show very high correlations with rpd3Δ, indicating that these genes function with RPD3, rather than independently (Figure 5.4, Supplemental Figures A.10C and A.11D-F). In summary, these data demonstrate that Rpd3L plays the predominant role in modulating the initiation of Rpd3-regulated late origins, whereas Rpd3S plays the predominant role at very few sites, but has a minor repressive effect on origins globally.

Figure 5.4: Pairwise correlation analysis comparing Rpd3S and Rpd3L replication profiles. Wild-type peak heights were subtracted from the corresponding peak heights of each mutant to yield a difference vector for every strain.
The full pairwise Pearson's correlation matrix for the set of mutant strains was calculated using these difference vectors.

5.5 Deletion of Putative Rpd3L-Targeting Factors Deregulates Few Rpd3L-Regulated Replication Origins

Our previous analysis of the Rpd3-targeting factor Ume6 indicated that UME6 deletion does not advance firing of ARS603, whereas deletion of RPD3 does [9]. However, as mentioned above, ARS603 firing is delayed by Rpd3S in addition to Rpd3L, and hence a role for Ume6 may have been obscured. Therefore, we reexamined the role of Ume6 in Rpd3L targeting by generating replication profiles in triplicate in ume6Δ cells for comparison with Rpd3L mutant profiles. Consistent with our previous study, deletion of UME6 did not alter late origin firing as RPD3 deletion does (Figure 5.3G; Supplemental Figure B.10D). In ume6Δ cells, only one BrdU peak is significantly increased relative to wild-type, whereas five BrdU peaks are significantly reduced (empirical-Bayes-moderated t-test, P ≤ 0.001), suggesting that these five origins fire less efficiently or that their initiation timing is delayed when UME6 is deleted. These results suggest that Ume6 plays little or no role in recruiting Sin3-Rpd3, at least with respect to origin regulation. However, an alternative explanation is that Ume6 mediates recruitment of additional factors that act in opposition to or independently of Rpd3. Indeed, Ume6 recruits both Isw2 and Rpd3 complexes to repress URS1-containing genes, such that deletion of both regulators is required for full derepression [77]. Thus, deletion of UME6 has effects distinct from RPD3 deletion on transcriptional repression, and this may explain the different effects on origin firing observed here.

We also tested the role of Ash1, a sequence-specific DNA-binding protein, which stably associates with Rpd3L and may participate in its recruitment to promoters [34].
We generated replication profiles in triplicate for ash1Δ strains and found little similarity between the ash1Δ and rpd3Δ (or dep1Δ) replication profiles (Figure 5.3H; Supplemental Figure B.6D). In ash1Δ cells, two BrdU peaks (ARSXII-199 and ARSXIII-482) are significantly larger (empirical-Bayes-moderated t-test, P ≤ 0.001) than the corresponding wild-type peaks. Interestingly, these origins are also significantly deregulated in the rpd3Δ and dep1Δ strains, and Ash1 appears to bind within the intergenic region occupied by ARSXII-199 (P-value = 0.072; [83]). These results are consistent with Ash1 recruiting Rpd3L to a subset of its targets, while Rpd3L is recruited to many sites by factors other than Ume6 and Ash1.

5.6 Rpd3L-Regulated Replication Origins are Locally Associated with Sin3-Rpd3-Regulated Transcription and Chromatin Binding

Our results suggest that the function of Rpd3L as a targeted regulator of gene expression through histone deacetylation generally delays or impedes the activity of proximal origins. To confirm that changes in histone acetylation were associated with origin deregulation in rpd3Δ cells, we compared the locations of deregulated origins in the rpd3Δ replication profile with published data on histone acetylation changes in rpd3Δ cells [210]. Of the 304 origins that fired in rpd3Δ cells, acetylation data were available for 229 origins (within 500 base pairs [bp] of the 5' or 3' end of the ARS as defined in OriDB), which included 78 origins that incorporated BrdU more robustly. Of these 78, 63 exhibited at least twofold increased acetylation of histone H3K18, H4K5, or H4K12, indicating a significant association between locally increased histone acetylation and origin deregulation (hypergeometric test, P = 0.0016) (Figure 5.5A; Supplemental Table A.7).

Figure 5.5: Rpd3L-regulated origins are associated with Rpd3-regulated genes. (A) Venn diagram of overlap between (1) origins deregulated in rpd3Δ cells and (2) regions that show increased acetylation in rpd3Δ cells. (B) Venn diagram of overlap between intergenic regions containing origins that are (1) deregulated in dep1Δ cells, (2) flanked by genes deregulated in rpd3Δ or sin3Δ cells, and (3) flanked by genes that bind Rpd3 or Sin3 in their promoter regions.

To examine the relationship of origin deregulation with transcription, we compared the locations of deregulated origins in the dep1Δ replication profile with published, genome-wide gene expression profiles of rpd3Δ and sin3Δ cells and genome-wide chromatin-binding maps of Rpd3 and Sin3 [21,62,207,217]. We focused this analysis on 241 confirmed origins whose precise intergenic locations are known (compiled at OriDB), allowing the unambiguous assignment of each origin to the flanking genes. The results of this analysis are depicted in Figure 5.5B. Of these 241 origins, 57 are significantly deregulated in dep1Δ cells. Rpd3 and/or Sin3 binding (P < 0.05) has been reported at the promoters of genes adjacent to 45 of the 241 origins, and 17 of the 57 deregulated origins. A hypergeometric test indicates significant partial association (P = 0.014) of deregulated origins with proximal genes that bind Sin3-Rpd3 in their promoters.

Next, we compared the locations of deregulated origins with the locations of Sin3-Rpd3-regulated genes. Of the 241 origins, 68 are adjacent to at least one gene whose expression level has been reported to increase (P < 0.05, or twofold increased depending on the data source) in rpd3Δ or sin3Δ cells; 23 of these 68 origins are significantly deregulated in dep1Δ cells. A hypergeometric test reveals a significant partial association (P = 0.017) between deregulated origins and flanking genes whose expression levels are derepressed in rpd3Δ or sin3Δ cells.
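The two association tests above can be reproduced approximately from the counts given in the text. The Python sketch below computes a one-sided upper-tail hypergeometric probability; the choice of a one-sided test is an assumption (the text does not state it), and the function name hypergeom_upper_tail is hypothetical.

```python
from math import comb

def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are taken without replacement
    from `population` items, `successes` of which carry the feature."""
    total = comb(population, draws)
    upper = min(successes, draws)
    favourable = sum(comb(successes, i) * comb(population - successes, draws - i)
                     for i in range(observed, upper + 1))
    return favourable / total

# Counts taken from the text: 241 origins, 57 deregulated in the dep1 deletion strain.
# 45 origins flank a Sin3-Rpd3-bound promoter, 17 of which are deregulated.
p_binding = hypergeom_upper_tail(241, 45, 57, 17)
# 68 origins flank a gene derepressed in rpd3 or sin3 deletion cells, 23 deregulated.
p_expression = hypergeom_upper_tail(241, 68, 57, 23)
```

Under the one-sided assumption, both probabilities should fall in the same range as the reported P = 0.014 and P = 0.017. Exact integer arithmetic via comb avoids the rounding problems that arise when factorials of this size are evaluated in floating point.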
Finally, in an effort to identify additional factors involved, potentially as Rpd3L-targeting factors, we examined whether other transcription factors were overrepresented near Rpd3L-regulated origins. We conducted Random Forest regression analysis to identify factors associated with promoters of genes adjacent to Rpd3-regulated origins, using concatenated ChIP-chip data for 203 yeast transcription factors and six chromatin remodeling factors (including Rpd3 and Sin3) [83,207]. As the Rpd3 and Sin3 promoter binding reported in the ChIP-chip study likely reflects that of Rpd3L, we again chose the deregulated origins in the dep1Δ strain as our response variable to be compared against the DNA-binding factor data. For this analysis, we used 200 confirmed origins whose precise intergenic locations are known and that are present in the transcription factor data sets; 57 of these origins are deregulated in dep1Δ cells.

Of the 209 DNA-binding factors examined, six are identified as significantly associated with origin deregulation (Supplemental Table A.8). Importantly, this analysis identifies the binding of Rpd3 and Sin3 as most predictive of Dep1-dependent origin regulation, independently confirming the results of the previous association analysis. Rap1, Sum1, Smp1, and Swi6 also are overrepresented at genes adjacent to Dep1-regulated origins, suggesting that each of these factors may impact origin regulation. In addition, these findings suggest a functional link between each of these factors and Sin3-Rpd3, potentially as recruitment factors or co-regulators.

5.7 Chapter Summary

The HDAC Rpd3 acts in two distinct complexes (Rpd3S and Rpd3L). Rpd3S acts broadly throughout the genome at ORFs to prevent spurious transcription, while Rpd3L is directly targeted to gene promoters, where it acts as a classical transcription regulator. With BrdU-IP-chip we analyzed the replication profiles of WT and rpd3Δ cells as well as cells with targeted deletion of the Rpd3S and Rpd3L complexes.
With this analysis we found that approximately 1/3 of origins are repressed by Rpd3 action. The majority of these are less efficient origins that are dependent on Clb5. Rpd3S targeted deletion caused only subtle effects on origin firing activity. Of the >300 origins analyzed, <5 are regulated by Rpd3S in a significant way. In contrast, Rpd3L targeted deletion mimicked most of the effects seen in rpd3Δ cells, indicating that Rpd3 exerts its effects on origins through this targeted complex. Furthermore, when both Rpd3S and Rpd3L action were removed (without deletion of RPD3), replication profiles even more closely resembled those seen in rpd3Δ cells.

An analysis of chromatin modifications revealed that Rpd3-regulated origins are almost exclusively contained in regions that Rpd3 deacetylates in WT cells. Furthermore, Rpd3-regulated origins were found to overlap with Rpd3 binding sites in a modest but statistically significant manner, indicating that Rpd3 acts directly at origins to influence their activity. In addition, a statistically significant number of genes that flank Rpd3-regulated origins were found to be Rpd3-regulated themselves.

Chapter 6: Forkhead Transcription Factors Regulate Replication Origin Firing Through Long Range Chromatin Interactions

Chapter Disclosure: Jared M. Peace performed several of the BrdU-IP-seq experiments, the nucleosome positioning experiments, and the RNA-seq and RNA-Pol-II ChIP-seq experiments. A. Zachary Ostrow performed the Fkh1 and Fkh2 ChIP-chip experiments as well as the targeted mutagenesis BrdU-IP-seq experiments. Reza Kalhor performed the amino acid alignment of the yeast forkhead proteins with the FoxP proteins and also modeled Fkh1's structure.

6.1 Background

During investigations into Rpd3's action on origins we performed regression analysis to determine which other transcription factors were present in high abundance at Rpd3-regulated origins.
Through this analysis we identified Forkhead 1 (Fkh1) and Forkhead 2 (Fkh2) as origin-associated DNA-binding proteins. An overabundance of Fkh1 and Fkh2 consensus sequences at origins has also been reported previously [118]. Forkhead transcription factors compose a large eukaryotic gene family playing diverse roles in gene regulation relevant to development and cancer [82]. Fkh1 and Fkh2 have been characterized as broad transcriptional regulators with functions in activation as well as repression. They play both redundant and opposing roles in regulating the expression of CLB2, and in fkh1Δ fkh2Δ strains the cell cycle-regulated transcription of the CLB2 cluster of genes is obliterated [89, 131, 195, 294] and cells grow in a pseudohyphal manner. In addition, both proteins have been reported to associate with the transcription apparatus to regulate elongation and termination, playing opposing roles here as well [169]. However, neither Fkh1 nor Fkh2 has been directly implicated in regulating replication.

6.2 Fkh1 and Fkh2 Regulate the Firing Dynamics of Non-Centromeric Replication Origins

To examine replication dynamics of Fkh1- or Fkh2-depleted cells, we have employed BrdU-IP-seq to analyze WT, fkh1Δ, fkh2Δ and fkh1Δ fkh2Δ cells. In addition, to analyze fkh1Δ fkh2Δ cells more easily (their pseudohyphal growth makes them difficult to arrest), we have introduced a high-copy vector harboring a C-terminally truncated Fkh2 into fkh1Δ fkh2Δ cells (fkh1Δ fkh2Δ + pFKH2ΔC), as this has been reported to suppress the abnormal growth (which we have confirmed).

For each cell type we performed the same analysis in HU that was described for Figures 3.4A and F, to allow for a full analysis of each origin's replication initiation without interference from neighboring fork movement. Figure 6.1A shows the Chromosome III and XII replication dynamics of all strains (see Supplemental Figure B.12 for all chromosomes).
In this analysis (on these chromosomes) fkh2Δ cells look similar to WT cells. In contrast, fkh1Δ cells show both increases and decreases in specific origin efficiencies. This is interesting because, as mentioned previously, Fkh1 is generally viewed as an inhibitor. The most striking phenotype occurs in fkh1Δ fkh2Δ and fkh1Δ fkh2Δ + pFKH2ΔC cells, in which one population of origins shows increases in HU-efficiency (indicated with green circles; identified as sites with significant differences in counts within 500 bp between WT and mutant cells; DESeq FDR < 0.005; [8]) and another population shows decreases (indicated with red circles). Included in the origins that decrease in their HU-efficiencies are ARS305 and ARS607, which are highly efficient origins.

To grasp better the genome-wide scope of forkhead regulation of origins, we have analyzed all BrdU peak counts (measured by the total read count within 500 bp of a peak apex) and performed hierarchical clustering of these peak counts. A heat map of the clustered matrix is shown in Figure 6.1B. We see that the removal of Fkh2 results in very little phenotypic change; the WT and fkh2Δ peak profiles are most similar to one another. In contrast, fkh1Δ cells show an intermediate phenotype, clustering between WT and the two double mutant cell types. Deletion of FKH1 alone causes 35 origins to decrease and 15 origins to increase in their HU-efficiencies (Figure 6.1C,D and Supplemental Table A.9). Again, the most striking phenotype is that of the fkh1Δ fkh2Δ and fkh1Δ fkh2Δ + pFKH2ΔC cells, which show a "switching" of HU-efficiencies (where HU-efficient origins become inefficient and HU-inefficient origins become efficient) at 200 of the 352 origins analyzed. In total, 106 origins show decreases in their HU-efficiency in fkh1Δ fkh2Δ cells, with 95 exhibiting this phenotype when pFKH2ΔC is added (Figure 6.1C, Supplemental Table A.9). Of the origins that show decreases in their efficiency (which from here on we term Fkh-excited origins), 32 show regulation by Fkh1 alone. Almost an equal
Almost an equal 116 Figure 6.1: (A)Forkhead deletion mutants were analyzed with BrdU-seq in the pres- ence of HU in the same manner as was described for Figure 3.4A. Peak counts sur- rounding each origin (reads within 500 bp) were compared with DESeq to identify origins that showed signicant dierences to WT proles; red indicates Fkh-excited origins and green indicates Fkh-repressed origins FDR< 0:005. (B) Peak counts at Fkh-regulated origins were placed into a matrix (with rows corresponding to cell type and columns to origins) and clustered in two dimensions Blue cells indicate low HU- eciency and red cells indicate high HU-eciency. In (C) and (D) the overlap of origins that show increases (C) and decreases (D) in their HU-eciency in fkh1, fkh1fkh2 and fkh1fkh2 cells are shown. 117 number of origins show increases in their HU-eciencies in the double mutants (82 fkh1fkh2 and 80fkh1fkh2+pFKH2C; we refer to these as Fkh-repressed origins [Figure 6.1D; Supplemental Table A.9]). Since the plasmid-harboring double mutant showed very little phenotypic dierence with fkh1fkh2, we have chosen to proceed with its analysis alone due to the better growth properties. Hereon we will refer to fkh1fkh2 +pFKH2 simply as fkh1fkh2. To ensure that replication proceeds at a rate similar to that of WT cells, and also to ensure that the dierences observed between WT and Fkh-mutant strains are not checkpoint dependent (i.e. only present in HU), we have performed a BrdU-pulse- IP-chip experiment on WT vs. fkh1fkh2 cells. Brie y, cells were arrested with -factor and then released into YEPD. Replicating cell cultures were pulsed with BrdU at 12 24; 18 30;:::; 48 60 minutes after release into S-phase; following this BrdU-IP-chip was performed on each separate pulse. Also, at each time-point genomic DNA content was also analyzed by ow cytometry (FACS). FACS analysis conrms that WT and fkh1fkh2 cells enter and complete S-phase under a sim- ilar schedule (Figure 6.2A). 
Furthermore, both ARS305 and ARS306 show delayed firing in fkh1Δ fkh2Δ cells (consistent with the HU results; Figure 6.2B and Supplemental Figure B.13). In addition, when multiple Fkh-repressed origins neighbor one another, regions up to 300 kbp wide replicate at an advanced time in comparison to WT cells (see the region from 600-900 kbp on Chromosome XV; Figure 6.2B).

Figure 6.2: (A) FACS analysis shows that WT and fkh1Δ fkh2Δ cells enter and exit S-phase under the same schedule. Cells were released into S-phase and pulsed (in separate cultures) with BrdU at 12-24, 18-30, ..., 48-60 minutes and then harvested for BrdU-IP-chip; Chromosome VI (B) and XIV (C) are shown here. (D) To view timing at Fkh-regulated origins genome-wide, M-values surrounding each origin were calculated at each time-point and placed into a matrix where rows correspond to origins and columns to BrdU-pulse time intervals. Following this, the matrix was sorted based on Fkh-regulation. ARS305 and ARS607 have been annotated with pink and blue circles, respectively. Centromeric origins are annotated with green circles and telomeric origins are annotated with red circles.

To analyze replication changes through S-phase genome-wide, we determined the mean array signal within 10 kb of each replication origin that was detected to fire with BrdU-IP-seq in HU. Following this, a matrix was constructed in which each cell holds the mean WT − fkh1Δ fkh2Δ signal in these regions at each time-point. To look specifically at Fkh-regulated origins we have sorted the rows of the matrix (each row corresponds to an origin's WT − fkh1Δ fkh2Δ signal through time) such that they are ordered by the magnitude of their forkhead regulation (as measured by changes in their HU-efficiency). A heat map of this sorted matrix demonstrates that all Fkh-regulated origins (in HU) also show changes in their replication dynamics through time (Figure 6.2D).
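The construction of the sorted matrix described above can be sketched as follows. This Python example is illustrative only and does not reproduce the analysis pipeline; the function name, the dictionary inputs, and the convention that hu_change holds mutant minus WT HU-efficiency are all assumptions.

```python
def sorted_difference_matrix(wt_signal, mutant_signal, hu_change):
    """Build a WT-minus-mutant timing matrix with rows ordered by Fkh regulation.

    wt_signal, mutant_signal: dict mapping origin name to a list of mean
        signals, one per BrdU pulse window (same window order in both).
    hu_change: dict mapping origin name to its change in HU-efficiency
        (assumed here to be mutant minus WT).
    Returns (ordered_origins, matrix); matrix[i][j] is the WT minus mutant
    signal for ordered_origins[i] in pulse window j.
    """
    # Most negative change first: origins that lose efficiency in the mutant
    # (Fkh-excited) lead, and origins that gain efficiency (Fkh-repressed) follow.
    ordered = sorted(wt_signal, key=lambda origin: hu_change[origin])
    matrix = [[w - m for w, m in zip(wt_signal[o], mutant_signal[o])]
              for o in ordered]
    return ordered, matrix
```

With rows ordered this way, a consistent sign pattern across the early and late pulse windows within each group of rows is what a heat map of the matrix would make visible.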
Fkh-excited origins show more signal in WT experiments early in S-phase (as indicated by red cells to the left in the matrix) and stronger fkh1Δfkh2Δ signal later in S-phase (as indicated by blue cells on the right of the matrix; ARS305 and ARS607 are annotated on the left of the matrix with pink and blue circles, respectively). Fkh-repressed origins show the opposite effect (higher fkh1Δfkh2Δ signal early in S-phase and higher WT signal later in S-phase). We found that Fkh-excited origins are depleted of CEN-proximal origins (within 25 kbp of a CEN; P-value < 0.001) while Fkh-repressed origins are significantly enriched for them (P-value < 0.001). We have indicated which rows of the matrix correspond to CEN-proximal origins with green circles. The position of the circles at the top of the Fkh-repressed group indicates that these origins exhibit large positive changes in HU-efficiency in fkh1Δfkh2Δ cells and that these changes coincide with substantial changes in the timing schedule. Finally, although the repetitive nature of telomeres limits their analysis, we have identified origin firing at four such sites that are Fkh-regulated (red circles; all of these sites are Fkh-excited).

6.3 Fkh1 and Fkh2 Regulate Cdc45 Binding at Forkhead-Regulated Replication Origins

To determine whether the changes in origin usage in fkh1Δfkh2Δ cells are coupled with corresponding changes in replication factor binding, we analyzed ORC, MCM and Cdc45 binding in both WT and fkh1Δfkh2Δ cells. Polyclonal ORC and MCM antibodies were used to analyze the ORC and MCM complexes, respectively, while an HA-tagged Cdc45 protein was introduced into both WT and fkh1Δfkh2Δ cells for analysis of its binding profiles. For each dataset we quantified peak signal by calculating the mean M-value in each detected peak (the methods described in Chapter II were used to detect enriched regions).
As shown in Figure 6.3A, whereas neither ORC nor MCM binding appears to be significantly altered at Fkh-regulated origins, all Cdc45 binding (which is only detected at the earliest, most efficient origins in the genome in WT G1-arrested cells) is lost at Fkh-excited origins with one exception, reinforcing the notion that these origins need Fkh proteins for their highly efficient function in WT cells. In contrast, Cdc45 binding is observed at only one Fkh-repressed origin in WT cells. This origin maintains Cdc45 binding in fkh1Δfkh2Δ cells, and in addition a second origin gains early Cdc45 binding. In total, 29 origins bind Cdc45 in WT cells, whereas only 14 do so in fkh1Δfkh2Δ cells (Figure 6.3B; dark blue and light blue rectangles represent Cdc45 binding in WT and fkh1Δfkh2Δ cells, respectively). In WT cells, binding was split relatively evenly between CEN-proximal and non-CEN-proximal origins, with 15 non-CEN-proximal and 14 CEN-proximal origins enriching for Cdc45. In contrast, the majority of Cdc45 binding sites (12 of 14) were CEN-proximal in fkh1Δfkh2Δ cells (see overlap of yellow rectangle with light and dark blue rectangles in Figure 6.3B).

Figure 6.3: ORC, MCM2-7 and Cdc45 binding was detected genome-wide with ChIP-chip in both WT and fkh1Δfkh2Δ cells. (A) ORC, MCM2-7 and Cdc45 binding was analyzed at all Fkh-excited and -repressed origins; only Cdc45 binding profiles were altered in the mutant background. (B) A Venn diagram demonstrating that sites where Cdc45 binding was lost in fkh1Δfkh2Δ cells were mainly non-centromeric.

In total, Cdc45 binding is lost at 21 origins in the mutant cells. Of these, 14 are non-CEN-proximal and nine are CEN-proximal (only two of the non-CEN-proximal origins were classified as Fkh-excited). Of the eight origins that gain access to Cdc45 in the mutant cells, seven are CEN-proximal.
Taken together with the results of Figure 6.2D, this indicates that CEN-proximal origins are non-Fkh-excited and, in fact, tend to increase Cdc45 binding and firing activity when Fkh1 and Fkh2 action is removed.

6.4 Many Replication Origins are Bound by Fkh1 and Fkh2

In our analysis of Rpd3 (Chapter V), we found that a significant portion of Rpd3-regulated origins were bound by Rpd3. We wished to examine whether such a relationship was evident at Fkh-regulated origins. To analyze the binding patterns of these proteins we epitope-tagged Fkh1 and Fkh2 with Myc separately, in WT cells. For each of Fkh1-Myc and Fkh2-Myc, we performed ChIP-chip experiments in triplicate. As a control, we performed the same analysis with an untagged strain. All arrays were normalized as described in Chapter II and triplicates were averaged. The control experiments were employed as follows: using only probes above the 95% cutoff described in Chapter II, control and experimental datasets were scaled to have the same MAD of M-value differences between neighboring probes. Next, enriched regions were called in each of the datasets using a cutoff that corresponded to an FDR < 0.05 (FDR was calculated as the number of enriched regions called in the control divided by the number of enriched regions called in the experiment). With this analysis we identified 502 Fkh1 binding sites and 138 Fkh2 binding sites.

Figure 6.4: ChIP-chip was performed on Myc-tagged Fkh1 and Fkh2 proteins. (A) ARS305 and ARS607 both bind Fkh1 but neither binds Fkh2. (B) A Venn diagram demonstrating the overlap of Fkh-regulated origins with sites of Fkh1 and Fkh2 enrichment.

Figure 6.4A shows Fkh1 and Fkh2 enrichment at origins ARS305 and ARS607. While Fkh1 is bound strongly at both origins, no significant Fkh2 signal is detected.
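The control-based FDR used for calling enriched regions in this section (regions called in the untagged control divided by regions called in the experiment) can be sketched as follows. This is only an illustration under stated assumptions: it takes one precomputed score per candidate region, omits the MAD-scaling step, and uses hypothetical function names and numbers rather than the implementation described in Chapter II.

```python
from typing import List, Optional, Sequence, Tuple


def empirical_fdr(control_scores: Sequence[float],
                  experiment_scores: Sequence[float],
                  cutoff: float) -> Optional[float]:
    """Regions called in the control divided by regions called in the experiment."""
    n_control = sum(score >= cutoff for score in control_scores)
    n_experiment = sum(score >= cutoff for score in experiment_scores)
    if n_experiment == 0:
        return None  # nothing called, so the ratio is undefined
    return n_control / n_experiment


def choose_cutoff(control_scores: Sequence[float],
                  experiment_scores: Sequence[float],
                  candidate_cutoffs: Sequence[float],
                  max_fdr: float = 0.05) -> Optional[Tuple[float, float]]:
    """Return the most permissive candidate cutoff whose FDR is below max_fdr."""
    for cutoff in sorted(candidate_cutoffs):
        fdr = empirical_fdr(control_scores, experiment_scores, cutoff)
        if fdr is not None and fdr < max_fdr:
            return cutoff, fdr
    return None


# Hypothetical region scores (e.g. mean M-value per candidate region).
control: List[float] = [0.5, 1.0, 1.2]
experiment: List[float] = [1.0] * 10 + [2.0] * 30
selected = choose_cutoff(control, experiment, [0.5, 1.0, 1.5, 2.0])
```

With these toy scores the two lowest cutoffs are rejected because control regions still pass them, so the first cutoff at which no control region is called is selected.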
To analyze the relationship between Fkh-binding and Fkh-regulation at origins, we aligned origin locations with Fkh1/2-enriched regions as described in Chapter III (using a 500 bp gap penalty; Figure 6.4B). We found that 59% of all origins were bound by Fkh1 and/or Fkh2. Furthermore, we found that 61% of Fkh-regulated origins were bound by Fkh1 and/or Fkh2. Of the 95 Fkh-excited origins, 48 are bound by Fkh1 and 20 are bound by Fkh2 (16 are bound by both Fkh1 and Fkh2). Of the 80 Fkh-repressed origins, 52 are bound by Fkh1 and 17 are bound by Fkh2 (14 are bound by both Fkh1 and Fkh2). Of these results, the only statistically significant finding is that Fkh1 binding is over-represented at Fkh-repressed origins (P-value = 0.06). This result is consistent with Fkh1's known action as a repressor.

6.5 Forkhead-Regulated Origins are not Spatially Associated with Forkhead-Regulated Genes

Transcription is linked with origin activity in higher eukaryotes, and recently we have shown that Rpd3 co-regulates a statistically significant number of origins and genes (Chapter V). As Fkh1 and Fkh2 are broad regulators of transcription, we reasoned that these proteins may regulate origins indirectly by regulating their flanking genes. Recent advances in transcription profiling have uncovered many previously unannotated sites of transcription, and these sites have not been included in any analysis of transcription changes in Fkh1 or Fkh2 mutant backgrounds. Furthermore, it has been proposed that Fkh1 and Fkh2 regulate transcript elongation and termination, leaving open the possibility that Fkh-excited origins may be protected from intrusion of transcriptional machinery in WT cells, but may lose that protection in the Fkh mutants. Read-through transcripts resulting from malfunctioning termination control mechanisms could lack a poly-A tail.
To analyze transcription as it pertains to Fkh-regulated origins, while accounting for the above-mentioned factors, we isolated total mRNA from cells with ribominus beads (as opposed to poly-A capture) and analyzed the resulting mRNA with strand-specific RNA-seq. This analysis was performed in WT and fkh1Δfkh2Δ cells in both asynchronous and G1-arrested populations, in duplicate. To analyze transcription changes around origins, we performed the same analysis for each dataset that was described for Figure 2.6E (except that sequence counts within 50 bp bins were calculated instead of average M-values, and origins were sorted based on the differences in their WT and fkh1Δfkh2Δ BrdU-HU signals). Figures 6.5A and 6.5B show that in asynchronous and G1-arrested cells, respectively, there is no visible correlation between origin and gene regulation. Furthermore, there is no evidence for infringing transcripts at Fkh-excited origins where genes converge on origins (i.e. at tandem or converging intergenic regions).

To confirm these observations statistically we performed regression analysis on origins grouped by the orientations of their flanking genes. For converging and diverging intergenic regions we used one predictor representing the fkh1Δfkh2Δ − WT read count difference of the closest transcript (as measured in bp between the origin and the gene's nearest end) and one predictor representing the same difference measure for the farther of the two flanking transcripts. For tandem intergenic regions, one predictor represented the fkh1Δfkh2Δ − WT read count difference for the converging gene and the other predictor represented the difference for the diverging gene. With this analysis the only predictor that showed significant correlation with origin regulation was the gene furthest from the origin within a converging intergenic region (P-value < 0.05). With this exception, the flanking genes of Fkh-regulated origins do not show corresponding transcriptional changes.
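As a rough illustration of a two-predictor regression of the kind described above, the sketch below fits an ordinary least-squares model with the origin's efficiency change as the response and the read count differences of the two flanking genes as predictors. It estimates coefficients only; the significance testing reported in the text is not reproduced. The function name and all numbers are hypothetical, and the sketch is not the analysis code used in this work.

```python
from typing import Sequence, Tuple


def fit_two_predictor_ols(x1: Sequence[float],
                          x2: Sequence[float],
                          y: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = b0 + b1*x1 + b2*x2; returns (b0, b1, b2)."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are collinear")
    # Solve the 2x2 normal equations for the two slopes.
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Hypothetical mutant minus WT read count differences for the nearer and
# farther flanking gene, and a response built from known coefficients.
near_gene = [0.0, 1.0, 2.0, 3.0, 4.0]
far_gene = [1.0, 0.0, 2.0, 1.0, 3.0]
efficiency_change = [0.5 + 2.0 * a - 1.0 * b for a, b in zip(near_gene, far_gene)]
b0, b1, b2 = fit_two_predictor_ols(near_gene, far_gene, efficiency_change)
```

Because the toy response is an exact linear combination of the predictors, the fit recovers the coefficients used to build it; with real data a slope near zero for a predictor would correspond to the absence of correlation reported above.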
Figure 6.5: Ribominus-isolated mRNA was analyzed with strand-specific RNA-seq in asynchronous (A) and G1-arrested (B) cells. Origins were grouped based on their intergenic class (tandem, converging and diverging origins are shown in the top, middle and bottom panels, respectively) and sorted based on their corresponding fkh1Δfkh2Δ − WT HU-efficiencies (leftmost panels). The second panels from the left show the WT transcription profile. The third panels from the left show the fkh1Δfkh2Δ transcription profile. Finally, the rightmost panels show the difference (fkh1Δfkh2Δ − WT) in transcription profiles.

To determine if non-flanking genes might show co-regulation with Fkh-regulated origins we performed a permutation test on the distances between Fkh-regulated origins and their nearest Fkh-regulated gene. Fkh-regulated genes were identified as those that showed differential regulation (DESeq FDR < 0.01; [8]) between asynchronous and G1-arrested WT cells but showed no such differential expression in fkh1Δfkh2Δ cells. This analysis identified 263 genes (of which 159 increase and 104 decrease as the WT cells move from an asynchronous to a G1-arrested state). A significant number (12) of the CLB2 cluster of genes (32), which has previously been shown to be regulated by Fkh1 and Fkh2, were contained in this set (P-value < 0.001; hypergeometric test). For each of these 263 genes, we calculated the minimum distance from its promoter to the nearest Fkh-regulated origin (for both Fkh-excited and -repressed origins). Following this, 100,000 simulated gene sets were generated by randomly selecting 263 genes from the SGD database and separating them (randomly) into up- and down-regulated groups (as defined above). For each of these sets, the minimum distances to Fkh-excited and Fkh-repressed origins were calculated. From this analysis we determined for all possible combinations (e.g.
up-regulated gene/Fkh-excited origin, down-regulated gene/Fkh-excited origin, etc.) that Fkh-regulated origins are not significantly closer to Fkh-regulated genes than a random sampling of the origin population.

The results described above indicate that transcripts we were able to isolate with ribominus beads (i.e. non-degraded transcripts) showed no correlation in their Fkh-regulation with Fkh-regulated origins. Recently, however, the presence of cryptic unstable transcripts (CUTs) has been observed in yeast cells [284]. These are undetectable by RNA-seq unless the analysis is performed in an rrp6Δ background strain (Rrp6 acts to degrade these transcripts). Furthermore, if Fkh1 or Fkh2 are indeed required for proper transcript elongation and termination, the possibility exists that improperly terminated transcripts could be destroyed, causing the above analysis to fail to identify such phenomena at Fkh-regulated origins. To ensure that such transcripts were not a cause of origin regulation we performed the same analysis as described for Figure 6.5 using RNA Pol II ChIP-seq counts as a proxy for transcription activity at each locus.

Figure 6.6 shows that the differences in RNA Pol II enrichment are greater in asynchronous cells than in G1-arrested cells. However, as with the analysis of RNA-seq data, there is no visible correlation between the HU-efficiency changes observed at Fkh-regulated origins and their flanking genes' RNA Pol II binding profiles. To validate this result statistically we performed multivariate regression analysis, using the fkh1Δfkh2Δ − WT count difference of RNA Pol II reads that mapped to within 500 bp of each origin as the predictors. Two predictors were analyzed (corresponding to RNA Pol II differences in asynchronous and G1-arrested cells). As with our analysis of RNA-seq differences, for each origin we used the difference in HU-efficiency between WT and fkh1Δfkh2Δ cells as the dependent variable.
With this analysis we validated that there is no statistically significant correlation between proximal RNA Pol II binding and the HU-efficiency differences between WT and fkh1Δfkh2Δ cells.

Figure 6.6: RNA Pol II-bound DNA was analyzed with ChIP-seq in WT and fkh1Δfkh2Δ cells in asynchronous (A) and G1-arrested (B) populations. Origins were grouped based on their intergenic class (tandem, converging and diverging origins are shown in the top, middle and bottom panels, respectively) and sorted based on their corresponding fkh1Δfkh2Δ − WT HU-efficiencies (leftmost panels). Left and middle panels show the RNA Pol II binding profile around origins in WT and fkh1Δfkh2Δ cells, respectively. The rightmost panel shows the difference (fkh1Δfkh2Δ − WT) in RNA Pol II binding profiles.

6.6 Fkh1 and Fkh2 Influence Nucleosome-Phasing Around the ACS

The localization of origins to NFRs suggests that they exploit this feature to bind chromatin. Indeed, previous studies have implicated nucleosome positioning in origin competence, and one of these studies implicated ORC and the transcription factor Abf1 (which binds at some origins) in maintaining a sufficiently large NFR for efficient pre-RC assembly [143]. However, pre-RC assembly does not appear to be compromised in Fkh mutants, suggesting that any effects of Fkh proteins on the local chromatin and NFR structure likely operate through a distinct mechanism to regulate origin activation.

We have shown that in WT cells nucleosome phasing is more precise at origins that are efficient in HU than at origins that are inefficient. To test if corresponding changes are observed at Fkh-regulated origins (i.e. decreased phasing at Fkh-excited origins or increased phasing at Fkh-repressed origins in fkh1Δfkh2Δ cells), we analyzed nucleosome occupancy in unsynchronized and G1-arrested WT and fkh1Δfkh2Δ cells as described for Figures 3.6B and C. First, we searched for differences between WT and fkh1Δfkh2Δ cells in nucleosome occupancy at all origins.
In asynchronous cells, the fkh1Δfkh2Δ nucleosome density profile is nearly identical to the WT profile (Figure 6.7A). Furthermore, in both WT and fkh1Δfkh2Δ G1-arrested cells, phasing seems to fade more quickly with increasing distance from the ACS than it does in asynchronous cells (Figure 6.7B). Next, we performed the same analysis on fkh1Δfkh2Δ cells as was performed for WT cells in Figure 3.6C (to compare HU-efficient and -inefficient origins). We find that in asynchronous cell populations, at the highest efficiency origins, nucleosomes upstream of the ACS (from -1000 to -150 bp) shift away from the ACS (with respect to their positions in the total origin population; indicated with green lines, Figure 6.7C). This effect is also seen downstream of the ACS. In contrast, at less efficient origins there appears to be a shifting of the nucleosomes towards the ACS. In G1-arrested cells (with the exception of the nucleosome two positions to the left of the ACS) the same shifting of nucleosomes away from the ACS at efficient origins is seen both upstream and downstream of the ACS (Figure 6.7D). In addition, as occurs in WT cells, nucleosome phasing at less efficient origins is deteriorated in fkh1Δfkh2Δ cells.

To compare directly the nucleosome densities of WT and fkh1Δfkh2Δ cells at Fkh-regulated origins, for each cell type (WT and fkh1Δfkh2Δ) and experimental condition (asynchronous and G1-arrested), we calculated the nucleosome density curves of Fkh-excited and Fkh-repressed origins separately. A comparison of these curves for asynchronous WT and fkh1Δfkh2Δ cells at Fkh-excited origins shows extremely similar density profiles (Figure 6.7E). In contrast, in G1-arrested cells at Fkh-excited origins, we observe a deterioration of nucleosome phasing in fkh1Δfkh2Δ cells upstream of the ACS (Figure 6.7F). This again coincides with the observation that (especially in G1-arrested cells) phasing is reduced at less efficient origins.
At Fkh-repressed origins (which are less efficient in WT cells) we observe less robust phasing in both WT and fkh1Δfkh2Δ asynchronous cell populations (Figure 6.7G). In the G1-arrested population, with the exception of the four nucleosomes closest to the ACS, phasing is abolished in both WT and fkh1Δfkh2Δ cells (Figure 6.7H).

Figure 6.7: MNase-seq was used to analyze nucleosome density around Fkh-excited and -repressed origins in both WT and fkh1Δfkh2Δ cells in asynchronous and G1-arrested populations. For each subplot, density curves were developed as described for Figure 3.6C. Green lines on each plot indicate nucleosome positions identified using all ACS-aligning origins in asynchronous WT cells. (A) Comparison between asynchronous WT and fkh1Δfkh2Δ cells at all ACS-aligning origins. (B) Comparison between G1-arrested WT and fkh1Δfkh2Δ cells at all ACS-aligning origins. (C) Comparison of the top and bottom quartile origins (as measured by HU-efficiency) in asynchronous fkh1Δfkh2Δ cells. (D) Comparison of the top and bottom quartile origins (as measured by HU-efficiency) in G1-arrested fkh1Δfkh2Δ cells. (E) Comparison of nucleosome density in asynchronous WT and fkh1Δfkh2Δ cells at Fkh-excited origins. (F) Comparison of nucleosome density in G1-arrested WT and fkh1Δfkh2Δ cells at Fkh-excited origins. (G) Comparison of nucleosome density in asynchronous WT and fkh1Δfkh2Δ cells at Fkh-repressed origins. (H) Comparison of nucleosome density in G1-arrested WT and fkh1Δfkh2Δ cells at Fkh-repressed origins.

6.7 Fkh1 and Fkh2 Regulate Long-Range Chromatin Interactions

As discussed previously, origins cluster to form replication foci in early S-phase, with the earliest, most efficient origins clustering in G1-phase [53,74,161]. In fkh1Δfkh2Δ cells many of the normally earliest, most efficient origins lose activity. We hypothesized that Fkh1 and Fkh2 act to recruit a subset of origins into organized clusters that benefit from preferential access to limiting replication factors (e.g.
Cdc45), and hence, initiate early and efficiently upon entry into S-phase.

To test this hypothesis we used 4C on G1-arrested WT and fkh1Δfkh2Δ cells (see Figure 6.8A for a diagram of the procedure). We chose to analyze ARS305 (a strongly Fkh-excited origin) as it has been shown to associate with other origins that we have identified as Fkh-excited (ARS306 and ARS607; [53]). For this we digested cross-linked chromatin with XbaI and then performed an initial (cross-link-dependent) ligation. Following DNA purification we performed a secondary digestion with MseI followed by a secondary ligation. We then amplified the molecules that had been captured by ARS305 with primers that aligned within the ARS305 XbaI-MseI fragment, oriented away from ARS305. Next, amplified material was fluorescently labeled and hybridized on tiling arrays against genomic DNA. If interactions were properly captured, array probe enrichment should begin at the XbaI site flanking the locus that ARS305 interacts with, and extend towards that locus. To detect interactions, we called probes as significantly enriched with the methods described in Chapter II, and searched for interactions based on the above criteria. In WT cells we were able to detect ARS305 interacting with ARS306 (one out of two replicates) and ARS607 (two out of two replicates). True to our hypothesis, these interactions were not detected in fkh1Δfkh2Δ cells (no interaction was detected in either replicate; Figure 6.8B, bottom panels), indicating that Fkh1 and Fkh2, at least partially, mediate the clustering of the earliest, most efficient non-centromeric origins in the nucleus.

As Fkh1 and Fkh2 are bound at origins that cluster in G1 (ARS305 and ARS306), and removal of Fkh proteins causes a cessation of that clustering, we hypothesized that Fkh1 and Fkh2 might form homo- and/or hetero-dimers to form long-range chromatin interactions between origins.
Interestingly, FoxP2 and FoxP3, which are mammalian members of the Forkhead-Box (FOX) family, have been shown to dimerize in vitro through a domain-swapping mechanism ([248]; L. Chen, pers. comm.). Three residues, conserved in the FoxP subfamily, have been implicated in their domain swapping (FoxP2-Ala539, FoxP3-W348 and FoxP3-M370; Figure 6.9A, orange, blue and green columns). Briefly, FoxP2-Ala539 stabilizes a helix (H2) and allows interaction between helices H2 and H3 of different FoxP2 polypeptides [248]. Also, the hydrophobic FoxP3-W348 and FoxP3-M370 residues interact with one another in the domain-swapped dimer (L. Chen, pers. comm.; Figure 6.9B shows the interactions these residues form with one another at the interface of the dimer).

Figure 6.8: A modified 4C protocol was applied to G1-arrested WT and fkh1Δfkh2Δ cells to probe for ARS305 interactions. (A) Chromatin is cross-linked and digested with a less frequent (5 or 6 bp) cutter. The cross-linked and digested chromatin is then ligated in high concentration to promote intra- and prevent inter-molecular ligations. After ligation and DNA isolation a second, more frequent (4 bp) cutter is used to digest the DNA. Materials are then diluted and a second ligation is performed. At this point circular molecules should be enriched for loci that were initially bound in the nucleus. Thus, primers emanating from the bait can be applied to amplify prey material. The amplified material can then be analyzed by array or sequencing. (B) This strategy was applied to arrested WT and fkh1Δfkh2Δ cells using ARS305 as bait. Nuclear co-localization of ARS305 with ARS306 and of ARS305 with ARS607 was observed in WT cells, but those interactions were not observed in fkh1Δfkh2Δ cells.

Figure 6.9: (A) The amino acid sequences of all members of the yeast Forkhead family were aligned with those of the human FoxP protein family.
Residues that were found to be important for formation and stabilization of domain-swapped dimers in FoxP2 and FoxP3 were also found at corresponding sites in Fkh1 and Fkh2 (highlighted columns). When modeling Fkh1 structure using FoxP3 as a blueprint, the residues that were identified in FoxP3 to stabilize the dimer through direct interactions (B) were also shown to interact directly in Fkh1 (C). (D) BrdU-IP-seq reveals a similar pattern of origin firing in fkh1*** and fkh2*** cells as is seen in fkh1Δfkh2Δ cells. (E) Fkh1***-repressed and -excited origins correspond strongly with Fkh-repressed and -excited origins, respectively.

We have aligned the forkhead domains of yeast Fkh1, Fkh2, Hcm1 and Fhl1 with those of the human proteins FoxP1-P4 (Figure 6.9A). Both Hcm1 and Fhl1, like most other known members of the forkhead family, have proline in the position corresponding to FoxP2-Ala539. However, both Fkh1 and Fkh2 have alanine at that site (orange column). Furthermore, Fkh1 and Fkh2 contain glutamine residues at the site corresponding to FoxP3-W348 (blue column) and asparagine residues at the site corresponding to FoxP3-M370 (green column). Glutamine and asparagine can form hydrogen bonds with each other, and could thereby stabilize the proposed domain-swapped conformation of Fkh1 and Fkh2. Furthermore, we have modeled the structure of the Fkh1 and Fkh2 forkhead domains against those of FoxP3 (based on sequence alignment) and have demonstrated that these residues (at least in the model) are positioned so that they would stabilize the proposed domain-swapped structure (Figure 6.9C).
To test whether Fkh1 or Fkh2 might act on origins through domain-swapped dimer formation, we transformed fkh1Δfkh2Δ cells separately with Fkh1 or Fkh2 in which the alanine aligning to FoxP2-Ala539 is substituted with a proline (helix breaker), the glutamine aligning to FoxP3-W348 is substituted with a glutamic acid (charged) and the asparagine aligning to FoxP3-M370 is substituted with an aspartic acid (charged); from here on these are referred to as Fkh1*** and Fkh2***. With these strains we performed BrdU-IP-seq in HU (Figure 6.9D; Chromosome III), and observe that firing of ARS305 is severely reduced in these mutant cells. Using the same analysis as we applied to call Fkh-excited and -repressed origins we determined that, in fkh1*** cells, 37 origins are increased in their HU-efficiency (Figure 6.9E). Of these 37, 34 overlap with Fkh-repressed origins (P-value < 0.001). As in fkh1Δfkh2Δ cells, we also see a large population of origins that lose efficiency (65 origins in fkh1*** cells). Of these, 57 overlap with Fkh-excited origins (P-value < 0.001).

6.8 Chapter Summary

We have identified Fkh1 and Fkh2 as broad regulators of replication initiation in S. cerevisiae. Alone, Fkh1 regulates over 50 origins. At more than half of these origins, Fkh1 acts to increase HU-efficiency while at the others it represses firing. Fkh2 on its own is not required for proper origin function; however, when both Fkh1 and Fkh2 action is removed, a larger increase in origin deregulation (relative to that seen in fkh1Δ cells) is observed. Almost 200 of the 350 origins we have analyzed are deregulated in fkh1Δfkh2Δ cells. Over half of these show increases in their efficiency when Fkh action is removed while the others show a decrease in origin activity. Notably, ARS305 and ARS607, which are both highly HU-efficient origins, become very HU-inefficient in fkh1Δfkh2Δ cells.

FACS analysis shows that WT and Fkh-mutant cells enter and exit S-phase at approximately the same times.
With a BrdU-pulsing strategy we confirmed that both ARS305 and ARS607 are delayed in their firing in fkh1Δfkh2Δ cells. Furthermore, we have identified large (~300 kbp) regions that show earlier replication schedules in fkh1Δfkh2Δ cells due to earlier activation of Fkh-repressed origins. In general, in fkh1Δfkh2Δ cells Fkh-repressed origins and their surrounding regions are replicated earlier while Fkh-excited origins and their surrounding regions are replicated later. Furthermore, Fkh-repressed and Fkh-excited origins are significantly enriched and depleted for CEN-proximal origins, respectively.

Neither ORC nor MCM enrichment is altered at Fkh-regulated origins. However, at Fkh-excited origins Cdc45 is significantly depleted in the mutant cells. In WT cells, Cdc45-enriched regions are evenly divided between CEN-proximal and non-CEN-proximal origins, whereas in fkh1Δfkh2Δ cells, the majority of origins that bind Cdc45 are CEN-proximal.

Approximately 60% of all origins bind Fkh1 or Fkh2, and two origins (ARS305 and ARS607) that show large excitation by Fkh1 and Fkh2 are enriched for Fkh1 association. However, Fkh-excited origins as a population are not significantly enriched for Fkh association. In contrast, Fkh-repressed origins are enriched for Fkh1 binding.

The mechanism by which Fkh proteins regulate origin firing does not appear to be through concurrent action at nearby genes. Although Fkh1 and Fkh2 are known as broad regulators of transcription and have been implicated in regulating transcript elongation and termination, we find no evidence for transcriptional changes at Fkh-regulated origins. However, chromatin alterations at these origins are observed; in fkh1Δfkh2Δ cells, at Fkh-excited origins, nucleosome phasing is decreased. Furthermore, in fkh1Δfkh2Δ cells the well-defined inter-chromosomal interaction between ARS305 and ARS607 is abrogated, indicating that Fkh1 and Fkh2 regulate long-range chromatin interactions.
We hypothesize that these interactions are mediated by Fkh proteins forming domain-swapped dimers. Three strategically placed residues in the FoxP proteins allow them to form such structures, and alignment of Fkh1 and Fkh2 with these proteins reveals that they (but not other yeast forkhead proteins) have biochemically similar residues at corresponding positions. Furthermore, structural modeling has demonstrated that if Fkh1 or Fkh2 were to form these dimers, these residues would interact at the dimer interface much like the corresponding FoxP residues. Finally, in cells where Fkh1 or Fkh2 have been altered by replacing these amino acids with residues that inhibit such an interaction, a replication phenotype similar to that of fkh1Δfkh2Δ cells is observed. This suggests that Fkh1 and Fkh2 regulate origin firing through their ability to form domain-swapped dimers.

Chapter 7: Discussion

DNA replication occurs on a genomic template that is actively engaged in other genomic processes such as transcription and DNA repair. To ensure that each process occurs in a timely and accurate manner, coordination between their nuclear requirements is necessary. In the tightly packed S. cerevisiae genome, origins are closely flanked by genes that are regulated by transcriptional activators/repressors. Whether this regulation occurs through histone modification or through direct interactions with the transcription machinery, the regulators come in close proximity to the origins. Long- and short-range chromatin interactions also exist, and thus loci (and their regulators) that are linearly distal to an origin can in reality be proximal in the nucleus. In this work we sought to elucidate the mechanisms the cell has evolved to coordinate these processes. To perform this task, we first planned to develop techniques to measure replication activity with an accuracy that would allow changes in that activity to be detected.
Second, we wished to capture a high-fidelity spatio-temporal map of replication that would allow us to discern between early- and late-replicating genomic regions so that a correlative analysis of replication timing and concurrent genomic processes could be performed. An emerging hypothesis for origin replication dynamics is that firing times are defined by the origins' ability to recruit rate-limiting replication factors. We wished to determine whether such a model could explain experimentally observed firing times. To perform this task, we planned to implement a computational model in a manner that would allow for direct comparisons between its output and the spatio-temporal map described above. Within the model, each origin is assigned a propensity to attract limiting factors, and we also wished to identify factors that were responsible for assigning origins this propensity. Previous analysis revealed that the HDAC Rpd3 suppressed the firing of several origins. With new tools for analysis, we planned to determine Rpd3's action at origins genome-wide and to dissect its mechanism of action by directly targeting the various complexes it acts within. From the results of this work, we hoped to identify other putative origin-regulating proteins, and to test for causation and mechanism, we planned to perturb them and determine whether a corresponding change in replication was produced.

7.1 Summary of Results

BrdU can be used to produce genome-wide maps of replication timing and activity. Sequencing-based analysis of BrdU-incorporated DNA provides higher fidelity data than does array-based analysis. However, current array costs are sufficiently low to allow for many initial experiments to be performed genome-wide, and for multiple cell types to be analyzed with pulsing experiments (whose costs are high with sequencing-based analysis). Therefore, tools for analyzing BrdU-IP-chip datasets are necessary.
We have demonstrated that in such data, large biases corresponding to probe intensity exist that cannot be reconciled with current array analysis techniques. We have introduced a method to remove this bias and, further, have introduced methods to analyze the resultant data; after such normalization, BrdU-IP-chip datasets provide an accurate measure of replication activity.

We have also taken advantage of sequencing technologies and have applied BrdU-pulse strategies to develop a high-fidelity map of yeast chromosomal replication throughout S-phase. Through analysis of this dataset, we were able to more than triple the number of putative replication fork TER sites. To capture replication origin efficiencies, we sequenced BrdU-IP'd DNA from cells that replicated in the presence of HU. Combining these datasets, we confirmed that HU-efficiency predicts origin timing profiles. Furthermore, we demonstrated that, for any genomic region, the mean and variance (between cells) of its replication time are defined by the distance to and the activity of the nearest origin. Finally, with ChIP-chip we have mapped ORC, MCM and Cdc45 origin localization genome-wide, and have determined that both ORC and Cdc45 binding at origins is predictive of their HU-efficiency.

Based on the above results, we have proposed a molecular model for how the replication timing schedule is defined. We hypothesized that replication origin firing times are determined by origins' ability to attract rate-limiting replication factors from limited nuclear pools. In the model, origins with a high propensity to recruit these factors are able to do so early in S-phase (reducing the size of the nuclear pool). When these origins have accumulated a sufficient amount of factor in their vicinity, they fire (resulting in their early replication), while origins with lesser propensities must wait for more factor to become available (resulting in their later replication). We have computationally implemented this model to test its validity.
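The competition for limiting factor described above can be illustrated with a small deterministic sketch. This is not the implementation used in this thesis; it is a minimal toy model under stated assumptions: a single shared factor pool, accumulation at a rate proportional to each origin's propensity and to the amount of free factor, a fixed firing threshold, and recycling of the factor back into the pool once an origin fires. The function name `simulate_firing` and all parameter values are hypothetical.

```python
def simulate_firing(propensities, pool, threshold=1.0, dt=0.01, max_steps=100_000):
    """Toy limiting-factor model: return each origin's firing time (None if it never fires)."""
    n = len(propensities)
    bound = [0.0] * n      # factor accumulated at each origin
    fired = [None] * n     # firing time of each origin
    free = float(pool)     # factor left in the shared nuclear pool
    for step in range(1, max_steps + 1):
        waiting = [i for i in range(n) if fired[i] is None]
        if not waiting:
            break
        # Each waiting origin draws factor in proportion to its propensity and
        # to the free pool, capped at the amount it still needs to fire.
        demand = {i: min(propensities[i] * free * dt, threshold - bound[i])
                  for i in waiting}
        total = sum(demand.values())
        scale = min(1.0, free / total) if total > 0 else 0.0
        for i in waiting:
            bound[i] += demand[i] * scale
        free -= total * scale
        # Assumption: an origin that reaches the threshold fires and returns
        # its factor to the pool, making it available to weaker origins.
        for i in waiting:
            if bound[i] >= threshold - 1e-9:
                fired[i] = step * dt
                free += bound[i]
                bound[i] = 0.0
    return fired


if __name__ == "__main__":
    # Three origins compete for a pool too small to load all of them at once.
    print(simulate_firing([5.0, 1.0, 0.2], pool=1.5))
```

With a pool smaller than the combined requirement of all origins, the high-propensity origin fires first and the others fire only after factor is released, which is the qualitative ordering the model predicts. Enlarging the pool in this sketch shortens the delay for low-propensity origins.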
The model was fitted to the temporal dataset described above and used to produce a simulated dataset. The fitted parameter values are plausible and the simulated data are highly correlated with experimental results.

To determine what allows some origins to attract factors such as Cdc45 better than others, we analyzed transcription, chromatin modifications and chromatin structure in the regions surrounding origins. We found, contrary to results in higher eukaryotes, that origin activity is not correlated with flanking genic activity in WT yeast cells. Also, no single histone modification type in the vicinity of the origins was predictive of their HU-efficiency. However, we determined that HU-efficient origins have more defined phasing of nucleosomes around their ACS, and that origins that form long-range chromatin interactions with one another are HU-efficient.

In contrast to the above result (a lack of correlation between histone modifications and origin HU-efficiency), the HDAC Rpd3 has been previously reported to repress the firing of several origins. To better grasp the scope of Rpd3's action at origins, we analyzed WT and rpd3 cells replicating in HU with BrdU-IP-chip. We determined that over one hundred origins are repressed by Rpd3 action. Furthermore, we found that this regulation is through its action in the Rpd3L complex (a transcriptional repressor that is recruited to origins by several DNA-binding proteins) as opposed to its action in Rpd3S (a complex that acts to suppress spurious transcription throughout the genome). Finally, we found that Rpd3-regulated origins were localized where Rpd3 has been shown to act as an HDAC, and also that Rpd3 binding and transcription regulation were over-represented at these sites.

We have also identified Fkh1 and Fkh2 as factors that influence origin activity. Fkh1 alone suppresses the HU-efficiency of many origins while also activating the HU-efficiency of others.
Fkh2 alone does not regulate origin firing; however, when both Fkh1 and Fkh2 action is removed, 100 origins are increased in their HU-efficiency while an almost equal number are reduced in their activity. A closer examination reveals that the majority of Fkh-excited origins are non-CEN-proximal, while CEN-proximal origins tend to be Fkh-repressed. Also, in concordance with the above results, changes in HU-efficiency at Fkh-regulated origins are coupled with a corresponding change in their firing schedule (Fkh-excited origins become later-firing, while Fkh-repressed origins become earlier-firing). An analysis of ORC and MCM binding revealed no differences between WT and fkh1fkh2 cells. However, in fkh1fkh2 cells, Cdc45 is depleted at Fkh-excited origins and the origins that are enriched for Cdc45 are primarily CEN-proximal. In fkh1fkh2 cells, genes flanking Fkh-regulated origins do not show changes that correlate with the changes seen in HU-efficiency. In asynchronous cells there is no discernible difference between WT and fkh1fkh2 cells in nucleosome phasing around the ACS at Fkh-regulated origins. However, there is some evidence (at Fkh-excited origins, in G1-phase) suggesting that phasing in fkh1fkh2 cells is reduced.

Forkhead proteins appear to act at Fkh-excited origins by promoting long-range interactions between origins. Using 4C, we were able to verify previous reports that the Fkh-excited origins ARS305 and ARS607 interact in the nucleus. Strikingly, this interaction is abolished in fkh1fkh2 cells. Furthermore, based on amino acid sequence alignments, we have shown that Fkh1 and Fkh2 have homology to FoxP proteins at residues that promote FoxP domain-swapped dimerization. Through directed mutagenesis we have demonstrated that altering these sites in Fkh1 or Fkh2 causes cells to replicate with a pattern of origin firing that mimics that seen in fkh1fkh2 cells.
7.2 Interpretation of Results

We have developed and validated a model of DNA replication that directly tests the hypothesis that the replication schedule of the yeast genome is dictated by individual origins' abilities to attract limiting replication proteins essential for firing. This leaves open the question of which features of an origin determine its propensity to attract such factors. An early hypothesis is that local transcription activity could play a role. In higher eukaryotes, origins localize to CpG islands and their activity is correlated with that of nearby genes. We have shown that in unperturbed yeast cells, the transcription activity of the genes flanking origins does not correlate with origin activity. An explanation for this difference may be found in the fact that yeast origins are enriched at converging intergenic regions (downstream of genes, as opposed to upstream of genes in higher eukaryotes) and, furthermore, at such loci, origins are typically found in the downstream NFR. Unlike the promoter, the downstream NFR is, in general, unregulated by transcription activators/repressors (although it may be influenced by factors such as Rpd3S). Therefore, in comparison to their upstream counterparts, the downstream NFRs of differentially expressed genes likely differ very little from one another. Thus, origins that are localized downstream of differentially expressed genes likely differ in their chromatin environment less than they would if they sat upstream of those same genes and are, hence, likely less affected by flanking genic activities.

In higher eukaryotes, DNA and histone modifications contribute to the establishment of large regions of euchromatin and heterochromatin, and origins are typically found in euchromatic regions. Furthermore, active origins are associated with several specific histone modifications. We have demonstrated that, based on the data available, no single histone modification at yeast origins is predictive of HU-efficiency.
This finding appears to contradict previous results demonstrating that HDACs regulate yeast origin function. Furthermore, in this thesis we have demonstrated that the Rpd3 HDAC regulates over 100 yeast origins. A simple explanation for the lack of correlation in WT cells is that, while histone modifications may influence origins, they are not the dominant factor doing so. If a more dominant mechanism for promoting efficiency existed, the origins that benefited from it would have HU-efficiencies that far exceed those of origins that did not. If a lesser regulating factor (e.g. acetylation) were responsible for some regulation, it would likely only affect less efficient origins, as the origins benefiting from the dominant factor would have close to 100% efficiency. Therefore, by removing the lesser regulator, one would expect to see changes in function only at less efficient origins. In concordance with this hypothesis, when Rpd3 is removed, the majority of origins that show increases in their activity are of low efficiency.

The dominant regulator of origin efficiency is likely involved in origin clustering within the nucleus. In higher eukaryotes the nucleus is partitioned into two compartments, and only origins that exist in one of those compartments are early replicating [216]. We have demonstrated that yeast origins that cluster in early G1-phase recruit the majority of the essential replication protein Cdc45, are highly HU-efficient and show very early firing densities. We have also shown evidence suggesting that Fkh1 and Fkh2 are regulators of at least some of this clustering, and we hypothesize that they perform this task by binding separate origins and dimerizing to bring those origins together in the nucleus.

One observation that requires reconciling with the above model is that, while many Fkh-excited origins are enriched for Fkh1 and Fkh2, Fkh binding is also observed at non-Fkh-regulated and Fkh-repressed origins (in fact, Fkh1 is enriched at Fkh-repressed origins).
How these proteins can act in opposite ways at different origins remains an open question. One explanation is that the proteins act with a yet unidentified factor and that only origins that show concurrent association of both Fkh1 or Fkh2 and this other factor benefit from clustering. We have shown that a truncated version of Fkh2 (lacking a region that has been hypothesized to contain multiple phosphorylation sites), when transformed into the cells in a high-copy vector, rescues fkh1fkh2 cells from pseudohyphal growth, but does not rescue origin function. This indicates that these putative sites of modification are important for Fkh2's action at origins and promotes the idea that a second factor may be acting to modify these proteins in a manner that is specific to their actions at origins.

A second hypothesis for the above discrepancy is that other factors (along with Fkh1 and Fkh2) influence higher-order genomic packaging. For example, DNA sequence alone is predictive of a portion of the structure that was analyzed in Figure 3.6 D (Dr. F. Alber, pers. comm.). In order for Fkh1 or Fkh2 to dimerize, they may be required to be in the same general vicinity in the nucleus, and their monomeric versus dimeric status would likely be determined by these other factors. In this model, origins whose clustering is Fkh-dependent occupy the same regions of the nucleus even without Fkh action, while origins that are Fkh-unregulated or Fkh-repressed are isolated. Two plausible explanations for the existence of Fkh-repressed origins exist under this model. In the first, Fkh1 (which is enriched at Fkh-repressed origins and is a known silencer) acts to repress all origins it localizes at when in its monomeric form. However, when dimerized, Fkh1 may lose its repressive action, or this action may simply be overpowered by the advantages of origin clustering. When Fkh action is removed (e.g.
in fkh1fkh2 cells), origins that do not cluster in G1-phase only lose the repressive action of Fkh1 and are, thus, increased in their HU-efficiency, while those origins that do cluster lose this advantage and, thus, lose HU-efficiency. A second possibility is that Fkh-repressed origins are not repressed directly by Fkh action; rather, in fkh1fkh2 cells, when normally highly efficient origins are reduced in their propensity to attract limiting factors, the relative propensities of less efficient origins increase, causing an increase in their HU-efficiency.

7.3 Concluding Remarks

Replication origin efficiencies and firing times are well conserved in simple eukaryotes such as yeast and also in higher-order organisms such as metazoans and mammals. Furthermore, in higher eukaryotes, differences are seen in the replication schedule of the cell as it moves from one developmental stage to the next. Timing has been implicated in epigenetic inheritance and also in genome stability; thus, understanding the mechanisms responsible for defining origin firing schedules is important to understanding healthy cell proliferation. Here we have shown that the timing schedule is determined by origins' ability to recruit limiting replication factors, and also that this ability is defined by their propensity to cluster with one another in the nucleus. Furthermore, we have shown that the proteins responsible for (at least some of) yeast origin clustering show important similarities to two members of the human FoxP protein family, which are important for development, neuronal plasticity and immune response. Future research should focus on determining whether FoxP proteins also regulate replication timing and whether this regulation is coupled to their function in these three areas.

Bibliography

[1] G. Abdurashidova, M. Deganuto, R. Klima, S. Riva, G. Biamonti, M. Giacca, and A. Falaschi. Start sites of bidirectional DNA synthesis at the human lamin B2 origin. Science, 287(5460):2023–6, 2000.
[2] B. D.
Aggarwal and B. R. Calvi. Chromatin regulates origin activity in Drosophila follicle cells. Nature, 430(6997):372–6, 2004.
[3] I. Albert, T. N. Mavrich, L. P. Tomsho, J. Qi, S. J. Zanton, S. C. Schuster, and B. F. Pugh. Translational and rotational settings of H2A.Z nucleosomes across the Saccharomyces cerevisiae genome. Nature, 446(7135):572–6, 2007.
[4] A. A. Alcasabas, A. J. Osborn, J. Bachant, F. Hu, P. J. Werler, K. Bousset, K. Furuya, J. F. Diffley, A. M. Carr, and S. J. Elledge. Mrc1 transduces signals of DNA replication stress to activate Rad53. Nat Cell Biol, 3(11):958–65, 2001.
[5] A. A. Alekseyenko, E. Larschan, W. R. Lai, P. J. Park, and M. I. Kuroda. High-resolution ChIP-chip analysis reveals that the Drosophila MSL selectively identifies active genes on the male X chromosome. Genes Dev, 20(7):848–857, 2006.
[6] J. B. Allen, Z. Zhou, W. Siede, E. C. Friedberg, and S. J. Elledge. The SAD1/RAD53 protein kinase controls multiple checkpoints and DNA damage-induced transcription in yeast. Genes Dev, 8(20):2401–15, 1994.
[7] G. M. Alvino, D. Collingwood, J. M. Murphy, J. Delrow, B. J. Brewer, and M. K. Raghuraman. Replication in hydroxyurea: it's a matter of time. Mol Cell Biol, 27(18):6396–406, 2007.
[8] S. Anders and W. Huber. Differential expression analysis for sequence count data. Genome Biol, 11(10):R106, 2010.
[9] J. G. Aparicio, C. J. Viggiani, D. G. Gibson, and O. M. Aparicio. The Rpd3-Sin3 histone deacetylase regulates replication timing and enables intra-S origin control in Saccharomyces cerevisiae. Mol Cell Biol, 24(11):4769–80, 2004.
[10] O. M. Aparicio. Characterization of proteins bound to chromatin by immunoprecipitation from whole-cell extracts, volume 4, pages 21.3.1–21.3.12. John Wiley and Sons, Inc., New York, 1999.
[11] O. M. Aparicio, A. M. Stout, and S. P. Bell. Differential assembly of Cdc45p and DNA polymerases at early and late origins of DNA replication. Proc Natl Acad Sci USA, 96(16):9130–5, 1999.
[12] O. M. Aparicio, D. M.
Weinstein, and S. P. Bell. Components and dynamics of DNA replication complexes in S. cerevisiae: redistribution of MCM proteins and Cdc45p during S-phase. Cell, 91(1):59–69, 1997.
[13] H. Araki, S. H. Leem, A. Phongdara, and A. Sugino. Dpb11, which interacts with DNA polymerase II (epsilon) in Saccharomyces cerevisiae, has a dual role in S-phase progression and at a cell cycle checkpoint. Proc Natl Acad Sci USA, 92(25):11791–11795, 1995.
[14] R. J. Austin, T. L. Orr-Weaver, and S. P. Bell. Drosophila ORC specifically binds to ACE3, an origin of DNA replication control element. Genes Dev, 13(20):2639–49, 1999.
[15] A. Barski, S. Cuddapah, K. Cui, T. Y. Roh, D. E. Schones, Z. Wang, G. Wei, I. Chepelev, and K. Zhao. High-resolution profiling of histone methylations in the human genome. Cell, 129(4):823–37, 2007.
[16] M. A. Beaumont, W. Zhang, and D. J. Balding. Approximate Bayesian Computation in population genetics. Genetics, 162(4):2025–35, 2002.
[17] S. P. Bell and A. Dutta. DNA replication in eukaryotic cells. Annu Rev Biochem, 71:333–74, 2002.
[18] S. P. Bell and B. Stillman. ATP-dependent recognition of eucaryotic origins of DNA replication by a multiprotein complex. Nature, 357(May 14):128–134, 1992.
[19] N. M. Berbenetz, C. Nislow, and G. W. Brown. Diversity of eukaryotic DNA replication origins revealed by genome-wide analysis of chromatin structure. PLoS Genet, 6(9), 2010.
[20] R. Berezney, D. D. Dubey, and J. A. Huberman. Heterogeneity of eukaryotic replicons, replicon clusters, and replication foci. Chromosoma, 108(8):471–84, 2000.
[21] B. E. Bernstein, J. K. Tong, and S. L. Schreiber. Genomewide studies of histone deacetylase function in yeast. Proc Natl Acad Sci USA, 97(25):13708–13, 2000.
[22] E. Birney, J. A. Stamatoyannopoulos, A. Dutta, R. Guigo, T. R. Gingeras, E. H. Margulies, Z. Weng, M. Snyder, E. T. Dermitzakis, R. E. Thurman, M. S. Kuehn, C. M. Taylor, S. Neph, C. M. Koch, S. Asthana, A. Malhotra, I. Adzhubei, J. A. Greenbaum, R. M.
Andrews, P. Flicek, P. J. Boyle, H. Cao, N. P. Carter, G. K. Clelland, S. Davis, N. Day, P. Dhami, S. C. Dillon, M. O. Dorschner, H. Fiegler, P. G. Giresi, J. Goldy, M. Hawrylycz, A. Haydock, R. Humbert, K. D. James, B. E. Johnson, E. M. Johnson, T. T. Frum, E. R. Rosenzweig, N. Karnani, K. Lee, G. C. Lefebvre, P. A. Navas, F. Neri, S. C. Parker, P. J. Sabo, R. Sandstrom, A. Shafer, D. Vetrie, M. Weaver, S. Wilcox, M. Yu, F. S. Collins, J. Dekker, J. D. Lieb, T. D. Tullius, G. E. Crawford, S. Sunyaev, W. S. Noble, I. Dunham, F. Denoeud, A. Reymond, P. Kapranov, J. Rozowsky, D. Zheng, R. Castelo, A. Frankish, J. Harrow, S. Ghosh, A. Sandelin, I. L. Hofacker, R. Baertsch, D. Keefe, S. Dike, J. Cheng, H. A. Hirsch, E. A. Sekinger, J. Lagarde, J. F. Abril, A. Shahab, C. Flamm, C. Fried, J. Hackermuller, J. Hertel, M. Lindemeyer, K. Missal, A. Tanzer, S. Washietl, J. Korbel, O. Emanuelsson, J. S. Pedersen, N. Holroyd, R. Taylor, D. Swarbreck, N. Matthews, M. C. Dickson, D. J. Thomas, M. T. Weirauch, J. Gilbert, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature, 447(7146):799–816, 2007.
[23] D. Biswas, S. Takahata, and D. J. Stillman. Different genetic functions for the Rpd3(L) and Rpd3(S) complexes suggest competition between NuA4 and Rpd3(S). Mol Cell Biol, 28(14):4445–58, 2008.
[24] J. J. Blow, P. J. Gillespie, D. Francis, and D. A. Jackson. Replication origins in Xenopus egg extract are 5-15 kilobases apart and are activated in clusters that fire at different times. J Cell Biol, 152(1):15–25, 2001.
[25] B. Bolstad, R. Irizarry, M. Astrand, and T. Speed. A comparison of normalization methods for high density oligonucleotide array data based on bias and variance. Bioinformatics, 19:185–193, 2003.
[26] A. S. Brewster, G. Wang, X. Yu, W. B. Greenleaf, J. M. Carazo, M. Tjajadia, M. G. Klein, and X. S. Chen.
Crystal structure of a near-full-length archaeal MCM: functional insights for an AAA+ hexameric helicase. Proc Natl Acad Sci USA, 105(51):20191–6, 2008.
[27] J. R. Broach, Y. Y. Li, J. Feldman, M. Jayaram, J. Abraham, K. A. Nasmyth, and J. B. Hicks. Localization and sequence analysis of yeast origins of DNA replication. Cold Spring Harb Symp Quant Biol, 47(Pt 2):1165–73, 1983.
[28] M. J. Buck, A. B. Nobel, and J. D. Lieb. ChIPOTle: a user-friendly tool for the analysis of ChIP-chip data. Genome Biol, 6(11):R97, 2005.
[29] T. W. Burke, J. G. Cook, M. Asano, and J. R. Nevins. Replication factors MCM2 and ORC1 interact with the histone acetyltransferase HBO1. J Biol Chem, 276(18):15397–408, 2001.
[30] J. C. Cadoret and M. N. Prioleau. Genome-wide approaches to determining origin distribution. Chromosome Res, 18(1):79–89, 2010.
[31] A. Calzada, B. Hodgson, M. Kanemaki, A. Bueno, and K. Labib. Molecular anatomy and regulation of a stable replisome at a paused eukaryotic DNA replication fork. Genes Dev, 19(16):1905–19, 2005.
[32] P. B. Carpenter, P. R. Mueller, and W. G. Dunphy. Role for a Xenopus Orc2-related protein in controlling DNA replication. Nature, 379(6563):357–360, 1996.
[33] M. J. Carrozza, L. Florens, S. K. Swanson, W. J. Shia, S. Anderson, J. Yates, M. P. Washburn, and J. L. Workman. Stable incorporation of sequence specific repressors Ash1 and Ume6 into the Rpd3L complex. Biochim Biophys Acta, 1731(2):77–87; discussion 75–6, 2005.
[34] M. J. Carrozza, B. Li, L. Florens, T. Suganuma, S. K. Swanson, K. K. Lee, W. J. Shia, S. Anderson, J. Yates, M. P. Washburn, and J. L. Workman. Histone H3 methylation by Set2 directs deacetylation of coding regions by Rpd3S to suppress spurious intragenic transcription. Cell, 123(4):581–92, 2005.
[35] S. E. Celniker, K. Sweder, F. Srienc, J. E. Bailey, and J. L. Campbell. Deletion mutations affecting autonomously replicating sequence ARS1 of Saccharomyces cerevisiae. Mol Cell Biol, 4(11):2455–66, 1984.
[36] R. S. Cha and N.
Kleckner. ATR homolog Mec1 promotes fork progression, thus averting breaks in replication slow zones. Science, 297(5581):602–6, 2002.
[37] I. Chesnokov, D. Remus, and M. Botchan. Functional analysis of mutant and wild-type Drosophila origin recognition complex. Proc Natl Acad Sci USA, 98(21):11997–2002, 2001.
[38] J. P. Chong, M. K. Hayashi, M. N. Simon, R. M. Xu, and B. Stillman. A double-hexamer archaeal minichromosome maintenance protein is an ATP-dependent DNA helicase. Proc Natl Acad Sci USA, 97(4):1530–5, 2000.
[39] R. K. Clyne and T. J. Kelly. Genetic analysis of an ARS element from the fission yeast Schizosaccharomyces pombe. EMBO J, 14(24):6348–6357, 1995.
[40] T. R. Coleman, P. B. Carpenter, and W. G. Dunphy. The Xenopus Cdc6 protein is essential for the initiation of a single round of DNA replication in cell-free extracts. Cell, 87:53–63, 1996.
[41] P. R. Cook. Predicting three-dimensional genome structure from transcriptional activity. Nat Genet, 32(3):347–52, 2002.
[42] A. Crampton, F. Chang, D. L. Pappas Jr., R. L. Frisch, and M. Weinreich. An ARS element inhibits DNA replication through a SIR2-dependent mechanism. Mol Cell, 30(2):156–66, 2008.
[43] C. Dahmann, J. F. Diffley, and K. A. Nasmyth. S-phase-promoting cyclin-dependent kinases prevent re-replication by inhibiting the transition of replication origins to a pre-replicative state. Curr Biol, 5(11):1257–1269, 1995.
[44] J. Dai, R. Y. Chuang, and T. J. Kelly. DNA replication origins in the Schizosaccharomyces pombe genome. Proc Natl Acad Sci USA, 102(2):337–42, 2005.
[45] A. P. de Moura, R. Retkute, M. Hawkins, and C. A. Nieduszynski. Mathematical modelling of whole chromosome replication. Nucleic Acids Res, 38(17):5623–33, 2010.
[46] J. Dekker, K. Rippe, M. Dekker, and N. Kleckner. Capturing chromosome conformation. Science, 295(5558):1306–11, 2002.
[47] C. S. Detweiler and J. J. Li.
Ectopic induction of Clb2 in early G1 phase is sufficient to block prereplicative complex formation in Saccharomyces cerevisiae. Proc Natl Acad Sci USA, 95(5):2384–9, 1998.
[48] J. F. Diffley, J. H. Cocker, S. J. Dowell, and A. Rowley. Two steps in the assembly of complexes at yeast replication origins in vivo. Cell, 78(2):303–316, 1994.
[49] J. F. Diffley and K. Labib. The chromosome replication cycle. J Cell Sci, 115(Pt 5):869–72, 2002.
[50] P. A. Dijkwel, S. Wang, and J. L. Hamlin. Initiation sites are distributed at frequent intervals in the Chinese hamster dihydrofolate reductase origin of replication but are used with very different efficiencies. Mol Cell Biol, 22(9):3053–65, 2002.
[51] Y. Doyon, C. Cayrou, M. Ullah, A. J. Landry, V. Cote, W. Selleck, W. S. Lane, S. Tan, X. J. Yang, and J. Cote. ING tumor suppressor proteins are critical regulators of chromatin acetylation required for genome expression and perpetuation. Mol Cell, 21(1):51–64, 2006.
[52] L. S. Drury, G. Perkins, and J. F. Diffley. The cyclin-dependent kinase Cdc28p regulates distinct modes of Cdc6p proteolysis during the budding yeast cell cycle. Curr Biol, 10(5):231–40, 2000.
[53] Z. Duan, M. Andronescu, K. Schutz, S. McIlwain, Y. J. Kim, C. Lee, J. Shendure, S. Fields, C. A. Blau, and W. S. Noble. A three-dimensional model of the yeast genome. Nature, 465(7296):363–7, 2010.
[54] D. D. Dubey, S. M. Kim, I. T. Todorov, and J. A. Huberman. Large, complex modular structure of a fission yeast DNA replication origin. Curr Biol, 6(4):467–473, 1996.
[55] B. P. Duncker, I. N. Chesnokov, and B. J. McConkey. The origin recognition complex protein family. Genome Biol, 10(3):214, 2009.
[56] B. P. Duncker, K. Shimada, M. Tsai-Pflugfelder, P. Pasero, and S. M. Gasser. An N-terminal domain of Dbf4p mediates interaction with both origin recognition complex (ORC) and Rad53p and can deregulate late origin firing. Proc Natl Acad Sci USA, 99(25):16087–92, 2002.
[57] M. L. Eaton, K. Galani, S. Kang, S. P. Bell, and D. M.
MacAlpine. Conserved nucleosome positioning defines replication origins. Genes Dev, 24(8):748–53, 2010.
[58] S. Elsasser, Y. Chi, P. Yang, and J. L. Campbell. Phosphorylation controls timing of Cdc6p destruction: a biochemical analysis. Mol Biol Cell, 10(10):3263–77, 1999.
[59] S. Elsasser, F. Lou, B. Wang, J. L. Campbell, and A. Jong. Interaction between yeast Cdc6 protein and B-type cyclin/Cdc28 kinases. Mol Biol Cell, 7(11):1723–35, 1996.
[60] A. Emili. MEC1-dependent phosphorylation of Rad9p in response to DNA damage. Mol Cell, 2(2):183–9, 1998.
[61] D. Fachinetti, R. Bermejo, A. Cocito, S. Minardi, Y. Katou, Y. Kanoh, K. Shirahige, A. Azvolinsky, V. A. Zakian, and M. Foiani. Replication termination at eukaryotic chromosomes is mediated by Top2 and occurs at genomic loci containing pausing elements. Mol Cell, 39(4):595–605, 2010.
[62] T. G. Fazzio, C. Kooperberg, J. P. Goldmark, C. Neal, R. Basom, J. Delrow, and T. Tsukiyama. Widespread collaboration of Isw2 and Sin3-Rpd3 chromatin remodeling complexes in transcriptional repression. Mol Cell Biol, 21(19):6450–60, 2001.
[63] W. Feng, D. Collingwood, M. E. Boeck, L. A. Fox, G. M. Alvino, W. L. Fangman, M. K. Raghuraman, and B. J. Brewer. Genomic mapping of single-stranded DNA in hydroxyurea-challenged yeasts identifies origins of replication. Nat Cell Biol, 8(2):148–55, 2006.
[64] Y. Field, N. Kaplan, Y. Fondufe-Mittendorf, I. K. Moore, E. Sharon, Y. Lubling, J. Widom, and E. Segal. Distinct modes of regulation by chromatin encoded through nucleosome positioning signals. PLoS Comput Biol, 4(11):e1000216, 2008.
[65] C. A. Fox, S. Loo, A. Dillin, and J. Rine. The origin recognition complex has essential functions in transcriptional silencing and chromosomal replication. Genes Dev, 9(8):911–924, 1995.
[66] K. L. Friedman and B. J. Brewer. Analysis of replication intermediates by two-dimensional agarose gel electrophoresis. Methods Enzymol, 262:613–27, 1995.
[67] K. L. Friedman, B. J. Brewer, and W. L.
Fangman. Replication profile of Saccharomyces cerevisiae Chromosome VI. Genes Cells, 2(11):667–78, 1997.
[68] M. Fujita, T. Kiyono, Y. Hayashi, and M. Ishibashi. hCDC47, a human member of the MCM family. Dissociation of the nucleus-bound form during S-phase. J Biol Chem, 271(8):4349–4354, 1996.
[69] L. Furstenthal, B. K. Kaiser, C. Swanson, and P. K. Jackson. Cyclin E uses Cdc6 as a chromatin-associated receptor required for DNA replication. J Cell Biol, 152(6):1267–78, 2001.
[70] N. Garg and D. Hochbaum. An O(log k) approximation algorithm for the k minimum spanning tree problem in the plane. Algorithmica, 18(1):111–121, 1997.
[71] P. Garg and P. M. Burgers. DNA polymerases that propagate the eukaryotic DNA replication fork. Crit Rev Biochem Mol Biol, 40(2):115–28, 2005.
[72] M. G. Gauthier and J. Bechhoefer. Control of DNA replication by anomalous reaction-diffusion kinetics. Phys Rev Lett, 102(15):158104, 2009.
[73] F. D. Gibbons, M. Proft, K. Struhl, and F. P. Roth. Chipper: discovering transcription-factor targets from chromatin immunoprecipitation microarrays using variance stabilization. Genome Biol, 6(11):R96, 2005.
[74] D. M. Gilbert. Making sense of eukaryotic DNA replication origins. Science, 294:96–100, 2001.
[75] D. M. Gilbert. Evaluating genome-scale approaches to eukaryotic DNA replication. Nat Rev Genet, 11(10):673–84, 2010.
[76] A. Goldar, H. Labit, K. Marheineke, and O. Hyrien. A dynamic stochastic model for DNA replication initiation in early embryos. PLoS One, 3(8):e2919, 2008.
[77] J. P. Goldmark, T. G. Fazzio, P. W. Estep, G. M. Church, and T. Tsukiyama. The Isw2 chromatin remodeling complex represses early meiotic genes upon recruitment by Ume6p. Cell, 103(3):423–33, 2000.
[78] A. Gondor and R. Ohlsson. Replication timing and epigenetic reprogramming of gene expression: a two-way relationship? Nat Rev Genet, 10(4):269–276, 2009.
[79] A. Goren, A. Tabib, M. Hecht, and H. Cedar.
DNA replication timing of the human beta-globin domain is controlled by histone modification at the origin. Genes Dev, 22(10):1319–24, 2008.
[80] M. V. Granovskaia, L. J. Jensen, M. E. Ritchie, J. Toedling, Y. Ning, P. Bork, W. Huber, and L. M. Steinmetz. High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Genome Biol, 11(3):R24, 2010.
[81] E. Guillou, A. Ibarra, V. Coulon, J. Casado-Vela, D. Rico, I. Casal, E. Schwob, A. Losada, and J. Mendez. Cohesin organizes chromatin loops at DNA replication factories. Genes Dev, 24(24):2812–22, 2010.
[82] S. Hannenhalli and K. H. Kaestner. The evolution of Fox genes and their role in development and disease. Nat Rev Genet, 10(4):233–40, 2009.
[83] C. T. Harbison, D. B. Gordon, T. I. Lee, N. J. Rinaldi, K. D. Macisaac, T. W. Danford, N. M. Hannett, J. B. Tagne, D. B. Reynolds, J. Yoo, E. G. Jennings, J. Zeitlinger, D. K. Pokholok, M. Kellis, P. A. Rolfe, K. T. Takusagawa, E. S. Lander, D. K. Gifford, E. Fraenkel, and R. A. Young. Transcriptional regulatory code of a eukaryotic genome. Nature, 431(7004):99–104, 2004.
[84] C. F. Hardy. Identification of Cdc45p, an essential factor required for DNA replication. Gene, 187(2):239–46, 1997.
[85] C. Heichinger, C. J. Penkett, J. Bahler, and P. Nurse. Genome-wide characterization of fission yeast DNA replication origins. EMBO J, 25(21):5171–9, 2006.
[86] J. Herrick, P. Stanislawski, O. Hyrien, and A. Bensimon. Replication fork density increases during DNA synthesis in X. laevis egg extracts. J Mol Biol, 300(5):1133–42, 2000.
[87] I. Hiratani, T. Ryba, M. Itoh, T. Yokochi, M. Schwaiger, C. W. Chang, Y. Lyou, T. M. Townes, D. Schubeler, and D. M. Gilbert. Global reorganization of replication domains during embryonic stem cell differentiation. PLoS Biol, 6(10):e245, 2008.
[88] B. Hodgson, A. Calzada, and K. Labib. Mrc1 and Tof1 regulate DNA replication forks in different ways during normal S-phase. Mol Biol Cell, 18(10):3894–902, 2007.
[89] P. C. Hollenhorst, M.
E. Bose, M. R. Mielke, U. Muller, and C. A. Fox. Forkhead genes in transcriptional silencing, cell morphology and the cell cycle. Overlapping and distinct functions for FKH1 and FKH2 in Saccharomyces cerevisiae. Genetics, 154(4):1533–48, 2000.
[90] L. Homesley, M. Lei, Y. Kawasaki, S. Sawyer, T. Christensen, and B. K. Tye. Mcm10 and the MCM2-7 complex interact to initiate DNA synthesis and to release replication factors from origins. Genes Dev, 14(8):913–26, 2000.
[91] T. C. Hsu, W. Schmid, and E. Stubblefield. DNA replication sequence in higher animals. Academic Press, New York, p. 83, 1964.
[92] X. H. Hua and J. Newport. Identification of a preinitiation step in DNA replication that is independent of origin recognition complex and cdc6, but dependent on cdk2. J Cell Biol, 140(2):271–81, 1998.
[93] X. H. Hua, H. Yan, and J. Newport. A role for Cdk2 kinase in negatively regulating DNA replication during S-phase of the cell cycle. J Cell Biol, 137(1):183–92, 1997.
[94] R. Y. Huang and D. Kowalski. Multiple DNA elements in ARS305 determine replication origin activity in a yeast chromosome. Nucleic Acids Res, 24(5):816–23, 1996.
[95] U. Hubscher, G. Maga, and V. N. Podust. DNA replication accessory proteins, pages 525–44. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, 1996.
[96] M. Iizuka, T. Matsui, H. Takisawa, and M. M. Smith. Regulation of replication licensing by acetyltransferase Hbo1. Mol Cell Biol, 26(3):1098–108, 2006.
[97] M. Iizuka and B. Stillman. Histone acetyltransferase HBO1 interacts with the ORC1 subunit of the human initiator protein. J Biol Chem, 274(33):23027–34, 1999.
[98] Y. Ishimi. A DNA helicase activity is associated with an MCM4, -6, and -7 protein complex. J Biol Chem, 272(39):24508–13, 1997.
[99] Y. Ishimi and Y. Komamura-Kohno. Phosphorylation of Mcm4 at specific sites by cyclin-dependent kinase leads to loss of Mcm4,6,7 helicase activity. J Biol Chem, 276(37):34428–33, 2001.
[100] A. S. Ivessa, B. A. Lenzmeier, J. B.
Bessler, L. K. Goudsouzian, S. L. Schnakenberg, and V. A. Zakian. The Saccharomyces cerevisiae helicase Rrm3p facilitates replication past non-histone protein-DNA complexes. Mol Cell, 12(6):1525{36, 2003. 163 [101] D. A. Jackson and A. Pombo. Replicon clusters are stable units of chromosome structure: evidence that nuclear organization contributes to the ecient acti- vation and propagation of S-phase in human cells. J Cell Biol, 140(6):1285{95, 1998. [102] S. P. Jackson. The recognition of DNA damage. Curr Opin Genet Dev, 6(1):19{ 25, 1996. [103] P. V. Jallepalli, G. W. Brown, M. Muzi-Falconi, D. Tien, and T. J. Kelly. Reg- ulation of the replication initiator protein p65cdc18 by CDK phosphorylation. Genes Dev, 11(21):2767{79, 1997. [104] P. Jares, A. Donaldson, and J. J. Blow. The Cdc7/Dbf4 protein kinase: target of the S-phase checkpoint? EMBO Rep, 1(4):319{22, 2000. [105] C. Jiang and B. F. Pugh. A compiled and systematic reference map of nu- cleosome positions across the Saccharomyces cerevisiae genome. Genome Biol, 10(10):R109, 2009. [106] W. Jiang, D. McDonald, T. J. Hope, and T. Hunter. Mammalian Cdc7-Dbf4 protein kinase complex is essential for initiation of DNA replication. Embo J, 18(20):5703{13, 1999. [107] W. Jiang, N. J. Wells, and T. Hunter. Multistep regulation of DNA replication by Cdk phosphorylation of HsCdc6. Proc Natl Acad Sci USA, 96(11):6193{8, 1999. [108] W.E. Johnson, W. Li, C.A. Meyer, R. Gottardo, J.S. Carroll, M. Brown, and X.S. Liu. Model-based analysis of tiling-arrays for ChIP-chip. Proc Natl Acad Sci USA, 103(33):12457{12462, 2006. [109] A. A. Joshi and K. Struhl. Eaf3 chromodomain interaction with methylated H3-K36 links histone deacetylation to Pol II elongation. Mol Cell, 20(6):971{8, 2005. [110] D. Kadosh and K. Struhl. Histone deacetylase activity of Rpd3 is important for transcriptional repression in vivo. Genes Dev, 12(6):797{805., 1998. [111] D. Kadosh and K. Struhl. 
Targeted recruitment of the Sin3-Rpd3 histone deacetylase complex generates a highly localized domain of repressed chromatin in vivo. Mol Cell Biol, 18(9):5121{7., 1998. [112] Y. Kamimura, H. Masumoto, A. Sugino, and H. Araki. Sld2, which interacts with Dpb11 in Saccharomyces cerevisiae, is required for chromosomal DNA replication. Mol Cell Biol, 18(10):6102{9., 1998. 164 [113] Y. Kamimura, Y. S. Tak, A. Sugino, and H. Araki. Sld3, which interacts with Cdc45 (sld4), functions for chromosomal DNA replication in Saccharomyces cerevisiae. Embo J, 20(8):2097{107., 2001. [114] N. Kaplan, I. K. Moore, Y. Fondufe-Mittendorf, A. J. Gossett, D. Tillo, Y. Field, E. M. LeProust, T. R. Hughes, J. D. Lieb, J. Widom, and E. Segal. The DNA-encoded nucleosome organization of a eukaryotic genome. Nature, 458(7236):362{6, 2009. [115] T. Kaplan, C. L. Liu, J. A. Erkmann, J. Holik, M. Grunstein, P. D. Kaufman, N. Friedman, and O. J. Rando. Cell cycle- and chaperone-mediated regulation of H3K56ac incorporation in yeast. PLoS Genet, 4(11):e1000270, 2008. [116] Y. Katou, Y. Kanoh, M. Bando, H. Noguchi, H. Tanaka, T. Ashikari, K. Sug- imoto, and K. Shirahige. S-phase checkpoint proteins Tof1 and Mrc1 form a stable replication-pausing complex. Nature, 424(6952):1078{83, 2003. [117] Y. Kawasaki, S. Hiraga, and A. Sugino. Interactions between Mcm10p and other replication factors are required for proper initiation and elongation of chromo- somal DNA replication in Saccharomyces cerevisiae. Genes Cells, 5(12):975{89, 2000. [118] U. Keich, H Gao, J.S. Garretson, A. Bhaskar, I Liachko, J Donato, and B.K. Tye. Computational detection of signicant variation in binding anities across two sets of sequences with application to the analysis of replication origins in yeast. BMC Bioinformatics, 9(372):doi:10.1186/1471{2105{9{372, 2008. [119] Z. Kelman, J. K. Lee, and J. Hurwitz. The single minichromosome maintenance protein of Methanobacterium thermoautotrophicum DeltaH contains DNA heli- case activity. 
Proc Natl Acad Sci USA, 96(26):14783{8, 1999. [120] M. C. Keogh, S. K. Kurdistani, S. A. Morris, S. H. Ahn, V. Podolny, S. R. Collins, M. Schuldiner, K. Chin, T. Punna, N. J. Thompson, C. Boone, A. Emili, J. S. Weissman, T. R. Hughes, B. D. Strahl, M. Grunstein, J. F. Greenblatt, S. Buratowski, and N. J. Krogan. Cotranscriptional set2 methylation of histone H3 lysine 36 recruits a repressive Rpd3 complex. Cell, 123(4):593{605, 2005. [121] S. M. Kim and J. A. Huberman. Multiple orientation-dependent, synergistically interacting, similar domains in the ribosomal DNA replication origin of the ssion yeast, Schizosaccharomyces pombe. Mol Cell Biol, 18(12):7294{303, 1998. [122] S. M. Kim and J. A. Huberman. Regulation of replication timing in ssion yeast. EMBO J, 20(21):6115{26, 2001. [123] E. Kitamura, J. J. Blow, and T. U. Tanaka. Live-cell imaging reveals replication of individual replicons in eukaryotic replication factories. Cell, 125(7):1297{308, 2006. 165 [124] D. Kitsberg, S. Selig, I. Keshet, and H. Cedar. Replication structure of the human beta-globin domain. Nature, 366(Dec 9):588{590, 1993. [125] R. D. Klemm, R. J. Austin, and S. P. Bell. Coordinate binding of ATP and origin DNA regulates the ATPase activity of the Origin Recognition Complex. Cell, 88(Feb. 21):493{502, 1997. [126] S. R. Knott, C. J. Viggiani, and O. M. Aparicio. To promote and protect: co- ordinating DNA replication and transcription for genome stability. Epigenetics, 4(6):362{5, 2009. [127] T. Kouzarides. Chromatin modications and their function. Cell, 128(4):693{ 705, 2007. [128] S. Kreitz, M. Ritzi, M. Baack, and R. Knippers. The human origin recognition complex protein 1 dissociates from chromatin during S-phase in HeLa cells. J Biol Chem, 276(9):6337{42., 2001. [129] P. J. Krysan and M. P. Calos. Replication initiates at multiple locations on an autonomously replicating plasmid in human cells. Mol Cell Biol, 11(3):1464{72, 1991. [130] P.F. Kuan, H. Chun, and K. Sunduz. 
CMARRT: A tool for the analysis of ChIP-chip data from tiling array s by incorporating the correlation structure. Pacic Symposium on Biocomputing, 13:515{526, 2008. [131] R. Kumar, D. M. Reynolds, A. Shevchenko, S. D. Goldstone, and S. Dalton. Forkhead transcription factors, Fkh1p and Fkh2p, collaborate with Mcm1p to control transcription required for M-phase. Curr Biol, 10(15):896{906, 2000. [132] S. K. Kurdistani, D. Robyr, S. Tavazoie, and M. Grunstein. Genome-wide binding map of the histone deacetylase Rpd3 in yeast. Nat Genet, 31(3):248{ 54., 2002. [133] K. Labib, J. F. Diey, and S. E. Kearsey. G1-phase and B-type cyclins exclude the DNA-replication factor Mcm4 from the nucleus. Nat Cell Biol, 1(7):415{22, 1999. [134] K. Labib, J. A. Tercero, and J. F. Diey. Uninterrupted MCM2-7 function required for DNA replication fork progression. Science, 288(5471):1643{7., 2000. [135] J. K. Lee and J. Hurwitz. Processive DNA helicase activity of the minichromo- some maintenance proteins 4, 6, and 7 complex requires forked DNA structures. Proc Natl Acad Sci USA, 98(1):54{9, 2001. [136] J. S. Lee and A. Shilatifard. A site to remember: H3K36 methylation a mark for histone deacetylation. Mutat Res, 618(1-2):130{4, 2007. 166 [137] T. I. Lee, N. J. Rinaldi, F. Robert, D. T. Odom, Z. Bar-Joseph, G. K. Gerber, N. M. Hannett, C. T. Harbison, C. M. Thompson, I. Simon, J. Zeitlinger, E. G. Jennings, H. L. Murray, D. B. Gordon, B. Ren, J. J. Wyrick, J. B. Tagne, T. L. Volkert, E. Fraenkel, D. K. Giord, and R. A. Young. Transcriptional regulatory networks in Saccharomyces cerevisiae. Science, 298(5594):799{804, 2002. [138] W. Lee, D. Tillo, N. Bray, R. H. Morse, R. W. Davis, T. R. Hughes, and C. Nislow. A high-resolution atlas of nucleosome occupancy in yeast. Nat Genet, 39(10):1235{44, 2007. [139] B. Li, M. Gogol, M. Carey, D. Lee, C. Seidel, and J. L. Workman. Combined action of PHD and chromo-domains directs the Rpd3S HDAC to transcribed chromatin. Science, 316(5827):1050{4, 2007. 
[140] W Li, CA Meyer, and XS Liu. A hidden Markov model for analyzing ChIP- chip experiments on genome tiling arrays and its application to p53 binding sequences. Bioinformatics, 21)(1):274{i282, 2005. [141] C. Liang, M. Weinreich, and B. Stillman. ORC and Cdc6p interact and de- termine the frequency of initiation of DNA replication in the genome. Cell, 81(5):667{676, 1995. [142] E. Lieberman-Aiden, N. L. van Berkum, L. Williams, M. Imakaev, T. Ragoczy, A. Telling, I. Amit, B. R. Lajoie, P. J. Sabo, M. O. Dorschner, R. Sandstrom, B. Bernstein, M. A. Bender, M. Groudine, A. Gnirke, J. Stamatoyannopoulos, L. A. Mirny, E. S. Lander, and J. Dekker. Comprehensive mapping of long- range interactions reveals folding principles of the human genome. Science, 326(5950):289{93, 2009. [143] J. R. Lipford and S. P. Bell. Nucleosomes positioned by ORC facilitate the initiation of DNA replication. Mol Cell, 7(1):21{30., 2001. [144] M. Lopes, C. Cotta-Ramusino, A. Pellicioli, G. Liberi, P. Plevani, M. Muzi- Falconi, C. S. Newlon, and M. Foiani. The DNA replication checkpoint response stabilizes stalled replication forks. Nature, 412(6846):557{61, 2001. [145] A. Lopez-Girona, O. Mondesert, J. Leatherwood, and P. Russell. Negative regulation of Cdc18 DNA replication protein by Cdc2. Mol Biol Cell, 9(1):63{ 73, 1998. [146] I. Lucas, M. Chevrier-Miller, J. M. Sogo, and O. Hyrien. Mechanisms ensuring rapid and complete DNA replication despite random initiation in Xenopus early embryos. J Mol Biol, 296(3):769{86, 2000. 167 [147] I. Lucas, T. Germe, M. Chevrier-Miller, and O. Hyrien. Topoisomerase II can unlink replicating DNA by precatenane removal. EMBO J, 20(22):6509{19, 2001. [148] J. Lygeros, K. Koutroumpas, S. Dimopoulos, I. Legouras, P. Kouretas, C. He- ichinger, P. Nurse, and Z. Lygerou. Stochastic hybrid modeling of DNA repli- cation across a complete genome. Proc Natl Acad Sci USA, 105(34):12295{300, 2008. [149] M Lyon. Chromosomal and subchromosomal inactivation. 
Annu Rev Genet, 2:31{52, 1968. [150] H. Ma, J. Samarabandu, R. S. Devdhar, R. Acharya, P. C. Cheng, C. Meng, and R. Berezney. Spatial and temporal dynamics of DNA replication sites in mammalian cells. J Cell Biol, 143(6):1415{25, 1998. [151] H. K. MacAlpine, R. Gordan, S. K. Powell, A. J. Hartemink, and D. M. MacAlpine. Drosophila ORC localizes to open chromatin and marks sites of cohesin complex loading. Genome Res, 20(2):201{11, 2009. [152] D. Maiorano, J. Moreau, and M. Mechali. XCDT1 is required for the assembly of pre-replicative complexes in Xenopus laevis. Nature, 404(6778):622{5., 2000. [153] J. Majka and P. M. Burgers. Yeast Rad17/Mec3/Ddc1: a sliding clamp for the DNA damage checkpoint. Proc Natl Acad Sci USA, 100(5):2249{54, 2003. [154] J. Majka and P. M. Burgers. The PCNA-RFC families of DNA clamps and clamp loaders. Prog Nucleic Acid Res Mol Biol, 78:227{60, 2004. [155] K. Marheineke and O. Hyrien. Aphidicolin triggers a block to replication origin ring in Xenopus egg extracts. J Biol Chem, 276(20):17092{100, 2001. [156] P. Marjoram, J. Molitor, V. Plagnol, and S. Tavar e. Markov chain Monte Carlo without likelihoods. PNAS, 100(26):15324{28., 2003. [157] H. Masumoto, A. Sugino, and H. Araki. Dpb11 controls the association between DNA polymerases alpha and epsilon and the autonomously replicating sequence region of budding yeast. Mol Cell Biol, 20(8):2809{17, 2000. [158] K. Matsumoto and Y. Ishimi. Single-stranded-DNA-binding protein-dependent DNA unwinding of the yeast ARS1 region. Mol Cell Biol, 14(7):4624{32, 1994. [159] H.J. McCune, Danielson L.S., G.M. Alvino, D. Collingwood, J.J. Delrow, Fang- man W.L., B.J. Brewer, and M.K. Raghuraman. The temporal program of chro- mosome replication: genomewide replication inclb5 Saccharomyces cerevisiae. Genetics, 180(4):1833{47, 2008. 168 [160] M. Mechali and S. Kearsey. Lack of specic sequence requirement for DNA replication in Xenopus eggs compared with high sequence specicity in yeast. Cell, 38(1), 1984. 
[161] P. Meister, A. Taddei, P. Aaron, G. Baldacci, and S.M. Gasser. Replication foci dynamics: replication patterns are modulated by S-phase checkpoint kinases in ssion yeast. EMBO, 26(5):1315{1326, 2007. [162] A. M. Merchant, Y. Kawasaki, Y. Chen, M. Lei, and B. K. Tye. A lesion in the DNA replication initiation factor Mcm10 induces pausing of elongation forks through chromosomal replication origins in Saccharomyces cerevisiae. Mol Cell Biol, 17(6):3261{71, 1997. [163] L. D. Mesner, E. L. Crawford, and J. L. Hamlin. Isolating apparently pure libraries of replication origins from complex genomes. Mol Cell, 21(5):719{26, 2006. [164] S. Mimura, T. Masuda, T. Matsui, and H. Takisawa. Central role for cdc45 in establishing an initiation complex of DNA replication in Xenopus egg extracts. Genes Cells, 5(6):439{52, 2000. [165] S. Mimura and H. Takisawa. Xenopus Cdc45-dependent loading of DNA poly- merase alpha onto chromatin under the control of S-phase Cdk. Embo J, 17(19):5699{707, 1998. [166] B. Miotto and K. Struhl. HBO1 histone acetylase activity is essential for DNA replication licensing and inhibited by Geminin. Mol Cell, 37(1):57{66, 2010. [167] B. K. Mohanty, N. K. Bairwa, and D. Bastia. The Tof1p-Csm3p protein com- plex counteracts the Rrm3p helicase to control replication termination of Sac- charomyces cerevisiae. Proc Natl Acad Sci USA, 103(4):897{902, 2006. [168] S. Mori and K. Shirahige. Perturbation of the activity of replication origin by meiosis-specic transcription. J Biol Chem, 282(7):4447{52, 2007. [169] A. Morillon, J. O'Sullivan, A. Azad, N. Proudfoot, and J. Mellor. Regulation of elongating RNA polymerase II by forkhead transcription factors in yeast. Science, 300(5618):492{5, c. [170] P. Muller, S. Park, E. Shor, D. J. Huebert, C. L. Warren, A. Z. Ansari, M. Wein- reich, M. L. Eaton, D. M. MacAlpine, and C. A. Fox. The conserved bromo- adjacent homology domain of yeast Orc1 functions in the selection of DNA replication origins within chromatin. 
Genes Dev, 24(13):1418{33, 2010. [171] H. Nakamura, T. Morita, and C. Sato. Structural organizations of replicon domains during DNA synthetic phase in the mammalian nucleus. Exp Cell Res, 165(2):291{7, 1986. 169 [172] D. A. Natale, C. J. Li, W. H. Sun, and M. L. DePamphilis. Selective insta- bility of Orc1 protein accounts for the absence of functional origin recognition complexes during the M-G(1) transition in mammals. Embo J, 19(11):2728{38., 2000. [173] M. N. Nedelcheva, A. Roguev, L. B. Dolapchiev, A. Shevchenko, H. B. Taskov, A. F. Stewart, and S. S. Stoynov. Uncoupling of unwinding from DNA synthesis implies regulation of MCM helicase by Tof1/Mrc1/Csm3 checkpoint complex. J Mol Biol, 347(3):509{21, 2005. [174] S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol, 48:443{ 453, 1970. [175] C. S. Newlon, I. Collins, A. Dershowitz, A. M. SDeshpande, S. A. Greenfeder, L. Y. Ong, and J.F. Theis. Analysis of Replication Origin Function in Chro- mosome III of Saccharomyces cerevisiae, volume LVIII, pages 415{423. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, NY, 1993. [176] C. S. Newlon, L. R. Lipchitz, I. Collins, A. Deshpande, R. J. Devenish, R. P. Green, H. L. Klein, T. G. Palzkill, R. B. Ren, S. Synn, and et al. Analysis of a circular derivative of Saccharomyces cerevisiae Chromosome III: a physical map and identication and location of ARS elements. Genetics, 129(2):343{57., 1991. [177] V. Q. Nguyen, C. Co, K. Irie, and J. J. Li. Clb/Cdc28 kinases promote nuclear export of the replication initiator proteins Mcm2-7. Curr Biol, 10(4):195{205, 2000. [178] V. Q. Nguyen, C. Co, and J. J. Li. Cyclin-dependent kinases prevent DNA re-replication through multiple mechanisms. Nature, 411(6841):1068{73, 2001. [179] S.A Nick-McElhinney, D.A. Gordenin, C.M. Stith, P.M. Burgers, and T.A. Kunkel. Division of labor at the eukaryotic replication fork. 
Mol Cell, 30(2):137{ 44, 2008. [180] C. A. Nieduszynski, S. Hiraga, P. Ak, C. J. Benham, and A. D. Donaldson. OriDB: a DNA replication origin database. Nucleic Acids Res, 35(Database issue):D40{6, 2007. [181] H. Nishitani, Z. Lygerou, T. Nishimoto, and P. Nurse. The Cdt1 protein is required to license DNA for replication in ssion yeast. Nature, 404(6778):625{ 8., 2000. [182] H. Nishitani and P. Nurse. p65cdc18 plays a major role controlling the initiation of DNA replication in ssion yeast. Cell, 83(3):397{405, 1995. 170 [183] R. Nougarede, F. Della Seta, P. Zarzov, and E. Schwob. Hierarchy of S- phase-promoting factors: yeast Dbf4-Cdc7 kinase requires prior S-phase cyclin- dependent kinase activation. Mol Cell Biol, 20(11):3795{806, 2000. [184] M. Onishi, G. G. Liou, J. R. Buchberger, T. Walz, and D. Moazed. Role of the conserved Sir3-BAH domain in nucleosome binding and silent chromatin assembly. Mol Cell, 28(6):1015{28, 2007. [185] T. L. Orr-Weaver, C. G. Johnston, and A. C. Spradling. The role of ACE3 in Drosophila chorion gene amplication. EMBO J, 8(13):4153{62, 1989. [186] A. J. Osborn and S. J. Elledge. Mrc1 is a replication fork component whose phosphorylation in response to DNA replication stress activates Rad53. Genes Dev, 17(14):1755{67, 2003. [187] D. T. Pak, M. P umm, I. Chesnokov, D. W. Huang, R. Kellum, J. Marr, P. Romanowski, and M. R. Botchan. Association of the origin recognition complex with heterochromatin and HP1 in higher eukaryotes. Cell, 91(3):311{ 23., 1997. [188] D. L. Pappas Jr, R. Frisch, and M. Weinreich. The NAD(+)-dependent Sir2p histone deacetylase is a negative regulator of chromosomal DNA replication. Genes Dev, 18(7):769{81, 2004. [189] P. Pasero, B. P. Duncker, E. Schwob, and S. M. Gasser. A role for the Cdc7 kinase regulatory subunit Dbf4p in the formation of initiation-competent origins of replication. Genes Dev, 13(16):2159{76, 1999. [190] P. K. Patel, N. Kommajosyula, A. Rosebrock, A. Bensimon, J. Leatherwood, J. 
Bechhoefer, and N. Rhind. The Hsk1(Cdc7) replication kinase regulates origin eciency. Mol Biol Cell, 19(12):5550{8, 2008. [191] C. Pelizon, M. A. Madine, P. Romanowski, and R. A. Laskey. Unphospho- rylatable mutants of Cdc6 disrupt its nuclear export but still support DNA replication once per cell cycle. Genes Dev, 14(19):2526{33, 2000. [192] S. Peng, A.A. Alekseyenko, E. Larschan, M. Kuroda, and P.J. Park. Normaliza- tion and experimental design for ChIP-chip data. BMC Bioinformatics, 8(219), 2007. [193] G. Perkins and J. F. Diey. Nucleotide-dependent prereplicative complex as- sembly by Cdc6p, a homolog of eukaryotic and prokaryotic clamp-loaders. Mol Cell, 2(1):23{32, 1998. [194] S. Piatti, T. Bohm, J. H. Cocker, J. F. Diey, and K. Nasmyth. Activation of S-phase-promoting CDKs in late G1 denes a "point of no return" after which Cdc6 synthesis cannot promote DNA replication in yeast. Genes Dev, 10(12):1516{1531, 1996. 171 [195] A. Pic, F. L. Lim, S. J. Ross, E. A. Veal, A. L. Johnson, M. R. Sultan, A. G. West, L. H. Johnston, A. D. Sharrocks, and B. A. Morgan. The forkhead protein Fkh2 is a component of the yeast cell cycle transcription factor SFF. EMBO J, 19(14):3750{61, 2000. [196] D. K. Pokholok, C. T. Harbison, S. Levine, M. Cole, N. M. Hannett, T. I. Lee, G. W. Bell, K. Walker, P. A. Rolfe, E. Herbolsheimer, J. Zeitlinger, F. Lewitter, D. K. Giord, and R. A. Young. Genome-wide map of nucleosome acetylation and methylation in yeast. Cell, 122(4):517{27, 2005. [197] F. Prado and A. Aguilera. Impairment of replication fork progression mediates RNA pol II transcription-associated recombination. EMBO J, 24(6):1267{76, 2005. [198] J.K. Pritchard, M.T. Seielstad, A. Perez-Lezaun, and M.W. Feldman. Popula- tion growth of human Y chromosmes: a study of Y chromosome microsatellites. Mol. Bio. Evol., 16(12):1791{98, 1999. [199] Y. Qi, A. Rolfe, K.D. MacIsaac, G.K. Gerber, D. Pokholok, J. Zeitlinger, T. Danford, R Dowell, E. Fraenkel, T.S. Jaakkola, R.A. 
Young, and D.K. Gif- ford. High-resolution computational models of genome binding events. Nat Biotechnol, 24(8):963{970, 2006. [200] M. K. Raghuraman, E. A. Winzeler, D. Collingwood, S. Hunt, L. Wodicka, A. Conway, D. J. Lockhart, R. W. Davis, B. J. Brewer, and W. L. Fangman. Replication dynamics of the yeast genome. Science, 294(5540):115{21., 2001. [201] H. Rao and B. Stillman. The origin recognition complex interacts with a bi- partite DNA binding site within yeast replicators. Proc Natl Acad Sci USA, 92(6):2224{2228, 1995. [202] S. V. Razin, O. V. Iarovaia, N. Sjakste, T. Sjakste, L. Bagdoniene, A. V. Ryn- ditch, E. R. Eivazova, M. Lipinski, and Y. S. Vassetzky. Chromatin domains and regulation of transcription. J Mol Biol, 369(3):597{607, 2007. [203] J. L. Reid, Z. Moqtaderi, and K. Struhl. Eaf3 regulates the global pattern of histone acetylation in Saccharomyces cerevisiae. Mol Cell Biol, 24(2):757{64, 2004. [204] B. Ren, F. Robert, J. J. Wyrick, O. Aparicio, E. G. Jennings, I. Simon, J. Zeitlinger, J. Schreiber, N. Hannett, E. Kanin, T. L. Volkert, C. J. Wil- son, S. P. Bell, and R. A. Young. Genome-wide location and function of DNA binding proteins. Science, 290(5500):2306{9., 2000. [205] N. Rhind. DNA replication timing: random thoughts about origin ring. Nat Cell Biol, 8(12):1313{6, 2006. 172 [206] F.J. Richards. A exible growth function for empirical use. J Exp Bot, 10:290{ 300, 1959. [207] F. Robert, D. K. Pokholok, N. M. Hannett, N. J. Rinaldi, M. Chandy, A. Rolfe, J. L. Workman, D. K. Giord, and R. A. Young. Global position and recruit- ment of HATs and HDACs in the yeast genome. Mol Cell, 16(2):199{209, 2004. [208] G. Robertson, M. Hirst, M. Bainbridge, M. Bilenky, Y. Zhao, T. Zeng, G. Eu- skirchen, B. Bernier, R. Varhol, A. Delaney, N. Thiessen, O. L. Grith, A. He, M. Marra, M. Snyder, and S. Jones. Genome-wide proles of STAT1 DNA asso- ciation using chromatin immunoprecipitation and massively parallel sequencing. Nat Methods, 4(8):651{7, 2007. 
[209] M. Robinson, D. De-Souza, W. Keen, and E. Saunders. A dynamic programming approach for the alignment of signal peaks in multiple gas chromatography-mass spectrometry experiments. BMC Bioinformatics, 8(419), 2007. [210] D. Robyr, Y. Suka, I. Xenarios, S. K. Kurdistani, A. Wang, N. Suka, and M. Grunstein. Microarray deacetylation maps determine genome-wide functions for yeast histone deacetylases. Cell, 109(4):437{46., 2002. [211] P. Romanowski, M. A. Madine, A. Rowles, J. J. Blow, and R. A. Laskey. The Xenopus origin recognition complex is essential for DNA replication and MCM binding to chromatin. Curr. Biol., 6(11):1416{1425, 1996. [212] A. Rowles, S. Tada, and J. J. Blow. Changes in association of the Xenopus origin recognition complex with chromatin on licensing of replication origins. J Cell Sci, 112 ( Pt 12):2011{8, 1999. 173 [213] S. Roy, J. Ernst, P. V. Kharchenko, P. Kheradpour, N. Negre, M. L. Eaton, J. M. Landolin, C. A. Bristow, L. Ma, M. F. Lin, S. Washietl, B. I. Arshino, F. Ay, P. E. Meyer, N. Robine, N. L. Washington, L. Di Stefano, E. Berezikov, C. D. Brown, R. Candeias, J. W. Carlson, A. Carr, I. Jungreis, D. Marbach, R. Sealfon, M. Y. Tolstorukov, S. Will, A. A. Alekseyenko, C. Artieri, B. W. Booth, A. N. Brooks, Q. Dai, C. A. Davis, M. O. Du, X. Feng, A. A. Gor- chakov, T. Gu, J. G. Heniko, P. Kapranov, R. Li, H. K. MacAlpine, J. Malone, A. Minoda, J. Nordman, K. Okamura, M. Perry, S. K. Powell, N. C. Riddle, A. Sakai, A. Samsonova, J. E. Sandler, Y. B. Schwartz, N. Sher, R. Spokony, D. Sturgill, M. van Baren, K. H. Wan, L. Yang, C. Yu, E. Feingold, P. Good, M. Guyer, R. Lowdon, K. Ahmad, J. Andrews, B. Berger, S. E. Brenner, M. R. Brent, L. Cherbas, S. C. Elgin, T. R. Gingeras, R. Grossman, R. A. Hoskins, T. C. Kaufman, W. Kent, M. I. Kuroda, T. Orr-Weaver, N. Perrimon, V. Pir- rotta, J. W. Posakony, B. Ren, S. Russell, P. Cherbas, B. R. Graveley, S. Lewis, G. Micklem, B. Oliver, P. J. Park, S. E. Celniker, S. Heniko, G. H. Karpen, E. C. 
Lai, D. M. MacAlpine, L. D. Stein, K. P. White, M. Kellis, C. L. Comstock, A. Dobin, J. Drenkow, S. Dudoit, et al. Identication of functional elements and regulatory circuits by Drosophila modENCODE. Science, 330(6012):1787{97, 2010. [214] A. Ruange, P. E. Jacques, W. Bhat, F. Robert, and A. Nourani. Genome-wide replication-independent histone H3 exchange occurs predominantly at promot- ers and implicates H3-K56 acetylation and Asf1. Mol Cell, 27(3):393{405, 2007. [215] S. E. Rundlett, A. A. Carmen, N. Suka, B. M. Turner, and M. Grunstein. Transcriptional repression by UME6 involves deacetylation of lysine 5 of histone H4 by RPD3. Nature, 392(6678):831{5., 1998. [216] T. Ryba, I. Hiratani, J. Lu, M. Itoh, M. Kulik, J. Zhang, T. C. Schulz, A. J. Robins, S. Dalton, and D. M. Gilbert. Evolutionarily conserved replication timing proles predict long-range chromatin interactions and distinguish closely related cell types. Genome Res, 20(6):761{70, 2010. [217] N. Sabet, S. Volo, C. Yu, J. P. Madigan, and R. H. Morse. Genome-wide analysis of the relationship between transcriptional regulation by Rpd3p and the histone H3 and H4 amino termini in budding yeast. Mol Cell Biol, 24(20):8823{33, 2004. [218] P. Saha, J. Chen, K. C. Thome, S. J. Lawlis, Z. H. Hou, M. Hendricks, J. D. Parvin, and A. Dutta. Human CDC6/Cdc18 associates with Orc1 and cyclin- cdk and is selectively eliminated from the nucleus at the onset of S-phase. Mol Cell Biol, 18(5):2758{67, 1998. [219] P. Saha, K. C. Thome, R. Yamaguchi, Z. Hou, S. Weremowicz, and A. Dutta. The human homolog of Saccharomyces cerevisiae CDC45. J Biol Chem, 273(29):18205{9, 1998. 174 [220] Y. Sanchez, B. A. Desany, W. J. Jones, Q. Liu, B. Wang, and S. J. Elledge. Regulation of RAD53 by the ATM-like kinases MEC1 and TEL1 in yeast cell cycle checkpoint pathways. Science, 271(5247):357{60, 1996. [221] C. Santocanale and J. F. Diey. A Mec1- and Rad53-dependent checkpoint controls late-ring origins of DNA replication. 
Nature, 395(6702):615{8, 1998. [222] C. Santocanale and J. F. X. Diey. ORC- and Cdc6-dependent complexes at active and inactive chromosomal replication origins in Saccharomyces cerevisiae. EMBO J., 15(23):6671{6679, 1996. [223] C. Santocanale, K. Sharma, and J. F. Diey. Activation of dormant origins of DNA replication in budding yeast. Genes Dev, 13(18):2360{4, 1999. [224] M. Sato, T. Gotow, Z. You, Y. Komamura-Kohno, Y. Uchiyama, N. Yabuta, H. Nojima, and Y. Ishimi. Electron microscopic observation and single-stranded DNA binding activity of the Mcm4,6,7 complex. J Mol Biol, 300(3):421{31, 2000. [225] M. Schwaiger and D. Schubeler. A question of timing: emerging links between transcription and replication. Curr Opin Genet Dev, 16(2):177{83, 2006. [226] M. Schwaiger, M. B. Stadler, O. Bell, H. Kohler, E. J. Oakeley, and D. Schubeler. Chromatin state marks cell-type- and gender-specic replication of the Drosophila genome. Genes Dev, 23(5):589{601, 2009. [227] E. Segal, Y. Fondufe-Mittendorf, L. Chen, A. Thastrom, Y. Field, I. K. Moore, J. P. Wang, and J. Widom. A genomic code for nucleosome positioning. Nature, 442(7104):772{8, 2006. [228] M. Segurado, A. de Luis, and F. Antequera. Genome-wide distribution of DNA replication origins at A+T-rich islands in Schizosaccharomyces pombe. EMBO Rep, 4(11):1048{53, 2003. [229] M. D. Sekedat, D. Fenyo, R. S. Rogers, A. J. Tackett, J. D. Aitchison, and B. T. Chait. GINS motion reveals replication fork progression is remarkably uniform throughout the yeast genome. Mol Syst Biol, 6:353, 2010. [230] J. Sequeira-Mendes, R. Diaz-Uriarte, A. Apedaile, D. Huntley, N. Brock- dor, and M. Gomez. Transcription initiation activity sets replica- tion origin eciency in mammalian cells. PLoS Genet, 5(4):e1000446. doi:10.1371/journal.pgen.1000446, 2009. [231] D. Shechter, C. Y. Ying, and J. Gautier. DNA unwinding is an Mcm complex-dependent and ATP hydrolysis-dependent process. J Biol Chem, 279(44):45586{93, 2004. 175 [232] R. Sherwood, T. S. 
Takahashi, and P. V. Jallepalli. Sister acts: coordinating DNA replication and cohesion establishment. Genes Dev, 24(24):2723{31, 2010. [233] Y. J. Sheu and B. Stillman. The Dbf4-Cdc7 kinase promotes S-phase by alle- viating an inhibitory activity in Mcm4. Nature, 463(7277):113{7, 2010. [234] K. Shirahige, Y. Hori, K. Shiraishi, M. Yamashita, K. Takahashi, C. Obuse, T. Tsurimoto, and H. Yoshikawa. Regulation of DNA-replication origins during cell-cycle progression. Nature, 395(6702):618{21, 1998. [235] K. Shirahige, T. Iwasaki, M. B. Rashid, N. Ogasawara, and H. Yoshikawa. Lo- cation and characterization of autonomously replicating sequences from Chro- mosome VI of Saccharomyces cerevisiae. Mol Cell Biol, 13(8):5043{56., 1993. [236] M. Simonis, P. Klous, E. Splinter, Y. Moshkin, R. Willemsen, E. de Wit, B. van Steensel, and W. de Laat. Nuclear organization of active and inactive chromatin domains uncovered by chromosome conformation capture-on-chip (4C). Nat Genet, 38(11):1348{54, 2006. [237] R. T. Simpson. Nucleosome positioning can aect the function of a cis-acting DNA element in vivo. Nature, 343(6256):387{9, 1990. [238] G. K. Smyth. Linear models and empirical bayes methods for assessing dieren- tial expression in microarray experiments. Stat Appl Genet Mol Biol, 3:Article3, 2004. [239] G. K. Smyth and T. Speed. Normalization of cDNA microarray data. Methods, 31(4):265{73, 2003. [240] J. M. Sogo, M. Lopes, and M. Foiani. Fork reversal and ssDNA accumulation at stalled replication forks owing to checkpoint defects. Science, 297(5581):599{ 602, 2002. [241] M. J. Solomon, P. L. Larsen, and A. Varshavsky. Mapping protein-DNA inter- actions in vivo with formaldehyde: evidence that histone H4 is retained on a highly transcribed gene. Cell, 53(6):937{47, 1988. [242] M. J. Solomon and A. Varshavsky. Formaldehyde-mediated DNA-protein crosslinking: a probe for in vivo chromatin structures. Proc Natl Acad Sci USA, 82(19):6470{4, 1985. [243] T. W. Spiesser, E. 
Appendices

Appendix A: Supplemental Tables

Table A.1: List of origins associated with BrdU peaks in wild-type cells, analyzed as described in the Figure 2.7 legend.

Location ARS Alternative Name Chr Start End
I-31 ARS104 proARS104 1 30946 31184
I-70 ARS106 proARS106 1 70258 70491
I-124 ARS107 proARS107 1 124350 124599
I-137 ARS107.5 NA 1 136900 137900
I-147 ARS108 proARS108 1 146703 147690
I-160 ARS109 ARS101,proARS109 1 159906 160127
I-176 ARS110 ADE1 1 176154 176402
II-29 ARS201.5 ARS230 2 28933 29152
II-63 ARS202 proARS202 2 63186 63421
II-78 NA NA 2 70781 85781
II-198 ARS207.5 ARS231 2 198193 198434
II-210 ARS207.8 NA 2 209187 210063
II-238 ARS208 proARS208 2 237644 237879
II-255 ARS209 ARSH4,proARS209 2 254890 255136
II-283 ARS210.5 NA 2 283015 283913
II-326 ARS211 proARS211 2 326099 326335
II-379 ARS212 proARS212 2 378434 379194
II-408 ARS214 proARS214 2 407831 408064
II-487 ARS216 proARS216 2 486661 486909
II-612 ARS219.5 NA 2 611269 613200
II-623 ARS220 proARS220 2 622625 622894
II-632 ARS221 proARS221 2 631934 632246
II-774 ARS227 proARS227 2 773918 774348
II-802 ARS229 proARS229 2 801930 802617
III-30 ARS304 NA 3 30199 30657
III-39 ARS305 ARSA6C,proARS305 3 39158 39706
III-75 ARS306 NA 3 74457 74677
III-109 ARS307 ARSC2G1,proARS307 3 108775 109291
III-132 ARS309 proARS309 3 131978 132322
III-167 ARS310 proARS310 3 166494 167340
III-225 ARS315 NA 3 224807 225053
III-273 ARS316 proARS316 3 272844 273088
IV-46 ARS404 HO,proARS404 4 46181 46237
IV-124 ARS406 proARS406 4 123617 123902
IV-138 NA proARS407 4 137003 138437
IV-213 ARS409 proARS409 4 212420 212669
IV-330 ARS413 proARS413 4 329564 329813
IV-408 ARS414 proARS414 4 408070 408312
IV-436 NA proARS415 4 434223 437917
IV-463 ARS416 ARS1,proARS416 4 462430 462700
IV-484 ARS417 proARS417 4 483846 484091
IV-505 ARS417.5 ARS450 4 505336 505578
IV-555 ARS418 proARS418 4 555224 555461
IV-753 ARS423 proARS423 4 753159 753391
IV-806 ARS425 proARS425 4 806044 806270
IV-845 ARS426.5 NA 4 844597 845593
IV-879 NA NA 4 875791 882791
IV-914 ARS428 proARS428 4 913780 914029
IV-1017 ARS430 proARS430 4 1016624 1016922
IV-1058 ARS431 proARS431 4 1057828 1058076
IV-1166 ARS432.5 ARS453 4 1165998 1166221
IV-1303 ARS435 proARS435 4 1302579 1302819
IV-1354 ARS437 proARS437 4 1353494 1353667
IV-1462 ARS442 proARS443 4 1461849 1462161
IV-1506 NA proARS448 4 1505300 1507153
V-59 ARS507 proARS507 5 59282 59516
V-94 ARS508 proARS508 5 93977 94218
V-146 ARS510 proARS510 5 145539 145782
V-174 ARS511 proARS511 5 173636 173874
V-213 ARS512 proARS512 5 212381 212630
V-256 NA NA 5 255623 256957
V-288 ARS514 proARS514 5 287504 287750
V-302 NA NA 5 301213 302387
V-317 NA NA 5 316043 317307
V-354 ARS516 proARS516 5 353504 353751
V-407 ARS517 proARS517 5 406747 406949
V-443 ARS519 proARS519 5 442412 442731
V-499 ARS520 proARS520 5 498417 499343
V-550 ARS522 ARS501,proARS522 5 549560 549809
VI-53 NA NA 6 51814 55080
VI-99 NA NA 6 95000 103000
VI-105 ARS603.1 NA 6 104456 104695
VI-119 ARS603.5 proARS603.5 6 118631 118952
VI-128 ARS604 NA 6 127745 128066
VI-136 ARS605 proARS605 6 135979 136080
VI-168 ARS606 proARS606 6 167606 168041
VI-199 ARS607 proARS607 6 199382 199493
VI-217 ARS608 NA 6 216344 216692
VII-17 NA NA 7 15327 19624
VII-64 ARS702 proARS702 7 64279 64528
VII-163 ARS707 proARS707 7 163180 163447
VII-204 ARS710 proARS710 7 203917 204159
VII-286 ARS714 proARS714 7 285951 286246
VII-353 ARS716 proARS716 7 352695 352917
VII-389 ARS717 proARS717 7 388658 388892
VII-421 ARS718 proARS718 7 421093 421342
VII-485 ARS719 proARS719 7 484932 485160
VII-509 ARS720 proARS720 7 508729 508978
VII-575 ARS722 proARS722 7 574622 574916
VII-621 NA NA 7 620243 622517
VII-660 ARS727 proARS727 7 659809 660054
VII-715 ARS728 proARS728 7 715273 715556
VII-778 ARS729 proARS729 7 777967 778216
VII-835 ARS731 proARS731 7 834492 834736
VII-888 ARS731.5 ARS737 7 888350 888599
VII-978 ARS733 proARS733 7 977730 977979
VII-1000 ARS734 proARS734 7 999448 999695
VIII-46 NA proARS804 8 45113 47112
VIII-64 ARS805 SPO11,proARS805 8 64255 64489
VIII-133 ARS807 NA 8 133347 133591
VIII-169 ARS809 proARS809 8 168531 168773
VIII-246 ARS813 proARS813 8 245719 245968
VIII-297 NA proARS815 8 296233 298222
VIII-360 NA proARS816 8 358953 360647
VIII-381 NA proARS817 8 380153 382157
VIII-392 ARS818 proARS818 8 392148 392391
VIII-448 ARS820 proARS820 8 447619 447853
VIII-502 ARS822 proARS822 8 501751 501992
IX-74 NA proARS907 9 72923 75194
IX-106 ARS909 proARS909 9 105821 106048
IX-136 ARS911 proARS911 9 136094 136335
IX-175 ARS912 proARS912 9 175034 175355
IX-205 NA NA 9 201591 208591
IX-215 ARS913 ARS901,proARS913 9 214675 214826
IX-220 NA NA 9 216591 223591
IX-248 ARS914 proARS914 9 247579 247800
IX-283 NA NA 9 275259 290259
IX-314 NA proARS918 9 313359 314533
IX-342 ARS919 proARS919 9 341853 342096
IX-357 ARS920 proARS920 9 357156 357393
IX-407 NA NA 9 405633 407617
IX-412 ARS922 proARS922 9 411817 412053
X-68 ARS1005 proARS1005 10 67467 67949
X-100 ARS1006 proARS1006 10 99359 99796
X-162 ARS1007.5 NA 10 161435 161860
X-204 ARS1008 proARS1008 10 203729 204614
X-228 ARS1009 proARS1009 10 228248 228740
X-355 NA NA 10 353463 355722
X-376 ARS1013 proARS1013 10 375401 375923
X-417 ARS1014 proARS1014 10 416888 417134
X-442 ARS1015 proARS1015 10 442248 442658
X-459 ARS1017 NA 10 459029 459588
X-540 ARS1018 proARS1018 10 540239 540474
X-613 ARS1019 proARS1019 10 612542 612975
X-654 ARS1020 proARS1020 10 654069 654309
X-684 ARS1021 ARS121,proARS1021 10 683328 683817
XI-56 ARS1103 proARS1103 11 55670 55917
XI-98 ARS1104.5 ARS1125 11 98329 98568
XI-114 NA proARS1105 11 113073 115127
XI-153 ARS1106 proARS1106 11 152934 153173
XI-213 ARS1106.7 ARS1127 11 213080 213385
XI-236 NA NA 11 235293 236577
XI-257 NA proARS1107 11 256478 258327
XI-302 NA NA 11 301463 303017
XI-329 ARS1109 proARS1109 11 329322 329571
XI-389 ARS1112 proARS1112 11 388607 388902
XI-417 ARS1113 proARS1113 11 416822 417055
XI-448 ARS1114 proARS1114 11 447657 447892
XI-457 ARS1114.5 NA 11 454453 459197
XI-517 ARS1116 proARS1116 11 516653 516902
XI-530 NA proARS1117 11 528843 531477
XI-582 ARS1118 proARS1118 11 581468 581706
XI-612 ARS1120 proARS1120 11 611874 612107
XII-77 NA proARS1205 12 75896 77618
XII-92 ARS1206 proARS1206 12 91417 91659
XII-140 NA proARS1207 12 139293 140447
XII-157 ARS1209 proARS1209 12 156646 156883
XII-199 NA NA 12 197913 199417
XII-231 ARS1211 proARS1211 12 231179 231422
XII-373 ARS1213 proARS1213 12 373156 373400
XII-413 ARS1215 proARS1215 12 412668 412897
XII-513 ARS1217 proARS1217 12 512868 513117
XII-599 NA NA 12 595804 602804
XII-623 NA proARS1219 12 622103 623937
XII-660 ARS1220 proARS1220 12 659823 660072
XII-688 NA NA 12 684000 692000
XII-731 NA proARS1222 12 729123 732237
XII-745 ARS1223 proARS1223 12 744942 745179
XII-794 ARS1226 proARS1226 12 794020 794269
XII-948 NA proARS1231 12 947123 949407
XII-1007 ARS1232 proARS1232 12 1007180 1007470
XIII-32 ARS1303 proARS1303 13 31687 31935
XIII-94 ARS1305 proARS1305 13 94216 94463
XIII-137 ARS1307 proARS1307 13 137299 137548
XIII-184 ARS1308 proARS1308 13 183793 184037
XIII-227 NA NA 13 227023 227487
XIII-263 ARS1309 proARS1309 13 263062 263296
XIII-287 ARS1310 proARS1310 13 286782 287067
XIII-371 ARS1312 proARS1312 13 370976 371221
XIII-421 NA proARS1314 13 420160 421983
XIII-433 NA proARS1315 13 431453 433827
XIII-468 ARS1316 proARS1316 13 468177 468468
XIII-482 NA NA 13 480233 483637
XIII-503 NA proARS1319 13 502233 504197
XIII-536 ARS1320 proARS1320 13 535595 535843
XIII-555 NA proARS1322 13 553563 556237
XIII-574 NA NA 13 569500 577500
XIII-611 ARS1323 proARS1323 13 611273 611488
XIII-635 ARS1324 proARS1324 13 634479 634714
XIII-649 ARS1325 proARS1325 13 649307 649551
XIII-758 ARS1327 proARS1327 13 758222 758470
XIII-815 ARS1330 proARS1330 13 815341 815567
XIII-837 NA proARS1331 13 836823 838167
XIII-879 NA NA 13 877553 879907
XIII-898 ARS1332 proARS1332 13 897804 898040
XIV-62 ARS1406 proARS1406 14 61597 61894
XIV-90 ARS1407 proARS1407 14 89528 89802
XIV-280 ARS1414 proARS1414 14 279875 280108
XIV-322 ARS1415 proARS1415 14 321917 322210
XIV-344 NA NA 14 342963 344117
XIV-412 ARS1417 proARS1417 14 412263 412493
XIV-449 ARS1419 proARS1419 14 449343 449588
XIV-499 ARS1420 proARS1420 14 498987 499232
XIV-546 ARS1421 proARS1421 14 545966 546201
XIV-561 ARS1422 proARS1422 14 561106 561384
XIV-610 ARS1424 proARS1424 14 609458 609706
XIV-636 ARS1426 proARS1426 14 635660 635901
XIV-692 ARS1427 proARS1427 14 691482 691727
XV-36 ARS1506.5 ARS1531 15 35667 35903
XV-73 ARS1507 proARS1507 15 72636 72872
XV-85 ARS1508 proARS1508 15 85195 85444
XV-114 ARS1509 proARS1509 15 113843 114084
XV-167 ARS1510 proARS1510 15 166974 167220
XV-227 NA NA 15 223471 230471
XV-278 ARS1511 proARS1511 15 277529 277778
XV-309 NA proARS1512 15 308463 310157
XV-337 ARS1513 proARS1513 15 337279 337528
XV-372 NA NA 15 371693 372627
XV-398 NA NA 15 397773 399037
XV-437 ARS1513.5 ARS1501 15 436732 436966
XV-464 NA NA 15 463698 464877
XV-490 ARS1514 NA 15 489645 490129
XV-567 ARS1516 ADE2,proARS1516 15 566409 566643
XV-618 NA proARS1518 15 616913 618427
XV-767 ARS1523 proARS1522 15 766617 766862
XV-855 NA proARS1525 15 854173 856027
XV-874 ARS1526 proARS1526 15 874190 874434
XV-908 ARS1528 proARS1528 15 908288 908537
XV-982 ARS1529 proARS1529 15 981454 981690
XV-1054 ARS1529.5 NA 15 1053490 1053901
XVI-43 ARS1604 proARS1604 16 42976 43212
XVI-73 ARS1605 proARS1605 16 73038 73283
XVI-91 NA proARS1606 16 89758 92975
XVI-117 ARS1607 proARS1607 16 116505 116765
XVI-196 NA NA 16 194903 196097
XVI-242 NA NA 16 241223 242267
XVI-262 NA proARS1612 16 261168 262687
XVI-274 NA NA 16 273918 274962
XVI-290 ARS1614 proARS1614 16 289483 289704
XVI-385 ARS1618 proARS1618 16 384536 384784
XVI-418 ARS1619 proARS1619 16 418132 418359
XVI-457 ARS1620.5 ARS1633 16 456557 456805
XVI-512 ARS1621 proARS1621 16 511619 511940
XVI-560 NA NA 16 556133 563133
XVI-565 ARS1622.5 ARS1634 16 565046 565289
XVI-584 NA NA 16 583493 585147
XVI-634 ARS1623 proARS1623 16 633868 634117
XVI-685 ARS1624 proARS1624 16 684383 684632
XVI-777 ARS1626.5 ARS1635 16 776921 777152
XVI-819 ARS1627 proARS1627 16 819153 819393
XVI-843 ARS1628 proARS1628 16 842646 842894

Table A.2: Origins Identified With BrdU-IP-seq. Origins were analyzed for BrdU-incorporation as described for Figure 3.4A in WT and Forkhead Protein Mutant Cells (see Chapter VI). Origins that were shown to fire in any of the cell types are listed here.

Location ARS Name Alt Name Chr Start End
I-31 ARS104 proARS104 1 30946 31184
I-42 ARS105 proARS105 1 40716 43300
I-70 ARS106 proARS106 1 70258 70491
I-124 ARS107 proARS107 1 124350 124599
I-137 ARS107.5 NA 1 136900 137900
I-147 ARS108 proARS108 1 146703 147690
I-160 ARS109 ARS101,proARS109 1 159906 160127
I-166 NA NA 1 162000 170000
I-176 ARS110 ADE1 1 176154 176402
II-37 NA NA 2 33000 41000
II-63 ARS202 proARS202 2 63186 63421
II-78 NA NA 2 70781 85781
II-94 ARS203 proARS203 2 93410 93811
II-143 ARS206 proARS206 2 142868 144016
II-170 ARS207 proARS207 2 170049 170298
II-198 ARS207.5 ARS231 2 198193 198434
II-210 ARS207.8 NA 2 209187 210063
II-238 ARS208 proARS208 2 237644 237879
II-255 ARS209 ARSH4,proARS209 2 254890 255136
II-266 NA NA 2 262978 269978
II-283 ARS210.5 NA 2 283015 283913
II-326 ARS211 proARS211 2 326099 326335
II-379 ARS212 proARS212 2 378434 379194
II-390 ARS213 proARS213 2 389245 390368
II-408 ARS214 proARS214 2 407831 408064
II-418 ARS215 proARS215 2 417739 418035
II-487 ARS216 proARS216 2 486661 486909
II-539 ARS218 proARS218 2 539137 539699
II-612 ARS219.5 NA 2 611269 613200
II-623 ARS220 proARS220 2 622625 622894
II-632 ARS221 proARS221 2 631934 632246
II-708 ARS222.5 NA 2 707158 708262
II-721 ARS223 proARS223 2 720601 721038
II-742 ARS224 proARS224 2 741512 741802
II-758 ARS225 proARS225 2 757390 757621
II-774 ARS227 proARS227 2 773918 774348
II-802 ARS229 proARS229 2 801930 802617
III-39 ARS305 ARSA6C,proARS305 3 39158 39706
III-75 ARS306 NA 3 74457 74677
III-109 ARS307 ARSC2G1,proARS307 3 108775 109291
III-115 ARS308 NA 3 114314 114933
III-132 ARS309 proARS309 3 131978 132322
III-167 ARS310 proARS310 3 166494 167340
III-225 ARS315 NA 3 224807 225053
III-273 ARS316 proARS316 3 272844 273088
IV-46 ARS404 HO,proARS404 4 46181 46237
IV-124 ARS406 proARS406 4 123617 123902
IV-138 NA proARS407 4 137003 138437
IV-158 NA proARS408 4 157458 158807
IV-213 ARS409 proARS409 4 212420 212669
IV-236 ARS409.5 NA 4 235935 236184
IV-254 ARS410 proARS410 4 253789 254038
IV-317 ARS412 proARS412 4 316719 317111
IV-330 ARS413 proARS413 4 329564 329813
IV-408 ARS414 proARS414 4 408070 408312
IV-435 ARS415 proARS415 4 435056 435388
IV-444 NA NA 4 443318 444447
IV-463 ARS416 ARS1,proARS416 4 462430 462700
IV-477 NA NA 4 476003 477577
IV-484 ARS417 proARS417 4 483846 484091
IV-505 ARS417.5 ARS450 4 505336 505578
IV-555 ARS418 proARS418 4 555224 555461
IV-568 ARS419 proARS419 4 567490 567737
IV-629 ARS420 proARS420 4 629072 629669
IV-640 ARS421 proARS421 4 639859 640108
IV-703 ARS422 ARO1,proARS422 4 702879 703125
IV-721 NA NA 4 719873 721227
IV-749 ARS422.5 ARS451 4 748384 748630
IV-806 ARS425 proARS425 4 806044 806270
IV-845 ARS426.5 NA 4 844597 845593
IV-865 NA NA 4 857472 872472
IV-899 NA proARS427 4 898253 899887
IV-914 ARS428 proARS428 4 913780 914029
IV-973 NA NA 4 965500 980500
IV-1017 ARS430 proARS430 4 1016624 1016922
IV-1058 ARS431 proARS431 4 1057828 1058076
IV-1110 ARS431.5 ARS452 4 1109955 1110196
IV-1159 ARS432 proARS432 4 1159250 1159499
IV-1166 ARS432.5 ARS453 4 1165998 1166221
IV-1241 ARS433 proARS433 4 1240869 1241098
IV-1276 ARS434 proARS434 4 1276212 1276440
IV-1303 ARS435 proARS435 4 1302579 1302819
IV-1354 ARS437 proARS437 4 1353494 1353667
IV-1404 ARS440 proARS440 4 1404277 1404512
IV-1448 ARS441 proARS441 4 1447298 1448928
IV-1462 ARS442 proARS443 4 1461849 1462161
IV-1487 ARS446 proARS446 4 1486905 1487149
IV-1503 ARS447 proARS447 4 1502624 1503221
V-59 ARS507 proARS507 5 59282 59516
V-94 ARS508 proARS508 5 93977 94218
V-146 ARS510 proARS510 5 145539 145782
V-174 ARS511 proARS511 5 173636 173874
V-213 ARS512 proARS512 5 212381 212630
V-288 ARS514 proARS514 5 287504 287750
V-302 NA NA 5 301213 302387
V-317 NA NA 5 316043 317307
V-354 ARS516 tRNAgluARS,proARS516 5 353504 353751
V-407 ARS517 proARS517 5 406747 406949
V-439 ARS518 proARS518 5 438929 439178
V-499 ARS520 proARS520 5 498417 499343
V-521 ARS521 NA 5 520767 521025
V-550 ARS522 ARS501,proARS522 5 549560 549809
V-570 ARS523 proARS523 5 569020 570085
VI-53 NA NA 6 51814 55080
VI-69 ARS603 proARS603 6 68690 68869
VI-105 ARS603.1 NA 6 104456 104695
VI-119 ARS603.5 proARS603.5 6 118631 118952
VI-136 ARS605 proARS605 6 135979 136080
VI-168 ARS606 proARS606 6 167606 168041
VI-199 ARS607 proARS607 6 199382 199493
VI-217 ARS608 NA 6 216344 216692
VII-17 NA NA 7 15327 19624
VII-64 ARS702 proARS702 7 64279 64528
VII-163 ARS707 proARS707 7 163180 163447
VII-187 NA proARS709 7 186583 188207
VII-204 ARS710 proARS710 7 203917 204159
VII-286 ARS714 proARS714 7 285951 286246
VII-353 ARS716 proARS716 7 352695 352917
VII-389 ARS717 proARS717 7 388658 388892
VII-421 ARS718 proARS718 7 421093 421342
VII-485 ARS719 proARS719 7 484932 485160
VII-509 ARS720 proARS720 7 508729 508978
VII-569 ARS721 proARS721 7 568490 568738
VII-576 NA proARS723 7 574978 576840
VII-608 NA NA 7 607563 607887
VII-621 NA NA 7 620243 622517
VII-654 NA proARS726 7 652703 655047
VII-660 ARS727 proARS727 7 659809 660054
VII-715 ARS728 proARS728 7 715273 715556
VII-778 ARS729 proARS729 7 777967 778216
VII-795 NA proARS730 7 794178 795956
VII-835 ARS731 proARS731 7 834492 834736
VII-847 NA NA 7 845923 848287
VII-888 ARS731.5 ARS737 7 888350 888599
VII-917 NA proARS732 7 916233 917857
VII-978 ARS733 proARS733 7 977730 977979
VII-1000 ARS734 proARS734 7 999448 999695
VII-1063 NA NA 7 1061998 1064852
VIII-46 NA proARS804 8 45113 47112
VIII-64 ARS805 SPO11,proARS805 8 64255 64489
VIII-116 NA proARS806 8 115683 117257
VIII-133 ARS807 NA 8 133347 133591
VIII-169 ARS809 proARS809 8 168531 168773
VIII-246 ARS813 proARS813 8 245719 245968
VIII-297 ARS815 proARS815 8 296882 297475
VIII-360 NA proARS816 8 358953 360647
VIII-381 NA proARS817 8 380153 382157
VIII-392 ARS818 proARS818 8 392148 392391
VIII-403 NA NA 8 395590 410590
VIII-448 ARS820 proARS820 8 447619 447853
VIII-475 NA proARS821 8 474023 475267
VIII-502 ARS822 proARS822 8 501751 501992
IX-34 NA proARS905 9 32509 34859
IX-74 ARS907 proARS907 9 73803 74220
IX-106 ARS909 proARS909 9 105821 106048
IX-136 ARS911 proARS911 9 136094 136335
IX-175 ARS912 proARS912 9 175034 175355
IX-205 NA NA 9 201591 208591
IX-215 ARS913 ARS901,proARS913 9 214675 214826
IX-226 NA NA 9 224943 227297
IX-248 ARS914 proARS914 9 247579 247800
IX-283 NA NA 9 275259 290259
IX-311 ARS916 proARS917 9 310583 311070
IX-342 ARS919 proARS919 9 341853 342096
IX-357 ARS920 proARS920 9 357156 357393
IX-412 ARS922 proARS922 9 411817 412053
X-68 ARS1005 proARS1005 10 67467 67949
X-100 ARS1006 proARS1006 10 99359 99796
X-114 ARS1007 proARS1007 10 113226 113828
X-162 ARS1007.5 NA 10 161435 161860
X-204 ARS1008 proARS1008 10 203729 204614
X-228 ARS1009 proARS1009 10 228248 228740
X-299 ARS1010 proARS1010 10 298471 298952
X-337 ARS1011 proARS1011 10 336976 337225
X-355 NA NA 10 353463 355722
X-375 ARS1012 proARS1012 10 374575 374818
X-417 ARS1014 proARS1014 10 416888 417134
X-442 ARS1015 proARS1015 10 442248 442658
X-459 ARS1017 NA 10 459029 459588
X-540 ARS1018 proARS1018 10 540239 540474
X-613 ARS1019 proARS1019 10 612542 612975
X-654 ARS1020 proARS1020 10 654069 654309
X-684 ARS1021 ARS121,proARS1021 10 683328 683817
XI-56 ARS1103 proARS1103 11 55670 55917
XI-98 ARS1104.5 ARS1125 11 98329 98568
XI-114 NA proARS1105 11 113073 115127
XI-153 ARS1106 proARS1106 11 152934 153173
XI-196 ARS1106.3 ARS1126 11 196038 196284
XI-213 ARS1106.7 ARS1127 11 213080 213385
XI-236 NA NA 11 235293 236577
XI-257 NA proARS1107 11 256478 258327
XI-302 NA NA 11 301463 303017
XI-329 ARS1109 proARS1109 11 329322 329571
XI-389 ARS1112 proARS1112 11 388607 388902
XI-417 ARS1113 proARS1113 11 416822 417055
XI-448 ARS1114 proARS1114 11 447657 447892
XI-457 ARS1114.5 NA 11 454453 459197
XI-517 ARS1116 proARS1116 11 516653 516902
XI-530 NA proARS1117 11 528843 531477
XI-582 ARS1118 proARS1118 11 581468 581706
XI-612 ARS1120 proARS1120 11 611874 612107
XI-642 ARS1123 proARS1123 11 642355 642602
XII-77 NA proARS1205 12 75896 77618
XII-92 ARS1206 proARS1206 12 91417 91659
XII-140 NA proARS1207 12 139293 140447
XII-157 ARS1209 proARS1209 12 156646 156883
XII-199 NA NA 12 197913 199417
XII-231 ARS1211 proARS1211 12 231179 231422
XII-244 ARS1211.5 NA 12 243527 243960
XII-289 ARS1212 proARS1212 12 289220 289469
XII-344 ARS1212.5 NA 12 343577 344033
XII-373 ARS1213 proARS1213 12 373156 373400
XII-413 ARS1215 proARS1215 12 412668 412897
XII-434 NA NA 12 426942 441942
XII-459 ARS1216.5 rDNAARS 12 458826 459141
XII-494 NA NA 12 493903 494987
XII-513 ARS1217 proARS1217 12 512868 513117
XII-603 ARS1218 proARS1218 12 602938 603155
XII-623 NA proARS1219 12 622103 623937
XII-645 NA NA 12 644493 645767
XII-660 ARS1220 proARS1220 12 659823 660072
XII-688 NA NA 12 684000 692000
XII-731 NA proARS1222 12 729123 732237
XII-745 ARS1223 proARS1223 12 744942 745179
XII-794 ARS1226 proARS1226 12 794020 794269
XII-822 NA proARS1227 12 821403 823002
XII-889 ARS1227.5 ARS1238 12 888569 888810
XII-928 NA proARS1229 12 927043 929292
XII-948 NA proARS1231 12 947123 949407
XII-1007 ARS1232 proARS1232 12 1007180 1007470
XII-1014 ARS1233 proARS1233 12 1013789 1014017
XII-1024 ARS1234 proARS1234 12 1023967 1024207
XII-1043 NA NA 12 1042798 1043952
XIII-32 ARS1303 proARS1303 13 31687 31935
XIII-94 ARS1305 proARS1305 13 94216 94463
XIII-137 ARS1307 proARS1307 13 137299 137548
XIII-159 ARS1307.5 NA 13 158846 159279
XIII-184 ARS1308 proARS1308 13 183793 184037
XIII-227 NA NA 13 227023 227487
XIII-263 ARS1309 proARS1309 13 263062 263296
XIII-287 ARS1310 proARS1310 13 286782 287067
XIII-371 ARS1312 proARS1312 13 370976 371221
XIII-421 NA proARS1314 13 420160 421983
XIII-433 NA proARS1315 13 431453 433827
XIII-468 ARS1316 proARS1316 13 468177 468468
XIII-482 NA NA 13 480233 483637
XIII-504 ARS1319 proARS1319 13 503346 504087
XIII-536 ARS1320 proARS1320 13 535595 535843
XIII-555 ARS1322 proARS1322 13 554392 554750
XIII-569 NA NA 13 568128 569097
XIII-611 ARS1323 proARS1323 13 611273 611488
XIII-635 ARS1324 proARS1324 13 634479 634714
XIII-649 ARS1325 proARS1325 13 649307 649551
XIII-689 NA proARS1326 13 687903 689917
XIII-758 ARS1327 proARS1327 13 758222 758470
XIII-773 ARS1328 proARS1328 13 772629 772878
XIII-805 ARS1329 proARS1329 13 805116 805338
XIII-815 ARS1330 proARS1330 13 815341 815567
XIII-837 NA proARS1331 13 836823 838167
XIII-879 NA NA 13 877553 879907
XIII-898 ARS1332 proARS1332 13 897804 898040
XIV-62 ARS1406 proARS1406 14 61597 61894
XIV-90 ARS1407 proARS1407 14 89528 89802
XIV-127 NA proARS1410 14 125623 127807
XIV-170 ARS1411 proARS1411 14 169566 169804
XIV-196 ARS1412 proARS1412 14 196055 196291
XIV-250 ARS1413 proARS1413 14 250259 250506
XIV-280 ARS1414 proARS1414 14 279875 280108
XIV-322 ARS1415 proARS1415 14 321917 322210
XIV-344 NA NA 14 342963 344117
XIV-352 NA NA 14 348469 355469
XIV-412 ARS1417 proARS1417 14 412263 412493
XIV-449 ARS1419 proARS1419 14 449343 449588
XIV-499 ARS1420 proARS1420 14 498987 499232
XIV-546 ARS1421 proARS1421 14 545966 546201
XIV-561 ARS1422 proARS1422 14 561106 561384
XIV-568 NA proARS1423 14 567715 569020
XIV-610 ARS1424 proARS1424 14 609458 609706
XIV-636 ARS1426 proARS1426 14 635660 635901
XIV-692 ARS1427 proARS1427 14 691482 691727
XIV-714 NA proARS1428 14 712633 714697
XIV-739 NA NA 14 737003 740237
XV-36 ARS1506.5 ARS1531 15 35667 35903
XV-73 ARS1507 proARS1507 15 72636 72872
XV-85 ARS1508 proARS1508 15 85195 85444
XV-104 NA NA 15 103253 104167
XV-114 ARS1509 proARS1509 15 113843 114084
XV-155 ARS1509.5 NA 15 154972 155462
XV-167 ARS1510 proARS1510 15 166974 167220
XV-228 ARS1510.5 NA 15 227481 228117
XV-278 ARS1511 proARS1511 15 277529 277778
XV-309 NA proARS1512 15 308463 310157
XV-337 ARS1513 proARS1513 15 337279 337528
XV-348 NA NA 15 347633 348862
XV-372 NA NA 15 371693 372627
XV-398 NA NA 15 397773 399037
XV-437 ARS1513.5 ARS1501 15 436732 436966
XV-464 NA NA 15 463698 464877
XV-490 ARS1514 NA 15 489645 490129
XV-521 NA NA 15 520113 521217
XV-567 ARS1516 ADE2,proARS1516 15 566409 566643
XV-601 ARS1517 ARS1502,proARS1517 15 600885 600960
XV-618 NA proARS1518 15 616913 618427
XV-657 ARS1519 proARS1519 15 656632 656876
XV-682 NA proARS1520 15 680823 683197
XV-730 ARS1521 proARS1521 15 729739 729969
XV-767 ARS1523 proARS1522 15 766617 766862
XV-783 ARS1524 proARS1524 15 783344 783563
XV-855 NA proARS1525 15 854173 856027
XV-874 ARS1526 proARS1526 15 874190 874434
XV-908 ARS1528 proARS1528 15 908288 908537
XV-982 ARS1529 proARS1529 15 981454 981690
XV-1011 NA NA 15 1003760 1018760
XV-1023 NA NA 15 1022548 1023652
XV-1054 ARS1529.5 NA 15 1053490 1053901
XVI-43 ARS1604 proARS1604 16 42976 43212
XVI-73 ARS1605 proARS1605 16 73038 73283
XVI-91 NA proARS1606 16 89758 92975
XVI-117 ARS1607 proARS1607 16 116505 116765
XVI-163 NA proARS1608 16 161693 163447
XVI-179 NA NA 16 178643 179827
XVI-196 NA NA 16 194903 196097
XVI-211 ARS1609 proARS1609,proARS1609 16 210467 211723
XVI-242 NA NA 16 241223 242267
XVI-262 NA proARS1612 16 261168 262687
XVI-274 NA NA 16 273918 274962
XVI-290 ARS1614 proARS1614 16 289483 289704
XVI-304 NA proARS1615 16 303433 305027
XVI-318 NA proARS1616 16 316743 318397
XVI-332 NA proARS1617 16 331043 332607
XVI-369 NA NA 16 369003 369737
XVI-385 ARS1618 proARS1618 16 384536 384784
XVI-418 ARS1619 proARS1619 16 418132 418359
XVI-428 NA NA 16 427273 428757
XVI-457 ARS1620.5 ARS1633 16 456557 456805
XVI-500 NA NA 16 498853 500557
XVI-512 ARS1621 proARS1621 16 511619 511940
XVI-553 NA NA 16 552403 554287
XVI-564 ARS1622 proARS1622 16 563822 564061
XVI-584 NA NA 16 583493 585147
XVI-634 ARS1623 proARS1623 16 633868 634117
XVI-685 ARS1624 proARS1624 16 684383 684632
XVI-696 ARS1625 proARS1625 16 695432 695681
XVI-729 NA NA 16 728393 729207
XVI-749 ARS1626 proARS1626 16 749094 749341
XVI-777 ARS1626.5 ARS1635 16 776921 777152
XVI-819 ARS1627 proARS1627 16 819153 819393
XVI-843 ARS1628 proARS1628 16 842646 842894
XVI-881 ARS1630 proARS1630 16 880854 881102
XVI-933 ARS1631 proARS1631 16 932976 933223
XVI-942 NA proARS1632 16 940923 943157

Table A.3: BrdU peaks that were detected based on peak height but that did not align with origins listed in OriDB were analyzed for height differences between WT, rpd3, dep1 and set2eaf3rco1 cells as described in Figure 5.1 legend.

Chr Start End WT rpd3 dep1 rco1=eaf3=set2
2 292643 293603 1 0 0 0
2 309833 310153 0 0 1 0
2 342533 343943 0 1 1 0
2 498133 498383 1 1 0 0
2 659113 660553 0 0 0 0
3 262416 263006 1 0 1 0
4 231440 231830 1 1 1 0
4 358110 358890 1 1 1 0
4 385340 385920 0 1 0 0
4 511330 511330 0 1 1 0
4 983560 983560 1 0 0 0
4 1103420 1105170 0 1 1 0
5 192879 193049 1 0 0 0
5 220879 221589 1 1 1 0
6 61827 62377 0 1 0 0
7 317428 318698 0 1 1 0
7 404058 404678 0 1 0 0
7 593648 593648 0 1 1 0
7 635478 637108 0 0 0 0
7 765058 765058 1 0 0 0
8 78045 78485 1 1 1 1
8 280195 281735 0 0 0 0
9 196410 197380 1 1 1 0
9 331650 331850 1 0 1 0
10 324306 325306 0 1 1 0
11 30521 31231 0 0 1 0
11 169211 169911 0 1 1 0
11 428551 428781 0 1 1 0
11 488781 489071 0 1 1 0
11 499901 500181 0 1 1 0
12 170864 171604 1 0 0 0
12 472034 472034 1 0 0 0
14 100994 100994 1 0 1 0
14 234874 236584 0 1 1 0
14 359424 359424 1 0 0 0
14 585374 587254 0 1 1 0
14 675564 676364 0 1 1 0
14 728314 729194 0 1 1 0
15 517071 517071 0 1 1 0
15 648591 650691 0 0 0 0
15 799251 799631 0 1 0 0

Table A.4: List of origins associated with BrdU peaks that are significantly changed in rpd3 cells versus wild-type cells, analyzed as described in Figure 5.1 legend.

Location  ARS  Alternative Name  Chr  Start  End
I-42 ARS105 proARS105 1 40716 43300
I-70 ARS106 proARS106 1 70258 70491
II-94 ARS203 proARS203 2 93410 93811
II-178 ARS207.1 ARS207.3 2 177529 177877
II-283 ARS210.5 NA 2 283015 283913
II-612 ARS219.5 NA 2 611269 613200
II-774 ARS227 proARS227 2 773918 774348
III-30 ARS304 NA 3 30199 30657
IV-46 ARS404 HO,proARS404 4 46181 46237
IV-138 NA proARS407 4 137003 138437
IV-505 ARS417.5 ARS450 4 505336 505578
IV-753 ARS423 proARS423 4 753159 753391
IV-1058 ARS431 proARS431 4 1057828 1058076
IV-1303 ARS435 proARS435 4 1302579 1302819
IV-1354 ARS437 proARS437 4 1353494 1353667
V-213 ARS512 proARS512 5 212381 212630
V-256 NA NA 5 255623 256957
V-302 NA NA 5 301213 302387
V-521 ARS521 NA 5 520767 521025
V-550 ARS522 ARS501,proARS522 5 549560 549809
VI-53 NA NA 6 51814 55080
VI-69 ARS603 proARS603 6 68690 68869
VI-217 ARS608 NA 6 216344 216692
VII-17 NA NA 7 15327 19624
VII-33 NA NA 7 31980 34434
VII-64 ARS702 proARS702 7 64279 64528
VII-187 NA proARS709 7 186583 188207
VII-353 ARS716 proARS716 7 352695 352917
VII-540 NA NA 7 532482 547482
VII-660 ARS727 proARS727 7 659809 660054
VII-917 NA proARS732 7 916233 917857
VII-978 ARS733 proARS733 7 977730 977979
VIII-46 NA proARS804 8 45113 47112
VIII-169 ARS809 proARS809 8 168531 168773
VIII-246 ARS813 proARS813 8 245719 245968
VIII-381 NA proARS817 8 380153 382157
VIII-392 ARS818 proARS818 8 392148 392391
VIII-475 NA proARS821 8 474023 475267

Table A.4: Continued
Location  ARS  Alternative Name  Chr  Start  End
VIII-502 ARS822 proARS822 8 501751 501992
IX-163 ARS911.5 NA 9 162936 163302
IX-175 ARS912 proARS912 9 175034 175355
IX-283 NA NA 9 275259 290259
IX-314 NA proARS918 9 313359 314533
X-114 ARS1007 proARS1007 10 113226 113828
X-162 ARS1007.5 NA 10 161435 161860
X-355 NA NA 10 353463 355722
X-654 ARS1020 proARS1020 10 654069 654309
XI-98 ARS1104.5 ARS1125 11 98329 98568
XI-114 NA proARS1105 11 113073 115127
XI-196 ARS1106.3 ARS1126 11 196038 196284
XI-236 NA NA 11 235293 236577
XI-257
NA proARS1107 11 256478 258327
XI-530 NA proARS1117 11 528843 531477
XI-582 ARS1118 proARS1118 11 581468 581706
XII-77 NA proARS1205 12 75896 77618
XII-92 ARS1206 proARS1206 12 91417 91659
XII-199 NA NA 12 197913 199417
XII-344 NA NA 12 342443 344717
XII-623 NA proARS1219 12 622103 623937
XII-645 NA NA 12 644493 645767
XII-822 NA proARS1227 12 821403 823002
XII-889 ARS1227.5 ARS1238 12 888569 888810
XII-948 NA proARS1231 12 947123 949407
XII-1007 ARS1232 proARS1232 12 1007180 1007470
XII-1043 NA NA 12 1042798 1043952
XIII-159 NA NA 13 158563 159797
XIII-227 NA NA 13 227023 227487
XIII-421 NA proARS1314 13 420160 421983
XIII-482 NA NA 13 480233 483637
XIII-555 NA proARS1322 13 553563 556237
XIII-574 NA NA 13 569500 577500
XIII-689 NA proARS1326 13 687903 689917
XIII-837 NA proARS1331 13 836823 838167
XIII-879 NA NA 13 877553 879907
XIV-62 ARS1406 proARS1406 14 61597 61894
XIV-127 NA proARS1410 14 125623 127807
XIV-159 NA NA 14 151280 166280

Table A.4: Continued
Location  ARS  Alternative Name  Chr  Start  End
XIV-170 ARS1411 proARS1411 14 169566 169804
XIV-196 ARS1412 proARS1412 14 196055 196291
XIV-250 ARS1413 proARS1413 14 250259 250506
XIV-280 ARS1414 proARS1414 14 279875 280108
XIV-344 NA NA 14 342963 344117
XIV-449 ARS1419 proARS1419 14 449343 449588
XIV-499 ARS1420 proARS1420 14 498987 499232
XIV-692 ARS1427 proARS1427 14 691482 691727
XV-73 ARS1507 proARS1507 15 72636 72872
XV-567 ARS1516 ADE2,proARS1516 15 566409 566643
XV-682 NA proARS1520 15 680823 683197
XV-767 ARS1523 proARS1522 15 766617 766862
XV-874 ARS1526 proARS1526 15 874190 874434
XV-982 ARS1529 proARS1529 15 981454 981690
XV-1023 NA NA 15 1022548 1023652
XVI-43 ARS1604 proARS1604 16 42976 43212
XVI-91 NA proARS1606 16 89758 92975
XVI-117 ARS1607 proARS1607 16 116505 116765
XVI-163 NA proARS1608 16 161693 163447
XVI-179 NA NA 16 178643 179827
XVI-262 NA proARS1612 16 261168 262687
XVI-332 NA proARS1617 16 331043 332607
XVI-369 NA NA 16 369003 369737
XVI-457 ARS1620.5 ARS1633 16 456557 456805
XVI-584 NA NA 16 583493 585147
XVI-685 ARS1624 proARS1624 16 684383 684632
XVI-749 ARS1626 proARS1626 16 749094 749341

Table A.5: List of origins associated with BrdU peaks that are significantly changed in Rpd3S cells based on analysis described in Figure 5.3.
ARS Location  ARS  Alternative Name  Chr  Start  End
IV-753 ARS423 proARS423 4 753159 753391
VI-69 ARS603 proARS603 6 68690 68869
VII-64 ARS702 proARS702 7 64279 64528
XII-623 NA proARS1219 12 622103 623937
XII-889 ARS1227.5 ARS1238 12 888569 888810
XV-767 ARS1523 proARS1522 15 766617 766862

Table A.6: List of origins associated with BrdU peaks that are significantly changed in dep1 cells based on analysis described in Figure 5.3.
Location  ARS Name  Alternative Name  Chr  Start  End
I-70 ARS106 proARS106 1 70258 70491
II-178 ARS207.1 ARS207.3 2 177529 177877
II-283 ARS210.5 NA 2 283015 283913
II-612 ARS219.5 NA 2 611269 613200
II-774 ARS227 proARS227 2 773918 774348
II-802 ARS229 proARS229 2 801930 802617
III-30 ARS304 NA 3 30199 30657
III-273 ARS316 proARS316 3 272844 273088
IV-46 ARS404 HO,proARS404 4 46181 46237
IV-138 NA proARS407 4 137003 138437
IV-484 ARS417 proARS417 4 483846 484091
IV-505 ARS417.5 ARS450 4 505336 505578
IV-1058 ARS431 proARS431 4 1057828 1058076
IV-1303 ARS435 proARS435 4 1302579 1302819
IV-1354 ARS437 proARS437 4 1353494 1353667
V-213 ARS512 proARS512 5 212381 212630
V-256 NA NA 5 255623 256957
V-288 ARS514 proARS514 5 287504 287750
V-317 NA NA 5 316043 317307
V-407 ARS517 proARS517 5 406747 406949
V-550 ARS522 ARS501,proARS522 5 549560 549809
VI-53 NA NA 6 51814 55080
VI-69 ARS603 proARS603 6 68690 68869
VI-217 ARS608 NA 6 216344 216692
VII-17 NA NA 7 15327 19624
VII-33 NA NA 7 31980 34434
VII-64 ARS702 proARS702 7 64279 64528
VII-353 ARS716 proARS716 7 352695 352917
VII-917 NA proARS732 7 916233 917857
VII-978 ARS733 proARS733 7 977730 977979
VIII-46 NA proARS804 8 45113 47112
VIII-169 ARS809 proARS809 8 168531 168773
VIII-246 ARS813 proARS813 8 245719 245968
VIII-381 NA proARS817 8
380153 382157
VIII-392 ARS818 proARS818 8 392148 392391
VIII-475 NA proARS821 8 474023 475267
VIII-502 ARS822 proARS822 8 501751 501992
IX-136 ARS911 proARS911 9 136094 136335

Table A.6: Continued
Location  ARS Name  Alternative Name  Chr  Start  End
IX-163 ARS911.5 NA 9 162936 163302
IX-175 ARS912 proARS912 9 175034 175355
IX-283 NA NA 9 275259 290259
IX-314 NA proARS918 9 313359 314533
X-114 ARS1007 proARS1007 10 113226 113828
X-162 ARS1007.5 NA 10 161435 161860
X-355 NA NA 10 353463 355722
X-654 ARS1020 proARS1020 10 654069 654309
XI-98 ARS1104.5 ARS1125 11 98329 98568
XI-196 ARS1106.3 ARS1126 11 196038 196284
XI-236 NA NA 11 235293 236577
XI-530 NA proARS1117 11 528843 531477
XI-582 ARS1118 proARS1118 11 581468 581706
XI-642 ARS1123 proARS1123 11 642355 642602
XII-77 NA proARS1205 12 75896 77618
XII-199 NA NA 12 197913 199417
XII-623 NA proARS1219 12 622103 623937
XII-645 NA NA 12 644493 645767
XII-822 NA proARS1227 12 821403 823002
XII-1007 ARS1232 proARS1232 12 1007180 1007470
XII-1043 NA NA 12 1042798 1043952
XIII-159 NA NA 13 158563 159797
XIII-227 NA NA 13 227023 227487
XIII-482 NA NA 13 480233 483637
XIII-555 NA proARS1322 13 553563 556237
XIII-574 NA NA 13 569500 577500
XIII-689 NA proARS1326 13 687903 689917
XIII-837 NA proARS1331 13 836823 838167
XIV-62 ARS1406 proARS1406 14 61597 61894
XIV-127 NA proARS1410 14 125623 127807
XIV-159 NA NA 14 151280 166280
XIV-250 ARS1413 proARS1413 14 250259 250506
XIV-280 ARS1414 proARS1414 14 279875 280108
XIV-449 ARS1419 proARS1419 14 449343 449588
XIV-499 ARS1420 proARS1420 14 498987 499232
XIV-546 ARS1421 proARS1421 14 545966 546201
XIV-692 ARS1427 proARS1427 14 691482 691727
XV-73 ARS1507 proARS1507 15 72636 72872
XV-85 ARS1508 proARS1508 15 85195 85444

Table A.6: Continued
Location  ARS Name  Alternative Name  Chr  Start  End
XV-114 ARS1509 proARS1509 15 113843 114084
XV-490 ARS1514 NA 15 489645 490129
XV-567 ARS1516 ADE2,proARS1516 15 566409 566643
XV-682 NA proARS1520 15 680823 683197
XV-767 ARS1523 proARS1522
15 766617 766862
XV-855 NA proARS1525 15 854173 856027
XV-874 ARS1526 proARS1526 15 874190 874434
XV-982 ARS1529 proARS1529 15 981454 981690
XV-1023 NA NA 15 1022548 1023652
XVI-43 ARS1604 proARS1604 16 42976 43212
XVI-91 NA proARS1606 16 89758 92975
XVI-117 ARS1607 proARS1607 16 116505 116765
XVI-163 NA proARS1608 16 161693 163447
XVI-262 NA proARS1612 16 261168 262687
XVI-332 NA proARS1617 16 331043 332607
XVI-457 ARS1620.5 ARS1633 16 456557 456805
XVI-685 ARS1624 proARS1624 16 684383 684632

Table A.7: Each origin was analyzed for specific Rpd3-regulated histone modifications (acetylation at H4 K12, H4 K5, H4 K16 and H4 K18). The overlap between Rpd3-regulated origins and modifications was then analyzed with a set of hypergeometric tests.
Modification  Full Data  Rpd3 Changes  Acetyl Changes  P-value
H4 K12  212  71  140  0.0004
H4 K5  189  61  67  0.0132
H4 K16  205  65  8  0.492
H4 K18  167  50  71  0.0068
Union  229  78  155  0.0016

Table A.8: Wild-type and dep1 datasets were analyzed as described in Materials and Methods and the differences in their peak-heights were used as response variables in Random Forest Regression analysis. The covariates (DNA binding factors) were ranked in their association to Rpd3 origin deregulation by the sum of their rankings in both % Increase in MSE and Increase in Node Impurity. Only binding factors that ranked higher than tenth in both scores are listed.
Factor  MSE (Ranking)  Node Impurity (Ranking)  Rank Sum
Sin3  7.8 (1)  0.31 (2)  3
Rap1  5.13 (2)  0.37 (1)  3
Rpd3  4.21 (4)  0.30 (3)  7
Sum1  4.5 (3)  0.29 (4)  7
Smp1  2.98 (9)  0.2 (6)  15
Swi6  3.37 (7)  0.15 (9)  16

Table A.9: Origins Analyzed For Fkh-Regulation with BrdU-IP-seq.
Excited origins are indicated with a -1, repressed origins are indicated with a 1 and unregulated origins are indicated with a 0.
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
1 30946 31184 0 0 -1 -1
1 40716 43300 0 0 0 0
1 70258 70491 0 0 1 1
1 124350 124599 0 0 0 0
1 136900 137900 0 0 0 0
1 146703 147690 0 0 0 0
1 159906 160127 0 0 0 0
1 162000 170000 0 0 1 0
1 176154 176402 0 0 -1 -1
2 33000 41000 0 0 0 0
2 63186 63421 0 0 -1 -1
2 70781 85781 0 0 0 -1
2 93410 93811 0 0 1 1
2 142868 144016 0 0 1 1
2 170049 170298 0 0 0 0
2 198193 198434 0 0 -1 0
2 209187 210063 0 0 -1 -1
2 237644 237879 0 0 0 0
2 254890 255136 0 0 0 0
2 262978 269978 1 0 0 0
2 283015 283913 0 0 0 0
2 326099 326335 0 0 0 0
2 378434 379194 0 0 1 1
2 389245 390368 0 0 0 0
2 407831 408064 0 0 0 0
2 417739 418035 0 0 0 1
2 486661 486909 -1 0 -1 -1
2 539137 539699 0 0 0 0
2 611269 613200 0 0 0 0
2 622625 622894 0 0 -1 -1
2 631934 632246 -1 0 -1 -1
2 707158 708262 0 0 1 0
2 720601 721038 0 0 0 0
2 741512 741802 0 0 0 0
2 757390 757621 0 0 1 1
2 773918 774348 0 0 0 0
2 801930 802617 -1 0 -1 -1

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
3 39158 39706 -1 0 -1 -1
3 74457 74677 0 0 -1 -1
3 108775 109291 0 0 0 0
3 114314 114933 1 0 1 1
3 131978 132322 0 0 0 0
3 166494 167340 0 0 -1 0
3 224807 225053 0 0 -1 0
3 272844 273088 0 0 0 0
4 46181 46237 0 0 0 0
4 123617 123902 -1 0 -1 0
4 137003 138437 0 0 1 1
4 157458 158807 0 0 0 0
4 212420 212669 1 0 1 1
4 235935 236184 0 0 1 1
4 253789 254038 0 0 0 0
4 316719 317111 0 0 1 1
4 329564 329813 0 0 -1 -1
4 408070 408312 0 0 -1 0
4 435056 435388 0 0 0 0
4 443318 444447 1 0 1 1
4 462430 462700 0 0 0 0
4 476003 477577 0 0 0 0
4 483846 484091 0 0 -1 -1
4 505336 505578 0 0 0 0
4 555224 555461 0 0 -1 -1
4 567490 567737 0 0 -1 -1
4 629072 629669 0 0 1 1
4 639859 640108 0 0 1 1
4 702879 703125 0 0 1 1
4 719873 721227 0 0 1 1
4 748384 748630 0 0 -1 -1
4 806044 806270 0 0 0 0
4 844597 845593 0 0 -1 -1
4 857472 872472 0 0 0 0
4 898253 899887
-1 0 -1 -1
4 913780 914029 0 0 -1 -1
4 965500 980500 0 0 0 0
4 1016624 1016922 0 0 0 0
4 1057828 1058076 0 0 1 1

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
4 1109955 1110196 0 0 1 1
4 1159250 1159499 0 0 -1 -1
4 1165998 1166221 0 0 -1 -1
4 1240869 1241098 0 0 1 1
4 1276212 1276440 0 0 1 1
4 1302579 1302819 0 0 1 1
4 1353494 1353667 0 0 0 0
4 1404277 1404512 0 0 1 1
4 1447298 1448928 1 0 1 1
4 1461849 1462161 0 0 1 1
4 1486905 1487149 0 0 0 0
4 1502624 1503221 0 0 0 0
5 59282 59516 0 0 -1 -1
5 93977 94218 0 0 -1 -1
5 145539 145782 0 0 0 0
5 173636 173874 0 0 -1 0
5 212381 212630 1 0 1 1
5 287504 287750 0 0 0 0
5 301213 302387 0 0 1 1
5 316043 317307 -1 0 -1 -1
5 353504 353751 0 0 -1 -1
5 406747 406949 0 0 0 0
5 438929 439178 0 0 0 0
5 498417 499343 0 0 0 0
5 520767 521025 0 0 0 0
5 549560 549809 0 0 0 0
5 569020 570085 0 0 -1 -1
6 51814 55080 0 0 0 0
6 68690 68869 0 0 0 0
6 104456 104695 0 0 0 0
6 118631 118952 0 0 0 0
6 135979 136080 1 0 1 1
6 167606 168041 0 0 0 0
6 199382 199493 0 0 -1 -1
6 216344 216692 0 0 -1 -1
7 15327 19624 0 0 -1 -1
7 64279 64528 0 0 0 0
7 163180 163447 -1 0 -1 -1
7 186583 188207 0 0 0 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
7 203917 204159 -1 0 -1 -1
7 285951 286246 0 0 -1 -1
7 352695 352917 0 0 0 0
7 388658 388892 0 0 -1 -1
7 421093 421342 -1 0 -1 -1
7 484932 485160 0 0 0 0
7 508729 508978 0 0 0 0
7 568490 568738 0 0 -1 0
7 574978 576840 0 0 0 0
7 607563 607887 0 0 0 0
7 620243 622517 0 0 -1 -1
7 652703 655047 0 0 0 0
7 659809 660054 -1 0 -1 -1
7 715273 715556 -1 0 -1 -1
7 777967 778216 0 0 -1 -1
7 794178 795956 0 0 0 1
7 834492 834736 0 0 -1 -1
7 845923 848287 0 0 0 1
7 888350 888599 0 0 -1 -1
7 916233 917857 0 0 1 1
7 977730 977979 0 0 1 1
7 999448 999695 0 0 1 1
7 1061998 1064852 0 0 0 0
8 45113 47112 0 0 0 0
8 64255 64489 0 0 -1 -1
8 115683 117257 0 0 0 0
8 133347 133591 0 0 -1 -1
8 168531 168773 0 0 0 0
8 245719 245968 0 0 0 0
8 296882 297475 -1 0 -1 -1
8 358953
360647 0 0 0 0
8 380153 382157 0 0 0 0
8 392148 392391 0 0 0 0
8 395590 410590 0 0 0 0
8 447619 447853 0 0 -1 -1
8 474023 475267 0 0 0 0
8 501751 501992 0 0 0 0
9 32509 34859 0 0 0 0
9 73803 74220 0 0 -1 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
9 105821 106048 -1 0 -1 -1
9 136094 136335 0 0 1 1
9 175034 175355 0 0 1 0
9 201591 208591 0 0 0 0
9 214675 214826 0 0 -1 -1
9 224943 227297 0 0 -1 0
9 247579 247800 0 0 -1 -1
9 275259 290259 0 0 0 0
9 310583 311070 0 0 1 1
9 341853 342096 1 0 0 0
9 357156 357393 1 0 0 0
9 411817 412053 0 0 -1 -1
10 67467 67949 0 0 -1 0
10 99359 99796 0 0 -1 0
10 113226 113828 0 0 1 1
10 161435 161860 0 0 0 0
10 203729 204614 0 0 -1 -1
10 228248 228740 0 0 0 0
10 298471 298952 0 0 1 1
10 336976 337225 0 0 1 1
10 353463 355722 0 0 0 0
10 374575 374818 -1 0 -1 -1
10 416888 417134 0 0 -1 -1
10 442248 442658 1 0 1 1
10 459029 459588 0 0 1 1
10 540239 540474 -1 0 -1 -1
10 612542 612975 0 0 -1 -1
10 654069 654309 0 0 0 0
10 683328 683817 0 0 -1 -1
11 55670 55917 -1 0 -1 -1
11 98329 98568 0 0 1 1
11 113073 115127 0 0 0 0
11 152934 153173 0 0 0 0
11 196038 196284 0 0 0 0
11 213080 213385 0 0 0 0
11 235293 236577 0 0 0 0
11 256478 258327 0 0 1 1
11 301463 303017 0 0 -1 -1
11 329322 329571 0 0 1 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
11 388607 388902 0 0 -1 -1
11 416822 417055 1 0 1 1
11 447657 447892 0 0 0 0
11 454453 459197 0 0 0 0
11 516653 516902 0 0 0 0
11 528843 531477 0 0 0 0
11 581468 581706 0 0 1 1
11 611874 612107 0 0 0 0
11 642355 642602 0 0 0 0
12 75896 77618 0 0 -1 -1
12 91417 91659 -1 0 -1 -1
12 139293 140447 0 0 1 0
12 156646 156883 1 0 0 0
12 197913 199417 0 0 0 0
12 231179 231422 0 0 -1 -1
12 243527 243960 0 0 1 1
12 289220 289469 0 0 1 1
12 343577 344033 0 0 1 1
12 373156 373400 0 0 -1 -1
12 412668 412897 -1 0 -1 -1
12 426942 441942 0 0 0 0
12 458826 459141 0 0 -1 -1
12 493903 494987 0 0 0 0
12 512868 513117 0 0 -1 -1
12 602938 603155 0 0 -1 -1
12 622103
623937 0 0 0 0
12 644493 645767 0 0 1 1
12 659823 660072 0 0 0 0
12 684000 692000 0 0 0 0
12 729123 732237 0 0 -1 -1
12 744942 745179 0 0 -1 -1
12 794020 794269 0 0 1 1
12 821403 823002 0 0 1 1
12 888569 888810 0 0 1 1
12 927043 929292 0 0 1 1
12 947123 949407 0 0 0 0
12 1007180 1007470 0 0 0 0
12 1013789 1014017 0 0 0 0
12 1023967 1024207 0 0 1 1

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
12 1042798 1043952 0 0 0 0
13 31687 31935 -1 0 -1 -1
13 94216 94463 0 0 0 0
13 137299 137548 0 0 -1 -1
13 158846 159279 0 0 1 1
13 183793 184037 0 0 -1 -1
13 227023 227487 0 0 1 1
13 263062 263296 0 0 0 0
13 286782 287067 0 0 -1 0
13 370976 371221 0 0 -1 -1
13 420160 421983 0 0 0 0
13 431453 433827 0 0 0 0
13 468177 468468 0 0 0 0
13 480233 483637 0 0 0 0
13 503346 504087 0 0 0 0
13 535595 535843 -1 0 -1 -1
13 554392 554750 0 0 0 0
13 568128 569097 0 0 0 0
13 611273 611488 0 0 -1 -1
13 634479 634714 0 0 0 0
13 649307 649551 -1 0 -1 -1
13 687903 689917 0 0 1 1
13 758222 758470 0 0 0 0
13 772629 772878 0 0 1 1
13 805116 805338 0 0 0 0
13 815341 815567 -1 0 -1 -1
13 836823 838167 0 0 0 0
13 877553 879907 0 0 0 0
13 897804 898040 -1 0 -1 -1
14 61597 61894 0 0 1 1
14 89528 89802 0 0 0 0
14 125623 127807 0 0 0 0
14 169566 169804 0 0 0 0
14 196055 196291 0 0 1 1
14 250259 250506 0 0 1 1
14 279875 280108 0 0 1 1
14 321917 322210 0 0 -1 -1
14 342963 344117 0 0 0 0
14 348469 355469 0 0 1 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
14 412263 412493 1 0 1 1
14 449343 449588 0 0 0 0
14 498987 499232 0 0 1 1
14 545966 546201 0 0 0 0
14 561106 561384 0 0 -1 -1
14 567715 569020 0 0 0 0
14 609458 609706 0 0 0 0
14 635660 635901 1 0 1 1
14 691482 691727 0 0 0 0
14 712633 714697 0 0 1 1
14 737003 740237 0 0 0 0
15 35667 35903 0 0 -1 -1
15 72636 72872 0 0 0 0
15 85195 85444 -1 0 0 0
15 103253 104167 0 0 0 0
15 113843 114084 -1 0 -1 -1
15 154972 155462 0 0 0 0
15 166974 167220 -1 0 -1 -1
15 227481 228117 0 0 -1 -1
15 277529 277778 0 0
-1 -1
15 308463 310157 0 0 1 1
15 337279 337528 0 0 0 0
15 347633 348862 0 0 0 1
15 371693 372627 0 0 0 0
15 397773 399037 0 0 0 -1
15 436732 436966 -1 0 -1 -1
15 463698 464877 0 0 0 0
15 489645 490129 0 0 -1 -1
15 520113 521217 0 0 1 1
15 566409 566643 0 0 1 1
15 600885 600960 0 0 1 0
15 616913 618427 0 0 0 0
15 656632 656876 0 0 0 0
15 680823 683197 0 0 1 1
15 729739 729969 0 0 1 1
15 766617 766862 0 0 0 0
15 783344 783563 0 0 1 1
15 854173 856027 0 0 0 0
15 874190 874434 0 0 0 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
15 908288 908537 0 0 0 0
15 981454 981690 0 0 0 0
15 1003760 1018760 0 0 0 0
15 1022548 1023652 0 0 0 0
15 1053490 1053901 -1 0 -1 -1
16 42976 43212 0 0 0 0
16 73038 73283 0 0 0 0
16 89758 92975 0 0 0 0
16 116505 116765 0 0 1 1
16 161693 163447 0 0 1 1
16 178643 179827 0 0 0 0
16 194903 196097 0 0 -1 -1
16 210467 211723 0 0 0 0
16 241223 242267 0 0 -1 0
16 261168 262687 1 0 1 1
16 273918 274962 0 0 0 0
16 289483 289704 -1 0 -1 -1
16 303433 305027 0 0 0 0
16 316743 318397 0 0 0 0
16 331043 332607 0 0 1 1
16 369003 369737 0 0 0 0
16 384536 384784 -1 0 -1 -1
16 418132 418359 -1 0 -1 -1
16 427273 428757 0 0 0 0
16 456557 456805 0 0 0 0
16 498853 500557 0 0 0 0
16 511619 511940 -1 0 -1 -1
16 552403 554287 0 0 1 1
16 563822 564061 1 0 1 1
16 583493 585147 0 0 1 1
16 633868 634117 0 0 -1 -1
16 684383 684632 -1 0 0 0
16 695432 695681 0 0 1 1
16 728393 729207 0 0 0 1
16 749094 749341 0 0 0 0
16 776921 777152 -1 0 -1 -1
16 819153 819393 0 0 0 0
16 842646 842894 0 0 0 0
16 880854 881102 0 0 0 0

Table A.9: Continued
Chr Start End Fkh1-Reg Fkh2-Reg Fkh1/2-Reg Fkh-Reg
16 932976 933223 0 0 0 0
16 940923 943157 0 0 -1 -1

Appendix B: Supplemental Figures

Figure B.1: Illustration of the method proposed in [192] for normalization of "noisy" BrdU-IP-chip data. (A) rpd3 probes (from the "noisy" rpd3 dataset) plotted in the MA plane (ARS1 probes are indicated with green dots).
(B) Each probe is plotted in the MA plane and a line of best fit, which should run parallel to the slope of the background distribution, is employed as the x-axis on the modified MA plane. (C) Probes transformed onto the modified MA plane. Following this transformation a loess line is fitted to probes within two standard deviations of the median M-value. (D) Probes plotted in the modified MA plane after the final loess normalization step.

Figure B.2: (A) Probes from the "noisy" rpd3 dataset plotted in the MA plane. (B) The background probe subset plotted in the MA plane. The first and second principal component axes are used as the new set of axes in the data rotation. (C) Probes plotted in the modified MA plane after data rotation. After this rotation a loess curve is fitted to the probes within two standard deviations of the median M-value. (D) Probes plotted in the modified MA plane after the modified loess normalization.

Figure B.3: During within-array normalization, non-enriched probes are identified as the largest set with a symmetry measure $R \le R_C = 2 \times$ (standard deviation of $R_1, R_2, \ldots, R_{0.2N}$). $R$ fluctuates about 0 while only background probes are included in its calculation. When enriched probes begin to be included in its calculation, $R$ incrementally increases.

Figure B.4: Raw M-values of rpd3 probes plotted in the chromosomal plane (Chromosome XIII shown here).

Figure B.5: Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome I shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts.
Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome II shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome III shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome IV shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome V shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse.
This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome VI shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome VII shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome VIII shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome IX shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels.
Panels are organized vertically in the same manner as experimental counts. Chromosome X shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XI shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XII shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XIII shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XIV shown here.
Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XV shown here.

Figure B.5 (continued): Cells were analyzed as described for Figure 2.2A. Left panels show read counts. Top panel displays counts from the earliest BrdU-pulse and the bottom panel displays the read counts from the latest pulse. This dataset was modeled as described for Figure 4.1 and simulated read counts are plotted on the right panels. Panels are organized vertically in the same manner as experimental counts. Chromosome XVI shown here.

Figure B.6: Nucleosome positions were identified as described for Figure 3.6B and C. (A) Density plot of nucleosomes around the ACS developed using all origins in asynchronous (left panel) and G1-arrested cells (right panel). (B) For each of two replicates nucleosome density curves were developed for origins in the top and bottom quartiles (as measured by HU) in asynchronous (top panels) and G1-arrested cells (bottom panels).

[Figure B.7 plot axes: x, "Time From α-factor Release" (ticks 0 to 84 in steps of 6); y, "Cells Budded (%)" (ticks 0 to 100 in steps of 20)]
Figure B.7: Cells were released from G1-phase arrest and budding indices were calculated based on a 100-cell population at time-points 0, 1, ..., 84 minutes.

Figure B.8: Read counts corresponding to each BrdU-IP-pulse are plotted for both the real and simulated dataset. Real read counts were normalized to be proportional to the bulk IP-DNA extracted for each corresponding BrdU-pulse.
Simulated read counts were normalized such that their global sum is equal to that of the real data.

Figure B.9: WT and rpd3 cells were analyzed as described in Figure 5.1 legend; results for Chromosomes I-IV are shown. Individual peaks are denoted with gray dots, and peaks that are significantly different in height (p ≤ 0.001) between the strains are denoted with red dots.

Figure B.9 (continued): WT and rpd3 cells were analyzed as described in Figure 5.1 legend; results for Chromosomes VII-X are shown. Individual peaks are denoted with gray dots, and peaks that are significantly different in height (p ≤ 0.001) between the strains are denoted with red dots.

Figure B.9 (continued): WT and rpd3 cells were analyzed as described in Figure 5.1 legend; results for Chromosomes XI, XII, XIII and XV are shown. Individual peaks are denoted with gray dots, and peaks that are significantly different in height (p ≤ 0.001) between the strains are denoted with red dots.

Figure B.9 (continued): WT and rpd3 cells were analyzed as described in Figure 5.1 legend; results for Chromosome XVI are shown. Individual peaks are denoted with gray dots, and peaks that are significantly different in height (p ≤ 0.001) between the strains are denoted with red dots.

Figure B.10: Early S-phase replication profiles identify Rpd3S- and Rpd3L-regulated origins. set2, eaf3set2, rco1, rco1eaf3 (A), cti6, cti6eaf3 (B) and rpd3set2, rpd3eaf3 and rpd3cti6 (C) cells were analyzed as described in Figure 5.1 legend and the resulting data for Chromosome XIV are shown overlaid with the WT and rpd3 profiles. Peaks meeting significance criteria for initiation in rpd3 cells are indicated with green dots.
Figure B.11: BrdU peak-heights for each origin in eaf3set2 (A), eaf3rco1 (B), cti6eaf3 (C), rpd3set2 (D), rpd3eaf3 (E) and rpd3cti6 (F) mutant strains are plotted against the corresponding BrdU peak-heights in WT cells; peaks that are significantly different in height from the WT are indicated with red dots.

Figure B.12: Cells were analyzed as described in Figure 6.1A and B. Chromosome I shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome II shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome III shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome IV shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome V shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome VI shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome VII shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome VIII shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome IX shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome X shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XI shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XII shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XIII shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XIV shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XV shown here.
Figure B.12 (continued): Cells were analyzed as described in Figure 6.1A and B. Chromosome XVI shown here.
Figure B.13: Cells were analyzed as described in Figure 6.2A and B. Chromosome I shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome II shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome III shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome IV shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome V shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome VI shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome VII shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome VIII shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome IX shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome X shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XI shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XII shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XIII shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XIV shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XV shown here.
Figure B.13 (continued): Cells were analyzed as described in Figure 6.2A and B. Chromosome XVI shown here.
Abstract
For cells to proliferate, the genome must be replicated exactly once per cell cycle in a timely and accurate manner. This task is complicated by other genomic processes, such as transcription and DNA repair, that operate concurrently on the same genomic template. Replication initiates at specific loci called replication origins, which must undergo a series of protein loadings before they can begin to replicate. Although this loading schedule takes place at all origins, individual origins fire at distinct and conserved times during S-phase. It has been suggested that origin firing schedules are defined by each origin's propensity to attract rate-limiting replication factors from limited pools, with higher-propensity origins replicating earlier and lower-propensity origins replicating later. This model has not been validated and, furthermore, the factors determining an origin's propensity to attract replication factors remain poorly understood. In higher eukaryotes, replication timing has been linked to epigenetic inheritance and genomic stability. Thus, determining which factors dictate origin timing schedules is important for understanding the mechanisms driving development and healthy cell proliferation.
Asset Metadata
Creator: Knott, Simon Robert Vincent (author)
Core Title: Measuring, modeling and identifying factors that influence eukaryotic DNA replication
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Computational Biology and Bioinformatics
Publication Date: 11/09/2011
Defense Date: 03/28/2011
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: chromatin, Modeling, OAI-PMH Harvest, replication origin, Transcription
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Tavaré, Simon (committee chair), Aparicio, Oscar Martin (committee member), Laird, Peter W. (committee member), Smith, Andrew D. (committee member)
Creator Email: knott@usc.edu, srvknott22@gmail.com
Permanent Link (DOI): https://doi.org/10.25549/usctheses-m3931
Unique identifier: UC1455409
Identifier: etd-Knott-4521 (filename), usctheses-m40 (legacy collection record id), usctheses-c127-478524 (legacy record id), usctheses-m3931 (legacy record id)
Legacy Identifier: etd-Knott-4521.pdf
Dmrecord: 478524
Document Type: Dissertation
Rights: Knott, Simon Robert Vincent
Type: texts
Source: University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Repository Name: Libraries, University of Southern California
Repository Location: Los Angeles, California
Repository Email: cisadmin@lib.usc.edu