Genome sequencing and transcriptome analysis of the phenotypically plastic spadefoot toads
(USC Thesis Other)
Genome sequencing and transcriptome analysis of the phenotypically plastic spadefoot toads

by

Fabian Seidl

A Dissertation Presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)

AUGUST 2019

Copyright 2019 Fabian Seidl

Acknowledgements

I would like to thank Ian Ehrenreich and my lab members for all their feedback and support. I would also like to thank Karin and David Pfennig and the members of their labs for collaborating with us on their wonderful animal system. I would further like to thank my committee members, Steven Finkel, Sergey Nuzhdin, and Eric Webb, and the members of the Finkel, Nuzhdin, and Dean labs.

Contents

Acknowledgements
List of Figures
List of Supplementary Figures
List of Tables
List of Supplementary Tables
Abstract
Chapter 1: Introduction
1.1 Genomic resources are relatively sparse in anurans.
1.2 The mechanics underlying evolution of genome size are still ambiguous.
1.3 Phenotypic plasticity is posited to underlie evolutionary novelty.
1.4 Reinforcement of interspecies barriers and adaptive hybridization are important forces in species formation.
1.5 Spadefoot toads are specially adapted to a desert environment
1.6 Spadefoot toads display phenotypic plasticity in the form of alternate developmental morphology
1.7 Sp. multiplicata and Sp. bombifrons engage in adaptive hybridization in the wild
1.8 Goals of this dissertation.
1.9 Summary of the chapters.
Chapter 2: Genome of the rapidly developing, phenotypically plastic, and desert-adapted spadefoot toad
2.1 Abstract
2.2 Introduction
2.3 Results
2.3.1 Properties of the Sp. multiplicata genome
2.3.2 Genes with elevated copy numbers in Sp. multiplicata
2.3.3 Factors contributing to anuran genome size differences
2.3.4 Identification of genes showing evidence of positive selection
2.3.5 Insights into adaptive hybridization from the transcriptome
2.4 Discussion
2.5 Materials and Methods
2.5.1 Genome assembly
2.5.2 Annotation of protein-coding genes
2.5.3 Gene ontology analysis
2.5.4 Copy number analysis
2.5.5 dN/dS analysis
2.5.6 Analysis of gene expression in pure species and their hybrids
2.6 Supplementary Material
Chapter 3: Hybrid gene expression in allopatry versus sympatry: implications for the evolution of genetic incompatibilities in interbreeding species
3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 Do Spea hybrids vary in gene expression?
3.3.2 Does variation among hybrids correspond to expression differences between species?
3.3.3 What are the impacts of sympatry versus allopatry on gene expression?
3.4 Discussion
3.4.1 Widespread gene expression differences among Spea hybrids
3.4.2 Implications for genetic incompatibilities between species
3.4.3 How can counteracting selective regimes operate in the same system?
3.5 Materials and methods
3.5.1 Sample production and preparation
3.5.2 Measurement of differential expression
3.5.3 Identification of genes showing expression differences among hybrids
3.5.4 Expression patterns among hybrids
3.5.5 Principal components analysis to contrast groups
3.6 Conclusion
3.7 Supplementary Material
Chapter 4: Concluding Remarks
4.1 Impact of my work
4.2 Future directions
Appendix A: Trends of evolution of Escherichia coli under stressful conditions
Appendix B: Regulatory rewiring in a cross causes extensive genetic heterogeneity.
Appendix C: The complex genetic and molecular basis of a model quantitative trait.
Appendix D: The stress-inducible peroxidase TSA2 enables Chromosome IV duplication to be conditionally beneficial in Saccharomyces cerevisiae.

List of Figures

Figure 2.1. Natural history of New World spadefoot toads.
Figure 2.2. Gene trees for genes with elevated copy numbers in Sp. multiplicata.
Figure 2.3. Factors contributing to anuran genome size differences.
Figure 2.4. Genes showing evidence of positive selection in two genera of spadefoots.
Figure 2.5. Gene expression analysis in Spea hybrids.
Figure 3.1. Heatmap of hybrid type expression values.
Figure 3.2. Heatmap of expression values of four hybrid types as well as pure species.
Figure 3.3. Distribution of cross types in principal component space based on expression of genes.

List of Supplementary Figures

Supplementary Figure 2.1. Percent identity of peptides to the best match in the vertebrate Swiss-Prot database.

List of Tables

Table 3.1: Cross types analyzed in the experiment with respective abbreviations used throughout text and figures.

List of Supplementary Tables

Supplementary Table 2.1 Read data collected for each species.
Supplementary Table 2.2 Assembly characteristics at major steps. Long reads greatly improved contiguity of the genome.
Supplementary Table 2.3 Repetitive DNA in Sp. multiplicata, as output by RepeatModeler.
Supplementary Table 2.4 Comparison of diploid anuran genomes.
Supplementary Table 2.5 Description of three genes with increased copy number in Sp. multiplicata relative to the other published anuran genomes.
Supplementary Table 2.6 Expansion of nodal in anurans.
Supplementary Table 2.7 Genes showing different patterns of molecular evolution in Scaphiopus and Spea.
Supplementary Table 2.8 Functions of genes under positive selection in Sp. multiplicata.
Supplementary Table 2.9 Results from gene ontology (GO) analysis of genes under positive selection in Spea.
Supplementary Table 2.10 Assembly statistics of short read libraries for Sp. bombifrons, Sc. couchii, and Sc. holbrookii.
Supplementary Table 3.1 Read counts from 3' RNA-Seq for each cross type.
Supplementary Table 3.2 Observed distances in least-square means (above diagonals) and corresponding p values (below diagonals) among cross types in principal component space.

Abstract

Frogs and toads (anurans) are widely used to study many biological problems; however, few have genome assemblies or annotations. This lack of resources limits the research that can be conducted in these systems: strategies that are commonplace in many other systems, such as genome-wide association studies or quantitative trait mapping, are not feasible without a reference genome. Thanks to the continuing reduction in the cost of sequencing, however, it has become possible for small lab groups to generate high-quality genomes and annotations for their model of choice. In chapter two, I describe the genome sequencing, assembly, and annotation of a promising anuran system, as well as preliminary biological insights into some of its attributes. In chapter three, I describe a transcriptomic analysis of interspecies hybrids in this system, enabled by the genome described in chapter two. This transcriptomic analysis revealed large-scale patterns of differential expression unique to hybrids, consistent with reinforcement of hybrid boundaries.

Chapter 1: Introduction

1.1 Genomic resources are relatively sparse in anurans.

Anurans are found in a broad range of habitats and are critical components of these ecosystems (1). Further, they display a large array of adaptations and variation. Xenopus tropicalis and its tetraploid relative Xenopus laevis are widely studied and well-annotated model organisms (2-4). X. tropicalis was also the first anuran to have its genome sequenced (5), with the X. laevis assembly following a short time later (6).
Since then, three other anuran genomes have been released: those of the Tibetan Plateau frog (7), the American bullfrog (8), and the Cane toad (9). However, with nearly 7,000 known species of frogs and toads (10), much of the biodiversity of anurans remains unexplored. Expanding the range of genomic resources in this branch of the evolutionary tree will enable investigation of fundamental questions in biology.

1.2 The mechanics underlying evolution of genome size are still ambiguous.

Amphibians have the largest range of genome sizes of any extant class of vertebrates, from less than 1 Gb to more than 140 Gb (11). Genome size in eukaryotes has been shown to correlate with a large number of phenotypes, including cell size, cell proliferation rate, and metabolic rate (12). However, unlike in prokaryotes, genome size does not seem to correlate with the number of genes in eukaryotes (13). Recent work has also shown a correlation between developmental rate and genome size (14, 15). Although hypotheses invoking both selection and non-adaptive mechanisms have been proposed, the factors underlying the wide range of observed variation in genome size remain ambiguous. Answering this question comprehensively will require sequencing more genomes. Comparative genomics of anurans, especially of those with small genomes and fast development times, may shed light on the dynamics of this correlation.

1.3 Phenotypic plasticity is posited to underlie evolutionary novelty.

Phenotypic plasticity refers to an organism's ability to change its phenotype in response to environmental conditions (16). Phenotypic plasticity has been posited as the first step in the evolution of novel characteristics (16-19). In this view, an environmental change causes phenotypic responses in members of a population. Individuals, depending on their genotype, vary in their response and are therefore subject to selection, which can then refine the new phenotype.
However, studying this process requires an example in a natural population. Further, genomic resources are necessary to gain insight into the molecular basis of this plasticity and into how selection refines the phenotype.

1.4 Reinforcement of interspecies barriers and adaptive hybridization are important forces in species formation.

Interspecies hybridization, breeding between individuals of two closely related but distinct species, is generally maladaptive, as it results in few or no offspring, and those offspring generally have low fertility (20). Therefore, when closely related species come into sympatry, they are subject to reinforcement, which strengthens reproductive isolation. Reproductive isolation is believed to be driven by the accumulation of genetic variants that are neutral or adaptive in each species individually but maladaptive in combination in the hybrid. These are referred to as Bateson-Dobzhansky-Muller incompatibilities (BDMs) (20-23). However, there is evidence that in some cases hybridization can be adaptive (24, 25). How BDMs affect gene expression in hybrids, and, more broadly, how the opposing forces of reinforcement and hybrid vigor interact in natural populations, is unknown. Studies of these dynamics require extensive genotyping, a reference genome against which to compare genotypes, and gene models for transcriptomic analyses.

1.5 Spadefoot toads are specially adapted to a desert environment

Anurans generally inhabit temperate and tropical climates; however, some anurans have adapted to drier climates. Spadefoot toads are found in drier parts of deserts than any other anuran (26). Among anurans with sequenced genomes, spadefoots fill a major gap on the phylogeny as the only representatives of the Pelobatoidea, a superfamily found across North America, Europe, parts of Asia, and North Africa. The Pelobatoidea represent an ancient clade of anurans with toad-like bodies and frog-like faces, and can potentially provide valuable insights into anuran evolution.
In addition, they have some of the smallest reported anuran genomes, with 13 chromosomes and a total genome size estimated at 1 to 1.4 Gb (27, 28). Adult spadefoots reach about 5-8 cm in length (29) and subsist mostly on a diet of invertebrates, including flies, beetles, and termites (26, 30-32). The name "spadefoot" derives from their spade-like hind limbs, which they use to burrow as deep as 91 cm into loose, sandy soil (33). They spend the winter, and most of the year, estivating in these underground burrows (26, 31, 32), emerging for only a few weeks following warm rains to feed and breed in short-lived pools (29). As a result, they have evolved striking phenotypic plasticity, which enables one of the shortest known developmental periods of any vertebrate: as little as 8 days (34). Spadefoot tadpoles can facultatively speed up (35-38) or slow down (35, 36, 39) development in response to the environment.

1.6 Spadefoot toads display phenotypic plasticity in the form of alternate developmental morphology

Not only can spadefoots control the speed of their development, but they also display a form of diet-dependent developmental plasticity. Whereas most anuran tadpoles are omnivores and exhibit traits adapted for feeding on detritus and plankton (40), Spea tadpoles have evolved the ability to express an alternative 'carnivore' ecomorph, which exhibits predatory traits, when sharing an environment with fairy shrimp (41). Carnivore morphs differ from omnivore morphs in several physical characteristics. Carnivore morphs have large jaw muscles, notched and serrated mouthparts, and short, uncoiled intestines. Omnivore morphs have smaller jaw muscles, smooth mouthparts, and long, coiled intestines (42). Surveys in natural populations, as well as experiments in the lab, have shown that carnivore development is largely determined by exposure to fairy shrimp in the diet (41).
Although large omnivores can eat shrimp, they do so less efficiently (43). Carnivores also prey on the tadpoles of other species, preferentially eating smaller Sc. couchii tadpoles (44). Carnivore frequency, as well as the extremity of the phenotype, varies among populations, between species, and within sibships (45, 46). In allopatry, Sp. multiplicata and Sp. bombifrons show a similar frequency and extent of carnivore morphology (47). In sympatric populations, however, Sp. multiplicata and Sp. bombifrons showcase a striking example of character displacement: Sp. multiplicata tadpoles develop almost exclusively as omnivore morphs, whereas Sp. bombifrons tadpoles develop almost exclusively as carnivore morphs. Carnivore morphology is also influenced by a maternal effect: larger mothers produce larger eggs, leading to greater expression of carnivore morphology in their offspring (48). This trait appears to have been derived recently, specifically within the genus Spea (49).

1.7 Sp. multiplicata and Sp. bombifrons engage in adaptive hybridization in the wild

Finally, Sp. multiplicata and Sp. bombifrons can hybridize and have been found to do so selectively in nature (25). Hybridization frequency in the wild ranges from 0 to 40%, differing across populations as well as across microenvironments, and increasing in smaller, shallower ponds (50). Sp. multiplicata tadpoles develop more rapidly than Sp. bombifrons tadpoles, and hybrid tadpoles metamorphose more quickly than pure Sp. bombifrons tadpoles (50). Therefore, hybridization may enhance the odds of survival for the offspring of Sp. bombifrons females. However, hybridization reduces the fertility and fecundity of the resulting offspring (25), with sterility observed in hybrid males (51). Sp. bombifrons females have been shown to shift their mate choice, preferentially mating with Sp. multiplicata males in environmental conditions where hybridization has the greatest benefit (shallow pools) (25).
These dynamics potentially create contrasting selective pressures within hybridizing populations of Sp. multiplicata and Sp. bombifrons. The unique characteristics of Spea (fast development, a small genome, adaptive interspecies hybridization, and an abundance of standing genetic variation) make it an ideal model system for studying the evolution of genome size, phenotypic plasticity, and the dynamics of reproductive barriers.

1.8 Goals of this dissertation.

Although spadefoots, and the carnivore phenotype specifically, have been thoroughly investigated and characterized from an ecological perspective, further research is limited by a lack of genomic resources. My goal as a PhD student has been to build these genomic resources by producing 1) a contiguous genome assembly and, more importantly, 2) a set of high-quality gene models. Using these resources, I aimed to 3) investigate the history of this genome and 4) investigate the impact and signatures of interspecies hybridization. To answer these questions, I generated and analyzed second- and third-generation DNA sequencing libraries, as well as RNA sequencing data from a set of lab-generated interspecies crosses. I used a large array of available bioinformatic tools, as well as my own code, to assemble and annotate a high-quality haploid genome for a wild-sampled, heterozygous Sp. multiplicata individual and to generate a set of high-quality gene models. I then performed an array of analyses to investigate the genome, including gene copy-number analysis, dN/dS analysis, differential gene expression analysis, repetitive DNA prediction, and GO enrichment. The resources and insights that I have generated will enable further investigation of these questions, with experiments already underway.

1.9 Summary of the chapters.

In chapter 2, I describe work completed to sequence and assemble the genome of Sp. multiplicata using second- and third-generation sequencing technologies.
Using this draft genome, I perform ab initio gene prediction and generate a high-confidence set of gene models. I search for and identify gene amplifications unique to Sp. multiplicata relative to other sequenced anurans. I compare short-read assemblies from Sp. bombifrons, Scaphiopus couchii, and Scaphiopus holbrookii to my gene models to look for evidence of recent positive selection on genes in the Spea lineage. I then use RNA sequencing data from Sp. multiplicata and Sp. bombifrons tadpoles to characterize differential expression between the two species.

In chapter 3, I investigate the expression profiles of four categories of lab-generated interspecies hybrids from sympatric and allopatric populations of Sp. multiplicata and Sp. bombifrons. I find evidence of differential expression patterns that are unique to hybrid individuals and absent between the parent species. I show greater variation in expression among hybrids from populations undergoing active interbreeding, consistent with known patterns of selection both for and against hybridization. This evidence suggests that reinforcement and adaptive hybridization can both occur in the same system.

In chapter 4, I discuss the broader impact of my research and describe ongoing work to further investigate the underpinnings of both the 'carnivore' phenotype and interspecies hybridization.

Chapter 2: Genome of the rapidly developing, phenotypically plastic, and desert-adapted spadefoot toad

This work is currently under review at Nature Communications.

2.1 Abstract

Frogs and toads (anurans) are widely used to study many biological problems. Yet, few anuran genomes have been sequenced, limiting research on these organisms. Here, we produce a draft genome for the Mexican spadefoot toad, Spea multiplicata, a member of a previously unsequenced anuran clade. Atypically for amphibians, spadefoots inhabit deserts.
Consequently, they possess many unique adaptations, including rapid growth, prolonged dormancy, phenotypically plastic development, and adaptive interspecies hybridization. We assembled and annotated a 1.07 Gb Sp. multiplicata genome containing 19,639 genes. By comparing this sequence to other available anuran genomes, we found gene amplifications in spadefoots and obtained insights into sources of anuran genome size differences. We also used the genome to identify genes experiencing positive selection or exhibiting expression differences among spadefoots. Completion of the Sp. multiplicata genome advances efforts to determine the genetic bases of spadefoots' unique adaptations and enhances comparative genomic research in anurans.

2.2 Introduction

With nearly 7,000 species (10), frogs and toads (anurans) occur across diverse habitats and exhibit a stunning array of adaptations (52, 53). Moreover, anurans are critical, but increasingly threatened, components of most ecosystems and thus serve as key bioindicators (1). Yet, despite their importance to fields ranging from developmental biology and physiology to ecology and evolution, genomic resources for anurans are relatively scarce. Indeed, fewer genomes are available for anurans than for any other major group of vertebrates, with only five anurans sequenced thus far: the Western clawed frog, Xenopus tropicalis (5); the closely related African clawed frog, Xenopus laevis (6); the Tibetan Plateau frog, Nanorana parkeri (7); the American bullfrog, Rana (Lithobates) catesbeiana (8); and the Cane toad, Rhinella marina (9). This paucity of genomes limits the use of anurans as model systems for many important biological questions, especially given the deep divergence among the anurans sequenced thus far (most diverged more than 200 million years ago (54)). Here, we present a draft genome of a New World spadefoot toad, the Mexican spadefoot toad, Spea multiplicata (Fig. 2.1a).
New World spadefoot toads (hereafter, 'spadefoots') comprise seven diploid species, two of which (Scaphiopus holbrookii and Sc. hurterii) occur in relatively mesic eastern and central North America, and five of which (Sp. multiplicata, Sp. bombifrons, Sp. hammondii, Sp. intermontana, and Sc. couchii) inhabit xeric western North America. Crucially, relative to the other frogs and toads with published genomes, spadefoots fill a major gap on the anuran phylogeny as the only representatives of the Pelobatoidea, a superfamily found across North America, Europe, parts of Asia, and North Africa. The Pelobatoidea are an ancient clade of anurans with toad-like bodies and frog-like faces, and thus can potentially provide valuable insights into anuran evolution. Additionally, spadefoots have been reported to have some of the smallest anuran genomes, with most size estimates between 1.0 and 1.4 Gb (11). By comparison, excluding the tetraploid X. laevis, the genomes of the other sequenced anurans, all of which are diploid, range from 1.37 Gb for X. tropicalis (5) to 5.8 Gb for R. catesbeiana (8).

Spadefoots serve as important models in ecology and evolution, owing to their unusual ecology, rapid development, and striking phenotypic plasticity. For example, spadefoots cope with their arid habitat by burrowing underground (33) and estivating for a year or longer (26, 31, 32), emerging for only a few weeks following warm rains to feed and breed in short-lived pools (Fig. 2.1b, c) (29). Although these highly ephemeral pools are inaccessible to most anurans, spadefoot tadpoles can develop rapidly, in some cases metamorphosing eight days after hatching (Fig. 2.1d) (34). Spadefoots also exhibit multiple forms of phenotypic plasticity that further hasten their development and allow them to thrive in unpredictable environments (such as deserts) where rainfall is highly variable (55).
Specifically, spadefoot tadpoles can facultatively speed up (35-38) or slow down (35, 36, 39) development in response to the environment. Additionally, whereas most anuran tadpoles are omnivores and exhibit traits adapted for feeding on detritus and plankton (40), Spea tadpoles can develop into an alternative 'carnivore' ecomorph, which develops enlarged jaw muscles and mouthparts for capturing and consuming large animal prey (Fig. 2.1e) (41). Finally, as adults, when breeding in shallow, rapidly drying ponds, Sp. bombifrons females preferentially mate with sympatric Sp. multiplicata males, thereby producing hybrid tadpoles that develop even faster than pure-species tadpoles (25).

Figure 2.1. Natural history of New World spadefoot toads. a Mexican spadefoot toads, Spea multiplicata, possess numerous adaptations for dealing with desert conditions, including a keratinized spade on their hind feet (arrow), which enables them to b burrow underground. c They emerge for only a few weeks each year to feed and to breed in temporary, rain-filled pools. d Spadefoot tadpoles exhibit rapid and adaptively flexible larval development (here, a metamorph emerges from a drying pond). e They also produce alternative, environmentally induced morphs: a slower-developing omnivore morph (left) and a more rapidly developing carnivore morph (right), which is induced by, and specializes on, animal prey such as fairy shrimp (center).

In this chapter, we perform a combination of long- and short-read sequencing on Sp. multiplicata and produce a draft genome for this species. By comparing this genome to other available anuran genomes, we identify several distinctive gene amplifications, as well as factors contributing to the substantial genome size variation found among anurans. We then leverage the Sp. multiplicata genome as a platform for exploring evolution in two ways. First, we produce short-read, whole-genome sequencing data for Sp. bombifrons, Sc. couchii, and Sc. holbrookii, and obtain thousands of protein-coding gene sequences for these species by mapping the data against Sp. multiplicata gene models. This allows us to identify genes exhibiting different selection pressures in Spea or Scaphiopus, including positive selection, thereby providing insights into specific genes underlying adaptive evolution in these genera. Second, we generate transcriptome data for Sp. multiplicata and Sp. bombifrons tadpoles, as well as for tadpoles produced by hybridizing these species. We find that nearly 55% of all examined protein-coding genes are differentially expressed between the species. Hybrids exhibit intermediate levels of transcription for most of these genes. As we discuss below, this suggests that widespread complementation of deleterious regulatory polymorphisms might explain why hybridization in spadefoots (25) promotes adaptive evolution and ecological range shifts (56). Together, our results demonstrate how the Sp. multiplicata genome can facilitate genetics and genomics research in spadefoots. Ultimately, such research promises to provide key insights into the distinctive phenotypes of these unique organisms.

2.3 Results

2.3.1 Properties of the Sp. multiplicata genome

We generated a Sp. multiplicata draft genome using a combination of high-coverage long- and short-read sequencing (Supplementary Table 2.1), hybrid read assembly, and scaffolding of contigs against the X. tropicalis reference genome (Supplementary Table 2.2; Methods). We obtained a haploid assembly of ~1.07 Gb, which is consistent with historical, densitometry-based genome size estimates (27, 28). The draft genome consisted of 49,736 scaffolds, with an N50 of 70,967 bp and a maximum scaffold size of 60,197,306 bp (~60.2 Mb). Thirty-two percent of the assembly consisted of repetitive DNA (Supplementary Table 2.3). We used Benchmarking Universal Single-Copy Orthologs (BUSCO) (57) to assess the completeness of the draft genome (Methods).
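N50, one of the contiguity statistics reported above, is the largest length L such that scaffolds of length L or longer together contain at least half of all assembled bases. The sketch below computes it from a list of scaffold lengths. It illustrates the definition only and is not the tool used to produce the reported statistics; the FASTA-reading helper `fasta_lengths` is a hypothetical convenience.

```python
from typing import Iterable, List, Optional


def fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of each record in FASTA-formatted text."""
    lengths: List[int] = []
    current: Optional[int] = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Return the largest length L such that sequences of length >= L
    together contain at least half of all assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("need at least one sequence length")
    total = sum(ordered)
    running = 0
    # Walk from the longest sequence down until half the bases are covered.
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
    return ordered[-1]  # not reached: the full sum always satisfies the test


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the toy example the total is 54 bases; the three longest sequences (10 + 9 + 8 = 27) reach exactly half, so N50 is 8.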
Of the 978 genes in BUSCO's metazoan database, 878 (89.8%) were complete, 47 (4.8%) were fragmented, and 53 (5.4%) were missing, which is similar to the other recently published anuran genomes (7-9). After confirming a high level of genome completeness, we used the software package AUGUSTUS (58) to perform ab initio prediction of protein-coding genes (Methods). We then used BLAST to compare the predicted proteins against the proteome of X. tropicalis and filtered them using multiple quality-control criteria (Methods). Comparisons of the Sp. multiplicata proteome to those of the other five anurans suggest that our approach yielded high-quality gene models (Supplementary Figure 2.1). We thereby identified 19,639 protein-coding gene models, which were on average 1,370 bp long excluding introns and 9,398 bp long including introns. This number of genes in Sp. multiplicata is slightly lower than, but comparable to, the range reported for the four other diploid anuran genomes, 21,067 to 25,846 genes (5, 8, 9) (Supplementary Table 2.4).

2.3.2 Genes with elevated copy numbers in Sp. multiplicata

We examined the gene content of Sp. multiplicata in greater detail. At a false discovery rate (FDR) of 0.05, Gene Ontology (GO) enrichment analysis failed to identify any significant differences in gene content relative to the other diploid anuran genomes (Methods). However, comparison of these genomes revealed that three specific genes were amplified in Sp. multiplicata relative to the four other sequenced diploid anurans (Supplementary Table 2.5; Methods): hyaluronan synthase (hyas), nodal (nod), and zona pellucida glycoprotein (zp3) (Fig. 2.2). Hyaluronan is a component of extracellular matrices, which play an important role in cell adhesion, differentiation, and migration throughout the body (59). Whereas all other sequenced anurans have one to three annotated copies of hyas, Sp. multiplicata has seven (Fig. 2.2a).
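Copy-number comparisons of this kind reduce to counting gene models per family in each genome and asking whether the focal species exceeds every other species. The sketch below assumes gene models have already been assigned to families (for example, from a clustering or best-hit table) and applies a strict "more copies than any other species" rule. The input format, the species codes (`Smul`, `Xtro`, `Rmar`), and the rule itself are illustrative assumptions; the criteria actually used are described in Methods and are not reproduced here.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def copies_per_family(
    assignments: Iterable[Tuple[str, str, str]],
) -> Dict[str, Counter]:
    """Count gene copies per family for each species.

    `assignments` holds (species, gene_id, family) tuples; this input
    format is hypothetical.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    seen = set()
    for species, gene_id, family in assignments:
        if (species, gene_id) in seen:
            continue  # count each gene model at most once (first family wins)
        seen.add((species, gene_id))
        counts[family][species] += 1
    return counts


def amplified_families(
    counts: Dict[str, Counter], focal: str, others: List[str]
) -> List[Tuple[str, int, int]]:
    """Return (family, focal_copies, max_other_copies) for every family in
    which the focal species has more copies than each other species."""
    hits = []
    for family in sorted(counts):
        per_species = counts[family]
        focal_n = per_species[focal]  # a Counter returns 0 for absent species
        max_other = max((per_species[sp] for sp in others), default=0)
        if focal_n > max_other:
            hits.append((family, focal_n, max_other))
    return hits


# Hypothetical toy table: species codes and gene IDs are illustrative only.
table = [
    ("Smul", "g1", "hyas"), ("Smul", "g2", "hyas"), ("Smul", "g3", "hyas"),
    ("Xtro", "x1", "hyas"), ("Rmar", "r1", "hyas"), ("Rmar", "r2", "hyas"),
    ("Smul", "g4", "zp3"), ("Xtro", "x2", "zp3"), ("Xtro", "x3", "zp3"),
]
print(amplified_families(copies_per_family(table), "Smul", ["Xtro", "Rmar"]))
# [('hyas', 3, 2)]
```

A strict maximum is the simplest criterion. A real analysis might also require a minimum fold excess or check that extra copies are not assembly duplicates, which matters for a heterozygous, haploid-collapsed assembly.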
As for nodal, this gene encodes a cytokine that plays a key role in mesoderm formation and body patterning during embryogenesis and development in deuterostomes (60-62). Vertebrates exhibit substantial diversity in nodal content: humans have just a single copy of nodal, zebrafish has three, and other sequenced, diploid anurans have nine or fewer (Fig. 14 2.2)(63). In comparison, we found evidence that Sp. multiplicata has at least 12 copies of nodal (Fig. 2.2b; Supplementary Table 2.6). When we compared the 12 Sp. multiplicata copies of nodal to those present in X. tropicalis, we found that all were most similar to xnr6. The nodal gene family has been shown to have expansions in X. tropicalis (5, 64). To further investigate this, we relaxed our criteria and included matches to all the nodals in the SWISSprot database. We found evidence for potentially as many as 24 copies of nodal, two of which were most closely related to xnr2 (Supplementary Table 2.6; Methods). Unlike most other nodal copies in Xenopus, xnr6 acts in a cell-autonomous manner (61) and plays a key role in mesoendoderm specification (65). In contrast, xnr2 acts later in development and is a mesoderm-inducing factor (66). Lastly, zp3 encodes a protein component of sperm-binding glycoproteins in the egg’s zona pellucida (67). While the four other diploid anuran genomes had four or fewer copies of zp3, Sp. multiplicata had nine (Fig. 2.2c). 15 Figure 2.2. Gene trees for genes with elevated copy numbers in Sp. multiplicata. We utilized Usearch(68) to identify gene enrichments in Spea compared to the other four anurans. We then retrieved copies of the gene in question for humans and zebrafish from Uniprot(69). Trees were built from alignments including only sites where all species had data. Distance is in number of substitutions per site. Node labels are shown for all nodes with at least 50% bootstrapping support over 100 iterations. a hyaluronan synthase 3 (hyas3; seven copies of Sp. 
multiplicata); b nodal (12 copies); and c zona pellucida glycoprotein 3 (zp3; nine copies). 2.3.3 Factors contributing to anuran genome size differences Spea multiplicata has a smaller genome than most anurans (Fig. 2.3a), including the four other sequenced diploid anurans (Supplementary Table 2.4). We sought to identify genomic features that explain these genome size differences (Methods). In our analyses, we excluded R. catesbiana because of its fragmented gene set, which is a result of its comparatively large and very repetitive genome (8). Among the other four assemblies, there is a more than two-fold range of genome sizes (Fig. 2.3b; Supplementary Table 2.4). Repetitive DNA content––i.e., percent of a genome assembly comprised of any class of repetitive DNA––exhibited a near perfect correlation with genome size (r = 0.99, p = 0.01; Fig. 2.3b). Repetitive DNA exhibits an almost two-fold range across the four species, suggesting it accounts for most, but not all, of the differences in genome size. We also found smaller contributors to genome size differences (Supplementary Table 2.4). Number of annotated genes was strongly correlated with genome size (ρ = 1, p < 16 0.0001), although we note that this feature is highly sensitive to assembly and annotation methods. Additionally, gene length, calculated as the number of exonic and intronic bases within a protein-coding gene, varied substantially across the four genomes. Sp. multiplicata has appreciably smaller genes (~9.4 kb on average) than the other species (≥ ~16.5 kb on average; Fig. 2.3c). These differences in gene size are driven by variability in intronic DNA, as the exonic portions of genes are approximately the same in the four species (~1.2 to ~1.3 kb). Figure 2.3. Factors contributing to anuran genome size differences. a Estimated genome sizes (Supplementary Note 1) among 284 species of anurans(11); arrow, estimated genome size of Spea from densitometry. 
b Assembly size and c gene and transcript lengths of Sp. multiplicata compared to three other sequenced diploid anurans.

2.3.4 Identification of genes showing evidence of positive selection
Spadefoots, particularly those in the genus Spea, exhibit remarkable adaptations, especially for living in desert conditions where most anurans would not survive. We therefore used the Sp. multiplicata genome as a resource for studying adaptive evolution across spadefoot species. To do so, we obtained short-read sequencing data for an additional Spea species, Sp. bombifrons, as well as for two of the closest outgroup species, Sc. holbrookii and Sc. couchii (Methods). Using these data, we generated near-complete four-species nucleotide and amino acid alignments for 1,967 single-copy, protein-coding genes (Methods). We then estimated dN/dS (ω) within each genus and tested whether genes showed evidence of different selection pressures in the two genera (Supplementary Table 2.7; Methods).

At an FDR of 0.05, 172 genes had significantly different ω values between Spea and Scaphiopus (Fig. 2.4; Methods). Of these, 26 genes (22 in Spea and 4 in Scaphiopus) exhibited evidence of positive selection in one of the genera, here operationally defined as ω ≥ 2 (Methods). In every case, genes under positive selection in one genus were under purifying selection in the other. The remaining genes either showed evidence of neutral evolution in one genus but not the other, or exhibited differing degrees of purifying selection between the genera (Fig. 2.4).

Figure 2.4. Genes showing evidence of positive selection in two genera of spadefoots. ω values for genes showing significantly different selection between the sequenced Spea and Scaphiopus species. Genes with ω ≥ 2 (horizontal dashed line) were considered putatively under positive selection in this study. More genes met these criteria in Spea (n = 22) than in Scaphiopus (n = 4).
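The operational thresholds used here (ω ≥ 2 for putative positive selection and, as defined in Methods, ω ≤ 0.5 for purifying selection) can be illustrated with a short Python sketch. It is not the analysis code used in this study: the function names are hypothetical, the input is a handful of genus-specific ω estimates copied from Supplementary Table 2.7, and estimates between 0.5 and 2 are simply left uncalled, since no label is defined for them.

```python
from typing import Dict, Tuple

POSITIVE_MIN = 2.0   # omega >= 2: putative positive selection
PURIFYING_MAX = 0.5  # omega <= 0.5: purifying selection


def label(omega: float) -> str:
    """Return a coarse selection label for a single omega estimate."""
    if omega >= POSITIVE_MIN:
        return "positive"
    if omega <= PURIFYING_MAX:
        return "purifying"
    return "unclassified"  # 0.5 < omega < 2: no call under these thresholds


def classify(genes: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[str, str]]:
    """Map gene -> (Spea label, Scaphiopus label) from two-omega model estimates."""
    return {g: (label(spea), label(scaph)) for g, (spea, scaph) in genes.items()}


# Model 2 omega values (Spea, Scaphiopus) for a few rows of Supplementary Table 2.7.
example = {
    "g10134": (2.8667, 0.0095),
    "g11875": (10.0000, 0.0371),
    "g12141": (0.0564, 1.3937),
    "g1015": (0.8994, 0.0001),
}

for gene, (spea, scaph) in classify(example).items():
    print(f"{gene}: Spea={spea}, Scaphiopus={scaph}")
```

Under these thresholds, g10134 and g11875 are called positive in Spea and purifying in Scaphiopus, matching the pattern described above.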
We determined the functions of the 22 genes that were under positive selection in Spea (Methods). These genes play roles in eye function, immune function, metabolism and digestion, oxygen transport, and smell (Supplementary Table 2.8). Gene ontology (GO) analysis of these genes identified 13 enriched biological processes (Supplementary Table 2.9), including coenzyme biosynthesis, immune function, intracellular organization, lipid metabolism and transport, photoreceptor cell maintenance, and zinc ion transport.

2.3.5 Insights into adaptive hybridization from the transcriptome
Finally, we leveraged the Sp. multiplicata genome to gain insights into the genomic factors that might contribute to the adaptive hybridization observed between two Spea species, Sp. multiplicata and Sp. bombifrons (25). Specifically, we first performed 3’ RNA-seq on seven Sp. bombifrons and seven Sp. multiplicata tadpoles (Methods). Because the tadpoles had different parents sampled from distinct geographic locations, this design allowed us to identify genes that exhibit fixed expression differences between the two species. In total, we obtained measurements in all 14 tadpoles for 10,695 annotated genes (Methods). At an FDR of 0.05, we identified 5,865 genes (54.8% of all measured genes) that were differentially expressed between the species (Methods). Among these genes, 53.3% exhibited higher expression in Sp. bombifrons and 46.7% showed higher expression in Sp. multiplicata. On average, differentially expressed genes had a 3.8-fold difference in transcript levels between the two species, although differences as small as 1.2-fold and as large as 117.7-fold were detected. Many genes showed sizable expression differences between the species; for example, 10% of differentially expressed genes (586 of 5,865) exhibited an expression difference of at least 6.5-fold. We also used 3’ RNA-seq to measure the expression of the same 10,695 genes in 14 F1 hybrid tadpoles produced by mating Sp. bombifrons and Sp.
multiplicata adults obtained from multiple geographic locations (Methods). For 93.8% of all genes differentially expressed between the species, transcript levels in hybrid tadpoles were higher than in tadpoles of the lower-expressing parental species. When we focused on the 586 genes showing greater than 6.5-fold expression differences between the species, this proportion was even higher: 585 of 586 (99.8%) had transcript levels in hybrid tadpoles above those of the lower-expressing parental species. Indeed, hybrids exhibited expression levels close to the average of their parents (Fig. 2.5).

Figure 2.5. Gene expression analysis in Spea hybrids. Genes differentially expressed between Sp. bombifrons and Sp. multiplicata are shown in these pure species and their hybrids. Genes expressed at a higher level in Sp. bombifrons are shown in a, and genes expressed at a higher level in Sp. multiplicata are shown in b. Each point represents the average expression level of a single gene across all samples in a given class.

2.4 Discussion
The Sp. multiplicata genome advances research not only on New World spadefoot toads, but also on anurans more generally. Anurans are noted for their variation in genome size, which makes them powerful models for evaluating how and why genome size evolves (14, 70). Recent findings indicate that the continuous rate of genome size evolution in anurans is higher on average than in other amphibian clades, and that life history, specifically larval development time, is positively correlated with genome size (14). The small genome and rapid development of Sp. multiplicata exemplify this relationship. The genomic factors that contribute to variation in genome size remain an area of active inquiry (71-73). Our study reveals that spadefoots have gene content similar to that of other sequenced anurans, despite their smaller genome.
Thus, genome size reduction in Spea appears to derive from diminished repetitive and intronic DNA, which is consistent with the prevailing hypothesis that genome size has changed gradually rather than abruptly throughout much of amphibian evolutionary history (14, 74). As more amphibian genomes become available, greater insight will be gained into the evolutionary and genomic factors that contribute to genome size evolution.

Because of their diverse adaptations (52, 53), anurans are also classic models in ecology, evolution, and development, and the Sp. multiplicata genome will help provide additional insights in these fields. For example, we found that spadefoots possess additional copies of genes involved in development and fertilization, most notably the key developmental regulator nodal. Studies in X. tropicalis have shown that nodal paralogs exhibit different spatiotemporal expression patterns during development and play roles in the formation of distinct tissues (60, 75). The numerous copies of nodal in Sp. multiplicata might contribute to this species’ remarkable phenotypic plasticity by allowing different copies to take on specialized functions during development and/or by facilitating rapid bursts of transcription following abrupt environmental changes (e.g., in diet or pond volume), which would allow alternative traits to develop quickly. Although entirely speculative at this stage, such gene proliferation could help resolve a paradox of phenotypic plasticity: namely, how key developmental pathways become environmentally sensitive without disrupting overall organismal form and function (18). The Sp. multiplicata genome, along with other anuran genomes, will enable future work addressing this and related issues in evolution and development. As another example of how the Sp.
multiplicata genome facilitates new lines of ecological and evolutionary genomic research, we report a large-scale scan for genes showing evidence of selection in Spea and/or Scaphiopus. Both genera include desert-adapted and extremely rapidly developing species, but Scaphiopus cannot produce carnivore-morph tadpoles (Fig. 2.1e). We identified 26 genes (22 in Spea and 4 in Scaphiopus) exhibiting signatures of potential positive selection. Interestingly, genes under positive selection in one genus were always under purifying selection in the other. This suggests that the two genera, while ecologically similar in many ways, are nevertheless experiencing, and responding to, distinct selection pressures. A key aspect of spadefoot biology that could be affected by these putatively selected genes in Spea is the production of carnivores, which frequently feed on other tadpoles. Consuming other tadpoles increases the risk of pathogen transmission (76) and might thereby drive the observed positive selection on immune function genes in Spea. The Sp. multiplicata genome will enable explicit tests of these hypotheses and allow deeper investigation of the mechanisms underlying both adaptive evolution and the diversification of phenotypes among species that share the same environments.

As a further example of how the Sp. multiplicata genome enables genomic research in an ecological and evolutionary model system, we analyzed gene expression differences between Sp. multiplicata and Sp. bombifrons. We found that more than half of the genes we measured exhibit differential expression between these species, which are the most divergent in their genus (77, 78). Roughly equal numbers of genes showed higher expression in Sp. multiplicata and in Sp. bombifrons, which is consistent with the notion that species differences accumulate via genetic drift. Yet, despite these genome-wide expression differences, Sp. multiplicata and Sp.
bombifrons interbreed and produce viable hybrid offspring (25, 50, 79). Our finding that F1 hybrid gene expression was intermediate for genes that differed between Sp. multiplicata and Sp. bombifrons reveals at least one means by which hybridization could be beneficial. If lower expression of these genes is deleterious, hybridization would generate individuals that are heterozygous genome-wide and consequently experience large-scale complementation of the deleterious alleles accumulated within each species (80). For most of the genes in our study, any such complementation was partial and consistent with parental alleles behaving additively, as hybrids exhibited expression levels close to mid-parent values (Fig. 2.5). Nevertheless, this complementation could explain, in part, the observed patterns of adaptive hybridization in Spea. Given that hybridization’s role in the origin and distribution of species remains a topic of keen interest (81, 82), the Sp. multiplicata genome provides a new resource for evaluating how genomic factors and ecological context interact to determine how and when hybridization is adaptive.

In summary, spadefoots possess many striking ecological, evolutionary, and developmental features that can now be studied at the genomic level. Moving forward, the genome described in this chapter should provide a critical foundation for analyzing this substantial diversity within and between spadefoot species, as well as for understanding more deeply the mechanisms that produce the features distinguishing spadefoots from other anurans.

2.5 Materials and Methods
2.5.1 Genome assembly
We extracted high-molecular-weight DNA from the liver tissue of a single adult by homogenizing the tissue and purifying DNA with Qiagen 500G Genomic-tip columns. Additional tissue from this individual has been preserved and stored at the North Carolina Museum of Natural Sciences under the identifier NCSM8430.
Three types of whole-genome sequencing data were generated: Illumina, PacBio, and Oxford Nanopore. Illumina sequencing libraries were constructed with the Illumina Nextera kit. Five replicate Illumina libraries were prepared, multiplexed using barcoded adapters, and sequenced at the USC Molecular Genomics Core on an Illumina NextSeq using the 150 bp paired-end kit. PacBio libraries were generated and sequenced on a PacBio Sequel by the UC Irvine Genomics High-Throughput Facility. We also generated Oxford Nanopore long-read libraries, which we sequenced on two 2D DNA chips using an Oxford Nanopore MinION Mk1b. More information about the sequencing data is provided in Supplementary Table 2.1.

Assembly was performed using all long- and short-read sequencing data. After trying multiple assemblers, we found that MaSuRCA v3.2.1 (83) produced the most contiguous assembly. Specifically, we used a k-mer size of 51, a cgwErrorRate of 0.15, and a Jellyfish hash size of 6 × 10^10. Following assembly, duplicate contigs were identified using the LAST aligner (84) with default parameters: all contigs were mapped against all other contigs, and if a contig was entirely contained within another contig, with the overlapping region showing ≥ 0.9 similarity, we classified the smaller contig as a duplicate. Such duplicates were removed from the assembly and excluded from all subsequent analyses. Assembly characteristics are described in Supplementary Table 2.2.

We next ran RepeatModeler v1.0.4 (85) on the assembly using default settings. Repetitive elements identified in the Sp. multiplicata genome were combined with RepDB volume 16, issue 12 (86). RepeatMasker v4.0.7 (85) with -e ncbi was then used to mask repetitive DNA. Characteristics of the repetitive DNA identified in the assembly are described in Supplementary Table 2.3. Because anuran genomes have a slow rate of structural evolution (7), we attempted to scaffold the repeat-masked Sp.
multiplicata contigs using the X. tropicalis genome as a reference. This was done with Chromosomer v0.1.4 (87) using a gap length of 100 bases. To assess the quality of the scaffolded assembly, we applied BUSCO v2.0 (57) to both the repeat-masked contigs and the scaffolds, assessing completeness against the metazoan gene set. The scaffolds showed substantially improved contiguity relative to the contigs (Supplementary Table 2.2), indicating that scaffolding improved the assembly. Lastly, we unmasked repeats in the scaffolds. This scaffolded assembly without repeat masking is what we refer to throughout the chapter as the Sp. multiplicata ‘assembly.’

2.5.2 Annotation of protein-coding genes
We performed ab initio protein-coding gene prediction using Augustus v3.2.3 (58, 88). To generate a training set for Augustus, we first empirically annotated a subset of genes in Sp. multiplicata using previously generated full-length cDNA RNA-seq data from tadpoles (89). RNA-seq reads were mapped to the assembly using Tophat2 v2.1.1 (90), and we then extracted the nucleotide sequences of the regions of the assembly covered by the RNA-seq data. Best matches for these sequences in X. tropicalis were obtained by comparing six-frame translations of the Sp. multiplicata sequences against X. tropicalis v9.1 peptides obtained from Xenbase (91) using Blastx v2.2.30 (92). Putative translation start sites were defined as the ATG in the Sp. multiplicata sequence closest to the beginning of the alignment. This resulted in a set of 1,478 empirically defined gene models in which the Sp. multiplicata peptide spanned ≥ 80% of the X. tropicalis match with ≥ 30% sequence identity and a putative translation start site was found. To train Augustus, we randomly split this empirically defined gene set into a training set (1,000 genes) and a verification set (478 genes).
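The filtering and random split of the empirically defined gene models described above can be sketched in Python as follows. The record fields, function names, and fixed random seed are illustrative assumptions, and the ATG helper ignores reading frame and ties, which the text does not address; this is not the pipeline that produced the 1,478 models.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class EmpiricalModel:
    """Hypothetical record for one RNA-seq-supported Sp. multiplicata gene model."""
    name: str
    match_span: float  # fraction of the X. tropicalis peptide spanned (0-1)
    identity: float    # percent identity to the best Blastx match
    has_start: bool    # True if a putative ATG start site was found


def nearest_atg(seq: str, aln_start: int) -> Optional[int]:
    """0-based index of the ATG closest to the start of the alignment, or None.

    Assumption: reading frame is ignored and ties go to the first hit.
    """
    hits = [i for i in range(len(seq) - 2) if seq[i:i + 3] == "ATG"]
    if not hits:
        return None
    return min(hits, key=lambda i: abs(i - aln_start))


def passes(m: EmpiricalModel) -> bool:
    """Filters from the text: >= 80% span, >= 30% identity, start site found."""
    return m.match_span >= 0.80 and m.identity >= 30.0 and m.has_start


def split_models(models: List[EmpiricalModel], n_train: int = 1000,
                 seed: int = 1) -> Tuple[List[EmpiricalModel], List[EmpiricalModel]]:
    """Randomly split the qualifying models into training and verification sets."""
    kept = [m for m in models if passes(m)]
    rng = random.Random(seed)  # seeded only so that this sketch is reproducible
    rng.shuffle(kept)
    return kept[:n_train], kept[n_train:]


# Toy input: 1,478 qualifying models plus three that each fail one filter.
models = [EmpiricalModel(f"m{i}", 0.9, 45.0, True) for i in range(1478)]
models += [
    EmpiricalModel("short_span", 0.5, 45.0, True),
    EmpiricalModel("divergent", 0.9, 20.0, True),
    EmpiricalModel("no_start", 0.9, 45.0, False),
]
train, verify = split_models(models)
print(len(train), len(verify))  # 1000 478
```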
We then ran the Augustus etraining pipeline to estimate parameters describing features of protein-coding genes in Sp. multiplicata, and further optimized these parameter estimates using the optimize_augustus.pl script. After estimating species-specific parameters, we ran Augustus to annotate protein-coding genes in the assembly. This yielded a total of 81,079 genes from our contigs and 42,671 genes from our scaffolds, with average predicted protein lengths of 316.9 and 365.8 amino acids, respectively. We took this as evidence that scaffolding enabled better gene prediction than the contigs alone. We compared the protein sequences predicted by Augustus against the X. tropicalis (91) database using global_search in Usearch v10.0.240 (68). The high-confidence set of 19,639 genes described in this chapter was obtained by taking all annotated protein-coding genes that were longer than 30 amino acids and had ≥ 30% identity, ≥ 30% target coverage, and ≥ 75% query coverage relative to their best X. tropicalis match.

2.5.3 Gene ontology analysis
We obtained full gene sets for X. tropicalis, R. marina, N. parkeri, and R. catesbeiana from Xenbase (91), NCBI, or GigaDB (Supplementary Table 2.4). We assigned each gene in each species a Uniprot ID (69) by determining its best match in a SWISS-prot database, retrieved from Uniprot, that included anurans, human, and zebrafish. To generate these matches, we performed global pairwise alignment with vsearch (93), returning only the best-matching alignment with ≥ 30% identity. We then assigned Biological Process gene ontology (GO) terms to each gene based on the Uniprot ID of its best match. To determine whether Sp. multiplicata showed enrichment or depletion of any GO term relative to the other species, we performed chi-square tests for each term (94). Specifically, we counted the number of genes with and without a given term in each species and compared these counts using a series of pairwise chi-square tests.
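Each pairwise comparison above reduces to a 2×2 contingency table (two species × genes with or without the term). A minimal Python sketch of that test follows, assuming a Pearson chi-square with 1 degree of freedom. The optional continuity correction mirrors the default behavior of R's chisq.test for 2×2 tables, but whether it was applied here is not stated; the function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def chi2_2x2(a: int, b: int, c: int, d: int, yates: bool = True) -> Tuple[float, float]:
    """Pearson chi-square test on the 2x2 table [[a, b], [c, d]] with 1 df.

    Rows: Sp. multiplicata, comparison species.
    Columns: genes annotated with the GO term, genes without it.
    Assumes all row and column totals are positive.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            diff = abs(observed[i][j] - expected)
            if yates:
                diff = max(diff - 0.5, 0.0)  # continuity correction
            stat += diff * diff / expected
    # Upper tail of a chi-square distribution with 1 df: erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts for one GO term: Sp. multiplicata vs. one other anuran.
stat, p = chi2_2x2(120, 19639 - 120, 60, 21067 - 60)
print(f"chi-square = {stat:.2f}, p = {p:.2e}")
```

Each species pair yields one such p-value per term, and those p-values are what the subsequent multiple-testing correction operates on.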
We then corrected for multiple testing using the FDR (95), as implemented in the qvalue function in R (96). A GO term was considered significant only if all of its pairwise chi-square tests were significant at an FDR below 0.05 and Sp. multiplicata had more (or fewer) genes with that term than each of the other species. If Sp. multiplicata was not significantly enriched or depleted for a given GO term relative to every other species, we did not consider the GO term significant in our overall analysis.

2.5.4 Copy number analysis
We compared the peptide sequences of our gene models, as well as the peptide models of the other published anuran assemblies and annotations, against the vertebrate SWISS-prot (97) database. To decrease the likelihood of false positives, we retained only matches with ≥ 30% identity and ≥ 90% target coverage. We then counted the number of matches in each species for each of the 18,341 genes in the database and compared these counts across all five anurans, looking for cases of enrichment specific to Spea (Supplementary Table 2.5). For the purposes of this chapter, we defined a gene as amplified specifically in Sp. multiplicata if Sp. multiplicata had at least five copies and at least twice as many copies as each of the other sequenced anurans. We retrieved peptide sequences of nodal, hyas, and zp3 for human and zebrafish from Uniprot (69). We aligned the peptide sequences of all species using Muscle v3.8.31 (98) and removed all positions at which any species had a gap. We used the R package phangorn (99) to estimate maximum likelihood trees under the WAG amino acid substitution model, using optim.pml with model = "WAG" and stochastic rearrangement.
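The copy-number criterion above can be written as a small predicate. This Python sketch, with a hypothetical function name, reads "at least twice the number of copies compared to all the other sequenced anurans" as at least twice the copy number of each other species, and checks it against the counts in Supplementary Table 2.5; all three genes in that table satisfy it.

```python
from typing import Dict

FOCAL = "Sp. multiplicata"


def amplified_in_focal(counts: Dict[str, int], focal: str = FOCAL,
                       fold: int = 2, min_copies: int = 5) -> bool:
    """True if the focal species has >= min_copies copies and at least
    `fold` times as many copies as every other species."""
    focal_n = counts[focal]
    others = [n for species, n in counts.items() if species != focal]
    return focal_n >= min_copies and all(focal_n >= fold * n for n in others)


# Copy numbers from Supplementary Table 2.5.
table_2_5 = {
    "HYAS3": {"N. parkeri": 2, "R. catesbeiana": 1, "R. marina": 2,
              "Sp. multiplicata": 7, "X. tropicalis": 1},
    "NOD2A": {"N. parkeri": 2, "R. catesbeiana": 1, "R. marina": 2,
              "Sp. multiplicata": 12, "X. tropicalis": 5},
    "ZP3": {"N. parkeri": 3, "R. catesbeiana": 4, "R. marina": 3,
            "Sp. multiplicata": 9, "X. tropicalis": 2},
}

for gene, counts in table_2_5.items():
    print(gene, amplified_in_focal(counts))
```

ZP3 is the tightest case: nine copies against a maximum of four elsewhere, just above the two-fold cutoff.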
To further investigate the nodal expansion, we included matches to all nodal-related genes in the SWISS-prot database and reduced our coverage requirement to ≥ 50% for all five anuran species (Supplementary Table 2.6).

We used short-read sequencing data to verify gene amplifications detected in the assembly. We mapped short reads from Sp. multiplicata against the gene alignments generated for our PAML analyses using bwa v0.7.12 (100) and extracted per-base coverage using the bedtools (101) genomecov module. We then divided the median coverage of each enriched gene model by the median coverage across all gene models to obtain a fold-coverage measurement, and estimated the total number of gene copies in each family by summing fold coverage across its gene models.

2.5.5 dN/dS analysis
We sampled a single adult each of Sp. bombifrons (NCSM84228), Sc. holbrookii (NCSM842321), and Sc. couchii (NCSM84229), which have been preserved and stored at the North Carolina Museum of Natural Sciences. We used Qiagen 500G Genomic-tip columns to extract DNA from liver tissue. For each sample, we constructed between three and five replicate Illumina Nextera libraries from the same DNA but with separate tagmentation and PCR steps, following standard protocols. Indexed libraries were then sequenced on an Illumina NextSeq using the 150 bp paired-end kit at the USC Molecular Genomics Core.

We performed short-read assembly on the data from Sp. bombifrons, Sc. couchii, and Sc. holbrookii. We trimmed low-quality bases from the short reads using Trimmomatic (103), used the trimmed reads as input for the SOAPdenovo2 v2.04 assembler (102), and ran the assembler across a range of k-mer sizes (31, 41, 51, 61, 71, 81, 91, 101) for each species. A single contig set per species was selected for further use based on assembly quality metrics, with an emphasis on a total assembly size of ~1 Gb (Supplementary Table 2.10).
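Contiguity statistics such as N50 (reported alongside N25 and N75 in Supplementary Table 2.2) are one kind of assembly quality metric that can guide the choice among the k-mer runs above. The Python sketch below computes Nx and then applies a hypothetical selection rule (keep assemblies within 10% of a target size, then take the highest N50); the actual criteria used in this study are only summarized in the text and Supplementary Table 2.10, and the function names are illustrative.

```python
from typing import Dict, List


def nx(lengths: List[int], fraction: float = 0.5) -> int:
    """Nx contiguity statistic (N50 by default): the length L such that
    contigs of length >= L contain at least `fraction` of the assembled bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= fraction * total:
            return length
    return 0  # empty assembly


def pick_assembly(candidates: Dict[int, List[int]], target_bp: int = 1_000_000_000,
                  tolerance: float = 0.10) -> int:
    """Hypothetical rule: among assemblies whose total size is within `tolerance`
    of `target_bp`, return the k-mer size with the highest N50
    (falling back to all candidates if none is close to the target)."""
    near = {k: lens for k, lens in candidates.items()
            if abs(sum(lens) - target_bp) <= tolerance * target_bp}
    pool = near or candidates
    return max(pool, key=lambda k: nx(pool[k]))


# Toy contig lengths for three k-mer sizes, with a 1,000 bp "target" size.
toy = {
    31: [300, 200, 100],       # 600 bp total: too small
    51: [400, 300, 200, 100],  # 1,000 bp total, N50 = 300
    71: [500, 250, 150, 50],   # 950 bp total, N50 = 500
}
print(nx([100, 80, 60, 40, 20]))           # 80
print(pick_assembly(toy, target_bp=1000))  # 71
```

Filtering on total size before ranking by N50 keeps a fragmented but compact run from winning on contiguity alone.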
We aligned the contigs of these assemblies to the nucleotide sequences of our predicted gene models using lastal (84) with default settings. For each species, the contig with the best match score for each exon was used to generate an alignment for each gene; bases not covered by the top-matching contig were encoded as N. We were able to generate alignments with partial data from each species for 25,382 genes. We used Phylogenetic Analysis by Maximum Likelihood (PAML) v4.9 (104) to estimate dN/dS (ω) for all gene alignments, using the cleandata = 1 option to remove all codon sites at which any species had one or more Ns. We fit two models to each alignment. In the first model, a single ω was estimated across all branches. In the second model, two ω parameters were estimated, one for Spea (Sp. multiplicata, Sp. bombifrons) and one for Scaphiopus (Sc. couchii, Sc. holbrookii). We removed genes with any ω estimate of 999 or with fewer than 50 sites, and then removed all genes with < 80% of the gene model covered. This left a total of 1,967 genes. The significance of the two-ω model relative to the one-ω model was determined using likelihood ratio tests, with FDRs estimated as above (96). Typically, ω > 1 indicates positive selection, ω < 1 indicates purifying selection, and ω = 1 indicates neutral evolution. Here, however, hypothesis tests compared ω between Spea and Scaphiopus, rather than comparing ω in a given genus against a model in which ω = 1 (104). To account for this, we used a more stringent threshold for positive selection (ω ≥ 2) and, likewise, a more stringent threshold for calling genes as experiencing purifying selection (ω ≤ 0.5).

We performed GO enrichment tests on genes operationally defined as experiencing positive selection in Spea.
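The likelihood ratio test comparing the one-ω and two-ω models can be sketched in a few lines of Python. The sketch assumes that the two-ω model has one more free parameter than the one-ω model, so that twice the log-likelihood difference is compared with a chi-square distribution with 1 degree of freedom; the p-values in Supplementary Table 2.7 are consistent with this. The example uses the log-likelihoods printed for g10134 in that table (its ‘-lnL’ columns hold negative values, i.e., lnL), and the function name is hypothetical.

```python
import math
from typing import Tuple


def lrt_two_vs_one_omega(lnl_one: float, lnl_two: float) -> Tuple[float, float]:
    """Likelihood ratio test of the two-omega model against the one-omega model.

    The statistic 2 * (lnL_two - lnL_one) is compared with a chi-square
    distribution with 1 degree of freedom (one extra omega parameter).
    """
    stat = max(2.0 * (lnl_two - lnl_one), 0.0)  # guard against tiny negatives
    p_value = math.erfc(math.sqrt(stat / 2.0))  # chi-square (1 df) upper tail
    return stat, p_value


# Log-likelihoods for g10134 as listed in Supplementary Table 2.7.
stat, p = lrt_two_vs_one_omega(-602.8, -590.4)
print(f"2*dlnL = {stat:.1f}, p = {p:.1e}")
```

With these rounded log-likelihoods the sketch gives p ≈ 6.4 × 10^-7, close to the 6.6E-07 listed in the table; a small difference is expected if the table was computed from unrounded values.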
To test for GO term enrichment, we compared the number of genes with or without a given GO term among genes under positive selection in Spea to the corresponding numbers among genes not under positive selection in Spea. For each GO term, a chi-square test was performed using the chisq.test function in R, and multiple testing correction was applied to the entire set of tests using qvalue (96). We then manually examined the parent-child relationships of each term, the full description of the GO term, and the terms with which a focal term frequently co-occurs using the QuickGO online resource (105) to determine potential biological relevance (‘Functional grouping’ in Supplementary Table 2.9). Note that, unlike the other analyses in this chapter, terms reported as significant in this analysis were identified at an FDR threshold of 0.065.

2.5.6 Analysis of gene expression in pure species and their hybrids
We sampled at least six tadpoles each of pure Sp. multiplicata, pure Sp. bombifrons, and both types of interspecies hybrid (i.e., female Sp. multiplicata × male Sp. bombifrons and female Sp. bombifrons × male Sp. multiplicata). We extracted RNA from 10-day-old tadpoles using a combination of TRIzol Reagent and the Ambion PureLink RNA Mini Kit according to Levis et al. (106) and submitted the samples to Cornell Capillary DNA Analysis for preparation and sequencing of 3’ RNA-seq libraries. We trimmed poly-A tails and adapter contamination from the resulting short reads using Trimmomatic (103), and aligned individual libraries, as well as pooled libraries, to our scaffolds using the STAR aligner (107). Because 3’ RNA-seq generates reads only from the 3’ ends of transcripts, the data form discrete coverage peaks (108). To identify transcription peaks, we selected the base with maximum coverage within each region of continuous coverage above 50. We then extracted coverage at each of these peaks from each individual sample.
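The peak-calling rule above can be sketched in Python as follows. It assumes that peaks are called on pooled coverage for one contig, that "above 50" means strictly greater than 50, and that ties within a run go to the first position; the function names are hypothetical, and per-base coverage would in practice come from the STAR alignments.

```python
from typing import List, Optional


def call_peaks(coverage: List[int], min_cov: int = 50) -> List[int]:
    """Return one peak per run of consecutive bases with coverage > min_cov.

    The peak is the position of maximum coverage within the run
    (the first such position on ties).
    """
    peaks: List[int] = []
    best: Optional[int] = None  # index of the highest base in the current run
    for i, c in enumerate(coverage):
        if c > min_cov:
            if best is None or c > coverage[best]:
                best = i
        elif best is not None:  # the run just ended
            peaks.append(best)
            best = None
    if best is not None:  # a run that extends to the end of the contig
        peaks.append(best)
    return peaks


def coverage_at_peaks(sample_cov: List[int], peaks: List[int]) -> List[int]:
    """Coverage of one individual sample at each peak called on pooled data."""
    return [sample_cov[p] for p in peaks]


pooled = [0, 60, 80, 70, 10, 55, 51, 0, 90]
peaks = call_peaks(pooled)
print(peaks)  # [2, 5, 8]
print(coverage_at_peaks([0, 6, 9, 7, 1, 5, 4, 0, 12], peaks))  # [9, 5, 12]
```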
We normalized coverage by library size (millions of reads) and log2-transformed the resulting values before performing digital normalization across all data. We calculated mean fold coverage at all peaks for pure-species individuals and hybrids. Gene-specific ANOVA models were then used to identify genes showing significant differential expression between the species. For each gene, we fit the model expression = species + error, where expression is the vector of log2 expression measurements for the samples, species is a vector indicating the species from which each measurement was taken, and error is the vector of residuals. These models were fit only to data from the pure species, using the aov function in R. P-values for the models were corrected for multiple testing using qvalue (96), with a significance threshold of FDR ≤ 0.05. After identifying differentially expressed genes in the pure species, we examined their log2 expression levels in hybrids. The mean expression level of a given gene within a particular sample class was computed as the arithmetic mean of all measurements for that gene in that class, using the mean function in R.

2.6 Supplementary Material
Supplementary Figure 2.1. Percent identity of peptides to their best match in the vertebrate SWISS-prot database. All matches with ≥ 30% identity are shown.

Supplementary Note 1. Conversion of C-value to gigabases. Genome sizes were converted from picograms to base pairs using the formula provided by Dolezel et al.

Supplementary Note 2. Rationale for excluding the Rana catesbeiana genome. Annotated peptides in Rana (Lithobates) catesbeiana are slightly more than half the length of those in the other species (Supplementary Table 2.4), suggesting a highly fragmented assembly. We posit that this is a technical artifact, as only 40.7% of the genes in the Core Eukaryotic Genes Mapping Approach (CEGMA) set were reported as complete in this genome (8).
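The conversion in Supplementary Note 1 is a single multiplication. The note does not restate the factor, so this Python sketch assumes the widely used value attributed to Dolezel et al., 1 pg of DNA ≈ 0.978 × 10^9 bp; the function names are hypothetical.

```python
BP_PER_PG = 0.978e9  # assumed factor: 1 pg of DNA ~ 0.978 x 10^9 bp (Dolezel et al.)


def pg_to_gb(c_value_pg: float) -> float:
    """Convert a C-value in picograms to genome size in gigabases."""
    return c_value_pg * BP_PER_PG / 1e9


def gb_to_pg(size_gb: float) -> float:
    """Inverse conversion: gigabases to picograms."""
    return size_gb * 1e9 / BP_PER_PG


print(round(pg_to_gb(1.0), 3))  # 0.978
print(round(gb_to_pg(1.1), 2))  # 1.12
```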
Supplementary Table 2.1 Read data collected for each species. Short-read data were trimmed before any assembly process. Coverage was estimated based on our 1.07 Gb assembly size.
Species | Technology | Reads | Bases | Est. fold coverage
Sp. multiplicata | Illumina | 447,040,259 | 134,112,077,700 | 125.2
Sp. multiplicata | PacBio | 3,582,769 | 20,842,516,014 | 19.5
Sp. multiplicata | Oxford Nanopore | 430,065 | 952,110,011 | 0.9
Sp. bombifrons | Illumina | 569,784,662 | 170,935,398,600 | 159.7
Sc. holbrookii | Illumina | 254,070,350 | 41,990,135,448 | 39.4
Sc. couchii | Illumina | 90,655,530 | 27,196,659,000 | 25.4

Supplementary Table 2.2 Assembly characteristics at major steps. Long reads greatly improved the contiguity of the genome. Duplicate contigs potentially produced by heterozygosity were removed (Methods). Given the observed collinearity between the X. tropicalis and Sp. multiplicata genomes, we scaffolded these contigs against the X. tropicalis genome assembly using Chromosomer (87).
Assembly | K-mer size | Total size | Number of sequences | Longest sequence | N25 | N50 | N75
Hybrid | 51 | 1,169,449,223 | 111,151 | 401,788 | 63,738 | 26,214 | 9,418
Duplicates removed | 51 | 1,091,640,347 | 84,984 | 401,788 | 68,199 | 29,771 | 11,387
Scaffolds ungapped | 51 | 1,073,483,482 | 49,736 | 60,197,306 | 36,398,839 | 70,967 | 19,311

Supplementary Table 2.3 Repetitive DNA in Sp. multiplicata, as output by RepeatModeler. We also quantified overrepresented k-mers in our short- and long-read sets and detected two elements showing evidence of enrichment: GGGGGTTATATTACTGTATACAGCGCT and CTGTCTCTTATACACATCTCCGAGCCCACG. These appear to be novel repeats, as we were unable to find a match for either sequence in the NCBI database. We broadly investigated the locations of repetitive elements across contigs as well as scaffolds, and found that repetitive elements were evenly distributed across the scaffolds, with no obvious clustering.
We found evidence that repetitive elements may have contributed to fragmentation of the assembly, as simple AT repeats were overrepresented near the ends of contigs. We also found enrichment of the second repetitive element identified by k-mer enrichment at the ends of contigs.
Type | Subset | Number of elements | Length (bp) | Percentage of sequence
SINEs |  | 1,474 | 81,832 | 0.01
SINEs | ALUs | 4 | 280 | 0
SINEs | MIRs | 77 | 2,833 | 0
LINEs |  | 218,428 | 58,280,930 | 4.98
LINEs | LINE1 | 26,126 | 13,550,592 | 1.16
LINEs | LINE2 | 51,747 | 11,458,129 | 0.98
LINEs | L3/CR1 | 117,401 | 26,689,898 | 2.28
LTR elements |  | 58,089 | 29,554,407 | 2.53
LTR elements | ERVL | 555 | 96,592 | 0.01
LTR elements | ERVL-MaLRs | 25 | 1,430 | 0
LTR elements | ERV_classI | 5,256 | 2,224,606 | 0.19
LTR elements | ERV_classII | 1,125 | 60,728 | 0.01
DNA elements |  | 220,710 | 45,278,690 | 3.87
DNA elements | hAT-Charlie | 7,350 | 606,067 | 0.05
DNA elements | TcMar-Tigger | 564 | 77,182 | 0.01
Unclassified |  | 1,196,208 | 215,660,787 | 18.44
Interspersed repeats |  |  | 348,856,646 | 29.83
Small RNA |  | 6,395 | 534,323 | 0.05
Satellites |  | 4,496 | 576,825 | 0.05
Simple repeats |  | 364,651 | 28,449,523 | 0.05
Low complexity |  | 36,062 | 1,923,725 | 0.16

Supplementary Table 2.4 Comparison of diploid anuran genomes.
Species | Assembly size (Gbp) | Repetitive content | Genes | Transcript length (bp) | Gene length (bp) | Reference | Database
N. parkeri | 2.0 | 48% | 23,408 | 1,382 | 42,081 | Sun, Y.-B. et al. | NCBI
R. catesbeiana | 5.8 | 62% | 22,204 | 720 | 7,144 | Hammond, S.A. et al. | NCBI
R. marina | 2.5 | 64% | 25,846 | 1,204 | 18,800 | Edwards, R.J. et al. | GigaDB
Sp. multiplicata | 1.1 | 32% | 19,639 | 1,370 | 9,399 | This study |
X. tropicalis | 1.4 | 35% | 21,067 | 1,300 | 16,500 | Hellsten, U. et al. | Xenbase

Supplementary Table 2.5 Description of three genes with increased copy number in Sp. multiplicata relative to the other published anuran genomes.
Uniprot match | N. parkeri | R. catesbeiana | R. marina | Sp. multiplicata | X. tropicalis | Function
HYAS3 | 2 | 1 | 2 | 7 | 1 | Hyaluronic acid production, critical for cell surface
NOD2A | 2 | 1 | 2 | 12 | 5 | TGF-β signaling factor controlling developmental patterning
ZP3 | 3 | 4 | 3 | 9 | 2 | Zona pellucida glycoprotein, involved in egg-sperm recognition

Supplementary Table 2.6 Expansion of nodal in anurans.
Including the other members of the nodal gene family and relaxing the target coverage requirement to ≥50% reveals evidence of 24 copies of nodal in Spea multiplicata. Of these, 22 are most closely related to xnr5 and xnr6, and the remaining 2 are most closely related to xnr2.

| Species | Gene name | SWISSprot match | Gene length | Match length | Match coverage (%) | Percent identity |
|---|---|---|---|---|---|---|
| N. parkeri | nodal homolog | NOD4A | 405 | 402 | 100 | 51.4 |
| N. parkeri | nodal homolog | NODAL | 394 | 406 | 97 | 56.5 |
| N. parkeri | nodal 2-A-like | NOD2A | 386 | 405 | 95 | 51.6 |
| N. parkeri | nodal 2-A-like | NOD2A | 458 | 405 | 99 | 48.1 |
| N. parkeri | nodal 2-A-like | NOD2B | 382 | 384 | 93 | 46 |
| N. parkeri | nodal 2-A-like | NODAL | 378 | 406 | 92 | 36.5 |
| R. catesbeiana | PIO13389.1 | NOD2A | 223 | 405 | 55 | 49.6 |
| R. catesbeiana | PIO32278.1 | NOD2A | 516 | 405 | 94 | 53.9 |
| R. catesbeiana | PIO32279.1 | NODAL | 331 | 406 | 81 | 57.3 |
| R. marina | RMA_00017829 | NODAL | 704 | 403 | 96 | 52.2 |
| R. marina | RMA_00020594 | NOD2A | 471 | 405 | 76 | 52.2 |
| R. marina | RMA_00020595 | NOD2A | 414 | 405 | 94 | 52 |
| R. marina | RMA_00027748 | NOD2A | 1502 | 405 | 94 | 48.5 |
| R. marina | RMA_00029607 | NOD2A | 319 | 405 | 76 | 50.1 |
| R. marina | RMA_00040870 | NOD4A | 404 | 402 | 99 | 54.1 |
| Sp. multiplicata | g11187 | NOD2A | 365 | 405 | 90 | 46.8 |
| Sp. multiplicata | g11188 | NOD2A | 365 | 405 | 90 | 47 |
| Sp. multiplicata | g11568 | NOD2A | 365 | 405 | 90 | 47.3 |
| Sp. multiplicata | g11569 | NOD2A | 320 | 405 | 79 | 49.3 |
| Sp. multiplicata | g11677 | NOD2A | 365 | 405 | 90 | 46.8 |
| Sp. multiplicata | g16257 | NOD2A | 208 | 405 | 51 | 47.5 |
| Sp. multiplicata | g17861 | NOD2A | 234 | 405 | 58 | 41.8 |
| Sp. multiplicata | g20382 | NOD2A | 365 | 405 | 90 | 46.8 |
| Sp. multiplicata | g20383 | NOD2A | 365 | 405 | 90 | 47.3 |
| Sp. multiplicata | g22585 | NOD2A | 329 | 405 | 81 | 45.4 |
| Sp. multiplicata | g25137 | NODAL | 451 | 406 | 97 | 50.8 |
| Sp. multiplicata | g25860 | NOD2A | 365 | 405 | 90 | 46.8 |
| Sp. multiplicata | g2598 | NOD2A | 365 | 405 | 90 | 47 |
| Sp. multiplicata | g3034 | NOD2A | 284 | 405 | 70 | 47 |
| Sp. multiplicata | g30506 | NOD2A | 315 | 405 | 78 | 45.1 |
| Sp. multiplicata | g30507 | NOD2A | 365 | 405 | 90 | 46.6 |
| Sp. multiplicata | g34467 | NOD2A | 365 | 405 | 90 | 47.3 |
| Sp. multiplicata | g34468 | NOD2A | 365 | 405 | 90 | 45.6 |
| Sp. multiplicata | g34476 | NODAL | 367 | 403 | 91 | 53.4 |
| Sp. multiplicata | g3511 | NOD2A | 208 | 405 | 51 | 47.5 |
| Sp. multiplicata | g3513 | NOD2A | 316 | 405 | 78 | 43.9 |
| Sp. multiplicata | g38079 | NOD2A | 362 | 405 | 89 | 46.1 |
| Sp. multiplicata | g5200 | NOD2A | 365 | 405 | 90 | 45.7 |
| Sp. multiplicata | g9681 | NOD2A | 329 | 405 | 81 | 45.4 |
| X. tropicalis | LOC100488935 | NOD3A | 397 | 401 | 99 | 95.5 |
| X. tropicalis | LOC100491713 | NOD3C | 401 | 401 | 100 | 95 |
| X. tropicalis | LOC101732995 | NOD3A | 401 | 401 | 100 | 99 |
| X. tropicalis | LOC101734157 | NOD2A | 386 | 405 | 95 | 52.8 |
| X. tropicalis | LOC101734231 | NOD2A | 386 | 405 | 95 | 53.3 |
| X. tropicalis | nodal | NOD3B | 401 | 401 | 100 | 99.8 |
| X. tropicalis | nodal1 | NODAL | 403 | 403 | 100 | 99.5 |
| X. tropicalis | nodal2 | NOD2A | 405 | 405 | 100 | 89.9 |
| X. tropicalis | nodal5 | NOD2A | 386 | 405 | 95 | 53.6 |
| X. tropicalis | nodal6 | NOD2A | 382 | 405 | 93 | 58.6 |

Supplementary Table 2.7 Genes showing different patterns of molecular evolution in Scaphiopus and Spea. Selection pressures were considered significantly different if the two-group (i.e., two-genus) model fit significantly better than the single-ω model according to a likelihood ratio test. 'ω' is the dN/dS value of a given gene under the specified model (i.e., the single model or the genus-specific model), and '-lnL' is the negative log-likelihood of a given model.

| Gene | X. tropicalis match | Swiss-Prot match | Model 1 -lnL | Model 1 ω | Model 2 -lnL | Model 2 ω (Spea) | Model 2 ω (Scaphiopus) | p-value | q-value |
|---|---|---|---|---|---|---|---|---|---|
| g10134 | LOC100493382 | UBP13_DANRE | -602.8 | 0.1726 | -590.4 | 2.8667 | 0.0095 | 6.6E-07 | 5.5E-05 |
| g1015 | tsc22d3 | TPD52_HUMAN | -515.9 | 0.1153 | -511.6 | 0.8994 | 0.0001 | 3.3E-03 | 4.0E-02 |
| g1017 | tsc22d3 | TPD52_HUMAN | -512.8 | 0.1171 | -508.0 | 0.8043 | 0.0001 | 1.9E-03 | 2.8E-02 |
| g10677 | tox4 | TOX4_XENTR | -570.9 | 0.0872 | -564.8 | 1.1463 | 0.0001 | 5.1E-04 | 1.1E-02 |
| g1071 | snx12 | SNX12_HUMAN | -756.7 | 0.0093 | -752.3 | 0.0001 | 0.1440 | 3.0E-03 | 3.8E-02 |
| g10887 | cs | CISY_XENTR | -1455.4 | 0.0296 | -1449.9 | 0.2403 | 0.0040 | 9.9E-04 | 1.7E-02 |
| g11371 | fam167a | F167A_HUMAN | -923.4 | 0.2872 | -916.5 | 4.0825 | 0.0371 | 2.0E-04 | 5.7E-03 |
| g117 | ccdc115 | NH2L1_XENTR | -1292.4 | 0.2590 | -1282.6 | 1.6489 | 0.0623 | 9.8E-06 | 5.5E-04 |
| g11731 | LOC100498221 | RDH16_HUMAN | -1695.5 | 0.2383 | -1687.1 | 0.6609 | 0.1609 | 4.3E-05 | 1.8E-03 |
| g11732 | taok2 | TAOK2_XENLA | -1106.9 | 0.0152 | -1100.5 | 0.0263 | 0.0007 | 3.2E-04 | 7.5E-03 |
| g11875 | LOC100145107 | HBB2_XENTR | -826.0 | 0.5821 | -804.9 | 10.0000 | 0.0371 | 8.1E-11 | 2.7E-08 |
| g12141 | snai3 | SNAI2_HUMAN | -1371.5 | 0.3387 | -1366.6 | 0.0564 | 1.3937 | 1.7E-03 | 2.6E-02 |
| g12484 | b4galnt4 | B4GN4_HUMAN | -1247.4 | 0.0787 | -1240.1 | 0.6056 | 0.0060 | 1.3E-04 | 4.4E-03 |
| g12815 | polr1d.2 | RPAC2_XENLA | -547.7 | 0.0842 | -541.2 | 0.8065 | 0.0001 | 2.9E-04 | 7.3E-03 |
| g12861 | flnc | FLNA_HUMAN | -3610.0 | 0.0683 | -3603.9 | 0.2654 | 0.0386 | 4.8E-04 | 1.0E-02 |
| g12863 | fam91a1 | F91A1_HUMAN | -1229.6 | 0.1270 | -1222.8 | 0.9405 | 0.0081 | 2.3E-04 | 6.4E-03 |
| g12892 | fam131b | F131B_HUMAN | -1505.9 | 0.0598 | -1496.5 | 1.1962 | 0.0001 | 1.6E-05 | 7.9E-04 |
| g12893 | fam131b | F131B_HUMAN | -1334.4 | 0.0663 | -1320.6 | 0.1187 | 0.0599 | 1.4E-07 | 1.5E-05 |
| g13150 | LOC100492966 | O52A1_HUMAN | -1370.0 | 0.2126 | -1361.5 | 3.3201 | 0.0291 | 3.6E-05 | 1.6E-03 |
| g13767 | atxn3 | ATX3_HUMAN | -843.8 | 0.0753 | -839.7 | 0.0057 | 0.7380 | 4.0E-03 | 4.6E-02 |
| g13992 | not | NOTO_XENLA | -1099.8 | 0.1216 | -1095.3 | 0.0056 | 0.6973 | 2.7E-03 | 3.6E-02 |
| g13993 | not | NOTO_XENLA | -1095.7 | 0.1097 | -1087.2 | 0.0001 | 0.7263 | 3.5E-05 | 1.6E-03 |
| g14393 | grin2b | NMDE2_XENLA | -617.9 | 0.0855 | -613.1 | 0.3895 | 0.0624 | 2.1E-03 | 2.9E-02 |
| g14927 | plaa | PLAP_XENLA | -916.4 | 0.1002 | -911.7 | 0.0942 | 0.1686 | 2.0E-03 | 2.9E-02 |
| g15077 | gmpr2 | GMPR2_HUMAN | -1852.2 | 0.0484 | -1845.0 | 0.0020 | 0.2632 | 1.4E-04 | 4.5E-03 |
| g15087 | tbc1d9b | TBC9B_HUMAN | -1434.7 | 0.0429 | -1430.4 | 0.1970 | 0.0001 | 3.4E-03 | 4.2E-02 |
| g15248 | myo18b | MY18A_HUMAN | -435.4 | 0.4798 | -428.7 | 0.0001 | 1.6539 | 2.6E-04 | 6.8E-03 |
| g15278 | ppcdc | COAC_HUMAN | -377.0 | 0.3873 | -370.1 | 5.2004 | 0.0390 | 2.1E-04 | 5.9E-03 |
| g15420 | rnf34 | RNF34_HUMAN | -2171.4 | 0.0869 | -2167.3 | 0.0274 | 0.3982 | 4.1E-03 | 4.8E-02 |
| g15892 | synj2 | SYNJ2_HUMAN | -489.0 | 0.1620 | -483.8 | 2.8634 | 0.0485 | 1.3E-03 | 2.1E-02 |
| g16767 | zg16 | ZG16_HUMAN | -501.4 | 0.1211 | -496.5 | 2.7921 | 0.0537 | 1.7E-03 | 2.5E-02 |
| g16939 | rpusd1 | RUSD1_DANRE | -464.0 | 0.1169 | -453.5 | 1.2218 | 0.0001 | 4.9E-06 | 3.0E-04 |
| g17467 | hoxc4 | HXD4A_DANRE | -747.5 | 0.1579 | -741.9 | 1.6533 | 0.0118 | 7.6E-04 | 1.4E-02 |
| g18054 | pgap3 | PGAP3_XENLA | -1307.3 | 0.1687 | -1302.2 | 0.2369 | 0.0001 | 1.4E-03 | 2.2E-02 |
| g18311 | mafg | MAFG_HUMAN | -1201.1 | 0.0355 | -1192.8 | 0.2080 | 0.0271 | 4.8E-05 | 1.9E-03 |
| g1861 | fermt2 | FERM2_HUMAN | -1610.0 | 0.0941 | -1605.9 | 0.0001 | 0.1074 | 3.9E-03 | 4.6E-02 |
| g18637 | chmp2a | CHM2A_XENLA | -1010.4 | 0.0960 | -986.6 | 1.7430 | 0.0205 | 5.1E-12 | 3.3E-09 |
| g19099 | spata6 | SPAT6_HUMAN | -764.9 | 0.2960 | -757.5 | 1.7840 | 0.0098 | 1.3E-04 | 4.4E-03 |
| g19401 | cyb5r4 | NB5R4_XENTR | -1604.7 | 0.1527 | -1597.9 | 0.0109 | 0.9682 | 2.3E-04 | 6.4E-03 |
| g19524 | pak3 | PAK3_XENLA | -785.3 | 0.0396 | -772.2 | 1.5501 | 0.0026 | 3.2E-07 | 2.9E-05 |
| g19551 | atp1a1 | AT1A1_XENLA | -5412.5 | 0.0236 | -5407.2 | 0.0555 | 0.0176 | 1.1E-03 | 1.9E-02 |
| g19630 | mfge8 | EDIL3_HUMAN | -996.0 | 0.1363 | -989.5 | 0.6095 | 0.0076 | 3.2E-04 | 7.5E-03 |
| g19914 | ubr1 | UBR1_HUMAN | -2171.2 | 0.1929 | -2166.8 | 0.5445 | 0.1231 | 3.3E-03 | 4.0E-02 |
| g20142 | fermt2 | FERM2_HUMAN | -3241.7 | 0.0835 | -3235.6 | 0.0001 | 0.0930 | 4.7E-04 | 1.0E-02 |
| g20308 | fcn2 | FCN1A_XENLA | -1348.6 | 0.1600 | -1343.9 | 0.3778 | 0.0906 | 2.1E-03 | 2.9E-02 |
| g20505 | nrxn3 | NRX3A_HUMAN | -714.7 | 0.0183 | -701.5 | 0.3023 | 0.0043 | 2.9E-07 | 2.7E-05 |
| g21205 | nqo1 | NQO1_HUMAN | -1588.9 | 0.2178 | -1581.4 | 0.0682 | 0.9483 | 1.1E-04 | 4.0E-03 |
| g21206 | nqo1 | NQO1_HUMAN | -1713.0 | 0.2201 | -1707.0 | 1.0144 | 0.1221 | 5.2E-04 | 1.1E-02 |
| g21659 | dph5 | DPH5_HUMAN | -1497.8 | 0.0246 | -1493.4 | 0.2441 | 0.0037 | 3.0E-03 | 3.8E-02 |
| g21857 | gnat1 | GNAT1_HUMAN | -591.7 | 0.1129 | -585.8 | 1.5532 | 0.0198 | 6.0E-04 | 1.2E-02 |
| g2204 | dpf3 | REQUB_XENLA | -503.0 | 0.0522 | -498.8 | 0.0001 | 0.3885 | 4.0E-03 | 4.6E-02 |
| g2209 | ttc9 | TTC9A_HUMAN | -896.1 | 0.1457 | -891.7 | 0.4326 | 0.1279 | 3.2E-03 | 4.0E-02 |
| g22100 | pdzrn3 | PZR3B_DANRE | -1994.1 | 0.3186 | -1985.0 | 3.3467 | 0.0402 | 2.0E-05 | 9.3E-04 |
| g22198 | pdgfa | PDGFA_XENLA | -416.5 | 0.7708 | -411.3 | 0.0471 | 10.0000 | 1.3E-03 | 2.1E-02 |
| g22638 | znf516 | REST_DANRE | -458.8 | 0.1188 | -453.8 | 0.0001 | 0.5813 | 1.6E-03 | 2.5E-02 |
| g22710 | phka1 | KPB1_HUMAN | -2636.1 | 0.0950 | -2631.5 | 0.0167 | 0.1217 | 2.4E-03 | 3.3E-02 |
| g23338 | rab31 | RAB31_HUMAN | -338.5 | 0.2719 | -334.2 | 1.3395 | 0.0001 | 3.7E-03 | 4.4E-02 |
| g23556 | coro2b | COR2B_XENLA | -2820.7 | 0.1865 | -2811.5 | 0.0617 | 0.7571 | 1.8E-05 | 8.6E-04 |
| g23737 | polr2d | AMERL_HUMAN | -1188.9 | 0.1834 | -1183.3 | 0.1807 | 0.1818 | 8.0E-04 | 1.5E-02 |
| g24002 | LOC100492966 | CO3_HUMAN | -3668.6 | 0.2279 | -3653.9 | 0.0621 | 0.8954 | 5.7E-08 | 7.4E-06 |
| g24008 | LOC101735171 | HBB2_XENTR | -1593.2 | 0.3394 | -1583.8 | 1.7070 | 0.0554 | 1.4E-05 | 7.3E-04 |
| g24150 | LOC100488377 | UNC4_HUMAN | -1316.2 | 0.2965 | -1300.9 | 0.3072 | 0.3009 | 3.2E-08 | 4.5E-06 |
| g24419 | LOC105948455 | SVBP_DANRE | -1263.6 | 0.4560 | -1258.2 | 0.3622 | 0.9036 | 9.8E-04 | 1.7E-02 |
| g24561 | krt5.2 | K2C6A_HUMAN | -1716.2 | 0.2213 | -1711.2 | 0.7942 | 0.1626 | 1.6E-03 | 2.5E-02 |
| g24936 | LOC100497729 | CNFN_XENTR | -912.4 | 0.1444 | -905.9 | 1.3415 | 0.0328 | 3.2E-04 | 7.5E-03 |
| g25011 | LOC100495065 | MYH3_HUMAN | -816.8 | 0.0469 | -811.6 | 0.0055 | 0.4771 | 1.2E-03 | 2.1E-02 |
| g25175 | dync2h1 | DYHC2_HUMAN | -1112.5 | 0.0247 | -1104.8 | 0.0079 | 0.2567 | 8.9E-05 | 3.4E-03 |
| g25600 | krt35 | KRT36_HUMAN | -2037.3 | 0.2710 | -2027.6 | 0.8682 | 0.1554 | 1.0E-05 | 5.6E-04 |
| g25634 | LOC100493251 | UN93A_XENLA | -745.1 | 0.3669 | -738.9 | 2.2970 | 0.0471 | 4.0E-04 | 8.8E-03 |
| g2565 | celf3 | CELF4_XENTR | -460.1 | 0.1317 | -455.9 | 0.4971 | 0.0001 | 3.7E-03 | 4.5E-02 |
| g25695 | atp6v1a | VATA_HUMAN | -1996.7 | 0.0180 | -1992.4 | 0.1728 | 0.0022 | 3.4E-03 | 4.1E-02 |
| g25820 | LOC100492966 | CO3_HUMAN | -5268.3 | 0.2141 | -5244.8 | 0.0528 | 0.9982 | 7.3E-12 | 3.6E-09 |
| g25939 | LOC100488078 | CP4AB_HUMAN | -3145.2 | 0.1038 | -3139.0 | 0.2699 | 0.0724 | 4.6E-04 | 9.9E-03 |
| g25963 | ankrd13b | AN13B_HUMAN | -2808.1 | 0.0971 | -2797.7 | 0.8957 | 0.0001 | 4.9E-06 | 3.0E-04 |
| g26118 | syne1 | SYNE1_HUMAN | -3875.2 | 0.6134 | -3839.4 | 7.8515 | 0.0654 | 2.6E-17 | 5.2E-14 |
| g2651 | cacna1e | CAC1E_HUMAN | -988.9 | 0.0839 | -983.1 | 0.0001 | 1.2265 | 6.4E-04 | 1.2E-02 |
| g26543 | abca5 | ABCA5_HUMAN | -509.1 | 0.0718 | -501.9 | 2.2252 | 0.0001 | 1.6E-04 | 5.0E-03 |
| g27008 | LOC101733681 | CAC1D_HUMAN | -3348.6 | 0.0335 | -3341.3 | 0.1374 | 0.0216 | 1.2E-04 | 4.4E-03 |
| g27017 | glra3 | GLRA3_HUMAN | -2205.2 | 0.0517 | -2189.5 | 0.8960 | 0.0159 | 2.1E-08 | 3.4E-06 |
| g27186 | gatad2a | P66A_HUMAN | -2213.7 | 0.0713 | -2200.4 | 0.6299 | 0.0001 | 2.6E-07 | 2.6E-05 |
| g27464 | s100a11 | S10AB_HUMAN | -1238.0 | 0.1187 | -1233.9 | 0.0001 | 0.1371 | 4.1E-03 | 4.8E-02 |
| g27489 | LOC105945936 | CP2S1_HUMAN | -1098.5 | 0.2816 | -1087.3 | 0.0524 | 1.3001 | 2.2E-06 | 1.6E-04 |
| g27758 | spdef | SPDEF_HUMAN | -437.3 | 0.0001 | -425.7 | 0.0001 | 0.0001 | 1.4E-06 | 1.1E-04 |
| g28362 | atad1 | ATAD1_XENTR | -540.5 | 0.0867 | -533.9 | 0.0062 | 0.3063 | 2.9E-04 | 7.3E-03 |
| g28462 | wnt11r | WNT11_XENLA | -440.2 | 0.0323 | -431.3 | 0.0001 | 1.3225 | 2.4E-05 | 1.1E-03 |
| g28493 | pak3 | PAK3_XENLA | -788.9 | 0.0384 | -776.6 | 1.1660 | 0.0026 | 6.7E-07 | 5.5E-05 |
| g28859 | atad1 | ATAD1_XENTR | -2714.7 | 0.3692 | -2703.8 | 1.8518 | 0.0651 | 3.3E-06 | 2.2E-04 |
| g2917 | ctdspl | CTDSL_HUMAN | -272.7 | 0.0114 | -264.3 | 0.0001 | 0.4948 | 4.1E-05 | 1.7E-03 |
| g29382 | tmem17 | TM17A_XENTR | -817.2 | 0.0347 | -812.4 | 0.2612 | 0.0001 | 2.0E-03 | 2.9E-02 |
| g30097 | gtf2h1 | TF2H1_HUMAN | -2796.1 | 0.1039 | -2789.7 | 0.0001 | 0.1185 | 3.3E-04 | 7.7E-03 |
| g30620 | hpn | HEPS_HUMAN | -1608.6 | 0.0744 | -1601.5 | 0.5601 | 0.0069 | 1.8E-04 | 5.3E-03 |
| g30676 | ctbp1 | CTBP1_HUMAN | -492.8 | 0.0436 | -488.1 | 0.4517 | 0.0001 | 2.2E-03 | 3.0E-02 |
| g31455 | frmd8 | FRMD8_XENTR | -733.8 | 0.1820 | -720.0 | 0.0001 | 0.1900 | 1.6E-07 | 1.6E-05 |
| g31508 | cacna1f | CAC1D_HUMAN | -1319.3 | 0.0367 | -1312.6 | 0.2436 | 0.0115 | 2.7E-04 | 7.1E-03 |
| g31526 | nin | IN80D_XENLA | -442.6 | 0.1982 | -437.6 | 0.0001 | 0.2072 | 1.7E-03 | 2.5E-02 |
| g32315 | LOC100493382 | NDUV3_HUMAN | -1821.8 | 0.2238 | -1806.4 | 2.9167 | 0.0252 | 2.9E-08 | 4.3E-06 |
| g32328 | vps13a | VP13A_HUMAN | -1470.5 | 0.1483 | -1466.0 | 0.0001 | 0.1785 | 2.7E-03 | 3.6E-02 |
| g3316 | chfr | CHFR_XENTR | -1905.9 | 0.1482 | -1898.8 | 1.3083 | 0.0318 | 1.8E-04 | 5.3E-03 |
| g33967 | odf3 | ODF3A_XENTR | -1634.6 | 0.2078 | -1629.4 | 0.0318 | 0.2667 | 1.3E-03 | 2.1E-02 |
| g33970 | odf3 | ODF3A_XENTR | -1973.5 | 0.2035 | -1966.1 | 0.0149 | 0.2531 | 1.2E-04 | 4.4E-03 |
| g34021 | senp8 | SENP8_HUMAN | -1180.0 | 0.0943 | -1174.1 | 0.8176 | 0.0085 | 6.2E-04 | 1.2E-02 |
| g34491 | LOC101731432 | ELL2_HUMAN | -791.8 | 0.0373 | -775.8 | 0.0021 | 0.6603 | 1.5E-08 | 2.7E-06 |
| g34891 | LOC100497845 | AGRB2_HUMAN | -919.1 | 0.2297 | -914.4 | 1.4074 | 0.0923 | 2.2E-03 | 3.1E-02 |
| g35249 | sec23a | SC23A_XENTR | -1338.5 | 0.0418 | -1333.9 | 0.2198 | 0.0189 | 2.3E-03 | 3.1E-02 |
| g35360 | cacna1c | CAC1C_HUMAN | -643.5 | 0.0737 | -635.7 | 0.0001 | 0.6559 | 8.1E-05 | 3.2E-03 |
| g35409 | pcid2 | PCID2_XENLA | -938.0 | 0.0546 | -932.9 | 0.3950 | 0.0001 | 1.4E-03 | 2.2E-02 |
| g3558 | maats1 | CFA91_HUMAN | -1449.6 | 0.1321 | -1442.4 | 1.4890 | 0.0081 | 1.5E-04 | 4.8E-03 |
| g35613 | tgfbr1 | TGFR1_HUMAN | -2147.7 | 0.0947 | -2143.0 | 0.0001 | 0.1068 | 2.0E-03 | 2.9E-02 |
| g35713 | tmod4 | TMOD4_HUMAN | -555.4 | 0.0345 | -549.1 | 0.1915 | 0.0001 | 3.7E-04 | 8.3E-03 |
| g36089 | LOC100498621 | ZNT9_HUMAN | -789.0 | 0.2576 | -784.8 | 6.3764 | 0.0431 | 3.8E-03 | 4.5E-02 |
| g36266 | vash2 | VASH2_HUMAN | -1060.9 | 0.0366 | -1056.2 | 0.0001 | 0.3981 | 2.2E-03 | 3.1E-02 |
| g3637 | atp6v1a | VATA_HUMAN | -1602.4 | 0.0132 | -1597.0 | 0.1961 | 0.0001 | 9.6E-04 | 1.7E-02 |
| g36977 | faxc | FAXC_XENTR | -618.8 | 0.2051 | -614.1 | 0.6438 | 0.0001 | 2.1E-03 | 2.9E-02 |
| g37293 | kif1a | KIF1A_HUMAN | -2722.5 | 0.0382 | -2718.1 | 0.0001 | 0.0529 | 3.0E-03 | 3.8E-02 |
| g37339 | pid1 | PCLI1_HUMAN | -974.3 | 0.1429 | -962.1 | 0.6888 | 0.0001 | 8.3E-07 | 6.5E-05 |
| g38247 | cog5 | COG5_HUMAN | -360.9 | 0.1943 | -355.9 | 0.4231 | 0.1739 | 1.4E-03 | 2.3E-02 |
| g38345 | myo1e.1 | MYO1E_HUMAN | -569.4 | 0.0226 | -562.3 | 0.0023 | 0.1306 | 1.6E-04 | 5.1E-03 |
| g38924 | cct7 | TCPH_HUMAN | -2188.0 | 0.0512 | -2181.8 | 0.0218 | 0.0558 | 4.5E-04 | 9.8E-03 |
| g38986 | rpsa | RSSA_XENTR | -1379.9 | 0.0224 | -1373.4 | 0.0001 | 0.2063 | 3.0E-04 | 7.5E-03 |
| g39077 | adam28 | ID2B_HUMAN | -3608.3 | 0.4204 | -3601.4 | 1.1302 | 0.3204 | 2.1E-04 | 5.9E-03 |
| g3909 | cfap45 | CFA45_HUMAN | -1222.1 | 0.1520 | -1216.3 | 0.0001 | 0.2057 | 6.8E-04 | 1.3E-02 |
| g39132 | leprotl1 | LERL1_HUMAN | -803.3 | 0.1015 | -794.9 | 0.0096 | 0.8451 | 4.1E-05 | 1.7E-03 |
| g39373 | hbg2 | HBB2_XENLA | -1003.5 | 0.5434 | -984.0 | 9.8853 | 0.0421 | 4.3E-10 | 1.1E-07 |
| g39390 | LOC100489851 | CRG1_RANTE | -1035.7 | 0.1042 | -1028.9 | 2.6719 | 0.0568 | 2.4E-04 | 6.4E-03 |
| g39581 | spop | SPOPA_XENLA | -969.4 | 0.0182 | -951.6 | 0.2276 | 0.0001 | 2.5E-09 | 4.9E-07 |
| g39924 | LOC100485772 | FBCDB_XENLA | -1087.6 | 0.2417 | -1082.6 | 0.3316 | 0.2226 | 1.6E-03 | 2.5E-02 |
| g40152 | med20 | MED20_XENLA | -831.8 | 0.1053 | -827.5 | 1.1428 | 0.0001 | 3.2E-03 | 4.0E-02 |
| g40367 | csmd2 | CSMD2_HUMAN | -707.6 | 1.3612 | -702.0 | 10.0000 | 0.3277 | 8.4E-04 | 1.5E-02 |
| g40593 | cyfip1 | CYFP1_HUMAN | -952.0 | 0.0531 | -947.0 | 0.0001 | 0.2856 | 1.5E-03 | 2.3E-02 |
| g40599 | tubgcp5 | GCP5_HUMAN | -980.9 | 0.1005 | -975.8 | 1.5739 | 0.0082 | 1.3E-03 | 2.2E-02 |
| g41140 | ints4 | INT4_XENLA | -1599.7 | 0.0819 | -1581.2 | 0.0001 | 0.0893 | 1.1E-09 | 2.5E-07 |
| g41141 | kctd14 | KCD14_HUMAN | -1175.6 | 0.0452 | -1168.6 | 0.0024 | 0.6452 | 1.9E-04 | 5.5E-03 |
| g41145 | kctd14 | KCD14_HUMAN | -1392.0 | 0.0396 | -1382.8 | 0.0019 | 0.5242 | 1.8E-05 | 8.6E-04 |
| g41225 | MGC107841 | HBB2_XENTR | -3924.2 | 0.2331 | -3918.4 | 0.6642 | 0.1844 | 6.4E-04 | 1.2E-02 |
| g41571 | LOC100489684 | CRG1_RANTE | -1396.1 | 0.1045 | -1391.0 | 0.2228 | 0.0650 | 1.3E-03 | 2.2E-02 |
| g41647 | LOC100497503 | CP2C8_HUMAN | -2302.1 | 0.3328 | -2291.7 | 1.5118 | 0.1381 | 5.5E-06 | 3.3E-04 |
| g41764 | l3mbtl3 | LMBL4_HUMAN | -3767.1 | 0.1580 | -3756.4 | 0.0376 | 0.5088 | 3.8E-06 | 2.5E-04 |
| g42163 | LOC101731432 | RRP44_HUMAN | -1101.0 | 0.1819 | -1080.9 | 1.1150 | 0.0574 | 2.4E-10 | 6.8E-08 |
| g42175 | adam28 | ATS10_HUMAN | -1720.8 | 0.2210 | -1714.9 | 0.7435 | 0.1456 | 5.9E-04 | 1.2E-02 |
| g42416 | MGC89248 | RDH16_HUMAN | -615.4 | 0.0844 | -609.6 | 4.6101 | 0.0165 | 6.9E-04 | 1.3E-02 |
| g42511 | LOC100497729 | CNFN_XENTR | -912.4 | 0.1444 | -905.9 | 1.3415 | 0.0328 | 3.2E-04 | 7.5E-03 |
| g4290 | r3hdm2 | SYVN1_DANRE | -364.7 | 0.0975 | -360.5 | 0.0203 | 1.5389 | 3.5E-03 | 4.3E-02 |
| g4514 | trim9 | TRIM9_HUMAN | -1624.4 | 0.1203 | -1619.5 | 0.0001 | 0.1554 | 1.7E-03 | 2.6E-02 |
| g4578 | uba6 | UBA6_HUMAN | -1085.5 | 0.1232 | -1080.8 | 0.0089 | 0.5184 | 2.1E-03 | 2.9E-02 |
| g4863 | kif5b | KINH_HUMAN | -1933.3 | 0.0361 | -1929.0 | 0.1135 | 0.0234 | 3.4E-03 | 4.2E-02 |
| g4890 | pgc | PEPC_HUMAN | -786.4 | 0.1618 | -781.9 | 10.0000 | 0.1274 | 2.7E-03 | 3.6E-02 |
| g5141 | LOC100485974 | RHG39_HUMAN | -554.3 | 0.0230 | -549.8 | 0.0001 | 0.1066 | 2.8E-03 | 3.6E-02 |
| g520 | cdip1 | CDIP1_XENLA | -1007.4 | 0.2020 | -999.9 | 0.0438 | 0.2331 | 1.1E-04 | 4.0E-03 |
| g5259 | dock9 | DOCK9_HUMAN | -379.5 | 0.0491 | -374.3 | 0.0001 | 10.0000 | 1.3E-03 | 2.2E-02 |
| g5286 | sspo | ATL5_HUMAN | -1814.3 | 0.2044 | -1800.3 | 0.0372 | 1.5788 | 1.2E-07 | 1.4E-05 |
| g538 | sec31b | SC31B_HUMAN | -889.5 | 0.1219 | -878.4 | 0.6886 | 0.0001 | 2.4E-06 | 1.7E-04 |
| g620 | LOC100497460 | C2G1P_HUMAN | -2906.4 | 0.2300 | -2901.9 | 0.5790 | 0.1963 | 2.7E-03 | 3.6E-02 |
| g6283 | chmp2a | CHM2A_XENLA | -1014.3 | 0.0895 | -991.2 | 1.3879 | 0.0203 | 1.1E-11 | 4.4E-09 |
| g6323 | LOC100145082 | ST1C4_HUMAN | -2047.1 | 0.1320 | -2037.6 | 0.0616 | 0.6063 | 1.3E-05 | 7.1E-04 |
| g6358 | brinp2 | BRNP3_HUMAN | -932.6 | 0.2829 | -922.7 | 10.0000 | 0.0373 | 8.5E-06 | 4.9E-04 |
| g6678 | taf2 | TAF2_DANRE | -999.2 | 0.1600 | -994.8 | 0.0145 | 1.5835 | 3.0E-03 | 3.8E-02 |
| g6875 | nr6a1 | NR6A1_XENLA | -594.7 | 0.2180 | -588.3 | 0.0117 | 1.2940 | 3.5E-04 | 7.9E-03 |
| g7180 | tmem55b | PP4P1_XENLA | -525.1 | 0.1423 | -518.1 | 0.0126 | 2.4106 | 1.8E-04 | 5.3E-03 |
| g7391 | dscaml1 | DSCL1_HUMAN | -558.9 | 0.1046 | -554.1 | 0.0337 | 0.5132 | 2.0E-03 | 2.9E-02 |
| g7555 | cog1 | COG1_HUMAN | -660.2 | 0.5069 | -653.6 | 0.0137 | 2.7297 | 2.9E-04 | 7.3E-03 |
| g7561 | LOC100489851 | CRG1_RANTE | -1963.5 | 0.1078 | -1959.4 | 0.5086 | 0.0902 | 4.4E-03 | 5.0E-02 |
| g7620 | hbg2 | HBB2_XENLA | -1006.6 | 0.5573 | -981.2 | 10.0000 | 0.0417 | 9.8E-13 | 9.7E-10 |
| g7688 | smim18 | SIM18_HUMAN | -764.2 | 0.1566 | -758.2 | 0.9261 | 0.0267 | 5.3E-04 | 1.1E-02 |
| g7766 | acox1 | ACOX1_HUMAN | -735.1 | 0.0930 | -727.8 | 1.1980 | 0.0001 | 1.3E-04 | 4.4E-03 |
| g7772 | unc13d | UN13D_HUMAN | -824.6 | 0.1862 | -819.4 | 0.0241 | 0.7904 | 1.2E-03 | 2.1E-02 |
| g80 | myh3 | MYH2_HUMAN | -10165.9 | 0.0626 | -10159.8 | 0.1270 | 0.0555 | 4.9E-04 | 1.0E-02 |
| g800 | hsd17b10 | HCD2_HUMAN | -2313.3 | 0.0267 | -2307.4 | 0.0001 | 0.2910 | 5.9E-04 | 1.2E-02 |
| g8010 | LOC105947758 | CAD23_HUMAN | -531.5 | 0.0928 | -524.9 | 2.8005 | 0.0117 | 2.9E-04 | 7.3E-03 |
| g8401 | slc25a40 | S2540_XENTR | -1113.5 | 0.1337 | -1107.8 | 0.9791 | 0.0301 | 7.4E-04 | 1.4E-02 |
| g9435 | rab11a | RB11A_HUMAN | -1137.6 | 0.0139 | -1130.4 | 0.0942 | 0.0065 | 1.5E-04 | 4.8E-03 |
| g9487 | pabpn1l | EPA2A_XENLA | -1163.2 | 0.1299 | -1156.8 | 0.6734 | 0.0240 | 3.7E-04 | 8.3E-03 |
| g9818 | LOC100493382 | C4BPA_HUMAN | -676.4 | 0.1924 | -662.1 | 3.5302 | 0.0086 | 8.4E-08 | 1.0E-05 |

Supplementary Table 2.8 Functions of genes under positive selection in Sp. multiplicata. Functions were determined from the gene description on Xenbase or, if no description was available, from the description in the UniProt database. 'NA' indicates that no meaningful description could be found.

| Spea gene | Generic function | X. tropicalis match | Swiss-Prot best match |
|---|---|---|---|
| g8010 | cell adhesion | LOC105947758 | CAD23_HUMAN |
| g16767 | digestive system related | zg16 | ZG16_HUMAN |
| g32315 | electron transport chain | LOC100493382 | NDUV3_HUMAN |
| g39390 | eye formation | LOC100489851 | CRG1_RANTE |
| g15278 | glycolysis | ppcdc | COAC_HUMAN |
| g40367 | immune response | csmd2 | CSMD2_HUMAN |
| g9818 | immune response | LOC100493382 | C4BPA_HUMAN |
| g36089 | intracellular homeostasis | LOC100498621 | ZNT9_HUMAN |
| g26118 | intracellular organization | syne1 | SYNE1_HUMAN |
| g26543 | lipid metabolism | abca5 | ABCA5_HUMAN |
| g42416 | metabolism | MGC89248 | RDH16_HUMAN |
| g15892 | nervous system development | synj2 | SYNJ2_HUMAN |
| g6358 | neural development | brinp2 | BRNP3_HUMAN |
| g4890 | protein metabolism | pgc | PEPC_HUMAN |
| g10134 | protein modification | LOC100493382 | UBP13_DANRE |
| g22100 | protein modification | pdzrn3 | PZR3B_DANRE |
| g39373 | oxygen transport | hbg2 | HBB2_XENLA |
| g7620 | oxygen transport | hbg2 | HBB2_XENLA |
| g11875 | oxygen transport | LOC100145107 | HBB2_XENTR |
| g13150 | smell | LOC100492966 | O52A1_HUMAN |
| g11371 | NA | fam167a | F167A_HUMAN |
| g25634 | NA | LOC100493251 | UN93A_XENLA |

Supplementary Table 2.9 Results from gene ontology (GO) analysis of genes under positive selection in Spea. 'Functional grouping' was determined by exploring the term's page on QuickGO (see Methods for more details).
| Term | χ² | p | q | Term description | Functional grouping |
|---|---|---|---|---|---|
| GO:0009108 | 88.454 | 0.012 | 0.065 | coenzyme biosynthetic process | enzyme synthesis |
| GO:0015937 | 88.454 | 0.013 | 0.065 | coenzyme A biosynthetic process | enzyme synthesis |
| GO:0045494 | 88.454 | 0.014 | 0.065 | photoreceptor cell maintenance | eye function |
| GO:0045959 | 88.454 | 0.013 | 0.065 | negative regulation of complement activation, classical pathway | immune function |
| GO:1903027 | 88.454 | 0.014 | 0.065 | regulation of opsonization | immune function |
| GO:0006997 | 88.454 | 0.011 | 0.065 | nucleus organization | intracellular organization |
| GO:0090286 | 88.454 | 0.013 | 0.065 | cytoskeletal anchoring at nuclear membrane | intracellular organization |
| GO:0090292 | 88.454 | 0.014 | 0.065 | nuclear matrix anchoring at nuclear membrane | intracellular organization |
| GO:0034375 | 88.454 | 0.010 | 0.065 | high-density lipoprotein particle remodeling | lipid and protein metabolism |
| GO:0010745 | 88.454 | 0.011 | 0.065 | negative regulation of macrophage derived foam cell differentiation | lipid metabolism |
| GO:0043691 | 88.454 | 0.011 | 0.065 | reverse cholesterol transport | lipid metabolism |
| GO:0033344 | 88.454 | 0.013 | 0.065 | cholesterol efflux | lipid metabolism |
| GO:0006829 | 88.454 | 0.013 | 0.065 | zinc ion transport | zinc transport |

Supplementary Table 2.10 Assembly statistics of short-read libraries for Sp. bombifrons, Sc. couchii, and Sc. holbrookii.

| Species | k-mer size | Total size | Number of sequences | Longest sequence | N25 | N50 | N75 |
|---|---|---|---|---|---|---|---|
| Sp. bombifrons | 101 | 956,925,739 | 3,134,405 | 7,759 | 734 | 444 | 217 |
| Sc. couchii | 51 | 1,111,494,020 | 8,690,179 | 2,463 | 346 | 187 | 85 |
| Sc. holbrookii | 81 | 1,015,691,853 | 4,300,914 | 4,686 | 664 | 385 | 160 |

Chapter 3: Hybrid gene expression in allopatry versus sympatry: implications for the evolution of genetic incompatibilities in interbreeding species

This work is currently under review at Molecular Ecology.

3.1 Abstract

Interbreeding species often produce hybrids with lower fitness than their parents. Such low hybrid fitness is often attributable to genetic incompatibilities between parental genomes.
These incompatibilities can be driven either by fixed allelic differences between hybridizing species or by standing variants that segregate within them. Here, we used gene expression in hybrids as a proxy for potential incompatibilities. We examined gene expression at 10,695 protein-coding genes in hybrids and pure-species types across populations of two interbreeding spadefoot toad species, Spea multiplicata and Spea bombifrons. Hybrid types varied in expression at 35% of all genes, and 48% of these genes showed no differential expression between the pure-species types. Critically, population had significant effects on gene expression variability among hybrids: hybrids produced by matings between regularly interbreeding populations showed substantially more expression variation than hybrids produced by matings between non-interbreeding populations. This finding is consistent with known patterns of selection both for and against hybridization, and potentially explains how reinforcement and adaptive hybridization can unfold in the same system. Altogether, our results indicate that segregating genetic variation can cause incompatibilities to vary in their occurrence across hybrids involving the same parent species. This variation can, in turn, have important implications for the evolutionary maintenance (or breakdown) of reproductive barriers between species.

3.2 Introduction

Hybridization between species can lead to offspring that exhibit reduced fitness (e.g., reduced survival or fertility) relative to pure-species types (20, 24, 109).
Because reduced hybrid fitness constitutes a barrier to gene flow between species and helps maintain species boundaries, understanding its causes and evolution is a focus of speciation research (20, 81, 110). Reduced hybrid fitness can result from differences in gene expression in hybrids relative to their parents, which may then lead to maladaptive phenotypes (e.g., reduced growth, low fertility; 110, 111, 112-119). At the genetic level, these changes in gene expression are likely to result from deleterious epistatic interactions between genetic variants in the pure-species genomes (such interactions are referred to as "Bateson-Dobzhansky-Muller incompatibilities", hereafter "BDMs"; 20-22, 23). Thus, genes that are differentially expressed in hybrids may be indicators of BDMs and may enable BDMs to be studied even when the loci underlying them are unknown (22).

Although alleles that contribute to BDMs in hybrids are often assumed to be fixed in the parent species, this need not be the case (21, 120). In some instances, loci that contribute to BDMs might remain polymorphic in one or both parent species (21, 120-122). In this latter scenario, the nature and extent of BDMs between hybridizing species should depend on the standing genetic variation present in the specific populations that interbreed. Consequently, selection could act on variation in the nature or extent of BDMs, thereby reducing the deleterious effects of hybridization (109, 123-127). Such selection could result in fewer BDMs in populations that experience higher hybridization rates than in those that experience lower rates, because hybrids are exposed to selection more frequently where hybridization is common. Yet, when BDMs generate low-fitness hybrids, selection also favors traits in the parent species that prevent the production of hybrids in the first place (128, 129).
This process, termed reinforcement, enhances isolation between species and ultimately results in a decline of hybridization over time (130-132). The conditions under which reinforcement occurs remain an open question (20, 129). Generally, reinforcement is less likely to occur as the fitness costs of hybridization decline (133), so reinforcement could break down if hybrid fitness increases because natural selection reduces BDMs, as described above.

Here, we address these issues by evaluating whether gene expression varies among hybrids derived from different populations of the same two species. In particular, we contrast hybrids generated by crossing allopatric pure-species parents (from populations where hybridization does not occur) with hybrids generated by crossing sympatric pure-species parents (from populations where hybridization is ongoing). This design allowed us to infer how gene expression (and any possible BDMs) might evolve in sympatry.

We used as our study system spadefoot toads, Spea bombifrons and S. multiplicata, which hybridize in the southwestern USA. Hybrids are viable, but F1 males are sterile and F1 females are less fecund (50, 134, 135). Introgression between the two species occurs because hybrid females will breed with pure-species males (136, 137), and subsequent cross types appear fertile and capable of reproducing (50, 56, 79, 136, 138). Hybridization and levels of introgression between the two species are highly variable across populations: the proportion of F1 hybrid tadpoles has historically been as high as 30% at some sites where the species interbreed, and the proportion of mixed-ancestry tadpoles ranges from 0 to 60% (79, 136, 139). Previous work shows that relative hybrid fitness depends on which species is maternal (50). When S. multiplicata is maternal, F1 hybrids have lower survival than either S. bombifrons or S. multiplicata tadpoles, whereas when S. bombifrons is maternal, F1 hybrids survive as well as pure-species types (50).
In addition to these inherent differences in survival, hybrid tadpoles of both types have growth rates intermediate between those of the faster-developing S. multiplicata and the slower-developing S. bombifrons tadpoles (50). Because faster development is favored in the highly ephemeral ponds where tadpoles develop (tadpoles that do not reach metamorphosis quickly enough die when the pond dries), hybrids have higher fitness than S. bombifrons tadpoles but not S. multiplicata tadpoles (25, 50).

The combination of these fitness components generates opposing patterns of selection on hybridization behavior in females of the two species (50). Because hybrids with S. multiplicata mothers develop more slowly than pure S. multiplicata tadpoles and also have lower survival and reduced fertility, S. multiplicata females are under selection to avoid hybridization. Consequently, female mate choice shows the hallmark pattern of divergent mating behavior between sympatry and allopatry that is expected if reinforcement has occurred (140, 141). Critically, the decline of hybridization over time in some populations has been attributed to reinforcement (131). In contrast to S. multiplicata females, S. bombifrons females benefit from hybridizing in ephemeral ponds, where hybrids have a fitness advantage relative to pure S. bombifrons tadpoles (25, 50). As a consequence of these environment-dependent fitness effects, S. bombifrons females facultatively hybridize with S. multiplicata males in ephemeral ponds, but not in long-lasting ponds (25). This context-dependent behavior appears to have evolved in sympatry, because allopatric females do not discriminate between conspecific and heterospecific males (25). Female hybrids can backcross to either species (137), resulting in introgression and the presence of mixed-genotype individuals in populations where hybridization occurs (50, 56, 79, 136, 138).
Thus, hybridization, as well as the interbreeding of hybrids and pure-species types, results in ongoing selection in sympatry but not in allopatry.

Spadefoots are ideally suited for evaluating how differential gene expression in hybrids evolves because they naturally hybridize, hybridization is favored by natural selection in some (but not all) contexts, and hybridization varies across populations. Previous work on Spea tadpoles showed that roughly 55% of all genes exhibit fixed expression differences between S. multiplicata and S. bombifrons (142). For these genes, hybrid expression levels are, on average, intermediate relative to the pure-species types. Such intermediate expression could result in intermediate hybrid fitness if gene expression levels correspond to fitness. For example, gene expression differences could underlie the different development times of the two species (highlighted above); the intermediate development time of hybrids generates low hybrid fitness relative to S. multiplicata but high hybrid fitness relative to S. bombifrons (50).

Variation in gene expression among different Spea hybrid types could have important consequences for how reproductive isolation evolves and is maintained. We therefore evaluated whether hybrid types vary in gene expression. To do this, we generated first-generation (F1) hybrids using parents from (1) allopatric populations that have not experienced hybridization and (2) sympatric populations where hybridization has been ongoing. The former cross type mimics hybrids produced during initial contact between the two species, whereas the latter enabled us to examine gene expression in a context where selection has had an opportunity to act on hybrids. We find that 35% of all examined protein-coding genes vary in their expression across different hybrid types.
Comparison of hybrids produced by allopatric and sympatric matings reveals that gene expression variability is greater among hybrids produced by sympatric matings (between populations that interbreed) than among hybrids produced by allopatric matings (between populations that do not). Our results suggest that ongoing selection on genetic variants influencing gene expression can arise between interbreeding species, with important consequences for reproductive isolation between them.

3.3 Results

3.3.1 Do Spea hybrids vary in gene expression?

Our hybrid types, which differed in both maternal cross direction and population of origin (Table 3.1), showed substantial variation in gene expression in the 3′ RNA-seq data. Of the 10,695 protein-coding genes measured in the transcriptome, 35% (3,248 genes) differed significantly among the hybrid types at an FDR of 5%. One explanation for this variation is that population-level variation in the pure-species types is simply reflected in the hybrids. However, this was not the case: at an FDR of 5%, no genes differed in expression between populations of origin for either pure-species type. This pattern suggests that segregating variation within populations can affect gene expression in hybrids without affecting gene expression in pure-species types.

Given that hybrids vary in gene expression at 35% of the transcriptome, we examined what patterns of variation, if any, emerged. Using K-means clustering on the hybrid data, we identified two main patterns of expression among the hybrids (Figure 3.1). In particular, 55% of the genes that varied among the hybrids (1,785 genes) were expressed most highly in BMs and MBa, at intermediate levels in BMa, and least in MBs (Figure 3.1). The remaining 45% of the genes that varied among the hybrids (1,463 genes) were expressed most highly in MBs, at intermediate levels in BMa, and least in BMs and MBa (Figure 3.1).
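The two expression patterns above come from K-means clustering of per-gene expression profiles. As a minimal sketch of that step, the pure-Python Lloyd's algorithm below clusters hypothetical profiles. The per-gene z-score scaling, the deterministic farthest-point seeding, the gene names, and the toy values (columns ordered BMs, MBa, BMa, MBs) are all assumptions for illustration and are not taken from the thesis's pipeline.

```python
import math


def zscore_rows(matrix):
    """Centre each gene (row) on its mean and divide by its standard deviation."""
    scaled = []
    for row in matrix:
        mu = sum(row) / len(row)
        sd = math.sqrt(sum((x - mu) ** 2 for x in row) / len(row))
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(rows, k):
    """Deterministic seeding: start from the first row, then repeatedly add
    the row farthest from all centroids chosen so far."""
    centroids = [list(rows[0])]
    while len(centroids) < k:
        far = max(rows, key=lambda r: min(squared_distance(r, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(rows, k, max_iter=100):
    """Lloyd's algorithm; returns (cluster label per row, centroids)."""
    centroids = farthest_point_init(rows, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each gene joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: squared_distance(r, centroids[c]))
                      for r in rows]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member genes.
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # leave an empty cluster's centroid where it was
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical log2 expression profiles; columns ordered BMs, MBa, BMa, MBs.
expression = {
    "geneA": [5.0, 4.8, 3.0, 1.0],
    "geneB": [6.1, 6.0, 4.0, 2.2],
    "geneC": [4.0, 4.2, 2.5, 0.9],
    "geneD": [1.0, 1.2, 3.0, 5.0],
    "geneE": [2.0, 2.1, 3.9, 6.2],
    "geneF": [0.5, 0.7, 2.4, 4.1],
}
labels, centroids = kmeans(zscore_rows(list(expression.values())), k=2)
for name, lab in zip(expression, labels):
    print(name, "cluster", lab + 1)
```

With these toy profiles, geneA to geneC (high in BMs and MBa, low in MBs) fall in one cluster and geneD to geneF (the reverse pattern) in the other. Scaling each gene first means genes are grouped by the shape of their profile across hybrid types rather than by their absolute expression level.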
Because the sympatric hybrids (BMs and MBs) occupy the extremes in both clusters, this pattern supports the possibility that selection in sympatry (where hybrids actually occur) has shaped how segregating variation affects gene expression in hybrids.

Figure 3.1. Heatmap of hybrid-type expression values (log2(fold coverage) scaled by Euclidean distance from the row mean) for 3,248 genes showing significant differential expression. Each row corresponds to one gene and each column to one sample. Numbers to the left denote cluster membership as determined by K-means clustering. Colors associated with cross types at the top of the heatmap correspond to those used in Figure 3.3.

3.3.2 Does variation among hybrids correspond to expression differences between species?

In past work, we showed that 55% of the transcriptome exhibits fixed differences between S. bombifrons and S. multiplicata (142). Here, we examined the overlap between genes that show fixed expression differences between species and genes that vary in expression among hybrids. We found that 52% (1,675 genes) of the genes showing expression variation among hybrids also differed in expression between the pure-species types. The remaining 48% of genes that varied in expression among hybrid types (1,573 genes) exhibited no detectable difference between the two pure-species types. Thus, among the genes that varied in expression among the hybrid types, the number that also differed between the pure-species types was not significantly different from the number that did not (binomial test, p > 0.05). This pattern suggests that variation in gene expression can emerge in hybrids regardless of whether expression differences exist between the species. We next determined whether the 1,675 genes that significantly varied among both the hybrid and pure-species types exhibit any specific expression patterns.
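The binomial comparison above (1,675 versus 1,573 of the 3,248 genes that vary among hybrids) asks whether a gene that varies among hybrids is equally likely to fall in either category. A minimal sketch, assuming an exact two-sided binomial test against a null proportion of 0.5; the text reports only "binomial test, p > 0.05" and does not state whether the test was exact or one- or two-sided.

```python
from math import comb


def binom_two_sided_half(k, n):
    """Exact two-sided binomial test of H0: p = 0.5.

    Because the null distribution is symmetric when p = 0.5, the two-sided
    p-value is twice the smaller tail probability, capped at 1. Integer
    arithmetic on binomial coefficients keeps the tails exact for large n.
    """
    total = 2 ** n                                      # equally likely outcomes under H0
    lower = sum(comb(n, i) for i in range(0, k + 1))    # P(X <= k) * 2**n
    upper = sum(comb(n, i) for i in range(k, n + 1))    # P(X >= k) * 2**n
    return min(1.0, 2 * min(lower, upper) / total)


also_differ_between_species = 1675
hybrid_only = 1573
n = also_differ_between_species + hybrid_only  # 3,248 genes vary among hybrids
p_value = binom_two_sided_half(also_differ_between_species, n)
print(f"n = {n}, two-sided p = {p_value:.3f}")
```

The resulting p-value lies above 0.05, consistent with the reported result; a library routine such as scipy.stats.binomtest would give the same exact test.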
We found four clusters of distinct expression patterns among these genes when all hybrid and pure-species samples were considered (Figure 3.2). In two of these clusters (which together comprised 1,044, or 63%, of the genes), expression levels in MBs were similar to those in S. bombifrons, whereas expression levels in BMs and MBa were similar to those in S. multiplicata (Figure 3.2). These two clusters were distinguished by expression in the two pure-species types: in one cluster, 532 genes (51% of the 1,044) showed higher expression in S. bombifrons than in S. multiplicata, whereas in the other, 512 genes (49% of the 1,044) showed lower expression in S. bombifrons than in S. multiplicata (Figure 3.2). In the remaining 631 (37%) of the 1,675 genes examined in this analysis, MBs hybrids displayed expression patterns on par with S. multiplicata, whereas BMs and MBa were on par with S. bombifrons (Figure 3.2). Of these 631 genes, 341 (54%) showed higher expression in S. multiplicata than in S. bombifrons, and 290 (46%) showed lower expression in S. multiplicata than in S. bombifrons (Figure 3.2). For all 1,675 genes, BMa exhibited intermediate expression, both between S. multiplicata and S. bombifrons and between the more extreme expression levels of the other hybrid types (Figure 3.2). Thus, among the genes that significantly varied among the hybrid and pure-species types, all hybrid types except BMa exhibit expression levels that resemble one of the pure-species types. At these genes, MBs resembles S. bombifrons rather than S. multiplicata at a ratio of roughly 2:1, while BMs and MBa show the opposite ratio.

Figure 3.2. Heatmap of expression values for the four hybrid types and the pure-species types (log2(fold coverage) scaled by Euclidean distance from the row mean) for 1,675 genes showing significant differential expression. Each row corresponds to one gene and each column to one sample.
Numbers to the left denote cluster membership as determined by K-means clustering. Colors associated with cross types at the top of the heatmap correspond to those used in Figure 3.3.

3.3.3 What are the impacts of sympatry versus allopatry on gene expression?

We performed PCA on the genes that vary among hybrids and are also differentially expressed between species (Figure 3.3a). In this analysis, the first two PCs together explained the majority of the variation in the data (56.9%). Not surprisingly, these PCs distinguished S. bombifrons and S. multiplicata samples (Figure 3.3a; Supplementary Table 3.2). Hybrids were generally intermediate between the pure species. However, consistent with the above results (Figures 3.1 and 3.2), three groups of hybrids were visible: BMa, MBs, and the pair of BMs and MBa.

Pairwise distance comparisons among all groups showed the following:
- BMa was not significantly different from any other cross type.
- MBs differed from all other groups except BMa.
- MBa differed from BBa.
- BMs differed from both MM types and from BBa, but not from BBs.

(Figure 3.3a, c; Supplementary Table 3.2.) Notably, two distances were similar in magnitude to the distance between the pure-species types (BB vs. MM): the distance between the sympatric hybrid types (MBs vs. BMs) and the distance between the MB cross types (MBs vs. MBa) (Figure 3.3a, c; Supplementary Table 3.2). Indeed, the sympatric hybrids (BMs and MBs) were more different from each other than the allopatric hybrids (BMa and MBa) were. These results show that (1) hybrids can exhibit large-scale differences in gene expression similar to those seen between species, and (2) hybrids can become more dissimilar in sympatry, where hybridization occurs.

We also applied PCA to the genes that varied in expression among hybrids but not between pure-species types (Figure 3.3b). In this case, the first two PCs again explained the majority of the variation (53.6%).
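The PCA comparisons above can be sketched in pure Python. This is an illustrative stand-in, not the authors' R code, which used princomp() and geomorph's advanced.procD.lm(). The helpers pca_top2, group_means and mean_distance_test are hypothetical:
- PCs come from power iteration with deflation on the covariance matrix (divisor n), with samples as rows.
- Group locations are group means in PC space.
- The p-value comes from shuffling cross-type labels among samples. We assume this matches a residual randomization (RRPP) whose null model is intercept-only.

For thousands of genes, the p × p covariance is slow in pure Python, and a numerical library would be used instead.

```python
import math
import random


def pca_top2(X, n_iter=500, seed=0):
    """Scores of each sample on the first two principal components.

    X[i][j] is the log2 expression of gene j in sample i. Returns
    (scores, explained): scores[i] = [PC1, PC2] for sample i, and
    explained[k] is the fraction of total variance captured by PC k+1.
    """
    n, p = len(X), len(X[0])
    means = [sum(col) / n for col in zip(*X)]
    Xc = [[x - m for x, m in zip(row, means)] for row in X]
    cov = [[sum(r[i] * r[j] for r in Xc) / n for j in range(p)] for i in range(p)]
    total_var = sum(cov[i][i] for i in range(p))
    rng = random.Random(seed)
    vecs, vals = [], []
    for _component in range(2):
        v = [rng.random() + 0.1 for _ in range(p)]  # strictly positive start
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(n_iter):
            w = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left to explain
                break
            v = [x / norm for x in w]
        lam = sum(a * sum(c * x for c, x in zip(row, v)) for a, row in zip(v, cov))
        vecs.append(v)
        vals.append(lam)
        # Deflate so the next power iteration finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    scores = [[sum(a * b for a, b in zip(r, v)) for v in vecs] for r in Xc]
    return scores, [lam / total_var for lam in vals]


def group_means(scores, labels):
    """Mean PC coordinates per group (least-squares means of a one-factor model)."""
    sums, counts = {}, {}
    for s, g in zip(scores, labels):
        sums[g] = [a + b for a, b in zip(sums.get(g, [0.0] * len(s)), s)]
        counts[g] = counts.get(g, 0) + 1
    return {g: [a / counts[g] for a in sums[g]] for g in sums}


def mean_distance_test(scores, labels, g1, g2, n_perm=999, seed=1):
    """Euclidean distance between two group means and a permutation p-value.

    Labels are shuffled across all samples, which permutes residuals from an
    intercept-only null model (a simplified RRPP).
    """
    means = group_means(scores, labels)
    observed = math.dist(means[g1], means[g2])
    rng = random.Random(seed)
    shuffled = list(labels)
    at_least_as_far = 1  # the observed labelling counts as one permutation
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        perm = group_means(scores, shuffled)
        if math.dist(perm[g1], perm[g2]) >= observed:
            at_least_as_far += 1
    return observed, at_least_as_far / (n_perm + 1)
```

For example, mean_distance_test(scores, labels, "MBs", "BMs") would return the distance between the two sympatric hybrid groups and its permutation p-value. n_perm controls the number of permutations; Supplementary Table 3.2 reports 1,000 iterations.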
As expected, PC space was primarily characterized by differences among hybrids (Figure 3.3b). MBs hybrids were significantly different from all other groups, and BMa hybrids were different from all groups except MMa (Figure 3.3d; Supplementary Table 3.2). Neither MBa nor BMs hybrids were significantly different from the pure-species types (Figure 3.3d; Supplementary Table 3.2). For this set of genes, the distance between the sympatric hybrids (BMs and MBs) was similar to that between the allopatric hybrids (BMa and MBa). However, the nature of change among the groups differed between sympatry and allopatry. Relative to BMa, BMs changed primarily along PC2, whereas relative to MBa, MBs changed along PC1. Thus, different sets of genes drove the variation in expression between cross types in sympatry relative to allopatry.

Figure 3.3. Distribution of cross types in principal component space. Panel (a) is based on expression of genes that varied among hybrids and were differentially expressed between the parent species. Panel (b) is based on genes that varied among hybrids but were indistinguishable between the parent species. In (a) and (b), points represent individual replicates for each cross type; polygons are for illustration only, indicating groups by cross type. Heatmaps of the magnitude of pairwise distances among groups are given in (c) and (d), corresponding to the data in (a) and (b), respectively. In (c) and (d), bold outlines denote groups that were significantly different from each other.

3.4 Discussion

Using spadefoot toads and their F1 hybrids, we evaluated gene expression in hybrids derived from two population types: sympatry, where hybridization occurs, and allopatry, where hybridization has not occurred. The latter treatment serves as a proxy for “first contact” between our two focal species, S. multiplicata and S. bombifrons, which naturally hybridize in many of the areas where they co-occur.
The former treatment represented samples from populations where selection could have acted in two ways: to reduce introgression (via reinforcement), and/or to mitigate deleterious patterns of differential expression in hybrids relative to pure-species types. The latter applies when hybrids, whether F1 or advanced-generation crosses, are exposed to selection. By contrasting hybrids generated in this way, we could infer how gene expression in hybrids and the pure-species types has evolved in sympatric populations, where interbreeding regularly occurs.

3.4.1 Widespread gene expression differences among Spea hybrids

We identified 3,248 protein-coding genes whose expression differs among Spea hybrids produced from sympatric versus allopatric matings (Figure 3.1). This is a large proportion of the transcriptome (35%). Genetic incompatibilities (i.e., BDMs) that disrupt gene expression in hybrids are often assumed to derive from fixed differences between the hybridizing species (21, 22). However, nearly half of the genes that varied among hybrids showed no detectable expression difference between the pure-species types; the other half also differed between species. Thus, variation in hybrid gene expression is not necessarily tied to fixed expression differences between species, and may reflect underlying segregating variation in one or both species.

One striking feature of our results is that the hybrids do not necessarily group according to population type. Three of the four hybrid types (MBs, MBa, and BMs) had expression levels that generally resembled one of the parent species, whereas BMa was intermediate between the two Spea species (Figure 3.2). Interestingly, MBs more often resembled S. bombifrons than S. multiplicata, but MBa and BMs more often resembled S. multiplicata than S. bombifrons (Figure 3.2).
Suppose the genes whose expression varies among hybrids are co-adapted. Then expressing most of them (about 62%) at a level comparable to one parent and the rest (about 38%) at a level comparable to the other could be deleterious. Alternatively, hybrids that resemble, or are intermediate between, the pure species may not suffer widespread disruption in gene expression that reduces fitness. Differentiating between these possibilities requires a better understanding of the relationship between gene expression and fitness in this system.

3.4.2 Implications for genetic incompatibilities between species

Differences in gene expression between hybrids and pure-species types may reveal possible BDMs. To that extent, our results highlight the potential for genetic incompatibilities to vary and evolve in different populations. We evaluated two elements of the hypothesis that BDMs can evolve in sympatry:
1) segregating variants in the hybridizing species should generate variation in BDMs among hybrids; and
2) hybrid gene expression should become more similar to pure-species expression in sympatry relative to allopatry.

As highlighted above, our hybrid types showed widespread variation in expression relative to each other and to the pure-species types. The nature of this variation was consistent with segregating variation within each of the pure-species types. Such segregating variation can generate variation in the strength and pattern of incompatibilities in hybrid offspring and in subsequent hybrid × hybrid or pure-species × hybrid crosses (21). Selection could then act on the resulting phenotypic variation among hybrids, either to purge deleterious allelic combinations or to favor modifier alleles that improve hybrid fitness (109, 123-127).
At the level of gene expression, this hypothesis makes a specific prediction. Hybrids should be more similar in expression to the pure-species types when produced in sympatry than when produced by parents from allopatry, where the species do not hybridize. In sympatry, the species naturally hybridize, and interbreeding among pure species and hybrids exposes hybrid genotypes to selection.

For some of the genes we identified, our results for hybrids produced by S. bombifrons females are consistent with the prediction that BDMs should be ameliorated in sympatry. Specifically, in our PCA of the 1,573 genes that varied among hybrids but were similar between the two species, BMa hybrids were significantly different from the pure-species types but BMs hybrids were not (Figure 3.3b, d). Thus, BMs hybrids showed greater convergence with the pure-species types for this set of genes. If the overall expression differences between hybrids and pure-species types (as captured by the PCA) correspond to BDMs that reduce hybrid fitness, then BDMs can be evolutionarily reduced, and hybrid fitness evolutionarily enhanced, in sympatry.

At the same time, other patterns of expression contradicted this prediction. In the same set of genes where BMs became more like the pure-species types, the opposite pattern held for the MB cross types: MBa hybrids were more similar to the parents than MBs hybrids were (Figure 3.3b, d). Moreover, for the 1,675 genes that differed among hybrids and the pure-species types (Figure 3.3a), support for the prediction was weak. Although BMs converged on the expression pattern of BBs, this hybrid type became more different from MMa, MMs, and BBa (Figure 3.3a, c). In addition, MBs was distinct from all pure-species types, whereas MBa differed only from BBa.
Taken together, the evidence that BDMs can be ameliorated in sympatry is mixed in our study. Indeed, the results for MBs, which differ consistently from the other groups in their combined gene expression (Figure 3.3), suggest that BDMs can become exaggerated in sympatry relative to allopatry. Ultimately, whether these expression patterns are postzygotic incompatibilities that act as reproductive isolating mechanisms depends on hybrid fitness and phenotypes relative to the pure-species types.

3.4.3 How can counteracting selective regimes operate in the same system?

If the expression differences we observed among hybrids are associated with fitness, then whether BDMs are ameliorated, exaggerated, or neither in sympatry could help explain how selection might both favor and disfavor hybridization. Indeed, S. bombifrons and S. multiplicata females can be under opposing patterns of selection to avoid hybridization, depending on the environment in which hybrids develop (50). These opposing patterns of selection may be generated, in part, by the distinct patterns of gene expression in the MBs versus BMs hybrid types. In particular, female S. multiplicata are under strong selection to avoid mating with S. bombifrons males. The reason is that MBs hybrids have lower survival, poorer growth, slower development, and reduced fertility (50, 51, 135).

The lower fitness of MBs hybrids might reflect two aspects of gene expression observed in previous work (142) and in this study. First, previous work shows that for a majority of the transcriptome, the two Spea species differ in expression and hybrids are intermediate between them (142). For additive traits such as development time, an intermediate phenotype could mean low fitness relative to one of the pure species. This is the case for MBs hybrids relative to pure S. multiplicata.
Second, MBs hybrids resemble one of the pure-species types for many of the genes that varied in expression among hybrids (Figure 3.2). Even so, their combined pattern of expression (Figure 3.3) might produce low fitness, especially if co-adapted genes are not appropriately expressed as a set. MBs hybrids stand out relative to the pure-species types and the other hybrids both in expression (Figures 3.1-3.3) and in fitness (50), which is consistent with this possibility.

When selection disfavors low-fitness hybrids, reinforcement is expected to arise, and previous work with S. multiplicata is consistent with reinforcement operating (131, 140, 141). Generally, reinforcement theory assumes that BDMs, which generate low hybrid fitness, arise when species with fixed differences first come into secondary contact. If evolution in sympatry accentuates BDMs (especially in the MBs cross; Figure 3.3a), reinforcement becomes more likely. Reinforcement is most likely when both the costs and the risk of hybridization are high, but it can be impeded by gene flow between species (20, 128, 129, 143). If the fitness costs of hybridization become increasingly severe in sympatry, reinforcing selection could be maintained even in the face of introgression. Generally, models of reinforcement do not consider the possibility that hybrid fitness might evolve (hybrid fitness could also improve; see below). Our findings call for additional work on how the evolution of BDMs in sympatry might alter patterns of reinforcing selection, and how reinforcement might change over time.

Unlike S. multiplicata females, S. bombifrons females enhance their fitness by hybridizing. Despite their reduced fertility, BMs hybrids develop faster and grow as well as or better than S. bombifrons tadpoles (25, 50). As with MBs hybrids, the fitness of BMs hybrids might also reflect patterns of gene expression.
First, the intermediate expression of hybrids described previously (142) might confer a fitness advantage if it underlies traits such as development time. Second, the overall pattern of expression in BMs hybrids (Figure 3.3) may correspond to relatively higher fitness for this cross type. If so, adaptive hybridization could be facilitated, especially if BDMs have become evolutionarily weaker in this cross direction. Although speculative at this point, our results suggest that directional BDMs could contribute to opposing patterns of selection in the interacting species: reinforcement on the one hand versus adaptive hybridization on the other (see also 144).

A related point is that the genetic variation (and any resulting BDMs) present in the pure species at initial contact might set the stage for subsequent selective dynamics in sympatry. Our allopatric crosses were set up to mimic the conditions of such initial contact between the Spea species. For genes that differed between the species, the BMa cross type stood out for its intermediate expression among the hybrids and pure-species types (Figure 3.2, Figure 3.3a). If such intermediate expression carries no major fitness costs, conditions could have been conducive to the evolution of S. bombifrons females’ adaptive hybridization behavior. Further studies are needed to evaluate how conditions at contact between hybridizing species affect three things in sympatry: the subsequent evolution of traits that prevent or promote hybridization, the evolution of hybrid phenotypes and their fitness, and the genetic variation and interactions that underlie them.

3.5 Materials and methods

3.5.1 Sample production and preparation

We crossed allopatric and sympatric S. bombifrons and S. multiplicata to generate eight pure-species and hybrid types.
These cross types, their abbreviations, and the number of sequenced biological replicates are described in Table 3.1.

To generate hybrid tadpoles from allopatric populations, we bred S. multiplicata from populations in Arizona, USA, with S. bombifrons from populations in Colorado, USA. The Arizona populations lie outside the western range limit of S. bombifrons, and the Colorado populations outside the northern range limit of S. multiplicata. Because of the geographic distance between these populations, we generated comparable sympatric hybrids by pairing S. multiplicata and S. bombifrons from sympatric populations in Arizona and Texas. That is, sympatric hybrids were not derived by crossing individuals from the same population. So that species identity and population identity were not confounded, half of the families had S. multiplicata parents from Arizona and S. bombifrons parents from Texas. The other half had S. multiplicata parents from Texas and S. bombifrons parents from Arizona.

To induce breeding, adults were injected with 0.07 mL of a 0.01 µg/mL solution of gonadotropin-releasing hormone (GnRH) agonist. Males and females were placed as pairs in separate aquaria with 10 L of dechlorinated water and allowed to oviposit. We generated at least three replicate families per cross type (Table 3.1). After egg release was complete, adults were removed from the tanks and the eggs were aerated until hatching. When tadpoles were swimming freely, we selected 16 tadpoles at random from each family and divided them into two groups of eight. Each group was placed in a tank (34 cm × 21 cm × 11.5 cm) filled with dechlorinated water. All tadpoles were fed shrimp and detritus (their natural diet) ad libitum. At approximately 1 week of age, we euthanized tadpoles in MS-222 and froze them in liquid nitrogen. Spadefoot toad tadpoles can reach metamorphosis in as few as three weeks (50).
Thus, 1-week-old tadpoles were well along in development and could show differential expression in genes that affect growth and survival during the tadpole stage.

To prepare samples for RNA extraction, we randomly selected a single whole frozen tadpole from each family, ground it with a mortar and pestle, and homogenized it in a 15 mL centrifuge tube. Each tadpole had a mass of approximately 0.2 g. We extracted RNA from each tadpole sample using Invitrogen PureLink extraction columns with TRIzol reagent. We obtained RNA from 27 samples in total (biological replicates; Table 3.1; Supplementary Table 3.1). 3' RNA-seq libraries were generated using the Lexogen QuantSeq FWD kit. They were sequenced on an Illumina NextSeq500 (Illumina, San Diego, CA, USA) using a single-end 75 bp kit, with an actual read length of 86 bp. Resulting read counts are reported in Supplementary Table 3.1. Library preparation and sequencing were performed at the Cornell University Institute of Biotechnology.

Table 3.1: Cross types analyzed in the experiment, with the abbreviations used throughout the text and figures.

Population   Parents (female × male)              Abbreviation   Number of replicate families
Allopatry    S. bombifrons × S. bombifrons        BBa            4
Allopatry    S. multiplicata × S. multiplicata    MMa            3
Allopatry    S. bombifrons × S. multiplicata      BMa            3
Allopatry    S. multiplicata × S. bombifrons      MBa            4
Sympatry     S. bombifrons × S. bombifrons        BBs            3
Sympatry     S. multiplicata × S. multiplicata    MMs            4
Sympatry     S. bombifrons × S. multiplicata      BMs            3
Sympatry     S. multiplicata × S. bombifrons      MBs            4

3.5.2 Measurement of differential expression

To measure gene expression across our cross types (Table 3.1), we generated high-coverage, biologically replicated expression measurements of the transcriptome. 3' RNA-seq reads were trimmed to remove adapter and poly-A contamination using Trimmomatic (103) with recommended parameters.
Individual reads, as well as pooled reads from all pure-species individuals, were mapped to the S. multiplicata genome described in Seidl, Levis, Pfennig, Pfennig and Ehrenreich (142). Mapping used the STAR aligner (107) with default parameters. We used bedtools genomecov (101) to generate BED coverage files reporting coverage at all positions. For peak discovery, we found all continuous windows with coverage ≥ 50 and defined the peak as the base with the maximum coverage in each window. We then extracted the coverage of each peak for each individual using the bedtools coverage tool. Within each sample, expression measurements were divided by library size × 10^-6 (i.e., converted to counts per million) and log2 transformed. The table containing measurements from all samples was then quantile normalized using the R function normalize.quantiles() from the preprocessCore package (145).

3.5.3 Identification of genes showing expression differences among hybrids

Genes showing differential expression between the species are described in Seidl, Levis, Pfennig, Pfennig and Ehrenreich (142). Here, we used the same gene expression data to identify genes that are differentially expressed among hybrid types. The processed 3' RNA-seq data are available at the following link: https://drive.google.com/open?id=1r6ba1arrAlzHRkcMZiPxYS8SRzFpfCfF

For each gene, we fit the model expression = hybrid + error. Here, expression is a vector of log2 expression measurements for the samples, hybrid is a vector containing the hybrid type of each measurement, and error is the vector of residuals. We also tested whether any genes showed differential expression between population types of the same species, using the model expression = population + error. Here, population is a vector containing the population type of each measurement; expression and error are defined as before.
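The peak-based quantification and normalization steps described above can be sketched as follows. Several details are assumptions of this sketch:
- Coverage arrives as a per-base list for one contig, not as a bedtools output file.
- The pseudocount added before the log2 transform is not stated in the text and is chosen here to keep zeros finite.
- quantile_normalize is a simplified version of preprocessCore's normalize.quantiles() that breaks ties by row order rather than treating them specially.

All function names are hypothetical.

```python
import math


def call_peaks(coverage, min_cov=50):
    """Peak positions in a per-base coverage list.

    A window is a maximal run of consecutive bases with coverage >= min_cov;
    its peak is the 0-based position of maximum coverage (the first one if
    several tie). min_cov must be > 0 so the trailing sentinel closes a run.
    """
    cov = list(coverage)
    peaks, start = [], None
    for pos, depth in enumerate(cov + [0]):  # trailing 0 closes a final run
        if depth >= min_cov and start is None:
            start = pos
        elif depth < min_cov and start is not None:
            peaks.append(max(range(start, pos), key=lambda i: cov[i]))
            start = None
    return peaks


def log2_cpm(counts, library_size, pseudocount=1.0):
    """log2 of counts per million; the pseudocount is an assumption."""
    scale = library_size * 1e-6  # "library size x 10^-6" from the text
    return [math.log2(c / scale + pseudocount) for c in counts]


def quantile_normalize(matrix):
    """Quantile-normalize the columns (samples) of matrix[row][col].

    Each column's k-th smallest value is replaced by the mean of the k-th
    smallest values across all columns.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    columns = list(zip(*matrix))
    orders = [sorted(range(n_rows), key=col.__getitem__) for col in columns]
    rank_means = [sum(columns[s][orders[s][k]] for s in range(n_cols)) / n_cols
                  for k in range(n_rows)]
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for s, order in enumerate(orders):
        for k, row in enumerate(order):
            out[row][s] = rank_means[k]
    return out
```

After quantile normalization, every sample has the same distribution of values. Differences between samples then come only from how genes are ranked within each sample.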
The latter analysis was conducted separately for S. bombifrons and S. multiplicata. These analyses were performed entirely in R. ANOVAs were fit using the aov() function, p-values for each model were extracted using the summary() function, and pointwise FDR values (i.e., q-values) were then obtained using the qvalue package (146).

3.5.4 Expression patterns among hybrids

We used K-means clustering, with K from 1 to 10, to identify clusters in our gene sets, implemented with the kmeans() function in R. We chose the number of clusters in each analysis by applying the elbow method to the total within-group sum of squares obtained for each value of K. We plotted heatmaps of the expression matrix, ordered by the K-means clustering output, using the heatmap3() function from the heatmap3 package in R (147).

3.5.5 Principal components analysis to contrast groups

We performed principal component analysis (PCA) to evaluate the impacts of sympatry versus allopatry on gene expression, using the princomp() function in R. The first PCA covered the expression values of all individuals for all genes that were significant among both the hybrids and the pure-species types. The second covered all individuals for all genes that were significant among the hybrids but not the pure-species types. In each PCA, the first two principal component (PC) scores together explained the majority of the variation in the examined dataset (56.9% and 53.6%, respectively). Using these PCs, we then contrasted the locations of the hybrid and pure-species types in PC space for each gene set. We used the function advanced.procD.lm() in the geomorph package to calculate the distances among all cross types, with empirically derived p-values. These came from a nonparametric randomized residual permutation procedure (RRPP; 148).
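The choice of K in Section 3.5.4 can be sketched as below: K-means is run for a range of K, and an elbow is located on the total within-group sum of squares (WSS). Two substitutions apply:
- R's kmeans() defaults to the Hartigan-Wong algorithm; this sketch swaps in Lloyd's algorithm with k-means++ seeding.
- The text does not say how the elbow was read (possibly by eye). Here it is picked automatically as the K with the largest second difference of the WSS curve.

All function names are hypothetical, and each row would be one gene's expression profile across samples.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(rows, k, seed=0, n_iter=100):
    """Lloyd's algorithm with k-means++ seeding; returns (labels, total WSS)."""
    rng = random.Random(seed)
    centers = [list(rng.choice(rows))]
    while len(centers) < k:  # k-means++: sample with probability proportional to D^2
        d2 = [min(sq_dist(r, c) for c in centers) for r in rows]
        pick = rng.choices(rows, weights=d2)[0] if sum(d2) > 0 else rng.choice(rows)
        centers.append(list(pick))
    labels = None
    for _ in range(n_iter):
        new_labels = [min(range(k), key=lambda c: sq_dist(r, centers[c])) for r in rows]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    wss = sum(sq_dist(r, centers[lab]) for r, lab in zip(rows, labels))
    return labels, wss


def best_wss(rows, k, restarts=10):
    """Lowest WSS over several seeds, similar in spirit to nstart in R's kmeans()."""
    return min(kmeans(rows, k, seed=s)[1] for s in range(restarts))


def elbow(wss):
    """K at the sharpest bend of the WSS curve (largest second difference).

    wss[i] holds the total within-cluster sum of squares for K = i + 1.
    """
    bends = {i + 1: (wss[i - 1] - wss[i]) - (wss[i] - wss[i + 1])
             for i in range(1, len(wss) - 1)}
    return max(bends, key=bends.get)
```

Usage mirrors the described procedure: compute wss = [best_wss(rows, k) for k in range(1, 11)] and pass it to elbow().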
3.6 Conclusion

Our results (and, more generally, the natural history of the Spea system) point to the possibility that BDMs might be both exaggerated and mitigated in the same system (Figures 3.1-3.3). That selection in sympatry can both enhance and mitigate genetic incompatibilities matters for gene flow between species and for the evolutionary maintenance or breakdown of species boundaries. Such dynamics could produce the porous species boundaries observed in many systems (149). Understanding them better could help explain how species arise and persist in the face of hybridization, even when it is adaptive.

3.7 Supplementary Material

Supplementary Table 3.1. Read counts from 3' RNA-seq for each cross type. The first letter of each cross corresponds to the maternal species. B = Spea bombifrons; M = S. multiplicata; a = allopatry; s = sympatry.

Cross type   Family ID   Reads
BBa          1           4201834
BBa          2           3713980
BBa          3           3410409
BBa          4           4332464
MMa          1           2499736
MMa          2           3261144
MMa          3           3745695
BBs          1           2785915
BBs          2           2625436
BBs          3           2632609
MMs          1           2684599
MMs          2           3131783
MMs          3           2668006
MMs          4           3049993
BMa          1           3377464
BMa          2           2472324
BMa          3           3101684
BMs          1           1995624
BMs          2           2715458
BMs          3           3453959
MBa          1           2791231
MBa          2           2876914
MBa          3           3430804
MBs          1           3104900
MBs          2           3400954
MBs          3           4236713
MBs          4           5893133

Supplementary Table 3.2. Observed distances in least-squares means (above the diagonal) and corresponding p values (below the diagonal) among cross types in principal component space. Values in (a) correspond to Figure 3.3a, and values in (b) correspond to Figure 3.3b. Distances and significance were based on a nonparametric randomized residual permutation procedure (RRPP) with 1000 iterations. Significant p values (α = 0.05) are marked with an asterisk (*). Abbreviations as in Table 3.1 (main text) and Supplementary Table 3.1.

a. Variable among hybrids and different between MM and BB

      BBa     BBs     BMa     BMs     MBa     MBs     MMa     MMs
BBa   ---     0.179   0.299   0.364   0.407   0.377   0.532   0.547
BBs   0.509   ---     0.304   0.226   0.294   0.507   0.518   0.520
BMa   0.130   0.157   ---     0.260   0.231   0.311   0.234   0.249
BMs   0.043*  0.369   0.276   ---     0.084   0.560   0.383   0.371
MBa   0.007*  0.144   0.328   0.855   ---     0.542   0.308   0.293
MBs   0.012*  0.001*  0.105   0.001*  0.001*  ---     0.424   0.458
MMa   0.001*  0.001*  0.342   0.042*  0.096   0.007*  ---     0.067
MMs   0.001*  0.003*  0.250   0.028*  0.089   0.001*  0.983   ---

b. Variable among hybrids, but indistinguishable between MM and BB

      BBa     BBs     BMa     BMs     MBa     MBs     MMa     MMs
BBa   ---     0.179   0.361   0.313   0.281   0.310   0.079   0.122
BBs   0.417   ---     0.416   0.151   0.160   0.459   0.118   0.058
BMa   0.019*  0.017*  ---     0.558   0.342   0.600   0.334   0.401
BMs   0.069   0.604   0.001*  ---     0.246   0.543   0.265   0.194
MBa   0.071   0.541   0.022*  0.200   ---     0.589   0.202   0.198
MBs   0.034*  0.001*  0.001*  0.001*  0.001*  ---     0.387   0.403
MMa   0.853   0.728   0.052   0.182   0.297   0.012*  ---     0.0749
MMs   0.612   0.911   0.009*  0.382   0.284   0.004*  0.848   ---

Chapter 4: Concluding Remarks

In this chapter, I cover the broader impact of my work as well as future directions of research.

4.1 Impact of my work

The work performed during my PhD has provided novel insights into a well-studied and unique model organism. It has also generated important resources that will enable more detailed study moving forward.

1) Generating a contiguous genome assembly for Spea multiplicata. In Chapter 2, I produced a 1.07 Gb assembly representing an estimated 89.8% of the genome. The assembly is composed of 49,736 scaffolds with an N50 of 70,967 bp, comparable to other recently published anuran genomes. I identified and annotated repetitive regions and elements corresponding to 32% of the assembly. I further produced contig sets for the closely related species Spea bombifrons, Scaphiopus couchii, and Scaphiopus holbrookii. These resources will be useful for comparison in future investigations that delimit changes unique to Spea.
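The scaffold N50 quoted for the assembly can be computed as below. Sort scaffold lengths from longest to shortest, and report the length at which the running total first reaches half of all assembled bases. n50 is a hypothetical helper, and the toy lengths in the usage note are illustrative, not scaffold data from this assembly.

```python
def n50(lengths):
    """N50: the length of the sequence at which the running total of lengths,
    taken from longest to shortest, first reaches half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    if total <= 0:
        raise ValueError("need at least one sequence with positive length")
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:  # integer comparison avoids rounding issues
            return length
```

For example, n50([2, 3, 4, 5, 6, 7, 8, 9, 10]) returns 8: the three longest sequences (10 + 9 + 8 = 27) already hold half of the 54 bases. Because N50 is weighted by length, it rewards assemblies in which most bases sit in long scaffolds.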
Altogether, this draft assembly is of high enough quality to perform gene prediction and is sufficiently contiguous to enable a new range of experimental approaches in this system.

2) Generating a high-quality gene annotation of the Sp. multiplicata genome. In Chapter 2, I estimated species-specific gene characteristics from gene models generated from RNA sequencing data. I performed ab initio gene prediction and identified a total of 42,671 gene models. To assess their quality, I compared the predicted peptides against databases, including that of the model anuran Xenopus tropicalis. From this, I generated a set of 19,639 high-confidence models. This number is consistent with other anurans and with vertebrates in general. The percent identity of these gene models to their best matches in the Swiss-Prot database is also similar to or greater than that of other recently released anuran genomes. These gene models have already been used to perform global expression analyses between species and are crucial to further study of this system.

3) What can we say about signatures of evolutionary novelty and change in Sp. multiplicata and the Spea genus more broadly? Chapter 2 shows that size differences among anuran genomes can be largely explained by the amount of repetitive content residing in intergenic or intronic regions. I also report a search for gene proliferation specific to Spea multiplicata compared to other anurans. I present evidence for expansion of three major gene families (nodal, zp3, hyas3), with potential ties to some of the unique phenotypes of Sp. multiplicata. I report evidence of 22 genes under recent positive selection in the proteome of Spea. I then discuss significant differential expression of 5,865 genes (53.3% of the genes tested) between Sp. multiplicata and Sp. bombifrons. These findings address standing questions about the evolutionary history of Sp. multiplicata, the Spea genus, and anurans as a whole.
4) What are the transcriptional signatures of active interspecies hybridization? In Chapter 3, I present a 3' RNA-seq experiment investigating hybridization in the Spea genus, specifically between Sp. multiplicata and Sp. bombifrons. Using crosses between parents from allopatric and sympatric populations, I identify two gene sets. The first comprises 1,675 genes that vary in expression among hybrids and also differ between the parental species. The second comprises 1,573 genes that vary among hybrids but not between the parental species. For genes that also differ between the parental species, hybrids from sympatric populations (where hybridization actively occurs) differ more from one another in expression than hybrids from allopatric populations do. The results further suggest that incompatibilities may be both mitigated and exaggerated within the same hybrid system.

Other impacts

I have already shared my genome with several other labs investigating questions in Spea or in anurans as a whole. Cataloging genomic sequences for all species on our planet should be a continuing goal for humanity, especially as our tool set for genome manipulation improves. Our ecosystems rely on biodiversity, and an in silico gene bank will be a boon to future restoration endeavors.

4.2 Future directions

Chapter two

The genome reported in Chapter 2 is a resource that will open further experimental inquiry into this system. However, it will be important to keep improving this assembly through long-read sequencing. The same applies to the secondary assemblies of Sp. bombifrons, Sc. couchii, and Sc. holbrookii reported in Chapter 2. That said, techniques such as RAD-seq now make it possible to get a clear picture of the population structure and standing variation of Sp. multiplicata and Sp. bombifrons populations. These data will allow us to determine effective population sizes as well as the level of divergence between Sp. multiplicata and Sp. bombifrons.
It will also be possible to map traits of interest, such as the fraction of tadpoles displaying the carnivore morphology, using genome-wide association techniques. Further, using crossing schemes between Sp. multiplicata and Sp. bombifrons, it will be possible to map the genes underlying the carnivore morphology by quantitative trait locus (QTL) mapping. Much of this proposed work is already ongoing. Nodal is a highly conserved developmental signaling factor. Further exploring the functional roles of its duplicates may help explain both the rapid rate at which Spea develop and how Spea are able to adjust development time in response to environmental cues.

Chapter three
Recently, it has become apparent how fluid species boundaries can be in wild systems. Further investigating the selective pressures favoring interspecies hybrids in sympatric populations of Sp. multiplicata and Sp. bombifrons can provide a powerful case study of both reinforcement and adaptive hybridization. Low-coverage sequencing of many wild isolates can determine which variants control the observed differential gene expression, along with their frequencies in sympatric and allopatric populations. This can tell us how hybridization shapes the standing variation in both species.

References
1. Stuart SN, et al. (2004) Status and trends of amphibian declines and extinctions worldwide. Science 306(5702):1783-1786.
2. Robert J & Cohen N (2011) The genus Xenopus as a multispecies model for evolutionary and comparative immunobiology of the 21st century. Dev Comp Immunol 35(9):916-923.
3. Hardwick LJ & Philpott A (2015) An oncologist's friend: How Xenopus contributes to cancer research. Dev Biol 408(2):180-187.
4. Harland RM & Grainger RM (2011) Xenopus research: metamorphosed by genetics and genomics. Trends Genet 27(12):507-515.
5. Hellsten U, et al. (2010) The genome of the western clawed frog Xenopus tropicalis. Science 328(5978):633-636.
6. Session AM, et al. (2016) Genome evolution in the allotetraploid frog Xenopus laevis. Nature 538(7625):336-343.
7. Sun Y-B, et al. (2015) Whole-genome sequence of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes. Proc Natl Acad Sci USA 112(11):E1257-E1262.
8. Hammond SA, et al. (2017) The North American bullfrog draft genome provides insight into hormonal regulation of long noncoding RNA. Nat Commun 8(1):1433.
9. Edwards RJ, et al. (2018) Draft genome assembly of the invasive cane toad, Rhinella marina. GigaScience 7(9):giy095.
10. AmphibiaWeb (2018) https://amphibiaweb.org (Berkeley, California).
11. Gregory TR (2018) Animal Genome Size Database.
12. Oliver MJ, Petrov D, Ackerly D, Falkowski P, & Schofield OM (2007) The mode and tempo of genome size evolution in eukaryotes. Genome Res 17(5):594-601.
13. Lynch M & Conery JS (2003) The origins of genome complexity. Science 302(5649):1401-1404.
14. Liedtke HC, Gower DJ, Wilkinson M, & Gomez-Mestre I (2018) Macroevolutionary shift in the size of amphibian genomes and the role of life history and climate. Nat Ecol Evol 2(11):1792-1799.
15. Chipman AD, Khaner O, Haas A, & Tchernov E (2001) The evolution of genome size: what can be learned from anuran development? J Exp Zool 291(4):365-374.
16. Moczek AP, et al. (2011) The role of developmental plasticity in evolutionary innovation. Proc Biol Sci 278(1719):2705-2713.
17. Levis NA & Pfennig DW (2016) Evaluating 'plasticity-first' evolution in nature: key criteria and empirical approaches. Trends Ecol Evol 31(7):563-574.
18. West-Eberhard MJ (2003) Developmental plasticity and evolution (Oxford University Press, New York).
19. Pigliucci M, Murren CJ, & Schlichting CD (2006) Phenotypic plasticity and evolution by genetic assimilation. J Exp Biol 209(Pt 12):2362-2367.
20. Coyne JA & Orr HA (2004) Speciation (Sinauer, Sunderland, MA).
21. Cutter AD (2012) The polymorphic prelude to Bateson-Dobzhansky-Muller incompatibilities. Trends Ecol Evol 27(4):209-218.
22. Mack KL & Nachman MW (2017) Gene regulation and speciation. Trends Genet 33(1):68-80.
23. Orr HA (1995) The population genetics of speciation: the evolution of hybrid incompatibilities. Genetics 139:1805-1813.
24. Arnold ML (1997) Natural hybridization and evolution (Oxford University Press, Oxford, UK).
25. Pfennig KS (2007) Facultative mate choice drives adaptive hybridization. Science 318:965-967.
26. Mayhew WW (1965) Adaptations of the amphibian, Scaphiopus couchii, to desert conditions. Am Midl Nat 74:95-109.
27. Sexsmith LE (1968) DNA values and karyotypes of amphibians (University of Toronto).
28. Bachmann K (1972) Nuclear DNA and developmental rate in frogs. Q J Florida Acad Sci 35:225-231.
29. Bragg AN (1965) Gnomes of the night: the spadefoot toads (University of Pennsylvania Press, Philadelphia, PA).
30. Dimmitt MA & Ruibal R (1980) Exploitation of food resources by spadefoot toads (Scaphiopus). Copeia 1980(4):854-862.
31. McClanahan LJ (1967) Adaptations of the spadefoot toad Scaphiopus couchii to desert environments. Comp Biochem Physiol 20:73-79.
32. Seymour RS (1973) Energy metabolism of dormant spadefoot toads (Scaphiopus). Copeia 1973:435-445.
33. Ruibal R, Tevis L, & Roig V (1969) The terrestrial ecology of the spadefoot toad Scaphiopus hammondii. Copeia 1969(3):571-584.
34. Newman RA (1989) Developmental plasticity of Scaphiopus couchii tadpoles in an unpredictable environment. Ecology 70:1775-1787.
35. Denver RJ, Mirhadi N, & Phillips M (1998) Adaptive plasticity in amphibian metamorphosis: response of Scaphiopus hammondii tadpoles to habitat desiccation. Ecology 79(6):1859-1872.
36. Morey S & Reznick D (2000) A comparative analysis of plasticity in larval development in three species of spadefoot toads. Ecology 81:1736-1749.
37. Boorse GC & Denver RJ (2003) Endocrine mechanisms underlying plasticity in metamorphic timing in spadefoot toads. Integr Comp Biol 43:646-657.
38. Gomez-Mestre I & Buchholz DR (2006) Developmental plasticity mirrors differences among taxa in spadefoot toads linking plasticity and diversity. Proc Natl Acad Sci USA 103:19021-19026.
39. Newman RA (1992) Adaptive plasticity in amphibian metamorphosis. BioScience 42:671-678.
40. Wells KD (2007) The ecology and behavior of amphibians (University of Chicago Press, Chicago, IL).
41. Pfennig DW (1990) The adaptive significance of an environmentally-cued developmental switch in an anuran tadpole. Oecologia 85:101-107.
42. Ledon-Rettig CC & Pfennig DW (2011) Emerging model systems in eco-evo-devo: the environmentally responsive spadefoot toad. Evol Dev 13(4):391-400.
43. Frankino WA & Pfennig DW (2001) Condition-dependent expression of trophic polyphenism: effects of individual size and competitive ability. Evol Ecol Res 3(8):939-951.
44. Levis NA, Buzon S, & Pfennig DW (2015) An inducible offense: carnivore morph tadpoles induced by tadpole carnivory. 5(7):1405-1411.
45. Pfennig DW (1999) Polyphenism in spadefoot toad tadpoles as a locally adjusted evolutionarily stable strategy. Evolution 46:1408-1420.
46. Pfennig DW (2000) Character displacement in polyphenic tadpoles. Evolution 54:1738-1749.
47. Pfennig DW & Martin RA (2010) Evolution of character displacement in spadefoot toads: different proximate mechanisms in different species. Evolution 64(8):2331-2341.
48. Martin RA & Pfennig DW (2010) Maternal investment influences expression of resource polymorphism in amphibians: implications for the evolution of novel resource-use phenotypes. PLoS One 5(2):e9117.
49. Ledon-Rettig CC, Pfennig DW, & Crespi EJ (2010) Diet and hormonal manipulation reveal cryptic genetic variation: implications for the evolution of novel feeding strategies. Proc Biol Sci 277(1700):3569-3578.
50. Pfennig KS & Simovich MA (2002) Differential selection to avoid hybridization in two toad species. Evolution 56(9):1840-1848.
51. Simovich MA, Sassaman CA, & Chovnick A (1991) Post-mating selection of hybrid toads (Scaphiopus multiplicatus and Scaphiopus bombifrons). Proc S Diego Soc Nat Hist 1991:1-6.
52. Duellman WE & Trueb L (1986) Biology of amphibians (McGraw-Hill, New York).
53. Halliday TR (2016) The book of frogs (University of Chicago Press, Chicago, IL).
54. Bossuyt F & Roelants K (2009) Frogs and toads (Anura). The timetree of life, eds Hedges SB & Kumar S (Oxford University Press, Oxford, UK), pp 357-364.
55. Tinsley RC & Tocque K (1995) The population dynamics of a desert anuran, Scaphiopus couchii. Austral Ecol 20(3):376-384.
56. Pierce AA, Gutierrez R, Rice AM, & Pfennig KS (2017) Genetic variation during range expansion: effects of habitat novelty and hybridization. Proc Biol Sci 284(1852).
57. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, & Zdobnov EM (2015) BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31(19):3210-3212.
58. Stanke M & Morgenstern B (2005) AUGUSTUS: a web server for gene prediction in eukaryotes that allows user-defined constraints. Nucleic Acids Res 33:W465-W467.
59. Spicer AP & McDonald JA (1998) Characterization and molecular evolution of a vertebrate hyaluronan synthase gene family. J Biol Chem 273:1923-1932.
60. Osada SI & Wright CVE (1999) Xenopus nodal-related signaling is essential for mesendodermal patterning during early embryogenesis. Development 126(14):3229-3240.
61. Takahashi S, et al. (2000) Two novel nodal-related genes initiate early inductive events in Xenopus. Development 127(24):5319-5329.
62. Schier AF & Shen MM (2000) Nodal signalling in vertebrate development. Nature 403(6768):385-389.
63. Takahashi S, et al. (2006) Nodal-related gene Xnr5 is amplified in the Xenopus genome. Genesis 44(7):309-321.
64. Terai Y, et al. (2006) Divergent selection on opsins drives incipient speciation in Lake Victoria cichlids. PLoS Biol 4(12):2244-2251.
65. Luxardi G, Marchal L, Thomé V, & Kodjabachian L (2010) Distinct Xenopus nodal ligands sequentially induce mesendoderm and control gastrulation movements in parallel to the Wnt/PCP pathway. Development 137(3):417-426.
66. Agius E, Oelgeschlager M, Wessely O, Kemp C, & De Robertis EM (2000) Endodermal nodal-related signals and mesoderm induction in Xenopus. Development 127(6):1173-1183.
67. Harris JD, et al. (2009) Cloning and characterization of zona pellucida genes and cDNAs from a variety of mammalian species: The ZPA, ZPB and ZPC gene families. DNA Sequence 4(6):361-393.
68. Edgar RC (2010) Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26(19):2460-2461.
69. The UniProt Consortium (2017) UniProt: the universal protein knowledgebase. Nucleic Acids Res 45(D1):D158-D169.
70. Mueller RL & Jockusch EL (2018) Jumping genomic gigantism. Nat Ecol Evol 2(11):1687-1688.
71. Petrov DA (2001) Evolution of genome size: new approaches to an old problem. Trends Genet 17:23-28.
72. Lynch M (2007) The origin of genome architecture (Sinauer Associates, Sunderland, MA).
73. Gregory TR (2005) The evolution of the genome (Academic/Elsevier, Burlington, VT).
74. Chrtek JIJ, Zahradnek J, Krak K, & Fehrer J (2009) Genome size in Hieracium subgenus Hieracium (Asteraceae) is strongly correlated with major phylogenetic groups. Ann Bot 104:161-178.
75. Charney RM, Paraiso KD, Blitz IL, & Cho KWY (2017) A gene regulatory program controlling early Xenopus mesendoderm formation: network conservation and motifs. Semin Cell Dev Biol 66:12-24.
76. Pfennig DW, Ho SG, & Hoffman EA (1998) Pathogen transmission as a selective force against cannibalism. Anim Behav 55:1255-1261.
77. Wiens JJ & Titus TA (1991) A phylogenetic analysis of Spea (Anura: Pelobatidae). Herpetologica 47:21-28.
78. Zeng C, Gomez-Mestre I, & Wiens JJ (2014) Evolution of rapid development in spadefoot toads is unrelated to arid environments. PLoS One 9(5):e96637.
79. Pfennig KS, Allenby A, Martin RA, Monroy A, & Jones CD (2012) A suite of molecular markers for identifying species, detecting introgression and describing population structure in spadefoot toads (Spea spp.). Mol Ecol Resour 12(5):909-917.
80. Bhadra MP, Bhadra U, & Birchler JA (2006) Misregulation of sex-lethal and disruption of male-specific lethal complex localization in Drosophila species hybrids. Genetics 174(3):1151-1159.
81. Abbott R, et al. (2013) Hybridization and speciation. J Evol Biol 26(2):229-246.
82. Pfennig KS, Kelly AL, & Pierce AA (2016) Hybridization as a facilitator of species range expansion. Proc Biol Sci 283(1839).
83. Zimin AV, et al. (2013) The MaSuRCA genome assembler. Bioinformatics 29(21):2669-2677.
84. Kielbasa SM, Wan R, Sato K, Horton P, & Frith MC (2011) Adaptive seeds tame genomic sequence comparison. Genome Res 21(3):487-493.
85. Smit AFA, Hubley R, & Green P (2013-2015) RepeatMasker Open-4.0.
86. Bao W, Kojima KK, & Kohany O (2015) Repbase Update, a database of repetitive elements in eukaryotic genomes. Mob DNA 6:11.
87. Tamazian G, et al. (2016) Chromosomer: a reference-based genome arrangement tool for producing draft chromosome sequences. GigaScience 5(1):38.
88. Stanke M, Diekhans M, Baertsch R, & Haussler D (2008) Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24(5):637-644.
89. Seidl F, et al. (2019) Hybrid gene expression in allopatry versus sympatry: implications for the evolution of gene regulation in interbreeding species. Mol Ecol (in revision).
90. Kim D, et al. (2013) TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol 14(4):R36.
91. Karimi K, et al. (2018) Xenbase: a genomic, epigenomic and transcriptomic model organism database. Nucleic Acids Res 46(D1):D861-D868.
92. Camacho C, et al. (2009) BLAST+: architecture and applications. BMC Bioinformatics 10:421.
93. Rognes T, Flouri T, Nichols B, Quince C, & Mahé F (2016) VSEARCH: a versatile open source tool for metagenomics. PeerJ 4:e2584.
94. Cai Z, Mao X, Li S, & Wei L (2006) Genome comparison using Gene Ontology (GO) with statistical testing. BMC Bioinformatics 7:374.
95. Benjamini Y & Hochberg Y (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Ser B Methodol 57(1):289-300.
96. Bass AJ, Dabney A, & Robinson D (2018) qvalue: Q-value estimation for false discovery rate control (R package version 2.14.0).
97. Bairoch A & Apweiler R (2000) The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res 28(1):45-48.
98. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32(5):1792-1797.
99. Schliep K, Potts AJ, Morrison DA, & Grimm GW (2017) Intertwining phylogenetic trees and networks. Methods Ecol Evol 8:1212-1220.
100. Li H & Durbin R (2009) Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25(14):1754-1760.
101. Quinlan AR & Hall IM (2010) BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26(6):841-842.
102. Luo R, et al. (2012) SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience 1(1):18.
103. Bolger AM, Lohse M, & Usadel B (2014) Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30(15):2114-2120.
104. Yang Z (2007) PAML 4: a program package for phylogenetic analysis by maximum likelihood. Mol Biol Evol 24:1586-1591.
105. Binns D, et al. (2009) QuickGO: a web-based tool for Gene Ontology searching. Bioinformatics 25(22):3045-3046.
106. Levis NA, Serrato-Capuchina A, & Pfennig DW (2017) Genetic accommodation in the wild: evolution of gene expression plasticity during character displacement. J Evol Biol 30(9):1712-1723.
107. Dobin A, et al. (2013) STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29(1):15-21.
108. Beck AH, et al. (2010) 3′-end sequencing for expression quantification (3SEQ) from archival tumor samples. PLoS One 5(1):e8768.
109. Barton NH & Hewitt GM (1985) Analysis of hybrid zones. Annu Rev Ecol Syst 16:113-148.
110. Wolf JBW, Lindell J, & Backstrom N (2010) Speciation genetics: current status and evolving approaches. Philos Trans R Soc B 365(1547):1717-1733.
111. Michalak P & Noor MAF (2003) Genome-wide patterns of expression in Drosophila pure species and hybrid males. Mol Biol Evol 20(7):1070-1076.
112. Michalak P & Noor MAF (2004) Association of misexpression with sterility in hybrids of Drosophila simulans and D. mauritiana. J Mol Evol 59(2):277-282.
113. Malone JH, Chrzanowski TH, & Michalak P (2007) Sterility and gene expression in hybrid males of Xenopus laevis and X. muelleri. PLoS One 2(8):e781.
114. Moehring AJ, Teeter KC, & Noor MAF (2007) Genome-wide patterns of expression in Drosophila pure species and hybrid males. II. Examination of multiple-species hybridizations, platforms, and life cycle stages. Mol Biol Evol 24(1):137-145.
115. Ortiz-Barrientos D, Counterman BA, & Noor MAF (2007) Gene expression divergence and the origin of hybrid dysfunctions. Genetica 129(1):71-81.
116. Meiklejohn CD, Coolon JD, Hartl DL, & Wittkopp PJ (2014) The roles of cis- and trans-regulation in the evolution of regulatory incompatibilities and sexually dimorphic gene expression. Genome Res 24(1):84-95.
117. Brill E, Kang L, Michalak K, Michalak P, & Price DK (2016) Hybrid sterility and evolution in Hawaiian Drosophila: differential gene and allele-specific expression analysis of backcross males. Heredity (Edinb) 117(2):100-108.
118. Gomes S & Civetta A (2015) Hybrid male sterility and genome-wide misexpression of male reproductive proteases. Sci Rep 5:11976.
119. Lopez-Maestre H, et al. (2017) Identification of misexpressed genetic elements in hybrids between Drosophila-related species. Sci Rep 7:40618.
120. Larson EL, et al. (2018) The evolution of polymorphic hybrid incompatibilities in house mice. Genetics 209(3):845-859.
121. Gerard PR & Presgraves DC (2012) Abundant genetic variability in Drosophila simulans for hybrid female lethality in interspecific crosses to Drosophila melanogaster. Genet Res 94(1):1-7.
122. Matute DR, Gavin-Smyth J, & Liu G (2014) Variable post-zygotic isolation in Drosophila melanogaster/D. simulans hybrids. J Evol Biol 27(8):1691-1705.
123. Sanderson N (1989) Can gene flow prevent reinforcement? Evolution 43(6):1223-1235.
124. Barton NH & Hewitt GM (1989) Adaptation, speciation and hybrid zones. Nature 341:497-503.
125. Ritchie MG, Butlin RK, & Hewitt GM (1992) Fitness consequences of potential assortative mating inside and outside a hybrid zone in Chorthippus parallelus (Orthoptera, Acrididae): implications for reinforcement and sexual selection theory. Biol J Linn Soc 45(3):219-234.
126. Lammers Y, et al. (2013) SNP genotyping for detecting the 'rare allele phenomenon' in hybrid zones. Mol Ecol Resour 13(2):237-242.
127. Schilthuizen M & Lammers Y (2013) Hybrid zones, barrier loci and the 'rare allele phenomenon'. J Evol Biol 26(2):288-290.
128. Servedio MR & Noor MAF (2003) The role of reinforcement in speciation: theory and data. Annu Rev Ecol Evol Syst 34:339-364.
129. Pfennig DW & Pfennig KS (2012) Evolution's wedge: competition and the origins of diversity (University of California Press, Berkeley, CA).
130. Jones JM (1973) Effects of thirty years hybridization on the toads Bufo americanus and Bufo woodhousii fowleri at Bloomington, Indiana. Evolution 27:435-448.
131. Pfennig KS (2003) A test of alternative hypotheses for the evolution of reproductive isolation between spadefoot toads: support for the reinforcement hypothesis. Evolution 57:2842-2851.
132. Hoarau G, Coyer JA, Giesbers MCWG, Jueterbock A, & Olsen JL (2015) Pre-zygotic isolation in the macroalgal genus Fucus from four contact zones spanning 100–10 000 years: a tale of reinforcement? R Soc Open Sci 2(2).
133. Liou LW & Price TD (1994) Speciation by reinforcement of premating isolation. Evolution 48(5):1451-1459.
134. Simovich MA (1994) The dynamics of a spadefoot toad (Spea multiplicata and S. bombifrons) hybridization system. Herpetology of North American deserts, special publication no. 5, eds Brown PR & Wright JW (Southwestern Herpetologists Society, Los Angeles), pp 167-182.
135. Wunsch LK & Pfennig KS (2013) Failed sperm development as a reproductive isolating barrier between species. Evol Dev 15(6):458-465.
136. Simovich MA (1985) Analysis of a hybrid zone between the spadefoot toads Scaphiopus multiplicatus and Scaphiopus bombifrons. Ph.D. Thesis (University of California Riverside).
137. Schmidt EM & Pfennig KS (2016) Hybrid female mate choice as a species isolating mechanism: environment matters. J Evol Biol 29(4):865-869.
138. Sattler PW (1985) Introgressive hybridization between the spadefoot toads Scaphiopus bombifrons and Scaphiopus multiplicatus (Salientia: Pelobatidae). Copeia 1985:324-332.
139. Simovich MA & Sassaman CA (1986) Four independent electrophoretic markers in spadefoot toads. J Hered 77:410-414.
140. Pfennig KS (2000) Female spadefoot toads compromise on mate quality to ensure conspecific matings. Behav Ecol 11:220-227.
141. Pfennig KS & Rice AM (2014) Reinforcement generates reproductive isolation between neighbouring conspecific populations of spadefoot toads. Proc Biol Sci 281(1789):20140949.
142. Seidl F, Levis NA, Pfennig DW, Pfennig KS, & Ehrenreich IM (2019) Genome of the rapidly developing, phenotypically plastic, and desert-adapted spadefoot toad. Nat Commun (in revision).
143. Price T (2008) Speciation in birds (Roberts and Company Publishers, Greenwood Village, CO).
144. Turelli M & Moyle LC (2007) Asymmetric postmating isolation: Darwin's corollary to Haldane's rule. Genetics 176(2):1059-1088.
145. Bolstad BM (2016) preprocessCore: A collection of pre-processing functions (R package version 1.36.0).
146. Storey JD, Bass AJ, Dabney A, & Robinson D (2018) qvalue: Q-value estimation for false discovery rate control (R package version 2.14.0).
147. Zhao S, Guo Y, Sheng Q, & Shyr Y (2015) heatmap3: An Improved Heatmap Package (R package version 1.1.1).
148. Collyer ML, Sekora DJ, & Adams DC (2015) A method for analysis of phenotypic change for phenotypes described by high-dimensional data. Heredity 115:357.
149. Harrison RG & Larson EL (2014) Hybridization, introgression, and the nature of species boundaries. J Hered 105:795-809.
150. Morita RY (1988) Bioavailability of energy and its relationship to growth and starvation survival in nature. Can J Microbiol 34(4):436-441.
151. Kram KE, et al. (2017) Adaptation of Escherichia coli to long-term serial passage in complex medium: evidence of parallel evolution. mSystems 2(2).
152. Good BH, McDonald MJ, Barrick JE, Lenski RE, & Desai MM (2017) The dynamics of molecular evolution over 60,000 generations. Nature 551(7678):45-50.
153. Blount ZD, Barrick JE, Davidson CJ, & Lenski RE (2012) Genomic analysis of a key innovation in an experimental Escherichia coli population. Nature 489(7417):513-518.
154. Gaffe J, et al. (2011) Insertion sequence-driven evolution of Escherichia coli in chemostats. J Mol Evol 72(4):398-412.
155. Helling RB, Vargas CN, & Adams J (1987) Evolution of Escherichia coli during growth in a constant environment. Genetics 116(3):349-358.
156. Gresham D & Hong J (2015) The functional basis of adaptive evolution in chemostats. FEMS Microbiol Rev 39(1):2-16.
157. Sezonov G, Joseleau-Petit D, & D'Ari R (2007) Escherichia coli physiology in Luria-Bertani broth. J Bacteriol 189(23):8746-8749.
158. Zambrano MM, Siegele DA, Almiron M, Tormo A, & Kolter R (1993) Microbial competition: Escherichia coli mutants that take over stationary phase cultures. Science 259(5102):1757-1760.
159. Finkel SE & Kolter R (1999) Evolution of microbial diversity during prolonged starvation. Proc Natl Acad Sci USA 96(7):4023-4027.
160. Finkel SE (2006) Long-term survival during stationary phase: evolution and the GASP phenotype. Nat Rev Microbiol 4(2):113-120.
161. Zinser ER & Kolter R (2000) Prolonged stationary-phase incubation selects for lrp mutations in Escherichia coli K-12. J Bacteriol 182(15):4361-4365.
162. Zinser ER & Kolter R (1999) Mutations enhancing amino acid catabolism confer a growth advantage in stationary phase. J Bacteriol 181(18):5800-5807.
163. Zinser ER, Schneider D, Blott M, & Kolter R (2003) Bacterial evolution through the selective loss of beneficial genes: trade-offs in expression involving two loci. Genetics 164(4):1271-1277.
164. Tani TH, Khodursky A, Blumenthal RM, Brown PO, & Matthews RG (2002) Adaptation to famine: a family of stationary-phase genes revealed by microarray analysis. Proc Natl Acad Sci USA 99(21):13471-13476.
165. Gudelj I, Kinnersley M, Rashkov P, Schmidt K, & Rosenzweig F (2016) Stability of cross-feeding polymorphisms in microbial communities. PLoS Comput Biol 12(12):e1005269.
166. McCully AL, et al. (2018) An Escherichia coli nitrogen starvation response is important for mutualistic coexistence with Rhodopseudomonas palustris. Appl Environ Microbiol 84(14).
167. Ozkaya O, Xavier KB, Dionisio F, & Balbontin R (2017) Maintenance of microbial cooperation mediated by public goods in single- and multiple-trait scenarios. J Bacteriol 199(22).
168. Kim Y & Orr HA (2005) Adaptation in sexuals vs. asexuals: clonal interference and the Fisher-Muller model. Genetics 171(3):1377-1386.
169. Larson TJ, Ye SZ, Weissenborn DL, Hoffmann HJ, & Schweizer H (1987) Purification and characterization of the repressor for the sn-glycerol 3-phosphate regulon of Escherichia coli K12. J Biol Chem 262(33):15869-15874.
170. Ferrandez A, Garcia JL, & Diaz E (2000) Transcriptional regulation of the divergent paa catabolic operons for phenylacetic acid degradation in Escherichia coli. J Biol Chem 275(16):12214-12222.
171. Tan K, et al. (2009) The mannitol operon repressor MtlR belongs to a new class of transcription regulators in bacteria. J Biol Chem 284(52):36670-36679.
172. Zheng DL, Constantinidou C, Hobman JL, & Minchin SD (2004) Identification of the CRP regulon using in vitro and in vivo transcriptional profiling. Nucleic Acids Res 32(19):5874-5893.
173. Vogel J & Luisi BF (2011) Hfq and its constellation of RNA. Nat Rev Microbiol 9(8):578-589.
174. Weber H, Polen T, Heuveling J, Wendisch VF, & Hengge R (2005) Genome-wide analysis of the general stress response network in Escherichia coli: sigmaS-dependent genes, promoters, and sigma factor selectivity. J Bacteriol 187(5):1591-1603.
175. Avrani S, Bolotin E, Katz S, & Hershberg R (2017) Rapid genetic adaptation during the first four months of survival under resource exhaustion. Mol Biol Evol 34(7):1758-1769.
176. Chib S, Ali F, & Seshasayee ASN (2017) Genomewide mutational diversity in Escherichia coli population evolving in prolonged stationary phase. mSphere 2(3).
177. Fong SS, Joyce AR, & Palsson BO (2005) Parallel adaptive evolution cultures of Escherichia coli lead to convergent growth phenotypes with different gene expression states. Genome Res 15(10):1365-1372.
178. Tenaillon O, et al. (2012) The molecular diversity of adaptive convergence. Science 335(6067):457-461.
179. Forst S, Delgado J, & Inouye M (1989) Phosphorylation of OmpR by the osmosensor EnvZ modulates expression of the ompF and ompC genes in Escherichia coli. Proc Natl Acad Sci USA 86(16):6052-6056.
180. Pratt LA, Hsing W, Gibson KE, & Silhavy TJ (1996) From acids to osmZ: multiple factors influence synthesis of the OmpF and OmpC porins in Escherichia coli. Mol Microbiol 20(5):911-917.
181. Lenski RE, Rose MR, Simpson SC, & Tadler SC (1991) Long-term experimental evolution in Escherichia coli. I. Adaptation and divergence during 2,000 generations. Am Nat 138(6):1315-1341.
182. Lewis LA & Grindley NDF (1997) Two abundant intramolecular transposition products, resulting from reactions initiated at a single end, suggest that IS2 transposes by an unconventional pathway. Mol Microbiol 25(3):517-529.
183. Duval-Valentin G, Marty-Cointin B, & Chandler M (2004) Requirement of IS911 replication before integration defines a new bacterial transposition pathway. EMBO J 23(19):3897-3906.
184. Hu ST, Wang HC, Lei GS, & Wang SH (1998) Negative regulation of IS2 transposition by the cyclic AMP (cAMP)-cAMP receptor protein complex. J Bacteriol 180(10):2682-2688.
185. Tenaillon O, et al. (2016) Tempo and mode of genome evolution in a 50,000-generation experiment. Nature 536(7615):165-170.
186. Battesti A, Majdalani N, & Gottesman S (2011) The RpoS-mediated general stress response in Escherichia coli. Annu Rev Microbiol 65:189-213.
187. Muffler A, Fischer D, & Hengge-Aronis R (1996) The RNA-binding protein HF-I, known as a host factor for phage Q beta RNA replication, is essential for rpoS translation in Escherichia coli. Genes Dev 10(9):1143-1151.
188. Murina VN, et al. (2014) Effect of conserved intersubunit amino acid substitutions on Hfq protein structure and stability. Biochemistry 79(5):469-477.
189. Hengge-Aronis R & Fischer D (1992) Identification and molecular analysis of glgS, a novel growth-phase-regulated and rpoS-dependent gene involved in glycogen synthesis in Escherichia coli. Mol Microbiol 6(14):1877-1886.
190. Schellhorn HE & Hassan HM (1988) Transcriptional regulation of katE in Escherichia coli K-12. J Bacteriol 170(9):4286-4292.
191. Chaulk SG, et al. (2011) ProQ is an RNA chaperone that controls ProP levels in Escherichia coli. Biochemistry 50(15):3095-3106.
192. Racher KI, et al. (1999) Purification and reconstitution of an osmosensor: transporter ProP of Escherichia coli senses and responds to osmotic shifts. Biochemistry 38(6):1676-1684.
193. Westphal LL, Sauvey P, Champion MM, Ehrenreich IM, & Finkel SE (2016) Genomewide Dam methylation in Escherichia coli during long-term stationary phase. mSystems 1(6).
194. Li H, et al. (2009) The Sequence Alignment/Map format and SAMtools. Bioinformatics 25(16):2078-2079.
195. Francisco AP, et al. (2012) PHYLOViZ: phylogenetic inference and data visualization for sequence based typing methods. BMC Bioinformatics 13:87.
196. Lee WP, et al. (2014) MOSAIK: a hash-based algorithm for accurate next-generation sequencing short-read mapping. PLoS One 9(3):e90581.
197. Wang H & Song M (2011) Ckmeans.1d.dp: Optimal k-means Clustering in One Dimension by Dynamic Programming. R J 3(2):29-33.

Appendix A: Trends of evolution of Escherichia coli under stressful conditions
This work was performed in collaboration with Nicole Ratib.

A.1 Abstract
Bacteria have been shown to adapt to constantly changing nutritional environments. Determining the mechanisms that allow for these adaptations in a lab environment can deepen our understanding of survival mechanisms in natural environments.
In most natural environments, bacteria spend much of their time under conditions of starvation and stress similar to those of long-term stationary phase (LTSP) in the laboratory. This makes long-term batch cultures an excellent system for studying bacterial evolution, survival, and adaptation. Because the culture environment changes constantly, new mutations that allow mutants to occupy transient environmental niches are favored. Here we analyzed the genomes of clones isolated from a single long-term batch culture incubated for 1200 days. We find that novel genotypes continually emerge, most of which ultimately go extinct. Four main lineages are present over the course of the experiment, two of which alternate as the majority of the population. Most newly acquired mutations are in genes involved in metabolism, transport, or transcriptional regulation. We also identified amplifications and deletions, many affecting >1% of the chromosome, as well as insertion element replication and mobilization. These data provide new insight into the population structure and the types of mutations that may be adaptive in wild bacterial populations.

A.2 Introduction
In most naturally occurring environments, at least one nutrient required for bacterial growth is limiting, causing bacteria to enter a state of prolonged starvation (150). When nutrients become available, bacteria divide exponentially until the new nutrients are exhausted. Cells then again enter prolonged starvation, creating a cycle of exponential growth followed by starvation. The population dynamics and the mechanisms that bacteria employ to survive prolonged starvation are not yet understood. Long-term batch cultures are an excellent laboratory model system for studying bacteria such as E. coli during prolonged starvation. Previous work has characterized serially passaged bacterial populations (151-153) and populations maintained in chemostats (154-156). In these experiments, bacterial populations are maintained in a constant environment, and genetic diversity is lost through daily bottlenecking or continuous flushing.

Long-term batch cultures are initiated by inoculating bacteria at low density into a rich medium, such as Luria-Bertani (LB) broth. There, cells undergo the well-characterized first three phases of growth in the laboratory: lag phase, log (exponential) phase, and stationary phase (157). Following stationary phase, viable cell counts decrease exponentially during death phase. Depending on the strain and growth medium, ~99% of cells die during this phase (158, 159). After death phase, the surviving population enters LTSP, in which cells can survive for years in laboratory culture without the addition of nutrients (160). During LTSP, the nutrient makeup of the culture environment changes as the cells themselves catabolize molecules from their environment to survive. It is hypothesized that mutants able to use new energy sources increase in frequency, creating new waste streams that serve as potential energy sources for later mutants to exploit. These shifting metabolisms cause different environmental niches to expand and contract, and allow new mutants that are better able to use particular molecules to arise.

Cells expressing the Growth Advantage in Stationary Phase (GASP) phenotype can be detected as early as day 10 in long-term batch culture (158-160). The GASP phenotype is defined as the ability of cells from an aged culture of an initially isogenic strain to outcompete cells from a younger culture when the two are co-cultured. This advantage is due to beneficial mutant alleles acquired during LTSP (161-163).

The best-characterized mutations that confer a GASP fitness advantage during LTSP have been identified in two genes:
- rpoS, which encodes the stationary-phase sigma factor;
- lrp, which encodes a transcription factor shown to share many regulated genes with RpoS (164).

Analysis of the rpoS and lrp mutants showed that these mutations allow E. coli to better use amino acids as carbon sources (158, 163). More recently, it was shown that cells serially passaged after entering LTSP acquired beneficial mutations in cytR, sspA, and tolC in as few as 30 generations (151). Complex mutations involving the mobilization of insertion sequence (IS) elements have also been shown to confer the GASP phenotype (163). Beyond these well-characterized mutations, evidence suggests that many more GASP alleles exist. In short, populations in LTSP have been shown to evolve continuously, and this evolution can follow different trajectories (159).

Several questions remain poorly understood:
- how complex the evolving populations can become;
- how many genotypes can be present at any one time;
- how frequently complete sweeps occur;
- which types of genes may be under natural selection.

To address these questions, we undertook an extensive study of an evolving population. We examine a population of E. coli that was incubated in batch culture for 1200 days without the addition of nutrients, with samples taken every 30 days. The genomes of 1117 clones sampled from the population at 24 time points were sequenced. Identifying the mutations present at each time point allowed us to determine the underlying genetic structure of the population and the evolutionary paths taken within it.

A.3 Results
A.3.1 Genomic sequence of clones from 24 time points across 1200 days.
A total of 1117 clones were sequenced from a single long-term batch culture incubated for 1200 days. Samples of the population were frozen on days 10, 20, and 30, and every 30 days thereafter. The batch culture was initiated on day 0 by inoculating ~10^6 colony forming units (CFU) into 5 mL of Luria-Bertani (LB) broth. Following inoculation, the population went through the five growth phases typical of E. coli K-12 incubated in LB (Fig. A.1). By day 1, after exponential growth, population density had increased nearly 1000-fold to ~5 × 10^9 CFU per mL.
Following stationary and death phases, upon entry into LTSP, the population density remained at ~5 × 10^7 CFU per mL from day 10 to day 60. Between days 60 and 1200, the population density ranged from ~10^4 to ~10^6 CFU per mL (Fig. A.1).

Figure A.1. Population density of E. coli during 1200 days of incubation in a long-term evolution experiment in batch culture. Population density is shown as colony-forming units (CFU) per mL on a log scale.

To examine the population structure of the long-term batch culture at the genotypic level, ~48 clones from each time point were isolated on LB agar plates and then analyzed individually by whole-genome sequencing. We chose to sequence ~48 clones per time point based on a random subsampling of the genotypes of 93 clones from day 150 and 71 clones from day 210 (Supplementary Fig. A.1; Materials and Methods). An average of 40X sequence coverage per clone was obtained on the Illumina NextSeq platform, allowing identification of 661 mutations, including single nucleotide polymorphisms (SNPs), small indels (1 to 17 bp), new insertion sequence (IS) elements, and large deletions, duplications, and amplifications (in some cases covering 1-2% of the chromosome) (Fig. A.2). Half of all mutations are novel IS2 insertions, which increased rapidly in frequency in one lineage between days 270 and 600 (Fig. A.2; see below). We also observe numerous non-synonymous SNPs, small indels, and new insertion sites of other IS elements. All of the observed mutation types are capable of disrupting or altering protein function or of affecting patterns of gene expression.

Figure A.2. Mutation types over sampled time points. Distribution of different mutation types across time, including SNPs (synonymous, intergenic, nonsense, nonsynonymous), indels (small/large deletions), and mobile elements (IS2, all others).

A.3.2 Four main lineages are observed over 1200 days of evolution.
During incubation in long-term batch culture, the genotypic diversity of the evolving community is highly dynamic, possibly reflecting an ever-changing adaptive landscape. Over the course of the experiment, four major lineages that persist for 150 days or longer diverged from the parental genotype (Figs. A.3 and A.4). Most sequenced clones belong to Lineage 1 (teal, 61%) and Lineage 2 (purple, 33%), which are present through day 1200, while Lineage 3 (orange, 3%) and Lineage 4 (pink, 1%) persist as minorities through days 150 and 180, respectively (Fig. A.3). After 10 days in long-term batch culture, no parental genotypes were detected. Instead, we observe novel genotypes that serve as the founding genotypes for three of the four lineages observed over the course of the entire experiment (Fig. A.3). Ten days later, on day 20, the genotypic structure shifts, with 83% of clones belonging to Lineage 1 and sharing the same three mutations. By day 30, the genotypic structure shifts again, with 43% of the population belonging to Lineage 2; most of these clones share three new mutations relative to their day 10 ancestors. By day 60, 96% of the sampled population belongs to Lineage 2, in which two additional mutations have fixed since day 30. Lineage 1 is detected again on day 90, having acquired two additional mutations, while Lineage 2 diverges into two sublineages that are present up to day 240 (Fig. A.3, light purple). Lineage 1 also has transient sublineages that are observed on day 150 and on days 360 through 720 (Fig. A.3, light blue). Throughout the 1200 days of incubation, Lineages 1 and 2 change in relative frequency and continually acquire new mutations (Figs. A.3 and A.4). Despite large fluctuations in their frequencies, including drops below our limit of detection, neither Lineage 1 nor Lineage 2 ever permanently takes over the population (Fig. A.5A).
It is possible that Lineages 1 and 2 each occupy their own niche and that their relative frequencies fluctuate as these niches expand or contract. Another possibility is that the frequency fluctuations are caused by one-way or two-way cross-feeding interactions between Lineages 1 and 2, a phenomenon that has been observed in experimentally evolved microbial populations in constant environments (165, 166); such interactions would likely constrain niche size through the output of a public good produced by one or both lineages (167). The divergence and apparent extinction of sublineages of Lineages 1 and 2 might reflect clonal interference, in which different genotypes conferring similar fitness advantages compete and cannot coexist for extended periods (168). Further, the long-term survival of genotypes from Lineages 1 and 2 supports a model in which each lineage occupies its own niche. In contrast to Lineages 1 and 2, Lineages 3 and 4 are relatively invariant over the course of the experiment: each fixes only one new mutation after day 10, and both accumulate relatively few mutations overall (Fig. A.3). One possible explanation for the absence of genetic sweeps and of abundant diversity in Lineages 3 and 4 is that the small size of these subpopulations limits the amount of diversity that can be generated. Lineages 1 and 2, by contrast, can transiently grow and take over the population; although they occasionally drop below our limit of detection, they reach population densities large enough to generate sufficient genetic diversity for selection to act on. Alternatively, Lineages 3 and 4 may have been niche specialists, which allowed them to remain invariant for relatively long periods but ultimately cost them their competitive advantage once their particular niches were lost.
A.3.3 Abundant genetic diversity leads to sequential selective sweeps in the two main lineages.
We sequenced 1117 clones and identified 393 unique genotypes, each carrying some combination of the 661 mutations. At day 10 we observe ten genotypes diverging from the parent. Of these, four become the founding genotypes of the four lineages that persist for at least 150 days; the other six genotypes are extinct by day 60 (Fig. A.5A). Ultimately, two lineages, L1 and L2, dominate the population (Fig. A.4). A more detailed analysis shows that L1 and L2 have similar features: each lineage results from a series of mutational events, most likely driven by positive selection, in which each node represents a selective sweep (Fig. A.4; node size reflects the number of clones). We refer to these mutations as the “core” mutations of each lineage. On day 20, three genotypes are present, and most clones (~90%), but not all, carry mutations in rpoS or sdhB. A series of mutations continually sweeps through the L1 and L2 populations, with at least 14 sweeps in L1 (Table A.1). The first observed mutation to sweep in L1 is the dtpA mutation, on day 20 (Fig. A.5A; Table A.1). Subsequent mutational sweeps occur roughly every 90 days. Multiple genotypes can co-occur within a lineage, and multiple lineages co-occur in the same population (Fig. A.4). Genetic variation is especially complex from days 360 to 390, when at least five genotypes co-occur, with sequential mutations in rimJ, metB, acs1, IG1, IG2, and ybcN. All of these mutations have fixed by day 420. We observed 15 selective sweeps in L2, a level of genotypic diversity similar to that of L1 (Table A.1). In L2, a variant of hfq is the first mutation to fix, on day 10. We also observe a unique mobilization of insertion sequence 2 (IS2) elements in L2 (Figs. A.4 and A.5B), detecting 23 non-reference IS2 elements across a total of five genotypes.

Table A.1.
Mutations reaching fixation along the main line of descent, by lineage. Where applicable, gene identity, codon change, and function are included. IG, intergenic region; fs, frameshift; -, not applicable.

Gene | Promoter | Mutation | Ref codon | New codon | Function
Lineage 1
nth/dtpA | dtpA | A -> T (+503/-108) | - | - | DNA glycosylase and apyrimidinic (AP) lyase (endonuclease III)/dipeptide and tripeptide permease A
nlpD | rpoS | S210S | TCT | TCC | activator of AmiC murein hydrolase activity, lipoprotein
sdhB | - | M107I | ATG | ATA | succinate dehydrogenase, FeS subunit
cpdA 1 | - | S222L | TCG | TTG | 3',5' cAMP phosphodiesterase
glpR | - | N91fs | - | - | glycerol-3-phosphate regulon transcriptional repressor
rimJ | - | A61fs | - | - | ribosomal-protein-S5-alanine N-acetyltransferase
metB | - | P146S | CCA | TCA | cystathionine gamma-synthase, PLP-dependent
acs 1 | - | GTTATCGAC 1→2 | - | - | acetyl-CoA synthetase
IG1 mak/araJ | araJ | G -> A | - | - | manno(fructo)kinase/L-arabinose-inducible putative transporter, MFS family
IG2 mtlA/yibI | - | 1 bp insertion | - | - | mannitol-specific PTS enzyme: IIA, IIB and IIC components/DUF3302 family inner membrane protein
ybcN | - | IS2 insertion | - | - | DLP12 prophage, uncharacterized protein
IS2 | - | 1336 bp deletion | - | - | insC1, insD1, insD1
kgtP | - | IS3 insertion | - | - | alpha-ketoglutarate transporter
galS | - | R5C | CGT | TGT | autorepressor, galactose- and fucose-inducible galactose regulon transcriptional isorepressor, mgl operon transcriptional repressor
envZ | - | R214C | CGT | TGT | sensory histidine kinase in two-component regulatory system with OmpR
yfcR | - | V54I | GTA | ATA | putative fimbrial-like adhesin protein
Lineage 2
hfq | - | I44S | ATC | AGC | HF-I, host factor for RNA phage Q beta replication, global sRNA chaperone
paaX 1 | - | R74C | CGC | TGC | transcriptional repressor of phenylacetic acid degradation paa operon, phenylacetyl-CoA inducer
lrp | - | IS2 insertion | - | - | leucine-responsive global transcriptional regulator
rne | - | G174D | GGC | GAC | RNA-binding protein, RNA degradosome binding protein, endoribonuclease
proQ | - | R80C | CGT | TGT | RNA chaperone, putative ProP translation regulator
IG3 insH1/lnt | insH1 | 1 bp deletion | - | - | IS5 transposase and trans-activator/apolipoprotein N-acyltransferase
IG4 clpX/lon | lon | IS186 insertion | - | - | ATPase and specificity subunit of ClpX-ClpP ATP-dependent serine protease/DNA-binding ATP-dependent protease La
putP 1 | - | A337T | GCG | ACG | proline:sodium symporter
crp | - | E55K | GAA | AAA | cAMP-activated global transcription factor, mediator of catabolite repression
yfhM | - | G685D | GGC | GAC | anti-host protease defense factor, bacterial alpha2-macroglobulin colonization factor ECAM, periplasmic inner membrane-anchored lipoprotein
ompC | - | A282S | GCT | TCT | outer membrane porin protein C
rcsB | - | IS2 insertion | - | - | response regulator in two-component regulatory system with RcsC and YojN
kup | - | IS2 insertion | - | - | potassium transporter
yraJ | - | IS2 insertion | - | - | putative outer membrane protein
rnlA | - | IS2 insertion | - | - | CP4-57 prophage, RNase LS
ydeT | - | IS2 insertion | - | - | unannotated CDS
kdgR | - | IS2 insertion | - | - | KDG regulon transcriptional repressor
dcuA | - | IS2 insertion | - | - | C4-dicarboxylate antiporter
acs 2 | - | M1V | ATG | GTG | acetyl-CoA synthetase
IG6 arpB/yniD | yniD | IS2 insertion | - | - | unannotated CDS/uncharacterized protein
lptC | - | IS2 insertion | - | - | LPS export protein, periplasmic membrane-anchored LPS-binding protein
yhbO | - | A113D | GCC | GAC | stress-resistance protein
tfaS | - | IS2 insertion | - | - | unannotated CDS
ydhV | - | IS2 insertion | - | - | putative oxidoreductase subunit
ybbP | - | D54E | GAT | GAA | putative ABC transporter permease
acrB | - | IS2 insertion | - | - | multidrug efflux system protein
nanC | - | IS2 insertion | - | - | N-acetylneuraminic acid outer membrane channel protein
gspO | - | IS2 insertion | - | - | bifunctional prepilin leader peptidase/methylase
yqjC | - | IS2 insertion | - | - | DUF1090 family putative periplasmic protein
cpdA 2 | - | IS2 insertion | - | - | 3',5' cAMP phosphodiesterase
ydjA | - | G129G | GGC | GGT | putative oxidoreductase
nagK | - | IS2 insertion | - | - | N-acetyl-D-glucosamine kinase
ycbV | - | IS2 insertion | - | - | putative fimbrial-like adhesin protein
IG7 yjjY/yjtD | yjtD | IS2 insertion | - | - | uncharacterized protein/putative methyltransferase
ydhC | - | P227L | CCG | CTG | putative arabinose efflux transporter
cvrA | - | F311fs | - | - | putative cation/proton antiporter
IG8 phoE/proB | phoE | IS2 insertion | - | - | outer membrane phosphoporin protein E/gamma-glutamate kinase
Lineage 3
cpdA 3 | - | 3 bp deletion | - | - | 3',5' cAMP phosphodiesterase
rpoS | - | 1 bp deletion | - | - | RNA polymerase, sigma S (sigma 38) factor
malZ | - | I128T | ATC | ACC | maltodextrin glucosidase
rrl? | - | 1 bp insertion | - | - | 23S ribosomal RNA of rrn operon
Lineage 4
IG12 psuK/fruA | psuK | A -> C | - | - | pseudouridine kinase/fused fructose-specific PTS enzymes: IIB component/IIC components
mukB | - | R1465R | CGA | CGG | chromosome condensin MukBEF, ATPase and DNA-binding subunit
rpoS | - | L116* | TTG | TAG | RNA polymerase, sigma S (sigma 38) factor
mhpF | - | T55T | ACC | ACT | acetaldehyde-CoA dehydrogenase II, NAD-binding
amn | - | 1 bp insertion | - | - | AMP nucleosidase

Figure A.3. Mutations present in individual clones from day 10 to day 1200. Time points are separated into panels in which columns are clones and rows are mutations. Filled boxes represent mutations identified in a clone, gray boxes represent wild-type alleles, and white boxes indicate missing data due to low coverage. Mutations are colored according to the lineage to which a clone belongs.

Along with the core mutations, we observe a large number of “collateral” mutations: mutations that may occur once or at higher frequency but are ultimately evolutionary “dead ends.” We detect these collateral mutations in 1 to 10 clones from a given population, suggesting that these genotypes were at least transiently under positive selection. Randomly sequencing ~48 clones from a population whose size fluctuates between 10^4 and 10^6 cells samples a very small fraction of the population (between 0.5% and 0.005%). Yet we identified many single-mutation variants from Lineages 1 and 2, providing a glimpse into the enormous genetic variation present in the population on which natural selection can act. Clearly, the variants observed are only the tip of the “genetic iceberg” in the entire population, since only variants present at a sufficiently high frequency are detected at our level of sampling. As new mutations fix within Lineages 1 and 2, many variants are purged from the population; however, variants of these newly fixed mutations are eventually detected as well.
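The sampling fractions above are simply the number of sequenced clones divided by the population size. The sketch below reproduces them and, under the added assumption that clones are drawn at random from a well-mixed culture, estimates how likely a genotype at a given frequency is to appear at least once among ~48 clones. This binomial detection model is an illustration, not an analysis reported in this study, and the function names are hypothetical.

```python
def sampled_fraction(n_clones: int, population_size: float) -> float:
    """Fraction of the population captured by sequencing n_clones cells."""
    return n_clones / population_size


def detection_probability(freq: float, n_clones: int) -> float:
    """Chance that a genotype at population frequency `freq` appears at least
    once among `n_clones` random clones (binomial approximation, reasonable
    when n_clones is much smaller than the population size)."""
    return 1.0 - (1.0 - freq) ** n_clones


def min_detectable_freq(n_clones: int, power: float = 0.95) -> float:
    """Lowest genotype frequency that is detected with probability >= power."""
    return 1.0 - (1.0 - power) ** (1.0 / n_clones)


if __name__ == "__main__":
    # Assumed population sizes: the 10^4 to 10^6 range quoted in the text.
    for pop in (1e4, 1e6):
        print(f"{pop:.0e} cells: {sampled_fraction(48, pop):.4%} sampled")
    print(f"P(detect a 1% genotype) = {detection_probability(0.01, 48):.2f}")
    print(f"P(detect a 5% genotype) = {detection_probability(0.05, 48):.2f}")
    print(f"95% detection limit     = {min_detectable_freq(48):.1%}")
```

Under these assumptions, a genotype at 1% frequency would appear in only about 38% of 48-clone samples, and only genotypes above roughly 6% frequency are detected with 95% probability, consistent with the view that the sampled variants are the tip of the “genetic iceberg.”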
The detection of so many variants also suggests that substantial cell division occurs to produce the variation observed.

Figure A.4. Allele frequencies of mutations identified in Lineage 1 and Lineage 2. Four main lineages diverge from the parent between days 10 and 1200. Minimum spanning tree (MST) showing the phylogenetic relatedness of all 1151 clones sequenced. Each node represents a single genotype, with size proportional to the number of clones identified with that genotype. White nodes at the bottom right provide a reference for the number of clones relative to node size. The parental genotype is indicated by the black node at the top. Gene names within or next to nodes indicate the mutated gene(s) for that genotype. The four main lineages are indicated by color (L1, teal; L2, purple; L3, orange; L4, pink). Sublineages that diverge are indicated in lighter shades of blue and purple. indicates new insertion element (IS). * indicates reversion to the reference allele.

A.3.4 Types of mutations that fix in lineages.
Many of the mutations fixed in the four lineages occur in genes from three broad functional groups: metabolic enzymes, transcriptional regulators, and transporters (Table A.1). Genes involved in central metabolism include sdhB, encoding a subunit of succinate dehydrogenase; acs, encoding acetyl-CoA synthetase; and mhpF, encoding an acetaldehyde dehydrogenase. Mutations in the transcriptional regulators encoded by glpR and paaX may indirectly affect metabolism, since these regulators control genes involved in the catabolism of glycerol-3-phosphate and phenylacetic acid, respectively (169-171). Genes encoding the global regulators RpoS, Hfq, and CRP are also mutated, potentially affecting the expression of hundreds of genes (172-174). Two recent studies of evolution during LTSP identified mutations in multiple subunits of RNA polymerase within the first 28 days of culture (175, 176).
Mutations in global regulator genes such as hfq and rpoS in this study, together with mutations in the core transcriptional machinery in those studies, may point to global transcriptional dysregulation as an early strategy for adapting to LTSP. Global dysregulation through mutations in the transcription machinery and global regulators, followed by fine-tuning that restores transcription to a pre-stress or new optimal state, has previously been suggested as a strategy for adapting to stress conditions (177, 178). Combinations of mutant alleles identified in Chib et al. (2017) and in this study, including alleles of rpoS, sdhB, and glpR, suggest that parallel routes to similar fitness exist, represented by multiple genotypes. We also observe mutations in membrane proteins involved in the transport of small peptides and amino acids, including putP, a sodium/proline symporter; ybbP, a predicted component of an ABC superfamily metabolite uptake transporter; and sstT, a sodium-coupled serine/threonine symporter. Mutations in envZ and ompR, which encode a two-component system controlling expression of the outer membrane porin genes ompC and ompF (179), occur in both Lineages 1 and 2 (Fig. A.4). Mutations in envZ, ompC, and ompR fix within Lineage 1, Lineage 2, and a collateral line of Lineage 2, respectively. Expression of ompF is regulated by RpoS: it decreases in an rpoS-dependent manner as cells transition from exponential to stationary phase (180). Intergenic mutations are often thought of as neutral, that is, unlikely to be either beneficial or deleterious. However, the first mutation distinguishing Lineage 1 is an intergenic mutation upstream of dtpA, which encodes a di- and tripeptide transporter (Table A.1). This A-to-T mutation lies adjacent to the -10 sequence, and qRT-PCR shows that dtpA mRNA is increased 23-fold in clones carrying the mutation (Fig).
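The section does not state how qRT-PCR fold changes were calculated, and the Materials and Methods contain no qRT-PCR protocol. One widely used approach is the comparative 2^-ΔΔCt method, in which the cycle threshold (Ct) of the target gene is normalized to a reference gene in both mutant and wild type. The sketch below assumes that method, a single reference gene, and perfect amplification efficiency (doubling every cycle); the function name and all Ct values are hypothetical.

```python
import math


def fold_change_ddct(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt,
                     efficiency=2.0):
    """Relative expression of a target gene in a mutant versus wild type by the
    comparative 2^-ddCt method. `efficiency` = 2.0 assumes the amplicon doubles
    every cycle (100% efficiency)."""
    d_ct_mut = ct_target_mut - ct_ref_mut  # normalize mutant to reference gene
    d_ct_wt = ct_target_wt - ct_ref_wt     # normalize wild type likewise
    dd_ct = d_ct_mut - d_ct_wt
    return efficiency ** (-dd_ct)


# ddCt needed for a 23-fold increase under perfect efficiency: -log2(23).
print(round(-math.log2(23), 2))
# Hypothetical Ct values giving ddCt = +3, i.e. an 8-fold decrease (0.125x).
print(fold_change_ddct(ct_target_mut=25.0, ct_ref_mut=15.0,
                       ct_target_wt=22.0, ct_ref_wt=15.0))
```

Under this method, a 23-fold increase such as the one reported for dtpA corresponds to a ΔΔCt of about -4.5 cycles, while a ΔΔCt of +3 corresponds to an 8-fold decrease.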
The second mutation to fix within Lineage 1 is silent within the coding region of nlpD, and silent mutations are also often thought of as neutral (Table A.1). However, the rpoS promoter lies within the coding region of nlpD, so a mutation in the rpoS promoter can also alter nlpD, although in this case the change in nlpD is silent (Table A.1). The T-to-C change within the -10 element of the rpoS promoter, in the coding region of nlpD, results in an 8-fold decrease in rpoS expression as measured by qRT-PCR (data not shown). These results suggest that mutations traditionally thought to be neutral can affect important phenotypes, and that the mutations that fix within lineages were likely under positive selection. New IS insertions were also identified in clones; such insertions can knock out a gene when inserted within its coding region, increase expression by replacing the native promoter with a more active IS promoter, or decrease expression by removing the native promoter (181). New IS element insertions can also contribute to rearrangements and amplifications within the genome (161). Extensive IS2 hopping was detected starting on day 300, when the frequency of Lineage 2 increased to nearly half of the population (Fig. A.5A). Most IS2 insertions occur exclusively within Lineage 2, suggesting that the hopping is due to a physiological or genetic response specific to this lineage. IS2 has been shown to transpose via a two-step copy-and-paste process (182, 183), and CRP negatively regulates this process by binding the promoter of the IS2 transposase gene (184). A mutation in crp had fixed in Lineage 2 before the expansion of IS2 and may be the cause of the IS2 mobilization (Fig. A.3B). Of all IS2 insertions, 19% (63/331) inserted into noncoding regions and 81% (268/331) into coding regions.
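The enrichment of IS2 insertions in noncoding DNA can be checked with a two-category chi-square goodness-of-fit test on these counts (63 noncoding, 268 coding), taking as the null hypothesis the statement in this section that ninety percent of the parental genome is coding. The text does not say which software was used; the sketch below is plain Python and relies on the identity that, for one degree of freedom, the chi-square upper-tail probability equals erfc(√(x/2)). scipy.stats.chisquare with f_obs and f_exp should give the same statistic. The function name is hypothetical.

```python
import math


def chi_square_gof_2cat(observed, expected_props):
    """Chi-square goodness-of-fit test for two categories (1 degree of freedom).

    observed: counts per category; expected_props: null proportions summing to 1.
    Returns (statistic, p_value).
    """
    total = sum(observed)
    expected = [p * total for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# IS2 insertions: 63 noncoding, 268 coding; null: 10% noncoding, 90% coding.
stat, p = chi_square_gof_2cat([63, 268], [0.10, 0.90])
print(f"chi2 = {stat:.2f}, P = {p:.1e}")
```

About 33 noncoding insertions are expected under the null versus the 63 observed, which gives a statistic of about 30 and a P-value on the order of 10^-8, well below the reported threshold of 0.0001.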
Because ninety percent of the parental genome is coding, significantly more IS2 elements inserted into noncoding regions than expected (chi-square test, P < 0.0001). This is likely because disruption of some genes is disadvantageous and therefore selected against, whereas neutral IS2 insertions can hitchhike along with beneficial mutations. A total of nine distinct amplification and deletion events affecting >1% of the chromosome were detected (Supplementary Fig. A.5). All were transient, being present at only a single time point, which suggests that these amplifications and deletions may be maladaptive during LTSP. Because they were transient, we did not observe the change in genome size over time that has been observed in several serial-passage evolution experiments (185). In long-term batch culture the environment is dynamic, so loss of genomic content may be maladaptive; in a serial-passage regime, by contrast, cells experience a constant environment, and certain genes are therefore more likely to become unnecessary in the absence of environmental variation.

Figure A.5. Frequency of the four main lineages over time, and allele frequencies within Lineages 1 and 2. (A) Stacked bar plots showing the frequencies of individual genotypes from Lineages 1, 2, 3, and 4, with colors corresponding to those in the MST plot. Allele frequencies for novel genotypes appearing in clones from Lineage 1 (B) and Lineage 2 (C) are shown.

A.3.5 Parallel Evolution.
Examples of potential parallel evolution were detected in our dataset and strongly suggest that certain loci were under positive selection. In clones from day 10, eight mutant alleles of hfq were identified, five of which encode either a nonsense or a frameshift mutation, suggesting that inactivation of Hfq was under strong positive selection (Table A.2; Supplementary Table A.1).
Hfq is a highly conserved small RNA chaperone that functions as a homohexamer and regulates the translation of many mRNAs, including positive regulation of rpoS translation (173, 186, 187). The three remaining hfq alleles cause non-synonymous amino acid changes in the Hfq protein. Two alleles change the aspartic acid at position 40 to either an alanine or a valine, and the third changes the isoleucine at position 44 to a serine (I44S; Table A.1). The aspartic acid at position 40 is important for Hfq hexamer formation, and a D40A mutant was shown to have decreased hexamer stability, likely reducing Hfq function (188). Three mutant rpoS alleles are also present on day 10: two cause non-conservative single amino acid changes, and the third introduces a stop codon that truncates RpoS to about two-thirds of its normal length (Supplementary Table A.1).

Table A.2. Genes mutated at least 3 times, with the type of mutation and the function of each gene.

These hfq and rpoS alleles may both provide an advantage by downregulating the RpoS regulon, similar to the first GASP mutation identified in batch culture (Zambrano et al. 1993). Assays for glycogen production and hydrogen peroxide degradation are frequently used to test for RpoS activity, since these processes are directly regulated by RpoS (189, 190). In such assays of these stationary-phase-specific functions, clones isolated on day 10 harboring the hfq and rpoS mutant alleles showed decreased RpoS activity (data not shown), supporting a model in which decreased RpoS activity is under strong selection by day 10. Aside from the mutations that fix in Lineages 1 through 4, numerous other mutations are detected only once, in collateral genotypes (Table A.2). Many of the genes mutated three or more times carry IS2 insertions, suggesting that knocking out these genes may be beneficial during periods of starvation (Table A.2).
The two genes with the most mutations, yafD, which encodes an endo/exonuclease/phosphatase family protein, and dcuA, which encodes a C4-dicarboxylate transporter, both carry many IS2 insertions (Table A.2). Four IS2 insertions upstream of yafD were also identified.

Table A.2. Genes and intergenic regions mutated 3 or more times. Counts are given by mutation type (Syn, Nonsyn, Nonsen, Indel, IS (IS2), Del); only nonzero counts are listed, in that column order.

Gene | Total | Counts by type | Function
yafD | 13 | 1, 12 (12) | endo/exonuclease/phosphatase family protein
dcuA | 9 | 9 (9) | C4-dicarboxylate antiporter
hfq | 8 | 3, 4, 1 | HF-I, host factor for RNA phage Q beta replication, global sRNA chaperone
htrE | 6 | 6 (6) | putative outer membrane usher protein
proQ | 6 | 2, 2, 2 | RNA chaperone, putative ProP translation regulator
rhsJ | 5 | 5 (5) | putative RHS domain-containing protein
rpoS | 5 | 3, 1, 1 | RNA polymerase, sigma S (sigma 38) factor
cpdA | 4 | 1, 1, 2 (2) | 3',5' cAMP phosphodiesterase
dtpB/uspA | 4 | 4 (4) | dipeptide and tripeptide permease B/universal stress global response regulator
mqsA | 4 | 4 (3) | antitoxin for MqsR toxin, transcriptional repressor
nupX | 4 | 1, 3 | nucleoside permease
sulA | 4 | 1, 1, 1, 1 (1) | SOS cell division inhibitor
yafC/yafD | 4 | 4 (4) | LysR family putative transcriptional regulator/endo/exonuclease/phosphatase family protein
yaiT | 4 | 4 (4) | putative autotransporter
ycgB/dadA | 4 | 4 (4) | SpoVR family stationary phase protein/D-amino acid dehydrogenase
yneO | 4 | 1, 3 (3) | putative autotransporter-encoding sequence
acs | 3 | 1, 1, 1 | acetyl-CoA synthetase
ispB | 3 | 3 | octaprenyl diphosphate synthase
lrp | 3 | 3 (3) | leucine-responsive global transcriptional regulator
paaX | 3 | 2, 1 | transcriptional repressor of phenylacetic acid degradation paa operon, phenylacetyl-CoA inducer
putP | 3 | 2, 1 | proline:sodium symporter
rcsB | 3 | 1, 2 (2) | response regulator in two-component regulatory system with RcsC and YojN
smf | 3 | 3 (3) | DNA recombination-mediator A family protein
ydbA | 3 | 2 (2), 1 | putative outer membrane protein
yfjH | 3 | 3 (3) | CP4-57 prophage, uncharacterized protein
yhcF | 3 | 3 (3) | putative transcriptional regulator
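A table like Table A.2 can be assembled by tallying distinct mutation events per gene (or intergenic region) and keeping those hit at least three times. The sketch below assumes a hypothetical input of (gene, mutation type) records, one per distinct mutation event; the hfq and dcuA counts mirror Table A.2, while geneX is an invented gene included only to show the threshold. The function name is also hypothetical.

```python
from collections import Counter, defaultdict


def genes_hit_repeatedly(mutations, min_hits=3):
    """Tally distinct mutation events per gene or intergenic region.

    mutations: iterable of (gene, mutation_type) pairs, one per distinct event.
    Returns {gene: Counter(mutation_type -> count)} for genes with at least
    `min_hits` events, ordered by total count (descending), then by name.
    """
    by_gene = defaultdict(Counter)
    for gene, mutation_type in mutations:
        by_gene[gene][mutation_type] += 1
    hits = [(g, c) for g, c in by_gene.items() if sum(c.values()) >= min_hits]
    hits.sort(key=lambda item: (-sum(item[1].values()), item[0]))
    return dict(hits)


# Hypothetical event records: hfq and dcuA counts mirror Table A.2,
# geneX is an invented gene used only to show the threshold.
records = (
    [("hfq", "Nonsyn")] * 3 + [("hfq", "Nonsen")] * 4 + [("hfq", "Indel")]
    + [("dcuA", "IS2")] * 9
    + [("geneX", "Nonsyn")] * 2
)
for gene, counts in genes_hit_repeatedly(records).items():
    print(gene, sum(counts.values()), dict(counts))
```

Here geneX, with only two events, falls below the threshold, while dcuA (9 events) and hfq (8 events) are retained and ordered by total count.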
Three alleles of putP, a proline and sodium transporter, were identified, one in the Lineage 2 main line of descent and a second in a Lineage 2 collateral line of descent, illustrating a possible example of clonal interference (Fig. A.4). Five alleles of proQ were also identified; proQ encodes a small RNA chaperone that affects mRNA levels of proP, an osmoprotectant/proton symporter capable of transporting proline and glycine betaine (191, 192). Multiple alleles of genes linked to the TCA cycle, such as paaX, sdhB, fadB, and acs, were detected, some of which fix in Lineages 1 and 2.

A.4 Materials and Methods
A.4.1 Incubation of long-term cultures.
A volume of 5.0 mL of Luria-Bertani (LB) broth in an 18 × 150 mm borosilicate test tube was inoculated with a clone of E. coli K-12 W3110-lineage strain ZK1142 from a frozen glycerol stock and incubated at 37°C with aeration in a TC-7 roller drum (New Brunswick Scientific, Edison, NJ) in a humidified warm room (65-70% relative humidity) for 480 days. On days 10, 20, and 30 and every 30 days thereafter, 10 µL of the culture was saved in LB + 10% glycerol and frozen. Starting on day 30, sterile distilled water was added to the culture before each sampling to restore the volume to 5.0 mL (typically 0.25 to 0.5 mL of water was added; data not shown).

A.4.2 Next-generation sequencing of clones.
Clones from frozen samples at the appropriate time points were isolated by spreading a sample of the glycerol stock onto LB agar plates at a dilution sufficient to yield individual colonies. Clones from days 240 and 330 formed colonies but did not grow well in liquid LB medium, which made it difficult to isolate DNA for sequencing libraries. As a result, we were unable to recover clones from day 330 and recovered only 15 clones from day 240. After overnight growth at 37°C, single colonies were picked into 200 µL of LB broth in individual wells of a 96-well plate and incubated for 48 hours at 37°C. DNA was extracted from the 96-well plate cultures using Qiagen DNeasy 96 Blood & Tissue Kits.
Illumina sequencing libraries were prepared for each clone following standard Illumina methods with the Illumina Nextera DNA Library Preparation Kit. Using dual indices, we pooled 192 libraries (~4 time points) per run on an Illumina NextSeq; sequencing was performed through the USC sequencing core and BGI.

A.4.3 Subsampling to determine the number of clones to sequence.
We performed a pilot experiment sequencing 93 and 150 individuals from day 150 and day 210, respectively. From each sample, we randomly subsampled 1 to 80 individuals without replacement, 10 times each, and counted the number of unique variant sites detected in each subsample. We further used hierarchical clustering and divided the resulting tree into 1-5 clusters, then randomly subsampled 1 to 80 individuals from all of these clusters without replacement, 10 times each (Supplementary Fig. A.1). Based on these subsamples, we concluded that with 48 individuals we could consistently sample all major genotypes occurring on days 150 and 210. Where possible, we generated DNA sequencing libraries from this many individuals for every subsequent time point.

A.4.4 Illumina read alignment and variant calling.
Reads were aligned to the parental ZK126 reference genome (193) using bwa (100); after duplicate reads were removed, mpileups were generated from the resulting BAM files using SAMtools (194). The mpileups were then parsed with a custom Python pipeline to identify mutation events. Any site covered by 5 or more reads with a consensus of 80% or higher for a base different from the reference was called as a variant site. To reduce false negatives, the mpileups were parsed a second time at all identified variant sites, and each site was called by simple majority with a minimum coverage of 3 reads. We next determined whether variants occurred in coding sequences, promoter elements, or intergenic regions.
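The two-pass calling rule described above can be sketched as follows. This is a simplified illustration, not the authors' custom pipeline: it assumes each pileup column has already been reduced to a plain string of read bases (real samtools mpileup output encodes reference matches as "." and "," and carries read-start, read-end, and indel markers that would need to be parsed first), and it interprets "simple majority" as more than half of the reads. All function names and data are hypothetical.

```python
from collections import Counter


def first_pass_call(ref_base, read_bases, min_cov=5, min_frac=0.80):
    """Strict call: return the non-reference base if the site has at least
    `min_cov` reads and that base makes up at least `min_frac` of them."""
    if len(read_bases) < min_cov:
        return None
    base, count = Counter(read_bases).most_common(1)[0]
    if base != ref_base and count / len(read_bases) >= min_frac:
        return base
    return None


def second_pass_call(read_bases, min_cov=3):
    """Relaxed call at a known variant site: the base carried by more than half
    of the reads, or None (missing data) if coverage is low or no base has a
    majority."""
    if len(read_bases) < min_cov:
        return None
    base, count = Counter(read_bases).most_common(1)[0]
    return base if count > len(read_bases) / 2 else None


def call_variants(pileups, reference):
    """pileups: {clone: {position: string of read bases at that position}}
    reference: {position: reference base}
    Pass 1 collects sites called strictly in any clone; pass 2 genotypes every
    clone at all of those sites."""
    sites = sorted({
        pos
        for columns in pileups.values()
        for pos, bases in columns.items()
        if first_pass_call(reference[pos], bases) is not None
    })
    calls = {
        clone: {pos: second_pass_call(columns.get(pos, "")) for pos in sites}
        for clone, columns in pileups.items()
    }
    return sites, calls


# Hypothetical, pre-parsed pileup columns for three clones.
reference = {100: "A", 200: "G"}
pileups = {
    "clone1": {100: "TTTTTA", 200: "GGGGG"},
    "clone2": {100: "TTT", 200: "GGGGGG"},  # too shallow for pass 1
    "clone3": {100: "AA"},                  # too shallow for either pass
}
sites, calls = call_variants(pileups, reference)
print(sites)
print(calls)
```

In this example, clone2's three reads are too few for the strict first pass but are enough for the relaxed second pass at a site already found in clone1, which is the kind of false negative the second pass is meant to recover; clone3, with two reads, remains missing data.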
For variants in coding regions, both the reference and variant sequences were translated and compared for amino acid changes. Variant sites outside coding regions were checked for proximity to genes; any site within 200 bases of the transcriptional start of a gene was considered a potential promoter variant.

A.4.5 Phylogenetic organization.
To visualize phylogenetic relatedness, a minimum spanning tree (MST) of the sequenced clones was generated using PHYLOViZ (195). We generated a sequence containing only the variant sites for each unique genotype. Clones were then placed into sequence type groups, and a full MST was generated using the goeBURST algorithm.

A.4.6 CNV and insertion sequence (IS) element movement.
To identify large-scale duplications or deletions in the genome, sequencing coverage depth was plotted across all individuals, and several events were identified across time points by visual inspection. To further investigate insertion element mobility in clones, we generated a mobileome by determining the consensus sequence of all annotated IS1, IS2, IS3, IS4, IS5, IS150, IS186, IS530, IS600, IS609, IS911, and ISZ elements in the E. coli K-12 reference genome. We mapped reads to this mobileome using bwa (100) and parsed the resulting SAM file for read pairs in which one read mapped to a mobile element but its mate did not. We then used BLAST to localize the non-mapping mates to the reference genome, keeping only matches that mapped uniquely with 0 or 1 mismatch. We next mapped reads to the reference genome using MOSAIK (196) to identify reads that spanned the insertion site, and used BLAST to localize the unmapped segment of each such read, as well as its mate if the mate was unmapped. Finally, we used the R package Ckmeans.1d.dp (197) to perform k-means clustering of all genomic positions associated with each IS element.
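The final clustering step groups the genomic coordinates supported by IS-associated reads so that each cluster can be read as one candidate insertion site. Ckmeans.1d.dp is an R package for optimal k-means clustering of one-dimensional data by dynamic programming. The Python sketch below targets the same objective (minimum total within-cluster sum of squares) with a plain O(k·n²) dynamic program for a fixed k; it is not the package's code, choosing k (which the package can also do over a range of values) is left to the caller, and the function name and coordinates are hypothetical.

```python
def optimal_kmeans_1d(values, k):
    """Optimal 1-D k-means by dynamic programming.

    Sorts `values`, then finds the split into k contiguous groups that minimizes
    the total within-cluster sum of squared deviations. Returns a list of k
    sorted clusters. Runs in O(k * n^2) time.
    """
    xs = sorted(values)
    n = len(xs)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give each candidate cluster's sum of squares in O(1).
    s = [0] * (n + 1)
    s2 = [0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def sse(i, j):
        """Within-cluster sum of squares of xs[i..j] (inclusive)."""
        m = j - i + 1
        total = s[j + 1] - s[i]
        # m * sum(x^2) - (sum x)^2 is computed exactly for integer positions.
        return (m * (s2[j + 1] - s2[i]) - total * total) / m

    inf = float("inf")
    # cost[c][j]: best SSE for xs[0..j] split into c + 1 clusters.
    # start[c][j]: index where the last of those clusters begins.
    cost = [[inf] * n for _ in range(k)]
    start = [[0] * n for _ in range(k)]
    for j in range(n):
        cost[0][j] = sse(0, j)
    for c in range(1, k):
        for j in range(c, n):
            for i in range(c, j + 1):  # last cluster is xs[i..j]
                candidate = cost[c - 1][i - 1] + sse(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    start[c][j] = i

    # Walk back through the stored split points to recover the clusters.
    clusters = []
    j = n - 1
    for c in range(k - 1, -1, -1):
        i = start[c][j]  # start[0][j] stays 0: the first cluster begins at xs[0]
        clusters.append(xs[i : j + 1])
        j = i - 1
    clusters.reverse()
    return clusters


# Hypothetical read-derived coordinates for one IS element (three sites).
positions = [5003, 100, 90000, 110, 5000, 105]
print(optimal_kmeans_1d(positions, 3))
```

With k = 3, the example coordinates fall into the groups [100, 105, 110], [5000, 5003], and [90000]. Because the objective is solved exactly rather than by iterating from random starting centers, the result does not depend on initialization, which is the main appeal of a dynamic-programming method for one-dimensional data.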
A.5 Conclusion
This work provides the most comprehensive analysis to date of the population structure of cells incubating in long-term batch culture. From this work we have gained our first fine-structure look into population dynamics during long-term batch culture and have characterized mutations that may be beneficial during LTSP. In their natural environments, bacteria likely spend the majority of their time under conditions of starvation and stress, similar to those during LTSP. Bacteria have evolved the ability not only to survive but to thrive in environments once thought to be uninhabitable. Elucidating the mechanisms that allow them to adapt through cycles of "feast and famine" deepens our understanding of how they survive in natural environments. Future investigation of parallel cultures will provide insight into evolutionary trajectories that are under strong selection and may reveal targets for further understanding the mechanisms used to survive in a broad variety of environments.

A.6 Supplementary Materials
Supplementary Figure A.1. Number of clones required to capture the major genotypes present in the population. The number of genotypes observed, as determined by hierarchical clustering of mutation content, is shown as a function of the number of individuals sampled. At 40 or more individuals, 3 or more genotypes are sampled in all cases.
Supplementary Figure A.2. Spectrum of new mutations from each time point. Types of mutation changes identified for each time point (A), as well as the frequency of each mutation type among all point mutations identified (B), are shown in stacked bar graphs. *P-value 3.521e-09 for chi-square goodness-of-fit test compared to mutation spectrum data from Lee et al. 2012.
Supplementary Figure A.3. Most genes and intergenic regions are mutated only once.
Supplementary Figure A.4. Minimum spanning tree encoding time as node color.
Earlier time points are dark purple and later time points are yellow.

Appendix B: Regulatory rewiring in a cross causes extensive genetic heterogeneity.
This work appears as published in 2015 in Genetics 201: 769–777.

GENETICS | INVESTIGATION
Regulatory Rewiring in a Cross Causes Extensive Genetic Heterogeneity
Takeshi Matsui, Robert Linder, Joann Phan, Fabian Seidl, and Ian M. Ehrenreich¹
Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089-2910

ABSTRACT Genetic heterogeneity occurs when individuals express similar phenotypes as a result of different underlying mechanisms. Although such heterogeneity is known to be a potential source of unexplained heritability in genetic mapping studies, its prevalence and molecular basis are not fully understood. Here we show that substantial genetic heterogeneity underlies a model phenotype (the ability to grow invasively) in a cross of two Saccharomyces cerevisiae strains. The heterogeneous basis of this trait across genotypes and environments makes it difficult to detect causal loci with standard genetic mapping techniques. However, using selective genotyping in the original cross, as well as in targeted backcrosses, we detected four loci that contribute to differences in the ability to grow invasively. Identification of causal genes at these loci suggests that they act by changing the underlying regulatory architecture of invasion. We verified this point by deleting many of the known transcriptional activators of invasion, as well as the gene encoding the cell surface protein Flo11, from five relevant segregants and showing that these individuals differ in the genes they require for invasion. Our work illustrates the extensive genetic heterogeneity that can underlie a trait and suggests that regulatory rewiring is a basic mechanism that gives rise to this heterogeneity.
KEYWORDS complex traits; genetic mapping; invasive growth; regulatory networks; yeast

Genetic studies in humans and model organisms have reported unexplained heritability for many traits (Manolio et al. 2009). A possible contributor to this "missing" heritability is genetic heterogeneity, in which individuals exhibit similar phenotypes owing to different genetic and molecular mechanisms (Risch 2000; McClellan and King 2010; Wray and Maier 2014). Genetic heterogeneity can reduce the statistical power of mapping studies (Manchia et al. 2013; Wray and Maier 2014) and may involve multiple variants segregating in the same gene (allelic heterogeneity) or different genes (nonallelic heterogeneity) (Risch 2000). Work to date has shown that allelic heterogeneity is widespread (e.g., McClellan and King 2010; Ehrenreich et al. 2012; Long et al. 2014) and often involves two or more null or partial loss-of-function variants segregating in a single phenotypically important gene (e.g., Nogee et al. 2000; Sutcliffe et al. 2005; Will et al. 2010). However, the prominence and underlying mechanisms of nonallelic heterogeneity are less understood.

In this paper we describe an example of nonallelic heterogeneity using heritable variation in the ability of Saccharomyces cerevisiae strains to undergo haploid invasive growth as our model. Invasive growth is a phenotype that is triggered by low carbon or nitrogen availability and is thought to be an adaptive response that allows yeast cells to adhere to and penetrate surfaces (Cullen and Sprague 2000). Invasion typically requires expression of FLO11, which encodes a cell surface glycoprotein that facilitates cell-cell and cell-surface adhesion (Lo and Dranginis 1998; Rupp et al. 1999). In addition to FLO11, S. cerevisiae possesses other cell surface proteins that can contribute to adhesion-related traits [as described in Guo et al. (2000) and Halme et al. (2004) and elsewhere].
In some cases, these cell surface proteins are regulated by multiple signaling cascades (Bruckner and Mosch 2012), potentially providing an opportunity for genetic variants in different pathways to have similar effects on invasion.

Copyright © 2015 by the Genetics Society of America
doi: 10.1534/genetics.115.180661
Manuscript received March 12, 2015; accepted for publication July 28, 2015; published Early Online July 30, 2015.
Supporting information is available online at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180661/-/DC1
¹Corresponding author: Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, CA 21546. E-mail: ian.ehrenreich@usc.edu
Genetics, Vol. 201, 769–777, October 2015

Here we examine the genetic basis of variation in the ability to invade on two carbon sources, glucose and ethanol, in a cross of the laboratory strain BY4716 (BY) and the clinical isolate YJM789 (YJM) (Liti et al. 2009). YJM is highly invasive on both carbon sources (Figure 1A). In contrast, BY cannot grow invasively on either carbon source (Figure 1A). This is because BY carries a nonsense allele of FLO8 (Figure 1B; see also Materials and Methods), which encodes a transcriptional activator that is regulated by the Ras-cAMP-PKA pathway. Flo8 is typically required for invasive growth in both S. cerevisiae (Liu et al. 1996) and Candida albicans (Cao et al. 2006). Consistent with the importance of FLO8 for invasion, deletion of this gene from YJM significantly reduces its invasive growth on both carbon sources (Figure 1B; see also Materials and Methods). While screening BYxYJM segregants for invasion on the two carbon sources, we found that many individuals exhibit invasion even though they possess the FLO8 BY nonsense allele, a result that also was recently reported by Song et al. (2014).
We show that this FLO8-independent growth has a heterogeneous genetic basis that reflects the presence of multiple distinct regulatory architectures that enable FLO8-independent invasion. Most of these regulatory architectures are FLO11 dependent but require different transcriptional activators; however, we also provide evidence for an architecture that is FLO11 independent. Our results suggest that regulatory rewiring is an important source of nonallelic genetic heterogeneity and illustrate how studying the causes of phenotypic similarities among genetically distinct individuals can advance our understanding of complex traits.

Materials and Methods

Generation of initial mapping population
We used the synthetic genetic array marker system (Tong et al. 2001) to generate recombinant BYxYJM MATa segregants. The BY parent of our cross was MATα can1Δ::STE2pr-SpHIS5 lyp1Δ his3Δ, while the YJM parent was MATa his3Δ::natMX ho::kanMX. We mated these BY and YJM haploids to produce the diploid progenitor of our cross, which was sporulated using standard techniques (Sherman 1991). MATa segregants were obtained using random spore plating on minimal medium containing canavanine, as described previously (Ehrenreich et al. 2010; Taylor and Ehrenreich 2014).

Phenotyping for invasive growth
Strains were phenotyped for invasive growth on 2% agar plates containing yeast extract and peptone (YP) with either 2% glucose (dextrose) or 2% ethanol as the carbon source (YPD and YPE, respectively). Prior to pinning onto the agar plates, strains were grown overnight to stationary phase in liquid YPD. After this culturing step, strains were pinned onto agar plates and allowed to grow for 5 days. Following this incubation period, we screened for invasive growth by applying water to the agar plates, manually scrubbing colonies, and decanting the mixture of water and cells. Presence or absence of invasion was scored by eye under a light microscope. Each segregant was phenotyped three independent times, and the median phenotype was used in analyses (Supporting Information, Table S1).
Genotyping by sequencing
Segregants were genotyped by Illumina sequencing. Whole-genome libraries were constructed using the Illumina Nextera XT DNA Library Preparation Kit. These libraries then were sequenced in multiplex to at least five times genomic coverage on either an Illumina HiSeq 2000 or an Illumina NextSeq 500 with 100 base pair (bp) × 100 bp reads. We also sequenced BY and YJM to ~100 times genomic coverage and used the data to identify 57,402 high-confidence SNPs. Reads for segregants were mapped to the BY genome using the Burrows-Wheeler Aligner (BWA) (Li and Durbin 2009) and SAMtools (Li et al. 2009). We called genotypes for each individual by taking the base calls at the SNPs and employing a hidden Markov model by chromosome using the HMM package in R, as described by Taylor and Ehrenreich (2014).

Data availability
The sequence data from our experiments are available from the NCBI Sequence Read Archive under accession numbers SRR2039809–SRR2039935, SRR2039936–SRR2039992, SRR2040045–SRR2040076, SRR2040023–SRR2040044, and SRR2039993–SRR2040022 (Table S1, Table S2, Table S3, Table S4, Table S5, and Table S6). All other data from the paper are provided in the Supplement or are available by request from the authors.

Detection of loci influencing ability to invade
Allele frequency analyses were computed using the genotype data of all individuals from a particular mapping population that exhibited the same phenotype. To determine the intervals of the identified causal loci, we identified regions where the alleles were either fixed or at a frequency of 95% or higher.
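The interval rule above (alleles fixed or at a frequency of 95% or higher among individuals sharing a phenotype) can be illustrated with a short sketch. Its assumptions are not from the paper: genotypes are coded per segregant as 0 for the BY allele and 1 for the YJM allele at ordered markers on one chromosome, an interval is taken to be a maximal run of consecutive markers where the same allele reaches the threshold, and the function name is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def enriched_intervals(
    positions: Sequence[int],
    genotypes: Sequence[Sequence[int]],
    threshold: float = 0.95,
) -> List[Tuple[int, int, str]]:
    """Return (start, end, allele) for each maximal run of consecutive markers
    at which one parental allele reaches `threshold` frequency.

    positions: ordered marker positions on one chromosome.
    genotypes: one row per segregant; 0 = BY allele, 1 = YJM allele (assumed coding).
    """
    n = len(genotypes)
    intervals: List[Tuple[int, int, str]] = []
    run_allele: Optional[str] = None
    run_start = run_end = 0
    for idx, pos in enumerate(positions):
        yjm_freq = sum(row[idx] for row in genotypes) / n
        allele = None
        if yjm_freq >= threshold:
            allele = "YJM"
        elif 1.0 - yjm_freq >= threshold:
            allele = "BY"
        if allele is not None and allele == run_allele:
            run_end = pos  # same allele still enriched: extend the current run
            continue
        if run_allele is not None:  # the previous run has ended
            intervals.append((run_start, run_end, run_allele))
        run_allele = allele
        run_start = run_end = pos  # only meaningful if a new run starts here
    if run_allele is not None:
        intervals.append((run_start, run_end, run_allele))
    return intervals
```

Applied to, say, only the ethanol-only invaders from one backcross, each returned run would be a candidate interval; real data would also need handling of missing genotype calls, which this sketch omits.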
Genetic engineering
Knockouts were generated by PCR amplifying the CORE cassette with homology-tailed primers and then selecting for transformants on G418 (Storici et al. 2001). NEB Phusion high-fidelity DNA polymerase was used for PCR under the recommended reaction conditions with 35 cycles and an extension time of 30 s per kilobase. The entire coding region of target genes was deleted in these strains. Correct integration of the CORE cassette was checked for each deletion strain using PCR. Allele replacement strains were constructed using the cotransformation of two partially overlapping PCR products (Figure S1), similar to the work of Erdeniz et al. (1997). One product contained the promoter and coding region of the gene to be replaced, while the other included (in order) 60 bp of overlap with the 3′ end of the gene PCR product, kanMX or natMX, and 30–50 bp of the genomic region immediately downstream of the transcribed portion of the gene. Replacement of a gene was verified using Sanger sequencing.

Generation of backcross segregants
Backcrosses were conducted by mating a BYxYJM segregant to a MATα his3Δ version of BY or YJM. Sporulation and selection for MATa backcross segregants were performed as described for the initial mapping population.

Screening for mating type and nongenetic effects
To induce mating-type switching in our MATa segregants, we first deleted URA3 from these individuals using the hphMX cassette with homology-tailed primers, as described earlier. Correct integration of the cassette was verified using PCR and further checked by plating the ura3Δ strains onto 5-FOA plates. Next, mating-type switching was performed using the pGAL-HO plasmid, as described previously (Herskowitz and Jensen 1991). Otherwise isogenic MATa and MATα individuals were then mated to produce homozygous diploids. These individuals were sporulated as described earlier, and standard microdissection techniques were used to obtain spores from the homozygous diploids.
Tetrads from which all four spores were recovered were then grown on glucose and ethanol and checked for the ability to invade (Table S7).

Amplification of the FLO11 coding region
The entire FLO11 coding region was PCR amplified using 5′-GGAAGAGCGAGTAGCAACCA as the forward primer and 5′-TTGTAGGCCTCAAAAATCCA as the reverse primer. The sizes of the BY and YJM alleles were compared on a 2% agarose gel.

Results

Many BYxYJM segregants show invasion that is independent of FLO8
We examined a population of 127 genotyped BYxYJM MATa segregants for the ability to invade on two carbon sources, glucose and ethanol (see Materials and Methods). Despite the major role of FLO8 in the invasion phenotypes of BY and YJM (Figure 1, A and B), we unexpectedly found that a large fraction (52%) of segregants with the FLO8 BY nonsense allele were capable of invading in at least one condition (Figure 1C). A possible explanation for these individuals' phenotypes is that FLO8 BY is partially functional in some genetic backgrounds. Flo8 is composed of a LisH domain (amino acids 72–105) that is involved in physical interactions with the transcription factor Mss11 and a transcriptional activation domain (amino acids 701–799) that is necessary for DNA binding (Kim et al. 2014). The nonsense polymorphism in FLO8 BY occurs after the LisH domain at amino acid 142, suggesting that the truncated Flo8 may retain some functionality. We tested for partial functionality of FLO8 BY by deleting the entire coding portion of FLO8 from multiple invasive FLO8 BY segregants and phenotyping them for invasive growth on glucose and ethanol (see Materials and Methods). Complete deletion of FLO8 had no effect on invasion, suggesting that other mechanisms enable these individuals to grow invasively.

Initial effort to identify loci underlying FLO8-independent invasion
As a first step in identifying the genetic basis of FLO8-independent invasion, we screened 384 additional F2 segregants for invasion on glucose and ethanol.
We obtained 55 invasive FLO8 BY individuals from this experiment, bringing the total number of invasive FLO8 BY individuals to 97. Among these 97 individuals, 50% were invasive on both glucose and ethanol, 37% were invasive only on glucose, and 12% were invasive only on ethanol (Figure 1D). We genotyped the 55 new individuals using low-coverage genome sequencing and attempted to detect enriched alleles among the larger set of 97 genotyped FLO8 BY strains that were capable of invasion (see Materials and Methods). Although our past work suggested that such selective genotyping should have high statistical power (Ehrenreich et al. 2010), even in the presence of complex nonadditive genetic effects (Taylor and Ehrenreich 2014), we failed to detect any loci using this strategy (Figure S2A).

Figure 1 Effects of FLO8 on ability to invade. (A) BY and YJM were grown for 5 days on YPD or YPE plates at 30°. Colonies were then washed off the plates using water and examined for invasion. (B) Comparison of BY with a functional allele of FLO8 and YJM flo8Δ. (C) Fraction of the initial mapping population of 127 F2 BYxYJM segregants that shows invasion on glucose ("glu") or ethanol ("eth") in each FLO8 genotype class. (D) Fraction of the 97 invasive FLO8 BY segregants that shows invasion on glucose, ethanol, or both carbon sources ("both").

FLO8-independent invasion in glucose-only individuals depends on the MAPK cascade
We hypothesized that FLO8-independent invasion is genetically heterogeneous in the BYxYJM cross, reducing the statistical power of our genetic mapping effort. To mitigate this potential problem, we attempted to identify causal loci by focusing on different classes of FLO8 BY segregants. We first looked at FLO8 BY individuals that showed invasion on both glucose and ethanol, but this analysis did not identify any loci (Figure S2B).
We next examined individuals that invaded in only one condition, under the assumption that different mechanisms might underlie condition-specific invasion. Among the segregants showing FLO8-independent invasion only on glucose (n = 36), nearly all of these individuals carried the BY allele of a locus on chromosome VIII, which we were able to delimit to 10 genes (Figure 2A; see also Materials and Methods).

To determine the causal gene(s) at the chromosome VIII locus, we replaced the BY allele of each gene in this interval with the YJM allele in a FLO8 BY segregant that was invasive only on glucose (Segregant 1; see also Materials and Methods). Each replacement spanned the promoter, coding region, and part of the downstream region of the tested gene (Figure S1). The only replacement that had an effect was GPA1, a subunit of the G-protein-coupled receptor involved in the mitogen-activated protein kinase (MAPK) cascade pheromone response (Fujimura 1989). Converting Segregant 1's GPA1 allele to the YJM version rendered the strain nearly incapable of invading on glucose and had no effect on ethanol (Figure 2B). BY is known to possess a laboratory-derived amino acid variant (S469I) in GPA1 that causes a large number of gene expression changes specifically in glucose (Yvert et al. 2003; Smith and Kruglyak 2008). This amino acid substitution also may be the causal variant in our study.

Multiple architectures of FLO8-independent invasion in ethanol-only individuals
We next studied FLO8 BY individuals that were invasive only on ethanol. Because our sample size for this group was small (n = 12), we generated backcross populations in a manner similar to Taylor and Ehrenreich (2014) and used these populations to identify loci that influence invasive growth in a single segregant (Segregant 2; see Materials and Methods).
In the backcross to BY, we screened 192 segregants and found that 16% were invasive only on ethanol. Among these individuals (n = 30), we identified a single locus that was nearly fixed for the YJM allele (Figure 3A, top), which was located on chromosome IX and overlapped FLO11. FLO11 is known to harbor extensive functional variation across yeast isolates in both its coding and noncoding regions (Fidalgo et al. 2006, 2008). To test for functional variation at FLO11 in the BYxYJM cross, we separately replaced the coding and noncoding regions of FLO11 in Segregant 2 with the BY alleles (Figure S1; see also Materials and Methods). We found that replacement of the FLO11 coding region caused a loss of invasion on ethanol (Figure 3B), while replacement of the noncoding region had no effect. A number of amino acid differences, as well as a ~700-bp length difference, distinguish the BY and YJM alleles of FLO11 (Figure S3 and Figure S4), making it difficult to determine the causal variant.

In the backcross of Segregant 2 to YJM, we also screened 192 segregants and found that 11% were invasive only on ethanol. Among these individuals (n = 22), we identified a single locus on chromosome XIV that was fixed for the BY allele. Based on the genotype data, we delimited this interval to 16 candidate genes (Figure 3A, bottom; see also Materials and Methods).

Figure 2 Genetic dissection of FLO8-independent glucose-only invasion. (A) Genome-wide relative allele frequency plot of glucose-only FLO8 BY BYxYJM segregants. FLO8 and the markers used to generate haploid progeny are highlighted by red vertical bars, while the strongly enriched locus on chromosome VIII, which was nearly fixed for the BY allele, is highlighted by a green vertical bar. The genomic interval underlying the chromosome VIII peak is also provided. (B) Comparison of Segregant 1, a glucose-only FLO8 BY individual, and the GPA1 YJM Segregant 1 supports GPA1 as the causal gene underlying the chromosome VIII locus.
We tested every gene in this interval for an effect on Segregant 2's ability to invade using gene knockouts and found that only deletion of BNI1 resulted in a loss of invasion (Figure 3B; see also Materials and Methods). The BY and YJM alleles of BNI1 possess 31 coding SNPs, 7 of which are nonsynonymous, as well as 3 SNPs upstream of the gene (Figure S5). Bni1, which has been shown previously to affect invasive growth (Mosch and Fink 1997; Kang and Jiang 2005), is involved in the assembly of actin cables (Sagot et al. 2002) and physically interacts with multiple components of the MAPK cascade involved in pheromone response (Chen and Thorner 2007).

Although the FLO11 YJM coding region contributes to invasion on ethanol, not all the ethanol-only segregants possessed this allele. Among the 12 individuals that were invasive only on ethanol in our genotyped F2 population, two carried FLO11 BY. To determine the mechanism that allows these individuals to invade only on ethanol, we backcrossed one relevant segregant (Segregant 3) to BY and YJM. The YJM backcross exhibited very low sporulation; for this reason, we were only able to perform genetic mapping in the BY backcross. We screened 192 segregants and found 32 individuals (17%) that grew invasively only on ethanol. We performed genetic mapping to look for enriched alleles and identified a single locus on chromosome II, at which individuals were fixed for the YJM allele (Figure 4A). This locus was detected at a resolution of four genes, of which only AMN1 had an effect when deleted. To verify that the BY and YJM alleles functionally differ, we replaced Segregant 3's AMN1 YJM with AMN1 BY and found that this resulted in a loss of invasion (Figure 4B and Figure S1; see also Materials and Methods). An amino acid variant (D368V) in AMN1, which plays a role in daughter cell separation and exit from mitosis (Wang et al. 2003), has been implicated as a major determinant of FLO11-independent cell clumping in multiple studies (Yvert et al. 2003; Li et al. 2013) and also may be the causal variant in our study.
Testing for effects of mating type and nongenetic factors on FLO8-independent invasion
Nongenetic factors are known to influence the expression of traits in yeast crosses (e.g., Sirr et al. 2015) and also may contribute to FLO8-independent invasion. Additionally, because our experiments were conducted exclusively in MATa haploids, some of the FLO8-independent invasion may be mating-type dependent. To test both these possibilities, we generated and sporulated homozygous diploid versions of Segregants 1, 2, and 3 (see Materials and Methods). From each individual we obtained 7–10 four-spore tetrads. Only mating type and nongenetic factors should segregate among these spores (see Materials and Methods). If we have identified loci that depend on mating type, then invasion should cosegregate 2:2 with mating type. Alternatively, if nongenetic factors contribute to FLO8-independent invasion, then less than 100% of the examined spores should show the same phenotype as their progenitor.

Figure 3 Genetic dissection of ethanol-only invasion by backcrossing Segregant 2 to BY and YJM. (A) Genome-wide relative allele frequency plots for the BY and YJM backcrosses are shown on the top and bottom, respectively. FLO8 and the markers used to generate haploid progeny are highlighted with red vertical bars, while the strongly enriched intervals on chromosomes IX and XIV are highlighted with green vertical bars. The genomic intervals underlying the chromosome IX and XIV loci are also provided. (B) Comparison of Segregant 2, an ethanol-only FLO8 BY individual, to FLO11 YJM replacement and BNI1 deletion strains in the Segregant 2 background supports FLO11 and BNI1 as the causal genes underlying the chromosome IX and XIV loci, respectively.

The effects of mating type and nongenetic factors varied among the tested segregants. For Segregants 2 and 3, which only invade on ethanol, all the haploid spores also showed ethanol-only invasion (Table S7).
This indicates that mating type and nongenetic factors likely do not influence the phenotypes of these individuals. In contrast, Segregant 1, which only invades on glucose, provided evidence for both mating-type and nongenetic effects. Among the 40 tested spores from this individual, 16 of 20 MATa spores showed glucose-only invasion, while none of the 20 MATα spores exhibited invasion (Table S7). This suggests that Segregant 1's phenotype is mating-type dependent and also may have a nongenetic component.

Segregants that invade in a FLO8-independent manner require different transcription factors and cell surface proteins
Our results to this point indicate that FLO8-independent invasion has a heterogeneous basis that is largely genetic. This genetic heterogeneity might arise if distinct regulatory factors and/or cell surface proteins facilitate invasion in different segregants and environments. The possibility of such rewiring of invasive growth is supported by recent work showing that the Σ1278b strain requires the transcription factor Tec1 to express FLO11, while BY does not (Chin et al. 2012), as well as by experiments demonstrating extensive variability in transcription factor binding among progeny from the BYxYJM cross (Zheng et al. 2010). Further supporting such a scenario, some of the genes that we cloned have regulatory functions. For example, GPA1 influences signaling through the MAPK cascade, and the MAPK cascade is known to regulate Ste12, which is a transcriptional activator required for invasion in many pathogenic fungi (Lo and Dranginis 1998; Felden et al. 2014).

To explore whether regulatory rewiring might contribute to the genetic heterogeneity in our study, we deleted 11 transcription factors that are known to regulate invasion, as well as FLO11, from Segregants 1, 2, and 3 (see Materials and Methods).
We also performed these deletions in two additional individuals that showed FLO8-independent invasion on both glucose and ethanol (hereafter referred to as Segregant 4 and Segregant 5). Although some deletions had quantitative effects on invasion (Figure 5), we focused on cases where deletion of one of the examined genes caused inability to invade. Such complete losses of the phenotype indicate genes that are required for a particular segregant to express FLO8-independent invasion.

The examined segregants differed in their requirements for FLO11 and four transcription factors: MGA1, MSN1, RME1, and STE12 (Figure 5). None of the deletions caused Segregant 3 to lose its ability to invade, implying that this individual invades in a FLO11-independent manner that may not require the examined transcription factors. In contrast, Segregants 1, 2, 4, and 5 showed FLO11-dependent invasion but differed in the transcription factors that they require. Segregants 1 and 4 lost the ability to invade when STE12 was deleted, suggesting that their ability to invade is MAPK dependent. Segregants 2 and 5 required MSN1, a transcriptional activator that influences many traits in yeast. While MSN1 was the only transcription factor that caused loss of invasion in Segregant 2, Segregant 5 also lost its ability to invade when MGA1 and RME1 were deleted. The finding that individuals differ in the transcription factors and cell surface proteins that they require for invasion supports regulatory rewiring as a cause of genetic heterogeneity in our study.

Figure 4 Genetic dissection of FLO11-independent ethanol-only invasion by backcrossing of Segregant 3 to BY. (A) Genome-wide relative allele frequency plot of ethanol-only invasion in the backcross of Segregant 3 to BY. The marker used to generate haploid progeny is highlighted with a red vertical bar, while the enriched locus on chromosome II is highlighted with a green vertical bar. The genomic interval underlying the chromosome II locus is also provided. (B) Comparison of Segregant 3, a FLO11-independent ethanol-only FLO8 BY individual, to AMN1 BY replacement strains in the Segregant 3 background supports AMN1 as the causal gene underlying the chromosome II locus.

Conclusion
We have shown that a model phenotype in yeast, haploid invasive growth, exhibits extensive nonallelic genetic heterogeneity. This heterogeneity is caused by genetic variants that change the regulation of invasive growth and enable FLO8-independent invasion in specific cross progeny. Our results from genetic mapping and genetic engineering experiments suggest that multiple distinct regulatory architectures of FLO8-independent invasion segregate in the BYxYJM cross. Although these regulatory architectures require different transcription factors and/or cell surface proteins, they lead to similar abilities to invade.

The present data do not shed light on the specific details of these different regulatory architectures. However, the finding that most BYxYJM segregants that show FLO8-independent invasion require FLO11 suggests that FLO11 expression is an important component of most of the regulatory architectures. This is of note because FLO11 has one of the largest promoters in the yeast genome and is thought to be influenced by at least 8 pathways and 15 transcription factors, as well as linked noncoding RNAs and chromatin remodeling complexes (Bruckner and Mosch 2012). The potential of FLO11 to be regulated by a number of different pathways may facilitate some of the variability in wiring that we have described.

Our finding that different transcription factors and cell surface proteins are required for different genetic backgrounds to invade is similar to the recent discovery of "conditional essential" genes in yeast (Dowell et al. 2010). These conditional essential genes are necessary for viability in some isolates but dispensable in others.
Our work suggests that conditional essentiality may arise because genetically distinct individuals express similar phenotypes as a result of different underlying regulatory mechanisms. If this is true, then the essentiality of a gene for a trait will depend on which signaling cascade(s) or pathway(s) an individual employs to express a given phenotype in a particular environment.

Given that we have examined a single phenotype in only one pairwise cross and two conditions, we cannot comment on the broader extent of this heterogeneity across species, traits, and environments. However, we note that our results are comparable to recent studies in humans [as summarized in McClellan and King (2010)] and mice (Shao et al. 2008; Spiezio et al. 2012), which have shown that many genetic perturbations can produce comparable phenotypic outcomes. To some degree, our effort also represents an integration of previous work describing genetic variation in regulatory pathways (Yvert et al. 2003) and transcription factor activity (Zheng et al. 2010; Chin et al. 2012) across yeast isolates. Importantly, we have extended these past studies by connecting changes in signaling and transcription factor activity, as identified via genetic techniques, to phenotypic outcomes.

Figure 5 Deletion screen of known FLO11 activators. FLO11 and a number of transcription factors that regulate invasive growth were knocked out in Segregants 1–5. These deletion strains then were phenotyped for their ability to invade.

Acknowledgments
We thank Jonathan Lee, Martin Mullis, Matthew Taylor, Lars Steinmetz, and two anonymous reviewers for critically reviewing a draft of this manuscript.
We also thank Sammi Ali for technical assistance with this project, Oscar Aparicio for the pGAL-HO plasmid, Charles Nicolet and the USC Epigenome Center staff and Jinliang Li and the staff at Laragen for their help with Illumina sequencing, and Peter Calabrese for comments on this project during its implementation. Our work was supported in part by grants from the National Institutes of Health (R01GM110255 and R21AI108939), the National Science Foundation (MCB1330874), the Army Research Office (W911NF-14-1-0318), the Alfred P. Sloan Foundation, and the Rose Hills Foundation to I.M.E.

Literature Cited
Bruckner, S., and H. U. Mosch, 2012 Choosing the right lifestyle: adhesion and development in Saccharomyces cerevisiae. FEMS Microbiol. Rev. 36: 25–58.
Cao, F., S. Lane, P. Raniga, Z. Zhou, K. Ramon et al., 2006 The Flo8 transcription factor is essential for hyphal development and virulence in Candida albicans. Mol. Biol. Cell 17: 295–307.
Chen, R. E., and J. Thorner, 2007 Function and regulation in MAPK signaling pathways: lessons learned from the yeast Saccharomyces cerevisiae. Biochim. Biophys. Acta 1773: 1311–1340.
Chin, B. L., O. Ryan, F. Lewitter, C. Boone, and G. R. Fink, 2012 Genetic variation in Saccharomyces cerevisiae: circuit diversification in a signal transduction network. Genetics 192: 1523–1532.
Cullen, P. J., and G. F. Sprague, Jr., 2000 Glucose depletion causes haploid invasive growth in yeast. Proc. Natl. Acad. Sci. USA 97: 13619–13624.
Dowell, R. D., O. Ryan, A. Jansen, D. Cheung, S. Agarwala et al., 2010 Genotype to phenotype: a complex problem. Science 328: 469.
Ehrenreich, I. M., J. Bloom, N. Torabi, X. Wang, Y. Jia et al., 2012 Genetic architecture of highly complex chemical resistance traits across four yeast strains. PLoS Genet. 8: e1002570.
Ehrenreich, I. M., N. Torabi, Y. Jia, J. Kent, S. Martis et al., 2010 Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464: 1039–1042.
Erdeniz, N., U. H. Mortensen, and R. Rothstein, 1997 Cloning-free PCR-based allele replacement methods. Genome Res. 7: 1174–1183.
Felden, J., S. Weisser, S. Bruckner, P. Lenz, and H. U. Mosch, 2014 The transcription factors Tec1 and Ste12 interact with coregulators Msa1 and Msa2 to activate adhesion and multicellular development. Mol. Biol. Cell 34: 2283–2293.
Fidalgo, M., R. R. Barrales, J. I. Ibeas, and J. Jimenez, 2006 Adaptive evolution by mutations in the FLO11 gene. Proc. Natl. Acad. Sci. USA 103: 11228–11233.
Fidalgo, M., R. R. Barrales, and J. Jimenez, 2008 Coding repeat instability in the FLO11 gene of Saccharomyces yeasts. Yeast 25: 879–889.
Fujimura, H. A., 1989 The yeast G-protein homolog is involved in the mating pheromone signal transduction system. Mol. Cell. Biol. 9: 152–158.
Guo, B., C. A. Styles, Q. Feng, and G. R. Fink, 2000 A Saccharomyces gene family involved in invasive growth, cell-cell adhesion, and mating. Proc. Natl. Acad. Sci. USA 97: 12158–12163.
Halme, A., S. Bumgarner, C. Styles, and G. R. Fink, 2004 Genetic and epigenetic regulation of the FLO gene family generates cell-surface variation in yeast. Cell 116: 405–415.
Herskowitz, I., and R. E. Jensen, 1991 Putting the HO gene to work: practical uses for mating-type switching. Methods Enzymol. 194: 132–146.
Kang, C. M., and Y. W. Jiang, 2005 Genome-wide survey of non-essential genes required for slowed DNA synthesis-induced filamentous growth in yeast. Yeast 22: 79–90.
Kim, H. Y., S. B. Lee, H. S. Kang, G. T. Oh, and T. Kim, 2014 Two distinct domains of Flo8 activator mediates its role in transcriptional activation and the physical interaction with Mss11. Biochem. Biophys. Res. Commun. 449: 202–207.
Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The Sequence Alignment/Map format and SAMtools. Bioinformatics 25: 2078–2079.
Li, J., L. Wang, X. Wu, O. Fang, L. Wang et al., 2013 Polygenic molecular architecture underlying non-sexual cell aggregation in budding yeast. DNA Res. 20: 55–66.
Liti, G., D. M. Carter, A. M. Moses, J. Warringer, L. Parts et al., 2009 Population genomics of domestic and wild yeasts. Nature 458: 337–341.
Liu, H., C. A. Styles, and G. R. Fink, 1996 Saccharomyces cerevisiae S288C has a mutation in FLO8, a gene required for filamentous growth. Genetics 144: 967–978.
Lo, W. S., and A. M. Dranginis, 1998 The cell surface flocculin Flo11 is required for pseudohyphae formation and invasion by Saccharomyces cerevisiae. Mol. Biol. Cell 9: 161–171.
Long, A. D., S. J. Macdonald, and E. G. King, 2014 Dissecting complex traits using the Drosophila Synthetic Population Resource. Trends Genet. 30: 488–495.
Manchia, M., J. Cullis, G. Turecki, G. A. Rouleau, R. Uher et al., 2013 The impact of phenotypic and genetic heterogeneity on results of genome wide association studies of complex diseases. PLoS One 8: e76295.
Manolio, T. A., F. S. Collins, N. J. Cox, D. B. Goldstein, L. A. Hindorff et al., 2009 Finding the missing heritability of complex diseases. Nature 461: 747–753.
McClellan, J., and M. C. King, 2010 Genetic heterogeneity in human disease. Cell 141: 210–217.
Mosch, H. U., and G. R. Fink, 1997 Dissection of filamentous growth by transposon mutagenesis in Saccharomyces cerevisiae. Genetics 145: 671–684.
Nogee, L. M., S. E. Wert, S. A. Proffit, W. M. Hull, and J. A. Whitsett, 2000 Allelic heterogeneity in hereditary surfactant protein B (SP-B) deficiency. Am. J. Respir. Crit. Care Med. 161: 973–981.
Risch, N. J., 2000 Searching for genetic determinants in the new millennium. Nature 405: 847–856.
Rupp, S., E. Summers, H. J. Lo, H. Madhani, and G. Fink, 1999 MAP kinase and cAMP filamentation signaling pathways converge on the unusually large promoter of the yeast FLO11 gene. EMBO J. 18: 1257–1269.
Sagot, I., S. K. Klee, and D. Pellman, 2002 Yeast formins regulate cell polarity by controlling the assembly of actin cables. Nat. Cell Biol. 4: 42–50.
Shao, H., L. C. Burrage, D. S. Sinasac, A. E. Hill, S. R. Ernest et al., 2008 Genetic architecture of complex traits: large phenotypic effects and pervasive epistasis. Proc. Natl. Acad. Sci. USA 105: 19910–19914.
Sherman, F., 1991 Guide to yeast genetics and molecular, pp. 3–21 in Methods in Enzymology, edited by C. Guthrie, and G. R. Fink. Elsevier Academic Press, San Diego.
Sirr, A., G. A. Cromie, E. W. Jeffery, T. L. Gilbert, C. L. Ludlow et al., 2015 Allelic variation, aneuploidy, and nongenetic mechanisms suppress a monogenic trait in yeast. Genetics 199: 247–262.
Smith, E. N., and L. Kruglyak, 2008 Gene-environment interaction in yeast gene expression. PLoS Biol. 6: e83.
Song, Q., C. Johnson, T. E. Wilson, and A. Kumar, 2014 Pooled segregant sequencing reveals genetic determinants of yeast pseudohyphal growth. PLoS Genet. 10: e1004570.
Spiezio, S. H., T. Takada, T. Shiroishi, and J. H. Nadeau, 2012 Genetic divergence and the genetic architecture of complex traits in chromosome substitution strains of mice. BMC Genet. 13: 38.
Storici, F., L. K. Lewis, and M. A. Resnick, 2001 In vivo site-directed mutagenesis using oligonucleotides. Nat. Biotechnol. 19: 773–776.
Sutcliffe, J. S., R. J. Delahanty, H. C. Prasad, J. L. McCauley, Q. Han et al., 2005 Allelic heterogeneity at the serotonin transporter locus (SLC6A4) confers susceptibility to autism and rigid-compulsive behaviors. Am. J. Hum. Genet. 77: 265–279.
Taylor, M. B., and I. M. Ehrenreich, 2014 Genetic interactions involving five or more genes contribute to a complex trait in yeast. PLoS Genet. 10: e1004324.
Tong, A. H., M. Evangelista, A. B. Parsons, H. Xu, G. D. Bader et al., 2001 Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science 294: 2364–2368.
Wang, Y., T. Shirogane, D. Liu, J. W. Harper, and S. J. Elledge, 2003 Exit from exit: resetting the cell cycle through Amn1 inhibition of G protein signaling. Cell 112: 697–709.
Will, J. L., H. S. Kim, J. Clarke, J. C. Painter, J. C. Fay et al., 2010 Incipient balancing selection through adaptive loss of aquaporins in natural Saccharomyces cerevisiae populations. PLoS Genet. 6: e1000893.
Wray, N. R., and R. Maier, 2014 Genetic basis of complex genetic disease: the contribution of disease heterogeneity to missing heritability. Curr. Epidemiol. Rep. 1: 220–227.
Yvert, G., R. B. Brem, J. Whittle, J. M. Akey, E. Foss et al., 2003 Trans-acting regulatory variation in Saccharomyces cerevisiae and the role of transcription factors. Nat. Genet. 35: 57–64.
Zheng, W., H. Zhao, E. Mancera, L. M. Steinmetz, and M. Snyder, 2010 Genetic analysis of variation in transcription factor binding in yeast. Nature 464: 1187–1191.

Communicating editor: L. M. Steinmetz

GENETICS Supporting Information
www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180661/-/DC1
Regulatory Rewiring in a Cross Causes Extensive Genetic Heterogeneity
Takeshi Matsui, Robert Linder, Joann Phan, Fabian Seidl, and Ian M. Ehrenreich
Copyright © 2015 by the Genetics Society of America
DOI: 10.1534/genetics.115.180661

Figure S1. Construction of allele replacements. In the first step, one pair of primers (F1 and R1) was used to amplify the promoter and the coding sequence of the gene to be replaced, with 60 bp overlapping the 5′ end of the resistance marker attached at the 3′ end of the PCR product (shown in orange). Another pair of primers (F2 and R2) was used to amplify the resistance marker, with 60 bp overlapping the genomic region immediately downstream of the transcribed portion of the gene (as amplified by the first primer pair) attached at the 3′ end of the PCR product. In the second step, the two overlapping PCR products were transformed into the strains.
Integration into the genome requires recombination between the PCR products and the target locus.

Figure S2. Initial results from selective genotyping of segregants that show FLO8-independent invasion. (A) Comparison of the genome-wide relative allele frequency plot among FLO8 BY invasive progeny to that of a non-invasive FLO8 BY control population. (B) Genome-wide relative allele frequency plot among FLO8 BY segregants that invade on both glucose and ethanol.

Figure S3. Differences in FLO11 coding region length between BY and YJM. PCR was used to amplify the FLO11 coding region from the BY and YJM strains. The size of FLO11 BY was ~4.1 kb, while FLO11 YJM was ~3.4 kb.
[Gel image: BY and YJM lanes alongside a 0.5–5.0 kb size ladder.]

Figure S4. Replacement of the FLO11 coding region in segregant 2 with the BY allele causes loss of invasion. To verify that FLO11 BY was correctly integrated and replaced using our one-step allele replacement, we PCR amplified the 5′ end of the gene and Sanger sequenced multiple invasive and non-invasive transformants. Only the transformants carrying the BY SNPs (marked in black) toward the 5′ end showed loss of invasion, implying that only individuals with most of the FLO11 gene replaced exhibited loss of invasion. The Flo11 protein is composed of three domains, which are reflected in the sequence of the FLO11 gene. The N-terminal portion of the protein encodes a hydrophobic signal sequence, is exposed at the cell surface, and binds to ligands. The middle domain largely contains variable-length tandem repeats that are enriched for serines and threonines, and is the part of the protein where heavy glycosylation occurs. The C-terminal portion of the protein is a GPI anchor that localizes Flo11 to the cell wall. The highly repetitive nature of the middle portion of FLO11 makes it difficult to accurately determine the length and sequence of the gene using short Illumina reads.
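The FLO11 and BNI1 legends split the SNPs between the BY and YJM alleles into synonymous and nonsynonymous changes. As an illustration only, the sketch below shows one common way to make that call, assuming two aligned, gap-free, in-frame coding sequences of equal length and judging each SNP on its own against the reference (BY) codon. The helper `classify_snps` and the toy sequences are hypothetical; this is not the pipeline used for these figures, and it does not handle indels such as the 45 bp insertion in FLO11.

```python
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def classify_snps(ref: str, alt: str) -> list[tuple[int, str]]:
    """Return (1-based position, "syn" or "nonsyn") for each SNP between two aligned CDSs."""
    ref, alt = ref.upper(), alt.upper()
    if len(ref) != len(alt) or len(ref) % 3 != 0:
        raise ValueError("expected equal-length, in-frame coding sequences without gaps")
    calls = []
    for pos, (r, a) in enumerate(zip(ref, alt)):
        if r == a:
            continue
        start = pos - pos % 3  # first base of the codon containing this SNP
        ref_codon = ref[start:start + 3]
        # Substitute only this base, so each SNP is judged against the reference codon.
        alt_codon = ref_codon[: pos - start] + a + ref_codon[pos - start + 1:]
        effect = "syn" if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon] else "nonsyn"
        calls.append((pos + 1, effect))
    return calls


if __name__ == "__main__":
    # Toy sequences (hypothetical): ATG AAA TTT GGC versus ATC AAG TTA GGA.
    snps = classify_snps("ATGAAATTTGGC", "ATCAAGTTAGGA")
    print(snps)  # [(3, 'nonsyn'), (6, 'syn'), (9, 'nonsyn'), (12, 'syn')]
    print(sum(effect == "nonsyn" for _, effect in snps), "of", len(snps), "SNPs are nonsynonymous")
```

Judging each SNP in the reference background is a design choice: when two SNPs fall in the same codon, other conventions (for example, counting over all mutational pathways) can give different tallies.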
In the regions that we were able to confidently align, we identified 69 SNPs between the BY and the YJM alleles, of which 31 were nonsynonymous. In addition, we identified that the YJM allele of FLO11 has a 45 bp insertion in the N-terminal region between amino acid positions 123 and 124. We also found that no sequencing reads from YJM mapped to 635 base positions in comparison to BY, which is most likely due to deletions, given that the YJM allele of FLO11 was ~700 bases smaller than the BY allele (Figure S4). In particular, large stretches of the middle domain were missing from amino acid positions 207 to 315, 359 to 372, 409 to 449, 795 to 808, 824 to 845, and 881 to 899 in the YJM allele. We have not yet determined how these changes alter the functionality of Flo11. We note that this portion of the gene is known to be highly variable across yeast strains, affecting many FLO11-dependent traits, such as biofilm formation, flocculation, and invasion.
[Figure S4 panel: alignment of BY and YJM sequence (positions 100–500) for non-invasive and invasive transformants.]

Figure S5. Alignment of the BNI1 gene. (A) Alignment of the nucleotide sequences identified 31 SNPs between the BY and YJM alleles of BNI1. (B) Alignment of the translated amino acid sequences revealed that 7 SNPs were nonsynonymous.

Table S1. Phenotype data and Short Read Archive identifiers for the segregants examined in the paper. Available for download as an Excel file at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180661/-/DC1

Tables S2–S6. Available for download as .txt files at www.genetics.org/lookup/suppl/doi:10.1534/genetics.115.180661/-/DC1
Table S2. Genotype data for the initial 127 BYxYJM segregants. Genotypes in this table, as well as the following tables, are encoded as 0 for BY and 1 for YJM.
Table S3.
Genotype data for the additional 55 BYxYJM segregants that show FLO8-independent invasion.
Table S4. Genotype data for the backcross of Segregant 2 to BY.
Table S5. Genotype data for the backcross of Segregant 2 to YJM.
Table S6. Genotype data for the backcross of Segregant 3 to BY.

Table S7. Analysis of dissected tetrads from homozygous diploid derivatives of specific segregants. Phenotypes of spores from homozygous diploid versions of Segregants 1, 2, and 3.

Segregant  Tetrad  MATa spore 1  MATa spore 2  MATalpha spore 1  MATalpha spore 2
1          1       N             N             N                 N
1          2       I             N             N                 N
1          3       I             N             N                 N
1          4       I             I             N                 N
1          5       I             I             N                 N
1          6       I             I             N                 N
1          7       I             I             N                 N
1          8       I             I             N                 N
1          9       I             I             N                 N
1          10      I             I             N                 N
2          1       I             I             I                 I
2          2       I             I             I                 I
2          3       I             I             I                 I
2          4       I             I             I                 I
2          5       I             I             I                 I
2          6       I             I             I                 I
2          7       I             I             I                 I
3          1       I             I             I                 I
3          2       I             I             I                 I
3          3       I             I             I                 I
3          4       I             I             I                 I
3          5       I             I             I                 I
3          6       I             I             I                 I
3          7       I             I             I                 I
I = Invasive, N = Non-invasive

Appendix C: The complex genetic and molecular basis of a model quantitative trait.
This work appears as published in 2016 in Molecular Biology of the Cell. 27(1): 209-18

Volume 27 January 1, 2016 209 MBoC | ARTICLE

The complex genetic and molecular basis of a model quantitative trait
Robert A. Linder, Fabian Seidl, Kimberly Ha, and Ian M. Ehrenreich
Molecular and Computational Biology Section, University of Southern California, Los Angeles, CA 90089-2910

ABSTRACT Quantitative traits are often influenced by many loci with small effects. Identifying most of these loci and resolving them to specific genes or genetic variants is challenging. Yet, achieving such a detailed understanding of quantitative traits is important, as it can improve our knowledge of the genetic and molecular basis of heritable phenotypic variation.
In this study, we use a genetic mapping strategy that involves recurrent backcrossing with phenotypic selection to obtain new insights into an ecologically, industrially, and medically relevant quantitative trait: tolerance of oxidative stress, as measured by resistance to hydrogen peroxide. We examine the genetic basis of hydrogen peroxide resistance in three related yeast crosses and detect 64 distinct genomic loci that likely influence the trait. By precisely resolving or cloning a number of these loci, we demonstrate that a broad spectrum of cellular processes contribute to hydrogen peroxide resistance, including DNA repair, scavenging of reactive oxygen species, stress-induced MAPK signaling, translation, and water transport. Consistent with the complex genetic and molecular basis of hydrogen peroxide resistance, we show two examples where multiple distinct causal genetic variants underlie what appears to be a single locus. Our results improve understanding of the genetic and molecular basis of a highly complex, model quantitative trait.

INTRODUCTION
Mapping experiments in model organisms have played, and continue to play, a crucial role in advancing our understanding of the genetic and molecular basis of quantitative traits (Mackay et al., 2009; Bloom et al., 2013). However, as discussed elsewhere (e.g., Parts et al., 2011; Cubillos et al., 2013), these studies typically involve genetic mapping strategies that can identify loci only at the resolution of broad genomic regions. For quantitative traits that are influenced by a large number of loci, this mapping resolution can make it challenging to detect and localize causal genetic variants, and can also make it difficult to measure accurately the effects of individual loci.
Recent work in Saccharomyces cerevisiae highlights this problem.
Studies using large, statistically powerful mapping populations in crosses involving two strains have detected dozens of loci per phenotype (Ehrenreich et al., 2010, 2012; Parts et al., 2011; Bloom et al., 2013; Taylor and Ehrenreich, 2014; Treusch et al., 2015). However, as more strains are considered, the number of loci identified for a given trait typically increases substantially, with loci often showing complicated patterns of detection across genetic backgrounds (e.g., Cubillos et al., 2011; Ehrenreich et al., 2012; Treusch et al., 2015). A variety of biological phenomena, including genetic interactions among loci or close linkage of different genetic variants with phenotypic effects, might explain these observations. Differentiating among these possibilities requires identifying combinations of interacting alleles and precisely delimiting loci to very small genomic intervals, ideally to specific genes and nucleotides.
In this study, we dissect at high resolution the genetic basis of heritable variation in hydrogen peroxide resistance among three budding yeast isolates: the lab strain BY4716, the wine strain RM11-1a, and the oak strain YPS163 (hereafter BY, RM, and YPS, respectively). We chose these strains because they are known to possess genetically complex differences in their tolerances of hydrogen peroxide (Kvitek et al., 2008; Ehrenreich et al., 2012). To determine the genetic basis of this heritable phenotypic variation, we

Monitoring Editor: Charles Boone, University of Toronto
Received: Jun 16, 2015; Revised: Sep 25, 2015; Accepted: Oct 21, 2015
This article was published online ahead of print in MBoC in Press (http://www.molbiolcell.org/cgi/doi/10.1091/mbc.E15-06-0408) on October 28, 2015.
Address correspondence to: Ian Ehrenreich (Ian.Ehrenreich@usc.edu).
© 2016 Linder et al. This article is distributed by The American Society for Cell Biology under license from the author(s).
Two months after publication it is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).
"ASCB®," "The American Society for Cell Biology®," and "Molecular Biology of the Cell®" are registered trademarks of The American Society for Cell Biology.
Abbreviations used: ANOVA, analysis of variance; FDR, false discovery rate; MIC, minimum inhibitory concentration; RH, reciprocal hemizygosity analysis; SNP, single-nucleotide polymorphism; YPD, yeast extract–peptone–dextrose.

of these loci at the resolution of small genomic windows, as well as cloning of certain quantitative trait genes and nucleotides, indicates that many distinct molecular processes contribute to hydrogen peroxide resistance. Consistent with the high genetic and molecular complexity of this trait, we show that multiple causal genetic variants can underlie what appears to be a single locus based on genetic mapping data. Our results advance understanding of the genetic and molecular basis of highly complex quantitative traits in yeast and potentially other organisms as well.

RESULTS
Generation of mapping populations using recurrent backcrossing with phenotypic selection
We first determined the minimum inhibitory concentrations (MICs) for hydrogen peroxide of BYxRM, BYxYPS, and RMxYPS segregants (Supplemental Figure S1 and Supplemental Table S1; Materials and Methods). For each cross, 864 haploid recombinants were screened. After this initial phenotyping, each of the five most resistant F2 segregants from the three crosses was individually subjected to several rounds of selective backcrossing to both of its parents (Figure 1; Materials and Methods). This resulted in the generation of two F2B3 families per hydrogen peroxide–resistant F2 segregant.
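As a rough, idealized guide to why recurrent backcrossing narrows the donor contribution, assume that a haploid F2 segregant carries on average half of each parental genome and that each backcross to the recurrent parent halves the donor share at unlinked regions that are not under selection. The expected donor fraction after k backcrosses is then as follows. This is a textbook expectation under those assumptions, not a quantity reported in the study; regions carrying selected resistance alleles are instead expected to be retained.

$$
\mathbb{E}\left[f_{\mathrm{donor}}(\mathrm{F2B}k)\right] = \frac{1}{2}\left(\frac{1}{2}\right)^{k} = \left(\frac{1}{2}\right)^{k+1},
\qquad
\mathbb{E}\left[f_{\mathrm{donor}}(\mathrm{F2B3})\right] = \frac{1}{16} = 6.25\%.
$$

Under these assumptions an F2B3 segregant is expected to carry roughly 94% of the recurrent parent's genome outside selected intervals, which is why donor-derived regions that persist in highly resistant F2B3 segregants point toward causal loci.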
During the iterations of backcrossing, haploid recombinants were frozen to create immortalized stocks and then phenotyped using colony growth assays conducted across a range of hydrogen peroxide doses (Materials and Methods). The most resistant haploid segregant in a given backcross was then used as the founder for the next round of backcrossing (Materials and Methods). To prevent hydrogen peroxide–induced mutations from accumulating during the backcrossing, we performed each round of crossing using fresh cultures from the frozen stocks that had never been exposed to hydrogen peroxide.
From each of the 30 backcross families, 12–15 highly resistant F2B3 progeny were genotyped by low-coverage, whole-genome sequencing and used to detect loci (Materials and Methods). In total, 417 F2B3 segregants were genotyped. However, we found karyotype instability in families derived from one of the BYxYPS F2 segregants and therefore excluded individuals in these families from the study (Supplemental Note S2). Thus, the analyses described here are based on the 392 F2B3 segregants that were generated from the remaining 28 backcross families that did not show aneuploidies.

Identification of loci that contribute to hydrogen peroxide resistance
Loci were identified within individual families based on allele frequency skew among the 12–15 resistant backcross segregants that had been genotyped (Materials and Methods). Given that resistant individuals from the same family were generated from a common diploid progenitor and identified by screening of individual strains (as opposed to pools of segregants), we do not expect bias in our results due to inadvertent selection on other traits, such as sporulation or mating efficiency.
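The per-family scan for allele frequency skew can be illustrated with a one-tailed binomial test (the test named in the Figure 4 legend) followed by false discovery rate control. The sketch below is a minimal pure-Python version under stated assumptions: at markers that segregate in a family's final cross, each allele is expected at frequency 0.5 among unselected progeny, and q-values come from the Benjamini–Hochberg procedure. The function names and counts are hypothetical, and the study's exact FDR procedure is the one given in its Materials and Methods.

```python
from math import comb


def binom_sf(k: int, n: int, p: float = 0.5) -> float:
    """One-tailed P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def bh_qvalues(pvals: list[float]) -> list[float]:
    """Benjamini-Hochberg q-values, returned in the same order as the input p-values."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so q-values stay monotone and capped at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


def min_count_for_significance(n: int, alpha: float = 0.05, p: float = 0.5) -> int:
    """Smallest allele count k among n segregants with one-tailed P(X >= k) <= alpha."""
    for k in range(n + 1):
        if binom_sf(k, n, p) <= alpha:
            return k
    return n + 1  # no count reaches alpha at this sample size


if __name__ == "__main__":
    n = 15  # resistant segregants genotyped in one hypothetical family
    counts = [8, 12, 14, 15]  # hypothetical counts carrying the donor allele at four markers
    pvals = [binom_sf(k, n) for k in counts]
    for k, pv, qv in zip(counts, pvals, bh_qvalues(pvals)):
        print(f"{k}/{n}: p = {pv:.4g}, q = {qv:.4g}")
    print("nominal cut-off:", min_count_for_significance(n), "of", n)
```

With 15 segregants, at least 12 must share an allele to pass a nominal one-tailed p ≤ 0.05, which illustrates why families of 12–15 segregants detect only loci with strong skew.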
Excluding the MAT locus on chromosome III, which is a control marker that we used to generate haploid segregants, we identified 60 loci at an FDR (q) of ≤ 0.05: 10 in BYxRM, 28 in BYxYPS, and 22 in RMxYPS (Figure 2 and Supplemental Figure S2; Materials and Methods). Lowering the threshold to 0.1 or 0.2 resulted in the detection of 79 and 104 total loci, respectively, with crosses involving YPS showing the highest number of detected loci (Figure 2).

Resolution of loci within individual families
Because we identified loci using segregants that had been individually genotyped, we were able to delimit loci detected in each family

generated multiple backcross populations from each cross of the three strains and implemented additional rounds of backcrossing coupled to stringent phenotypic selections (Figure 1). The rationale behind our approach was to delimit causal loci to discrete genomic intervals that had been introgressed into the backcross parent's genome. By conducting this process multiple times in parallel, we could then combine information from related backcrosses and map loci even more precisely while retaining high statistical power (Supplemental Note S1). Furthermore, by using such a mapping strategy, we hoped to introgress combinations of alleles that collectively confer resistance, as these alleles could then be tested for additive effects and genetic interactions in a manner similar to Taylor and Ehrenreich (2014, 2015a).
We focused on hydrogen peroxide resistance because it is commonly used as a proxy for oxidative stress tolerance, a trait that has potential relevance for both human health and yeast biology.
Susceptibility to oxidative stress has been linked to aging (Fabrizio et al., 2003; Braun and Westermann, 2011; Petti et al., 2011; Cui et al., 2012; Longo et al., 2012), as well as to Alzheimer's disease (Jomova et al., 2010; Greenough et al., 2013; Koppenhofer et al., 2015), diabetes (Varvarovska et al., 2004; Aouacheri et al., 2015), and other disorders. Furthermore, tolerance of oxidative stress has ecological and economic ramifications for yeast, in particular for strains that mainly inhabit aerobic environments or are used in fermentations or industrial applications (Higgins et al., 2003; Sasano et al., 2012; Dhar et al., 2013; Fierro-Risco et al., 2013; Brown et al., 2014; Kitagaki and Takagi, 2014).
By applying our backcrossing strategy to variation in hydrogen peroxide resistance in the BYxRM, BYxYPS, and RMxYPS crosses, we identified 64 distinct genomic loci that likely contribute to the trait. Analysis of allele combinations at certain subsets of these loci suggests that variability in hydrogen peroxide resistance has a genetic basis that is largely additive. Furthermore, identification of a number

FIGURE 1: Backcrossing strategy. F2B3 segregants with high hydrogen peroxide resistance were generated through multiple rounds of backcrossing with phenotypic selection. The filled rectangles at each stage represent chromosomes, with gray and black depicting chromosomal regions inherited from Parent 1 and Parent 2, respectively. The brackets shown beneath the chromosomes from resistant F2B3 segregants indicate causal loci. Numbers of individuals screened at each stage of crossing are reported in Materials and Methods.
[Figure 1 schematic labels: Parent 1, Parent 2, Resistant F2 Segregant, Resistant F2B Segregant, Resistant F2B2 Segregant, Resistant F2B3 Segregants.]

generated by repeated backcrossing to YPS (referred to here as (RMxYPS)xYPS families).
None of the loci detected in either of these families were the same (Figure 3A), and some of the loci identified in these families were not seen in mapping data for other RMxYPS families (Figure 2). We generated 96 random F2B3s from each of these families, screened these segregants for their MICs, and genotyped these segregants at the relevant loci (Materials and Methods). We then used the genotype and phenotype data to test whether the loci individually exhibited effects (Materials and Methods). When we did this, six of the eight (75%) tested loci showed significant additive effects (t test, p ≤ 0.045; Figure 3, B and C), a result that is consistent with our false discovery rate (FDR) threshold.
We also examined whether genetic interactions influenced the effects of any of the eight queried loci. Specifically, we used full factorial analysis of variance (ANOVA) models to assess the relationship between genotype at the loci with significant additive effects and MIC in the two (RMxYPS)xYPS families (Materials and Methods). These models included not only the additive terms corresponding to each individual locus, but also all possible pairwise and higher-order interaction terms among the loci. Although all of the loci continued to show significant additive effects in these models, only one significant interaction term (p < 0.05) was observed, which was between L7-I and L15 (Figure 3A). We also tested for statistically significant genetic interactions between alleles detected in the two families using F2 segregants but did not identify any such interactions (Supplemental Figure S4, A and B). These results suggest that genetic interactions contribute little to heritable variation in hydrogen peroxide resistance (Figure 3, D and E).
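For two biallelic loci, the pairwise interaction term of a full factorial model corresponds, in a balanced design, to a difference of differences among the four two-locus genotype means. The sketch below computes only that contrast, with hypothetical MIC values rather than data from the study; the analysis described above fits the full ANOVA, including higher-order terms, and tests significance, which this sketch does not do.

```python
from statistics import mean

Genotype = tuple[str, str]  # (allele at locus A, allele at locus B); "R" = resistance allele


def interaction_contrast(mic: dict[Genotype, list[float]]) -> float:
    """(RR - YR) - (RY - YY) on genotype means; 0 means no pairwise interaction."""
    m = {g: mean(values) for g, values in mic.items()}
    effect_of_a_when_b_is_r = m[("R", "R")] - m[("Y", "R")]
    effect_of_a_when_b_is_y = m[("R", "Y")] - m[("Y", "Y")]
    return effect_of_a_when_b_is_r - effect_of_a_when_b_is_y


if __name__ == "__main__":
    # Hypothetical MICs (mM): locus A adds 0.25 mM and locus B adds 0.5 mM, independently.
    additive = {
        ("Y", "Y"): [4.0, 4.25, 3.75],
        ("R", "Y"): [4.25, 4.5, 4.0],
        ("Y", "R"): [4.5, 4.75, 4.25],
        ("R", "R"): [4.75, 5.0, 4.5],
    }
    # Same data, except the double-resistant genotype gains an extra 0.5 mM.
    epistatic = dict(additive)
    epistatic[("R", "R")] = [5.25, 5.5, 5.0]
    print(interaction_contrast(additive), interaction_contrast(epistatic))  # 0.0 0.5
```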
Thus, even though our approach should be capable of revealing pairwise and higher-order genetic interactions (Taylor and Ehrenreich, 2014, 2015a,b), our results agree with recent work demonstrating that quantitative traits in yeast have a genetic basis that is largely additive (Bloom et al., 2013).

Using detection of loci in multiple families to improve mapping resolution
We attempted to improve our resolution of loci that were detected in multiple families. There were five, five, and six such loci in the BYxRM, BYxYPS, and RMxYPS crosses, respectively (Figure 2). By aggregating data from the families in which these loci were detected at a q-value of ≤ 0.2 (Figure 2; Materials and Methods), we achieved an average resolution for these loci of 14, 6, and 13 genes in the BYxRM, BYxYPS, and RMxYPS crosses, respectively.
We concentrated on cloning quantitative trait genes underlying loci detected in multiple families in the BYxYPS cross, as these were identified at the most precise resolution (Figure 2). We first used reciprocal hemizygosity analysis (RH) (Steinmetz et al., 2002) to examine nearly all of the nonessential candidate genes underlying the five loci, which were located on chromosomes V, VII, IX, XIV, and XVI (25 total genes examined; Supplemental Figure S5A and Supplemental Table S3; Materials and Methods). This successfully identified two quantitative trait genes: MKT1 (chromosome XIV; Figure 4A

based on recombination breakpoints that were observed in the data (Materials and Methods). Among the 60 loci detected at q ≤ 0.05, this resulted in an average resolution of 54.1 kb (Materials and Methods; Supplemental Table S2). However, 11 of these loci were delimited to genomic intervals of <10 kb (Materials and Methods).
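Delimiting a locus from recombination breakpoints, and then sharpening it by pooling families, can be pictured as interval intersection. The sketch below assumes that each resistant segregant is summarized by the span (in bp) over which it carries the selected allele around a peak, that a family's bounds are the span shared by all of its resistant segregants, and that aggregating families intersects those bounds. This is a simplification that ignores segregants lacking the allele; the representation, the intersection rule, and the coordinates are all hypothetical, and the study's procedure is the one in its Materials and Methods.

```python
from typing import Optional

Interval = tuple[int, int]  # (start, end) in bp, inclusive


def shared_interval(intervals: list[Interval]) -> Optional[Interval]:
    """Intersection of all intervals, or None if they do not share any position."""
    start = max(s for s, _ in intervals)
    end = min(e for _, e in intervals)
    return (start, end) if start <= end else None


def aggregate_families(families: dict[str, list[Interval]]) -> Optional[Interval]:
    """Delimit the locus within each family, then intersect the per-family bounds."""
    bounds = []
    for segments in families.values():
        family_bounds = shared_interval(segments)
        if family_bounds is None:
            return None  # no single interval is shared; the simple model does not fit
        bounds.append(family_bounds)
    return shared_interval(bounds)


if __name__ == "__main__":
    # Hypothetical spans carrying the selected allele in two families.
    families = {
        "family_1": [(100_000, 165_000), (120_000, 200_000)],
        "family_2": [(150_000, 260_000), (90_000, 170_000)],
    }
    for name, segs in families.items():
        start, end = shared_interval(segs)
        print(name, (start, end), f"{(end - start) / 1000:.0f} kb")
    start, end = aggregate_families(families)
    print("aggregated", (start, end), f"{(end - start) / 1000:.0f} kb")
```

Here the two families alone delimit 45 kb and 20 kb intervals, and combining them narrows the locus to 15 kb, mirroring in a toy form why aggregation across families improved resolution.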
Two loci were detected at a resolution of a single gene: YIL177C, a putative Y′ element helicase that was detected on chromosome IX in two RMxYPS families, and PFK2, a subunit of phosphofructokinase involved in glycolysis that was detected on chromosome XIII in one BYxYPS family. An additional three loci were detected at a resolution of two genes (Supplemental Table S2). These occurred on chromosomes I, IX, and XIV in the BYxYPS cross and corresponded respectively to the lectin-like cell wall protein FLO1 and the oxysterol-binding protein SWH1, the vitamin-related transcriptional activator VHR1 and the respiratory induced gene RGI2, and the BLOC1 component SNN1 and the highly pleiotropic gene MKT1. MKT1 and SWH1 are known quantitative trait genes in the BYxRM cross that were identified through mapping studies focused on other phenotypes, as reviewed in Ehrenreich et al. (2009) and recently shown in Wang and Kruglyak (2014), respectively.

Validation and deeper genetic analysis of a subset of loci
When we compared data from families and crosses, we found that detected loci (q ≤ 0.2) could be collapsed into 64 distinct genomic regions (Figure 2; Materials and Methods). For the remainder of the article, we refer to these distinct regions as "loci." Although 16 of these loci were identified in at least two different families from the same cross, most were not detected in multiple families from the same cross (Figure 2 and Supplemental Figure S3). Some of the loci that were not replicated may be false positives. However, we expect that a large fraction of the detected loci, even those seen only a single time, have biological effects (Supplemental Note S1). To confirm that most of the identified loci have biological effects, we examined in more detail two RMxYPS families that had been

FIGURE 2: Genetic mapping results. Regions of the genome that showed nominal significance (p ≤ 0.05) are plotted with their associated q values.
(A–C) Results from advanced backcross families derived from different BYxRM, BYxYPS, and RMxYPS F2 segregants, respectively. Each row represents an F2 segregant from which two advanced backcross families were derived by recurrent backcrossing to each of the strain's parents.
[Figure 2 plot: q value (thresholds 0.05, 0.1, 0.2) versus genome position across chromosomes I–XVI for F2 segregants 1–5 in each cross; parental alleles BY, RM, and YPS.]

and Supplemental Figure S6) and the aquaporin AQY1 (chromosome XVI; Figure 4B and Supplemental Figure S6). We validated AQY1 and MKT1 by performing allele replacements spanning the entire coding and noncoding regions of these genes in BYxYPS F2B3 segregants that carried the resistance allele at the locus being tested (Supplemental Figure S5B; Materials and Methods).
We also used allele replacements that spanned the entire coding and noncoding regions of candidate genes to test every gene underlying the chromosome V, VII, and IX loci in resistant BYxYPS F2B3 segregants (Supplemental Figure S5B and Supplemental Table S3; Materials and Methods). By doing this, we identified a single quantitative trait gene at two of the loci: MMS21, an essential SUMO ligase that is involved in DNA repair (chromosome V; Figure 4C), and MRP13, a nuclear-encoded component of the mitochondrial ribosome (chromosome VII; Figure 4D). In addition, we found that the causal variant underlying the chromosome IX locus is a cis-regulatory polymorphism in the intergenic region between the mitochondrial porin POR2 and the stress-inducible mitogen-activated protein kinase phosphatase SDP1, which are transcribed in opposite directions. We found this because replacing the entire coding and noncoding regions of either POR2 or SDP1 decreased hydrogen peroxide resistance (Figure 4E), whereas replacing only the coding region of either POR2 or SDP1 had no effect.
One single-nucleotide polymorphism (SNP) differentiates the BY and YPS alleles of the POR2-SDP1 intergenic region. To determine whether this SNP affects the expression of POR2 or SDP1, we used quantitative PCR (qPCR) to measure the transcription of these genes in the presence and absence of hydrogen peroxide (Materials and Methods). We performed qPCR in two genetic backgrounds: a BYxYPS F2B3 segregant that carried the BY allele of the POR2-SDP1 region and a genetically engineered version of the same strain that carried the YPS allele of the region. No expression differences were observed in the absence of hydrogen peroxide (Figure 5A). However, the qPCR experiments revealed that the intergenic SNP affects SDP1 expression specifically in the presence of hydrogen peroxide (Figure 5B).
The functionally diverse quantitative trait genes (AQY1, MKT1, MMS21, MRP13, and SDP1) that were cloned as a part of this

FIGURE 3: Validation of loci detected in families derived from the RMxYPS cross. (A) Different combinations of alleles detected in RMxYPS backcrosses to YPS. (B, C) The 95% confidence intervals of MIC for alleles tested in larger F2B3 populations. (B) The 96 YPS-backcrossed RMxYPS F2B3 segregants from a family in which a combination of three loci were detected were phenotyped and genotyped at these loci. (C) The 96 YPS-backcrossed RMxYPS F2B3 segregants from a family in which a combination of four alternate loci were detected, as well as a locus shared with only one other YPS-backcrossed RMxYPS family, were similarly phenotyped and genotyped at these loci. (D, E) Additive effects of these alleles across genotypes, with black lines illustrating regression models that only include additive effects.
Parental Allele Frequency A B DE C 4.0 4.2 4.4 4.6 4.8 5.0 5.2 Genotype Genome Position MIC (mM) MIC (mM) MIC (mM) MIC (mM) YR YR YR 3.5 4.0 4.5 5.0 5.5 Genotype at L16-I and L16-II YY YR RY RR 4.0 4.2 4.4 4.6 4.8 5.0 5.2 Genotype YR YR YR YR YR 3.5 4.0 4.5 5.0 5.5 Number of Resistance Alleles at L6-L15 01234 L4 L16-I L16-II L6 L7-I L9 L15 13 13 III III IV VVIVII VIII XXI IX XIIXIIIXIV XV XVI R L4 L6 L7-I L7-II L9 L15 L16-I L16-II R L7-II (RxY Seg1)xY (RxY Seg5)xY Volume 27 January 1, 2016 Causes of heritable variation in a trait | 213 FIGURE 4: Localization and cloning of loci detected in multiple BYxYPS advanced backcross families. By combining data from advanced backcross families, we improved our mapping resolution. Gray vertical bars indicate the bounds delimited by aggregated data. Horizontal dashed lines represent the nominal significance cut-off (p ≤ 0.05) for a one-tailed binomial test. The causal gene or variant at each locus is illustrated with a red box or asterisk, respectively. Genes represented with white boxes had no effect on hydrogen peroxide resistance. B A Chr IX Chr XVI Chr VII Chr XIV Chr V C RPL11B tT(UGU)G1 GCD2 MRP13 PDC6 CTT1 NNF2 UTP22 PRP31 PIL1 POR2 NUP159 SNN1 MKT1 SKI3 RPC82 AQY1 YEL020C tK(CUU)E1 tR(UCU)E YEL008C-A GLC3 GCN4 MIT1 MMS21 SPC25 ISC1 D E SDP1 * RPR1 0 10 20 30 40 Number with YPS Allele 400,000500,000 0 10 20 30 40 50 Number with BY Allele 820,000860,000 900,000940,000 0 5 10 15 20 25 Number with BY Allele 50,000 250,000 0 10 20 30 40 Number with BY Allele 550,000750,000 0 5 10 15 20 25 30 Number with BY Allele 100,000200,000 214 | R. A. Linder et al. Molecular Biology of the Cell allele in a relevant RMxYPS F 2 B 3 (Materials and Methods). The only gene in this interval that had an effect in the RMxYPS cross was the cytosolic catalase-encoding gene CTT1, which is located three genes away from MRP13 (Figure 6, B and D). 
These findings are important because they show how indi- vidual genomic loci detected in genetic mapping experiments in yeast can in fact correspond to multiple quantitative trait genes and nucleotides. DISCUSSION Recent studies reveal the very high genetic complexity that can underlie heritable phe- notypes in budding yeast (e.g., Ehrenreich et al., 2010, 2012; Cubillos et al., 2011; Lorenz and Cohen, 2012; Bloom et al., 2013; Granek et al., 2013; Wilkening et al., 2014). Much of this work has focused on chemical resistance traits, which can easily be screened in large populations of segregants and multiple crosses. Despite the valuable insights gained from efforts to map the genetic basis of these phenotypes, questions about the mole- cular mechanisms and statistical genetic architecture underlying these traits remain unanswered, largely due to the limited resolu- tion of current genetic mapping approaches. In this study, we focused on a single chemical resistance trait— hydrogen peroxide resistance—and used a genetic mapping strat- egy that resulted in the precise detection of combinations of loci. Our strategy was in part motivated by recent discoveries of higher- order genetic interactions in yeast (Dowell et al., 2010; Taylor and Ehrenreich, 2014, 2015a,b; Wang and Kruglyak, 2014), which were found for binary traits, and was aimed at determining whether such complex genetic interactions also contribute to quantitative traits. Our work suggests that higher-order genetic interactions are un- likely to make a significant contribution to quantitative traits in yeast and instead supports a largely additive genetic basis for such phe- notypes in this organism, as was recently argued (Bloom et al., 2013). In relation to our past work (Ehrenreich et al., 2012; Bloom et al., 2013), we used a different mapping technique to characterize the genetic basis of hydrogen peroxide resistance in the present study. 
Comparison of results from this article to our other publications or of our past manuscripts to each other suggests that roughly half of the loci detected in one study are replicated in another. For example, analysis of pools of hydrogen peroxide–resistant BYxRM, BYxYPS, and RMxYPS segregants identified 28 loci (Ehrenreich et al., 2012). Of these, 16 (57%) were also detected in the present study. Dispari- ties in the loci identified by these studies may be due to either bio- logical or technical factors and are consistent with the high genetic complexity of hydrogen peroxide resistance. Of note, the present study is distinguished from our past efforts by the concerted effort we made to clone multiple quantitative trait genes that contribute to hydrogen peroxide resistance. This re- sulted in the identification of CTT1, MKT1, MMS21, MRP13, SDP1, and two different alleles of AQY1. Our results indicate that many molecular and cellular processes influence hydrogen peroxide resis- tance. It is possible that this breadth of mechanisms that contribute to the trait may have provided a large mutational target space for the accumulation of functional genetic variation. However, proving such a point is difficult without knowing more of the genes that section, as well as the loci that were detected at a resolution of one or two genes (Supplemental Table S2), indicate that variation in hy- drogen peroxide resistance is shaped by a broad space of molecular mechanisms and cellular processes. Linkage among genetic variants strongly influences how loci are detected Recent studies applying statistically powerful genetic mapping tech- niques to multiple crosses have shown that loci can exhibit compli- cated patterns of detection among interrelated mapping populations (e.g., Ehrenreich et al., 2012; Treusch et al., 2015). 
Arguably, the sim- plest explanation for this phenomenon is that multiple genetic vari- ants segregate on each chromosome and interference among these genetic variants affects how loci are detected in any given cross. Our study supports linked genetic polymorphisms as the major cause of loci being detected in complicated patterns in studies in- volving multiple crosses. For example, in addition to identifying AQY1 as a quantitative trait gene in the BYxYPS cross, we also found that AQY1 is a quantitative trait gene in the RMxYPS cross (Supple- mental Figure S7). This was determined through a combination of mapping the chromosome XVI locus spanning AQY1 to two genes using data from four RMxYPS families (Figure 2 and Supplemental Figure S7A), as well as by replacing AQY1 RM with AQY1 YPS in a rele- vant RMxYPS F 2 B 3 (Supplemental Figure S7B; Materials and Methods). These results might suggest that BY and RM share an al- lele of AQY1 that confers hydrogen peroxide resistance and differ- entiates them from YPS. However, this is not the case; there are no polymorphisms shared between BY and RM in the promoter, coding region, or 3′ untranslated region of AQY1. Instead, AQY1 BY and AQY1 RM harbor distinct loss-of-function variants (a missense poly- morphism in BY and a frameshift in RM), which were originally identi- fied due to their similar effects on freeze–thaw tolerance (Will et al., 2010). We also found evidence for the presence of closely linked causal genetic variants in different quantitative trait genes. In this case, a locus overlapping the BYxYPS quantitative trait gene MRP13 was also identified in the RMxYPS cross (Figure 6A). However, replace- ment of MRP13 RM with MRP13 YPS in a resistant RMxYPS F 2 B 3 did not have an effect on hydrogen peroxide resistance (Figure 6C). 
To de- termine the causal gene underlying this locus in the RMxYPS cross, we replaced the RM allele for every gene in this interval with the YPS FIGURE 5: A cis regulatory polymorphism in the BYxYPS cross causes differential expression of SDP1 in response to hydrogen peroxide. (A, B) qPCR analyses of POR2 and SDP1, respectively. Isogenic strains that only differed in their genotype at the POR2-SDP1 locus were used in these experiments. 0.00 0.04 0.08 0.12 Hydrogen Peroxide (mM) POR2 Relative Ab undance • • • • • • • • • • • • • • • • • • • • • • •• • • • 03 0.00 0.04 0.08 0.12 Hydrogen Peroxide (mM) SDP1 Relative Ab undance 03 AB BY allele YPS allele Volume 27 January 1, 2016 Causes of heritable variation in a trait | 215 effects of high genetic complexity and linkage that we described will be more severe in studies that involve a larger number of strains. If this is true, then precisely resolving loci to specific genes and causal variants will be crucial for studies of highly complex, quantita- tive traits in yeast and other model organisms moving forward. Such resolution will be necessary to maximize the insights that genetic mapping can provide into the genetic and molecular basis of quan- titative traits. MATERIALS AND METHODS Screening for hydrogen peroxide resistance Strains were first inoculated into 96 deep-well plates with liquid yeast extract–peptone–dextrose (YPD) medium. These cultures were then incubated for 24 h at 30°C with shaking. After incubation, cells were manually pinned onto YPD agar plates containing different concen- trations of hydrogen peroxide. Pinned colonies were incubated at 30°C for 48 h, after which colonies were imaged using a standard contribute to hydrogen peroxide resistance in BY, RM, YPS, and other strains, and also determining the causal variants in these genes. Our findings also provide insights into why certain loci show complicated patterns of detection in studies involving more than two strains. 
We find that this arises due to the presence of mul- tiple variants with additive effects occurring in the same gene, as well as the presence of distinct additive variants in different genes that are closely linked. Which of these scenarios is more prevalent is unclear from the present work due to the limited number of examples provided by this study, as well as the fact that AQY1 is a common target for environment-specific adaptive mutations and thus may be unusual (Will et al., 2010). In conclusion, although we focused on hydrogen peroxide resis- tance, our findings may have general relevance for other chemical resistance phenotypes and highly complex quantitative traits. Given that we examined only three related crosses, we expect that the FIGURE 6: Two different quantitative trait genes segregate in the same genomic region. (A) Overlapping peaks on chromosome VII were detected in BYxYPS and RMxYPS backcrosses to YPS, with B and R representing the BY and RM allele frequencies in these crosses, respectively. (B) MRP13 and CTT1 are depicted in bold to emphasize their close physical proximity within this overlapping interval. MRP13 BY confers resistance in the BYxYPS cross, whereas in the RMxYPS cross, the causal allele in this interval is instead CTT1 RM . (C, D) The 95% confidence intervals of MIC for allele replacement strains, which were generated in a YPS-backcrossed RMxYPS F 2 B 3 segregant with the causal allele at the overlapping locus. Chr VII Genome Position SLX9 TOM20 RPL11B tT(UGU)G1 TWF1 GCD2 MRP13 PDC6 CTT1 NNF2 UTP22 PRP31 DBF2 DRN1 VAS1 RRP46 TPC1 PIL1 ASK10 B A D C 13 15 III III IV VVIVII VIII XXI IX XIIXIII XIVXV XVI B 14 B B 13 R 0 2 4 6 8 10 12 Number with RM Allele 550,000 750,000 Parental Allele Frequency Genotype MIC (mM) Genotype MIC (mM) MRP13 4.0 4.5 5.0 5.5 R Y CTT1 4.0 4.5 5.0 5.5 R Y (RxY Seg1)xY (BxY Seg1)xY (BxY Seg3)xY (BxY Seg4)xY 216 | R. A. Linder et al. 
throughout the genome, as determined from the mpileup files. All sequencing data are available from the Sequence Read Archive (www.ncbi.nlm.nih.gov/sra) under the BioProject accession identifier PRJNA291876, study accession identifier SRP063000, and BioSample accession identifiers SAMN03998889–SAMN03999280.

Examination of RMxYPS loci
To determine the effects of sets of RM alleles detected in the RMxYPS cross, we genotyped F2B3 segregants at the relevant loci and combined these data with our knowledge of the MIC of each F2B3. Genotyping at these loci was conducted using PCR and restriction enzyme digests. The t tests and ANOVAs were conducted using the t.test() and lm() functions in R (https://cran.r-project.org).

Genetic mapping
Loci were identified by one-tailed binomial tests conducted on the F2B3 genotype data for single families, with statistical tests implemented in R. We conducted one-tailed tests because we expected causal alleles from a given parent to be largely fixed, and therefore undetectable, in repeated backcrosses to that parent. In contrast, causal alleles from the nonbackcross parent should show significant enrichment among F2B3s from the same family, whereas noncausal alleles should segregate at ∼50% frequency. The binomial tests were conducted for all genomic positions at which the nonbackcross parental genotype was detected in a given family. The FDR associated with each p value was determined using the QVALUE package in R (Storey and Tibshirani, 2003). To estimate q values for each test, we combined the p values from all families into a single vector and input them into QVALUE at the same time. This was done under the assumption that data from different families share a common null distribution, which can be estimated more accurately using the whole data set generated in this study. Detected intervals were determined as the regions of maximal significance at a given locus.
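The mapping test described above can be illustrated with a minimal Python sketch. The published analysis was run in R. The sketch assumes that each family's genotype data have already been reduced to per-site counts of segregants carrying the nonbackcross parent's allele, and the counts in the usage lines are hypothetical. The q-value step follows the Storey–Tibshirani step-up procedure but estimates π0 at a single λ, which simplifies what the QVALUE package does.

```python
from math import comb


def binomial_enrichment_p(k, n, p=0.5):
    """One-tailed P(X >= k) for X ~ Binomial(n, p).

    k: segregants in a family carrying the nonbackcross parent's allele at a site
    n: segregants genotyped at that site
    p: null frequency (0.5 for a noncausal allele segregating in the final backcross)
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def storey_qvalues(pvals, lam=0.5):
    """q values by the Storey-Tibshirani step-up, with pi0 estimated at one lambda.

    Simplification (assumption): pi0 is the fraction of p values above lam, scaled
    by 1 / (1 - lam). If no p value exceeds lam, pi0 falls back to 1, which is the
    conservative Benjamini-Hochberg case.
    """
    m = len(pvals)
    pi0 = sum(p > lam for p in pvals) / (m * (1 - lam))
    pi0 = 1.0 if pi0 == 0 else min(1.0, pi0)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that q values stay monotone in p
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pi0 * m * pvals[idx] / rank)
        q[idx] = running_min
    return q


# Hypothetical per-site counts: (segregants with nonbackcross allele, segregants genotyped)
family_a = [(13, 14), (7, 14), (9, 14)]
family_b = [(12, 12), (6, 13)]

# As in the text, p values from all families are pooled into one vector first
pooled = [binomial_enrichment_p(k, n) for k, n in family_a + family_b]
qvals = storey_qvalues(pooled)
```

Pooling the p values before computing q values mirrors the choice described above: a single null distribution is estimated from all families.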
The size of an identified locus was determined as the region of a chromosome at which the locus showed maximal significance. To map loci that were identified in multiple families from the same cross more finely, we pooled data from each of the families from that cross in which the locus was nominally significant. We then delimited the locus as the region of the chromosome showing maximal significance in the pooled data. The 64 unique genomic loci described here were identified by combining detected loci from all crosses. In cases in which overlapping loci were detected in multiple families from the same cross or from different crosses, we defined the region of overlap as the unique genomic locus.

Reciprocal hemizygosity analysis
For a given gene, haploid deletion strains were generated in both of the haploid parents of a cross using the CORE cassette (Storici et al., 2001). To facilitate these knockouts, 60–base pair tails were added to the CORE cassette by PCR. The tails were designed so that integration of the cassette into the genome resulted in loss of the entire coding region of that gene in the recipient strain. To generate hemizygotes, haploid deletion strains were mated to a wild-type version of the other cross parent. These hemizygotes were then screened for hydrogen peroxide resistance as described. We considered a gene cloned by reciprocal hemizygosity (RH) analysis if a significant effect was detected based on the examination of at least three independently generated hemizygotes per allele (i.e., six total hemizygotes per gene). Significance was assessed using t tests implemented in R.

Allele replacements
Marker-assisted allele replacement was conducted as described in Matsui et al. (2015). Unless stated otherwise, for a given allele

digital camera. The MIC was calculated as the lowest concentration of hydrogen peroxide at which a segregant was incapable of growing. Representative images of individuals pinned across a range of doses are shown in Supplemental Figure S8A.
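The MIC definition above can be sketched as follows, together with ranking segregants by resistance when choosing individuals for backcrossing. The dose series and growth scores are hypothetical. Treating growth at every tested dose as "MIC above the tested range" is an assumption, not a rule stated in the text.

```python
def minimum_inhibitory_concentration(growth_by_dose):
    """Lowest hydrogen peroxide dose (mM) at which no growth was scored.

    growth_by_dose maps dose (mM) -> True if the pinned colony grew at that dose.
    Returns None if the segregant grew at every dose tested (MIC above the range).
    """
    no_growth = [dose for dose, grew in growth_by_dose.items() if not grew]
    return min(no_growth) if no_growth else None


def rank_by_resistance(plates, top_n):
    """Return the top_n segregant IDs ordered from highest to lowest MIC."""
    mics = {seg: minimum_inhibitory_concentration(g) for seg, g in plates.items()}

    def sort_key(seg):
        # Assumption: growth at every dose ranks above any finite MIC
        mic = mics[seg]
        return float("inf") if mic is None else mic

    return sorted(mics, key=sort_key, reverse=True)[:top_n]


# Hypothetical plate scores for three segregants across four doses (mM)
plates = {
    "seg1": {4.0: True, 4.5: True, 5.0: False, 5.5: False},
    "seg2": {4.0: True, 4.5: False, 5.0: False, 5.5: False},
    "seg3": {4.0: True, 4.5: True, 5.0: True, 5.5: True},
}
most_resistant = rank_by_resistance(plates, 2)
```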
Supplemental Figure S8B and Supplemental Table S4 show that, in our experimental setup, there is a low correlation between OD600 readings and MICs. This finding implies that the preculturing steps in our experiments are unlikely to have affected our genetic mapping results.

Generation of resistant advanced backcross populations
Standard yeast techniques were used for mating and sporulation. At each stage of crossing, the Synthetic Genetic Array marker system (Tong and Boone, 2006) was used to generate large numbers of MATa recombinant segregants through random spore analysis (Ehrenreich et al., 2010). We started by screening 864 F2 segregants each from the BYxRM, BYxYPS, and RMxYPS crosses. The five most hydrogen peroxide–resistant F2 segregants from each cross were then backcrossed to both of their parents. At the F2B and F2B2 steps, 96 recombinants were screened, and the most resistant individual was used for the subsequent backcross. Freezer stocks were generated for every segregant before phenotyping for hydrogen peroxide resistance, and all crosses were performed using these freezer stocks. At the F2B3 stage, 672 segregants were phenotyped per backcross family. Of these, 12–15 of the most resistant segregants were genotyped using low-coverage, whole-genome sequencing.

Generation of parental reference genomes
We sequenced whole-genome libraries from each of our parent strains to ∼50× coverage on an Illumina (San Diego, CA) HiSeq and used these data to identify genetic differences between our cross parents and the S288c reference genome. Sequencing reads were mapped to the S288c reference genome with Burrows-Wheeler Aligner (BWA-MEM; Li and Durbin, 2009) using the command bwa mem -t 6 ref.fsa read1.fq read2.fq > output.sam. Duplicate reads were removed using the SAMtools rmdup command.
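The text does not describe the custom Python scripts used to identify SNPs. The sketch below shows one way to do that step. It assumes the standard text output of samtools mpileup (chromosome, position, reference base, depth, read bases, qualities) and haploid parents sequenced to high coverage. The depth and allele-fraction thresholds are hypothetical, and indels are skipped rather than called.

```python
import re
from collections import Counter

INDEL = re.compile(r"[+-](\d+)")


def count_bases(ref_base, bases):
    """Count A/C/G/T calls in one samtools mpileup read-base column."""
    ref = ref_base.upper()
    counts = Counter()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":  # start of a read; the next character is its mapping quality
            i += 2
        elif c == "$":  # end of a read
            i += 1
        elif c in "+-":  # indel following the previous base: +/-<length><sequence>
            m = INDEL.match(bases, i)
            i = m.end() + int(m.group(1))
        else:
            if c in ".,":
                counts[ref] += 1  # matches the reference on the forward or reverse strand
            elif c.upper() in "ACGT":
                counts[c.upper()] += 1
            # '*', 'N', '<' and '>' carry no A/C/G/T call and are skipped
            i += 1
    return counts


def call_snp(pileup_line, min_depth=10, min_alt_fraction=0.9):
    """Return (chrom, pos, ref, alt) if a haploid parent differs from the reference."""
    chrom, pos, ref, _depth, bases = pileup_line.rstrip("\n").split("\t")[:5]
    ref = ref.upper()
    counts = count_bases(ref, bases)
    depth = sum(counts.values())
    if depth < min_depth:
        return None
    alts = [(n, b) for b, n in counts.items() if b != ref]
    if not alts:
        return None
    n_alt, alt = max(alts)
    if n_alt / depth >= min_alt_fraction:
        return chrom, int(pos), ref, alt
    return None


# Hypothetical pileup lines: a fixed A->G difference and a reference-matching site
pileup = [
    "chrI\t100\tA\t12\t" + "G" * 12 + "\t" + "I" * 12,
    "chrI\t101\tC\t12\t" + ".," * 6 + "\t" + "I" * 12,
]
snps = [s for s in map(call_snp, pileup) if s is not None]
```

Haploid parents are assumed, so a true difference from S288c should appear in nearly all reads at a site. That is why the sketch uses a high allele-fraction threshold rather than a genotype likelihood model.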
Mpileup files were generated in SAMtools (Li et al., 2009) using the command samtools mpileup -f ref.fsa read.rmdp.srt.bam > output.mp, and SNPs were identified using custom Python scripts. The numbers of SNP and small indel differences observed between our parent strains and the S288c reference genome were 111 for BY, 46,900 for RM, and 65,193 for YPS. We constructed a strain-specific reference genome for each strain by integrating these variants into the S288c reference genome. These strain-specific reference genomes were used in subsequent analyses.

Low-coverage, whole-genome sequencing of resistant backcross segregants
Whole-genome libraries were generated for resistant F2B3 segregants with the Illumina Nextera kit, from DNA that had been extracted with Qiagen (Valencia, CA) DNeasy Blood & Tissue Kits. Each backcross segregant was tagged with a unique barcode identifier. Equimolar fractions of the libraries were mixed together and sequenced at low coverage on an Illumina HiSeq. The average coverage per sequenced segregant was 4.31×. After demultiplexing, reads were aligned with BWA-MEM to the reference genome of the parent strain used for backcrossing, using the same parameters described earlier (Li and Durbin, 2009). Mpileup files were then generated with SAMtools (Li et al., 2009). Haplotypes were determined chromosome by chromosome using hidden Markov models executed in R, with the fraction of reads from each parental strain as input (Taylor and Ehrenreich, 2014). Aneuploidies were identified by examining coverage

replacement, the full donor allele (200–300 base pairs of upstream sequence, the entire coding region, and ∼50 base pairs of downstream sequence) was amplified from the relevant haploid cross parent. At the same time, the hphMX cassette (Goldstein and McCusker, 1999) was amplified with 60–base pair homology tails.
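The 60–base pair homology tails described in this section can be sketched as primer assembly. Following the text, one tail copies the 3′ end of the donor-allele PCR product. The other copies the sequence just downstream of the donor region in the recipient strain. The hphMX annealing sequences and all example sequences below are hypothetical placeholders, not the primers used in this study.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hphmx_tailed_primers(donor_amplicon, recipient_downstream,
                         hph_fwd_anneal, hph_rev_anneal, tail_len=60):
    """Return (forward, reverse) primers, written 5'->3', for amplifying hphMX.

    donor_amplicon: donor-allele PCR product (upstream + coding + downstream), 5'->3'
    recipient_downstream: recipient sequence immediately 3' of the donor region, 5'->3'
    hph_fwd_anneal / hph_rev_anneal: hypothetical primers that anneal to the cassette
    """
    if len(donor_amplicon) < tail_len or len(recipient_downstream) < tail_len:
        raise ValueError("flanking sequences must be at least tail_len bases long")
    # Tail 1 is identical to the 3' end of the donor-allele PCR product, so the
    # donor and marker products overlap and can recombine end to end.
    fwd_tail = donor_amplicon[-tail_len:]
    # Tail 2 is identical to the recipient sequence just downstream of the donor
    # region; on a reverse primer that sequence is written as its reverse complement.
    rev_tail = reverse_complement(recipient_downstream[:tail_len])
    return fwd_tail + hph_fwd_anneal, rev_tail + hph_rev_anneal


# Hypothetical sequences, for illustration only
donor = "ACGT" * 30       # 120-bp donor amplicon
downstream = "GGCA" * 20  # 80 bp of recipient sequence 3' of the donor region
fwd, rev = hphmx_tailed_primers(donor, downstream,
                                "GATCGATCGATCGATCGATC", "CTAGCTAGCTAGCTAGCTAG")
```

The amplified marker therefore begins with the last 60 bases of the donor product and ends with the first 60 bases of the recipient's downstream sequence. These are the two overlaps that a marked replacement relies on.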
One of the hphMX tails was designed to be identical to the 3′ end of the donor allele PCR product. The other hphMX tail was designed to be identical to the region just downstream of the donor sequence in the recipient strain. Selection for hygromycin resistance was used to obtain at least 20 transformants per attempted replacement. Because recombination between the donor allele and the recipient genome can occur anywhere along the length of the donor allele, marked replacements often result in only partial replacement of a gene. We therefore used Sanger sequencing or restriction typing to identify transformants that carried either a complete replacement of the gene or only the integrated hphMX cassette. These two groups of strains served as the allele replacement and control strains, respectively, described in this article.

Quantitative PCR
A BYxYPS F2B3 segregant with the BY version of the POR2-SDP1 region and a version of this strain that was genetically engineered to carry the YPS allele of this region were cultured overnight in YPD. The next day, these cultures were transferred to fresh YPD and incubated for 3 h at 30°C with shaking (200 rpm). After this setback step, six replicate YPD cultures, each containing ∼2 × 10^7 cells, were generated from each strain. For both genotypes, half of the cultures were kept as controls that were never exposed to hydrogen peroxide, and the other half were supplemented with hydrogen peroxide to a concentration of 3 mM. After 30 min, the cultures were pelleted by centrifugation. The supernatant was decanted, and each pellet was washed with sterile water. After the wash step, the cultures were pelleted again by centrifugation, snap frozen in liquid nitrogen, and stored at −80°C for subsequent total RNA extraction. The Qiagen RNeasy Plant Mini Kit was used for RNA extraction, and cDNA was obtained using the ThermoFisher Scientific (Waltham, MA) SuperScript VILO cDNA Synthesis Kit.
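The text states only that POR2 and SDP1 were quantified relative to the reference gene ACT1. One common way to do this, assumed here rather than taken from the paper, is the comparative Ct calculation sketched below. The amplification efficiency of 2.0 (perfect doubling per cycle) and the Ct values are hypothetical. The replicate structure follows the design above: three control and three 3 mM cultures per genotype.

```python
from statistics import mean


def relative_abundance(ct_target, ct_reference, efficiency=2.0):
    """Target transcript level relative to a reference gene such as ACT1.

    Assumes the comparative Ct model: each cycle multiplies product by `efficiency`,
    so the abundance ratio is efficiency ** -(Ct_target - Ct_reference).
    """
    return efficiency ** -(ct_target - ct_reference)


def mean_relative_abundance(replicates, efficiency=2.0):
    """Average relative abundance over (Ct_target, Ct_reference) replicate pairs."""
    return mean(relative_abundance(t, r, efficiency) for t, r in replicates)


# Hypothetical (Ct_SDP1, Ct_ACT1) pairs for one genotype
control = [(26.0, 20.0), (26.2, 20.1), (25.9, 19.9)]
treated_3mM = [(24.0, 20.0), (24.1, 20.2), (23.8, 19.9)]
induction = mean_relative_abundance(treated_3mM) / mean_relative_abundance(control)
```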
qPCRs were then performed on each sample using the Kapa Biosystems (Wilmington, MA) SYBR Fast qPCR kit and a Bio-Rad Laboratories (Hercules, CA) DNA Engine Opticon 2. The relative abundances of POR2 and SDP1 in each sample were determined through comparison to the reference gene, ACT1.

ACKNOWLEDGMENTS
We thank Jonathan Lee, Takeshi Matsui, Joann Phan, and Matthew Taylor for critically reviewing a draft of the manuscript. We also thank Charles Nicolet and the USC Epigenome Center staff for their help with Illumina sequencing, and Norman Arnheim and Jordan Eboreime for assisting us with qPCR experiments. This work was supported by grants from the National Institutes of Health (R01GM110255 and R21AI108939), National Science Foundation (MCB1330874), Alfred P. Sloan Foundation, and Rose Hills Foundation to I.M.E.

REFERENCES
Aouacheri O, Saka S, Krim M, Messaadia A, Maidi I (2015). The investigation of the oxidative stress-related parameters in type 2 diabetes mellitus. Can J Diabetes 39, 44–49.
Bloom JS, Ehrenreich IM, Loo WT, Lite TL, Kruglyak L (2013). Finding the sources of missing heritability in a yeast cross. Nature 494, 234–237.
Braun RJ, Westermann B (2011). Mitochondrial dynamics in yeast cell death and aging. Biochem Soc Trans 39, 1520–1526.
Brown AJ, Budge S, Kaloriti D, Tillmann A, Jacobsen MD, Yin Z, Ene IV, Bohovych I, Sandai D, Kastora S, et al. (2014). Stress adaptation in a pathogenic fungus. J Exp Biol 217, 144–155.
Cubillos FA, Billi E, Zorgo E, Parts L, Fargier P, Omholt S, Blomberg A, Warringer J, Louis EJ, Liti G (2011). Assessing the complex architecture of polygenic traits in diverged yeast populations. Mol Ecol 20, 1401–1413.
Cubillos FA, Parts L, Salinas F, Bergstrom A, Scovacricchi E, Zia A, Illingworth CJ, Mustonen V, Ibstedt S, Warringer J, et al. (2013). High-resolution mapping of complex traits with a four-parent advanced intercross yeast population. Genetics 195, 1141–1155.
Cui H, Kong Y, Zhang H (2012). Oxidative stress, mitochondrial dysfunction, and aging. J Signal Transduct 2012, 646354.
Dhar R, Sagesser R, Weikert C, Wagner A (2013). Yeast adapts to a changing stressful environment by evolving cross-protection and anticipatory gene regulation. Mol Biol Evol 30, 573–588.
Dowell RD, Ryan O, Jansen A, Cheung D, Agarwala S, Danford T, Bernstein DA, Rolfe PA, Heisler LE, Chin B, et al. (2010). Genotype to phenotype: a complex problem. Science 328, 469.
Ehrenreich IM, Bloom J, Torabi N, Wang X, Jia Y, Kruglyak L (2012). Genetic architecture of highly complex chemical resistance traits across four yeast strains. PLoS Genet 8, e1002570.
Ehrenreich IM, Gerke JP, Kruglyak L (2009). Genetic dissection of complex traits in yeast: insights from studies of gene expression and other phenotypes in the BYxRM cross. Cold Spring Harb Symp Quant Biol 74, 145–153.
Ehrenreich IM, Torabi N, Jia Y, Kent J, Martis S, Shapiro JA, Gresham D, Caudy AA, Kruglyak L (2010). Dissection of genetically complex traits with extremely large pools of yeast segregants. Nature 464, 1039–1042.
Fabrizio P, Liou LL, Moy VN, Diaspro A, Valentine JS, Gralla EB, Longo VD (2003). SOD2 functions downstream of Sch9 to extend longevity in yeast. Genetics 163, 35–46.
Fierro-Risco J, Rincon AM, Benitez T, Codon AC (2013). Overexpression of stress-related genes enhances cell viability and velum formation in Sherry wine yeasts. Appl Microbiol Biotechnol 97, 6867–6881.
Goldstein AL, McCusker JH (1999). Three new dominant drug resistance cassettes for gene disruption in Saccharomyces cerevisiae. Yeast 15, 1541–1553.
Granek JA, Murray D, Kayrkci O, Magwene PM (2013). The genetic architecture of biofilm formation in a clinical isolate of Saccharomyces cerevisiae. Genetics 193, 587–600.
Greenough MA, Camakaris J, Bush AI (2013). Metal dyshomeostasis and oxidative stress in Alzheimer's disease. Neurochem Int 62, 540–555.
Higgins VJ, Beckhouse AG, Oliver AD, Rogers PJ, Dawes IW (2003). Yeast genome-wide expression analysis identifies a strong ergosterol and oxidative stress response during the initial stages of an industrial lager fermentation. Appl Environ Microbiol 69, 4777–4787.
Jomova K, Vondrakova D, Lawson M, Valko M (2010). Metals, oxidative stress and neurodegenerative disorders. Mol Cell Biochem 345, 91–104.
Kitagaki H, Takagi H (2014). Mitochondrial metabolism and stress response of yeast: applications in fermentation technologies. J Biosci Bioeng 117, 383–393.
Koppenhofer D, Kettenbaum F, Susloparova A, Law JK, Vu XT, Schwab T, Schafer KH, Ingebrandt S (2015). Neurodegeneration through oxidative stress: monitoring hydrogen peroxide induced apoptosis in primary cells from the subventricular zone of BALB/c mice using field-effect transistors. Biosens Bioelectron 67, 490–496.
Kvitek DJ, Will JL, Gasch AP (2008). Variations in stress sensitivity and genomic expression in diverse S. cerevisiae isolates. PLoS Genet 4, e1000223.
Li H, Durbin R (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, 1000 Genome Project Data Processing Subgroup (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079.
Longo VD, Shadel GS, Kaeberlein M, Kennedy B (2012). Replicative and chronological aging in Saccharomyces cerevisiae. Cell Metab 16, 18–31.
Lorenz K, Cohen BA (2012). Small- and large-effect quantitative trait locus interactions underlie variation in yeast sporulation efficiency. Genetics 192, 1123–1132.
Mackay TF, Stone EA, Ayroles JF (2009). The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 10, 565–577.
Matsui T, Linder R, Phan J, Seidl F, Ehrenreich IM (2015). Regulatory rewiring in a cross causes extensive genetic heterogeneity. Genetics 201, 769–777.
Parts L, Cubillos FA, Warringer J, Jain K, Salinas F, Bumpstead SJ, Molin M, Zia A, Simpson JT, Quail MA, et al. (2011). Revealing the genetic structure of a trait by sequencing a population under selection. Genome Res 21, 1131–1138.
Petti AA, Crutchfield CA, Rabinowitz JD, Botstein D (2011). Survival of starving yeast is correlated with oxidative stress response and nonrespiratory mitochondrial function. Proc Natl Acad Sci USA 108, E1089–E1098.
Sasano Y, Haitani Y, Hashida K, Ohtsu I, Shima J, Takagi H (2012). Enhancement of the proline and nitric oxide synthetic pathway improves fermentation ability under multiple baking-associated stress conditions in industrial baker's yeast. Microbial Cell Fact 11, 40.
Steinmetz LM, Sinha H, Richards DR, Spiegelman JI, Oefner PJ, McCusker JH, Davis RW (2002). Dissecting the architecture of a quantitative trait locus in yeast. Nature 416, 326–330.
Storey JD, Tibshirani R (2003). Statistical significance for genomewide studies. Proc Natl Acad Sci USA 100, 9440–9445.
Storici F, Lewis LK, Resnick MA (2001). In vivo site-directed mutagenesis using oligonucleotides. Nature Biotechnol 19, 773–776.
Taylor MB, Ehrenreich IM (2014). Genetic interactions involving five or more genes contribute to a complex trait in yeast. PLoS Genet 10, e1004324.
Taylor MB, Ehrenreich IM (2015a). Transcriptional derepression uncovers cryptic higher-order genetic interactions. PLoS Genet 11, e1005606.
Taylor MB, Ehrenreich IM (2015b). Higher-order genetic interactions and their contribution to complex traits. Trends Genet 31, 34–40.
Tong AH, Boone C (2006). Synthetic genetic array analysis in Saccharomyces cerevisiae. Methods Mol Biol 313, 171–192.
Treusch S, Albert FW, Bloom JS, Kotenko IE, Kruglyak L (2015). Genetic mapping of MAPK-mediated complex traits across S. cerevisiae. PLoS Genet 11, e1004913.
Varvarovska J, Racek J, Stetina R, Sykora J, Pomahacova R, Rusavy Z, Lacigova S, Trefil L, Siala K, Stozicky F (2004). Aspects of oxidative stress in children with type 1 diabetes mellitus. Biomed Pharmacother 58, 539–545.
Wang X, Kruglyak L (2014). Genetic basis of haloperidol resistance in Saccharomyces cerevisiae is complex and dose dependent. PLoS Genet 10, e1004894.
Wilkening S, Lin G, Fritsch ES, Tekkedil MM, Anders S, Kuehn R, Nguyen M, Aiyar RS, Proctor M, Sakhanenko NA, et al. (2014). An evaluation of high-throughput approaches to QTL mapping in Saccharomyces cerevisiae. Genetics 196, 853–865.
Will JL, Kim HS, Clarke J, Painter JC, Fay JC, Gasch AP (2010). Incipient balancing selection through adaptive loss of aquaporins in natural Saccharomyces cerevisiae populations. PLoS Genet 6, e1000893.

Appendix D: The stress-inducible peroxidase TSA2 enables Chromosome IV duplication to be conditionally beneficial in Saccharomyces cerevisiae.
This work appears as published in 2017 in G3 7(9): 3177-3184.

INVESTIGATION
The Stress-Inducible Peroxidase TSA2 Underlies a Conditionally Beneficial Chromosomal Duplication in Saccharomyces cerevisiae
Robert A. Linder,*,†,1 John P. Greco,* Fabian Seidl,* Takeshi Matsui,* and Ian M. Ehrenreich*,1
*Molecular and Computational Biology Section, Department of Biological Sciences, University of Southern California, Los Angeles, California 90089-2910 and †Department of Ecology and Evolutionary Biology, School of Biological Sciences, University of California, Irvine, California 92697-2525

ABSTRACT Although chromosomal duplications are often deleterious, in some cases they enhance cells' abilities to tolerate specific genetic or environmental challenges. Identifying the genes that confer these conditionally beneficial effects to particular chromosomal duplications can improve our understanding of the genetic and molecular mechanisms that enable certain aneuploidies to persist in cell populations and contribute to disease and evolution. Here, we perform a screen for spontaneous mutations that improve the tolerance of haploid Saccharomyces cerevisiae to hydrogen peroxide.
Chromosome IV duplication is the most frequent mutation, as well as the only change in chromosomal copy number seen in the screen. Using a genetic mapping strategy that involves systematically deleting segments of a duplicated chromosome, we show that the effect of chromosome IV duplication is largely due to the generation of a second copy of the stress-inducible cytoplasmic thioredoxin peroxidase TSA2. Our findings add to a growing body of literature showing that the conditionally beneficial effects of chromosomal duplication are typically mediated by a small number of genes that enhance tolerance to specific stresses when their copy numbers are increased.

KEYWORDS aneuploidy; chromosomal duplication; natural genetic variation; oxidative stress; yeast

Abnormalities in chromosomal copy number (or "aneuploidies") often lead to cancer (Davoli et al. 2013; Potapova et al. 2013; Sheltzer 2013; Durrbaum and Storchova 2015, 2016; Laubert et al. 2015; Mohr et al. 2015; Nicholson and Cimini 2015; Pinto et al. 2015; Santaguida and Amon 2015), developmental defects (Ottesen et al. 2010; Gannon et al. 2011; Siegel and Amon 2012; Akasaka et al. 2013; Bose et al. 2015), premature aging (Andriani et al. 2016; Sunshine et al. 2016), and other health issues in humans. In the budding yeast Saccharomyces cerevisiae, aneuploidies also tend to be deleterious (Torres et al. 2007; Yona et al. 2012; Potapova et al. 2013; Dodgson et al. 2016; Sunshine et al. 2016). However, in some cases these aneuploidies are conditionally beneficial, as they can enable yeast to tolerate specific loss-of-function mutations or environmental stresses (Selmecki et al. 2009, 2015; Pavelka et al. 2010; Chen et al. 2012a,b; Yona et al. 2012; Tan et al. 2013; Kaya et al. 2015; Liu et al. 2015; Meena et al. 2015; Sirr et al. 2015; Sunshine et al. 2015). An important question regarding such conditionally beneficial aneuploidies is whether their effects tend to arise from changes in the copy number of one gene or of multiple genes on the aneuploid chromosome(s).
Several studies have attempted to address this question by identifying the specific genes underlying the conditionally beneficial effects of particular chromosomal duplications in budding yeast (Pavelka et al. 2010; Chen et al. 2012a; Kaya et al. 2015; Liu et al. 2015). For example, Kaya et al. found that chromosome XI duplication enabled S. cerevisiae strains lacking all eight thiol peroxidase genes to be nearly as tolerant to oxidative stress as a wild-type strain (Kaya et al. 2015). Two genes mediated the benefit of chromosome XI duplication: CCP1, a hydrogen peroxide scavenger that acts in the mitochondrial intermembrane space, and UTH1, a mitochondrial inner-membrane protein. In another case, Liu et al. showed that chromosome VIII duplication compensates for the absence of essential nuclear pore proteins by causing overexpression of a gene that regulates cell membrane fluidity (Liu et al. 2015). Furthermore, Chen et al. demonstrated that chromosome XV duplication confers resistance to the Hsp90 inhibitor radicicol by increasing the dosage of PDR5 and STI1, which encode a multidrug pump and an Hsp90 cochaperone, respectively (Chen et al. 2012a). Lastly, Pavelka et al. showed that chromosome XIII duplication confers increased resistance to the DNA-damaging agent 4-NQO by increasing the dosage of ATR1, another multidrug pump (Pavelka et al. 2010). These studies suggest that the conditional benefits of aneuploidization are typically mediated by changes in the copy numbers of a small number of genes that allow cells to cope with specific stresses.

Copyright © 2017 Linder et al. doi: https://doi.org/10.1534/g3.117.300069
Manuscript received May 30, 2017; accepted for publication July 21, 2017; published Early Online July 26, 2017.
This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Supplemental material is available online at www.g3journal.org/lookup/suppl/doi:10.1534/g3.117.300069/-/DC1.
1 Corresponding authors: Department of Ecology and Evolutionary Biology, McGaugh Hall 5415, University of California, Irvine, Irvine, CA 92697-2525. E-mail: linderr@uci.edu; and Molecular and Computational Biology Section, Ray R. Irani Hall 201, University of Southern California, Los Angeles, CA 90089-2910. E-mail: ian.ehrenreich@usc.edu
Volume 7 | September 2017 | 3177

Here, we explore how aneuploidies can enable yeast to tolerate environmental stresses to a level beyond that achievable through genetic variation that segregates in natural populations (so-called "natural genetic variation"). We previously found that progeny produced by mating the laboratory strain BY4716 (BY), the vineyard isolate RM11-1a (RM), and the oak isolate YPS163 (YPS) show similar maximal hydrogen peroxide tolerances despite their genetic differences (see Supplemental Material, Figure S1 in File S1), suggesting that the extent to which natural genetic variation can increase tolerance to this compound may be constrained (Linder et al. 2016). We attempted to overcome these limits by screening haploid segregants from the BY × RM, BY × YPS, and RM × YPS crosses for spontaneous mutations that increase hydrogen peroxide tolerance beyond the maximal levels seen in our past work. Specifically, we took the single most tolerant F2 segregant that we identified in each of the possible pairwise crosses of the three strains and used these three individuals as the progenitors in a screen for spontaneous mutations that enhance hydrogen peroxide resistance. We obtained 37 mutants that show increased hydrogen peroxide tolerance relative to their respective progenitors.
Using whole genome sequencing, we identified spontaneous mutations that may cause increased hydrogen peroxide tolerance in these mutants. Duplication of chromosome IV was the most frequent mutation, and the only aneuploidy detected in the screen. Consistent with chromosome IV duplication being conditionally beneficial, we found that chromosome IV aneuploids grow worse than their progenitors in the absence of hydrogen peroxide, and that the benefit of chromosome IV disomy occurs on agar plates but not in liquid media. Following on from these discoveries, we determined the genetic basis of this chromosomal duplication's effect using chromosome- and gene-scale genetic engineering. Employing these techniques, we identified a single gene, the stress-inducible cytoplasmic thioredoxin peroxidase TSA2, which accounts for the majority of the effect of chromosome IV duplication on hydrogen peroxide tolerance. Our findings illustrate how aneuploidies can enable cells to surpass the levels of stress tolerance that are attainable through natural genetic variation and provide further support that the conditionally beneficial effects of aneuploidies tend to have a simple genetic basis.

MATERIALS AND METHODS

Screen for increased hydrogen peroxide tolerance
Progenitor strains were produced during our past work on the BY × RM, BY × YPS, and RM × YPS crosses (Ehrenreich et al. 2012; Linder et al. 2016) and are described in more detail in Linder et al. (2016). Each progenitor strain was streaked onto yeast extract-peptone-dextrose (YPD) plates and incubated for 2 d at 30°. To maximize biological independence among mutations obtained from the screen, 24 different colonies per progenitor were individually inoculated into 800 µl of YPD broth. These cultures were outgrown for 2 d at 30° with shaking at 200 rpm.
A total of 20 µl from each culture was then diluted using 80 µl of sterile water and spread onto YPD plates containing different doses of hydrogen peroxide. These plates were incubated at 30° for 4–6 d, so that slow-growing mutants would have enough time to form visible colonies. Glycerol freezer stocks were then made for mutants that formed visible colonies on doses at least 1 mM higher than the minimum inhibitory concentration (MIC) of their progenitor and stored at −80°. After this initial screen, all mutants were phenotyped side-by-side with their respective progenitors across a broad range of hydrogen peroxide doses to confirm their increased tolerance. Mutants that grew on doses at least 0.5 mM higher than their progenitor were saved for downstream analysis. Fourteen BY × RM, nine RM × YPS, and 14 BY × YPS derived mutants met this requirement.

Whole genome sequencing of progenitors and mutants
Archived mutants, as well as their progenitors, were inoculated into YPD liquid and outgrown for 2 d at 30°. For each mutant, DNA was extracted using the Qiagen DNeasy kit and a whole genome library was prepared using the Illumina Nextera kit. Each library was constructed using limited-cycle PCR to add adapters to Nextera-treated genomic DNA according to manufacturer recommendations. Two unique 8-bp barcodes were incorporated into each library, so that dual-indexed sequencing could be performed. Sequencing was conducted on an Illumina NextSeq 500 instrument at the USC Epigenome Center. Reads were then demultiplexed using custom Python scripts. The average per-site coverages of the progenitors and mutants were 116× and 130×, respectively (see Note S1 in File S1).

Analysis of sequencing data
To facilitate high-confidence identification of mutations using the short-read sequencing data, progenitor-specific reference genomes were generated. Burrows–Wheeler Aligner (BWA-MEM) software (Li and Durbin 2009) was used to map reads from the progenitors to the BY genome. The parameters used for alignment were as follows: "bwa mem -t 6 ref.fsa read1.fq read2.fq > output.sam." To remove duplicate reads, the rmdup command implemented in SAMtools (Li et al. 2009) was used. To create mpileup files, SAMtools was then run with the command "samtools mpileup -f ref.fsa read.rmdp.srt.bam > output.mp." We then identified genetic differences between a given progenitor and BY, integrated these differences into the BY genome, and remapped reads for that progenitor to the modified genome sequence. This process was repeated for up to 10 cycles, and the output genome was then used as the reference for mapping reads from mutants derived from that progenitor. Reads from mutant strains were aligned to the progenitor reference genomes using BWA-MEM with the same parameters described above, followed by the generation of mpileup files with SAMtools using the same parameters reported above. Point mutations and small indels were identified as differences from the corresponding progenitor that were seen in at least 90% of the reads at a particular site, whereas aneuploidies were detected based on changes in coverage across all or part of a chromosome. Custom Python scripts were used to identify these mutations, as well as to calculate per-site or per-genomic-window sequencing coverage.

Gene ontology enrichment analysis
Gene ontology (GO) analysis was carried out on the Saccharomyces Genome Database website (http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl), using GO Term Finder version 0.83 with the molecular function category selected. All genes listed in Table S1 and Table S2 in File S1 were included in the analysis.

PCR-mediated chromosomal deletion
Similar to Sugiyama et al. (2008), PCR-mediated chromosomal deletion (PCD) was implemented using constructs with three segments, in the following order: 300–600 bp of sequence homologous to the desired
integration site on chromosome IV, a kanMX cassette, and a synthetic telomere seed sequence consisting of six repeats of the motif 5′-CCCCAA-3′ (Figure S2 in File S1). To generate this construct, the region containing the integration site was PCR amplified from genomic DNA using a reverse primer that was tailed with 30 bases of sequence identity to the kanMX construct. At the same time, kanMX was PCR amplified from a plasmid using a forward primer with 30 bases of sequence identity to the integration site and a reverse primer containing the synthetic telomere seed sequence. Integration site and kanMX/synthetic telomere seed sequences were then joined using overlap fusion PCR (Sugiyama et al. 2005). This was done by mixing the two products in equimolar fractions in combination with the forward primer for the integration site and the reverse primer for the kanMX/synthetic telomere seed sequence, and conducting PCR. The cycling parameters for overlap fusion PCR were as follows: initial denaturation at 98° for 3 min, followed by 30 cycles of 98° for 30 sec, 63° for 30 sec, and 72° for 1.5 min. The final step was a 5 min extension at 72°. Throughout this process, all PCR steps were implemented using NEB High-Fidelity Phusion polymerase and all PCR products were purified using either Qiagen QIAquick Gel Extraction or QIAquick PCR Purification kits. A standard lithium acetate–based technique was used to transform PCD constructs into cells, with ~5 µg of construct employed (Gietz and Schiestl 2007). Transformants were recovered using selection for G418 resistance on YPD plates, verified by colony PCR or genome sequencing, and archived at −80° in glycerol solution. Primers used to construct PCD products are listed in Table S3 in File S1.

Deletion of individual genes
Targeted gene deletions were performed using the kanMX cassette. Constructs for deleting ADA2, ECM11, GUK1, NHX1, and UTP6 were generated by amplifying the kanMX cassette from a plasmid using tailed primers.
Each tail contained 30–60 bases of sequence homology to the target gene's flanking regions. Specifically, the forward and reverse primers were designed to have tails identical to the region immediately upstream of the translation start site and downstream of the stop codon, respectively. For deletions of PPN1, TOM1, TSA2, RPS17B, RPS18A, YDR455C, and YHP1, transformations using knockout cassettes tailed with only 60–120 bp of total homology to the targeted region were relatively inefficient. To increase the efficiency of these knockouts, gene deletion products were generated in a manner analogous to making the PCD constructs described above: 300–600 bp of sequence targeting a region just upstream of the translation start site was fused to the kanMX cassette, with the caveat that the reverse primer for kanMX amplification was tailed with 30–60 bases of homology to a region just downstream of the gene being deleted. Additionally, several hundred bp of sequence up- and downstream of TSA2 were deleted by the same method, using a modified targeting sequence and downstream homology tail. The same lithium acetate–based transformation methods described for PCD were used for all individual gene knockouts. Transformants were verified by colony PCR. Primers used for individual gene knockouts are listed in Table S4 in File S1.

Plasmid-based increase of TSA2 copy number
Low-copy plasmids were constructed by cloning the complete TSA2 locus (promoter, coding region, and 3′ UTR) from the BY × RM haploid progenitor into the multiple cloning site of pRS410, a gift from Fred Cross purchased through Addgene (Addgene plasmid #11258). This plasmid contains CEN6/ARS4, which results in maintenance at low copy number in S. cerevisiae, as well as AmpR and KanR2, which enable selection of transformed bacteria on ampicillin and transformed yeast on G418, respectively. Restriction sites for NotI and XhoI were added as 5′ tails to the primers used to amplify the TSA2 locus.
The pRS410 plasmid and modified TSA2 PCR product were separately and sequentially digested with each enzyme at 37°, after which the reactions were quenched by a 20-min incubation at 65°. After the second digestion step, the doubly digested plasmid and insert were ligated overnight. A small volume of the ligation reaction was then used to transform competent Escherichia coli obtained from Invitrogen. Transformants were selected on LB plates supplemented with ampicillin. Successful integration of the TSA2 locus was confirmed by PCR amplifying DNA extracted from transformants using M13 primers, which flank the multiple cloning site. Plasmids were purified from transformed strains using the Qiagen midi plasmid purification kit and transformed into euploid progenitor strains. Based on Sanger sequencing, the cloned TSA2 locus was found to have the same sequence as the BY × RM progenitor. Primers used for construction of plasmids are listed in Table S4 in File S1.

Figure 1 Chromosome IV duplication is the most frequent mutational event in a screen for spontaneous hydrogen peroxide resistance mutations. (A) Genome-wide coverage plots are provided for the BY × RM and RM × YPS F2 progenitors, as well as representative aneuploid or segmental duplication mutants derived from them. Site-specific coverages are scaled to the average per-site coverage seen across the entire genome for a given strain. (B) Bar plots show the fraction of sequenced mutants that were disomic for the right arm of chromosome IV, both across the entire screen and by individual progenitor.

qPCR analysis of TSA2 copy number
TSA2 copy number was checked in overexpression transformants, as well as the BY × RM F2 progenitor and a BY × RM mutant disomic for chromosome IV, using qPCR. DNA was extracted from the transformants using the Qiagen QIAamp kit. Transformants were grown on selection plates containing G418.
After 3 d at 30°, three colonies from each overexpression transformant and four colonies from the BY × RM progenitor and disomic mutant strains were transferred to 96-deep-well plates with 800 µl of YPD and incubated at 30° for 2 d with shaking, after which strains were either pinned onto YPD plates supplemented with hydrogen peroxide or transferred to 96-deep-well plates containing YPD broth supplemented with hydrogen peroxide. The remaining cells from each well were transferred to Eppendorf tubes and spun down at >13,000 rpm for 10 sec, after which the supernatant was removed and cells were resuspended in spheroplasting solution containing zymolyase. An RNase treatment was then performed to eliminate any transcripts that could potentially be amplified along with the genomic copies of TSA2 and our reference gene ACT1. The KAPA SYBR Fast qPCR kit was used for qPCR analysis on the Agilent Mx3005p. 96-well semiskirted PCR plates were employed and Bio-Rad Microseal "C" Film was used to cover them. Cycling parameters were as follows: 95° for 2 min, followed by 40 cycles of 95° for 3 sec and 60° for 30 sec. At the end of each cycle, fluorescence measurements were taken for each well. After the final cycle, a 10-min extension step was run at 72°, followed by a melting curve analysis from 55 to 95° in 0.4° increments. After qPCR, products were checked on 1.5% agarose gels to ensure primer specificity. TSA2 abundance in each sample was measured relative to ACT1. All experiments were conducted in at least biological triplicate. Primers used for qPCR are listed in Table S4 in File S1.
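The Methods state only that TSA2 abundance was measured relative to ACT1. One common way to turn such Ct values into a genomic copy-number estimate is the comparative Ct (2^−ΔΔCt) approach. The Python sketch below assumes that model, equal (about 100%) amplification efficiency for both amplicons, and the euploid progenitor as a single-copy calibrator; the function names and Ct values are hypothetical and are not taken from the paper.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target (e.g. TSA2) minus mean Ct of the reference (e.g. ACT1)."""
    return mean(target_cts) - mean(reference_cts)


def relative_copy_number(sample, calibrator, calibrator_copies=1.0, efficiency=2.0):
    """Estimate target copy number in `sample`, scaled to a calibrator of known copy number.

    `sample` and `calibrator` are (target_cts, reference_cts) tuples of replicate Ct values.
    `efficiency` is the fold amplification per cycle; 2.0 assumes perfect doubling for both
    amplicons, which the 2^-ddCt approach requires.
    """
    ddct = delta_ct(*sample) - delta_ct(*calibrator)
    return calibrator_copies * efficiency ** (-ddct)


if __name__ == "__main__":
    # Hypothetical (TSA2 Cts, ACT1 Cts) for illustration only.
    euploid = ([24.0, 24.1, 23.9], [20.0, 20.1, 19.9])
    disomic = ([23.0, 23.1, 22.9], [20.0, 20.1, 19.9])
    print(round(relative_copy_number(disomic, euploid), 2))  # prints 2.0
```

A strain carrying a second copy of TSA2 should come out near 2 under these assumptions. If the two primer pairs differ in efficiency, a standard-curve or efficiency-corrected model would be the safer choice.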
Phenotyping of transformants
Transformants were outgrown for 2 d in 800 µl of YPD broth incubated at 30° with shaking. As controls for batch effects, each time one or more transformants were phenotyped, their euploid and aneuploid progenitors were also examined. After the liquid outgrowth step, strains were pinned onto YPD plates supplemented with different hydrogen peroxide doses. Alternatively, in some experiments, strains were transferred to liquid media supplemented with a range of hydrogen peroxide doses and incubated for 3 d at 30° with shaking, after which the OD630 of 50 µl of culture from each well, diluted in 150 µl of water, was measured using a plate reader. Each experiment was done in at least biological triplicate. Agar plates were incubated for 3 d at 30° and then imaged on a GelDoc imaging device using a 0.5 sec exposure time. MIC was determined as the lowest hydrogen peroxide dose at which a given strain could not grow.

Data availability
All sequence data are available through the Sequence Read Archive under Bioproject number PRJNA338809, study accession number SRP081591, and sample accession numbers SRX2018688–SRX2018727 and SRX2037407–SRX2037410.

RESULTS AND DISCUSSION

Screen for spontaneous mutations that increase hydrogen peroxide tolerance
A total of 24 independent cultures of each of the three haploid progenitor strains were examined after 2 d of outgrowth using selection on agar plates supplemented with hydrogen peroxide (see Materials and Methods). All mutants (37 total) with an MIC at least 0.5 mM higher than that of their corresponding progenitor were analyzed further (see Figure S3, A–C in File S1; Materials and Methods). BY × RM, RM × YPS, and BY × YPS derived mutants were, on average, 2.3 mM (32%), 2.1 mM (25%), and 0.6 mM (7%) more tolerant than their progenitor, respectively (see Figure S3, A–C in File S1).
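The MIC rule from Materials and Methods (the lowest hydrogen peroxide dose at which a strain could not grow) and the 0.5 mM cutoff used to retain mutants can be written down directly. The Python sketch below assumes growth has already been scored as yes or no at each dose; the dose series, strains, and function names are hypothetical.

```python
def mic(growth_by_dose):
    """Return the lowest dose (mM) with no growth, or None if the strain grew at every dose.

    `growth_by_dose` maps dose in mM -> True if the strain grew at that dose.
    """
    no_growth = [dose for dose, grew in growth_by_dose.items() if not grew]
    return min(no_growth) if no_growth else None


def passes_screen(mutant_growth, progenitor_growth, min_gain_mm=0.5):
    """True if the mutant's MIC exceeds its progenitor's MIC by at least `min_gain_mm`."""
    mutant_mic, progenitor_mic = mic(mutant_growth), mic(progenitor_growth)
    if mutant_mic is None or progenitor_mic is None:
        # MIC lies above the tested range; a wider dose series would be needed.
        return False
    # The small tolerance guards against floating-point rounding of dose differences.
    return mutant_mic - progenitor_mic >= min_gain_mm - 1e-9


if __name__ == "__main__":
    # Hypothetical growth calls on plates at each hydrogen peroxide dose (mM).
    progenitor = {6.5: True, 7.0: True, 7.5: False, 8.0: False}
    mutant = {6.5: True, 7.0: True, 7.5: True, 8.0: False}
    print(mic(progenitor), mic(mutant), passes_screen(mutant, progenitor))
```

Taking the minimum non-growth dose follows the stated definition even if growth calls are not perfectly monotonic across doses.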
Figure 2 Duplication of chromosome IV is conditionally beneficial. (A) Replicates of a strain fully disomic for chromosome IV show significantly improved growth compared to euploid replicates when grown on agar plates supplemented with 7.5 mM hydrogen peroxide. However, when these strains are grown on agar plates with rich medium (B), euploid individuals show superior overall growth as compared to individuals disomic for chromosome IV. When both strains are exposed to 10 mM of hydrogen peroxide in liquid culture (C), euploid individuals also show higher growth than individuals disomic for chromosome IV. In (A) and (B), end-point colony pixel intensity measurements were generated using plate growth assays and image analysis in ImageJ. Mean and 95% confidence intervals are plotted. These measurements are based on four biological replicates per strain, which were each grown in three technical replicates (see Note S5 in File S1). In (C), individuals were exposed to hydrogen peroxide in liquid media for 3 d, after which the OD630 was measured for each culture. Measurements were again based on four biological and three technical replicates per strain (see Materials and Methods).

The most frequently identified mutation is a chromosomal duplication
Analysis of genome-wide sequencing coverage indicated that 43% of the mutants carried two complete copies of chromosome IV (Figure 1). No other aneuploidies were detected. The disomy was common among the BY × RM (79%) and RM × YPS derived (45%) mutants, but was absent from the BY × YPS derived mutants (Figure 1B). Given that the BY × RM and RM × YPS derived mutants also showed higher average gains in tolerance, this finding is consistent with duplication of chromosome IV conferring a sizable increase in tolerance (see Figure S3, A–C in File S1).
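Aneuploidies were called from changes in coverage, and Figure 1 scales per-site coverage to each strain's genome-wide average. The Python sketch below is one hypothetical way to turn per-chromosome coverage into copy-number calls for a haploid strain. It departs from the figure's scaling in one respect, which is an assumption of this sketch: it normalizes to the median of the per-chromosome means rather than the genome-wide mean, because a duplicated chromosome inflates the genome-wide mean and pulls its own ratio below 2.

```python
from statistics import mean, median


def estimate_copies(coverage, baseline_copies=1):
    """Estimate copy number per chromosome from per-site read depths.

    `coverage` maps chromosome name -> list of per-site depths. Each chromosome's mean depth
    is scaled to the median of all chromosome means, which is assumed to represent the
    baseline ploidy (1 for haploid progenitors and mutants).
    """
    chrom_means = {chrom: mean(depths) for chrom, depths in coverage.items()}
    baseline = median(chrom_means.values())
    return {chrom: baseline_copies * m / baseline for chrom, m in chrom_means.items()}


def call_aneuploidies(coverage, baseline_copies=1):
    """Return {chromosome: rounded copy number} for chromosomes that deviate from baseline."""
    calls = {}
    for chrom, copies in estimate_copies(coverage, baseline_copies).items():
        rounded = round(copies)
        if rounded != baseline_copies:
            calls[chrom] = rounded
    return calls


if __name__ == "__main__":
    # Hypothetical depths for a haploid mutant disomic for chromosome IV.
    depths = {
        "chrI": [100] * 10,
        "chrII": [98, 102] * 5,
        "chrIII": [101] * 10,
        "chrIV": [200] * 10,
    }
    print(call_aneuploidies(depths))
```

A segmental duplication such as the chromosome IV-R event would need the same calculation applied to genomic windows rather than whole chromosomes.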
A single segmental duplication was also detected; this was found in a BY × RM derived mutant that possessed two copies of only the right arm of chromosome IV (chromosome IV-R; Figure 1A). The segmental duplication spanned ~890 kb (58% of chromosome IV). Including this mutant, 86% of the BY × RM derived mutants were disomic for chromosome IV-R.

Additionally, 39 unique point mutations were identified among the mutants (see Table S1 and Table S2 in File S1). Twenty-two of these point mutations were nonsynonymous, eight were synonymous, and nine occurred in intergenic regions. Some point mutations were located in genes that affect hydrogen peroxide tolerance, including a SUMO E3 ligase involved in DNA repair (MMS21), an activator of cytochrome oxidase 1 translation (PET309), a subunit of cytochrome c oxidase (COX1), and a negative regulator of Ras-cAMP-PKA signaling (GPB2; see Table S1 and Table S2 in File S1). At false discovery rates of 0.17 or lower, no specific GO enrichments were seen among the genes harboring point mutations (see Materials and Methods). These results are consistent with our past finding that genetic perturbation of many different cellular processes can influence hydrogen peroxide tolerance (Linder et al. 2016), but could also have been driven by passenger mutations that do not play causal roles in hydrogen peroxide tolerance.

Duplication of chromosome IV-R was the most frequently detected mutation in our screen. However, strains carrying point mutations were found among the BY × RM and RM × YPS derived mutants that were as tolerant to hydrogen peroxide as strains containing the chromosome IV-R duplication (see Figure S3, A and B in File S1). In these genetic backgrounds, duplication of chromosome IV-R may occur more frequently than spontaneous point mutations that increase hydrogen peroxide tolerance (Kaya et al. 2015; Liu et al. 2015).

Figure 3 Genetic dissection of the chromosome IV duplication's effect on hydrogen peroxide tolerance. (A) PCDs were staggered nearly every 50 kb along chromosome IV-R in a BY × RM derived aneuploid. Phenotyping of partially aneuploid strains generated by PCD identified a single genomic interval with a large effect on hydrogen peroxide tolerance when strains were examined at a dose of 7.5 mM. This ~40 kb region contains 23 genes and three dubious ORFs. (B) Two additional PCD strains were generated within the previously mentioned window and examined at 7.5 mM. This narrowed the interval to 7 kb that contained five genes and a dubious ORF. (C) Individual gene deletions revealed that TSA2 is largely responsible for the increase in tolerance conferred by duplication of chromosome IV-R. In (A and B), representative images of colonies grown at 7.5 mM of hydrogen peroxide are shown next to their associated PCD strains; numbers adjacent to the telomere indicate the starting position of each PCD on chromosome IV. In (C), representative images of colonies grown at 7.5 mM of hydrogen peroxide are again shown next to their associated individual gene deletion strains, with deleted regions indicated by gaps on the duplicated chromosome.

Chromosome IV-R duplication is conditionally beneficial
Chromosome IV-R duplication conferred a growth benefit only when hydrogen peroxide exposure occurred on agar plates (Figure 2A and Figure S4, A and B in File S1). In other conditions, chromosome IV-R duplication appeared to have a fitness cost. Similar to past reports that chromosomal duplications are usually deleterious (Pavelka et al. 2010; Sunshine et al. 2016), chromosome IV-R duplication reduced growth both on agar plates without hydrogen peroxide and in liquid medium containing hydrogen peroxide (Figure 2, B and C and Figure S4, A and B in File S1).
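Figure 2 plots means with 95% confidence intervals from four biological replicates, each grown in three technical replicates. The paper does not state how those intervals were computed. The Python sketch below assumes technical replicates are first averaged within each biological replicate and a Student t interval is then taken across biological replicates; the pixel-intensity values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% Student t critical values (standard table), keyed by degrees of freedom.
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def collapse_technical(replicates):
    """Average the technical replicates within each biological replicate."""
    return [mean(technical) for technical in replicates]


def mean_ci95(values):
    """Return (mean, lower, upper) of a t-based 95% confidence interval for the mean."""
    df = len(values) - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only tabulates t for 2 to 11 replicates")
    m = mean(values)
    half_width = T_CRIT_95[df] * stdev(values) / sqrt(len(values))
    return m, m - half_width, m + half_width


if __name__ == "__main__":
    # Hypothetical end-point colony intensities: 4 biological x 3 technical replicates.
    disomic = [[410, 420, 400], [395, 405, 415], [430, 425, 420], [400, 410, 405]]
    euploid = [[300, 310, 305], [320, 315, 310], [290, 300, 295], [305, 310, 300]]
    for label, reps in (("disomic", disomic), ("euploid", euploid)):
        m, lo, hi = mean_ci95(collapse_technical(reps))
        print(f"{label}: {m:.1f} [{lo:.1f}, {hi:.1f}]")
```

Collapsing technical replicates first keeps the degrees of freedom tied to biological replication (df = 3 here). Comparing two such intervals by eye is not a formal test.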
Identification of a single region responsible for most of the effect of chromosome IV-R duplication
To map the conditional growth benefit of chromosome IV-R duplication to specific genes, we adapted a technique known as PCD, which involves eliminating segments of a chromosome that are distal to a centromere by inserting a drug resistance cassette linked to a synthetic telomere seed sequence (Figure S2 in File S1; Materials and Methods) (Sugiyama et al. 2005, 2008; Kaboli et al. 2016). Colony PCR was used both to confirm correct placement of the deletion cassette and to verify that a single copy of the deleted region remained (see Materials and Methods).

We first used PCD to delete the right half of chromosome IV-R from a BY × RM derived aneuploid (Figure 3A; Materials and Methods). This genetic change, which was confirmed by whole genome sequencing (see Figure S5 in File S1; Materials and Methods), caused a reversion to the hydrogen peroxide tolerance exhibited by the euploid BY × RM progenitor prior to the screen (see Figure S6A in File S1). This result indicates that the deleted chromosomal segment is required for the aneuploidy's effect.

We next generated a panel of PCD strains, with large-scale deletions staggered, on average, every 50.7 kb along the distal half of chromosome IV-R. This led to the identification of a single 40 kb region that recapitulates most of the effect of the chromosome IV-R disomy (Figure 3A and Figure S6A in File S1).
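The PCD mapping logic can be stated compactly. Because each PCD removes the duplicated copy distal to its start position (toward the chromosome IV-R telomere), a strain that loses the tolerance benefit places the causal locus distal to its breakpoint, and a strain that keeps the benefit places the locus proximal to it. The Python sketch below brackets the locus from a set of such results, assuming a single causal locus and a clean retained-or-lost phenotype; the breakpoints and phenotypes are hypothetical and are not the Figure 3 data.

```python
def bracket_causal_locus(pcd_results):
    """Bracket a single causal locus from PCD phenotypes.

    `pcd_results` is a list of (breakpoint_bp, retains_benefit) pairs. A PCD at breakpoint p
    deletes the duplicated copy from p to the telomere, so:
      - benefit lost     -> locus lies distal to p   (p is a lower bound)
      - benefit retained -> locus lies proximal to p (p is an upper bound)
    Returns (lower, upper); either may be None if no PCD constrains that side.
    """
    lost = [p for p, retained in pcd_results if not retained]
    kept = [p for p, retained in pcd_results if retained]
    lower = max(lost) if lost else None
    upper = min(kept) if kept else None
    if lower is not None and upper is not None and lower >= upper:
        raise ValueError("phenotypes are inconsistent with a single causal locus")
    return lower, upper


if __name__ == "__main__":
    # Hypothetical breakpoints (bp on chromosome IV) and whether tolerance was retained.
    panel = [
        (1_250_000, False),
        (1_300_000, False),
        (1_340_000, False),
        (1_380_000, True),
        (1_430_000, True),
    ]
    print(bracket_causal_locus(panel))
```

Partial effects, such as the elevated MICs noted for some deletion strains, fall outside this binary model and would need a quantitative treatment.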
It is important to note that, although chromosome-scale deletions of this 40 kb region appeared to phenocopy the progenitor at certain hydrogen peroxide doses (Figure 3, A and B), the average MICs of these strains were higher than that of their progenitor (Figure S6, A and B in File S1). Possible explanations for this result include the presence of one or more additional point mutations that contribute to hydrogen peroxide tolerance in the mutant, an additional gene or regulatory element on chromosome IV-R whose dosage contributes to tolerance, and nonlinear relationships between DNA content and hydrogen peroxide tolerance, as the chromosome-scale deletion strains remain aneuploid for a large portion of chromosome IV (see Note S2 in File S1). We note that batch effects resulted in slight differences in the growth of the euploid progenitor at 7.5 mM hydrogen peroxide across separate experiments, but that growth differences between genotypes remained qualitatively consistent (Figure 3A, Figure S4A in File S1, and Figure S10 in File S1).

Duplication of TSA2 mediates the conditionally beneficial effect of the aneuploidy
To further resolve this window, we performed two additional PCD transformations, which fine-mapped the causal interval to roughly 7 kb, spanning positions 1,362,862 to 1,369,812 bp (Figure 3B and Figure S6B in File S1). This interval contains five genes (the polyphosphatase PPN1, the cytoplasmic thioredoxin peroxidase TSA2, the guanylate kinase GUK1, the ion exchanger NHX1, and the E3 ubiquitin ligase TOM1) as well as a dubious ORF (YDR455C). We used standard techniques to individually delete each of these genes from a BY × RM derived aneuploid, again using colony PCR to verify both correct placement of the deletion cassette and that a single copy of the gene remained (see Materials and Methods). Also, because telomeres can influence the transcription of genes >20 kb away (Gottschling et al. 1990; Aparicio and Gottschling 1994), we deleted the six genes upstream of PPN1 (Figure 3B and Figure S7 in File S1).
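Two bookkeeping steps in this paragraph can be checked mechanically: the size of the fine-mapped interval, and the selection of candidate genes inside it or within a distance window upstream of it. Positions 1,362,862 to 1,369,812 span 6,951 bp when counted inclusively, which matches "roughly 7 kb". The Python sketch below uses those interval coordinates from the text. The gene names and coordinates are hypothetical placeholders, not real annotations, and the 20 kb window simply mirrors the telomere-effect distance cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def interval_length(start, end):
    """Length in bp of a 1-based, inclusive interval."""
    return end - start + 1


def genes_overlapping(genes, start, end):
    """Names of genes that overlap [start, end] by at least one base."""
    return [g.name for g in genes if g.start <= end and g.end >= start]


def genes_within_upstream(genes, position, window_bp=20_000):
    """Names of genes ending before `position` but no more than `window_bp` upstream of it."""
    return [g.name for g in genes if g.end < position and position - g.end <= window_bp]


if __name__ == "__main__":
    start, end = 1_362_862, 1_369_812  # fine-mapped interval from the text
    genes = [  # hypothetical placeholders, not real annotations
        Gene("geneA", 1_340_000, 1_341_000),
        Gene("geneB", 1_355_000, 1_356_500),
        Gene("geneC", 1_363_000, 1_364_000),
        Gene("geneD", 1_369_000, 1_371_000),
    ]
    print(interval_length(start, end))
    print(genes_overlapping(genes, start, end))
    print(genes_within_upstream(genes, start))
```

With real annotations, the same overlap test would list the genes in the fine-mapped interval and the window test would list the upstream genes considered as telomere-effect candidates.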
The only gene deletion that showed a phenotypic effect was TSA2, which encodes a cytoplasmic thioredoxin peroxidase (Figure 3, B and C and Figure S7 in File S1) (Gasch et al. 2000; Park et al. 2000; Wong et al. 2002; Munhoz and Netto 2004; Ogusucu et al. 2007; Nielsen et al. 2016). Previous work has shown that deleting TSA2 leads to a decrease in hydrogen peroxide tolerance (Wong et al. 2002). Loss of the TSA2 coding region eliminated the majority of the aneuploidy's effect (Figure 3C and Figure S7 in File S1), proving a causal role for TSA2 in the conditional benefit conferred by chromosome IV-R duplication. Knockout of TSA2 in other mutants, including a fully disomic BY × RM mutant, a partially disomic BY × RM mutant, and a fully disomic RM × YPS mutant, confirmed that the effect of TSA2 was reproducible across different aneuploid individuals recovered from our screen (Figure S8 in File S1; see Note S3 in File S1).

Figure 4 Plasmid-based overexpression of TSA2 partially recapitulates the effect of having two chromosomal copies of this gene. The entirety of the TSA2 locus was cloned into a CEN plasmid that was then transformed into the haploid progenitor (see Materials and Methods). Two independently generated TSA2 overexpressing strains were screened for hydrogen peroxide tolerance alongside the haploid progenitor and disomic strain. Shown are 95% confidence intervals for the MIC of the overexpressing strains as a percentage of the MIC of the aneuploid strain. These measurements are based on three biological replicates, which were each grown in three technical replicates.

Plasmid-based overexpression of TSA2 increases tolerance of hydrogen peroxide
To further confirm that an increase in copy number of TSA2 has a beneficial effect on hydrogen peroxide tolerance, the complete TSA2 locus was cloned into a low-copy CEN plasmid, which was subsequently transformed into the BY × RM euploid progenitor (see Materials and Methods).
Increased copy number of TSA2 in the euploid progenitor was confirmed through qPCR (Figure S9 in File S1; Materials and Methods). Plasmid-based overexpression of TSA2 in the euploid progenitor resulted in a significant increase in tolerance (Figure 4 and Figure S10 in File S1). However, the low-copy number plasmid did not fully recapitulate the effect of having a second chromosomal copy of the gene, further suggesting that TSA2 does not fully explain the effects of chromosome IV disomy on hydrogen peroxide tolerance (Figure 4, Figure S10 in File S1, and Figure S11 in File S1). In line with this possibility, TSA2 overexpression strains did not show the increased susceptibility to hydrogen peroxide that was exhibited by aneuploid mutants in liquid media (Figure S12 in File S1).

Conclusion
Previously, we found that the maximal hydrogen peroxide tolerances of segregants from the BY × RM, BY × YPS, and RM × YPS crosses are comparable (see Figure S1 in File S1) (Linder et al. 2016), suggesting that the level of resistance achievable through natural genetic variation may be constrained to some degree. To surpass the levels of tolerance seen in our prior study, we conducted a screen for spontaneous mutations that confer higher hydrogen peroxide resistance than we previously observed. We employed segregants that had maximal tolerances as the progenitors in our mutagenesis screen because these individuals carry combinations of natural genetic variants that lead to resistance, and we wanted to find mutations that provide even greater tolerance than these allele combinations.

Duplication of chromosome IV-R was the most common mutation in our screen. As with other chromosomal duplications (Pavelka et al. 2010; Sunshine et al. 2015), the beneficial effect of the chromosome IV-R disomy is conditional: it depends on both the presence of hydrogen peroxide and exposure to hydrogen peroxide on agar plates, and may be dependent on genetic background.
Regarding this latter point, the present data cannot differentiate whether the progenitors in our screen varied in their genome stabilities or in their abilities to beneficially utilize the chromosome IV-R disomy. Assessment of genotype at TSA2 indicates that preexisting variation at this locus probably does not explain why we observed chromosome IV aneuploids in the BY × RM and RM × YPS crosses, but not in the BY × YPS cross (see Note S4 in File S1).

Using chromosome- and gene-scale deletions, we determined that increased copy number of a single detoxifying gene, TSA2, explains the majority of the benefit conferred by duplication of chromosome IV-R. TSA2 is unique among budding yeast's cytoplasmic thioredoxin peroxidases, as it is the only one that shows markedly increased activation in response to hydrogen peroxide (Gasch et al. 2000; Park et al. 2000; Wong et al. 2002; Munhoz and Netto 2004; Ogusucu et al. 2007; Nielsen et al. 2016). Our results are consistent with recent studies from other groups showing that a small number of genes, typically one or two, mediate the conditionally beneficial effects of aneuploidies (Pavelka et al. 2010; Chen et al. 2012a; Kaya et al. 2015; Liu et al. 2015).

In summary, our work speaks to challenges in enhancing particular traits using natural genetic variation, spontaneous mutations, or a combination of the two. Indeed, the maximal trait values achievable through natural genetic variation may be limited because of both the specific alleles present in a population and features of a system that prevent extreme phenotypic levels from occurring. These limits may be overcome using spontaneous mutations, which have not experienced the same selection pressures as natural variants and may be more likely to have large effects. However, our work suggests that the most likely mutational event to underlie such phenotypic increases is chromosomal duplication.
Although these duplications have a limitation in that they can be easily lost (Berman 2016), they may represent a transient state that can facilitate the acquisition of other mutations that provide a more permanent solution to stressful conditions (Yona et al. 2012).
ACKNOWLEDGMENTS
We thank Jonathan Lee, Martin Mullis, and Rachel Schell for reviewing a draft of this manuscript. This work was supported by grants from the National Institutes of Health (R01GM110255), National Science Foundation (MCB1330874), and Alfred P. Sloan Foundation to I.M.E.
LITERATURE CITED
Akasaka, N., J. Tohyama, A. Ogawa, T. Takachi, A. Watanabe et al., 2013 Refractory infantile spasms associated with mosaic variegated aneuploidy syndrome. Pediatr. Neurol. 49: 364–367.
Andriani, G. A., F. Faggioli, D. Baker, M. E. Dolle, R. S. Sellers et al., 2016 Whole chromosome aneuploidy in the brain of Bub1bH/H and Ercc1-/Delta7 mice. Hum. Mol. Genet. 25: 755–765.
Aparicio, O. M., and D. E. Gottschling, 1994 Overcoming telomeric silencing: a trans-activator competes to establish gene expression in a cell cycle-dependent way. Genes Dev. 8: 1133–1146.
Berman, J., 2016 Ploidy plasticity: a rapid and reversible strategy for adaptation to stress. FEMS Yeast Res. 16: fow020.
Bose, D., V. Krishnamurthy, K. S. Venkatesh, M. Aiyaz, M. Shetty et al., 2015 Molecular delineation of partial trisomy 14q and partial trisomy 12p in a patient with dysmorphic features, heart defect and developmental delay. Cytogenet. Genome Res. 145: 14–18.
Chen, G., W. D. Bradford, C. W. Seidel, and R. Li, 2012a Hsp90 stress potentiates rapid cellular adaptation through induction of aneuploidy. Nature 482: 246–250.
Chen, G., B. Rubinstein, and R. Li, 2012b Whole chromosome aneuploidy: big mutations drive adaptation by phenotypic leap. Bioessays 34: 893–900.
Davoli, T., A. W. Xu, K. E. Mengwasser, L. M. Sack, J. C. Yoon et al., 2013 Cumulative haploinsufficiency and triplosensitivity drive aneuploidy patterns and shape the cancer genome. Cell 155: 948–962.
Dodgson, S. E., S. Kim, M. Costanzo, A. Baryshnikova, D. L. Morse et al., 2016 Chromosome-specific and global effects of aneuploidy in Saccharomyces cerevisiae. Genetics 202: 1395–1409.
Durrbaum, M., and Z. Storchova, 2015 Consequences of aneuploidy in cancer: transcriptome and beyond. Recent Results Cancer Res. 200: 195–224.
Durrbaum, M., and Z. Storchova, 2016 Effects of aneuploidy on gene expression: implications for cancer. FEBS J. 283: 791–802.
Ehrenreich, I. M., J. Bloom, N. Torabi, X. Wang, Y. Jia et al., 2012 Genetic architecture of highly complex chemical resistance traits across four yeast strains. PLoS Genet. 8: e1002570.
Gannon, W. T., J. E. Martinez, S. J. Anderson, and H. M. Swingle, 2011 Craniofacial dysmorphism and developmental disorders among children with chromosomal microdeletions and duplications of unknown significance. J. Dev. Behav. Pediatr. 32: 600–604.
Gasch, A. P., P. T. Spellman, C. M. Kao, O. Carmel-Harel, M. B. Eisen et al., 2000 Genomic expression programs in the response of yeast cells to environmental changes. Mol. Biol. Cell 11: 4241–4257.
Gietz, R. D., and R. H. Schiestl, 2007 High-efficiency yeast transformation using the LiAc/SS carrier DNA/PEG method. Nat. Protoc. 2: 31–34.
Gottschling, D. E., O. M. Aparicio, B. L. Billington, and V. A. Zakian, 1990 Position effect at S. cerevisiae telomeres: reversible repression of Pol II transcription. Cell 63: 751–762.
Volume 7 September 2017 | TSA2 and Beneficial Aneuploidy in Yeast | 3183
Kaboli, S., T. Miyamoto, K. Sunada, Y. Sasano, M. Sugiyama et al., 2016 Improved stress resistance and ethanol production by segmental haploidization of the diploid genome in Saccharomyces cerevisiae. J. Biosci. Bioeng. 121: 638–644.
Kaya, A., M. V. Gerashchenko, I. Seim, J. Labarre, M. B. Toledano et al., 2015 Adaptive aneuploidy protects against thiol peroxidase deficiency by increasing respiration via key mitochondrial proteins. Proc. Natl. Acad. Sci. USA 112: 10685–10690.
Laubert, T., S. Freitag-Wolf, M. Linnebacher, A. Konig, B. Vollmar et al., 2015 Stage-specific frequency and prognostic significance of aneuploidy in patients with sporadic colorectal cancer–a meta-analysis and current overview. Int. J. Colorectal Dis. 30: 1015–1028.
Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25: 1754–1760.
Li, H., B. Handsaker, A. Wysoker, T. Fennell, J. Ruan et al., 2009 The sequence alignment/map format and SAMtools. Bioinformatics 25: 2078–2079.
Linder, R. A., F. Seidl, K. Ha, and I. M. Ehrenreich, 2016 The complex genetic and molecular basis of a model quantitative trait. Mol. Biol. Cell 27: 209–218.
Liu, G., M. Y. Yong, M. Yurieva, K. G. Srinivasan, J. Liu et al., 2015 Gene essentiality is a quantitative property linked to cellular evolvability. Cell 163: 1388–1399.
Meena, J. K., A. Cerutti, C. Beichler, Y. Morita, C. Bruhn et al., 2015 Telomerase abrogates aneuploidy-induced telomere replication stress, senescence and cell depletion. EMBO J. 34: 1371–1384.
Mohr, M., K. S. Zaenker, and T. Dittmar, 2015 Fusion in cancer: an explanatory model for aneuploidy, metastasis formation, and drug resistance. Methods Mol. Biol. 1313: 21–40.
Munhoz, D. C., and L. E. Netto, 2004 Cytosolic thioredoxin peroxidase I and II are important defenses of yeast against organic hydroperoxide insult: catalases and peroxiredoxins cooperate in the decomposition of H2O2 by yeast. J. Biol. Chem. 279: 35219–35227.
Nicholson, J. M., and D. Cimini, 2015 Link between aneuploidy and chromosome instability. Int. Rev. Cell Mol. Biol. 315: 299–317.
Nielsen, M. H., R. T. Kidmose, and L. B. Jenner, 2016 Structure of TSA2 reveals novel features of the active-site loop of peroxiredoxins. Acta Crystallogr D Struct. Biol. 72: 158–167.
Ogusucu, R., D. Rettori, D. C. Munhoz, L. E. Netto, and O. Augusto, 2007 Reactions of yeast thioredoxin peroxidases I and II with hydrogen peroxide and peroxynitrite: rate constants by competitive kinetics. Free Radic. Biol. Med. 42: 326–334.
Ottesen, A. M., L. Aksglaede, I. Garn, N. Tartaglia, F. Tassone et al., 2010 Increased number of sex chromosomes affects height in a nonlinear fashion: a study of 305 patients with sex chromosome aneuploidy. Am. J. Med. Genet. A. 152A: 1206–1212.
Park, S. G., M. K. Cha, W. Jeong, and I. H. Kim, 2000 Distinct physiological functions of thiol peroxidase isoenzymes in Saccharomyces cerevisiae. J. Biol. Chem. 275: 5723–5732.
Pavelka, N., G. Rancati, J. Zhu, W. D. Bradford, A. Saraf et al., 2010 Aneuploidy confers quantitative proteome changes and phenotypic variation in budding yeast. Nature 468: 321–325.
Pinto, A. E., T. Pereira, G. L. Silva, and S. Andre, 2015 Aneuploidy identifies subsets of patients with poor clinical outcome in grade 1 and grade 2 breast cancer. Breast 24: 449–455.
Potapova, T. A., J. Zhu, and R. Li, 2013 Aneuploidy and chromosomal instability: a vicious cycle driving cellular evolution and cancer genome chaos. Cancer Metastasis Rev. 32: 377–389.
Santaguida, S., and A. Amon, 2015 Short- and long-term effects of chromosome mis-segregation and aneuploidy. Nat. Rev. Mol. Cell Biol. 16: 473–485.
Selmecki, A. M., K. Dulmage, L. E. Cowen, J. B. Anderson, and J. Berman, 2009 Acquisition of aneuploidy provides increased fitness during the evolution of antifungal drug resistance. PLoS Genet. 5: e1000705.
Selmecki, A. M., Y. E. Maruvka, P. A. Richmond, M. Guillet, N. Shoresh et al., 2015 Polyploidy can drive rapid adaptation in yeast. Nature 519: 349–352.
Sheltzer, J. M., 2013 A transcriptional and metabolic signature of primary aneuploidy is present in chromosomally unstable cancer cells and informs clinical prognosis. Cancer Res. 73: 6401–6412.
Siegel, J. J., and A. Amon, 2012 New insights into the troubles of aneuploidy. Annu. Rev. Cell Dev. Biol. 28: 189–214.
Sirr, A., G. A. Cromie, E. W. Jeffery, T. L. Gilbert, C. L. Ludlow et al., 2015 Allelic variation, aneuploidy, and nongenetic mechanisms suppress a monogenic trait in yeast. Genetics 199: 247–262.
Sugiyama, M., S. Ikushima, T. Nakazawa, Y. Kaneko, and S. Harashima, 2005 PCR-mediated repeated chromosome splitting in Saccharomyces cerevisiae. Biotechniques 38: 909–914.
Sugiyama, M., T. Nakazawa, K. Murakami, T. Sumiya, A. Nakamura et al., 2008 PCR-mediated one-step deletion of targeted chromosomal regions in haploid Saccharomyces cerevisiae. Appl. Microbiol. Biotechnol. 80: 545–553.
Sunshine, A. B., C. Payen, G. T. Ong, I. Liachko, K. M. Tan et al., 2015 The fitness consequences of aneuploidy are driven by condition-dependent gene effects. PLoS Biol. 13: e1002155.
Sunshine, A. B., G. T. Ong, D. P. Nickerson, D. Carr, C. J. Murakami et al., 2016 Aneuploidy shortens replicative lifespan in Saccharomyces cerevisiae. Aging Cell 15: 317–324.
Tan, Z., M. Hays, G. A. Cromie, E. W. Jeffery, A. C. Scott et al., 2013 Aneuploidy underlies a multicellular phenotypic switch. Proc. Natl. Acad. Sci. USA 110: 12367–12372.
Torres, E. M., T. Sokolsky, C. M. Tucker, L. Y. Chan, M. Boselli et al., 2007 Effects of aneuploidy on cellular physiology and cell division in haploid yeast. Science 317: 916–924.
Wong, C. M., Y. Zhou, R. W. Ng, H. F. Kung Hf, and D. Y. Jin, 2002 Cooperation of yeast peroxiredoxins Tsa1p and Tsa2p in the cellular defense against oxidative and nitrosative stress. J. Biol. Chem. 277: 5385–5394.
Yona, A. H., Y. S. Manor, R. H. Herbst, G. H. Romano, A. Mitchell et al., 2012 Chromosomal duplication is a transient evolutionary solution to stress. Proc. Natl. Acad. Sci. USA 109: 21010–21015.
Communicating editor: D. Gresham
3184 | R. A. Linder et al.
Abstract
Frogs and toads (anurans) are widely used to study many biological problems; however, few have genome assemblies or annotations. This lack of resources limits the research that can be conducted in these systems. Research strategies that are commonplace in many systems, such as genome-wide association studies or quantitative trait mapping, are not feasible without a reference genome. However, thanks to the continuing reduction in the cost of sequencing, it has become possible for small lab groups to generate high-quality genomes and annotations for their model of choice. In Chapter 2, I describe the genome sequencing, assembly, and annotation of a promising anuran system, as well as limited biological insight into some of its attributes. In Chapter 3, I describe a transcriptomic analysis of interspecies hybrids in this system, enabled by the genome described in Chapter 2. This transcriptomic analysis revealed large-scale patterns of differential expression unique to hybrids, consistent with reinforcement of hybrid boundaries.
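One simple way to make "differential expression unique to hybrids" concrete is to ask, gene by gene, whether hybrid expression falls outside the range spanned by both parental species. The sketch below assumes mean normalized expression values (for example, counts per million) per gene for Spea multiplicata, Spea bombifrons, and their hybrids. The function names, the pseudocount, and the 1.0 log2 fold-change cutoff are hypothetical choices for illustration, not the pipeline used in this dissertation; a real analysis would use replicate-aware statistical tests rather than a fixed cutoff on group means.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical input format: gene name -> mean normalized expression
# (e.g. counts per million) for one group of samples.
Expr = Dict[str, float]

PSEUDOCOUNT = 1.0  # assumption: avoids log2(0) for genes unexpressed in a group


def log2_fc(a: float, b: float) -> float:
    """log2 fold change of a relative to b, with a pseudocount added to both."""
    return math.log2((a + PSEUDOCOUNT) / (b + PSEUDOCOUNT))


def hybrid_specific(
    multiplicata: Expr, bombifrons: Expr, hybrid: Expr, min_abs_lfc: float = 1.0
) -> List[Tuple[str, str]]:
    """Flag genes whose hybrid expression lies outside both parental values.

    'over'  : hybrid exceeds both parents by at least min_abs_lfc (log2 units)
    'under' : hybrid falls below both parents by at least min_abs_lfc
    Genes with intermediate or parent-like expression are not reported.
    """
    calls = []
    for gene in sorted(hybrid):
        lfc_m = log2_fc(hybrid[gene], multiplicata[gene])
        lfc_b = log2_fc(hybrid[gene], bombifrons[gene])
        if lfc_m >= min_abs_lfc and lfc_b >= min_abs_lfc:
            calls.append((gene, "over"))
        elif lfc_m <= -min_abs_lfc and lfc_b <= -min_abs_lfc:
            calls.append((gene, "under"))
    return calls


# Toy values for illustration only; these are not data from the dissertation.
mult = {"geneA": 10.0, "geneB": 50.0, "geneC": 5.0, "geneD": 100.0}
bomb = {"geneA": 12.0, "geneB": 40.0, "geneC": 80.0, "geneD": 90.0}
hyb = {"geneA": 60.0, "geneB": 45.0, "geneC": 40.0, "geneD": 10.0}

print(hybrid_specific(mult, bomb, hyb))
# [('geneA', 'over'), ('geneD', 'under')]
```

Here `geneC` is not called even though it differs strongly from one parent, because its hybrid expression sits between the two parental values; requiring a difference from both parents is what separates hybrid-specific misexpression from simple inheritance of one parent's level.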
Conceptually similar
Exploring the genetic basis of quantitative traits
Evolutionary genomic analysis in heterogeneous populations of non-model and model organisms
Bayesian analysis of transcriptomic and genomic data
Genetic engineering of fungi to enhance the production and elucidate the biosynthesis of bioactive secondary metabolites
The evolution of gene regulatory networks
Model selection methods for genome wide association studies and statistical analysis of RNA seq data
Studies in bivalve aquaculture: metallotoxicity, microbiome manipulations, and genomics & breeding programs with a focus on mutation rate
From gamete to genome: evolutionary consequences of sexual conflict in house mice
Innovative sequencing techniques elucidate gene regulatory evolution in Drosophila
Probing the genetic basis of gene expression variation through Bayesian analysis of allelic imbalance and transcriptome studies of oil palm interspecies hybrids
Application of machine learning methods in genomic data analysis
Genetic characterization of microbial eukaryotic diversity and metabolic potential
Genetic architectures of phenotypic capacitance
Genomic, transcriptomic and immunologic landscapes of human cancers
Transcriptional and morphological impacts of copper on Mytilus californianus larval development in current and future ocean conditions
A population genomics approach to the study of speciation in flowering columbines
Applications of next generation sequencing in sessile marine invertabrates
Genetic diversity and bacterial death in the context of adaptive evolution
Plant genome wide association studies and improvement of the linear mixed model by applying the weighted relationship matrix
Biological interactions on the behavioral, genomic, and ecological scale: investigating patterns in Drosophila melanogaster of the southeast United States and Caribbean islands
Asset Metadata
Creator
Seidl, Fabian
(author)
Core Title
Genome sequencing and transcriptome analysis of the phenotypically plastic spadefoot toads
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Publication Date
06/07/2019
Defense Date
01/28/2019
Publisher
University of Southern California
(original),
University of Southern California. Libraries
(digital)
Tag
anurans,DNA sequencing,gene expression,genome,genomics,hybridization,OAI-PMH Harvest,spadefoot,Spea bombifrons,Spea multiplicata,transcriptome
Format
application/pdf
(imt)
Language
English
Contributor
Electronically uploaded by the author
(provenance)
Advisor
Ehrenreich, Ian (committee chair), Finkel, Steven (committee member), Nuzhdin, Sergey (committee member), Webb, Eric (committee member)
Creator Email
fabian.seidl@gmail.com,fseidl@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-c89-172370
Unique identifier
UC11660811
Identifier
etd-SeidlFabia-7465.pdf (filename),usctheses-c89-172370 (legacy record id)
Legacy Identifier
etd-SeidlFabia-7465.pdf
Dmrecord
172370
Document Type
Dissertation
Rights
Seidl, Fabian
Type
texts
Source
University of Southern California
(contributing entity),
University of Southern California Dissertations and Theses
(collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the a...
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA