Genetic and molecular insights into the
genotype-phenotype relationship
by
Rachel Schell
A Dissertation Presented to the
FACULTY OF THE GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In Partial Fulfillment of the
Requirements for the Degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2022
Acknowledgements
I thank my advisor Ian who has inspired me to pursue big, bold questions and to always do the
best possible work. Thank you to my committee for your direction, support, and example of what
dedicated and impassioned scientists can accomplish.
Thanks to my fellow lab mates for endless discussion of science and non-science topics. Thanks
to my MCB peers and the broader USC community for creating a supportive and fun environment
in which it has been a pleasure to participate.
I’d like to acknowledge the group of scientists whom I worked with at Hycor Biomedical who first
inspired me to pursue a career in research, in particular, Mark Van Cleve, Scott Vande Wetering,
and Elaine Taine.
I must also acknowledge the wealth of support from my family and friends. It has been a particular
advantage to pursue my PhD while being able to spend time together.
Without the humanities, my endeavors in science would not be possible as they have sustained and
spurred me to seek answers in all things. In particular, the Growlers, Morrissey and The Smiths,
Sufjan Stevens, Stephen King, Buffy the Vampire Slayer, Philip Glass, Long Beach, CA, wine,
cooking, and live music are just a few of my favorite things.
Table of Contents
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Supplemental Figures
Abstract
Chapter 1 Introduction
1.1. Connecting genotypes to phenotypes is challenging
1.2 Most gene perturbations have minimal phenotypic impact
1.3 Genetic background effects are instances of genotype-phenotype rewiring
1.4 Genetic background effects are poorly understood at the molecular level
1.6 Crosses are useful tools for studying the genotype-phenotype connection
1.7 Goals of this dissertation
1.8 Chapter summary
Chapter 2 The complex underpinnings of genetic background effects
Summary of Contribution
2.1 Abstract
2.3 Results
2.3.1 Preliminary screen.
2.3.2 Mapping of mutation-independent and mutation-responsive effects.
2.3.3 Higher-order epistasis influences response to mutations.
2.3.4 Environment plays a strong role in background effects.
2.3.5 Interactions between loci and different knockouts.
2.3.6 Genetic basis of induced changes in phenotypic variance.
2.4 Discussion
2.5 Methods
2.5.1 Generation of different BY×3S knockout backgrounds
2.5.2 Genotyping of segregants
2.5.3 Phenotyping of segregants
2.5.4 Scans for one-locus effects
2.5.5 Scans for two-locus effects
2.5.6 Scans for three-locus genetic effects
2.5.7 Assignment of mutation-responsive effects to knockouts
2.5.8 Statistical power analysis
2.5.9 Contributions of loci involved in higher-order epistasis
2.5.10 Analysis of mutation-responsive effects across environments
2.5.11 Genetic explanation of changes in phenotypic variance
2.5.12 Checking potential consequences of allele frequency bias
2.4 Supplementary Material
Chapter 3 Genetic basis of a spontaneous mutation’s expressivity
3.1 Abstract
3.2 Introduction
3.3 Results
3.3.1 A spontaneous mutation increases phenotypic variance in the BYx3S cross
3.3.2 A large effect locus shows epistasis with mrp20-105E
3.3.3 Epistasis between MRP20 and MKT1 differs in cross parents and segregants
3.3.4 Fixation of mrp20-105E and MKT1 genotypes increases phenotypic variance
3.3.5 Many additional loci affect the expressivity of mrp20-105E
3.3.6 The Chromosome XIV locus contains multiple causal variants
3.3.7 Aneuploidy also contributes to the expressivity of mrp20-105E
3.3.8 Multiple mechanisms underlie poor growth in the presence of mrp20-105E
3.3.9 Genetic underpinnings of mrp20-105E’s expressivity in segregants and parents
3.4 Discussion
3.5 Methods
3.5.1 Generation of segregants
3.5.2 Genotyping
3.5.3 Phenotyping
3.5.4 Linkage mapping
3.5.5 Classification of inviable segregants
3.5.6 Delimiting loci with recombination breakpoints
3.5.7 Reciprocal hemizygosity experiments
3.5.8 Construction of nucleotide replacement strains
3.5.9 Mitochondrial genome instability experiments
3.5.11 Relationship between detrimental alleles, growth, and inviability
3.6 Supplementary Material
Chapter 4 A gene deletion with no growth effect causes widespread molecular changes
4.1 Abstract
4.2 Introduction
4.3 Results and Discussion
4.4 Conclusion
4.5 Methods
4.5.1 Generation of strains
4.5.2 Growth curve experiments
4.5.3 ATAC-seq and RNA-seq Experiments
4.5.4 ATAC-seq protocol
4.5.6 Sequencing and mapping
4.5.7 Expression Analyses
4.5.8 Accessibility Analyses
4.5.9 De novo motif discovery and gene ontology enrichment analyses
4.6 Supplementary Material
Chapter 5 Concluding Remarks
5.1 Impact of my work
5.2 Future Directions
References
Appendix A: Modifiers of the Genotype–Phenotype Map: Hsp90 and Beyond
A.1 Summary of Contribution
A.2 Abstract
A.3 Main Text
A.3.1 Perturbation of Hsp90 Impacts Heritable Phenotypic Variation
A.3.2 Selection Complicates Inferences about Buffering and Potentiation by Hsp90
A.3.3 Hsp90 Is a Global Modifier That Shows Extensive Epistasis
A.3.4 Questions Regarding Hsp90 and Other Global Modifiers
A.4 Conclusion
A.5 References
Appendix B: Genome of Spea multiplicata, a Rapidly Developing, Phenotypically Plastic, and Desert-Adapted Spadefoot Toad
B.1 Summary of Contribution
B.2 Abstract
B.3 Introduction
B.4 Results
B.4.1 Properties of the Sp. multiplicata genome
B.4.2 Genes with elevated copy numbers in Sp. multiplicata
B.4.3 Factors contributing to anuran genome size differences
B.4.4 Identification of genes showing evidence of positive selection
B.4.5 Insights into adaptive hybridization from the transcriptome
B.5 Discussion
B.6 Methods
B.6.1 Genome assembly
B.6.2 Annotation of protein-coding genes
B.6.3 Gene ontology analysis
B.6.4 Copy number analysis
B.6.5 dN/dS analysis
B.6.6 Analysis of gene expression in pure species and their hybrids
B.6 Supplementary Material
B.7 References
Appendix C: The interplay of additivity, dominance, and epistasis in a diploid yeast cross
C.1 Summary of Contribution
C.2 Abstract
C.3 Introduction
C.5 Results
C.5.1 Phenotyping of a large diploid cross by barcode sequencing
C.5.2 Mapping within interrelated families increases statistical power
C.5.3 Loci frequently show dominance effects
C.5.4 Epistatic hubs govern both additivity and non-additivity
C.5.5 Relationships between epistasis and dominance in diploids
C.4 Discussion
C.7 References
List of Tables
Table S2.1 List of 47 genes that were screened for background effects.
Table S2.2 Screen summary statistics.
Table S2.3 Mapping population breakdown.
Table S2.4 Genomic regions with allele frequency bias.
Table S2.5 Fixed regions in BY/3S hemizygous diploids.
Table S2.6 Phenotyping environments.
Table S2.7 Number of mutation-independent and mutation-responsive genetic effects across different significance thresholds.
Table S2.8 Chi-squared test results for mutation-responsive effects.
Table S2.9 Percentage of mutation-responsive genetic effects that showed phenotypic effect in any environment outside the one in which they were originally detected.
Table S2.10 Number of mutation-responsive effects that show biased individual and multi-locus allele frequencies.
Table S2.11 Phenotypic variance and the number of identified genetic effects.
Table S2.12 List of all primers used in this study.
Table S3.1 Crosses and segregant populations examined in this study.
Table S3.2 Loci beyond MKT1 that additively influence growth in mrp20-105E segregants.
Table S3.3 Genes at loci that influence growth in mrp20-105E segregants.
Table S3.4 Candidate genes have diverse cellular functions.
Table S3.5 Presence of Chromosome II duplication differs among BY x 3S crosses.
List of Figures
Figure 2.1 Examples of mutation-responsive genetic effects.
Figure 2.2 Most mutation-responsive genetic effects involve multiple loci.
Figure 2.3 Higher-order epistasis among knockouts and multiple loci is an important contributor to background effects.
Figure 2.4 Analysis of mutation-responsive effects across environments.
Figure 2.5 Analysis of mutation-responsive effects across knockout backgrounds.
Figure 2.6 Mutation-responsive effects underlie differences in phenotypic variance between knockout and wild-type backgrounds across environments.
Figure 3.1 The mrp20-105E mutation occurred spontaneously, increasing phenotypic variance in the BYx3S cross.
Figure 3.2 Epistasis between MRP20 and MKT1 appears to mostly explain response to mrp20-105E.
Figure 3.3 Additional loci govern response to the mutation.
Figure 3.4 Mitochondrial genome instability partially underlies the expressivity of mrp20-105E.
Figure 3.5 Detected loci quantitatively and qualitatively explain mutant phenotypes.
Figure 4.1 HOS3 deletion has no effect on growth.
Figure 4.2 Workflow of experiments conducted in this paper.
Figure 4.3 HOS3 deletion affects allele-specific expression (ASE) at hundreds of genes.
Figure 4.4 HOS3 deletion affects chromatin accessibility at thousands of SNPs in hundreds of genes.
Figure 4.5 HOS3 deletion alters expression at ribosome genes regulated by transcription factors Tod6 and Sfp1.
Figure 4.6 HOS3 deletion alters chromatin accessibility in genomic regions enriched for six specific transcription factors.
Figure 4.7 Wide-spread buffering occurs among accessibility and expression changes occurring in HOS3 deletion.
List of Supplemental Figures
Figure S2.1 Generation of BYx3S knockout segregants.
Figure S2.2 Certain genes exhibit significant background effects when perturbed.
Figure S2.3 Allele frequency plot.
Figure S2.4 Growth of all 1,411 segregants across the 10 environments.
Figure S2.5 Individual and joint contributions of loci to background effects across different significance thresholds.
Figure S2.6 Analysis of how mutation-responsive effects interact with different knockouts at multiple significance thresholds.
Figure S2.7 Statistical power analysis for one, two, and three-locus interactions.
Figure S2.8 Extent to which mutation-responsive effects interact with different knockouts.
Figure S2.9 Absence of a relationship between identified mutation-responsive effects and mean phenotypic differences between knockout and wild type backgrounds.
Figure S2.10 All seven knockout populations show nominally significant correlations between changes in phenotypic variance and detected mutation-responsive effects.
Figure S2.11 Most mutation-responsive effects show small differences in phenotypic variance explained (PVE) in mutants relative to wild type segregants.
Figure S3.1 Identification of mrp20-105E.
Figure S3.2 Representative MRP20 and mrp20-105E segregants on ethanol.
Figure S3.3 Linkage mapping in the F3 panel more finely resolves the Chromosome XIV locus.
Figure S3.4 Growth effects of loci detected in BY x 3S mrp20-105E crosses.
Figure S3.5 Loci affecting expressivity of mrp20-105E show minimal effects in MRP20 segregants.
Figure S4.1 Correlation of ASE data among individual samples.
Figure S4.2 Environmental perturbation of WT strain alters allele-specific expression (ASE) and allelic accessibility at thousands of polymorphisms and genes.
Figure S4.3 Genes experiencing allele-specific expression (ASE) changes in HOS3 deletion in both environments showed similar effects.
Figure S4.4 Polymorphisms that showed allelic occupancy changes in HOS3 deletion in both environments showed similar effects.
Abstract
Connecting genotypes to phenotypes is often challenging. Many traits are genetically
complex and involve many loci. Exactly how loci collectively produce phenotypes is poorly
understood at the molecular level. Furthermore, natural populations can harbor substantial amounts
of genetic variation, making it difficult to detect and identify the causal variation underlying
specific traits. The way that genotypes specify phenotypes is dynamic, and environmental or genetic
perturbations can alter the relationship between genotype and phenotype.
In this dissertation, I use genetic perturbations to examine the genotype-phenotype
relationship. In chapter 2, I show that gene knockouts modify how pairs and trios of genetic
variants contribute to growth. In chapter 3, I determine the genetic variation underlying the
variable expressivity of a spontaneous mutation that produces phenotypes ranging from lethal to
benign. In chapter 4, I show that a phenotypically silent gene deletion has widespread
biochemical effects on expression and chromatin accessibility. Together, these studies shed light
on the genetic and molecular mechanisms connecting genotypes to phenotypes.
Chapter 1 Introduction
1.1. Connecting genotypes to phenotypes is challenging
Predicting phenotypes for specific genotypes remains a central goal in medicine and
industry, but a deeper genetic and molecular understanding of the genotype-phenotype relationship
is necessary to accomplish this task [1–3]. While technological advancements have made genotyping
easier, more affordable, and better, current efforts to connect genotypes to phenotypes, such as
genome-wide association studies, tend to poorly explain the variation associated with a given
trait [4]. This can be due to a variety of reasons, including the fact that many traits of interest
are genetically complex and are defined by more than one locus or region of the genome [5,6]. Loci
can function independently of one another to additively influence phenotype or can instead impart
effects in a dependent manner [7,8]. These gene-gene interactions, known as epistasis, can involve
a large number of loci and have consequential effects on phenotypes, making epistasis critical for
connecting genotype to phenotype [9–13]. Furthermore, environment can alter the way that variation
influences phenotype [14,15]. Additionally, different genetic architectures can underlie the same
phenotype [9,10]. For these reasons, connecting genotypes to their phenotypes is often difficult.
1.2 Most gene perturbations have minimal phenotypic impact
Systematic studies in model organisms have shown that perturbation of the vast majority
of genes is tolerated and often has no phenotypic impact [16–20]. Perturbation of pairs and trios
of genes has revealed genetic relationships underlying this widespread genetic redundancy [21–26].
Such studies have enabled genes to be grouped into functional networks and led to a broad
characterization of the genome into a small number of hub genes, which span diverse cellular
functions and interact with many other genes, and a large number of localized genes with far fewer
interactions [24,25,27,28]. While some genetic relationships provide functional insights, many do
not provide clear hypotheses as to the mechanisms that underlie genetic relationships. Furthermore,
why deletion of so many genes has minimal phenotypic impact remains unclear. Due to the
breadth of such studies, few genetic backgrounds have been comprehensively examined.
Therefore, how natural variation alters genetic relationships, the broad structure of the genome,
or the tolerance of gene deletions remains unclear.
1.3 Genetic background effects are instances of genotype-phenotype rewiring
A common mutation can produce different phenotypic effects across genetically distinct
individuals [1,2,9,10,29]. These genetic background effects are common across organisms and affect
a variety of traits [1,2]. For instance, certain individuals carrying disease alleles may exhibit
no disease while other individuals may experience varying severity of disease traits [30].
Background effects occur because of interactions between the shared mutation and standing genetic
variation, and therefore background effects can be described as modification of the
genotype-phenotype map [9,31–33]. Some examples of background effects that have been teased apart
are quite complex and involve higher-order relationships among genetic variants across multiple
genes. In some cases, emergent and unexpected phenotypes occurred [32]. Yet, more studies that
identify novel background effects and map the loci responsible are needed to obtain a more general
understanding of genetic background effects and shed light on properties of the genotype-phenotype
connection.
1.4 Genetic background effects are poorly understood at the molecular level
The molecular explanations that underlie the complex genetics of background effects are
not well understood. However, transcriptional changes often accompany genetic background
effects [9,10,14,31–36]. Yet other cases of genetic background effects are thought to be mediated
at the post-translational level, as in the case of perturbation of the heat shock chaperone
Hsp90 [37–40]. Importantly, background effects that have been genetically teased apart have led to
candidate genes and genetic variants which, when confirmed, can clarify the cellular processes
underlying these instances of genotype-phenotype rewiring [9,10,14,32–34]. Therefore, deep genetic
characterization of background effects is needed to clarify the general mechanisms by which
background effects occur. These instances of teased apart background effects can also provide
general insight into molecular mechanisms that underlie the genotype-phenotype relationship.
1.6 Crosses are useful tools for studying the genotype-phenotype connection
Genetic crosses have been useful tools for examining the genotype-phenotype
relationship [7,41,42]. Crosses can produce large numbers of segregants and enable the
mapping of traits to near completion [12]. Different crossing designs can be employed to better
resolve loci and identify causal variants [10,43,44]. Crosses also make it possible to examine
gene-gene interactions and determine their phenotypic effects [11–13,41]. Importantly, crosses can
be used to examine the genotype-phenotype connection in model and non-model systems [45–48].
1.7 Goals of this dissertation
To improve our understanding of the genotype-phenotype relationship, I have sought to
answer these specific questions in my PhD research: 1) What genetic architectures underlie genetic
background effects? 2) How can a common perturbation produce a spectrum of phenotypic
responses across genetically distinct individuals? 3) What molecular mechanisms underlie rewiring
of the genotype-phenotype relationship?
To answer these questions, I have used a series of genetic crosses in the budding yeast
Saccharomyces cerevisiae to study the genotype-phenotype map. In a cross between the lab strain
BY4716 (BY) and medical isolate 322134S (3S), I have generated genetically diverse populations
of BYx3S cross progeny [49,50]. By engineering various genetic perturbations into this cross, I
have examined how individual genetic perturbations can rewire the genotype-phenotype relationship.
This is an efficient way to study this problem because large numbers of individuals from the same
cross can be produced, sequenced, genotyped, and phenotyped quickly and easily. Furthermore,
the well-annotated genome and a wealth of genetic and molecular tools available in budding yeast
make it possible to clone causal genetic variation and nucleotides.
1.8 Chapter summary
In chapter 2, I conducted a screen of chromatin gene knockouts and identified seven new
gene deletions that cause genetic background effects. I then conducted a genetic mapping study
which identified more than 1,000 genetic effects that explain these background effects. A majority
of genetic effects consisted of pairs and trios of genetic variants whose collective effects were
modified by the knockouts. In this work, I demonstrated that gene knockouts can extensively
rewire how genetic variants influence one another and collectively determine phenotype.
In chapter 3, I identified a spontaneous mutation that produced a range of phenotypic
responses from lethal to benign across genetically distinct individuals. Through genetic
engineering and mapping, I identified 18 genetic factors that collectively determined an
individual’s phenotypic response to the mutation. While one large-effect locus initially appeared
to parse phenotypic responses into two qualitative responses, further examination revealed that the
cumulative effects of many other smaller effect variants could overpower this large effect
interaction and produce a quantitative range of phenotypic responses.
In chapter 4, I examine chromatin accessibility and gene expression consequences of a
phenotypically silent gene deletion. To shed light on the mechanisms that make this gene deletion
phenotypically silent, I examined this gene deletion in a highly heterozygous strain. I detected
changes in allelic accessibility and allele-specific expression at thousands of genetic variants
across hundreds of genes, including enrichment of cellular periphery genes. However, few allelic
changes appeared to produce differential expression or occupancy changes that were irrespective
of variation. A distinct set of differentially expressed genes and accessibility changes that were
irrespective of variation implied the deletion altered the behavior of at least seven transcription
factors. Differential expression occurred at genes that affect translation. Together, these data
suggest complex changes in transcriptional regulation of translation and cellular periphery genes
may buffer the effects of this gene deletion.
In chapter 5, I discuss the impact of the findings presented in this dissertation and potential
future directions that would improve our understanding of the genotype-phenotype relationship.
Chapter 2 The complex underpinnings of genetic
background effects
This chapter appears as published in Nature Communications, 2018. 9: 3548
Summary of Contribution
This project began when I was a first year student in the MCB program and was rotating in the
Ehrenreich lab. Martin Mullis, a PhD student already in the lab, was my rotation mentor. Together,
we began the first part of this project which was to examine 10 gene knockouts in the BYx3S
budding yeast cross. Martin, another rotation student, and I worked on these experiments. Martin
functioned in a more supervisory role but did contribute to some experiments. The other rotation
student involved contributed a few experiments. Since this was my rotation project, I organized
our efforts and led the bulk of these early experiments in conjunction with Martin. At this point,
Martin and I were deemed co-leaders of the project. I then proposed expanding our efforts. After
reviewing literature, Martin and I came up with a list of 40 more genes to examine which would
broadly encapsulate diverse functions of chromatin genes. Initial results with data generated during
my rotation showed we were identifying novel background effects. Since this was my third and
final rotation, the project naturally continued as I formally joined the lab. Shortly after this time,
Takeshi Matsui, a senior graduate student in the lab joined the project and conducted experiments
to expedite completion of the screen in order to include these data into an R01 grant application
that summer, approximately 5 months after the project began. After the initial screen was complete,
we then began experiments needed for the linkage mapping study of the 7 gene knockouts of
interest identified from the initial screen. Additional segregant populations were obtained, grown
in 10 diverse environments, and libraries for 1,411 strains were individually prepared for whole
genome sequencing. I led the organization of our collective efforts and recording of experiments.
Martin, Takeshi and I all conducted these experiments. Following receipt of the sequencing data,
we then processed the data with a bioinformatic pipeline to determine the genetic information
carried at ~40,000 variants present in our 1,411 segregant strains. Takeshi had written this
bioinformatic pipeline, and Martin, Takeshi and I all processed a subset of data using this pipeline.
Martin processed 3 populations, while Takeshi and I each processed 2 populations. Next, Martin
and Takeshi led and conducted the linkage mapping study using these data. While I was involved
in conceptual discussions, I did not directly contribute to the linkage mapping study. I did conduct
some analyses, such as hierarchical clustering of gene-by-environment effects during this time,
but those analyses did not make it into the final publication. When it came time to write up our
findings into a manuscript, Takeshi and Martin generated the main text figures and a subset of the
supplemental figures, and I generated a subset of the supplemental figures. I contributed to writing
of methods, particularly sections that related to the experiments that I had conducted. I also
contributed to writing of the main text, along with Martin, Takeshi and Ian. Martin, Takeshi, and
Ian revised the manuscript and handled the peer review process.
2.1 Abstract
Genetic interactions between mutations and standing polymorphisms can cause mutations to show
distinct phenotypic effects in different individuals. To characterize the genetic architecture of these
so-called background effects, we genotype 1411 wild-type and mutant yeast cross progeny and
measure their growth in 10 environments. Using these data, we map 1086 interactions between
segregating loci and 7 different gene knockouts. Each knockout exhibits between 73 and 543
interactions, with 89% of all interactions involving higher-order epistasis between a knockout and
multiple loci. Identified loci interact with as few as one knockout and as many as all seven
knockouts. In mutants, loci interacting with fewer and more knockouts tend to show enhanced and
reduced phenotypic effects, respectively. Cross–environment analysis reveals that most
interactions between the knockouts and segregating loci also involve the environment. These
results illustrate the complicated interactions between mutations, standing polymorphisms, and the
environment that cause background effects.
2.2 Introduction
Background effects occur when the same spontaneous or induced mutations show different
phenotypic effects across genetically distinct individuals [1–3,10,14,32,51]. Countless examples of
background effects have been described across species and traits [1,2], collectively suggesting
that this phenomenon is common in biological systems and plays a significant role in many
phenotypes. For example, alleles that show background effects contribute to a wide range of
hereditary disorders, including, but not limited to, colorectal cancer, hypertension, and
phenylketonuria [52]. Background effects may also impact other disorders that frequently involve
de novo mutations, such as autism [53], congenital heart disease [54], and schizophrenia [55].
Additionally, it has been proposed that background effects can shape the potential trajectories of
evolutionary adaptation [56,57], influence the emergence of novel traits [10], and help maintain
deleterious genetic variation within populations [58].
Despite the importance of background effects to biology and medicine, understanding of
their causal genetic mechanisms remains limited. Although superficially background effects are
known to arise due to genetic interactions (or “epistasis”) between mutations and standing
polymorphisms [7,41,59–62], only recently have studies begun to provide deeper insights into the
architecture of epistasis underlying background effects. These papers indicate that background
effects often involve multiple polymorphisms that interact not only with a mutation but also with
each other [10,14,32,37–39,51,63–65] and the environment [14]. This work also suggests that
background effects are caused by a mixture of loci that show enhanced and reduced phenotypic
effects in mutants relative to wild-type individuals [1,10,14,31,32,35,37–41,66–68]. Together,
these previous reports imply that the phenotypic effect of a mutation in a given genetic background
can depend on an individual’s genotype at a potentially large number of loci that interact in
complicated, highly contextual ways. However, this point is difficult to explicitly show because
doing so requires systematically mapping the interactions between mutations, polymorphisms, and
environment that give rise to background effects.
In this paper, we perform a detailed genetic characterization of a number of background
effects across multiple environments. Previous work in yeast, as well as other model species, has
established that mutations in chromatin regulation and transcription often show background
effects [10,14,31,32,34,35,63]. We extend this past work by knocking out seven different chromatin
regulators in a cross of the BY4716 (BY) and 322134S (3S) strains of Saccharomyces cerevisiae.
We generate and genotype 1411 wild-type and knockout segregants, measure the growth of these
individuals in 10 environments, and perform linkage mapping with these data. In total, we identify
1086 interactions between the knockouts and segregating loci. These interactions allow us to
obtain novel, detailed insights into the genetic architecture of background effects across different
mutations and environments.
2.3 Results
2.3.1 Preliminary screen.
When a mutation that exhibits background effects is introduced into a population, the
phenotypic variance among individuals will often change
37,38,40,66
. Here, we attempted to identify
mutations that induce such changes in phenotypic variance. Specifically, we screened 47 complete
gene knockouts of histones, histone-modifying enzymes, chromatin remodelers, and other
chromatin-associated genes for impacts on phenotypic variance in segregants from a cross of the
BY and 3S strains of budding yeast (Fig. S2.1 and S2.2; Table S2.1; Methods). To do this, we
generated BY/3S diploid hemizygotes, sporulated these hemizygotes to obtain haploid knockout
segregants, and then quantitatively phenotyped these BY×3S knockout segregants for growth on
21
rich medium containing ethanol, an environment in which we previously found background effects
that influence yeast colony morphology
10,14,29,32
. For each panel of segregants, three biological
replicate end-point colony growth assays were performed and averaged. We then tested whether
the knockout segregants exhibited significantly higher phenotypic variance than wild-type
segregants using Levene’s test (Table S2.2). This analysis implicated CTK1 (a kinase that
regulates RNA polymerase II), ESA1 and GCN5 (two histone acetyltransferases), HOS3 and
RPD3 (two histone deacetylases), HTB1 (a copy of histone H2B), and INO80 (a chromatin
remodeler) as knockouts that show background effects in the BY×3S cross (Fig. S2.1;
Supplementary Note 2.1).
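The variance comparison used in the screen can be illustrated with a minimal sketch of Levene's W statistic. This is not the analysis code used in this work: the function name and example values are hypothetical, deviations are taken from group means (the classic form of the test), and the p value, which would come from an F distribution with (k − 1, N − k) degrees of freedom, is left to a statistics package.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic using absolute deviations from each group mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical growth values for a wild-type and a knockout panel.
wild_type = [1.0, 2.0, 3.0]
knockout = [0.0, 12.0, 24.0]
w = levene_w(wild_type, knockout)
```

A larger W indicates a larger difference in spread between the panels; two panels with identical spread but different means give W = 0.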
2.3.2 Mapping of mutation-independent and mutation-responsive effects.
To map loci that interact with the seven knockouts identified in the screen, we genotyped
1411 segregants in total. This included 164 wild-type, 210 ctk1Δ, 122 esa1Δ, 215 gcn5Δ, 220
hos3Δ, 177 htb1Δ, 141 ino80Δ, and 162 rpd3Δ segregants (Fig. S2.3; Tables S2.3 through S2.5;
Methods). These genotyped segregants were phenotyped for growth in 10 diverse environments
using replicated end-point colony growth assays (Fig. S2.4; Table S2.6; Methods). We note that,
despite causing increased phenotypic variance in ethanol, the knockouts induced a broad range of
phenotypic responses in other environments (Fig. S2.4).
As described in detail in Methods, genome-wide linkage mapping scans were conducted
within each environment (Data S2.1 and S2.2). To maximize statistical power, we analyzed the
1411 segregants jointly using a fixed-effects linear model that accounted for genetic background.
We identified individual loci, as well as two- and three-way genetic interactions among loci, that
exhibited the same phenotypic effect across the wild-type and knockout backgrounds (hereafter
“mutation-independent” effects). We also conducted scans for individual loci, as well as two- and
three-way genetic interactions among loci, that exhibited different phenotypic effects in at least
one knockout background relative to the wild-type background (hereafter “mutation-responsive”
effects). Post hoc tests were used to associate mutation-responsive effects with specific knockouts.
Mutation-responsive one-, two-, and three-locus effects can alternatively be viewed as two-, three-,
and four-way interactions where one of the involved genetic factors is a knockout. However, to
avoid confusion throughout the paper, we do not count the knockouts as genetic factors. Instead,
we classify each genetic effect as mutation-independent or mutation-responsive and report how many loci
it involves. Representative examples of mutation-responsive effects are shown in Fig. 2.1.
Figure 2.1 Examples of mutation-responsive genetic effects.
a shows representative examples of one-, two-, and three-locus mutation-responsive effects with
larger phenotypic effects in wild-type segregants than mutants. In contrast, b shows representative
examples of one-, two-, and three-locus mutation-responsive effects with larger phenotypic effects
in mutants than wild-type segregants. Means depicted along the y axis show residuals from a fixed-
effects linear model that includes the mutation-independent effect of each involved locus, as well
as any possible lower-order mutation-independent and mutation-responsive effects. The different
genotype classes are plotted below the x axis. Blue and orange boxes correspond to the BY and 3S
alleles of a locus, respectively. Error bars represent one standard deviation from the mean.
In total, we detected 1211 genetic effects across the 10 environments (Figs. S2.5 and S2.6;
Data S2.3; Tables S2.7 through S2.10; Supplementary Notes 2.2 and 2.3). One hundred and twenty-five
(10%) of these genetic effects were mutation-independent while 1086 (90%) were mutation-
responsive (Fig. 2.2a). On average, we identified 121 genetic effects per environment, 109 of
which were mutation-responsive. However, the number of detected genetic effects varied
significantly across the environments, from 15 in room temperature to 359 in ethanol. Despite this
variability, in every environment, ≥47% of the identified genetic effects were mutation-responsive.
This suggests that, regardless of environment, most genetic effects in the cross were responsive to
the knockouts. Additionally, the seven knockouts exhibited major differences in their numbers of
mutation-responsive effects. Between 73 and 118 mutation-responsive effects were found for the
CTK1, ESA1, GCN5, HTB1, INO80, and RPD3 knockouts (Fig. 2.2b). In contrast, the HOS3
knockout had 543 mutation-responsive effects (Fig. 2.2b).
Figure 2.2 Most mutation-responsive genetic effects involve multiple loci.
In a, the numbers of mutation-independent and mutation-responsive genetic effects detected in
each environment are shown. In b, the aggregate numbers of mutation-responsive effects found
for each knockout across the 10 environments are provided.
2.3.3 Higher-order epistasis influences response to mutations.
While only 29% (36 of 125) of the mutation-independent effects involved multiple loci,
this proportion was more than tripled (89%; 965 of 1086) among the mutation-responsive effects
(Fig. 2.2a). Simulations indicate that our statistical power to detect mutation-responsive loci was
appreciably higher for single locus effects than for multiple locus effects, suggesting that our
results may underestimate the importance of higher-order epistasis to background effects (Fig.
S2.7). To better assess how loci involved in the identified higher-order interactions contribute to
background effects, we partitioned the individual and joint contributions of involved loci to
mutation-responsive phenotypic variance (Data S2.4 and S2.5; Methods). For mutation-responsive
two-locus effects, on average, 78% of the mutation-responsive phenotypic variance was attributed
to the higher-order interaction between the knockout and both loci (Fig. 2.3a). Likewise, among
mutation-responsive three-locus effects, on average, 58% of the mutation-responsive phenotypic
variance was explained by the higher-order interaction of the knockout and the three loci (Fig.
2.3b). Thus, most mutation-responsive effects involve multiple loci that contribute to background
effects predominantly through their higher-order interactions with each other and a mutation,
rather than through their individual interactions with a mutation.
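The partitioning described above can be sketched as follows. This is an illustration only, under the assumption that the relative contribution of a term is its ANOVA sum of squares divided by the summed sums of squares of all mutation-responsive terms being compared. The function name, term labels, and numbers are hypothetical.

```python
def relative_pve(sum_of_squares):
    """Map each term to its share of the summed sums of squares."""
    total = sum(sum_of_squares.values())
    return {term: ss / total for term, ss in sum_of_squares.items()}

# Hypothetical ANOVA sums of squares for one mutation-responsive two-locus effect.
example = {"KO:L1": 1.0, "KO:L2": 1.0, "KO:L1:L2": 6.0}
shares = relative_pve(example)
```

In this made-up example, the higher-order KO × L1 × L2 term accounts for 0.75 of the mutation-responsive variance, and each KO × locus term accounts for 0.125.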
Figure 2.3 Higher-order epistasis among knockouts and multiple loci is an important contributor
to background effects.
In a, for each mutation-responsive two-locus effect, we partitioned the individual and joint
contributions of the two loci. L1 and L2 refer to the involved loci, while KO denotes the relevant
knockout. We determined the relative phenotypic variance explained (PVE) by interactions
between a knockout and each individual locus (i.e., KO × L1 and KO × L2) and higher-order
epistasis involving a knockout and the two loci (i.e., KO × L1 × L2). Similarly, in b, for each
mutation-responsive three-locus effect, we determined the relative PVE for all possible mutation-
responsive one-, two-, and three-locus effects involving the participating loci. In both a and b,
relative PVE values were calculated using sum of squares obtained from ANOVA tables, as
described in Methods. Mutation-responsive effects that interact with multiple knockouts are shown
multiple times, once for each relevant knockout.
2.3.4 Environment plays a strong role in background effects.
The role of the environment in background effects has yet to be fully characterized.
Although our group previously showed that the genetic architecture of background effects can
significantly change across environments [14], this past work focused on only a modest number of
segregating loci and environments. To more generally assess how the environment influences the
genetic architecture of background effects, we determined whether the 1086 mutation-responsive
effects impacted phenotype in environments outside the ones in which they were originally
detected. This analysis was performed using statistical thresholds that were more liberal than those
employed in our initial genetic mapping (Methods). In all, 29% (311) of the mutation-responsive
effects were detectable in additional environments, with this proportion varying between 7% and
65% across the 10 environments (Fig. 2.4). Of these mutation-responsive effects, 64% (200) were
identified in only one additional environment, 28% (85) were found in two additional
environments, and just 8% (26) were detected in ≥3 additional environments. Given the limited resolution of
the data, it is possible that some of the mutation-responsive effects that were detected in multiple
environments in fact represent distinct, closely linked loci that act in different environments. Such
linkage would lead us to overestimate how often mutation-responsive effects contribute to
background effects in different environments, further suggesting that most mutation-responsive
effects act in a limited number of environments. These findings support the idea that background
effects are caused by complex interactions between not only mutations and polymorphisms but
also the environment.
Figure 2.4 Analysis of mutation-responsive effects across environments.
The height of each stacked bar indicates the number of mutation-responsive effects that were
detected in a given environment. The bars are color-coded according to the number of additional
environments in which these mutation-responsive effects could be detected when liberal statistical
thresholds were employed (Methods).
2.3.5 Interactions between loci and different knockouts.
We next looked at how the same mutation-responsive effects interact with different
knockouts. Based on involvement of the same loci, the 1086 mutation-responsive effects were
collapsed into 594 distinct mutation-responsive effects that showed epistasis with at least one
knockout (Methods). In all, 65% of these mutation-responsive effects were found in only one
knockout background, while 35% were identified in ≥2 knockout backgrounds (Fig. S2.8). Also,
97% of the mutation-responsive effects that interacted with only one knockout were HOS3-
responsive and these effects represented 69% of the total interactions detected in a hos3Δ
background (Fig. 2.5a). In contrast, nearly all (between 95% and 100%) of the CTK1-, ESA1-,
GCN5-, HTB1-, INO80-, and RPD3-responsive effects were detected in multiple backgrounds
(Fig. 2.5a). Although the mutation-responsive effects exhibited a broad, continuous range of
responses to the knockouts (Fig. 2.5b), they could be partitioned into two qualitative classes—
enhanced and reduced—based on whether they explained more or less phenotypic variance in
mutants than in wild-type segregants, respectively. The distinct mutation-responsive effects
exhibited a strong relationship between their number of interacting knockouts and how they were
classified. Mutation-responsive effects that interacted with fewer than three knockouts
predominantly were in the enhanced class, while mutation-responsive effects that interacted with
four or more knockouts typically were in the reduced class (χ² = 709.37, d.f. = 6, p = 5.81 ×
10⁻¹⁵⁰; Fig. 2.5c; Data S2.6). These results illustrate how background effects are caused by a
mixture of loci that respond specifically to mutations in particular genes and loci that respond more
generically to mutations in different genes, with the relative contribution of these two classes of
loci varying significantly across mutations. Our findings also suggest that how loci respond to
mutations in a particular gene is related to the degree to which they interact with mutations in other
genes.
Figure 2.5 Analysis of mutation-responsive effects across knockout backgrounds.
In a, the number of mutation-responsive effects that interacted with only one knockout (pink) or
interacted with multiple knockouts (blue) are shown for each knockout. In b, the phenotypic
variance explained (PVE) for each mutation-responsive effect is shown in the relevant knockout
(KO) segregants, as well as in the wild-type (WT) segregants. The PVE for each mutation-
responsive effect was determined using fixed-effects linear models fit within each individual
background (Methods). Mutation-responsive effects are color-coded by the knockout population
in which they were identified. In c, the percentage of mutation-responsive effects that showed
larger phenotypic effects in mutants than in wild-type segregants (y axis, left side) and mutation-
responsive effects that showed larger phenotypic effects in wild-type segregants than in mutants
(y axis, right side) is depicted. These values are plotted as a function of the number of knockouts
that interact with a given mutation-responsive effect. Error bars represent 95% bootstrap
confidence intervals (Methods).
2.3.6 Genetic basis of induced changes in phenotypic variance.
Lastly, we looked at the extent to which the identified mutation-responsive effects in
aggregate related to differences in phenotypic variance between the knockout and wild-type
versions of the BY×3S cross across the 10 environments (Table S2.11). This was important
because many studies (e.g., refs. 31,37–40,69) have described how perturbing certain genes can alter
the phenotypic variance within a population, but the genetic underpinnings of this phenomenon
have not been fully determined. Among the 70 different combinations of the 7 knockouts and 10
environments, we found a highly significant relationship between differences in the numbers of
mutation-responsive effects with reduced and enhanced phenotypic effects and knockout-induced
changes in phenotypic variance (Fig. 2.6; Spearman’s ρ = 0.84, p = 4.33 × 10⁻²⁰). No such
relationship was seen when we looked at the mean phenotypic changes induced by the mutations
(Fig. S2.9). To control for potential biases in our analyses that might arise from allele frequency
differences among the backgrounds (Fig. S2.3), we performed the same analysis on each knockout
background individually using data from the 10 environments (Table S2.11). When we did this,
we found that all seven knockout backgrounds exhibited nominally significant correlations
between observed changes in phenotypic variance and detected mutation-responsive effects
(Spearman’s ρ > 0.71, p < 0.02; Fig. S2.10). Permutations indicate that the probability of observing
this result by chance is low (p < 10⁻⁵). Thus, these findings are consistent with the knockout-
induced changes in phenotypic variance resulting from a large number of epistatic loci with small
phenotypic effects (Fig. S2.11; Table S2.11). In summary, our results not only provide valuable
insights into the genetic architecture of background effects but also illustrate how interactions
between mutations, segregating loci, and the environment can influence a population’s phenotypic
variance.
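The rank correlation reported in this section can be illustrated with a minimal sketch of Spearman's ρ, computed as the Pearson correlation of average ranks. The function names and the example values are hypothetical and are not taken from our data; in the example, x stands for a knockout-induced change in phenotypic variance and y for a difference in counts of mutation-responsive effects.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Pearson correlation of the average ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical values for four knockout-by-environment combinations.
variance_change = [-0.2, 0.1, 0.5, 0.9]
effect_count_difference = [-3, 1, 4, 20]
rho = spearman_rho(variance_change, effect_count_difference)
```

Because the statistic depends only on ranks, any monotonic relationship between the two quantities gives |ρ| = 1, regardless of its shape.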
Figure 2.6 Mutation-responsive effects underlie differences in phenotypic variance between
knockout and wild-type backgrounds across environments.
Each point’s position on the x axis represents the difference in phenotypic variance between a
knockout background of the cross (VP.Mut) and the wild-type background of the cross (VP.WT) in a
single environment. The y axis shows the difference in the number of mutation-responsive effects
with enhanced and reduced phenotypic effects. In this paper, we classified mutation-responsive
effects as enhanced or reduced based on whether they explained more or less phenotypic variance
in mutants relative to wild-type segregants, respectively. Spearman’s ρ and its associated p value
are provided on the plot. Colors denote different knockout backgrounds.
2.4 Discussion
Most prior studies of background effects have described specific examples without
identifying the contributing loci. Here we used a screen of 47 different chromatin regulators to
identify 7 knockout mutations that exhibit strong background effects in a yeast cross. We then
generated and phenotyped a panel of 1411 mutant and wild-type segregants. Using these data, we
detected 1086 genetic interactions that involve the 7 knockouts and loci that segregate in the cross.
To better understand the genetic architecture of background effects, we comprehensively
examined how the identified loci interact not only with the knockouts but also with each other and
the environment. Our results confirm important points about the genetic architecture of
background effects that to date have been suggested but not conclusively proven. Namely,
background effects can be highly polygenic, with many, if not most, loci contributing through
higher-order genetic interactions that involve a mutation and multiple loci. These loci can respond
to mutations in different ways, such as by exhibiting enhanced and reduced phenotypic effects in
mutants relative to wild-type individuals. Moreover, most of these interactions between mutations
and segregating loci also involve the environment. Altogether, these findings shed light on the
complex genetic and genotype–environment interactions that give rise to background effects.
Our work also illustrates how the genetic architecture of background effects varies
significantly across different mutated genes. In our study, response to six of the seven knockouts
was mediated almost exclusively by loci that respond to mutations in different genes and
predominantly exhibit reduced effects in mutants relative to wild-type segregants. Given that some
of the examined chromatin regulators have counteracting or unrelated biochemical activities [70,71],
we propose that loci detected in multiple knockout backgrounds respond generically to
perturbations of cell state or fitness, rather than to any specific biochemical process. In contrast,
response to HOS3 knockout was largely mediated by loci that were not detected when the other
genes were compromised. Why so many loci responded specifically to perturbation of HOS3 is
difficult to infer from current understanding of Hos3’s biochemical activities. Although Hos3 can
deacetylate all four of the core histones [72] and influence chromatin regulation in certain genomic
regions [73], it also plays roles in cell cycle [74] and nuclear pore regulation [75]. Thus, further work is
needed to characterize HOS3 and its extensive epistasis with polymorphisms in the BY×3S cross.
In addition to advancing understanding of background effects, our results may also have
more general implications for the genetic architecture of complex traits. Many phenotypes,
including common disorders like autism [53] and schizophrenia [55], are influenced by loss-of-function
mutations that occur de novo or persist within populations at low frequencies. We have shown that
these mutations can significantly change the phenotypic effects of many polymorphisms within a
population by altering how these polymorphisms interact with each other and the environment.
Although these complicated interactions between mutations, standing polymorphisms, and the
environment are often ignored in genetics research, our study suggests that they in fact play a
major role in determining the relationship between genotype and phenotype.
2.5 Methods
2.5.1 Generation of different BY×3S knockout backgrounds
All BY×3S segregants described in this paper were generated using the synthetic genetic
array marker system, which makes it possible to obtain MATa haploids by digesting tetrads and
selecting for spores on minimal medium lacking histidine and containing canavanine [21] (Fig. S2.1).
We first constructed a BY/3S diploid by mating a BY MATa can1Δ::STE2pr-SpHIS5 his3∆ strain
to a 3S MATα ho::HphMX his3∆::NatMX strain. This diploid served as the progenitor for the wild-
type segregants. Hemizygous complete gene deletions were engineered into this wild-type BY/3S
diploid to produce the progenitors of the knockout segregants. Genes were deleted using
transformation with PCR products that comprised (in the following order) 60 bp of
genomic sequence immediately upstream of the targeted gene, KanMX, and 60 bp of genomic
sequence immediately downstream of the targeted gene. Lithium acetate transformation was
employed [76]. To obtain a given knockout, transformants were selected on rich medium containing
G418, ClonNAT, and Hygromycin B, and PCR was then used to check transformants for correct
integration of the KanMX cassette. These PCRs were conducted with primer pairs where one
primer was located within KanMX and the other primer was located adjacent to the expected site
of integration. PCR products were Sanger sequenced. Primers used in these checks are reported in
Table S2.18. Wild-type and hemizygous knockout diploids were sporulated using standard
techniques. Low-density random spore plating (around 100 colonies per plate) was then used to
obtain haploid BY×3S segregants from each wild-type and knockout background of the cross.
Wild-type segregants were isolated directly from MATa selection plates, while knockout
segregants were first replica plated from MATa selection plates onto G418 plates, which selected
for the gene deletions.
2.5.2 Genotyping of segregants
Segregants were genotyped using low-coverage whole-genome sequencing. A sequencing
library was prepared for each segregant using the Illumina Nextera Kit and custom barcoded
adapters. Libraries from different segregants were pooled in equimolar fractions and these
multiplex pools were size selected using the Qiagen Gel Extraction Kit. Multiplexed samples were
sequenced by BGI on an Illumina HiSeq 2500 using 100 bp × 100 bp paired-end reads. For each
segregant, reads were mapped against the S288c genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from https://www.yeastgenome.org) using
BWA version 0.7.7-r44 [77]. Pileup files were then produced with SAMTOOLS version 0.1.19-44428cd [78].
BWA and SAMTOOLS were run with their default settings. Base calls and coverages
were obtained from the pileup files for 36,756 previously identified high-confidence single-
nucleotide polymorphisms (SNPs) that segregate in the cross [10]. Individuals that showed evidence
of being aneuploid, diploid, or contaminated based on unusual patterns of coverage or
heterozygosity were excluded from further analysis. We also used the data to confirm the presence
of KanMX at the gene that had been knocked out. Individuals with an average per site coverage
<1.5× were removed from the dataset. A vector containing the fraction of 3S calls at each SNP
was generated and used to make initial genotype calls with sites above and below 0.5 classified as
3S and BY, respectively. This vector of initial genotype calls was then corrected with a Hidden
Markov Model (HMM), implemented using the HMM package version 1.0 in R [79]. We used the
following transition and emission probability matrices:
transProbs = matrix(c(.9999,.0001,.0001,.9999),2) and
emissionProbs = matrix(c(0.25,0.75,0.75,0.25),2). We examined the HMM-corrected genotype
calls for adjacent SNPs that lacked recombination in the segregants. In such instances, a single
SNP was chosen to serve as the representative for the entire set of adjacent SNPs that lacked
recombination. This reduced the number of markers used in subsequent analyses from 36,756 to
8311.
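The genotype-calling step can be sketched as follows. This is a pure-Python Viterbi decoder written for illustration, not the HMM R package used in this work. The transition probabilities follow the matrix given above. Three things are assumptions, because the ordering of states and symbols is not stated here: that the 0.75 emission entry is the probability that an observed call matches the true allele, that the two start probabilities are equal, and that sites with a 3S fraction of exactly 0.5 are called BY. All function names are hypothetical.

```python
import math

STATES = ("BY", "3S")
# Transition probabilities from the text: stay with 0.9999, switch with 0.0001.
TRANS = {s: {t: (0.9999 if s == t else 0.0001) for t in STATES} for s in STATES}
# Assumption: an observed call matches the true allele with probability 0.75.
EMIT = {s: {o: (0.75 if s == o else 0.25) for o in STATES} for s in STATES}

def initial_calls(fraction_3s):
    """Call 3S where the fraction of 3S reads exceeds 0.5, otherwise BY."""
    return ["3S" if f > 0.5 else "BY" for f in fraction_3s]

def viterbi(observed):
    """Most probable state path for one chromosome's ordered SNP calls."""
    # Assumption: equal start probabilities for the two states.
    score = {s: math.log(0.5) + math.log(EMIT[s][observed[0]]) for s in STATES}
    back = []
    for obs in observed[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            # Best predecessor state for arriving in state s at this SNP.
            best = max(STATES, key=lambda p: prev[p] + math.log(TRANS[p][s]))
            pointers[s] = best
            score[s] = prev[best] + math.log(TRANS[best][s]) + math.log(EMIT[s][obs])
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# A single discordant call inside a BY block is treated as an error.
noisy = ["BY"] * 5 + ["3S"] + ["BY"] * 5
corrected = viterbi(noisy)
```

With these parameters a switch between parental haplotypes is only accepted when supported by a long run of concordant calls, which is why isolated discordant calls are overwritten while extended 3S or BY blocks are retained.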
2.5.3 Phenotyping of segregants
Prior to phenotyping, segregants were always inoculated from freezer stocks into YPD
broth containing 1% yeast extract (BD Product #: 212750), 2% peptone (BD 211677), and 2%
dextrose (BD Product #: 15530). After these cultures had reached stationary phase, they were
pinned onto and outgrown on plates containing 2% agar (BD 214050). Unless specified, these
plates were made with YPD and incubated at 30 °C for 2 days. However, some of the environments
required adding a chemical compound to the YPD plates or changing the temperature or carbon
source. In addition to YPD at 30 °C, we measured growth in the following environments: 21 °C,
42 °C, 2% ethanol (Koptec A06141602W), 250 ng/mL 4-nitroquinoline 1-oxide (“4NQO”) (TCI
N0250), 9 mM copper sulfate (Sigma 209198), 50 mg/mL fluconazole (TCI F0677), 260 mM
hydrogen peroxide (EMD Millipore HX0640-5), 7 mg/mL neomycin sulfate (Gibco 21810-031),
and 5 mg/mL zeocin (Invivogen ant-zn-1). For 4NQO, copper sulfate, fluconazole, hydrogen
peroxide, neomycin sulfate, and zeocin, the doses used for phenotyping were chosen based on
preliminary experiments across a broader range of concentrations (Table S2.6). Growth assays
were conducted in triplicate using a randomized block design to account for positional effects on
the plates. Four BY controls were included on each plate. Plates were imaged using the Bio-Rad
Gel Doc XR+ Molecular Imager. Each image was 11.4 × 8.52 cm² (width × length) and was captured
under white epi-illumination with an exposure time of 0.5 s. Images were exported as TIFF files
with a resolution of 600 dpi. As in ref. 80, image analysis was conducted in the ImageJ software,
with pixel intensity for each colony calculated using the Plate Analysis JRU v1 plugin
(http://research.stowers.org/imagejplugins/index.html). The growth of each segregant on each
plate was computed by dividing the segregant’s total pixel intensity by the mean pixel intensity of
the BY controls from the same plate. The replicates for a segregant within an
environment were then averaged and used as that individual’s phenotype in subsequent analyses.
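The normalization described above amounts to the following calculation, shown as a minimal sketch. The function names and pixel intensities are hypothetical; only the arithmetic (division by the plate's mean BY control intensity, then averaging across replicates) follows the text.

```python
from statistics import mean

def normalized_growth(segregant_intensity, by_control_intensities):
    """Growth on one plate relative to the mean of that plate's BY controls."""
    return segregant_intensity / mean(by_control_intensities)

def phenotype(replicates):
    """Average normalized growth over replicate plates.

    `replicates` is a list of (segregant_intensity, by_control_intensities) pairs,
    one pair per replicate plate.
    """
    return mean([normalized_growth(seg, ctrl) for seg, ctrl in replicates])

# Hypothetical intensities for one segregant on three replicate plates,
# each with four BY controls.
example_replicates = [
    (50.0, [100.0, 100.0, 100.0, 100.0]),
    (60.0, [100.0, 120.0, 80.0, 100.0]),
    (70.0, [100.0, 100.0, 100.0, 100.0]),
]
value = phenotype(example_replicates)
```

Normalizing within each plate before averaging means that plate-to-plate differences in overall growth or imaging do not carry into the final phenotype.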
2.5.4 Scans for one-locus effects
All genetic mapping was conducted within each environment using fixed-effects linear
models applied to the complete set of 1411 wild-type and knockout segregants. To ensure that
mean differences in growth among the eight backgrounds were always controlled for during
mapping, we included a background term in our models. Throughout the paper, we refer to loci or
combinations of loci that statistically interact or do not statistically interact with
the background term as mutation-responsive and mutation-independent, respectively. Genetic
mapping was performed in R using the lm() function, with the p values for relevant terms obtained
from tables generated using the summary() function.
We first identified individual loci that show mutation-independent or mutation-responsive
phenotypic effects using forward regression. To detect mutation-independent loci, genome-wide
scans were conducted with the model phenotype ~ background + locus + error. Significant loci
identified in this first iteration were then used as covariates in the next iteration,
i.e., phenotype ~ background + known_locus1 … known_locusN + locus + error, where
the known_locus terms corresponded to each of the loci that had already been identified in a given
environment. To determine significance, 1000 permutations were conducted at each iteration of
the forward regression, with the correspondence between genotypes and phenotypes randomly
shuffled each time. Among the minimum p values obtained in the permutations, the fifth percentile
was identified and used as the threshold for determining significant loci. This process was iterated
until no additional loci could be detected for each environment.
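The first iteration of this scan can be sketched in pure Python as follows. This is an illustration under stated assumptions rather than the lm()-based code used in this work. Centering the phenotype and the genotype within each background absorbs the background term, so the squared correlation of the centered values is the squared partial correlation of the locus given background. The sketch uses the maximum of this statistic in place of the minimum p value; when no genotypes are missing, the two rank loci identically, so the 95th percentile of permutation maxima plays the role of the fifth percentile of minimum p values. How the permutations treat the background labels is not specified above; here genotype rows are shuffled relative to (phenotype, background) pairs. Later iterations, which add known loci as covariates, are not shown. All names are hypothetical.

```python
import random

def center_within(values, groups):
    """Subtract each background's mean, which absorbs the background term."""
    totals = {}
    for v, g in zip(values, groups):
        total = totals.setdefault(g, [0.0, 0])
        total[0] += v
        total[1] += 1
    return [v - totals[g][0] / totals[g][1] for v, g in zip(values, groups)]

def partial_r2(pheno, locus, background):
    """Squared partial correlation of a locus with phenotype given background."""
    y = center_within(pheno, background)
    x = center_within(locus, background)
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    sxy = sum(a * b for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def best_locus(pheno, genotypes, background):
    """Return (statistic, index) for the strongest locus in a genome scan."""
    return max((partial_r2(pheno, g, background), i) for i, g in enumerate(genotypes))

def permutation_threshold(pheno, genotypes, background, n_perm=1000, alpha=0.05, seed=0):
    """Approximate (1 - alpha) quantile of the per-permutation maximum statistic."""
    rng = random.Random(seed)
    order = list(range(len(pheno)))
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(order)
        # Assumption: shuffle genotype rows relative to (phenotype, background) pairs.
        shuffled = [[g[i] for i in order] for g in genotypes]
        maxima.append(best_locus(pheno, shuffled, background)[0])
    maxima.sort()
    return maxima[min(len(maxima) - 1, int((1 - alpha) * len(maxima)))]
```

Here `genotypes` is a list of loci, each a list of 0/1 allele codes with one entry per segregant, and `background` holds one label per segregant. A locus would be declared significant when its statistic exceeds the permutation threshold.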
To identify mutation-responsive one-locus effects, we employed the same procedure
described in the preceding paragraph, except the
model phenotype ~ background + locus + background:locus + error was used. Here the
significance of the background:locus interaction term was tested, again with significance
determined by permutations as described in the preceding paragraph. The locus term was included
in the model to ensure that phenotypic variance explained by mutation-independent effects did not
load onto the mutation-responsive effects. For each locus with significant background:locus terms,
we included both an additive and background interaction term in the subsequent iterations of the
forward regression, i.e.,
phenotype ~ background + known_locus1 … known_locusN + locus + background:known_locus1 … background:known_locusN + background:locus + error. The known_locus terms were
included in these forward regression models to ensure that variance due to the mutation-
independent effects of previously identified loci was not inadvertently attributed to the mutation-
responsive terms for these loci. This process was iterated until no
additional background:locus terms were discovered for each environment.
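The test of the background:locus term can be illustrated for a single locus without covariates. The sketch below is a pure-Python illustration, not the lm()-based code used in this work. It compares the residual sum of squares of phenotype ~ background + locus with that of the model that adds background:locus. It assumes a haploid biallelic locus coded 0/1 with both alleles present in every background, so that the full model reduces to one mean per background-by-allele cell. Only the F statistic is computed; the p value would come from an F distribution with (k − 1, n − 2k) degrees of freedom, where k is the number of backgrounds. Function names and data are hypothetical.

```python
from collections import defaultdict

def _center(values, keys):
    """Subtract the mean of each key's group from its members."""
    cells = defaultdict(list)
    for v, k in zip(values, keys):
        cells[k].append(v)
    means = {k: sum(c) / len(c) for k, c in cells.items()}
    return [v - means[k] for v, k in zip(values, keys)]

def interaction_f(pheno, locus, background):
    """F statistic for background:locus given background + locus."""
    n, k = len(pheno), len(set(background))
    # Reduced model: background-specific intercepts plus one shared locus slope.
    y = _center(pheno, background)
    x = _center(locus, background)
    sxx = sum(a * a for a in x)
    sxy = sum(a * b for a, b in zip(x, y))
    rss_reduced = sum(b * b for b in y) - sxy * sxy / sxx
    # Full model: one mean per (background, allele) cell.
    cells = list(zip(background, locus))
    rss_full = sum(r * r for r in _center(pheno, cells))
    df_interaction = k - 1
    df_residual = n - 2 * k
    return ((rss_reduced - rss_full) / df_interaction) / (rss_full / df_residual)

# Hypothetical data: the locus effect is 1.0 in wild type but 3.0 in the knockout.
background = ["WT"] * 4 + ["KO"] * 4
locus = [0, 0, 1, 1, 0, 0, 1, 1]
pheno = [1.0, 1.2, 2.0, 2.2, 0.5, 0.7, 3.5, 3.7]
f_stat = interaction_f(pheno, locus, background)
```

When the locus has the same effect in every background, the two residual sums of squares coincide and the statistic is zero, which corresponds to a mutation-independent effect.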
2.5.5 Scans for two-locus effects
We also performed full-genome scans for two-locus effects. Here every unique pair of loci
was interrogated using fixed-effects linear models like those described in the preceding section.
As with the one-locus effects, we employed two models in parallel. The
model phenotype ~ background + locus1 + locus2 + locus1:locus2 + error was used to identify
mutation-independent two-locus effects, whereas the
model phenotype ~ background + locus1 + locus2 + background:locus1 + background:locus2 + locus1:locus2 + background:locus1:locus2 + error was employed to detect mutation-responsive
two-locus effects. Specifically, we tested for significance of
the locus1:locus2 and background:locus1:locus2 interaction terms with the former and latter
models, respectively. Simpler terms were included in each model to ensure that variance was not
erroneously attributed to more complex terms. Significance thresholds for these terms were
determined using 1000 permutations with the correspondence between genotypes and phenotypes
randomly shuffled each time. However, to reduce computational run time, 10,000 random pairs of
loci, rather than all possible pairs of loci, were examined in each permutation. Significance
thresholds were again established based on the fifth percentile of minimum p values observed across
the permutations. To ensure that our main findings were robust to threshold, we also generated
results at false discovery rates (FDRs) of 0.01, 0.05, and 0.1 by comparing the rate of discoveries
at a given p value in the permutations to the rate of discoveries at that same p value in our results
(Table S2.10; Supplementary Note 2.2).
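The FDR comparison can be sketched as follows, under the assumption that the "rate of discoveries" is the fraction of tests with a p value at or below a given cutoff, and that p values from the permutations are pooled. Expressing both as fractions allows for the fact that the permutations examined a random subset of locus pairs while the real scan examined all pairs. The function name and the example p values are hypothetical.

```python
def empirical_fdr(p_cutoff, observed_pvalues, permuted_pvalues):
    """Ratio of the permutation discovery rate to the observed discovery rate."""
    observed_rate = sum(p <= p_cutoff for p in observed_pvalues) / len(observed_pvalues)
    if observed_rate == 0:
        return float("nan")  # no discoveries at this cutoff, so the FDR is undefined
    permuted_rate = sum(p <= p_cutoff for p in permuted_pvalues) / len(permuted_pvalues)
    return min(1.0, permuted_rate / observed_rate)

# Hypothetical p values: 10% of real tests but only 1% of permuted tests pass the cutoff.
observed = [0.001] * 10 + [0.5] * 90
permuted = [0.001] * 1 + [0.5] * 99
fdr = empirical_fdr(0.01, observed, permuted)
```

In this made-up example the estimated FDR at a cutoff of 0.01 is 0.1; in practice the cutoff would be varied to find the p value that corresponds to a target FDR.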
2.5.6 Scans for three-locus genetic effects
Owing to computational limitations, we were unable to run a comprehensive scan for
mutation-independent and mutation-responsive three-locus effects. Instead, we scanned for three-
locus effects involving two loci that had already been identified in a given environment
(known_locus1 and known_locus2) and a third locus that had yet to be detected (locus3). The
model phenotype ~ background + known_locus1 + known_locus2 + locus3 + known_locus1:known_locus2 + known_locus1:locus3 + known_locus2:locus3 + known_locus1:known_locus2:locus3 + error was used to identify mutation-independent three-locus effects, whereas the
model phenotype ~ background + known_locus1 + known_locus2 + locus3 + background:known_locus1 + background:known_locus2 + background:locus3 + known_locus1:known_locus2 + known_locus1:locus3 + known_locus2:locus3 + background:known_locus1:known_locus2 + background:known_locus1:locus3 + background:known_locus2:locus3 + known_locus1:known_locus2:locus3 + background:known_locus1:known_locus2:locus3 + error was employed to detect
mutation-responsive three-locus effects. Significance of
the known_locus1:known_locus2:locus3 and background:known_locus1:known_locus2:locus3 terms in the respective models was determined using 1000 permutations with the correspondence
between genotypes and phenotypes randomly shuffled each time. For each permutation, 10,000
trios of sites were chosen by first randomly picking two loci on different chromosomes and then
randomly selecting an additional 10,000 sites. The minimum p value across the 10,000 tests was
retained. Significance thresholds were again established based on the fifth percentile of
minimum p values observed across the permutations. As with the two-locus effect scans, we also
performed our analysis across multiple FDR thresholds to ensure that our findings were robust
(Table S2.10; Supplementary Note 2.2).
2.5.7 Assignment of mutation-responsive effects to knockouts
In the aforementioned linkage scans, genetic effects exhibited statistical interactions with
the background term if they had a different phenotypic effect in at least one of the eight
backgrounds relative to the rest. To determine the specific knockouts that interacted with each
mutation-responsive effect, we used the contrast() function from the R package lsmeans. This was
applied to the specific effect of interest post hoc using the same linear models that were employed
for detection. All possible pairwise contrasts between wild-type and knockout segregants were
conducted. Mutation-responsive effects were assigned to specific mutations if the contrast between
a mutation and a WT population was nominally significant. Unless otherwise noted, we counted
each assignment of a mutation-responsive effect to a specific knockout as a separate genetic effect
even if they involved the same set of loci.
2.5.8 Statistical power analysis
To determine the statistical power of our mapping procedures, we simulated phenotypes
for the 1411 genotyped segregants given their genotypes at randomly chosen loci and then tried to
detect these loci using the approaches described earlier. In each simulation, a given segregant’s
phenotype was determined based on both the mutation it carried (if any), as well as its genotype at
one, two, or three randomly chosen loci. The effects of mutations were calculated based on the
real phenotype data for the glucose environment. Phenotypic effects of the mutation-responsive
locus or loci were also attributed to each segregant. Specifically, the phenotypes of segregants in
only one of the possible genotype classes were increased by a given increment, which we refer to
as the absolute effect size. For one-, two-, and three-locus effects, this respectively entailed half,
one quarter, and one eighth of the individuals having their phenotypes increased by the increment.
In the case of mutation-independent effects, these increments were applied to all eight of the wild-
type and knockout backgrounds. In contrast, for mutation-responsive effects, increments were only
applied to one of the eight backgrounds, with the specific background randomly chosen. Lastly,
random environmental noise was added to each segregant’s phenotype. Using these genotype and
phenotype data, we tested whether we could detect the loci that had been given a phenotypic effect.
This was done by fitting the appropriate fixed-effects linear model, extracting the p value for the
relevant term, and determining if that p value fell below a nominal significance threshold
of α = 0.05. Statistical power was calculated as the proportion of tests at a given phenotypic
increment where p ≤ 0.05. The results of this analysis are shown in Fig. S2.7.
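The logic of the power calculation can be illustrated with a simplified one-locus sketch. It makes several assumptions that are not part of the analysis above: a two-group comparison with a normal approximation replaces the fixed-effects linear model, exactly half of the segregants carry the effect allele, and the sample size and noise level are arbitrary placeholder values.

```python
import math
import random
import statistics

def two_sided_p(group_a, group_b):
    """Two-group z test; the normal approximation is an assumption."""
    se = math.sqrt(statistics.variance(group_a) / len(group_a)
                   + statistics.variance(group_b) / len(group_b))
    z = (statistics.mean(group_a) - statistics.mean(group_b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def one_locus_power(effect, n_segregants=200, noise_sd=0.1,
                    n_sim=200, alpha=0.05, seed=1):
    """Proportion of simulations in which the simulated locus is detected."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        carriers, others = [], []
        for i in range(n_segregants):
            # Random environmental noise for every segregant.
            noise = rng.gauss(0.0, noise_sd)
            if i % 2 == 0:
                # One genotype class receives the absolute effect size.
                carriers.append(effect + noise)
            else:
                others.append(noise)
        if two_sided_p(carriers, others) <= alpha:
            hits += 1
    return hits / n_sim
```

Calling `one_locus_power` over a grid of increments gives a power curve of the kind summarized in Fig. S2.7. For two- and three-locus effects, the increment would instead be applied to one quarter or one eighth of the segregants.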
2.5.9 Contributions of loci involved in higher-order epistasis
For all mutation-responsive two- and three-locus effects, we determined the proportion of
mutation-responsive phenotypic variance explained by each individual locus and the interactions
among these loci. To do this, we generated seven subsets of data, each of which comprised
of the wild-type segregants and one set of knockout segregants. We then fit the same model that
was used to originally identify a given mutation-responsive effect to the appropriate data subsets.
For two-locus effects, we obtained the sum of squares for
the background:locus1, background:locus2, and background:locus1:locus2 terms. We then
divided each of these values by the sum of all three sums of squares. For three-locus effects, we
obtained the summed sums of squares associated with the individual loci
(background:locus1, background:locus2, background:locus3) and with the pairs of loci
(background:locus1:locus2, background:locus1:locus3, background:locus2:locus3), as well as
the sum of squares associated with the trio of loci (background:locus1:locus2:locus3). We then
divided the total sum of squares associated with each class of terms by the total sum of squares
across all mutation-responsive terms. The ternary plots used to show these results were generated
using the R package ggtern.
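The normalization described above amounts to grouping the mutation-responsive terms by the number of loci they involve and dividing each group's summed sum of squares by the total. A minimal sketch follows; the function name and the example sums of squares are made up for illustration and are not values from this study.

```python
def relative_contributions(sum_of_squares):
    """Share of the mutation-responsive sum of squares by number of loci per term.

    sum_of_squares maps model terms such as 'background:locus1:locus2'
    to their sums of squares.
    """
    totals = {}
    for term, ss in sum_of_squares.items():
        # Count the loci in the term, ignoring the background factor.
        n_loci = sum(1 for part in term.split(":") if part.startswith("locus"))
        totals[n_loci] = totals.get(n_loci, 0.0) + ss
    grand_total = sum(totals.values())
    return {n_loci: ss / grand_total for n_loci, ss in totals.items()}

# Made-up sums of squares for one three-locus effect.
example = {
    "background:locus1": 1.0,
    "background:locus2": 1.0,
    "background:locus3": 2.0,
    "background:locus1:locus2": 1.0,
    "background:locus1:locus3": 2.0,
    "background:locus2:locus3": 3.0,
    "background:locus1:locus2:locus3": 10.0,
}
shares = relative_contributions(example)
```

The three shares (individual loci, pairs, trio) sum to one, which is what allows them to be placed on a ternary plot.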
2.5.10 Analysis of mutation-responsive effects across environments
We determined whether each one-, two-, and three-locus mutation-responsive effect
exhibited a phenotypic effect in any environment outside the one in which it was originally
detected. To do this, we used seven subsets of data, each of which comprised the wild-type
segregants and one set of knockout segregants. We then fit the same model that was used to
originally identify a given mutation-responsive effect to the appropriate data subsets for each of
the nine additional environments. The p value was then extracted for the relevant term. Bonferroni
corrections were used to account for multiple testing.
2.5.11 Genetic explanation of changes in phenotypic variance
We measured the phenotypic variance explained by each mutation-responsive genetic
effect in the relevant knockout background(s), as well as in the wild-type background. Here we fit
each mutation-responsive genetic effect in both populations without using any background term.
For mutation-responsive one-, two-, and three-locus effects, the following models were
respectively employed: phenotype ~ locus1 + error,
phenotype ~ locus1 + locus2 + locus1:locus2 + error, and
phenotype ~ locus1 + locus2 + locus3 + locus1:locus2 + locus1:locus3 + locus2:locus3 +
locus1:locus2:locus3 + error. Partial R² values were computed in each population by dividing the
sum of squares associated with the term of interest by the total sum of squares.
Mutation-responsive effects were then classified by the number of knockout backgrounds in which
they were detected. For each class, the number of genetic effects with larger partial R² values in
the knockout background than in the wild-type background (enhanced effects) and the number of
genetic effects with smaller partial R² values in the knockout background than in the wild-type
background (reduced effects) were determined. The proportion of mutation-responsive effects that
show enhanced and reduced phenotypic effects was calculated for each class. 95% bootstrap
confidence intervals were then generated using 1000 random samplings of the data with
replacement.
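A percentile bootstrap for the proportion of enhanced effects within one class could look like the sketch below. It assumes that the resampled units are the genetic effects themselves and that the interval is taken from the 2.5th and 97.5th percentiles of the resampled proportions; the text above does not specify either choice.

```python
import random

def bootstrap_ci(is_enhanced, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap interval for the proportion of enhanced effects.

    is_enhanced holds one boolean per genetic effect in a class: True if the
    partial R² is larger in the knockout than in the wild-type background.
    """
    rng = random.Random(seed)
    n = len(is_enhanced)
    proportions = []
    for _ in range(n_boot):
        # Resample the effects with replacement.
        resample = [is_enhanced[rng.randrange(n)] for _ in range(n)]
        proportions.append(sum(resample) / n)
    proportions.sort()
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = min(n_boot - 1, int((1 + level) / 2 * n_boot))
    return proportions[lo_idx], proportions[hi_idx]
```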
2.5.12 Checking potential consequences of allele frequency bias
Allele frequency bias may result in the erroneous detection of mutation-responsive genetic
effects due to uneven representation of one-, two-, or three-locus combinations across the knockout
and wild-type backgrounds. To account for this, we generated 2 × 8, 4 × 8, and 8 × 8 contingency
tables for all one-, two-, or three-locus interactions, respectively, counting each of the possible
allele combinations in the wild-type and seven knockout populations. Specifically, for one-locus
interactions, we counted the numbers of individuals carrying the BY or the 3S allele at the significant
locus for each population. For two-locus interactions, the numbers of individuals carrying the
BY/BY, BY/3S, 3S/BY, and 3S/3S allele combinations at the two loci were enumerated. For three-locus
interactions, the numbers of individuals carrying the BY/BY/BY, BY/BY/3S, BY/3S/BY,
3S/BY/BY, BY/3S/3S, 3S/BY/3S, 3S/3S/BY, and 3S/3S/3S allele combinations at the three loci were counted.
We then ran chi-square tests to identify individual loci or combinations of loci that show different
frequencies across the eight backgrounds, using Bonferroni corrections to account for multiple
testing. After filtering out genetic effects that involve loci or combinations of loci with biased
frequencies, we repeated our main analyses to ensure that our results were robust to allele
frequency differences (Figs. S2.5 and S2.6; Tables S2.11 and S2.12; Supplementary Note 2.3).
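The contingency tables described above can be built by counting each allele combination within each background. The sketch below is illustrative only: the function name and input layout are assumptions, and the toy data use two backgrounds rather than eight.

```python
from collections import Counter
from itertools import product

def allele_combination_table(genotypes, backgrounds, alleles=("BY", "3S")):
    """Count multi-locus allele combinations per background.

    genotypes: one tuple of alleles per segregant, e.g. ("BY", "3S") for two loci.
    backgrounds: the background label of each segregant, in the same order.
    Returns (rows, cols, table) with one row per allele combination and
    one column per background.
    """
    n_loci = len(genotypes[0])
    rows = list(product(alleles, repeat=n_loci))
    cols = sorted(set(backgrounds))
    counts = Counter(zip(genotypes, backgrounds))
    table = [[counts[(combo, bg)] for bg in cols] for combo in rows]
    return rows, cols, table

# Toy two-locus example with two backgrounds.
toy_genotypes = [("BY", "BY"), ("BY", "3S"), ("3S", "3S"), ("BY", "BY")]
toy_backgrounds = ["WT", "WT", "ctk1", "ctk1"]
rows, cols, table = allele_combination_table(toy_genotypes, toy_backgrounds)
```

With one, two, or three loci and eight backgrounds, the same function yields the 2 × 8, 4 × 8, and 8 × 8 tables, which can then be passed to a chi-square test.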
2.4 Supplementary Material
Figure S2.1 Generation of BYx3S knockout segregants.
For each knockout or wild type background in our study, a BY/3S diploid was generated and
sporulated. MATa segregants were obtained using the synthetic genetic array marker system¹
(Methods). Wild type segregants were collected directly from MATa selection plates, while
knockout segregants were replica-plated from MATa selection plates onto G418 plates prior to
their collection to select for segregants with the gene deletion.
Figure S2.2 Certain genes exhibit significant background effects when perturbed.
In a preliminary screen, we generated and phenotyped segregants from 47 mutant versions of the
same yeast cross, each of which lacked a different chromatin-associated protein (Supplementary
Table S2.1). The coefficient of variation found in a given knockout background is shown on the
x-axis, while the phenotypic variance is shown on the y-axis. In addition to increased phenotypic
variance, we found that knockout of certain genes, in particular ESA1, caused severe growth
reductions in all but a few outlier segregants, which resulted in a high coefficient of variation.
The presence of these outliers may reflect higher-order interactions among loci, which would lead to
a small fraction of individuals showing unusual growth²,³. Using Levene’s Test, we found that seven
genes exhibit significant background effects when deleted: CTK1, ESA1, GCN5, HOS3, HTB1,
INO80, and RPD3 (Table S2.2). The points corresponding to these genes are shown in color, while
the point corresponding to wild type is illustrated in black. All other points are gray.
Figure S2.3 Allele frequency plot.
(a) Allele frequency plots are shown for each knockout and wild type background. 50kb regions
surrounding the knockouts, as well as regions that were fixed due to selection on markers used to
generate MATa haploids, are highlighted in red. Regions where the allele frequency in at least one
population is significantly different from other populations (Table S2.4; Supplementary Note 2.1;
Methods), as well as regions that were fixed due to mitotic recombination or gene conversion in
the progenitor hemizygous diploids, are highlighted in blue (Table S2.5; Supplementary Note 2.3;
Methods). (b) We observed that all of Chromosome XII was enriched in the gcn5∆ population.
This appears to be due to selection against a recombinant version of the chromosome. Specifically,
individuals who harbored a 3S-BY recombinant haplotype centered on YCS4 were depleted (Table
S2.5). This site of increased recombination on Chromosome XII in the gcn5∆ population is
denoted with an asterisk. No recombinants were observed at this site among wild type segregants,
suggesting that the gcn5∆ knockout resulted in a new recombination hotspot.
Figure S2.4 Growth of all 1,411 segregants across the 10 environments.
Figure S2.5 Individual and joint contributions of loci to background effects across different
significance thresholds.
In a, for each mutation-responsive two-locus effect identified across different significance
thresholds, the relative phenotypic variance explained (PVE) by the individual involved loci and
their interaction is illustrated. The same analysis was also performed after loci that show biased
two-locus allele frequencies were filtered from the set of two-locus effects identified at the α =
0.05 threshold. In b, for each mutation-responsive three-locus effect identified across different
significance thresholds, the relative PVE for the individual loci, the pairs of loci, and the trio of
loci is provided. Similar to mutation-responsive two-locus effects, loci that show biased three-
locus allele frequencies were filtered from the set of three-locus effects identified at the α = 0.05
threshold and their relative PVE values were determined. Relative PVE values were calculated
using sum of squares obtained from ANOVA tables, as described in the Methods. As with the
results reported in the paper, which were obtained using the α = 0.05 threshold, we find that loci
involved in most mutation-responsive effects identified at other thresholds mainly contribute to
background effects through higher-order epistasis.
Figure S2.6 Analysis of how mutation-responsive effects interact with different knockouts at
multiple significance thresholds.
Our findings are reported across FDRs of 0.01, 0.05, and 0.10, as well as α = 0.05 after filtering of
loci and multi-locus genotype classes with biased frequencies. In each row, the left plot shows the
number of genetic effects found in one (pink) or more (blue) knockout backgrounds. The middle
plot shows the PVE for mutation-responsive effects in the wild type and relevant knockout
segregants. The third plot shows the percentage of genetic effects with larger PVE in the relevant
knockout background than the wild type background (PVE_KO > PVE_WT) as a function of the
number of knockouts that interact with the effect. We provide an additional y-axis on the right side
of the plot, which indicates the percentage of genetic effects with smaller PVE in the relevant
knockout background than the wild type background (PVE_KO < PVE_WT). Error bars represent
95% bootstrap confidence intervals (Methods).
Figure S2.7 Statistical power analysis for one-, two-, and three-locus interactions.
We determined our statistical power to detect different types of mutation-independent and
mutation-responsive genetic effects. This was done by simulating phenotype data for the 1,411
segregants and then performing genetic mapping on the simulated phenotype data (Methods). Each
set of simulated phenotypes was determined based on which background of the BYx3S cross a
segregant came from, as well as the segregant’s genotype at one or more randomly chosen loci,
and a knockout that was randomly selected to interact with the loci. We then applied a random
deviate to each segregant’s phenotype, which was intended to represent environmental noise. The
relevant fixed-effects linear model was fit using the genotype and simulated phenotype data, and
the p-value for the appropriate term in the model was obtained. Statistical power for a given
absolute effect size was calculated as the proportion of tests that had a p-value ≤ 0.05. These
simulations are based on our real phenotype data for glucose. In this environment, we detected
average absolute effect sizes of 0.07, 0.15, and 0.3 for mutation-independent one-, two-, and
three-locus effects, and 0.09, 0.13, and 0.26 for mutation-responsive one-, two-, and three-locus effects.
Figure S2.8 Extent to which mutation-responsive effects interact with different knockouts.
The number of knockout backgrounds in which a particular mutation-responsive effect was
detected is shown on the x-axis. The number of mutation-responsive effects in each class is shown
on the y-axis.
Figure S2.9 Absence of a relationship between identified mutation-responsive effects and mean
phenotypic differences between knockout and wild type backgrounds.
Each point represents a different knockout background and environment. A point’s position on the
x-axis indicates the difference in mean phenotype between a particular knockout background
(Mean_P.Mut) and the wild type background (Mean_P.WT) in a single environment. On the y-axis, the
difference between the numbers of genetic effects with enhanced and reduced phenotypic effects in
mutants relative to wild type segregants is shown. Spearman’s ρ and its associated p-value are
provided on the plot.
Figure S2.10 All seven knockout populations show nominally significant correlations between
changes in phenotypic variance and detected mutation-responsive effects.
Individual panels show the results for the ctk1∆, esa1∆, gcn5∆, hos3∆, htb1∆, ino80∆, and rpd3∆
backgrounds. Each point’s position on the x-axis represents the difference in phenotypic variance
between wild type and knockout populations. On the y-axis, the difference between the numbers of
genetic effects with enhanced and reduced phenotypic effects in mutants relative to wild type
segregants is shown. Spearman’s ρ values and their associated p-values are provided on the plot.
Figure S2.11 Most mutation-responsive effects show small differences in phenotypic variance
explained (PVE) in mutants relative to wild type segregants.
On the y-axis, we show differences in PVE for each mutation-responsive effect when PVE is
computed separately in mutant and wild type segregants. Mutation-responsive effects above the
red line have a higher PVE in mutants, while mutation-responsive effects below the red line have
a higher PVE in wild type segregants.
Supplementary Tables
Standard Name   Systematic Name   Function(s)
ASF1 YJL115W Nucleosome assembly factor; involved in chromatin assembly, disassembly
CHD1 YER164W Chromatin remodeler; regulate chromatin structure and maintain chromatin integrity
CTK1 YKL139W Catalytic (alpha) subunit of C-terminal domain kinase I (CTDK-I); required for H3K36 trimethylation but not dimethylation by Set2p
DOT1 YDR440W Nucleosomal histone H3-Lys79 methylase
EAF3 YPR023C Subunit of Rpd3S deacetylase and NuA4 acetyltransferase complexes; essential histone acetyltransferase complex
ELP3 YPL086C Subunit of Elongator complex; exhibits histone acetyltransferase activity
ESA1 YOR244W Catalytic subunit of the histone acetyltransferase complex (NuA4)
GCN5 YGR252W Catalytic subunit of ADA and SAGA histone acetyltransferase complexes
GIS1 YDR096W Histone demethylase and transcription factor
HAT1 YPL001W Catalytic subunit of the Hat1p-Hat2p histone acetyltransferase complex
HAT2 YEL056W Subunit of the Hat1p-Hat2p histone acetyltransferase complex
HDA1 YNL021W Putative catalytic subunit of a class II histone deacetylase complex
HHF1 YBR009C Histone H4
HHF2 YNL030W Histone H4
HHO1 YPL127C Histone H1
HHT1 YBR010W Histone H3
HHT2 YNL031C Histone H3
HIR1 YBL008W Subunit of HIR nucleosome assembly complex
HMT1 YBR034C Nuclear SAM-dependent mono- and asymmetric methyltransferase
HOS1 YPR068C Class I histone deacetylase (HDAC) family member
HOS2 YGL194C Histone deacetylase and subunit of Set3 and Rpd3L complexes
HOS3 YPL116W Trichostatin A-insensitive homodimeric histone deacetylase (HDAC)
HST1 YOL068C NAD(+)-dependent histone deacetylase
HTA1 YDR225W Histone H2A
HTA2 YBL003C Histone H2A
HTB1 YDR224C Histone H2B
HTB2 YBL002W Histone H2B
HTZ1 YOL012C Histone variant H2AZ
INO80 YGL150C ATPase and nucleosome spacing factor
ISW1 YBR245C ATPase subunit of imitation-switch (ISWI) class chromatin remodelers
JHD1 YER051W JmjC domain family histone demethylase specific for H3-K36
JHD2 YJR119C JmjC domain family histone demethylase
NAT4 YMR069W N alpha-acetyl-transferase
RLF2 YPR018W Largest subunit (p90) of the Chromatin Assembly Complex (CAF-1)
RPD3 YNL330C Histone deacetylase, component of both the Rpd3S and Rpd3L complexes
RPH1 YER169W JmjC domain-containing histone demethylase
RTT109 YLL002W Histone acetyltransferase
SAS2 YMR127C Histone acetyltransferase (HAT) catalytic subunit of the SAS complex
SAS3 YBL052C Histone acetyltransferase catalytic subunit of NuA3 complex
SET1 YHR119W Histone methyltransferase, subunit of the COMPASS (Set1C) complex
SET2 YJL168C Histone methyltransferase with a role in transcriptional elongation
SET3 YKR029C Subunit of the SET3 histone deacetylase complex
SET4 YJL105W Unknown function; paralog of SET3
SNF1 YDR477W AMP-activated S/T protein kinase; regulates H3 acetylation and chromatin remodelling
SNF2 YOR290C Catalytic subunit of the SWI/SNF chromatin remodeling complex
SNF5 YBR289W Subunit of the SWI/SNF chromatin remodeling complex
SWR1 YDR334W Swi2/Snf2-related ATPase; structural component of the SWR1 complex, which exchanges histone variant H2AZ (Htz1p) for chromatin-bound histone H2A
Table S2.1 List of 47 genes that were screened for background effects.
The standard gene name, the systematic name, and the biological function of each gene are shown.
Knockout Mean Growth Variance Levene's p-value
asf1Δ 0.779317208 0.020201882 0.977508865
chd1Δ 1.007573633 0.008422714 0.012616542
ctk1Δ 0.505330301 0.046562534 0.014559018
dot1Δ 1.000469872 0.026939673 0.572692518
eaf3Δ 0.892511372 0.02996817 0.159451789
elp3Δ 0.938803654 0.020439982 0.89517808
esa1Δ 0.299208796 0.124484989 0.009723665
gcn5Δ 0.727106888 0.050249092 0.033325098
gis1Δ 1.062032492 0.012881687 0.134256473
hat1Δ 1.076895249 0.027302896 0.235343973
hat2Δ 1.051544847 0.023497867 0.757295234
hda1Δ 0.932223832 0.029161218 0.50857877
hhf1Δ 0.991287515 0.013080162 0.102604216
hhf2Δ 0.957913123 0.012705815 0.123495677
hho1Δ 1.014461673 0.013516731 0.147089216
hht1Δ 1.007595814 0.027733915 0.488955838
hht2Δ 1.021186261 0.020401193 0.968439543
hir1Δ 0.903639888 0.022913698 0.783955806
hmt1Δ 0.958772913 0.021849944 0.591705973
hos1Δ 0.947034124 0.047755186 0.118881424
hos2Δ 0.979073108 0.015371335 0.275248056
hos3Δ 0.882145103 0.07291512 2.51478E-05
hst1Δ 0.859791025 0.028231512 0.552051447
hta1Δ 0.955155532 0.032539889 0.213817393
hta2Δ 1.051833126 0.042651912 0.125633861
htb1Δ 0.453580596 0.070488173 0.002211331
htb2Δ 1.058080223 0.009848719 0.035718658
htz1Δ 0.739161229 0.016096516 0.299152086
ino80Δ 0.368393318 0.068870748 0.000178763
isw1Δ 0.999762052 0.018146537 0.389392505
jhd1Δ 0.985389791 0.018189507 0.863810698
jhd2Δ 1.015143929 0.028568957 0.436252639
nat4Δ 1.007693438 0.024951602 0.653665809
rlf2Δ 0.992563075 0.040109928 0.850565985
rpd3Δ 0.733300085 0.023198875 0.657552748
rph1Δ 0.977926455 0.022801194 0.949320021
rtt109Δ 0.787318686 0.014075713 0.210182586
sas2Δ 1.032806204 0.015742986 0.32742934
sas3Δ 0.98983123 0.017062259 0.448430416
set1Δ 0.963951523 0.045359724 0.021614959
set2Δ 0.917625313 0.02221723 0.963192992
set3Δ 1.011235345 0.011976087 0.092060159
set4Δ 1.042182908 0.022035319 0.910385044
snf1Δ 0.236803736 0.011258233 0.0814094
snf2Δ 0.172978254 0.019080041 0.238953594
snf5Δ 0.315511891 0.012033825 0.1426345
swr1Δ 0.817082916 0.023596511 0.810466613
WT 1.000603396 0.024287414 NA
Table S2.2 Screen summary statistics.
The mean growth and variance for each of the 48 knockout and wild type segregant backgrounds
are listed in this table. The Levene’s p-value column specifies the p-value statistic from the
Levene’s test used to assess whether the populations of knockout segregants exhibited significantly
different heritable phenotypic variation than a wild type BYx3S segregant population on ethanol.
Population   Number of segregants with good data   Diploid   Aneuploid   Low coverage, cross-contamination
WT 164 0 0 76
ctk1Δ 210 0 2 28
esa1Δ 122 0 25 93
gcn5Δ 215 9 15 1
hos3Δ 220 1 3 16
htb1Δ 177 1 2 60
ino80Δ 141 5 5 89
rpd3Δ 162 0 0 78
Total 1411 16 52 441
Table S2.3 Mapping population breakdown.
This table shows the number of segregants from each population that were excluded
due to technical issues (i.e., low coverage or cross-contamination) or biological issues (i.e.,
aneuploidy or diploidy).
Chromosome Start position End position
4 720061 861257
4 975252 1408452
5 190283 256159
7 84678 170211
7 275701 386593
7 905443 946974
10 642317 684208
11 53931 316100
12 50192 749934
13 75046 130721
14 231034 607186
15 69484 462566
15 843892 959820
16 195879 525073
Table S2.4 Genomic regions with allele frequency bias.
Regions on the genome where the allele frequency in at least one knockout population significantly
differed from other populations were found using chi-square tests. Specifically, 2x8 contingency
tables were generated and chi-square tests were run for all 8,311 markers, counting the number of
BY and 3S alleles in the wild type and 7 knockout populations. Bonferroni correction was used for
multiple testing correction.
Hemizygote   Chromosome   Start Position   Stop Position   Range (bp)   Fixed Allele   Positions confirmed by Sanger Sequencing
ctk1∆/CTK1 11 0 182309 182309 BY 85326 - 85727, 118436 - 119014, and 178304 - 178885
esa1∆/ESA1 15 657980 663812 5832 3S 658744 - 659697
gcn5∆/GCN5 4 969630 985630 11205 3S 975757 - 976743
gcn5∆/GCN5 4 1123442 1408596 285154 3S 1140174 - 1140998, 1251735 - 1252459, and 1386674 - 1387285
gcn5∆/GCN5 4 1483478 1525043 41565 3S 1491421 - 1492191
gcn5∆/GCN5 12 519517 543736 24219 BY 529628 - 530346
hos3∆/HOS3 13 524221 526917 2696 3S 524221 - 524865
Table S2.5 Fixed regions in BY/3S hemizygous diploids.
This summary table shows regions in the genome where all segregants within a knockout segregant
population carried the same parental haplotype due to mitotic recombination in our parental
diploids. Also listed are positions within the fixed regions that were PCR-amplified and Sanger sequenced
to confirm that these regions were fixed in the parental diploids.
Condition Stock solution Amount added per liter Final Concentration
4NQO 0.5 mg/ml 500 µL 0.25 µg/ml
Copper 500 mM 18 mL 9 mM
Ethanol 20% 100 mL 2% Ethanol
Fluconazole 10 mg/ml 5 mL 50 µg/ml
Glucose NA NA 2% Glucose
High temperature NA NA 2% Glucose/42°C
Hydrogen peroxide 30% 541 µL 260 µM
Neomycin 50 mg/ml 140 mL 7 mg/ml
Room temperature NA NA 2% Glucose/22°C
Zeocin 100 mg/ml 50 µL 5 µg/ml
Table S2.6 Phenotyping environments.
This table provides information on the environments used for phenotyping assays in our study. Drugs and
chemicals were added to 2% glucose plates with the exception of the ethanol condition, in which 2% ethanol
was used as the carbon source instead of glucose.
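For the rows in which the stock and final concentrations share a unit, the final concentration follows from the usual dilution relation, C_final = C_stock × V_added / V_final. The short sketch below checks three rows of the table under the assumption that the final volume is exactly 1 L; the function name is illustrative.

```python
def final_concentration(stock_conc, volume_added_ml, final_volume_ml=1000.0):
    """C_final = C_stock * V_added / V_final, in the units of stock_conc."""
    return stock_conc * volume_added_ml / final_volume_ml

# 4NQO: 0.5 mg/ml stock, 500 µL (0.5 mL) per liter, converted from mg/ml to µg/ml.
four_nqo_ug_per_ml = final_concentration(0.5, 0.5) * 1000
# Copper: 500 mM stock, 18 mL per liter.
copper_mm = final_concentration(500, 18)
# Neomycin: 50 mg/ml stock, 140 mL per liter.
neomycin_mg_per_ml = final_concentration(50, 140)
```

These evaluate to the tabulated 0.25 µg/ml, 9 mM, and 7 mg/ml, respectively.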
# of Mutation-independent effects # of Mutation-responsive effects
Two-locus Three-locus Two-locus Three-locus
α 0.05 17 19 295 670
FDR 0.10 17 2 542 2324
FDR 0.05 10 1 250 1311
FDR 0.01 4 0 79 389
Table S2.7 Number of mutation-independent and mutation-responsive genetic effects across
different significance thresholds.
One-locus effects are excluded from this analysis as we only performed scans for these effects at
a commonly used significance threshold of α=0.05. Total numbers of all identified genetic effects,
including one-locus effects, can be found in Table S2.9.
α 0.05   FDR 0.01   FDR 0.05   FDR 0.10   α 0.05 filtered
No. interacting KOs   PVE_KO > PVE_WT   PVE_KO < PVE_WT   PVE_KO > PVE_WT   PVE_KO < PVE_WT   PVE_KO > PVE_WT   PVE_KO < PVE_WT   PVE_KO > PVE_WT   PVE_KO < PVE_WT   PVE_KO > PVE_WT   PVE_KO < PVE_WT
1 383 2 303 3 893 7 1485 16 178 0
2 174 20 92 8 274 48 478 92 70 8
3 76 50 37 26 97 59 142 110 34 32
4 9 51 3 21 15 69 35 133 3 21
5 21 84 13 27 20 80 28 157 3 32
6 12 120 6 36 6 72 11 181 3 57
7 3 81 1 13 3 39 3 116 2 40
χ² 709.37 343.92 981.52 1870.9 341.41
p-value 5.81E-150 3.12E-71 8.87E-209 < 2.2E-16 (0) 1.08E-70
Table S2.8 Chi-squared test results for mutation-responsive effects.
This table shows p-values from chi-squared tests used to test if the ratio of mutation-responsive
genetic effects with enhanced and reduced phenotypic effects in mutants changes as a function of
the number of knockouts with which an effect interacts (Methods). Results are reported across significance
thresholds, as well as at the initial α = 0.05 threshold after filtering out effects involving loci with
biased individual or multi-locus allele frequencies. Test statistics (χ²) are also reported.
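As a worked example, the sketch below runs a Pearson chi-square test of independence on the 7 × 2 table formed by the two α = 0.05 columns. That the reported values come from this particular test, without a continuity correction, is an assumption. Under that assumption the computed statistic and p-value come out close to the 709.37 and 5.81E-150 listed in the table. The closed-form tail probability used here is valid only for an even number of degrees of freedom.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for a 2-D count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def chi_square_sf_even_dof(stat, dof):
    """Upper-tail probability of the chi-square distribution (even dof only)."""
    half = stat / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

# Rows: 1 to 7 interacting knockouts; columns: PVE_KO > PVE_WT, PVE_KO < PVE_WT.
alpha_005 = [[383, 2], [174, 20], [76, 50], [9, 51], [21, 84], [12, 120], [3, 81]]
stat, dof = chi_square_independence(alpha_005)
p_value = chi_square_sf_even_dof(stat, dof)
```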
Condition α 0.05 FDR 0.01 FDR 0.05 FDR 0.10 α 0.05 filtered
4-NQO 0.070 0.167 0.154 0.118 0.000
Copper 0.104 0.148 0.118 0.084 0.123
Fluconazole 0.270 0.846 0.591 0.286 0.053
Hydrogen peroxide 0.136 0.183 0.109 0.100 0.115
High temperature 0.189 0.400 0.400 0.147 0.179
Neomycin 0.262 0.391 0.204 0.175 0.175
Room temperature 0.364 0.500 0.500 0.367 0.000
Glucose 0.649 0.769 0.645 0.635 0.586
Ethanol 0.437 0.443 0.331 0.293 0.405
Zeocin 0.500 0.875 0.875 0.875 0.500
Average 0.298 0.472 0.393 0.308 0.214
Table S2.9 Percentage of mutation-responsive genetic effects that showed a phenotypic effect in any
environment outside the one in which they were originally detected.
Results are reported across significance thresholds, as well as at the initial α = 0.05 threshold after
filtering out effects involving loci with biased individual or multi-locus allele frequencies on
different tabs.
1-locus interaction 2-locus interaction 3-locus interaction
Mutation-responsive without filtering 45 143 406
Mutation-responsive with filtering 14 73 181
Mutation-independent 89 17 19
Table S2.10 Number of mutation-responsive effects that show biased individual and multi-locus
allele frequencies.
The numbers of two-locus and three-locus mutation-responsive effects that show biased individual
and multi-locus allele frequencies were determined using chi-square tests. Specifically, 2x8, 4x8,
and 8x8 contingency tables were generated and chi-square tests were run for all one-, two-, and
three-locus interactions, respectively, counting each allele combination in the wild
type and 7 knockout populations. Bonferroni correction was used for multiple testing correction.
4NQO Copper Flu H2O2 HT Neo RT Glu Eth Zeo
ctk1.Enh 1 1 2 2 0 2 1 3 10 1
ctk1.Red 1 20 2 19 11 5 2 5 5 0
ctk1.Diff 0 -19 0 -17 -11 -3 -1 -2 5 1
esa1.Enh 0 2 0 2 1 0 0 3 5 2
esa1.Red 1 20 3 14 11 5 1 2 1 0
esa1.Diff -1 -18 -3 -12 -10 -5 -1 1 4 2
gcn5.Enh 7 3 4 3 4 12 1 1 2 0
gcn5.Red 0 18 3 19 10 6 0 4 4 1
gcn5.Diff 7 -15 1 -16 -6 6 1 -3 -2 -1
hos3.Enh 17 5 15 55 15 70 4 26 296 6
hos3.Red 2 15 3 3 5 2 0 3 1 0
hos3.Diff 15 -10 12 52 10 68 4 23 295 6
htb1.Enh 7 0 19 5 5 9 0 2 4 1
htb1.Red 0 21 2 22 9 5 1 3 2 1
htb1.Diff 7 -21 17 -17 -4 4 -1 -1 2 0
ino80.Enh 0 0 2 7 0 2 1 0 9 2
ino80.Red 0 20 4 13 10 6 0 2 1 1
ino80.Diff 0 -20 -2 -6 -10 -4 1 -2 8 1
rpd3.Enh 6 4 0 2 2 2 0 0 3 0
rpd3.Red 1 15 4 18 12 4 0 3 0 1
rpd3.Diff 5 -11 -4 -16 -10 -2 0 -3 3 -1
WT.Mean 0.918 0.436 0.875 0.896 0.896 1.374 1.053 1.016 1.020 0.973
ctk1.Mean 0.425 0.372 0.113 0.727 0.054 0.407 0.206 0.633 0.504 0.522
esa1.Mean 0.028 0.301 0.066 0.089 0.024 0.161 0.076 0.388 0.144 0.177
gcn5.Mean 0.579 0.537 0.154 0.521 0.260 1.141 0.804 0.897 0.824 0.733
hos3.Mean 0.729 0.356 0.954 0.939 0.595 1.961 0.996 0.887 0.793 0.807
htb1.Mean 0.276 0.495 0.476 0.843 0.414 1.370 0.121 0.721 0.560 0.644
ino80.Mean 0.075 0.439 0.300 0.607 0.014 1.092 0.341 0.558 0.330 0.447
rpd3.Mean 0.387 0.492 0.189 0.721 0.166 1.456 0.931 0.874 0.694 0.827
WT.Var 0.046 0.077 0.083 0.074 0.050 0.183 0.012 0.012 0.022 0.014
ctk1.Var 0.065 0.024 0.039 0.028 0.007 0.080 0.018 0.012 0.043 0.020
esa1.Var 0.015 0.029 0.023 0.029 0.013 0.043 0.016 0.014 0.026 0.028
gcn5.Var 0.121 0.059 0.060 0.031 0.044 0.233 0.013 0.013 0.033 0.011
hos3.Var 0.166 0.083 0.140 0.136 0.060 0.459 0.042 0.032 0.166 0.036
htb1.Var 0.114 0.028 0.137 0.042 0.047 0.183 0.026 0.012 0.029 0.022
ino80.Var 0.018 0.015 0.039 0.036 0.001 0.068 0.024 0.010 0.043 0.017
rpd3.Var 0.100 0.052 0.035 0.032 0.043 0.146 0.018 0.009 0.030 0.012
Table S2.11 Phenotypic variance and the number of identified genetic effects.
This is a summary table showing the phenotypic mean and variance in the ctk1Δ, esa1Δ, gcn5Δ,
hos3Δ, htb1Δ, ino80Δ, rpd3Δ, and wild type populations across 10 environments. In addition, the
numbers of genetic effects with reduced (Red) and enhanced (Enh) effects in the knockout
background in which the genetic effect was identified relative to wild type background are shown.
Primer Name Primer Sequence
ASF1 Diagnostic F GTGCCACACCTAACCTTCGA
ASF1 Diagnostic R CCGCATCCTTTGGAGTGGAT
ASF1 MX F CTCGAAAGTGTAACAGCGTACTCTCCCTACCATCCAATTGAAACATAA
GATATAGAAAAGCCTTGACAGTCTTGACGTGC
ASF1 MX R ATACATTTTATAAAGTGTACCTCTCTTGCAGGTACCATTAATCTTATAA
CCCATAAATTCCGCACTTAACTTCGCATCTG
CHD1 Diagnostic F CCTCAGCCCATCAATGCGTA
CHD1 Diagnostic R GTATCCAACCAGGCAGGCAT
CHD1 MX F ATTTCTTTAAACCTATACCCAATTCAAAGCAGAACCTTTTCTAATTTAA
TTCTCACTTAT CCTTGACAGTCTTGACGTGC
CHD1 MX R AATACGTTTATAGTTATGGGGGGAAGGAACAATGGAAAATGTGGTGA
AGAAAAATTGTTT CGCACTTAACTTCGCATCTG
CTK1 Diagnostic F CAACTCTTCGACAACTGCGC
CTK1 Diagnostic R TGCTCCTTTGCTGCGTTAGA
CTK1 MX F AGCACTATTCTTTGCACTAGAATAACACAGGGACCATACAGCATAAAT
TATTTGGTAACACCTTGACAGTCTTGACGTGC
CTK1 MX R GTAATAAATAAGTTATTAATCTATTTTTTGTGTCTACTTATTTCAATTG
GCTATATATCCCGCACTTAACTTCGCATCTG
CYC8 Diagnostic F TCCACCGTAGAACCCAAAGC
CYC8 Diagnostic R TAATTGGCGCACAGGAACCT
CYC8 MX F GCTACACAACATTTCTCGTTGATTATAAATTAGTAGATTAATTTTTTGA
ATGCAAACTTTCCTTGACAGTCTTGACGTGC
CYC8 MX R TACAACTACAACAGCAACAACAACAAACAAAACACGACTGGAAAAAA
AAAATTAGGAAAACGCACTTAACTTCGCATCTG
DOT1 Diagnostic F TGGATGAAAGAGCTCTGGCA
DOT1 Diagnostic R AGACCAGGCAGCTGTATTCA
DOT1 MX F AATGGGCGGTCAAGAAAGTATATCAAATAATAACTCAGACTCATTCAT
TATGTCGTCCCCCCTTGACAGTCTTGACGTGC
DOT1 MX R GTGTACATGTTATTTCTACTTAGTTATTCATACTCATCGTTAAAAGCCG
TTCAAAGTGCCCGCACTTAACTTCGCATCTG
EAF3 Diagnostic F TTGCTGCGTCAGAGGGTTAA
EAF3 Diagnostic R TTGCTGCGTCAGAGGGTTAA
EAF3 MX F ATTGCATAAATACGGAAGAACTAAATACTAGAAATAATCCCAAGCTA
GAATATAAACGTCCCTTGACAGTCTTGACGTGC
EAF3 MX R CATTTTGCATTAGCATCTGTGAGGCCTCGTCACTGGATTTACCCTATTG
AAGAACGTATACGCACTTAACTTCGCATCTG
ELP3 Diagnostic F ATCCAAATGACTTCTTATTT
ELP3 Diagnostic R TATACAGCGATAAGACAGTG
ELP3 MX F ATTTAAATTTCTGCTTGGAAAACCGGCCATGTCGGCGGCACATAAAAG
TTCTATTTACCT CCTTGACAGTCTTGACGTGC
ELP3 MX R GTTTTGAAATAAACAAGTCCTAAAAGCACCTAAGGAAAATCGAAGAA
CACCCTGACAAAG CGCACTTAACTTCGCATCTG
ESA1 Diagnostic F GGCCTCTAAATCCTGGCAAT
ESA1 Diagnostic R AGTGCCTGTGTTTGCATTTG
ESA1 MX F TTACCATTCTTTAGACGCTTCCTGTGCTACCATTCTCGGAAATACTGCA
AGAAATCATCG CCTTGACAGTCTTGACGTGC
ESA1 MX R TATTTAAAGCTTTTACATTAGAAGTTGTTTGAATGTAAGTTTAGGAAAG
CACTACATAGC CGCACTTAACTTCGCATCTG
GCN5 Diagnostic F GCACACAAGATGACGCTTCC
GCN5 Diagnostic R CTCGCCATTGTACATTCGGC
GCN5 MX F AAGGGAAGACCGTGAGCCGCCCAAAAGTCTTCAGTTAACTCAGGTTCG
TATTCTACATTA CCTTGACAGTCTTGACGTGC
GCN5 MX R CGTACTAAACATTTATTTCTTCTTCGAAAGGAATAGTAGCGGAAAAGC
TTCTTCTACGCA CGCACTTAACTTCGCATCTG
GIS1 Diagnostic F TGGCGTTTTGTGCTTCAAGG
GIS1 Diagnostic R TGTTTGCGGACGGTATGGAA
GIS1 MX F AACAACATCGTTGTAATTTTTTTTTTTTAATTTGAAGAATAGCTACAAA
AACAGACTACACCTTGACAGTCTTGACGTGC
GIS1 MX R TACAGGAAAATATTCGATAAAAATTTTTTTTGAACCCATTTTGTATATC
ATTTTCTTGACCGCACTTAACTTCGCATCTG
HAT1 Diagnostic F CAAACTGTTTCATGTTAGTG
HAT1 Diagnostic R TAAATTCTTGATCAAATACG
HAT1 MX F TTCTGGAATTGTTTTCAGCAAAATTATGCTTAAGCTATAACTATAGTGA
GAATCAAGAAT CCTTGACAGTCTTGACGTGC
HAT1 MX R GAATTTCTTATTTCAGGCTTGTTAAACAAATAAATATGTTATTATATAT
TTAATAAACAG CGCACTTAACTTCGCATCTG
HAT2 Diagnostic F CTCTTTAACGGCGCCTCAAG
HAT2 Diagnostic R CACGATGGGTAGACTATGGGA
HAT2 MX F TATCTCTCCTATCAATTGTGGTTAGCCTAGTAGTCACCAAATAGCAAAT
TACCAATCAAC CCTTGACAGTCTTGACGTGC
HAT2 MX R TGTATGTTGATCTTTGTTTAATTACGCCTTTTCGCCAAAGAAACAATAA
AAAAACTATAT CGCACTTAACTTCGCATCTG
HDA1 Diagnostic F TTATATTTCCAACACGAATCGAGAACT
HDA1 Diagnostic F TTATATTTCCAACACGAATCGAGAACT
HDA1 MX F ACATAACAAAATATTGAGAAAGGGAAAGTTGAGCACTGTAATACGCC
GAACAGATTAAGCCCTTGACAGTCTTGACGTGC
HDA1 MX R ATTCAACTTTCATAAGGCATGAAGGTTGCCGAAAAAAAATTATTAATG
GCCAGTTTTTCCCGCACTTAACTTCGCATCTG
HHF1 Diagnostic F CAAATTATTCCATCATTAAA
HHF1 Diagnostic R AGTCAAGGAGAGATATTACG
HHF1 MX F CAGTTGAATACGAATCCCAAATATTTGCTTGTTGTTACCGTTTTCTTAG
AATTAGCTAAA CCTTGACAGTCTTGACGTGC
HHF1 MX R TGTACTCTATAGTACTAAAGCAACAAACAAAAACAAGCAACAAATAT
AATATAGTAAAAT CGCACTTAACTTCGCATCTG
HHF2 Diagnostic F AAAAGAACAAGAAAAAGATT
HHF2 Diagnostic R CAACAGATAAATGATGACAA
HHF2 MX F TCTTTTTTCCTACATCTTGTTCAAAAGAGTAGCAAAAACAACAATCAAT
ACAATAAAATA CCTTGACAGTCTTGACGTGC
HHF2 MX R TTTTATTTTTTGAAAGGCATGAAAATAATTTCAAACACCGATTGTTTAA
CCACCGATTGT CGCACTTAACTTCGCATCTG
HHO1 Diagnostic F CCACGTCGTGAACAGACAGT
HHO1 Diagnostic R GCTGTTTGCTTTGATGAAATGCT
HHO1 MX F AAGAAAATAGGTTTGATAGTATTGCTATCACCATTGACATTCTCGTTTG
GATATTCACTT CCTTGACAGTCTTGACGTGC
HHO1 MX R TTATGGGCACCTGATAATGCTTGGCAGCGAGGGAAGCAATTATAATAC
AACTAAAGCAAC CGCACTTAACTTCGCATCTG
HHT1 Diagnostic F AAGCGCTCGGAACAGTTTTA
HHT1 Diagnostic R GACACCCCAACCTACTCCAA
HHT1 MX F TATATTCTTTCTTTCTAGTTAATAAGAAAAACATCTAACATAAATATAT
AAACGCAAACA CCTTGACAGTCTTGACGTGC
HHT1 MX R TGATTTATATTTTATTGTGTTTTTGTTCGTTTTTTACTAAAACTGATGAC
AATCAACAAA CGCACTTAACTTCGCATCTG
HHT2 Diagnostic F TGATTGGTTGTATAAGAAAA
HHT2 Diagnostic R TAAGTAACAGAGTCCCTGAT
HHT2 MX F TGTTTGTATGATGTCCCCCCAGTCTAAATGCATAGAAAAAAAAAAATT
CCCGCTTTATAT CCTTGACAGTCTTGACGTGC
HHT2 MX R ACTTTGGCCCTTCCAACTGTTCTTCCCCTTTTACTAAAGGATCCAAGCA
AACACTCCACA CGCACTTAACTTCGCATCTG
HIR1 Diagnostic F CGCACTTACTCGATCCTGCT
HIR1 Diagnostic R AGCGGGATCAAAAACAACGC
HIR1 MX F ATGAAAGTGGTAAAGTTTCCATGGTTGGCTCACCGTGAAGAATCGCGA
AAATATGAAATACCTTGACAGTCTTGACGTGC
HIR1 MX R ATAAAATATAGACGTAATTATGAGGGAAAAAACTTGTCCAAAGGAAG
GGGTATAAGCTTACGCACTTAACTTCGCATCTG
HMT1 Diagnostic F TTGCTGCAATTCGGATGCTG
HMT1 Diagnostic R GACAGCCGTGAAAGATTCTGC
HMT1 MX F TATGAACAAGTTTGTTTATTTGCTTTTCAAATTTTTTTCTTTCTCCAGCA
AACAAAAGTC CCTTGACAGTCTTGACGTGC
HMT1 MX R CTGCTCACCTTGCCGTTTCCAAAAAAGAGTTAGAACCGACAAATTCAT
CCAAAGAAAATA CGCACTTAACTTCGCATCTG
HOS1 Diagnostic F AGCCGTTGATCACACCGAAA
HOS1 Diagnostic R AAACAGAGGGCAAGCGGATT
HOS1 MX F TTACAGTTCGTAAAACTTCATAAGTTCGACCATATCTCTATCCTTATTG
TCATTCCTTAACCTTGACAGTCTTGACGTGC
HOS1 MX R AAAAAGGTGTATGTACTGTAATATGAATTAATAAACACCTGTCCATTT
TAGAAAAACGCTCGCACTTAACTTCGCATCTG
HOS2 Diagnostic F TCGAATACGGAGTGCACCTT
HOS2 Diagnostic R TCTCGATGTTCTTTGGGGCA
HOS2 MX F GAAAATAAAAAAAAAAAAAAAAAAAAAAACGGGAGATTAACCGAAT
AGCAAACTCTTAAA CCTTGACAGTCTTGACGTGC
HOS2 MX R AAGACGCCAGATTACTCAAGTACGTTAAAATCAGGTATCAAGTGAATA
ACAACACGCAAC CGCACTTAACTTCGCATCTG
HOS3 Diagnostic F AGGGAAAAGAATCGGCTGGG
HOS3 Diagnostic R GTCCCATCACCGTGGTGTAG
HOS3 MX F ACTGAAAATATAACGAAAAAAAGGGCTCTGGAAGTAAACAGAGAAAT
TCGACGATATAATCCTTGACAGTCTTGACGTGC
HOS3 MX R TCACCATCTTCCACCACTTCTTGTTGTATGTTTTCTTGAAACATGAGAA
ATCATTGATATCGCACTTAACTTCGCATCTG
HST1 Diagnostic F TGGAATCGTTGCTGGGTCAG
HST1 Diagnostic R GCCAGTGGAAGTCAACTCGA
HST1 MX F TTACTGTTGTTTCTTTCGTGGCTGTTTCTTAATCTTATACGTACCTTTAT
CTATTTCCGT CCTTGACAGTCTTGACGTGC
HST1 MX R TGGTAGTGATACGAACACTTCTCTTCTTTTTTGTTGTTTTTGTGAGAAA
AAAAAATCTAA CGCACTTAACTTCGCATCTG
HTA1 Diagnostic F GGTTTCTTTTCAGCTGGGGC
HTA1 Diagnostic R TCAACATGTCACCAGTGGCC
HTA1 MX F TTATTTCTCAGTGAATAAACAACTTCAAAACAAACAAATTTCATACAT
ATAAAATATAAA CCTTGACAGTCTTGACGTGC
HTA1 MX R TAGTTACAATGGAGAAGCAGTTTAGTTCCTTCCGCCTTCTTTAAAATAC
CAGAACCGATC CGCACTTAACTTCGCATCTG
HTA2 Diagnostic F GCGCCTTCTATTCCGGAGAA
HTA2 Diagnostic R TGACGGCAAGTGTCTCACTG
HTA2 MX F TACTTTAAAACCCCAAATGACAAGAATGTTTGATTTGCTTTGTTTCTTT
TCAACTCAGTT CCTTGACAGTCTTGACGTGC
HTA2 MX R ATTCTTGTCTTTTTACATAAGAATTAGGAAAGTACAGAACAAGAGCAA
ATTTAATATATA CGCACTTAACTTCGCATCTG
HTB1 Diagnostic F ACGATCCAGTCAGCGACATC
HTB1 Diagnostic R AGCCGAAAAGAAACCAGCCT
HTB1 MX F TTAATTTTTATATACCCATATAAATAATAATATTAATTATAACCAAAGG
AAGTGATTTCACCTTGACAGTCTTGACGTGC
HTB1 MX R TTATATTAAATTTATCCTATATAGACAAGTCAAACCACAAATAAACCA
TACACACATACACGCACTTAACTTCGCATCTG
HTB2 Diagnostic F ATGGCCCCCAGGTTAATGTG
HTB2 Diagnostic R ACAGCCCTAGTACCTTCGGA
HTB2 MX F TCTTCTTGTTAATTTTTTCTGATTGCTCTATACTCAAACCAACAACAAC
TTACTCTACAACCTTGACAGTCTTGACGTGC
HTB2 MX R TATAAAAAATGCCACTAATAAAAAGAAAACATGACTAAATCACAATA
CCTAGTGAGTGACCGCACTTAACTTCGCATCTG
HTZ1 Diagnostic F CGTTCGTGGAACAGTGAGGA
HTZ1 Diagnostic R GCAACGCACAAAGCTTCGTA
HTZ1 MX F ATAGAATGAGGATACAGGAGCAGGGAGAATTACGGGAAATGGGAAA
GAAAAACTATTCTTCCTTGACAGTCTTGACGTGC
HTZ1 MX R GAAAAAATATCGTTAAATTCAATTTCGCACTATAGCCGCACGTAAAAA
TAACTTAACATACGCACTTAACTTCGCATCTG
INO80 Diagnostic F AGCTTTGGAAAACGGCGAAA
INO80 Diagnostic R ATACTGGATGAGGCCCAAGC
INO80 MX F AGCAGATTAAAGATAGACATTAACTCCGCTTAATGTAAATAACACAAT
ATGAATACCTTTCCTTGACAGTCTTGACGTGC
INO80 MX R ACCGATCCTGTCCATATTAGCAAAGCAAGGCTTAAGACATATAGAAGA
GCATTTATAGACCGCACTTAACTTCGCATCTG
ISW1 Diagnostic F CGGCCGGCCATATCTAGAC
ISW1 Diagnostic R CAACCCCACCAAACGTGAAC
ISW1 MX F TTTTCTTCAGAAGCATGGTGTAGGATATATTAAAAAAAATCGAAATAT
AAAAAAAGAAGG CCTTGACAGTCTTGACGTGC
ISW1 MX R AGAGTCCATATGTATAGCTATGCAAAAACCAGCTAGAGGTGGATGTAG
AAATACCCTATT CGCACTTAACTTCGCATCTG
JHD1 Diagnostic F CCTGTGGTGGCATTCATCTTG
JHD1 Diagnostic R1 GTGATTGAACGTCCATTACATCA
JHD1 MX F ATTATAATGAGTAAGAAGACGTAATGATCATAAAACAAAATACTAAT
AAGCTATGGTGCACCTTGACAGTCTTGACGTGC
JHD1 MX R TTACGAAATAAGATCGGCTAAAGCTTTCGCTAAATGCTTCTTTGAAGT
GAAATTTAACGGCGCACTTAACTTCGCATCTG
JHD2 Diagnostic F GTGGAGTTCAAGGCTTGAGC
JHD2 Diagnostic R TGTGCATGTTGACAACGCAG
JHD2 MX F GGCGTGCTAAATGCCAAGTATTATTCTAAAAAATCATTACGCCATACA
CAAATATTGAAGCCTTGACAGTCTTGACGTGC
JHD2 MX R CAATTTTACCTCTAGATCATATTAACTAATCTCATCTTGCACAAAAAAC
GTATCACTATCCGCACTTAACTTCGCATCTG
KanMX check F CAGATGCGAAGTTAAGTGCG
KanMX check R GCACGTCAAGACTGTCAAGG
NAT4 Diagnostic F GTATGCCTGGAGTATGGCAGT
NAT4 Diagnostic R AAGCGCCTCATATAGTCGCC
NAT4 MX F CGCGCCATTAAAAGATTTTTTTCTTAGCTCTTTTTCTTTTTTCTTTTTCTT
TCCACTGAG CCTTGACAGTCTTGACGTGC
NAT4 MX R GCCGCGTACGTAGTTTTACTTTTAATTTTTTTTTTATCGCGCGTTGTCCC
TGTCGGCTTT CGCACTTAACTTCGCATCTG
RLF2 Diagnostic F GCTGGATCAAGTGGTTCCCT
RLF2 Diagnostic R GCACGCCTTTGTCTTTCCTC
RLF2 MX F CAGAGAATTATATGTTTTAGTGAACCTCAAGACAGAAGAGAATCGAA
AGGAAAAGGGAAACCTTGACAGTCTTGACGTGC
RLF2 MX R TGTATACCAATAAATAATCAGTTTATCTGTATGTTTCTATATACTAAAG
ATCCGTTCAAGCGCACTTAACTTCGCATCTG
RPD3 Diagnostic F ACTGGTTTTGTACAGCGCTG
RPD3 Diagnostic R ACTGGTTTTGTACAGCGCTG
RPD3 MX F TTCACTTTTCTTCTTTTGTTTCACATTATTTATATTCGTATATACTTCCA
ACTCTTTTTTCCTTGACAGTCTTGACGTGC
RPD3 MX R GGTTCATAAAACAATTGCGCCATACAAAACATTCGTGGCTACAACTCG
ATATCCGTGCAGCGCACTTAACTTCGCATCTG
RPH1 Diagnostic F CATCGCCATGCAATTAATCA
RPH1 Diagnostic R ACCACATGGCAGATGGTTTT
RPH1 MX F AAAAAAAAGGGAAAAAAAGAATAAGACTGTCTTGGTGAGGATATTCA
GTTGCGTGAAATC CCTTGACAGTCTTGACGTGC
RPH1 MX R CGAGCACATTTTAAGAGCCTTCAAAATGAGAGATCTCGGTAAACAACT
GGCAATCGTGAG CGCACTTAACTTCGCATCTG
RSC4 Diagnostic F CGGAAGACAAGTAGCCCAAGA
RSC4 Diagnostic R CGCCGCTGTAAAACTATCGC
RSC4 MX F ACCCTCAGCCTGTTATTACAATAGAGAGCAAACCAAAGAATAATAGAT
AAAGTACACAGACCTTGACAGTCTTGACGTGC
RSC4 MX R ATATAGGTTGTATATAGATACATGCATATGATGGGAAGACTATGAAGA
GAGAGATAGTCACGCACTTAACTTCGCATCTG
RTT109 Diagnostic F TCGAGGTGTTTCGTCATCGG
RTT109 Diagnostic R GATGTTGCTTGCAGGAACCG
RTT109 MX F TTTGTCAATAGAGTTGTCCAGTAGAGTTAAAAGGTCAATTCAACCGGT
CTTCAATAAGAC CCTTGACAGTCTTGACGTGC
RTT109 MX R CATGCATTTTCTAAGATCGATGCTACATACGTGTACTAAATAATAAAT
ATCAATATGTAT CGCACTTAACTTCGCATCTG
SAS2 Diagnostic F CATCGAAAAACCGGCCCAAA
SAS2 Diagnostic R TACACGGATGATCAGACGCG
SAS2 MX F CATCGAGCGATATTCTATCCTGAAATACATATGCCATTAAGTTACATCC
TGAATAGATTCCCTTGACAGTCTTGACGTGC
SAS2 MX R ATTTTTTGATATTGGAGGCTCCTATTTTCTAGTTGCTTTTTGTTTTCACT
CGCAAAAAAACGCACTTAACTTCGCATCTG
SAS3 Diagnostic F ACCATTTGTCGCCGCTAGAA
SAS3 Diagnostic R TCGCCATAAAACACGCTCAT
SAS3 MX F CTTATTGCTATTAATAATGTTACATGTATATGCTTATATCCAATATATA
CCCATCGCCGCCCTTGACAGTCTTGACGTGC
SAS3 MX R AAAAGCATTGCTATTCTTTTCTCATAGGTGTTATTCATACCGCCCTCTC
TCTTCTTCCTTCGCACTTAACTTCGCATCTG
SET1 Diagnostic F AAATATCCGCTCGACCAGTCC
SET1 Diagnostic F AAATATCCGCTCGACCAGTCC
SET1 MX F CTAGCATAGGTAACATTCCTTATTTGTTGAATCTTTATAAGAGGTCTCT
GCGTTTAGAGACCTTGACAGTCTTGACGTGC
SET1 MX R TTTGCTGGAAAGCAACGATATGTTAAATCAGGAAGCTCCAAACAAATC
AATGTATCATCGCGCACTTAACTTCGCATCTG
SET2 Diagnostic F ACTAGTCAACGACGCTGACC
SET2 Diagnostic R GCCTGGCGCTTTAGACTCTA
SET2 MX F ACAAGACTTCCTTTGGGACAGAAAACGTGAAACAAGCCCCAAATATG
CATGTCTGGTTAACCTTGACAGTCTTGACGTGC
SET2 MX R AAAACTGCATAGTCGTGCTGTCAAACCTTTCTCCTTTCCTGGTTGTTGT
TTTACGTGATCCGCACTTAACTTCGCATCTG
SET3 Diagnostic F GCACTAGGCACCAGTGTTGA
SET3 Diagnostic R TCCTTAATCGAAATAATGGTCCAAAA
SET3 MX F TGAATATTCACTTTTGAATATACTTAAGTTTATATAGGTGTAAGAAGG
AAATGTCCATGTCCTTGACAGTCTTGACGTGC
SET3 MX R GATTTAAAGCGTATATACAACAGTTTTAGATCGTACTTCACAAAATAC
GAGAACTGAATCCGCACTTAACTTCGCATCTG
SET4 Diagnostic F ACATGTAGAGGTCACCGGGA
SET4 Diagnostic R GGAGTCACTTGAACCGCAGA
SET4 MX F AACGCCGGAATAAGATTGGTACCCTCGTCAGAAAGTTACAAATACCGC
TTCATCTTCAAACCTTGACAGTCTTGACGTGC
SET4 MX R ATGATAAAATTAAGCTTTCAAAAAAGATTAAAATGAATACTATTAATT
TTAAAATTTCGTCGCACTTAACTTCGCATCTG
SIR2 Diagnostic F CTTAACACATTTAAACCATG
SIR2 Diagnostic R AGCTCTAATTTGAAAGAAAT
SIR2 MX F TTGCCATACTATGTAAATTGATATTAATTTGGCACTTTTAAATTATTAA
ATTGCCTTCTA CCTTGACAGTCTTGACGTGC
SIR2 MX R AGGCATCGCTTCGGTAGACACATTCAAACCATTTTTCCCTCATCGGCA
CATTAAAGCTGG CGCACTTAACTTCGCATCTG
SNF1 Diagnostic F GCAAAAGGATGGGCGTGATG
SNF1 Diagnostic F2 GCAAAAGGATGGGCGTGATG
SNF1 MX F GAAAGAAATAGAAGTTTTTTTTTGTAACAAGTTTTGCTACACTCCCTTA
ATAAAGTCAACCCTTGACAGTCTTGACGTGC
SNF1 MX R AAATACGTTACGATACATAAAAAAAAGGGAACTTCCATATCATTCTTT
TACGTTCCACCACGCACTTAACTTCGCATCTG
SNF2 Diagnostic F TGTGTTGCTAGCAGGGTGTT
SNF2 Diagnostic R CTTGATTATGTCCGCACGCC
SNF2 MX F GAGGGATTAATGTTTGTCTACGTATAAACGAATAAGTACTTATATTGC
TTTAGGAAGGTACCTTGACAGTCTTGACGTGC
SNF2 MX R ATGAACATACCACAGCGTCAATTTAGCAACGAAGAGGTCAACCGCTG
CTATTTAAGATGGCGCACTTAACTTCGCATCTG
SNF5 Diagnostic F CAGGTGCTTGAAGGAGGGAG
SNF5 Diagnostic R ACTGTTGTTGCTGTTGTCGC
SNF5 MX F AAACACCAAAACAAAGCATCATCAAGGGAACATATAGTAAAGAACTA
CACAAAAGCAACACCTTGACAGTCTTGACGTGC
SNF5 MX R TACAAATTCTTCCACGGTTATTTACATCTCCGGTATATTTTATATATGT
GTATATATTTTCGCACTTAACTTCGCATCTG
SWR1 Diagnostic F ACCCTTGTCTTTTGCAGCCT
SWR1 Diagnostic R TGAACTGCGAACCTGGCTAG
SWR1 MX F TGAAAATTTATGAATTCTAACTGCTCTTTGCATTTTCCAAGTTATTGCA
TTACAAGAATACCTTGACAGTCTTGACGTGC
SWR1 MX R TCAATAATAATAACCGTTGGCAATAAACCTGATCATGTACTCGTCAAC
ATGGGCAGTGCCCGCACTTAACTTCGCATCTG
TUP1 Diagnostic F AAGCTCTCCCGTCAAAGCAA
TUP1 Diagnostic R AGGAAAAGGAGGGGAAGGGA
TUP1 MX F TTTTTTTGTCTTTTTTGATAAGCAGGGGAAGAAAGAAATCAGCTTTCCA
TCCAAACCAATCCTTGACAGTCTTGACGTGC
TUP1 MX R ATGAATTGAAGAATAGTTTAGTTAGTTACATTTGTAAAGTGTTCCTTTT
GTGTTCTGTTCCGCACTTAACTTCGCATCTG
Ctk1_ChrXI_check1_F CCGCATTTACTGCACATCC
Ctk1_ChrXI_check1_R GAAATTATACGCGGCAAGGA
Ctk1_ChrXI_check2_F GCACCACACTAGCCTTCGAT
Ctk1_ChrXI_check2_R CCAAAAGCAATCCAGGAAAA
Ctk1_ChrXI_check3_F TTGTGGATTTCGGAGAAAGG
Ctk1_ChrXI_check3_R AGGGAACACTCCGTCTGAAA
Ctk1_ChrXI_check_control1_F GCACCCGAAAAATTAACTTGA
Ctk1_ChrXI_check_control1_R TGGGATCAGAACAATCATTACAA
Gcn5_ChrIV_check1_F GAACTGGCTCTCCCACAGTC
Gcn5_ChrIV_check1_R GGACGAATCAACGGAGGGTT
Gcn5_ChrIV_check2.1_F TCCGAATTGCCCGAAAATGC
Gcn5_ChrIV_check2.1_R GACACCTTACGTTCTCGCCT
Gcn5_ChrIV_check2.2_F AGGGTAATTCATCCTTCCACCA
Gcn5_ChrIV_check2.2_R TCGAAAGCAAACTTGTGAAGGC
Gcn5_ChrIV_check2.3_F TGCTTTTTCCCATCCCTGCA
Gcn5_ChrIV_check2.3_R TCGCCGTTGTTCAAGACCTT
Gcn5_ChrIV_check3_F CCCAAACTGATTGCCAAGCT
Gcn5_ChrIV_check3_R TGGCACTAGCTGGTAGTAGC
Gcn5_ChrXII_check1_F TCGGCCCAAACACCAGATTT
Gcn5_ChrXII_check1_R CCCCTTTTCCTTCCTCGTCC
Gcn5_ChrIV_check_control_F CATGGTCCAAGGTTGCAAGT
Gcn5_ChrIV_check_control_R CCCGCCATCAATCATTGTGC
Gcn5_ChrXII_check_control_F GTGCCATTCAAGTGCCCAATT
Gcn5_ChrXII_check_control_R ACTTTCCCATCCGCATAAAGA
Table S2.12 List of all primers used in this study.
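The MX F and MX R primers in Table S2.12 appear to end in one of two shared 20-nt tails, and the two KanMX check primers are the reverse complements of those tails. The Python sketch below illustrates this relationship and splits a deletion primer into its gene-specific arm and shared tail. The function names are hypothetical, and reading the tails as cassette-priming sequences is an assumption drawn from the table layout, not a description of the cloning protocol.

```python
# Shared 3' tails observed on the MX F and MX R primers in Table S2.12.
MX_F_TAIL = "CCTTGACAGTCTTGACGTGC"
MX_R_TAIL = "CGCACTTAACTTCGCATCTG"

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_mx_primer(primer: str, tail: str) -> tuple[str, str]:
    """Split a deletion primer into (gene-specific arm, shared tail).

    Hypothetical helper: spaces are removed because the table wraps
    some sequences with a space between the arm and the tail.
    """
    primer = primer.replace(" ", "")
    if not primer.endswith(tail):
        raise ValueError("primer does not end with the expected tail")
    return primer[: -len(tail)], tail


# ASF1 MX F as listed in the table (two wrapped lines joined).
asf1_mx_f = (
    "CTCGAAAGTGTAACAGCGTACTCTCCCTACCATCCAATTGAAACATAA"
    "GATATAGAAAAGCCTTGACAGTCTTGACGTGC"
)
arm, tail = split_mx_primer(asf1_mx_f, MX_F_TAIL)
print(len(arm), tail)

# The KanMX check primers read outward from the same tails.
print(reverse_complement(MX_R_TAIL))  # matches "KanMX check F"
print(reverse_complement(MX_F_TAIL))  # matches "KanMX check R"
```

The same split can be applied to any MX primer in the table by passing the matching tail.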
Supplementary Notes
Supplementary Note 2.1 Conditional essentiality of esa1Δ segregants. Esa1, the catalytic subunit of the
NuA4 HAT complex, is essential in the BY background, with esa1Δ spores only able to divide four or five
times following germination before dying [4]. However, we found that roughly 1% of the esa1Δ knockout
segregants generated from the ESA1 hemizygous diploids survived in our study. To test for selection
on specific genotypes in the esa1Δ segregants, we examined allele frequency across all SNP markers (Fig.
S2.3a). All esa1Δ segregants carried the 3S genotype from positions 467,219 bp to 472,584 bp on
Chromosome XIV, a region centered on END3, which encodes an EH-domain-containing protein that functions in
endocytosis and actin cytoskeleton formation. Two lines of evidence suggest that END3 is the causal gene
at this locus: end3Δ and a temperature-sensitive allele of ESA1 were previously found to be synthetic lethal
in the BY background [5], and END3 is a major contributor to trait variation in the BYx3S cross [2,3,6,7]. We
note that this Chromosome XIV locus was also identified as having a large additive effect in other knockout
and wild type populations. We did not observe any other fixed regions in esa1Δ segregants. This implies
that conditional essentiality of ESA1 may depend on the cumulative effect of the Chromosome XIV region
and many small-effect variants.
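The allele frequency scan described in this note can be sketched as follows. This is a minimal illustration, assuming genotypes are coded 0 for BY and 1 for 3S; the function names and the toy matrix are hypothetical and are not taken from the analysis code used in the study.

```python
def allele_frequencies(genotypes):
    """Fraction of segregants carrying the 3S allele at each marker.

    genotypes: list of per-segregant lists, coded 0 = BY and 1 = 3S
    (an assumed coding for this sketch).
    """
    n = len(genotypes)
    n_markers = len(genotypes[0])
    return [sum(row[j] for row in genotypes) / n for j in range(n_markers)]


def fixed_markers(freqs, tol=0.0):
    """Indices of markers where one parental allele is (nearly) fixed."""
    return [j for j, f in enumerate(freqs) if f <= tol or f >= 1.0 - tol]


# Toy data: 4 segregants x 5 markers; markers 2 and 3 are fixed for 3S.
toy = [
    [0, 1, 1, 1, 0],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
freqs = allele_frequencies(toy)
print(freqs)
print(fixed_markers(freqs))
```

A run of adjacent fixed markers, as in the toy example, is the kind of signal that would point to a region under selection among surviving segregants.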
Supplementary Note 2.2 All major results are robust to different significance thresholds. Significance
thresholds can affect the results and interpretations of genetic mapping studies, especially those focused on
genetic interactions. To address this possibility, we repeated our analyses across a number of different False
Discovery Rate (FDR) thresholds (Methods). Although the choice of threshold affected the number of genetic
effects that were detected (Table S2.7), all of the major results remained the same regardless of
threshold (Figs. S2.5 and S2.6; Tables S2.8 and S2.9). This implies that our main conclusions are robust to
the choice of threshold.
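To illustrate how the number of detected effects can change with the FDR threshold, the sketch below applies the Benjamini-Hochberg step-up procedure at several thresholds. The use of Benjamini-Hochberg is an assumption made for illustration (the actual FDR procedure is described in the Methods), and the p-values are invented toy values.

```python
def benjamini_hochberg(pvalues, fdr):
    """Return indices of tests declared significant at the given FDR.

    Benjamini-Hochberg step-up rule, used here as an assumed stand-in
    for the FDR procedure described in the Methods.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= fdr * rank / m:
            cutoff_rank = rank  # keep the largest rank passing the rule
    return sorted(order[:cutoff_rank])


# Hypothetical p-values for six candidate genetic effects.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
for fdr in (0.01, 0.05, 0.10):
    print(fdr, benjamini_hochberg(toy_p, fdr))
```

In this toy example the set of significant tests grows as the threshold is relaxed, which is the behavior the robustness analysis is designed to account for.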
Supplementary Note 2.3 All major results are also robust to bias in allele combinations. In the paper, we
show that mutation-independent effects tend to be genetically simpler, while mutation-responsive effects
tend to be more genetically complex. This pattern could be a technical artifact driven by allele frequency
differences between the different knockout and wild type versions of the BYx3S cross, which have the
potential to cause both false negatives and false positives. To examine this possibility, we excluded loci that
show biased individual or multi-locus allele frequencies from our analyses. After exclusion, we found that
mutation-responsive effects still involve more higher-order epistasis than mutation-independent effects
(Table S2.10). This difference was significant for both two-locus and three-locus mutation-responsive
effects using chi-square tests (p = $6.731 \times 10^{-19}$ and $5 \times 10^{-30}$, respectively). Additionally,
similar to the analyses done with different significance thresholds, we find that our conclusions remain
qualitatively the same whether we include or exclude loci that exhibit allele frequency bias (Figs. S2.5 and
S2.6; Tables S2.8 and S2.9). This implies that our major findings are not the result of
technical artifacts.
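A chi-square comparison of this kind can be sketched for a single 2x2 table as shown below. The layout of the table (effect class by presence of higher-order epistasis) and the counts are hypothetical; the actual contingency tables behind the reported p-values are not given in this note.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and p-value (1 d.f.) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For one degree of freedom the chi-square tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical counts:
#                         no higher-order   higher-order
# mutation-independent          30               10
# mutation-responsive           10               30
stat, p_value = chi_square_2x2(30, 10, 10, 30)
print(stat, p_value)
```

No continuity correction is applied in this sketch, so results for small counts may differ from software that applies one by default.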
Chapter 3 Genetic basis of a spontaneous mutation’s expressivity
This chapter is currently in review at Genetics.
3.1 Abstract
Genetic background often influences the phenotypic consequences of mutations, resulting in
variable expressivity. How standing genetic variants collectively cause this phenomenon is not
fully understood. Here, we comprehensively identify loci in a budding yeast cross that impact the
growth of individuals carrying a spontaneous missense mutation in the nuclear-encoded
mitochondrial ribosomal gene MRP20. Initial results suggested that a single large effect locus
influences the mutation’s expressivity, with one allele causing inviability in mutants. However,
further experiments revealed this simplicity was an illusion. In fact, many additional loci shape the
mutation’s expressivity, collectively leading to a wide spectrum of mutational responses. These
results exemplify how complex combinations of alleles can produce a diversity of qualitative and
quantitative responses to the same mutation.
3.2 Introduction
Mutations frequently exhibit different effects in genetically distinct individuals (or
‘background effects’) [1,2,30]. For example, not all people with the same disease-causing mutations
manifest the associated disorder or exhibit identical symptoms. A commonly observed form of
background effect among individuals carrying the same mutation is different degrees of response
to that mutation (or ‘variable expressivity’) [81]. Variable expressivity can arise due to a myriad of
reasons, including genetic interactions (or epistasis) between a mutation and segregating loci [2],
dominance [2], stochastic noise [82], the microbiome [83], and the environment [2].
The role of epistasis in expressivity has proven especially difficult to study, in part because
natural populations harbor substantial genetic diversity, which can facilitate complex genetic
interactions between segregating loci and mutations [9,10,14,29,32,34,51,61,63,64,84–87]. Mapping the loci
involved in these interactions is technically challenging. However, controlled laboratory crosses
provide a powerful tool for identifying the loci that interact with particular mutations, giving rise
to background effects [9,32,51,85,86,88].
In this paper, we use a series of controlled crosses in the budding yeast Saccharomyces
cerevisiae to comprehensively characterize the genetic basis of a mutation’s expressivity. We
focus on a missense mutation in MRP20, an essential nuclear-encoded subunit of the mitochondrial
ribosome [89]. This mutation occurred by chance in a cross between the reference strain BY4716
(‘BY’) and a clinical isolate 322134S (‘3S’), and was found to show variable expressivity among
BYx3S cross progeny. This presented an opportunity to determine how loci segregating in the
BYx3S cross individually and collectively influence this mutation’s expressivity.
3.3 Results
3.3.1 A spontaneous mutation increases phenotypic variance in the BYx3S cross
In the BY/3S diploid progenitor of haploid BYx3S segregants, a spontaneous mutation
occurred in a core domain of Mrp20 that is conserved from bacteria to humans (Fig. 3.1A, Fig.
S3.1, Table S3.1) [89,90]. This mutation resulted in an alanine to glutamate substitution at amino acid
105 (mrp20-105E) and showed variable expressivity among segregants carrying it. Specifically,
segregants with this mutation showed increased phenotypic variance relative to wild type
segregants when ethanol was provided as the carbon source, the condition used hereafter (Fig.
3.1B; Levene’s test, p = $5.9 \times 10^{-22}$). Mutant segregants exhibited levels of growth ranging from
inviable to wild type, and fit a bimodal distribution with modes centered at 10% and 57% growth relative
to the haploid BY parent strain (bimodal fit log likelihood = 30; Fig. S3.2).
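The variance comparison above relies on Levene's test. As a rough illustration of what that test measures, the sketch below computes the mean-centered Levene W statistic for two groups of growth values. The growth values are invented toy numbers, the function name is hypothetical, and converting W to a p-value (which requires the F distribution) is left out.

```python
from statistics import fmean


def levene_w(*groups):
    """Mean-centered Levene W statistic for equality of variances."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group mean.
    z = [[abs(x - fmean(g)) for x in g] for g in groups]
    z_means = [fmean(zi) for zi in z]
    z_grand = fmean([v for zi in z for v in zi])
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Hypothetical relative growth values for wild type and mutant segregants.
wild_type = [0.9, 1.0, 1.1, 1.0]
mutant = [0.1, 0.5, 0.9, 0.5]
w = levene_w(wild_type, mutant)
print(w)
```

Larger values of W indicate a larger difference in spread between the groups; some software centers on the group median instead of the mean, which gives a slightly different statistic.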
Figure 3.1 The mrp20-105E mutation occurred spontaneously, increasing phenotypic variance in
the BYx3S cross.
(A) A spontaneous mutation in a BY/3S diploid gave rise to a BYx3S segregant population in
which mrp20-105E segregated. (B) The mrp20-105E segregants exhibited increased phenotypic
variance and a bimodal distribution of phenotypes. Throughout the paper, blue and orange are used
to denote BY and 3S genetic material, respectively. All growth data presented in the paper are
measurements of colonies on agar plates containing rich medium with ethanol as the carbon source.
3.3.2 A large effect locus shows epistasis with mrp20-105E
Loci contributing to this variable expressivity should be detectable through their genetic
interactions with MRP20. To find such loci, we performed linkage scans for two-way epistasis
with MRP20. We identified a single locus on Chromosome XIV (ANOVA, interaction term p =
$4.3 \times 10^{-16}$; Fig. 3.2A). Individuals with XIV^BY showed reduced growth among both MRP20 and
mrp20-105E segregants, but to a greater degree among the latter (Fig. 3.2B). The Chromosome
XIV locus explained 79% of the phenotypic variance among mrp20-105E segregants (ANOVA, p
= $3.2 \times 10^{-31}$) and accounted for all observed cases of inviability (Fig. 3.2B).
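The two-way interaction described above can be pictured as a difference of differences between genotype-class means. The sketch below computes that contrast for the four MRP20 by Chromosome XIV classes. It is an illustration only: the growth values are invented, the function name is hypothetical, and the study's significance testing used ANOVA, not this contrast alone.

```python
from statistics import fmean


def interaction_contrast(growth_by_class):
    """Difference of differences across the four two-locus genotype classes.

    growth_by_class maps (mrp20_allele, xiv_allele) -> list of growth values.
    A value of zero means the XIV effect is the same in both MRP20 backgrounds.
    """
    m = {key: fmean(vals) for key, vals in growth_by_class.items()}
    effect_in_wt = m[("MRP20", "3S")] - m[("MRP20", "BY")]
    effect_in_mut = m[("mrp20-105E", "3S")] - m[("mrp20-105E", "BY")]
    return effect_in_mut - effect_in_wt


# Hypothetical growth values (relative to the BY parent) per genotype class.
toy = {
    ("MRP20", "BY"): [0.9, 1.0],
    ("MRP20", "3S"): [1.0, 1.1],
    ("mrp20-105E", "BY"): [0.05, 0.15],
    ("mrp20-105E", "3S"): [0.5, 0.6],
}
contrast = interaction_contrast(toy)
print(contrast)
```

A positive contrast in this toy example corresponds to the pattern reported in the text, where the XIV^BY allele reduces growth in both backgrounds but more strongly among mrp20-105E segregants.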
Figure 3.2 Epistasis between MRP20 and MKT1 appears to mostly explain response to mrp20-105E.
(A) Linkage mapping in the BYx3S segregants shown in Fig. 3.1 identified a locus on Chromosome
XIV that exhibits a two-way genetic interaction with MRP20. (B) The Chromosome XIV locus
had effects in both MRP20 and mrp20-105E segregants but had a greater effect among mrp20-105E
segregants. (C) To identify the causal gene, we crossed two mrp20-105E F2 segregants that
differed at the Chromosome XIV locus and gathered a panel of F3 segregants. (D) Linkage
mapping in the F3 segregants identified the Chromosome XIV locus at high resolution, with a peak
at position 467,219. Tick marks denote every 100,000 bases along the chromosome. (E)
Recombination breakpoints in the F3 segregants delimited the Chromosome XIV locus to a single
SNP in MKT1 at position 467,219. Vertical dashed line highlights the delimited causal
polymorphism, while small vertical lines along the x-axis indicate different SNPs in the window
that is shown. (F) Engineering the BY allele into mrp20-105E XIV^3S segregants changed growth
(left), while substitutions at the nearest upstream and downstream variants did not (right).
To further resolve the Chromosome XIV locus, we crossed an mrp20-105E XIV^BY F2
segregant and an mrp20-105E XIV^3S F2 segregant (Supplementary text 3.2; Fig. 3.2C, Table S3.1).
A total of 361 F3 progeny were genotyped by low-coverage whole genome sequencing and phenotyped for
growth. Linkage mapping with these data reidentified the Chromosome XIV locus at a p-value of
$2.50 \times 10^{-43}$ (ANOVA; Fig. 3.2D, Fig. S3.3) and resolved it to a single SNP in the coding region
of MKT1 (Fig. 3.2E). This SNP, which encodes a glycine in BY and a serine in 3S at amino acid
30, was then validated by nucleotide replacement in mrp20-105E segregants (Fig. 3.2F). Notably,
this specific SNP was previously shown to play a role in mitochondrial genome stability [91],
suggesting that epistasis between MRP20 and MKT1 involves mitochondrial dysfunction, which impairs
growth on non-fermentative carbon sources such as ethanol.
3.3.3 Epistasis between MRP20 and MKT1 differs in cross parents and segregants
We attempted to validate the epistasis between MRP20 and MKT1 by introducing all four
possible combinations of the causal nucleotides at these two genes into haploid versions of both
BY and 3S (Fig. 3.3A). The mrp20-105E mutation affected growth in both parent strains
(ANOVA, p = $4.3 \times 10^{-24}$ and p = $4.0 \times 10^{-4}$). However, the magnitude of the effect differed
between the two: mrp20-105E caused inviability in BY but had a more modest effect in 3S. In
addition, MKT1 influenced response to mrp20-105E in the 3S background (ANOVA, p = 0.01) but
not in the BY background (ANOVA, p = 0.99).
Figure 3.3 Additional loci govern response to the mutation.
(A) We engineered all combinations of MRP20 and MKT1 into the BY and 3S cross parents.
Expected phenotypes are shown as shaded boxes denoting 95% confidence interval based on the
originally obtained segregant phenotypes. (B) We generated BYx3S crosses in which all
segregants carried mrp20-105E. Two crosses were performed: one in which all segregants carried
MKT1^BY and one in which all segregants carried MKT1^3S. Tetrads were dissected and spores were
phenotyped for growth on ethanol. (C) Each of the new crosses showed increased growth that
extended from inviable to wild type, differing from the more qualitative bimodal phenotypes seen
among the original mrp20-105E MKT1 segregant populations. (D) Linkage mapping identified a
total of 16 loci that influenced growth. After four iterations of a forward regression, no additional
loci were identified. (E) Inviable segregants were present among all mrp20-105E MKT1 SAL1
genotype classes. (F) Aneuploid individuals with duplicated Chromosome II showed reduced
growth. Aneuploid individuals were not evenly detected across the different MKT1-SAL1 genotype
classes.
The phenotypic consequence of epistasis between MRP20 and MKT1 differed between
parent and segregant strains. Specifically, the phenotypes of BY mrp20-105E MKT1^3S, 3S
mrp20-105E MKT1^BY, and 3S mrp20-105E MKT1^3S all differed from the expectations established by
BYx3S mrp20-105E segregants. These departures from expectation imply that additional
unidentified loci also influence response to mrp20-105E.
3.3.4 Fixation of mrp20-105E and MKT1 genotypes increases phenotypic variance
To enable the identification of other loci underlying response to mrp20-105E, we generated
two new BYx3S crosses (Fig. 3.3B, Table S3.1). In both crosses, the BY and 3S parents were
engineered to carry mrp20-105E. Furthermore, one cross was engineered so that both parents
carried MKT1^BY and the other cross was engineered so that both parents carried MKT1^3S. By
altering the parent strains in this manner, we increased the chance of detecting additional loci
contributing to the variable expressivity of mrp20-105E. From these engineered crosses, 749 total
segregants were obtained through tetrad dissection, genotyped by low-coverage genome
sequencing, and phenotyped for growth on ethanol.
The new crosses exhibited continuous ranges of phenotypes, in contrast to the bimodal
phenotypic distribution observed in the original segregants (Fig. 3.3C). In both the MKT1^BY and
MKT1^3S crosses, mrp20-105E segregants ranged from inviable to nearly wild type. The
distributions of phenotypes in the two crosses differed in a manner consistent with their MKT1
alleles, with the mean of the MKT1^BY segregants lower than that of the MKT1^3S segregants (t-test,
p = $4.8 \times 10^{-34}$). These data show that regardless of the MKT1 allele present, additional loci can cause
mrp20-105E to show phenotypic effects ranging from lethal to benign.
3.3.5 Many additional loci affect the expressivity of mrp20-105E
We used the new crosses to map other loci contributing to response to mrp20-105E.
Excluding MKT1, which explained 18% of the phenotypic variance in the new crosses, linkage
mapping identified 16 new loci (Fig. 3.3D, Fig. S3.4, and Table S3.2). We found no evidence for
genetic interactions among the loci (pairs and trios examined with fixed effects linear models,
Bonferroni threshold).
Of the new loci, the BY allele was inferior at 10 and superior at six. These loci individually
explained between 0.79% and 14% of the phenotypic variance in the new crosses. Thirteen of these loci
resided on a subset of chromosomes but were distantly linked: four on Chromosome XII, three
on XIII, two on XIV, and four on XV. The three remaining loci were detected on Chromosomes
IV, VII, and XI.
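The per-locus figures above are fractions of phenotypic variance explained. As a rough illustration of how such a fraction can be computed for one biallelic locus, the sketch below takes the between-genotype sum of squares over the total sum of squares. The genotype labels and growth values are invented, the function name is hypothetical, and the study's estimates may have come from a model containing other terms.

```python
from statistics import fmean


def variance_explained(genotypes, phenotypes):
    """R^2 of a one-way model: between-genotype sum of squares over total."""
    grand = fmean(phenotypes)
    ss_total = sum((y - grand) ** 2 for y in phenotypes)
    groups = {}
    for g, y in zip(genotypes, phenotypes):
        groups.setdefault(g, []).append(y)
    ss_between = sum(len(ys) * (fmean(ys) - grand) ** 2 for ys in groups.values())
    return ss_between / ss_total


# Hypothetical alleles at one locus and matching relative growth values.
alleles = ["BY", "BY", "3S", "3S"]
growth = [0.2, 0.4, 0.6, 0.8]
r2 = variance_explained(alleles, growth)
print(r2)
```

In this toy example the BY allele is the inferior one, since segregants carrying it have the lower mean growth.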
Recombination breakpoints delimited the loci to small genomic intervals spanning one (12
loci), two (3 loci) or three (1 locus) genes (Table S3.3). These candidate genes functioned in many
compartments of the cell and implicated a diversity of cellular pathways and processes in the
expressivity of mrp20-105E (Table S3.4). Thus, the molecular basis of mrp20-105E’s expressivity
is complex.
3.3.6 The Chromosome XIV locus contains multiple causal variants
Among the newly detected loci, the largest effect (14% phenotypic variance explained)
was on Chromosome XIV. The position of maximal significance at this site was located two genes
away from the end of MKT1, with a 99% confidence interval that did not encompass the causal
variant in MKT1 (Table S3.3). Thus, the originally identified large effect Chromosome XIV locus
in fact represents multiple distinct closely linked nucleotides that both genetically interact with
MRP20 and occur in different genes (Fig. 3.3E).
The new locus on Chromosome XIV was delimited to two genes, one of which was SAL1,
encoding a mitochondrial ADP/ATP transporter that physically interacts with Mrp20. A SNP in
SAL1 that segregates in this cross was previously linked to increased mitochondrial genome
instability in BY [91], suggesting it is likely also causal in our study. For this reason, we refer to this
additional Chromosome XIV locus as ‘SAL1’. We found no evidence for epistasis between MKT1
and SAL1 (ANOVA, p = 0.77).
Although the MKT1-SAL1 locus had a large effect, it explained a minority of the
phenotypic variance among mrp20-105E segregants in a model including all detected loci (32%
for MKT1-SAL1 vs. 36% for all other loci collectively). Thus, by enabling MKT1 and SAL1 to
segregate independently through genetic engineering and examining a large number of mrp20-
105E segregants with different MKT1-SAL1 genotypes, we observed a greater diversity of
mutational responses than was originally seen and detected many additional loci.
3.3.7 Aneuploidy also contributes to the expressivity of mrp20-105E
Despite the fact that the identified loci explain most of mrp20-105E’s expressivity, some
individuals exhibited unexpectedly poor growth (Fig. 3.3F). This finding led to the identification
of a Chromosome II duplication that reduced growth (ANOVA, p = $1.2 \times 10^{-48}$). The aneuploidy was
common among mrp20-105E segregants, with a higher prevalence when MKT1^3S was also present
(Fisher’s exact test, p = $1.5 \times 10^{-43}$; Table S3.5). The Chromosome II aneuploidy was not seen
among wild type segregants. These data suggest that mrp20-105E increases the rate of
aneuploidization and that genetic variation in MKT1 influences the degree to which mrp20-105E
segregants duplicate Chromosome II. The aneuploidy’s contribution to phenotypic variation was
relatively minor, explaining 5% of phenotypic variance among mrp20-105E segregants in a model
also including all identified loci.
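The association between aneuploidy and MKT1 genotype was assessed with Fisher's exact test. The sketch below shows a two-sided version of that test for a 2x2 table using only the hypergeometric distribution. The counts are invented for illustration (the real counts are in Table S3.5), and the function name is hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical counts:
#              aneuploid   euploid
# MKT1^3S          8          2
# MKT1^BY          1          5
p_value = fisher_exact_two_sided(8, 2, 1, 5)
print(p_value)
```

The small tolerance in the comparison guards against floating-point ties between equally probable tables.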
3.3.8 Multiple mechanisms underlie poor growth in the presence of mrp20-105E
Evidence suggests mitochondrial genome instability contributes to the variable
expressivity of mrp20-105E. First, mitochondrial genome instability is known to cause poor
growth on non-fermentative carbon sources, such as ethanol [92,93]. Second, the exact variants that
segregate in our cross at MKT1 and SAL1 were previously linked to mitochondrial genome
instability [91]. Third, both Mrp20 and Sal1 function in the mitochondria [89,94]. Fourth, two other
candidate genes in the newly detected loci encode proteins that function in the mitochondria (Table
S3.4).
To determine the role of mitochondrial genome instability in the variable expressivity of
mrp20-105E, we measured petite formation, a proxy for spontaneous mitochondrial genome loss
(Fig. 3.4) (28). In addition to MRP20 and mrp20-105E BY and 3S parent strains, 16 MRP20
segregants and 42 mrp20-105E segregants were examined. Despite causing reduced growth in both
parents, mrp20-105E only led to elevated mitochondrial genome instability in BY (t-test p = 0.013
in BY and p = 0.39 in 3S; Fig. 3.4A). Also, although mrp20-105E segregants exhibited increased
mitochondrial genome instability relative to MRP20 segregants (Wilcoxon rank-sum test p =
0.023), especially at lower levels of growth, a subset of inviable segregants did not show elevated
petite formation (Fig. 3.4B and C). These results suggest that mitochondrial genome instability
explains part, but not all, of the response to mrp20-105E.
Figure 3.4 Mitochondrial genome instability partially underlies the expressivity of mrp20-105E.
We measured petite formation frequency, which estimates the proportion of cells within a clonal
population capable of respiratory growth. Higher petite frequency is a proxy for greater
mitochondrial genome instability. (A) We examined MRP20 and mrp20-105E versions of the BY
and 3S parent strains. For each, average values and 95% bootstrapped confidence intervals are
shown. BY showed elevated mitochondrial genome instability in the presence of mrp20-105E,
while 3S showed no change. (B) We examined 16 BYx3S MRP20 segregants. These segregants
were randomly selected and spanned the range of growth values for MRP20 segregants. (C) We
examined 45 BYx3S mrp20-105E segregants. Poorer growing segregants tended to exhibit higher mitochondrial
genome instability, though some exhibited wild type levels of mitochondrial genome instability.
The gray dashed line indicates the threshold used to call inviability.
3.3.9 Genetic underpinnings of mrp20-105E’s expressivity in segregants and parents
We determined the extent to which our identified loci explained phenotypic variability
among mutants. Modeling growth as a function of all identified loci and the aneuploidy accounted
for the majority (78%) of the broad-sense heritability among mrp20-105E segregants (ANOVA,
p = 5.2 x 10^-188). Further, phenotypic predictions for segregants based on their genotypes were
strongly correlated with their observed phenotypes (r = 0.85, p = 4.4 x 10^-209; Fig. 3.5A). These
results show that the variable expressivity of mrp20-105E is driven by many loci that collectively
produce a spectrum of mutational responses.
Figure 3.5 Detected loci quantitatively and qualitatively explain mutant phenotypes.
(A) We fit a linear model accounting for the effects of all detected loci and the aneuploidy on the
growth of mrp20-105E segregants. This model not only explained the growth of the new BY x 3S
mrp20-105E crosses generated in this paper, but also accurately predicted the phenotypes of the
mutant parents and previously generated segregants. (B) We examined growth relative to the sum
of detrimental alleles carried by a segregant. This relationship shows how collections of loci
produce a quantitative spectrum of phenotypes, including instances of qualitative phenotypic
responses. This relationship explains the full range of responses, from inviable to wild type growth,
across MKT1-SAL1 genotypes. The gray dashed line indicates the threshold used to call inviability.
Confirming this point, the model was also effective for other genotypes that were not
present in the new crosses, but had been generated throughout the course of this work. For instance,
the model accurately predicted the phenotypes of the original mrp20-105E segregant population
(r = 0.90, p = 1.6 x 10^-39), as well as the phenotypes of cross parents engineered to carry mrp20-105E
(Fig. 3.5A). Moreover, the model explained both qualitative and quantitative variation within and
between the two Chromosome XIV classes that were originally seen among mrp20-105E
segregants.
Finally, we examined how diverse combinations of loci collectively produced similar
phenotypic responses to mrp20-105E. We examined the relationship between growth and the total
number of detrimental alleles carried by mrp20-105E segregants, keeping track of each
individual’s genotype at MKT1 and SAL1, the largest effect loci (Fig. 3.5B). The number of
detrimental alleles carried by a segregant showed a strong negative relationship with growth,
which was not observed in wild type segregants (Fig. S3.5). Further, regardless of genotype at
MKT1 and SAL1, the effect of mrp20-105E ranged from lethal to benign in a manner dependent
on the number of detrimental alleles present at other loci. These findings demonstrate that many
segregating loci beyond the large effect MKT1-SAL1 locus influence the expressivity of
mrp20-105E and enable different genotypes in the cross to exhibit a broad range of responses to the
mutation.
3.4 Discussion
We have provided a detailed genetic characterization of the expressivity of a spontaneous
mutation. Response to this mutation in a budding yeast cross is influenced by at least 18 genetic
factors in total, with the largest effect due to two closely linked variants. However, at least 15
additional loci segregate and jointly exert larger effects than the largest two. Different
combinations of alleles across these loci produce a continuous spectrum of mutational responses.
Due to tight linkage between MKT1 and SAL1 in the original cross parents, the full extent of this
continuum was not originally observed, leading to an overly simplistic initial understanding of the
expressivity of the mrp20-105E mutation.
These findings also show how quantitative variation in mutational response can produce
seemingly discrete outcomes. In part, whether responses appear qualitative depends on the
configuration of mutationally responsive alleles in examined mutants. Approaches such as
crossing of genetically engineered strains can be used to disrupt these configurations that mask the
full extent of variation. However, another part of this expressivity is the tolerance of a system to
quantitative variation in key processes, for example mitochondrial genome stability in the case of
mrp20-105E. Our data suggest that these processes can only tolerate quantitative variation to a
point, but also indicate that lethality in response to the same mutation may arise in different genetic
backgrounds due to impairment of distinct cellular processes.
Our results inform efforts to understand expressivity in other systems, including humans.
For example, there is interest in determining why people who carry highly penetrant alleles known
to cause disease do not develop pathological conditions [30,95,96]. Such resilience, as observed
here, may involve numerous loci. This speaks to the complicated and unexpected epistasis that can
arise between mutations and segregating loci in genetically diverse populations
[9,10,14,29,32,34,51,61,63,64,84–87]. It also illustrates the importance of characterizing
epistasis [7,8,11,24,41,59,97–100], including
background effects, as these forms of genetic interactions are immediately relevant to evolution
and disease, and may not emerge from studies that do not directly interrogate natural variation in
genetically diverse populations.
3.5 Methods
3.5.1 Generation of segregants
The haploid BYx3S segregants in which mrp20-105E was identified were the hos3∆ F2 segregants
generated and described in Mullis et al. [85]. In brief, a BY MATa can1∆::STE2pr-SpHIS5 his3∆ strain
was mated to a 3S MATα ho::HphMX his3∆::NatMX strain to generate a wild type BY/3S diploid.
PCR-mediated, targeted gene disruption was then used to produce a BY/3S HOS3/hos3∆::KanMX strain.
Both the wild type and hemizygous deletion strains were sporulated, and random BYx3S MATa spores
were obtained from each using the magic marker system with plating on His- plates containing
canavanine [21]. Following discovery of the mrp20-105E mutation, we performed tetrad dissection of
this diploid to obtain mrp20-105E segregants in both HOS3 and hos3∆ genetic backgrounds.
To produce haploid mrp20-105E F3 segregants, we deleted URA3 from a BYx3S F2 MATa
can1∆::STE2pr-SpHIS5 his3∆ hos3∆::KanMX mrp20-105E XIV^3S segregant. We then mating type
switched the strain by transforming it with a URA3 plasmid containing an inducible HO endonuclease,
inducing HO, and plating single cells. The mating type-switched BYx3S F2 MATα can1∆::STE2pr-SpHIS5
his3∆ ura3∆ hos3∆::KanMX mrp20-105E XIV^3S segregant was then mated to a BYx3S F2 MATa
can1∆::STE2pr-SpHIS5 his3∆ hos3∆::KanMX mrp20-105E XIV^BY segregant. The resulting diploid was
sporulated and random segregants were obtained by plating on His- media.
To obtain additional haploid mrp20-105E MKT1^BY and MKT1^3S F2 segregants, we engineered
mrp20-105E, as well as the 3S and BY causal variants at MKT1 position 467,219 into BY and 3S,
respectively. BY mrp20-105E was independently mated to 3S mrp20-105E MKT1^BY twice. Two resultant
diploids were sporulated and tetrads were dissected to obtain BYx3S mrp20-105E MKT1^BY haploid
segregants. The same process was followed with BY mrp20-105E MKT1^3S and 3S mrp20-105E strains to
obtain BYx3S mrp20-105E MKT1^3S haploid segregants.
3.5.2 Genotyping
F2 segregants shown in Fig 1 and Fig 2 A-B were previously genotyped in Mullis et al. using the
same techniques described below [85]. In this paper, F3 segregants and all remaining F2 segregants shown in
Figs 1 and 3-5 were genotyped by low coverage whole genome sequencing. Freezer stocks of strains were
inoculated into liquid overnight cultures and grown to stationary phase at 30°C. DNA was extracted using
Qiagen 96-well DNeasy kits (Qiagen P/N 69581). Sequencing libraries were prepared using the Illumina
Nextera Kit and custom barcoded adapter sequences. Segregants from each respective cross (361 F3s and
872 F2s) were pooled in equimolar fractions into three separate multiplexes, run on a gel, size selected, and
purified with the Qiagen Gel Extraction Kit. F2 and F3 segregants were sequenced by Novogene on Illumina
HiSeq 4000 lanes using 150 bp x 150 bp paired-end reads.
Sequencing reads were mapped against the S288C genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from the Saccharomyces Genome Database
https://www.yeastgenome.org) using BWA version 0.7.7-r44 [77]. Samtools v1.9 was then used to create
a pileup file for each segregant [78]. For both BWA and Samtools, default settings were employed.
Base calls and coverages were gathered for 44,429 SNPs that segregate in the cross [10]. Low coverage
individuals (<0.7x
average per site coverage) were removed from analyses. Diploid and contaminated individuals were
identified by abnormal patterns of heterozygosity or sequencing coverage, and were also excluded. For each
segregant, a raw genotype vector was determined by the percent of calls at each site for the 3S allele. We
then used a Hidden Markov Model (HMM) implemented in the ‘HMM’ package v 1.0 in R to correct each
raw genotype vector using the following probability matrices
7977
:
transitionProbability = matrix(c(0.9999, 0.0001, 0.0001, 0.9999), 2)
and
emissionProbability = matrix(c(0.25, 0.75, 0.75, 0.25), 2).
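As an illustration only, the correction step can be sketched in Python. The analysis itself used the ‘HMM’ package in R, so this is not the code that was run. The sketch assumes two hidden states (BY = 0, 3S = 1), raw per-site calls already reduced to 0 or 1, uniform starting probabilities, and that each state emits the matching call with probability 0.75; the function name viterbi_correct is hypothetical.

```python
import math

# Probabilities taken from the matrices above; the state/symbol ordering is an assumption.
STAY, SWITCH = 0.9999, 0.0001      # transition probabilities
MATCH, MISMATCH = 0.75, 0.25       # emission probabilities


def viterbi_correct(raw_calls):
    """Return the most likely BY (0) / 3S (1) state path for a list of raw 0/1 calls."""
    if not raw_calls:
        return []
    log_trans = [[math.log(STAY), math.log(SWITCH)],
                 [math.log(SWITCH), math.log(STAY)]]

    def log_emit(state, call):
        return math.log(MATCH if state == call else MISMATCH)

    # Uniform starting probabilities (assumed).
    score = [math.log(0.5) + log_emit(s, raw_calls[0]) for s in (0, 1)]
    backpointers = []
    for call in raw_calls[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[p] + log_trans[p][s] for p in (0, 1)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            pointers.append(best_prev)
            new_score.append(candidates[best_prev] + log_emit(s, call))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

With these probabilities, an isolated discordant call inside a long run is smoothed away, whereas a sustained change in calls is retained as a recombination breakpoint.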
Aneuploidies were identified based on elevated sequencing coverage at particular chromosomes
within each individual sample. This identified a Chromosome II duplication event in a subset of BYx3S
mrp20-105E MKT1^BY and BYx3S mrp20-105E MKT1^3S segregants. The BY mrp20-105E MKT1^3S x 3S
mrp20-105E cross had the highest prevalence (50%), and thus individuals from this cross were further
examined. We employed the normalmixEM() function from the mixtools library in R [101] to determine that
coverage on Chr II was bimodal and centered on 0.98 and 1.8 (log likelihood of 237). Posterior probabilities
were used to call aneuploid individuals, which had an average per site coverage of 1.5x or greater. This
threshold was also applied to other crosses to identify aneuploid individuals.
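The mixture-model step can be sketched as follows. This is a plain Python expectation-maximization routine written for illustration, not the mixtools normalmixEM() implementation that was used. The initialisation, the fixed iteration count, the standard deviation floor, and the function names are all assumptions, and the input is assumed to contain more than one distinct coverage value.

```python
import math


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def fit_two_component_mixture(values, n_iter=100):
    """Fit a two-component 1-D Gaussian mixture by EM; returns (weights, means, sds)."""
    n = len(values)
    overall_mean = sum(values) / n
    overall_sd = math.sqrt(sum((v - overall_mean) ** 2 for v in values) / n)
    weights = [0.5, 0.5]
    means = [min(values), max(values)]    # crude initialisation (assumed)
    sds = [overall_sd, overall_sd]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each value.
        resp = []
        for v in values:
            dens = [weights[k] * normal_pdf(v, means[k], sds[k]) for k in (0, 1)]
            total = dens[0] + dens[1]
            resp.append([d / total for d in dens])
        # M-step: update weights, means, and standard deviations.
        for k in (0, 1):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            sds[k] = max(math.sqrt(var), 1e-3)   # floor guards against collapse
    return weights, means, sds


def posterior_aneuploid(coverage, weights, means, sds):
    """Posterior probability that a coverage value belongs to the higher-coverage component."""
    dens = [weights[k] * normal_pdf(coverage, means[k], sds[k]) for k in (0, 1)]
    return dens[1] / (dens[0] + dens[1])
```

Individuals whose Chromosome II coverage has a high posterior probability under the upper component would be the candidates for the duplication.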
3.5.3 Phenotyping
Segregants were inoculated into rich media containing glucose (‘YPD’), which comprised
1% yeast extract (BD P/N 212750), 2% peptone (BD P/N: 211677), and 2% dextrose (BD P/N 15530).
Cultures were grown to stationary phase (two days at 30°C). Strains were then pinned onto YP + 2% agar
(BD P/N 214050) rich media containing ethanol (‘YPE’). The YPE recipe was 1% yeast extract (BD P/N
212750), 2% peptone (BD P/N: 211677), and 2% ethanol (Koptec P/N A06141602W). Plates were then
grown at 30°C for two days. Growth assays were conducted in a minimum of three replicates across three
plates. On each plate, a BY control was included. Plates were imaged with the BioRAD Gel Doc XR+
Molecular Imager at a standard size of 11.4 x 8.52 cm^2 (width x length) with epi-illumination
using an exposure time of 0.5 seconds. Images were saved as 600 dpi tiffs. ImageJ
(http://rsbweb.nih.gov/ij/) was used to quantify pixel intensity of each colony through the Plate Analysis
JRU v1 plugin (https://research.stowers.org/imagejplugins/zipped_plugins.html), as described in Matsui et
al. [80]. Growth values were normalized against the same-plate BY control, then averaged across replicates to
produce a single growth value for each segregant.
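The normalization and averaging described above can be illustrated with a short Python sketch. It assumes that normalization means dividing each colony's pixel intensity by that of the BY control on the same plate, and that each plate carries exactly one control measurement; the text does not state either detail. The function name and the tuple layout are hypothetical.

```python
from collections import defaultdict


def normalized_growth(measurements, control="BY"):
    """measurements: list of (plate, strain, pixel_intensity) tuples.

    Returns {strain: mean of plate-normalized growth across replicates}.
    """
    # Record the control value for each plate (one control per plate is assumed).
    control_by_plate = {}
    for plate, strain, value in measurements:
        if strain == control:
            control_by_plate[plate] = value

    # Divide each measurement by its same-plate control (division is an assumption).
    per_strain = defaultdict(list)
    for plate, strain, value in measurements:
        if strain == control:
            continue
        per_strain[strain].append(value / control_by_plate[plate])

    # Average across replicates to obtain one growth value per strain.
    return {strain: sum(vals) / len(vals) for strain, vals in per_strain.items()}
```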
3.5.4 Linkage mapping
Initial linkage mapping was conducted with F2 segregants. Initial discovery of the spontaneous
mrp20-105E mutation resulted from linkage mapping with 385 F2 segregants (164 wild type and 221
hos3∆) from Mullis et al. [85]. We employed the linear model growth ~ hos3∆ + locus + hos3∆xlocus + error,
from which the hos3∆xlocus interaction term was used to identify loci that differentially explained growth
in hos3∆ segregants. Examination of the hos3∆xlocus interaction term led to discovery of the spontaneous
mrp20-105E mutation on the MRP20^BY allele present in hos3∆ segregants. Following discovery of
mrp20-105E, we used the fixed effects linear model growth ~ MRP20 + locus + MRP20 x locus + error using
only hos3∆ individuals from Mullis et al. [85]. From this scan, we examined the MRP20 x locus interaction
term. We then used 361 mrp20-105E F3 segregants to better resolve the Chromosome XIV locus. We employed
the model growth ~ locus + error and examined the locus term. We examined the minimum observed test
on Chromosome XIV to delimit that locus.
To find loci affecting growth in the mrp20-105E background, we generated new populations of
mrp20-105E MKT1^BY (353) and mrp20-105E MKT1^3S (396) haploid segregants. The combined 749
mrp20-105E segregants were used for linkage mapping that followed a forward regression approach. We first
obtained residuals from the linear model growth ~ MKT1 + error, and then implemented a genome-wide
scan using the model residuals ~ locus + error. We examined the locus term, and significance was
determined by using 1,000 permutations with the threshold set at the 95th quantile of observed
-log10(p-values) [102]. A maximum of one locus per chromosome per scan was identified as significant.
Following the identification of additional loci, we accounted for the newly detected loci in a new model,
residuals ~ locus1 + locus2 + … + locusn + error, and obtained the residuals. These new residuals were
used in another genome-wide scan using the model residuals ~ locus + error. Permuted thresholds were
calculated for each scan. This process was repeated for a total of 5 iterations, at which point no loci
were detected above our significance threshold. Chromosome II was excluded from linkage mapping due to
the presence of a chromosomal duplication in a subset of individuals. The Chromosome II duplication was
tested for significance using the model growth ~ MKT1 + ChromosomeII + error, from which the
Chromosome II term was examined.
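A single scan with a permutation-based threshold can be sketched in Python as below. This is an illustration under stated assumptions, not the R code used here. It uses the squared correlation between phenotype and genotype as the test statistic, which for a single-predictor model at fixed sample size ranks loci in the same order as -log10(p). It also assumes that the threshold is the 95th quantile of the genome-wide maximum statistic across permutations. All function names are hypothetical.

```python
import random


def r_squared(y, g):
    """Squared Pearson correlation between phenotype y and a 0/1 genotype vector g."""
    n = len(y)
    my, mg = sum(y) / n, sum(g) / n
    syy = sum((a - my) ** 2 for a in y)
    sgg = sum((b - mg) ** 2 for b in g)
    if syy == 0 or sgg == 0:
        return 0.0
    sgy = sum((a - my) * (b - mg) for a, b in zip(y, g))
    return sgy * sgy / (syy * sgg)


def genome_scan(y, genotypes):
    """Test statistic at every locus; genotypes is a list of per-locus 0/1 vectors."""
    return [r_squared(y, g) for g in genotypes]


def permutation_threshold(y, genotypes, n_perm=1000, quantile=0.95, seed=0):
    """Quantile of the genome-wide maximum statistic over phenotype permutations."""
    rng = random.Random(seed)
    shuffled = list(y)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)          # break the genotype-phenotype pairing
        maxima.append(max(genome_scan(shuffled, genotypes)))
    maxima.sort()
    return maxima[min(int(quantile * n_perm), n_perm - 1)]
```

In a forward regression, the residuals from a model containing the loci detected so far would be passed as y to the next scan, with a new threshold computed each time.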
All linkage mapping was performed in R. Linear models were implemented using the lm() function.
To call peaks for each scan, we required that the local minimum position within each peak be a minimum
of 150,000 bp away from any other peak. We also required peaks to be more than 20 kb from the edge of a
chromosome. We report 99% confidence intervals as 2-LOD intervals surrounding the peak position at each
locus.
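The peak-calling rules can be sketched in Python as follows. The sketch assumes per-marker scores on a LOD-like scale where higher is more significant, a greedy strongest-first acceptance order, and a separation of 150,000 bp; none of these implementation details are given in the text, and the function names are hypothetical.

```python
MIN_PEAK_SEPARATION = 150_000   # bp between accepted peaks (assumed unit)
MIN_EDGE_DISTANCE = 20_000      # bp from either chromosome end


def call_peaks(markers, chrom_length, threshold):
    """markers: list of (position_bp, score) on one chromosome.

    Accept markers from strongest to weakest, skipping any that fall too
    close to a chromosome edge or to a peak that was already accepted.
    """
    accepted = []
    for pos, score in sorted(markers, key=lambda m: m[1], reverse=True):
        if score < threshold:
            break
        if pos <= MIN_EDGE_DISTANCE or chrom_length - pos <= MIN_EDGE_DISTANCE:
            continue
        if any(abs(pos - p) < MIN_PEAK_SEPARATION for p, _ in accepted):
            continue
        accepted.append((pos, score))
    return sorted(accepted)


def lod_drop_interval(markers, peak_pos, drop=2.0):
    """Extend outward from the peak while scores stay within `drop` of the peak score."""
    ordered = sorted(markers)
    idx = next(i for i, (pos, _) in enumerate(ordered) if pos == peak_pos)
    cutoff = ordered[idx][1] - drop
    left = idx
    while left > 0 and ordered[left - 1][1] >= cutoff:
        left -= 1
    right = idx
    while right < len(ordered) - 1 and ordered[right + 1][1] >= cutoff:
        right += 1
    return ordered[left][0], ordered[right][0]
```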
3.5.5 Classification of inviable segregants
Initial discovery of the MRP20 x MKT1 genetic interaction suggested that expressivity of
mrp20-105E was largely determined by variation at MKT1. Furthermore, mrp20-105E MKT1^BY segregants
exhibited very poor growth, while mrp20-105E MKT1^3S segregants showed more tolerant, variable growth.
We termed this initial mrp20-105E MKT1^BY segregant population ‘inviable’. Figures 3.4 and 3.5 include
a gray dashed line to denote the highest growth value observed among the original inviable segregants.
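As a minimal Python sketch of this classification, the threshold is the highest growth value among the original inviable population, and other segregants are compared against it. Treating values at or below the threshold as inviable is an assumption, and the function names are hypothetical.

```python
def inviability_threshold(original_inviable_growth):
    """Highest growth value observed among the original inviable segregants."""
    return max(original_inviable_growth)


def classify_segregants(growth_by_segregant, threshold):
    """Label each segregant; values at or below the threshold are called inviable (assumed)."""
    return {
        name: ("inviable" if growth <= threshold else "viable")
        for name, growth in growth_by_segregant.items()
    }
```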
3.5.6 Delimiting loci with recombination breakpoints
For each locus examined, we split the appropriate segregants into two groups: individuals carrying
the BY allele and individuals carrying the 3S allele. Segregants’ haplotypes across the adjacent genomic
window were then examined. The causal region was determined by identifying the SNPs fixed for BY
among all BY individuals and fixed for 3S among all 3S individuals. Raw Illumina sequencing reads were
examined to confirm the delimitation of the Chromosome IV locus to MRP20 among original F2 segregants,
of the Chromosome XIV locus to the MKT1 coding SNP at 467,219 among F3 segregants, and of the secondary
Chromosome XIV locus to SAL1 and PMS1 among the new F2 segregants.
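The delimiting logic can be sketched in Python as below. The sketch assumes that each segregant has already been assigned to the BY (0) or 3S (1) group for the locus, that haplotypes are coded 0 for BY and 1 for 3S, and that at least one segregant is supplied. The function name and data layout are hypothetical.

```python
def delimit_causal_snps(haplotypes, locus_allele):
    """Return SNP positions whose genotype matches the group allele in every segregant.

    haplotypes:   {segregant: {position: 0 or 1}} across the genomic window
    locus_allele: {segregant: 0 or 1}, the group each segregant belongs to
    """
    # Only consider positions genotyped in every segregant.
    shared = set.intersection(*(set(h) for h in haplotypes.values()))
    return sorted(
        pos for pos in shared
        if all(h[pos] == locus_allele[seg] for seg, h in haplotypes.items())
    )
```

Each recombinant segregant excludes the SNPs on the far side of its breakpoint, so the returned window narrows as more recombinants are added.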
3.5.7 Reciprocal hemizygosity experiments
Four hos3∆ F2 MATa segregants were used in all reciprocal hemizygosity (RH) experiments [103]:
two were hos3∆ IV^BY XIV^BY and two were hos3∆ IV^3S XIV^BY. The four segregants were first mating type
switched to enable mating of these segregants to produce homozygous IV^BY/IV^BY, homozygous IV^3S/IV^3S,
or heterozygous IV^BY/IV^3S diploids. Each pairwise mating was performed and confirmed by plating on
mating type tester plates. These diploid strains were then phenotyped on agar plates containing ethanol,
which verified that IV^BY has an effect in diploids and acts in a recessive manner. Using the haploid MATa
and MATα versions of these four segregants, we individually engineered premature stop codons into DIT1,
MRP20 and PDR15 using CRISPR-mediated targeted gene disruption and lithium acetate
transformations [76]. Plasmid-based CRISPR-Cas9 was employed to target the beginning of each coding
region, and 20bp repair templates which contained a premature stop codon followed by 1bp deletions were
incorporated. Each sgRNA and repair template was designed so that only the first 15 (of 537), 26 (of 264),
and 33 (of 1,530) amino acids would be translated for DIT1, MRP20 and PDR15, respectively. Engineered
strains were confirmed by PCR and Sanger sequencing. After confirmation, wild type and knockout strains
for each gene were then mated in particular combinations to produce reciprocal hemizygotes that were
otherwise isogenic. A minimum of two distinct hemizygotes were generated for each allele of each gene.
3.5.8 Construction of nucleotide replacement strains
Single nucleotide replacement strains were generated for MRP20 and MKT1 using a CRISPR/Cas9-mediated
approach. For a given replacement, the appropriate strain was first transformed with a modified
version of pML104 that constitutively expresses Cas9 using LiAc transformation [76,104]. We then inserted
the KanMX gene using co-transformation of a double-stranded DNA containing KanMX with 30bp upstream
and 30bp downstream homology tails and gRNAs targeting the region containing the site of interest [105].
DNA oligos and PCR were used to construct custom sgDNA templates which included crRNA and
tracrRNA in a single molecule. Next, we employed T7 RNA Polymerase to express sgDNA templates in
vitro. DNase treatment and phenol extraction were used to obtain purified sgRNAs. Transformants were
selected on media containing G418, and KanMX integration was confirmed by PCR. Next, KanMX was
replaced with the nucleotide of interest. To do this, integrants were co-transformed with four gRNAs
targeting KanMX, a 60 bp single-stranded DNA repair oligo, and a marker plasmid expressing either
HygMX or NatMX using electroporation [106]. Marker plasmids were constructed by Gibson assembly with
HygMX or NatMX and pRS316 [107,108]. Repair constructs were 60bp ssDNA oligos ordered from Integrated
DNA Technologies that included upstream homology, the desired nucleotide at the site of interest, and
downstream homology. Transformants were selected on media containing either hygromycin or
nourseothricin, depending on which marker plasmid was used. Replacement strains were then confirmed by
Sanger sequencing.
Following this strategy, the mrp20-105E nucleotide was engineered into two hos3∆ IV^3S XIV^BY
segregants, and two hos3∆ IV^BY XIV^BY segregants were restored to MRP20. Similarly, at MKT1 the causal,
nearest upstream and downstream SNPs were engineered into two hos3∆ IV^BY XIV^3S segregants. Similarly,
we generated BY mrp20-105E, BY MKT1^3S, 3S mrp20-105E, and 3S MKT1^BY strains in this manner. Each
single nucleotide parental replacement strain was then backcrossed to its own progenitor. Each subsequent
diploid was sporulated and tetrad dissected, and we confirmed haploid genotypes by sequencing. The same
approach was used to generate 3S mrp20-105E MKT1^BY haploids by crossing 3S mrp20-105E and 3S
MKT1^BY strains. However, this strategy could not be followed to generate BY mrp20-105E MKT1^3S
haploids, because crossing BY mrp20-105E and BY MKT1^3S strains failed to produce any tetrads with 4
viable spores. Instead, we took BY MKT1^3S strains and converted MRP20 to mrp20-105E.
3.5.9 Mitochondrial genome instability experiments
We performed petite frequency assays as described in Dimitrov et al. [91]. In brief, freezer stocks were
streaked onto solid YPD media and grown for two days at 30°C. Single colonies were then resuspended in
PBS, plated across dilutions onto YPDG plates (1% yeast extract, 2% peptone, 0.1% glucose, and 3%
glycerol) and grown for five days at 30°C. Plates were then imaged with the BioRAD Gel Doc XR+
Molecular Imager at a standard size of 12.4 x 8.9 cm^2 (width x length) with epi-illumination
using an exposure time of 0.5 seconds. Images were saved as 600 dpi tiffs. ImageJ
(http://rsbweb.nih.gov/ij/) was used to examine growth and quantitate colony size as described in Dimitrov
et al. [91]. Colonies were then classified as petite or grande using a threshold defined as the maximum colony
diameter of observed petites among BY and 3S wild type strains. Petite frequency is the ratio of small
colonies to total colonies.
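The petite frequency calculation can be written as a brief Python sketch. It assumes colony diameters have already been measured and that a colony counts as petite when its diameter is at or below the threshold; the function name is hypothetical.

```python
def petite_frequency(colony_diameters, petite_max_diameter):
    """Ratio of petite colonies to total colonies on a plate.

    petite_max_diameter is the maximum diameter observed among petites in the
    wild type reference strains; colonies at or below it are counted as petite.
    """
    if not colony_diameters:
        raise ValueError("at least one colony is required")
    petites = sum(1 for d in colony_diameters if d <= petite_max_diameter)
    return petites / len(colony_diameters)
```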
3.5.10 Modeling growth and examining the model in additional segregant populations
We modeled growth for mrp20-105E segregants from the BYx3S crosses fixed for mrp20-105E
and engineered at MKT1. We incorporated MKT1, the 16 detected loci and the Chromosome II duplication
in the linear model growth ~ MKT1 + locus1 + locus2 + … locus16 + Chromosome II + error. This model
was used to generate predicted growth values. We then compared our observed growth values to these
predictions. Next, we sought to determine whether loci influencing the expressivity of mrp20-105E also
affected growth in other strains. To accomplish this, we input the genotype information for each strain into
our model to obtain predictions for its growth. We then compared the predicted values to the observed
growth values and obtained Pearson correlations when possible.
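The prediction and comparison steps can be sketched in Python as follows. The sketch assumes an additive model whose intercept and per-term effects have already been estimated. The term names and effect sizes used with it would be placeholders rather than fitted values from this work, and the function names are hypothetical.

```python
import math


def predict_growth(genotype, intercept, effects):
    """Additive prediction: intercept plus the effect of each term the strain carries.

    genotype and effects are dicts keyed by term name (e.g. a locus or the
    Chromosome II duplication), with genotype values coded 0 or 1.
    """
    return intercept + sum(effects[term] * genotype[term] for term in effects)


def pearson_r(x, y):
    """Pearson correlation between predicted and observed growth values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```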
3.5.11 Relationship between detrimental alleles, growth, and inviability
At each detected locus influencing response to mrp20-105E, we determined the allele associated
with worse growth (‘detrimental allele’). Next, we counted the number of detrimental alleles carried by
each mrp20-105E segregant and examined how phenotypic response to mrp20-105E related to this count. The
MKT1 and SAL1 loci were not included when counting detrimental alleles, so that this relationship could
be examined across different MKT1-SAL1 genotype classes.
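The counting step can be sketched in Python as below. The locus labels and the dictionary layout are illustrative assumptions, and the function name is hypothetical.

```python
def count_detrimental_alleles(genotype, detrimental_allele, exclude=("MKT1", "SAL1")):
    """Number of loci at which a segregant carries the allele associated with worse growth.

    genotype:           {locus: 0 or 1} for one segregant (0 = BY, 1 = 3S)
    detrimental_allele: {locus: 0 or 1}, the worse-growing allele at each locus
    exclude:            loci left out so that counts can be compared across
                        MKT1-SAL1 genotype classes
    """
    return sum(
        1 for locus, bad in detrimental_allele.items()
        if locus not in exclude and genotype[locus] == bad
    )
```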
3.6 Supplementary Material
Supplementary Text
Resolution of the Chromosome IV locus
We detected a locus on Chromosome IV with a peak marker position from 1,277,231 to
1,277,378, and a 99% confidence interval from position 1,277,231 to position 1,278,618,
encompassing the promoter and most of the coding region of MRP20.
Resolution of the Chromosome XIV locus
The chromosome XIV locus that interacts with MRP20 in hos3∆ segregants had a peak
from 463,554 to 465,005, and a 99% confidence interval that extended from 457,243 to 478,701.
Ten protein-coding genes were completely or partially encompassed in this confidence interval,
limiting possible insight into the causal variation underlying this effect.
Figure S3.1 Identification of mrp20-105E.
(A) Wild type and hemizygous BY/3S diploids were generated and sporulated to produce HOS3
and hos3∆ F2 BYx3S segregants. BYx3S hos3∆ segregants exhibited a large increase in
phenotypic variability relative to wild type segregants. (B) Linkage mapping using the HOS3 and
hos3∆ segregants identified a single locus on Chromosome IV. The peak marker was from
1,277,231 to 1,277,959 and the confidence interval extended from position 1,272,164 to position
1,278,407, encompassing (from left to right) part of URH1 and all of DIT2, DIT1, RPB7, and
MRP20. (C) The BY allele of the Chromosome IV locus had a large effect in hos3∆ segregants,
but no effect in HOS3 segregants. (D) Recombination breakpoints in hos3∆ segregants delimited
the Chromosome IV locus to five SNPs (small vertical black lines along the x-axis) in the
RPB7-MRP20 region of the chromosome. Dashed vertical lines show the window delimited by the
recombination breakpoints. One of these variants was a spontaneous mutation in MRP20. Blue and
orange respectively refer to the BY and 3S alleles of the locus. (E) Reciprocal hemizygosity
analysis in a hos3∆ BY/3S diploid was conducted at closely linked non-essential genes and found
that MRP20 is the causal gene underlying the Chromosome IV locus. In these experiments the IV^BY
allele includes the mrp20-105E mutation and results in a substantial decrease in growth. Black
triangles denote the absence of one allele and colored triangles indicate the alleles that are present.
(F) The causality of mrp20-105E was validated by engineering in segregants with MRP20 (left)
and mrp20-105E (right). (G) Tetrad dissection of the original BY/3S HOS3/hos3∆ MRP20/mrp20-105E
diploid showed that increased variation was due to mrp20-105E, not hos3∆. Throughout the
paper, blue and orange are used to denote BY and 3S genetic material, respectively. All growth
data presented in the paper are measurements of colonies on agar plates containing rich medium
with ethanol as the carbon source.
Figure S3.2 Representative MRP20 and mrp20-105E segregants on ethanol.
Each colony is a genetically distinct BYx3S segregant grown on ethanol. A wide range of growth
phenotypes was observed among mrp20-105E segregants, some of which were inviable in this
condition.
Figure S3.3 Linkage mapping in the F3 panel more finely resolves the Chromosome XIV locus.
The model growth ~ locus + error was used. The genome-wide significance plot of the locus term
is shown in (A) and the relationship between genotype at the Chromosome XIV locus and growth is shown
in (B). The peak and 99% confidence interval solely included the position 467,219.
Figure S3.4 Growth effects of loci detected in BY x 3S mrp20-105E crosses.
The relationship between genotype and growth is shown at each of the 16 loci detected among BYx3S
mrp20-105E segregants shown in Fig. 3.3 B-D. Effects are shown from greatest to least effect size, left
to right, top to bottom.
Figure S3.5 Loci affecting expressivity of mrp20-105E show minimal effects in MRP20
segregants.
Growth relative to the sum of detrimental alleles is shown for MRP20 segregants. While
predictions for MRP20 segregants correlated with observed growth (r = 0.70, p = 9.6 x 10^-25), the
cumulative effects of loci differed between mrp20-105E and MRP20 segregants (ANOVA,
observedGrowth ~ predictedGrowth*MRP20; interaction term p = 2.8 x 10^-23). This is likely due, in
part, to the fact that wildtype segregants exhibited a narrower range of phenotypes, which did
not include inviable segregants.
Diploid | Cross | Segregants | Method | Publication | Total
A | BY x 3S | Wildtype F2 | random spores | Mullis, et al, 2019 | 164
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | MRP20 hos3∆ F2 | random spores | Mullis, et al, 2019 | 131
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | mrp20-105E hos3∆ F2 | random spores | Mullis, et al, 2019 | 90
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | MRP20 hos3∆ F2 | tetrad dissection | this paper | 27
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | mrp20-105E hos3∆ F2 | tetrad dissection | this paper | 32
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | MRP20 HOS3 F2 | tetrad dissection | this paper | 34
B | BY mrp20-105E hos3∆ x 3S MRP20 HOS3 | mrp20-105E HOS3 F2 | tetrad dissection | this paper | 30
C | BYx3S mrp20-105E XIV^BY F2 x BYx3S mrp20-105E XIV^3S F2 | mrp20-105E F3 | random spores | this paper | 361
D | BY mrp20-105E MKT1^BY x 3S mrp20-105E MKT1^BY | mrp20-105E MKT1^BY F2 | tetrad dissection | this paper | 353
E | BY mrp20-105E MKT1^3S x 3S mrp20-105E MKT1^3S | mrp20-105E MKT1^3S F2 | tetrad dissection | this paper | 396
Table S3.1 Crosses and segregant populations examined in this study.
All BY x 3S crosses and segregant populations examined in this study are listed. Note that at times
different methods of obtaining segregants, either random spores or tetrad dissection, were
employed.
Chromosome | Peak Position(s) | 99% Confidence Interval | P-value
4 | 615,114 | 565,313 to 639,037 | 6.6 x 10^-7
7 | 140,806 to 142,398 | 86,297 to 164,025 | 1.4 x 10^-6
11 | 171,261 to 171,457 | 150,554 to 264,387 | 2.1 x 10^-6
12 | 448,368 | 444,569 to 513,765 | 7.5 x 10^-9
12 | 679,600 | 660,371 to 701,793 | 4.8 x 10^-10
12 | 116,314 116,980 | 85,385 to 141,900 | 3.2 x 10^-5
12 | 1,022,895 to 1,022,933 | 1,013,592 to 1,059,611 | 1.1 x 10^-8
13 | 612,355 | 594,280 to 652,003 | 2.8 x 10^-5
13 | 833,534 | 812,150 to 892,748 | 4.6 x 10^-7
13 | 449,844 to 451,590 | 436,342 to 465,619 | 3.2 x 10^-5
14 | 299,515 | 295,050 to 312,454 | 2.0 x 10^-12
14 | 473,648 | 468,488 to 478,701 | 4.8 x 10^-34
15 | 185,793 | 167,270 to 189,700 | 3.4 x 10^-5
15 | 343,484 to 343,921 | 340,625 to 363,553 | 2.4 x 10^-24
15 | 627,315 to 628,209 | 553,072 to 717,181 | 2.7 x 10^-5
15 | 885,437 to 885,914 | 869,837 to 916,094 | 5.0 x 10^-6
Table S3.2 Loci beyond MKT1 that additively influence growth in mrp20-105E segregants.
These loci were detected by mapping growth ~ locus in mrp20-105E MKT1^BY F2 and in mrp20-105E
MKT1^3S F2 individuals shown in Supplementary Figures 3.3-3.5 and Figure S3.4.
Chromosome | Peak Position(s) | Candidate Gene(s) | Position of Peak SNP(s)
4 | 615,114 | AFR1 | Coding
7 | 140,806 to 142,398 | HOS2, YGL193C, and IME4 | Promoter and Coding
11 | 171,261 to 171,457 | SDH1 | Promoter
12 | 116,314 116,980 | BPT1 | Promoter and Coding
12 | 448,368 | RNH203 | Promoter
12 | 679,600 | BOP2 | Coding
12 | 1,022,895 to 1,022,933 | ECM7 | Coding
13 | 449,844 to 451,590 | YMR090W and NPL6 | Promoter
13 | 612,355 | ECM5 | Coding
13 | 833,534 | AEP2 | Coding
14 | 299,515 | PBR1 | Coding
14 | 472,584 to 473,648 | SAL1 and PMS1 | Promoter and Coding
15 | 185,793 | BRX1 | 3’ UTR
15 | 343,484 to 343,921 | YOR008C-A and TIR4 | Promoter
15 | 627,315 to 628,209 | ISN1 | Promoter and Coding
15 | 885,437 to 885,914 | ISW2 | Promoter and Coding
Table S3.3 Genes at loci that influence growth in mrp20-105E segregants.
These loci additively affect the expressivity of the mrp20-105E mutation. Recombination delimits
each peak to one candidate gene (12 loci), two candidate genes (3 loci), or 4 genes (1 locus).
Location of the delimited SNPs is included.
Candidate
Gene(s)
Summary of Associated Function(s)
AFR1 Pheromone-induced projection (shmoo) formation; Septin architecture during mating
HOS2 Histone deacetylase, subunit of Set3 and Rpd3L complexes
YGL193C Haploid specific gene
IME4 Methyltransferase, conditionally essential for meiosis
SDH1 Flavoprotein involved in TCA cycle in mitochondria
BPT1 Vacuolar transmembrane protein
RNH203 Ribonuclease H2 subunit, ribonucleotide excision repair
BOP2 Unknown
ECM7 Putative integral membrane protein with a role in calcium uptake
YMR090W Unknown
NPL6 Component of the RSC chromatin remodeling complex
ECM5 Subunit of Snt2C complex involved in gene regulation in response to oxidative stress
AEP2 Mitochondrial protein; likely involved in translation
PBR1 Putative oxidoreductase; required for cell viability
SAL1 ADP/ATP transporter in mitochondria
PMS1 ATP-binding protein required for mismatch repair; required for both mitosis and meiosis
BRX1 Nucleolar protein involved in rRNA processing
YOR008C-A Unknown, potential transmembrane domain
TIR4 Cell wall mannoprotein; required for anaerobic growth
ISN1 Inosine metabolism
ISW2 ATP-dependent DNA translocase involved in chromatin remodeling
Table S3.4 Candidate genes have diverse cellular functions.
Candidate genes delimited by recombination at mrp20-105E listed in Table S3.3 are included here
with a brief summary of each gene’s function, based on descriptions in the Saccharomyces
Genome Database.
Diploid  Cross                                           % wildtype  % aneuploid
A        BY x 3S                                         100         0
D        BY mrp20-105E MKT1^BY x 3S mrp20-105E MKT1^BY   94          5.9
E        BY mrp20-105E MKT1^3S x 3S mrp20-105E MKT1^3S   51          49
Table S3.5 Presence of Chromosome II duplication differs among BY x 3S crosses.
We observed a Chromosome II duplication in crosses fixed for mrp20-105E. Among these two
crosses, the cross fixed for MKT1^3S had a much higher prevalence of the aneuploidy relative to the
cross fixed for MKT1^BY.
Supplementary Data
Data S1. Segregant genotype table
Chromosome and position columns refer to the chromosome and position of each genetic
variant used in this study. The mitochondrial genome is referred to as chromosome 17. Each additional
column contains the genetic information for a given segregant. Each segregant was named by type
(F2 or F3), the diploid from which it originated (A, B, C, D, E defined in table S2), whether that
segregant was wildtype or mutant at MRP20 (‘MRP20’ or ‘mrp20’), and a random number
from one through the total number of that segregant type. Names of segregants originating from
diploid B contain additional information on whether that segregant was wildtype or knockout at
HOS3 (‘HOS3’ or ‘hos3∆’) and whether that segregant was obtained by random spore prep or
tetrad dissection (‘random’ or ‘dissected’). Note that for hos3∆ segregants obtained by random
spore preparation of diploid B, the BY allele at MRP20 contained the mrp20-105E mutation. Each
genetic variant used in this study is presented as a row, in which the haplotype of
each segregant is denoted as 0 for BY or 1 for 3S. A value of ‘NA’ indicates a site
that lacked coverage and for which a haplotype was not called. In the BY x 3S crosses fixed for
mrp20-105E (diploids D and E), a third state (2) was used to denote heterozygosity in
individuals with the chromosome II duplication event.
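The encoding described for Data S1 can be illustrated with a minimal Python sketch. The row layout (chromosome, position, then one call per segregant) follows the description above, but the function name, the label strings, and the example row are hypothetical and are not taken from the dataset itself.

```python
# Hypothetical decoder for one row of a Data S1-style genotype table.
# Codes follow the text: 0 = BY, 1 = 3S, 2 = heterozygous (chromosome II
# duplication, diploids D and E only), NA = no haplotype call.
HAPLOTYPE_LABELS = {
    "0": "BY",
    "1": "3S",
    "2": "heterozygous",
    "NA": None,
}


def decode_row(row):
    """Split a row into chromosome, position, and decoded haplotype calls."""
    chrom, pos, *calls = row
    return {
        "chromosome": int(chrom),  # 17 denotes the mitochondrial genome
        "position": int(pos),
        "haplotypes": [HAPLOTYPE_LABELS[c] for c in calls],
    }


# Invented example row: a mitochondrial variant scored in four segregants.
example = decode_row(["17", "1024", "0", "1", "NA", "2"])
```

Mapping ‘NA’ to None keeps uncalled sites distinct from either parental haplotype, so they can be skipped rather than miscounted in downstream tallies.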
Data S2. Segregant phenotype table
Each segregant’s growth in ethanol is presented. A single growth measurement is reported,
which is the mean value of three biological replicates of growth normalized to on-plate BY
controls.
Data S3. Reciprocal hemizygosity experiments
Segregants that were used in reciprocal hemizygosity experiments that delimited the
Chromosome IV allele to MRP20 are included. The segregants mated together for each
hemizygous diploid are listed under ‘Parent1’ and ‘Parent2’ columns. The ‘Gene’ column lists the
gene at which reciprocal hemizygosity was engineered. The ‘LossOfFunction’ column indicates
which allele, BY or 3S (encoded as 0 or 1), was engineered to be non-functional. The ‘Ethanol’
column contains a growth value normalized to on-plate BY controls. Each biological replicate is
included separately and denoted by the ‘Replicate’ column to enable calculation of confidence
intervals.
Data S4. Cloning experiments
Each segregant and parent strain used for cloning causal nucleotides at MRP20 and MKT1
is included. The ‘Type’ column denotes whether the engineered strain is a segregant or parent
(‘segregant’ or ‘parent’), the ‘MRP20’ column describes whether that strain is wildtype or
mrp20-105E (denoted as ‘MRP20’ or ‘mrp20’), and the ‘MKT1’ column describes whether that strain
was BY or 3S (encoded as 0 or 1) at the causal SNP at position 467,219. The ‘Gene’ column lists
the gene at which engineering occurred (‘MRP20’, ‘MKT1’, ‘MRP20andMKT1’,
or ‘WT’, which denotes parental control samples). The ‘Edit’ column explains the type of
engineering that was performed in segregants. Cloning experiments at MRP20 in segregants
are described as ‘fromMuttoWT’ or ‘fromWTtoMut’. Similarly, cloning experiments at MKT1 in
segregants are described as ‘from3StoBY-1SNP’, ‘from3StoBYcandidate’, and
‘from3StoBY+1SNP’ for strain engineering at the nearest upstream, causal, and nearest
downstream SNPs, respectively. This column is not relevant for parental cloning and therefore NA
is reported in those cells. Lastly, the ‘Ethanol’ column contains a single growth measurement, which
is the mean value of three biological replicates of growth normalized to on-plate BY controls.
Data S5. Petite frequency
Each strain used in petite frequency assays, either segregants described in Data S1-S3 or
parental strains, is reported in the ‘Sample’ column. The ‘Type’ column denotes whether the given
strain is a segregant or parent (‘segregant’ or ‘parent’), and the ‘MRP20’ column describes whether
the strain is wildtype or mrp20-105E (denoted as ‘MRP20’ or ‘mrp20’). A comma-separated
list of ImageJ-reported colony sizes (in in^2) is also included in the ‘ColonySizes’ column. The largest
observed petite colony among wildtype parental strains was 0.001 in^2, and this value was thus used
as the threshold to distinguish petites from total colonies. The final ‘Frequency’ column reports the
petite frequency, or the number of colonies at or below this threshold relative to the total number of
colonies, times 100.
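As an illustration of the ‘Frequency’ calculation described for Data S5, the following Python sketch computes a petite frequency from a comma-separated ‘ColonySizes’ entry. The 0.001 in^2 threshold comes from the text; the function name and the example colony sizes are hypothetical.

```python
# Threshold from the text: largest petite colony among wildtype parental
# strains, in square inches.
PETITE_THRESHOLD_IN2 = 0.001


def petite_frequency(colony_sizes_field, threshold=PETITE_THRESHOLD_IN2):
    """Return the percentage of colonies at or below the petite threshold.

    colony_sizes_field is a comma-separated string of colony areas,
    as described for the 'ColonySizes' column.
    """
    sizes = [float(s) for s in colony_sizes_field.split(",") if s.strip()]
    if not sizes:
        raise ValueError("no colony sizes supplied")
    petites = sum(1 for size in sizes if size <= threshold)
    return 100.0 * petites / len(sizes)


# Invented example: two of four colonies fall at or below the threshold.
freq = petite_frequency("0.0005, 0.001, 0.004, 0.006")
```

Colonies exactly at the threshold are counted as petites here, matching the "at or below" wording above.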
Data S6. Individuals with Chromosome II duplication event
Each segregant from the BY x 3S crosses that were engineered at mrp20-105E and MKT1
(crosses D and E), in which the Chromosome II duplication was observed, is listed under the
‘Sample’ column. The average normalized coverage across chromosome II is reported in the
‘AverageChr2Coverage’ column, and the ‘Chr2C’ column denotes whether that segregant was
determined to be WT or aneuploid (encoded as 0 or 1).
Data S7. Genetic mapping analysis code
All code used for linkage mapping and multiple testing correction is included as an ‘.R’ file
for use in the R programming language.
Data S8. Statistical analysis code
All code used for statistical analyses is included as an ‘.R’ file for use in the R
programming language.
Data S9. Figure plotting code
All code used for plotting data shown in the main text and manuscript is included as an ‘.R’ file
for use in the R programming language.
Links to Supplementary Data
DataS1:
https://drive.google.com/file/d/1OURXNu8bHkbB6rbUzswF07vBauYfmQ9S/view?usp=sharing
DataS2:
https://drive.google.com/file/d/1_6LaPckKvzLdkSaeefjjQc7yAQ1L-NA6/view?usp=sharing
DataS3:
https://drive.google.com/file/d/1TEmdHQ_5yHsrsiH-1qJZxSr4o0WSwVzd/view?usp=sharing
DataS4:
https://drive.google.com/file/d/1k1-XgWtn2GPRsaPBTnrrGRwu9T4Z_jfS/view?usp=sharing
DataS5:
https://drive.google.com/file/d/1u71y1TV7vdCjrl2dN72uAO1JUlXJunzq/view?usp=sharing
DataS6:
https://drive.google.com/file/d/1OZNNJxVE5lLF3JgwdD5LlfE2TYbCPX7T/view?usp=sharing
DataS7:
https://drive.google.com/file/d/1gUaZgfqJk4jq7AUliAH30VYYzcC42K8n/view?usp=sharing
DataS8:
https://drive.google.com/file/d/1Wnvf-CurH8UceBTYJcE4S78IeWoHwUW6/view?usp=sharing
DataS9:
https://drive.google.com/file/d/1emHITn8rnEB6Y8HpBp9kIG3nIHuMxwxE/view?usp=sharing
Chapter 4 A gene deletion with no growth effect causes widespread molecular changes
This paper is being prepared for submission to G3: Genes, Genomes and Genetics
4.1 Abstract
Many genes, including HOS3 in Saccharomyces cerevisiae, show little to no effect on growth
when deleted. Analysis of these deletions’ impacts on the molecular activities of other genes and
genetic polymorphisms can shed light on why they lack phenotypic consequences. Here, we
analyze the biochemical effects of lysine deacetylase HOS3 deletion on chromatin accessibility
and transcription in a highly heterozygous strain. The deletion has widespread effects on the
molecular activities of genetic variants: thousands of polymorphic sites show allelic changes
in chromatin accessibility, and hundreds of genes, including ones involved in the plasma membrane
and cellular periphery, show changes in allele-specific gene expression. We identify seven
transcription factors whose nuclear activities are likely altered by HOS3 deletion, two of which
play roles in the transcription of genes involved in translation. These results suggest that the
molecular consequences of HOS3 deletion are widespread, but complex compensatory changes
in the transcriptional wiring of translation, the cellular periphery, and potentially other cellular
processes may prevent these changes from affecting growth.
4.2 Introduction
Systematic gene deletion studies have shown that many genes have little to no effect on
growth when deleted [16–20,109,110]. Although biochemical changes often accompany these gene
deletions, changes at the molecular level are often tolerated and produce no phenotypic effects
[111–114]. Most
studies that examine the molecular impacts of gene deletions use genetically homogeneous
systems to determine the mechanisms explaining their lack of effect on phenotype. However,
examination of gene deletions in the presence of genetic diversity may provide additional insight
into this problem because the consequences of a given gene’s deletion on different alleles or even
single nucleotides can be interrogated genome-wide. Allele-specific consequences could make
visible molecular effects that might otherwise be unobservable in genetically homogeneous
systems. These experiments could also identify genetic variants that respond differently to the
same perturbation.
Here, we examine the consequences of deleting the HOS3 gene in Saccharomyces
cerevisiae on chromatin accessibility and transcription. This gene shows no measurable effect on
growth when deleted. HOS3 encodes a histone deacetylase (HDAC) that is similar at the sequence
level to the canonical class II HDAC Hda1, the catalytic subunit of the HDA1 complex [72,115]. While
Hda1 orthologs are highly conserved across eukaryotes, Hos3 appears to be present in fungi
only [115]. Despite in vitro biochemical studies showing Hos3 can deacetylate histones, analysis of
histone deacetylation in the nucleus suggests Hos3 has very limited, if any, in vivo chromatin
activity [72,73]. Only the rDNA locus showed a change in acetylation following HOS3 deletion, and
whether this is a direct or indirect result of lost Hos3 activity has not been determined [73]. Notably,
studies to date have indicated that HOS3 deletion has little to no effect on most examined traits,
suggesting that any impact of Hos3 at the rDNA locus on chromatin or elsewhere is of minor
consequence [72–74].
Emerging evidence suggests that Hos3 has important cellular roles beyond histone
deacetylation, which have yet to be fully elucidated. Hos3 is expressed at a level of ~4,000 proteins
per cell, which are diffusely distributed throughout the cytoplasm except during cell division [74]. In
budding cells, Hos3 initially localizes to the bud neck prior to cytokinesis and then exhibits
enrichment at the nuclear periphery of daughter cells after the migration of a nucleus into the
bud [74,75,116]. A recent study demonstrated that perinuclear Hos3 plays a critical role in delaying
entry into the cell cycle in daughter cells [75]. Deacetylation of the nuclear pore by Hos3 in daughter
cells appears to increase import of the transcriptional repressor Whi5, which may cause the
perinuclear tethering and silencing of the G1/S cyclin CLN2 gene [75]. Disruption of these dynamics
using constitutive Hos3 nuclear localization is lethal, showing that Hos3 dysregulation has
biochemical consequences that can result in large phenotypic effects under certain
circumstances [75].
Figure 4.1 HOS3 deletion has no effect on growth.
Wild type (‘WT’) and hos3∆/hos3∆ versions of a heterozygous BY/3S F1 diploid strain were
grown in rich growth media containing dextrose or ethanol as the carbon source, hereafter referred
to as ‘Glucose’ and ‘Ethanol’, respectively. We found no difference in doubling time in either
environment.
Here, consistent with past work, we found that HOS3 deletion had no discernible effect on
growth (Fig. 4.1). To help clarify the mechanisms that cause HOS3 deletion to show no phenotypic
effect, we performed RNA-seq and ATAC-seq on HOS3 and hos3∆ heterozygous diploids from a
mating of the BY reference strain and 3S clinical isolate (Fig. 4.2) [49,50]. These data were generated
for each strain in two environments—rich medium containing either glucose or ethanol as the
carbon source, enabling characterization of general and allele-specific effects on transcription and
chromatin accessibility. These environments were selected due to widespread transcriptional
differences that underlie growth by fermentation in glucose relative to growth by respiration in
ethanol. This work shows that despite having no effect on growth, HOS3 deletion has widespread
biochemical consequences, which are particularly visible through allele-specific accessibility and
expression changes. The deletion impacts activity of at least seven transcription factors, including
two that regulate expression of key translational components. These complex changes in
transcriptional wiring may explain why the deletion and its widespread biochemical consequences
do not affect growth.
Figure 4.2 Workflow of experiments conducted in this paper.
WT and hos3∆/hos3∆ cultures were grown in biological triplicate in glucose and ethanol. Cells
were harvested and ATAC-seq and RNA-seq libraries were then prepared in parallel from the same
cultures.
4.3 Results and Discussion
HOS3 deletion had pervasive effects on both allele-specific expression (ASE) and allelic
accessibility. 202 and 320 genes showed HOS3-dependent changes in ASE in glucose and ethanol,
respectively (Fig. 4.3A, Fig. S4.1, Fisher’s exact test, 5% FDR). However, the effect of the deletion was
minor in comparison with the effect of environment on the wild type strain, in which over 6 times
as many genes showed ASE changes (Fig. S4.2, 1,631 vs. an average of 261). ASE changes were
largely distinct between environments with only 13% (60 of 462 unique genes) detected in both
glucose and ethanol (Fig. 4.3B). Genes showing ASE in both glucose and ethanol exhibited similar
changes in each environment (Fig. S4.3). ASE changes caused by HOS3 deletion were enriched
for plasma membrane genes and cellular periphery genes (15-20% of ASE genes for each Gene
Ontology (GO) term, p <= 1.3 x 10^-5, hypergeometric test, 5% FDR). In contrast,
75% of genes with ASE changes caused by environmental perturbation of the WT strain function in
the cytoplasm (GO enrichment, p = 6.6 x 10^-7, 5% FDR, hypergeometric test). 1,476 and 2,377 SNPs
exhibited allelic accessibility changes in the deletion in glucose and ethanol, respectively (Fig.
4.4A, Fisher’s exact test; 5% FDR). Again, the effect of the deletion was minor in comparison
with the effect of environment on the wild type strain in which nearly 3-times as many
polymorphisms showed allelic accessibility changes (5,327 vs. an average of 1,927). Allelic
accessibility changes caused by HOS3 deletion were largely environment-specific, with ~85%
(3,225 of 3,853) detected in only one environment (Fig. 4.4B). SNPs showing allelic accessibility
changes in both glucose and ethanol displayed similar changes in each environment (Fig. S4.4).
~50% of allelic accessibility changes occurred in coding regions (722 in glucose and 1,201 in
ethanol) and ~22% (294 in glucose and 566 in ethanol) occurred in promoters (Fig. 4.4C).
We found no GO enrichments for genes showing allelic accessibility changes. ~34% of genes with
allele-specific expression changes (75 of 202 in glucose and 100 of 320 in ethanol) also
experienced allelic accessibility changes. Similarly, fewer than half of ASE genes also experienced
allelic accessibility changes for WT between glucose and ethanol (40%, 663 of 1,631 genes,
binomial test p = 2.1 x 10^-9). The fact that few genes experienced both ASE and allelic accessibility
changes and each dataset’s environmental specificity implies that most biochemical impacts of
HOS3 deletion are highly contextual and may have limited influence on cellular functions.
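To make the statistical step above concrete, the following Python sketch runs a two-sided Fisher's exact test on a 2x2 table of allele-specific read counts (WT vs. deletion, BY vs. 3S) and then applies a Benjamini-Hochberg adjustment at a 5% FDR. This is only an illustration under stated assumptions: the text names Fisher's exact test and a 5% FDR but does not specify the FDR procedure, so Benjamini-Hochberg is assumed here, and the gene names and read counts are invented.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: (BY reads in WT, 3S reads in WT, BY reads in hos3 deletion,
# 3S reads in hos3 deletion). Gene names are placeholders.
counts = {
    "geneA": (60, 40, 30, 70),
    "geneB": (50, 50, 52, 48),
    "geneC": (10, 90, 45, 55),
}
genes = list(counts)
pvals = [fisher_exact_two_sided(*counts[g]) for g in genes]
qvals = benjamini_hochberg(pvals)
significant = [g for g, q in zip(genes, qvals) if q <= 0.05]
```

In this toy input, geneA and geneC shift their BY fraction between strains and pass the threshold, while geneB does not.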
Figure 4.3 HOS3 deletion affects allele-specific expression (ASE) at hundreds of genes.
(A) Genes showing ASE changes in HOS3 deletion are shown (5% FDR). The percent of
expression from the BY chromosome relative to total allelic expression of both BY and 3S
chromosomes for a given gene is shown in WT relative to hos3∆/hos3∆ in glucose (left) and
ethanol (right). Pink datapoints indicate genes showing ASE changes in both glucose and ethanol. (B)
The overlap of HOS3-mediated allele-specific expression changes between environments is
shown.
Figure 4.4 HOS3 deletion affects chromatin accessibility at thousands of SNPs in hundreds of
genes.
(A) Polymorphisms showing allele-specific accessibility changes in HOS3 deletion are shown.
The percent of accessibility coverage at the BY chromosome relative to total accessibility coverage
at both BY and 3S chromosomes is shown in WT relative to hos3∆/hos3∆ in glucose (left) and
ethanol (right). Pink datapoints indicate SNPs showing accessibility changes in both glucose and
ethanol. (B) The overlap of allelic-accessibility changes at SNPs between environments is shown.
(C) The relative location of allelic-accessibility changes in HOS3 deletion is displayed.
We next examined how gene expression irrespective of allelic variation was affected by
HOS3 deletion. 258 and 4 genes were differentially expressed in glucose and ethanol, respectively
(Fig. 4.5A, exact negative binomial test, 5% FDR). The vast majority (84%; 219 of 262) were
reductions in expression. 132 gene ontology terms were enriched among these genes (5% FDR),
including nucleolus, ribosome biogenesis, rRNA metabolic process, preribosome and rRNA
processing, each of which contained at least 80 (30%) of the differentially expressed genes (five
most significant GO terms; p value <= 8.3 x 10^-64, hypergeometric test, 5% FDR). The proteins
encoded by the differentially expressed genes physically interacted with many ‘ribosome’ or
‘rRNA’ proteins (p <= 1.9 x 10^-3; 5% FDR, supplement). Additionally, thecellmap.org identified
the predominant function of the differentially expressed genes as rRNA and ncRNA processing
(Fig. 4.5B, 65 genes; > 9 fold enrichment; p value = 2.07 x 10^-95). De novo motif discovery focused
on the promoters of these differentially expressed genes identified two motifs that closely matched
known ribosome biogenesis transcription factors Tod6 and Sfp1 (hypergeometric test, p = 1 x 10^-80;
1.0 motif match score and p = 1 x 10^-46; 0.96 motif match score) [117,118]. Around half of the
differentially expressed genes contained at least one motif for either Tod6 or Sfp1 (Fig. 4.5C and
Fig. 4.5D, 55%, 144 of 262 genes). In all but one differentially expressed gene with a motif,
expression was reduced (Fig. 4.5C, 99%, 143 of 144 genes). Neither TOD6 nor SFP1 was
differentially expressed in the HOS3 deletion strain, implying that HOS3 deletion altered their
biochemical activities. Together these results show that HOS3 deletion leads to reduced expression
of genes involved in translation, but the mechanisms by which this occurs are unknown.
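The hypergeometric enrichment tests cited in this paragraph can be sketched in a few lines of Python. The function below computes the upper-tail probability of drawing at least k annotated genes in a list of a given size; it is a generic illustration, not the enrichment tool used in this work, and the numbers in the example are toy values rather than counts from the analysis.

```python
from math import comb


def hypergeom_enrichment_p(k, n_drawn, n_annotated, n_total):
    """P(X >= k) when n_drawn genes are sampled without replacement from
    n_total genes, of which n_annotated carry the annotation of interest."""
    upper = min(n_drawn, n_annotated)
    numer = sum(
        comb(n_annotated, x) * comb(n_total - n_annotated, n_drawn - x)
        for x in range(k, upper + 1)
    )
    return numer / comb(n_total, n_drawn)


# Toy example: 20 genes in total, 5 annotated to a term, 4 genes in the
# list of interest, 2 of which carry the annotation.
p_toy = hypergeom_enrichment_p(k=2, n_drawn=4, n_annotated=5, n_total=20)
```

Because the sum is taken over exact integer binomial coefficients before the final division, very small tail probabilities are computed without the cancellation error that comes from subtracting a cumulative probability from one.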
Figure 4.5 HOS3 deletion alters expression at ribosome genes regulated by transcription factors
Tod6 and Sfp1.
(A) Genes that were differentially expressed in HOS3 deletion in glucose (left) are shown in blue
and in ethanol (right) are shown in green. The log2 of the fold change in expression relative to
significance is shown. (B) Thecellmap.org detected enrichment for rRNA and ncRNA function
among the differentially expressed genes. (C) The number of differentially expressed genes
containing Tod6 and Sfp1 motifs and the type of change in expression in HOS3 deletion is
displayed. (D) The overlap in differentially expressed genes containing Tod6 and Sfp1
transcription factor binding motifs is shown.
We also identified chromatin accessibility changes that were caused by HOS3 deletion
irrespective of allelic variation. Consistent with past work showing acetylation changes in HOS3
deletion, the rDNA locus exhibited a change in accessibility between HOS3 and hos3∆ strains in
both environments. Another 85 and 191 genomic regions additionally showed changes in
accessibility in HOS3 deletion in glucose and ethanol, respectively (Fig. 4.6A). However, the
effect of the deletion was minor in comparison with the effect of environment on the wild type
strain, in which nearly 11 times as many sites showed accessibility changes (1,455 WT between
environments vs. an average of 138 between WT and hos3∆ in each environment). HOS3-mediated
accessibility changes were mixed between showing increased and decreased accessibility in
response to the HOS3 deletion with the proportions 0.8:1 in glucose and 1.9:1 in ethanol.
Accessibility changes had size and shape characteristics resembling nucleosomes, implying
nucleosome remodeling occurred in response to the deletion (average median peak size of 156bp).
Among genes within occupancy changes, only 15% (34 of 226) occurred in both environments
(Fig. 4.6B). With the exception of five genes involved in ADP, purine, and RNA metabolic
processes (p <= 2.0 x 10^-5, 5% FDR), changes in accessibility in either environment did not show
GO enrichment. Few genes with accessibility changes were also differentially expressed (only
10%; 7 of 73 genes in glucose and 1%; 2 of 192 genes in ethanol). De novo motif discovery
identified several transcription factor motifs at or surrounding accessibility changes (Fig. 4.6C,
hypergeometric test, p <= 1 x 10^-10 and motif match score >= 0.8). The Reb1 motif was enriched
in both environments and was present in about a third of all accessibility changes (28%; 78 of 276).
Additionally, Fkh2 motifs were enriched in glucose while Rph1, Sko1 and Sut1 motifs were enriched in
ethanol. 33% (28 of 85) of glucose changes and 77% (148 of 191) of ethanol changes contained at
least one of the aforementioned motifs. Only one implicated transcription factor (Fkh2) showed a
change in expression (decrease) in HOS3 deletion in the environment in which it was detected. De
novo motif discovery identified transcription factor motifs among accessibility changes occurring
in the WT strain between glucose and ethanol, including motifs for Abf1, Ime1, Opi1, Phd1, Rap1,
Reb1, Rsc30 and Msn2 (hypergeometric test, p <= 1 x 10^-10 and motif match score >= 0.8). A
single motif (Reb1) overlapped with motifs identified in accessibility changes in HOS3 deletion.
In total, these results suggest that HOS3 deletion modifies the chromatin binding dynamics of five,
mostly distinct transcription factors.
Figure 4.6 HOS3 deletion alters chromatin accessibility in genomic regions enriched for five
specific transcription factors.
(A) Genomic regions that experienced accessibility changes in HOS3 deletion are shown for
glucose and ethanol. The difference in accessibility between WT and HOS3 deletion was
normalized by row and visualized as a heatmap. Changes were approximately nucleosome sized
and mixed for increased accessibility (red) and decreased accessibility (blue) in HOS3 deletion in
glucose and ethanol. (B) The overlap in genes affected by accessibility changes between glucose
(blue) and ethanol (green) is shown. (C) The occurrence of each motif for the five motifs enriched
in accessibility changes is shown.
4.4 Conclusion
While HOS3 deletion has little to no effect on growth [72,74], these data show that widespread
expression and chromatin accessibility changes occur in response to this perturbation.
Thousands of genetic variants across hundreds of genes exhibited altered biochemical effects on
expression and chromatin accessibility in HOS3 deletion. Yet, over 6 times as many genes
experienced ASE changes and nearly 3 times as many genes experienced allelic accessibility
changes in the WT strain between glucose and ethanol. HOS3-mediated ASE changes affected
distinct cellular compartments and functions compared to environment-mediated ASE changes
(plasma membrane and cellular periphery genes compared to cytoplasm genes). Further, the
majority of these HOS3-mediated effects did not fall within specific functional processes and
exhibited high environmental specificity. These data show that HOS3 affects expression and
chromatin at more sites than previously known and in distinct cellular compartments relative to
environmental perturbation.
Past work suggests that Hos3’s direct activity at chromatin, if any, is restricted to the rDNA
locus. Recent work has shown Hos3 regulates protein acetylation in the cytoplasm and can be toxic
if constitutively present in the nucleus [73–75]. Therefore, it is surprising that we observe differential
expression at 262 genes in the deletion. The canonical understanding of deacetylases is that their
activity typically has a repressive effect on transcription, and therefore a loss of deacetylation
would lead to increased expression. However, the opposite effect was seen for most genes here, as
84% of changes were decreases in expression. Many (55%) of the affected genes appear to be
regulated by the ribosome biogenesis transcription factors Tod6 and Sfp1 [117,118]. HOS3 deletion
also affected chromatin accessibility at hundreds of genomic regions. Both increased and decreased
accessibility changes occurred in HOS3 deletion in both glucose and ethanol. The affected regions
were enriched for binding sites of five transcription factors: Reb1, Fkh2,
Rph1, Sko1, and Sut1. Again, while many more regions experienced changes in accessibility in
response to environmental perturbation, the transcription factors implicated in response to HOS3
deletion were mostly distinct. Therefore, HOS3 deletion appears to affect chromatin accessibility
and expression through regulatory changes to transcription factors and their gene targets.
Whether expression or accessibility changes are directly or indirectly caused by Hos3
deacetylase function at identified genes and genomic regions is unclear, but arguably these data
suggest a predominantly indirect model. While in vitro studies have shown that Hos3 can directly
bind DNA containing the Reb1 motif, HOS3 deletion only changes acetylation at the rDNA
locus [72,73]. Whether Hos3 directly binds Reb1 sites is unclear, but in any case these sites are a
minority of affected regions. Furthermore, seven transcription factors are implicated in response
to HOS3 deletion, six of which do not show altered expression in the deletion. It is also possible
that Hos3 regulates acetylation of the transcription factors themselves. While deacetylases are
known to regulate transcription factors in mammals, the transcription factors indicated in this study
are not known to be regulated by acetylation [119,120]. Recent work has shown that Hos3 impacts the
ability of the Whi5 transcription factor to move from the cytoplasm to the nucleus via nuclear pore
acetylation [75]. It is possible that Hos3-regulated acetylation at the nuclear pore might also influence
the activity of other transcription factors, such as those implicated in this work, three of which are
known to be spatially regulated between the cytoplasm and nucleus [121–123]. While it is possible
that Hos3 directly binds DNA at Reb1 motifs, a majority of changes to transcription and chromatin
accessibility appear to be mediated indirectly.
Biochemical changes in HOS3 deletion differed dramatically between allele-specific
expression and differential expression, between allelic accessibility and differential accessibility,
between differential expression and differential accessibility, and between glucose and ethanol
(Fig. 4.7). The lack of overlap suggests most HOS3-mediated biochemical changes are generally
context dependent and may have minimal consequences on cellular function. The data indicate
that complex modification of transcriptional regulatory networks may generally occur in response
to perturbation. In response to HOS3 deletion, at least seven transcription factors could limit the
molecular changes that occur in response to this genetic perturbation. This transcriptional rewiring
might explain why HOS3 deletion has minimal impact on growth despite its dramatic biochemical
effects on genetic variation, expression, and chromatin accessibility. Changes in the regulation of
translation, plasma membrane, and the cellular periphery might also be involved. Additionally, far
fewer molecular changes occurred in response to HOS3 deletion compared to the more dramatic
environmental perturbation of the WT strain. Notably, HOS3 deletion does not affect growth while
environment does, suggesting that many molecular changes can be tolerated, up to a point, before
they have observable consequences on phenotype.
Figure 4.7 Widespread buffering occurs among accessibility and expression changes occurring in
HOS3 deletion.
The incidence of genes showing allele-specific expression (ASE), allelic accessibility (Allelic
Occ), differential expression (DEG), and accessibility changes (Occ) is shown in glucose (left) and
ethanol (right).
4.5 Methods
4.5.1 Generation of strains
A BY MATa ura3∆ strain was mated to a 3S MATα ura3∆ strain to generate a wild type
BY/3S diploid. A CRISPR/Cas9-mediated approach was then used to generate a BY/3S
hos3∆::KanMX/hos3∆::KanMX diploid. We co-transformed a modified version of pML104, which
contained a gRNA that targeted the coding region of HOS3, along with a dsDNA repair construct
composed of KanMX with 30bp upstream and 30bp downstream homology tails to HOS3, using
LiAc transformations [76,104]. Transformants were selected on plates containing G418 and
confirmed with PCR and Sanger sequencing.
4.5.2 Growth curve experiments
BY/3S and BY/3S hos3∆::KanMX/hos3∆::KanMX strains were streaked onto solid rich
media containing glucose (‘YPD’), which was composed of 1% yeast extract (BD P/N 212750),
2% peptone (BD P/N: 211677), 2% dextrose (BD P/N 15530), and 2% agar (BD P/N 214050) and
incubated at 30°C overnight. Individual colonies from each strain, along with haploid BY parent
controls, were inoculated into liquid YPD in randomized positioning of a 96 well plate. This plate
was incubated overnight at 30°C until each well reached stationary phase. Next, setbacks of each
culture were inoculated into a new 96 well plate containing fresh media. In this plate, wells were
randomized such that half contained liquid YPD and half contained liquid YPE media which
contained 2% ethanol (Koptec P/N A06141602W) rather than 2% dextrose. This new plate was
incubated at 30°C, shaking on a Biotek ELx808 Absorbance Microplate reader. Optical density
was measured at 600nm every 15 minutes for twenty-two hours.
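The text does not state how doubling times were derived from these optical density readings. One common approach, shown here purely as an assumed illustration, is to fit a straight line to the natural log of OD600 against time over the exponential phase and convert the slope to a doubling time. The function name and the synthetic readings below are hypothetical.

```python
from math import log


def doubling_time_hours(times_h, ods):
    """Estimate doubling time from exponential-phase OD readings.

    Fits ln(OD) = intercept + slope * t by ordinary least squares and
    returns ln(2) / slope. Assumes all readings are from exponential phase.
    """
    if len(times_h) != len(ods) or len(times_h) < 2:
        raise ValueError("need at least two paired time and OD readings")
    ys = [log(od) for od in ods]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(ys) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, ys))
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    slope = sxy / sxx  # growth rate per hour
    if slope <= 0:
        raise ValueError("no growth detected in the supplied window")
    return log(2) / slope


# Synthetic readings every 15 minutes for a culture that doubles every 1.5 h.
times = [0.25 * i for i in range(9)]
readings = [0.05 * 2 ** (t / 1.5) for t in times]
estimated = doubling_time_hours(times, readings)
```

With real plate-reader data, the window passed to the function would need to exclude lag and stationary phase, since the log-linear model only holds during exponential growth.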
4.5.3 ATAC-seq and RNA-seq Experiments
BY/3S and BY/3S hos3∆::KanMX/hos3∆::KanMX strains were streaked onto YPD plates.
After two days of growth at 30°C, three colonies from each strain were inoculated into separate
liquid YPD starter cultures and grown for ~15 hours at 30°C, until 1:4 dilutions of each were at
least 1.5 OD. An aliquot from each starter culture was then setback into two separate cultures, one
YPD and one YPE culture, for a total of 12 distinct cultures. Each new culture had a starting
concentration of ~5 x 10^5 cells/mL. All cultures were incubated at 30°C for ~5 hours, at which
time cell density was measured to ensure a minimum of one doubling had occurred in each culture.
From each culture, multiple aliquots of 2.5 x 10^6 cells were taken. One aliquot was immediately
used for ATAC-seq experiments, and the other two were stored in RNAlater solution at -80°C. At
a later time, total RNA was extracted from these samples using DNase (Qiagen P/N 79254) and
the Qiagen RNeasy Kit (P/N 74104). Novogene then prepared 150bp paired end mRNA-seq
libraries from the total RNA. All cell counts were measured using a hemocytometer.
4.5.4 ATAC-seq protocol
A modified version of an ATAC-seq protocol optimized for yeast was used [124–126]. Briefly,
cells were first washed in sorbitol buffer (1.4 M sorbitol, 40 mM HEPES-KOH pH 7.5, 0.5 mM
MgCl2), resuspended in 0.5mg/mL zymolyase solution (MP Biomedicals, Zymolyase 100T
250mg solution P/N 8320932), and incubated at 30°C, shaking at 300RPM, for 30 minutes.
Following this, cells were washed in sorbitol buffer, and each sample was resuspended in 2.5uL
Nextera transposase enzyme + 47.5uL Nextera buffer (Illumina DNA Enzyme & Buffer Small Kit
P/N 20034197). Samples were incubated at 37°C, shaking at 300RPM, for 30 minutes.
Immediately following, DNA was purified using Qiagen MinElute DNA purification columns
(P/N 28004), and PCR was performed as described in Buenrostro et al. 2015 using primers with
distinct indices for each sample to a total of no more than 12 cycles of amplification [127]. Samples
were then sent to Novogene for sequencing.
4.5.6 Sequencing and mapping
Both mRNA-seq and ATAC-seq libraries were sequenced by Novogene on Illumina HiSeq
4000 lanes using 150 bp x 150 bp paired-end reads on separate runs. Both mRNA-seq and ATAC-seq
libraries were mapped to a custom BY/3S diploid genome modified from the S288C genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from the Saccharomyces Genome Database,
https://www.yeastgenome.org), which included 44,429 high-confidence SNPs from previous
experiments [10]. For both mRNA-seq and ATAC-seq samples, Trimmomatic v0.38 was first used to
remove adapter contamination from fastq reads [128].
Each mRNA sample was mapped to the diploid genome using STAR aligner v2.6.0 with
the ‘--outMultimapperOrder Random’ option, which ensured that alignments for reads without
SNPs were randomly assigned with equal probability to the BY or 3S chromosome [129]. Samtools was
then used to filter out supplementary and non-primary alignments, which ensured one alignment
per read [78]. We then used the bedtools2 intersect command to obtain the total number of unique reads
mapping to each gene [130]. These data were used to generate a table of read counts per gene per
sample. In parallel, a custom Python script using the pysam fetch function [78] was used to determine
the number of unique reads that contained SNPs on each of the BY and 3S chromosomes for each
gene containing SNPs.
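The filtering and allelic counting steps described above can be illustrated with a minimal sketch. It is not the custom script used in this work: it operates on plain Python dictionaries rather than on BAM records retrieved with pysam, and the chromosome names (chrI_BY, chrI_3S), the record layout, and the function names are all hypothetical. The only facts taken from the SAM format are the flag bits for secondary (0x100) and supplementary (0x800) alignments.

```python
from collections import defaultdict

# SAM flag bits for secondary (0x100) and supplementary (0x800) alignments.
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def keep_primary(alignments):
    """Drop secondary and supplementary records so each read keeps one alignment."""
    return [a for a in alignments if not a["flag"] & (SECONDARY | SUPPLEMENTARY)]


def count_allelic_reads(alignments, snps_by_gene):
    """Count unique reads overlapping at least one SNP, per gene and parental copy.

    alignments: dicts with keys name, flag, chrom, start, end
                (0-based start, end-exclusive; a hypothetical layout).
    snps_by_gene: {gene: {parent: (chrom, [snp positions])}} (hypothetical layout).
    """
    counts = defaultdict(dict)
    primary = keep_primary(alignments)
    for gene, copies in snps_by_gene.items():
        for parent, (chrom, positions) in copies.items():
            # A set of read names guarantees each read is counted once per gene copy.
            names = {
                a["name"]
                for a in primary
                if a["chrom"] == chrom
                and any(a["start"] <= p < a["end"] for p in positions)
            }
            counts[gene][parent] = len(names)
    return dict(counts)


# Toy input: r2 has an extra supplementary record, and r3 covers no SNP.
alignments = [
    {"name": "r1", "flag": 0, "chrom": "chrI_BY", "start": 100, "end": 250},
    {"name": "r2", "flag": 16, "chrom": "chrI_3S", "start": 120, "end": 270},
    {"name": "r2", "flag": 2048, "chrom": "chrI_BY", "start": 120, "end": 270},
    {"name": "r3", "flag": 0, "chrom": "chrI_3S", "start": 300, "end": 450},
]
snps_by_gene = {"geneA": {"BY": ("chrI_BY", [150]), "3S": ("chrI_3S", [150])}}
result = count_allelic_reads(alignments, snps_by_gene)
```

In this toy input, one read is counted for each parental copy of the hypothetical geneA, because the supplementary record and the read that covers no SNP are both excluded.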
ATAC-seq samples were mapped to the diploid genome using BWA version 0.7.7-r44 [77].
Duplicate reads were removed using the Picard Tools version 2.18.22 MarkDuplicatesWithMateCigar
function [131]. Next, a custom shell script was used to shift each read on the forward strand by +4
nt and each read on the reverse strand by -5 nt to reflect the center of transposon insertion events [124].
Samtools mpileup was used to generate pileups with coverage at each nucleotide for each sample [78].
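The strand-specific shift can be written in a few lines. The sketch below is illustrative only: the original step was performed with a custom shell script, and the function name and the (start, end, strand) representation are assumptions made for this example.

```python
def shift_read(start, end, strand, fwd_shift=4, rev_shift=5):
    """Shift a read by +4 nt on the forward strand or -5 nt on the reverse strand.

    start/end are the read's alignment coordinates; strand is '+' or '-'.
    This representation is hypothetical and chosen only for illustration.
    """
    if strand == "+":
        return start + fwd_shift, end + fwd_shift
    if strand == "-":
        return start - rev_shift, end - rev_shift
    raise ValueError(f"unknown strand: {strand!r}")


forward = shift_read(100, 250, "+")   # forward-strand read moved by +4 nt
reverse = shift_read(100, 250, "-")   # reverse-strand read moved by -5 nt
```

A production version would also need to handle reads whose shifted coordinates fall outside the chromosome, which this sketch does not attempt.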
4.5.7 Expression Analyses
Differential expression analysis was conducted using edgeR version 3.22.5 in R [132]. We
used DGEList to store gene counts with 4 groupings, one for each sample type: WT_glucose,
WT_ethanol, hos3_glucose and hos3_ethanol. Glucose and ethanol expression were examined
separately using two distinct DGEList data objects. First, we removed lowly expressed genes with
filterByExpr and determined appropriate scaling factors to account for sequencing depth
differences with calcNormFactors. The classic edgeR pipeline was employed: the estimateDisp
function fit negative binomial models to expression at each gene and exactTest was used to
compare WT and hos3∆/hos3∆ strains. We employed a 5% FDR to determine significant changes
in expression.
To account for regions that may have lost heterozygosity within each sample, we ran
binomial tests on raw counts for each gene on the BY and 3S chromosomes for a given sample. Binomial
p-values were converted into q-values in R [133]. We then ran a custom Hidden Markov Model (HMM)
v 1.0 in R [79] on the vector of q-values on each chromosome to identify long contiguous regions of
significance. The custom probabilities were determined empirically by matching to visual
estimations seen by plotting p-values:
transitionProbability = matrix(c(0.99,0.08,0.01,0.92),2)
and
emissionProbability = matrix(c(0.7,0.4,0.3,0.6),2)
These LoH regions were flagged and excluded from subsequent allele-specific expression (ASE) analyses.
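To make the role of the two matrices concrete, the following sketch decodes the most likely state path with the Viterbi algorithm. It is a pure-Python stand-in, not the R analysis itself. Several details are assumptions because the text does not specify them: equal start probabilities, a binary encoding of q-values (significant or not at an assumed 0.05 cutoff), and the reading of state 0 as heterozygous and state 1 as LoH. The matrices are transcribed using R's column-by-column fill order, so each row sums to one.

```python
import math

# Rows index the current state; columns index the next state (TRANS) or the
# emitted symbol (EMIT). R's matrix() fills column by column, so
# matrix(c(0.99,0.08,0.01,0.92),2) has rows (0.99, 0.01) and (0.08, 0.92).
TRANS = [[0.99, 0.01], [0.08, 0.92]]
EMIT = [[0.70, 0.30], [0.40, 0.60]]
START = [0.5, 0.5]  # assumption: start probabilities are not given in the text


def binarize(qvalues, cutoff=0.05):
    """Assumed encoding: symbol 1 if a gene's q-value is below the cutoff, else 0."""
    return [1 if q < cutoff else 0 for q in qvalues]


def viterbi(symbols, start=START, trans=TRANS, emit=EMIT):
    """Return the most likely state path for a sequence of 0/1 symbols."""
    if not symbols:
        return []
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][symbols[0]]) for s in range(n_states)]
    back = []
    for sym in symbols[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            # Best predecessor state for landing in state s at this position.
            best = max(range(n_states), key=lambda r: prev[r] + math.log(trans[r][s]))
            ptr.append(best)
            score.append(prev[best] + math.log(trans[best][s]) + math.log(emit[s][sym]))
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Toy chromosome: 20 non-significant genes, 20 significant genes, 20 non-significant.
symbols = [0] * 20 + [1] * 20 + [0] * 20
path = viterbi(symbols)
```

With these parameters a long run of significant genes is decoded as a single contiguous block in state 1, whereas an isolated significant gene would not be, because entering state 1 carries a transition probability of only 0.01.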
For 19% (924 of 4,776) of all allelic genes, one sample was flagged. Biological replicates were
summed to produce total coverage at BY and 3S chromosomes for each gene in each sample type,
WT_glucose, WT_ethanol, hos3_glucose and hos3_ethanol. We then identified genes showing
allele-specific expression by conducting binomial tests within each sample type. A 5% false
discovery rate was used to determine significance. Among the subset of genes showing ASE in
either WT or hos3∆/hos3∆ (or both) in a given environment, we then explicitly tested whether
ASE changed between WT and hos3∆/hos3∆ strains with Fisher’s exact test at each gene. A 5%
false discovery rate was used to determine significance.
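The two-stage testing scheme can be sketched as follows. The sketch is a self-contained illustration and not the code used for the analyses, which were run in R. The Benjamini-Hochberg adjustment is an assumption, since the text states only that a 5% false discovery rate was applied. All function names and the example counts are hypothetical.

```python
from math import exp, lgamma, log


def log_choose(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def binomial_test(k, n, p=0.5):
    """Exact two-sided binomial p-value (requires 0 < p < 1).

    Sums the probability of every outcome that is no more likely than k.
    """
    pmf = [exp(log_choose(n, i) + i * log(p) + (n - i) * log(1 - p)) for i in range(n + 1)]
    cutoff = pmf[k] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(x for x in pmf if x <= cutoff))


def fisher_exact(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def log_hyper(x):
        return log_choose(col1, x) + log_choose(n - col1, row1 - x) - log_choose(n, row1)

    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    probs = [exp(log_hyper(x)) for x in range(lo, hi + 1)]
    cutoff = exp(log_hyper(a)) * (1 + 1e-7)
    return min(1.0, sum(p for p in probs if p <= cutoff))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Stage 1: test each gene for allelic imbalance (BY reads vs. 3S reads).
by_counts, s3_counts = [60, 48, 10], [40, 52, 30]
pvals = [binomial_test(b, b + s) for b, s in zip(by_counts, s3_counts)]
ase_genes = [i for i, q in enumerate(benjamini_hochberg(pvals)) if q <= 0.05]

# Stage 2: for an ASE gene, test whether the allelic ratio differs between strains.
# Table rows are strains (WT, deletion); columns are alleles (BY, 3S).
change_p = fisher_exact(10, 30, 25, 20)
```

Restricting the second stage to genes that pass the first, as described above, reduces the number of Fisher's exact tests that enter the multiple-testing correction.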
4.5.8 Accessibility Analyses
Biological replicates were summed to produce total coverage at each base for each sample
type: WT_glucose, WT_ethanol, hos3_glucose and hos3_ethanol. Coverage in sliding windows of
100 bp was computed at each base and normalized by sequencing depth for each sample type. We
then calculated the difference between WT and hos3∆/hos3∆ in glucose and in ethanol. For a given
comparison, positive values indicated lower coverage, and thus more condensed
chromatin, in the HOS3 deletion, while negative values indicated higher coverage and more
accessible chromatin in the HOS3 deletion. The absolute values of all differences were fit to an
exponential distribution with the fitdistr function from the ‘MASS’ package version 7.3.51.1 in
R [134]. The parameters from that fit were used in the pexp function to determine which differences
departed from the theoretical exponential distribution. We then used qexp and a Bonferroni
threshold to identify values that significantly departed from the exponential distribution. We then
recorded contiguous stretches of significant differences as windows. The most significant
difference within a given window was considered the peak.
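The window-calling logic can be summarized in a short sketch. It is an illustration under stated assumptions rather than the analysis code, which was written in R. Windows are assumed to begin at each base (the text does not say whether they were centered), the exponential rate is the maximum-likelihood estimate (1 / mean), and the threshold is the exponential quantile at 1 - alpha / n, which corresponds to a Bonferroni correction over n positions. All function names are hypothetical.

```python
import math


def window_sums(coverage, width=100):
    """Sum per-base coverage in a window starting at each base (truncated at the end)."""
    prefix = [0]
    for c in coverage:
        prefix.append(prefix[-1] + c)
    n = len(coverage)
    return [prefix[min(i + width, n)] - prefix[i] for i in range(n)]


def accessibility_difference(wt_cov, mut_cov, wt_depth, mut_depth, width=100):
    """Depth-normalized windowed coverage in WT minus that in the deletion strain."""
    wt = [w / wt_depth for w in window_sums(wt_cov, width)]
    mut = [m / mut_depth for m in window_sums(mut_cov, width)]
    return [w - m for w, m in zip(wt, mut)]


def exp_threshold(abs_diffs, alpha=0.05):
    """Bonferroni-corrected quantile of an exponential fit to the absolute differences."""
    n = len(abs_diffs)
    rate = n / sum(abs_diffs)              # maximum-likelihood rate, 1 / mean
    return -math.log(alpha / n) / rate     # exponential quantile at 1 - alpha / n


def call_windows(diffs, threshold):
    """Return (start, end, peak) for each contiguous run with |difference| > threshold."""
    windows, start = [], None
    for i, d in enumerate(diffs + [0.0]):  # trailing sentinel closes an open run
        if abs(d) > threshold:
            if start is None:
                start = i
        elif start is not None:
            peak = max(range(start, i), key=lambda j: abs(diffs[j]))
            windows.append((start, i - 1, peak))
            start = None
    return windows


# Toy difference track with a single strong signal at positions 50 to 52.
diffs = [0.1] * 50 + [5.0, 9.0, 6.0] + [0.1] * 50
windows = call_windows(diffs, exp_threshold([abs(d) for d in diffs]))
```

Positive entries in the difference track correspond to lower coverage in the deletion strain, matching the sign convention described above.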
Allelic accessibility was examined by comparing sample types at polymorphisms.
Biological replicates were summed at the BY and 3S chromosomes at each SNP. As in the
allele-specific expression analyses, we then used binomial tests and employed a 5% false discovery rate to
identify sites showing significant allelic accessibility within each sample type. Significant sites
were then further examined between WT and hos3∆/hos3∆ in glucose and in ethanol and between
WT in glucose and WT in ethanol with Fisher’s exact test. Again, a 5% false discovery rate was
used to determine significance. When possible, allelic accessibility changes were attributed to
genes based on the location of SNPs, either in the promoter (-300 bp to -1 bp), coding region, or
3’UTR (+1 bp to +100 bp).
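The attribution rule can be expressed as a small function. This is a sketch under assumptions, not the code used here: coordinates are taken to be 1-based and inclusive, the promoter is measured upstream of the coding start, the 3’UTR is measured downstream of the coding end, and both are interpreted relative to the gene's strand. The function name and argument layout are hypothetical.

```python
def classify_snp(pos, gene_start, gene_end, strand):
    """Assign a SNP to the promoter, coding region, or 3'UTR of one gene.

    gene_start <= gene_end are the 1-based inclusive coding coordinates on the
    reference strand; strand is '+' or '-'. Returns None if the SNP lies outside
    all three regions.
    """
    if gene_start <= pos <= gene_end:
        return "coding"
    if strand == "+":
        upstream, downstream = gene_start - pos, pos - gene_end
    else:
        # On the reverse strand the promoter lies at higher coordinates.
        upstream, downstream = pos - gene_end, gene_start - pos
    if 1 <= upstream <= 300:      # -300 bp to -1 bp
        return "promoter"
    if 1 <= downstream <= 100:    # +1 bp to +100 bp
        return "3'UTR"
    return None


labels = [
    classify_snp(900, 1000, 2000, "+"),
    classify_snp(1500, 1000, 2000, "+"),
    classify_snp(2050, 1000, 2000, "+"),
    classify_snp(2050, 1000, 2000, "-"),
    classify_snp(500, 1000, 2000, "+"),
]
```

Because yeast genes are closely spaced, a SNP may fall in the promoter of one gene and the 3’UTR of its neighbor. This sketch classifies a SNP with respect to a single gene and leaves such cases to the caller.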
4.5.9 De novo motif discovery and gene ontology enrichment analyses
Expression and accessibility changes were examined for de novo motif discovery, gene ontology
(GO) enrichment, and protein-protein interactions using HOMER v.4.11 [135]. The
findMotifsGenome.pl function was used to interrogate accessibility changes, which were defined as
peaks +/- 150 bp. When possible, allelic accessibility changes at polymorphisms were attributed to
genes, which were then interrogated with the findMotifs.pl function. Differentially expressed and
allele-specific expression genes were also investigated for motif discovery with findMotifs.pl. Each
discovered motif was required to have a position weight matrix match score of greater than or
equal to 0.8 relative to a known motif and a p-value of less than or equal to 1 x 10⁻¹⁰ to
be considered significant. Both the findMotifsGenome.pl and findMotifs.pl functions simultaneously
conducted gene ontology enrichment and protein-protein interaction analyses. We used a 5% false
discovery rate to distinguish significant GO enrichments.
4.6 Supplementary Material
Figure S4.1 Correlation of ASE data among individual samples.
The Spearman correlation among all samples on genome-wide allele-specific expression values is
displayed. Samples are ordered by hierarchical clustering.
Figure S4.2 Environmental perturbation of WT strain alters allele-specific expression (ASE) and
allelic accessibility at thousands of polymorphisms and genes.
(A) Genes showing ASE changes between environments are shown (5% FDR). The percent of
expression from the BY chromosome relative to total allelic expression of both BY and 3S
chromosomes for a given gene is shown in glucose relative to ethanol. (B) Polymorphisms showing
allele-specific accessibility changes between environments are shown. The percent of accessibility
coverage at the BY chromosome relative to total accessibility coverage at both BY and 3S
chromosomes is shown in glucose relative to ethanol.
Figure S4.3 Genes experiencing allele-specific expression (ASE) changes in HOS3 deletion in
both environments showed similar effects.
Genes showing ASE changes in the HOS3 deletion strain in both glucose and ethanol, just glucose
and just ethanol are shown. We plot the difference in ASE in WT relative to the difference in ASE
in hos3∆/hos3∆ for glucose relative to ethanol.
Figure S4.4 Polymorphisms that showed allelic occupancy changes in HOS3 deletion in both
environments showed similar effects.
SNPs showing allelic occupancy changes in the HOS3 deletion strain in both glucose and ethanol,
just glucose and just ethanol are shown. We plot the difference in allelic occupancy in WT relative
to the difference in allelic occupancy in hos3∆/hos3∆ for glucose relative to ethanol.
Chapter 5 Concluding Remarks
5.1 Impact of my work
The work presented in this thesis has provided insight into genetic and molecular properties
of the genotype-phenotype relationship. I have engineered different genetic perturbations into the
same budding yeast cross and examined modifications of how genotype specifies phenotype. My
work has addressed these specific questions:
What genetic architectures underlie genetic background effects?
In chapter 2, I identify 7 gene deletions that increased phenotypic variation in a budding
yeast cross. Each of these genetic background effects is an instance of rewiring of the genotype-
phenotype relationship, because while the same genetic variation was present, a new range of
phenotypes was observed. Genetic mapping identified more than 1,000 mutationally-responsive
genetic effects, which explicitly proved that genetic background effects can be highly polygenic.
A majority of these genetic effects consisted of pairs and trios of loci whose collective phenotypic
effects were modified by the gene deletions. Specific instances of genotype-phenotype rewiring
occurred in complex and different ways, including instances where genetic variants gained
phenotypic influence as well as cases where genetic variants lost phenotypic influence. Most
genetic effects that responded to the gene deletions also responded to environment. Furthermore,
a common set of genetic effects exhibited reduced phenotypic effects in each of the examined
background effects, implying a common set of loci may be generally responsive to genetic
perturbations. This work suggests general properties of the genetic architecture underlying genetic
background effects and the genotype-phenotype relationship more generally.
How can a common perturbation produce a spectrum of phenotypic responses across genetically
distinct individuals?
In chapter 3, I characterize the expressivity of a spontaneous mutation. I detected 18 genetic
factors that determined the full range of the mutation’s expressivity ranging from lethal to benign.
Due to tight linkage of two mutationally-responsive genetic variants, initial experiments implied a
single locus determined two qualitative responses to the mutation. Cloning, genetic engineering
and additional crosses revealed this simplicity to be an illusion, and showed that many
mutationally-responsive loci produced a spectrum of phenotypic responses. Importantly, this work
shows that quantitative variation can lead to what appear to be discrete, qualitative outcomes. In
the course of this work, I determined that the spontaneous mutation affected mitochondrial genome
stability. Notably, quantitative variation in this key cellular process was tolerated up to a point.
Interestingly, in some individuals the mutation didn’t affect mitochondrial genome stability,
implying that distinct cellular processes also underlie response to this mutation. Variable
expressivity of mutations is important in other systems, including in humans where it is unknown
why some people with highly penetrant disease alleles show no pathological phenotype [2,30]. This
study suggests many loci and distinct cellular processes may underlie such examples of resilience.
What molecular features underlie modifications of the genotype-phenotype connection?
In chapter 4, I examine the molecular consequences of a gene deletion with no observable
impact on growth. While it is known that a majority of gene deletions have minimal impact on
phenotype, molecular changes often accompany these phenotypically silent perturbations [111–114]. In
this work, I show this gene deletion modifies chromatin accessibility and expression at thousands
of polymorphisms across hundreds of genes, including enrichment for cellular periphery genes.
However, few allelic changes overlapped with expression and accessibility changes that were
irrespective of variation. Instead, differential expression and accessibility changes identified seven
transcription factors whose activities were likely altered by the deletion and resulted in complex
changes to transcriptional regulation of genes involved in translation. Together, these data show
that HOS3 deletion led to many biochemical changes and complex transcriptional re-wiring of
translation and other processes that likely buffered this deletion from affecting phenotype. This
study shows how genetic perturbations can modify mechanisms that govern the genotype-
phenotype relationship in a way that does not alter phenotype.
5.2 Future Directions
Following the work discussed in Chapter 2, it will be important to determine how often
gene deletions result in genetic background effects. The study described in Chapter 2 was limited
to a survey of 47 chromatin genes, and the study design prevented detection of more subtle kinds
of background effects, such as line crossing epistasis [31]. A genome-wide examination of gene
deletions for background effects would answer this question. This study would provide
general insight into background effects by answering questions such as, do genes with particular
functions or genetic relationships tend to produce background effects, are there different genetic
architectures that underlie different kinds of background effects, how often do background effects
produce novel phenotypes, and are there cohorts of loci that respond to sets of background effects.
However, implementation of new technology is required to enable such a high-throughput study.
In fact, this experiment is currently underway in a collaboration between the Ehrenreich Lab and
the Levy Lab which has developed a barcoding system to enable such work. I have contributed to
resource building as detailed in Appendix C which is in part necessary for this follow-up study.
Following the discovery of the spontaneous mrp20-105E mutation and elucidation of the
complex genetics that determine phenotypic response to this mutation, questions arise about how
general it is for spontaneous mutations to alter the genotype-phenotype relationship. As presented
in Chapter 3, strain engineering and budding yeast crosses can be used to more deeply characterize
the expressivity of mutations. Determining general take-aways about how more mutations modify
the genotype-phenotype relationship, such as determining genetic complexity, phenotypic
changes, and cellular processes underlying these responses could provide insights into diagnosis
and pathology in various diseases in which spontaneous mutations are thought to be important [53–55].
These studies might also lead to an improved ability to predict phenotypes from genotypes.
The research presented in Chapters 3 and 4 shows that perturbations to the genotype-
phenotype map can alter cells in complex ways. For instance, natural variation in conjunction with
the spontaneous mrp20-105E mutation affected mitochondrial function. It would be interesting to
know how genetic variation could compensate for this mutation to produce a spectrum of
phenotypic responses. Cloning of the remaining 18 genetic variants influencing this mutation could
provide insight into the molecular means by which this mutation was tolerated. Furthermore,
examining how Mkt1 expression and protein activity are affected by mrp20-105E might clarify
how variation in Mkt1 can impact mitochondrial function and expressivity of this mutation.
Following discovery that HOS3 deletion affects genetic variation’s biochemical impacts
on chromatin accessibility and gene expression but has no impact on growth, it is important to
determine how other gene deletions do or do not influence genetic variation’s effects on
expression and translation. Examining more gene deletions using heterozygous diploids would
answer this important question. If this study examined many kinds of genes, it could also lead to
a better understanding of which molecular consequences do and do not lead to phenotypic changes.
References
1. Chandler, C. H., Chari, S. & Dworkin, I. Does your gene need a background check? How
genetic background impacts the analysis of mutations, genes, and evolution. Trends Genet
29, 358–366 (2013).
2. Nadeau, J. H. Modifier genes in mice and humans. Nat Rev Genet 2, 165–174 (2001).
3. Chow, C. Y. Bringing genetic background into focus. Nat Rev Genet 17, 63–64 (2016).
4. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–
753 (2009).
5. Yang, J. et al. Common SNPs explain a large proportion of the heritability for human height.
Nat Genet 42, 565–569 (2010).
6. Lee, J. J. et al. Gene discovery and polygenic prediction from a 1.1-million-person GWAS of
educational attainment. Nat Genet 50, 1112–1121 (2018).
7. Mackay, T. F. C. Epistasis and quantitative traits: using model organisms to study gene-gene
interactions. Nat Rev Genet 15, 22–33 (2014).
8. Mackay, T. F. & Moore, J. H. Why epistasis is important for tackling complex human
disease genetics. Genome Med 6, 124 (2014).
9. Lee, J. T., Coradini, A. L. V., Shen, A. & Ehrenreich, I. M. Layers of Cryptic Genetic
Variation Underlie a Yeast Complex Trait. Genetics 211, 1469–1482 (2019).
10. Taylor, M. B., Phan, J., Lee, J. T., McCadden, M. & Ehrenreich, I. M. Diverse genetic
architectures lead to the same cryptic phenotype in a yeast cross. Nat Commun 7, 11669
(2016).
11. Forsberg, S. K. G., Bloom, J. S., Sadhu, M. J., Kruglyak, L. & Carlborg, Ö. Accounting for
genetic interactions improves modeling of individual quantitative trait phenotypes in yeast.
Nat Genet 49, 497–503 (2017).
12. Bloom, J. S., Ehrenreich, I. M., Loo, W. T., Lite, T.-L. V. & Kruglyak, L. Finding the
sources of missing heritability in a yeast cross. Nature 494, 234–237 (2013).
13. Bloom, J. S. et al. Genetic interactions contribute less than additive effects to quantitative
trait variation in yeast. Nat Commun 6, 8712 (2015).
14. Lee, J. T., Taylor, M. B., Shen, A. & Ehrenreich, I. M. Multi-locus Genotypes Underlying
Temperature Sensitivity in a Mutationally Induced Trait. PLoS Genet 12, e1005929 (2016).
15. Ziv, N., Shuster, B. M., Siegal, M. L. & Gresham, D. Resolving the Complex Genetic Basis
of Phenotypic Variation and Variability of Cellular Growth. Genetics 206, 1645–1657
(2017).
16. Winzeler, E. A. et al. Functional characterization of the S. cerevisiae genome by gene
deletion and parallel analysis. Science 285, 901–906 (1999).
17. Kamath, R. S. et al. Systematic functional analysis of the Caenorhabditis elegans genome
using RNAi. Nature 421, 231–237 (2003).
18. Baba, T. et al. Construction of Escherichia coli K-12 in-frame, single-gene knockout
mutants: the Keio collection. Mol Syst Biol 2, 2006.0008 (2006).
19. Hentges, K. E., Pollock, D. D., Liu, B. & Justice, M. J. Regional variation in the density of
essential genes in mice. PLoS Genet 3, e72 (2007).
20. Dietzl, G. et al. A genome-wide transgenic RNAi library for conditional gene inactivation in
Drosophila. Nature 448, 151–156 (2007).
21. Tong, A. H. Y. & Boone, C. Synthetic genetic array analysis in Saccharomyces cerevisiae.
Methods Mol Biol 313, 171–192 (2006).
22. Tong, A. H. Y. et al. Global mapping of the yeast genetic interaction network. Science 303,
808–813 (2004).
23. Tong, A. H. et al. Systematic genetic analysis with ordered arrays of yeast deletion mutants.
Science 294, 2364–2368 (2001).
24. Costanzo, M. et al. A global genetic interaction network maps a wiring diagram of cellular
function. Science 353, aaf1420 (2016).
25. Costanzo, M. et al. The Genetic Landscape of a Cell. Science 327, 425–431 (2010).
26. Li, Z. et al. Systematic exploration of essential yeast gene function with temperature-
sensitive mutants. Nat Biotechnol 29, 361–367 (2011).
27. Calarco, J. A. & Norris, A. D. Synthetic Genetic Interaction (CRISPR-SGI) Profiling in
Caenorhabditis elegans. Bio Protoc 8, e2756 (2018).
28. Byrne, A. B. et al. A global analysis of genetic interactions in Caenorhabditis elegans. J Biol
6, 8 (2007).
29. Taylor, M. B. & Ehrenreich, I. M. Genetic interactions involving five or more genes
contribute to a complex trait in yeast. PLoS Genet 10, e1004324 (2014).
30. Riordan, J. D. & Nadeau, J. H. From Peas to Disease: Modifier Genes, Network Resilience,
and the Genetics of Health. Am J Hum Genet 101, 177–191 (2017).
31. Richardson, J. B., Uppendahl, L. D., Traficante, M. K., Levy, S. F. & Siegal, M. L. Histone
variant HTZ1 shows extensive epistasis with, but does not increase robustness to, new
mutations. PLoS Genet 9, e1003733 (2013).
32. Taylor, M. B. & Ehrenreich, I. M. Transcriptional Derepression Uncovers Cryptic Higher-
Order Genetic Interactions. PLoS Genet 11, e1005606 (2015).
33. Chandler, C. H., Chari, S., Tack, D. & Dworkin, I. Causes and consequences of genetic
background effects illuminated by integrative genomic analysis. Genetics 196, 1321–1336
(2014).
34. Vu, V. et al. Natural Variation in Gene Expression Modulates the Severity of Mutant
Phenotypes. Cell 162, 391–402 (2015).
35. Tirosh, I., Reikhav, S., Sigal, N., Assia, Y. & Barkai, N. Chromatin regulators as capacitors
of interspecies variations in gene expression. Mol Syst Biol 6, 435 (2010).
36. Dworkin, I. et al. Genomic consequences of background effects on scalloped mutant
expressivity in the wing of Drosophila melanogaster. Genetics 181, 1065–1076 (2009).
37. Rutherford, S. L. & Lindquist, S. Hsp90 as a capacitor for morphological evolution. Nature
396, 336–342 (1998).
38. Queitsch, C., Sangster, T. A. & Lindquist, S. Hsp90 as a capacitor of phenotypic variation.
Nature 417, 618–624 (2002).
39. Jarosz, D. F. & Lindquist, S. Hsp90 and environmental stress transform the adaptive value of
natural genetic variation. Science 330, 1820–1824 (2010).
40. Geiler-Samerotte, K. A., Zhu, Y. O., Goulet, B. E., Hall, D. W. & Siegal, M. L. Selection
Transforms the Landscape of Genetic Variation Interacting with Hsp90. PLoS Biol 14,
e2000465 (2016).
41. Ehrenreich, I. M. Epistasis: Searching for Interacting Genetic Variants Using Crosses.
Genetics 206, 531–535 (2017).
42. Carlborg, O. & Haley, C. S. Epistasis: too often neglected in complex trait studies? Nat Rev
Genet 5, 618–625 (2004).
43. She, R. & Jarosz, D. F. Mapping Causal Variants with Single-Nucleotide Resolution Reveals
Biochemical Drivers of Phenotypic Change. Cell 172, 478-490.e15 (2018).
44. Treusch, S., Albert, F. W., Bloom, J. S., Kotenko, I. E. & Kruglyak, L. Genetic Mapping of
MAPK-Mediated Complex Traits Across S. cerevisiae. PLoS Genet 11, e1004913 (2015).
45. Churchill, G. A. et al. The Collaborative Cross, a community resource for the genetic
analysis of complex traits. Nat Genet 36, 1133–1137 (2004).
46. Long, A. D., Macdonald, S. J. & King, E. G. Dissecting Complex Traits Using the
Drosophila Synthetic Population Resource. Trends Genet 30, 488–495 (2014).
47. Porto, A., Schmelter, R., VandeBerg, J. L., Marroig, G. & Cheverud, J. M. Evolution of the
Genotype-to-Phenotype Map and the Cost of Pleiotropy in Mammals. Genetics 204, 1601–
1612 (2016).
48. Carlborg, O., Hocking, P. M., Burt, D. W. & Haley, C. S. Simultaneous mapping of epistatic
QTL in chickens reveals clusters of QTL pairs with similar genetic effects on growth. Genet
Res 83, 197–209 (2004).
49. Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–341 (2009).
50. Schacherer, J., Shapiro, J. A., Ruderfer, D. M. & Kruglyak, L. Comprehensive
polymorphism survey elucidates population structure of Saccharomyces cerevisiae. Nature
458, 342–345 (2009).
51. Dowell, R. D. et al. Genotype to phenotype: a complex problem. Science 328, 469 (2010).
52. Cooper, D. N., Krawczak, M., Polychronakos, C., Tyler-Smith, C. & Kehrer-Sawatzki, H.
Where genotype is not predictive of phenotype: towards an understanding of the molecular
basis of reduced penetrance in human inherited disease. Hum Genet 132, 1077–1130 (2013).
53. Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly
associated with autism. Nature 485, 237–241 (2012).
54. Jin, S. C. et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart
disease probands. Nat Genet 49, 1593–1601 (2017).
55. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature
506, 179–184 (2014).
56. Jerison, E. R. et al. Genetic variation in adaptability and pleiotropy in budding yeast. Elife 6,
e27167 (2017).
57. Carlborg, O., Jacobsson, L., Ahgren, P., Siegel, P. & Andersson, L. Epistasis and the release
of genetic variation during long-term selection. Nat Genet 38, 418–420 (2006).
58. Hemani, G., Knott, S. & Haley, C. An evolutionary perspective on epistasis and the missing
heritability. PLoS Genet 9, e1003295 (2013).
59. Taylor, M. B. & Ehrenreich, I. M. Higher-order genetic interactions and their contribution to
complex traits. Trends Genet 31, 34–40 (2015).
60. Sackton, T. B. & Hartl, D. L. Genotypic Context and Epistasis in Individuals and
Populations. Cell 166, 279–287 (2016).
61. Chandler, C. H. et al. How well do you know your mutation? Complex effects of genetic
background on expressivity, complementation, and ordering of allelic effects. PLoS Genet
13, e1007075 (2017).
62. Matsui, T., Lee, J. T. & Ehrenreich, I. M. Genetic suppression: Extending our knowledge
from lab experiments to natural populations. Bioessays 39, (2017).
63. Chandler, C. H., Chari, S., Tack, D. & Dworkin, I. Causes and consequences of genetic
background effects illuminated by integrative genomic analysis. Genetics 196, 1321–1336
(2014).
64. Paaby, A. B. et al. Wild worm embryogenesis harbors ubiquitous polygenic modifier
variation. Elife 4, (2015).
65. van Swinderen, B. & Greenspan, R. J. Flexibility in a gene network affecting a simple
behavior in Drosophila melanogaster. Genetics 169, 2151–2163 (2005).
66. Schell, R., Mullis, M. & Ehrenreich, I. M. Modifiers of the Genotype-Phenotype Map:
Hsp90 and Beyond. PLoS Biol 14, e2001015 (2016).
67. Paaby, A. B. & Rockman, M. V. Cryptic genetic variation: evolution’s hidden substrate. Nat
Rev Genet 15, 247–258 (2014).
68. Gibson, G. & Dworkin, I. Uncovering cryptic genetic variation. Nat Rev Genet 5, 681–690
(2004).
69. Bergman, A. & Siegal, M. L. Evolutionary capacitance as a general feature of complex gene
networks. Nature 424, 549–552 (2003).
70. Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128,
707–719 (2007).
71. Rando, O. J. & Winston, F. Chromatin and transcription in yeast. Genetics 190, 351–387
(2012).
72. Carmen, A. A. et al. Yeast HOS3 forms a novel trichostatin A-insensitive homodimer with
intrinsic histone deacetylase activity. Proc Natl Acad Sci U S A 96, 12356–12361 (1999).
73. Robyr, D. et al. Microarray deacetylation maps determine genome-wide functions for yeast
histone deacetylases. Cell 109, 437–446 (2002).
74. Wang, M. & Collins, R. N. A lysine deacetylase Hos3 is targeted to the bud neck and
involved in the spindle position checkpoint. Mol Biol Cell 25, 2720–2734 (2014).
75. Kumar, A. et al. Daughter-cell-specific modulation of nuclear pore complexes controls cell
cycle entry during asymmetric division. Nat Cell Biol 20, 432–442 (2018).
76. Gietz, R. D. & Woods, R. A. Transformation of yeast by lithium acetate/single-stranded
carrier DNA/polyethylene glycol method. Methods Enzymol 350, 87–96 (2002).
77. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754–1760 (2009).
78. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–
2079 (2009).
79. Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech
recognition. Proceedings of the IEEE 77, 257–286 (1989).
80. Matsui, T. & Ehrenreich, I. M. Gene-Environment Interactions in Stress Response
Contribute Additively to a Genotype-Environment Interaction. PLoS Genet 12, e1006158
(2016).
81. Griffiths, A. J. F., Wessler, S. R., Carroll, S. B. & Doebley, J. Introduction to genetic
analysis. (Macmillan Publishers, 2015).
82. Raj, A., Rifkin, S. A., Andersen, E. & van Oudenaarden, A. Variability in gene expression
underlies incomplete penetrance. Nature 463, 913–918 (2010).
83. Wagner, M. R. et al. Microbe-dependent heterosis in maize. Proc Natl Acad Sci U S A 118,
e2021965118 (2021).
84. Chari, S. & Dworkin, I. The conditional nature of genetic interactions: the consequences of
wild-type backgrounds on mutational interactions in a genome-wide modifier screen. PLoS
Genet 9, e1003661 (2013).
85. Mullis, M. N., Matsui, T., Schell, R., Foree, R. & Ehrenreich, I. M. The complex
underpinnings of genetic background effects. Nat Commun 9, 3548 (2018).
86. Hou, J., Tan, G., Fink, G. R., Andrews, B. J. & Boone, C. Complex modifier landscape
underlying genetic background effects. Proc Natl Acad Sci U S A 116, 5045–5054 (2019).
87. Parts, L. et al. Natural variants suppress mutations in hundreds of essential genes. Mol Syst
Biol 17, e10138 (2021).
88. Galardini, M. et al. The impact of the genetic background on gene deletion phenotypes in
Saccharomyces cerevisiae. Mol Syst Biol 15, e8831 (2019).
89. Fearon, K. & Mason, T. L. Structure and function of MRP20 and MRP49, the nuclear genes
for two proteins of the 54 S subunit of the yeast mitochondrial ribosome. J Biol Chem 267,
5162–5170 (1992).
90. Koc, E. C. et al. The large subunit of the mammalian mitochondrial ribosome. Analysis of
the complement of ribosomal proteins present. J Biol Chem 276, 43958–43969 (2001).
91. Dimitrov, L. N., Brem, R. B., Kruglyak, L. & Gottschling, D. E. Polymorphisms in multiple
genes contribute to the spontaneous mitochondrial genome instability of Saccharomyces
cerevisiae S288C strains. Genetics 183, 365–383 (2009).
92. Lipinski, K. A., Kaniak-Golik, A. & Golik, P. Maintenance and expression of the S.
cerevisiae mitochondrial genome--from genetics to evolution and systems biology. Biochim
Biophys Acta 1797, 1086–1098 (2010).
93. Shadel, G. S. Yeast as a model for human mtDNA replication. Am J Hum Genet 65, 1230–
1237 (1999).
94. Kucejova, B., Li, L., Wang, X., Giannattasio, S. & Chen, X. J. Pleiotropic effects of the yeast
Sal1 and Aac2 carriers on mitochondrial function via an activity distinct from adenine
nucleotide transport. Mol Genet Genomics 280, 25–39 (2008).
95. Chen, R. et al. Analysis of 589,306 genomes identifies individuals resilient to severe
Mendelian childhood diseases. Nat Biotechnol 34, 531–538 (2016).
96. Narasimhan, V. M. et al. Health and population effects of rare gene knockouts in adult
humans with related parents. Science 352, 474–477 (2016).
97. Carlborg, O. & Haley, C. S. Epistasis: too often neglected in complex trait studies? Nat Rev
Genet 5, 618–625 (2004).
98. Shao, H. et al. Genetic architecture of complex traits: large phenotypic effects and pervasive
epistasis. Proc Natl Acad Sci U S A 105, 19910–19914 (2008).
99. Campbell, R. F., McGrath, P. T. & Paaby, A. B. Analysis of Epistasis in Natural Traits Using
Model Organisms. Trends Genet 34, 883–898 (2018).
100. Kuzmin, E. et al. Systematic analysis of complex genetic interactions. Science 360,
eaao1729 (2018).
101. Benaglia, T., Chauveau, D., Hunter, D. R. & Young, D. S. mixtools: An R Package for
Analyzing Mixture Models. Journal of Statistical Software 32, 1–29 (2009).
102. Churchill, G. A. & Doerge, R. W. Empirical threshold values for quantitative trait
mapping. Genetics 138, 963–971 (1994).
103. Steinmetz, L. M. et al. Dissecting the architecture of a quantitative trait locus in yeast.
Nature 416, 326–330 (2002).
104. Laughery, M. F. et al. New vectors for simple and streamlined CRISPR-Cas9 genome
editing in Saccharomyces cerevisiae. Yeast 32, 711–720 (2015).
105. Kannan, K. et al. One step engineering of the small-subunit ribosomal RNA using
CRISPR/Cas9. Sci Rep 6, 30714 (2016).
106. Thompson, J. R., Register, E., Curotto, J., Kurtz, M. & Kelly, R. An improved protocol
for the preparation of yeast cells for transformation by electroporation. Yeast 14, 565–571
(1998).
107. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred
kilobases. Nat Methods 6, 343–345 (2009).
108. Sikorski, R. S. & Hieter, P. A system of shuttle vectors and yeast host strains designed
for efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19–27
(1989).
109. Giaever, G. et al. Functional profiling of the Saccharomyces cerevisiae genome. Nature
418, 387–391 (2002).
110. Korona, R. Gene dispensability. Curr Opin Biotechnol 22, 547–551 (2011).
111. Gu, Z. et al. Role of duplicate genes in genetic robustness against null mutations. Nature
421, 63–66 (2003).
112. Kafri, R., Bar-Even, A. & Pilpel, Y. Transcription control reprogramming in genetic
backup circuits. Nat Genet 37, 295–299 (2005).
113. Kafri, R., Dahan, O., Levy, J. & Pilpel, Y. Preferential protection of protein interaction
network hubs in yeast: Evolved functionality of genetic redundancy. Proc Natl Acad Sci U S
A 105, 1243–1248 (2008).
114. Lin, Y.-S., Hwang, J.-K. & Li, W.-H. Protein complexity, gene duplicability and gene
dispensability in the yeast genome. Gene 387, 109–117 (2007).
115. Rundlett, S. E. et al. HDA1 and RPD3 are members of distinct yeast histone deacetylase
complexes that regulate silencing and transcription. Proc Natl Acad Sci U S A 93, 14503–
14508 (1996).
116. Lam, C. et al. A visual screen of protein localization during sporulation identifies new
components of prospore membrane-associated complexes in budding yeast. Eukaryot Cell
13, 383–391 (2014).
117. Fingerman, I., Nagaraj, V., Norris, D. & Vershon, A. K. Sfp1 plays a key role in yeast
ribosome biogenesis. Eukaryot Cell 2, 1061–1068 (2003).
118. Huber, A. et al. Sch9 regulates ribosome biogenesis via Stb3, Dot6 and Tod6 and the
histone deacetylase complex RPD3L. EMBO J 30, 3052–3064 (2011).
119. Choudhary, C., Weinert, B. T., Nishida, Y., Verdin, E. & Mann, M. The growing
landscape of lysine acetylation links metabolism and cell signalling. Nat Rev Mol Cell Biol
15, 536–550 (2014).
120. Park, J.-M., Jo, S.-H., Kim, M.-Y., Kim, T.-H. & Ahn, Y.-H. Role of transcription factor
acetylation in the regulation of metabolic homeostasis. Protein Cell 6, 804–813 (2015).
121. Pascual-Ahuir, A., Posas, F., Serrano, R. & Proft, M. Multiple levels of control regulate
the yeast cAMP-response element-binding protein repressor Sko1p in response to stress. J
Biol Chem 276, 37373–37378 (2001).
122. Marion, R. M. et al. Sfp1 is a stress- and nutrient-sensitive regulator of ribosomal protein
gene expression. Proc Natl Acad Sci U S A 101, 14315–14322 (2004).
123. Tkach, J. M. et al. Dissecting DNA damage response pathways by analyzing protein
localization and abundance changes during DNA replication stress. Nat Cell Biol 14, 966–
976 (2012).
124. Buenrostro, J. D., Giresi, P. G., Zaba, L. C., Chang, H. Y. & Greenleaf, W. J.
Transposition of native chromatin for fast and sensitive epigenomic profiling of open
chromatin, DNA-binding proteins and nucleosome position. Nat Methods 10, 1213–1218
(2013).
125. Buenrostro, J. D., Wu, B., Chang, H. Y. & Greenleaf, W. J. ATAC-seq: A Method for
Assaying Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol 109, 21.29.1-
21.29.9 (2015).
126. Schep, A. N. et al. Structured nucleosome fingerprints enable high-resolution mapping of
chromatin architecture within regulatory regions. Genome Res 25, 1757–1770 (2015).
127. Buenrostro, J., Wu, B., Chang, H. & Greenleaf, W. ATAC-seq: A Method for Assaying
Chromatin Accessibility Genome-Wide. Curr Protoc Mol Biol 109, 21.29.1-21.29.9 (2015).
128. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics 30, 2114–2120 (2014).
129. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21
(2013).
130. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing
genomic features. Bioinformatics 26, 841–842 (2010).
131. Picard Tools. (Broad Institute).
132. Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for
differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140
(2010).
133. Storey, J., Bass, A., Dabney, A. & Robinson, D. qvalue: Q-value estimation for false
discovery rate control. (2021).
134. Venables, W. & Ripley, B. Modern Applied Statistics with S. (2002).
135. Heinz, S. et al. Simple combinations of lineage-determining transcription factors prime
cis-regulatory elements required for macrophage and B cell identities. Mol Cell 38, 576–589
(2010).
136. Seidl, F. et al. Genome of Spea multiplicata, a Rapidly Developing, Phenotypically
Plastic, and Desert-Adapted Spadefoot Toad. G3 (Bethesda) 9, 3909–3919 (2019).
137. Seidl, F. et al. Variation in hybrid gene expression: implications for the evolution of
genetic incompatibilities in interbreeding species. Mol Ecol 28, 4667–4679 (2019).
138. Levis, N. A., Reed, E. M. X., Pfennig, D. W. & Burford Reiskind, M. O. Identification of
candidate loci for adaptive phenotypic plasticity in natural populations of spadefoot toads.
Ecol Evol 10, 8976–8988 (2020).
Appendix A: Modifiers of the Genotype–Phenotype Map: Hsp90 and Beyond
This mini review was published as a primer to accompany Geiler-Samerotte et al. 2016 (ref. 40) in
PLOS Biology.
A.1 Summary of Contribution
As co-first author on this review, I conceptualized and co-wrote this paper along with Martin
Mullis and Ian Ehrenreich.
A.2 Abstract
Disruption of certain genes alters the heritable phenotypic variation among individuals. Research
on the chaperone Hsp90 has played a central role in determining the genetic basis of this
phenomenon, which may be important to evolution and disease. Key studies have shown that
Hsp90 perturbation modifies the effects of many genetic variants throughout the genome. These
modifications collectively transform the genotype–phenotype map, often resulting in a net increase
or decrease in heritable phenotypic variation. Here, we summarize some of the foundational work
on Hsp90 that led to these insights, discuss a framework for interpreting this research that is
centered upon the standard genetics concept of epistasis, and propose major questions that future
studies in this area should address.
A.3 Main Text
A.3.1 Perturbation of Hsp90 Impacts Heritable Phenotypic Variation
How particular genes modulate the heritable phenotypic variation that segregates within
populations is not fully understood. Work on the chaperone Hsp90 has generated some of the most
provocative insights into this problem. In a landmark paper, Rutherford and Lindquist
demonstrated that perturbation of Hsp90 reveals heritable phenotypic variation among genetically
distinct Drosophila isolates [1]. Similar experiments in Arabidopsis [2] and yeast [3] subsequently
showed that this effect generalizes across species.
The initial interpretation of these results was that Hsp90 is a buffer that suppresses the
effects of genetic variation on a global scale [1,2] (Box 1). Perturbation of Hsp90 was thus thought
to uncover cryptic polymorphisms that do not typically show effects, thereby expanding the
amount of heritable phenotypic variation in a population [1,2,4–7] (Fig 1A). Because of its role in
buffering, Hsp90 began to be regarded as a capacitor that facilitates the accumulation of cryptic
genetic variation [1–3,8–12].
The buffer and capacitor concepts were appealing because of Hsp90’s function as a
chaperone for many key signaling proteins and transcriptional regulators [13,14]. However, as
additional studies examined the influence of Hsp90 on genetic variation, a more complex picture
emerged. Contrary to previous findings, instances were found in which perturbation of Hsp90
reduced heritable phenotypic variation [3,8] (Fig 1B). This suggested that Hsp90 does not always
act as a genetic buffer but instead sometimes makes possible (or potentiates) the effects of genetic
variation [3,8].
Box 1. Glossary of Relevant Terms
Capacitor. A gene that facilitates the accumulation of cryptic genetic variation.
Cryptic genetic variants. Genetic variants that only show effects under atypical
conditions, such as when specific genes are compromised or the environment markedly
changes.
Epistasis. When the effect of one variant is influenced by one or more other variants.
Genetic buffer. A gene that suppresses the effects of one or more variants.
Genotype–phenotype map. The correspondence between genotypes and the
phenotypes they produce.
Global modifier. A gene that acts as an epistatic modifier for many variants.
Magnitude epistasis. An epistatic interaction that involves a variant’s effect size
changing while its effect sign remains the same.
Modifier. A gene or variant that influences the effect of another through an epistatic
interaction.
Mutation accumulation lines. Strains that have accumulated new mutations during
many generations of minimal selection.
New mutations. Recently arisen variants that have not yet been exposed to natural
selection.
Potentiator. A gene that enables certain variants to exhibit effects.
Sign epistasis. An epistatic interaction that involves a variant’s effect changing in
sign (i.e., from positive to negative or vice versa).
Standing genetic variation. Genetic variation that segregates within populations and
may have been subject to natural selection.
A.3.2 Selection Complicates Inferences about Buffering and Potentiation by Hsp90
The extent to which Hsp90 acts as a buffer or a potentiator has bearing on our expectations for
how new mutations and standing genetic variants will affect heritable phenotypic variation.
Fig 1. Effects of genetic buffers and potentiators on heritable phenotypic variation. In both A and
B, the gray distributions indicate when a buffer or potentiator is functional, while the blue
distributions show when a buffer or potentiator is perturbed. As shown in A, buffers suppress
heritable phenotypic variation among individuals. If these buffers are compromised, heritable
phenotypic variation increases. B illustrates how potentiators act in an opposite manner to buffers,
with heritable phenotypic variation decreasing when potentiators are disrupted.
Fig 2. Inferences about the extent of buffering and potentiation are affected by natural selection.
Here, we illustrate how selection may lead to misinterpretations regarding the extent to which
Hsp90 acts as a buffer and a potentiator by disproportionately acting against Hsp90-potentiated
heritable phenotypic variation. We present three scenarios: Hsp90 exclusively acts as a buffer,
Hsp90 exclusively acts as a potentiator, and Hsp90 acts equally often as a buffer and a potentiator.
In each case, we show the genotypes and heritable phenotypic variation that existed prior to
selection as well as the genotypes and heritable phenotypic variation that remain after selection.
Within a given Hsp90 state, black circles represent distinct genotypes. Purple lines indicate the
change in an individual’s phenotype upon Hsp90 perturbation. Gray boxes show the range of
phenotypes that are tolerated by selection under normal conditions in which Hsp90 is functional.
In this figure, we only consider selection that acts to stabilize the population around a particular
mean value.
Defining how often Hsp90 occupies each of these roles is important but requires examining
genetic variation that has not been exposed to natural selection. This is because Hsp90-potentiated
phenotypic variation is visible to selection under normal conditions, whereas Hsp90-buffered
phenotypic variation may not be (Fig 2). Thus, standing genetic variation might be biased to
provide evidence for buffering (Fig 2).
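The selection bias described above can be illustrated with a toy simulation. The sketch below is not the analysis performed by Geiler-Samerotte and coauthors. It assumes that each genotype carries a single variant whose effect is either fully buffered or fully potentiated by Hsp90, that effects are normally distributed, and that stabilizing selection removes any genotype whose phenotype under functional Hsp90 falls outside a tolerated range. The function name `simulate_selection`, its parameters, and all numerical values are hypothetical.

```python
import random


def variance(values):
    """Population variance of a list of numbers (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def simulate_selection(n=20000, frac_buffered=0.5, tolerated=1.0, seed=0):
    """Return (var_functional, var_perturbed) before and after selection.

    Each genotype is a pair (phenotype with Hsp90 functional,
    phenotype with Hsp90 perturbed), measured relative to the optimum.
    """
    rng = random.Random(seed)
    genotypes = []
    for _ in range(n):
        effect = rng.gauss(0.0, 1.0)
        if rng.random() < frac_buffered:
            # Buffered variant: hidden while Hsp90 is functional,
            # revealed when Hsp90 is perturbed.
            genotypes.append((0.0, effect))
        else:
            # Potentiated variant: visible while Hsp90 is functional,
            # lost when Hsp90 is perturbed.
            genotypes.append((effect, 0.0))

    # Stabilizing selection acts only on the phenotype expressed under
    # normal conditions, i.e., with Hsp90 functional.
    survivors = [g for g in genotypes if abs(g[0]) <= tolerated]

    def summarize(population):
        return (variance([g[0] for g in population]),
                variance([g[1] for g in population]))

    return summarize(genotypes), summarize(survivors)


before, after = simulate_selection()
print("before selection (functional, perturbed):", before)
print("after selection  (functional, perturbed):", after)
```

Under these assumptions, buffered and potentiated variants contribute about equally to phenotypic variance before selection. Among the survivors, however, variance is considerably larger when Hsp90 is perturbed than when it is functional, a pattern that taken alone would suggest Hsp90 acts mainly as a buffer.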
In this issue of PLOS Biology, Geiler-Samerotte, Siegal, and coauthors measure the degree
to which Hsp90 buffers and potentiates the effects of new mutations [15]. To do this, the authors
utilize a panel of yeast mutation accumulation (MA) lines, which accrued mutations during many
generations of growth in the absence of selection. They measure the heritable phenotypic variation
shown by these MA lines—as well as by wild isolates and cross progeny from the same species—
before and after disruption of Hsp90. They then compare how these three groups, which differ in
their exposure to selection, respond to Hsp90 perturbation.
Following Hsp90 impairment, heritable phenotypic variation is reduced in the MA lines
but increased in the isolates [15]. Given that the MA lines have not been exposed to selection and
the isolates have, this result implies that Hsp90 predominantly acts as a potentiator and that
selection has acted against Hsp90-potentiated phenotypic variation in nature. Consistent with this
interpretation, cross progeny that carry new combinations of standing variants also show reduced
heritable phenotypic variation upon Hsp90 perturbation [15]. This finding suggests that standing
variants with Hsp90-potentiated effects are common in nature, but combinations of these variants
that produce extreme phenotypes are selected against. Together, the results from the MA lines and
cross progeny, both of which represent groups of strains that have not been biased by selection,
support a view that Hsp90 acts more as a potentiator than a buffer.
A.3.3 Hsp90 Is a Global Modifier That Shows Extensive Epistasis
With continued research, the utility of the terms buffer, capacitor, and potentiator becomes
less clear. Not only do some of these words have meanings that seem mutually exclusive (e.g.,
buffer versus potentiator), but their relevance is also shaped by contextual factors, such as the type
of genetic variation being examined and its exposure to selection. Moreover, these terms are often
used to explain increases and decreases in heritable phenotypic variation, which are in fact mediated
by specific genetic variants that may show different types of responses to Hsp90 perturbation
[3,15–17] (Fig 3).
In light of these considerations, the standard genetics concept of epistasis (or genetic
interaction) may be a more straightforward way to communicate Hsp90’s effect on heritable
phenotypic variation [15,18]. Epistasis occurs when the effect of one variant depends on one or
more other variants [19–22]. Thus, variants that show changes in effect upon Hsp90 perturbation
by definition epistatically interact with Hsp90.
Examining response to Hsp90 perturbation from the perspective of epistasis has multiple
advantages. This framing encompasses situations in which Hsp90 buffers or potentiates the effects
of variants (Fig 3). However, it also accommodates quantitative changes in the effect magnitude
or sign of epistatically interacting variants (Fig 3). These quantitative epistatic interactions have
received limited discussion in the Hsp90 literature but are prevalent in nature [23,24,42].
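To make these categories concrete, the sketch below classifies a single variant by comparing its effect (the phenotype of allele B minus the phenotype of allele A) when Hsp90 is functional and when it is perturbed. The function name `classify_interaction`, the tolerance `tol`, and the category labels are illustrative assumptions and are not taken from any published method. Real data would call for a statistical test of the interaction rather than a fixed tolerance.

```python
def classify_interaction(effect_functional, effect_perturbed, tol=1e-9):
    """Classify how a variant's effect changes between Hsp90 states.

    effect_functional: allelic effect (B minus A) with Hsp90 functional.
    effect_perturbed:  allelic effect (B minus A) with Hsp90 perturbed.
    """
    functional_is_zero = abs(effect_functional) <= tol
    perturbed_is_zero = abs(effect_perturbed) <= tol

    if abs(effect_functional - effect_perturbed) <= tol:
        return "none"          # same effect in both states: no epistasis
    if functional_is_zero and not perturbed_is_zero:
        return "buffered"      # effect appears only when Hsp90 is perturbed
    if perturbed_is_zero and not functional_is_zero:
        return "potentiated"   # effect requires functional Hsp90
    if effect_functional * effect_perturbed < 0:
        return "sign"          # effect changes direction
    return "magnitude"         # same direction, different size


examples = {
    "buffered": (0.0, 1.2),
    "potentiated": (1.2, 0.0),
    "magnitude": (0.5, 1.5),
    "sign": (0.5, -0.5),
    "none": (0.7, 0.7),
}
for label, (functional, perturbed) in examples.items():
    print(label, "->", classify_interaction(functional, perturbed))
```

Treating buffering and potentiation as the two limiting cases, in which the effect is absent in one Hsp90 state, is a simplification made here so that all four interaction types fit on one scale.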
Epistasis can also be used to convey Hsp90’s global impact on the effects of genetic
variants. Typically, a variant that influences the effect of another through an epistatic interaction
is called a modifier. Therefore, we propose that Hsp90 be referred to as a global modifier because
it may show different types of epistatic interactions with many variants. The term
global modifier is a more general way to describe Hsp90’s effect on the genotype–phenotype map
than the buffer, capacitor, and potentiator concepts. Notably, this term remains applicable across
the different contexts in which one might investigate response to Hsp90 perturbation.
A.3.4 Questions Regarding Hsp90 and Other Global Modifiers
Research on Hsp90 has been instrumental in showing the significant influence that global
modifiers can exert upon the genotype–phenotype map. Moving forward, it will be important to
address multiple major questions in this area, including:
Fig 3. Hsp90 shows different forms of epistasis with individual genetic variants. Transformations
of the genotype–phenotype map that occur following Hsp90 perturbation likely reflect the
composite effect of multiple variants that show different types of epistatic interactions with Hsp90.
In A and B, we illustrate how Hsp90 can buffer (A) or potentiate (B) the effects of individual
variants. Furthermore, in C and D, we show how Hsp90 may also behave as a quantitative modifier
that exhibits magnitude (C) or sign epistasis (D) with some variants. In each panel, the two alleles
of a variant are shown along the x-axis, represented by “A” and “B.” On the y-axis, the phenotypes
associated with these variants are shown when Hsp90 is functional and compromised using light
and dark blue, respectively. Each circle represents a genetically distinct haploid. For a given Hsp90
state, lines indicate the difference in phenotype between individuals carrying the alternate alleles
of the interacting variant.
Which Genes Behave as Global Modifiers?
Evidence suggests that Hsp90 is not unique in its ability to act as a global modifier. For
example, a number of studies have shown that genes involved in chromatin regulation and
transcriptional repression uncover large amounts of cryptic variation when perturbed [7,25–27] and
that prions [28–30] and proteins with regions of intrinsic disorder [31] can behave in a similar
manner to Hsp90. Moreover, many genes buffer the effects of environmental variation; these genes
might also modify the effects of genetic variation [32]. Future screens will hopefully clarify the
space of genes that are global modifiers.
What Is the Genetic Architecture That Underlies Response to a Global Modifier’s Perturbation?
Much of the work on global modifiers to date has inferred alterations to the genotype–
phenotype map from increases or decreases in heritable phenotypic variation. Yet, such changes
do not provide direct insights into the number of genetic variants and forms of epistasis underlying
these responses. This information can be obtained through statistically powerful genetic mapping
experiments focused on standing genetic variation [23,33] or comprehensive analysis of MA lines
that harbor known new mutations [15,26].
What Are the Mechanisms by Which Global Modifiers Act?
Systematically resolving the exact variants that interact with particular global modifiers
can shed light on how these genes influence heritable phenotypic variation. Epistatic interactions
between a global modifier and variants may arise through a mixture of direct and indirect functional
relationships. For instance, Hsp90 physically interacts with an array of client proteins, and amino
acid changes in these clients might impact response to Hsp90 perturbation [13,14]. At the same
time, epistatic interactions can involve variants in genes that have less direct functional
relationships if these genes act in regulatory networks [7,34–36] or in compensatory or parallel
cellular processes [37].
To What Extent Do Global Modifiers Harbor Functional Variation?
Although global modifiers influence the effects of genetic variants in other genes, how
much polymorphisms in global modifiers themselves contribute to heritable phenotypic variation
remains unclear. For example, cis regulatory polymorphisms causing decreases in Hsp90
expression segregate in wild populations of Drosophila and alter the effects of other variants [38].
However, these Hsp90 polymorphisms are highly deleterious and quite rare, suggesting they are
not a major source of heritable phenotypic variation. In contrast, the discoveries of variants that
affect levels of phenotypic variability in mapping studies [39,40] or show epistatic interactions
with many other variants [41] are consistent with the possibility that polymorphisms in global
modifiers might contribute to heritable phenotypic variation in some populations.
A.4 Conclusion
Hsp90 significantly influences the genotype–phenotype map through its epistatic
interactions with genetic variants on a global scale. For this reason, it is appropriate to describe
Hsp90 and genes that behave similarly as global modifiers. Major questions to address in this area
moving forward regard the broader space of genes that can act as global modifiers, the mechanisms
by which Hsp90 and other global modifiers modulate the genotype–phenotype map, and the extent
to which functional polymorphisms in global modifiers themselves affect heritable phenotypic
variation. Research on these topics should improve our understanding of the relationship between
genotype and phenotype.
A.5 References
1. Rutherford SL, Lindquist S. Hsp90 as a capacitor for morphological evolution. Nature. 1998;
396(6709):336–42. doi: 10.1038/24550 PMID: 9845070
2. Queitsch C, Sangster TA, Lindquist S. Hsp90 as a capacitor of phenotypic variation. Nature. 2002;
417(6889):618–24. doi: 10.1038/nature749 PMID: 12050657
3. Jarosz DF, Lindquist S. Hsp90 and environmental stress transform the adaptive value of natural
genetic variation. Science. 2010; 330(6012):1820–4. doi: 10.1126/science.1195487 PMID:
21205668
4. Gibson G, Dworkin I. Uncovering cryptic genetic variation. Nat Rev Genet. 2004; 5(9):681–90.
doi: 10.1038/nrg1426 PMID: 15372091
5. Hermisson J, Wagner GP. The population genetic theory of hidden variation and genetic
robustness. Genetics. 2004; 168(4):2271–84. doi: 10.1534/genetics.104.029173 PMID: 15611191
6. Paaby AB, Rockman MV. Cryptic genetic variation: evolution’s hidden substrate. Nat Rev Genet.
2014; 15(4):247–58. doi: 10.1038/nrg3688 PMID: 24614309
7. Taylor MB, Phan J, Lee JT, McCadden M, Ehrenreich IM. Diverse genetic architectures lead to the
same cryptic phenotype in a yeast cross. Nat Commun. 2016; 7:11669. doi: 10.1038/ncomms11669
PMID: 27248513
8. Cowen LE, Lindquist S. Hsp90 potentiates the rapid evolution of new traits: drug resistance in
diverse fungi. Science. 2005; 309(5744):2185–9. doi: 10.1126/science.1118370 PMID: 16195452
9. Rohner N, Jarosz DF, Kowalko JE, Yoshizawa M, Jeffery WR, Borowsky RL, et al. Cryptic
variation in morphological evolution: Hsp90 as a capacitor for loss of eyes in cavefish. Science.
2013; 342(6164):1372–5. doi: 10.1126/science.1240276 PMID: 24337296
10. Le Rouzic A, Carlborg O. Evolutionary potential of hidden genetic variation. Trends Ecol Evol.
2008; 23(1):33–7. doi: 10.1016/j.tree.2007.09.014 PMID: 18079017
11. Masel J, Siegal ML. Robustness: mechanisms and consequences. Trends Genet. 2009; 25(9):395–
403. doi: 10.1016/j.tig.2009.07.005 PMID: 19717203
12. Ehrenreich IM, Pfennig DW. Genetic assimilation: a review of its potential proximate causes and
evolutionary consequences. Ann Bot. 2016; 117(5):769–79. doi: 10.1093/aob/mcv130 PMID:
26359425
13. Taipale M, Jarosz DF, Lindquist S. Hsp90 at the hub of protein homeostasis: emerging mechanistic
insights. Nat Rev Mol Cell Biol. 2010; 11(7):515–28. doi: 10.1038/nrm2918 PMID: 20531426
14. Jarosz DF, Taipale M, Lindquist S. Protein homeostasis and the phenotypic manifestation of
genetic diversity: principles and mechanisms. Annu Rev Genet. 2010; 44:189–216. doi:
10.1146/annurev.genet.40.110405.090412 PMID: 21047258
15. Geiler-Samerotte KA, Zhu YO, Goulet BE, Hall DW, Siegal ML. Selection transforms the
landscape of genetic variation interacting with Hsp90. PLoS Biol. 2016; 14(10). doi:
10.1371/journal.pbio.2000465
16. Sangster TA, Salathia N, Lee HN, Watanabe E, Schellenberg K, Morneau K, et al. Hsp90-buffered
genetic variation is common in Arabidopsis thaliana. Proc Natl Acad Sci U S A. 2008;
105(8):2969–74. doi: 10.1073/pnas.0712210105 PMID: 18287064
17. Sangster TA, Salathia N, Undurraga S, Milo R, Schellenberg K, Lindquist S, et al. Hsp90 affects
the expression of genetic variation and developmental stability in quantitative traits. Proc Natl Acad
Sci U S A. 2008; 105(8):2963–8. doi: 10.1073/pnas.0712200105 PMID: 18287065
18. Siegal ML, Leu JY. On the nature and evolutionary impact of phenotypic robustness mechanisms.
Annu Rev Ecol Evol Syst. 2014; 45:496–517. doi: 10.1146/annurev-ecolsys-120213-091705
PMID: 26034410
19. Mani R, St Onge RP, Hartman JLt, Giaever G, Roth FP. Defining genetic interaction. Proc Natl
Acad Sci U S A. 2008; 105(9):3461–6. doi: 10.1073/pnas.0712255105 PMID: 18305163
20. Phillips PC. Epistasis—the essential role of gene interactions in the structure and evolution of
genetic systems. Nat Rev Genet. 2008; 9(11):855–67. doi: 10.1038/nrg2452 PMID: 18852697
21. Mackay TF. Epistasis and quantitative traits: using model organisms to study gene-gene
interactions. Nat Rev Genet. 2014; 15(1):22–33. doi: 10.1038/nrg3627 PMID: 24296533
22. Taylor MB, Ehrenreich IM. Higher-order genetic interactions and their contribution to complex
traits. Trends Genet. 2015; 31(1):34–40. doi: 10.1016/j.tig.2014.09.001 PMID: 25284288
23. Bloom JS, Ehrenreich IM, Loo WT, Lite TL, Kruglyak L. Finding the sources of missing
heritability in a yeast cross. Nature. 2013; 494(7436):234–7. doi: 10.1038/nature11867 PMID:
23376951
24. Bloom JS, Kotenko I, Sadhu MJ, Treusch S, Albert FW, Kruglyak L. Genetic interactions
contribute less than additive effects to quantitative trait variation in yeast. Nat Commun. 2015;
6:8712. doi: 10.1038/ncomms9712 PMID: 26537231
25. Tirosh I, Reikhav S, Sigal N, Assia Y, Barkai N. Chromatin regulators as capacitors of interspecies
variations in gene expression. Mol Syst Biol. 2010; 6:435. doi: 10.1038/msb.2010.84 PMID:
21119629
26. Richardson JB, Uppendahl LD, Traficante MK, Levy SF, Siegal ML. Histone variant HTZ1 shows
extensive epistasis with, but does not increase robustness to, new mutations. PLoS Genet. 2013;
9(8):e1003733. doi: 10.1371/journal.pgen.1003733 PMID: 23990806
27. Taylor MB, Ehrenreich IM. Transcriptional derepression uncovers cryptic higher-order genetic
interactions. PLoS Genet. 2015; 11(10):e1005606. doi: 10.1371/journal.pgen.1005606 PMID:
26484664
28. True HL, Lindquist SL. A yeast prion provides a mechanism for genetic variation and phenotypic
diversity. Nature. 2000; 407(6803):477–83. doi: 10.1038/35035005 PMID: 11028992
29. True HL, Berlin I, Lindquist SL. Epigenetic regulation of translation reveals hidden genetic
variation to produce complex traits. Nature. 2004; 431(7005):184–7. doi: 10.1038/nature02885
PMID: 15311209
30. Halfmann R, Jarosz DF, Jones SK, Chang A, Lancaster AK, Lindquist S. Prions are a common
mechanism for phenotypic inheritance in wild yeasts. Nature. 2012; 482(7385):363–8. doi:
10.1038/nature10875 PMID: 22337056
31. Chakrabortee S, Byers JS, Jones S, Garcia DM, Bhullar B, Chang A, et al. Intrinsically Disordered
Proteins Drive Emergence and Inheritance of Biological Traits. Cell. 2016; 167(2):369–81 e12.
doi: 10.1016/j.cell.2016.09.017 PMID: 27693355
32. Levy SF, Siegal ML. Network hubs buffer environmental variation in Saccharomyces cerevisiae.
PLoS Biol. 2008; 6(11):e264. doi: 10.1371/journal.pbio.0060264 PMID: 18986213
33. Ehrenreich IM, Torabi N, Jia Y, Kent J, Martis S, Shapiro JA, et al. Dissection of genetically
complex traits with extremely large pools of yeast segregants. Nature. 2010; 464(7291):1039–42.
doi: 10.1038/nature08923 PMID: 20393561
34. Bergman A, Siegal ML. Evolutionary capacitance as a general feature of complex gene networks.
Nature. 2003; 424(6948):549–52. doi: 10.1038/nature01765 PMID: 12891357
35. Omholt SW, Plahte E, Oyehaug L, Xiang K. Gene regulatory networks generating the phenomena
of additivity, dominance and epistasis. Genetics. 2000; 155(2):969–80. PMID: 10835414
36. Gjuvsland AB, Hayes BJ, Omholt SW, Carlborg O. Statistical epistasis is a generic feature of gene
regulatory networks. Genetics. 2007; 175(1):411–20. doi: 10.1534/genetics.106.058859 PMID:
17028346
37. Costanzo M, Baryshnikova A, Bellay J, Kim Y, Spear ED, Sevier CS, et al. The genetic landscape
of a cell. Science. 2010; 327(5964):425–31. doi: 10.1126/science.1180823 PMID: 20093466
38. Chen B, Wagner A. Hsp90 is important for fecundity, longevity, and buffering of cryptic
deleterious variation in wild fly populations. BMC Evol Biol. 2012; 12:25. doi: 10.1186/1471-
2148-12-25 PMID: 22369091
39. Ronnegard L, Valdar W. Detecting major genetic loci controlling phenotypic variability in
experimental crosses. Genetics. 2011; 188(2):435–47. doi: 10.1534/genetics.111.127068 PMID:
21467569
40. Nelson RM, Pettersson ME, Li X, Carlborg O. Variance heterogeneity in Saccharomyces cerevisiae
expression data: trans-regulation and epistasis. PLoS ONE. 2013; 8(11):e79507. doi:
10.1371/journal.pone.0079507 PMID: 24223957
41. Forsberg SKG, Bloom JS, Sadhu MJ, Kruglyak L, Carlborg O. Accounting for genetic interactions
is necessary for accurate prediction of extreme phenotypic values of quantitative traits in yeast.
bioRxiv. 2016. Preprint. http://dx.doi.org/10.1101/059485
42. Hallin J, Märtens K, Young AI, Zackrisson M, Salinas F, Parts L, et al. Powerful decomposition
of complex traits in a diploid model. Nat Commun. 2016; 7:13311. doi: 10.1038/ncomms13311.
PMID: 27804950
Appendix B: Genome of Spea multiplicata, a Rapidly Developing, Phenotypically
Plastic, and Desert-Adapted Spadefoot Toad
This research paper is presented as published in G3: Genes, Genomes, Genetics as Seidl et al.
2019 (ref. 136).
B.1 Summary of Contribution
As middle author on this research paper, I conducted experiments that were critical to this
paper. In the course of this work, I implemented a new technology in the lab, long-read Nanopore
sequencing. For this project, I prepared libraries and conducted Nanopore sequencing of Spea
multiplicata DNA. Together with data from other sequencing technologies, including NGS and
PacBio long-read sequencing, these data were used to generate a genome for this organism. This
resource has enabled genetic experiments aimed at understanding complex traits, evolution,
inter-species hybridization, and gene-environment interactions both in this work and others,
including Seidl et al. 2019 (ref. 137) and Levis et al. 2020 (ref. 138).
B.2 Abstract
Frogs and toads (anurans) are widely used to study many biological processes. Yet, few
anuran genomes have been sequenced, limiting research on these organisms. Here, we produce a
draft genome for the Mexican spadefoot toad, Spea multiplicata, which is a member of an
unsequenced anuran clade. Atypically for amphibians, spadefoots inhabit deserts. Consequently,
they possess many unique adaptations, including rapid growth and development, prolonged
dormancy, phenotypic (developmental) plasticity, and adaptive, interspecies hybridization. We
assembled and annotated a 1.07 Gb Sp. multiplicata genome containing 19,639 genes. By
comparing this sequence to other available anuran genomes, we found gene amplifications in the
gene families of nodal, hyas3, and zp3 in spadefoots, and obtained evidence that anuran genome
size differences are partially driven by variability in intergenic DNA content. We also used the
genome to identify genes experiencing positive selection and to study gene expression levels in
spadefoot hybrids relative to their pure-species parents. Completion of the Sp. multiplicata genome
advances efforts to determine the genetic bases of spadefoots’ unique adaptations and enhances
comparative genomic research in anurans.
B.3 Introduction
With at least 7,040 species (AmphibiaWeb 2018), frogs and toads (anurans) occur across
diverse habitats and exhibit a stunning array of adaptations (Duellman and Trueb 1986; Halliday
2016). Moreover, anurans are critical, but increasingly threatened, components of most ecosystems
and thus serve as key bioindicators (Stuart et al. 2004). Despite their importance to fields from
developmental biology and physiology to ecology and evolution, genomic resources are relatively
scarce for anurans. Indeed, fewer genomes are available for anurans than for most other major
groups of vertebrates, with only seven anurans sequenced: the Western clawed frog, Xenopus
tropicalis, and the closely related African clawed frog, Xenopus laevis (Hellsten et al. 2010), the
Tibetan Plateau frog, Nanorana parkeri (Sun et al. 2015), the American bullfrog, Rana (Lithobates)
catesbeiana (Hammond et al. 2017), the Cane toad, Rhinella marina (Edwards et al. 2018), the
Strawberry Poison frog (Oophaga pumilio) (Rogers et al. 2018), and the African bullfrog
(Pyxicephalus adspersus) (Denton et al. 2018). This paucity of genomes limits the use of anurans
as model systems for many important biological questions, especially given their deep levels of
divergence (Bossuyt and Roelants 2009).
Here, we present a draft genome of a New World spadefoot toad, the Mexican spadefoot
toad, Spea multiplicata (family Scaphiopodidae; Figure 1A,B). New World spadefoot toads
(hereafter, ‘spadefoots’) comprise seven diploid species, two of which––Scaphiopus holbrookii
and Sc. hurterii––occur in relatively mesic eastern and central North America, and five of which–
–Sp. multiplicata, Sp. bombifrons, Sp. hammondii, Sp. intermontana, and Sc. couchii––inhabit
xeric western North America. Crucially, relative to the other frogs and toads with published
genomes, spadefoots fill an unsequenced gap of > 200 My on the anuran phylogeny (Figure 1A).
Additionally, spadefoots possess some of the smallest anuran genomes of between 1.0 and 1.4 Gb
(Gregory 2018). By contrast, the genomes of other sequenced diploid anurans range from 1.7 Gb
for X. tropicalis (estimated via karyotype) (Hellsten et al. 2010) to a 5.8 Gb assembly size for R.
catesbeiana (Hammond et al. 2017).
Figure 1 A Phylogenetic relationships among New World spadefoot toads (Spea and Scaphiopus; family Scaphiopodidae) and six other species of sequenced
anurans: Xenopus laevis (Pipidae), Rhinella marina (Bufonidae), Oophaga pumilio
(Dendrobatidae), Rana (Lithobates) catesbeiana (Ranidae), Nanorana parkeri (Dicroglossidae),
and Pyxicephalus adspersus (Pyxicephalidae) [phylogeny after AmphibiaWeb (2018); estimated
divergence times from Bossuyt and Roelants (2009)]. B Mexican spadefoot toads, Spea
multiplicata, possess numerous adaptations for dealing with desert conditions including a
keratinized spade on their hind feet (arrow), which enables them to C burrow underground. D They
emerge for only a few weeks each year to feed and to breed in temporary, rain-filled pools. E
Spadefoot tadpoles exhibit rapid, and adaptively flexible, larval development (here, a metamorph
emerges from a drying pond). F They also produce alternative, environmentally induced morphs:
a slower developing omnivore morph (left) and a more rapidly developing carnivore morph (right),
which is induced by, and specializes on, animal prey, such as fairy shrimp (center).
Spadefoots serve as important models in ecology and evolution, owing to their unusual
ecology, rapid development, and striking phenotypic plasticity. For example, spadefoots cope with
their arid habitat by burrowing underground (Ruibal et al. 1969) and estivating for a year or longer
(Mayhew 1965; McClanahan 1967; Seymour 1973), emerging for only a few weeks following
warm rains to feed and breed in short-lived pools (Figure 1C, D) (Bragg 1965). Although these
highly ephemeral pools are inaccessible to most anurans, spadefoot tadpoles can survive in them
by developing rapidly––in some cases metamorphosing in eight days post-hatching (Figure 1E)
(Newman 1989). Spadefoots also exhibit multiple forms of phenotypic plasticity that further
hasten their development and allow them to thrive in environments (such as deserts) where
rainfall is highly variable (Tinsley and Tocque 1995).
Specifically, spadefoot tadpoles can facultatively speed up (Denver et al. 1998; Morey and
Reznick 2000; Boorse and Denver 2003; Gomez-Mestre and Buchholz 2006) or slow down
(Newman 1992; Denver et al. 1998; Morey and Reznick 2000) development in response to the
environment. Additionally, whereas most anuran tadpoles are omnivores and exhibit traits adapted
for feeding on detritus and plankton (Wells 2007), Spea tadpoles can develop into an alternative–
–and more rapidly developing––‘carnivore’ ecomorph, which exhibits enlarged jaw muscles and
mouthparts for capturing and consuming large animal prey (Figure 1F) (Pfennig 1990; Pfennig
1992a; Pfennig 1992b; Levis et al. 2015; Levis et al. 2018). This carnivore morph has an additional
advantage in a rapidly drying pond: it can reduce competition and further enhance growth and
development by eating other tadpoles (Pfennig 2000). Finally, as adults, when breeding in shallow,
rapidly drying ponds, Sp. bombifrons females preferentially mate with sympatric Sp. multiplicata
males, thereby producing hybrid tadpoles that develop even faster than pure-species tadpoles
(Pfennig 2007). Genome resources for spadefoots will greatly enable further work on
understanding spadefoots’ unique characteristics.
In this paper, we performed a combination of long- and short-read sequencing on Mexican
spadefoots, Sp. multiplicata, and produce a draft genome for this species. By comparing this
genome to other available anuran genomes, we identify several distinctive gene amplifications, as
well as factors contributing to the substantial genome size variation found among anurans. We
then leverage the Sp. multiplicata genome as a platform for exploring evolution in two ways. First,
we produce short-read, whole genome sequencing data for three other species of spadefoots: Plains
spadefoots, Sp. bombifrons, Couch’s spadefoots, Sc. couchii, and Eastern spadefoots, Sc.
holbrookii. We obtain thousands of protein-coding gene sequences for these species by mapping
the data against Sp. multiplicata gene models. This allows us to identify genes exhibiting different
selection pressures in Spea or Scaphiopus, including positive selection, thereby providing insights
into specific genes that may underlie adaptive evolution in these genera. Second, we generate
transcriptome data for Sp. multiplicata and Sp. bombifrons tadpoles, as well as for tadpoles
produced by hybridizing these species, thereby providing insights into why hybridization might be
ecologically and evolutionarily significant in Spea (Pfennig 2007; Pierce et al. 2017).
Together, our results demonstrate how the Sp. multiplicata genome can facilitate genetics
and genomics research in spadefoots. Ultimately, such research promises to provide key insights
into the distinctive phenotypes of these unique organisms.
B.4 Results
B.4.1 Properties of the Sp. multiplicata genome
We generated a Sp. multiplicata draft genome using a combination of high-coverage long-
and short-read sequencing (Supplementary Table 1), hybrid read assembly, and scaffolding of
contigs against the X. tropicalis reference genome (Supplementary Table 2; Methods). We
obtained 84,984 contigs summing to a total haploid genome size of 1.09 Gb with an N50 of 29,771
bp and a maximum contig length of 401,788 bp. Our scaffolded assembly of 1.07 Gb is consistent
with historical, densitometry-based genome-size estimates (Sexsmith 1968; Bachmann 1972). The
draft genome consisted of 49,736 scaffolds with a scaffold N50 of 70,967 bp and a maximum
scaffold size of 60,197,306 bp. This assembly contiguity is similar to other recently published
genomes (Supplementary Table 3). Thirty-two percent of the assembly was comprised of repetitive
DNA, with 18%, 5%, 3.9%, 2.5%, and 2.4% of the genome annotated as unclassified interspersed
repeats, LINEs, transposable DNA elements, long terminal repeats, and simple repeats,
respectively (Supplementary Table 4). We used Benchmarking Universal Single-Copy Orthologs
(BUSCO) (Simão et al. 2015) to check draft genome completeness (Methods). Among the 978
genes in BUSCO’s metazoan database, 878 (89.8%) were complete, whereas 47 (4.8%) were
incomplete and 53 (5.4%) were absent, which is similar to the other recently published anuran
genomes (Sun et al. 2015; Hammond et al. 2017; Edwards et al. 2018).
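The contiguity and completeness statistics reported above can be illustrated with a minimal sketch. It is not the code used for the assembly; the function names and the toy contig lengths are assumptions introduced here for illustration. The sketch computes an N50 (the length of the sequence at which the cumulative length, summed from the longest sequence downward, first reaches half of the total) and converts the BUSCO category counts given in the text into percentages.

```python
def n50(lengths):
    """Return the N50 (in bp) of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one sequence length is required")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first sequence that pushes the running sum to half of the
        # total assembly length defines the N50.
        if running >= half_total:
            return length


def busco_percentages(complete, incomplete, absent):
    """Express BUSCO category counts as percentages rounded to one decimal."""
    total = complete + incomplete + absent
    return tuple(round(100 * n / total, 1) for n in (complete, incomplete, absent))


# Hypothetical toy contig lengths, not values from the Sp. multiplicata assembly.
toy_lengths = [100, 80, 60, 40, 20]
toy_n50 = n50(toy_lengths)

# BUSCO counts as reported in the text (complete, incomplete, absent).
busco = busco_percentages(878, 47, 53)
```

Applied to the counts in the text, the second function reproduces the reported 89.8%, 4.8%, and 5.4%.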
After confirming a high level of genome completeness, we used the software package
AUGUSTUS (Stanke and Morgenstern 2005) to perform ab initio prediction of protein-coding
genes (Supplementary Data 1; Methods). We then BLASTed the predicted proteins against the
proteome of X. tropicalis and filtered them using multiple quality control criteria (Methods).
Comparisons of the proteome of Sp. multiplicata to those of the other
five anurans suggest our approach yielded high quality models (Supplementary Figure 1). We
thereby identified 19,639 protein-coding gene models, which were on average 1,370 bp excluding
introns and 9,398 bp including introns (Supplementary Data 1). This number of genes in Sp.
multiplicata is slightly lower than, but comparable to, the range of gene numbers reported for the
four other diploid anuran genomes to which we compared Sp. multiplicata, which range from
21,067 to 25,846 genes (Hellsten et al. 2010; Hammond et al. 2017; Edwards et al. 2018)
(Supplementary Table 5; Supplementary Figure 2).
B.4.2 Genes with elevated copy numbers in Sp. multiplicata
We examined the gene content of Sp. multiplicata in greater detail. At a False Discovery
Rate (FDR) of 0.05, Gene Ontology (GO) enrichment analysis failed to identify any significant
differences relative to the other diploid anuran genomes with available annotations (Methods).
However, comparison of these genomes revealed that three specific gene families were expanded
in Sp. multiplicata (Supplementary Table 6; Methods). These were hyaluronan synthase (hyas),
nodal (nod), and zona pellucida glycoprotein (zp3) (Figure 2). Hyaluronan is a component of
extracellular matrices, which play an important role in cell adhesion, differentiation, and migration
throughout the body (Spicer and McDonald 1998). Whereas the other anurans have one to three
annotated copies of hyas, Sp. multiplicata has seven (Figure 2A). As for nodal, this gene encodes
a cytokine that plays a key role in mesoderm formation and body patterning during embryogenesis
and development in deuterostomes (Osada and Wright 1999; Schier and Shen 2000; Takahashi et
al. 2000). Vertebrates exhibit substantial diversity in nodal content: humans have just a single copy
of nodal, zebrafish has three, and the other sequenced, diploid anurans have nine or fewer (Figure
2) (Takahashi et al. 2006). In comparison, we found evidence that Sp. multiplicata has at least 12
copies of nodal (Figure 2B; Supplementary Table 7). The nodal gene family also expanded in X.
tropicalis (Terai et al. 2006; Hellsten et al. 2010). When we compared the 12 Sp. multiplicata
copies of nodal to those present in X. tropicalis, we found that all were most similar to xnr6. To
further investigate this, we relaxed our criteria and included matches to all the nodals in the
SWISSprot database. We found evidence for potentially as many as 24 copies of nodal in Sp.
multiplicata: 22 copies were most similar to xnr6, while the remaining two copies were most
similar to another gene in X. tropicalis, xnr2 (Supplementary Table 7; Methods). Unlike most other
nodal copies in Xenopus, xnr6 acts in a cell-autonomous manner (Takahashi et al. 2000) and plays
a key role in mesoendoderm specification (Luxardi et al. 2010). In contrast, xnr2 acts later in
development and is a mesoderm-inducing factor (Agius et al. 2000). Lastly, zp3 encodes a protein
component of sperm-binding glycoproteins in the egg’s zona pellucida (Harris et al. 2009). While
the four other diploid anuran genomes had four or fewer copies of zp3, Sp. multiplicata had nine
(Figure 2C).
Figure 2 Gene trees for genes with elevated copy numbers in Sp. multiplicata. We utilized Usearch
(Edgar 2010) to identify gene enrichments in Spea compared to four other anurans. We then
retrieved copies of the gene in question for humans and zebrafish from Uniprot. Trees were built
from alignments including only sites where all species had data. Distance is in number of
substitutions per site. Node labels are shown for all nodes with at least 50% bootstrap support
over 100 iterations. A hyaluronan synthase 3 (hyas3; seven copies in Sp. multiplicata); B nodal
(12 copies); and C zona pellucida glycoprotein 3 (zp3; nine copies).
B.4.3 Factors contributing to anuran genome size differences
Spea multiplicata has a smaller genome than most anurans (Figure 3A), including the four
other sequenced diploid anurans (Supplementary Table 5). We sought to identify genomic features
that explain these genome size differences (Methods). In our analyses, we excluded R. catesbeiana
because of its fragmented gene set (Supplementary Note 2), which is a result of its comparatively
large and very repetitive genome (Hammond et al. 2017). Among the other four assemblies, there
is a more than twofold range of genome sizes (Figure 3B; Supplementary Table 5). Repetitive
DNA content––i.e., percent of a genome assembly comprised of any class of repetitive DNA––
exhibited a near perfect correlation with genome size (r = 0.99, P = 0.01; Figure 3B). Repetitive
DNA exhibits an almost twofold range across the four species, suggesting it accounts for most, but
not all, of the differences in genome size.
We also found smaller contributors to genome size differences (Supplementary Table 5).
Number of annotated genes was strongly correlated with genome size (r = 1, P < 0.0001), although
we note that this feature is highly sensitive to assembly and annotation methods. Additionally,
gene length, calculated as the number of exonic and intronic bases within a protein-coding gene,
varied substantially across the four genomes. Sp. multiplicata has appreciably smaller genes (9.4
kb on average) than the other species (≥16.5 kb on average; Figure 3C). These differences in gene
size are driven by variability in intronic DNA, as the exonic portions of genes are approximately
the same in the four species (1.2 to 1.3 kb).
Figure 3 Factors contributing to anuran genome size differences. A Estimated genome sizes
(Supplementary Note 1) among 284 species of anurans (Gregory 2018); arrow, estimated genome
size of Spea from densitometry. B Assembly size and C gene and transcript lengths of Sp.
multiplicata compared to three other sequenced, diploid anurans.
B.4.4 Identification of genes showing evidence of positive selection
Spadefoots, in particular those in the genus Spea, exhibit remarkable adaptations,
especially for living in desert conditions where most anurans would not survive. We therefore used
the Sp. multiplicata genome as a resource for studying adaptive evolution across spadefoot species.
To do so, we obtained short-read sequencing data for an additional Spea species, Sp. bombifrons,
as well as for two species of the closest sister taxa, Sc. holbrookii, and Sc. couchii (Methods).
Using these data, we generated near complete four-species nucleotide and amino acid alignments
for 1,967 single-copy, protein-coding genes (Methods). We then estimated dN/dS (ω) within each
genus and tested whether genes showed evidence of different selection pressures in the two genera
(Supplementary Table 8; Methods).
At an FDR of 0.05, 172 genes had significantly different ω values between Spea and
Scaphiopus (Figure 4, Methods). Of these, 26 genes (22 in Spea and 4 in Scaphiopus) exhibited
evidence of positive selection in one of the genera, here operationally defined as ω > 2 (Methods). In every
case, genes under positive selection in one genus were under purifying selection in the other. The
remaining genes either showed evidence of neutral evolution in one genus but not the other, or
exhibited differing degrees of purifying selection between the genera (Figure 4).
Figure 4 Identification of genes showing evidence of positive selection in genera of spadefoots. ω
values for genes showing significantly different selection between sequenced Spea and Scaphiopus
species. Genes with ω ≥ 2 (horizontal dashed line) were considered as putatively under positive
selection in this study. We found more genes (n = 22) that met these criteria in Spea than in
Scaphiopus (n = 4). Significant genes are labeled by their closest match in the SWISSprot database.
We determined the functions of the 22 genes that were under positive selection in Spea
(Methods). These genes played roles in eye function, immune function, metabolism and digestion,
oxygen transport, and smell (Supplementary Table 9). Gene ontology (GO) analysis of these genes
identified 13 biological processes that were enriched (Supplementary Table 10), including:
coenzyme biosynthesis, immune function, intracellular organization, lipid metabolism and
transport, photoreceptor cell maintenance, and zinc ion transport.
B.4.5 Insights into adaptive hybridization from the transcriptome
Finally, we leveraged the Sp. multiplicata genome to gain insights into the genomic factors
that might contribute to adaptive hybridization that is observed between the Spea species, Sp.
multiplicata and Sp. bombifrons (Pfennig and Simovich 2002; Pfennig 2007; Pierce et al. 2017).
Specifically, we first performed 3′ RNA-seq on seven Sp. bombifrons and seven Sp. multiplicata
tadpoles (Methods). The tadpoles had different parents that were sampled from distinct geographic
locations, allowing us to identify those genes that exhibit fixed expression differences between the
two species.
In total, we obtained measurements in all 14 tadpoles for 10,695 annotated genes
(Supplementary Data 2; Methods). At an FDR of 0.05, we identified 5,865 genes (54.8% of all
genes) that were differentially expressed between the species (Supplementary Data 3; Methods).
Among these genes, 53.3% exhibited higher expression in Sp. bombifrons and 46.7% showed
higher expression in Sp. multiplicata. On average, differentially expressed genes had a 3.8-fold
difference in transcript levels between the two species. However, differences as small as 1.2-fold
and as large as 117.7-fold were detected. Notably, many genes show sizable changes in
transcription between the species; for example, 10% of all identified genes exhibited an at least
6.5-fold expression difference.
We also used 3′ RNA-seq to measure the expression of the same 10,695 genes in 14 F1
hybrid tadpoles produced by mating Sp. bombifrons and Sp. multiplicata adults obtained from
multiple geographic locations (Supplementary Data 3; Methods). For 93.8% of all genes
differentially expressed between the species, transcript levels were higher in the hybrid tadpoles
than in the lower expressing parent-species tadpoles (Supplementary Data 3). When we focused
on the 586 genes showing greater than 6.5-fold expression differences between the species, this
proportion was even higher: 585 of 586 (99.8%) had transcript levels in the hybrid tadpoles that
were above the lower expressing parent-species tadpoles. Indeed, hybrids exhibited expression
levels close to the average of their parents (Figure 5).
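The comparison between hybrid and parental expression described above can be sketched as follows. This is an illustrative sketch rather than the analysis pipeline used in the paper: the function names and the toy expression values are hypothetical, and each gene is assumed to be summarized by one mean expression value per class (Sp. bombifrons, Sp. multiplicata, and F1 hybrid).

```python
def fold_difference(a, b):
    """Fold difference between two positive expression levels (always >= 1)."""
    if a <= 0 or b <= 0:
        raise ValueError("expression levels must be positive")
    return max(a, b) / min(a, b)


def fraction_above_lower_parent(records):
    """Fraction of genes whose hybrid level exceeds the lower-expressing parent.

    Each record is (bombifrons_mean, multiplicata_mean, hybrid_mean).
    """
    records = list(records)
    if not records:
        raise ValueError("no genes supplied")
    above = sum(1 for bomb, mult, hyb in records if hyb > min(bomb, mult))
    return above / len(records)


# Hypothetical per-gene mean expression values, for illustration only.
toy_genes = [
    (10.0, 2.0, 6.0),   # hybrid above the lower parent (2.0)
    (1.0, 8.0, 4.0),    # hybrid above the lower parent (1.0)
    (5.0, 20.0, 4.0),   # hybrid below the lower parent (5.0)
]
toy_fraction = fraction_above_lower_parent(toy_genes)
toy_fold = fold_difference(10.0, 2.0)
```

The same calculation on the counts in the text gives 585 / 586, which rounds to the reported 99.8%.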
Figure 5 Gene expression analysis in Spea hybrids. Genes exhibiting differential
expression between Sp. bombifrons and Sp. multiplicata are shown in these pure species and their
hybrids. Genes expressed at a higher level in Sp. bombifrons (n = 3,123) are shown in A, while
genes expressed at a higher level in Sp. multiplicata (n = 2,742) are shown in B. Each point
represents the average expression level of a single gene across all samples in a given class.
B.5 Discussion
The Sp. multiplicata genome advances research not only in New World spadefoot toads,
but also in anurans more generally. Anurans are noted for their genome size variation, so they are
powerful models for evaluating how and why genome size evolves (Liedtke et al. 2018; Mueller
and Jockusch 2018). Recent findings indicate that anurans’ continuous rate of genome size
evolution is higher on average than that of other amphibian clades, and that life history––specifically
larval development time––is positively correlated with genome size (Liedtke et al. 2018). The
small genome size and rapid development of Sp. multiplicata exemplify this relationship.
The genomic factors that contribute to variation in genome size remain an issue of active
inquiry (Petrov 2001; Gregory 2005; Lynch 2007). We found that, despite their smaller genome,
spadefoots are similar to other sequenced anurans in terms of number and type of genes. Thus,
Spea’s smaller genome appears to derive from diminished repetitive and intronic DNA, which is
consistent with the prevailing hypothesis that genome size has undergone gradual change––as
opposed to abrupt change––throughout much of amphibian evolutionary history (Chrtek et al.
2009; Liedtke et al. 2018). As more amphibian genomes become available, greater insights will be
attained into the evolutionary and genomic factors that contribute to genome size evolution in this
clade.
Because of their diverse adaptations (Duellman and Trueb 1986; Halliday 2016), anurans
are also classic models in ecology, evolution, and development. The Sp. multiplicata genome will
help provide additional insights into these fields. For example, we found that spadefoots possess
modest increases in copies of genes involved in development and fertilization, most notably in the
key developmental regulator nodal. Studies using X. tropicalis have shown that nodal paralogs
exhibit different spatiotemporal expression patterns during development, which play roles in the
formation of distinct tissues (Osada and Wright 1999; Charney et al. 2017). The numerous copies
of nodal in Sp. multiplicata might contribute to this species’ remarkable phenotypic plasticity by
assigning specialized functions to different copies during development and/or facilitating rapid
bursts of transcription following abrupt changes in the environment (e.g., diet or pond volume)
that allow alternative traits to develop quickly. Although it is unlikely to be the sole contributing
factor, such gene proliferation might help explain how key developmental pathways become
environmentally sensitive without disrupting overall organism form and function (West-Eberhard
2003). The Sp. multiplicata genome, along with other anuran genomes, will enable future work
that can address this and related issues in evolution and development.
As an example of how the Sp. multiplicata genome can facilitate new lines of genomic
research in this system, we report a large-scale scan for genes showing evidence for selection in
Spea and/or Scaphiopus. Both genera include desert-adapted and extremely rapidly developing
species, but Scaphiopus cannot produce carnivore-morph tadpoles (Figure 1F). We identified 26
genes (22 in Spea and 4 in Scaphiopus) exhibiting signatures of potential positive selection (Figure
4). Interestingly, genes under positive selection in one genus were exclusively under purifying
selection in the other genus. This suggests that the two genera, while ecologically similar in many
ways, are nevertheless experiencing, and responding to, distinct selection pressures. A key aspect
of spadefoot biology that could be impacted by these putatively selected genes in Spea is the
production of carnivores, which frequently feed on other tadpoles. Consumption of other tadpoles
increases risk of pathogen transmission (Pfennig et al. 1998) and might thereby drive the observed
positive selection on the immune function genes in Spea. The Sp. multiplicata genome will enable
explicit testing of these hypotheses and allow for deeper investigation of the mechanisms
underlying both adaptive evolution and the diversification of phenotypes among species that share
the same environments.
Further, we show how the Sp. multiplicata genome enables genomic research on the
evolution of hybridization. Here, we analyzed gene expression differences between Sp.
multiplicata and Sp. bombifrons, focusing on same-age tadpoles reared in a controlled
environment. We found evidence that more than half of the genes in the genome exhibit differential
expression between these species (Figure 5), which are the most distantly related in their genus
(Wiens and Titus 1991; Zeng et al. 2014). The number of genes showing higher expression in Sp.
multiplicata vs. Sp. bombifrons was roughly equal, which is consistent with the notion that species
differences accumulate via genetic drift. Yet, despite these genome-wide expression differences,
Sp. multiplicata and Sp. bombifrons interbreed and produce viable hybrid offspring (Pfennig and
Simovich 2002; Pfennig et al. 2012). Although we cannot rule out gene expression differences
arising due to variability in developmental timing, F1 hybrid gene expression being intermediate
at those genes that differ in expression between Sp. multiplicata and Sp. bombifrons may play a
role in adaptive hybridization in Spea (Pfennig 2007). Given that hybridization’s role in the origin
and distribution of species remains a topic of keen interest (Abbott et al. 2013; Pfennig et al. 2016),
the Sp. multiplicata genome provides a new resource for evaluating how genomic factors and
ecological context interact to determine how and when hybridization is adaptive.
In summary, spadefoots possess many striking ecological, evolutionary, and
developmental features that are now possible to study at the genomic level. Moving forward, the
genome described in this paper should provide a critical foundation for analyzing this substantial
diversity within and between spadefoot species, as well as for more deeply understanding the
mechanisms producing the features that distinguish spadefoots from other anurans.
B.6 Methods
B.6.1 Genome assembly
We selected an adult male Sp. multiplicata that had been collected in July 2011 at a
breeding aggregation in an ephemeral pond (‘410 Pond’) 20 km SSE of Portal, Arizona USA
(31.7384, -109.1). Immediately after euthanizing the male, we removed and homogenized his liver
and extracted from it high molecular weight DNA using Qiagen 500G Genomic-tip columns.
Additional tissue from this specimen was stored at the North Carolina Museum of Natural Sciences
under the identifier NCSM84230. Three types of whole genome sequencing data were generated:
Illumina, PacBio, and Oxford Nanopore. Illumina sequencing libraries were constructed with the
Illumina Nextera kit. Five replicate Illumina sequencing libraries were prepared, multiplexed using
barcoded adapters, and then sequenced at the USC Molecular Genomics Core on an Illumina
NextSeq using the 150 bp paired-end kit. PacBio libraries were generated and sequenced on a
PacBio Sequel by the UC Irvine Genomics High-Throughput facility. We also generated Oxford
Nanopore long-read libraries, which we sequenced on two 2D DNA chips using a Mk1b Oxford
Nanopore MinION. More information about the sequencing data is provided in Supplementary
Table 1.
Assembly was performed using all long- and short-read sequencing data. After trying
multiple assemblers, we found that MaSuRCA v3.2.1 (Zimin et al. 2013) produced the most
contiguous assembly. Because MaSuRCA uses Quorum to perform error correction internally, we
inputted raw read data into the program. We employed a kmer size of 51, a cgwError rate of 0.15,
and a jellyfish hash size of 6 × 10^10.
Following completion of the assembly, duplicate contigs were identified using the LAST
aligner (Kielbasa et al. 2011) with default parameters. All contigs were mapped against all other
contigs. If a contig was entirely contained within another contig, with the overlapping regions
showing similarity ≥ 0.9, we classified the smaller contig as a duplicate. Such duplicates were
removed from the assembly and excluded from all subsequent analyses. We termed this filtered
assembly our “contig assembly” and used it for downstream comparisons to X. tropicalis.
Characteristics of the contig assembly are described in row three of Supplementary Table 2.
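The duplicate-contig rule described above can be expressed as a short sketch. It is a simplified illustration, not the procedure run on the LAST output: the record layout (query, target, query start, query end, similarity) and the function name are assumptions made here, and the tie-break for equal-length contigs is an added assumption that the text does not specify.

```python
def find_duplicate_contigs(lengths, alignments, min_similarity=0.9):
    """Return the set of contigs fully contained within another contig.

    lengths: dict mapping contig name to length in bp.
    alignments: iterable of (query, target, q_start, q_end, similarity),
        with 0-based, end-exclusive query coordinates (hypothetical layout).
    """
    duplicates = set()
    for query, target, q_start, q_end, similarity in alignments:
        if query == target:
            continue  # ignore self-alignments
        spans_whole_query = q_start == 0 and q_end == lengths[query]
        if not spans_whole_query or similarity < min_similarity:
            continue
        is_smaller = lengths[query] < lengths[target]
        # Assumption: for equal-length contigs, keep the name that sorts first
        # so that only one member of an identical pair is removed.
        is_tie_loser = lengths[query] == lengths[target] and query > target
        if is_smaller or is_tie_loser:
            duplicates.add(query)
    return duplicates


# Hypothetical toy data, for illustration only.
toy_lengths = {"c1": 1000, "c2": 300, "c3": 300}
toy_alignments = [
    ("c2", "c1", 0, 300, 0.95),  # fully contained, similar enough: duplicate
    ("c3", "c1", 0, 250, 0.99),  # only partly contained: kept
    ("c1", "c2", 0, 300, 0.95),  # larger contig not contained in smaller: kept
]
toy_duplicates = find_duplicate_contigs(toy_lengths, toy_alignments)
```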
We next ran RepeatModeler v1.0.4 (Smit et al. 2013-2015) on the assembly, using default
settings. Repetitive elements identified in the Sp. multiplicata genome were combined with RepDB
volume 16, issue 12 (Bao et al. 2015). RepeatMasker v4.0.7 (Smit et al. 2013-2015) with -e ncbi
was then used to mask repetitive DNA. Characteristics of the repetitive DNA identified in the
assembly are described in Supplementary Table 3.
Anuran genomes have a slow rate of structural evolution and high structural similarity
has been shown between N. parkeri and X. tropicalis (Sun et al. 2015). In addition,
Spea multiplicata has a similar number of chromosomes to X. tropicalis (n = 13 and n = 12)
(Wasserman 1970; Hellsten et al. 2010). Thus, we attempted to scaffold the repeat-masked Sp.
multiplicata contigs using the X. tropicalis genome as a reference. This was done in Chromosomer
v0.1.4 (Tamazian et al. 2016) with a gap length setting of 100 bases.
To assess the quality of the scaffolded assembly, we applied BUSCO v2.0 (Simão et al.
2015) to both repeat-masked contigs and scaffolds. BUSCO performs BLASTs against an
assembly to ascertain the presence or absence of proteins known to be highly conserved among all
members of a phylogenetic branch. Assembly completeness was assessed based on the metazoan
gene set, as this set has been previously used in other published assembly reports (Hammond et al.
2017). The scaffolds showed significantly improved contiguity relative to the contigs (row four of
Supplementary Table 2), indicating that scaffolding improved the assembly.
Lastly, we unmasked repeats in the scaffolds. This scaffolded assembly without repeat
masking is what we refer to throughout the paper as the Sp. multiplicata ‘assembly.’
B.6.2 Annotation of protein-coding genes
We performed ab initio protein-coding gene prediction using Augustus v3.2.3 (Stanke and
Morgenstern 2005; Stanke et al. 2008). However, to first generate a training set for Augustus, we
empirically annotated a subset of genes in Sp. multiplicata. To do this, we utilized previously
generated RNA-seq data from tadpoles (Seidl et al. 2019). RNA-seq reads were mapped to the
assembly using Tophat2 v2.1.1 (Kim et al. 2013). We then extracted the nucleotide sequences of
parts of the assembly covered by the RNA-seq data. Best matches for these sequences in X.
tropicalis were obtained by comparing six-frame translations of the Sp. multiplicata data against
X. tropicalis v9.1 peptides obtained from Xenbase (Karimi et al. 2018) using Blastx v2.2.30
(Camacho et al. 2009). Putative translation start sites were defined as the ATG in the Sp.
multiplicata sequence closest to the beginning of the alignment. This resulted in a set of 1,478
empirically defined gene models for which the Sp. multiplicata peptide spanned ≥ 80% of the X.
tropicalis match with ≥ 30% sequence identity and a putative translation start site was found.
To train Augustus, we randomly split our empirically defined gene set into a training set
(1,000 genes) and a verification set (478 genes). We then ran the Augustus etraining pipeline to
estimate parameters that accurately described features of protein-coding genes in Sp. multiplicata.
We further optimized the parameter estimates using the optimize_augustus.pl script.
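The random split into training and verification sets can be illustrated as follows. This is a minimal sketch: the gene identifiers and the fixed seed are hypothetical, and the text does not state which tool performed the actual split.

```python
import random


def split_gene_set(gene_ids, n_train=1000, seed=0):
    """Randomly partition gene IDs into a training and a verification set."""
    ids = list(gene_ids)
    if n_train > len(ids):
        raise ValueError("n_train exceeds the number of genes")
    rng = random.Random(seed)  # fixed seed (an assumption) for reproducibility
    rng.shuffle(ids)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers standing in for the 1,478 empirical gene models.
genes = [f"gene_{i:04d}" for i in range(1478)]
train, verify = split_gene_set(genes)
```

With 1,478 input genes and n_train = 1000, the remaining 478 genes form the verification set, matching the sizes given above.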
After estimating species-specific parameters, we ran Augustus to annotate protein-coding
genes in the contig assembly, as well as in the scaffolded assembly. We performed gene prediction
in both assemblies to determine the effectiveness of scaffolding against X. tropicalis. Augustus
produced a total of 81,079 genes from our contigs and 42,671 genes from our scaffolds
(Supplementary Data 1; Supplementary Figure 3). The average predicted protein length in the
two sets was 316.9 and 365.8 amino acids, respectively. We took this as evidence that scaffolding
enabled better gene prediction than the contigs alone. We compared the protein sequences
predicted by Augustus against the X. tropicalis (Karimi et al. 2018) database using global_search
in Usearch v10.0.240 (Edgar 2010). To obtain a high confidence set of 19,639 protein-coding
genes, we filtered the complete set of annotated protein-coding genes using multiple criteria. We
required proteins to be 30 amino acids or larger (for comparison, the smallest protein in X.
tropicalis is 33 amino acids long). Also, relative to their best X. tropicalis match, we employed
thresholds of ≥30% identity, ≥30% target coverage, and ≥75% query coverage.
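The four filtering criteria can be expressed compactly. The sketch below assumes that identity and coverage have already been parsed from the Usearch output as fractions between 0 and 1; the record layout and gene identifiers are hypothetical.

```python
def is_high_confidence(length_aa, identity, target_cov, query_cov):
    """Apply the four thresholds described above; fractions are on a 0-1 scale."""
    return (
        length_aa >= 30
        and identity >= 0.30
        and target_cov >= 0.30
        and query_cov >= 0.75
    )


# Hypothetical records: (gene ID, length, identity, target coverage, query coverage)
predictions = [
    ("g1", 316, 0.62, 0.91, 0.88),
    ("g2", 25, 0.95, 0.40, 0.99),   # fails: shorter than 30 amino acids
    ("g3", 120, 0.45, 0.35, 0.60),  # fails: query coverage below 75%
]
kept = [gid for gid, *metrics in predictions if is_high_confidence(*metrics)]
```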
B.6.3 Gene ontology analysis
We obtained full gene sets for all publicly available genomes with annotations, namely X.
tropicalis, R. marina, N. parkeri, and R. catesbeiana, from Xenbase (Karimi et al. 2018) or NCBI.
We assigned each gene in each species a UniProt ID (UniProt Consortium 2019) by determining
its best match in a Swiss-Prot database, retrieved from UniProt, that included anurans, human, and
zebrafish (Supplementary Data 4). To generate these matches, we performed VSEARCH (Rognes
et al. 2016) global pairwise alignment, returning only the best-match alignment showing ≥30%
identity. We then assigned Biological Process gene ontology (GO) terms to each gene based on its
best-match UniProt ID.
To determine if Sp. multiplicata showed enrichment or depletion of any GO term relative
to the other species, we performed chi-square tests for each term (Cai et al. 2006). Specifically, we
counted the number of genes with and without a given term in each species and compared these
values using a series of pairwise chi-square tests. We then corrected for multiple testing using FDR
(Benjamini and Hochberg 1995) implemented with the qvalue function in R (Bass et al. 2018). A
GO term was considered significant if all its pairwise chi-square test results were significant at an
FDR below 0.05 and if Sp. multiplicata had more (or fewer) genes with that term than each of the
other species. If Sp. multiplicata was not significantly enriched or depleted for a given GO term
relative to every other species, we did not consider the GO term significant in our overall analysis.
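The decision rule for a single GO term can be sketched as below. This is an illustration under stated assumptions rather than the analysis code: it uses an uncorrected Pearson chi-square on a 2x2 table (R's chisq.test applies a continuity correction to 2x2 tables by default), and it substitutes a plain Benjamini-Hochberg adjustment for the qvalue package. All function names are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of a chi-square variable with 1 df.
    return stat, math.erfc(math.sqrt(stat / 2))


def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (stand-in for qvalue; an assumption)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


def term_call(focal_with, others_with, adjusted_pvals, alpha=0.05):
    """Return 'enriched', 'depleted', or None for one GO term.

    focal_with: number of focal-species genes carrying the term.
    others_with: the same count for each comparison species.
    adjusted_pvals: FDR-adjusted p-value of each pairwise test.
    """
    if any(q >= alpha for q in adjusted_pvals):
        return None
    if all(focal_with > o for o in others_with):
        return "enriched"
    if all(focal_with < o for o in others_with):
        return "depleted"
    return None
```

In practice the adjustment would be applied across the full set of tests for all terms and species pairs before term_call is evaluated for each term.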
B.6.4 Copy number analysis
We compared the peptide sequences of our gene models, as well as the peptide models of
other published anuran assemblies and annotations, against the vertebrate Swiss-Prot (Bairoch
and Apweiler 2000) database. To decrease the likelihood of false positives, we first removed all
matches that did not show ≥30% identity and ≥90% target coverage. We then counted the number of
matches in each species for each of the 18,341 genes in the database. We compared the counts
across all five anurans and looked for cases of enrichment specific to Spea (Supplementary Table
6). For the purposes of this paper, we defined a gene as enriched specifically in Sp. multiplicata if
it was present in at least twice the number of copies in Sp. multiplicata compared to all the other
sequenced anurans, with Sp. multiplicata having at least five copies.
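The operational definition of a Sp. multiplicata-specific expansion reduces to a simple predicate. The sketch below assumes per-species copy counts are already available; the function name and example counts are hypothetical.

```python
def enriched_in_focal(focal_copies, other_copies, min_copies=5, fold=2):
    """True if the focal species has at least `min_copies` copies and at least
    `fold` times as many copies as every other species."""
    return focal_copies >= min_copies and all(
        focal_copies >= fold * c for c in other_copies
    )
```

For example, a gene with six copies in the focal species and at most three in each of the others would qualify, whereas a gene with ten copies would not if any other species carried six.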
To supplement the five anuran sequences, we retrieved peptide sequences of nodal, hyas,
and zp3 from human and zebrafish from UniProt (UniProt Consortium 2019). We aligned the
peptide sequences of all seven species using Muscle v3.8.31 (Edgar 2004) and removed all
positions where any species had a gap. We used the R package phangorn (Schliep et al. 2017) to
calculate maximum likelihood models of amino acid substitution distance using optim.pml with
model = WAG and stochastic rearrangement. To further investigate the nodal expansion, we
included matches to all nodal-related genes in the Swiss-Prot database and reduced our coverage
requirement to ≥50% for all five anuran species (Supplementary Table 6).
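Removing every alignment column in which any species has a gap can be sketched as follows. The input layout (a dictionary of equal-length aligned sequences with "-" as the gap character) is an assumption made for illustration.

```python
def drop_gapped_columns(aligned):
    """Remove alignment columns where any sequence has a gap ('-').

    aligned: dict mapping sequence name -> aligned sequence (equal lengths).
    """
    if not aligned:
        return {}
    seqs = list(aligned.values())
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("aligned sequences must have equal length")
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}
```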
We used short-read sequencing data to verify gene amplifications detected in the assembly.
We mapped short-read data from Sp. multiplicata against the gene alignments generated for our
PAML analyses using bwa v0.7.12 (Li and Durbin 2009) and extracted per base coverage
information using the bedtools (Quinlan and Hall 2010) genomecov module. We then divided the
median coverage values of our enriched gene models by the median coverage across all gene
models to generate a fold coverage measurement. To estimate the total number of gene models in
each family we summed fold coverage.
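The fold-coverage calculation can be illustrated with the sketch below. It assumes per-base depths (as reported by bedtools genomecov) have been grouped by gene model, and it interprets "median coverage across all gene models" as the median of the per-gene medians; that interpretation and all names are assumptions.

```python
from statistics import median


def fold_coverage(per_base_depth):
    """Map each gene to its median depth divided by the median of all gene medians.

    per_base_depth: dict mapping gene ID -> list of per-base depths.
    """
    gene_medians = {g: median(d) for g, d in per_base_depth.items()}
    baseline = median(gene_medians.values())
    if baseline == 0:
        raise ValueError("baseline coverage is zero")
    return {g: m / baseline for g, m in gene_medians.items()}


def family_copy_estimate(fold, family_members):
    """Estimate the number of gene copies in a family by summing fold coverage."""
    return sum(fold[g] for g in family_members)
```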
B.6.5 dN/dS analysis
We sampled, preserved, and stored at the North Carolina Museum of Natural Sciences a single
adult from Sp. bombifrons (NCSM84228), Sc. holbrookii (NCSM84231), and Sc. couchii
(NCSM84229). We used Qiagen 500G Genomic-tip columns to extract DNA from liver tissue.
For each sample, we constructed between three and five replicate Illumina Nextera libraries using
the same DNA but different tagmentation and PCR steps following standard protocols. Indexed
libraries were then sequenced on an Illumina NextSeq using the 150 bp paired-end kit at the USC
Molecular Genomics Core.
We performed traditional short-read assembly on the data from Sp. bombifrons, Sc.
couchii, and Sc. holbrookii. We trimmed short reads to remove low-quality bases using
Trimmomatic (Bolger et al. 2014) and then used the trimmed reads as inputs for the SOAPdenovo2
v2.04 assembler (Luo et al. 2012), and ran the assembler across a range of kmers (31, 41, 51, 61,
71, 81, 91, 101) for each species. A single contig set was selected for further use based on the
assembly quality parameters, with an emphasis on a total assembly size of 1 Gb (Supplementary
Table 10). We aligned the contigs of these assemblies to the nucleotide sequences of our predicted
genic models using lastal (Kielbasa et al. 2011) with default settings. The contig with the best
match score for each exon from each species was used to generate an alignment for each gene.
Bases that were not covered by the top match contig were encoded as N. We were able to generate
alignments for 25,382 genes with partial data from each species.
We used Phylogenetic Analysis by Maximum Likelihood (PAML) v4.9 (Yang 2007) to
estimate dN/dS (ω) for all gene alignments. We used the cleandata = 1 option to remove all sites
where any species had one or more Ns in the codon. We utilized the same two models on two distinct
mid-point rooted tree topologies. In the first model, a single ω was estimated across all branches.
In the second model, two ω parameters were estimated, one for Spea (Sp. multiplicata, Sp.
bombifrons) and one for Scaphiopus (Sc. couchii, Sc. holbrookii). We removed those genes with
any ω estimates of 999 as well as those with fewer than 50 sites. We then further removed all genes
without ≥80% of the gene model covered in all four species. This left a total of 1,967 genes.
Significance of two-ω models relative to one-ω models was determined using likelihood ratio tests,
with FDRs estimated as above (Bass et al. 2018). Typically, ω > 1 indicates positive selection, ω <
1 indicates purifying selection, and ω = 1 indicates neutral evolution. However, here, hypothesis
tests were performed by comparing ω between Spea and Scaphiopus, rather than by comparing ω
in a given species against a model where ω = 1 (Yang 2007). To account for this, we used a more
stringent threshold for positive selection (ω ≥ 2). Likewise, we also used a more stringent threshold
for calling genes as experiencing purifying selection (ω ≤ 0.5).
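The gene filters, the likelihood ratio test, and the operational thresholds can be sketched as follows. This illustration assumes the log-likelihoods and estimates have already been parsed from PAML output, and it assumes one degree of freedom for the test because the two-ratio model adds a single parameter; the text does not state the degrees of freedom, and all function names are hypothetical.

```python
import math


def passes_filters(omegas, n_sites, min_sites=50):
    """Drop genes with any estimate of 999 or with fewer than min_sites sites."""
    return n_sites >= min_sites and all(w != 999 for w in omegas)


def lrt_pvalue(lnl_one_ratio, lnl_two_ratio):
    """Likelihood ratio test of the two-ratio model against the one-ratio model.

    Returns (statistic, p-value), assuming a chi-square null with 1 df.
    """
    stat = max(0.0, 2.0 * (lnl_two_ratio - lnl_one_ratio))
    return stat, math.erfc(math.sqrt(stat / 2))


def classify_omega(omega, positive=2.0, purifying=0.5):
    """Apply the stringent thresholds described above to one branch-class estimate."""
    if omega >= positive:
        return "positive"
    if omega <= purifying:
        return "purifying"
    return "unclassified"
```

The p-values from lrt_pvalue would then be corrected for multiple testing across all retained genes before any gene is called.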
We performed GO enrichment tests on genes operationally defined as experiencing
positive selection in Spea. To do this, we compared the number of genes with or without a given
GO term in the set of genes under positive selection in Spea to the number of genes with or without
a given GO term in those genes not under positive selection in Spea. For each GO term, a chi-
square test was performed using the chisq.test function in R and multiple testing correction was
conducted on the entire set of tests in qvalue (Bass et al. 2018). We then manually explored
information regarding the parent-child relationships of each term, the full description of the GO
term, and terms with which a focal term frequently co-occurs using the QuickGO online resource
(Binns et al. 2009) to determine potential biological relevance (‘Functional grouping’ in
Supplementary Table 9). Note that, unlike other analyses in the paper, terms reported as significant
in this analysis were identified at an FDR threshold of 0.065.
B.6.6 Analysis of gene expression in pure species and their hybrids
We contrasted gene expression in tadpoles between the Spea species and their hybrids as
follows. To generate pure-species tadpoles, we bred pairs of Sp. multiplicata adults and pairs of
Sp. bombifrons. To create hybrid tadpoles, we paired Sp. multiplicata adults with Sp. bombifrons
adults. We generated hybrids of both maternal cross directions. All adults were wild-caught and
maintained in lab facilities. To induce breeding, we injected adults with 0.07 mL of 0.01 µg/mL
gonadotropin releasing hormone (GnRH) agonist. Males and females were placed as pairs in
separate aquaria with 10 L of dechlorinated water and allowed to oviposit. We generated at least
three replicate families per cross type. After egg release was complete, adults were removed and
the eggs were aerated until they hatched. When tadpoles were swimming freely, we selected 16
tadpoles at random from each family; divided these tadpoles into two groups of eight; and placed
each group in a tank (34 cm X 21 cm X 11.5 cm) filled with dechlorinated water. All tadpoles were
fed their natural diet of shrimp and detritus ad libitum. On day 10 after fertilization (comparable
to stage 47 in X. tropicalis), we killed tadpoles by placing them in MS-222 and then immediately
froze them in liquid nitrogen. Thus, all tadpoles were the same age, but not necessarily the same
developmental stage, at sampling.
We sampled seven tadpoles each of pure Sp. multiplicata and pure Sp. bombifrons, as well as six
and eight tadpoles, respectively, of the two interspecies hybrid types (i.e., female Sp.
multiplicata x male Sp. bombifrons and female Sp. bombifrons x male Sp. multiplicata). We
extracted RNA from whole tadpoles using a combination of TRIzol Reagent and the Ambion
PureLink RNA Mini Kit according to Levis et al. (2017) and submitted the samples to Cornell
Capillary DNA Analysis for preparation and sequencing of 3′ RNA-seq libraries. We trimmed the
resulting short reads for poly-A tails and adapter contamination using Trimmomatic (Bolger et al.
2014), and aligned individual libraries, as well as pooled libraries, to our scaffolds using the STAR
aligner (Dobin et al. 2013).
3′ RNA-seq only generates reads from the 3′ ends of transcripts, resulting in peaks of data
(Beck et al. 2010). To identify transcription peaks, we selected the base with maximum coverage
for each region with continuous coverage above 50. We then extracted coverage from each
individual sample for each of these peaks. We normalized by library size (millions of reads) and
log2 transformed the resulting values before performing digital normalization across all data. We
calculated the mean-fold coverage for all peaks for pure individuals and hybrids. Gene-specific
ANOVA models were then used to identify genes showing significant differential expression
between the species. For each gene, we fit the following model:
expression = species + error,
where expression corresponds to the vector of log2 expression measurements for the samples, species
is a vector containing the species from which each measurement was taken, and error denotes the
vector of residuals. These models were fit only to data from the pure species, using the aov function
in R. P-values for the models were obtained and corrected for multiple testing using ‘qvalue’ (Bass
et al. 2018), with a significance threshold of FDR ≤ 0.05. After identifying these differentially
expressed genes in pure species, we then examined their log2 expression levels in hybrids. Mean
expression levels for a given gene within a particular sample class were determined by computing
the arithmetic mean of all measurements for that gene in the appropriate class using the mean
function in R.
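For a single gene, the model above amounts to a one-way ANOVA with species as the only factor. The sketch below computes the corresponding F statistic in plain Python as an illustration; it is not the aov call used in this work, it omits the p-value (which would come from the F distribution with the returned degrees of freedom), and the function name and group labels are hypothetical.

```python
from statistics import mean


def one_way_f(groups):
    """One-way ANOVA F statistic for expression = species + error.

    groups: dict mapping group label -> list of log2 expression values.
    Returns (F, df_between, df_within).
    """
    values = [x for vals in groups.values() for x in vals]
    grand_mean = mean(values)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    ss_between = sum(
        len(vals) * (mean(vals) - grand_mean) ** 2 for vals in groups.values()
    )
    ss_within = sum(
        (x - mean(vals)) ** 2 for vals in groups.values() for x in vals
    )
    if df_between < 1 or df_within < 1 or ss_within == 0:
        raise ValueError("F statistic is undefined for these data")
    return (ss_between / df_between) / (ss_within / df_within), df_between, df_within
```

With only two groups, as in the pure-species comparison, the F statistic has one numerator degree of freedom and is equivalent to the square of a two-sample t statistic with pooled variance.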
B.6 Supplementary Material
Supplementary Notes
Supplementary Note 1. Conversion of C-value to gigabases. Genome sizes were converted from
picograms to base pairs using the formula provided by Dolezel et al. 2007.
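The conversion can be written as a one-line function. The sketch assumes the widely used factor of 0.978 x 10^9 base pairs per picogram of DNA; the note above does not state the factor explicitly, so it is labeled here as an assumption, and the function name is hypothetical.

```python
BP_PER_PG = 0.978e9  # assumed conversion factor: base pairs per picogram of DNA


def c_value_to_gb(c_value_pg):
    """Convert a haploid C-value in picograms to genome size in gigabases."""
    if c_value_pg < 0:
        raise ValueError("C-value must be non-negative")
    return c_value_pg * BP_PER_PG / 1e9
```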
Supplementary Note 2. Rationale for excluding the Rana catesbeiana genome. Annotated peptides
in Rana (Lithobates) catesbeiana are slightly more than half the size of genes in the other species
(Supplementary Table 4), suggesting a highly fragmented assembly. We posit that this is a
technical artifact as this genome reported 40.7% of genes in the Core Eukaryotic Genes Mapping
Approach (CEGMA) as complete (AmphibiaWeb 2018).
Supplemental tables and datasets can be found online:
https://doi.org/10.25387/g3.8303672
B.7 References
Abbott, R., D. Albach, S. Ansell, J. W. Arntzen, S. J. E. Baird et al., 2013 Hybridization and
speciation. J. Evol. Biol. 26: 229–246. https://doi.org/10.1111/j.1420-9101.2012.02599.x
Agius, E., M. Oelgeschlager, O. Wessely, C. Kemp, and E. M. De Robertis, 2000 Endodermal
Nodal-related signals and mesoderm induction in Xenopus. Development 127: 1173–1183.
AmphibiaWeb, 2018 https://amphibiaweb.org. Berkeley, California.
Bachmann, K., 1972 Nuclear DNA and developmental rate in frogs. Q. J. Fla. Acad. Sci. 35:
225–231.
Bairoch, A., and R. Apweiler, 2000 The SWISS-PROT protein sequence database and its
supplement TrEMBL in 2000. Nucleic Acids Res. 28:
45–48. https://doi.org/10.1093/nar/28.1.45
Bao, W., K. K. Kojima, and O. Kohany, 2015 Repbase Update, a database of repetitive elements
in eukaryotic genomes. Mob. DNA 6: 11. https://doi.org/10.1186/s13100-015-0041-9
Bass, A. J., A. Dabney and D. Robinson, 2018 qvalue: Q-value estimation for false discovery rate
control. R package version 2.14.0.
Beck, A. H., Z. Weng, D. M. Witten, S. Zhu, J. W. Foley et al., 2010 3′-end sequencing for
expression quantification (3SEQ) from archival tumor samples. PLoS One 5: e8768.
https://doi.org/10.1371/journal.pone.0008768
Benjamini, Y., and Y. Hochberg, 1995 Controlling the false discovery rate: a practical and
powerful approach to multiple testing. J. R. Stat. Soc. B 57: 289–300.
Binns, D., E. Dimmer, R. Huntley, D. Barrell, C. O’Donovan et al., 2009 QuickGO: a web-
based tool for Gene Ontology searching.
Bioinformatics 25: 3045–3046. https://doi.org/10.1093/bioinformatics/btp536
Bolger, A. M., M. Lohse, and B. Usadel, 2014 Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics 30: 2114–2120.
https://doi.org/10.1093/bioinformatics/btu170
Boorse, G. C., and R. J. Denver, 2003 Endocrine mechanisms underlying plasticity in
metamorphic timing in spadefoot toads. Integr. Comp. Biol. 43: 646–657.
https://doi.org/10.1093/icb/43.5.646
Bossuyt, F., and K. Roelants, 2009 Frogs and toads (Anura), pp. 357–364 in The timetree of life,
edited by Hedges, S. B., and S. Kumar. Oxford University Press, Oxford, U.K.
Bragg, A. N., 1965 Gnomes of the night: the spadefoot toads, University of Pennsylvania Press,
Philadelphia, PA. https://doi.org/10.9783/9781512800685
Cai, Z., X. Mao, S. Li, and L. Wei, 2006 Genome comparison using Gene Ontology (GO)
with statistical testing. BMC Bioinformatics 7: 374. https://doi.org/10.1186/1471-2105-7-374
Camacho, C., G. Coulouris, V. Avagyan, N. Ma, J. Papadopoulos et al., 2009 BLAST+:
architecture and applications. BMC Bioinformatics 10:
421. https://doi.org/10.1186/1471-2105-10-421
Charney, R. M., K. D. Paraiso, I. L. Blitz, and K. W. Y. Cho, 2017 A gene regulatory program
controlling early Xenopus mesendoderm formation: network conservation and motifs.
Semin. Cell Dev. Biol. 66: 12–24. https://doi.org/10.1016/j.semcdb.2017.03.003
Chrtek, J. I. J., J. Zahradnek, K. Krak, and J. Fehrer, 2009 Genome size in Hieracium subgenus
Hieracium (Asteraceae) is strongly correlated with major phylogenetic groups. Ann. Bot.
(Lond.) 104: 161–178. https://doi.org/10.1093/aob/mcp107
Denton, R. D., R. S. Kudra, J. W. Malcolm, L. Du Preez, and J. H. Malone, 2018 The African
Bullfrog (Pyxicephalus adspersus) genome unites the two ancestral ingredients for making
vertebrate sex chromosomes. bioRxiv. https://doi.org/10.1101/329847
Denver, R. J., N. Mirhadi, and M. Phillips, 1998 Adaptive plasticity in amphibian
metamorphosis: response of Scaphiopus hammondii tadpoles to habitat desiccation.
Ecology 79: 1859–1872.
Dobin, A., C. A. Davis, F. Schlesinger, J. Drenkow, C. Zaleski et al., 2013 STAR: ultrafast
universal RNA-seq aligner. Bioinformatics 29:15–21.
https://doi.org/10.1093/bioinformatics/bts635
Duellman, W. E., and L. Trueb, 1986 Biology of amphibians, McGraw-Hill, New York.
Edgar, R. C., 2004 MUSCLE: multiple sequence alignment with high accuracy and high
throughput. Nucleic Acids Res. 32: 1792–1797. https://doi.org/10.1093/nar/gkh340
Edgar, R. C., 2010 Search and clustering orders of magnitude faster than BLAST.
Bioinformatics 26: 2460–2461. https://doi.org/10.1093/bioinformatics/btq461
Edwards, R. J., D. E. Tuipulotu, T. G. Amos, D. O’Meally, M. F. Richardson et al., 2018 Draft
genome assembly of the invasive cane toad, Rhinella marina. Gigascience 7: 1–13.
https://doi.org/10.1093/gigascience/giy095
Gomez-Mestre, I., and D. R. Buchholz, 2006 Developmental plasticity mirrors differences among
taxa in spadefoot toads linking plasticity and diversity. Proc. Natl. Acad. Sci. USA 103:
19021–19026. https://doi.org/10.1073/pnas.0603562103
Gregory, T. R., 2005 The evolution of the genome, Academic/Elsevier, Burlington, VT.
Gregory, T. R., 2018 Animal Genome Size Database. http://www.genomesize.com.
Halliday, T. R., 2016 The book of frogs, University of Chicago Press, Chicago, IL.
Hammond, S. A., R. L. Warren, B. P. Vandervalk, E. Kucuk, H. Khan et al., 2017 The North
American bullfrog draft genome provides insight into hormonal regulation of long
noncoding RNA. Nat. Commun. 8: 1433. https://doi.org/10.1038/s41467-017-01316-7
Harris, J. D., D. W. Hibler, G. K. Fontenot, K. T. Hsu, E. C. Yurewicz et al., 2009 Cloning and
characterization of zona pellucida genes and cDNAs from a variety of mammalian species:
The ZPA, ZPB and ZPC gene families. DNA Seq. 4: 361–393.
https://doi.org/10.3109/10425179409010186
Hellsten, U., R. M. Harland, M. J. Gilchrist, D. Hendrix, J. Jurka et al.,
2010 The genome of the western clawed frog Xenopus tropicalis. Science
328: 633–636. https://doi.org/10.1126/science.1183670
Karimi, K., J. D. Fortriede, V. S. Lotay, K. A. Burns, D. Z. Wang et al., 2018 Xenbase: a
genomic, epigenomic and transcriptomic model organism database. Nucleic Acids Res. 46:
D861–D868. https://doi.org/10.1093/nar/gkx936
Kielbasa, S. M., R. Wan, K. Sato, P. Horton, and M. C. Frith, 2011 Adaptive seeds tame genomic
sequence comparison. Genome Res. 21: 487–493. https://doi.org/10.1101/gr.113985.110
Kim, D., G. Pertea, C. Trapnell, H. Pimentel, R. Kelley et al., 2013 TopHat2: accurate alignment
of transcriptomes in the presence of insertions, deletions and gene fusions. Genome Biol.
14: R36. https://doi.org/10.1186/gb-2013-14-4-r36
Levis, N. A., S. de la Serna Buzon, and D. W. Pfennig, 2015 An inducible offense:
carnivore morph tadpoles induced by tadpole carnivory. Ecol. Evol. 5: 1405–1411.
https://doi.org/10.1002/ece3.1448
Levis, N. A., A. Isdaner, and D. W. Pfennig, 2018 Morphological novelty emerges from pre-
existing phenotypic plasticity. Nat. Ecol. Evol. 2: 1289–1297.
https://doi.org/10.1038/s41559-018-0601-8
Levis, N. A., A. Serrato-Capuchina, and D. W. Pfennig, 2017 Genetic accommodation in
the wild: evolution of gene expression plasticity during character displacement. J. Evol.
Biol. 30: 1712–1723. https://doi.org/10.1111/jeb.13133
Li, H., and R. Durbin, 2009 Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25: 1754–1760. https://doi.org/10.1093/bioinformatics/btp324
Liedtke, H. C., D. J. Gower, M. Wilkinson, and I. Gomez-Mestre, 2018 Macroevolutionary shift
in the size of amphibian genomes and the role of life history and climate. Nat. Ecol. Evol.
2: 1792–1799. https://doi.org/10.1038/s41559-018-0674-4
Luo, R., B. Liu, Y. Xie, Z. Li, W. Huang et al., 2012 SOAPdenovo2: an empirically
improved memory-efficient short-read de novo assembler. Gigascience 1: 18.
https://doi.org/10.1186/2047-217X-1-18
Luxardi, G., L. Marchal, V. Thomé, and L. Kodjabachian, 2010 Distinct Xenopus Nodal ligands
sequentially induce mesendoderm and control gastrulation movements in parallel to the
Wnt/PCP pathway. Development 137: 417–426. https://doi.org/10.1242/dev.039735
Lynch, M., 2007 The origin of genome architecture, Sinauer Associates, Sunderland, MA.
Mayhew, W. W., 1965 Adaptations of the amphibian, Scaphiopus couchii, to desert conditions.
Am. Midl. Nat. 74: 95–109. https://doi.org/10.2307/2423123
McClanahan, L. J., 1967 Adaptations of the spadefoot toad Scaphiopus couchii, to desert
environments. Comp. Biochem. Physiol. 20: 73–99.
https://doi.org/10.1016/0010-406X(67)90726-8
Morey, S., and D. Reznick, 2000 A comparative analysis of plasticity in larval development
in three species of spadefoot toads. Ecology 81: 1736–1749.
https://doi.org/10.1890/0012-9658(2000)081[1736:ACAOPI]2.0.CO;2
Mueller, R. L., and E. L. Jockusch, 2018 Jumping genomic gigantism. Nat. Ecol. Evol. 2:
1687–1688. https://doi.org/10.1038/s41559-018-0703-3
Newman, R. A., 1989 Developmental plasticity of Scaphiopus couchii tadpoles in an unpredictable
environment. Ecology 70: 1775–1787. https://doi.org/10.2307/1938111
Newman, R. A., 1992 Adaptive plasticity in amphibian metamorphosis. Bioscience 42: 671–678.
https://doi.org/10.2307/1312173
Osada, S. I., and C. V. E. Wright, 1999 Xenopus nodal-related signaling is essential for
mesendodermal patterning during early embryogenesis. Development 126: 3229–3240.
Petrov, D. A., 2001 Evolution of genome size: new approaches to an old problem. Trends Genet.
17: 23–28. https://doi.org/10.1016/S0168-9525(00)02157-0
Pfennig, D. W., 1990 The adaptive significance of an environmentally-cued developmental switch
in an anuran tadpole. Oecologia 85: 101–107. https://doi.org/10.1007/BF00317349
Pfennig, D. W., 1992a Polyphenism in spadefoot toads as a locally adjusted evolutionarily stable
strategy. Evolution 46: 1408–1420.
Pfennig, D. W., 1992b Proximate and functional causes of polyphenism in an anuran tadpole.
Funct. Ecol. 6: 167–174. https://doi.org/10.2307/2389751
Pfennig, D. W., 2000 Effect of predator-prey phylogenetic similarity on the fitness consequences
of predation: A trade-off between nutrition and disease? Am. Nat. 155: 335–345.
https://doi.org/10.1086/303329
Pfennig, D. W., S. G. Ho, and E. A. Hoffman, 1998 Pathogen transmission as a selective force
against cannibalism. Anim. Behav. 55: 1255–1261.
https://doi.org/10.1006/anbe.1997.9996
Pfennig, K. S., 2007 Facultative mate choice drives adaptive hybridization. Science 318: 965–
967. https://doi.org/10.1126/science.1146035
Pfennig, K. S., A. Allenby, R. A. Martin, A. Monroy, and C. D. Jones, 2012 A suite of
molecular markers for identifying species, detecting introgression and describing
population structure in spadefoot toads (Spea spp.). Mol. Ecol. Resour. 12: 909–917.
https://doi.org/10.1111/j.1755-0998.2012.03150.x
Pfennig, K. S., A. L. Kelly, and A. A. Pierce, 2016 Hybridization as a facilitator of species range
expansion. Proc. Biol. Sci. 283. https://doi.org/10.1098/rspb.2016.1329
Pfennig, K. S., and M. A. Simovich, 2002 Differential selection to avoid hybridization in two
toad species. Evolution 56: 1840–1848.
https://doi.org/10.1111/j.0014-3820.2002.tb00198.x
Pierce, A. A., R. Gutierrez, A. M. Rice, and K. S. Pfennig, 2017 Genetic variation during
range expansion: effects of habitat novelty and hybridization. Proc. Biol. Sci. 284:
20170007. https://doi.org/10.1098/rspb.2017.0007
Quinlan, A. R., and I. M. Hall, 2010 BEDTools: a flexible suite of utilities for comparing genomic
features. Bioinformatics 26: 841–842. https://doi.org/10.1093/bioinformatics/btq033
Rogers, R. L., L. Zhou, C. Chu, R. Marquez, A. Corl et al., 2018 Genomic Takeover by
Transposable Elements in the Strawberry Poison Frog. Mol. Biol. Evol. 35: 2913–2927.
Rognes, T., T. Flouri, B. Nichols, C. Quince, and F. Mahé, 2016 VSEARCH: a versatile open
source tool for metagenomics. PeerJ 4: e2584. https://doi.org/10.7717/peerj.2584
Ruibal, R., L. Tevis, and V. Roig, 1969 The terrestrial ecology of the spadefoot toad Scaphiopus
hammondii. Copeia 1969: 571–584. https://doi.org/10.2307/1441937
Schier, A. F., and M. M. Shen, 2000 Nodal signalling in vertebrate development. Nature 403:
385–389. https://doi.org/10.1038/35000126
Schliep, K., A. J. Potts, D. A. Morrison, and G. W. Grimm, 2017 Intertwining phylogenetic trees
and networks. Methods Ecol. Evol. 8: 1212–1220.
https://doi.org/10.1111/2041-210X.12760
Seidl, F., N. A. Levis, C. D. Jones, A. Monroy-Eklund, I. M. Ehrenreich et al., 2019 Variation in
hybrid gene expression: implications for the evolution of genetic incompatibilities in
interbreeding species. Mol. Ecol. https://doi.org/10.1111/mec.15246
Sexsmith, L. E., 1968 DNA values and karyotypes of amphibians. Thesis University of Toronto.
Seymour, R. S., 1973 Energy metabolism of dormant spadefoot toads (Scaphiopus). Copeia 1973:
435–445. https://doi.org/10.2307/1443107
Simão, F. A., R. M. Waterhouse, P. Ioannidis, E. V. Kriventseva, and E. M. Zdobnov, 2015
BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.
Bioinformatics 31: 3210–3212. https://doi.org/10.1093/bioinformatics/btv351
Smit, A. F. A., R. Hubley and P. Green, 2013–2015 RepeatMasker Open-4.0.
Spicer, A. P., and J. A. McDonald, 1998 Characterization and molecular evolution of a vertebrate
hyaluronan synthase gene family. J. Biol. Chem. 273: 1923–1932.
https://doi.org/10.1074/jbc.273.4.1923
Stanke, M., M. Diekhans, R. Baertsch, and D. Haussler, 2008 Using native and syntenically
mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24: 637–644.
https://doi.org/10.1093/bioinformatics/btn013
Stanke, M., and B. Morgenstern, 2005 AUGUSTUS: a web server for gene prediction in
eukaryotes that allows user-defined constraints. Nucleic Acids Res. 33: W465–W467.
https://doi.org/10.1093/nar/gki458
Stuart, S. N., J. S. Chanson, N. A. Cox, B. E. Young, A. S. L. Rodrigues et al., 2004 Status and
Trends of Amphibian Declines and Extinctions Worldwide. Science 306: 1783–1786.
https://doi.org/10.1126/science.1103538
Sun, Y.-B., Z.-J. Xiong, X.-Y. Xiang, S.-P. Liu, W.-W. Zhou et al., 2015 Whole-genome sequence
of the Tibetan frog Nanorana parkeri and the comparative evolution of tetrapod genomes.
Proc. Natl. Acad. Sci. USA 112: E1257–E1262. https://doi.org/10.1073/pnas.1501764112
Takahashi, S., Y. Onuma, C. Yokota, J. J. Westmoreland, M. Asashima et al., 2006 Nodal-
related gene Xnr5 is amplified in the Xenopus genome. Genesis 44: 309–321.
https://doi.org/10.1002/dvg.20217
Takahashi, S., C. Yokota, K. Takano, K. Tanegashima, Y. Onuma et al., 2000 Two novel
nodal-related genes initiate early inductive events in Xenopus Nieuwkoop center.
Development 127: 5319–5329.
Tamazian, G., P. Dobrynin, K. Krasheninnikova, A. Komissarov, K. P. Koepfli et al., 2016
Chromosomer: a reference-based genome arrangement tool for producing draft
chromosome sequences. Gigascience 5: 38. https://doi.org/10.1186/s13742-016-0141-6
Terai, Y., O. Seehausen, T. Sasaki, K. Takahashi, S. Mizoiri et al., 2006 Divergent selection on
opsins drives incipient speciation in Lake Victoria cichlids. PLoS Biol. 4: e433.
https://doi.org/10.1371/journal.pbio.0040433
Tinsley, R. C., and K. Tocque, 1995 The population dynamics of a desert anuran, Scaphiopus
couchii. Aust. J. Ecol. 20: 376–384. https://doi.org/10.1111/j.1442-9993.1995.tb00553.x
UniProt Consortium, 2019 UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res.
47: D506–D515. https://doi.org/10.1093/nar/gky1049
Wasserman, A. O., 1970 Chromosomal Studies of the Pelobatidae (Salientia) and some Instances
of Ploidy. Southwest. Nat. 15: 239–248. https://doi.org/10.2307/3670352
Wells, K. D., 2007 The ecology and behavior of amphibians, University of Chicago Press,
Chicago, IL. https://doi.org/10.7208/chicago/9780226893334.001.0001
West-Eberhard, M. J., 2003 Developmental plasticity and evolution, Oxford University Press, New
York.
Wiens, J. J., and T. A. Titus, 1991 A phylogenetic analysis of Spea (Anura: Pelobatidae).
Herpetologica 47: 21–28.
Yang, Z., 2007 PAML 4: a program package for phylogenetic analysis by maximum
likelihood. Mol. Biol. Evol. 24: 1586–1591. https://doi.org/10.1093/molbev/msm088
Zeng, C., I. Gomez-Mestre, and J. J. Wiens, 2014 Evolution of rapid development in spadefoot
toads is unrelated to arid environments. PLoS One 9: e96637.
https://doi.org/10.1371/journal.pone.0096637
Zimin, A. V., G. Marcais, D. Puiu, M. Roberts, S. L. Salzberg et al., 2013 The MaSuRCA genome
assembler. Bioinformatics 29: 2669–2677. https://doi.org/10.1093/bioinformatics/btt476
Appendix C: The interplay of additivity, dominance, and epistasis in a diploid
yeast cross
This research paper is currently in review at Nature Communications and is posted on bioRxiv.
C.1 Summary of Contribution
As middle author on this research paper, I conducted experiments that were critical to building the
genetic resource used in this work.
C.2 Abstract
We used a double barcoding system to generate and phenotype a panel of ~200,000 diploid yeast
segregants that can be partitioned into hundreds of interrelated families. This experimental design
enabled the detection of thousands of genetic interactions and many loci whose effects vary across
families. Traits were largely specified by a small number of hub loci with major additive and
dominance effects, and pervasive epistasis. Genetic background commonly influenced both the
additive and dominance effects of loci, with multiple modifiers typically involved. The most
prominent dominance modifier was the mating locus, which had no effect on its own. Our findings
show that the interplay between additivity, dominance, and epistasis underlies a complex
genotype-to-phenotype map in diploids.
C.3 Introduction
Most complex traits, including many phenotypes of agricultural, clinical, and evolutionary
significance, are specified by multiple loci (1). How alleles at these loci collectively produce the
heritable trait variation in genetically diverse populations remains unresolved (2, 3). While
additive loci play a major role in most traits, non-additive genetic effects are also likely important
(4–9). However, loci with non-additive genetic effects are often difficult to detect, limiting
knowledge of their properties (10, 11).
The two purely genetic sources of non-additivity are dominance among alleles of
individual loci and epistasis between alleles at different loci (or genetic interactions) (12, 13). Most
empirical studies of non-additive genetic effects have focused on haploid or inbred individuals
(14–16), which provide higher statistical power to detect loci due to their nominal levels of
heterozygosity. However, by design, these populations cannot furnish insight into dominance or
its relationship with epistasis. This is a problem because many eukaryotic species that matter to
humans, including our species itself, exist predominantly as diploids that outbreed and have high
levels of heterozygosity (17–19). Dominance may be an important contributor to traits in these
species.
When epistasis occurs in diploids, a locus may influence only the additive effects, only the
dominance effects, or both the additive and dominance effects of its interactor(s) (20–22). Such
interplay has implications for efforts to genetically dissect phenotypes, predict heritable traits from
genotypes, and understand the evolutionary trajectories of beneficial and deleterious alleles. Yet,
exploration of the relationship between additivity, dominance, and epistasis has mainly been
limited to theory because of technical challenges in identifying non-additive loci.
The budding yeast Saccharomyces cerevisiae is a potentially powerful system for studying
non-additive genetics in diploids. Haploid yeast segregants with known genotypes can be mated to
produce diploids that also have known genotypes (23). This strategy facilitates the generation of
diploid mapping populations whose size is roughly the square of the number of haploid progenitors.
However, phenotyping large diploid populations of more than ~10,000 individuals has been
technically difficult (23, 24), limiting the use of this strategy.
C.5 Results
C.5.1 Phenotyping of a large diploid cross by barcode sequencing
To enable phenotyping of large yeast diploid mapping populations, we developed a system
that fuses two genomic barcodes, one from each haploid parent, in vivo to create a unique double
barcode for each diploid genotype (Fig. 1A and Fig. S1). We started with two S. cerevisiae isolates,
the commonly used lab strain BY4716 (BY) and a haploid derivative of the clinical isolate
322134S (3S). These strains differ at ~45,000 SNPs (~0.4% of genome) (25, 26). To ensure
segregation of the mating locus, both BY MATa x 3S MATα and 3S MATa x BY MATα crosses
were performed using isogenic strains that had been mating type switched. From these crosses,
600 MATα and 400 MATa segregants from distinct four-spore tetrads were marked at the neutral
YBR209W locus by integrating a random barcode (27, 28). At least two uniquely barcoded strains
were recovered per haploid segregant and the genome of each segregant was sequenced to define
the genotype represented by each barcode (Fig. S1 and Fig. S2A).
Fig. 1 Generating a large panel of diploid segregants with known genotypes that can be phenotyped
as a pool A. Overview of the experimental design. Parental haploids, BY and 3S, were mated and
sporulated. The resulting MATα and MATa segregants were barcoded at a common genomic
location and sequenced. Segregants were mated as pairs to generate a panel of ~240,000 double-
barcoded diploid segregants with known genotypes. All diploid segregants originating from a
single haploid parent are referred to as a ‘family’. B. MATα and MATa barcodes were brought to
the same genomic location by inducing recombination between homologous chromosomes via
Cre-loxP. C. Diploid segregants were pooled and grown in competition for 12-15 generations.
Barcode sequencing over the course of the competition was used to estimate the fitness of each
strain. D. Density plot of the raw fitness of double barcodes representing the same genotype in the
same pooled growth condition (Glucose 1). E. Density plot of the mean raw fitness of the same
genotype measured in two replicate growth cultures (Glucose 1 and Glucose 2). F. The broad-
sense and narrow-sense heritability estimates for the 8 environments. The standard errors for both
heritability estimates are shown as error bars for each point. G. Violin plots of the fitnesses of
diploid segregants in 8 environments. Raw fitness estimates of BY/BY, BY/3S, 3S/BY, and 3S/3S
diploid segregants are shown as colored lines.
MATa strains and MATα strains (2 barcodes per segregant) were mated as pairs and grown
on media that induced site-directed recombination between the MATa barcode and MATα barcode
on homologous chromosomes (Fig. 1B and Fig. S1E) (28, 29). This process resulted in a double
barcode on one chromosome that uniquely identifies both parents of a diploid segregant and
therefore its presumptive genotype (Fig. S2B). Using similar methods, we also constructed
BY/BY, BY/3S, 3S/BY, and 3S/3S parental diploids.
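As a rough illustration of how a double barcode can stand in for a diploid genotype, the sketch below combines haploid genotypes for every MATa x MATα pairing. The 0/1 allele coding, the barcode names, and the function name are assumptions made for illustration and are not taken from the study.

```python
from itertools import product

def cross_haploids(mat_a, mat_alpha):
    """Build every MATa x MATalpha diploid from haploid genotypes.

    mat_a, mat_alpha: dicts mapping a parental barcode to a list of
    haploid alleles per marker (0 = BY allele, 1 = 3S allele).
    Returns a dict keyed by the (MATa barcode, MATalpha barcode) pair,
    i.e. the double barcode, with per-marker 3S allele counts:
    0 = BY/BY, 1 = heterozygous, 2 = 3S/3S.
    """
    diploids = {}
    for (bc_a, geno_a), (bc_alpha, geno_alpha) in product(
        mat_a.items(), mat_alpha.items()
    ):
        diploids[(bc_a, bc_alpha)] = [x + y for x, y in zip(geno_a, geno_alpha)]
    return diploids

# Hypothetical barcodes and genotypes at four markers.
mat_a = {"a1": [0, 1, 1, 0], "a2": [1, 1, 0, 0]}
mat_alpha = {"alpha1": [0, 0, 1, 1], "alpha2": [1, 0, 1, 0], "alpha3": [1, 1, 1, 1]}
diploids = cross_haploids(mat_a, mat_alpha)
```

The number of diploids is the product of the two parent counts, so 400 MATa and 600 MATα segregants give 240,000 possible pairings.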
After the matings, diploids were pooled and competed in seven conditions: cobalt chloride,
copper sulfate, glucose, hydrogen peroxide, sodium chloride, rapamycin, and zeocin, with the
glucose condition performed twice (Table S1). Cells were grown for ~15 generations in serial
batch culture, with 1:8 dilution every ~3 generations and a bottleneck population size greater than
2 x 10^9 cells (Fig. 1C). Double barcodes were enumerated over 4-5 timepoints by sequencing
amplicons from the double barcode locus, and the resulting frequency trajectories were used to
estimate the relative fitness of each strain (30, 31).
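The general idea of turning barcode frequency trajectories into fitness estimates can be sketched with a simple log-linear regression. This is not the estimation procedure of refs. 30 and 31; it is a simplified stand-in that assumes fitness is the slope of log frequency over generations, measured relative to the pool mean. The pseudocount and all names are illustrative assumptions.

```python
from math import log

def relative_fitness(counts, generations, pseudocount=0.5):
    """Slope of log barcode frequency versus generations, per strain.

    counts:      list of per-strain lists of read counts, one per timepoint
    generations: cumulative generations elapsed at each timepoint
    Returns one slope per strain, centered on the mean slope of the pool.
    """
    n_time = len(generations)
    totals = [sum(row[t] + pseudocount for row in counts) for t in range(n_time)]
    t_mean = sum(generations) / n_time
    t_dev = [g - t_mean for g in generations]
    denom = sum(d * d for d in t_dev)
    slopes = []
    for row in counts:
        log_freq = [log((row[t] + pseudocount) / totals[t]) for t in range(n_time)]
        # Ordinary least-squares slope of log frequency on generations.
        slopes.append(sum(d * y for d, y in zip(t_dev, log_freq)) / denom)
    pool_mean = sum(slopes) / len(slopes)
    return [s - pool_mean for s in slopes]

# Hypothetical read counts at five timepoints, sampled every ~3 generations.
generations = [0, 3, 6, 9, 12]
counts = [
    [1000, 1200, 1450, 1700, 2000],  # lineage rising in frequency
    [1000, 1000, 1000, 1000, 1000],  # lineage with flat counts
    [1000, 800, 650, 500, 400],      # lineage falling in frequency
]
fitness = relative_fitness(counts, generations)
```

Centering on the pool mean makes the values comparable across strains within one competition, but not necessarily across separately grown pools.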
We recovered on average 197,267 strains per environment with a minimum of two replicate
fitness estimates (Table S2). Fitness measures correlated well between different barcodes marking
the same strain within a growth pool (0.524 < r < 0.8 Spearman’s correlation, Fig. 1D and Fig. S3-4),
as did fitness measures of the same barcoded strain assayed in replicate glucose growth cultures
(Spearman’s correlation = 0.863, Fig. 1E).
Substantial phenotypic diversity was observed in every environment. The majority of this
variation was due to genetic factors: broad-sense heritabilities were on average 61% (52-76%
across environments), with 40% (19-53%) being additive and 21% (19-26%) being non-additive
(Fig. 1F, Fig. S5, and Table S3). Every environment contained many diploids with more extreme
fitness than either the BY/BY or 3S/3S parent (i.e., transgressive segregation). BY/3S and 3S/BY
segregants were more fit than the BY/BY or 3S/3S diploids in all environments but one (i.e.,
heterozygote advantage, Fig. 1G).
C.5.2 Mapping within interrelated families increases statistical power
Using quantile normalized fitness estimates from barcode sequencing, we mapped loci that
contribute to growth (Fig. S6). Due to our experimental design, diploids generated from the same
haploid parent (families) are more genetically related than diploids generated from different
parents (Fig. 1A). Such family structure causes false positives in genetic mapping. Here, we found
that most sites throughout the genome exceeded nominal significance thresholds when fixed
effects linear models were applied in a given environment (Fig. S7). To enable mapping despite
the family structure, we used mixed effects linear models, which are commonly employed in
genetic mapping studies involving populations in which individuals show nonrandom relatedness.
Specifically, we used Factored Spectrally Transformed Linear Mixed Models (FaST-LMM) (32,
33) to identify an average of 17 loci per environment (10 to 26).
Contrary to expectations that larger sample sizes should yield better statistical power, and
therefore, more detections, the numbers of loci identified here were comparable to studies that
were at least 60-fold smaller (2, 14, 23). There are multiple potential, non-mutually exclusive
explanations for this observation. For example, statistical controls for family structure may result
in a high rate of false negatives or the effects of loci may depend on genetic background. To bypass
these difficult to disentangle issues, each of which may impact statistical power, we performed
linkage mapping using an alternative strategy that did not require explicitly controlling for family
structure. Fixed effects linear models were conducted individually within each of 392 families of
diploids that descended from distinct MATa parents and consisted of ~600 individuals each.
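Because every member of a family shares one haploid parent, each marker takes only two genotype states within a family: homozygous for the shared parent's allele, or heterozygous. A within-family scan can therefore be sketched as a two-group comparison at each marker. The code below uses a Welch t statistic as a stand-in for the fixed effects linear model; the input layout and names are assumptions, and significance thresholds are not shown.

```python
from math import sqrt
from statistics import mean, variance

def family_scan(genotypes, fitness):
    """Test each marker within one family.

    genotypes: list of per-individual lists; entry m is 0 or 1, marking
               which allele the non-shared parent contributed at marker m
    fitness:   list of fitness values, one per individual
    Returns a list of (effect, t_statistic) tuples, one per marker.
    """
    n_markers = len(genotypes[0])
    results = []
    for m in range(n_markers):
        g0 = [f for g, f in zip(genotypes, fitness) if g[m] == 0]
        g1 = [f for g, f in zip(genotypes, fitness) if g[m] == 1]
        if len(g0) < 2 or len(g1) < 2:
            # Not enough individuals in one class to estimate a variance.
            results.append((float("nan"), float("nan")))
            continue
        effect = mean(g1) - mean(g0)
        se = sqrt(variance(g0) / len(g0) + variance(g1) / len(g1))
        results.append((effect, effect / se if se > 0 else float("nan")))
    return results

# Hypothetical family of eight diploids scored at two markers.
genotypes = [[0, 0], [0, 1], [1, 0], [1, 1], [0, 0], [1, 1], [0, 1], [1, 0]]
fitness = [1.0, 1.1, 2.0, 2.1, 0.9, 1.9, 1.0, 2.2]
results = family_scan(genotypes, fitness)
```

In this toy family, the first marker separates high and low fitness diploids and the second does not.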
The family-level scans yielded an average of 16.3 detections per family (Fig. 2A-B and
Fig. S8), which were largely reproducible across replicate cultures (Fig. 2C). Detections across the
families were then consolidated using 95% confidence intervals, resulting in an average of 58.3
distinct loci per environment (49 to 65), >2.5-fold more loci than detected by FaST-LMM (Fig.
2A, B, and D). These loci included ~85% of the loci detected using FaST-LMM, implying that the
results of family tests encompass those of conventional approaches using aggregate data. Loci
identified only in family tests had an average effect that was around one-third that of loci detected by
both FaST-LMM and family tests (0.08 vs. 0.25; Fig. 2E). This implies that despite smaller sample
sizes, mapping within families provided greater statistical power than mapping in the aggregate
data while controlling for relatedness.
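One simple way to consolidate family-level detections into distinct loci is to merge detections whose confidence intervals overlap on the same chromosome. The sketch below assumes that rule; the exact consolidation procedure is not specified in the text, and the coordinates are hypothetical.

```python
def consolidate(detections):
    """Merge detections whose intervals overlap on the same chromosome.

    detections: iterable of (chromosome, start, end) tuples
    Returns a sorted list of merged (chromosome, start, end) tuples.
    """
    merged = []
    for chrom, start, end in sorted(detections):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval to cover this detection.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged

# Hypothetical confidence intervals pooled from several families.
detections = [
    ("VI", 100, 200),
    ("VI", 150, 260),
    ("X", 50, 90),
    ("VI", 400, 450),
    ("X", 80, 120),
]
loci = consolidate(detections)
```

Here five family-level detections collapse into three distinct loci.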
Fig. 2 Identification of loci that affect fitness A-B. Loci mapped in CoCl2 (A) and CuSO4 (B).
Panels from top to bottom are 1) loci detected using the mixed effects linear model FaST-LMM
(red bars), 2) loci with dominance effects detected using a fixed effects linear model on the
non-additive portion of each diploid’s phenotype (green bars), 3) loci detected using family-tests
(black or blue points), and 4) the total number of detections across families for each 50 kb interval
(grey bars). C. Violin plot showing the % of loci that were detected in both glucose replicates for
each family. D. Barplot of the number of loci detected by family-tests (red) and FaST-LMM (blue).
E. Family-level effect size (3S/3S - Het or Het - BY/BY) of loci detected with FaST-LMM in
CoCl2 (left panel) and family-tests (right panel). Colors indicate whether an effect was detected
(black) or undetected (gray) in a family-level scan. F. Examples of loci with only additive effect
(or low dominance), incomplete dominance, complete dominance, overdominance, and
underdominance. Black lines are the mean fitness of diploids subsetted by the genotype state at
the focal locus. Gray lines are the standard errors. Green lines are the expected mean fitness of
heterozygotes assuming no dominance. Genotype state at each locus is denoted by colored boxes:
BY/BY (blue), 3S/3S (orange), and BY/3S (half blue, half orange). Dominance and additive effects
(blue and red bars, respectively) for each subset of the data are shown next to the relevant genotype
classes. The degree of dominance at a locus is included in parentheses. G. Violin plot showing the
degree of dominance for all loci detected in the dominance scan. Loci with positive values are
dominant towards the allele conferring higher fitness (green), while loci with negative values are
dominant towards the deleterious allele (red). All loci with degree of dominance >100% or
<-100% exhibit overdominance and underdominance, respectively.
C.5.3 Loci frequently show dominance effects
In diploids, non-additivity can arise due to dominance among alleles at the same locus,
epistasis between alleles at different loci, or a mixture of the two. To identify such non-additive
loci from the aggregate data, we extracted the non-additive portion of each diploid’s phenotype
(Fig. S9) (23). Using these values accounts for family structure and enables mapping of non-
additive loci in the full segregant panel with fixed effects linear models. Regarding dominance
effects, we identified an average of 18 loci showing dominance per environment (12 to 30). Only
45% of these loci were also identified by FaST-LMM, while 82% of these loci were detected in
the family-level scans. Among the loci with dominance effects, the average degree of dominance
was ~51% (i.e., heterozygotes’ fitnesses were roughly halfway between the average of the two
homozygotes and one of the homozygotes), with 82% of the loci showing incomplete dominance
(Fig. 2F-G). Only ~7% of the loci exhibited complete dominance, while overdominance (~8%)
and underdominance (~3%) were seen among the remaining loci with dominance effects. ~77%
of the loci showed dominance towards the allele conferring higher fitness (Fig. 2G), which may
explain why BY/3S and 3S/BY segregants were more fit than the BY/BY or 3S/3S diploids (Fig. 1G).
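The degree of dominance can be computed from the three genotype-class means at a locus. The sketch below assumes it is the heterozygote's deviation from the homozygote midpoint, expressed as a percentage of half the difference between the homozygotes, which matches the ~51% example given in the text. The function names and the 5-point tolerance used to call a locus additive or completely dominant are illustrative assumptions, not values from the study.

```python
def degree_of_dominance(fit_hom_1, fit_hom_2, fit_het):
    """Degree of dominance in percent.

    Positive values mean the heterozygote lies toward the fitter
    homozygote; negative values mean it lies toward the less fit one.
    """
    hi, lo = max(fit_hom_1, fit_hom_2), min(fit_hom_1, fit_hom_2)
    additive = (hi - lo) / 2           # half the homozygote difference
    if additive == 0:
        raise ValueError("homozygote means are equal; degree is undefined")
    dominance = fit_het - (hi + lo) / 2  # deviation from the midpoint
    return 100 * dominance / additive

def classify(degree, tol=5.0):
    """Name the dominance class for a degree of dominance in percent."""
    if degree > 100 + tol:
        return "overdominance"
    if degree < -100 - tol:
        return "underdominance"
    if abs(abs(degree) - 100) <= tol:
        return "complete dominance"
    if abs(degree) <= tol:
        return "additive"
    return "incomplete dominance"

# Hypothetical class means: homozygotes at 1.0 and 2.0, heterozygote at 1.75.
example_degree = degree_of_dominance(1.0, 2.0, 1.75)
example_class = classify(example_degree)
```

In practice the class boundaries would rest on standard errors of the class means rather than on a fixed tolerance.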
C.5.4 Epistatic hubs govern both additivity and non-additivity
We also used the non-additive portion of phenotype to perform comprehensive genome-
wide scans for genetic interactions. We identified an average of 440 two-locus interactions per
environment (377 to 538) (Fig. 3A and Fig. S10). Our large sample size had a pronounced impact
on detection: ~40-fold more interactions per environment were detected than in previous studies that
phenotyped smaller mapping populations using conventional approaches (14, 23). Our large
sample size also enabled comprehensive scans for three-locus interactions with a reduced set of
markers, identifying an average of 6,152 per environment (4,845 to 7,301) (Fig. 3A and Fig. S11).
Loci involved in three-locus interactions were identified across all chromosomes and distributed
widely throughout the genome.
We next analyzed the relationship between individual loci and their genetic interactions.
We found a strong positive relationship between the effect of a locus and its involvement in two-
and three-locus interactions (Fig. 3B). This suggests that loci with larger effects tend to genetically
interact with many loci or that their interactions are easier to detect. We also observed a clear linear
relationship between the number of two- and three-locus interactions of a given locus (Fig. 3C).
Notably, certain loci exhibited many more interactions than others, acting as ‘hubs’ (here defined
as loci with >20 two-locus interactions in at least one environment) (8). On average, ~4.5 hubs
were detected per environment, and the same hub was often detected in multiple environments. A
majority (>54%) of all two- and three-locus interactions involved at least one hub. Fine-mapping
localized the Chromosome VI, VIII, X, and XII hubs to genes involved in amino acid sensing
(PTR3), copper resistance (CUP1), vacuolar protein sorting (VPS70), and a gene of unknown
function (YLR257W), respectively.
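The hub definition used here (more than 20 two-locus interactions in at least one environment) can be applied directly to a list of interacting pairs. The sketch below is a minimal version under that definition; the data layout, locus names, and function names are hypothetical.

```python
from collections import Counter

def find_hubs(interactions_by_env, threshold=20):
    """Return loci with more than `threshold` two-locus interactions
    in at least one environment.

    interactions_by_env: dict mapping environment name to a list of
    (locus_a, locus_b) pairs.
    """
    hubs = set()
    for pairs in interactions_by_env.values():
        counts = Counter()
        for locus_a, locus_b in pairs:
            counts[locus_a] += 1
            counts[locus_b] += 1
        hubs.update(locus for locus, n in counts.items() if n > threshold)
    return hubs

def fraction_involving_hub(pairs, hubs):
    """Fraction of interactions in which at least one partner is a hub."""
    pairs = list(pairs)
    return sum(1 for a, b in pairs if a in hubs or b in hubs) / len(pairs)

# Hypothetical interaction lists for two environments.
toy = {
    "env_A": [("hub1", f"locus{i}") for i in range(21)] + [("locus0", "locus1")],
    "env_B": [("locus2", "locus3")],
}
hubs = find_hubs(toy)
share = fraction_involving_hub(toy["env_A"], hubs)
```

The same counting extends to three-locus interactions by iterating over triples instead of pairs.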
C.5.5 Relationships between epistasis and dominance in diploids
In haploids, epistasis can only influence the additive effect of a locus because there are no
heterozygotes. In diploids, however, epistasis can modify a locus’ additive effects, dominance
effects, or both additive and dominance effects (20–22). To better characterize how loci are
modified, each two-locus interaction was partitioned into additive and dominance components.
We found that changes in dominance account for ~44% of the average epistatic effect (Fig. 3D-E),
implying that interactions often affect both additivity and dominance. However, this fraction
varied depending on whether the modifying locus was a hub. When the modifier was a hub,
dominance accounted for little of the epistatic effect (11.9% on average), implying that hubs
mostly modify the additive component of the interacting loci. By comparison, when the modifier
was not a hub, epistasis was mostly composed of dominance (64% of interactions had a larger
dominance component). These data suggest that epistasis commonly involves modification of
dominance in diploids and that hubs act in a distinct manner from loci that are not hubs.
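To make the idea of partitioning an interaction concrete, the sketch below estimates the additive and dominance effects of a focal locus separately within each genotype class of a modifier, then asks how much of the total change across modifier classes falls on the dominance effect. This is one possible summary and is not the decomposition used in the study; the range-based measure, the table layout, and the names are all assumptions.

```python
def additive_and_dominance(means):
    """means: (hom_1, het, hom_2) mean fitness at the focal locus."""
    hom_1, het, hom_2 = means
    additive = (hom_2 - hom_1) / 2
    dominance = het - (hom_1 + hom_2) / 2
    return additive, dominance

def dominance_fraction(table):
    """Share of the modifier-induced change that falls on dominance.

    table: one row per modifier genotype class; each row holds the
    focal-locus means (hom_1, het, hom_2) within that class.
    """
    effects = [additive_and_dominance(row) for row in table]
    add_vals = [a for a, _ in effects]
    dom_vals = [d for _, d in effects]
    add_change = max(add_vals) - min(add_vals)
    dom_change = max(dom_vals) - min(dom_vals)
    total = add_change + dom_change
    return dom_change / total if total > 0 else float("nan")

# Hypothetical focal-locus means within three modifier genotype classes.
only_additive = [(1.0, 1.5, 2.0), (1.0, 2.0, 3.0), (1.0, 2.5, 4.0)]
only_dominance = [(1.0, 1.5, 2.0), (1.0, 1.75, 2.0), (1.0, 2.0, 2.0)]
mixed = [(1.0, 1.5, 2.0), (1.0, 2.0, 2.5), (1.0, 2.5, 3.0)]
fractions = [dominance_fraction(t) for t in (only_additive, only_dominance, mixed)]
```

A modifier that only rescales the homozygote difference gives a fraction of 0, one that only shifts the heterozygote gives 1, and a modifier that does both falls in between.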
Fig. 3 Interactions often affect both the additive and dominance effects of involved loci A.
Interaction plots of all two-locus (left) and three-locus (right) effects for two representative
environments. Significant interactions between loci are shown as connecting lines. Green bars are
the effect size of a locus, calculated as the absolute difference between the mean fitness of diploids
that are 3S/3S and BY/BY at the focal locus. Orange bars are the number of interactions detected
for each locus. B. Scatter plot of the absolute effect size of a locus and the number of two-locus
(left) and three-locus (right) interactions in which it is involved. Local regressions are shown as
blue lines. C. Scatter plot of the number of two-locus and three-locus interactions per locus. D.
Examples of genetic interactions with different fractions of epistasis involving dominance. Black
lines are the mean fitness of diploids subsetted by the genotype state at the two involved loci. Gray
lines are the standard errors. Green lines are the expected mean fitness of heterozygotes assuming
no dominance. Genotype state at each locus is denoted by colored boxes: BY/BY (blue), 3S/3S
(orange), and BY/3S (half blue, half orange). The first locus is the locus whose effect is being
modified, and the second locus is the modifier locus. Dominance and additive effects (blue and
red bars, respectively) for each subset of the data are shown next to the relevant genotype classes.
E. Density plot of the fraction of epistasis involving dominance for all interactions (red), hub–hub
(yellow), non-hub–hub (blue), hub–non-hub (green), and non-hub–non-hub interactions (purple).
We next examined how the additive and dominance effects of hubs were modified by
genetic interactions. In most cases, hubs genetically interacted with a small number of major effect
modifiers and many minor effect modifiers (Fig. 4A). The major effect modifiers typically
influenced only the additive or only the dominance effect of a hub, suggesting that distinct sets of
loci govern additive and dominance effect sizes (Fig. 4A). Whereas the most frequent major effect
modifiers of the additive effects of hubs were other hubs (Fig. 4B), the single most frequent major
effect modifier of the dominance effects of hubs was a locus on Chromosome III. Collectively,
multiple modifier loci could cause a hub locus to show a broad range of effect sizes across different
genetic backgrounds (Fig. 4C).
Fig. 4 Multiple modifier loci cause hubs to exhibit a range of effect sizes across different genetic
backgrounds A. Specific examples of hubs on chromosome VI, X, and XII (each row) and their
additive (left) and dominance (right) modifiers. The height of the bar corresponds to the magnitude
of the modifying effect. The dotted red line shows the threshold above which loci were considered
major effect modifiers. B. Barplot showing the total number of times loci were detected as a major
effect modifier of additive (left) or dominance (right) effects of hubs across environments. Colored
dots indicate hub loci. An asterisk indicates a non-hub locus on ChrIII. C. Additive (left) or
dominance (right) effect size of chromosome VI hub across different allelic combinations of its 4
largest effect modifiers. Red points are the effect size of a genotype class based on the genotype
state of the 4 modifiers. Black point is the overall effect size of the locus. Black lines are
bootstrapped 95% confidence intervals.
C.5.6 Characteristics of the Chromosome III dominance modifier
Although not a hub, the Chromosome III locus nevertheless had a prominent impact on
phenotype by modifying the dominance effects of multiple variable effect loci. Interactions with
the Chromosome III locus had greater impacts on dominance than additivity at all focal loci. For
example, in hydrogen peroxide, dominance at the Chromosome X variable effect locus depended
on the Chromosome III locus, ranging from complete to nearly absent in a genotype-dependent
manner (Fig. 5A). We delimited the Chromosome III locus to a 3 kb region containing the mating
locus and a few other genes (BUD5, TAF2, and YCR041W).
Fig. 5 Parent-of-origin of the mating locus influences dominance at hubs A. Violin plots of the
fitness distribution of diploids split by the genotype at the Chromosome X locus (top), further split
by the genotype at the mating locus on Chromosome III (middle), and the parent-of-origin at the
mating locus (bottom). Genotype state at each locus is denoted by colored boxes: BY/BY (blue),
3S/3S (orange), and BY/3S (half blue, half orange). Lines are the observed mean fitness in the
homozygous genotype classes (gray), the observed mean fitness in heterozygous genotype classes
(red), and the expected heterozygous fitness if there was no dominance (blue). B. Violin plots of
the fitness distribution of diploids split by the genotype state of a hub locus and the parent-of-
origin of the mating locus.
Yeast mating types possess different nonhomologous gene cassettes at the mating locus,
which encode distinct transcription factors that are master regulators of the MATa, MATα, and
diploid transcriptional programs (34). This region of the genome is unique because four genotype
classes segregate (BY MATa, 3S MATa, 3S MATα, and BY MATα), and as a result, the two
heterozygotes are not identical (Fig. S12). To test if the mating locus is the dominance modifier,
we partitioned Chromosome III heterozygotes based on their parents-of-origin for the MATa and
MATα cassettes and found a difference: dominance was only visible in the 3S MATa / BY MATα
genotype class (Fig. 5A). Other hub loci modified by Chromosome III showed the same
relationship between dominance and the parent-of-origin of the mating loci (Fig. 5B). These results
suggest that BY and 3S harbor functional differences in one or both mating cassettes.
C.4 Discussion
We used a double barcoding system to generate and phenotype an extremely large panel of
diploid yeast segregants that can be partitioned into hundreds of interrelated families. This
experimental design enabled the detection of thousands of loci, including at least an order of
magnitude more genetic interactions than discovered in previous yeast crosses. Analysis of these
epistatic loci identified a modest number of hubs that have large effects, show pervasive epistasis,
and control most phenotypic variation across environments, as well as many other loci that
genetically interact with these hubs.
Genetic background commonly modified the magnitude of, or completely masked, the
effects of the hubs, indicating that the largest effect loci identified in mapping studies are highly
sensitive to genetic background. Such non-additive genetic background effects are likely to hamper
efforts to predict phenotype from genotype by limiting the extrapolation of effect estimates from
one genetic context to others. However, our finding that large effect loci were most impacted by
other major effect loci does provide some optimism that characterizing a limited set of interactions
may account for a substantial portion of these genetic background effects.
Because our experiments were performed in outbred diploids rather than haploids or inbred
diploids, we could detect dominance effects and whether dominance is modified by epistasis. We
showed that dominance effects are common and that the magnitude of dominance can strongly
depend on the alleles of interacting loci. The potential existence of dominance modifiers has been
discussed in theory, but to date, only a single dominance modifier has been found in a plant self-
incompatibility locus (35, 36). Our results show that dominance modifiers are prevalent and raise
the intriguing possibility that sites with atypical allele dynamics within natural populations, the
yeast mating locus here and a self-incompatibility locus in plants, are more likely to harbor
dominance modifiers with major effects.
Generally, we found that heritable traits in yeast are more genetically complex than
formerly appreciated. Relative to the cross that we examined, natural populations may harbor
substantially higher genetic diversity, meaning traits could be even more complex and difficult to
dissect. Our work supports the premise that, to the extent possible, focusing on groups of more
closely related individuals, such as the families studied here, can enhance statistical power and
precision relative to populations with greater diversity (23, 24, 37). The genetic insights gained
from these more closely related groups can then be leveraged to inform the genetic architecture of
traits in more diverse populations in which many critical genetic effects may otherwise be
obscured.
C.7 References
1. T. F. C. Mackay, E. A. Stone, J. F. Ayroles, The genetics of quantitative traits: challenges
and prospects. Nat. Rev. Genet. 10, 565–577 (2009).
2. J. S. Bloom, I. M. Ehrenreich, W. T. Loo, T.-L. V. Lite, L. Kruglyak, Finding the sources
of missing heritability in a yeast cross. Nature. 494, 234–237 (2013).
3. E. A. Boyle, Y. I. Li, J. K. Pritchard, An Expanded View of Complex Traits: From
Polygenic to Omnigenic. Cell. 169, 1177–1186 (2017).
4. H. Shao, L. C. Burrage, D. S. Sinasac, A. E. Hill, S. R. Ernest, W. O’Brien, H.-W.
Courtland, K. J. Jepsen, A. Kirby, E. J. Kulbokas, M. J. Daly, K. W. Broman, E. S. Lander,
J. H. Nadeau, Genetic architecture of complex traits: Large phenotypic effects and
pervasive epistasis. Proc. Natl. Acad. Sci. 105, 19910–19914 (2008).
5. T. F. C. Mackay, Epistasis and quantitative traits: using model organisms to study gene–
gene interactions. Nat. Rev. Genet. 15, 22–33 (2014).
6. W. Huang, T. F. C. Mackay, The Genetic Architecture of Quantitative Traits Cannot Be
Inferred from Variance Component Analysis. PLOS Genet. 12, e1006421 (2016).
7. M. B. Taylor, I. M. Ehrenreich, Higher-order genetic interactions and their contribution to
complex traits. Trends Genet. 31, 34–40 (2015).
8. S. K. G. Forsberg, J. S. Bloom, M. J. Sadhu, L. Kruglyak, Ö. Carlborg, Accounting for
genetic interactions improves modeling of individual quantitative trait phenotypes in yeast.
Nat. Genet. 49, 497–503 (2017).
9. H. C. Rowe, B. G. Hansen, B. A. Halkier, D. J. Kliebenstein, Biochemical Networks and
Epistasis Shape the Arabidopsis thaliana Metabolome. Plant Cell. 20, 1199–1216 (2008).
10. W.-H. Wei, G. Hemani, C. S. Haley, Detecting epistasis in human complex traits. Nat. Rev.
Genet. 15, 722–733 (2014).
11. I. M. Ehrenreich, Epistasis: Searching for Interacting Genetic Variants Using Crosses.
Genetics. 206, 531–535 (2017).
12. D. S. Falconer, T. F. C. Mackay, Introduction to Quantitative Genetics (Longmans Green,
Harlow, Essex, UK, ed. 4, 1996).
13. M. Lynch, B. Walsh, Genetics and Analysis of Quantitative Traits (Sinauer Associates,
Sunderland, MA, 1998).
14. J. S. Bloom, I. Kotenko, M. J. Sadhu, S. Treusch, F. W. Albert, L. Kruglyak, Genetic
interactions contribute less than additive effects to quantitative trait variation in yeast. Nat.
Commun. 6, 8712 (2015).
15. M. B. Taylor, I. M. Ehrenreich, Genetic Interactions Involving Five or More Genes
Contribute to a Complex Trait in Yeast. PLoS Genet. 10, e1004324 (2014).
16. W. Huang, S. Richards, M. A. Carbone, D. Zhu, R. R. H. Anholt, J. F. Ayroles, L. Duncan,
K. W. Jordan, F. Lawrence, M. M. Magwire, C. B. Warner, K. Blankenburg, Y. Han, M.
Javaid, J. Jayaseelan, S. N. Jhangiani, D. Muzny, F. Ongeri, L. Perales, Y.-Q. Wu, Y.
Zhang, X. Zou, E. A. Stone, R. A. Gibbs, T. F. C. Mackay, Epistasis dominates the genetic
architecture of Drosophila quantitative traits. Proc. Natl. Acad. Sci. 109, 15553–15559
(2012).
17. The 1000 Genomes Project Consortium, A map of human genome variation from
population-scale sequencing. Nature. 467, 1061–1073 (2010).
18. P. M. Magwene, Ö. Kayıkçı, J. A. Granek, J. M. Reininga, Z. Scholl, D. Murray,
Outcrossing, mitotic recombination, and life-history trade-offs shape genome evolution in
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 108, 1987–1992 (2011).
19. G. A. Churchill, D. M. Gatti, S. C. Munger, K. L. Svenson, The diversity outbred mouse
population. Mamm. Genome. 23, 713–718 (2012).
20. J. M. Cheverud, E. J. Routman, Epistasis and its contribution to genetic variance
components. Genetics. 139, 1455–1461 (1995).
21. J. M. Cheverud, E. J. Routman, Epistasis as a source of increased additive genetic variance
at population bottlenecks. Evolution. 50, 1042–1051 (1996).
22. R. F. Campbell, P. T. McGrath, A. B. Paaby, Analysis of Epistasis in Natural Traits Using
Model Organisms. Trends Genet. 34, 883–898 (2018).
23. J. Hallin, K. Märtens, A. I. Young, M. Zackrisson, F. Salinas, L. Parts, J. Warringer, G.
Liti, Powerful decomposition of complex traits in a diploid model. Nat. Commun. 7, 13311
(2016).
24. K. Märtens, J. Hallin, J. Warringer, G. Liti, L. Parts, Predicting quantitative traits from
genome and phenome with near perfect accuracy. Nat. Commun. 7, 11512 (2016).
25. M. B. Taylor, J. Phan, J. T. Lee, M. McCadden, I. M. Ehrenreich, Diverse genetic
architectures lead to the same cryptic phenotype in a yeast cross. Nat. Commun. 7, 11669
(2016).
26. M. N. Mullis, T. Matsui, R. Schell, R. Foree, I. M. Ehrenreich, The complex underpinnings
of genetic background effects. Nat. Commun. 9, 3548 (2018).
27. S. F. Levy, J. R. Blundell, S. Venkataram, D. A. Petrov, D. S. Fisher, G. Sherlock,
Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature. 519,
181–186 (2015).
28. X. Liu, Z. Liu, A. K. Dziulko, F. Li, D. Miller, R. D. Morabito, D. Francois, S. F. Levy,
iSeq 2.0: A Modular and Interchangeable Toolkit for Interaction Screening in Yeast. Cell
Syst. 8, 338-344.e8 (2019).
29. U. Schlecht, Z. Liu, J. R. Blundell, R. P. St.Onge, S. F. Levy, A scalable double-barcode
sequencing platform for characterization of dynamic protein-protein interactions. Nat.
Commun. 8, 15586 (2017).
30. L. Zhao, Z. Liu, S. F. Levy, S. Wu, Bartender: a fast and accurate clustering algorithm to
count barcode reads. Bioinformatics. 34, 739–747 (2018).
31. F. Li, M. L. Salit, S. F. Levy, Unbiased Fitness Estimation of Pooled Barcode or Amplicon
Sequencing Studies. Cell Syst. 7, 521-525.e4 (2018).
32. C. Lippert, J. Listgarten, Y. Liu, C. M. Kadie, R. I. Davidson, D. Heckerman, FaST linear
mixed models for genome-wide association studies. Nat. Methods. 8, 833–835 (2011).
33. C. Widmer, C. Lippert, O. Weissbrod, N. Fusi, C. Kadie, R. Davidson, J. Listgarten, D.
Heckerman, Further Improvements to Linear Mixed Models for Genome-Wide
Association Studies. Sci. Rep. 4, 6874 (2014).
34. J. E. Haber, Mating-Type Genes and MAT Switching in Saccharomyces cerevisiae.
Genetics. 191, 33–64 (2012).
35. Y. Tarutani, H. Shiba, M. Iwano, T. Kakizaki, G. Suzuki, M. Watanabe, A. Isogai, S.
Takayama, Trans-acting small RNA determines dominance relationships in Brassica
self-incompatibility. Nature. 466, 983–986 (2010).
36. S. Billiard, V. Castric, Evidence for Fisher’s dominance theory: how many ‘special cases’?
Trends Genet. 27, 441–445 (2011).
37. A. I. Young, S. Benonisdottir, M. Przeworski, A. Kong, Deconstructing the sources of
genotype-phenotype associations in humans. Science. 365, 1396–1400 (2019).
Abstract
Connecting genotypes to phenotypes is often challenging. Many traits are genetically complex and involve many loci. Exactly how loci collectively produce phenotypes is poorly understood at the molecular level. Furthermore, natural populations can harbor substantial amounts of genetic variation making it difficult to detect and identify causal variation underlying specific traits. The way that genotypes specify phenotypes is dynamic, and environmental or genetic perturbations can alter the relationship between genotype and phenotype.
In this dissertation, I use genetic perturbations to examine the genotype-phenotype relationship. In chapter 2, I show that gene knockouts modify how pairs and trios of genetic variants contribute to growth. In chapter 3, I determine the genetic variation underlying variable expressivity of a spontaneous mutation which produces phenotypes that range from lethal to benign. In chapter 4, I show that a phenotypically silent gene deletion has wide-spread biochemical effects on expression and chromatin accessibility. Together, these studies shed light on the genetic and molecular mechanisms connecting genotypes to phenotype.
Asset Metadata
Creator
Schell, Rachel (author)
Core Title
Genetic and molecular insights into the genotype-phenotype relationship
School
College of Letters, Arts and Sciences
Degree
Doctor of Philosophy
Degree Program
Molecular Biology
Degree Conferral Date
2022-05
Publication Date
04/18/2022
Defense Date
09/07/2021
Publisher
University of Southern California (original), University of Southern California. Libraries (digital)
Tag
epistasis, genotype, OAI-PMH Harvest, phenotype, quantitative genetics
Format
application/pdf (imt)
Language
English
Contributor
Electronically uploaded by the author (provenance)
Advisor
Ehrenreich, Ian (committee chair), Arnheim, Norman (committee member), Dean, Matthew (committee member), Phillips, Carolyn (committee member), Pratt, Matthew (committee member)
Creator Email
rachelschellscience@gmail.com,rsschell@usc.edu
Permanent Link (DOI)
https://doi.org/10.25549/usctheses-oUC111004384
Unique identifier
UC111004384
Document Type
Dissertation
Rights
Schell, Rachel
Type
texts
Source
20220418-usctheses-batch-928 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions
The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name
University of Southern California Digital Library
Repository Location
USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email
cisadmin@lib.usc.edu