Exploring the Genetic Basis of Quantitative Traits
By
Martin N. Mullis
A dissertation presented to the
FACULTY OF THE USC GRADUATE SCHOOL
UNIVERSITY OF SOUTHERN CALIFORNIA
In partial fulfillment of the
Requirements for the degree
DOCTOR OF PHILOSOPHY
(MOLECULAR BIOLOGY)
May 2022
Copyright 2022 Martin N. Mullis
Acknowledgements
I would like to thank my advisor, Ian Ehrenreich, as well as my lab mates and collaborators (past
and present) for their invaluable input and assistance with the work presented in this dissertation.
I would also like to thank my committee members: James Boedicker, Matt Dean, Steven Finkel,
and Sergey Nuzhdin.
To my family and friends: without you, I wouldn’t be here. And finally, thank you to my partner
Jenn, who has supported me through this journey. I will try my best to do the same as you pursue
your own research.
Table of Contents
Acknowledgements
List of Tables
List of Figures
Abstract
Chapter 1: Introduction
1.1 CONNECTING GENOTYPE TO PHENOTYPE
1.2 EPISTASIS MODIFIES THE EFFECTS OF VARIANTS ACROSS GENETIC BACKGROUND
1.3 DOMINANCE AND THE ARCHITECTURE OF COMPLEX TRAITS IN DIPLOIDS
1.4 PLEIOTROPY
1.5 GOALS OF THIS DISSERTATION
1.6 CHAPTER SUMMARY
1.7 REFERENCES
Chapter 2: The complex underpinnings of genetic background effects
2.1 AUTHOR CONTRIBUTION TO THIS WORK
2.2 ABSTRACT
2.3 INTRODUCTION
2.4 RESULTS
2.3.1. Preliminary screen
2.3.2. Mapping of mutation-independent and mutation-responsive effects
2.3.3. Higher-order epistasis influences response to mutations
2.3.4. The environment plays a strong role in background effects
2.3.5. Interactions between segregating loci and knockouts
2.3.6. Genetic basis of induced changes in phenotypic variance
2.5 DISCUSSION
2.6 METHODS
2.5.1. Generation of different BYx3S knockout backgrounds
2.5.2. Genotyping of segregants
2.5.3. Phenotyping of segregants
2.5.4. Scans for one-locus effects
2.5.5. Scans for two-locus effects
2.5.6. Scans for three-locus effects
2.5.7. Assignment of mutation-responsive effects to knockouts
2.5.8. Statistical power analysis
2.5.9. Contributions of loci involved in higher-order epistasis
2.5.10. Analysis of mutation-responsive effects across environments
2.5.11. Genetic explanation of changes in phenotypic variance
2.5.12. Checking potential consequences of allele frequency bias
2.7 SUPPLEMENTARY MATERIALS
Chapter 3: The interplay of additivity, dominance, and epistasis in a diploid yeast cross
3.1. ABSTRACT
3.2. INTRODUCTION
3.3. RESULTS
3.3.1. Phenotyping of a large diploid cross by barcode sequencing
3.3.2. Mapping within interrelated families increases statistical power
3.3.3. Loci frequently show dominance effects
3.3.4. Epistatic hubs govern both additivity and non-additivity
3.3.5. Relationships between additivity and dominance in diploids
3.3.6. Characteristics of the ChrIII dominance modifier
3.4. DISCUSSION
3.5. METHODS
3.5.1. Generation of haploid segregants
3.5.2. Barcoding of haploid segregants
3.5.3. Whole genome sequencing of parental haploid segregants
3.5.4. Determining barcode sequences
3.5.5. Generation of diploid segregants
3.5.6. Translocating barcodes onto the same chromosome
3.5.7. Growth Assays
3.5.8. Library preparation
3.5.9. Barcode sequencing
3.5.10. Counting double barcode reads
3.5.11. Fitness estimation
3.5.12. Heritability estimates
3.5.13. Quantile normalization of fitness
3.5.14. Genome-wide scan for one-locus effects
3.5.15. Family-level scans for one-locus effects
3.5.16. Effect size of loci across different families
3.5.17. Correcting for family structure
3.5.18. Genome-wide scans for loci with dominance
3.5.19. Calculating degree of dominance
3.5.20. Comprehensive scan for pairwise interactions
3.5.21. Scan for three-locus interactions
3.5.22. Resolving hub loci to single-gene resolution using recombination breakpoints
3.5.23. Estimation of the fraction of epistatic effect involving dominance
3.5.24. Examination of how combinations of modifiers explain variance in hub effect size
3.6. SUPPLEMENTARY MATERIALS
Chapter 4: Widespread antagonistic pleiotropy causes differential fungal persistence across mammalian organs
4.1. ABSTRACT
4.2. INTRODUCTION
4.3. RESULTS
4.3.1. Persistence of barcoded yeast strains inside a mammalian host
4.3.2. Antagonistic pleiotropy constrains localization of strains within the host
4.4. DISCUSSION
4.5. METHODS
4.5.1. Generation of haploid segregants
4.5.2. Barcoding of haploid segregants
4.5.3. Whole genome sequencing of haploid segregants
4.5.4. Determination of segregant barcodes
4.5.5. Animal Husbandry
4.5.6. Experimental infection of mice
4.5.7. Organ harvesting and processing
4.5.8. YPD plate growth controls
4.5.9. Barcode library preparation
4.5.10. Barcode sequencing
4.5.11. Barcode quantification and phenotype calculation
4.5.12. Identification of samples with significant variation among segregants
4.5.13. Calculation of broad-sense heritability
4.5.14. Linkage mapping in individual samples
4.5.15. Consolidation of loci detected in individual samples
4.5.16. Aggregation of persistence measurements across samples
4.5.17. Linkage mapping with aggregate phenotype data
4.5.18. Calculating the effects of loci in individual samples
4.5.19. The effects of individual loci identified in aggregate phenotype data
4.5.20. Combinatorial effects of loci identified in aggregate data
4.5.21. Genes underlying loci detected in aggregate scans
4.6. SUPPLEMENTARY MATERIALS
Chapter 5: Concluding remarks
5.1. IMPACT OF MY WORK
5.1.1. Characterization of genetic background effects
5.1.2. Examining the architecture of complex traits in a diploid model
5.1.3. The genetics of a host-pathogen interaction
5.2. FUTURE DIRECTIONS
5.2.1. Chapter 2
5.2.2. Chapter 3
5.2.3. Chapter 4
References
Appendix A: Genetic basis of a spontaneous mutation’s expressivity
A.1. SUMMARY
A.2. INTRODUCTION
A.3. RESULTS
A.3.1. A spontaneous mutation increases phenotypic variation in the BYx3S cross
A.3.2. A large effect locus shows epistasis with mrp20-105E
A.3.3. Epistasis between MRP20 and MKT1 differs in cross parents and segregants
A.3.4. Fixation of mrp20-105E and MKT1 genotypes increases phenotypic variance
A.3.5. Many additional loci affect the expressivity of mrp20-105E
A.3.6. The chromosome XIV locus contains multiple causal variants
A.3.7. Aneuploidy also contributes to the expressivity of mrp20-105E
A.3.8. Multiple mechanisms underlie poor growth in the presence of mrp20-105E
A.3.9. Genetic underpinnings of mrp20-105E’s expressivity in segregants and parents
A.4. DISCUSSION
A.5. MATERIALS AND METHODS
A.5.1. Generation of segregants
A.5.2. Genotyping
A.5.3. Phenotyping
A.5.4. Linkage mapping
A.5.5. Classification of inviable segregants
A.5.6. Delimiting loci with recombination breakpoints
A.5.7. Reciprocal hemizygosity experiments
A.5.8. Construction of nucleotide replacement strains
A.5.9. Mitochondrial genome instability experiments
A.5.10. Modeling growth and examining the model in additional segregant populations
A.5.11. Relationship between detrimental alleles, growth, and inviability
A.6. SUPPLEMENTARY MATERIALS
A.7. REFERENCES
List of Tables
Table S2.1. List of 47 genes that were screened for background effects.
Table S2.2. Screen summary statistics.
Table S2.3. Mapping population breakdown.
Table S2.4. Genomic regions with allele frequency bias.
Table S2.5. Fixed regions in BY/3S hemizygous diploids.
Table S2.6. Phenotyping environments.
Table S2.7. Number of mutation-independent and mutation-responsive genetic effects across different significance thresholds.
Table S2.8. Chi-squared test results for mutation-responsive effects.
Table S2.9. Percentage of mutation-responsive genetic effects that showed phenotypic effect in any environment outside the one in which they were originally detected.
Table S2.10. Number of mutation-responsive effects that show biased individual and multi-locus allele frequencies.
Table S2.11. Phenotypic variance and the number of identified genetic effects.
Table S2.12. List of all primers used in this study.
Table S3.1 List of chemical additives and their concentrations used in the competition experiments.
Table S3.2 Number of replicate diploid strains in each fitness assay.
Table S3.3 Broad and narrow-sense heritability estimates for each fitness assay.
Table S3.4 The number of unique genotypes and the number of barcode replicates per genotype detected before filtering for quality.
Table S3.5 Estimated rate of PCR chimera.
Table S4.1 Results from principal component analysis.
Table S4.2 Loci detected across individual organ samples.
Table S4.3 Loci detected in aggregate scans.
Supplementary Table S4.4 Candidate genes within loci detected in aggregate scans.
List of Figures
Figure 2.1 Examples of mutation-responsive genetic effects.
Figure 2.2 Most mutation-responsive genetic effects involve multiple loci.
Figure 2.3 Higher-order epistasis among knockouts and multiple loci is an important contributor to background effects.
Figure 2.4 Analysis of mutation-responsive effects across environments.
Figure 2.5 Analysis of mutation-responsive effects across knockout backgrounds.
Figure 2.6 Mutation-responsive effects underlie differences in phenotypic variance between knockout and wild-type backgrounds across environments.
Figure S2.1. Generation of BYx3S knockout segregants.
Figure S2.2. Certain genes exhibit significant background effects when perturbed.
Figure S2.3. Allele frequency plot.
Figure S2.4. Growth of all 1,411 segregants across the 10 environments.
Figure S2.5. Individual and joint contributions of loci to background effects across different significance thresholds.
Figure S2.6 Analysis of how mutation-responsive effects interact with different knockouts at multiple significance thresholds.
Figure S2.7 Statistical power analysis for one, two, and three-locus interactions.
Figure S2.8 Extent to which mutation-responsive effects interact with different knockouts.
Figure S2.9. Individual and joint contributions of loci to background effects across different significance thresholds.
Figure S2.10 All seven knockout populations show nominally significant correlations between changes in phenotypic variance and detected mutation-responsive effects.
Figure S2.11. Most mutation-responsive effects show small differences in phenotypic variance explained (‘PVE’) in mutants relative to wild type segregants.
Figure 3.1 Generating a large panel of diploid segregants with known genotypes that can be phenotyped as a pool.
Figure 3.2 Identification of loci that affect fitness.
Figure 3.3 Interactions often affect both the additive and dominance effects of involved loci.
Figure 3.4 Multiple modifier loci cause hubs to exhibit a range of effect sizes across different genetic backgrounds.
Figure 3.5 Parent-of-origin of the mating locus influences dominance at hubs.
Figure S3.1 Workflow of how haploid and diploid segregants were barcoded.
Figure S3.2 Genome-wide allele frequencies of haploid segregants and the inferred genome-wide allele frequencies of diploid segregants.
Figure S3.3 Correlation of fitness estimates between strain replicates in each experiment.
Figure S3.4 Standard error as a function of mean strain fitness.
Figure S3.5 Correlation of fitnesses across environments.
Figure S3.6 Quantile normalization of fitnesses.
Figure S3.7 Most loci are detected as significant without family correction.
Figure S3.8 Loci with individual effects on fitness in each environment.
Figure S3.9 Accounting for differences in mean parental fitness corrects for family-level fitness effects in the population.
Figure S3.10 Pairwise genetic interactions detected in each environment.
Figure S3.11 Three-way genetic interactions detected in each environment.
Figure S3.12 Genetic variation at the mating locus results in distinct heterozygote classes.
Figure 4.1. Experimental infections of mice using a pool of barcoded, haploid segregants.
Figure 4.2. Organ type is the main driver of variation in persistence across samples.
Figure 4.3. Identification of loci associated with persistence in the host in organ samples.
Figure 4.4 Identified loci show a mixture of general effects and antagonistic pleiotropy.
Figure 4.5. General and antagonistically pleiotropic loci collectively influence strain persistence in the host.
Figure S4.1: Barcoding of haploid segregants.
Figure S4.2: Reproducibility of controls across plating density and organ homogenization.
Figure S4.3: Controlling for on-plate growth.
Figure S4.4: Outlier brain, gonad, and liver samples.
Figure S4.5: Hierarchical clustering and principal components analysis of organ samples.
Figure S4.6: Samples show a positive relationship between broad-sense heritability and the number of detected loci.
Figure S4.7: Loci detected in both control and organ samples show different effects in these two contexts.
Figure A.1. The mrp20-105E mutation occurred spontaneously, increasing phenotypic variance in the BYx3S cross.
Figure A.2. Epistasis between MRP20 and MKT1 appears to mostly explain response to mrp20-105E.
Figure A.3. Additional loci govern response to the mutation.
Figure A.5. Detected loci quantitatively and qualitatively explain mutant phenotypes.
Supplementary Text SA.1.
Supplementary Text SA.2.
Figure SA.1. Identification of mrp20-105E.
Figure SA.2. Representative MRP20 and mrp20-105E segregants on ethanol.
Figure SA.3. Linkage mapping in the F3 panel more finely resolves the Chromosome XIV locus.
Figure SA.4. Growth effects of loci detected in BY x 3S mrp20-105E crosses.
Figure SA.5. Loci affecting expressivity of mrp20-105E show minimal effects in MRP20 segregants.
Abstract
Understanding how genetic variants, both individually and collectively, contribute to the
heritable phenotypic variation of organisms is a central goal of modern genetics. While linkage
mapping and association studies continue to identify thousands of genetic variants involved in
complex traits across organisms, these variants often explain only a small proportion of the
heritable variation in a phenotype. This is due to several factors, including but not limited to: the complexity of many
biological traits, which can involve many genetic loci; genetic interactions, which can cause loci
to exhibit conditional or nonlinear effects; and technical limitations, which prevent the mapping
of relevant traits or limit our ability to appropriately model a trait of interest.
The aim of this dissertation is to improve our collective understanding of the relationship
between genotype and complex, polygenic phenotypes. To accomplish this, I have
performed a comprehensive dissection of the genetic basis of several quantitative traits in the
budding yeast, Saccharomyces cerevisiae. In chapter two, I demonstrate how standing genetic
variants interact both with each other and with genetic perturbations to influence cell growth. In
chapter three, I use a diploid model to characterize the role of dominance in pairwise genetic
interactions and show how certain loci exert a large influence on phenotype through genome-wide
interactions. In chapter four, I employ genomic barcoding to map the genetic basis of fungal
pathogenicity in a mammalian model. Taken together, the research in this dissertation sheds light on
the genetic basis of complex traits, with a specific emphasis on epistasis and host-pathogen
interaction.
Chapter 1: Introduction
1.1 Connecting genotype to phenotype
Biological complexity often makes linking genotype to phenotype a difficult task. To date,
genome-wide linkage mapping and association studies have identified thousands of genetic loci
influencing complex phenotypes like disease susceptibility (1, 2), human height (3), and crop yield
(4). However, these loci often explain only a small fraction of the observed heritable phenotypic
variation, a discrepancy referred to as the “missing heritability problem” (5, 6). Missing heritability
is a challenge facing many current genetic studies, as it hinders the ability to predict phenotype
from an organism’s genotype. In order to better understand how genotype specifies phenotype, a
more detailed characterization of the genetic architecture behind complex traits is required.
1.2 Epistasis modifies the effects of variants across genetic background
One factor that may complicate efforts to link genotype to phenotype is epistasis, or genetic
interaction (7, 8). Epistasis occurs when two or more genetic variants influence one another’s
effects, causing each to exhibit different phenotypic effects across genetic backgrounds. Epistasis
among genetic variants is pervasive (9); however, there are different perspectives on the extent to
which genetic background effects must be accounted for in genetic studies (8, 10). Historically,
the perspective that additivity is the predominant contributor to quantitative traits, as well as the
fact that detecting epistasis is more challenging than detecting effects involving single variants,
have resulted in quantitative genetics studies favoring additive genetic models. Despite extensive
nonlinearity in biological systems (11), this framework has been supported by reports that many
complex traits in laboratory animals and humans are predominantly additive (10), and additive
genomic prediction models in agricultural breeding programs have been largely successful (12–14).
Despite these findings, there is increasing evidence that epistasis plays a key role in
specifying a wide range of medically, agriculturally, and evolutionarily important traits. Studies
of the heat shock protein HSP90, among other genes, have demonstrated that the consequences of
gene perturbation differ across genetic backgrounds (15–17). In humans, Mendelian disorders vary
in their expressivity and penetrance, suggesting that deleterious alleles can be suppressed in certain
individuals (18). Finally, knockout studies in model systems have also shown that mutations in
certain genes result in inviability in some individuals or strains but have minuscule effects in
others (19).
There are now numerous studies demonstrating that interactions between standing genetic
variants and genetic perturbations or mutations can lead to unexpected phenotypes (20–22),
revealing that epistasis is a major contributor to phenotype in certain contexts. Theoretical work
has also shown that accounting for epistasis can improve phenotypic predictions, particularly for
extreme phenotypes (23), and better explain the heritable component of polygenic traits (11). These
results underscore the importance of accounting for genetic interactions when studying complex
traits, but few studies have systematically identified the genetic loci involved in epistasis across a
range of contexts. Thus, to deepen the understanding of how many genes collectively specify
phenotype, a thorough characterization of how the effects of variants change across genetic
background is necessary.
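To make the definition of epistasis used in this section concrete, the following minimal sketch computes the effect of one variant in each of two genetic backgrounds and reports how much that effect changes. The genotype labels and phenotype means are invented for illustration and are not data from this dissertation; the function names are likewise hypothetical.

```python
# Hypothetical mean phenotypes (e.g. growth) for the four haploid genotypes at
# two biallelic loci. These numbers are invented purely for illustration.
MEANS = {
    ("a", "b"): 1.0,
    ("A", "b"): 1.2,
    ("a", "B"): 1.1,
    ("A", "B"): 2.0,
}


def effect_of_first_locus(means, background):
    """Effect of swapping a -> A while the second locus is held at `background`."""
    return means[("A", background)] - means[("a", background)]


def epistasis(means):
    """Difference in the first locus's effect between the B and b backgrounds.

    A value of zero means the two loci combine additively.
    """
    return effect_of_first_locus(means, "B") - effect_of_first_locus(means, "b")


if __name__ == "__main__":
    for background in ("b", "B"):
        effect = effect_of_first_locus(MEANS, background)
        print(f"effect of A in {background} background: {effect:.2f}")
    print(f"epistatic deviation: {epistasis(MEANS):.2f}")
```

In this toy case the A allele has a small effect in the b background and a much larger one in the B background, so a purely additive model fitted to these four means would mispredict at least one genotype.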
1.3 Dominance and the architecture of complex traits in diploids
While many organisms relevant to humans are diploid, studies of genetic background
effects often focus on haploid systems or recombinant inbred lines, as heterozygosity decreases
the statistical power to detect genetic effects (24). In such systems, additive genetic effects are all
that can be detected, so dominance is overlooked. However, genetic interactions can modify the
additive component, the dominance component, or both components of a locus’s effect (24, 25).
This means that in addition to dominance, a set of genetic interactions underlying background
effects is undetectable in these contexts.
While dominance can prove a technical challenge to mapping studies, it appears to play an
important role in specifying phenotype in a range of important species, including humans (26, 27).
Dominance and dominance-related epistasis can be major contributors to heterosis observed in
experimental systems and in agriculture (28–30). Additionally, dosage effects in gene products due
to partially or completely dominant effects and their interaction(s) with other loci have been
associated with disease in humans (31). Multiple molecular mechanisms have been linked to these
effects in cases of dominance involving single loci (27, 31). However, dominance can also emerge
from interaction between multiple loci (21, 25, 28), a phenomenon that is not as well characterized.
The ability to fully explain the heritability of complex traits likely depends on statistical
frameworks accounting for complex interactions between loci exhibiting a combination of additive
and dominance effects.
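As a rough illustration of the additive and dominance components discussed in this section, the sketch below applies the textbook single-locus decomposition: the additive effect a is half the difference between the two homozygote means, and the dominance deviation d is the departure of the heterozygote from the homozygote midpoint. This is an assumption about a standard parameterization, not necessarily the procedure used in later chapters, and all genotype-class means are invented.

```python
def additive_and_dominance(m_aa, m_Aa, m_AA):
    """Return (a, d) for one locus from the three genotype-class means.

    a: half the difference between the two homozygotes (additive effect).
    d: deviation of the heterozygote from the homozygote midpoint (dominance).
    """
    midpoint = (m_aa + m_AA) / 2
    a = (m_AA - m_aa) / 2
    d = m_Aa - midpoint
    return a, d


def degree_of_dominance(a, d):
    """Ratio d/a: 0 is purely additive, +1 or -1 is complete dominance."""
    if a == 0:
        raise ValueError("degree of dominance is undefined when a == 0")
    return d / a


# Hypothetical genotype-class means (aa, Aa, AA) at a focal locus, measured
# separately in two backgrounds defined by a second, modifier locus.
BACKGROUNDS = {
    "modifier allele 1": (1.0, 1.5, 2.0),
    "modifier allele 2": (1.0, 2.0, 2.0),
}

if __name__ == "__main__":
    for name, class_means in BACKGROUNDS.items():
        a, d = additive_and_dominance(*class_means)
        print(f"{name}: a={a:.2f}, d={d:.2f}, d/a={degree_of_dominance(a, d):.2f}")
```

Here the focal locus has the same additive effect in both backgrounds but is purely additive in one and completely dominant in the other, which is one simple way dominance can depend on an interaction with a second locus.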
1.4 Pleiotropy
Genetic variants often affect multiple traits, a phenomenon known as pleiotropy (32). As
the number of traits investigated by genetic mapping studies grows, so too does the number of
genetic loci associated with multiple distinct (and sometimes seemingly unrelated) phenotypes
(33–35). This observation complicates prediction of complex phenotypes from genetic data by
indicating that single genetic changes may have a profound effect on a multitude of traits. This
problem is exacerbated if interplay between two or more traits influenced by a genetic variant
feeds back into the trait being studied (34).
Pleiotropy is thought to result in evolutionary constraint, as mutations in a pleiotropic gene
or regulatory element are more likely to have a negative impact on at least one trait. From this
perspective, pleiotropy is expected to result in the accumulation of certain types of genetic
variation (e.g. variation in cis-regulatory elements) (36–38) or favor specific types of genetic
networks (e.g. ‘scale-free’ networks) (39, 40). Genetic variants with beneficial effects on one
phenotype may also incur a penalty in other circumstances. This effect, known as “antagonistic
pleiotropy”, is implicated in aging (41) and the persistence of disease alleles in populations (42),
as it is hypothesized that certain variants with benefits early in life may be deleterious at later
developmental stages (41). Despite the prevalence of pleiotropy in a range of biological processes,
it remains unclear which genetic variants are subject to this phenomenon and how much influence
it exerts over the relationship between genotype and phenotype.
1.5 Goals of this dissertation
My time in the Ehrenreich lab has been spent trying to determine how sets of genetic loci
individually and collectively contribute to complex phenotypes. Specifically, my research interests
centered on 1) how the effects of variants differ across genetic background, 2) the genetic
architecture of genetic background effects in diploid systems, and 3) how genetic background and
pleiotropy shape the outcome of host-pathogen interactions.
The budding yeast, Saccharomyces cerevisiae, is an excellent model system for studying
these questions for a multitude of reasons. It is highly genetically tractable, with an expansive
toolkit of genetic engineering approaches and selectable markers available. Further, high levels
of homologous recombination and the ability to cross strains quickly allow for efficient mapping
studies with high statistical power and resolution. While quantitative genetic studies often utilize
haploid strains of budding yeast, the species more commonly exists as a diploid in nature (43).
Strains can be mated together in all possible combinations to rapidly generate large populations of
diploids with known genotypes, making it possible to study complex traits in a diploid system.
Finally, S. cerevisiae is closely related to other pathogenic fungal species (44), and can itself
act as an opportunistic pathogen (45).
1.6 Chapter summary
In chapter 2, I use haploid segregants from a yeast cross to genetically dissect background
effects involving chromatin-associated proteins. I find that certain individuals with disruptions in
these genes exhibit extreme phenotypes, such that individuals range from growth exceeding the
parental strains to inviability. The effects of these mutations are environment-dependent, and the
mutations interact with many genetic variants across the genome. I also show that these mutations change the way that
segregating variants interact with one another, significantly modifying the relationship between
genotype and phenotype in the cross.
In chapter 3, I perform pairwise mating of barcoded haploid progeny from a yeast cross to
generate a structured population of ~200,000 dual-barcoded recombinant diploids. I use this
resource to perform parallel fitness measurements of these diploid strains in seven distinct
environments, mapping loci that contribute to fitness in each. Epistasis is pervasive in each
environment, with loci interacting with one another in both an additive and dominant manner. I
find evidence of variation at the mating locus that acts primarily as a modifier of dominance within
the cross while exhibiting little to no effect on its own.
In chapter 4, I use a panel of barcoded haploid yeast strains to examine the genetic basis of
fungal persistence in a mammalian host. I examine persistence as a function of organ, time, sex,
and immunocompetency of the host, using barcode sequencing data to map the loci influencing
which strains are the most successful inside a mammal. While there are genetic loci that provide
general benefits across organ type, the presence of many antagonistically pleiotropic loci suggests
a trade-off between persistence in the brain and persistence in non-brain organs.
In chapter 5, I discuss the implications of my work to date. I also consider unanswered
questions in the field and propose future research directions that may address these questions.
1.7 References
1. Freedman, M. L. et al. Principles for the post-GWAS functional characterization of cancer risk
loci. Nat. Genet. 43, 513–518 (2011).
2. DIAGRAM Consortium et al. New genetic loci implicated in fasting glucose homeostasis and
their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).
3. The Diabetes Genetics Initiative et al. A common variant of HMGA2 is associated with adult
and childhood height in the general population. Nat. Genet. 39, 1245–1250 (2007).
4. Shi, J. et al. Unraveling the Complex Trait of Crop Yield With Quantitative Trait Loci Mapping
in Brassica napus. Genetics 182, 851–861 (2009).
5. Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of
complex disease. Nat. Rev. Genet. 11, 446–450 (2010).
6. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461,
747–753 (2009).
7. Hemani, G., Knott, S. & Haley, C. An Evolutionary Perspective on Epistasis and the Missing
Heritability. PLoS Genet. 9, e1003295 (2013).
8. Mackay, T. F. & Moore, J. H. Why epistasis is important for tackling complex human disease
genetics. Genome Med. 6, 125 (2014).
9. Chandler, C. H., Chari, S., Tack, D. & Dworkin, I. Causes and Consequences of Genetic
Background Effects Illuminated by Integrative Genomic Analysis. Genetics 196, 1321–1336
(2014).
10. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and Theory Point to Mainly Additive
Genetic Variance for Complex Traits. PLoS Genet. 4, e1000008 (2008).
11. Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability:
Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. 109, 1193–1198 (2012).
12. Goddard, M. E., Hayes, B. J. & Meuwissen, T. H. E. Using the genomic relationship matrix to
predict the accuracy of genomic selection. J. Anim. Breed. Genet. 128, 409–421 (2011).
13. Crow, J. F. On epistasis: why it is unimportant in polygenic directional selection. Philos. Trans.
R. Soc. B Biol. Sci. 365, 1241–1244 (2010).
14. Crow, J. F. Maintaining evolvability. J. Genet. 87, 349–353 (2008).
15. Geiler-Samerotte, K. A., Zhu, Y. O., Goulet, B. E., Hall, D. W. & Siegal, M. L. Selection
Transforms the Landscape of Genetic Variation Interacting with Hsp90. PLOS Biol. 14,
e2000465 (2016).
16. Queitsch, C., Sangster, T. A. & Lindquist, S. Hsp90 as a capacitor of phenotypic variation.
Nature 417, 618–624 (2002).
17. Rutherford, S. L. & Lindquist, S. Hsp90 as a capacitor for morphological evolution. Nature
396, 336–342 (1998).
18. Chen, R. et al. Analysis of 589,306 genomes identifies individuals resilient to severe
Mendelian childhood diseases. Nat. Biotechnol. 34, 531–538 (2016).
19. Dowell, R. D. et al. Genotype to Phenotype: A Complex Problem. Science 328, 469 (2010).
20. Taylor, M. B., Phan, J., Lee, J. T., McCadden, M. & Ehrenreich, I. M. Diverse genetic
architectures lead to the same cryptic phenotype in a yeast cross. Nat. Commun. 7, 11669
(2016).
21. Lee, J. T., Coradini, A. L. V., Shen, A. & Ehrenreich, I. M. Layers of Cryptic Genetic Variation
Underlie a Yeast Complex Trait. Genetics 211, 1469–1482 (2019).
22. Hou, J., Tan, G., Fink, G. R., Andrews, B. J. & Boone, C. Complex modifier landscape
underlying genetic background effects. Proc. Natl. Acad. Sci. 116, 5045–5054 (2019).
23. Forsberg, S. K. G., Bloom, J. S., Sadhu, M. J., Kruglyak, L. & Carlborg, Ö. Accounting for
genetic interactions improves modeling of individual quantitative trait phenotypes in yeast.
Nat. Genet. 49, 497–503 (2017).
24. Phillips, P. C. Epistasis — the essential role of gene interactions in the structure and evolution
of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008).
25. Hallgrímsdóttir, I. B. & Yuster, D. S. A complete classification of epistatic two-locus models.
BMC Genet. 9, 17 (2008).
26. Musani, S. K. et al. Detection of Gene × Gene Interactions in Genome-Wide Association
Studies of Human Population Data. Hum. Hered. 63, 67–84 (2007).
27. Wilkie, A. O. The molecular basis of genetic dominance. J. Med. Genet. 31, 89–98 (1994).
28. Monir, Md. M. & Zhu, J. Dominance and Epistasis Interactions Revealed as Important Variants
for Leaf Traits of Maize NAM Population. Front. Plant Sci. 9, 627 (2018).
29. Guo, T. et al. Genetic basis of grain yield heterosis in an “immortalized F2” maize population.
Theor. Appl. Genet. 127, 2149–2158 (2014).
30. Boeven, P. H. G. et al. Negative dominance and dominance-by-dominance epistatic effects
reduce grain-yield heterosis in wide crosses in wheat. Sci. Adv. 6, eaay4897 (2020).
31. Veitia, R. A. & Birchler, J. A. Dominance and gene dosage balance in health and disease: why
levels matter! J. Pathol. 220, 174–185 (2010).
32. Paaby, A. B. & Rockman, M. V. The many faces of pleiotropy. Trends Genet. 29, 66–73
(2013).
33. Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits.
Nat. Genet. 51, 1339–1348 (2019).
34. Geiler-Samerotte, K. A. et al. Extent and context dependence of pleiotropy revealed by
high-throughput single-cell phenotyping. PLOS Biol. 18, e3000836 (2020).
35. Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. Pleiotropy in complex
traits: challenges and strategies. Nat. Rev. Genet. 14, 483–495 (2013).
36. Metzger, B. P. H., Wittkopp, P. J. & Coolon, J. D. Evolutionary Dynamics of Regulatory
Changes Underlying Gene Expression Divergence among Saccharomyces Species. Genome
Biol. Evol. 9, 843–854 (2017).
37. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression
differences within and between Drosophila species. Nat. Genet. 40, 346–350 (2008).
38. Signor, S. A. & Nuzhdin, S. V. The Evolution of Gene Expression in cis and trans. Trends
Genet. 34, 532–544 (2018).
39. Featherstone, D. E. & Broadie, K. Wrestling with pleiotropy: Genomic and topological
analysis of the yeast gene expression network. BioEssays 24, 267–274 (2002).
40. Tyler, A. L., Asselbergs, F. W., Williams, S. M. & Moore, J. H. Shadows of complexity: what
biological networks reveal about epistasis and pleiotropy. BioEssays 31, 220–227 (2009).
41. Austad, S. N. & Hoffman, J. M. Is antagonistic pleiotropy ubiquitous in aging biology? Evol.
Med. Public Health 2018, 287–294 (2018).
42. Carter, A. J. & Nguyen, A. Q. Antagonistic pleiotropy as a widespread mechanism for the
maintenance of polymorphic disease alleles. BMC Med. Genet. 12, 160 (2011).
43. Gerstein, A. C., Chun, H.-J. E., Grant, A. & Otto, S. P. Genomic Convergence toward Diploidy
in Saccharomyces cerevisiae. PLoS Genet. 2, e145 (2006).
44. Berman, J. & Sudbery, P. E. Candida albicans: A molecular revolution built on lessons from
budding yeast. Nat. Rev. Genet. 3, 918–931 (2002).
45. Strope, P. K. et al. The 100-genomes strains, an S. cerevisiae resource that illuminates its
natural phenotypic and genotypic variation and emergence as an opportunistic pathogen.
Genome Res. 25, 762–774 (2015).
Chapter 2: The complex underpinnings of genetic background effects
This work was performed in collaboration with Takeshi Matsui, Rachel Schell, Ryan Foree,
and Ian M. Ehrenreich. It was published in Nature Communications on September 17th, 2018.
DOI: https://doi.org/10.1038/s41467-018-06023-5
2.1 Author contribution to this work
Early in my graduate studies, I attempted to identify and characterize background effects
by deleting the gene TUP1, a general repressor of transcription, in a cross between two strains of
the budding yeast Saccharomyces cerevisiae. The thought behind this experiment was that genetic
variation in generally repressed genes may not lead to quantifiable changes in phenotype under
normal growth conditions, but could become visible in the absence of the repressive mechanism.
After knocking out TUP1, I did in fact observe many changes in phenotype that were not present
in wild-type segregants from the cross; however, segregants derived from the cross were incredibly
hard to work with and displayed high levels of flocculation, a loss of mating type identity in MATα
cells, and meiosis defects. Rather than continue work in this system – and in an effort to produce
more generalized findings – my advisor and I decided to examine background-dependent effects
of knockouts in a larger set of genes, focusing on those involved in transcriptional regulation,
nucleosome positioning, and chromatin architecture.
As a co-first and co-corresponding author of this work, I generated the yeast crosses used
in this work alongside fellow coauthors Rachel Schell and Takeshi Matsui. The three of us also
contributed to measuring growth of these populations and sequencing them to facilitate mapping
of loci responsive to particular genetic perturbations. Later, I conducted genetic mapping at the
1D, 2D, and 3D levels and performed downstream analysis of the background effects mapped in
each of the mapping populations in the study. My specific focus was on whether loci involved in
background effects were unique to particular gene knockouts or whether they appeared in the
context of multiple knockouts in the study. However, I also played a role in the quantification of
genetic effect sizes, the characterization of individual genetic interactions involving knockouts,
and the analysis of the role of environment in background effects. My role in the publication of
this manuscript also involved generation of figures, data curation, and assistance with writing the manuscript.
Our findings, while helpful for the understanding of genetic background effects, were
limited by the fact that we characterized interactions with a relatively small number of genes in
the genome, all but one of which are thought to be functionally related in the cell. As a result, this
work led to several subsequent studies, which are ongoing or discussed in later chapters of
this dissertation.
2.2 Abstract
Spontaneous and induced mutations often show different phenotypic effects across genetically
distinct individuals. Although these background effects are known to result from epistasis between
mutations and standing polymorphisms, their underlying genetic architecture remains poorly
characterized. Here, we genotyped 1,411 wild-type and mutant segregants from the same budding
yeast cross, and measured their growth in 10 environments. Using these data, we mapped 1,086
genetic interactions between segregating loci and seven different gene knockouts. Between 73 and
543 interactions were identified for each knockout, with 89% of the detected interactions involving
higher-order epistasis between a knockout and multiple loci. Identified loci interacted with as few
as one knockout and as many as all seven knockouts. Loci that interacted with fewer knockouts
tended to show enhanced phenotypic effects in mutants. In contrast, loci that interacted with more
knockouts typically had reduced phenotypic effects in mutants. Analysis of the identified loci
across environments found that most interactions between the knockouts and segregating loci also
depended on the environment. These results provide detailed insights into the complicated
interactions between mutations, standing polymorphisms, and the environment that ultimately
cause background effects.
2.3 Introduction
Background effects occur when the same spontaneous or induced mutations show different
phenotypic effects across genetically distinct individuals (19, 20, 46–50). Countless examples of
background effects have been described across species and traits (46, 47), collectively suggesting
that this phenomenon is common in biological systems and plays a significant role in many
phenotypes. For example, alleles that show background effects contribute to a wide range of
hereditary disorders, including, but not limited to, colorectal cancer, hypertension, and
phenylketonuria (51). Background effects may also impact other disorders that frequently involve
de novo mutations, such as autism (52), congenital heart disease (53), and schizophrenia (54).
Additionally, it has been proposed that background effects can shape the potential trajectories of
evolutionary adaptation (55, 56), influence the emergence of novel traits (20), and help maintain
deleterious genetic variation within populations (7).
Despite the importance of background effects to biology and medicine, understanding of
their causal genetic mechanisms remains limited. Although background effects are broadly
known to arise from genetic interactions (or “epistasis”) between mutations and standing
polymorphisms (57–62), only recently have studies begun to provide deeper insights into the
architecture of epistasis underlying background effects. These papers indicate that background
effects often involve multiple polymorphisms that interact not only with a mutation but also with
each other (9, 16, 17, 19, 49, 50, 63–65) and the environment (50). This work also suggests that
background effects are caused by a mixture of loci that show enhanced and reduced phenotypic
effects in mutants relative to wild-type individuals (15–17, 20, 46, 49, 50, 60, 64, 66–70). Together,
these previous reports imply that the phenotypic effect of a mutation in a given genetic background
can depend on an individual’s genotype at a potentially large number of loci that interact in
complicated, highly contextual ways. However, this point is difficult to explicitly show because
doing so requires systematically mapping the interactions between mutations, polymorphisms, and
environment that give rise to background effects.
In this paper, we perform a detailed genetic characterization of a number of background
effects across multiple environments. Previous work in yeast, as well as other model species, has
established that mutations in chromatin regulation and transcription often show background effects
(9, 20, 49, 50, 67, 68, 71). We extend this past work by knocking out seven different chromatin
regulators in a cross of the BY4716 (BY) and 322134S (3S) strains of Saccharomyces cerevisiae.
We generate and genotype 1411 wild-type and knockout segregants, measure the growth of these
individuals in 10 environments, and perform linkage mapping with these data. In total, we identify
1086 interactions between the knockouts and segregating loci. These interactions allow us to
obtain novel, detailed insights into the genetic architecture of background effects across different
mutations and environments.
2.4 Results
2.3.1. Preliminary screen
When a mutation that exhibits background effects is introduced into a population, the
phenotypic variance among individuals will often change (15–17, 66). Here, we attempted to
identify mutations that induce such changes in phenotypic variance. Specifically, we screened 47
complete gene knockouts of histones, histone-modifying enzymes, chromatin remodelers, and
other chromatin-associated genes for impacts on phenotypic variance in segregants from a cross
of the BY and 3S strains of budding yeast (Figures S2.1 and S2.2; Table S2.1; Methods). To do
this, we generated BY/3S diploid hemizygotes, sporulated these hemizygotes to obtain haploid
knockout segregants, and then quantitatively phenotyped these BY×3S knockout segregants for
growth on rich medium containing ethanol, an environment in which we previously found
background effects that influence yeast colony morphology (20, 49, 50, 72). For each panel of
segregants, three biological replicate end-point colony growth assays were performed and
averaged. We then tested whether the knockout segregants exhibited significantly higher
phenotypic variance than wild-type segregants using Levene’s test (Table S2.2). This analysis
implicated CTK1 (a kinase that regulates RNA polymerase II), ESA1 and GCN5 (two histone
acetyltransferases), HOS3 and RPD3 (two histone deacetylases), HTB1 (a copy of histone H2B),
and INO80 (a chromatin remodeler) as knockouts that show background effects in the BY×3S
cross (Figure S2.1; Note S2.1).
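The variance comparison described above can be sketched in code. The following is a minimal illustration of the mean-centred Levene's W statistic, written with the Python standard library only. The function name and the toy growth values are hypothetical and are not data from this study. Levene's test itself assesses equality of variances, so the direction of the difference (higher variance in knockouts) would still need to be checked from the group variances. The p-value step, which compares W to an F distribution with k − 1 and N − k degrees of freedom, is omitted here.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centred) for equality of variances across groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_means = [mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    # Between-group and within-group sums of squares of the deviations
    between = sum(len(zi) * (zm - z_grand) ** 2 for zi, zm in zip(z, z_means))
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical end-point growth values for two panels of segregants
wild_type = [1.0, 2.0, 3.0, 4.0, 5.0]
knockout = [0.0, 10.0, 20.0, 30.0, 40.0]
w_stat = levene_w(wild_type, knockout)
print(round(w_stat, 4))
```

A larger W indicates a larger difference in spread between the panels; two panels with identical spread give W = 0.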
2.3.2. Mapping of mutation-independent and mutation-responsive effects
To map loci that interact with the seven knockouts identified in the screen, we genotyped
1411 segregants in total. This included 164 wild-type, 210 ctk1∆, 122 esa1∆, 215 gcn5∆,
220 hos3∆, 177 htb1∆, 141 ino80∆, and 162 rpd3∆ segregants (Figure S2.3;
Tables S2.3 through S2.5; Methods). These genotyped segregants were phenotyped for growth in
10 diverse environments using replicated end-point colony growth assays (Figure S2.4;
Table S2.6; Methods). We note that, despite causing increased phenotypic variance in ethanol,
the knockouts induced a broad range of phenotypic responses in other environments (Figure S2.4).
As described in detail in Methods, genome-wide linkage mapping scans were conducted
within each environment. To maximize statistical power, we analyzed the 1411 segregants jointly
using a fixed-effects linear model that accounted for genetic background. We identified individual
loci, as well as two- and three-way genetic interactions among loci, that exhibited the same
phenotypic effect across the wild-type and knockout backgrounds (hereafter “mutation-
independent” effects). We also conducted scans for individual loci, as well as two- and three-way
genetic interactions among loci, that exhibited different phenotypic effects in at least one knockout
background relative to the wild-type background (hereafter “mutation-responsive” effects). Post
hoc tests were used to associate mutation-responsive effects with specific knockouts. Mutation-
responsive one-, two-, and three-locus effects can alternatively be viewed as two-, three-, and four-
way interactions where one of the involved genetic factors is a knockout. However, to avoid
confusion throughout the paper, we do not count the knockouts as genetic factors. Instead, we
classify each genetic effect as mutation-independent or mutation-responsive and report how many loci it
involves. Representative examples of mutation-responsive effects are shown in Figure 2.1.
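To make the distinction between mutation-independent and mutation-responsive effects concrete, the sketch below compares the effect of a single locus in two backgrounds. It is a deliberately simplified illustration and not the joint fixed-effects model used in this study: it only contrasts the allele effect (mean of 3S carriers minus mean of BY carriers) in a knockout background with the same effect in the wild-type background. All names and values are hypothetical.

```python
from statistics import mean

def allele_effect(phenotypes, genotypes):
    """Mean phenotype of 3S-allele carriers minus mean phenotype of BY-allele carriers."""
    by = [p for p, g in zip(phenotypes, genotypes) if g == "BY"]
    s3 = [p for p, g in zip(phenotypes, genotypes) if g == "3S"]
    return mean(s3) - mean(by)

# Hypothetical genotypes at one locus and growth values in two backgrounds
genotypes = ["BY", "BY", "3S", "3S"]
wild_type_growth = [1.0, 1.0, 1.5, 1.5]
knockout_growth = [1.0, 1.0, 3.0, 3.0]

effect_wt = allele_effect(wild_type_growth, genotypes)
effect_ko = allele_effect(knockout_growth, genotypes)

# A value near zero would suggest a mutation-independent effect;
# a value far from zero would suggest a mutation-responsive effect.
response = effect_ko - effect_wt
print(effect_wt, effect_ko, response)
```

In this toy case the locus has a larger effect in the knockout background than in the wild-type background, which corresponds to the "enhanced" pattern described later in this chapter.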
Figure 2.1 Examples of mutation-responsive genetic effects. a shows representative examples of one-,
two-, and three-locus mutation-responsive effects with larger phenotypic effects in wild-type segregants
than mutants. In contrast, b shows representative examples of one-, two-, and three-locus mutation-
responsive effects with larger phenotypic effects in mutants than wild-type segregants. Means depicted
along the y axis show residuals from a fixed-effects linear model that includes the mutation-independent
effect of each involved locus, as well as any possible lower-order mutation-independent and mutation-
responsive effects. The different genotype classes are plotted below the x axis. Blue and orange boxes
correspond to the BY and 3S alleles of a locus, respectively. Error bars represent one standard deviation
from the mean.
In total, we detected 1211 genetic effects across the 10 environments
(Figures S2.5 and S2.6; Tables S2.7 through S2.10; Note S2.2 and S2.3). One hundred and twenty
five (10%) of these genetic effects were mutation-independent while 1086 (90%) were mutation-
responsive (Figure 2.2a). On average, we identified 121 genetic effects per environment, 109 of
which were mutation-responsive. However, the number of detected genetic effects varied
significantly across the environments, from 15 in room temperature to 359 in ethanol. Despite this
variability, in every environment, ≥47% of the identified genetic effects were mutation-responsive.
This suggests that, regardless of environment, roughly half or more of the genetic effects in the cross were responsive to
the knockouts. Additionally, the seven knockouts exhibited major differences in their numbers of
mutation-responsive effects. Between 73 and 118 mutation-responsive effects were found for
the CTK1, ESA1, GCN5, HTB1, INO80, and RPD3 knockouts (Figure 2.2b). In contrast,
the HOS3 knockout had 543 mutation-responsive effects (Figure 2.2b).
Figure 2.2 Most mutation-responsive genetic effects involve multiple loci. In a, the numbers of
mutation-independent and mutation-responsive genetic effects detected in each environment are shown. In b, the
aggregate numbers of mutation-responsive effects found for each knockout across the 10 environments are
provided.
2.3.3. Higher-order epistasis influences response to mutations
While only 29% (36 of 125) of the mutation-independent effects involved multiple loci,
this proportion was more than tripled (89%; 965 of 1086) among the mutation-responsive effects
(Figure 2.2a). Simulations indicate that our statistical power to detect mutation-responsive loci
was appreciably higher for single locus effects than for multiple locus effects, suggesting that our
results may underestimate the importance of higher-order epistasis to background effects
(Figure S2.7). To better assess how loci involved in the identified higher-order interactions
contribute to background effects, we partitioned the individual and joint contributions of involved
loci to mutation-responsive phenotypic variance (Data S2.4 and S2.5; Methods). For mutation-
responsive two-locus effects, on average, 78% of the mutation-responsive phenotypic variance
was attributed to the higher-order interaction between the knockout and both loci (Figure 2.3a).
Likewise, among mutation-responsive three-locus effects, on average, 58% of the mutation-
responsive phenotypic variance was explained by the higher-order interaction of the knockout and
the three loci (Figure 2.3b). Thus, most mutation-responsive effects involve multiple loci that
contribute to background effects predominantly through their higher-order interactions with each
other and a mutation, rather than through their individual interactions with a mutation.
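The partitioning described above can be illustrated with a short worked example. This sketch assumes that the relative contribution of each mutation-responsive term is its ANOVA sum of squares divided by the total sum of squares across the mutation-responsive terms being compared; the exact procedure is given in Methods and may differ in detail. The function name and the sums of squares are hypothetical.

```python
def relative_pve(sum_of_squares):
    """Scale each term's sum of squares so that the listed terms add up to 1."""
    total = sum(sum_of_squares.values())
    return {term: ss / total for term, ss in sum_of_squares.items()}

# Hypothetical sums of squares for one mutation-responsive two-locus effect
ss_terms = {"KO x L1": 1.0, "KO x L2": 1.0, "KO x L1 x L2": 6.0}
shares = relative_pve(ss_terms)
print(shares)
```

In this toy case, 75% of the mutation-responsive variance is attributed to the higher-order term, which is the kind of pattern summarized by the averages reported above.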
Figure 2.3 Higher-order epistasis among knockouts and multiple loci is an important contributor to
background effects. In a, for each mutation-responsive two-locus effect, we partitioned the individual and
joint contributions of the two loci. L1 and L2 refer to the involved loci, while KO denotes the relevant
knockout. We determined the relative phenotypic variance explained (PVE) by interactions between a
knockout and each individual locus (i.e., KO × L1 and KO × L2) and higher-order epistasis involving a
knockout and the two loci (i.e., KO × L1 × L2). Similarly, in b, for each mutation-responsive three-locus
effect, we determined the relative PVE for all possible mutation-responsive one-, two-, and three-locus
effects involving the participating loci. In both a and b, relative PVE values were calculated using sum of
squares obtained from ANOVA tables, as described in Methods. Mutation-responsive effects that interact
with multiple knockouts are shown multiple times, once for each relevant knockout.
2.3.4. The environment plays a strong role in background effects
The role of the environment in background effects has yet to be fully characterized.
Although our group previously showed that the genetic architecture of background effects can
significantly change across environments (50), this past work focused on only a modest number of
segregating loci and environments. To more generally assess how the environment influences the
genetic architecture of background effects, we determined whether the 1086 mutation-responsive
effects impacted phenotype in environments outside the ones in which they were originally
detected. This analysis was performed using statistical thresholds that were more liberal than those
employed in our initial genetic mapping (Methods). In all, 29% of the mutation-responsive effects
were detectable in additional environments, with this proportion varying between 7% and 65%
across the 10 environments (Figure 2.4). Of these mutation-responsive effects, 64% were
identified in only one additional environment, 28% were found in two additional environments,
and just 8% were detected in ≥3 additional environments. Given the limited resolution of the data, it is
possible that some of the mutation-responsive effects that were detected in multiple environments
in fact represent distinct, closely linked loci that act in different environments. Such linkage would
lead us to overestimate how often mutation-responsive effects contribute to background effects in
different environments, further suggesting that most mutation-responsive effects act in a limited
number of environments. These findings support the idea that background effects are caused by
complex interactions between not only mutations and polymorphisms but also the environment.
Figure 2.4 Analysis of mutation-responsive effects across environments. The height of each stacked bar
indicates the number of mutation-responsive effects that were detected in a given environment. The bars
are color-coded according to the number of additional environments in which these mutation-responsive
effects could be detected when liberal statistical thresholds were employed (Methods).
2.3.5. Interactions between segregating loci and knockouts
We next looked at how the same mutation-responsive effects interact with different
knockouts. Based on involvement of the same loci, the 1086 mutation-responsive effects were
collapsed into 594 distinct mutation-responsive effects that showed epistasis with at least one
knockout (Methods). In all, 65% of these mutation-responsive effects were found in only one
knockout background, while 35% were identified in ≥2 knockout backgrounds (Figure S2.8).
Also, 97% of the mutation-responsive effects that interacted with only one knockout were HOS3-
responsive and these effects represented 69% of the total interactions detected in
a hos3∆ background (Figure 2.5a). In contrast, nearly all (between 95% and 100%) of the CTK1-,
ESA1-, GCN5-, HTB1-, INO80-, and RPD3-responsive effects were detected in multiple
backgrounds (Figure 2.5a). Although the mutation-responsive effects exhibited a broad,
continuous range of responses to the knockouts (Figure 2.5b), they could be partitioned into two
qualitative classes—enhanced and reduced—based on whether they explained more or less
phenotypic variance in mutants than in wild-type segregants, respectively. The distinct mutation-
responsive effects exhibited a strong relationship between their number of interacting knockouts
and how they were classified. Mutation-responsive effects that interacted with fewer than three
knockouts predominantly were in the enhanced class, while mutation-responsive effects that
interacted with four or more knockouts typically were in the reduced class (χ² = 709.37,
d.f. = 6, p = 5.81 × 10⁻¹⁵⁰; Figure 2.5c). These results illustrate how background effects are caused
by a mixture of loci that respond specifically to mutations in particular genes and loci that respond
more generically to mutations in different genes, with the relative contribution of these two classes
of loci varying significantly across mutations. Our findings also suggest that how loci respond to
mutations in a particular gene is related to the degree to which they interact with mutations in other
genes.
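The χ² test reported above is consistent with a 2 × 7 contingency table (two classes by one to seven interacting knockouts), which gives (2 − 1) × (7 − 1) = 6 degrees of freedom. The sketch below computes a Pearson χ² statistic and its degrees of freedom for such a table. The counts are hypothetical placeholders rather than the study's data, and the layout of the table is an assumption inferred from the reported degrees of freedom.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, dof

# Hypothetical counts: rows are enhanced / reduced, columns are 1..7 interacting knockouts
hypothetical_counts = [
    [300, 40, 10, 5, 3, 2, 1],
    [20, 30, 40, 50, 40, 20, 33],
]
stat, dof = chi_square(hypothetical_counts)
print(round(stat, 2), dof)
```

Converting the statistic to a p value requires the χ² distribution with the stated degrees of freedom, which is not included in this sketch.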
Figure 2.5 Analysis of mutation-responsive effects across knockout backgrounds. In a, the number of
mutation-responsive effects that interacted with only one knockout (pink) or interacted with multiple
knockouts (blue) are shown for each knockout. In b, the phenotypic variance explained (PVE) for each
mutation-responsive effect is shown in the relevant knockout (KO) segregants, as well as in the wild-type
(WT) segregants. The PVE for each mutation-responsive effect was determined using fixed-effects linear
models fit within each individual background (Methods). Mutation-responsive effects are color-coded by
the knockout population in which they were identified. In c, the percentage of mutation-responsive effects
that showed larger phenotypic effects in mutants than in wild-type segregants (y axis, left side) and
mutation-responsive effects that showed larger phenotypic effects in wild-type segregants than in mutants
(y axis, right side) is depicted. These values are plotted as a function of the number of knockouts that interact
with a given mutation-responsive effect. Error bars represent 95% bootstrap confidence intervals
(Methods).
2.3.6. Genetic basis of induced changes in phenotypic variance
Lastly, we looked at the extent to which the identified mutation-responsive effects in
aggregate related to differences in phenotypic variance between the knockout and wild-type
versions of the BY×3S cross across the 10 environments (Table S2.11). This was important
because many studies (e.g., refs. 15–17, 64, 68, 73) have described how perturbing certain genes
can alter the phenotypic variance within a population, but the genetic underpinnings of this
phenomenon have not been fully determined. Among the 70 different combinations of the 7
knockouts and 10 environments, we found a highly significant relationship between differences in
the numbers of mutation-responsive effects with reduced and enhanced phenotypic effects and
knockout-induced changes in phenotypic variance (Figure 2.6; Spearman’s ρ = 0.84,
p = 4.33 × 10⁻²⁰). No such relationship was seen when we looked at the mean
phenotypic changes induced by the mutations (Figure S2.9). To control for potential biases in our
analyses that might arise from allele frequency differences among the backgrounds (Figure S2.3),
we performed the same analysis on each knockout background individually using data from the 10
environments (Table S2.11). When we did this, we found that all seven knockout backgrounds
exhibited nominally significant correlations between observed changes in phenotypic variance and
detected mutation-responsive effects (Spearman’s ρ > 0.71, p < 0.02; Figure S2.10). Permutations
indicate that the probability of observing this result by chance is low (p < 10⁻⁵). Thus, these
findings are consistent with the knockout-induced changes in phenotypic variance resulting from
a large number of epistatic loci with small phenotypic effects (Figure S2.11; Table S2.11). In
summary, our results not only provide valuable insights into the genetic architecture of background
effects but also illustrate how interactions between mutations, segregating loci, and the
environment can influence a population’s phenotypic variance.
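The rank correlation used in this section can be sketched as follows. This is a minimal standard-library implementation of Spearman's ρ, computed as the Pearson correlation of average ranks so that ties are handled. The function names and the five example points are hypothetical; the analysis above used the 70 knockout-by-environment combinations.

```python
from statistics import mean

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical values: change in phenotypic variance and difference in effect counts
variance_change = [-0.5, 0.1, 0.4, 1.2, 2.0]
effect_count_difference = [-3, 1, 0, 8, 15]
rho = spearman_rho(variance_change, effect_count_difference)
print(round(rho, 2))
```

Because the statistic depends only on ranks, it is insensitive to the scale of either variable, which suits a comparison between variance differences and counts of effects.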
Figure 2.6 Mutation-responsive effects underlie differences in phenotypic variance between knockout
and wild-type backgrounds across environments. Each point’s position on the x axis represents the
difference in phenotypic variance between a knockout background of the cross (V_P.Mut) and the wild-type
background of the cross (V_P.WT) in a single environment. The y axis shows the difference in the number of
mutation-responsive effects with enhanced and reduced phenotypic effects. In this paper, we classified
mutation-responsive effects as enhanced or reduced based on whether they explained more or less
phenotypic variance in mutants relative to wild-type segregants, respectively. Spearman’s ρ and its
associated p value are provided on the plot. Colors denote different knockout backgrounds.
2.5 Discussion
Most prior studies of background effects have described specific examples without
identifying the contributing loci. Here we used a screen of 47 different chromatin regulators to
identify 7 knockout mutations that exhibit strong background effects in a yeast cross. We then
generated and phenotyped a panel of 1411 mutant and wild-type segregants. Using these data, we
detected 1086 genetic interactions that involve the 7 knockouts and loci that segregate in the cross.
To better understand the genetic architecture of background effects, we comprehensively
examined how the identified loci interact not only with the knockouts but also with each other and
the environment. Our results confirm important points about the genetic architecture of
background effects that to date have been suggested but not conclusively proven. Namely,
background effects can be highly polygenic, with many, if not most, loci contributing through
higher-order genetic interactions that involve a mutation and multiple loci. These loci can respond
to mutations in different ways, such as by exhibiting enhanced and reduced phenotypic effects in
mutants relative to wild-type individuals. Moreover, most of these interactions between mutations
and segregating loci also involve the environment. Altogether, these findings shed light on the
complex genetic and genotype–environment interactions that give rise to background effects.
Our work also illustrates how the genetic architecture of background effects varies
significantly across different mutated genes. In our study, response to six of the seven knockouts
was mediated almost exclusively by loci that respond to mutations in different genes and
predominantly exhibit reduced effects in mutants relative to wild-type segregants. Given that some
of the examined chromatin regulators have counteracting or unrelated biochemical activities
(74, 75), we propose that loci detected in multiple knockout backgrounds respond generically to
perturbations of cell state or fitness, rather than to any specific biochemical process. In contrast,
response to the HOS3 knockout was largely mediated by loci that were not detected when the other
genes were compromised. Why so many loci responded specifically to perturbation of HOS3 is
difficult to infer from current understanding of Hos3’s biochemical activities. Although Hos3 can
deacetylate all four of the core histones (76) and influence chromatin regulation in certain genomic
regions (77), it also plays roles in cell cycle (78) and nuclear pore regulation (79). Thus, further
work is needed to characterize HOS3 and its extensive epistasis with polymorphisms in the BY×3S
cross.
In addition to advancing understanding of background effects, our results may also have
more general implications for the genetic architecture of complex traits. Many phenotypes,
including common disorders like autism (52) and schizophrenia (54), are influenced by
loss-of-function mutations that occur de novo or persist within populations at low frequencies. We have
shown that these mutations can significantly change the phenotypic effects of many
polymorphisms within a population by altering how these polymorphisms interact with each other
and the environment. Although these complicated interactions between mutations, standing
polymorphisms, and the environment are often ignored in genetics research, our study suggests
that they in fact play a major role in determining the relationship between genotype and phenotype.
2.6 Methods
2.5.1. Generation of different BY×3S knockout backgrounds
All BY×3S segregants described in this paper were generated using the synthetic genetic
array marker system, which makes it possible to obtain MATa haploids by digesting tetrads and
selecting for spores on minimal medium lacking histidine and containing canavanine (Figure
S2.1) (80). We first constructed a BY/3S diploid by mating a BY MATa can1Δ::STE2pr-SpHIS5
his3∆ strain to a 3S MATα ho::HphMX his3∆::NatMX strain. This diploid served as the progenitor
for the wild-type segregants. Hemizygous complete gene deletions were engineered into this wild-
type BY/3S diploid to produce the progenitors of the knockout segregants. Genes were deleted
using transformation with PCR products consisting of (in the following order) 60 bp of
genomic sequence immediately upstream of the targeted gene, KanMX, and 60 bp of genomic
sequence immediately downstream of the targeted gene. Lithium acetate transformation was
employed (81). To obtain a given knockout, transformants were selected on rich medium
containing G418, ClonNAT, and Hygromycin B, and PCR was then used to check transformants
for correct integration of the KanMX cassette. These PCRs were conducted with primer pairs
where one primer was located within KanMX and the other primer was located adjacent to the
expected site of integration. PCR products were Sanger sequenced. Primers used in these checks
are reported in Table S2.12. Wild-type and hemizygous knockout diploids were sporulated using
standard techniques. Low-density random spore plating (around 100 colonies per plate) was then
used to obtain haploid BY×3S segregants from each wild-type and knockout background of the
cross. Wild-type segregants were isolated directly from MATa selection plates, while knockout
segregants were first replica plated from MATa selection plates onto G418 plates, which selected
for the gene deletions.
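The layout of the deletion PCR products described above (60 bp upstream, then the marker, then 60 bp downstream) can be sketched as a simple string operation. This is only an illustration of the ordering of the three parts. The function name, the coordinate convention (0-based, end-exclusive), the toy chromosome, and the placeholder marker sequence are all hypothetical and do not represent the actual KanMX cassette or the primers used here.

```python
def deletion_cassette(chromosome, gene_start, gene_end, marker, flank=60):
    """Return upstream flank + marker + downstream flank for a complete gene deletion.

    gene_start and gene_end are 0-based, end-exclusive coordinates of the targeted gene.
    """
    if gene_start < flank or gene_end + flank > len(chromosome):
        raise ValueError("not enough flanking sequence around the gene")
    upstream = chromosome[gene_start - flank:gene_start]
    downstream = chromosome[gene_end:gene_end + flank]
    return upstream + marker + downstream

# Toy sequence: 60 bp upstream, a 9 bp "gene", 60 bp downstream
toy_chromosome = "A" * 60 + "ATGCCCTAA" + "T" * 60
placeholder_marker = "GGGG"  # stands in for the marker; not a real cassette sequence
product = deletion_cassette(toy_chromosome, 60, 69, placeholder_marker)
print(len(product))
```

Homologous recombination at the two flanks would replace the whole open reading frame with the marker, which is why the flanks are taken immediately adjacent to the targeted gene.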
2.5.2. Genotyping of segregants
Segregants were genotyped using low-coverage whole-genome sequencing. A sequencing
library was prepared for each segregant using the Illumina Nextera Kit and custom barcoded
adapters. Libraries from different segregants were pooled in equimolar fractions and these
multiplex pools were size selected using the Qiagen Gel Extraction Kit. Multiplexed samples were
sequenced by BGI on an Illumina HiSeq 2500 using 100 bp × 100 bp paired-end reads. For each
segregant, reads were mapped against the S288c genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from https://www.yeastgenome.org) using
BWA version 0.7.7-r44 (82). Pileup files were then produced with SAMTOOLS version 0.1.19-44428cd
(83). BWA and SAMTOOLS were run with their default settings. Base calls and
coverages were obtained from the pileup files for 36,756 previously identified high-confidence
single-nucleotide polymorphisms (SNPs) that segregate in the cross (20). Individuals who showed
evidence of being aneuploid, diploid, or contaminated based on unusual patterns of coverage or
heterozygosity were excluded from further analysis. We also used the data to confirm the presence
of KanMX at the gene that had been knocked out. Individuals with an average per site coverage
<1.5× were removed from the dataset. A vector containing the fraction of 3S calls at each SNP
was generated and used to make initial genotype calls with sites above and below 0.5 classified as
3S and BY, respectively. This vector of initial genotype calls was then corrected with a Hidden
Markov Model (HMM), implemented using the HMM package version 1.0 in R (84). We used the
following transition and emission probability matrices:
transProbs = matrix(c(.9999,.0001,.0001,.9999),2) and
emissionProbs = matrix(c(0.25,0.75,0.75,0.25),2). We examined the HMM-corrected genotype
calls for adjacent SNPs that lacked recombination in the segregants. In such instances, a single
SNP was chosen to serve as the representative for the entire set of adjacent SNPs that lacked
recombination. This reduced the number of markers used in subsequent analyses from 36,756 to
8311.
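The genotype-calling steps above can be sketched in a few lines of Python. This is a standard-library illustration rather than the R HMM package used in the study, and several details are assumptions: that Viterbi decoding was the correction step, that start probabilities are uniform, that each hidden state emits its own allele call with probability 0.75, that a fraction of exactly 0.5 is called BY, and that the first marker of each non-recombining run serves as its representative.

```python
import math

STATES = ("BY", "3S")
# Transition and emission probabilities quoted in the text. Which state
# emits which symbol is not stated; here each state is assumed to emit its
# own allele call with probability 0.75.
TRANS = {"BY": {"BY": 0.9999, "3S": 0.0001},
         "3S": {"BY": 0.0001, "3S": 0.9999}}
EMIT = {"BY": {"BY": 0.75, "3S": 0.25},
        "3S": {"BY": 0.25, "3S": 0.75}}


def initial_calls(fraction_3s):
    """Call 3S above 0.5 and BY otherwise (ties go to BY by assumption)."""
    return ["3S" if f > 0.5 else "BY" for f in fraction_3s]


def viterbi(observed):
    """Most likely hidden-state path, assuming uniform start probabilities."""
    score = {s: math.log(0.5) + math.log(EMIT[s][observed[0]]) for s in STATES}
    backpointers = []
    for obs in observed[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            pointer[s] = best_prev
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][obs]))
        score = new_score
        backpointers.append(pointer)
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    return path[::-1]


def collapse_markers(calls_by_segregant):
    """Indices of markers kept after merging adjacent identical columns."""
    n_markers = len(calls_by_segregant[0])
    columns = [tuple(row[j] for row in calls_by_segregant) for j in range(n_markers)]
    return [j for j in range(n_markers) if j == 0 or columns[j] != columns[j - 1]]


# One isolated miscall inside a BY block, followed by a genuine 3S block.
noisy = initial_calls([0.0] * 10 + [1.0] + [0.0] * 10 + [1.0] * 20)
corrected = viterbi(noisy)
```

With these probabilities, the isolated 3S call is absorbed into the surrounding BY block because two state switches cost far more than one mismatched emission, whereas the long 3S run is kept as a real recombination event.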
2.5.3. Phenotyping of segregants
Prior to phenotyping, segregants were always inoculated from freezer stocks into YPD
broth containing 1% yeast extract (BD Product #: 212750), 2% peptone (BD 211677), and 2%
dextrose (BD Product #:15530). After these cultures had reached stationary phase, they were
pinned onto and outgrown on plates containing 2% agar (BD 214050). Unless specified, these
plates were made with YPD and incubated at 30 °C for 2 days. However, some of the environments
required adding a chemical compound to the YPD plates or changing the temperature or carbon
source. In addition to YPD at 30 °C, we measured growth in the following environments: 21 °C,
42 °C, 2% ethanol (Koptec A06141602W), 250 ng/mL 4-nitroquinoline 1-oxide (“4-NQO”) (TCI
N0250), 9 mM copper sulfate (Sigma 209198), 50 mg/mL fluconazole (TCI F0677), 260 mM
hydrogen peroxide (EMD Millipore HX0640-5), 7 mg/mL neomycin sulfate (Gibco 21810-031),
and 5 mg/mL zeocin (Invivogen ant-zn-1). For 4-NQO, copper sulfate, fluconazole, hydrogen
peroxide, neomycin sulfate, and zeocin, the doses used for phenotyping were chosen based on
preliminary experiments across a broader range of concentrations (Table S2.6). Growth assays
were conducted in triplicate using a randomized block design to account for positional effects on
the plates. Four BY controls were included on each plate. Plates were imaged using the BioRAD
Gel Doc XR+ Molecular Imager. Each image was 11.4 × 8.52 cm² (width × length) and was captured
under white epi-illumination with an exposure time of 0.5 s. Images were exported as TIFF files
with a resolution of 600 dpi. As in ref. (85), image analysis was conducted in the ImageJ software,
with pixel intensity for each colony calculated using the Plate Analysis JRU v1 plugin
(http://research.stowers.org/imagejplugins/index.html). The growth of each segregant on each
plate was computed by dividing the segregant’s total pixel intensity by the mean pixel intensity of
the BY controls from the same plate. The replicates for a segregant within an
environment were then averaged and used as that individual’s phenotype in subsequent analyses.
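The normalization described above amounts to a ratio followed by an average. The Python sketch below is illustrative only; the function names and the example intensities are hypothetical.

```python
from statistics import mean


def relative_growth(colony_intensity, by_control_intensities):
    """Colony intensity divided by the mean of the BY controls on the same plate."""
    return colony_intensity / mean(by_control_intensities)


def phenotype(replicates):
    """Average relative growth across replicate plates.

    replicates: list of (colony_intensity, [BY control intensities]) pairs,
    one pair per replicate plate.
    """
    return mean(relative_growth(colony, controls) for colony, controls in replicates)


# Hypothetical triplicate measurement with four BY controls per plate.
example = phenotype([
    (50.0, [100.0, 100.0, 100.0, 100.0]),
    (150.0, [100.0, 100.0, 100.0, 100.0]),
    (100.0, [90.0, 110.0, 95.0, 105.0]),
])
```

A value of 1 therefore corresponds to growth equal to the average BY control on the same plate.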
2.5.4. Scans for one-locus effects
All genetic mapping was conducted within each environment using fixed-effects linear
models applied to the complete set of 1411 wild-type and knockout segregants. To ensure that
mean differences in growth among the eight backgrounds were always controlled for during
mapping, we included a background term in our models. Throughout the paper, we refer to loci or
combinations of loci that statistically interact or do not statistically interact with
the background term as mutation-responsive and mutation-independent, respectively. Genetic
mapping was performed in R using the lm() function, with the p values for relevant terms obtained
from tables generated using the summary() function.
We first identified individual loci that show mutation-independent or mutation-responsive
phenotypic effects using forward regression. To detect mutation-independent loci, genome-wide
scans were conducted with the model phenotype ~ background + locus + error. Significant loci
identified in this first iteration were then used as covariates in the next iteration,
i.e., phenotype ~ background + known_locus1 … known_locusN + locus + error, where
the known_locus terms corresponded to each of the loci that had already been identified in a given
environment. To determine significance, 1000 permutations were conducted at each iteration of
the forward regression, with the correspondence between genotypes and phenotypes randomly
shuffled each time. Among the minimum p values obtained in the permutations, the fifth percentile
was identified and used as the threshold for determining significant loci. This process was iterated
until no additional loci could be detected for each environment.
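The control flow of the permutation threshold and the forward regression can be sketched as follows. The model fitting itself is deliberately left to caller-supplied functions (`min_p_scan`, `best_new_locus`, and `threshold_for` are hypothetical placeholders for the lm()-based scans), and the percentile convention and the choice to shuffle only the phenotype vector are assumptions.

```python
import random


def permutation_threshold(phenotypes, min_p_scan, n_perm=1000, level=0.05, seed=0):
    """Fifth percentile (by default) of the minimum p values from permuted scans.

    min_p_scan is a caller-supplied (hypothetical) function that runs one
    genome-wide scan on a phenotype vector and returns its smallest p value.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    minima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks the genotype-phenotype correspondence
        minima.append(min_p_scan(shuffled))
    minima.sort()
    # One simple percentile convention: the k-th smallest value, k = level * n_perm.
    k = max(1, round(level * n_perm))
    return minima[k - 1]


def forward_regression(best_new_locus, threshold_for):
    """Add loci one at a time until no candidate passes the threshold.

    best_new_locus(known) -> (locus, p_value) for the best locus not in known,
    fitted with the known loci as covariates; threshold_for(known) -> cutoff.
    Both are hypothetical callables standing in for the lm()-based scans.
    """
    known = []
    while True:
        locus, p_value = best_new_locus(known)
        if locus is None or p_value >= threshold_for(known):
            return known
        known.append(locus)
```

Because the threshold is taken from the distribution of per-scan minima, it controls the chance of any false positive across the whole genome scan at each iteration.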
To identify mutation-responsive one-locus effects, we employed the same procedure
described in the preceding paragraph, except the
model phenotype ~ background + locus + background:locus + error was used. Here the
significance of the background:locus interaction term was tested, again with significance
determined by permutations as described in the preceding paragraph. The locus term was included
in the model to ensure that phenotypic variance explained by mutation-independent effects did not
load onto the mutation-responsive effects. For each locus with significant background:locus terms,
we included both an additive and background interaction term in the subsequent iterations of the
forward regression:
i.e., phenotype ~ background + known_locus1 … known_locusN + locus + background:known_locus1 … background:known_locusN + background:locus + error. The known_locus terms were
included in these forward regression models to ensure that variance due to the mutation-
independent effects of previously identified loci was not inadvertently attributed to the mutation-
responsive terms for these loci. This process was iterated until no
additional background:locus terms were discovered for each environment.
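To make the structure of the models at each iteration explicit, the sketch below builds the corresponding formula strings in Python. The helper name is hypothetical, and the error term is omitted from the output on the assumption that it is implicit when the formula is passed to lm().

```python
def one_locus_formula(known_loci, candidate="locus", responsive=False):
    """Build the model formula for one forward-regression iteration.

    responsive=False gives the mutation-independent model; responsive=True
    adds a background interaction for every known locus and the candidate.
    """
    terms = ["background", *known_loci, candidate]
    if responsive:
        terms += [f"background:{name}" for name in known_loci]
        terms.append(f"background:{candidate}")
    return "phenotype ~ " + " + ".join(terms)


first_iteration = one_locus_formula([], responsive=True)
third_iteration = one_locus_formula(["known_locus1", "known_locus2"], responsive=True)
```

In the responsive model, the term under test is the final background:locus interaction; every simpler term is present only to absorb variance that would otherwise be attributed to it.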
2.5.5. Scans for two-locus effects
We also performed full-genome scans for two-locus effects. Here every unique pair of loci
was interrogated using fixed-effects linear models like those described in the preceding section.
As with the one-locus effects, we employed two models in parallel. The
model phenotype ~ background + locus1 + locus2 + locus1:locus2 + error was used to identify
mutation-independent two-locus effects, whereas the
model phenotype ~ background + locus1 + locus2 + background:locus1 + background:locus2 + locus1:locus2 + background:locus1:locus2 + error was employed to detect mutation-responsive
two-locus effects. Specifically, we tested for significance of
the locus1:locus2 and background:locus1:locus2 interaction terms with the former and latter
models, respectively. Simpler terms were included in each model to ensure that variance was not
erroneously attributed to more complex terms. Significance thresholds for these terms were
determined using 1000 permutations with the correspondence between genotypes and phenotypes
randomly shuffled each time. However, to reduce computational run time, 10,000 random pairs of
loci, rather than all possible pairs of loci, were examined in each permutation. Significance
thresholds were again established based on the fifth percentile of minimum p values observed across
the permutations. To ensure that our main findings were robust to threshold, we also generated
results at false discovery rates (FDRs) of 0.01, 0.05, and 0.1 by comparing the rate of discoveries
at a given p value in the permutations to the rate of discoveries at that same p value in our results
(Table S2.10; Note S2.2).
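One common way to turn such a comparison into an FDR estimate is sketched below in Python. The exact procedure is described in Note S2.2 and is not reproduced here, so this construction, including the normalization by the number of tests and the cap at 1, should be read as an assumption.

```python
def empirical_fdr(observed_p, permuted_p, cutoff):
    """Ratio of the null discovery rate to the observed discovery rate at cutoff.

    Rates are expressed per test, so the permuted scans may contain a
    different number of tests (e.g., 10,000 random pairs) than the real scan.
    """
    observed_rate = sum(p <= cutoff for p in observed_p) / len(observed_p)
    null_rate = sum(p <= cutoff for p in permuted_p) / len(permuted_p)
    if observed_rate == 0:
        return None  # no discoveries at this cutoff, so the ratio is undefined
    return min(1.0, null_rate / observed_rate)


# Hypothetical p values: half of the real tests pass, one in eight null tests pass.
fdr = empirical_fdr(
    observed_p=[0.001, 0.002, 0.5, 0.9],
    permuted_p=[0.001, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    cutoff=0.01,
)
```

Under this construction, the p value cutoff for a target FDR is the largest cutoff at which the estimated ratio stays at or below that target.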
2.5.6. Scans for three-locus effects
Owing to computational limitations, we were unable to run a comprehensive scan for
mutation-independent and mutation-responsive three-locus effects. Instead, we scanned for three-
locus effects involving two loci that had already been identified in a given environment
(known_locus1 and known_locus2) and a third locus that had yet to be detected (locus3). The
model phenotype ~ background + known_locus1 + known_locus2 + locus3 + known_locus1:known_locus2 + known_locus1:locus3 + known_locus2:locus3 + known_locus1:known_locus2:locus3 + error was used to identify mutation-independent three-locus effects, whereas the
model phenotype ~ background + known_locus1 + known_locus2 + locus3 + background:known_locus1 + background:known_locus2 + background:locus3 + known_locus1:known_locus2 + known_locus1:locus3 + known_locus2:locus3 + background:known_locus1:known_locus2 + background:known_locus1:locus3 + background:known_locus2:locus3 + known_locus1:known_locus2:locus3 + background:known_locus1:known_locus2:locus3 + error was employed to detect
mutation-responsive three-locus effects. Significance of
the known_locus1:known_locus2:locus3 and background:known_locus1:known_locus2:locus3 terms in the respective models was determined using 1000 permutations with the correspondence
between genotypes and phenotypes randomly shuffled each time. For each permutation, 10,000
trios of sites were chosen by first randomly picking two loci on different chromosomes and then
randomly selecting an additional 10,000 sites. The minimum p value across the 10,000 tests was
retained. Significance thresholds were again established based on the fifth percentile of
minimum p values observed across the permutations. As with the two-locus effect scans, we also
performed our analysis across multiple FDR thresholds to ensure that our findings were robust
(Table S2.10; Note S2.2).
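The scale of the search space explains why an exhaustive three-locus scan was impractical: with 8311 markers there are roughly 3.5 × 10^7 pairs but about 9.6 × 10^10 trios. The Python sketch below computes these counts and illustrates one reading of the trio sampling used in each permutation, namely a single random pair on different chromosomes combined with 10,000 randomly drawn third sites. That reading, the sampling of third sites with replacement, and the exclusion of the pair from the third-site pool are assumptions.

```python
import math
import random

N_MARKERS = 8311  # markers retained after collapsing, from the genotyping section

n_pairs = math.comb(N_MARKERS, 2)
n_trios = math.comb(N_MARKERS, 3)


def sample_trios(marker_chromosomes, n_third=10_000, seed=0):
    """Pick two markers on different chromosomes, then combine them with
    n_third randomly drawn third markers (drawn with replacement).

    marker_chromosomes: chromosome label of each marker, indexed by marker.
    Assumes markers from at least two chromosomes are present.
    """
    rng = random.Random(seed)
    markers = range(len(marker_chromosomes))
    while True:
        first, second = rng.sample(markers, 2)
        if marker_chromosomes[first] != marker_chromosomes[second]:
            break
    others = [m for m in markers if m not in (first, second)]
    return [(first, second, rng.choice(others)) for _ in range(n_third)]
```

Drawing with replacement is needed under this reading because 10,000 exceeds the number of available markers.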
2.5.7. Assignment of mutation-responsive effects to knockouts
In the aforementioned linkage scans, genetic effects exhibited statistical interactions with
the background term if they had a different phenotypic effect in at least one of the eight
backgrounds relative to the rest. To determine the specific knockouts that interacted with each
mutation-responsive effect, we used the contrast() function from the R package lsmeans. This was
applied to the specific effect of interest post hoc using the same linear models that were employed
for detection. All possible pairwise contrasts between wild-type and knockout segregants were
conducted. Mutation-responsive effects were assigned to specific mutations if the contrast between
a mutation and a WT population was nominally significant. Unless otherwise noted, we counted
each assignment of a mutation-responsive effect to a specific knockout as a separate genetic effect
even if they involved the same set of loci.
2.5.8. Statistical power analysis
To determine the statistical power of our mapping procedures, we simulated phenotypes
for the 1411 genotyped segregants given their genotypes at randomly chosen loci and then tried to
detect these loci using the approaches described earlier. In each simulation, a given segregant’s
phenotype was determined based on both the mutation it carried (if any), as well as its genotype at
one, two, or three randomly chosen loci. The effects of mutations were calculated based on the
real phenotype data for the glucose environment. Phenotypic effects of the mutation-responsive
locus or loci were also attributed to each segregant. Specifically, the phenotypes of segregants in
only one of the possible genotype classes were increased by a given increment, which we refer to
as the absolute effect size. For one-, two-, and three-locus effects, this respectively entailed half,
one quarter, and one eighth of the individuals having their phenotypes increased by the increment.
In the case of mutation-independent effects, these increments were applied to all eight of the wild-
type and knockout backgrounds. In contrast, for mutation-responsive effects, increments were only
applied to one of the eight backgrounds, with the specific background randomly chosen. Lastly,
random environmental noise was added to each segregant’s phenotype. Using these genotype and
phenotype data, we tested whether we could detect the loci that had been given a phenotypic effect.
This was done by fitting the appropriate fixed-effects linear model, extracting the p value for the
relevant term, and determining if that p value fell below a nominal significance threshold
of α = 0.05. Statistical power was calculated as the proportion of tests at a given phenotypic
increment where p ≤ 0.05. The results of this analysis are shown in Figure S2.7.
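The simulation design can be sketched in Python as follows. Gaussian noise, the default noise level, and all names are assumptions; the model fit that produces each p value is not shown and would be supplied by the same linear models used for mapping.

```python
import random


def simulate_phenotypes(genotypes, backgrounds, base_by_background, effect_size,
                        target_class, target_background=None, noise_sd=0.1, seed=0):
    """Simulate one phenotype per segregant.

    genotypes: one tuple of allele calls per segregant at the chosen loci.
    target_class: the genotype class whose phenotype is raised by effect_size.
    target_background: None applies the increment in every background
    (mutation-independent); otherwise only in that background
    (mutation-responsive).
    """
    rng = random.Random(seed)
    phenotypes = []
    for genotype, background in zip(genotypes, backgrounds):
        value = base_by_background[background]
        in_class = tuple(genotype) == tuple(target_class)
        in_background = target_background is None or background == target_background
        if in_class and in_background:
            value += effect_size
        phenotypes.append(value + rng.gauss(0.0, noise_sd))
    return phenotypes


def power(p_values, alpha=0.05):
    """Proportion of simulated tests with p <= alpha."""
    return sum(p <= alpha for p in p_values) / len(p_values)
```

For a three-locus effect, `target_class` would be a tuple of three allele calls, so that roughly one eighth of the segregants receive the increment.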
2.5.9. Contributions of loci involved in higher-order epistasis
For all mutation-responsive two- and three-locus effects, we determined the proportion of
mutation-responsive phenotypic variance explained by each individual locus and the interactions
among these loci. To do this, we generated seven subsets of data, each of which comprised
the wild-type segregants and one set of knockout segregants. We then fit the same model that
was used to originally identify a given mutation-responsive effect to the appropriate data subsets.
For two-locus effects, we obtained the sum of squares for
the background:locus1, background:locus2, and background:locus1:locus2 terms. We then
divided each of these values by the sum of all three sums of squares. For three-locus effects, we
obtained the total sum of squares associated with the individual loci
(background:locus1, background:locus2, background:locus3) and with the pairs of loci
(background:locus1:locus2, background:locus1:locus3, background:locus2:locus3), as well as the
sum of squares associated with the trio of loci (background:locus1:locus2:locus3). We then divided
the total sum of squares associated with each class of terms by the total sum of squares across all
mutation-responsive terms. The ternary plots used to show these results were generated using the
R package ggtern.
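The relative contributions described above are simple ratios of sums of squares. The Python sketch below shows the arithmetic; the function names and the example values are hypothetical, and the sums of squares themselves would come from the fitted models.

```python
def relative_pve(sum_of_squares_by_class):
    """Divide each class's sum of squares by the total across classes."""
    total = sum(sum_of_squares_by_class.values())
    return {name: ss / total for name, ss in sum_of_squares_by_class.items()}


def three_locus_classes(ss):
    """Pool the background interaction terms of a three-locus model by order."""
    singles = ["background:locus1", "background:locus2", "background:locus3"]
    pairs = ["background:locus1:locus2", "background:locus1:locus3",
             "background:locus2:locus3"]
    trio = ["background:locus1:locus2:locus3"]
    return {
        "individual loci": sum(ss[t] for t in singles),
        "pairs of loci": sum(ss[t] for t in pairs),
        "trio of loci": sum(ss[t] for t in trio),
    }


# Hypothetical sums of squares for one two-locus effect.
two_locus = relative_pve({
    "background:locus1": 1.0,
    "background:locus2": 1.0,
    "background:locus1:locus2": 2.0,
})
```

Because the three proportions sum to one, each effect can be drawn as a single point on a ternary plot.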
2.5.10. Analysis of mutation-responsive effects across environments
We determined whether each one-, two-, and three-locus mutation-responsive effect
exhibited a phenotypic effect in any environment outside the one in which it was originally
detected. To do this, we used seven subsets of data, each of which comprised the wild-type
segregants and one set of knockout segregants. We then fit the same model that was used to
originally identify a given mutation-responsive effect to the appropriate data subsets for each of
the nine additional environments. The p value was then extracted for the relevant term. Bonferroni
corrections were used to account for multiple testing.
2.5.11. Genetic explanation of changes in phenotypic variance
We measured the phenotypic variance explained by each mutation-responsive genetic
effect in the relevant knockout background(s), as well as in the wild-type background. Here we fit
each mutation-responsive genetic effect in both populations without using any background term.
For mutation-responsive one-, two-, and three-locus effects, the following models were
respectively
employed: phenotype ~ locus1 + error, phenotype ~ locus1 + locus2 + locus1:locus2 + error,
and phenotype ~ locus1 + locus2 + locus3 + locus1:locus2 + locus1:locus3 + locus2:locus3 + locus1:locus2:locus3 + error. Partial R² values were obtained in each population by obtaining the
sum of squares associated with the term of interest and dividing it by the total sum of squares.
Mutation-responsive effects were then classified by the number of knockout backgrounds in which
they were detected. For each class, the number of genetic effects with larger partial R² values in
the knockout background than in the wild-type background (enhanced effects) and the number of
genetic effects with smaller partial R² values in the knockout background than in the wild-type
background (reduced effects) were determined. The proportion of mutation-responsive effects that
show enhanced and reduced phenotypic effects was calculated for each class. 95% bootstrap
confidence intervals were then generated using 1000 random samplings of the data with
replacement.
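A short Python sketch of the classification and bootstrap steps is given below. It assumes a percentile bootstrap over the effects within a class; the resampling unit, the percentile convention, and all names are assumptions rather than details taken from the study.

```python
import random


def partial_r2(term_ss, total_ss):
    """Sum of squares for the term of interest divided by the total sum of squares."""
    return term_ss / total_ss


def bootstrap_ci_enhanced(pairs, n_boot=1000, seed=0):
    """Proportion of enhanced effects with a 95% percentile bootstrap interval.

    pairs: (knockout partial R^2, wild-type partial R^2) for each effect in a class.
    """
    rng = random.Random(seed)
    enhanced = [ko > wt for ko, wt in pairs]
    n = len(enhanced)
    proportions = sorted(
        sum(rng.choices(enhanced, k=n)) / n  # resample effects with replacement
        for _ in range(n_boot)
    )
    lower = proportions[round(0.025 * n_boot)]
    upper = proportions[round(0.975 * n_boot) - 1]
    return sum(enhanced) / n, (lower, upper)


# Hypothetical partial R^2 values for four effects in one class.
estimate, interval = bootstrap_ci_enhanced(
    [(0.20, 0.10), (0.30, 0.10), (0.05, 0.10), (0.01, 0.10)]
)
```

The proportion of reduced effects follows in the same way by replacing the `ko > wt` comparison with `ko < wt`.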
2.5.12. Checking potential consequences of allele frequency bias
Allele frequency bias may result in the erroneous detection of mutation-responsive genetic
effects due to uneven representation of one-, two-, or three-locus combinations across the knockout
and wild-type backgrounds. To account for this, we generated 2 × 8, 4 × 8, and 8 × 8 contingency
tables for all one-, two-, or three-locus interactions, respectively, counting each of the possible
allele combinations in the wild-type and seven knockout populations. Specifically, for one-locus
interactions, we counted the number of individuals carrying the BY and 3S allele at the significant
locus for each population. For two-locus interactions, the number of individuals carrying the
BY/BY, BY/3S, 3S/BY, and 3S/3S alleles at the two loci were enumerated. For three-locus
interactions, the number of individuals carrying the BY/BY/BY, BY/BY/3S, BY/3S/BY,
3S/BY/BY, BY/3S/3S, 3S/BY/3S, 3S/3S/BY, and 3S/3S/3S alleles at the three loci were counted.
We then ran chi-square tests to identify individual loci or combinations of loci that show different
frequencies across the eight backgrounds, using Bonferroni corrections to account for multiple
testing. After filtering out genetic effects that involve loci or combinations of loci with biased
frequencies, we repeated our main analyses to ensure that our results were robust to allele
frequency differences (Figures S2.5 and S2.6; Tables S2.11 and S2.12; Note S2.3).
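The contingency-table test can be illustrated with a small Python sketch that computes the Pearson chi-square statistic and its degrees of freedom. Converting the statistic to a p value requires the chi-square distribution, which is left to a statistics library here; the function names and the example counts are hypothetical, and all row and column totals are assumed to be nonzero.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a count table.

    table: rows are genotype classes, columns are the eight backgrounds.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def bonferroni_cutoff(alpha, n_tests):
    """Per-test significance cutoff after Bonferroni correction."""
    return alpha / n_tests


# Hypothetical 2 x 8 table for one locus: BY and 3S counts in each background.
balanced = [[80] * 8, [80] * 8]
stat, dof = chi_square_statistic(balanced)
```

A perfectly balanced table gives a statistic of zero, and a 2 × 8 table has seven degrees of freedom.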
2.7 Supplementary Materials
Figure S2.1. Generation of BYx3S knockout segregants. For each knockout or wild type background in
our study, a BY/3S diploid was generated and sporulated. MATa segregants were obtained using the
synthetic genetic array marker system (Methods) (80). Wild type segregants were collected directly from
MATa selection plates, while knockout segregants were replica-plated from MATa selection plates onto
G418 plates prior to their collection to select for segregants with the gene deletion.
[Figure S2.1 schematic. Grow the BY/3S diploid (WT or YFG/yfg∆::KanMX; MATa/α his3∆/his3∆ CAN1/can1∆::STE2prSpHIS5) in YPD; sporulate; select MATa haploids (his3∆ can1∆::STE2prSpHIS5) on YNB + canavanine; pick WT segregants directly, or replica plate onto YPD + G418 and pick yfg∆ segregants (yfg∆::KanMX).]
Figure S2.2. Certain genes exhibit significant background effects when perturbed. In a preliminary
screen, we generated and phenotyped segregants from 47 mutant versions of the same yeast cross, each of
which lacked a different chromatin-associated protein (Supplementary Table 1). The coefficient of
variation found in a given knockout background is shown on the x-axis, while the phenotypic variance is
shown on the y-axis. In addition to increased phenotypic variance, we found that knockout of certain genes,
in particular ESA1, caused severe growth reductions in all but a few outlier segregants, which resulted in a
high coefficient of variation. Presence of these outliers may reflect higher-order interactions among loci,
which would lead to a small fraction of individuals showing unusual growth (49, 72). Using Levene’s Test,
we found that seven genes exhibit significant background effects when deleted: CTK1, ESA1, GCN5, HOS3,
HTB1, INO80, and RPD3 (Supplementary Table 2). The points corresponding to these genes are shown
in color, while the point corresponding to wild type is illustrated in black. All other points are gray.
Figure S2.3. Allele frequency plot. (a) Allele frequency plots are shown for each knockout and wild type
background. 50 kb regions surrounding the knockouts, as well as regions that were fixed due to selection on
markers used to generate MATa haploids, are highlighted in red. Regions where the allele frequency in at
least one population is significantly different from other populations (Supplementary Table 4;
Supplementary Note 1; Methods), as well as regions that were fixed due to mitotic recombination or gene
conversion in the progenitor hemizygous diploids, are highlighted in blue (Supplementary Table 5;
Supplementary Note 3; Methods). (b) We observed that all of Chromosome XII was enriched in the
gcn5∆ population. This appears to be due to selection against a recombinant version of the chromosome.
Specifically, individuals who harbored a 3S-BY recombinant haplotype centered on YCS4 were depleted
(Supplementary Table 5). This site of increased recombination on Chromosome XII in the gcn5∆
population is denoted with an asterisk. No recombinants were observed at this site among wild type
segregants, suggesting that the gcn5∆ knockout resulted in a new recombination hotspot.
Figure S2.4. Growth of all 1,411 segregants across the 10 environments.
Figure S2.5. Individual and joint contributions of loci to background effects across different
significance thresholds. In a, for each mutation-responsive two-locus effect identified across different
significance thresholds, the relative phenotypic variance explained (PVE) by the individual involved loci
and their interaction is illustrated. The same analysis was also performed after loci that show biased two-
locus allele frequencies were filtered from the set of two-locus effects identified at the α = 0.05 threshold.
In b, for each mutation-responsive three-locus effect identified across different significance thresholds, the
relative PVE for the individual loci, the pairs of loci, and the trio of loci is provided. Similar to mutation-
responsive two-locus effects, loci that show biased three-locus allele frequencies were filtered from the set
of three-locus effects identified at the α = 0.05 threshold and their relative PVE values were determined.
Relative PVE values were calculated using sum of squares obtained from ANOVA tables, as described in
the Methods. As with the results reported in the paper, which were obtained using the α = 0.05 threshold,
we find that loci involved in most mutation-responsive effects identified at other thresholds mainly
contribute to background effects through higher-order epistasis.
Figure S2.6. Analysis of how mutation-responsive effects interact with different knockouts at
multiple significance thresholds. Our findings are reported across FDRs of 0.01, 0.05, and 0.10, as well
as α = 0.05 after filtering of loci and multi-locus genotype classes with biased frequencies. In each row,
the left plot shows the number of genetic effects found in one (pink) or more (blue) knockout
backgrounds. The middle plot shows the PVE for mutation-responsive effects in the wild type and
relevant knockout segregants. The third plot shows the percentage of genetic effects with larger PVE in
the relevant knockout background than the wild type background (PVE KO > PVE WT) as a function of the
number of knockouts that interact with the effect. We provide an additional y-axis on the right side of the
plot, which indicates the percentage of genetic effects with smaller PVE in the relevant knockout
background than the wild type background (PVE KO < PVE WT). Error bars represent 95% bootstrap
confidence intervals (Methods).
Figure S2.7. Statistical power analysis for one-, two-, and three-locus interactions. We determined our
statistical power to detect different types of mutation-independent and mutation-responsive genetic effects.
This was done by simulating phenotype data for the 1,411 segregants and then performing genetic mapping
on the simulated phenotype data (Methods). Each set of simulated phenotypes was determined based on
which background of the BYx3S cross a segregant came from, as well as the segregant’s genotype at one
or more randomly chosen loci, and a knockout that was randomly selected to interact with the loci. We then
applied a random deviate to each segregant’s phenotype, which was intended to represent environmental
noise. The relevant fixed-effects linear model was fit using the genotype and simulated phenotype data, and
the p-value for the appropriate term in the model was obtained. Statistical power for a given absolute effect
size was calculated as the proportion of tests that had a p-value ≤ 0.05. These simulations are based on our
real phenotype data for glucose. In this environment, we detected average absolute effect sizes of 0.07,
0.15, and 0.3 for mutation-independent one-, two-, and three-locus effects, and 0.09, 0.13, and 0.26 for
mutation-responsive one-, two-, and three-locus effects.
[Figure S2.7 plot. x-axis: absolute effect size (0.0 to 0.5); y-axis: probability of detection at α = 0.05 (0.2 to 1.0). Curves: mutation-independent one-, two-, and three-locus effects; mutation-responsive one-, two-, and three-locus effects.]
Figure S2.8. Extent to which mutation-responsive effects interact with different knockouts. The
number of knockout backgrounds in which a particular mutation-responsive effect was detected is shown
on the x-axis. The number of mutation-responsive effects in each class is shown on the y-axis.
Figure S2.9. Individual and joint contributions of loci to background effects across different
significance thresholds. Each point represents a different knockout background and environment. A
point’s position on the x-axis indicates the difference in mean between a particular knockout background (Mean P.Mut)
and the wild type background (Mean P.WT) in a single environment. On the y-axis, the difference
in the number of genetic effects with enhanced and reduced phenotypic effects in mutants relative to wild
type segregants is shown. Spearman’s ρ and its associated p-value are provided on the plot.
Figure S2.10. All seven knockout populations show nominally significant correlations between
changes in phenotypic variance and detected mutation-responsive effects. Individual panels show the
results for the ctk1∆, esa1∆, gcn5∆, hos3∆, htb1∆, ino80∆, and rpd3∆ backgrounds. Each point’s position
on the x-axis represents the difference in phenotypic variance between wild type and knockout populations.
On the y-axis, the difference in the number of genetic effects with enhanced and reduced phenotypic effects
in mutants relative to wild type segregants is shown. Spearman’s ρ values and their associated
p-values are provided on the plot.
Figure S2.11. Most mutation-responsive effects show small differences in phenotypic variance
explained (‘PVE’) in mutants relative to wild type segregants. On the y-axis, we show differences in
PVE for each mutation-responsive effect when PVE is computed separately in mutant and wild type
segregants. Mutation-responsive effects above the red line have a higher PVE in mutants, while
mutation-responsive effects below the red line have a higher PVE in wild type segregants.
Standard Name    Systematic Name    Function(s)
ASF1 YJL115W Nucleosome assembly factor; involved in chromatin assembly and disassembly
CHD1 YER164W Chromatin remodeler; regulates chromatin structure and maintains chromatin integrity
CTK1 YKL139W Catalytic (alpha) subunit of C-terminal domain kinase I (CTDK-I); required for H3K36 trimethylation but not dimethylation by Set2p
DOT1 YDR440W Nucleosomal histone H3-Lys79 methylase
EAF3 YPR023C Subunit of Rpd3S deacetylase and NuA4 acetyltransferase complexes; essential histone acetyltransferase complex
ELP3 YPL086C Subunit of Elongator complex; exhibits histone acetyltransferase activity
ESA1 YOR244W Catalytic subunit of the histone acetyltransferase complex (NuA4)
GCN5 YGR252W Catalytic subunit of ADA and SAGA histone acetyltransferase complexes
GIS1 YDR096W Histone demethylase and transcription factor
HAT1 YPL001W Catalytic subunit of the Hat1p-Hat2p histone acetyltransferase complex
HAT2 YEL056W Subunit of the Hat1p-Hat2p histone acetyltransferase complex
HDA1 YNL021W Putative catalytic subunit of a class II histone deacetylase complex
HHF1 YBR009C Histone H4
HHF2 YNL030W Histone H4
HHO1 YPL127C Histone H1
HHT1 YBR010W Histone H3
HHT2 YNL031C Histone H3
HIR1 YBL008W Subunit of HIR nucleosome assembly complex
HMT1 YBR034C Nuclear SAM-dependent mono- and asymmetric methyltransferase
HOS1 YPR068C Class I histone deacetylase (HDAC) family member
HOS2 YGL194C Histone deacetylase and subunit of Set3 and Rpd3L complexes
HOS3 YPL116W Trichostatin A-insensitive homodimeric histone deacetylase (HDAC)
HST1 YOL068C NAD(+)-dependent histone deacetylase
HTA1 YDR225W Histone H2A
HTA2 YBL003C Histone H2A
HTB1 YDR224C Histone H2B
HTB2 YBL002W Histone H2B
HTZ1 YOL012C Histone variant H2AZ
INO80 YGL150C ATPase and nucleosome spacing factor
ISW1 YBR245C ATPase subunit of imitation-switch (ISWI) class chromatin remodelers
JHD1 YER051W JmjC domain family histone demethylase specific for H3-K36
JHD2 YJR119C JmjC domain family histone demethylase
NAT4 YMR069W N alpha-acetyl-transferase
RLF2 YPR018W Largest subunit (p90) of the Chromatin Assembly Complex (CAF-1)
RPD3 YNL330C Histone deacetylase, component of both the Rpd3S and Rpd3L complexes
RPH1 YER169W JmjC domain-containing histone demethylase
RTT109 YLL002W Histone acetyltransferase
SAS2 YMR127C Histone acetyltransferase (HAT) catalytic subunit of the SAS complex
SAS3 YBL052C Histone acetyltransferase catalytic subunit of NuA3 complex
SET1 YHR119W Histone methyltransferase, subunit of the COMPASS (Set1C) complex
SET2 YJL168C Histone methyltransferase with a role in transcriptional elongation
SET3 YKR029C Subunit of the SET3 histone deacetylase complex
SET4 YJL105W Unknown function; paralog of SET3
SNF1 YDR477W AMP-activated S/T protein kinase; regulates H3 acetylation and chromatin remodelling
SNF2 YOR290C Catalytic subunit of the SWI/SNF chromatin remodeling complex
SNF5 YBR289W Subunit of the SWI/SNF chromatin remodeling complex
SWR1 YDR334W Swi2/Snf2-related ATPase; structural component of the SWR1 complex, which exchanges histone variant H2AZ (Htz1p) for chromatin-bound histone H2A
Table S2.1. List of 47 genes that were screened for background effects. The standard gene name, the
systematic name, and the biological function of each gene are shown.
Knockout Mean Growth Variance Levene's p-value
asf1Δ 0.779317208 0.020201882 0.977508865
chd1Δ 1.007573633 0.008422714 0.012616542
ctk1Δ 0.505330301 0.046562534 0.014559018
dot1Δ 1.000469872 0.026939673 0.572692518
eaf3Δ 0.892511372 0.02996817 0.159451789
elp3Δ 0.938803654 0.020439982 0.89517808
esa1Δ 0.299208796 0.124484989 0.009723665
gcn5Δ 0.727106888 0.050249092 0.033325098
gis1Δ 1.062032492 0.012881687 0.134256473
hat1Δ 1.076895249 0.027302896 0.235343973
hat2Δ 1.051544847 0.023497867 0.757295234
hda1Δ 0.932223832 0.029161218 0.50857877
hhf1Δ 0.991287515 0.013080162 0.102604216
hhf2Δ 0.957913123 0.012705815 0.123495677
hho1Δ 1.014461673 0.013516731 0.147089216
hht1Δ 1.007595814 0.027733915 0.488955838
hht2Δ 1.021186261 0.020401193 0.968439543
hir1Δ 0.903639888 0.022913698 0.783955806
hmt1Δ 0.958772913 0.021849944 0.591705973
hos1Δ 0.947034124 0.047755186 0.118881424
hos2Δ 0.979073108 0.015371335 0.275248056
hos3Δ 0.882145103 0.07291512 2.51478E-05
hst1Δ 0.859791025 0.028231512 0.552051447
hta1Δ 0.955155532 0.032539889 0.213817393
hta2Δ 1.051833126 0.042651912 0.125633861
htb1Δ 0.453580596 0.070488173 0.002211331
htb2Δ 1.058080223 0.009848719 0.035718658
htz1Δ 0.739161229 0.016096516 0.299152086
ino80Δ 0.368393318 0.068870748 0.000178763
isw1Δ 0.999762052 0.018146537 0.389392505
jhd1Δ 0.985389791 0.018189507 0.863810698
jhd2Δ 1.015143929 0.028568957 0.436252639
nat4Δ 1.007693438 0.024951602 0.653665809
rlf2Δ 0.992563075 0.040109928 0.850565985
rpd3Δ 0.733300085 0.023198875 0.657552748
rph1Δ 0.977926455 0.022801194 0.949320021
rtt109Δ 0.787318686 0.014075713 0.210182586
sas2Δ 1.032806204 0.015742986 0.32742934
sas3Δ 0.98983123 0.017062259 0.448430416
set1Δ 0.963951523 0.045359724 0.021614959
set2Δ 0.917625313 0.02221723 0.963192992
set3Δ 1.011235345 0.011976087 0.092060159
set4Δ 1.042182908 0.022035319 0.910385044
snf1Δ 0.236803736 0.011258233 0.0814094
snf2Δ 0.172978254 0.019080041 0.238953594
snf5Δ 0.315511891 0.012033825 0.1426345
swr1Δ 0.817082916 0.023596511 0.810466613
WT 1.000603396 0.024287414 NA
Table S2.2. Screen summary statistics. The mean growth and variance are listed for each of the 48
segregant populations (47 knockout backgrounds and wild type). The “Levene’s p-value” column gives the
p-value from the Levene’s test used to assess whether each population of knockout segregants
exhibited significantly different heritable phenotypic variation than a wild type BYx3S segregant
population on ethanol.
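As a rough illustration of the variance comparison behind the “Levene’s p-value” column, the sketch below computes Levene’s W statistic in its mean-centered form. The source does not state which centering was used, and the growth values are hypothetical. A p-value would follow from comparing W with an F distribution on (k - 1, N - k) degrees of freedom, for example with `scipy.stats.levene`, which centers on the median by default.

```python
from statistics import mean

def levene_w(*groups):
    """Levene's W statistic (mean-centered form) for k groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(y - mean(g)) for y in g] for g in groups]
    z_bar_i = [mean(zi) for zi in z]
    z_bar = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (zb - z_bar) ** 2 for zi, zb in zip(z, z_bar_i))
    within = sum((v - zb) ** 2 for zi, zb in zip(z, z_bar_i) for v in zi)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical growth values for a wild type and a knockout population.
wild_type = [1.0, 2.0, 3.0]
knockout = [0.0, 2.0, 4.0]
w = levene_w(wild_type, knockout)
```

A larger W indicates a larger difference in spread between the populations; two populations with identical spread give W = 0 regardless of their means.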
Population  Number of segregants with good data  Diploid  Aneuploid  Low coverage, cross-contamination
WT          164                                  0        0          76
ctk1Δ       210                                  0        2          28
esa1Δ       122                                  0        25         93
gcn5Δ       215                                  9        15         1
hos3Δ       220                                  1        3          16
htb1Δ       177                                  1        2          60
ino80Δ      141                                  5        5          89
rpd3Δ       162                                  0        0          78
Total       1411                                 16       52         441
Table S2.3. Mapping population breakdown. This table shows the number of segregants from each
knockout population that were excluded due to technical issues (i.e., low coverage or cross-contamination)
or biological issues (i.e., aneuploidy or diploidy).
Chromosome Start position End position
4 720061 861257
4 975252 1408452
5 190283 256159
7 84678 170211
7 275701 386593
7 905443 946974
10 642317 684208
11 53931 316100
12 50192 749934
13 75046 130721
14 231034 607186
15 69484 462566
15 843892 959820
16 195879 525073
Table S2.4. Genomic regions with allele frequency bias. Regions of the genome where the allele
frequency in at least one knockout population significantly differed from other populations were found
using chi-square tests. Specifically, 2x8 contingency tables were generated and chi-square tests were run
for all 8,311 markers, counting the number of BY and 3S alleles in the wild type and 7 knockout populations.
Bonferroni correction was used for multiple testing.
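The per-marker test described in this caption can be sketched as follows. This is a minimal illustration, not the authors' code: the allele counts are hypothetical, the family-wise error rate of 0.05 is an assumption, and the helper names are invented for the example. Only the 2x8 layout and the 8,311-marker count come from the caption.

```python
import math

def chi2_statistic(table):
    """Pearson chi-square statistic for an r x c table of counts."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    return sum((obs - rt * ct / n) ** 2 / (rt * ct / n)
               for r, rt in zip(table, row_tot)
               for obs, ct in zip(r, col_tot))

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df."""
    h = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-h)
    else:
        a, q = 0.5, math.erfc(math.sqrt(h))
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while a < df / 2.0 - 1e-9:
        q += h ** a * math.exp(-h) / math.gamma(a + 1.0)
        a += 1.0
    return q

N_MARKERS = 8311                      # from the caption
ALPHA = 0.05                          # assumed family-wise error rate
bonferroni_cutoff = ALPHA / N_MARKERS

# Hypothetical allele counts at one marker: rows are BY and 3S,
# columns are the wild type and 7 knockout populations.
marker = [
    [80, 100, 60, 110, 105, 90, 70, 20],
    [84, 110, 62, 105, 115, 87, 71, 142],
]
stat = chi2_statistic(marker)
p = chi2_sf(stat, df=(2 - 1) * (8 - 1))
biased = p < bonferroni_cutoff
```

In this made-up marker the last population carries far fewer BY alleles than the others, so the marker is flagged even after the Bonferroni adjustment.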
Hemizygote   Chromosome  Start Position  Stop Position  Range (bp)  Fixed Allele  Positions confirmed by Sanger Sequencing
ctk1∆/CTK1   11          0               182309         182309      BY            85326 - 85727, 118436 - 119014, and 178304 - 178885
esa1∆/ESA1   15          657980          663812         5832        3S            658744 - 659697
gcn5∆/GCN5   4           969630          985630         11205       3S            975757 - 976743
gcn5∆/GCN5   4           1123442         1408596        285154      3S            1140174 - 1140998, 1251735 - 1252459, and 1386674 - 1387285
gcn5∆/GCN5   4           1483478         1525043        41565       3S            1491421 - 1492191
gcn5∆/GCN5   12          519517          543736         24219       BY            529628 - 530346
hos3∆/HOS3   13          524221          526917         2696        3S            524221 - 524865
Table S2.5. Fixed regions in BY/3S hemizygous diploids. This summary table shows regions in the
genome where all segregants within a knockout segregant population carried the same parental haplotype
due to mitotic recombination in our parental diploids. Also listed are positions within the fixed regions that
were PCR-amplified and Sanger sequenced to confirm that these regions were fixed in the parental diploids.
Condition Stock solution Amount added per liter Final Concentration
4NQO 0.5 mg/ml 500 µL 0.25 µg/ml
Copper 500 mM 18 mL 9 mM
Ethanol 20% 100 mL 2% Ethanol
Fluconazole 10 mg/ml 5 mL 50 µg/ml
Glucose NA NA 2% Glucose
High temperature NA NA 2% Glucose/42°C
Hydrogen peroxide 30% 541 µL 260 µM
Neomycin 50 mg/ml 140 mL 7 mg/ml
Room temperature NA NA 2% Glucose/22°C
Zeocin 100 mg/ml 50 µL 5 µg/ml
Table S2.6. Phenotyping environments. This table provides information on the environments used for
phenotyping assays in our study. Drugs and chemicals were added to 2% glucose plates with the exception
of the ethanol condition, in which 2% ethanol was used as the carbon source instead of glucose.
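The final concentrations in Table S2.6 follow from the usual dilution relation C1·V1 = C2·V2. The short sketch below reproduces three of the rows. It assumes the final volume is exactly 1 liter (ignoring the small volume added), and the function name is invented for the example.

```python
def final_concentration(stock, volume_added_ml, final_volume_ml=1000.0):
    """C2 = C1 * V1 / V2, in the same units as the stock concentration."""
    return stock * volume_added_ml / final_volume_ml

# 4NQO: 0.5 mg/ml stock, 500 µL (0.5 mL) per liter.
nqo_ug_per_ml = final_concentration(0.5, 0.5) * 1000.0   # mg/ml -> µg/ml
# Fluconazole: 10 mg/ml stock, 5 mL per liter.
flu_ug_per_ml = final_concentration(10.0, 5.0) * 1000.0  # mg/ml -> µg/ml
# Copper: 500 mM stock, 18 mL per liter.
copper_mm = final_concentration(500.0, 18.0)
```

Under that assumption the three results agree with the tabulated 0.25 µg/ml, 50 µg/ml, and 9 mM.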
# of Mutation-independent effects # of Mutation-responsive effects
Two-locus Three-locus Two-locus Three-locus
α 0.05 17 19 295 670
FDR 0.10 17 2 542 2324
FDR 0.05 10 1 250 1311
FDR 0.01 4 0 79 389
Table S2.7. Number of mutation-independent and mutation-responsive genetic effects across
different significance thresholds. One-locus effects are excluded from this analysis as we only performed
scans for these effects at a commonly used significance threshold of α=0.05. Total numbers of all identified
genetic effects, including one-locus effects, can be found in Table S2.9.
                     α 0.05          FDR 0.01        FDR 0.05        FDR 0.10        α 0.05 filtered
No. interacting KOs  KO>WT   KO<WT   KO>WT   KO<WT   KO>WT   KO<WT   KO>WT   KO<WT   KO>WT   KO<WT
1                    383     2       303     3       893     7       1485    16      178     0
2                    174     20      92      8       274     48      478     92      70      8
3                    76      50      37      26      97      59      142     110     34      32
4                    9       51      3       21      15      69      35      133     3       21
5                    21      84      13      27      20      80      28      157     3       32
6                    12      120     6       36      6       72      11      181     3       57
7                    3       81      1       13      3       39      3       116     2       40
χ²                   709.37          343.92          981.52          1870.9          341.41
p-value              5.81E-150       3.12E-71        8.87E-209       < 2.2E-16 (0)   1.08E-70
(KO>WT: PVE KO > PVE WT; KO<WT: PVE KO < PVE WT)
Table S2.8. Chi-squared test results for mutation-responsive effects. This table shows p-values from
chi-squared tests used to test whether the ratio of mutation-responsive genetic effects with enhanced and
reduced phenotypic effects in mutants changes as a function of the number of knockouts with which a locus
interacts (Methods). Results are reported across significance thresholds, as well as at the initial α = 0.05
threshold after filtering out effects involving loci with biased individual or multi-locus allele frequencies.
Test statistics are also reported (the χ² row).
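As a worked check, the α 0.05 test statistic can be recomputed from the counts in Table S2.8. The sketch below treats those counts as a 7x2 contingency table and applies a standard Pearson chi-square test. That the original analysis used exactly this form is an assumption, although the result agrees with the reported 709.37 and 5.81E-150.

```python
import math

# Counts from the "α 0.05" columns of Table S2.8, one row per number of
# interacting knockouts (1 to 7): (PVE KO > PVE WT, PVE KO < PVE WT).
counts = [(383, 2), (174, 20), (76, 50), (9, 51), (21, 84), (12, 120), (3, 81)]

row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
n = sum(row_tot)

stat = 0.0
for row, rt in zip(counts, row_tot):
    for obs, ct in zip(row, col_tot):
        expected = rt * ct / n
        stat += (obs - expected) ** 2 / expected

# For even degrees of freedom the chi-square upper tail has a closed form.
df = (len(counts) - 1) * (2 - 1)          # 6
h = stat / 2.0
p = math.exp(-h) * sum(h ** k / math.factorial(k) for k in range(df // 2))
```

The same loop applies to the other threshold columns once their counts are substituted for `counts`.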
Condition α 0.05 FDR 0.01 FDR 0.05 FDR 0.10 α 0.05 filtered
4-NQO 0.070 0.167 0.154 0.118 0.000
Copper 0.104 0.148 0.118 0.084 0.123
Fluconazole 0.270 0.846 0.591 0.286 0.053
Hydrogen peroxide 0.136 0.183 0.109 0.100 0.115
High temperature 0.189 0.400 0.400 0.147 0.179
Neomycin 0.262 0.391 0.204 0.175 0.175
Room temperature 0.364 0.500 0.500 0.367 0.000
Glucose 0.649 0.769 0.645 0.635 0.586
Ethanol 0.437 0.443 0.331 0.293 0.405
Zeocin 0.500 0.875 0.875 0.875 0.500
Average 0.298 0.472 0.393 0.308 0.214
Table S2.9. Proportion of mutation-responsive genetic effects that showed a phenotypic effect in any
environment outside the one in which they were originally detected. Results are reported across
significance thresholds, as well as at the initial α = 0.05 threshold after filtering out effects involving loci
with biased individual or multi-locus allele frequencies.
                                       1-locus interaction  2-locus interaction  3-locus interaction
Mutation-responsive without filtering  45                   143                  406
Mutation-responsive with filtering     14                   73                   181
Mutation-independent                   89                   17                   19
Table S2.10. Number of mutation-responsive effects that show biased individual and multi-locus
allele frequencies. The numbers of two-locus and three-locus mutation-responsive effects that show biased
individual and multi-locus allele frequencies were determined using chi-square tests. Specifically, 2x8, 4x8,
and 8x8 contingency tables were generated and chi-square tests were run for all one-, two-, and three-locus
interactions, respectively, counting the number of each allele combination in the wild type and 7 knockout
populations. Bonferroni correction was used for multiple testing.
4NQO Copper Flu H2O2 HT Neo RT Glu Eth Zeo
ctk1.Enh 1 1 2 2 0 2 1 3 10 1
ctk1.Red 1 20 2 19 11 5 2 5 5 0
ctk1.Diff 0 -19 0 -17 -11 -3 -1 -2 5 1
esa1.Enh 0 2 0 2 1 0 0 3 5 2
esa1.Red 1 20 3 14 11 5 1 2 1 0
esa1.Diff -1 -18 -3 -12 -10 -5 -1 1 4 2
gcn5.Enh 7 3 4 3 4 12 1 1 2 0
gcn5.Red 0 18 3 19 10 6 0 4 4 1
gcn5.Diff 7 -15 1 -16 -6 6 1 -3 -2 -1
hos3.Enh 17 5 15 55 15 70 4 26 296 6
hos3.Red 2 15 3 3 5 2 0 3 1 0
hos3.Diff 15 -10 12 52 10 68 4 23 295 6
htb1.Enh 7 0 19 5 5 9 0 2 4 1
htb1.Red 0 21 2 22 9 5 1 3 2 1
htb1.Diff 7 -21 17 -17 -4 4 -1 -1 2 0
ino80.Enh 0 0 2 7 0 2 1 0 9 2
ino80.Red 0 20 4 13 10 6 0 2 1 1
ino80.Diff 0 -20 -2 -6 -10 -4 1 -2 8 1
rpd3.Enh 6 4 0 2 2 2 0 0 3 0
rpd3.Red 1 15 4 18 12 4 0 3 0 1
rpd3.Diff 5 -11 -4 -16 -10 -2 0 -3 3 -1
WT.Mean 0.918 0.436 0.875 0.896 0.896 1.374 1.053 1.016 1.020 0.973
ctk1.Mean 0.425 0.372 0.113 0.727 0.054 0.407 0.206 0.633 0.504 0.522
esa1.Mean 0.028 0.301 0.066 0.089 0.024 0.161 0.076 0.388 0.144 0.177
gcn5.Mean 0.579 0.537 0.154 0.521 0.260 1.141 0.804 0.897 0.824 0.733
hos3.Mean 0.729 0.356 0.954 0.939 0.595 1.961 0.996 0.887 0.793 0.807
htb1.Mean 0.276 0.495 0.476 0.843 0.414 1.370 0.121 0.721 0.560 0.644
ino80.Mean 0.075 0.439 0.300 0.607 0.014 1.092 0.341 0.558 0.330 0.447
rpd3.Mean 0.387 0.492 0.189 0.721 0.166 1.456 0.931 0.874 0.694 0.827
WT.Var 0.046 0.077 0.083 0.074 0.050 0.183 0.012 0.012 0.022 0.014
ctk1.Var 0.065 0.024 0.039 0.028 0.007 0.080 0.018 0.012 0.043 0.020
esa1.Var 0.015 0.029 0.023 0.029 0.013 0.043 0.016 0.014 0.026 0.028
gcn5.Var 0.121 0.059 0.060 0.031 0.044 0.233 0.013 0.013 0.033 0.011
hos3.Var 0.166 0.083 0.140 0.136 0.060 0.459 0.042 0.032 0.166 0.036
htb1.Var 0.114 0.028 0.137 0.042 0.047 0.183 0.026 0.012 0.029 0.022
ino80.Var 0.018 0.015 0.039 0.036 0.001 0.068 0.024 0.010 0.043 0.017
rpd3.Var 0.100 0.052 0.035 0.032 0.043 0.146 0.018 0.009 0.030 0.012
Table S2.11. Phenotypic variance and the number of identified genetic effects. This is a summary table
showing the phenotypic mean and variance in the ctk1Δ, esa1Δ, gcn5Δ, hos3Δ, htb1Δ, ino80Δ, rpd3Δ, and
wild type populations across 10 environments. In addition, the numbers of genetic effects with reduced
(‘Red’) and enhanced (‘Enh’) effects in the knockout background in which the genetic effect was identified
relative to wild type background are shown. The ‘Diff’ rows give the number of enhanced effects minus the
number of reduced effects.
Primer Name Primer Sequence
ASF1 Diagnostic F GTGCCACACCTAACCTTCGA
ASF1 Diagnostic R CCGCATCCTTTGGAGTGGAT
ASF1 MX F CTCGAAAGTGTAACAGCGTACTCTCCCTACCATCCAATTGAAACATAA
GATATAGAAAAGCCTTGACAGTCTTGACGTGC
ASF1 MX R ATACATTTTATAAAGTGTACCTCTCTTGCAGGTACCATTAATCTTATAA
CCCATAAATTCCGCACTTAACTTCGCATCTG
CHD1 Diagnostic F CCTCAGCCCATCAATGCGTA
CHD1 Diagnostic R GTATCCAACCAGGCAGGCAT
CHD1 MX F ATTTCTTTAAACCTATACCCAATTCAAAGCAGAACCTTTTCTAATTTAA
TTCTCACTTAT CCTTGACAGTCTTGACGTGC
CHD1 MX R AATACGTTTATAGTTATGGGGGGAAGGAACAATGGAAAATGTGGTGA
AGAAAAATTGTTT CGCACTTAACTTCGCATCTG
CTK1 Diagnostic F CAACTCTTCGACAACTGCGC
CTK1 Diagnostic R TGCTCCTTTGCTGCGTTAGA
CTK1 MX F AGCACTATTCTTTGCACTAGAATAACACAGGGACCATACAGCATAAAT
TATTTGGTAACACCTTGACAGTCTTGACGTGC
CTK1 MX R GTAATAAATAAGTTATTAATCTATTTTTTGTGTCTACTTATTTCAATTG
GCTATATATCCCGCACTTAACTTCGCATCTG
CYC8 Diagnostic F TCCACCGTAGAACCCAAAGC
CYC8 Diagnostic R TAATTGGCGCACAGGAACCT
CYC8 MX F GCTACACAACATTTCTCGTTGATTATAAATTAGTAGATTAATTTTTTGA
ATGCAAACTTTCCTTGACAGTCTTGACGTGC
CYC8 MX R TACAACTACAACAGCAACAACAACAAACAAAACACGACTGGAAAAAA
AAAATTAGGAAAACGCACTTAACTTCGCATCTG
DOT1 Diagnostic F TGGATGAAAGAGCTCTGGCA
DOT1 Diagnostic R AGACCAGGCAGCTGTATTCA
DOT1 MX F AATGGGCGGTCAAGAAAGTATATCAAATAATAACTCAGACTCATTCAT
TATGTCGTCCCCCCTTGACAGTCTTGACGTGC
DOT1 MX R GTGTACATGTTATTTCTACTTAGTTATTCATACTCATCGTTAAAAGCCG
TTCAAAGTGCCCGCACTTAACTTCGCATCTG
EAF3 Diagnostic F TTGCTGCGTCAGAGGGTTAA
EAF3 Diagnostic R TTGCTGCGTCAGAGGGTTAA
EAF3 MX F ATTGCATAAATACGGAAGAACTAAATACTAGAAATAATCCCAAGCTA
GAATATAAACGTCCCTTGACAGTCTTGACGTGC
EAF3 MX R CATTTTGCATTAGCATCTGTGAGGCCTCGTCACTGGATTTACCCTATTG
AAGAACGTATACGCACTTAACTTCGCATCTG
ELP3 Diagnostic F ATCCAAATGACTTCTTATTT
ELP3 Diagnostic R TATACAGCGATAAGACAGTG
ELP3 MX F ATTTAAATTTCTGCTTGGAAAACCGGCCATGTCGGCGGCACATAAAAG
TTCTATTTACCT CCTTGACAGTCTTGACGTGC
ELP3 MX R GTTTTGAAATAAACAAGTCCTAAAAGCACCTAAGGAAAATCGAAGAA
CACCCTGACAAAG CGCACTTAACTTCGCATCTG
ESA1 Diagnostic F GGCCTCTAAATCCTGGCAAT
ESA1 Diagnostic R AGTGCCTGTGTTTGCATTTG
ESA1 MX F TTACCATTCTTTAGACGCTTCCTGTGCTACCATTCTCGGAAATACTGCA
AGAAATCATCG CCTTGACAGTCTTGACGTGC
ESA1 MX R TATTTAAAGCTTTTACATTAGAAGTTGTTTGAATGTAAGTTTAGGAAAG
CACTACATAGC CGCACTTAACTTCGCATCTG
GCN5 Diagnostic F GCACACAAGATGACGCTTCC
GCN5 Diagnostic R CTCGCCATTGTACATTCGGC
GCN5 MX F AAGGGAAGACCGTGAGCCGCCCAAAAGTCTTCAGTTAACTCAGGTTCG
TATTCTACATTA CCTTGACAGTCTTGACGTGC
GCN5 MX R CGTACTAAACATTTATTTCTTCTTCGAAAGGAATAGTAGCGGAAAAGC
TTCTTCTACGCA CGCACTTAACTTCGCATCTG
GIS1 Diagnostic F TGGCGTTTTGTGCTTCAAGG
GIS1 Diagnostic R TGTTTGCGGACGGTATGGAA
GIS1 MX F AACAACATCGTTGTAATTTTTTTTTTTTAATTTGAAGAATAGCTACAAA
AACAGACTACACCTTGACAGTCTTGACGTGC
GIS1 MX R TACAGGAAAATATTCGATAAAAATTTTTTTTGAACCCATTTTGTATATC
ATTTTCTTGACCGCACTTAACTTCGCATCTG
HAT1 Diagnostic F CAAACTGTTTCATGTTAGTG
HAT1 Diagnostic R TAAATTCTTGATCAAATACG
HAT1 MX F TTCTGGAATTGTTTTCAGCAAAATTATGCTTAAGCTATAACTATAGTGA
GAATCAAGAAT CCTTGACAGTCTTGACGTGC
HAT1 MX R GAATTTCTTATTTCAGGCTTGTTAAACAAATAAATATGTTATTATATAT
TTAATAAACAG CGCACTTAACTTCGCATCTG
HAT2 Diagnostic F CTCTTTAACGGCGCCTCAAG
HAT2 Diagnostic R CACGATGGGTAGACTATGGGA
HAT2 MX F TATCTCTCCTATCAATTGTGGTTAGCCTAGTAGTCACCAAATAGCAAAT
TACCAATCAAC CCTTGACAGTCTTGACGTGC
HAT2 MX R TGTATGTTGATCTTTGTTTAATTACGCCTTTTCGCCAAAGAAACAATAA
AAAAACTATAT CGCACTTAACTTCGCATCTG
HDA1 Diagnostic F TTATATTTCCAACACGAATCGAGAACT
HDA1 Diagnostic F TTATATTTCCAACACGAATCGAGAACT
HDA1 MX F ACATAACAAAATATTGAGAAAGGGAAAGTTGAGCACTGTAATACGCC
GAACAGATTAAGCCCTTGACAGTCTTGACGTGC
HDA1 MX R ATTCAACTTTCATAAGGCATGAAGGTTGCCGAAAAAAAATTATTAATG
GCCAGTTTTTCCCGCACTTAACTTCGCATCTG
HHF1 Diagnostic F CAAATTATTCCATCATTAAA
HHF1 Diagnostic R AGTCAAGGAGAGATATTACG
HHF1 MX F CAGTTGAATACGAATCCCAAATATTTGCTTGTTGTTACCGTTTTCTTAG
AATTAGCTAAA CCTTGACAGTCTTGACGTGC
HHF1 MX R TGTACTCTATAGTACTAAAGCAACAAACAAAAACAAGCAACAAATAT
AATATAGTAAAAT CGCACTTAACTTCGCATCTG
HHF2 Diagnostic F AAAAGAACAAGAAAAAGATT
HHF2 Diagnostic R CAACAGATAAATGATGACAA
HHF2 MX F TCTTTTTTCCTACATCTTGTTCAAAAGAGTAGCAAAAACAACAATCAAT
ACAATAAAATA CCTTGACAGTCTTGACGTGC
HHF2 MX R TTTTATTTTTTGAAAGGCATGAAAATAATTTCAAACACCGATTGTTTAA
CCACCGATTGT CGCACTTAACTTCGCATCTG
HHO1 Diagnostic F CCACGTCGTGAACAGACAGT
HHO1 Diagnostic R GCTGTTTGCTTTGATGAAATGCT
HHO1 MX F AAGAAAATAGGTTTGATAGTATTGCTATCACCATTGACATTCTCGTTTG
GATATTCACTT CCTTGACAGTCTTGACGTGC
HHO1 MX R TTATGGGCACCTGATAATGCTTGGCAGCGAGGGAAGCAATTATAATAC
AACTAAAGCAAC CGCACTTAACTTCGCATCTG
HHT1 Diagnostic F AAGCGCTCGGAACAGTTTTA
HHT1 Diagnostic R GACACCCCAACCTACTCCAA
HHT1 MX F TATATTCTTTCTTTCTAGTTAATAAGAAAAACATCTAACATAAATATAT
AAACGCAAACA CCTTGACAGTCTTGACGTGC
HHT1 MX R TGATTTATATTTTATTGTGTTTTTGTTCGTTTTTTACTAAAACTGATGAC
AATCAACAAA CGCACTTAACTTCGCATCTG
HHT2 Diagnostic F TGATTGGTTGTATAAGAAAA
HHT2 Diagnostic R TAAGTAACAGAGTCCCTGAT
HHT2 MX F TGTTTGTATGATGTCCCCCCAGTCTAAATGCATAGAAAAAAAAAAATT
CCCGCTTTATAT CCTTGACAGTCTTGACGTGC
HHT2 MX R ACTTTGGCCCTTCCAACTGTTCTTCCCCTTTTACTAAAGGATCCAAGCA
AACACTCCACA CGCACTTAACTTCGCATCTG
HIR1 Diagnostic F CGCACTTACTCGATCCTGCT
HIR1 Diagnostic R AGCGGGATCAAAAACAACGC
HIR1 MX F ATGAAAGTGGTAAAGTTTCCATGGTTGGCTCACCGTGAAGAATCGCGA
AAATATGAAATACCTTGACAGTCTTGACGTGC
HIR1 MX R ATAAAATATAGACGTAATTATGAGGGAAAAAACTTGTCCAAAGGAAG
GGGTATAAGCTTACGCACTTAACTTCGCATCTG
HMT1 Diagnostic F TTGCTGCAATTCGGATGCTG
HMT1 Diagnostic R GACAGCCGTGAAAGATTCTGC
HMT1 MX F TATGAACAAGTTTGTTTATTTGCTTTTCAAATTTTTTTCTTTCTCCAGCA
AACAAAAGTC CCTTGACAGTCTTGACGTGC
HMT1 MX R CTGCTCACCTTGCCGTTTCCAAAAAAGAGTTAGAACCGACAAATTCAT
CCAAAGAAAATA CGCACTTAACTTCGCATCTG
HOS1 Diagnostic F AGCCGTTGATCACACCGAAA
HOS1 Diagnostic R AAACAGAGGGCAAGCGGATT
HOS1 MX F TTACAGTTCGTAAAACTTCATAAGTTCGACCATATCTCTATCCTTATTG
TCATTCCTTAACCTTGACAGTCTTGACGTGC
HOS1 MX R AAAAAGGTGTATGTACTGTAATATGAATTAATAAACACCTGTCCATTT
TAGAAAAACGCTCGCACTTAACTTCGCATCTG
HOS2 Diagnostic F TCGAATACGGAGTGCACCTT
HOS2 Diagnostic R TCTCGATGTTCTTTGGGGCA
HOS2 MX F GAAAATAAAAAAAAAAAAAAAAAAAAAAACGGGAGATTAACCGAAT
AGCAAACTCTTAAA CCTTGACAGTCTTGACGTGC
HOS2 MX R AAGACGCCAGATTACTCAAGTACGTTAAAATCAGGTATCAAGTGAATA
ACAACACGCAAC CGCACTTAACTTCGCATCTG
HOS3 Diagnostic F AGGGAAAAGAATCGGCTGGG
HOS3 Diagnostic R GTCCCATCACCGTGGTGTAG
HOS3 MX F ACTGAAAATATAACGAAAAAAAGGGCTCTGGAAGTAAACAGAGAAAT
TCGACGATATAATCCTTGACAGTCTTGACGTGC
HOS3 MX R TCACCATCTTCCACCACTTCTTGTTGTATGTTTTCTTGAAACATGAGAA
ATCATTGATATCGCACTTAACTTCGCATCTG
HST1 Diagnostic F TGGAATCGTTGCTGGGTCAG
HST1 Diagnostic R GCCAGTGGAAGTCAACTCGA
HST1 MX F TTACTGTTGTTTCTTTCGTGGCTGTTTCTTAATCTTATACGTACCTTTAT
CTATTTCCGT CCTTGACAGTCTTGACGTGC
HST1 MX R TGGTAGTGATACGAACACTTCTCTTCTTTTTTGTTGTTTTTGTGAGAAA
AAAAAATCTAA CGCACTTAACTTCGCATCTG
HTA1 Diagnostic F GGTTTCTTTTCAGCTGGGGC
HTA1 Diagnostic R TCAACATGTCACCAGTGGCC
HTA1 MX F TTATTTCTCAGTGAATAAACAACTTCAAAACAAACAAATTTCATACAT
ATAAAATATAAA CCTTGACAGTCTTGACGTGC
HTA1 MX R TAGTTACAATGGAGAAGCAGTTTAGTTCCTTCCGCCTTCTTTAAAATAC
CAGAACCGATC CGCACTTAACTTCGCATCTG
HTA2 Diagnostic F GCGCCTTCTATTCCGGAGAA
HTA2 Diagnostic R TGACGGCAAGTGTCTCACTG
HTA2 MX F TACTTTAAAACCCCAAATGACAAGAATGTTTGATTTGCTTTGTTTCTTT
TCAACTCAGTT CCTTGACAGTCTTGACGTGC
HTA2 MX R ATTCTTGTCTTTTTACATAAGAATTAGGAAAGTACAGAACAAGAGCAA
ATTTAATATATA CGCACTTAACTTCGCATCTG
HTB1 Diagnostic F ACGATCCAGTCAGCGACATC
HTB1 Diagnostic R AGCCGAAAAGAAACCAGCCT
HTB1 MX F TTAATTTTTATATACCCATATAAATAATAATATTAATTATAACCAAAGG
AAGTGATTTCACCTTGACAGTCTTGACGTGC
HTB1 MX R TTATATTAAATTTATCCTATATAGACAAGTCAAACCACAAATAAACCA
TACACACATACACGCACTTAACTTCGCATCTG
HTB2 Diagnostic F ATGGCCCCCAGGTTAATGTG
HTB2 Diagnostic R ACAGCCCTAGTACCTTCGGA
HTB2 MX F TCTTCTTGTTAATTTTTTCTGATTGCTCTATACTCAAACCAACAACAAC
TTACTCTACAACCTTGACAGTCTTGACGTGC
HTB2 MX R TATAAAAAATGCCACTAATAAAAAGAAAACATGACTAAATCACAATA
CCTAGTGAGTGACCGCACTTAACTTCGCATCTG
HTZ1 Diagnostic F CGTTCGTGGAACAGTGAGGA
HTZ1 Diagnostic R GCAACGCACAAAGCTTCGTA
HTZ1 MX F ATAGAATGAGGATACAGGAGCAGGGAGAATTACGGGAAATGGGAAA
GAAAAACTATTCTTCCTTGACAGTCTTGACGTGC
HTZ1 MX R GAAAAAATATCGTTAAATTCAATTTCGCACTATAGCCGCACGTAAAAA
TAACTTAACATACGCACTTAACTTCGCATCTG
INO80 Diagnostic F AGCTTTGGAAAACGGCGAAA
INO80 Diagnostic R ATACTGGATGAGGCCCAAGC
INO80 MX F AGCAGATTAAAGATAGACATTAACTCCGCTTAATGTAAATAACACAAT
ATGAATACCTTTCCTTGACAGTCTTGACGTGC
INO80 MX R ACCGATCCTGTCCATATTAGCAAAGCAAGGCTTAAGACATATAGAAGA
GCATTTATAGACCGCACTTAACTTCGCATCTG
ISW1 Diagnostic F CGGCCGGCCATATCTAGAC
ISW1 Diagnostic R CAACCCCACCAAACGTGAAC
ISW1 MX F TTTTCTTCAGAAGCATGGTGTAGGATATATTAAAAAAAATCGAAATAT
AAAAAAAGAAGG CCTTGACAGTCTTGACGTGC
ISW1 MX R AGAGTCCATATGTATAGCTATGCAAAAACCAGCTAGAGGTGGATGTAG
AAATACCCTATT CGCACTTAACTTCGCATCTG
JHD1 Diagnostic F CCTGTGGTGGCATTCATCTTG
JHD1 Diagnostic R1 GTGATTGAACGTCCATTACATCA
JHD1 MX F ATTATAATGAGTAAGAAGACGTAATGATCATAAAACAAAATACTAAT
AAGCTATGGTGCACCTTGACAGTCTTGACGTGC
JHD1 MX R TTACGAAATAAGATCGGCTAAAGCTTTCGCTAAATGCTTCTTTGAAGT
GAAATTTAACGGCGCACTTAACTTCGCATCTG
JHD2 Diagnostic F GTGGAGTTCAAGGCTTGAGC
JHD2 Diagnostic R TGTGCATGTTGACAACGCAG
JHD2 MX F GGCGTGCTAAATGCCAAGTATTATTCTAAAAAATCATTACGCCATACA
CAAATATTGAAGCCTTGACAGTCTTGACGTGC
JHD2 MX R CAATTTTACCTCTAGATCATATTAACTAATCTCATCTTGCACAAAAAAC
GTATCACTATCCGCACTTAACTTCGCATCTG
KanMX check F CAGATGCGAAGTTAAGTGCG
KanMX check R GCACGTCAAGACTGTCAAGG
NAT4 Diagnostic F GTATGCCTGGAGTATGGCAGT
NAT4 Diagnostic R AAGCGCCTCATATAGTCGCC
NAT4 MX F CGCGCCATTAAAAGATTTTTTTCTTAGCTCTTTTTCTTTTTTCTTTTTCTT
TCCACTGAG CCTTGACAGTCTTGACGTGC
NAT4 MX R GCCGCGTACGTAGTTTTACTTTTAATTTTTTTTTTATCGCGCGTTGTCCC
TGTCGGCTTT CGCACTTAACTTCGCATCTG
RLF2 Diagnostic F GCTGGATCAAGTGGTTCCCT
RLF2 Diagnostic R GCACGCCTTTGTCTTTCCTC
RLF2 MX F CAGAGAATTATATGTTTTAGTGAACCTCAAGACAGAAGAGAATCGAA
AGGAAAAGGGAAACCTTGACAGTCTTGACGTGC
RLF2 MX R TGTATACCAATAAATAATCAGTTTATCTGTATGTTTCTATATACTAAAG
ATCCGTTCAAGCGCACTTAACTTCGCATCTG
RPD3 Diagnostic F ACTGGTTTTGTACAGCGCTG
RPD3 Diagnostic R ACTGGTTTTGTACAGCGCTG
RPD3 MX F TTCACTTTTCTTCTTTTGTTTCACATTATTTATATTCGTATATACTTCCA
ACTCTTTTTTCCTTGACAGTCTTGACGTGC
RPD3 MX R GGTTCATAAAACAATTGCGCCATACAAAACATTCGTGGCTACAACTCG
ATATCCGTGCAGCGCACTTAACTTCGCATCTG
RPH1 Diagnostic F CATCGCCATGCAATTAATCA
RPH1 Diagnostic R ACCACATGGCAGATGGTTTT
RPH1 MX F AAAAAAAAGGGAAAAAAAGAATAAGACTGTCTTGGTGAGGATATTCA
GTTGCGTGAAATC CCTTGACAGTCTTGACGTGC
RPH1 MX R CGAGCACATTTTAAGAGCCTTCAAAATGAGAGATCTCGGTAAACAACT
GGCAATCGTGAG CGCACTTAACTTCGCATCTG
RSC4 Diagnostic F CGGAAGACAAGTAGCCCAAGA
RSC4 Diagnostic R CGCCGCTGTAAAACTATCGC
RSC4 MX F ACCCTCAGCCTGTTATTACAATAGAGAGCAAACCAAAGAATAATAGAT
AAAGTACACAGACCTTGACAGTCTTGACGTGC
RSC4 MX R ATATAGGTTGTATATAGATACATGCATATGATGGGAAGACTATGAAGA
GAGAGATAGTCACGCACTTAACTTCGCATCTG
RTT109 Diagnostic F TCGAGGTGTTTCGTCATCGG
RTT109 Diagnostic R GATGTTGCTTGCAGGAACCG
RTT109 MX F TTTGTCAATAGAGTTGTCCAGTAGAGTTAAAAGGTCAATTCAACCGGT
CTTCAATAAGAC CCTTGACAGTCTTGACGTGC
RTT109 MX R CATGCATTTTCTAAGATCGATGCTACATACGTGTACTAAATAATAAAT
ATCAATATGTAT CGCACTTAACTTCGCATCTG
SAS2 Diagnostic F CATCGAAAAACCGGCCCAAA
SAS2 Diagnostic R TACACGGATGATCAGACGCG
SAS2 MX F CATCGAGCGATATTCTATCCTGAAATACATATGCCATTAAGTTACATCC
TGAATAGATTCCCTTGACAGTCTTGACGTGC
SAS2 MX R ATTTTTTGATATTGGAGGCTCCTATTTTCTAGTTGCTTTTTGTTTTCACT
CGCAAAAAAACGCACTTAACTTCGCATCTG
SAS3 Diagnostic F ACCATTTGTCGCCGCTAGAA
SAS3 Diagnostic R TCGCCATAAAACACGCTCAT
SAS3 MX F CTTATTGCTATTAATAATGTTACATGTATATGCTTATATCCAATATATA
CCCATCGCCGCCCTTGACAGTCTTGACGTGC
SAS3 MX R AAAAGCATTGCTATTCTTTTCTCATAGGTGTTATTCATACCGCCCTCTC
TCTTCTTCCTTCGCACTTAACTTCGCATCTG
SET1 Diagnostic F AAATATCCGCTCGACCAGTCC
SET1 Diagnostic F AAATATCCGCTCGACCAGTCC
SET1 MX F CTAGCATAGGTAACATTCCTTATTTGTTGAATCTTTATAAGAGGTCTCT
GCGTTTAGAGACCTTGACAGTCTTGACGTGC
SET1 MX R TTTGCTGGAAAGCAACGATATGTTAAATCAGGAAGCTCCAAACAAATC
AATGTATCATCGCGCACTTAACTTCGCATCTG
SET2 Diagnostic F ACTAGTCAACGACGCTGACC
SET2 Diagnostic R GCCTGGCGCTTTAGACTCTA
SET2 MX F ACAAGACTTCCTTTGGGACAGAAAACGTGAAACAAGCCCCAAATATG
CATGTCTGGTTAACCTTGACAGTCTTGACGTGC
SET2 MX R AAAACTGCATAGTCGTGCTGTCAAACCTTTCTCCTTTCCTGGTTGTTGT
TTTACGTGATCCGCACTTAACTTCGCATCTG
SET3 Diagnostic F GCACTAGGCACCAGTGTTGA
SET3 Diagnostic R TCCTTAATCGAAATAATGGTCCAAAA
SET3 MX F TGAATATTCACTTTTGAATATACTTAAGTTTATATAGGTGTAAGAAGG
AAATGTCCATGTCCTTGACAGTCTTGACGTGC
SET3 MX R GATTTAAAGCGTATATACAACAGTTTTAGATCGTACTTCACAAAATAC
GAGAACTGAATCCGCACTTAACTTCGCATCTG
SET4 Diagnostic F ACATGTAGAGGTCACCGGGA
SET4 Diagnostic R GGAGTCACTTGAACCGCAGA
SET4 MX F AACGCCGGAATAAGATTGGTACCCTCGTCAGAAAGTTACAAATACCGC
TTCATCTTCAAACCTTGACAGTCTTGACGTGC
SET4 MX R ATGATAAAATTAAGCTTTCAAAAAAGATTAAAATGAATACTATTAATT
TTAAAATTTCGTCGCACTTAACTTCGCATCTG
SIR2 Diagnostic F CTTAACACATTTAAACCATG
SIR2 Diagnostic R AGCTCTAATTTGAAAGAAAT
SIR2 MX F TTGCCATACTATGTAAATTGATATTAATTTGGCACTTTTAAATTATTAA
ATTGCCTTCTA CCTTGACAGTCTTGACGTGC
SIR2 MX R AGGCATCGCTTCGGTAGACACATTCAAACCATTTTTCCCTCATCGGCA
CATTAAAGCTGG CGCACTTAACTTCGCATCTG
SNF1 Diagnostic F GCAAAAGGATGGGCGTGATG
SNF1 Diagnostic F2 GCAAAAGGATGGGCGTGATG
SNF1 MX F GAAAGAAATAGAAGTTTTTTTTTGTAACAAGTTTTGCTACACTCCCTTA
ATAAAGTCAACCCTTGACAGTCTTGACGTGC
SNF1 MX R AAATACGTTACGATACATAAAAAAAAGGGAACTTCCATATCATTCTTT
TACGTTCCACCACGCACTTAACTTCGCATCTG
SNF2 Diagnostic F TGTGTTGCTAGCAGGGTGTT
SNF2 Diagnostic R CTTGATTATGTCCGCACGCC
SNF2 MX F GAGGGATTAATGTTTGTCTACGTATAAACGAATAAGTACTTATATTGC
TTTAGGAAGGTACCTTGACAGTCTTGACGTGC
SNF2 MX R ATGAACATACCACAGCGTCAATTTAGCAACGAAGAGGTCAACCGCTG
CTATTTAAGATGGCGCACTTAACTTCGCATCTG
SNF5 Diagnostic F CAGGTGCTTGAAGGAGGGAG
SNF5 Diagnostic R ACTGTTGTTGCTGTTGTCGC
SNF5 MX F AAACACCAAAACAAAGCATCATCAAGGGAACATATAGTAAAGAACTA
CACAAAAGCAACACCTTGACAGTCTTGACGTGC
SNF5 MX R TACAAATTCTTCCACGGTTATTTACATCTCCGGTATATTTTATATATGT
GTATATATTTTCGCACTTAACTTCGCATCTG
SWR1 Diagnostic F ACCCTTGTCTTTTGCAGCCT
SWR1 Diagnostic R TGAACTGCGAACCTGGCTAG
SWR1 MX F TGAAAATTTATGAATTCTAACTGCTCTTTGCATTTTCCAAGTTATTGCA
TTACAAGAATACCTTGACAGTCTTGACGTGC
SWR1 MX R TCAATAATAATAACCGTTGGCAATAAACCTGATCATGTACTCGTCAAC
ATGGGCAGTGCCCGCACTTAACTTCGCATCTG
TUP1 Diagnostic F AAGCTCTCCCGTCAAAGCAA
TUP1 Diagnostic R AGGAAAAGGAGGGGAAGGGA
TUP1 MX F TTTTTTTGTCTTTTTTGATAAGCAGGGGAAGAAAGAAATCAGCTTTCCA
TCCAAACCAATCCTTGACAGTCTTGACGTGC
TUP1 MX R ATGAATTGAAGAATAGTTTAGTTAGTTACATTTGTAAAGTGTTCCTTTT
GTGTTCTGTTCCGCACTTAACTTCGCATCTG
Ctk1_ChrXI_check1_F CCGCATTTACTGCACATCC
Ctk1_ChrXI_check1_R GAAATTATACGCGGCAAGGA
Ctk1_ChrXI_check2_F GCACCACACTAGCCTTCGAT
Ctk1_ChrXI_check2_R CCAAAAGCAATCCAGGAAAA
Ctk1_ChrXI_check3_F TTGTGGATTTCGGAGAAAGG
Ctk1_ChrXI_check3_R AGGGAACACTCCGTCTGAAA
Ctk1_ChrXI_check_control1_F GCACCCGAAAAATTAACTTGA
Ctk1_ChrXI_check_control1_R TGGGATCAGAACAATCATTACAA
Gcn5_ChrIV_check1_F GAACTGGCTCTCCCACAGTC
Gcn5_ChrIV_check1_R GGACGAATCAACGGAGGGTT
Gcn5_ChrIV_check2.1_F TCCGAATTGCCCGAAAATGC
Gcn5_ChrIV_check2.1_R GACACCTTACGTTCTCGCCT
Gcn5_ChrIV_check2.2_F AGGGTAATTCATCCTTCCACCA
Gcn5_ChrIV_check2.2_R TCGAAAGCAAACTTGTGAAGGC
Gcn5_ChrIV_check2.3_F TGCTTTTTCCCATCCCTGCA
Gcn5_ChrIV_check2.3_R TCGCCGTTGTTCAAGACCTT
Gcn5_ChrIV_check3_F CCCAAACTGATTGCCAAGCT
Gcn5_ChrIV_check3_R TGGCACTAGCTGGTAGTAGC
Gcn5_ChrXII_check1_F TCGGCCCAAACACCAGATTT
Gcn5_ChrXII_check1_R CCCCTTTTCCTTCCTCGTCC
Gcn5_ChrIV_check_control_F CATGGTCCAAGGTTGCAAGT
Gcn5_ChrIV_check_control_R CCCGCCATCAATCATTGTGC
Gcn5_ChrXII_check_control_F GTGCCATTCAAGTGCCCAATT
Gcn5_ChrXII_check_control_R ACTTTCCCATCCGCATAAAGA
Table S2.12. List of all primers used in this study.
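The “MX F” and “MX R” primers in Table S2.12 appear to share common 3' ends, and the two “KanMX check” primers are the reverse complements of those shared ends. The sketch below checks this relationship for sequences copied from the table and splits one primer into its gene-specific part and shared tail. The function and variable names are hypothetical, and interpreting the tail as the cassette-priming region is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences copied from Table S2.12.
MX_F_TAIL = "CCTTGACAGTCTTGACGTGC"   # shared 3' end of the "MX F" primers
MX_R_TAIL = "CGCACTTAACTTCGCATCTG"   # shared 3' end of the "MX R" primers
KANMX_CHECK_F = "CAGATGCGAAGTTAAGTGCG"
KANMX_CHECK_R = "GCACGTCAAGACTGTCAAGG"
ASF1_MX_F = ("CTCGAAAGTGTAACAGCGTACTCTCCCTACCATCCAATTGAAACATAA"
             "GATATAGAAAAGCCTTGACAGTCTTGACGTGC")

def split_mx_primer(primer, tail):
    """Return (gene-specific 5' arm, shared 3' tail); raise if the tail is absent."""
    if not primer.endswith(tail):
        raise ValueError("primer does not end with the expected tail")
    return primer[: -len(tail)], tail

arm, tail = split_mx_primer(ASF1_MX_F, MX_F_TAIL)
check_f_matches = reverse_complement(MX_R_TAIL) == KANMX_CHECK_F
check_r_matches = reverse_complement(MX_F_TAIL) == KANMX_CHECK_R
```

The same split can be used to rejoin primers whose sequences are wrapped across lines or contain a stray space before the shared tail.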
Note S2.1. Conditional essentiality of esa1Δ segregants. Esa1, the catalytic subunit of the NuA4 HAT
complex, is essential in the BY background, with esa1∆ spores able to divide only four or five times
following germination before dying (86). However, we found that roughly 1% of the esa1Δ knockout
segregants generated from the ESA1 hemizygous diploids survived in our study. To test for selection
on specific genotypes in the esa1Δ segregants, we examined allele frequency across all SNP markers
(Figure S2.3a). All esa1∆ segregants carried the 3S genotype from positions 467,219 bp to 472,584 bp on
Chromosome XIV, a region centered on END3, which encodes an EH-domain-containing protein that
functions in endocytosis and actin cytoskeleton formation. Two lines of evidence suggest that END3 is the
causal gene at this locus: end3∆ and a temperature-sensitive allele of ESA1 were previously found to be
synthetic lethal in the BY background (87), and END3 is a major contributor to trait variation in the BYx3S
cross (20, 49, 50, 72). We note that this Chromosome XIV locus was also identified as having a large
additive effect in other knockout and wild type populations. We did not observe any other fixed regions in
esa1∆ segregants. This implies that conditional essentiality of ESA1 may depend on the cumulative effect
of the Chromosome XIV region and many small-effect variants.
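A scan for fixed regions of the kind described in this note can be sketched as below: compute the 3S allele frequency at each marker, then report runs of consecutive markers at which every segregant carries the same parental allele. The genotype matrix, positions, and function names are hypothetical and only illustrate the idea.

```python
def allele_frequency(genotypes):
    """Fraction of segregants carrying the 3S allele (coded 1) at each marker."""
    n = len(genotypes)
    return [sum(col) / n for col in zip(*genotypes)]

def fixed_runs(positions, freq_3s, tol=0.0):
    """Yield (start, end, allele) for consecutive markers fixed for one parent."""
    run_start, run_allele, prev_pos = None, None, None
    for pos, f in zip(positions, freq_3s):
        allele = "3S" if f >= 1.0 - tol else "BY" if f <= tol else None
        if allele != run_allele:
            if run_allele is not None:
                yield (run_start, prev_pos, run_allele)
            run_start, run_allele = pos, allele
        prev_pos = pos
    if run_allele is not None:
        yield (run_start, prev_pos, run_allele)

# Hypothetical data: rows are segregants, columns are markers (0 = BY, 1 = 3S).
genotypes = [
    [0, 1, 1, 1, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 1],
]
positions = [100, 200, 300, 400, 500]
freq = allele_frequency(genotypes)
runs = list(fixed_runs(positions, freq))
```

Setting `tol` above zero would tolerate a small rate of genotyping error when calling a marker fixed.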
Note S2.2. All major results are robust to different significance thresholds. Significance thresholds can
affect the results and interpretations of genetic mapping studies, especially those focused on genetic
interactions. To address this possibility, we repeated our analyses across a number of different False
Discovery Rate (FDR) thresholds (Methods). Although the choice of threshold impacted the number of
genetic effects that were detected (Table S2.7), we found that all of the major results remained the same
regardless of threshold (Figures S2.5 and S2.6; Tables S2.8 and S2.9). This implies that our main
conclusions are robust to the choice of threshold.
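To illustrate how the number of detected effects can change with the FDR threshold, the sketch below applies the Benjamini-Hochberg step-up procedure to a set of hypothetical p-values. The note does not say which FDR procedure was used, so Benjamini-Hochberg is shown only as one common choice and should not be read as the authors' method.

```python
def benjamini_hochberg(p_values, fdr):
    """Return indices of tests declared significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Step-up rule: find the largest rank whose p-value is under rank/m * fdr.
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            cutoff_rank = rank
    return sorted(order[:cutoff_rank])

# Hypothetical p-values from a genome scan.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
n_detected = {t: len(benjamini_hochberg(p_values, t)) for t in (0.01, 0.05, 0.10)}
```

Looser thresholds admit more effects, which is why the robustness of the conclusions has to be checked at each threshold separately.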
Note S2.3. All major results are also robust to bias in allele combinations. In the paper, we show that
mutation-independent effects tend to be genetically simpler, while mutation-responsive effects tend to be
more genetically complex. This may be a technical artifact driven by allele frequency differences between
the different knockout and wild type versions of the BYx3S cross, which has the potential to cause both
false negatives and false positives. To examine this possibility, we excluded loci that show biased individual
or multi-locus allele frequencies from our analyses. After exclusion, we found that mutation-responsive
effects still involve more higher-order epistasis than mutation-independent effects (Table S2.10). This
difference was determined to be significant for both two-locus and three-locus mutation-responsive effects
using chi-square tests (p-value = 6.731 × 10⁻¹⁹ and 5 × 10⁻³⁰, respectively). Additionally, similar to analyses
done with different significance thresholds, we find that our conclusions remain qualitatively the same
when we include or exclude loci that exhibit allele frequency bias (Figures S2.5 and S2.6; Tables S2.8
and S2.9). This implies that our major findings are not the result of technical artifacts.
Chapter 3: The interplay of additivity, dominance, and epistasis in a diploid yeast cross
This work was performed in collaboration with Takeshi Matsui, Kevin Roy, Joseph J. Hale,
Rachel Schell, Sasha F. Levy, and Ian M. Ehrenreich. It is available as a preprint on bioRxiv.
https://doi.org/10.1101/2021.07.20.453124
3.1. Abstract
We used a double barcoding system to generate and phenotype a panel of ~200,000 diploid yeast
segregants that can be partitioned into hundreds of interrelated families. This experimental design
enabled the detection of thousands of genetic interactions and many loci whose effects vary across
families. Traits were largely specified by a small number of hub loci with major additive and
dominance effects, and pervasive epistasis. Genetic background commonly influenced both the
additive and dominance effects of loci, with multiple modifiers typically involved. The most
prominent dominance modifier was the mating locus, which had no effect on its own. Our findings
show that the interplay between additivity, dominance, and epistasis underlies a complex
genotype-to-phenotype map in diploids.
3.2. Introduction
Most complex traits, including many phenotypes of agricultural, clinical, and evolutionary
significance, are specified by multiple loci (88). How alleles at these loci collectively produce the
heritable trait variation in genetically diverse populations remains unresolved (89, 90). While
additive loci play a major role in most traits, non-additive genetic effects are also likely important
(23, 57, 59, 91–93). However, loci with non-additive genetic effects are often difficult to detect,
limiting knowledge of their properties (60, 94).
The two purely genetic sources of non-additivity are dominance among alleles of
individual loci and epistasis between alleles at different loci (or genetic interactions) (95, 96). Most
empirical studies of non-additive genetic effects have focused on haploid or inbred individuals
(72, 97, 98), which provide higher statistical power to detect loci due to their nominal levels of
heterozygosity. However, by design, these populations cannot furnish insight into dominance or
its relationship with epistasis. This is a problem because many eukaryotic species that matter to
humans, including our species itself, exist predominantly as diploids that outbreed and have high
levels of heterozygosity (99–101). Dominance may be an important contributor to traits in these
species.
When epistasis occurs in diploids, a locus may influence only the additive effects, only the
dominance effects, or both the additive and dominance effects of its interactor(s) (102–104). Such
interplay has implications for efforts to genetically dissect phenotypes, predict heritable traits from
genotypes, and understand the evolutionary trajectories of beneficial and deleterious alleles. Yet,
exploration of the relationship between additivity, dominance, and epistasis has mainly been
limited to theory because of technical challenges in identifying non-additive loci.
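The distinction drawn above, between a modifier that changes the additive effect, the dominance effect, or both, can be made concrete with the textbook parameterization of a single diploid locus: the additive effect a is half the difference between the two homozygote means, and the dominance effect d is the deviation of the heterozygote from the homozygote midpoint. The sketch below uses hypothetical genotype-class means and invented function names. It is not the statistical model used in this chapter.

```python
def additive_dominance(mean_aa, mean_ab, mean_bb):
    """Additive (a) and dominance (d) effects from three genotype-class means."""
    midpoint = (mean_aa + mean_bb) / 2.0
    a = (mean_bb - mean_aa) / 2.0
    d = mean_ab - midpoint
    return a, d

def modifier_effect(means_by_background):
    """Change in (a, d) at a focal locus between two backgrounds of a second locus."""
    (a0, d0), (a1, d1) = (additive_dominance(*m) for m in means_by_background)
    return a1 - a0, d1 - d0

# Hypothetical genotype-class means (BY/BY, BY/3S, 3S/3S) at a focal locus,
# measured in two genotype classes of an interacting locus.
background_1 = (1.00, 1.10, 1.20)   # purely additive: heterozygote at the midpoint
background_2 = (1.00, 1.20, 1.20)   # 3S allele fully dominant
delta_a, delta_d = modifier_effect([background_1, background_2])
```

In this made-up case the second locus leaves the additive effect unchanged and alters only the dominance effect, one of the three outcomes listed in the paragraph above.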
The budding yeast Saccharomyces cerevisiae is a potentially powerful system for studying
non-additive genetics in diploids. Haploid yeast segregants with known genotypes can be mated
to produce diploids that also have known genotypes (105). This strategy facilitates the generation
of diploid mapping populations whose size is roughly the square of the number of haploid
progenitors. However, phenotyping large diploid populations of more than ~10,000 individuals has
been technically difficult (105, 106), limiting the use of this strategy.
3.3. Results
3.3.1. Phenotyping of a large diploid cross by barcode sequencing
To enable phenotyping of large yeast diploid mapping populations, we developed a system
that fuses two genomic barcodes, one from each haploid parent, in vivo to create a unique double
barcode for each diploid genotype (Figure 3.1A and Figure S3.1). We started with two S.
cerevisiae isolates, the commonly used lab strain BY4716 (BY) and a haploid derivative of the
clinical isolate 322134S (3S). These strains differ at ~45,000 SNPs (~0.4% of genome) (20, 107).
To ensure segregation of the mating locus, both BY MATa x 3S MATα and 3S MATa x BY MATα
crosses were performed using isogenic strains that had been mating type switched. From these
crosses, 600 MATα and 400 MATa segregants from distinct four-spore tetrads were marked at the
neutral YBR209W locus by integrating a random barcode (108, 109). At least two uniquely
barcoded strains were recovered per haploid segregant and the genome of each segregant was
sequenced to define the genotype represented by each barcode (Figure S3.1 and Figure S3.2A).
MATa strains and MATα strains (2 barcodes per segregant) were mated as pairs and grown
on media that induced site-directed recombination between the MATa barcode and MATα barcode
on homologous chromosomes (Figure 3.1B and Figure S3.1E) (109, 110). This process resulted
in a double barcode on one chromosome that uniquely identifies both parents of a diploid segregant
and therefore its presumptive genotype (Figure S3.2B). Using similar methods, we also
constructed BY/BY, BY/3S, 3S/BY, and 3S/3S parental diploids.
Figure 3.1 Generating a large panel of diploid segregants with known genotypes that can be
phenotyped as a pool. A. Overview of the experimental design. Parental haploids, BY and 3S, were mated
and sporulated. The resulting MATα and MATa segregants were barcoded at a common genomic location
and sequenced. Segregants were mated as pairs to generate a panel of ~240,000 double-barcoded diploid
segregants with known genotypes. All diploid segregants originating from a single haploid parent are
referred to as a ‘family’. B. MATα and MATa barcodes were brought to the same genomic location by
inducing recombination between homologous chromosomes via Cre-loxP. C. Diploid segregants were
pooled and grown in competition for 12-15 generations. Barcode sequencing over the course of the
competition was used to estimate the fitness of each strain. D. Density plot of the raw fitness of double
barcodes representing the same genotype in the same pooled growth condition (Glucose 1). E. Density plot
of the mean raw fitness of the same genotype measured in two replicate growth cultures (Glucose 1 and
Glucose 2). F. The broad-sense and narrow-sense heritability estimates for the 8 environments. The
standard errors for both heritability estimates are shown as error bars for each point. G. Violin plots of the
fitnesses of diploid segregants in 8 environments. Raw fitness estimates of BY/BY, BY/3S, 3S/BY, and
3S/3S diploid segregants are shown as colored lines.
After the matings, diploids were pooled and grown in seven conditions: cobalt chloride,
copper sulfate, glucose, hydrogen peroxide, sodium chloride, rapamycin, and zeocin with the
glucose condition performed twice (Table S3.1). Cells were grown for ~15 generations in serial
batch culture, with 1:8 dilution every ~3 generations and a bottleneck population size greater than
2 × 10^9 cells (Figure 3.1C). Double barcodes were enumerated over 4-5 timepoints by sequencing
amplicons from the double barcode locus, and the resulting frequency trajectories were used to
estimate the relative fitness of each strain (111, 112).
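As a rough illustration of how frequency trajectories can be turned into fitness values, the sketch below regresses the log frequency of each double barcode on the number of generations and centers the slopes on the pool mean. This is a simplified stand-in, not the inference procedure of the cited methods (111, 112); the function names, the pseudocount, and the toy counts are assumptions made for the example.

```python
import math

def generations_per_cycle(dilution_factor):
    """Doublings needed to regrow after a 1:dilution_factor transfer."""
    return math.log2(dilution_factor)

def relative_fitness(counts, generations, pseudocount=0.5):
    """Estimate per-generation relative fitness from barcode read counts.

    counts: dict mapping barcode -> list of read counts, one per timepoint.
    generations: list of generation numbers for the same timepoints.
    Returns a dict mapping barcode -> log-frequency slope, centered on the
    mean slope of the pool (hypothetical simplification).
    """
    n_t = len(generations)
    totals = [sum(c[t] for c in counts.values()) for t in range(n_t)]
    g_mean = sum(generations) / n_t
    g_var = sum((g - g_mean) ** 2 for g in generations)
    slopes = {}
    for barcode, c in counts.items():
        # Pseudocount avoids log(0) for barcodes that drop out.
        logf = [math.log((c[t] + pseudocount) / totals[t]) for t in range(n_t)]
        l_mean = sum(logf) / n_t
        cov = sum((g - g_mean) * (l - l_mean) for g, l in zip(generations, logf))
        slopes[barcode] = cov / g_var
    mean_slope = sum(slopes.values()) / len(slopes)
    return {barcode: s - mean_slope for barcode, s in slopes.items()}

# Toy pool: barcode "A" rises in frequency, barcode "B" falls.
toy = {"A": [100, 200, 400], "B": [100, 100, 100]}
fitness = relative_fitness(toy, generations=[0, 3, 6])
```

The helper `generations_per_cycle(8)` returns 3.0, matching the ~3 generations expected between 1:8 dilutions.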
We recovered on average 197,267 strains per environment with a minimum of two replicate
fitness estimates (Table S3.2). Fitness measures correlated well between different barcodes
marking the same strain within a growth pool (0.524 < r < 0.8 Spearman’s correlation, Figure
3.1D; Figures S3.3 and S3.4), as did fitness measures of the same barcoded strain assayed in
replicate glucose growth cultures (Spearman’s correlation = 0.863, Figure 3.1E).
Substantial phenotypic diversity was observed in every environment. The majority of this
variation was due to genetic factors: broad-sense heritabilities were on average 61% (52-76%
across environments), with 40% (19-53%) being additive and 21% (19-26%) being non-additive
(Figure 3.1F, Figure S3.5, and Table S3.3). Every environment contained many diploids with
more extreme fitness than either the BY/BY or 3S/3S parent (i.e., transgressive segregation).
BY/3S and 3S/BY segregants were more fit than the BY/BY or 3S/3S diploids in all environments
but one (i.e., heterozygote advantage, Figure 3.1G).
3.3.2. Mapping within interrelated families increases statistical power
Using quantile normalized fitness estimates from barcode sequencing, we mapped loci that
contribute to growth (Figure S3.6). Due to our experimental design, diploids generated from the
same haploid parent (families) are more genetically related than diploids generated from different
parents (Figure 3.1A). Such family structure causes false positives in genetic mapping. Here, we
found that most sites throughout the genome exceeded nominal significance thresholds when fixed
effects linear models were applied in a given environment (Figure S3.7). To enable mapping
despite the family structure, we used mixed effects linear models, which are commonly employed
in genetic mapping studies involving populations in which individuals show nonrandom
relatedness. Specifically, we used Factored Spectrally Transformed Linear Mixed Models (FaST-
LMM) (113, 114) to identify an average of 17 loci per environment (min=10, max=26).
Contrary to expectations that larger sample sizes should yield better statistical power, and
therefore, more detections, the numbers of loci identified here were comparable to studies that
were at least 60-fold smaller (89, 97, 105). There are multiple potential, non-mutually exclusive
explanations for this observation. For example, statistical controls for family structure may result
in a high rate of false negatives or the effects of loci may depend on genetic background. To bypass
these difficult-to-disentangle issues, each of which may impact statistical power, we performed
linkage mapping using an alternative strategy that did not require explicitly controlling for family
structure. Fixed effects linear models were fit individually within each of 392 families of
diploids that descended from distinct MATa parents and consisted of ~600 individuals each.
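Within a family, the shared parent contributes the same allele to every diploid, so each marker separates the family into two genotype classes defined by the allele inherited from the other parent. A minimal sketch of such a within-family scan is given below; it uses a Welch-style t statistic as a stand-in for the fixed effects linear model, and the function name, data layout, and toy values are assumptions for illustration only.

```python
import math
from statistics import mean, variance

def family_scan(fitness, genotypes):
    """Scan markers within one family of diploids.

    fitness: list of fitness values, one per diploid in the family.
    genotypes: list of markers; each marker is a list of 0/1 codes giving
        the allele each diploid inherited from its variable parent.
    Returns a list of (effect, t_statistic) tuples, one per marker.
    """
    results = []
    for marker in genotypes:
        class0 = [f for f, g in zip(fitness, marker) if g == 0]
        class1 = [f for f, g in zip(fitness, marker) if g == 1]
        if len(class0) < 2 or len(class1) < 2:
            # Too few individuals in a class to estimate a variance.
            results.append((float("nan"), float("nan")))
            continue
        effect = mean(class1) - mean(class0)
        se = math.sqrt(variance(class0) / len(class0)
                       + variance(class1) / len(class1))
        t_stat = effect / se if se > 0 else float("inf")
        results.append((effect, t_stat))
    return results

# Toy family of six diploids and two markers; only the first marker
# separates low-fitness from high-fitness individuals.
toy_fitness = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9]
toy_markers = [[0, 0, 0, 1, 1, 1], [0, 1, 0, 1, 0, 1]]
scan = family_scan(toy_fitness, toy_markers)
```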
The family-level scans yielded an average of 16.3 detections per family per environment
(Figures 3.2A and 3.2B and Figure S3.8), which were largely reproducible across the two
replicate glucose cultures (Figure 3.2C). Detections across the families were then consolidated
using 95% confidence intervals (i.e., all loci detected in different families that had overlapping
confidence intervals were considered the same locus) resulting in approximately 58.3 distinct loci
per environment (49 to 65), >2.5-fold more loci than detected by FaST-LMM (Figures 3.2A, 3.2B,
and 3.2D). These loci included ~85% of the loci detected using FaST-LMM, implying that the
results of family tests encompass those of conventional approaches using aggregate data. Loci
identified only in family tests had an average effect that was around one-third that of loci detected by
both FaST-LMM and family tests (0.08 vs. 0.25; Figure 3.2E). This implies that despite smaller
sample sizes, mapping within families provided greater statistical power than mapping in the
aggregate data while controlling for relatedness.
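The consolidation step described above, in which detections from different families are treated as one locus when their confidence intervals overlap, can be sketched as a standard interval merge. The tuple layout and the choice to chain overlaps transitively are assumptions of this example, not details confirmed by the text.

```python
def consolidate(detections):
    """Merge family-level detections with overlapping confidence intervals.

    detections: list of (chromosome, ci_start, ci_end) tuples pooled across
        families.
    Returns a list of (chromosome, start, end, n_detections) tuples, one per
    consolidated locus. Overlaps are chained, so A-B and B-C overlaps place
    A, B, and C in the same locus (an assumption of this sketch).
    """
    merged = []
    for chrom, start, end in sorted(detections):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            _, m_start, m_end, n = merged[-1]
            merged[-1] = (chrom, m_start, max(m_end, end), n + 1)
        else:
            merged.append((chrom, start, end, 1))
    return merged

# Toy detections from several families (coordinates are made up).
toy = [("chrI", 100, 200), ("chrI", 150, 300),
       ("chrI", 400, 500), ("chrII", 120, 180)]
loci = consolidate(toy)
```

In the toy input, the first two chrI detections collapse into a single locus supported by two families, while the remaining detections stay separate.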
Figure 3.2 Identification of loci that affect fitness. A-B. Loci mapped in CoCl2 (A) and CuSO4 (B). Panels
from top to bottom are 1) loci detected using the mixed effects linear model FaST-LMM (red bars), 2) loci
with dominance effects detected using a fixed effects linear model on the non-additive portion of each
diploid’s phenotype (green bars), 3) loci detected using family-tests (black or blue points), and 4) the total
number of detections across families for each 50 kb interval (grey bars). C. Violin plot showing the % of
loci that were detected in both glucose replicates for each family. D. Barplot of the number of loci detected
by family-tests (red) and FaST-LMM (blue). E. Family-level effect size (3S/3S - Het or Het - BY/BY) of
loci detected with FaST-LMM in CoCl2 (left panel) and family-tests (right panel). Colors indicate whether
an effect was detected (black) or undetected (gray) in a family-level scan. F. Examples of loci with only
additive effect (or low dominance), incomplete dominance, complete dominance, overdominance, and
underdominance. Black lines are the mean fitness of diploids subsetted by the genotype state at the focal
locus. Gray lines are the standard errors. Green lines are the expected mean fitness of heterozygotes
assuming no dominance. Genotype state at each locus is denoted by colored boxes: BY/BY (blue), 3S/3S
(orange), and BY/3S (half blue, half orange). Dominance and additive effects (blue and red bars, respectively)
for each subset of the data are shown next to the relevant genotype classes. The degree of dominance at a
locus is included in parentheses. G. Violin plot showing the degree of dominance for all loci detected in the
dominance scan. Loci with positive values are dominant towards the allele conferring higher fitness (green),
while loci with negative values are dominant towards the deleterious allele (red). All loci with degree of
dominance >100% or <-100% exhibit overdominance and underdominance, respectively.
3.3.3. Loci frequently show dominance effects
In diploids, non-additivity can arise due to dominance among alleles at the same locus,
epistasis between alleles at different loci, or a mixture of the two. To identify such non-additive
loci from the aggregate data, we extracted the non-additive portion of each diploid’s phenotype by
taking the residuals of a model fitting each diploid’s phenotype as a function of the mean estimated
phenotypes of the haploid parental strains it arose from (Figure S3.9) (105). Using these values
accounts for family structure and enables mapping of non-additive loci in the full segregant panel
with fixed effects linear models. Regarding dominance effects, we identified an average of 18 loci
showing dominance per environment (min=12, max=30). Only 45% of these loci were also
identified by FaST-LMM, while 82% of these loci were detected in the family-level scans. Among
the loci with dominance effects, the average degree of dominance was ~51% (i.e., heterozygotes’
fitnesses were roughly halfway between the average of the two homozygotes and one of the
homozygotes), with 82% of the loci showing incomplete dominance (Figures 3.2F and 3.2G).
Only ~7% of the loci exhibited complete dominance, while overdominance (~8%) and
underdominance (~3%) were seen among the remaining loci with dominance effects. ~77% of the
loci showed dominance towards the allele conferring higher fitness (Figure 3.2G), which may
explain why the BY/3S and 3S/BY diploids were more fit than the BY/BY or 3S/3S diploids in most environments (Figure 3.1G).
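The degree of dominance used above can be illustrated with a short calculation from the three genotype class means at a locus. In this sketch the additive effect is half the difference between the homozygotes, the dominance effect is the deviation of the heterozygote from the homozygote midpoint, and the degree of dominance is their ratio, signed so that positive values point towards the higher-fitness allele. The tolerance used to call a locus additive or completely dominant is an assumption of the example; the text does not state the cutoffs that were applied.

```python
def degree_of_dominance(mean_by_by, mean_het, mean_3s_3s):
    """Return dominance / additive effect for one locus.

    Positive values indicate dominance towards the higher-fitness allele;
    values beyond +1 or -1 indicate over- or underdominance.
    """
    high = max(mean_by_by, mean_3s_3s)
    low = min(mean_by_by, mean_3s_3s)
    additive = (high - low) / 2
    if additive == 0:
        raise ValueError("homozygotes are identical; ratio is undefined")
    dominance = mean_het - (high + low) / 2
    return dominance / additive

def classify(degree, tol=0.05):
    """Label a locus from its degree of dominance (tol is hypothetical)."""
    if degree > 1 + tol:
        return "overdominance"
    if degree < -1 - tol:
        return "underdominance"
    if abs(abs(degree) - 1) <= tol:
        return "complete dominance"
    if abs(degree) <= tol:
        return "additive"
    return "incomplete dominance"

# A heterozygote halfway between the midpoint and the fitter homozygote
# has a degree of dominance of 0.5 (i.e., 50%).
example = degree_of_dominance(1.0, 1.75, 2.0)
label = classify(example)
```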
3.3.4. Epistatic hubs govern both additivity and non-additivity
We also used the non-additive portion of phenotype to perform comprehensive genome-
wide scans for genetic interactions. We identified an average of 440 two-locus interactions per
environment (377 to 538) (Figure 3.3A and Figure S3.10). Our large sample size had a
pronounced impact on detection: ~40-fold more interactions per environment were detected than in
previous studies that phenotyped smaller mapping populations using conventional approaches (14,
23). Our large sample size also enabled comprehensive scans for three-locus interactions with a
reduced set of markers, identifying an average of 6,152 per environment (4,845 to 7,301) (Figure
3.3A and Figure S3.11). Loci involved in three-locus interactions were identified across all
chromosomes and distributed widely throughout the genome.
Figure 3.3 Interactions often affect both the additive and dominance effects of involved loci. A.
Interaction plots of all two-locus (left) and three-locus (right) effects for two representative environments.
Significant interactions between loci are shown as connecting lines. Green bars are the effect size of a locus,
calculated as the absolute difference between the mean fitness of diploids that are 3S/3S and BY/BY at the
focal locus. Orange bars are the number of interactions detected for each locus. B. Scatter plot of the
absolute effect size of a locus and the number of two-locus (left) and three-locus (right) interactions in
which it is involved. Local regressions are shown as blue lines. C. Scatter plot of the number of two-locus
and three-locus interactions per locus. D. Examples of genetic interactions with different fractions of
epistasis involving dominance. Black lines are the mean fitness of diploids subsetted by the genotype state
at the two involved loci. Gray lines are the standard errors. Green lines are the expected mean fitness of
heterozygotes assuming no dominance. Genotype state at each locus is denoted by colored boxes: BY/BY
(blue), 3S/3S (orange), and BY/3S (half blue, half orange). The first locus is the locus whose effect is being
modified, and the second locus is the modifier locus. Dominance and additive effects (blue and red bars,
respectively) for each subset of the data are shown next to the relevant genotype classes. E. Density plot of
the fraction of epistasis involving dominance for all interactions (red), hub--hub (yellow), non-hub--hub
(blue), hub--non-hub (green), and non-hub--non-hub interactions (purple).
We next analyzed the relationship between individual loci and their genetic interactions.
We found a strong positive relationship between the effect of a locus and its involvement in two-
and three-locus interactions (Figure 3.3B). This suggests that loci with larger effects tend to
genetically interact with many loci or that their interactions are easier to detect. We also observed
a clear linear relationship between the number of two- and three-locus interactions of a given locus
(Figure 3.3C). Notably, certain loci exhibited many more interactions than others, acting as ‘hubs’
(here defined as loci with >20 two-locus interactions in at least one environment) (23). On average,
~4.5 hubs were detected per environment, and the same hub was often detected in multiple
environments. A majority (>54%) of all two- and three-locus interactions involved at least one
hub. Fine-mapping localized the Chromosome VI, VIII, X, and XII hubs to genes involved in
amino acid sensing (PTR3), copper resistance (CUP1), vacuolar protein sorting (VPS70), and a
gene of unknown function (YLR257W), respectively.
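The hub definition given above (loci with more than 20 two-locus interactions in an environment) amounts to counting how often each locus appears in the list of significant pairs. A minimal sketch, with hypothetical locus names and a made-up interaction list:

```python
from collections import Counter

def find_hubs(interactions, min_interactions=20):
    """Return loci involved in more than min_interactions pairs.

    interactions: iterable of (locus_a, locus_b) pairs detected in a single
        environment.
    """
    counts = Counter()
    for locus_a, locus_b in interactions:
        counts[locus_a] += 1
        counts[locus_b] += 1
    return {locus for locus, n in counts.items() if n > min_interactions}

# Toy environment: one locus interacts with 21 partners, another pair
# interacts only with each other.
toy = [("chrVI_hub", f"modifier_{i}") for i in range(21)]
toy.append(("chrII_a", "chrIV_b"))
hubs = find_hubs(toy)
```

Applying the function separately to each environment and taking the union of the results would reproduce the "in at least one environment" part of the definition.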
3.3.5. Relationships between additivity and dominance in diploids
In haploids, epistasis can only influence the additive effect of a locus because there are no
heterozygotes. In diploids, however, epistasis can modify a locus’ additive effects, dominance
effects, or both additive and dominance effects (102–104). To better characterize how loci are
modified, each two-locus interaction was partitioned into additive and dominance components.
We found that changes in dominance account for ~44% of the average epistatic effect (Figures
3.3D and 3.3E), implying that interactions often affect both additivity and dominance. However,
this fraction varied depending on whether the modifying locus was a hub. When the modifier was
a hub, dominance accounted for little of the epistatic effect (11.9% on average), implying that hubs
mostly modify the additive component of the interacting loci. By comparison, when the modifier
was not a hub, epistasis was mostly composed of dominance (64% of interactions had a larger
dominance component). These data suggest that epistasis commonly involves modification of
dominance in diploids and that hubs act in a distinct manner from loci that are not hubs.
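One way to picture the partitioning of an interaction into additive and dominance components is to estimate the additive and dominance effects of the modified locus separately within each genotype class of the modifier, then ask how much of the change across classes lies in the dominance term. The sketch below follows that idea; the specific ratio it reports is an assumption made for illustration and may differ from the partitioning actually used in this chapter.

```python
def locus_effects(mean_by_by, mean_het, mean_3s_3s):
    """Additive and dominance effects from three genotype class means."""
    additive = (mean_3s_3s - mean_by_by) / 2
    dominance = mean_het - (mean_3s_3s + mean_by_by) / 2
    return additive, dominance

def dominance_fraction(focal_means_by_modifier):
    """Fraction of the epistatic change attributable to dominance.

    focal_means_by_modifier: dict mapping a modifier genotype to the
        (BY/BY, BY/3S, 3S/3S) mean fitnesses at the focal locus.
    Returns range(dominance) / (range(additive) + range(dominance)),
    a hypothetical summary statistic.
    """
    effects = [locus_effects(*m) for m in focal_means_by_modifier.values()]
    additive = [e[0] for e in effects]
    dominance = [e[1] for e in effects]
    delta_a = max(additive) - min(additive)
    delta_d = max(dominance) - min(dominance)
    total = delta_a + delta_d
    return delta_d / total if total else 0.0

# Toy case: the modifier leaves the additive effect unchanged but makes the
# focal locus completely dominant in one background.
toy = {"BY/BY": (1.0, 1.5, 2.0), "3S/3S": (1.0, 2.0, 2.0)}
fraction = dominance_fraction(toy)
```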
We next examined how the additive and dominance effects of hubs were modified by
genetic interactions. In most cases, hubs genetically interacted with a small number of major effect
modifiers and many minor effect modifiers (Figure 3.4A). The major effect modifiers typically
influenced only the additive or only the dominance effect of a hub, suggesting that distinct sets of
loci govern additive and dominance effect sizes (Figure 3.4A). Whereas the most frequent major
effect modifiers of the additive effects of hubs were other hubs (Figure 3.4B), the single most
frequent major effect modifier of the dominance effects of hubs was a locus on Chromosome III.
Collectively, multiple modifier loci could cause a hub locus to show a broad range of effect sizes
across different genetic backgrounds (Figure 3.4C).
Figure 3.4 Multiple modifier loci cause hubs to exhibit a range of effect sizes across different genetic
backgrounds. A. Specific examples of hubs on chromosome VI, X, and XII (each row) and their additive
(left) and dominance (right) modifiers. The height of the bar corresponds to the magnitude of the modifying
effect. The dotted red line shows the threshold above which loci were considered major effect modifiers. B.
Barplot showing the total number of times loci were detected as a major effect modifier of additive (left)
or dominance (right) effects of hubs across environments. Colored dots indicate hub loci. An asterisk
indicates a non-hub locus on ChrIII. C. Additive (left) or dominance (right) effect size of the Chromosome VI
hub across different allelic combinations of its 4 largest effect modifiers. Red points are the effect size of a
genotype class based on the genotype state of the 4 modifiers. Black point is the overall effect size of the
locus. Black lines are bootstrapped 95% confidence intervals.
3.3.6. Characteristics of the ChrIII dominance modifier
Although not a hub, the Chromosome III locus nevertheless had a prominent impact on
phenotype by modifying the dominance effects of multiple variable effect loci. Interactions with
the Chromosome III locus had greater impacts on dominance than additivity at all focal loci. For
example, in hydrogen peroxide, dominance at the Chromosome X variable effect locus depended
on the Chromosome III locus, ranging from complete to nearly absent in a genotype-dependent
manner (Figure 3.5A). We delimited the Chromosome III locus to a 3 kb region containing the
mating locus and a few other genes (BUD5, TAF2, and YCR041W).
Yeast mating types possess different nonhomologous gene cassettes at the mating locus,
which encode distinct transcription factors that are master regulators of the MATa, MATα, and
diploid transcriptional programs (115). This region of the genome is unique because four genotype
classes segregate (BY MATa, 3S MATa, 3S MATα, and BY MATα), and as a result, the two
heterozygotes are not identical (Figure S3.12). To test if the mating locus is the dominance
modifier, we partitioned Chromosome III heterozygotes based on their parents-of-origin for the
MATa and MATα cassettes and found a difference: dominance was only visible in the 3S MATa /
BY MATα genotype class (Figure 3.5A). Other hub loci modified by Chromosome III showed the
same relationship between dominance and the parent-of-origin of the mating loci (Figure 3.5B).
These results suggest that BY and 3S harbor functional differences in one or both mating cassettes.
Figure 3.5 Parent-of-origin of the mating locus influences dominance at hubs. A. Violin plots of the
fitness distribution of diploids split by the genotype at the Chromosome X locus (top), further split by the
genotype at the mating locus on Chromosome III (middle), and the parent-of-origin at the mating locus
(bottom). Genotype state at each locus is denoted by colored boxes: BY/BY (blue), 3S/3S (orange), and
BY/3S (half blue, half orange). Lines are the observed mean fitness in the homozygous genotype classes
(gray), the observed mean fitness in heterozygous genotype classes (red), and the expected heterozygous
fitness if there was no dominance (blue). B. Violin plots of the fitness distribution of diploids split by the
genotype state of a hub locus and the parent-of-origin of the mating locus.
3.4. Discussion
We used a double barcoding system to generate and phenotype an extremely large panel of
diploid yeast segregants that can be partitioned into hundreds of interrelated families. This
experimental design enabled the detection of thousands of loci, including at least an order of
magnitude more genetic interactions than discovered in previous yeast crosses. Analysis of these
epistatic loci identified a modest number of hubs that have large effects, show pervasive epistasis,
and control most phenotypic variation across environments, as well as many other loci that
genetically interact with these hubs.
Genetic background commonly modified the magnitude of, or completely masked, the
effects of the hubs, indicating that the largest effect loci identified in mapping studies are highly
sensitive to genetic background. Such non-additive genetic background effects are likely to hamper
efforts to predict phenotype from genotype by limiting the extrapolation of effect estimates from
one genetic context to others. However, our finding that large effect loci were most impacted by
other major effect loci does provide some optimism that characterizing a limited set of interactions
may account for a substantial portion of these genetic background effects.
Because our experiments were performed in outbred diploids rather than haploids or inbred
diploids, we could detect dominance effects and whether dominance is modified by epistasis. We
showed that dominance effects are common and that the magnitude of dominance can strongly
depend on the alleles of interacting loci. The potential existence of dominance modifiers has been
discussed in theory, but to date, only a single dominance modifier has been found in a plant self-
incompatibility locus (116, 117). Our results show that dominance modifiers are prevalent and
raise the intriguing possibility that sites with atypical allele dynamics within natural populations,
the yeast mating locus here and a self-incompatibility locus in plants, are more likely to harbor
dominance modifiers with major effects.
Generally, we found that heritable traits in yeast are more genetically complex than
formerly appreciated. Relative to the cross that we examined, natural populations may harbor
substantially higher genetic diversity, meaning traits could be even more complex and difficult to
dissect. Our work supports the premise that, to the extent possible, focusing on groups of more
closely related individuals, such as the families studied here, can enhance statistical power and
precision relative to populations with greater diversity (105, 106, 118). The genetic insights gained
from these more closely related groups can then be leveraged to inform the genetic architecture of
traits in more diverse populations in which many critical genetic effects may otherwise be
obscured.
3.5. Methods
3.5.1. Generation of haploid segregants
All haploid segregants and diploid segregants described in this paper were generated from
a cross using two isolates of Saccharomyces cerevisiae, the commonly used lab strain BY4716
(‘BY’) and a haploid derivative of the clinical isolate 322134S (‘3S’). To generate counter
selectable markers in each strain, we first introduced clean deletions of FCY1 and URA3. Each
gene was deleted using a two-step approach: first, genes were replaced with a KanMX cassette (119)
via lithium acetate transformation (120). Next, the cassette was targeted using CRISPR/Cas9
and a gRNA specific to the pTEF promoter region in each cassette (121). A repair template
homologous to the upstream and downstream region of the target gene was co-transformed with
the CRISPR system to generate a clean deletion. The MATa BY and 3S strains were then mating-
type-switched by transforming a plasmid containing galactose-inducible HO and URA3 using the
lithium acetate protocol (122). Strains with the plasmid were selected on SCM-Ura plates and
inoculated into SCM-Ura + 2% galactose media overnight. Individual colonies were then obtained
by plating out 10^2 cells onto YPD plates. Colonies were tested for their mating type using the yeast
mating halo assay (123). Successfully mating-type-switched BY and 3S MATα clones were cured
of the HO plasmid by growing the cells on YPD + 5-FOA plates.
To prevent aggregation of cells in pools grown in liquid, FLO8, a transcriptional activator
of many flocculins (124), and FLO11, the flocculin responsible for many aggregation phenotypes
in S. cerevisiae (125), were first knocked out in both mating types of the BY fcy1∆ ura3∆ and 3S
fcy1∆ ura3∆ strains. Each parental strain was then engineered to have a genomic ‘landing pad’
(108–110) containing two partially crippled LoxP sites (126), Lox71 and Lox2272/71, and a
galactose-inducible Cre recombinase (127) at the YBR209W locus via CRISPR/Cas9-mediated
homologous recombination (Figure S3.1A). Earlier studies have shown that deletion of YBR209W
or incorporation of our barcoding system has no effect on fitness (108).
Opposite mating types of BY fcy1Δ flo8Δ flo11Δ ura3Δ ho YBR209W::pGal1-Cre - Lox71
- Lox2272/71 and 3S fcy1Δ flo8Δ flo11Δ ura3Δ ho YBR209W::pGal1-Cre - Lox71 - Lox2272/71
were next mated, creating two BY/3S heterozygous diploids. Diploids were sporulated and >500
tetrads were dissected from each diploid. Performing these two crosses would, in theory, enable
us to achieve ~50% allele frequency at all sites, including the mating type locus (Figure S3.2).
Either one MATa segregant or one MATα segregant was then randomly selected from each tetrad
(a total of >500 each) to maximize the number of unique recombination breakpoints. To avoid
segregants carrying aneuploidies, only tetrads that produced 4 spores were utilized.
3.5.2. Barcoding of haploid segregants
Segregants were uniquely barcoded using two different methods (Figures S3.1A through
S3.1D). MATα segregants were barcoded by integrating a randomly barcoded plasmid via Cre-
mediated homologous recombination at lox2272/71 (Figure S3.1C). Barcoded plasmids were
made by modifying the pBAR6 plasmid (109). First, pBAR6 was digested with KpnI and EcoRI
(Figure S3.1B). Linearized pBAR6 was then assembled by Gibson assembly with a PCR product
containing a Lox2272/66 site, a random 20-mer barcode sequence, and a partial TruSeq read 2
adapter sequence. The resulting product was transformed into chemically competent NEB 10-beta
cells using a standard heat shock protocol, and transformants were selected on LB + 50 ug/ml
Carbenicillin plates. To ensure high barcode complexity, ~100,000 transformants were scraped
and pooled prior to plasmid extraction. Purified barcoded plasmids (300 ng) were then transformed
into the yeast cells using the lithium acetate protocol (120). After transformation, the barcoded
plasmids were recombined into the yeast genome by inducing the galactose-inducible Cre
recombinase by growing the yeast cells in YP + 2% galactose media. Homologous recombination
between the two partially crippled Lox2272 variants, Lox2272/66 and Lox2272/71, resulted in the
formation of a fully crippled Lox2272/66/71 and a fully functional Lox2272. Transformants with
successful integration were selected on YPD + 200 ug/ml G418 agar plates.
MATa segregants were barcoded by CRISPR/Cpf1-driven (128) homologous
recombination at the genomic landing pad (Figure S3.1D). The genomic region containing the
two partially crippled Lox sites, Lox71 and Lox2272/71, was replaced with a DNA fragment
containing (in the following order) a 60 bp sequence homologous to the region upstream of the
Lox sites, an HphMX cassette (129), the 5’ end of a split URA3 marker, a 5’ artificial intron splice
site, a partial TruSeq read 1 adapter sequence, a random 20-mer barcode sequence, a partially
crippled Lox66 site, and a 60 bp sequence homologous to the region downstream of the Lox sites.
To integrate the randomly barcoded DNA fragment in the yeast genome, 200 ng of the DNA
fragment, 200 ng of a PCR amplicon containing pTEF-CPF1 flanked by 2 nuclear localization
sequences, and 200 ng of a PCR amplicon containing a polyA-tailed pSNR52-20-mer guide
sequence were co-transformed into the yeast cells using lithium acetate (120). Transformants with
successful integration were then selected on YPD + 300 ug/ml Hygromycin B agar plates.
For each segregant transformation, we chose ~5 colonies that presumably represented
independent integration events and barcodes. In addition to the segregants, both mating types of
the parental strains were barcoded in the same manner as the segregants and ~50 colonies were
picked for each parental strain.
3.5.3. Whole genome sequencing of parental haploid segregants
Clones containing different barcodes for 1,003 MATα and 500 MATa segregants were
pooled and these pools were whole-genome sequenced at low-coverage (~10x) to determine where
crossover events occurred. A sequencing library was prepared using the Illumina Nextera Kit with
custom multiplexing barcodes (110). Libraries from different segregants were pooled in equimolar
fractions and these multiplex pools were size selected using the Qiagen Gel Extraction Kit.
Multiplexed samples were then sequenced on an Illumina HiSeq 2500 using 150 bp × 150 bp
paired-end reads. For each strain, reads were mapped against the S288c genome using BWA with
default settings (82). Alignments were converted to a bam format and sorted using SAMTOOLS
(default settings) (83). Read duplicates were then removed and bam files were converted to pileups
using SAMTOOLS (default settings). Base calls and coverage values were obtained from the
pileup files for 43,865 high-confidence SNPs that segregate in the BYx3S cross. Any segregants
that showed signs of cross-contamination, where the allele frequency at the SNP markers
significantly deviated from the expected 0% or 100% 3S allele, were excluded from further
analysis. To avoid segregants with aneuploidy, all segregants whose average coverage of each
individual chromosome or segments of the chromosome significantly deviated from the overall
average coverage were also removed. Additionally, all segregants with a mean coverage of less
than 2 were removed. In total, 76 MATα and 11 MATa segregants were removed after filtering for
quality. For the remaining 927 MATα and 489 MATa segregants, a vector containing the fraction
of 3S calls at each SNP was generated and used to make initial genotype calls with sites above and
below 50% classified as 3S and BY, respectively. This vector of initial genotype calls was then
corrected with a Hidden Markov Model (HMM), implemented using the HMM package version
1.0 in R (130, 131). We used the following transition and emission probability matrices: transProbs
= matrix(c(0.9999, 0.0001, 0.0001, 0.9999), 2) and emissionProbs = matrix(c(0.25, 0.75, 0.75, 0.25), 2). All
SNP markers within the first and last 30 kb of each chromosome were omitted because we
observed higher sequencing error rate and/or lower read mapping quality for these specific regions.
Adjacent SNPs in the HMM-corrected genotype calls that lack recombination in the segregants
were collapsed into a single SNP, reducing the number of SNP markers in subsequent analysis
from 43,865 to 7,742.
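To illustrate how a two-state HMM smooths noisy genotype calls along a chromosome, the sketch below implements Viterbi decoding in plain Python. It is a stand-in for the R HMM package used here, not a port of it. The transition probabilities follow the values given above (0.9999 to stay, 0.0001 to switch); the sketch assumes that an initial call agrees with the true state with probability 0.75, which is one reading of the emission matrix, and it assumes equal starting probabilities.

```python
import math

def viterbi_correct(calls, stay=0.9999, match=0.75):
    """Correct a non-empty sequence of initial genotype calls.

    calls: list of 0/1 initial calls along one chromosome (0 = BY, 1 = 3S).
    stay: probability of remaining in the same state between adjacent SNPs.
    match: assumed probability that a call equals the true state.
    Returns the most likely state path as a list of 0/1 values.
    """
    states = (0, 1)

    def log_trans(i, j):
        return math.log(stay if i == j else 1 - stay)

    def log_emit(state, call):
        return math.log(match if state == call else 1 - match)

    score = [math.log(0.5) + log_emit(s, calls[0]) for s in states]
    backpointers = []
    for call in calls[1:]:
        new_score, pointers = [], []
        for j in states:
            best = max(states, key=lambda i: score[i] + log_trans(i, j))
            pointers.append(best)
            new_score.append(score[best] + log_trans(best, j) + log_emit(j, call))
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]

# An isolated miscall inside a BY block is smoothed away, whereas a
# sustained run of 3S calls is kept as a crossover.
noisy = [0] * 10 + [1] + [0] * 10
corrected = viterbi_correct(noisy)
```

Because switching states twice is far more costly than explaining a single discordant call as an error, isolated miscalls are absorbed into the surrounding haplotype block.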
3.5.4. Determining barcode sequences
For all segregants, the associated barcodes were determined by pooling all clones and
sequencing the genomic region containing the barcode using Novogene Illumina HiSeq 2500
150 bp × 150 bp paired-end reads at ~2,000x coverage per barcode. For the MATa segregants, the
genomic region containing the barcode was PCR amplified using primer pairs where one primer
was located on the partial TruSeq read 1 adapter sequence and the other primer was located
downstream of the barcode. Similarly, barcodes for MATα segregants were PCR amplified using
primer pairs where one primer was located on the partial TruSeq read 2 adapter sequence and the
other primer was located upstream of the barcode. PCR products were purified using a Qiagen
MinElute PCR purification kit, amplified using Illumina P1 and P2 primers, and then size selected
via gel extraction prior to sequencing. The 20-mer barcode sequences for each segregant were then
extracted from the sequencing reads and barcodes within a Hamming distance of 2 were clustered
with Bartender (111). Only barcode clusters comprising >5% of the total reads for each sample
were considered true barcodes. One possible concern with using random barcodes is that
sequencing errors from an abundant barcode could erroneously contribute to read counts of a
barcode with a similar sequence. To prevent this, we determined the number of mismatches away
from a nearest neighbor for each barcode. All barcodes were at least 4 mismatches away from each
other. Thus, a sequencing read was unlikely to be assigned to the wrong barcode cluster unless it
contained 3 or more errors. Overall, we recovered 403 MATa segregants and 679 MATα segregants
with good genotype calls and at least 3 barcodes that are unique from all others.
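The nearest-neighbor check described above amounts to finding the smallest pairwise Hamming distance in the barcode set. A minimal sketch follows; the 8-mer barcodes are toy values for illustration, not barcodes from this study.

```python
from itertools import combinations

def hamming(a, b):
    """Number of mismatched positions between two equal-length barcodes."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))

def min_pairwise_distance(barcodes):
    """Smallest Hamming distance between any two barcodes in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))

barcodes = ["AAAAAAAA", "AAAATTTT", "CCCCAAAA", "CCCCTTTT"]  # toy 8-mers, not real data
closest = min_pairwise_distance(barcodes)
```

The all-pairs comparison is quadratic in the number of barcodes, which is manageable for roughly a thousand haploid barcodes.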
3.5.5. Generation of diploid segregants
400 MATa and 600 MATα segregants were mated to create a panel of ~240,000 diploid
segregants, each labeled with 4 unique pairs of barcodes (~960,000 total double barcodes). To
minimize skews in the initial frequency distribution of genotypes, each MATa segregant was
individually mated to each MATα segregant to generate the panel of diploid segregants.
Specifically, two barcoded versions of each segregant were first grown to saturation and mixed in
equal proportion. Each MATa segregant mixture was then systematically mated with a MATα
segregant mixture using a LabCyte Echo 550 acoustic liquid handling robot. For each pairwise
mating, the Echo transferred 100 nL of MATa and MATα segregants onto overlapping positions on
a new YPD plate, resulting in the formation of diploid segregants labeled with ~4 double barcodes.
After growing the cells overnight at 30°C, successfully mated diploids were isolated from unmated
haploids by pinning the colonies onto YPD + 200 µg/ml G418 + 300 µg/ml Hygromycin B agar
plates using a Singer ROTOR pinning robot and growing them for 2 days. Haploid strains carried only one
of KanMX or HphMX; therefore, only mated diploids that carry both selection markers should
grow. In addition to the diploid segregants, different barcoded versions of the homozygous and
heterozygous parental diploid strains were made as controls, using the same approaches described
for segregants (total of 81 unique pairs of barcodes per parental diploid strain).
3.5.6. Translocating barcodes onto the same chromosome
After mating, the two barcodes are located on different chromosomes (Figure S3.1E). To
bring the barcodes onto the same chromosome, site-directed chromosomal translocation was
induced via Cre-LoxP-mediated site-specific recombination. To do this, mated diploid colonies
were pinned onto YP + 2% galactose plates and grown for 2 days. Presence of galactose induces
the expression of the galactose-inducible Cre recombinase, causing Cre-mediated homologous
recombination at the LoxP site and translocation of the chromosomes, bringing the two barcodes
into close proximity. In addition to the barcodes, chromosomal translocation brings the two halves
of the split URA3 marker onto the same chromosome, resulting in the reconstitution of a functional
URA3 marker. Diploids that have undergone successful recombination were selected by pinning
the diploid colonies onto SCM -Ura agar plates and growing for 2 days. To minimize bias due to
differences in growth rate between diploids, the colonies were then pinned onto fresh SCM -Ura
plates and pooled immediately with SC -Ura media. The pooled sample of ~960,000 double
barcoded segregants was then spun down, re-suspended in SC -Ura media at a concentration of
2x10^9 cells/ml, and frozen in 50% glycerol for later use. In parallel, the BY/BY, 3S/3S, BY/3S,
and 3S/BY parental diploids were generated in the same manner and stored at the same
concentration as the segregants.
3.5.7. Growth Assays
The panel of diploid segregants was evolved for 15 generations by serial batch culture
under carbon limitation in 100 ml of SC -Ura media with 4% ammonium sulfate and 2% dextrose.
First, 2 ml of the segregant frozen stock (2x10^9 cells/ml) was inoculated in 198 ml of SC -Ura
media and then grown in a 500 ml flask at 30°C with 300 rpm shaking for 48 hours. At saturation,
the cell concentration was 1.5x10^8 cells/ml for a total of 3x10^10 cells, or ~3x10^4 cells per double
barcode. In parallel, 10 µl of the parental frozen stocks were inoculated in 990 µl of SCM-Ura
media and grown at 30°C with 300 rpm shaking for 48 hours. The parental diploids were then spiked into
the segregant culture at a concentration of 10^-5. This mixed culture served as the seed culture (time
point 0 or T0) for subsequent growth time points.
Next, one-eighth or 12.5 ml of the T0 culture (~3.75x10^3 cells per double barcode at
bottleneck) was transferred into each of eight 500 ml flasks containing 87.5 ml of SC -Ura media
supplemented with different drugs or chemicals (Table S3.1). Two flasks contained no supplement
(Glucose 1 and Glucose 2) and served as growth replicates for the experiment. The remaining 75
ml of T0 culture was spun down at 3,000 rpm and frozen down for subsequent DNA library
preparation. Each culture was grown in serial batch conditions for 15 generations, bottlenecking
every ~3 generations. For each transfer, cultures were grown at 30°C with 300 rpm shaking for 48
hours until saturation. One-eighth (12.5 ml) of the culture was then transferred into new 500 ml
flasks, each containing 87.5 ml of the appropriate media. The remaining 87.5 ml of culture was
frozen down for subsequent DNA library preparation. At each transfer, the number of cells was
counted using a hemocytometer. Contamination checks for bacteria or other non-yeast microbes
were performed regularly by observing the cultures under a microscope.
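The culture arithmetic in this section can be checked with a short calculation. This sketch assumes that each culture regrows to the same saturation density after every one-eighth transfer, so that one transfer corresponds to log2(8) generations; the variable names are illustrative.

```python
import math

n_double_barcodes = 960_000       # ~960,000 double barcodes (from the text)
saturation_cells = 1.5e8 * 200    # cells/ml x ml of seed culture
dilution = 8                      # one-eighth of each culture is transferred

cells_per_barcode = saturation_cells / n_double_barcodes
cells_at_bottleneck = cells_per_barcode / dilution

# Regrowth to saturation after a 1:8 dilution takes log2(8) doublings (assumed).
generations_per_transfer = math.log2(dilution)
transfers_for_15_generations = 15 / generations_per_transfer
```

Under this assumption, five growth cycles give the 15 generations of the experiment, and each double barcode passes through every bottleneck with a few thousand cells.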
3.5.8. Library preparation
Frozen cultures were thawed and DNA was extracted for library preparation using the
MasterPure Yeast DNA Purification kit. Any remaining RNA was removed by adding RNaseA
(10 mg/ml) and incubating the sample at 37°C for one hour. DNA was then cleaned by adding one
volume of phenol:chloroform:isoamyl alcohol (25:24:1). The sample was gently mixed using a
tube rotator at 30 rpm for 10 min and then centrifuged at 16,000g for 10 min. The upper aqueous
layer was transferred to a new tube and cleaned of phenol by adding one volume of
chloroform:isoamyl alcohol (24:1). The sample was again mixed at 30 rpm for 10 min, centrifuged
at 16,000g for 10 min, and the upper aqueous layer was transferred to a new tube. Finally, to
remove residual chloroform, the DNA sample was ethanol precipitated using one-tenth volume of
3 M sodium acetate (pH 5.5) and 2.5 volume of 100% ethanol. The resulting DNA pellet was
washed with ice-cold 70% ethanol twice, and dissolved in 10 mM Tris-HCl pH8.0. After the DNA
was extracted and cleaned, a two-step PCR was used to amplify the double barcodes for
sequencing. Because only a small fraction of the total genome contains relevant information (∼250
bp amplicon out of a 12 Mb genome size), we amplified 30 µg of template per time point sample,
which corresponds to ∼2x10^9 genomes or ∼2000 copies per double barcode.
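The template quantity can be converted to genome copies as follows. This is a back-of-the-envelope sketch that assumes an average mass of about 650 g/mol per base pair of double-stranded DNA, a standard approximation that is not stated in the text.

```python
AVOGADRO = 6.022e23
BP_MASS_G_PER_MOL = 650.0     # assumed average mass of one base pair of dsDNA
genome_bp = 12e6              # 12 Mb genome (from the text)
template_g = 30e-6            # 30 µg of template per time point (from the text)
n_double_barcodes = 960_000   # ~960,000 double barcodes (from the text)

genome_mass_g = genome_bp * BP_MASS_G_PER_MOL / AVOGADRO
genome_copies = template_g / genome_mass_g
copies_per_barcode = genome_copies / n_double_barcodes
```

The result is on the order of 2x10^9 genomes and roughly two thousand template copies per double barcode, in line with the figures quoted above.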
First, a 4-cycle PCR with OneTaq polymerase (New England Biolabs) was performed in
60 wells of 96-well PCR plates, with ∼500 ng of DNA template, 4.5 mM MgCl2, 3% DMSO, and
125 µL total volume per well. Primers for this reaction were:
AATGATACGGCGACCACCGAGATCTACACNXXXXXNNACACTCTTTCCCTAC
ACGACGCTCTT
and
CAAGCAGAAGACGGCATACGAGATNXXXXXNNGTGACTGGAGTTCAGACGT
GTGCTCTTCCGATCT
The Ns in these sequences correspond to any random nucleotide and are used as unique
molecular identifiers (UMIs) in downstream analysis to identify PCR duplicates. The Xs
correspond to one of several multiplexing tags, which were used to distinguish different samples
when loaded on the same sequencing flow cell. Multiplexing tags were designed to have a
Levenshtein distance of 3 from each other, such that reads with 1 or fewer sequencing errors in the
multiplexing tag can be assigned to the correct sample.
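The reasoning behind the tag design can be sketched in code: if every pair of tags is at least 3 edits apart, an observed tag within 1 edit of one tag cannot also be within 1 edit of another. The tags and sample names below are hypothetical, and the assignment rule is an illustration rather than the demultiplexing code used in this study.

```python
def levenshtein(a, b):
    """Edit distance counting substitutions, insertions, and deletions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]

def assign_sample(observed_tag, tags, max_errors=1):
    """Return the single sample whose tag is within max_errors edits, else None."""
    hits = [sample for sample, tag in tags.items()
            if levenshtein(observed_tag, tag) <= max_errors]
    return hits[0] if len(hits) == 1 else None

tags = {"sample_A": "ACGTA", "sample_B": "TGCAT"}  # hypothetical 5-nt tags
```

For example, `assign_sample("ACGTT", tags)` resolves to `sample_A` despite one substitution, while a tag far from both entries is left unassigned.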
After amplification, 16 wells of PCR products were pooled (4 pools total), run through a
NucleoSpin PCR cleanup column, and eluted in 120 μL of elution buffer. The pooled PCR
products were then cleaned for the second time to remove any residual primers using a NucleoSpin
column and eluted in 120 μL of elution buffer. A second 22-cycle PCR was performed with
PrimeStar HS DNA polymerase (Takara) in 16 reaction tubes, with 30 μl of cleaned product from
the first PCR as template and 125 μL total volume per tube. Illumina primers P1 and P2 were used
for this reaction.
PCR products from all reaction tubes were concentrated into 100 μl using ethanol
precipitation. The appropriate PCR band (264 bp) was isolated by agarose gel electrophoresis and
quantified by Bioanalyzer (Agilent) and Qubit fluorometer (Life Technologies).
3.5.9. Barcode sequencing
Sequencing (150 bp paired-end) was performed on an Illumina HiSeq 4000 or NovaSeq
6000. Each flow cell contained at least 2 multiplexed time points and 25% genomic DNA or PhiX.
Genomic DNA/PhiX was included to increase the read complexity for proper calibration of the
instrument since most of the bases in each barcode read are fixed. Sequencing reads were analyzed
using custom code written in Python and R, which is available on GitHub. Reads were sorted by
their multiplexing tags (the Xs in the primers above) and removed if they failed either of
two quality filters: 1) the average Illumina quality score for the double barcode insert region must
be greater than 30, and 2) the double barcode region must contain the fixed sequence
ATAACTTCGTATAATGTATGCTAT with fewer than 2 mismatches. After filtering, MATa and
MATα barcodes were extracted from the sequencing reads and fused into one double barcode.
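The two read filters can be expressed as a small predicate. This is a sketch rather than the published code: it assumes that quality scores have already been decoded to integers, that mismatches are substitutions only, and that the fixed sequence may start at any position within the insert.

```python
FIXED = "ATAACTTCGTATAATGTATGCTAT"  # fixed sequence quoted in the text

def mean_quality(quals):
    """Average of the per-base quality scores for the insert region."""
    return sum(quals) / len(quals)

def contains_fixed(insert, fixed=FIXED, max_mismatches=1):
    """True if some window of the insert matches the fixed sequence with <= max_mismatches."""
    for start in range(len(insert) - len(fixed) + 1):
        window = insert[start:start + len(fixed)]
        if sum(a != b for a, b in zip(window, fixed)) <= max_mismatches:
            return True
    return False

def passes_filters(insert, quals, min_mean_quality=30):
    """Apply both filters: mean quality above the cutoff and fixed sequence present."""
    return mean_quality(quals) > min_mean_quality and contains_fixed(insert)
```

A read is kept only when both conditions hold, so a high-quality read lacking the fixed sequence is discarded just as a low-quality read is.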
We next used Bartender to cluster similar sequences into consensus double barcodes, with
the maximum allowable sequence distance (-d) set to 2 (111). One source of bias in these counts
that we wanted to avoid is PCR duplicates or other non-linearities between the amount of template
for a double barcode and the number of sequences observed for that double barcode. We removed
these errors by using UMIs (the Ns in the primers above). Specifically, 2 random 3-mers were
attached to each template DNA molecule in the first few rounds of PCR (see section ‘Library
preparation’ for more detail). Because the total sequence space of 2 random 3-mers (4^6 = 4,096
possibilities) is much larger than the target coverage for each double barcode (~100x), it is unlikely
that any two template DNA molecules from the same time point that contain the same double
barcode will be attached to identical pairs of 3-mers. Thus, sequence reads for a double barcode
that contained the same pair of 3-mers were counted as PCR duplicates and removed from our
final counts.
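UMI-based deduplication reduces to counting distinct UMI pairs per double barcode. A minimal sketch follows, with hypothetical barcode labels and UMIs.

```python
from collections import defaultdict

def count_unique_molecules(reads):
    """reads: iterable of (double_barcode, umi_pair).

    Reads sharing both the double barcode and the UMI pair are counted once.
    """
    seen = defaultdict(set)
    for barcode, umi_pair in reads:
        seen[barcode].add(umi_pair)
    return {barcode: len(umis) for barcode, umis in seen.items()}

umi_space = 4 ** 6  # two random 3-mers

reads = [
    ("bcA", ("ACG", "TTA")),
    ("bcA", ("ACG", "TTA")),   # PCR duplicate of the read above
    ("bcA", ("GGC", "TTA")),
    ("bcB", ("ACG", "TTA")),   # same UMIs, different double barcode: not a duplicate
]
counts = count_unique_molecules(reads)
```

Because duplicates are defined per double barcode, identical UMI pairs attached to different double barcodes are still counted separately.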
After the double barcode clusters were counted and filtered, only double barcode clusters
carrying the previously determined MATa and MATα segregant and parental barcodes were
extracted. All double barcodes with an average read count of <5 across the 4-5 time points were
removed, as accurately estimating fitness for samples with very low read counts is difficult (111).
In total, ~227,999 unique genotypes with an average of ~2.82 barcode replicates per genotype were
detected across conditions (Table S3.4). These numbers are lower than the theoretical number of
double barcodes, 240,000 diploids x 4 biological replicates = 960,000. This discrepancy may be
due to segregants failing to mate during the generation of diploid segregants or being lost during
rearraying procedures.
3.5.10. Counting double barcode reads
Previous studies have shown that PCR amplification of DNA sequences containing two
variable regions (for this study the MATa and MATα barcodes) separated by a fixed region can
lead to formation of undesired chimeric molecules due to template switching (109, 110). This can
result in erroneous double barcode counts and, in turn, errors in fitness estimation.
To identify PCR chimeras in each sample, we pooled data from all 4-5 time points and
counted the number of reads that contain each combination of a parental barcode and a haploid
segregant barcode. Because the parental strains were never mated to the haploid segregants, these
parental-segregant double barcodes must be PCR chimeras. We next counted the number
of times each single barcode is present in the entire data set regardless of the barcode it is paired
with. We then fit a linear model between the count of each PCR chimera and the product of the
total counts of its two single barcodes within the pool: # of copies of PCR chimera ~ # of copies of
barcode1 x # of copies of barcode2. Because template switching occurs randomly, we expect to
see a linear relationship between the number of PCR chimeras and the abundance of the involved
barcodes (average R^2 ≈ 0.623 across environments) (Table S3.5). Using this linear model, the
expected number of PCR chimeras for each double barcode was calculated. The expected number
of PCR chimeras was subtracted from the actual read counts to calculate the corrected counts. We
estimate that the PCR chimera rate is ~0.11 across environments (Table S3.5). Corrected counts
less than 0 were set to 0. All double barcodes with an average corrected read count of <5 across
the 4-5 time points were removed.
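The chimera correction can be sketched as a one-predictor least-squares fit followed by a subtraction that is floored at zero. The calibration values below are toy numbers, and the model is assumed to include an intercept.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def corrected_count(observed, total_bc1, total_bc2, intercept, slope):
    """Subtract the expected number of chimeric reads; negative results are set to 0."""
    expected_chimeras = intercept + slope * total_bc1 * total_bc2
    return max(0.0, observed - expected_chimeras)

# Toy calibration data: (total count of barcode 1, total count of barcode 2, chimera reads)
known_chimeras = [(100, 200, 2.0), (300, 200, 6.0), (500, 400, 20.0)]
products = [a * b for a, b, _ in known_chimeras]
chimera_reads = [c for _, _, c in known_chimeras]
intercept, slope = fit_line(products, chimera_reads)
```

The fit is calibrated only on parental-segregant pairs, which are chimeric by construction, and is then applied to every double barcode through `corrected_count`.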
3.5.11. Fitness estimation
The corrected read counts from the 4-5 time points were used to estimate fitness for each
double barcode with the maximum likelihood algorithm PyFitSeq, run with default settings (112).
Fitness estimates with low maximum likelihood scores were removed, as these fitness estimates
are most likely technical artifacts. Outliers with low maximum likelihood scores were defined as
scores more than 1.5 interquartile ranges below the first quartile. Additionally, for
strains with three or more replicates, outlier replicates with significantly different fitness estimates
were removed, as these are most likely due to spontaneous mutations that significantly alter fitness.
Outlier replicates were determined by examining all pairwise differences in fitness estimates
between barcode replicates for the same strain. Based on this distribution, outlier replicates were
defined as replicates whose fitness is more than 0.5 fitness units different from the other replicates.
For strains with only two replicates, if the fitness estimate difference between the two replicates
was greater than 0.5, both replicates were removed. Finally, all strains with only 1 replicate were
omitted from further analysis as we have no way to assess the quality of the fitness estimate.
Overall, fitness estimates for ~548,046 (~190,604 genotypes with average of ~2.88 replicates per
genotype) double barcodes per environment were used for the remaining downstream analysis
(Table S3.2).
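Two of the filtering rules above are simple enough to sketch directly: the lower fence for likelihood scores and the rule for strains with exactly two replicates. The quartile definition is an assumption (Python's default differs slightly from R's default), and the scores are toy values.

```python
import statistics

def low_score_fence(scores):
    """Scores below Q1 - 1.5 * IQR are treated as outliers."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    return q1 - 1.5 * (q3 - q1)

def filter_pair(fitness_a, fitness_b, max_diff=0.5):
    """For a strain with exactly two replicates: keep both or drop both."""
    return [fitness_a, fitness_b] if abs(fitness_a - fitness_b) <= max_diff else []

scores = [-10.0, -11.0, -9.5, -10.5, -10.2, -9.8, -30.0]  # toy likelihood scores
fence = low_score_fence(scores)
kept = [s for s in scores if s >= fence]
```

The rule for strains with three or more replicates is not sketched here, because the text does not fully specify how the outlying replicate is identified.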
3.5.12. Heritability estimates
Broad-sense heritability was estimated using the reproducibility of phenotype across
replicates. The linear model fitness ~ genotype was used, where genotype is a categorical variable
that groups the 2 to 4 biological replicates of each diploid segregant. Broad-sense
heritability was calculated by taking the sum of squares of genotype and dividing it by the total
sum of squares. Because calculating broad-sense heritability for the entire dataset requires a large
memory overhead, we instead calculated 1,000 broad-sense heritability estimates, each from a
randomly selected subset of 2,500 genotypes. The reported broad-sense heritability and standard error
are the mean and the standard error across the 1,000 estimates.
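For a single categorical predictor, the ratio of the genotype sum of squares to the total sum of squares can be computed directly from group means. The sketch below illustrates that one-way decomposition on toy data; it is not the R code used for the reported estimates.

```python
def broad_sense_h2(fitness_by_genotype):
    """fitness_by_genotype: dict mapping genotype -> list of replicate fitness values."""
    values = [v for reps in fitness_by_genotype.values() for v in reps]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    # Between-genotype sum of squares: replicate count times squared deviation of the group mean.
    ss_genotype = sum(len(reps) * (sum(reps) / len(reps) - grand_mean) ** 2
                      for reps in fitness_by_genotype.values())
    return ss_genotype / ss_total

toy = {"g1": [1.0, 1.2], "g2": [2.0, 2.2], "g3": [3.0, 3.2, 3.1]}  # toy replicate fitness values
h2 = broad_sense_h2(toy)
```

When replicates of the same genotype agree closely relative to the spread among genotypes, the ratio approaches 1.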
Variances explained by additive, dominance, and epistasis effects were estimated using the
‘sommer’ package in R
(132). The additive, dominance, and epistasis relationship matrices were
calculated using the A.mat(), D.mat(), and E.mat() functions, respectively. Variances explained by
additive, dominance, and epistasis were then estimated by dividing the variance of the respective
components by the total variance. Similar to the broad-sense heritability estimates, we calculated
1,000 variance estimates, each from a randomly selected subset of 2,500 genotypes. The reported
variance estimates and standard errors are the mean and the standard error across the 1,000
estimates.
3.5.13. Quantile normalization of fitness
For each genotype in each condition, the average fitness estimate across all barcode
replicates was calculated. Because the distribution of the average fitness estimates was slightly
left-skewed, the estimates were quantile normalized to a normal distribution
using the ‘bestNormalize’ R package (Figure S3.6) (133). This quantile-normalized
fitness estimate was used as the fitness phenotype for all downstream analyses.
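Quantile normalization to a normal distribution can be illustrated with a rank-based inverse normal transform. This is a generic sketch: the specific transform applied by bestNormalize is not stated above, and the (rank - 0.5) / n offset is an assumption.

```python
from statistics import NormalDist

def rank_inverse_normal(values):
    """Map values to standard normal quantiles by rank; ties receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1          # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n = len(values)
    return [NormalDist().inv_cdf((r - 0.5) / n) for r in ranks]

z = rank_inverse_normal([0.1, 0.5, 0.4, -2.0, 0.45])  # toy left-skewed values
```

The transform preserves the rank order of genotypes while discarding the shape of the original distribution, which is why a strongly negative value such as -2.0 is pulled in toward the rest.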
3.5.14. Genome-wide scan for one-locus effects
Our experimental design resulted in genotypes that are not equally related to each other.
For example, two diploid segregants that share a parental haploid segregant are more related to
each other than two diploids that do not share a parental haploid segregant. We found that differences
in relatedness had a large effect on the fitness of the diploid segregants. Thus, failing to account
for family structure could inflate type I error and result in false positives. To avoid this,
we detected loci with one-locus effects using FaST-LMM (113, 114), which fits a
mixed effects linear model (MLM) with a spectrally decomposed genetic similarity matrix
(GSM).
We used FaST-LMM’s single_snp function to perform a genome-wide linear mixed effect
analysis. Prior to running FaST-LMM, the genotype table was converted to a binary biallelic
genotype table (BED) using PLINK version 1.07 (134). To avoid proximal contamination, all SNP
markers located on the same chromosome as the SNP marker being tested were
omitted from the calculation of the GSM by setting the leave_out_one_chrom argument to True.
To determine appropriate significance thresholds, 1,000 permutations were conducted with the
correspondence between genotypes and phenotypes randomly shuffled each time. Among the
minimum p values obtained in the permutations, the fifth quantile was identified and used as the
threshold for determining significant loci.
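The permutation procedure can be sketched generically. In this sketch the mapping model itself is left to a caller-supplied function (`min_p_for`, a hypothetical placeholder for a genome-wide scan that returns its smallest p-value), and the "fifth quantile" is interpreted as the 5th percentile of the permutation minima.

```python
import random

def permutation_threshold(phenotypes, min_p_for, n_permutations=1000, alpha=0.05, seed=0):
    """Shuffle phenotypes, record the minimum p-value of each scan, return the alpha quantile.

    min_p_for: function mapping a list of phenotypes to the smallest p-value of a
    genome-wide scan against the (unshuffled) genotypes. It is a placeholder here.
    """
    rng = random.Random(seed)
    shuffled = list(phenotypes)
    minima = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)            # breaks the genotype-phenotype correspondence
        minima.append(min_p_for(shuffled))
    minima.sort()
    rank = max(1, round(alpha * n_permutations))   # e.g. the 50th smallest of 1,000
    return minima[rank - 1]
```

Taking the minimum p-value per permutation controls the genome-wide error rate, since a locus is called only if its p-value is smaller than what the best marker achieves by chance in about 5% of shuffled data sets.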
To detect loci that are closely linked to a nearby locus, we ran multiple rounds of stepwise
forward regression in FaST-LMM. For the second round of forward regression, we included the
most significant SNP marker from each chromosome that was above the significance threshold
from the initial genome-wide scan as a covariate in the mixed effect linear model. In the third
round, all SNP markers detected in the first two rounds were added as covariates in the mixed
effect model. This process was repeated until no significant SNP markers were detected. If the
detected SNP markers were within a 95% confidence interval of each other, the loci were
consolidated and the SNP marker with the most significant p-value was used.
3.5.15. Family-level scans for one-locus effects
We also performed scans for one-locus effects within families generated from the same
MATa haploid parental strain. Results from families generated from the same MATα haploid
parental strains were not used, as the difference in sample size (~600 diploids in MATa families vs
~400 diploids in MATα families) substantially affected the statistical power to detect loci. Because
all genetic differences between diploids within a family are random, they do not need to be
corrected for family structure prior to mapping. Genome-wide mapping was conducted in each
family using a fixed effects linear model in R using the lm() function: lm(fitness ~ locus). The
fitness term corresponded to the vector of quantile normalized fitness values of individuals within
a family and the locus term corresponded to the vector of these individuals’ genotypes at a given
marker. Although there are three possible allele states at a locus in the aggregate population
(BY/BY, BY/3S, 3S/3S), a family of individuals derived from the same MATa haploid possesses
only two possible genotype states: BY/BY and BY/3S or BY/3S and 3S/3S, depending on the
allele present in the MATa progenitor. The locus term was included in each model as a categorical
variable, rather than numeric, such that the possible genotype states are treated as two independent
categories. The p-value of the locus term was obtained using the summary.aov() function in R. Among
markers exceeding a significance threshold, the most significant marker from each chromosome
was identified and utilized in subsequent rounds of forward regression. Specifically, identified loci
were included in the fixed effects linear model as a covariate, e.g. fitness ~ known_locus1 +
known_locus2 + … known_locusN + locus. This process was repeated until no additional locus
terms were discovered within an environment. The significance threshold for these forward scans
was established by pooling the p values from 1,000 permutations of 10 randomly selected families.
In each permutation, the correspondence between genotype and phenotype was shuffled within a
family. From the resulting distribution of minimum p-values obtained in each permutation, the
fifth quantile was identified and used as the significance threshold. After calling peaks, loci
detected in multiple families were consolidated using their 95% confidence intervals within each
family.
3.5.16. Effect size of loci across different families
The effect size of a locus detected in the family-level scans was calculated separately in
each individual family. There are three possible genotype states at a locus in the aggregate
population (BY/BY, BY/3S, 3S/3S). However, because each family is derived from a particular
MATa segregant, which carries specific alleles, there are only two possible genotype states at a
locus within a family. In each family, the mean fitness of diploids with either genotype state was
calculated. If the genotypes at a locus were 3S/3S and BY/3S, the mean fitness of heterozygotes
(one 3S allele) was subtracted from the mean fitness of the homozygotes (two 3S alleles). If
the genotype states present at a locus were BY/BY and BY/3S, the mean fitness of the
homozygotes (zero 3S alleles) was subtracted from that of the heterozygotes (one 3S allele). Two different
formulas were used so that the calculated effect size always measures the change in fitness
when the number of 3S alleles increases by one, thus retaining comparability across
families.
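Both subtractions described above take the genotype class with more 3S alleles minus the class with fewer, so they can be written as a single function of the 3S allele count. A minimal sketch follows, in which the encoding of genotypes as 0, 1, or 2 copies of the 3S allele is an assumption made for illustration.

```python
def family_effect_size(n_3s_alleles, fitness):
    """Mean fitness difference per additional 3S allele within one family.

    n_3s_alleles: number of 3S alleles (0, 1, or 2) carried by each diploid at the locus.
    fitness: fitness value of each diploid, in the same order.
    """
    states = sorted(set(n_3s_alleles))
    if len(states) != 2 or states[1] - states[0] != 1:
        raise ValueError("expected exactly two genotype states differing by one 3S allele")

    def mean_for(state):
        vals = [f for g, f in zip(n_3s_alleles, fitness) if g == state]
        return sum(vals) / len(vals)

    # Class with more 3S alleles minus class with fewer.
    return mean_for(states[1]) - mean_for(states[0])
```

For example, `family_effect_size([0, 0, 1, 1], [1.0, 1.0, 0.5, 0.5])` gives -0.5, meaning that the 3S allele lowers fitness in that family.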
3.5.17. Correcting for family structure
For detection of dominance effects and interactions, family structure was corrected for using
a previously described strategy (105). Specifically, we first estimated the fitness of each parental
haploid segregant by calculating the mean fitness of all diploids that originated from each parental
strain (additive genetic background contribution to fitness). Next, the expected midparent value
for each diploid was obtained by taking the average of the 2 parental fitnesses. If phenotypic
variation is due to only additive effects, then we expect the fitness of a diploid to be the same as
its midparent value. Deviations from the midparent value were then determined by fitting a linear
model between fitness and the midparent value, fitness ~ midparent value, and taking the residuals
(hereafter referred to as ‘residuals’). This not only removes the additive portion of each
diploid’s phenotype but also effectively accounts for family structure (105). These residuals, or the
non-additive portions of each diploid’s phenotype, were used as the phenotype in the subsequent
scans for loci involved in non-additivity, such as dominance and genetic interactions.
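The midparent correction can be sketched in a few lines: estimate each haploid parent's fitness as the mean of its diploid offspring, average the two parental values, regress fitness on that midparent value, and keep the residuals. The input layout (tuples of parent labels and fitness) is an assumption made for illustration.

```python
from collections import defaultdict

def midparent_residuals(diploids):
    """diploids: list of (mata_parent, matalpha_parent, fitness). Returns one residual per diploid."""
    by_parent = defaultdict(list)
    for mata, matalpha, fitness in diploids:
        by_parent[("a", mata)].append(fitness)
        by_parent[("alpha", matalpha)].append(fitness)
    parent_mean = {p: sum(v) / len(v) for p, v in by_parent.items()}

    # Midparent value: average of the two parental fitness estimates.
    x = [(parent_mean[("a", a)] + parent_mean[("alpha", al)]) / 2 for a, al, _ in diploids]
    y = [f for _, _, f in diploids]

    # Least-squares fit of fitness ~ midparent value, then residuals.
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
```

In a fully additive toy cross, where each diploid's fitness is the sum of two parental contributions, the residuals are all zero, which is the behavior the midparent argument predicts.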
3.5.18. Genome-wide scans for loci with dominance
To detect loci with one-locus non-additive effects (e.g. dominance), we ran multiple rounds
of forward regression. In the first round, a genome-wide scan was conducted using the following
fixed effects linear model: residuals ~ locus, where locus is the genotype of the diploids at a given
SNP marker encoded as a categorical variable. Here, the significance of the locus term was tested.
The significance threshold was determined using 1,000 permutations in which the correspondence
between genotypes and residuals was randomly shuffled each time. Among the minimum p values
obtained in the permutations, the fifth quantile was identified and used as the threshold for
determining significant loci. The most significant SNP marker from each chromosome that was
above the significance threshold was identified as a significant locus. For subsequent rounds of
forward regression, identified loci were included in the fixed effects linear model as a covariate,
e.g. residuals ~ known_locus1 + known_locus2 + … known_locusN + locus. The most significant
SNP marker from each chromosome that was above the significance threshold was again identified
as a significant locus. This process was repeated until no additional locus terms were discovered
for an environment.
3.5.19. Calculating degree of dominance
For each locus detected in the scan for dominance effects, the magnitude of dominance was
calculated in the following manner. First, the data were subset based on the genotype state at the
focal locus (e.g. BY/BY, BY/3S, 3S/3S). Then, the mean fitness value for each genotype class was
calculated. Dominance effect sizes for each focal locus were calculated by subtracting the average
of the mean fitness of diploids that are BY/BY and 3S/3S at the focal locus (i.e., the additive
expectation for heterozygotes) from the mean fitness of diploids that are BY/3S at the focal locus.
Dominance effect sizes were then normalized based on the absolute additive effect sizes of the 3S
allele relative to the BY allele at the same locus. Additive effect sizes of the 3S allele were calculated
by subtracting the mean fitness of diploids that are BY/BY at the focal locus from the mean fitness
of diploids that are 3S/3S at the focal locus and dividing by two. To get the normalized dominance
effect size, dominance effect size was divided by the absolute additive effect size of the 3S allele.
Normalized dominance values ranging from -0.9 to 0 were classified as incomplete dominance for
the deleterious allele, while values ranging from 0 to 0.9 were classified as incomplete dominance
for the allele conferring higher fitness. Values between -1.1 and -0.9 or between 0.9 and 1.1 were
classified as complete dominance for the deleterious and fit allele, respectively. Values less than
-1.1 or greater than 1.1 were classified as underdominant or overdominant, respectively.
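The normalization and classification can be sketched as follows. Two points are assumptions of this sketch: the dominance deviation is taken as the heterozygote mean minus the additive expectation, so that positive values mean the heterozygote lies closer to the fitter homozygote (as the classification requires), and values falling exactly on a boundary (0, ±0.9, ±1.1) are assigned as noted in the comments, since the text does not say which class they belong to.

```python
def normalized_dominance(mean_by_by, mean_by_3s, mean_3s_3s):
    """Dominance deviation divided by the absolute additive effect of the 3S allele."""
    additive = (mean_3s_3s - mean_by_by) / 2    # additive effect of the 3S allele
    midpoint = (mean_by_by + mean_3s_3s) / 2    # additive expectation for heterozygotes
    dominance = mean_by_3s - midpoint           # sign convention assumed, see lead-in
    return dominance / abs(additive)

def classify(value):
    """Assign a dominance class; boundary handling is an assumption."""
    if value < -1.1:
        return "underdominant"
    if value <= -0.9:
        return "complete dominance, deleterious allele"
    if value < 0:
        return "incomplete dominance, deleterious allele"
    if value < 0.9:            # a value of exactly 0 (pure additivity) falls here
        return "incomplete dominance, fit allele"
    if value <= 1.1:
        return "complete dominance, fit allele"
    return "overdominant"
```

For instance, genotype means of 0.0 (BY/BY), 1.0 (BY/3S), and 1.0 (3S/3S) give a normalized value of 1.0, which is classified as complete dominance of the fitter allele.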
3.5.20. Comprehensive scan for pairwise interactions
We conducted a genome-wide scan for pairwise interactions using the following linear
fixed effect model: residuals ~ locus1 + locus2 + locus1:locus2. Here, the significance of the
locus1:locus2 interaction term was tested. Simpler terms were included in each model to ensure
that variances due to one-locus non-additive effects (e.g. dominance) were not erroneously
attributed to more complex terms. All locus terms were treated as categorical values, including the
locus1:locus2 interaction term, such that all 9 genotype classes (BY/BY--BY/BY,
BY/BY--BY/3S, BY/BY--3S/3S, BY/3S--BY/BY, BY/3S--BY/3S, BY/3S--3S/3S, 3S/3S--BY/BY,
3S/3S--BY/3S, 3S/3S--3S/3S) are treated as independent categories rather than continuous numerical
values. The significance threshold was determined by conducting 1,000 permutations with the
correspondence between genotype and residuals shuffled each time. Among the minimum
p-values obtained in each permutation, the fifth quantile was identified and used as the significance
threshold for our comprehensive scan of all two-locus interactions. To reduce computational time,
10,000 pairs of loci were randomly selected and tested in each permutation rather than all possible
pairs of loci. All significant pairs of loci where both loci were within a 95% confidence interval of
each other were consolidated: for each set of overlapping loci, the SNP marker with the most
significant p-value was used for downstream analysis while less significant markers were recorded
but treated as the same genetic effect and not used downstream.
During our comprehensive scan for pairwise interactions, we detected several hubs that
were involved in large numbers of interactions. These interactions often spanned the entire length
of the genome, making it difficult to differentiate sites interacting with these hubs. In such cases,
a forward regression approach was conducted. Specifically, we first ran the same fixed effects
linear model as the comprehensive scan for pairwise interactions, but with locus1 fixed for a hub
locus, e.g. residuals ~ hub + locus2 + hub:locus2. The significance of the hub:locus2 interaction
term was tested. Significance thresholds were determined using the same permutation strategy as
the comprehensive scan. The most significant interacting locus from each chromosome that was
above the significance threshold was identified as a significant interactor of a hub locus. For
subsequent rounds of forward regression, all identified interactions were included in the fixed
effects linear model as covariates, including the simpler terms, e.g. residuals ~ hub + locus2 +
known_locus1 + known_locus2 + … known_locusN + hub:known_locus1 + hub:known_locus2 +
… hub:known_locusN + hub:locus2. The most significant interacting locus from each
chromosome that was above the significance threshold was again identified as a significant
interactor with a hub locus. This process was repeated until no additional hub:locus2 terms were
discovered for each environment. To avoid double counting interactions, all pairwise interactions
identified in the comprehensive scan that involved a locus within 100 kb of a hub were removed.
3.5.21. Scan for three-locus interactions
A comprehensive scan for three-locus effects using all SNP markers is computationally
expensive. Instead, we scanned for three-locus effects using a smaller subset of 579 SNP markers,
where each SNP marker was at least 5cM apart. The following fixed effects linear model was used:
residuals ~ locus1 + locus2 + locus3 + locus1:locus2 + locus1:locus3 + locus2:locus3 +
locus1:locus2:locus3. Similar to the scan for pairwise interactions, all locus and interaction terms
were treated as categorical values. Here, the significance of the locus1:locus2:locus3 interaction
term was tested. Simpler terms were included in each model to ensure that variances due to
one-locus non-additive effects (e.g. dominance) and pairwise interactions were not erroneously
attributed to more complex terms. As before, the significance threshold was determined by
conducting 1,000 permutations with the correspondence between genotype and residuals shuffled
each time. Among the minimum p-values obtained in each permutation, the fifth quantile was
identified and used as the significance threshold for our scan of three-locus
interactions. Similar to the pairwise interaction scan, 10,000 trios of loci
were randomly selected for each permutation rather than all possible trios of loci. All significant
three-locus interactions where all three loci were within a 95% confidence interval of each other
were consolidated: for each set of overlapping loci, the SNP marker with the most significant
p-value was used for downstream analysis while less significant markers were recorded but treated
as the same genetic effect and not used for downstream analyses.
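The computational motivation for thinning the marker set can be made concrete by counting the trios to be tested. The sketch below compares the full set of 7,742 collapsed markers with the 579-marker subset; both counts come from the text, and the calculation is included only for illustration.

```python
import math

all_markers = 7742      # collapsed SNP markers
subset_markers = 579    # markers at least 5 cM apart

trios_all = math.comb(all_markers, 3)
trios_subset = math.comb(subset_markers, 3)
fold_reduction = trios_all / trios_subset
```

The subset requires about 3.2x10^7 models per scan, compared with more than 7x10^10 for all markers, a reduction of over three orders of magnitude.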
3.5.22. Resolving hub loci to single-gene resolution using recombination
breakpoints
Hub loci were resolved into single causal genes through fine mapping, using data from
family-level scans in each of the 392 MATa families. For each hub locus, confidence intervals were
checked in each family with a detection, and the minimum intervals were recorded to determine
the tightest possible interval around the peak of a locus. Family-level confidence intervals in which
the 3’ marker occurred before the 5’ marker in the family where a hub had the most significant
effect were discarded. These intervals were used to resolve each hub to a minimum number of
haplotype blocks.
3.5.23. Estimation of the fraction of epistatic effect involving dominance
To understand how a locus modifies the effect of an interacting locus in a pairwise
interaction, each pairwise interaction was partitioned into four epistasis types: additive-additive,
dominance-additive, additive-dominance, and dominance-dominance {ref}. The following linear
fixed effect model was used: residuals ~ locus1 + locus2 +
additive_locus1:additive_locus2(a1:a2) + dominance_locus1:additive_locus2(d1:a2) +
additive_locus1:dominance_locus2(a1:d2) + dominance_locus1:dominance_locus2(d1:d2).
Additive_locus terms were treated as numerical values, while locus and dominance_locus terms
were treated as categorical values. The percent variance explained (PVE) by each of the four
epistatic types was then estimated by taking the sum of squares of each interaction term and
dividing it by the total sum of squares. Using these PVE values for each interaction term, the
fraction of epistatic effect involving dominance was then calculated. To estimate the fraction in
which locus1’s modifying effect on locus2 acts on dominance, the following equation was used:
(d1:a2 + d1:d2) / (a1:a2 + d1:a2 + a1:d2 + d1:d2). Conversely, to estimate the fraction in which
locus2’s modifying effect on locus1 acts on dominance, the following equation was used: (a1:d2
+ d1:d2) / (a1:a2 + d1:a2 + a1:d2 + d1:d2).
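The two fractions defined above can be written out as a short Python sketch. The function name and the dictionary layout are illustrative assumptions; only the two ratios themselves come from the text.

```python
def dominance_fractions(pve):
    """Compute the two dominance fractions for one pairwise interaction.

    pve maps each of the four interaction terms ('a1:a2', 'd1:a2',
    'a1:d2', 'd1:d2') to its percent variance explained.
    """
    terms = ("a1:a2", "d1:a2", "a1:d2", "d1:d2")
    total = sum(pve[t] for t in terms)
    if total <= 0:
        raise ValueError("total epistatic PVE must be positive")
    # First equation in the text: (d1:a2 + d1:d2) / total.
    frac_d1 = (pve["d1:a2"] + pve["d1:d2"]) / total
    # Second equation in the text: (a1:d2 + d1:d2) / total.
    frac_d2 = (pve["a1:d2"] + pve["d1:d2"]) / total
    return frac_d1, frac_d2
```

Because the d1:d2 term appears in both numerators, the two fractions need not sum to one.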
3.5.24. Examination of how combinations of modifiers explain variance
in hub effect size
Using the PVE values for the four epistatic types (see section ‘Estimation of the fraction of
epistatic effect involving dominance’), we examined how loci that interact with a hub contribute
to additive and dominance effect size variance across different genetic backgrounds. For each
two-locus interaction involving a hub, the magnitude by which the hub modifier affects the additive
component of the hub (‘add_mod’) was determined by adding the PVE of the a1:a2 and a1:d2
interaction terms, where locus1 is the hub. The magnitude by which the hub modifier affects the
dominance component of the hub (‘dom_mod’) was determined by adding the PVE of the d1:a2 and
d1:d2 interaction terms. Large outliers in add_mod or dom_mod, defined as values
more than 1.5 times the interquartile range above the third quartile, were identified as major effect
modifiers.
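A minimal Python sketch of the add_mod and dom_mod calculation and of the outlier rule is given below. The function names are illustrative, and the quartile convention (the default of `statistics.quantiles`) is an assumption, since the text does not specify how quartiles were computed.

```python
import statistics


def modifier_magnitudes(pve):
    """Return (add_mod, dom_mod) for one hub-modifier pair; locus1 is the hub."""
    add_mod = pve["a1:a2"] + pve["a1:d2"]
    dom_mod = pve["d1:a2"] + pve["d1:d2"]
    return add_mod, dom_mod


def major_effect_indices(values):
    """Indices of values more than 1.5 x IQR above the third quartile."""
    # Quartile convention is an assumption (default 'exclusive' method).
    q1, _median, q3 = statistics.quantiles(values, n=4)
    cutoff = q3 + 1.5 * (q3 - q1)
    return [i for i, v in enumerate(values) if v > cutoff]
```

The same outlier function would be applied separately to the add_mod values and to the dom_mod values of all modifiers of a hub.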
To examine how modifiers collectively change the effect size of hubs, four modifiers with
the largest modifying effects on additivity and dominance were chosen for each hub. Data were
then subset into 81 genotype classes (3^4) based on the genotype state at the four modifiers. For each
genotype class, the data were further subset based on the genotype state of the focal hub locus.
Additive effect sizes for each genotype class were calculated by subtracting the mean fitness of
diploids that are BY/BY at the focal hub from the mean fitness of diploids that are 3S/3S at the
focal hub. Dominance effect sizes for each genotype class were calculated by subtracting the mean
fitness of diploids that are heterozygous at the focal hub from the average of the mean fitness of
diploids that are BY/BY and 3S/3S at the focal hub.
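The genotype classes and the two effect-size definitions above can be sketched in Python as follows. The genotype labels and function names are illustrative assumptions; the arithmetic follows the definitions in the text.

```python
from itertools import product
from statistics import mean

GENOTYPES = ("BY/BY", "BY/3S", "3S/3S")


def genotype_classes(n_modifiers=4):
    """All combinations of genotype states at the modifiers (3^4 = 81 for four)."""
    return list(product(GENOTYPES, repeat=n_modifiers))


def hub_effects(fitness_by_hub_genotype):
    """Additive and dominance effect of the hub within one genotype class.

    fitness_by_hub_genotype maps each hub genotype to the list of fitness
    values of the diploids in this class that carry that genotype.
    """
    by = mean(fitness_by_hub_genotype["BY/BY"])
    het = mean(fitness_by_hub_genotype["BY/3S"])
    s3 = mean(fitness_by_hub_genotype["3S/3S"])
    additive = s3 - by                # 3S/3S mean minus BY/BY mean
    dominance = (by + s3) / 2 - het   # homozygote average minus heterozygote mean
    return additive, dominance
```

Under these definitions, a dominance effect of zero means that heterozygotes sit exactly midway between the two homozygous classes.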
3.6. Supplementary materials
Figure S3.1 Workflow of how haploid and diploid segregants were barcoded. A. Each parental strain
was engineered to have a genomic “landing pad” containing two partially crippled LoxP sites, LoxP/71 and
Lox2272/71, and a galactose-inducible Cre recombinase at the YBR209W locus. B. pBAR6 plasmid used
to make the library of barcoding plasmids for MATα segregants. C. Barcoding plasmids were made by
combining linearized pBAR6 with a PCR product containing a partially crippled Lox2272/66 site, a random
20-mer barcode sequence, and a partial TruSeq read 2 adapter sequence using Gibson assembly. The
barcoding plasmids were then individually transformed into each MATα segregant and integrated into the
genome at the genomic “landing pad” using Cre recombinase-mediated homologous recombination at the
Lox2272 site. D. MATa segregants were barcoded by integrating a PCR product containing a split URA3
marker, a partial TruSeq read 1 adapter sequence, a random 20-mer barcode, and a partially crippled
LoxP/66 at the genomic “landing pad” using CRISPR/Cas9-mediated homologous recombination. E. After
the MATa and MATα segregants were mated, the two barcodes were brought onto the same chromosome
using site-directed chromosomal translocation via Cre-LoxP homologous recombination.
Figure S3.2 Genome-wide allele frequencies of haploid segregants and the inferred genome-wide
allele frequencies of diploid segregants. A. Genome-wide allele frequencies of haploid BYx3S
segregants. With the exception of a region on Chromosome IV, allele frequencies were balanced in the
BYx3S segregants used to generate the diploid population. B. Inferred genome-wide allele frequencies of
diploid BYx3S strains. Genotypes for diploid strains were generated in silico using sequencing data from
segregants containing the barcodes present in each diploid.
Figure S3.3 Correlation of fitness estimates between strain replicates in each experiment. Between
two and four barcoded replicates of all strains were included in each pooled fitness assay. Fitness estimates
between strain replicates were well correlated (0.524 ≤ Spearman’s ρ ≤ 0.8) within each experiment.
Figure S3.4 Standard error as a function of mean strain fitness. A negative relationship was observed
between the standard error and the mean fitness of a strain.
Figure S3.5 Correlation of fitnesses across environments. Cells in the heatmap contain Spearman’s
correlation coefficients of the mean fitnesses of strains across all environments. Two environments, CoCl2
and CuSO4, were less correlated with other environments and each other.
Figure S3.6 Quantile normalization of fitnesses. Because the distributions of fitness were slightly left
skewed, mean fitnesses were quantile normalized. This quantile normalized fitness estimate was used as
the fitness phenotype for all downstream analyses.
Figure S3.7 Most loci are detected as significant without family correction. Histogram showing the p-
values of all 7,742 SNP markers when tested for significance using a fixed effects linear model without
correcting for family structure. The red line shows the nominal significance threshold at p-value = 0.05.
Figure S3.8 Loci with individual effects on fitness in each environment. Panels from top to bottom are
1) loci detected by the mixed effects linear model FaST-LMM (red bars), 2) dominance loci detected by
the fixed effect linear model using the non-additive portions of each diploid’s phenotype (green bars), 3)
loci detected by the family-level scans for each MATa family (black or blue points), and 4) the total number
of detections across families for each 50 kb interval (grey bars).
Figure S3.9 Accounting for differences in mean parental fitness corrects for family-level fitness
effects in the population. A. The mean fitness of each of the 392 MATa and 591 MATα families in glucose,
ordered from lowest to highest mean fitness. B. The mean residuals, or the non-additive portions of each
diploid’s phenotype, for each family after accounting for mean parental fitness.
Figure S3.10 Pairwise genetic interactions detected in each environment. In each plot, interior lines
connect regions of the genome involved in pairwise interactions. Outer rings contain barplots with height
corresponding to the number of pairwise interactions at each locus (orange) and its absolute effect size
(green) in the entire population.
Figure S3.11 Three-way genetic interactions detected in each environment. In each circos plot, interior
lines connect regions of the genome involved in three-way interactions. Outer rings contain barplots with
height corresponding to the number of three-way interactions at each locus (orange) and its absolute effect
size (green) in the entire population.
Figure S3.12 Genetic variation at the mating locus results in distinct heterozygote classes. To ensure
segregation of the mating locus, both BY MATa x 3S MATα and 3S MATa x BY MATα crosses were
performed using isogenic strains that had been mating type switched. Pairwise matings between these
haploids results in two possible heterozygous genotypes at the mating locus in the resulting diploids: BY
MATa / 3S MATα and 3S MATa / BY MATα.
Drug/chemical       Concentration
Cobalt Chloride     6.675 µg/ml
Copper Sulfate      225 µg/ml
Hydrogen Peroxide   0.006%
Sodium Chloride     225 mM
Rapamycin           0.25 mM
Zeocin              625 µg/ml
Table S3.1 List of chemical additives and their concentrations used in the competition experiments.
Chemicals are added to SCM-Ura + 2% glucose (the Glucose1 and Glucose2 growth condition).
Environment 1_rep 2_rep 3_rep 4_rep total 2+_rep
CoCl2 28858 84334 43211 59599 216002 187177
CuSO4 25020 94378 43159 54555 217112 192096
Glucose1 32545 82656 41473 63861 220535 187990
Glucose2 28985 82262 41400 69888 222535 193550
H2O2 29557 86235 44804 57669 218265 188708
NaCl 28581 85554 43625 60870 218630 190050
Rapamycin 25441 87722 40534 65017 218714 193292
Zeocin 25632 82913 44274 64779 217598 191969
Table S3.2 Number of replicate diploid strains in each fitness assay. Column 1 (‘Environment’) lists
the eight pooled fitness assays (including glucose replicates) conducted in this study. Columns 2 - 5 (‘1_rep’
- ‘4_rep’) list how many diploid strains are present in one, two, three, or four barcoded strain replicates
within a given fitness assay. Column 6 (‘total’) lists the total number of diploid strains present in each
fitness assay. Column 7 (‘2+_rep’) is how many of the total strains are present at least twice in an
experiment.
Environment broad broad_se narrow narrow_se dom dom_se epi epi_se
CoCl2 0.621 0.011 0.507 0.035 0.012 0.008 0.22 0.038
CuSO4 0.572 0.011 0.453 0.042 0.01 0.008 0.247 0.044
Glucose1 0.745 0.007 0.517 0.025 0.071 0.012 0.147 0.024
Glucose2 0.762 0.007 0.525 0.025 0.076 0.012 0.144 0.024
H2O2 0.516 0.015 0.194 0.032 0.061 0.022 0.147 0.055
NaCl 0.553 0.013 0.343 0.037 0.04 0.016 0.156 0.049
Rapamycin 0.576 0.014 0.409 0.034 0.028 0.013 0.2 0.047
Zeocin 0.54 0.015 0.292 0.033 0.036 0.016 0.154 0.053
Table S3.3 Broad and narrow-sense heritability estimates for each fitness assay. Column 1 (‘Environment’) lists
the eight pooled fitness assays (including glucose replicates) conducted in this study. Columns 2 (‘broad’)
and 3 (‘broad_se’) list the broad sense heritability estimates and corresponding standard error values.
Columns 4 (‘narrow’) and 5 (‘narrow_se’) list the narrow sense heritability estimates and corresponding
standard error values. Columns 6 (‘dom’) and 7 (‘dom_se’) list estimates of phenotypic variance explained
by dominance and the corresponding standard error values. Columns 8 (‘epi’) and 9 (‘epi_se’) list
estimates of phenotypic variance explained by genetic interactions and the corresponding standard error
values.
Environment   Number of genotypes   Number of reps per genotype
CoCl2         230474                2.904
CuSO4         229726                2.782
Glucose1      221747                2.613
Glucose2      227025                2.828
H2O2          228008                2.814
NaCl          227207                2.782
Rapamycin     230373                2.908
Zeocin        229428                2.91
Table S3.4 The number of unique genotypes and the number of barcode replicates per genotype
detected before filtering for quality. Column 1 (‘Environment’) lists the eight pooled fitness assays
(including glucose replicates) conducted in this study. Column 2 (‘Number of genotypes’) lists the total
number of diploid strains detected before filtering for quality. Column 3 (‘Number of reps per genotype’)
lists the average number of barcoded strain replicates by which each diploid was represented before
filtering for quality.
Environment   Coefficient of determination   PCR chimera rate
CoCl2         0.695                          0.123
CuSO4         0.734                          0.167
Glucose1      0.488                          0.115
Glucose2      0.558                          0.098
H2O2          0.593                          0.109
NaCl          0.578                          0.075
Rapamycin     0.728                          0.108
Zeocin        0.611                          0.079
Table S3.5 Estimated rate of PCR chimeras. Column 1 (‘Environment’) lists the eight pooled fitness
assays (including glucose replicates) conducted in this study. Column 2 (‘Coefficient of determination’)
lists the R² value for the linear relationship between the number of PCR chimeras and the
abundance of the involved barcodes: # of copies of PCR chimera ~ # of copies of barcode1 * # of copies of
barcode2. Column 3 (‘PCR chimera rate’) lists the overall PCR chimera rate estimated using this linear
model.
Chapter 4: Widespread antagonistic pleiotropy causes differential fungal
persistence across mammalian organs
This work was performed in collaboration with Caleb Ghione, Michael Lough-Stevens,
Ilan Goldstein, Takeshi Matsui, Sasha F. Levy, Matthew D. Dean, and Ian M. Ehrenreich.
4.1. Abstract
Determining how genetic polymorphisms enable certain fungi to persist in mammalian
hosts can improve understanding of opportunistic fungal pathogenesis, a source of substantial
human morbidity and mortality. We examined the genetic basis of fungal persistence in mice using
a cross between a clinical isolate and the lab reference strain of the budding yeast Saccharomyces
cerevisiae. Employing chromosomally-encoded barcodes, we tracked the relative abundances of
822 genotyped, haploid segregants in multiple organs over time and performed linkage mapping
of their persistence in hosts. Detected loci showed a mix of general and antagonistically pleiotropic
effects across organs. General loci showed similar effects across all organs, while antagonistically
pleiotropic loci showed contrasting effects in the brain and the kidneys, liver, and spleen.
Persistence in an organ required both generally beneficial alleles and organ-appropriate pleiotropic
alleles. This genetic architecture resulted in many segregants persisting in the brain or in non-brain
organs, but few segregants persisting in all organs. These results show complex combinations of
genetic polymorphisms collectively cause and constrain fungal persistence in different parts of the
mammalian body.
4.2. Introduction
Fungi are a major class of opportunistic human pathogen, infecting billions and killing
millions of people per year (135, 136). Hundreds of diverse fungal species are known to infect
humans (137). These fungi mainly infect the immunocompromised, an increasing segment of the
global population due to improvements in medicine that have lowered the mortality associated
with life-threatening conditions (137). Such opportunistic infections can be difficult to treat (136–
139), but the identification of mechanisms enabling fungi to cause these infections may facilitate
the development of more effective antifungal therapies (140–146).
Many opportunistic fungal pathogens can be challenging to work with genetically
(147, 148). However, the budding yeast Saccharomyces cerevisiae, one of the main eukaryotic
model organisms in biology, is also an opportunistic pathogen, with numerous isolates obtained
from clinical infections (144–146, 149–151). Humans are regularly exposed to S. cerevisiae, as it
occurs naturally in the environment and is used in the production of beer, wine, bread, chocolate,
and other foods and dietary supplements (152–156). Notably, S. cerevisiae is in the same
Saccharomycetaceae family of Ascomycete yeasts as Candida, the main genus involved in
opportunistic fungal infections (e.g., Candida albicans) (157).
The ability to infect immunocompromised humans varies among S. cerevisiae strains
(145), with clinical isolates found throughout the species’ genealogy (151, 158). Despite their lack
of genetic relatedness, clinical S. cerevisiae isolates are thought to possess similar traits, including
the ability to attach to and penetrate surfaces and tolerance to human body temperature (100, 144,
145, 151, 155, 159). However, determining why certain strains are able to infect humans ultimately
requires mapping the specific genetic polymorphisms that cause opportunistic pathogenicity and
determining the traits they affect. Such work is challenging because it requires performing genetic
mapping in yeast inside mammalian hosts, such as mice.
The most powerful genetic mapping approach in S. cerevisiae is linkage mapping with
large mapping panels of haploid meiotic progeny (segregants) (89, 160). When two haploid
isolates are crossed and sporulated, their haploid segregants each receive a unique random
combination of alleles from their parents (161). This shuffling of genetic material makes it possible
to measure traits of interest in segregants and then link these traits to specific genomic locations
(loci) that cause phenotypic differences (162). Examination of large numbers of segregants
provides the statistical power to identify loci explaining most of the heritable differences in traits
of interest (89, 160).
To enable linkage mapping with an S. cerevisiae cross in mice, we generated a panel of
haploid segregants with known genotypes and chromosomally-encoded barcodes (160, 163),
which were genetically engineered for high-throughput phenotyping as a pool. We mated the lab
reference strain (BY) and a haploid derivative of the 322134S clinical isolate (3S) (158). BY and
3S are highly diverged at the sequence level, with a genetic difference present every ~270 base
pairs (20, 107). While 3S is an isolate obtained from the throat sputum of a patient with a clinical
infection (158), BY is a commonly used reference strain that is avirulent and rapidly cleared by
mice (144).
We used this BYx3S cross to obtain insights into how genetic differences among strains
influence the persistence of yeast in mice. This was done by injecting a pool of 822 barcoded
BYx3S MATα segregants into the mouse bloodstream and enumerating segregant abundances in
host organs over time by barcode sequencing. Linkage mapping with these data identified
numerous loci that explained most of the heritable variation in yeast persistence in hosts. Some
loci showed consistent effects across host organs (general loci), while others had counteracting
effects across host organs (antagonistically pleiotropic loci) (164–166), causing different
segregants to be superior in different organs. Our work advances the use of S. cerevisiae as a model
for opportunistic fungal pathogenicity and host-microbe interactions.
4.3. Results
4.3.1. Persistence of barcoded yeast strains inside a mammalian host
We crossed haploid BY and 3S strains that were genetically engineered to produce
segregants amenable to pooled, high-throughput phenotyping. FLO11 and FLO8, which
respectively encode the main cell surface flocculin in this organism (167, 168) and its primary
transcriptional activator (169), were deleted from these strains. These deletions eliminate cell
clumping and flocculation within and between segregants, which are problematic for pooled
experiments. However, they also diminish surface adhesion and invasion, limiting our insight into
these traits. BY and 3S were also engineered to have a genomic landing pad at the neutral
YBR209W locus, enabling site-specific integration of barcodes into segregants (109, 160, 163).
The engineered BY and 3S strains were mated to produce a BY/3S diploid, which was
sporulated. To ensure balanced allele frequencies and random multilocus genotypes among
segregants, we performed tetrad dissection and randomly chose and barcoded one MATα haploid
from each of 822 tetrads using transformation with a random barcode library (Figure S4.1) (109,
160, 163). 86 segregants were marked with two additional distinct random barcodes and these
replicates were included in our experiments. Illumina sequencing was used to genotype segregants
and determine their barcodes. All barcoded segregants and replicates were then grown individually
to stationary phase and combined into a single pool in equimolar fractions.
Thirty-six mice were infected with 1x10⁷ cells from the segregant pool by tail vein injection
(Figure 4.1a). Equal numbers of male, female, immunocompromised (injected with 4 mg/mL
dexamethasone [dex]), and immunocompetent (injected with water) mice were included. No
morbidity or mortality was observed. One-third of the mice were euthanized at each of three time
points (one, two, and five days post-injection). From each mouse, we harvested the brain, gonads,
kidneys, liver, and spleen. The five organs were dissected, homogenized, and plated on selective
media to isolate yeast from the mouse cells. On average, 69,150 colony forming units (CFU) were
recovered per liver, 32,032 CFU per spleen, 3,843 CFU per kidney pair, 1,741 CFU per brain, and
69 CFU per gonad pair. For every organ, recovery decreased over time, suggesting clearance of at
least some segregants by the mice (Figure 4.1b). Recovery was lowest in the brain and gonads,
which both have blood barriers (170–172).
Barcode sequencing was performed on all samples. For every barcode in each sample, we
divided the change in barcode frequency between the time of sampling and the initial pool by the
initial barcode frequency. A linear model was then used to correct these changes in barcode
frequency for differences in growth among segregants in on-plate controls, which were highly
reproducible across plating densities and the homogenization process (Figures S4.2 and S4.3).
Residuals from this correction for on-plate growth were used as persistence phenotypes in
downstream analyses. Of 180 processed samples (five organs x two sexes x two dex treatments x
three timepoints x three replicates), yeast were present in 166 and recovered from 157; only these
samples were analyzed further.
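The phenotype calculation described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the exact form of the linear model is not given in the text, so the sketch assumes a simple least-squares regression of each sample's relative frequency changes on the corresponding changes in a single on-plate control, and all function names are hypothetical.

```python
def relative_change(freq_sample, freq_initial):
    """Change in barcode frequency relative to the initial pool."""
    return (freq_sample - freq_initial) / freq_initial


def residuals_after_control(sample_changes, control_changes):
    """Residuals from regressing sample changes on on-plate control changes.

    Assumes simple linear regression with an intercept and one control
    covariate; the residuals serve as persistence phenotypes.
    """
    n = len(sample_changes)
    mean_x = sum(control_changes) / n
    mean_y = sum(sample_changes) / n
    sxx = sum((x - mean_x) ** 2 for x in control_changes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(control_changes, sample_changes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [y - (intercept + slope * x)
            for x, y in zip(control_changes, sample_changes)]
```

A segregant whose frequency change in an organ is fully predicted by its growth on plates receives a residual near zero, so only persistence not explained by on-plate growth is carried forward.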
We used the 86 segregants that were replicated in the pool to estimate broad sense
heritability (H²) in each sample. Across samples, H² ranged from 0 to 0.92 (median H² = 0.57). Our
ability to measure H² was strongly affected by yeast recovery from samples, with higher recovery
resulting in higher H² (simple linear regression of H² on CFU, R² = 0.77, p = 9.6x10⁻⁴⁸; Figure
4.1c). Variability in yeast recovery among samples was presumably due both to differences in
clearance among mice and organs and to technical factors associated with organ dissociation.
Despite limitations associated with recovering yeast from dissociated organs, the high H² values
in many samples show that genetic polymorphisms among segregants caused differences in
persistence in hosts.
To distinguish samples with significant differences in persistence among segregants, we
applied one-way analysis of variance (ANOVA) to each sample, again using the replicated
segregants. 94 samples showed significant differences among segregants (Bonferroni-corrected α
= 0.05 threshold, p < 3.7x10⁻⁴). Of these, two were excluded because they had distorted
measurements for persistent segregants, suggesting their sequencing libraries were of low quality
(Figure S4.4). Only a single gonad sample showed significant differences in persistence among
segregants; this sample, which had a lower H² value (0.29), was also omitted from later analyses
due to a lack of organ replicates (Figure 4.4). In the 91 remaining significant samples from the
brain, kidneys, liver, and spleen, the median H² was 0.75 (from 0.24 to 0.92), indicating most of
the variability among segregants in these replicated, higher quality samples was genetic in origin
(Figure 4.1c).
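As an illustration of how replicated segregants support both analyses, the Python sketch below computes a one-way ANOVA F statistic and a variance-component estimate of broad sense heritability. The text does not state how H² was computed, so the estimator used here (among-segregant variance divided by the sum of among-segregant and within-segregant variance) is an assumption, as are the function names. Converting F to a p-value requires the F distribution, which is omitted to keep the sketch within the standard library.

```python
def one_way_anova(groups):
    """Return (F statistic, H2 estimate) from replicate measurements.

    groups is a list of lists; each inner list holds the replicate
    phenotypes of one segregant.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    f_stat = ms_between / ms_within
    # Effective number of replicates per segregant (handles unbalanced data).
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)
    # Assumed estimator: genetic variance from the ANOVA mean squares.
    var_genetic = max((ms_between - ms_within) / n0, 0.0)
    h2 = var_genetic / (var_genetic + ms_within)
    return f_stat, h2


def bonferroni_threshold(alpha, n_tests):
    """Per-test p-value threshold after Bonferroni correction."""
    return alpha / n_tests
```

Truncating the genetic variance at zero means that samples with no detectable among-segregant variance receive H² = 0, consistent with the lower end of the range reported above.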
Figure 4.1. Experimental infections of mice using a pool of barcoded, haploid segregants. a, The
workflow and general design of the experiment. Haploid BY and 3S strains were crossed, the resulting
diploid was sporulated, and tetrads were dissected to generate a panel of 822 recombinant haploid progeny.
These strains were then barcoded with unique 20-mer nucleotide sequences and pooled together (T0). T0 was
used to infect mice using tail vein injection and simultaneously plated in triplicate onto control plates
containing rich medium. At three time points post-infection, organ samples were collected from infected
mice and plated on rich medium. After two days of growth on plates, DNA was extracted from the yeast
and sequenced to measure relative barcode abundance of each strain. DNA was also extracted from T0 to
measure initial barcode frequencies. For each strain, the change in normalized barcode frequency relative
to T0 after correcting for on-plate growth was used as the phenotype in all downstream analysis. b, Plot
reporting the CFU recovered from each organ sample across time points. Each panel shows the samples
from a different organ. Color of each point corresponds to the sex and immunological state of the mouse
from which the sample was recovered. Mean log10(CFU) over time is shown as a black line. c, Scatterplot
showing the broad-sense heritability (H²) of each organ sample as a function of the CFU recovered from
that sample. The color of each sample corresponds to organ type.
4.3.2. Antagonistic pleiotropy constrains localization of strains within the
host
We analyzed the relationships among the 91 samples using hierarchical clustering,
principal components analysis (PCA), and examination of pairwise correlations. All methods
gave the same result: the samples split into two clusters, brain and non-brain (kidneys, liver, and
spleen) (Figure 4.2a and b; Figure S4.5). In PCA, these groups were visible in the loadings on
the first principal component (PC1), which was the only PC to account for a meaningful portion of
the variance across samples (54.1%; other PCs explained ≤7.4% of the variance across samples;
Table S4.1). Whether a sample was from the brain or a non-brain organ was the only experimental
factor showing a major relationship with PC1, explaining 85.2% of the variance in PC1 in a
one-way ANOVA (p = 8.84x10⁻³⁹). Time and the interaction between brain vs. non-brain and time also
had minor significant effects, each explaining <2.7% of the variance in PC1 (full-factorial ANOVA
with brain vs. non-brain, time, and brain vs. non-brain-by-time interaction, factor effect test p <
2.94x10⁻³). Such time effects would be expected if selection acts on phenotypic differences among
segregants and these differences vary across organs. Immunological state and sex showed no
relationship with PC1, perhaps because we directly injected yeast into the bloodstream. Following
these results, we generated aggregate brain and non-brain measurements for each segregant by
averaging data from the 15 brain samples and 76 non-brain samples, respectively. These aggregate
measurements showed a weak but highly significant correlation (Spearman’s rho = 0.21, p =
1.78x10⁻⁹; Figure 4.2c).
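The aggregation and rank correlation described above can be sketched in Python as follows. The function names and input layout are illustrative assumptions: each sample is taken to be a list of phenotypes in a fixed segregant order, and ties receive average ranks.

```python
import math


def aggregate(samples):
    """Per-segregant mean across samples (each sample lists all segregants)."""
    return [sum(col) / len(col) for col in zip(*samples)]


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this layout, `spearman_rho(aggregate(brain_samples), aggregate(non_brain_samples))` would compare the two aggregate phenotypes, where both input names are placeholders for the per-sample phenotype lists.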
Figure 4.2. Organ type is the main driver of variation in persistence across samples. a, Heatmap
showing phenotypes of strains (x-axis) across organ samples (y-axis). Samples are clustered by organ type
and segregants are clustered by phenotype across non-brain samples. b, All pairwise comparisons of
segregant phenotypes between organs. Here, each segregant phenotype in an organ represents its average
measurement across all samples for that organ in which reproducible loci were detected. c, Comparison of
segregants’ aggregate phenotypes in the brain and non-brain samples.
These results show that segregants differ reproducibly in persistence between
brain and non-brain organs, and indicate that the genetic bases of persistence in these different
parts of the host body are likely only partially overlapping. We next sought to determine the genetic basis
of these differences in persistence within and between organs. We performed linkage mapping on
each of the samples with significant phenotypic differences among segregants, detecting 494 loci
in total (Figure 4.3a; Table S4.2). On average, 5.43 loci were identified per sample (min: 0, max:
12) and 90 of 91 samples had at least one detected locus that was also identified in another sample.
Multiple loci were mapped in the spleen (181 loci), liver (157 loci), kidney (101 loci), and brain
(55 loci), and many loci were detected in numerous samples (min = 2, max = 87, median = 4), as
expected if segregants show reproducible phenotypes across samples due to a common set of loci.
These detections could be consolidated to 35 distinct loci, based on overlapping confidence
intervals (Table S4.2). The number of loci identified in these samples showed a highly significant
relationship with H² (simple linear regression of number of loci on H², R² = 0.32, p = 4.02x10⁻⁹;
Figure S4.6), suggesting heterogeneity in measurement noise across samples impacted
statistical power.
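Consolidating detections by overlapping confidence intervals amounts to merging intervals on each chromosome. The Python sketch below shows one way to do this; it is an assumption-laden illustration rather than the procedure used here, since it chains overlaps transitively (two detections are merged if they are linked by a series of overlapping intervals) and uses hypothetical names and coordinates.

```python
def consolidate_loci(detections):
    """Merge detections whose confidence intervals overlap.

    detections is an iterable of (chromosome, ci_start, ci_end) tuples.
    Returns one (chromosome, start, end) tuple per consolidated locus.
    """
    merged = []
    for chrom, start, end in sorted(detections):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps the previous locus on the same chromosome: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged
```

Counting how many input detections fall into each merged interval would then give the number of samples in which each consolidated locus was detected.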
To improve statistical power and minimize the chance of false positives, we performed
linkage mapping on aggregate brain and non-brain measurements, as well as the difference
between the two; these measurements should be more precise than data from individual samples.
The scans on brain, non-brain, and difference measurements respectively identified nine, nine, and
10 loci (Figure 4.3b). Some loci were mapped in multiple of these scans, resulting in the
identification of 18 unique loci. Eight of these loci overlapped loci detected in controls, but in
these cases loci showed different effects between the samples and controls (Figure 4.3a through
4.3c; Figure S4.7). Loci detected in the brain, non-brain, and brain vs. non-brain scans explained
89.7%, 62.3%, and 83.6% of H² in their respective measurements.
Figure 4.3. Identification of loci associated with persistence in the host in organ samples. a,
Consolidated loci in the plate controls (top), consolidated loci across all organ samples (middle), and
individual loci detected in each organ sample (bottom) shown in descending order from greatest to least
number of loci detected. Corresponding broad sense heritability (H²) measurements for each sample are
shown to the right of each individual sample. Samples are colored by organ type. b, Loci detected in
genome-wide scans using aggregate data across samples (top), followed by loci detected using mean
segregant phenotypes in brain and non-brain samples, as well as the difference in mean phenotype between
brain and non-brain samples (bottom). Red loci have effects of the same sign in both brain and non-brain
samples (general), while blue loci have effects with opposite signs in brain and non-brain samples
(pleiotropic). * indicates two linked, but distinguishable, loci on chromosome XII. c, The effects of loci
detected in whole-genome scans using aggregate data from the brain samples and non-brain samples, as
well as the difference between brain and non-brain samples, are shown. Effects were calculated as the mean
persistence of strains with the 3S allele at the focal locus minus the mean persistence of strains with the BY
allele after correction for on-plate growth. The effect of each locus in the controls after correcting for
on-plate growth is also shown (gray).
The resolution of loci was poor, with confidence intervals from scans using aggregate data
spanning 58 kb (min: 9 kb, max: 100 kb) and 32.5 genes (min: 6, max: 62) on average Table S4.3).
To better resolve these loci, we leveraged confidence intervals from detections of these loci in
multiple individual samples. While average resolution was only slightly improved (mean interval
= 24 kb, mean number of genes = 12, min number of genes = 2, max number of genes = 39; Table
S4.4). The two most finely resolved loci were each localized to two candidate protein-coding genes
(173). A locus on Chromosome XIV spanned a subunit of the BLOC-1 complex involved in
endosomal maturation (SNN1) and a poorly understood, pleiotropic gene (MKT1) known to
influence many quantitative traits in S. cerevisiae (174–177). A locus on Chromosome XV
encompassed alcohol dehydrogenase (ADH1) and a gene regulated by phosphate levels (PHM7).
Additionally, a locus on Chromosome XII fractionated into two distinct intervals, one including
only the genes for DNA topoisomerase III (TOP3) and a thiamine transporter (THI7).
We next focused on understanding how the 18 loci influence the ability of segregants to
persist in different parts of the mammalian body. We calculated the effects of each locus in the
brain and non-brain organs. Loci were then classified as general or antagonistically pleiotropic if
the same allele or different alleles were superior in both the brain and non-brain organs,
respectively (Figure 4.4). Of the 18 loci, ten were general and eight showed antagonistic
pleiotropy between the brain and non-brain organs. Eight of the beneficial alleles at the general
loci were contributed by the 3S clinical isolate (Figure 4.4a upper right quadrant), as opposed to
two by the BY lab strain (Figure 4.4a lower left quadrant). By contrast, eight loci showed
antagonistic pleiotropy between the brain and non-brain organs, suggesting a fitness trade-off
between persistence in the brain and other organs (Figure 4.4a, 4.4c, and 4.4d).
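The classification rule described above can be sketched as a sign comparison of the two effect sizes. This is a minimal illustration, assuming effects are expressed as 3S minus BY; the function name and the example effect sizes are hypothetical and are not values from this study.

```python
def classify_locus(brain_effect, non_brain_effect):
    """Label a locus by the signs of its 3S-minus-BY effects."""
    product = brain_effect * non_brain_effect
    if product > 0:
        return "general"  # same allele is superior in both settings
    if product < 0:
        return "antagonistically pleiotropic"  # different alleles are superior
    return "unclassified"  # at least one effect is exactly zero


# Hypothetical effect sizes (brain, non-brain), for illustration only.
loci = {
    "locus_A": (0.12, 0.08),
    "locus_B": (-0.05, 0.09),
    "locus_C": (-0.07, -0.03),
}
labels = {name: classify_locus(b, nb) for name, (b, nb) in loci.items()}
```

Under this convention, a general locus with two positive effects falls in the upper right quadrant of Figure 4.4a, and one with two negative effects falls in the lower left quadrant.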
Figure 4.4 Identified loci show a mixture of general effects and antagonistic pleiotropy. a, The effect
sizes of loci detected using aggregate phenotype data in the brain and non-brain organs are shown on the
x- and y-axes, respectively. Positive effect sizes mean that strains carrying the 3S allele were enriched in
the samples while negative values mean that strains carrying the BY allele were enriched. Loci are colored
by whether the same allele is beneficial in both brain and non-brain samples (red; general effects) or not
(blue; antagonistic pleiotropy). Numbers correspond to specific examples in b and c. b, A locus with a
general effect on persistence within the host. Brain (left) and non-brain (right) phenotypes are plotted as a
function of strain genotype at this locus. Positional information for the locus is denoted by bold text above
the example. c, An antagonistically pleiotropic locus at which the BY allele is beneficial in the brain (left)
and detrimental in other organs (right). d, An antagonistically pleiotropic locus at which the 3S allele is
beneficial in the brain (left) and detrimental in other organs (right).
We also determined how alleles at general and antagonistically pleiotropic loci combine to
cause fungal persistence. There was a positive relationship between the number of beneficial
alleles at general loci and persistence in both brain and non-brain organs (simple linear regression
of number of general alleles on persistence, brain R² = 0.20 and p = 6.45 × 10⁻⁴¹, non-brain R² =
0.11 and p = 8.12 × 10⁻²³; Figure 4.5a). Similarly, we found that the number of brain alleles at
antagonistically pleiotropic loci was positively related to segregants’ persistence in the brain
(simple linear regression of number of brain alleles on brain persistence, R² = 0.12, p = 1.32 × 10⁻²³)
and negatively related to persistence in non-brain organs (simple linear regression of brain alleles
on non-brain persistence, R² = 0.08, p = 2.51 × 10⁻¹⁶) (Figure 4.5b).
Lastly, we examined how sets of general and antagonistically pleiotropic loci jointly
influence persistence. In a given organ, the most persistent segregants were enriched for both the
beneficial alleles at general loci and the appropriate alleles at antagonistically pleiotropic loci
(brain 2x2 χ² test: χ² = 21.89, p = 2.9 × 10⁻⁶; non-brain 2x2 χ² test: χ² = 15.42, p = 8.6 × 10⁻⁵; Figure
4.5c and 4.5d). In the brain, segregants were even able to persist if they were enriched for organ-
specific alleles alone, although their persistence was lower than segregants enriched for both
organ-specific and generally beneficial alleles (Figure 4.5c).
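For readers who wish to check the enrichment statistics, a 2x2 χ² test can be sketched as follows. The sketch assumes the Pearson statistic without a continuity correction, which the text does not specify, and the function names are hypothetical. With one degree of freedom, the upper-tail p-value equals erfc(sqrt(χ²/2)).

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]].

    No continuity correction is applied (an assumption).
    """
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


# p-values implied by the two statistics reported in the text.
p_brain = chi_square_p_value_1df(21.89)
p_non_brain = chi_square_p_value_1df(15.42)
```

Evaluating the second function at the two reported statistics gives p-values that agree with those quoted above to the reported precision.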
Figure 4.5. General and antagonistically pleiotropic loci collectively influence strain
persistence in the host. a, Violin plots showing the mean strain phenotypes in the brain samples
(left) and non-brain samples (right) as a function of the number of generally beneficial alleles
present in a segregant. Thresholds for strains considered to have a high or low number of general
persistence alleles are represented by colored backgrounds. b, Violin plots showing the mean strain
phenotypes in the brain (left) and non-brain samples (right) as a function of the number of
antagonistically pleiotropic brain alleles present in a strain. Thresholds for strains considered to
have a high or low number of alleles favoring persistence in the brain over other organs are
represented by colored backgrounds. c, Plot showing mean change in enrichment in the brain
samples over time relative to T1 measurements (bold lines) for strains that have a high or low
number of generally beneficial alleles as well as a high or low number of alleles favoring
persistence in the brain over other organs (according to thresholding in panels a and b). Error bars
show the standard error about the mean enrichment of strains at five days post-infection. Faint
lines show the enrichment over time of bootstrapped data (1,000 replicates). d, Plot showing mean
enrichment in the non-brain samples over time relative to day one measurements (bold lines) for
strains that have a high or low number of generally beneficial alleles as well as a high or low
number of alleles favoring persistence in the brain over other organs (according to thresholding in
panels a and b). Error bars show the standard error about the mean enrichment of strains at five
days post-infection. Faint lines show the enrichment over time of bootstrapped data (1,000
replicates).
4.4. Discussion
We used barcode sequencing to phenotype a pool of genotyped segregants in mice.
Analysis of segregants replicated in the pool showed that persistence in mice has a largely genetic
basis, and comparison of samples revealed different segregants are superior in the brain and in the
kidneys, liver, and spleen. Although technical noise limited our ability to map many loci in
individual samples, aggregating brain and non-brain samples made it possible to identify loci
explaining most of the variability in persistence within and between organs. 18 loci were detected,
with the majority having effects across all organs. Some of these loci were generally beneficial,
while others exhibited antagonistic pleiotropy, showing tradeoffs between brain and non-brain
organs. These antagonistically pleiotropic loci could represent either single polymorphisms that
have different effects in distinct parts of the host body or closely linked polymorphisms with
different effects.
Our findings may explain why diverse S. cerevisiae isolates act as opportunistic pathogens
(151, 158). The ability to persist in mammalian hosts is highly polygenic: we identified 18 loci in
a cross of two isolates and examination of additional isolates would likely detect even more (178,
179). With so many loci involved, many S. cerevisiae isolates will possess beneficial alleles at
some general loci, as we saw with both BY, an avirulent isolate (144), and 3S, a clinical isolate
(158). Furthermore, all isolates will carry alleles of antagonistically pleiotropic loci that are
beneficial somewhere in the host body. Thus, the mixing of genetic material throughout the species
by chance outcrossing events may produce strains that can persist in particular mammalian organs.
Supporting such a possibility, clinical isolates are often highly heterozygous diploids that likely
resulted from recent mating events in nature (100, 151).
Our results, in particular the identification of numerous antagonistically pleiotropic loci,
also indicate different organs in the mammalian body represent distinct environments for fungi.
The brain and non-brain organs have a myriad of functional and physiological differences: for
example, the brain has its own semi-permeable barrier (170, 171) and the kidneys, liver, and spleen
filter blood (180). Persisting in these distinct organs may require different traits, which may be
beneficial in some organs and detrimental in others. If these traits vary across strains, which seems
likely based on our data, many individuals may only be able to infect certain organs in the
mammalian body and may be constrained in their potential to spread to other organs post-infection.
Genetic mapping has the potential to help reveal molecular mechanisms shaping the
abilities and constraints of persistence in different parts of the host body. Although our resolution
was coarse in most cases, a few finely resolved loci implicated a potential diversity of cellular
processes, including endosome maturation (SNN1), ethanol production (ADH1), genome stability
(TOP3), phosphate metabolism (PHM7), and thiamine uptake (THI7). The other gene in these
intervals (MKT1) has an unclear function. Notably, ADH1 (181), MKT1 (182), and PHM7 (183)
have been found to affect pathogenicity in other fungi, and both endosomal function (184) and
thiamine transport (185) have been linked to virulence as well. Our system provides an
opportunity not only to identify new mechanisms underlying persistence in hosts, but also to study
how both known and unknown mechanisms act in combination.
Finally, fungal infections in the brain and central nervous system (meningitis) are a leading
cause of morbidity and mortality among immunocompromised patients (137, 186). We detected
specific allele combinations that allowed segregants to persist in the brain, but our limited mapping
resolution precluded insight into how these alleles act mechanistically. A possibility is they
influenced passage through the blood-brain barrier, as we recovered fewer yeast from the brain
than the kidneys, liver, or spleen. This hypothesis, which requires future testing, illustrates how
our experimental system can be used to understand the mechanisms by which genetic
polymorphisms modify interactions between yeast cells and the mammalian body.
4.5. Methods
4.5.1. Generation of haploid segregants
Haploid segregants were generated from a cross of two isolates of Saccharomyces
cerevisiae, the lab strain BY4716 (BY) and a haploid derivative of the clinical isolate 322134S
(3S) (158). Specifically, BY ho fcy1∆ flo8∆ flo11∆ ura3∆ YBR209W::pGal1-Cre - Lox71 -
Lox2272/71 and 3S ho fcy1∆ flo8∆ flo11∆ ura3∆ YBR209W::pGal1-Cre - Lox71 - Lox2272/71
parent strains were used. Construction of these strains is described in detail in Matsui et al. (160).
In brief, for each of these gene deletions, the coding region was completely removed, without any
marker left behind. The fcy1∆ and ura3∆ gene deletions provide counterselectable markers, while
the flo8∆ and flo11∆ gene deletions should eliminate clumping and flocculation (167–169), traits
problematic for pooling of segregants and recovery of yeast from mouse organs. Parental strains
were also engineered to have a genomic landing pad (109, 110, 163) with two partially crippled
LoxP sites (126), Lox71 and Lox2272/71, and a galactose-inducible Cre recombinase (127) at the
neutral YBR209W locus (163) (Figure S4.1a). We first generated MATa versions of the BY and
3S parent strains and then obtained MATα versions through mating-type-switching of the MATa
strains with a galactose-inducible HO plasmid (122). All segregants in a pool should be the same
mating type; otherwise, mating will occur. To ensure that a pool of segregants of the same mating
type could be generated without any allelic bias near the mating locus, we created two BY/3S
heterozygous diploids: BY MATa x 3S MATα and 3S MATa x BY MATα. Both diploids were
sporulated and roughly equal numbers of four-spore tetrads (~500) were obtained from each by
tetrad dissection. To maximize the number of unique recombination breakpoints in our panel of
segregants, one MATα segregant was then randomly selected from each tetrad for inclusion in this
study.
4.5.2. Barcoding of haploid segregants
822 segregants were barcoded through the transformation and integration of a barcoded
plasmid library via Cre-mediated homologous recombination at Lox2272/71 (Figure S4.1). A
pBAR6 plasmid (109) marked with KanMX was modified by Gibson assembly to include a
Lox2272/66 site, a random 20-mer barcode sequence, and a partial TruSeq read 2 adapter
sequence. The barcoded plasmids were transformed into each segregant individually and then
recombined into the yeast genome by inducing the galactose-inducible Cre recombinase using YP
+ 2% galactose media for ~20 hours. Cre-mediated recombination between the two partially
crippled Lox2272 variants, Lox2272/66 and Lox2272/71, produces a crippled Lox2272/66/71 and
a fully functional Lox2272. For each segregant, all integrants (between one and five) were picked
from a YPD + 200 ug/ml G418 agar plate. Glycerol freezer stocks of these integrants were then
made and stored at -80°C. A subset of 86 segregants containing three different barcodes were
included in all work, enabling internal replication and measurement of broad sense heritability.
4.5.3. Whole genome sequencing of haploid segregants
Genomic DNA was obtained from each segregant using the Qiagen DNeasy Blood and
Tissue kit. For each segregant, a whole genome sequencing library was then constructed using the
Illumina Nextera kit. Each library was barcoded and ~192 segregants were multiplexed per
sequencing lane. Sequencing libraries from segregants were pooled in equimolar fractions, size
selected from an agarose gel, and purified using the Qiagen Gel Extraction kit. Multiplexed
samples were sequenced by Novogene on six Illumina HiSeq 2500 lanes using 150 bp × 150 bp
paired-end reads. Reads for each segregant were mapped against the S288c reference genome
using BWA with default settings (82). Using SAMTOOLS (83) with default settings, alignments
were converted to bam files and sorted, read duplicates were removed, and pileups were generated.
Data for 43,865 high-confidence SNPs (107, 160) that differ between BY and 3S was then
extracted from the pileup files. Segregants showing any signs of aneuploidy or cross-
contamination in their genotype data were excluded from further analysis. Also, all segregants
with a mean per site coverage of less than 2 were removed. In total, 76 MATα segregants were
removed based on these criteria. For the remaining 927 MATα segregants, a vector containing the
fraction of 3S calls at each SNP was generated. Initial genotype calls were made by classifying
sites with 3S fractions above and below 50% as 3S and BY, respectively. A Hidden Markov Model
(HMM) was then used to correct these initial genotype calls and impute information at missing
sites. The HMM was implemented using the HMM package version 1.0 (187) in R. We employed
the transition and emission probability matrices transProbs = matrix(c(0.9999, 0.0001, 0.0001,
0.9999), 2) and emissionProbs = matrix(c(0.25, 0.75, 0.75, 0.25), 2). Adjacent SNPs in the HMM-corrected
genotype calls that lacked recombination in the segregants were collapsed, as they contained
identical information. This reduced the number of SNP markers in subsequent analyses from
43,865 to 14,347.
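The genotype correction step can be illustrated with a small two-state Viterbi decoder. This is a sketch rather than the R implementation used here: it assumes Viterbi decoding (the text does not state which decoding was applied), a uniform starting distribution, and an orientation of the emission matrix in which a state emits the matching call with probability 0.75. Missing calls are represented by None and contribute no emission evidence.

```python
import math

STATES = ("BY", "3S")
# Assumed orientation: a state persists with probability 0.9999 and
# emits a call matching itself with probability 0.75.
LOG_TRANS = {(a, b): math.log(0.9999 if a == b else 0.0001)
             for a in STATES for b in STATES}
LOG_EMIT = {(s, o): math.log(0.75 if s == o else 0.25)
            for s in STATES for o in STATES}


def viterbi(calls):
    """Return the most likely state path for a list of calls.

    Each call is "BY", "3S", or None (missing site).
    """
    def emit(state, obs):
        return 0.0 if obs is None else LOG_EMIT[(state, obs)]

    # Uniform prior over the two states at the first site (an assumption).
    score = {s: math.log(0.5) + emit(s, calls[0]) for s in STATES}
    back = []
    for obs in calls[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: score[p] + LOG_TRANS[(p, s)])
            new_score[s] = score[prev] + LOG_TRANS[(prev, s)] + emit(s, obs)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# An isolated discordant call and a missing site inside a BY block.
corrected = viterbi(["BY"] * 10 + ["3S", None] + ["BY"] * 10)
```

Because a state change is so heavily penalized, the isolated 3S call is overwritten and the missing site is imputed as BY, whereas a long run of 3S calls is retained as a genuine recombination event.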
4.5.4. Determination of segregant barcodes
To determine the barcode(s) in each segregant, we performed targeted Illumina sequencing.
Libraries were generated via PCR using custom primers flanking the barcode. The primers used to
amplify the barcode were:
Forward:
5’-TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGACGAAGTTATTGCGCGGTGATC-3’
Reverse:
5’-GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGTACGTGTGCTCTTCCGATCT-3’
where the sequences in bold are handles overlapping TruSeq adapter sequences, and the
remainder of the sequence is homologous to the flanking sequence in the genomic landing pad.
PCR products were purified using a Qiagen MinElute PCR purification kit, amplified using
barcoded TruSeq adapters and Illumina P1 and P2 primers, and then size selected via gel extraction
prior to sequencing. Sequencing was performed by Novogene on an Illumina HiSeq 2500 using
150 bp × 150 bp paired-end reads at ~2,000x coverage per barcode. 20-mer barcode sequences for
each segregant were extracted from the sequencing reads and clustered with Bartender (65) using a
Hamming distance of 2. Clusters comprising >5% of the total reads for each sample were
considered true barcodes. Only MATα segregants that passed genotyping quality thresholds and
had at least 1 barcode different from all others were used in this study.
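The two criteria used here, a Hamming distance between barcode sequences and a read-fraction threshold on clusters, can be sketched as below. This does not reproduce Bartender's clustering; it only illustrates the distance measure and the >5% rule. The function names and the read counts are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def true_barcodes(cluster_counts, min_fraction=0.05):
    """Keep clusters holding more than min_fraction of a sample's reads."""
    total = sum(cluster_counts.values())
    return sorted(bc for bc, n in cluster_counts.items()
                  if n / total > min_fraction)


# Hypothetical read counts per 20-mer cluster in one segregant's library.
counts = {"ACGT" * 5: 1900, "TTGCA" * 4: 80, "GGCAT" * 4: 20}
kept = true_barcodes(counts)
```

In this example only the first cluster exceeds 5% of the 2,000 reads, so the segregant would be recorded as carrying a single barcode.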
4.5.5. Animal Husbandry
All procedures and personnel involving mice were approved by the University of Southern
California’s Institutional Animal Care and Use Committee under protocol #21102. All mice were
housed on a 14:10 hour light:dark cycle and had ad libitum access to food and water. A single
mouse strain, C57BL/6J (JAX stock #000664), of Mus musculus was used for all experiments.
Mice were ordered at 6 weeks of age and housed for two weeks prior to experiments. Upon arrival,
mice were housed in groups of three according to their sex, then housed individually beginning at
7 weeks of age until they were euthanized at their experimental endpoints (one, two, or five days
post-infection). To minimize batch effects, animals in the same cage were chosen at random to
receive different combinations of dexamethasone (dex) treatment and experimental
endpoint.
4.5.6. Experimental infection of mice
36 mice (2 sexes x 2 dex treatments x 3 timepoints x 3 replicates) were used in the below
experiments. At 8 weeks of age, individuals were split into immunocompromised (dex+) or
immunocompetent (dex-) treatments. To generate immunocompromised animals, 500 uL of 4
mg/ml dexamethasone sodium phosphate (Sigma-Aldrich) was administered twice daily at ~9 am
and ~9 pm to animals via intraperitoneal (IP) injection. Immunocompetent animals were treated
with 500 uL water twice daily via IP injection. These treatments began two days prior to infection
and continued until experimental endpoints. Daily treatments were administered at regular
intervals, with the first injection administered between 9 and 10 am and the second between 9 and
10 pm. To prevent infection by agents other than the panel of haploid segregants, we also gave
dexamethasone-treated animals water with 0.1 mg/ml gentamicin sulfate and 2 mg/ml
streptomycin sulfate. Animals injected with water were provided water without antibiotics.
Two days prior to infection, the yeast strains used in this study were grown from frozen
stocks to stationary phase in 800 uL YPD (~2 days) in 96 well plates. Strains were resuspended
and equal volumes (100 uL) of each strain were pooled and mixed to ensure homogeneity. Cells
from the pool were harvested, washed with water, and resuspended in 10 mL of water. Cell
concentration was calculated using a hemocytometer and 1 × 10⁷ cells in 100 uL water from the pool
were used to inoculate the mice via tail vein injection. Prior to the injection, the tails of the mice
were dipped in warm water to help dilate the tail vein. Because C57BL/6J mice are dark-furred,
a Leica KL 500 LCD light source was used to visualize the tail vein more clearly during injection.
C.G. and M.L.S. performed all injections. To minimize batch effects, injection order was
randomized with respect to dex treatment and timepoint, with C.G. injecting all ‘odd’ mice and
M.L.S. injecting all ‘even’ mice.
4.5.7. Organ harvesting and processing
At each experimental endpoint (1, 2, or 5 days post-infection), animals were euthanized
via cervical dislocation and wiped down with 70% EtOH to disinfect them prior to dissection.
From each of the 36 mice, five distinct organs were dissected and removed in
the following order: liver (only the two largest lobes by weight were used), right and left kidney,
spleen, gonads, and brain, for a total of 180 samples. Samples were split and transferred into two
Qiagen PowerBead Tubes (Metal 2.38 mm) each (one lobe of liver per tube, one kidney per tube,
half of the brain, gonad, and spleen samples per tube) and 1 mL of 1X TrypLE Select Enzyme
(Thermo Fisher) was added to each tube. Organ samples were incubated at 37°C for 5 minutes and
then homogenized at 30 Hz for 3 minutes using a Qiagen TissueLyser II. Homogenized samples
were plated on YPD agar plates containing 200 ug/ml G418 and grown for two days at 30°C to
allow yeast colony formation. 1:50 and 1:100 dilutions of each organ sample were also plated to
ensure accurate colony counts could be taken in case of crowding on the plates. After two days of
growth, plates were imaged using a BioRad Gel Doc XR+ with white light and an exposure time
of 0.5 seconds. Next, 10 mL of water was added to the surface of each plate, and colonies were
scraped off of the surface. Resuspended yeast samples were vortexed to ensure homogeneity,
harvested via centrifugation, and stored at -20°C prior to sequencing.
4.5.8. YPD plate growth controls
After pooling strains for injection, aliquots of cells were subjected to the tissue
homogenization process prior to plating on YPD agar plates containing 200 ug/mL G418. This
process involved harvesting cells and resuspending them in 2 mL of 1X TrypLE Select Enzyme,
incubating the cells at 37°C for 5 minutes in Qiagen PowerBead Tubes (Metal 2.38 mm), and
homogenizing them at 30 Hz for 5 minutes using a Qiagen TissueLyser II. Cells were plated at densities
of 10⁴, 10⁵, 10⁶, and 10⁷ cells per plate in triplicate. Control plates were grown at 30°C for two days
before imaging and collecting cells as described above. Dilutions of each plated sample were also
plated to assess the number of colonies formed per plate. After collecting yeast from the plates,
samples were vortexed to ensure homogeneity, harvested via centrifugation, and stored at -20°C
prior to sequencing. To measure the effect of the tissue homogenization process on yeast cells,
separate aliquots of cells were plated directly onto YPD agar plates containing 200 ug/mL G418
at 10⁴, 10⁵, and 10⁶ cells per plate in duplicate without being subjected to tissue homogenization.
These untreated controls were grown, imaged, harvested, and stored in the same manner as those
described above.
4.5.9. Barcode library preparation
Frozen cultures were thawed and DNA was extracted using the Zymo Quick-DNA
Fungal/Bacterial Miniprep Kit. Quantification of DNA was performed using a Qubit high-sensitivity
assay. After DNA extraction, a two-step PCR was used to amplify the barcoded region
of the genome for sequencing. We amplified 150 ng of DNA per organ sample or control,
corresponding to ~1.16 × 10⁷ genomes or ~11,670 copies per barcode (994 barcodes total; 736
strains barcoded once and 86 strains barcoded in triplicate). We performed a 4-cycle PCR on each
sample using Phusion polymerase, 150 ng of DNA, and 1 uL of 10 uM primers each at a total
reaction volume of 50 uL. The primers used in this reaction were:
Forward:
5’-AATGATACGGCGACCACCGAGATCTACACNNXXXXNNACACTCTTTCCCTACACGACGCTCTTCCGATCTACGAAGTTATTGCGCGGTGA-3’
Reverse:
5’-CAAGCAGAAGACGGCATACGAGATNNXXXXNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3’,
with Ns in this sequence corresponding to random nucleotides at a frequency of A: 25%,
C: 25%, G: 25%, T: 25%. These random sequences were used as unique molecular identifiers
(UMIs), enabling identification of PCR duplicates in downstream analyses. Xs in the above
sequences correspond to known, custom multiplex tags, which are used to demultiplex reads from
different samples when pooled onto the same flow cell for sequencing. Multiplexing tags were
designed to have a Hamming distance of four from one another. The 4-cycle PCR reaction was
performed using the following steps:
1. 98°C, 3:00 minutes
2. 98°C, 0:30 seconds
3. 54°C, 0:30 seconds
4. 72°C, 2:00 minutes
5. Repeat steps 2 through 4 4X
6. 72°C, 5:00 minutes
7. 4°C, Hold indefinitely
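The template quantities quoted for this first PCR (150 ng of DNA, ~1.16 × 10⁷ genomes, ~11,670 copies per barcode) can be checked with a short calculation. The haploid genome size (~12.1 Mb) and the average mass of a base pair (~650 g/mol) used below are assumptions not stated in the text, so the genome estimate is only expected to agree approximately with the quoted value.

```python
AVOGADRO = 6.022e23
GENOME_BP = 12.1e6         # assumed haploid S. cerevisiae genome size (bp)
GRAMS_PER_MOL_BP = 650.0   # assumed average molar mass of one base pair


def genome_copies(dna_ng):
    """Approximate number of haploid genomes in dna_ng nanograms of DNA."""
    grams_per_genome = GENOME_BP * GRAMS_PER_MOL_BP / AVOGADRO
    return dna_ng * 1e-9 / grams_per_genome


# 736 strains carry one barcode and 86 strains carry three.
n_barcodes = 736 + 3 * 86
copies = genome_copies(150)
# Copies per barcode, using the genome count quoted in the text.
copies_per_barcode = 1.16e7 / n_barcodes
```

Under these assumptions the estimate falls within about 1% of the quoted genome count, and dividing the quoted count by the 994 barcodes gives the stated copies per barcode.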
After amplification, samples were purified using the MinElute 96 UF PCR Purification Kit
(Qiagen) protocol and eluted into 20 uL of water. Next, a 24-cycle PCR was performed using 15 uL
of purified library and Phusion Polymerase (New England Biolabs) in a total reaction volume of
50 uL. TruSeq F and R primers were used at 10 uM concentration for this PCR. The 24-cycle PCR
was performed using the following steps:
1. 98°C, 3:00 minutes
2. 98°C, 0:30 seconds
3. 54°C, 0:30 seconds
4. 72°C, 2:00 minutes
5. Repeat steps 2 through 4 24X
6. 72°C, 5:00 minutes
7. 4°C, Hold indefinitely
~45 uL of each PCR product were pooled together and the appropriate PCR product (~220
bp) was isolated by gel electrophoresis and extracted using a QIAquick Gel Extraction Kit. The
pooled samples were checked for purity using a Nanodrop ND-1000 and concentration using a
Qubit Fluorometer and shipped to Novogene. Prior to sequencing, the pooled libraries were further
quantified by Agilent Bioanalyzer at Novogene.
4.5.10. Barcode sequencing
Samples were sequenced on an Illumina HiSeq 4000 using a single flow cell. Because the
majority of the nucleotides in the barcode amplicons are fixed, a 25% PhiX DNA spike-in was
done to increase read diversity. Sequencing reads were analyzed using custom scripts in R and
Python. Reads were demultiplexed and discarded if the average Illumina quality score of the read
in the barcoded region was less than 30 or the landing pad sequence
AGTATCCTATACGAACGGTA adjacent to the barcode was not present in the read. PCR
duplicates were eliminated by excluding reads within samples that contained the same UMI
sequences (only one copy was retained). In each file, barcodes were clustered using Bartender (65)
with a maximum allowable clustering distance of 3 (bartender_single_com -f $infiles -o
${infiles%.*} -d 3). Raw counts were obtained for each barcode present, and values were
normalized to the total number of barcode reads in the sample.
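The read filtering, duplicate removal, and normalization steps can be sketched as follows. This is an illustration rather than the custom scripts used here: the record layout (UMI, barcode, mean quality, flanking sequence) is hypothetical, duplicates are assumed to be reads sharing both UMI and barcode, and the Bartender clustering step is omitted.

```python
from collections import Counter

LANDING_PAD = "AGTATCCTATACGAACGGTA"  # sequence adjacent to the barcode


def barcode_frequencies(reads, min_quality=30):
    """Filter reads, drop PCR duplicates, and return barcode frequencies.

    reads: iterable of (umi, barcode, mean_quality, flanking_sequence).
    """
    seen = set()
    counts = Counter()
    for umi, barcode, mean_quality, flank in reads:
        if mean_quality < min_quality or LANDING_PAD not in flank:
            continue  # fails the quality or landing-pad filter
        if (umi, barcode) in seen:
            continue  # PCR duplicate; only the first copy is retained
        seen.add((umi, barcode))
        counts[barcode] += 1
    total = sum(counts.values())
    return {bc: n / total for bc, n in counts.items()} if total else {}


# Hypothetical reads for illustration.
reads = [
    ("AAAA", "BC1", 35, LANDING_PAD),
    ("AAAA", "BC1", 36, LANDING_PAD),  # duplicate UMI and barcode
    ("CCCC", "BC1", 34, LANDING_PAD),
    ("GGGG", "BC2", 38, LANDING_PAD),
    ("TTTT", "BC2", 20, LANDING_PAD),  # low quality
    ("ACAC", "BC2", 38, "NNNN"),       # landing pad absent
]
freqs = barcode_frequencies(reads)
```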
4.5.11. Barcode quantification and phenotype calculation
For each segregant in each sample, we computed (TF - T0)/T0, with T0 and TF corresponding
to a segregant’s barcode frequency in the initial pool and a sample, respectively. These raw
phenotypes were calculated for both control and organ samples. To compute the processed
phenotypes employed in the paper, we next accounted for potential impacts of outgrowing samples
on YPD plates after organ harvesting using the fixed effects linear model phenotype_sample ~
phenotype_control + error. In this model, phenotype_sample and phenotype_control corresponded to
individuals’ raw phenotypes in organ samples and mean raw phenotypes in controls, respectively.
This model was implemented using the lm() function in R. The mean raw phenotype of controls
was calculated using three YPD control samples plated at 10⁵ cells per plate. Residuals were
extracted from the model and used in all subsequent analyses; these are the persistence phenotypes
used in the manuscript. For the 86 segregants that were replicated in the pool with distinct
barcodes, all replicates were separately corrected. These replicates were only utilized in heritability
analyses. In all work with the full set of 822 segregants, only a single barcode replicate was utilized
for each replicated segregant.
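The phenotype calculation can be sketched in two steps: the relative change in barcode frequency, followed by extraction of residuals from a simple linear regression on the control phenotype. The sketch below uses ordinary least squares written out by hand as a stand-in for lm() in R; the function names and the example values are hypothetical.

```python
def raw_phenotype(t0, tf):
    """Relative change in barcode frequency, (TF - T0) / T0."""
    return (tf - t0) / t0


def ols_residuals(y, x):
    """Residuals of the simple linear regression y ~ x (with intercept)."""
    n = len(y)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


# Hypothetical raw phenotypes for three segregants.
sample = [1.0, 2.0, 4.0]    # raw phenotypes in one organ sample
control = [0.0, 1.0, 2.0]   # mean raw phenotypes in the YPD controls
persistence = ols_residuals(sample, control)
```

The residuals sum to zero by construction, so persistence values are relative to the other segregants in the same sample rather than absolute measures of survival.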
4.5.12. Identification of samples with significant variation among
segregants
To identify samples in which segregants showed significant phenotypic differences, we
performed one-way analysis of variance (ANOVA). For each sample, we conducted a test
using the fixed effects linear model phenotype ~ genotype + error, which was implemented with
the lm() function in R. The p-value of each model was recorded and significant samples were
identified using a Bonferroni-corrected threshold (α = 0.05 / 139, samples = 139). Non-significant
samples were excluded from further analysis.
4.5.13. Calculation of broad-sense heritability
Broad-sense heritability (H²) was calculated using the 86 segregants that were replicated in
the segregant pool. We used a mixed effects linear model with the formula phenotype ~ genotype
+ error, with genotype a categorical, random effect variable corresponding to the identities of
replicated segregants. This model was implemented using the lmer() function in the lme4 package
in R (89, 160). H² was computed by taking the sum of squares of genotype and dividing it by the
total sum of squares of the model.
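The ratio described above can be illustrated with a plain one-way sum-of-squares decomposition. This is a simplification: it does not fit a mixed model as lmer() does, and it is included only to show how the genotype and total sums of squares relate. The function name and input layout are hypothetical.

```python
def broad_sense_heritability(replicates):
    """Genotype sum of squares divided by total sum of squares.

    replicates: dict mapping each segregant to a list of the phenotypes
    measured for its independently barcoded replicates.
    """
    values = [v for reps in replicates.values() for v in reps]
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_genotype = sum(
        len(reps) * (sum(reps) / len(reps) - grand_mean) ** 2
        for reps in replicates.values()
    )
    return ss_genotype / ss_total


# Hypothetical replicate phenotypes for two segregants.
h2 = broad_sense_heritability({"seg1": [0.0, 2.0], "seg2": [2.0, 4.0]})
```

In this example half of the total variation lies between segregants, so the ratio is 0.5; it approaches 1 when replicates of the same segregant agree closely.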
4.5.14. Linkage mapping in individual samples
Linkage mapping was performed on each individual organ sample using forward
regression. 14,347 SNPs distributed throughout the genome were employed as markers, with each
categorically encoded as ‘0’ or ‘1’ for the BY or 3S alleles, respectively. In first stage scans, the
fixed effects linear model phenotype ~ locus + error was implemented for each marker in a sample
using the lm() function in R. In these models, phenotype corresponded to raw phenotypes and
persistence measurements in control and organ samples, respectively, and locus corresponded to
individuals’ genotypes at a marker. For each test, the p-value of the locus term was extracted using
the R function summary.aov(). Significance thresholds were determined separately for each
sample using 1,000 permutations (188) of the segregant persistence data. In each permutation, the
phenotype vector was randomly shuffled while the matrix of genotypes was held constant. A
genome-wide scan was performed on each permuted dataset using the fixed effects linear model
phenotype ~ locus + error, with the minimum p-value in each permutation recorded. The threshold
for a first stage scan in a sample was determined based on the 5th percentile of its minimum p-
values from permutations. We only allowed detection of a single locus per chromosome per scan.
Conservative confidence intervals (95%) were determined for each locus using 2 x -log10(p-value)
drops from a peak marker at a locus.
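A first stage scan with a permutation-based threshold can be sketched as follows. To avoid depending on a statistics library, the sketch ranks markers by the squared correlation between genotype and phenotype instead of by p-value; for a single-marker model with a fixed number of individuals the two orderings are equivalent, so the 95th percentile of the maximum squared correlation plays the role of the 5th percentile of the minimum p-value. All names and the simulated data are hypothetical.

```python
import random


def r_squared(x, y):
    """Squared Pearson correlation between a marker and a phenotype."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def scan(markers, phenotype):
    """Test statistic for every marker (one list of 0/1 calls per marker)."""
    return [r_squared(m, phenotype) for m in markers]


def permutation_threshold(markers, phenotype, n_perm=1000, rng=None):
    """95th percentile of the genome-wide maximum statistic under shuffling."""
    rng = rng or random.Random(0)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # genotypes are held constant
        maxima.append(max(scan(markers, shuffled)))
    maxima.sort()
    return maxima[int(0.95 * n_perm)]


# Simulated example: 200 individuals, 20 markers, marker 3 is causal.
_rng = random.Random(1)
markers = [[_rng.randint(0, 1) for _ in range(200)] for _ in range(20)]
phenotype = [2.0 * g + _rng.gauss(0, 1) for g in markers[3]]
threshold = permutation_threshold(markers, phenotype, n_perm=200,
                                  rng=random.Random(2))
stats = scan(markers, phenotype)
```

Later scans that include known loci as covariates would require a multiple regression, which this sketch does not attempt.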
After the first stage scans, additional scans were performed on each sample. In these scans,
we used the fixed effects linear model phenotype ~ known_locus1 + … + known_locusN + locus +
error, with each known_locus term corresponding to individuals’ genotypes at any marker
identified in a scan. The phenotype and locus terms were defined in the same manner as in the first
stage scans. In each additional scan, permutations were conducted in the same manner as the first
stage scans, except with known_locus terms also included. In these additional scans, we allowed
detection of only a single locus per chromosome per scan and confidence intervals were computed
in the same way as the first stage scan. Scans were continued until no loci were detectable at
permutation-based significance thresholds.
4.5.15. Consolidation of loci detected in individual samples
Loci detected in individual organ samples were consolidated across samples according to
their 2 x -log10(p-value) confidence intervals. Two loci were considered the same if these intervals
overlapped. The intersection and union of all consolidated confidence intervals were recorded; the
intersection of all confidence intervals was used to resolve loci (Table S4.2), while the union was
used to determine which detections were consolidated into each locus.
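The consolidation rule can be sketched as a merge of overlapping intervals that tracks both the union and the intersection of each group. The sketch assumes intervals on a single chromosome given as (start, end) coordinates and treats intervals that share an endpoint as overlapping; both are assumptions, and the function name is hypothetical.

```python
def consolidate(intervals):
    """Group overlapping (start, end) intervals.

    Returns one dict per group with the union, the intersection, and the
    number of detections merged. If intervals overlap only in a chain,
    the intersection can be empty (start greater than end).
    """
    groups = []
    for start, end in sorted(intervals):
        if groups and start <= groups[-1]["union"][1]:
            g = groups[-1]
            g["union"] = (g["union"][0], max(g["union"][1], end))
            g["intersection"] = (max(g["intersection"][0], start),
                                 min(g["intersection"][1], end))
            g["n"] += 1
        else:
            groups.append({"union": (start, end),
                           "intersection": (start, end),
                           "n": 1})
    return groups


# Hypothetical confidence intervals (bp) from three detections.
loci = consolidate([(100, 200), (150, 250), (400, 500)])
```

Here the first two detections are merged into one locus whose resolved interval is the 150 to 200 bp intersection, while the third remains a separate locus.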
4.5.16. Aggregation of persistence measurements across samples
Brain and non-brain samples with statistically significant differences in persistence were
identified. Segregant phenotypes (raw measurements) were first time-corrected by dividing by the
number of days since injection into a host. Next, a fixed effects linear model was fit to account for
growth on rich medium as described above. Each segregant’s residual measurements (persistence)
were then averaged across all 15 brain or 75 non-brain samples. Differential persistence between
brain and non-brain samples was computed by taking the difference between a segregant’s mean
brain and mean non-brain time-corrected persistence values.
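The aggregation for a single segregant can be sketched as below. For brevity the sketch omits the correction for growth on rich medium that is applied between time correction and averaging, so the numbers are purely illustrative; the function names and values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def aggregate_persistence(sample_values, days_post_infection):
    """Average time-corrected values for one segregant across samples."""
    corrected = [v / d for v, d in zip(sample_values, days_post_infection)]
    return mean(corrected)


# Hypothetical measurements at one, two, and five days post-infection.
brain = aggregate_persistence([0.2, 0.6, 1.5], [1, 2, 5])
non_brain = aggregate_persistence([0.1, 0.2, 0.5], [1, 2, 5])
differential = brain - non_brain
```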
4.5.17. Linkage mapping with aggregate phenotype data
We performed linkage mapping with aggregate brain and non-brain persistence
measurements, as well as the difference between the two. All aspects of these three forward
regression scans were implemented in the same manner as the linkage mapping scans on individual
samples. The only difference was that, in these scans, the phenotype in the
models was the aggregate brain data, the aggregate non-brain data, or the difference between the
two.
4.5.18. Calculating the effects of loci in individual samples
For each consolidated locus, the effect size was calculated in each of the individual samples
with significant heritabilities as well as in the control samples. Effect sizes in the controls were
calculated using the raw phenotypes while effect sizes in the organ samples were calculated using
the residual values after correcting for segregant growth on YPD (described in the ‘Barcode
quantification and phenotype calculation’ section above). To calculate the effect size of a locus in
a particular organ or control, phenotype data for the sample was split by genotype at the locus.
Mean phenotypes were calculated for individuals with the BY and 3S alleles and the difference
between these means (3S - BY) was computed. Effect sizes were only calculated in organ samples
with significant phenotypic variation.
4.5.19. The effects of individual loci identified in aggregate phenotype
data
We computed the effects of the 18 loci detected in the linkage mapping scans with
aggregate data using both the aggregate brain and aggregate non-brain persistence measurements.
For every locus, we subtracted the mean aggregate brain measurement among segregants with the
BY allele from the mean aggregate brain measurement among segregants with the 3S allele. The
same procedure was then repeated using the aggregate non-brain measurements. A positive value
in the brain or non-brain organs implies the 3S allele was beneficial and the BY allele was
detrimental, while a negative value indicates the opposite relationship. 95% bootstrap confidence
intervals were computed for these values by sampling the phenotypes of individuals with BY or
3S alleles at a given locus 1,000 times with replacement using the sample() function in R with
replace=T, computing the difference in mean brain or non-brain phenotype between these two
groups for each sampling, and taking the 2.5th and 97.5th percentiles of these differences.
Loci with positive or negative values in both brain and non-brain organs were classified as having
general effects. By contrast, loci showing counteracting effects between brain and non-brain
organs were categorized as antagonistically pleiotropic.
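The bootstrap and classification steps above can be sketched as follows. The study used sample() in R; this Python version is only an illustrative equivalent, and the function names, the fixed seed, and the choice of quantile method are assumptions.

```python
import random
from statistics import mean, quantiles


def bootstrap_effect_ci(by_values, s3_values, n_boot=1000, seed=0):
    """Percentile bootstrap 95% interval for mean(3S) - mean(BY)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each genotype class with replacement, keeping its size.
        by_sample = rng.choices(by_values, k=len(by_values))
        s3_sample = rng.choices(s3_values, k=len(s3_values))
        diffs.append(mean(s3_sample) - mean(by_sample))
    # n=40 yields cut points every 2.5%; the first and last are the
    # 2.5th and 97.5th percentiles.
    cuts = quantiles(diffs, n=40, method="inclusive")
    return cuts[0], cuts[-1]


def classify_locus(brain_effect, non_brain_effect):
    """Same sign in both organ classes: general; opposite signs: antagonistic."""
    product = brain_effect * non_brain_effect
    if product > 0:
        return "general"
    if product < 0:
        return "antagonistically pleiotropic"
    return "unclassified"


# Toy phenotypes, not data from the study.
lower, upper = bootstrap_effect_ci([0.0, 0.1, -0.1, 0.05, -0.05],
                                   [1.0, 1.1, 0.9, 1.05, 0.95])
```

The "unclassified" branch covers an effect of exactly zero, a case the text does not address.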
4.5.20. Combinatorial effects of loci identified in aggregate data
We first determined the alleles present in each segregant at the 18 loci detected in the
aggregate scans. The effect size at each locus was calculated in both brain and non-brain samples
by subtracting the mean persistence of segregants containing the BY allele from the mean
persistence of segregants containing a 3S allele at a given locus. Segregants were considered
“enriched” for generally beneficial alleles if they contained seven or more generally beneficial
alleles and “depleted” if they contained fewer than three. At pleiotropic loci, segregants were
considered “enriched” for brain loci if they contained six or more loci favoring persistence in the
brain over non-brain organs, and “depleted” if they contained fewer than three. These thresholds
were established so as to include enough segregants to enable contingency tests on segregants that were
“enriched” or “depleted” for both general and pleiotropic loci to test the combinatorial effects of
the loci; however, no more than 25% of segregants in the dataset were considered “enriched” or
“depleted” for generally beneficial or pleiotropic loci.
Fixed-effect linear models were employed to test the relationship between mean segregant
persistence in brain or non-brain samples and the number of generally beneficial or pleiotropic loci
present in each segregant. These models were implemented using the lm() function in R and took
the form phenotype ~ num_loci, where phenotype corresponds to segregants’ mean persistence
values in either the brain or the non-brain samples, and num_loci corresponds to the number of
either generally beneficial alleles or the number of pleiotropic alleles favoring persistence in the
brain. P values were extracted from these models using the summary.aov() function and R² values
were extracted using the summary() function.
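For readers unfamiliar with the model form phenotype ~ num_loci, the sketch below shows how the slope and R² of such a single-predictor model arise from ordinary least squares. It is a Python illustration with made-up numbers, not the lm() call used in the study, and it omits the P value calculation.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for y ~ x; returns (intercept, slope, r_squared)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2 is one minus the ratio of residual to total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical allele counts and mean persistence values, not data from the study.
num_loci = [1, 2, 3, 4, 5]
phenotype = [2, 4, 5, 4, 5]
intercept, slope, r_squared = fit_simple_regression(num_loci, phenotype)
```

Here a positive slope would indicate that persistence rises with the number of beneficial alleles a segregant carries.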
Contingency tests were performed to determine whether segregants with increasing
persistence measurements over time in brain or non-brain samples were enriched for generally
beneficial alleles as well as pleiotropic alleles favoring persistence in a particular sample type. To
do this, the linear model phenotype ~ time + error was fit for each segregant using the lm() function
in R, where phenotype was a segregant’s phenotype in each of the 15 brain or 76 non-brain
samples, and time was a numeric vector encoding the number of days post-injection that each
sample was collected. The time coefficient from each model was extracted using the coefficients()
function and used in a 2x2 contingency test in which segregants were grouped by whether they
were enriched for both beneficial general and pleiotropic alleles or not as well as whether they had
a positive coefficient or not.
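The procedure above can be sketched in three steps: fit a per-segregant slope over time, cross-tabulate enrichment status against the sign of the slope, and test the resulting 2x2 table. The text does not name the contingency test used, so the two-sided Fisher's exact test below is an assumption, as are the function names; the sketch is in Python rather than R.

```python
from math import comb


def slope(times, values):
    """Least-squares slope of values on times (the time coefficient)."""
    t_bar = sum(times) / len(times)
    v_bar = sum(values) / len(values)
    num = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values))
    den = sum((t - t_bar) ** 2 for t in times)
    return num / den


def contingency_table(enriched_flags, slopes):
    """Rows: enriched or not. Columns: positive time coefficient or not."""
    table = [[0, 0], [0, 0]]
    for enriched, b in zip(enriched_flags, slopes):
        table[0 if enriched else 1][0 if b > 0 else 1] += 1
    return table


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact P value for a 2x2 table (assumed test)."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(k):
        # Hypergeometric probability of k in the top-left cell.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    k_min, k_max = max(0, row1 - (n - col1)), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Toy table, not data from the study.
p_value = fisher_exact_two_sided([[3, 1], [1, 3]])
```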
To visualize the collective effects of general and pleiotropic loci on strain persistence over
time, mean time point one and time point five phenotypes for all strains were calculated separately
for brain and non-brain samples. These mean phenotypes were then normalized to the time point
one phenotypes (by subtracting the time point one phenotype). Strains were divided into four focal
classes depending on whether they had ≥ X or ≤ X beneficial alleles at general loci and ≥ X or ≤
X alleles associated with brain persistence at pleiotropic loci. Within these four groups, the mean
normalized time point five phenotype and standard error were calculated. Bootstrapping was also
performed using 1,000 samplings of the mean time point five persistence values within each focal
group. The slopes of mean segregant persistence between the beginning (day one) and end (day
five) of the experiment were then plotted for each focal group, along with the slopes resulting from
the 1,000 bootstrap samplings (Figure 4.5c and 4.5d).
4.5.21. Genes underlying loci detected in aggregate scans
To resolve the loci identified using aggregate scans, we utilized loci detected in the
individual organ samples. For each consolidated aggregate locus, we first determined the smallest
bounds for the locus among the three scans using aggregate persistence measurements. Next, we
determined which loci from individual samples had 95% confidence intervals overlapping these
bounds. After all individual loci had been identified, we utilized these confidence intervals to find
the narrowest window for each locus. Genes within each window were identified based on the
R64-2-1_20150113 S. cerevisiae genome annotation from the Saccharomyces Genome Database
(44).
4.6. Supplementary Materials
Figure S4.1: Barcoding of haploid segregants. a, The parental strains used to generate haploid segregants
for this study were each transformed with a genomic landing pad consisting of a galactose-inducible Cre
recombinase and partially crippled LoxP sites. This construct was inserted at the neutral YBR209W site. b,
The plasmid pBAR6 was used to generate a barcode library for integration into haploid segregants. This
plasmid contains a KanMX marker. c, Barcoding plasmids were constructed via Gibson assembly of
linearized pBAR6 with a PCR product containing a partially crippled Lox2272/66 site, a random 20-mer
barcode sequence, and a partial TruSeq read 2 adapter sequence. This plasmid library was then transformed
individually into each segregant. Galactose was used to induce Cre-Lox recombination between barcoded
plasmids and the genomic landing pad at Lox 2272, resulting in integration of the barcode into the genome
of each segregant. Segregants were plated on YPD containing G418 to select for integration.
Figure S4.2: Reproducibility of controls across plating density and organ homogenization. a,
Scatterplots showing all pairwise comparisons of samples that underwent the tissue homogenization process
prior to plating (+ Homogenization) or not (- Homogenization). Each dot represents a different segregant.
Phenotypes are mean phenotypes from 2 to 3 technical replicate experiments with the same treatment and
plating density. The Pearson correlation coefficient for each pairwise comparison is in the upper-left corner
of each plot. b, Genomic positions of loci identified in each control sample. For each control, loci were
mapped independently in 2 to 3 technical replicates and consolidated by overlapping confidence intervals.
Figure S4.3: Controlling for on-plate growth. a, Violin plots showing distributions of the raw phenotypes
of the 822 strains across control samples used to fit fixed effects linear models of on-plate growth (105
cells, + homogenization; top) and the residual phenotype values in these controls after correcting for on-
plate growth (bottom). The median variance in phenotype explained by modeling on-plate growth in the
controls is 93.8%. b, Series of violin plots showing the distributions of the raw phenotypes of the 822 strains
across organ samples (top) and the residual phenotype values within these samples after correcting for on-
plate growth (bottom). The median variance in phenotype explained by modeling on-plate growth in the
organ samples is 24.7%.
Figure S4.4: Outlier brain, gonad, and liver samples. a, Phenotypes of segregants (x-axis) are shown
across organ samples (y-axis). Two outlier samples are marked by * symbols. These samples were poorly
correlated with others in the dataset, contained many segregants with near-zero barcode counts, and had
unusually high phenotypic variance among segregants. These outliers likely reflect poor recovery of yeast
from these samples, poor quality sequencing library preparations, or a combination of the two. Thus, these
outlier brain and spleen samples were excluded from downstream analyses. The single gonad sample was
also excluded from downstream analyses due to a lack of within-organ replication and the fact that the
gonad had similar properties to the two outlier samples. b, Histogram of Euclidean distances between all
pairs of the 94 organ samples prior to exclusion of the gonad and outlier samples. Distances including one
or more of the three excluded samples are marked by * and clearly show much higher distances than all
other sample-sample comparisons.
Figure S4.5: Hierarchical clustering and principal components analysis of organ samples. a,
Dendrogram showing the relationships among hierarchically clustered organ samples. The two main groups
are colored cyan (brain) and orange (non-brain). Sample labels include organ, time, and replicate
information, separated by underscores. b, Scree plot showing the percentage of variance across organ
samples explained (y-axis) by each principal component (x-axis). The first principal component explains
most variability in the dataset. c, Sample loadings on the first two principal components. Samples are
colored by organ type. The main division captured by the first principal component is brain vs. non-brain.
d, Sample loadings on the first two principal components. Samples are colored according to the time point
at which they were collected. The second principal component captures a time effect that is much weaker
than the brain vs. non-brain effect.
Figure S4.6: Samples show a positive relationship between broad-sense heritability and the number
of detected loci. Scatterplot of the number of loci detected in samples as a function of their broad-sense
heritabilities.
Figure S4.7: Loci detected in both control and organ samples show different effects in these two
contexts. Data are shown for eight loci detected in both on-plate control samples and aggregate data. Effects
were calculated as the mean persistence of strains with the 3S allele at the focal locus minus the mean
persistence of strains with the BY allele. For each locus, the effect sizes of the locus across sample types
before correction for on-plate growth (“Raw phenotype”, top row) and after correction for on-plate growth
(“Residual phenotype”, bottom row) are shown. In the paper, all mapping results for organ samples were
generated using the latter. In all cases, the loci showed different effects in the organ samples than the control
samples.
PC  Eigenvalue  Percent variance explained  Cumulative_variance_percent
1   49.2313938  54.1004327                  54.1004327
2   6.7249852   7.39009363                  61.4905263
3   4.64572989  5.10519768                  66.595724
4   1.90195534  2.09006081                  68.6857848
5   1.45642361  1.60046551                  70.2862503
6   1.24398835  1.36702017                  71.6532705
7   1.1446708   1.25788                     72.9111505
8   1.01284261  1.11301386                  74.0241644
9   0.97674384  1.07334487                  75.0975092
10  0.89437642  0.98283123                  76.0803405
Table S4.1 Results from principal component analysis. Eigenvalues and variance explained by the first
ten principal components (PCs) across the 91 samples with significant differences in persistence among
strains. Each principal component is listed in column 1 (“PC”). The eigenvalue of each PC is listed in
column 2 (“Eigenvalue”). The percentage of variance in the data explained by each PC is listed in column
3 (“Percent variance explained”). Column 4 tracks the cumulative variance explained by each PC and all
preceding PCs (“Cumulative_variance_percent”).
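The percentages in Table S4.1 can be reproduced from the eigenvalues. The sketch below assumes that the eigenvalues sum to 91, the number of samples, as would be the case for a principal component analysis on standardized data; the text does not state this, but the tabulated values are consistent with it. The Python code is illustrative only.

```python
# First three eigenvalues from Table S4.1.
eigenvalues = [49.2313938, 6.7249852, 4.64572989]

# Assumption: total variance equals the number of samples (91).
total_variance = 91

percent = [100 * ev / total_variance for ev in eigenvalues]

# Running total of the percentages gives the cumulative column.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)
```

Under this assumption the first component accounts for roughly 54.1% of the variance and the first three together for roughly 66.6%, matching columns 3 and 4 of the table.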
Chromosome  Position  -log10(p-value)  Samples with detection  CI start  CI end
14          467028    31.9055081       87                      466103    468488
15          163253    26.596455        55                      159655    165358
10          656567    14.1894071       17                      655475    664824
14          382896    13.7777158       45                      373134    388136
15          558912    12.9211021       26                      552421    569690
12          677900    12.1338515       13                      648649    680005
13          27865     12.0857298       19                      23620     29994
15          506101    8.78892383       12                      502258    512566
12          614136    8.31693972       46                      611179    614708
12          604895    7.66591787       2                       603024    609791
15          86817     7.58198738       6                       79585     91150
2           514806    7.18993782       4                       503196    536637
13          333449    7.17131203       15                      327769    341722
15          328791    7.16382027       7                       289957    341179
8           113931    7.10793557       16                      96126     117504
7           141481    6.67391166       41                      120263    144867
13          369062    6.53474435       3                       352789    384682
13          50831     6.11920421       4                       45801     65881
2           247993    6.00542966       3                       215181    254349
3           195779    5.92591648       19                      190765    209941
3           162694    5.91797534       3                       137654    163810
8           422538    5.76723709       10                      413653    465091
11          531524    5.01907183       2                       527638    557824
12          82033     4.9644584        3                       70019     91612
10          38691     4.94663669       2                       24374     41794
4           694033    4.88346484       7                       671424    703386
4           484028    4.73426903       2                       453755    520815
4           1420155   4.6805821        2                       1398006   1466633
13          923989    4.67297161       2                       904006    923989
11          210267    4.66220151       3                       193550    214537
12          854278    4.63793522       3                       837919    872180
6           173376    4.60662207       2                       167198    201882
8           218718    4.58254191       2                       185403    242447
7           444466    4.55021075       2                       396575    483704
16          530286    4.5376581        2                       484429    581662
Table S4.2 Loci detected across individual organ samples. Loci detected in individual organ samples,
consolidated by overlapping confidence intervals. Chromosome and positional information for each locus
are listed in columns 1 (“Chromosome”) and 2 (“Position”). Column 3 lists the maximum -log10(p-value)
observed across samples for each locus (“-log10(p-value)”). Column 4 lists the number of samples each
locus was detected in (“Samples with detection”); only reproducible loci detected in at least two samples
are listed. The minimal confidence intervals observed across all consolidated detections are listed in
columns 5 and 6 (“CI start” and “CI end”, respectively).
Chromosome  Position  Brain  NonBrain  B vs NB  CI start  CI end
14          468488    1      1         0        464233    473648
15          154799    1      1         1        151709    164284
10          662146    1      0         1        654612    665450
13          26333     1      1         1        13364     32621
14          368183    0      1         1        366896    390845
7           123330    0      1         1        95130     146193
12          604895    0      1         0        601489    622969
12          662627    1      0         0        629690    721983
4           505548    1      0         0        453843    540228
4           1425361   0      0         1        1420155   1507540
15          558912    1      1         0        516828    604757
3           201026    0      1         0        109046    210757
2           499454    1      0         0        434976    517160
13          329249    0      0         1        319136    346162
8           74372     0      1         0        56225     121745
4           148638    1      0         1        131075    235676
8           422538    0      0         1        380512    451071
5           481570    0      0         1        470796    560376
Table S4.3 Loci detected in aggregate scans. Loci detected in scans using aggregated phenotypes,
consolidated by overlapping 95% (2LOD) confidence intervals. Chromosome and positional information
for each locus are listed in columns 1 (“Chromosome”) and 2 (“Position”). Whether the locus was detected
(“1”) or not (“0”) in an aggregate scan using brain samples, non-brain samples, or the difference between
mean brain and mean non-brain phenotypes is listed in columns 3 through 5 (“Brain”, “NonBrain”, and “B
vs NB”, respectively). The minimal confidence intervals observed across all consolidated detections are
listed in columns 6 and 7 (“CI start” and “CI end”, respectively).
Chromosome  Position  Min interval  Max interval  Number of genes  Gene IDs
14          467028    466103        468488        2                YNL086W,YNL085W
15          163253    159655        165358        2                YOL086C,YOL084W
10          656567    655475        664824        4                YJR125C,YJR126C,YJR127C,YJR129C
13          27865     23620         29994         4                YML124C,YML123C,YML121W,YML120C
14          382896    373134        388136        10               YNL134C,YNL133C,YNL132W,YNL131W,YNL130C,YNL130C-A,YNL129W,YNL128W,YNL127W,YNL126W
7           141481    120263        144867        11               YGL201C,YGL200C,YGL198W,YGL197W,YGL196W,YGL195W,YGL194C-A,YGL194C,YGL193C,YGL192W,YGL191W
12          614136    611179        614708        2                YLR234W,YLR237W
12          604895    603024        609791        4                YLR229C,YLR231C,YLR233C,YLR234W
12          677900    648649        680005        14               YLR256W,YLR257W,YLR258W,YLR259C,YLR260W,YLR261C,YLR262C,YLR262C-A,YLR263W,YLR264W,YLR264C-A,YLR265C,YLR266C,YLR267W
4           484028    453755        520815        30               YDR003W,YDR003W-A,YDR004W,YDR005C,YDR006C,YDR007W,YDR009W,YDR011W,YDR012W,YDR013W,YDR014W,YDR014W-A,YDR016C,YDR017C,YDR018C,YDR019C,YDR020C,YDR021W,YDR022C,YDR023W,YDR025W,YDR026C,YDR027C,YDR028C,YDR030C,YDR031W,YDR032C,YDR033W,YDR034C,YDR034C-A
4           1420155   1398006       1466633       39               YDR468C,YDR469W,YDR470C,YDR471W,YDR472W,YDR473C,YDR475C,YDR476C,YDR477W,YDR478W,YDR479C,YDR480W,YDR481C,YDR482C,YDR483W,YDR484W,YDR485C,YDR486C,YDR487C,YDR488C,YDR489W,YDR490C,YDR492W,YDR493W,YDR494W,YDR495C,YDR496C,YDR497C,YDR498C,YDR499W,YDR500C,YDR501W,YDR502C,YDR503C,YDR504C,YDR505C,YDR506C,YDR507C,YDR508C
15          558912    552421        569690        8                YOR122C,YOR123C,YOR124C,YOR125C,YOR126C,YOR127W,YOR128C,YOR129C
3           195779    190765        209941        13               YCR034W,YCR035C,YCR036W,YCR037C,YCR038C,YCR039C,YCR040W,YCR041W,YCR042C,YCR043C,YCR044C,YCR045C,YCR046C
3           162694    137654        163810        15               YCR012W,YCR014C,YCR015C,YCR016W,YCR017C,YCR018C,YCR019W,YCR020C,YCR020C-A,YCR020W-B,YCR021C,YCR023C,YCR024C,YCR024C-B,YCR024C-A
2           514806    503196        536637        13               YBR133C,YBR135W,YBR136W,YBR137W,YBR138C,YBR139W,YBR140C,YBR141C,YBR142W,YBR143C,YBR145W,YBR146W,YBR147W
13          333449    327769        341722        9                YMR028W,YMR029C,YMR030W,YMR031C,YMR032W,YMR030W-A,YMR033W,YMR034C,YMR035W
8           113931    96126         117504        13               YHL007C,YHL006C,YHL004W,YHL003C,YHL002W,YHL001W,YHR001W,YHR001W-A,YHR002W,YHR003C,YHR004C,YHR005C,YHR005C-A
8           422538    413653        465091        23               YHR158C,YHR159W,YHR160C,YHR161C,YHR162W,YHR163W,YHR164C,YHR165C,YHR166C,YHR167W,YHR168W,YHR169W,YHR170W,YHR171W,YHR172W,YHR173C,YHR174W,YHR175W,YHR175W-A,YHR176W,YHR177W,YHR178W,YHR179W
Supplementary Table S4.4 Candidate genes within loci detected in aggregate scans. Chromosome and
positional information for each aggregate locus are listed in columns 1 (“Chromosome”) and 2 (“Position”).
The minimal confidence intervals observed across all loci detected in individual samples that overlapped a
particular aggregate locus are listed in columns 3 and 4 (“Min interval” and “Max interval”, respectively).
The number of genes within each confidence interval is listed in column 5 (“Number of genes”). The gene
IDs of all genes within a confidence interval are listed in column 6 (“Gene IDs”).
Chapter 5: Concluding remarks
In this chapter, I will summarize my work and discuss future directions.
5.1. Impact of my work
5.1.1. Characterization of genetic background effects
At the time of its publication, the work in chapter two was one of the most thorough
examinations of genetic background effects. Rather than screen pairwise (or higher-order)
combinations of genetic perturbations, the effects of seven gene knockouts were studied across
over 1,000 recombinant progeny of a yeast cross in 10 environments. This allowed me to
examine how many variants segregating in the cross individually and collectively interacted with
the knockouts and to answer several questions about how genetic interactions
contribute to quantitative phenotypes, including:
1) How common are genetic background effects?
Overall, I detected over 1,000 genetic interactions involving one of the seven gene
knockouts examined in this study, suggesting that epistasis is pervasive in the context of certain
genetic perturbations. I also found that the number of genetic interactions responsive to
perturbations was far greater than the number of loci exhibiting individual effects or interacting
with segregating loci independently of a mutation. This result has implications for the significance
of genetic interactions in complex traits and suggests that introduction of novel variants and/or
genetic perturbations may be an effective way to detect and study background effects.
2) What is the architecture of genetic background effects?
Genetic background effects were found to be highly polygenic, with knockouts interacting
with large numbers of segregating loci in the cross. The seven mutations examined were involved
in between 73 and 543 interactions, showing that genetic perturbations vary in the degree to which
they interact with genetic background.
The number of interactions involving a mutation and two or more additional loci far
exceeded the number of pairwise interactions with mutations. This result was consistent across all
seven mutations and is evidence that introduction of mutations into the cross changed the way that
segregating loci interacted with one another, effectively “rewiring” the relationship between
genotype and phenotype in the cross. Finally, I found that six out of seven mutations interacted with
the same set of loci within the cross, while a single mutation interacted with a different subset of
segregating loci. It may be that functionally related genes are more likely to interact with the same
subsets of genetic variation in a population, that there are loci that are generally responsive to
genetic perturbation within biological systems, or both.
3) How do background effects change across environment?
The number of genetic interactions changed dramatically as a function of environment,
ranging from 15 to 359. Most interactions were environment-specific, with a small number of
interactions having effects across one or more additional environments. Overall, my work in
chapter two shows that the environment has a large effect on the presence of epistasis and genetic
background effects.
5.1.2. Examining the architecture of complex traits in a diploid model
From a technical perspective, the dual-barcoding system used in chapter three facilitated
the generation of an extremely large, structured, and barcoded panel of diploid yeast strains. This
panel was used to efficiently study fitness in a large number of environments, but can also be
applied to other contexts. Overall, this workflow can be used to generate large numbers of
barcoded genetic backgrounds, granting high statistical power to study many complex phenotypes.
Biologically, the work in chapter three provided novel insight into genetic background
effects in diploid systems. Using a structured population composed of hundreds of families of
closely related strains allowed me to examine how the effects of loci are modified or even masked
in different subpopulations of the panel. I also find that despite the large numbers of loci involved
in genetic interactions, fitness of these diploid strains across environments was largely due to the
effects of several large “hub” loci. These hub loci vary in their effects across families and modify
the additive and dominance effects of other loci and each other.
To my knowledge, this work is also one of the most comprehensive attempts to determine
how much dominance contributes to genetic interactions in diploids. In this study, dominance and
additivity contribute roughly equally to the variance that can be explained by genetic interactions,
implying that the contribution of epistasis to complex phenotypes may be higher than that reported
in many quantitative genetic studies employing haploid systems.
5.1.3. The genetics of a host-pathogen interaction
Many studies focused on host-pathogen interaction rely upon reverse-genetic screens to
determine the effects of mutations in the pathogen within a single genetic background or a limited
number of genetic backgrounds. Chapter four provides a look at the genetic architecture of fungal persistence
in a mammalian host using a forward genetic approach in which loci contributing to persistence
are identified using over 800 genetic backgrounds from a two-parent yeast cross. I find evidence
that many loci have effects that change as a function of which organ the yeast inhabit and exert
effects on persistence over time, revealing that ongoing selection is acting upon these sites inside
the host. A major signature in the data is the difference between the brain and non-brain organs,
which has implications for future work in fungal pathogens that cause death primarily through
infection of the brain. I also find extensive antagonistic pleiotropy in the loci identified in this
study, with alleles having opposite effects on persistence in the brain and non-brain organs. These
loci constrain the ability of the yeast to persist throughout the host, and loci with general effects
on persistence across organ types do not alleviate these limitations. Collectively, this architecture
results in many strains that “specialize” in survival within the brain or elsewhere, but few that
exhibit high levels of persistence across organs. This finding has implications for the study of host-
pathogen interactions in other systems.
5.2. Future directions
5.2.1. Chapter 2
While the dissection of genetic background effects in chapter two was detailed, a relatively
modest number of genes were examined. Further, the genes perturbed in this study all encode
chromatin-associated proteins, meaning that their functional roles in the cell may be similar. In
particular, the finding that many loci are “pleiotropic” (that is, interacting with the majority of
mutations in the study) may be driven by the design of the experiment. As a result, a more
exhaustive study involving perturbations of genes spanning cellular function would provide more
compelling insight into the underpinnings of genetic background effects.
Another limitation of this study was the fact that the effect of each mutation was examined
in a different set of genetic backgrounds. Generation of a panel of strains in which genes can be
efficiently perturbed would increase statistical power to detect background effects and explicitly
show how individuals differ in their response to mutations.
5.2.2. Chapter 3
In chapter three, I find that the mating locus (or a closely-linked genetic variant)
significantly modifies the degree of dominance, but not additivity, of interacting loci. Through
these interactions, the mating locus exerts considerable impact on fitness despite a negligible
individual effect. The mechanisms by which this curious genetic effect occurs are not clear and
warrant further investigation, beginning with cloning of the causal variant at the mating locus and
at least one interacting locus. Introduction of all combinations of alleles at the cloned sites should
recapitulate the interaction observed in our study and lead to testable downstream hypotheses
regarding the mechanisms by which dominance effects can be influenced across genetic
backgrounds.
5.2.3. Chapter 4
Understanding the molecular mechanisms that govern host-pathogen relationships is
crucial for our ability to predict patient outcomes and treat diseases. My results in chapter four
shed light on the genetic basis of fungal persistence in a mammalian host; however, this study has
two primary limitations. First, I did not resolve any of the identified loci to single-gene or
nucleotide resolution, so I cannot yet speak to the specific mechanisms that determine whether
fungal strains will persist in the brain, non-brain organs, or both within a host. Second, S. cerevisiae
exists primarily as a diploid in nature, including when it acts as an opportunistic pathogen. In
chapter three, I found that complex genetic effects can arise in diploids; such effects are invisible
to my work in chapter four.
Both of these issues can be addressed in parallel. By conducting pairwise mating using a
subset of the 822 MATα segregants in this study, and a subset of the MATa segregants used in
chapter three, a panel of recombinant diploid strains can be generated. Due to limits on the number
of yeast colonies recovered per mouse, the diploid panel constructed would likely need to be
significantly smaller than that used in chapter three (~10,000 – 30,000 strains). A panel of strains
this size would provide significantly improved statistical power for mapping, and segregants could
be chosen to maximize the number of recombination breakpoints present in the population,
minimizing loss in mapping resolution relative to the study in chapter four. Performing an
additional study using a resource of this type should allow resolution of the loci involved in fungal
persistence, facilitating cloning and downstream experiments in the future. This work would also
permit studying the genetic basis of fungal persistence in the host in a more realistic context.
Another direction for future work is to apply similar approaches to other fungal pathogens
such as Cryptococcus neoformans and Candida albicans. Current unbiased reverse-genetic
approaches to study fungal infectivity or survival in the host could be scoped to focus on brain/non-
brain differences and may uncover the mechanisms by which other pathogens can cross the blood-
brain barrier. Finally, crossing schemes in certain pathogens may permit the genetic dissection of
pathogenicity in a manner similar to the work in chapter four.
References
1. Freedman, M. L. et al. Principles for the post-GWAS functional characterization of cancer risk
loci. Nat. Genet. 43, 513–518 (2011).
2. DIAGRAM Consortium et al. New genetic loci implicated in fasting glucose homeostasis and
their impact on type 2 diabetes risk. Nat. Genet. 42, 105–116 (2010).
3. The Diabetes Genetics Initiative et al. A common variant of HMGA2 is associated with adult
and childhood height in the general population. Nat. Genet. 39, 1245–1250 (2007).
4. Shi, J. et al. Unraveling the Complex Trait of Crop Yield With Quantitative Trait Loci Mapping
in Brassica napus. Genetics 182, 851–861 (2009).
5. Eichler, E. E. et al. Missing heritability and strategies for finding the underlying causes of
complex disease. Nat. Rev. Genet. 11, 446–450 (2010).
6. Manolio, T. A. et al. Finding the missing heritability of complex diseases. Nature 461, 747–
753 (2009).
7. Hemani, G., Knott, S. & Haley, C. An Evolutionary Perspective on Epistasis and the Missing
Heritability. PLoS Genet. 9, e1003295 (2013).
8. Mackay, T. F. & Moore, J. H. Why epistasis is important for tackling complex human disease
genetics. Genome Med. 6, 125 (2014).
9. Chandler, C. H., Chari, S., Tack, D. & Dworkin, I. Causes and Consequences of Genetic
Background Effects Illuminated by Integrative Genomic Analysis. Genetics 196, 1321–1336
(2014).
10. Hill, W. G., Goddard, M. E. & Visscher, P. M. Data and Theory Point to Mainly Additive
Genetic Variance for Complex Traits. PLoS Genet. 4, e1000008 (2008).
11. Zuk, O., Hechter, E., Sunyaev, S. R. & Lander, E. S. The mystery of missing heritability:
Genetic interactions create phantom heritability. Proc. Natl. Acad. Sci. 109, 1193–1198 (2012).
12. Goddard, M. E., Hayes, B. J. & Meuwissen, T. H. E. Using the genomic relationship matrix to
predict the accuracy of genomic selection. J. Anim.
Breed. Genet. 128, 409–421 (2011).
13. Crow, J. F. On epistasis: why it is unimportant in polygenic directional selection. Philos. Trans.
R. Soc. B Biol. Sci. 365, 1241–1244 (2010).
14. Crow, J. F. Maintaining evolvability. J. Genet. 87, 349–353 (2008).
15. Geiler-Samerotte, K. A., Zhu, Y. O., Goulet, B. E., Hall, D. W. & Siegal, M. L. Selection
Transforms the Landscape of Genetic Variation Interacting with Hsp90. PLOS Biol. 14,
e2000465 (2016).
16. Queitsch, C., Sangster, T. A. & Lindquist, S. Hsp90 as a capacitor of phenotypic variation.
Nature 417, 618–624 (2002).
17. Rutherford, S. L. & Lindquist, S. Hsp90 as a capacitor for morphological evolution. Nature
396, 336–342 (1998).
18. Chen, R. et al. Analysis of 589,306 genomes identifies individuals resilient to severe
Mendelian childhood diseases. Nat. Biotechnol. 34, 531–538 (2016).
19. Dowell, R. D. et al. Genotype to Phenotype: A Complex Problem. Science 328, 469–469
(2010).
20. Taylor, M. B., Phan, J., Lee, J. T., McCadden, M. & Ehrenreich, I. M. Diverse genetic
architectures lead to the same cryptic phenotype in a yeast cross. Nat. Commun. 7, 11669
(2016).
21. Lee, J. T., Coradini, A. L. V., Shen, A. & Ehrenreich, I. M. Layers of Cryptic Genetic Variation
Underlie a Yeast Complex Trait. Genetics 211, 1469–1482 (2019).
22. Hou, J., Tan, G., Fink, G. R., Andrews, B. J. & Boone, C. Complex modifier landscape
underlying genetic background effects. Proc. Natl. Acad. Sci. 116, 5045–5054 (2019).
23. Forsberg, S. K. G., Bloom, J. S., Sadhu, M. J., Kruglyak, L. & Carlborg, Ö. Accounting for
genetic interactions improves modeling of individual quantitative trait phenotypes in yeast.
Nat. Genet. 49, 497–503 (2017).
24. Phillips, P. C. Epistasis — the essential role of gene interactions in the structure and evolution
of genetic systems. Nat. Rev. Genet. 9, 855–867 (2008).
25. Hallgrímsdóttir, I. B. & Yuster, D. S. A complete classification of epistatic two-locus models.
BMC Genet. 9, 17 (2008).
26. Musani, S. K. et al. Detection of Gene × Gene Interactions in Genome-Wide Association
Studies of Human Population Data. Hum. Hered. 63, 67–84 (2007).
27. Wilkie, A. O. The molecular basis of genetic dominance. J. Med. Genet. 31, 89–98 (1994).
28. Monir, Md. M. & Zhu, J. Dominance and Epistasis Interactions Revealed as Important Variants
for Leaf Traits of Maize NAM Population. Front. Plant Sci. 9, 627 (2018).
29. Guo, T. et al. Genetic basis of grain yield heterosis in an “immortalized F2” maize population.
Theor. Appl. Genet. 127, 2149–2158 (2014).
30. Boeven, P. H. G. et al. Negative dominance and dominance-by-dominance epistatic effects
reduce grain-yield heterosis in wide crosses in wheat. Sci. Adv. 6, eaay4897 (2020).
31. Veitia, R. A. & Birchler, J. A. Dominance and gene dosage balance in health and disease: why
levels matter! J. Pathol. 220, 174–185 (2010).
32. Paaby, A. B. & Rockman, M. V. The many faces of pleiotropy. Trends Genet. 29, 66–73
(2013).
33. Watanabe, K. et al. A global overview of pleiotropy and genetic architecture in complex traits.
Nat. Genet. 51, 1339–1348 (2019).
34. Geiler-Samerotte, K. A. et al. Extent and context dependence of pleiotropy revealed by high-
throughput single-cell phenotyping. PLOS Biol. 18, e3000836 (2020).
35. Solovieff, N., Cotsapas, C., Lee, P. H., Purcell, S. M. & Smoller, J. W. Pleiotropy in complex
traits: challenges and strategies. Nat. Rev. Genet. 14, 483–495 (2013).
36. Metzger, B. P. H., Wittkopp, P. J. & Coolon, J. D. Evolutionary Dynamics of Regulatory
Changes Underlying Gene Expression Divergence among Saccharomyces Species. Genome
Biol. Evol. 9, 843–854 (2017).
37. Wittkopp, P. J., Haerum, B. K. & Clark, A. G. Regulatory changes underlying expression
differences within and between Drosophila species. Nat. Genet. 40, 346–350 (2008).
38. Signor, S. A. & Nuzhdin, S. V. The Evolution of Gene Expression in cis and trans. Trends
Genet. 34, 532–544 (2018).
39. Featherstone, D. E. & Broadie, K. Wrestling with pleiotropy: Genomic and topological
analysis of the yeast gene expression network. BioEssays 24, 267–274 (2002).
40. Tyler, A. L., Asselbergs, F. W., Williams, S. M. & Moore, J. H. Shadows of complexity: what
biological networks reveal about epistasis and pleiotropy. BioEssays 31, 220–227 (2009).
41. Austad, S. N. & Hoffman, J. M. Is antagonistic pleiotropy ubiquitous in aging biology? Evol.
Med. Public Health 2018, 287–294 (2018).
42. Carter, A. J. & Nguyen, A. Q. Antagonistic pleiotropy as a widespread mechanism for the
maintenance of polymorphic disease alleles. BMC Med. Genet. 12, 160 (2011).
43. Gerstein, A. C., Chun, H.-J. E., Grant, A. & Otto, S. P. Genomic Convergence toward Diploidy
in Saccharomyces cerevisiae. PLoS Genet. 2, e145 (2006).
44. Berman, J. & Sudbery, P. E. Candida albicans: A molecular revolution built on lessons from
budding yeast. Nat. Rev. Genet. 3, 918–931 (2002).
45. Strope, P. K. et al. The 100-genomes strains, an S. cerevisiae resource that illuminates its
natural phenotypic and genotypic variation and emergence as an opportunistic pathogen.
Genome Res. 25, 762–774 (2015).
46. Chandler, C. H., Chari, S. & Dworkin, I. Does your gene need a background check? How
genetic background impacts the analysis of mutations, genes, and evolution. Trends Genet 29,
358-366 (2013).
47. Nadeau, J. H. Modifier genes in mice and humans. Nat Rev Genet 2, 165-174 (2001).
48. Chow, C. Y. Bringing genetic background into focus. Nat Rev Genet 17, 63-64 (2016).
49. Taylor, M. B. & Ehrenreich, I. M. Transcriptional derepression uncovers cryptic higher-order
genetic interactions. PLoS Genet 11, e1005606 (2015).
50. Lee, J. T., Taylor, M. B., Shen, A. & Ehrenreich, I. M. Multi-locus genotypes underlying
temperature sensitivity in a mutationally induced trait. PLoS Genet 12, e1005929 (2016).
51. Cooper, D. N., Krawczak, M., Polychronakos, C., Tyler-Smith, C. & Kehrer-Sawatzki, H.
Where genotype is not predictive of phenotype: towards an understanding of the molecular
basis of reduced penetrance in human inherited disease. Hum Genet 132, 1077-1130 (2013).
52. Sanders, S. J. et al. De novo mutations revealed by whole-exome sequencing are strongly
associated with autism. Nature 485, 237-241 (2012).
53. Jin, S. C. et al. Contribution of rare inherited and de novo variants in 2,871 congenital heart
disease probands. Nat Genet 49, 1593-1601 (2017).
54. Fromer, M. et al. De novo mutations in schizophrenia implicate synaptic networks. Nature
506, 179-184 (2014).
55. Jerison, E. R. et al. Genetic variation in adaptability and pleiotropy in budding yeast. bioRxiv
(2017).
56. Carlborg, O., Jacobsson, L., Ahgren, P., Siegel, P. & Andersson, L. Epistasis and the release
of genetic variation during long-term selection. Nat Genet 38, 418-420 (2006).
57. Taylor, M. B. & Ehrenreich, I. M. Higher-order genetic interactions and their contribution to
complex traits. Trends Genet 31, 34-40 (2015).
58. Sackton, T. B. & Hartl, D. L. Genotypic context and epistasis in individuals and populations.
Cell 166, 279-287 (2016).
59. Mackay, T. F. Epistasis and quantitative traits: using model organisms to study gene-gene
interactions. Nat Rev Genet 15, 22-33 (2014).
60. Ehrenreich, I. M. Epistasis: searching for interacting genetic variants using crosses. Genetics
206, 531-535 (2017).
61. Chandler, C. H. et al. How well do you know your mutation? Complex effects of genetic
background on expressivity, complementation, and ordering of allelic effects. PLoS Genet 13,
e1007075 (2017).
62. Matsui, T., Lee, J. T. & Ehrenreich, I. M. Genetic suppression: Extending our knowledge from
lab experiments to natural populations. Bioessays 39 (2017).
63. Paaby, A. B. et al. Wild worm embryogenesis harbors ubiquitous polygenic modifier variation.
Elife 4 (2015).
64. Jarosz, D. F. & Lindquist, S. Hsp90 and environmental stress transform the adaptive value of
natural genetic variation. Science 330, 1820-1824 (2010).
65. van Swinderen, B. & Greenspan, R. J. Flexibility in a gene network affecting a simple behavior
in Drosophila melanogaster. Genetics 169, 2151-2163 (2005).
66. Schell, R., Mullis, M. & Ehrenreich, I. M. Modifiers of the genotype-phenotype map: Hsp90
and beyond. PloS Biol 14, e2001015 (2016).
67. Tirosh, I., Reikhav, S., Sigal, N., Assia, Y. & Barkai, N. Chromatin regulators as capacitors of
interspecies variations in gene expression. Mol Syst Biol 6, 435 (2010).
68. Richardson, J. B., Uppendahl, L. D., Traficante, M. K., Levy, S. F. & Siegal, M. L. Histone
variant HTZ1 shows extensive epistasis with, but does not increase robustness to, new
mutations. PloS Genet 9, e1003733 (2013).
69. Paaby, A. B. & Rockman, M. V. Cryptic genetic variation: evolution’s hidden substrate. Nat
Rev Genet 15, 247-258 (2014).
70. Gibson, G. & Dworkin, I. Uncovering cryptic genetic variation. Nat Rev Genet 5, 681-690
(2004).
71. Vu, V. et al. Natural variation in gene expression modulates the severity of mutant phenotypes.
Cell 162, 391-402 (2015).
72. Taylor, M. B. & Ehrenreich, I. M. Genetic interactions involving five or more genes contribute
to a complex trait in yeast. PloS Genet 10, e1004324 (2014).
73. Bergman, A. & Siegal, M. L. Evolutionary capacitance as a general feature of complex gene
networks. Nature 424, 549-552 (2003).
74. Li, B., Carey, M. & Workman, J. L. The role of chromatin during transcription. Cell 128, 707-
719 (2007).
75. Rando, O. J. & Winston, F. Chromatin and transcription in yeast. Genetics 190, 351-387
(2012).
76. Carmen, A. A. et al. Yeast HOS3 forms a novel trichostatin A-insensitive homodimer with
intrinsic histone deacetylase activity. Proc Natl Acad Sci U S A 96, 12356-12361 (1999).
77. Robyr, D. et al. Microarray deacetylation maps determine genome-wide functions for yeast
histone deacetylases. Cell 109, 437-446 (2002).
78. Wang, M. & Collins, R. N. A lysine deacetylase Hos3 is targeted to the bud neck and involved
in the spindle position checkpoint. Mol Biol Cell 25, 2720-2734 (2014).
79. Kumar, A. et al. Daughter-cell-specific modulation of nuclear pore complexes controls cell
cycle entry during asymmetric division. Nat Cell Biol 20, 432-442 (2018).
80. Tong, A. H. & Boone, C. Synthetic genetic array analysis in Saccharomyces cerevisiae.
Methods Mol Biol 313, 171-192 (2006).
81. Gietz, R. D. & Woods, R. A. Transformation of yeast by lithium acetate/single-stranded carrier
DNA/polyethylene glycol method. Methods Enzymol 350, 87-96 (2002).
82. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform.
Bioinformatics 25, 1754-1760 (2009).
83. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078-
2079 (2009).
84. Rabiner, L. R. A tutorial on hidden markov models and selected applications in speech
recognition. Proceedings of the IEEE 77, 257-286 (1989).
85. Matsui, T. & Ehrenreich, I. M. Gene-environment interactions in stress response contribute
additively to a genotype-environment interaction. PloS Genet 12, e1006158 (2016).
86. Smith, E. R. et al. ESA1 is a histone acetyltransferase that is essential for growth in yeast. Proc
Natl Acad Sci U S A 95, 3561-3565 (1998).
87. Lin, Y. Y. et al. A comprehensive synthetic genetic interaction network governing yeast
histone acetylation and deacetylation. Genes Dev 22, 2062-2074 (2008).
88. T. F. C. Mackay, E. A. Stone, J. F. Ayroles, The genetics of quantitative traits: challenges and
prospects. Nat. Rev. Genet. 10, 565–577 (2009).
89. J. S. Bloom, I. M. Ehrenreich, W. T. Loo, T.-L. V. Lite, L. Kruglyak, Finding the sources of
missing heritability in a yeast cross. Nature. 494, 234–237 (2013).
90. E. A. Boyle, Y. I. Li, J. K. Pritchard, An Expanded View of Complex Traits: From Polygenic
to Omnigenic. Cell. 169, 1177–1186 (2017).
91. H. Shao, L. C. Burrage, D. S. Sinasac, A. E. Hill, S. R. Ernest, W. O’Brien, H.-W. Courtland,
K. J. Jepsen, A. Kirby, E. J. Kulbokas, M. J. Daly, K. W. Broman, E. S. Lander, J. H. Nadeau,
Genetic architecture of complex traits: Large phenotypic effects and pervasive epistasis. Proc.
Natl. Acad. Sci. 105, 19910–19914 (2008).
92. W. Huang, T. F. C. Mackay, The Genetic Architecture of Quantitative Traits Cannot Be
Inferred from Variance Component Analysis. PLOS Genet. 12, e1006421 (2016).
93. H. C. Rowe, B. G. Hansen, B. A. Halkier, D. J. Kliebenstein, Biochemical Networks and
Epistasis Shape the Arabidopsis thaliana Metabolome. Plant Cell. 20, 1199–1216 (2008).
94. W.-H. Wei, G. Hemani, C. S. Haley, Detecting epistasis in human complex traits. Nat. Rev.
Genet. 15, 722–733 (2014).
95. D. S. Falconer, T. F. C. Mackay, Introduction to Quantitative Genetics (Longmans Green,
Harlow, Essex, UK, 4th ed., 1996).
96. M. Lynch, B. Walsh, Genetics and Analysis of Quantitative Traits (Sinauer Associates,
Sunderland, MA, 1998).
97. J. S. Bloom, I. Kotenko, M. J. Sadhu, S. Treusch, F. W. Albert, L. Kruglyak, Genetic
interactions contribute less than additive effects to quantitative trait variation in yeast. Nat.
Commun. 6, 8712 (2015).
98. W. Huang, S. Richards, M. A. Carbone, D. Zhu, R. R. H. Anholt, J. F. Ayroles, L. Duncan, K.
W. Jordan, F. Lawrence, M. M. Magwire, C. B. Warner, K. Blankenburg, Y. Han, M. Javaid,
J. Jayaseelan, S. N. Jhangiani, D. Muzny, F. Ongeri, L. Perales, Y.-Q. Wu, Y. Zhang, X. Zou,
E. A. Stone, R. A. Gibbs, T. F. C. Mackay, Epistasis dominates the genetic architecture of
Drosophila quantitative traits. Proc. Natl. Acad. Sci. 109, 15553–15559 (2012).
99. The 1000 Genomes Project Consortium, A map of human genome variation from population-
scale sequencing. Nature. 467, 1061–1073 (2010).
100. P. M. Magwene, Ö. Kayıkçı, J. A. Granek, J. M. Reininga, Z. Scholl, D. Murray,
Outcrossing, mitotic recombination, and life-history trade-offs shape genome evolution in
Saccharomyces cerevisiae. Proc. Natl. Acad. Sci. 108, 1987–1992 (2011).
101. G. A. Churchill, D. M. Gatti, S. C. Munger, K. L. Svenson, The diversity outbred mouse
population. Mamm. Genome. 23, 713–718 (2012).
102. J. M. Cheverud, E. J. Routman, Epistasis and its contribution to genetic variance
components. Genetics. 139, 1455–1461 (1995).
103. J. M. Cheverud, E. J. Routman, Epistasis as a source of increased additive genetic variance
at population bottlenecks. Evolution. 50, 1042–1051 (1996).
104. R. F. Campbell, P. T. McGrath, A. B. Paaby, Analysis of Epistasis in Natural Traits Using
Model Organisms. Trends Genet. 34, 883–898 (2018).
105. J. Hallin, K. Märtens, A. I. Young, M. Zackrisson, F. Salinas, L. Parts, J. Warringer, G.
Liti, Powerful decomposition of complex traits in a diploid model. Nat. Commun. 7, 13311
(2016).
106. K. Märtens, J. Hallin, J. Warringer, G. Liti, L. Parts, Predicting quantitative traits from
genome and phenome with near perfect accuracy. Nat. Commun. 7, 11512 (2016).
107. M. N. Mullis, T. Matsui, R. Schell, R. Foree, I. M. Ehrenreich, The complex underpinnings
of genetic background effects. Nat. Commun. 9, 3548 (2018).
108. S. F. Levy, J. R. Blundell, S. Venkataram, D. A. Petrov, D. S. Fisher, G. Sherlock,
Quantitative evolutionary dynamics using high-resolution lineage tracking. Nature. 519, 181–
186 (2015).
109. X. Liu, Z. Liu, A. K. Dziulko, F. Li, D. Miller, R. D. Morabito, D. Francois, S. F. Levy,
iSeq 2.0: A Modular and Interchangeable Toolkit for Interaction Screening in Yeast. Cell Syst.
8, 338-344.e8 (2019).
110. U. Schlecht, Z. Liu, J. R. Blundell, R. P. St.Onge, S. F. Levy, A scalable double-barcode
sequencing platform for characterization of dynamic protein-protein interactions. Nat.
Commun. 8, 15586 (2017).
111. L. Zhao, Z. Liu, S. F. Levy, S. Wu, Bartender: a fast and accurate clustering algorithm to
count barcode reads. Bioinformatics. 34, 739–747 (2018).
112. F. Li, M. L. Salit, S. F. Levy, Unbiased Fitness Estimation of Pooled Barcode or Amplicon
Sequencing Studies. Cell Syst. 7, 521-525.e4 (2018).
113. C. Lippert, J. Listgarten, Y. Liu, C. M. Kadie, R. I. Davidson, D. Heckerman, FaST linear
mixed models for genome-wide association studies. Nat. Methods. 8, 833–835 (2011).
114. C. Widmer, C. Lippert, O. Weissbrod, N. Fusi, C. Kadie, R. Davidson, J. Listgarten, D.
Heckerman, Further Improvements to Linear Mixed Models for Genome-Wide Association
Studies. Sci. Rep. 4, 6874 (2015).
115. J. E. Haber, Mating-Type Genes and MAT Switching in Saccharomyces cerevisiae.
Genetics. 191, 33–64 (2012).
116. Y. Tarutani, H. Shiba, M. Iwano, T. Kakizaki, G. Suzuki, M. Watanabe, A. Isogai, S.
Takayama, Trans-acting small RNA determines dominance relationships in Brassica self-
incompatibility. Nature. 466, 983–986 (2010).
117. S. Billiard, V. Castric, Evidence for Fisher’s dominance theory: how many ‘special cases’?
Trends Genet. 27, 441–445 (2011).
118. A. I. Young, S. Benonisdottir, M. Przeworski, A. Kong, Deconstructing the sources of
genotype-phenotype associations in humans. Science. 365, 1396–1400 (2019).
119. A. Wach, A. Brachat, R. Pöhlmann, P. Philippsen, New heterologous modules for classical
or PCR-based gene disruptions in Saccharomyces cerevisiae. Yeast. 10, 1793–1808 (1994).
120. R. D. Gietz, R. H. Schiestl, High-efficiency yeast transformation using the LiAc/SS carrier
DNA/PEG method. Nat. Protoc. 2, 31–34 (2007).
121. J. E. DiCarlo, J. E. Norville, P. Mali, X. Rios, J. Aach, G. M. Church, Genome engineering
in Saccharomyces cerevisiae using CRISPR-Cas systems. Nucleic Acids Res. 41, 4336–4343
(2013).
122. Herskowitz, I. & Jensen, R. E. [8] Putting the HO gene to work: Practical uses for
mating-type switching. In Methods in Enzymology vol. 194 132–146 (Elsevier, 1991).
123. D. Julius, L. Blair, A. Brake, G. Sprague, J. Thorner, Yeast α Factor Is Processed from a
Larger Precursor Polypeptide: The Essential Role of a Membrane-Bound Dipeptidyl
Aminopeptidase. Cell. 32, 839–852 (1983).
124. O. Kobayashi, H. Suda, T. Ohtani, H. Sone, Molecular cloning and analysis of the
dominant flocculation gene FLO8 from Saccharomyces cerevisiae. Mol. Gen. Genet. MGG.
251, 707–715 (1996).
125. W.-S. Lo, A. M. Dranginis, The Cell Surface Flocculin Flo11 Is Required for
Pseudohyphae Formation and Invasion by Saccharomyces cerevisiae. Mol. Biol. Cell. 9, 161–
171 (1998).
126. G. Lee, I. Saito, Role of nucleotide sequences of loxP spacer region in Cre-mediated
recombination. Gene. 216, 55–65 (1998).
127. B. Sauer, Functional Expression of the cre-lox Site-Specific Recombination System in the
Yeast Saccharomyces cerevisiae. MOL CELL BIOL. 7, 10 (1987).
128. R. Verwaal, N. Buiting-Wiessenhaan, S. Dalhuijsen, J. A. Roubos, CRISPR/Cpf1 enables
fast and simple genome editing of Saccharomyces cerevisiae. Yeast. 35, 201–211 (2018).
129. A. L. Goldstein, J. H. McCusker, Three new dominant drug resistance cassettes for gene
disruption in Saccharomyces cerevisiae. Yeast Chichester Engl. 15, 1541–53 (1999).
130. L. Himmelmann, Package “HMM” (2015; https://cran.r-
project.org/web/packages/HMM/HMM.pdf).
131. The R Core Team, R: A language and environment for statistical computing.
132. G. Covarrubias-Pazaran, Genome-Assisted Prediction of Quantitative Traits Using the R
Package sommer. PLOS ONE. 11, e0156744 (2016).
133. R. A. Peterson, J. E. Cavanaugh, Ordered quantile normalization: a semiparametric
transformation built for the cross-validation era. J. Appl. Stat. 47, 2312–2327 (2020).
134. S. Purcell, B. Neale, K. Todd-Brown, L. Thomas, M. A. R. Ferreira, D. Bender, J. Maller,
P. Sklar, P. I. W. de Bakker, M. J. Daly, P. C. Sham, PLINK: A Tool Set for Whole-Genome
Association and Population-Based Linkage Analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
135. Brown, G. D. et al. Hidden Killers: Human Fungal Infections. Sci. Transl. Med. 4,
(2012).
136. Almeida, F., Rodrigues, M. L. & Coelho, C. The Still Underestimated Problem of Fungal
Diseases Worldwide. Front Microbiol 10, 214 (2019).
137. Kohler, J. R., Casadevall, A. & Perfect, J. The spectrum of fungi that infects humans.
Cold Spring Harb Perspect Med 5, a019273 (2014).
138. Brown, G. D. & May, R. C. Editorial overview: Host-microbe interactions: fungi. Curr
Opin Microbiol 40, v–vii (2017).
139. Borman, A. M., Linton, C. J., Miles, S.-J. & Johnson, E. M. Molecular identification of
pathogenic fungi. J. Antimicrob. Chemother. 61 Suppl 1, i7-12 (2008).
140. Scorzoni, L. et al. Antifungal Therapy: New Advances in the Understanding and
Treatment of Mycosis. Front Microbiol 8, 36 (2017).
141. Nivoix, Y., Ledoux, M. P. & Herbrecht, R. Antifungal Therapy: New and Evolving
Therapies. Semin Respir Crit Care Med 41, 158–174 (2020).
142. Kumamoto, C. A. Niche-specific gene expression during C. albicans infection. Curr.
Opin. Microbiol. 11, 325–30 (2008).
143. Kumamoto, C. A. Molecular mechanisms of mechanosensing and their roles in fungal
contact sensing. Nat. Rev. Microbiol. 6, 667–73 (2008).
144. Clemons, K. V., McCusker, J. H., Davis, R. W. & Stevens, D. A. Comparative
pathogenesis of clinical and nonclinical isolates of Saccharomyces cerevisiae. J Infect Dis
169, 859–67 (1994).
145. McCusker, J. H., Clemons, K. V., Stevens, D. A. & Davis, R. W. Genetic
characterization of pathogenic Saccharomyces cerevisiae isolates. Genetics 136, 1261–9
(1994).
146. Byron, J. K., Clemons, K. V., McCusker, J. H., Davis, R. W. & Stevens, D. A.
Pathogenicity of Saccharomyces cerevisiae in complement factor five-deficient mice. Infect.
Immun. 63, 478–485 (1995).
147. Cavalheiro, M. & Teixeira, M. C. Candida Biofilms: Threats, Challenges, and Promising
Strategies. Front Med Lausanne 5, 28 (2018).
148. Hickman, M. A. et al. The ‘obligate diploid’ Candida albicans forms mating-competent
haploids. Nature 494, 55–9 (2013).
149. McCusker, J. H. Saccharomyces cerevisiae: an Emerging and Model Pathogenic Fungus.
In Molecular Principles of Fungal Pathogenesis (eds. Heitman, J., Filler, S. G., Edwards, J.
E. & Mitchell, A. P.) 245–259 (ASM Press, 2006).
150. Goldstein, A. L. & McCusker, J. H. Development of Saccharomyces cerevisiae as a
model pathogen. A system for the genetic identification of gene products required for
survival in the mammalian host environment. Genetics 159, 499–513 (2001).
151. Strope, P. K. et al. The 100-genomes strains, an S. cerevisiae resource that illuminates its
natural phenotypic and genotypic variation and emergence as an opportunistic pathogen.
Genome Res 25, 762–74 (2015).
152. Ludlow, C. L. et al. Independent Origins of Yeast Associated with Coffee and Cacao
Fermentation. Curr. Biol. CB 26, 965–971 (2016).
153. Fay, J. C. et al. A polyploid admixed origin of beer yeasts derived from European and
Asian wine populations. PLOS Biol. 17, e3000147 (2019).
154. Gallone, B. et al. Domestication and Divergence of Saccharomyces cerevisiae Beer
Yeasts. Cell 166, 1397-1410.e16 (2016).
155. Peter, J. et al. Genome evolution across 1,011 Saccharomyces cerevisiae isolates. Nature
556, 339–344 (2018).
156. Khatri, I., Tomar, R., Ganesan, K., Prasad, G. S. & Subramanian, S. Complete genome
sequence and comparative genomics of the probiotic yeast Saccharomyces boulardii. Sci.
Rep. 7, 371 (2017).
157. Cannon, P. F. & Kirk, P. M. Fungal Families of the World. (CABI, 2007).
158. Liti, G. et al. Population genomics of domestic and wild yeasts. Nature 458, 337–41
(2009).
159. Phadke, S. S. et al. Genome-Wide Screen for Saccharomyces cerevisiae Genes
Contributing to Opportunistic Pathogenicity in an Invertebrate Model Host. G3 Bethesda 8,
63–78 (2018).
160. Matsui, T. et al. The interplay of additivity, dominance, and epistasis in a diploid yeast
cross. bioRxiv (2021) doi:10.1101/2021.07.20.453124.
161. Ehrenreich, I. M., Gerke, J. P. & Kruglyak, L. Genetic Dissection of Complex Traits in
Yeast: Insights from Studies of Gene Expression and Other Phenotypes in the BY×RM
Cross. Cold Spring Harb. Symp. Quant. Biol. 74, 145–153 (2009).
162. Rockman, M. V. Reverse engineering the genotype–phenotype map with natural genetic
variation. Nature 456, 738–744 (2008).
163. Levy, S. F. et al. Quantitative evolutionary dynamics using high-resolution lineage
tracking. Nature 519, 181–186 (2015).
164. Chen, P. & Zhang, J. Antagonistic pleiotropy conceals molecular adaptations in changing
environments. Nat Ecol Evol 4, 461–469 (2020).
165. Qian, W., Ma, D., Xiao, C., Wang, Z. & Zhang, J. The genomic landscape and
evolutionary resolution of antagonistic pleiotropy in yeast. Cell Rep 2, 1399–410 (2012).
166. Wei, X. & Zhang, J. Environment-dependent pleiotropic effects of mutations on the
maximum growth rate r and carrying capacity K of population growth. PloS Biol 17,
e3000121 (2019).
167. Lo, W. S. & Dranginis, A. M. FLO11, a yeast gene related to the STA genes, encodes a
novel cell surface flocculin. J Bacteriol 178, 7144–51 (1996).
168. Bayly, J. C., Douglas, L. M., Pretorius, I. S., Bauer, F. F. & Dranginis, A. M.
Characteristics of Flo11-dependent flocculation in Saccharomyces cerevisiae. FEMS Yeast
Res 5, 1151–6 (2005).
169. Liu, H., Styles, C. A. & Fink, G. R. Saccharomyces cerevisiae S288C has a mutation in
FLO8, a gene required for filamentous growth. Genetics 144, 967–78 (1996).
170. Profaci, C. P., Munji, R. N., Pulido, R. S. & Daneman, R. The blood-brain barrier in
health and disease: Important unanswered questions. J Exp Med 217, (2020).
171. Daneman, R. & Prat, A. The Blood–Brain Barrier. Cold Spring Harb. Perspect. Biol. 7,
a020412 (2015).
172. Mruk, D. D. & Cheng, C. Y. The Mammalian Blood-Testis Barrier: Its Biology and
Regulation. Endocr. Rev. 36, 564–591 (2015).
173. Cherry, J. M. et al. Saccharomyces Genome Database: the genomics resource of budding
yeast. Nucleic Acids Res 40, D700-5 (2012).
174. Steinmetz, L. M. et al. Dissecting the architecture of a quantitative trait locus in yeast.
Nature 416, 326–30 (2002).
175. Demogines, A., Smith, E., Kruglyak, L. & Alani, E. Identification and Dissection of a
Complex DNA Repair Sensitivity Phenotype in Baker’s Yeast. PLOS Genet. 4, e1000123
(2008).
176. Dimitrov, L. N., Brem, R. B., Kruglyak, L. & Gottschling, D. E. Polymorphisms in
multiple genes contribute to the spontaneous mitochondrial genome instability of
Saccharomyces cerevisiae S288C strains. Genetics 183, 365–83 (2009).
177. Lewis, J. A., Broman, A. T., Will, J. & Gasch, A. P. Genetic architecture of ethanol-
responsive transcriptome variation in Saccharomyces cerevisiae strains. Genetics 198, 369–
382 (2014).
178. Ehrenreich, I. M. et al. Genetic Architecture of Highly Complex Chemical Resistance
Traits across Four Yeast Strains. PLOS Genet. 8, e1002570 (2012).
179. Bloom, J. S. et al. Rare variants contribute disproportionately to quantitative trait
variation in yeast. eLife 8, e49212 (2019).
180. The Columbia Electronic Encyclopedia. (Columbia University Press, 2000).
181. Song, Y. et al. ADH1 promotes Candida albicans pathogenicity by stimulating oxidative
phosphorylation. Int. J. Med. Microbiol. IJMM 309, 151330 (2019).
182. Son, Y.-E. et al. Pbp1-Interacting Protein Mkt1 Regulates Virulence and Sexual
Reproduction in Cryptococcus neoformans. Front. Cell. Infect. Microbiol. 9, 355 (2019).
183. Jiang, L. & Pan, H. Functions of CaPhm7 in the regulation of ion homeostasis, drug
tolerance, filamentation and virulence in Candida albicans. BMC Microbiol. 18, 49 (2018).
184. Bercusson, A., de Boer, L. & Armstrong-James, D. Endosomal sensing of fungi: current
understanding and emerging concepts. Med. Mycol. 55, 10–15 (2017).
185. Huang, J. et al. The mitochondrial thiamine pyrophosphate transporter TptA promotes
adaptation to low iron conditions and virulence in fungal pathogen Aspergillus fumigatus.
Virulence 10, 234–247 (2019).
186. Uppin, M. S. et al. Fungal infections as a contributing cause of death: An autopsy study.
Indian J. Pathol. Microbiol. 54, 344 (2011).
187. Himmelmann, L. Package ‘HMM’. (2015).
188. Churchill, G. A. & Doerge, R. W. Empirical threshold values for quantitative trait
mapping. Genetics 138, 963–71 (1994).
Appendix A:
Genetic basis of a spontaneous mutation’s expressivity.
This work was performed in collaboration with Rachel Schell, Joseph J. Hale, Takeshi Matsui,
Ryan Foree, and Ian Ehrenreich. It appears as a preprint on bioRxiv.org.
doi: https://doi.org/10.1101/2020.04.03.024547
A.1. Summary
Genetic background often influences the phenotypic consequences of mutations, resulting
in variable expressivity. How standing genetic variants collectively cause this phenomenon is not
fully understood. Here, we comprehensively identify loci in a budding yeast cross that impact the
growth of individuals carrying a spontaneous missense mutation in the nuclear-encoded
mitochondrial ribosomal gene MRP20. Initial results suggested that a single large effect locus
influences the mutation’s expressivity, with one allele causing inviability in mutants. However,
further experiments revealed this simplicity was an illusion. In fact, many additional loci shape the
mutation’s expressivity, collectively leading to a wide spectrum of mutational responses. These
results exemplify how complex combinations of alleles can produce a diversity of qualitative and
quantitative responses to the same mutation.
A.2. Introduction
Mutations frequently exhibit different effects in genetically distinct individuals (or
‘background effects’) (1–3). For example, not all people with the same disease-causing mutations
manifest the associated disorder or exhibit identical symptoms. A commonly observed form of
background effect among individuals carrying the same mutation is different degrees of response
to that mutation (or ‘variable expressivity’) (4). Variable expressivity can arise due to a myriad of
reasons, including genetic interactions (or epistasis) between a mutation and segregating loci (1),
dominance (1), stochastic noise (5), the microbiome (6), and the environment (1).
The role of epistasis in expressivity has proven especially difficult to study, in part because
natural populations harbor substantial genetic diversity, which can facilitate complex genetic
interactions between segregating loci and mutations (7–20). Mapping the loci involved in these
interactions is technically challenging. However, controlled laboratory crosses provide a powerful
tool for identifying the loci that interact with particular mutations, giving rise to background effects
(7, 12, 17–19, 21).
In this paper, we use a series of controlled crosses in the budding yeast Saccharomyces
cerevisiae to comprehensively characterize the genetic basis of a mutation’s expressivity. We
focus on a missense mutation in MRP20, an essential nuclear-encoded subunit of the mitochondrial
ribosome (22). This mutation occurred by chance in a cross between the reference strain BY4716
(‘BY’) and a clinical isolate 322134S (‘3S’), and was found to show variable expressivity among
BYx3S cross progeny. This presented an opportunity to determine how loci segregating in the
BYx3S cross individually and collectively influence this mutation’s expressivity.
A.3. Results
A.3.1. A spontaneous mutation increases phenotypic variation in the
BYx3S cross
In the BY/3S diploid progenitor of haploid BYx3S segregants, a spontaneous mutation
occurred in a core domain of Mrp20 that is conserved from bacteria to humans (Figure
A.1A, Figure SA.1) (22, 23). This mutation resulted in an alanine to glutamate substitution at
amino acid 105 (mrp20-105E) and showed variable expressivity among segregants carrying it.
Specifically, segregants with this mutation showed increased phenotypic variance relative to wild
type segregants when ethanol was provided as the carbon source, the condition used hereafter
(Figure A.1B; Levene’s test, p = 5.9 × 10^-22). Mutant segregants exhibited levels of growth ranging
from inviable to wild type, and fit a bimodal distribution that centered on 10% and 57% growth
relative to the haploid BY parent strain (bimodal fit log likelihood = 30; Figure SA.2).
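To illustrate the variance comparison described above, the following minimal Python sketch computes the Levene statistic W for two groups of growth measurements. It is not the analysis code used in this work: the function name, the choice of the group median as the center, and the example values are all assumptions made for illustration. A p-value would be obtained by comparing W to an F distribution with the returned degrees of freedom.

```python
from statistics import mean, median


def levene_statistic(groups, center=median):
    """Return (W, df_between, df_within) for Levene's test of equal variances.

    `center` is the function used to center each group (median here,
    which is an assumption; the mean gives the original Levene form).
    """
    # Absolute deviation of every observation from its own group's center.
    deviations = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(deviations)
    n_total = sum(len(d) for d in deviations)
    group_means = [mean(d) for d in deviations]
    grand_mean = sum(sum(d) for d in deviations) / n_total
    # One-way ANOVA on the deviations: between-group vs. within-group spread.
    between = sum(len(d) * (m - grand_mean) ** 2
                  for d, m in zip(deviations, group_means))
    within = sum((z - m) ** 2
                 for d, m in zip(deviations, group_means) for z in d)
    w = (n_total - k) / (k - 1) * between / within
    return w, k - 1, n_total - k


# Hypothetical relative-growth values, not data from this study.
wild_type = [0.95, 1.00, 1.02, 0.98, 1.05, 0.97]
mutant = [0.05, 0.12, 0.55, 0.60, 0.10, 0.58]
w, df1, df2 = levene_statistic([wild_type, mutant])
print(f"W = {w:.2f} on ({df1}, {df2}) degrees of freedom")
```

A large W indicates that the spread of growth values differs between the two groups, which is the pattern reported for mutant versus wild type segregants.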
Figure A.1. The mrp20-105E mutation occurred spontaneously, increasing phenotypic variance in
the BYx3S cross. (A) A spontaneous mutation in a BY/3S diploid gave rise to a BYx3S segregant
population in which mrp20-105E segregated. (B) The mrp20-105E segregants exhibited increased
phenotypic variance and a bimodal distribution of phenotypes. Throughout the paper, blue and orange are
used to denote BY and 3S genetic material, respectively. All growth data presented in the paper are
measurements of colonies on agar plates containing rich medium with ethanol as the carbon source.
A.3.2. A large effect locus shows epistasis with mrp20-105E
Loci contributing to this variable expressivity should be detectable through their genetic
interactions with MRP20. To find such loci, we performed linkage scans for two-way epistasis
with MRP20. We identified a single locus on Chromosome XIV (ANOVA, interaction term p =
4.3 × 10^−16; Figure A.2A). Individuals with XIV^BY showed reduced growth among
both MRP20 and mrp20-105E segregants, but to a greater degree among the latter (Figure A.2B).
The Chromosome XIV locus explained 79% of the phenotypic variance among mrp20-105E
segregants (ANOVA, p = 3.2 × 10^−31) and accounted for all observed cases of inviability
(Figure A.2B).
To further resolve the Chromosome XIV locus, we crossed an mrp20-105E XIV^BY
F2 segregant and an mrp20-105E XIV^3S F2 segregant (Supplementary Text SA.2; Figure
A.2C). 361 F3 progeny were genotyped by low-coverage whole genome sequencing and
phenotyped for growth. Linkage mapping with these data reidentified the Chromosome XIV locus
at a p-value of 2.50 × 10^−43 (ANOVA; Figure A.2D, Figure SA.3) and resolved it to a single SNP
in the coding region of MKT1 (Figure A.2E). This SNP, which encodes a glycine in BY and a
serine in 3S at amino acid 30, was then validated by nucleotide replacement in mrp20-
105E segregants (Figure A.2F). Notably, this specific SNP was previously shown to play a role
in mitochondrial genome stability (24), suggesting epistasis between MRP20 and MKT1 involves
mitochondrial dysfunction, impairing growth on non-fermentative carbon sources such as ethanol.
Figure A.2. Epistasis between MRP20 and MKT1 appears to mostly explain response to mrp20-105E.
(A) Linkage mapping in the BYx3S segregants shown in Figure A.1 identified a locus on Chromosome XIV that
exhibits a two-way genetic interaction with MRP20. (B) The Chromosome XIV locus had effects in both
MRP20 and mrp20-105E segregants but had a greater effect among mrp20-105E segregants. (C) To identify
the causal gene, we crossed two mrp20-105E F2 segregants that differed at the Chromosome XIV locus
and gathered a panel of F3 segregants. (D) Linkage mapping in the F3 segregants identified the
Chromosome XIV locus at high resolution, with a peak at position 467,219. Tick marks denote every
100,000 bases along the chromosome. (E) Recombination breakpoints in the F3 segregants delimited the
Chromosome XIV locus to a single SNP in MKT1 at position 467,219. Vertical dashed line highlights the
delimited causal polymorphism, while small vertical lines along the x-axis indicate different SNPs in the
window that is shown. (F) Engineering the BY allele into mrp20-105E XIV^3S segregants changed growth
(left), while substitutions at the nearest upstream and downstream variants did not (right).
A.3.3. Epistasis between MRP20 and MKT1 differs in cross parents and
segregants
We attempted to validate the epistasis between MRP20 and MKT1 by introducing all four
possible combinations of the causal nucleotides at these two genes into haploid versions of both
BY and 3S (Figure A.3A). The mrp20-105E mutation affected growth in both parent strains
(ANOVA, p = 4.3 × 10^−24 and p = 4.0 × 10^−4). However, the magnitude of the effect differed
between the two: mrp20-105E caused inviability in BY but had a more modest effect in 3S. In
addition, MKT1 influenced response to mrp20-105E in the 3S background (ANOVA, p = 0.01) but
not in the BY background (ANOVA, p = 0.99).
The phenotypic consequence of epistasis between MRP20 and MKT1 differed between
parent and segregant strains. Specifically, the phenotypes of BY mrp20-105E MKT1^3S, 3S
mrp20-105E MKT1^BY, and 3S mrp20-105E MKT1^3S all differed from the expectations established by
BYx3S mrp20-105E segregants. These departures from expectation imply that additional
unidentified loci also influence response to mrp20-105E.
Figure A.3. Additional loci govern response to the mutation. (A) We engineered all combinations of
MRP20 and MKT1 into the BY and 3S cross parents. Expected phenotypes are shown as shaded boxes
denoting 95% confidence interval based on the originally obtained segregant phenotypes. (B) We generated
BY x 3S crosses in which all segregants carried mrp20-105E. Two crosses were performed: one in which
all segregants carried MKT1^BY and one in which all segregants carried MKT1^3S. Tetrads were dissected
and spores were phenotyped for growth on ethanol. (C) Each of the new crosses showed increased growth
that extended from inviable to wild type, differing from the more qualitative bimodal phenotypes seen
among the original mrp20-105E MKT1 segregant populations. (D) Linkage mapping identified a total of
16 loci that influenced growth. After four iterations of a forward regression, no additional loci were
identified. (E) Inviable segregants were present among all mrp20-105E MKT1 SAL1 genotype classes. (F)
Aneuploid individuals with duplicated Chromosome II showed reduced growth. Aneuploid individuals
were not evenly detected across the different MKT1-SAL1 genotype classes.
A.3.4. Fixation of mrp20-105E and MKT1 genotypes increases phenotypic
variance
To enable the identification of other loci underlying response to mrp20-105E, we generated
two new BYx3S crosses (Figure A.3B). In both crosses, the BY and 3S parents were engineered
to carry mrp20-105E. Furthermore, one cross was engineered so that both parents
carried MKT1^BY and the other cross was engineered so that both parents carried MKT1^3S. By
altering the parent strains in this manner, we increased the chance of detecting additional loci
contributing to the variable expressivity of mrp20-105E. From these engineered crosses, 749 total
segregants were obtained through tetrad dissection, genotyped by low-coverage genome
sequencing, and phenotyped for growth on ethanol.
The new crosses exhibited continuous ranges of phenotypes, in contrast to the bimodal
phenotypic distribution observed in the original segregants (Figure A.3C). In both
the MKT1^BY and MKT1^3S crosses, mrp20-105E segregants ranged from inviable to nearly wild
type. The distributions of phenotypes in the two crosses differed in a manner consistent with
their MKT1 alleles, with the mean of the MKT1^BY segregants lower than that of the MKT1^3S
segregants (t-test, p = 4.8 × 10^−34). These data show that regardless of the MKT1 allele present, additional loci
can cause mrp20-105E to show phenotypic effects ranging from lethal to benign.
A.3.5. Many additional loci affect the expressivity of mrp20-105E
We used the new crosses to map other loci contributing to response to mrp20-105E.
Excluding MKT1, which explained 18% of the phenotypic variance in the new crosses, linkage
mapping identified 16 new loci (Figure A.3D, Figure SA.4). We found no evidence for genetic
interactions among the loci (pairs and trios examined with fixed effects linear models, Bonferroni
threshold).
Of the new loci, the BY allele was inferior at 10 and superior at six. These loci individually
explained between 0.79% and 14% of the phenotypic variance in the new crosses. 13 of these loci
resided on a subset of chromosomes but were distantly linked: four on Chromosome XII, three
on XIII, two on XIV, and four on XV. The three remaining loci were detected on Chromosomes
IV, VII, and XI.
Recombination breakpoints delimited the loci to small genomic intervals spanning one (12
loci), two (3 loci) or three (1 locus) genes. These candidate genes functioned in many
compartments of the cell and implicated a diversity of cellular pathways and processes in the
expressivity of mrp20-105E. Thus, the molecular basis of mrp20-105E’s expressivity is complex.
A.3.6. The chromosome XIV locus contains multiple causal variants
Among the newly detected loci, the largest effect (14% phenotypic variance explained)
was on Chromosome XIV. The position of maximal significance at this site was located two genes
away from the end of MKT1, with a 99% confidence interval that did not encompass the causal
variant in MKT1. Thus, the originally identified large effect Chromosome XIV locus in fact
represents multiple distinct closely linked nucleotides that both genetically interact
with MRP20 and occur in different genes (Figure A.3E).
The new locus on Chromosome XIV was delimited to two genes, one of which was SAL1,
encoding a mitochondrial ADP/ATP transporter that physically interacts with Mrp20. A SNP
in SAL1 that segregates in this cross was previously linked to increased mitochondrial genome
instability in BY (24), suggesting it is likely also causal in our study. For this reason, we refer to
this additional Chromosome XIV locus as ‘SAL1’. We found no evidence for epistasis
between MKT1 and SAL1 (ANOVA, p = 0.77).
Although the MKT1-SAL1 locus had a large effect, it explained a minority of the
phenotypic variance among mrp20-105E segregants in a model including all detected loci (32%
for MKT1-SAL1 vs. 36% for all other loci collectively). Thus, by enabling MKT1 and SAL1 to
segregate independently through genetic engineering and examining a large number of mrp20-
105E segregants with different MKT1-SAL1 genotypes, we observed a greater diversity of
mutational responses than was originally seen and detected many additional loci.
A.3.7. Aneuploidy also contributes to the expressivity of mrp20-105E
Despite the fact that the identified loci explain most of mrp20-105E’s expressivity, some
individuals exhibited unexpectedly poor growth (Figure A.3F). This finding led to the
identification of a Chromosome II duplication that reduced growth (ANOVA, p = 1.2 × 10^−48). The
aneuploidy was common among mrp20-105E segregants, with a higher prevalence
when MKT1^3S was also present (Fisher’s exact test, p = 1.5 × 10^−43). The Chromosome II
aneuploidy was not seen among wild type segregants. These data suggest that mrp20-
105E increases the rate of aneuploidization and that genetic variation in MKT1 influences the
degree to which mrp20-105E segregants duplicate Chromosome II. The aneuploidy’s contribution
to phenotypic variation was relatively minor, explaining 5% of phenotypic variance among mrp20-
105E segregants in a model also including all identified loci.
A.3.8. Multiple mechanisms underlie poor growth in the presence of
mrp20-105E
Evidence suggests mitochondrial genome instability contributes to the variable
expressivity of mrp20-105E. First, mitochondrial genome instability is known to cause poor
growth on non-fermentative carbon sources, such as ethanol (25,26). Second, the exact variants
that segregate in our cross at MKT1 and SAL1 were previously linked to mitochondrial genome
instability (24). Third, both Mrp20 and Sal1 function in the mitochondria (22, 27). Fourth, two
other candidate genes in the newly detected loci encode proteins that function in the mitochondria.
To determine the role of mitochondrial genome instability in the variable expressivity
of mrp20-105E, we measured petite formation, a proxy for spontaneous mitochondrial genome
loss (Figure A.4) (28). In addition to MRP20 and mrp20-105E BY and 3S parent strains,
16 MRP20 segregants and 42 mrp20-105E segregants were examined. Despite causing reduced
growth in both parents, mrp20-105E only led to elevated mitochondrial genome instability in BY
(t-test p = 0.013 in BY and p = 0.39 in 3S; Figure A.4A). Also, although mrp20-105E segregants
exhibited increased mitochondrial genome instability relative to MRP20 segregants (Wilcoxon
rank-sum test p = 0.023), especially at lower levels of growth, a subset of inviable segregants did
not show elevated petite formation (Figure A.4B and C). These results suggest that mitochondrial
genome instability explains part, but not all, of response to mrp20-105E.
Figure A.4. Mitochondrial genome instability partially underlies the expressivity of
mrp20-105E. We measured petite formation frequency, which estimates the proportion of cells
within a clonal population incapable of respiratory growth. Higher petite frequency is a proxy for
greater mitochondrial genome instability. (A) We examined MRP20 and mrp20-105E versions of
the BY and 3S parent strains. For each, average values and 95% bootstrapped confidence intervals
are shown. BY showed elevated mitochondrial genome instability in the presence of mrp20-105E,
while 3S showed no change. (B) We examined 16 BYx3S MRP20 segregants. These segregants
were randomly selected and spanned the range of growth values for MRP20 segregants. (C) We
examined 45 BYx3S mrp20-105E segregants. Poorer growing segregants tended to exhibit higher mitochondrial
genome instability, though some exhibited wild type levels of mitochondrial genome instability.
The gray dashed line indicates the threshold used to call inviability.
A.3.9. Genetic underpinnings of mrp20-105E’s expressivity in segregants
and parents
We determined the extent to which our identified loci explained phenotypic variability
among mutants. Modeling growth as a function of all identified loci and the aneuploidy accounted
for the majority (78%) of the broad-sense heritability among mrp20-105E segregants (ANOVA, p
= 5.2 × 10^−188). Further, phenotypic predictions for segregants based on their genotypes were
strongly correlated with their observed phenotypes (r = 0.85, p = 4.4 × 10^−209; Figure A.5). These
results show that the variable expressivity of mrp20-105E is driven by many loci that collectively
produce a spectrum of mutational responses.
Figure A.5. Detected loci quantitatively and qualitatively explain mutant phenotypes. (A) We fit a
linear model accounting for the effects of all detected loci and the aneuploidy on the growth of mrp20-105E
segregants. This model not only explained the growth of the new BY x 3S mrp20-105E crosses generated
in this paper, but also accurately predicted the phenotypes of the mutant parents and previously generated
segregants. (B) We examined growth relative to the sum of detrimental alleles carried by a segregant. This
relationship shows how collections of loci produce a quantitative spectrum of phenotypes, including
instances of qualitative phenotypic responses. This relationship explains the full range of responses, from
inviable to wild type growth, across MKT1-SAL1 genotypes. The gray dashed line indicates the threshold
used to call inviability.
Confirming this point, the model was also effective for other genotypes that were not
present in the new crosses, but had been generated throughout the course of this work. For instance,
the model accurately predicted the phenotypes of the original mrp20-105E segregant population (r
= 0.90, p = 1.6 × 10^−39), as well as the phenotypes of cross parents engineered to carry
mrp20-105E (Figure A.5A). Moreover, the model explained both qualitative and quantitative variation
within and between the two Chromosome XIV classes that were originally seen among mrp20-
105E segregants.
Finally, we examined how diverse combinations of loci collectively produced similar
phenotypic responses to mrp20-105E. We examined the relationship between growth and the total
number of detrimental alleles carried by mrp20-105E segregants, keeping track of each
individual’s genotype at MKT1 and SAL1, the largest effect loci (Figure A.5B). The number of
detrimental alleles carried by a segregant showed a strong negative relationship with growth,
which was not observed in wild type segregants (Figure SA.5). Further, regardless of genotype
at MKT1 and SAL1, the effect of mrp20-105E ranged from lethal to benign in a manner dependent
on the number of detrimental alleles present at other loci. These findings demonstrate that many
segregating loci beyond the large effect MKT1-SAL1 locus influence the expressivity of mrp20-
105E and enable different genotypes in the cross to exhibit a broad range of responses to the
mutation.
A.4. Discussion
We have provided a detailed genetic characterization of the expressivity of a spontaneous
mutation. Response to this mutation in a budding yeast cross is influenced by at least 18 genetic
factors in total, with the largest effect due to two closely linked variants. However, at least 15
additional loci segregate and jointly exert larger effects than the largest two. Different
combinations of alleles across these loci produce a continuous spectrum of mutational responses.
Due to tight linkage between MKT1 and SAL1 in the original cross parents, the full extent of this
continuum was not originally observed, leading to an initial understanding of the expressivity of
the mrp20-105E mutation that was simplistic.
These findings also show how quantitative variation in mutational response can produce
seemingly discrete outcomes. In part, whether responses appear qualitative depends on the
configuration of mutationally responsive alleles in examined mutants. Approaches such as
crossing of genetically engineered strains can be used to disrupt these configurations that mask the
full extent of variation. However, another part of this expressivity is the tolerance of a system to
quantitative variation in key processes, for example mitochondrial genome stability in the case
of mrp20-105E. Our data suggest that these processes can only tolerate quantitative variation to a
point, but also indicate that lethality to the same mutation may arise in different genetic
backgrounds due to impairment of distinct cellular processes.
Our results inform efforts to understand expressivity in other systems, including humans.
For example, there is interest in determining why people who carry highly penetrant alleles known
to cause disease do not develop pathological conditions (3, 29, 30). Such resilience, as observed
here, may involve numerous loci. This speaks to the complicated and unexpected epistasis that can
arise between mutations and segregating loci in genetically diverse populations (7–20). It also
illustrates the importance of characterizing epistasis (31–41), including background effects, as
these forms of genetic interactions are immediately relevant to evolution and disease, and may not
emerge from studies that do not directly interrogate natural variation in genetically diverse
populations.
A.5. Materials and Methods
A.5.1. Generation of segregants
The haploid BYx3S segregants in which mrp20-105E was identified were the hos3Δ
F2 segregants generated and described in Mullis et al (17). In brief, a BY MATa can1Δ::STE2pr-
SpHIS5 his3Δ strain was mated to a 3S MATα ho::HphMX his3Δ::NatMX strain to generate a wild
type BY/3S diploid. PCR-mediated, targeted gene disruption was then used to produce a
BY/3S HOS3/hos3Δ::KanMX strain. Both the wild type and hemizygous deletion strains were
sporulated, and random BYx3S MATa spores were obtained from each using the magic marker
system with plating on His- plates containing canavanine (42). Following discovery of the
mrp20-105E mutation, we performed tetrad dissection of this diploid to obtain mrp20-105E segregants in
both HOS3 and hos3Δ genetic backgrounds.
To produce haploid mrp20-105E F3 segregants, we deleted URA3 from a BYx3S F2 MATa
can1Δ::STE2pr-SpHIS5 his3Δ hos3Δ::KanMX mrp20-105E XIV^3S segregant. We then mating type
switched the strain by transforming it with a URA3 plasmid containing an
inducible HO endonuclease, inducing HO, and plating single cells. The mating type-switched
BYx3S F2 MATα can1Δ::STE2pr-SpHIS5 his3Δ ura3Δ hos3Δ::KanMX mrp20-105E
XIV^3S segregant was then mated to a BYx3S F2 MATa can1Δ::STE2pr-SpHIS5
his3Δ hos3Δ::KanMX mrp20-105E XIV^BY segregant. The resulting diploid was sporulated and
random segregants were obtained by plating on His- media.
To obtain additional haploid mrp20-105E MKT1^BY and MKT1^3S F2 segregants, we
engineered mrp20-105E, as well as the 3S and BY causal variants at MKT1 position 467,219 into
BY and 3S, respectively. BY mrp20-105E was independently mated to 3S mrp20-105E
MKT1^BY twice. Two resultant diploids were sporulated and tetrads were dissected to obtain
BYx3S mrp20-105E MKT1^BY haploid segregants. The same process was followed with BY
mrp20-105E MKT1^3S and 3S mrp20-105E strains to obtain BYx3S mrp20-105E MKT1^3S haploid
segregants.
A.5.2. Genotyping
F2 segregants shown in Figure A.1 and Figure A.2 A and B were previously genotyped in
Mullis et al. using the same techniques described below (17). In this paper, F3 segregants and all
remaining F2 segregants shown in Figures A.1 and A.3-A.5 were genotyped by low coverage
whole genome sequencing. Freezer stocks of strains were inoculated into liquid overnight cultures
and grown to stationary phase at 30°C. DNA was extracted using Qiagen 96-well DNeasy kits
(Qiagen P/N 69581). Sequencing libraries were prepared using the Illumina Nextera Kit and
custom barcoded adapter sequences. Segregants from each respective cross (361 F3s and 872 F2s)
were pooled in equimolar fractions into three separate multiplexes, run on a gel, size selected, and
purified with the Qiagen Gel Extraction Kit. F2 and F3 segregants were sequenced by Novogene
on Illumina HiSeq 4000 lanes using 150 bp x 150 bp paired-end reads.
Sequencing reads were mapped against the S288C genome (version
S288C_reference_sequence_R64-2-1_20150113.fsa from the Saccharomyces Genome
Database https://www.yeastgenome.org) using BWA version 0.7.7-r44 (43). Samtools v1.9
was then used to create a pileup file for each segregant (44). For both BWA and Samtools, default
settings were employed. Base calls and coverages were gathered for 44,429 SNPs that segregate
in the cross (14). Low coverage individuals (<0.7x average per site coverage) were removed from
analyses. Diploid and contaminated individuals were identified by abnormal patterns of
heterozygosity or sequencing coverage, and were also excluded. For each segregant, a raw
genotype vector was determined by the percent of calls at each site for the 3S allele. We then used
a Hidden Markov Model (HMM) implemented in the ‘HMM’ package v 1.0 in R to correct each
raw genotype vector using the following probability matrices (45): transitionProbability =
matrix(c(.9999,.0001,.0001,.9999),2) and emissionProbability =
matrix(c(0.25,0.75,0.75,0.25),2).
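The correction step can be illustrated with a small Viterbi decoder. This is a sketch in Python, not the R ‘HMM’ package code used for the analyses. It assumes that the raw vector has already been reduced to one call ("BY" or "3S") per SNP, that a state emits its own allele with probability 0.75, and that both states are equally likely at the first site; none of these details is stated in the text, and the function name is hypothetical.

```python
import math

STATES = ("BY", "3S")
STAY, SWITCH = 0.9999, 0.0001   # transition probabilities quoted in the text
MATCH, MISMATCH = 0.75, 0.25    # emission probabilities; state/symbol pairing is assumed


def viterbi(observed):
    """Return the most likely state path for a list of raw calls ('BY' or '3S')."""
    if not observed:
        return []

    def emit(state, call):
        return math.log(MATCH if state == call else MISMATCH)

    def trans(a, b):
        return math.log(STAY if a == b else SWITCH)

    # Uniform start probabilities (assumption).
    score = {s: math.log(0.5) + emit(s, observed[0]) for s in STATES}
    back = []
    for call in observed[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best previous state for arriving in state s at this site.
            prev = max(STATES, key=lambda p: score[p] + trans(p, s))
            new_score[s] = score[prev] + trans(prev, s) + emit(s, call)
            pointers[s] = prev
        score = new_score
        back.append(pointers)

    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]
```

With these probabilities, an isolated discordant call inside a long run of the other allele is smoothed away, while a sustained change in calls is retained as a recombination breakpoint.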
Aneuploidies were identified based on elevated sequencing coverage at particular
chromosomes within each individual sample. This identified a chromosome II duplication event
in a subset of BYx3S mrp20-105E MKT1^BY and BYx3S mrp20-105E MKT1^3S segregants. The
BY mrp20-105E MKT1^3S x 3S mrp20-105E cross had the highest prevalence (50%), and thus
individuals from this cross were further examined. We employed the normalmixEM() function
from the mixtools library in R (46) to determine that coverage on Chr II was bimodal and centered
on 0.98 and 1.8 (log likelihood of 237). Posterior probabilities were used to call aneuploid
individuals that had an average per site coverage of 1.5x or greater. This threshold was also
applied to other crosses to identify aneuploid individuals.
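The 1.5x threshold can be applied as sketched below. This is a Python illustration only. The text does not say how coverage was normalized, so the sketch assumes that Chromosome II coverage is expressed relative to the mean coverage of the other chromosomes in the same sample; the function names and input layout are hypothetical.

```python
from statistics import mean


def relative_coverage(per_chrom_cov, chrom="II"):
    """Coverage on `chrom` divided by the mean coverage of all other chromosomes.

    per_chrom_cov maps chromosome name to average per site coverage
    for one segregant (assumed input layout).
    """
    others = [cov for name, cov in per_chrom_cov.items() if name != chrom]
    return per_chrom_cov[chrom] / mean(others)


def is_aneuploid(per_chrom_cov, chrom="II", threshold=1.5):
    """Call a duplication when relative coverage reaches the 1.5x threshold."""
    return relative_coverage(per_chrom_cov, chrom) >= threshold
```

A sample near the lower mixture component (about 0.98) is called euploid, and one near the upper component (about 1.8) is called aneuploid.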
A.5.3. Phenotyping
Segregants were inoculated into rich media containing glucose (‘YPD’), which was
composed of 1% yeast extract (BD P/N 212750), 2% peptone (BD P/N: 211677), and 2% dextrose
(BD P/N 15530). Cultures were grown to stationary phase (two days at 30°C). Strains were then
pinned onto YP + 2% agar (BD P/N 214050) rich media containing ethanol (‘YPE’). The YPE
recipe was 1% yeast extract (BD P/N 212750), 2% peptone (BD P/N: 211677), and 2% ethanol
(Koptec P/N A06141602W). Plates were then grown at 30°C for two days. Growth assays were
conducted in a minimum of three replicates across three plates. On each plate, a BY control was
included. Plates were imaged with the BioRAD Gel Doc XR+ Molecular Imager at a standard size
of 11.4 × 8.52 cm^2 (width x length) and imaged with epi-illumination using an exposure time of
0.5 seconds. Images were saved as 600 dpi tiffs. ImageJ (http://rsbweb.nih.gov/ij/) was used to
quantify pixel intensity of each colony through the Plate Analysis JRU v1 plugin
(https://research.stowers.org/imagejplugins/zipped_plugins.html), as described in Matsui et al.
(47). Growth values were normalized against the same plate BY control, then averaged across
replicates to produce a single growth value for each segregant.
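The normalization described above amounts to a ratio per plate followed by an average. A minimal Python sketch follows; the function name and the pairing of each replicate with its same-plate BY control intensity are illustrative assumptions about data layout, not the authors' code.

```python
from statistics import mean


def normalized_growth(replicates):
    """Average growth relative to the same-plate BY control.

    replicates: list of (segregant_intensity, by_control_intensity) pairs,
    one pair per replicate plate.
    """
    return mean(seg / ctrl for seg, ctrl in replicates)
```

A value of 1.0 corresponds to growth equal to the BY control, which is how percentages such as "57% growth relative to the haploid BY parent strain" can be read.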
A.5.4. Linkage mapping
Initial linkage mapping was conducted with F2 segregants. Initial discovery of the
spontaneous mrp20-105E mutation resulted from linkage mapping with 385 F2 segregants (164
wild type and 221 hos3Δ) from Mullis et al. (17). We employed the linear
model growth ~ hos3Δ + locus + hos3Δxlocus + error, from which the hos3Δxlocus interaction
term was used to identify loci that differentially explained growth in hos3Δ segregants.
Examination of the hos3Δxlocus interaction term led to discovery of the spontaneous
mrp20-105E mutation on the MRP20^BY allele present in hos3Δ segregants. Following discovery
of mrp20-105E, we used the fixed effects linear model growth ~ MRP20 + locus + MRP20 x locus
+ error using only hos3Δ individuals from Mullis et al. (17). From this scan, we examined
the MRP20 x locus interaction term. 361 mrp20-105E F3 segregants were used to better resolve
the Chromosome XIV locus. We employed the model growth ~ locus + error and examined
the locus term. We examined the minimum observed test on chromosome XIV to delimit that
locus.
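For a two-allele mutation and a two-allele locus, the interaction term in growth ~ MRP20 + locus + MRP20 x locus has a simple reading. With 0/1 coding and all four genotype classes present, its least-squares estimate equals the locus effect among mutants minus the locus effect among wild type segregants. The Python sketch below computes that contrast from class means; it illustrates the quantity being tested and is not the R lm() code used for the scans. The function name and record layout are hypothetical.

```python
from statistics import mean


def interaction_contrast(records):
    """Difference of differences for a 2 x 2 genotype design.

    records: iterable of (is_mutant, locus_is_BY, growth) with 0/1 flags.
    Requires at least one individual in each of the four classes.
    """
    cells = {}
    for mutant, locus, growth in records:
        cells.setdefault((mutant, locus), []).append(growth)
    m = {key: mean(values) for key, values in cells.items()}
    # Locus effect among mutants minus locus effect among wild type.
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])
```

A contrast of zero means the locus has the same effect in both MRP20 classes; a negative value, as with the Chromosome XIV locus, means the BY allele is more detrimental among mrp20-105E segregants.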
To find loci affecting growth in the mrp20-105E background, we generated new
populations of 353 mrp20-105E MKT1^BY and 396 mrp20-105E MKT1^3S haploid segregants. The
combined 749 mrp20-105E segregants were used for linkage mapping that followed a forward
regression approach. We first obtained residuals from the linear model growth ~ MKT1 + error,
and then implemented a genome-wide scan using the model residuals ~ locus + error. We
examined the locus term and significance was determined by using 1,000 permutations with the
threshold set at the 95th quantile of observed −log10(p-values) (48). A maximum of one locus per
chromosome per scan was identified as significant. Following the identification of additional loci,
we accounted for the newly detected loci in a new model, residuals ~ locus 1 + locus 2 + … locus
n + error and obtained the residuals. These new residuals were used in another genome-wide scan
using the model residuals ~ locus + error. Permuted thresholds were calculated for each scan.
This process was repeated for a total of 5 iterations at which point no loci were detected above our
significance threshold. Chromosome II was excluded from linkage mapping due to the presence
of a chromosomal duplication in a subset of individuals. The Chromosome II duplication was
tested for significance using the model growth ~ MKT1 + ChromosomeII + error, from which the
Chromosome II term was examined.
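The permutation threshold used in each scan can be sketched as follows. This Python illustration is not the authors' R code. To stay free of dependencies it uses the squared correlation between a marker and the phenotype as the scan statistic; for single-marker models with a fixed sample size this ranks markers in the same order as −log10(p), but the threshold it returns is on the r^2 scale. The function names, the 0/1 genotype coding, and the quantile indexing rule are assumptions.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy * sxy / (sxx * syy)


def permutation_threshold(genotypes, phenotype, n_perm=1000, quantile=0.95, seed=0):
    """Quantile of the genome-wide maximum statistic under phenotype shuffling.

    genotypes: list of markers, each a list of 0/1 alleles per segregant.
    phenotype: residual growth values, one per segregant.
    """
    rng = random.Random(seed)
    shuffled = list(phenotype)
    maxima = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        maxima.append(max(r_squared(marker, shuffled) for marker in genotypes))
    maxima.sort()
    index = min(len(maxima) - 1, int(quantile * len(maxima)))
    return maxima[index]
```

In a forward regression, markers whose observed statistic exceeds this threshold would be added to the model, residuals recomputed, and the threshold re-estimated for the next scan.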
All linkage mapping was performed in R. Linear models were implemented using the lm()
function. To call peaks for each scan we required that the local minimum position within each peak
be a minimum of 150,000 bp away from any other peak. We also required peaks to be more than
20 kb from the edge of a chromosome. We report 99% confidence intervals as 2-LOD intervals
surrounding the peak position at each locus.
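The two peak-calling filters can be expressed compactly. The Python sketch below is one plausible reading rather than the authors' implementation: it treats both distances as base pairs, visits candidate positions from most to least significant, and keeps a position only if it is far enough from every peak already kept. The greedy ordering, the function name, and the input layout are assumptions.

```python
def call_peaks(scan, chrom_length, threshold, min_sep=150_000, edge=20_000):
    """Select peak positions on one chromosome.

    scan: list of (position_bp, neg_log10_p) pairs.
    Positions must exceed the significance threshold and lie more than
    `edge` bp from either chromosome end; kept peaks are at least
    `min_sep` bp apart.
    """
    candidates = [
        (score, pos)
        for pos, score in scan
        if score > threshold and edge < pos < chrom_length - edge
    ]
    peaks = []
    for score, pos in sorted(candidates, reverse=True):  # strongest first
        if all(abs(pos - kept) >= min_sep for kept in peaks):
            peaks.append(pos)
    return sorted(peaks)
```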
A.5.5. Classification of inviable segregants
Initial discovery of the MRP20 x MKT1 genetic interaction suggested that expressivity
of mrp20-105E was largely determined by variation at MKT1. Furthermore, mrp20-105E
MKT1^BY segregants exhibited very poor growth, while mrp20-105E MKT1^3S segregants showed
more tolerant, variable growth. We termed this initial mrp20-105E MKT1^BY segregant population
‘inviable’. Figures A.4 and A.5 include a gray dashed line to denote the highest growth value
observed among the original inviable segregants.
A.5.6. Delimiting loci with recombination breakpoints
For each locus examined, we split the appropriate segregants into two groups: individuals
carrying the BY allele and individuals carrying the 3S allele. Segregants’ haplotypes across the
adjacent genomic window were then examined. The causal region was determined by identifying
the SNPs fixed for BY among all BY individuals and fixed for 3S among all 3S individuals. Raw
Illumina sequencing reads were examined to confirm the delimitation of IV to MRP20 among original
F2 segregants, the delimitation of XIV to the MKT1 coding SNP at 467,219 among F3 segregants, and
the delimitation of the secondary XIV locus to SAL1 and PMS1 among the new F2 segregants.
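The breakpoint logic above reduces to a per-SNP consistency check: a SNP remains a candidate only if every segregant carries, at that SNP, the same allele as its group label. The Python sketch below illustrates the idea; the function name and the dictionary layout are hypothetical, and it assumes complete genotype calls across the window.

```python
def delimit_locus(haplotypes, groups):
    """Return indices of SNPs whose allele matches the group label in every segregant.

    haplotypes: {segregant: [allele per SNP across the window]}, alleles 'BY' or '3S'.
    groups: {segregant: 'BY' or '3S'}, the allele class assigned at the locus.
    """
    n_snps = len(next(iter(haplotypes.values())))
    return [
        i
        for i in range(n_snps)
        if all(haplotypes[seg][i] == groups[seg] for seg in haplotypes)
    ]
```

Each recombinant segregant excludes the SNPs on the far side of its breakpoint, so the surviving interval narrows as more recombinants are examined.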
A.5.7. Reciprocal hemizygosity experiments
Four hos3Δ F2 MATa segregants were used in all reciprocal hemizygosity (RH)
experiments (49): two were hos3Δ IV^BY XIV^BY and two were hos3Δ IV^3S XIV^BY. The four segregants
were first mating type switched to enable mating of these segregants to produce
homozygous IV^BY/IV^BY, homozygous IV^3S/IV^3S, or heterozygous IV^BY/IV^3S diploids. Each pairwise
mating was performed and confirmed by plating on mating type tester plates. These diploid strains
were then phenotyped on agar plates containing ethanol, which verified that IV^BY has an effect in
diploids and acts in a recessive manner. Using the haploid MATa and MATα versions of these four
segregants, we individually engineered premature stop codons into DIT1, MRP20
and PDR15 using CRISPR-mediated targeted gene disruption and lithium acetate transformations
(50). Plasmid-based CRISPR-Cas9 was employed to target the beginning of each coding region
and 20bp repair templates which contained a premature stop codon followed by 1bp deletions were
incorporated. Each sgRNA and repair template was designed so that only the first 15 (of 537), 26
(of 264), and 33 (of 1,530) amino acids would be translated for DIT1, MRP20 and PDR15,
respectively. Engineered strains were confirmed by PCR and Sanger sequencing. After
confirmation, wild type and knockout strains for each gene were then mated in particular
combinations to produce reciprocal hemizygotes that were otherwise isogenic. A minimum of two
distinct hemizygotes were generated for each allele of each gene.
A.5.8. Construction of nucleotide replacement strains
Single nucleotide replacement strains were generated for MRP20 and MKT1 using a
CRISPR/Cas9-mediated approach. For a given replacement, the appropriate strain was first
transformed with a modified version of pML104 that constitutively expresses Cas9 using LiAc
transformation (50,51). We then inserted the KanMX gene using co-transformation of a double-
stranded DNA containing KanMX with 30bp upstream and 30bp downstream homology tails and
gRNAs targeting the region containing the site of interest (52). DNA oligos and PCR were used to
construct custom sgDNA templates which included crRNA and tracrRNA in a single molecule.
Next, we employed T7 RNA Polymerase to express sgDNA templates in vitro. DNase treatment
and phenol extraction were used to obtain purified sgRNAs. Transformants were selected on media
containing G418, and KanMX integration was confirmed by PCR. Next, KanMX was replaced
with the nucleotide of interest. To do this, integrants were co-transformed with four gRNAs
targeting KanMX, a 60 bp single-stranded DNA repair oligo, and a marker plasmid expressing
either HygMX or NatMX using electroporation (53). Marker plasmids were constructed by Gibson
assembly with HygMX or NatMX and pRS316 (54,55). Repair constructs were 60bp ssDNA
oligos ordered from Integrated DNA Technologies that included upstream homology, the desired
nucleotide at the site of interest, and downstream homology. Transformants were selected on
media containing either hygromycin or nourseothricin, depending on what marker plasmid was
used. Replacement strains were then confirmed by Sanger sequencing.
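The layout of a repair oligo (upstream homology, the desired nucleotide, downstream homology) can be illustrated with a minimal sketch. The function name, the 0-based site coordinate, and the 29/30 nt split of the homology arms are assumptions made for illustration; the text above specifies only the 60-nt total length and the order of the three parts.

```python
def build_repair_oligo(reference: str, site: int, new_base: str, length: int = 60) -> str:
    """Return a single-stranded repair oligo carrying new_base at the 0-based site.

    Hypothetical helper: the split of the homology arms around the edited
    position is an assumption, not a detail taken from the protocol.
    """
    if len(new_base) != 1 or new_base not in "ACGT":
        raise ValueError("new_base must be one of A, C, G, T")
    upstream_len = (length - 1) // 2             # 29 nt for a 60-mer (assumed)
    downstream_len = length - 1 - upstream_len   # 30 nt for a 60-mer (assumed)
    if site - upstream_len < 0 or site + downstream_len >= len(reference):
        raise ValueError("site is too close to the end of the reference")
    upstream = reference[site - upstream_len:site]
    downstream = reference[site + 1:site + 1 + downstream_len]
    return upstream + new_base + downstream


# Toy reference sequence, not a real yeast locus.
toy_reference = "A" * 40 + "C" + "G" * 40
oligo = build_repair_oligo(toy_reference, site=40, new_base="T")
print(len(oligo), oligo)
```

In practice the oligo would be checked against the gRNA target sites so that the repaired allele is no longer cut; that check is outside the scope of this sketch.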
Following this strategy, the mrp20-105E nucleotide was engineered into two hos3Δ IV^3S XIV^BY
segregants, and two hos3Δ IV^BY XIV^BY segregants were restored to MRP20. Similarly, at MKT1 the
causal, nearest upstream, and nearest downstream SNPs were engineered into two hos3Δ IV^BY XIV^3S
segregants. We also generated BY mrp20-105E, BY MKT1^3S, 3S mrp20-105E, and 3S MKT1^BY strains
in this manner. Each single nucleotide parental replacement strain was then backcrossed to its own
progenitor. Each subsequent diploid was sporulated and tetrad dissected, and we confirmed haploid
genotypes by sequencing. The same approach was used to generate 3S mrp20-105E MKT1^BY haploids by
crossing 3S mrp20-105E and 3S MKT1^BY strains. However, this strategy could not be followed to
generate BY mrp20-105E MKT1^3S haploids, because crossing BY mrp20-105E and BY MKT1^3S strains
failed to produce any tetrads with four viable spores. Instead, we took BY MKT1^3S strains and
converted MRP20 to mrp20-105E.
A.5.9. Mitochondrial genome instability experiments
We performed petite frequency assays as described in Dimitrov et al. (24). In brief, freezer
stocks were streaked onto solid YPD media and grown for two days at 30°C. Single colonies were
then resuspended in PBS, plated across dilutions onto YPDG plates (1% yeast extract, 2% peptone,
0.1% glucose, and 3% glycerol) and grown for five days at 30°C. Plates were then imaged with
the Bio-Rad Gel Doc XR+ Molecular Imager at a standard size of 12.4 × 8.9 cm² (width × length)
under epi-illumination with an exposure time of 0.5 seconds. Images were saved as
600 dpi TIFFs. ImageJ (http://rsbweb.nih.gov/ij/) was used to examine growth and quantitate colony
size as described in Dimitrov et al. (24). Colonies were then classified as petite or grande using
a threshold defined as the maximum colony diameter of observed petites among BY and 3S wild
type strains. Petite frequency is the ratio of small colonies to total colonies.
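The classification rule and the frequency calculation described above can be sketched as follows. The function names, the diameter units, and the choice to count a colony exactly at the threshold as petite are assumptions for illustration; the colony diameters are made-up values, not measurements from this study.

```python
from typing import Sequence


def petite_threshold(reference_petite_diameters: Sequence[float]) -> float:
    """Threshold = largest diameter among petites observed in the wild type strains."""
    return max(reference_petite_diameters)


def petite_frequency(diameters: Sequence[float], threshold: float) -> float:
    """Fraction of colonies classified as petite (diameter <= threshold, an assumed rule)."""
    if not diameters:
        raise ValueError("no colonies to classify")
    petites = sum(1 for d in diameters if d <= threshold)
    return petites / len(diameters)


# Made-up diameters in arbitrary units.
threshold = petite_threshold([0.8, 1.0, 1.1])
frequency = petite_frequency([0.9, 1.1, 2.5, 3.0], threshold)
print(threshold, frequency)
```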
A.5.10. Modeling growth and examining the model in additional segregant populations
We modeled growth for mrp20-105E segregants from the BYx3S crosses fixed for mrp20-105E
and engineered at MKT1. We incorporated MKT1, the 16 detected loci, and the Chromosome II
duplication in the linear model growth ~ MKT1 + locus1 + locus2 + … + locus16 + Chromosome II
+ error. This model was used to generate predicted growth values. We then compared our
observed growth values to these predictions. Next, we sought to determine whether loci
influencing the expressivity of mrp20-105E also affected growth in other strains. To accomplish
this, we input the genotype information for each strain into our model to obtain predictions for its
growth. We then compared the predicted values to the observed growth values and obtained
Pearson correlations when possible.
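A minimal sketch of this workflow is given below: fit an additive linear model by ordinary least squares, generate predictions from genotypes, and compute the Pearson correlation between observed and predicted growth. The function names, the 0/1 genotype coding, and the simulated data are assumptions for illustration; the sketch does not reproduce the software or the data used in this work.

```python
import numpy as np


def design_matrix(genotypes):
    """Prepend an intercept column to a (segregants x terms) genotype matrix."""
    genotypes = np.asarray(genotypes, dtype=float)
    return np.column_stack([np.ones(genotypes.shape[0]), genotypes])


def fit_additive_model(genotypes, growth):
    """Ordinary least squares fit of growth ~ intercept + one term per column."""
    coef, *_ = np.linalg.lstsq(
        design_matrix(genotypes), np.asarray(growth, dtype=float), rcond=None
    )
    return coef


def predict_growth(coef, genotypes):
    """Predicted growth for any strains genotyped at the same terms."""
    return design_matrix(genotypes) @ coef


def pearson_r(observed, predicted):
    return float(np.corrcoef(observed, predicted)[0, 1])


# Simulated data: 18 terms standing in for MKT1, 16 loci, and the
# Chromosome II duplication, each coded 0/1 (an assumed coding).
rng = np.random.default_rng(0)
n_segregants, n_terms = 200, 18
genotypes = rng.integers(0, 2, size=(n_segregants, n_terms))
true_effects = rng.uniform(0.5, 1.5, size=n_terms)
growth = 1.0 + genotypes @ true_effects + rng.normal(0.0, 0.1, size=n_segregants)

coef = fit_additive_model(genotypes, growth)
predicted = predict_growth(coef, genotypes)
print(round(pearson_r(growth, predicted), 3))
```

To examine the model in another population, `predict_growth` would be called with that population's genotype matrix and the coefficients fitted here, and the predictions compared with that population's observed growth.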
A.5.11. Relationship between detrimental alleles, growth, and inviability
At each detected locus influencing response to mrp20-105E, we determined the allele
associated with worse growth (‘detrimental allele’). Next, we counted the number of detrimental
alleles carried by each mrp20-105E segregant and examined how phenotypic response to mrp20-
105E related to it. The MKT1 and SAL1 loci were not included when counting detrimental alleles,
so that this relationship could be examined across different MKT1-SAL1 genotype classes.
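The counting step can be sketched as below, assuming a genotype matrix with one column per locus and alleles coded 0 or 1 (for example BY = 0 and 3S = 1). The function names, the coding, and the toy values are hypothetical. Columns for MKT1 and SAL1 would be left out of the matrix before counting, as described above.

```python
import numpy as np


def detrimental_alleles(genotypes, growth):
    """For each locus (column), return the allele code with the lower mean growth."""
    genotypes = np.asarray(genotypes)
    growth = np.asarray(growth, dtype=float)
    detrimental = np.empty(genotypes.shape[1], dtype=int)
    for j in range(genotypes.shape[1]):
        mean0 = growth[genotypes[:, j] == 0].mean()
        mean1 = growth[genotypes[:, j] == 1].mean()
        detrimental[j] = 0 if mean0 < mean1 else 1
    return detrimental


def count_detrimental(genotypes, detrimental):
    """Number of detrimental alleles carried by each segregant (row)."""
    return (np.asarray(genotypes) == detrimental).sum(axis=1)


# Toy example: four segregants, two loci, made-up growth values.
toy_genotypes = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
toy_growth = np.array([1.0, 2.0, 3.0, 4.0])
worse = detrimental_alleles(toy_genotypes, toy_growth)
counts = count_detrimental(toy_genotypes, worse)
print(worse, counts)
```

Each locus is assumed to have both alleles represented among the segregants; a locus fixed for one allele would give an undefined mean for the missing class.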
A.6. Supplementary Materials
Supplementary Text SA.1. We detected a locus on Chromosome IV with a peak marker position from
1,277,231 to 1,277,378, and a 99% confidence interval from position 1,277,231 to position 1,278,618,
encompassing the promoter and most of the coding region of MRP20.
Supplementary Text SA.2. The chromosome XIV locus that interacts with MRP20 in hos3Δ segregants
had a peak from 463,554 to 465,005, and a 99% confidence interval that extended from 457,243 to 478,701.
Ten protein-coding genes were completely or partially encompassed in this confidence interval, limiting
possible insight into the causal variation underlying this effect.
Figure SA.1. Identification of mrp20-105E. (A) Wild type and hemizygous BY/3S diploids were
generated and sporulated to produce HOS3 and hos3Δ F2 BYx3S segregants. BYx3S hos3Δ segregants
exhibited a large increase in phenotypic variability relative to wild type segregants. (B) Linkage mapping
using the HOS3 and hos3Δ segregants identified a single locus on Chromosome IV. The peak marker was
from 1,277,231 to 1,277,959 and the confidence interval extended from position 1,272,164 to position
1,278,407, encompassing (from left to right) part of URH1 and all of DIT2, DIT1, RPB7, and MRP20. (C)
The BY allele of the Chromosome IV locus had a large effect in hos3Δ segregants, but no effect in HOS3
segregants. (D) Recombination breakpoints in hos3Δ segregants delimited the Chromosome IV locus to
five SNPs (small vertical black lines along the x-axis) in the RPB7-MRP20 region of the chromosome.
Dashed vertical lines show the window delimited by the recombination breakpoints. One of these variants
was a spontaneous mutation in MRP20. Blue and orange respectively refer to the BY and 3S alleles of the
locus. (E) Reciprocal hemizygosity analysis in a hos3Δ BY/3S diploid was conducted at closely linked non-
essential genes and found that MRP20 is the causal gene underlying the Chromosome IV locus. In these
experiments the IV^BY allele includes the mrp20-105E mutation and results in a substantial decrease in
growth. Black triangles denote the absence of one allele and colored triangles indicate the alleles that are
present. (F) The causality of mrp20-105E was validated by engineering in segregants with MRP20 (left)
and mrp20-105E (right). (G) Tetrad dissection of the original BY/3S HOS3/hos3Δ MRP20/mrp20-105E
diploid showed that increased variation was due to mrp20-105E, not hos3Δ. Throughout the paper, blue
and orange are used to denote BY and 3S genetic material, respectively. All growth data presented in the
paper are measurements of colonies on agar plates containing rich medium with ethanol as the carbon
source.
Figure SA.2. Representative MRP20 and mrp20-105E segregants on ethanol. Each colony is a
genetically distinct BYx3S segregant grown on ethanol. A wide range of growth phenotypes was observed
among mrp20-105E segregants, some of which were inviable in this condition.
Figure SA.3. Linkage mapping in the F3 panel more finely resolves the Chromosome XIV locus. The
model growth ~ locus + error was used. The genome-wide significance plot of the locus term is shown in
(A) and the relationship between genotype at the Chromosome XIV locus and growth is shown in (B). The peak and
99% confidence interval solely included the position 467,219.
Figure SA.4. Growth effects of loci detected in BYx3S mrp20-105E crosses. The relationship between
genotype and growth is shown at each of the 16 loci detected among BYx3S mrp20-105E segregants shown in Fig. 3
B-D. Effects are shown from greatest to least effect size, left to right, top to bottom.
Figure SA.5. Loci affecting expressivity of mrp20-105E show minimal effects in MRP20 segregants.
Growth relative to the sum of detrimental alleles is shown for MRP20 segregants. While predictions for
MRP20 segregants correlated with observed growth (r = 0.70, p = 9.6 × 10⁻²⁵), the cumulative effects of
loci differed between mrp20-105E and MRP20 segregants (ANOVA, observedGrowth ~
predictedGrowth*MRP20; interaction term p = 2.8 × 10⁻²³). This is likely due, in part, to the fact that
wild type segregants exhibited a narrower range of phenotypes, which did not include inviable segregants.
A.7. References
1. J. H. Nadeau, Modifier genes in mice and humans. Nat Rev Genet 2, 165–174 (2001).
2. C. H. Chandler, S. Chari, I. Dworkin, Does your gene need a background check? How genetic
background impacts the analysis of mutations, genes, and evolution. Trends Genet 29, 358–
366 (2013).
3. J. D. Riordan, J. H. Nadeau, From peas to disease: modifier genes, network resilience, and the
genetics of health. Am J Hum Genet 101, 177–191 (2017).
4. A. J. F. Griffiths, S. R. Wessler, S. B. Carroll, J. Doebley, Introduction to genetic
analysis (Macmillan Publishers, New York, NY, ed. 12, 2015).
5. A. Raj, S. A. Rifkin, E. Andersen, A. van Oudenaarden, Variability in gene expression
underlies incomplete penetrance. Nature 463, 913–918 (2010).
6. M. R. Wagner et al., Microbe-dependent heterosis in maize. Proc Natl Acad Sci U S
A 118 (2021).
7. R. D. Dowell et al., Genotype to phenotype: a complex problem. Science 328, 469 (2010).
8. S. Chari, I. Dworkin, The conditional nature of genetic interactions: the consequences of
wild-type backgrounds on mutational interactions in a genome-wide modifier screen. PLoS
Genet 9, e1003661 (2013).
9. C. H. Chandler, S. Chari, D. Tack, I. Dworkin, Causes and consequences of genetic background
effects illuminated by integrative genomic analysis. Genetics 196, 1321–1336 (2014).
10. A. B. Paaby et al., Wild worm embryogenesis harbors ubiquitous polygenic modifier
variation. Elife 4 (2015).
11. M. B. Taylor, I. M. Ehrenreich, Genetic interactions involving five or more genes contribute
to a complex trait in yeast. PLoS Genet 10, e1004324 (2014).
12. M. B. Taylor, I. M. Ehrenreich, Transcriptional derepression uncovers cryptic higher-order
genetic interactions. PLoS Genet 11, e1005606 (2015).
13. V. Vu et al., Natural variation in gene expression modulates the severity of mutant
phenotypes. Cell 162, 391–402 (2015).
14. M. B. Taylor, J. Phan, J. T. Lee, M. McCadden, I. M. Ehrenreich, Diverse genetic architectures
lead to the same cryptic phenotype in a yeast cross. Nat Commun 7, 11669 (2016).
15. J. T. Lee, M. B. Taylor, A. Shen, I. M. Ehrenreich, Multi-locus genotypes underlying
temperature sensitivity in a mutationally induced trait. PLoS Genet 12, e1005929 (2016).
16. C. H. Chandler et al., How well do you know your mutation? Complex effects of genetic
background on expressivity, complementation, and ordering of allelic effects. PLoS
Genet 13, e1007075 (2017).
17. M. N. Mullis, T. Matsui, R. Schell, R. Foree, I. M. Ehrenreich, The complex underpinnings of
genetic background effects. Nat Commun 9, 3548 (2018).
18. J. Hou, G. Tan, G. R. Fink, B. J. Andrews, C. Boone, Complex modifier landscape underlying
genetic background effects. Proc Natl Acad Sci U S A 116, 5045–5054 (2019).
19. J. T. Lee, A. L. V. Coradini, A. Shen, I. M. Ehrenreich, Layers of cryptic genetic variation
underlie a yeast complex trait. Genetics 211, 1469–1482 (2019).
20. L. Parts et al., Natural variants suppress mutations in hundreds of essential genes. Mol Syst
Biol 17, e10138 (2021).
21. M. Galardini et al., The impact of the genetic background on gene deletion phenotypes in
Saccharomyces cerevisiae. Mol Syst Biol 15, e8831 (2019).
22. K. Fearon, T. L. Mason, Structure and function of MRP20 and MRP49, the nuclear genes for
two proteins of the 54 S subunit of the yeast mitochondrial ribosome. J Biol
Chem 267, 5162–5170 (1992).
23. E. C. Koc et al., The large subunit of the mammalian mitochondrial ribosome. Analysis of the
complement of ribosomal proteins present. J Biol Chem 276, 43958–43969 (2001).
24. L. N. Dimitrov, R. B. Brem, L. Kruglyak, D. E. Gottschling, Polymorphisms in multiple genes
contribute to the spontaneous mitochondrial genome instability of Saccharomyces
cerevisiae S288C strains. Genetics 183, 365–383 (2009).
25. K. A. Lipinski, A. Kaniak-Golik, P. Golik, Maintenance and expression of the S. cerevisiae
mitochondrial genome--from genetics to evolution and systems biology. Biochim Biophys
Acta 1797, 1086–1098 (2010).
26. G. S. Shadel, Yeast as a model for human mtDNA replication. Am J Hum Genet 65, 1230–
1237 (1999).
27. B. Kucejova, L. Li, X. Wang, S. Giannattasio, X. J. Chen, Pleiotropic effects of the yeast Sal1 and
Aac2 carriers on mitochondrial function via an activity distinct from adenine nucleotide
transport. Mol Genet Genomics 280, 25–39 (2008).
28. B. Ephrussi, P. P. Slonimski, Subcellular units involved in the synthesis of respiratory
enzymes in yeast. Nature 176, 1207–1208 (1955).
29. R. Chen et al., Analysis of 589,306 genomes identifies individuals resilient to severe
Mendelian childhood diseases. Nat Biotechnol 34, 531–538 (2016).
30. V. M. Narasimhan et al., Health and population effects of rare gene knockouts in adult
humans with related parents. Science 352, 474–477 (2016).
31. O. Carlborg, C. S. Haley, Epistasis: too often neglected in complex trait studies? Nat Rev
Genet 5, 618–625 (2004).
32. H. Shao et al., Genetic architecture of complex traits: large phenotypic effects and pervasive
epistasis. Proc Natl Acad Sci U S A 105, 19910–19914 (2008).
33. T. F. Mackay, Epistasis and quantitative traits: using model organisms to study gene-gene
interactions. Nat Rev Genet 15, 22–33 (2014).
34. T. F. Mackay, J. H. Moore, Why epistasis is important for tackling complex human disease
genetics. Genome Med 6, 124 (2014).
35. M. L. Siegal, J. Y. Leu, On the nature and evolutionary impact of phenotypic robustness
mechanisms. Annu Rev Ecol Evol Syst 45, 496–517 (2014).
36. M. B. Taylor, I. M. Ehrenreich, Higher-order genetic interactions and their contribution to
complex traits. Trends Genet 31, 34–40 (2015).
37. S. K. Forsberg, J. S. Bloom, M. J. Sadhu, L. Kruglyak, O. Carlborg, Accounting for genetic
interactions improves modeling of individual quantitative trait phenotypes in yeast. Nat
Genet 49, 497–503 (2017).
38. I. M. Ehrenreich, Epistasis: searching for interacting genetic variants using
crosses. Genetics 206, 531–535 (2017).
39. R. F. Campbell, P. T. McGrath, A. B. Paaby, Analysis of epistasis in natural traits using model
organisms. Trends Genet 34, 883–898 (2018).
40. M. Costanzo et al., A global genetic interaction network maps a wiring diagram of cellular
function. Science 353, aaf1420 (2016).
41. E. Kuzmin et al., Systematic analysis of complex genetic
interactions. Science 360, aao1729 (2018).
42. A. H. Tong, C. Boone, Synthetic genetic array analysis in Saccharomyces cerevisiae. Methods
Mol Biol 313, 171–192 (2006).
43. H. Li, R. Durbin, Fast and accurate short read alignment with Burrows-Wheeler
transform. Bioinformatics 25, 1754–1760 (2009).
44. H. Li et al., The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–
2079 (2009).
45. L. R. Rabiner, A tutorial on hidden Markov models and selected applications in speech
recognition. Proc IEEE 77, 257–286 (1989).
46. T. Benaglia, D. Chauveau, D. R. Hunter, D. S. Young, mixtools: an R package for analyzing
mixture models. Journal of Statistical Software 32, 1–29 (2009).
47. T. Matsui, I. M. Ehrenreich, Gene-environment interactions in stress response contribute
additively to a genotype-environment interaction. PLoS Genet 12, e1006158 (2016).
48. G. A. Churchill, R. W. Doerge, Empirical threshold values for quantitative trait
mapping. Genetics 138, 963–971 (1994).
49. L. M. Steinmetz et al., Dissecting the architecture of a quantitative trait locus in
yeast. Nature 416, 326–330 (2002).
50. R. D. Gietz, R. A. Woods, Transformation of yeast by lithium acetate/single-stranded carrier
DNA/polyethylene glycol method. Methods Enzymol 350, 87–96 (2002).
51. M. F. Laughery et al., New vectors for simple and streamlined CRISPR-Cas9 genome editing
in Saccharomyces cerevisiae. Yeast 32, 711–720 (2015).
52. K. Kannan et al., One step engineering of the small-subunit ribosomal RNA using
CRISPR/Cas9. Sci Rep 6, 30714 (2016).
53. J. R. Thompson, E. Register, J. Curotto, M. Kurtz, R. Kelly, An improved protocol for the
preparation of yeast cells for transformation by electroporation. Yeast 14, 565–571 (1998).
54. D. G. Gibson et al., Enzymatic assembly of DNA molecules up to several hundred
kilobases. Nat Methods 6, 343–345 (2009).
55. R. S. Sikorski, P. Hieter, A system of shuttle vectors and yeast host strains designed for
efficient manipulation of DNA in Saccharomyces cerevisiae. Genetics 122, 19–27 (1989).
Abstract
Understanding how genetic variants, both individually and collectively, contribute to the heritable phenotypic variation of organisms is a central goal of modern genetics. While linkage mapping and association studies continue to identify thousands of genetic variants involved in complex traits across organisms, these variants often explain a small proportion of heritability in phenotype. This is due to several factors, including but not limited to: the complexity of many biological traits, which can involve many genetic loci; genetic interactions, which can cause loci to exhibit conditional or nonlinear effects; and technical limitations, which prevent the mapping of relevant traits or limit our ability to appropriately model a trait of interest.
The aim of this dissertation is to improve our collective understanding of the relationship between genotype and complex, polygenic phenotypes. To accomplish this, I have performed a comprehensive dissection of the genetic basis of several quantitative traits in the budding yeast, Saccharomyces cerevisiae. In chapter two, I demonstrate how standing genetic variants interact with both each other and with genetic perturbations to influence cell growth. In chapter three, I use a diploid model to characterize the role of dominance in pairwise genetic interactions and show how certain loci exert a large influence on phenotype through genome-wide interactions. In chapter four, I employ genomic barcoding to map the genetic basis of fungal pathogenicity in a mammalian model. Taken together, the research in this dissertation sheds light on the genetic basis of complex traits, with a specific emphasis on epistasis and host-pathogen interaction.
Asset Metadata
Creator: Mullis, Martin Nadeau (author)
Core Title: Exploring the genetic basis of quantitative traits
School: College of Letters, Arts and Sciences
Degree: Doctor of Philosophy
Degree Program: Molecular Biology
Degree Conferral Date: 2022-05
Publication Date: 05/02/2022
Defense Date: 09/02/2021
Publisher: University of Southern California (original), University of Southern California. Libraries (digital)
Tag: barcode sequencing, dominance, epistasis, genetic background effects, genetic interaction, Genetics, host-pathogen interaction, linkage mapping, OAI-PMH Harvest, pleiotropy, quantitative genetics
Format: application/pdf (imt)
Language: English
Contributor: Electronically uploaded by the author (provenance)
Advisor: Ehrenreich, Ian (committee chair), Boedicker, James (committee member), Dean, Matthew (committee member), Finkel, Steven (committee member), Nuzhdin, Sergey (committee member)
Creator Email: martinmullis91@gmail.com, mmullis@usc.edu
Permanent Link (DOI): https://doi.org/10.25549/usctheses-oUC111160362
Unique identifier: UC111160362
Document Type: Dissertation
Rights: Mullis, Martin Nadeau
Type: texts
Source: 20220502-usctheses-batch-936 (batch), University of Southern California (contributing entity), University of Southern California Dissertations and Theses (collection)
Access Conditions: The author retains rights to his/her dissertation, thesis or other graduate work according to U.S. copyright law. Electronic access is being provided by the USC Libraries in agreement with the author, as the original true and official version of the work, but does not grant the reader permission to use the work if the desired use is covered by copyright. It is the author, as rights holder, who must provide use permission if such use is covered by copyright. The original signature page accompanying the original submission of the work to the USC Libraries is retained by the USC Libraries and a copy of it may be obtained by authorized requesters contacting the repository e-mail address given.
Repository Name: University of Southern California Digital Library
Repository Location: USC Digital Library, University of Southern California, University Park Campus MC 2810, 3434 South Grand Avenue, 2nd Floor, Los Angeles, California 90089-2810, USA
Repository Email: cisadmin@lib.usc.edu